\subsection{Storage Formats for OPT Audit Logs}
For large-scale or longitudinal use, we recommend storing OPT classifications
and evaluations in a machine-readable log format. Two practical options are:
\paragraph{JSON Lines (JSONL).}
Each line contains a single JSON object describing one system evaluation,
including:
\begin{itemize}
\item system identifier and textual description,
\item candidate OPT--Codes and rationales,
\item evaluator verdicts, scores, and issue summaries,
\item adjudicator decisions and final OPT--Code,
\item timestamps, model identifiers, and prompt variants.
\end{itemize}
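Because each record occupies exactly one line, JSONL files can be appended to
incrementally and split at line boundaries, which makes the format convenient
for streaming pipelines, command-line tools, and MapReduce-style batch
processing.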
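
As an illustration, the following minimal Python sketch appends records to a
JSONL log and reads them back one line at a time. The helper names
\texttt{append\_record} and \texttt{read\_records}, the file name, the inner
keys \texttt{code} and \texttt{rationale}, and all field values are
hypothetical placeholders chosen for this example; only the standard
\texttt{json} module is assumed.

\begin{verbatim}
import json
import os
import tempfile


def append_record(path, record):
    """Append one evaluation record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Yield one record per non-empty line of the log."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Hypothetical record: every value below is a placeholder,
# not a real OPT classification result.
record = {
    "id": "system-001",
    "description": "placeholder description",
    "candidates": [{"code": "<OPT-Code>", "rationale": "placeholder"}],
    "evaluations": [],
    "adjudication": None,
    "final": None,
    "meta": {"timestamp": "2025-01-01T00:00:00Z"},
}

log_path = os.path.join(tempfile.mkdtemp(), "opt_audit.jsonl")
append_record(log_path, record)
append_record(log_path, dict(record, id="system-002"))

# Stream the log back without loading the whole file into memory.
ids = [r["id"] for r in read_records(log_path)]
\end{verbatim}

Opening the file in append mode means earlier entries are never rewritten,
which suits an audit log that is meant to grow over time.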
\paragraph{YAML.}
YAML offers a more human-friendly syntax and supports comments, which makes it
useful for curated datasets or hand-edited corpora of OPT--annotated systems.
The same fields as above can be stored in a nested structure, with separate
top-level keys for \texttt{id}, \texttt{description}, \texttt{candidates},
\texttt{evaluations}, \texttt{adjudication}, \texttt{final}, and \texttt{meta}.
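
A hand-edited entry might look like the following sketch. The top-level keys
follow the minimal schema in this section; the nested keys (\texttt{code},
\texttt{verdict}, \texttt{score}, and so on) and every value are hypothetical
placeholders rather than a prescribed layout.

\begin{verbatim}
# Hypothetical entry; all values are placeholders.
id: system-001
description: >
  Placeholder description of the system under evaluation.
candidates:
  - code: "<OPT-Code>"      # placeholder, not a real code
    rationale: Placeholder rationale.
evaluations:
  - candidate: 0            # index into candidates (assumed convention)
    verdict: pass
    score: 0.0
    issues: []
adjudication:
  decision: accept
  rationale: Placeholder.   # curator notes can live in comments
final:
  code: "<OPT-Code>"
meta:
  timestamp: "2025-01-01T00:00:00Z"
  model: placeholder-model
  prompt: placeholder-prompt
\end{verbatim}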
\paragraph{Schema.}
A minimal schema for either JSONL or YAML includes:
\begin{itemize}
\item \texttt{id}: unique system identifier,
\item \texttt{description}: text or reference to source code,
\item \texttt{candidates}: list of OPT--Codes and rationales,
\item \texttt{evaluations}: evaluator outputs for each candidate,
\item \texttt{adjudication}: final decision and rationale (if any),
\item \texttt{final}: final OPT--Code and attributes,
\item \texttt{meta}: timestamps, model versions, prompt names.
\end{itemize}
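
A record can be checked against this field list with a few lines of Python.
The sketch below is illustrative only: the function name
\texttt{missing\_keys} is hypothetical, and treating \texttt{adjudication} as
optional is an assumption based on the ``if any'' qualifier above.

\begin{verbatim}
# Top-level keys taken from the minimal schema.
# Assumption: "adjudication" is optional, so it is not required here.
REQUIRED_KEYS = {"id", "description", "candidates",
                 "evaluations", "final", "meta"}


def missing_keys(record):
    """Return the required top-level keys absent from a record, sorted."""
    return sorted(REQUIRED_KEYS.difference(record))


# Hypothetical records with placeholder values.
complete = {
    "id": "system-001",
    "description": "placeholder",
    "candidates": [],
    "evaluations": [],
    "final": None,
    "meta": {},
}
partial = {"id": "system-002"}

complete_gaps = missing_keys(complete)
partial_gaps = missing_keys(partial)
\end{verbatim}

The same check applies to JSONL and YAML logs alike, since both parse into
ordinary dictionaries once loaded.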
Such logs support reproducibility, auditability, and downstream statistical
analysis of taxonomy usage and model performance.