
\subsection{Storage Formats for OPT Audit Logs}

For large-scale or longitudinal use, we recommend storing OPT classifications
and evaluations in a machine-readable log format. Two practical options are:

\paragraph{JSON Lines (JSONL).}
Each line contains a single JSON object describing one system evaluation,
including:
\begin{itemize}
    \item system identifier and textual description,
    \item candidate OPT--Codes and rationales,
    \item evaluator verdicts, scores, and issue summaries,
    \item adjudicator decisions and final OPT--Code,
    \item timestamps, model identifiers, and prompt variants.
\end{itemize}
JSONL is convenient for streaming pipelines, command-line tools, and map--reduce
processing, because each line can be parsed independently of the rest of the
file.
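
As a minimal sketch of the JSONL option, the following Python fragment appends
one evaluation record per line and streams the log back one record at a time.
The top-level keys follow the minimal schema given in this appendix; the helper
names (\texttt{append\_record}, \texttt{iter\_records}), the file name, the
nested key names, and every field value are hypothetical placeholders, not part
of OPT itself.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def append_record(path: str, record: Dict[str, Any]) -> None:
    """Append one evaluation record to the log as a single JSON line."""
    # json.dumps without `indent` emits no raw newlines, so one record = one line.
    line = json.dumps(record, ensure_ascii=False)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield records one at a time without loading the whole log into memory."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)


# Placeholder record: all values are illustrative, not real OPT output.
example = {
    "id": "sys-001",
    "description": "Placeholder description of the system under review.",
    "candidates": [
        {"code": "<candidate-code>", "rationale": "Placeholder rationale."}
    ],
    "evaluations": [
        {"candidate": "<candidate-code>", "verdict": "pass",
         "score": 0.9, "issues": []}
    ],
    "adjudication": {"decision": "accept", "rationale": "Placeholder."},
    "final": {"code": "<candidate-code>", "attributes": {}},
    "meta": {"timestamp": "2025-01-01T00:00:00Z",
             "model": "<model-id>", "prompt": "<prompt-name>"},
}

log_path = os.path.join(tempfile.mkdtemp(), "opt_audit.jsonl")
append_record(log_path, example)
records = list(iter_records(log_path))
```

Opening the file in append mode keeps the log append-only, which suits audit
use: earlier records are never rewritten when new evaluations arrive.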

\paragraph{YAML.}
YAML provides a more human-friendly syntax and supports comments. It is useful
for curated datasets or hand-edited corpora of OPT--annotated systems. The same
fields as above can be stored in a nested structure, with separate top-level
keys for \texttt{description}, \texttt{candidates}, \texttt{evaluations},
\texttt{adjudication}, and \texttt{meta}.

\paragraph{Schema.}
A minimal schema for either JSONL or YAML includes:
\begin{itemize}
    \item \texttt{id}: unique system identifier,
    \item \texttt{description}: text or reference to source code,
    \item \texttt{candidates}: list of OPT--Codes and rationales,
    \item \texttt{evaluations}: evaluator outputs for each candidate,
    \item \texttt{adjudication}: final decision and rationale (if any),
    \item \texttt{final}: final OPT--Code and attributes,
    \item \texttt{meta}: timestamps, model versions, prompt names.
\end{itemize}
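
The field list above can be checked mechanically before a record is written to
a log. The following Python sketch is one possible check, under two assumptions
that the list does not state: the container type expected for each field, and
that \texttt{adjudication} may be absent (one reading of ``if any''). The names
\texttt{REQUIRED\_FIELDS}, \texttt{OPTIONAL\_FIELDS}, and
\texttt{validate\_record} are hypothetical.

```python
from typing import Any, Dict, List

# Required top-level keys with assumed Python types.
# The schema only names the fields; the types here are an assumption.
REQUIRED_FIELDS = {
    "id": str,
    "description": str,
    "candidates": list,
    "evaluations": list,
    "final": dict,
    "meta": dict,
}
# Assumption: "adjudication" is optional and may be missing or null.
OPTIONAL_FIELDS = {"adjudication": dict}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems: List[str] = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in record:
            problems.append(f"missing field: {key}")
        elif not isinstance(record[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    for key, expected in OPTIONAL_FIELDS.items():
        value = record.get(key)
        if value is not None and not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems


# Placeholder records; values are illustrative only.
good = {
    "id": "sys-001",
    "description": "Placeholder description.",
    "candidates": [],
    "evaluations": [],
    "final": {},
    "meta": {},
}
bad = {"id": 7, "adjudication": "accepted"}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
```

Returning a list of problems rather than raising on the first one lets a
pipeline report every defect in a hand-edited record at once.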
Such logs support reproducibility, auditability, and downstream statistical
analysis of taxonomy usage and model performance.
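
As one illustration of such downstream analysis, the Python sketch below
tallies how often each final OPT--Code occurs across a set of records. It
assumes, hypothetically, that the code is stored under a \texttt{code} key
inside \texttt{final}; the schema only says that \texttt{final} holds the final
OPT--Code and attributes. The function name \texttt{tally\_final\_codes} and
the code strings are placeholders.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tally_final_codes(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each final code appears in a sequence of records."""
    counts: Counter = Counter()
    for record in records:
        # Assumed layout: record["final"]["code"]. Records lacking it are
        # counted under a sentinel so that gaps in the log stay visible.
        code = (record.get("final") or {}).get("code", "<missing>")
        counts[code] += 1
    return counts


# Placeholder records; the code strings are not real OPT-Codes.
records = [
    {"id": "sys-1", "final": {"code": "<code-A>"}},
    {"id": "sys-2", "final": {"code": "<code-B>"}},
    {"id": "sys-3", "final": {"code": "<code-A>"}},
    {"id": "sys-4"},
]
counts = tally_final_codes(records)
```

Because the function accepts any iterable, it can consume a streamed JSONL log
directly without first materializing the full list of records.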