\subsection{Storage Formats for OPT Audit Logs}

For large-scale or longitudinal use, we recommend storing OPT classifications
and evaluations in a machine-readable log format. Two practical options are:

\paragraph{JSON Lines (JSONL).}
Each line contains a single JSON object describing one system evaluation,
including:
\begin{itemize}
\item system identifier and textual description,
\item candidate OPT--Codes and rationales,
\item evaluator verdicts, scores, and issue summaries,
\item adjudicator decisions and final OPT--Code,
\item timestamps, model identifiers, and prompt variants.
\end{itemize}
JSONL is convenient for streaming pipelines, command-line tools, and
MapReduce-style processing, because each record can be parsed independently of
the others.
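
As a minimal sketch of the JSONL option, assuming Python and its standard
\texttt{json} module, the snippet below appends one record per line and streams
the log back one record at a time. The helper names
\texttt{append\_record} and \texttt{iter\_records}, the nested key names, and
all values in \texttt{example} are hypothetical placeholders, not part of the
OPT specification.

```python
import json
from pathlib import Path
from typing import Iterator


def append_record(path: Path, record: dict) -> None:
    """Append one evaluation record as a single JSON line."""
    # json.dumps escapes newlines inside strings, so a record never spans lines.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def iter_records(path: Path) -> Iterator[dict]:
    """Stream records line by line without loading the whole log into memory."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


# Hypothetical record: every value below is a placeholder for illustration.
example = {
    "id": "sys-0001",
    "description": "Placeholder description of the evaluated system.",
    "candidates": [{"code": "<candidate OPT-Code>", "rationale": "placeholder"}],
    "evaluations": [{"verdict": "accept", "score": 0.9, "issues": []}],
    "adjudication": {"decision": "placeholder", "rationale": "placeholder"},
    "final": {"code": "<final OPT-Code>"},
    "meta": {
        "timestamp": "2024-01-01T00:00:00Z",
        "model": "placeholder-model",
        "prompt": "placeholder-prompt",
    },
}
```

Appending rather than rewriting keeps earlier entries untouched, which suits an
audit log, and the generator lets downstream tools process arbitrarily long
logs with constant memory.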

\paragraph{YAML.}
YAML offers a more human-friendly syntax and supports comments. It is useful for
curated datasets or hand-edited corpora of OPT--annotated systems. The same
fields as above can be stored in a nested structure, with separate top-level
keys such as \texttt{description}, \texttt{candidates}, \texttt{evaluations},
\texttt{adjudication}, and \texttt{meta}.
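
To illustrate the nested layout, the sketch below holds one hand-edited entry
as a Python string and lists its top-level keys. The key names follow the
minimal schema fields (\texttt{id} through \texttt{meta}); the nested keys, the
values, and the helper \texttt{top\_level\_keys} are hypothetical. The helper
is a deliberately naive line scanner for this simple block-style example, not a
YAML parser; a real pipeline would more likely load the file with a YAML
library, for example PyYAML's \texttt{yaml.safe\_load}.

```python
# Hypothetical hand-curated entry: every value below is a placeholder.
EXAMPLE_ENTRY = """\
# Curator note: comments like this one are a reason to prefer YAML here.
id: sys-0001
description: Placeholder description of the annotated system.
candidates:
  - code: "<candidate OPT-Code>"
    rationale: Placeholder rationale.
evaluations:
  - verdict: accept
    score: 0.9
    issues: []
adjudication:
  decision: Placeholder decision.
final:
  code: "<final OPT-Code>"
meta:
  timestamp: "2024-01-01T00:00:00Z"
"""


def top_level_keys(text: str) -> list:
    """Return top-level mapping keys of a simple block-style YAML document."""
    keys = []
    for line in text.splitlines():
        # Top-level keys start in column 0; skip blanks, comments, indented lines.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0])
    return keys
```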

\paragraph{Schema.}
A minimal schema for either JSONL or YAML includes:
\begin{itemize}
\item \texttt{id}: unique system identifier,
\item \texttt{description}: text or reference to source code,
\item \texttt{candidates}: list of OPT--Codes and rationales,
\item \texttt{evaluations}: evaluator outputs for each candidate,
\item \texttt{adjudication}: final decision and rationale (if any),
\item \texttt{final}: final OPT--Code and attributes,
\item \texttt{meta}: timestamps, model versions, prompt names.
\end{itemize}
Such logs support reproducibility, auditability, and downstream statistical
analysis of taxonomy usage and model performance.
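
The schema fields listed above can be checked and summarised with a few lines
of Python. The following is a sketch under stated assumptions:
\texttt{adjudication} is treated as optional (our reading of ``if any''), and
\texttt{final} is assumed to be a mapping with a hypothetical \texttt{code}
key. Neither choice is prescribed by the schema, and the function names are
illustrative only.

```python
from collections import Counter
from typing import Iterable, List

# Keys taken from the minimal schema; "adjudication" is assumed optional.
REQUIRED_KEYS = ("id", "description", "candidates", "evaluations", "final", "meta")


def missing_keys(record: dict) -> List[str]:
    """Return the required schema keys that are absent from a record."""
    return [key for key in REQUIRED_KEYS if key not in record]


def count_final_codes(records: Iterable[dict]) -> Counter:
    """Tally final OPT-Codes across all records that pass the key check."""
    counts: Counter = Counter()
    for record in records:
        if missing_keys(record):
            continue  # skip incomplete records rather than guessing their code
        # Assumption: the final code is stored under final["code"].
        counts[record["final"]["code"]] += 1
    return counts
```

A tally of this kind is one simple form of the taxonomy-usage analysis
mentioned above. In practice, incomplete records would probably be reported
instead of silently skipped.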