
\section{Appendix: OPT--Code Prompt Specifications}

This appendix collects the prompt formulations used to elicit OPT--Code
classifications from large language models and to evaluate those classifications
for correctness and consistency.

\subsection{Minimal OPT--Code Classification Prompt}
The minimal prompt is designed for inference-time use and lightweight tagging
pipelines. It assumes a basic familiarity with the OPT roots and emphasizes
mechanism-based classification over surface labels.

\begin{quote}\small
\input{appendix_prompt_minimal.tex}
\end{quote}

\subsection{Maximal Expert OPT--Code Classification Prompt}
The maximal prompt elaborates all root definitions, clarifies the treatment of
parallelism and pipelines, and details rules for composition. It is intended for
fine-tuning, high-stakes evaluations, or detailed audit trails.

\begin{quote}\small
\input{appendix_prompt_maximal.tex}
\end{quote}

\subsection{OPT--Code Prompt Evaluator}
The evaluator prompt is a meta-level specification: it assesses whether a given
candidate OPT--Code and rationale respect the OPT taxonomy and associated
guidelines. This enables automated or semi-automated review of classifications
generated by other models or tools.

\begin{quote}\small
\input{appendix_prompt_evaluator.tex}
\end{quote}
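The evaluator's reply follows a fixed structure: a \texttt{Verdict} line, a \texttt{Score} line, five \texttt{Issues} entries, and a \texttt{Summary}. That makes it possible to check the reply mechanically before it enters a review pipeline. The Python sketch below is illustrative only and is not part of the OPT specification. The names \texttt{build\_evaluator\_input}, \texttt{parse\_evaluator\_reply}, and \texttt{EvaluatorReport} are hypothetical, and the plain-text layout used for the three evaluator inputs is an assumption.

\begin{verbatim}
import re
from dataclasses import dataclass, field

# Hypothetical helper code: names and the input layout are assumptions,
# not part of the OPT specification.

VERDICTS = {"PASS", "WEAK_PASS", "FAIL"}
ISSUE_KEYS = ("Format", "Mechanism", "Parallelism/Pipelines",
              "Composition", "Attributes")


@dataclass
class EvaluatorReport:
    verdict: str
    score: int
    issues: dict = field(default_factory=dict)
    summary: str = ""


def build_evaluator_input(description: str, opt_line: str,
                          rationale: str) -> str:
    """Assemble the three evaluator inputs (assumed plain-text layout)."""
    return (f"1) System description:\n{description}\n\n"
            f"2) Candidate OPT-Code line:\n{opt_line}\n\n"
            f"3) Candidate rationale:\n{rationale}\n")


def parse_evaluator_reply(text: str) -> EvaluatorReport:
    """Parse a reply in the evaluator's output structure; raise on deviations."""
    verdict_m = re.search(r"^Verdict:\s*(\w+)\s*$", text, re.MULTILINE)
    score_m = re.search(r"^Score:\s*(\d+)\s*$", text, re.MULTILINE)
    if verdict_m is None or score_m is None:
        raise ValueError("missing Verdict or Score line")
    verdict, score = verdict_m.group(1), int(score_m.group(1))
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    issues = {}
    for key in ISSUE_KEYS:
        # [ \t]* keeps an empty comment from swallowing the next line.
        m = re.search(rf"^- {re.escape(key)}:[ \t]*(.*)$", text, re.MULTILINE)
        if m is None:
            raise ValueError(f"missing issue line: {key}")
        issues[key] = m.group(1).strip()
    # The summary may span several lines, so capture to the end of the text.
    summary_m = re.search(r"^Summary:\s*(.*)", text, re.MULTILINE | re.DOTALL)
    summary = summary_m.group(1).strip() if summary_m else ""
    return EvaluatorReport(verdict, score, issues, summary)
\end{verbatim}

Raising on any deviation from the required structure means a malformed reply can be re-requested or routed to a human reviewer, rather than having its score silently aggregated.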



\begin{verbatim}
You are an OPT-Code evaluation assistant. Your job is to check whether a
candidate OPT classification follows the OPT rules and is mechanistically
correct.

Inputs you will be given:

1) System description: a code snippet or project/system description.
2) Candidate OPT-Code line (from another model), of the form:
   OPT=<roots>; Rep=<...>; Obj=<...>; Data=<...>; Time=<...>; Human=<...>
3) Candidate rationale: 2–6 sentences explaining the candidate’s choice.

You must evaluate the candidate against the following criteria:

(1) Format compliance:
    - Does the candidate produce exactly one OPT= line containing all required
      fields (OPT, Rep, Obj, Data, Time, Human)?
    - Are all roots drawn from the valid set (Lrn, Evo, Sym, Prb, Sch, Ctl, Swm)?
    - Are "+" and "/" used only between valid roots?

(2) Mechanism correctness:
    - Do the chosen roots match the operative mechanism in the system description?
    - Is any root clearly present in the system but missing from the candidate?
    - Is any root included that is not supported by the description?

(3) Parallelism and pipelines:
    - Does the candidate incorrectly treat threads, GPU kernels, async, pipelines,
      or distributed infrastructure as OPT mechanisms (e.g., calling something
      Swm or Sch only because it is parallel)?
    - If so, this is a serious error.

(4) Composition correctness:
    - Check that "+" joins only tightly integrated mechanisms in the same
      core loop.
    - Check that "/" separates only distinct sequential stages.
    - Flag "+" if the joined mechanisms are clearly separate stages, and flag
      "/" if the separated mechanisms are clearly integrated in one loop.

(5) Attribute plausibility:
    - Are Rep, Obj, Data, Time, and Human reasonably consistent with the system
      description?
    - They need not be the only possible choice, but each must be defensible.

Your output must use the following structure:

Verdict: <PASS | WEAK_PASS | FAIL>
Score: <integer from 0 to 100>

Issues:
- Format: <short comment>
- Mechanism: <short comment>
- Parallelism/Pipelines: <short comment>
- Composition: <short comment>
- Attributes: <short comment>

Summary: <2–4 sentences giving an overall assessment and key corrections, if any>.

Guidelines:
- PASS means: no major errors; at most minor debatable choices.
- WEAK_PASS means: generally acceptable, but with at least one non-trivial issue
  that should be corrected before publication.
- FAIL means: at least one serious misunderstanding of the mechanism, a clear
  violation of the parallelism/pipeline rules, or clearly incorrect roots.
\end{verbatim}
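Criterion~(1) is purely syntactic, so it can also be enforced deterministically before any model-based review. The Python sketch below is a hypothetical pre-filter and is not part of the evaluator prompt. It rests on three assumptions: fields are separated by semicolons and appear in the order of the candidate-line template; no field value contains a semicolon; and \texttt{+} binds more tightly than \texttt{/}. Under the last assumption, a root expression is read as a sequence of \texttt{/}-separated stages, each a \texttt{+}-joined group of roots. The names \texttt{parse\_opt\_line}, \texttt{parse\_roots}, and \texttt{check\_format} are illustrative.

\begin{verbatim}
# Hypothetical pre-filter for criterion (1). The ";" separator, the field
# order, and the reading "'+' binds tighter than '/'" are assumptions.

VALID_ROOTS = {"Lrn", "Evo", "Sym", "Prb", "Sch", "Ctl", "Swm"}
FIELDS = ("OPT", "Rep", "Obj", "Data", "Time", "Human")


def parse_opt_line(line: str) -> dict:
    """Split an OPT-Code line into its six fields, in the expected order."""
    parts = [p.strip() for p in line.strip().split(";")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    fields = {}
    for expected, part in zip(FIELDS, parts):
        key, sep, value = part.partition("=")
        if not sep or key.strip() != expected:
            raise ValueError(f"expected field {expected!r}, got {part!r}")
        fields[expected] = value.strip()
    return fields


def parse_roots(expr: str) -> list:
    """Read '/' as sequential stages and '+' as roots integrated in a stage."""
    stages = []
    for stage in expr.split("/"):
        roots = [r.strip() for r in stage.split("+")]
        for root in roots:
            if root not in VALID_ROOTS:  # also rejects empty operands
                raise ValueError(f"invalid root: {root!r}")
        stages.append(roots)
    return stages


def check_format(candidate_text: str) -> tuple:
    """Exactly one OPT= line, six named fields, valid roots and operators."""
    opt_lines = [ln for ln in candidate_text.splitlines()
                 if ln.strip().startswith("OPT=")]
    if len(opt_lines) != 1:
        raise ValueError(f"expected one OPT= line, found {len(opt_lines)}")
    fields = parse_opt_line(opt_lines[0])
    return fields, parse_roots(fields["OPT"])
\end{verbatim}

Under these assumptions, \texttt{Lrn+Sch/Sym} is read as two stages: the first joins \texttt{Lrn} and \texttt{Sch}, and the second contains \texttt{Sym}. A dangling operator such as \texttt{Lrn+} or an unknown root raises a \texttt{ValueError}. The check says nothing about whether the roots are mechanistically correct, so criteria (2) to (5) still require the evaluator.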