\section{Appendix: OPT--Code Prompt Specifications}

This appendix collects the prompt formulations used to elicit OPT--Code
classifications from large language models and to evaluate those classifications
for correctness and consistency.

\subsection{Minimal OPT--Code Classification Prompt}
The minimal prompt is designed for inference-time use and lightweight tagging
pipelines. It assumes basic familiarity with the OPT roots and emphasizes
mechanism-based classification over surface labels.

\begin{quote}\small
\input{appendix_prompt_minimal.tex}
\end{quote}

\subsection{Maximal Expert OPT--Code Classification Prompt}
The maximal prompt elaborates all root definitions, clarifies the treatment of
parallelism and pipelines, and details rules for composition. It is intended for
fine-tuning, high-stakes evaluations, or detailed audit trails.

\begin{quote}\small
\input{appendix_prompt_maximal.tex}
\end{quote}

\subsection{OPT--Code Prompt Evaluator}
The evaluator prompt is a meta-level specification: it assesses whether a given
candidate OPT--Code and rationale respect the OPT taxonomy and associated
guidelines. This enables automated or semi-automated review of classifications
generated by other models or tools.

\begin{quote}\small
\input{appendix_prompt_evaluator.tex}
\end{quote}
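
As one illustration of semi-automated review, the evaluator's reply can be
parsed mechanically, because the evaluator prompt requests a fixed layout
(Verdict, Score, Issues, Summary). The Python sketch below is not part of any
OPT tooling: the function name \texttt{parse\_report} and the shape of the
returned dictionary are assumptions made here for illustration only.

```python
import re

# Verdict labels and issue headings as requested by the evaluator prompt.
VERDICTS = {"PASS", "WEAK_PASS", "FAIL"}
ISSUE_KEYS = ["Format", "Mechanism", "Parallelism/Pipelines",
              "Composition", "Attributes"]


def parse_report(text):
    """Parse an evaluator reply into a dict (hypothetical helper).

    Raises ValueError when the verdict or score is missing or invalid.
    """
    verdict = re.search(r"^Verdict:[ \t]*(\S+)[ \t]*$", text, re.MULTILINE)
    if verdict is None or verdict.group(1) not in VERDICTS:
        raise ValueError("missing or unknown verdict")

    score = re.search(r"^Score:[ \t]*(\d+)[ \t]*$", text, re.MULTILINE)
    if score is None or not 0 <= int(score.group(1)) <= 100:
        raise ValueError("missing or out-of-range score")

    # One short comment per issue heading; None when a heading is absent.
    issues = {}
    for key in ISSUE_KEYS:
        pattern = r"^-[ \t]*" + re.escape(key) + r":[ \t]*(.*)$"
        match = re.search(pattern, text, re.MULTILINE)
        issues[key] = match.group(1).strip() if match else None

    # The summary runs from its label to the end of the reply.
    summary = re.search(r"^Summary:[ \t]*(.*)", text,
                        re.MULTILINE | re.DOTALL)
    return {
        "verdict": verdict.group(1),
        "score": int(score.group(1)),
        "issues": issues,
        "summary": summary.group(1).strip() if summary else None,
    }
```

A pipeline built on such a helper could, for example, route every reply whose
verdict is not PASS to a human reviewer. That routing policy is likewise an
assumption and is not prescribed by the prompt.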
\subsection{OPT--Code Prompt Evaluator}

\begin{verbatim}
You are an OPT-Code evaluation assistant. Your job is to check whether a
candidate OPT classification follows the OPT rules and is mechanistically
correct.

Inputs you will be given:

1) System description: a code snippet or project/system description.
2) Candidate OPT-Code line (from another model), of the form:
   OPT=<roots>; Rep=<...>; Obj=<...>; Data=<...>; Time=<...>; Human=<...>
3) Candidate rationale: 2–6 sentences explaining the candidate’s choice.

You must evaluate the candidate against the following criteria:

(1) Format compliance:
- Does the candidate produce exactly one OPT= line with the correct fields?
- Are the roots valid (Lrn, Evo, Sym, Prb, Sch, Ctl, Swm)?
- Are "+" and "/" used only between valid roots?

(2) Mechanism correctness:
- Do the chosen roots match the operative mechanism in the system description?
- Is there any root that is missing but clearly present?
- Is any root included that is not supported by the description?

(3) Parallelism and pipelines:
- Does the candidate incorrectly treat threads, GPU kernels, async, pipelines,
  or distributed infrastructure as OPT mechanisms (e.g., calling something
  Swm or Sch only because it is parallel)?
- If so, this is a serious error.

(4) Composition correctness:
- Use "+" only for tightly integrated mechanisms in the same core loop.
- Use "/" only for distinct sequential stages.
- Flag misuse of "+" or "/" if mechanisms are obviously separate or obviously
  integrated.

(5) Attribute plausibility:
- Are Rep, Obj, Data, Time, and Human reasonably consistent with the system
  description?
- They do not need to be unique, but they must be defensible.

Your output must use the following structure:

Verdict: <PASS | WEAK_PASS | FAIL>
Score: <integer from 0 to 100>

Issues:
- Format: <short comment>
- Mechanism: <short comment>
- Parallelism/Pipelines: <short comment>
- Composition: <short comment>
- Attributes: <short comment>

Summary: <2–4 sentences giving an overall assessment and key corrections, if any>.

Guidelines:
- PASS means: no major errors; at most minor debatable choices.
- WEAK_PASS means: generally acceptable, but with at least one non-trivial issue
  that should be corrected before publication.
- FAIL means: at least one serious misunderstanding of the mechanism, or clear
  violation of the parallelism/pipeline rules, or badly wrong roots.
\end{verbatim}
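
Of the five criteria, only criterion (1), format compliance, can be checked
without judgment about the underlying system, so it could be screened before a
model is consulted. The Python sketch below illustrates such a pre-check under
stated assumptions: the helper name \texttt{check\_format} is hypothetical, the
six fields are assumed to appear in the order shown in the candidate-line
template, and attribute values are treated as free text.

```python
import re

# Root names as listed under criterion (1) of the evaluator prompt.
VALID_ROOTS = {"Lrn", "Evo", "Sym", "Prb", "Sch", "Ctl", "Swm"}
# Assumption: fields must appear in the order used by the prompt's template.
FIELDS = ["OPT", "Rep", "Obj", "Data", "Time", "Human"]


def check_format(candidate):
    """Return a list of format problems; an empty list means none were found."""
    opt_lines = [ln.strip() for ln in candidate.splitlines()
                 if ln.strip().startswith("OPT=")]
    if len(opt_lines) != 1:
        return ["expected exactly one OPT= line, found %d" % len(opt_lines)]

    # Split "key=value" pairs; partition() leaves sep empty when "=" is absent.
    pairs = [part.strip().partition("=") for part in opt_lines[0].split(";")]
    keys = [key.strip() for key, _, _ in pairs]
    if keys != FIELDS or any(sep != "=" for _, sep, _ in pairs):
        return ["fields must be %s, got %s" % (FIELDS, keys)]

    problems = []
    values = {key.strip(): value.strip() for key, _, value in pairs}
    for name in FIELDS:
        if not values[name]:
            problems.append("empty value for %s" % name)

    # "+" and "/" may only sit between two valid roots.
    for token in re.split(r"[+/]", values["OPT"]):
        token = token.strip()
        if not token:
            problems.append('"+" or "/" is not placed between two roots')
        elif token not in VALID_ROOTS:
            problems.append("invalid root %r" % token)
    return problems
```

An empty result says only that the line is well-formed. Criteria (2) through
(5) concern the mechanism itself and still require the evaluator prompt or a
human reviewer.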