\section{Appendix: OPT--Code Prompt Specifications}
This appendix collects the prompt formulations used to elicit OPT--Code
classifications from large language models and to evaluate those classifications
for correctness and consistency.
\subsection{Minimal OPT--Code Classification Prompt}
The minimal prompt is designed for inference-time use and lightweight tagging
pipelines. It assumes a basic familiarity with the OPT roots and emphasizes
mechanism-based classification over surface labels.
\begin{quote}\small
\input{appendix_prompt_minimal.tex}
\end{quote}
\subsection{Maximal Expert OPT--Code Classification Prompt}
The maximal prompt elaborates all root definitions, clarifies the treatment of
parallelism and pipelines, and details rules for composition. It is intended for
fine-tuning, high-stakes evaluations, or detailed audit trails.
\begin{quote}\small
\input{appendix_prompt_maximal.tex}
\end{quote}
\subsection{OPT--Code Prompt Evaluator}
The evaluator prompt is a meta-level specification: it assesses whether a given
candidate OPT--Code and rationale respect the OPT taxonomy and associated
guidelines. This enables automated or semi-automated review of classifications
generated by other models or tools.
\begin{quote}\small
\input{appendix_prompt_evaluator.tex}
\end{quote}
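
The format-compliance part of such a review can be checked mechanically before
any model is consulted. The following Python fragment is a minimal sketch of
that idea, not part of the OPT specification: the function name
\texttt{check\_opt\_line} is hypothetical, the attribute values in the usage
example are placeholders, and the sketch assumes that fields are separated by
semicolons, appear in the order given in the evaluator prompt, and carry no
trailing semicolon.
\begin{verbatim}
import re

# Root names and field order as listed in the evaluator prompt.
VALID_ROOTS = {"Lrn", "Evo", "Sym", "Prb", "Sch", "Ctl", "Swm"}
FIELDS = ["OPT", "Rep", "Obj", "Data", "Time", "Human"]

def check_opt_line(line):
    """Return a list of format problems; an empty list means well formed."""
    pairs = [part.split("=", 1) for part in line.strip().split(";")]
    if any(len(pair) != 2 for pair in pairs):
        return ["every field must have the form Name=value"]
    keys = [key.strip() for key, _ in pairs]
    if keys != FIELDS:
        return [f"expected fields {FIELDS}, got {keys}"]
    values = {key.strip(): value.strip() for key, value in pairs}
    problems = []
    # "+" and "/" may only appear between valid roots, so every piece
    # produced by splitting on them must itself be a valid root.
    for root in re.split(r"[+/]", values["OPT"]):
        if root.strip() not in VALID_ROOTS:
            problems.append(f"invalid or empty root: {root.strip()!r}")
    # Attribute vocabularies are not checked here (assumption: any
    # non-empty value is accepted at the format level).
    for name in FIELDS[1:]:
        if not values[name]:
            problems.append(f"empty attribute: {name}")
    return problems

# Usage (attribute values are illustrative placeholders):
good = "OPT=Lrn+Sch; Rep=param; Obj=loss; Data=labeled; Time=offline; Human=low"
bad = "OPT=Lrn+Par; Rep=param; Obj=loss; Data=labeled; Time=offline; Human=low"
print(check_opt_line(good))
print(check_opt_line(bad))
\end{verbatim}
A check of this kind covers only criterion (1) of the evaluator; mechanism,
composition, and attribute plausibility still require reading the system
description.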
\subsection{OPT--Code Prompt Evaluator: Full Text}
\begin{verbatim}
You are an OPT-Code evaluation assistant. Your job is to check whether a
candidate OPT classification follows the OPT rules and is mechanistically
correct.
Inputs you will be given:
1) System description: a code snippet or project/system description.
2) Candidate OPT-Code line (from another model), of the form:
OPT=<roots>; Rep=<...>; Obj=<...>; Data=<...>; Time=<...>; Human=<...>
3) Candidate rationale: 2-6 sentences explaining the candidate's choice.
You must evaluate the candidate against the following criteria:
(1) Format compliance:
- Does the candidate produce exactly one OPT= line with the correct fields?
- Are the roots valid (Lrn, Evo, Sym, Prb, Sch, Ctl, Swm)?
- Are "+" and "/" used only between valid roots?
(2) Mechanism correctness:
- Do the chosen roots match the operative mechanism in the system description?
- Is there any root that is missing but clearly present?
- Is any root included that is not supported by the description?
(3) Parallelism and pipelines:
- Does the candidate incorrectly treat threads, GPU kernels, async, pipelines,
or distributed infrastructure as OPT mechanisms (e.g., calling something
Swm or Sch only because it is parallel)?
- If so, this is a serious error.
(4) Composition correctness:
- Use "+" only for tightly integrated mechanisms in the same core loop.
- Use "/" only for distinct sequential stages.
- Flag misuse of "+" or "/" if mechanisms are obviously separate or obviously
integrated.
(5) Attribute plausibility:
- Are Rep, Obj, Data, Time, and Human reasonably consistent with the system
description?
- They do not need to be unique, but they must be defensible.
Your output must use the following structure:
Verdict: <PASS | WEAK_PASS | FAIL>
Score: <integer from 0 to 100>
Issues:
- Format: <short comment>
- Mechanism: <short comment>
- Parallelism/Pipelines: <short comment>
- Composition: <short comment>
- Attributes: <short comment>
Summary: <2-4 sentences giving an overall assessment and key corrections, if any>.
Guidelines:
- PASS means: no major errors; at most minor debatable choices.
- WEAK_PASS means: generally acceptable, but with at least one non-trivial issue
that should be corrected before publication.
- FAIL means: at least one serious misunderstanding of the mechanism, or clear
violation of the parallelism/pipeline rules, or badly wrong roots.
\end{verbatim}
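
Because the evaluator's reply follows a fixed structure, it can be parsed and
sanity-checked by a short script before the verdict is recorded. The Python
fragment below is one possible sketch under stated assumptions: the function
name \texttt{parse\_evaluation} is hypothetical, each labelled item is assumed
to start on its own line, and the \texttt{Summary} is assumed to be the last
item in the reply.
\begin{verbatim}
import re

# Verdicts and issue labels as given in the evaluator prompt.
VERDICTS = {"PASS", "WEAK_PASS", "FAIL"}
ISSUE_KEYS = ["Format", "Mechanism", "Parallelism/Pipelines",
              "Composition", "Attributes"]

def parse_evaluation(text):
    """Parse an evaluator reply into a dict; raise ValueError if malformed."""
    verdict = re.search(r"^Verdict:\s*(\S+)\s*$", text, re.M)
    score = re.search(r"^Score:\s*(\d+)\s*$", text, re.M)
    # Assumption: the summary runs from its label to the end of the reply.
    summary = re.search(r"^Summary:\s*(.+)", text, re.M | re.S)
    if not (verdict and score and summary):
        raise ValueError("missing Verdict, Score, or Summary")
    if verdict.group(1) not in VERDICTS:
        raise ValueError("unknown verdict: " + verdict.group(1))
    value = int(score.group(1))
    if not 0 <= value <= 100:
        raise ValueError("score out of range: " + str(value))
    issues = {}
    for key in ISSUE_KEYS:
        match = re.search(r"^-\s*" + re.escape(key) + r":\s*(.*)$",
                          text, re.M)
        if match is None:
            raise ValueError("missing issue line: " + key)
        issues[key] = match.group(1).strip()
    return {"verdict": verdict.group(1), "score": value,
            "issues": issues, "summary": summary.group(1).strip()}

# Usage with an illustrative reply (the content is made up):
reply = """Verdict: WEAK_PASS
Score: 72
Issues:
- Format: ok
- Mechanism: Sch is not supported by the description
- Parallelism/Pipelines: ok
- Composition: ok
- Attributes: ok
Summary: Mostly sound. Drop Sch from the roots."""
result = parse_evaluation(reply)
print(result["verdict"], result["score"])
\end{verbatim}
Rejecting malformed replies up front keeps unparseable evaluations from being
counted as either passes or failures.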