# OPT: Operational Premise Taxonomy for AI Systems
This repository collects:
- The LaTeX manuscript defining the Operational Premise Taxonomy (OPT) and the OPT‐Code convention.
- Prompt sets for classifying AI systems into OPT mechanisms using large language models (LLMs).
- A small Python library and scripts to run an end‐to‐end Classifier → Evaluator → Adjudicator pipeline.
- A hand‐annotated gold test suite of systems (backprop, GA, A*, rule‐based expert systems, PSO, AIS, etc.).
- Example JSONL/YAML audit logs for storing OPT classifications.
The core idea: classify AI implementations by their operative mechanism (learning, evolution, symbolic reasoning, probabilistic inference, search, control, swarm, or hybrids), while explicitly separating that from execution details (parallelism, pipelines, hardware).
## 1. Repository layout
```
Operational-Premise-Taxonomy/
├── README.md                        # This file
├── LICENSE
├── .gitignore
├── Makefile                         # Top-level convenience targets
│
├── paper/                           # LaTeX sources for the OPT paper
│   ├── main.tex                     # arXiv/general article format
│   ├── main_ieee.tex                # IEEE two-column wrapper
│   ├── main_acm.tex                 # ACM-style wrapper
│   ├── main_kaobook.tex             # Book-style wrapper
│   ├── body_shared.tex              # Shared main content
│   ├── related-work.tex             # Related work section
│   ├── appendix_opt_prompts.tex
│   ├── appendix_prompt_minimal.tex
│   ├── appendix_prompt_maximal.tex
│   ├── appendix_prompt_evaluator.tex
│   ├── figures/                     # TikZ/PGFPlots figures
│   │   ├── opt_radar_1.tikz
│   │   ├── opt_radar_2.tikz
│   │   └── opt_eval_pipeline.tikz
│   ├── Makefile                     # Build main.pdf, IEEE/ACM variants
│   └── bib/
│       └── references.bib
│
├── prompts/                         # Plain-text LLM prompts
│   ├── minimal_classifier_prompt.txt
│   ├── maximal_classifier_prompt.txt
│   ├── evaluator_prompt.txt
│   └── adjudicator_prompt.txt
│
├── opt_eval/                        # Python library for OPT classification/evaluation
│   ├── __init__.py
│   ├── opt_prompts.py               # Utility to load prompt text
│   ├── opt_pipeline.py              # Data classes + run_pipeline + parsers
│   ├── model_client.py              # Abstraction over your local/remote LLM endpoint
│   ├── cli.py                       # CLI entrypoint for simple use
│   └── tests/
│       ├── __init__.py
│       ├── test_parsers.py
│       ├── test_gold_suite.py
│       └── data/
│           ├── gold_opt.yaml
│           └── gold_opt.jsonl
│
├── data/
│   ├── gold/
│   │   ├── opt_gold.yaml            # Canonical gold test suite
│   │   └── opt_gold.jsonl
│   └── examples/
│       ├── opt_audit_example.jsonl
│       └── opt_audit_example.yaml
│
├── scripts/
│   ├── run_eval_pipeline.py
│   └── export_gold_to_jsonl.py
│
└── docs/
    ├── usage.md
    ├── schema_opt_audit.md
    └── model_notes_local_llm.md
```
## 2. Building the paper
The paper lives in `paper/` and is structured to support multiple venues (arXiv, IEEE, ACM, book‐style).
### Prerequisites

- A reasonably recent TeX Live (or MiKTeX) installation with `pgfplots` (including the `polar` library), `newtxtext`, `newtxmath`, `booktabs`, `longtable`, `framed`, `fancyvrb`, etc.
- `latexmk` and `make`.
### Typical build

From the repository root:

```bash
cd paper
make                 # builds main.pdf by default

# Or explicitly:
make main.pdf

# For an IEEE variant:
make main_ieee.pdf

# For ACM:
make main_acm.pdf
```
If you run into font or pgfplots compatibility warnings, consult the comments at the top of `main.tex` and `body_shared.tex` (we assume `\pgfplotsset{compat=1.18}` and `\usepackage{newtxtext,newtxmath}`).
## 3. Python OPT evaluation pipeline
The `opt_eval` package provides:

- Data classes for candidate classifications, evaluator results, and adjudications.
- Parsers for extracting OPT lines and rationales from LLM output.
- A `run_pipeline` function that wires together Classifier A, Classifier B, the Evaluator, and the Adjudicator, and returns a structured result suitable for JSONL/YAML logging (see the sketch below).
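The flow can be pictured with a short caller-side sketch. This is only an illustration: `call_model` follows the signature described in Section 3.2, but the `load_prompt` helper name, the prompt filenames passed to it, and the result dictionary keys are assumptions, not the actual `opt_eval` API (see `opt_eval/opt_pipeline.py` for the real one).

```python
# Conceptual sketch of the Classifier -> Evaluator -> Adjudicator wiring.
# Function and field names are illustrative; consult opt_eval/ for the real API.
from opt_eval.model_client import call_model
from opt_eval.opt_prompts import load_prompt  # hypothetical helper name


def sketch_pipeline(description: str) -> dict:
    classifier_prompt = load_prompt("maximal_classifier_prompt.txt")
    evaluator_prompt = load_prompt("evaluator_prompt.txt")
    adjudicator_prompt = load_prompt("adjudicator_prompt.txt")

    # Two independent candidate classifications (possibly from different models).
    candidate_a = call_model(classifier_prompt, description, model="classifier-a")
    candidate_b = call_model(classifier_prompt, description, model="classifier-b")

    # The evaluator critiques each candidate against the OPT rubric.
    eval_a = call_model(evaluator_prompt, f"{description}\n\n{candidate_a}", model="evaluator")
    eval_b = call_model(evaluator_prompt, f"{description}\n\n{candidate_b}", model="evaluator")

    # The adjudicator reconciles the candidates and critiques into a final OPT-Code.
    final = call_model(
        adjudicator_prompt,
        f"{description}\n\nA: {candidate_a}\n{eval_a}\n\nB: {candidate_b}\n{eval_b}",
        model="adjudicator",
    )
    return {
        "candidate_a": candidate_a,
        "candidate_b": candidate_b,
        "eval_a": eval_a,
        "eval_b": eval_b,
        "final": final,
    }
```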
### 3.1 Installation (local dev)
Option 1: editable install with pip:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
# or, if you don't define setup.cfg/pyproject:
pip install pyyaml
```
Option 2: just use it in place with `PYTHONPATH`:

```bash
export PYTHONPATH=$PWD
```
### 3.2 Configuring a local LLM
You must implement `opt_eval/model_client.py` to talk to your model(s). A typical pattern:

- For an OpenAI-compatible HTTP endpoint (local or remote), use the `requests` or `openai` client.
- For Ollama or llamafile, call `http://localhost:11434` or similar.
`model_client.call_model(system_prompt, user_content, model="local-llm")` should:

- Send `system_prompt` as the system role (if your API supports it).
- Send `user_content` as the user content.
- Return the raw text content of the model's reply.
Once implemented, you can run the pipeline on a simple description.
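A minimal sketch using `requests` against an OpenAI-compatible chat endpoint (Ollama exposes one at `http://localhost:11434/v1`) might look like the following. The `OPT_LLM_BASE_URL` environment variable, the default model name, and the temperature/timeout choices are assumptions to adapt, not part of the repository's API.

```python
# opt_eval/model_client.py -- minimal sketch, not the shipped implementation.
# Assumes an OpenAI-compatible /chat/completions endpoint (e.g. Ollama, vLLM).
import os

import requests

BASE_URL = os.environ.get("OPT_LLM_BASE_URL", "http://localhost:11434/v1")


def call_model(system_prompt: str, user_content: str, model: str = "local-llm") -> str:
    """Send a system + user message pair and return the raw reply text."""
    response = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content},
            ],
            "temperature": 0.0,  # near-deterministic output makes parsing easier
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```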
## 4. Quickstart: Running the evaluation pipeline
Minimal example (from the repo root, after configuring `model_client.py`):

```bash
python scripts/run_eval_pipeline.py << 'EOF'
This system trains a fully-connected neural network on MNIST using SGD and
cross-entropy loss, and then uses the trained weights for inference only.
EOF
```
A typical JSON-like output will include:

- `candidate_a`, `candidate_b`
- `eval_a`, `eval_b`
- `final` (the final OPT-Code and rationale)
- `adjudication` (if performed)

You can adapt `run_eval_pipeline.py` to write JSONL to `data/examples/opt_audit_example.jsonl`.
## 5. Gold test suite and benchmarking
The directory `data/gold/` contains a small hand‐annotated test suite (`opt_gold.yaml` and `opt_gold.jsonl`) covering:
- Backprop MLP on MNIST (Lrn),
- GA for TSP (Evo),
- A* gridworld planner (Sch),
- Rule-based expert system like XCON (Sym),
- Bayesian network for fault diagnosis (Prb),
- Deep Q-Network for Atari (Lrn),
- PID + Kalman filter drone control (Ctl),
- PSO for hyperparameter tuning (Swm),
- Immune negative-selection anomaly detection (Evo/Sch+Prb),
- Three-stage hybrid: GA → rule pruning → Bayesian classifier (Evo/Sym/Prb).
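For orientation, a single gold entry might take a shape like the one below; the field names (`id`, `description`, `opt_code`) are a hypothetical illustration, so check `data/gold/opt_gold.yaml` for the actual schema the tests consume.

```yaml
# Hypothetical shape of one gold entry; the real schema lives in data/gold/opt_gold.yaml.
- id: backprop_mlp_mnist
  description: >
    Trains a fully-connected neural network on MNIST with SGD and
    cross-entropy loss; the trained weights are then used for inference only.
  opt_code: Lrn
```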
To run the tests (after you have wired up `model_client.py`):

```bash
pytest opt_eval/tests
```
`test_gold_suite.py` will:

- Call the classifier prompt(s) on each gold description.
- Compare predicted OPT roots against the gold OPT‐Code.
- Optionally compute partial-match metrics (Jaccard similarity of root sets) and simple accuracy.
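The Jaccard comparison takes only a few lines. The sketch below is standalone and assumes the convention (seen in the gold labels above) that hybrid codes join roots with `/`; the function name and exact parsing are not necessarily what `test_gold_suite.py` implements.

```python
def root_jaccard(predicted: str, gold: str) -> float:
    """Jaccard similarity between the OPT root sets of two OPT-Codes.

    Assumes hybrid codes join roots with '/', e.g. "Evo/Sym/Prb".
    """
    pred_roots = {r.strip() for r in predicted.split("/") if r.strip()}
    gold_roots = {r.strip() for r in gold.split("/") if r.strip()}
    if not pred_roots and not gold_roots:
        return 1.0
    return len(pred_roots & gold_roots) / len(pred_roots | gold_roots)


# Example: predicting "Evo/Prb" against gold "Evo/Sym/Prb" scores 2/3.
assert abs(root_jaccard("Evo/Prb", "Evo/Sym/Prb") - 2 / 3) < 1e-9
```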
## 6. JSONL/YAML audit logs
For large-scale use, we recommend JSONL or YAML for storing evaluations.
- Example JSONL audit: `data/examples/opt_audit_example.jsonl`
- Example YAML audit: `data/examples/opt_audit_example.yaml`
Each record includes:

- `id`, `description`
- `candidates` (A, B)
- `evaluations` (verdicts, scores)
- `adjudication`
- `final` (final OPT-Code)
- `meta` (timestamps, model IDs, etc.)
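As a rough illustration (the nesting and value formats here are placeholders, not the canonical schema), one JSONL line could look like:

```json
{"id": "backprop_mlp_mnist", "description": "Trains an MLP on MNIST with SGD and cross-entropy loss.", "candidates": {"a": "Lrn", "b": "Lrn"}, "evaluations": {"a": {"verdict": "pass", "score": 1.0}, "b": {"verdict": "pass", "score": 1.0}}, "adjudication": null, "final": "Lrn", "meta": {"timestamp": "2025-01-01T00:00:00Z", "models": {"classifier_a": "llama3-8b-instruct", "classifier_b": "mistral-7b-instruct"}}}
```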
See `docs/schema_opt_audit.md` for field descriptions.
## 7. Using smaller local LLMs
OPT classification needs:
- Understanding of code/algorithm descriptions.
- Solid instruction-following.
- Ability to respect a fairly structured output format.
Models that are feasible to run locally and are good candidates:

- **LLaMA 3 8B Instruct**: good general reasoning and code understanding; works well as Classifier, Evaluator, and Adjudicator if VRAM allows.
- **Mistral 7B Instruct** (and compatible fine-tunes like Dolphin, OpenHermes): a strong general-purpose local model with solid coding and instruction-following; good as a classifier.
- **Qwen2 7B / 14B Instruct**: 7B is a capable all-rounder; 14B (if you can run it) is strong for the evaluator/adjudicator roles.
- **Phi-3-mini (3.8B) Instruct**: smaller footprint; may work as a classifier on simpler cases. For nuanced hybrid systems (Evo/Sym/Prb, Swm vs Evo, Ctl vs Prb), you may want a larger model as evaluator/adjudicator.
A reasonable starting configuration:

- Classifier A: `llama3-8b-instruct`
- Classifier B: `mistral-7b-instruct`
- Evaluator: `qwen2-14b-instruct` (if available) or `llama3-8b-instruct`
- Adjudicator: same as the Evaluator
You can also run all roles on the same 7–8B model if resources are constrained; the explicit prompts and the evaluator rubric are designed to catch many misclassifications.
See `docs/model_notes_local_llm.md` for more detailed notes on deployment options (Ollama, llamafile, vLLM, etc.) and recommended quantization levels.
## 8. Citing
Once the OPT paper is on arXiv or accepted somewhere, include a BibTeX entry like:
```bibtex
@article{Elsberry_OPT_2025,
  author        = {Wesley R. Elsberry and N.~Collaborators},
  title         = {Operational Premise Taxonomy (OPT): Mechanism-Level Classification of AI Systems},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {XXXX.YYYYY},
  archivePrefix = {arXiv}
}
```
(Replace with the actual venue and identifier when available.)
## 9. Contributing
- Extend the gold test suite (YAML + JSONL) with more systems and hybrids.
- Add additional prompts (e.g., language-specific variants for Python-only code, RL-specific prompts).
- Improve the parsing logic or add better metrics (confusion matrices, root-wise F1).
- Open issues for any misclassifications that recur: they can inform future revisions of prompts and possibly the taxonomy itself.
Pull requests that add well-documented examples, tests, or tooling around OPT are welcome.