# OPT: Operational Premise Taxonomy for AI Systems

This repository collects:

- The LaTeX manuscript defining the **Operational Premise Taxonomy (OPT)** and the OPT-Code convention.
- Prompt sets for classifying AI systems into OPT mechanisms using large language models (LLMs).
- A small Python library and scripts to run an end-to-end **Classifier → Evaluator → Adjudicator** pipeline.
- A hand-annotated **gold test suite** of systems (backprop, GA, A*, rule-based expert systems, PSO, AIS, etc.).
- Example JSONL/YAML **audit logs** for storing OPT classifications.

The core idea: classify AI implementations by their **operative mechanism** (learning, evolution, symbolic reasoning, probabilistic inference, search, control, swarm, or hybrids), while explicitly separating that from **execution details** (parallelism, pipelines, hardware).

---

## 1. Repository layout

```text
Operational-Premise-Taxonomy/
├── README.md                        # This file
├── LICENSE
├── .gitignore
├── Makefile                         # Top-level convenience targets
│
├── paper/                           # LaTeX sources for the OPT paper
│   ├── main.tex                     # arXiv/general article format
│   ├── main_ieee.tex                # IEEE two-column wrapper
│   ├── main_acm.tex                 # ACM-style wrapper
│   ├── main_kaobook.tex             # Book-style wrapper
│   ├── body_shared.tex              # Shared main content
│   ├── related-work.tex             # Related work section
│   ├── appendix_opt_prompts.tex
│   ├── appendix_prompt_minimal.tex
│   ├── appendix_prompt_maximal.tex
│   ├── appendix_prompt_evaluator.tex
│   ├── figures/                     # TikZ/PGFPlots figures
│   │   ├── opt_radar_1.tikz
│   │   ├── opt_radar_2.tikz
│   │   └── opt_eval_pipeline.tikz
│   ├── Makefile                     # Build main.pdf, IEEE/ACM variants
│   └── bib/
│       └── references.bib
│
├── prompts/                         # Plain-text LLM prompts
│   ├── minimal_classifier_prompt.txt
│   ├── maximal_classifier_prompt.txt
│   ├── evaluator_prompt.txt
│   └── adjudicator_prompt.txt
│
├── opt_eval/                        # Python library for OPT classification/evaluation
│   ├── __init__.py
│   ├── opt_prompts.py               # Utility to load prompt text
│   ├── opt_pipeline.py              # Data classes + run_pipeline + parsers
│   ├── model_client.py              # Abstraction over your local/remote LLM endpoint
│   ├── cli.py                       # CLI entrypoint for simple use
│   └── tests/
│       ├── __init__.py
│       ├── test_parsers.py
│       ├── test_gold_suite.py
│       └── data/
│           ├── gold_opt.yaml
│           └── gold_opt.jsonl
│
├── data/
│   ├── gold/
│   │   ├── opt_gold.yaml            # Canonical gold test suite
│   │   └── opt_gold.jsonl
│   └── examples/
│       ├── opt_audit_example.jsonl
│       └── opt_audit_example.yaml
│
├── scripts/
│   ├── run_eval_pipeline.py
│   └── export_gold_to_jsonl.py
│
└── docs/
    ├── usage.md
    ├── schema_opt_audit.md
    └── model_notes_local_llm.md
```

---

## 2. Building the paper

The paper lives in `paper/` and is structured to support multiple venues (arXiv, IEEE, ACM, book-style).

### Prerequisites

* A reasonably recent TeX Live (or MiKTeX) with:
  * `pgfplots` (with the `polar` library),
  * `newtxtext`, `newtxmath`,
  * `booktabs`, `longtable`, `framed`, `fancyvrb`, etc.
* `latexmk` and `make`.

### Typical build

From the repository root:

```bash
cd paper
make                  # builds main.pdf by default

# Or explicitly:
make main.pdf

# For an IEEE variant:
make main_ieee.pdf

# For ACM:
make main_acm.pdf
```

If you run into font or pgfplots `compat` warnings, consult the comments at the top of `main.tex` and `body_shared.tex` (we assume `\pgfplotsset{compat=1.18}` and `\usepackage{newtxtext,newtxmath}`).

---

## 3. Python OPT evaluation pipeline

The `opt_eval` package provides:

* Data classes for candidate classifications, evaluator results, and adjudications.
* Parsers for extracting OPT lines and rationales from LLM output.
* A `run_pipeline` function that wires together Classifier A and B, the Evaluator, and the Adjudicator, and returns a structured result suitable for JSONL/YAML logging.

### 3.1 Installation (local dev)

Option 1: editable install with `pip`:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
# or, if you don't define a setup.cfg/pyproject.toml:
pip install pyyaml
```

Option 2: just use it in place with `PYTHONPATH`:

```bash
export PYTHONPATH=$PWD
```

### 3.2 Configuring a local LLM

You must implement `opt_eval/model_client.py` to talk to your model(s). Typical patterns:

* For an OpenAI-compatible HTTP endpoint (local or remote), use `requests` or the `openai` client.
* For **Ollama** or **llamafile**, call `http://localhost:11434` or similar.

`model_client.call_model(system_prompt, user_content, model="local-llm")` should:

1. Send `system_prompt` as the system role (if your API supports it).
2. Send `user_content` as the user content.
3. Return the raw text content of the model's reply.

Once implemented, you can run the pipeline on a simple description.
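
For concreteness, here is a minimal sketch of `call_model` against an Ollama-style `/api/chat` endpoint. The URL, payload shape, and timeout are assumptions about a local setup rather than anything shipped in this repository; adapt them to whichever server or client library you actually use:

```python
# Minimal sketch only: assumes an Ollama-style /api/chat endpoint on the
# default port. Swap in the `openai` client (or any other HTTP API) as needed.
import requests

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed local endpoint

def call_model(system_prompt: str, user_content: str, model: str = "local-llm") -> str:
    """Send system + user messages to a local chat endpoint; return the raw reply text."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        "stream": False,  # request the complete reply in a single response
    }
    resp = requests.post(OLLAMA_CHAT_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["message"]["content"]
```

An OpenAI-compatible server works the same way; only the route and the response field you read the reply text from change.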

---

## 4. Quickstart: Running the evaluation pipeline

Minimal example (from repo root, after configuring `model_client.py`):

```bash
python scripts/run_eval_pipeline.py << 'EOF'
This system trains a fully-connected neural network on MNIST using SGD and
cross-entropy loss, and then uses the trained weights for inference only.
EOF
```

A typical JSON-like output will include:

* `candidate_a`, `candidate_b`
* `eval_a`, `eval_b`
* `final` (final OPT-Code and rationale)
* `adjudication` (if performed)

You can adapt `run_eval_pipeline.py` to write JSONL to `data/examples/opt_audit_example.jsonl`.

---

## 5. Gold test suite and benchmarking

The directory `data/gold/` contains a small hand-annotated test suite (`opt_gold.yaml` and `opt_gold.jsonl`) covering:

* Backprop MLP on MNIST (Lrn),
* GA for TSP (Evo),
* A* gridworld planner (Sch),
* Rule-based expert system like XCON (Sym),
* Bayesian network for fault diagnosis (Prb),
* Deep Q-Network for Atari (Lrn),
* PID + Kalman filter drone control (Ctl),
* PSO for hyperparameter tuning (Swm),
* Immune negative-selection anomaly detection (Evo/Sch+Prb),
* Three-stage hybrid: GA → rule pruning → Bayesian classifier (Evo/Sym/Prb).

To run tests (after you've wired up `model_client.py`):

```bash
pytest opt_eval/tests
```

`test_gold_suite.py` will:

* Call the classifier prompt(s) on each gold description.
* Compare predicted OPT roots against the gold OPT-Code.
* Optionally compute partial-match metrics (Jaccard similarity of root sets) and simple accuracy.

---

## 6. JSONL/YAML audit logs

For large-scale use, we recommend JSONL or YAML for storing evaluations.

* Example JSONL audit: `data/examples/opt_audit_example.jsonl`
* Example YAML audit: `data/examples/opt_audit_example.yaml`

Each record includes:

* `id`, `description`
* `candidates` (A, B)
* `evaluations` (verdicts, scores)
* `adjudication`
* `final` (final OPT-Code)
* `meta` (timestamps, model IDs, etc.)

See `docs/schema_opt_audit.md` for field descriptions.
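
To make the record shape concrete, the sketch below appends one illustrative record to the example JSONL log. The nested layout of `candidates`/`evaluations` and the specific values here are assumptions for illustration only; `docs/schema_opt_audit.md` remains the authoritative reference:

```python
# Illustrative only: append one audit record with the fields listed above.
# The nesting shown for candidates/evaluations is an assumption, not the
# canonical schema (see docs/schema_opt_audit.md).
import json
from datetime import datetime, timezone

record = {
    "id": "example-001",
    "description": "Backprop MLP on MNIST; trained with SGD, inference-only at deployment.",
    "candidates": {"A": "Lrn", "B": "Lrn"},
    "evaluations": {"A": {"verdict": "accept", "score": 1.0},
                    "B": {"verdict": "accept", "score": 1.0}},
    "adjudication": None,  # only populated when the candidates disagree
    "final": "Lrn",
    "meta": {"timestamp": datetime.now(timezone.utc).isoformat(),
             "models": {"classifier_a": "llama3-8b-instruct",
                        "classifier_b": "mistral-7b-instruct"}},
}

with open("data/examples/opt_audit_example.jsonl", "a", encoding="utf-8") as fh:
    fh.write(json.dumps(record) + "\n")
```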

---

## 7. Using smaller local LLMs

OPT classification needs:

* Understanding of code/algorithm descriptions.
* Solid instruction-following.
* The ability to respect a fairly structured output format.

Models that can feasibly be run locally and are good candidates:

* **LLaMA 3 8B Instruct**
  Good general reasoning and code understanding; works well as Classifier, Evaluator, and Adjudicator if VRAM allows.
* **Mistral 7B Instruct** (and compatible fine-tunes like Dolphin, OpenHermes)
  Strong general-purpose local model with solid coding and instruction-following; good as a classifier.
* **Qwen2 7B / 14B Instruct**
  7B is a capable all-rounder; 14B (if you can run it) is strong for the evaluator/adjudicator roles.
* **Phi-3-mini (3.8B) Instruct**
  Smaller footprint; may work as a classifier on simpler cases.

For nuanced hybrid systems (Evo/Sym/Prb, Swm vs Evo, Ctl vs Prb), you may want a larger model as evaluator/adjudicator. A reasonable starting configuration:

* Classifier A: `llama3-8b-instruct`
* Classifier B: `mistral-7b-instruct`
* Evaluator: `qwen2-14b-instruct` (if available) or `llama3-8b-instruct`
* Adjudicator: same as the Evaluator

You can also run all roles on the same 7–8B model if resources are constrained; the explicit prompts and the evaluator rubric are designed to catch many misclassifications.

See `docs/model_notes_local_llm.md` for more detailed notes on deployment options (Ollama, llamafile, vLLM, etc.) and recommended quantization levels.

---

## 8. Citing

Once the OPT paper is on arXiv or accepted somewhere, include a BibTeX entry like:

```bibtex
@article{Elsberry_OPT_2025,
  author        = {Wesley R. Elsberry and N.~Collaborators},
  title         = {Operational Premise Taxonomy (OPT): Mechanism-Level Classification of AI Systems},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {XXXX.YYYYY},
  archivePrefix = {arXiv}
}
```

(Replace with the actual venue and identifier when available.)

---

## 9. Contributing

* Extend the gold test suite (YAML + JSONL) with more systems and hybrids.
* Add additional prompts (e.g., language-specific variants for Python-only code, RL-specific prompts).
* Improve the parsing logic or add better metrics (confusion matrices, root-wise F1); a starting sketch appears at the end of this section.
* Open issues for any misclassifications that recur: they can inform future revisions of the prompts and possibly of the taxonomy itself.

Pull requests that add well-documented examples, tests, or tooling around OPT are welcome.
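
If you want a starting point for the metrics bullet above, the partial-match score mentioned in Section 5 (Jaccard similarity over OPT root sets) takes only a few lines. The function below is an illustrative sketch, not necessarily the implementation used in `test_gold_suite.py`:

```python
# Illustrative sketch of a partial-match metric over OPT root sets,
# e.g. predicted {"Evo", "Sym"} vs gold {"Evo", "Sym", "Prb"} -> 2/3.
def root_jaccard(predicted: set, gold: set) -> float:
    """Jaccard similarity of predicted vs gold OPT roots; 1.0 means an exact match."""
    if not predicted and not gold:
        return 1.0
    return len(predicted & gold) / len(predicted | gold)

print(root_jaccard({"Evo", "Sym"}, {"Evo", "Sym", "Prb"}))  # 0.666...
```

Root-wise precision/recall, F1, and confusion matrices can be built on the same predicted/gold root sets.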