# OPT: Operational Premise Taxonomy for AI Systems
This repository collects:
- The LaTeX manuscript defining the **Operational Premise Taxonomy (OPT)** and the OPT-Code convention.
- Prompt sets for classifying AI systems into OPT mechanisms using large language models (LLMs).
- A small Python library and scripts to run an end-to-end **Classifier → Evaluator → Adjudicator** pipeline.
- A hand-annotated **gold test suite** of systems (backprop, GA, A*, rule-based expert systems, PSO, AIS, etc.).
- Example JSONL/YAML **audit logs** for storing OPT classifications.
The core idea: classify AI implementations by their **operative mechanism** (learning, evolution, symbolic reasoning, probabilistic inference, search, control, swarm, or hybrids), while explicitly separating that from **execution details** (parallelism, pipelines, hardware).
---
## 1. Repository layout
```text
Operational-Premise-Taxonomy/
├── README.md # This file
├── LICENSE
├── .gitignore
├── Makefile # Top-level convenience targets
├── paper/ # LaTeX sources for the OPT paper
│ ├── main.tex # arXiv/general article format
│ ├── main_ieee.tex # IEEE two-column wrapper
│ ├── main_acm.tex # ACM-style wrapper
│ ├── main_kaobook.tex # Book-style wrapper
│ ├── body_shared.tex # Shared main content
│ ├── related-work.tex # Related work section
│ ├── appendix_opt_prompts.tex
│ ├── appendix_prompt_minimal.tex
│ ├── appendix_prompt_maximal.tex
│ ├── appendix_prompt_evaluator.tex
│ ├── figures/ # TikZ/PGFPlots figures
│ │ ├── opt_radar_1.tikz
│ │ ├── opt_radar_2.tikz
│ │ └── opt_eval_pipeline.tikz
│ ├── Makefile # Build main.pdf, IEEE/ACM variants
│ └── bib/
│ └── references.bib
├── prompts/ # Plain-text LLM prompts
│ ├── minimal_classifier_prompt.txt
│ ├── maximal_classifier_prompt.txt
│ ├── evaluator_prompt.txt
│ └── adjudicator_prompt.txt
├── opt_eval/ # Python library for OPT classification/evaluation
│ ├── __init__.py
│ ├── opt_prompts.py # Utility to load prompt text
│ ├── opt_pipeline.py # Data classes + run_pipeline + parsers
│ ├── model_client.py # Abstraction over your local/remote LLM endpoint
│ ├── cli.py # CLI entrypoint for simple use
│ └── tests/
│ ├── __init__.py
│ ├── test_parsers.py
│ ├── test_gold_suite.py
│ └── data/
│ ├── gold_opt.yaml
│ └── gold_opt.jsonl
├── data/
│ ├── gold/
│ │ ├── opt_gold.yaml # Canonical gold test suite
│ │ └── opt_gold.jsonl
│ └── examples/
│ ├── opt_audit_example.jsonl
│ └── opt_audit_example.yaml
├── scripts/
│ ├── run_eval_pipeline.py
│ └── export_gold_to_jsonl.py
└── docs/
├── usage.md
├── schema_opt_audit.md
└── model_notes_local_llm.md
```
---
## 2. Building the paper
The paper lives in `paper/` and is structured to support multiple venues (arXiv, IEEE, ACM, book-style).
### Prerequisites
* A reasonably recent TeX Live (or MikTeX) with:
* `pgfplots` (with `polar` library),
* `newtxtext`, `newtxmath`,
* `booktabs`, `longtable`, `framed`, `fancyvrb`, etc.
* `latexmk` and `make`.
### Typical build
From the repository root:
```bash
cd paper
make # builds main.pdf by default
# Or explicitly:
make main.pdf
# For an IEEE variant:
make main_ieee.pdf
# For ACM:
make main_acm.pdf
```
If you run into font or pgfplots `compat` warnings, consult the comments at the top of `main.tex` and `body_shared.tex` (we assume `\pgfplotsset{compat=1.18}` and `\usepackage{newtxtext,newtxmath}`).
---
## 3. Python OPT evaluation pipeline
The `opt_eval` package provides:
* Data classes for candidate classifications, evaluator results, and adjudications.
* Parsers for extracting OPT lines and rationales from LLM output.
* A `run_pipeline` function that wires together:
* Classifier A and B,
* Evaluator,
* Adjudicator,
* and returns a structured result suitable for JSONL/YAML logging.
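A minimal sketch of driving the pipeline from Python (the exact `run_pipeline` signature and the shape of its return value are assumptions; check `opt_eval/opt_pipeline.py` for the real interface):
```python
# Illustrative only: run_pipeline is assumed to take a free-text system
# description and return a structured result (candidates, evaluations,
# adjudication, final OPT-Code). Adjust to the actual API.
from opt_eval.opt_pipeline import run_pipeline

description = (
    "This system trains a fully-connected neural network on MNIST using "
    "SGD and cross-entropy loss, then uses the trained weights for inference."
)

result = run_pipeline(description)
print(result)  # inspect candidates, evaluations, and the final OPT-Code
```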
### 3.1 Installation (local dev)
Option 1: editable install with `pip`:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
# or, if you don't define setup.cfg/pyproject:
pip install pyyaml
```
Option 2: just use it in-place with `PYTHONPATH`:
```bash
export PYTHONPATH=$PWD
```
### 3.2 Configuring a local LLM
You must implement `opt_eval/model_client.py` to talk to your model(s). A typical pattern:
* For an OpenAI-compatible HTTP endpoint (local or remote), use `requests` or `openai` client.
* For **Ollama** or **llamafile**, call `http://localhost:11434` or similar.
`model_client.call_model(system_prompt, user_content, model="local-llm")` should:
1. Send `system_prompt` as the system role (if your API supports it).
2. Send `user_content` as the user content.
3. Return the raw text content of the model's reply.
Once implemented, you can run the pipeline on a simple description.
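As a starting point, here is a hedged sketch against an OpenAI-compatible chat-completions endpoint; the URL, port, and default model name are placeholders, not something the repository ships:
```python
# opt_eval/model_client.py -- illustrative sketch only.
# Assumes an OpenAI-compatible chat endpoint running locally; replace
# API_URL and the model identifier with whatever your deployment uses.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"

def call_model(system_prompt: str, user_content: str, model: str = "local-llm") -> str:
    """Send one system + user turn and return the raw text of the model's reply."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        "temperature": 0.0,
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```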
---
## 4. Quickstart: Running the evaluation pipeline
Minimal example (from repo root, after configuring `model_client.py`):
```bash
python scripts/run_eval_pipeline.py << 'EOF'
This system trains a fully-connected neural network on MNIST using SGD and
cross-entropy loss, and then uses the trained weights for inference only.
EOF
```
A typical JSON-like output will include:
* `candidate_a`, `candidate_b`
* `eval_a`, `eval_b`
* `final` (final OPT-Code and rationale)
* `adjudication` (if performed)
You can adapt `run_eval_pipeline.py` to write JSONL to `data/examples/opt_audit_example.jsonl`.
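One way to do that is to append one JSON object per line; the field names below mirror the audit-record layout in Section 6 and assume the pipeline result behaves like a dictionary:
```python
# Sketch: append one pipeline result to a JSONL audit log.
# Field names follow the audit schema described in Section 6; adapt the
# lookups to whatever run_pipeline actually returns.
import json
from datetime import datetime, timezone

def append_audit_record(path, description, result):
    record = {
        "id": result.get("id"),
        "description": description,
        "candidates": {"A": result.get("candidate_a"), "B": result.get("candidate_b")},
        "evaluations": {"A": result.get("eval_a"), "B": result.get("eval_b")},
        "adjudication": result.get("adjudication"),
        "final": result.get("final"),
        "meta": {"timestamp": datetime.now(timezone.utc).isoformat()},
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```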
---
## 5. Gold test suite and benchmarking
The directory `data/gold/` contains a small hand-annotated test suite (`opt_gold.yaml` and `opt_gold.jsonl`) covering:
* Backprop MLP on MNIST (Lrn),
* GA for TSP (Evo),
* A* gridworld planner (Sch),
* Rule-based expert system like XCON (Sym),
* Bayesian network for fault diagnosis (Prb),
* Deep Q-Network for Atari (Lrn),
* PID + Kalman filter drone control (Ctl),
* PSO for hyperparameter tuning (Swm),
* Immune negative-selection anomaly detection (Evo/Sch+Prb),
* Three-stage hybrid: GA → rule pruning → Bayesian classifier (Evo/Sym/Prb).
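To inspect the suite programmatically, a minimal sketch (the field names `id` and `opt_code` are assumptions about the YAML layout; check `data/gold/opt_gold.yaml` for the actual keys):
```python
# Sketch: list gold cases and their annotated OPT-Codes.
# The keys used here (id, opt_code) are assumptions about the YAML schema.
import yaml

with open("data/gold/opt_gold.yaml", encoding="utf-8") as fh:
    gold = yaml.safe_load(fh)

for case in gold:
    print(case.get("id"), "->", case.get("opt_code"))
```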
To run tests (after you've wired up `model_client.py`):
```bash
pytest opt_eval/tests
```
`test_gold_suite.py` will:
* Call the classifier prompt(s) on each gold description.
* Compare predicted OPT roots against the gold OPT-Code.
* Optionally compute partial-match metrics (Jaccard similarity of root sets) and simple accuracy.
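The partial-match metric is plain set overlap on OPT roots; a sketch (splitting on `/` assumes hybrid codes are written like `Evo/Sym/Prb`):
```python
# Sketch of the partial-match metric: Jaccard similarity between the
# predicted and gold sets of OPT roots.
def opt_roots(opt_code: str) -> set[str]:
    return {root.strip() for root in opt_code.split("/") if root.strip()}

def jaccard(pred_code: str, gold_code: str) -> float:
    pred, gold = opt_roots(pred_code), opt_roots(gold_code)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

# Example: jaccard("Evo/Prb", "Evo/Sym/Prb") == 2/3
```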
---
## 6. JSONL/YAML audit logs
For large-scale use, we recommend JSONL or YAML for storing evaluations.
* Example JSONL audit: `data/examples/opt_audit_example.jsonl`
* Example YAML audit: `data/examples/opt_audit_example.yaml`
Each record includes:
* `id`, `description`
* `candidates` (A, B)
* `evaluations` (verdicts, scores)
* `adjudication`
* `final` (final OPT-Code)
* `meta` (timestamps, model IDs, etc.)
See `docs/schema_opt_audit.md` for field descriptions.
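Because JSONL stores one record per line, the log is easy to stream back; for example, to tally final OPT-Codes (assuming `final` holds the code, either directly or under an `opt_code` key):
```python
# Sketch: tally final OPT-Codes from a JSONL audit log.
# See docs/schema_opt_audit.md for the authoritative field definitions.
import json
from collections import Counter

counts = Counter()
with open("data/examples/opt_audit_example.jsonl", encoding="utf-8") as fh:
    for line in fh:
        record = json.loads(line)
        final = record.get("final")
        code = final.get("opt_code") if isinstance(final, dict) else final
        counts[code] += 1

print(counts.most_common())
```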
---
## 7. Using smaller local LLMs
OPT classification needs:
* Understanding of code/algorithm descriptions.
* Solid instruction-following.
* Ability to respect a fairly structured output format.
Models that are feasible to run locally and are good candidates:
* **LLaMA 3 8B Instruct**
Good general reasoning and code understanding; works well as Classifier, Evaluator, and Adjudicator if VRAM allows.
* **Mistral 7B Instruct** (and compatible fine-tunes like Dolphin, OpenHermes)
Strong general-purpose local model with solid coding and instruction-following; good as a classifier.
* **Qwen2 7B / 14B Instruct**
7B is a capable all-rounder; 14B (if you can run it) is strong for the evaluator/adjudicator roles.
* **Phi-3-mini (3.8B) Instruct**
Smaller footprint; may work as a classifier on simpler cases. For nuanced hybrid systems (Evo/Sym/Prb, Swm vs Evo, Ctl vs Prb), you may want a larger model as evaluator/adjudicator.
A reasonable starting configuration:
* Classifier A: `llama3-8b-instruct`
* Classifier B: `mistral-7b-instruct`
* Evaluator: `qwen2-14b-instruct` (if available) or `llama3-8b-instruct`
* Adjudicator: same as Evaluator
You can also run all roles on the same 7–8B model if resources are constrained; the explicit prompts and the evaluator rubric are designed to catch many misclassifications.
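If you want the role-to-model mapping in one place, a small table in your own `model_client.py` is enough; the identifiers below just echo the suggested starting configuration and are not required by the shipped code:
```python
# Illustrative role -> model mapping for your own model_client.py.
# None of these names are mandated by the repository.
ROLE_MODELS = {
    "classifier_a": "llama3-8b-instruct",
    "classifier_b": "mistral-7b-instruct",
    "evaluator":    "qwen2-14b-instruct",   # or llama3-8b-instruct if VRAM is tight
    "adjudicator":  "qwen2-14b-instruct",
}

def model_for(role: str) -> str:
    """Return the model to use for a pipeline role, defaulting to a single local model."""
    return ROLE_MODELS.get(role, "local-llm")
```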
See `docs/model_notes_local_llm.md` for more detailed notes on deployment options (Ollama, llamafile, vLLM, etc.) and recommended quantization levels.
---
## 8. Citing
Once the OPT paper is on arXiv or accepted somewhere, include a BibTeX entry like:
```bibtex
@article{Elsberry_OPT_2025,
author = {Wesley R. Elsberry and N.~Collaborators},
title = {Operational Premise Taxonomy (OPT): Mechanism-Level Classification of AI Systems},
journal = {arXiv preprint},
year = {2025},
eprint = {XXXX.YYYYY},
archivePrefix = {arXiv}
}
```
(Replace with the actual venue and identifier when available.)
---
## 9. Contributing
* Extend the gold test suite (YAML + JSONL) with more systems and hybrids.
* Add additional prompts (e.g., language-specific variants for Python-only code, RL-specific prompts).
* Improve the parsing logic or add better metrics (confusion matrices, root-wise F1).
* Open issues for any misclassifications that recur: they can inform future revisions of prompts and possibly the taxonomy itself.
Pull requests that add well-documented examples, tests, or tooling around OPT are welcome.