# OPT: Operational Premise Taxonomy for AI Systems

OPT is a proposed taxonomy of AI approaches that captures their relation to biological inspiration and yields an OPT-Code framework for specifying AI systems, including hybrids and pipelines.

This repository collects:

- The LaTeX manuscript defining the **Operational Premise Taxonomy (OPT)** and the OPT-Code convention.
- Prompt sets for classifying AI systems into OPT mechanisms using large language models (LLMs).
- A small Python library and scripts to run an end-to-end **Classifier → Evaluator → Adjudicator** pipeline.
- A hand-annotated **gold test suite** of systems (backprop, GA, A*, rule-based expert systems, PSO, AIS, etc.).
- Example JSONL/YAML **audit logs** for storing OPT classifications.

The core idea: classify AI implementations by their **operative mechanism** (learning, evolution, symbolic reasoning, probabilistic inference, search, control, swarm, or hybrids), while explicitly separating that from **execution details** (parallelism, pipelines, hardware).

---

## 1. Repository layout

```text
Operational-Premise-Taxonomy/
├── README.md                       # This file
├── LICENSE
├── .gitignore
├── Makefile                        # Top-level convenience targets
│
├── paper/                          # LaTeX sources for the OPT paper
│   ├── main.tex                    # arXiv/general article format
│   ├── main_ieee.tex               # IEEE two-column wrapper
│   ├── main_acm.tex                # ACM-style wrapper
│   ├── main_kaobook.tex            # Book-style wrapper
│   ├── body_shared.tex             # Shared main content
│   ├── related-work.tex            # Related work section
│   ├── appendix_opt_prompts.tex
│   ├── appendix_prompt_minimal.tex
│   ├── appendix_prompt_maximal.tex
│   ├── appendix_prompt_evaluator.tex
│   ├── figures/                    # TikZ/PGFPlots figures
│   │   ├── opt_radar_1.tikz
│   │   ├── opt_radar_2.tikz
│   │   └── opt_eval_pipeline.tikz
│   ├── Makefile                    # Build main.pdf, IEEE/ACM variants
│   └── bib/
│       └── references.bib
│
├── prompts/                        # Plain-text LLM prompts
│   ├── minimal_classifier_prompt.txt
│   ├── maximal_classifier_prompt.txt
│   ├── evaluator_prompt.txt
│   └── adjudicator_prompt.txt
│
├── opt_eval/                       # Python library for OPT classification/evaluation
│   ├── __init__.py
│   ├── opt_prompts.py              # Utility to load prompt text
│   ├── opt_pipeline.py             # Data classes + run_pipeline + parsers
│   ├── model_client.py             # Abstraction over your local/remote LLM endpoint
│   ├── cli.py                      # CLI entrypoint for simple use
│   └── tests/
│       ├── __init__.py
│       ├── test_parsers.py
│       ├── test_gold_suite.py
│       └── data/
│           ├── gold_opt.yaml
│           └── gold_opt.jsonl
│
├── data/
│   ├── gold/
│   │   ├── opt_gold.yaml           # Canonical gold test suite
│   │   └── opt_gold.jsonl
│   └── examples/
│       ├── opt_audit_example.jsonl
│       └── opt_audit_example.yaml
│
├── scripts/
│   ├── run_eval_pipeline.py
│   └── export_gold_to_jsonl.py
│
└── docs/
    ├── usage.md
    ├── schema_opt_audit.md
    └── model_notes_local_llm.md
```

---

## 2. Building the paper

The paper lives in `paper/` and is structured to support multiple venues (arXiv, IEEE, ACM, book-style).

### Prerequisites

* A reasonably recent TeX Live (or MiKTeX) with:
  * `pgfplots` (with the `polar` library),
  * `newtxtext`, `newtxmath`,
  * `booktabs`, `longtable`, `framed`, `fancyvrb`, etc.
* `latexmk` and `make`.
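
As a quick way to check that these packages are available, a minimal document along the following lines should compile with `latexmk` (this is only an illustrative check, not the paper's actual preamble):

```latex
% Illustrative compile check; not a file from the repository.
\documentclass{article}
\usepackage{newtxtext,newtxmath}                 % text/math fonts assumed by the paper
\usepackage{booktabs,longtable,framed,fancyvrb}  % tables and verbatim environments
\usepackage{pgfplots}
\usepgfplotslibrary{polar}                       % polar library, as listed in the prerequisites
\pgfplotsset{compat=1.18}                        % compat level assumed by the sources
\begin{document}
If this builds cleanly under \texttt{latexmk}, the prerequisites above are in place.
\end{document}
```

If anything is missing, install it with your distribution's package manager (for TeX Live, `tlmgr install <package>`).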

### Typical build

From the repository root:

```bash
cd paper
make            # builds main.pdf by default
# Or explicitly:
make main.pdf

# For an IEEE variant:
make main_ieee.pdf

# For ACM:
make main_acm.pdf
```

If you run into font or pgfplots `compat` warnings, consult the comments at the top of `main.tex` and `body_shared.tex` (we assume `\pgfplotsset{compat=1.18}` and `\usepackage{newtxtext,newtxmath}`).

---

## 3. Python OPT evaluation pipeline

The `opt_eval` package provides:

* Data classes for candidate classifications, evaluator results, and adjudications.
* Parsers for extracting OPT lines and rationales from LLM output.
* A `run_pipeline` function that wires together Classifier A and B, the Evaluator, and the Adjudicator, and returns a structured result suitable for JSONL/YAML logging.

### 3.1 Installation (local dev)

Option 1: editable install with `pip`:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip

pip install -e .
# or, if you don’t define a setup.cfg/pyproject.toml:
pip install pyyaml
```

Option 2: just use it in place with `PYTHONPATH`:

```bash
export PYTHONPATH=$PWD
```

### 3.2 Configuring a local LLM

You must implement `opt_eval/model_client.py` to talk to your model(s). Typical patterns:

* For an OpenAI-compatible HTTP endpoint (local or remote), use `requests` or the `openai` client.
* For **Ollama** or **llamafile**, call `http://localhost:11434` or similar.

`model_client.call_model(system_prompt, user_content, model="local-llm")` should:

1. Send `system_prompt` as the system role (if your API supports it).
2. Send `user_content` as the user content.
3. Return the raw text content of the model’s reply.

Once implemented, you can run the pipeline on a simple description.

---

## 4. Quickstart: Running the evaluation pipeline

Minimal example (from the repo root, after configuring `model_client.py`):

```bash
python scripts/run_eval_pipeline.py << 'EOF'
This system trains a fully-connected neural network on MNIST using SGD and
cross-entropy loss, and then uses the trained weights for inference only.
EOF
```

A typical JSON-like output will include:

* `candidate_a`, `candidate_b`
* `eval_a`, `eval_b`
* `final` (final OPT-Code and rationale)
* `adjudication` (if performed)

You can adapt `run_eval_pipeline.py` to write JSONL to `data/examples/opt_audit_example.jsonl`.

---

## 5. Gold test suite and benchmarking

The directory `data/gold/` contains a small hand-annotated test suite (`opt_gold.yaml` and `opt_gold.jsonl`) covering:

* Backprop MLP on MNIST (Lrn),
* GA for TSP (Evo),
* A* gridworld planner (Sch),
* Rule-based expert system like XCON (Sym),
* Bayesian network for fault diagnosis (Prb),
* Deep Q-Network for Atari (Lrn),
* PID + Kalman filter drone control (Ctl),
* PSO for hyperparameter tuning (Swm),
* Immune negative-selection anomaly detection (Evo/Sch+Prb),
* Three-stage hybrid: GA → rule pruning → Bayesian classifier (Evo/Sym/Prb).

To run the tests (after you’ve wired up `model_client.py`):

```bash
pytest opt_eval/tests
```

`test_gold_suite.py` will:

* Call the classifier prompt(s) on each gold description.
* Compare predicted OPT roots against the gold OPT-Code.
* Optionally compute partial-match metrics (Jaccard similarity of root sets) and simple accuracy.

---

## 6. JSONL/YAML audit logs

For large-scale use, we recommend JSONL or YAML for storing evaluations; an illustrative record is sketched below.
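
Concretely, a single audit record might look roughly like the following. The top-level keys mirror the field list below; the nested key names (`opt_code`, `verdict`, `score`, `rationale`) are illustrative placeholders, and the normative schema is in `docs/schema_opt_audit.md`:

```json
{
  "id": "example-0001",
  "description": "Trains a fully-connected neural network on MNIST using SGD and cross-entropy loss.",
  "candidates": {
    "a": {"opt_code": "Lrn", "rationale": "Gradient-based weight updates learned from data."},
    "b": {"opt_code": "Lrn", "rationale": "Supervised learning via backpropagation."}
  },
  "evaluations": {
    "a": {"verdict": "accept", "score": 0.9},
    "b": {"verdict": "accept", "score": 0.85}
  },
  "adjudication": null,
  "final": {"opt_code": "Lrn", "rationale": "Both classifiers and the evaluator agree on Lrn."},
  "meta": {"timestamp": "2025-01-01T00:00:00Z", "models": {"classifier_a": "llama3-8b-instruct", "classifier_b": "mistral-7b-instruct"}}
}
```

The record is pretty-printed here for readability; in a JSONL file each record occupies a single line.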

* Example JSONL audit: `data/examples/opt_audit_example.jsonl`
* Example YAML audit: `data/examples/opt_audit_example.yaml`

Each record includes:

* `id`, `description`
* `candidates` (A, B)
* `evaluations` (verdicts, scores)
* `adjudication`
* `final` (final OPT-Code)
* `meta` (timestamps, model IDs, etc.)

See `docs/schema_opt_audit.md` for field descriptions.

---

## 7. Using smaller local LLMs

A model used for OPT classification needs:

* Understanding of code/algorithm descriptions.
* Solid instruction-following.
* The ability to respect a fairly structured output format.

Models that are feasible to run locally and are good candidates:

* **LLaMA 3 8B Instruct**
  Good general reasoning and code understanding; works well as Classifier, Evaluator, and Adjudicator if VRAM allows.

* **Mistral 7B Instruct** (and compatible fine-tunes like Dolphin, OpenHermes)
  Strong general-purpose local model with solid coding and instruction-following; good as a classifier.

* **Qwen2 7B / 14B Instruct**
  7B is a capable all-rounder; 14B (if you can run it) is strong for the evaluator/adjudicator roles.

* **Phi-3-mini (3.8B) Instruct**
  Smaller footprint; may work as a classifier on simpler cases. For nuanced hybrid systems (Evo/Sym/Prb, Swm vs Evo, Ctl vs Prb), you may want a larger model as evaluator/adjudicator.

A reasonable starting configuration:

* Classifier A: `llama3-8b-instruct`
* Classifier B: `mistral-7b-instruct`
* Evaluator: `qwen2-14b-instruct` (if available) or `llama3-8b-instruct`
* Adjudicator: same as Evaluator

You can also run all roles on the same 7–8B model if resources are constrained; the explicit prompts and the evaluator rubric are designed to catch many misclassifications.

See `docs/model_notes_local_llm.md` for more detailed notes on deployment options (Ollama, llamafile, vLLM, etc.) and recommended quantization levels.

---

## 8. Citing

Once the OPT paper is on arXiv or accepted at a venue, include a BibTeX entry like:

```bibtex
@article{Elsberry_OPT_2025,
  author        = {Wesley R. Elsberry and N.~Collaborators},
  title         = {Operational Premise Taxonomy (OPT): Mechanism-Level Classification of AI Systems},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {XXXX.YYYYY},
  archivePrefix = {arXiv}
}
```

(Replace with the actual venue and identifier when available.)

---

## 9. Contributing

* Extend the gold test suite (YAML + JSONL) with more systems and hybrids.
* Add additional prompts (e.g., language-specific variants for Python-only code, RL-specific prompts).
* Improve the parsing logic or add better metrics (confusion matrices, root-wise F1).
* Open issues for any misclassifications that recur: they can inform future revisions of the prompts and possibly the taxonomy itself.

Pull requests that add well-documented examples, tests, or tooling around OPT are welcome.