# OPT: Operational Premise Taxonomy for AI Systems

This repository collects:

- The LaTeX manuscript defining the **Operational Premise Taxonomy (OPT)** and the OPT-Code convention.
- Prompt sets for classifying AI systems into OPT mechanisms using large language models (LLMs).
- A small Python library and scripts to run an end-to-end **Classifier → Evaluator → Adjudicator** pipeline.
- A hand-annotated **gold test suite** of systems (backprop, GA, A*, rule-based expert systems, PSO, AIS, etc.).
- Example JSONL/YAML **audit logs** for storing OPT classifications.

The core idea: classify AI implementations by their **operative mechanism** (learning, evolution, symbolic reasoning, probabilistic inference, search, control, swarm, or hybrids), while explicitly separating that from **execution details** (parallelism, pipelines, hardware).
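
For orientation, the root codes used in this README can be glossed roughly as in the sketch below (informal glosses only; the paper gives the formal definitions, and the separator handling is an assumption based on the gold-suite codes):

```python
# Informal glosses of the OPT root codes used in this README's examples;
# see the paper in paper/ for the authoritative definitions.
OPT_ROOTS = {
    "Lrn": "learning",
    "Evo": "evolution",
    "Sym": "symbolic reasoning",
    "Prb": "probabilistic inference",
    "Sch": "search",
    "Ctl": "control",
    "Swm": "swarm",
}

def describe(opt_code: str) -> list[str]:
    """Expand a hybrid OPT-Code such as 'Evo/Sym/Prb' into mechanism names.
    Assumes '/' and '+' as separators, as in the gold-suite codes below."""
    import re
    return [OPT_ROOTS.get(root, "unknown") for root in re.split(r"[/+]", opt_code)]
```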

---

## 1. Repository layout

```text
Operational-Premise-Taxonomy/
├── README.md                       # This file
├── LICENSE
├── .gitignore
├── Makefile                        # Top-level convenience targets
│
├── paper/                          # LaTeX sources for the OPT paper
│   ├── main.tex                    # arXiv/general article format
│   ├── main_ieee.tex               # IEEE two-column wrapper
│   ├── main_acm.tex                # ACM-style wrapper
│   ├── main_kaobook.tex            # Book-style wrapper
│   ├── body_shared.tex             # Shared main content
│   ├── related-work.tex            # Related work section
│   ├── appendix_opt_prompts.tex
│   ├── appendix_prompt_minimal.tex
│   ├── appendix_prompt_maximal.tex
│   ├── appendix_prompt_evaluator.tex
│   ├── figures/                    # TikZ/PGFPlots figures
│   │   ├── opt_radar_1.tikz
│   │   ├── opt_radar_2.tikz
│   │   └── opt_eval_pipeline.tikz
│   ├── Makefile                    # Build main.pdf, IEEE/ACM variants
│   └── bib/
│       └── references.bib
│
├── prompts/                        # Plain-text LLM prompts
│   ├── minimal_classifier_prompt.txt
│   ├── maximal_classifier_prompt.txt
│   ├── evaluator_prompt.txt
│   └── adjudicator_prompt.txt
│
├── opt_eval/                       # Python library for OPT classification/evaluation
│   ├── __init__.py
│   ├── opt_prompts.py              # Utility to load prompt text
│   ├── opt_pipeline.py             # Data classes + run_pipeline + parsers
│   ├── model_client.py             # Abstraction over your local/remote LLM endpoint
│   ├── cli.py                      # CLI entrypoint for simple use
│   └── tests/
│       ├── __init__.py
│       ├── test_parsers.py
│       ├── test_gold_suite.py
│       └── data/
│           ├── gold_opt.yaml
│           └── gold_opt.jsonl
│
├── data/
│   ├── gold/
│   │   ├── opt_gold.yaml           # Canonical gold test suite
│   │   └── opt_gold.jsonl
│   └── examples/
│       ├── opt_audit_example.jsonl
│       └── opt_audit_example.yaml
│
├── scripts/
│   ├── run_eval_pipeline.py
│   └── export_gold_to_jsonl.py
│
└── docs/
    ├── usage.md
    ├── schema_opt_audit.md
    └── model_notes_local_llm.md
```

---

## 2. Building the paper

The paper lives in `paper/` and is structured to support multiple venues (arXiv, IEEE, ACM, book-style).

### Prerequisites

* A reasonably recent TeX Live (or MiKTeX) with:
  * `pgfplots` (with the `polar` library),
  * `newtxtext`, `newtxmath`,
  * `booktabs`, `longtable`, `framed`, `fancyvrb`, etc.
* `latexmk` and `make`.

### Typical build

From the repository root:

```bash
cd paper
make                 # builds main.pdf by default

# Or explicitly:
make main.pdf

# For an IEEE variant:
make main_ieee.pdf

# For ACM:
make main_acm.pdf
```

If you run into font or pgfplots `compat` warnings, consult the comments at the top of `main.tex` and `body_shared.tex` (we assume `\pgfplotsset{compat=1.18}` and `\usepackage{newtxtext,newtxmath}`).

---

## 3. Python OPT evaluation pipeline

The `opt_eval` package provides:

* Data classes for candidate classifications, evaluator results, and adjudications.
* Parsers for extracting OPT lines and rationales from LLM output.
* A `run_pipeline` function that wires together Classifier A, Classifier B, the Evaluator, and the Adjudicator, and returns a structured result suitable for JSONL/YAML logging (see the sketch below).
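
A minimal usage sketch (the exact `run_pipeline` signature and the shape of its result are assumptions here; see `opt_eval/opt_pipeline.py` for the real interface):

```python
# Sketch only: assumes run_pipeline accepts a free-text system description and
# returns a structured object/dict with candidate, evaluation, and final fields.
from opt_eval.opt_pipeline import run_pipeline

description = (
    "A genetic algorithm evolves tours for the travelling salesman problem, "
    "using order crossover and swap mutation."
)

result = run_pipeline(description)   # hypothetical call shape
print(result)                        # inspect candidates, evaluations, and the final OPT-Code
```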

### 3.1 Installation (local dev)

Option 1: editable install with `pip`:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip

pip install -e .
# or, if you don’t define setup.cfg/pyproject:
pip install pyyaml
```

Option 2: just use it in-place with `PYTHONPATH`:

```bash
export PYTHONPATH=$PWD
```

### 3.2 Configuring a local LLM

You must implement `opt_eval/model_client.py` to talk to your model(s). A typical pattern:

* For an OpenAI-compatible HTTP endpoint (local or remote), use the `requests` library or the `openai` client.
* For **Ollama** or **llamafile**, call `http://localhost:11434` or similar.

`model_client.call_model(system_prompt, user_content, model="local-llm")` should:

1. Send `system_prompt` as the system role (if your API supports it).
2. Send `user_content` as the user content.
3. Return the raw text content of the model's reply.
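
A minimal sketch against an OpenAI-compatible chat endpoint (the base URL, default model name, and response shape below are assumptions about your local server, e.g. Ollama's OpenAI-compatible API; adapt them to your deployment):

```python
# opt_eval/model_client.py -- illustrative sketch, not a drop-in implementation.
# Assumes an OpenAI-compatible /v1/chat/completions endpoint (e.g. Ollama, vLLM).
import requests

BASE_URL = "http://localhost:11434/v1"   # assumption: local Ollama-style server


def call_model(system_prompt: str, user_content: str, model: str = "local-llm") -> str:
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        "temperature": 0.0,
    }
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=300)
    resp.raise_for_status()
    # Return the raw text of the model's reply.
    return resp.json()["choices"][0]["message"]["content"]
```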

Once implemented, you can run the pipeline on a simple description.

---

## 4. Quickstart: Running the evaluation pipeline

Minimal example (from repo root, after configuring `model_client.py`):

```bash
python scripts/run_eval_pipeline.py << 'EOF'
This system trains a fully-connected neural network on MNIST using SGD and
cross-entropy loss, and then uses the trained weights for inference only.
EOF
```

A typical JSON-like output will include:

* `candidate_a`, `candidate_b`
* `eval_a`, `eval_b`
* `final` (final OPT-Code and rationale)
* `adjudication` (if performed)

You can adapt `run_eval_pipeline.py` to write JSONL to `data/examples/opt_audit_example.jsonl`.
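
For example, a minimal sketch of appending one pipeline result per line (it assumes the result is, or can be converted to, a plain dict; the field names come from your `run_pipeline` output):

```python
# Sketch: append one audit record per line to a JSONL file.
import json
from pathlib import Path


def append_audit_record(record: dict,
                        path: str = "data/examples/opt_audit_example.jsonl") -> None:
    """Write one JSON object per line; JSONL is easy to stream, grep, and diff."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```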

---

## 5. Gold test suite and benchmarking

The directory `data/gold/` contains a small hand-annotated test suite (`opt_gold.yaml` and `opt_gold.jsonl`) covering:

* Backprop MLP on MNIST (Lrn),
* GA for TSP (Evo),
* A* gridworld planner (Sch),
* Rule-based expert system like XCON (Sym),
* Bayesian network for fault diagnosis (Prb),
* Deep Q-Network for Atari (Lrn),
* PID + Kalman filter drone control (Ctl),
* PSO for hyperparameter tuning (Swm),
* Immune negative-selection anomaly detection (Evo/Sch+Prb),
* Three-stage hybrid: GA → rule pruning → Bayesian classifier (Evo/Sym/Prb).
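
To inspect the gold data directly, a minimal sketch (the `id` and `opt_code` keys used here are illustrative assumptions; check `data/gold/opt_gold.yaml` for the actual schema):

```python
# Sketch: iterate over the gold suite.  Field names and top-level layout are
# assumptions -- adjust to what opt_gold.yaml actually contains.
import yaml  # pip install pyyaml

with open("data/gold/opt_gold.yaml", encoding="utf-8") as fh:
    gold = yaml.safe_load(fh)

for entry in gold:  # assumes a top-level list of records
    print(entry.get("id"), "->", entry.get("opt_code"))
```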

To run tests (after you’ve wired up `model_client.py`):

```bash
pytest opt_eval/tests
```

`test_gold_suite.py` will:

* Call the classifier prompt(s) on each gold description.
* Compare predicted OPT roots against the gold OPT-Code.
* Optionally compute partial-match metrics (Jaccard similarity of root sets) and simple accuracy; see the sketch below.
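
A minimal sketch of the root-set comparison (a hypothetical helper, assuming OPT-Code roots are separated by `/` or `+` as in the gold codes above):

```python
# Sketch: partial-match metric between a predicted and a gold OPT-Code.
import re


def roots(opt_code: str) -> set[str]:
    """Split e.g. 'Evo/Sch+Prb' into {'Evo', 'Sch', 'Prb'} (separator assumption)."""
    return {r for r in re.split(r"[/+]", opt_code) if r}


def jaccard(pred: str, gold: str) -> float:
    """Jaccard similarity between the predicted and gold root sets."""
    p, g = roots(pred), roots(gold)
    return len(p & g) / len(p | g) if (p | g) else 1.0


assert jaccard("Evo/Sym", "Evo/Sym/Prb") == 2 / 3
```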

---

## 6. JSONL/YAML audit logs

For large-scale use, we recommend JSONL or YAML for storing evaluations.

* Example JSONL audit: `data/examples/opt_audit_example.jsonl`
* Example YAML audit: `data/examples/opt_audit_example.yaml`

Each record includes:

* `id`, `description`
* `candidates` (A, B)
* `evaluations` (verdicts, scores)
* `adjudication`
* `final` (final OPT-Code)
* `meta` (timestamps, model IDs, etc.)

See `docs/schema_opt_audit.md` for field descriptions.
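
As one example of working with the JSONL form, a sketch that tallies final OPT-Codes across an audit log (it assumes `final` is either an OPT-Code string or a small object containing one; the schema doc is authoritative):

```python
# Sketch: summarize final OPT-Codes in a JSONL audit log.
import json
from collections import Counter

counts = Counter()
with open("data/examples/opt_audit_example.jsonl", encoding="utf-8") as fh:
    for line in fh:
        if not line.strip():
            continue
        record = json.loads(line)
        final = record.get("final")
        # `final` may be a bare OPT-Code string or a small object; handle both.
        code = final.get("opt_code") if isinstance(final, dict) else final
        counts[str(code)] += 1

for code, n in counts.most_common():
    print(f"{code}: {n}")
```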

---

## 7. Using smaller local LLMs

OPT classification needs:

* Understanding of code/algorithm descriptions.
* Solid instruction-following.
* Ability to respect a fairly structured output format.

Models that are feasible to run locally and are good candidates:

* **LLaMA 3 8B Instruct**
  Good general reasoning and code understanding; works well as Classifier, Evaluator, and Adjudicator if VRAM allows.

* **Mistral 7B Instruct** (and compatible fine-tunes like Dolphin, OpenHermes)
  Strong general-purpose local model with solid coding and instruction-following; good as a classifier.

* **Qwen2 7B / 14B Instruct**
  7B is a capable all-rounder; 14B (if you can run it) is strong for the evaluator/adjudicator roles.

* **Phi-3-mini (3.8B) Instruct**
  Smaller footprint; may work as a classifier on simpler cases. For nuanced hybrid systems (Evo/Sym/Prb, Swm vs Evo, Ctl vs Prb), you may want a larger model as evaluator/adjudicator.

A reasonable starting configuration:

* Classifier A: `llama3-8b-instruct`
* Classifier B: `mistral-7b-instruct`
* Evaluator: `qwen2-14b-instruct` (if available) or `llama3-8b-instruct`
* Adjudicator: same as Evaluator

You can also run all roles on the same 7–8B model if resources are constrained; the explicit prompts and the evaluator rubric are designed to catch many misclassifications.
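
One way to express that starting configuration in code (a sketch only; how these names reach `model_client.call_model(..., model=...)` is up to your own wiring):

```python
# Sketch: a role-to-model map for the pipeline roles described above.
# The identifiers are the suggested starting models; substitute whatever
# names your local server actually exposes.
ROLE_MODELS = {
    "classifier_a": "llama3-8b-instruct",
    "classifier_b": "mistral-7b-instruct",
    "evaluator":    "qwen2-14b-instruct",   # or "llama3-8b-instruct" if 14B won't fit
    "adjudicator":  "qwen2-14b-instruct",   # same model as the evaluator
}
```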

See `docs/model_notes_local_llm.md` for more detailed notes on deployment options (Ollama, llamafile, vLLM, etc.) and recommended quantization levels.

---

## 8. Citing

Once the OPT paper is on arXiv or accepted somewhere, include a BibTeX entry like:

```bibtex
@article{Elsberry_OPT_2025,
  author        = {Wesley R. Elsberry and N.~Collaborators},
  title         = {Operational Premise Taxonomy (OPT): Mechanism-Level Classification of AI Systems},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {XXXX.YYYYY},
  archivePrefix = {arXiv}
}
```

(Replace with the actual venue and identifier when available.)

---

## 9. Contributing

* Extend the gold test suite (YAML + JSONL) with more systems and hybrids.
* Add additional prompts (e.g., language-specific variants for Python-only code, RL-specific prompts).
* Improve the parsing logic or add better metrics (confusion matrices, root-wise F1).
* Open issues for any misclassifications that recur: they can inform future revisions of prompts and possibly the taxonomy itself.

Pull requests that add well-documented examples, tests, or tooling around OPT are welcome.