# Python 3 Migration Plan
## Scope
The original system is a cooperative composition pipeline built from three neural subsystems:
- `Bach`: a Hopfield-Tank note generator over a 5-position by 8-note grid.
- `Salieri`: a back-propagation critic trained against a rule-based classical-sequence supervisor.
- `Beethoven`: an ART1 novelty/category network over the note sequence plus one classicality bit.
The immediate goal should be a Python 3 package that reproduces the Pascal algorithms and file-driven behavior closely enough to validate compatibility, while replacing the Pascal linked-list memory model with direct numeric data structures.
## What Exists Today
### Core orchestration
- `THES/ANNCOMP.PP` is the integrated driver.
- The composition loop is effectively:
  1. Generate a candidate note with the Hopfield-Tank network.
  2. Evaluate/train the back-propagation network using the current note window and the rule-based instructor.
  3. Pass the same window plus the classical/not-classical flag into ART1.
### Shared state
- `THES/GLOBALS.PP` defines:
  - a fixed note vocabulary of 8 notes,
  - a sequence window length of 5,
  - ART1 dimensions `Max_F1_nodes = 41`, `Max_F2_nodes = 25`,
  - `Common_Area_`, which is the cross-network exchange object.
### Hopfield-Tank subsystem
- `THES/ANNCOMP.PP` implements `Bach` and nested `HTN`.
- The network operates on a flattened 40-cell representation: `8 notes x 5 positions`.
- It loads a `64 x 64` weight matrix from `HTN.DAT`, but the active note grid uses the first 40 cells.
- The update rule uses:
  - per-neuron activation `a`,
  - output `0.5 * (1 + tanh(a / c))`,
  - resistance/capacitance/input/weight/iteration scaling factors from globals.
- `THES/HTNDATA.PP` shows how the Hopfield weights were built from `SEQUENCE.DAT`, plus row/column inhibition and sequence reinforcement.
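A minimal sketch of that update, assuming a simple Euler integration step; `tau`, `c`, and `dt` stand in for the resistance/capacitance/iteration factors from `GLOBALS.PP` and must be replaced with the actual legacy constants:

```python
import numpy as np

def htn_step(a, W, I, dt=0.01, tau=1.0, c=0.5):
    """One Euler step of a Hopfield-Tank update (illustrative sketch).

    a: activation vector, W: weight matrix, I: external input.
    The output function matches the plan: v = 0.5 * (1 + tanh(a / c)).
    The exact scaling factors live in GLOBALS.PP and will differ from
    the placeholder constants used here.
    """
    v = 0.5 * (1.0 + np.tanh(a / c))   # neuron outputs stay in [0, 1]
    da = (-a / tau + W @ v + I) * dt   # leaky integration toward W@v + I
    return a + da, v
```

Because the output function is bounded in `[0, 1]`, the post-processing step of picking the max cell per column is well defined regardless of how far the dynamics have converged.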
### Back-propagation subsystem
- `THES/BP_UNIT.PP` is a general BP implementation with:
  - input, hidden, and output nodes,
  - weight matrix and momentum,
  - feed-forward,
  - back-propagation,
  - file-based parameter and weight loading.
- `THES/S61.DAT` configures Salieri as:
  - 40 input nodes,
  - 20 hidden nodes,
  - 1 output node,
  - learning rate `0.5`,
  - momentum `0.5`.
- `THES/ANNCOMP.PP` converts the current 5-note window into a 40-bit one-hot vector and trains the network online against `Classical_instructor`.
### Rule-based supervisor
- `THES/CLASINST.PP` loads `SEQUENCE.DAT`.
- It converts the 5-note sequence to a digit string and returns `1` if the target suffix matches any stored example sequence, else `0`.
- This acts as the teaching signal for the BP network.
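A minimal sketch of that decision, assuming `examples` holds digit strings loaded from `SEQUENCE.DAT` and that "suffix match" means the window's digit string terminates a stored sequence; the exact matching semantics must be confirmed against `CLASINST.PP`:

```python
def classical_instructor(window, examples):
    """Return 1 if the digit string for the 5-note window is a suffix
    of any stored example sequence, else 0.

    Sketch only: `examples` stands in for sequences parsed out of
    SEQUENCE.DAT, and the suffix interpretation is an assumption.
    """
    target = "".join(str(n) for n in window)
    return 1 if any(ex.endswith(target) for ex in examples) else 0
```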
### ART1 subsystem
- `THES/ANNCOMP.PP` implements `ART1`.
- F1 input is the 40-bit one-hot sequence plus one bit for `Is_classical`, for a total vector length of 41.
- F2 supports up to 25 committed categories.
- The implementation includes a nonstandard compatibility detail: when all categories are saturated and none remain eligible, vigilance is reduced by 1 percent and matching is retried.
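That fallback could look roughly like the following, using plain binary prototypes in place of real top-down LTM traces; this is a sketch of the retry behavior only, not the ported ART1 match loop, and the `1e-6` floor is an added safety guard not present in the Pascal code:

```python
import numpy as np

def match_with_relaxation(input_vec, prototypes, vigilance, step=0.99):
    """Mimic the thesis-specific fallback: if no category passes the
    vigilance test, multiply vigilance by 0.99 and retry.

    Returns (winning index, vigilance at which the match succeeded),
    or (None, vigilance) if the floor guard is hit.
    """
    norm = input_vec.sum()
    while vigilance > 1e-6:
        for j, proto in enumerate(prototypes):
            match = np.logical_and(input_vec, proto).sum() / norm
            if match >= vigilance:
                return j, vigilance
        vigilance *= step  # reduce vigilance by 1 percent and retry
    return None, vigilance
```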
### Legacy data model problem
- `THES/STRUCT.PP` provides generic linked-list vectors and matrices (`DVE`, `HVE`) used to work around Turbo Pascal memory constraints.
- `THES/BP_UNIT.PP` stores nodes, IO vectors, and weights through those linked structures rather than direct arrays.
- That representation should not be preserved in Python except where needed for compatibility tests.
## Recommended Python Representation
Use explicit typed structures and dense arrays:
- `numpy.ndarray` for:
  - Hopfield state vectors and weight matrices,
  - BP activations, deltas, biases, and weights,
  - ART1 F1/F2 activations and top-down/bottom-up LTM weights.
- `dataclasses.dataclass` for stable API/state containers.
- `Enum` for note identifiers only if it does not complicate file compatibility.
Recommended canonical encodings:
- `NoteSequence`: shape `(5,)`, integer values `0..7` (one index per note in the 8-note vocabulary).
- `SequenceOneHot`: shape `(40,)`, binary.
- `ArtInputVector`: shape `(41,)`, binary.
- `HopfieldWeights`: shape `(40, 40)` as the normalized active subset of the legacy file.
- `BPWeightsIH`, `BPWeightsHO` or one legacy-compatible dense square matrix, depending on whether fidelity or clarity is prioritized in a given layer of the codebase.
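Under those encodings, the conversion helpers can be sketched as follows; the position-major flattening order is an assumption that must be verified against `ANNCOMP.PP`:

```python
import numpy as np

N_NOTES, WINDOW = 8, 5

def encode_window(notes):
    """5-note window (values 0..7 assumed) -> 40-bit one-hot vector.

    Flattening is position-major here (position 0's eight bits first);
    confirm the legacy ordering before trusting golden comparisons.
    """
    onehot = np.zeros((WINDOW, N_NOTES), dtype=np.uint8)
    onehot[np.arange(WINDOW), notes] = 1
    return onehot.reshape(-1)

def art_input(notes, is_classical):
    """Append the classicality bit to form the 41-bit ART1 input."""
    bit = 1 if is_classical else 0
    return np.concatenate([encode_window(notes), [bit]]).astype(np.uint8)
```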
## Package Layout
```text
composer_ans/
    __init__.py
    types.py
    encoding.py
    io/
        __init__.py
        legacy_files.py
    hopfield.py
    backprop.py
    art1.py
    classical_rules.py
    pipeline.py
    compatibility.py
tests/
    data/
    test_encoding.py
    test_classical_rules.py
    test_hopfield.py
    test_backprop.py
    test_art1.py
    test_pipeline.py
```
## API Design
Keep the public API small and deterministic.
```python
from composer_ans.pipeline import CompositionContext, CompositionPipeline
ctx = CompositionContext(notes=[0, 0, 0, 0, 0])
pipeline = CompositionPipeline.from_legacy_data("THES")
result = pipeline.step(ctx)
```
Suggested subsystem APIs:
```python
candidate = hopfield.generate_next_note(notes, params)
is_classical, bp_state = salieri.evaluate_and_train(notes, target=None)
art_result = beethoven.categorize(notes, is_classical)
```
Where:
- `target=None` means "derive target from the classical instructor", matching the Pascal integrated flow.
- Each call returns structured state useful for debugging and test baselines, not just the final scalar.
## Migration Strategy
### Phase 1: Preserve semantics, not implementation style
- Recreate file readers for:
  - `SEQUENCE.DAT`,
  - `S61.DAT`,
  - `S61.WT`,
  - `HTN.DAT`.
- Recreate sequence encodings exactly:
  - 5-note rolling window,
  - 40-bit one-hot flattening,
  - ART1 extra classicality bit.
- Recreate the rule-based instructor exactly before porting the trainable models.
Deliverable:
- A Python package that can parse legacy files and reproduce the same encoded inputs the Pascal code would produce.
### Phase 2: Port Hopfield-Tank
- Implement the continuous-time iterative update as written.
- Preserve:
  - noise injection behavior,
  - stop condition using epsilon on alternating time buffers,
  - "pick max cell in each column" post-processing.
- Isolate random number generation behind an injectable RNG so deterministic tests are possible.
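One way to make the RNG injectable is to build the noise source around NumPy's seedable `Generator`; the uniform distribution and `scale` below are placeholders for whatever the Pascal noise actually does:

```python
import numpy as np

def make_noise_source(seed=None):
    """Return an injectable noise function so Hopfield noise injection
    is reproducible in tests.

    The network receives this callable instead of touching a global
    RNG. Distribution and scale are assumptions to check against the
    Pascal noise code.
    """
    rng = np.random.default_rng(seed)

    def noise(shape, scale=0.01):
        return rng.uniform(-scale, scale, size=shape)

    return noise
```

Two sources built with the same seed yield identical noise streams, which is exactly what deterministic golden tests need.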
Deliverable:
- `generate_next_note()` producing the same result as Pascal for fixed seeds and known sequences.
### Phase 3: Port Salieri back-propagation
- First implement a legacy-compatible execution mode mirroring the square-node storage and update order.
- Then wrap it with a clearer façade that exposes standard layer matrices.
- Preserve:
  - sigmoid behavior,
  - theta updates,
  - momentum handling,
  - online training after every presentation,
  - periodic weight dumping capability.
Deliverable:
- `evaluate_and_train()` matching legacy outputs and weight updates for a controlled presentation sequence.
### Phase 4: Port Beethoven ART1
- Port the F1/F2 STM and LTM equations directly.
- Preserve:
  - 41-bit input vector,
  - eligibility and commitment logic,
  - resonance loop,
  - modified vigilance-reduction behavior on saturation.
- Keep ART1 state persistent across calls, because the Pascal version learns over the composition session.
Deliverable:
- `categorize()` returning winner, new-category flag, vigilance-change flag, and current category count.
### Phase 5: Rebuild the integrated pipeline
- Recreate `Common_Area_` as a Python dataclass.
- Implement a single-step pipeline equivalent to one iteration of the Pascal composition loop.
- Add an optional batch runner that emits a complete composition and an event log.
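A sketch of the `Common_Area_` replacement; the field names here are assumptions, since the authoritative field list comes from `GLOBALS.PP`:

```python
from dataclasses import dataclass, field

@dataclass
class CommonArea:
    """Python stand-in for the Pascal Common_Area_ exchange record.

    Hypothetical fields: the real set must be taken from GLOBALS.PP.
    """
    notes: list                 # current 5-note window
    is_classical: int = 0       # instructor/BP verdict for the window
    art_category: int = -1      # last ART1 winner, -1 before any call
    log: list = field(default_factory=list)  # event log for the batch runner
```

A dataclass keeps the exchange object explicit and serializable, which makes the per-step event log straightforward to emit.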
Deliverable:
- End-to-end run over a fixed number of notes using legacy data assets.
## Compatibility Plan
Compatibility should be measured in layers:
- Encoding compatibility:
  - identical one-hot vectors and ART input vectors for the same note windows.
- File compatibility:
  - legacy `.DAT` and `.WT` files load without manual editing.
- Behavioral compatibility:
  - same classical instructor decisions,
  - same Hopfield winner for fixed seed/input,
  - same BP output progression for replayed presentations,
  - same ART1 category decisions for replayed inputs.
- Pipeline compatibility:
  - same sequence of generated notes for a fixed random seed, or, if exact replication is blocked by legacy RNG differences, same per-step subsystem outputs within defined tolerances.
## Known Risks
- Pascal `Single`, file layout, and RNG behavior may not map exactly to Python defaults.
- `HTN.DAT` is written as a Pascal binary `FILE OF ARRAY[1..64,1..64] OF REAL`; a dedicated reader may be needed to confirm element size and ordering.
- The BP code relies on update order within linked structures. A mathematically equivalent refactor may still diverge numerically unless a legacy mode preserves operation order.
- ART1 has thesis-specific modifications; replacing them with textbook ART1 would break compatibility.
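For the `HTN.DAT` risk specifically, the 6-byte Turbo Pascal `REAL` (the "real48" format) can be decoded directly; a sketch under the assumed standard layout, to be verified against known values in the file:

```python
def decode_real48(b):
    """Decode one Turbo Pascal 6-byte REAL ('real48').

    Assumed layout: byte 0 is the exponent biased by 129, bytes 1..5
    hold a 39-bit mantissa (little-endian) with the sign in the top
    bit of byte 5, and an implicit leading 1 on the significand.
    Exponent 0 means the value is exactly 0. Verify element size and
    row ordering against HTN.DAT before trusting a full matrix load.
    """
    e = b[0]
    if e == 0:
        return 0.0
    mant = int.from_bytes(b[1:6], "little")
    sign = -1.0 if mant & (1 << 39) else 1.0
    frac = (mant & ((1 << 39) - 1)) / float(1 << 39)
    return sign * (1.0 + frac) * 2.0 ** (e - 129)
```

Decoding a handful of known weights by hand and comparing against what the Pascal program prints would confirm both the element format and the `[row, column]` ordering of the 64 x 64 matrix.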
## Recommended Delivery Order
1. Build legacy readers and encoders.
2. Port `Classical_instructor`.
3. Port Hopfield-Tank and verify with controlled seeds.
4. Port BP in legacy-compatible mode and replay known presentations.
5. Port ART1 with persistent state.
6. Assemble the integrated pipeline.
7. Add a second, cleaner API layer only after compatibility tests pass.
## Immediate Next Step
Implement the non-neural compatibility layer first:
- legacy file parsers,
- note/sequence encoders,
- rule-based classical instructor,
- golden tests based on the files already in `THES`.
That gives a stable foundation for porting the three neural subsystems without losing track of what the original program actually did.