# Python 3 Migration Plan

## Scope

The original system is a cooperative composition pipeline built from three neural subsystems:

- `Bach`: a Hopfield-Tank note generator over a 5-position by 8-note grid.
- `Salieri`: a back-propagation critic trained against a rule-based classical-sequence supervisor.
- `Beethoven`: an ART1 novelty/category network over the note sequence plus one classicality bit.

The immediate goal is a Python 3 package that reproduces the Pascal algorithms and file-driven behavior closely enough to validate compatibility, while replacing the Pascal linked-list memory model with direct numeric data structures.

## What Exists Today

### Core orchestration

- `THES/ANNCOMP.PP` is the integrated driver.
- The composition loop is effectively:
  1. Generate a candidate note with the Hopfield-Tank network.
  2. Evaluate/train the back-propagation network using the current note window and the rule-based instructor.
  3. Pass the same window plus the classical/not-classical flag into ART1.

### Shared state

- `THES/GLOBALS.PP` defines:
  - a fixed note vocabulary of 8 notes,
  - a sequence window length of 5,
  - ART1 dimensions `Max_F1_nodes = 41`, `Max_F2_nodes = 25`,
  - `Common_Area_`, the cross-network exchange object.

### Hopfield-Tank subsystem

- `THES/ANNCOMP.PP` implements `Bach` and the nested `HTN`.
- The network operates on a flattened 40-cell representation: `8 notes x 5 positions`.
- It loads a `64 x 64` weight matrix from `HTN.DAT`, but the active note grid uses only the first 40 cells.
- The update rule uses:
  - per-neuron activation `a`,
  - output `0.5 * (1 + tanh(a / c))`,
  - resistance/capacitance/input/weight/iteration scaling factors from globals.
- `THES/HTNDATA.PP` shows how the Hopfield weights were built from `SEQUENCE.DAT`, plus row/column inhibition and sequence reinforcement.
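A minimal numpy sketch of the update rule described above may help fix the target semantics. The parameter names `dt`, `tau`, and `c`, and the position-major reshape, are assumptions standing in for the Pascal resistance/capacitance/input/iteration globals, not the legacy identifiers:

```python
import numpy as np

def htn_step(a, W, I, dt=0.01, tau=1.0, c=1.0):
    """One Euler step of Hopfield-Tank dynamics over the 40-cell grid.

    a: (40,) activations, W: (40, 40) weights, I: (40,) external input.
    dt/tau/c are placeholder names for the legacy iteration, resistance/
    capacitance, and scaling constants; real values come from GLOBALS.PP.
    """
    v = 0.5 * (1.0 + np.tanh(a / c))   # neuron output squashed into [0, 1]
    da = -a / tau + W @ v + I          # leaky integration of weighted input
    return a + dt * da, v

def pick_column_winners(v):
    """Post-process: pick the max cell in each of the 5 positions.

    Assumes position-major flattening (8 note cells per position).
    """
    return v.reshape(5, 8).argmax(axis=1)
```

The exact flattening order (position-major vs. note-major) must be confirmed against `ANNCOMP.PP` before this reshape is trusted.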
### Back-propagation subsystem

- `THES/BP_UNIT.PP` is a general BP implementation with:
  - input, hidden, and output nodes,
  - a weight matrix and momentum,
  - feed-forward,
  - back-propagation,
  - file-based parameter and weight loading.
- `THES/S61.DAT` configures Salieri as:
  - 40 input nodes,
  - 20 hidden nodes,
  - 1 output node,
  - learning rate `0.5`,
  - momentum `0.5`.
- `THES/ANNCOMP.PP` converts the current 5-note window into a 40-bit one-hot vector and trains the network online against `Classical_instructor`.

### Rule-based supervisor

- `THES/CLASINST.PP` loads `SEQUENCE.DAT`.
- It converts the 5-note sequence to a digit string and returns `1` if the target suffix matches any stored example sequence, else `0`.
- This acts as the teaching signal for the BP network.

### ART1 subsystem

- `THES/ANNCOMP.PP` implements `ART1`.
- F1 input is the 40-bit one-hot sequence plus one bit for `Is_classical`, for a total vector length of 41.
- F2 supports up to 25 committed categories.
- The implementation includes a nonstandard compatibility detail: when all categories are saturated and none remain eligible, vigilance is reduced by 1 percent and matching is retried.

### Legacy data model problem

- `THES/STRUCT.PP` provides generic linked-list vectors and matrices (`DVE`, `HVE`) used to work around Turbo Pascal memory constraints.
- `THES/BP_UNIT.PP` stores nodes, IO vectors, and weights through those linked structures rather than direct arrays.
- That representation should not be preserved in Python except where needed for compatibility tests.

## Recommended Python Representation

Use explicit typed structures and dense arrays:

- `numpy.ndarray` for:
  - Hopfield state vectors and weight matrices,
  - BP activations, deltas, biases, and weights,
  - ART1 F1/F2 activations and top-down/bottom-up LTM weights.
- `dataclasses.dataclass` for stable API/state containers.
- `Enum` for note identifiers, only if it does not complicate file compatibility.
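As one sketch of the dataclass containers recommended above, a Python analogue of `Common_Area_` can hold the shared window and cross-network flags. The field names here are illustrative assumptions, not the Pascal identifiers:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class CommonArea:
    """Cross-network exchange state, a sketch of a Common_Area_ analogue.

    Field names are assumptions; the real fields must be read off GLOBALS.PP.
    """
    notes: np.ndarray = field(
        default_factory=lambda: np.zeros(5, dtype=np.int64)
    )                                  # current 5-note rolling window
    is_classical: bool = False         # Salieri's verdict on the window
    art_category: int = -1             # winning F2 category, -1 before any resonance
    art_is_new_category: bool = False  # True when ART1 committed a fresh node
```

Because the pipeline mutates this object on every step, a plain mutable dataclass (rather than a frozen one) mirrors the Pascal usage most directly.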
Recommended canonical encodings:

- `NoteSequence`: shape `(5,)`, integer values `0..8`.
- `SequenceOneHot`: shape `(40,)`, binary.
- `ArtInputVector`: shape `(41,)`, binary.
- `HopfieldWeights`: shape `(40, 40)`, the normalized active subset of the legacy file.
- `BPWeightsIH` and `BPWeightsHO`, or one legacy-compatible dense square matrix, depending on whether fidelity or clarity is prioritized in a given layer of the codebase.

## Package Layout

```text
composer_ans/
    __init__.py
    types.py
    encoding.py
    io/
        __init__.py
        legacy_files.py
    hopfield.py
    backprop.py
    art1.py
    classical_rules.py
    pipeline.py
    compatibility.py
    tests/
        data/
        test_encoding.py
        test_classical_rules.py
        test_hopfield.py
        test_backprop.py
        test_art1.py
        test_pipeline.py
```

## API Design

Keep the public API small and deterministic.

```python
from composer_ans.pipeline import CompositionContext, CompositionPipeline

ctx = CompositionContext(notes=[0, 0, 0, 0, 0])
pipeline = CompositionPipeline.from_legacy_data("THES")
result = pipeline.step(ctx)
```

Suggested subsystem APIs:

```python
candidate = hopfield.generate_next_note(notes, params)
is_classical, bp_state = salieri.evaluate_and_train(notes, target=None)
art_result = beethoven.categorize(notes, is_classical)
```

Where:

- `target=None` means "derive the target from the classical instructor", matching the Pascal integrated flow.
- Each call returns structured state useful for debugging and test baselines, not just the final scalar.

## Migration Strategy

### Phase 1: Preserve semantics, not implementation style

- Recreate file readers for:
  - `SEQUENCE.DAT`,
  - `S61.DAT`,
  - `S61.WT`,
  - `HTN.DAT`.
- Recreate the sequence encodings exactly:
  - 5-note rolling window,
  - 40-bit one-hot flattening,
  - the ART1 extra classicality bit.
- Recreate the rule-based instructor exactly before porting the trainable models.

Deliverable:

- A Python package that can parse legacy files and reproduce the same encoded inputs the Pascal code would produce.
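The Phase 1 encoders follow directly from the canonical shapes above. The note-to-cell mapping used here (values 1..8 select a cell, 0 leaves a position empty) and the position-major layout are assumptions to verify against the Pascal encoding:

```python
import numpy as np

def sequence_one_hot(notes):
    """Flatten a 5-note window into the 40-bit SequenceOneHot vector.

    Assumes position-major layout (8 cells per position) and that note
    values 1..8 select a cell while 0 leaves that position all-zero;
    both conventions must be checked against ANNCOMP.PP.
    """
    vec = np.zeros(40, dtype=np.uint8)
    for pos, note in enumerate(notes):
        if note > 0:
            vec[pos * 8 + (note - 1)] = 1
    return vec

def art_input_vector(notes, is_classical):
    """Append the classicality bit to form the 41-bit ArtInputVector."""
    bit = np.array([1 if is_classical else 0], dtype=np.uint8)
    return np.concatenate([sequence_one_hot(notes), bit])
```

Golden tests can then assert these vectors byte-for-byte against values traced from the Pascal program.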
### Phase 2: Port Hopfield-Tank

- Implement the continuous-time iterative update as written.
- Preserve:
  - noise injection behavior,
  - the stop condition using epsilon on alternating time buffers,
  - the "pick max cell in each column" post-processing.
- Isolate random number generation behind an injectable RNG so deterministic tests are possible.

Deliverable:

- `generate_next_note()` producing the same result as Pascal for fixed seeds and known sequences.

### Phase 3: Port Salieri back-propagation

- First implement a legacy-compatible execution mode mirroring the square-node storage and update order.
- Then wrap it with a clearer façade that exposes standard layer matrices.
- Preserve:
  - sigmoid behavior,
  - theta updates,
  - momentum handling,
  - online training after every presentation,
  - the periodic weight dumping capability.

Deliverable:

- `evaluate_and_train()` matching legacy outputs and weight updates for a controlled presentation sequence.

### Phase 4: Port Beethoven ART1

- Port the F1/F2 STM and LTM equations directly.
- Preserve:
  - the 41-bit input vector,
  - eligibility and commitment logic,
  - the resonance loop,
  - the modified vigilance-reduction behavior on saturation.
- Keep ART1 state persistent across calls, because the Pascal version learns over the composition session.

Deliverable:

- `categorize()` returning the winner, new-category flag, vigilance-change flag, and current category count.

### Phase 5: Rebuild the integrated pipeline

- Recreate `Common_Area_` as a Python dataclass.
- Implement a single-step pipeline equivalent to one iteration of the Pascal composition loop.
- Add an optional batch runner that emits a complete composition and an event log.

Deliverable:

- An end-to-end run over a fixed number of notes using legacy data assets.

## Compatibility Plan

Compatibility should be measured in layers:

- Encoding compatibility:
  - identical one-hot vectors and ART input vectors for the same note windows.
- File compatibility:
  - legacy `.DAT` and `.WT` files load without manual editing.
- Behavioral compatibility:
  - the same classical instructor decisions,
  - the same Hopfield winner for a fixed seed/input,
  - the same BP output progression for replayed presentations,
  - the same ART1 category decisions for replayed inputs.
- Pipeline compatibility:
  - the same sequence of generated notes for a fixed random seed, or, if exact replication is blocked by legacy RNG differences, the same per-step subsystem outputs within defined tolerances.

## Known Risks

- Pascal `Single`, file layout, and RNG behavior may not map exactly to Python defaults.
- `HTN.DAT` is written as a Pascal binary `FILE OF ARRAY[1..64,1..64] OF REAL`; a dedicated reader may be needed to confirm element size and ordering.
- The BP code relies on update order within linked structures. A mathematically equivalent refactor may still diverge numerically unless a legacy mode preserves operation order.
- ART1 has thesis-specific modifications; replacing them with textbook ART1 would break compatibility.

## Recommended Delivery Order

1. Build legacy readers and encoders.
2. Port `Classical_instructor`.
3. Port Hopfield-Tank and verify with controlled seeds.
4. Port BP in legacy-compatible mode and replay known presentations.
5. Port ART1 with persistent state.
6. Assemble the integrated pipeline.
7. Add a second, cleaner API layer only after compatibility tests pass.

## Immediate Next Step

Implement the non-neural compatibility layer first:

- legacy file parsers,
- note/sequence encoders,
- the rule-based classical instructor,
- golden tests based on the files already in `THES`.

That gives a stable foundation for porting the three neural subsystems without losing track of what the original program actually did.
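As a starting point for the instructor port, here is a sketch of the digit-string matching described for `CLASINST.PP`. The suffix-matching rule and the shape of the parsed `example_sequences` are assumptions drawn from the description above and must be confirmed against the Pascal source before golden tests are frozen:

```python
def classical_instructor(notes, example_sequences):
    """Rule-based teaching signal for the BP critic (sketch).

    notes: the 5-note window as integers.
    example_sequences: digit strings parsed from SEQUENCE.DAT (assumed form).
    Returns 1 if the window's digit string is a suffix of any stored
    example sequence, else 0; the exact matching rule is an assumption.
    """
    window = "".join(str(n) for n in notes)
    return 1 if any(seq.endswith(window) for seq in example_sequences) else 0
```

Once the Pascal behavior is confirmed, this function becomes the first golden-test oracle: replaying every window from `SEQUENCE.DAT` must reproduce the legacy 0/1 decisions exactly.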