# Python 3 Migration Plan

## Scope

The original system is a cooperative composition pipeline built from three neural subsystems:

- `Bach`: a Hopfield-Tank note generator over a 5-position by 8-note grid.
- `Salieri`: a back-propagation critic trained against a rule-based classical-sequence supervisor.
- `Beethoven`: an ART1 novelty/category network over the note sequence plus one classicality bit.

The immediate goal is a Python 3 package that reproduces the Pascal algorithms and file-driven behavior closely enough to validate compatibility, while replacing the Pascal linked-list memory model with direct numeric data structures.

## What Exists Today

### Core orchestration

- `THES/ANNCOMP.PP` is the integrated driver.
- The composition loop is effectively:
  1. Generate a candidate note with the Hopfield-Tank network.
  2. Evaluate/train the back-propagation network using the current note window and the rule-based instructor.
  3. Pass the same window plus the classical/not-classical flag into ART1.

### Shared state

- `THES/GLOBALS.PP` defines:
  - a fixed note vocabulary of 8 notes,
  - a sequence window length of 5,
  - ART1 dimensions `Max_F1_nodes = 41`, `Max_F2_nodes = 25`,
  - `Common_Area_`, the cross-network exchange object.

### Hopfield-Tank subsystem

- `THES/ANNCOMP.PP` implements `Bach` and the nested `HTN`.
- The network operates on a flattened 40-cell representation: `8 notes x 5 positions`.
- It loads a `64 x 64` weight matrix from `HTN.DAT`, but the active note grid uses only the first 40 cells.
- The update rule uses:
  - per-neuron activation `a`,
  - output `0.5 * (1 + tanh(a / c))`,
  - resistance/capacitance/input/weight/iteration scaling factors from globals.
- `THES/HTNDATA.PP` shows how the Hopfield weights were built from `SEQUENCE.DAT`, plus row/column inhibition and sequence reinforcement.
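A minimal numpy sketch of the update rule described above may help fix the target semantics. The parameter names `dt`, `tau`, and `c`, and the position-major reshape, are assumptions standing in for the Pascal resistance/capacitance/input/iteration globals, not the legacy identifiers:

```python
import numpy as np

def htn_step(a, W, I, dt=0.01, tau=1.0, c=1.0):
    """One Euler step of Hopfield-Tank dynamics over the 40-cell grid.

    a: (40,) activations, W: (40, 40) weights, I: (40,) external input.
    dt/tau/c are placeholder names for the legacy iteration, resistance/
    capacitance, and scaling constants; real values come from GLOBALS.PP.
    """
    v = 0.5 * (1.0 + np.tanh(a / c))   # neuron output squashed into [0, 1]
    da = -a / tau + W @ v + I          # leaky integration of weighted input
    return a + dt * da, v

def pick_column_winners(v):
    """Post-process: pick the max cell in each of the 5 positions.

    Assumes position-major flattening (8 note cells per position).
    """
    return v.reshape(5, 8).argmax(axis=1)
```

The exact flattening order (position-major vs. note-major) must be confirmed against `ANNCOMP.PP` before this reshape is trusted.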
### Back-propagation subsystem

- `THES/BP_UNIT.PP` is a general BP implementation with:
  - input, hidden, and output nodes,
  - a weight matrix and momentum,
  - feed-forward,
  - back-propagation,
  - file-based parameter and weight loading.
- `THES/S61.DAT` configures Salieri as:
  - 40 input nodes,
  - 20 hidden nodes,
  - 1 output node,
  - learning rate `0.5`,
  - momentum `0.5`.
- `THES/ANNCOMP.PP` converts the current 5-note window into a 40-bit one-hot vector and trains the network online against `Classical_instructor`.

### Rule-based supervisor

- `THES/CLASINST.PP` loads `SEQUENCE.DAT`.
- It converts the 5-note sequence to a digit string and returns `1` if the target suffix matches any stored example sequence, else `0`.
- This acts as the teaching signal for the BP network.

### ART1 subsystem

- `THES/ANNCOMP.PP` implements `ART1`.
- F1 input is the 40-bit one-hot sequence plus one bit for `Is_classical`, for a total vector length of 41.
- F2 supports up to 25 committed categories.
- The implementation includes a nonstandard compatibility detail: when all categories are saturated and none remain eligible, vigilance is reduced by 1 percent and matching is retried.

### Legacy data model problem

- `THES/STRUCT.PP` provides generic linked-list vectors and matrices (`DVE`, `HVE`) used to work around Turbo Pascal memory constraints.
- `THES/BP_UNIT.PP` stores nodes, IO vectors, and weights through those linked structures rather than direct arrays.
- That representation should not be preserved in Python except where needed for compatibility tests.

## Recommended Python Representation

Use explicit typed structures and dense arrays:

- `numpy.ndarray` for:
  - Hopfield state vectors and weight matrices,
  - BP activations, deltas, biases, and weights,
  - ART1 F1/F2 activations and top-down/bottom-up LTM weights.
- `dataclasses.dataclass` for stable API/state containers.
- `Enum` for note identifiers, only if it does not complicate file compatibility.
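As one sketch of the dataclass containers recommended above, a Python analogue of `Common_Area_` can hold the shared window and cross-network flags. The field names here are illustrative assumptions, not the Pascal identifiers:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class CommonArea:
    """Cross-network exchange state, a sketch of a Common_Area_ analogue.

    Field names are assumptions; the real fields must be read off GLOBALS.PP.
    """
    notes: np.ndarray = field(
        default_factory=lambda: np.zeros(5, dtype=np.int64)
    )                                  # current 5-note rolling window
    is_classical: bool = False         # Salieri's verdict on the window
    art_category: int = -1             # winning F2 category, -1 before any resonance
    art_is_new_category: bool = False  # True when ART1 committed a fresh node
```

Because the pipeline mutates this object on every step, a plain mutable dataclass (rather than a frozen one) mirrors the Pascal usage most directly.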
Recommended canonical encodings:

- `NoteSequence`: shape `(5,)`, integer values `0..8`.
- `SequenceOneHot`: shape `(40,)`, binary.
- `ArtInputVector`: shape `(41,)`, binary.
- `HopfieldWeights`: shape `(40, 40)`, the normalized active subset of the legacy file.
- `BPWeightsIH` and `BPWeightsHO`, or one legacy-compatible dense square matrix, depending on whether fidelity or clarity is prioritized in a given layer of the codebase.

## Package Layout

```text
composer_ans/
    __init__.py
    types.py
    encoding.py
    io/
        __init__.py
        legacy_files.py
    hopfield.py
    backprop.py
    art1.py
    classical_rules.py
    pipeline.py
    compatibility.py
    tests/
        data/
        test_encoding.py
        test_classical_rules.py
        test_hopfield.py
        test_backprop.py
        test_art1.py
        test_pipeline.py
```

## API Design

Keep the public API small and deterministic.

```python
from composer_ans.pipeline import CompositionContext, CompositionPipeline

ctx = CompositionContext(notes=[0, 0, 0, 0, 0])
pipeline = CompositionPipeline.from_legacy_data("THES")
result = pipeline.step(ctx)
```

Suggested subsystem APIs:

```python
candidate = hopfield.generate_next_note(notes, params)
is_classical, bp_state = salieri.evaluate_and_train(notes, target=None)
art_result = beethoven.categorize(notes, is_classical)
```

Where:

- `target=None` means "derive the target from the classical instructor", matching the Pascal integrated flow.
- Each call returns structured state useful for debugging and test baselines, not just the final scalar.

## Migration Strategy

### Phase 1: Preserve semantics, not implementation style

- Recreate file readers for:
  - `SEQUENCE.DAT`,
  - `S61.DAT`,
  - `S61.WT`,
  - `HTN.DAT`.
- Recreate the sequence encodings exactly:
  - 5-note rolling window,
  - 40-bit one-hot flattening,
  - the ART1 extra classicality bit.
- Recreate the rule-based instructor exactly before porting the trainable models.

Deliverable:

- A Python package that can parse legacy files and reproduce the same encoded inputs the Pascal code would produce.
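The Phase 1 encoders follow directly from the canonical shapes above. The note-to-cell mapping used here (values 1..8 select a cell, 0 leaves a position empty) and the position-major layout are assumptions to verify against the Pascal encoding:

```python
import numpy as np

def sequence_one_hot(notes):
    """Flatten a 5-note window into the 40-bit SequenceOneHot vector.

    Assumes position-major layout (8 cells per position) and that note
    values 1..8 select a cell while 0 leaves that position all-zero;
    both conventions must be checked against ANNCOMP.PP.
    """
    vec = np.zeros(40, dtype=np.uint8)
    for pos, note in enumerate(notes):
        if note > 0:
            vec[pos * 8 + (note - 1)] = 1
    return vec

def art_input_vector(notes, is_classical):
    """Append the classicality bit to form the 41-bit ArtInputVector."""
    bit = np.array([1 if is_classical else 0], dtype=np.uint8)
    return np.concatenate([sequence_one_hot(notes), bit])
```

Golden tests can then assert these vectors byte-for-byte against values traced from the Pascal program.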
### Phase 2: Port Hopfield-Tank

- Implement the continuous-time iterative update as written.
- Preserve:
  - noise injection behavior,
  - the stop condition using epsilon on alternating time buffers,
  - the "pick max cell in each column" post-processing.
- Isolate random number generation behind an injectable RNG so deterministic tests are possible.

Deliverable:

- `generate_next_note()` producing the same result as Pascal for fixed seeds and known sequences.

### Phase 3: Port Salieri back-propagation

- First implement a legacy-compatible execution mode mirroring the square-node storage and update order.
- Then wrap it with a clearer façade that exposes standard layer matrices.
- Preserve:
  - sigmoid behavior,
  - theta updates,
  - momentum handling,
  - online training after every presentation,
  - the periodic weight dumping capability.

Deliverable:

- `evaluate_and_train()` matching legacy outputs and weight updates for a controlled presentation sequence.

### Phase 4: Port Beethoven ART1

- Port the F1/F2 STM and LTM equations directly.
- Preserve:
  - the 41-bit input vector,
  - eligibility and commitment logic,
  - the resonance loop,
  - the modified vigilance-reduction behavior on saturation.
- Keep ART1 state persistent across calls, because the Pascal version learns over the composition session.

Deliverable:

- `categorize()` returning the winner, new-category flag, vigilance-change flag, and current category count.

### Phase 5: Rebuild the integrated pipeline

- Recreate `Common_Area_` as a Python dataclass.
- Implement a single-step pipeline equivalent to one iteration of the Pascal composition loop.
- Add an optional batch runner that emits a complete composition and an event log.

Deliverable:

- An end-to-end run over a fixed number of notes using legacy data assets.

## Compatibility Plan

Compatibility should be measured in layers:

- Encoding compatibility:
  - identical one-hot vectors and ART input vectors for the same note windows.
- File compatibility:
  - legacy `.DAT` and `.WT` files load without manual editing.
- Behavioral compatibility:
  - the same classical instructor decisions,
  - the same Hopfield winner for a fixed seed/input,
  - the same BP output progression for replayed presentations,
  - the same ART1 category decisions for replayed inputs.
- Pipeline compatibility:
  - the same sequence of generated notes for a fixed random seed, or, if exact replication is blocked by legacy RNG differences, the same per-step subsystem outputs within defined tolerances.

## Known Risks

- Pascal `Single`, file layout, and RNG behavior may not map exactly to Python defaults.
- `HTN.DAT` is written as a Pascal binary `FILE OF ARRAY[1..64,1..64] OF REAL`; a dedicated reader may be needed to confirm element size and ordering.
- The BP code relies on update order within linked structures. A mathematically equivalent refactor may still diverge numerically unless a legacy mode preserves operation order.
- ART1 has thesis-specific modifications; replacing them with textbook ART1 would break compatibility.

## Recommended Delivery Order

1. Build legacy readers and encoders.
2. Port `Classical_instructor`.
3. Port Hopfield-Tank and verify with controlled seeds.
4. Port BP in legacy-compatible mode and replay known presentations.
5. Port ART1 with persistent state.
6. Assemble the integrated pipeline.
7. Add a second, cleaner API layer only after compatibility tests pass.

## Immediate Next Step

Implement the non-neural compatibility layer first:

- legacy file parsers,
- note/sequence encoders,
- the rule-based classical instructor,
- golden tests based on the files already in `THES`.

That gives a stable foundation for porting the three neural subsystems without losing track of what the original program actually did.
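As a starting point for the instructor port, here is a sketch of the digit-string matching described for `CLASINST.PP`. The suffix-matching rule and the shape of the parsed `example_sequences` are assumptions drawn from the description above and must be confirmed against the Pascal source before golden tests are frozen:

```python
def classical_instructor(notes, example_sequences):
    """Rule-based teaching signal for the BP critic (sketch).

    notes: the 5-note window as integers.
    example_sequences: digit strings parsed from SEQUENCE.DAT (assumed form).
    Returns 1 if the window's digit string is a suffix of any stored
    example sequence, else 0; the exact matching rule is an assumption.
    """
    window = "".join(str(n) for n in notes)
    return 1 if any(seq.endswith(window) for seq in example_sequences) else 0
```

Once the Pascal behavior is confirmed, this function becomes the first golden-test oracle: replaying every window from `SEQUENCE.DAT` must reproduce the legacy 0/1 decisions exactly.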