TriuneCadence/MIGRATION_PLAN.md


Python 3 Migration Plan

Scope

The original system is a cooperative composition pipeline built from three neural subsystems:

  • Bach: a Hopfield-Tank note generator over a 5-position by 8-note grid.
  • Salieri: a back-propagation critic trained against a rule-based classical-sequence supervisor.
  • Beethoven: an ART1 novelty/category network over the note sequence plus one classicality bit.

The immediate goal is a Python 3 package that reproduces the Pascal algorithms and file-driven behavior closely enough to validate compatibility, while replacing the Pascal linked-list memory model with direct numeric data structures.

What Exists Today

Core orchestration

  • THES/ANNCOMP.PP is the integrated driver.
  • The composition loop is effectively:
    1. Generate a candidate note with the Hopfield-Tank network.
    2. Evaluate/train the back-propagation network using the current note window and the rule-based instructor.
    3. Pass the same window plus the classical/not-classical flag into ART1.

Shared state

  • THES/GLOBALS.PP defines:
    • fixed note vocabulary of 8 notes,
    • sequence window length of 5,
    • ART1 dimensions Max_F1_nodes = 41, Max_F2_nodes = 25,
    • Common_Area_, which is the cross-network exchange object.

Hopfield-Tank subsystem

  • THES/ANNCOMP.PP implements Bach and nested HTN.
  • The network operates on a flattened 40-cell representation: 8 notes x 5 positions.
  • It loads a 64 x 64 weight matrix from HTN.DAT, but the active note grid uses the first 40 cells.
  • The update rule uses:
    • per-neuron activation a,
    • output 0.5 * (1 + tanh(a / c)),
    • resistance/capacitance/input/weight/iteration scaling factors from globals.
  • THES/HTNDATA.PP shows how the Hopfield weights were built from SEQUENCE.DAT, plus row/column inhibition and sequence reinforcement.
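The update rule above can be sketched as a simple Euler iteration. This is a minimal illustration, not the port itself: the names `dt`, `tau`, and `gain` are placeholders for the resistance/capacitance/input/iteration scaling constants in GLOBALS.PP, and the default values here are arbitrary.

```python
import numpy as np

def htn_step(a, W, I, dt=0.01, tau=1.0, gain=0.02):
    """One Euler step of the Hopfield-Tank dynamics (sketch).

    a : (40,) activation vector
    W : (40, 40) weight matrix (active subset of the 64x64 file)
    I : (40,) external input
    """
    V = 0.5 * (1.0 + np.tanh(a / gain))   # neuron outputs
    a = a + dt * (-a / tau + W @ V + I)   # leaky integration toward inputs
    return a, V

def winners_per_column(V, n_notes=8, n_positions=5):
    """Pick the max cell in each of the 5 position columns."""
    grid = V.reshape(n_positions, n_notes)
    return grid.argmax(axis=1)
```

The column-winner step corresponds to the "pick max cell in each column" post-processing described for Phase 2.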

Back-propagation subsystem

  • THES/BP_UNIT.PP is a general BP implementation with:
    • input, hidden, and output nodes,
    • weight matrix and momentum,
    • feed-forward,
    • back-propagation,
    • file-based parameter and weight loading.
  • THES/S61.DAT configures Salieri as:
    • 40 input nodes,
    • 20 hidden nodes,
    • 1 output node,
    • learning rate 0.5,
    • momentum 0.5.
  • THES/ANNCOMP.PP converts the current 5-note window into a 40-bit one-hot vector and trains the network online against Classical_instructor.
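The 40-bit encoding can be sketched as follows. The position-major bit layout (bits 8p..8p+7 holding the one-hot for position p) is an assumption to be confirmed against ANNCOMP.PP before it is treated as canonical.

```python
import numpy as np

def encode_window(notes):
    """Flatten a 5-note window (values 0..7) into a 40-bit one-hot vector."""
    notes = np.asarray(notes, dtype=int)
    if notes.shape != (5,) or notes.min() < 0 or notes.max() > 7:
        raise ValueError("expected 5 note indices in 0..7")
    v = np.zeros(40, dtype=np.uint8)
    v[np.arange(5) * 8 + notes] = 1   # one bit per position
    return v

def art_input(notes, is_classical):
    """Append the classicality bit to form the 41-bit ART1 input."""
    bit = 1 if is_classical else 0
    return np.concatenate([encode_window(notes), [bit]]).astype(np.uint8)
```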

Rule-based supervisor

  • THES/CLASINST.PP loads SEQUENCE.DAT.
  • It converts the 5-note sequence to a digit string and returns 1 if the target suffix matches any stored example sequence, else 0.
  • This acts as the teaching signal for the BP network.
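The instructor's decision rule might look like the sketch below. Whether CLASINST.PP matches the whole 5-note window as a suffix of a stored example, or uses some shorter suffix, must be confirmed from the Pascal source; this version assumes the former.

```python
def classical_instructor(window, examples):
    """Rule-based teaching signal (sketch).

    window   : 5 note indices
    examples : iterable of digit strings loaded from SEQUENCE.DAT
    Returns 1 if the window, rendered as a digit string, is a suffix
    of any stored example sequence, else 0.
    """
    target = "".join(str(n) for n in window)
    return 1 if any(ex.endswith(target) for ex in examples) else 0
```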

ART1 subsystem

  • THES/ANNCOMP.PP implements ART1.
  • F1 input is the 40-bit one-hot sequence plus one bit for Is_classical, for a total vector length of 41.
  • F2 supports up to 25 committed categories.
  • The implementation includes a nonstandard compatibility detail: when all categories are saturated and none remain eligible, vigilance is reduced by 1 percent and matching is retried.
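The saturation fallback can be sketched in isolation. The `matcher` callable and the `floor` cutoff are illustrative additions (the floor guards the sketch against looping forever; the Pascal behavior at that extreme should be checked separately).

```python
def match_with_fallback(candidates, matcher, vigilance, floor=1e-6):
    """Thesis-specific saturation fallback (sketch).

    candidates : committed category indices
    matcher    : callable (category, vigilance) -> bool, True if the
                 category passes the vigilance test
    When every category fails and no uncommitted node remains, lower
    vigilance by 1 percent and retry instead of rejecting the input.
    """
    while vigilance > floor:
        for cat in candidates:
            if matcher(cat, vigilance):
                return cat, vigilance
        vigilance *= 0.99   # reduce by 1 percent and retry
    return None, vigilance
```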

Legacy data model problem

  • THES/STRUCT.PP provides generic linked-list vectors and matrices (DVE, HVE) used to work around Turbo Pascal memory constraints.
  • THES/BP_UNIT.PP stores nodes, IO vectors, and weights through those linked structures rather than direct arrays.
  • That representation should not be preserved in Python except where needed for compatibility tests.

Target Data Model

Use explicit typed structures and dense arrays:

  • numpy.ndarray for:
    • Hopfield state vectors and weight matrices,
    • BP activations, deltas, biases, and weights,
    • ART1 F1/F2 activations and top-down/bottom-up LTM weights.
  • dataclasses.dataclass for stable API/state containers.
  • Enum for note identifiers only if it does not complicate file compatibility.

Recommended canonical encodings:

  • NoteSequence: shape (5,), integer values 0..7.
  • SequenceOneHot: shape (40,), binary.
  • ArtInputVector: shape (41,), binary.
  • HopfieldWeights: shape (40, 40) as the normalized active subset of the legacy file.
  • BPWeightsIH and BPWeightsHO as separate layer matrices, or a single legacy-compatible dense square matrix, depending on whether clarity or legacy fidelity is prioritized in a given layer of the codebase.
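Two of the proposed containers might look like this. Field names are proposals only; the authoritative field list for the exchange record is in GLOBALS.PP.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CommonArea:
    """Python stand-in for the Pascal Common_Area_ exchange record."""
    notes: list             # current 5-note window, values 0..7
    is_classical: int = 0   # Salieri's verdict for the window
    art_category: int = -1  # winning F2 node, -1 before first call

@dataclass
class HopfieldParams:
    """Hopfield-Tank configuration loaded from legacy data."""
    weights: np.ndarray     # (40, 40) active subset of HTN.DAT
    gain: float = 0.02      # placeholder for the Pascal scaling constant
```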

Package Layout

composer_ans/
  __init__.py
  types.py
  encoding.py
  io/
    __init__.py
    legacy_files.py
  hopfield.py
  backprop.py
  art1.py
  classical_rules.py
  pipeline.py
  compatibility.py
tests/
  data/
  test_encoding.py
  test_classical_rules.py
  test_hopfield.py
  test_backprop.py
  test_art1.py
  test_pipeline.py

API Design

Keep the public API small and deterministic.

from composer_ans.pipeline import CompositionContext, CompositionPipeline

ctx = CompositionContext(notes=[0, 0, 0, 0, 0])
pipeline = CompositionPipeline.from_legacy_data("THES")
result = pipeline.step(ctx)

Suggested subsystem APIs:

candidate = hopfield.generate_next_note(notes, params)
is_classical, bp_state = salieri.evaluate_and_train(notes, target=None)
art_result = beethoven.categorize(notes, is_classical)

Where:

  • target=None means "derive target from the classical instructor", matching the Pascal integrated flow.
  • Each call returns structured state useful for debugging and test baselines, not just the final scalar.
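The structured per-step state could be packaged as a small dataclass; the field names below are proposals, not legacy identifiers.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    """One pipeline step's debuggable state (sketch)."""
    note: int             # Hopfield winner appended to the window
    is_classical: int     # Salieri verdict after online training
    art_category: int     # Beethoven's winning F2 node
    new_category: bool    # whether ART1 committed a new node
```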

Migration Strategy

Phase 1: Preserve semantics, not implementation style

  • Recreate file readers for:
    • SEQUENCE.DAT,
    • S61.DAT,
    • S61.WT,
    • HTN.DAT.
  • Recreate sequence encodings exactly:
    • 5-note rolling window,
    • 40-bit one-hot flattening,
    • ART1 extra classicality bit.
  • Recreate the rule-based instructor exactly before porting the trainable models.

Deliverable:

  • A Python package that can parse legacy files and reproduce the same encoded inputs the Pascal code would produce.

Phase 2: Port Hopfield-Tank

  • Implement the continuous-time iterative update as written.
  • Preserve:
    • noise injection behavior,
    • stop condition using epsilon on alternating time buffers,
    • "pick max cell in each column" post-processing.
  • Isolate random number generation behind an injectable RNG so deterministic tests are possible.
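The RNG injection point can be as simple as an optional parameter; the function name and noise scale here are illustrative.

```python
import numpy as np

def inject_noise(a, rng=None, scale=0.01):
    """Add Gaussian noise behind an injectable RNG (sketch).

    Passing an explicitly seeded numpy Generator makes a Hopfield run
    reproducible in tests; with rng=None a fresh default generator is
    used, matching normal operation.
    """
    if rng is None:
        rng = np.random.default_rng()
    return a + rng.normal(0.0, scale, size=a.shape)
```

In a test, two calls with `np.random.default_rng(42)` produce identical noise, which is what makes fixed-seed comparisons against Pascal baselines possible.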

Deliverable:

  • generate_next_note() producing the same result as Pascal for fixed seeds and known sequences.

Phase 3: Port Salieri back-propagation

  • First implement a legacy-compatible execution mode mirroring the square-node storage and update order.
  • Then wrap it with a clearer façade that exposes standard layer matrices.
  • Preserve:
    • sigmoid behavior,
    • theta updates,
    • momentum handling,
    • online training after every presentation,
    • periodic weight dumping capability.
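One online presentation for the 40-20-1 configuration can be sketched as below. This is the clearer-façade view, not the legacy-compatible mode: theta (bias) updates are omitted for brevity but must be ported for fidelity, and lr=0.5 / mom=0.5 mirror S61.DAT.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_present(x, t, W1, W2, vW1, vW2, lr=0.5, mom=0.5):
    """One online presentation (sketch, biases omitted).

    W1 : (20, 40) input->hidden weights, W2 : (1, 20) hidden->output
    vW1, vW2 : momentum buffers of matching shape
    """
    h = sigmoid(W1 @ x)                    # hidden activations
    y = sigmoid(W2 @ h)                    # output
    d_out = (t - y) * y * (1 - y)          # output delta
    d_hid = (W2.T @ d_out) * h * (1 - h)   # hidden delta
    vW2 = lr * np.outer(d_out, h) + mom * vW2
    vW1 = lr * np.outer(d_hid, x) + mom * vW1
    return y, W1 + vW1, W2 + vW2, vW1, vW2
```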

Deliverable:

  • evaluate_and_train() matching legacy outputs and weight updates for a controlled presentation sequence.

Phase 4: Port Beethoven ART1

  • Port the F1/F2 STM and LTM equations directly.
  • Preserve:
    • 41-bit input vector,
    • eligibility and commitment logic,
    • resonance loop,
    • modified vigilance-reduction behavior on saturation.
  • Keep ART1 state persistent across calls, because the Pascal version learns over the composition session.

Deliverable:

  • categorize() returning winner, new-category flag, vigilance-change flag, and current category count.

Phase 5: Rebuild the integrated pipeline

  • Recreate Common_Area_ as a Python dataclass.
  • Implement a single-step pipeline equivalent to one iteration of the Pascal composition loop.
  • Add an optional batch runner that emits a complete composition and an event log.

Deliverable:

  • End-to-end run over a fixed number of notes using legacy data assets.

Compatibility Plan

Compatibility should be measured in layers:

  • Encoding compatibility:
    • identical one-hot vectors and ART input vectors for the same note windows.
  • File compatibility:
    • legacy .DAT and .WT files load without manual editing.
  • Behavioral compatibility:
    • same classical instructor decisions,
    • same Hopfield winner for fixed seed/input,
    • same BP output progression for replayed presentations,
    • same ART1 category decisions for replayed inputs.
  • Pipeline compatibility:
    • same sequence of generated notes for a fixed random seed, or if exact replication is blocked by legacy RNG differences, same per-step subsystem outputs within defined tolerances.
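A per-step compatibility check against a recorded Pascal baseline might look like this. The golden-record key names are illustrative; the point is that integer decisions must match exactly while float outputs only need to match within tolerance.

```python
def assert_step_compatible(py_step, golden, atol=1e-6):
    """Compare one Python pipeline step against a legacy baseline dict,
    e.g. {"note": 3, "bp_output": 0.8123, "art_category": 7}."""
    assert py_step["note"] == golden["note"]                      # exact
    assert py_step["art_category"] == golden["art_category"]      # exact
    assert abs(py_step["bp_output"] - golden["bp_output"]) <= atol  # tolerance
```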

Known Risks

  • Pascal Single, file layout, and RNG behavior may not map exactly to Python defaults.
  • HTN.DAT is written as a Pascal binary FILE OF ARRAY[1..64,1..64] OF REAL; a dedicated reader may be needed to confirm element size and ordering.
  • The BP code relies on update order within linked structures. A mathematically equivalent refactor may still diverge numerically unless a legacy mode preserves operation order.
  • ART1 has thesis-specific modifications; replacing them with textbook ART1 would break compatibility.
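For the HTN.DAT risk specifically: if the file was written by classic Turbo Pascal, each REAL is a 6-byte Real48, not an IEEE double. The decoder below assumes the standard Real48 layout (byte 0 is the exponent with bias 129, bytes 1..5 the mantissa little-endian with the sign in the top bit of byte 5, exponent 0 meaning 0.0); if the file actually holds IEEE doubles, `struct.unpack('<d', ...)` applies instead and this helper is unnecessary.

```python
def real48_to_float(b):
    """Decode one 6-byte Turbo Pascal REAL (Real48) to a Python float."""
    exponent = b[0]
    if exponent == 0:
        return 0.0  # Real48 encodes zero as exponent byte 0
    sign = -1.0 if b[5] & 0x80 else 1.0
    mantissa = int.from_bytes(
        bytes([b[1], b[2], b[3], b[4], b[5] & 0x7F]), "little"
    )
    return sign * (1.0 + mantissa / 2**39) * 2.0 ** (exponent - 129)
```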
Execution Order

  1. Build legacy readers and encoders.
  2. Port Classical_instructor.
  3. Port Hopfield-Tank and verify with controlled seeds.
  4. Port BP in legacy-compatible mode and replay known presentations.
  5. Port ART1 with persistent state.
  6. Assemble the integrated pipeline.
  7. Add a second, cleaner API layer only after compatibility tests pass.

Immediate Next Step

Implement the non-neural compatibility layer first:

  • legacy file parsers,
  • note/sequence encoders,
  • rule-based classical instructor,
  • golden tests based on the files already in THES.

That gives a stable foundation for porting the three neural subsystems without losing track of what the original program actually did.