# Python 3 Migration Plan

## Scope
The original system is a cooperative composition pipeline built from three neural subsystems:

- `Bach`: a Hopfield-Tank note generator over a 5-position by 8-note grid.
- `Salieri`: a back-propagation critic trained against a rule-based classical-sequence supervisor.
- `Beethoven`: an ART1 novelty/category network over the note sequence plus one classicality bit.

The immediate goal is a Python 3 package that reproduces the Pascal algorithms and file-driven behavior closely enough to validate compatibility, while replacing the Pascal linked-list memory model with direct numeric data structures.
## What Exists Today

### Core orchestration
- `THES/ANNCOMP.PP` is the integrated driver.
- The composition loop is effectively:
  1. Generate a candidate note with the Hopfield-Tank network.
  2. Evaluate/train the back-propagation network using the current note window and the rule-based instructor.
  3. Pass the same window plus the classical/not-classical flag into ART1.
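The loop above can be sketched as a single dependency-injected step; the subsystem call shapes here are assumptions matching the API proposed later in this plan, not the Pascal interfaces:

```python
def compose_step(notes, bach, salieri, beethoven):
    """One iteration of the composition loop (sketch).

    notes: the current 5-note window as a list of note indices.
    bach/salieri/beethoven: the three subsystem ports (hypothetical APIs).
    """
    candidate = bach.generate_next_note(notes)              # 1. Hopfield-Tank proposal
    window = notes[1:] + [candidate]                        # slide the 5-note window
    is_classical, _ = salieri.evaluate_and_train(window, target=None)  # 2. BP critic
    category = beethoven.categorize(window, is_classical)   # 3. ART1 novelty/category
    return window, is_classical, category
```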
### Shared state

- `THES/GLOBALS.PP` defines:
  - the fixed note vocabulary of 8 notes,
  - the sequence window length of 5,
  - the ART1 dimensions `Max_F1_nodes = 41` and `Max_F2_nodes = 25`,
  - `Common_Area_`, which is the cross-network exchange object.

### Hopfield-Tank subsystem
- `THES/ANNCOMP.PP` implements `Bach` and the nested `HTN`.
- The network operates on a flattened 40-cell representation: `8 notes x 5 positions`.
- It loads a `64 x 64` weight matrix from `HTN.DAT`, but the active note grid uses only the first 40 cells.
- The update rule uses:
  - per-neuron activation `a`,
  - output `0.5 * (1 + tanh(a / c))`,
  - resistance/capacitance/input/weight/iteration scaling factors from globals.
- `THES/HTNDATA.PP` shows how the Hopfield weights were built from `SEQUENCE.DAT`, plus row/column inhibition and sequence reinforcement.
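The update rule can be sketched as one Euler step of the continuous Hopfield-Tank dynamics. The parameter names `c`, `tau`, and `dt` are stand-ins for the resistance/capacitance/iteration scaling factors in globals, not the Pascal identifiers:

```python
import numpy as np

def hopfield_step(a, W, I, c=1.0, tau=1.0, dt=0.1):
    """One Euler update of the 40-cell Hopfield-Tank grid (sketch).

    a : activations, shape (40,)
    W : weights, shape (40, 40) (active subset of the 64 x 64 file)
    I : external inputs, shape (40,)
    """
    v = 0.5 * (1.0 + np.tanh(a / c))   # neuron outputs in (0, 1)
    da = (-a / tau + W @ v + I) * dt   # leaky integration toward the inputs
    return a + da, v
```

The real port must also reproduce the noise injection and the alternating-buffer stop condition described under Phase 2.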
### Back-propagation subsystem

- `THES/BP_UNIT.PP` is a general BP implementation with:
  - input, hidden, and output nodes,
  - a weight matrix and momentum,
  - feed-forward,
  - back-propagation,
  - file-based parameter and weight loading.
- `THES/S61.DAT` configures Salieri as:
  - 40 input nodes,
  - 20 hidden nodes,
  - 1 output node,
  - learning rate `0.5`,
  - momentum `0.5`.
- `THES/ANNCOMP.PP` converts the current 5-note window into a 40-bit one-hot vector and trains the network online against `Classical_instructor`.
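The window-to-vector conversion can be sketched as follows. This assumes 0-based note indices and position-major flattening; the actual ordering must be confirmed against `ANNCOMP.PP`:

```python
import numpy as np

def sequence_one_hot(notes, n_notes=8):
    """Encode a 5-note window as the 40-bit one-hot vector fed to Salieri.

    notes: sequence of note indices in 0..n_notes-1 (ordering is an
    assumption pending verification against the Pascal source).
    """
    vec = np.zeros(len(notes) * n_notes, dtype=np.uint8)
    for pos, note in enumerate(notes):
        vec[pos * n_notes + note] = 1
    return vec
```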
### Rule-based supervisor

- `THES/CLASINST.PP` loads `SEQUENCE.DAT`.
- It converts the 5-note sequence to a digit string and returns `1` if the target suffix matches any stored example sequence, else `0`.
- This acts as the teaching signal for the BP network.
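One plausible reading of that rule can be sketched as a suffix test against the stored digit strings; the exact matching direction and semantics must be confirmed against `CLASINST.PP` before this is trusted:

```python
def classical_instructor(window, known_sequences):
    """Return 1 if the window matches a stored classical example, else 0.

    window: sequence of note indices (rendered as a digit string).
    known_sequences: digit strings loaded from SEQUENCE.DAT.
    This sketch treats the window as a suffix of a stored sequence;
    verify against the Pascal source.
    """
    target = "".join(str(n) for n in window)
    return 1 if any(seq.endswith(target) for seq in known_sequences) else 0
```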
### ART1 subsystem

- `THES/ANNCOMP.PP` implements `ART1`.
- F1 input is the 40-bit one-hot sequence plus one bit for `Is_classical`, for a total vector length of 41.
- F2 supports up to 25 committed categories.
- The implementation includes a nonstandard compatibility detail: when all categories are saturated and none remain eligible, vigilance is reduced by 1 percent and matching is retried.
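The saturation fallback can be sketched as a retry wrapper around the category search. The match test below is a deliberately simplified stand-in for the full ART1 resonance loop (overlap ratio against binary prototypes), shown only to illustrate the vigilance-relaxation behavior:

```python
def art1_choose(input_vec, categories, vigilance):
    """Pick the first category whose match ratio passes vigilance, else None.

    Simplified stand-in for ART1 search: categories are binary prototype
    vectors; the match ratio is |input AND prototype| / |input|.
    """
    norm = sum(input_vec)
    for idx, proto in enumerate(categories):
        overlap = sum(i & p for i, p in zip(input_vec, proto))
        if norm and overlap / norm >= vigilance:
            return idx
    return None

def categorize_with_relaxation(input_vec, categories, vigilance):
    """Thesis-specific fallback: on saturation, cut vigilance by 1% and retry."""
    while True:
        winner = art1_choose(input_vec, categories, vigilance)
        if winner is not None:
            return winner, vigilance
        vigilance *= 0.99
        if vigilance < 1e-6:   # guard against degenerate zero-overlap inputs
            return None, vigilance
```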
### Legacy data model problem

- `THES/STRUCT.PP` provides generic linked-list vectors and matrices (`DVE`, `HVE`) used to work around Turbo Pascal memory constraints.
- `THES/BP_UNIT.PP` stores nodes, IO vectors, and weights through those linked structures rather than direct arrays.
- That representation should not be preserved in Python except where needed for compatibility tests.

## Recommended Python Representation
Use explicit typed structures and dense arrays:

- `numpy.ndarray` for:
  - Hopfield state vectors and weight matrices,
  - BP activations, deltas, biases, and weights,
  - ART1 F1/F2 activations and top-down/bottom-up LTM weights.
- `dataclasses.dataclass` for stable API/state containers.
- `Enum` for note identifiers only if it does not complicate file compatibility.

Recommended canonical encodings:

- `NoteSequence`: shape `(5,)`, integer values `0..7` (or `1..8` if the Pascal 1-based indexing is kept).
- `SequenceOneHot`: shape `(40,)`, binary.
- `ArtInputVector`: shape `(41,)`, binary.
- `HopfieldWeights`: shape `(40, 40)`, the normalized active subset of the legacy `64 x 64` file.
- `BPWeightsIH` and `BPWeightsHO`, or one legacy-compatible dense square matrix, depending on whether fidelity or clarity is prioritized in a given layer of the codebase.
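A minimal sketch of how these encodings could hang together in a dataclass; the field names are assumptions, not the Pascal `Common_Area_` identifiers, and the 0-based position-major encoding must be checked against the legacy code:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class CommonArea:
    """Python stand-in for the Pascal Common_Area_ exchange object (sketch)."""
    notes: np.ndarray = field(default_factory=lambda: np.zeros(5, dtype=np.int64))
    is_classical: bool = False
    art_category: int = -1

    def art_input(self) -> np.ndarray:
        """Build the 41-bit ART1 input: 40-bit one-hot window + classicality bit."""
        one_hot = np.zeros(40, dtype=np.uint8)
        one_hot[np.arange(5) * 8 + self.notes] = 1
        return np.append(one_hot, np.uint8(self.is_classical))
```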
## Package Layout

```text
composer_ans/
    __init__.py
    types.py
    encoding.py
    io/
        __init__.py
        legacy_files.py
    hopfield.py
    backprop.py
    art1.py
    classical_rules.py
    pipeline.py
    compatibility.py
    tests/
        data/
        test_encoding.py
        test_classical_rules.py
        test_hopfield.py
        test_backprop.py
        test_art1.py
        test_pipeline.py
```
## API Design

Keep the public API small and deterministic.

```python
from composer_ans.pipeline import CompositionContext, CompositionPipeline

ctx = CompositionContext(notes=[0, 0, 0, 0, 0])
pipeline = CompositionPipeline.from_legacy_data("THES")
result = pipeline.step(ctx)
```

Suggested subsystem APIs:

```python
candidate = hopfield.generate_next_note(notes, params)
is_classical, bp_state = salieri.evaluate_and_train(notes, target=None)
art_result = beethoven.categorize(notes, is_classical)
```

Where:

- `target=None` means "derive the target from the classical instructor", matching the Pascal integrated flow.
- Each call returns structured state useful for debugging and test baselines, not just the final scalar.
## Migration Strategy

### Phase 1: Preserve semantics, not implementation style

- Recreate file readers for:
  - `SEQUENCE.DAT`,
  - `S61.DAT`,
  - `S61.WT`,
  - `HTN.DAT`.
- Recreate the sequence encodings exactly:
  - the 5-note rolling window,
  - the 40-bit one-hot flattening,
  - the extra ART1 classicality bit.
- Recreate the rule-based instructor exactly before porting the trainable models.

Deliverable:

- A Python package that can parse legacy files and reproduce the same encoded inputs the Pascal code would produce.
### Phase 2: Port Hopfield-Tank

- Implement the continuous-time iterative update as written.
- Preserve:
  - the noise injection behavior,
  - the stop condition using epsilon on alternating time buffers,
  - the "pick max cell in each column" post-processing.
- Isolate random number generation behind an injectable RNG so deterministic tests are possible.
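The RNG seam can be sketched as a constructor-injected `numpy.random.Generator`; everything except the noise source is elided, and the names are assumptions:

```python
import numpy as np

class HopfieldTank:
    """Minimal sketch of the injectable-RNG seam for Phase 2."""

    def __init__(self, rng=None):
        # Production code gets a fresh generator; tests pass a seeded one.
        self.rng = rng if rng is not None else np.random.default_rng()

    def noise(self, scale=0.01):
        """Per-cell noise injected into the 40-cell grid each iteration."""
        return self.rng.uniform(-scale, scale, size=40)
```

Usage: `HopfieldTank(rng=np.random.default_rng(42))` makes a run fully reproducible, which is what the fixed-seed deliverable below depends on.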
Deliverable:

- `generate_next_note()` producing the same result as Pascal for fixed seeds and known sequences.

### Phase 3: Port Salieri back-propagation

- First implement a legacy-compatible execution mode mirroring the square-node storage and update order.
- Then wrap it with a clearer façade that exposes standard layer matrices.
- Preserve:
  - the sigmoid behavior,
  - the theta updates,
  - the momentum handling,
  - online training after every presentation,
  - the periodic weight-dumping capability.

Deliverable:

- `evaluate_and_train()` matching legacy outputs and weight updates for a controlled presentation sequence.
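The "clear façade" form can be sketched as a standard 40-20-1 online BP step using the S61.DAT hyperparameters (learning rate 0.5, momentum 0.5). This is NOT the legacy-compatible mode: the storage layout and update order differ from `BP_UNIT.PP`, and the theta (bias) updates are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Salieri:
    """Hedged 40-20-1 online BP sketch; names and layout are assumptions."""

    def __init__(self, lr=0.5, momentum=0.5, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        self.lr, self.momentum = lr, momentum
        self.w_ih = rng.uniform(-0.5, 0.5, (20, 40))  # input -> hidden
        self.w_ho = rng.uniform(-0.5, 0.5, (1, 20))   # hidden -> output
        self.v_ih = np.zeros_like(self.w_ih)          # momentum buffers
        self.v_ho = np.zeros_like(self.w_ho)

    def evaluate_and_train(self, x, target):
        h = sigmoid(self.w_ih @ x)                    # hidden activations
        y = sigmoid(self.w_ho @ h)                    # output activation
        d_o = (target - y) * y * (1 - y)              # output delta
        d_h = (self.w_ho.T @ d_o) * h * (1 - h)       # hidden delta
        self.v_ho = self.lr * np.outer(d_o, h) + self.momentum * self.v_ho
        self.v_ih = self.lr * np.outer(d_h, x) + self.momentum * self.v_ih
        self.w_ho += self.v_ho
        self.w_ih += self.v_ih
        return float(y)
```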
### Phase 4: Port Beethoven ART1

- Port the F1/F2 STM and LTM equations directly.
- Preserve:
  - the 41-bit input vector,
  - the eligibility and commitment logic,
  - the resonance loop,
  - the modified vigilance-reduction behavior on saturation.
- Keep ART1 state persistent across calls, because the Pascal version learns over the composition session.

Deliverable:

- `categorize()` returning the winner, a new-category flag, a vigilance-change flag, and the current category count.
### Phase 5: Rebuild the integrated pipeline

- Recreate `Common_Area_` as a Python dataclass.
- Implement a single-step pipeline equivalent to one iteration of the Pascal composition loop.
- Add an optional batch runner that emits a complete composition and an event log.

Deliverable:

- An end-to-end run over a fixed number of notes using legacy data assets.
## Compatibility Plan

Compatibility should be measured in layers:

- Encoding compatibility:
  - identical one-hot vectors and ART input vectors for the same note windows.
- File compatibility:
  - legacy `.DAT` and `.WT` files load without manual editing.
- Behavioral compatibility:
  - the same classical-instructor decisions,
  - the same Hopfield winner for a fixed seed and input,
  - the same BP output progression for replayed presentations,
  - the same ART1 category decisions for replayed inputs.
- Pipeline compatibility:
  - the same sequence of generated notes for a fixed random seed, or, if exact replication is blocked by legacy RNG differences, the same per-step subsystem outputs within defined tolerances.
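Encoding compatibility can be enforced with golden files. A sketch of a generic checker; the JSON record shape (`"notes"`/`"one_hot"`) is an assumption about how the baselines could be stored, not an existing format:

```python
import json

def check_golden(golden_path, encode):
    """Compare an encoder against stored golden cases; return mismatched inputs.

    golden_path: JSON file containing a list of {"notes", "one_hot"} records.
    encode: any callable mapping a note window to a bit sequence.
    """
    with open(golden_path) as f:
        cases = json.load(f)
    return [c["notes"] for c in cases
            if list(encode(c["notes"])) != list(c["one_hot"])]
```

A test then asserts `check_golden(path, sequence_one_hot) == []` against baselines captured from the Pascal program's inputs.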
## Known Risks

- Pascal `Single`, file layout, and RNG behavior may not map exactly to Python defaults.
- `HTN.DAT` is written as a Pascal binary `FILE OF ARRAY[1..64,1..64] OF REAL`; a dedicated reader may be needed to confirm element size and ordering.
- The BP code relies on update order within linked structures. A mathematically equivalent refactor may still diverge numerically unless a legacy mode preserves operation order.
- ART1 has thesis-specific modifications; replacing them with textbook ART1 would break compatibility.
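If `HTN.DAT` uses the classic 6-byte Turbo Pascal `REAL` (Real48), a `64 x 64` matrix should occupy exactly 24576 bytes, which is an easy first check. NumPy has no Real48 dtype, so a converter is needed; a sketch assuming the standard Real48 layout (exponent byte first, 39-bit little-endian mantissa, sign in the top bit of the last byte):

```python
def real48_to_float(b: bytes) -> float:
    """Convert one 6-byte Turbo Pascal Real48 value to a Python float.

    Layout assumed: b[0] = exponent biased by 129 (0 means the value is 0),
    b[1:6] = mantissa little-endian with an implicit leading 1,
    high bit of b[5] = sign.
    """
    exp = b[0]
    if exp == 0:
        return 0.0
    sign = -1.0 if b[5] & 0x80 else 1.0
    # 39-bit mantissa: bytes 1..4 plus the low 7 bits of byte 5
    mantissa = int.from_bytes(b[1:5], "little") | ((b[5] & 0x7F) << 32)
    return sign * (1.0 + mantissa / 2**39) * 2.0 ** (exp - 129)
```

If the files were instead produced by a compiler using 4- or 8-byte reals, the file size check will reveal that immediately and `numpy.fromfile` with `float32`/`float64` applies instead.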
## Recommended Delivery Order

1. Build legacy readers and encoders.
2. Port `Classical_instructor`.
3. Port Hopfield-Tank and verify with controlled seeds.
4. Port BP in legacy-compatible mode and replay known presentations.
5. Port ART1 with persistent state.
6. Assemble the integrated pipeline.
7. Add a second, cleaner API layer only after compatibility tests pass.

## Immediate Next Step

Implement the non-neural compatibility layer first:

- legacy file parsers,
- note/sequence encoders,
- rule-based classical instructor,
- golden tests based on the files already in `THES`.

That gives a stable foundation for porting the three neural subsystems without losing track of what the original program actually did.