# Results In Hand

Updated: 2026-04-12

## Purpose

This note records the concrete Track 1 outputs currently available from recent
local runs. It is intentionally narrower than a full replication report. Its
job is to answer a simpler question: what results do we actually have today,
what do they already imply, and what remains too provisional to treat as a
paper-facing conclusion.

An important caveat: most current outputs still live in `/tmp` rather than in
the repo-managed `runs/results/` tree.

## Main Artifacts

### Small report package

Primary artifacts:

- `/tmp/track1-report-small/report.md`
- `/tmp/track1-report-small/tracking_summary.json`
- `/tmp/track1-report-small/aggregate_series.json`
- `/tmp/track1-report-small/*.png`

Parameters:

- `K = 5000`
- `N0 = 20`
- `n = 1`
- `u = 5e-6`
- derived `M = 0.05`
- `R = 10`
- `T = 20`
- `epochs = 8`
- `runs = 2`
- `seed_start = 1`

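
For the parameter sets in this note that report both `u` and a derived `M`,
the values match `M = 2 * K * u` (for example `2 * 5000 * 5e-6 = 0.05`). The
sketch below checks that relationship. The formula is inferred from the numbers
quoted here, not read from the implementation, and because every listed run
uses `n = 1` it cannot tell whether `n` also enters the definition.

```python
import math

def derived_m(k: int, u: float) -> float:
    """Assumed relationship M = 2 * K * u, inferred from values in this note.

    Whether the real implementation also multiplies by n cannot be checked
    here, because every listed run uses n = 1.
    """
    return 2 * k * u

# (K, u, reported M) taken from the parameter lists in this note.
cases = [
    (5000, 5e-6, 0.05),  # small report package
    (500, 0.001, 1.0),   # small extinction dataset, low u
    (500, 0.005, 5.0),   # small extinction dataset, high u
]

for k, u, reported in cases:
    m = derived_m(k, u)
    # Relative tolerance absorbs floating-point rounding in 2 * K * u.
    assert math.isclose(m, reported, rel_tol=1e-9), (k, u, m, reported)
    print(f"K={k}, u={u}: M={m:g} (reported {reported:g})")
```
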

Observed behavior:

- both runs lag substantially behind the moving target
- one run's allele value never leaves zero
- the other run begins adapting at `t = 12` and stays nonzero through `t = 46`,
  but still ends with a large negative tracking gap
- final gaps are about `-1.25` and `-1.30`
- the mean absolute tracking gap ranges from about `0.53` to `0.59` across runs

Interpretation:

- this regime appears to be near or beyond the limits of persistence
- adaptation can occur transiently without being sufficient to maintain
  tracking
- low mutation supply at this setting produces severe and persistent lag

The aggregate population trajectory rises rapidly toward carrying capacity and
then declines sharply once the moving optimum begins to outrun the population.
By roughly `t = 26`, only one of the two runs is still contributing to the
reported mean series, so later points in that series reflect a single run.

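
The gap statistics and the survivor-only mean series above can be sketched from
per-generation data. This assumes, hypothetically, that the tracking gap at
generation `t` is the population's allele value minus the moving optimum (so a
negative gap means the population lags), that a run's series simply stops when
it goes extinct, and that the aggregate mean at each `t` averages only the runs
still alive. The toy numbers are invented and are not taken from the report
package.

```python
from statistics import mean

def tracking_gaps(trait, optimum):
    """Per-generation gap: allele value minus optimum (assumed definition)."""
    return [z - theta for z, theta in zip(trait, optimum)]

def summarize_run(trait, optimum):
    gaps = tracking_gaps(trait, optimum)
    return {
        "final_gap": gaps[-1],
        "mean_abs_tracking_gap": mean(abs(g) for g in gaps),
    }

def surviving_mean(series_by_run):
    """Average each generation over the runs that still have data at that t.

    Returns (t, mean, n_runs) tuples so that points backed by fewer runs
    stay visible instead of silently reflecting survivorship.
    """
    horizon = max(len(s) for s in series_by_run)
    out = []
    for t in range(horizon):
        alive = [s[t] for s in series_by_run if t < len(s)]
        out.append((t, mean(alive), len(alive)))
    return out

# Invented toy data: the optimum moves by 0.1 per generation.
optimum = [0.1 * t for t in range(6)]
run_a = [0.0, 0.0, 0.0, 0.0]             # never adapts; no data after t = 3
run_b = [0.0, 0.0, 0.1, 0.2, 0.25, 0.3]  # adapts late but still lags

print(summarize_run(run_b, optimum[: len(run_b)]))
print(surviving_mean([run_a, run_b]))
```

Keeping the run count next to each mean makes it explicit when a stretch of the
aggregate series rests on a single surviving run.
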
### Small extinction dataset

Primary artifacts:

- `/tmp/track1-extinction-dataset-small/`
- `/tmp/track1-extinction-dataset-small/run_rows.jsonl`
- `/tmp/track1-extinction-fit-small-payload.json`

Grid:

- `K = 500`
- `N0 in {20, 500}`
- `u in {0.001, 0.005}`
- derived `M in {1, 5}`
- `T = 10`
- `epochs = 2`
- `n = 1`
- `runs_per_treatment = 2`


Observed behavior:

- all 8 runs (4 treatments × 2 runs) survive
- no treatment in this toy grid produces extinction
- higher `M` generally reduces lag and removes long zero-mutation streaks
- larger `N0` also improves final tracking

Interpretation:

- this dataset is useful as a smoke test for reporting and dataset generation
- it is not suitable for extinction modeling because there is no variation in
  outcome

The fitting payload states this explicitly:

- `fit_status = "insufficient_outcome_variation"`
- `extinction_count = 0`
- `non_extinction_count = 8`

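
A guard of this kind is needed because, with only one outcome class, the
maximum-likelihood intercept of a logistic model diverges and no finite fit
exists. The sketch below reproduces the payload fields above by counting
outcomes before any fit is attempted. It assumes, hypothetically, that each line
of `run_rows.jsonl` is a JSON object with a boolean `extinct` field; the real
field name, the `"ok"` status string, and the repo's fitting code may all differ.

```python
import json

def outcome_counts(jsonl_lines):
    """Count extinct vs. surviving runs (assumes a boolean `extinct` field)."""
    extinct = survived = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        if row["extinct"]:
            extinct += 1
        else:
            survived += 1
    return extinct, survived

def fit_precheck(jsonl_lines):
    """Return a payload stub; a logistic fit needs both outcomes present."""
    extinct, survived = outcome_counts(jsonl_lines)
    # "ok" is a hypothetical placeholder for the status of a fittable dataset.
    status = "ok" if extinct > 0 and survived > 0 else "insufficient_outcome_variation"
    return {
        "fit_status": status,
        "extinction_count": extinct,
        "non_extinction_count": survived,
    }

# Eight surviving runs, mirroring the small extinction dataset's outcome.
rows = [json.dumps({"run": i, "extinct": False}) for i in range(8)]
print(fit_precheck(rows))
```
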
### Designed-grid extinction dataset

Primary artifacts:

- `/tmp/track1-extinction-dataset-designed-grid/`
- `/tmp/track1-extinction-dataset-designed-grid/run_rows.jsonl`
- `/tmp/track1-extinction-fit-designed-grid-payload.json`

Grid:

- `K = 500`
- `N0 in {20, 500}`
- `u in {0.0, 0.0001, 0.0005, 0.001, 0.005}`
- `T in {5, 10, 20}`
- derived `M` varies with `u`
- `epochs = 8`
- `n = 1`
- `runs_per_treatment = 4`

Scale:

- `30` treatments
- `120` runs
- `5559` generation rows

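
The scale figures follow directly from the grid: `2` values of `N0`, `5` of
`u`, and `3` of `T` give `30` treatments, and `4` runs each give `120` runs. The
sketch enumerates the grid and, assuming the same `M = 2 * K * u` relation that
fits the other parameter sets in this note, lists the implied `M` values. One
consequence worth checking: `u = 0.0` implies `M = 0`, for which the fitted
feature `log_M` is undefined, so the fitting code must handle those runs in some
way that this note does not record.

```python
import itertools

K = 500
N0_VALUES = [20, 500]
U_VALUES = [0.0, 0.0001, 0.0005, 0.001, 0.005]
T_VALUES = [5, 10, 20]
RUNS_PER_TREATMENT = 4

treatments = list(itertools.product(N0_VALUES, U_VALUES, T_VALUES))
n_runs = len(treatments) * RUNS_PER_TREATMENT

def implied_m(k, u):
    # Assumed relation M = 2 * K * u; not confirmed against the implementation.
    return 2 * k * u

m_values = sorted({implied_m(K, u) for u in U_VALUES})
# log(M) exists only for M > 0, so the u = 0.0 arm needs special handling.
undefined_log_m = [m for m in m_values if m <= 0]

print(len(treatments), n_runs)
print(m_values)
print("M values with undefined log_M:", undefined_log_m)
```
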

Observed behavior:

- `95` extinctions
- `25` non-extinctions
- the current logistic-style fit converges

Included fitted features:

- `log_M`
- `inv_T`
- `n`
- `log_K`
- `log_N0_over_K`
- `mean_abs_tracking_gap`
- `fraction_generations_below_replacement`
- `longest_zero_mutation_streak`
- `cumulative_mutation_shortfall_per_generation`

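
Two of the trajectory features above are straightforward to state precisely
once per-generation rows exist. The sketch assumes, hypothetically, that each
generation records a count of new mutations and a population size, and it treats
"below replacement" as a generation-to-generation decline in size (a growth
ratio under `1`). The dataset's actual columns and definitions may differ;
`cumulative_mutation_shortfall_per_generation` is left out because its
definition is not recorded in this note.

```python
def longest_zero_mutation_streak(new_mutations):
    """Longest run of consecutive generations with no new mutations."""
    longest = current = 0
    for count in new_mutations:
        current = current + 1 if count == 0 else 0
        longest = max(longest, current)
    return longest

def fraction_generations_below_replacement(sizes):
    """Share of transitions with N[t+1] < N[t] (assumed definition).

    For positive sizes this is the same as a growth ratio below 1.
    """
    transitions = list(zip(sizes, sizes[1:]))
    if not transitions:
        return 0.0
    shrinking = sum(1 for prev, nxt in transitions if nxt < prev)
    return shrinking / len(transitions)

# Invented toy trajectory, not drawn from the designed-grid dataset.
new_mutations = [1, 0, 0, 0, 2, 0, 0, 1]
sizes = [20, 40, 80, 160, 150, 120, 90, 60]

print(longest_zero_mutation_streak(new_mutations))
print(fraction_generations_below_replacement(sizes))
```
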

Interpretation:

- this is the first dataset in hand that is large enough to support actual
  extinction-model fitting
- the included predictors are biologically plausible and align with the current
  diagnostic story: mutation supply, pace of environmental change, tracking
  lag, and time spent below replacement all appear to matter

Caution:

- the reported fit quality is unusually strong for a `120`-run dataset with only
  `25` non-extinctions
- for now it should be treated as an in-sample descriptive fit, not a
  validated predictive model
- no cross-validation or held-out assessment is yet recorded in the repo

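
One way to move from the in-sample fit toward a held-out assessment is grouped
cross-validation: hold out whole treatments, so the four replicate runs of a
treatment never land on both sides of a split. With `25` non-extinctions and
nine fitted features there are fewer than three minority-class outcomes per
predictor, a setting in which in-sample fits are prone to optimism. The sketch
below uses a hand-rolled, lightly L2-regularised logistic regression so that it
runs without third-party packages; it is a stand-in for, not a copy of, the
repo's logistic-style fit, and the data are synthetic.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, l2=1.0, lr=0.5, steps=500):
    """Batch gradient descent on L2-penalised log-loss; w[0] is the intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(d + 1):
            penalty = l2 * w[j] if j > 0 else 0.0  # intercept is not penalised
            w[j] -= lr * (grad[j] + penalty) / n
    return w

def log_loss(w, X, y, eps=1e-12):
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def grouped_cv(X, y, groups, k=5, seed=0):
    """Hold out whole groups (e.g. treatments) per fold; return held-out losses."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    losses = []
    for fold in range(k):
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        losses.append(log_loss(w, [X[i] for i in test], [y[i] for i in test]))
    return losses

# Synthetic stand-in: 30 "treatments" x 4 runs with two made-up features.
rng = random.Random(1)
X, y, groups = [], [], []
for t in range(30):
    level = rng.uniform(-2, 2)  # treatment-level effect shared by its runs
    for _ in range(4):
        x = [level + rng.gauss(0, 0.5), rng.gauss(0, 1)]
        X.append(x)
        y.append(1 if x[0] + rng.gauss(0, 0.5) > 0 else 0)
        groups.append(t)

print([round(v, 3) for v in grouped_cv(X, y, groups)])
```

Comparing the mean held-out log-loss with the in-sample loss on the same
features gives a rough measure of how much of the reported fit quality is
optimism.
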
## Figure 1 Cache State

Primary artifacts:

- `/tmp/track1-figure1-paper-m005-cache.json`
- `/tmp/track1-figure1-paper-m10-cache.json`
- `/tmp/track1-search-m10-n1-runs10-cache.json`
- `/tmp/track1-search-m10-n1-runs10-t1-20-cache.json`

### Paper-scale caches with `N0 = K = 5000`

For the paper-style cached sweeps:

- low mutation supply (`M = 0.05`) shows `20/20` extinctions at all displayed
  `n` values for `T = 1.0`, `1.02`, `1.05`, and `1.10`
- even at `M = 10`, runs at the displayed `T` values remain overwhelmingly
  extinct, with only a slight improvement for `n = 1` around `T = 10`

Interpretation:

- under the current implementation, paper-scale initialization with `N0 = K`
  makes these regimes extremely extinction-prone
- increasing mutation supply helps, but does not obviously eliminate the
  problem in the currently cached low-`T` range

### Exploratory cache with `N0 = 20`

The smaller exploratory threshold caches for `M = 10`, `n = 1`, and `runs = 10`
show:

- `0/10` extinctions for `T = 5`, `5.1`, `5.25`, and `5.5`
- `0/10` extinctions for `T = 1`, `1.02`, `1.05`, and `1.1`

Interpretation:

- the current results are highly sensitive to initialization, especially
  `N0 / K`
- this is not a minor implementation detail: it decides whether the same
  nominal treatment appears safely persistent or uniformly extinct

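
Counts such as `0/10` and `20/20` constrain the underlying extinction
probability only loosely. The exact one-sided 95% (Clopper-Pearson) upper bound
after zero events in `n` trials is `1 - 0.05 ** (1 / n)`, and the matching lower
bound after `n` events in `n` trials is `0.05 ** (1 / n)`. The sketch computes
both: `0/10` is still consistent with an extinction probability of up to about
26%, and `20/20` with one as low as about 86%. The contrast between the two
caches therefore holds, but `runs = 10` alone cannot establish near-certain
persistence.

```python
def upper_bound_zero_events(n, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound on p after 0 events in n trials."""
    return 1 - alpha ** (1 / n)

def lower_bound_all_events(n, alpha=0.05):
    """Exact one-sided (1 - alpha) lower bound on p after n events in n trials."""
    return alpha ** (1 / n)

print(f"0/10 extinct  -> p(extinct) <= {upper_bound_zero_events(10):.3f}")  # 0.259
print(f"20/20 extinct -> p(extinct) >= {lower_bound_all_events(20):.3f}")   # 0.861
```
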
## What The Results Already Say
The current outputs already support the following claims:
1. The Track 1 reporting and dataset stack is operational enough to produce
coherent run reports, row-level datasets, and extinction-model payloads.
2. Low mutation supply can leave the population far behind the moving optimum,
even when transient adaptation occurs.
3. Higher mutation supply and larger initial population improve tracking in the
tested small-grid runs.
4. Extinction behavior is strongly sensitive to initialization conventions,
especially whether runs begin at low `N0` or at `N0 = K`.
## What Is Still Provisional
The current outputs are not yet enough to support a clean replication claim
about Nunney's published thresholds.
The main unresolved issues are:
1. how the current `N0 / K` choice relates to the initialization used in the paper
2. whether the current threshold caches reproduce the setup the paper actually used
3. whether the extinction fit generalizes beyond the designed-grid data used to
fit it
4. how these local `/tmp` outputs should be normalized into repo-managed result
locations and paper-ready summaries
## Immediate Next Steps
The highest-value follow-up work is:
1. copy or regenerate the strongest `/tmp` artifacts under `runs/results/`
2. summarize the initialization sensitivity explicitly in the replication notes
3. expand paper-scale Figure 1 sweeps in a way that keeps `N0` assumptions
explicit
4. return to Track 2 only after the current Track 1 result state is documented
clearly enough to serve as the baseline comparison
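
Step 1 could start with a small copy script. The destination layout (each
artifact placed directly under `runs/results/` with its `/tmp` base name) is a
hypothetical convention, not one this note defines; the artifact list is the
set of paths named above.

```python
import shutil
from pathlib import Path

# Artifacts named in this note; trim or extend as the promotion plan settles.
ARTIFACTS = [
    "/tmp/track1-report-small",
    "/tmp/track1-extinction-dataset-small",
    "/tmp/track1-extinction-fit-small-payload.json",
    "/tmp/track1-extinction-dataset-designed-grid",
    "/tmp/track1-extinction-fit-designed-grid-payload.json",
    "/tmp/track1-figure1-paper-m005-cache.json",
    "/tmp/track1-figure1-paper-m10-cache.json",
    "/tmp/track1-search-m10-n1-runs10-cache.json",
    "/tmp/track1-search-m10-n1-runs10-t1-20-cache.json",
]

def promote(sources, dest_root="runs/results"):
    """Copy each existing artifact into dest_root, keeping its base name.

    Missing sources are reported rather than raising, since /tmp contents
    may already have been cleared.
    """
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for src in map(Path, sources):
        if not src.exists():
            missing.append(str(src))
            continue
        target = dest_root / src.name
        if src.is_dir():
            shutil.copytree(src, target, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(src, target)  # preserves timestamps
        copied.append(str(target))
    return copied, missing
```

Run from the repository root, `promote(ARTIFACTS)` copies whatever still exists
and returns the copied and missing paths, so any artifact that must be
regenerated shows up explicitly.
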