ReNunney/docs/RESULTS_IN_HAND.md

# Results In Hand

Updated: 2026-04-12

## Purpose

This note records the concrete Track 1 outputs currently available from recent
local runs. It is intentionally narrower than a full replication report. Its
job is to answer a simpler question: what results do we actually have today,
what do they already imply, and what remains too provisional to treat as a
paper-facing conclusion.

The key point is that most current outputs still live in `/tmp` and have not
yet been moved into the repo-managed `runs/results/` tree, so they are neither
versioned nor guaranteed to persist.

## Main Artifacts

### Small report package

Primary artifacts:

- `/tmp/track1-report-small/report.md`
- `/tmp/track1-report-small/tracking_summary.json`
- `/tmp/track1-report-small/aggregate_series.json`
- `/tmp/track1-report-small/*.png`

Parameters:

- `K = 5000`
- `N0 = 20`
- `n = 1`
- `u = 5e-6`
- derived `M = 0.05`
- `R = 10`
- `T = 20`
- `epochs = 8`
- `runs = 2`
- `seed_start = 1`
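
The derived `M` above is consistent with `M = 2 * K * u`, and the same relation reproduces the `M` values listed for the small extinction grid later in this note. That relation is inferred from the listed numbers rather than taken from the codebase, so the sketch below is an assumption; `mutation_supply` is a hypothetical helper name.

```python
import math


def mutation_supply(K: float, u: float) -> float:
    """Assumed relation M = 2 * K * u, inferred from the values in this note."""
    return 2.0 * K * u


# Small report package: K = 5000, u = 5e-6, listed M = 0.05.
small_report_M = mutation_supply(5000, 5e-6)

# Compare with a tolerance because the product is computed in floating point.
matches_listed_value = math.isclose(small_report_M, 0.05)
```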

Observed behavior:

- both runs show substantial lag behind the moving target
- one run never leaves an allele value of zero
- the other run begins adapting at `t = 12` and remains nonzero through
  `t = 46`, but still ends with a large negative tracking gap
- final gaps are about `-1.25` and `-1.30`
- mean absolute tracking gap is about `0.53` to `0.59`
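
The gap statistics above can be recomputed from per-generation series. The sketch below assumes the gap is the population mean allele value minus the target, so that a negative gap means the population is lagging. That sign convention fits the negative final gaps reported here but is otherwise an assumption, as are the function and argument names.

```python
from statistics import fmean


def tracking_gaps(mean_allele, target):
    """Per-generation gap, assumed to be mean allele value minus target."""
    if len(mean_allele) != len(target):
        raise ValueError("series must have the same length")
    return [a - t for a, t in zip(mean_allele, target)]


def summarize_gaps(gaps):
    """Return the final gap and the mean absolute gap for one run."""
    if not gaps:
        raise ValueError("need at least one generation")
    return {
        "final_gap": gaps[-1],
        "mean_abs_gap": fmean(abs(g) for g in gaps),
    }


# Toy series for illustration only; these are not values from the report.
toy_gaps = tracking_gaps([0.0, 0.0, 0.5], [0.0, 0.5, 1.0])
toy_summary = summarize_gaps(toy_gaps)
```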

Interpretation:

- this regime appears near or beyond persistence limits
- adaptation can occur transiently without being sufficient to maintain
  tracking
- low mutation supply at this setting produces severe and persistent lag

The aggregate population trajectory rises rapidly toward carrying capacity and
then declines strongly once the moving optimum begins to outrun the population.
By roughly `t = 26`, only one of the two runs is still contributing to the
reported mean series.

### Small extinction dataset

Primary artifacts:

- `/tmp/track1-extinction-dataset-small/`
- `/tmp/track1-extinction-dataset-small/run_rows.jsonl`
- `/tmp/track1-extinction-fit-small-payload.json`

Grid:

- `K = 500`
- `N0 in {20, 500}`
- `u in {0.001, 0.005}`
- derived `M in {1, 5}`
- `T = 10`
- `epochs = 2`
- `n = 1`
- `runs_per_treatment = 2`

Observed behavior:

- all 8 runs survive
- no treatment in this toy grid produces extinction
- higher `M` generally reduces lag and removes long zero-mutation streaks
- larger `N0` also improves final tracking

Interpretation:

- this dataset is useful as a smoke test for reporting and dataset generation
- it is not suitable for extinction modeling because there is no outcome
  variation

The fitting payload states this explicitly:

- `fit_status = "insufficient_outcome_variation"`
- `extinction_count = 0`
- `non_extinction_count = 8`
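
A guard of roughly the following shape would produce a payload like the one above. It is a minimal sketch, not the project's actual fitting code: the `extinct` row field and the `"ok"` status string are hypothetical, while the three payload keys and the `"insufficient_outcome_variation"` value are the ones quoted in this note.

```python
def outcome_variation_status(rows, outcome_key="extinct"):
    """Count outcomes and refuse to fit when only one outcome class is present."""
    extinct = sum(1 for row in rows if row[outcome_key])
    survived = len(rows) - extinct
    has_both_classes = extinct > 0 and survived > 0
    return {
        # "ok" is a placeholder status for the case where fitting can proceed.
        "fit_status": "ok" if has_both_classes else "insufficient_outcome_variation",
        "extinction_count": extinct,
        "non_extinction_count": survived,
    }


# Eight surviving runs, matching the counts reported for the small grid.
small_grid_rows = [{"extinct": False} for _ in range(8)]
small_grid_status = outcome_variation_status(small_grid_rows)
```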

### Designed-grid extinction dataset

Primary artifacts:

- `/tmp/track1-extinction-dataset-designed-grid/`
- `/tmp/track1-extinction-dataset-designed-grid/run_rows.jsonl`
- `/tmp/track1-extinction-fit-designed-grid-payload.json`

Grid:

- `K = 500`
- `N0 in {20, 500}`
- `u in {0.0, 0.0001, 0.0005, 0.001, 0.005}`
- `T in {5, 10, 20}`
- derived `M` varies with `u`
- `epochs = 8`
- `n = 1`
- `runs_per_treatment = 4`

Scale:

- `30` treatments
- `120` runs
- `5559` generation rows
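
The treatment and run counts follow directly from the grid: two `N0` values, five `u` values, and three `T` values, each replicated four times. A short enumeration makes that arithmetic explicit; the dictionary layout is illustrative and not the project's actual treatment schema.

```python
from itertools import product

N0_values = [20, 500]
u_values = [0.0, 0.0001, 0.0005, 0.001, 0.005]
T_values = [5, 10, 20]
runs_per_treatment = 4

# One treatment per (N0, u, T) combination; K, n, and epochs are fixed.
treatments = [
    {"K": 500, "n": 1, "epochs": 8, "N0": N0, "u": u, "T": T}
    for N0, u, T in product(N0_values, u_values, T_values)
]
total_runs = len(treatments) * runs_per_treatment

print(len(treatments), total_runs)  # 30 120
```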

Observed behavior:

- `95` extinctions
- `25` non-extinctions
- the current logistic-style fit converges

Included fitted features:

- `log_M`
- `inv_T`
- `n`
- `log_K`
- `log_N0_over_K`
- `mean_abs_tracking_gap`
- `fraction_generations_below_replacement`
- `longest_zero_mutation_streak`
- `cumulative_mutation_shortfall_per_generation`
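
Because `K = 500` and `n = 1` are fixed across this grid, the `log_K` and `n` columns take a single value each, so this dataset cannot separate their effects from the intercept. The sketch below shows one way to flag such columns. The feature formulas (natural logs, `inv_T = 1 / T`) are guesses based on the feature names, and both helper functions are hypothetical.

```python
import math
from itertools import product


def design_features(K, N0, T, n):
    """Parameter-derived features; the formulas are assumed from the names."""
    return {
        "inv_T": 1.0 / T,
        "n": float(n),
        "log_K": math.log(K),
        "log_N0_over_K": math.log(N0 / K),
    }


def constant_columns(rows):
    """Names of features that take exactly one value across all rows."""
    names = rows[0].keys()
    return sorted(
        name for name in names if len({row[name] for row in rows}) == 1
    )


# N0 and T vary in the designed grid; K and n do not.
grid_rows = [
    design_features(K=500, N0=N0, T=T, n=1)
    for N0, T in product([20, 500], [5, 10, 20])
]

print(constant_columns(grid_rows))  # ['log_K', 'n']
```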

Interpretation:

- this is the first dataset in hand that is large enough to support actual
  extinction-model fitting
- the included predictors are biologically plausible and align with the current
  diagnostic story: mutation supply, pace of environmental change, tracking
  lag, and time spent below replacement all matter

Caution:

- the reported fit quality is extremely strong for a `120`-run dataset that
  has only `25` non-extinctions against `9` fitted features
- at present this should be treated as an in-sample descriptive fit, not a
  validated predictive model
- no cross-validation or held-out assessment is yet recorded in-repo
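
One possible starting point for a held-out assessment is a stratified split that keeps the extinct and non-extinct proportions similar in every fold. The sketch below uses only the standard library and is not part of the current pipeline. `stratified_folds` is a hypothetical helper, and the label ordering is a placeholder that reproduces only the `95`/`25` class counts. Runs from the same treatment share parameters, so grouping folds by treatment may be a better choice in practice.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign row indices to k folds, spreading each class round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for value in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# 1 = extinct, 0 = not extinct; only the class counts come from this note.
labels = [1] * 95 + [0] * 25
folds = stratified_folds(labels, k=5)

# Each fold serves once as the held-out set; the rest form the training set.
fold_sizes = [len(fold) for fold in folds]
fold_extinctions = [sum(labels[i] for i in fold) for fold in folds]
```

With five folds, each held-out set contains 24 runs, of which 19 are extinctions and 5 are not.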

## Figure 1 Cache State

Primary artifacts:

- `/tmp/track1-figure1-paper-m005-cache.json`
- `/tmp/track1-figure1-paper-m10-cache.json`
- `/tmp/track1-search-m10-n1-runs10-cache.json`
- `/tmp/track1-search-m10-n1-runs10-t1-20-cache.json`

### Paper-scale caches with `N0 = K = 5000`

For the paper-style cached sweeps:

- low mutation supply (`M = 0.05`) shows `20/20` extinctions at all displayed
  `n` values for `T = 1.0`, `1.02`, `1.05`, and `1.10`
- even at `M = 10`, the displayed `T` values remain overwhelmingly extinct,
  with only a slight improvement for `n = 1` around `T = 10`

Interpretation:

- under the current implementation, paper-scale initialization with `N0 = K`
  makes these regimes extremely extinction-prone
- increasing mutation supply helps, but does not obviously eliminate the
  problem in the currently cached low-`T` range

### Exploratory cache with `N0 = 20`

The smaller exploratory threshold caches for `M = 10`, `n = 1`, and `runs = 10`
show:

- `0/10` extinctions for `T = 5`, `5.1`, `5.25`, and `5.5`
- `0/10` extinctions for `T = 1`, `1.02`, `1.05`, and `1.1`

Interpretation:

- the current results are highly sensitive to initialization, especially
  `N0 / K`
- this is not a minor implementation detail; it directly changes whether the
  same nominal treatment appears safely persistent or uniformly extinct
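
Comparisons like these are easier to audit when extinction counts are tallied per initialization and per `T` in a single table. The sketch below assumes each cached run can be reduced to a record with `N0`, `T`, and `extinct` fields. That record layout is hypothetical and is not the actual cache schema.

```python
from collections import defaultdict


def extinction_tally(records):
    """Map (N0, T) to an 'extinct/total' string such as '0/10'."""
    counts = defaultdict(lambda: [0, 0])
    for record in records:
        key = (record["N0"], record["T"])
        counts[key][0] += int(record["extinct"])
        counts[key][1] += 1
    return {key: f"{extinct}/{total}" for key, (extinct, total) in counts.items()}


# Ten surviving runs at N0 = 20 and T = 5, mirroring the 0/10 entry above.
example_records = [{"N0": 20, "T": 5, "extinct": False} for _ in range(10)]
example_tally = extinction_tally(example_records)
```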

## What The Results Already Say

The current outputs already support the following claims:

1. The Track 1 reporting and dataset stack is operational enough to produce
   coherent run reports, row-level datasets, and extinction-model payloads.
2. Low mutation supply can leave the population far behind the moving optimum,
   even when transient adaptation occurs.
3. Higher mutation supply and larger initial population improve tracking in the
   tested small-grid runs.
4. Extinction behavior is strongly sensitive to initialization conventions,
   especially whether runs begin at low `N0` or at `N0 = K`.

## What Is Still Provisional

The current outputs are not yet enough to support a clean replication claim
about Nunney's published thresholds.

The main unresolved issues are:

1. the scientific status of the `N0 / K` choice in relation to the paper
2. whether the current threshold caches reflect the intended historical setup
3. whether the extinction fit generalizes beyond the designed-grid data used to
   fit it
4. how these local `/tmp` outputs should be normalized into repo-managed result
   locations and paper-ready summaries

## Immediate Next Steps

The highest-value follow-up work is:

1. copy or regenerate the strongest `/tmp` artifacts under `runs/results/`
2. summarize the initialization sensitivity explicitly in the replication notes
3. expand paper-scale Figure 1 sweeps in a way that keeps `N0` assumptions
   explicit
4. return to Track 2 only after the current Track 1 result state is documented
   clearly enough to serve as the baseline comparison
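
For the first step, a small helper along these lines could copy a `/tmp` artifact under `runs/results/` without silently overwriting an earlier copy. It is only a sketch: `promote_artifact` is a hypothetical name, and the flat layout under `runs/results/` is an assumption about how the results tree should be organized.

```python
import shutil
from pathlib import Path


def promote_artifact(src, results_root):
    """Copy one file or directory under results_root, refusing to overwrite."""
    src = Path(src)
    dest = Path(results_root) / src.name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    dest.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest
```

For example, `promote_artifact("/tmp/track1-report-small", "runs/results")` would copy the small report package to `runs/results/track1-report-small`.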