ReNunney/docs/RESULTS_IN_HAND.md

Results In Hand

Updated: 2026-04-12

Purpose

This note records the concrete Track 1 outputs currently available from recent local runs. It is intentionally narrower than a full replication report. Its job is to answer three simpler questions: what results do we actually have today, what do they already imply, and what remains too provisional to treat as a paper-facing conclusion?

The key point is that most current outputs still live in /tmp, not yet in the repo-managed runs/results/ tree.

Main Artifacts

Small report package

Primary artifacts:

  • /tmp/track1-report-small/report.md
  • /tmp/track1-report-small/tracking_summary.json
  • /tmp/track1-report-small/aggregate_series.json
  • /tmp/track1-report-small/*.png

Parameters:

  • K = 5000
  • N0 = 20
  • n = 1
  • u = 5e-6
  • derived M = 0.05
  • R = 10
  • T = 20
  • epochs = 8
  • runs = 2
  • seed_start = 1

Observed behavior:

  • both runs show substantial lag behind the moving target
  • one run never leaves an allele value of zero
  • the other run begins adapting at t = 12 and remains nonzero through t = 46, but still ends with a large negative tracking gap
  • final gaps are about -1.25 and -1.30
  • mean absolute tracking gap is about 0.53 to 0.59

Interpretation:

  • this regime appears near or beyond persistence limits
  • adaptation can occur transiently without being sufficient to maintain tracking
  • low mutation supply at this setting produces severe and persistent lag

The aggregate population trajectory rises rapidly toward carrying capacity and then declines strongly once the moving optimum begins to outrun the population. By roughly t = 26, only one of the two runs is still contributing to the reported mean series.
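
The final gap and mean absolute tracking gap quoted above can be computed from a per-generation series. The sketch below assumes the gap is defined as mean allele value minus optimum, so that a lagging population has a negative gap. That sign convention, the function name, and the toy series are all assumptions for illustration; the numbers are not taken from the runs.

```python
from statistics import fmean


def tracking_gap_summary(allele_values, optimum_values):
    """Summarise lag for one run.

    ASSUMPTION: gap = allele value - optimum, so lag is negative.
    """
    if not allele_values or len(allele_values) != len(optimum_values):
        raise ValueError("series must be non-empty and of equal length")
    gaps = [a - o for a, o in zip(allele_values, optimum_values)]
    return {
        "final_gap": gaps[-1],
        "mean_abs_gap": fmean([abs(g) for g in gaps]),
    }


# Toy series, invented for illustration only.
summary = tracking_gap_summary(
    allele_values=[0.0, 0.0, 0.25, 0.5],
    optimum_values=[0.0, 0.5, 1.0, 1.5],
)
print(summary)
```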

Small extinction dataset

Primary artifacts:

  • /tmp/track1-extinction-dataset-small/
  • /tmp/track1-extinction-dataset-small/run_rows.jsonl
  • /tmp/track1-extinction-fit-small-payload.json

Grid:

  • K = 500
  • N0 in {20, 500}
  • u in {0.001, 0.005}
  • derived M in {1, 5}
  • T = 10
  • epochs = 2
  • n = 1
  • runs_per_treatment = 2

Observed behavior:

  • all 8 runs survive
  • no treatment in this toy grid produces extinction
  • higher M generally reduces lag and removes long zero-mutation streaks
  • larger N0 also improves final tracking

Interpretation:

  • this dataset is useful as a smoke test for reporting and dataset generation
  • it is not suitable for extinction modeling because there is no outcome variation

The fitting payload states this explicitly:

  • fit_status = "insufficient_outcome_variation"
  • extinction_count = 0
  • non_extinction_count = 8
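
A guard of this kind is simple to express. The sketch below counts outcomes in JSONL rows and reports the same three fields as the payload. The row key `extinct` and the `"ok"` status string are hypothetical, since the actual row schema and the other status values are not recorded in this note.

```python
import json


def outcome_variation_status(jsonl_lines, outcome_key="extinct"):
    """Count outcomes and flag datasets where every run ends the same way.

    ASSUMPTIONS: each row has a boolean field named by `outcome_key`,
    and "ok" stands in for whatever the real pipeline reports on success.
    """
    rows = [json.loads(line) for line in jsonl_lines if line.strip()]
    extinct = sum(1 for row in rows if row[outcome_key])
    survived = len(rows) - extinct
    has_both = extinct > 0 and survived > 0
    return {
        "fit_status": "ok" if has_both else "insufficient_outcome_variation",
        "extinction_count": extinct,
        "non_extinction_count": survived,
    }


# Eight surviving runs, mirroring the counts reported for this dataset.
stand_in_rows = ['{"extinct": false}'] * 8
print(outcome_variation_status(stand_in_rows))
```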

Designed-grid extinction dataset

Primary artifacts:

  • /tmp/track1-extinction-dataset-designed-grid/
  • /tmp/track1-extinction-dataset-designed-grid/run_rows.jsonl
  • /tmp/track1-extinction-fit-designed-grid-payload.json

Grid:

  • K = 500
  • N0 in {20, 500}
  • u in {0.0, 0.0001, 0.0005, 0.001, 0.005}
  • T in {5, 10, 20}
  • derived M varies with u (M in {0, 0.1, 0.5, 1, 5} if M scales with u as in the small grid, where K and n are the same)
  • epochs = 8
  • n = 1
  • runs_per_treatment = 4

Scale:

  • 30 treatments
  • 120 runs
  • 5559 generation rows
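
The treatment and run counts follow directly from the grid: 2 values of N0 times 5 values of u times 3 values of T, with 4 runs each. A short sketch that enumerates the grid as listed above (the variable names are illustrative, not taken from the Track 1 code):

```python
from itertools import product

# Grid values as listed in this note; K = 500 and n = 1 are held fixed.
N0_VALUES = [20, 500]
U_VALUES = [0.0, 0.0001, 0.0005, 0.001, 0.005]
T_VALUES = [5, 10, 20]
RUNS_PER_TREATMENT = 4

treatments = list(product(N0_VALUES, U_VALUES, T_VALUES))
total_runs = len(treatments) * RUNS_PER_TREATMENT
print(len(treatments), total_runs)  # 30 120
```

The generation-row count cannot be derived this way, because it depends on how long each run lasts before extinction or completion.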

Observed behavior:

  • 95 extinctions
  • 25 non-extinctions
  • the current logistic-style fit converges

Included fitted features:

  • log_M
  • inv_T
  • n
  • log_K
  • log_N0_over_K
  • mean_abs_tracking_gap
  • fraction_generations_below_replacement
  • longest_zero_mutation_streak
  • cumulative_mutation_shortfall_per_generation

Interpretation:

  • this is the first dataset in hand that is large enough to support actual extinction-model fitting
  • the included predictors are biologically plausible and align with the current diagnostic story: mutation supply, pace of environmental change, tracking lag, and time spent below replacement all matter

Caution:

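  • the reported fit quality is extremely strong for a 120-run dataset with nine fitted features and only 25 non-extinctions
  • n and log_K are constant across this grid (n = 1, K = 500), so they cannot carry information in this fit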
  • at present this should be treated as an in-sample descriptive fit, not a validated predictive model
  • no cross-validation or held-out assessment is yet recorded in-repo
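
A held-out assessment could start from stratified folds, so that each fold keeps roughly the same 95:25 outcome balance as the full dataset. The sketch below uses only the standard library and is not part of the current pipeline. The function name is hypothetical, and the label vector is built from the reported counts only, in an arbitrary order.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds, spreading each outcome class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for outcome in (True, False):
        indices = [i for i, y in enumerate(labels) if bool(y) is outcome]
        rng.shuffle(indices)
        # Deal indices round-robin so class counts differ by at most one.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# True = extinct. Counts match the designed-grid dataset; order is arbitrary.
labels = [True] * 95 + [False] * 25
folds = stratified_folds(labels, k=5)
for fold in folds:
    survivors = sum(1 for i in fold if not labels[i])
    print(len(fold), survivors)
```

Because each treatment has four replicate runs, holding out whole treatments rather than individual runs may be the stricter test. That choice is left open here.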

Figure 1 Cache State

Primary artifacts:

  • /tmp/track1-figure1-paper-m005-cache.json
  • /tmp/track1-figure1-paper-m10-cache.json
  • /tmp/track1-search-m10-n1-runs10-cache.json
  • /tmp/track1-search-m10-n1-runs10-t1-20-cache.json

Paper-scale caches with N0 = K = 5000

For the paper-style cached sweeps:

  • low mutation supply (M = 0.05) shows 20/20 extinctions at all displayed n values for T = 1.0, 1.02, 1.05, and 1.10
  • even at M = 10, runs at the displayed T values remain overwhelmingly extinct, with only a slight improvement for n = 1 around T = 10

Interpretation:

  • under the current implementation, paper-scale initialization with N0 = K makes these regimes extremely extinction-prone
  • increasing mutation supply helps, but does not obviously eliminate the problem in the currently cached low-T range

Exploratory caches with N0 = 20

The smaller exploratory threshold caches for M = 10, n = 1, and runs = 10 show:

  • 0/10 extinctions for T = 5, 5.1, 5.25, and 5.5
  • 0/10 extinctions for T = 1, 1.02, 1.05, and 1.1

Interpretation:

  • the current results are highly sensitive to initialization, especially the ratio N0 / K
  • this is not a minor implementation detail; it directly changes whether the same nominal treatment appears safely persistent or uniformly extinct

What The Results Already Say

The current outputs already support the following claims:

  1. The Track 1 reporting and dataset stack is operational enough to produce coherent run reports, row-level datasets, and extinction-model payloads.
  2. Low mutation supply can leave the population far behind the moving optimum, even when transient adaptation occurs.
  3. Higher mutation supply and larger initial population improve tracking in the tested small-grid runs.
  4. Extinction behavior is strongly sensitive to initialization conventions, especially whether runs begin at low N0 or at N0 = K.

What Is Still Provisional

The current outputs are not yet enough to support a clean replication claim about Nunney's published thresholds.

The main unresolved issues are:

  1. the scientific status of the N0 / K choice in relation to the paper
  2. whether the current threshold caches reflect the intended historical setup
  3. whether the extinction fit generalizes beyond the designed-grid data used to fit it
  4. how these local /tmp outputs should be normalized into repo-managed result locations and paper-ready summaries

Immediate Next Steps

The highest-value follow-up work is:

  1. copy or regenerate the strongest /tmp artifacts under runs/results/
  2. summarize the initialization sensitivity explicitly in the replication notes
  3. expand paper-scale Figure 1 sweeps in a way that keeps N0 assumptions explicit
  4. return to Track 2 only after the current Track 1 result state is documented clearly enough to serve as the baseline comparison
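
Step 1 above could be as small as a copy helper. The sketch below assumes artifacts are promoted by name into a flat directory under runs/results/. The helper name and destination layout are hypothetical, and the repo may prefer to regenerate artifacts instead of copying them.

```python
import shutil
from pathlib import Path


def promote_artifact(src, dest_root):
    """Copy one file or directory into dest_root, keeping its name.

    ASSUMPTION: a flat layout under dest_root is acceptable; adjust if the
    runs/results/ tree uses per-run or per-date subdirectories.
    """
    src = Path(src)
    dest = Path(dest_root) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        # dirs_exist_ok requires Python 3.8 or newer.
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(src, dest)
    return dest
```

For example, `promote_artifact("/tmp/track1-report-small", "runs/results")` would place the small report package at `runs/results/track1-report-small`.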