Add biological overview and Nunney analysis docs

welsberr 2026-04-11 07:20:57 -04:00
parent 15f6a6ac4a
commit 4f36070a7f
2 changed files with 309 additions and 51 deletions

README.md (136 lines changed)

@ -1,55 +1,97 @@

| Before | After |
| --- | --- |
| # renunney | # renunney |
| Clean working repository for: | Working repository for replication and reanalysis of Leonard Nunney's 2003 |
|  | cost-of-substitution simulations. |
| - faithful replication of Leonard Nunney's 2003 cost-of-substitution results, | ## Biological Question |
| - orchestration of distributed sweep runs, |  |
| - later migration to a faster Rust-backed worker. |  |
| ## Current Scope | Nunney's paper asks how rapidly an adapting population can track a moving |
|  | environment before extinction becomes likely. In this model: |
| This repository was bootstrapped from earlier work in: | - the selective optimum moves steadily through time, |
|  | - adaptation requires repeated allelic substitutions across one or more loci, |
|  | - population growth is density-dependent, |
|  | - offspring survival falls as genotypes lag behind the moving optimum, |
|  | - and "cost of substitution" is summarized by the smallest environmental-change |
|  | interval `T` that still allows persistence. |
| - [`../collaborations/to_ptbc/evc/cost_of_substitution`](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution) | Smaller `T` means faster environmental change and a harder adaptive problem. |
| That earlier tree remains useful as provenance and historical context. The | ## Two Approaches |
| Track 1 runtime and orchestration stack now live in `renunney`. |  |
| `renunney` provides: | This project treats the problem in two separate ways. |
| - a clean git repo, | Track 1: Nunney-faithful replication |
| - a stable working directory layout, |  |
| - a local orchestration CLI and library, | - reconstruct the published simulation and threshold heuristic as closely as |
| - local paper-scale Figure 1 submission configs, | possible, |
| - a local Track 1 runner and config/API layer, | - preserve historically relevant assumptions even when they are inefficient or |
| - a local Track 1 analysis layer for tracking summaries and loci-regression, | awkward, |
| - a local Track 1 threshold/search layer for Nunney-style threshold checks, | - and use that result as the baseline for replication and criticism. |
| - a local Track 1 simulation kernel, |  |
| - a local Track 1 report generator, | Track 2: modern replacement |
| - a local Track 1 extinction-model data layer, |  |
| - a local Track 1 dataset generator, | - build a cleaner and faster simulator around the same biological question, |
| - a local Track 1 fit layer, | - define threshold explicitly rather than through Nunney's heuristic, |
| - a Makefile for common tasks, | - and use a more performant implementation path, likely including Rust. |
| - migration notes for pulling code into this repo in stages. |  |
|  | The current repo is centered on Track 1, with orchestration intended to support |
|  | both tracks later. |
|  | ## Nunney's Approach |
|  | The paper-level model combines four pieces: |
|  | - constant environmental change, |
|  | - genotype survival `w_i` as a function of distance from the moving optimum, |
|  | - density-dependent female fecundity `f`, |
|  | - and Mendelian transmission with mutation. |
|  | The published threshold procedure is heuristic rather than inferential: Nunney |
|  | searches for the smallest `T` with no extinctions in 20 runs, then checks |
|  | nearby larger values. That historical rule is preserved in Track 1 because it |
|  | is part of the claimed result structure. |
|  | ## This Repo's Approach |
|  | `renunney` turns that work into a clean, testable stack: |
|  | - local Track 1 simulation kernel, |
|  | - local threshold/search layer, |
|  | - local analysis, reporting, extinction-dataset, and fitting layers, |
|  | - local orchestration CLI and SQLite job registry, |
|  | - local paper-scale Figure 1 configs, |
|  | - and a Makefile for common operational tasks. |
|  | The repo was bootstrapped from earlier work in |
|  | [`../collaborations/to_ptbc/evc/cost_of_substitution`](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution), |
|  | which remains useful as provenance and historical context, but the Track 1 |
|  | runtime now lives in `renunney`. |
|  | ## Key Docs |
|  | - [docs/MIGRATION.md](/mnt/CIFS/pengolodh/Docs/Projects/renunney/docs/MIGRATION.md) |
|  | - [docs/WORKFLOW.md](/mnt/CIFS/pengolodh/Docs/Projects/renunney/docs/WORKFLOW.md) |
|  | - [docs/NUNNEY_ANALYSIS.md](/mnt/CIFS/pengolodh/Docs/Projects/renunney/docs/NUNNEY_ANALYSIS.md) |
| ## Layout | ## Layout |
| - `docs/` | - `docs/` |
| - project and migration notes | - project, migration, and paper-analysis notes |
| - `config/` | - `config/` |
| - configuration templates and examples | - configuration templates and paper-scale treatment configs |
| - `runs/state/` | - `runs/state/` |
| - SQLite registries and persistent orchestration state | - SQLite registries and persistent orchestration state |
| - `runs/results/` | - `runs/results/` |
| - result artifacts collected by orchestration | - result artifacts collected by orchestration |
| - `runs/scratch/` | - `runs/scratch/` |
| - local worker scratch and cache files | - local worker scratch and cache files |
| - `src/renunney/` | - `src/renunney/` |
| - future in-repo Python package and migration target | - in-repo Python package |
| - `scripts/` | - `scripts/` |
| - local CLI entrypoints | - local CLI entrypoints |
| - `tests/` | - `tests/` |
| - local verification for migrated boundaries | - local verification |
| ## Start | ## Start |

@ -59,18 +101,18 @@ Initialize the local run directories and SQLite registry:

| Before | After |
| --- | --- |
| make init | make init |
| ``` | ``` |
|  | Run one local Track 1 simulation: |
|  | ```bash |
|  | make track1-sim-smoke |
|  | ``` |
| Submit a paper-scale Figure 1 treatment: | Submit a paper-scale Figure 1 treatment: |
| ```bash | ```bash |
| make submit-figure1-m10 | make submit-figure1-m10 |
| ``` | ``` |
| Run one local Track 1 simulation through the migrated runner/API boundary: |  |
| ```bash |  |
| make track1-sim-smoke |  |
| ``` |  |
| Run one worker loop locally: | Run one worker loop locally: |
| ```bash | ```bash |

@ -85,17 +127,9 @@ make collate-figure1

| Before | After |
| --- | --- |
| ## Status | ## Status |
| The current state is split: | The Track 1 runtime and orchestration stack are now local to `renunney`. The |
|  | next major step is no longer migration of Track 1 code; it is either: |
| - orchestration control plane: local to `renunney` | - hardening multi-host orchestration, |
| - Track 1 runner and config/API layer: local to `renunney` | - organizing publication-quality replication outputs, |
| - Track 1 analysis layer: local to `renunney` | - or starting the Rust-backed Track 2 path. |
| - Track 1 threshold/search layer: local to `renunney` |  |
| - Track 1 simulation kernel: local to `renunney` |  |
| - Track 1 report generator: local to `renunney` |  |
| - Track 1 extinction-model data layer: local to `renunney` |  |
| - Track 1 dataset generator: local to `renunney` |  |
| - Track 1 fit layer: local to `renunney` |  |
| This repo is now the clean operational entry point for the Track 1 runtime and |  |
| its orchestration stack. |  |

docs/NUNNEY_ANALYSIS.md (new file, 224 lines added)

@ -0,0 +1,224 @@
# Nunney Analysis
Updated: 2026-04-11
## Purpose
This note gives a compact in-repo analysis of Nunney's main claims, equations,
and reported results, and how the current replication effort interprets them.
It is not a complete paper summary. Its job is to make the scientific and
implementation targets explicit enough that code and results can be reviewed
against them.
Primary paper:
- `nunney_cost_of_substitution_anz40-185.pdf`
Related internal notes:
- [COST_OF_SUBSTITUTION.md](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution/COST_OF_SUBSTITUTION.md)
- [TRACK1_BASELINE.md](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution/TRACK1_BASELINE.md)
- [PAPER_CODE_ALIGNMENT.md](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution/PAPER_CODE_ALIGNMENT.md)
## Core Claim
Nunney's central claim is that the rate of environmental change a population
can tolerate depends on the cost of repeated adaptive substitutions, and that
this cost can be decomposed into:
- a fixed component, and
- a per-locus component.
The paper presents simulation evidence that both components depend on mutation
supply, summarized by `M = 2Ku`, and that extinction occurs when the
environment changes too quickly for the population to keep pace.
## Model Structure
The paper describes four interacting components:
1. constant environmental change
2. genotype-dependent survival relative to the moving optimum
3. density-dependent female fecundity
4. Mendelian transmission with mutation
The adaptive problem is staged so that one substitution is needed every `T`
generations at each selected locus. Smaller `T` means faster change and thus a
more demanding environment.
## Main Equations
### Growth/Fecundity
Nunney uses:
```text
R = 2 exp(r)
```
and:
```text
f = 2 exp(r * (1 - (N/K)^(1/r)))
```
Interpretation:
- `R` is the density-independent net reproductive rate.
- `f` is density-dependent female fecundity.
- fecundity is genotype-independent; selection enters through survival.
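The Python sketch below evaluates the two growth equations directly. It is a quick consistency check, and the function names are illustrative rather than taken from the `renunney` package. Two properties are worth checking: `f` approaches `R` as `N` goes to zero, and `f = 2` exactly at `N = K`.

```python
import math


def net_reproductive_rate(r: float) -> float:
    """R = 2 exp(r): density-independent net reproductive rate."""
    return 2.0 * math.exp(r)


def fecundity(N: float, K: float, r: float) -> float:
    """f = 2 exp(r * (1 - (N/K)^(1/r))): density-dependent female fecundity.

    Genotype-independent by construction; selection acts later, through survival.
    """
    return 2.0 * math.exp(r * (1.0 - (N / K) ** (1.0 / r)))
```

Assume a 1:1 sex ratio. Then `f = 2` at `N = K` means that, before viability selection, each female replaces herself and one male. Any mean survival below 1 therefore holds the population below `K`.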
### Fitness / Offspring Survival
The key selection equation is:
```text
w_i = exp(-(r/n) * Σ_j (Av_ij - t/T)^2)
```
where:
- `Av_ij` is the mean allelic value at locus `j` in genotype `i`
- `t/T` is the moving optimum on the allele-value scale
- `n` is the number of loci
Interpretation:
- survival declines as genotype means lag the moving optimum,
- the factor `r/n` scales selection intensity across different numbers of
loci,
- and the Gaussian form makes tracking lag the central state variable.
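The survival equation translates directly into a small function. This sketch assumes a genotype is a list of `(allele, allele)` integer pairs, one per locus, so that `Av_ij` is the mean of each pair. That representation is an assumption, not the repo's data model.

```python
import math
from typing import Sequence


def survival(genotype: Sequence[tuple[int, int]], t: int, T: float, r: float) -> float:
    """w_i = exp(-(r/n) * sum_j (Av_ij - t/T)^2) for one diploid genotype."""
    n = len(genotype)    # number of loci
    optimum = t / T      # moving optimum on the allele-value scale
    sq_lag = sum(((a + b) / 2.0 - optimum) ** 2 for a, b in genotype)
    return math.exp(-(r / n) * sq_lag)
```

Take `t = T`, where the optimum sits at allele value 1, and `r = 0.5`. A genotype still carrying only 0 alleles at two loci survives with probability `exp(-0.5)`, about 0.61. A genotype homozygous for allele 1 at both loci survives with probability 1.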
### Mutation Supply
The paper uses `u` as the mutation-rate parameter and `M = 2Ku` as a derived
population-level mutation-supply quantity for comparing treatments.
Important consequence for replication:
- `u` is the paper-native input,
- `M` is a derived comparison variable,
- and the simulation must apply mutation to both allele copies of each diploid
genotype for the `M = 2Ku` interpretation to hold.
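The `M = 2Ku` bookkeeping can be checked numerically. `K` diploid individuals carry `2K` allele copies per locus. Mutating both copies therefore yields about `2Ku` events per generation, while mutating only one copy yields about `Ku`.

The following points are assumptions, since the paper text does not fix them:

- The one-step-up mutation operator used here is assumed, not taken from the paper.
- The function names are hypothetical.

```python
import random


def mutation_supply(K: int, u: float) -> float:
    """M = 2Ku: expected new mutations per locus per generation at size K."""
    return 2 * K * u


def mutate(value: int, u: float, rng: random.Random) -> int:
    # Assumed operator: with probability u the allele steps up by one state.
    return value + 1 if rng.random() < u else value


def count_mutations(K: int, u: float, rng: random.Random, copies_mutated: int = 2) -> int:
    """Mutation events at one locus across K diploid individuals in one generation.

    copies_mutated=2 applies `mutate` to both allele copies (matching M = 2Ku);
    copies_mutated=1 mutates only one copy and so expects only about K*u events.
    """
    events = 0
    for _ in range(K):
        for _ in range(copies_mutated):
            if mutate(0, u, rng) != 0:
                events += 1
    return events
```

With `K = 1000` and `u = 0.001`, `mutation_supply` gives `M = 2`. A simulated count with `copies_mutated=1` comes out near half the count with the default of 2.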
## Threshold Claim
Nunney's reported threshold is not a mathematically defined
extinction-probability threshold. It is a simulation-search heuristic:
- find the lowest `T` with no extinctions in 20 runs,
- search from below,
- then require no extinction at `1.02T`, `1.05T`, and `1.10T`,
- with extra retesting in borderline cases.
This matters because the paper's "threshold" mixes:
- biological persistence,
- stochastic variation,
- and the search protocol itself.
That is acceptable for Track 1 replication, but it is one of the main reasons
Track 2 exists.
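The search rule can be sketched as a function over an ascending grid of candidate `T` values. The sketch rests on these assumptions:

- `run_once(T, seed)` is a hypothetical callable. It runs one replicate and returns `True` on extinction.
- Each nearby check reuses the same 20-run replicate count.
- The borderline retesting step is omitted, because its exact rule is not restated here.

```python
from typing import Callable, Iterable, Optional

# Hypothetical interface: run_once(T, seed) -> True if that replicate went extinct.
RunOnce = Callable[[float, int], bool]


def survives_all(run_once: RunOnce, T: float, runs: int, seed0: int) -> bool:
    """True when none of `runs` replicates at interval T goes extinct.

    Replicate k uses seed seed0 + k, so every T sees the same seed sequence.
    """
    return not any(run_once(T, seed0 + k) for k in range(runs))


def nunney_threshold(
    run_once: RunOnce,
    candidates: Iterable[float],
    runs: int = 20,
    nearby: tuple[float, ...] = (1.02, 1.05, 1.10),
    seed0: int = 0,
) -> Optional[float]:
    """Lowest candidate T with no extinctions in `runs` replicates that also
    shows no extinctions at each nearby multiple of T (retesting omitted)."""
    for T in sorted(candidates):  # search from below: fastest change first
        if not survives_all(run_once, T, runs, seed0):
            continue
        if all(survives_all(run_once, T * m, runs, seed0) for m in nearby):
            return T
    return None  # no candidate passed
```

As a toy example, let `run_once` go extinct whenever `T < 50` or `52 <= T < 53`. The search over integer candidates rejects 50 and 51, because their nearby checks at 52.5 and 52.02 fail, and returns 53. This shows how the nearby checks can push the reported threshold upward, which is part of why the heuristic conflates biology with protocol.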
## Claimed Result Structure
The paper's most important reported pattern is that threshold cost can be
regressed on the number of loci:
```text
C = C0 + n C1
```
where:
- `C0` is the fixed cost component,
- `C1` is the per-locus cost component,
- and both are analyzed as functions of mutation supply `M`.
Figure 1 and Table 1 are therefore not just descriptive outputs; they are the
main statistical structure the replication must recover if the implementation is
faithful.
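Recovering `C0` and `C1` from a locus sweep amounts to an ordinary least-squares fit of `C` on `n`, repeated for each mutation-supply treatment. This sketch assumes threshold cost estimates `C` already exist for each locus count; how `C` is derived from the threshold `T` is not restated here. The function names are hypothetical.

```python
from typing import Sequence


def fit_cost_on_loci(ns: Sequence[float], costs: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of C = C0 + n*C1; returns (C0, C1)."""
    if len(ns) != len(costs) or len(ns) < 2:
        raise ValueError("need matching ns/costs with at least two points")
    mean_n = sum(ns) / len(ns)
    mean_c = sum(costs) / len(costs)
    sxx = sum((n - mean_n) ** 2 for n in ns)
    if sxx == 0:
        raise ValueError("need at least two distinct locus counts")
    sxy = sum((n - mean_n) * (c - mean_c) for n, c in zip(ns, costs))
    c1 = sxy / sxx              # per-locus cost component
    c0 = mean_c - c1 * mean_n   # fixed cost component
    return c0, c1


def fit_by_treatment(
    sweeps: dict[float, tuple[Sequence[float], Sequence[float]]],
) -> dict[float, tuple[float, float]]:
    """Fit C0 and C1 separately for each mutation-supply value M (dict key)."""
    return {M: fit_cost_on_loci(ns, costs) for M, (ns, costs) in sweeps.items()}
```

Take toy points lying exactly on `C = 3 + 2n`: `fit_cost_on_loci([1, 2, 4, 8], [5, 7, 11, 19])` returns `(3.0, 2.0)`. These are illustrative values, not paper results.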
## What Must Be Reproduced
A credible Track 1 replication should reproduce, or clearly fail to reproduce,
all of the following:
- the paper's parameter framing in terms of `u`, `K`, `R`, `T`, and derived `M`
- the threshold-search behavior over repeated stochastic runs
- the locus-sweep regression structure `C = C0 + n C1`
- the directional effect of mutation supply on fixed and per-locus cost
- the extinction/non-extinction boundary under the published search rule
## Key Ambiguities In The Paper
Several implementation details are underdetermined by the paper text and must
be treated explicitly as reconstruction choices:
- exact generation update order
- exact stochastic law for realized births
- exact mutation operator over the allele set
- exact practical allele-state truncation for finite runs
- exact sex realization rule
- exact extinction condition in code
These do not make replication impossible, but they mean "faithful replication"
is always conditional on a documented reconstruction policy.
## Current Replication Reading
The present Track 1 implementation uses the following interpretation:
- integer allele states with finite truncation tied to the run horizon
- lottery polygyny with one male sampled per female reproductive event
- births drawn stochastically from fecundity
- offspring survival governed by `w_i`
- extinction on zero population or absence of one sex
- explicit reporting of `f`, mean `w`, `f*w`, mutation supply, allele tracking,
and extinction timing
This is intended to stay as close as possible to the paper while making the
reconstruction auditable.
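One generation under this reading can be sketched in plain Python. The sketch fills the gaps listed under Key Ambiguities with explicit assumptions, so it is not the repo's kernel:

- births are Poisson with mean `f`,
- one sire is drawn per female per generation,
- loci recombine freely,
- mutation is a one-step-up move capped at `max_allele`,
- the offspring sex ratio is 1:1,
- offspring survival is evaluated against the optimum at `t + 1`, which is exactly the time-alignment question that deserves scrutiny.

The names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Individual:
    genotype: list[tuple[int, int]]  # one (allele, allele) pair per locus
    female: bool


def fecundity(N: int, K: float, r: float) -> float:
    return 2.0 * math.exp(r * (1.0 - (N / K) ** (1.0 / r)))


def survival(genotype: list[tuple[int, int]], t: int, T: float, r: float) -> float:
    n = len(genotype)
    return math.exp(-(r / n) * sum(((a + b) / 2.0 - t / T) ** 2 for a, b in genotype))


def poisson(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method; adequate for the small means (f <= R) here.
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def is_extinct(pop: list[Individual]) -> bool:
    """Extinct if the population is empty or lacks one sex."""
    return not (any(i.female for i in pop) and any(not i.female for i in pop))


def next_generation(pop: list[Individual], t: int, *, K: float, r: float, T: float,
                    u: float, max_allele: int, rng: random.Random) -> list[Individual]:
    if is_extinct(pop):
        return []
    females = [i for i in pop if i.female]
    males = [i for i in pop if not i.female]
    f = fecundity(len(pop), K, r)
    offspring = []
    for mother in females:
        father = rng.choice(males)  # assumed: one sire per female per generation
        for _ in range(poisson(f, rng)):
            genotype = []
            for (m1, m2), (p1, p2) in zip(mother.genotype, father.genotype):
                # Mendelian draw: one copy from each parent, loci independent.
                pair = (rng.choice((m1, m2)), rng.choice((p1, p2)))
                # Assumed mutation: each copy steps up by 1 with probability u, capped.
                genotype.append(tuple(min(a + 1, max_allele) if rng.random() < u else a
                                      for a in pair))
            # Viability selection against the optimum of the offspring's generation.
            if rng.random() < survival(genotype, t + 1, T, r):
                offspring.append(Individual(genotype, female=rng.random() < 0.5))
    return offspring
```

A run then iterates `pop = next_generation(pop, t, ...)` for `t = 0, 1, 2, ...`. It records the first generation at which `is_extinct(pop)` holds, along with per-generation `f` and mean survival for the explicit reporting listed above.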
## Main Scientific Risks
The main ways the replication could still diverge from the paper are:
1. the wrong stochastic realization of fecundity and survival
2. an off-by-one time alignment in `t` versus offspring evaluation
3. mutation semantics that do not match the paper's effective `M = 2Ku`
treatment
4. a threshold-search implementation that is formally similar to Nunney's but
operationally too permissive or too strict
These are exactly the points where current diagnostics and paper-scale runs
should be examined.
## How This Effort Differs From Nunney
Nunney's paper presents the biological argument through a simulation workflow.
This repo separates that into layers:
- simulation kernel
- threshold search
- analysis and reporting
- dataset generation
- extinction fitting
- orchestration
That separation does not change the Track 1 target; it makes it inspectable.
## Current Bottom Line
The project should treat Nunney's paper as requiring three distinct
deliverables:
1. a faithful historical reconstruction of the published simulation and search
rule
2. a clear statement of where the paper is underdetermined
3. a modern replacement path that keeps the biological question while replacing
the threshold heuristic and performance limitations
Track 1 in `renunney` now addresses the first two. Track 2 remains the next
major scientific and engineering step.