# Nunney Analysis
Updated: 2026-04-11
## Purpose
This note gives a compact in-repo analysis of Nunney's main claims, equations,
and reported results, and how the current replication effort interprets them.
It is not a complete paper summary. Its job is to make the scientific and
implementation targets explicit enough that code and results can be reviewed
against them.
Primary paper:
- `nunney_cost_of_substitution_anz40-185.pdf`
Related internal notes:
- [COST_OF_SUBSTITUTION.md](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution/COST_OF_SUBSTITUTION.md)
- [TRACK1_BASELINE.md](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution/TRACK1_BASELINE.md)
- [PAPER_CODE_ALIGNMENT.md](/mnt/CIFS/pengolodh/Docs/Projects/collaborations/to_ptbc/evc/cost_of_substitution/PAPER_CODE_ALIGNMENT.md)
## Core Claim
Nunney's central claim is that the rate of environmental change a population
can tolerate depends on the cost of repeated adaptive substitutions, and that
this cost can be decomposed into:
- a fixed component, and
- a per-locus component.
The paper presents simulation evidence that both components depend on mutation
supply, summarized by `M = 2Ku`, and that extinction occurs when the
environment changes too quickly for the population to keep pace.
## Model Structure
The paper describes four interacting components:
1. environmental change at a constant rate
2. genotype-dependent survival relative to the moving optimum
3. density-dependent female fecundity
4. Mendelian transmission with mutation
The adaptive problem is staged so that one substitution is needed every `T`
generations at each selected locus. Smaller `T` means faster change and thus a
more demanding environment.
## Main Equations
### Growth/Fecundity
Nunney uses:
```text
R = 2 exp(r)
```
and:
```text
f = 2 exp(r * (1 - (N/K)^(1/r)))
```
Interpretation:
- `R` is the density-independent net reproductive rate.
- `f` is density-dependent female fecundity.
- fecundity is genotype-independent; selection enters through survival.
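
As a minimal sketch, the two expressions can be written directly in Python to
check their limiting behaviour. The function names and the values of `r` and
`K` are illustrative assumptions, not taken from the paper or from the repo.

```python
import math


def net_reproductive_rate(r: float) -> float:
    """Density-independent net reproductive rate, R = 2 exp(r)."""
    return 2.0 * math.exp(r)


def female_fecundity(N: float, K: float, r: float) -> float:
    """Density-dependent female fecundity, f = 2 exp(r * (1 - (N/K)^(1/r)))."""
    return 2.0 * math.exp(r * (1.0 - (N / K) ** (1.0 / r)))


# Illustrative parameter values only.
r = 0.5
K = 1000.0

f_empty = female_fecundity(0.0, K, r)      # equals R when N = 0
f_at_K = female_fecundity(K, K, r)         # equals 2 when N = K
f_above_K = female_fecundity(2 * K, K, r)  # falls below 2 when N > K
```

At `N = 0` fecundity equals `R`, and at `N = K` it equals 2, which corresponds
to exact replacement if the sex ratio is even and no offspring are lost to
selection.
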
### Fitness / Offspring Survival
The key selection equation is:
```text
w_i = exp(-(r/n) * Σ_j (Av_ij - t/T)^2)
```
where:
- `Av_ij` is the mean allelic value at locus `j` in genotype `i`
- `t/T` is the moving optimum on the allele-value scale
- `n` is the number of loci
Interpretation:
- survival declines as genotype means lag the moving optimum,
- the factor `r/n` scales selection intensity across different numbers of
loci,
- and the Gaussian form makes tracking lag the central state variable.
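
The survival equation can be sketched as a small Python function. The function
name, argument layout, and numerical values below are assumptions made for
illustration; only the formula itself comes from the text above.

```python
import math
from typing import Sequence


def offspring_survival(
    mean_allelic_values: Sequence[float], t: float, T: float, r: float
) -> float:
    """w_i = exp(-(r/n) * sum_j (Av_ij - t/T)^2) for one genotype."""
    n = len(mean_allelic_values)
    optimum = t / T  # moving optimum on the allele-value scale
    squared_lag = sum((av - optimum) ** 2 for av in mean_allelic_values)
    return math.exp(-(r / n) * squared_lag)


# Illustrative values only: optimum is 1.0 at t = T.
r, t, T = 0.5, 100.0, 100.0

w_on_target = offspring_survival([1.0, 1.0], t, T, r)       # no lag at any locus
w_two_loci = offspring_survival([0.0, 0.0], t, T, r)        # lag of 1 at 2 loci
w_four_loci = offspring_survival([0.0, 0.0, 0.0, 0.0], t, T, r)  # lag of 1 at 4 loci
```

A uniform lag of one allele step gives `w = exp(-r)` whether the genotype has
two loci or four, which is the sense in which `r/n` keeps selection intensity
comparable across locus counts.
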
### Mutation Supply
The paper uses `u` as the mutation-rate parameter and `M = 2Ku` as a derived
population-level mutation-supply quantity for comparing treatments.
Important consequence for replication:
- `u` is the paper-native input,
- `M` is a derived comparison variable,
- and the simulation must apply mutation to both allele copies at each diploid
locus, because the factor 2 in `M = 2Ku` counts both copies.
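
A short sketch of the bookkeeping, under the assumption that `u` acts
independently on each allele copy at a locus in each generation. The function
names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def mutation_supply(K: int, u: float) -> float:
    """Derived comparison variable M = 2Ku."""
    return 2 * K * u


def count_new_mutations(N: int, u: float, rng: random.Random) -> int:
    """Count mutated allele copies at one locus among N diploid individuals.

    ASSUMPTION: each of the 2N copies mutates independently with probability u.
    """
    return sum(rng.random() < u for _ in range(2 * N))


# Illustrative values only (not taken from the paper).
K, u = 1000, 1e-4
M = mutation_supply(K, u)

# One stochastic draw at N = K; its expectation under the assumption above is M.
draw = count_new_mutations(K, u, random.Random(0))
```

If the mutation step touched only one copy per individual, the expected count
per locus would be `Ku`, half of the `M` used to compare treatments.
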
## Threshold Claim
Nunney's reported threshold is not a mathematically defined extinction
probability threshold. It is a simulation-search heuristic:
- find the lowest `T` with no extinctions in 20 runs,
- search from below,
- then require no extinction at `1.02T`, `1.05T`, and `1.10T`,
- with extra retesting in borderline cases.
This matters because the paper's "threshold" mixes:
- biological persistence,
- stochastic variation,
- and the search protocol itself.
That is acceptable for Track 1 replication, but it is one of the main reasons
Track 2 exists.
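
The search rule can be sketched as follows. This is one reading of the
heuristic, not the repo's implementation: `run_goes_extinct` is a hypothetical
callback standing in for a full simulation run, the candidate grid and seeding
scheme are assumptions, "search from below" is read as trying candidate `T`
values in ascending order, and the extra retesting of borderline cases is
omitted because its details are not given here.

```python
from typing import Callable, Iterable, Optional, Sequence

# Hypothetical signature: (T, seed) -> True if that run went extinct.
ExtinctionTrial = Callable[[float, int], bool]


def no_extinctions(run_goes_extinct: ExtinctionTrial, T: float, n_runs: int = 20) -> bool:
    """True if none of n_runs independent runs at this T goes extinct."""
    return not any(run_goes_extinct(T, seed) for seed in range(n_runs))


def find_threshold(
    run_goes_extinct: ExtinctionTrial,
    candidate_Ts: Iterable[float],
    n_runs: int = 20,
    margins: Sequence[float] = (1.02, 1.05, 1.10),
) -> Optional[float]:
    """Lowest candidate T that survives at T and at each margin * T."""
    for T in sorted(candidate_Ts):  # search from below
        if not no_extinctions(run_goes_extinct, T, n_runs):
            continue
        if all(no_extinctions(run_goes_extinct, m * T, n_runs) for m in margins):
            return T
    return None


def toy_monotone(T: float, seed: int) -> bool:
    """Toy stand-in for a simulation: extinct whenever T < 50."""
    return T < 50


def toy_with_pocket(T: float, seed: int) -> bool:
    """Toy stand-in with an extra extinction pocket at 52 <= T < 53."""
    return T < 50 or 52 <= T < 53


threshold_monotone = find_threshold(toy_monotone, range(40, 61))
threshold_pocket = find_threshold(toy_with_pocket, range(40, 61))
```

The second toy case shows why the margin checks matter: `T = 50` passes on its
own but fails at `1.05T`, so the search moves on until a candidate clears all
three margins.
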
## Claimed Result Structure
The paper's most important reported pattern is that threshold cost can be
regressed on number of loci:
```text
C = C0 + n C1
```
where:
- `C0` is the fixed cost component,
- `C1` is the per-locus cost component,
- and both are analyzed as functions of mutation supply `M`.
Figure 1 and Table 1 are therefore not just descriptive outputs; they are the
main statistical structure the replication must recover if the implementation is
faithful.
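
As a minimal sketch of the regression step, an ordinary least-squares fit of
`C = C0 + n C1` needs only the locus counts and the corresponding threshold
costs. The function name is hypothetical, and the data below are synthetic
values constructed to lie exactly on a line; they are not results from the
paper or from this repo.

```python
from typing import Sequence, Tuple


def fit_cost_line(n_loci: Sequence[float], costs: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares estimates (C0, C1) for C = C0 + n * C1."""
    m = len(n_loci)
    mean_n = sum(n_loci) / m
    mean_c = sum(costs) / m
    sxx = sum((n - mean_n) ** 2 for n in n_loci)
    if sxx == 0:
        raise ValueError("need at least two distinct locus counts")
    sxy = sum((n - mean_n) * (c - mean_c) for n, c in zip(n_loci, costs))
    C1 = sxy / sxx               # per-locus cost component (slope)
    C0 = mean_c - C1 * mean_n    # fixed cost component (intercept)
    return C0, C1


# Synthetic example: costs generated from C0 = 10, C1 = 3.
n_loci = [1, 2, 4, 8]
costs = [13.0, 16.0, 22.0, 34.0]
C0, C1 = fit_cost_line(n_loci, costs)
```

Repeating the fit once per mutation-supply treatment yields one `(C0, C1)` pair
per value of `M`, which is the structure the replication is meant to compare.
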
## What Must Be Reproduced
A credible Track 1 replication should reproduce, or clearly fail to reproduce,
all of the following:
- the paper's parameter framing in terms of `u`, `K`, `R`, `T`, and derived `M`
- the threshold-search behavior over repeated stochastic runs
- the locus-sweep regression structure `C = C0 + n C1`
- the directional effect of mutation supply on fixed and per-locus cost
- the extinction/non-extinction boundary under the published search rule
## Key Ambiguities In The Paper
Several implementation details are underdetermined by the paper text and must
be treated explicitly as reconstruction choices:
- exact generation update order
- exact stochastic law for realized births
- exact mutation operator over the allele set
- exact practical allele-state truncation for finite runs
- exact sex realization rule
- exact extinction condition in code
These do not make replication impossible, but they mean "faithful replication"
is always conditional on a documented reconstruction policy.
## Current Replication Reading
The present Track 1 implementation uses the following interpretation:
- integer allele states with finite truncation tied to the run horizon
- lottery polygyny with one male sampled per female reproductive event
- births drawn stochastically from fecundity
- offspring survival governed by `w_i`
- extinction on zero population or absence of one sex
- explicit reporting of `f`, mean `w`, `f*w`, mutation supply, allele tracking,
and extinction timing
This is intended to stay as close as possible to the paper while making the
reconstruction auditable.
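
To make these choices concrete, the following is a rough single-generation
sketch. It is not the repo's simulation kernel. Every name in it is
hypothetical, and it fixes several of the ambiguous details by assumption:
Poisson-distributed births, one sire drawn per mother per generation, symmetric
unit-step mutation, the optimum evaluated at the parents' generation `t`, and
no allele-state truncation.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

Genotype = List[Tuple[int, int]]  # one integer allele pair per locus


@dataclass
class Individual:
    is_female: bool
    genotype: Genotype


def is_extinct(pop: List[Individual]) -> bool:
    """Extinct on zero population or absence of one sex."""
    females = sum(ind.is_female for ind in pop)
    return females == 0 or females == len(pop)


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplicative Poisson sampler (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def survival(genotype: Genotype, t: float, T: float, r: float) -> float:
    """w_i = exp(-(r/n) * sum_j (Av_ij - t/T)^2)."""
    lag2 = sum(((a + b) / 2 - t / T) ** 2 for a, b in genotype)
    return math.exp(-(r / len(genotype)) * lag2)


def transmit(pair: Tuple[int, int], u: float, rng: random.Random) -> int:
    """Mendelian choice of one allele, then mutation with probability u."""
    allele = rng.choice(pair)
    if rng.random() < u:
        allele += rng.choice((-1, 1))  # ASSUMPTION: symmetric unit steps
    return allele


def next_generation(pop: List[Individual], t: float, T: float, r: float,
                    K: float, u: float, rng: random.Random) -> List[Individual]:
    """Return the surviving offspring of pop, or [] if pop is extinct."""
    if is_extinct(pop):
        return []
    males = [ind for ind in pop if not ind.is_female]
    f = 2.0 * math.exp(r * (1.0 - (len(pop) / K) ** (1.0 / r)))
    offspring = []
    for mother in (ind for ind in pop if ind.is_female):
        father = rng.choice(males)        # ASSUMPTION: one sire per mother
        for _ in range(poisson(f, rng)):  # ASSUMPTION: Poisson births
            genotype = [(transmit(m, u, rng), transmit(p, u, rng))
                        for m, p in zip(mother.genotype, father.genotype)]
            # ASSUMPTION: survival is evaluated against the optimum at t.
            if rng.random() < survival(genotype, t, T, r):
                offspring.append(Individual(rng.random() < 0.5, genotype))
    return offspring
```

Each commented assumption corresponds to one of the reconstruction choices
listed under the paper's ambiguities, so changing a choice means changing one
clearly marked line.
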
## Main Scientific Risks
The main ways the replication could still diverge from the paper are:
1. the wrong stochastic realization of fecundity and survival
2. an off-by-one time alignment in `t` versus offspring evaluation
3. mutation semantics that do not match the paper's effective `M = 2Ku`
treatment
4. a threshold-search implementation that is formally similar to Nunney's but
operationally too permissive or too strict
These are exactly the points where current diagnostics and paper-scale runs
should be examined.
## How This Effort Differs From Nunney
Nunney's paper presents the biological argument through a simulation workflow.
This repo separates that into layers:
- simulation kernel
- threshold search
- analysis and reporting
- dataset generation
- extinction fitting
- orchestration
That separation does not change the Track 1 target; it makes it inspectable.
## Current Bottom Line
The project should treat Nunney's paper as requiring three distinct
deliverables:
1. a faithful historical reconstruction of the published simulation and search
rule
2. a clear statement of where the paper is underdetermined
3. a modern replacement path that keeps the biological question while replacing
the threshold heuristic and performance limitations
Track 1 in `renunney` now addresses the first two. Track 2 remains the next
major scientific and engineering step.