This entry posts the calibrated one-page visual summary for the transport-geometry framing and experiment sequencing.
Source file: {{ page.source_path }}
The Transport Geometry Framework: Visual Summary (Calibrated)
Status Legend
- Observed: directly supported by current runs.
- Hypothesis: theory-consistent but not yet validated in this setup.
- Test: discriminating experiment with pass/fail criteria.
One-Page Concept Map
┌──────────────────────────────────────────────────────────────┐
│                   TRANSFORMER FORWARD PASS                   │
│                                                              │
│   Input -> Attention -> MLP -> ... -> Logits                 │
│                │                                             │
│                v                                             │
│   (Hypothesis) attention weights can be interpreted as       │
│   entropy-regularized transport plans                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘
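The boxed hypothesis can be made concrete for a single query row. A minimal numpy sketch (scores and temperature are assumed toy values): the attention row softmax(s/τ) is the unique maximizer of the linear score plus τ-scaled entropy over the probability simplex, which is the one-row analogue of an entropy-regularized transport plan.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def objective(a, s, tau):
    # Linear score plus scaled Shannon entropy: <a, s> + tau * H(a).
    return float(a @ s - tau * np.sum(a * np.log(a + 1e-12)))

# Toy scores q_i^T k_j for one query against 8 keys (assumed values).
s = rng.normal(size=8)
tau = 0.7

# The attention row A_i = softmax(s / tau) maximizes the regularized objective.
a_star = softmax(s / tau)

# Sanity check: no random point on the simplex scores higher.
candidates = rng.dirichlet(np.ones(8), size=1000)
best_candidate = max(objective(a, s, tau) for a in candidates)
```

Nothing here depends on the full transport (two-marginal) formulation; it only illustrates why the softmax row has the entropy-regularized form claimed in the box.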
SAE intervention point: residual stream h_L

h_L -> SAE -> h_hat_L
 |               |
 |               +-- Reconstruction metrics (Observed): R2, cosine, MSE
 |
 +-- Behavioral metric (Observed): CE_rec after patching

Candidate bridge (Hypothesis):
  geometry-aware sensitivity at layer L predicts CE drift
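Both metric families above can be sketched in a few lines of numpy. The shapes, noise level, and the linear readout `W` are assumptions standing in for a real model's downstream map; the point is only how R2, cosine, MSE, and a patched cross-entropy are computed from h_L and its reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual-stream activations (assumed shapes): n tokens, d dims.
n, d, V = 256, 32, 50
h = rng.normal(size=(n, d))                 # h_L
h_hat = h + 0.1 * rng.normal(size=(n, d))   # h_hat_L from a hypothetical SAE

# Reconstruction metrics (Observed): R2, cosine, MSE.
mse = float(np.mean((h - h_hat) ** 2))
r2 = float(1.0 - np.sum((h - h_hat) ** 2) / np.sum((h - h.mean(axis=0)) ** 2))
cosine = float(np.mean(np.sum(h * h_hat, axis=1)
                       / (np.linalg.norm(h, axis=1) * np.linalg.norm(h_hat, axis=1))))

# Behavioral metric (Observed): cross-entropy after patching h_hat_L back in.
# An assumed linear readout W stands in for the real downstream map.
W = rng.normal(size=(V, d)) / np.sqrt(d)
targets = rng.integers(0, V, size=n)

def cross_entropy(h_in):
    logits = h_in @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(n), targets].mean())

ce_clean = cross_entropy(h)      # unpatched forward pass
ce_rec = cross_entropy(h_hat)    # patched with the reconstruction
```

In a real run, `cross_entropy(h_hat)` would be replaced by patching h_hat_L into the actual forward pass; the reconstruction metrics are computed exactly as shown.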
Three Spaces (What We Mean)
1. Activation (primal) space
- Object: hidden vectors $h_L$.
- Typical metric: Euclidean error $\|h_L-\hat h_L\|^2$.
- Used by: standard SAE reconstruction objectives.
2. Probability-sensitive space
- Object: downstream predictive behavior.
- Typical local metric: pullback of output Fisher through downstream map.
- Candidate link to behavior: CE sensitivity to perturbation direction.
3. Attention-plan space (optional second-wave)
- Object: attention matrices before/after patching.
- Candidate diagnostics: per-head plan divergence (KL/JS).
- Role: mechanism probe, not required for first-pass paper claim.
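For the third space, the candidate per-head diagnostic can be sketched directly. This is a minimal sketch assuming row-stochastic attention matrices; the Dirichlet-sampled patterns are toy stand-ins for before/after-patching attention of one head.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_row_js(P, Q, eps=1e-12):
    """Mean Jensen-Shannon divergence between corresponding rows of two
    row-stochastic attention matrices (one head, before vs. after patching)."""
    M = 0.5 * (P + Q)
    def row_kl(A, B):
        return np.sum(A * (np.log(A + eps) - np.log(B + eps)), axis=1)
    return float(np.mean(0.5 * row_kl(P, M) + 0.5 * row_kl(Q, M)))

# Toy attention patterns (assumed): T query positions over T key positions.
T = 16
P = rng.dirichlet(np.ones(T), size=T)   # plan before patching
Q = rng.dirichlet(np.ones(T), size=T)   # plan after patching
divergence = mean_row_js(P, Q)
```

JS (rather than raw KL) keeps the diagnostic symmetric and bounded by ln 2, which makes per-head values comparable.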
Core Observed Pattern
Observed in current budgeted regime:
- At mid layers with fixed low $k$: the larger model can show higher $R^2$ yet lower $CE_{rec}$.
- That is, reconstruction quality and behavioral preservation can diverge: improvement on one metric does not guarantee improvement on the other.
What is not observed yet:
- Whether this persists after stronger low-$k$ convergence checks.
- Whether pullback/Fisher metrics explain the residual gap.
Decision Tree (Execution-First)
START
  |
  +--> Test A: low-k token sweep (10M/50M/100M, both models)
  |      |
  |      +--> gap shrinks to ~0 within uncertainty -> optimization-dominant story
  |      |
  |      +--> residual gap remains -> continue mechanism tests
  |
  +--> Test B: sensitivity-weighted distortion vs R2
  |      |
  |      +--> SWD out-predicts R2 for CE -> geometry-aware support
  |      |
  |      +--> no gain -> revisit hypotheses
  |
  +--> Test C (optional second wave): pullback/Fisher and MI critics
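The Test B pass criterion can be made operational as a correlation comparison. Everything below is synthetic, assumed data constructed purely to illustrate the decision rule (in this toy construction, CE drift is driven mostly by SWD while R2 tracks it only loosely); real runs would substitute per-configuration measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumed, illustration only) for per-configuration
# summaries: SWD, R2, and the resulting CE drift after patching.
n_cfg = 40
swd = rng.gamma(2.0, 1.0, size=n_cfg)
r2 = 1.0 - 0.3 * swd / swd.max() + 0.2 * rng.normal(size=n_cfg)
ce_drift = 0.8 * swd + 0.1 * rng.normal(size=n_cfg)

def pearson(x, y):
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Test B decision rule: SWD out-predicts R2 for CE drift iff
# |corr(SWD, CE drift)| > |corr(R2, CE drift)|.
corr_swd = abs(pearson(swd, ce_drift))
corr_r2 = abs(pearson(r2, ce_drift))
geometry_aware_support = corr_swd > corr_r2
```

Rank correlations (Spearman) would be a natural secondary check if the relationship is monotone but nonlinear.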
Rate-Distortion-Geometry (Calibrated Framing)
rate (bits/token) <-> distortion (behavior) <-> metric choice
Key point:
The measured rate-distortion curve depends on the distortion geometry.
Interpretation:
- Under Euclidean distortion, an SAE can look better than its behavioral effect warrants.
- Under sensitivity-weighted distortion, rankings may align more closely with CE.
- This dependence is exactly what the experiments should test.
Operational Definition (Single Gap Definition)
Use one primary proxy-gap statistic: $$ \text{Gap}(L,k,T)=(R^2_{big}-R^2_{small})-(CE_{rec,big}-CE_{rec,small}). $$
And one sign-mismatch indicator: $$ I_{\text{mismatch}}=\mathbf{1}[\Delta R^2>0 \land \Delta CE_{rec}<0],\qquad \Delta X \equiv X_{big}-X_{small}. $$
No alternate definition unless explicitly marked secondary.
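Both statistics are one-liners; the sketch below transcribes the definitions for a single (L, k, T) cell. The numeric inputs are assumed toy values, not measured results.

```python
# Single primary proxy-gap statistic for one (L, k, T) cell.
def proxy_gap(r2_big, r2_small, ce_big, ce_small):
    # Gap = (R2_big - R2_small) - (CE_rec,big - CE_rec,small)
    return (r2_big - r2_small) - (ce_big - ce_small)

def sign_mismatch(r2_big, r2_small, ce_big, ce_small):
    # I_mismatch = 1[dR2 > 0 and dCE_rec < 0], with dX = X_big - X_small.
    return int((r2_big - r2_small) > 0 and (ce_big - ce_small) < 0)

# Toy cell (assumed values, for illustration only).
gap = proxy_gap(0.82, 0.75, 0.41, 0.52)
flag = sign_mismatch(0.82, 0.75, 0.41, 0.52)
```

Keeping the statistic in code alongside the definition makes it harder for a secondary variant to creep in unmarked.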
Key Equations (Reference)
Attention softmax
$$ A_{ij}=\frac{\exp(q_i^\top k_j/\tau)}{\sum_\ell \exp(q_i^\top k_\ell/\tau)} $$
Output Fisher for categorical distribution
$$ F_{out}=\operatorname{diag}(p)-pp^\top $$
Pullback metric (ideal form)
$$ G_L=J_L^\top F_{out}J_L $$
Sensitivity-weighted distortion (first-pass tractable proxy)
$$ SWD_L=\mathbb{E}[(g_L^\top(h_L-\hat h_L))^2],\quad g_L=\nabla_{h_L}\mathcal L_{CE} $$
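The three reference quantities can be exercised together in a small numpy sketch. The linear readout `W` is an assumption standing in for the true downstream map, so its Jacobian $J_L$ is just `W`; with a real network, `W` would be replaced by the Jacobian of the logits with respect to $h_L$, and $g_L$ by a backprop gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed): linear readout W from the layer-L residual to V logits.
d, V = 16, 50
W = rng.normal(size=(V, d)) / np.sqrt(d)
h = rng.normal(size=d)                    # h_L
h_hat = h + 0.05 * rng.normal(size=d)     # h_hat_L
target = 3

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Output Fisher for a categorical distribution: F_out = diag(p) - p p^T.
p = softmax(W @ h)
F_out = np.diag(p) - np.outer(p, p)

# Pullback metric (ideal form): G_L = J_L^T F_out J_L, with J_L = W here.
G_L = W.T @ F_out @ W

# CE gradient at h_L for the assumed readout: g_L = J_L^T (p - onehot).
onehot = np.zeros(V)
onehot[target] = 1.0
g_L = W.T @ (p - onehot)

# Sensitivity-weighted distortion proxy for this single sample.
swd = float((g_L @ (h - h_hat)) ** 2)
```

The sketch also makes the structural facts checkable: F_out is symmetric with zero row sums (probabilities are constrained to the simplex), and the pullback G_L inherits positive semidefiniteness from F_out.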
Practical Takeaway
Strong claim today:
- behavior-aware evaluation is necessary in low-$k$ mid-layer regimes.
Conditional claim to test next:
- geometry-aware metrics provide consistent predictive lift over Euclidean reconstruction metrics.