This entry presents the calibrated one-page visual summary of the transport-geometry framing and the experiment sequencing.

Source file: {{ page.source_path }}


The Transport Geometry Framework: Visual Summary (Calibrated)

Status Legend

  1. Observed: directly supported by current runs.
  2. Hypothesis: theory-consistent but not yet validated in this setup.
  3. Test: discriminating experiment with pass/fail criteria.

One-Page Concept Map

┌─────────────────────────────────────────────────────────────────────────┐
│                         TRANSFORMER FORWARD PASS                        │
│                                                                         │
│ Input -> Attention -> MLP -> ... -> Logits                             │
│          │                                                             │
│          v                                                             │
│  (Hypothesis) attention weights can be interpreted as                  │
│  entropy-regularized transport plans                                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

SAE intervention point: residual stream h_L

h_L -> SAE -> h_hat_L
 |             |
 |             +-- Reconstruction metrics (Observed): R2, cosine, MSE
 |
 +-- Behavioral metric (Observed): CE_rec, cross-entropy recovered after patching (higher is better)

Candidate bridge (Hypothesis):
  geometry-aware sensitivity at layer L predicts CE drift
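The two Observed metric families above can be sketched in a few lines. Everything here is a hypothetical stand-in: a random near-identity map plays the SAE round trip, and a fixed random linear readout plays the downstream model; only the metric definitions are meant literally.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_cls = 16, 10

# Hypothetical stand-ins: residual-stream batch h_L and an SAE round trip.
h = rng.normal(size=(512, d))                             # h_L
h_hat = h @ (np.eye(d) + 0.1 * rng.normal(size=(d, d)))   # h_hat_L = SAE(h_L)

# Reconstruction metrics (Observed): R^2, cosine similarity, MSE.
mse = np.mean((h - h_hat) ** 2)
r2 = 1.0 - np.sum((h - h_hat) ** 2) / np.sum((h - h.mean(0)) ** 2)
cos = np.mean(np.sum(h * h_hat, axis=1)
              / (np.linalg.norm(h, axis=1) * np.linalg.norm(h_hat, axis=1)))

# Behavioral metric: CE after patching h_hat_L into a (hypothetical) linear
# readout; the CE_rec statistic in this post is a normalized variant of this.
W = rng.normal(size=(d, n_cls))
y = rng.integers(0, n_cls, size=512)

def ce(hidden):
    z = hidden @ W
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    logp = z - np.log(np.sum(np.exp(z), axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(y)), y])

ce_drift = ce(h_hat) - ce(h)               # behavioral cost of patching
```

The point of computing both families on the same batch is that nothing forces `r2` and `ce_drift` to rank two SAEs the same way.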

Three Spaces (What We Mean)

1. Activation (primal) space

  1. Object: hidden vectors $h_L$.
  2. Typical metric: Euclidean error $\|h_L-\hat h_L\|^2$.
  3. Used by: standard SAE reconstruction objectives.

2. Probability-sensitive space

  1. Object: downstream predictive behavior.
  2. Typical local metric: pullback of output Fisher through downstream map.
  3. Candidate link to behavior: CE sensitivity to perturbation direction.

3. Attention-plan space (optional second-wave)

  1. Object: attention matrices before/after patching.
  2. Candidate diagnostics: per-head plan divergence (KL/JS).
  3. Role: mechanism probe, not required for first-pass paper claim.
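The per-head plan divergence in item 2 can be sketched as a row-wise Jensen-Shannon divergence between attention matrices before and after patching. Random Dirichlet rows stand in for real attention heads here; only the diagnostic itself is the point.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (nats) between two attention rows."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical diagnostic: average JS over query positions for one head,
# comparing the plan before vs after patching the residual stream.
rng = np.random.default_rng(0)
T = 8
A_before = rng.dirichlet(np.ones(T), size=T)   # rows sum to 1
A_after = rng.dirichlet(np.ones(T), size=T)

plan_divergence = np.mean([js_divergence(a, b)
                           for a, b in zip(A_before, A_after)])
```

JS is symmetric and bounded by ln 2, which makes per-head scores comparable across heads and layers in a way raw KL is not.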

Core Observed Pattern

Observed in the current budgeted regime:

  1. Mid-layer, fixed low $k$: the larger model can show higher $R^2$ yet lower $CE_{rec}$ (less cross-entropy recovered after patching).
  2. That is, Euclidean reconstruction quality and behavioral preservation can diverge.

What is not observed yet:

  1. Whether this persists after stronger low-$k$ convergence checks.
  2. Whether pullback/Fisher metrics explain the residual gap.

Decision Tree (Execution-First)

START
 |
 +--> Test A: low-k token sweep (10M/50M/100M, both models)
 |      |
 |      +--> gap shrinks to ~0 within uncertainty -> optimization-dominant story
 |      |
 |      +--> residual gap remains -> continue mechanism tests
 |
 +--> Test B: sensitivity-weighted distortion vs R2
 |      |
 |      +--> SWD predicts CE better than R2 -> supports geometry-aware story
 |      |
 |      +--> no gain -> revisit hypotheses
 |
 +--> Test C (optional second wave): pullback/Fisher and MI critics

Rate-Distortion-Geometry (Calibrated Framing)

rate (bits/token)  <->  distortion (behavior)  <->  metric choice

Key point:
The measured rate-distortion curve depends on the distortion geometry.

Interpretation:

  1. Under Euclidean distortion, an SAE can look better than its behavioral impact warrants.
  2. Under sensitivity-weighted distortion, rankings may align more closely with CE.
  3. This is exactly what the experiments should test.
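A two-dimensional toy makes the contrast between points 1 and 2 concrete: two reconstruction errors of identical Euclidean size can differ sharply once weighted by a sensitivity direction. The gradient stand-in `g` below is made up for illustration.

```python
import numpy as np

g = np.array([1.0, 0.0])         # sensitivity direction (CE-gradient stand-in)
delta_a = np.array([0.1, 0.0])   # error aligned with the sensitive direction
delta_b = np.array([0.0, 0.1])   # same Euclidean norm, orthogonal to g

eucl_a, eucl_b = np.sum(delta_a ** 2), np.sum(delta_b ** 2)
swd_a, swd_b = (g @ delta_a) ** 2, (g @ delta_b) ** 2

# Euclidean distortion ties the two errors; sensitivity weighting does not.
assert np.isclose(eucl_a, eucl_b)
assert swd_a > swd_b
```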

Operational Definition (Single Gap Definition)

Use one primary proxy-gap statistic: $$ \text{Gap}(L,k,T)=(R^2_{big}-R^2_{small})-(CE_{rec,big}-CE_{rec,small}). $$

And one sign mismatch indicator: $$ I_{\text{mismatch}}=\mathbf{1}[\Delta R^2>0 \land \Delta CE_{rec}<0]. $$

No alternate definition unless explicitly marked secondary.
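Both statistics follow directly from per-model summaries. The numbers below are made up; $CE_{rec}$ is read here as recovered cross-entropy (higher is better), which is what makes $\Delta CE_{rec}<0$ the mismatch condition.

```python
def gap_and_mismatch(r2_big, r2_small, ce_big, ce_small):
    """Primary proxy-gap statistic and sign-mismatch indicator.

    ce_* is CE_rec (cross-entropy recovered; higher is better), so
    d_ce < 0 means the big model preserves behavior *worse*.
    """
    d_r2 = r2_big - r2_small
    d_ce = ce_big - ce_small
    gap = d_r2 - d_ce
    mismatch = (d_r2 > 0) and (d_ce < 0)
    return gap, mismatch

# Made-up example: big model reconstructs better but recovers less CE.
gap, mismatch = gap_and_mismatch(r2_big=0.92, r2_small=0.88,
                                 ce_big=0.70, ce_small=0.75)
# gap ≈ 0.09 (recon advantage not matched behaviorally); mismatch is True
```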


Key Equations (Reference)

Attention softmax

$$ A_{ij}=\frac{\exp(q_i^\top k_j/\tau)}{\sum_\ell \exp(q_i^\top k_\ell/\tau)} $$
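The transport-plan reading of this softmax rests on each attention row being a probability distribution over key positions, sharpened as $\tau$ shrinks. A minimal sketch with random queries and keys (all shapes hypothetical):

```python
import numpy as np

def attention_weights(Q, K, tau=1.0):
    """Row-stochastic attention A_ij = softmax_j(q_i . k_j / tau)."""
    scores = Q @ K.T / tau
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Q, K = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
A = attention_weights(Q, K, tau=np.sqrt(8))   # tau = sqrt(d_k), the usual choice

# Each row is a distribution over key positions: nonnegative, sums to 1.
assert np.allclose(A.sum(axis=1), 1.0)
```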

Output Fisher for categorical distribution

$$ F_{out}=\operatorname{diag}(p)-pp^\top $$
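A quick numerical check of the two structural facts about $F_{out}$ that matter downstream: it is symmetric positive semidefinite, and it annihilates the all-ones direction (softmax shift invariance).

```python
import numpy as np

def output_fisher(p):
    """Fisher information of a categorical distribution in logit coordinates."""
    return np.diag(p) - np.outer(p, p)

p = np.array([0.7, 0.2, 0.1])
F = output_fisher(p)

assert np.allclose(F, F.T)                       # symmetric
assert np.all(np.linalg.eigvalsh(F) >= -1e-12)   # positive semidefinite
assert np.allclose(F @ np.ones(3), 0.0)          # ones is a null direction
```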

Pullback metric (ideal form)

$$ G_L=J_L^\top F_{out}J_L $$
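For a purely linear readout (a deliberate simplification of the real downstream map), $J_L$ is constant and the pullback is available in closed form, which is enough to see what $G_L$ does: it prices perturbation directions by their effect on the output distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 6, 4

# Hypothetical linear readout: logits = h @ W, so J_L = W.T (shape C x d).
W = rng.normal(size=(d, C))
logits = rng.normal(size=C)
p = np.exp(logits - logits.max()); p /= p.sum()

F_out = np.diag(p) - np.outer(p, p)
G = W @ F_out @ W.T                  # pullback metric J_L^T F_out J_L on h-space

# G is symmetric PSD: the cost delta^T G delta is nonnegative in any direction.
delta = rng.normal(size=d)
assert np.allclose(G, G.T)
assert delta @ G @ delta >= -1e-12
```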

Sensitivity-weighted distortion (first-pass tractable proxy)

$$ SWD_L=\mathbb{E}[(g_L^\top(h_L-\hat h_L))^2],\quad g_L=\nabla_{h_L}\mathcal L_{CE} $$
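The proxy is tractable because $g_L$ is just a backward pass. With a softmax + cross-entropy readout the gradient has the closed form $W(p-\text{onehot}(y))$, so a full SWD sketch needs no autodiff; the linear readout and random data below are, again, hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, n = 8, 4, 256

# Hypothetical linear readout from the residual stream: logits = h @ W.
W = rng.normal(size=(d, C))
h = rng.normal(size=(n, d))
h_hat = h + 0.05 * rng.normal(size=(n, d))   # SAE reconstruction stand-in
y = rng.integers(0, C, size=n)

# Exact CE gradient w.r.t. h for softmax + cross-entropy: g = (p - onehot) W^T.
logits = h @ W
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
onehot = np.eye(C)[y]
g = (p - onehot) @ W.T                       # per-example g_L, shape (n, d)

# Sensitivity-weighted distortion: E[(g_L^T (h_L - h_hat_L))^2].
swd = np.mean(np.sum(g * (h - h_hat), axis=1) ** 2)
```

Note that SWD only charges for the component of the reconstruction error that lies along the loss gradient, which is exactly why it can disagree with $R^2$.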


Practical Takeaway

Strong claim today:

  1. Behavior-aware evaluation is necessary in low-$k$ mid-layer regimes.

Conditional claim to test next:

  1. Geometry-aware metrics provide consistent predictive lift over Euclidean reconstruction metrics.