Research Log
Notes, drafts, and working claims from ongoing research in mechanistic interpretability and audio generation with state space models (SSMs).
Audio Generation & SSMs
Mamba, linear attention, memory caching, and hybrid architectures for music generation.
-
MC-LA vs Hybrid MC-LA 1:3: Live Training (In Progress)
Two memory-cached linear attention variants training on 25k FMA tracks. Early results: the hybrid nearly matches full MC-LA on loss while running 1.9x faster. GRM gating is learnin
-
Memory Caching for SSMs: From Paper to Implementation
Applying Memory Caching (MC) from arXiv:2602.24281 to Mamba and linear attention for music generation — a novel extension the original paper never tested. Architecture decisions, O
-
SSMs for Music Generation: Baseline Experiments
Baseline comparison of Transformer vs Hybrid Mamba-Attention (1:3) for autoregressive music generation over DAC tokens. The hybrid matches the transformer on all codebook metrics w
Mechanistic Interpretability
SAE proxy gaps, scaling laws, transport geometry, and representation analysis.
-
Your SAE Looks Solved. Your Model Disagrees. Part II: Transport Geometry as a Testable Hypothesis
The sign mismatch was an optimization artifact. The magnitude gap is not. Here's what I think explains it and how I'm testing that.
-
Your SAE Looks Solved. Your Model Disagrees. Part III: The Gap Mostly Closes When You Train It (And Why That Matters)
Phase-2 update: the low-k mid-layer proxy mismatch weakens and reverses with larger SAE training budgets, implying a regime-dependent, optimization-heavy story in the tested settin
-
Primal-Dual Gaps Paper Scaffold (LaTeX Zip)
Posted scaffold archive for the paper draft workflow, including main.tex, references, and build instructions.
-
Proxy Gap, Transport Geometry, and Falsifiable Mechanism Tests (Paper Outline v2)
Working manuscript structure for the SAE proxy-gap program: claim ladder, pre-registered decision rules, and experiment plan.
-
Transport Geometry Framework: Visual Summary (Calibrated)
Visual concept map and decision tree for activation-space vs behavior-space metrics, including calibrated status labels and test gates.
-
Your SAE Looks Solved. Your Model Disagrees.
Initial published writeup on the depth-localized SAE proxy gap: high R^2 can still coincide with degraded CE preservation.