Work Trail
Chronological record of research entries and artifacts.
-
audio
MC-LA vs Hybrid MC-LA 1:3: Live Training (In Progress)
Two memory-cached linear attention variants training on 25k FMA tracks. Early results: the hybrid nearly matches full MC-LA on loss while running 1.9x faster. GRM gating is learning selectivity. FMA-L
-
audio
Memory Caching for SSMs: From Paper to Implementation
Applying Memory Caching (MC) from arXiv:2602.24281 to Mamba and linear attention for music generation, a novel extension the original paper never tested. Architecture decisions, Output Activation Cac
-
audio
SSMs for Music Generation: Baseline Experiments
Baseline comparison of Transformer vs Hybrid Mamba-Attention (1:3) for autoregressive music generation over DAC tokens. The hybrid matches the transformer on all codebook metrics while being architect
-
interpretability
Your SAE Looks Solved. Your Model Disagrees. Part II: Transport Geometry as a Testable Hypothesis
The sign mismatch was an optimization artifact. The magnitude gap is not. Here's what I think explains it and how I'm testing that.
-
interpretability
Your SAE Looks Solved. Your Model Disagrees. Part III: The Gap Mostly Closes When You Train It (And Why That Matters)
Phase-2 update: the low-k mid-layer proxy mismatch weakens and then reverses with larger SAE training budgets, suggesting that in the tested setting the gap is regime-dependent and driven largely by optimization.
-
interpretability
Primal-Dual Gaps Paper Scaffold (LaTeX Zip)
Posted scaffold archive for the paper draft workflow, including main.tex, references, and build instructions.
-
interpretability
Proxy Gap, Transport Geometry, and Falsifiable Mechanism Tests (Paper Outline v2)
Working manuscript structure for the SAE proxy-gap program: claim ladder, pre-registered decision rules, and experiment plan.
-
interpretability
Transport Geometry Framework: Visual Summary (Calibrated)
Visual concept map and decision tree for activation-space vs behavior-space metrics, including calibrated status labels and test gates.
-
interpretability
Your SAE Looks Solved. Your Model Disagrees.
Initial published writeup on the depth-localized SAE proxy gap: high R^2 can still coincide with degraded CE preservation.