Topological Signals in Learned Odor Embeddings

This entry points to the odor-topology representation audit.

The question was intentionally narrower than "what is the topology of odor space?":

if a learned odor embedding appears to show nontrivial topology, does that signal survive metric choice, subsampling, null models, checkpoint variation, and comparison to ordinary chemical baselines?

Main result

OpenPOM shows a reproducible persistent H1 signal, and that signal survives the obvious robustness checks: repeated subsampling, matched nulls, dataset changes, and checkpoint variation.
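
A minimal sketch of a subsampling-versus-null check, in plain Python. Every choice in it is an assumption, not the audit's actual pipeline: a Vietoris-Rips filtration (Euclidean distance by default), the lifetime of the most persistent H1 class as the summary statistic, and a coordinate-shuffle null that keeps each dimension's marginal distribution but destroys joint structure. The names h1_bars, max_h1_lifetime, shuffled_null and subsample_vs_null are hypothetical. The H1 computation is a brute-force Z/2 column reduction over every triangle, so it only suits small subsamples; a real analysis would more likely use a dedicated library such as Ripser or GUDHI.

```python
import math
import random
from itertools import combinations


def h1_bars(points, dist=math.dist):
    """Finite H1 (birth, death) bars of the Vietoris-Rips filtration over Z/2.

    Brute force: builds every edge and triangle, so keep len(points) small.
    """
    n = len(points)
    d = {e: dist(points[e[0]], points[e[1]]) for e in combinations(range(n), 2)}
    edges = sorted(d, key=d.get)              # filtration order of edges
    rank = {e: r for r, e in enumerate(edges)}
    length = [d[e] for e in edges]
    # A triangle enters the filtration together with its longest (highest-rank) edge.
    triangles = sorted(
        ({rank[(i, j)], rank[(i, k)], rank[(j, k)]}
         for i, j, k in combinations(range(n), 3)),
        key=max)
    owner = {}                                # pivot edge -> reduced column
    bars = []
    for boundary in triangles:
        death = length[max(boundary)]
        col = set(boundary)
        while col:                            # standard column reduction
            low = max(col)
            if low not in owner:
                owner[low] = col              # pair the loop born at edge `low` with this triangle
                if death > length[low]:       # drop zero-length bars
                    bars.append((length[low], death))
                break
            col ^= owner[low]                 # add columns mod 2
    return bars


def max_h1_lifetime(points, dist=math.dist):
    """Summary statistic: lifetime of the most persistent H1 class (0.0 if none)."""
    return max((death - birth for birth, death in h1_bars(points, dist)), default=0.0)


def shuffled_null(points, rng):
    """Hypothetical null: permute each coordinate independently across points.

    Keeps every per-dimension marginal while destroying joint structure.
    """
    columns = [list(c) for c in zip(*points)]
    for c in columns:
        rng.shuffle(c)
    return [tuple(row) for row in zip(*columns)]


def subsample_vs_null(points, size, repeats, seed=0, dist=math.dist):
    """Score repeated random subsamples and a shuffled null of each subsample."""
    rng = random.Random(seed)
    observed, null = [], []
    for _ in range(repeats):
        sub = rng.sample(points, size)
        observed.append(max_h1_lifetime(sub, dist))
        null.append(max_h1_lifetime(shuffled_null(sub, rng), dist))
    return observed, null


if __name__ == "__main__":
    # Toy data: a noisy ring, which should carry one clear H1 class.
    rng = random.Random(1)
    ring = [(math.cos(2 * math.pi * k / 60) + rng.gauss(0, 0.05),
             math.sin(2 * math.pi * k / 60) + rng.gauss(0, 0.05)) for k in range(60)]
    obs, null = subsample_vs_null(ring, size=20, repeats=10)
    wins = sum(o > z for o, z in zip(obs, null))
    print(f"{wins}/{len(obs)} subsamples beat their shuffled null")
```

The dist argument is where a metric-choice check would plug in; the seed makes each subsample/null pairing reproducible.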

The important caveat is that the signal is not unique to POM. Strong Morgan fingerprint baselines often match or exceed it on the raw topology metrics.

So the useful claim is not "POM discovers the hidden loops of odor." It is stricter:

compressed learned odor spaces can preserve real topological structure, but robust topology is not automatically a learned-embedding-only phenomenon.

Why it belongs here

This sits next to the biological representation-geometry work: it asks what a learned embedding actually justifies once the metric, null model, and baseline are treated seriously. The interesting part is the compression mismatch. POM is a 256-dimensional dense embedding; the strongest chemical baselines are 2048-bit sparse combinatorial encodings. Seeing robust topology survive in the compressed learned space is nontrivial, even if it does not uniquely favor the learned model.
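
The two representations are usually compared under different metrics, which is part of why metric choice matters here. A minimal sketch, assuming Tanimoto (Jaccard) distance for the sparse 2048-bit fingerprints, stored as sets of on-bit indices, and cosine distance for the 256-dimensional dense embedding. This entry does not say which metrics the audit used, and the function names below are hypothetical.

```python
import math


def tanimoto_distance(bits_a, bits_b):
    """1 - |A & B| / |A | B| for fingerprints stored as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return 0.0 if union == 0 else 1.0 - len(bits_a & bits_b) / union


def cosine_distance(u, v):
    """1 - cosine similarity for two nonzero dense real vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norms


def distance_matrix(items, dist):
    """Full pairwise distance matrix, the input a Vietoris-Rips computation consumes."""
    return [[dist(a, b) for b in items] for a in items]


if __name__ == "__main__":
    # Toy stand-ins: on-bit sets inside a 2048-bit space, and 256-dim dense vectors.
    fingerprints = [{3, 97, 1500}, {3, 97, 2047}, {12, 640}]
    embeddings = [[math.sin(i * (k + 1)) for i in range(256)] for k in range(3)]
    print(distance_matrix(fingerprints, tanimoto_distance))
    print(distance_matrix(embeddings, cosine_distance))
```

Storing a fingerprint as its set of on-bits keeps the cost proportional to the number of set bits rather than to all 2048 positions.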

Links