Spectral anomaly validator
Flag time-frequency windows whose spectrum deviates >z_thresh MAD
from the track’s own middle-60% baseline. Three axes per frame: RMS
amplitude, high-band energy (>3 kHz, for bird chirps / clicks),
low-band energy (<200 Hz, for thumps / rumble).
How scoring works
- Decode audio to mono 22.05 kHz
- Walk 50 ms frames with 50 ms hop (no overlap)
- Per frame: compute RMS, spectral centroid, high_ratio (>3 kHz), low_ratio (<200 Hz)
- Baseline: median + MAD of the middle 60% (excludes intro / outro / seam)
- Compute z-scores per axis
- Merge adjacent flagged frames (gap ≤ 250 ms)
- Emit one Finding per merged run
Verdict mapping
| Findings | Verdict |
|---|---|
| None | ship |
| Minor only | ship (policy allows minor) |
| Any moderate | regenerate |
| Any severe | needs_review |
Parameters
- z_thresh (default 4.0, nullable) — minimum MAD-σ to flag.
Higher = more permissive.
nulldisables the validator for that scope. - z_thresh_per_category (
Record<string, number | null>) — override by category.nature: nulldisables it for nature tracks (they have lots of legitimate transients). - z_thresh_per_pattern (
Record<string, number | null>) — track-id substring match. Longest pattern wins. E.g.white-noise: 5.0loosens the threshold for tracks whose id contains “white-noise”. - min_run_sec (default 0.04) — skip runs shorter than this (filters single-frame outliers).
Tuning philosophy
- DSP synthetic content: tight (z_thresh ~ 3-4). Anything outlier IS a bug.
- Field recordings (nature, animal): loose (z_thresh ~ 5-6 OR null). Real environmental sounds have intentional outliers.
- Pure noise (colored): very loose (~ 5-6). Noise has high variance by nature.
- Synth-pure (binaural beats, solfeggio): very tight (~ 3) OR paired with intent_fidelity_override.
Common false positives
- Loop-seam transients on freq-sweeps — fixed by
palindromic_loop - Heavy reverb tails being read as “tonal artifacts” — null the per_pattern entry for that track
- Detected transients in nature beds (bird chirps that ARE the track) — null the category
Runtime
~5s per 90s track on this EC2 box.