Spectral anomaly validator

Flags time-frequency windows whose spectrum deviates from the track’s own baseline by more than z_thresh MADs, where the baseline is taken from the middle 60% of the track. Each frame is scored on three axes: RMS amplitude, high-band energy (>3 kHz, for bird chirps and clicks), and low-band energy (<200 Hz, for thumps and rumble).
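
As a rough illustration of the baseline comparison, the TypeScript sketch below scores one axis in MAD units. The helper names (median, robustZScores), the 20%/80% cut points used to pick the middle 60%, the unscaled MAD, and the small floor on the MAD are all assumptions made for this example, not the validator's actual code.

```typescript
// Median of a numeric array (does not mutate the input).
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Deviation of every frame from the baseline, expressed in MAD units.
// The baseline (median and MAD) is estimated from the middle 60% of frames only.
function robustZScores(frames: number[]): number[] {
  // Assumption: "middle 60%" means dropping the first and last 20% of frames.
  const start = Math.floor(frames.length * 0.2);
  const end = Math.ceil(frames.length * 0.8);
  const baseline = frames.slice(start, end);

  const med = median(baseline);
  const mad = median(baseline.map((v) => Math.abs(v - med)));

  // Assumption: floor the MAD so a perfectly flat baseline does not divide by zero.
  const scale = Math.max(mad, 1e-9);
  return frames.map((v) => (v - med) / scale);
}
```

A frame would then be a flagging candidate when the absolute value of its score exceeds z_thresh on any axis; whether the real validator uses the absolute value or a one-sided test is not stated here.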

How scoring works

  1. Decode audio to mono 22.05 kHz
  2. Walk 50 ms frames with 50 ms hop (no overlap)
  3. Per frame: compute RMS, spectral centroid, high_ratio (>3 kHz), low_ratio (<200 Hz)
  4. Baseline: median + MAD of the middle 60% (excludes intro / outro / seam)
  5. Compute z-scores per axis
  6. Merge adjacent flagged frames (gap ≤ 250 ms)
  7. Emit one Finding per merged run
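
Steps 5 to 7 can be sketched roughly as follows for a single axis. This is a minimal TypeScript sketch under stated assumptions: the Run shape, the function name mergeFlaggedFrames, flagging on the absolute z-score, and applying min_run_sec after merging are all illustrative choices rather than the validator's confirmed behavior.

```typescript
const HOP_SEC = 0.05; // 50 ms frames with a 50 ms hop, as in step 2

// Hypothetical shape of one merged run; the real Finding type is not shown in this doc.
interface Run {
  startSec: number;
  endSec: number;
  peakZ: number;
}

function mergeFlaggedFrames(
  z: number[],
  zThresh: number,
  maxGapSec = 0.25,
  minRunSec = 0.04,
): Run[] {
  // Work in whole frames to avoid floating-point drift when comparing gaps.
  const maxGapFrames = Math.round(maxGapSec / HOP_SEC);
  const runs: { start: number; end: number; peakZ: number }[] = [];

  z.forEach((value, i) => {
    const score = Math.abs(value); // assumption: deviations in either direction count
    if (score <= zThresh) return;

    const last = runs[runs.length - 1];
    if (last !== undefined && i - last.end <= maxGapFrames) {
      // Close enough to the previous run: extend it.
      last.end = i + 1;
      last.peakZ = Math.max(last.peakZ, score);
    } else {
      runs.push({ start: i, end: i + 1, peakZ: score });
    }
  });

  // Drop runs shorter than minRunSec, then convert frame indices to seconds.
  return runs
    .filter((r) => (r.end - r.start) * HOP_SEC >= minRunSec)
    .map((r) => ({
      startSec: r.start * HOP_SEC,
      endSec: r.end * HOP_SEC,
      peakZ: r.peakZ,
    }));
}
```

In this sketch a lone flagged frame lasts 0.05 s, so it is dropped only when minRunSec is set above 0.05.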

Verdict mapping

Findings        Verdict
None            ship
Minor only      ship (policy allows minor)
Any moderate    regenerate
Any severe      needs_review
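
The mapping can be written as a small function. The sketch below is illustrative only: the Severity and Verdict type names and the function verdictFor are hypothetical, and it assumes that a severe finding takes precedence when moderate and severe findings occur together, which the table does not state explicitly. The policy condition on minor findings is not modeled.

```typescript
type Severity = "minor" | "moderate" | "severe";
type Verdict = "ship" | "regenerate" | "needs_review";

// Map the severities of all findings for a track to a single verdict.
function verdictFor(severities: Severity[]): Verdict {
  // Assumption: severe outranks moderate when both are present.
  if (severities.some((s) => s === "severe")) return "needs_review";
  if (severities.some((s) => s === "moderate")) return "regenerate";
  // No findings, or minor findings only.
  return "ship";
}
```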

Parameters

  • z_thresh (default 4.0, nullable) — minimum MAD-σ to flag. Higher = more permissive. null disables the validator for that scope.
  • z_thresh_per_category (Record<string, number | null>) — override by category. nature: null disables it for nature tracks (they have lots of legitimate transients).
  • z_thresh_per_pattern (Record<string, number | null>) — track-id substring match. Longest pattern wins. E.g. white-noise: 5.0 loosens the threshold for tracks whose id contains “white-noise”.
  • min_run_sec (default 0.04) — skip runs shorter than this (filters single-frame outliers).
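
One way the threshold lookup could work is sketched below. The parameter names come from the list above; the ThresholdConfig interface, the resolveZThresh function, and the precedence order (pattern, then category, then the global default) are assumptions, since only "longest pattern wins" is documented.

```typescript
interface ThresholdConfig {
  z_thresh: number | null;
  z_thresh_per_category?: Record<string, number | null>;
  z_thresh_per_pattern?: Record<string, number | null>;
}

// Returns the effective threshold for a track, or null when the validator is disabled.
function resolveZThresh(
  cfg: ThresholdConfig,
  trackId: string,
  category: string,
): number | null {
  // Pattern overrides: substring match on the track id, longest pattern wins.
  const perPattern = cfg.z_thresh_per_pattern ?? {};
  const bestPattern = Object.keys(perPattern)
    .filter((p) => trackId.includes(p))
    .sort((a, b) => b.length - a.length)[0];
  if (bestPattern !== undefined) return perPattern[bestPattern];

  // Category overrides (assumed to rank below pattern overrides).
  const perCategory = cfg.z_thresh_per_category ?? {};
  if (Object.prototype.hasOwnProperty.call(perCategory, category)) {
    return perCategory[category];
  }

  // Fall back to the global default, which may itself be null.
  return cfg.z_thresh;
}
```

Note that an explicit null entry is returned as-is rather than falling through to the next level, so a null override disables the validator for that scope.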

Tuning philosophy

  • DSP synthetic content: tight (z_thresh ~ 3-4). Any outlier IS a bug.
  • Field recordings (nature, animal): loose (z_thresh ~ 5-6 OR null). Real environmental sounds have intentional outliers.
  • Pure noise (colored): very loose (~ 5-6). Noise has high variance by nature.
  • Synth-pure (binaural beats, solfeggio): very tight (~ 3) OR paired with intent_fidelity_override.
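
For illustration, a configuration following these guidelines might look like the fragment below. The category keys other than nature and animal (dsp, noise, synth-pure) are hypothetical names, and the numbers are simply picked from the ranges above rather than taken from a real deployment.

```typescript
const tuning: {
  z_thresh: number | null;
  z_thresh_per_category: Record<string, number | null>;
} = {
  z_thresh: 4.0, // documented default
  z_thresh_per_category: {
    dsp: 3.5, // synthetic DSP content: tight
    nature: null, // field recordings: validator disabled
    animal: 5.5, // field recordings: loose
    noise: 5.5, // colored noise: very loose
    "synth-pure": 3.0, // binaural beats, solfeggio: very tight
  },
};
```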

Common false positives

  • Loop-seam transients on freq-sweeps — fixed by palindromic_loop
  • Heavy reverb tails being read as “tonal artifacts” — set the z_thresh_per_pattern entry for that track to null
  • Detected transients in nature beds (bird chirps that ARE the track) — set the z_thresh_per_category entry for that category to null

Runtime

~5s per 90s track on this EC2 box.