YAMNet events validator

Runs Google’s YAMNet (a 521-class CNN trained on AudioSet) over 1-second windows of the track. Any class that scores above score_threshold is flagged, unless that class is in the allowlist for the track's category.

How scoring works

  1. Decode audio to mono 16 kHz
  2. Load YAMNet from TensorFlow Hub (17 MB, cached at ~/.tfhub_cache)
  3. Inference: (num_frames, 521) score matrix
  4. Per frame: collect classes above threshold
  5. Group consecutive detections of the same class
  6. Emit one Finding per group, severity by max confidence
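
Steps 4 to 6 above can be sketched in plain Python. This is a minimal illustration, not the validator's actual code: the `Finding` fields, the `group_detections` name, and the use of nested lists for the score matrix are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical shape; the real Finding type may carry other fields.
    class_name: str
    start_frame: int
    end_frame: int  # inclusive
    max_score: float


def group_detections(scores, class_names, threshold=0.30):
    """Turn a (num_frames x num_classes) score matrix into grouped findings.

    A run is a stretch of consecutive frames in which the same class
    stays above the threshold.
    """
    open_runs = {}  # class index -> Finding still being extended
    findings = []
    for frame_idx, row in enumerate(scores):
        active = {i for i, s in enumerate(row) if s > threshold}
        # Close runs whose class is not above threshold in this frame.
        for i in list(open_runs):
            if i not in active:
                findings.append(open_runs.pop(i))
        # Start new runs or extend existing ones.
        for i in active:
            run = open_runs.get(i)
            if run is None:
                open_runs[i] = Finding(class_names[i], frame_idx, frame_idx, row[i])
            else:
                run.end_frame = frame_idx
                run.max_score = max(run.max_score, row[i])
    # Runs still open at the end of the track are findings too.
    findings.extend(open_runs.values())
    return sorted(findings, key=lambda f: (f.start_frame, f.class_name))
```

Each returned group keeps its maximum score, which is the value the severity mapping would be applied to.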

Verdict mapping

Confidence   Severity   Verdict impact
0.30-0.45    minor      ship (if only minor)
0.45-0.70    moderate   regenerate
> 0.70       severe     needs_review
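
The mapping can be written as two small functions. This is a sketch under stated assumptions: the band edges overlap at 0.45 and 0.70 in the table, so the example assigns both to moderate; it also assumes the worst severity decides the verdict and that a track with no findings ships. The function names are hypothetical.

```python
def severity(confidence):
    """Map a group's max confidence to a severity label."""
    if confidence > 0.70:
        return "severe"
    if confidence >= 0.45:  # assumption: 0.45 and 0.70 both count as moderate
        return "moderate"
    return "minor"


def verdict(confidences):
    """Combine the findings of one track into a verdict.

    Assumption: the worst severity wins, and no findings means ship.
    """
    severities = {severity(c) for c in confidences}
    if "severe" in severities:
        return "needs_review"
    if "moderate" in severities:
        return "regenerate"
    return "ship"
```

For example, `verdict([0.31, 0.40])` returns "ship", while one extra finding at 0.50 changes the result to "regenerate".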

Parameters

  • score_threshold (default 0.30) — global default
  • score_threshold_per_category — per-category override
  • allowed_classes: Record<category, string[]>. Classes in the allowlist for that category are filtered out before findings are emitted.
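
How the three parameters might interact is sketched below. The helper names and the `(class_name, score)` tuple shape are assumptions for illustration; only the parameter names come from the list above.

```python
def effective_threshold(category, score_threshold=0.30,
                        score_threshold_per_category=None):
    """Return the per-category override if one exists, else the global default."""
    overrides = score_threshold_per_category or {}
    return overrides.get(category, score_threshold)


def filter_allowed(detections, category, allowed_classes):
    """Drop detections whose class is allowlisted for this category.

    detections: list of (class_name, score) tuples (hypothetical shape).
    allowed_classes: dict mapping category -> list of class names.
    """
    allowed = set(allowed_classes.get(category, []))
    return [(name, score) for name, score in detections if name not in allowed]
```

A category with no allowlist entry falls back to an empty set, so every detection above the threshold is kept.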

Sample allowlist (current pipeline.yaml)

allowed_classes:
  nature: [Rain, Wind, Thunderstorm, Stream, Babbling brook, Bird, ...]
  calming: [Drone, Pad, Vibraphone, Piano, ...]
  frequencies: [Sine wave, Square wave, Triangle wave, Sawtooth wave, Beat, ...]
  affirmations: [Speech, "Narration, monologue", Female speech, Whispering, ...]

The full 521-class AudioSet ontology is bundled in tools/curate/yamnet_class_names.json. When you find a legitimate class being flagged on a track, add it to that category’s allowlist.

Common false positives to add to allowlist

  • Pink noise, White noise, Brown noise on the noises category
  • Speech on affirmations (it’s expected!)
  • Cat, Domestic animals, pets on calming-cat-purring
  • Hum on drone tracks
  • Vibraphone / Glockenspiel on bell tracks

Tuning philosophy

  • Start permissive (the 0.30 default) and tighten per category when needed
  • Always inspect the actual flagged classes before adding new ones
  • A single apparent false positive at 0.95 confidence usually points to a real problem in the audio (the model is very sure); investigate before silencing it
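
One way to inspect flagged classes before touching the allowlist is to tally them across tracks. The sketch below assumes findings are available as `(track_id, class_name, confidence)` tuples; that shape and the `summarize_flags` name are hypothetical.

```python
from collections import defaultdict


def summarize_flags(findings):
    """Summarize flagged classes as (class_name, track_count, max_confidence).

    Rows are sorted so that classes flagged on the most tracks come first,
    with ties broken by higher confidence, then by name.
    """
    stats = defaultdict(lambda: {"tracks": set(), "max_confidence": 0.0})
    for track_id, class_name, confidence in findings:
        entry = stats[class_name]
        entry["tracks"].add(track_id)
        entry["max_confidence"] = max(entry["max_confidence"], confidence)
    rows = [
        (name, len(entry["tracks"]), entry["max_confidence"])
        for name, entry in stats.items()
    ]
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))
```

A class that shows up on many tracks at low confidence is a plausible allowlist candidate; a class that appears once with a very high maximum deserves a listen first.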

Runtime

~5 s per 90 s track. Loading the TensorFlow model takes 1-2 s the first time in each process; it is cached afterward.
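
Per-process caching of this kind is commonly done by memoizing the loader. The sketch below uses a stand-in `_load_model_slow` function in place of the real TensorFlow Hub load, so both function names are hypothetical and the validator may cache differently.

```python
import functools
import time


def _load_model_slow():
    # Stand-in for the real model load; a real pipeline would load
    # YAMNet here instead of sleeping.
    time.sleep(0.01)
    return object()


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model once per process and reuse it on later calls."""
    return _load_model_slow()
```

Only the first call to `get_model()` pays the load cost; later calls in the same process return the same object.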