YAMNet events validator
Runs Google’s YAMNet (a 521-class AudioSet CNN) over roughly 1-second windows.
Flags any class detected above score_threshold unless that class
is in the per-category allowlist.
How scoring works
- Decode audio to mono 16 kHz
- Load YAMNet from TensorFlow Hub (17 MB, cached at ~/.tfhub_cache)
- Inference: (num_frames, 521) score matrix
- Per frame: collect classes above threshold
- Group consecutive detections of the same class
- Emit one Finding per group, severity by max confidence
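The per-frame collection and grouping steps above can be sketched as follows. This is a minimal illustration, not the validator's actual code: the `Finding` fields, the `group_detections` name, and the use of frame indices instead of timestamps are all assumptions, and "above threshold" is read as a strict `>`.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Finding:
    # Hypothetical shape; the real Finding type may carry more fields.
    class_name: str
    start_frame: int
    end_frame: int  # inclusive
    max_score: float


def group_detections(
    scores: Sequence[Sequence[float]],
    class_names: Sequence[str],
    threshold: float = 0.30,
) -> List[Finding]:
    """Collapse per-frame detections into one Finding per consecutive run."""
    open_runs: Dict[int, Finding] = {}  # class index -> run still in progress
    findings: List[Finding] = []
    for frame_idx, frame in enumerate(scores):
        for cls_idx, score in enumerate(frame):
            run = open_runs.get(cls_idx)
            if score > threshold:
                if run is None:
                    # First frame of a new run for this class.
                    open_runs[cls_idx] = Finding(
                        class_names[cls_idx], frame_idx, frame_idx, score
                    )
                else:
                    # Extend the run and track its peak confidence.
                    run.end_frame = frame_idx
                    run.max_score = max(run.max_score, score)
            elif run is not None:
                # Score dropped below threshold: the run is complete.
                findings.append(open_runs.pop(cls_idx))
    # Runs that were still open at the last frame.
    findings.extend(open_runs.values())
    return sorted(findings, key=lambda f: (f.start_frame, f.class_name))
```

Here `scores` stands in for the (num_frames, 521) matrix as a nested sequence; a NumPy array can be passed the same way because the function only iterates over rows and columns.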
Verdict mapping
| Confidence | Severity | Verdict impact |
|---|---|---|
| 0.30-0.45 | minor | ship (if only minor) |
| 0.45-0.70 | moderate | regenerate |
| > 0.70 | severe | needs_review |
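One way to read the table in code is shown below. It is a sketch under stated assumptions: the table does not say which band owns the boundary values, so here 0.45 and 0.70 both fall into "moderate"; the worst severity across all findings decides the verdict; and a track with no findings ships. The function names are hypothetical.

```python
from typing import Iterable, Optional

# Ordering used to pick the worst severity among several findings.
_RANK = {"minor": 0, "moderate": 1, "severe": 2}
_VERDICT = {"minor": "ship", "moderate": "regenerate", "severe": "needs_review"}


def severity_for(confidence: float) -> Optional[str]:
    """Map a group's max confidence to a severity (None if below 0.30)."""
    if confidence > 0.70:
        return "severe"
    if confidence >= 0.45:  # assumption: 0.45 itself counts as moderate
        return "moderate"
    if confidence >= 0.30:
        return "minor"
    return None


def verdict_for(severities: Iterable[str]) -> str:
    """Assumed rule: the worst severity present decides the verdict."""
    worst = max(severities, key=_RANK.__getitem__, default=None)
    return "ship" if worst is None else _VERDICT[worst]
```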
Parameters
- score_threshold (default 0.30): global threshold, used for any category without an override
- score_threshold_per_category: per-category override of score_threshold
- allowed_classes (Record<category, string[]>): classes in the allowlist for a category are filtered out before findings are emitted
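The way these three parameters interact can be sketched like this. The helper names and the `(class_name, score)` tuple shape are illustrative assumptions, as is the choice to treat a category with no allowlist entry as having an empty allowlist.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

Detection = Tuple[str, float]  # (class_name, score); hypothetical shape


def resolve_threshold(
    category: str,
    score_threshold: float = 0.30,
    score_threshold_per_category: Optional[Mapping[str, float]] = None,
) -> float:
    """Return the per-category override if present, else the global default."""
    overrides = score_threshold_per_category or {}
    return overrides.get(category, score_threshold)


def filter_allowed(
    detections: Sequence[Detection],
    category: str,
    allowed_classes: Mapping[str, Sequence[str]],
) -> List[Detection]:
    """Drop detections whose class is allowlisted for this category."""
    allowed = set(allowed_classes.get(category, []))
    return [(name, score) for name, score in detections if name not in allowed]
```

For example, with `allowed_classes = {"nature": ["Rain", "Wind"]}`, a `Rain` detection on a nature track is dropped while a `Speech` detection on the same track still becomes a finding.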
Sample allowlist (current pipeline.yaml)
    allowed_classes:
      nature: [Rain, Wind, Thunderstorm, Stream, Babbling brook, Bird, ...]
      calming: [Drone, Pad, Vibraphone, Piano, ...]
      frequencies: [Sine wave, Square wave, Triangle wave, Sawtooth wave, Beat, ...]
      affirmations: [Speech, "Narration, monologue", Female speech, Whispering, ...]
The full 521-class AudioSet ontology is bundled in
tools/curate/yamnet_class_names.json. When you find a legitimate
class being flagged on a track, add it to that category’s allowlist.
Common false positives to add to allowlist
- Pink noise, White noise, and Brown noise on the noises category
- Speech on affirmations (it’s expected!)
- Cat and Domestic animals, pets on calming-cat-purring
- Hum on drone tracks
- Vibraphone / Glockenspiel on bell tracks
Tuning philosophy
- Start permissive (the 0.30 default) and tighten per-category when needed
- Always inspect the actual flagged classes before adding new ones
- A single apparent false positive at 0.95 confidence usually points to a real problem (the model is very sure); investigate before silencing it
Runtime
~5 s per 90 s track. The TensorFlow model load takes 1-2 s the first time per process; cached afterward.
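The once-per-process load cost can be obtained with a memoized loader. The sketch below is an assumption about how such caching might look, not a description of the validator's code: `get_yamnet` is a hypothetical name, and the handle is the public TensorFlow Hub YAMNet model. `TFHUB_CACHE_DIR` is the environment variable TensorFlow Hub reads for its on-disk download cache.

```python
import functools
import os

YAMNET_HANDLE = "https://tfhub.dev/google/yamnet/1"


@functools.lru_cache(maxsize=1)
def get_yamnet():
    """Load YAMNet once per process; later calls reuse the same object."""
    # Point TF Hub's download cache at the directory named in this document,
    # unless the caller has already chosen one.
    os.environ.setdefault(
        "TFHUB_CACHE_DIR", os.path.expanduser("~/.tfhub_cache")
    )
    # Imported lazily so that importing this module stays cheap.
    import tensorflow_hub as hub

    return hub.load(YAMNET_HANDLE)
```

The first call to `get_yamnet()` pays the load cost; later calls in the same process return the already-loaded model. The on-disk cache separately avoids re-downloading the model across processes.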