ElevenLabs SFX (consensus regen)

When a catalog track can’t be synthesized convincingly with DSP (animal sounds, field-recording textures, anything organic), this pipeline regenerates it via ElevenLabs SFX, producing three variants from the prompt template for the track’s sound_class. Each candidate is then judged by three Gemini runs, and the pipeline promotes the first candidate that reaches 2-of-3 consensus.

How prompts vary by sound_class

The pipeline reads the track’s sound_class from sounds.json and picks a prompt template:

  • cultural-instrument: “pristine close-mic recording of {description}”
  • nature-ambient: “high-quality field recording of {description}”
  • animal-specific: “pristine close recording of {description}”
  • synth-ambient: “sustained synth pad of {description}”
  • voice-narration: (NOT used here — use the TTS affirmations pipeline)
  • mechanical-ambient: “high-quality recording of {description}”

It then generates 3 variants (different temperature seeds) per prompt template, each 22 s long (the ElevenLabs SFX maximum).
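The template lookup above can be sketched as a plain dictionary keyed by sound_class. This is a minimal illustration, not the pipeline's actual code: the template strings are copied from the list, while the names PROMPT_TEMPLATES and build_prompt are assumptions made for the example.

```python
# Template strings are taken from the list above.
# PROMPT_TEMPLATES and build_prompt are hypothetical names for this sketch.
PROMPT_TEMPLATES = {
    "cultural-instrument": "pristine close-mic recording of {description}",
    "nature-ambient": "high-quality field recording of {description}",
    "animal-specific": "pristine close recording of {description}",
    "synth-ambient": "sustained synth pad of {description}",
    "mechanical-ambient": "high-quality recording of {description}",
}


def build_prompt(sound_class: str, description: str) -> str:
    """Return the SFX prompt for a track, or raise if the class is not handled here."""
    if sound_class == "voice-narration":
        # Not generated via SFX; see the TTS affirmations pipeline.
        raise ValueError("voice-narration is handled by the TTS affirmations pipeline")
    try:
        template = PROMPT_TEMPLATES[sound_class]
    except KeyError:
        raise ValueError(f"unknown sound_class: {sound_class!r}") from None
    return template.format(description=description)
```

For example, build_prompt("nature-ambient", "rain on a tin roof") yields "high-quality field recording of rain on a tin roof".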

The post-processing chain

Every candidate goes through:

  1. period_snap_loop(target_sec=14) — snap the loop length to a whole multiple (N×) of the amplitude period, targeting 14 s
  2. hf_spectral_gate(hf_only_hz=2500, reduction_db=-15) — kill hiss
  3. master_bus chain: hi-pass 30 Hz, EQ, multiband, stereo widen, Schroeder reverb (mix 0.10), LUFS-norm, true-peak limit
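One way to picture the chain above is as ordered data plus a small runner. The sketch below is an assumption about structure only: the stage names and the first two parameter sets come from the list, but the master_bus keyword names (hipass_hz, reverb_mix), the registry argument, and apply_chain itself are hypothetical, and no DSP is implemented here.

```python
from typing import Any, Callable, Dict, List, Tuple

# Stage order and the first two parameter sets are from the list above.
# The master_bus keyword names below are hypothetical placeholders.
POST_CHAIN: List[Tuple[str, Dict[str, Any]]] = [
    ("period_snap_loop", {"target_sec": 14}),
    ("hf_spectral_gate", {"hf_only_hz": 2500, "reduction_db": -15}),
    ("master_bus", {"hipass_hz": 30, "reverb_mix": 0.10}),
]


def apply_chain(
    audio: Any,
    registry: Dict[str, Callable[..., Any]],
    chain: List[Tuple[str, Dict[str, Any]]] = POST_CHAIN,
) -> Any:
    """Run each stage in order, feeding the output of one into the next.

    `registry` maps a stage name to the callable that implements it.
    """
    for name, params in chain:
        audio = registry[name](audio, **params)
    return audio
```

Keeping the chain as data makes the stage order and parameters easy to log alongside each candidate.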

Consensus judging

Each candidate is judged 3× by Gemini 2.5 Pro. A run counts as passing when it returns verdict: ship with all axes ≥ 8. If at least ship_threshold (default 2) of the runs pass, the candidate is promoted; otherwise the next candidate is tried.

If all candidates fail, the track is left at its current shipping audio and flagged in the log as “consensus failed”.
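The selection logic can be sketched as follows. This is a hedged illustration rather than the real implementation: Judgment, run_passes, has_consensus, and pick_candidate are hypothetical names, the judge callable stands in for a Gemini call, and the sketch assumes that the verdict and the axis floor are both checked per run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Sequence, TypeVar

C = TypeVar("C")


@dataclass
class Judgment:
    verdict: str             # "ship" or anything else
    axes: Dict[str, float]   # per-axis scores from one judging run


def run_passes(j: Judgment, min_axis: float = 8) -> bool:
    """A run passes when the verdict is ship and every axis meets the floor."""
    return j.verdict == "ship" and all(s >= min_axis for s in j.axes.values())


def has_consensus(judgments: Sequence[Judgment], ship_threshold: int = 2) -> bool:
    """True when at least ship_threshold runs pass."""
    return sum(run_passes(j) for j in judgments) >= ship_threshold


def pick_candidate(
    candidates: Iterable[C],
    judge: Callable[[C], Judgment],  # stand-in for one Gemini judging call
    runs: int = 3,
    ship_threshold: int = 2,
) -> Optional[C]:
    """Return the first candidate that reaches consensus, or None if all fail."""
    for candidate in candidates:
        judgments = [judge(candidate) for _ in range(runs)]
        if has_consensus(judgments, ship_threshold):
            return candidate
    return None  # caller keeps the current audio and logs "consensus failed"
```

Returning None instead of raising keeps the "leave the shipping audio alone" path an ordinary outcome the caller can log.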

Why consensus matters

A single Gemini call has substantial variance. We’ve seen the same pure 639 Hz sine wave get 10/10 on one call and 2/10 on the next. 3-of-3 consensus on synth content was the minimum reliable threshold in Phase 5. For organic content, 2-of-3 is sufficient.
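To get a feel for how the threshold changes the odds, here is a small binomial sketch. It assumes each judging run passes independently with the same probability, which is a simplification and not something measured on this pipeline. The function name p_consensus is made up for the example.

```python
from math import comb


def p_consensus(p_single: float, runs: int = 3, threshold: int = 2) -> float:
    """Probability that at least `threshold` of `runs` independent calls pass,
    given each call passes with probability `p_single`."""
    return sum(
        comb(runs, k) * p_single**k * (1 - p_single) ** (runs - k)
        for k in range(threshold, runs + 1)
    )
```

Under that assumption, a candidate with a 30% chance of passing any single call clears 2-of-3 about 21.6% of the time but clears 3-of-3 only 2.7% of the time, which is the sense in which the stricter threshold filters out lucky passes.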

Cost + runtime

  • ElevenLabs: 3 attempts × $0.05 = $0.15
  • Gemini: ~$0.04 per consensus pass × 3 ≈ $0.12
  • Wall-clock: ~3 minutes total per track
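The figures above add up to roughly $0.27 per track when all three attempts are generated and judged. A small helper makes it easy to re-run the estimate if the unit prices change. The constants are the approximate prices quoted above, and the function name is hypothetical.

```python
ELEVENLABS_PER_ATTEMPT = 0.05  # USD, figure quoted above
GEMINI_PER_PASS = 0.04         # USD, approximate figure quoted above


def estimate_cost(attempts: int = 3, consensus_passes: int = 3) -> float:
    """Rough per-track cost in USD, rounded to cents."""
    total = attempts * ELEVENLABS_PER_ATTEMPT + consensus_passes * GEMINI_PER_PASS
    return round(total, 2)
```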

When to reach for it

  • A catalog track sounds artificial / DSP-ish where listeners expect organic
  • The DSP version is “okay” but not “that’s clearly a real X”
  • You’ve already tuned the underlying synth and it’s still not landing