How to use Text-to-Speech (TTS) with Sync

Last updated: March 10, 2026

Context

Sync integrates with ElevenLabs to let you generate lip-synced video from text, so you don't need a separate audio file. Most TTS problems come from one of three causes: the integration isn't enabled, the voice ID is invalid, or the script exceeds the character limit. This guide walks you through setup and these common pitfalls.

How to set up TTS

  1. Enable the ElevenLabs integration — Go to Settings → Integrations in the Sync Studio and toggle ElevenLabs on. You can use Sync's built-in ElevenLabs integration on any plan, or bring your own ElevenLabs API key on Creator plans and above.

  2. Choose a voice — You need a valid ElevenLabs voiceId (not the voice name or display label). Find voice IDs in the ElevenLabs Voice Library or your ElevenLabs dashboard.

  3. Write your script — Keep it under 5,000 characters per generation. For longer scripts, use the Segments API to split text across multiple TTS inputs.

  4. Submit your generation — In the Studio, select “Text” as your audio input, paste your script, choose your voice, and click Generate. Via the API, include a TTS input object with type: "text", your voiceId, and your script.
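Here is a minimal Python sketch of the API step above. It assembles a request body that pairs a video input with a TTS input and runs this guide's checks before anything is sent. Only `type: "text"` and `voiceId` come from this guide. The top-level `input` list, the `script` and `url` key names, the shape of the video input, and the `build_tts_request` helper are hypothetical, so confirm the real field names and endpoint in the Text-to-Speech Lipsync Guide. Python's `len()` counts Unicode code points, which may not match exactly how the service counts characters.

```python
import json

# Per-generation script limit stated in this guide.
MAX_SCRIPT_CHARS = 5000


def build_tts_request(video_url: str, voice_id: str, script: str) -> dict:
    """Assemble a hypothetical request body pairing a video input with a TTS input.

    Only `type: "text"` and `voiceId` are taken from the guide; the other key
    names (`input`, `url`, `script`) are assumptions.
    """
    if not video_url:
        # A TTS input alone produces audio but no video output.
        raise ValueError("a video input is required alongside the TTS input")
    if not voice_id:
        raise ValueError("voiceId is required (use the voice ID, not the voice name)")
    if len(script) > MAX_SCRIPT_CHARS:
        raise ValueError(
            f"script is {len(script)} characters; the limit is {MAX_SCRIPT_CHARS}"
        )
    return {
        "input": [
            {"type": "video", "url": video_url},  # hypothetical video input shape
            {"type": "text", "voiceId": voice_id, "script": script},
        ]
    }


if __name__ == "__main__":
    body = build_tts_request(
        "https://example.com/clip.mp4",  # placeholder URL
        "YOUR_VOICE_ID",  # placeholder; copy a real ID from your ElevenLabs dashboard
        "Hello! This is a short test script.",
    )
    print(json.dumps(body, indent=2))
```

These local checks mirror the `generation_text_length_exceeded` and `generation_input_validation_failed` errors, so a malformed request is caught before it is submitted.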

Common TTS errors and fixes

| Error | Cause | Fix |
| --- | --- | --- |
| Internal Server Error during TTS | ElevenLabs integration not enabled, or a temporary service issue | Check Settings → Integrations. If ElevenLabs is enabled, retry after a few minutes. Contact Sync support if the error persists. |
| “The voice may violate ElevenLabs Terms of Service” | The selected voice was flagged by ElevenLabs' content policy | Choose a different voice. Cloned voices of public figures may trigger this. |
| generation_text_length_exceeded | Script exceeds 5,000 characters | Shorten your script, or split it into segments using the Segments API. |
| generation_input_validation_failed | Invalid voiceId or missing required fields | Verify that your voiceId is a valid ElevenLabs voice ID string. Make sure the request includes both a video input and a TTS input. |
| Audio generated but no video output | Missing video input in the request | Include a valid video input alongside your TTS input. |
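For the generation_text_length_exceeded fix, one way to prepare a long script is to cut it into chunks of at most 5,000 characters at sentence boundaries, so no chunk stops mid-sentence. This is a minimal sketch that assumes sentences end in `.`, `!`, or `?` followed by whitespace. The `split_script` helper is hypothetical, and the sketch does not cover how each chunk is attached to a segment. That step is defined by the Segments API.

```python
import re

# Per-generation script limit stated in this guide.
MAX_SCRIPT_CHARS = 5000


def split_script(script: str, limit: int = MAX_SCRIPT_CHARS) -> list[str]:
    """Split `script` into chunks of at most `limit` characters.

    Breaks at sentence ends (., !, ? followed by whitespace) when possible.
    Whitespace between sentences is normalized to a single space.
    """
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        for piece in _fit(sentence, limit):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _fit(sentence: str, limit: int) -> list[str]:
    """Return the sentence whole if it fits, else word-wrapped pieces <= limit."""
    if len(sentence) <= limit:
        return [sentence]
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        # Edge case: a single "word" longer than the limit is hard-cut.
        while len(word) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Sentence boundaries are preferred so that each chunk reads naturally when spoken. The word-level fallback only applies to a single sentence that is longer than the limit on its own.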

Tips for best results

  • Use clean, well-punctuated text — TTS quality depends on clear sentence structure.
  • Avoid special characters, emojis, or excessive formatting in your script.
  • Test with a short clip first (under 30 seconds) to verify voice quality before processing longer content.
  • For multi-language content, ensure the selected voice supports the target language.
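The second and third tips can be partly automated. A small cleanup pass can run before you submit a script, and a rough duration estimate can help you keep a first test clip short. This is a minimal sketch under several assumptions. Emoji are matched only in the common Unicode ranges listed in the code. Markdown-style markers (`*`, `_`, `#`, backticks, `~`) are treated as "excessive formatting". The 150 words-per-minute speaking rate is a ballpark figure. The `clean_script` and `estimate_seconds` helpers are hypothetical, and none of these values come from Sync or ElevenLabs.

```python
import re

# Assumption: a rough set of emoji/pictograph ranges; not exhaustive.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F]"
)
# Assumption: Markdown-style markers treated as "excessive formatting".
FORMATTING_PATTERN = re.compile(r"[*_#`~]+")

# Assumption: average speaking rate, used only for a ballpark estimate.
WORDS_PER_MINUTE = 150


def clean_script(text: str) -> str:
    """Strip emoji and Markdown-style markers, then collapse runs of whitespace."""
    text = EMOJI_PATTERN.sub("", text)
    text = FORMATTING_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def estimate_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough spoken duration in seconds, based on word count alone."""
    return len(text.split()) / wpm * 60


if __name__ == "__main__":
    raw = "**Welcome** to our demo! 🎉\n\nToday we'll show   you how it works."
    script = clean_script(raw)
    print(script)
    print(f"~{estimate_seconds(script):.0f} s (aim for under 30 s on a first test)")
```

The cleanup does not add missing punctuation, so proofreading sentence structure is still up to you.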

Related docs: Text-to-Speech Lipsync Guide, Error Handling, Troubleshooting