How to use Text-to-Speech (TTS) with Sync
Last updated: March 10, 2026
Context
Sync integrates with ElevenLabs to let you generate lip-synced video from text — no separate audio file needed. Many users run into issues because the integration is not enabled, the voice ID is invalid, or the script exceeds the character limit. This guide walks you through setup and common pitfalls.
How to set up TTS
Enable the ElevenLabs integration — Go to Settings → Integrations in the Sync Studio and toggle ElevenLabs on. You can use Sync's built-in ElevenLabs integration on any plan, or bring your own ElevenLabs API key on Creator plans and above.
Choose a voice — You need a valid ElevenLabs voiceId (not the voice name or display label). Find voice IDs in the ElevenLabs Voice Library or your ElevenLabs dashboard.
Write your script — Keep it under 5,000 characters per generation. For longer scripts, use the Segments API to split text across multiple TTS inputs.
Submit your generation — In the Studio, select “Text” as your audio input, paste your script, choose your voice, and click Generate. Via the API, include a TTS input object with type: "text", your voiceId, and your script.
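The script limit and the API input from the steps above can be sketched in Python. This is a rough illustration under stated assumptions, not Sync's client code: the helper names split_script and build_inputs are hypothetical, and the "url" field on the video input is an assumption. Only type: "text", voiceId, and script come from this guide, so check the API reference for the exact request shape before sending anything.

```python
import re

MAX_CHARS = 5000  # per-generation script limit stated in this guide


def split_script(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a script at sentence boundaries so every chunk fits within `limit`."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_inputs(video_url: str, voice_id: str, script: str) -> list[dict]:
    """Pair a video input with a TTS input (hypothetical helper)."""
    return [
        # Assumption: the video input's field names are not given in this guide.
        {"type": "video", "url": video_url},
        # From this guide: type "text", a voiceId, and the script.
        {"type": "text", "voiceId": voice_id, "script": script},
    ]


# Example: one TTS input per chunk of a long script.
chunks = split_script("First sentence. Second sentence. Third sentence.", limit=35)
tts_inputs = [build_inputs("https://example.com/clip.mp4", "YOUR_VOICE_ID", c)[1] for c in chunks]
```

Splitting at sentence boundaries keeps each chunk readable by the voice model. How the Segments API consumes the resulting chunks is not shown here.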
Common TTS errors and fixes
| Error | Cause | Fix |
|---|---|---|
| Internal Server Error during TTS | ElevenLabs integration not enabled or temporary service issue | Confirm ElevenLabs is toggled on under Settings → Integrations. If it is enabled, retry after a few minutes. Contact Sync support if it persists. |
| “The voice may violate ElevenLabs Terms of Service” | Selected voice flagged by ElevenLabs content policy | Choose a different voice. Cloned voices of public figures may trigger this. |
| generation_text_length_exceeded | Script exceeds 5,000 characters | Shorten your script or split into segments using the Segments API. |
| generation_input_validation_failed | Invalid voiceId or missing required fields | Verify your voiceId is a valid ElevenLabs voice ID string. Ensure both video and TTS inputs are included. |
| Audio generated but no video output | Missing video input in request | Include a valid video input alongside your TTS input. |
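Several rows in the table can be caught before a request is sent. The sketch below is a hypothetical client-side check, not part of Sync's API. It assumes the input shape described in this guide (a list of inputs with a type field). The label missing_video_input is made up for illustration; the other two codes are the ones listed in the table. It cannot tell whether a voiceId actually exists in ElevenLabs, only whether one was supplied.

```python
MAX_CHARS = 5000  # script limit stated in this guide


def preflight(inputs: list[dict]) -> list[tuple[str, str]]:
    """Return (code, fix) pairs for problems detectable before submitting."""
    problems = []
    tts = next((i for i in inputs if i.get("type") == "text"), None)
    video = next((i for i in inputs if i.get("type") == "video"), None)

    if tts is None or not tts.get("voiceId") or not tts.get("script"):
        problems.append((
            "generation_input_validation_failed",
            "Include a TTS input with a voiceId and a script.",
        ))
    elif len(tts["script"]) > MAX_CHARS:
        problems.append((
            "generation_text_length_exceeded",
            "Shorten the script or split it into segments.",
        ))

    if video is None:
        # Hypothetical label: the table describes this case without an error code.
        problems.append((
            "missing_video_input",
            "Include a video input alongside the TTS input.",
        ))
    return problems


# Example: a TTS-only request with an over-long script.
for code, fix in preflight([{"type": "text", "voiceId": "YOUR_VOICE_ID", "script": "x" * 5001}]):
    print(f"{code}: {fix}")
```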
Tips for best results
- Use clean, well-punctuated text — TTS quality depends on clear sentence structure.
- Avoid special characters, emojis, or excessive formatting in your script.
- Test with a short clip first (under 30 seconds) to verify voice quality before processing longer content.
- For multi-language content, ensure the selected voice supports the target language.
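The first two tips can be applied automatically before a script is submitted. The following sketch is one possible cleanup pass. The function name clean_script is hypothetical, the emoji ranges are an approximation that covers common blocks only, and the set of characters treated as formatting is a judgment call, not a rule published by Sync or ElevenLabs.

```python
import re

# Approximate emoji coverage: common pictograph blocks plus variation selectors.
_EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE00-\uFE0F]")
# Markdown-style formatting marks that a TTS voice should not read aloud.
_MARKUP = re.compile(r"[*#`~]+")


def clean_script(text: str) -> str:
    """Strip emojis and formatting marks, then collapse runs of whitespace."""
    text = _EMOJI.sub("", text)
    text = _MARKUP.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_script("Hello 😀 **world**!  ")
```

Punctuation is left untouched on purpose, since sentence structure is what the first tip relies on.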
Related docs: Text-to-Speech Lipsync Guide • Error Handling • Troubleshooting