How to use Text-to-Speech (TTS) with Sync

Last updated: March 10, 2026

Context

Sync integrates with ElevenLabs to let you generate lip-synced video from text — no separate audio file needed. Many users run into issues because the integration is not enabled, the voice ID is invalid, or the script exceeds the character limit. This guide walks you through setup and common pitfalls.

How to set up TTS

Enable the ElevenLabs integration — Go to Settings → Integrations in the Sync Studio and toggle ElevenLabs on. You can use Sync's built-in ElevenLabs integration on any plan, or bring your own ElevenLabs API key on Creator plans and above.
Choose a voice — You need a valid ElevenLabs voiceId (not the voice name or display label). Find voice IDs in the ElevenLabs Voice Library or your ElevenLabs dashboard.
Write your script — Keep it under 5,000 characters per generation. For longer scripts, use the Segments API to split text across multiple TTS inputs.
Submit your generation — In the Studio, select “Text” as your audio input, paste your script, choose your voice, and click Generate. Via the API, include a video input together with a TTS input object that has type: "text", your voiceId, and your script.
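The API half of the last step can be sketched as a small Python helper that assembles a request body and runs the same checks behind the errors in the table below. Only type: "text", the voiceId, the script, and the need for a video input come from this guide. The key name "script", the top-level "input" list, the video object's "url" field, and the overall nesting are assumptions, so check the Sync API reference for the exact schema. The helper only builds the body; it does not send anything.

```python
import json

MAX_SCRIPT_CHARS = 5000  # per-generation limit stated in this guide


def build_tts_request(video_url: str, voice_id: str, script: str) -> dict:
    """Build a hypothetical request body pairing a video input with a TTS input."""
    # Mirrors "Audio generated but no video output": a video input is required.
    if not video_url:
        raise ValueError("a video input is required alongside the TTS input")
    # A client-side check can only confirm the ID is present, not that ElevenLabs knows it.
    if not voice_id or not voice_id.strip():
        raise ValueError("voiceId must be an ElevenLabs voice ID, not empty")
    if not script.strip():
        raise ValueError("script must not be empty")
    # Mirrors generation_text_length_exceeded.
    if len(script) > MAX_SCRIPT_CHARS:
        raise ValueError(
            f"script is {len(script)} characters; limit is {MAX_SCRIPT_CHARS}"
        )
    return {
        # ASSUMPTION: field names and nesting below are illustrative, not the official schema.
        "input": [
            {"type": "video", "url": video_url},
            {"type": "text", "voiceId": voice_id, "script": script},
        ]
    }


if __name__ == "__main__":
    body = build_tts_request(
        "https://example.com/clip.mp4", "YOUR_VOICE_ID", "Hello from Sync."
    )
    print(json.dumps(body, indent=2))
```

Failing fast on the client this way surfaces a missing video input or an over-long script before a generation is submitted, rather than after the API rejects it.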

Common TTS errors and fixes

Error	Cause	Fix
Internal Server Error during TTS	ElevenLabs integration not enabled or temporary service issue	Check that ElevenLabs is toggled on under Settings → Integrations. If it is, retry after a few minutes. Contact Sync support by email if the error persists.
“The voice may violate ElevenLabs Terms of Service”	Selected voice flagged by ElevenLabs content policy	Choose a different voice. Cloned voices of public figures may trigger this.
generation_text_length_exceeded	Script exceeds 5,000 characters	Shorten your script or split it into segments using the Segments API.
generation_input_validation_failed	Invalid voiceId or missing required fields	Verify your voiceId is a valid ElevenLabs voice ID string. Ensure both video and TTS inputs are included.
Audio generated but no video output	Missing video input in request	Include a valid video input alongside your TTS input.
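When a script triggers generation_text_length_exceeded, the splitting can be done client-side before you build segment inputs. The sketch below greedily packs whole sentences into chunks of at most 5,000 characters. The sentence-splitting regex is a simple heuristic (it will also split after abbreviations such as "Dr."), a single sentence longer than the limit is hard-split mid-text as a last resort, and how each chunk maps onto a Segments API request is not shown because that schema is not covered in this guide.

```python
import re

MAX_SCRIPT_CHARS = 5000  # per-generation limit stated in this guide


def split_script(script: str, max_chars: int = MAX_SCRIPT_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Heuristic: treat whitespace after ., ! or ? as a sentence boundary.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Last resort: a single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_script = "This is a sentence. " * 600  # roughly 12,000 characters
    for i, chunk in enumerate(split_script(long_script), start=1):
        print(i, len(chunk))
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each segment speakable on its own, which tends to give more natural pauses where the segments join.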

Tips for best results

Use clean, well-punctuated text — TTS quality depends on clear sentence structure.
Avoid special characters, emojis, or excessive formatting in your script.
Test with a short clip first (under 30 seconds) to verify voice quality before processing longer content.
For multi-language content, ensure the selected voice supports the target language.
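The first two tips and the 30-second test clip can be approximated with a small pre-processing step. This sketch drops most emoji (Unicode category "So" plus skin-tone modifiers and joiner characters) and Markdown-style characters (*, #, `, ~), then collapses whitespace. It also estimates spoken length from word count. Note that category "So" also covers symbols such as © and °, which you may prefer to spell out instead, and that the 150 words-per-minute rate is an assumed average, not a figure from Sync or ElevenLabs; actual pacing depends on the voice.

```python
import re
import unicodedata

# Invisible characters that glue emoji sequences together (ZWJ, variation selectors, keycap).
_EMOJI_JOINERS = {"\u200d", "\ufe0e", "\ufe0f", "\u20e3"}
# Markdown-style formatting characters that a TTS engine might read aloud or stumble on.
_MARKUP_CHARS = set("*#`~")


def _is_droppable(ch: str) -> bool:
    if ch in _EMOJI_JOINERS or ch in _MARKUP_CHARS:
        return True
    if 0x1F3FB <= ord(ch) <= 0x1F3FF:  # emoji skin-tone modifiers
        return True
    # "So" (Symbol, other) covers most emoji, but also symbols such as © and °.
    return unicodedata.category(ch) == "So"


def clean_script(text: str) -> str:
    """Drop emoji and markup characters, then collapse runs of whitespace."""
    kept = "".join(ch for ch in text if not _is_droppable(ch))
    return re.sub(r"\s+", " ", kept).strip()


def estimate_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Rough spoken duration; 150 wpm is an assumed average, not a measured rate."""
    return len(text.split()) / words_per_minute * 60.0


if __name__ == "__main__":
    script = clean_script("Welcome to the demo! 🎉  **Today** we cover TTS. 👋🏽")
    print(script)
    print(f"~{estimate_seconds(script):.1f} s")
```

Under the 150 wpm assumption, a test clip of under 30 seconds corresponds to roughly 75 words. Accented letters are left untouched, so the cleanup is safe for the multi-language scripts mentioned above.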

Related docs: Text-to-Speech Lipsync Guide • Error Handling • Troubleshooting