Universal TTS Guide

A comprehensive guide to TTS dataset prep and training


Guide 6: Troubleshooting and Resources


This guide offers solutions to common issues encountered during TTS data preparation, training, and inference, along with a list of useful tools and resources.


8. Troubleshooting Common Issues

Refer to this table when you encounter problems. Issues often trace back to data quality or configuration settings.

| Problem Category | Specific Issue | Possible Causes | Solutions | Relevant Guide(s) |
|---|---|---|---|---|
| Data Preparation | Script errors during chunking/normalization | Incorrect file paths; unsupported input audio format; missing dependencies (ffmpeg, pydub); extremely noisy or silent audio confusing silence detection. | Check script paths, install dependencies, adjust silence-detection parameters. | 1_DATA_PREPARATION.md |
| | Manifest generation skips many files | Mismatched filenames between audio and transcripts; empty transcript files; incorrect paths specified in the script; non-UTF-8 encoding in text files. | Verify naming, check paths, ensure transcript files have content and are UTF-8 encoded. | 1_DATA_PREPARATION.md |
| Training Setup | `pip install` fails | Missing system libraries (e.g., libsndfile-dev); incompatible Python version; network issues; conflicts between packages. | Read error messages carefully, install system libraries, use a virtual environment, check the framework docs for prerequisites. | 2_TRAINING_SETUP.md |
| | PyTorch reports CUDA is not available (`torch.cuda.is_available()` returns `False`) | Wrong PyTorch build installed (CPU-only); incompatible NVIDIA driver/CUDA toolkit version; GPU not detected by the OS. | Reinstall PyTorch with the correct CUDA version from the official site; update drivers. | 2_TRAINING_SETUP.md |
| Training Execution | CUDA out-of-memory (OOM) error at start or during training | `batch_size` too large for GPU VRAM; model architecture too complex; memory leak in framework or custom code. | Reduce `batch_size` in the config; enable Automatic Mixed Precision (AMP/FP16) if available; check for framework updates. | 2_TRAINING_SETUP.md, 3_MODEL_TRAINING.md |
| | Training loss is NaN or diverges (explodes) | Learning rate too high; unstable gradients; bad data batch (e.g., corrupted audio/text); numerical precision issues. | Lower the learning rate; check data quality; use gradient clipping (often enabled by default); try FP32 if using AMP/FP16. | 2_TRAINING_SETUP.md, 3_MODEL_TRAINING.md |
| | Training loss stagnates (doesn’t decrease) | Learning rate too low; poor data quality or variety; model stuck in a local minimum; incorrect model configuration. | Increase the learning rate slightly; improve or augment the data; check the config (especially audio parameters); try a different optimizer. | 1_DATA_PREPARATION.md, 2_TRAINING_SETUP.md, 3_MODEL_TRAINING.md |
| | Validation loss increases while training loss decreases (overfitting) | Model memorizing the training data; insufficient or unrepresentative validation set; training for too long. | Stop training early (keep the checkpoint with the best validation loss); add more diverse training data; use regularization (weight decay, dropout; check the config); improve the validation set. | 1_DATA_PREPARATION.md, 3_MODEL_TRAINING.md |
| Inference Quality | Output sounds robotic/monotonic | Insufficient training; poor prosody in the training data; model architecture limitations; text normalization issues. | Train longer; improve data variety and quality; try a different model architecture; ensure input text is well punctuated and normalized. | 1_DATA_PREPARATION.md, 3_MODEL_TRAINING.md, 4_INFERENCE.md |
| | Output is noisy/garbled/unintelligible | Bad data quality (noise baked into the recordings); model didn’t converge; mismatch between the training config and the inference config/checkpoint; incorrect sampling rate used at inference. | Clean training data rigorously; train longer; ensure the config and checkpoint match EXACTLY; verify audio parameters. | All guides |
| | Output sounds like the wrong speaker (fine-tuning) | Pre-trained model not loaded correctly; initial learning rate too high; insufficient fine-tuning data or steps; speaker ID mismatch. | Verify `pretrained_model_path` and `ignore_layers` in the config; use a lower learning rate for fine-tuning; train longer; check the speaker ID. | 2_TRAINING_SETUP.md, 3_MODEL_TRAINING.md, 4_INFERENCE.md |
| | Inference cuts off early or speaks too fast/slow | Model limitation (duration prediction); an inference setting limiting maximum output length; incorrect length-scale/speed parameter. | Check the framework docs for max decoder steps / max length settings; adjust speed-control parameters. | 4_INFERENCE.md |
| Model Usage | Cannot load checkpoint file | Corrupted download or file; checkpoint used with an incompatible framework version or config file; incorrect file path. | Re-download and verify file integrity; use the matching config; ensure the framework version matches the one used for training; check the path. | 5_PACKAGING_AND_SHARING.md, 4_INFERENCE.md |
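For the "Manifest generation skips many files" row, a quick audit before building the manifest shows which clips would be dropped and why. This is a minimal sketch that assumes a hypothetical layout: `<stem>.wav` audio files in one directory and matching `<stem>.txt` transcripts in another. `audit_pairs` is an illustrative helper, not part of any framework, so adapt the extensions and paths to your own dataset.

```python
from pathlib import Path


def audit_pairs(audio_dir, transcript_dir):
    """Report why audio/transcript pairs would be skipped during manifest generation.

    Hypothetical layout (adapt to yours): <stem>.wav audio files and
    matching <stem>.txt transcripts sharing the same stem.
    """
    audio = {p.stem: p for p in Path(audio_dir).glob("*.wav")}
    texts = {p.stem: p for p in Path(transcript_dir).glob("*.txt")}

    report = {
        # Filename mismatches: a stem present on one side only.
        "missing_transcript": sorted(audio.keys() - texts.keys()),
        "missing_audio": sorted(texts.keys() - audio.keys()),
        "empty_transcript": [],
        "not_utf8": [],
    }
    for stem in sorted(audio.keys() & texts.keys()):
        raw = texts[stem].read_bytes()
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Often a transcript saved as Latin-1 or Windows-1252.
            report["not_utf8"].append(stem)
            continue
        if not text.strip():
            report["empty_transcript"].append(stem)
    return report
```

For example, `audit_pairs("wavs", "transcripts")` returns one list of stems per problem; if every list is empty, each clip has a non-empty UTF-8 transcript with a matching name.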
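When PyTorch reports that CUDA is not available, first establish whether the installed build was compiled with CUDA at all. This sketch relies on the standard `torch.__version__`, `torch.version.cuda` (which is `None` for CPU-only builds), and `torch.cuda.is_available()`. The `describe_torch_env` and `diagnose` helpers and their messages are illustrative assumptions, not part of PyTorch.

```python
def describe_torch_env():
    """Collect the facts needed to diagnose 'CUDA is not available'."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None means this is a CPU-only PyTorch build.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


def diagnose(info):
    """Turn the collected facts into a one-line verdict."""
    if not info["torch_installed"]:
        return "PyTorch is not installed in this environment."
    if info["built_with_cuda"] is None:
        return "CPU-only PyTorch build: reinstall a CUDA-enabled build from the official site."
    if not info["cuda_available"]:
        return (
            f"PyTorch was built for CUDA {info['built_with_cuda']} but cannot use a GPU: "
            "check the NVIDIA driver and that the OS detects the GPU."
        )
    return f"CUDA OK: {info['device_name']}"
```

Running `print(diagnose(describe_torch_env()))` inside the training environment gives the verdict; if the build has CUDA but no GPU is usable, `nvidia-smi` is the usual next check for the driver.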
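The overfitting and NaN rows both come down to watching validation loss every epoch. Below is a minimal, framework-agnostic sketch of early stopping: remember the epoch with the best validation loss, stop after `patience` epochs without improvement, and fail fast on a non-finite loss. The class name and the `patience`/`min_delta` parameters are illustrative assumptions; if your framework ships its own early-stopping callback, prefer that.

```python
import math


class EarlyStopping:
    """Track validation loss, remember the best epoch, and signal when to stop.

    `patience` is the number of consecutive epochs without improvement
    tolerated before stopping; `min_delta` is the minimum drop that counts
    as an improvement.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if not math.isfinite(val_loss):
            # NaN/inf usually means training diverged: stop instead of continuing.
            raise FloatingPointError(f"non-finite validation loss at epoch {epoch}: {val_loss}")
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            # This is where you would save the checkpoint as the "best" one.
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, call `stopper.step(epoch, val_loss)` after each validation pass, save a checkpoint whenever `stopper.best_epoch == epoch`, and break out of the loop when it returns `True`.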
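A mismatched sampling rate can produce garbled inference output and confusing training behavior alike. This sketch uses Python's standard `wave` module to list clips whose sample rate or channel count differs from the value in your config. It assumes uncompressed PCM `.wav` files (the `wave` module cannot read other formats) and mono audio by default; `expected_rate` is a value you copy from your own config, whose key name varies by framework.

```python
import wave
from pathlib import Path


def find_rate_mismatches(audio_dir, expected_rate, expected_channels=1):
    """List WAV files whose sample rate or channel count differs from the config."""
    mismatches = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        # wave only handles PCM WAV; other formats raise wave.Error.
        with wave.open(str(path), "rb") as wav:
            rate, channels = wav.getframerate(), wav.getnchannels()
        if rate != expected_rate or channels != expected_channels:
            mismatches.append((path.name, rate, channels))
    return mismatches
```

For example, `find_rate_mismatches("wavs", 22050)` returns a `(filename, rate, channels)` tuple for every clip that needs resampling or downmixing; 22050 is only an example value, so use the rate your model was trained with.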
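For the "Cannot load checkpoint file" row, comparing a checksum rules out a corrupted download before you start debugging version or config mismatches. The sketch assumes the checkpoint's publisher provides a SHA-256 hash alongside the file (a convention not every project follows); `sha256sum` and `verify_checkpoint` are illustrative helpers built on the standard `hashlib` module.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_sha256):
    """Return True if the file matches the published SHA-256 checksum."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    # Normalize case and whitespace in the published hash before comparing.
    return sha256sum(path) == expected_sha256.strip().lower()
```

A `False` result means the file should be re-downloaded; a `True` result points to the config file or framework version as the more likely culprit.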

10. Useful Resources & Tools

This list includes software, libraries, and communities helpful for TTS projects.

Audio Processing & Analysis:

Transcription (ASR):

TTS Frameworks & Codebases (Examples - Check for active forks/successors):

Python Environment & Deep Learning:

Communities:


This concludes the main series of guides. Building good TTS models usually involves iteration: revisiting data preparation or adjusting training parameters based on results is common practice. Good luck!

