TTS, Novel NLP Methods for Improved Text-To-Speech Synthesis
2021, Novel…
TTS
TTS Main Process
Text Normalization
Converting text to a standard format (e.g., expanding abbreviations, handling numbers)
Part-of-speech tagging
Identifying the grammatical role of each word (e.g., noun, verb).
Prosody Generation
Applying the determined pitch, duration, and stress patterns to the sequence of phonemes
Audio Post-Processing
Cleaning up the audio to enhance clarity and naturalness by removing artifacts and adjusting volume levels
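The post-processing step can be illustrated with a minimal sketch. It assumes the audio is a plain list of float samples in the range -1.0 to 1.0, treats a DC offset as the artifact to remove, and uses peak normalization for the volume adjustment. The function names and the `target_peak` value are illustrative assumptions, not part of any particular TTS system.

```python
def remove_dc_offset(samples):
    """Subtract the mean so the waveform is centered on zero."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def peak_normalize(samples, target_peak=0.9):
    """Scale the waveform so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def post_process(samples, target_peak=0.9):
    """Center the waveform, then bring it to a consistent peak level."""
    return peak_normalize(remove_dc_offset(samples), target_peak)
```

Real systems typically go further (denoising, loudness normalization, filtering); this sketch only shows the shape of the step.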
TTS improvements
Pre-processing
Techniques/methods
Data augmentation
Techniques like time stretching, pitch shifting, and adding background noise artificially enrich the training data, improving model robustness to variations in real-world speech
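Two of these augmentations can be sketched with the standard library alone, assuming a waveform stored as a list of float samples. `add_noise` mixes in white Gaussian noise at a requested signal-to-noise ratio. `change_speed` is a deliberately naive resampler that changes duration and pitch together; independent time stretching or pitch shifting would need something like a phase vocoder, which is not shown here. Both function names are illustrative assumptions.

```python
import math
import random


def add_noise(samples, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    if not samples:
        raise ValueError("samples must not be empty")
    rng = rng or random.Random()
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    Naive speed perturbation: duration and pitch change together.
    """
    if not samples or rate <= 0:
        raise ValueError("need non-empty samples and a positive rate")
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)   # sample at or before the read position
        hi = min(lo + 1, last)     # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

A training pipeline would usually apply such transforms at random, with varying `snr_db` and `rate`, so that each epoch sees slightly different versions of the same utterance.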
Synthesis
Error types
Poor Prosody
Unnatural or monotonous intonation, rhythm, and stress patterns
Mispronunciation
Homograph Errors
Incorrect pronunciation of words that have the same spelling but different pronunciations based on context
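One common way to reduce homograph errors is to use the part-of-speech tag when looking up a pronunciation. The sketch below assumes Penn Treebank-style tags (`NN`, `VB`, `VBD`) and ARPAbet phoneme strings. The small lexicon is hand-written for illustration, and `pronounce` is a hypothetical helper, not a function from any TTS library.

```python
# Illustrative lexicon: (word, POS tag) -> ARPAbet pronunciation.
HOMOGRAPHS = {
    ("record", "NN"): "R EH1 K ER0 D",     # "a world record"
    ("record", "VB"): "R IH0 K AO1 R D",   # "to record a song"
    ("read", "VB"): "R IY1 D",             # "I will read it"
    ("read", "VBD"): "R EH1 D",            # "I read it yesterday"
}

# Fallback used when the tag is missing or not covered (an assumed default).
DEFAULT = {
    "record": "R EH1 K ER0 D",
    "read": "R IY1 D",
}


def pronounce(word, pos_tag):
    """Return the pronunciation for a homograph given its POS tag, or None if unknown."""
    key = word.lower()
    if (key, pos_tag) in HOMOGRAPHS:
        return HOMOGRAPHS[(key, pos_tag)]
    return DEFAULT.get(key)
```

Falling back to a single default pronunciation when the tag is unknown is exactly the behaviour that produces homograph errors, which is why the quality of the tagger matters here.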
Techniques/methods
Multi-speaker training
Training on data from multiple speakers enables the model to generate diverse voices and adapt to different speaking styles.
Prosody modeling
Techniques like phrase-boundary prediction and intonation control regulate the rhythm, emphasis, and pitch of the synthesized speech, improving naturalness and clarity.
End-to-end learning
This approach integrates text processing and speech synthesis in a single model, simplifying the pipeline and potentially improving both accuracy and naturalness
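For the prosody modeling entry above, phrase boundaries are the easiest part to illustrate. The sketch below is a rule-based stand-in that breaks text at punctuation and attaches a pause length to each phrase. The pause values in `PAUSE_MS` are assumed for illustration; a learned prosody model would predict boundaries and durations from data instead.

```python
import re

# Assumed pause lengths in milliseconds; illustrative values only.
PAUSE_MS = {",": 200, ";": 300, ":": 300, ".": 500, "?": 500, "!": 500}


def split_phrases(text):
    """Return (phrase, pause_ms) pairs, breaking the text at punctuation marks."""
    phrases = []
    # Each match is a run of non-punctuation followed by at most one mark.
    for match in re.finditer(r"([^,;:.?!]+)([,;:.?!]?)", text):
        phrase = match.group(1).strip()
        if phrase:
            phrases.append((phrase, PAUSE_MS.get(match.group(2), 0)))
    return phrases
```

This rule also splits inside decimals such as "3.5", so it assumes the text has already been normalized.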
Introduction
TTS (Text-to-Speech) is one of the main elements of human-machine interaction systems. As the name suggests, a text-to-speech system converts text into spoken audio, so a machine (such as a robot) can interact with its environment through speech.
Sinhala TTS
Sinhala is the first language of Sri Lanka and is spoken by over 16 million people, who form the major ethnic group in the country. Few researchers have tried to develop Sinhala-dependent TTS systems using traditional methods.
Existing Sinhala TTS
According to Weerasinghe et al. [9], their synthesis system named "Festival-si"
Wickramasinghe, Kumara, and Dias [12] highlight several fundamental challenges:
Confusion between numbers, fractions, and dates within the text
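The kind of ambiguity described here can be made concrete with a small token classifier. This is a minimal sketch, not the method of the cited work: the patterns assume day/month/year dates with a four-digit year and whitespace-separated tokens, and `classify_token` and `label_tokens` are hypothetical helper names.

```python
import re

# Order matters: the most specific pattern is tried first.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
    ("fraction", re.compile(r"\d+/\d+")),
    ("number", re.compile(r"\d+(\.\d+)?")),
]


def classify_token(token):
    """Label a token as 'date', 'fraction', 'number', or 'word'."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "word"


def label_tokens(text):
    """Split on whitespace and label every token."""
    return [(tok, classify_token(tok)) for tok in text.split()]
```

A token such as "1/2" is labeled a fraction here even though it could be a day and month in some contexts. Patterns alone cannot resolve that case, which is why a normalizer also needs the surrounding words.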
VOICE FILTER: FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION USING VOICE CONVERSION AS A POST-PROCESSING MODULE
2022