Evaluation metrics for emotional TTS
Overall audio quality (fidelity)
Subjective
Naturalness MOS
ABX
MUSHRA
CMOS
Objective
PESQ
STOI
MCD
PCC
F0
FFE (f0 frame error)
GPE (Gross pitch error)
VDE (Voicing decision error)
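
A minimal sketch of the three F0 metrics above, assuming the reference and synthesized F0 tracks are already time-aligned frame by frame (for example via DTW) and that unvoiced frames are marked with 0. The 20% tolerance is the commonly used threshold for a gross pitch error; the function and argument names are illustrative.

```python
def f0_frame_metrics(ref_f0, syn_f0, tol=0.2):
    """Return (GPE, VDE, FFE) for two aligned F0 tracks in Hz (0 = unvoiced)."""
    if len(ref_f0) != len(syn_f0) or len(ref_f0) == 0:
        raise ValueError("F0 tracks must be non-empty and of equal length")
    n = len(ref_f0)
    both_voiced = 0   # frames voiced in both tracks
    gross = 0         # both-voiced frames whose F0 deviates by more than tol
    voicing = 0       # frames where the voiced/unvoiced decision differs
    for r, s in zip(ref_f0, syn_f0):
        ref_voiced, syn_voiced = r > 0, s > 0
        if ref_voiced != syn_voiced:
            voicing += 1
        elif ref_voiced:
            both_voiced += 1
            if abs(s - r) > tol * r:
                gross += 1
    gpe = gross / both_voiced if both_voiced else 0.0  # over both-voiced frames
    vde = voicing / n                                   # over all frames
    ffe = (gross + voicing) / n                         # any F0-related error
    return gpe, vde, ffe
```

FFE combines the other two: a frame counts as an error if either the voicing decision is wrong or the pitch is grossly off.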
Number of errors (skipping, repeating, mispronunciation)
Linguistic related
WER
CER
PER
WIL (Word information lost)
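
WER and CER above are both edit-distance ratios. The sketch below assumes the hypothesis is an ASR transcript of the synthesized speech and the reference is the input text; text normalization (casing, punctuation) is left out, and CER here counts spaces as characters, which is only one possible convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref_text, hyp_text):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = ref_text.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp_text.split()) / len(ref)


def cer(ref_text, hyp_text):
    """Character error rate: character-level edit distance divided by reference length."""
    if not ref_text:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(ref_text), list(hyp_text)) / len(ref_text)
```

PER follows the same pattern once both texts are converted to phoneme sequences.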
Emotional expressiveness
Subjective
Expressive MOS
ABX
MUSHRA
CMOS
Objective
SER accuracy, confusion matrix
Diversity
f0 distribution distance (Wasserstein distance)
JSD, NDB
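
A dependency-free sketch of two of the distribution distances above, applied to F0 values. It assumes unvoiced frames have already been removed, that both samples have the same size for the Wasserstein case, and that the histogram range and bin count are chosen by the user; all names are illustrative.

```python
import math


def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size empirical distributions, the optimal transport plan
    # pairs the sorted values one to one.
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def histogram(values, lo, hi, bins):
    """Normalized histogram over [lo, hi]; values outside the range are dropped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [c / total for c in counts]


def jsd(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions (0 to 1)."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A typical use would be `jsd(histogram(ref_f0, 50, 500, 45), histogram(syn_f0, 50, 500, 45))`, where the 50 to 500 Hz range is only an example.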
speaking rate error
Statistics of prosody features (f0/energy/duration)
mean, std
PCC (Pearson's correlation coefficient)
NLL
KLD
RMSE (MCD etc.)
visualization tool
t-SNE
emotional embeddings
spk embeddings
pitch contour
mel-spectrogram
f0 and delta-f0 distribution
idea
Use self-supervised embeddings?
Metrics used to measure KNN performance
glicko-2 ranking
Speaker characteristics
Objective
SV metrics
EER
Micro/Macro F1 score
Cosine similarity
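
A small sketch of two of the speaker-verification metrics above. It assumes speaker embeddings come from some external speaker encoder (not shown) and that trial scores are split into genuine (same speaker) and impostor (different speaker) lists; the EER here is a simple threshold sweep rather than an interpolated ROC estimate.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def equal_error_rate(genuine, impostor):
    """Approximate EER: the error rate at the threshold where FAR and FRR are closest."""
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

For a TTS system, the genuine scores would typically compare synthesized speech against real recordings of the target speaker.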
Subjective
speaker MOS
speaker ABX
speaker CMOS
efficiency
Inference latency
RTF
Model size
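
RTF (real-time factor) is the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. The sketch below assumes a hypothetical `synthesize(text)` callable that returns a sequence of audio samples; it is a placeholder, not a real library API.

```python
import time


def rtf_from_timing(elapsed_s, num_samples, sample_rate):
    """Real-time factor from a measured synthesis time and the output length."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    return elapsed_s / (num_samples / sample_rate)


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to a (hypothetical) synthesize function and return its RTF."""
    start = time.perf_counter()
    waveform = synthesize(text)
    elapsed = time.perf_counter() - start
    return rtf_from_timing(elapsed, len(waveform), sample_rate)
```

In practice the measurement is usually averaged over many utterances after a warm-up run, since the first call often includes model loading or compilation overhead.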