(1) Laptev, A., Korostik, R., Svischev, A., Andrusenko, A., Medennikov, I. and Rybin, S., 2020. You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation. arXiv preprint arXiv:2005.07157.
(2) Rossenbach, N., Zeyer, A., Schlüter, R. and Ney, H., 2020, May. Generating synthetic audio data for attention-based speech recognition systems. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7069-7073). IEEE.
(3) Li, J., Gadde, R., Ginsburg, B. and Lavrukhin, V., 2018. Training neural speech recognition systems with synthetic speech augmentation. arXiv preprint arXiv:1811.00707.
(4) Rosenberg, A., Zhang, Y., Ramabhadran, B., Jia, Y., Moreno, P., Wu, Y. and Wu, Z., 2019, December. Speech recognition with augmented synthesized speech. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 996-1002). IEEE.
(5) Karita, S., Watanabe, S., Iwata, T., Delcroix, M., Ogawa, A. and Nakatani, T., 2019, May. Semi-supervised end-to-end speech recognition using text-to-speech and autoencoders. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6166-6170). IEEE.
(6) Mimura, M., Ueno, S., Inaguma, H., Sakai, S. and Kawahara, T., 2018, December. Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition. In 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE.
(7) Wang, Y., Stanton, D., Zhang, Y., Skerry-Ryan, R.J., Battenberg, E., Shor, J., Xiao, Y., Ren, F., Jia, Y. and Saurous, R.A., 2018. Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis. arXiv preprint arXiv:1803.09017.