TY - JOUR
T1 - Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum
AU - Inai, Tadashi
AU - Hara, Sunao
AU - Abe, Masanobu
AU - Ijima, Yusuke
AU - Miyazaki, Noboru
AU - Mizuno, Hideyuki
N1 - Publisher Copyright: Copyright © 2015 ISCA.
PY - 2015
Y1 - 2015
N2 - As described in this paper, we propose a sub-band speech synthesis approach to develop a high-quality Text-to-Speech (TTS) system: a sample-based spectrum is used in the high-frequency band and a spectrum generated by HMM-based TTS is used in the low-frequency band. Herein, sample-based spectrum means a spectrum selected from a phoneme database such that it is the most similar to the spectrum generated by HMM-based speech synthesis. A key idea is to compensate for over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method has better performance than HMM-based speech synthesis in terms of clarity. It is at the same level as HMM-based speech synthesis in terms of smoothness. In addition, preference test results among the proposed method, HMM-based speech synthesis, and waveform speech synthesis using 80 min speech data reveal that the proposed method is the most liked.
AB - As described in this paper, we propose a sub-band speech synthesis approach to develop a high-quality Text-to-Speech (TTS) system: a sample-based spectrum is used in the high-frequency band and a spectrum generated by HMM-based TTS is used in the low-frequency band. Herein, sample-based spectrum means a spectrum selected from a phoneme database such that it is the most similar to the spectrum generated by HMM-based speech synthesis. A key idea is to compensate for over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method has better performance than HMM-based speech synthesis in terms of clarity. It is at the same level as HMM-based speech synthesis in terms of smoothness. In addition, preference test results among the proposed method, HMM-based speech synthesis, and waveform speech synthesis using 80 min speech data reveal that the proposed method is the most liked.
KW - HMM-based speech synthesis
KW - Sub-band
KW - Waveform-based speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84959169493&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959169493&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84959169493
SN - 2308-457X
VL - 2015-January
SP - 264
EP - 268
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Y2 - 6 September 2015 through 10 September 2015
ER -