Abstract
As described in this paper, we propose a sub-band speech syn- thesis approach to develop a high quality Text-to-Speech (TTS) system: A sample-based spectrum is used in the high-frequency band and spectrum generated by HMM-based TTS is used in the low-frequency band. Herein, sample-based spectrum means spectrum selected from a phoneme database such that it is the most similar to spectrum generated by HMM-based speech syn- thesis. A key idea is to compensate over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method has better performance than HMM-based speech synthesis in terms of clarity. It is at the same level as HMM-based speech synthesis in terms of smooth- ness. In addition, preference test results among the proposed method, HMM-based speech synthesis, and waveform speech synthesis using 80 min speech data reveal that the proposed method is the most liked.
Original language | English |
---|---|
Pages (from-to) | 264-268 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2015-January |
Publication status | Published - 2015 |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany Duration: Sept 6 2015 → Sept 10 2015 |
Keywords
- HMM-based speech synthesis
- Sub-band
- Waveform-based speech synthesis
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation