Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum

Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno

Research output: peer-reviewed

1 citation (Scopus)

Abstract

In this paper, we propose a sub-band speech synthesis approach for building a high-quality text-to-speech (TTS) system: a sample-based spectrum is used in the high-frequency band, and a spectrum generated by HMM-based TTS is used in the low-frequency band. Here, a sample-based spectrum is a spectrum selected from a phoneme database such that it is the most similar to the spectrum generated by HMM-based speech synthesis. The key idea is to compensate for the over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method outperforms HMM-based speech synthesis in clarity while performing at the same level in smoothness. In addition, a preference test comparing the proposed method, HMM-based speech synthesis, and waveform speech synthesis, using 80 minutes of speech data, shows that the proposed method is the most preferred.
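The sub-band idea can be illustrated with a minimal sketch. It assumes that each spectrum is a list of log-magnitude values on a shared linear frequency grid, that "most similar" means smallest squared Euclidean distance, and that the two bands are spliced at a single hypothetical cutoff bin with no crossfade. The function names and the cutoff are illustrative only; the paper's actual spectral features, selection units (phonemes rather than single frames), and band-joining method may differ.

```python
from typing import List, Sequence

# Assumption: one frame = list of log-magnitude values, one per frequency bin.
Spectrum = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two spectra with the same bin count."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_sample(generated: Sequence[float],
                  database: Sequence[Sequence[float]]) -> Spectrum:
    """Pick the stored spectrum closest to the HMM-generated one (hypothetical metric)."""
    if not database:
        raise ValueError("database is empty")
    best = min(database, key=lambda cand: squared_distance(generated, cand))
    return list(best)


def combine_subbands(generated: Sequence[float],
                     sample: Sequence[float],
                     cutoff_bin: int) -> Spectrum:
    """Low band (bins below cutoff_bin) from the generated spectrum,
    high band (bins from cutoff_bin upward) from the selected sample."""
    if len(generated) != len(sample):
        raise ValueError("spectra must have the same number of bins")
    if not 0 <= cutoff_bin <= len(generated):
        raise ValueError("cutoff_bin out of range")
    return list(generated[:cutoff_bin]) + list(sample[cutoff_bin:])


if __name__ == "__main__":
    generated = [1.0, 2.0, 3.0, 4.0]  # toy "over-smoothed" HMM output
    database = [
        [0.0, 0.0, 0.0, 0.0],
        [1.1, 2.1, 2.5, 5.0],
        [5.0, 5.0, 5.0, 5.0],
    ]
    sample = select_sample(generated, database)
    print(combine_subbands(generated, sample, cutoff_bin=2))
```

With these toy values the second database entry is the nearest match, so the combined frame keeps the first two bins of the generated spectrum and takes the remaining bins from the selected sample. A hard splice keeps the sketch short; a practical system would likely smooth the transition around the cutoff.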

Original language: English
Pages: 264-268
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden
Duration: Sep 6 2015 → Sep 10 2015

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

