A JAPANESE TEXT-TO-SPEECH SYSTEM BASED ON MULTI-FORM UNITS WITH CONSIDERATION OF FREQUENCY DISTRIBUTION IN JAPANESE

Kimihito Tanaka, Hideyuki Mizuno, Masanobu Abe, Shin'ya Nakajima

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper proposes a new text-to-speech (TTS) system that concatenates large numbers of speech segments to produce very natural and intelligible synthetic speech. One novel point of the system is its new synthesis unit, which has three notable characteristics: (1) The synthesis units contain all Japanese syllables together with all possible vowel sequences, so very smooth synthetic speech is produced. (2) Both preceding and succeeding phoneme environments are considered when speech segments are concatenated, so natural-sounding transients from a vowel to a consonant, which is the only concatenation point with the proposed unit, are present in the synthetic speech. (3) Each unit has various fundamental frequency (F0) contours; therefore, F0 modification rates are very small in any synthesis event, and the F0 modification process causes only minor distortion. To develop a unit database efficiently and effectively, we analyzed 4,850,000 Japanese phrases (breath groups) containing 87,810,000 phonemes and ranked the units in order of appearance frequency. Listening tests confirm the high intelligibility and naturalness of speech produced by the new TTS system, which uses the 50,000 highest-frequency units; these cover over 77% of Japanese texts.
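The frequency ranking and coverage figure in the abstract can be illustrated with a minimal sketch. It assumes that coverage means the share of all unit occurrences in a corpus accounted for by the N most frequent units; the abstract does not define the measure, so this is an interpretation rather than the authors' procedure. The function name `top_n_coverage` and the toy unit labels are hypothetical.

```python
from collections import Counter


def top_n_coverage(unit_tokens, n):
    """Fraction of all unit occurrences covered by the n most frequent units.

    Assumption: coverage is measured per occurrence (token coverage),
    which the abstract does not state explicitly.
    """
    counts = Counter(unit_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(n) returns the n highest-frequency (unit, count) pairs.
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / total


# Toy corpus with hypothetical unit labels (not taken from the paper).
corpus = ["ka", "ka", "ka", "aoi", "aoi", "ta", "shi", "ie"]
print(top_n_coverage(corpus, 2))  # the two most frequent units cover 5 of 8 occurrences
```

On a real corpus, the same calculation over a range of N values gives the coverage curve from which a database size such as 50,000 units could be chosen.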

Original language: English
Pages: 839-842
Number of pages: 4
Publication status: Published - 1999
Externally published: Yes
Event: 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999 - Budapest, Hungary
Duration: Sept 5 1999 → Sept 9 1999

Conference

Conference: 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999
Country/Territory: Hungary
City: Budapest
Period: 9/5/99 → 9/9/99

Keywords

  • fundamental frequency
  • multi-form unit
  • text-to-speech system

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication
