An investigation to transplant emotional expressions in DNN-based TTS synthesis

Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

研究成果

24 被引用数 (Scopus)

抄録

In this paper, we investigate deep neural network (DNN) architectures to transplant emotional expressions to improve the expressiveness of DNN-based text-to-speech (TTS) synthesis. DNN is expected to have potential power in mapping between linguistic information and acoustic features. From multispeaker and/or multi-language perspectives, several types of DNN architecture have been proposed and have shown good performances. We tried to expand the idea to transplant emotion, constructing shared emotion-dependent mappings. The following three types of DNN architecture are examined; (1) the parallel model (PM) with an output layer consisting of both speaker- dependent layers and emotion-dependent layers, (2) the serial model (SM) with an output layer consisting of emotion-dependent layers preceded by speaker-dependent hidden layers, (3) the auxiliary input model (AIM) with an input layer consisting of emotion and speaker IDs as well as linguistics feature vectors. The DNNs were trained using neutral speech uttered by 24 speakers, and sad speech and joyful speech uttered by 3 speakers from those 24 speakers. In terms of unseen emotional synthesis, subjective evaluation tests showed that the PM performs much better than the SM and slightly better than the AIM. In addition, this test showed that the SM is the best of the three models when training data includes emotional speech uttered by the target speaker.

本文言語English
ホスト出版物のタイトルProceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
出版社Institute of Electrical and Electronics Engineers Inc.
ページ1253-1258
ページ数6
2018-February
ISBN(電子版)9781538615423
DOI
出版ステータスPublished - 2月 5 2018
イベント9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur
継続期間: 12月 12 201712月 15 2017

Other

Other9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
国/地域Malaysia
CityKuala Lumpur
Period12/12/1712/15/17

ASJC Scopus subject areas

  • 人工知能
  • 人間とコンピュータの相互作用
  • 情報システム
  • 信号処理

フィンガープリント

「An investigation to transplant emotional expressions in DNN-based TTS synthesis」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル