TY - JOUR
T1 - Naturalness improvement algorithm for reconstructed glossectomy patient's speech using spectral differential modification in voice conversion
AU - Murakami, Hiroki
AU - Hara, Sunao
AU - Abe, Masanobu
AU - Sato, Masaaki
AU - Minagi, Shogo
N1 - Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.
PY - 2018
Y1 - 2018
N2 - In this paper, we propose an algorithm to improve the naturalness of the reconstructed glossectomy patient's speech that is generated by voice conversion to enhance the intelligibility of speech uttered by patients with a wide glossectomy. While existing VC algorithms make it possible to improve intelligibility and naturalness, the result is still not satisfying. To solve the continuing problems, we propose to directly modify the speech waveforms using a spectrum differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source parameter extractions for speech synthesis, so there are no errors in source parameter extractions and we are able to make the best use of the original source characteristics. In terms of spectrum conversion, we evaluate with both GMM and DNN. Subjective evaluations show that our algorithm can synthesize more natural speech than the vocoder-based method. Judging from observations of the spectrogram, power in high-frequency bands of fricatives and stops is reconstructed to be similar to that of natural speech.
AB - In this paper, we propose an algorithm to improve the naturalness of the reconstructed glossectomy patient's speech that is generated by voice conversion to enhance the intelligibility of speech uttered by patients with a wide glossectomy. While existing VC algorithms make it possible to improve intelligibility and naturalness, the result is still not satisfying. To solve the continuing problems, we propose to directly modify the speech waveforms using a spectrum differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source parameter extractions for speech synthesis, so there are no errors in source parameter extractions and we are able to make the best use of the original source characteristics. In terms of spectrum conversion, we evaluate with both GMM and DNN. Subjective evaluations show that our algorithm can synthesize more natural speech than the vocoder-based method. Judging from observations of the spectrogram, power in high-frequency bands of fricatives and stops is reconstructed to be similar to that of natural speech.
KW - Glossectomy
KW - Neural network
KW - Spectral differential
KW - Speech intelligibility
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=85054996045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054996045&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-1239
DO - 10.21437/Interspeech.2018-1239
M3 - Conference article
AN - SCOPUS:85054996045
SN - 2308-457X
VL - 2018-September
SP - 2464
EP - 2468
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018
Y2 - 2 September 2018 through 6 September 2018
ER -