TY - GEN
T1 - Enhancing a glossectomy patient's speech via GMM-based voice conversion
AU - Tanaka, Kei
AU - Hara, Sunao
AU - Abe, Masanobu
AU - Minagi, Shogo
PY - 2017/1/17
Y1 - 2017/1/17
N2 - In this paper, we describe the use of a voice conversion algorithm for improving the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task at hand, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared voice conversion performance among normal speakers and one with an articulation disorder by measuring the number of training sentences, the number of GMM mixtures, and the variety of speaking styles of training speech. According to our experiment results, the mel-cepstrum (MC) distance was decreased by 40% in all pairs of speakers as compared with that of pre-conversion measures; however, at post-conversion, the MC distance between a pair of a glossectomy speaker and a normal speaker was 28% larger than that between pairs of normal speakers. The analysis of resulting spectrograms showed that the voice conversion algorithm successfully reconstructed high-frequency spectra in phonemes /h/, /t/, /k/, /ts/, and /ch/; we also confirmed improvements of speech intelligibility via informal listening tests.
AB - In this paper, we describe the use of a voice conversion algorithm for improving the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task at hand, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared voice conversion performance among normal speakers and one with an articulation disorder by measuring the number of training sentences, the number of GMM mixtures, and the variety of speaking styles of training speech. According to our experiment results, the mel-cepstrum (MC) distance was decreased by 40% in all pairs of speakers as compared with that of pre-conversion measures; however, at post-conversion, the MC distance between a pair of a glossectomy speaker and a normal speaker was 28% larger than that between pairs of normal speakers. The analysis of resulting spectrograms showed that the voice conversion algorithm successfully reconstructed high-frequency spectra in phonemes /h/, /t/, /k/, /ts/, and /ch/; we also confirmed improvements of speech intelligibility via informal listening tests.
UR - http://www.scopus.com/inward/record.url?scp=85013858356&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013858356&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2016.7820909
DO - 10.1109/APSIPA.2016.7820909
M3 - Conference contribution
AN - SCOPUS:85013858356
T3 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Y2 - 13 December 2016 through 16 December 2016
ER -