TY - GEN
T1 - Rapid acoustic model adaptation using inverse MLLR-based feature generation
AU - Ito, Arata
AU - Hara, Sunao
AU - Kitaoka, Norihide
AU - Takeda, Kazuya
PY - 2010/12/1
Y1 - 2010/12/1
AB - We propose a technique for generating a large amount of target speaker-like speech features by using a transformation matrix to convert a large set of prepared speech features from many speakers into features similar to those of the target speaker. Generating these features requires only a very small amount of the target speaker's utterances, which enables the system to adapt the acoustic model efficiently from that small amount of data. To evaluate the proposed method, we prepared 100 reference speakers and 12 target (test) speakers. We conducted experiments on an isolated word recognition task using a speech database collected in real PC-based distributed environments, and compared the proposed method with MLLR, MAP, and a method theoretically equivalent to SAT. The experimental results showed that the proposed method needed a significantly smaller amount of the target speaker's utterances than conventional MLLR, MAP, and SAT.
UR - http://www.scopus.com/inward/record.url?scp=84869128367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869128367&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84869128367
SN - 9781617827457
T3 - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
SP - 3783
EP - 3788
BT - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
T2 - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society
Y2 - 23 August 2010 through 27 August 2010
ER -