Title: Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin
Authors: Zheng, YB; Li, Y; Wen, ZQ; Liu, B; Tao, JH
Author Full Names: Zheng, Yibin; Li, Ya; Wen, Zhengqi; Liu, Bin; Tao, Jianhua
Source: JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 90 (7): 1039-1052; SI (special issue); DOI: 10.1007/s11265-017-1290-2; JUL 2018
Language: English
Abstract: Currently, most speech synthesis systems generate speech only in a reading style, which greatly limits the expressiveness of the synthesized speech. To improve expressiveness, this paper focuses on generating exclamatory and interrogative speech in spoken Mandarin. A multi-style (exclamatory and interrogative) deep neural network-based acoustic model is proposed, consisting of several shared hidden layers and a style-specific layer (which may itself comprise multiple layers). The style-specific layer models the distinct patterns of each style, while the shared layers allow maximum knowledge sharing between declarative and multi-style speech. We investigate five major aspects of multi-style adaptation: neural network type and topology, the number of layers in the style-specific layer, the initial model, the adaptation parameters, and the adaptation corpus size. Both objective and subjective evaluations are carried out to assess the proposed method. Experimental results show that the proposed multi-style BLSTM (bidirectional long short-term memory) model with its top layer adapted outperforms our prior work (a model adapted with a combination of constrained maximum likelihood linear regression (CMLLR) and structural maximum a posteriori (SMAP) adaptation) and achieves the best performance. We also find that adapting both spectral and excitation parameters is more effective than adapting the excitation parameters alone.
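The split between shared hidden layers and a style-specific top layer can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain feed-forward tanh layers instead of BLSTM layers, random initial weights in place of a model pre-trained on declarative speech, a squared-error loss, and toy feature sizes. All names (`MultiStyleAcousticModel`, `adapt_top_layer`, the style keys) are hypothetical. Adapting one style updates only that style's top layer, leaving the shared layers and the other style's layer untouched.

```python
import math
import random


def dense(x, w, b, activation=None):
    """Affine layer y = W x + b with an optional element-wise activation."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [activation(v) for v in y] if activation else y


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    return {
        "w": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


class MultiStyleAcousticModel:
    """Hypothetical sketch: shared hidden layers plus one style-specific top layer per style."""

    def __init__(self, n_in, n_hidden, n_out, n_shared, styles, seed=0):
        rng = random.Random(seed)
        dims = [n_in] + [n_hidden] * n_shared
        self.shared = [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]
        self.style_layers = {s: init_layer(n_hidden, n_out, rng) for s in styles}

    def hidden(self, x):
        """Pass input features through the shared (style-independent) layers."""
        h = x
        for layer in self.shared:
            h = dense(h, layer["w"], layer["b"], math.tanh)
        return h

    def forward(self, x, style):
        """Predict acoustic parameters with the top layer selected by `style`."""
        top = self.style_layers[style]
        return dense(self.hidden(x), top["w"], top["b"])

    def adapt_top_layer(self, x, target, style, lr=0.1):
        """One SGD step on squared error that updates only the style-specific layer.

        Returns the loss measured before the update; shared layers stay frozen.
        """
        h = self.hidden(x)
        top = self.style_layers[style]
        err = [yi - ti for yi, ti in zip(dense(h, top["w"], top["b"]), target)]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                top["w"][i][j] -= lr * 2 * e * hj  # d(e^2)/dw_ij = 2 e h_j
            top["b"][i] -= lr * 2 * e  # d(e^2)/db_i = 2 e
        return sum(e * e for e in err)


# Usage: adapt the "interrogative" top layer on one toy frame.
model = MultiStyleAcousticModel(4, 8, 3, n_shared=2, styles=["exclamatory", "interrogative"])
x, target = [0.5, -0.2, 0.1, 0.9], [0.3, -0.1, 0.2]
shared_before = [[row[:] for row in layer["w"]] for layer in model.shared]
other_before = [row[:] for row in model.style_layers["exclamatory"]["w"]]

loss_before = model.adapt_top_layer(x, target, "interrogative")
for _ in range(49):
    loss_after = model.adapt_top_layer(x, target, "interrogative")

shared_unchanged = shared_before == [layer["w"] for layer in model.shared]
other_unchanged = other_before == model.style_layers["exclamatory"]["w"]
print(f"loss {loss_before:.4f} -> {loss_after:.4f}; shared frozen: {shared_unchanged}")
```

Keeping one top layer per style means a new style only adds a small set of parameters, while the shared layers carry the knowledge common to all styles.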
ISSN: 1939-8018
eISSN: 1939-8115
IDS Number: GH6LK
Unique ID: WOS:000433555600008