Design of Tree Structure in Context Clustering Process of Hidden Markov Model-Based Thai Speech Synthesis

Suphattharachai Chomphan

doi:10.3844/ajassp.2012.313.320

Research Article Open Access

Department of Electrical Engineering, Faculty of Engineering at Si Racha, Kasetsart University, 199 M.6, Tungsukhla, Si Racha, Chonburi, 20230, Thailand

Abstract

Problem statement: In HMM-based Thai speech synthesis, tone quality degrades because the training data are imbalanced across the tones. When the system is trained on a small amount of data, distortion of syllable duration is also clearly noticeable. Both problems reduce the naturalness and intelligibility of the synthesized speech.

Approach: This study proposes an approach to improving the tone correctness of speech generated by an HMM-based Thai speech synthesis system. In the tree-based context clustering process, tone groups and tone types are used to design four decision-tree structures: a single binary tree structure, a simple tone-separated tree structure, a constancy-based-tone-separated tree structure and a trend-based-tone-separated tree structure.

Results: Tone correctness is evaluated subjectively from the tone perception of eight Thai listeners. The simple tone-separated tree structure gives the highest tone correctness and the single binary tree structure gives the lowest. Adding contextual tone information to all of the tree structures significantly improves tone correctness. Finally, an evaluation of syllable duration distortion across the four structures shows that the constancy-based-tone-separated and trend-based-tone-separated tree structures can alleviate the distortions that appear with the simple tone-separated tree structure.

Conclusion: An appropriate tree structure in the context clustering process, combined with additional contextual tone information, can improve tone correctness, while the constancy-based-tone-separated and trend-based-tone-separated tree structures can alleviate syllable duration distortion.
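The four structures differ mainly in how training contexts are routed before any tree is grown. The Python sketch below illustrates that idea under several simplifying assumptions: each context carries a single scalar observation (standing in for the spectral, F0 and duration streams of a real HMM system); splits are chosen greedily by single-Gaussian log-likelihood gain with a fixed threshold, rather than the MDL-style stopping criterion commonly used in HMM-based synthesis toolkits; and the level/contour and flat/down/up tone groupings, like every function and structure name here, are hypothetical and not taken from this study.

```python
import math
from collections import defaultdict

# The five Thai tones.
TONES = ["mid", "low", "falling", "high", "rising"]

# Hypothetical routing tables: tone -> tree that receives the context.
# The constancy and trend groupings are illustrative assumptions only.
STRUCTURES = {
    "single": {t: "all" for t in TONES},
    "tone_separated": {t: t for t in TONES},
    "constancy_based": {"mid": "level", "low": "level", "high": "level",
                        "falling": "contour", "rising": "contour"},
    "trend_based": {"mid": "flat", "low": "down", "falling": "down",
                    "high": "up", "rising": "up"},
}


def gauss_loglik(xs):
    """Log-likelihood of xs under its own maximum-likelihood 1-D Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def grow_tree(samples, questions, min_gain=1.0, min_occ=2):
    """Greedy binary context clustering; returns a list of leaf clusters.

    samples: list of (context_dict, value) pairs.
    questions: list of (name, predicate) pairs asked about a context.
    """
    base = gauss_loglik([v for _, v in samples])
    best = None
    for _name, pred in questions:
        yes = [s for s in samples if pred(s[0])]
        no = [s for s in samples if not pred(s[0])]
        if len(yes) < min_occ or len(no) < min_occ:
            continue  # skip splits that leave too little data on one side
        gain = (gauss_loglik([v for _, v in yes])
                + gauss_loglik([v for _, v in no]) - base)
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < min_gain:
        return [samples]  # leaf: these contexts share one distribution
    return (grow_tree(best[1], questions, min_gain, min_occ)
            + grow_tree(best[2], questions, min_gain, min_occ))


def cluster(samples, questions, structure):
    """Route each sample to its tone-group tree, then grow every tree independently."""
    routing = STRUCTURES[structure]
    groups = defaultdict(list)
    for ctx, value in samples:
        groups[routing[ctx["tone"]]].append((ctx, value))
    return {group: grow_tree(members, questions) for group, members in groups.items()}
```

In this sketch, the `"single"` structure treats tone as just another question that the tree may or may not choose to ask, whereas the separated structures give each tone group its own tree, so contexts of a sparsely represented tone cannot be merged into clusters dominated by frequent tones; the trade-off is less training data per tree.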

American Journal of Applied Sciences

Volume 9 No. 3, 2012, 313-320

Submitted On: 9 October 2011 Published On: 14 January 2012

How to Cite: Chomphan, S. (2012). Design of Tree Structure in Context Clustering Process of Hidden Markov Model-Based Thai Speech Synthesis. American Journal of Applied Sciences, 9(3), 313-320. https://doi.org/10.3844/ajassp.2012.313.320

Copyright: © 2012 Suphattharachai Chomphan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Thai speech
speech synthesis
tree-based context clustering
HMM-based speech synthesis
tone correctness
syllable duration distortion
important suprasegmental