Fujisaki's Model of Fundamental Frequency Contours for Thai Dialects
DOI : 10.3844/jcssp.2010.1263.1271
Journal of Computer Science
Volume 6, Issue 11
Problem statement: Thai has a number of rural dialects, but four are mainly spoken, corresponding to the four core regions of the country: central, north, northeast and south. Recognizing and synthesizing Thai speech across these dialects is consequently difficult.
Approach: Prosody is an important factor that must be taken into account, since it affects not only the naturalness but also the intelligibility of speech. To address the problem, speech prosody was carefully preserved by modeling the fundamental frequency (F0) contours. This study proposed an analysis of model parameters for Thai speech prosody across four regional dialects and two genders, as preliminary work for speech recognition and synthesis, and summarized the differences among the model parameters of the four dialects. Fujisaki’s model, a powerful tool for modeling the F0 contour, was adopted. Seven parameters were derived from it. The first is the baseline frequency, which is the lowest level of the F0 contour. The second and third are the numbers of phrase commands and tone commands, which reflect how frequently the utterance surges at the global and local levels, respectively. The fourth and fifth are the phrase command and tone command durations, which reflect the speaking rate and the length of a syllable, respectively. The sixth and seventh are the amplitudes of the phrase commands and tone commands, which reflect the energy of the speech at the global level and the energy of the syllable at the local level, respectively.
Results: In the experiments, each regional dialect was represented by 200 samples of one sentence for each of male and female speech. The speech database therefore contains 1600 utterances in total (4 dialects x 2 genders x 200 samples). The results showed that most of the proposed parameters can clearly distinguish the four regional dialects.
Conclusion: The results obtained with Fujisaki’s model confirm that the proposed parameters can distinguish the regional dialects effectively. In future research, these parameters are expected to be applied to speech recognition and synthesis that account for regional dialect characteristics.
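As a rough illustration of the model referred to in the approach, the sketch below generates an F0 contour from a baseline frequency, a set of phrase commands and a set of tone commands, using the formulation of Fujisaki’s model commonly given in the literature: the log of F0 is the log of the baseline plus the summed responses to impulse-like phrase commands and step-like tone commands. The function names, the constants alpha, beta and gamma, and the example command values are assumptions chosen for illustration; they are not taken from this study.

```python
import math


def phrase_response(t, alpha=2.0):
    """Response to a phrase command (impulse); zero before the command onset."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Response to a tone command (step), capped at gamma; zero before onset."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, tone_cmds,
               alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at each time in `times`.

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_s, amplitude)
    tone_cmds   -- list of (onset_s, offset_s, amplitude); the amplitude
                   may be negative, as is common for tonal languages
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for onset, amp in phrase_cmds:
            log_f0 += amp * phrase_response(t - onset, alpha)
        for onset, offset, amp in tone_cmds:
            log_f0 += amp * (tone_response(t - onset, beta, gamma)
                             - tone_response(t - offset, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands for a short utterance (illustrative values only).
times = [i * 0.01 for i in range(150)]           # 0.00 s to 1.49 s
phrase_cmds = [(0.0, 0.4)]                       # one phrase command
tone_cmds = [(0.20, 0.45, 0.3), (0.60, 0.85, -0.2)]
contour = f0_contour(times, 110.0, phrase_cmds, tone_cmds)
```

In this sketch the baseline frequency, the number of entries in each command list, the onset-to-offset span of each tone command and the command amplitudes correspond loosely to the kinds of parameters listed in the approach; how the study itself measures command durations is not specified in the abstract.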
© 2010 Suphattharachai Chomphan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.