Effects of Noises on Fujisaki’s Model of Fundamental Frequency Contours for Thai Dialects
- 1 Department of Electrical Engineering, Faculty of Engineering at Si Racha, Kasetsart University, 199 M.6, Tungsukhla, Si Racha, Chonburi, 20230, Thailand
- 2 School of Social and Environmental Development, National Institute of Development Administration, 118 M.3, Serithai Road, Klong-Chan, Bangkapi, Bangkok, 10240, Thailand
Abstract
Problem statement: Modeling the fundamental frequency (F0) contour plays an important role in natural speech processing, since F0 is a key feature defining the prosody of human speech. Thai has four main dialects, spoken by people living in the country's four core regions: the central, northern, northeastern and southern regions. Environmental noise also plays an important role, because it corrupts speech quality. Studying the effects of noise on F0 contour modeling for Thai dialects therefore makes it possible to evaluate the robustness of the modeling technique.

Approach: Fujisaki's model was selected for this study because of its success in modeling various Thai speech units. Four types of environmental noise are simulated at different power levels, and the differences among the model parameters of the four Thai dialects are summarized. The study analyzes the model parameters of Thai speech prosody across four regional dialects, two genders and four noise types. Seven parameters are derived from Fujisaki's model. The first is the baseline frequency, the lowest level of the F0 contour. The second and third are the numbers of phrase commands and tone commands, which reflect how often the utterance surges at the global and local levels, respectively. The fourth and fifth are the phrase command and tone command durations, which reflect the speaking rate and the syllable length, respectively. The sixth and seventh are the amplitudes of the phrase commands and tone commands, which reflect the energy of the global utterance and of the local syllable, respectively.

Results: In the experiments, each regional dialect includes 10 samples of 10 sentences of male and female speech. The four noise types are train, factory, car and air-conditioner noise, and each is applied at five levels ranging from 0 to 20 dB. The results show that most of the proposed parameters clearly distinguish the four regional dialects.

Conclusion: Using Fujisaki's model, the results confirm that the proposed parameters distinguish the regional dialects efficiently. However, the simulated noise degrades the F0 contours and also distorts the model parameters.
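The seven parameters above come from Fujisaki's model, which in its commonly cited form for tone languages writes the contour as ln F0(t) = ln Fb + Σi Api Gp(t − T0i) + Σj Atj [Ga(t − T1j) − Ga(t − T2j)], where Gp(t) = α² t e^(−αt) is the phrase impulse response and Ga(t) = min[1 − (1 + βt) e^(−βt), γ] is the clipped tone step response (both zero for t < 0). The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the constants α = 3.0 /s, β = 20 /s and γ = 0.9 are typical textbook values rather than values reported in this study; phrase commands are treated as impulses, so only tone command durations (T2 − T1) are computed; the hypothetical `summarize` helper averages amplitudes and durations per utterance; and the hypothetical `mix_at_snr` helper shows one common way to add noise at a target signal-to-noise ratio, assuming the 0 to 20 dB levels are SNRs.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Typical constants for Fujisaki's model; assumed values, not reported in this study.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the tone control mechanism (1/s)
GAMMA = 0.9   # ceiling level of the tone component


@dataclass
class PhraseCommand:
    t0: float  # time of the phrase impulse (s)
    ap: float  # phrase command amplitude


@dataclass
class ToneCommand:
    t1: float  # onset time (s)
    t2: float  # offset time (s)
    at: float  # tone command amplitude; negative values lower F0


def gp(t: float) -> float:
    """Impulse response of the phrase control mechanism: alpha^2 * t * exp(-alpha * t)."""
    return ALPHA ** 2 * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def ga(t: float) -> float:
    """Step response of the tone control mechanism, clipped at GAMMA."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)


def f0_contour(fb: float, phrases: Sequence[PhraseCommand],
               tones: Sequence[ToneCommand], times: Sequence[float]) -> List[float]:
    """Evaluate ln F0(t) = ln Fb + phrase components + tone components; return F0 in Hz."""
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        log_f0 += sum(p.ap * gp(t - p.t0) for p in phrases)
        log_f0 += sum(c.at * (ga(t - c.t1) - ga(t - c.t2)) for c in tones)
        contour.append(math.exp(log_f0))
    return contour


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def summarize(fb: float, phrases: Sequence[PhraseCommand],
              tones: Sequence[ToneCommand]) -> Dict[str, float]:
    """Hypothetical per-utterance summary of the model parameters.

    Phrase commands are impulses here, so no phrase command duration is
    computed; tone command duration is t2 - t1. Amplitudes are averaged.
    """
    return {
        "baseline_hz": fb,
        "num_phrase_commands": len(phrases),
        "num_tone_commands": len(tones),
        "mean_tone_duration_s": _mean([c.t2 - c.t1 for c in tones]),
        "mean_phrase_amplitude": _mean([p.ap for p in phrases]),
        "mean_tone_amplitude": _mean([c.at for c in tones]),
    }


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Hypothetical helper: add noise scaled so 10*log10(P_speech / P_noise) == snr_db.

    The noise must be at least as long as the speech and must not be silent.
    """
    n = len(speech)
    noise = list(noise[:n])
    p_speech = sum(x * x for x in speech) / n
    p_noise = sum(x * x for x in noise) / n
    # Solve P_speech / (scale^2 * P_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(speech, noise)]
```

For example, `f0_contour(120.0, [PhraseCommand(0.0, 0.5)], [ToneCommand(0.3, 0.5, 0.4)], times)` produces a contour that starts at the 120 Hz baseline, rises with the phrase component, and rises further while the tone command is active; the numbers are illustrative only. Working in the log domain keeps the phrase and tone components additive, which is why the baseline enters as ln Fb.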
DOI: https://doi.org/10.3844/ajassp.2012.1684.1693
Copyright: © 2012 Suphattharachai Chomphan and Chutarat Chompunth. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Fundamental frequency modeling
- environmental noise
- fundamental frequency
- command durations
- phrase commands