Analytical Study of Fujisaki’s Model of Fundamental Frequency Contour for Thai Tones

Problem statement: In a tonal language, tone is an important feature of a prosodic syllable that identifies the meaning of that syllable or that part of a word. It is crucial to model the tone-related features of speech to achieve the most natural speech communication. Approach: The study presents an approach to analyzing the model parameters of Thai tones for two genders. Fujisaki’s model, a successful technique for modeling fundamental frequency, is selected. We derive seven parameters: baseline frequency, the numbers of phrase commands and tone commands, phrase command and tone command durations, and the amplitudes of phrase commands and tone commands. Results: The experiments use 20 syllables, and each syllable includes 50 samples of each tone for both male and female speech. All five tones were recorded in the same environment. The speech corpus therefore contains ten thousand samples. The results show that Thai tones are characterized by the derived parameters. Conclusion: Thai tones can be discriminated by the derived parameters of Fujisaki’s model.


INTRODUCTION
High-accuracy modeling of fundamental frequency is required in modern speech processing technology. Over the past decades, related work has been carried out by many speech technology research groups (Tao et al., 2006; Ni and Hirose, 2006; Fujisaki and Sudo, 1971; Saito and Sakamoto, 2002; Li et al., 2004; Tran et al., 2006; Fujisaki et al., 1990; Fujisaki and Ohno, 1998). A speech utterance is divided into several levels of speech units, which are then modeled using a variety of modeling techniques (Hiroya and Sumio, 2002). For the Thai language, the derived model has been developed and applied to many levels of speech units. This study presents an analytical study of fundamental frequency modeling for Thai, a tonal language. We focus on the tone level of speech units (Seresangtakul and Takara, 2002; 2003).

MATERIALS AND METHODS
Fujisaki's model: Figure 1 illustrates an extension of Fujisaki's model for generating the fundamental frequency contour of a speech utterance. The sequence of fundamental frequency values is determined as a linear superposition of a local accent component and a global phrase component on a logarithmic scale (Fujisaki and Sudo, 1971; Chomphan and Kobayashi, 2008). The model parameters are extracted from the speech corpus using Fujisaki's model for all utterances. Subsequently, the derived output parameters are computed, averaged and systematically analyzed (Chomphan and Kobayashi, 2009; Seresangtakul and Takara, 2003).
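The superposition described above can be sketched in Python as follows. This is a minimal illustration of the commonly published form of Fujisaki's model, not the implementation used in this study: the function names, the tuple layout of the commands and the constants ALPHA, BETA and GAMMA are assumptions chosen for illustration, and the tone commands play the role of the local accent component.

```python
import math

# Illustrative constants only; these values are assumptions, not taken from this study.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the tone (accent) control mechanism (1/s)
GAMMA = 0.9   # ceiling of the tone (accent) component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before the command."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the tone (accent) control mechanism, limited to gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, tone_cmds):
    """Return F0 values (Hz) at the given times (s).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset, amplitude) pairs
    tone_cmds   -- list of (onset, offset, amplitude) triples
    """
    contour = []
    for t in times:
        # The components are added on a logarithmic frequency scale.
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, at in tone_cmds:
            log_f0 += at * (tone_response(t - t1) - tone_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour
```

For example, f0_contour([0.0, 0.2], 120.0, [(0.0, 0.5)], [(0.05, 0.25, 0.3)]) starts at the baseline frequency and rises above it once the phrase and tone commands take effect.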
Output parameters: Seven derived output parameters are computed from the conventional parameters. These derived parameters mostly reflect the geometrical appearance of the F0 contour of the speech. They are the baseline frequency, the number of phrase commands, the number of tone commands, the phrase command duration, the tone command duration, the amplitude of the phrase command and the amplitude of the tone command. The derived output parameters are extracted mainly for Thai tones (Chomphan, 2011).
To analyze the distributions of the output parameters statistically, all of the output parameters have been extracted for the five Thai tones: mid, low, falling, high and rising.
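As a rough sketch of how the seven derived output parameters might be computed for one utterance, consider the Python code below. The class and function names are hypothetical, and two definitions are assumptions made only for this illustration because the text does not spell them out: the tone command duration is taken as offset minus onset, and the phrase command duration is taken as the time from the phrase command onset to the end of the syllable. Amplitudes and durations are averaged over the commands in the utterance.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PhraseCommand:
    onset: float      # seconds
    amplitude: float


@dataclass
class ToneCommand:
    onset: float      # seconds
    offset: float     # seconds
    amplitude: float


def _mean_or_zero(values):
    """Average a list, returning 0.0 when there are no commands."""
    return mean(values) if values else 0.0


def derived_parameters(fb, phrase_cmds, tone_cmds, syllable_end):
    """Collect the seven derived output parameters for one utterance.

    The duration definitions are assumptions for illustration (see the lead-in).
    """
    return {
        "baseline_frequency": fb,
        "num_phrase_commands": len(phrase_cmds),
        "num_tone_commands": len(tone_cmds),
        # Assumed: span from phrase command onset to the end of the syllable.
        "phrase_command_duration": _mean_or_zero(
            [syllable_end - p.onset for p in phrase_cmds]),
        # Assumed: span between tone command onset and offset.
        "tone_command_duration": _mean_or_zero(
            [t.offset - t.onset for t in tone_cmds]),
        "phrase_command_amplitude": _mean_or_zero(
            [p.amplitude for p in phrase_cmds]),
        "tone_command_amplitude": _mean_or_zero(
            [t.amplitude for t in tone_cmds]),
    }
```

Returning a plain dictionary keeps the per-utterance results easy to pool by tone and gender for the later statistical analysis.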

Tones in Thai: In a tonal language such as Thai, tone is a feature embedded within the syllable speech unit. The tone is sometimes interpreted as the accent component (Chomphan and Kobayashi, 2009). However, the number of tones differs among tonal languages. Thai has five basic tones: mid, low, falling, high and rising. They are numbered tone 0, tone 1, tone 2, tone 3 and tone 4, respectively. The fundamental frequency trajectories differ for all tones, as illustrated in Fig. 2 (Seresangtakul and Takara, 2002; 2003; Chomphan and Kobayashi, 2008).

RESULTS
First, the Thai speech material is analyzed. Twenty syllables are considered in this study. Both male and female speech are used in the experiment (Mixdorff and Fujisaki, 1997). Each syllable covers all five tones, and each tone contains 50 samples for each gender. We therefore have ten thousand samples in total across male and female speech (20 syllables × 5 tones × 50 samples × 2 genders).

DISCUSSION
The frequency distribution plots are illustrated in Fig. 3-12. Comparing the five tones, the mean, median and mode of each tone are quite discriminating. First, for the baseline frequency of male speech in Fig. 3-7, we observe empirically that the distribution of tone 2 is bimodal, while the others are unimodal. Second, for the baseline frequency of female speech in Fig. 8-12, the distributions of tone 1 and tone 4 are unimodal, while the others are not. Third, for the number of phrase commands and the number of tone commands in Fig. 3-12, most of them have only a single command, except for male-speech tone 1 in Fig. 4. For the phrase command duration in Fig. 3-12, the mean value of tone 3 is the lowest for male speech, while the mean values of tone 3 and tone 4 are approximately the lowest for female speech. For the tone command duration in Fig. 3-12, the deviation of tone 0 is the highest for male speech, while the mean value of tone 1 is the highest for female speech. For the amplitude of the phrase command in Fig. 3-12, the mean values for all five tones are quite similar for both male and female speech. Finally, for the amplitude of the tone command in Fig. 3-12, the mean value of tone 4 is the highest for both male and female speech.
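The per-tone comparison of mean, median and mode can be sketched in Python as below. The function name, the input layout of (tone number, value) pairs and the bin_width argument are assumptions for illustration; the mode is taken over histogram bins because the parameters are continuous. Reporting more than one tied bin only hints at multimodality and is not a formal test for a bimodal distribution.

```python
from collections import defaultdict
from statistics import mean, median, multimode

# Tone numbering as defined in this study.
TONE_NAMES = {0: "mid", 1: "low", 2: "falling", 3: "high", 4: "rising"}


def summarize_by_tone(samples, bin_width):
    """Summarize one output parameter for each tone.

    samples   -- iterable of (tone_number, value) pairs
    bin_width -- histogram bin width used when locating the mode (assumed input)
    """
    grouped = defaultdict(list)
    for tone, value in samples:
        grouped[tone].append(value)

    summary = {}
    for tone, values in sorted(grouped.items()):
        # Map each value to the index of its nearest bin centre.
        bins = [round(v / bin_width) for v in values]
        summary[TONE_NAMES[tone]] = {
            "mean": mean(values),
            "median": median(values),
            # Centres of the most frequent bins; ties are all reported.
            "modes": [b * bin_width for b in multimode(bins)],
        }
    return summary
```

Calling summarize_by_tone once per output parameter and per gender would yield the statistics that are compared across the five tones.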

CONCLUSION
Tone modeling is analyzed in this research. The speech material covers both male and female speech and contains all five Thai tones in equal numbers. Fujisaki's model is selected as the core modeling technique. Most of the seven derived output parameters are studied. The distributions of the derived output parameters discriminate one tone from the others.

ACKNOWLEDGEMENT
The author is grateful to Kasetsart University for the research scholarship through the Center for Advanced Studies in Industrial Technology.