MODELING OF FUNDAMENTAL FREQUENCY CONTOURS FOR THAI DIALECTS WITH LARGE SPEECH DATABASE

Thailand has four core regions, each with its own main dialect: central, north, northeast and south. The prosody of each dialect is distinctive, and one important factor determining prosody is the fundamental frequency (F0). Modeling the F0 contour is therefore very important for natural speech processing. Although many techniques exist for modeling the F0 contour, this study uses Fujisaki’s model because of its success in modeling various Thai speech units. The study proposes an analysis of the model parameters of Thai speech prosody for four regional dialects and two genders. Seven parameters are derived from Fujisaki’s model. The first is the baseline frequency, which is the lowest level of the F0 contour. The second and third are the numbers of phrase commands and tone commands, which reflect how often the contour surges at the global and local levels of the utterance, respectively. The fourth and fifth are the phrase command and tone command durations, which reflect the speed of speaking and the length of a syllable, respectively. The sixth and seventh are the amplitudes of the phrase commands and tone commands, which reflect the energy of the global speech and of the local syllable, respectively. In the experiments, the large speech material for each regional dialect comprises 50 samples of each of 50 sentences, for both male and female speech. The results show that most of the proposed parameters of Fujisaki’s model distinguish the four regional dialects clearly.


INTRODUCTION
F0 modeling has been studied extensively for various speech units and with several techniques, for example at the utterance level (Fujisaki and Ohno, 1998; Fujisaki et al., 1990; Tao et al., 2006; Saito and Sakamoto, 2002; Ni and Hirose, 2006; Li et al., 2004) and at the word and syllable levels (Fujisaki et al., 1990; Hiroya and Hiroshi, 1971; Dat et al., 2006). For Thai speech, Fujisaki's model has been applied successfully to the modeling of utterances, tones and words (Hiroya and Sumio, 2002; Seresangtakul and Takara, 2002; 2003). To study how well Fujisaki's model performs for each Thai dialect (central, north, northeast and south), it is applied here to the same utterances in all dialects. The model parameters of Thai speech prosody are analyzed for four regional dialects and two genders in the same way as in the modeling of fundamental frequency for Thai expressive speech conducted in 2010, which proved effective for a limited-domain speech corpus (Chomphan, 2010a). That study showed that the derived parameters can distinguish one speaking style from another. Fujisaki's modeling of F0 contours for Thai dialects with a compact speech database was conducted by Chomphan (2010b); however, no significant differences among dialects could be observed. The present study increases the size of the speech material to 25 times that of the previous study. It proposes an analysis of F0 modeling for four Thai dialects: standard Thai, Lanna or North dialect, Lao-style or Northeast dialect and South dialect. The study mainly uses the extension of Fujisaki's model (Seresangtakul and Takara, 2002; 2003), as preliminary work toward advanced research in speech synthesis and recognition.

Fujisaki's Model
The fundamental frequency contour of an utterance of human speech is treated as a linear superposition of a global phrase and local accent components on a logarithmic scale, as depicted in Fig. 1 (Hiroya and Hiroshi, 1971).
Using this generative model, the parameters are extracted from our speech database utterance by utterance. Subsequently, the derived parameters are computed and analyzed (Chomphan and Kobayashi, 2008; 2009; Seresangtakul and Takara, 2003).
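The superposition described above can be illustrated with a minimal sketch of the commonly cited form of Fujisaki's model, in which ln F0(t) is the sum of ln Fb, the phrase components and the tone (accent) components. The function names, the tuple layout of the commands and the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are assumptions chosen for illustration (they are frequently quoted default values); they are not taken from this study or from its extraction tools.

```python
import numpy as np

def phrase_response(t, alpha=3.0):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    tp = np.maximum(t, 0.0)  # clamping to 0 makes the response vanish for t < 0
    return alpha ** 2 * tp * np.exp(-alpha * tp)

def tone_response(t, beta=20.0, gamma=0.9):
    """Tone (accent) control response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    tp = np.maximum(t, 0.0)
    return np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)

def f0_contour(t, fb, phrase_cmds, tone_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0(t) in Hz.

    phrase_cmds: list of (onset_s, amplitude)            -- hypothetical layout
    tone_cmds:   list of (onset_s, offset_s, amplitude)  -- hypothetical layout
    """
    t = np.asarray(t, dtype=float)
    log_f0 = np.full_like(t, np.log(fb))  # baseline frequency on a log scale
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset, alpha)
    for onset, offset, amp in tone_cmds:
        # A tone command is a pedestal: step on at onset, step off at offset.
        log_f0 += amp * (tone_response(t - onset, beta, gamma)
                         - tone_response(t - offset, beta, gamma))
    return np.exp(log_f0)

# Illustrative values only, not measurements from the database.
t = np.linspace(0.0, 2.0, 201)
f0 = f0_contour(t, fb=120.0, phrase_cmds=[(0.0, 0.5)], tone_cmds=[(0.5, 0.8, -0.3)])
```

A negative tone command amplitude, as in the illustrative call, lowers the contour locally; tone commands of both polarities are the usual way the extended model is described for tonal languages.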

Derived Parameters
From the conventional parameters, we calculated seven derived parameters which reflect the geometrical appearance of the F0 contour of an utterance, as follows:
• Baseline frequency
• Number of phrase commands
• Number of tone commands
• Phrase command duration
• Tone command duration
• Amplitude of phrase command
• Amplitude of tone command
All of these derived parameters have been extracted for four regional Thai dialects: standard Thai, Lanna or North dialect, Lao-style or Northeast dialect and South dialect.
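The seven derived parameters listed above could be computed per utterance roughly as in the following sketch. The exact definitions are not given in the text, so several choices here are assumptions: the phrase command duration is taken as the mean interval between successive phrase command onsets, the tone command duration as the mean of offset minus onset, and the amplitudes as per-utterance means (using magnitudes for tone commands). The function name and the command layout are hypothetical.

```python
from statistics import mean

def derived_parameters(fb, phrase_cmds, tone_cmds):
    """Compute seven derived parameters for one utterance.

    fb:          baseline frequency in Hz
    phrase_cmds: list of (onset_s, amplitude)            -- hypothetical layout
    tone_cmds:   list of (onset_s, offset_s, amplitude)  -- hypothetical layout
    """
    onsets = sorted(onset for onset, _ in phrase_cmds)
    # Assumption: "duration" of phrase commands = spacing between their onsets.
    phrase_gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    # Assumption: "duration" of a tone command = offset - onset.
    tone_lengths = [off - on for on, off, _ in tone_cmds]
    phrase_amps = [amp for _, amp in phrase_cmds]
    # Assumption: use magnitudes so positive and negative commands do not cancel.
    tone_amps = [abs(amp) for _, _, amp in tone_cmds]
    return {
        "baseline_frequency": fb,
        "num_phrase_commands": len(phrase_cmds),
        "num_tone_commands": len(tone_cmds),
        "phrase_command_duration": mean(phrase_gaps) if phrase_gaps else None,
        "tone_command_duration": mean(tone_lengths) if tone_lengths else None,
        "phrase_command_amplitude": mean(phrase_amps) if phrase_amps else None,
        "tone_command_amplitude": mean(tone_amps) if tone_amps else None,
    }
```

Returning `None` for an undefined mean (for example, a single phrase command has no inter-command interval) keeps such utterances from silently contributing zeros to a distribution.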

RESULTS
In our large speech database, we use fifty Thai sentences spoken by male and female speakers. The sentences have been recorded in four Thai dialects: standard Thai (Center dialect), Lanna Thai (North dialect), Lao-style Thai (Northeast dialect) and South Thai (South dialect). Each dialect contains 2,500 sample utterances, giving 10,000 sample utterances for each gender. The parameter extraction tools used in Mixdorff and Fujisaki (1997) and Chomphan and Kobayashi (2007a; 2007b) are employed in this study.
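The corpus sizes quoted above follow from simple multiplication, sketched below. The figure of 50 samples per sentence comes from the abstract; the grand total over both genders is not stated in the text and is only implied by the other numbers. The variable names are hypothetical.

```python
# Corpus sizes from the text; the grand total is implied, not stated.
SENTENCES = 50
SAMPLES_PER_SENTENCE = 50
DIALECTS = ("center", "north", "northeast", "south")
GENDERS = ("female", "male")

utterances_per_dialect = SENTENCES * SAMPLES_PER_SENTENCE        # per dialect, per gender
utterances_per_gender = utterances_per_dialect * len(DIALECTS)   # all four dialects
utterances_total = utterances_per_gender * len(GENDERS)          # implied by the text
```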
For each derived parameter, we analyzed the frequency distribution over its range and then plotted the distributions of the four Thai dialects (Center, North, Northeast and South) in a single graph to show the differences and similarities among them. The first seven graphs (Fig. 2-8) are for female speech and show the baseline frequency, number of phrase commands, number of tone commands, phrase command duration, tone command duration, amplitude of phrase command and amplitude of tone command, respectively. The second seven graphs (Fig. 9-15) are for male speech and show the same seven parameters in the same order. The following abbreviations are used in most figures: frame num, fb, PC num, AC num, PC delta t, AC delta t, PC amplitude and AC amplitude denote the number of frames, baseline frequency, number of phrase commands, number of tone commands, phrase command duration, tone command duration, amplitude of phrase command and amplitude of tone command, respectively.
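One way to obtain comparable frequency distributions of this kind is to bin every dialect's values on shared bin edges and normalize the counts, as in the sketch below. The function name, the number of bins and the choice of relative frequencies are assumptions; the text does not say how the graphs were binned or scaled.

```python
import numpy as np

def frequency_distributions(values_by_dialect, n_bins=10):
    """Histogram one derived parameter per dialect on shared bin edges.

    values_by_dialect: dict mapping a dialect name to a non-empty sequence of values.
    Returns (edges, dists), where dists[dialect] is an array of relative frequencies.
    """
    pooled = np.concatenate(
        [np.asarray(v, dtype=float) for v in values_by_dialect.values()]
    )
    # Shared edges over the pooled range make the dialect curves directly comparable.
    edges = np.linspace(pooled.min(), pooled.max(), n_bins + 1)
    dists = {}
    for dialect, values in values_by_dialect.items():
        counts, _ = np.histogram(values, bins=edges)
        dists[dialect] = counts / counts.sum()  # relative frequency per bin
    return edges, dists
```

Normalizing to relative frequencies is not strictly needed here, since every dialect has the same number of utterances, but it keeps the curves comparable if some utterances yield no value for a parameter.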

DISCUSSION
The frequency distribution graphs for female and male speech in Fig. 2-15 show that, for most parameters, the distributions of the four dialects are clearly different. For the baseline frequency of female speech (Fig. 2), the south dialect has the lowest deviation, while the northeast dialect has the highest mean value. For the number of phrase commands of female speech (Fig. 3), the mode is 1 for the center and northeast dialects, whereas it is 2 for the north and south dialects. For the tone command amplitude of female speech (Fig. 8), the northeast dialect has the highest mean value, while the mean values of the other dialects are broadly similar.
For the baseline frequency of male speech (Fig. 9), the south dialect again has the lowest deviation, while the north dialect has the highest mean value. For the number of phrase commands of male speech (Fig. 10), the mode is 1 for the center and northeast dialects, whereas it is 2 for the north and south dialects. For the tone command amplitude of male speech (Fig. 15), the northeast dialect has the highest mean value and the south dialect the lowest.
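The comparisons above rest on three summary statistics of each distribution: the mean, the deviation and the mode. A minimal sketch of how they could be computed for one parameter and one dialect follows; it assumes that "deviation" refers to the standard deviation, which the text does not state, and the function name is hypothetical.

```python
from statistics import mean, mode, pstdev

def summarize(values):
    """Return the mean, standard deviation and mode of one parameter's values."""
    return {
        "mean": mean(values),
        "deviation": pstdev(values),  # assumption: population standard deviation
        "mode": mode(values),         # most frequent value, e.g. a command count
    }
```

The mode is most meaningful for the count-valued parameters (numbers of phrase and tone commands); for continuous parameters such as the baseline frequency, the peak of the binned distribution plays the same role.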

CONCLUSION
This study models the F0 contours of Thai dialects using a large speech database. Fujisaki's model, which has proved efficient for several Thai speech units, was chosen for this purpose. The differences among the model parameters of the four Thai dialects have been discussed. The experimental results indicate that most of the proposed parameters distinguish the four Thai dialects clearly.

ACKNOWLEDGEMENT
The researchers are grateful to Kasetsart University for the research scholarship through the Center for Advanced Studies in Industrial Technology.