Structural Modeling of Fundamental Frequency Contour for Thai Tones

Problem statement: In Thai, tone is an essential feature of a prosodic syllable because it determines the meaning of that syllable or of the word it belongs to. To generate tonal speech with natural prosody, the fundamental frequency (F0) of the speech must be controlled appropriately. A structural modeling approach that has been successful for Mandarin Chinese is adapted here to model Thai tones. Approach: The structural modeling of voice F0 contours for Thai tones is studied for both male and female speech. The speech material covers 15 syllables with 5 tones, with 30 samples for each syllable. The structural modeling parameters for all tones are extracted, and the Root Mean Square (RMS) error between the re-synthesized F0 contour and the natural F0 contour is then calculated. Results: The experimental analysis shows that the RMS errors differ from tone to tone. On average, tone 1 (the low tone) has the smallest error of all tones. Conclusion: The structural model can be applied effectively to model Thai tones, and it can empirically distinguish each tone.


INTRODUCTION
In human speech production, the vocal folds vibrate at a particular rate to produce a quasi-periodic airflow through the vocal tract. This rate is known as the fundamental frequency (F0) of the output speech signal. F0 is one of the essential speech features that carry the prosodic information of natural speech. Modern speech technologies, e.g., speech recognition, speech analysis and speech synthesis, require F0 to be modeled with high accuracy. Earlier studies have developed several modeling techniques at different levels of speech units, e.g., the syllable and word levels and the sentence level (Tran et al., 2006; Tao et al., 2006; Fujisaki and Ohno, 1998; Fujisaki et al., 1990; Saito and Sakamoto, 2002; Li et al., 2004; Fujisaki and Sudo, 1971). Such modeling has been applied efficiently to Thai at the utterance, word and tone levels (Seresangtakul and Takara, 2002; Hiroya and Sumio, 2002; Seresangtakul and Takara, 2003). In Thai, tone is an essential feature of the syllable as a speech unit: the same syllable carries different meanings under different tones. Modeling tone is therefore crucial for speech processing applications in tonal languages.
This study presents an alternative approach to modeling the fundamental frequency contour of Thai tones, based on the structural model (Ni and Hirose, 2006). The root mean square (RMS) error is calculated to evaluate the modeling quality for all five tones: tone 0 (mid tone), tone 1 (low tone), tone 2 (falling tone), tone 3 (high tone) and tone 4 (rising tone).
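The exact RMS formula is not given in the text. A standard definition, assuming the natural contour $F_0(n)$ and the re-synthesized contour $\hat{F}_0(n)$ are sampled at the same $N$ analysis frames (whether in Hz or on the logarithmic scale used by the model is an assumption left open here), would be:

```latex
E_{\mathrm{RMS}} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\bigl(\hat{F}_0(n) - F_0(n)\bigr)^{2}}
```

A smaller $E_{\mathrm{RMS}}$ means the re-synthesized contour follows the natural one more closely.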

Structural model:
The fundamental frequency contour is illustrated in Fig. 1. The same mathematical model has been applied in earlier work (Chomphan, 2011a; 2011b; Ni and Hirose, 2006). The contour is modeled by structural control, which consists of placing a number of normalized fundamental frequency targets along the time axis on a logarithmic scale. Each fundamental frequency target, or pitch target, is specified by an amplitude and a transition time. The transition between any two adjacent targets is approximated by a truncated second-order function, as depicted in Fig. 2.

The experimental procedure runs from the speech database to data analysis. First, the speech database is constructed. Speech from both male and female speakers is recorded syllable by syllable. The five Thai tones (tones 0, 1, 2, 3 and 4) are designed with the same number of syllables and the same pattern. Each tone consists of 15 syllables, and each syllable consists of 30 samples. As a result, the speech database covers 4,500 speech utterances (5 tones × 15 syllables × 30 samples × 2 genders). After the speech database is constructed, the fundamental frequency of each utterance is extracted. Pitch targets are then placed at all local minima and maxima of the contour. An exponential function is used to approximate the route between two adjacent pitch targets. The extracted parameters of most of the exponential functions are used to re-synthesize the fundamental frequency contour. To evaluate the difference between the natural and the re-synthesized fundamental frequency contours, the RMS error is calculated. Finally, the RMS error is analyzed statistically for all Thai tones.
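The target placement, re-synthesis and RMS steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition shape is assumed to be the step response of a critically damped second-order system, 1 - (1 + ατ)e^(-ατ), which is both second-order and exponential in form, truncated at the next target and rescaled to reach it exactly; the exact function of Ni and Hirose (2006) may differ. A single fixed rate `alpha` stands in for the per-transition parameters the study extracts, contour endpoints are also treated as targets, and the helper names `locate_targets`, `transition`, `resynthesize` and `rms_error` are hypothetical.

```python
import numpy as np


def locate_targets(t, log_f0):
    """Hypothetical helper: place pitch targets at the contour endpoints and at
    every strict local minimum or maximum (flat plateaus are skipped)."""
    idx = [0]
    for i in range(1, len(log_f0) - 1):
        rise_before = log_f0[i] - log_f0[i - 1]
        rise_after = log_f0[i + 1] - log_f0[i]
        if rise_before * rise_after < 0:  # slope changes sign -> extremum
            idx.append(i)
    idx.append(len(log_f0) - 1)
    return t[idx], log_f0[idx]


def transition(tau, duration, alpha):
    """Assumed transition shape: critically damped second-order step response
    1 - (1 + alpha*tau) * exp(-alpha*tau), truncated at `duration` and
    rescaled so that it runs exactly from 0 to 1."""
    def step(x):
        return 1.0 - (1.0 + alpha * x) * np.exp(-alpha * x)
    return step(tau) / step(duration)


def resynthesize(t, target_times, target_values, alpha=40.0):
    """Join adjacent pitch targets with the assumed transition (log scale).
    `alpha` (1/s) is a hypothetical fixed rate, not a fitted parameter."""
    out = np.empty(len(t), dtype=float)
    for k in range(len(target_times) - 1):
        t0, t1 = target_times[k], target_times[k + 1]
        a, b = target_values[k], target_values[k + 1]
        mask = (t >= t0) & (t <= t1)
        out[mask] = a + (b - a) * transition(t[mask] - t0, t1 - t0, alpha)
    return out


def rms_error(natural, resynthesized):
    """Root mean square difference between two equally sampled contours."""
    diff = np.asarray(natural) - np.asarray(resynthesized)
    return float(np.sqrt(np.mean(diff ** 2)))


# Usage on a synthetic contour (not data from this study): 0.4 s at 5 ms steps.
t = np.linspace(0.0, 0.4, 81)
log_f0 = np.log(200.0) + 0.1 * np.sin(2 * np.pi * 2.5 * t)
target_t, target_v = locate_targets(t, log_f0)
fitted = resynthesize(t, target_t, target_v)
print(len(target_t), rms_error(log_f0, fitted))
```

By construction the re-synthesized contour passes exactly through every target, so the RMS error measures only how well the assumed transition shape follows the natural contour between targets.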

RESULTS
The model is evaluated through the eleven figures (Figs. 5-15) produced by the RMS error calculation. The averaged RMS errors of all five tones have been calculated for both genders.
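The averaging step is not spelled out in the text. A minimal sketch, assuming one RMS value per utterance labeled with its gender and tone (the function names `average_rms` and `lowest_and_highest_tone` are hypothetical), groups the errors by gender and tone and averages each group:

```python
from collections import defaultdict
from statistics import mean


def average_rms(records):
    """Group per-utterance RMS errors by (gender, tone) and average them.

    `records` is an iterable of (gender, tone, rms) tuples, assumed to hold
    one tuple per utterance in the speech database."""
    groups = defaultdict(list)
    for gender, tone, rms in records:
        groups[(gender, tone)].append(rms)
    return {key: mean(values) for key, values in groups.items()}


def lowest_and_highest_tone(averages, gender):
    """Return the tones with the smallest and largest averaged RMS error
    for one gender."""
    per_tone = {tone: err for (g, tone), err in averages.items() if g == gender}
    return min(per_tone, key=per_tone.get), max(per_tone, key=per_tone.get)
```

Applied to the per-utterance errors of both genders, `lowest_and_highest_tone` reports which tone has the lowest and which the highest averaged error for each gender, which is the comparison drawn in the discussion.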

DISCUSSION
As the experimental results in Figs. 5-14 show, the averaged RMS errors of the tones are quite different from one another for both female and male speech. Comparing female and male speech in Fig. 15, the errors of female speech are clearly higher than those of male speech for all tones. In addition, the averaged RMS errors of tone 4 are the highest, and those of tone 1 are the lowest, for both female and male speech.

CONCLUSION
This study presents the structural modeling of the fundamental frequency contour for Thai tones. Five lexical Thai tones are studied statistically. The averaged root mean square error differs from tone to tone. Overall, the structural modeling technique can be applied appropriately to model Thai tones.

Fig. 1: F0 contour with a trend line in a logarithmic scale