Fujisaki’s Model of Thai’s Fundamental Frequency Contours with Environmental Noises

Problem statement: An important human speech feature is the fundamental frequency (F0) contour, which represents the speech prosody. It indicates the naturalness and intelligibility of the speech. Modeling the F0 contour is an essential procedure in natural speech processing. In speech communication, environmental noise is a major cause of degraded digital communication quality. This study examines the effects of noise on F0 contour modeling for standard Thai. Approach: The modeling technique selected in this study was adapted from Fujisaki’s model because of its success in modeling various Thai speech units. Four types of environmental noise were recorded at different power levels. This study proposes an analysis of selected parameters of a Thai speech prosody model for both genders and four types of noise. The derived Fujisaki model covers seven parameters: baseline frequency, numbers of phrase commands and tone commands, phrase command and tone command durations, and phrase command and tone command amplitudes. Results: The experiments used standard Thai speech comprising 2 samples of 5 sentences, spoken by 5 male and 5 female speakers. The four noise types were train, factory, car and air conditioner. Each noise type was applied at five levels ranging from 0 to 20 dB. The results show that the different noises have distinct effects on most of the proposed model parameters. Conclusion: The results confirm that the effects of the four noise types differ significantly. Empirically, environmental noise degrades the model parameters.


INTRODUCTION
In earlier work, modeling the F0 contour in a noisy environment degraded the naturalness of the speech. To develop modern natural speech processing systems, it is important to understand how noise degrades the model parameters. Previous F0 modeling studies have addressed several levels of speech units, for example utterances, words and syllables (Fujisaki and Sudo, 1971; Fujisaki et al., 1990; Fujisaki and Ohno, 1998; Saito and Sakamoto, 2002; Li et al., 2004; Tao et al., 2006; Ni and Hirose, 2006; Tran et al., 2006). For Thai speech, Fujisaki's model has been effectively applied to utterances, words and tones (Seresangtakul and Takara, 2002; 2003; Hiroya and Sumio, 2002). F0 modeling of Thai expressive speech using a limited-domain speech database was successfully conducted in 2010 (Chomphan, 2010a). That study showed that the selected model parameters could distinguish all styles of expressive speech.
Chomphan (2010b) applied Fujisaki's model to the F0 contours of Thai dialects. However, the effects of noise on the model have not yet been studied.
This study follows the approach of those earlier studies, analyzing F0 contour modeling of standard Thai under four different types of noise. Fujisaki's model (Fujisaki and Sudo, 1971) is a fundamental tool in advanced research on natural speech recognition and synthesis (Seresangtakul and Takara, 2002; 2003; Chomphan, 2010c; 2011a). With Fujisaki's model, the related parameters are extracted from the speech corpus, utterance by utterance. The output parameters are then calculated and systematically analyzed (Chomphan and Kobayashi, 2008; 2009; Seresangtakul and Takara, 2003).
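As background, the standard formulation of Fujisaki's model expresses the log F0 contour as ln F0(t) = ln Fb + Σi Api·Gp(t − T0i) + Σj Atj·[Gt(t − T1j) − Gt(t − T2j)], where Gp(t) = α²t·exp(−αt) is the phrase-control response and Gt(t) = min[1 − (1 + βt)·exp(−βt), γ] is the tone-control response (both zero for t < 0). The Python sketch below generates a contour from a baseline frequency, phrase commands and tone commands under that formulation. The constants α = 3/s, β = 20/s and γ = 0.9 are typical values assumed for illustration, not values reported in this study, and all class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    onset: float      # T0, seconds
    amplitude: float  # Ap


@dataclass
class ToneCommand:
    onset: float      # T1, seconds
    offset: float     # T2, seconds
    amplitude: float  # At (may be negative, lowering F0 locally)


def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Gt(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, tones, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t from baseline fb and the command lists."""
    log_f0 = math.log(fb)
    for p in phrases:
        # Each phrase command adds an impulse response in the log domain.
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for c in tones:
        # Each tone command is a pedestal: step response at onset minus at offset.
        log_f0 += c.amplitude * (
            tone_response(t - c.onset, beta, gamma)
            - tone_response(t - c.offset, beta, gamma)
        )
    return math.exp(log_f0)
```

For example, with `phrases = [PhraseCommand(0.0, 0.5)]` and `tones = [ToneCommand(0.3, 0.6, -0.3)]`, the list `[fujisaki_f0(k * 0.01, 120.0, phrases, tones) for k in range(150)]` samples a 1.5 s contour every 10 ms. Negative tone-command amplitudes lower F0 locally, which tone-language applications of the model can use for falling or low tones.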
Derived parameters: Seven derived parameters are calculated from the conventional Fujisaki model parameters. These derived parameters mostly reflect the geometrical shape of the F0 contour of the speech and have mainly been extracted for Thai speech. They are baseline frequency, number of phrase commands, number of tone commands, phrase command duration, tone command duration, amplitude of phrase command and amplitude of tone command. The number of frames is also extracted in the experimental results; however, it is not one of the main parameters listed above (Chomphan, 2011b).
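The seven derived parameters can be gathered from the conventional commands of one utterance roughly as sketched below. This is a sketch under stated assumptions: the data layout and function name are hypothetical, tone command duration is taken as offset minus onset (T2 − T1), and because phrase commands are impulses in the standard model, the "phrase command duration" used here (time from each phrase command to the next one, or to the utterance end for the last) is an assumed definition that the study does not spell out. Dictionary keys reuse the figure abbreviations from the Results section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UtteranceCommands:
    """Conventional Fujisaki parameters for one utterance (hypothetical layout)."""
    fb: float                                # baseline frequency, Hz
    phrases: List[Tuple[float, float]]       # (T0, Ap) per phrase command
    tones: List[Tuple[float, float, float]]  # (T1, T2, At) per tone command
    end_time: float                          # utterance end, seconds


def derived_parameters(u: UtteranceCommands) -> dict:
    """Collect the seven derived parameters for one utterance."""
    phrases = sorted(u.phrases)  # order phrase commands by onset time T0
    onsets = [t0 for t0, _ in phrases]
    # ASSUMPTION: phrase command "duration" = time to the next phrase command,
    # or to the utterance end for the last one. Not defined in the study.
    phrase_durations = [
        nxt - cur for cur, nxt in zip(onsets, onsets[1:] + [u.end_time])
    ]
    return {
        "fb": u.fb,
        "PC num": len(phrases),
        "AC num": len(u.tones),
        "PC delta t": phrase_durations,
        "AC delta t": [t2 - t1 for t1, t2, _ in u.tones],
        "PC amplitude": [ap for _, ap in phrases],
        "AC amplitude": [at for _, _, at in u.tones],
    }
```

Pooling these dictionaries over all utterances of one noise condition yields the per-parameter samples whose distributions are compared across conditions.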

Environmental noises:
To evaluate the effects of noise in speech communication, four types of noise (train, factory, car and air conditioner) are recorded and mixed directly with the pre-recorded clean speech in the speech database. Before mixing, the noise power is adjusted to a set of exact levels: each type of noise is applied at 0, 5, 10, 15 and 20 dB.
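The study does not state how its dB levels are referenced. One common convention, assumed in the sketch below, is to treat each level as the signal-to-noise ratio (SNR) of the mixture and scale the noise so that 10·log10(P_speech / P_noise) equals the target; under that convention 0 dB is the most severe condition and 20 dB the mildest. The function name is hypothetical, NumPy is used for the arithmetic, and the synthetic sine "speech" and Gaussian "noise" only stand in for real recordings.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is snr_db."""
    speech = speech.astype(np.float64)
    noise = noise.astype(np.float64)
    if len(noise) < len(speech):
        # Loop a short noise recording so it covers the whole utterance.
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that 10*log10(p_speech / (g**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Toy signals standing in for a clean utterance and a noise recording (16 kHz).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 150 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)
noisy = {snr: mix_at_snr(speech, noise, snr) for snr in (0, 5, 10, 15, 20)}
```

Scaling the noise rather than the speech keeps the speech level identical across conditions, so any change in the extracted parameters can be attributed to the added noise alone.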

RESULTS
The speech corpus consists of standard Thai speech: 2 samples of 5 sentences, spoken by 5 male and 5 female speakers, so both male and female speech are included in the speech database. The parameters are extracted with the tools of Mixdorff and Fujisaki (1997) and Chomphan and Kobayashi (2007a; 2007b). For each parameter, a frequency distribution over its range is constructed, and the distributions for standard Thai are then plotted to illustrate the differences and similarities among the noise types.
Fig. 2 shows clean speech. Fig. 3 and 4 show speech corrupted by air-conditioner noise at 0 and 20 dB, respectively; Fig. 5 and 6, by car noise at 0 and 20 dB; Fig. 7 and 8, by factory noise at 0 and 20 dB; and Fig. 9 and 10, by train noise at 0 and 20 dB. The following abbreviations are used in most figures: frame num (number of frames), fb (baseline frequency), AC num (number of tone commands), PC num (number of phrase commands), AC delta t (tone command duration), PC delta t (phrase command duration), AC amplitude (amplitude of tone command) and PC amplitude (amplitude of phrase command). Note that the first sub-graph in each figure, frame num, is not one of the seven main model parameters; it reflects the distribution of utterance length.
The figures show that the parameter distributions of speech corrupted by the four noise types differ significantly. For most parameters, the distribution ranges at 0 dB are lower than those at 20 dB. Air-conditioner and car noise split the distribution of baseline frequency into two sub-groups. By contrast, noise causes very little change in the distribution of the number of phrase commands.
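Comparing frequency distributions across conditions is easiest when every condition is binned over the same range. The sketch below is an assumption about how such plots might be prepared, not the study's own procedure: it builds shared-edge histograms for one parameter across conditions with `numpy.histogram` and also reports each condition's (min, max) range, the quantity compared between 0 dB and 20 dB above. Condition labels and the function name are hypothetical.

```python
import numpy as np


def parameter_distributions(values_by_condition: dict, n_bins: int = 10):
    """Histogram one model parameter per condition over a shared range.

    Returns (bin_edges, counts_by_condition, range_by_condition).
    """
    arrays = {c: np.asarray(v, dtype=float) for c, v in values_by_condition.items()}
    pooled = np.concatenate(list(arrays.values()))
    lo, hi = float(pooled.min()), float(pooled.max())
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins when every value is identical
    # Shared bin edges so counts from different conditions are comparable.
    edges = np.linspace(lo, hi, n_bins + 1)
    counts = {c: np.histogram(a, bins=edges)[0] for c, a in arrays.items()}
    ranges = {c: (float(a.min()), float(a.max())) for c, a in arrays.items()}
    return edges, counts, ranges
```

Calling it with, for instance, the fb values from clean speech and from each noisy condition gives one count array per condition that can be plotted side by side; a two-peaked count pattern would correspond to the split into two sub-groups noted for air-conditioner and car noise.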

CONCLUSION
This study presents the effects of noise on F0 contour modeling for standard Thai, using Fujisaki's model. In the experiments, four types of environmental noise were recorded and applied at different power levels. The model parameters have been explained and summarized. The experimental results indicate that some noise types cause differences in the distributions of the model parameters. Overall, environmental noise degrades the F0 contours and also distorts the model parameters.