Effects of Noises on the Analysis of Fundamental Frequency Contours for Thai Speech

Problem statement: In speech communication, noise from the surrounding environment degrades communication quality in various ways. The received speech should be analyzed to determine how the most important noise types reduce speech quality so that they can be suppressed appropriately. Approach: This study analyzes the effects of noise on Thai speech. Four kinds of noise (air conditioner, car, factory and train) were simulated at several signal-to-noise ratios. The root mean square error between the fundamental frequency contours of the corrupted speech and the clean speech was calculated and then compared across genders, the four kinds of noise and the signal-to-noise ratio levels. Results: In the experiments, 400 male and female speech utterances were used as speech material, and the average root mean square errors were calculated. The results show that the fundamental frequency contour of female speech is affected more than that of male speech. Among the four kinds of noise, car noise has the greatest influence, while factory noise has the least. Moreover, the root mean square error varies inversely with the signal-to-noise ratio. Conclusion: The findings show that noise from the surrounding environment degrades the fundamental frequency contour of speech. This study provides preliminary knowledge for enhancing speech quality in further work such as speech synthesis systems or other speech processing technologies.


INTRODUCTION
In speech technology research, speech analysis has been conducted for many languages. It is a preliminary procedure in speech processing areas including speech recognition, speech synthesis and speech coding (Chomphan and Kobayashi, 2007; Chomphan and Kobayashi, 2008; Chomphan and Kobayashi, 2009). In practical situations, noise from the surrounding environment affects speech quality in various ways. The fundamental frequency, extracted frame by frame from the speech, is an important feature indicating the pitch and voicing of the speech. It has been widely exploited in most of the speech processing technologies mentioned above. The effect of surrounding noise on the fundamental frequency should therefore be studied. Important background noises include car noise, train noise, factory noise and air conditioner noise (Manohar and Rao, 2006). This study concentrates on how these noises affect the fundamental frequency contour of speech as the signal-to-noise ratio (SNR) is varied. The findings are expected to be applied in further research such as speech synthesis and recognition (Chomphan, 2009; Chomphan, 2010a; Chomphan, 2010b; Chomphan, 2010c).

Fundamental Frequency Contour (F0 contour):
A substantial amount of data exists on the fundamental frequency (F0) of the voice for speakers who differ in age and sex. These data have been published for several languages and for various types of discourse. They always include an average measure of F0, usually expressed in Hz, although in some cases the average duration of a period is reported instead. Typical F0 values are 120 Hz for male speech and 210 Hz for female speech (Waldstein and Boothroyd, 1994). An example of the F0 contour of natural speech is shown in Fig. 1.
Typically, the mean value of F0 changes slightly with age. For female speech, F0 is fairly stable until menopause, after which it decreases, reaching a minimum about 15 Hz lower at around 70 years of age (Pegoraro-Krook, 1988). This physiological change is an effect of the increased testosterone-oestrogen ratio during that period. A similar decrease in F0 can be caused by smoking (Gilbert and Weismer, 1974). For male speech, the dramatic decrease in F0 during puberty has been observed to continue, at a slower rate, until about 35 years of age. Thereafter, at about 55 years of age, F0 begins to rise again (Pegoraro-Krook, 1988).
F0 modeling is another issue related to this study. Previous studies have modeled F0 for various speech units using several techniques, at the utterance level (Fujisaki and Ohno, 1998; Fujisaki et al., 1990; Tao et al., 2006; Saito and Sakamoto, 2002; Ni and Hirose, 2006; Li et al., 2004) and at the word and syllable levels (Fujisaki et al., 1990). In Thai speech, Fujisaki's model has been successfully applied to the modeling of utterances, tones and words (Hiroya and Sumio, 2002; Seresangtakul and Takara, 2002; Seresangtakul and Takara, 2003).

Fig. 2: F0 contours of female speech extracted from (a) clean speech signal, (b) pure car noise signal, (c) car noise-merged signal with SNR 0 dB, (d) car noise-merged signal with SNR 5 dB, (e) car noise-merged signal with SNR 10 dB, (f) car noise-merged signal with SNR 15 dB and (g) car noise-merged signal with SNR 20 dB

Types of noise:
The background noises of interest are car noise, train noise, factory noise and air conditioner noise. Each type of noise is recorded separately and its amplitude is scaled to obtain the desired noise level. In other words, the energies of the noise and the speech signal are calculated, and the two signals are then merged at five SNR levels of 0, 5, 10, 15 and 20 dB (Shareha et al., 2009; Geravanchizadeh and Rezaii, 2009; Rushaidin et al., 2009; Rajarathinam and Parmar, 2011).
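The noise scaling described above can be made concrete with the usual power-based definition of SNR. The paper does not state its formula, so the following is only a sketch under that assumption; the symbols $s[n]$ (clean speech), $d[n]$ (recorded noise), $g$ (noise gain), $x[n]$ (noise-merged signal) and $N$ (number of samples) are introduced here for illustration.

```latex
\begin{align}
P_s &= \frac{1}{N}\sum_{n=1}^{N} s[n]^2, \qquad
P_d = \frac{1}{N}\sum_{n=1}^{N} d[n]^2 \\
\mathrm{SNR}_{\mathrm{dB}} &= 10\log_{10}\frac{P_s}{g^{2} P_d}
\quad\Longrightarrow\quad
g = \sqrt{\frac{P_s}{P_d\,10^{\mathrm{SNR}_{\mathrm{dB}}/10}}} \\
x[n] &= s[n] + g\,d[n]
\end{align}
```

Under this definition, an SNR of 0 dB scales the noise to the same average power as the speech, and an SNR of 20 dB scales it to one hundredth of the speech power.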

Procedures of fundamental frequency contour analysis:
The following analysis procedures are applied to each utterance from the speech data and the noise data (Abdellaoui, 2009; Chomphan, 2010b; 2010c; 2010d; 2010e; Lampson et al., 2010; Ramadan, 2010; Teymourzadeh et al., 2010):
• Scaling the noise signal to obtain the five desired levels relative to the clean speech signal
• Merging the noise signal with the clean speech signal
• Extracting the F0 contour from both the noise-merged signal and the clean speech signal
• Calculating the Root Mean Square Error (RMSE) between the F0 contours of the noise-merged signal and the clean speech signal
• Calculating the statistical values of the RMSEs obtained in the previous step

These analysis procedures are conducted for all types of noise. Figure 2 shows examples of F0 contours of female speech extracted under different conditions: clean speech signal, pure car noise signal, and car noise-merged signals with SNRs of 0, 5, 10, 15 and 20 dB. It can be seen that a higher noise level (a lower SNR) distorts the F0 contour further from that of the original clean speech. Moreover, no F0 value can be extracted from the pure noise signal, as seen in Fig. 2b.
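The scaling, merging and RMSE steps listed above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names `mix_at_snr` and `f0_rmse` are hypothetical, the SNR is assumed to be a ratio of average signal powers, the noise is assumed to be looped or trimmed to the utterance length, and the RMSE is assumed to be taken over all frames of two equally long F0 contours. The F0 extractor itself is not specified in the paper and is left out.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to the requested SNR (in dB) and add it to `speech`.

    Assumption: SNR is the ratio of average powers of speech and scaled noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Assumption: repeat or truncate the noise so it covers the utterance.
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def f0_rmse(f0_clean, f0_noisy):
    """Root mean square error (Hz) between two frame-aligned F0 contours."""
    f0_clean = np.asarray(f0_clean, dtype=float)
    f0_noisy = np.asarray(f0_noisy, dtype=float)
    if f0_clean.shape != f0_noisy.shape:
        raise ValueError("F0 contours must have the same number of frames")
    return float(np.sqrt(np.mean((f0_clean - f0_noisy) ** 2)))


# Usage with synthetic stand-ins for a recorded utterance and a noise clip.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 120 * np.arange(16000) / 16000)  # 120 Hz tone
noise = rng.standard_normal(4000)
for snr_db in (0, 5, 10, 15, 20):  # the five SNR levels used in the study
    mixed = mix_at_snr(speech, noise, snr_db)
    # An F0 extractor would be applied to `speech` and `mixed` here,
    # and the two resulting contours passed to f0_rmse.
```

How unvoiced frames are treated in the RMSE (for example, coded as 0 Hz or excluded) is not stated in the paper and would need to be fixed before comparing numbers.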

RESULTS
Using a speech database of 200 sentences of female speech and 200 sentences of male speech, F0 contours are extracted from the clean speech signal and from the noise-merged signals at SNRs of 0, 5, 10, 15 and 20 dB. The RMSE between the F0 values extracted from the noise-merged signal and those from the clean speech signal is calculated for every sentence in the speech database. The four types of noise simulated in this study are air conditioner noise, car noise, factory noise and train noise. Figures 3-6 show the RMSEs at the different SNR levels; each figure also compares the RMSEs of female and male speech.
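The averaging of per-sentence RMSEs by gender, noise type and SNR level can be sketched as follows. The record layout `(gender, noise, snr_db, rmse)` and the function name `average_rmse` are hypothetical choices made for this sketch, and the numbers in `toy_records` are placeholders, not results from the study.

```python
from collections import defaultdict
from statistics import mean


def average_rmse(records):
    """Average per-utterance RMSEs within each (gender, noise, snr_db) group."""
    groups = defaultdict(list)
    for gender, noise, snr_db, rmse in records:
        groups[(gender, noise, snr_db)].append(rmse)
    return {key: mean(values) for key, values in groups.items()}


# Placeholder values for illustration only; they are not measured data.
toy_records = [
    ("female", "car", 0, 4.0),
    ("female", "car", 0, 6.0),
    ("male", "car", 0, 3.0),
]
averages = average_rmse(toy_records)
```

Each key of the returned dictionary corresponds to one point on a plot of average RMSE against SNR level for a given gender and noise type.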

DISCUSSION
From the comparison of RMSEs between the corrupted female speech and the corrupted male speech at the different SNR levels in Fig. 3-6, it can be seen that the RMSEs of female speech are mostly higher than those of male speech. Among the four kinds of noise in Fig. 3-6, car noise (Fig. 4) has the greatest influence on average, while factory noise (Fig. 5) has the least. The RMSE decreases as the SNR increases for both male and female speech and for all types of noise, because the noise has less effect at higher SNR. All in all, the RMSE varies inversely with the SNR level.

CONCLUSION
This study analyzes the effects of noise on Thai speech. It focuses on four kinds of noise (air conditioner, car, factory and train), each applied at several signal-to-noise ratios. The root mean square error between the fundamental frequency contours of the corrupted speech and the clean speech is calculated and then compared across genders, the four kinds of noise and the signal-to-noise ratio levels. A total of 400 male and female utterances are used in the study. The results show that the fundamental frequency contour of female speech is affected more than that of male speech. Among the four kinds of noise, car noise has the greatest influence, while factory noise has the least. Moreover, the root mean square error varies inversely with the signal-to-noise ratio. All in all, noise from the surrounding environment degrades the fundamental frequency contour of speech.