Performance Evaluation of Multi-Pulse Based Code Excited Linear Predictive Speech Coder with Bitrate Scalable Tool over Additive White Gaussian Noise and Rayleigh Fading Channels

Problem statement: Modern speech communication requires speech coding with bitrate scalability. However, various types of noise in communication channels damage the transmitted information, especially speech data, and tonal-language speech is affected as well. Approach: Based on the High Pitch Delay Resolution (HPDR) technique, an MP-CELP speech coder for tonal language was proposed and evaluated over CDMA AWGN and Rayleigh fading channels. The proposed coder supports multiple bitrates and provides bitrate scalability. Results: Performance analysis and computer simulation showed that the proposed coder improves on the conventional scalable MP-CELP coder in the specified noise environments. The HPDR technique was applied to MP-CELP for use with tonal language, while supporting core coding rates of 5.6, 8.2 and 12.2 kbps and additional scaled bitrates. Conclusion: Applying the high pitch delay resolution technique to MP-CELP speech coding improves the quality of encoded tonal speech. Moreover, the coding quality of the proposed coder was better than that of the conventional coder for Thai language over both the AWGN channel and the Rayleigh fading channel.


INTRODUCTION
Digital communications are now widely deployed. Audio, images, video and data can be transmitted through wired or wireless network channels. At the same time, the number of users accessing these networks is increasing rapidly. Consequently, channel capacity has to be increased, and signal compression aims to achieve this (Chompun et al., 2000; Chompun, 2004). Since multimedia applications such as videophone and videoconference over ATM and the Internet are widely used, high-quality speech coders are in high demand. These applications require special consideration of packet loss. One way to overcome this problem is to realize a scalable coder in which the synthesized speech signal can be decoded from the received packets, which contain only a part of the whole encoded bitstream. One standardization activity in this area is under way in MPEG-4 (Nomura et al., 1998).
As for 3GPP CDMA systems, the EVRC speech coder performs very well and is much more robust than older codecs (Jabrane et al., 2007). However, it supports only the bitrate range of 0.81-8.55 kbps. One candidate for the MPEG-4 natural speech coder is MP-CELP, which supports a more flexible and wider range of 5-29.5 kbps. This flexible coder employs multi-pulse excitation in which the number of pulses in the fixed-entry codebook is selectable, providing bitrate scalability and multiple-bitrate functionality according to the MPEG-4 CELP speech coder requirements (Nomura et al., 1998).
The performance of MP-CELP with the High Pitch Delay Resolution (HPDR) technique is presented in this study by examining it over time-varying channels. Results for Rayleigh flat fading channels are compared with those for the AWGN channel in the context of a cellular communication environment (Adetunde and Seidu, 2008; Vlasie and Rousseau, 2005).
In MP-CELP, the amplitudes or signs of the multi-pulse excitation are simultaneously vector quantized. To improve speech quality under background noise conditions, the adaptive pulse location restriction method is applied. This coder operates at various bitrates ranging from 4-12 kbps, utilizing the flexibility of multi-pulse excitation coding (Nomura et al., 1998). In a tonal language such as Thai, a syllable is composed of consonants, vowels and a tone (Chompun et al., 2001a; 2001b). The smallest structure of sounds or syllables in Thai is composed of one vowel unit or one diphthong, one, two or three consonants and a tone. The structure is illustrated in Fig. 1, where Ci is the initial consonant, Cf is the final consonant, V is the vowel and T is the tone.
The significant difference between tonal and toneless languages is the tone (T). In a tonal language, words with different tones have distinct meanings. When a standard speech coder such as CS-ACELP is used with a tonal language, the speech quality is degraded compared with that of a toneless language. The reason is that the precision of the tone information is not sufficient for tonal language (Chompun et al., 2000; Chompun et al., 2001a; 2001b).
This study proposes a bitrate scalable tonal language speech coder based on multi-pulse based code excited linear predictive coding (Taumi et al., 1996; Ozawa et al., 1997). The proposed coder provides bitrate scalability, which is effective in multimedia communications. Moreover, the coder is improved for tonal language speech by applying high pitch delay resolution to retain the precision of the tone information.

Bitrate scalable MP-CELP coder:
The operating principle of the bitrate scalable MP-CELP coder can be separated into two parts: the MP-CELP core coder and the bitrate scalable tool.
The MP-CELP core coder achieves high coding performance by introducing multi-pulse vector quantization, as depicted in Fig. 2 (Taumi et al., 1996; Ozawa et al., 1997). The input speech is processed in 10 ms frames through Linear Prediction (LP) and pitch analysis. The LP coefficients are quantized in the Line Spectrum Pairs (LSP) domain. The pitch delay is encoded by using an adaptive codebook. The residual signal after LP and pitch analysis is encoded by the multi-pulse excitation scheme. The multi-pulse excitation signal is composed of several non-zero pulses. The pulse positions are restricted by the algebraic-structure codebook and determined by an analysis-by-synthesis approach (Laflamme et al., 1991). The pulse signs and positions are encoded, while the gains for the pitch predictor and the multi-pulse excitation are normalized by the frame energy and encoded.
Bitrate scalable tool: This study uses at most 3 stages of the bitrate scalable tool, according to the MPEG-4 CELP requirement. The bitrate scalable tool is connected to the core coder as illustrated in Fig. 3 (Nomura et al., 1998). The bitrate scalable tool encodes the residual signal produced at the MP-CELP core coder using multi-pulse vector quantization. Adaptive pulse position control is employed to change the algebraic-structure codebook at each excitation-coding stage depending on the multi-pulse excitation encoded at the previous stage. The algebraic-structure codebook is adaptively controlled to inhibit the same pulse positions as those of the multi-pulse excitation in the MP-CELP core coder or the previous stage. The pulse positions are determined so that the perceptually weighted distortion between the residual signal and the output signal of the scalable tool is minimized. The LP synthesis and perceptual weighting filters are shared by the MP-CELP core coder and the scalable tool. For this conventional coder, to support multiple bitrates, the number of pulses is chosen as 1, 5 or 10. The bit allocation is shown in Table 1. Each stage of the bitrate scalable tool increases the bitrate by 800 bps. Thus, for 1 pulse, the total bitrates are 5600, 6400, 7200 and 8000 bps, respectively. For 5 pulses, the total bitrates are 8200, 9000, 9800 and 10600 bps, respectively. For 10 pulses, the total bitrates are 12200, 13000, 13800 and 14600 bps, respectively.
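The bitrate figures above follow from simple arithmetic: a core rate plus 800 bps per enhancement stage. The following Python sketch reproduces them; the names `CORE_BITRATES`, `STAGE_INCREMENT`, `MAX_STAGES` and `bitrate_ladder` are illustrative and not taken from the coder itself.

```python
# Core bitrates (bps) for 1, 5 and 10 pulses, as stated in the text.
CORE_BITRATES = {1: 5600, 5: 8200, 10: 12200}
STAGE_INCREMENT = 800  # bps added by each stage of the scalable tool
MAX_STAGES = 3         # at most 3 stages are used in this study


def bitrate_ladder(num_pulses):
    """Return the total bitrates for 0..MAX_STAGES enhancement stages."""
    core = CORE_BITRATES[num_pulses]
    return [core + stage * STAGE_INCREMENT for stage in range(MAX_STAGES + 1)]


for pulses in CORE_BITRATES:
    print(pulses, bitrate_ladder(pulses))
```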
HPDR technique for tonal language speech: The Thai language has 5 different tones: mid (0), low (1), falling (2), high (3) and rising (4), whose characteristics are depicted in Fig. 4 (Chompun et al., 2001a; 2001b). Each graph represents the behavior of the fundamental frequency (f0) over the duration of a syllable, where f0 is the inverse of the pitch delay time. Thus, f0 indicates the periodicity of the voice. Comparing the f0 behavior of Thai male and Thai female speakers, the f0 change rate of female speakers is almost always higher than that of male speakers; see, e.g., Thathong et al. (2000). This is why the quality of Thai female speech encoded by the CS-ACELP coder is lower than that of Thai male speech (Chompun et al., 2000). Hence, detecting f0 with high precision improves the quality of tonal language speech.
Since the pitch delay (the inverse of f0) is closely tied to the tone in a tonal language, this study proposes an improvement of the bitrate scalable MP-CELP coder by applying the HPDR technique to the pitch analysis of the core coder. HPDR with pitch fractions of 1/2, 1/3 and 1/4 is adopted in the pitch analysis, which increases the bitrate by 200, 400 and 400 bps, respectively.
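To illustrate why a finer pitch delay resolution gives a more precise f0, the following sketch computes how much f0 changes when the pitch delay moves by one step at each resolution. The sampling rate of 8000 Hz and the example delay of 40 samples are assumptions made for illustration; neither value is stated in the text, and `f0_step` is a hypothetical helper.

```python
FS = 8000  # Hz; assumed sampling rate, not given in the text


def f0_step(delay, step):
    """Change in f0 (Hz) when the pitch delay grows from `delay`
    to `delay + step` samples, using f0 = FS / delay."""
    return FS / delay - FS / (delay + step)


# Integer resolution and pitch fractions of 1/2, 1/3 and 1/4.
for denom in (1, 2, 3, 4):
    print(f"resolution 1/{denom}: {f0_step(40, 1 / denom):.2f} Hz per step")
```

Under these assumptions, the f0 step shrinks as the pitch fraction becomes finer, so the coder can follow the f0 contour of a tone more closely.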
The HPDR technique is implemented by including a pitch fraction analysis within the conventional pitch analysis; it finds the optimum fraction around the integer pitch delay obtained from the conventional pitch analysis. To find the adaptive excitation for the proposed technique, an FIR filter based on a Hamming-windowed sin(x)/x function truncated at ±11 and padded with zeros at ±12 is adopted to weight the excitation in the pitch fraction analysis (Chompun et al., 2001a; 2001b).
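The pitch fraction analysis described above can be sketched as follows. This is a minimal illustration, not the coder's actual implementation: the tap values come from a generic Hamming-windowed sinc design, the search criterion (normalized correlation) is an assumption, and `interp_taps`, `sample_at`, `best_fraction` and the test signal are all hypothetical.

```python
import math


def interp_taps(up, half_len):
    """Hamming-windowed sinc taps on a grid of 1/up samples.

    Taps are truncated at +/-(half_len - 1) and set to zero at +/-half_len.
    """
    taps = {}
    for n in range(-half_len, half_len + 1):
        if abs(n) == half_len:
            taps[n] = 0.0  # zero padding at the edges
            continue
        x = math.pi * n / up
        sinc = 1.0 if n == 0 else math.sin(x) / x
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        taps[n] = sinc * window
    return taps


def sample_at(signal, index, frac, up, taps):
    """Estimate the signal value at position index + frac/up."""
    half_len = max(taps)
    span = half_len // up + 1
    total = 0.0
    for k in range(max(0, index - span), min(len(signal), index + span + 1)):
        n = (index - k) * up + frac
        if abs(n) <= half_len:
            total += signal[k] * taps[n]
    return total


def best_fraction(signal, start, length, delay, up, taps):
    """Search fractions around the integer `delay`.

    A fraction `frac` corresponds to an effective delay of delay - frac/up.
    The criterion (normalized correlation with the target) is an assumption.
    """
    target = signal[start:start + length]
    best_frac, best_score = None, -math.inf
    for frac in range(-(up - 1), up):
        cand = [sample_at(signal, i - delay, frac, up, taps)
                for i in range(start, start + length)]
        energy = sum(c * c for c in cand)
        corr = sum(t * c for t, c in zip(target, cand))
        score = corr / math.sqrt(energy) if energy > 0 else -math.inf
        if score > best_score:
            best_frac, best_score = frac, score
    return best_frac


UP = 3                      # pitch fraction of 1/3
TAPS = interp_taps(UP, 12)  # truncated at +/-11, zero at +/-12
PERIOD = 40 - 1 / 3         # hypothetical true pitch period in samples
past = [math.sin(2 * math.pi * k / PERIOD) for k in range(200)]
frac = best_fraction(past, start=100, length=40, delay=40, up=UP, taps=TAPS)
print(frac)
```

In this toy case the integer delay is 40 samples while the true period is 40 - 1/3 samples, so the search should select the fraction that shortens the effective delay by one third of a sample.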
Gaussian Channel Model: In the AWGN channel, zero-mean white Gaussian noise is added to the transmitted signal s(t), so that the received signal r(t) can be represented as:

$$r(t) = s(t) + n(t)$$

where n(t) is a zero-mean white Gaussian noise process with power spectral density N0/2 (Manglani and Bell, 2001). For the Rayleigh fading channel model, see Sklar (1997).
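The additive noise model for the AWGN channel can be sketched in discrete time as follows. This is an illustrative real-valued baseband simulation under assumed parameters (test tone, 10 dB SNR, fixed seed); it does not model the CDMA system used in the study, and `add_awgn` is a hypothetical helper.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Return signal + zero-mean white Gaussian noise at the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)  # noise standard deviation
    return [s + rng.gauss(0.0, sigma) for s in signal]


rng = random.Random(0)  # fixed seed so the run is repeatable
s = [math.sin(2 * math.pi * 0.01 * n) for n in range(20000)]  # test tone
r = add_awgn(s, snr_db=10.0, rng=rng)

# Check the SNR actually achieved by this noise realization.
noise = [ri - si for ri, si in zip(r, s)]
measured_snr_db = 10 * math.log10(
    sum(x * x for x in s) / sum(x * x for x in noise)
)
print(round(measured_snr_db, 2))
```

The measured SNR differs slightly from the requested value because the noise power is estimated from a finite number of samples.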

Experimental conditions:
The coding quality of the proposed coder was evaluated subjectively over two types of simulated channels: an AWGN channel and a Rayleigh fading channel. The comparison tests used 36 test sentences spoken by 16 men and 16 women, some of which are shown in Table 2. The effectiveness of the high pitch delay resolution applied to the conventional coder was evaluated using average MOS scores. Two comparison tests between the conventional coder and the modified coder were conducted, and the results are shown in the graphs of Fig. 5 and 6. Figure 5 shows the quality of speech transmitted through the AWGN channel, while Fig. 6 shows the quality of speech transmitted through the Rayleigh fading channel.

DISCUSSION
According to the subjective test (MOS score), the graphs in Fig. 5 and 6 show that the speech quality of the coder modified with HPDR is above that of the conventional coder at all SNR levels for the same bitrate. This indicates that the proposed HPDR technique provides better pitch precision, which improves the coding quality for tonal language over both the AWGN channel and the Rayleigh fading channel.

CONCLUSION
A modification of the bitrate scalable tonal language speech coder has been proposed. The coder consists of an MP-CELP core coder and the bitrate scalable tools. High pitch delay resolution is applied to the adaptive codebook of the core coder to improve tonal speech quality. The results show that the coding quality of the proposed coder is better than that of the conventional coder for Thai language over both the AWGN channel and the Rayleigh fading channel.