Analytical Study of High Pitch Delay Resolution Technique for Tonal Speech Coding

Problem statement: In tonal-language speech, tone plays an important role not only in the naturalness but also in the intelligibility of the speech, so it must be treated appropriately in a speech coding algorithm. Approach: This study presents an analytical study of the High Pitch Delay Resolution (HPDR) technique applied to the adaptive codebook of the core coder of a Multi-Pulse based Code Excited Linear Predictive (MP-CELP) coder. Results: The experimental results show that the MP-CELP speech coder with the HPDR technique achieves higher speech quality than the conventional coder. An optimum pitch delay resolution is also presented. Conclusion: The analytical study found that the proposed technique can improve speech coding quality.


INTRODUCTION
Speech coding is an important process in present-day digital mobile communications. The number of users accessing communication networks is increasing rapidly. As a result, channel capacity has to be increased, which speech compression, or coding, aims to achieve (Chompun et al., 2000).
The MP-CELP coder has been proposed as a scalable coder for the Moving Picture Experts Group-4 (MPEG-4) speech coding standard at low bit rates. This flexible coder employs multi-pulse excitation in which the number of pulses in the fixed-entry codebook is selectable, providing the bitrate scalability and multiple-bitrate functionality required of the MPEG-4 CELP speech coder (Nomura et al., 1998; Chomphan, 2010b). In MP-CELP, the amplitudes or signs of the multi-pulse excitation are simultaneously vector quantized.
To improve speech quality under background noise conditions, the adaptive pulse location restriction method is applied. The coder operates at bitrates ranging from 4 to 12 kbps, utilizing the flexibility of multi-pulse excitation coding (Chomphan, 2010a).
Thai is a tonal language in which a syllable is composed of consonants, vowels and a tone (Wutiwiwatchai and Furui, 2007). The smallest syllable structure is composed of one vowel unit or one diphthong, one, two or three consonants and a tone. The structure is illustrated in Fig. 1, where Ci is the initial consonant, Cf is the final consonant, V is the vowel and T is the tone.
The important difference between tonal and non-tonal languages is the existence of tone. In a tonal language, words with different tones have different meanings. When a standard speech coder such as CS-ACELP is used with a tonal language, the speech quality is degraded compared with that obtained for a non-tonal language. The reason is that the precision of the tone information is insufficient for a tonal language (Chompun et al., 2000; Wutiwiwatchai and Furui, 2007).
This study presents the high pitch delay resolution (HPDR) technique for a bitrate-scalable tonal-language speech coder based on multi-pulse code excited linear predictive coding. It aims at preserving the precision of the tone information. The experimental results show the efficiency of the HPDR technique at different resolutions.

MP-CELP core coder:
The MP-CELP core coder achieves high coding performance by introducing multi-pulse vector quantization, as depicted in Fig. 2 (Taumi et al., 1996; Ozawa et al., 1997). The input speech is processed in 10 ms frames through Linear Prediction (LP) and pitch analysis. The LP coefficients are quantized in the Line Spectrum Pair (LSP) domain. The pitch delay is encoded using an adaptive codebook. The residual signal remaining after LP and pitch analysis is encoded by the multi-pulse excitation scheme. The multi-pulse excitation signal is composed of several non-zero pulses. The pulse positions are restricted by the algebraic-structure codebook and determined by an analysis-by-synthesis approach (Laflamme et al., 1991; Chomphan, 2010a). The pulse signs and positions are encoded, while the gains for the pitch predictor and the multi-pulse excitation are normalized by the frame energy and encoded.
HPDR technique:
An important parameter of the MP-CELP speech coder is the pitch delay, which is inversely proportional to the fundamental frequency. The fundamental frequency contour determines the characteristics of tones. Therefore, to treat the tonal characteristics precisely, the pitch delay must be analyzed accurately and with fine resolution.
Instead of using integer-valued pitch delays, we apply fractional pitch delays with resolutions of one half (1/2), one third (1/3) and one fourth (1/4) of a sample.
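The benefit of a finer delay grid can be illustrated numerically. Because the fundamental frequency is the sampling rate divided by the pitch delay, one delay step corresponds to a frequency step that shrinks as the resolution increases. The sketch below is only an illustration: the 8 kHz sampling rate and the function name `f0_step_hz` are assumptions, not values taken from this study.

```python
def f0_step_hz(delay_samples, resolution, fs_hz=8000.0):
    """F0 change caused by one delay step of 1/resolution sample.

    fs_hz = 8000 is an assumed narrowband sampling rate (not stated in the text).
    """
    f0_now = fs_hz / delay_samples
    f0_next = fs_hz / (delay_samples + 1.0 / resolution)
    return f0_now - f0_next


# Around a 40-sample delay (200 Hz at the assumed 8 kHz rate):
for resolution in (1, 2, 3, 4):
    print(resolution, round(f0_step_hz(40, resolution), 2))
```

Under these assumptions, an integer delay grid moves the fundamental frequency by almost 5 Hz per step near 200 Hz, while a 1/4-sample grid moves it by a little over 1 Hz.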
With a pitch fraction of one half (1/2), the additional information that must be included in the output bitstream amounts to 200 bps. With a pitch fraction of one third (1/3) or one fourth (1/4), the additional information amounts to 400 bps.
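These figures are consistent with coding the fraction index with the smallest whole number of bits and transmitting it 200 times per second. That update rate is an assumption (for example, two pitch updates per 10 ms frame), since the text does not state the subframe structure. A minimal check of the arithmetic, with the hypothetical helper `extra_bitrate_bps`:

```python
def extra_bitrate_bps(resolution, updates_per_second=200):
    """Extra bitrate needed to send a fraction index with `resolution` levels.

    updates_per_second = 200 is an assumption, not a value given in the text.
    """
    bits_per_update = (resolution - 1).bit_length()  # ceil(log2(resolution))
    return bits_per_update * updates_per_second


for resolution in (2, 3, 4):
    print(f"1/{resolution}: {extra_bitrate_bps(resolution)} bps")
```

Resolutions of 1/3 and 1/4 both need two bits per update, which explains why they cost the same additional bitrate.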

Pitch fraction analysis:
Pitch fraction analysis is performed by considering the cross-correlation between the target signal and the excitation signal of the previous stage, which is stored in the buffer memory. With a pitch fraction of one half (1/2), the optimal pitch fraction is the one that maximizes the cross-correlation function in Eq. 1. In the maximization process, the optimal pitch delay (k) and pitch fraction (t) are used to obtain the optimal excitation signal v(n) by going back in the excitation signal u(n) by the corresponding time distance in the buffer memory, as shown in Eq. 3 and Fig. 4. To apply a pitch fraction of one third (1/3), Eq. 4 is used instead of Eq. 1 and Eq. 5 is used instead of Eq. 3. Finally, to apply a pitch fraction of one fourth (1/4), Eq. 6 is used instead of Eq. 1 and Eq. 7 is used instead of Eq. 3.
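The search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of Eqs. 1-7: the Hamming-windowed sinc interpolator, the energy-normalized correlation score, and the names `delayed_excitation` and `search_fractional_pitch` are all assumptions. A real coder would also pass the candidate excitation through the weighted synthesis filter before correlating. The sketch further assumes that every candidate delay is longer than the subframe plus the interpolator half-width, so that all taps fall inside the buffer.

```python
import numpy as np


def delayed_excitation(past, delay, n_samples, half_width=8):
    """Return v(n) = u(n - delay) for n = 0..n_samples-1.

    `past` holds the previous excitation u; its last element is the sample
    just before the current subframe. Fractional positions are read with a
    Hamming-windowed sinc kernel (an assumed interpolator).
    """
    taps = np.arange(-half_width + 1, half_width + 1)
    out = np.zeros(n_samples)
    for n in range(n_samples):
        pos = len(past) + n - delay          # fractional index into `past`
        base = int(np.floor(pos))
        x = taps - (pos - base)              # distance of each tap from `pos`
        kernel = np.sinc(x) * (0.54 + 0.46 * np.cos(np.pi * x / half_width))
        idx = base + taps
        ok = (idx >= 0) & (idx < len(past))  # ignore taps outside the buffer
        out[n] = np.dot(past[idx[ok]], kernel[ok])
    return out


def search_fractional_pitch(target, past, k_min, k_max, resolution):
    """Return (k, t) maximizing the normalized cross-correlation between the
    target and the past excitation delayed by k + t/resolution samples."""
    best_score, best_k, best_t = -np.inf, k_min, 0
    for k in range(k_min, k_max + 1):
        for t in range(resolution):
            v = delayed_excitation(past, k + t / resolution, len(target))
            energy = np.dot(v, v)
            if energy <= 0.0:
                continue
            score = np.dot(target, v) / np.sqrt(energy)
            if score > best_score:
                best_score, best_k, best_t = score, k, t
    return best_k, best_t


# Synthetic check: a harmonic signal whose period is 50.5 samples.
n = np.arange(440)
s = sum(np.sin(2 * np.pi * h * n / 50.5) / h for h in range(1, 7))
past, target = s[:400], s[400:]
print(search_fractional_pitch(target, past, 48, 53, resolution=2))
```

With `resolution=2` the search can land on the true half-sample delay, whereas an integer-only search (`resolution=1`) must settle for a neighbouring integer delay.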

RESULTS
The coding quality of the MP-CELP speech coder with the HPDR technique was evaluated subjectively using 36 test sentences from 16 men and 16 women. The Hamming window width was varied from 5 to 37 samples. The sign "-" in Table 1 denotes the conventional coder without the HPDR (1/2) technique.

DISCUSSION
From Table 1, it can be seen that the coding quality increases as the Hamming window width is increased. The coder with the HPDR technique gives a higher score than the conventional coder at all core bitrates. Moreover, at the same Hamming window width, the MP-CELP speech coder gives the highest score at the core bitrate of 12200 bps and the lowest score at the core bitrate of 5600 bps.

CONCLUSION
This study presents the HPDR technique to improve the coding quality for tonal languages such as Thai. The core coder is based on the MP-CELP speech coder. High pitch delay resolutions are applied to the adaptive codebook of the core coder to improve tonal speech quality. The results show that the coding quality of the proposed coder is better than that of the conventional coder for the Thai language.