Prosody Modification of Standard Arabic Speech Using Combining Synchronous Overlap and Add With Fixed-Synthesis Algorithm and Multi Level Discrete Wavelet Transform

Ykhlef Faycal; Bensebti Mesaoud; Bendaouia Lotfi

doi:10.3844/jcssp.2010.392.405

Research Article Open Access

Abstract

Problem statement: The objective of prosody modification is to change the amplitude, duration and pitch (F₀) of speech segments without altering their spectral envelope. Applications are numerous and include Text-To-Speech synthesis, transformation of voice characteristics and foreign language learning. Several approaches have been developed in the literature to achieve this goal. Their main restrictions concern the modification range, the quality of the synthesized speech and the naturalness of the spoken language. Recent research provides evidence that the first formant (F₁) and F₀ are dependent, suggesting that, in order to preserve the quality and naturalness of the speech signal, any change to one of these parameters must be accompanied by a suitable modification of the other.

Approach: This study introduced a prosody modification method that combines the Synchronous Overlap and Add with Fixed-Synthesis (SOLAFS) algorithm with a multi-level decomposition based on the Discrete Wavelet Transform (DWT) to overcome the limitations cited above. It used Standard Arabic (SA) sounds. For the purpose of comparison, two techniques based on frame-by-frame processing were proposed. The first consists of pitch-synchronous processing of the m-th approximation level time segments used in the SOLAFS algorithm. It aimed to modify the prosody of the input speech without affecting the spectral envelope. The second exploits the correlation between F₁ and F₀ in the corresponding approximation level of SA sounds and modifies the duration together with both the F₀ and F₁ scales. It was based on a re-sampling method using FFT interpolation. The multi-level analysis was intended to provide independent control over the spectral envelope. In both techniques, the decomposition level depends on the chosen sampling frequency (F_S). F₀ marking was based on multi-level peak comparison. Both techniques use an automatic speech classification algorithm based on a modified version of the Johnson algorithm.

Results: The performance of the proposed techniques was evaluated by listening tests using SA sentences sampled at an F_S of 16 kHz. Manipulating F₀ in conjunction with the local F₁ in the third approximation level significantly improved the naturalness of the modified speech compared to classical prosody modification.

Conclusion: This improvement was most suitable for high F₀ scales, since speakers generally increase F₁ as they increase their F₀. Further, the technique can be used to manipulate the remaining formant structure.
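The dependence of the decomposition level on F_S can be illustrated with the nominal band edges of a dyadic DWT, where the level-m approximation covers roughly 0 to F_S / 2^(m+1) Hz. The sketch below is not the authors' implementation: the function names are hypothetical, the Haar wavelet is an assumption chosen only to keep the example dependency-free (the abstract does not name the wavelet used), and real wavelet filters do not have such sharp band edges.

```python
import math

def approximation_band_hz(fs_hz, level):
    """Nominal upper edge (Hz) of the level-`level` approximation band
    of a dyadic DWT: each level halves the retained bandwidth."""
    return fs_hz / 2 ** (level + 1)

def haar_approximation(signal, level):
    """Level-`level` Haar approximation coefficients.
    Haar is an illustrative assumption, not the wavelet from the paper."""
    approx = list(signal)
    for _ in range(level):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        # Haar low-pass filter followed by downsampling by 2
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2)
                  for i in range(0, len(approx), 2)]
    return approx

fs = 16000  # sampling frequency used in the listening tests (16 kHz)
for m in range(1, 5):
    print(f"level {m}: approximation band 0 to {approximation_band_hz(fs, m):.0f} Hz")

# 20 ms frame of a 150 Hz tone (320 samples at 16 kHz), a stand-in for a voiced frame
frame = [math.sin(2 * math.pi * 150 * n / fs) for n in range(320)]
a3 = haar_approximation(frame, 3)
print(len(a3))  # 320 samples reduced by a factor of 2**3
```

Under these assumptions, the third approximation level at 16 kHz nominally spans 0 to 1000 Hz, a range that contains F₀ and, for most vowels, F₁. This is consistent with the abstract's choice of the third level for joint F₀ and F₁ manipulation, although the paper itself should be consulted for the actual selection rule.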

Journal of Computer Science

Volume 6 No. 4, 2010, 392-405

DOI: https://doi.org/10.3844/jcssp.2010.392.405

Submitted On: 22 February 2010 Published On: 30 April 2010

How to Cite: Faycal, Y., Mesaoud, B. & Lotfi, B. (2010). Prosody Modification of Standard Arabic Speech Using Combining Synchronous Overlap and Add With Fixed-Synthesis Algorithm and Multi Level Discrete Wavelet Transform. Journal of Computer Science, 6(4), 392-405. https://doi.org/10.3844/jcssp.2010.392.405

Copyright: © 2010 Ykhlef Faycal, Bensebti Mesaoud and Bendaouia Lotfi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Prosody modification
SOLAFS
PSOLA
intelligibility
naturalness
distortion