Extraction of Arabic Standard Micromelody

Problem statement: In the early days of speech synthesis research, the obvious focus of attention was intelligibility. Many researchers now agree, however, that the major remaining obstacle to fully acceptable synthetic speech is that it is still insufficiently natural. Approach: In this study, we exploited microvariations of the fundamental frequency (F0) of speech (intrinsic and co-intrinsic effects) to extract the micromelody effect in Standard Arabic, with a view to improving speech synthesis systems. Results: We found that a micromelody effect exists for Arabic voiced consonants and that it can apparently be included in a prosody generation unit by means of a simple model. Conclusion: This preliminary result needs to be tested on larger corpora and must be followed by the incorporation of a microprosodic model of duration and intensity.


INTRODUCTION
The analysis of prosody is important [1] in speech synthesis because it gives us the basis for marking prosodic effects in our utterance plans (phonological prosodic processing) and, later, for arriving at suitable rendering strategies for the marked prosody (phonetic prosodic processing).
From the acoustic point of view, prosody refers to the phenomena linked to the variation over time of three parameters: pitch, intensity and duration. The perception of pitch is essentially related to the fundamental frequency which, at the physiological level of speech production, corresponds to the frequency of vibration of the vocal cords. Intensity is essentially connected to the energy of the sound, while acoustic duration corresponds to the time taken to produce it [2]. These three parameters combine in uneven proportions to give every language its particular prosodic characteristics.
Prosody is often defined on two different levels [3]:
• An abstract, phonological level (phrase, accent and tone structure)
• A physical, phonetic level (fundamental frequency, intensity and duration)
F0 variations can be considered as the superposition of two phenomena: macroprosodic effects, which reflect the speaker's intonational choices, and microprosodic effects, which are linked to the phonetic constituents of the sentence. Macroprosody provides a global view of the melodic curve, whereas microprosody accounts for its local variations.
Microprosody is thus defined as the intrinsic and co-intrinsic influences on fundamental frequency (F0), duration and intensity that are due to segment identity and phonetic context [3].
Although synthetic speech has been fully intelligible from a segmental perspective for several years, there are areas of naturalness which still await satisfactory implementation [4]. High-quality text-to-speech synthesis systems require accurate prosody labels to generate natural-sounding speech. In these systems, prosody is assigned on the basis of information extracted from the text.
In order to reinforce the existing synthesis and recognition systems for Standard Arabic (SA), we propose in this study a new approach to determining the micromelody effect in Standard Arabic. It uses our own scripts in Praat [5], the program for speech analysis and synthesis, together with the results obtained from the Praat script for MOMEL (MELodic Modelisation) [6]. The results show that Arabic micromelody can be extracted and that its effect can easily be included in a prosody generation block.
Momel algorithm: MOMEL [6] was developed to allow automatic modelling of a raw fundamental frequency curve as a sequence of target points that define a quadratic spline function.
Momel modelling: MOMEL models the curves automatically in 4 stages [7]:
• Pre-processing of F0
• Estimation of target candidates
• Partition of candidates
• Reduction of candidates
In short, pre-processing of F0 consists of reassigning the values that exceed a given ratio (out-of-pitch values are set to 0). Estimation of target candidates eliminates out-of-range values and labels the target values. Partition of the target candidates is done by splitting a 200 ms window into two halves and comparing the mean values of the two halves. Reduction eliminates outlying candidate values, and the remaining targets are then recalculated to give the final target of the segment. At the end of the MOMEL modelling, a quadratic spline function is used to give a close fit to the original curve [8].
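The final step, fitting a quadratic spline through the target points, can be illustrated with a minimal sketch. It assumes the usual MOMEL-style construction, in which two consecutive targets are linked by two parabolas that meet at the midpoint and have zero slope at each target. The function name, the target values and the choice to hold the curve constant outside the first and last targets are assumptions made for illustration; this is not the Praat implementation used in the study.

```python
def quadratic_spline(targets, t):
    """Evaluate a MOMEL-style quadratic spline at time t.

    targets: list of (time, f0) pairs sorted by time (hypothetical values).
    Between two consecutive targets the curve is made of two parabolas
    joined at the midpoint, each with zero slope at its own target.
    """
    # Assumption: hold the curve constant outside the range of the targets.
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
        if t1 <= t <= t2:
            span = t2 - t1
            if t <= (t1 + t2) / 2:
                # First parabola: leaves target 1 with zero slope.
                return h1 + 2 * (h2 - h1) * ((t - t1) / span) ** 2
            # Second parabola: reaches target 2 with zero slope.
            return h2 - 2 * (h2 - h1) * ((t2 - t) / span) ** 2
    raise ValueError("targets must be sorted by time")


# Illustrative targets: (time in seconds, F0 in Hz).
example_targets = [(0.0, 100.0), (1.0, 120.0), (2.0, 90.0)]
curve = [quadratic_spline(example_targets, i / 10) for i in range(21)]
```

The two parabolas share the same value and the same slope at the midpoint, so the resulting curve is smooth, which is what makes it a suitable representation of the macromelodic component.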

Corpus:
One native speaker of Arabic pronounced 16 sentences that together include all Arabic phonemes. The recording was made in an anechoic recording chamber at the Laboratoire Parole et Langage (LPL) in Aix-en-Provence. The Praat computer program [5] was then used to analyse and manipulate the speech data. The sentences were then segmented and aligned semi-automatically into phonemes and, finally, a phonetic transcription was made.

Method of Extraction of Micromelody Effect (EME):
Our approach consisted of the following stages:
Stage 1: We run the MOMEL algorithm on our corpus in order to extract the corresponding melodic curves.
Stage 2: We then manually correct the melodic curves modelled by MOMEL.
Stage 3: We then run our first Praat script, which determines the microprosodic profile being sought.
Stage 4: Once the melodic data representing the microprosodic effect have been calculated, we run our second Praat script, which we developed to model the microprosodic profile of every consonant used. To model the evolution of the microprosodic effect of a consonant precisely, we adopted the following approach: since each consonant lies between two vowels, we measure the microprosodic effect on F0 at the points shown in Fig. 1.
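The measurement described in Stage 4 can be sketched as follows. The sketch is written in Python rather than as a Praat script, and it rests on several assumptions: the microprosodic profile is taken to be the ratio of the raw F0 to the MOMEL-modelled F0 (as defined in the Results), S and E are taken to be the start and end of the consonant, M its midpoint, and the 7 measurement points are taken to be S − ∆, S, two intermediate points, M, E and E + ∆. The exact positions of the points are given in Fig. 1 and may differ; all function and parameter names are hypothetical.

```python
def sample_points(start, end, delta):
    """Return 7 hypothetical measurement times around a consonant.

    Assumed layout: S - delta, S, (S+M)/2, M, (M+E)/2, E, E + delta,
    where S = start, E = end and M is the midpoint of the consonant.
    """
    mid = (start + end) / 2
    return [start - delta, start, (start + mid) / 2, mid,
            (mid + end) / 2, end, end + delta]


def microprosodic_profile(raw_f0, model_f0, start, end, delta):
    """Ratio raw F0 / modelled F0 at each measurement point.

    raw_f0, model_f0: functions mapping a time (s) to an F0 value (Hz);
    a value of 0 or None is treated as unvoiced and yields None.
    """
    profile = []
    for t in sample_points(start, end, delta):
        raw, model = raw_f0(t), model_f0(t)
        profile.append(raw / model if raw and model else None)
    return profile
```

Working with a ratio rather than a difference in Hz keeps the profile independent of the speaker's pitch range, so that profiles from different occurrences of the same consonant can be compared directly.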

RESULTS
Since each consonant can appear several times in the corpus, we chose to calculate the median value at each of the 7 points of the microprosodic profile. This choice is justified by the fact that, unlike the arithmetic mean, which is a measure of magnitude, the median is a measure of position and is not influenced by extreme values, whether very large or very small.
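The point-wise median can be sketched in a few lines of Python. The profiles below are invented values used only to show the behaviour of the median in the presence of an outlier; they are not measurements from the corpus, and the function name is hypothetical.

```python
from statistics import mean, median


def median_profile(profiles):
    """Point-wise median over several profiles of equal length.

    profiles: one list of ratios per occurrence of the consonant;
    None marks a point where no ratio could be measured.
    """
    result = []
    for values in zip(*profiles):
        measured = [v for v in values if v is not None]
        result.append(median(measured) if measured else None)
    return result


# Invented profiles (3 points only, for brevity); the last value of the
# third occurrence is a deliberate outlier, e.g. a pitch detection error.
occurrences = [
    [1.0, 0.98, 0.96],
    [1.0, 0.97, 0.95],
    [1.0, 0.99, 0.50],
]
medians = median_profile(occurrences)
last_point_mean = mean(row[-1] for row in occurrences)
```

In this invented example the median at the last point stays at 0.95, whereas the arithmetic mean is pulled down to about 0.80 by the single outlier.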
Once the medians have been calculated, we plot the corresponding profile (the two extreme values, S − ∆ and E + ∆, are not taken into account). Figure 2 shows the evolution of the microprosodic profile of the phoneme [b] (a case with 6 possible result values), where the x-axis represents the selected points and the y-axis represents the mpp, i.e. the ratio microprosody = F0 / Momel_F0. We note that all the curves generally follow the same trajectory. Figure 3 shows the median values obtained for the microprosodic evolution of the phoneme [b]:
• Although the micromelodic curve does vary, the variation is very small (about 0.045)
• This variation is always at its maximum at the M point (for most of the voiced consonants studied)

DISCUSSION
Calculating the global median value for each of the voiced Arabic consonants studied gives Table 1.
Several observations can be made from Table 1:
• The calculated median values lie between 0.81 and 1.00, which implies that the F0 modelled by MOMEL is very close to the real F0 value
• The microprosodic effect is almost non-existent for nasals, semivowels and liquid consonants
• The microprosodic effect is more evident for fricatives than for occlusives
• The microprosodic effect certainly exists, but it is very weak, so no complex mathematical expression is required to model its variation
• These results support the view of several researchers that the micromelodic effect can safely be neglected without affecting the quality of the resulting synthetic speech
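Because the effect is weak, one very simple way to model it is a single multiplicative factor applied to the macromelodic curve over the duration of each consonant. The sketch below illustrates this idea under stated assumptions: the curve is given as sampled (time, Hz) pairs, the segmentation as (start, end, label) triples, and the factor 0.95 for [b] is a hypothetical value chosen for illustration, not a figure taken from Table 1.

```python
def apply_micromelody(model_f0, segments, factors):
    """Scale the macromelodic curve inside each consonant.

    model_f0: list of (time, Hz) samples of the modelled F0 curve
    segments: list of (start, end, label) consonant intervals
    factors:  dict mapping a consonant label to a multiplicative factor;
              consonants without an entry are left unchanged (factor 1.0)
    """
    scaled = []
    for t, hz in model_f0:
        factor = 1.0
        for start, end, label in segments:
            if start <= t <= end:
                factor = factors.get(label, 1.0)
                break
        scaled.append((t, hz * factor))
    return scaled


# Hypothetical data: a flat 100 Hz curve with one [b] from 0.05 s to 0.15 s.
curve = [(0.0, 100.0), (0.1, 100.0), (0.2, 100.0)]
lowered = apply_micromelody(curve, [(0.05, 0.15, "b")], {"b": 0.95})
```

A constant factor per consonant is the simplest possible choice; a profile that varies across the consonant could be substituted without changing the structure of the function.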

CONCLUSION
In this study, we presented a new method that makes it possible to extract, as automatically as possible, the micromelodic information from the speech signal, using the original fundamental frequency curve and its macromelodic curve obtained with the MOMEL algorithm.
The results obtained reinforce the idea that the microprosodic effect does in fact exist. However, the variance analysis leads us to propose simply a single additional relative lowering of the macromelodic curve as the simplest way of improving the naturalness of the synthesized voice.
Complementary studies of the microprosodic effects on duration and energy are nevertheless needed to complete our analysis.