CASRA+: A Colloquial Arabic Speech Recognition Application
Abstract
This research proposes an Arabic speech recognition application that concentrates on the Lebanese dialect. The system starts by sampling the speech, that is, converting the sound from analog to digital, and then extracts features using Mel-Frequency Cepstral Coefficients (MFCC). The extracted features are then compared with the system's stored reference model, which in this case is phoneme-based. This differs from direct word template matching, where the features extracted from the input are compared directly against word templates. In the direct matching model, each word template is stored as a vector of feature parameters, so the memory needed for the templates becomes very large as the vocabulary of the ASR system grows. In contrast, the model used here is phoneme-like template matching: word templates are stored as phoneme-like template parameters, so the memory needed for the word templates grows more slowly than in the direct matching model.
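The memory argument above can be illustrated with a rough back-of-the-envelope sketch. Every constant below (frames per word, number of coefficients, size of the phoneme-like inventory, and so on) is an assumed value chosen for illustration and is not taken from the paper; the function names are likewise hypothetical.

```python
# Illustrative only: all sizes are assumed values, not figures from the paper.
FRAMES_PER_WORD = 50      # assumed average number of feature frames per word
FRAMES_PER_PHONEME = 10   # assumed frames per phoneme-like template
COEFFS_PER_FRAME = 13     # assumed number of MFCC coefficients per frame
BYTES_PER_COEFF = 4       # one 32-bit float per coefficient
PHONEME_INVENTORY = 40    # assumed number of phoneme-like units
PHONEMES_PER_WORD = 5     # assumed average word length in phoneme-like units
BYTES_PER_INDEX = 1       # one byte can index up to 256 units


def direct_template_bytes(vocab_size: int) -> int:
    """Direct matching: every word stores its own full feature sequence."""
    return vocab_size * FRAMES_PER_WORD * COEFFS_PER_FRAME * BYTES_PER_COEFF


def phoneme_template_bytes(vocab_size: int) -> int:
    """Phoneme-like matching: features are stored once per unit, and each
    word stores only a short sequence of unit indices."""
    inventory = (PHONEME_INVENTORY * FRAMES_PER_PHONEME
                 * COEFFS_PER_FRAME * BYTES_PER_COEFF)
    lexicon = vocab_size * PHONEMES_PER_WORD * BYTES_PER_INDEX
    return inventory + lexicon


for vocab in (10, 1_000, 100_000):
    print(vocab, direct_template_bytes(vocab), phoneme_template_bytes(vocab))
```

Under these assumptions the phoneme-like model pays a fixed cost for the unit inventory, so it can be larger for a very small vocabulary, but its per-word cost is a few bytes rather than a full feature sequence.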
DOI: https://doi.org/10.3844/ajassp.2007.23.32
Copyright: © 2007 Ramzi A. Haraty and Omar El Ariss. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Arabic language
- speech recognition
- template matching