Summary: | In this paper speech theories and some methodological concerns about feature extraction and classification techniques widely used in speech recognition system are surveyed and discussed. The shortage of isolated word speech recognition is addressed as compared to its phoneme-based counterpart. This paper could be regarded as a very early stage towards methodology establishment in searching for better accuracy and less complexity system which has more generic properties. It is hoped that the system can classify speech regardless of the varieties across languages or accents. Speaker independency (SI) manner speech recognition system is required for this application and in fact, in many other potential applications as much as a telephonic network (large database consists of many different speakers) is a primary requirement. Isolated-word ASR for fixed vocabularies has been successfully implemented using HMM, ANN and SVM but suffers from lack of adaptability to other languages and increase in complexity as number of vocabularies increases. Conversely, phonemes, the smallest unit of human speech sounds are apparently more feasible to represent the basic building block for cross-language mapping. In fact, the phonetic transcription systems such as IPA and SAMPA are widely recognized and standardized for several languages in the world. This paper intends to investigate the phoneme-based potential as language independent phonetic units to overcome the lack of available training data so as to achieve a more generic speech recognizer. © 2011 IEEE.
|