Azerbaijan embarks on speech recognition project

12 February 2009 10:34 (UTC +04:00)

Baku. Aynur Veliyeva – APA-Economics. The first Azeri-language speech recognition system is being developed in Azerbaijan as part of the Dilmanc (Translator) Project, a joint effort of the UNDP Office in Baku and Azerbaijan’s Ministry of Communications and Information Technologies.
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a sequence of words. The recognized words can be the final result, as in applications such as command and control, data entry, and document preparation. The technology also makes automatic speech-to-speech translation possible.
The project runs from 2009 to 2012. English-Azeri and Azeri-English options will be created first, and other languages may be added in the future if there is demand.
Project officer Rauf Fatullayev said the first version of the software is slated for presentation at the end of this year.
According to experts, speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear.
Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker’s physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variabilities.
Word-level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations, as well as the effects of dialect and accent, are handled by letting search algorithms find alternate paths of phonemes through these networks.
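As an illustration of the idea, a pronunciation network can be sketched as a small graph whose edges are labeled with phonemes. The example below is only a toy under stated assumptions: the English word "tomato", the phoneme labels, and the names `TOMATO`, `pronunciations` and `accepts` are hypothetical and are not taken from the Dilmanc project.

```python
# Toy pronunciation network for the English word "tomato" (illustrative only).
# Each node maps to a list of (phoneme, next_node) edges; node 6 is the end.
# The phoneme labels are assumptions chosen for this sketch.
TOMATO = {
    0: [("t", 1)],
    1: [("ah", 2)],
    2: [("m", 3)],
    3: [("ey", 4), ("aa", 4)],  # two alternate pronunciations of the vowel
    4: [("t", 5)],
    5: [("ow", 6)],
    6: [],
}


def pronunciations(network, node, final):
    """Yield every phoneme path from `node` to `final`."""
    if node == final:
        yield []
        return
    for phoneme, next_node in network[node]:
        for rest in pronunciations(network, next_node, final):
            yield [phoneme] + rest


def accepts(network, phonemes, start, final):
    """Return True if `phonemes` traces some path from `start` to `final`."""
    nodes = {start}
    for phoneme in phonemes:
        nodes = {
            next_node
            for node in nodes
            for label, next_node in network[node]
            if label == phoneme
        }
        if not nodes:
            return False
    return final in nodes


if __name__ == "__main__":
    for path in pronunciations(TOMATO, 0, 6):
        print(" ".join(path))
    print(accepts(TOMATO, ["t", "ah", "m", "aa", "t", "ow"], 0, 6))
```

A real recognizer would attach probabilities to the edges and search the network together with the acoustic model; this sketch only shows how alternate paths encode alternate pronunciations.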
In addition, the grammatical complexity of the Azerbaijani language, in particular its multiple affixation, makes developing a speech recognition system more difficult.
“Automatic speech recognition, as a tool for converting spoken words into machine-readable input, is useful in automatic translation, health care, the military, aviation, telephony, traffic control and navigation systems, industry, smart homes, foreign language learning, computer use and other domains. It will also allow household appliances to be controlled hands-free,” said Rauf Fatullayev.
Modern general-purpose speech recognition systems are typically based on hidden Markov models (HMMs). These are statistical models that output a sequence of symbols or quantities.
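To make the idea concrete, the sketch below runs the Viterbi algorithm, one standard way of finding the most likely hidden state sequence of an HMM, on a two-state toy model. Everything in it is an assumption made for illustration: the state names, the observation labels and all probabilities are invented, and real recognizers use far larger models over acoustic feature vectors.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that path came from)
    first = observations[0]
    best = [{s: (start_p[s] * emit_p[s][first], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Trace the back-pointers from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical toy model: hidden sound classes emitting coarse acoustic labels.
STATES = ("vowel", "consonant")
START_P = {"vowel": 0.6, "consonant": 0.4}
TRANS_P = {
    "vowel": {"vowel": 0.7, "consonant": 0.3},
    "consonant": {"vowel": 0.4, "consonant": 0.6},
}
EMIT_P = {
    "vowel": {"voiced": 0.5, "mixed": 0.4, "noisy": 0.1},
    "consonant": {"voiced": 0.1, "mixed": 0.3, "noisy": 0.6},
}

if __name__ == "__main__":
    path, prob = viterbi(["voiced", "mixed", "noisy"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)  # ['vowel', 'vowel', 'consonant'], about 0.01512
```

Working in plain probabilities keeps the example short; practical systems use log probabilities to avoid numerical underflow on long utterances.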
Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach.
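For comparison, dynamic time warping can be sketched in a few lines. The version below is a minimal illustration, assuming one-dimensional numeric sequences and absolute difference as the local cost; the function name `dtw_distance` is hypothetical, and real systems compare sequences of acoustic feature vectors instead.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same item
                cost[i][j - 1],      # b advances, a stays on the same item
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is the first spoken "more slowly": one value repeats.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Because the alignment may stretch or compress either sequence, two utterances of the same word at different speaking rates can still match closely, which is why the technique was attractive for early recognizers.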