As of last night I finished up a submission to Eurospeech 2005 with Abhinav Sethy.
I’ve posted it here:
Modeling and Automating Detection of Errors in Arabic Language Learner Speech.
In a nutshell: Understanding bad-accent/bad-grammar learner speech is hard for humans. And what’s hard for humans is even harder for machines. Compound the relative lack of speech data with an exponentially explosive number of ways a learner can mis-speak a sentence, and it feels like an impossible task. Through tricky statistical natural language processing techniques and smart Automatic Speech Recognition, we do our best to face the problem.
This paper is basically an abridged summary of my research up to this point for ISI‘s Tactical Language Project, concentrating on the Speech Recognition aspect of things. It’s also the first time we’ve posted actual results of system accuracy. I must say that, upon running the formal tests this last week, I was surprised (in a good way) by the accuracy numbers that we were able to get!
(aside: For more information on the TactLang project, here is a good intro news story.)