Modeling Second Language Learner Speech

A week from now I’m giving a talk on creating a language model for second-language-learner speech (basically, my PhD research up to this point, and what will eventually become my thesis).

Information:

Speaker: Nick Mote
Date: 10 Dec 04
Time: 3:00pm – 4:30pm
Location: Information Sciences Institute (Marina Del Rey, California)

Abstract:

ISI’s Tactical Language Project is a system designed to teach Americans how to speak Arabic through a video game environment. We’ve taken an FPS engine (Unreal 2003), added skins and maps so it looks like you’re in a typical Lebanese village, taken away the guns, added speech recognition, and set the player in the middle of it all. The theory is that if you learn well in a classroom, you’ll perform well in a classroom–but if you learn well in a pseudo-naturalistic environment, you’ll perform better in real life. My research comes into play because speech recognition is a hard problem–especially when you’re trying to understand language-learner speech, with all of its mispronunciations, disfluencies, and grammatical errors. Understanding that speech is hopeless unless you have a good approximation of what kinds of mistakes learners make, and can anticipate them.

Say an English learner says “Water”. Is he asking you for water? Is he telling you there’s a puddle in front of you? Is he saying his name is “Walter”, but mispronouncing it? There’s a lot of ambiguity involved. To disambiguate, we need to consider the context, the learner’s past language performance, and how the learner’s mother language relates to English, in order to guess what he is actually trying to say.
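One way to picture this kind of disambiguation is as weighing an acoustic match against a context-dependent prior. The sketch below is purely illustrative–the hypothesis names, probabilities, and function are my own invention for this example, not the actual Tactical Language model:

```python
# Illustrative sketch: ranking candidate interpretations of an ambiguous
# learner utterance ("water" vs. "Walter") by combining an acoustic-match
# likelihood with a context-dependent prior. All names and numbers are
# made up for illustration.

def rank_hypotheses(hypotheses, acoustic_likelihood, context_prior):
    """Rank hypotheses by P(acoustics | hyp) * P(hyp | context), normalized."""
    scored = {h: acoustic_likelihood[h] * context_prior[h] for h in hypotheses}
    total = sum(scored.values())
    return sorted(
        ((h, s / total) for h, s in scored.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

hypotheses = ["request_water", "warn_puddle", "introduce_walter"]

# "Water" and "Walter" sound alike, so acoustics alone can't decide.
acoustic = {"request_water": 0.5, "warn_puddle": 0.5, "introduce_walter": 0.3}

# But if the learner was just asked "What is your name?", context strongly
# favors the mispronounced-name reading.
context = {"request_water": 0.10, "warn_puddle": 0.05, "introduce_walter": 0.85}

ranked = rank_hypotheses(hypotheses, acoustic, context)
# Top hypothesis: introduce_walter
```

The point of the toy numbers: acoustics alone is a tie, and only the contextual prior breaks it in favor of the “Walter” reading.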

And then, of course, once we have a good guess at what the learner has said, what do we do about it? How do we correct him? How serious are different speech errors in terms of native listener comprehension, pedagogical objectives, and social politeness? (The Lebanese word ra’iib (sergeant) is dangerously close to the word rahiib (terrible); we want to take special corrective care to make sure learners don’t make errors like these.) And how do we compensate for poorly-performing speech recognition (ASR works great with a lot of data, but there’s not much annotated data of Americans learning specific subdialects of Arabic)?
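Pairs like ra’iib/rahiib can be caught mechanically: if two lexicon entries differ by only an edit or two at the phoneme level, a learner’s substitution error can flip one word into the other. A minimal sketch of that idea, with rough romanized phoneme spellings rather than any real project lexicon:

```python
# Illustrative sketch: flagging word pairs that are "dangerously close"
# in phoneme space, where one substitution error turns one word into
# another. Phoneme spellings are rough romanizations, not a real lexicon.

def edit_distance(a, b):
    """Classic Levenshtein distance between two phoneme sequences,
    using a single rolling row of the DP table."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,          # deletion
                dp[j - 1] + 1,      # insertion
                prev + (pa != pb),  # substitution (free if phonemes match)
            )
    return dp[len(b)]

def confusable(word_a, word_b, threshold=1):
    """True if the two phoneme sequences are within `threshold` edits."""
    return edit_distance(word_a, word_b) <= threshold

# ra'iib (sergeant) vs. rahiib (terrible): one phoneme substitution apart.
sergeant = ["r", "a", "'", "i", "i", "b"]
terrible = ["r", "a", "h", "i", "i", "b"]
```

A real system would weight substitutions by phonetic similarity and by which confusions a given learner population actually makes, rather than counting all edits equally.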

This is basically what I’m doing. I use a lot of Natural Language Processing–primarily statistical NLP, with a bit of pedagogy theory and linguistic theory (second language acquisition and phonology) sprinkled in.

Let me know if you want to come and I can give you more details. I’ll also put my presentation slides up here (once I’m finished writing them).