I spent last week at the CALICO symposium. The attendees were an interesting mix of language teachers, managers of university language-learning computer labs, linguists, and a few Computer Science/Linguistics hybrid folk like me, interested in the intersection of Natural Language Processing and AI-driven computer pedagogy.
The signal-to-noise ratio would have been horrendously low were it not for the handy “NLP in CALL” preconference workshop; it made it easy to see early on who had research similar to mine.
The biggest surprise for me at the workshop was that I recognized almost no one there. Now, I’m not an academic hermit by any means, but one of my biggest take-home lessons from this conference was not technical but meta: AI in CALL is a horribly fragmented field. Viz.:
- On one hand, you have the ASR-heavy research groups that focus on modeling learner pronunciation: Project LISTEN, SRI’s Eduspeak, John Morgan @ USMA. They publish at conferences like EuroSpeech.
- On another hand, you have the AI in Education groups, which treat CALL as a subset of general computer-aided pedagogy. AIEd is a good example of this.
- On the third hand, you have these folks from CALICO, a tight-knit group of language teachers and linguists (mostly German, it seems), interested in the practical business of building tools that automatically assess student input.
I am still constantly surprised by the lack of NLP in the CALL field. Even CALICO researchers seem to be stuck in the 90s, with rule-based rather than statistical approaches. Or perhaps there are facets of learner speech (strong language interference effects, small corpora) that don’t lend themselves as well to purely statistical solutions.
As an aside, I should note the conference was held in Honolulu, Hawaii. (Yes, yes: as one of my fellow conference-goers said, “it’s funny, people have such different preconceptions when you tell them you’re going to a conference in Hawaii, as opposed to, say, Detroit.”) I’ve put up a bunch of pictures from this year’s CALICO on my Flickr account.
I’ll be giving a presentation next Friday at 3pm, as part of the Natural Language Processing seminar series, addressing some of my work with the Tactical Language project.
All are welcome to attend.
This talk will be a preview of the work I’ll be presenting at CALICO this year.
----
Time: 3-4pm, Friday 2006/05/12
Location: 11-CR, ISI
Title: Pedagogical Contextualization of Language Learner Speech Errors
Abstract:
The traditional approach to diagnosing learner speech errors in Computer Aided Language Learning is to create a linguistic profile of the learner/user. We, however, propose that work must also be done to model the linguistic profile of a typical native listener.
Not all errors in second language learner speech are created equal. Some errors sound more “severe” or “harsh” to native speaker ears than others, and should therefore receive more emphasis in pedagogical interaction.
The Tactical Language Training System (TLTS) is a speech-enabled, virtual-reality-based computer learning environment designed to teach spoken Arabic communication to American English speakers. This talk addresses the ways the TLTS contextualizes non-native speech errors, and how this contextualization fits into the corrective exchanges between a non-native learner and a pedagogical agent built to model a native listener.
The pedagogical system used in TLTS includes:
- Automatic Speech Recognition (ASR) models built on a combination of annotated and unannotated non-native speech together with native speech data.
- A stochastic generative model for errors in learner speech that creates mispronunciation grammars for the ASR.
- Reweighting of system-perceived mispronunciation severity based on aggregate native speaker judgements of pronunciation quality and intelligibility.
- Contextualization of feedback based on lexical and phonetic inventories of the native and non-native languages.
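To give a feel for what “a stochastic generative model that creates mispronunciation grammars” can mean, here is a minimal sketch. It is not the TLTS implementation: the confusion table, its probabilities, and the phoneme symbols (`H` for the Arabic pharyngeal fricative, `q` for the uvular stop) are all hypothetical, and it assumes each phoneme is realized independently. It enumerates weighted pronunciation variants of a word and writes them out as a JSGF-style weighted alternation.

```python
from itertools import product

# Hypothetical confusion table: canonical phoneme -> [(realized phoneme, probability)].
# The numbers are invented for illustration; they are not TLTS data.
CONFUSIONS = {
    "H": [("H", 0.6), ("h", 0.4)],  # pharyngeal fricative often realized as English /h/
    "q": [("q", 0.7), ("k", 0.3)],  # uvular stop often realized as /k/
}


def variants(phonemes, min_prob=0.05):
    """Enumerate pronunciation variants of a phoneme sequence with probabilities.

    Each phoneme is rewritten independently using CONFUSIONS; phonemes with no
    entry are kept unchanged with probability 1. Variants whose probability
    falls below min_prob are pruned. Results are sorted most likely first.
    """
    options = [CONFUSIONS.get(p, [(p, 1.0)]) for p in phonemes]
    results = []
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        if prob >= min_prob:
            results.append((" ".join(ph for ph, _ in combo), prob))
    results.sort(key=lambda item: -item[1])
    return results


def to_grammar_rule(word, phonemes, min_prob=0.05):
    """Render the variants as a JSGF-style rule with weights before each alternative."""
    alts = " | ".join(f"/{p:.2f}/ {v}" for v, p in variants(phonemes, min_prob))
    return f"<{word}> = {alts};"


print(to_grammar_rule("marhaba", ["m", "a", "r", "H", "a", "b", "a"]))
```

The independence assumption keeps the sketch short; a real error model would likely condition on phonetic context and on the learner’s native language.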
Ah. Just submitted a summer research proposal last night and my MS thesis this morning (you can read the MS thesis here). Now all I have left is to finish up the slides for CALICO and to wrap up my PhD thesis proposal. Busy, busy, busy.
Huh. Leonard got filelight working on his Mac. Not wanting to deal with the headache of fink unstable, I searched around for another solution and found Disk Inventory X. DIX isn’t bad by any means. It reminds me of SequoiaView, a favorite app of mine from my Windows days. DIX’s UI is a bit slow (if, by “a bit”, you mean “glacially”), but its sorted metadata display is really neat (i.e. it shows how much space is taken up not only by directory path, but also by filetype).
I’d been suffering for space on my iBook’s too-small hard drive, and the results of the scan were surprising: a 1.5 GB iDVD.app? Deleted. Another 1.2 GB encyclopedia? I haven’t used an encyclopedia in years; that’s what Wikipedia is for. Deleted.
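The by-filetype breakdown is easy to approximate yourself. Here is a minimal Python sketch (my own illustration, not based on DIX’s code) that walks a directory tree and totals file sizes per extension. Note that macOS `.app` bundles are directories, so a walk like this counts their contents under the extensions of the files inside them rather than as one `.app` lump.

```python
import os
from collections import Counter


def usage_by_extension(root):
    """Sum file sizes under root, grouped by lowercase extension ("" for none)."""
    totals = Counter()
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinked files so their targets aren't counted twice
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            ext = os.path.splitext(name)[1].lower()
            totals[ext] += size
    return totals


def report(root, top_n=10):
    """Print the top_n extensions by total size, largest first (MB = 2**20 bytes)."""
    for ext, size in usage_by_extension(root).most_common(top_n):
        print(f"{size / 2**20:10.1f} MB  {ext or '(no extension)'}")
```

Calling `report(os.path.expanduser("~"))` gives a rough, text-only version of the filetype view; it is a lot less pretty than a treemap, but at least it isn’t glacial.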