
Human-Computer Collaborative Approach to Computer Aided Assessment

Mary Wood from the University of Manchester just gave a talk on “A Human-Computer Collaborative Approach to Computer Aided Assessment”.

Abstract:

The ABC (Assess by Computer) system has been developed and used in the School of Computer Science at the University of Manchester for formative and (principally) summative assessment at undergraduate and postgraduate level. We believe that fully automatic marking of constructed answers – especially free text answers – is not a sensible aim. Instead – drawing on parallels in the history of machine translation – we take a “human-computer collaborative” approach, in which the system does what it can to support the efficiency and consistency of the human marker, who keeps the final judgement.

Our current work focuses on what are generally referred to as “short text answers” as contrasted with “essays”. However, we prefer to contrast “factual” with “discursive” answers, and speculate that the former may be amenable to simple statistical techniques, while the latter require more sophisticated natural language analysis. I will show some examples of real exam data and the techniques we are using and developing to handle them.

Interesting. Rather than trying to make computers do the job perfectly, they team up a computer with a human and try to exploit the strengths of each. To encourage adoption in the teaching community they have made the machine aid as subtle as possible, so that it stays in the background and doesn’t “steal control” from the teacher during grading. Most interaction is thus passive (sorting and clustering of student answers, keyword highlighting, etc.).
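That passive support is simple enough to sketch. Here is a hypothetical Python illustration of the sort-by-length and keyword-highlighting idea – the talk showed no code, so the `highlight` helper, the `**` markers, and the sample answers are all my own invention:

```python
import re

def highlight(answer: str, keywords: list[str]) -> str:
    """Wrap exact keyword matches in ** ** so the marker's eye is
    drawn to them; the human still makes the grading decision."""
    for kw in keywords:
        answer = re.sub(rf"\b({re.escape(kw)})\b", r"**\1**",
                        answer, flags=re.IGNORECASE)
    return answer

answers = [
    "You can look things up quickly with hashing.",
    "A hash table gives constant-time lookup on average.",
]

# Sort by length – one of the passive orderings mentioned above.
for a in sorted(answers, key=len):
    print(highlight(a, ["hash table", "lookup", "hashing"]))
```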

Challenges:

  • System must deal with different question types, each requiring a different approach (multiple choice can easily be judged by machine, short answers could plausibly be graded using clustering and keyword matches, and long essays would in theory require natural language understanding techniques).
  • System must deal with dirty data (spelling mistakes, many ways of expressing the same thing, context-specific synonyms, answers that take deep domain knowledge to interpret).

Solutions:

  • Take advantage of the effectiveness of simple tools: use LSA and other techniques for clustering (see the sketch after this list), sort by length, highlight keywords (both exact and fuzzy matches)
  • Let teachers adaptively build domain knowledge on-the-fly
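The talk didn’t specify the exact pipeline, but the “simple tools” point is easy to illustrate. A minimal sketch of LSA-style clustering with scikit-learn (TF-IDF, then truncated SVD, then k-means) – the sample answers and cluster count are invented for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

answers = [
    "A stack is last in, first out.",
    "LIFO: the last item pushed is the first popped.",
    "A queue is first in, first out.",
    "FIFO ordering, like a line at a shop.",
]

X = TfidfVectorizer().fit_transform(answers)          # term-document matrix
lsa = TruncatedSVD(n_components=2).fit_transform(X)   # LSA: SVD of TF-IDF
labels = KMeans(n_clusters=2, n_init=10).fit_predict(lsa)

# Present similar answers together so the marker handles them in one pass.
for label, answer in sorted(zip(labels, answers)):
    print(label, answer)
```

Note the clustering never assigns a mark; it just orders the pile so near-duplicate answers arrive together, which is exactly the division of labour the talk argued for.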

(Fuzzy matching to keywords was done using string edit distance. Naive attempts have been made at a pronunciation edit distance – and what about a keyboard-typo edit distance?)
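String edit distance here is presumably the classic Levenshtein dynamic program. A self-contained sketch (my code, not ABC’s):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

# A misspelled answer can still match the keyword if the distance is small.
print(edit_distance("algorithm", "algoritm"))  # 1 – a single missing letter
```

A keyboard-typo variant would presumably just replace the flat substitution cost of 1 with a cost weighted by key adjacency.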

Benefits:

  • improves grading speed by a factor of 2-3 (pessimistic) or 6-8 (optimistic)
  • anonymizes students’ work
  • allows different “views” on the data (“show me all answers from this student’s test” or “show me all the students’ answers for this question”)
  • as an NLP project, system use, data gathering, and usability improvement form a bootstrapping cycle
  • no noticeable bad effects on grading reliability

Very cool, just a shame that Manchester doesn’t have a bigger NLP group to do work on this.