Live Blogging from CALICO: Using Instant Messaging Bots and Unsupervised Learning Techniques to Promote Vocabulary Acquisition

(updated, or at least made a valiant attempt to do so, on June 3)

J. Scott Payne (Amherst College), Michael Lipschultz (Penn State)

Note: Congrats to Michael, who was just accepted to a PhD program in Computer Science at Pitt!

Issues: Scott talked about the need for longitudinal second language acquisition (SLA) research. We are on the cusp of a very interesting time because we now have the tools and the number-crunching machines that can give us the large quantities of data we want and need. The problem with past SLA studies is that they are descriptive in nature, anecdotal, and based on small samples, which makes their results very difficult to generalize.

The tool: BodoBot
According to Scott, BodoBot is an “SLA tool in drag.”

It aims to pull together the plethora of online language learning resources that our students already use (e.g., machine translation, grammar reference tools, corpora/concordances, Wikipedia) and to chart, chronicle, and monitor how, and how successfully, our students use them.

Given that 46% of 18- to 27-year-olds report using IM more frequently than email (Pew/Internet, 2004), and that IM is the means of communication among students (email is for talking to teachers and parents), the question now is: how can we harness IM for SLA research?

Scott began with an explanation of “bots.” From Wikipedia: “software applications that run automated tasks over the internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human editor alone.” The idea, then, is to use these bots to analyze how students use online resources for language learning.

Kinds of bots that exist now:

Chatterbots (e.g., ELIZA; ALICE, the Artificial Linguistic Internet Computer Entity)

Goal-oriented bots:
Airline reservation systems, for example (limited in what they can do, but more precise)

Goals for the project: develop a research tool prototype that tracks user behavior and IM usage patterns

Features of BodoBot:
–can do basic conversations
–single-word and multiword translations
–will provide some examples of single words in context
–will generate vocab quizzes
–recommends readings from Google News matched to the user’s reported proficiency level
–logs all queries to the bot and buddy info
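The “examples of single words in context” feature is, in effect, a concordance (keyword-in-context, or KWIC) lookup. Here is a minimal sketch, assuming the corpus is a plain-text string already in memory; the `kwic` function and its `window` and `limit` parameters are hypothetical illustrations, not BodoBot’s actual code.

```python
import re

def kwic(corpus: str, keyword: str, window: int = 3, limit: int = 5) -> list[str]:
    """Return up to `limit` keyword-in-context lines for `keyword`.

    Hypothetical helper: each line shows `window` tokens on either side of a
    case-insensitive match, with the match itself in square brackets.
    """
    # Unicode-aware tokenization, so German words like "Straße" stay whole.
    tokens = re.findall(r"\w+", corpus)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
            if len(lines) == limit:
                break
    return lines
```

This sketch drops punctuation and ignores sentence boundaries, so a context window can run across two sentences; a real concordancer would likely keep both.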

1) Word of the Day, 2) Vocab Quizzes, 3) Suggested Readings

Readings: these come from Google News and are leveled using frequency-band vocabulary coverage, the Automated Readability Index, and the Coleman-Liau index. (The bot has a list of the 2,000 most frequent words; it checks the words in an article against that list, and the higher the percentage of listed words, the easier the article.)
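The three leveling measures can be sketched as follows. This is a hedged illustration, not BodoBot’s implementation: `frequent_words` stands in for the 2,000-word list (assumed to be a set of lowercase words), the regex tokenization is an assumption, and the Coleman-Liau and ARI formulas were calibrated on English text, so applying them to German articles is itself an assumption.

```python
import re

def _counts(text):
    # Words are runs of letters/digits; letters and digits are counted together.
    words = re.findall(r"[^\W_]+", text)
    if not words:
        raise ValueError("text contains no words")
    chars = sum(len(w) for w in words)
    # Each run of sentence-final punctuation counts as one sentence (minimum 1).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return words, chars, sentences

def coverage(text, frequent_words):
    """Share of word tokens found in the frequency list (higher = easier)."""
    words, _, _ = _counts(text)
    hits = sum(1 for w in words if w.lower() in frequent_words)
    return hits / len(words)

def coleman_liau(text):
    """Coleman-Liau index: 0.0588*L - 0.296*S - 15.8, where L is letters
    per 100 words and S is sentences per 100 words."""
    words, chars, sentences = _counts(text)
    n = len(words)
    return 0.0588 * (chars / n * 100) - 0.296 * (sentences / n * 100) - 15.8

def automated_readability_index(text):
    """ARI: 4.71*(characters/words) + 0.5*(words/sentences) - 21.43."""
    words, chars, sentences = _counts(text)
    n = len(words)
    return 4.71 * chars / n + 0.5 * n / sentences - 21.43
```

To level a batch of articles, one could sort them by `coverage` in descending order and use the two indices as tie-breakers; how BodoBot actually combines the three measures was not described in the talk.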

Users –> GAIM or Pidgin (clients that support a multitude of IM services) –> connect with BodoBot as one of their IM contacts –> ask BodoBot questions –> BodoBot uses web-based L2 resources (Google News, for example) to help answer the questions –> all questions are entered into a MySQL database (and back again). If I am understanding this correctly, this not only logs students’ trends when using the tool but also adds more words to BodoBot’s “memory.”
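The logging step can be sketched with Python’s built-in `sqlite3` module, used here only as a stand-in for the MySQL database mentioned in the talk. The `queries` table, its columns, and the helper functions are hypothetical; the sketch just shows the idea of storing every query together with the buddy who sent it.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; BodoBot's real MySQL tables were not shown.
SCHEMA = """
CREATE TABLE IF NOT EXISTS queries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    buddy TEXT NOT NULL,    -- IM screen name of the learner
    message TEXT NOT NULL,  -- raw text the learner sent
    kind TEXT NOT NULL,     -- e.g. 'word', 'sentence', 'reading'
    sent_at TEXT NOT NULL   -- ISO-8601 UTC timestamp
)
"""

def open_log(path=":memory:"):
    """Open (or create) the query log and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def log_query(conn, buddy, message, kind):
    """Store one incoming IM query with its sender and request type."""
    conn.execute(
        "INSERT INTO queries (buddy, message, kind, sent_at) VALUES (?, ?, ?, ?)",
        (buddy, message, kind, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def requests_by_kind(conn):
    """Count logged queries per request type."""
    rows = conn.execute("SELECT kind, COUNT(*) FROM queries GROUP BY kind")
    return dict(rows.fetchall())
```

A `GROUP BY` over such a log is the kind of query that would produce per-type request totals like the usage stats reported in the talk.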

Example: A student sends an instant message to BodoBot asking a question in the target language. BodoBot parses results from Leo (and other resources) and responds: single words are sent to Leo, full sentences to Google dictionaries …
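The routing rule described here (single words to a dictionary, full sentences to machine translation) can be sketched as a small dispatcher. The backends are passed in as plain functions because the actual Leo and Google interfaces BodoBot used were not described; `route`, `dictionary_lookup`, and `sentence_translate` are hypothetical names, and treating any multi-token message as a “sentence” is an assumption.

```python
from typing import Callable

def route(message: str,
          dictionary_lookup: Callable[[str], str],
          sentence_translate: Callable[[str], str]) -> tuple[str, str]:
    """Send single words to a dictionary backend and longer input to MT.

    Returns (kind, answer) so the caller can also log the request type.
    """
    text = message.strip()
    if not text:
        return ("empty", "")
    if len(text.split()) == 1:
        # Strip trailing punctuation such as "Haus?" before the lookup.
        return ("word", dictionary_lookup(text.strip(".,!?")))
    return ("sentence", sentence_translate(text))
```

Returning the request type alongside the answer makes it easy to log each query by kind, which is what later allows translation behavior to be compared across proficiency groups.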

You can ask the bot for the word of the day, which today was angrenzenden (adjacent), and then ask it to use that word in a sentence. The translation of the day (German –> English) was “I do not want to go home because I my beer did not up drink.”

Usage Stats:

Duration: 2.5 months
100 users (94 active)
Average conversation: 15.6 messages / 40 minutes
78.6% of messages in English
Response time: 2.5 seconds
Accuracy: 99%; it took no more than 2 messages for the bot to “learn” the request
About 500 single-word translation requests, 50 multiword translation requests, 100 reading requests…

Activities by Proficiency Group:
The lowest proficiency group uses the translation feature for whole sentences more frequently than the other groups do, relative to each group’s overall use.
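Comparing groups “relative to other groups” means normalizing by each group’s own number of translation requests rather than comparing raw counts. A minimal sketch of that normalization, assuming the log can be reduced to (proficiency group, request kind) pairs; `sentence_share_by_group` and the `'word'`/`'sentence'` labels are hypothetical.

```python
from collections import Counter

def sentence_share_by_group(records):
    """records: iterable of (proficiency_group, kind) pairs.

    Only translation requests ('word' or 'sentence') are counted; other kinds
    (e.g. reading requests) are ignored. Returns, per group, the fraction of
    that group's translation requests that were whole sentences.
    """
    totals = Counter()
    sentences = Counter()
    for group, kind in records:
        if kind not in ("word", "sentence"):
            continue
        totals[group] += 1
        if kind == "sentence":
            sentences[group] += 1
    return {group: sentences[group] / totals[group] for group in totals}
```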

Future research:
More sophisticated techniques for selecting contextual segments
Refine the corpus…
News tasks (i.e., jigsaw)
Develop a web interface for tracking vocabulary learning
Enable learners to send results to teachers
Develop a more sophisticated quiz item selection model (e.g., a Rasch model)
Include other languages (Spanish, Japanese, Russian, Chinese)
Examine translation request behavior
Examine IM use via “buddy status” data

(The last two can already be analyzed with this tool.)
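For the Rasch-model item selection mentioned above, the core idea can be sketched briefly: the Rasch (one-parameter logistic) model gives the probability that a learner of ability θ answers an item of difficulty b correctly as 1 / (1 + e^-(θ - b)), and the most informative next quiz item is the one whose difficulty is closest to the learner’s ability. This sketch assumes θ and the item difficulties have already been estimated (the harder part, not shown); `next_item` and the difficulty dictionary are hypothetical.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch probability that ability `theta` answers difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at `theta`: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, asked=frozenset()):
    """Pick the unasked word whose item is most informative at `theta`.

    `difficulties` maps quiz words to Rasch difficulty estimates. Information
    peaks where b == theta, so this selects the closest-difficulty item.
    """
    candidates = [w for w in difficulties if w not in asked]
    if not candidates:
        return None
    return max(candidates, key=lambda w: item_information(theta, difficulties[w]))
```

Maximum-information selection keeps quizzes near a 50% expected success rate; a production system might add randomness so learners do not always see the same items.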

At what point did the bot stop being helpful and become annoying?
What do we know about what the students are doing when using the tools?
–The researchers realize that they need to know more about the context of what students are doing when they use the bot: what task they have been assigned in class, not just which words they need to know.
–There is a need for pedagogical framing so that students will use some of the available higher-level tools. Students need to be reminded (through the bot? through their teachers?) of what is possible with this tool beyond single-word translation.

The tool is entirely open source, so it is available both for looking under the hood and for sharing.

Interested? Contact Scott:

Barbara is a Lecturer in Hispanic Studies at a small liberal arts college in Maine. Rumor has it this was also her alma mater. She previously worked for almost 20 years as a teacher and language center director at a small liberal arts college in the cornfields of Ohio. Prior to these adventures in higher ed, she taught high school Spanish and loved it. She wishes she had more time in her life to play with her dogs, write, read, swim, do yoga, and make things out of clay.
