Speller Challenge

Spelling Alteration for Web Search Workshop

July 19, 2011 Bellevue, WA, USA


Goal

The Spelling Alteration for Web Search workshop addresses the challenges of web-scale Natural Language Processing, with a focus on search query spelling correction. The goals of the workshop are to:

  • provide a forum for participants in the Speller Challenge (details at http://www.spellerchallenge.com, with a submission deadline for the award competition of June 2, 2011) to exchange ideas and share experience;
  • hold the official award ceremony for the prize winners of the Speller Challenge;
  • engage the community on future research directions in spelling alteration for web search.

Although the workshop is not limited to Speller Challenge participants, we encourage prospective authors to include the on-demand evaluation web application and the referenced testing dataset (see below) in their system evaluation, so that different systems and approaches can be compared on the same benchmark. We also encourage authors to make their submitted systems publicly accessible throughout the workshop to facilitate live demonstrations.

Agenda

| Time | Event/Topic | Presenter | Chair |
|---|---|---|---|
| 08:30 | Breakfast | | |
| 09:00 | Opening | Evelyne Viegas - Microsoft Research | Jianfeng Gao |
| 09:10 | Award presentation | Harry Shum - Corporate Vice President Microsoft | |
| 09:30 | Report | Kuansan Wang - Microsoft Research | |
| 10:00 | Break - snacks | | |
| 10:30 | A Data-Driven Approach for Correcting Search Queries | Gord Lueck - Independent Software Developer, Canada | Jianfeng Gao |
| 10:50 | CloudSpeller: Spelling Correction for Search Queries by Using a Unified Hidden Markov Model with Web-scale Resources | Yanen Li, Huizhong Duan, ChengXiang Zhai - UIUC, Illinois, USA | |
| 11:10 | qSpell: Spelling Correction of Web Search Queries using Ranking Models and Iterative Correction | Yasser Ganjisaffar, Andrea Zilio, Sara Javanmardi, Inci Cetindil, Manik Sikka, Sandeep Katumalla, Narges Khatib, Chen Li, Cristina Lopes - University Of California, Irvine, USA | |
| 11:30 | TiradeAI: An Ensemble of Spellcheckers | Dan Stefanescu, Radu Ion, Tiberiu Boros - Research Institute for Artificial Intelligence, Romania | |
| 11:50 | Spelling Generation based on Edit Distance | Yoh Okuno - Yahoo Corp. Japan | |
| 12:00 | Lunch | | |
| 13:00 | A REST-based Online English Spelling Checker "Pythia" | Peter Nalyvayko - Analytical Graphics Inc., USA | Kuansan Wang |
| 13:15 | Why vs. HowTo: Maintaining the right balance | Dan Stefanescu, Radu Ion - Research Institute for Artificial Intelligence, Romania | |
| 13:30 | Panel Discussion | Ankur Gupta, Li-wei He - Microsoft; Gord Lueck, Yanen Li, Yasser Ganjisaffar, Dan Stefanescu - Speller Challenge Winners | |
| 14:30 | Wrap up | | |

Workshop Proceedings

The workshop papers will be published in electronic proceedings, with highlights to be presented at the SIGIR 2011 main conference.

Data Services for the workshop

As part of the Speller Challenge, Microsoft has made available the following datasets and utilities in the form of publicly accessible web services (published at http://web-ngram.research.microsoft.com). These datasets and resources offer a common platform on which comparable results can be contrasted in the workshop. Furthermore, these resources will remain accessible after the challenge, so that future researchers can test new ideas against the published benchmarks presented in the workshop.

  • On-demand evaluation web application: The challenge makes available a web application that can be invoked on demand to evaluate a spelling correction system conforming to the web service interface defined by the challenge (published at http://www.spellerchallenge.com). The evaluation program uses a standard dataset composed of search queries received by Bing in the EN-US market, manually annotated for typographical errors.
  • Web-scale multi-style language models and contextual similarity dataset: two web services based on Bing’s web snapshots of June 2009 and April 2010, containing (1) the language models derived from the web document body, title, and anchor text and (2) web document terms appearing in similar lexical contexts, where many spelling errors can be observed.
  • Spelling correction development dataset: A query dataset based on the TREC Million Query Track, manually annotated with spelling corrections.
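
As a rough illustration of how a system might be checked locally against a query dataset annotated with spelling corrections, the sketch below computes a simple top-1 accuracy. It is only a minimal example under stated assumptions: it does not use the challenge's web service interface or its official evaluation measure, and the function names (`top1_accuracy`, `toy_speller`) and the three toy queries are hypothetical, invented here for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical annotated data: each query maps to the set of spellings
# an annotator would accept. These three entries are invented toy examples,
# not taken from any challenge dataset.
TOY_ANNOTATIONS: Dict[str, Set[str]] = {
    "speling corection": {"spelling correction"},
    "web search": {"web search"},
    "micosoft research": {"microsoft research"},
}

# Hypothetical word-level fixes used by the toy speller below.
TOY_FIXES: Dict[str, str] = {"speling": "spelling", "corection": "correction"}


def toy_speller(query: str) -> List[str]:
    """Return a ranked list of suggestions (here, always exactly one)."""
    corrected = " ".join(TOY_FIXES.get(token, token) for token in query.split())
    return [corrected]


def top1_accuracy(
    speller: Callable[[str], List[str]],
    annotated: Dict[str, Set[str]],
) -> float:
    """Fraction of queries whose first suggestion is an accepted spelling."""
    if not annotated:
        return 0.0
    hits = 0
    for query, accepted in annotated.items():
        suggestions = speller(query)
        if suggestions and suggestions[0] in accepted:
            hits += 1
    return hits / len(annotated)


score = top1_accuracy(toy_speller, TOY_ANNOTATIONS)
print(f"top-1 accuracy: {score:.2f}")
```

The toy speller corrects the first query, leaves the already-correct second query unchanged, and misses the third, so two of the three queries count as hits. A real comparison across systems would rely on the on-demand evaluation application described above rather than on a local measure like this one.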

Submission

Paper length:

4 pages of content and any number of additional pages containing references only

Paper format:

Submissions should be in ACM SIGIR format. LaTeX and Word templates are available on the ACM Web site.

How to submit:

Papers should be submitted electronically via the online submission page at https://cmt.research.microsoft.com/SC2011/. All submissions must be in English. At least one author of each accepted paper is expected to present the paper at the workshop.

Organizers

Evelyne Viegas, Jianfeng Gao, Kuansan Wang

Microsoft Research, One Microsoft Way, Redmond, WA 98052

Jan Pedersen

Microsoft, One Microsoft Way, Redmond, WA 98052

Questions related to this speller challenge should be directed to spellerchallenge@microsoft.com