Spelling Alteration for Web Search Workshop
July 19, 2011
Bellevue, WA, USA
Goal
The Spelling Alteration for Web Search workshop addresses the challenges of web-scale
Natural Language Processing, with a focus on search query spelling correction. The goals
of the workshop are to:
- provide a forum for participants in the Speller Challenge (details at http://www.spellerchallenge.com,
with a submission deadline for the award competition on June 2, 2011) to exchange ideas and share experience;
- host the official award ceremony for the prize winners of the Speller Challenge;
- engage the community on future research directions in spelling alteration for web search.
Although the workshop is not limited to Speller Challenge participants, we encourage
prospective submissions to include the on-demand evaluation web application and the
referenced testing dataset (see below) in their system evaluation, so that
different systems and approaches can be compared on the same benchmark.
Furthermore, we encourage authors to make the submitted systems publicly accessible throughout the workshop to facilitate live demonstrations.
Agenda
| Time | Event/Topic | Presenter | Chair |
| 08:30 | Breakfast |
| 09:00 | Opening | Evelyne Viegas - Microsoft Research | Jianfeng Gao |
| 09:10 | Award presentation | Harry Shum - Corporate Vice President Microsoft |
| 09:30 | Report | Kuansan Wang - Microsoft Research |
| 10:00 | Break - snacks |
| 10:30 | A Data-Driven Approach for Correcting Search Queries | Gord Lueck - Independent Software Developer, Canada | Jianfeng Gao |
| 10:50 | CloudSpeller: Spelling Correction for Search Queries by Using a Unified Hidden Markov Model with Web-scale Resources | Yanen Li, Huizhong Duan, ChengXiang Zhai - UIUC, Illinois, USA |
| 11:10 | qSpell: Spelling Correction of Web Search Queries using Ranking Models and Iterative Correction | Yasser Ganjisaffar, Andrea Zilio, Sara Javanmardi, Inci Cetindil, Manik Sikka, Sandeep Katumalla, Narges Khatib, Chen Li, Cristina Lopes - University Of California, Irvine, USA |
| 11:30 | TiradeAI: An Ensemble of Spellcheckers | Dan Stefanescu, Radu Ion, Tiberiu Boros - Research Institute for Artificial Intelligence, Romania |
| 11:50 | Spelling Generation based on Edit Distance | Yoh Okuno - Yahoo Corp. Japan |
| 12:00 | Lunch |
| 13:00 | A REST-based Online English Spelling Checker "Pythia" | Peter Nalyvayko - Analytical Graphics Inc., USA | Kuansan Wang |
| 13:15 | Why vs. HowTo: Maintaining the right balance | Dan Stefanescu, Radu Ion - Research Institute for Artificial Intelligence, Romania |
| 13:30 | Panel Discussion | Ankur Gupta, Li-wei He - Microsoft; Gord Lueck, Yanen Li, Yasser Ganjisaffar, Dan Stefanescu - Speller Challenge Winners |
| 14:30 | Wrap up |
Workshop Proceedings
The workshop papers will be published in electronic proceedings, with highlights to be presented at the SIGIR 2011 main conference.
Data Services for the workshop
As part of the Speller Challenge, Microsoft has made available the following
datasets and utilities in the form of publicly accessible web services (published at http://web-ngram.research.microsoft.com).
These datasets and resources offer a common platform on which results from different systems
can be compared at the workshop. Furthermore, these resources will remain
accessible after the challenge, so that future researchers can test new ideas
against the published benchmarks presented in the workshop.
- On-demand evaluation web application: The challenge makes available a web
application that can be invoked on demand to evaluate a spelling correction system
conforming to the web service interface defined by the challenge (published at http://www.spellerchallenge.com).
The evaluation program uses a standard dataset composed of search queries received by Bing in the EN-US market, manually annotated for typographical errors.
- Web-scale multi-style language models and contextual similarity dataset: Two web
services based on Bing’s web snapshots of June 2009 and April 2010, containing (1) the language models derived from the web document body,
title, and anchor text and (2) web document terms appearing in similar lexical contexts, where many spelling errors can be observed.
- Spelling correction development dataset: A query dataset based on the TREC Million Query Track, manually annotated with spelling corrections.
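To illustrate how a system might be checked against a query set annotated with spelling corrections, here is a minimal Python sketch. It is not the challenge's official evaluation program or web service interface: the data layout (a query paired with a set of accepted spellings), the names `top1_accuracy` and `toy_corrector`, and the simple top-1 accuracy measure are all assumptions made for illustration only.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical data layout: each item pairs a raw query with the set of
# spellings an annotator accepted for it. This is an assumed format, not
# the format of the challenge datasets.
AnnotatedQuery = Tuple[str, Set[str]]


def top1_accuracy(corrector: Callable[[str], str],
                  annotated: Iterable[AnnotatedQuery]) -> float:
    """Fraction of queries whose single suggested spelling is accepted."""
    total = 0
    hits = 0
    for query, accepted in annotated:
        total += 1
        if corrector(query) in accepted:
            hits += 1
    return hits / total if total else 0.0


# A toy lookup-table corrector standing in for a real system.
TOY_FIXES = {"wether forecast": "weather forecast"}


def toy_corrector(query: str) -> str:
    """Return a known fix if one exists, otherwise the query unchanged."""
    return TOY_FIXES.get(query, query)


# Made-up example queries, for demonstration only.
TOY_DATA = [
    ("wether forecast", {"weather forecast"}),
    ("new york", {"new york"}),
    ("micro soft", {"microsoft", "micro soft"}),
    ("britny spears", {"britney spears"}),
]

score = top1_accuracy(toy_corrector, TOY_DATA)
print(f"top-1 accuracy: {score:.2f}")
```

Running different correctors through the same scoring function on the same annotated queries is what makes their results directly comparable; a real evaluation would substitute the shared dataset and whatever measure the challenge defines.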
Submission
Paper length:
4 pages of content and any number of additional pages containing references only
Paper format:
Submissions should be in ACM SIGIR format. LaTeX and Word templates are available on the
ACM Web site.
How to submit:
Papers should be submitted electronically via the online submission page at
https://cmt.research.microsoft.com/SC2011/. All submissions must be in English. At least one author of each accepted paper is expected to present the paper at the workshop.
Organizers
Evelyne Viegas, Jianfeng Gao, Kuansan Wang
Microsoft Research,
One Microsoft Way
Redmond, WA 98052
Jan Pedersen
Microsoft,
One Microsoft Way
Redmond, WA 98052
Questions related to the Speller Challenge should be directed to
spellerchallenge@microsoft.com