Speller Challenge

Contest closed - Congratulations to all winners!

Task

The goal of the Speller Challenge (the “Challenge”) is to build the best speller that proposes the most plausible spelling alternatives for each search query. Spellers are encouraged to take advantage of cloud computing and must be submitted to the Challenge in the form of REST-based Web Services. At the end of the Challenge, the entry that you designate as your “primary entry” will be judged according to the evaluation measures described below to determine the five (5) winners of the prizes listed below.

Evaluation Measures

Submissions will be evaluated based on Expected F1 (EF1, with tie-breaking rules described below) against a test set created by sampling the search queries received by Bing. Due to legal and other constraints, the Bing Test Dataset will not be made publicly available. For the purposes of this Challenge, however, a development dataset based on the publicly available TREC queries from the 2008 Million Query Track (1MQ) has been annotated using the same guidelines and processes as in the creation of the Bing Test Dataset. This TREC Evaluation Dataset will be made publicly available through the Challenge website at http://spellerchallenge.com/DataSets.aspx.

The datasets have been manually annotated by up to 4 trained experts. The guidelines encourage the experts to annotate multiple plausible spelling variations for each term in the query so as to minimize any biases towards or against particular typographic styles. For example, one can assume that there is no systematic preference for “webpage” versus “web page”. Systems that cover more plausible spelling variations are deemed better spellers for purposes of this Challenge. Please note that the spelling variations in the datasets describe necessary, but not sufficient, conditions for a perfect speller. Namely, one should not regard the examples in the datasets as the ultimate guide for the speller’s outputs. Rather, a more appropriate way to interpret the data is that, if your speller misses some of the variations deemed plausible by human experts, there is obvious room for improvement in your speller. In addition to the TREC Dataset, participants are also encouraged to use a wide variety of other data sources, including third-party datasets and the Microsoft Web N-gram service at http://web-ngram.research.microsoft.com/. The Challenge will maintain a web page containing links to shared datasets made available by other Challenge participants.

The Expected F1 (EF1) is the harmonic mean of Expected Precision and Expected Recall. Expected Precision (EP) is calculated based on the uniform utility function and the posterior probability of each spelling variation your speller generates for each query. To be more specific, suppose for query q your system generates spelling variations C(q) = {c_i, i = 1, 2, ..., N_q} with posterior probabilities P(c_i|q). Let S(q) denote the set of plausible spelling variations annotated by the human experts for query q, and let I_P(c, q) = 1 if c ∈ S(q), and 0 otherwise. Given a query set Q, the Expected Precision is defined as

EP = \frac{1}{|Q|} \sum_{q \in Q} \sum_{c \in C(q)} I_P(c, q) \, P(c \mid q)

Note that if |S(q)| = 1 for all q, EP reduces to the well-known Mean Reciprocal Rank (MRR) metric if the posterior is taken to be the reciprocal of the rank, i.e., P(c_i|q) = 1/i for the candidate ranked at position i (this is also the default distribution used when no posteriors are supplied; see below).

Similarly, we define a utility function I_R(C(q), a) = 1 if a ∈ C(q) for a ∈ S(q), and 0 otherwise. The Expected Recall (ER) is defined as

ER = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{|S(q)|} \sum_{a \in S(q)} I_R(C(q), a)

and 1/EF1 = 0.5*(1/EP+1/ER).
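
As an illustration, the following Python sketch computes EP, ER, and EF1 as defined above from a speller’s output. The data structures and the toy inputs are hypothetical; the official evaluation is performed by the Challenge’s own job against the annotated datasets.

# Sketch of the EP / ER / EF1 computation described above (hypothetical inputs).
# `predictions` maps each query q to the list C(q) of (candidate, posterior) pairs
# returned by a speller; `gold` maps each query q to the annotated set S(q).

def expected_precision(predictions, gold):
    total = 0.0
    for q, candidates in predictions.items():
        s_q = gold[q]
        # Sum of I_P(c, q) * P(c|q) over the speller's candidates C(q).
        total += sum(p for c, p in candidates if c in s_q)
    return total / len(predictions)

def expected_recall(predictions, gold):
    total = 0.0
    for q, candidates in predictions.items():
        c_q = {c for c, _ in candidates}
        s_q = gold[q]
        # Fraction of the annotated variations S(q) covered by C(q).
        total += sum(1 for a in s_q if a in c_q) / len(s_q)
    return total / len(predictions)

def expected_f1(predictions, gold):
    ep = expected_precision(predictions, gold)
    er = expected_recall(predictions, gold)
    return 0.0 if ep == 0.0 or er == 0.0 else 2.0 * ep * er / (ep + er)

# Toy example: one query, two candidates, both judged plausible by the experts.
predictions = {"foobar": [("foobar", 0.65), ("foo bar", 0.35)]}
gold = {"foobar": {"foobar", "foo bar"}}
print(expected_f1(predictions, gold))  # 1.0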

Submissions via a Web Service Interface

To evaluate your speller using our data, wrap your system as a REST-based web service that is accessible from the internet. Once you submit the URL of your service at the evaluation website http://spellerchallenge.com/, a job will be scheduled to call your web service and post a status update to the Challenge’s community pages on Facebook and Twitter when the evaluation task is completed. Because we do not collect personally identifiable information (PII), there will be no notification through email or SMS. Following us on these social media channels or regularly checking the leaderboard are the only ways to monitor the status of your jobs.

Your web service should be accessible through HTTP GET with the following two parameters:

  • runID (String, UTF-8): A unique identifier you choose to name a particular run when you submit your evaluation request. This is useful when you implement many systems sharing the same web service interface. White space is not allowed in the string.
  • q (String, UTF-8): A single, all-lower-cased test query. Depending on the software development tools you are using, you may have to unescape percent-encoded characters (e.g., ‘%20’ as a space) that are needed for HTTP transport (a decoding sketch follows this list).
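
For instance, a minimal Python sketch of this decoding, using only the standard library (the query string below is a made-up example):

import urllib.parse

# Hypothetical query string as it might arrive in an evaluation request.
query_string = "runID=johnsmith123&q=foo%20bar"
params = urllib.parse.parse_qs(query_string)
print(params["runID"][0])  # johnsmith123
print(params["q"][0])      # foo bar  ('%20' decoded to a space)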

After your system successfully processes the query, your web service should respond with an HTTP 200 OK and place your results in the response body, encoded in UTF-8 with the MIME type ‘text/plain’. If you decide to use Visual Studio (free download) as your development platform, you can start with the code examples described in Examples in .Net to avoid the plumbing details. For examples of hosting web services on Azure, follow the steps in Hosting in Windows Azure.

For each test query, your result may include multiple lines, with each line containing a single answer. A single answer should contain two tab-separated fields: the first field is the corrected string, and the second is a score roughly indicating the posterior probability of the corrected string given the test query.

For example, if you submit a request to evaluate the system at “http://myinstitution.edu/speller” with runID=“johnsmith123”, you can expect our evaluation job to send multiple HTTP GET requests of the form “http://myinstitution.edu/speller?runID=johnsmith123&q=foobar”, one per test query. For each request, your web service should produce a result like:

foobar 0.65
foo bar 0.35

Each line, in the format of a UTF-8 string followed by a TAB and a floating-point number, corresponds to a spelling variation and its posterior probability. The output should be arranged in descending order of the posterior probabilities. If the posterior probabilities sum to more than 1, a normalization step will be applied. If any line is missing the posterior probability, the default distribution leading to MRR described above will be used. As implied by the interface design, the evaluation job will call your web service one query at a time and record the result and the latency. A request not receiving a response within 1 minute will be aborted and regarded as a failed request. The number of failed requests and the overall latency are used as tie-breakers to determine the winners of the Challenge.
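
As a rough illustration of the interface, here is a minimal Python sketch of such a web service using only the standard library. The endpoint, port, and the trivial candidate generator are placeholders, and any stack that serves the same interface (including the .Net and Azure examples above) works equally well.

import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

def spelling_candidates(query):
    # Placeholder speller: echoes the query with probability 1.0. A real system
    # would return several (variation, posterior) pairs, sorted by descending posterior.
    return [(query, 1.0)]

class SpellerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = urllib.parse.parse_qs(urllib.parse.urlparse(self.path).query)
        run_id = params.get("runID", [""])[0]   # identifies the submitted run
        query = params.get("q", [""])[0]        # percent-decoding handled by parse_qs
        lines = ["%s\t%.4f" % (c, p) for c, p in spelling_candidates(query)]
        body = "\n".join(lines).encode("utf-8")
        self.send_response(200)                 # HTTP 200 OK
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), SpellerHandler).serve_forever()

Running this sketch locally and requesting http://localhost:8080/?runID=johnsmith123&q=foo%20bar would return a tab-separated candidate list in the format shown above.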

Speller Submission Limits

Between January 17th, 2011 and June 2, 2011, you may enter as many “test submissions” as you would like. Test submissions will be run against the TREC Evaluation Dataset to help you improve your spellers, but they will not count as official contest entries.

As implied in the Web service description above, each submission must contain a unique runID to distinguish the submissions.

At any time during the competition, participants will have the opportunity to select, among all their submissions, the one they wish to have judged as their "primary submission" in the competition.

The results on the TREC Evaluation Dataset will be posted on a leaderboard when participants have agreed to have their anonymous results posted. The results on the Bing Test Dataset will not be disclosed until the end of the competition. In order to be eligible for a prize, you must agree to have your results posted.

Primary Submission for Selecting the Winners

The speller you have listed on your team’s page as your primary entry at 11:59 AM PDT on June 2, 2011 will be your official contest entry. Only the speller labeled as “primary” will be used to select the Speller Challenge winner. You may submit up to two additional spellers to be evaluated on the Bing Test Dataset (but those spellers will not be eligible for the contest). By submitting your primary speller, you will receive one (1) entry into the contest.

Entries will be accepted until June 2, 2011, 11:59 AM PDT.

Participation

You may participate in the Challenge as an individual or as part of a team. However, teams must finalize their member lists and inform the organizer before June 2 by creating a team ID. Although you can be a member of multiple teams, no two teams can have exactly the same members or team ID. Each team needs to select one individual as the official entrant to the Challenge and the recipient of the prize check (if the team wins the Challenge).

The names and e-mail addresses of teams will not be publicly displayed during the competition. If you are an individual winner or member of a winning team, your name and affiliation will be announced following the selection of the Challenge winner.

Winners Selections and Prizes

At the end of the competition, the primary entries from all entrants will be ranked in decreasing order of their respective scores (defined above). The top five competitors will receive the following prizes:

  • 1st place: $10,000
  • 2nd place: $8,000
  • 3rd place: $6,000
  • 4th place: $4,000
  • 5th place: $2,000

Each of the five winners will be asked to submit a workshop paper describing their system as a condition of receiving the prize. Winners may also be invited to present a talk describing their winning method at an upcoming workshop in July.

Winners have 3 business days to claim the prize once the results of the Challenge have been announced.

Prizes are subject to eligibility criteria. View the full Official Rules of the Contest.

Questions related to this speller challenge should be directed to spellerchallenge@microsoft.com