Contest closed - Congratulations to all winners!
Task
The goal of the Speller Challenge (the “Challenge”) is to build the best speller
that proposes the most plausible spelling alternatives for each search query. Spellers
are encouraged to take advantage of cloud computing and must be submitted to the
Challenge in the form of REST-based Web Services. At the end of the challenge, the
entry that you designate as your “primary entry” will be judged according to the
evaluation measures described below to determine five (5) winners of the prizes
described below.
Evaluation Measures
Submissions will be evaluated based on the Expected F1 (EF1, with tie breaking rules
described below) against a test set created by sampling the search queries received
by Bing. Due to legal and other constraints, the Bing Test Dataset will not be made
publicly available. For the purpose of this Challenge, however, a development dataset
based on the publicly available TREC queries based on the 2008 Million Query Track
(1MQ) are annotated using the same guidelines and processes as in the creation of
the Bing Test Dataset. This TREC Evaluation Dataset, will be made publicly available
through the Microsoft Web N-gram service at http://spellerchallenge.com/DataSets.aspx.
The datasets have been manually annotated by up to 4 trained experts. The guidelines
encourage the experts to annotate multiple plausible spelling variations for each
term in the query so as to minimize any biases towards or against any typographic
styles. For example, one can assume that there is no systematic preference to “webpage”
versus “web page”. Systems that cover more plausible spelling variations are deemed
as better spellers for purposes of this Challenge. Please note that the spelling
variations in the datasets depict the necessary, but not the sufficient conditions,
of a perfect speller. Namely, one should not regard the examples in the datasets
as the ultimate guide for the speller’s outputs. Rather, a more appropriate way
to interpret the data is that, if your speller misses some of the variations deemed
plausible by human experts, there is obvious room for improvement in your speller.
In addition to the TREC Dataset, participants are also encouraged to use a wide
variety of other data sources, including third party datasets and the Microsoft
Web N-gram service at http://web-ngram.research.microsoft.com/.
The Challenge will maintain a web page containing links to shared datasets made
available by other Challenge participants.
The Expected F1 (EF1) is the harmonic mean of expected Precision and Recall. Expected
Precision (EP) is calculated based on the uniform utility function and the posterior
probability for each spelling variation your speller generates for each query. To
be more specific, suppose for query q your system generates spelling variations
C(q)={ci,i=1,2,...,Nq}
with posterior probabilities P(ci│q). Let S(q)
denote the set of plausible spelling variations annotated by the human experts for
query q, and Ip(c,q)=1 if c∈S(q),0
otherwise. Given a query set Q, the Expected Precision is defined as
Note that if |S(q)| = 1 for all q, EP reduces to the well known
Mean Reciprocal Rank (MRR) metric if
Similarly, we define a utility function IR(C(q),a)=1
if a∈C(q) for a∈S(q), 0 otherwise.
The Expected Recall (ER) is defined as
and 1/EF1 = 0.5*(1/EP+1/ER).
Submissions via a Web Service Interface
To evaluate your speller using our data, just wrap your system as a REST based web
service that is accessible from the internet. Once you submit the URL of your service
at the evaluation web site http://spellerchallenge.com/,
a job will be scheduled to call your web service and post a status update to the
Challenge’s community page on
Facebook and Twitter when the
evaluation task is completed. Because we do not collect personally identifiable
information (PII), there will be no notification through email or SMS. Following
us on these social media or regularly checking the leaderboard are the only ways
to monitor the status of your jobs.
Your web service should be accessible through HTTP GET with the following two parameters:
|
name
|
type
|
Description
|
|
runID
|
String
(UTF-8)
|
A unique identifier you choose to name a particular run when you submit your evaluation
request. This is useful when you implement many systems sharing the same web service
interface. White spaces are not allowed in the string.
|
|
q
|
String
(UTF-8)
|
A single, all-lower-cased test query. Depending on the software development tool
you are using, you may have to un-escape percent-encoding characters (e.g. ‘%20’
as space) that are needed for HTTP transport
|
After your system successfully process the query, your web service should respond
with a HTTP 200 OK and place your results in the response body encoded in UTF-8
with the MIME type ‘text/plain’. If you decide to use the
Visual Studio (free download) as your development platform, you can start
with the code examples described in Examples in
.Net to avoid the plumbing details. For examples on coding web services
on Azure, use the steps in Hosting in Windows
Azure.
For each test query, your result may include multiple lines with each line containing
one single answer. A single answer should contain two tab-separated fields. The
first field is the corrected string and the second a score roughly indicating the
posterior probability of the corrected string given the test query.
For example, if you submit a request to evaluate the system at “http://myinstitution.edu/speller”
with a runID=”johnsmith123”, you can expect our evaluation job to send multiple
HTTP GET requests to “http://myinstituion.edu/speller?runID=johnsmith123&q=foobar”.
For each request, your answer should produce a result like:
foobar 0.65
foo bar 0.35
Each line, in the format of TAB separated UTF-8 string followed by a floating point
number, corresponds to a spelling variation and its posterior probability. The output
should be arranged in the descending order of the posterior probabilities. If the
posterior probabilities sum up to be greater than 1, a normalization step will be
applied. If any of the line is missing the posterior probability, the default distribution
leading to MRR described above will be used. As implied in the interface design,
the evaluation job will call your web service one query at a time to record the
result and the latency. A request not receiving a response for more than 1 minute
will be aborted and regarded as a failed request. The number of failed requests
and the overall latency are used as tie-breakers to determine the winner of the
challenge.
Speller Submission Limits
Between January 17th, 2011 and June 2, 2011, you may enter as many “test submissions”
as you would like. Test submissions will be run against the TREC Evaluation Dataset
to help you improve your spellers, but they will not count as official contest entries.
As implied in the Web service description above, each submission must contain a
unique runID to distinguish the submissions.
At any time during the competition, participants will have the opportunity to select,
among all their submissions, the one they wish to have judged as their "primary
submission" in the competition.
The results on the TREC Evaluation Dataset will be posted on a leaderboard when
participants have agreed to have their anonymous results posted. The results on
the Bing Test Dataset will not be disclosed until the end of the competition. In
order to be eligible for a prize, you must agree to have your results posted.
Primary Submission for Selecting the Winners
The speller you have listed on your team’s page as your primary entry at 11:59 AM
PDT on June 2, 2011 will be your official contest entry. Only the speller
labeled as “primary” will be used to select the Speller Challenge winner. You may
submit up to two additional spellers to be evaluated on the Bing Test Dataset (but
those spellers will not be eligible for the contest). By submitting your primary
speller, you will receive one (1) entry into the contest.
Entries will be accepted until June 2, 2011, 11:59 AM PDT.
Participation
You may participate in the Challenge as an individual or as part of a team. However,
teams must finalize and inform the organizer of their member lists before June 2
by creating a team ID. Although you can be a member of multiple teams, no two teams
can have exactly the same members or Team ID. Each team needs to select one individual
as the official entrant to the Challenge and the recipient of the prize check (if
the team wins the Challenge).
The names and e-mail addresses of teams will not be publicly displayed during the
competition. If you are an individual winner or member of a winning team, your name
and affiliation will be announced following the selection of the Challenge winner.
Winners Selections and Prizes
At the end of the competition, the primary entries from all entrants will be ranked
in decreasing order of their respective scores (defined above). The top five competitors
will receive the following prizes:
- 1st place: $10,000
- 2nd place: $8,000
- 3rd place: $6,000
- 4th place: $4,000
- 5th place: $2,000
Each of the five winning entries will be asked to submit a workshop paper describing
their system as a condition for receiving the prize. Winners may also be invited
to present a talk describing their winning method at an upcoming workshop in July.
Winners have 3 business days to claim the prize once the results of the Challenge
have been announced.
Prizes are subject to eligibility criteria. View the full
Official Rules of the Contest.
Questions related to this speller challenge should be directed to
spellerchallenge@microsoft.com