The Microsoft Web N-gram services form a cloud-based platform for language modeling research in web search, natural language processing, speech, and related areas. A collaboration between Microsoft Research and Bing, the services provide access to real-world, web-scale data with regular updates.
The Web N-gram services give you access to:
- Content types: Document Body, Document Title, Anchor Texts, and Query
- Model types: Smoothed Backoff N-gram models with N up to 5
- Locale: Web documents indexed by Bing in the EN-US market
- Access: Services hosted by Microsoft, with SOAP and REST interfaces. Python development kits are also available.
- Web models: N-gram models based on a Web snapshot taken in June 2009 have been and will always be available. Additionally, with the support of NSF, models from two snapshots taken in April 2010 and October 2010 will be hosted on Windows Azure for at least 3 years. Further updates will be made based on community feedback.
- Query models: N-gram models based on 9 months of Bing queries up to June 2009 will always be available. In addition, a monthly update to the query N-gram models will be provided. The services will maintain up to 3 query N-gram models, depending on storage and usage patterns.
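To make the "smoothed backoff" model type listed above more concrete, here is a minimal sketch of how a backoff N-gram model typically scores a word given its context. It is not the services' API, nor their actual smoothing method: the function names (`backoff_logprob`, `sentence_logprob`), the `<unk>` token, and the toy log10 probabilities and backoff weights are all assumptions made up for illustration. The lookup rule shown (use the longest matching N-gram, otherwise add the context's backoff weight and retry with a shorter context) is the standard one for ARPA-style model tables.

```python
# Toy ARPA-style tables (hypothetical values, log10 scale).
# LOGPROBS maps an N-gram tuple to log10 P(last word | preceding words).
LOGPROBS = {
    ("<unk>",): -5.0,
    ("web",): -2.0,
    ("search",): -2.5,
    ("engine",): -3.0,
    ("web", "search"): -0.5,
}
# BACKOFFS maps a context tuple to its log10 backoff weight.
BACKOFFS = {
    ("web",): -0.3,
    ("search",): -0.2,
}


def backoff_logprob(word, context, logprobs, backoffs):
    """Return log10 P(word | context) using the backoff lookup rule."""
    ngram = tuple(context) + (word,)
    if ngram in logprobs:
        # The full N-gram was observed: use its stored probability.
        return logprobs[ngram]
    if not context:
        # Unseen unigram: fall back to the unknown-word entry.
        return logprobs[("<unk>",)]
    # Otherwise pay the context's backoff weight (0.0 if none is stored)
    # and retry with the oldest context word dropped.
    penalty = backoffs.get(tuple(context), 0.0)
    return penalty + backoff_logprob(word, tuple(context)[1:], logprobs, backoffs)


def sentence_logprob(words, logprobs, backoffs, order=5):
    """Sum conditional log10 probabilities, using at most order-1 context words."""
    total = 0.0
    for i, word in enumerate(words):
        context = tuple(words[max(0, i - order + 1):i])
        total += backoff_logprob(word, context, logprobs, backoffs)
    return total


if __name__ == "__main__":
    print(sentence_logprob(["web", "search", "engine"], LOGPROBS, BACKOFFS))
```

The `order=5` default mirrors the "N up to 5" limit mentioned in the list; with a real model, the two tables would be far larger and served remotely rather than held in local dictionaries.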
Ready to try it out? Please read the terms and quickstart guides on the information page, and join us on social media (see the "Learn More" section on the right) for updates and community discussion. Send us mail if you have links to demos you would like to share!
Web N-gram is brought to you by Microsoft Research in partnership with Microsoft Bing.
- Workshop on Speller Alteration for Web Search, July 19, 2011, Bellevue, Washington
- MSR-Bing Speller Challenge
- SIGIR Web N-gram Workshop, July 23, 2010, Geneva, Switzerland
- Exploring Web Scale Language Models for Search Query Processing, WWW 2010
- An Overview of Microsoft Web N-gram Corpus and Applications, NAACL-HLT 2010