Kaggle, a company best known for running competitions in which organisations offer a nice big prize to teams of data scientists who solve their problems, recently introduced a new section that is useful even for the less competitive types: “Kaggle Datasets”.
Here they host “high quality public datasets” you can access for free. What is especially nice is that, alongside the data download itself, they host any scripts, code and results that people have already written to work with each dataset, plus some general discussion.
For example, on the “World Food Facts” page you can see a script that “ByronVergoesHouwens” wrote to see which countries ate the most sugar, along with the chart it produced. In fact you can even execute scripts online, thanks to their “Kaggle Scripts” product.
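To give a flavour of what such a script involves, here is a minimal Python sketch of a "which countries have the most sugar" ranking. It is not the script mentioned above, just one possible approach: it averages a sugar column per country and sorts the result. The column names `countries_en` and `sugars_100g` are assumptions about how the download is laid out, and the rows are invented placeholders rather than real figures. Note that it ranks the average sugar content of listed products, which is only a rough proxy for what people actually eat.

```python
import csv
import io
from collections import defaultdict


def mean_sugar_by_country(rows, country_col="countries_en", sugar_col="sugars_100g"):
    """Average the sugar column per country; return (country, mean) pairs, highest first.

    The default column names are assumptions, not confirmed against the real file.
    """
    totals = defaultdict(lambda: [0.0, 0])  # country -> [running sum, row count]
    for row in rows:
        country = (row.get(country_col) or "").strip()
        raw = (row.get(sugar_col) or "").strip()
        if not country or not raw:
            continue  # skip rows with a missing country or sugar value
        try:
            value = float(raw)
        except ValueError:
            continue  # skip rows where the sugar value is not numeric
        totals[country][0] += value
        totals[country][1] += 1
    means = {country: total / count for country, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


# Placeholder rows, invented only to exercise the function (not real data).
toy_csv = io.StringIO(
    "countries_en,sugars_100g\n"
    "Country A,10\n"
    "Country A,20\n"
    "Country B,5\n"
    ",50\n"
    "Country C,\n"
)
ranking = mean_sugar_by_country(csv.DictReader(toy_csv))
print(ranking)
```

With the real download you would pass a `csv.DictReader` opened on the CSV file instead of the toy rows, adjusting the column names to whatever the file actually uses.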
It looks like more datasets will be added regularly, but right now the list is:
- Amazon Fine Food Reviews
- Twitter US Airline Sentiment
- SF Salaries
- First GOP debate Twitter Sentiment
- 2013 American Community Survey
- US Baby Names
- May 2015 Reddit Comments
- 2015 Notebook UX Survey
- NIPS 2015 Papers
- Iris (yes, the one you will have seen many times already if you’ve read ANY books/tutorials on clustering in R or similar!)
- Meta Kaggle
- Health Insurance Marketplace
- US Dept of Education: College Scorecard
- Ocean Ship Logbooks (1750-1850)
- World Development Indicators
- World Food Facts
- Hillary Clinton’s Emails (sounds fun…:-))