Kaggle now offers free public dataset and script combos

Kaggle, a company most famous for facilitating competitions that allow organisations to solicit the help of teams of data scientists to solve their problems in return for a nice big prize, recently introduced a new section useful even for the less competitive types: "Kaggle Datasets". Here they host "high quality public datasets" you can access for free. …

Microsoft Academic Graph: paper, journals, authors and more

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals and conference "venues" and fields of study. Microsoft have been good enough to structure and release a bunch of web-crawled data around scientific papers, journals, authors, URLs, keywords, references between and so on for …

Free dataset: all Reddit comments available for download

As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every Reddit comment from 2007 through to May 2015 that isn't protected, and made it available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the …

Free data: Constituency Explorer – UK demographics, politics, behaviour

From some combination of the Office for National Statistics, the House of Commons and Durham library comes Constituency Explorer. Billing itself as "reliable evidence for politicians and journalists - data for everyone", it allows interactive visualisation of many interesting demographic, behavioural and political attributes by UK political constituency. It's easy to view distributions and compare between a specific constituency, the region …

Free data: data.gov.uk – thousands of datasets from the UK government

Data.gov.uk is the official portal that releases what the UK government deems to be open data. The government is opening up its data for other people to re-use. This is only about non-personal, non-sensitive data – information like the list of schools, crime rates or the performance of your council. At the time of writing it …

Free data: Yelp “challenge” dataset: 1.6M reviews, tips, business data

"1.6M reviews and 500K tips by 366K users for 61K businesses. 481K business attributes, e.g., hours, parking availability, ambience. Social network of 366K users for a total of 2.9M social edges. Aggregated check-ins over time for each of the 61K businesses." Plus, if you're a student, you could win $5000 for playing with it. Go …