As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every publicly available Reddit comment from 2007 through to May 2015 and made the collection available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the … Continue reading Free dataset: all Reddit comments available for download
Tag: Data
Free data: Constituency Explorer – UK demographics, politics, behaviour
From some combination of the Office for National Statistics, the House of Commons and Durham library comes Constituency Explorer. Billing itself as "reliable evidence for politicians and journalists - data for everyone", it allows interactive visualisation of many interesting demographic, behavioural and political attributes by UK political constituency. It's easy to view distributions and compare between a specific constituency, the region … Continue reading Free data: Constituency Explorer – UK demographics, politics, behaviour
Free data: data.gov.uk – thousands of datasets from the UK government
Data.gov.uk is the official portal for releasing what the UK government deems to be open data. The government is opening up its data for other people to re-use. This is only about non-personal, non-sensitive data – information like the list of schools, crime rates or the performance of your council. At the time of writing it … Continue reading Free data: data.gov.uk – thousands of datasets from the UK government
Free data: Yelp “challenge” dataset: 1.6mi reviews, tips, business data
"1.6M reviews and 500K tips by 366K users for 61K businesses. 481K business attributes, e.g., hours, parking availability, ambience. Social network of 366K users for a total of 2.9M social edges. Aggregated check-ins over time for each of the 61K businesses." Plus, if you're a student, you could win $5,000 for playing with it. Go … Continue reading Free data: Yelp “challenge” dataset: 1.6mi reviews, tips, business data