Free dataset: all Reddit comments available for download

As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every Reddit comment from 2007 through to May 2015 that isn't protected, and made it available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the …

When is it safe to stop watching the match?

Despite the Harvard Business Review's insistence that data scientist is the sexiest job of the 21st century, ask a non-quant about popular references to data analysis and you are quite likely to hear some reference to Moneyball (be that book or film). Spoiler alert: "sabermetric" data analysis enabled a baseball team with less money to …

Basic text tokenisation with Alteryx

Free text analytics seems a fashionable pastime at present. The most commonly seen form in the wild might be the very basic text visualisation known as the "word cloud". Here, for instance, is the New York Times' "most searched for terms" represented in such a cloud. When confronted with a body of human-written text, one of the first steps for many text-related analytical techniques …

From restaurant-snobbery to racism: some perils of data-driven decision-making

Wired recently wrote a piece explaining how OpenTable, a leading "reserve a restaurant over the internet" service, was starting to permit customers to pay for their meal via an app at their leisure, rather than flag down a waiter and awkwardly fiddle around with credit cards. There's an obvious convenience to this for the …

Data science vs rude Lego

Data science moves onwards each day, helping (perhaps) solve more and more of the world's problems. But apparently there's at least one issue for which we don't have a great machine-learning/AI solution just yet - identifying penises made out of Lego. Indeed this is apparently the problem that plagued the potential-Minecraft-beater "Lego Universe" nearly …

UK election 2015: Who actually voted for the Conservative party?

Here in the UK we just had our general election, electing the government who will rule over us for the next 5 years. The results - a Conservative majority - were something of a surprise to most people, myself included. I'm sure I won't be able to hide my leanings for long, so to be clear, …