Notes on the book “Becoming a Data Head”

Below are notes that I took when reading Alex J. Gutman and Jordan Goldmeier's book "Becoming a Data Head - How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning". The notes simply aim to summarise the parts of the book that most attracted my attention, sometimes reworded or reorganised, and don’t necessarily …

How to be happy: the data driven answer (part 1)

A fundamental goal for many people, explicit or otherwise, is to be maximally happy. Easily said, not always so easily done. So how might we set about raising our level of happiness? OK, at some level, we're all individuals with our own set of wishes and desires. But, at a more macro level, there are …

Free dataset: all Reddit comments available for download

As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every Reddit comment from 2007 through to May 2015 that isn't protected, and made it available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the …

Basic text tokenisation with Alteryx

Free text analytics seems a fashionable pastime at present. The most commonly seen form in the wild might be the very basic text visualisation known as the "word cloud". Here, for instance, is the New York Times' "most searched for terms" represented in such a cloud. When confronted with a body of human-written text, one of the first steps for many text-related analytical techniques …

The most toxic place on Reddit

Reddit, the "front page of the internet" - and a network I hardly ever dare enter for fear of being sucked into reading hundreds of comments for hours on highly pointless yet entertaining things - has had its share of controversies over the years. The site is structurally divided up into "subreddits", which …