How to be happy: the data driven answer (part 1)
A fundamental goal for many people, explicit or otherwise, is to be maximally happy. Easily said, not always so easily done. So how might we set about raising our level of happiness? OK, at some level, we're all individuals with our own set of wishes and desires. But, at a more macro level, there are …
Tag: Text analytics
Free dataset: all Reddit comments available for download
As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every Reddit comment from 2007 through to May 2015 that isn't protected, and made it available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the …
Basic text tokenisation with Alteryx
Free text analytics seems a fashionable pastime at present. The most commonly seen form in the wild might be the very basic text visualisation known as the "word cloud". Here, for instance, is the New York Times' "most searched for terms" represented in such a cloud. When confronted with a body of human-written text, one of the first steps for many text-related analytical techniques …
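As a rough illustration of the kind of tokenisation and word counting that sits behind a word cloud, here is a minimal Python sketch. It is not how Alteryx does it: the `tokenise` function, the regular expression and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into simple word tokens.

    Hypothetical helper: a token here is any run of letters, digits
    or apostrophes. Real tokenisers handle many more edge cases.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


# Illustrative sample text, not taken from any real dataset.
sample = "The cat sat on the mat. The mat was flat."

tokens = tokenise(sample)
counts = Counter(tokens)

# The most frequent tokens are what a word cloud would draw largest.
print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```

In practice, common "stop words" such as "the" would usually be filtered out before counting, otherwise they dominate the cloud.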
The most toxic place on Reddit
Reddit, the "front page of the internet" - and a network I hardly ever dare enter for fear of being sucked into reading hundreds of comments for hours on highly pointless yet entertaining things - has had its share of controversies over the years. The site is structurally divided up into "subreddits", which …