Free dataset: all Reddit comments available for download

As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every Reddit comment from 2007 through to May 2015 that isn't protected, and made it available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the …

Basic text tokenisation with Alteryx

Free text analytics seems a fashionable pastime at present. The most commonly seen form in the wild might be the very basic text visualisation known as the "word cloud". Here, for instance, is the New York Times' "most searched for terms" represented in such a cloud. When confronted with a body of human-written text, one of the first steps for many text-related analytical techniques …
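
As a rough illustration of what tokenisation means here (a minimal Python sketch, not the Alteryx workflow the post describes), the snippet below lower-cases a piece of text, splits it into word tokens and counts them, which is the kind of input a word cloud needs. The function names and the sample sentence are made up for the example, and the token pattern (runs of letters, digits and apostrophes) is an assumption that real text may need refining.

```python
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and return its word tokens in order."""
    # Assumption: a token is any run of letters, digits or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenise(text))


# Hypothetical sample text, purely for illustration.
sample = "The cat sat on the mat. The mat was flat."
counts = word_counts(sample)
```

Calling `counts.most_common(10)` would then give the ten most frequent tokens, which could be used to size the words in a cloud.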

Extracting SPSS variable labels and factors with Alteryx

SPSS is a nice statistics/analytics package that, since 1968 (!), seems to have been a well-regarded program for classic statistics. It now has many new bits and pieces that target the predictive modelling market too. In my experience it was previously mostly used in academia, especially the social sciences, but these days it seems it has made inroads into business and government data too. But it's …

Data dictionary functionality in Tableau Server 9

Whilst Tableau is by far the best dataviz/exploration tool I have ever had the pleasure of using, I've traditionally felt it's not so strong in the boring "enterprise" areas around governance, metadata management, documentation and so on. It's quite clear to me why - the tool is/was aimed at data practitioners wanting to bypass all that slow, boring traditional IT …

“Move datasource” arrives in Tableau Server v9

Oh happy days - just noticed the new Tableau Server / Online version 9 now allows one to move a datasource! Previously it was very easy to move a workbook from one project to another, but that wasn't possible for a datasource. Instead one had to delete the original datasource and republish it separately to a …

Gephi basics: simple network graph analysis from spreadsheet data

Several interesting phenomena can be modelled and analysed using graph theory. Graph theory, which Wikipedia tells me first had a paper published about it in 1736 (!), can at its most basic perhaps be thought of as mathematical techniques to analyse problems where one can represent the protagonists as a set of objects (nodes) and lines connecting …
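
To make the nodes-and-lines idea concrete, here is a small Python sketch (not Gephi itself) that takes a list of connections, as one might export from two spreadsheet columns, and counts how many connections each node has. The names and the edge list are invented for illustration, and the connections are assumed to be undirected.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to the set of nodes it is directly connected to."""
    adjacency = defaultdict(set)
    for source, target in edges:
        # Assumption: connections are undirected, so record both directions.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degrees(adjacency):
    """Return the number of distinct neighbours for each node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Hypothetical edge list: one (source, target) pair per spreadsheet row.
edges = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
]
node_degrees = degrees(build_adjacency(edges))
```

The degree of a node is one of the simplest measures of how central it is in a network.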

Calculating prior year differences with custom calendars Tableau challenge

Visualisations of KPIs always require some context in order to make the analysis conducive to decision-making rather than just looking pretty. One that is very common within business and elsewhere would be to check some value, for example sales revenue, against the same value within the same time period last year. "Same time period last …
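
The simplest version of that comparison, using the ordinary calendar rather than a custom one, might look like the following Python sketch. It is only an illustration of the idea, not the Tableau approach from the post; the function names and the sales figures are hypothetical, and mapping 29 February to 28 February in the prior year is an assumption made for the example.

```python
from datetime import date


def same_date_last_year(d):
    """Return the same calendar date one year earlier."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Assumption: 29 February maps to 28 February in a non-leap year.
        return d.replace(year=d.year - 1, day=28)


def prior_year_difference(values_by_date, d):
    """Value on date d minus the value a year earlier, or None if either is missing."""
    current = values_by_date.get(d)
    previous = values_by_date.get(same_date_last_year(d))
    if current is None or previous is None:
        return None
    return current - previous


# Hypothetical daily sales revenue, purely for illustration.
sales = {
    date(2014, 3, 10): 1200.0,
    date(2015, 3, 10): 1500.0,
}
difference = prior_year_difference(sales, date(2015, 3, 10))
```

A custom calendar, such as a fiscal year or a week-based retail calendar, would need a different rule for finding the matching prior period than the plain date shift used here.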

Why version 9 will {FIX} Tableau for me (and workarounds in the mean time)

Excitement builds within the dataviz world as the next version of Tableau gets close to launch, supposedly within the next 60 days. It has many new features, which data geeks and other fans can see previewed piece by piece on the Tableau blog, with summaries elsewhere, but one has really caught my attention, …