The Datasaurus: a monstrous Anscombe for the 21st century

Most people trained in the ways of data visualisation will be very familiar with Anscombe's Quartet. For the uninitiated, it's a set of four fairly simple-looking X-Y scatterplots that look like this. What's so great about those then? Well, the reason data vizzers get excited starts to become clear when you realise that the dotted grey …

Simpson’s paradox and the importance of segmentation

Here's a classic business analysis scenario, which I'd like to use to illustrate one of my favourite mathematical curiosities. Your marketers have sent out a bunch of direct mail to a proportion of your previous customers, and deliberately withheld the letters from the rest of them so that they can act as a control group. As analyst extraordinaire, you get …

New website launch from the Office of National Statistics

Yesterday, the UK Office for National Statistics, the institution that is "responsible for collecting and publishing statistics related to the economy, population and society", launched its new website. As well as a new look, they've concentrated on improving the search experience and making it accessible to mobile device users. The front page is a nice at-a-glance …

The Sun and its dangerous misuse of statistics

Here's the (pretty abhorrent) front cover of yesterday's Sun newspaper. Bearing in mind that several recent terrorist atrocities are top of everyone's mind at the moment, it's clear what the Sun is implying here. The text on the front page is even more overt: Nearly one in five British Muslims have some sympathy with those who have fled …

Kruskal Wallis significance testing with Tableau and R

Whilst Tableau has an increasing number of advanced statistical functions - a case in point being the newish analytics pane from Tableau version 9 - it is not usually the easiest tool to use to calculate any semi-sophisticated function that hasn't yet been included. Various clever people have tried to work some magic around this, for instance by …