Sometimes we don't have the luxury of running a gold-standard randomised controlled trial when we want to understand the effect of some intervention on some population. Perhaps the required experiment would be unethical, too costly, or otherwise unfeasible. Or maybe the powers that be just never thought about doing a proper experiment, yet want to … Continue reading A super fast and flexible near-optimal matching method using the quickmatch library in R

# Tag: Statistics

# Chebyshev’s inequality – a 68-95-99.7 style rule for all distributions

(Image: Pafnuty Chebyshev, looking stern, from Wikipedia.)

Most people who have studied a certain amount of statistics theory will likely have encountered the 68-95-99.7 rule. It could surely do with a catchier name, but the point of it is to quickly express the proportion of values that should lie within 1, 2 and 3 standard deviations …
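For reference, here is a standard statement of the inequality named in the title. The symbols are notation chosen here rather than taken from the post: $X$ is a random variable with finite mean $\mu$ and finite standard deviation $\sigma > 0$, and $k$ is any positive number.

$$
P\left(|X - \mu| \ge k\sigma\right) \le \frac{1}{k^2}
\quad\Longleftrightarrow\quad
P\left(|X - \mu| < k\sigma\right) \ge 1 - \frac{1}{k^2}
$$

Plugging in $k = 2$ and $k = 3$ gives lower bounds of $1 - 1/4 = 75\%$ and $1 - 1/9 \approx 88.9\%$ of values within 2 and 3 standard deviations respectively. For $k = 1$ the bound is $0$, so it says nothing. These guarantees are much weaker than the 95% and 99.7% figures, which hold for the normal distribution specifically, but they require no assumption about the distribution's shape.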

# Are you (statistically) smarter than a politician?

Time to test yourself! Give the four questions below a go before proceeding.

1. If you toss a fair coin twice, what is the probability of getting two heads?
2. Suppose you roll a 6-sided die five times. The rolls are: 1, 3, 4, 1, and 6. What is the mean value?
3. And what is the mode value?
4. Suppose there was a diagnostic test for a virus. The false-positive rate (the proportion of people without the virus who get a positive result) is one in 1,000. You have taken the test and tested positive. What is the probability that you have the virus?
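The last question gives only the false-positive rate, so it may help to see which other quantities the answer depends on. Below is a sketch using Bayes' theorem. The symbols are introduced here for illustration and do not appear in the original: $p$ is the proportion of people who have the virus, $s$ is the probability that an infected person tests positive, and $f$ is the false-positive rate.

$$
P(\text{virus} \mid \text{positive}) = \frac{s\,p}{s\,p + f\,(1 - p)}
$$

With $f = 1/1000$ fixed, the result still changes as $p$ and $s$ change. If $p$ is also about $1/1000$ and $s$ is close to 1, it comes out near $1/2$, and it rises towards 1 as $p$ grows. The false-positive rate alone does not determine the answer.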

# How to evaluate the results of an experiment early and often without increasing false positives

Most data folk I know love experiments. They're the ideal way to use data to answer the question of not only whether A is associated with B, but also if A causes B. Randomised Controlled Trials are a subset of experiments that most interested people seem to agree are the gold standard in, for instance, …

# The effectiveness of the Covid-19 vaccine: 95% or 0.84%?

At the time of writing, about 87% of UK adults have received at least one dose of a Covid-19 vaccine. The vast majority of mainstream scientific or journalistic sources report the vaccine efficacy as being very high, up to 95% depending on the specific vaccine and specific measure in question. It may be somewhat lower …

# Create similar test and control groups by randomising participants with blocking in R

In the classic randomised experiment we randomly assign participants to at least two groups, test and control, by metaphorically tossing a coin to allocate them to one or the other. However, in reality sometimes slightly more sophisticated methods can be useful. One such method is blocking. Here, you first create "blocks" of participants, usually based …

# Covid-19 testing in England’s secondary schools: the story so far

Last week, secondary schools in England reopened for all pupils, after several months of semi-closure due to the Covid-19 pandemic. During that period, most teachers were busy labouring hard to conduct their lessons remotely, often persevering with minimal extra resources and inconsistent central guidance. A smaller proportion were still going to school, in order to …

# Using R to run many hypothesis tests (or other functions) on subsets of your data in one go

It's easy to run a basic hypothesis test in R, once you know how. For example, if you have a nice set of data that you know meets the relevant assumptions, then you can run a t test in the following sort of way. Here we'll assume that you're interested in comparing the differences in …

# My favourite R package for: correlation

R is a wonderful, flexible, if somewhat arcane tool for analytics of all kinds. Part of its power, yet also its ability to bewilder, comes from the fact that there are so many ways of doing the same, or similar, things. Many of these ways are instantly available thanks to many heroes of the R …

# The Datasaurus: a monstrous Anscombe for the 21st century

Most people trained in the ways of data visualisation will be very familiar with Anscombe's Quartet. For the uninitiated, it's a set of 4 fairly simple looking X-Y scatterplots that look like this. What's so great about those then? Well, the reason data vizzers get excited starts to become clear when you realise that the dotted grey …