Travelling through time to query a BigQuery database from the past

Whilst looking into a broken Google BigQuery query recently, I chanced upon the "time travel" feature. This lets you query your BQ database to see what results it would have returned given the state of the data in the past, even if they are different to the results it now returns. I used that to … Continue reading Travelling through time to query a BigQuery database from the past →

Create similar test and control groups by randomising participants with blocking in R

In the classic randomised experiment we randomly assign participants to at least two groups, test and control, by metaphorically tossing a coin to allocate them to one or the other. However, in reality sometimes slightly more sophisticated methods can be useful. One such method is blocking. Here, you first create "blocks" of participants, usually based … Continue reading Create similar test and control groups by randomising participants with blocking in R →
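
As a rough illustration of the idea above, here is a minimal Python sketch of randomising with blocking (the post itself works in R). The participant records, the `age_band` field used to form the blocks, and the function name `blocked_assignment` are all hypothetical, invented for this example; the characteristic used for blocking in practice depends on the experiment.

```python
import random
from collections import defaultdict


def blocked_assignment(participants, block_key, seed=None):
    """Randomly split participants into test/control within each block."""
    rng = random.Random(seed)

    # Step 1: group participants into blocks that share a characteristic.
    blocks = defaultdict(list)
    for person in participants:
        blocks[block_key(person)].append(person)

    # Step 2: shuffle each block, then alternate the two groups so each
    # block contributes (almost) equally to test and control.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Randomise which group goes first, so odd-sized blocks do not
        # always give the extra participant to the same group.
        offset = rng.randrange(2)
        for position, person in enumerate(shuffled):
            group = "test" if (position + offset) % 2 == 0 else "control"
            assignment[person["id"]] = group
    return assignment


# Hypothetical participants: ids 1-10, blocked on a made-up age band.
participants = [
    {"id": i, "age_band": "under 40" if i <= 5 else "40 and over"}
    for i in range(1, 11)
]
assignment = blocked_assignment(
    participants, block_key=lambda p: p["age_band"], seed=42
)
```

Within every block the test and control counts differ by at most one, which is what keeps the two groups similar on the blocking characteristic.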

Code-efficient alternatives to CASE WHEN statements in SQL

The CASE statement in SQL is the archetypal conditional statement, corresponding to the "if <A> then <B> else <C>" construct in other languages. Here's a quick refresher on what it looks like. Imagine we have a data table consisting of people, their ages and the number of years they've lived at their current and previous … Continue reading Code-efficient alternatives to CASE WHEN statements in SQL →
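
For readers who want to try the basic construct, the sketch below runs a CASE statement against an in-memory SQLite database from Python. The `people` table, its rows and the age thresholds are hypothetical and are not the table used in the post; SQLite is assumed only because it needs no setup.

```python
import sqlite3

# Hypothetical data: a tiny in-memory table of people and their ages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("Ann", 34), ("Bob", 12), ("Cy", 70)],
)

# CASE evaluates its WHEN conditions in order and returns the first match,
# falling back to the ELSE branch if none apply.
rows = conn.execute(
    """
    SELECT name,
           CASE
               WHEN age < 18 THEN 'child'
               WHEN age < 65 THEN 'working age'
               ELSE 'retirement age'
           END AS age_group
    FROM people
    ORDER BY name
    """
).fetchall()
conn.close()

for name, age_group in rows:
    print(name, age_group)
```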

Solving the case of the 6cm man with assertive programming

Liam Thorp, political editor of the Liverpool Echo, was recently surprised to receive an invite to receive a Covid-19 vaccine. Whilst the UK does seem to be numerically ahead in absolute terms of many (but not all) other countries when it comes to putting first doses into people, we're still at the stage where only … Continue reading Solving the case of the 6cm man with assertive programming →

How to be happy: the data driven answer (part 1)

A fundamental goal for many people, explicit or otherwise, is to be maximally happy. Easily said, not always so easily done. So how might we set about raising our level of happiness? OK, at some level, we're all individuals with our own set of wishes and desires. But, at a more macro level, there are … Continue reading How to be happy: the data driven answer (part 1) →

Using R to run many hypothesis tests (or other functions) on subsets of your data in one go

It's easy to run a basic hypothesis test in R, once you know how. For example, if you've a nice set of data that you know meets the relevant assumptions, then you can run a t test in the following sort of way. Here we'll assume that you're interested in comparing the differences in … Continue reading Using R to run many hypothesis tests (or other functions) on subsets of your data in one go →

Extracting the date and time a UUID was created with BigQuery SQL (with a brief foray into the history of the Gregorian calendar)

I was recently working with records in a database that were identified by a Universally Unique Identifier, aka a UUID. These IDs are strings of characters that look something like "31ae75f0-cbe0-11e8-b568-0800200c9a66". I needed to know which records were generated during a particular time period, but sadly there was no field about dates to be … Continue reading Extracting the date and time a UUID was created with BigQuery SQL (with a brief foray into the history of the Gregorian calendar) →
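
The post does this in BigQuery SQL; purely as an illustration, here is a minimal Python sketch of the same idea using the standard `uuid` module. It assumes the IDs are version 1 UUIDs, which store a count of 100-nanosecond intervals since 15 October 1582, the start of the Gregorian calendar. The function name `uuid1_created_at` is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_created_at(uuid_string):
    """Return the UTC creation time embedded in a version 1 UUID."""
    parsed = uuid.UUID(uuid_string)
    if parsed.version != 1:
        raise ValueError("this sketch only handles version 1 UUIDs")
    # parsed.time is in 100-nanosecond units; integer-divide by 10 to get
    # microseconds, the finest resolution a timedelta can hold.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


created = uuid1_created_at("31ae75f0-cbe0-11e8-b568-0800200c9a66")
print(created.isoformat())
```

For the example ID quoted above, this should give a creation date of 9 October 2018 (UTC).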

Analysing your 23andme genetic data in R part 2: exploring the traits associated with your genome

In part one of this mini-series, you heroically obtained and imported your 23andme raw genome data into R. Fun as that was, let's see if we can learn something interesting from it. After all, 23andme does automatically provide several genomic analysis reports, but - for many sensible reasons - it is certainly limited in what … Continue reading Analysing your 23andme genetic data in R part 2: exploring the traits associated with your genome →

Analysing your 23andme genetic data in R part 1: importing your genome into R

23andme is one of the ever-increasing number of direct-to-consumer DNA testing companies. You send in a vial of your spit, and they analyse parts of your genome, returning you a bunch of reports on ancestry, traits and - if you wish - health. Their business is highly regulated, as of course it should … Continue reading Analysing your 23andme genetic data in R part 1: importing your genome into R →

R packages for summarising data – part 2

In a recent post, I searched a tiny percentage of the CRAN packages in order to check out the options for R functions that quickly and comprehensively summarise data, in a way conducive to tasks such as data validation and exploratory analytics. Since then, several generous people have been kind enough to contact me with … Continue reading R packages for summarising data – part 2 →