Tips and tricks for knitting R Markdown

If you're working in R, especially in RStudio, then using the R Markdown format is a great way to organise and later render your analysis in the form of a visually pleasing and potentially interactive document. It's a version of the classic analysis notebook format: chunks of real working code in between explanatory text, …

Easily write your own custom functions in Excel and Google Sheets with LAMBDA

Just in case modern-day spreadsheets don't already have enough functions for you, I learned that both Microsoft Excel and Google Sheets have added the ability to define your own custom functions, even without having to learn a new programming language. If you can express what you want your function to do in terms of standard …

A super fast and flexible near-optimal matching method using the quickmatch library in R

Sometimes we don't have the luxury of running a gold-standard randomised controlled trial when wanting to understand the effect of some intervention on some population. Perhaps the required experiment would be unethical, too costly, or otherwise unfeasible. Or maybe the powers that be just never thought about doing a proper experiment, yet want to …

How to install a CRAN package that has been archived

Sometimes you may come to install a copy of one of your favourite R libraries from CRAN, only to be confronted with the nightmare scenario of it no longer being available. If you try to install the package in the conventional install.packages() way then you'll get the error "package is not available for this version …

Chebyshev’s inequality – a 68-95-99.7 style rule for all distributions

[Image: Pafnuty Chebyshev, looking stern, from Wikipedia.] Most people who have studied a certain amount of statistics theory will likely have encountered the 68-95-99.7 rule. It could surely do with a catchier name, but the point of it is to quickly express the proportion of values that should lie within 1, 2 and 3 standard deviations …
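As a rough illustration of the contrast in the title: the 68-95-99.7 figures hold for normally distributed data, whereas Chebyshev's inequality guarantees that at least 1 - 1/k² of values lie within k standard deviations of the mean for any distribution with a finite mean and variance. The sketch below is a minimal Python comparison; the function names are made up for illustration.

```python
import math

def normal_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev's guaranteed minimum proportion within k standard deviations.

    Holds for any distribution with finite mean and variance;
    it only says something useful when k > 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - 1.0 / k**2)

for k in (1, 2, 3):
    print(
        f"k={k}: normal {normal_within(k):.3f}, "
        f"Chebyshev at least {chebyshev_lower_bound(k):.3f}"
    )
```

The Chebyshev bound is much looser (at least 75% within 2 standard deviations, rather than roughly 95%), which is the price of making no assumption about the shape of the distribution.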

How to evaluate the results of an experiment early and often without increasing false positives

Most data folk I know love experiments. They're the ideal way to use data to answer the question of not only whether A is associated with B, but also whether A causes B. Randomised Controlled Trials are a subset of experiments that most interested people seem to agree are the gold standard in, for instance, …

Accessing your Duolingo data for analysis via Python

Duolingo is a popular app-and-website for learning a new (human) language, with hundreds of millions of users across the world. You tell it what language you speak and which you'd like to learn, and it teaches you via bite-size lessons, stories and audio clips with interactive tests and the like. Even as someone who hasn't …

Create rows for missing combinations of data with R

Sometimes one gets a dataset that is in one sense missing rows, but in another sense missing nothing, because those rows represent occasions where nothing happened. That's perhaps a rather confusing description, so to demonstrate with a common example, let's imagine some sales data. Here each row tells you how much each customer …
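To make the idea concrete, here is a minimal Python sketch of the general technique (the post itself works in R, and this is not its code). The sales rows, the column names customer, month and amount, and the helper complete_rows are all hypothetical.

```python
from itertools import product

# Hypothetical sales data: a row exists only where a sale actually happened.
sales = [
    {"customer": "A", "month": "2023-01", "amount": 10},
    {"customer": "A", "month": "2023-02", "amount": 5},
    {"customer": "B", "month": "2023-02", "amount": 7},
]

def complete_rows(rows, keys, value, fill=0):
    """Return one row per combination of observed key values, filling gaps with `fill`."""
    # Distinct values seen for each key column, in sorted order.
    levels = [sorted({row[k] for row in rows}) for k in keys]
    # Look-up from key combination to the recorded value.
    observed = {tuple(row[k] for k in keys): row[value] for row in rows}
    return [
        {**dict(zip(keys, combo)), value: observed.get(combo, fill)}
        for combo in product(*levels)
    ]

completed = complete_rows(sales, keys=["customer", "month"], value="amount")
for row in completed:
    print(row)
```

Customer B had no sale in 2023-01, so the completed data gains an explicit row for that combination with an amount of 0.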

A quick way to count the number of null values in each field of a BigQuery table

After perhaps day 2 of many real-world jobs, most analysts have likely learnt to never fully trust any dataset without doing at least a little pre-exploration of the content and quality of the data. Before beginning the work to generate your world-shattering insights, it's therefore usually wise to run a few checks. One of my …
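One straightforward way to run such a check, not necessarily the method the post goes on to describe, is a COUNTIF(... IS NULL) expression per column. The Python sketch below only builds the query text. The helper name, table name and column names are hypothetical, and it does no validation of its inputs, so it should only be given trusted identifiers.

```python
def null_count_query(table: str, columns: list[str]) -> str:
    """Build a BigQuery Standard SQL query that counts NULLs in each listed column."""
    selects = ",\n  ".join(
        f"COUNTIF(`{col}` IS NULL) AS `{col}_nulls`" for col in columns
    )
    return f"SELECT\n  {selects}\nFROM `{table}`"

# Hypothetical table and column names, for illustration only.
query = null_count_query("my_project.my_dataset.my_table", ["customer_id", "amount"])
print(query)
```

The resulting query returns a single row with one null count per column, which can be pasted into the BigQuery console or run through whichever client you already use.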