Situations when multicollinearity in regression model variables isn’t important

When creating basic multiple regression models, predictor variables that correlate with each other usually present a problem: you can end up with unstable estimates for the resulting coefficients. One way to test for multicollinearity is to check for a relatively high Variance Inflation Factor, or VIF. Many packages exist that make …
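As a rough illustration of what a VIF measures, here is a minimal Python sketch for the simplest case of exactly two predictors, where the VIF of either one reduces to 1 / (1 - r²), with r being their Pearson correlation. The function names and the toy data are made up for illustration, and this is not the method used by any particular package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors (hypothetical helper).

    With two predictors, the R-squared from regressing one on the other
    equals the squared Pearson correlation, so VIF = 1 / (1 - r^2).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Toy data, invented purely for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]    # moves almost in lockstep with x1
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]  # constructed to be uncorrelated with x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)

print(f"VIF for x1 alongside x2: {vif_collinear:.1f}")
print(f"VIF for x1 alongside x3: {vif_independent:.1f}")
```

A VIF of 1 means the predictor shares no linear information with the other one, and the value grows without bound as the correlation approaches 1 or -1. With more than two predictors, each VIF comes from regressing that predictor on all of the others, which this sketch does not attempt.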

Notes on the book “Becoming a Data Head”

Below are notes that I took when reading Alex J. Gutman and Jordan Goldmeier's book "Becoming a Data Head - How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning". The notes simply aim to summarise the parts of the book that most attracted my attention, sometimes reworded or reorganised, and don’t necessarily …

Using ChatGPT’s Data Analysis bot to analyse your data

One less widely known feature of OpenAI's large language model chatbot, ChatGPT, is that if you become a paying subscriber then you can create your own bots that are tuned to be good at specific types of task. OpenAI also provides you with a few examples that they created, which include the one I'm …

Writing conditional filter statements in dplyr

Somehow only recently did I realise that you can use if statements directly within the filter function of R’s dplyr library. This lets you create conditional filter criteria that filter on different variables based on some other condition external to the function call. For instance, you can change what you filter for by referencing another unrelated variable in your code. …

Are AIs developing unpredictable new abilities, or are we just measuring them badly?

One of the things that make people nervous, awestruck, or both about the development and release of recent AI models is the prospect of them developing "emergent abilities". The terminology here can be complicated, as different people mean different things by "emergent abilities". Here, in the context of large language models (LLMs), we're talking about the …

Tips and tricks for knitting R Markdown

If you're working in R, especially in RStudio, then using the R Markdown format is a great way to organise and later render your analysis in the form of a visually pleasing and potentially interactive document. It's a version of the classic analysis notebook format - chunks of real working code in between explanatory text, …

Easily write your own custom functions in Excel and Google Sheets with LAMBDA

Just in case modern-day spreadsheets don't already have enough functions for you, I learned that both Microsoft Excel and Google Sheets have added the ability to define your own custom functions, even without having to learn a new programming language. If you can express what you want your function to do in terms of standard …

A super fast and flexible near-optimal matching method using the quickmatch library in R

Sometimes we don't have the luxury of running a gold-standard randomised controlled trial when we want to understand the effect of some intervention on some population. Perhaps the required experiment would be unethical, too costly, or otherwise unfeasible. Or maybe the powers that be just never thought about doing a proper experiment, yet want to …

How to install a CRAN package that has been archived

Sometimes you may come to install a copy of one of your favourite R libraries from CRAN, only to be confronted with the nightmare scenario of it no longer being available. If you try to install the package in the conventional install.packages() way then you'll get the error "package is not available for this version …

Multi-armed bandits, and the Duolingo example

Duolingo, the company behind the famous language learning app, face a similar challenge to the majority of companies whose existence largely depends on usage of an app or website. They want to promote "user engagement". That is to say, it's in their interest - and fortunately in this case also in the interest of their customers …