Geographic location analysis has been an important subset of data analysis since time immemorial. One of the most famous examples from times past is the visualisation that John Snow created in response to an outbreak of cholera almost 170 years ago. That dataviz led to an action - the disabling of a water pump that … Continue reading Aggregating and analysing location data using H3 in Snowflake or R
Tag: R
How to get a Wikipedia (or other HTML) table into R as a dataframe
I recently wanted to use some data I found in a Wikipedia article for analysis in R. I acknowledge, of course, the historical buyer-beware status of Wikipedia data, although these days it often seems as reliable as any other source. It turns out it's pretty easy to do. You can use the rvest library, which … Continue reading How to get a Wikipedia (or other HTML) table into R as a dataframe
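Here is a minimal sketch using rvest's `read_html()` and `html_table()`. It assumes the rvest package is installed and is not necessarily the code from the post. To keep it self-contained, it parses a tiny made-up HTML table from a string. For a live page you would pass the page's URL to `read_html()` instead.

```r
library(rvest)

# A tiny made-up HTML table stands in for a real Wikipedia page here
html <- '<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>A</td><td>100</td></tr>
  <tr><td>B</td><td>200</td></tr>
</table>'

page <- read_html(html)

# On a whole document, html_table() returns a list with one table per <table>;
# the first row of <th> cells becomes the column names
tables <- html_table(page)
df <- as.data.frame(tables[[1]])
print(df)
```

Because `html_table()` returns one table per `<table>` element, on a page with several tables you pick the one you need by position in the list.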
Situations when multicollinearity in regression model variables isn’t important
When creating basic multiple regression models, if your predictor variables correlate with each other, this usually presents a problem: you can end up with unstable estimates for the resulting coefficients. One way to test for multicollinearity is to check for a relatively high Variance Inflation Factor, or VIF. Many packages exist that make … Continue reading Situations when multicollinearity in regression model variables isn’t important
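This rough sketch shows what a VIF check computes. It is not the post's own code. Each predictor is regressed on all the others, and its VIF is 1 / (1 - R²) from that regression. `vif_base()` is a hypothetical base-R helper written for illustration. In practice a package function such as `vif()` from the car package does this for a fitted model. The data are simulated, with `x2` deliberately built to correlate with `x1`.

```r
# VIF for each predictor: regress it on all the other predictors, then
# VIF_j = 1 / (1 - R_j^2). Hypothetical helper, for illustration only.
vif_base <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # built to correlate strongly with x1
x3 <- rnorm(n)                 # independent of both
preds <- data.frame(x1, x2, x3)

vifs <- vif_base(preds)
print(round(vifs, 2))
```

Here `x1` and `x2` get high VIFs while `x3` stays close to 1. A common rule of thumb treats VIFs above roughly 5 or 10 as a warning sign, though the threshold people use varies.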
Tips and tricks for knitting R Markdown
If you're working in R, especially in RStudio, then using the R Markdown format is a great way to organise and later render your analysis in the form of a visually pleasing and potentially interactive document. It's a version of the classic analysis notebook format - chunks of real working code in between explanatory text, … Continue reading Tips and tricks for knitting R Markdown
How to install a CRAN package that has been archived
Sometimes you may come to install a copy of one of your favourite R libraries from CRAN, only to be confronted with the nightmare scenario of it no longer being available. If you try and install the package in the conventional install.packages() way then you'll get the error "package is not available for this version … Continue reading How to install a CRAN package that has been archived
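CRAN keeps older source releases of packages in its archive, under `src/contrib/Archive/<package>/` on the CRAN site. This minimal sketch builds an archive URL for a hypothetical package name and version. It then installs from that URL as a source package, with the install line commented out. The post may describe a different route.

```r
# Build the URL of an archived source tarball on CRAN.
# The package name and version below are hypothetical placeholders.
archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

url <- archive_url("somepackage", "1.0.0")
print(url)

# Install it as a source package (commented out: needs network access, and
# build tools if the package contains compiled code)
# install.packages(url, repos = NULL, type = "source")
```

With `repos = NULL`, `install.packages()` does not resolve dependencies, so any that are missing need installing first. The remotes package's `install_version()` function is another option for installing a specific version.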
How to evaluate the results of an experiment early and often without increasing false positives
Most data folk I know love experiments. They're the ideal way to use data to answer the question of not only whether A is associated with B, but also if A causes B. Randomised Controlled Trials are a subset of experiments that most interested people seem to agree are the gold standard in, for instance, … Continue reading How to evaluate the results of an experiment early and often without increasing false positives
Create rows for missing combinations of data with R
Sometimes one gets a dataset that is in one sense missing rows, but in another sense missing nothing, because those rows represent occasions where nothing happened. That's perhaps a rather confusing description, so let's demonstrate with a common example: imagine some sales data. Here each row tells you how much each customer … Continue reading Create rows for missing combinations of data with R
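Here is a minimal base-R sketch of the idea using made-up sales data. It builds every customer/month combination with `expand.grid()`, left-joins the observed rows onto that grid with `merge(all.x = TRUE)`, and fills the resulting gaps with zero. The column names and values are invented for illustration, and the post itself may take a different route.

```r
# Made-up sales: rows exist only for customer/month pairs with a sale
sales <- data.frame(
  customer = c("A", "A", "B"),
  month    = c("Jan", "Feb", "Feb"),
  amount   = c(10, 5, 7)
)

# Every combination of customer and month that could have had a row
all_combos <- expand.grid(
  customer = unique(sales$customer),
  month    = unique(sales$month),
  stringsAsFactors = FALSE
)

# Left-join the real data onto the full grid, then treat "no row" as zero sales
full <- merge(all_combos, sales, by = c("customer", "month"), all.x = TRUE)
full$amount[is.na(full$amount)] <- 0
print(full)
```

With the tidyr package, `tidyr::complete(sales, customer, month, fill = list(amount = 0))` expresses the same operation in one call.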
Create similar test and control groups by randomising participants with blocking in R
In the classic randomised experiment we randomly assign participants to at least two groups, test and control, by metaphorically tossing a coin to allocate them to one or the other. However, in reality sometimes slightly more sophisticated methods can be useful. One such method is blocking. Here, you first create "blocks" of participants, usually based … Continue reading Create similar test and control groups by randomising participants with blocking in R
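This base-R sketch shows one simple form of blocking under simple assumptions. It sorts made-up participants by a single covariate (age) and groups consecutive pairs into blocks of two. It then randomly assigns one member of each block to test and the other to control. The post may block on different variables or use larger blocks.

```r
set.seed(1)
# Made-up participants with one covariate (age) to balance across groups;
# assumes an even number of participants so every block has exactly two
participants <- data.frame(id = 1:10, age = sample(20:70, 10))

# Sort by age so neighbouring rows are similar, then pair them into blocks
participants <- participants[order(participants$age), ]
n_blocks <- nrow(participants) / 2
participants$block <- rep(seq_len(n_blocks), each = 2)

# Within each block, shuffle c("test", "control"): replicate() returns a
# 2 x n_blocks matrix whose columns are read out block by block
participants$group <- as.vector(replicate(n_blocks, sample(c("test", "control"))))

print(participants)
```

Because every block contributes one test and one control participant, the two groups end up with similar age distributions by construction, rather than only on average as with a plain coin toss.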
Solving the case of the 6cm man with assertive programming
Liam Thorp, political editor of the Liverpool Echo, was recently surprised to receive an invitation to get a Covid-19 vaccine. Whilst the UK does seem to be numerically ahead in absolute terms of many (but not all) other countries when it comes to putting first doses into people, we're still at the stage where only … Continue reading Solving the case of the 6cm man with assertive programming
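The general idea of assertive programming is to check assumptions about the data explicitly and stop with an error as soon as one fails. This minimal base-R sketch assumes a hypothetical `height_cm` column and hypothetical plausibility bounds of 50 to 250 cm. It illustrates the technique in general and does not reconstruct what happened in the case above. Packages such as assertr provide pipeline-friendly versions of the same kind of check.

```r
# Made-up records with a hypothetical height_cm column; id 2 holds an
# implausible value of the kind the post title alludes to
patients <- data.frame(id = 1:3, height_cm = c(172, 6, 181))

# Hypothetical plausibility bounds (in cm); tune these to the real data
check_heights <- function(df, lower = 50, upper = 250) {
  bad <- df$id[df$height_cm < lower | df$height_cm > upper]
  if (length(bad) > 0) {
    stop("Implausible height_cm for id(s): ", paste(bad, collapse = ", "))
  }
  invisible(df)  # pass the data through unchanged when every check holds
}

# Fail loudly instead of letting a 6 cm adult flow into later calculations
result <- tryCatch(check_heights(patients), error = function(e) conditionMessage(e))
print(result)
```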
How to be happy: the data driven answer (part 1)
A fundamental goal for many people, explicit or otherwise, is to be maximally happy. Easily said, not always so easily done. So how might we set about raising our level of happiness? OK, at some level, we're all individuals with our own set of wishes and desires. But, at a more macro level, there are … Continue reading How to be happy: the data driven answer (part 1)