You can finally turn off Microsoft Excel’s incessant desire to mess up your data by auto-converting it to an inappropriate type
I'm late to discovering this, but in case I'm not the last: In what might be a data analyst's best gift of the year 2023, I recently learned that you can now stop Microsoft Excel from automatically "recognising" and converting certain types of data to other types. Think here of Excel's ability to decide that …
Category: How to
Aggregating and analysing location data using H3 in Snowflake or R
Geographic location analysis has been an important subset of data analysis since time immemorial. One of the most famous examples from times past is the visualisation that John Snow created in response to an outbreak of cholera almost 170 years ago. That dataviz led to an action - the disabling of a water pump that …
Creating a granular “votes cast” dataset for the last century’s worth of UK General Elections
Earlier this year the UK had a General Election, which, in many quarters, was declared a landslide victory for Labour. Certainly we have - thank goodness - a new government, and the switch towards Labour in terms of the number of seats (and hence MPs) was dramatic. But there remains an ongoing debate as to …
How to get a Wikipedia (or other HTML) table into R as a dataframe
I recently wanted to use some data I found in a Wikipedia article for analysis in R, acknowledging of course the historical buyer-beware status of Wikipedia data, although these days it often seems as reliable as any other source. It turns out it's pretty easy to do. You can use the rvest library, which …
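As a rough sketch of the rvest approach mentioned above, and not necessarily the exact method the full post uses: the inline HTML, its `wikitable` class and the item values below are made-up stand-ins for a real article, used so the example runs without a network connection.

```r
library(rvest)

# Hypothetical inline HTML standing in for a downloaded article.
# The table contents are illustrative placeholders, not real data.
page_html <- '<html><body>
<table class="wikitable">
  <tr><th>Item</th><th>Count</th></tr>
  <tr><td>Item A</td><td>1</td></tr>
  <tr><td>Item B</td><td>2</td></tr>
</table>
</body></html>'

# For a live article you would pass its URL to read_html() instead.
page <- read_html(page_html)

# Pick the first table with the assumed "wikitable" class and
# convert it to a data frame (a tibble).
example_table <- page %>%
  html_element("table.wikitable") %>%
  html_table()

print(example_table)
```

If a page has several tables, `html_elements("table")` returns all of them and `html_table()` then gives back a list of data frames to pick from.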
Situations when multicollinearity in regression model variables isn’t important
When creating basic multiple regression models, if your predictor variables correlate with each other this usually presents a problem in that you can end up with unstable estimates for the resulting coefficients. One way to test for multicollinearity is to check for a relatively high Variance Inflation Factor, or VIF. Many packages exist that make …
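For illustration only, here is a minimal base R sketch of what a VIF is: for each predictor, regress it on the other predictors and compute 1 / (1 - R²). The helper name `vif_by_hand` and the choice of `mtcars` columns are my own assumptions for the example, not something taken from the post.

```r
# Compute the VIF of each predictor by regressing it on the other predictors.
# vif_by_hand is a hypothetical helper written for this illustration.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(target) {
    others <- setdiff(predictors, target)
    # Build a formula such as disp ~ hp + wt
    fit <- lm(reformulate(others, response = target), data = data)
    # VIF = 1 / (1 - R squared) of that auxiliary regression
    1 / (1 - summary(fit)$r.squared)
  })
}

# Built-in mtcars data, with three predictors chosen arbitrarily.
vif_values <- vif_by_hand(mtcars, c("disp", "hp", "wt"))
print(vif_values)
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean more of its variance is explained by them.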
Notes on the book “Becoming a Data Head”
Below are notes that I took when reading Alex J. Gutman and Jordan Goldmeier's book "Becoming a Data Head - How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning". The notes simply aim to summarise the parts of the book that most attracted my attention, sometimes reworded or reorganised, and don’t necessarily …
Using ChatGPT’s Data Analysis bot to analyse your data
One less widely known feature of OpenAI's large language model chatbot, ChatGPT, is that if you become a paying subscriber then you can create your own bots that are attuned to be good at doing specific types of task. OpenAI also provides you with a few examples that they created, which include the one I'm …
Writing conditional filter statements in dplyr
Somehow only recently did I realise that you can use if statements directly within R’s dplyr library filter function. This lets you create conditional filter criteria that can filter on different variables based on some other condition external to the function call. For instance you can change what you filter for by referencing another unrelated variable in your code. …
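A minimal sketch of the idea, assuming the built-in `mtcars` data; the `filter_cars` helper, its `mode` argument and the thresholds are hypothetical names and values chosen for this example.

```r
library(dplyr)

# filter_cars is a hypothetical helper: the value of `mode`, which is
# unrelated to the data itself, decides which column the filter tests.
filter_cars <- function(data, mode) {
  data %>%
    filter(
      if (mode == "efficient") mpg > 25 else hp > 200
    )
}

efficient_cars <- filter_cars(mtcars, "efficient")  # rows with mpg > 25
powerful_cars <- filter_cars(mtcars, "powerful")    # rows with hp > 200
```

The `if` is evaluated once per call rather than once per row, so it simply chooses which logical vector `filter()` receives.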
Are AIs developing unpredictable new abilities, or are we just measuring them badly?
One of the things that make people nervous, awestruck, or both about the development and release of recent AI models is the prospect of them developing "emergent abilities". The terminology here can be complicated. Different people mean different things by "emergent abilities". Here in the context of large language models (LLMs), we're talking about the …
Tips and tricks for knitting R Markdown
If you're working in R, especially in RStudio, then using the R Markdown format is a great way to organise and later render your analysis in the form of a visually pleasing and potentially interactive document. It's a version of the classic analysis notebook format - chunks of real working code in between explanatory text, …