I’m late to discovering this, but in case I’m not the last: In what might be a data analyst’s best gift of the year 2023, I recently learned that you can now stop Microsoft Excel from automatically “recognising” and converting certain types of data to other types.
Think here of Excel’s ability to decide that the text string 2024-11-01 is in fact a date, to be represented as “November 1st 2024”, or that the zip code 01928 is a simple integer number where the leading zero is entirely unimportant and hence discardable; a stance that I’m afraid the Post Office disagrees with.
I hence put “recognising” in quotes because it wasn’t exactly perfect at knowing when to not do this. Typically – dangerously – it wouldn’t always even notify you that it had made such changes, or at least not in a way strident enough that I ever noticed. Blaring sirens should have been involved as a minimum.
Whilst it could absolutely be a useful feature at times given the dire state of formatting that adhoc data sometimes arrives at the analyst’s desk in, at other times you don’t want the phone number 01234123123 to be changed to an undialable 1234123123.
Or perhaps you prefer to be able to publish scientific papers about various genes called things like “Membrane Associated Ring-CH-Type Finger 1”, understandably commonly abbreviated to MARCH1, which Excel unhelpfully rewrites as “1-Mar”. Or perhaps you’d like to analyse the gene 2310009E13, or”2.31E+13″, as Excel likes to call it.
The latter type of phenomenon was a problem so frustrating that after noting the embarrassing fact that data from a whopping 20% of a sample of several thousand published papers on genetics was subject to errors caused by this Excel behaviour, scientists decided to rename a chunk of human genes rather than wait for Microsoft to finally do something about this issue.
From the 2020 update to the HUGO Gene Nomenclature Committee guidelines:
Symbols that affect data handling and retrieval. For example, all symbols that autoconverted to dates in Microsoft Excel have been changed (for example, SEPT1 is now SEPTIN1; MARCH1 is now MARCHF1); tRNA synthetase symbols that were also common words have been changed (for example, WARS is now WARS1; CARS is now CARS1).
But Microsoft did get to addressing the issue eventually. If only the HGNC folk had waited yet another few years they could simply have asked every relevant Excel-using scientist in the world to disable the now-available Automatic Data Conversion setting “Convert continuous letters and numbers to a date”!
Other such auto-conversions that you can now disable include:
- Remove leading zeros and convert to a number
- Keep first 15 digits of long numbers and display in scientific notation.
- Convert digits surrounding the letter E to a number in scientific notation.
In recent versions of Excel, go into Options -> Data and look for the Automatic Data Conversion section of tickboxes to see those features.
