The great SQL leading vs trailing commas debate

It might seem a small thing, but I noticed that a recent update of the Snowflake database now allows you to have a trailing comma at the end of the SQL’s SELECT statement.

For example, this now works:

SELECT  my_field, 
        my_field_2,     
FROM  my_table

Whereas before that’d give an error.

It’s arguably bad form nonetheless, but it took me a while to get used to not being able to do that when transferring from BigQuery which has permitted that syntax for as long as I can remember.

The main time I find it a helpful timesaver is when iterating on a query I want to quickly and temporarily comment things out for testing. For instance the below would now work, but previously would give an error

SELECT my_field,
        -- myfield_2 Testing how the query works without this field
FROM my_table

That said, when left to my own devices I often code with leading commas, which avoids that problem entirely:

SELECT myfield
       , myfield_2
FROM my_table

Comment out the second field there and no sensible database interpreter will complain:

SELECT myfield
       --, myfield_2
FROM my_table

However, the whole leading vs trailing commas debate is of course one of those weird coding issues that makes no difference whatsoever as to how computers interpret your instructions but can cause real fights within human teams. The existence of SQL auto-formatters that can transform your code from leading to trailing commas and back again at the click of a button has probably saved many a collaboration. I’ve successfully used both SQLFluff and SQLinForm for this.

So is there an actual answer to the “which commas are best, trailing vs leading?” question? Both styles are of course feasible, and people of all competencies exist that use either or both. Perhaps unsurprisingly, I don’t see much published work on the subject, but a couple of folk have made rather less formal attempts to quantify differences which make for entertaining reading.

First, there’s an article from Mode with the tongue-in-cheek title “A Dispassionate Examination of the Empirical Evidence Regarding Positional Punctuation in SQL“. Mode is an analytics platform which enables its employees to see the SQL queries people are running – nearly a billion of them at the time they wrote about it.

They found that trailing commas is the much more popular style. Of the queries that have at least 5 lines of code in them:

  • 82.3% used trailing commas
  • 16.3% used leading commas
  • with the remaining just-over-1 percent using both in the same query.

When looking at the author level:

  • 80% of query writers used only trailing commas.
  • 5% always use leading commas.
  • and ~14% used both at some time.

But quantity doesn’t necessarily denote quality! Some attempts they made to get at that include looking at volumes per person, perhaps a measure of experience (at a stretch…remember the researchers can only see queries written within the Mode platform!). With that, they found that whilst 89% of people who wrote fewer than 10 queries use trailing commas, whereas fewer, only 76%, of those with over 100 queries in the system did.

How about query correctness? Syntactic correctness anyway. There they found that queries that use leading commas failed less often than those that used trailing comas. There difference was admittedly small; query failure rates of 12.7 vs 13.4% respectively. This relationship holds even when controlled for “experience” as measured by the number of queries written, as well as database type.

Despite these findings, the author of that piece preserves their love of trailing commas, suggesting that there is more beauty, elegance and human-readability in the trailing style. Not everything has to be optimised for functional efficiency. They even wrote what seems to be almost a poem about it in their last paragraph.

The second article I saw was by Felipe Hoffa on Medium. There he analysed 320GB of SQL he found in 80,000 Github repos. He provides the queries he used to do this – leading comma formatted – so you can follow along.

He finds that trailing commas are again much more popular.

  • 50 million lines of SQL end with a comma vs 1.4 million that begin with one.
  • At the query level, 339k SQL files have trailing commas, vs 18k having leading ones.
  • Project-wise, 73k repos have tailing commas, 3.4k have leading ones.

Defining success as some combination of how many stars the Github project has received, the number of active users and the general activity seen within the rep, he finds that projects that use both styles are the most successful, followed by those that have only leading commas, with those using only trailing commas finishing last.

Neither of these works are rigorous scientific studies, nor do they pretend to be. No-one needs to immediately change their style as a results. But at least they show that my slight personal preference for leading commas is, whilst unpopular, not ridiculous.

Leave a comment