Travelling through time to query a BigQuery database from the past

Whilst looking into a broken Google BigQuery query recently, I chanced upon the “time travel” feature. This lets you query your BQ database to see what results it would have returned given the state of the data in the past, even if they are different to the results it now returns.

I used that to show that records that shouldn’t have been changing in the database actually were, but there are probably all sorts of rare-but-useful use cases one could imagine where it might be useful to query a past version of your database, even if something happened to it since.

The trick is to use a “FOR SYSTEM TIME AS OF” clause.

So for instance, this query is a standard-enough way of determining how many customers I have.

SELECT COUNT(1) FROM customer_details

Let’s say it tells me I’ve got 1,000 customers.

But how many rows did this table have yesterday?

SELECT COUNT(1) FROM customer_details
FOR SYSTEM TIME AS OF ('2021-03-27 00:00:00')

Depending on how your database is supposed to work, perhaps you would be worried about your data quality if yesterday’s number was bigger than today’s. If you are missing data in your database as of today, maybe you would continue with this style of back-in-time querying to retrieve it. Note though that if your goal is to fully restore a copy of the state of the the table as it was of some time in the past week, you can do that in one step using the bq command line tool as shown here.

Note that this is a feature to query the state of the database as it was in the past, so is in no way supposed to substitute for having storing the appropriate date fields in any table you want to be able to query with date-based criteria.

There are a few limitations. Primarily that you can only query back as far as 7 days ago. A query like this, if issued today:

SELECT COUNT(1) FROM customer_details
FOR SYSTEM TIME AS OF ('2021-03-01 00:00:00')

fails with the error “Invalid snapshot time 1614585600000 for table customer_details@1614585600000. Cannot read before 1616311588019”.

And whilst it does still work if you subsequently replaced the table via a “CREATE OR REPLACE TABLE” type statement, it doesn’t work if you fully deleted the table. In that case it’ll give an error to the effect of saying that the table cannot be found.

All is not necessarily lost in that case though, as, as long as no more than 7 days have past, the method of fully restoring the table linked to above should still work.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s