Now we all have nice access to large sprawling relational databases, the next problem is locating in which table you need to look to find the data you’re after. For instance, which of the 100 tables available contain my sales?
Of course the proper way would to consult the comprehensive, accurate, well-governed and accessible data dictionary to understand the entities involved and the precise location and definition of what you need.
However, those of us cursed to live in the real world do not always (haha) have that luxury.
Fortunately, if you can guess roughly what the field name of the data you want might be, then you can often use the database’s metadata to find where to look.
In Microsoft SQL Server for instance, running the below query will show you a table of results that tell you in which schema, table and column names you can find fields that have “sales” in their name.
SELECT TABLE_SCHEMA,TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%sales%'
Notes:
- I believe it will only show tables to which you have access. If you don’t find what you want, perhaps you don’t have permissions.
- It looks only for the pattern you type. If you ask for “sales”, but the field is actually called “revenue”, you’re out of luck.
- Of course it gives no clues as to the definition of the field and whether it matches what you hoped it did. For instance is it net or gross sales, how are vouchers integrated, are records ever obseleted, and so on.
- The query is just normal SQL, so you can use any valid “WHERE” clause to filter. The % signs there for instance are acting as wildcards; so you could remove them if you only want to see columns that are called precisely “sales” and not “product sales” for instance.
- Other metadata about the fields is available in that INFORMATION_SCHEMA.COLUMNS table such as datatype etc. Here’s the official documentation on the entirety of that table.
- This is a table in the same sense as the explicit data tables – so you could always connect tools like Tableau etc. to it if you want to produce some fun, or perhaps even useful, meta-visualisations.
Given the absence of any more proper clues, this might still be better than manually reviewing thousands of tables if you’re willing to accept the caveats. Good luck.
Note that although the above example is from SQL Server, a lot of other mainstream RDBMS’ have similar capabilities. For example, here’s some documentation on equivalents for MySQL and Oracle.