Pseudo Change Timestamp-Finding recently changed table rows using Flashback Version Query to provide a surrogate last_changed column
--
My challenge: replicate changes from database A to database B, once per day. The tables involved are pretty big (100M records). I really would not want to process them all into database B. However, the tables do not have a LAST_DATE_MODIFIED column that has a timestamp that is updated with the current timestamp whenever the record is changed (and when it is first created). Fortunately, we can work with a pseudo LAST_MODIFIED timestamp in Oracle Databases, based on a mechanism called Flashback Versions Query. And before you start wondering: is that available in my version or edition of the database or is that something I have to buy as extra option, I can reassure you: every Oracle Database starting with 10g has this feature and it does not cost you anything to use it. In fact, the underlying mechanism is what allows you to run read consistent queries — something that has been available in Oracle from (almost) the very beginning.
In this article I will tell you how you can get a pseudo last_modified timestamp for every record in every table. There are a few caveats to bear in mind:
- the flashback version query uses the UNDO data created by Oracle whenever a transaction is committed; we can only look back in time as far as the available UNDO data allows. This may mean that for rows that have not been updated recently, we do not get a result. Because I am looking for recently changed rows, this is acceptable for me because the UNDO data allows we to look back in time far enough to cover what I call recently
- the value we get for the pseudo last_modified timestamp is accurate only to about 3 seconds; this is good enough for my use case
- the value we get for the pseudo last_modified timestamp for a row is actually the timestamp for the most recent change to any row in the data block that contains the row; this means that the timestamp we get is the upper limit: the record has not been updated more recently than the value we get — but it could have been updated a lot less recently; this means we typically will get more rows reported as recently changed than in fact were recently changed; for my use case, the reduction in the number of records I have to process is still huge, so this overshooting does not worry me much.