Sometimes
you have a process to load data from a flat file to a database, even from a
database to your DW.
But if something
goes wrong? How do you know if your load process finished successfully or not?
If not, how do you reprocess the files without duplicate records?
This post
will bring an option to try addressing those problems.
The main
ideas of control process are (1) generate an ID for the process (2) save a
timestamp at beginning (3) save ID on all the tables controlled (4) save a
timestamp at the end.
So, if some
problem occurs and aborts the job, next time you will be able to identify all data
inserted by that job, delete them and insert again.
I’m
assuming PostgreSQL as Database. For others databases might be necessary some adjusts.