Wednesday, November 9, 2011

Addition to BIDS: Table Difference component

One question arising very often in the Data Warehouse programming is "What are the differences between two tables?" You receive several million customers and should decide what changes need to be done. The standard SSIS toolkit provides SCD component for handling this process, but it is of no practical use when you have to handle more than a couple of thousand records. The Table Difference component doesn't have this limitation.
The component has two inputs: one for "old" and one for "new data". The old and new rows are compared and the result is sent to different outputs:
  • Unchanged rows (same in both old and new)
  • Deleted rows (appear in old but not in new)
  • New rows (appear in new but not in old)
  • Updated rows (appear in both old and new, but there is a change)
The diagram below illustrates the component functionality better than thousand words:

Occasionally you may have a requirement to perform different updates, for different columns. The component includes option to tag each input column with identifier. Then for each unique identifier, a separate update output is created and maintained. In this way you can tag with "10" the historical attributes and with "20" the other columns and decide what kind of operation to carry on with the different updates.

The component inputs MUST be sorted and have a collection of fields (keys) that let the component decide when two rows from the inputs represent the same row. This is easily accomplished in SQL with a simple “order by” and a convenient index. Usually slowly changing dimensions maintain an index by the business key, so the sorting requirement is easily accomplished and do not represent a problem.
Click here for more information about the Table Difference component.

No comments:

Post a Comment