Monday, March 26, 2012

How to insert a lookup table row aftre lookup fails

I want to do a a look-up, then if that lookup does not find a record I want to insert a new record into the lookup table, so that effectively the lookup now succeeds.

How can I do that? I can't figure that one out.

TIAI have solved this problem.

I am using a two pass mechanism in the Control Flow. In the first pass (first Data Flow) I do a lookup and then on failure redirct to insert the records.

I have a number of these to do - so in the first pass I multicast the source data to each of the lookups.
You see, I am loading survey data entered into Excel spreadsheets by humans. :)|||FYI it is often more performant not to redirect the error rows but to just do a filter on the rows that don't have the lookup value. This is all down to the redirect creating a new row, and the associated costs of assigning memory etc.
where as processing the normal flow no extra data has to be created in the buffer.

Does that make sense?|||It certainly does Simon. Thank-you very much ... i did not think of that. :)|||

SimonSa wrote:

FYI it is often more performant not to redirect the error rows but to just do a filter on the rows that don't have the lookup value. ...
Does that make sense?

Simon,

Maybe you can provide some insight on a similar scenario. In my control flow, I do a bunch of processing over millions of rows and finally I want to insert a subset into an xyz_master table. However, I want to check if the record already exists or not. I am using the error row redirecting mechanism and it is painfully slow. As per your suggestion, I should do some kind of filter. How do I do it?

Let me know if I need to explain further.

TIA,
Nitesh|||So you have you output from the lookup which is the input to the lookup along with a value that was looked up, i.e the lookup value. For rows found in the lookup this column will contain the key and rows not found will be empty.

This extra column is added to the buffer so the whole row doesn't have to be copied to a new buffer, which is what happens with the redirect, only the new column value is populated in the existing buffer

You then use the conditional split to only output those rows with no value in the lookup key column. The key here is that the conditional split is a synchronous transform and so again uses the same buffer so no data is copied.

You can then use the output of the conditional split (the one where lookup column is null) in any other component.

One thing I will say is that if you are using the lookup make sure you are only selecting the columns you need from the lookup tables, i.e. just the lookup value and any other values needed to be added to the flow.|||Great ... that is exactly what I did :) Glad to see I am on the right track.

I love SSIS!

No comments:

Post a Comment