Friday, March 30, 2012

How to insert only distinct values from a flat file

I have to insert values from flat files, and my table structures have primary keys. How do I insert only the distinct values into the table without getting the "Violation of PRIMARY KEY constraint" error when a record already exists?

Dropping and re-adding the relationships after the insert is one way, but it doesn't serve the whole purpose. Is there a way to eliminate duplicate records based on their primary key before inserting them into the database?

You can do a lookup against your target table. Redirect the error output and send those rows to a SQL Server destination for the target table (i.e., if the lookup didn't find a row, it is a new row).

See the following thread...

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1211340&SiteID=1
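For anyone who prefers to see that logic in T-SQL rather than as a data flow: the lookup-and-redirect pattern amounts to inserting only rows whose key is not already in the target. A minimal sketch, assuming a hypothetical staging table dbo.FlatFileStaging and target table dbo.Customers keyed on CustomerID (all names here are made up for illustration):

    -- Insert only rows whose primary key is not already in the target,
    -- mirroring what the lookup's "no match" / error output does in the data flow.
    INSERT INTO dbo.Customers (CustomerID, CustomerName, City)
    SELECT s.CustomerID, s.CustomerName, s.City
    FROM dbo.FlatFileStaging AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Customers AS c
                      WHERE c.CustomerID = s.CustomerID);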

The Sort transformation has a check box that allows you to eliminate duplicate rows; another approach could be to use an Aggregate transformation (the T-SQL sketch after this exchange shows the equivalent logic).

Thanks for the quick solutions. I will try that and come back with feedback. Regarding performance, which one is better: the Sort, the Aggregate using group by, or the Lookup?

Are there duplicate primary keys in your flat file? If so, you will need to go with either the Aggregate or the Sort solution (I may have misread your question; the Lookup will find any primary keys already in the table before you start to insert your new records). I don't know which of these will be more efficient for you, but you can always try them both out for a few test runs and see for yourself, as it often depends on the environment and the data being loaded which will come out ahead.
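For reference, the Aggregate transformation's "group by" dedupe corresponds roughly to a GROUP BY in T-SQL. A sketch using the same made-up names as above, keeping one row per key by picking arbitrary (here MAX) values for the non-key columns:

    -- One row per CustomerID; non-key columns must be aggregated away,
    -- which is also why the Aggregate transformation only outputs the
    -- columns you group by or aggregate.
    SELECT CustomerID,
           MAX(CustomerName) AS CustomerName,
           MAX(City)         AS City
    FROM dbo.FlatFileStaging
    GROUP BY CustomerID;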

The Lookup approach will not detect duplicates within the batch being processed; you would need to change it to no-cache mode for that. The Sort and Aggregate transformations should give you about the same performance. In general, the no-cache Lookup, Sort, and Aggregate approaches are not great from a performance standpoint, but that depends on many factors, so test and measure yourself.

The Sort and Aggregate transformations cache the full set of rows in RAM, so if the volume of data to be de-duplicated is huge, the system could run out of memory.

An alternative could be to use a staging table and then let the DB engine do the dedup work (e.g., http://rafael-salas.blogspot.com/2007/04/remove-duplicates-using-t-sql-rank.html).
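The linked post does the dedup with a T-SQL ranking function. A minimal sketch of that idea using ROW_NUMBER, again with the illustrative table and column names rather than anything from this thread:

    -- Keep one row per CustomerID from the staging table and insert
    -- only those into the target; the ranking function does the dedup
    -- work inside the database engine instead of in the data flow.
    WITH Ranked AS (
        SELECT CustomerID, CustomerName, City,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID
                   ORDER BY (SELECT NULL)  -- substitute a real tie-breaker if you have one
               ) AS rn
        FROM dbo.FlatFileStaging
    )
    INSERT INTO dbo.Customers (CustomerID, CustomerName, City)
    SELECT CustomerID, CustomerName, City
    FROM Ranked
    WHERE rn = 1;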

I managed to filter out the distinct values using the Aggregate transformation, but I am unsure about how to get all the columns into the output, as the Aggregate is returning only the columns used to get the distinct values on the primary keys.

There are two fundamentals being discussed in this thread. We've yet to get the crucial question answered though:

Are you trying to dedupe the source data before going into the destination table, or are you trying to prevent duplicate records from getting inserted (unique data from the source, but not necessarily all unique in the destination) and hence raising a primary key violation?

Deduping the source data can be done with the Aggregate or Sort transformations. You could also load the data into a staging table (as Rafael stated) and then run a SQL statement against that data (SELECT DISTINCT, perhaps; see the sketch after this post). Using the techniques described in the thread linked to earlier, you can ensure that your data does not violate primary keys by using a lookup. Note that you may have to do a combination of all of the above.
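A sketch of that combination in T-SQL, once more with the illustrative names used above: SELECT DISTINCT removes duplicates within the flat-file batch, and NOT EXISTS skips keys the target already has:

    -- Combine both fundamentals: dedupe the batch, then insert only new keys.
    INSERT INTO dbo.Customers (CustomerID, CustomerName, City)
    SELECT DISTINCT s.CustomerID, s.CustomerName, s.City
    FROM dbo.FlatFileStaging AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Customers AS c
                      WHERE c.CustomerID = s.CustomerID);

Note that DISTINCT only removes rows that are identical in every column; if the same key can arrive with different non-key values, the ROW_NUMBER approach above is the safer choice.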

Dev2624 wrote:

I managed to filter out the distinct values using the Aggregate transformation, but I am unsure about how to get all the columns into the output, as the Aggregate is returning only the columns used to get the distinct values on the primary keys.

Use the Sort transformation instead; it passes the other columns through while still letting you eliminate duplicates on the sort keys.

I am trying to prevent the duplicate records from getting inserted. I am working on it and will post back with the results. Thanks!!!
