Here's the situation: we have a character-based field of up to 2048
characters in length that we want to store in a single column in a
SQL Server table. We need to be able to search on this field and guarantee
uniqueness, so ideally the field would have an index with a unique
constraint on it. The problem is that SQL Server can't index a key longer
than 900 bytes, which for a varchar column means 900 characters.
Any suggestions appreciated.
We have a couple of thoughts so far:
1 - Compress the field in the data access layer before saving it in SQL.
That should keep the length of the field under 900 characters and let us
apply a unique index in SQL. However, we would still need the
uncompressed, unindexed version of the field in another column for
searching.
2 - Split the field into multiple columns, each under 900 characters, and
apply an index to each column. However, we'd lose the uniqueness
constraint and this would complicate searches.
3 - A combination of 1 and 2.
By the way - we're using SQL Server 2005.
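To make the 900-byte limit concrete, here is a minimal sketch of the setup
being described; the table and column names (dbo.Documents, Body) are made
up for illustration.

CREATE TABLE dbo.Documents (
    DocumentId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents PRIMARY KEY,
    Body       varchar(2048) NOT NULL
);

-- SQL Server 2005 creates this index, but only with a warning: the index key
-- limit is 900 bytes, so any insert where Body exceeds 900 bytes then fails.
CREATE UNIQUE INDEX UX_Documents_Body ON dbo.Documents (Body);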
In an effort to maintain uniqueness, you could investigate creating three
additional columns (varchar(683)) and then populating the data from the
field in question evenly across the new columns. To ensure uniqueness you
could then create a concatenated key over the three new columns. Just a
thought....
Thomas
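A rough sketch of what Thomas describes, reusing the hypothetical names from
above; the next reply explains where it runs into trouble.

CREATE TABLE dbo.DocumentsSplit (
    DocumentId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_DocumentsSplit PRIMARY KEY,
    BodyPart1  varchar(683) NOT NULL,
    BodyPart2  varchar(683) NOT NULL,
    BodyPart3  varchar(683) NOT NULL
);

-- The intended uniqueness guarantee: one concatenated key over all three
-- parts. The declared widths still total well over 900 bytes, though, so the
-- same index key limit applies.
ALTER TABLE dbo.DocumentsSplit
    ADD CONSTRAINT UQ_DocumentsSplit_Body
    UNIQUE (BodyPart1, BodyPart2, BodyPart3);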
"steve.c.thompson@.gmail.com" wrote:
> Here's the situation: we have a character based field of up to 2048
> characters in length that we want to store in a single column in a
> sqlserver table. need to be able to search on this field and guarantee
> uniqueness. it would be nice if this field has an index with a unique
> constraint. the problem is that sqlserver can't index anything over
> 900 characters in length.
> any suggestions appreciated.
> we have a couple thoughts so far:
> 1 - compress this field in the data access layer before saving in sql -
> that should keep the length of the field under 900 chars and let us
> apply a unique index in sql. however, we would still have the
> uncompressed, unindexed version of the field in another column
> searching.
> 2 - split the field into multiple columns each under 900 characters.
> apply an index to each column. however, we'd lose the uniqueness
> constraint and this would complicate searches.
> 3 - combination of 1 and 2.
> by the way - we're using SQL Server 2005.
>
|||I was thinking the same thing, Thomas. I created the three additional
columns and then tried to create the composite key, but SQL won't create
an index across the three fields since the sum of the field lengths is
greater than 900.
steve.
|||steve.c.thompson@.gmail.com wrote:
> 1 - compress this field in the data access layer before saving in sql
> - that should keep the length of the field under 900 chars and let us
> apply a unique index in sql. however, we would still have the
> uncompressed, unindexed version of the field in another column
> searching.
Compression means you're dealing with binary data and you're not going
to get a lot of compression on a max of 2K worth of data.
> 2 - split the field into multiple columns each under 900 characters.
> apply an index to each column. however, we'd lose the uniqueness
> constraint and this would complicate searches.
That adds management overhead.
I would recommend that you hash the data and generate a 32-bit hash
value. You can do this from the client using .NET, for example, or using
an extended stored procedure. SQL Server has the T-SQL BINARY_CHECKSUM()
function, but it has a limitation: it can generate the same checksum for
different values. The T-SQL function would be quite easy to apply from an
INSERT/UPDATE trigger on the table.
select BINARY_CHECKSUM('ABC123'), BINARY_CHECKSUM('aBC123')
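A rough sketch of the hash-column idea, built on the hypothetical
dbo.Documents table from earlier; BINARY_CHECKSUM stands in for whatever
32-bit hash is used, and the names are illustrative.

ALTER TABLE dbo.Documents ADD BodyChecksum int NULL;
GO

CREATE INDEX IX_Documents_BodyChecksum ON dbo.Documents (BodyChecksum);
GO

-- Keep the checksum in sync from an INSERT/UPDATE trigger.
CREATE TRIGGER trg_Documents_BodyChecksum
ON dbo.Documents
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE d
    SET    BodyChecksum = BINARY_CHECKSUM(d.Body)
    FROM   dbo.Documents AS d
    JOIN   inserted AS i ON i.DocumentId = d.DocumentId;
END;
GO

-- Equality search: seek on the narrow checksum index, then re-check the full
-- text because different strings can share a checksum.
DECLARE @SearchText varchar(2048);
SET @SearchText = 'some exact value to find';

SELECT d.DocumentId
FROM   dbo.Documents AS d
WHERE  d.BodyChecksum = BINARY_CHECKSUM(@SearchText)
  AND  d.Body = @SearchText;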
|||Steve,
I think you must get the data size down to 900 characters or less,
otherwise you will not be able to guarantee uniqueness.
As for the searching: if you have to search within the text (for example
LIKE '%some text%'), then an index is only useful if the data column is
narrow in comparison to the average row size. That is probably not the
case here, which means a table scan (or clustered index scan) is probably
most efficient.
If the column is relatively small in comparison to the average row size,
then you could create a separate table with only a key column and the
varchar(2048) column, with a one-to-one relation to the original table.
HTH,
Gert-Jan
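A sketch of the one-to-one side table Gert-Jan describes, keyed to the
hypothetical dbo.Documents table used above; the varchar(2048) column would
move out of the main table and live only here.

CREATE TABLE dbo.DocumentBodies (
    DocumentId int NOT NULL
        CONSTRAINT PK_DocumentBodies PRIMARY KEY
        CONSTRAINT FK_DocumentBodies_Documents
            REFERENCES dbo.Documents (DocumentId),
    Body       varchar(2048) NOT NULL
);

-- A LIKE '%...%' search still scans, but it scans only this narrow
-- key-plus-text table instead of the full-width rows of the main table.
SELECT b.DocumentId
FROM   dbo.DocumentBodies AS b
WHERE  b.Body LIKE '%some text%';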
|||Thanks for the replies, guys.
We were originally using a hash for the key as David suggested, but
thinking back to my days at university when we had to program hash
algorithms, they don't guarantee uniqueness. In that case the insert
would cause a SQL exception and we wouldn't be able to store certain
data strings whose hashes collide. That would be bad.
In the end, I think we're just going to have to reduce the field to 900
characters as Gert-Jan has suggested. I don't think there's any other
way around it.
steve
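For completeness, a sketch of where that lands, using the same hypothetical
names: a 900-character varchar column sits exactly at the index key limit,
so a plain unique constraint works.

CREATE TABLE dbo.Documents900 (
    DocumentId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents900 PRIMARY KEY,
    -- 900 characters = 900 bytes for varchar; an nvarchar column would have
    -- to stop at 450 characters to stay within the same limit.
    Body       varchar(900) NOT NULL
        CONSTRAINT UQ_Documents900_Body UNIQUE
);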
|||Why didn't you use the text datatype? Any reason?
|||I need to be able to put a unique index on this field, and text
columns do not allow that.