Wednesday, March 21, 2012

How to improve performance if inner join has more than 2 or 3 tables

Hi everyone

I need a solution for this query. It is working fine for 2 tables but when there are 1000's of records in each table and query has more than 2 tables. The process never ends.

Here is the query

(select siqPid= 1007, t1.Gmt909Time as GmtTime,(t1.engValue+t2.engValue+t3.engValue+t4.engValue) as EngValue,

t1.Loc1Time as locTime,t1.msgId

into #temp5

from #temp1 as t1,#temp2 as t2,#temp3 as t3,#temp4 as t4

where t1.Loc1Time = t2.Loc1Time and t2.Loc1Time = t3.Loc1Time and t3.Loc1Time = t4.Loc1Time)

I was trying to do something with this query.

But the engValues cant be summed up. and if I add that in the query, the query isnt compiling.

(select siqPid= 1007, t1.Gmt909Time as GmtTime,

t1.Loc1Time as locTime,t1.msgId,(t1.engValue+t2.engValue+t3.engValue+t4.engValue) as engValue

--into #temp5

from #temp1 as t1

where exists

(Select 1

from #temp2 as t2

where t1.Loc1Time = t2.Loc1Time and

exists

(Select 1

from #temp3 as t3

where t2.Loc1Time = t3.Loc1Time and

exists

(Select 1

from #temp4 as t4

where t3.Loc1Time = t4.Loc1Time))))

I need immediate help on that, I would appreciate an input on it.

Thanks

-Sarah

What do you mean by the engValues can't be added? You shouldn't have problems joining 4 tables with 1000's of records. SQL Server is designed for queries involving orders of magnitude more records. Here are some suggestions:

1. Use better join syntax. Instead of FROM A, B WHERE A.ID = B.ID use FROM A INNER JOIN B ON A.ID = B.ID.

2. Read up on Table Variables and use them instead of your temp tables. Here's a good link to read about them:http://odetocode.com/Articles/365.aspx. In particular, note that you can define primary keys for table variables which should assist in your query performance.

HTH

|||

Thanks ZLA for the input

The problem is there is already alot of work done before and temp tables are crreated already. So I need to use those for this project.

Can you please help me in refining the query. I did use the syntax you asked me, but its still taking too long. Something like this

Select t1.engValue+t2.engValue+t3.engValue+t4.engValue
from #temp5 t5
inner join #temp1 t1
on t1.Loc1Time = t5.locTime
inner join #temp2 t2
on t2.Loc1Time = t5.locTime
inner join #temp3 t3
on t3.Loc1Time = t5.locTime
inner join #temp4 t4
on t4.Loc1Time = t5.locTime

but that too is taking too long... Please help me with this..

|||

Have you looked at the estimated or actual execution plans for your query. In general, table scans perform slow and I suspect you have too many of them in your execution plan. It is usually better if the plan makes use of indexes.

You can create temp tables with identities, primary keys and indexes. Here is a good article that shows how:http://www.sqlteam.com/item.asp?ItemID=15442 Even though you are stuck with the temp tables, can you at least add some indexes to them to speed up the query execution?

There are other alternatives as well; you might try thinking laterally. Perhaps you can update temp table 5 (or use a separate temp table 6) where you insert 1 row from each of the other 4 temp tables. Just insert locTime, engValue. Then do the following:

SELECT locTime, Sum(engValue) From #Temp6 GROUP BY locTime HAVING Count(*) = 4

This gives the sum but only where all four tables have inserted a record (equivalent to locTime being present in all four temp tables which your inner joins select for).

HTH

|||

Hi ZLA

Thats article has very good points, I can incorporate that in my process but I need to get rid of the error also in this query. Can you please help me in finding out why am I getting an error on this line and what can I do to make it work.

,(t1.engValue+t2.engValue+t3.engValue+t4.engValue) as engValue

(select siqPid= 1007, t1.Gmt909Time as GmtTime,

t1.Loc1Time as locTime,t1.msgId,(t1.engValue+t2.engValue+t3.engValue+t4.engValue) as engValue

--into #temp5

from #temp1 as t1

where exists

(Select 1

from #temp2 as t2

where t1.Loc1Time = t2.Loc1Time and

exists

(Select 1

from #temp3 as t3

where t2.Loc1Time = t3.Loc1Time and

exists

(Select 1

from #temp4 as t4

where t3.Loc1Time = t4.Loc1Time))))

You suggestions are awasome, and the last suggestion that you gave did actually worked for me, but I still want to know why the where exisit query doesnt work, if I want to add the column values from the later tables.

I would appreciate your help on this.

-Sarah

|||

Is it in this last query you are still getting an error? What error are you getting? Even without that, I suspect the problem is because not all the fields exist. Using the exists clause, your query is essentially:

select siqPid= 1007, t1.Gmt909Time as GmtTime, t1.Loc1Time as locTime,t1.msgId,(t1.engValue+t2.engValue+t3.engValue+t4.engValue) as engValue

from #temp1 as t1

where exists ( ... )

This select statement only pulls selected fields from #temp1. Therefore t2, t3, t4 aren't available. They are only used in the WHERE clause. You'll need to go back to a join or use my prior idea of inserting their individual results into a new temp table (#temp6). Then sum across temp6 into temp5.

|||

How long is it taking?

How long is "too long"?

How many rows does each of the temp tables have?

|||

Yes, the inner join is actually never shows up any results, it says its processing the query... I ran it with 282 records each in 4 tables, and the process never got completed, so talking about 1000 of records in each tables it far. But I will be doing that eventually, so I dont know whats wrong with the inner join statement too.

ZLA, yes, that makes sense, the exists clause probably is doing that. And the sum does work with teh group by statement. So for right now I have a solution that you suggested to sum them up. For one time its fine, but not always... as I had to put extra three query's instead of putting just one inner join query.

But to avoid future errors, is the where clause or on clause in the inner join query asking for too much, 'cause there is no clear filter like loctime = 5/6/1990...

Please let me know how else can I improve my inner join then..

Thanks

-Sarah

|||

Am sorry, I was talking about this query now in my last post

Select t1.engValue+t2.engValue+t3.engValue+t4.engValue
from #temp5 t5
inner join #temp1 t1
on t1.Loc1Time = t5.locTime
inner join #temp2 t2
on t2.Loc1Time = t5.locTime
inner join #temp3 t3
on t3.Loc1Time = t5.locTime
inner join #temp4 t4
on t4.Loc1Time = t5.locTime

|||

I started to try to test this locally by creating my own test tables and realized one possible cause for your query to take so long, even with so few records. How many duplicate locTime / Loc1Time values are there across the 5 temp tables?

For example, assume the same datetime value appears 10 times in each table. Since you are only joining on the datetime field, temp5 joined to temp1 will produce 10 x 10 = 100 result rows. Since you have 5 tables, 10 duplicates could produce 100,000 rows by themselves. If your actual number of duplicates is higher than my example, you could be trying to run a query that will return a million or more records.

Try running the following to test this:

SELECT locTime, Count(*) FROM #temp5 GROUP BY locTime HAVING Count(*) > 1 ORDER BY Count(*) DESC

SELECT loc1Time, Count(*) FROM #temp1 GROUP BY loc1Time HAVING Count(*) > 1 ORDER BY Count(*) DESC

etc...

Let me know what you find out.

|||

Well, I went ahead and tested this locally. For my test, I was able to join the tables, each with only 3 duplicates, to return 74,500 rows in less than a second. Below is all of the SQL I used to test this. If you are still seeing the performance problems, it makes me think either the server is way overloaded or there is some other resource constraint impeding your efforts.

Test Table Creation:

ifexists (select *from dbo.sysobjectswhere id =object_id(N'[dbo].[T5]')andOBJECTPROPERTY(id, N'IsUserTable') = 1)drop table [dbo].[T5]GOifexists (select *from dbo.sysobjectswhere id =object_id(N'[dbo].[T1]')andOBJECTPROPERTY(id, N'IsUserTable') = 1)drop table [dbo].[T1]GOifexists (select *from dbo.sysobjectswhere id =object_id(N'[dbo].[T2]')andOBJECTPROPERTY(id, N'IsUserTable') = 1)drop table [dbo].[T2]GOifexists (select *from dbo.sysobjectswhere id =object_id(N'[dbo].[T3]')andOBJECTPROPERTY(id, N'IsUserTable') = 1)drop table [dbo].[T3]GOifexists (select *from dbo.sysobjectswhere id =object_id(N'[dbo].[T4]')andOBJECTPROPERTY(id, N'IsUserTable') = 1)drop table [dbo].[T4]GOCREATE TABLE [dbo].[T5] ([ID] [int]IDENTITY (1, 1)NOT NULL ,[locTime] [datetime]NOT NULL )ON [PRIMARY]GOCREATE TABLE [dbo].[T1] ([ID] [int]IDENTITY (1, 1)NOT NULL ,[loc1Time] [datetime]NOT NULL ,[engValue] [float]NOT NULL )ON [PRIMARY]GOCREATE TABLE [dbo].[T2] ([ID] [int]IDENTITY (1, 1)NOT NULL ,[loc1Time] [datetime]NOT NULL ,[engValue] [float]NOT NULL )ON [PRIMARY]GOCREATE TABLE [dbo].[T3] ([ID] [int]IDENTITY (1, 1)NOT NULL ,[loc1Time] [datetime]NOT NULL ,[engValue] [float]NOT NULL )ON [PRIMARY]GOCREATE TABLE [dbo].[T4] ([ID] [int]IDENTITY (1, 1)NOT NULL ,[loc1Time] [datetime]NOT NULL ,[engValue] [float]NOT NULL )ON [PRIMARY]GOALTER TABLE [dbo].[T5]WITH NOCHECK ADD CONSTRAINT [PK_T5]PRIMARY KEY CLUSTERED ([ID])ON [PRIMARY] GOALTER TABLE [dbo].[T1]WITH NOCHECK ADD CONSTRAINT [PK_T1]PRIMARY KEY CLUSTERED ([ID])ON [PRIMARY] GOALTER TABLE [dbo].[T2]WITH NOCHECK ADD CONSTRAINT [PK_T2]PRIMARY KEY CLUSTERED ([ID])ON [PRIMARY] GOALTER TABLE [dbo].[T3]WITH NOCHECK ADD CONSTRAINT [PK_T3]PRIMARY KEY CLUSTERED ([ID])ON [PRIMARY] GOALTER TABLE [dbo].[T4]WITH NOCHECK ADD CONSTRAINT [PK_T4]PRIMARY KEY CLUSTERED ([ID])ON [PRIMARY] GO

Test Table Population:

DECLARE @.Counterint, @.Datedatetime, @.Table intSET @.Counter = 0SET @.Date ='01/01/2001 00:00'WHILE (@.Counter < 1000)BEGIN IF @.Counter = 350or @.Counter = 700BEGIN SET @.Date ='01/01/2001 00:00'END INSERT T5 (locTime)VALUES (@.Date)SET @.Counter = @.Counter + 1SET @.Date =DateAdd(hh, 1, @.Date)ENDSET @.Table = 1WHILE (@.Table < 5)BEGIN SET @.Counter = 0SET @.Date ='01/01/2001 00:00'WHILE (@.Counter < 1000)BEGIN IF @.Counter = 350or @.Counter = 700BEGIN SET @.Date ='01/01/2001 00:00'END IF @.Table = 1INSERT T1 (loc1Time, engValue)VALUES (@.Date, rand())ELSE IF @.Table = 2INSERT T2 (loc1Time, engValue)VALUES (@.Date, rand())ELSE IF @.Table = 3INSERT T3 (loc1Time, engValue)VALUES (@.Date, rand())ELSE IF @.Table = 4INSERT T4 (loc1Time, engValue)VALUES (@.Date, rand())SET @.Counter = @.Counter + 1SET @.Date =DateAdd(hh, 1, @.Date)END SET @.Table = @.Table + 1ENDselectcount(*)from T5selectcount(*)from T1selectcount(*)from T2selectcount(*)from T3selectcount(*)from T4

Temp Table Join Test:

SELECT *INTO #temp5FROM T5SELECT *INTO #temp1FROM T1SELECT *INTO #temp2FROM T2SELECT *INTO #temp3FROM T3SELECT *INTO #temp4FROM T4Select t1.engValue+t2.engValue+t3.engValue+t4.engValuefrom #temp5 t5innerjoin #temp1 t1on t1.Loc1Time = t5.locTimeinnerjoin #temp2 t2on t2.Loc1Time = t5.locTimeinnerjoin #temp3 t3on t3.Loc1Time = t5.locTimeinnerjoin #temp4 t4on t4.Loc1Time = t5.locTimeDROP TABLE #temp5DROP TABLE #temp1DROP TABLE #temp2DROP TABLE #temp3DROP TABLE #temp4
|||

Hi ZLA

Yes, that can be a possiblity because for certain locValues, I have been having probably 20-30 duplicate locId's that must be the cause of keeping the process busy.

I still need to go over the table example you showed me, I will go over it tomorrow, its been very busy at work today... I am really thankful to you for this help.

Tomorrow I will go over it and will create more duplicate records and will confirm if thats the reason of my problem.

Thanks once again

-Sarah

|||

That sounds like it's the cause, then. If you have trouble figuring out how to handle the duplicates, let me know. You'll need to decide / know what the correct answer should be. For example, if

Temp1.loc1Time = '7/1/2006 12:34', Temp1.engValue = 1.0

Temp1.loc1Time = '7/1/2006 12:34', Temp1.engValue = 2.0

Temp2.loc1Time = '7/1/2006 12:34', Temp2.engValue = 0.1

Temp2.loc1Time = '7/1/2006 12:34', Temp2.engValue = 0.2

Assume all other matching records have engValue of 0.0. Using joins, you would get 4 matches: 1.0 and 0.1, 1.0 and 0.2, 2.0 and 0.1, 2.0 and 0.2. That would give you 4 separate sums for that time: 1.1, 1.2, 2.1 and 2.2. If those four are summed together, the grand total is 6.6.

If, however, the grand total for that time should just be 3.3, then the joins are not the right method.

Let me know what you find out.

|||

Hi ZLA

Yes, thats exactly whats happening, one table has 100 adn another has 98 rows with loc1time as 1/1/1900. and with teh join, when I stop in a matter of seconds its showing 14,000 rows and its still not complete.

Yes, it seems like its giving the result of 6.6 and we want 3.3. Either we need to group by the loc1time in each table adn make sure that there is one record in each of the temp tables or delete teh enteries from teh temp tables that have duplicate joining condition.

I think we are going towards your first solution, which asked for grouping in the fifth table with the expression and summing the values up.

what do you suggest?

-Sarah

|||

Hurrah! I can think of several approaches:

1. Is the 1/1/1900 date valid? If not, either correct the underlying problem or exclude these records from the query. To exclude them, you could eliminate them when creating #temp1, #temp2, ... Or you could add a general where clause to the multiple inner join such as "WHERE temp1.loc1Time <> '1/1/1900' AND temp2loc1Time <> '1/1/1900' ..." Or you could modify the joins to do the filtering: "FROM #temp5 t5 INNER JOIN #temp1 t1 ON t1.Loc1Time = t5.locTime AND t1.Loc1Time <> '1/1/1900' ..." I think the first is best if possible; the last is next best.

2. If duplicate values are not correct, I would recommend building the temp tables so they only hold the distinct values you want. That should be the most efficient. If possible, you could get that by adding DISTINCT to the select statement that builds them. If you can't alter the temp table creation, you can use sub-queries in your join:

Select t1.engValue+t2.engValue+t3.engValue+t4.engValue
from (select distinct locTime from #temp5) t5
innerjoin (select distinct Loc1Time from #temp1) t1
on t1.Loc1Time = t5.locTime
innerjoin (select distinct Loc1Time from #temp2) t2
on t2.Loc1Time = t5.locTime
innerjoin (select distinct Loc1Time from #temp3) t3
on t3.Loc1Time = t5.locTime
innerjoin (select distinct Loc1Time from #temp4) t4
on t4.Loc1Time = t5.locTime

3. Use a summing technique on a separate temp table as I described earlier:

There are other alternatives as well; you might try thinking laterally. Perhaps you can update temp table 5 (or use a separate temp table 6) where you insert 1 row from each of the other 4 temp tables. Just insert locTime, engValue. Then do the following: "SELECT locTime, Sum(engValue) From #Temp6 GROUP BY locTime HAVING Count(*) = 4". Please note that the count assumes only 1 row [datetime] is inserted from each temp table. If not, you will need to alter this approach.

The summing technique would be best if a temp table may not have a datetime that is in one of the other temp tables. That would be dependent on your process. If you only want sums where the datetime is in all 5 tables, then I would use a join. If you want all datetimes then I would look at a summing technique.

HTH. Let me know how it turns out.

No comments:

Post a Comment