7 Replies - 1788 Views - Last Post: 08 February 2013 - 02:08 AM

#1 November-06 - D.I.C Regular - Reputation: 46 - Posts: 396 - Joined: 04-January 11

What approach works best?

Posted 05 February 2013 - 03:06 AM

I studied SQL on w3schools, worked my way through the basics of querying and joins, and managed to land a job at a good but non-IT company fresh out of university.

The one-developer-to-many-projects setting has really helped me make a lot of progress after only a year of experience. The company, though, doesn't teach you the skills; they expect you to already know the skills they hired you for.

In my first 7 months, I worked on an individual project but within a team that included a senior developer who wouldn't walk you through things step by step but was kind enough to assist when you had a problem with something. The trouble is, I was transferred to a different project, in a department where I was the only developer. That means no senior developer, or anyone else, to assist me on the technical side.

So dreamincode has always been my best friend, and so are all the guys in this forum who are passionate about programming. I hope you guys can assist me on my journey.

My problem is the very common issue of "performance".

I am trying to work on something. The idea is as easy as searching or filtering.

I have coded a lot of searches before but this one I am trying to do takes a lot of time.

I'll disguise the project I am working on by changing the type of data to a very simple example.

There are a lot of manga and anime nowadays from different authors and in different genres. Let's imagine there is a database table containing the anime/manga title, the author, the year it was animated/published, and the genre.

Suppose our database table has millions of records.

Now a user searches for a certain anime. He doesn't know the exact title he wants, but decides to filter results by indicating the name of his favorite author and selecting the genres he would like to read.

If it is only one author and genre, then I can pass @authorname and @genre as parameters to my stored procedure and query

SELECT * FROM tblAnime WHERE Author=@authorname AND Genre=@genre



Now, what if the user is allowed to select one author but multiple genres at once? Let's say there are 50 available genres. The user can select some (26, for example) or even all 50.

What I did was store the author and genres in a dataset, pass it to the stored procedure as XML, and shred the XML into a temp table:

SELECT * FROM tblAnime anm
INNER JOIN #TEMP tmp
ON anm.Author=tmp.Author AND anm.Genre=tmp.Genre



The problem is... it takes a lot of time. The first query runs for 8 seconds and the second for 33 seconds, even though both return only 683 rows.

Is there anything I can do to improve performance? Can you suggest a better approach? Maybe using XML is not the best approach at all.


Replies To: What approach works best?

#2 e_i_pi - Reputation: 795 - Posts: 1,681 - Joined: 30-January 09

Re: What approach works best?

Posted 06 February 2013 - 06:58 PM

Your method looks like it would cause the server some grief. What DB are you running on? The use of @ symbols for parameters suggests MSSQL. If that is so, I can probably help you out.

#3 November-06 - D.I.C Regular - Reputation: 46 - Posts: 396 - Joined: 04-January 11

Re: What approach works best?

Posted 06 February 2013 - 11:28 PM

I am using MSSQL. Sorry for failing to indicate it in my first post.

I just ended up continuing with the XML approach. By reworking my subqueries and WHERE clauses, the 33 seconds was reduced to only 13. I was happy at first, but after getting that result, there were still tables I needed to join to it. Now the whole process takes about 20 seconds in SQL.

I am still worried, though, that it will run slower from the .NET web application itself, so if there are more ways to improve this, I would really be glad to know.

#4 e_i_pi - Reputation: 795 - Posts: 1,681 - Joined: 30-January 09

Re: What approach works best?

Posted 07 February 2013 - 04:04 AM

I'll give you an answer tomorrow when I'm at work and have access to my custom functions that deal with this situation.

#5 rgfirefly24 - D.I.C Lover - Reputation: 267 - Posts: 1,477 - Joined: 07-April 08

Re: What approach works best?

Posted 07 February 2013 - 09:52 AM

Have you run the execution plan or client statistics to get an idea of where the majority of your time is spent and what the query is actually doing? It could be that what you're using is doing a table scan instead of a key seek, or there might be hash lookups that are causing problems. Shredding XML into a temp table and using that is a very effective way of doing what you're asking, especially if you do not want to use dynamic SQL.
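
For reference, a minimal sketch of what that shredding can look like, assuming a simple <filters><genre>...</genre></filters> XML shape (the element, table, and column names here are made up for illustration):

DECLARE @filters xml = N'<filters><genre>Action</genre><genre>Comedy</genre></filters>';

CREATE TABLE #Genres (Genre nvarchar(100));

-- .nodes() turns each <genre> element into a row; .value() extracts its text
INSERT INTO #Genres (Genre)
SELECT g.node.value('.', 'nvarchar(100)')
FROM @filters.nodes('/filters/genre') AS g(node);

-- #Genres can now be JOINed against the main table, as in the query above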

#6 e_i_pi - Reputation: 795 - Posts: 1,681 - Joined: 30-January 09

Re: What approach works best?

Posted 07 February 2013 - 01:50 PM

I know that you've just given a pseudo-example above, but what data types are your author and genre columns in your real-life scenario? Preferably they should be int columns matching against a lookup table, for resolution of specifics (description, code, etc.). Int datatypes JOIN a lot quicker than varchar datatypes.
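
For example, a normalised layout along those lines might look like this (the table and column names are hypothetical):

CREATE TABLE tblAuthor (AuthorID int PRIMARY KEY, AuthorName nvarchar(200));
CREATE TABLE tblGenre (GenreID int PRIMARY KEY, GenreName nvarchar(100));

CREATE TABLE tblAnime (
	AnimeID int PRIMARY KEY,
	Title nvarchar(300),
	YearPublished int,
	AuthorID int REFERENCES tblAuthor (AuthorID), -- int FK instead of the author's name
	GenreID int REFERENCES tblGenre (GenreID)     -- int FK instead of the genre text
);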

rgfirefly24 has some good advice above, especially in regard to the execution plan. Is there any chance you can show an image of the execution plan? Blanking out the names is fine; we'd mainly be looking for Clustered Index Scans and Key Lookups, though Index Scans and Hash Matches can be candidates for optimisation in some cases. If you have very large tables, I imagine you'd have a bit of Parallelism going on in your exec plan as well.

If you're using SSMS, you can display a graphical execution plan from the query window. You can also generate a text representation of the plan by using the following command:
SET SHOWPLAN_TEXT ON
GO
/* enter your query here */
GO



#7 e_i_pi - Reputation: 795 - Posts: 1,681 - Joined: 30-January 09

Re: What approach works best?

Posted 07 February 2013 - 04:02 PM

Alrighty. The way I approach this situation is reasonably complex, but that's the result of a lot of optimisation testing over many months, with many real-world cases. I have found this method to be very good, easy to implement, and easy to maintain.

Firstly, in the relevant DB, I create a function that parses a comma-delimited string out to a table. The reason I use comma-delimited rather than XML is because I work in the reporting space, and in MS Report Designer it is easy to implode a multi-valued parameter by using Join(Parameters!MyParameter.Value, ","). You could adapt this function to XML, but I'll leave that to you:
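The function was originally hidden behind a spoiler tag; the following is a minimal reconstruction consistent with how fnStringToTable is called below (parameters: the input string, the delimiter, and a trim-whitespace flag; one output column named Val). Treat it as a sketch, not necessarily the original code:

CREATE FUNCTION [dbo].[fnStringToTable] (
	@String nvarchar(max),
	@Delimiter nvarchar(10),
	@TrimWhitespace bit
)
RETURNS @Result TABLE (Val nvarchar(max))
AS
BEGIN
	DECLARE @Pos int = CHARINDEX(@Delimiter, @String);

	-- peel off one value per iteration until no delimiter remains
	WHILE @Pos > 0
	BEGIN
		INSERT INTO @Result (Val)
		SELECT CASE WHEN @TrimWhitespace = 1
			THEN LTRIM(RTRIM(LEFT(@String, @Pos - 1)))
			ELSE LEFT(@String, @Pos - 1) END;

		SET @String = SUBSTRING(@String, @Pos + LEN(@Delimiter), LEN(@String));
		SET @Pos = CHARINDEX(@Delimiter, @String);
	END

	-- whatever remains after the last delimiter is the final value
	INSERT INTO @Result (Val)
	SELECT CASE WHEN @TrimWhitespace = 1 THEN LTRIM(RTRIM(@String)) ELSE @String END;

	RETURN;
END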

Now that I have a table-valued function, I can use this in queries by JOINing against it. We'll get to that later in an example, where I'll show you the syntax, but for now we're going to talk about stored procedures.

There was a time (the '90s, I think) when stored procedures were all the rage, to the point that some DBAs converted most of their queries from ad-hoc to stored procs. This of course had the trickle-down effect in the DBA community of experts saying to adepts, "Always use stored procs!", which leads to newcomers always using stored procs with no idea why they are practicing the mantra of their predecessors. For a while I used ad-hocs almost exclusively, as I found it easier to maintain the SQL code (I didn't have to jump from the IDE to SSMS and back again), but my recent work has found me moving almost everything to stored procedures. There are two main reasons for this:
  • We have reports that are used for various clients, and it's easier to maintain a stored proc globally than an ad-hoc globally
  • Whenever a query fires, an execution plan is created and temporarily stored in the system tables for the DB. These execution plans are not always shared when there are multiple connections to the DB, unless they are execution plans of stored procedures.

That last point is the most important, but if we combine it with the table-valued function above, then we reach an even more influential conclusion. When you pass a multi-value parameter (from reports, at least) to SQL Server, the query is rewritten so that the multi-values are verbosely stated in the query itself. So, something like this:
SELECT *
FROM LookupTables
WHERE ID IN @LookupTableID


...when passed the value [1,2,3] to the parameter @LookupTableID, ends up going to the server like this:
SELECT *
FROM LookupTables
WHERE ID IN (1,2,3)


...instead of being passed a parameter. As you can imagine, this means there is a different query string for every combination of multi-values. The downfall of this is that a new execution plan has to be created per new multi-valued call, as execution plans are only shared if the query string is exactly the same, character for character (even down to whitespace and line breaks). You can eliminate this issue altogether by using a stored procedure, which will have the same query string each and every time.
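
To make that concrete, here is the difference as it appears to the plan cache (the stored procedure name below is hypothetical):

-- Two ad-hoc calls: the query text differs, so each compiles its own plan
SELECT * FROM LookupTables WHERE ID IN (1,2,3)
SELECT * FROM LookupTables WHERE ID IN (1,2,3,4)

-- Stored procedure call: the text inside the proc never changes,
-- so one plan is compiled and reused across calls
EXEC dbo.spGetLookupTables @LookupTableIDs = N'1,2,3,4';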

Next, there is a gotcha in stored procedures that is largely unknown and quite insidious. If you use the parameters passed directly to a stored procedure, the stored proc runs slow. I cannot explain the particulars of why this happens, only that it is a known issue in SQL Server (commonly discussed under the name "parameter sniffing", where the cached plan is tuned for the first parameter values the engine sees), but there is a workaround. Instead of doing something like this:
CREATE PROCEDURE [dbo].[spReportGlobalBudgetSpendByEntity]
	@ParamID int
AS
BEGIN
	SELECT *
	FROM LookupTables
	WHERE ID = @ParamID
END


...it is better practice to do this...
CREATE PROCEDURE [dbo].[spReportGlobalBudgetSpendByEntity]
	@ParamID int
AS
BEGIN
	DECLARE @ID AS int = @ParamID;

	SELECT *
	FROM LookupTables
	WHERE ID = @ID
END


As I said before, I am unsure as to the particulars of why the engine executes the query faster this way, but you will see a speed up if you declare a variable inside the SP, assign it the parameter value, and then reference the variable instead of the parameter.

Next I shall talk about how to use the table-valued function in a stored procedure, but I will apply it to your case, so that you have a baseline to understand this by. This is the stored procedure I would write to deal with your case given in the OP:
CREATE PROCEDURE [dbo].[spSearchAnimeByAuthorAndGenre]
	@ParamAuthor nvarchar(max),
	@ParamGenre nvarchar(max)
AS
BEGIN
	DECLARE @Author AS nvarchar(max) = @ParamAuthor;
	DECLARE @Genre AS nvarchar(max) = @ParamGenre;
	
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

	SELECT *
	FROM tblAnime anm
	WHERE EXISTS (
		SELECT *
		FROM dbo.fnStringToTable(@Author, ',', 1)
		WHERE Val = anm.Author
	) AND EXISTS (
		SELECT *
		FROM dbo.fnStringToTable(@Genre, ',', 1)
		WHERE Val = anm.Genre
	)

END


There are a couple of points to make about that stored procedure.

Firstly, note how I have called the table-valued function fnStringToTable - the first parameter is the comma-delimited string, the second parameter is the delimiter, the third parameter indicates that whitespace should be ignored.

Secondly, instead of using an INNER JOIN, I have used a WHERE EXISTS clause. The simple reason for this is that WHERE EXISTS returns TRUE on the first match, whereas an INNER JOIN will traverse the entire table when determining matches. Given that the @Author and @Genre parameters are essentially filters (i.e. knockout clauses), it is better practice to use WHERE EXISTS, not to mention better on the execution plan.

Lastly, the stored procedure of course gets called with parameters that are comma-delimited strings. If you need to use XML, then you'll need to change the function fnStringToTable to parse XML out to a table. I haven't run benchmarks comparing comma-delimited against XML; XML may be quicker, but I have not had the need to find out.
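
As a usage example, a call to that stored procedure might look like this (the author and genre values are of course made up):

EXEC dbo.spSearchAnimeByAuthorAndGenre
	@ParamAuthor = N'Rumiko Takahashi',
	@ParamGenre = N'Action, Comedy, Romance';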

-------------------------

As an addendum to this, I still think it would be wise to post the execution plan here, as there may be chokepoints in your schema that are causing the query to run slowly. If you haven't done so already, I would strongly suggest putting an index on tblAnime with index columns of Author and Genre, though you will probably need to have included (i.e. non-index) columns as well in order for the execution plan to utilise the index. If you are using SELECT *, then you would need to include all the other columns, and it is unlikely that the optimiser will use your index in that case. Indexes are supposed to be on subsets of a table*. Also, keep in mind that if there are downstream usages of this query (i.e. this query is JOINed against), then any further predicate columns used in JOINs against tblAnime will need to be added to the index key columns of your index.

*There are rare occasions where an index on every column of a table is useful, most especially in bridging tables. If a 3-column bridging table, for instance, has a PK of (x, y, z) it is sometimes useful to place 5 more indexes on the table so that every permutation is covered. i.e. (x, z, y) (y, x, z) (y, z, x) (z, x, y) (z, y, x). You could further augment this by placing indexes on ASC/DESC permutations, but you would end up with a total of 48 indexes, which would severely hamper INSERT/UPDATE operations.
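
For concreteness, a minimal sketch of the index suggested above (the included columns are placeholders for whatever your SELECT actually needs):

CREATE NONCLUSTERED INDEX IX_tblAnime_Author_Genre
ON tblAnime (Author, Genre)
INCLUDE (Title, YearPublished);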

This post has been edited by e_i_pi: 07 February 2013 - 04:11 PM


#8 November-06 - D.I.C Regular - Reputation: 46 - Posts: 396 - Joined: 04-January 11

Re: What approach works best?

Posted 08 February 2013 - 02:08 AM

Wow, that was impressive, guys. I'll try to work out my problem using your advice. I have been going through Google searches on improving performance, but this is the first time I've read about a query slowing down when directly using parameters. And there are a lot of things here I haven't tried before, especially those from e_i_pi's post.

I'll sort things out for now. Thanks a lot, everyone. I hope I can become as good at databases as you guys.
