7 Replies - 12641 Views - Last Post: 24 September 2012 - 11:13 PM

#1 e_i_pi  Icon User is offline

  • = -1
  • member icon

Reputation: 782
  • View blog
  • Posts: 1,664
  • Joined: 30-January 09

Polling the DB - many small queries or one large query?

Posted 19 January 2012 - 02:58 PM

(It's a toss-up whether this topic should reside in the MySQL forum or the PHP forum, but I think it should be here.)

In my application, one part involves users creating projects, uploading maps (images) to that project, defining territories on those maps, and then defining borders between those territories. For arguments sake, let's use this naming schema:
  • A map is referred to by ProjectID
  • A territory is referred to by TerritoryID
  • A territory for a specific map is referred to by ProjectTerritoryID
  • A border between two territories is referred to by ProjectTerritoryBorderID


Now, since I'm following the convention of 'Desing by Contract', I need to make a few checks when a user wants to add a new border between territories. The checks are:
  • Does the project exist (i.e. - does the ProjectID exist in the DB)?
  • Does the user own the project (i.e. - does the UserID exist against the ProjectID)?
  • Does the source territory exist (i.e. - does the TerritoryID exist in the DB)?
  • Does the target territory exist (i.e. - does the TerritoryID exist in the DB)?
  • Does the source territory exist in that project (i.e. - does the ProjectTerritoryID exist in the DB)?
  • Does the target territory exist in that project (i.e. - does the ProjectTerritoryID exist in the DB)?
  • Finally, check whether the territories already have a border (i.e. - does the ProjectTerritoryBorderID exist in the DB)?


Currently, I am performing these checks within a singleton connection to the DB. There is, in total:
1 DB connection
5 prepared statements
7 executions of prepared statements

Each of these statements is extremely lightweight, generating a COUNT(*) of the matching rows, with zero count meaning the check has failed.

I have a second option - query the database once, generating a model of the project that has information on all users and territories attached to that project (i.e. - generating an associative array of many rows). This query is not so lightweight, as a project with 2 users, 100 territories, and 300 borders will generate 2*100*300 = 60000 rows of data.

Now, while I'm pretty good with database querying and design, I'm not so sure on the administration side of things. Does one connection with 6 queries generate 6 threads, for instance? Should I even be concerned over how many queries 1 DB connection is sending? Is it feasible to pass some of the checking to PHP, or should I stick with my mantra of "let SQL do the heavy lifting"?

This post has been edited by e_i_pi: 19 January 2012 - 03:01 PM


Is This A Good Question/Topic? 1
  • +

Replies To: Polling the DB - many small queries or one large query?

#2 AdaHacker  Icon User is offline

  • Resident Curmudgeon

Reputation: 452
  • View blog
  • Posts: 811
  • Joined: 17-June 08

Re: Polling the DB - many small queries or one large query?

Posted 20 January 2012 - 08:21 AM

Well, I'm no DBA, but I have picked up a few things. Of course, my experience is with high-volume media websites, which tend to be read-heavy scenarios, so I don't know how relevant it is to your situation. Take it for what it's worth.

In general, number of queries is not something you need to worry too much about. Of course, it can become an issue if you're getting more volume than your database server can handle, so you don't want to run needless queries. But it's not the first thing you should have on your list. Think of it this way: either you need the data or you don't. Barring really rookie mistakes (e.g. issuing two queries to get different columns in the same row), if too many queries is your problem, the solution is usually just to live without some of the data. Rewriting the SQL to get the same data in a lower number of queries is in no way guaranteed to help overall performance.

The more immediate and controllable problem is the weight of the queries. In general, you want to minimize total execution time rather than total number of queries. This is especially the case if request volume is a concern. The longer a query runs, the more danger there is that they'll end up stacking up. That is, another request will come in before the server has finished the first one, and so on, and server load will start to skyrocket as you have more and more queries running at once. Also, heavier queries are more subject to change in their execution time when table sizes change, particularly if they end up generating temp tables. With simpler, short-running queries, it's much easier to predict the performance characteristics.

That said, there's nothing inherently wrong with using heavier queries. If that one big query runs faster than the seven smaller ones, that may very well be the best way to go. It all depends on your situation.

As far as doing the "heavy lifting" in SQL or in code, that also depends on your situation. It's kind of a balancing act. If you don't have performance issues, it's usually easier to just do it in SQL, because you're just getting your information directly from the authoritative source. Once you start doing things in your application code, you introduce complications surrounding data integrity, concurrency, cache invalidation, and so forth. If you're in a high-volume scenario and need to stretch your database capacity, then doing that stuff is absolutely worth the effort. If you have more than enough capacity to meet your current load and aren't anticipating any massive growth, then it's probably not worth the effort.

So, as with all performance questions, it depends on your situation. Is what you have now not performing acceptably? And if not, do you know where the bottlenecks are? Or is this just speculative? It's hard to offer any concrete recommendations without knowing the actual problem.

Hope that helps.
Was This Post Helpful? 4
  • +
  • -

#3 e_i_pi  Icon User is offline

  • = -1
  • member icon

Reputation: 782
  • View blog
  • Posts: 1,664
  • Joined: 30-January 09

Re: Polling the DB - many small queries or one large query?

Posted 20 January 2012 - 02:35 PM

Yeah it's speculative at the moment. I, like every other website owner, believe my site will generate millions of hits per hour in the near future (j/k). But seriously, it's more a matter of looking into how a single user action may impact on the whole operation. For instance, a single user action may consist of the following calls to the DB:
  • User authentication for that session
  • 3-5 contract checks (e.g. - [inline]if(!User::IsAuthorised() || !User::IsFoo() || !User::IsBar())[/inline)
  • 1-2 INSERTs of user-generated information
  • 3 SELECTs to generate resolved descriptions/information to send back to the user

That would be the typical DB traffic for some very common actions (such as attacking an opponent's territory, or fortifying troops from one territory to another, or entering a message in the chat log). Oh, PS, it's an online game, hence the common actions.

So given that even moderately successful sites like my own have roughly 200 people online at once, and very successful sites have 500-1500 online at once, then I can make some guestimates as to how the DB traffic will flow. If there are 200 people online, perhaps 20 are committing an action at roughly the same time (within a few seconds of each other - actions take maybe 10-30 seconds each). This would translate to perhaps 200 queries across 20 connections every 5 seconds, given the current pattern of using lots of small queries.

I guess my main reason for asking is that I'm unsure of how to accurately measure load on the DB and the PHP server. I know some basics, but not in-depth stuff, my DB skills lie in normalisation and optimised querying, not in measuring the load. I tend to measure a query's optimisation is milliseconds/seconds at the moment.
Was This Post Helpful? 0
  • +
  • -

#4 nightscream  Icon User is offline

  • D.I.C Head

Reputation: 19
  • View blog
  • Posts: 237
  • Joined: 04-December 08

Re: Polling the DB - many small queries or one large query?

Posted 23 September 2012 - 12:22 PM

I never had to deal with such a big problem so i'm not sure it is helpfull or possible.

Your system runs mostly on cache (apc, memcache, ...), so if someone attacks a border you store it in cache and every 5 min or so you save the cache to a database? I think this will reduce the load on mysql but most mysql servers are quite good at doing a lot of small queries. The bigger ones are mostly causing problems if they are called every second.
Was This Post Helpful? 1
  • +
  • -

#5 e_i_pi  Icon User is offline

  • = -1
  • member icon

Reputation: 782
  • View blog
  • Posts: 1,664
  • Joined: 30-January 09

Re: Polling the DB - many small queries or one large query?

Posted 23 September 2012 - 02:37 PM

Heh, I've recently been working on this part of my application actually, and found that the bottlenecks are coming not from many small queries, but from a few complex ones. The solution I'm using thus far is to cache query results for the duration of the session, optimising costly queries, and indexing the DB where applicable. There's also opportunities to cache data server-side and client-side which I'm looking into.

Most of the DB operations at the moment are acting on PK fields, so they're covered by a Clustered Index Scans, and they take fractions of a second. I'm talking about a handful of milliseconds here, so even if 10 fire off, it's peanuts for the DB. The main speed-up I've generated so far is caching the DBC checks, things like "Is the project ready to be edited?" and "Does the user have permissions to edit the project?". Given that generating a project screen can involve rendering as many as 300 objects on screen, and that each object must undergo these checks, I've managed to cut down the number of DB hits by up to 95%, which is resulting in a massive increase in performance.

I'll probably write a couple of tutorials soon on how to optimise code and queries, as I've come up with a couple of techniques that seem pretty effective.
Was This Post Helpful? 1
  • +
  • -

#6 nightscream  Icon User is offline

  • D.I.C Head

Reputation: 19
  • View blog
  • Posts: 237
  • Joined: 04-December 08

Re: Polling the DB - many small queries or one large query?

Posted 24 September 2012 - 11:29 AM

sounds great to hear, any change to give us some keywords so we can search google or some directions. I would love to know these things.
Was This Post Helpful? 0
  • +
  • -

#7 e_i_pi  Icon User is offline

  • = -1
  • member icon

Reputation: 782
  • View blog
  • Posts: 1,664
  • Joined: 30-January 09

Re: Polling the DB - many small queries or one large query?

Posted 24 September 2012 - 02:17 PM

Reading up about caching techniques is a good start. There's a multitude of ways you can cache, it's not a single method. Think of it as the idea of storing something now that might well be used again in the future. The overhead of storing the information in a cache is outweighed by the time you save on retrieving the data from the cache rather than the original soucre (e.g. - DB).

Some good reading material can be found here and here on query result caching, though I use a different technique that only lasts as long as the user session, since I have a highly dynamic website.

This post has been edited by e_i_pi: 24 September 2012 - 02:17 PM

Was This Post Helpful? 1
  • +
  • -

#8 nightscream  Icon User is offline

  • D.I.C Head

Reputation: 19
  • View blog
  • Posts: 237
  • Joined: 04-December 08

Re: Polling the DB - many small queries or one large query?

Posted 24 September 2012 - 11:13 PM

Ok nothing new than but you receive a reputation point of course :)
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1