2 Replies - 1370 Views - Last Post: 20 January 2012 - 02:35 PM

Topic Sponsor:

#1 e_i_pi  Icon User is online

  • = -1
  • member icon

Reputation: 361
  • View blog
  • Posts: 1,020
  • Joined: 30-January 09

Polling the DB - many small queries or one large query?

Posted 19 January 2012 - 02:58 PM

(It's a toss-up whether this topic should reside in the MySQL forum or the PHP forum, but I think it should be here.)

In my application, one part involves users creating projects, uploading maps (images) to that project, defining territories on those maps, and then defining borders between those territories. For arguments sake, let's use this naming schema:
  • A map is referred to by ProjectID
  • A territory is referred to by TerritoryID
  • A territory for a specific map is referred to by ProjectTerritoryID
  • A border between two territories is referred to by ProjectTerritoryBorderID


Now, since I'm following the convention of 'Desing by Contract', I need to make a few checks when a user wants to add a new border between territories. The checks are:
  • Does the project exist (i.e. - does the ProjectID exist in the DB)?
  • Does the user own the project (i.e. - does the UserID exist against the ProjectID)?
  • Does the source territory exist (i.e. - does the TerritoryID exist in the DB)?
  • Does the target territory exist (i.e. - does the TerritoryID exist in the DB)?
  • Does the source territory exist in that project (i.e. - does the ProjectTerritoryID exist in the DB)?
  • Does the target territory exist in that project (i.e. - does the ProjectTerritoryID exist in the DB)?
  • Finally, check whether the territories already have a border (i.e. - does the ProjectTerritoryBorderID exist in the DB)?


Currently, I am performing these checks within a singleton connection to the DB. There is, in total:
1 DB connection
5 prepared statements
7 executions of prepared statements

Each of these statements is extremely lightweight, generating a COUNT(*) of the matching rows, with zero count meaning the check has failed.

I have a second option - query the database once, generating a model of the project that has information on all users and territories attached to that project (i.e. - generating an associative array of many rows). This query is not so lightweight, as a project with 2 users, 100 territories, and 300 borders will generate 2*100*300 = 60000 rows of data.

Now, while I'm pretty good with database querying and design, I'm not so sure on the administration side of things. Does one connection with 6 queries generate 6 threads, for instance? Should I even be concerned over how many queries 1 DB connection is sending? Is it feasible to pass some of the checking to PHP, or should I stick with my mantra of "let SQL do the heavy lifting"?

This post has been edited by e_i_pi: 19 January 2012 - 03:01 PM


Is This A Good Question/Topic? 0
  • +

Replies To: Polling the DB - many small queries or one large query?

#2 AdaHacker  Icon User is online

  • Resident Curmudgeon

Reputation: 377
  • View blog
  • Posts: 739
  • Joined: 17-June 08

Re: Polling the DB - many small queries or one large query?

Posted 20 January 2012 - 08:21 AM

Well, I'm no DBA, but I have picked up a few things. Of course, my experience is with high-volume media websites, which tend to be read-heavy scenarios, so I don't know how relevant it is to your situation. Take it for what it's worth.

In general, number of queries is not something you need to worry too much about. Of course, it can become an issue if you're getting more volume than your database server can handle, so you don't want to run needless queries. But it's not the first thing you should have on your list. Think of it this way: either you need the data or you don't. Barring really rookie mistakes (e.g. issuing two queries to get different columns in the same row), if too many queries is your problem, the solution is usually just to live without some of the data. Rewriting the SQL to get the same data in a lower number of queries is in no way guaranteed to help overall performance.

The more immediate and controllable problem is the weight of the queries. In general, you want to minimize total execution time rather than total number of queries. This is especially the case if request volume is a concern. The longer a query runs, the more danger there is that they'll end up stacking up. That is, another request will come in before the server has finished the first one, and so on, and server load will start to skyrocket as you have more and more queries running at once. Also, heavier queries are more subject to change in their execution time when table sizes change, particularly if they end up generating temp tables. With simpler, short-running queries, it's much easier to predict the performance characteristics.

That said, there's nothing inherently wrong with using heavier queries. If that one big query runs faster than the seven smaller ones, that may very well be the best way to go. It all depends on your situation.

As far as doing the "heavy lifting" in SQL or in code, that also depends on your situation. It's kind of a balancing act. If you don't have performance issues, it's usually easier to just do it in SQL, because you're just getting your information directly from the authoritative source. Once you start doing things in your application code, you introduce complications surrounding data integrity, concurrency, cache invalidation, and so forth. If you're in a high-volume scenario and need to stretch your database capacity, then doing that stuff is absolutely worth the effort. If you have more than enough capacity to meet your current load and aren't anticipating any massive growth, then it's probably not worth the effort.

So, as with all performance questions, it depends on your situation. Is what you have now not performing acceptably? And if not, do you know where the bottlenecks are? Or is this just speculative? It's hard to offer any concrete recommendations without knowing the actual problem.

Hope that helps.
Was This Post Helpful? 3
  • +
  • -

#3 e_i_pi  Icon User is online

  • = -1
  • member icon

Reputation: 361
  • View blog
  • Posts: 1,020
  • Joined: 30-January 09

Re: Polling the DB - many small queries or one large query?

Posted 20 January 2012 - 02:35 PM

Yeah it's speculative at the moment. I, like every other website owner, believe my site will generate millions of hits per hour in the near future (j/k). But seriously, it's more a matter of looking into how a single user action may impact on the whole operation. For instance, a single user action may consist of the following calls to the DB:
  • User authentication for that session
  • 3-5 contract checks (e.g. - [inline]if(!User::IsAuthorised() || !User::IsFoo() || !User::IsBar())[/inline)
  • 1-2 INSERTs of user-generated information
  • 3 SELECTs to generate resolved descriptions/information to send back to the user

That would be the typical DB traffic for some very common actions (such as attacking an opponent's territory, or fortifying troops from one territory to another, or entering a message in the chat log). Oh, PS, it's an online game, hence the common actions.

So given that even moderately successful sites like my own have roughly 200 people online at once, and very successful sites have 500-1500 online at once, then I can make some guestimates as to how the DB traffic will flow. If there are 200 people online, perhaps 20 are committing an action at roughly the same time (within a few seconds of each other - actions take maybe 10-30 seconds each). This would translate to perhaps 200 queries across 20 connections every 5 seconds, given the current pattern of using lots of small queries.

I guess my main reason for asking is that I'm unsure of how to accurately measure load on the DB and the PHP server. I know some basics, but not in-depth stuff, my DB skills lie in normalisation and optimised querying, not in measuring the load. I tend to measure a query's optimisation is milliseconds/seconds at the moment.
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1