2 Replies - 1291 Views - Last Post: 12 September 2012 - 05:47 AM

#1 cupidvogel

SQL Data Redundancy vs Query Performance

Posted 12 September 2012 - 03:47 AM

Imagine a site like Stack Overflow, with millions of users. Each user has his own data on the server: questions asked, answers given, reputation, edits, bounties, etc. The server would have a master table containing, say, all questions asked across all tags, where each row carries the tag-id identifying which tags it relates to. What I want to know is how to store the per-user info, for example the list of all questions a user has asked, as shown on his user page. Do I query the master table for all questions asked by that particular user? Or do I create a separate table for each user, plus a trigger on the master table, so that whenever a new record is added, a copy of it (or perhaps only a portion, like the question-id and the tag-id) is inserted into that user's questions table? That way, whenever the records for a user have to be retrieved, I can join his table with the master table to fetch the relevant records.

So which method would be better? The first one is direct and has no data redundancy, but the number of questions might run into the billions, so it may be time- and resource-consuming to query the entire table for 50 records. The second method would perhaps reduce the time, because the records can be fetched directly (the join may take some time, but I think not as much as querying the master table). However, it involves substantial data redundancy, even if only one or two keys are stored, once the number of users becomes huge.

Replies To: SQL Data Redundancy vs Query Performance

#2 Atli

Re: SQL Data Redundancy vs Query Performance

Posted 12 September 2012 - 05:28 AM

Your second method is a major violation of the basic principles of RDBMS design. If your database has grown so large that the database system can no longer process your queries in an acceptable amount of time without such tactics, and you've exhausted the normal methods of performance optimization, then I'd say it's time to move to a new system. Don't force an RDBMS to act like a document-based NoSQL database; instead, just use a document-based NoSQL database.

It'd take a whole lot to max out a high-end RDBMS like Oracle or MSSQL, though. I doubt even a website on the scale of SO could do that (given that the people running the system know what they're doing).
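As a rough illustration of what the normal methods of performance optimization can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and index names (questions, user_id, idx_questions_user) are made up for the example, and SQLite is only standing in for a larger RDBMS. The point is just that, with an index on the user column, the engine can find one user's rows through the index instead of scanning the whole table.

```python
import sqlite3

# In-memory database; a stand-in for the real "master" questions table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " tag_id INTEGER NOT NULL,"
    " title TEXT NOT NULL)"
)
# Index on the column used to filter by user (hypothetical name).
conn.execute("CREATE INDEX idx_questions_user ON questions (user_id)")

# 10,000 rows spread evenly over 1,000 made-up users (10 questions each).
rows = [(i % 1000, i % 50, f"Question {i}") for i in range(10_000)]
conn.executemany(
    "INSERT INTO questions (user_id, tag_id, title) VALUES (?, ?, ?)", rows
)


def questions_for_user(user_id):
    """Fetch one user's questions straight from the master table."""
    return conn.execute(
        "SELECT id, tag_id, title FROM questions"
        " WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


def query_plan(user_id):
    """Return SQLite's plan for the per-user lookup as one string."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN"
        " SELECT id, tag_id, title FROM questions WHERE user_id = ?",
        (user_id,),
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in plan)


user_rows = questions_for_user(42)
plan_text = query_plan(42)
print(len(user_rows))  # number of questions found for user 42
print(plan_text)       # should mention idx_questions_user
```

The exact wording of the plan differs between SQLite versions, and other database systems have their own EXPLAIN output, so treat this as a way to check your own setup rather than as a benchmark.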


However, if you wanted to keep the main database an RDBMS and still employ your second method, then I would suggest that you do so on a separate system. In a relational database it would be far from ideal to create a new table per user to list partial references to the user's entries in the main table, but many NoSQL systems would be perfectly suited to such things. It could be considered a persistent "cache" server that your front-end code would update whenever a user posts or votes, so that when the user profile is visited, only the data relevant to that user is loaded from the "cache".
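To make the "cache" idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: UserActivityCache is a hypothetical class in which a plain dict stands in for a document store, post_question plays the role of the front-end code, and SQLite stands in for the main RDBMS.

```python
import sqlite3


class UserActivityCache:
    """Hypothetical stand-in for a document store keyed by user id."""

    def __init__(self):
        self._docs = {}  # user_id -> {"questions": [...]}

    def add_question(self, user_id, question_id, tag_id):
        doc = self._docs.setdefault(user_id, {"questions": []})
        doc["questions"].append(
            {"question_id": question_id, "tag_id": tag_id}
        )

    def get(self, user_id):
        # Users with no activity yet get an empty document.
        return self._docs.get(user_id, {"questions": []})


# The relational table stays the single source of truth.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " tag_id INTEGER NOT NULL,"
    " title TEXT NOT NULL)"
)
cache = UserActivityCache()


def post_question(user_id, tag_id, title):
    """Front-end write path: store the question, then update the cache."""
    cur = conn.execute(
        "INSERT INTO questions (user_id, tag_id, title) VALUES (?, ?, ?)",
        (user_id, tag_id, title),
    )
    conn.commit()
    cache.add_question(user_id, cur.lastrowid, tag_id)
    return cur.lastrowid


def profile_questions(user_id):
    """Profile page read path: only this user's entries, from the cache."""
    return cache.get(user_id)["questions"]


first_id = post_question(7, 3, "How do indexes work?")
post_question(8, 3, "What is a join?")
post_question(7, 5, "When should I denormalize?")
print(profile_questions(7))
```

Note that the cache here holds only references (question-id and tag-id), not full copies of the rows, and that it could drift from the main database if the second write fails. A real setup would need some way to rebuild it from the relational data.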

#3 cupidvogel

Re: SQL Data Redundancy vs Query Performance

Posted 12 September 2012 - 05:47 AM

Can you tell me why the 2nd option violates RDBMS rules? What is wrong with it? And what did you mean by "I would suggest that you do so on a separate system"? Why would NoSQL be ideal here?