Join 300,354 Programmers for FREE! Get instant access to thousands of experts, tutorials, code snippets, and more! There are 1,785 people online right now. Registration is fast and FREE... Join Now!
I've been learning SQL and PHP for about a year. A database for an application I've been developing was initially a one-table monster, then split up, and now I'm thinking about putting it all back together.
Here's my question: is this table just too big and unwieldy? The application has five tables total, but this one is by far the biggest. I'm not sure if searching through a table with more columns slows down MySQL. Ultimately, this application is going to be handling massive amounts of traffic, mostly SELECT requests. As a result, I'm trying to avoid joins as much as possible (apparently they slow down performance). The only place this table could use normalization is under cat1-cat4, those are categories; but since the searches will rely heavily on them, I decided (correctly? you tell me) against normalization.
CODE
CREATE TABLE info ( id INT unsigned NOT NULL AUTO_INCREMENT, name VARCHAR(80) NOT NULL, phone VARCHAR(14) NOT NULL, email VARCHAR(60) NOT NULL, addressline1 VARCHAR(80) NOT NULL, addressline2 VARCHAR(80), zip VARCHAR(5) NOT NULL, city VARCHAR(50) NOT NULL, url VARCHAR(80) NOT NULL, state CHAR(2) NOT NULL, price TINYINT(1) NULL, cat1 TINYINT(2), cat2 TINYINT(2), cat3 TINYINT(2), cat4 TINYINT(2), lat FLOAT(10,6), lng FLOAT(10,6), abou TINYINT unsigned NOT NULL DEFAULT 1, even TINYINT unsigned NOT NULL DEFAULT 1, hour TINYINT unsigned NOT NULL DEFAULT 1, info TINYINT unsigned NOT NULL DEFAULT 1, lnks TINYINT unsigned NOT NULL DEFAULT 1, tour TINYINT unsigned NOT NULL DEFAULT 1, su VARCHAR(19), mo VARCHAR(19), tu VARCHAR(19), we VARCHAR(19), th VARCHAR(19), fr VARCHAR(19), sa VARCHAR(19), smallsum TEXT, largesum TEXT, menulink VARCHAR(100), websitelink VARCHAR(100), p_enabled unsigned TINYINT NOT NULL DEFAULT 0, PRIMARY KEY (id), INDEX (id), INDEX (vs1), INDEX (vs2), INDEX (vs3), INDEX (vs4), INDEX lat_lng (lat,lng), INDEX (url) );
Any advice would be greatly appreciated, I've been pulling out my hair trying to find something that advices my situation
I'm trying to avoid joins as much as possible (apparently they slow down performance).
This is incorrect. A table with a huge row size will actually impact performance worse. A database is designed for joins. Avoiding them is crippling yourself.
QUOTE(smclark1489 @ 27 Jun, 2009 - 07:47 PM)
The only place this table could use normalization is under cat1-cat4, those are categories; but since the searches will rely heavily on them, I decided (correctly? you tell me) against normalization.
This will also slow you down, a lot.
Which of these two is faster?
sql
select name from info where cat1=2 or cat2=2 or cat3=2 or cat4=2
-- or
select a.name from info a inner join info_cat b on a.info_id=b.info_id and b.cat_id=2
The second is faster.
Without knowing more about the data, it's hard to clean this up more. However, "abou,even,hour,info,lnks,tour" look like flags and can also be in a different table. The fields "su,mo,tu,we,th,fr,sa" may also be the same.
The issue I have with joins (but, I'm a noob) is that almost all of the data in this table would be retrieved all at the same time (including the flags) when loading a profile - in addition to a couple other tables (requiring joins). So in this situation, it would still be faster to split it up and perform multiple joins? It seems like I'm always cautioned against it, but these questions don't seem to get very relevant answers when fired into Google.
Always normalise your data and utilise joins if needed; this is a tried and tested method used by most database designers.
That's solid advice; the thing about this table is that it is already normalized (except for the four categories). The reason I would split it up is because there are many columns; and I keep finding contradictory information about whether that's a bad or good thing. A query won't have to search through multiple tables to return the result, everything it needs will be right there - but that row will be substantially bigger (maybe that's bad?).
The reason I'm questioning normalizing the 'categories' columns is that over 50% of the queries will be searching based on those columns, which makes me wonder if the organization benefit of having those in their own table, with another table linking the two, is really going to pay off. It seems like the extra resources needed to do the joins isn't going to be worth the cleaner table setup, unless there's another piece of the equation I'm missing.
Also, I'm not trying to say anybody who answers this is wrong; this situation is just a peculiar one, which is why I haven't been able to find a clear answer either way with Google. You know a lot more than me, and I respect that.
One other thing: I'm probably going to go with InnoDB tables over myISAM. 95% of usage will probably be SELECTs, 4% UPDATEs, and 1% INSERTs. MyISAM sounds like it would be faster for selects, but isn't too easy to back up - and with a massive database like this, it would need to be backed up quite often. Is that a good choice? I did some research, but it seemed to turn up more fanatics than science.
I am rather inexperienced with databases, however I can attest that I have never had a problem with speed relating to multiple joins.
As you can see from the query below, there are several tables being joined together. Returning < 10 rows takes 0.0013s on average, while returning 15,000 - 30,000 rows takes 0.1 - 0.2 seconds. Running the query again takes significantly less after that.
CODE
SELECT wo.WORKORDERID AS "ID", wo.TITLE AS "Subject", u.FIRST_NAME AS "Requester Name", u2.FIRST_NAME AS "Technician", cd.CATEGORYNAME AS "Category", scd.NAME AS "Sub Category", FROM_UNIXTIME(wo.CREATEDTIME/1000) AS "Created Date" FROM workorder wo, sduser sdu, workorderstates wos LEFT JOIN aaauser u on u.user_id=sdu.userid LEFT JOIN aaauser u2 on u2.user_id=wos.ownerid LEFT JOIN helpdeskcrew t on t.technicianid=wos.ownerid LEFT JOIN categorydefinition cd on wos.categoryid=cd.categoryid LEFT JOIN subcategorydefinition scd on wos.subcategoryid = scd.subcategoryid WHERE wo.requesterid = sdu.userid AND wo.workorderid=wos.workorderid ORDER BY wo.CREATEDTIME DESC;
Hope this can give you some guidance in making a final judgement.
I develop medical information systems and often come across the problem where a large user base is accessing the database at the same time.
As I mentioned earlier you need to normalise the structure you posted but possibly look at using stored procedures to process the data as you can reduce the server calls which will keep network traffic to a minimum.
If you access the database more than once for the same thing you should look at utilising functions and stored procedures.