3 Replies - 293 Views - Last Post: 05 June 2013 - 06:27 AM

#1 Tenderfoot

Query speed (a matter of optimization)

Posted 05 June 2013 - 02:55 AM

I have a database that looks like this:

CREATE TABLE Manufacturer
(
    manufacturer_ID INT AUTO_INCREMENT NOT NULL,
    name VARCHAR(64) NOT NULL,
    CONSTRAINT manufacturer_PK PRIMARY KEY(manufacturer_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE PhoneType
(
    phoneType_ID INT AUTO_INCREMENT NOT NULL,
    name VARCHAR(128) UNIQUE NOT NULL,
    color VARCHAR(32),
    CONSTRAINT phoneType_PK PRIMARY KEY(phoneType_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE InvoicingPartner
(
    invoicingPartner_ID INT AUTO_INCREMENT NOT NULL,
    name VARCHAR(64),
    CONSTRAINT invoicingPartner_PK PRIMARY KEY(invoicingPartner_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE Invoice
(
    invoice_ID INT AUTO_INCREMENT NOT NULL,
    invoiceNumber INT NOT NULL UNIQUE,
    invoicingPartner_ID INT NOT NULL,
    CONSTRAINT Invoice_PK PRIMARY KEY(invoice_ID),
    CONSTRAINT Invoice_InvoicingPartner_FK FOREIGN KEY(invoicingPartner_ID)
        REFERENCES InvoicingPartner(invoicingPartner_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE Customer
(
    customer_ID INT AUTO_INCREMENT NOT NULL,
    name VARCHAR(64),
    CONSTRAINT Customer_PK PRIMARY KEY(customer_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE Supplier
(
    supplier_ID INT AUTO_INCREMENT NOT NULL,
    name VARCHAR(64),
    CONSTRAINT Supplier_PK PRIMARY KEY(supplier_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE Location
(
    location_ID INT AUTO_INCREMENT NOT NULL,
    name VARCHAR(64),
    CONSTRAINT Location_PK PRIMARY KEY(location_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;

CREATE TABLE Phone
(
    phone_ID INT AUTO_INCREMENT NOT NULL,
    imeiNumber CHAR(14) UNIQUE NOT NULL,
    phoneType_ID INT NOT NULL,
    manufacturer_ID INT NOT NULL,
    invoice_ID INT NOT NULL,
    customer_ID INT NOT NULL,
    supplier_ID INT,
    location_ID INT,
    date_of_delivery DATE NOT NULL,
    date_of_credit DATE,
    date_of_note DATE,
    note VARCHAR(256),
    salesPrice INT NOT NULL,
    creditAmount INT,
    CONSTRAINT phone_PK PRIMARY KEY(phone_ID),
    CONSTRAINT Phone_to_PhoneType_FK FOREIGN KEY(phoneType_ID)
        REFERENCES PhoneType(phoneType_ID),
    CONSTRAINT Phone_to_Manufacturer_FK FOREIGN KEY(manufacturer_ID)
        REFERENCES Manufacturer(manufacturer_ID),
    CONSTRAINT Phone_to_Invoice_FK FOREIGN KEY(invoice_ID)
        REFERENCES Invoice(invoice_ID),
    CONSTRAINT Phone_to_Customer_FK FOREIGN KEY(customer_ID)
        REFERENCES Customer(customer_ID),
    CONSTRAINT Phone_to_Supplier_FK FOREIGN KEY(supplier_ID)
        REFERENCES Supplier(supplier_ID),
    CONSTRAINT Phone_to_Location_FK FOREIGN KEY(location_ID)
        REFERENCES Location(location_ID)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_icelandic_ci ENGINE=InnoDB;



My boss wants to be able to read roughly 10,000 records (rows) of about 14 columns each from an Excel file and put them into the database. This is a one-time job and will probably take a while, and that's fine. But given that just reading through the Excel file and sorting out the duplicates took my program about 15 minutes, I'm afraid inserting into the database will increase that time significantly.

The route I was thinking of taking was:
1. Check whether each field's value exists in the database. Phone references most of the tables, so I need to look up the IDs there before I insert.
2. Insert the row.

This needs to happen ten thousand times up front, and after that for anywhere from 100 to 300 rows at a time. Is there a better way to go about this? I need to do this for the following fields:

"phoneType_ID INT NOT NULL,
manufacturer_ID INT NOT NULL,
invoice_ID INT NOT NULL,
customer_ID INT NOT NULL,
supplier_ID INT,
location_ID INT,"

That's six lookup queries per row - is there something I'm missing here?

An example would be: if the Excel field for the phone type is "Lumia 710", I'd need to check the PhoneType table for the ID of "Lumia 710" so that I could insert that ID into the Phone table. I'd need to do the same for the other five fields, and then make sure to display an error if a value doesn't exist there at all, e.g. if the Excel document had "Lumia 710" written with two spaces. Then, finally, I'd insert the row. I wouldn't see a problem with this if I were only inserting a few records at a time, but he needs to insert those first 10,000, and then 200-300 more at once in the future. And waiting is not an option.
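Concretely, I suppose I could collapse the six lookups and the insert into a single INSERT ... SELECT per row, something like this (the literal values are just placeholders standing in for one row read from the Excel file, not real data):

INSERT INTO Phone (imeiNumber, phoneType_ID, manufacturer_ID, invoice_ID,
                   customer_ID, date_of_delivery, salesPrice)
SELECT '35693803200139', pt.phoneType_ID, m.manufacturer_ID,
       i.invoice_ID, c.customer_ID, '2013-06-01', 25000
FROM PhoneType pt, Manufacturer m, Invoice i, Customer c
WHERE pt.name = 'Lumia 710'      -- each condition replaces one lookup query
  AND m.name = 'Nokia'
  AND i.invoiceNumber = 1001
  AND c.name = 'Some Customer';

If that reports zero affected rows, one of the values didn't match and I can flag the row as an error (supplier_ID and location_ID are nullable, so I've left them out; with them it's the same pattern with two more tables). But that's still one statement per row.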


Replies To: Query speed (a matter of optimization)

#2 baavgai

Re: Query speed (a matter of optimization)

Posted 05 June 2013 - 03:59 AM

Create a "staging" or "interface" table. Into that table, load all the data from the Excel sheet in its natural order. No constraints on that table; you're just dumping.

Once you have the raw data in a table, then you can start thinking about how to more cleanly organize it. The database has the best tools to move and query data. You might as well use it.
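For example, something like this, assuming the sheet is exported to CSV first (the file path and staging column names are only illustrative):

CREATE TABLE PhoneStaging
(
    imei VARCHAR(32),
    phone_type VARCHAR(128),
    manufacturer VARCHAR(64),
    invoice_number VARCHAR(32),
    customer VARCHAR(64),
    supplier VARCHAR(64),
    location VARCHAR(64),
    delivered VARCHAR(32),
    sales_price VARCHAR(32)
) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;

-- one bulk load instead of thousands of single-row inserts
LOAD DATA LOCAL INFILE '/tmp/phones.csv'
INTO TABLE PhoneStaging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

Everything is a loose VARCHAR so a bad cell can't fail the dump; cleanup and the moves into the real tables happen afterwards in SQL, e.g. deriving any new phone types:

INSERT INTO PhoneType (name)
SELECT DISTINCT TRIM(s.phone_type)
FROM PhoneStaging s
WHERE s.phone_type IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM PhoneType t WHERE t.name = TRIM(s.phone_type));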

Your Manufacturer, Supplier, Customer, and Location tables are probably unnecessary. Having a table for a few grouped fields makes sense. Having a table for a single value, essentially just replacing that value with a key value instead, does not.

#3 Tenderfoot

Re: Query speed (a matter of optimization)

Posted 05 June 2013 - 04:22 AM

baavgai, on 05 June 2013 - 03:59 AM, said:

Create a "staging" or "interface" table. Into that table, load all the data from the Excel sheet in its natural order. No constraints on that table; you're just dumping.

Once you have the raw data in a table, then you can start thinking about how to more cleanly organize it. The database has the best tools to move and query data. You might as well use it.


It needs to be automated though (unsure if that matters).

baavgai, on 05 June 2013 - 03:59 AM, said:

Your Manufacturer, Supplier, Customer, and Location tables are probably unnecessary. Having a table for a few grouped fields makes sense. Having a table for a single value, essentially just replacing that value with a key value instead, does not.


It's more that, well, say the location is 'company name' on one phone in the Excel document and 'company nAme' on another. That's a bad example, actually, because that could be fixed; but say they added an extra space or a dot by accident. I wouldn't want them to create a new location in the database for a location that already exists, so I figured I'd have them make a list of allowed locations, and the program would simply stop them if their document contained an error. The location field as it stands has maybe 3-4 values, the customers 1-2, and it's the same for manufacturers/suppliers.
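With a staging table like the one sketched above, that whitelist check would be a single query per lookup table instead of one per row, e.g. (reusing the hypothetical PhoneStaging names):

-- rows whose location matches nothing in the allowed list;
-- if this returns anything, report it and abort the import
SELECT s.imei, s.location
FROM PhoneStaging s
LEFT JOIN Location l ON l.name = TRIM(s.location)
WHERE s.location IS NOT NULL
  AND l.location_ID IS NULL;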

#4 rgfirefly24

Re: Query speed (a matter of optimization)

Posted 05 June 2013 - 06:27 AM

I've done this two different ways, and both were pretty good for speed and data integrity. The first was to create a stored proc that takes the file and BCPs it into a staging table (like baavgai described), and then have another procedure go in, remove duplicates, and parse the data out to the respective tables.

EX:

-- @FullFilePath and @Delimiter are assumed to be parameters of the stored proc
DECLARE @SQL NVARCHAR(MAX)
SET @SQL = N'BULK INSERT #BulkData FROM ''' + @FullFilePath + ''' WITH (FIRSTROW=2, FIELDTERMINATOR=''' + @Delimiter + ''', ROWTERMINATOR=''\n'')'

EXEC(@SQL) -- import the data



#BulkData is a temp table created to hold the data until it's moved into the real tables.
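For completeness, a sketch of what that temp table could look like (the column names are illustrative; they just have to line up with the file's column order):

CREATE TABLE #BulkData
(
    imei VARCHAR(32),
    phone_type VARCHAR(128),
    manufacturer VARCHAR(64),
    invoice_number VARCHAR(32),
    customer VARCHAR(64),
    supplier VARCHAR(64),
    location VARCHAR(64),
    delivered VARCHAR(32),
    sales_price VARCHAR(32)  -- loose text columns so one bad cell can't fail the load; cast during the parse-out step
)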

The second method I've used takes advantage of the XML Bulk Load functionality of SQL Server. Basically, I used a program that would take the file, run an XSL transformation on it to convert it to XML, and then, using an XSD mapping, load the XML file into the database and run a post-bulk-load procedure on it. Here is an example: XmlBulkLoad
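If you'd rather stay entirely in T-SQL instead of using the SQLXML component, a rough equivalent is to pull the file in with OPENROWSET and shred it with nodes() (the path and element names here are made up for illustration):

DECLARE @X XML

-- read the whole XML file into a variable
SELECT @X = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\data\phones.xml', SINGLE_BLOB) AS f

-- shred one <phone> element per row into the staging table
INSERT INTO #BulkData (imei, phone_type)
SELECT p.value('(imei)[1]', 'VARCHAR(32)'),
       p.value('(phoneType)[1]', 'VARCHAR(128)')
FROM @X.nodes('/phones/phone') AS t(p)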


