8 Replies - 1814 Views - Last Post: 08 July 2013 - 05:55 AM

#1 Aboch

  • D.I.C Head
  • Reputation: 2
  • Posts: 50
  • Joined: 28-April 09

Dealing with big datasets

Posted 18 June 2013 - 03:25 PM

Hi all,

A theory/advice question, without making you read a long explanation: I need to process a lot of data from a lot of tables. About 1070 tables, 120 million records, not normalized.... :death:

So I can't change any of the source issues, but I need to get the data out.... A single complex SQL statement is overtaxing the server, so I was thinking about eating the elephant one bite at a time and tiering my SQL... But I am not sure this is any better or faster than one complex SQL.

e.g.:

Dim userparam As String
Dim SQL1 As String
Dim SQL2 As String
Dim SQL3 As String
Dim SQL4 As String
Dim Secondcriteria As String

' all the other data connection code

SQL1 = "SELECT * FROM firsttable WHERE firstfield = " & userparam
' (concatenating values like this risks SQL injection; parameters are safer)

' throw the results in a datagrid and cycle through it with a For loop,
' using the reference key from each row to query the next table

For i = 0 To datagrid.Rows.Count - 1
    Secondcriteria = datagrid.Rows(i).Cells("referencekey").Value.ToString()
    SQL2 = "SELECT * FROM secondtable WHERE Secondfield = " & Secondcriteria
Next

' rinse and repeat For loops and datagrids until I get the final dataset
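For reference, the loop above issues one query per reference key. Here is roughly what fetching the whole second tier in one set-based query would look like instead (a T-SQL sketch; the table and column names are made up):

DECLARE @userparam varchar(50) = 'some value';  -- hypothetical input

-- Fetch the entire second tier at once instead of one query per key.
SELECT s.*
FROM secondtable AS s
WHERE s.Secondfield IN (SELECT f.referencekey
                        FROM firsttable AS f
                        WHERE f.firstfield = @userparam);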



I am hoping someone has experience with this problem and can give me some advice or references about the issue. Thank you for your time.


Replies To: Dealing with big datasets

#2 cfoley

  • Cabbage
  • Reputation: 1992
  • Posts: 4,139
  • Joined: 11-December 07

Re: Dealing with big datasets

Posted 21 June 2013 - 04:52 AM

Quote

About 1070 tables

W. T. F.

I feel your pain!

A naive question: are the columns you use for searching and joining indexed? Do you have enough space to do that?

#3 P4L

  • Your worst nightmare
  • Reputation: 34
  • Posts: 2,788
  • Joined: 07-February 08

Re: Dealing with big datasets

Posted 26 June 2013 - 11:18 AM

What you should do is create a temp table that stores the information, and then write your simple query against that.

Sadly, I don't know how to do that in VB.NET.
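On the SQL side, though, a minimal sketch of the idea might look like this (assuming SQL Server; all table and column names are made up):

-- Stage the expensive multi-join result once into a temp table...
SELECT c.custnum, c.custname, p.purchaselocation
INTO #staged
FROM customers AS c
JOIN purchases AS p ON p.custnum = c.custnum;

-- ...then run the simple reporting queries against the staged copy.
SELECT purchaselocation, COUNT(*) AS orders
FROM #staged
GROUP BY purchaselocation;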

#4 e_i_pi

  • = -1
  • Reputation: 795
  • Posts: 1,681
  • Joined: 30-January 09

Re: Dealing with big datasets

Posted 27 June 2013 - 06:47 PM

Apologies in advance as I am not a VB.NET coder. I've dabbled in it, but only as a last resort.

I've had to deal with large datasets on a few occasions, generally in the area of what the client expresses as "Broken, no want wait, fixey fixey". Most of the time this is with EAVs of millions of data points, or hierarchical structures tiering as far as 12-15 deep, with millions of branch and leaf nodes.

What's the structure of the data in these 1070 tables? I know you've said they aren't normalised, but how do they relate to each other? Are we talking hierarchical data, flat data spread across multiple tables, nexus data, something else?

#5 Aboch

  • D.I.C Head
  • Reputation: 2
  • Posts: 50
  • Joined: 28-April 09

Re: Dealing with big datasets

Posted 02 July 2013 - 02:44 PM

First off, sorry all, I haven't checked this in a bit, and I appreciate the responses.
cfoley:

As far as indexing goes, if the tables are indexed it is not something that I have seen or have been pointed to. I don't see any planned order to the column structure, and I don't really know the best way to use it if there were one. I just found a bunch of resources about indexing after reading your response, so I'll need to do some reading before I can answer your question correctly.

P4L:

If I understand correctly, that's essentially what I am thinking. I said datagrid, but you're right, a table is better.

e_i_pi:

What I can tell about the structure is that there are primary keys that attach tables together, i.e. table1.num to table2.num. But past that, there are issues like duplicate data across tables and no clear path to get from one table to another efficiently. So if I want data from one table that has names on it and data from another table with any other information, I may have to link 4-5 tables in a chain just to connect them. With some tables holding 300 thousand records and others 6 million, the exponential growth of the intermediate data is out of control. So I don't know what structure it would really fall under; the company that made it said it was relational.... Um, I do see some hierarchical structure, as some keys are unique to a person or item while others are free rein, almost flat. They also have linking-style tables that only serve to bridge some of the more common tables, but those are rare.

I was re-tasked for a minute, but I am back on this, so I'll post something when I have something together and tested.

Thank you all.

#6 e_i_pi

  • = -1
  • Reputation: 795
  • Posts: 1,681
  • Joined: 30-January 09

Re: Dealing with big datasets

Posted 02 July 2013 - 05:42 PM

Presuming there are hundreds of thousands or millions of records, and JOINing in all the information for just one record is extremely expensive, something you could try is batch processing of data. Depending on how much grunt your application server has, you might be able to do it in code in one go (as you are doing), but I would say that this would place an enormous amount of strain not only on the application server but also on the DB server.

An alternate way to do batch processing is to add a column to the primary table that is a boolean flag of whether the record has been processed or not. If the data is being constantly updated, you could set a trigger on the primary table to set that flag to false on any update to a row. If you are constrained to the schema (i.e. - you can't add columns) then you could create another table, even in another DB, which holds the PK information and the flag.

After that, you could create the replete query that JOINs in all the necessary information, and run that query for batches of 10/100/1000/etc at a time. I would do some benchmarking first to see what batch size is appropriate. You might also want to get some output on server load across the day/week as there are times when it is more appropriate to do batch processing (i.e. - night time or the weekend).
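A rough T-SQL sketch of the side-table variant, with all object names hypothetical:

-- A side table tracks which source rows have been processed,
-- without touching the source schema.
CREATE TABLE dbo.ProcessQueue (
    custnum   INT NOT NULL PRIMARY KEY,
    Processed BIT NOT NULL DEFAULT 0
);

-- Pick one batch of unprocessed keys...
DECLARE @batch TABLE (custnum INT PRIMARY KEY);

INSERT INTO @batch (custnum)
SELECT TOP (1000) custnum
FROM dbo.ProcessQueue
WHERE Processed = 0;

-- ...run the full multi-join query for just that batch...
SELECT c.*, p.*
FROM customers  AS c
JOIN @batch     AS b ON b.custnum = c.custnum
JOIN purchases  AS p ON p.custnum = c.custnum;

-- ...and mark the batch as done.
UPDATE q
SET Processed = 1
FROM dbo.ProcessQueue AS q
JOIN @batch AS b ON b.custnum = q.custnum;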

What is it precisely that you are trying to do? Are you updating existing records, collating information into another table, or migrating data?

#7 Aboch

  • D.I.C Head
  • Reputation: 2
  • Posts: 50
  • Joined: 28-April 09

Re: Dealing with big datasets

Posted 03 July 2013 - 08:33 AM

Sorry, I should communicate better: the DB I use has 1070 tables, but I don't need all of them together at the same time. I write Crystal Reports and do comparative statistics on customer data. I can manually build reports using multiple Crystal Reports and MS Access, but I am trying to automate the process so I can hand it to a boss and don't have to manually process the data for them every day. E.g., I just made a report that looked at customer demographics + purchase location + delivery location + purchased service. This took 24 joins and crashed my computer. When I push the processing to the server, you can see the load affect the whole company. :devil2: I appreciate your thoughts; they have given me a lot of reading to do and definitely pointed me in the right direction.

#8 cfoley

  • Cabbage
  • Reputation: 1992
  • Posts: 4,139
  • Joined: 11-December 07

Re: Dealing with big datasets

Posted 03 July 2013 - 08:43 AM

Aaah, so you already CAN deal with the data and the issue is purely performance. Correct?

I would look at indexes before all else. Indexing your primary and foreign keys would help with all the joins. Downsides would be time for creating and space for storing the indexes, and a slight performance hit when inserting rows. Other indexes might help too (for columns you search on) but overdoing it could lead to other problems. I'd start with the keys and see if that makes your scripts run at an acceptable speed without breaking anything else.
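For example, in T-SQL (made-up table and column names):

-- Index the key columns the joins run on.
CREATE INDEX IX_purchases_custnum ON purchases (custnum);
CREATE INDEX IX_deliveries_purchasenum ON deliveries (purchasenum);

-- A frequently-searched column can get one too, but every extra
-- index costs disk space and slows inserts a little.
CREATE INDEX IX_customers_custname ON customers (custname);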

#9 Aboch

  • D.I.C Head
  • Reputation: 2
  • Posts: 50
  • Joined: 28-April 09

Re: Dealing with big datasets

Posted 08 July 2013 - 05:55 AM

Yes, performance is my primary issue. Sorry about that; I wish I were more articulate.

I will do that. Thank you all for your time on this.
