7 Replies - 1160 Views - Last Post: 20 June 2016 - 08:02 AM

#1 oldSwede

How to handle aggregating data from multiple DBs into one?

Posted 15 June 2016 - 01:48 AM

I'm working in a company where we have machines that perform certain production and testing tasks. I am not at liberty to reveal the company name or what the actual products are, so this will be a little abstract; sorry about that.

These tasks generate results which are stored in local SQL Server Express DBs (databases), one for each machine. Each local DB is supposed to be identical to the others regarding layout (design of tables and columns) but of course holds different data.

The company now wants to store a copy of all the data from all machines in one common central DB that is reachable through our company network. The requirement is that we keep the local storage and perform a daily copy of the local data to the central DB; this is because we want production to continue even if there is a network failure.

I'm trying to figure out how this aggregation can be done while still keeping the integrity of the data, and I would really like some thoughts on this.

The problem I see is that we have a lot of tables that are related to each other via one-to-one, one-to-many and many-to-many relations, using row identities to keep the relations. That is, every table has a row identity column "ID" (integer, auto-increment) that is used to maintain the relations.

So a machine has, as an example (this is just a part of it), the following tables (a rough T-SQL sketch follows the list):
* "Gadget" with a column "ID" that identifies the individual gadget, plus a bunch of other columns about the gadget.
* "TestResult" with columns "ID" and "GadgetID" (which points to the individual gadget), plus a bunch of other columns about the gadget test results.
* "GadgetOption" with an "ID" and a bunch of other columns about the gadget's configuration.
* "GadgetOption_Gadget" with an "ID", "GadgetID" and "GadgetOptionID". (Many-to-many table that connects gadgets and options.)

This means that we have several databases, e.g. DB1, DB2, DB3 and so on, that use the same ID values for different gadgets.

This might not be a major problem, since the Gadget.IDs are not used as observable serial numbers, just to identify a gadget in the system. We could change the Gadget.IDs (e.g. by changing int to bigint on the ID columns and adding e.g. n*10^10 for each DBn). But we also have a lot of relations, as described above, that use the row IDs of the tables to keep the relations in the DB. Copying these to an aggregated DB would of course mess everything up completely.

Even though I have some ideas about how to attack this task, I run into problems of one kind or another with every idea I think about. Thus I turn to you guys for input and ideas. :helpsmilie:

How does one go about something like this? :/

This post has been edited by oldSwede: 15 June 2016 - 03:47 AM


Replies To: How to handle aggregating data from multiple DBs into one?

#2 modi123_1

Re: How to handle aggregating data from multiple DBs into one?

Posted 15 June 2016 - 06:53 AM

Quote

These tasks generate results which are stored in local SQL Server Express DBs (databases), one for each machine. Each local DB is supposed to be identical to the others regarding layout (design of tables and columns) but of course holds different data.


Immediate question - why do all the machines have their own databases locally?

Quote

This means that we have several databases, e.g. DB1, DB2, DB3 and so on, that use the same ID values for different gadgets.


The quick way would be to have, in the 'backup'/networked database, a new column called 'machine id' or something along those lines. When ripping data out of the local storage you would provide that id on insert.
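
Something along these lines, roughly (table and column names are just placeholders):

    -- Central/networked copy of the gadget table, with a 'machine id' tacked on.
    CREATE TABLE CentralGadget (
        MachineID INT NOT NULL,   -- which machine the row came from
        GadgetID  INT NOT NULL    -- the ID value from that machine's local DB
        -- ... same other columns as the local Gadget table
    );

    -- The daily copy job supplies the machine id on insert.
    INSERT INTO CentralGadget (MachineID, GadgetID /*, other columns */)
    VALUES (1, 42 /*, other values */);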

#3 oldSwede

Re: How to handle aggregating data from multiple DBs into one?

Posted 15 June 2016 - 07:55 AM

modi123_1 said:

Immediate question - why do all the machines have their own databases locally?


The requirement is that we keep the local storage and perform a daily copy of the local data to the central DB; this is because we want production to continue even if there is a network failure.

And the reason for this is that some of our machines are in places where the network infrastructure has some flaws... :-)


Quote

The quick way would be to have, in the 'backup'/networked database, a new column called 'machine id' or something along those lines. When ripping data out of the local storage you would provide that id on insert.


Would you care to elaborate on this? If we have e.g. the case of a many-to-many, I'd have to add the suggested MachineID in four places to couple one many-to-many relationship.

One in Gadget, one in Option and two in Gadget_Option, which is the coupling table for the many-to-many. It is doable, but I would like to find some other way.

The way I'm thinking of at the moment, which might not be great, is to change all ID columns from int to bigint and add 1001*10^10 to the ID columns of the first machine, 1002*10^10 to the ID columns of the second machine, and so on. That would solve the problem, I think. I'm a bit unsure though about messing with the ID columns, since I don't know how SQL Server will react if I remove the identity/auto-increment property on a table, add 1001*10^10 to all values and then re-enable it. Has anyone tried something like that?
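
As far as I know, SQL Server won't let you simply switch the IDENTITY property off and back on with ALTER COLUMN; you would have to rebuild the table or add a new column. One alternative that leaves the local ID columns untouched would be to apply the offset only while copying into the central DB. A sketch, with the table names assumed and the central ID columns assumed to be bigint:

    -- Offset for the first machine: 1001 * 10^10.
    DECLARE @Offset BIGINT = 1001 * CAST(10000000000 AS BIGINT);

    -- Only needed if the central ID column is also an identity column.
    SET IDENTITY_INSERT CentralGadget ON;

    INSERT INTO CentralGadget (ID /*, other columns */)
    SELECT ID + @Offset /*, other columns */
    FROM StagingGadget;   -- rows previously pulled over from the local DB

    SET IDENTITY_INSERT CentralGadget OFF;

    -- The FK columns (GadgetID and so on) in the related tables would need
    -- the same offset applied so that the relations still line up.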

This post has been edited by oldSwede: 15 June 2016 - 07:57 AM


#4 modi123_1

Re: How to handle aggregating data from multiple DBs into one?

Posted 15 June 2016 - 08:05 AM

Network flaws like so far remote off a satellite footprint you need to shoot the signal about three feet above the immediate horizon to do anything?

I am not sure what there is to elaborate on. If all your local databases have the same table structure, then the central DB would have all the same tables with the additional key of a 'machine id' on all of them. If multiple machines may have redundant data, then slap an additional key on them all for the machine id.

If you want data integrity then I certainly wouldn't advocate messing with existing ID column data types.

Alternatively - have the individual apps do backups of their database and just dump those to an FTP.
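
The backup part itself is just plain T-SQL; roughly this on each machine, with the resulting .bak file then pushed to the FTP site (database name and path are just examples):

    -- Full backup of the local database to a file the upload job can pick up.
    BACKUP DATABASE GadgetLocal
        TO DISK = N'C:\Backups\GadgetLocal.bak'
        WITH INIT;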

#5 oldSwede

Re: How to handle aggregating data from multiple DBs into one?

Posted 15 June 2016 - 08:16 AM

Quote

Network flaws like so far remote off a satellite footprint you need to shoot the signal about three feet above the immediate horizon to do anything?


:-D Nah, it's more like places where they don't maintain the network properly; some countries and sites are understaffed and underfunded, and they don't think of a bad network as a bigger problem than having to reload the web page you were viewing. I have no control over the network infrastructure, and no one pays any attention to my pleas for a better network and a central database with a somewhat reliable connection.


Quote

If you want data integrity then I certainly wouldn't advocate messing with existing ID column data types.


:crazy: Yes, I'm a bit reluctant on that part. It might end with a big :oops: so it would be interesting to hear from someone who has done something like this.

#6 ybadragon

Re: How to handle aggregating data from multiple DBs into one?

Posted 16 June 2016 - 10:45 AM

The way I would tackle this (given you have the machine name): on the production database, have a table specific to those machine names, with an int identity id column on that table. Then all other tables can stay the same as they originally are, but instead of the PK being just GadgetID, make it GadgetID AND MachineID. Then, in the service or whatever you are using to sync these databases, assign the MachineID accordingly. This way you don't have to worry about potentially duplicating primary keys.
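
Roughly like this on the production database (names are just examples):

    -- Table specific to the machine names, with its own identity id column.
    CREATE TABLE Machine (
        MachineID   INT IDENTITY(1,1) PRIMARY KEY,
        MachineName NVARCHAR(128) NOT NULL UNIQUE
    );

    -- Same columns as the local Gadget table, but keyed on (MachineID, GadgetID).
    CREATE TABLE Gadget (
        MachineID INT NOT NULL REFERENCES Machine(MachineID),
        GadgetID  INT NOT NULL,   -- the local identity value, a plain int here
        -- ... the other gadget columns
        CONSTRAINT PK_Gadget PRIMARY KEY (MachineID, GadgetID)
    );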

Edit - I should have read modi's suggestion prior to posting as it is basically the same.

This post has been edited by ybadragon: 16 June 2016 - 10:47 AM


#7 oldSwede

Re: How to handle aggregating data from multiple DBs into one?

Posted 17 June 2016 - 12:36 AM

Yes, thank you! :-)

The main disadvantage with the suggested solution, in my opinion, is that all relations will have to be changed to use this new machine column. E.g. the Gadgets_Options table, which is a many-to-many, will have to handle a two-part gadget ID and a two-part option ID (the local ID plus the machine). The option values may differ from machine to machine for any record, so the options will need the machine column too, and so will most other tables.

The suggested solution, as I understand it, forces me to change the key from "ID" to "Machine + ID" for every table, thus adding the "Machine" column to every table and all relations. Being me, I'm a little worried that this will get messed up somewhere, and since one mistake is enough to cause corruption I'm a bit reluctant.
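
To illustrate, assuming the central tables are keyed on (MachineID, ID) as suggested, the coupling table would end up something like this; note that one MachineID column can serve both foreign keys, since a relation only ever links rows from the same machine:

    -- Assumes central Gadget and GadgetOption tables keyed on (MachineID, <local ID>).
    CREATE TABLE GadgetOption_Gadget (
        MachineID      INT NOT NULL,
        GadgetID       INT NOT NULL,
        GadgetOptionID INT NOT NULL,
        CONSTRAINT PK_GadgetOption_Gadget
            PRIMARY KEY (MachineID, GadgetID, GadgetOptionID),
        CONSTRAINT FK_GOG_Gadget
            FOREIGN KEY (MachineID, GadgetID)
            REFERENCES Gadget (MachineID, GadgetID),
        CONSTRAINT FK_GOG_GadgetOption
            FOREIGN KEY (MachineID, GadgetOptionID)
            REFERENCES GadgetOption (MachineID, GadgetOptionID)
    );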

#8 ybadragon

Re: How to handle aggregating data from multiple DBs into one?

Posted 20 June 2016 - 08:02 AM

A way to make sure nothing is missed: download a trial of the Redgate SQL Toolbelt (if you are using SSMS; it is an awesome tool I use at work). Right-click the main table and map its dependencies. This will show you all connected stored procedures/functions/triggers/tables/etc. that reference your primary key. Write a script that makes the necessary changes for all the mapped objects, then use a tool that comes with the Toolbelt called SQL Compare. Point SQL Compare at your main DB as the source and a database that doesn't have your changes as the target. Generate the deployment script and send the script to whoever needs it to run on their in-house database. I believe the last step may not be necessary, as there is another Redgate tool that will deploy to multiple databases at once, so long as you can reach them remotely.
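
For the table-to-table part of that dependency map, the catalog views can also be queried directly; a plain T-SQL sketch (the table name is just an example):

    -- Foreign keys that reference dbo.Gadget, and the referencing columns.
    SELECT
        fk.name                          AS foreign_key,
        OBJECT_NAME(fk.parent_object_id) AS referencing_table,
        c.name                           AS referencing_column
    FROM sys.foreign_keys AS fk
    JOIN sys.foreign_key_columns AS fkc
        ON fkc.constraint_object_id = fk.object_id
    JOIN sys.columns AS c
        ON c.object_id = fkc.parent_object_id
       AND c.column_id = fkc.parent_column_id
    WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.Gadget');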
