Quick question on technique to use here :

How to handle 3MB database inside my program

9 Replies - 837 Views - Last Post: 31 December 2010 - 03:29 AM

#1 sixpindin

Posted 17 December 2010 - 04:50 AM

Hi there,

Thanks for reading my question!

I have a 3MB text file that stores all the data I need for a lot of calculations. This is a database file, but it's not in a standard format. It's from an old bespoke system that used a text file for its database. It's not XML or anything, it's just text with quite complicated delimiters.

I need to take this file, import it and represent the records in a way that I can use from within my program.

I was just thinking of creating classes for each record from each of the tables in the flat-file. I would then import the file and create a new object for each record that I read from the file.

In my main class (for working with the file) I would have instance variables for each table in the file, and these would be arrays of record objects.
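A minimal sketch of that plan, for illustration only. The class names (`RateRecord`, `FlatFileDatabase`) and the two fields are hypothetical, since the real table layout is not shown in the thread. Records are collected in an `ArrayList` while reading, because the count is not known up front, and then copied into a typed array.

```csharp
using System;
using System.Collections;

// Hypothetical record type: one such class per table in the flat file.
// The field names are invented for illustration.
public class RateRecord
{
    public string Prefix;
    public decimal PricePerMinute;

    public RateRecord(string prefix, decimal pricePerMinute)
    {
        Prefix = prefix;
        PricePerMinute = pricePerMinute;
    }
}

// The "main class": one array of record objects per table.
public class FlatFileDatabase
{
    public RateRecord[] Rates = new RateRecord[0];

    // Copies already-parsed records into the typed array.
    public void LoadRates(ArrayList parsed)
    {
        Rates = (RateRecord[])parsed.ToArray(typeof(RateRecord));
    }
}

public class RecordArrayDemo
{
    public static void Main()
    {
        ArrayList parsed = new ArrayList();
        parsed.Add(new RateRecord("44", 0.12m));
        parsed.Add(new RateRecord("1", 0.05m));

        FlatFileDatabase db = new FlatFileDatabase();
        db.LoadRates(parsed);
        Console.WriteLine(db.Rates.Length);
    }
}
```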


Does this sound silly? Please can anybody advise on the best technique for what I'm trying to do?

Thanks,
Jamie


Replies To: Quick question on technique to use here :

#2 BigR1983

Posted 17 December 2010 - 05:23 AM

I do something similar with an application that I have.

I read in the records (mine are in XML, which makes it a bit easier), create a new object for each record, and store them in a List.

#3 Zunera

Posted 17 December 2010 - 05:44 AM

Hi Jamie,

your approach in general sounds fine to me as it is object-oriented. But ADO.NET (System.Data) already provides classes like DataSet, DataTable, DataColumn and DataRow to handle database-like data structures.

So I would use DataTables as the basis for your tables (you can also use the typed ones, which make your life easier if you define a property for each field within a record), define the corresponding DataColumns, and then add a DataRow to the corresponding DataTable for each record in your text file. You may also use a DataSet to hold all your DataTables, and then you have a real working database structure you can use in the .NET environment, making use of DataGrid, data binding, ... and storing to / restoring from an XML file.

I found these links with a quick Google search:
http://allabout-dotn...-datatable.html
http://msdn.microsof...v=vs.71%29.aspx

BTW: Don't be bothered by SqlDataAdapter and other connection stuff if you stumble over it in some articles. As you have a very special database file, you have to fill your DataTables manually, as you already mentioned.
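As a rough sketch of filling a DataTable by hand, with no adapter or connection involved: the table name `Rates` and its two columns are made up for illustration, and in a real importer each `Rows.Add` call would be driven by one parsed line of the text file.

```csharp
using System;
using System.Data;

public class DataTableSketch
{
    // Builds a DataSet holding one hypothetical "Rates" table, filled manually.
    public static DataSet BuildDataSet()
    {
        DataTable rates = new DataTable("Rates");
        rates.Columns.Add("Prefix", typeof(string));
        rates.Columns.Add("PricePerMinute", typeof(decimal));

        // One row per record; the values here are placeholders.
        rates.Rows.Add(new object[] { "44", 0.12m });
        rates.Rows.Add(new object[] { "1", 0.05m });

        DataSet ds = new DataSet("TariffDatabase");
        ds.Tables.Add(rates);
        return ds;
    }

    public static void Main()
    {
        DataSet ds = BuildDataSet();
        foreach (DataRow row in ds.Tables["Rates"].Rows)
        {
            Console.WriteLine("{0} -> {1}", row["Prefix"], row["PricePerMinute"]);
        }
    }
}
```

Once the DataSet is populated this way, `DataSet.WriteXml` and `DataSet.ReadXml` cover the store/restore-to-XML part mentioned above.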

#4 Sergio Tapia

Posted 17 December 2010 - 05:46 AM

It depends really. Is your text file using a delimiter for each value in a record? You mention it uses a non-standard format; can you elaborate?

I would create a nice POCO object that acts as a model and then create an interface, IRecordFinder. That way you can create different concrete implementations of the act of fetching records. Right now you'll be handling only text files, but maybe you'll need a different information source in the future.
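One way this could look, as a sketch under assumptions: only the interface name `IRecordFinder` comes from the post. The `Rate` model, the `FindByPrefix` method and the `InMemoryRecordFinder` class are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections;

// Plain model object (POCO); the fields are hypothetical.
public class Rate
{
    public string Prefix;
    public decimal PricePerMinute;

    public Rate(string prefix, decimal pricePerMinute)
    {
        Prefix = prefix;
        PricePerMinute = pricePerMinute;
    }
}

// Abstraction over where records come from.
// The method name is an assumption; the post only names the interface.
public interface IRecordFinder
{
    Rate FindByPrefix(string prefix);
}

// One concrete implementation, backed by records already read from the text file.
public class InMemoryRecordFinder : IRecordFinder
{
    private readonly ArrayList rates;

    public InMemoryRecordFinder(ArrayList rates)
    {
        this.rates = rates;
    }

    public Rate FindByPrefix(string prefix)
    {
        foreach (Rate r in rates)
        {
            if (r.Prefix == prefix) return r;
        }
        return null; // no matching record
    }
}

public class RecordFinderDemo
{
    public static void Main()
    {
        ArrayList rates = new ArrayList();
        rates.Add(new Rate("44", 0.12m));

        IRecordFinder finder = new InMemoryRecordFinder(rates);
        Rate hit = finder.FindByPrefix("44");
        Console.WriteLine(hit == null ? "not found" : hit.PricePerMinute.ToString());
    }
}
```

Calling code depends only on `IRecordFinder`, so a second implementation reading from another source could be swapped in later without touching it.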

Give us some more information.


#5 Curtis Rutland

Posted 17 December 2010 - 08:17 AM

Zunera, on 17 December 2010 - 05:44 AM, said:

But ADO.NET (System.Data) already provides classes like DataSet, DataTable, DataColumn and DataRow to handle database-like data structures.


I think that depends on the paradigms you are comfortable with. If you already know ADO.NET, then that is a great suggestion.

If you're "tabula rasa" or have a LINQ background, creating your own objects and using Lists or IEnumerables of them is also a great way to go, and more "current" (in the sense that MS is taking the language in that direction). You can query your collections with LINQ.

This is really just a style preference, but it can be important.
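To illustrate the custom-objects-plus-LINQ style, here is a small sketch. The `CallRate` class and the `CheaperThan` helper are hypothetical, and the code needs a framework version that ships LINQ (`System.Linq`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record class with invented properties.
public class CallRate
{
    public string Prefix { get; set; }
    public decimal PricePerMinute { get; set; }
}

public class LinqSketch
{
    // Returns the rates below the given price, ordered by prefix.
    public static List<CallRate> CheaperThan(IEnumerable<CallRate> rates, decimal limit)
    {
        return rates.Where(r => r.PricePerMinute < limit)
                    .OrderBy(r => r.Prefix)
                    .ToList();
    }

    public static void Main()
    {
        List<CallRate> rates = new List<CallRate>
        {
            new CallRate { Prefix = "44", PricePerMinute = 0.12m },
            new CallRate { Prefix = "1", PricePerMinute = 0.05m },
            new CallRate { Prefix = "33", PricePerMinute = 0.30m }
        };

        foreach (CallRate r in CheaperThan(rates, 0.20m))
        {
            Console.WriteLine("{0}: {1}", r.Prefix, r.PricePerMinute);
        }
    }
}
```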

#6 Zunera

Posted 17 December 2010 - 10:04 AM

insertAlias, on 17 December 2010 - 02:17 PM, said:

I think that depends on the paradigms you are comfortable with. If you already know ADO.NET, then that is a great suggestion.

If you're "tabula rasa" or have a LINQ background, creating your own objects and using Lists or IEnumerables of them is also a great way to go, and more "current" (in the sense that MS is taking the language in that direction). You can query your collections with LINQ.

This is really just a style preference, but it can be important.

I agree that it depends on the preferences you have. But I guess my suggestion is great either way ;) because it saves "tabula rasa" guys a lot of work creating their own database structure from scratch. And since you point out the usage of LINQ (to Objects): it's also applicable to the ADO.NET data structures.
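A short sketch of LINQ over an ADO.NET table, using the `AsEnumerable` and `Field<T>` extension methods from LINQ to DataSet. The table layout is invented for the example, and on the .NET Framework this needs a reference to the System.Data.DataSetExtensions assembly.

```csharp
using System;
using System.Data;
using System.Linq;

public class LinqToDataSetSketch
{
    // Counts the rows whose (hypothetical) PricePerMinute column is below the limit.
    public static int CountCheaperThan(DataTable rates, decimal limit)
    {
        return rates.AsEnumerable()
                    .Count(row => row.Field<decimal>("PricePerMinute") < limit);
    }

    public static DataTable BuildSampleTable()
    {
        DataTable rates = new DataTable("Rates");
        rates.Columns.Add("Prefix", typeof(string));
        rates.Columns.Add("PricePerMinute", typeof(decimal));
        rates.Rows.Add(new object[] { "44", 0.12m });
        rates.Rows.Add(new object[] { "1", 0.05m });
        rates.Rows.Add(new object[] { "33", 0.30m });
        return rates;
    }

    public static void Main()
    {
        Console.WriteLine(CountCheaperThan(BuildSampleTable(), 0.20m));
    }
}
```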

#7 sixpindin

Posted 17 December 2010 - 03:41 PM

Hi BigR1983, Zunera, Sergio, insertAlias,

Thank you so much for your time. I'm still not used to C# yet (coming from Visual Basic 5) but I've spent all day getting to grips with all your comments. It's really given me something to go on.

Sergio:
To answer your questions: this is for parsing a file that holds information on how to charge for calls on a telecoms network. Despite the size of this company, the system is humble enough and was built by a local company 15 years ago. Rates often change and the calculation of the cost of a call is actually quite complex. For instance, we need to know a lot of information about the kind of price plan the caller is subscribed to, and also a plethora of information about the circumstances in which the call is made (where? when? duration? does it minute-round? where are they calling to, which is derived from the prefix?), etc.
All this makes the database file very big (currently about 3.2MB in plain-text form).

Delimiters are odd. Fields seem to be delimited by whitespace of more than 10 spaces (but it could be more). It's really very annoying.
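A possible starting point for that kind of delimiter is a regular expression that splits only on long runs of spaces, so single spaces inside a field survive. This is a sketch: the threshold of 10 is taken from the rough figure above and would need checking against the real file, and the sample line is invented.

```csharp
using System;
using System.Text.RegularExpressions;

public class DelimiterSketch
{
    // Assumption: a run of at least this many spaces separates two fields.
    private const int MinSeparatorWidth = 10;

    private static readonly Regex FieldSeparator =
        new Regex(" {" + MinSeparatorWidth + ",}");

    // Splits one line into fields; shorter runs of spaces stay inside a field.
    public static string[] SplitFields(string line)
    {
        return FieldSeparator.Split(line.Trim());
    }

    public static void Main()
    {
        // Invented sample line: three fields, the first containing a single space.
        string line = "LONDON AREA" + new string(' ', 12) + "44" + new string(' ', 10) + "0.12";
        foreach (string field in SplitFields(line))
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```

Note that this approach cannot tell when a field is missing from a line, because two adjacent separators merge into one long run of spaces.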

Everybody:

After today (reading a lot) I feel that, since ADO and POCO seem to be about the same amount of work, I would've gone for POCO because it feels more efficient.
The problem with that is, LINQ is only supported in .NET 3.5 and above.

I need this tool, 'Bertha' (anybody remember the 80s children's program??), to be compatible with the earliest versions of .NET. I would like to compile against .NET version 1.1.

Zunera's suggestion of ADO is one that I would go for, except I worry about performance.
I need to do as many lookups as I can, as fast as I can, as this impacts how many telephone events (calls, MMS, SMS, data, etc.) I can simulate in the time the tool would have (about 10 minutes).


All that said, I'm feeling that POCO is the way to go here, and that (since I know that the inputs will never change) I would just build my query methods into the POCO that holds the database.
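A sketch of what a POCO database with built-in query methods might look like without generics or LINQ, using the non-generic `Hashtable` as an index. All names are hypothetical. The longest-prefix lookup is only a guess at how a destination might be derived from a dialled number; the thread does not describe the real rule.

```csharp
using System;
using System.Collections;

// Hypothetical record; the real fields are not shown in the thread.
public class TariffRecord
{
    public string Prefix;
    public decimal PricePerMinute;

    public TariffRecord(string prefix, decimal pricePerMinute)
    {
        Prefix = prefix;
        PricePerMinute = pricePerMinute;
    }
}

public class TariffDatabase
{
    // Index from prefix to record, so lookups avoid scanning every record.
    private readonly Hashtable byPrefix = new Hashtable();

    public void Add(TariffRecord record)
    {
        byPrefix[record.Prefix] = record;
    }

    // Exact lookup by prefix; returns null when the prefix is unknown.
    public TariffRecord FindExact(string prefix)
    {
        return (TariffRecord)byPrefix[prefix];
    }

    // Assumed rule: the longest stored prefix of the dialled number wins.
    public TariffRecord FindForNumber(string dialled)
    {
        for (int len = dialled.Length; len > 0; len--)
        {
            TariffRecord hit = (TariffRecord)byPrefix[dialled.Substring(0, len)];
            if (hit != null) return hit;
        }
        return null;
    }
}

public class TariffDemo
{
    public static void Main()
    {
        TariffDatabase db = new TariffDatabase();
        db.Add(new TariffRecord("44", 0.12m));
        db.Add(new TariffRecord("447", 0.30m));
        Console.WriteLine(db.FindForNumber("447700900123").Prefix);
    }
}
```

Each `FindForNumber` call does at most one hash lookup per digit of the dialled number, regardless of how many records the table holds.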

I know it's not cool to think "but the inputs never change", however, in this case, they won't, and if they do, they will not resemble this system in any way.

Please, if you have a moment, review what I've said and let me know if you think I'm being rational!


Thanks again,
Jamie

#8 Zunera

Posted 18 December 2010 - 05:29 AM

Hi Jamie,

it's quite good that you really think about the basics before starting the project. You are on the right track already ;)

I don't really get the part about this 'Bertha' tool: I don't know what it is, and I don't know why it's important for you to compile against .NET 1.1?! But in that case you won't have access to the LINQ libraries, which makes the POCO approach even more unattractive to me...
All I see in your posts is that you have a database (or database-like organized data) but no data structure in .NET, and that's exactly the reason why ADO was invented.

However, it's not my aim to advertise ADO.NET, as I only use it myself when it makes sense (our company uses a 3rd-party library and not ADO at all). I just want to make sure you don't make the same mistake I once did: implementing a lot of functions that already exist, with the result of less functionality, lower performance and much more wasted time in the end.

So let me just ask you some more questions:

1. Amount of data? - ADO is sufficient for up to 500,000 DataRows per table (my personal experience); do POCO if you are working with more
2. Is it all about performance? - do POCO and be sure you read about hashing, sorting, trees, ...; but you don't need to have general doubts about the performance of ADO => it surely has some overhead because of its structure, but it's implemented very well and probably performs better than self-written functions that were never optimized
3. Is the time to implement a factor? - ADO really(!) needs some time to get into all its possibilities and to do it right, but then it goes quickly; with your own classes it all depends on your programming skills - you might only be faster at first glance
4. Does your data need a visualization in .NET (e.g. within a grid)? - I would prefer ADO
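On the performance question, one ADO.NET feature worth trying before writing custom lookup code is a primary key on the DataTable, which lets `Rows.Find` locate a row by key value instead of walking the rows yourself. A minimal sketch with an invented table layout:

```csharp
using System;
using System.Data;

public class IndexedTableSketch
{
    // Builds a hypothetical "Rates" table keyed on its Prefix column.
    public static DataTable BuildRates()
    {
        DataTable rates = new DataTable("Rates");
        DataColumn prefix = rates.Columns.Add("Prefix", typeof(string));
        rates.Columns.Add("PricePerMinute", typeof(decimal));

        // Rows.Find only works once a primary key has been declared.
        rates.PrimaryKey = new DataColumn[] { prefix };

        rates.Rows.Add(new object[] { "44", 0.12m });
        rates.Rows.Add(new object[] { "1", 0.05m });
        return rates;
    }

    // Returns the matching row, or null when the key is not present.
    public static DataRow FindRate(DataTable rates, string prefix)
    {
        return rates.Rows.Find(prefix);
    }

    public static void Main()
    {
        DataRow row = FindRate(BuildRates(), "44");
        Console.WriteLine(row == null ? "not found" : row["PricePerMinute"].ToString());
    }
}
```

Whether this is fast enough for a given workload is something only a measurement on the real data can answer.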

At the end insertAlias said it right:

Quote

This is really just a style preference, but it can be important.

Hf

#9 Sergio Tapia

Posted 18 December 2010 - 10:52 AM

Can you post a sample of what the database information looks like? To see delimiters, and other aspects.

#10 Guest_sixpindin*

Posted 31 December 2010 - 03:29 AM

Hi Sergio,

I've looked for an example of the more complicated table definitions but have come up against a brick wall: I can't get away with posting too detailed information about the tables, due to client confidentiality. The annoying thing is that sometimes a field may be missing, which means I cannot just split the stream of rows on 'whitespace': if the middle field of a record were missing, I'd incorrectly use the last field as the second field.
There appears to be a maximum length of whitespace before it becomes a valid delimiter, I think about 10 spaces.
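If the fields happen to start at fixed column positions (an assumption; nothing in the thread confirms it), slicing each line by offset instead of splitting on whitespace would keep a missing middle field as an empty string rather than shifting the later fields. A sketch with made-up offsets:

```csharp
using System;

public class FixedWidthSketch
{
    // Hypothetical column start offsets; real ones would be measured from the file.
    private static readonly int[] Starts = { 0, 20, 40 };

    // Cuts the line at the column offsets; absent fields come back as "".
    public static string[] SliceFields(string line)
    {
        string[] fields = new string[Starts.Length];
        for (int i = 0; i < Starts.Length; i++)
        {
            int start = Starts[i];
            int end = (i + 1 < Starts.Length) ? Starts[i + 1] : line.Length;

            if (start >= line.Length)
            {
                fields[i] = ""; // line ends before this column begins
                continue;
            }
            if (end > line.Length) end = line.Length;

            fields[i] = line.Substring(start, end - start).Trim();
        }
        return fields;
    }

    public static void Main()
    {
        // Invented line whose middle field is missing.
        string line = "LONDON".PadRight(20) + "".PadRight(20) + "0.12";
        string[] fields = SliceFields(line);
        Console.WriteLine("[{0}] [{1}] [{2}]", fields[0], fields[1], fields[2]);
    }
}
```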

Zunera,

Thanks for that reply. You covered a lot of my own considerations, thanks.
Here are some answers to your questions:

1. Amount of data? - ADO is sufficient for up to 500,000 DataRows per table (my personal experience); do POCO if you are working with more
Biggest table is about 15,000 records. The entire database is about 50,000 rows of data spread throughout 19 tables.


2. Is it all about performance? - do POCO and be sure you read about hashing, sorting, trees, ...; but you don't need to have general doubts about the performance of ADO => it surely has some overhead because of its structure, but it's implemented very well and probably performs better than self-written functions that were never optimized
The big table (15,000 rows) must be able to return a full-search query at least once per 0.5 seconds.

3. Is the time to implement a factor? - ADO really(!) needs some time to get into all its possibilities and to do it right, but then it goes quickly; with your own classes it all depends on your programming skills - you might only be faster at first glance
Time to implement is a factor, but I have all Christmas to make this part (importing of the database) work.

4. Does your data need a visualization in .NET (e.g. within a grid)? - I would prefer ADO
This would be a very good future feature.

5. Why compile against .NET 1.1?
The client machines use .NET 1.1, and restrictions mean it would not be practical for every user to upgrade their framework version.

The main reason I was thinking POCO was because of the requirements I have for searching and returning data from a 15,000-row table as fast as possible.

Perhaps ADO will be best then, and if the speed is not acceptable, then I can create a special class for stuff that needs to be faster.