File Archiver Application


23 Replies - 689 Views - Last Post: 12 June 2013 - 03:00 PM

#1 madmorgan

Posted 12 June 2013 - 12:04 PM

Hello,

I am working on a prototype application where I need to read a file at the byte level and then add each byte to a database. I have written a sample application to test with. It works fine, but it is really slow at adding the data to the database: a simple 4.60 MB file took about 20 minutes. I read the file in parts of 8 bytes so that, if I add a second file to the database and there is already a bit pattern that matches, I can reference that database row instead of adding a second copy of the same bit pattern.

example

File A = [112][12][1][2][32][56][245][90]
File B = [1] [2] [90] [66] [80] [91] [78] [99]

The database will have all of File A's bytes in it. I then add File B and, instead of adding the bytes [1] [2] again, it would reference the File A indexes.

I just need a way to speed it up.

Thanks

Here is my code.

using System;
using System.Collections.Generic;
using System.Data.SQLite;
using System.Linq;
using System.Net.Mime;
using System.Security.Cryptography;
using System.Text;
using System.IO;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace ConsoleApplication1
{
    class Program
    {
        private static void Main(string[] args)
        {
            CreateDatabaseAndTables(Path.Combine(Application.StartupPath, "MyDatabase.db3"));
            ConvertFileIntoByteArray(@"C:\Users\jm136063\Downloads\BlendWPFSDK_en.msi");
        }

        private static void CreateDatabaseAndTables(string database)
        {
            if (!File.Exists(database))
            {
                try
                {
                    SQLiteConnection.CreateFile(database);

                    SQLiteConnection m_dbConnection = new SQLiteConnection("Data Source=" + database + ";Version=3;");
                    m_dbConnection.Open();


                    string sql = "create table BytesTable (id INTEGER PRIMARY KEY AUTOINCREMENT, Bytes VARCHAR(1000))";
                    SQLiteCommand command = new SQLiteCommand(sql, m_dbConnection);
                    command.ExecuteNonQuery();

                    sql =
                        "create table FilesTable (id INTEGER PRIMARY KEY AUTOINCREMENT, FileName VARCHAR(100), OriginalLocation VARCHAR(200), CheckSum VARCHAR(150), Version int, ByteIndexes VARCHAR(1000), DeleteMarked int)";
                    command = new SQLiteCommand(sql, m_dbConnection);
                    command.ExecuteNonQuery();

                    m_dbConnection.Close();
                }
                catch (Exception exception)
                {
                    // Report the failure instead of silently swallowing it.
                    Console.WriteLine(exception.Message);
                }
            }
        }

        private static void ConvertFileIntoByteArray(string file)
        {
            List<int> insertIds = new List<int>();
            using (FileStream fileStream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                byte[] buffer = new byte[8];

                while (true)
                {
                    int index = 0;
                    while (index < buffer.Length)
                    {
                        int bytesRead = fileStream.Read(buffer, index, buffer.Length - index);
                        if (bytesRead == 0)
                        {
                            break;
                        }

                        index += bytesRead;
                    }

                    if (index != 0)
                    {
                        //send data
                        //http://stackoverflow.com/questions/311165/how-do-you-convert-byte-array-to-hexadecimal-string-and-vice-versa
                        StringBuilder hex = new StringBuilder(index * 2);
                        // Format only the bytes actually read; the last chunk may be shorter than the buffer.
                        for (int i = 0; i < index; i++)
                        {
                            hex.AppendFormat("{0:x2}", buffer[i]);
                        }


                        int id = AddFileByte(Path.Combine(Application.StartupPath, "MyDatabase.db3"), hex.ToString());
                        if (id != -1)
                        {
                            insertIds.Add(id);
                        }
                        insertIds.Sort();
                    }

                    if (index != buffer.Length)
                    {
                        return;
                    }
                }

            }
        }

        private static int AddFileByte(string database, string byteValue)
        {
            try
            {
                int row = 0;
                SQLiteConnection m_dbConnection = new SQLiteConnection("Data Source=" + database + ";Version=3;");
                m_dbConnection.Open();

                string sql = string.Format("SELECT id, Bytes From BytesTable WHERE Bytes = '{0}'", byteValue);
                SQLiteCommand command = new SQLiteCommand(sql, m_dbConnection);

                SQLiteDataReader reader = command.ExecuteReader();
                while (reader.Read())
                {
                    row = Convert.ToInt32(reader["id"]);
                }

                //if found use the row value
                if (row != 0)
                {
                    m_dbConnection.Close();
                    return row;
                }
                else
                {
                    sql = string.Format("INSERT into BytesTable (Bytes) VALUES ('{0}')", byteValue);
                    command = new SQLiteCommand(sql, m_dbConnection);
                    command.ExecuteNonQuery();

                    sql = string.Format("SELECT id, Bytes From BytesTable WHERE Bytes = '{0}'", byteValue);
                    command = new SQLiteCommand(sql, m_dbConnection);

                    reader = command.ExecuteReader();
                    while (reader.Read())
                    {
                        row = Convert.ToInt32(reader["id"]);
                    }

                    if (row != 0)
                    {
                        m_dbConnection.Close();
                        return row;
                    }
                }
                m_dbConnection.Close();
                return -1;
            }
            catch (Exception exception)
            {
                Console.WriteLine(exception.Message);
                // Signal failure with -1 so the caller does not record a bogus id.
                return -1;
            }
        }
    }
}



This post has been edited by madmorgan: 12 June 2013 - 12:06 PM



#2 tlhIn`toq

Posted 12 June 2013 - 12:23 PM

You're saving the bytes as strings of hex? Huh?
Why not just take the file as a byte array and shove it as a byte array into a blob type?

Am I following this right (the AddFileByte method, starting at line 101) that you
  • read a byte,
  • Open db connection
  • then save that one byte to the database,
  • Close the connection
  • then read one more byte, {repeat}

Yeah, that is going to have a monstrous overhead.

The way to speed this up is to re-think the logic.
  • Don't convert bytes to strings.
  • Don't work byte-by-byte. Get all your data, format all your data, send it in one transaction.
  • Use datatypes that are more appropriate to the type of data


My suggestion is to put a breakpoint in the write method and walk it through line by line with the F10 key. You'll quickly SEE exactly what is happening. As you go through 20 steps to save 1 byte and get tired of hitting that F10 key, you'll experience the same drag the computer experiences with your logic.
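
To make the three suggestions above concrete, here is a minimal sketch, not a drop-in replacement. It assumes the Bytes column is declared as BLOB rather than VARCHAR, and the names ChunkWriter, ReadChunks and InsertChunks are made up for illustration. It is written against the provider-neutral System.Data interfaces, so an already-open SQLiteConnection can be passed in as the IDbConnection.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

static class ChunkWriter
{
    // Reads the stream in fixed-size chunks; the final chunk may be shorter.
    public static IEnumerable<byte[]> ReadChunks(Stream stream, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        while (true)
        {
            int filled = 0;
            while (filled < chunkSize)
            {
                int read = stream.Read(buffer, filled, chunkSize - filled);
                if (read == 0) break;
                filled += read;
            }
            if (filled == 0) yield break;

            // Copy out exactly the bytes that were read.
            byte[] chunk = new byte[filled];
            Array.Copy(buffer, chunk, filled);
            yield return chunk;

            if (filled < chunkSize) yield break;
        }
    }

    // Inserts every chunk as raw bytes through ONE open connection and ONE transaction.
    // Assumption: a table BytesTable with a BLOB column named Bytes already exists.
    public static void InsertChunks(IDbConnection connection, IEnumerable<byte[]> chunks)
    {
        using (IDbTransaction transaction = connection.BeginTransaction())
        using (IDbCommand command = connection.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandText = "INSERT INTO BytesTable (Bytes) VALUES (@bytes)";

            // A binary parameter carries the bytes as-is, so no hex string is needed.
            IDbDataParameter parameter = command.CreateParameter();
            parameter.ParameterName = "@bytes";
            parameter.DbType = DbType.Binary;
            command.Parameters.Add(parameter);

            foreach (byte[] chunk in chunks)
            {
                parameter.Value = chunk;
                command.ExecuteNonQuery();
            }
            transaction.Commit();
        }
    }
}
```

The idea is to open the connection once, call InsertChunks(connection, ReadChunks(fileStream, 8)), and close the connection once at the end, instead of opening and closing it for every chunk.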

#3 modi123_1

Posted 12 June 2013 - 12:26 PM

This seems pretty convoluted... why exactly are you doing it this way? What benefit do you gain by indexing every byte of a file? I mean you can totally store the entire file as a byte array in the database.. why try and hook up some odd routing and referencing to a byte?

Not to mention your whole insert into the database doesn't seem to follow what you are saying, or if it does, it goes a long way around doing it.

#4 madmorgan

Posted 12 June 2013 - 12:31 PM

The reason I did it this way is that I'm going to have to deal with 4 GB+ files at some point. I thought it would be best to open the DB, put the data in, then close it, in case of a DB error later on.

I don't know how I can put the raw 8 bytes into the DB in one go without converting them to a hex string format.

#5 modi123_1

Posted 12 June 2013 - 12:33 PM

Then that latter point begs the question - why in the world are you *STORING* 4 GB files in your data table? Why not store the file in a folder and have the table store the information about said file and its location?

#6 tlhIn`toq

Posted 12 June 2013 - 12:33 PM

Quote

ConvertFileIntoByteArray(@"C:\Users\jm136063\Downloads\BlendWPFSDK_en.msi");



Holy crap! Are you trying to store entire MSIs and installers in a database? Nobody does this. Most people don't even store entire large bitmaps in a database like this. It's far more efficient to keep files of that magnitude in a directory and then store the path in your database.

Another consideration is what does your table look like? If you have all your info in one table, including these massive files then it slows the database engine.

You could have one table with fields for things like
UniqueID
Name
Description

Then another table with
UniqueID
blob of data

Now you can search your first table quickly because it's small bits of easy fields. When you get the match you want by name, you then match up the UniqueID from table1 to the UniqueID in table2 and extract the blob of data.
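
As a rough sketch of that two-table layout (SQLite syntax; the table names FileInfo and FileData, the Data column, and the class ArchiveSchema are hypothetical, only UniqueID, Name and Description come from the lists above):

```csharp
using System.Data;

static class ArchiveSchema
{
    // Small, quickly searchable metadata rows.
    public const string CreateFileInfo =
        "CREATE TABLE FileInfo (UniqueID INTEGER PRIMARY KEY, Name TEXT, Description TEXT)";

    // Bulky payload rows, keyed by the same UniqueID.
    public const string CreateFileData =
        "CREATE TABLE FileData (UniqueID INTEGER PRIMARY KEY REFERENCES FileInfo(UniqueID), Data BLOB)";

    // Finds the row by name in the small table, then pulls the blob via the shared key.
    public const string SelectDataByName =
        "SELECT d.Data FROM FileInfo i JOIN FileData d ON d.UniqueID = i.UniqueID WHERE i.Name = @name";

    // Creates both tables on an already-open connection.
    public static void CreateTables(IDbConnection connection)
    {
        foreach (string sql in new[] { CreateFileInfo, CreateFileData })
        {
            using (IDbCommand command = connection.CreateCommand())
            {
                command.CommandText = sql;
                command.ExecuteNonQuery();
            }
        }
    }
}
```

Searches by name only touch FileInfo; the blob table is read only once the matching UniqueID is known.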

#7 madmorgan

Posted 12 June 2013 - 12:35 PM

The second reason I want to reference the stored bytes when adding a new file is space saving: instead of having 100+ files that mostly share a bit pattern of [112] [80] [90] and storing the same bytes over and over, why not have them in the DB once and reference the same bits for the new files to save space?

#8 tlhIn`toq

Posted 12 June 2013 - 12:39 PM

madmorgan, on 12 June 2013 - 01:31 PM, said:

I don't know how I can put the raw 8 bytes into the DB in one go without converting them to a hex string format.


From my FAQ:
Q:... how to do x,y,z with a database {probably for the first time}...
A: Read this tutorial
Entire section of tutorials
Parameterizing Your SQL Queries: The RIGHT Way To Query A Database.
Using SqlDependency to monitor SQL database changes

madmorgan, on 12 June 2013 - 01:35 PM, said:

The second reason I want to reference the stored bytes when adding a new file is space saving: instead of having 100+ files that mostly share a bit pattern of [112] [80] [90] and storing the same bytes over and over, why not have them in the DB once and reference the same bits for the new files to save space?


That's just 1980s thinking, and it's nuts in today's world. Nobody gives a frak about storage space when a 3 TB drive is $100. Just add more drives to your database as needed. You don't sacrifice performance to save $2 in space while at the same time risking the integrity of your data like that. Any little glitch and everything is trashed.

#9 modi123_1

Posted 12 June 2013 - 12:39 PM

madmorgan, on 12 June 2013 - 02:35 PM, said:

The second reason I want to reference the stored bytes when adding a new file is space saving: instead of having 100+ files that mostly share a bit pattern of [112] [80] [90] and storing the same bytes over and over, why not have them in the DB once and reference the same bits for the new files to save space?

... are you telling me you are trying to use your database like a compression routine? Like a 7zip/zip/rar/tar? Ah.. data tables are *not* the best way to do that... let alone that I don't see your algorithm for determining what a pattern of bytes is.

#10 madmorgan

Posted 12 June 2013 - 12:45 PM

That's right, modi123_1, I am trying to use a DB as compressed storage, like 7zip etc.

#11 tlhIn`toq

Posted 12 June 2013 - 12:52 PM

Also, did you realize you're actually adding to the size of most of this? For the single byte of '250' you want to write "fa" as a string.
hex.AppendFormat("{0:x2}", b);


So you're tripling your requirement right off the bat if each value is stored as its own null-terminated string: instead of the one byte of 250 you now have f-a-null.
Let's optimistically assume you compress your data by 2/3rds.
You are now going to take up the exact amount of space that you would have if you didn't try to do this.

Even if you strip the nulls and make one massive string of two characters per byte and then dissect it into two-character chunks, you still double the size of the data as compared to raw bytes. You have to compress at some significant factor greater than by half to make this at all meaningful. In short, it's just not worth it.
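
A small sketch to check the doubling for yourself. HexOverhead, ToHex and StoredTextBytes are made-up names, and the size figure assumes the text is stored at one byte per character (true for hex digits in UTF-8):

```csharp
using System;
using System.Text;

static class HexOverhead
{
    // Same formatting as the loop in the first post: two lowercase hex digits per byte.
    public static string ToHex(byte[] data)
    {
        StringBuilder hex = new StringBuilder(data.Length * 2);
        foreach (byte b in data)
        {
            hex.AppendFormat("{0:x2}", b);
        }
        return hex.ToString();
    }

    // Bytes needed to store the hex text in UTF-8, with no terminator counted.
    public static int StoredTextBytes(byte[] data)
    {
        return Encoding.UTF8.GetByteCount(ToHex(data));
    }
}
```

Feeding it File A from the first post, { 112, 12, 1, 2, 32, 56, 245, 90 }, gives the string "700c01022038f55a": 16 characters for 8 bytes of data.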

#12 madmorgan

Posted 12 June 2013 - 12:52 PM

Ah, I did not know that, tlhIn`toq.

#13 tlhIn`toq

Posted 12 June 2013 - 12:53 PM

madmorgan, on 12 June 2013 - 01:45 PM, said:

That's right, modi123_1, I am trying to use a DB as compressed storage, like 7zip etc.


How about this...
  • Compress the file into a zip.
  • Save the file in a directory
  • Database the path of the file
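
A minimal sketch of those three steps. Note the swap: it uses GZipStream (gzip, one file per archive) rather than a real .zip container, and ArchiveStore, CompressToDirectory and Decompress are hypothetical names. The returned path is the only thing the database would need to store.

```csharp
using System.IO;
using System.IO.Compression;

static class ArchiveStore
{
    // Compresses sourcePath into archiveDirectory and returns the archive's path.
    // The data is streamed, so the whole file is never held in memory at once.
    public static string CompressToDirectory(string sourcePath, string archiveDirectory)
    {
        Directory.CreateDirectory(archiveDirectory);
        string archivePath = Path.Combine(archiveDirectory, Path.GetFileName(sourcePath) + ".gz");

        using (FileStream input = File.OpenRead(sourcePath))
        using (FileStream output = File.Create(archivePath))
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(gzip);
        }
        return archivePath;
    }

    // Restores the original bytes from an archive written by CompressToDirectory.
    public static void Decompress(string archivePath, string destinationPath)
    {
        using (FileStream input = File.OpenRead(archivePath))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(destinationPath))
        {
            gzip.CopyTo(output);
        }
    }
}
```

Because both methods stream the data, the same code should cope with the multi-gigabyte files mentioned earlier in the thread, subject to disk space.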


#14 madmorgan

Posted 12 June 2013 - 12:59 PM

I did not really want to do that, as you are still using up the same space, and more by recording the location in a DB. I thought it would be better to put the bytes in the DB, so you only use space when the bit pattern is not already in the DB.

#15 modi123_1

Posted 12 June 2013 - 01:03 PM

This is true - if you have no desire to index the files (or their information) then adding a db would be bad. As would be the catastrophic explosion of attempting to store patterns of bytes and then construct files from references to the patterns. A database is not a compression algorithm. If you want compression then create a better algorithm for that!
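
To see how the fixed-size chunk referencing behaves, here is a rough in-memory sketch. It is only an illustration: ChunkDedup and AddFile are hypothetical names, and a Dictionary stands in for the database table.

```csharp
using System;
using System.Collections.Generic;

static class ChunkDedup
{
    // Splits data into fixed-size chunks and returns, for each chunk, the id of
    // its entry in the shared store, adding a new entry when the chunk is unseen.
    public static List<int> AddFile(byte[] data, int chunkSize, Dictionary<string, int> store)
    {
        List<int> references = new List<int>();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, data.Length - offset);

            // Base64 text of the chunk is used only as a convenient dictionary key.
            string key = Convert.ToBase64String(data, offset, length);

            int id;
            if (!store.TryGetValue(key, out id))
            {
                id = store.Count + 1;
                store.Add(key, id);
            }
            references.Add(id);
        }
        return references;
    }
}
```

With File A and File B from the first post and 8-byte chunks, the two files share no whole chunk, so the store ends up with two entries and nothing is saved, even though [1] and [2] occur in both. Every chunk also costs one stored reference on top of the chunk data.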