14 Replies - 9062 Views - Last Post: 26 June 2010 - 01:03 PM Rate Topic: -----

#1 Sergio Tapia   User is offline

  • D.I.C Lover
  • member icon

Reputation: 1258
  • View blog
  • Posts: 4,168
  • Joined: 27-January 10

Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 08:14 AM

I have a website where businesses can load items for sale. Then end users can search for that item, and find all the stores that carry it.

Most stores sell more than 100 things, and while I do have a form for inserting a single item, it's tremendously stupid to have only this for businesses to offer things.

My idea is to have an option for businesses to upload an excel file and have my ASP.Net application extract information from the columns and populate classes, then with Entity Framework 4 magic I'll persist that information all in one fell swoop.

Are there any pitfalls I should I should be aware of when doing this?
Is this possible?
Is this a very taxing process on my server (will it muck up my website)?

Is This A Good Question/Topic? 0
  • +

Replies To: Is it possible to load an Excel file to memory?

#2 tlhIn`toq   User is offline

  • Xamarin Cert. Dev.
  • member icon

Reputation: 6536
  • View blog
  • Posts: 14,450
  • Joined: 02-June 10

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 08:51 AM

And you trust the end user to create the Excel spreadsheet to your exact specifications?
What happens when they screw it up? Add extra columns... play with the part numbers... and so on.
You will need to do a lot of validation on the received document.
Was This Post Helpful? 0
  • +
  • -

#3 Momerath   User is offline

  • D.I.C Lover
  • member icon

Reputation: 1021
  • View blog
  • Posts: 2,463
  • Joined: 04-October 09

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 01:33 PM

View PosttlhIn, on 25 June 2010 - 06:51 AM, said:

And you trust the end user to create the Excel spreadsheet to your exact specifications?
What happens when they screw it up? Add extra columns... play with the part numbers... and so on.
You will need to do a lot of validation on the received document.

I once wrote some software to do this. I just validated all the data and rejected the entire sheet if anything was bad. They quickly learned not to put in bad data :) Of course this client needed my software or they probably would have gone to someone else who was more accomidating.
Was This Post Helpful? 0
  • +
  • -

#4 Curtis Rutland   User is offline

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 5106
  • View blog
  • Posts: 9,283
  • Joined: 08-June 10

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 02:19 PM

Things you should know? Well, for one, you'll have to have Excel installed on your server. Probably best to have the latest version, so that it can tolerate any version of document the user uploads.

You probably won't be able to do this in memory. You'll likely need to store the Excel document in a temporary location, do what you need to do, and then (optionally) delete it.
Was This Post Helpful? 0
  • +
  • -

#5 tlhIn`toq   User is offline

  • Xamarin Cert. Dev.
  • member icon

Reputation: 6536
  • View blog
  • Posts: 14,450
  • Joined: 02-June 10

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 05:14 PM

Quote

Well, for one, you'll have to have Excel installed on your server.

I'm learning here too. I didn't realize that Excel would have to be installed on the server because I always that the .NET framework had the ability to handle the document type.
Isn't that what the Excel namespace is for? To handle documents without spending a license fee on a machine that never actually opens a document?
http://msdn.microsof...cel(VS.80).aspx

Or do I misunderstand that (probably the case)?
Was This Post Helpful? 0
  • +
  • -

#6 Curtis Rutland   User is offline

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 5106
  • View blog
  • Posts: 9,283
  • Joined: 08-June 10

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 05:19 PM

We'll soon find out. I have a VM that doesn't have and never has had any office installations, but does have VS 2010. I'll report back.
Was This Post Helpful? 0
  • +
  • -

#7 Curtis Rutland   User is offline

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 5106
  • View blog
  • Posts: 9,283
  • Joined: 08-June 10

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 06:08 PM

Well, I can add the required references (it's actually Microsoft.Office.Interop.Excel), and I can read values from an Excel sheet on my windows box with office, but when I run the same code on the VM, I get an error about a COM object not being registered. And the way the code works suggests you need Excel installed. You are actually creating an instance of an Excel application, and then working with it. I think you're actually opening excel, and using COM to manipulate it.

So, I do believe you'll need excel installed to read from a spreadsheet.

I would suggest that they give you CSVs, since that's dead simple, and excel can save as a CSV.
Was This Post Helpful? 0
  • +
  • -

#8 Momerath   User is offline

  • D.I.C Lover
  • member icon

Reputation: 1021
  • View blog
  • Posts: 2,463
  • Joined: 04-October 09

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 10:39 PM

View PostinsertAlias, on 25 June 2010 - 04:08 PM, said:

Well, I can add the required references (it's actually Microsoft.Office.Interop.Excel), and I can read values from an Excel sheet on my windows box with office, but when I run the same code on the VM, I get an error about a COM object not being registered. And the way the code works suggests you need Excel installed.

You should try installing the free Excel reading software and see if you can open an Excel file. It might have the required COM object.
Was This Post Helpful? 0
  • +
  • -

#9 Curtis Rutland   User is offline

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 5106
  • View blog
  • Posts: 9,283
  • Joined: 08-June 10

Re: Is it possible to load an Excel file to memory?

Posted 25 June 2010 - 11:02 PM

I'll have to try that later. I'm a bit tired and zoned out now.

One thing I am interested in, though, is if someone has come up with a library or something to actually parse the file itself, rather than use Excel to do it. I mean, office documents from 2k7 and 2010 are stored in the OOXML format, so I would imagine if someone learned that format, they could effectively recreate the office apps, or at least the ability to properly read/edit/save office documents.
Was This Post Helpful? 0
  • +
  • -

#10 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7506
  • View blog
  • Posts: 15,556
  • Joined: 16-October 07

Re: Is it possible to load an Excel file to memory?

Posted 26 June 2010 - 02:46 AM

View PostinsertAlias, on 25 June 2010 - 03:19 PM, said:

Things you should know? Well, for one, you'll have to have Excel installed on your server.


This is incorrect and probably not the way to go. As you touched on later, modern Excel docs are in XML format.

All you need to do is parse the doc. This immediately validates some of the user format requirements. It is also dramatically faster than doing calls against a loaded excel instance.

Who knows, you get one parser working, you might also allow the users the option of open office as well. ;)
Was This Post Helpful? 0
  • +
  • -

#11 Curtis Rutland   User is offline

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 5106
  • View blog
  • Posts: 9,283
  • Joined: 08-June 10

Re: Is it possible to load an Excel file to memory?

Posted 26 June 2010 - 03:50 AM

OK, So I decided to do some real investigation on this. First, an .XSLX document is actually a zipped file. If you change the extension to .ZIP, you can open it with windows's zip client.

But it's a little complicated. For example, I made a spreadsheet called "hw.xslx". It only contains data in two cells: [A,1] = "Hello" [A,2] = "World".

I changed the extension to .zip and unpacked it. Here's the directory structure that came out:
>hw (dir)
|>[Content_Types].xml (file)
|>docProps (dir)
||>app.xml (file)
||>core.xml (file)
|>xl (dir)
||>sharedStrings.xml (file)
||>styles.xml (file)
||>workbook.xml (file)
||>theme (dir)
|||>theme1.xml (file)
||>worksheets (dir)
|||>sheet1.xml (file)
|||>sheet2.xml (file)
|||>sheet3.xml (file)
||>_rels (dir)
|||>workbook.xml.rels (file)
|>_rels (dir)
||>.rels (file)


Most of those are irrelevant to the discussion. The important file is: hw\xl\worksheets\sheet1.xml

Here is the file content:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
  <dimension ref="A1:B1"/>
  <sheetViews>
    <sheetView tabSelected="1" workbookViewId="0">
      <selection activeCell="B1" sqref="B1"/>
    </sheetView>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
  <sheetData>
    <row r="1" spans="1:2" x14ac:dyDescent="0.25">
      <c r="A1" t="s">
        <v>0</v>
      </c>
      <c r="B1" t="s">
        <v>1</v>
      </c>
    </row>
  </sheetData>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>


But that's only part of it. Notice that the cells have a numeric value rather than a string. I actually found the strings themselves in hw\xl\sharedStrings.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
  <si>
    <t>Hello</t>
  </si>
  <si>
    <t>World</t>
  </si>
</sst>


So, I assume that the number references the index of the shared string. I am now interested, so later on I will test it with several data types and see what I come up with.

And this is just for Office 2010. Imagine what fun I can have if I can get ahold of a 2007 document. And anything prior to it is a totally different format. Still xml, but different.

This post has been edited by insertAlias: 26 June 2010 - 04:07 AM

Was This Post Helpful? 0
  • +
  • -

#12 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7506
  • View blog
  • Posts: 15,556
  • Joined: 16-October 07

Re: Is it possible to load an Excel file to memory?

Posted 26 June 2010 - 04:16 AM

View PostinsertAlias, on 26 June 2010 - 04:50 AM, said:

Imagine what fun I can have if I can get ahold of a 2007 document.


Strangely enough, earlier versions of office xml were much simpler. That is to say, the xml was almost unreadable but it was all in one file. Later version actually mimic the open document format. The xml from 2003 is what I'll usually use for office output. That version doesn't save to xml by default, but it's an option.

Which actually brings up a point. You can save to a myriad of formats in excel. For the OP, why not just CSV? Almost anything can save to CSV. It's simple and transparent. The user about to upload can easily scan it visually to make sure they're sending what they think they're sending. Seems like a reasonable compromise.

Another option is to provide a template spreadsheet with a send button. Have a macro talk to a web service and make the excel client do the work. I've done this, works pretty well. If they complain it's not working, you just ask, "did you use the supported template?"

And, so it's in a zip file with multiple entries in the manifest? It honestly sounds like a fun puzzle. That's what professional programmers are often paid to do; extract the data no matter what. I can't even count then number of times I've written interfaces to translate data from disparate systems. I could probably do this one in less than a day.
Was This Post Helpful? 0
  • +
  • -

#13 Curtis Rutland   User is offline

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 5106
  • View blog
  • Posts: 9,283
  • Joined: 08-June 10

Re: Is it possible to load an Excel file to memory?

Posted 26 June 2010 - 04:21 AM

Quote

And, so it's in a zip file with multiple entries in the manifest? It honestly sounds like a fun puzzle.


It actually does. I'm curious now, so tomorrow (today actually, it's 6:00 am and I'm just about ready to sleep), I'm going to try out several data types, and see what happens. Then I'm going to try to set up a parser that will handle most cases. Then I'm going to wrap it up as a library for later use, and if it all goes well, perhaps an article for the site.

But yeah, CSV is the way I would go. Allowing your clients to upload an actual spreadsheet adds unnecessary overhead, in my opinion.
Was This Post Helpful? 0
  • +
  • -

#14 Momerath   User is offline

  • D.I.C Lover
  • member icon

Reputation: 1021
  • View blog
  • Posts: 2,463
  • Joined: 04-October 09

Re: Is it possible to load an Excel file to memory?

Posted 26 June 2010 - 11:53 AM

View Postbaavgai, on 26 June 2010 - 02:16 AM, said:

View PostinsertAlias, on 26 June 2010 - 04:50 AM, said:

Imagine what fun I can have if I can get ahold of a 2007 document.
I could probably do this one in less than a day.

This I'd like to see :) Have you actually read the OOXML specification? If not, you are in for a big shock.
Was This Post Helpful? 0
  • +
  • -

#15 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7506
  • View blog
  • Posts: 15,556
  • Joined: 16-October 07

Re: Is it possible to load an Excel file to memory?

Posted 26 June 2010 - 01:03 PM

View PostMomerath, on 26 June 2010 - 12:53 PM, said:

This I'd like to see :) Have you actually read the OOXML specification? If not, you are in for a big shock.


You mistake the scope of the project. The project is not "implement a fully functional OOXML translation layer." Rather, it's "given a document matching a particular template, extract relevant data."

I've seen the spec, it's an ugly mess. Again, being concerned with all the particular of a spec and taking an object that's known to be invariant on all but a small subset of that spec are two completely different tasks.

Think of it like having an HTML document that always has lines like "<tr><td>CUST001</td><td>Acme</td></tr>" containing the data I'm after. I don't need to know HTML or even XML for the task of extracting that data. In a real HTML doc, there could be whitespace, attributes, nonconformant tags and even missing tags. That's a challenge, but for known data it's not an issue.

This post has been edited by baavgai: 26 June 2010 - 01:03 PM

Was This Post Helpful? 0
  • +
  • -

Page 1 of 1