Multiple tables conversion from Excel XLS file to XML with OpenXML

  • (2 Pages)
  • +
  • 1
  • 2

16 Replies - 1433 Views - Last Post: 27 April 2014 - 10:21 AM Rate Topic: -----

#1 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 25 April 2014 - 09:02 AM

I have downloaded this project from here -->> http://code.msdn.mic...l-file-7a9bb404

This one works fine and it's a great commented code to understand even for beginners like myself, but it only works with one table per sheet. Once I add second table in sheet, it throws an error that something is wrong in one of the columns in my spreadsheet: It says: " Error occurs! The error message is: Cannot find column 4. "

Basically, I have this type of tables in my spreadsheet:

Posted Image

So I want that my program would export those two tables in a single .XML file (just read them both). And those two tables should be separated in two XML childs: the upper one should be Order-Header and the lower one - Line-Items, like this:

<ROOT>
   <Order-Header>
       .....
   </Order-Header>
   <Line-Items>
       .....
   </Line-Items>
</ROOT> 


ConvertExcelToXML.cs:

using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

namespace CSOpenXmlExcelToXml
{
    public class ConvertExcelToXml
    {
        /// <summary>
        ///  Read Data from selected excel file into DataTable
        /// </summary>
        /// <param name="filename">Excel File Path</param>
        /// <returns></returns>
        private DataTable ReadExcelFile(string filename)
        {
            // Initialize an instance of DataTable
            DataTable dt = new DataTable();

            try
            {
                // Use SpreadSheetDocument class of Open XML SDK to open excel file
                using (SpreadsheetDocument spreadsheetDocument = Spreadsheetdocument.Open(filename, false))
                {

                    // Get Workbook Part of Spread Sheet Document
                    WorkbookPart workbookPart = spreadsheetdocument.WorkbookPart;

                    // Get all sheets in spread sheet document 
                    IEnumerable<Sheet> sheetcollection = spreadsheetdocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();

                    // Get relationship Id
                    string relationshipId = sheetcollection.First().Id.Value;

                    // Get sheet1 Part of Spread Sheet Document
                    WorksheetPart worksheetPart = (WorksheetPart)spreadsheetdocument.WorkbookPart.GetPartById(relationshipId);

                    // Get Data in Excel file
                    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
                    IEnumerable<Row> rowcollection = sheetData.Descendants<Row>();

                    // If there is no rows in the spreadsheet at all, when just return in how it is and output it
                    if (rowcollection.Count() == 0)
                    {
                        return dt;
                    }

                    // ============================================================================================================

                    // Add columns
                    foreach (Cell cell in rowcollection.ElementAt(0))
                    {
                        dt.Columns.Add(GetValueOfCell(spreadsheetDocument, cell));
                    }

                    // Add rows into DataTable
                    foreach (Row row in rowcollection)
                    {
                        // Create temporary row to read rows in spreadsheet
                        DataRow temprow = dt.NewRow();
                        int columnIndex = 0;
                        foreach (Cell cell in row.Descendants<Cell>())
                        {
                            // Get Cell Column Index
                            int cellColumnIndex = GetColumnIndex(GetColumnName(cell.CellReference));

                            if (columnIndex < cellColumnIndex)
                            {
                                do
                                {
                                    temprow[columnIndex] = string.Empty;
                                    columnIndex++;
                                }

                                while (columnIndex < cellColumnIndex);
                            }

                            temprow[columnIndex] = GetValueOfCell(spreadsheetDocument, cell);
                            columnIndex++;
                        }

                        // Add the row to DataTable
                        // the rows include header row
                        dt.Rows.Add(temprow);
                    }
                }

                // Here remove header row
                dt.Rows.RemoveAt(0);
                return dt;
            }

            // Throw error message
            catch (IOException ex)
            {
                throw new IOException(ex.Message);
            }
        }

        /// <summary>
        ///  Get Value of Cell
        /// </summary>
        /// <param name="spreadsheetdocument">SpreadSheet Document Object</param>
        /// <param name="cell">Cell Object</param>
        /// <returns>The Value in Cell</returns>
        private static string GetValueOfCell(SpreadsheetDocument spreadsheetdocument, Cell cell)
        {
            // Get value in Cell
            SharedStringTablePart sharedString = spreadsheetdocument.WorkbookPart.SharedStringTablePart;
            if (cell.CellValue == null)
            {
                return string.Empty;
            }

            string cellValue = cell.CellValue.InnerText;
            
            // The condition that the Cell DataType is SharedString
            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
            {
                return sharedString.SharedStringTable.ChildElements[int.Parse(cellValue)].InnerText;
            }
            else
            {
                return cellValue;
            }
        }

        /// <summary>
        /// Get Column Name From given cell name
        /// </summary>
        /// <param name="cellReference">Cell Name(For example,A1)</param>
        /// <returns>Column Name(For example, A)</returns>
        private string GetColumnName(string cellReference)
        {
            // Create a regular expression to match the column name of cell
            Regex regex = new Regex("[A-Za-z]+");
            Match match = regex.Match(cellReference);
            return match.Value;
        }

        /// <summary>
        /// Get Index of Column from given column name
        /// </summary>
        /// <param name="columnName">Column Name(For Example,A or AA)</param>
        /// <returns>Column Index</returns>
        private int GetColumnIndex(string columnName)
        {
            int columnIndex = 0;
            int factor = 1;

            // From right to left
            for (int position = columnName.Length - 1; position >= 0; position--)   
            {
                // For letters
                if (Char.IsLetter(columnName[position]))
                {
                    columnIndex += factor * ((columnName[position] - 'A') + 1) - 1;
                    factor *= 26;
                }
            }

            return columnIndex;
        }

        /// <summary>
        /// Convert DataTable to Xml format
        /// </summary>
        /// <param name="filename">Excel File Path</param>
        /// <returns>Xml format string</returns>
        public string GetXML(string filename)
        {
            using (DataSet ds = new DataSet())
            {
                ds.DataSetName = "Document-Order";
                ds.Tables.Add(this.ReadExcelFile(filename));
                return ds.GetXml();
            }
        }
    }
}



P.S. I copied all the code because I think it is the easier way to spot where to change it, to read those two tables in one spreadsheet and export them both in one XML file.

Would appreciate any help or just some ideas, how could I archieve this kind of functionality. Thank you.

Is This A Good Question/Topic? 0
  • +

Replies To: Multiple tables conversion from Excel XLS file to XML with OpenXML

#2 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3666
  • View blog
  • Posts: 11,497
  • Joined: 05-May 12

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 25 April 2014 - 01:47 PM

On which line number does it report that error?
Was This Post Helpful? 0
  • +
  • -

#3 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 04:36 AM

I think, what from line 066 foreach cicle is going over and over again: go from 0 to 4 by calculating cells and columns of the FIRST TABLE, but when it reachies 4, it starts again from 0 through 4. Basically it never ends and never reaches my next table first cell ()I think that's why I'm getting this type of error). I could be wrong. I tried debugging my Windows Form by pressing F10 all the time but it never ends and never reaches my next table. Thank you.
Was This Post Helpful? 0
  • +
  • -

#4 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 04:42 AM

By saying those numbers (from 0 to 4) I meant columnIndex.
Was This Post Helpful? 0
  • +
  • -

#5 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3666
  • View blog
  • Posts: 11,497
  • Joined: 05-May 12

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 06:16 AM

In your original message, you said it is throwing an error. Now you are saying it just keeps going. Which is which?
Was This Post Helpful? 0
  • +
  • -

#6 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 07:23 AM

I have a Windows Form, which has 2 buttons: Browse (to Browse XLS files in computer), and Convert, which converts XLS file to XML. When I found my file in computer, I click "Convert" button and I get an error, which says "Error occurs! The error message is: Cannot find column 4." . This error stops me from any actions and I cannot convert my XLS file (which contains TWO TABLES) to XML format file.

What I have in my mind by saying "It keeps going" is you asked me, in which line my program stops (I basically debug it and see how it works in the "Autos" window). I see that after line 066, it goes through all the columns and indexes and it keeps going to 4 (4 is the last column of the first table in my spreadsheet). I though that this could be the problem, because it NEVER reaches my other table, and nver reads it.

I hope I made it clear now what is going on. Sorry if I get you confused. Appreciate your help.
Was This Post Helpful? 0
  • +
  • -

#7 Momerath  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1012
  • View blog
  • Posts: 2,444
  • Joined: 04-October 09

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 08:23 AM

There is code missing from what you've posted. Nothing in that code would display "Error occurs!". So what code aren't you showing us?
Was This Post Helpful? 0
  • +
  • -

#8 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 08:31 AM

I did not show MainForm.cs code, because what it only contains all the button clicks and stuff like that (it just a form, which handles errors and exceptions). I thought it was unnecessary.

MainForm.cs:

using System;
using System.IO;
using System.Windows.Forms;

namespace CSOpenXmlExcelToXml
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
            this.btnSaveAs.Enabled = false;
        }

        /// <summary>
        ///  Open an dialog to let users select Excel file
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void btnBrowser_Click(object sender, EventArgs e)
        {
            // Initializes a OpenFileDialog instance 
            using (OpenFileDialog openfileDialog = new OpenFileDialog())
            {
                openfileDialog.RestoreDirectory = true;
                openfileDialog.Filter = "Excel files(*.xlsx;*.xls)|*.xlsx;*.xls";

                if (openfileDialog.ShowDialog() == DialogResult.OK)
                {
                    tbExcelName.Text = openfileDialog.FileName;
                }
            }
        }

        /// <summary>
        ///  Convert Excel file to Xml format and view in Listbox control
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void btnConvert_Click(object sender, EventArgs e)
        {
            tbXmlView.Clear();
            string excelfileName = tbExcelName.Text;

            if (string.IsNullOrEmpty(excelfileName) || !File.Exists(excelfileName))
            {
                MessageBox.Show("The Excel file is invalid! Please select a valid file.");
                return;
            }

            try
            {
                string xmlFormatstring = new ConvertExcelToXml().GetXML(excelfileName);
                if (string.IsNullOrEmpty(xmlFormatstring))
                {
                    // Line just for checking, if Excel document is empty. If it's true, when just print out an error message
                    MessageBox.Show("The content of Excel file is Empty!");
                    return;
                }

                // Print it in TextView
                tbXmlView.Text = xmlFormatstring;

                // If txbXmlView has text, set btnSaveAs button to be enable
                btnSaveAs.Enabled = true;
            }
            catch (Exception ex)
            {
                // General error message checking for errors
                MessageBox.Show("Error occurs! The error message is: " +ex.Message);
            }
        }

        /// <summary>
        ///  Save the XMl format string as Xml file
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void btnSaveAs_Click(object sender, EventArgs e)
        {
            // Initializes a SaveFileDialog instance 
            using (SaveFileDialog savefiledialog = new SaveFileDialog())
            {
                savefiledialog.RestoreDirectory = true;
                savefiledialog.DefaultExt = "xml";
                savefiledialog.Filter = "All Files(*.xml)|*.xml";
                if (savefiledialog.ShowDialog() == DialogResult.OK)
                {
                    Stream filestream = savefiledialog.OpenFile();
                    StreamWriter streamwriter = new StreamWriter(filestream);
                    streamwriter.Write("<?xml version='1.0'?>" +
                        Environment.NewLine + tbXmlView.Text);
                    streamwriter.Close();
                }
            }
        }

        // Disable maximize button of the form
        private void MainForm_Load(object sender, EventArgs e)
        {
            this.MaximizeBox = false;                           //this is an instance of Form or its decendant
        }
    }
}


Was This Post Helpful? 0
  • +
  • -

#9 Momerath  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1012
  • View blog
  • Posts: 2,444
  • Joined: 04-October 09

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 08:46 AM

Display the StackTrace from the exception in line 70, that will show where the actual error is occurring.
Was This Post Helpful? 0
  • +
  • -

#10 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 09:37 AM

StackTrace Windows (from Line 70):

Posted Image

Call Stack Window (from Line 70):

Posted Image
Was This Post Helpful? 0
  • +
  • -

#11 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3666
  • View blog
  • Posts: 11,497
  • Joined: 05-May 12

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 26 April 2014 - 07:38 PM

But line 70 is blank in the code you posted originally. Additionally, looking at that screenshot, it also refers to line 90 as a function call. Line 90 in the code you posted is a closing curly brace.

Why are you calling Descendants<Cell>() on each row collection on line 66?
Was This Post Helpful? 0
  • +
  • -

#12 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 27 April 2014 - 12:35 AM

I changed a code a litle bit from the original post. These are the screenshots from the original code:

StackTrace :

Posted Image

CallStack :

Posted Image

Now it refers to line 82, which is
temprow[columnIndex] = GetValueOfCell(spreadsheetDocument, cell);



What I should do with
Descendants<Cell>()
?
Was This Post Helpful? 0
  • +
  • -

#13 Momerath  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1012
  • View blog
  • Posts: 2,444
  • Joined: 04-October 09

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 27 April 2014 - 12:52 AM

It's detecting that there are 23 rows in your spreadsheet but when it goes to read row 12, there isn't a row 12 so you get the error (for some reason they go from the rightmost column to the leftmost). I don't know enough about the Open XML SDK to tell you what to do and I'm not getting paid enough to research it :)/>

You'll need to find the documentation on Open XML and start reading. I'd go to http://1code.codeplex.com/discussions and ask there.

This post has been edited by Momerath: 27 April 2014 - 12:54 AM

Was This Post Helpful? 0
  • +
  • -

#14 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 27 April 2014 - 03:49 AM

Thank you for your help. I'll try research it.
Was This Post Helpful? 0
  • +
  • -

#15 afterMath007  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 14
  • Joined: 06-March 13

Re: Multiple tables conversion from Excel XLS file to XML with OpenXML

Posted 27 April 2014 - 09:11 AM

Well, I found a cheat to this problem: if I move the bigger table on top of the sheet, and the smaller one to the bottom, in works without any errors, but the TAGS after when the bigger table ends become like this:

Posted Image


Basically, it doesn't recognize headers and the second table after all. I would be very surprise if there is a way of completing this propertly.
Was This Post Helpful? 0
  • +
  • -

  • (2 Pages)
  • +
  • 1
  • 2