How to work with word document?

51 Replies - 2062 Views - Last Post: 19 January 2013 - 09:25 AM

#31 asem0525

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 23
  • Joined: 29-December 07

Re: How to work with word document?

Posted 13 January 2013 - 12:42 PM

Yes, exactly: a translit program for the Kazakh language.

I'm just wondering, why would you translit by words? Is it faster?
There are also some rules that have to be applied for certain letters, and there are some exception words that I need to check against a database...
That's why I chose to check for certain letters first, then for exception words, and then translit the rest of the document letter by letter.


Skydiver, on 13 January 2013 - 10:55 AM, said:

Ah... Translit.

If I had to tackle this project, I would have a single interface for getting words and putting words back, and multiple classes for reading and writing. Something like:

// One abstraction for every file format: step through the words,
// read the current word, and write its replacement back.
interface IFileReaderWriter : IDisposable
{
    bool MoveNext();
    string Current { get; set; }
}

void TranslitFile(string fileName)
{
    using (IFileReaderWriter readerWriter = CreateReaderWriter(fileName))
    {
        while (readerWriter.MoveNext())
        {
            string word = TranslitWord(readerWriter.Current);
            readerWriter.Current = word;
        }
    }
}

string TranslitWord(string word)
{
    // returns a translit version of the incoming word
    throw new NotImplementedException();
}

IFileReaderWriter CreateReaderWriter(string fileName)
{
    if (Word97ReaderWriter.IsValid(fileName))
        return new Word97ReaderWriter(fileName);
    if (Word2007ReaderWriter.IsValid(fileName))
        return new Word2007ReaderWriter(fileName);
    if (PdfReaderWriter.IsValid(fileName))
        return new PdfReaderWriter(fileName);
    // ... and so on for the other formats
    throw new NotSupportedException("Unrecognized file format: " + fileName);
}

class Word2007ReaderWriter : IFileReaderWriter
{
    public static bool IsValid(string fileName)
    {
        // return true if file is a Word2007 file
        throw new NotImplementedException();
    }

    public Word2007ReaderWriter(string fileName)
    {
        // open the file
    }

    // Interface members are implemented directly; "override" does not apply here.
    public bool MoveNext()
    {
        // write Current word to the file
        // read the next word into Current
        // return false if end of file
        throw new NotImplementedException();
    }

    public string Current { get; set; }

    public void Dispose()
    {
        // flush all changes to the file
        // close the file
    }
}
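
To try the loop above without any Word or PDF handling, a minimal in-memory implementation can help. This is only a sketch: StringReaderWriter and its Result property are hypothetical names that do not appear in the thread, and unlike the design described above, the Current setter writes the word back immediately instead of waiting for the next MoveNext call.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Same shape as the interface in the post above.
public interface IFileReaderWriter : IDisposable
{
    bool MoveNext();
    string Current { get; set; }
}

// Hypothetical stand-in for a real file format: walks the words of a string
// and leaves everything between words (spaces, punctuation) untouched.
public sealed class StringReaderWriter : IFileReaderWriter
{
    private readonly List<string> tokens = new List<string>();
    private int index = -1;

    public StringReaderWriter(string text)
    {
        // Split the text into alternating runs of letters and non-letters.
        var run = new StringBuilder();
        foreach (char c in text)
        {
            if (run.Length > 0 && char.IsLetter(run[0]) != char.IsLetter(c))
            {
                tokens.Add(run.ToString());
                run.Clear();
            }
            run.Append(c);
        }
        if (run.Length > 0)
            tokens.Add(run.ToString());
    }

    // Advances to the next run of letters; returns false at the end of the text.
    public bool MoveNext()
    {
        do
        {
            index++;
        } while (index < tokens.Count && !char.IsLetter(tokens[index][0]));
        return index < tokens.Count;
    }

    // Reads or overwrites the word at the current position.
    public string Current
    {
        get { return tokens[index]; }
        set { tokens[index] = value; }
    }

    // The rebuilt text, with any replaced words in place.
    public string Result
    {
        get { return string.Concat(tokens); }
    }

    public void Dispose()
    {
        // Nothing to flush for an in-memory string.
    }
}
```

Because a replacement overwrites a whole token, a word that becomes longer or shorter does not disturb the separators around it.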


#32 andrewsw

  • Fire giant boob nipple gun!
  • member icon

Reputation: 3320
  • View blog
  • Posts: 11,229
  • Joined: 12-December 12

Re: How to work with word document?

Posted 13 January 2013 - 12:51 PM

Quote

So, you mean, that I can use Microsoft.Office.Interop to open and get the content of the file and save it as a new document and then using StreamReader to modify it?


I mentioned this before translit was clarified. This process is possible, but you've mentioned modifying again. You haven't clarified: do you intend to modify the existing Word document, or to create a new (perhaps Word) document?

Similarly, I mentioned Regex before understanding 'translit'. Regex is for searching for complex patterns of text, so it's possibly not relevant to your purpose.

Skydiver will be able to assist you more, as I have no knowledge of the translit process. Andy.

This post has been edited by andrewsw: 13 January 2013 - 12:53 PM

#33 asem0525

Posted 13 January 2013 - 04:21 PM

Sorry, I tried it many times but it doesn't work...

I can't get the nextChar...

if (wordApp.Selection.Find.Found)
{
    wordApp.Selection.Collapse(WdCollapseDirection.wdCollapseEnd);
    nextChar = wordApp.Selection.Range.Characters[1].ToString();

    FindAndReplace(wordApp, nextChar, "@@@@");
}

It doesn't get the next letter; it gets some COM object instead.

Yes, I need to modify the text, but I'm saving the modified text in a new Word document.

#34 Skydiver

  • Code herder
  • member icon

Reputation: 3531
  • View blog
  • Posts: 10,935
  • Joined: 05-May 12

Re: How to work with word document?

Posted 13 January 2013 - 04:42 PM

If you are creating a new Word document, how are you carrying the formatting across? For example, if the title is 18-point Times New Roman, centered, and you are just writing out the new letters and text, how does the formatting get written out?

#35 andrewsw

Posted 13 January 2013 - 04:58 PM

To read the next character:

nextChar = wdApp.Selection.Characters[1].Text;

To replace the character:

wdApp.Selection.Range.Characters[1].Text = "@@@@";

There is no FindAndReplace(?!) method; if there were, you would need to precede it with wdApp., wdApp.Selection. (or similar) anyway.

As Skydiver points out, you'll need to consider the formatting as well. It will either default to Normal, default para, or (helpfully) assume the style of the current paragraph. Unfortunately/fortunately, there are about 20 different ways to include formatting :)

#36 andrewsw

Posted 13 January 2013 - 05:29 PM

If it does default to some other formatting then something like the following might work:

nextChar = wdApp.Selection.Characters[1].Text;
Range charF = wdApp.Selection.Characters[1].FormattedText;
charF.Text = "@@@@";


It should(!) keep the formatting that the original letter had, but you'll need to explore.

#37 Skydiver

Posted 13 January 2013 - 06:40 PM

asem0525, on 13 January 2013 - 02:42 PM, said:

I'm just wondering, why would you translit by words? Is it faster?

I chose to translit a word at a time for the following reasons:
- Any special-case word is looked up and replaced with a single lookup.
- If the word is not a special case, I can still replace individual letters within the word.
- There is better locality of access with an entire word in memory, rather than having to marshal one character at a time from the unmanaged Word code to my managed C# code and then back to unmanaged Word code again.
- It makes implementing the individual IFileReaderWriter classes easier when they have to deal with formatting, especially in the cases where a replacement word is longer or shorter than the original word.
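
The first two reasons above can be sketched as a word-level function: one dictionary lookup for exception words, then a per-letter replacement done entirely in managed memory. Everything in the tables below is a made-up placeholder, not the real Kazakh rules, and the exception list is hard-coded here only for illustration, where the thread describes a database.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Translit
{
    // Placeholder exception words; the real list would come from the database.
    private static readonly Dictionary<string, string> ExceptionWords =
        new Dictionary<string, string>
        {
            { "цирк", "cirk" },
        };

    // Placeholder letter table; the real rules are not given in the thread.
    private static readonly Dictionary<char, string> LetterMap =
        new Dictionary<char, string>
        {
            { 'а', "a" }, { 'б', "b" }, { 'к', "k" },
            { 'р', "r" }, { 'ц', "ts" }, { 'ш', "sh" },
        };

    public static string TranslitWord(string word)
    {
        // A special-case word is handled as a whole with a single lookup.
        string special;
        if (ExceptionWords.TryGetValue(word, out special))
            return special;

        // Otherwise replace letter by letter, without calling back into Word.
        var result = new StringBuilder(word.Length * 2);
        foreach (char c in word)
        {
            string mapped;
            if (LetterMap.TryGetValue(c, out mapped))
                result.Append(mapped);
            else
                result.Append(c); // characters without a mapping stay unchanged
        }
        return result.ToString();
    }
}
```

With these placeholder tables, Translit.TranslitWord("шар") gives "shar", while "цирк" is taken from the exception list as a whole. Upper-case letters are not covered by this sketch and would pass through unchanged.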

#38 asem0525

Posted 14 January 2013 - 03:41 PM

Skydiver, on 13 January 2013 - 04:42 PM, said:

If you are creating a new Word document, how are you carrying the formatting across? For example, if the title is 18-point Times New Roman, centered, and you are just writing out the new letters and text, how does the formatting get written out?


I'm just opening the document, modifying it, and saving it as a new document. So I don't actually deal with formatting...
The code for finding and replacing the letters looks like this:

private static void FindAndReplace(Microsoft.Office.Interop.Word.Application WordApp,
                                   object findText,
                                   object replaceWithText)
{
    object matchCase = true;
    object matchWholeWord = false;
    object matchWildCards = false;
    object matchSoundsLike = false;
    object nmatchAllWordForms = false;
    object forward = true;
    object format = false;
    object matchKashida = false;
    object matchDiacritics = false;
    object matchAlefHamza = false;
    object matchControl = false;
    object replace = 2; // WdReplace.wdReplaceAll
    object wrap = 1;    // WdFindWrap.wdFindContinue

    WordApp.Selection.Find.Execute(ref findText,
        ref matchCase, ref matchWholeWord,
        ref matchWildCards, ref matchSoundsLike,
        ref nmatchAllWordForms, ref forward,
        ref wrap, ref format, ref replaceWithText,
        ref replace, ref matchKashida,
        ref matchDiacritics, ref matchAlefHamza,
        ref matchControl);
}

#39 asem0525

Posted 14 January 2013 - 03:54 PM

I'm feeling really guilty, but I just can't figure it out. It still doesn't get the nextChar; it gets the first character of my file... However I modify my code, it does the same...

private static void FindIyYa(Microsoft.Office.Interop.Word.Application wordApp,
                             object findLetter)
{
    object missing = System.Reflection.Missing.Value;
    string nextChar;

    object matchCase = true;
    object matchWholeWord = false;
    object matchWildCards = false;
    object matchSoundsLike = false;
    object nmatchAllWordForms = false;
    object forward = true;
    object format = false;
    object matchKashida = false;
    object matchDiacritics = false;
    object matchAlefHamza = false;
    object matchControl = false;
    object read_only = false;
    object visible = true;
    object replace = 2;
    object wrap = 1;

    wordApp.Selection.Find.Execute(ref findLetter,
        ref matchCase, ref matchWholeWord,
        ref matchWildCards, ref matchSoundsLike,
        ref nmatchAllWordForms, ref forward,
        ref wrap, ref format, ref missing,
        ref replace, ref matchKashida,
        ref matchDiacritics, ref matchAlefHamza,
        ref matchControl);
    if (wordApp.Selection.Find.Found)
    {
        nextChar = wordApp.Selection.Characters[1].Text;
        Range charF = wordApp.Selection.Characters[1].FormattedText;
        charF.Text = "@@@@";
    }
}


#40 asem0525

Posted 14 January 2013 - 04:17 PM

Thank you very much, Skydiver.
Do you think that translitting word by word is the better solution even for files of about 500 pages?
I just thought that a letter-by-letter translit would work faster than getting every word, checking it for exceptions, and then translitting it...
I'm going to write my program like this:
First I find certain special letters, like "i", then check the next character and apply the rule. If the next character is not one the rule covers, then I need to get the whole word and check it against my database of exceptions. The exceptions are only words that contain the letters "I" and "C", so I don't need to check all the words.

After the special cases are translitted, I just translit the remaining letters, so I will run my FindAndReplace function once for every letter of the alphabet (42 times).
What do you think about my approach?
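
For comparison, here is a rough sketch of how the "check the next character" rule and the plain letter replacements could be done together in a single pass over text that is already in memory, instead of one find-and-replace pass per letter. The rule and the letter table are invented for illustration and are not the real Kazakh rules; ContextTranslit and TranslitText are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ContextTranslit
{
    // Placeholder letter table, not the real rules.
    private static readonly Dictionary<char, string> LetterMap =
        new Dictionary<char, string>
        {
            { 'а', "a" }, { 'и', "i" }, { 'м', "m" }, { 'я', "ya" },
        };

    public static string TranslitText(string text)
    {
        var result = new StringBuilder(text.Length * 2);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // Hypothetical context rule: "и" directly followed by "я" becomes "ia".
            if (c == 'и' && i + 1 < text.Length && text[i + 1] == 'я')
            {
                result.Append("ia");
                i++; // the following character was consumed by the rule
                continue;
            }

            // Default case: plain table lookup, unchanged if there is no mapping.
            string mapped;
            if (LetterMap.TryGetValue(c, out mapped))
                result.Append(mapped);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
```

With this placeholder rule, TranslitText("мия") gives "mia", while "яма" goes through the default table and gives "yama". The text is scanned once no matter how many letters the alphabet has.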

#41 andrewsw

Posted 14 January 2013 - 04:22 PM

You've dropped the line that collapses the selection to the end of the (found) text:

            if (wordApp.Selection.Find.Found) {
                wordApp.Selection.Collapse(WdCollapseDirection.wdCollapseEnd);
                //nextChar = wordApp.Selection.Characters[1].Text;
                Range charF = wordApp.Selection.Characters[1].FormattedText;
                charF.Text = "@@@@";
            }


nextChar is not currently used but you might keep the line in case you need to use it.

This post has been edited by andrewsw: 14 January 2013 - 04:23 PM

#42 asem0525

Posted 14 January 2013 - 05:20 PM

Now, even with the collapse line, it gets the first character of the document...
How do I get the range after the found letter?
//nextChar = wordApp.Selection.Characters[1].Text;


#43 andrewsw

Posted 14 January 2013 - 05:42 PM

object replace = 2;

The value 2 is to replace all. My previous understanding was that you were just using Find and then performing the modifications manually.

This post has been edited by andrewsw: 14 January 2013 - 05:42 PM

#44 asem0525

Posted 14 January 2013 - 05:55 PM

I converted my whole project to VB and applied the code you provided above, and it worked!!!

If wordApp.Selection.Find.Execute Then
    'was the Find successful?
    wordApp.Selection.Collapse(WdCollapseDirection.wdCollapseEnd)
    nextChar = wordApp.Selection.Range.Characters(1).Text
    MsgBox(nextChar)
End If


#45 asem0525

Posted 14 January 2013 - 06:04 PM

andrewsw, on 14 January 2013 - 05:42 PM, said:

object replace = 2;

The value 2 is to replace all. My previous understanding was that you were just using Find and then performing the modifications manually.


I replaced the "2" with "0" and it also worked!!!! :) Thank you!