12 Replies - 15843 Views - Last Post: 28 March 2011 - 01:26 AM

#1 Justin Credible

  • New D.I.C Head

Reputation: 0
  • Posts: 13
  • Joined: 07-May 10

Build in an OCR in my application

Posted 16 May 2010 - 03:04 AM

Hi!

I'm trying to build an application that I can add images to (scanned papers, etc.) and get the text out of them.

But I need help :P

How can I add an OCR library to my application, and how do I get the text from the OCR into a textbox? :)

I'm new to this field of coding and I'm just playing around, trying to get better at each aspect of C#.

- Thanks for all answers!


Replies To: Build in an OCR in my application

#2 Bacanze

  • D.I.C Head

Reputation: 36
  • Posts: 202
  • Joined: 09-April 10

Posted 16 May 2010 - 04:01 AM

I imagine it's not an easy thing to implement from scratch; however, OCR functionality was included in Office 2003, 2007, etc. Luckily you can use the API of Microsoft Office Document Imaging (MODI), which comes with Office. Check out:

http://www.codeproje...ffice/modi.aspx

#3 Justin Credible

Posted 16 May 2010 - 04:14 AM

I was thinking more along the lines of how to use something like this, for example:

http://www.pixel-tec...eware/tessnet2/

#4 Rico Diesel

  • D.I.C Head

Reputation: 62
  • Posts: 122
  • Joined: 06-May 10

Posted 17 May 2010 - 01:37 AM

I have actually built an OCR engine into my code, and it is not very hard. Analysing the document is the more interesting part of OCR. I used the OCR engine of Microsoft Office 2003 (MODI). It has a nice COM interface and is easy to include in your VS project. (It took me about ten minutes to have something working and OCR'ing.)

The Tesseract engine still leaks memory if you use it in your main process (see this link). A suitable solution seems to be to run it in a separate process.
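
The separate-process idea could look roughly like the sketch below. It assumes a hypothetical console program (called OcrWorker.exe here, not something from this thread) that hosts the OCR engine, takes the image path as its only argument and writes the recognised text to standard output. Any memory the engine leaks is then given back to the OS when the worker exits.

```csharp
using System;
using System.Diagnostics;

static class OcrWorkerLauncher
{
    // Starts the (hypothetical) OCR worker executable and returns whatever it
    // prints to standard output. workerExePath would point at OcrWorker.exe.
    public static string RunOcrInSeparateProcess(string workerExePath, string imagePath)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo
        {
            FileName = workerExePath,
            Arguments = "\"" + imagePath + "\"", // quote the path in case it contains spaces
            UseShellExecute = false,             // required for stream redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process worker = Process.Start(startInfo))
        {
            // Read the output before WaitForExit so a full output buffer
            // cannot block the worker and deadlock both processes.
            string text = worker.StandardOutput.ReadToEnd();
            worker.WaitForExit();

            if (worker.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "OCR worker failed with exit code " + worker.ExitCode);
            }

            return text;
        }
    }
}
```

You would call it as `string text = OcrWorkerLauncher.RunOcrInSeparateProcess(@"OcrWorker.exe", imagePath);`. How the worker itself talks to the engine is left out on purpose.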

But it all depends on what kind of documents you want to OCR/analyse. If they are strictly computer-generated documents, the MODI engine is fine (200 DPI and up is almost flawless). If you also want to handle handwritten material and be able to train your engine, you should go for a more refined solution like Tesseract or OmniPage (I believe the MODI engine is based on an old version of the OmniPage engine, but I'm not totally sure).

If you decide to go for MODI, I can provide you with some pointers and code examples. Hope this helps.

Rico

#5 Justin Credible

Posted 17 May 2010 - 05:34 AM

I was thinking more of computer-printed pages, not handwritten ones.

MODI should work well :)

I would love some pieces of code, yes :)

#6 Rico Diesel

Posted 17 May 2010 - 07:08 AM

Okay, first of all: do you have Microsoft Office Document Imaging installed on your system? (I mean fully installed, not "install on first use", because then it's not going to work.)
Once you have made sure of this, you have to add a reference to the MODI library in your project.

1. Right-click 'References' in your Solution Explorer and select 'Add Reference...'
2. Go to the 'COM' tab and find 'Microsoft Office Document Imaging 11.0 Type Library' (or 12.0)
3. Select it and press 'OK'; if everything goes well you should see MODI in your reference list
4. Okay, here is a bit of code to get you going:

//Assumes the MODI COM reference from steps 1-3 and that aFilename holds the path of the scanned image
MODI.Document MyDoc = new MODI.Document();
MyDoc.Create(aFilename);
MyDoc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);  //this will start the actual OCR process, it will take some time
MODI.IImage myImage = (MODI.IImage)MyDoc.Images[0]; //first page in file
MODI.ILayout myLayout = (MODI.ILayout)myImage.Layout;

//Separate text fragments are stored in the myLayout.Words property
MODI.IWord myWord = (MODI.IWord)myLayout.Words[0];
//The IWord interface provides the text, but also info about the location and confidence
string firstWordText = myWord.Text;

//... Actions with the text

MyDoc.Close(false); //Closes the document (false = don't save changes) and deallocates the memory.



This should give you a little push in the right direction; more info about the methods and properties can be found here. It shouldn't be too hard to translate the VB code to C# code.
One more thing that could be interesting: the MODI library provides a viewer for the documents. If you want this component in your project, you have to add it to your Toolbox (basically the same trick as adding the reference, only this time with the Toolbox). As soon as you have a MiDocView set up on your form, you can add an OCR'd document to it through the Document property. Other properties, methods and events can be found in the MSDN documentation.

MiDocView1.Document = MyDoc;
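
Going back to the original question of getting the text into a textbox, a minimal sketch could look like this. It assumes the MODI COM reference described above, uses the Text property of each page's Layout instead of walking the individual words, and the class and method names (ModiTextExtractor, ExtractText) are made up for illustration.

```csharp
using System.Text;

static class ModiTextExtractor
{
    // Runs OCR on every page of the image file and returns the recognised text.
    // Needs a project reference to the MODI COM type library.
    public static string ExtractText(string imagePath)
    {
        MODI.Document doc = new MODI.Document();
        doc.Create(imagePath);
        try
        {
            // Starts the actual OCR process; this can take a while.
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            StringBuilder text = new StringBuilder();
            for (int i = 0; i < doc.Images.Count; i++)
            {
                MODI.Image page = (MODI.Image)doc.Images[i];
                // Layout.Text holds the recognised text of one page.
                text.AppendLine(page.Layout.Text);
            }
            return text.ToString();
        }
        finally
        {
            doc.Close(false); // close without saving and free the memory
        }
    }
}
```

On a form with a TextBox (called textBox1 here as an example) you would then write `textBox1.Text = ModiTextExtractor.ExtractText(aFilename);`. For large scans you may want to do this on a background thread so the UI does not freeze.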



A couple of important things to know about the MODI engine:
1. It eats bitmaps and TIF documents. .bmp takes forever, is horrible in memory usage, and hogs your processor; .tif, on the other hand, works quite well and is friendly on your resources.
2. Check the DPI settings of your scanner; you will not get much valid data from low-quality pictures. I work with a 200 DPI minimum, but at that resolution you only get plain black-on-white text, and numbers are already difficult. 300 to 400 DPI is a lot better, and sometimes grey text comes through. 600 DPI takes a little longer to process, but works like a charm.
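
Both points can be checked in code before handing a file to the engine. The sketch below is only one way to do it, using System.Drawing; the names (ScanPreparation, PrepareForOcr, minimumDpi) are made up for the example. Note that it trusts the resolution stored in the file, which is not always what the scanner really used (some .bmp files carry no DPI information at all).

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

static class ScanPreparation
{
    // Rejects scans below minimumDpi and returns the path of a .tif version
    // of the image (the original path if the file is already a TIFF).
    public static string PrepareForOcr(string imagePath, float minimumDpi)
    {
        using (Image scan = Image.FromFile(imagePath))
        {
            // Resolution as recorded in the image file.
            if (scan.HorizontalResolution < minimumDpi ||
                scan.VerticalResolution < minimumDpi)
            {
                throw new ArgumentException(
                    "Scan resolution is " + scan.HorizontalResolution + " x " +
                    scan.VerticalResolution + " DPI, below the minimum of " + minimumDpi);
            }

            if (scan.RawFormat.Equals(ImageFormat.Tiff))
            {
                return imagePath; // nothing to convert
            }

            // Save a TIFF copy next to the original, e.g. page1.bmp -> page1.tif
            string tiffPath = Path.ChangeExtension(imagePath, ".tif");
            scan.Save(tiffPath, ImageFormat.Tiff);
            return tiffPath;
        }
    }
}
```

For example, `string ocrInput = ScanPreparation.PrepareForOcr(aFilename, 200f);` would use the 200 DPI minimum mentioned above and give you a .tif path to pass to MyDoc.Create.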

Good, have fun with it and keep me posted on your results...

Rico

#7 Justin Credible

Posted 17 May 2010 - 10:41 AM

Thanks for the helpful post! Rep. given.

So do I need 2003 installed, or can I use 2007?

#8 Rico Diesel

Posted 17 May 2010 - 11:59 PM

In the case of 2003 it is MODI 11.0; for 2007 it is MODI 12.0. In Office 2010 the OCR functionality has been removed from the COM libraries and worked into MS Word.

Thanks for the rep.

Rico

#9 Justin Credible

Posted 18 May 2010 - 04:57 AM

I'ma go with 2007 then, gotta be more accurate.

Got exam tomorrow but will look into it after! :D

#10 Rico Diesel

Posted 18 May 2010 - 07:47 AM

Hehehe, from what I can remember of an article on the web (can't find it anymore), it is the exact same engine, only with a different version number :P
Good luck with your exam!

Rico

#11 Justin Credible

Posted 18 May 2010 - 10:20 AM

ahh, kk.

Thanks!

#12 abdulWahab1187

  • New D.I.C Head

Reputation: 0
  • Posts: 1
  • Joined: 28-March 11

Posted 28 March 2011 - 12:49 AM

Rico Diesel, on 17 May 2010 - 01:37 AM, said:

If you decide to go for MODI I can provide you with some pointers and code examples. Hope this helps

Dear Rico, I am interested in learning how to build OCR into an application: how can I run an OCR algorithm to extract text/characters from an image? I am also searching for this myself, but I would definitely appreciate your application with the built-in MODI engine, as C# code.

Thanks in advance,
Abdul Wahab

#13 Rico Diesel

Posted 28 March 2011 - 01:26 AM

Abdul,

Check out post #6 of this thread, where I provided Justin Credible with an example of how to use the MODI engine in C# code. This should also be enough for you to start with. The code I wrote, which uses the MODI engine, is owned by a previous boss of mine, and under no circumstances can I provide you with it (there would be legal consequences, and it would also be against the rules of the forum).

If you decide to give it a shot yourself and you get stuck, simply post your code with a clear explanation of what you are trying to do (in a new thread), and I and many others here will be willing to help you out.

Rico