3 Replies - 394 Views - Last Post: 11 January 2018 - 10:56 AM

#1 tmasterslayer

Do I need to retrain Tesseract? Easier method?

Posted 10 January 2018 - 07:05 PM

I'm playing Tetris on an NES emulator, specifically a ROM that shows metrics at the end of each round. I thought it would be interesting to record them and see how they change as I become a better player.

I already have a script that makes entering the resulting metrics into a database easier, but it's still a bit repetitive.

I figured I might be able to use OCR to detect the values and enter them into the database automatically, using a screenshot from the emulator. Tesseract doesn't seem to recognize the font used in the game. I looked into retraining and it's far from a trivial task, especially since I'm not working with a regular font here (not that I know of at least).


Do I need to retrain Tesseract to recognize this font? Is there an easier way to programmatically extract the values from the image?


Posted Image
When I run Tesseract on an image of "019", it gives me "1319".
When I run Tesseract on an image of "019" and tell it to expect only digits, I get "1.119".

Posted Image
When I run Tesseract on an image of "008" with the colors inverted, it gives me "flflfl".



Here is an image of the final metrics screen:
Posted Image


Replies To: Do I need to retrain Tesseract? Easier method?

#2 andrewsw

Re: Do I need to retrain Tesseract? Easier method?

Posted 11 January 2018 - 03:22 AM

Related topic on Stack Overflow: https://stackoverflo...-for-a-new-font

How is this related to Python?

#3 tmasterslayer

Posted 11 January 2018 - 09:35 AM

Sorry, I was using the tesser-ocr Python library, so when I was making the post I think I saw Python and decided to post there. Maybe this should be moved to "Other Languages"?

Thanks for the link too.

This post has been edited by Skydiver: 11 January 2018 - 10:45 AM
Reason for edit: Removed unnecessary quote. No need to quote the post above yours.


#4 Skydiver

Posted 11 January 2018 - 10:56 AM

Since you know exactly what the header text is in each of those boxes, the box and text positions are fixed after each game, and it's always numbers that you have to match, it may be better to do simple bitmap overlay matching instead of full-blown OCR. Basically, go back to the days of old-fashioned OCR, where you had to print handwritten letters into specific boxes on a form.
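
For what it's worth, here is a rough sketch of what that overlay matching could look like in plain Python. Everything in it is an assumption made for illustration: the 3x5 glyphs in GLYPHS are made up (real templates would be cropped once from an actual screenshot of the game's font), the image is assumed to be already thresholded to rows of 0/1 pixels, and the names read_number, match_digit, and render are hypothetical helpers, not part of any library.

```python
# Hypothetical 3x5 digit bitmaps, for illustration only. For the real game you
# would crop one template per digit from a screenshot of the metrics screen.
GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "5": ["###", "#..", "###", "..#", "###"],
    "6": ["###", "#..", "###", "#.#", "###"],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "8": ["###", "#.#", "###", "#.#", "###"],
    "9": ["###", "#.#", "###", "..#", "###"],
}
GLYPH_W, GLYPH_H = 3, 5

# Convert the text art into rows of 0/1 pixels.
TEMPLATES = {
    digit: [[1 if ch == "#" else 0 for ch in row] for row in rows]
    for digit, rows in GLYPHS.items()
}


def crop(image, x, y, w, h):
    """Return the w-by-h block of pixels whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def mismatch(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_digit(cell):
    """Pick the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda d: mismatch(cell, TEMPLATES[d]))


def read_number(image, x, y, n_digits, spacing=1):
    """Read n_digits glyphs laid out left to right starting at (x, y)."""
    digits = []
    for i in range(n_digits):
        cell = crop(image, x + i * (GLYPH_W + spacing), y, GLYPH_W, GLYPH_H)
        digits.append(match_digit(cell))
    return "".join(digits)


def render(text, spacing=1):
    """Build a synthetic 0/1 image of the given digits (stand-in for a screenshot)."""
    rows = []
    for r in range(GLYPH_H):
        row = []
        for i, ch in enumerate(text):
            if i:
                row.extend([0] * spacing)
            row.extend(TEMPLATES[ch][r])
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(read_number(render("019"), 0, 0, 3))
```

Because the comparison picks the closest template rather than requiring an exact match, a stray pixel or two should not change the result. With a real screenshot you would replace render with loading and thresholding the image, and hard-code the (x, y) position and digit count of each box on the metrics screen.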