OCR and comparison

TomTom · 06-12-2010 11:18AM #1

I am posting here in hopes that A) I have the right section and B) you can help me.

I am faced with an issue where I work where we get the status of 3rd party devices displayed on a screen. We get no info from it in numerical format and it is a closed source program. We need to create reports based on the status of these devices and in the past we just sat down and took a manual note of what device was in what state. The issue we are now faced with is they want the reports to be more detailed, so this manual task has to be done more often.

We are pushing for access to the back end that drives this system, but to be honest that could take quite some time. What we do at the moment is run a program that takes screen captures at set intervals during the day so we can sit down at a quiet time and compare them. This is now proving to be a little too time consuming.

What I was hoping to find was a program that could use OCR to identify each label and then export it into a spreadsheet with the corresponding flag, or a yes or no, depending on the colour of the text. A programming friend of mine explained how this was possible, and I can see myself that it would be, but in order to get something passed to be installed on the network it has to be a commercial grade product with support, or something of an open nature that can be examined internally if required.

So after all that waffle, can anyone make any suggestions?

NeverSayDie · 06-12-2010 12:39PM

You don't mention the platform, but if this is a Windows app for instance, you can extract details from other running applications' windows programmatically. There's a pretty extensive API in Windows for accessing windows and the controls (child windows) in them, injecting or reading data, manipulating messages, etc. I've used that kind of thing before to insert input into the GUI of another program - eg, filling out text boxes (as if from user keyboard input) in another arbitrary app I didn't have any other access to. I was using it for a simple automation application (filling out a stored username/password to some arbitrary apps' login windows), which along with automated GUI testing is probably the main use for this kind of thing.

AFAIK it shouldn't be too difficult to do the opposite and extract details back out, though this does very much depend on how the app you're trying to talk to is built - eg, if it's not using normal Windows text, this probably won't work (though you'd still use these techniques as a source for your visual-based solution, I guess). Setup behaviour would be on a similar basis to the Spy++ tool that comes with Visual Studio - interrogate windows (based on the user's selection) to get the window handles of the controls you want to extract data from, and then programmatically access them later on to gather your data.

Here are a few articles on the general topic that should help:
http://www.codeproject.com/KB/dialog/windowfinder.aspx
http://www.codeproject.com/KB/miscctrl/UICtrlDataSpyTool.aspx
http://www.codeproject.com/KB/windows/ReadScreenText.aspx

Google should turn up plenty more. Most of those articles discuss doing this in C++ Win32, though if you want to do it in .NET, P/Invoking works fine for most of those APIs in my experience.

I guess I'd use Spy++ (or roll your own) to interrogate this app's windows and try and figure out its structure, which should give you a better idea of how you could manipulate it.
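As a rough illustration of the "extract details back out" idea, here is a minimal sketch in Python using ctypes (chosen only for brevity; the same calls work from C++ or via P/Invoke). It assumes the target app uses standard Windows controls that answer WM_GETTEXT, which may well not be true for your app. The function name child_window_texts is made up for this example; EnumChildWindows, GetClassNameW and SendMessageW are the real user32 calls.

```python
import ctypes
import sys

WM_GETTEXT = 0x000D
WM_GETTEXTLENGTH = 0x000E


def child_window_texts(parent_hwnd):
    """Return [(hwnd, class_name, text)] for the child and descendant windows."""
    if sys.platform != "win32":
        raise OSError("this sketch relies on the Win32 user32 API")
    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    enum_proc_type = ctypes.WINFUNCTYPE(
        wintypes.BOOL, wintypes.HWND, wintypes.LPARAM
    )
    user32.EnumChildWindows.argtypes = [
        wintypes.HWND, enum_proc_type, wintypes.LPARAM
    ]
    user32.EnumChildWindows.restype = wintypes.BOOL
    user32.GetClassNameW.argtypes = [wintypes.HWND, wintypes.LPWSTR, ctypes.c_int]
    user32.GetClassNameW.restype = ctypes.c_int
    # lParam is declared as a pointer so a text buffer can be passed directly.
    user32.SendMessageW.argtypes = [
        wintypes.HWND, wintypes.UINT, wintypes.WPARAM, ctypes.c_void_p
    ]
    user32.SendMessageW.restype = ctypes.c_ssize_t

    results = []

    def on_child(hwnd, _lparam):
        class_buf = ctypes.create_unicode_buffer(256)
        user32.GetClassNameW(hwnd, class_buf, 256)
        # WM_GETTEXT is used instead of GetWindowText because the controls
        # belong to another process.
        length = user32.SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, None)
        text_buf = ctypes.create_unicode_buffer(length + 1)
        user32.SendMessageW(hwnd, WM_GETTEXT, length + 1, text_buf)
        results.append((hwnd, class_buf.value, text_buf.value))
        return True  # keep enumerating

    user32.EnumChildWindows(parent_hwnd, enum_proc_type(on_child), 0)
    return results
```

The parent handle is whatever Spy++ shows for the app's main window (it displays handles in hex). If every control comes back with empty text, the app is probably drawing its own text and you are back to the screen-capture route.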

amen · 06-12-2010 01:33PM

Or, if the data you are interested in is always at the same location on the screen, you could get the pixels at that location, figure out what they are, and then record the data.
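Something along these lines, as a minimal Python sketch. Everything specific here is an assumption: the device names, the pixel coordinates, and the guess that green text means "yes" and red means "no". get_pixel is any function that returns an (r, g, b) tuple for an (x, y) position; with the existing screen captures it could be the getpixel method of an image loaded with Pillow and converted to RGB.

```python
import csv

# Hypothetical layout: device label -> (x, y) of a pixel inside its status text.
DEVICE_PIXELS = {"Pump 1": (120, 40), "Pump 2": (120, 60)}


def classify(rgb):
    """Map an (r, g, b) pixel to a coarse status by its dominant channel.

    Assumes green means "yes" and red means "no"; anything else is "unknown".
    """
    r, g, b = rgb
    if g > r + 50 and g > b + 50:
        return "yes"
    if r > g + 50 and r > b + 50:
        return "no"
    return "unknown"


def snapshot_status(get_pixel, positions=DEVICE_PIXELS):
    """Read one pixel per device and return {device: status}."""
    return {name: classify(get_pixel(xy)) for name, xy in positions.items()}


def write_report(timestamp, status, out):
    """Append one CSV row per device: timestamp, device, status."""
    writer = csv.writer(out)
    for name, state in sorted(status.items()):
        writer.writerow([timestamp, name, state])
```

Text strokes are thin, so a single pixel can easily land on the background; sampling a small block of pixels around each position and classifying the most strongly coloured one would likely be more robust. The CSV output opens directly in a spreadsheet.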

Freddio · 06-12-2010 06:22PM

If there was any sort of a backend (even text logs) which could be used, it would be far easier to develop against than OCR, which is never 100% accurate (in the printed format anyway).
