
C# - best way to approach problem

  • 22-06-2012 10:22am
    #1
    Registered Users, Registered Users 2 Posts: 3,615 ✭✭✭


    Hi,

    I'm developing an object recognition program and need some advice on how to best approach this. The problem is as follows: I have a glove hanging from the ceiling with fingers pointing downward about a meter in front of a relatively plain surface. What I want to do is locate the positions of the fingertips of the glove.


    I have a depth image of the scene so for every pixel in the image I have a corresponding depth value in millimeters.

    My plan was to loop through the pixels starting at the bottom of the image and, whenever I encounter a significant change in depth, treat that as the start of a new object. I then have to determine whether that object is the glove or not.
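    That bottom-up scan might be structured like this (a minimal Python sketch; the thread is C#, but the loop structure carries over directly; the depth image is a 2D list of millimetre values, and `find_first_object` and the 100 mm threshold are made-up illustrations, not anything from the Kinect SDK):

```python
def find_first_object(depth, threshold_mm=100):
    """Scan bottom-up, column by column; return the (row, col) of the first
    pixel whose depth jumps by more than threshold_mm relative to the pixel
    below it, i.e. the first candidate "new object"."""
    rows, cols = len(depth), len(depth[0])
    for row in range(rows - 2, -1, -1):      # start one row above the bottom edge
        for col in range(cols):
            if abs(depth[row][col] - depth[row + 1][col]) > threshold_mm:
                return (row, col)
    return None                              # no depth discontinuity found
```

    Whatever this returns is only a candidate; further checks still have to decide whether it is actually the glove.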


    Here's where my head begins to melt:
    Assuming the first part of the glove I encounter is the tip of the middle finger (it is the longest, and the fingers point downwards), I would then:
    - check that the depth is similar for pixels directly above (as I move up the middle finger);
    - check that the depth is not similar about 2cm in either horizontal direction (the gaps between fingers);
    - check that the depth is similar again as I move further in each horizontal direction (the index and ring fingers).
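    Those three checks might be sketched as follows (Python for brevity; `looks_like_fingertip`, the pixel distances and the 30 mm tolerance are all invented illustrative values, since the real numbers depend on the camera and the glove's distance from it):

```python
def looks_like_fingertip(depth, row, col, gap_px=10, finger_len_px=20, tol_mm=30):
    """Test whether (row, col) plausibly sits on the middle fingertip:
    similar depth moving up the finger, clearly different depth in the
    gaps to either side, and similar depth again where the neighbouring
    fingers should be."""
    # keep the whole test window inside the image
    if row - finger_len_px < 0 or col - 2 * gap_px < 0 or col + 2 * gap_px >= len(depth[0]):
        return False
    tip = depth[row][col]
    # 1. similar depth straight up along the middle finger
    for dy in range(1, finger_len_px + 1):
        if abs(depth[row - dy][col] - tip) > tol_mm:
            return False
    # 2. clearly different depth ~2 cm to either side (gaps between fingers)
    for dx in (-gap_px, gap_px):
        if abs(depth[row][col + dx] - tip) <= tol_mm:
            return False
    # 3. further up and further out, the index and ring fingers reappear
    up = row - finger_len_px
    for dx in (-2 * gap_px, 2 * gap_px):
        if abs(depth[up][col + dx] - tip) > tol_mm:
            return False
    return True
```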


    I'm not looking for someone to do this for me, but if someone could suggest a good way to structure the code it would be a huge help, because I'm unsure which loops would do this most efficiently and how best to structure them. I'm a fairly novice programmer (good choice of thesis then :rolleyes:).

    Or any advice on better ways to approach this would be appreciated.


    Thanks


Comments

  • Registered Users, Registered Users 2 Posts: 7,157 ✭✭✭srsly78


    There are libraries built specifically for this: OpenCV is one -> http://en.wikipedia.org/wiki/OpenCV (lol at the example picture, it's exactly the same as your problem)
    Doing it yourself is indeed head melting as you have discovered.


  • Registered Users, Registered Users 2 Posts: 2,040 ✭✭✭Colonel Panic


    It's not much of a thesis project if he just hooks up a library to do all his dirty work.


  • Registered Users, Registered Users 2 Posts: 3,323 ✭✭✭padraig_f


    It sounds similar to OCR, so you could look at some open-source OCR libraries or OCR algorithms and see how they work. You'd get credit for that in the thesis as well, if you document the research and say how you used it, or adapted it to your own problem.

    Off the top of my head, how I might tackle it: draw an imaginary line from the bottom of the image towards the top until you hit a certain depth (which represents the hand). Take the length of that line and store it in an array. Do that for every pixel along the bottom of the image. You then have an array of the lengths of these lines, and you profile that array to match certain characteristics (i.e. it has 5 peaks for the fingers, and 4 valleys in between for the gaps). You configure the profiling function with parameters to adjust how tolerant it is, test against the image, adjust the parameters, and add more parameters if necessary.

    e.g. array looks something like:
    [0,0,0,0,74,75,76,75,74,50,49,50,74,75,76,75,74.....0,0,0,0]

    where 75 (plus or minus 1) represents tips of the fingers, and 50 (plus or minus 1) represents the joints in between.

    The profiling function is still difficult, but what you have now is a 2-d graph so maybe you can use some mathematical technique that takes a 2-d graph and interprets the characteristics of the curve.
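    One way to read that into code (a Python sketch under assumptions: it measures each column's silhouette height, which yields exactly the kind of peaks-and-valleys array shown above; `column_heights`, `count_peaks` and the 50 mm tolerance are made-up names and values):

```python
def column_heights(depth, hand_depth_mm, tol_mm=50):
    """For each column, count how many pixels lie at (roughly) the hand's
    depth: finger columns give tall counts, gap columns short ones,
    background columns 0."""
    rows, cols = len(depth), len(depth[0])
    return [sum(1 for row in range(rows)
                if abs(depth[row][col] - hand_depth_mm) < tol_mm)
            for col in range(cols)]

def count_peaks(profile):
    """Count local maxima in the profile; a five-fingered hand should give 5."""
    # collapse runs of equal values so flat-topped peaks count once
    compressed = [v for i, v in enumerate(profile) if i == 0 or v != profile[i - 1]]
    return sum(1 for i in range(1, len(compressed) - 1)
               if compressed[i] > compressed[i - 1] and compressed[i] > compressed[i + 1])
```

    The peak count (plus valley depths, widths, and so on) is then what the profiling function matches against, with the tolerances exposed as parameters.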


    Have a look at edge-detection image-processing algorithms as well. There are some relatively simple algorithms to do this (though I'm sure some complex ones as well), and it's in the same ballpark as what you're doing.
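    A minimal finite-difference version of that idea (an assumption-laden sketch, not any named algorithm: a pixel is marked as an edge when the depth gradient to its right and lower neighbours is large, which picks out outlines like the glove's):

```python
def depth_edges(depth, threshold_mm=100):
    """Mark pixels where the depth changes sharply towards the right or
    downward neighbour; the last row/column is left unmarked for simplicity."""
    rows, cols = len(depth), len(depth[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = depth[r][c + 1] - depth[r][c]   # horizontal depth gradient
            gy = depth[r + 1][c] - depth[r][c]   # vertical depth gradient
            if abs(gx) + abs(gy) > threshold_mm:
                edges[r][c] = True
    return edges
```

    Real edge detectors (Sobel, Canny) smooth first and use larger kernels, but the principle of thresholding a gradient is the same.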


  • Registered Users, Registered Users 2 Posts: 7,157 ✭✭✭srsly78


    Colonel Panic wrote: »
    It's not much of a thesis project if he just hooks up a library to do all his dirty work.

    Nope, but it's an open-source library with lots of documentation, so he can take inspiration from it.


  • Registered Users, Registered Users 2 Posts: 3,615 ✭✭✭Mr.Plough


    padraig_f wrote: »
    It sounds similar to OCR, so you could look at some open-source OCR libraries or OCR algorithms and see how they work. [...]

    Interesting, I'll look into this. Even if I don't use it, it's good to have a variety of possible solutions to write about.

    In the shower today I was thinking of the following:

    Get the physical glove and draw a template around it on paper, placing various points at different locations and making sure to place points between the fingers as well. I then convert these point locations from mm to pixels.

    Then have one of the points at (X, Y), and the rest at X ± cx and Y ± cy, where cx and cy are the distances in pixels of the other points from the (X, Y) point.

    Then, starting at X = 0 and Y = 0, loop through the pixels with something like:

    if the depths at all the glove points are similar AND the depths at the gap points are significantly different from those at the glove points

    break

    and you've found the glove. It's essentially template matching, but it could be pretty versatile and work even when there are other objects in the scene. I'm using the Microsoft Kinect SDK, so I can easily transform between pixel and global coordinate systems using built-in functions.
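    That template idea might look something like this (Python sketch; the offsets, thresholds and names are all invented for illustration, and a real template would carry a point per finger and per gap rather than the three-and-two used here):

```python
# (dy, dx) offsets relative to the anchor point, taken from the paper template
GLOVE_POINTS = [(0, 0), (-15, -8), (-15, 8)]   # points that land on the glove
GAP_POINTS = [(-5, -4), (-5, 4)]               # points that land between fingers

def find_glove(depth, tol_mm=30, gap_mm=200):
    """Slide the point template over the depth image; return the first (y, x)
    where every glove point shares the anchor's depth and every gap point
    clearly does not."""
    rows, cols = len(depth), len(depth[0])
    for y in range(15, rows):                  # margins keep the template in-bounds
        for x in range(8, cols - 8):
            anchor = depth[y][x]
            on_glove = all(abs(depth[y + dy][x + dx] - anchor) < tol_mm
                           for dy, dx in GLOVE_POINTS)
            in_gaps = all(abs(depth[y + dy][x + dx] - anchor) > gap_mm
                          for dy, dx in GAP_POINTS)
            if on_glove and in_gaps:
                return (y, x)
    return None
```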

    I won't get near a computer until Sunday, so I'll update then when I no doubt run into problems!

