
Computer vision includes:

  • Character recognition (converting an image to text)
  • Face recognition (detecting a face in an image)
  • Object recognition (detecting various objects in an image)
  • Human recognition (detecting a human shape in an image)
  • Motion detection/object tracking (detecting motion/movement)

All of these are among the various branches of computer vision, and all of them can use machine learning to train the program to detect things. So wouldn't they all be very similar, with the only difference being what you tell the program to look for?

For example, if I am trying to detect Text, the program would try to separate each letter and, based on its joints, curves and shape, determine which letter it is.

And if I am trying to detect a Face, the program would be looking for facial features like eyes, nose, mouth and face shape.

And if I am trying to detect an Object, the program would be looking for borders.

And if I am trying to detect a Human, the program would be looking for head, body and limbs.

And if I am doing Motion Detection, it will basically be object detection while capturing its movement.

I know I am overly simplifying things, but my point is: suppose someone tried to create a computer vision library/program as a single learning program with some parameters (what to detect). Would it work? Would it be too complex? Am I overlooking some technical barriers that would stop this being possible?

6119
  • Anyone who thinks this question is inappropriate, kindly state the reason in the comments. – 6119 May 12 '16 at 07:38
  • The trouble with "I'm not sure how this works" questions is that they are not fully clear about what you're trying to ask - obviously. Some (not me) think that makes them unsuitable. I've given an answer; I hope it was the kind of answer you were looking for. – gbjbaanb May 12 '16 at 08:10
  • OK, but I wasn't asking how it works; I was asking whether it would work like this. Thanks though, I will try to frame my question more clearly next time. – 6119 May 12 '16 at 09:17
  • The _outputs_ are also all different. – RemcoGerlich May 12 '16 at 10:13

2 Answers


Consider what an "object" is - could an object be a face or an aircraft? Of course it could, so technically everything is an object. But a face looks nothing like an aircraft, so you need to specialise the detector to tell them apart.

Now I know that when it comes to face detection, there are already many parameters you have to pass to detect different types of faces - beards are different from spectacle wearers, and hats can throw off the detection. If you look in the OpenCV Haar cascade directory, it has trained XML files for eye, eye_tree_glasses, frontalcatface, profileface, fullbody, lowerbody, and even licence plates.

So your detection routines are generic in that there's only one routine you use (e.g. Haar or LBP classification), but you have to pass it quite a lot of differently trained parameters.
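
To make that concrete, here is a minimal sketch using OpenCV's Python bindings: the detection call is identical, and only the trained cascade file changes. The cascade file names are the standard ones shipped with OpenCV; the image path is just a placeholder.

```python
import cv2

# Same generic routine; only the trained cascade (XML file) differs.
face = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eyes = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
plate = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")

gray = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2GRAY)  # placeholder image

for name, model in [("face", face), ("eye", eyes), ("plate", plate)]:
    boxes = model.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print(name, "detections:", len(boxes))
```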

CV is all about "object" detection at the algorithm level, but very specialised at the "training" level.

Motion detection is either tracking an object by detecting it frame by frame, or it is detecting changes in frames without any tracking of what is moving. I'd say that is a separate topic from CV.
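
The "changes in frames" flavour can be surprisingly simple. A minimal sketch of frame differencing, assuming a camera at index 0 and an arbitrary threshold:

```python
import cv2

cap = cv2.VideoCapture(0)                    # assumes a camera at index 0
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev, gray)                       # per-pixel change between frames
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 500:                     # arbitrary "enough motion" threshold
        print("motion detected")
    prev = gray
```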

gbjbaanb
  • Thanks, exactly what I was saying: if I pass it a trained XML to detect faces, the program detects faces; if I pass it a trained XML to detect licence plates, it detects licence plates; and if I pass it a trained XML to detect letters, the same machine learning program would detect letters. Of course motion detection or tracking is a separate topic as it involves different functionality. – 6119 May 12 '16 at 09:24

There are certainly common themes in computer vision, but the differences in the problem domains make it highly unlikely that a common engine is desirable for all but the most advanced applications (robotics), or even possible in the short term.

Text recognition is arguably the simplest of these spheres, but even here you have to account for problems like variation in font, font size, contrast, print quality, etc.
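
To give a flavour of the "separate each letter" step the question describes, here is a very naive segmentation sketch with OpenCV. It assumes clean, dark text on a light background and a hypothetical scan.png; real OCR has to cope with touching letters, broken strokes, skew and noise.

```python
import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)   # "scan.png" is a placeholder
# Invert-threshold so the letters become white blobs on a black background.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Each connected blob is (roughly) one character.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    character = binary[y:y + h, x:x + w]   # crop, to be classified by a trained model
```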

Face recognition is relatively simple provided the picture is taken facing the camera. An algorithm detects where the various facial elements should be and by checking contrasts, can verify with a reasonable degree of accuracy whether there is a face in the picture.

Object detection is slightly more problematic. Take a cat for example. This could be in a variety of poses in a variety of colours and against a variety of backgrounds. A hugely complex problem. So much so in fact that Google has taken the opposite approach. Their algorithm took in pictures that were known to contain cats and reverse engineered a more precise algorithm. The final thing was so complex that it was beyond what a human could reasonably understand.
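
To illustrate the learning-from-labelled-examples idea in miniature (this is not Google's method, just a classical HOG-features-plus-linear-SVM sketch; the cats/ and not_cats/ folders are hypothetical):

```python
import cv2
import numpy as np
from glob import glob

hog = cv2.HOGDescriptor()  # generic feature extractor (64x128 window by default)

def features(path):
    img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), (64, 128))
    return hog.compute(img).flatten()

pos = [features(p) for p in glob("cats/*.jpg")]       # hypothetical labelled examples
neg = [features(p) for p in glob("not_cats/*.jpg")]
samples = np.array(pos + neg, dtype=np.float32)
labels = np.array([1] * len(pos) + [0] * len(neg), dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(samples, cv2.ml.ROW_SAMPLE, labels)          # the training does the specialising
```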

Similar to the cat problem is identifying humans - specifically when one adds in common accompanying objects such as wheelchairs, clothes, vehicles, etc.

Motion detection/tracking, while obviously related, has an entirely different set of technical hurdles to overcome.

Robbie Dee