Computer vision includes:
- Character recognition (converting an image to text)
- Face recognition (detecting a face in an image)
- Object recognition (detecting various objects in an image)
- Human recognition (detecting a human shape in an image)
- Motion detection/object tracking (detecting motion and following movement)
All of these are branches of computer vision, and all of them can use machine learning to train a program to detect things. So wouldn't they all be very similar, with the only real difference being what you tell the program to look for?
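To illustrate what I mean, here is a rough sketch using OpenCV's pretrained Haar cascade files (which ship with the `opencv-python` package; `photo.jpg` is just a hypothetical input). The detection code is identical, and only the model file determines what gets detected:

```python
import cv2

def detect(image_path, cascade_file):
    """Generic detector: what it finds depends only on the model file."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(cv2.data.haarcascades + cascade_file)
    # Returns a list of (x, y, w, h) boxes around detected objects
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

faces  = detect("photo.jpg", "haarcascade_frontalface_default.xml")  # face detection
people = detect("photo.jpg", "haarcascade_fullbody.xml")             # human detection
```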
For example, if I am trying to detect Text, the program would try to separate out each letter and, based on its joints, curves, and overall shape, determine which letter it is.
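A minimal sketch of that first step, segmenting the letters, using OpenCV and assuming a clean black-on-white scan called `scan.png` (a hypothetical file); classifying each cropped letter by its shape would then be the learned part:

```python
import cv2

image = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
# Invert-threshold so letters become white blobs on a black background
_, binary = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Each external contour is (roughly) one letter; crop it out for classification
letters = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    letters.append(binary[y:y + h, x:x + w])
```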
And if I am trying to detect a Face, the program would be looking for facial features like eyes, nose, mouth, and overall face shape.
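Again as a rough sketch with OpenCV's bundled cascade files: first find the face, then look for features (here, eyes) only inside it. The `photo.jpg` filename is hypothetical; the cascade files are real ones shipped with `opencv-python`:

```python
import cv2

image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade  = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
    face_region = gray[y:y + h, x:x + w]
    # Search for eyes only within the detected face region
    eyes = eye_cascade.detectMultiScale(face_region, 1.1, 5)
```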
And if I am trying to detect an Object, the program would be looking for borders (edges) that separate the object from its background.
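For the border-finding part, I picture something like edge detection. A sketch with OpenCV's Canny detector (the thresholds are arbitrary guesses, and `scene.jpg` is a hypothetical input):

```python
import cv2

image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
# Canny marks pixels where intensity changes sharply, i.e. object borders
edges = cv2.Canny(image, threshold1=100, threshold2=200)
cv2.imwrite("edges.jpg", edges)
```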
And if I am trying to detect a Human, the program would be looking for a head, a body, and limbs.
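OpenCV actually ships a pretrained pedestrian detector that works roughly this way, using HOG (histogram of oriented gradients) features of the overall body shape. A sketch (`street.jpg` is a hypothetical input):

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")
# Each box (x, y, w, h) should enclose one detected person
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8))
```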
And if I am doing Motion Detection, it would basically be object detection plus tracking how that object moves from frame to frame.
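The simplest version of this I can think of is frame differencing: subtract consecutive frames, and anything that changed counts as movement. A sketch, assuming a video file named `clip.mp4`:

```python
import cv2

capture = cv2.VideoCapture("clip.mp4")
ok, frame = capture.read()
previous = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = capture.read()
    if not ok:
        break
    current = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixels that differ between frames mark where something moved
    diff = cv2.absdiff(current, previous)
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    previous = current

capture.release()
```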
I know I am oversimplifying things, but my point is this: suppose someone tried to create a computer-vision library/program built around a single learning routine that takes parameters for what to detect. Would it work? Would it be too complex? Am I overlooking some technical barrier that would make this impossible?