
I recently stumbled across a company that has created what appears to be a computer vision technology capable of automatically detecting shoplifting and alerting its users.

LINK

Watching some of the videos and examples provided by the company has left me completely baffled and amazed as to how on earth they may have achieved this functionality.

I understand that no one here will be able to tell me exactly how this was achieved, but is anyone aware of research in this field (and could point me to it)? Alternatively, could someone provide details on how something like this could be implemented, or guidance on where one might start?

My understanding was that computer vision algorithms were many years away from being this sophisticated. Is this sort of application really possible? Is anyone willing to hazard a guess at how they achieved it?

gnat
  • This doesn't seem so difficult. Games detect collisions all the time between objects; why couldn't you detect collisions between a person and a shelf of items, and then raise an alarm when that person was walking towards the door without paying? – Robert Harvey Dec 11 '12 at 23:41
  • Exactly. It's just object recognition and collision detection. Unless they hook it up to the scanner, it is easily overcome by moving the objects over the scanner but just slightly above it. The object will appear to have collided with the scanner but in fact did not. – Andrew T Finnell Dec 11 '12 at 23:43
  • Anyway, none of the detection mechanisms described at the website (sweethearting, basket-loss and self-checkout) require anything even remotely that sophisticated. They check in a very confined area (the cashier counter), and can cross-check items seen in the basket against what the bar-code scanner is saying was actually scanned. – Robert Harvey Dec 11 '12 at 23:44
  • I'm sorry, I'm confused. Let's take the sweethearting example. I have two items, one a low-cost item, one a high-cost item. I put the low-cost item under the high-cost one and scan. At that point we can compare what was scanned in the POS system to what is visible on the camera in the hand of the cashier, but that requires the system being able to "understand what is being put in the bag" against hundreds of thousands of potential items through a camera of marginal quality. This seems extremely complicated. What am I missing? – Maxim Gershkovich Dec 11 '12 at 23:48
  • @MaximGershkovich: The video shows the "sweethearted" items being passed by the checkout counter without being scanned. The camera has the ability to count the number of items and (according to the video) the type of item it is (e.g. meats). Presumably the sweethearted item was present in the camera view at least once, or the camera can detect the presence of an elongated shadow on an item, indicating a hidden item underneath. I can think of several such checks. – Robert Harvey Dec 11 '12 at 23:54
  • Ok, sorry, perhaps I used the wrong term. I agree that might be a bit easier to detect, but taking the example provided I still couldn't imagine how an automated system would be able to detect that, and yet this system claims it is capable of doing exactly that... (Sorry, made an edit at the same time as you, lol) – Maxim Gershkovich Dec 11 '12 at 23:56
  • Well, the detections occurring in the video certainly seem plausible. Perhaps you are expecting 100% accuracy; I doubt that you'll get that. – Robert Harvey Dec 11 '12 at 23:57
  • I think you're assuming too much about how well this works. I bet you there's a decent error rate, and it's likely very easy to game the system. I see this as more of a whistle-blower-type system, where it just identifies potential places in the video that need human review. As such, inaccuracy is well tolerated. – chris Dec 12 '12 at 00:14
  • @chris I agree, the videos are misleading as to what they can detect and how it works. It's more like a demonstration video, not a real-usage one (I assume). – Alex Dec 12 '12 at 02:45
  • @chris Yes, and I'm guessing there's a setting for whether to prefer type I or type II errors (false positives vs. false negatives). If the software isn't accurate enough overall, perhaps no setting will work well at all. – Daniel B Dec 12 '12 at 06:27
  • @RobertHarvey How exactly do you think they figure out which pixels are a person and which pixels are a shelf of items? The video game already knows everything about the person and the shelf. Seems a bit odd to respond to a question about how something is accomplished with computer vision with a comment about how it's trivial to accomplish in a system where computer vision is irrelevant. – 8bittree Dec 10 '19 at 20:47
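The POS cross-check discussed in the comments above could be sketched roughly like this. All names here are illustrative, not from the actual product; a real system would populate `detected_items` from a vision classifier watching the checkout counter, which is the genuinely hard part being glossed over:

```python
from collections import Counter

def flag_for_review(scanned_items, detected_items, tolerance=0):
    """Compare items the POS scanner recorded against items the vision
    system believes passed the counter. Any surplus on the camera side
    is merely flagged for human review, not acted on automatically,
    which is why a modest error rate is tolerable."""
    scanned = Counter(scanned_items)
    detected = Counter(detected_items)
    # Items the camera saw that were never scanned ("sweethearting" candidates)
    surplus = detected - scanned
    return sum(surplus.values()) > tolerance

# Example: the camera counted two "meat" items, but only one was scanned
print(flag_for_review(["meat", "bread"], ["meat", "meat", "bread"]))  # True
print(flag_for_review(["meat", "bread"], ["meat", "bread"]))          # False
```

Note that the comparison is over coarse item categories (e.g. "meats," as Robert Harvey mentions), not exact SKUs, which is a much easier vision problem than recognizing hundreds of thousands of distinct products.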

2 Answers


You're misinformed about the state of the art. Several years ago I worked for a company that built such systems for a variety of purposes. One was an extremely successful airport egress-control system, which could easily tell the difference between a person walking the wrong way down the exit hallway and things like balls in motion or people headed the right way. Recognizing objects in a scene in real-time isn't easy, but we were doing it on embedded CPUs, not on supercomputers.

I didn't see anything there that wasn't believable a few years ago.
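The wrong-way check in an egress-control system like the one described can be surprisingly simple once a tracker is in place. A minimal sketch, assuming a separate blob tracker (e.g. background subtraction plus connected components) supplies per-person centroid tracks; every name here is illustrative:

```python
def walking_wrong_way(track, exit_direction=(1.0, 0.0), threshold=0.0):
    """Given a sequence of (x, y) centroids for one tracked person,
    report whether their net motion opposes the permitted exit
    direction. The tracking pipeline that produces 'track' is
    assumed, not shown."""
    if len(track) < 2:
        return False
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    # A negative dot product means net movement against the allowed direction
    dot = dx * exit_direction[0] + dy * exit_direction[1]
    return dot < threshold

# A person moving leftward while the permitted exit flow is rightward
print(walking_wrong_way([(100, 50), (80, 50), (60, 50)]))  # True
```

Distinguishing a person from a rolling ball would then be a classification step on the blob (size, aspect ratio, gait), but the direction test itself needs no heavy machinery, which is consistent with running on embedded CPUs.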

Ross Patterson

Actually, this company uses a hybrid of computer vision and manual review performed in India. It is not pure computer vision, especially for elements like sweethearting. In fact, I know of one retailer who has had quite a problem with this system, not due to its performance in store, but due to the bandwidth required to ship video to India. This manual coding is how they reduce errors, and it is a typical technique with some vendors now.