
A long-time client has asked us to help screen their work machines for pornography. They're worried about liability if sensitive materials were found. Their main concerns (for obvious reasons) are video, audio, and image files. If possible, they'd also like to scan text-based documents for inappropriate content. They have a hierarchy of not-for-work content starting with blatantly illegal (I don't have to list details), moving down to obviously offensive, and also including things that may be offensive to some - think lingerie ads, joke cards featuring butt cracks, and anything related to Howie Mandel.

My questions are:

  • Is this ethical? I think it is since every employee legally agrees that their work machine belongs to the company and is subject to search. The screenings are not to occur on personal machines brought to work.
  • Is it feasible? I've done a lot of image processing/indexing but this seems like a whole new world of complexity.
  • Any references to successful techniques for discovering porn?
  • Is it appropriate for me to archive the results when something is discovered?
Scant Roger
  • Whose porn will we be using to test this? – ChaosPandion Mar 03 '11 at 05:56
  • I want to be a tester for this project!! – Mayank Mar 03 '11 at 06:04
  • @Mayank, you could apply to be one of the "experts" Martin refers to [in his answer](http://programmers.stackexchange.com/posts/54169/edit). – P Shved Mar 03 '11 at 06:44
  • Create a script that posts all images it finds on 4chan; if other members answer "MOAR!", you know it's porn. If the script gets banned, it's probably CP. – user281377 Mar 03 '11 at 07:29
  • You'd have to think there's umpteen million commercial products available for this already. – GrandmasterB Mar 03 '11 at 07:32
  • I have a number of techniques for discovering porn. The most discreet of which is to use Chrome incognito ;) – Tim Bender Mar 03 '11 at 09:25
  • You're mixing "ethical" with "legal"... that "every employee legally agrees that their work machine belongs to the company and is subject to search" makes it only legal, not ethical. Ethics are subjective: what's ethical to one person can be unethical to another. – fretje Mar 03 '11 at 10:58
  • @ammoQ I have to say, that is probably the technically most promising solution. Throw in some pre-screening and this could actually work. Crowd-source to 4chan. ;-) – Konrad Rudolph Mar 03 '11 at 11:10
  • Honest question: is this actually a likely problem? Porn on the work computer? I mean … who does that? Furthermore, how do they intend to handle accidental porn content? My GF actually had a virus on her work PC recently which redirected arbitrary Google queries to porn sites, and every so often I will accidentally type “python.com” [NSFW!] instead of “python.org” … What’s more, if this is actually a problem, I think this betrays a more fundamental trust and/or professionalism problem in the company. Address that instead of searching the computers. – Konrad Rudolph Mar 03 '11 at 11:14
  • @Mayank sure, you can do the gay bestiality test cases – jk. Mar 03 '11 at 11:49
  • @Konrad Rudolph - This is a very real and very likely problem. This happens all the time. Most companies try to use firewalls and proxies to block it at the source, but that doesn't stop people from bringing in media with that stuff on it. – Joel Etherton Mar 03 '11 at 11:56
  • @Joel I don’t want to come across as a prude … but this sounds absolutely depraved. Watching porn at work? Just whoa. – Konrad Rudolph Mar 03 '11 at 12:00
  • @Konrad Rudolph - I don't do it, but having been the person required to do the harvesting you wouldn't believe what people do on their computers when they have unfettered access. Most people believe they have the same anonymity on their work computer that they have on the internet, and we all know what anonymity on the internet does to people. – Joel Etherton Mar 03 '11 at 12:03
  • Amen to mobile porn \o/ – Filip Dupanović Mar 03 '11 at 12:15
  • @ammoQ How are you going to fix the captcha problem, then? – AndrejaKo Mar 03 '11 at 12:58
  • Jokes aside, if this is a large company it is very likely that somebody has porn on their computers that most of you volunteering would find disgusting. What the company does to the employee is up to the company, but the distinction between accidental porn and a porn collection is probably fairly clear. – David Thornley Mar 03 '11 at 17:15
  • @Konrad: simple search on the dailywtf: http://thedailywtf.com/Articles/The-Wrong-Thing-to-Say.aspx with its famous quote *So what does the company consider a healthy amount of kiddie porn?* – Matthieu M. Mar 03 '11 at 18:09
  • @MatthieuM. Wow. I had no idea. Where I live (planet Earth) kiddie porn is considered one of the worst offenses imaginable. A company taking such a laissez-faire attitude is just not thinkable. If someone got caught doing that, they’d not just be fired, they’d go to jail (do not pass go, do not collect $200). – Konrad Rudolph Mar 03 '11 at 18:18
  • @Konrad: honestly I don't know what would happen where I live and work. I personally find it disgusting... and I just love the flippant remark :) – Matthieu M. Mar 03 '11 at 18:42
  • @AndrejaKo: CAPTCHAs aren't a problem. Obviously, [you can use people's desire to view porn to solve them](http://blogs.msdn.com/b/oldnewthing/archive/2004/03/16/90449.aspx). Oh, wait... – sbi Mar 03 '11 at 19:10
  • @Scant: I definitely see _technical_ problems with archiving this, because the machine it's archived on would be found by a scan, which would lead to an infinite archival recursion... – sbi Mar 03 '11 at 19:32
  • Somehow I read "A client wants us to *use touch screen machines* for pornography." Kinky. – Andrew Mar 03 '11 at 22:33
  • "Furthermore, how do they intend to handle accidental porn content?" Like my father getting links to hundreds of porn sites doing an innocent Google search with a term that happens (without him knowing it) to be used for a certain style of porn, when he intended to use another, similar, word. Especially a problem for people for whom English isn't their first language. – jwenting Mar 04 '11 at 08:03
  • Infamous case of the English pensioner, who googled for a crossword puzzle solution for [Onager](http://en.wikipedia.org/wiki/Onager), the clue? "Wild Asian Ass". Apparently the old gent was suitably horrified by his search results :) – Binary Worrier Mar 04 '11 at 14:43
  • Another example would be when I seemed to remember that Anne McCaffrey's Dragonriders of Pern trilogy included the idea of "blooding", where Dragons would bond with their riders over a meal. I typed "pern" and "blooding" into Google. Google, without asking, changed "pern" to "porn". You can imagine what the results were like. – user16764 Jun 30 '11 at 23:09
  • Porn on work computers? Who does that? Everyone knows that's what smartphones are for now. – Bart Silverstrim Nov 17 '11 at 22:21
  • @BartSilverstrim smartphones? We have tablets! – Sulthan Jul 08 '13 at 15:53

17 Answers

125

You can do this with 90% Headology, 10% software.

Firstly, quietly scan employees' computers and build a database of files and sizes for each employee.

Then leak a memo that all PCs will be scanned for questionable content, i.e. the bosses have a Shazam-like program that can identify porn, etc.

A couple of days later, scan the computers for files and sizes again. Look at any deleted files: are they movie or image files? Those are the employees you need to keep an eye on.

Routinely scan those employees' PCs for images and movies, and manually check them for questionable content.
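
For what it's worth, a minimal sketch of that quiet scan-and-diff step in Python; the extension list and the scan root are illustrative assumptions, and a real deployment would need per-employee storage and some care around locked files:

```python
# Illustrative sketch only: snapshot (path, size) pairs per machine, then
# report which media files disappeared between the quiet scan and the re-scan.
import os

MEDIA_EXT = {".jpg", ".jpeg", ".png", ".gif", ".mp4", ".avi", ".mkv", ".wmv", ".mov"}

def snapshot(root):
    """Map every readable file under root to its size in bytes."""
    files = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                files[path] = os.path.getsize(path)
            except OSError:
                pass  # unreadable or vanished mid-scan; skip it
    return files

def deleted_media(before, after):
    """Paths present in the first scan but missing from the second, media only."""
    gone = set(before) - set(after)
    return sorted(p for p in gone if os.path.splitext(p)[1].lower() in MEDIA_EXT)

# Usage: persist snapshot("C:/Users") somewhere before the memo leaks, re-scan
# a few days later, then review deleted_media(old_snap, new_snap) per employee.
```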

Michael Kohne
Binary Worrier
75

This is an obvious neural network task. First you need a large training set of images selected by experts in your company...

A more effective solution is to announce that you will be checking everyone's machine for porn NEXT week/month/whatever, then write a simple app that just exercises the disk. I guarantee that the machines will have been cleaned by then.


PS: a couple of 'serious' points - you actually don't want to find anything.

If you do find a couple of images in a browser cache then perhaps they hit a bad link or a dodgy popup - remember the teacher fired over whitehouse.com? If you fire/discipline them for this then there is going to be a backlash from workers/union. How would your company work if every click had to be submitted to legal for approval before your workers researched a question or checked a price online?

If you find a stack of porn on a machine how are you going to prove it was put there by that employee? Do you have the sort of security and audit systems that would stand up in court? Do you use (or even know of) an OS where a system admin couldn't put them there and make it look like the user's files?

Plus in my experience the most common locations for porn stashes are on the laptops of CxOs and senior VPs.

It's much better to just arrange for the files to vanish ahead of time.

Martin Beckett
  • +1 - although I would combine this with a %-based human check. Maybe randomly selecting 0.1% of the workstations for real inspection. – Drew Mar 03 '11 at 07:29
  • @nikie: Martin was setting up a joke about the training set. – Andrew Grimm Mar 03 '11 at 08:01
  • This is a very funny solution, and an accurate one :) – crosenblum Mar 03 '11 at 14:30
  • Wait. How did you know I put my porn stash on the CEO's laptop? – Jaap Mar 03 '11 at 17:39
  • I'd be surprised if people would _really_ delete all their porn when confronted with the prospect of their machines getting scanned. Programmers would probably do it, but IME other people are really, erm, "strange" about such things. However, you could combine this with Binary's idea and look closer at those machines where lots of MBs got deleted. Nevertheless, `+1` from me for the observation that you don't really want to find something. – sbi Mar 03 '11 at 19:21
  • Awesome answer. I wish I could accept more than one. – Scant Roger Mar 04 '11 at 03:59
8

This kind of monitoring is certainly painful for both employees and IT people. Once anything makes it onto an employee's machine, there is no sure way of detecting it; you need to stop it from entering the machine in the first place.
The best-known practice for this is controlling which sites/domains can be visited; such lists must be available somewhere on the net. Beyond that, you can also track the number of images and videos an employee has downloaded, and where they came from.
The material can also come from somewhere other than the web, such as an external hard drive. You could run a once-a-month random scan where you randomly pick some of the videos and images and check them manually. Fully automating the checking of images and videos, though, is certainly out of scope and would certainly be error-prone.
Actually, I am not very keen on the idea of restricting employees from doing personal stuff. You should trust your employees on this, and they should be busy enough in the office that they don't get time for it. The bigger worries are: is the employee not doing his/her work right? Or has s/he installed some cracked or hacked software?

Manoj R
  • I agree that developers - and other creative folks - shouldn't have machines that are locked down. However - and trust me when I say this - when you have 200+ employees processing workflow documents, you do not want to give those guys _anything_ that can distract them, including a browser. Yes, 90% of folks are hard-working and won't be distracted, but that means you'll have 20+ gobshites taking the piss and being unproductive. – Binary Worrier Mar 03 '11 at 08:04
  • Those 10% will be unproductive anyway. If not browsing websites, then playing games, reading, goofing off, sitting around being bored, etc. – jwenting Mar 03 '11 at 08:25
  • People either get their work done or they don't. They're easier to spot when you have 200 doing similar tasks that can be measured. – JeffO Mar 03 '11 at 09:50
  • In the US, there are legal issues involved with porn on company computers, and there are really serious legal issues involved with child porn. It's safest to have a no-porn policy and take steps to keep it off. – David Thornley Mar 03 '11 at 17:18
7

There are a number of products in the marketplace that perform "content filtering" of various forms. (A Google search on some obvious terms throws up some obvious candidates.) It is probably a better idea to use one of these products than building a lot of scanning / filtering software from scratch. Another option is to just watch at the borders; e.g. by monitoring external emails and web traffic. Again there are products that do this kind of thing.

While there is no doubt that it is ethical for a company to scan its computers for "bad stuff", this does not mean that there aren't issues.

First issue:

  • Determining what is and what is not "objectionable content" is subjective.
  • Software for detecting images and videos containing (let us say) "depictions of the naked body" is (AFAIK) likely to be unreliable, resulting in false positives and false negatives.

So ... this means that someone in your customer's organization needs to review the "hits". That costs money.

Second issue: There can be an innocent explanation. The file could have been downloaded by accident, or it could have been planted by a vindictive co-worker. If there is an innocent explanation, the customer's organization needs to be careful what they do / say. (OK this is not really your issue, but you might cop some of the backwash.)

Third issue: Notwithstanding that the company has a right to monitor for objectionable material, a lot of employees will find this distasteful. And if they go too far, it will impact employee morale. Some employees will "walk". Others may take protest action, e.g. by trying to create lots of false positives. (Again, not really your issue, but ...)

Fourth issue: People can hide objectionable material by encrypting it, by putting it on portable or removable media, etc. People can fake the metadata to make it look like someone else is responsible.

Stephen C
  • The OP said this was for liability issues, which makes a lot of sense in the US. That means getting the stuff off the computers, not necessarily blaming people. – David Thornley Mar 03 '11 at 17:22
  • I'd say it was more than that. Consider the tail end of the list of "not for work" content in the question. It sounds like someone has an "agenda" ... – Stephen C Mar 03 '11 at 22:18
  • @David: it's ALWAYS about blaming people. If you have a potential liability issue, finding someone to blame ("this person acted in violation of company policy, and we can prove it, so is personally responsible rather than we as a company") becomes the standard way of working. In fact it's what most people in positions of responsibility in a lot of companies spend a good part of their time doing, trying to find people to blame for whatever can possibly go wrong and ensuring noone can blame them for whatever trouble they happen to find themselves in. – jwenting Mar 04 '11 at 08:06
6

About legal aspects, in France:

The boss owns the computers and the internet connection: he can do whatever pleases him.

BUT, employee privacy cannot be violated. If a directory on the computer is labelled PERSONAL, the boss is not allowed to scan it.

The only way to bypass that is to gather evidence that the employee stores illegal material and to get a court to order a scan of the computer. (Note that pornography is not illegal in France.)

ChrisF
mouviciel
5

If the employees agreed that their work machine belongs to the company and is subject to search, then yes, this is legal. For proof, archival of the files would most likely be necessary.

As for how to actually find the material, you could:

  1. First and foremost, scan file names for a certain set of words (porn, lesbians, etc.)
  2. Scan text documents for the same set of words
  3. For images, you could find the average color of the image; if that color happens to be within a range that most would refer to as 'flesh' colored, flag the image for human review (you wouldn't want to report someone for an image that turns out to be a family photo from the beach). A rough sketch follows this list.
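
A rough sketch of that flesh-color idea, assuming Pillow is available. The YCbCr thresholds and the 40% cutoff are rough guesses taken from common skin-detection write-ups, not tuned values, so expect plenty of false positives:

```python
# Hypothetical sketch: flag images with a high fraction of "skin-colored"
# pixels. Thresholds are illustrative; a human must review anything flagged.
from PIL import Image

CB_RANGE = (77, 127)   # commonly cited YCbCr chroma range for skin tones
CR_RANGE = (133, 173)

def skin_fraction(path):
    """Fraction of pixels whose chroma falls inside the skin-tone box."""
    img = Image.open(path).convert("YCbCr")
    pixels = list(img.getdata())
    skin = sum(
        1 for _, cb, cr in pixels
        if CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]
    )
    return skin / len(pixels)

def flag_image(path, threshold=0.4):
    """True if the image looks 'fleshy' enough to queue for human review."""
    return skin_fraction(path) > threshold
```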

If you scan the files as they're entering the computer (e.g. have the program loaded on every work machine and log flagged cases to a central database), then I don't think it would be too obtrusive (other than the blatant distrust the employer clearly has for their employees).

With the video files, I'm not 100% sure. Possibly a similar approach as with the image scanning (choose random frames and scan for a certain level of 'flesh' color).
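
A hedged sketch of the frame-sampling part, assuming ffmpeg is installed and on the PATH (periodic rather than truly random frames, for simplicity); each extracted frame could then be run through a skin check like the one above:

```python
# Illustrative only: pull roughly one frame every ten seconds from a video
# so the image-scanning logic can be reused on the extracted stills.
import glob
import os
import subprocess
import tempfile

def sample_frames(video_path, fps=0.1):
    """Extract about one frame per 1/fps seconds into a temp directory."""
    outdir = tempfile.mkdtemp()
    subprocess.run(
        ["ffmpeg", "-loglevel", "error", "-i", video_path,
         "-vf", f"fps={fps}", os.path.join(outdir, "frame_%04d.png")],
        check=True,
    )
    return sorted(glob.glob(os.path.join(outdir, "frame_*.png")))
```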

Scanning audio files seems like it would get into speech recognition, which is a whole 'nother can of worms. Scanning the file name, however, would be easy and could be done as with the documents, images, and video.

Ryan
  • Yeah, I was thinking along the same lines. Flesh tones are hard with all the varieties. Not to mention a big old shot of someone's head (like my gravatar) is likely to trigger the flesh to non-flesh ratio warning. Awesome start, though. – Scant Roger Mar 03 '11 at 06:13
  • too much risk of false positives (depending in part on the business involved). – jwenting Mar 03 '11 at 08:23
  • There is a color space where most human skin tones fall in a given range. YCbCr if I recall correctly. Chop the image into blocks and if in most blocks the mean pixel value falls in the range, flag it as a "skin" photo. – Vitor Py Mar 03 '11 at 11:55
  • There's another problem. The person who is tasked with checking the video might sue you. I certainly would not want to be doing that. (Not all porn is to all people's taste.) – Christopher Mahan Mar 04 '11 at 05:13
  • The [Green Dam](http://en.wikipedia.org/wiki/Green_Dam_Youth_Escort) (yes, from the big brother government in the east) is purported to use OpenCV for its face detection capabilities. This will still generate a lot of false-positives, even when combined with skin tone detection. – rwong Mar 04 '11 at 08:22
4

There is significant, recent research into detection of pornography using conventional classification methods. Examples are available here, and here.

Nishant
4

As @Ryan said, image analysis can focus on color analysis.

Feasibility? My sister works in an area of the govt where they get some form of audit every year, and once it was for porn. She (a geophysicist) had several false positives (pink rocks).

Rick Berge
3
  • Is this ethical?

Depends on the implementation and reasonable expectations of the employees. For example, if your software scans any machine connected to the network, then there's an additional requirement that infra needs to prevent unauthorized machines from plugging in. (Maybe that should be obvious, but it's frequently overlooked on networks I've seen.)

  • Is it feasible? I've done a lot of image processing/indexing but this seems like a whole new world of complexity.

Is it feasible to drug test every employee? Maybe so, but I question its worth. I would randomize it. Let employees know their machines may be scanned for inappropriate content at any time.

  • Any references to successful techniques for discovering porn?

I'm not touching this one. I don't think I could keep my sense of humor in check. But watch out for The Scunthorpe Problem when searching text.
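
To illustrate the problem, though, a tiny sketch: naive substring matching flags innocent place names, while word-boundary regexes avoid at least that class of false positive (the word list is obviously illustrative):

```python
# The Scunthorpe Problem in two functions: substring matching vs. word
# boundaries. Neither is sufficient for real screening on its own.
import re

BAD_WORDS = ["sex"]  # illustrative, not a real screening list

def naive_hit(text):
    return any(w in text.lower() for w in BAD_WORDS)

def boundary_hit(text):
    return any(re.search(r"\b" + re.escape(w) + r"\b", text.lower())
               for w in BAD_WORDS)

print(naive_hit("University of Sussex"))     # True: a false positive
print(boundary_hit("University of Sussex"))  # False
```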

  • Is it appropriate for me to archive the results when something is discovered?

This one concerns me the most, and I would ask a lawyer. I suspect if you find illegal content you may technically be legally obliged to disclose it. That's bad, particularly if the user was exposed by no real fault of his own. You(r client) will need real legal advice on how to handle this. Get HR and the lawyers involved.

kojiro
2

From a purely technical standpoint: This sounds like an object category recognition problem. I've never done anything like that, but from what I've read, state of the art category recognition systems work like this:

  • First you search for a large number of interest points (e.g. using a Harris Corner Detector, extremal points of LoG/DoG filters in scale space; some authors even suggest picking random points)
  • Then you apply a feature transform to each point (something like SIFT, SURF, GLOH or many others)
  • Combine all the features you found into a histogram (Bag-Of-Features)
  • Use standard machine learning algorithms (like support vector machines) to learn the distinction between object categories using a large number of training images. A rough sketch of this pipeline follows.
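
A hedged sketch of that pipeline, assuming opencv-python >= 4.4 (for SIFT_create) and scikit-learn; the vocabulary size and every other parameter here are placeholder choices, and labels for the final SVM step would have to come from a labeled training set:

```python
# Illustrative bag-of-features sketch: SIFT descriptors -> k-means visual
# vocabulary -> per-image word histograms -> (then) an SVM classifier.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def extract_descriptors(paths):
    """SIFT interest points + feature transform for each image."""
    sift = cv2.SIFT_create()
    per_image = []
    for p in paths:
        gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    return per_image

def bag_of_features(per_image, k=200):
    """Cluster all descriptors into k visual words; histogram each image."""
    vocab = KMeans(n_clusters=k).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = vocab.predict(desc) if len(desc) else []
        hist, _ = np.histogram(words, bins=range(k + 1))
        hists.append(hist / max(hist.sum(), 1))  # normalized word histogram
    return np.array(hists), vocab

# Training step: clf = SVC().fit(hists, labels)  # labels from a labeled set
```
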
nikie
2

Everyone's computer operating system was probably installed from a disk image.

  1. Start with the disk image and get a list of files you probably don't need to scan (see the sketch after this list).
  2. Get a list of all the other files on each PC.
  3. Pull the actual files from 10-20 random machines and use them as a test bed.
  4. Search for items from a dictionary of profanity and questionable words (hotties, jugs, 'barely legal', joke, etc.).
  5. View the video - should anyone have any video at all?
  6. View the photos.
  7. Any video or image files that are questionable can be used to search the other machines.
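
For steps 1 and 2, a minimal sketch assuming you can mount the stock disk image and reach each PC's filesystem; using SHA-256 content hashes (my assumption, not something specified above) means even renamed copies of stock files are still excluded:

```python
# Illustrative only: hash everything in the stock disk image, then list the
# files on a PC whose hashes are NOT in that baseline (i.e. user-added files).
import hashlib
import os

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_tree(root):
    """Set of content hashes for every readable file under root."""
    hashes = set()
    for dirpath, _, names in os.walk(root):
        for name in names:
            try:
                hashes.add(sha256_of(os.path.join(dirpath, name)))
            except OSError:
                pass  # unreadable file; skip it
    return hashes

def user_added_files(pc_root, baseline):
    """Files on the PC that did not ship with the standard image."""
    extras = []
    for dirpath, _, names in os.walk(pc_root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                if sha256_of(path) not in baseline:
                    extras.append(path)
            except OSError:
                pass
    return extras

# baseline = hash_tree("/mnt/standard_image")
# suspects = user_added_files("/mnt/pc42", baseline)
```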

It will only take one or two employees getting caught before no one puts anything on their work computer.

Charge an obscene amount of money for this service. I'll bet Zappos would never do this to their employees.

JeffO
2

Assuming you are a domain admin on the network:

  1. C$ into each user's desktop machine.
  2. Copy porn files into personal private share.
  3. Delete from original location.
  4. Make popcorn.
  5. Complete detailed analysis of all "evidence".
Anonymous Type
1

I just wanted to comment, but only have 1 rep, so I can't.

In the case of Gravatar, you could add a function that filters out a list of known-clean sites from internet cache locations, i.e. Gravatar and other sites you don't want false positives from. You could also filter out things like the desktop wallpaper; if someone is displaying porn as their wallpaper, you'd think people would notice without your audit.

a2j
1

Such things never work reliably. You can use a blocklist to block domains by name or by their inclusion on some published list (a common practice). But those lists are never complete, and blocking on name criteria can lead to many false positives.

You can block on words appearing in the text of sites, but again this can lead to false positives (and gets very slow as you need to parse every single bit of data that passes through your network in order to detect "naughty bits").

You can block pictures (and maybe sites containing them) that show more than a certain percentage of skin tones. But again, this leads to many false positives; a university medical department blocking a medical encyclopedia with images of limbs and torsos showing wounds and skin conditions is a well-known example. And of course it'd be racist, as it'd only block certain skin tones: if you block colours matching Caucasian skin, there's always porn using black actors, for example.

Best to just trust your employees, and have policies in place for when that trust is broken.

jwenting
  • personal proxy server, encrypted hidden partition, virtual machines. There's always a way to hide stuff. Of course, there's the android smartphones with 3g. Last I checked there's no way for an employer to stop their employee watching anything they want on their own phone with their own bandwidth. – Christopher Mahan Mar 04 '11 at 05:17
  • that's why technical means are useless, certainly without policy. If people know what's allowed and what isn't (and I can't think of an educated person who'd use a work machine for porn, even without such policies, but that's another matter) most will adhere to that. Those that don't will sooner or later get found out whether there's technical means in place or not (most likely someone will see something on their screen they weren't supposed to see while walking past). – jwenting Mar 04 '11 at 08:09
1

I don't know; there has to be a middle answer that isn't as invasive but solves the real issue: LIABILITY.

Have them sign a waiver that releases the company from any liability for non-work-related illegal material found on work PCs.

crosenblum
  • I don't think the waiver would work in the US. I don't know about other countries. – David Thornley Mar 03 '11 at 17:21
  • Why wouldn't it work? If users have the ability to download content and install software, then they naturally assume the liability for it. – crosenblum Mar 03 '11 at 17:47
  • And, if the company has the ability to filter out porn, which is generally assumed, and doesn't, it's partly the company's liability. – David Thornley Mar 03 '11 at 21:39
  • not everywhere. In some countries the company is responsible for whatever happens with any equipment they own, whether the employee was using it for its intended purpose or not. This applies to some extent to the US as well, in fact there've been attempts to sue companies for illegal use of their products after those products had been sold legally (see for example the constant lawsuits against firearms manufacturers for liability when their products are used in crimes, lawsuits that luckily usually get thrown out but sadly not always). – jwenting Mar 04 '11 at 08:12
1
  1. Tell the user a URL is considered adult; the Blue Coat proxy does that.
  2. License the thing Google does in their image search: http://code.google.com/apis/safebrowsing/ http://www.google.com/search?q=google+image+recognition+api
  3. Scan the computer for items not in a pre-agreed list.
mplungjan
1

Image and content analysis that can determine the differences between a tasteful photograph of a person, a swimsuit photograph, a nude photograph, and a depiction of pornography is, as far as I know, nowhere near sophisticated enough to do in software alone.

Fortunately, crowdsourcing should be useful here, as @ammoQ suggested in a comment. However, I don't believe members of 4chan or any other forum would appreciate having vast numbers of non-pornographic images posted, such as generic web graphics for buttons, frames, advertisements, etc.

My recommendation would be to look into existing crowdsourcing solutions, such as Amazon Mechanical Turk. (However the terms of service may explicitly prohibit the involvement of pornographic content, so be advised you might have to find another solution or roll your own.)

To make crowdsourcing feasible, your software should be prepared to do some or all of the following:

  • Store information that links the content with the computer it came from
  • Identify exact duplicates across the entire inventory and remove them, retaining the origin information (see the sketch after this list)
  • Downsample images to some dimension, perhaps 320x200, which is sufficient to identify the content of the image without retaining unnecessary detail and wasting storage space/bandwidth
  • Create still images of video content at some regular interval and apply the same downsampling rule
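
A minimal sketch of the deduplication and downsampling bullets, assuming Pillow; the hash choice, the 320x200 bound, and JPEG output are illustrative assumptions:

```python
# Illustrative sketch: skip exact duplicates but keep origin info, then store
# a thumbnail no larger than 320x200 for the human review queue.
import hashlib
import os
from PIL import Image

seen = {}  # content hash -> list of machines the content was found on

def ingest(path, machine_id, outdir):
    """Record origin; store a downsampled copy only for first-seen content."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    seen.setdefault(digest, []).append(machine_id)  # origin info is retained
    if len(seen[digest]) > 1:
        return None  # exact duplicate; already queued for review
    img = Image.open(path).convert("RGB")
    img.thumbnail((320, 200))  # shrink in place, preserving aspect ratio
    out = os.path.join(outdir, digest + ".jpg")
    img.save(out)
    return out
```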

Finally, the database of reduced images that represent the original image and video content is checked by users (or a designated team if you have the resources) according to your company's code of conduct. The program or interface might show a single image at a time, or a screen of thumbnails--whatever you deem best to obtain accurate information.

The identity of the computer from which images came should absolutely be secret and unknown to the persons evaluating the data. Additionally it should be randomized and each image probably checked more than once to remove bias.

The same technique could be used for text, but first the content could be scored by keyword rankings which remove the bulk of text from crowdsource review. Classifying a long document will of course be more time consuming than classifying an image.

JYelton