3

The type of data I'm hoping to collect is a little specific, and unfortunately I'm under NDA and the data is a core part of the business plan, so I'm not at liberty to post it online. I've come up with a similar example, so please bear with me and pay no attention to the flaws of this hypothetical service.

Say I make an online "school finder" service, where the user enters their address and the service finds and scrapes the tuition cost of all schools within, say, 100km of that address. The service would display the list of schools sorted by price, and the user would select the school they are going to attend based on that information. They would also specify any other contributing factors, such as the aesthetics of the campus and the reputation of the school (these would be selected from a drop-down menu or similar, a small set of non-personally-identifiable options). The school finder would passively collect this data (as opposed to actively seeking out every school everywhere and scraping its qualities from its public web page) and use it both to provide better search results and to display a map centered on the user's region, representing that data somehow. It would point out, based on past usage, the most desired schools according to a number of factors (say, cheapest and most desired would be green, most desired for other reasons would be yellow, and seldom selected schools would be red).

Most of the data is just passively collected and is otherwise publicly available, but some of it (the reason the school was picked) is somewhat personal, as well as the user's home address being personal information. The only publicly visible data would be the "rating" of each school, based on a selected region. The only saved data would be the rating and cost of each school, as well as the postal code (or city, whatever) associated with that rating.

The service has no user accounts, so it's not feasible to ask every user to agree to a privacy policy for every request. Is it legal/ethical to collect and release this set of information without explicit consent?

Carson Myers
  • There's no point asking whether something's legal unless you say which jurisdictions might apply. Your profile says you're in Canada, but for all we know you're working for a French company which will host its servers in India, so there could be three or more sets of laws to consider. – Peter Taylor May 24 '11 at 10:32
  • Legal depends on your jurisdiction. Ethical depends on your ethics. Both of these are wildly variable so you will be unlikely to get a useful answer to this question. –  May 24 '11 at 12:49
  • Good point -- I was hoping that if it was legal/illegal in _most_ places, I could gauge the amount of caution I need when opening the service to new areas, or opening it at all. Mostly I was hoping for comments about the relatively small amount of information I would store from users about each region, namely their choice from a small collection of options which is tied to the city they live in, rather than them – Carson Myers May 25 '11 at 20:42
  • I'm voting to close this question as off-topic because it is a question of legality (which we, as programmers, are not qualified to answer) or ethics (which are even harder to nail down than business requirements from a client on vacation). –  Oct 17 '15 at 03:14

6 Answers

4

Is it legal/ethical to collect and release this set of information without explicit consent?

Check with a lawyer. That's what they are there for. However, if you want a hint, most web sites display a privacy policy that describes how user data is handled. Certain laws might require you to disclose such a policy. Furthermore, having a policy does not imply that you are not violating any law; that's why you should check with a lawyer.

As far as implementation of policies is concerned, it doesn't matter whether you find it difficult to implement or not. You can't just wish away certain provisions of a law because you don't like them. I suppose a lawyer would say the same thing, so the following point is moot.

The service has no user accounts, so it's not feasible to ask every user to agree to a privacy policy for every request.

TLDR: Check with your lawyer. If your lawyer says so, do so without hesitation.

waiwai933
Vineet Reynolds
  • Okay, I can see that check with a lawyer is going to come in from all directions. Thanks for the advice, I will make sure everything checks out before aggregating any data. – Carson Myers May 24 '11 at 06:05
  • Knowing lawyers, they will say that you should ask the user to agree. That's because they primarily look at the potential costs of not asking, and only secondarily at the direct costs of asking. – MSalters May 24 '11 at 11:52
  • Given that submitting their choice of school, in this case, comes after their request for schools nearby, would it be sufficient to put a small disclaimer next to the "submit" button when I ask them to take their pick? – Carson Myers May 25 '11 at 20:43
  • @Carson, I'm not sure what information you would be storing about users, so I can't offer advice on whether disclaimers are sufficient. I suppose you would have heard of [California SB 1386](http://en.wikipedia.org/wiki/SB_1386) and the [EU 95/46/EC directive](http://en.wikipedia.org/wiki/Data_Protection_Directive), but if you haven't, now would be a good time to read them. Put that in conjunction with Peter Taylor's comment against your question, and you would know how difficult it would be to comply with these directives based on a mere internet discussion. – Vineet Reynolds May 26 '11 at 04:10
4

Legal issues aside, I'd like to consider the ethical standpoint, because even if something's legal you might still consider it unwise.

Usually privacy is considered to be breached if details are traceable to one individual (or a member of a certain family). And this certainly is the case if you use and save full addresses. But since you are searching quite a large area (100km), perhaps it is ok to have a precision of less than the full address, for example the neighborhood. One way of doing this is allowing people to enter just their city or town, or over here a partial or full zip, or a street name without number.

A different approach is using a lat/lon calculation that rounds precision to 1km (and perhaps give people the option to extend this to 10, noting that it is more anonymous but less accurate.)
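The rounding idea can be sketched roughly as follows. This is only an illustration, not a vetted anonymization scheme: the function name `coarsen`, the constant `KM_PER_DEG_LAT`, and the choice to snap to the centre of a grid cell are all assumptions made for the example, and the conversion from kilometres to degrees is approximate.

```python
import math

# Approximate length of one degree of latitude in km (an assumption of this sketch).
KM_PER_DEG_LAT = 111.32


def coarsen(lat, lon, cell_km=1.0):
    """Snap a coordinate to the centre of a grid cell roughly cell_km wide.

    Hypothetical helper: every address inside the same cell maps to the
    same stored point, so the exact address is never kept.
    """
    lat_step = cell_km / KM_PER_DEG_LAT
    snapped_lat = (math.floor(lat / lat_step) + 0.5) * lat_step

    # A degree of longitude shrinks with cos(latitude); the clamp avoids
    # a division by zero very close to the poles.
    km_per_deg_lon = KM_PER_DEG_LAT * max(math.cos(math.radians(snapped_lat)), 1e-6)
    lon_step = cell_km / km_per_deg_lon
    snapped_lon = (math.floor(lon / lon_step) + 0.5) * lon_step

    return round(snapped_lat, 6), round(snapped_lon, 6)


# Default 1km cell, and the coarser 10km option a user could choose.
fine = coarsen(49.2827, -123.1207)
coarse = coarsen(49.2827, -123.1207, cell_km=10.0)
```

Storing only the snapped point (or discarding the original right after the distance calculation) means the service never holds the full address. A larger `cell_km` gives more anonymity at the cost of less accurate distances.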

For the reasons, you might opt for a set of standard items (this will also probably simplify matching) and include an extra text field for additional information. The standard items can be used as anonymous info. For a text field there are several options; just explain next to the field what you are going to do with it: list it publicly but anonymously (and say where you list it, with the area or with the school), use it to extend your set of standard items, use it for private research (like going over the entries yourselves and distilling useful information out of them), etc.

For the users this is very simple: they can choose their privacy options by entering or not entering certain information, or by selecting their level very specifically.

As for all the legal implications I wouldn't know, but I personally consider an application that would implement those options very ethically sound, and I believe it is possible without really compromising on accuracy.

Inca
  • +1 for a very good definition of a privacy breach, and also for customizable levels of anonymity. I've used similar services without hesitation in the past if they only require a zip code to search for distance. If they require a street address I will usually give it a pass. – Karl Bielefeldt May 24 '11 at 15:03
  • Of course the lawyer is required, but I would think that collecting "blurred out" data should work (like for photos). And a bonus point for the lat/lon: this way lies geo-hashing, and it's very efficient for neighborhood requests :) – Matthieu M. May 24 '11 at 17:57
  • Right, I just re-read my question and failed to mention that they wouldn't be entering anything personally identifiable. Only their choice from a small collection of options would be stored, and it would be tied to their city rather than their address (the address would only be used, for example, to show the user the distance from their house to the school, when their search results come back). So in your opinion, this is ethically sound? – Carson Myers May 25 '11 at 20:45
1

If you collect and release any aggregate of information, you will need to check with a lawyer to confirm that what you're doing with the data is OK; certain types of data are protected no matter what, and the laws will vary from state to state and nation to nation.

You will also want to put a relatively easy-to-reach and easy-to-read privacy policy where a user can read it, with the links at the points where you collect the data, even if you don't force an agreement each time.

But in short, vet your plans with an attorney, make sure you're in compliance with the laws where the data is hosted and processed as well as, as best you can, with the laws where the users are located. Make sure you publish a procedure whereby a user can get his data removed from your collection, if he can identify it at all.

Finally, don't be a weasel with mined data, because as Google and Facebook can both attest, even the rumor of data misuse can bite you back.

Rob Perkins
  • Ah, I figured as much for the data that isn't freely available (by factoring in the users' patterns and preferences) -- is the same true for passively collecting data, for example, when a user requests all schools in a 100km radius, we can store the public information from those schools even though the request was made with (but not stored with) the user's location? – Carson Myers May 24 '11 at 05:46
  • I am not a lawyer, but I suppose it hinges on whether you CAN use the data to identify an individual, and the RISK involved in being the person with that power. I don't personally imagine that there is any risk in copying publicly available information, but there are differences of opinion about that. – Rob Perkins May 26 '11 at 18:04
1

I'll volunteer the contrarian view here... Don't bother checking with a lawyer unless you've money to waste for shallow advice from someone who will charge you an arm and a leg.

If you're thinking you might end up doing some unethical stuff by collecting anonymized information, I cannot help but laugh at what you must be thinking about Google or Facebook.

Online advertisement businesses have successfully lobbied digital privacy rights into irrelevance on the west side of the Atlantic. The assumption in online interaction is opt-out, not opt-in. And even opt-out, to a large extent, is not enforced, because it is unenforceable.

I've no Google account to speak of, nor have I ever allowed the latter to collect any kind of information on me when logged out (which is always). But they do so extremely actively and aggressively on a daily basis. They do so through their own site when I search, through Google Analytics on untold numbers of sites, through my contacts' gmail accounts, and many more. To serve me ads, no less.

So, please. If they can do it, common sense dictates that you can too. Add a brief notice in your site's legal section that says what you're collecting, how and why, make it clear that it's anonymous information, and be done with it.

Denis de Bernardy
  • I don't like this approach. Google et al likely consulted several lawyers regarding their practices, and have had several lawsuits against them regardless. Additionally, I used an example to illustrate the magnitude of the data that I would be collecting, hoping to get an answer based on the small amount of information I'd collect from each user. So far, the other answers have said to take the safe route, while this one isn't too convincing to the contrary. I'm not downvoting though, because your point about legal expenses is definitely valid. – Carson Myers May 24 '11 at 08:11
  • Safe tends to be very expensive. Especially when lawyers are involved. :-) – Denis de Bernardy May 24 '11 at 08:17
  • yes I agree, but unsafe tends to be more expensive, especially when lawyers are involved :P – Carson Myers May 24 '11 at 08:18
  • Very true as well, lol. Ever heard this lawyer joke? Guy calls up his lawyer: "I'd like to speak to Mr XYZ." Secretary answers: "Sorry Sir, he died last week." "Oh!" Hangs up. Next day, exact same thing. And the next. And another. The secretary eventually asks why he bothers calling every day. The guy answers: "I just enjoy hearing you tell me he's dead." :D – Denis de Bernardy May 24 '11 at 08:24
  • hahaha, good one. I'll make sure to bring that up the next time one of my friends says they want to be a lawyer for the good pay :P – Carson Myers May 24 '11 at 08:25
1

Legal? Ask a lawyer.

Ethical? I'd say no, except where it is vital for the operation of your service that you collect this data, and you do everything you can to keep the data anonymous and confidential.

In fact, pretty much everyone who runs a web server does it, and not even anonymously - but it is generally accepted that access logs are more or less a necessary part of normal server operation. Ethics mandate you keep them stored in a safe way and handle them confidentially, and destroy them after a reasonable amount of time.

For things that go beyond standard web procedure, such as storing personal data and linking it to various other things, I'd say ethics mandate that you tell your users exactly what you are collecting and why, and you give them a chance to opt out before you do (even if this means they can't use the service).

tdammers
  • The data is vital to the service, so much so that I wasn't even allowed to talk about it under NDA -- the data _is_ the service. FWIW, it's not the kind of personal data that would appear in an access log, the only part of the user that is re-displayed to other users is their choice from a discrete set of options related to one school that they chose to attend. That selection would appear to other users when they search for schools in the same or a nearby city as the original user, as it will affect the way they are sorted in the results (or displayed on a map). – Carson Myers May 25 '11 at 20:51
  • The key question is: is collecting this information from user X required to provide the service to the same user X? If it isn't (and even in most cases where it is), the ethical thing to do is ask permission. – tdammers May 26 '11 at 19:30
1

Can't comment on the legal issue, that way lies madness and largely depends on where you are and how much your net worth is.

Ethically, if you can protect the privacy of the users, I don't see a problem, but doing that is harder than it looks. Remember not to release any data that doesn't have a large enough and diverse enough pool to obscure the individual. If it's just a rating, meh. But with the cost and the region in there, if there's only one person from Ohio in the theoretical college, then you'll know exactly what he paid. If 99 people give a good vote and just one gives a bad one, sometimes it'll be obvious to the people in the know who that one person is. Also, if you never release the cost of the school to a user, why store it at all?
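One way to act on the "large enough pool" point is to hold back any group with too few submissions. The sketch below is a hedged illustration: the tuple layout, the function name `releasable_summaries`, and the threshold `MIN_POOL = 10` are all hypothetical, and it only checks pool size, not the diversity problem (the 99-to-1 case) described above.

```python
from collections import Counter, defaultdict

# Hypothetical threshold; pick a real value with legal and privacy advice.
MIN_POOL = 10


def releasable_summaries(submissions, min_pool=MIN_POOL):
    """Group (region, school, reason) tuples and keep only large groups.

    Returns {(region, school): Counter of reasons} for groups that have
    at least min_pool submissions; smaller groups are withheld.
    """
    groups = defaultdict(Counter)
    for region, school, reason in submissions:
        groups[(region, school)][reason] += 1

    return {
        key: counts
        for key, counts in groups.items()
        if sum(counts.values()) >= min_pool
    }


# Twelve submissions for one school, a single submission for another.
sample = [("Ohio", "School A", "cheap")] * 12 + [("Ohio", "School B", "cheap")]
public = releasable_summaries(sample)
```

Here only the `("Ohio", "School A")` group would be published; the lone `School B` submission stays private until more users from that region pick the same school.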

As a cop out, just tell them in big bold letters that this information will be public and the burden is theirs.

Technically though, if there's no account system and anyone can submit whatever they feel like, you can expect this system to be abused. Say, by a theoretical school giving themselves AAA ratings, or a disgruntled ex-professor down-voting his old school each day.

Philip
  • The cost stored wouldn't be associated with any region, as it doesn't change depending on where the user is. But say the user clicks on their state or city on a map and some green, yellow, and red points show up within a 100km (or so, it could be configurable) radius that just represents whether other users said "I picked this because it was cheap and close", or "I picked this because it's cheap but getting there is a pain," etc. The only personalized part would be an amalgamation of the cost and what they picked from a drop down, represented as a single color. – Carson Myers May 25 '11 at 20:33
  • I'm really asking because this sort of thing feels like a grey area, it's not like "read reviews of this school," or "this school got 4 out of 5 stars," it's so not-personal that I'm really not sure if it classifies as personal information. Also, regarding abuse of the system: it's just an example, as I wasn't at liberty to ask about the real data we're collecting, but it really amounts to "what you clicked out of a small, discrete set of not-really-personal items that is only tied to your general region." You have a good point about not releasing the data until you have a good amount, though. – Carson Myers May 25 '11 at 20:36