
I have an open-source library that provides multiple services around a particular domain. I would like to know how the various components are being used so that I can prioritize updates and new features.

The usual way to collect that data in applications is anonymous stats collection. But as a user of libraries in my other work, I don't think I'd appreciate that kind of data collection. There is also a whole rat's nest of issues (legal and otherwise) that I could run into when retrieving the data. See this and this for reasons not to implement anonymous data tracking.

The only other option I can think of is creating an online poll and linking to it in the documentation, but I doubt that would be very effective because it takes the user time to complete.

Are there any other ways to collect anonymous usage stats for my library that I'm not considering?

gregsdennis
  • If you get data **only** about your library that cannot be tracked in any way to the caller of your library I think you're fine, it's really useful information. – Ignacio Soler Garcia Jun 15 '18 at 09:01
  • Never take the opinion of a "FOSS Advocate" to in any way reflect the views of the majority of OSS users. They are 100% tin hat brigade. If you want to collect anonymous data, just: (1) make it clear that the system can collect that data if the user is happy for it to do so, (2) make it opt-in (never collect without permission and assume you do not have permission until explicitly told you do) and (3) give the users access to the data collected so they can see what you are collecting is what you say you are. – David Arno Jun 15 '18 at 09:19
  • Well, collecting usage data is really frowned upon. And you will have to conform to various privacy laws around the world, most notably the GDPR. That's not impossible (especially if you only collect truly anonymous data), but takes a lot of work. There are easier ways to solve your actual problem (prioritizing your work): Ask users to vote on your issue tracker. Take a look at questions in your support channels. As an open source maintainer, I'm well aware which issues have priority just from user feedback, though I'd love to know which features are literally unused and can be removed. – amon Jun 15 '18 at 10:19
  • @amon, I already use the issue tracker. It's good for questions/bugs/feature requests, but I don't get a sense for how the library is actually being used. – gregsdennis Jun 15 '18 at 10:56
  • @gregsdennis: is there a support forum for your lib where users can ask questions freely? That is the best source of information you can install. And by the way, this has nothing to do with open-source, there is absolutely no difference to closed source software products. – Doc Brown Jun 16 '18 at 11:55

1 Answer


There are several ways to collect statistics automatically, but the problem is how to get that information back. If you choose to collect statistics automatically, then I recommend the following:

  • Make statistics gathering easy to turn on or off.
  • Allow the user to control where the statistics are stored.
  • Provide the tools for users to inspect and make use of that data themselves. Chances are that people who use your library are just as interested in how much it is used as you are. This keeps you in the open source mentality.
  • Make the submission of that information voluntary, or part of your bug reporting.
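
As a rough illustration of the recommendations above, here is a minimal Python sketch. The `UsageStats` class, its method names, and the feature names are all hypothetical and not taken from any real library. It assumes statistics are simple per-feature counters, that gathering is off unless the user enables it, and that data is only ever written to a file the user chooses.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


class UsageStats:
    """Hypothetical opt-in usage counter that only writes to a local file."""

    def __init__(self, enabled=False, path=None):
        # Off by default: nothing is recorded until the user opts in.
        self.enabled = enabled
        # The user decides where the data lives; nothing is sent anywhere.
        self.path = Path(path) if path is not None else None
        self.counts = Counter()

    def record(self, feature):
        """Count one use of a named feature; does nothing when disabled."""
        if self.enabled:
            self.counts[feature] += 1

    def report(self):
        """Return (feature, count) pairs ordered from most to least used."""
        return self.counts.most_common()

    def save(self):
        """Write the counts as readable JSON so users can inspect them."""
        if not self.enabled or self.path is None:
            return False
        text = json.dumps(dict(self.counts), indent=2, sort_keys=True)
        self.path.write_text(text)
        return True


if __name__ == "__main__":
    # Demo: the user opts in and picks the storage location.
    target = Path(tempfile.mkdtemp()) / "usage.json"
    stats = UsageStats(enabled=True, path=target)
    stats.record("parse")
    stats.record("parse")
    stats.record("serialize")
    stats.save()
    print(stats.report())
    print(target.read_text())
```

Because the output is a plain JSON file on the user's machine, the user can read it, use it for their own purposes, and decide whether to attach it to a bug report.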

Things that will severely limit who can use your library are:

  • Automatic transmission to an undisclosed server
  • Assumption that the library will even be used on a network that can connect to the internet

Security audits look for things like that, and if your library is considered a risk, you gain a very bad reputation that is hard to shake.


All that said, the most reliable way to determine whether you have users who depend on a particular feature is to threaten to remove it. It won't make you popular, but the silent users will speak up. If no one says anything, the feature is probably safe to remove. If they do, then you have the option to start a dialog to see what the real needs are and whether there is a better way to meet them.
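
One way to "threaten" removal in code is a deprecation warning that tells callers where to object. The sketch below is only an illustration: the `planned_removal` decorator, the `legacy_export` function, the version number, and the feedback URL are all made up. It relies on Python's standard `warnings` module.

```python
import functools
import warnings


def planned_removal(version, feedback_url):
    """Mark a function as scheduled for removal and say where to object."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is scheduled for removal in {version}. "
                f"If you rely on it, please tell us at {feedback_url}.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical feature that the maintainer suspects nobody uses.
@planned_removal(version="3.0", feedback_url="https://example.invalid/issues")
def legacy_export(data):
    return list(data)
```

The function keeps working, so nothing breaks for existing users, but anyone who still calls it is prompted to speak up before the removal happens. Note that Python hides `DeprecationWarning` by default in many contexts, so the planned removal should also be announced in release notes.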

Berin Loritsch