I have an idea for a client-server software project which involves using information from Wikipedia, but I am not sure what the licensing and copyright conditions for using it would be in my case.

My planned architecture: On the server side there is a script which fetches articles from a Wikipedia database dump, extracts certain pieces of data (numbers, names, dates, etc.) from them and writes those into a database. The main server application reads data from that database, does some processing on it and includes it as part of the service it provides to the client application (e.g. an Android app).
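To make the extraction step concrete, here is a minimal sketch of what such a script could look like. Everything in it is an assumption for illustration: the `SAMPLE_DUMP` string is a heavily simplified stand-in for a real dump (real dumps use an XML namespace and are far too large to load as one string), the `facts` table and its columns are hypothetical, and four-digit years stand in for "pieces of data".

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Wikipedia dump file.
SAMPLE_DUMP = """
<mediawiki>
  <page>
    <title>Ada Lovelace</title>
    <revision><text>Ada Lovelace was born in 1815 and died in 1852.</text></revision>
  </page>
  <page>
    <title>Alan Turing</title>
    <revision><text>Alan Turing was born in 1912.</text></revision>
  </page>
</mediawiki>
"""

# Matches four-digit years from 1000 to 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def extract_facts(dump_xml):
    """Yield (article_title, year) pairs found in each page's text."""
    root = ET.fromstring(dump_xml)
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        for match in YEAR.finditer(text):
            yield title, int(match.group(1))


def build_database(dump_xml):
    """Write the extracted facts into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (source_article TEXT NOT NULL, year INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?)", extract_facts(dump_xml))
    conn.commit()
    return conn


conn = build_database(SAMPLE_DUMP)
rows = conn.execute(
    "SELECT source_article, year FROM facts ORDER BY year"
).fetchall()
```

In this sketch each row keeps the title of the article it came from in a single `source_article` column, which is one possible way to record where a value originated.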

My main question now is: Am I allowed to do this and if yes, under which conditions?

I have read through the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License, which apply to Wikipedia articles. It is clear to me what the conditions are when using articles in textual form, but in my case I only have small chunks of information taken from the text. Do the same restrictions still apply and if yes, how would I comply with them? One of the restrictions is, for example, that if I modify the content of an article, I need to apply the same license to the result. Would that mean I need to put my database under the license, or the individual pieces of data within it? And how would attributing the source or the author work in this scenario? Storing a reference to the containing Wikipedia article for every single piece of data would not really be feasible for my current project concept. Furthermore, what would it mean for the main application and the service it provides?

Would there be any additional problems or limitations if this were done as commercial software?

I will start this project once I know that there will be no legal problems, so it would be great if someone could clarify this for me or point me in the right direction.

1 Answer

Am I allowed to do this

Yes. You can do whatever you want if it's legal in every other respect.

under which conditions?

You have to credit the original authors (Wikipedia) and use the same license for your derived work, as described here: https://creativecommons.org/licenses/by-sa/3.0/

but in my case I only have small chunks of information taken from the text. Do the same restrictions still apply

It depends:

  • For example, if you count the number of sentences in each article and store the counts as a sequence of integer values, I don't think that could be considered a derived work, because the original text cannot be identified from it.

  • On the other hand, if you copy some significant parts of the articles, it's a (partial) copy, not just a derived work.

  • If you're doing something similar to what DBPedia does, you might want to check how they deal with this license: http://wiki.dbpedia.org/terms-imprint (They do use the same license for their derived work.)
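The first case above can be sketched as follows. This is only an illustration: the `articles` dictionary is made-up sample text, and splitting on `.`, `!` and `?` is a deliberately naive assumption about what a sentence is.

```python
import re


def sentence_count(text):
    """Count sentences naively by splitting on ., ! and ?."""
    parts = re.split(r"[.!?]+", text)
    return sum(1 for part in parts if part.strip())


# Hypothetical sample articles, not real Wikipedia text.
articles = {
    "Article A": "First sentence. Second one! Third?",
    "Article B": "Only one sentence here.",
}

# The stored result is just a sequence of integers.
counts = [sentence_count(text) for text in articles.values()]
```

Nothing in `counts` lets a reader reconstruct the wording of the articles, which is the point of that example.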

And how would attributing the source or the author work in this scenario?

You don't have to put a copyright statement on every record in your database. Just put a LICENSE file with a copy of the license text somewhere in the source code of your project. If your project's website has a dedicated page for this, put the license there as well. Here is my favorite example of an attribution page: http://jisho.org/about
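As a rough sketch of the idea, the notice for such a file or page could be generated from a few constants. The wording of the notice, the function name `attribution_notice` and the project name `ExampleApp` are all hypothetical; only the license name and URL come from the text above.

```python
LICENSE_NAME = "Creative Commons Attribution-ShareAlike 3.0"
LICENSE_URL = "https://creativecommons.org/licenses/by-sa/3.0/"


def attribution_notice(project_name, source_name="Wikipedia"):
    """Build a short attribution text for a LICENSE file or an about page."""
    return (
        f"{project_name} uses data derived from {source_name}.\n"
        f"That data is available under the {LICENSE_NAME} license:\n"
        f"{LICENSE_URL}\n"
    )


# Hypothetical project name, used for illustration only.
notice = attribution_notice("ExampleApp")
```

The same string could be written to the LICENSE file and rendered on the website, so both places stay in sync.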

Would there be any additional problems or limitations if this were done as commercial software?

Yes, if you want to keep your source code private. Basically, you have to make it Open Source. You don't have to publish your code on the web, but you would have to send a copy of it to anyone who asks for it.

There are two options:

  1. If you want to keep your code private, you can make your database (and the code which manages it) a separate Open Source project, and implement the application which uses it as independently of it as possible. That way you'll have two separate projects. I'm not 100% sure whether such an application would still count as "derived", because it still depends on the data. It really is up to whoever interprets the license, since there is no mathematically precise definition of "derived work".

  2. Make it all Open Source and still make money by selling services, not software.

And remember, I'm not a lawyer and cannot predict how well this may work. I'm only judging by what I've seen other people do.

scriptin