Unicode license

Question

The Unicode Terms of Use state that any software that uses their data files (or a modification of them) should carry the Unicode license references. It seems to me that most Unicode libraries have functions to check whether a character is a digit, a letter, a symbol, etc., and so will contain a modification of the Unicode Data Files (usually in the form of tables). Does that mean the license applies and all applications that use such Unicode libraries should carry the license?
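Here is what "a modification of the Unicode Data Files (usually in the form of tables)" can look like in practice. This is a hypothetical C# sketch: `DerivedTables`, `Load`, `IsDigit`, `IsLetter` and `ToUpper` are illustrative names, not any real library's API. It assumes the semicolon-separated layout of UnicodeData.txt, where field 0 is the code point, field 2 is the General_Category and field 12 is the simple uppercase mapping. It also ignores the First/Last range entries that a real loader would have to expand.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch (not real library code): compiles lines of UnicodeData.txt
// into lookup tables. Range entries (<..., First>/<..., Last>) are NOT expanded here.
public static class DerivedTables
{
    // Code point -> two-letter General_Category code, e.g. "Nd" or "Ll".
    public static readonly Dictionary<int, string> Category = new Dictionary<int, string>();

    // Code point -> simple uppercase mapping, stored only when field 12 is non-empty.
    public static readonly Dictionary<int, int> Upper = new Dictionary<int, int>();

    // Parses one semicolon-separated line and records fields 0, 2 and 12.
    public static void Load(string line)
    {
        string[] fields = line.Split(';');
        int codePoint = int.Parse(fields[0], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        Category[codePoint] = fields[2];
        if (fields[12].Length > 0)
        {
            Upper[codePoint] = int.Parse(fields[12], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        }
    }

    // "Is this a decimal digit?" answered purely from the derived table.
    public static bool IsDigit(int codePoint) =>
        Category.TryGetValue(codePoint, out string cat) && cat == "Nd";

    // Letters are the categories starting with 'L' (Lu, Ll, Lt, Lm, Lo).
    public static bool IsLetter(int codePoint) =>
        Category.TryGetValue(codePoint, out string cat) && cat[0] == 'L';

    // Returns the input unchanged when no uppercase mapping was recorded.
    public static int ToUpper(int codePoint) =>
        Upper.TryGetValue(codePoint, out int up) ? up : codePoint;
}
```

Feeding it a line copied from the file, such as `0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041`, makes `IsLetter(0x61)` true and `ToUpper(0x61)` return `0x41`. The text file no longer appears in the result, but every table entry was copied out of it. This is the kind of derived table the question is asking about.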

I've checked around, and it appears that very few Unicode programs carry the license. Arguably, though, most of those that didn't carry it were from companies that are members of the Unicode Consortium (do they get license exemptions?).

Some (e.g., Mozilla) are only "Liaison Members", and while their software does not carry the license (as far as I can tell), it obviously relies on data derived from those data files. Is Mozilla in breach of the license?

Should we carry the license in all apps that include any form of advanced Unicode support (i.e., that are bound to rely on the Unicode data files)? Or is there some form of broad exemption (since very, very few programs out there carry the license)?

I've forwarded this question to Unicode staff. I'll post the reply here when/if I get one.

FWIW Firefox has a lot of license texts in it: `about:license` — Thilo, Sep 28 '12 at 07:17
@Thilo Yes, and none of them refer to Unicode AFAICT. The fact that they acknowledge so many, yet not the Unicode one, was part of what prompted this question. — Eric Grange, Sep 28 '12 at 11:42
Firefox uses the ICU library. The Unicode license is not a viral license. — Hans Passant, Sep 29 '12 at 07:00
The Unicode License affects the data, not the code: if ICU includes the data, then so does Firefox. Unless there is an exclusion clause, but I couldn't find any... — Eric Grange, Sep 29 '12 at 19:03

Arseni Mourzenko · Accepted Answer · 2014-11-18T16:28:24.130

Preliminary remark: I'm not a lawyer any longer, and I never specialized in copyright and intellectual property law. If you want a definitive answer, you should consult a lawyer.

1. Data and data files are not the same

As it states, Exhibit 1 covers data files:

BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES [...]

Data files and the data itself are not the same. When Microsoft implements the uppercase and lowercase methods in .NET Framework, it follows the Unicode Standard, but this doesn't mean that .NET Framework contains, somewhere, the files downloaded from http://www.unicode.org/.
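To make the .NET point concrete, here is a minimal C# sketch. `BuiltInUnicode` and `Describe` are hypothetical names, while `CharUnicodeInfo.GetUnicodeCategory` and `char.ToUpperInvariant` are real base-class-library calls. The application never opens a Unicode data file. The answers come from data bundled with the framework (or, on some platforms, with the operating system's ICU library).

```csharp
using System.Globalization;

// Hypothetical helper: Unicode properties are looked up through the framework,
// so the application itself ships no file downloaded from unicode.org.
public static class BuiltInUnicode
{
    // Formats a char as "U+XXXX <general category> -> U+YYYY" (invariant uppercase).
    public static string Describe(char c)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        char upper = char.ToUpperInvariant(c);
        return $"U+{(int)c:X4} {category} -> U+{(int)upper:X4}";
    }
}
```

For example, `BuiltInUnicode.Describe('\u00E9')` returns `"U+00E9 LowercaseLetter -> U+00C9"`.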

A simple illustration of the difference between the data and the medium that carries it:

Imagine that I create a database with a list of countries, cities and the corresponding post codes. I expose this data through a web service and on my website.

The data itself is in the public domain: you can't reasonably copyright the list of countries and require everyone who uses such a list to pay you or to distribute a copy of your copyright notice.

On the other hand, nothing prevents me from enforcing a restrictive license on the use of the web service or the website (especially since I invested a lot of effort in creating this data set). If I find that an application is scraping my website to download the data, this would be a copyright infringement, and I would be able to sue the person who created the scraper.

2. Data is too vague

If http://www.unicode.org/ stated that the license covers the data itself, it would be very difficult for the organization to enforce such a copyright.

Imagine the following method:

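// Naive ASCII-only ToUpper: anything outside A-Z and a-z is rejected.
public char ToUpper(char c)
{
    const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string lower = "abcdefghijklmnopqrstuvwxyz";

    // Already uppercase: return it unchanged.
    if (upper.IndexOf(c) >= 0)
    {
        return c;
    }

    // Lowercase: take the letter at the same position in the uppercase alphabet.
    int index = lower.IndexOf(c);
    if (index >= 0)
    {
        return upper[index];
    }

    // Digits, accented letters, etc. are outside this tiny table.
    throw new System.ArgumentOutOfRangeException(nameof(c), "Only the ASCII letters A-Z and a-z are supported.");
}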

Is this a copyright violation? Did I actually use the data from http://www.unicode.org/, and should I include a copy of the license in this answer? Or did I just type those letters myself?

In other words, if the data itself were licensed, how far could the license go?

3. Copyright and data

Here are some interesting quotes:

http://www.lib.umich.edu/copyright/facts-and-data: University of Michigan

Copyright law does not apply to facts, data, or ideas. [...]

However, copyright may protect a collection of data as contained in a database or compilation, but only if it meets certain requirements. Simply working really hard to gather the data [...] is not enough. [...]

In order for a database to qualify for copyright protection, the author has to make choices about the selection, coordination, or arrangement of the facts or data, and those choices must be at least a little bit creative. [...]

It is important to remember that even if a database or compilation is arranged with sufficient originality to qualify for copyright protection, the facts and data within that database are still in the public domain.

http://www.ands.org.au/guides/copyright-and-data-awareness.html: Australian National Data Service

A table or compilation, consisting of words, figures or symbols (or a combination of these) is protected if it is

a literary work and

has the required degree of originality.

[...] Copyright applies not to the facts/information itself, but to the particular way the facts/information are presented in the dataset or database.

These two examples, one concerning the USA and the other Australia, clearly show that the data itself is not covered by copyright: the Unicode characters with their respective code points, and attributes such as "is this a digit?" or "is this a capital letter from the Cyrillic alphabet?".

Data files, on the other hand, may be covered by copyright, depending on their originality. For example, the PDFs you find on http://www.unicode.org/ would very probably be covered by copyright. If, however, it is purely a question of a CSV file mapping lowercase characters to uppercase or vice versa, the author of such data would hardly be able to enforce copyright on it.

Clearly, the ToUpper method above does not violate the copyright of http://www.unicode.org/. Neither does the code used by .NET Framework or Firefox, unless those systems contain, somewhere inside, data files that are clearly and undoubtedly copied from http://www.unicode.org/, possibly with some minor changes.

1. The license covers a modification of the Data Files; it is doubtful that Microsoft, for instance, recreated those from scratch, especially since those files are what define the Unicode Standard itself. — Eric Grange, Oct 01 '12 at 11:17
2. EXHIBIT 1 details what the Data Files are; it doesn't look vague at all. — Eric Grange, Oct 01 '12 at 11:18
3. Data can be licensed, and certainly is (e.g. mapping data or research data, http://www.dcc.ac.uk/resources/how-guides/license-research-data) — Eric Grange, Oct 01 '12 at 11:20
@Eric Grange: (1) Doubtful or not, there is no formal proof that Microsoft used those data files as is. (3) I completely agree that data in general can be licensed. It's just that in the two particular cases I listed (i.e. names of countries and the English alphabet), you'll have a hard time convincing a judge that this is *your* data and that it is covered by your copyright. — Arseni Mourzenko, Oct 01 '12 at 11:34
Excellent answer, and I would upvote it more if I could solely for the twist on "I am not a lawyer" (anymore). — , Oct 01 '12 at 11:48

score 1 · Answer 2 · answered Oct 01 '12 at 09:38

These files legally form a database, which means that in many jurisdictions they are treated not as a copyrightable work but as subject to other kinds of protection. In particular, such jurisdictions consider the (quantitative and/or qualitative) effort needed to compile the database. See e.g. the European Database Directive.

As an example, there's no creative decision involved in defining the relation between uppercase and lowercase letters. That particular table is therefore not subject to copyright in the EU, and since the Unicode Consortium isn't European, it isn't covered by EU database right law either. (There's no equivalent of the Berne Convention for database rights.)

MSalters

The Wikipedia page you linked says the contrary: they are protected by copyright as collections... – Eric Grange Oct 01 '12 at 11:24

No, there's an explicit distinction. As per the wiki, "Copyright protection is not available for databases which aim to be `complete`". And from unicode.org, "Unicode provides a unique number for `every` character". Collections are protected by copyright when they're a creative selection. E.g. ASCII would be, as it's a selection of 128 characters most useful in English, at least according to its creators. – MSalters Oct 01 '12 at 11:47
