15 August 2010

Is your data copyable?

Of all the concepts that data users have to concern themselves with, copying and the legal ramifications of copying has to be one of the most important. All technology is a product of copying but in the last 100 years media companies have invented and promoted the idea that copying is theft. It's one thing to have some difficulty and possibly incur some costs accessing and using data, it's an entirely different thing to have to worry about being prosecuted or sued for using data. Developers are rightly cautious about the liabilities associated with copying data.

The environment of caution around copying affects people releasing, promoting and using open data. If it's not perfectly clear to developers that they can use your data they often won't for this reason alone. Developers have enough to think about. If they have to wade through your license or terms of agreement to try to understand what all the ramifications are, they usually won't bother. If they decide to read it then even one clause that looks risky is enough to drive them away.

There are lots of interesting project ideas and few developers that can make them happen. If you want people to use your data, it makes sense to make that process as easy and risk free as possible. Fortunately, there are several ways to do that, but first, what exactly is the copyright protecting? When we talk about data in this context we are talking mostly about facts and collections of facts.

The interesting thing about facts when it comes to copyrights is that it's generally accepted that facts themeselves do not enjoy any copyright protection. The reason being that copyrights only generally apply to creative works, and facts, by their very defintion, are not creative. If I present a number to you as the truth then I am saying that I didn't make up the number, I at best discovered it, but it already existed. If I present a number to you and say I made it up - in other words I invented it from nothing, then it's not a fact, it's made up.

It's for this reason that the idea of licensing data then strikes me as odd. To grant someone a license to use something which they can already do is redundant. Attempting to restrict use by means of a license is then equally strange. If someone already has the right to use something then to restrict it would require their agreement, and why would anyone agree to fewer rights than they already have.

I think the USA is on the right track here. My preference is that all non-personal government data be released as public domain. Public domain is easy to understand and it fully opens the door to innovation giving developers the raw materials they can really use to create valuable apps.

No comments: