Creative Commons | Paul Miller - The Cloud of Data

Sun releases Creative Commons-licensed API to their new Cloud


On stage at CommunityOne East in New York just now, Dave Douglas (Senior Vice President, Cloud Computing and Developer Platforms) and Lew Tucker (CTO, Cloud Computing) unveiled a RESTful API to Sun’s new Cloud.

Walk-on parts during the hour-long streamed presentation included two recent podcast victims: EUCALYPTUS Project Director Rich Wolski and RightScale CEO Michael Crandell, both of whom pledged support.

Sun’s official announcement is here.

Sun has also released A Guide to Getting Started with Cloud Computing, which offers a useful overview of the basic issues whilst relegating most of the Sun pitch to a separate section.

GoGrid have, of course, already released an equivalently licensed API of their own.


John Wilbanks talks about Creative Commons, Data, Science and more

Image: John Wilbanks, Science Commons, by mecredis via Flickr.

My latest podcast is with John Wilbanks, the VP at Creative Commons with responsibility for their Science Commons project.

John has a varied background that includes founding a bio-informatics startup, as well as time at Harvard’s Berkman Center, the World Wide Web Consortium and the US Congress.

In his current role at Science Commons, he is working to ensure that the outputs of publicly funded science become more available, both for other scientists to use and for the wider public. The successes of the Open Access movement have led to greater visibility for scientific papers, but the data upon which those papers depend still tends to be difficult to locate.

We discuss initiatives at Science Commons and elsewhere, and consider some of the barriers to a change in approach.

Production of this podcast was supported by Talis, and show notes are available on their Xiphos blog.


Licensing of Linked Data


As part of a workshop at this year’s International Semantic Web Conference (ISWC), former colleague Leigh Dodds prepared an interesting diagram on the ways in which resources comprising the Linked Data Cloud are currently licensed.


For various reasons, I was unable to make it to Virginia for the event, but from a scan through the presentations by Leigh, Tom Heath (another former colleague), Jordan Hatcher (with whom I worked on earlier iterations of the Open Data Commons license) and Creative Commons' Kaitlin Thaney, it looks like they did a great job of covering the bases on this critically important aspect of the evolving Data Web.

If we are to encourage the sorts of break-out use of data that Tim Berners-Lee and Tim O’Reilly were discussing at last week’s Web 2.0 Summit, we need to move past the current laissez-faire approaches adopted by too many and ensure that there are licensing regimes in place to enable, facilitate and encourage widespread re-use.

As Leigh’s work demonstrates, we have a long way to go. Only around a third of the current Linked Data projects are explicitly licensing their content, and several of those may be mis-licensing by applying copyright-based licenses such as those from Creative Commons to data not covered by copyright legislation.

With the current state of data licensing, and the reliance of CC0 and Open Data Commons’ Public Domain Dedication and License upon the public domain as a means of overcoming territorial differences, will we find enterprise use of Linked Data increasingly relying upon the more cumbersome tool of contract law… to the detriment of a free and flexible exchange of ideas?

‘Linked Open Data Rights Survey’ image by Leigh Dodds, shared on Flickr under a Creative Commons license.


Open is good – but encouragement better than mandate

Image: Open Data stickers, via Wikipedia.

Openness is undeniably cool right now, at least if you move in the slightly odd circles that I do. Openly available scientific papers are disrupting the world of scholarly publishing (which may not be all good, but that’s a post for another day). Openly available university courses are finally beginning to work out how to offer meaningful accreditation to students. Openly accessible data from government agencies around the world bulks out almost every data marketplace, and anchors many an analysis. Openly available code for cloud infrastructure or networking is challenging the hold of the tech world’s giants. Everywhere you look, ‘incumbents’ are apparently being ‘challenged’ and ‘disrupted’ by the power of open.

The truth, of course, is a little more complex and a lot more nuanced, as business models shift and evolve just like they always have. In sustainable systems, some people still need to be rewarded (often through being paid) for their effort. And in sustainable systems, paying someone can often be a pretty straightforward means of ensuring that you have a throat to choke if something breaks; big companies adopting open source often seek a proper financial relationship with someone who installs and maintains the ‘free’ software or hardware they’re depending upon.

One area of openness that I’ve been involved with for about ten years is that of open licensing for both creative works and data. And it’s come a very long way.

Here in Europe, for example, the (badly flawed) 2003 Public Sector Information Directive is under review, and there’s every likelihood that the replacement will make a number of sensible moves toward greater openness, transparency, and reusability for publicly funded data. As the EPSI Platform site notes today, Andrés Nin proposes going a step further than the European Commission is currently contemplating, by instituting a common open license across Europe:

“The creation of a single public information re-use space in Europe requires much more, it requires a common European OpenData license applicable to all data generated by European public administrations.”

I would certainly welcome a model license that European member states could choose to adopt. I’d also welcome — and support — vigorous efforts to dissuade individual member states or ministries from their usual practice of tweaking and otherwise modifying perfectly good documents in order to demonstrate how ‘special’ or ‘different’ their circumstances apparently are. When will they all realise that they are neither as special nor as different as they like to think?

But — and it’s a big but — it seems unwise, premature, and unhelpful to even begin to suggest that such a license might be mandated across Europe. It isn’t required, and attempts to develop a single document that everyone could accept would be an unhelpful distraction that would result in something so bureaucratic, so ringed in opt-outs and prevarications, as to be utterly worthless. It would also, in all likelihood, be one of those exercises in which the process very quickly subsumed the point. A prime candidate for, in the words of an old boss, being too busy to be effective.

Survey: How open is your data?


Back in 2006, as we rolled out the first public draft of the Talis Community Licence, the world of data licensing seemed a simple place. Today, the Open Knowledge Foundation's Data Hub contains 3,888 data sets, many of which are explicitly licensed with respect to the Open Definition. But many are still not explicitly licensed. Over at the UK Government's data.gov.uk, there are 8,619 data sets today, and an assertion that “in general, the data is licensed under the Open Government License.” Too much still isn’t, of course, but they’re getting there. And then there are the many, many more data sets out on the web, not registered with repositories like the Data Hub or data.gov.uk at all.

More than four years on, how are we really doing?

As a scoping exercise for a larger project that I might be undertaking, I’d be really grateful if you could take a moment to fill in this brief survey [which will open in a new window or tab].

It simply sets out to assess the relative proportions of data that are not openly licensed, implicitly open, explicitly open with some home-grown statement, or explicitly open under a recognised data license like CC0 or one of the Open Data Commons licenses.

We’ve seen a welcome burst of enthusiasm for ‘open’ release of data. This has been driven most visibly by government transparency agendas here and overseas. But libraries, the scholarly publishing community and others have also been enthusiastic adopters in recent years.

Less welcome has been the sometimes rampant license proliferation. Everyone, it seems, finds something not quite right about one of the licenses on the table. Everyone, it sometimes appears, has a burning desire to create their own license that is just a little bit different, just a little bit closer to their world view. Everyone, perhaps, has a lawyer who sees the opportunity to write themselves a blank cheque alongside a new — ‘better’ — license.

Every local tweak to a common license, however well-meaning, is a barrier to interoperability. Every new license, however laudable the aims behind its creation, is a further complication to an already complicated picture; another excuse to wait rather than do. Although the meaning and the intent may be the same in all of these licenses, every different set of legalese requires careful — repeated — study as everyone else tries to work out whether or not some incompatibility or impediment has (unintentionally, we hope!) been introduced. Unconstrained license proliferation is, simply, bad.

So… I’ll be taking a look at figures from the Data Hub, data.gov.uk and elsewhere, to get some solid numbers on license proliferation, and on the geographies, domains and volumes in which each license is used. I’ll track all of that and more here, when it happens.

Until then, a couple of minutes of your time for the survey will be very valuable in setting the scene. I’d also be grateful for anything you can do to get your peers to complete the survey themselves. The more data we get, the clearer a picture we’ll see. I’ll provide updates on progress with this survey as your responses begin to come in, and make all the results available here.

And if you have data, and it’s even a little bit open, why not take a moment to register it with the Data Hub? That should make it so much easier for others to find.

Thank you.

Image, Open Data Stickers, from Wikimedia Commons.

Thinking about Open Data, with a little help from the Data Hub


Continuing to explore the adoption of explicit Open Data licenses, I’ve been having a trawl through some of the data in the Open Knowledge Foundation's Data Hub. I’m disappointed – but not surprised – by the extent to which widely applicable Open Data licenses are (not!) being applied.

If you are impatient or already aware of the background, feel free to skip straight to the results. For the rest of you, let me begin with a little background and an explicit description of my methodology.

Background

Open Data is, increasingly, recognised as being A Good Thing. Governments are releasing data, making themselves more accountable, (possibly) saving themselves money by avoiding the need to endlessly answer Freedom of Information requests, and providing the foundation upon which a whole new generation of websites and mobile apps are being built. Museums and libraries are releasing data, increasing the visibility of their collections and freeing these institutional collections from their decades-long self-imposed exile in the ghetto of their own web sites. Scientists are beginning to release their data, making it far easier for their peers to engage in that fundamental principle of science: the reproduction of published results.

Open Data is good, and useful, and valuable, and increasingly visible. But without a license telling people what they can and cannot do, how much use is it? Former colleague Leigh Dodds did some work a few years ago to look at the extent to which (notionally open) data was being explicitly licensed. He was concerned with a very specific set of data: contributions to the Linked Data Cloud. At the time, Leigh found that only a third of the data sets carried an explicit license, and several of those license choices were dubious.

I was interested to see how the situation had changed. I’m running a short survey, inviting people to describe their own licensing choices. I’ve also taken a look at the Data Hub, which “contains 4004 datasets that you can browse, learn about and download.” This is a far larger set of data than the one Leigh studied back in 2009, and should therefore provide a richer picture of licensing choices. It’s worth remembering, though, that data owners must actively choose to contribute their data to the Data Hub. The Hub is run by the Open Knowledge Foundation, and it therefore seems likely that submissions will skew in favour of those who are more than normally enthusiastic about their data and more than normally predisposed toward open. For more, listen to my podcast with the Open Knowledge Foundation’s Rufus Pollock and Irina Bolychevsky.

Methodology

I began by querying the Data Hub’s API to discover the set of permissible licenses. This returned a set of 15 possible values (a sketch of such a query appears after the list):

  • Not Specified [notspecified]
  • Open Data Commons Public Domain Dedication & License [odc-pddl]
  • Open Data Commons Open Database License [odc-odbl]
  • Open Data Commons Attribution License [odc-by]
  • Creative Commons CC0 [cc-zero]
  • Creative Commons Attribution [cc-by]
  • Creative Commons Attribution Share Alike [cc-by-sa]
  • GNU Free Documentation License [gfdl]
  • Other Open Licenses [other-open]
  • Other Public Domain Licenses [other-pd]
  • Other Attribution Licenses [other-at]
  • UK Open Government License [uk-ogl]
  • Creative Commons Non-Commercial [cc-nc]
  • Other Non-Commercial Licenses [other-nc]
  • Other closed licenses [other-closed]
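
For the curious, here is a minimal sketch of that first step in Python. The Data Hub runs on CKAN, so the license list can be pulled from its action API; the base URL, API version and the license_list action shown here are assumptions based on CKAN’s documented API rather than a record of the exact calls I made.

    # A minimal sketch, not the exact script used for this post: ask the Data
    # Hub's CKAN API for the licenses it recognises. Base URL, API version and
    # the 'license_list' action are assumptions based on CKAN's action API.
    import json
    import urllib.request

    BASE = "http://datahub.io/api/3/action"  # assumed CKAN action API endpoint

    def fetch_licenses():
        """Return (id, title) pairs for every license the Data Hub offers."""
        with urllib.request.urlopen(BASE + "/license_list") as response:
            payload = json.load(response)
        # CKAN wraps results as {"success": true, "result": [...]}
        return [(lic["id"], lic["title"]) for lic in payload["result"]]

    if __name__ == "__main__":
        for license_id, title in fetch_licenses():
            print(title + " [" + license_id + "]")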

I then downloaded the JSON dump from the Data Hub, but found that it was far older (and smaller) than the set of data available to the API. The JSON dump was last updated on 30 August 2011, and contained only just over 2,000 entries. At the time of writing, the API offers access to 4,004 entries. With the help of Adrià Mercader, I learned how to submit the correct query to the API itself, giving me access to all 4,004 records.

Results included 44 different values for the license_id attribute: the 15 above, 12 numeric values that were presumably errors of some kind, assorted ways of either saying nothing or specifying that the data had no license, and a small number of records associated with some specific licenses such as a Canadian Crown Copyright and the MIT License. Of 4,012 records, 874 appear to say nothing whatsoever about their license conditions; not even the

"license_id": ""

used by 523.
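
A rough sketch of that counting step, again in Python, is below. It assumes CKAN’s package_search action with start/rows paging; the endpoint, the paging parameters and the bucket labels are illustrative assumptions rather than the exact query Adrià helped me construct.

    # Tally license_id values across every dataset record, with separate
    # buckets for records where the attribute is empty or missing entirely.
    # Endpoint and parameters are assumptions, not the exact query used here.
    import json
    import urllib.request
    from collections import Counter

    BASE = "http://datahub.io/api/3/action"  # assumed CKAN action API endpoint

    def count_license_ids(page_size=500):
        counts = Counter()
        start = 0
        while True:
            url = "%s/package_search?rows=%d&start=%d" % (BASE, page_size, start)
            with urllib.request.urlopen(url) as response:
                result = json.load(response)["result"]
            for record in result["results"]:
                if "license_id" not in record:
                    counts["<attribute missing>"] += 1
                elif record["license_id"] in ("", None):
                    counts["<empty value>"] += 1
                else:
                    counts[record["license_id"]] += 1
            start += page_size
            if start >= result["count"]:
                return counts

    if __name__ == "__main__":
        for license_id, total in count_license_ids().most_common():
            print("%5d  %s" % (total, license_id))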

Results

 

Looking at the raw numbers, the first impression must be a depressing one. Fully 50% of the records either explicitly state that there is no license (14), explicitly state that the license is ‘not specified’ (604), explicitly record a null value (523), or fail to include the license_id attribute at all (874). Given all of the effort that has gone into evangelising the importance of data licensing, and all the effort that Data Hub contributors have gone to in collecting, maintaining and submitting their data in the first place, that really isn’t very good at all. But at least it’s an improvement on what Leigh observed back in 2009.

If we remove the 2,015 unlicensed records and the 31 errors (those well-known data licenses, including ‘1’, ‘34’, ‘73’, etc.), the picture becomes somewhat clearer.

The licenses that many have worked so hard to promote for open data (CC0, the Open Data Commons family and – in some circumstances – CC-BY) are far less prevalent than I’d expected them to be. 125 resources are licensed CC0, 273 CC-BY, 119 ODC-PDDL, 61 ODC-ODBL, and 36 ODC-BY. That’s a total of 614 out of 1,966 licensed resources, or just 31%. 44% of those 614 are licensed CC-BY: an attribution license based upon copyright rather than database rights. At least some of those may therefore be wrongly licensed. The two core data licenses are almost tied (125 for CC0, 119 for ODC-PDDL), but together account for a tiny 12% of all the licensed resources in the Data Hub.
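
For anyone who wants to sanity-check those proportions, here is a back-of-the-envelope calculation using only the counts quoted above; the license identifiers are the Data Hub’s own, as listed under Methodology.

    # Reproduce the percentages quoted above from the reported counts
    # (1,966 explicitly licensed records in total).
    counts = {
        "cc-zero": 125,   # Creative Commons CC0
        "cc-by": 273,     # Creative Commons Attribution
        "odc-pddl": 119,  # ODC Public Domain Dedication & License
        "odc-odbl": 61,   # ODC Open Database License
        "odc-by": 36,     # ODC Attribution License
    }
    licensed_total = 1966

    promoted = sum(counts.values())                   # 614
    core_pd = counts["cc-zero"] + counts["odc-pddl"]  # 244

    print("Promoted open data licenses: %d (%.0f%% of licensed records)"
          % (promoted, 100.0 * promoted / licensed_total))
    print("CC-BY's share of that group: %.0f%%"
          % (100.0 * counts["cc-by"] / promoted))
    print("CC0 + ODC-PDDL: %d (%.0f%% of licensed records)"
          % (core_pd, 100.0 * core_pd / licensed_total))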

The picture’s not all bad, as there is clearly a move toward the principle of ‘open’ and ‘public domain’ licenses. CC0 (125) and ODC-PDDL (119) are joined by 167 data sets licensed with some other public domain license. And with 444 data sets, ‘other open license’ is the single most popular choice: almost one quarter of the licensed data sets use an open license that is not one of the mainstream ones.

In total, the Creative Commons family of licenses (including the odd ‘sharealike’ variant and the hugely annoying ‘noncommercial’ anachronism) accounts for 602 data sets, or 31%. The Open Data Commons family accounts for 216, or 11%.

By most measures, we should probably welcome the use of any open or public domain license. But the more choices there are, the more scope there is for confusion, contradiction, and a lack of interoperability. Every time I want to take an ‘open’ dataset licensed with Open License A, and combine it with an ‘open’ dataset licensed with Open License B, there’s the nagging doubt that some wording in one of the licenses introduces a problem. Do I need to check with a lawyer? Do I need to check with one or both of the data providers? Is this all too much bother, and should I just go and do something else? License proliferation is friction.

So those are the results. What do they say to you?

It will be interesting to check back over time, and see how the proportions shift. Let’s work to eradicate the ‘None/Not Specified’ category altogether, and then see what we can do to shrink all of the ‘Other’ categories.




