23 June 2010

OpenDataBC: Toward A Data Usability Scale

I am currently involved in a project named OpenDataBC.  OpenDataBC is an open platform for government data sets and APIs released by governments in British Columbia.  It makes it easy to find datasets by and about government, across all levels (provincial, regional, and municipal) and across all branches. The catalogue is both entered by hand and imported from multiple sources and is curated by our team of volunteers.

With a name like "OpenDataBC", you would think it would be pretty straightforward to put such a site together: take the available catalogues from Nanaimo, Vancouver and the province, stick them together and voila, a catalogue is born. But it's actually not that easy. The site is named OpenDataBC because we want to pay particular attention to "Data" that is "Open" and that originates in or is about "BC", and for that we have to be a bit more careful about how we put it together.

The definition of "open" as it relates to data is still evolving at a rapid pace.  In its ideal form, what we mean by open data is:
Open data is data that you are allowed to use for free without restrictions.  Open data does not require additional permission, agreements or forms to be filled out and it is free of any copyright restrictions, patents or other mechanisms of control.
By this definition, there is very little open data available today.  Rather than soften the definition of open, we think it's useful to promote the use of data that has been released, acknowledging data that is more open (doing the right thing) while encouraging data that is less open to evolve.

Our goal is ultimately to facilitate the process of making more BC data available in a form that people can use. To that end OpenDataBC will highlight the most usable datasets that we can find.  For that we need some sort of usability ranking or scale, which right now does not exist, so we are inventing one. The questions below are ones to consider when assessing the usability of data being released. It's a starting point and we expect it to evolve.

1. Is it machine readable electronic data?
Although a scanned image of a map with gold stickers pasted on it is technically data, it is not something that a programmer can use.  What we look for is machine readable data: documents or electronic files published in formats that a software program can read easily, consistently and without errors.  Databases, spreadsheets and CSV files are all examples of machine readable electronic data, and so they are considered more usable.  PDF files, Word documents and scanned images are technically readable by a software program, but reading them is difficult and time consuming, so they are less usable.
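
To illustrate why a CSV file counts as easy to read, here is a minimal Python sketch using only the standard library. The sample data and the column names "name" and "count" are invented for the example and do not come from any real catalogue.

```python
import csv
import io

# Hypothetical sample data; a real dataset would come from a file or a URL.
SAMPLE_CSV = """name,count
alpha,3
beta,5
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)

# Values arrive as strings, so convert before doing arithmetic.
total = sum(int(row["count"]) for row in rows)
```

Getting the same two records out of a scanned image or a PDF would typically take far more code and still be error prone, which is the point of this question.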

2. Is it accessible?
I should be able to get it easily over the internet.  I should be able to get it on demand, with a simple program using open source software.  I should not have to submit a form to get it.  I should be able to enter a URL and in return I get the data.
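
As a sketch of what "enter a URL and get the data" can look like in practice, the Python snippet below downloads a CSV file with the standard library alone. The function name fetch_csv is hypothetical, and a data: URL stands in for a real dataset address so that the example runs without a network connection.

```python
import csv
import io
import urllib.request

def fetch_csv(url, timeout=10):
    """Download CSV from a URL and return its rows as lists of strings."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        text = response.read().decode(charset)
    return list(csv.reader(io.StringIO(text)))

# Placeholder address (an assumption for illustration, not a real dataset).
EXAMPLE_URL = "data:text/csv;charset=utf-8,name,count%0Aalpha,3%0Abeta,5"

rows = fetch_csv(EXAMPLE_URL)
```

If a dataset can be retrieved this way, with no form, login or agreement in between, it is accessible in the sense meant here.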

3. Is it published in an open format?
From Wikipedia: "An open file format is a published specification for storing digital data, usually maintained by a standards organization, which can therefore be used and implemented by anyone. For example, an open format can be implementable by both proprietary and free and open source software, using the typical licenses used by each. In contrast to open formats, proprietary formats are controlled and defined by private interests."

4. Is it free?
In this context, "free" means freedom of use: am I free to use this data however I want?  Can I use it to produce a product that I sell?  Can I combine it with other data and publish it?  Can I sell a copy of it?  Data that puts any sort of restrictions on the ways in which it can be used, or imposes any conditions or constraints on the user, is not free.  For example, if I have to enter into an agreement to use it, it's not free.

5. Is it released under a common license?
Data that is released under a common license, such as a Creative Commons license or another license that conforms to the Open Knowledge Definition, is preferred over data released under a license created by the party releasing it, because licenses are hard to understand.  The more time people have to spend understanding the license in order to use the data, the less usable the data is.  Common licenses address this problem because once the license has been learned for one dataset, that understanding carries over to other datasets released under the same license.

6. Is it provided without a fee?
The data needs to be available at no cost to the user.  If it costs money, it's less usable and it's not open data.

7. Is it complete?
Data should not be missing values that ought to be there.  If it's point-in-time data it should include all of the relevant information for that point in time.  If it's time series data, it should include the entire time series from the first record to the most recent record.   If the data is about a geographical province, region or city, it should include the entire province, region or city and not leave out some geographical part of the data.
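
For time series data, one simple completeness check is to look for gaps between the first and last record. The Python sketch below assumes a daily series; the function name missing_days and the sample dates are made up for illustration.

```python
from datetime import date, timedelta

def missing_days(days):
    """Return the dates absent between the first and last day observed."""
    present = set(days)
    if not present:
        return []
    start, end = min(present), max(present)
    span = (end - start).days
    expected = (start + timedelta(days=n) for n in range(span + 1))
    return [d for d in expected if d not in present]

# Hypothetical daily series with one day missing.
observed = [date(2010, 6, 1), date(2010, 6, 2), date(2010, 6, 4)]
gaps = missing_days(observed)
```

A check like this only finds holes inside the series. It cannot tell whether records are missing before the first date or after the last one, so it is a partial test of completeness at best.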

8. Is it timely?
The data should have the most up to date information as soon as it is available.  Ideally the data is available as an updated feed or at least updated on a regular schedule.  If the data is a feed, it should be available in as near real time as possible.

The plan is to add to this list and to refine the questions as we move along and gain experience with it. By applying a standardized set of questions, we will let users who come to the site easily determine what they might be up against if they want to use data in the catalogue. More usable data will be featured more prominently, and less usable data will be identified as such so that the issues contributing to its less usable status can be addressed.
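
One way the eight questions could be turned into a number is sketched below in Python. This is not the scale itself, which is still being worked out: the key names, the yes/no answers and the equal weighting of every question are all assumptions made for the example.

```python
# Short keys for the eight questions above (names are hypothetical).
QUESTIONS = (
    "machine_readable", "accessible", "open_format", "free",
    "common_license", "no_fee", "complete", "timely",
)

def usability_score(answers):
    """Count how many of the eight questions are answered yes."""
    unknown = set(answers) - set(QUESTIONS)
    if unknown:
        raise ValueError(f"unknown questions: {sorted(unknown)}")
    # Unanswered questions count as "no" (an assumption of this sketch).
    return sum(1 for q in QUESTIONS if answers.get(q, False))

# Hypothetical dataset that meets three of the eight criteria.
example = {"machine_readable": True, "accessible": True, "no_fee": True}
score = usability_score(example)
```

A real scale might weight some questions more heavily than others, or treat some (such as being free of restrictions) as requirements rather than points.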

Please let me/us know if you think we're missing something or if something here needs adjusting.

