Iterating data.govt.nz

A blog post from the data.govt.nz team updating data analysts and wranglers on progress with the beta site, and requesting feedback on the new functionality outlined below.

TLDR: We've updated beta.data.govt.nz, added some more data (old and new, records and hosted data) for testing, and need your feedback to help us decide whether to proceed and what to prioritise.

It has been a couple of months since we released a beta for data.govt.nz and we've had some constructive feedback from both inside and outside of government, so thank you everyone!

We've made some useful updates to the functionality of the beta and have migrated all the working data and records from data.govt.nz for further testing and feedback. We need to determine whether CKAN is the right platform for data.govt.nz and would greatly appreciate your feedback on this point. Apologies for the long blog post, but there is a lot to consider!

What we've been doing

We found that some of the data on data.govt.nz was out of date, so it was good to clean it up. We now have a tool on beta to help data publishers keep track of any broken links or data, which will improve reliability in future. We've also migrated all data requests into the new data request system, and use cases into the new showcase.

The beta platform is built on CKAN and we have added a number of plugins to test with data publishers and users.

We are also building a roadmap for data.govt.nz which will be blogged in the coming month or so for feedback, after we've had your feedback on the beta site.

In updating beta, we have identified that most agencies do not have the capacity to implement their own data catalog or hosting infrastructure. As such, data.govt.nz can now either harvest dataset metadata records from agencies with catalogs (CSW, CKAN, etc), or it can be used as the metadata catalog and hosting site for agencies that don't have those capabilities. The idea is that data.govt.nz can be iterated over time to respond to the changing needs of data publishers and data users.

New functionalities

Beta.data.govt.nz 2.1 includes the following functionality:

Standardised metadata schema

We are working towards being a whole of government data catalog with a standardised metadata schema (based on DCAT and mapped to other schemas such as ANZLIC). Whilst it will take a little more time to get a more complete list of data across government, we are working closely with data publishers to ensure all government data is searchable through data.govt.nz.

If you find government data that isn't on data.govt.nz, please let us know and encourage data custodians to list their data on data.govt.nz.

Automated harvesting of data

We are testing harvesting functionality to automatically populate and regularly update the data.govt.nz catalog with datasets from existing catalogues across New Zealand.

We can easily harvest from CSW (spatial), data.json (including ArcGIS) and CKAN catalogs, and we have developed an easy-to-use spreadsheet so that agencies which don't have catalogues can populate data.govt.nz.
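To give a feel for what harvesting from a data.json catalog involves, here is a minimal sketch. It assumes the common data.json layout (a top-level "dataset" list whose entries carry "title", "description" and "distribution"). The function names and the output field names are hypothetical, loosely modelled on CKAN dataset fields, and this is not the harvester running on beta.

```python
import json
import re


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated identifier."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def datajson_to_records(catalog_text):
    """Map each data.json dataset entry to a simple catalog record."""
    catalog = json.loads(catalog_text)
    records = []
    for entry in catalog.get("dataset", []):
        # A distribution may offer a direct download or only an access page.
        resources = [
            {
                "url": dist.get("downloadURL") or dist.get("accessURL"),
                "format": dist.get("format", ""),
            }
            for dist in entry.get("distribution", [])
        ]
        records.append(
            {
                "name": slugify(entry["title"]),
                "title": entry["title"],
                "notes": entry.get("description", ""),
                "resources": resources,
            }
        )
    return records
```

A real harvester would also need to handle updates, deletions and licence fields, which this sketch leaves out.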

Data hosting

Tabular data can be hosted on beta.data.govt.nz by agencies that need somewhere to publish their data assets. If the data is machine readable, an API will be automatically generated, making the data more useful to all data users including the original data publisher!
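As an illustration of what the generated API looks like to a data user, the sketch below builds a request for CKAN's `datastore_search` action and reads the records from the response. The base URL and the resource ID you pass in are assumptions for illustration; check the dataset page for the real values.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the beta site exposes the standard CKAN action API at this host.
BASE_URL = "https://beta.data.govt.nz"


def datastore_search_url(resource_id, limit=5, base_url=BASE_URL):
    """Build the URL for a datastore_search call on one hosted resource."""
    query = urlencode({"resource_id": resource_id, "limit": limit})
    return f"{base_url}/api/3/action/datastore_search?{query}"


def fetch_records(resource_id, limit=5):
    """Fetch the first few rows of a hosted resource as a list of dicts."""
    with urlopen(datastore_search_url(resource_id, limit)) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    return payload["result"]["records"]
```

Calling `fetch_records` with a resource ID copied from a dataset page should return the first five rows, provided that resource has been loaded into the DataStore.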

Technical Quality Framework tools

We are building a basic technical quality framework to automatically grade all datasets according to the most basic needs of data users. We have started with the TBL 5 star quality plugin for CKAN, and have a plan for how to usefully extend this so data users can far more easily identify data they can use.
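To show the kind of grading involved, here is a rough sketch of a star rating based only on file format and licence. The format-to-star table is an illustrative assumption following the general idea of the 5 star open data scheme; it is not the scoring used by the plugin, and the fifth star (linking to other data) cannot be judged from the format alone.

```python
# Illustrative assumption: stars awarded per format, not the plugin's own table.
FORMAT_STARS = {
    "PDF": 1, "DOC": 1,            # on the web, but not structured
    "XLS": 2, "XLSX": 2,           # structured, proprietary format
    "CSV": 3, "JSON": 3, "XML": 3, # structured, open format
    "RDF": 4, "TTL": 4,            # uses URIs to identify things
}


def star_rating(file_format, openly_licensed):
    """Return a 0-4 star grade for a dataset resource."""
    if not openly_licensed:
        # Without an open licence the data earns no stars at all.
        return 0
    # Unknown formats still earn one star for being openly available.
    return FORMAT_STARS.get(file_format.strip().upper(), 1)
```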

Analytics

We have integrated basic CKAN stats and are working on analytics for the site so data users and publishers can see how often datasets are viewed or downloaded. Reporting on API calls will follow.

Licence mapping

We've mapped the NZGOAL licences to beta.data.govt.nz so data publishers can more easily select an appropriate open licence for their open data.

Basic data visualisation

We've added the basic charts plugin to make data that is machine readable easier to play with and gain basic insights without the need for data skills or software. It is just a basic function to help people identify data that may be useful to use, analyse and build further value on.

Archiver function

We have enabled the “Archiver” plugin so we can identify broken links proactively, and to locally cache data files stored on agency websites to make them more accessible. Where such data is machine readable, it will automatically get an API on data.govt.nz and will be browseable as data and through the basic graphs functionality.
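The broken link check can be pictured with the short sketch below. It is an illustration only, not the Archiver plugin's own logic: it sends a HEAD request to each URL and reports those that fail or return an error status. The function names are hypothetical.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def http_status(url, timeout=10):
    """Return the HTTP status for a URL, or None if it cannot be reached."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error code such as 404.
        return error.code
    except (URLError, OSError, ValueError):
        # DNS failure, timeout, refused connection or malformed URL.
        return None


def find_broken_links(urls, get_status=http_status):
    """Return (url, status) pairs for links that look broken."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Passing a different `get_status` function makes it possible to try the check against recorded results without touching the network.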

Other useful functions on beta you may appreciate include:

You can subscribe to data publishing updates, including by search term, organisation or group. See the About page for more information.
The CKAN API documentation linked below will tell you more about how you can use the API to search the catalog, to use API enabled datasets, and for data publishers, to automate harvester and published dataset updates.
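As a small example of searching the catalog through the API, the sketch below builds a request for CKAN's `package_search` action and summarises a response. The base URL and the organisation name used in the filter are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumption: the beta site exposes the standard CKAN action API at this host.
BASE_URL = "https://beta.data.govt.nz"


def package_search_url(query, organization=None, rows=10, base_url=BASE_URL):
    """Build a package_search URL, optionally limited to one organisation."""
    params = {"q": query, "rows": rows}
    if organization:
        # fq narrows the results without affecting relevance ranking.
        params["fq"] = f"organization:{organization}"
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def summarise_results(payload):
    """Return the total match count and the titles on this page of results."""
    result = payload["result"]
    titles = [dataset["title"] for dataset in result["results"]]
    return result["count"], titles
```

The URL can be fetched with any HTTP client, and the decoded JSON passed to `summarise_results`.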

Next steps

The next step is to analyse whether the beta has been successful and whether CKAN is the right platform for data.govt.nz and, if so, to take the beta live as data.govt.nz. As mentioned, we are also building a roadmap for data.govt.nz and completing some analysis of the all of government data landscape to identify what is needed to support the broader open government data agenda.

Possible future ideas for the roadmap include:

Improved search based on the quality framework, user tagging and semantic keywords generated from the data itself.
Implementation of the Basic Technical Quality Framework.
Once the broader Basic Technical Quality Framework above is implemented, we would like to work with colleagues across government to integrate data quality frameworks for specialised data types (eg, statistical or spatial) so users interested in these datasets have both a basic confidence in the technical usefulness of the data as well as a domain specific indication of data quality.
Extended data hosting options for agencies, depending on demand and all of government requirements. This could include some spatial, unstructured and real-time data.
An “unpublished data” function for data users to discover securely hosted data they may wish to pursue, though these datasets are not publicly available for privacy or security reasons.
Improved analytics to report on API usage of datasets.
The ability for data publishers to share data models, vocabularies, ontologies or other dataset artifacts useful to data users.
The ability for data users to share artifacts back to a community hub (perhaps GitHub?), so that the vocabulary mappings, data-fixing code, data improvements and other artifacts they create can be shared with other data users and picked up by data publishers where appropriate.

If you have other ideas, whether you are a data publisher or a data user, please let us know through the beta user survey!

We are working closely with our colleagues across government to collaboratively establish a more data driven public service with data infrastructure you can rely on. We look forward to continuing this journey with you.

About data.govt.nz

Working with Open Data and Information Programme partners, Stats and government agencies, data.govt.nz is making open government data easier to find, publish and use. The new beta.data.govt.nz site, released in June 2016, will play an important role in gathering feedback from, and building a better user experience for, the open data community.

The data.govt.nz team are working with international counterparts throughout this beta phase, getting further insight into both the platform and best practices with open data release.