Accessing dataset metadata via API

Data.govt.nz's datasets, organisations and groups can be accessed and queried as JSON data through the metadata API; this page describes how.

The prefix for web resource endpoints is https://catalogue.data.govt.nz/api/action and they return responses in JSON format. You can retrieve dataset and resource metadata through these functions:

Resource function | Description
/package_show?id={package id} | Gets the metadata of a dataset and all of its data resources. The download link, last updated date, etc. of every data resource can be retrieved here.
/resource_show?id={resource id} | Gets the metadata of a specific data resource. The same information is included in the package_show response for the parent dataset, which lists all of its data resources. The resource's download link, last updated date, etc. can be retrieved here.
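As a rough illustration of how these endpoints fit together, the sketch below builds an action URL from the prefix above and unwraps the JSON envelope CKAN returns. The helper names (`build_action_url`, `unwrap`, `call_action`) are hypothetical and not part of CKAN or of this site; the sketch also assumes the action can be fetched with a plain HTTP GET, which CKAN allows for read-only actions.

```python
import json
import urllib.parse
import urllib.request

# Prefix for the web resource endpoints, as given on this page.
API_BASE = "https://catalogue.data.govt.nz/api/action"


def build_action_url(action, **params):
    """Return the URL for a CKAN action with query-string parameters."""
    url = f"{API_BASE}/{action}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def unwrap(response_dict):
    """Return the 'result' part of a CKAN response, or raise if it failed."""
    if not response_dict.get("success"):
        raise RuntimeError(f"CKAN call failed: {response_dict.get('error')}")
    return response_dict["result"]


def call_action(action, **params):
    """Fetch an action over HTTP GET and return its unwrapped result.

    Not called here; urlopen raises urllib.error.HTTPError if the server
    answers with an error status.
    """
    with urllib.request.urlopen(build_action_url(action, **params)) as response:
        return unwrap(json.loads(response.read().decode("utf-8")))
```

With these helpers, `call_action("package_show", id="new-zealand-public-sector-websites")` would return the dataset's metadata as a dictionary.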

CKAN provides full API documentation with further endpoints and actions that can be called.

Code examples to get started are provided below (feel free to suggest further code snippets and examples via a pull request).

Code Examples

Example #1: Get the metadata of a Dataset:

  • Identify the appropriate web resource method to use: package_show
  • Identify the dataset you want and retrieve its ID, for instance: new-zealand-public-sector-websites
  • Make an HTTP call to: https://catalogue.data.govt.nz/api/action/package_show?id=new-zealand-public-sector-websites. CKAN accepts a GET request for read-only actions such as this one when the ID is passed in the query string; a POST request with a JSON body also works.
  • You will receive a JSON response containing the metadata for this dataset. After parsing the JSON response into an object, check if the request was successful by ensuring that the response-json-object.success value evaluates to true before proceeding.

The following example shows how you can use Python to retrieve dataset information from the site:

#!/usr/bin/env python3
import json
import pprint
import urllib.request

# Use the json module to dump a dictionary to a JSON string for posting.
# urlopen needs the request body as bytes.
data = json.dumps({'id': 'new-zealand-public-sector-websites'}).encode('utf-8')

# Make the HTTP POST request, declaring the body as JSON.
request = urllib.request.Request(
    'https://catalogue.data.govt.nz/api/action/package_show',
    data=data,
    headers={'Content-Type': 'application/json'},
)
response = urllib.request.urlopen(request)
assert response.status == 200

# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.read())

# Check the contents of the response.
assert response_dict['success'] is True
result = response_dict['result']
pprint.pprint(result)
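Once the dataset metadata has been retrieved, the download link and last updated date of each data resource can be pulled out of the result. The following is a minimal sketch, assuming the standard CKAN resource fields `name`, `format`, `url` and `last_modified`; the function name `summarise_resources` and the sample dictionary are made up for illustration and are not real catalogue data.

```python
def summarise_resources(package):
    """Return name, format, download URL and last-modified date for
    each resource in a package_show result dictionary."""
    return [
        {
            "name": resource.get("name"),
            "format": resource.get("format"),
            "url": resource.get("url"),
            "last_modified": resource.get("last_modified"),
        }
        for resource in package.get("resources", [])
    ]


# Invented sample shaped like a package_show result (not real data).
sample_package = {
    "name": "example-dataset",
    "resources": [
        {
            "name": "Example CSV",
            "format": "CSV",
            "url": "https://example.org/data.csv",
            "last_modified": "2017-05-01T10:00:00",
        },
        # A resource with missing fields yields None for those fields.
        {"name": "Example link"},
    ],
}

for row in summarise_resources(sample_package):
    print(row["name"], row["format"], row["url"], row["last_modified"])
```

Using `.get()` rather than indexing keeps the sketch tolerant of resources that lack one of the fields.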

Example #2: Get the metadata of a Resource:

Note that retrieving the dataset metadata (as above) already includes metadata for all of its data resources.

  • Identify the appropriate web resource method to use: resource_show.
  • Identify the data resource you want and retrieve its ID, for instance: 4c5f6967-6c6d-4981-aa10-6b6790918cb5.
  • Create the URL call: https://catalogue.data.govt.nz/api/action/resource_show?id=4c5f6967-6c6d-4981-aa10-6b6790918cb5.
  • Make an HTTP request to the above URL (CKAN accepts GET for read-only actions such as this one, as well as POST) and you will receive a response. After parsing the JSON response into an object, check if the request was successful by ensuring that the response-json-object.success value evaluates to true before proceeding.
  • Refer to CKAN's API Documentation for more detailed information on how to execute the API calls.
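The steps above can be sketched in Python as follows. This is an illustration under assumptions rather than an official client: the function names are hypothetical, the request is made with HTTP GET, and the error branch assumes CKAN's usual behaviour of returning a JSON body alongside an HTTP error status.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

RESOURCE_SHOW = "https://catalogue.data.govt.nz/api/action/resource_show"


def resource_show_url(resource_id):
    """Build the resource_show URL for a given resource ID."""
    return RESOURCE_SHOW + "?" + urllib.parse.urlencode({"id": resource_id})


def parse_response(body):
    """Parse a CKAN JSON body and return its result, raising on failure."""
    payload = json.loads(body)
    if payload.get("success") is not True:
        raise ValueError(f"resource_show failed: {payload.get('error')}")
    return payload["result"]


def fetch_resource(resource_id):
    """Fetch one resource's metadata (performs a network request)."""
    try:
        with urllib.request.urlopen(resource_show_url(resource_id)) as response:
            return parse_response(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: the error response carries a JSON body with
        # success=false, so parse_response raises a descriptive ValueError.
        return parse_response(err.read().decode("utf-8"))
```

Calling `fetch_resource("4c5f6967-6c6d-4981-aa10-6b6790918cb5")` would return the metadata dictionary for that resource, or raise if the ID is unknown.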

How to make more advanced queries on the data.govt.nz CKAN API

The CKAN API leverages the Solr search query language to perform more complex searches against the metadata and data held in the data.govt.nz CKAN portal.

Example: Returning new datasets created between 2 dates

In this example we're going to return any datasets newly created since the migration of data.govt.nz from our old portal to the new CKAN-powered one.

  1. Firstly, the API endpoint to use is:
https://catalogue.data.govt.nz/api/3/action/package_search

Called without parameters, this matches everything in the catalogue but returns only the first 10 items, including all metadata for each record (i.e. all the resources, tags, etc.).
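To make the shape of a package_search response concrete, the sketch below reads the total match count and the names of the returned datasets from a result dictionary. It assumes the standard CKAN layout, in which `count` is the total number of matches and `results` holds only the current page; the sample dictionary is invented for illustration.

```python
def summarise_search(result):
    """Return (total matches, names on this page) from a
    package_search 'result' dictionary."""
    names = [dataset.get("name") for dataset in result.get("results", [])]
    return result.get("count", 0), names


# Invented sample: 272 matches in total, but only two returned on this page.
sample_result = {
    "count": 272,
    "results": [
        {"name": "example-dataset-one", "resources": [], "tags": []},
        {"name": "example-dataset-two", "resources": [], "tags": []},
    ],
}

total, names = summarise_search(sample_result)
print(f"{len(names)} of {total} datasets returned")
```

Comparing `len(names)` with `total` is a quick way to tell whether more pages remain to be fetched.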

  2. Next, let's refine this list using the Solr filter query (fq) parameter on the metadata_created property to get datasets created between 12 April 2017 and 30 June 2017 (date of migration until end of financial year for NZ Government).

https://catalogue.data.govt.nz/api/3/action/package_search?fq=metadata_created:[2017-04-12T00:00:00Z TO 2017-06-30T23:59:59.999Z]
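When building this URL in code, the spaces and brackets in the fq value need to be percent-encoded. Here is one possible sketch using only the Python standard library; the helper names `solr_timestamp` and `created_between_url` are hypothetical, and timestamps are formatted to whole seconds in UTC.

```python
from datetime import datetime, timezone
import urllib.parse

SEARCH_URL = "https://catalogue.data.govt.nz/api/3/action/package_search"


def solr_timestamp(moment):
    """Format a timezone-aware datetime as a UTC timestamp for Solr."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def created_between_url(start, end, rows=None):
    """Build a package_search URL filtered on metadata_created."""
    fq = f"metadata_created:[{solr_timestamp(start)} TO {solr_timestamp(end)}]"
    params = {"fq": fq}
    if rows is not None:
        params["rows"] = rows
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return SEARCH_URL + "?" + urllib.parse.urlencode(
        params, quote_via=urllib.parse.quote
    )


url = created_between_url(
    datetime(2017, 4, 12, tzinfo=timezone.utc),
    datetime(2017, 6, 30, 23, 59, 59, tzinfo=timezone.utc),
)
print(url)
```

Letting `urlencode` do the escaping avoids hand-editing `%20` sequences into the query string.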

  3. There should be 272 results, but we can only see 10. Let's expand the number of results using the rows parameter (note: you use a combination of the rows and start parameters to do paging in the CKAN API).

https://catalogue.data.govt.nz/api/3/action/package_search?fq=metadata_created:[2017-04-12T00:00:00Z%20TO%202017-06-30T23:59:59.999Z]&rows=500
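For result sets too large to request in one call, the rows and start parameters can be combined to page through the matches. The loop below is a sketch of that idea: `fetch_page` is a hypothetical callable that you would supply, taking `start` and `rows` and returning the package_search result dictionary for that page. The stand-in used here serves invented data so the paging logic can be followed without a network connection.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every dataset by requesting successive pages.

    fetch_page(start, rows) must return a package_search 'result'
    dictionary containing 'count' and 'results'.
    """
    datasets = []
    start = 0
    while True:
        result = fetch_page(start, page_size)
        datasets.extend(result["results"])
        start += page_size
        # Stop once past the total, or if a page comes back empty.
        if start >= result["count"] or not result["results"]:
            break
    return datasets


# Stand-in for the real API: 272 invented records served in slices.
fake_catalogue = [{"name": f"dataset-{i}"} for i in range(272)]


def fake_fetch_page(start, rows):
    return {
        "count": len(fake_catalogue),
        "results": fake_catalogue[start:start + rows],
    }


print(len(fetch_all(fake_fetch_page, page_size=100)))
```

In real use, `fetch_page` would request package_search with the same fq value on every call and pass `start` and `rows` through as query parameters.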

 

