This guidance note provides practical advice for agencies when selecting the formats for releasing public information and data for re-use in accordance with the NZ Government Open Access and Licensing framework (NZGOAL), as required by the 2011 Declaration on Open and Transparent Government. This replaces the January 2015 version.
Application of this advice will assist agencies to assess data readiness for re-use in line with the 5-star open data measure.
Cabinet has approved the following format principles.
“When licensing copyright works and releasing non-copyright material for re-use, agencies should:
When releasing works or material in proprietary formats, agencies should also release the works or material in open, non-proprietary formats”.
NZ Data and Information Management Principles
“Data and information released can be discovered, shared, used and re-used over time and through technology change. Copyright works are licensed for re-use and open access to and re-use of non-copyright materials is enabled, in accordance with the New Zealand Government Open Access and Licensing framework.
Data and information released in proprietary formats are also released in open, non-proprietary formats. Digital rights technologies are not imposed on materials made available for re-use”.
This table lays out common formats for releasing data for re-use. If you are considering releasing data for re-use in formats not listed below, you should consult the Open Government Data Programme. Note that document formats such as PDF and Word are not suitable formats for providing data for re-use.
Recommended formats for the release of open data
Format | Machine-readable for purposes of data re-use | Open standard | Best used for |
---|---|---|---|
JSON | Yes | RFC 7159, ECMA-404 | General data interchange; commonly used as part of a RESTful API service. |
Comma-Separated Values (CSV) | Yes | RFC 4180 | Tabular and statistical data |
Spreadsheets (XLSX, ODS) | Yes if laid out in CSV-like format (may be as a supplementary worksheet). Spreadsheets laid out for visual understanding require manipulation to be made machine readable. | ISO 29500 (XLSX), ISO 26300 (ODS) | Tabular and statistical data |
Spreadsheets (XLS) | Yes if laid out in CSV-like format (may be as a supplementary worksheet). Spreadsheets laid out for visual understanding require manipulation to be made machine readable. | Proprietary | Tabular and statistical data |
Hypertext Markup Language (HTML) | Yes, but additional formats should be provided for data. | W3C Recommendation | Web documents |
Extensible Markup Language (XML) | Yes | W3C Recommendation | Documents / data structures conforming to published schemas |
Resource Description Framework (RDF) and Linked RDF | Yes | Suite of W3C Recommendations | Any data |
iCalendar (iCal) | Yes | RFC 5545; widely supported | Sharing events and calendar-based information |
Open Geospatial Consortium standards (WFS, WCS, WPS, WMS, WMTS) | Yes | OGC Standard | All geospatial data |
Keyhole Markup Language (KML) and Geography Markup Language (GML) | Yes | OGC Standard | Geospatial data, but has limitations compared with other OGC standards; may be convenient for non-geospatial specialists. |
GeoPackage | Yes | OGC Standard | Sharing geospatial data, modern alternative to Shapefile. |
GeoJSON | Yes | Publicly developed, freely available specification. | Geospatial data, but has limitations compared with OGC standards; may be convenient for non-geospatial specialists. |
Shapefile (SHP) | Yes | Proprietary, but specification published and maintained by ESRI. | Geospatial data, but has limitations compared with OGC standards; may be convenient for non-geospatial specialists. |
Sensor Observation Service (SOS) | Yes | OGC Standard | Sensor data, generally associated with a geospatial location. |
CityGML | Yes | OGC Standard | Storage and exchange of virtual 3D city models. |
Always provide alternatives. Re-users have a range of needs, capabilities and tools at their disposal. Providing data in alternative formats or layouts facilitates broader opportunities for re-use.
Consider industry- or sector-specific formats. Many industries and sectors have specialised formats for data representation and interchange, often based on XML or JSON. Agencies should explore these with industry or sector groups before releasing specialised data, and use them where possible. Some examples of industry-specific formats are:
Providing data in the form of spreadsheets laid out to aid human comprehension is useful for many people, but generally requires laborious manipulation to be made machine readable and usable by software programmes and visualisation or analytical tools.
Human-friendly spreadsheets should always be accompanied by raw data in CSV format, or at the very least a worksheet containing all the raw data that underpins the spreadsheet, laid out CSV-style (one row of headings, complete rows of data cells and no visual formatting). A good example can be found in this spreadsheet from Treasury [XLS 348 KB] - go to the Raw data worksheet.
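The CSV-style layout described above can be checked mechanically before release. The following is a minimal sketch, not an official validation tool: the function name `check_csv_style` and the sample data are hypothetical, and it reads "complete rows" as meaning that every row has the same number of cells as the heading row.

```python
import csv
import io


def check_csv_style(text: str) -> list[str]:
    """Return a list of layout problems found in CSV text (empty if none found).

    Assumption: "CSV-style" means one heading row with a unique, non-blank
    name for every column, and every later record has the same cell count.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("heading row has a blank column name")
    if len(set(header)) != len(header):
        problems.append("heading row has duplicate column names")

    # Records are numbered from 1, so the first data record is record 2.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no} has {len(row)} cells, expected {len(header)}"
            )
    return problems


# Hypothetical sample data: a clean raw-data layout, and a layout with a
# title row above the headings and a short row, as is common in
# spreadsheets designed for reading.
GOOD = "region,year,value\nAuckland,2014,10\nOtago,2014,7\n"
BAD = "Survey results 2014,,\nregion,year,value\nAuckland,2014\n"

if __name__ == "__main__":
    print(check_csv_style(GOOD))
    print(check_csv_style(BAD))
```

A check like this only covers layout. It does not detect visual formatting inside a spreadsheet file, which is lost (and therefore harmless) once the worksheet is exported to CSV.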
If agencies need to release data in tab, tilde (~) or other delimited formats, it should be noted in descriptive text accompanying the release, and on data.govt.nz.
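Re-users who receive a tab- or tilde-delimited file can usually convert it themselves, provided the delimiter is documented. As an illustration only, the sketch below uses Python's standard `csv` module to rewrite delimited text as comma-separated text; the function name `convert_delimited` and the sample values are hypothetical.

```python
import csv
import io


def convert_delimited(text: str, delimiter: str) -> str:
    """Rewrite delimiter-separated text as comma-separated text.

    Fields that contain a comma are quoted by the csv writer, so values
    survive the change of delimiter unaltered.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    # CRLF line endings follow the convention described in RFC 4180.
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    tilde_text = "name~note\nA~x, y\n"  # hypothetical tilde-delimited input
    print(convert_delimited(tilde_text, "~"))
```

The conversion depends on knowing the delimiter in advance, which is why noting it in the accompanying descriptive text matters.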
Agencies should also consider providing readily-available query methods (such as JSON APIs) for commonly accessed data, to allow advanced users to search and retrieve a subset of the raw data in machine readable form as and when needed. APIs should be accompanied by thorough documentation and example implementations to facilitate their use.
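To illustrate what retrieving a subset of raw data in machine readable form might look like, the sketch below filters a small in-memory dataset and returns the matches as JSON. It is an assumption-laden example rather than a recommended API design: the `RECORDS` data, the `query` function and the shape of the response are all hypothetical, and a real service would sit behind a documented web endpoint.

```python
import json

# Hypothetical raw data; in practice this would come from the agency's
# own data store.
RECORDS = [
    {"region": "Auckland", "year": 2014, "value": 10},
    {"region": "Otago", "year": 2014, "value": 7},
    {"region": "Auckland", "year": 2015, "value": 12},
]


def query(records: list[dict], **filters) -> str:
    """Return the records matching every field=value filter, as JSON text."""
    matches = [
        record
        for record in records
        if all(record.get(field) == value for field, value in filters.items())
    ]
    return json.dumps({"count": len(matches), "results": matches}, indent=2)


if __name__ == "__main__":
    # Retrieve only the Auckland rows instead of the whole dataset.
    print(query(RECORDS, region="Auckland"))
```

Including a count alongside the results is one possible design choice; it lets a client confirm that it received the whole subset it asked for.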
Some users have rigorous requirements of geospatial data in order to ensure high degrees of accuracy over time, and need data in the form of OGC web services with ISO 19115 metadata. These formats support the development of robust spatial data infrastructures, local and national physical infrastructure, and surveying and geographical services.
Others, however, can benefit from simpler mechanisms such as KML and the Google Maps APIs, or from converting KML or Shapefiles for use on OpenStreetMap. These mechanisms are useful for people who may not be geospatial professionals but are using spatially-aware tools to develop services or products such as visualisations, simple mapping or real-time plotting services.
Where possible, geospatial data should be provided in alternative formats, via both web services and download, to support a range of uses.
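As a rough illustration of offering the same geospatial data in more than one format, the sketch below converts point placemarks from a KML document into a GeoJSON FeatureCollection using only the Python standard library. It is a simplified example under stated assumptions: it handles KML 2.2 `Point` placemarks only, the function name `kml_points_to_geojson` is hypothetical, and the sample placemark is invented for the example. Production conversions would normally use a dedicated geospatial library.

```python
import json
import xml.etree.ElementTree as ET

KML = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML}


def kml_points_to_geojson(kml_text: str) -> dict:
    """Convert KML Point placemarks to a GeoJSON FeatureCollection (as a dict)."""
    root = ET.fromstring(kml_text)
    features = []
    for placemark in root.iter(f"{{{KML}}}Placemark"):
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=NS)
        if coords is None:
            continue  # skip lines, polygons and other non-point placemarks
        # KML stores "lon,lat[,alt]"; GeoJSON positions are also lon, lat.
        lon, lat = (float(part) for part in coords.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=NS)
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"name": name},
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Invented sample document with a single point placemark.
SAMPLE_KML = (
    f'<kml xmlns="{KML}"><Document><Placemark>'
    "<name>Example point</name>"
    "<Point><coordinates>174.7762,-41.2865,0</coordinates></Point>"
    "</Placemark></Document></kml>"
)

if __name__ == "__main__":
    print(json.dumps(kml_points_to_geojson(SAMPLE_KML), indent=2))
```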
Datasets should be listed on data.govt.nz at the most granular level possible. For example, agencies publishing survey data as a collection of spreadsheets or CSV files should provide a description of each spreadsheet or file and list them individually on data.govt.nz.
For tabular and statistical data presented as spreadsheets of multiple worksheets but also containing a worksheet of raw data, the spreadsheet can be considered sufficiently granular to list on data.govt.nz.
Agencies publishing formatted spreadsheets without the accompanying raw data should include the raw data – in a CSV-like layout – underpinning all worksheets in the spreadsheet as an additional worksheet.
In all cases the metadata description for a record on data.govt.nz should be sufficiently detailed that users can understand what type of data they will find in the dataset, and have confidence that the data to be downloaded is the data they want.
Geospatial data is more easily discoverable when listed on data.govt.nz as individual layers, as the LINZ Data Service and others do, rather than as aggregated collections of data. Individual layers accessible in a range of formats and comprehensively described in metadata should be listed as individual entries on data.govt.nz.
In the simplest terms, an open format is one that has an open standard associated with it. An open standard is developed through a transparent, collaborative process, is fairly accessible at zero or low cost, and is mature and supported by the market.
Proprietary formats are formats designed to work only in the proprietary programmes that created them. High-value public data and information released for re-use should be released in open, non-proprietary formats. However, if a proprietary format is commonly used, the data may be released in that format as well as in a non-proprietary format.
Machine readable data is data that is designed to be consumed directly by computer programs (applications) without human intervention.
Last updated 7 August 2015