This report explores barriers currently preventing agencies from providing more open data that is comparable and interoperable.
New Zealand's Open Data Charter implementation plan currently focuses on the Charter principle of 'comparable and interoperable'. This principle was chosen as the initial focus because it is foundational to the other Charter principles.
Comparable and interoperable data is achieved through standardisation, consistent formats, and fully described metadata and documentation. Creating a data system where datasets can integrate and systems can communicate, with little effort, establishes a basis for the other principles to be more easily achieved.
To be most effective and useful, data should be easy to compare within and between sectors, across geographic locations, and over time. Data should also be presented in structured and standardised formats to support interoperability, traceability, and effective reuse.
Open Data Charter implementation plan
To encourage the creation of more comparable and interoperable data across the data system, we must first understand the existing barriers preventing progress and identify opportunities to resolve them.
A number of barriers were identified through engagement with New Zealand’s open data community, and through Stats NZ's experience encouraging agencies to open their data. These barriers need to be addressed to ensure agencies routinely release open data that is comparable and interoperable:
A lack of standardisation across the system leads to inconsistent, and therefore incompatible, data. Data consistency makes it easier to bring together data from different sources and enables comparative decision-making. Consistency covers data release formats, classifications, definitions, and taxonomies.
Many agencies focus on internal consistency and on ensuring data is fit for the purpose for which it was collected. This means, however, that agencies often use classifications that meet their own needs, with little consideration of future use by others, which limits the potential value of the data. Data published in a way that allows integration and clear comparisons can provide insights that go beyond the purpose for which that data was created.
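As a minimal sketch of why shared classifications matter, the Python below maps two agencies' local codes onto one common classification so their counts can be compared side by side. The agency names, codes, counts, and the `CONCORDANCE` table are all invented for illustration and do not come from any real agency or standard.

```python
# Hypothetical concordance: each agency's local codes mapped to one shared
# classification. All names, codes, and counts here are invented examples.
CONCORDANCE = {
    "agency_a": {"AKL": "Auckland", "WLG": "Wellington"},
    "agency_b": {"1": "Auckland", "2": "Wellington"},
}


def to_shared_codes(agency: str, rows: list[tuple[str, int]]) -> dict[str, int]:
    """Re-key an agency's (local_code, count) rows by the shared classification."""
    mapping = CONCORDANCE[agency]
    totals: dict[str, int] = {}
    for local_code, count in rows:
        shared = mapping[local_code]  # raises KeyError if a code is unmapped
        totals[shared] = totals.get(shared, 0) + count
    return totals


a = to_shared_codes("agency_a", [("AKL", 120), ("WLG", 45)])
b = to_shared_codes("agency_b", [("1", 80), ("2", 30)])

# Once both datasets use the same keys, they can be compared directly.
combined = {region: (a[region], b[region]) for region in a}
print(combined)
```

Without the concordance, the two datasets have no keys in common and cannot be joined; with it, integration becomes a simple lookup. Unmapped codes fail loudly here rather than being silently dropped, which is one possible design choice among several.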
Consistency is a particular issue when it comes to local-level data access. With a wide range of differences in the resources, capacity, and capability of data providers at the regional level, it is difficult to implement consistent standards. This makes comparing regional-level data more complicated, and the problem is compounded by the use of different geographical boundaries. For example, regional council boundaries do not align with District Health Board (DHB) boundaries.
Data needs to be described adequately and in a standardised way to help users to understand and interpret it correctly, and to help users know when data is fit for purpose. Currently, agencies capture metadata in different ways, with little consistency.
This also leads to poor discoverability – users don't know what open data is available, or what data could potentially be made open. Easily discoverable data requires good metadata, where datasets are well documented in a consistent and agreed way.
While most agree that the provision of metadata with published datasets is necessary, current metadata practices vary across agencies and there is often no standard in use, few data dictionaries, and little documentation. Having a set of core metadata requirements for published datasets would help to overcome this.
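To illustrate what a set of core metadata requirements could look like in practice, here is a minimal Python sketch that reports which required fields a dataset's metadata record is missing. The field list in `CORE_FIELDS` and the example record are assumptions made for illustration; they are not an agreed New Zealand standard.

```python
# Hypothetical core metadata fields, chosen for illustration only.
CORE_FIELDS = (
    "title",
    "description",
    "publisher",
    "licence",
    "last_updated",
    "data_dictionary",
)


def missing_core_metadata(record: dict) -> list[str]:
    """Return the core fields that are absent or blank in a metadata record."""
    missing = []
    for field in CORE_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented example record with one blank field and two absent fields.
example_record = {
    "title": "Regional population estimates",
    "description": "",
    "publisher": "Example Agency",
    "licence": "CC BY 4.0",
}

print(missing_core_metadata(example_record))
```

A check like this could run before publication so that gaps are caught by the provider rather than discovered by users.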
Some agencies with self-imposed quality standards fear reputational risk from publishing less-than-perfect datasets. They want to maintain a reputation for quality and reliability, and believe that data considered to be low quality may reflect badly on them or on government. This sacrifices the value that results when datasets are made open, and the opportunities for agencies to understand the full power of their data.
Some agencies are uncomfortable with not being able to fully anticipate possible uses, possible misuse or misinterpretation, or unforeseen consequences. As a result, there is reluctance to release some datasets.
The value of data is realised through its use. If data isn’t used due to low discoverability, capability, understanding, or interoperability, it won’t generate as much value. A key barrier to more comparable and interoperable data is not understanding users' needs. This happens when providers and users of data do not communicate enough to determine, first, what the user wants and, second, what they need for the data to be accessible to them.
The Third Wave of Open Data report discusses the shift from 'open by default' to 'publish with purpose'. At the heart of this concept is understanding what users need from those who hold the data, so that the data with the most value to those users can be prioritised for opening. This is also relevant in the case of non-use of data that has been made open.
Low use means the data is failing a user need, but it is important to understand the source of that failure. The data may not be of high interest, it may be in an undesirable format, or it may be hard to find. Alternatively, the user may not have the technical skills required to gain valuable insights from the data.
Third wave of open data report
There is a tension between machine-readable and human-readable formats. Availability of resources often determines which of these formats is prioritised. When resources are limited, APIs (Application Programming Interfaces) are seen as an efficient way of reaching the most people. However, some technical skills are required to use APIs, so this format does not suit everyone.
Communication with users needs to be ongoing to understand who APIs actually work for and what alternatives may be necessary. It may be that different formats or channels are necessary to meet the needs of different users. Determining a highest common format that serves the most users and enables all downstream uses may be a better option.
Another barrier to comparable and interoperable data lies in understanding who the key players are, and who can influence or implement change. Decision making and advocacy around open data tend to be delegated to roles with insufficient influence.
In addition, those who do hold the power to make change may not understand the importance of open data when they’re dealing with competing priorities. The importance of open data needs to be communicated in ways that resonate with Tier 1 and 2 leaders – the right understanding and support at high levels is essential to enable the right focus.
There are many barriers to a more comparable and interoperable data system across New Zealand. These barriers aren't new but continue to prevent agencies from providing and creating more open data.
While progress has been made in many of the areas mentioned, more still needs to be done to attain the streamlined system desired. Understanding and periodically reassessing these barriers is a key step in measuring the progress that has been made, and in aligning the barriers with planned work that can continue that progress.
A system-wide goal will take a system-wide effort – we must use our collective knowledge to prioritise which barriers are most important to focus on first, to create the most impact for both providers and users of data.
Understanding the barriers, prioritising them, and working together to remove them, will not only create a data system of comparable and interoperable data, but will lay the foundation for further opportunities to achieve the Open Data Charter principles.
Your feedback on this report is encouraged. Please share your ideas.
If you'd like more information, have a question, or want to provide feedback, email opendata@stats.govt.nz
The Open Data Institute (ODI) recently held a roundtable with organisations that steward open data. They discussed the impacts these organisations have had, and the challenges they have faced. Many of their challenges are similar to the barriers we identified. You can read the ODI blog for more information.
Roundtable with stewards of open data
Content last reviewed 23 December 2021.