How do you create a data dictionary?

Learn about the decisions you need to make before creating a data dictionary and the tools that might help. Explore examples of data dictionaries published by other government organisations. 

Plan the level of detail needed

Before you go about making a data dictionary for each specific dataset, you have a few things to think about:

  • the end goal - what are you trying to achieve?
  • the audience - who is going to use your data?
  • the user need - what do they need to know about your data to use it appropriately?

The answers to those questions will help you decide on the level of detail that you will need to include in your data dictionary.

Example dictionaries

We have divided our examples into three levels: no data dictionary, basic, and comprehensive. We made up these levels to show how different aims, audience needs, and data complexities can call for different levels of detail in your data dictionary.

No data dictionary

Some data doesn’t need detailed information to make it findable and useable. In these cases, there is no need for a data dictionary. For instance, the columns or content may be obvious to those who want to use the data.

The data about DOC huts published by the Department of Conservation is a good example. Their answers to the planning questions mentioned above might be:

  • the end goal - data about DOC hut locations are used by others
  • the audience - the wider public (low technical skill) to software developers (higher technical skill)
  • the user need - no extra information other than that already provided in the dataset.

DOC hut dataset

Basic data dictionary

The columns or values in your data could be hard to understand, but the data could be easy for your audience to find.

In these situations, you may only need a basic data dictionary. In that dictionary, you might include a description of the data, a definition of the column headers, and the codes used as values in the columns.
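
As a starting point, the three parts listed above (a description, column definitions, and codes) can be drafted from the data itself. The following is a minimal Python sketch, not a recommended tool: the function name `draft_dictionary`, the `max_codes` threshold, and the sample columns are all made up for illustration and do not come from any real dataset. It lists each column, leaves the description blank for a person to fill in, and records the distinct values of columns that look like code lists.

```python
import csv
import io


def draft_dictionary(csv_text, max_codes=10):
    """Return one draft dictionary row per column of the CSV text.

    Columns with at most `max_codes` distinct values are treated as
    coded columns and their values are listed; this threshold is an
    assumption, not a rule.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in values:
            if row[name] not in (None, ""):
                values[name].add(row[name])

    rows = []
    for name, seen in values.items():
        codes = sorted(seen) if len(seen) <= max_codes else []
        rows.append({
            "column": name,
            "description": "",  # to be written by someone who knows the data
            "codes": "; ".join(codes),
        })
    return rows


# Hypothetical sample data, invented for this example.
SAMPLE = """vehicle_type,fuel,engine_cc
CAR,P,1800
CAR,D,2200
VAN,D,2500
"""

for entry in draft_dictionary(SAMPLE, max_codes=2):
    print(entry)
```

A draft like this only saves typing. The descriptions, units, and the meaning of each code still need to be written by someone who understands the data.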

The motor vehicle registry open data dictionary, published by Waka Kotahi - NZTA, is a good example of a basic data dictionary. Their answers to the planning questions mentioned above might be:

  • the end goal - analysts confidently and reliably use the motor vehicle registry data
  • the audience - data analysts (medium to high technical skill)
  • the user need - a description of the columns, the units related to data in columns, the codes used and their meaning. 

The New Zealand Vehicle Fleet Open Data 
Motor vehicle registry open data dictionary [CSV 9 KB]

Comprehensive data dictionary

Basic data dictionaries are good in many situations. But for some datasets or situations they just aren't enough, for instance:

  • complex datasets that include variable transformation
  • when there is a high risk and consequence of others misunderstanding the dataset
  • if there are regular updates, with new columns, and changes to old mathematical methods and sampling techniques
  • when there are many datasets, about related and complex topics, that are hard to find. 

In these situations, you may need to publish comprehensive data dictionaries that describe every detail of the data. 

The data dictionary, published by Stats NZ, on the Consumers Price Index is a good example. In the context of this nationally significant dataset, the answers to the planning questions mentioned above might be:

  • the end goal - to ensure accurate, reliable, and trustworthy analysis of the Consumers Price Index to inform nationally significant decisions
  • the audience - data analysts, data modelers, and data scientists (high technical skill)
  • the user need - descriptions and codes for all columns and variables, sampling methods, mathematical transformations, a record of earlier forms of the data, related datasets, and concepts. 

Stats NZ Consumers Price Index data dictionary

Helpful tools and software

You can make a data dictionary using many software products. Some are free, while others must be paid for. Some are simple, easy to begin with, and will work well for basic dictionaries. Others are comprehensive and hard to master, but provide powerful benefits for those who have complex needs.

Data.govt.nz does not endorse any one product over another. But we wish to give you a good idea of what is out there and what might suit your needs.

Microsoft Office tools

If your data is easy to understand and only needs a simple data dictionary, Microsoft Office products might suit you. You can make a data dictionary in Microsoft Excel or Microsoft Word. The following two links provide good basic templates.

Data dictionary template for Microsoft Excel [CSV, 1 KB]
Data dictionary basic template from the USDA

Pros:

  • You probably already use Microsoft Office products
  • Templates are simple and quick to fill in
  • You can share with others in formats familiar to most.

Cons:

  • These tools won’t scale well – as your data gets more complex, your data dictionaries will become inaccessible
  • The technology is proprietary – not everyone can use Microsoft Office products
  • The formats typically produced are not machine readable.
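
To illustrate what "machine readable" means here, the following minimal Python sketch stores a small data dictionary as JSON and then looks up the meaning of a code from it. The field names (`column`, `description`, `type`, `unit`, `codes`) and the sample entries are assumptions made up for this example. They do not follow any particular metadata standard.

```python
import json

# Hypothetical dictionary entries, invented for this example.
ENTRIES = [
    {
        "column": "fuel",
        "description": "Fuel type of the vehicle",
        "type": "string",
        "codes": {"P": "Petrol", "D": "Diesel"},
    },
    {
        "column": "engine_cc",
        "description": "Engine capacity",
        "type": "integer",
        "unit": "cubic centimetres",
    },
]


def to_json(entries):
    """Serialise the dictionary entries as indented JSON text."""
    return json.dumps({"fields": entries}, indent=2, ensure_ascii=False)


def code_meaning(json_text, column, code):
    """Look up what a code means in a column, or None if not defined."""
    for field in json.loads(json_text)["fields"]:
        if field["column"] == column:
            return field.get("codes", {}).get(code)
    return None


text = to_json(ENTRIES)
print(text)
print(code_meaning(text, "fuel", "D"))
```

Because the structure is predictable, other software can read the same file that people read, for example to label charts or check that a dataset only uses defined codes.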

Colectica for Excel

The next option is to use the Colectica plug-in for Excel. This option might suit those producing basic data dictionaries for some but not all of their datasets. 

Colectica for Excel
Using Colectica in Excel

Pros:

  • Free for a basic version
  • Integrates with Excel - make a data dictionary while you are using your data
  • Publishes the dictionaries in human- and machine-readable formats
  • Add extra fields that are relevant to your organisation.

Cons:

  • Your IT environment must allow an add-in to be installed in Excel
  • The add-in will be installed in everyone’s version of Excel.

Dataverse or Colectica (full license)

For more detailed data dictionaries, you can try Dataverse or the full version of Colectica. These tools are powerful because they standardise the metadata you include, link concepts or variables across datasets, populate data dictionaries, and improve findability.

Colectica products
Dataverse

Pros:

  • Integrates across multiple datasets, teams, and organisations
  • Captures metadata from the start to the finish of the data lifecycle - DDI lifecycle
  • Provides a pre-built portal if you pay extra
  • Contains all features required to support your complete metadata collection
  • Publishes the dictionaries in human- and machine-readable formats
  • Maintains international best standards.

Cons:

  • Unintuitive and complex - takes a long time to learn
  • Difficult to use if you don’t use it all the time.

Contact us

If you’d like more information, have a question, or want to provide feedback, email datalead@stats.govt.nz.

Content last reviewed 11 January 2021.
