# UNESCO Data Quality Standards and Guidelines

### Introduction

This guide provides comprehensive standards for data quality management within UNESCO organizations. It aims to harmonize data practices across all UNESCO projects and to support data managers and project managers, regardless of their technical expertise. System owners and data owners should, wherever possible, consider relevant international standards for data elements; the standards bodies concerned are listed under [Data Standards Organizations](#data-standards-organizations). The official UNESCO data catalog, which follows the DCAT standard, is available at [https://data.unesco.org](https://data.unesco.org/). All UNESCO-related data catalogs must adhere to these same standards.

### **Official UNESCO Policy: Data Governance, Stewardship, and Compliance Standards**

**UNESCO’s Data Stewardship and Format Standards**, set out in the Administrative Manual, are the organization’s validated official internal policies, endorsed through rigorous governance. These standards have undergone comprehensive review by the Internal Oversight Service (IOS), the Data Protection Officer (DPO), and Legal Affairs, with valuable contributions from the Web Communication team (CPE). Coordinated by Digital Business Solutions (DBS) and developed in consultation with UNESCO’s Sectors, these policies ensure the integrity, interoperability, and accessibility of data across all projects. They provide a robust framework for data managers and system owners to harmonize data quality, adhere to global standards, and align with UNESCO’s mission of fostering transparency and knowledge sharing. For detailed guidelines, refer to the Administrative Manual - Data Policies (Section 9.6A).

* [9.6 Data Stewardship and Format Standards](https://unesco.sharepoint.com/:w:/r/sites/ADM-Manual/_layouts/15/Doc.aspx?sourcedoc=%7B8F7DCA14-193D-4E40-8CC7-BF992DE50E63%7D\&file=9.6%20Data%20Stewardship%20and%20Format%20Standard.docx\&action=default\&mobileredirect=true)
  * [9.6A Data Stewardship and Format Standards - Technical Compliance](https://unesco.sharepoint.com/:w:/r/sites/ADM-Manual/_layouts/15/Doc.aspx?sourcedoc=%7BCC045437-B52A-4741-B954-D3D2D5A811D8%7D\&file=9.6A%20Data%20Stewardship%20and%20Format%20Standards%20-%20Technical%20Compliance.docx\&action=default\&mobileredirect=true)
* [Communication Webpage](https://dataviz.unes.co/data-policies/)

### Summary

* [Data Quality](#data-quality)
  * [Data Governance](#data-governance)
  * [Data Standards Organizations](#data-standards-organizations)
* [File Format Standards](#file-format-standards)
  * [CSV Requirements](#csv-requirements)
  * [Dataset ID Format](#dataset-id-format)
  * [File Naming Conventions](#file-naming-conventions)
  * [Supported File Types](#supported-file-types)
* [Metadata Standards](#metadata-standards)
  * [Why Use Standard Metadata?](#why-use-standard-metadata)
  * [Metadata Fields and Requirements](#metadata-fields-and-requirements)
  * [Recommended Metadata Standards](#recommended-metadata-standards)
  * [Additional Guidelines](#additional-guidelines)
* [Data Types and Formatting](#data-types-and-formatting)
  * [Basic Data Types](#basic-data-types)
  * [Arrays and Lists](#arrays-and-lists)
  * [Dates](#dates)
  * [Country and Language Codes](#country-and-language-codes)
  * [JSON Table](#json-table)
  * [URLs](#urls)
  * [Phone Numbers](#phone-numbers)
  * [Geographic Data](#geographic-data)
  * [HTML Content](#html-content)
  * [Boolean Fields](#boolean-fields)
  * [Float / Decimal Numbers](#float-decimal-numbers)
* [Data Models and Common Standards](#data-models-and-common-standards)
  * [UUID and ID Management](#uuid-and-id-management)
  * [Reference Datasets, SSOT (Single Source of Truth)](#reference-datasets-ssot-single-source-of-truth)
  * [Common Fields](#common-fields)
  * [Multilingual Support](#multilingual-support)
  * [Translation Guidelines](#translation-guidelines)
  * [Thematic Standards](#thematic-standards)

### Data Quality

Data quality is a critical aspect of managing data within UNESCO organizations. Ensuring high-quality data involves adherence to international standards, consistent data formats, and robust validation methods. The following principles guide data quality at UNESCO:

1. Accuracy: Data must be correct, reliable, and free from errors.
2. Consistency: Data should be consistent across all datasets and applications.
3. Completeness: All necessary data elements should be present and not contain empty values unless explicitly allowed.
4. Timeliness: Data should be up-to-date and regularly reviewed.
5. Accessibility: Data must be easily accessible to authorized users while maintaining security and privacy standards.
6. Compliance: All data should comply with international and UNESCO-specific standards.

#### Data Governance

System owners and data managers should implement data governance practices to maintain data quality over time. This includes setting up regular data validation processes, ensuring metadata is updated, and implementing access controls.

#### Data Standards Organizations

UNESCO aligns with international data standards to ensure data consistency, interoperability, and quality across its projects. The following organizations provide essential standards that UNESCO projects must adhere to:

| Organization | Website                                                    | Focus Area                                                                        |
| ------------ | ---------------------------------------------------------- | --------------------------------------------------------------------------------- |
| ISO          | [iso.org](http://www.iso.org/iso/catalogue_ics)            | International standards across all domains, including data management and quality |
| UN/CEFACT    | [unece.org/cefact](http://www.unece.org/cefact/about.html) | Trade facilitation and electronic business standards                              |
| W3C          | [w3.org](https://www.w3.org/)                              | Web standards, including RDF and DCAT metadata standards                          |
| IETF         | [ietf.org](https://www.ietf.org/)                          | Internet protocols and standards, including JSON and data exchange formats        |
| Schema.org   | [schema.org](https://schema.org/)                          | Structured data for the internet, including metadata formats and taxonomies       |

These organizations provide foundational standards, including ISO 8601 for date formatting, ISO 3166-1 for country codes, and ISO 639-1 for language codes, which are all recommended for UNESCO datasets.

### File Format Standards

UNESCO datasets should use open, widely accepted file formats to facilitate data sharing, interoperability, and long-term accessibility. The primary format for tabular data is CSV (Comma-Separated Values), adhering to the UNIX open-source CSV standard.

#### CSV Requirements

| Requirement    | Specification              | Example                  |
| -------------- | -------------------------- | ------------------------ |
| Encoding       | UTF-8                      | -                        |
| Separator      | Comma (,)                  | `value1,value2,value3`   |
| Text Delimiter | Double quotes (")          | `"text,with,commas"`     |
| Empty Values   | Empty string               | `""`                     |
| Column Names   | Lowercase with underscores | `first_name, birth_date` |

* Empty Values: Do not use "N/A" or other placeholder text. Leave fields empty (`""`) if no value is available.
* Special Characters: Avoid special characters in column names and data values unless necessary.
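The rules above can be checked automatically. The following is a minimal sketch using only the Python standard library; the function name `check_csv` and the set of placeholder strings it rejects are illustrative assumptions, not part of an official UNESCO tool.

```python
import csv
import io
import re

# Assumption: "lowercase with underscores" means letters, digits and underscores,
# starting with a letter (e.g. first_name, birth_date).
COLUMN_NAME = re.compile(r"[a-z][a-z0-9_]*")

# Hypothetical list of placeholder values that should be left empty instead.
PLACEHOLDERS = {"n/a", "na", "null", "none", "-"}


def check_csv(data: bytes) -> list[str]:
    """Return a list of rule violations found in a CSV file's raw bytes."""
    try:
        text = data.decode("utf-8")  # Encoding rule: the file must be valid UTF-8
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    # Separator rule: comma; text delimiter rule: double quotes.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    header = next(reader, [])
    problems = [
        f"column name not lowercase snake_case: {name!r}"
        for name in header
        if not COLUMN_NAME.fullmatch(name)
    ]
    for row in reader:
        for name, value in zip(header, row):
            if value.strip().lower() in PLACEHOLDERS:
                problems.append(f"line {reader.line_num}, {name}: placeholder {value!r}")
    return problems
```

To check a file on disk, pass its raw bytes, for example `check_csv(open("data_export.csv", "rb").read())`. Working on bytes rather than decoded text lets the encoding rule be checked too.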

#### Dataset ID Format

Each dataset should include a unique identifier following specific formatting rules to maintain consistency:

| Unit Length | Format   | Example  | Notes                           |
| ----------- | -------- | -------- | ------------------------------- |
| 2 letters   | `XX####` | `DC0001` | 4 digits, for short unit codes  |
| 3 letters   | `XXX###` | `DCE001` | 3 digits, for longer unit codes |

ID Naming Convention: The ID should reflect the unit or department associated with the dataset. Before assigning a new ID, check existing datasets to avoid duplicates.
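A minimal sketch of how the ID format could be validated and the next ID proposed for a unit. It assumes unit codes use uppercase letters (as in the examples) and that a new ID takes the number after the highest one already used; the function names are hypothetical.

```python
import re

# Two letters + four digits (DC0001) or three letters + three digits (DCE001).
DATASET_ID = re.compile(r"(?:[A-Z]{2}\d{4}|[A-Z]{3}\d{3})")


def is_valid_dataset_id(dataset_id: str) -> bool:
    return DATASET_ID.fullmatch(dataset_id) is not None


def next_dataset_id(unit: str, existing: set[str]) -> str:
    """Propose the next ID for a unit, one above the highest ID already used."""
    unit = unit.upper()
    if not re.fullmatch(r"[A-Z]{2,3}", unit):
        raise ValueError("unit code must be 2 or 3 letters")
    width = 6 - len(unit)  # 4 digits for 2-letter units, 3 digits for 3-letter units
    used = [
        int(i[len(unit):])
        for i in existing
        if is_valid_dataset_id(i) and i.startswith(unit) and i[len(unit):].isdigit()
    ]
    number = max(used, default=0) + 1
    if number >= 10 ** width:
        raise ValueError(f"no IDs left for unit {unit}")
    return f"{unit}{number:0{width}d}"
```

Taking the highest number plus one, rather than filling gaps, avoids reusing the ID of a retired dataset.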

#### File Naming Conventions

When storing files, especially reports and datasets, follow this naming pattern to ensure consistency:

```plaintext
[sector]_[entity]_[year]_[country]_[filename].pdf
```

Example: `culture_education_2024_FR_project_report.pdf`
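A minimal sketch for building and parsing names that follow this pattern. It assumes that sector and entity are single lowercase words, the year has four digits, the country is an uppercase ISO 3166-1 alpha-2 code, and the free-text filename part is lowercase with underscores; the function names and the `ext` parameter are hypothetical.

```python
import re

FILE_NAME = re.compile(
    r"(?P<sector>[a-z]+)_(?P<entity>[a-z]+)_(?P<year>\d{4})_"
    r"(?P<country>[A-Z]{2})_(?P<filename>[a-z0-9_]+)\.(?P<ext>[a-z0-9]+)"
)


def build_file_name(sector: str, entity: str, year: int, country: str,
                    filename: str, ext: str = "pdf") -> str:
    """Assemble [sector]_[entity]_[year]_[country]_[filename].[ext]."""
    # Turn free text such as "Project report" into "project_report".
    slug = re.sub(r"[^a-z0-9]+", "_", filename.lower()).strip("_")
    name = f"{sector.lower()}_{entity.lower()}_{year:04d}_{country.upper()}_{slug}.{ext.lower()}"
    if FILE_NAME.fullmatch(name) is None:
        raise ValueError(f"does not follow the naming pattern: {name}")
    return name


def parse_file_name(name: str) -> dict[str, str]:
    """Split a conforming file name back into its parts."""
    match = FILE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"does not follow the naming pattern: {name}")
    return match.groupdict()
```

Because the filename part may itself contain underscores, parsing relies on the fixed shape of the first four parts rather than on splitting at every underscore.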

#### Supported File Types

| File Type | Usage                          | Example             |
| --------- | ------------------------------ | ------------------- |
| CSV       | Tabular data                   | `data_export.csv`   |
| JSON      | Structured data, API exchanges | `metadata.json`     |
| PDF       | Reports and official documents | `annual_report.pdf` |
| GeoJSON   | Geographic data                | `map_data.geojson`  |


### Metadata Standards

Metadata is critical for describing datasets, enhancing discoverability, and enabling interoperability with other data systems. UNESCO adheres to internationally recognized metadata standards, including DCAT (Data Catalog Vocabulary) and Dublin Core.

#### Why Use Standard Metadata?

* Consistency: Standardized metadata ensures that datasets are described uniformly across projects.
* Interoperability: Enables integration with external systems and data catalogs.
* Improved Discoverability: Metadata enhances searchability and usability in data catalogs.
* RDF Support: Complies with semantic web standards, enabling linked data applications.

#### Metadata Fields and Requirements

| Field Category | Required Fields                     | Format         | Example                                               |
| -------------- | ----------------------------------- | -------------- | ----------------------------------------------------- |
| Basic Info     | `title`, `description`              | Text           | `title: "UNESCO Report 2024"`                         |
| Classification | `terms`, `themes`, `categories`     | Array          | `["Education","Culture"]`                             |
| Contact        | `phone`, `email`, `website`         | Formatted text | `email: contact@unesco.org`                           |
| Personal Info  | `first_name`, `last_name`, `gender` | Text           | `first_name: "John"`                                  |
| Geographic     | `location`, `coordinates`           | GeoJSON, Text  | `{"type": "Point", "coordinates": [2.3522, 48.8566]}` |

#### Recommended Metadata Standards

1. DCAT (Data Catalog Vocabulary): Ideal for publishing data catalogs on the web. It supports integration with other public sector datasets.
   * More info: [DCAT Standard](https://www.w3.org/TR/vocab-dcat/) &#x20;
2. Dublin Core: Provides a simple and standard way to describe a wide range of resources, focusing on elements like `title`, `creator`, `subject`, and `date`.
   * More info: [Dublin Core Standard](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) &#x20;

#### Additional Guidelines

* Use controlled vocabularies wherever possible, such as the UNESCO Thesaurus ([vocabularies.unesco.org](https://vocabularies.unesco.org/browser/thesaurus/fr/)).
* Metadata should always be provided in English and, where relevant, in French or other UNESCO official languages.
* For multilingual support, use the suffix convention (`_en`, `_fr`, `_es`, etc.) in metadata field names.
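To illustrate the DCAT recommendation together with the suffix convention, here is a minimal Python sketch that turns a record using the common field names into a DCAT description in JSON-LD. The property choices (`dct:identifier` for `id`, `dcat:keyword` for `terms`, a vCard contact point for `email`) are assumptions about a reasonable mapping, not an official UNESCO profile. `themes` are left out because `dcat:theme` expects concept URIs, such as UNESCO Thesaurus entries, rather than free text.

```python
import json

# JSON-LD prefixes for the W3C DCAT, Dublin Core terms and vCard vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def localized(record: dict, field: str) -> list[dict]:
    """Turn suffixed columns such as title_en / title_fr into language-tagged values."""
    prefix = field + "_"
    return [
        {"@value": value, "@language": key[len(prefix):]}
        for key, value in sorted(record.items())
        if key.startswith(prefix) and len(key) == len(prefix) + 2 and value
    ]


def to_dcat(record: dict) -> dict:
    """Map a record using the common field names to a DCAT dataset description."""
    dataset = {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:identifier": record["id"],
        # Prefer language-tagged values; fall back to an unsuffixed field.
        "dct:title": localized(record, "title") or record.get("title", ""),
        "dct:description": localized(record, "description") or record.get("description", ""),
        "dcat:keyword": record.get("terms", []),
    }
    if record.get("email"):
        dataset["dcat:contactPoint"] = {
            "@type": "vcard:Kind",
            "vcard:hasEmail": {"@id": "mailto:" + record["email"]},
        }
    return dataset
```

The result can be serialized with `json.dumps(to_dcat(record), ensure_ascii=False, indent=2)` for publication in a catalog.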


### Data Types and Formatting

Ensuring consistent data types and formatting across datasets is crucial for data quality and interoperability. UNESCO adheres to standardized formats for all data types, including arrays, dates, geographic data, and numeric values.

#### Basic Data Types

| Data Type       | Format                | Example                | Notes                                              |
| --------------- | --------------------- | ---------------------- | -------------------------------------------------- |
| Arrays/Lists    | `[item1,item2,item3]` | `["Paris","London"]`   | Double-quote every text item (JSON syntax)         |
| Dates           | ISO 8601              | `2024-02-20`           | Format: `YYYY-MM-DD`                               |
| Country Codes   | ISO 3166-1 alpha-2    | `FR`, `US`, `JP`       | Two-letter codes only                              |
| Language Codes  | ISO 639-1             | `en`, `fr`, `es`, `ru` | Two-letter language codes                          |
| Phone Numbers   | E.164                 | `+551112345678`        | Include the country code, no spaces or hyphens     |
| Boolean         | `true/false`          | `true`                 | Use lowercase only                                 |
| Decimal Numbers | `#.#`                 | `1234567890.12`        | Use a dot as a decimal separator                   |
| Gender          | Schema.org Standard   | `male`, `female`       | [GenderType Schema](https://schema.org/GenderType) |

#### Arrays and Lists

Arrays should be formatted in JSON style according to the [RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259) standard, enclosed in square brackets. Separate items with commas and enclose all text items in double quotes. Mixed data types are allowed: strings, numbers, booleans, null, objects, and nested arrays.

* Empty arrays: `[]`
* Example: `["Education","Culture","Science"]`
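A minimal sketch of reading and writing array cells with Python's `json` module. It assumes that a completely empty cell means "no value" (returned as `None`), while `[]` is an explicit empty array; the function names are hypothetical.

```python
import json


def parse_array_cell(cell: str):
    """Parse a cell holding a JSON array; an empty cell means no value (None)."""
    if cell.strip() == "":
        return None
    value = json.loads(cell)
    if not isinstance(value, list):
        raise ValueError(f"expected a JSON array, got {type(value).__name__}")
    return value


def format_array_cell(items: list) -> str:
    """Serialize a list as compact JSON (no spaces after commas), keeping non-ASCII text."""
    return json.dumps(items, ensure_ascii=False, separators=(",", ":"))
```

When such a cell is written with Python's `csv` module, the whole cell is wrapped in double quotes and the inner quotes are doubled, so `["Education","Culture"]` appears in the file as `"[""Education"",""Culture""]"`.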

#### Dates

* Standard: ISO 8601
* Format: `YYYY-MM-DD`
* Example: `2025-02-20`
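A minimal sketch for enforcing this format in Python; the function names are hypothetical. The regular expression matters: since Python 3.11, `date.fromisoformat` also accepts other ISO 8601 forms such as `20250220`, so it is not strict enough on its own.

```python
import re
from datetime import date, datetime

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(text: str) -> date:
    """Accept only YYYY-MM-DD strings that denote a real calendar date."""
    if not ISO_DATE.fullmatch(text):
        raise ValueError(f"not in YYYY-MM-DD format: {text!r}")
    return date.fromisoformat(text)  # also rejects impossible dates such as 2025-02-30


def to_iso_date(text: str, source_format: str) -> str:
    """Normalize a date written in a known local format, e.g. '%d/%m/%Y', to YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()
```

The source format has to be known: a string such as `02/03/2025` is ambiguous between day-first and month-first conventions, which is exactly why ISO 8601 is required.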

#### Country and Language Codes

* Use ISO 3166-1 alpha-2 for countries (e.g., `FR` for France, `US` for the United States).
* Use ISO 639-1 for language codes (e.g., `en` for English, `fr` for French).
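A minimal sketch that normalizes letter case and checks the shape of these codes; the function names are hypothetical. It only checks shape: confirming that a code actually exists requires the official ISO code lists or, for Member States, the PAX001 reference dataset.

```python
import re


def normalize_country_code(code: str) -> str:
    """Uppercase and shape-check an ISO 3166-1 alpha-2 code (e.g. 'fr' -> 'FR')."""
    code = code.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"not a two-letter country code: {code!r}")
    return code


def normalize_language_code(code: str) -> str:
    """Lowercase and shape-check an ISO 639-1 code (e.g. 'EN' -> 'en')."""
    code = code.strip().lower()
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"not a two-letter language code: {code!r}")
    return code
```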

#### JSON Table

When using JSON tables within datasets or metadata, ensure the structure is valid and follows the standard JSON syntax.

```json
{
  "name": "UNESCO Dataset",
  "date": "2025-02-20",
  "countries": ["FR", "US", "JP"]
}
```
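One pitfall when producing JSON from Python: by default `json.dumps` writes `NaN` and `Infinity`, and `json.loads` accepts them, although RFC 8259 JSON does not allow these values. A minimal sketch of stricter helpers (the function names are hypothetical):

```python
import json


def _reject_constant(name: str):
    # Called by json.loads for NaN, Infinity and -Infinity, which RFC 8259 does not allow.
    raise ValueError(f"non-standard JSON constant: {name}")


def load_strict_json(text: str):
    """Parse JSON text, rejecting the non-standard constants Python tolerates by default."""
    return json.loads(text, parse_constant=_reject_constant)


def dump_strict_json(value) -> str:
    """Serialize to standard JSON; allow_nan=False raises ValueError on NaN or infinity."""
    return json.dumps(value, ensure_ascii=False, allow_nan=False, indent=2)
```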

#### URLs

* URLs must include the protocol (`http` or `https`).
* For files, follow the naming conventions outlined in the File Format Standards section.

File name pattern: `[sector]_[entity]_[year]_[country]_[filename].pdf`
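A minimal sketch of the protocol rule using `urllib.parse` from the Python standard library; the function name is hypothetical. It checks that the scheme is `http` or `https` and that a host is present. It does not check that the URL is reachable.

```python
from urllib.parse import urlparse


def check_url(url: str) -> str:
    """Require an explicit http/https scheme and a host, e.g. https://www.unesco.org."""
    url = url.strip()
    parts = urlparse(url)
    if parts.scheme not in {"http", "https"}:
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    if not parts.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    return url
```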

#### Phone Numbers

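* Follow the E.164 international format: `+`, the country code and the national number, with no spaces or punctuation and at most 15 digits.
* For example, `+33123456789` for France (displayed as `+33 1 2345 6789`).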
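A minimal sketch that converts common display forms to the E.164 shape; the function name is hypothetical. It assumes a leading `00` is the international call prefix, which holds in many countries but not all. It checks only the shape. Full validation against national numbering plans needs a dedicated library such as Google's libphonenumber (Python port: `phonenumbers`).

```python
import re

E164 = re.compile(r"\+[1-9]\d{1,14}")


def to_e164(phone: str) -> str:
    """Strip display punctuation and check the result against the E.164 shape."""
    compact = re.sub(r"[\s\-.()]", "", phone)
    if compact.startswith("00"):
        compact = "+" + compact[2:]  # assumption: 00 is the international prefix
    if not E164.fullmatch(compact):
        raise ValueError(f"not an E.164 number (+country code, max 15 digits): {phone!r}")
    return compact
```

National formats without a country code, such as `01 23 45 67 89`, are rejected rather than guessed.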

#### Geographic Data

| Type              | Format                                          | Example                                               |
| ----------------- | ----------------------------------------------- | ----------------------------------------------------- |
| Coordinates       | `latitude,longitude`                            | `48.8566,2.3522`                                      |
| Points and areas  | GeoJSON, positions as `[longitude, latitude]`   | `{"type": "Point", "coordinates": [2.3522, 48.8566]}` |

#### HTML Content

For fields allowing HTML (e.g., descriptions), only the following tags are permitted:

* Titles: `<h2>`, `<h3>`, `<h4>`
* Lists: `<ul>`, `<li>`
* Links: `<a href="url">text</a>`
* Paragraphs: `<p>Text content</p>`
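A minimal sketch that reports tags or attributes outside this list, built on `html.parser` from the Python standard library. It assumes `href` on `<a>` is the only attribute allowed, and the names are hypothetical. It only reports problems and does not clean the HTML. For HTML that will be rendered, a maintained sanitizer library is the safer choice, since this check does not inspect `href` values (for example `javascript:` links).

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"h2", "h3", "h4", "ul", "li", "a", "p"}
ALLOWED_ATTRIBUTES = {"a": {"href"}}  # assumption: href is the only permitted attribute


class AllowedTagChecker(HTMLParser):
    """Collects every tag or attribute that falls outside the allowed set."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag <{tag}> not allowed")
            return
        for name, _value in attrs:
            if name not in ALLOWED_ATTRIBUTES.get(tag, set()):
                self.problems.append(f"attribute {name!r} not allowed on <{tag}>")


def check_html(fragment: str) -> list[str]:
    checker = AllowedTagChecker()
    checker.feed(fragment)
    checker.close()
    return checker.problems
```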

#### Boolean Fields

* Acceptable values are `true` and `false` only.
* Always use lowercase to maintain consistency.
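A common source of inconsistent booleans is that Python's `str(True)` produces `True`, with a capital letter. The following minimal sketch of strict helpers writes and reads only the lowercase forms. The function names are hypothetical.

```python
def parse_boolean(text: str) -> bool:
    """Accept exactly 'true' or 'false'; anything else (True, yes, 1, ...) is rejected."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"boolean must be 'true' or 'false', got {text!r}")


def format_boolean(value: bool) -> str:
    # str(True) gives 'True', so write the lowercase form explicitly.
    return "true" if value else "false"
```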

#### Float / Decimal Numbers

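UNESCO uses a raw format for numeric values: no thousands separators, a single dot as the decimal separator, and two digits after it. Localized display formats are shown for comparison: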

| Format Type | Example             |
| ----------- | ------------------- |
| Raw         | `1234567890.12`     |
| English     | `1,234,567,890.123` |
| French      | `1 234 567 890,123` |
| Italian     | `1.234.567.890,123` |
| German      | `1.234.567.890,123` |

For data processing, always use the raw format to avoid localization issues.
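A minimal sketch using Python's `decimal` module, which avoids binary floating-point rounding surprises. The rounding mode (`ROUND_HALF_UP`) is an assumption, because the standard does not specify one. The function names are hypothetical.

```python
import re
from decimal import ROUND_HALF_UP, Decimal

RAW_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def to_raw(value) -> str:
    """Format a number in the raw style: no grouping, dot separator, two decimals."""
    # str() first, so a float such as 2.675 is taken as written, not as its binary value.
    amount = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{amount:f}"


def parse_raw(text: str) -> Decimal:
    """Reject localized strings such as '1,234.5' or '1 234,5' instead of guessing."""
    if not RAW_NUMBER.fullmatch(text):
        raise ValueError(f"not a raw number: {text!r}")
    return Decimal(text)
```

Rejecting localized input is deliberate: `1.234` means one thousand two hundred thirty-four in Italian notation but one point two three four in the raw format, so guessing would silently corrupt values.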

### Data Models and Common Standards

Data models provide a structured approach to organizing data elements, ensuring consistency and interoperability across UNESCO datasets. By adopting standardized data models, UNESCO can facilitate data integration, reporting, and analysis.

#### UUID and ID Management

To ensure unique identification and traceability of datasets, UNESCO mandates the use of both a standard ID and a UUID (Version 4).

ID Requirements:

* Standard ID: Follow the existing format rules, e.g., `DCE001`, using 2 or 3 letters with the required number of digits.
* UUID V4: Automatically generated, globally unique identifier, stored alongside the standard ID.

Implementation Guidelines:

* Columns: Use dedicated columns named `id` (for standard ID) and `uuid` (for UUID V4).
* Automatic Generation: Generate the UUID V4 programmatically during data ingestion or dataset creation.
* Consistency: Ensure every row in the dataset has both `id` and `uuid` populated.

Example:

```json
{
  "id": "DCE001",
  "uuid": "550e8400-e29b-41d4-a716-446655440000"
}
```
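A minimal sketch of the ingestion step using Python's standard `uuid` module. It assumes rows are dictionaries keyed by column name, and the function name is hypothetical. Existing UUIDs are kept and only missing ones are generated, so re-running ingestion does not change identifiers and traceability is preserved.

```python
import uuid


def ensure_ids(rows: list[dict]) -> list[dict]:
    """Check that each row has an `id`, keep existing UUID v4 values, fill in missing ones."""
    for row in rows:
        if not row.get("id"):
            raise ValueError(f"row without a standard id: {row}")
        if row.get("uuid"):
            # uuid.UUID raises ValueError on malformed strings; .version is 4 for UUID v4.
            if uuid.UUID(row["uuid"]).version != 4:
                raise ValueError(f"uuid is not version 4: {row['uuid']}")
        else:
            row["uuid"] = str(uuid.uuid4())  # generated once at creation, then kept
    return rows
```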


#### Reference Datasets, SSOT (Single Source of Truth)

UNESCO maintains several Single Source of Truth (SSOT) datasets that should be referenced wherever applicable:

| Dataset ID | Content           | Usage                                            |
| ---------- | ----------------- | ------------------------------------------------ |
| PAX001     | Member States     | Includes country, regional, and electoral groups |
| LA0001     | Legal Instruments | Instrument IDs, names, and descriptions          |

#### Common Fields

The following fields are standardized across all datasets to ensure uniformity:

| Field Name  | Description                                                          | Example                                                |
| ----------- | -------------------------------------------------------------------- | ------------------------------------------------------ |
| title       | The name or title of the dataset or item                             | `"UNESCO Annual Report"`                               |
| terms       | Keywords or tags related to the content                              | `["Education","Culture"]`                              |
| themes      | Thematic areas covered by the dataset                                | `["Science","Heritage"]`                               |
| categories  | Broader categories for classification                                | `["Research","Policy"]`                                |
| description | A brief summary of the content                                       | `"A comprehensive report on global education trends."` |
| phone       | Contact phone number (E.164)                                         | `"+33123456789"`                                       |
| email       | Contact email address                                                | `"contact@unesco.org"`                                 |
| website     | Relevant website link                                                | `"https://www.unesco.org"`                             |
| first\_name | First name of a contact person                                       | `"John"`                                               |
| last\_name  | Last name of a contact person                                        | `"Doe"`                                                |
| gender      | Gender, using [Schema.org GenderType](https://schema.org/GenderType) | `"male"` or `"female"`                                 |

#### Multilingual Support

UNESCO datasets should support multilingual content for global accessibility. Follow the suffix convention for field names to indicate the language:

| Language | Code | Column Suffix | Example    |
| -------- | ---- | ------------- | ---------- |
| English  | `en` | `_en`         | `title_en` |
| French   | `fr` | `_fr`         | `title_fr` |
| Spanish  | `es` | `_es`         | `title_es` |
| Russian  | `ru` | `_ru`         | `title_ru` |
| Arabic   | `ar` | `_ar`         | `title_ar` |
| Chinese  | `zh` | `_zh`         | `title_zh` |

Example: A dataset title in multiple languages:

```json
{
  "title_en": "UNESCO Report",
  "title_fr": "Rapport de l'UNESCO",
  "title_es": "Informe de la UNESCO"
}
```

#### Translation Guidelines

* When adding translations, ensure they are accurate and culturally appropriate.
* Always include English as a baseline, with additional languages based on the dataset's intended audience.
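A minimal sketch of how the English baseline can serve as a fallback when a translation is missing, and how gaps can be listed for translators. It assumes records use the suffix convention above, and the function names are hypothetical.

```python
def localized_value(record: dict, field: str, language: str) -> str:
    """Return field_<language>, falling back to the English baseline field_en."""
    value = record.get(f"{field}_{language}") or record.get(f"{field}_en")
    if not value:
        raise KeyError(f"neither {field}_{language} nor the baseline {field}_en is set")
    return value


def missing_translations(record: dict, field: str, languages: list[str]) -> list[str]:
    """List the suffixed columns that are still empty for the given languages."""
    return [f"{field}_{lang}" for lang in languages if not record.get(f"{field}_{lang}")]
```

For example, with only `title_en`, `title_fr` and `title_es` filled in, asking for Arabic returns the English title, and the missing list for the six official languages is `title_ru`, `title_ar` and `title_zh`.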

#### Thematic Standards

UNESCO adopts specific thematic data standards to ensure consistency and alignment with international practices:

| Domain                               | Standard                   | Documentation Link                                                        |
| ------------------------------------ | -------------------------- | ------------------------------------------------------------------------- |
| Cultural Heritage                    | Europeana Data Model (EDM) | [EDM Documentation](https://pro.europeana.eu/page/edm-documentation)      |
| UN Projects                          | IATI Standard              | [IATI Standard](https://iatistandard.org/en/iati-standard/)               |
| Sustainable Development Goals (SDGs) | UN SDG Indicators          | [SDG Indicators](https://unstats.un.org/sdgs/indicators/indicators-list/) |

Key Guidelines:

* Cultural Heritage: Use EDM for metadata in cultural datasets to support integration with international databases.
* UN Projects: Apply the IATI Standard for transparency and consistency in project data.
* SDGs: Follow UN SDG Indicators for measuring progress on the 2030 Agenda for Sustainable Development.

