# Data Collection Guide

### 🎯 Purpose: From Survey to Visualization

This guide helps programme officers collect data that can be aggregated, analyzed, and visualized across UNESCO sectors. Following these practices ensures your data meets international statistical standards and contributes to organization-wide insights.

***

### 🏗️ Foundation: Structure for Statistical Quality

#### The Data Hierarchy Principle

Your survey should produce data that flows naturally from individual responses to global dashboards. Each question should generate data that can be:

* Aggregated (summed, averaged, or counted across countries)
* Compared (benchmarked against other regions or time periods)
* Visualized (displayed in charts, maps, or dashboards)

Example of Good Data Structure: Instead of asking "How is literacy in your country?" (produces text), ask:

* Adult literacy rate: \_\_\_% (produces comparable number)
* Data year: \_\_\_\_ (enables trend analysis)
* Source: \[Dropdown of standard sources] (ensures reliability)

#### Essential Components for Visualization-Ready Data

Every data point collected should include these five dimensions to meet statistical standards:

1. Geographic identifier - ISO2 country codes enable automatic map generation
2. Temporal marker - Consistent date formats allow trend analysis
3. Numerical value - Quantities that can be calculated and compared
4. Unit specification - Clear whether it's %, absolute numbers, or USD
5. Quality indicator - Confidence level for appropriate data treatment
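As a rough illustration, the five dimensions above could be carried together in a single record type. This is a minimal sketch, not a UNESCO schema: the class name `DataPoint`, its field names, and the sample value are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    """One observation with the five dimensions (hypothetical field names)."""
    country_iso2: str        # geographic identifier, e.g. "KE"
    reference_date: date     # temporal marker
    value: Optional[float]   # numerical value; None means "not reported"
    unit: str                # "%", "count", or "USD"
    quality: str             # confidence level, e.g. "high", "medium", "low"

    def to_row(self) -> dict:
        """Flatten to a dict, writing the date as an ISO 8601 string."""
        return {
            "country_iso2": self.country_iso2,
            "reference_date": self.reference_date.isoformat(),
            "value": self.value,
            "unit": self.unit,
            "quality": self.quality,
        }


# Placeholder observation; the number is illustrative, not a real statistic.
point = DataPoint("KE", date(2022, 12, 31), 82.6, "%", "high")
row = point.to_row()
```

Keeping the unit and quality flag next to the value means a dashboard never has to guess whether `82.6` is a percentage or a count.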

***

### 📋 Survey Design for Statistical Integrity

#### Question Architecture

Design questions that produce discrete, analyzable data points rather than narratives. The UNESCO Institute for Statistics (UIS) methodology recommends this distribution:

70% Quantitative Metrics

These form the backbone of cross-sector dashboards. Examples:

* Budget allocations (numerical with currency)
* Participant numbers (integer with demographic breakdowns)
* Coverage percentages (0-100 scale with geographic distribution)
* Timeline indicators (dates in ISO format: YYYY-MM-DD)

20% Coded Qualitative Data

Transform qualitative insights into analyzable categories:

* Implementation stages: Planning (1), Early (2), Mid (3), Advanced (4), Complete (5)
* Challenge types: Pre-defined list of 10 standard challenges with severity scale
* Success factors: Standardized taxonomy aligned with UNESCO's results framework
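The implementation-stage coding above maps naturally onto an integer enumeration, which keeps the codes countable and ordered. The sketch below assumes a hypothetical `ImplementationStage` type and made-up responses; only the five stage codes come from this guide.

```python
from collections import Counter
from enum import IntEnum


class ImplementationStage(IntEnum):
    """Stage codes as listed in this guide."""
    PLANNING = 1
    EARLY = 2
    MID = 3
    ADVANCED = 4
    COMPLETE = 5


# Hypothetical coded answers from five programmes.
responses = [1, 3, 3, 5, 4]

# Converting to the enum rejects any code outside 1-5 with a ValueError.
stages = [ImplementationStage(code) for code in responses]

# Coded answers can be counted and compared like any other number.
counts = Counter(stage.name for stage in stages)
share_mid_or_later = sum(s >= ImplementationStage.MID for s in stages) / len(stages)
```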

10% Contextual Fields

Limited open text for nuances that numbers can't capture, but with structure:

* Word limits (250 words maximum)
* Specific prompts ("Describe the primary innovation in your approach")
* Optional fields that don't break analysis if left empty

#### 🔗 SDG Integration for Cross-Sector Analysis

Every indicator must map to at least one SDG target. This enables:

* Automatic contribution calculations for UNESCO's SDG reporting
* Cross-programme synergy identification
* Resource optimization insights

Structure SDG questions to capture:

* Primary contribution (single selection from SDG 1-17)
* Secondary impacts (multiple selection)
* Specific targets (dropdown of official target numbers)
* Contribution level (Direct/Indirect/Enabling - with definitions)
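One possible way to hold these four SDG answers is sketched below. The class `SdgMapping` and its checks are assumptions for illustration. In particular, the rule that every target must belong to a selected goal is not stated in this guide and is included only as an example of a cross-field check.

```python
from dataclasses import dataclass, field
from typing import List

CONTRIBUTION_LEVELS = {"Direct", "Indirect", "Enabling"}


@dataclass
class SdgMapping:
    """SDG answers for one indicator (hypothetical structure)."""
    primary_goal: int                                        # single selection, 1-17
    secondary_goals: List[int] = field(default_factory=list)  # multiple selection
    targets: List[str] = field(default_factory=list)          # e.g. "4.6"
    contribution_level: str = "Direct"

    def errors(self) -> List[str]:
        problems = []
        goals = [self.primary_goal, *self.secondary_goals]
        if any(not 1 <= g <= 17 for g in goals):
            problems.append("SDG goals must be between 1 and 17")
        if self.primary_goal in self.secondary_goals:
            problems.append("primary goal repeated as a secondary impact")
        if self.contribution_level not in CONTRIBUTION_LEVELS:
            problems.append("unknown contribution level")
        # Assumed rule: a target such as "4.6" must belong to a selected goal.
        for target in self.targets:
            goal_part = target.split(".")[0]
            if not goal_part.isdigit() or int(goal_part) not in goals:
                problems.append(f"target {target} does not match a selected goal")
        return problems
```

For example, `SdgMapping(4, [5], ["4.6"], "Direct").errors()` returns an empty list, while a mapping with goal 18 or an unlisted contribution level returns one message per problem.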

***

### 🛠️ Technical Standards for Data Quality

#### Data Validation Rules

For the underlying format definitions, refer to the [UNESCO Data Quality Standards and Guidelines](https://unesco.sharepoint.com/:fl:/g/contentstorage/CSP_8c5957d7-b8cd-438b-9c67-b43d2baa6a13/EYZjP0CIJXJPoqJAwbzvObcBSNyCc3q1j9kYwIgRZEDJxg?nav=cz0lMkZjb250ZW50c3RvcmFnZSUyRkNTUF84YzU5NTdkNy1iOGNkLTQzOGItOWM2Ny1iNDNkMmJhYTZhMTMmZD1iITExZFpqTTI0aTBPY1o3UTlLNnBxRXdTV2VWSlBkMTlDcE5QMTdFa0dMelNWRy1JbS1tSjdTYXozUzRlbXV2aGsmZj0wMUg2WFFSWjRHTU03VUJDQkZPSkgyRklTQVlHNk82T05YJmM9).

Implement these validation standards to ensure statistical reliability:

Range Validations

* Percentages: Must be between 0 and 100
* Distributions: Must sum to 100% (±1% for rounding)
* Dates: Cannot be future dates for historical data
* Budget figures: Positive numbers only, with maximum thresholds
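The range rules above can be sketched as small predicate functions. The function names and the `maximum` parameter are assumptions for the example, and the actual budget threshold would come from the programme, not from this code.

```python
from datetime import date
from typing import List


def valid_percentage(value: float) -> bool:
    """A percentage must lie between 0 and 100 inclusive."""
    return 0 <= value <= 100


def valid_distribution(shares: List[float], tolerance: float = 1.0) -> bool:
    """Shares must each be valid percentages and sum to 100 within the tolerance."""
    return all(valid_percentage(s) for s in shares) and abs(sum(shares) - 100) <= tolerance


def valid_historical_date(d: date, today: date) -> bool:
    """Historical data cannot carry a future date."""
    return d <= today


def valid_budget(amount: float, maximum: float) -> bool:
    """Budgets must be positive and no larger than a configured threshold."""
    return 0 < amount <= maximum
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the check reproducible when it is tested.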

Consistency Checks

* If total population is provided, subgroups cannot exceed it
* End dates must be after start dates
* Baseline values must precede target values
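Cross-field checks like these are usually run on the whole record, not on single answers. The sketch below assumes a hypothetical record layout (`total_population`, `subgroups`, `start_date`, `end_date`, `baseline_year`, `target_year`). It also reads "baseline must precede target" as a comparison of years, which is one interpretation of that rule.

```python
from datetime import date
from typing import Dict, List


def consistency_errors(record: Dict) -> List[str]:
    """Return one message per violated consistency rule (empty list if none)."""
    errors = []
    total = record.get("total_population")
    if total is not None:
        # Only checked when a total is provided, as the rule states.
        for name, size in record.get("subgroups", {}).items():
            if size > total:
                errors.append(f"subgroup '{name}' exceeds total population")
    if record["end_date"] <= record["start_date"]:
        errors.append("end date must be after start date")
    if record["baseline_year"] >= record["target_year"]:
        errors.append("baseline must precede target")
    return errors


# Hypothetical record used only to exercise the checks.
example = {
    "total_population": 1000,
    "subgroups": {"women": 520, "men": 480},
    "start_date": date(2023, 1, 1),
    "end_date": date(2023, 12, 31),
    "baseline_year": 2020,
    "target_year": 2025,
}
example_errors = consistency_errors(example)
```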

Completeness Requirements

* Core indicators: 100% required for submission
* Enhanced indicators: 80% required for "complete" status
* Context fields: Optional, but aim for more than 50% completion
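The completeness thresholds above could be applied as follows. This is a sketch under stated assumptions: the status labels `"blocked"` and `"submitted"` are invented for the example (only "complete" appears in this guide), and an answer counts as filled when it is neither `None` nor an empty string, so a reported `0` still counts.

```python
from typing import Dict, List


def completion_rate(answers: Dict, fields: List[str]) -> float:
    """Share of the given fields that have a non-empty answer."""
    if not fields:
        return 1.0
    filled = sum(1 for f in fields if answers.get(f) not in (None, ""))
    return filled / len(fields)


def submission_status(answers: Dict, core: List[str], enhanced: List[str]) -> str:
    """Apply the 100% core and 80% enhanced thresholds."""
    if completion_rate(answers, core) < 1.0:
        return "blocked"      # hypothetical label: core indicators missing
    if completion_rate(answers, enhanced) >= 0.8:
        return "complete"
    return "submitted"        # hypothetical label: accepted but not complete
```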

#### 📐 Standardization Through Reference Lists

All categorical data must use UNESCO's Single Source of Truth (SSOT) lists:

* Countries: ISO 3166-1 alpha-2 codes (ISO2) [Member States](https://data.unesco.org/explore/dataset/cou001/information/)
* Languages: ISO 639-2 codes
* Currencies: ISO 4217 three-letter codes
* Organizations: UNESCO's institutional taxonomy [UNESCO Thesaurus](https://data.unesco.org/explore/dataset/voc001/information/)
* Programme areas: Official UNESCO sector classifications

This standardization enables:

* Automatic data merging across surveys
* Consistent filtering in dashboards
* Accurate regional aggregations
* Valid cross-country comparisons
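To show why shared codes make merging automatic, the sketch below normalizes country codes against a reference set and then combines two surveys on that key. `ISO2_SAMPLE` is a tiny illustrative subset, not the SSOT list, and the function names, survey contents, and values are hypothetical placeholders.

```python
from typing import Dict, Set

# Tiny illustrative subset; a real survey would load the full reference list.
ISO2_SAMPLE: Set[str] = {"BR", "FR", "JP", "KE"}


def normalize_code(raw: str, allowed: Set[str]) -> str:
    """Trim and upper-case a code, rejecting anything outside the reference list."""
    code = raw.strip().upper()
    if code not in allowed:
        raise ValueError(f"'{raw}' is not in the reference list")
    return code


def merge_by_country(survey_a: Dict[str, dict], survey_b: Dict[str, dict]) -> Dict[str, dict]:
    """Combine two surveys into one record per normalized ISO2 code."""
    merged: Dict[str, dict] = {}
    for survey in (survey_a, survey_b):
        for raw_code, answers in survey.items():
            merged.setdefault(normalize_code(raw_code, ISO2_SAMPLE), {}).update(answers)
    return merged


# Placeholder answers; the numbers are not real statistics.
survey_one = {"fr": {"indicator_a": 99.0}}
survey_two = {" FR": {"indicator_b": 3}}
combined = merge_by_country(survey_one, survey_two)
```

A free-text answer such as "France" fails `normalize_code`, which is the point: the mismatch is caught at entry time and does not silently produce a second row in the dashboard.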

***

### 📈 Collection Strategy for High-Quality Datasets

#### Phased Approach for Different Capacities

Recognize that member states have varying statistical capabilities. Design your survey with progressive disclosure:

Tier 1 - Essential (All respondents)

Focus on 10-12 core indicators that even basic statistical offices can provide:

* National totals without disaggregation
* Annual data without quarterly breakdowns
* Primary programme data without sub-components

Tier 2 - Standard (Most respondents)

Add disaggregation and detail for countries with established systems:

* Gender breakdowns
* Urban/rural splits
* Age group distributions
* Quarterly or semi-annual data

Tier 3 - Advanced (Statistical institutes)

Comprehensive data for sophisticated analysis:

* Multiple disaggregation dimensions
* Time series data
* Confidence intervals
* Methodological documentation

#### Response Quality Optimization

To achieve datasets suitable for statistical analysis:

Pre-populate Known Values

* Previous year's data as baseline
* Country codes and regional classifications
* Standard conversion rates

Provide Calculation Assistance

* Embedded calculators for percentages
* Automatic currency conversion options
* Built-in data quality scores

Enable Collaborative Completion

* Save and share draft functionality
* Multiple authorized contributors
* Comment fields for internal notes

***

### ✅ Quality Assurance for Statistical Standards

#### Data Quality Framework

Apply the UN Statistical Quality Assurance Framework dimensions:

* Relevance: Every indicator links to policy questions and SDG targets
* Accuracy: Validation rules prevent impossible values
* Timeliness: Collection windows align with reporting cycles
* Accessibility: Data exports in standard formats (CSV, JSON, SDMX)
* Interpretability: Metadata explains collection methods and limitations
* Coherence: Consistent definitions across all UNESCO surveys

#### Pre-Launch Checklist

* ☑️ Test that data exports import correctly into Power BI
* ☑️ Verify that all percentage distributions sum to 100%
* ☑️ Confirm SDG mapping produces expected categorizations
* ☑️ Check that mobile data entry maintains validation rules, and decide whether an offline mode is needed
* ☑️ Validate that aggregation functions work with test data
* ☑️ Ensure missing data doesn't break calculations
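The last checklist item can be tested with a small aggregation helper that skips missing values instead of failing or treating them as zero. `safe_mean` is a hypothetical name, and the test values are placeholders.

```python
from statistics import mean
from typing import List, Optional


def safe_mean(values: List[Optional[float]]) -> Optional[float]:
    """Average the reported values; return None when nothing was reported."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# One country did not report: the average uses the two that did.
regional_average = safe_mean([10, None, 20])
# No country reported: the result is "missing", not 0.
empty_average = safe_mean([None, None])
```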

***

### 🎨 From Collection to Visualization

#### Data Structure for Dashboard Success

Design your data collection with the end visualization in mind:

Geographic Visualizations

* Include latitude/longitude for precise mapping
* Use standard geographic hierarchies (country > region > district)
* Collect both current and historical geographic data

Temporal Analysis

* Consistent date formats throughout
* Include data collection date AND reference period
* Enable year-over-year comparisons

Cross-Sector Comparisons

* Use standardized scales across programmes
* Include normalization factors (per capita, per GDP)
* Maintain consistent categorizations
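Normalization factors such as per capita values are simple ratios, but they need a defined behaviour when the value or the denominator is missing. A minimal sketch, assuming a hypothetical helper `per_capita` and placeholder figures:

```python
from typing import Optional


def per_capita(value: Optional[float], population: Optional[float]) -> Optional[float]:
    """Divide a total by population; return None if either input is missing or zero population."""
    if value is None or not population:
        return None
    return value / population


# Placeholder figures: a budget of 1,000,000 USD for 500,000 people.
spend_per_person = per_capita(1_000_000, 500_000)
```

Returning `None` instead of `0` keeps an unreported population from showing up on a chart as a genuine zero.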

#### Export Specifications

Configure your survey platform to produce:

* Raw data: Complete responses with all metadata
* Cleaned data: Validated and standardized values
* Aggregated data: Summary statistics by key dimensions
* Metadata: Data dictionary, collection methods, quality indicators

Standard format requirements:

* UTF-8 encoding for international characters
* ISO 8601 date formats (YYYY-MM-DD)
* Decimal points (not commas) for numbers
* Empty cells for missing data (not "N/A" or "0")
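The four format requirements above can be met with Python's standard `csv` module. This is a sketch only: `export_csv`, the column names, and the sample row are assumptions, and a survey platform may offer its own export settings instead.

```python
import csv
import io
from datetime import date
from typing import Dict, List


def export_csv(rows: List[Dict], fieldnames: List[str]) -> bytes:
    """Serialize rows to UTF-8 CSV with ISO 8601 dates and empty cells for missing data."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        clean = {}
        for key in fieldnames:
            value = row.get(key)
            if value is None:
                clean[key] = ""                  # empty cell, not "N/A" or 0
            elif isinstance(value, date):
                clean[key] = value.isoformat()   # YYYY-MM-DD
            else:
                clean[key] = value               # Python writes floats with a decimal point
        writer.writerow(clean)
    return buffer.getvalue().encode("utf-8")


# Hypothetical row with a missing value and a non-ASCII note.
fields = ["country_iso2", "reference_date", "value", "unit", "note"]
sample = [{
    "country_iso2": "CI",
    "reference_date": date(2023, 6, 30),
    "value": None,
    "unit": "%",
    "note": "Côte d'Ivoire",
}]
csv_bytes = export_csv(sample, fields)
```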

