Data Methodology
How Apiar Data selects, collects, normalises, and maintains the institutional data behind our signal coverage.
Source Selection
Data quality begins at the source. Our source selection criteria prioritise institutional provenance, methodological transparency, and licence compatibility.
We treat provenance as a first-class attribute. Every number on the platform can be traced to its origin.
Primary sources include:
- Official statistical agencies: Bureau of Labor Statistics, Office for National Statistics, Eurostat, Statistics Canada, and equivalents in 60+ countries
- International organisations: IMF, World Bank, OECD, BIS, UN Statistics Division
- Central banks: Federal Reserve, European Central Bank, Bank of England, and other major central banks
- Academic repositories: FRED (Federal Reserve Bank of St. Louis), CEPR, and peer-reviewed data publications
Collection Process
Automated Ingestion
Structured data from agency APIs, statistical portals, and machine-readable publications is collected via automated pipelines that run on defined schedules.
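The scheduling logic can be pictured as a registry of pipelines, each with its own cadence. This is a minimal sketch under assumed names (`Pipeline`, `due_pipelines`, and the example sources are illustrative, not Apiar Data's actual implementation):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Pipeline:
    source: str
    interval: timedelta   # how often the source publishes
    last_run: datetime    # last successful ingestion

def due_pipelines(pipelines: list[Pipeline], now: datetime) -> list[Pipeline]:
    """Return the pipelines whose next scheduled run has passed."""
    return [p for p in pipelines if now - p.last_run >= p.interval]

pipelines = [
    Pipeline("BLS CPI", timedelta(days=30), datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Pipeline("ECB rates", timedelta(days=1), datetime(2024, 1, 30, tzinfo=timezone.utc)),
]
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
print([p.source for p in due_pipelines(pipelines, now)])  # ['BLS CPI', 'ECB rates']
```

A production scheduler would also handle retries and release-calendar offsets; the point here is only that each source carries its own cadence.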
Manual Curation
Data from sources that lack structured APIs — PDFs, web tables, irregular releases — is ingested through semi-automated processes with human verification.
All ingested data is checked against schema definitions before entering the platform. Validation failures trigger alerts for manual review; data that fails validation is held in quarantine and not published until the issue is resolved.
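The validate-then-quarantine flow can be sketched as follows. Field names (`value`, `date`) and the checks themselves are illustrative assumptions, not the platform's actual schema:

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors (empty list = record is valid)."""
    errors = []
    if not isinstance(record.get("value"), (int, float)):
        errors.append("value must be numeric")
    if "date" not in record:
        errors.append("missing date")
    return errors

def ingest(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into publishable data and quarantined failures."""
    published, quarantined = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            # held for manual review; never enters the published dataset
            quarantined.append({"record": record, "errors": errors})
        else:
            published.append(record)
    return published, quarantined
```

Quarantined records keep their error list attached, so a reviewer sees exactly which check failed.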
Data Normalisation
Raw data from different sources arrives in incompatible formats: different time granularities, units of measurement, calendar conventions, and geographic classifications. Normalisation is the process of resolving these incompatibilities.
Our normalisation pipeline applies the following transformations:
- Date standardisation: All time stamps are converted to ISO 8601 format with explicit UTC offset
- Unit harmonisation: Base units are preserved; derived units (index values, ratios) are clearly labelled
- Geographic coding: Geographic identifiers are mapped to ISO 3166 country codes and UN M49 region codes
- Classification alignment: Where applicable, industry and product classifications are mapped to standard schemas (ISIC, HS, CPA)
Normalisation is applied non-destructively. Original source values are always preserved alongside normalised values.
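The date and geographic transformations above, applied non-destructively, might look like this. The input format, field names, and the two-entry country table are assumptions for illustration (a real pipeline would carry the full ISO 3166 mapping):

```python
from datetime import datetime, timezone

# hypothetical lookup; the real mapping covers all ISO 3166 codes
COUNTRY_CODES = {"United Kingdom": "GB", "Germany": "DE"}

def normalise(record: dict) -> dict:
    """Attach normalised fields without overwriting the source values."""
    dt = datetime.strptime(record["date"], "%d/%m/%Y").replace(tzinfo=timezone.utc)
    return {
        **record,                                    # original values preserved
        "date_iso": dt.isoformat(),                  # ISO 8601 with explicit UTC offset
        "country_iso": COUNTRY_CODES[record["country"]],
    }
```

Because the normalised fields are added alongside the originals rather than replacing them, the source representation remains queryable after the transformation.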
Quality Assurance
We apply a multi-stage quality assurance process to all data before publication:
- Schema validation: Structural checks against expected data types, ranges, and formats
- Outlier detection: Statistical tests flag values that deviate significantly from historical series patterns
- Continuity checks: Sudden breaks in series values or coverage gaps trigger manual review
- Cross-source consistency: For series covered by multiple sources, automated checks compare values and flag significant divergence
- Manual review: Flagged series are reviewed by a team member before publication
Quality flags are displayed on dataset pages where relevant. Known issues, caveats, and data limitations are documented in the methodology notes for each dataset.
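Two of the automated stages above, outlier detection and continuity checking, can be sketched with simple statistics. The z-score threshold and the integer-period gap check are illustrative choices, not the platform's documented tests:

```python
from statistics import mean, stdev

def outlier_flags(series: list[float], z: float = 3.0) -> list[int]:
    """Indices whose z-score against the series exceeds the threshold."""
    m, s = mean(series), stdev(series)
    return [i for i, v in enumerate(series) if s and abs(v - m) / s > z]

def continuity_gaps(periods: list[int], expected_step: int = 1) -> list[tuple[int, int]]:
    """Pairs of consecutive periods whose spacing exceeds the expected step."""
    return [(a, b) for a, b in zip(periods, periods[1:]) if b - a > expected_step]
```

Anything either function flags would go to manual review rather than being dropped automatically, matching the process described above.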
Update Frequency
Update schedules are set based on source release cadence. Where source agencies publish advance release calendars, we configure ingestion pipelines to run shortly after the scheduled release time. For agencies without advance calendars, we monitor sources continuously.
The “last updated” timestamp on each dataset page reflects the most recent data point, not the last time we checked the source.
Handling Revisions
Statistical agencies regularly revise previously published data — for seasonal adjustment, methodological improvements, or corrected source data. Our approach to revisions is:
- Revisions are applied to the live series automatically when detected
- The revision date and nature of the change are logged in the series metadata
- For significant revisions, a note is added to the dataset page explaining the change
- Vintage series (point-in-time snapshots) are maintained for datasets where revision history matters for analysis
We do not backfill revisions silently. All changes to published data are logged and queryable via the API.
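The revision workflow above, update the live value, log the change, can be sketched like this. The series and log structures are hypothetical; only the behaviour (no silent backfill, every change recorded) comes from the text:

```python
from datetime import datetime, timezone

def apply_revision(series: dict, period: str, new_value: float, log: list) -> None:
    """Update the live series value and record the change in the revision log."""
    old_value = series["values"].get(period)
    if old_value == new_value:
        return  # nothing changed; nothing to log
    series["values"][period] = new_value
    log.append({
        "period": period,
        "old_value": old_value,
        "new_value": new_value,
        "revised_at": datetime.now(timezone.utc).isoformat(),
    })

series = {"id": "demo.cpi", "values": {"2024-01": 100.0}}
revision_log: list[dict] = []
apply_revision(series, "2024-01", 100.3, revision_log)
```

The log, not the series itself, is what an API revision query would read: the live value always reflects the latest revision, while the log preserves the history.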