IOM releases the Global Synthetic Dataset

For two years, Khadijetou, a victim of trafficking, was exploited, tortured, and deprived of his income and his family. His face and body bear the marks of daily abuse. Sibylle Desjardins / IOM © International Organization for Migration

IOM today releases the Global Synthetic Dataset, the largest publicly available source of individual-level data on human trafficking.

The dataset is made possible by innovative technology to protect the safety and privacy of victims and survivors. Developed in partnership with Microsoft Research, this dataset provides in-depth information to accelerate evidence-based policy in the fight against human trafficking. The dataset is available for visualization and download on the Counter Trafficking Data Collaborative (CTDC) website.

The Global Synthetic Dataset provides critical information on the socio-demographic profiles of victims, types of exploitation, and the trafficking process, including means of control used on victims. This data, updated in 2024, represents 20 years of assistance and hotline data – with contributions from IOM, Polaris, A21, RecollectiV, and the Portuguese Observatory on Trafficking in Human Beings (OTSH). This dataset combines information about more than 206,000 victims and survivors of trafficking identified across 190 countries and territories from 2002 to 2022.

This is the third synthetic dataset derived from the case records of victims of trafficking. It accurately preserves the statistical properties of the original case records while providing the guarantee of differential privacy. Differential privacy was first developed by Microsoft Research in 2006 and today represents the gold standard in privacy protection. The differential privacy approach to synthetic data generation provides quantifiable guarantees against privacy attacks, even across multiple data releases. The technology has enabled CTDC to share more data than it could in the past, and to conduct more robust research, while protecting privacy and civil liberties.
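For readers unfamiliar with the idea, the sketch below illustrates what a "quantifiable privacy guarantee" can look like in the simplest case: adding Laplace noise to category counts. It is not the method used to generate the Global Synthetic Dataset, which is documented in the open-source software mentioned below. The function names, the category labels, and the choice of the Laplace mechanism are all assumptions made for illustration only.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(records, categories, epsilon, rng):
    """Return a noisy count per category (illustrative Laplace mechanism).

    Assumptions: each record carries exactly one label, so adding or
    removing one record changes a single count by 1 (sensitivity 1).
    The category list is fixed in advance and does not depend on the data;
    labels outside that list are ignored.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_counts = Counter(records)
    scale = 1.0 / epsilon  # smaller epsilon -> more noise -> stronger privacy
    return {c: true_counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}


if __name__ == "__main__":
    # Hypothetical labels, not real case data.
    labels = ["A", "A", "B", "A", "C", "B"]
    rng = random.Random()
    print(noisy_counts(labels, ["A", "B", "C"], epsilon=1.0, rng=rng))
```

The parameter epsilon is the quantity that makes the guarantee measurable: under basic sequential composition, two releases made with budgets of epsilon1 and epsilon2 together satisfy differential privacy with a budget of epsilon1 + epsilon2, which is why the guarantee can be tracked across multiple releases. A production system would also need to address issues this sketch ignores, such as floating-point sampling artefacts.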

Further information on the approach is available through the open-source software and documentation on differential privacy via GitHub. Please find the related dataset, codebook, and data dictionary on the CTDC website. We also encourage you to check out the FAQs page for more information about the data.

This data release and supporting technology were made possible by the Tech Against Trafficking 2019 Accelerator Program, in which IOM worked with Microsoft, Amazon, BT, Salesforce, and the broader community to advance the data and technology foundations of the CTDC platform.

For more information, please contact:
IOM: Lorraine Wong at lwong@iom.int and Claire Galez-Davis at cgalez@iom.int
Microsoft: Darren Edge at darren.edge@microsoft.com

This data release is supported by the Ministry of Foreign Affairs (MFA) of the Netherlands through the Cooperation on Migration and Partnerships to Achieve Sustainable Solutions (COMPASS) initiative. The contents are the responsibility of the authors and do not necessarily reflect the views of the Dutch MFA.  

FAQ

Why should you use the Global Synthetic Dataset?

The Global Synthetic Dataset includes in-depth information on the socio-demographic profiles of trafficked persons, the types of exploitation, the means of control used on victims, and more. Such information can help you better understand trafficking as a crime and the needs of survivors. The dataset also provides information that can help protect vulnerable populations and inform counter-trafficking interventions.

What is the main objective of the Global Synthetic Dataset?

The Global Synthetic Dataset aims to provide comprehensive and privacy-protected information to advance research, policymaking, and intervention strategies in combating human trafficking worldwide.

How can the Global Synthetic Dataset support your work?

The Global Synthetic Dataset gives policymakers detailed insight into how trafficking affects their countries – at the level of individual victim case records – in ways that can help build understanding, inform policy decisions, and channel assistance and prevention resources more effectively. Moreover, the synthetic approach enables the release of data that were previously unsafe to publish. This allows stakeholders such as researchers and statisticians to analyse and explore information that was not available before.