Governments across the world came together in Marrakesh this past December to ratify a pact to improve cooperation on international migration. Among other objectives, the Global Compact for Migration seeks to use “accurate and disaggregated data as a basis for evidence-based policies.” How can machine learning technologies help with deeply polarizing societal issues like migration?
In early 2018, with support from IBM Corporate Citizenship and the Danish Ministry for Foreign Affairs, IBM and the Danish Refugee Council (DRC) embarked on a partnership aimed squarely at the need to better understand migration drivers and to provide evidence-based policy guidance for a range of stakeholders. At the recent THINK Copenhagen keynote, the Secretary General of the DRC, Christian Bach, presented the first results of this effort.
If we can predict migration and refugee flows we can prevent and improve the protection of people on the move. Great partnership with @IBM on predictive modelling. Presented first results at #THINK2018CPH! pic.twitter.com/x1JDyI8s6L
In this post, I’ll walk through the development of a machine learning system that provides strategic forecasts of mixed migration along with scenario analysis. Mixed migration refers to cross-border movements of people who are motivated by a multiplicity of factors, including refugees fleeing persecution and conflict, victims of trafficking, and people seeking better lives and opportunity. Such populations have a range of legal statuses, some of which are not reflected in official government statistics.
Understanding migration dynamics and drivers is inherently complex. Circumstances differ from person to person. The question “why did you decide to move?” is not straightforward for people to answer. However, to the extent that individual decisions reflect structural societal factors, the dynamics can be partially explained by aggregate measures. For instance, economic drivers for movement can be expected to be related to employment opportunities and therefore to macro indicators on employment. These challenges are compounded by limited data availability and coverage for specific indicators.
The forecasting system
We started by leveraging the 4MI monitoring program run by the DRC, through which thousands of migrants on the move are interviewed. Analysis of the survey data reveals high-level clusters of drivers for migration, ranging from lack of rights and other social services to economic necessity and conflict. These drivers are then mapped to quantitative indicators. Features derived from these indicators are fed to a model that generates forecasts along with confidence intervals (Figure 1). The system also generates context for each prediction by showing the specific drivers that contributed to the forecast.
Figure 1: Features derived from indicators are then fed to a model that generates forecasts along with confidence intervals
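To make the driver-to-indicator-to-feature step concrete, here is a minimal Python sketch. The post does not list the actual indicators or feature definitions, so the cluster names, indicator names, values and the three features (level, year-on-year change, three-year mean) are all assumptions chosen for illustration.

```python
from statistics import fmean

# Hypothetical mapping from survey-derived driver clusters to indicator names.
DRIVER_INDICATORS = {
    "economic_necessity": ["unemployment_rate", "gdp_per_capita"],
    "conflict": ["conflict_fatalities"],
}


def derive_features(series, year):
    """Build simple features from one indicator's yearly values.

    `series` maps year -> value; the features describe the state as of `year`.
    """
    current, previous = series[year], series[year - 1]
    window = [series[y] for y in range(year - 2, year + 1) if y in series]
    return {
        "level": current,
        "yoy_change": current - previous,
        "mean_3y": fmean(window),
    }


def build_feature_row(indicators, year):
    """Flatten the features of every mapped indicator into one model input row."""
    row = {}
    for driver, names in DRIVER_INDICATORS.items():
        for name in names:
            for feat, value in derive_features(indicators[name], year).items():
                row[f"{driver}.{name}.{feat}"] = value
    return row


# Made-up indicator values, for illustration only.
indicators = {
    "unemployment_rate": {2015: 5.0, 2016: 5.5, 2017: 6.0},
    "gdp_per_capita": {2015: 600.0, 2016: 650.0, 2017: 700.0},
    "conflict_fatalities": {2015: 100.0, 2016: 120.0, 2017: 90.0},
}
row = build_feature_row(indicators, 2017)
for key in sorted(row):
    print(key, row[key])
```

Keeping the driver cluster in each feature name is one way to report later which drivers contributed to a given forecast.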
In our pilot implementation, we focused on migration from Ethiopia to six destination countries where DRC subject matter experts had their finger on the pulse of the situation. We gathered a range of development indicators, 85 in total, from several institutional providers so that a sufficient range of migratory drivers was represented in the model. These included statistics on the labour economy, food, education, socio-demographics, infrastructure, strength of institutions, and governance (Figure 2).
Figure 2: Correlation matrix for all features considered in the model (no temporal effects)
Using these indicators, we developed an ensemble model to make annual strategic forecasts of bilateral mixed-migration volumes. Our evaluations show error rates to be within a few thousand persons per year, even for countries with volatile conditions. The system further allows for scenario analysis, where relative changes in influencing factors can be modeled to make adjusted predictions.
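The post does not say which ensemble method or interval construction was used. As one possible illustration, the sketch below builds a bootstrap ensemble of simple least-squares lines and reads an approximate 90 percent interval off the spread of the ensemble's predictions. The single indicator, the data values and the function names are assumptions made up for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to predicting the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def ensemble_forecast(xs, ys, x_new, n_models=500, seed=0):
    """Median forecast plus a 5th-95th percentile band from a bootstrap ensemble."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Each ensemble member is fitted on a resample drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    preds.sort()
    low = preds[int(0.05 * (n_models - 1))]
    high = preds[int(0.95 * (n_models - 1))]
    return statistics.median(preds), low, high


# Synthetic yearly data: one indicator and a flow in thousands of persons.
indicator = [4.0, 4.5, 5.1, 5.0, 5.8, 6.3, 6.1, 7.0]
flows = [20.1, 21.0, 23.2, 22.5, 25.1, 26.4, 25.9, 28.3]

point, low, high = ensemble_forecast(indicator, flows, x_new=7.5)
print(f"forecast {point:.1f}k persons (90% band {low:.1f}k to {high:.1f}k)")
```

A production system would use many indicators and stronger base learners, but the same idea applies: the spread across ensemble members gives a rough sense of forecast uncertainty.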
Interesting counter-intuitive dynamics emerge from such analysis. For instance, unemployment rates in Ethiopia are above average compared with other Sub-Saharan countries, and a large number of Ethiopians travel to Saudi Arabia for work. If employment rates rise to the best fifth in the region, the model predicts greater migration to the UK (two percent increase), Sweden (two percent increase) and Saudi Arabia (eight percent increase). This reflects an increased ability and means of Ethiopians to meet their aspirations abroad. If unemployment instead rises to the worst levels, the model predicts an increase in migration to South Africa (three percent increase) and Saudi Arabia (four percent increase), with EU destinations largely invariant to increases in unemployment.
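A scenario of this kind can be sketched as: replace one feature with a target value (for example, the threshold of the best fifth among regional peers), rerun the model, and report the relative change in the forecast. In the Python sketch below, the toy model, its coefficients and the regional values are invented for illustration and are not taken from the actual system.

```python
def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def scenario_change(predict, baseline, overrides):
    """Percent change in the forecast when some features are overridden."""
    base = predict(baseline)
    adjusted = predict({**baseline, **overrides})
    return 100.0 * (adjusted - base) / base


def toy_predict(features):
    """Toy stand-in for a trained model; the coefficients are invented."""
    return 1000.0 + 50.0 * features["employment_rate"]


baseline = {"employment_rate": 60.0}
regional_rates = [55.0, 60.0, 65.0, 70.0, 75.0]  # made-up peer values

# Threshold above which a country sits in the best fifth of the region.
best_fifth = percentile(regional_rates, 80)
change = scenario_change(toy_predict, baseline, {"employment_rate": best_fifth})
print(f"Forecast changes by {change:.1f}% under the scenario")
```

Because `scenario_change` only needs a callable that maps features to a forecast, the same helper could wrap any trained model, one destination country at a time.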
Such detailed quantitative analysis has previously not been available to stakeholders who need to formulate policy responses.
The forecasting system described above is purely data-driven: we rely on the model to derive the relationships between all the variables. Alternatively, if we seek to exploit subject matter expertise and include specific insights in the system, we can take the approach of probabilistic graphical models.
At a workshop held in our IBM Research-Ireland lab, subject matter experts from the Mixed Migration Centre in Geneva and DRC drew out the “spaghetti” network showing how they expect indicator clusters to be causally linked. Using this as input, we then combined their expert opinion with the data. We used a technique called structure learning to develop such a network.
Figure 3: (left) Causal network drawn by experts and (right) network learnt based on expert opinion and evidence based on data for all of Sub-Saharan Africa
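The post does not describe the structure learning algorithm that was used. As a rough illustration of how expert opinion and data can be combined, the sketch below runs a deliberately restricted score-based search: edges are added greedily by a BIC-style linear-Gaussian score, each node may have at most one parent, and edges drawn by experts receive a log-prior bonus. The variable names, the data and the size of the bonus are assumptions made for this example.

```python
import math
from itertools import permutations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def creates_cycle(edges, parent, child):
    """True if adding parent -> child would close a directed cycle."""
    # A cycle appears exactly when `parent` is already reachable from `child`.
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for p, c in edges if p == node)
    return False


def learn_structure(data, expert_edges, expert_bonus=2.0):
    """Greedy edge addition with at most one parent per node."""
    n = len(next(iter(data.values())))
    candidates = []
    for parent, child in permutations(data, 2):
        r = pearson(data[parent], data[child])
        # Log-likelihood gain of one linear-Gaussian parent minus a BIC penalty.
        gain = -0.5 * n * math.log(max(1e-12, 1 - r * r)) - 0.5 * math.log(n)
        if (parent, child) in expert_edges:
            gain += expert_bonus  # log-prior bonus for expert-drawn edges
        candidates.append((gain, parent, child))
    edges, has_parent = [], set()
    for gain, parent, child in sorted(candidates, reverse=True):
        if gain <= 0:
            break  # remaining edges do not improve the score
        if child in has_parent or creates_cycle(edges, parent, child):
            continue
        edges.append((parent, child))
        has_parent.add(child)
    return edges


# Made-up yearly indicator values, for illustration only.
data = {
    "conflict": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "displacement": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],
    "employment": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0, 6.0, 3.0],
}
expert_edges = {("conflict", "displacement")}
print(learn_structure(data, expert_edges))
```

Correlation alone cannot orient an edge, so in this sketch the data decide which links are strong enough to keep while the expert prior breaks the tie on direction. Practical structure learning tools search over far richer graphs than this one-parent restriction allows.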
Forecasts made with such networks typically don’t perform as well as the purely data-driven approach presented above; nevertheless, they do aid scenario analysis and policy analysis.
These are the first few steps towards a future where policy makers have instant access to evidence when and where it is needed and where complex relationships can be explored easily to provide more insight driving better policy.
For now, we are continuing to improve the system and gather user feedback from subject matter experts within the DRC. Following more detailed validation, we will look to expand the geographic scope and scenario analysis capabilities.