Forecasting the Future: Air Quality in South Africa with the IoT

Tapiwa Chiwewe is based at IBM's lab in Johannesburg.

Nearly a year ago IBM Research announced that it would expand its Green Horizons initiative globally to help city governments, utility companies and factories better understand and improve their relationships with the environment, and to help tackle pressing issues related to air pollution and climate change.

Green Horizons applies IBM’s advanced machine learning and Internet of Things (IoT) technologies to ingest and learn from vast amounts of Big Data, including weather and pollutant data, constantly self-configuring and improving in accuracy to create some of the world’s most accurate energy and environmental forecasting systems.

One of the pilot cities is Johannesburg, South Africa, in a collaboration between IBM Research – Africa and South Africa’s Council for Scientific and Industrial Research.

IBM scientist Tapiwa Chiwewe is leading the research, which uses historical and real-time data from environmental monitoring stations, together with machine learning and cognitive models, to provide insight into air pollution, ground-level ozone, and air quality, and to model the effectiveness of intervention strategies.

At the opening of IBM’s newest lab in South Africa back in August, I caught up with him to see the status of the pilot and the next steps.

How far in advance are the forecasts made?

Tapiwa Chiwewe (TC): The historical results for the pilot are day-ahead (tomorrow) forecasts. For operational forecasts, the plan is to provide daily forecasts for up to seven days in advance.

What is the spatial resolution of the forecasts?

TC: The results for the pilot are in 10×10 km mesh grids; the same can be done for a live, operational system. Higher resolution of up to 1×1 km can also be made available with additional computing resources.
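To make the resolution trade-off concrete, here is a minimal sketch of how a point could be mapped to a mesh cell, and how the cell count grows when the grid is refined. The function names, the projected-kilometre coordinate convention, and the 100 km by 100 km domain are assumptions made for illustration; they are not details of the pilot.

```python
import math


def grid_cell(easting_km, northing_km, cell_size_km=10.0):
    """Return the (column, row) index of the mesh cell containing a point.

    Coordinates are assumed to be projected distances in km measured from
    the south-west corner of the forecast domain (illustrative convention).
    """
    if cell_size_km <= 0:
        raise ValueError("cell_size_km must be positive")
    return (math.floor(easting_km / cell_size_km),
            math.floor(northing_km / cell_size_km))


def cell_count(width_km, height_km, cell_size_km=10.0):
    """Number of cells needed to cover a rectangular domain."""
    if cell_size_km <= 0:
        raise ValueError("cell_size_km must be positive")
    return (math.ceil(width_km / cell_size_km)
            * math.ceil(height_km / cell_size_km))


# Hypothetical 100 km x 100 km domain, not the pilot's actual extent.
coarse = cell_count(100, 100, cell_size_km=10.0)  # 10 x 10 km cells
fine = cell_count(100, 100, cell_size_km=1.0)     # 1 x 1 km cells
example_cell = grid_cell(23.5, 7.2)               # cell index on the 10 km grid
```

Under these assumptions, moving from 10×10 km to 1×1 km cells multiplies the number of cells by 100, which gives a rough sense of why the finer grid calls for additional computing resources.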

How much prediction error could significantly affect decisions?

TC: For decision making, an accuracy of roughly 70 percent or more (based on the ratio of Mean Absolute Error to Observed Mean) could be good enough. The forecasts of pollution level categories (no pollution, moderate pollution, heavy pollution) are also very useful for public warnings.
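One way to read the accuracy figure above is as one minus the mean absolute error divided by the mean of the observed values. The sketch below follows that reading, which is an assumption rather than a confirmed definition from the pilot. The category thresholds are also made-up placeholders, since the interview does not give the actual cut-offs.

```python
from statistics import mean


def relative_accuracy(observed, forecast):
    """Return 1 - MAE / mean(observed) for paired readings."""
    if not observed or len(observed) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    mae = mean(abs(o - f) for o, f in zip(observed, forecast))
    return 1.0 - mae / mean(observed)


# Placeholder thresholds in micrograms per cubic metre (illustrative only).
MODERATE_FROM = 40.0
HEAVY_FROM = 100.0


def pollution_category(concentration):
    """Map a concentration to one of the three warning categories."""
    if concentration >= HEAVY_FROM:
        return "heavy pollution"
    if concentration >= MODERATE_FROM:
        return "moderate pollution"
    return "no pollution"


# Made-up day-ahead forecasts against made-up observations.
observed = [30.0, 50.0, 70.0, 50.0]
forecast = [35.0, 45.0, 60.0, 60.0]
score = relative_accuracy(observed, forecast)   # MAE 7.5 over mean 50 -> 0.85
categories = [pollution_category(f) for f in forecast]
```

With these sample numbers the score is 0.85, which would clear the roughly 70 percent bar mentioned in the answer.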

How much fluctuation in the level of pollutants occurs within a single day or site, and how does this compare to prediction error?

TC: The fluctuation of pollution depends on the particular weather patterns. High-impact weather conditions (strong winds, rainfall, stationary low-pressure systems, etc.) can change pollution levels within a few hours. The air quality forecasts use weather model predictions to capture those high-impact conditions, which means the forecast error is much smaller than the fluctuation.

Below are animations showing the forecast levels of PM10 and PM2.5 over a period of time.

How precisely can we pinpoint, today or in the future, where the pollutants are coming from?

TC: Pollution sourcing is a different topic from forecasting. It describes the contribution of each pollution source, including direct emissions, transport by weather, and second-phase emissions from chemical reactions.

When using a sourcing model with an accurate emissions inventory, it is possible to track the pollution sources from various locations and industries for particular days. This requires the sourcing model, which is not included in the current pilot but could be included in future commercial projects. Alternatively, wind data can support simpler sourcing methods that consider only the transportation of pollutants, which makes the model much easier to build.
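As a rough illustration of the wind-only idea, the sketch below traces a straight line upwind from a monitoring station to estimate where an air parcel was a few hours earlier. It assumes a constant wind and the meteorological convention that wind direction is the bearing the wind blows from. The function name and inputs are hypothetical, and this is not the sourcing model described above.

```python
import math


def upwind_point(x_km, y_km, wind_from_deg, wind_speed_ms, hours):
    """Estimate where air arriving at (x_km, y_km) was `hours` ago.

    wind_from_deg is the bearing the wind blows FROM, in degrees clockwise
    from north. x is east and y is north, both in km. Assumes the wind is
    constant over the whole period (a strong simplification).
    """
    distance_km = wind_speed_ms * 3.6 * hours  # m/s -> km/h, then km travelled
    theta = math.radians(wind_from_deg)
    return (x_km + distance_km * math.sin(theta),
            y_km + distance_km * math.cos(theta))


# A 5 m/s northerly wind: the air was about 36 km north two hours earlier.
north_x, north_y = upwind_point(0.0, 0.0, wind_from_deg=0.0,
                                wind_speed_ms=5.0, hours=2.0)
```

Emission sources lying near the traced line would be candidate contributors. Chemical reactions and changing winds are ignored here, which is exactly what the simplification trades away.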

What is required to operationalise the system? In other words, make it live?

TC: Real-time sensor data is required. Daily forecasts can then be distributed (considering three days into the future for a start), and those forecast results can be shown or analyzed using the online portal. It’s also necessary to integrate the forecasting feature into the online portal.

Where does the data you used come from, and how is it collected?

TC: The data comes from three air quality monitoring networks, namely the City of Johannesburg, City of Tshwane, and Vaal Triangle air quality monitoring networks in South Africa. The networks have 21 monitoring stations between them. The data is collected through requests that are made to the South African Air Quality Information System. We are also planning to add the weather data from The Weather Company, an IBM business.

Can you explain why you chose to model PM10, PM2.5, ozone and nitrogen dioxide? Are these simply the most harmful?

TC: There are certain priority pollutants that have been identified as posing the most risk to the health and well-being of people, wildlife, and the environment. Environmental authorities in South Africa regularly release air quality management plans (typically every five years) in which they explicitly identify priority pollutants, given their potential to cause harm, and describe the observed effectiveness of the intervention strategies they have in place to control air pollution.

Which Johannesburg township did you choose as a data source?

TC: The quality of the data that is collected at the different stations varies in many ways, such as how regularly readings are reported, the sampling intervals for taking measurements, and the accuracy of the readings. This is due in part to maintenance issues around the equipment at the monitoring stations.

Diepkloof is an example of a station with good-quality data. This allowed us to compare the forecasts directly with the actual measured readings, without problems such as big gaps in the reported readings, which would rule out direct comparisons.
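A simple screening step for the data-quality issue described above might look like the sketch below, which finds the largest gap between consecutive readings at a station. The six-hour tolerance, the function names, and the sample timestamps are assumptions for illustration, not criteria used in the pilot.

```python
from datetime import datetime, timedelta


def largest_gap(timestamps):
    """Return the longest interval between consecutive readings."""
    ordered = sorted(timestamps)
    return max((later - earlier
                for earlier, later in zip(ordered, ordered[1:])),
               default=timedelta(0))


def usable_for_comparison(timestamps, max_gap=timedelta(hours=6)):
    """True if no gap between readings exceeds max_gap (assumed tolerance)."""
    return largest_gap(timestamps) <= max_gap


# Made-up hourly readings for one day, then the same day with a long outage.
start = datetime(2016, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(24)]
with_outage = hourly[:5] + hourly[17:]  # readings missing from 05:00 to 16:00

complete_ok = usable_for_comparison(hourly)       # True: gaps are one hour
outage_ok = usable_for_comparison(with_outage)    # False: one 13-hour gap
```

A check like this could help pick stations where forecasts and measurements line up hour by hour, which is the property that made Diepkloof a useful example.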

What’s next for your research?

TC: In addition to getting access to more data, we are working on releasing an API for developers to create apps for consumers and businesses to use. Developers can contact me directly if they are interested.

For more scientific details on this research, read Tapiwa’s paper: Machine Learning Based Estimation of Ozone Using Spatio-Temporal Data from Air Quality Monitoring Stations

