PM2.5 Estimation in the Czech Republic using Extremely Randomized Trees: A Comprehensive Data Analysis

Publication at Faculty of Science | 2023

Abstract

The accuracy of artificial intelligence techniques in estimating air quality depends on many factors. Unlike our previous study, which examined PM2.5 over the whole of Europe using unbalanced spatiotemporal data, this study focused on estimating PM2.5 specifically over the Czech Republic, using a more balanced dataset to train and evaluate the model.

Moreover, the spatial autocorrelation between PM2.5 measurements was taken into account when building the model. Feature importance analysis of the Extra Trees model revealed that spatial autocorrelation was more important than commonly used inputs such as elevation and NDVI.
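The abstract does not say how spatial autocorrelation was encoded as a model input. One common option, shown here purely as an illustrative assumption, is a spatial lag feature: the inverse-distance-weighted mean of PM2.5 readings at other stations on the same day. The function names, the `power` parameter, and the coordinates below are hypothetical and are not taken from the study.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def spatial_lag(target, neighbours, power=2.0):
    """Inverse-distance-weighted mean of neighbouring PM2.5 readings.

    target:     (lat, lon) of the location being described.
    neighbours: list of (lat, lon, pm25) tuples from other stations, same day.
    power:      distance-decay exponent (assumed value, not from the study).
    """
    num = 0.0
    den = 0.0
    for lat, lon, pm25 in neighbours:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            continue  # skip a co-located reading so a station never sees its own value
        w = 1.0 / d ** power
        num += w * pm25
        den += w
    return num / den if den > 0 else float("nan")


# Hypothetical readings: two stations at equal distance east and west of the target.
lag = spatial_lag((50.0, 15.0), [(50.0, 14.0, 10.0), (50.0, 16.0, 20.0)])
```

A value computed this way could be appended to each training row alongside inputs such as elevation and NDVI, so that a tree ensemble can rank it against them in its feature importances.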

We found that the R² of the 10-fold cross-validation (10-CV) for the new model was 16% higher than that of the previous one, reaching 0.85 with RMSE = 5.42 µg/m³, MAE = 3.41 µg/m³, and bias = -0.03 µg/m³.
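The four scores reported above can be computed from held-out predictions pooled across the cross-validation folds. The sketch below uses the standard definitions and makes two assumptions the abstract does not confirm: R² is taken as the coefficient of determination (not a squared correlation), and bias is the mean of predicted minus observed. The function name and the sample values are illustrative only.

```python
import math


def regression_scores(observed, predicted):
    """Return R², RMSE, MAE and mean bias for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted minus observed
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }


# Made-up PM2.5 values in µg/m³, not data from the study.
scores = regression_scores([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

A bias close to zero alongside a non-trivial MAE, as in the reported figures, indicates that over- and under-predictions largely cancel out on average.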

The developed spatiotemporal model was used to generate daily maps covering the entire study area for the period 2018-2020. The temporal analysis showed that PM2.5 levels exceeded recommended limits in many regions during 2018.

The eastern part of the country suffered from the highest concentrations, especially over the Zlín and Moravian-Silesian Regions. Air quality improved during the next two years in all regions, reaching promising levels in 2020.

The generated dataset will be available for future air quality studies.