Prediction of the Daily Mean PM10 Concentrations Using Linear Models
J. C.M. Pires, F. G. Martins, S. I.V. Sousa, M. C.M. Alvim-Ferraz, and M. C. Pereira
DOI: 10.3844/ajessp.2008.445.453
American Journal of Environmental Sciences
Volume 4, Issue 5
The performance of five linear models in predicting the daily mean PM10 concentrations was compared. The linear models were: (i) multiple linear regression; (ii) principal component regression; (iii) independent component regression; (iv) quantile regression; and (v) partial least squares regression. The study was based on data from an urban site in the Oporto Metropolitan Area, and the analysed period was from January 2003 to December 2005. The linear models were evaluated with two datasets of different sizes belonging to the analysed period. Environmental data (SO2, CO, NO, NO2 and PM10 concentrations) and meteorological data (temperature, relative humidity and wind speed) were used as PM10 predictors. During the training step, quantile regression presented the lowest residual errors for the two datasets. Independent component regression was the worst model using the larger dataset. Multiple linear regression, principal component regression and partial least squares regression presented similar results for both datasets. During the test step with the larger dataset, independent component regression and quantile regression performed poorly, while multiple linear regression, principal component regression and partial least squares regression presented similar results. For the smaller dataset, the models that remove the correlation between the variables (principal component regression, independent component regression and partial least squares regression) presented better results than multiple linear regression and quantile regression; among them, independent component regression had the lowest residual error. In conclusion, dataset size is also an important factor when evaluating models for the prediction of variables. The prediction of the daily mean PM10 concentrations was more efficient when using independent component regression for the smaller dataset and partial least squares regression for the larger dataset.
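As an illustration of the kind of comparison described above, the following minimal sketch contrasts multiple linear regression with principal component regression on synthetic, correlated predictors. It is not the authors' code and does not use the Oporto data: the sample sizes, the number of retained components, the train/test split and all function names are assumptions chosen only for the example.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def fit_pcr(X, y, n_components):
    """Standardize X, project onto the leading principal components,
    then regress y on the component scores."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized predictors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    coef = fit_mlr(Z @ V, y)
    return mean, std, V, coef

def predict_pcr(model, X):
    mean, std, V, coef = model
    return predict_mlr(coef, ((X - mean) / std) @ V)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for correlated predictors (assumption: 8 predictors
# driven by 3 latent factors, plus noise). Not real PM10 data.
rng = np.random.default_rng(0)
n, p = 200, 8
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -0.5, 0.25]) + 0.2 * rng.normal(size=n)

# Training and test steps on a simple hold-out split (assumed 150/50).
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

mlr = fit_mlr(X_train, y_train)
pcr_3 = fit_pcr(X_train, y_train, n_components=3)
pcr_full = fit_pcr(X_train, y_train, n_components=p)

train_rmse_mlr = rmse(y_train, predict_mlr(mlr, X_train))
train_rmse_pcr = rmse(y_train, predict_pcr(pcr_3, X_train))
test_rmse_mlr = rmse(y_test, predict_mlr(mlr, X_test))
test_rmse_pcr = rmse(y_test, predict_pcr(pcr_3, X_test))

print(f"MLR   train RMSE {train_rmse_mlr:.3f}  test RMSE {test_rmse_mlr:.3f}")
print(f"PCR-3 train RMSE {train_rmse_pcr:.3f}  test RMSE {test_rmse_pcr:.3f}")
```

On the training data, multiple linear regression can never have a larger residual error than principal component regression, because the latter is a restricted linear model; any advantage of the decorrelating models can therefore only show up in the test step. When all components are retained, the two models give the same predictions.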
© 2008 J. C.M. Pires, F. G. Martins, S. I.V. Sousa, M. C.M. Alvim-Ferraz, and M. C. Pereira. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.