Partial Least Squares Regression Based Variables Selection for Water Level Predictions
Noraini Ibrahim and Antoni Wibowo
DOI : 10.3844/ajassp.2013.322.330
American Journal of Applied Sciences
Volume 10, Issue 4
Floods are a common phenomenon in Kuala Krai, in the state of Kelantan, Malaysia. Every year, floods affect biodiversity in this region and cause property loss in its residential areas. Residents of Kelantan regularly suffer from floods because water overflows into the areas adjoining rivers, lakes or dams. Month, average monthly rainfall, temperature, relative humidity and surface wind were used as predictors, while the water level of the Galas River was used as the response. Selecting suitable predictor variables is an important issue in developing a prediction model because the data comprise many variables from the meteorological and hydrological departments. In this study, we conduct K-fold Cross-Validation (CV) to select the important variables for water level prediction. A suitable prediction model is needed to forecast the water level of the Galas River, and we adopt Ordinary Linear Regression (OLR) and Partial Least Squares Regression (PLSR) for this purpose. However, the datasets must be pre-processed because the original data contain missing values. We perform two types of pre-processing: replacing missing values with the mean of the corresponding month (type I pre-processing) and estimating missing values with OLR (type II pre-processing). Based on the experiments, PLSR is a more suitable model than OLR for predicting the water level of the Galas River, and type I pre-processing gives higher accuracy than type II pre-processing.
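As an illustration of the type I pre-processing described in the abstract, the following minimal Python sketch fills each missing value with the mean of the observed values for the same calendar month. The abstract does not show the layout of the data, so the (month, value) record format, the function name impute_month_mean and the sample numbers are all hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def impute_month_mean(records):
    """Fill missing values (None) with the mean of the observed values
    recorded for the same calendar month.

    `records` is a list of (month, value) pairs; this layout is an
    assumption made for illustration only.
    """
    # Collect the observed (non-missing) values for each month.
    observed = defaultdict(list)
    for month, value in records:
        if value is not None:
            observed[month].append(value)

    # Mean of the observed values per month.
    month_means = {m: mean(vals) for m, vals in observed.items()}

    filled = []
    for month, value in records:
        if value is None:
            # Stays None if that month was never observed at all.
            value = month_means.get(month)
        filled.append((month, value))
    return filled


# Made-up monthly readings: two Januaries and two Februaries observed,
# one of each missing.
records = [(1, 10.0), (2, 12.0), (1, None), (2, 14.0), (1, 14.0), (2, None)]
filled = impute_month_mean(records)
```

In this toy input, the missing January reading is replaced by the mean of the two observed January readings, and likewise for February. A month with no observed values at all is left missing rather than guessed.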
© 2013 Noraini Ibrahim and Antoni Wibowo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.