Prediction of Transmissivity of Malikan Plain Aquifer Using Random Forest Method

Authors

Tabriz

Abstract

Transmissivity is a key parameter for characterizing aquifers, so estimating its value and spatial distribution through modeling is necessary for aquifer management. Estimating this parameter with field experiments such as pumping tests is costly and time-consuming. Suitable management of the Malikan plain, one of the active agricultural areas in north-west Iran, requires an understanding of hydrogeological parameters such as transmissivity. This study proposes the random forest (RF) algorithm, a learning method based on an ensemble of decision trees, for predicting transmissivity; it has not previously been used for this purpose. The RF technique has several advantages over other methods: high prediction accuracy, the ability to learn nonlinear relationships, a non-parametric nature, and the ability to identify the important variables in the prediction process. Because increasing the number of trees decreases the error, 500 trees were used to obtain high model efficiency. The model results were evaluated with the out-of-bag (OOB) error estimate. In addition, the feature selection (FS) method was used to reduce dimensionality, increase accuracy, and make the model easier to interpret; it also identified the most important variables in the prediction. According to the RF model, which achieved AUC = 0.96 and MSE = 0.036, the most important parameters for predicting transmissivity were, in order, electrical conductivity, aquifer media, and hydraulic gradient. The accuracy of the RF model and its ability to determine the important parameters demonstrate its advantages over other models for transmissivity prediction.
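The workflow summarized in the abstract (bagged decision trees, an out-of-bag error estimate, and a ranking of input variables) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names (`build_tree`, `fit_forest`, `oob_mse`, `permutation_importance`, `make_synthetic`) and the settings `mtry`, `max_depth` and `min_leaf` are assumptions made for illustration, and a simplified permutation importance stands in for the FS method, whose details the abstract does not give.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def build_tree(X, y, rng, mtry, depth=0, max_depth=3, min_leaf=3):
    """Grow a regression tree; every split looks at a random subset of mtry features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return _mean(y)  # leaf: mean target of the samples that reached it
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml = _mean([y[i] for i in left])
            mr = _mean([y[i] for i in right])
            sse = sum((y[i] - ml) ** 2 for i in left) + sum((y[i] - mr) ** 2 for i in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return _mean(y)  # no admissible split was found
    _, j, t, left, right = best
    children = [
        build_tree([X[i] for i in side], [y[i] for i in side], rng, mtry,
                   depth + 1, max_depth, min_leaf)
        for side in (left, right)
    ]
    return (j, t, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def fit_forest(X, y, n_trees=500, mtry=2, seed=0):
    """Bagging: each tree sees a bootstrap sample; the rows it never saw are its OOB set."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], rng, mtry)
        forest.append((tree, set(range(n)) - set(idx)))
    return forest


def oob_mse(forest, X, y):
    """Mean squared error where each row is predicted only by trees that did not train on it."""
    errors = []
    for i, (row, target) in enumerate(zip(X, y)):
        preds = [predict_tree(tree, row) for tree, oob in forest if i in oob]
        if preds:
            errors.append((_mean(preds) - target) ** 2)
    return _mean(errors)


def permutation_importance(forest, X, y, seed=0):
    """Simplified importance: rise in OOB MSE after shuffling one feature column."""
    rng = random.Random(seed)
    base = oob_mse(forest, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
        scores.append(oob_mse(forest, shuffled, y) - base)
    return scores


def make_synthetic(n=40, seed=1):
    """Synthetic stand-in data: feature 0 dominates, feature 2 is pure noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [3.0 * row[0] + 0.5 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    forest = fit_forest(X, y, n_trees=100)  # the abstract reports 500 trees
    print("OOB MSE:", oob_mse(forest, X, y))
    print("Importance per feature:", permutation_importance(forest, X, y))
```

The demo grows 100 trees only to keep the run short; passing `n_trees=500` matches the ensemble size reported in the abstract. In practice, an established library implementation of random forests would normally be preferred over a hand-written one like this.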
