Data Mining Application on Weather Prediction Using Classification Tree, Naïve Bayes and K-Nearest Neighbor Algorithm With Model Testing of Supervised Learning Probabilistic Brier Score, Confusion Matrix and ROC

Ratih Prasetya


Classification is a data mining technique used to predict relationships among the data in a dataset. Prediction is performed by assigning data to one of several classes on the basis of certain factors. Classification is an application of supervised learning, in which the training data already carry labels when they are supplied as input. It is an empirical approach that can be used for short-term weather prediction. The most widely used classification algorithms are Classification Tree, Naïve Bayes and K-Nearest Neighbors. In this study, the author used these three algorithms to predict rain, validating the models with the Brier Score, the Confusion Matrix and ROC curves. The input data are synoptic observations from Kemayoran Meteorological Station, Jakarta (96745), covering 10 years (2006-2015) and comprising 3528 records with 8 attributes. After data processing, selection and model testing, the Naïve Bayes algorithm showed the best accuracy, 77.1%, which falls in the fair classification category, so it has reasonable potential for operational use. The dominant weather attributes in rain formation are average relative humidity (RHavg), minimum temperature (Tmin), maximum temperature (Tmax), average temperature (Tavg) and wind direction (ddd).


Artificial Intelligence, Data Mining, Classification algorithm, Supervised Learning, Weather Prediction, Meteorology
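
Two of the validation measures named in the abstract, the Brier Score and the Confusion Matrix, can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the 0.5 decision threshold, and the toy forecast values are all assumptions made up for illustration and are not taken from the Kemayoran dataset.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def confusion_matrix(probs, outcomes, threshold=0.5):
    """Return (tp, fp, fn, tn) counts for rain (1) versus no rain (0).

    The threshold of 0.5 is an assumed default, not a value from the study.
    """
    tp = fp = fn = tn = 0
    for p, o in zip(probs, outcomes):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and o == 1:
            tp += 1
        elif predicted == 1 and o == 0:
            fp += 1
        elif predicted == 0 and o == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical rain probabilities and observed outcomes (1 = rain, 0 = no rain).
toy_probs = [0.9, 0.2, 0.7, 0.4, 0.1]
toy_outcomes = [1, 0, 0, 1, 0]

if __name__ == "__main__":
    counts = confusion_matrix(toy_probs, toy_outcomes)
    print("Brier score:", brier_score(toy_probs, toy_outcomes))
    print("(tp, fp, fn, tn):", counts)
    print("Accuracy:", accuracy(*counts))
```

A lower Brier Score indicates probabilities closer to the observed outcomes, while the accuracy derived from the confusion matrix is the kind of figure the abstract reports for Naïve Bayes (77.1%).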








ISSN: 2541-6340
Online ISSN: 2541-6359



This work is licensed under a Creative Commons Attribution 4.0 International License.