Improved Decision Tree Performance using Information Gain for Classification of Covid-19 Surveillance Datasets
Abstract
COVID-19 is one of the most feared infectious diseases today. It spreads quickly, and patients do not always present the same symptoms. Efforts to contain the pandemic have been made worldwide, using not only medical approaches but also computational ones. Data mining is a discipline that extracts new knowledge from data, and classification is one of its main functions. The decision tree is one of the best models for solving classification problems. The number of attributes in a dataset can affect an algorithm's performance. This study uses information gain to select attributes of the Covid-19 surveillance dataset. The results show that adding information gain feature selection increases the accuracy of the decision tree algorithm. Without feature selection, the decision tree reached an accuracy of only 65% on the Covid-19 surveillance dataset; after pre-processing with information gain, accuracy rose to 75%.
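The attribute selection step described in the abstract can be illustrated with a minimal sketch. It computes the information gain of each categorical attribute (the entropy of the class labels minus the weighted entropy after splitting on that attribute) and ranks the attributes by it. The function names, the attribute names `fever` and `contact`, and the toy rows are hypothetical and chosen only for illustration; they are not the study's dataset, tool, or implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from highest to lowest gain."""
    gains = ((attr, information_gain(rows, labels, attr)) for attr in rows[0])
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "fever" separates the classes perfectly,
# while "contact" carries no information about the class.
rows = [
    {"fever": "yes", "contact": "yes"},
    {"fever": "yes", "contact": "no"},
    {"fever": "no", "contact": "yes"},
    {"fever": "no", "contact": "no"},
]
labels = ["positive", "positive", "negative", "negative"]

ranking = rank_attributes(rows, labels)
for attr, gain in ranking:
    print(f"{attr}: {gain:.3f}")
```

In a feature selection setting, attributes whose gain falls below a chosen threshold (or outside the top k) would be dropped before the decision tree is trained. The threshold or k is a design choice that the abstract does not specify.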
DOI: http://dx.doi.org/10.32497/jaict.v7i1.3501
ISSN: 2541-6340
Online ISSN: 2541-6359
This work is licensed under a Creative Commons Attribution 4.0 International License.