Classification of Type 2 Diabetes using Decision Tree Algorithm

Ivandari Ivandari, Much. Rifqi Maulana, M Adib Al Karomi

Abstract


Diabetes is a leading cause of death. According to WHO data, diabetes caused 2 million deaths in 2019. Patient conditions are routinely recorded for medical purposes, but records that are only stored and never analyzed eventually become digital waste. Data mining offers classification as a way to turn such data into new knowledge: algorithmic and statistical computation over existing data reveals new patterns. This study uses the type 2 diabetes dataset from the UCI repository, released in 2020. Previous research using the KNN algorithm achieved an accuracy of 92.5%. The decision tree is one of the best and most widely used classification algorithms for high-dimensional datasets; for numerical datasets it has proven superior, and it can represent its model in a form that is easy for humans to understand. The results show that the decision tree algorithm classifies the type 2 diabetes data with an accuracy of 95.96%. A further output of this study is a decision tree built from the early stage diabetes risk prediction dataset.
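The abstract does not state which splitting criterion the decision tree uses. As an illustration only, the following minimal sketch assumes entropy-based information gain, the criterion used by ID3 and C4.5, and shows how a tree would pick its first split. The function names, the column names, and the four toy records are hypothetical and are not taken from the study or its dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain (assumed criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records, invented for this sketch (not real patient data).
TOY_ROWS = [
    {"polyuria": "yes", "itching": "yes", "class": "positive"},
    {"polyuria": "yes", "itching": "no", "class": "positive"},
    {"polyuria": "no", "itching": "yes", "class": "negative"},
    {"polyuria": "no", "itching": "no", "class": "negative"},
]

if __name__ == "__main__":
    for name in ("polyuria", "itching"):
        print(name, information_gain(TOY_ROWS, name, "class"))
    print("first split:", best_split(TOY_ROWS, ["polyuria", "itching"], "class"))
```

A full tree repeats this choice recursively on each resulting subset until the subsets are pure or no attributes remain, which is what makes the final model readable as a set of if-then rules.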


Keywords


Data mining, Decision tree, Diabetes type 2



DOI: http://dx.doi.org/10.32497/jaict.v8i2.4835


ISSN: 2541-6340
Online ISSN: 2541-6359

This work is licensed under a Creative Commons Attribution 4.0 International License.