Khishigsuren Davagdorj1, Jong Seol Lee1, Kwang Ho Park1, Pham Van Huy2, Keun Ho Ryu2,3*
1 Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea
2 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Department of Computer Science, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea
ABSTRACT
Smoking is one of the most significant avoidable risk factors for premature death. Most smokers make multiple quit attempts during their lifetime, but overcoming smoking dependence is not easy, and many quit attempts eventually fail. Predicting the likelihood of success in a smoking cessation program is therefore important for public health. In recent years, a small number of decision support systems for smoking cessation have been developed using machine learning techniques. However, the class imbalance problem is increasingly recognized as a serious issue in real-world applications. This paper therefore presents a decision support framework based on the synthetic minority over-sampling technique (SMOTE) to predict the success of a smoking cessation program using the Korea National Health and Nutrition Examination Survey (KNHANES) dataset. We carried out the experiments as follows: I) unnecessary instances and variables were eliminated, II) three variations of SMOTE were applied, and III) prediction models were constructed. Finally, the prediction models were compared to identify the best one. Our experimental results showed that SMOTE improved the prediction performance of the machine learning classifiers across the evaluation metrics. Moreover, Random Forest (RF) and Naïve Bayes (NB) classifiers combined with regular SMOTE were identified as the best prediction models on the real-world smoking cessation dataset. Our decision support framework can also interpret the important risk factors of smoking cessation using multivariate regression analysis.
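To make the oversampling step concrete, the following is a minimal sketch of the basic SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is an illustration only, not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions, and the data are not drawn from KNHANES.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data (hypothetical): 8 majority samples versus 3 minority samples.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic minority samples to balance the two classes.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling would typically be applied to the training split only, so that synthetic samples do not leak into the evaluation data.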
Keywords:
Smoking cessation; Risk factor analysis; Class imbalance; Synthetic minority oversampling; Machine learning classifiers.
ARTICLE INFORMATION
Received: 2020-03-29
Revised: 2020-06-26
Accepted: 2020-07-16
Available Online: 2020-09-01
Cite this article:
Davagdorj, K., Lee, J.S., Park, K.H., Huy, P.V., Ryu, K.H. 2020. Synthetic oversampling based decision support framework to solve class imbalance problem in smoking cessation program. International Journal of Applied Science and Engineering, 17, 223–235. https://doi.org/10.6703/IJASE.202009_17(3).223
Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.