Synthetic oversampling based decision support framework to solve class imbalance problem in smoking cessation program

Khishigsuren Davagdorj; Jong Seol Lee; Kwang Ho Park; Pham Van Huy; Keun Ho Ryu

doi:10.6703/IJASE.202009_17(3).223

Special issue: The 10th International Conference on Awareness Science and Technology (iCAST 2019)

Khishigsuren Davagdorj¹, Jong Seol Lee¹, Kwang Ho Park¹, Pham Van Huy², Keun Ho Ryu²,³*

¹Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea
²Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
³Department of Computer Science, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea

ABSTRACT

Smoking is one of the significant avoidable risk factors for premature death. Most smokers make multiple quit attempts during their lifetime, but overcoming smoking dependence is not easy and many quit attempts eventually fail. Predicting the likelihood of success in a smoking cessation program is therefore necessary for public health. In recent years, a small number of decision support systems based on machine learning techniques have been developed for smoking cessation. However, the class imbalance problem is increasingly recognized as a serious issue in real-world applications. This paper therefore presents a synthetic minority over-sampling technique (SMOTE) based decision support framework to predict the success of a smoking cessation program using the Korea National Health and Nutrition Examination Survey (KNHANES) dataset. We carried out the experiments as follows: I) unnecessary instances and variables were eliminated, II) three variations of SMOTE were applied, and III) prediction models were constructed. Finally, we compared the prediction models to identify the best one. Our experimental results showed that SMOTE improved the prediction performance of the machine learning classifiers across the evaluation metrics. Moreover, the regular SMOTE based Random Forest (RF) and Naïve Bayes (NB) classifiers were determined to be the best prediction models on the real-world smoking cessation dataset. Consequently, our decision support framework can interpret the important risk factors of smoking cessation using multivariate regression analysis.

Keywords: Smoking cessation; Risk factor analysis; Class imbalance; Synthetic minority oversampling; Machine learning classifiers.
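
The oversampling step named in the abstract can be illustrated with a minimal sketch of regular SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch handles numeric features only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data: 3 minority rows against 8 majority rows, two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
majority_count = 8
new_rows = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic rows do not leak into the data used to evaluate the classifiers.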

ARTICLE INFORMATION

Received: 2020-03-29
Revised: 2020-06-26
Accepted: 2020-07-16
Available Online: 2020-09-01

Cite this article:

Davagdorj, K., Lee, J.S., Park, K.H., Huy, P.V., Ryu, K.H. 2020. Synthetic oversampling based decision support framework to solve class imbalance problem in smoking cessation program. International Journal of Applied Science and Engineering, 17, 223–235. https://doi.org/10.6703/IJASE.202009_17(3).223

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.