International Journal of Applied Science and Engineering
Published by Chaoyang University of Technology

Special issue: The 10th International Conference on Awareness Science and Technology (iCAST 2019)

Khishigsuren Davagdorj1, Jong Seol Lee1, Kwang Ho Park1, Pham Van Huy2, Keun Ho Ryu2, 3*

1 Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea
2 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Department of Computer Science, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea




ABSTRACT


Smoking is one of the significant avoidable risk factors for premature death. Most smokers make multiple quit attempts during their lifetime, but overcoming smoking dependence is not easy, and many quit attempts eventually fail. Predicting the likelihood of success in a smoking cessation program is therefore important for public health. In recent years, a small number of decision support systems based on machine learning techniques have been developed for smoking cessation. However, the class imbalance problem is increasingly recognized as a serious issue in real-world applications. This paper therefore presents a decision support framework based on the synthetic minority over-sampling technique (SMOTE) to predict the success of a smoking cessation program using the Korea National Health and Nutrition Examination Survey (KNHANES) dataset. We carried out the experiments as follows: I) unnecessary instances and variables were eliminated, II) three variations of SMOTE were applied, and III) the prediction models were constructed. Finally, we compared the prediction models to identify the best one. Our experimental results showed that SMOTE improved the prediction performance of the machine learning classifiers across the evaluation metrics. Moreover, the Random Forest (RF) and Naïve Bayes (NB) classifiers combined with regular SMOTE were determined to be the best prediction models on the real-world smoking cessation dataset. In addition, our decision support framework can interpret the important risk factors of smoking cessation using multivariate regression analysis.


Keywords: Smoking cessation; Risk factor analysis; Class imbalance; Synthetic minority oversampling; Machine learning classifiers.
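
The oversampling step named in the abstract can be illustrated with a minimal sketch of basic SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and base points are drawn at random rather than visited in order.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (basic SMOTE sketch)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    # For each minority sample, find the indices of its k nearest
    # minority neighbours by Euclidean distance.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # position along the segment
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Toy data (made up): 3 minority rows against 8 majority rows, 2 features.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 8

# Add enough synthetic rows to match the majority class size.
new_rows = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_rows
```

In a full pipeline, oversampling of this kind would normally be applied to the training split only, so that the test data contains no synthetic rows.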






ARTICLE INFORMATION


Received: 2020-03-29
Revised: 2020-06-26
Accepted: 2020-07-16
Available Online: 2020-09-01


Cite this article:

Davagdorj, K., Lee, J.S., Park, K.H., Huy, P.V., Ryu, K.H. 2020. Synthetic oversampling based decision support framework to solve class imbalance problem in smoking cessation program. International Journal of Applied Science and Engineering, 17, 223–235. https://doi.org/10.6703/IJASE.202009_17(3).223

  Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.