New term weighting methods for classifying textual sentiment data

Jing-Rong Chang; Long-Sheng Chen; Chia-Wei Chang

doi:10.6703/IJASE.202009_17(3).257

Special issue: The 10th International Conference on Awareness Science and Technology (iCAST 2019)

Jing-Rong Chang, Long-Sheng Chen^*, Chia-Wei Chang

Department of Information Management, Chaoyang University of Technology, Taichung, Taiwan, R.O.C.

ABSTRACT

Social media makes it easy for people to express their opinions about products and services. These online comments can influence other customers’ purchase behavior, and negative reviews in particular can damage a company's image. Identifying the sentiment of social media users from a large volume of comments has therefore become a crucial issue. In recent years, machine learning approaches have been considered a possible solution for recognizing the sentiment of text reviews. When these approaches are applied to sentiment classification, the collected reviews are usually represented with traditional term weighting methods, including Term Presence (TP), Term Frequency (TF), and Term Frequency-Inverse Document Frequency (TF-IDF). However, these conventional term weighting methods do little to improve the classification performance on text sentiment data. This study therefore proposes two new term weighting methods, Categorical Difference Weights (CDW) and TF-CDW, which integrate class information into the term weights used to construct the Term-Document Matrix (TDM). Support Vector Machines (SVM) are then employed to build classifiers. Finally, several actual cases are used to demonstrate the effectiveness of the proposed methods. The results show that the proposed methods outperform TF, TP, and TF-IDF.
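
As a rough illustration of the three conventional weighting schemes named in the abstract, the following sketch builds a small matrix of term weights using TP, TF, and TF-IDF. The toy reviews, the function name `build_tdm`, whitespace tokenization, the one-row-per-document layout, and the unsmoothed natural-log IDF are all assumptions made for illustration; the abstract does not specify these details. CDW and TF-CDW are not reproduced here because their formulas are not given in this text.

```python
import math
from collections import Counter

# Hypothetical toy reviews, not data from the paper.
DOCS = [
    "good good phone",
    "bad battery",
    "good battery life",
]


def build_tdm(docs, scheme="tf"):
    """Return (vocabulary, matrix) with one row per document.

    scheme: "tp"    -> 1.0 if the term occurs in the document, else 0.0
            "tf"    -> raw count of the term in the document
            "tfidf" -> raw count * ln(N / document frequency), unsmoothed
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocab:
            tf = counts[term]
            if scheme == "tp":
                weight = 1.0 if tf > 0 else 0.0
            elif scheme == "tf":
                weight = float(tf)
            elif scheme == "tfidf":
                weight = tf * math.log(n_docs / doc_freq[term])
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            row.append(weight)
        matrix.append(row)
    return vocab, matrix


vocab, tp = build_tdm(DOCS, "tp")
_, tf = build_tdm(DOCS, "tf")
_, tfidf = build_tdm(DOCS, "tfidf")

for name, matrix in (("TP", tp), ("TF", tf), ("TF-IDF", tfidf)):
    print(name, vocab)
    for row in matrix:
        print("  ", [round(w, 3) for w in row])
```

None of these three schemes looks at the sentiment label of a document: a term receives the same weight whether it appears mostly in positive or mostly in negative reviews. That is the gap the abstract says CDW and TF-CDW address by bringing class information into the weights.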

Keywords: Sentiment classification; Term weighting; Class information; Text mining; Product reviews.

ARTICLE INFORMATION

Received: 2020-04-14
Revised: 2020-05-07
Accepted: 2020-07-26
Available Online: 2020-09-01

Cite this article:

Chang, J.R., Chen, L.S., Chang, C.W. 2020. New term weighting methods for classifying textual sentiment data. International Journal of Applied Science and Engineering, 17, 257–268. https://doi.org/10.6703/IJASE.202009_17(3).257

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.
