A dynamic weight function based BERT auto encoder for sentiment analysis

Sentiment analysis is a crucial task in Natural Language Processing (NLP) that determines whether users have positive, neutral, or negative feelings about items such as movies or products. In this area, basic polarity detection has been replaced by more intricate emotion detection that differentiates between various negative emotions. Machine learning (ML) techniques, which train a classifier on a feature set, can surpass lexicon-based methods in performance, but their effectiveness may be constrained to specific applications. Existing algorithms were developed for fake news detection on social media but do not address the classification and validation of tweets. To overcome these issues, this research proposes a novel deep learning model, the Weight Function based Bidirectional Encoder Representations from Transformers (WF-BERT), to achieve higher classification accuracy than conventional ML techniques. The weights of the input and output elements in the proposed model are determined dynamically, based on the connection between input and output. The proposed model achieved an accuracy of 93.66%, a recall of 87.30%, a precision of 88.00%, and an F1-score of 87.90%. By including additional domain information, the proposed model outperforms existing language representation models when training data is limited.


INTRODUCTION
Sentiment analysis is a Natural Language Processing (NLP) technique employed to discern the sentiments conveyed by customers through various platforms, such as social media, surveys, and e-commerce site reviews. It processes unstructured data by analyzing the language of users and building an effective model to extract information from it (Patel and Passi, 2020; Bibi et al., 2022; Nezhad and Deihimi, 2022). Advanced and traditional machine learning methods are widely used for sentiment classification in the English language; however, there has been little research into developing a framework for the Persian language (Dashtipour et al., 2021; Neelakandan et al., 2022; Rodrigues et al., 2022). NLP is applied to issues related to sentiment analysis. More complex emotion detection that can distinguish between negative emotions has replaced the earlier, simpler polarity detection techniques (Naresh and Krishna, 2021; Aljedaani et al., 2022). Machine learning techniques, which train the classifier using a set of features, outperform lexicon-based techniques but are restricted to a specific application (Sankar et al., 2020; Sunitha et al., 2022). In social psychology, the aesthetics of emoticons produce a highly conscious level of processing. Affective phenomena are described by emotions, feelings, and core effects, with the core factor serving as an outward expression of our emotions and feelings (Kumar et al., 2020; Hidayat et al., 2022; Leelawat et al., 2022). People express their opinions through various social media websites such as Facebook, Twitter, and blogs.
Sentiment analysis plays a vital role in the film industry, politics, and marketing (Soumya and Pramod, 2020; Singh et al., 2022; Aslan et al., 2023). Sentiment analysis techniques recognize the crucial words in a sentence that indicate polarity, namely positivity, negativity, and neutrality. Today's world relies heavily on sentiment analysis to process natural language in social media networks. Machine learning classifiers can achieve high performance in classification and sentiment polarity identification (Faruque et al., 2022; Kumar and Vardhan, 2022; Mendoza-Urdiales et al., 2022; Wang et al., 2022). Opinion mining is a crucial NLP task that classifies extracted text features according to their emotional tendency. Because existing RNN and CNN models are complicated to train, an enhanced Bi-LSTM model, which performs well at capturing input information, has been evaluated (Alattar and Shaalan, 2021; Jabalameli et al., 2022; Yadav et al., 2023).
The Internet produces a significant amount of user-generated text data, which incorporates users' opinions and sentiments. Manually reading, summarizing, and analyzing subjective data (sentiments, opinions) in large texts is difficult, expensive, and time-consuming. For this reason, NLP is applied to text data to extract and analyze user opinions automatically (Xu et al., 2020; Gowda et al., 2022; Jain et al., 2022; Kaushik and Bhatia, 2022). A pre-processing approach for sentiment analysis of tweets in both English and Italian transforms the tweets by removing noise and extracting hidden information using BERT-based language models (Pota et al., 2021; Dwivedi and Pathak, 2022). Emotion recognition through emotion correlation mining using NLP demonstrated the complex nature of both public and private emotions. The recognition was performed on the title, comments, and body, and emotions were easily mispredicted as anger (Wang et al., 2021b; Alkhaldi et al., 2022). A knowledge-enabled BERT model for sentiment analysis guides the input sentence embedding, extracts data from a sentiment knowledge graph, and recognizes the related information to improve the performance of sentiment analysis (AlBadani et al., 2022; Swamy et al., 2022). Using multi-domain sentiment classification, a continuous naive Bayes framework for large-scale product evaluations on e-commerce sites can reduce computation for such reviews and enhance the capability of continuous learning from different domains (Neogi et al., 2021; Narwal and Aggarwal, 2022). Statistical methods are used for classification, and the Bidirectional Encoder Representations from Transformers with Bidirectional Long Short-Term Memory (BERT-BiLSTM) model is used to determine the orientation of users' statements, which improves performance with respect to the quality and style of online comments on social media networks (Li et al., 2021; Palomino and Aider, 2022). Existing algorithms were developed for fake news detection on social media but do not address the classification and validation of tweets. Hence, a deep learning model based on BERT was introduced to achieve higher classification accuracy than conventional machine learning models (Chintalapudi et al., 2021; Qian et al., 2022).
Most existing research does not consider word ambiguities, which makes sentences harder to understand. Training on the context of important key terms is also ineffective, which results in a large loss. To overcome these problems, a BERT autoencoder is proposed in this research. The motivation of this research is described as follows:
• To classify and validate Twitter tweets for sentiment analysis using the WF-BERT method. To extract features from Twitter data, user orientation such as word ambiguities and the polarity of tweets is considered in this work.
• To analyse sentiments based on the distributed opinion of tweets by using a knowledge base, where emotional words are extracted using WordNet and the sentiment polarity of the words is verified.
• To enhance the sentiment computation on user tweets, a dynamic weight function for the BERT autoencoder is proposed.
The paper is organized as follows: Section 2 describes the proposed methodology, Section 3 presents the results and discussion of the proposed method, and Section 4 concludes the research.
Giménez et al. (2020) applied semantic-based padding in a Convolutional Neural Network (CNN) for NLP tasks such as sentiment analysis, using the different words embedded in a sentence. The model achieved its accuracy without applying a padding strategy. Its advantage is that, with pre-trained word embeddings and the semantic-based padding strategy, performance and accuracy surpassed the state of the art. Combined methods such as CNN and NLP are limited by overlapping objects and low 3D viewpoint variation. Future research focuses on extending the model to other domains, building on the network's improved interpretability.
Pathak et al. (2021) developed topic-level sentiment analysis using a deep learning model that used latent semantic indexing to extract the topic at the sentence level. By performing online topic detection and dynamic topic modelling on short-text data streams, this approach makes scalable modelling possible, which is its advantage. Its limitation is that the model was implemented to detect only a single topic. With a slight change in method, it could be extended to detect multiple topics and perform sentiment classification simultaneously in parallel.
Tao and Fang (2020) presented multi-label sentiment classification with a transfer learning approach using two models, Aspect Based Sentiment Analysis (ABSA) and Aspect Enhanced Sentiment Analysis (AESA). ABSA was used to obtain aspect sentiments within a large amount of labelled data, whereas existing methods leave out the entity aspects of certain sentiments. The proposed multi-label classification with the ABSA and AESA
methods helps solve analytic problems in classification by considering both entity aspects, and it outperformed existing methods. A drawback is that the work was restricted to certain sentence levels; future work can investigate performance at other sentence levels.
Villavicencio et al. (2021) presented a Naive Bayes model that applies NLP methods to analyze and understand the sentiment of Twitter tweets from Filipinos during the COVID-19 pandemic. The data was gathered using social networking sites to help the Philippine government understand the public response via Twitter. The collected data was trained and analyzed using the Naive Bayes model and the RapidMiner data science software, and the results showed 81.77% accuracy when compared to conventional methods that carried out sentiment analyses on Philippine Twitter data. The tweets in English and Filipino were classified as neutral, positive, or negative. As future work, to give a more complete understanding of the sentiment of the tweets, the responses can be classified further into emotions such as happy, sad, annoyed, afraid, or angry.
Malhotra et al. 
(2021) presented a bidirectional model motivated by Universal Language Model Fine-tuning (ULMFiT), which is based on transfer learning. The bidirectional model performs well on aspects of sentiment analysis such as regularization and contextualization. The main objective of the model was to pre-train and fine-tune the language model on specific data, where the weights of the ASGD Weight-Dropped Long Short-Term Memory (AWD-LSTM) include the scaling factor zeta. The model performed both forward and backward language modelling based on transfer learning. Its drawbacks were that it was susceptible to overfitting, that static embeddings of repeated words are not suitable for polysemy, and that only the embedding layers were trained while the hidden layers were not, so the model fails to capture high-level information when the embeddings are loaded. Further, more attention mechanisms should be implemented for multi-class text classification.
Cai et al. (2020) developed an autoencoder classification model to improve sentiment classification accuracy and quality measures such as recall, precision, accuracy, and F1-score. In this model, BiLSTM was combined with an Enhanced Multi-Head Self-Attention mechanism. A four-layer autoencoder network was created to conduct sentiment analysis on movie reviews, and the IMDB movie comment dataset was used in the experiment. The results demonstrated that the model improved classification accuracy and the quality measures. Bidirectional Encoder Representations from Transformers (BERT) was used as the pre-training structure in this method instead of word2vec. As a limitation, the network layers and parameters should be reduced in future work to strengthen the model, and the use of masking models should be increased.
Wang et al. 
(2021c) presented an efficient framework for determining the sentiment polarity of a specific aspect of words in a sentence using aspect-level sentiment analysis. Initially, the BERT model was used to create embedding vectors from words. Several inter- and intra-level attention mechanisms were used to produce the hidden state representation of the sentence. Sentiment identification was improved by using various feature-focused attention mechanisms, and the model was assessed on various aspect-level sentiment analysis datasets. Further, performance on aspect-level sentiment analysis should be enhanced by incorporating extra data such as syntactic dependencies and knowledge graphs.
Laghari et al. (2018) suggested an image compression framework to assess the Quality of Experience (QoE) in cloud computing. The framework focused mainly on user satisfaction and user experience of image quality after JPEG compression, removing noise to assess the image parameters with the highest impact. The results showed that the framework achieved a high level of user satisfaction with the quality of the compressed image. However, high-resolution scaling had a strong impact on image quality, which is a limitation of this work.
Karim et al. (2021) developed an objective method with QoS techniques to measure image quality by downloading images from Facebook, Twitter, and Instagram. The QoS of these images was measured, and Twitter yielded the best compressed images of the three. However, this approach does not support large datasets, for which image quality degrades.
Karim et al. 
(2018) presented a drone for monitoring and targeting street criminals in real-time applications. It was designed around two image computation units, feature extraction and classification. A Histogram of Oriented Gradients (HOG) was used for weapon detection and a Support Vector Machine (SVM) was used for classification. The results showed better detection accuracy. However, this method does not work in shadow regions, where objects are difficult to detect.
Zhao et al. (2022) presented an attention-based classification model combining BERT and LSTM for identifying multi-channel character relationships. The relationships between characters were extracted and classified using this model, which resulted in better classification accuracy. However, this approach is not effective for cross-domain text classification.
Srikanth et al. (2022) suggested a Deep Belief Neural Network (DBN) to classify the emotions in Twitter data by combining several pre-processing techniques such as tokenisation, filtering, stemming, and building N-gram models. This approach achieved better classification accuracy owing to its fast convergence. However, the suggested approach is a complex model and requires a large amount of data to train the network.

RELATED WORKS
The common limitations of the existing methods are low classification and detection accuracy, data constraints, overfitting, vanishing gradients, low quality of image text, and low quality measures. To overcome these limitations, a BERT autoencoder with a dynamic weight function is proposed for sentiment classification of text data. The following section describes how the proposed method is developed to overcome these limitations.

METHODOLOGY
In this research, an improved model for detecting and classifying negative sentiment polarity was developed using detection algorithms and machine learning classifiers.
The block diagram of the proposed method is given in Fig. 1. The most important steps of the proposed methodology are defined as follows.

Pre-processed Tweets
It is essential to accurately handle words that vary in capitalization or spelling but convey the same meaning. Normalization ensures that these words are treated equally. Tweets that have not been preprocessed are highly unstructured and contain redundant information. To address these issues, tweet preprocessing is carried out in a series of steps. Almost every social media site is known for hashtags that represent the topic.
Preprocessing is an important step in text processing. Text is defined as a meaningful sequence of characters and can be made up of words, sentences, or paragraphs. Preprocessing techniques are used to feed data to a machine learning algorithm in a better, more natural form.

Sentence Splitter
Sentence splitting is the process of dividing text into individual sentences. Simple splitting rules often work, but in tweets the characters they rely on can also belong to hashtags, emojis, and similar tokens. Tweets are particularly interesting in that hashtags, emoticons, and other tokens hold specific meanings.
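As a minimal sketch of such a splitter, assuming a simple rule that is not specified in the paper (split only where `.`, `!`, or `?` is followed by whitespace), the following keeps hashtags and emoticons inside their sentence. The function name `split_sentences` is hypothetical.

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# Hashtags (#topic) and emoticons such as ':)' do not match this boundary,
# so they stay attached to the sentence they appear in.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the assumed punctuation rule."""
    parts = _SENTENCE_END.split(text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    print(split_sentences("Loved the movie! #mustwatch :) Will see it again."))
```

A rule this simple will mis-split abbreviations such as "e.g. this", so a production splitter would need additional handling.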

Natural Language Processing (NLP)
Fig. 1. Block diagram of proposed method

Sentiment analysis is an NLP method for evaluating the positivity, negativity, and neutrality of data. Textual data is frequently subjected to sentiment analysis to assist businesses in tracking brand and product sentiment in customer feedback and in understanding customer needs. Twitter sentiment analysis makes it possible to monitor what people are saying about a product or service on social media and can help identify irate customers or unfavourable mentions before they become more serious. Using NLP techniques, the text is tokenized and stop words such as "I," "me," "our," "your," "is," and "was" are removed. Additionally, NLP is used to clean up the text by eliminating punctuation and special characters during pre-processing.
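The cleaning steps described here (case normalization, punctuation removal, tokenization, and stop-word removal) can be sketched as follows. This is an illustrative sketch only: the stop-word set contains just the examples named in the text, whitespace tokenization is an assumption, and `preprocess_tweet` is a hypothetical name.

```python
import string

# Only the stop words named in the text; a real list would be much longer.
STOP_WORDS = {"i", "me", "our", "your", "is", "was"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def preprocess_tweet(tweet: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    text = tweet.lower()                       # normalize capitalization
    text = text.translate(_STRIP_PUNCTUATION)  # remove punctuation and '#', '@'
    tokens = text.split()                      # assumed whitespace tokenizer
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess_tweet("I was SO happy with your service!!! #great"))
```

Note that stripping `#` keeps the hashtag word itself as a token, which is one possible way of retaining the topic information that hashtags carry.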

Knowledge Base
A knowledge base is a dataset that is employed for storing and distributing knowledge. It supports knowledge collection, organization, and retrieval in order to capture the knowledge and skills of human experts and to enable decision-making.

Emotional Word Extraction
Emotional extraction refers to the application of neurological research to support the importance of emotions in interpersonal interactions at work. It entails classifying various human emotion categories, such as angry, joyful, or depressed. Emotion can be displayed in a variety of visible ways, including voice, written language, gestures, and facial expression. In essence, emotion detection in texts is a content-based classification challenge integrating ideas from the fields of machine learning and NLP. Examples of emotional words are happiness, sadness, fear, anger, surprise, and excitement.

Sentiment Polarity of a Word
The polarity of an element determines the orientation of the sentiment expressed about it, that is, whether the text portrays a positive, negative, or neutral sentiment towards its subject. The process of categorizing tweets as positive or negative is known as polarity classification.

Weight Function-based BERT Autoencoder
Transformers are the fundamental basis of BERT, which stands for Bidirectional Encoder Representations from Transformers. The transformers ensure that every output element is linked to every input element and that the correlations between them are determined by dynamically calculating their weightings. BERT is a broad machine learning framework for processing natural language. To assist machines in comprehending the meaning of ambiguous language in the text, the weight function of BERT maintains positive relationships with the surrounding text.
Each text is denoted by an embedding vector, Text_1, Text_2, …, Text_n, and these vector representations are linked to each other by a fully connected network to perform sentiment classification. By default, each word in the phrase receives the same weight, whereas weighted averages of word embeddings can increase classification performance.
A set of sentiment categories $C = \{c_1, c_2, \ldots, c_k\}$ is considered for classification, in which the parameter weighting is obtained and the resulting matrix is considered after the completion of the training phase. The probability $p(c_i)$ that each sentiment belongs to a specific category, together with the weight function of BERT, is given mathematically in equation (1).
where $W$ is the parameter weighting function, $TE$ is the vectorized representation, and $k$ is the total number of categories in sentiment classification.
Each word in the phrase received the same weight in neural language models like word2vec, and the word embedding was calculated by averaging the embedding terms. Weighted averages of word embeddings can increase performance in grouping or classification, according to recent empirical research on NLP problems.
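The difference between a uniform and a weighted average of word embeddings can be illustrated with a small sketch. The vectors and weights below are made-up toy values, and `average_embedding` is a hypothetical helper; the paper's own dynamic weight function is not reproduced here.

```python
def average_embedding(vectors, weights=None):
    """Return the (weighted) average of equally sized embedding vectors.

    With weights=None every word contributes equally, as in the plain
    averaging described for word2vec-style models.
    """
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, vectors)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    toy_vectors = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-d embeddings for two words
    print(average_embedding(toy_vectors))               # uniform weights
    print(average_embedding(toy_vectors, [3.0, 1.0]))   # first word emphasized
```

With the larger weight on the first word, the sentence vector moves towards that word's embedding, which is the effect a learned weight function is meant to exploit for sentiment-bearing terms.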
According to equation ( 2), a Softmax function is employed to normalize and acquire the class with the maximum probability.
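The normalization and class selection step can be sketched as below. The class scores and the label order are illustrative assumptions, and `predict` is a hypothetical helper rather than part of the proposed model's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels=("positive", "negative", "neutral")):
    """Return the label with the highest softmax probability and all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


if __name__ == "__main__":
    label, probs = predict([2.0, 0.5, 1.0])  # toy class scores
    print(label, [round(p, 3) for p in probs])
```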
To train the model, the cross-entropy loss $L_{CE}$ is used, and its mathematical expression is given by equation (3).
where $y_{im}$ is the true probability of the i-th sample belonging to the m-th class ($m \in C$): it is 1 if the sample corresponds to the m-th class and 0 otherwise. $p_{im}$ indicates the estimated probability that the i-th sample belongs to the m-th category. A dropout strategy is implemented to make the obtained model robust. This strategy is applied in the fully connected network that maps the vectorized representation (TE) to the sentiment categories C, and the dropout rate is set to 0.5.
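For orientation, the three relations referred to as equations (1) to (3) can be sketched in a standard form that is consistent with the definitions given in the text. This is an assumed form, not the authors' exact formulation: the score vector $z$, the bias-free linear map, the sample count $N$, and the symbols $y_{im}$, $p_{im}$, and $L_{CE}$ are notational assumptions.

```latex
\begin{align}
  z &= W \cdot TE, \\
  p(c_m \mid TE) &= \frac{\exp(z_m)}{\sum_{j=1}^{k} \exp(z_j)}, \qquad m = 1, \ldots, k, \\
  L_{CE} &= -\sum_{i=1}^{N} \sum_{m=1}^{k} y_{im} \, \log p_{im}.
\end{align}
```

Here $y_{im}$ is 1 when the i-th sample belongs to the m-th class and 0 otherwise, and $p_{im}$ is the Softmax probability assigned to class $m$ for sample $i$, so the loss reduces to the negative log-probability of the correct class summed over samples.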

Compute Sentiment Polarity for Overall Tweet
The polarity of a text is calculated by assigning a polarity score to every word that appears in the dictionary and combining these scores into a total polarity score. For instance, the overall polarity score of the text is raised if a word in the text matches a lexicon entry that is marked as positive. Polarity detection, the most popular form of sentiment analysis, involves categorizing statements as positive, negative, or neutral.
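A minimal sketch of this dictionary-based scoring is shown below. The lexicon entries and their scores are invented toy values (a real system would draw them from a resource such as the WordNet-based knowledge base mentioned earlier), and summing the scores with a threshold at zero is an assumption.

```python
# Toy polarity lexicon with made-up scores, for illustration only.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def tweet_polarity(tokens, lexicon=TOY_LEXICON):
    """Sum the scores of tokens found in the lexicon and map the sum to a label."""
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


if __name__ == "__main__":
    print(tweet_polarity(["great", "service", "bad", "parking"]))
    print(tweet_polarity(["the", "bus", "arrived"]))
```

Tokens missing from the lexicon contribute nothing, so a tweet with no known sentiment words falls into the neutral class.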

Sentiment Labeling
Annotating the sentiment label of a word, phrase, sentence, or document is known as sentiment labeling. Labeling methods include interactive, automatic, and manual tagging.

EXPERIMENTAL RESULTS AND DISCUSSIONS
The experimental evaluation is performed on Windows with Python 3.11, an Intel Core i5 processor, and 64 GB of RAM. The tweet polarity classification of the proposed WF-BERT is measured in terms of accuracy, precision, recall, and F1-score. Based on the results obtained, WF-BERT is analysed and compared with existing research.

Performance Metrics
The classification model is evaluated using accuracy, precision, recall, and F1-score.
• Accuracy: Accuracy is the fraction of all samples that are classified correctly. It is mainly used for classification problems and is reliable only when the classes are balanced. It is computed as shown in equation (4).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (4)$$
where TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative.
• Precision: Precision is the fraction of samples predicted as positive that are actually positive. It is measured using equation (5).
$$\text{Precision} = \frac{TP}{TP + FP} \quad (5)$$
• Recall: Recall is the fraction of actual positive samples that are correctly identified. Recall and precision constrain each other, so improving one tends to reduce the other. Recall is computed as shown in equation (6).
$$\text{Recall} = \frac{TP}{TP + FN} \quad (6)$$
• F1-Score: The F1-score is the harmonic mean of precision and recall, which balances sensitivity (recall) against correctness (precision) on the dataset. It is calculated as shown in equation (7).
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (7)$$
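The four metrics can be computed directly from the confusion counts, as in the sketch below. It assumes the standard binary definitions of accuracy, precision, recall, and F1-score; the counts used in the example are made up, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy counts, not taken from the paper's experiments.
    print(classification_metrics(tp=3, tn=5, fp=1, fn=1))
```

For a three-class problem such as positive, negative, and neutral, these quantities are typically computed per class in a one-versus-rest fashion and then averaged; which averaging the paper uses is not stated.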

Classification of Ternary
The proposed WF-BERT is used to classify the polarity of tweets. The positive, negative, and neutral classes are denoted P, N, and Neu. In this validation, the proposed mechanism is compared with existing techniques, including SVM (AlBadani et al., 2022), RF (Neogi et al., 2021), standard BERT, and WF-based BERT, using tweets collected through the Tweepy API. Table 1 displays the classification outputs of the proposed WF-BERT and the existing models.
The proposed model obtained high precision, recall, and F-score values on the positive and negative classes. The existing methods perform poorly on the neutral class because most of the tweets are distributed over the positive and negative classes. The conventional methods SVM and RF perform worse in all classes. Additionally, Table 2 evaluates the proposed model against conventional methodologies using recall, F-score, and precision for all tweets, and Fig. 2 shows the comparison graphically.
The SVM implementation in the existing technique, using Binary, SentiWord (SW), and TF-IDF feature extraction, led to lower performance metrics. This is accounted for by SW's use of SVM classifiers to retrieve relevant features from the positive, negative, and neutral classes. Since RF overcomes the constraint of overfitting on tweets, PCA-RF outperformed TF-IDF (i.e., nearly 74% on all tweets). Although the existing BERT benefits from the integration of the SA mechanism, it was outperformed by WF-BERT.

Quantitative Analysis of WF-BERT on Accuracy
The accuracy of the proposed WF-BERT is compared in this section with existing techniques, namely SVM, RF, LSTM-Attention, BERT, and DBN, as shown in Table 3.

Method                                      Accuracy (%)
SVM (AlBadani et al., 2022)                 37.06
RF (Neogi et al., 2021)                     63.03
LSTM-Attention (Malhotra et al., 2021)      90.00
BERT                                        90.05
DBN (Srikanth et al., 2022)                 86.10
Proposed WF-BERT                            93.66

Fig. 3 demonstrates that SVM obtained a classification accuracy of only 37.06%, lower than that of the other techniques, due to its inability to process such a large volume of Twitter data. BERT obtained an accuracy of 90.05%, but it is not suitable for long sentences. DBN achieved 86.10% classification accuracy, but it requires a large amount of data to train the network and has a complex model. WF-BERT achieved 93.66% classification accuracy, outperforming the other methods.

Quantitative Analysis in terms of Mean Rate
The proposed model's performance is validated by comparing it with existing techniques using error rates, namely Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), as shown in Table 4 and represented graphically in Fig. 4.
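The three error rates can be computed as in the following sketch. It assumes that the sentiment labels are encoded numerically (here -1, 0, and 1 for negative, neutral, and positive), which the paper does not specify; the label sequences are toy data.

```python
import math


def error_rates(y_true, y_pred):
    """Return MSE, RMSE and MAE between true and predicted numeric labels."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


if __name__ == "__main__":
    # Assumed encoding: -1 = negative, 0 = neutral, 1 = positive (toy data).
    print(error_rates([1, 0, -1, 1], [1, 1, -1, -1]))
```

Because these are error measures, lower values indicate better predictions, and RMSE is always the square root of MSE.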

Comparative Analysis
Existing models, such as the User-attributes Convolution and Recurrent Neural Network (UCRNN) model and fuzzy sentiment analysis, were implemented to improve fake news detection on social media; however, these methods do not work for classifying and validating tweets. A Convolutional Neural Network (CNN) with semantic-based padding, which uses the distinct words encoded in a sentence, has been applied to NLP tasks such as sentiment analysis, but it does not achieve suitable accuracy with pre-trained word embeddings and the semantic-based padding strategy (Giménez et al., 2020). A deep learning model that uses latent semantic indexing extracts the topic at the sentence level and allows scalable modelling because topic detection is carried out online, but it was implemented to detect only a single topic (Pathak et al., 2021). A Naive Bayes model has been used with NLP techniques to analyze and understand the sentiment of Twitter tweets (Villavicencio et al., 2021). Sentiment analysis with the WF-BERT model resulted in improved performance and accurate results, and it can be further applied to various emotions with automatic sentiment analysis. The existing models, namely LSTM-Attention (Malhotra et al., 2021), BERT-BiLSTM (Cai et al., 2020), and the knowledge-enabled BERT model (Wang et al., 2021c), are compared with the proposed model in terms of accuracy, recall, precision, and F1-score in Table 5, and the comparison is represented graphically in Fig. 5.
From Table 5, it is observed that the classification accuracy of the proposed model, 93.66%, is high compared with existing models such as LSTM-Attention (Malhotra et al., 2021), BERT-BiLSTM (Cai et al., 2020), and the knowledge-enabled BERT model (Wang et al., 2021c). The limitations of these existing prediction models, such as overfitting, low strength, and lower performance, have been overcome in the proposed model by introducing the weight function into BERT. Similarly, when the proposed WF-BERT is compared with the image compression framework with QoE (Laghari et al., 2018), WF-BERT achieves a better result, as the QoE approach is limited by high-resolution scaling. The QoS technique (Karim et al., 2021) has a similar limitation, namely degraded image quality for large datasets. To overcome this, WF-BERT enhances the image multiple times until it achieves a high-quality output image. A facial expression recognition algorithm with a recursive neural network and central loss function based on CNN (Wang et al., 2021a) is limited by the vanishing gradient problem. This can be overcome with an autoencoder, which uses a neural network to compress and reconstruct the data without leading to vanishing gradients.

CONCLUSION
Sentiment analysis is an NLP technique used to identify the sentiments in data about customers' views expressed through various mediums such as social media. It is mainly used in business to monitor product sentiment in customer feedback and to understand customer needs. WF-BERT was used for sentiment analysis to achieve better performance in the classification and validation of the information. The BERT model was able to learn the variability in data patterns, understand the language, and generalize across various NLP tasks. The suggested model overcomes the limitation that existing models, such as the UCRNN model and fuzzy sentiment analysis, have in the classification and validation of tweets. Limitations such as lack of interoperability and the need for a large amount of data were reduced using the WF-BERT model, which resulted in improved performance in various sentiment analyses. Future work focuses on developing Aspect Based Sentiment Analysis (ABSA) for effective text categorization, which can recover the text's original contextual meaning.
Wang et al. (2021a) presented a facial expression recognition algorithm with a recursive neural network and central loss function based on CNN to solve the low recognition accuracy of existing methods. The results showed improved classification and recognition accuracy of the CNN. However, the use of the RNN leads to the vanishing gradient problem in the system.

Fig. 4. The analysis of error rate for the proposed WF-BERT
The SVM model has the highest RMSE (0.91) of all the techniques, including the proposed model. The SVM technique also has high MSE and MAE, and RF has a high RMSE of 0.74, an MSE of 0.57, and an MAE of 0.10. The long-sentence tweets in the data caused BERT-SA to perform less well. The research therefore combines the Rectified Adam optimizer with WF-BERT, which obtained an RMSE of 0.40, an MSE of 0.15, and a lower error rate of 0.10. Therefore, the proposed WF-BERT surpassed the existing techniques in terms of these performance metrics.

Fig. 5. Graphical representation of existing models with the proposed model

Table 1. Outcomes of the suggested WF-BERT's classification for three categories

The graphical representation of the comparisons in Table 3 is shown in Fig. 3.

Table 3. Classification accuracy of proposed WF-BERT

Table 4. Error rate of the proposed WF-BERT

Table 5. Comparison of existing models with the adaptive learning rates with sublinear optimal memory cost optimizer with WF-BERT (proposed model)