The Bayesian CNN-LSTM classification model to predict and evaluate learner’s performance

Learning analytics (LA) is a research domain that analyzes data from the learning process to gain a deeper understanding of learning and to enhance learning outcomes. To classify learner performance, a model is proposed that combines several deep learning techniques: a convolutional neural network (CNN), a Long Short-Term Memory (LSTM) network, and a Bayesian deep learning model. The CNN captures local information, the LSTM captures long-distance dependencies, and the Bayesian deep learning model is integrated with them to improve the accuracy and effectiveness of performance classification. The performance of the proposed model is evaluated using Accuracy, Precision, Recall and F1-Score, for which it achieved 98.18%, 97.09%, 96.38% and 95.35%, respectively. The proposed model is compared with existing models, namely LSTM and collaborative machine learning (ML) models, in terms of these performance metrics. The proposed method attained an accuracy of 98.18%, which is higher than that of the existing models.


INTRODUCTION
The significance of learning analytics (LA) lies in the interconnections among various educational sectors and disciplines, serving as a stepping stone towards reshaping educational landscapes (Kawamura et al., 2021). In recent years, the growth of information technologies and internet use has presented unique opportunities and challenges in learning and teaching (Bayazit et al., 2022). The key to success lies in customized and adaptable learning methods, ensuring that no student is left behind through the implementation of high-quality student-centered education for all (Ouyang et al., 2023a). Various approaches, supported by innovative techniques and diverse computing system architectures, will pave the way for these educational paths (Garbero et al., 2021). Machine learning (ML) models, predictive LA, adaptive learning, Educational Data Mining (EDM) and recommender systems enable personalized learning in distance, attendance-based and physical education (Vieira et al., 2021). LA data are polymorphic data in different formats that are collected, calculated and reported, and that relate to all aspects of technology-based learning, such as the learners and the learners' context (Praharaj et al., 2021).
LA includes four elements: the collection of data on learners and their learning environment, data pre-processing, the use of ML mechanisms and various other statistical methods to analyze student patterns, and interventions to improve success and avoid failure (Misiejuk et al., 2021). These components allow the development of educational early warning systems (Han et al., 2021). An early warning system makes it possible to forecast a student's academic performance at an early stage (Gomez et al., 2021). Researchers can use LA to develop a forecasting model that detects student failure early and provides suitable intervention and feedback (Nguyen et al., 2021). Classification is a supervised ML technique in which a class label is forecast for an input data sample (Worsley et al., 2021). In past research, prediction models were developed based on classification methods, and five classifiers were compared in terms of prediction accuracy to identify the best-performing classification ML algorithm (Yilmaz and Yilmaz, 2022). The Dimensionality Curse (DC) poses major challenges for health data in the development of robust Artificial Intelligence (AI) models (Laghari et al., 2022). A Quality of Experience (QoE) experiment was performed to evaluate end-user satisfaction in image compression (Laghari et al., 2018). Hyperspectral Imaging (HSI) is a relevant method for providing meaningful data about unique objects in the medical field (Karim et al., 2023). An automated method was developed for diagnosing Acute Lymphoblastic Leukemia using a convolutional neural network (CNN) (Saeed et al., 2023). An attention mechanism and a weight fusion strategy were developed by Meng et al. (2023). A federated learning technique based on prior knowledge and a bilateral segmentation network was used for image edge extraction (Teng et al., 2023).
Existing AI prediction methods concentrate on developing and optimizing the accuracy of AI algorithms rather than on using AI models to give students timely feedback and improve the quality of their learning (Ouyang et al., 2023b). It is critical to first assess the current situation of students before developing a program to maximize their performance (Veluri et al., 2022). Rets et al. (2023) addressed this requirement using a case of four years of interdisciplinary research in creating the Early Alerts Indicators (EAI) dashboard at a distance learning university. Multiple regression showed that previous knowledge and technical skills predict the final performance in the context of a course (ICT 101) (Yildirim and Gülbahar, 2022). Caspari-Sadeghi (2023) examined the use of LA in higher education for measurement purposes. Queiroga et al. (2022) applied their methods in various periods of the school year for regular secondary school and technical secondary school. Renò et al. (2022) aimed to define an efficient learning model that uses educational data to predict the outcome of the learning process. Maraza-Quispe et al. (2022) developed a model on the KNIME platform using a simple regression tree as the learner training algorithm. Gonzalez-Nucamendi et al. (2022) conducted an LA process to determine the most significant factors that lead to good academic performance. Li et al. (2022a) examined how consent propensity differs between student subpopulations by sending an email prompt to a sample of 4,000 students at their institution, stratified by gender and ethnicity. In Flanagan et al. (2022), participants submitted their models as Docker containers for evaluation and ranking on holdout synthetic data. Wong et al. (2022) reported diversified contexts of LA, the major ones being tertiary education and online learning. Students reporting more digital distraction problems obtained lower final course grades, and those reporting a stronger peer learning orientation received higher final course grades (Liao and Wu, 2022). Certain techniques are perceived with less trust and satisfaction than local feature relevance explanation techniques (Brdnik et al., 2023). The use of LA can result in tension between the Council for the Advancement of Standards in Higher Education (CAS) principles of autonomy and non-malfeasance on the one hand and the principle of beneficence on the other (O'Donoghue, 2023). Anticipation emerged in actor-networks as both fluid and stable, encouraging the problematizations and priorities of school leaders (Lunde, 2022). By taking multimodal educational data as input, ML models and deep learning networks can predict changes in students' behaviour with optimum performance.
Mubarak et al. (2021) presented a long short-term memory (LSTM) model, a kind of deep neural network (DNN), for predicting low-performing learners from video clickstream data so that timely interventions can be made. The model treats the classification of temporal sequences in video clickstream data as a major decision-making problem and addresses it to improve learner performance. For better predictions, the model learns both the critical features and functions. The model demonstrates good performance only when the batch size is small, with a minimum requirement of 50; with a large batch size, the training stage does not yield satisfactory results.
Rodríguez et al. (2020) presented a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) mechanism, an unsupervised clustering method, combined with time series analysis for the categorization of emotions using LA. The time series analysis measures the slope of the trend in the data using linear regression and determines whether the emotional metrics are increasing or decreasing in the educational field. The DBSCAN mechanism faced many critical challenges due to complex long-term fluctuations.
Christopoulos et al. (2021) suggested an ethical framework, Augmented Reality Learning Analytics (ARLEAN), which is customized to the particular characteristics of AR applications and concentrates on different learning subjects. The framework classifies the generally received input data using a common reference point and analyzes the effect that different variables may have on various method implementations. The ARLEAN method has limitations such as overfitting on small data and the inability to measure uncertainty.
Arashpour et al. (2023) suggested a hybrid method that combines a Teaching-Learning Based Optimizer (TLBO) with two ML methods, an artificial neural network (ANN) and a support vector machine (SVM), for predicting students' exam performance. For both regression and classification problems, the TLBO mechanism performs feature selection for the two ML techniques once the combination of input variables has been determined. The TLBO algorithm is run in parallel with the ANN technique to perform the feature selection. The model has the limitation that it relies on clickstream data to represent the study level, which is only suitable for online subject delivery.
Rafique et al. (2021) presented a collaborative ML approach that integrates LA for predicting students' future performance. The model highlights students at risk in the early weeks of a course and supports weak students by promoting collaborative learning. It helps teachers group students according to their performance in order to track and monitor them. The outcome shows that collaborative learning improved the students' learning capability. The dataset collected for the model was small, and the method cannot measure uncertainty.
Deep learning models are suitable for LA prediction because of their automatic learning ability and their capability to manage huge amounts of information. Various deep learning models, such as CNN and LSTM, are used in this research. Classical deep learning models are real-valued deterministic models and are not designed to predict uncertainty. To capture this uncertainty, Bayesian deep learning provides a practical framework for predicting the uncertainties of deep learning models. The contributions of this study are listed below:
• A novel model for the classification of the performance of learners is proposed using the combined deep learning models.
• The effective classification of learner's performance is achieved by combining the strengths of CNN and LSTM, alongside the integration of a Bayesian deep learning model.
The remainder of the paper is organized as follows: a complete description of the proposed method is given in Section 2. The results and discussion of the proposed method are presented in Section 3. Finally, the conclusion of the paper is presented in Section 4.

PROPOSED METHODOLOGY
In the proposed methodology, a Bayesian CNN-LSTM model is used to classify learners' performance. The methodology involves data collection, data pre-processing, feature extraction, and classification. Fig. 1 shows a pictorial representation of the proposed methodology.

Dataset
The data used in this research are taken from the MOOC video courses designed and launched by Stanford University (https://www.kaggle.com/datasets/samyakjhaveri/mooc-final). The dataset consists of videos, students and clickstream events for each week of a course.
Every week consists of a group of videos and quizzes. There are 26 longer videos in total, and the mean length of many of the videos exceeds twenty minutes (Feng et al., 2019). In the MOOC courses, every week includes one or two quizzes that are combined with the videos of that week (Misiejuk et al., 2021).

Pre-processing
The dataset used in this research is fairly large, and its tuples contain interfering data and multiple missing values (Xing et al., 2022). Such exceptions need to be handled before the data are analyzed and classified to ensure the accuracy of the model results. Data cleaning levels are needed to verify the validity and integrity of the data during analysis (Fleur et al., 2023). The first level removes records with invalid video codes or screen names, because these pre-defined names are assigned as unique IDs for the learners. The next level removes empty columns and columns that add noise to the data, such as resource and course data (Korir et al., 2023). The cleaned data are then given as input for the extraction of features from the video clickstreams.
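The two cleaning levels can be illustrated with a minimal sketch in plain Python. The field names (`screen_name`, `video_code`, `resource`, `course`, `event`, `notes`) and the validity rule (an ID is valid when it is a non-empty string) are assumptions made for illustration and are not taken from the actual dataset schema.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical column names; the real dataset schema may differ.
ID_FIELDS = ("screen_name", "video_code")
NOISE_COLUMNS = ("resource", "course")


def is_valid(record: Record) -> bool:
    """Level 1: keep a record only if its learner and video IDs are non-empty."""
    return all((record.get(field) or "").strip() for field in ID_FIELDS)


def clean(records: List[Record]) -> List[Record]:
    """Apply both cleaning levels and return new, reduced records."""
    valid = [r for r in records if is_valid(r)]
    # Level 2: find columns that are empty in every remaining record ...
    columns = {key for r in valid for key in r}
    empty = {c for c in columns if all(not (r.get(c) or "").strip() for r in valid)}
    # ... and drop them together with the assumed noise columns.
    drop = empty | set(NOISE_COLUMNS)
    return [{k: v for k, v in r.items() if k not in drop} for r in valid]


raw = [
    {"screen_name": "u1", "video_code": "v1", "event": "play", "resource": "x", "notes": ""},
    {"screen_name": "", "video_code": "v1", "event": "pause", "resource": "x", "notes": ""},
    {"screen_name": "u2", "video_code": None, "event": "play", "resource": "x", "notes": ""},
]
cleaned = clean(raw)
```

Only the first record survives here, and it keeps just the `screen_name`, `video_code` and `event` fields.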

Feature Extraction
Features are extracted from the pre-processed data based on events such as rate-change, pause, play, stop, and seek forward or backward (Jiang et al., 2023). These events are recorded as explicit features, which are extracted by analyzing the current time of the event with respect to the video time or the event timestamp (Amjad et al., 2022). The extracted features of the dataset are classified using the combined deep learning models, namely Bayesian, CNN, and LSTM, as described in the following section.
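As a rough illustration of how count-based clickstream features such as NumPL, NumPa, NumFFW and NumSBW could be derived from raw events, consider the sketch below. The event schema (`type`, `old_time`, `new_time`) is hypothetical, and inferring the seek direction from the video positions is an assumption rather than the procedure used in the paper.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical event schema: {"type": ..., "old_time": ..., "new_time": ...}.
# "old_time" and "new_time" (seconds into the video) are only read for seeks.
Event = Dict[str, Any]

FEATURE_NAMES = ("NumPL", "NumPa", "NumFFW", "NumSBW")


def count_features(events: List[Event]) -> Dict[str, int]:
    """Count play, pause and seek events for one learner-video pair."""
    counts: Counter = Counter()
    for event in events:
        kind = event["type"]
        if kind == "play":
            counts["NumPL"] += 1
        elif kind == "pause":
            counts["NumPa"] += 1
        elif kind == "seek":
            # Assumed rule: a seek that lands later in the video is a
            # forward skip, otherwise it is a backward skip.
            if event["new_time"] > event["old_time"]:
                counts["NumFFW"] += 1
            else:
                counts["NumSBW"] += 1
    return {name: counts[name] for name in FEATURE_NAMES}


session = [
    {"type": "play"},
    {"type": "pause"},
    {"type": "play"},
    {"type": "seek", "old_time": 30.0, "new_time": 90.0},
    {"type": "seek", "old_time": 90.0, "new_time": 60.0},
    {"type": "stop"},
]
features = count_features(session)
```

The fraction-based features would additionally need the video duration and the watched intervals, which are omitted here for brevity.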

Workflow of Proposed Methodology
In the proposed methodology, deep learning techniques are adopted for classification. The adopted deep learning models are Bayesian, CNN and LSTM (Li et al., 2022b). The purpose of each of these models is discussed below:

Convolutional Neural Network (CNN)
A CNN is a specific kind of neural network that is most commonly used for classification and segmentation in Natural Language Processing (NLP) and Computer Vision (Ochoa and Wise, 2021). Convolution takes the input data and selects features from the input matrix. The embeddings from various courses are combined into a 2-D array, and the result is passed to a convolutional layer to generate new features (Marmolejo-Ramos et al., 2023). The hidden representation is formed by applying pooling to the new features, the final predictions are made by the fully connected final layers, and the final classification is performed using the SoftMax activation function.

Long Short-Term Memory (LSTM)
An LSTM is a particular kind of recurrent neural network (RNN) that can capture long-distance dependencies through the recurrent connections between units. The dataset used in this research consists of video course data, which are time series data, so the LSTM is adopted as a lightweight model that learns complex functions and features for better prediction.
The LSTM neural network uses linear memory cells surrounded by multiplicative gates that control the reading, writing, and resetting of the stored information (Fan et al., 2021). An LSTM has three gates: the input, output, and forget gates. The LSTM takes the input at the present time-step and the output of the previous time-step, and the generated result is passed to the following time-step. The LSTM model is specifically developed for sequence prediction problems. The process denoted by " " is a learning function that maps the input value to the output sequence. The mathematical form of the LSTM model is described in Equation (1).
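For reference, the gate mechanism described above is commonly written as follows. This is the standard textbook LSTM formulation and is not necessarily the form of the authors' Equation (1); the symbols ($x_t$ for the input, $h_t$ for the output, $c_t$ for the memory cell, $W$, $U$, $b$ for the parameters, $\sigma$ for the logistic sigmoid and $\odot$ for the element-wise product) are assumptions introduced here.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(output passed to the next time-step)}
\end{aligned}
```

The input gate governs writing to the memory cell, the forget gate governs resetting it, and the output gate governs reading from it.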

Bayesian Deep Learning Models
Many deep learning models have improved results in different tasks, but they do not provide any guarantee for their regression or classification predictions. Bayesian deep learning is an area at the intersection of Bayesian probability theory and deep learning. The Bayesian approach provides principled uncertainty estimation for deep learning architectures (Aguilar et al., 2022). A Bayesian deep learning network infers the posterior distribution P(W|D) over the weights W given the data D rather than a point estimate. The above-mentioned deep learning models are combined in the proposed method for classification.
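As a reminder of what the posterior distribution means, Bayes' theorem and the resulting predictive distribution can be written as below. This is the generic textbook formulation rather than an equation taken from the paper; $P(W)$ denotes the prior over the weights, and $(x^{*}, y^{*})$ denotes a new input and its label, both of which are notational assumptions.

```latex
P(W \mid D) = \frac{P(D \mid W)\, P(W)}{P(D)},
\qquad
p(y^{*} \mid x^{*}, D) = \int p(y^{*} \mid x^{*}, W)\, P(W \mid D)\, dW
```

A point-estimate network keeps a single value of $W$, whereas the Bayesian view averages predictions over all plausible weights, which is what yields an uncertainty estimate.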

Proposed Bayesian CNN-LSTM
The proposed method adopts a structure that combines CNN and LSTM neural networks to capture deep sentiment features. Combining the benefits of each network yields more complete and better outcomes, which increases prediction performance. In the proposed technique, local information is captured by the CNN and long dependencies are captured by the LSTM. The model consists of five layers: an embedding layer, a convolutional layer, a max-pooling layer, an LSTM layer, and a fully connected layer. The Bayesian CNN-LSTM includes a dropout layer with a constant dropout rate before every layer. The outcome of the model is a probability distribution rather than a point estimate, as described in Equation (2).

$$P(D) = p(x, y) = \int p(y \mid x, w)\, p(w)\, dw \qquad (2)$$
where P denotes the dropout layer and D denotes the set of input-output pairs (x, y). The proposed Bayesian CNN-LSTM technique includes an input layer that receives the MOOC course data, which are cleaned in the pre-processing stage. Multiple convolutional filters slide over the input matrix to generate feature maps; the filters have various sizes to detect particular patterns and work in parallel to produce many feature maps.
Each filter starts in the top-left corner and ends in the bottom-right corner, sliding from left to right and moving down one row at a time. Using backpropagation, the weights of the filters are updated after each training step, and the max-pooling layer computes the highest value for each filter. Applying the max operation to the output of every filter eliminates the non-maximal values, which reduces the computation for the upper layers and extracts the local dependencies within the various regions so that the significant information is kept.
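The sliding-filter and max-pooling steps can be sketched in a few lines of plain Python. For brevity the sketch works on a 1-D sequence instead of the 2-D matrix described above, uses fixed filter weights instead of learned ones, and computes the cross-correlation that deep learning libraries usually call convolution; the input values and filters are made up for illustration.

```python
from typing import List


def conv1d_valid(sequence: List[float], kernel: List[float]) -> List[float]:
    """Slide `kernel` over `sequence` one position at a time (no padding)."""
    width = len(kernel)
    return [
        sum(sequence[start + k] * kernel[k] for k in range(width))
        for start in range(len(sequence) - width + 1)
    ]


def max_pool1d(feature_map: List[float], pool_size: int) -> List[float]:
    """Keep only the largest value in each non-overlapping window."""
    return [
        max(feature_map[start:start + pool_size])
        for start in range(0, len(feature_map) - pool_size + 1, pool_size)
    ]


signal = [1.0, 2.0, 0.0, 3.0, 1.0, 4.0]
# Two filters of different sizes, applied independently (in parallel).
kernels = [[1.0, -1.0], [0.5, 0.5, 0.5]]
feature_maps = [conv1d_valid(signal, kernel) for kernel in kernels]
pooled = [max_pool1d(feature_map, 2) for feature_map in feature_maps]
```

Each filter yields its own feature map, and pooling halves the length of each map while keeping the strongest response in every window.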
The output of the max-pooling layer is given as input to the LSTM neural network to capture the long-distance dependencies of the feature sequence. The main benefit of using an LSTM neural network is its ability to capture long dependencies across regions using past data. Every region vector is combined into a text vector by the sequential layer. In the proposed model, the LSTM network captures the learner's sentiment as it changes over the course.
The output of the LSTM neural network is passed to a fully connected layer, and the SoftMax activation function is applied to produce the final classification. One advantage of this method is that local features are extracted by the initial convolutional layer, and the LSTM neural network can then use the ordering of these features to learn about the inputs.
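One common way to obtain a predictive distribution from a network with dropout layers is Monte Carlo dropout: dropout stays active at prediction time and the SoftMax outputs of several stochastic passes are averaged. The text does not state that this exact procedure is used, so the sketch below is an assumption-laden illustration on a toy single-layer classifier; the weights, input, dropout rate and number of passes are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable SoftMax."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(x: Sequence[float], weights: Sequence[Sequence[float]],
                       rate: float, rng: random.Random) -> List[float]:
    """One forward pass with dropout kept active (inverted-dropout scaling)."""
    keep = 1.0 - rate
    dropped = [value / keep if rng.random() < keep else 0.0 for value in x]
    logits = [sum(w * v for w, v in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(x: Sequence[float], weights: Sequence[Sequence[float]],
                       rate: float = 0.5, passes: int = 200,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the per-class mean and standard deviation over stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, rate, rng) for _ in range(passes)]
    classes = range(len(weights))
    mean = [sum(s[c] for s in samples) / passes for c in classes]
    std = [math.sqrt(sum((s[c] - mean[c]) ** 2 for s in samples) / passes)
           for c in classes]
    return mean, std


# Toy example: three input features, two classes, hand-picked weights.
features = [0.8, 0.1, 0.6]
toy_weights = [[1.5, -0.5, 1.0], [-1.0, 0.8, -0.3]]
mean_probs, std_probs = mc_dropout_predict(features, toy_weights)
```

The mean plays the role of the class probabilities, while the standard deviation gives a rough indication of how uncertain the model is about them.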

EXPERIMENTAL RESULTS
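In this research, the performance of the model is evaluated using metrics derived from the confusion matrix: Accuracy, Precision, Recall and F1-Score. The mathematical representations of these metrics are given in Equations (3) to (6).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$
where
• True Positives (TP) - correct classification: the predicted outcome is "yes" and the actual outcome is "yes".
• True Negatives (TN) - correct classification: the predicted outcome is "no" and the actual outcome is "no".
• False Positives (FP) - misclassification: the predicted outcome is "yes" but the actual outcome is "no".
• False Negatives (FN) - misclassification: the predicted outcome is "no" but the actual outcome is "yes".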
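The four metrics follow directly from the TP, TN, FP and FN counts. The sketch below uses the standard definitions; the counts in the example are invented for illustration and are not results from the paper, and returning 0.0 for an undefined ratio is a convention chosen here.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute Accuracy, Precision, Recall and F1-Score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is 0.85 and the recall is 0.80; precision and F1-Score are 40/45 and 80/95, respectively.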

Experimental Setup
In this research, the developed model is simulated in a Python environment on a system with 16 GB of RAM, an Intel Core i7 processor, and the Windows 10 (64-bit) operating system. The efficiency of the proposed Bayesian CNN-LSTM is evaluated on a weekly basis, with the model trained per week for every course. During the training process, the data of each week and its previous weeks are appended into a single homogeneous vector and fed to the model layers. Padding is applied to the input data to obtain equal-length vectors, which are then masked before being fed to the model layers. The masking layer discards the padded values in an input vector and keeps only the clickstream data as a sequence. The batch size is 50; with larger batch sizes at each training stage, the model did not perform well. The model considers only learners who watched at least 20 percent of the videos in a week, to guarantee that enough data are available for training and testing the proposed model.
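The input preparation described above (appending a week to its previous weeks, padding to equal length, masking the padded positions, and keeping only sufficiently active learners) might look like the following sketch. The learner names, feature values and padding value of 0.0 are hypothetical, and the 20 percent rule is simplified here to a single watched fraction per learner.

```python
from typing import Dict, List, Tuple


def cumulative_weeks(weekly: List[List[float]], week: int) -> List[float]:
    """Concatenate the features of weeks 1..week into a single vector."""
    return [value for week_features in weekly[:week] for value in week_features]


def pad_and_mask(sequences: List[List[float]],
                 pad_value: float = 0.0) -> Tuple[List[List[float]], List[List[bool]]]:
    """Right-pad sequences to equal length; the mask is True on real values."""
    length = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (length - len(seq)) for seq in sequences]
    mask = [[True] * len(seq) + [False] * (length - len(seq)) for seq in sequences]
    return padded, mask


# Hypothetical per-week clickstream features for three learners.
weekly_features: Dict[str, List[List[float]]] = {
    "learner_a": [[0.2, 0.5], [0.9]],
    "learner_b": [[0.4], [0.1, 0.3, 0.7]],
    "learner_c": [[0.6], [0.2]],
}
watched_fraction = {"learner_a": 0.75, "learner_b": 0.40, "learner_c": 0.10}

# Keep only learners who watched at least 20 percent of the videos.
eligible = [name for name in weekly_features if watched_fraction[name] >= 0.20]
vectors = [cumulative_weeks(weekly_features[name], week=2) for name in eligible]
padded, mask = pad_and_mask(vectors)
```

A masking layer would then use `mask` to ignore the padded positions, so that only genuine clickstream values influence the model.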

Quantitative Analysis
In this section, the proposed model's performance is evaluated using the confusion-matrix-based metrics Accuracy, Precision, Recall and F1-Score. Table 1 gives the performance of the model in terms of Accuracy and Precision, and Fig. 2 shows the graphical representation of the model's Accuracy and Precision on the MOOC videos.
In Table 1, various classification models are compared with the proposed Bayesian CNN-LSTM model in terms of the Accuracy and Precision performance metrics. The proposed technique performs better than the other classification models, namely ANN, SVM, logistic regression (LR) and LSTM. The proposed technique achieves an accuracy of 98.18% and a precision of 97.09%. Fig. 3 shows the graphical representation of the proposed technique's performance in terms of Accuracy and Precision.

Discussion
This section discusses the proposed method and compares its results with those of existing methods. The limitation of Mubarak et al. (2021) is that the model demonstrates good performance only when the batch size is small, with a minimum requirement of 50; with a large batch size, the training stage does not yield satisfactory results. Rafique et al. (2021) has the limitation that the dataset

CONCLUSION
In this research, a new classification technique based on deep learning is proposed for the effective classification of learners' performance. The developed model comprises three learning techniques: CNN, LSTM, and Bayesian deep learning. The benefits of the LSTM and CNN neural networks are combined with a Bayesian deep learning model for the accurate classification of learners' performance. The CNN is used to capture local information and the LSTM is used to capture long-term dependencies. This research concludes that the proposed hybrid Bayesian CNN-LSTM performs better than a single deep learning model. The proposed Bayesian CNN-LSTM method attained an accuracy of 98.18%, which is higher than that of existing methods: Mubarak et al. (2021) attained an accuracy of 96.80% and Rafique et al. (2021) attained an accuracy of 95.83%. In future work, a feature selection method will be used to improve classification performance.

Fig. 1. The architecture of the proposed methodology
The extracted features are discussed below:
• Fraction Spent (fracSpent): The time the learner spent watching the video relative to the actual duration of the video.
• Fraction Completed (fracComp): The percentage of the video that the learner watched, not counting repeated segment intervals.
• Fraction Played (fracPL): The amount of the video that the learner played, counting repeated playtimes, divided by the real-time duration of the video.
• Number of Plays (NumPL): The number of times the learner played the video.
• Number of Pauses (NumPa): The number of times the learner paused the video.
• Fraction Paused (fracPa): A real-time fraction obtained by dividing the number of times the learner paused the video by the total playback time of the video.
• Fraction Forward (Fracfrwd): The forward skips made by the learner while the video is playing are summed and divided by their time span, counting repetitions.
• Fraction Backward (Frackbkwd): The backward skips made by the learner while the video is playing are summed and divided by their time span, counting repetitions.
• Number of Seek Backward (NumSBW): The number of times the learner skipped the video backward.
• Number of Fast Forwards (NumFFW): The number of times the learner skipped the video forward.

Fig. 2. Graphical representation of the model in accuracy and precision metrics
In Table 2, the various classification models are compared with the proposed Bayesian CNN-LSTM model

Fig. 3. Graphical representation of various models in F1-Score and Recall

Fig. 4. Graphical representation of the comparative analysis of the proposed model
collected for the research was small. The existing methods have limitations such as overfitting on small data and the inability to measure uncertainty, which has a negative effect on their generalization ability, and the prediction methods also faced many critical challenges due to complex long-term fluctuations. In the proposed method, Bayesian inference is applied to deep learning to estimate the uncertainty of the predictions. The method is highly robust to overfitting and allows the uncertainty to be estimated. Among the existing methods, Mubarak et al. (2021) attained an accuracy of 96.80% and Rafique et al. (2021) attained an accuracy of 95.83%. The proposed Bayesian CNN-LSTM method attained an accuracy of 98.18%, which is higher than that of the existing methods.

Table 1. Model performance in accuracy and precision

Table 2. Model performance in recall and F1-Score

Table 3. Comparative analysis of the proposed technique