Enhanced non-destructive of degree of pineapple juiciness using ensemble learning model based on tapping sound sensing

This research proposes an enhanced non-destructive method for classifying the juiciness of pineapples using tapping sound sensing. Ten statistical features were extracted from the waveform signals by waveform analysis. These features were then separated into a spectral feature set (3 features) and a temporal feature set (7 features). Feature importance weights were calculated, and features were selected on that basis to form 15 training datasets, which were evaluated using 10 machine learning classifiers. The ten classifiers were Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Ensemble Voting, Adaboost, Ensemble Bagging, and Ensemble Stacking. The classifiers were evaluated with accuracy and the kappa coefficient. Grid search was used to determine the important hyperparameters of the machine learning classifiers. The experimental results showed that Ensemble Voting (soft), Ensemble Stacking, and MLP outperformed the other classifiers. Each obtained an accuracy of 92.08%, with kappa coefficients of 0.8811, 0.8808 and 0.8808, respectively.


INTRODUCTION
The pineapple (Ananas comosus) is one of the most well-known tropical fruits. Pineapple is classified as a non-climacteric fruit: it does not continue ripening after being harvested. As a result, juiciness is an essential characteristic of edible pineapple and is useful for exporters and consumers who want to sort pineapples based on their ripeness and juiciness. Unfortunately, it is difficult to ascertain the degree of juiciness in pineapple using traditional non-destructive detection techniques.
Traditionally, farmers and sellers have used various non-destructive techniques to determine the juiciness degree of pineapple. Pineapple farmers or merchants typically tap the fruit skin with a rubber-tipped stick or their middle fingernail, applying a force impulse, and judge the juiciness from the resulting sound. However, this conventional classification method calls for years of training or practice. Because it depends on personal judgment, classification errors occur easily, resulting in low classification accuracy. Notably, human perception alone cannot accurately determine the juiciness level of pineapple. Various image processing techniques have been extensively employed in previous evaluation studies. They have been employed in the medical field for the critical classification of lymphoblastic cancer (Saeed et al., 2023) as well as for the detection of objects in remote sensing imaging (Wang et al., 2022). Furthermore, the agricultural sector has applied these techniques to assess the quality of fruits (Azman and Ismail, 2017; Chaikaew et al., 2019). However, it is important to note that assessing the external appearance of fruit may be inaccurate due to potential damage from the external environment.
Additionally, several studies have applied acoustic signals to assess the internal quality of agricultural products such as apples (Lashgari et al., 2020; Ekramirad et al., 2021; Zhao et al., 2021), pears (Zhang et al., 2021), and wheat (Yang et al., 2021). Furthermore, researchers have applied machine learning to statistical features, comprising 11 time-domain features, 7 frequency-domain features, and the combined set of 18 features, to identify early core browning in pear fruit. In each feature set, the minimal number of features was determined using the distance evaluation approach. As a result, the browning classifier achieved an accuracy of 93.90% using only three time-domain features (shape factor, kurtosis and square root amplitude value) and one frequency-domain feature (variance). On the other hand, the classifier for slight browning achieved an overall accuracy of 86.40% with two time-domain features (shape factor and clearance factor) and one frequency-domain feature (mean square) (Zhang et al., 2021). When applying digital signal processing, natural frequency, spectrum entropy, and zero-crossing were the three most effective components of the carob moth diagnostic method in pomegranate fruit. A classification accuracy of 97.55% was achieved using these three features (Janati et al., 2022). To enable real-time evaluation of kiwifruit firmness, 10 features were extracted from the frequency domain data using statistical characteristics. The root mean square, energy, mean, and spectral centroid of magnitude all showed strong relationships with the reference firmness indices (|R| > 0.7). The CARS-PLS model's prediction accuracy for flesh firmness, stiffness, and skin firmness in external cross-validation sets yielded $R^2_{cv}$ = 0.96, 0.95 and 0.93, respectively (Tian et al., 2022). In addition, the industrial sector has applied statistical features for machinery fault diagnosis (Hui et al., 2017; Lei et al., 2017). Machine learning methods are also a primary focus for diagnosing, treating, and overseeing cognitive rehabilitation in individuals with neurological disorders, where EEGs are employed to monitor and analyze brain activity (Das et al., 2023).
Machine learning methods can be used to evaluate quality by analyzing data collected from sensing devices. This is a useful tool since it can be used to find flaws in products and raise their quality (Nturambirwe and Opara, 2020). However, high-dimensional data is problematic in machine learning, especially when there are numerous characteristics from which feature significance must be extracted. Redundant data and noise in a dataset can be removed using statistical methods. This is important because feature selection is crucial for models such as those classifying the phenotypes of colorectal cancer cases, and many different methods have been proposed for selecting the most important features in a dataset (Cenggoro et al., 2019). Among feature selection techniques based on RFs and extra trees for malware detection with ensemble classification, random forests (RF) showed the highest performance (Gbenga et al., 2021). Additionally, Chen et al. (2020) used the RF, Boruta, and Recursive Feature Elimination (RFE) selection methods to select essential features and compared different machine learning algorithms for classification analysis. In all experiment groups, the RF algorithm outperformed the other algorithms.
However, no prior research has used statistical features extracted from waveform signals, together with feature selection, for pineapple juiciness classification with machine learning methods. Signal processing might be a more suitable choice than image processing, because adverse external factors can introduce errors and inaccuracies into an image-based assessment. Therefore, this paper proposes statistical features of acoustic sensing and 10 machine learning classifiers to classify the juiciness level of pineapple. The juiciness level is divided into three classes. Juiciness 1 is defined as the flattening sound that is produced when a pineapple is particularly juicy, sweet, and slightly acidic in flavor. Juiciness 2 is the dullness that results from fruit that is just a little bit juicy, sweet, and sour in flavor. Juiciness 3 is defined as the echo transmitted as a tympany sound when the pineapple is slightly less juicy and sweet and has a more acidic flavor (Phawiakkharakun et al., 2022).
The primary aim of this research is to address the challenge of assessing the juiciness level in pineapples using non-invasive methods, particularly by analyzing audio waveforms generated through the detection of tapping sounds on pineapples. Traditionally, fruit quality assessment has relied on subjective judgments based on personal experience, which can be time-inefficient and lead to inconsistencies and inaccuracies. Our research proposes a modern approach that utilizes machine learning techniques to provide an objective and efficient solution.
The research makes several valuable contributions:
1) The study utilizes machine learning models to assess the juiciness of pineapples, demonstrating the practical application of these models in assessing fruit quality.
2) It highlights the importance of feature selection by identifying key attributes that significantly affect classification accuracy, such as the shape factor, crest factor, and root mean square.
3) The study demonstrates notable gains in accuracy from combining features from several datasets, compared with the baseline datasets. This suggests that model performance may be enhanced through data integration.
4) The investigation employs ensemble learning techniques, including ensemble stacking and ensemble voting, to enhance classification accuracy effectively, improving the efficiency of machine learning models.
5) The study provides a comparative analysis of various machine learning models, demonstrating variations in their accuracy when classifying pineapple juiciness.
6) Applying machine learning to assess fruit quality, particularly of pineapples, has practical implications for the fruit production and quality control industries.
7) Unlike prior research that relied on Convolutional Neural Networks (CNNs), this approach prioritizes accessibility and the ability to draw conclusions about the significance of feature engineering combined with ensemble learning, making it suitable for practical applications.
These collective contributions enhance the understanding and real-world utilization of machine learning for assessing fruit quality, with a specific focus on classifying the juiciness of pineapples. The structure of this study is as follows.
In Section 2, we present a summary of pertinent research related to the assessment of fruit quality, the utilization of acoustic signals, and the utilization of machine learning approaches. We delve into foundational investigations, explore related applications, and highlight the gaps in the literature.
In Section 3, we introduce the datasets and machine learning models used in our study. We elaborate on the features extracted from audio waveforms and the essential preprocessing steps required for effective model training.
In Section 4, we showcase the results of our experiments, illustrating how different machine learning models perform on a range of datasets. We underscore the significance of feature selection, feature integration, and the utilization of ensemble learning to improve the accuracy of classification.
In Section 5, we examine our results thoroughly, emphasizing the importance of distinct attributes such as crest factor, shape factor, and root mean square in the classification process. Additionally, we explore the advantages of utilizing ensemble classification techniques to enhance the precision of our classifications.
Section 6 summarizes the findings of our research and their significance in the context of quality assessment. We emphasize the disruptive nature of our methodology and its potential for wider adoption within the agricultural postharvest industry.

MATERIALS AND METHODS
The experimental process for classifying the juiciness of Sriracha pineapples is shown in Fig. 1. The process begins with Step 1, which involves acquiring data from 30 pineapple samples using a mobile phone. In Step 2, 1,200 audio waveforms are input and divided into three classes based on juiciness levels (Juiciness 1, Juiciness 2 and Juiciness 3). Step 3 involves audio preprocessing, where the spectral centroid, spectral bandwidth, spectral roll-off, crest factor, skewness, zero-crossing rate, root mean square, impulse factor, shape factor, and kurtosis are extracted as features from the waveform.
In Step 4, feature importance scores are calculated, and the datasets are prepared by selecting features based on their importance rank.
Step 6 employs an evaluation model comprising SVM, RF, GBM, XGB, KNN, Multi-Layer Perceptron (MLP), Ensemble Voting, Adaboost, Ensemble Bagging, and Ensemble Stacking, with accuracy and the kappa coefficient serving as the metrics. Finally, in Step 7, the classification results are visualized and compared among the different models.

Dataset Preparation
Thirty Sriracha pineapples were collected and tapped with a rubber-tipped stick. The tapping sound was recorded using the Motiv audio software in real environments (Phawiakkharakun et al., 2022). This non-destructive recording approach is based on the technique used by pineapple sellers and farmers, who tap on pineapple samples to identify their juiciness. There are two methods of tapping: (a) using a stick with a rubber tip and (b) using the middle fingernail, as shown in Fig. 2.

Fig. 2. The process of assessing the level of pineapple's juiciness
The pineapple samples were sorted by pineapple connoisseurs or pineapple farmers into three groups of identical size: Juiciness 1, Juiciness 2, and Juiciness 3 (Fig. 3(a)). Consequently, ten pineapples of each juiciness level are included in the dataset. Each level of the pineapple's juiciness is shown in Fig. 3(b).

Fig. 3. The pre-classified harvested pineapple fruits
The acoustic data were acquired with the impact response technique, using the rubber-tipped stick. The audio was recorded in mono at a sampling rate of 44,100 Hz and a bit depth of 16 bits per sample. Each pineapple was tested by tapping it five times, resulting in 40 audio waveform files per pineapple. In total, 1,200 audio waveform files were created and categorized into three groups: Juiciness 1, Juiciness 2, and Juiciness 3 each have 400 audio waveform files.

Data Exploration
Our dataset consists of 1,200 audio waveform files, of which 1,098 files (91.5% of the total) are shorter than 3 seconds. The remaining 102 audio waveform files (8.5% of the total) are longer than 3 seconds. Most of the files have durations within the range of 1.5-3.0 seconds.
The audio signals contain noise, such as variations in the duration of the first tap produced by a rubber-tipped stick hitting a pineapple. Additionally, there is ambient noise from human, animal, machine, and engine sources. These factors contribute to the presence of noise or errors in each audio signal, resulting from data acquisition in an uncontrolled environment.

Audio Pre-processing
Features were extracted from each audio signal. The spectral features consist of three representative statistical attributes, namely, the spectral centroid (F1), the spectral bandwidth (F2), and the spectral roll-off (F3). The Librosa library (McFee, 2015) was used in our research to extract the spectral features.
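To make the three spectral features concrete, the following is a minimal sketch computed from a single magnitude spectrum in plain Python. It is not the Librosa implementation used in the study: Librosa works frame by frame on a short-time Fourier transform, whereas this sketch uses one naive DFT over the whole signal. The function names, the magnitude-weighted second-order bandwidth, and the 85% roll-off threshold are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(signal, sample_rate):
    """Naive DFT; returns (frequencies, magnitudes) for the non-negative bins."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        mags.append(abs(coeff))
    return freqs, mags


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


def spectral_bandwidth(freqs, mags):
    """Magnitude-weighted standard deviation around the centroid (assumed order 2)."""
    fc = spectral_centroid(freqs, mags)
    return math.sqrt(sum(m * (f - fc) ** 2 for f, m in zip(freqs, mags)) / sum(mags))


def spectral_rolloff(freqs, mags, roll_percent=0.85):
    """Lowest frequency below which roll_percent of the total magnitude lies.

    The 0.85 threshold is an assumption; the text does not state one.
    """
    threshold = roll_percent * sum(mags)
    running = 0.0
    for f, m in zip(freqs, mags):
        running += m
        if running >= threshold:
            return f
    return freqs[-1]


# Synthetic check signal: a pure 100 Hz tone sampled at 800 Hz (not real tap data).
sample_rate = 800
signal = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(64)]
freqs, mags = magnitude_spectrum(signal, sample_rate)
centroid = spectral_centroid(freqs, mags)
bandwidth = spectral_bandwidth(freqs, mags)
rolloff = spectral_rolloff(freqs, mags)
```

For a pure tone, all three quantities collapse onto the tone frequency; a real tapping sound spreads energy over many bins, which is what makes these features informative.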
In addition to these, seven statistical features were extracted as temporal features, namely, crest factor (F4), skewness (F5), zero-crossing rate (F6), root mean square (F7), impulse factor (F8), shape factor (F9) and kurtosis (F10) of the audio signal. The identification of juiciness classes was carried out using both spectral and temporal feature datasets, as shown in Table 1.
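The seven temporal features can be sketched in a few lines of plain Python. The formulas below are the definitions commonly used in vibration and fault-diagnosis work (for example, crest factor as peak over RMS); they are assumed here and may differ in detail from the formulas in Table 1. The function name `temporal_features` is hypothetical.

```python
import math


def temporal_features(x):
    """Compute seven time-domain statistics of a signal given as a list of floats.

    Assumed definitions:
      crest factor   = peak / RMS
      shape factor   = RMS / mean(|x|)
      impulse factor = peak / mean(|x|)
      skewness, kurtosis = standardized 3rd and 4th central moments
      zero-crossing rate = fraction of adjacent sample pairs that change sign
    """
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return {
        "crest_factor": peak / rms,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "zero_crossing_rate": crossings / (n - 1),
        "root_mean_square": rms,
        "impulse_factor": peak / abs_mean,
        "shape_factor": rms / abs_mean,
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
    }


# Toy alternating signal, used only to check the arithmetic.
features = temporal_features([1.0, -1.0, 1.0, -1.0])
```

Crest, impulse, and shape factor are all ratios, so they are insensitive to the overall recording level, which may help when taps are recorded at varying distances in an uncontrolled environment.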
An excessive amount of information and irrelevant data can lead to bias, which can affect the accuracy of machine learning outcomes. Therefore, it is essential to focus on important features that have a significant impact on machine learning. Feature importance approaches are employed for the computation of a score across all input features in a machine learning model. The scores provide an indication of the "significance" of each feature, where a higher score implies that the feature will exert a more substantial influence on the model's parameters.
Our research analyzes the features to identify which ones are effective in classifying the data. RFs were used in these experiments. The top three features identified were crest factor, shape factor, and root mean square, with feature importance values of 0.1796, 0.1661, and 0.1341, respectively. The feature importance values for spectral centroid, zero-crossing rate, impulse factor, spectral roll-off, skewness, spectral bandwidth, and kurtosis were 0.0924, 0.0881, 0.0862, 0.0832, 0.0633, 0.0547 and 0.0522, respectively.
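Grouping features by importance thresholds can be sketched as follows. The scores are the values reported above; attributing 0.0881 to the zero-crossing rate is an assumption, made because crest factor already appears among the top three. The helper `select` and the variable names are hypothetical.

```python
# Feature importance scores as reported in the text.
# Assumption: the 0.0881 score belongs to the zero-crossing rate.
importance = {
    "crest_factor": 0.1796,
    "shape_factor": 0.1661,
    "root_mean_square": 0.1341,
    "spectral_centroid": 0.0924,
    "zero_crossing_rate": 0.0881,
    "impulse_factor": 0.0862,
    "spectral_rolloff": 0.0832,
    "skewness": 0.0633,
    "spectral_bandwidth": 0.0547,
    "kurtosis": 0.0522,
}


def select(scores, low, high):
    """Return feature names whose score lies in the half-open range [low, high)."""
    return [name for name, score in scores.items() if low <= score < high]


top_group = select(importance, 0.1, 1.0)    # importance greater than 0.1
mid_group = select(importance, 0.08, 0.1)   # importance between 0.08 and 0.1
low_group = select(importance, 0.0, 0.08)   # remaining low-importance features
total = sum(importance.values())
```

The ten scores sum to approximately 1, as expected for normalized random forest importances, which is a quick sanity check on the reported values.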
The dataset comprises 15 sets of experiments. X1 includes all features (F1-F10). X2 includes crest factor (F4), shape factor (F9), and root mean square (F7) (X2 is a dataset with feature importance values greater than 0.1). X3 includes spectral centroid (F1), zero-crossing rate (F6), impulse factor (F8), spectral roll-off (F3), skewness (F5), spectral bandwidth (F2) and kurtosis (F10) (X3 is a dataset with feature importance values less than 0.1). X4 includes spectral centroid (F1), zero-crossing rate (F6), impulse factor (F8), and spectral roll-off (F3) (X4 is a dataset with feature importance values between 0.08 and 0.1). X5 includes skewness (F5), spectral bandwidth (F2), and kurtosis (F10) (X5 is a dataset with feature importance values less than 0.07). The remaining datasets (X6-X14) combine the top three features (crest factor, shape factor, and root mean square) with X3 and X5. X15 comprises a combination of X2 and X4, eliminating the skewness, spectral bandwidth, and kurtosis features.
To conduct the experiment, all datasets were split randomly into two groups, with 80% used to train the model and 20% used to test the performance of the model on unseen data, as shown in Fig. 4. The examined features for classification, comprising the 15 datasets and the target output (juiciness class), were applied to the development of a variety of machine learning models. The implemented machine learning models were SVM, RF, GBM, XGB, KNN, MLP, Ensemble Bagging, Adaboost, Ensemble Voting, and Ensemble Stacking. To carry out an optimized analysis, the hyperparameters of each ML model were fine-tuned.
Notes to Table 1: (1) S(k) denotes the spectral magnitude at frequency bin k, f(k) the frequency at bin k, and fc the spectral centroid. (2) Rt is the roll-off frequency, and Mt is the magnitude of the n-th frequency component of the spectrum. (3) x(n) is a signal series for n = 1, 2, …, N, where N is the number of data points.
We utilized grid search to determine the essential hyperparameters. Grid search finds the best hyperparameters for a model by trying out all possible combinations of hyperparameter values within a specified range and applying each combination to the learning process. Table 2 presents the optimal hyperparameters selected via grid search for SVM, RF, GBM, XGB, KNN, MLP, bagging, and Adaboost, together with the classification accuracy achieved with them.
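The exhaustive search described above can be sketched in plain Python. The scoring function here is a hypothetical stand-in for "train the model with these hyperparameters and return its validation accuracy", and the parameter names `C` and `gamma` and their candidate values are illustrative assumptions, not the grid from Table 2.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one.

    param_grid maps a hyperparameter name to its list of candidate values;
    score_fn takes a dict of hyperparameters and returns a score to maximize.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for validation accuracy; peaks at C=10, gamma=0.1."""
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2


grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
```

The number of evaluations is the product of the candidate list lengths (12 in this toy grid), which is why grid search becomes expensive as more hyperparameters are tuned together.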
Ensemble voting implements both "hard" and "soft" voting. "Hard" voting relies on the predicted class labels and selects the majority vote. "Soft" voting, on the other hand, predicts the class label as the argmax of the sums of the predicted probabilities. The estimators used in ensemble voting are SVM, RF, GBM, XGB, KNN, MLP, Bagging, and Adaboost.
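The difference between the two voting rules is easiest to see on a small example. The sketch below uses three hypothetical base classifiers and made-up probabilities over three classes (indices 0, 1, 2 standing for Juiciness 1, 2, 3); it illustrates the rules only and is not the study's trained ensemble.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over the class labels predicted by each base classifier."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probas):
    """Argmax of the summed class probabilities across base classifiers."""
    n_classes = len(predicted_probas[0])
    sums = [sum(p[c] for p in predicted_probas) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: sums[c])


# Made-up probabilities from three hypothetical classifiers for one sample.
probas = [
    [0.45, 0.55, 0.00],  # classifier A narrowly prefers class 1
    [0.40, 0.60, 0.00],  # classifier B narrowly prefers class 1
    [0.90, 0.10, 0.00],  # classifier C strongly prefers class 0
]
labels = [max(range(3), key=lambda c: p[c]) for p in probas]

hard_result = hard_vote(labels)   # two of three labels are class 1
soft_result = soft_vote(probas)   # summed probabilities: [1.75, 1.25, 0.0]
```

Here hard voting returns class 1 while soft voting returns class 0, because soft voting lets one confident classifier outweigh two uncertain ones.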
Ensemble stacking, on the other hand, utilizes a meta-learning algorithm that learns how to combine the predictions from two or more base machine learning techniques most effectively. Like ensemble voting, it uses the estimators SVM, RF, GBM, XGB, KNN, MLP, Bagging, and Adaboost. However, the final estimator is logistic regression. Overall, ensemble methods are powerful techniques that can significantly enhance the robustness and accuracy of machine learning models. Based on the details of this study, the pseudocode framework for the ensemble learning classification algorithm is presented in Algorithm 1.

Evaluation
The optimal network with the best hyperparameters is selected and applied to the dataset, which is randomly split into 80% for the training dataset and 20% for the validation dataset. The validation precision and loss (error) are both recorded. The accuracy and Cohen's kappa of each model on the test dataset are compared, using the confusion matrix and classification report, to determine the model's performance. As shown in Equation (1), accuracy denotes the ratio of accurately categorized pineapple samples to the overall count of pineapple samples.

$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \tag{1}$$

Cohen's kappa is a widely used statistical measure of the level of agreement between two sets of ratings, corrected for agreement that occurs by chance. It is frequently employed to assess how well a classification model performs. The equation for computing Cohen's kappa is shown in Equation (2).

$$K = \frac{p_o - p_e}{1 - p_e} \tag{2}$$
The kappa coefficient value is denoted by K, where p_o represents the overall accuracy of the model, and p_e represents the agreement between the predictions of the model and the true class values that would be expected to occur by chance. The degree of agreement can be categorized based on the kappa coefficient value as follows: slight agreement if the value is less than or equal to 0.20, fair agreement if the value is between 0.21 and 0.40, moderate agreement if the value is between 0.41 and 0.60, substantial agreement if the value is between 0.61 and 0.80, and almost perfect agreement if the value is between 0.81 and 1.0.
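Both metrics can be computed directly from the true and predicted labels. The following is a minimal sketch in plain Python; the function names are hypothetical, and the tiny label lists are a made-up example for checking the arithmetic rather than results from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy); p_e is the agreement expected
    by chance, computed from the marginal class frequencies of both label lists.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Made-up two-class example: 3 of 4 predictions are correct.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]
acc = accuracy(y_true, y_pred)        # 3/4
kappa = cohen_kappa(y_true, y_pred)   # p_e = (2*3 + 2*1)/16 = 0.5
```

As a rough plausibility check, under the assumption that the three classes are about equally frequent in both the true and predicted labels, p_e is close to 1/3, so an accuracy of 0.9208 corresponds to a kappa of about (0.9208 - 0.3333) / (1 - 0.3333) ≈ 0.88.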

RESULTS AND DISCUSSION
In this paper, the importance scores for all input features were calculated. The analysis revealed that the crest factor had the highest feature importance value of 0.1796, followed by the shape factor with a value of 0.1661, and the root mean square with a value of 0.1341.
Notably, the crest factor, shape factor, and root mean square each obtained a feature importance value greater than 0.1. In the next step, we partitioned the features into 15 datasets (X1-X15), applied grid search to tune the hyperparameters, and reported the best parameters for each of the 15 datasets.
The X3 dataset, which comprises the features with importance values less than 0.1 (including spectral roll-off (F3), skewness (F5), spectral bandwidth (F2), and kurtosis (F10)), achieves its best accuracy of 90.00% with the SVM model. The X6 dataset combines crest factor with the features in X3; the X7 dataset combines shape factor with the features in X3; the X8 dataset combines root mean square with the features in X3; and the X9 dataset combines crest factor and shape factor with the features in X3; they achieve accuracies of 90.83%, 91.25%, and 90.83%, respectively. Moreover, other ML models can achieve higher accuracy than with the baseline dataset (X3), as shown in Fig. 5.
4. The results indicate that both ensemble voting (soft) and ensemble stacking methods can achieve a high accuracy of 92.08% in the X8 and X7 datasets, as demonstrated in Fig. 6. A blue highlight marks model performance with an accuracy greater than 90% on each dataset.
5. The X5 dataset, which comprises skewness (F5), spectral bandwidth (F2), and kurtosis (F10), is combined with the top three features (i.e., crest factor, shape factor, and root mean square), whose feature importance scores are greater than 0.1. The X10 dataset includes the crest factor combined with the features from X5, the X11 dataset includes the shape factor combined with the features from X5, and the X12 dataset includes root mean square combined with the features from X5. The X13 dataset comprises the crest factor and the shape factor combined with the features from X5, and the X14 dataset comprises the crest factor, the shape factor, and root mean square combined with the features from X5. The MLP model achieves an accuracy of 76.25% in the baseline dataset (X5) and 89.58% in the combined dataset (X14), a relative improvement of approximately 17.48% over the baseline dataset (X5). Other models can also outperform their results on the baseline dataset, as shown in Fig. 7.
6. The performance of the ensemble learning models shows that ensemble stacking achieves an accuracy of 76.67% in the X5 dataset and an accuracy of 87.92% in the X12 dataset, a relative improvement of approximately 14.67% over the baseline dataset (X5). Additionally, other models can achieve higher accuracy levels than with the baseline dataset (X5), as demonstrated in Fig. 8.
7. The features of skewness, spectral bandwidth, and kurtosis were removed from the X15 dataset because their feature importance values were less than 0.07. As a result, the X15 dataset includes the crest factor, the shape factor, root mean square, the spectral centroid, the zero-crossing rate, the impulse factor, and the spectral roll-off.
When compared with the X1 dataset, which consists of all 10 features, the MLP model achieved an accuracy of 92.08% with the X15 dataset, while the X1 dataset could only obtain an accuracy of 91.67%, as shown in Fig. 9.
The primary objective of the study was to apply machine learning models to classify the degree of juiciness in pineapples. The study included processes such as feature selection, hyperparameter tuning, and a comprehensive evaluation of multiple models. This study presents the following significant findings and results:
1. Feature Importance Analysis: The calculation of feature importance scores included all input features, leading to the identification of the three features with the highest importance values: crest factor, shape factor, and root mean square. These features showed importance values exceeding 0.1.
In this discussion, the study highlights the significance of feature selection and combination, hyperparameter tuning, and the utilization of ensemble learning methods to enhance the classification accuracy of machine learning models for determining the degree of pineapple juiciness. The findings indicate that specific features, such as crest factor, shape factor, and root mean square, play a crucial role in the classification task. When combined with other relevant features, they result in significant improvements in accuracy. Additionally, ensemble methods like ensemble stacking and ensemble voting (soft) prove effective in enhancing classification accuracy. A comparison with the previous work by Phawiakkharakun et al. (2022) is shown in Table 5.
Our previous research (Phawiakkharakun et al., 2022) utilized a CNN model to evaluate the juiciness level of Sriracha pineapple. This was done by comparing the performance of two different feature extraction methods, namely Mel Frequency Cepstral Coefficient (MFCC) and Mel-Spectrogram, utilizing acoustic sensors in conjunction with a CNN. The experimental results revealed that the combination of CNN and MFCC outperformed the other approaches, achieving an accuracy of 96.67%. In contrast, the present study employs statistical features (spectral centroid, spectral bandwidth, spectral roll-off, crest factor, skewness, zero-crossing rate, root mean square, impulse factor, shape factor and kurtosis) extracted from audio recordings as input variables for machine learning in the task of categorizing juiciness. Although the accuracy of this approach, at 92.08%, is lower than that of the preceding work, it is important to note that the strategy utilized in the earlier work may have problems with feature interpretability. As a result, the current research approach holds distinct advantages in terms of accessibility and the potential for drawing meaningful inferences about the significance of feature engineering when combined with ensemble learning. This makes it a valuable choice for applying machine learning models.

CONCLUSIONS
This research presents a non-invasive approach to assess the quality of Sriracha pineapples based on the audio waveforms of tapping sounds. The method classifies the juiciness of the pineapple using ten features extracted from spectral and temporal analyses. These features were divided into 15 datasets based on their importance scores and fed into machine learning and ensemble learning models. In this study, we employed grid search to optimize the hyperparameters of nine machine learning models. Our experiments revealed that the ensemble voting (soft) method performed the best, achieving an accuracy of 92.08% and a kappa coefficient of 0.8811 in the X8 dataset. The ensemble stacking and MLP methods both achieved an accuracy of 92.08%, with kappa coefficients of 0.8808 in the X7 and X15 datasets, respectively.
There are multiple avenues to explore in future research:
1) Developing mobile applications that utilize this method to assess pineapple juiciness in real time, making it applicable to business applications.
2) Enlarging the dataset with additional pineapple samples from various sources and environments to enhance the generalizability of the model.
3) Investigating feature engineering methods to find additional features or combinations that can increase accuracy.
4) Integrating acoustic sensing devices to automate the data collection process and provide immediate feedback to users.
5) Applying the methodology to assess the quality of other fruits or agricultural products, thereby expanding its potential impact on the agriculture and culinary sectors.
Future research has the potential to expand on the groundwork laid in this study and drive the domain of noninvasive quality evaluation through acoustic signals by tackling these challenges.

Fig. 4. The dataset features an experiment for input into the classifier

Table 1. Features description and formula of spectral and temporal features

Table 2. The selected optimal parameters using grid search

Table 3. The performance of ML models for pineapple juiciness classification. An orange highlight marks the best performance for each input dataset across all models (SVM, RF, GBM, XGB, KNN, MLP, ensemble voting (soft), ensemble voting (hard), Adaboost, ensemble bagging, and ensemble stacking). A blue highlight marks model performance with an accuracy greater than 90% on each dataset.

Table 4. The performance of ensemble learning models for pineapple juiciness classification. An orange highlight marks the best performance for each input dataset across all models (SVM, RF, GBM, XGB, KNN, MLP, ensemble voting (soft), ensemble voting (hard), Adaboost, ensemble bagging, and ensemble stacking).

Table 5. The comparison for pineapple juiciness