A Filter-Based Improved Multi-Objective Equilibrium Optimizer for Single-Label and Multi-Label Feature Selection Problem

    https://doi.org/10.1142/S1469026823500281

    Abstract

    Effectively reducing the dimensionality of big data while retaining its key information has been a persistent research challenge. As an important step in data pre-processing, feature selection plays a critical role in reducing data size and increasing the overall value of the data. Many previous studies have focused on single-label feature selection; however, with the increasing variety of data types, the need for feature selection on multi-label data has also arisen. Unlike single-label data, multi-label data with more combinations of classifications place higher demands on the capabilities of feature selection algorithms. In this paper, we propose a filter-based Multi-Objective Equilibrium Optimizer algorithm (MOEO-Smp) to solve the feature selection problem for both single-label and multi-label data. MOEO-Smp rates the optimization results of solutions and features based on four pairs of optimization principles, and builds three equilibrium pools to guide exploration and exploitation based on the total scores of solutions and features and on the ranking of objective fitness values, respectively. Seven UCI single-label datasets, two Mulan multi-label datasets and one COVID-19 multi-label dataset are used to test the feature selection capability of MOEO-Smp, and the feature selection results are compared with those of 10 other state-of-the-art algorithms and evaluated using three and seven different metrics, respectively. The feature selection experiments and comparisons with results in the literature show that MOEO-Smp not only achieves the highest classification accuracy and excellent dimensionality reduction on single-label data, but also performs better on multi-label data in terms of Hamming loss, accuracy, dimensionality reduction, and so on.

    1. Introduction

    The quality of the original dataset can have a great impact on the effectiveness of machine learning, so pre-processing the original dataset is essential. Feature selection is an important step in data pre-processing: by eliminating irrelevant and redundant dimensions from the original data and selecting a subset of important features, it helps improve classification performance and effectively reduces the complexity of data analysis.1 Feature selection approaches are commonly divided into three categories: wrapper-based, embedded and filter-based.2 Wrapper-based feature selection methods combine a learning algorithm with a feature selection algorithm such that the learning algorithm acts as an evaluation black box until a stopping condition is reached. As a result, wrapper-based methods tend to obtain a more effective subset of features than other methods, but they face huge computational complexity as the data size increases. Embedded feature selection methods fuse the feature selection algorithm and the learning algorithm into one; they are generally faster than wrapper methods, but both face high computational cost as the volume of data grows. The distinguishing property of filter-based feature selection methods is that the feature selection algorithm and the learning algorithm are independent of each other: the former completes the iterations and outputs feature subsets, while the latter performs further learning or evaluation. The filter-based approach has lower computational cost and complexity than the wrapper-based and embedded approaches, and is particularly suitable for processing datasets with a large number of samples or features.

    Single-label feature selection belongs to the category of supervised learning: it selects some dimensions from the original data as a feature subset and uses that subset to train and test the learning algorithm, so that the classification results of the learning algorithm are closer to the actual labels. When machine learning is used to discover or diagnose diseases and to plan preventive countermeasures, feature selection plays an important role in the timely detection and prevention of disease.3 Therefore, the single-label feature selection problem has been a hot research topic.

    However, in machine learning classification tasks in the social, biological and medical fields, one sample in the dataset may correspond to multiple classification labels at the same time, as in text classification and gene function annotation.4,5 Beyond the difficulties of single-label feature selection, selecting a feature subset for multi-label data requires not only facing 2^N feature combinations (N is the number of features in the data), but also considering the intrinsic relevance of the features to the different labels.6 The training set filtered by this feature subset must enable the classifier to achieve good classification performance for each instance among 2^S possible label combinations (S is the number of classification labels). This requires the feature selection algorithm not only to have low computational consumption, but also to provide an effective feature subset.

    Based on the equilibrium optimizer (EO), a metaheuristic algorithm proposed in 2019, this paper proposes a filter-based improved equilibrium optimizer (MOEO-Smp) to solve the single-label and multi-label feature selection problems. The proposed method accounts for the complexity of the multi-label feature selection task by adding the evaluation of feature subsets, i.e., feature-label correlation and feature-feature redundancy, to the filter-based feature selection method to improve its effectiveness. MOEO-Smp correlates changes in variables during the optimization process with changes in fitness to quickly identify key features, and uses an equilibrium pool mechanism to store them and guide the update. The main contributions of this work are summarized as follows:

    In this paper, maximum correlation and minimum redundancy are used as the two objectives of optimization, and the Pareto optimality set of feature subsets is obtained.

    Based on four pairs of reward and penalty principles, rating rules for solution and feature optimization results are set, and total solution scores and total feature scores are calculated based on rating and related metrics.

    Three equilibrium pools are built to guide the MOEO-Smp updates.

    Seven UCI single-label datasets were used to test the performance of MOEO-Smp and 10 other comparison algorithms on three metrics.

    Two Mulan multi-label datasets and one COVID-19 multi-label dataset were used to test the performance of MOEO-Smp and 10 other comparison algorithms on seven metrics. In addition, to the best of our knowledge, no one had used the EO to solve the multi-label feature selection problem prior to this work.

    2. Related Work

    2.1. Research on single-label feature selection problem

    The use of metaheuristic algorithms to solve the feature selection problem can select effective data features while keeping the computational consumption low, and has been the subject of considerable research. Li et al.7 adopted a wrapper-based multi-objective particle swarm algorithm with the two objectives of maximum true-negative rate and maximum true-positive rate, incorporating an improved PSV mechanism, a mutation operation and distance-based updating into the algorithm to increase its exploration capability and diversity. Hussain et al.8 fused the sine-cosine algorithm and the Harris hawks algorithm for large-scale feature selection to improve convergence speed, accuracy and dimensionality reduction without increasing computational consumption. Song et al.9 proposed a hybrid feature selection method based on filtering and clustering with a particle swarm algorithm, which significantly reduces the subsequent computational complexity during filtering and clustering and fully considers feature-label correlation and feature-feature redundancy. Wu et al.10 proposed a mixing-based improved binary quantum particle swarm algorithm under a filter framework, which numericalizes the dataset by the maximum information coefficient (MIC) and uses a comprehensive learning strategy to improve the diversity of solutions.

    2.2. Research on multi-label feature selection problem

    One solution to multi-label feature selection is problem transformation, which transforms a multi-label dataset into multiple single-label datasets and finally combines the classification results.11 However, multi-label feature selection methods based on problem transformation lose the correlation information between features and labels contained in the dataset, increase the computational complexity, and are not conducive to selecting an effective feature subset. Therefore, researchers have proposed feature selection methods based on algorithm adaptation. Zhang and Zhou12 proposed a multi-label k-nearest neighbor approach that predicts labels by calculating the posterior probability from the label information of the known k nearest neighbors. This approach maintains the integrity of the intrinsic relationships of the data, but its computational consumption is still high.

    In order to preserve the original intrinsic information in the dataset and reduce the computational consumption, other multi-label feature selection methods have been proposed. With the aid of metaheuristic algorithms, which are insensitive to the target space and do not depend on gradient information, a weighted multi-objective PSO was proposed for feature selection on five medical datasets and obtained better results under validation with different classifiers.13 This shows that metaheuristic algorithms are able to identify and select effective features for medically relevant datasets. Although this approach gives good results with low computational complexity, it is difficult to determine the objective weights for different datasets. Dong et al.14 proposed an improved NSGA-III algorithm based on Pareto optimality and tested it on 11 multi-label datasets; the results showed that the method has good classification performance. Paniri et al.15 also focused on the curse of dimensionality faced by multi-label learning, using a filter-based ant colony algorithm to reduce the dimensionality of multi-label data and proposing different search objectives under unsupervised and supervised conditions.

    2.3. Research on equilibrium optimizer

    The equilibrium optimizer (EO) has been used in many research areas due to its low computational complexity and excellent balance of exploration and exploitation capabilities. Zhong et al.16 added backward learning and Levy flight to the update process of EO in order to solve high-dimensional optimization problems, and considered evolutionary population dynamics to improve the algorithm's search capability; the algorithm shows good optimization results on high-dimensional benchmark test functions. Rizk-Allah and Aboul17 combined a pattern search technique with EO to solve combinatorial optimization problems, focusing on wind farm layout to optimize the siting of wind power facilities in the Gulf of Suez in the Red Sea. El-Ela et al.18 proposed a multi-objective equilibrium optimizer algorithm to solve the battery parameter assignment problem in a life-maximization system and used a fuzzy technique to select a compromise solution from the set of Pareto optimal solutions.

    Some studies have also used EO to solve the feature selection problem. Varzaneh et al.19 combined the filter and wrapper frameworks by first performing an initial filtering of the data using maximum relevance minimum redundancy (mRMR), and then performing feature selection again using a wrapper-based EO that incorporates the concepts of the Levy distribution and entropy increase. Vommi and Tirumala20 improved EO using a sine-cosine operation and bi-phase mutation within a wrapper-based framework for feature selection on single-label standard biological datasets, and the results showed that the method performs well.

    The MOEO-Smp proposed in this paper is based on the filter framework, with the optimization objectives of maximizing the feature-label correlation and minimizing the feature-feature redundancy in the dataset. It scores solutions and features based on the optimization results in order to select promising solutions and features, and guides the update of the other solutions through an equilibrium pool strategy. The method was tested on seven UCI single-label datasets, two Mulan multi-label datasets and one real-world COVID-19 dataset to examine its single-label and multi-label feature selection capabilities. Three metrics are used to evaluate the feature selection results on single-label data and seven metrics on multi-label data. The experimental results illustrate that, compared with 10 other state-of-the-art algorithms, MOEO-Smp has significantly better performance and overall better competitiveness in both single-label and multi-label feature selection.

    3. Methodology

    3.1. Equilibrium optimizer

    The equilibrium optimizer (EO) is a metaheuristic algorithm based on a physical phenomenon, with excellent global optimization capabilities.21 It is inspired by the mass balance equation in physics, which relates the inflow, outflow and generation of mass in a control volume.

    The equilibrium pool is an important mechanism of EO. It stores the solutions with the best fitness obtained so far, which are used to guide the update of the other solutions, and is expressed as follows:

    X_{eq,pool} = \{X_{eq1}, X_{eq2}, X_{eq3}, X_{eq4}, X_{eq(ave)}\}, (1)

    where X_{eq1}, X_{eq2}, X_{eq3}, X_{eq4} are the four solution vectors with the best fitness found so far, and X_{eq(ave)} is the average of these four solution vectors. After EO is initialized, each iteration follows the update rule:

    X_i^d = X_{eq}^d + (X_i^d - X_{eq}^d)\,F + \frac{G}{\lambda V}(1 - F), (2)
    F = a\,\mathrm{sign}(r - 0.5)\,[e^{-\lambda t} - 1], (3)
    t = \left(1 - \frac{iter}{T}\right)^{b\,\frac{iter}{T}}, (4)
    G = 0.5\,r_1\,\mathrm{logic}(r_2, GP)\,(X_{eq}^d - \lambda X_i^d)\,F, (5)
    \mathrm{logic}(A, B) = \begin{cases} 1 & \text{if } A \ge B \\ 0 & \text{else} \end{cases}, (6)

    where X_i^d denotes the dth dimension of the ith solution in the population and X_{eq}^d denotes the dth dimension of a randomly chosen member of the equilibrium pool X_{eq,pool}. F is the exponential term and t is the time. sign(·) is the sign function, whose value is 1 or −1. iter and T denote the current and maximum number of iterations, respectively. G is the generation rate and V is the control volume, taken as a unit. logic(A, B) is a logical judgment: if the condition holds, it equals 1, otherwise 0. λ, r, r_1 and r_2 are all uniformly distributed random numbers between 0 and 1. a, b and GP are constants equal to 2, 1 and 0.5, respectively.
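    For concreteness, a minimal sketch of one EO iteration under Eqs. (1)-(6) is given below. The population `X`, the construction of the equilibrium pool and the fitness evaluation are assumed to be handled elsewhere, and all names are illustrative (the paper's experiments run in MATLAB; Python is used here purely for illustration).

    ```python
    # A minimal sketch of one EO update step (Eqs. (1)-(6)); `eq_pool` is
    # assumed to hold the four best solutions plus their average, per Eq. (1).
    import numpy as np

    a, b, GP = 2, 1, 0.5  # constants from the paper

    def eo_step(X, eq_pool, it, T, V=1.0):
        """Update every solution against a randomly drawn equilibrium candidate."""
        N, D = X.shape
        t = (1 - it / T) ** (b * it / T)                       # Eq. (4)
        X_new = np.empty_like(X)
        for i in range(N):
            Xeq = eq_pool[np.random.randint(len(eq_pool))]     # random pool member
            lam = np.random.rand(D)
            r = np.random.rand(D)
            F = a * np.sign(r - 0.5) * (np.exp(-lam * t) - 1)  # Eq. (3)
            r1, r2 = np.random.rand(), np.random.rand()
            GCP = 0.5 * r1 * (1.0 if r2 >= GP else 0.0)        # logic term, Eq. (6)
            G = GCP * (Xeq - lam * X[i]) * F                   # Eq. (5)
            X_new[i] = Xeq + (X[i] - Xeq) * F + (G / (lam * V)) * (1 - F)  # Eq. (2)
        return X_new
    ```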

    3.2. Proposed MOEO-Smp

    Traditional algorithms usually use fitness as the main metric for evaluating optimization results; however, considering the improvement rate of a solution is equally important for finding solutions with potential.22 Over the whole iterative process, the fitness value alone cannot express the complete information about the rate of optimization. This may lead to a locally optimal solution, rather than a solution with higher optimization potential, guiding the exploration and exploitation of the other solutions, and thus to the algorithm falling into a local optimum or losing accuracy.

    Based on this, the proposed MOEO-Smp combines the characteristics of multi-objective optimization. It establishes not only an optimal equilibrium pool that stores the solutions with the best value on each objective, but also a candidate solution equilibrium pool and a candidate feature equilibrium pool, which use as one of their key indicators the improvement rate of the solutions on each objective, together with a score assigned to each solution and each feature in the population based on four pairs of reward and penalty principles and the optimization results.

    After one iteration, MOEO-Smp compares the change in fitness of each solution in the population on each objective. If the updated solution is better than the original solution, the change is called a positive phase (PP); otherwise it is called a negative phase (NP), as shown in Fig. 1, where "0" and "1" indicate that the feature is not selected or selected into the feature subset, respectively. In order to better identify and select solutions with higher improvement rates on each objective to enter the candidate solution equilibrium pool, we developed two pairs of reward and penalty principles for rating the solutions: (1) rewarding the case where all objectives are PP and penalizing the case where all objectives are NP; (2) rewarding continuous optimization and penalizing continuous stagnation. In this way, when a solution is optimized multiple times in a row on a certain objective, or optimized on all objectives, it has the opportunity to obtain a higher rating in this iteration. The rating expression for solutions is as follows:

    RS_i(iter+1) = RS_i(iter) + \mathrm{logic}([\text{alo PP}, \text{alo NP}], [\mathrm{MG}_{im}, \mathrm{MG}_{st}])\,C_1^{Tr}, (7)

    where RS_i(iter) denotes the rating of the ith solution at the iter-th iteration, with an initial value of 0. logic(·) is a logical judgment. alo PP and alo NP denote the cases where all objectives are PP and all objectives are NP, respectively. MG_{im} and MG_{st} denote exceeding the continuous optimization threshold and exceeding the continuous stagnation threshold on the objectives, respectively. [·]^{Tr} denotes the matrix transpose. C_1 denotes the solution rating matrix of size 1×4, whose elements are all constants equal to 0.25. The optimization result score of the ith solution at the (iter+1)-th iteration is defined as

    Sscore_i(iter+1) = \frac{RS_i(iter+1)}{iter+1} + \left(1 - \frac{nsf_i}{NF}\right), (8)

    where Sscore_i(iter) denotes the optimization result score of the ith solution at the iter-th iteration, with an initial value of 0; nsf_i is the number of features selected in the ith solution, and NF is the total number of features. A larger Sscore indicates a higher optimization potential of the solution.
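    A rough sketch of this solution rating and scoring step is given below. The paper does not spell out the sign convention of the reward/penalty entries, so treating rewards as additive and penalties as subtractive is an assumption; `improved`/`stagnant` are hypothetical per-objective counters of consecutive PP/NP phases.

    ```python
    # A rough sketch of the solution rating/score update (Eqs. (7)-(8)).
    import numpy as np

    C1 = 0.25  # each element of the 1x4 rating matrix is 0.25 (paper's constant)

    def rate_solution(RS_i, improved, stagnant, mg_threshold=3):
        """improved/stagnant: arrays of consecutive PP / NP counts per objective."""
        alo_PP = np.all(improved >= 1)            # all objectives improved (alo PP)
        alo_NP = np.all(stagnant >= 1)            # all objectives worsened (alo NP)
        mg_im = np.any(improved >= mg_threshold)  # sustained optimization (MG_im)
        mg_st = np.any(stagnant >= mg_threshold)  # sustained stagnation (MG_st)
        # reward PP / sustained improvement, penalize NP / sustained stagnation
        return RS_i + C1 * (alo_PP - alo_NP + mg_im - mg_st)

    def solution_score(RS_i, it, nsf_i, NF):
        """Eq. (8): rating averaged over iterations plus a sparsity bonus."""
        return RS_i / (it + 1) + (1 - nsf_i / NF)
    ```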

    Fig. 1. The situation of optimization results.

    After one iteration, the features selected in a solution also change. If a feature is selected in the updated solution but not in the original solution, the change is called select in (SI); the opposite case is called select out (SO); and the remaining cases are called unchanged (UC), as shown in Fig. 1.

    If the solution after an iteration has a better fitness value, this generally means that it has selected some important features and eliminated some redundant features. Therefore, two pairs of reward and penalty principles are set for rating each feature in a solution, according to how the solution's fitness changes and how the selected features change after the update: (1) rewarding SO when not all objectives are NP, and penalizing SI when all objectives are NP; (2) rewarding non-UC when not all objectives are NP, and penalizing UC when all objectives are NP. In this way, information about the changes in fitness caused by changes in the selected features is reflected in the feature rating, and features that contribute more to the fitness receive a higher rating. The rating expression for features is as follows:

    RF_i^d(iter+1) = RF_i^d(iter) + \mathrm{SUM}\{\mathrm{logic}([SI, SO, UC])^{Tr}\,\mathrm{logic}([\text{alo PP}, \text{alo NP}, \text{else}]) \odot C_2\}, (9)

    where RF_i^d(iter) denotes the optimization result rating of the dth feature in the ith solution at the iter-th iteration, with an initial value of 0. SUM{·} denotes the sum of all elements in the matrix and ⊙ denotes element-wise multiplication. "else" denotes any case except alo PP and alo NP. C_2 denotes the feature rating matrix of size 3×3, whose elements are the constants [0.25, 0.25, 0.25; 0.5, 0, 0.5; 0, 0.5, 0]. The optimization result score of the dth feature at the (iter+1)-th iteration is defined as:

    Fscore_d(iter+1) = \frac{RF_d(iter+1)}{N(iter+1)} + \frac{nsf_d}{N}, (10)

    where Fscore_d(iter) denotes the optimization result score of the dth feature at the iter-th iteration, with an initial value of 0; nsf_d is the number of times the dth feature is selected across all solutions, and N is the number of solutions in the population. The larger the Fscore, the greater the influence of the feature in the optimization.
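    A rough sketch of the feature rating and scoring step follows. The pairing of change states with fitness outcomes follows the text above; treating the "penalty" cells of C_2 as negative contributions is an assumption, since the paper gives only the magnitudes.

    ```python
    # A rough sketch of the feature rating/score update (Eqs. (9)-(10)).
    import numpy as np

    # rows: SI, SO, UC; columns: alo PP, else, alo NP (paper's 3x3 constants)
    C2 = np.array([[0.25, 0.25, 0.25],
                   [0.50, 0.00, 0.50],
                   [0.00, 0.50, 0.00]])

    def rate_feature(RF_id, change, outcome):
        """change: 0=SI, 1=SO, 2=UC; outcome: 0='alo PP', 1='else', 2='alo NP'."""
        delta = C2[change, outcome]
        # penalize SI or UC when all objectives are NP, reward otherwise
        penalize = outcome == 2 and change in (0, 2)
        return RF_id - delta if penalize else RF_id + delta

    def feature_score(RF_d, it, nsf_d, N):
        """Eq. (10): rating normalized by population size and iteration count,
        plus the selection frequency of the feature across the population."""
        return RF_d / (N * (it + 1)) + nsf_d / N
    ```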

    Based on the score rankings of the optimization results of solutions and features, we build the candidate solution equilibrium pool and the candidate feature equilibrium pool, expressed respectively as follows:

    X_{eq,Spool} = \{XS_{eq1}, XS_{eq2}, XS_{eq3}, XS_{eq4}, XS_{eq(ave)}\}, (11)
    X_{eq,Fpool} = \{\mathrm{randirank}[Fscore(iter), g, h]\}, (12)

    where X_{eq,Spool} is the candidate solution equilibrium pool, XS_{eq1}, XS_{eq2}, XS_{eq3}, XS_{eq4} are the four solution vectors with the best solution scores so far, and XS_{eq(ave)} is the average of these four solution vectors. X_{eq,Fpool} is the candidate feature equilibrium pool, and randirank[X, g, h] sorts all elements of the matrix X in descending order and randomly selects h% of the elements from the first g%. In addition, we build the optimal equilibrium pool, which stores the solutions that have obtained the best performance on each objective:

    X_{eq,Bpool} = \{X_1, X_2, \ldots, X_m, \ldots, X_M\}, (13)

    where X_m denotes the solution that obtains the optimal value on the mth objective and M is the number of objectives.
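    A minimal sketch of how the three pools of Eqs. (11)-(13) could be assembled is given below; the score arrays, the per-objective fitness matrix and the default g/h percentages are assumptions for illustration.

    ```python
    # A minimal sketch of building the three equilibrium pools (Eqs. (11)-(13)).
    import numpy as np

    def build_pools(X, Sscore, Fscore, fit, g=20, h=50):
        # candidate solution pool: four best-scored solutions plus their mean
        top4 = X[np.argsort(Sscore)[::-1][:4]]
        S_pool = np.vstack([top4, top4.mean(axis=0)])                 # Eq. (11)
        # candidate feature pool: random h% of the top-g% features by Fscore
        ranked = np.argsort(Fscore)[::-1]
        head = ranked[: max(1, int(len(ranked) * g / 100))]
        F_pool = np.random.choice(head, max(1, int(len(head) * h / 100)),
                                  replace=False)                      # Eq. (12)
        # optimal pool: the best solution on each objective (fit: N x M, minimized)
        B_pool = X[np.argmin(fit, axis=0)]                            # Eq. (13)
        return S_pool, F_pool, B_pool
    ```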

    The update equation of MOEO-Smp replaces the equilibrium pool in EO with the candidate solution equilibrium pool and adds the optimal equilibrium pool to guide the exploitation and exploration directions of the solutions. The expressions are as follows:

    X_i^d = XS_{eq}^d + (X_i^d - XS_{eq}^d)\,F + \frac{G}{\lambda V}(1 - F) + \left[1 - 2\left(\frac{iter}{T}\right)\right]\lambda X_m^d, (14)
    G = 0.5\,r_1\,\mathrm{logic}(r_2, GP)\,(XS_{eq}^d - \lambda X_i^d)\,F. (15)
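    A sketch of the modified update rule is shown below, assuming F and G are computed as in the EO sketch above, `XSeq` is drawn from the candidate solution pool, and `Xm` is a per-objective best solution from the optimal pool; the vectorized form is illustrative.

    ```python
    # A sketch of the MOEO-Smp update (Eqs. (14)-(15)): the equilibrium candidate
    # comes from the candidate solution pool, and a pull toward the per-objective
    # best Xm is added, whose sign flips from attraction to repulsion at iter = T/2.
    def moeo_smp_update(Xi, XSeq, Xm, F, G, lam, it, T, V=1.0):
        drift = (1 - 2 * it / T) * lam * Xm          # extra guidance term, Eq. (14)
        return XSeq + (Xi - XSeq) * F + (G / (lam * V)) * (1 - F) + drift
    ```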

    We propose an elite strategy with a nonlinear learning rate for the candidate feature equilibrium pool, as follows: (1) after each update, a nonlinear learning rate lr is generated as in Eq. (16); (2) a random number r_0 uniformly distributed between 0 and 1 is generated for each feature in the candidate feature equilibrium pool; (3) when lr is greater than r_0, the corresponding feature in the candidate feature equilibrium pool is removed from the solution X_i, thus generating the elite X_i; (4) if the elite X_i outperforms X_i on all objectives, X_i is replaced by the elite X_i; if X_i outperforms the elite X_i on all objectives, X_i remains unchanged; in all other cases, the elite X_i replaces X_i with probability 1/2.

    lr = 1 - 0.5\left[1 - \left(\frac{2\,iter}{T} - 1\right)^3\right]. (16)
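    A rough sketch of this elite strategy is given below; the Pareto dominance helper `dominates` and the callable `fitness` are hypothetical names, and objectives are assumed to be minimized.

    ```python
    # A rough sketch of the elite strategy with the nonlinear learning rate (Eq. (16)).
    import numpy as np

    def lr_nonlinear(it, T):
        """Eq. (16): nonlinear learning rate, near 0 at the start, near 1 at the end."""
        return 1 - 0.5 * (1 - (2 * it / T - 1) ** 3)

    def dominates(fa, fb):
        """Pareto dominance for minimized objectives (hypothetical helper)."""
        return np.all(fa <= fb) and np.any(fa < fb)

    def elite_step(xi, F_pool, fitness, it, T, rng=np.random):
        elite = xi.copy()                # xi: binary feature-selection vector
        for d in F_pool:                 # features from the candidate feature pool
            if lr_nonlinear(it, T) > rng.rand():
                elite[d] = 0             # drop the feature to form the elite solution
        fe, fx = fitness(elite), fitness(xi)
        if dominates(fe, fx):
            return elite                 # elite wins on all objectives
        if dominates(fx, fe):
            return xi                    # original wins on all objectives
        return elite if rng.rand() < 0.5 else xi   # otherwise replace with prob. 1/2
    ```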

    Figure 2 shows the variation of the nonlinear learning rate lr over different iteration periods. In the early iterations, lr is larger than a linear learning rate and increases at a slower rate, giving the algorithm sufficient opportunity to explore other subsets, preventing it from prematurely falling into local optima and facilitating a rapid reduction of the feature subset size. In the late iterations, lr is smaller than a linear learning rate and grows at a faster rate, allowing the algorithm to retain more optimization information and exploit it finely, which is conducive to convergence.

    Fig. 2. Learning rate in MOEO-Smp.

    In addition, the Pareto optimal solutions generated by MOEO-Smp based on the dominance relationship are saved in an archive. When the archive capacity is full, solutions are replaced based on the ranking of their crowding distances. The flow chart of the algorithm is shown in Fig. 3.
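    A minimal sketch of archive truncation by crowding distance is given below. The fixed `capacity` and array-based archive are assumptions, and the crowding measure follows the standard NSGA-II definition rather than any MOEO-Smp-specific variant.

    ```python
    # A minimal sketch of crowding-distance-based archive maintenance.
    import numpy as np

    def crowding_distance(F):
        """F: (n, m) objective values of archive members."""
        n, m = F.shape
        dist = np.zeros(n)
        for j in range(m):
            order = np.argsort(F[:, j])
            dist[order[0]] = dist[order[-1]] = np.inf         # keep boundary points
            span = F[order[-1], j] - F[order[0], j] or 1.0    # avoid divide-by-zero
            dist[order[1:-1]] += (F[order[2:], j] - F[order[:-2], j]) / span
        return dist

    def truncate_archive(archive, F, capacity):
        """Drop the most crowded members when the archive exceeds capacity."""
        if len(archive) <= capacity:
            return archive, F
        keep = np.argsort(crowding_distance(F))[::-1][:capacity]
        return archive[keep], F[keep]
    ```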

    Fig. 3. MOEO-Smp algorithm flowchart.

    4. Experiment and Discussion

    In this section, seven single-label datasets and three multi-label datasets are used to test the feature selection ability of the filter-based MOEO-Smp. In the single-label feature selection experiments, the performance of 10 other multi-objective algorithms with KNN as the classifier is compared on three metrics. In the multi-label feature selection experiments, the performance of 10 other multi-objective algorithms with MLKNN as the classifier is compared on seven metrics. The results show that the feature subsets selected by MOEO-Smp have better classification ability.

    All experiments are run on a computer with the Windows 10 operating system, an Intel(R) Core(TM) i9-13900HX at 2.20 GHz and 16.00 GB of RAM. All algorithm code is run in MATLAB R2019b.

    4.1. Metrics and parameters

    The UCI repository contains classification data from various domains and provides the most commonly used standard datasets for machine learning. The Mulan repository contains a variety of multi-label data and is widely used in multi-label feature selection studies. The experimental part of this paper uses seven UCI single-label datasets and two Mulan multi-label datasets to test the feature selection ability of the proposed method.

    In addition, to further validate the multi-label feature selection performance of MOEO-Smp, the diagnosis of COVID-19 and its clinical spectrum dataset from Kaggle is used to test the effectiveness of the proposed MOEO-Smp on a real-world multi-label feature selection problem. All the dataset information is shown in Table 1.

    Table 1. Dataset for experiment.

    Type | Dataset | Samples | Features | Labels | Classes
    Single-label | Landsat23 | 2000 | 36 | 1 | 6
    Single-label | Msplice23 | 3175 | 240 | 1 | 3
    Single-label | Pima23 | 768 | 8 | 1 | 2
    Single-label | Sonar23 | 208 | 60 | 1 | 2
    Single-label | Spambase23 | 4601 | 57 | 1 | 2
    Single-label | Wine23 | 178 | 13 | 1 | 3
    Single-label | Wpbc23 | 198 | 33 | 1 | 2
    Multi-label | Emotion24,25 | 593 | 72 | 6 | —
    Multi-label | Birds24,25 | 645 | 300 | 19 | —
    Multi-label | COVID-1926,27 | 5644 | 100 | 5 | —

    For both the multi-label and single-label feature selection problems, MOEO-Smp obtains a Pareto optimal solution set with two optimization objectives: maximizing the mutual information between features and labels, and minimizing the cosine similarity between features:

    MI(X, Y) = \sum_{x \in X}\sum_{y \in Y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}, (17)
    CS(X, Y) = \left|\frac{\sum_{i=1}^{d} X_i Y_i}{\sqrt{\sum_{i=1}^{d} X_i^2}\,\sqrt{\sum_{i=1}^{d} Y_i^2}}\right|, (18)

    where MI(X, Y) is the mutual information of X and Y, p(x) and p(y) denote the marginal probability functions and p(x, y) the joint probability. CS(X, Y) is the cosine similarity of the variables X and Y, and d is the variable dimension.
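    A minimal sketch of computing the two filter objectives of Eqs. (17)-(18) is given below. Estimating MI by histogram binning is an assumption (the paper does not state its estimator), and the aggregation of pairwise values into a single score per objective is likewise illustrative.

    ```python
    # A minimal sketch of the two filter objectives (Eqs. (17)-(18)).
    import numpy as np

    def mutual_information(x, y, bins=10):
        """MI(X, Y) via a joint histogram, Eq. (17)."""
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy /= pxy.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    def cosine_similarity(x, y):
        """CS(X, Y), Eq. (18)."""
        return abs(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    def objectives(data, labels, subset):
        """Maximize mean feature-label MI; minimize mean feature-feature CS.
        labels: (n_samples, n_labels); reshape single-label targets to a column."""
        feats = [data[:, d] for d in subset]
        mi = np.mean([mutual_information(f, labels[:, l])
                      for f in feats for l in range(labels.shape[1])])
        cs = np.mean([cosine_similarity(feats[i], feats[j])
                      for i in range(len(feats))
                      for j in range(i + 1, len(feats))] or [0.0])
        return -mi, cs   # both cast as minimization
    ```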

    In the single-label experiments, the accuracy (Acc), the number of features (NF) selected in the feature subset and the running time (time) of the algorithm are used as metrics to evaluate single-label feature selection. Acc is calculated as follows:

    Acc = \frac{TP + TN}{TP + FP + TN + FN}, (19)

    where TP denotes the samples whose predicted values match the true values and both are positive; TN denotes the samples whose predicted values match the true values and both are negative; FP denotes the samples whose true values are negative and predicted values are positive; and FN denotes the samples whose true values are positive and predicted values are negative.

    In the multi-label experiments, the Hamming loss (HL), ranking loss (RL), one-error rate (OE), coverage rate (CV) and average precision (AP) are used as the metrics to evaluate the performance of feature selection.28 In addition, the number of features (NF) and the running time (time) are also compared. Smaller HL, RL, OE, CV, time and NF indicate better performance of the algorithm, and a larger AP indicates better performance.

    Suppose D = \{(x_i, Y_i), i = 1, 2, \ldots, s\} is the test set, where x_i is an instance, s is the number of test instances, L is the label set, and Y_i is the correct label subset with Y_i \subseteq L. h is the multi-label classifier. The metrics are calculated as follows:

    HL(h) = \frac{1}{s}\sum_{i=1}^{s}\frac{1}{|L|}\,|h(x_i)\,\Delta\,Y_i|, (20)

    where |·| denotes the cardinality of a set and Δ denotes the symmetric difference between two sets. The Hamming loss calculates the percentage of misclassified labels.

    RL(f) = \frac{1}{s}\sum_{i=1}^{s}\frac{1}{|Y_i|\,|\bar{Y}_i|}\left|\{(y, y') \mid f(x_i, y) \le f(x_i, y'),\ (y, y') \in Y_i \times \bar{Y}_i\}\right|, (21)

    where \bar{Y}_i is the complement of Y_i in L and f(·, ·) is a real-valued function. The ranking loss indicates the proportion of irrelevant labels that are ranked higher than relevant labels.

    OE(f) = \frac{1}{s}\sum_{i=1}^{s}\mathrm{logic}\left(\arg\max_{y \in L} f(x_i, y) \notin Y_i\right), (22)

    where ∉ denotes the "does not belong to" relation. The one-error rate calculates the percentage of instances whose top-ranked label is not in the set of relevant labels.

    CV(f) = \frac{1}{s}\sum_{i=1}^{s}\max_{y \in Y_i}\,\mathrm{rank}_f(x_i, y) - 1, (23)

    where rank_f(x, y) denotes the rank of y in L when the labels are sorted in descending order of f(x, y). The coverage rate is the average number of steps needed to go down the label ranking to cover all relevant labels.

    AP(f) = \frac{1}{s}\sum_{i=1}^{s}\frac{1}{|Y_i|}\sum_{y \in Y_i}\frac{\left|\{y' \mid \mathrm{rank}_f(x_i, y') \le \mathrm{rank}_f(x_i, y),\ y' \in Y_i\}\right|}{\mathrm{rank}_f(x_i, y)}. (24)

    The average precision indicates the average fraction of relevant labels ranked above a particular label y ∈ Y_i.
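    A rough sketch of how the five metrics of Eqs. (20)-(24) can be computed is given below, assuming binary ground-truth labels `Y`, binary predictions `H` and real-valued scores `S`, all of shape (s, |L|); the names and shapes are illustrative assumptions.

    ```python
    # A rough sketch of the multi-label metrics HL, RL, OE, CV, AP (Eqs. (20)-(24)).
    import numpy as np

    def ml_metrics(Y, H, S):
        s, L = Y.shape
        HL = np.mean(H != Y)                                    # Eq. (20)
        rank = (-S).argsort(axis=1).argsort(axis=1) + 1         # rank 1 = top score
        RL = OE = CV = AP = 0.0
        for i in range(s):
            rel = np.flatnonzero(Y[i])
            irr = np.flatnonzero(1 - Y[i])
            if len(rel) and len(irr):                           # Eq. (21)
                RL += np.mean(S[i, rel][:, None] <= S[i, irr][None, :])
            OE += Y[i, S[i].argmax()] == 0                      # Eq. (22)
            CV += rank[i, rel].max() - 1 if len(rel) else 0     # Eq. (23)
            if len(rel):                                        # Eq. (24)
                r = np.sort(rank[i, rel])
                AP += np.mean(np.arange(1, len(r) + 1) / r)
        return HL, RL / s, OE / s, CV / s, AP / s
    ```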


    NSGA-II,29 MOPSO,30 MOEA-D,31 MOALO,32 MOHHO,33 MOTEO,34 MSSA,35 MOEA-NSGA-II,36 MOABC37 and NSGWO38 are used as comparison algorithms, with the same parameter settings as in their original papers. These algorithms are binarized using the following transfer function:

    BX = \mathrm{logic}\left(\frac{1}{1 + e^{-X}},\ 0.5\right), (25)

    where X is the solution vector and logic(·) is the logical judgment of Eq. (6), whose value is 0 or 1. If a dimension of BX is "1", the corresponding feature is selected; if it is "0", the feature is not selected. All algorithms have a population size of 30 and a maximum of 500 iterations, with KNN (k=5)39 and MLKNN (k=10)12 as the classifiers for the single-label and multi-label feature selection tests, respectively. Sixty percent of the data is used for training and 40% for testing to verify the results of feature selection.
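    A minimal sketch of the sigmoid transfer binarization of Eq. (25) and the 60/40 split is given below; the random seed and helper names are arbitrary assumptions.

    ```python
    # A minimal sketch of Eq. (25) and the 60/40 train/test split.
    import numpy as np

    def binarize(X):
        """BX[d] = 1 (feature selected) iff sigmoid(X[d]) >= 0.5."""
        return (1 / (1 + np.exp(-X)) >= 0.5).astype(int)

    def split_60_40(data, labels, rng=np.random.default_rng(0)):
        idx = rng.permutation(len(data))
        cut = int(0.6 * len(data))
        return (data[idx[:cut]], labels[idx[:cut]],   # 60% training
                data[idx[cut:]], labels[idx[cut:]])   # 40% testing
    ```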

    4.2. Single-label feature selection and experimental results

    For the single-label feature selection test, we selected seven UCI single-label datasets and compared MOEO-Smp with 10 other state-of-the-art algorithms, evaluating the results in terms of the accuracy (Acc), the number of features (NF) selected in the feature subset and the running time (time).

    Accuracy reflects the quality of the feature subsets selected by an algorithm. Table 2 shows the correct classification rates of the feature subsets selected by MOEO-Smp and the other 10 compared algorithms on the 7 single-label datasets. The results show that MOEO-Smp obtained better performance than the other 10 comparison algorithms on the msplice, pima, sonar, spambase, wine and wpbc datasets. Although MOPSO and MOALO obtained the highest accuracy on the landsat dataset, MOEO-Smp was second only to them, obtaining the third best result.

    Table 2. Accuracy (acc) of 11 algorithms on 7 single-label datasets.

    Dataset | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    landsat | Ave | 0.86405 | 0.87487 | 0.85648 | 0.87278 | 0.86634 | 0.86868 | 0.86812 | 0.86927 | 0.86937 | 0.86988 | 0.87132
    landsat | Std | 0.00709 | 0.00797 | 0.01066 | 0.01034 | 0.00519 | 0.01016 | 0.01129 | 0.00730 | 0.00828 | 0.00808 | 0.00644
    msplice | Ave | 0.73002 | 0.81594 | 0.76623 | 0.82798 | 0.79770 | 0.76906 | 0.82213 | 0.78781 | 0.79584 | 0.82266 | 0.82920
    msplice | Std | 0.00734 | 0.00721 | 0.02258 | 0.01038 | 0.01226 | 0.00677 | 0.00737 | 0.01211 | 0.00784 | 0.00853 | 0.00449
    pima | Ave | 0.67766 | 0.69383 | 0.68709 | 0.70195 | 0.69537 | 0.69234 | 0.70074 | 0.70403 | 0.70662 | 0.69252 | 0.71145
    pima | Std | 0.01693 | 0.01378 | 0.01750 | 0.01066 | 0.01923 | 0.01013 | 0.01065 | 0.01531 | 0.01410 | 0.01307 | 0.01390
    sonar | Ave | 0.73631 | 0.73700 | 0.72968 | 0.74605 | 0.71460 | 0.72766 | 0.73256 | 0.73714 | 0.73958 | 0.73188 | 0.74633
    sonar | Std | 0.02608 | 0.02493 | 0.03780 | 0.04301 | 0.02319 | 0.03125 | 0.03797 | 0.04022 | 0.03722 | 0.03202 | 0.02560
    spambase | Ave | 0.82361 | 0.83073 | 0.82872 | 0.83207 | 0.82723 | 0.81583 | 0.83143 | 0.81803 | 0.81679 | 0.83033 | 0.83334
    spambase | Std | 0.00986 | 0.01008 | 0.03678 | 0.01076 | 0.01957 | 0.00557 | 0.01479 | 0.01207 | 0.00880 | 0.00995 | 0.00926
    wine | Ave | 0.73631 | 0.73700 | 0.72968 | 0.74605 | 0.71460 | 0.72766 | 0.73256 | 0.73714 | 0.73958 | 0.73188 | 0.74633
    wine | Std | 0.01649 | 0.03546 | 0.05109 | 0.03102 | 0.03677 | 0.02612 | 0.03534 | 0.02994 | 0.02812 | 0.03879 | 0.04417
    wpbc | Ave | 0.82361 | 0.83073 | 0.82872 | 0.83207 | 0.82723 | 0.81583 | 0.83143 | 0.81803 | 0.81679 | 0.83033 | 0.83334
    wpbc | Std | 0.02957 | 0.02836 | 0.02505 | 0.02601 | 0.02215 | 0.03154 | 0.02824 | 0.02778 | 0.03537 | 0.02659 | 0.02440

    A smaller number of features in the feature subset indicates that the algorithm performs better in dimensionality reduction. Table 3 shows the average number of features in the feature subsets of MOEO-Smp and the other 10 compared algorithms on the 7 single-label datasets. The feature subsets of NSGA-II have the fewest features on landsat, pima, sonar and spambase. The feature subsets of MOEA-D have the lowest average dimensionality on wine and wpbc. The feature subset of MOEO-Smp has the lowest number of selected features only on msplice. The number of selected features is one of the metrics for evaluating the goodness of feature subsets, but the correct classification rate is a more important evaluation metric.3 Although MOEO-Smp has the lowest number of selected features only on msplice, it obtained at least the second or third best performance on landsat, pima, sonar, spambase, wine and wpbc compared to the other 10 algorithms; moreover, compared with NSGA-II and MOEA-D, which have the best dimensionality reduction, the feature subsets of MOEO-Smp have significantly higher classification accuracy. Thus, the feature subsets selected by MOEO-Smp not only have good dimensionality reduction capability, but also maintain high classification accuracy.

    Table 3. Number of features selected (nf) by feature subsets of 11 algorithms on 7 single-label datasets.

    Dataset | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    landsat | Ave | 14.12167 | 21.89667 | 10.45500 | 23.13909 | 19.01309 | 20.23375 | 23.08132 | 23.86667 | 24.94667 | 22.19000 | 17.67405
    landsat | Std | 0.56891 | 0.62976 | 0.51131 | 0.81957 | 1.15581 | 0.31217 | 0.62239 | 0.59697 | 0.53709 | 0.69012 | 0.54178
    msplice | Ave | 95.04000 | 144.06533 | 121.42500 | 137.50391 | 120.77100 | 132.47333 | 140.16219 | 156.96333 | 156.36333 | 149.17333 | 115.28873
    msplice | Std | 1.14205 | 3.31980 | 1.97012 | 4.60109 | 3.82695 | 1.06457 | 2.46209 | 1.60372 | 1.33442 | 2.75253 | 3.83372
    pima | Ave | 3.19667 | 4.36393 | 3.24667 | 3.92306 | 3.93562 | 4.59865 | 3.99513 | 5.27552 | 5.40667 | 4.02667 | 4.03009
    pima | Std | 0.33472 | 0.21445 | 0.32810 | 0.30529 | 0.24421 | 0.10530 | 0.38382 | 0.32085 | 0.24557 | 0.11877 | 0.18821
    sonar | Ave | 23.75667 | 37.27333 | 22.20167 | 38.71833 | 31.87915 | 33.33292 | 38.90833 | 39.50500 | 40.80667 | 38.11000 | 29.02443
    sonar | Std | 0.49265 | 0.64233 | 0.63305 | 1.38308 | 1.74063 | 0.42579 | 1.02289 | 0.99256 | 0.72100 | 0.62165 | 0.93966
    spambase | Ave | 22.48333 | 36.52097 | 22.78333 | 37.45404 | 31.16668 | 32.63417 | 37.34466 | 37.58833 | 38.93333 | 38.10833 | 29.28637
    spambase | Std | 0.70781 | 0.59307 | 1.51254 | 1.58474 | 1.79562 | 0.46531 | 0.91914 | 0.38056 | 0.57968 | 0.66084 | 1.02480
    wine | Ave | 5.11833 | 7.69167 | 2.44500 | 7.30684 | 6.72307 | 7.41958 | 7.43764 | 8.69333 | 9.06500 | 7.01833 | 6.35864
    wine | Std | 0.29369 | 0.35752 | 0.30517 | 0.48742 | 0.57952 | 0.22111 | 0.34981 | 0.35731 | 0.34088 | 0.24266 | 0.33306
    wpbc | Ave | 12.97833 | 19.96333 | 9.13833 | 20.79333 | 16.94255 | 18.29917 | 20.97167 | 22.05667 | 23.04000 | 20.22000 | 15.84511
    wpbc | Std | 0.60246 | 0.62078 | 0.53642 | 0.80158 | 1.08818 | 0.43982 | 0.70165 | 0.50831 | 0.53274 | 0.58223 | 0.60241

    The running time is the speed at which the algorithm performs feature selection and reflects the complexity of the algorithm. Table 4 shows the average running time of each algorithm per independent run on the seven datasets. NSGA-II and NSGWO have the fastest average running speed, while MOEO-Smp ranks in the middle of the 11 algorithms. This indicates that MOEO-Smp, like most of the other compared algorithms, does not have a particularly fast running speed, mainly because the multiple equilibrium pool mechanism increases the spatial complexity. However, Tables 2 and 3 show that this mechanism effectively reduces the data dimensionality while maintaining high classification accuracy, which is especially important for subsequent learning algorithms in further data processing. In addition, MOEO-Smp outperforms MOEA-D, MOALO, MOTEO and MOABC in terms of running time, and differs from MOPSO and MOHHO by about 1 s on average, while ensuring higher classification accuracy and better dimensionality reduction.

    Table 4. Average running time (time) of 11 algorithms on 7 single-label datasets.

    Dataset | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    landsat | Ave | 0.20076 | 3.42221 | 5.10736 | 5.54861 | 3.85223 | 6.43558 | 0.34519 | 1.47052 | 5.26922 | 0.88518 | 4.27232
    landsat | Std | 0.01438 | 0.20855 | 0.27260 | 0.09011 | 0.33037 | 0.08087 | 0.02326 | 0.08079 | 0.05885 | 0.01799 | 0.18835
    msplice | Ave | 0.90097 | 4.79104 | 4.94279 | 43.12827 | 5.19492 | 9.58378 | 1.13013 | 6.26653 | 6.97483 | 4.56272 | 3.39810
    msplice | Std | 0.03504 | 5.33075 | 0.32751 | 3.73052 | 0.39579 | 0.13295 | 0.03468 | 0.15439 | 0.12132 | 0.18416 | 0.49944
    pima | Ave | 0.17297 | 2.91390 | 6.37377 | 1.39006 | 3.35323 | 6.53927 | 0.17543 | 1.70810 | 4.89595 | 0.88231 | 1.95560
    pima | Std | 0.00680 | 0.25176 | 0.07498 | 0.05787 | 0.34220 | 0.08444 | 0.01792 | 0.06515 | 0.14495 | 0.02802 | 0.29014
    sonar | Ave | 0.24248 | 3.95164 | 5.18091 | 8.85899 | 4.39555 | 6.52438 | 0.50614 | 1.84730 | 5.45427 | 1.07998 | 5.38975
    sonar | Std | 0.00742 | 0.33660 | 0.32140 | 0.17441 | 0.31635 | 0.03434 | 0.02492 | 0.08510 | 0.03552 | 0.03598 | 0.10646
    spambase | Ave | 0.28365 | 3.07257 | 4.79768 | 8.48452 | 3.79102 | 7.11435 | 0.38618 | 2.02166 | 5.58850 | 1.46362 | 3.46477
    spambase | Std | 0.01020 | 0.27900 | 0.48667 | 0.09091 | 0.36279 | 0.13209 | 0.01763 | 0.09352 | 0.08328 | 0.06348 | 0.70257
    wine | Ave | 0.16956 | 3.02029 | 4.81248 | 2.18167 | 4.72672 | 6.24559 | 0.28241 | 1.05205 | 5.23541 | 0.80653 | 3.21216
    wine | Std | 0.00977 | 0.25548 | 0.42102 | 0.07433 | 0.45142 | 0.05336 | 0.01351 | 0.08557 | 0.05714 | 0.02621 | 0.20952
    wpbc | Ave | 0.18653 | 3.86699 | 4.93058 | 5.15636 | 4.64585 | 6.32058 | 0.39513 | 1.35223 | 5.31132 | 0.90145 | 4.53122
    wpbc | Std | 0.01126 | 0.29085 | 0.37653 | 0.11396 | 0.39717 | 0.09657 | 0.01806 | 0.12494 | 0.07255 | 0.04371 | 0.16788

    Table 5 shows the Friedman test of Acc, Nf and time. The p-values on all three metrics are less than 0.05, indicating a statistically significant difference between the algorithms. Here, MR denotes the average ranking from smallest to largest over the seven datasets, and Rank denotes the overall ranking. MOEO-Smp ranks 1, 3 and 6 on Acc, Nf and time, respectively. In conclusion, this is consistent with the analysis of Tables 2–4, i.e., MOEO-Smp is able to select a suitable subset of features while maintaining optimal accuracy and good dimensionality reduction without an excessively slow running time.

    Table 5. Friedman test of feature selection results.

    Algorithm | Acc MR | Acc Rank | Nf MR | Nf Rank | Time MR | Time Rank
    NSGA-II29 | 3.43 | 9 | 1.57 | 1 | 1.00 | 1
    MOPSO30 | 7.57 | 3 | 7.00 | 7 | 5.43 | 5
    MOEA-D31 | 3.29 | 10 | 1.71 | 2 | 8.00 | 8
    MOALO32 | 9.71 | 2 | 6.86 | 6 | 8.71 | 9
    MOHHO33 | 3.86 | 8 | 3.86 | 4 | 7.00 | 7
    MOTEO34 | 2.43 | 11 | 5.86 | 5 | 10.57 | 11
    MSSA35 | 6.71 | 4 | 7.57 | 9 | 2.00 | 2
    MOEA-NSGA-II36 | 5.86 | 7 | 10.00 | 10 | 4.71 | 4
    MOABC37 | 6.29 | 5 | 10.86 | 11 | 9.29 | 10
    NSGWO38 | 6.14 | 6 | 7.29 | 8 | 3.14 | 3
    MOEO-Smp | 10.71 | 1 | 3.43 | 3 | 6.14 | 6
    p-Value | 2.333E-6 | | 3.273E-9 | | 2.793E-9 |

    Table 6 shows results of feature selection on the pima, sonar and wine datasets reported in other literature. Faris et al.40 use a wrapper structure, so its classification accuracy on sonar and wine is higher than that of the filter structure in this paper. However, since the wrapper must run the classifier to validate the feature subset in each iteration, this study is significantly better than the wrapper structure in terms of computational complexity; in terms of dimensionality reduction, the feature subset of MOEO-Smp is slightly smaller than that of BSSA_S3_CP. Sun et al.41 combine a fuzzy multi-objective neighborhood rough set with the whale algorithm, which is best at dealing with unbalanced data such as pima; comparing the two methods' dimensionality reduction capabilities shows that TFSFB performs somewhat better, with a feature subset approximately 0.43 features smaller than that of MOEO-Smp. Ranjan and Chhabra42 propose a wrapper-based crow search algorithm (MO-CSACFS) for clustering and feature selection, which significantly outperforms clustering methods such as FCM, K-means and medoids in their experiments. The dimensionality reduction ability of MOEO-Smp is better than that of MO-CSACFS overall, and on the wine dataset MOEO-Smp also has higher classification accuracy. In conclusion, MOEO-Smp still has a gap in single-label feature selection accuracy compared with wrapper algorithms, but it has advantages in dimensionality reduction ability and computational speed.

    Table 6. Comparison with other literature single-label datasets experimental results.

    Dataset | Stat | MOEO-Smp Acc | MOEO-Smp Nf | MOEO-Smp Time | BSSA_S3_CP40 Acc | BSSA_S3_CP40 Nf | TFSFB41 Nf | MO-CSACFS42 Acc | MO-CSACFS42 Nf
    pima | Ave | 0.71145 | 4.03009 | 1.95560 | — | — | 3.6 | — | 5.55
    pima | Std | 0.01390 | 0.18821 | 0.29014 | — | — | — | — | —
    sonar | Ave | 0.74633 | 29.02443 | 5.38975 | 0.9372 | 33.3667 | — | — | 28
    sonar | Std | 0.02560 | 0.93966 | 0.10646 | 0.0097 | 2.8585 | — | — | —
    wine | Ave | 0.74633 | 6.35864 | 3.21216 | 0.9933 | 6.3333 | — | 0.7398 | 10.5
    wine | Std | 0.04417 | 0.33306 | 0.20952 | 0.0056 | 0.9589 | — | — | —

    4.3. Multi-label feature selection and experimental results

    In the multi-label feature selection test, we select two Mulan multi-label datasets and one real-world COVID-19 multi-label dataset, and compare MOEO-Smp with 10 other state-of-the-art algorithms, evaluating the feature selection results in seven aspects: Hamming loss (HL), ranking loss (RL), one-error rate (OE), coverage rate (CV), average precision (AP), the number of features (NF) and the running time (time). It is worth noting that the COVID-19 dataset collects clinical data from the early stages of the epidemic, and its aim is to diagnose the disease faster and allocate medical resources according to the patient's condition with the help of computer technology. The dataset is based on the blood sample results and personal information of 5644 patients, used to determine whether a patient is positive and whether the patient needs to be admitted for treatment, with four levels of care based on the patient's condition. The problem has practical significance for reducing the pressure on health care resources in times of mass epidemic infection.

    Table 7 shows the feature selection results of the 11 algorithms on the multi-label dataset Emotion. From Table 7, we can see that the feature subset selected by MOEO-Smp has the best average performance on HL, RL, OE, CV and AP. On the Nf metric, the average number of features selected by NSGA-II is about 28.6, the best among the 11 algorithms, followed by MOALO with about 29.2; the average number selected by MOEO-Smp is about 37.0, behind only NSGA-II and MOALO. Therefore, the proposed MOEO-Smp is also competitive in terms of dimensionality reduction capability. In terms of average running time, MOEA-NSGA-II and NSGA-II are the fastest among all algorithms, at about 0.43 s and 0.47 s, respectively; however, both perform worse than MOEO-Smp on most of the other metrics. The running time of MOEO-Smp does not differ much from those of MOALO, MOTEO, NSGWO and MOHHO, and it obtains better feature selection results overall.

    Table 7. Results of 7 evaluation metrics for 11 algorithms on the emotion dataset.

    Metric | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    HL | Ave | 0.27712 | 0.26937 | 0.27361 | 0.26924 | 0.26642 | 0.26855 | 0.27103 | 0.27095 | 0.26900 | 0.27084 | 0.26611
    HL | Std | 0.00587 | 0.00670 | 0.00601 | 0.01064 | 0.00594 | 0.00600 | 0.00802 | 0.00614 | 0.00757 | 0.00544 | 0.00456
    RL | Ave | 0.28743 | 0.25877 | 0.27505 | 0.26116 | 0.26053 | 0.26254 | 0.26338 | 0.26360 | 0.25971 | 0.27225 | 0.25735
    RL | Std | 0.01126 | 0.01249 | 0.01107 | 0.01885 | 0.00964 | 0.01447 | 0.01028 | 0.01086 | 0.01330 | 0.01196 | 0.01172
    OE | Ave | 0.42868 | 0.39047 | 0.40578 | 0.40086 | 0.39375 | 0.39481 | 0.39676 | 0.39704 | 0.39300 | 0.39939 | 0.38929
    OE | Std | 0.01315 | 0.01960 | 0.01562 | 0.02772 | 0.01622 | 0.01956 | 0.01476 | 0.01826 | 0.01537 | 0.01564 | 0.01746
    CV | Ave | 2.40813 | 2.28587 | 2.35456 | 2.27997 | 2.27338 | 2.28860 | 2.30036 | 2.30686 | 2.27557 | 2.34021 | 2.26087
    CV | Std | 0.06894 | 0.07309 | 0.05884 | 0.09691 | 0.04927 | 0.07215 | 0.07843 | 0.05807 | 0.08691 | 0.07356 | 0.06188
    AP | Ave | 0.68629 | 0.71117 | 0.69857 | 0.70705 | 0.70878 | 0.70808 | 0.70720 | 0.70542 | 0.71054 | 0.70190 | 0.71288
    AP | Std | 0.00941 | 0.01232 | 0.00940 | 0.01790 | 0.00911 | 0.01320 | 0.00887 | 0.01104 | 0.01071 | 0.01151 | 0.01088
    Nf | Ave | 28.61333 | 46.36167 | 46.84833 | 29.24833 | 45.89 | 46.68333 | 41.88333 | 47.20333 | 46.71167 | 47.95167 | 36.98029
    Nf | Std | 0.65547 | 0.76321 | 0.79011 | 0.81459 | 0.85499 | 1.59566 | 1.39948 | 0.81685 | 0.67852 | 1.07423 | 0.91615
    Time (/s) | Ave | 0.47163 | 7.36018 | 74.11970 | 10.00212 | 8.70569 | 9.35303 | 3.40671 | 0.43118 | 3.34001 | 9.03118 | 10.60926
    Time (/s) | Std | 0.03272 | 0.51670 | 5.86598 | 0.65450 | 0.15438 | 0.11256 | 0.26351 | 0.01515 | 1.94448 | 3.27611 | 0.36725

    Table 8 shows the Friedman test of the seven metrics of the 11 algorithms on the Emotion dataset. The rank of MOEO-Smp on HL, RL, OE, CV and AP is 1 in each case, and the p-values are less than 0.05, which indicates that MOEO-Smp significantly outperforms the other algorithms. On the Nf metric, the rank of MOEO-Smp is 3, which indicates that its number of selected features is competitive among the 11 algorithms. On the Time metric, although MOEO-Smp has a low rank, Table 7 shows that its actual running time differs from those of several of the algorithms by only 1–2 s. In addition, the sum rank of MOEO-Smp is the best among the 11 algorithms.

    Table 8. Friedman test of 7 evaluation metrics for 11 algorithms on the emotion dataset.

    Metric | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    HL | MR | 8.65 | 5.50 | 7.80 | 6.00 | 4.45 | 5.35 | 6.45 | 6.05 | 5.35 | 6.25 | 4.15
    HL | Rank | (11) | (5) | (10) | (6) | (2) | (4) | (9) | (7) | (4) | (8) | (1)
    HL | p-Value: 4.445E-04
    RL | MR | 10.05 | 4.40 | 8.75 | 5.10 | 5.20 | 5.20 | 5.30 | 5.60 | 4.70 | 7.45 | 4.25
    RL | Rank | (11) | (2) | (10) | (4) | (6) | (6) | (7) | (8) | (3) | (9) | (1)
    RL | p-Value: 3.013E-10
    OE | MR | 10.1 | 4.93 | 6.85 | 6.50 | 5.05 | 5.18 | 5.90 | 5.80 | 4.90 | 6.20 | 4.60
    OE | Rank | (11) | (3) | (10) | (9) | (4) | (5) | (7) | (6) | (2) | (8) | (1)
    OE | p-Value: 4.489E-06
    CV | MR | 9.65 | 5.15 | 8.65 | 4.70 | 4.95 | 5.50 | 5.55 | 6.20 | 4.72 | 6.75 | 4.18
    CV | Rank | (11) | (5) | (10) | (2) | (4) | (7) | (6) | (8) | (3) | (9) | (1)
    CV | p-Value: 4.223E-08
    AP | MR | 2.05 | 7.55 | 3.85 | 6.15 | 6.70 | 6.35 | 6.30 | 6.40 | 7.60 | 5.00 | 8.05
    AP | Rank | (11) | (3) | (10) | (8) | (4) | (6) | (7) | (5) | (2) | (9) | (1)
    AP | p-Value: 1.414E-08
    Nf | MR | 1.30 | 7.00 | 8.25 | 1.70 | 6.35 | 7.60 | 4.05 | 9.05 | 7.90 | 9.78 | 3.00
    Nf | Rank | (1) | (6) | (9) | (2) | (5) | (7) | (4) | (10) | (8) | (11) | (3)
    Nf | p-Value: 8.662E-30
    Time | MR | 1.95 | 5.15 | 11.00 | 8.45 | 6.20 | 7.25 | 3.50 | 1.05 | 3.65 | 8.25 | 9.55
    Time | Rank | (2) | (5) | (11) | (9) | (6) | (7) | (3) | (1) | (4) | (8) | (10)
    Time | p-Value: 4.029E-35
    Sum rank | | 9 | 3 | 11 | 5 | 4 | 6 | 7 | 8 | 2 | 10 | 1

    The results in Tables 7 and 8 illustrate that MOEO-Smp is able to obtain significantly better performance in most of the metrics. Although MOEO-Smp does not have a significant advantage over some algorithms in terms of runtime, MOEO-Smp is able to achieve better performance in other metrics within an acceptable time frame.

    Table 9 shows the feature selection results of the 11 algorithms on the multi-label dataset Birds. The feature subset selected by MOEO-Smp has the best average performance on RL, CV and AP. For the HL metric, MOTEO obtained the best performance with a value of 0.05234, while MOEO-Smp obtained the next best performance with an HL value of 0.05235, differing from MOTEO by only 0.00001. On the Nf metric, NSGA-II and MSSA had values of about 103.0 and 109.4, respectively, while MOEO-Smp had a value of about 111.7, which differs from them by 8.7 and 2.3, respectively, giving it the third best performance among the 11 algorithms. In terms of the Time metric, MOEA-NSGA-II and NSGA-II have the fastest running speeds, at about 1.0 s and 1.6 s. The average running time of MOEO-Smp is about 4.9 s, which differs from them by about 3.8 s and 3.3 s, respectively, but is in the middle of the range compared with all the other algorithms.

    Table 9. Results of 7 evaluation metrics for 11 algorithms on the birds dataset.

    Metric | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    HL | Ave | 0.05397 | 0.05326 | 0.05359 | 0.05443 | 0.05366 | 0.05234 | 0.05591 | 0.05352 | 0.05326 | 0.05379 | 0.05235
    HL | Std | 0.00310 | 0.00192 | 0.00157 | 0.00355 | 0.00305 | 0.00116 | 0.00286 | 0.00115 | 0.00224 | 0.00335 | 0.00154
    RL | Ave | 0.16168 | 0.16503 | 0.16998 | 0.16827 | 0.16359 | 0.16845 | 0.16828 | 0.16391 | 0.16583 | 0.16658 | 0.15933
    RL | Std | 0.01082 | 0.00647 | 0.00802 | 0.01325 | 0.00854 | 0.00732 | 0.00946 | 0.00694 | 0.01012 | 0.00952 | 0.00719
    OE | Ave | 0.84305 | 0.85577 | 0.84594 | 0.84702 | 0.84937 | 0.86099 | 0.84923 | 0.85285 | 0.85307 | 0.85514 | 0.84448
    OE | Std | 0.01386 | 0.00953 | 0.01199 | 0.01473 | 0.01649 | 0.00839 | 0.02008 | 0.00926 | 0.01225 | 0.01761 | 0.01072
    CV | Ave | 3.65860 | 3.70698 | 3.77756 | 3.78817 | 3.67613 | 3.75512 | 3.85136 | 3.72288 | 3.68905 | 3.73304 | 3.55767
    CV | Std | 0.28028 | 0.18134 | 0.14800 | 0.31865 | 0.22792 | 0.14450 | 0.23137 | 0.16586 | 0.21807 | 0.26541 | 0.16483
    AP | Ave | 0.21757 | 0.20823 | 0.21696 | 0.21502 | 0.21434 | 0.20476 | 0.21418 | 0.20850 | 0.20933 | 0.20951 | 0.21839
    AP | Std | 0.01330 | 0.00764 | 0.00976 | 0.01388 | 0.01251 | 0.00594 | 0.01332 | 0.00730 | 0.01167 | 0.01354 | 0.00914
    Nf | Ave | 103.0100 | 147.97697 | 140.90996 | 140.38333 | 145.72774 | 133.48565 | 109.36944 | 145.00056 | 147.39712 | 167.36167 | 111.67458
    Nf | Std | 1.54659 | 6.50814 | 9.92121 | 11.91556 | 6.72845 | 10.32736 | 5.30168 | 7.77102 | 7.09284 | 2.38161 | 7.53877
    Time (/s) | Ave | 1.60461 | 2.64803 | 119.95474 | 10.73369 | 9.41563 | 39.96614 | 3.43487 | 1.04010 | 1.97633 | 4.39404 | 4.87430
    Time (/s) | Std | 0.07261 | 0.41798 | 9.60202 | 2.30048 | 0.25635 | 0.26476 | 1.54822 | 0.02742 | 0.20787 | 0.17188 | 6.17833

    Table 10 shows the Friedman test of the seven metrics of the 11 algorithms on the Birds dataset. MOEO-Smp obtained the best ranking on both RL and CV with p-values less than 0.05, which indicates that MOEO-Smp significantly outperforms the other algorithms. On HL, OE and AP, MOEO-Smp obtained the second best ranking after MOTEO, NSGA-II and MOEA-D, respectively. It is worth noting that, although in Table 9 MOEO-Smp has a slightly better mean AP value than MOEA-D, the larger variance of MOEA-D causes some of its AP results to fluctuate more widely and rank better, resulting in a slightly higher overall ranking than MOEO-Smp. In addition, the Nf ranking of MOEO-Smp is second only to NSGA-II and MSSA, so among all the algorithms MOEO-Smp has good dimensionality reduction ability. The Time rank of MOEO-Smp is fifth, which is a good performance among the 11 algorithms. The sum rank of MOEO-Smp is first among the 11 algorithms, and its overall performance on the seven metrics is better than well-known algorithms such as NSGA-II, MOPSO and MOEA-D.

    The results in Tables 9 and 10 illustrate that MOEO-Smp was able to obtain significantly better results on three of the seven evaluation metrics than the other 10 algorithms. Although MOEO-Smp failed to obtain the best performance on the other four metrics, it obtained the second or third best performance on three of them.

    Table 11 shows the feature selection results of the 11 algorithms on the multi-label dataset COVID-19. The feature subset selected by MOEO-Smp has the best average performance on HL, OE, CV and Nf. On the RL and AP metrics, MOTEO obtained the best performance, with values of 0.01756 and 0.98138, respectively, but MOEO-Smp was second only to MOTEO with values of 0.01774 and 0.98131, differing by only 0.00018 and 0.00007, respectively. The average running time of MOEO-Smp is about 1.97 s, behind only MOEA-NSGA-II, MOABC and NSGA-II, which is a good performance among all the algorithms.

    Table 10. Friedman test of 7 evaluation metrics for 11 algorithms on the birds dataset.

    Metric | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    HL | MR | 6.55 | 5.75 | 6.50 | 6.65 | 6.25 | 3.90 | 8.25 | 5.75 | 5.65 | 6.63 | 4.13
    HL | Rank | (8) | (5) | (7) | (10) | (6) | (1) | (11) | (5) | (3) | (9) | (2)
    HL | p-Value: 2.939E-03
    RL | MR | 4.80 | 5.30 | 7.85 | 6.75 | 5.70 | 6.75 | 6.90 | 5.75 | 5.90 | 6.45 | 3.85
    RL | Rank | (2) | (3) | (10) | (8) | (4) | (8) | (9) | (5) | (6) | (7) | (1)
    RL | p-Value: 1.356E-02
    OE | MR | 4.33 | 7.70 | 4.68 | 5.10 | 6.03 | 8.50 | 5.45 | 6.40 | 6.50 | 6.80 | 4.53
    OE | Rank | (1) | (10) | (3) | (4) | (6) | (11) | (5) | (7) | (8) | (9) | (2)
    OE | p-Value: 2.909E-04
    CV | MR | 5.00 | 5.90 | 7.05 | 6.55 | 5.30 | 6.75 | 8.15 | 6.30 | 5.25 | 6.10 | 3.65
    CV | Rank | (2) | (5) | (10) | (8) | (4) | (9) | (11) | (7) | (3) | (6) | (1)
    CV | p-Value: 3.802E-03
    AP | MR | 7.45 | 4.40 | 7.75 | 6.80 | 6.15 | 3.75 | 6.70 | 4.80 | 5.40 | 5.25 | 7.55
    AP | Rank | (3) | (10) | (1) | (4) | (6) | (11) | (5) | (9) | (7) | (8) | (2)
    AP | p-Value: 1.840E-04
    Nf | MR | 1.20 | 8.25 | 6.65 | 6.25 | 7.45 | 4.75 | 2.40 | 7.55 | 8.10 | 10.9 | 2.50
    Nf | Rank | (1) | (10) | (6) | (5) | (7) | (4) | (2) | (8) | (9) | (11) | (3)
    Nf | p-Value: 2.533E-29
    Time | MR | 2.00 | 4.70 | 11.00 | 8.55 | 8.05 | 10.00 | 5.95 | 1.00 | 3.10 | 6.80 | 4.85
    Time | Rank | (2) | (4) | (11) | (9) | (8) | (10) | (6) | (1) | (3) | (7) | (5)
    Time | p-Value: 1.545E-35
    Sum rank | | 2 | 6 | 8 | 8 | 4 | 10 | 9 | 5 | 3 | 11 | 1

    Table 12 shows the Friedman test of the seven metrics of the 11 algorithms on the COVID-19 dataset. MOEO-Smp obtained the best ranking on HL, CV and Nf. It is noteworthy that the p-value for the HL metric is greater than 0.05, which indicates that the differences between the 11 algorithms on the HL metric are not significant; the reason may be a sample imbalance in this dataset, resulting in small gaps in HL. The p-values of the other metrics are less than or close to 0.05, indicating that their experimental results differ significantly. On the RL and AP metrics, MOTEO received the best ranking; MOEO-Smp received the third best ranking on RL and the second best on AP. On the OE metric, MOTEO and NSGA-II received the highest rankings, and MOEO-Smp received the third best ranking after them. On the Time metric, MOEA-NSGA-II has the best rank, and MOEO-Smp has a rank of 5, in the middle of the 11 algorithms. In the sum rank, MOEO-Smp ranks first among the 11 algorithms, indicating that its overall performance is better than all the other compared algorithms.

    The results in Tables 11 and 12 illustrate that MOEO-Smp obtained the best results in four of the seven evaluation metrics and had significant differences compared to the results of the other algorithms. Also, the rank and sum rank indicate that MOEO-Smp is highly competitive among all the compared algorithms.

    In summary, the results of the multi-label feature selection experiments with 10 comparison algorithms show that MOEO-Smp is able to obtain feature subsets with higher classification accuracy within an acceptable time frame than well-known algorithms such as NSGA-II, MOPSO and NSGWO. On the metrics where it did not obtain the best performance and ranking, MOEO-Smp achieved the second or third best results in most cases and ranked first overall. MOEO-Smp is also one of the best algorithms in terms of dimensionality reduction among all the compared algorithms. Therefore, among the 11 algorithms, MOEO-Smp is the best overall performer for multi-label feature selection.

    Table 11. Results of 7 evaluation metrics for 11 algorithms on the COVID-19 dataset.

    Metric | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    HL | Ave | 0.03165 | 0.03195 | 0.03172 | 0.03176 | 0.03142 | 0.03143 | 0.03177 | 0.03165 | 0.03187 | 0.03139 | 0.03101
    HL | Std | 0.00157 | 0.00068 | 0.00039 | 0.00157 | 0.00091 | 0.00125 | 0.00116 | 0.00167 | 0.00110 | 0.00151 | 0.00082
    RL | Ave | 0.01789 | 0.01877 | 0.01918 | 0.01794 | 0.01778 | 0.01756 | 0.01800 | 0.01852 | 0.01809 | 0.01833 | 0.01774
    RL | Std | 0.00184 | 0.00098 | 0.00078 | 0.00201 | 0.00128 | 0.00098 | 0.00116 | 0.00183 | 0.00113 | 0.00201 | 0.00057
    OE | Ave | 0.02939 | 0.03116 | 0.03047 | 0.02951 | 0.02937 | 0.02924 | 0.02989 | 0.03036 | 0.02959 | 0.03013 | 0.02923
    OE | Std | 0.00259 | 0.00155 | 0.00085 | 0.00293 | 0.00199 | 0.00175 | 0.00149 | 0.00302 | 0.00164 | 0.00254 | 0.00078
    CV | Ave | 0.16979 | 0.17145 | 0.17339 | 0.17067 | 0.16757 | 0.16800 | 0.16987 | 0.17072 | 0.17149 | 0.16847 | 0.16634
    CV | Std | 0.00912 | 0.00409 | 0.00360 | 0.01012 | 0.00631 | 0.00554 | 0.00572 | 0.00909 | 0.00647 | 0.00943 | 0.00447
    AP | Ave | 0.98120 | 0.98007 | 0.98012 | 0.98104 | 0.98126 | 0.98138 | 0.98099 | 0.98048 | 0.98100 | 0.98066 | 0.98131
    AP | Std | 0.00169 | 0.00095 | 0.00052 | 0.00189 | 0.00123 | 0.00099 | 0.00111 | 0.00181 | 0.00104 | 0.00177 | 0.00057
    Nf | Ave | 39.45500 | 51.86138 | 54.04788 | 43.05333 | 52.13429 | 48.55477 | 40.69449 | 50.51974 | 50.75303 | 64.98333 | 35.00837
    Nf | Std | 0.77053 | 1.95281 | 5.67209 | 1.37621 | 1.95826 | 2.35458 | 2.71899 | 2.05493 | 2.06256 | 1.00754 | 4.32374
    Time (/s) | Ave | 0.76755 | 2.92823 | 66.96535 | 8.87605 | 6.18422 | 25.44448 | 2.05403 | 0.52357 | 0.74664 | 2.81614 | 1.97491
    Time (/s) | Std | 0.044 | 0.24419 | 5.06880 | 0.53101 | 0.14841 | 26.72942 | 2.01739 | 0.51793 | 0.02854 | 0.10381 | 0.22623

    Table 12. Friedman test of 7 evaluation metrics for 11 algorithms on the COVID-19 dataset.

    Metric | Stat | NSGA-II29 | MOPSO30 | MOEA-D31 | MOALO32 | MOHHO33 | MOTEO34 | MSSA35 | MOEA-NSGA-II36 | MOABC37 | NSGWO38 | MOEO-Smp
    HL | MR | 6.15 | 7.35 | 6.35 | 6.65 | 5.45 | 5.55 | 6.30 | 5.80 | 6.35 | 5.55 | 4.50
    HL | Rank | (6) | (11) | (9) | (10) | (2) | (4) | (7) | (5) | (9) | (4) | (1)
    HL | p-Value: 4.247E-1
    RL | MR | 5.00 | 7.75 | 8.75 | 5.55 | 5.35 | 4.35 | 5.85 | 6.50 | 5.95 | 5.95 | 5.00
    RL | Rank | (3) | (10) | (11) | (5) | (4) | (1) | (6) | (9) | (8) | (8) | (3)
    RL | p-Value: 1.016E-03
    OE | MR | 5.00 | 8.25 | 7.55 | 5.65 | 5.65 | 4.95 | 6.15 | 6.15 | 5.50 | 6.13 | 5.10
    OE | Rank | (2) | (11) | (10) | (6) | (6) | (1) | (9) | (9) | (4) | (7) | (3)
    OE | p-Value: 2.707E-02
    CV | MR | 5.90 | 7.10 | 7.90 | 6.15 | 4.75 | 5.20 | 6.25 | 6.35 | 6.50 | 5.25 | 4.65
    CV | Rank | (5) | (10) | (11) | (6) | (2) | (3) | (7) | (8) | (9) | (4) | (1)
    CV | p-Value: 5.574E-02
    AP | MR | 6.90 | 3.80 | 3.75 | 6.40 | 6.55 | 7.50 | 6.10 | 5.50 | 6.55 | 5.75 | 7.20
    AP | Rank | (3) | (10) | (11) | (6) | (5) | (1) | (7) | (9) | (5) | (8) | (2)
    AP | p-Value: 1.699E-03
    Nf | MR | 2.25 | 7.95 | 8.55 | 3.80 | 8.35 | 5.90 | 2.80 | 7.00 | 7.20 | 10.95 | 1.25
    Nf | Rank | (2) | (8) | (10) | (4) | (9) | (5) | (3) | (6) | (7) | (11) | (1)
    Nf | p-Value: 1.116E-31
    Time | MR | 2.50 | 6.55 | 10.9 | 8.95 | 7.95 | 10.10 | 4.20 | 1.40 | 2.30 | 6.35 | 4.80
    Time | Rank | (3) | (7) | (11) | (9) | (8) | (10) | (4) | (1) | (2) | (6) | (5)
    Time | p-Value: 7.969E-36
    Sum rank | | 2 | 10 | 11 | 7 | 4 | 3 | 5 | 8 | 6 | 9 | 1

    Table 13 shows experimental results from the literature43,44,45,46,47 using the Emotion, Birds and COVID-19 datasets. On the Emotion dataset, the Hamming loss of MOEO-Smp outperforms RFSFS and RWFS, but is worse than MOMFS. The number of selected features is similar to that of MOMFS, which indicates that both are effective in dimensionality reduction, but MOEO-Smp significantly outperforms MOMFS in terms of precision and running time. On the Birds dataset, the HL of the proposed method is similar to those of RWFS and MFSJMI, but MOEO-Smp runs about 25 times faster than MFSJMI. On the COVID-19 dataset, although Ref. 47 uses data from the same source as this paper, it processes the data differently and its dataset is smaller than the COVID-19 dataset used here; even so, the AP of MOEO-Smp is much higher than that of BMPA-TVSinV, which may be caused by the difference in the number of unbalanced samples in the data. All in all, MOEO-Smp is not inferior to the other methods proposed in the literature in terms of feature selection ability, and also has an advantage in terms of running time.

    Table 13. Comparison with other literature multi-label datasets experimental results.

    Metric | Stat | Emotion: MOEO-Smp | RFSFS43 | MOMFS44 | RWFS45 | Birds: MOEO-Smp | RWFS45 | MFSJMI46 | COVID-19: MOEO-Smp | BMPA-TVSinV47
    HL | Ave | 0.26611 | 0.2783 | 0.19 | 0.2677 | 0.05235 | 0.05 | 0.0506 | 0.03101 | —
    HL | Std | 0.00456 | 0.0146 | — | 0.0175 | 0.00154 | 0.0017 | 0.0013 | 0.00082 | —
    AP | Ave | 0.71288 | — | 0.31 | — | 0.21839 | — | — | 0.98131 | 0.8677
    AP | Std | 0.01088 | — | — | — | 0.00914 | — | — | 0.00057 | —
    Nf | Ave | 36.98029 | — | 36 | — | 111.67458 | — | — | 35.00837 | —
    Nf | Std | 0.91615 | — | — | — | 7.53877 | — | — | 4.32374 | —
    Time | Ave | 10.60926 | — | 15.8 | — | 4.87430 | — | 133.59 | 1.97491 | —
    Time | Std | 0.36725 | — | — | — | 6.17833 | — | — | 0.22623 | —

    5. Conclusion

    In this study, a filter-based improved multi-objective equilibrium optimizer (MOEO-Smp) is proposed to solve the feature selection problem for both single-label and multi-label datasets. A scoring mechanism and optimization principles are introduced so that solutions with high optimization potential and better performance are filtered out of the population, and the key features in the better-performing solutions can be identified after each update. The optimal equilibrium pool, the candidate solution equilibrium pool and the candidate feature equilibrium pool are created so that the information of the most promising solutions and features is recorded to guide the update of the other solutions. In addition, the nonlinear learning rate dynamically adjusts the convergence and search ability of the algorithm during the iterative process, which not only enables the proposed algorithm to search the target space extensively at the beginning of the iterations to prevent falling into local optima, but also accelerates the speed and accuracy of convergence at the later stage. Seven single-label datasets from UCI are used to test the single-label feature selection ability of MOEO-Smp, and two multi-label datasets from Mulan and one real-world COVID-19 multi-label dataset are used to test its multi-label feature selection ability. The results of the single-label feature selection experiments show that, compared with the 10 other state-of-the-art algorithms, MOEO-Smp achieves the best classification accuracy together with competitive dimensionality reduction and running time. The multi-label feature selection experiments evaluate the feature selection results on seven metrics, and the results show that MOEO-Smp has the best overall competitiveness compared with the 10 other state-of-the-art algorithms and the best performance on multiple metrics in each multi-label dataset. In summary, MOEO-Smp not only solves the single-label feature selection task excellently, but also achieves good performance on multi-label data with larger sample sizes. Because the optimization process and results need to be scored, the computing time of MOEO-Smp increases slightly in practical applications; how to reduce the amount of scoring computation and simplify the scoring steps to handle feature selection for higher-dimensional data is the direction of our future work.

    Acknowledgment

    This study was funded by the National Natural Science Foundation of China (No. 72104069), the Science and Technology Department of Henan Province, China (No. 222102210065), and the Postgraduate Meritocracy Scheme, China (No. SYL19060145).

    ORCID

    Wendong Wang  https://orcid.org/0009-0009-8929-4941

    Yu Li  https://orcid.org/0000-0001-9748-6024

    Jingsen Liu  https://orcid.org/0000-0002-2828-4223

    Huan Zhou  https://orcid.org/0000-0002-2785-8041
