Effectively reducing the dimensionality of big data and retaining its key information has been a research challenge. As an important step in data pre-processing, feature selection plays a critical role in reducing data size and increasing the overall value of the data. Many previous studies have focused on single-label feature selection, however, with the increasing variety of data types, the need for feature selection on multi-label data types has also arisen. Unlike single-labeled data, multi-labeled data with more combinations of classifications place higher demands on the capabilities of feature selection algorithms. In this paper, we propose a filter-based Multi-Objective Equilibrium Optimizer algorithm (MOEO-Smp) to solve the feature selection problem for both single-label and multi-label data. MOEO-Smp rates the optimization results of solutions and features based on four pairs of optimization principles, and builds three equilibrium pools to guide exploration and exploitation based on the total scores of solutions and features and the ranking of objective fitness values, respectively. Seven UCI single-label datasets and two Mulan multi-label datasets and one COVID-19 multi-label dataset are used to test the feature selection capability of MOEO-Smp, and the feature selection results are compared with 10 other state-of-the-art algorithms and evaluated using three and seven different metrics, respectively. Feature selection experiments and comparisons with the results in other literatures show that MOEO-Smp not only has the highest classification accuracy and excellent dimensionality reduction on single-labeled data, but also performs better on multi-label data in terms of Hamming loss, accuracy, dimensionality reduction, and so on.

Keywords:

1. Introduction

The quality of the original dataset can have a great impact on the effectiveness of machine learning, so pre-processing the original dataset is essential in machine learning. Feature selection is an important step in data pre-processing, which helps machine learning improve classification performance and effectively reduce the complexity of data analysis by eliminating irrelevant and redundant dimensions from the original data and selecting a subset of important features.¹ Currently, feature selection is divided into three approaches: wrapper-based, embedded-based and filter-based.² The wrapper-based feature selection methods combine a learning algorithm with a feature selection algorithm such that the learning algorithm acts as an evaluation black box until a stopping condition is reached. As a result, wrapper-based feature selection methods tend to obtain a more efficient subset of features compared to other methods, but this method will face a huge computational complexity as the data size increases. Embedded-based feature selection methods fuse feature selection algorithms and learning algorithms into one, and are generally faster in computational speed than packaged methods, but they both face high computational consumption due to the volume of data. The filtering-based feature selection methods are most remarkable in that the feature selection algorithm and the learning algorithm are independent of each other, with the former being responsible for completing iterations and outputting feature subsets and the latter being responsible for further learning or evaluation. The filter-based approach has less computational consumption and complexity than the wrapper-based and embedded approaches, and is particularly suitable for processing datasets with a large number of samples or features.

Single-label feature selection belongs to the category of supervised learning, which selects some dimensions from the original data as feature subsets, and uses the feature subsets to train and test the learning algorithm, so that the classification results of the learning algorithm are closer to the actual labels. In terms of discovering or diagnosing diseases and making predictive countermeasures through the results of machine learning, feature selection plays an important role in timely detection and prevention of diseases.³ Therefore, the problem of single-label feature selection has been a hot research topic.

However, in the classification tasks of machine learning in social, biological and medical fields, one sample in the dataset may correspond to multiple classification labels at the same time, such as text classification and gene function annotation.^4,5 Different from the difficulty of single-label feature selection, selecting a subset of features for the data with multiple labels requires, in addition to facing 2^N feature combinations (N is the number of features in the data), considering the intrinsic relevance of the features to the different labels.⁶ So, the training set filtered by this feature subset will make the set of classification labels obtained by the classifier for each instance have good classification performance in 2^S label combinations (S is the number of classification labels). This needs not only the feature selection algorithm to be able to have low computation consumption, but also to be able to provide an effective feature subset.

Based on a metaheuristic algorithm proposed in 2019, the equilibrium optimizer (EO), this paper proposes a filter-based improved equilibrium optimizer (MOEO-Smp) to solve the single-label and multi-label feature selection problems. The proposed method takes into account the complexity of the multi-label feature selection task by adding consideration of feature subsets, i.e., feature-label correlation and feature-feature redundancy, to the filtered feature selection method to improve the feature selection effectiveness. MOEO-Smp correlates changes in variables during the optimization process with changes in fitness to quickly identify key features, and uses equilibrium pool mechanism to store them to guide the update. The main contributions of this work are summarized as follows:

•	In this paper, maximum correlation and minimum redundancy are used as the two objectives of optimization, and the Pareto optimality set of feature subsets is obtained.
•	Based on four pairs of reward and penalty principles, rating rules for solution and feature optimization results are set, and total solution scores and total feature scores are calculated based on rating and related metrics.
•	Three equilibrium pools are built to guide the MOEO-Smp updates.
•	Seven UCI single-label datasets were used to test the performance of MOEO-Smp and 10 other comparison algorithms on three metrics.
•	Two Mulan multi-label datasets and one COVID-19 multi-label dataset were used to test the performance of MOEO-Smp and other 10 comparison algorithms on seven metrics. In addition, to the best of our knowledge, no one has used the EO to solve the multi-label feature selection problem prior to this.

2. Related Work

2.1. Research on single-label feature selection problem

The use of metaheuristic algorithms to solve the feature selection problem can select effective data features while keeping the computational consumption low, and has been the subject of considerable research. Li et al.⁷ adopted a wrapper-based multi-objective particle swarm algorithm with two objectives of maximum true-negative rate and maximum true-positive rate incorporating an improved PSV mechanism, variation operation and distance-based updating into the algorithm to increase the exploration capability and diversity of the algorithm. Hussain et al.⁸ fuses the sine-cosine algorithm and the Harris Hawk algorithm for large-scale data feature selection to improve convergence speed, accuracy, and dimensionality reduction without increasing computational consumption. Song et al.⁹ proposed a hybrid feature selection method based on filter and clustering and using particle swarm algorithm as classifier, which can significantly reduce the subsequent computational complexity in the process of filtering and clustering and fully consider the feature-label correlation and feature-feature redundancy. Wu et al.¹⁰ proposed a mixing-based improved binary quantum particle swarm algorithm based on a filtered framework, which numericalizes the dataset by maximum information coefficient (MIC) and uses a comprehensive learning strategy to improve the diversity of solutions.

2.2. Research on multi-label feature selection problem

One solution to multi-label feature selection is a problem transformation, which transforms a multi-label dataset into multiple single-label datasets and finally combines the classification results.¹¹ However, the multi-label feature selection method based on problem transformation loses the correlation information contained between features and labels in the dataset, increases the computational complexity, and is not conducive to selecting an effective subset of features. Therefore, researchers have proposed feature selection methods based on algorithm adaptation. Zhang and Zhou¹² proposed a multi-label k-nearest neighbor approach to predict labels by calculating the posterior probability from the label information of known k-nearest neighbors. This approach maintains the integrity of the intrinsic relationships of the data, but the computational consumption is still high.

In order to preserve the original intrinsic information in the dataset and reduce the computational consumption, some other multi-label feature selection methods have been proposed. Among them, with the aid of metaheuristic algorithm which is insensitive to the target space and does not depend on gradient optimization, a weighted multi-objective PSO is proposed for feature selection of five medical datasets and better results are obtained with the validation of different classifiers.¹³ This shows that the metaheuristic algorithm is able to identify and select effective features for medically relevant datasets. Although this approach has good results and low computational complexity, it is difficult to determine the objective weights for different datasets. Dong et al.¹⁴ proposed an improved NSGA III algorithm based on Pareto optimality and tested it on 11 multi-label datasets, and the results showed that the method has good classification performance. Paniri et al.¹⁵ also focus on the curse of dimension faced by multi-label learning, so a filter-based ant colony algorithm is used to reduce the dimension of multi-labeled data, and different search objectives are proposed under unsupervised and supervised conditions.

2.3. Research on equilibrium optimizer

Equilibrium Optimizer (EO) has been used in many research areas due to its low computational complexity and excellent balance of exploration and exploitation capabilities. Zhong et al.¹⁶ added backward learning and levy flight to the update process of EO in order to solve the high-dimensional optimization problem, and considered the evolutionary population dynamics to improve the algorithm’s search capability; the algorithm shows good optimization results on high-dimensional benchmark test functions. Rizk-Allah and Aboul¹⁷ mixed pattern search technique and EO to propose a method to solve the combinatorial optimization problem and focus on wind farm layout as the main problem to optimize the siting of wind power facilities in the Gulf of Suez in the Red Sea. El-Ela et al.¹⁸ proposed a multi-objective equilibrium optimizer algorithm to solve the battery parameter assignment problem in a life maximization system and used a fuzzy technique to select a compromise solution from the set of pareto optimal solutions.

Some studies have also used EO to solve the feature selection problem. Varzaneh et al.¹⁹ combined the filtered and wrapped frameworks by first performing an initial filtering of the data using maximum relevance minimum redundancy (mRMR), and then performing feature selection again using a wrapped-based EO that incorporates the concepts of Levy distribution and entropy increase. Vommi and Tirumala²⁰ improved EO using positive cosine operation and bi-phase variation with a wrapper-based framework for feature selection on single-label standard biological datasets, and the results showed that the method performs well.

The MOEO-Smp proposed in this paper is based on filter framework and has the optimization objectives of maximizing the feature-label correlation and minimizing the feature-feature redundancy in the dataset, and scoring the solutions and features based on the results of the optimization to select the promising solutions and features and guide other solutions update through an equilibrium pooling strategy. This method was tested on seven UCI single-label datasets, two Mulan multi-label datasets and one COVID-19 dataset from the real world to examine its single-label and multi-label feature selection capabilities. Three metrics are used to evaluate the feature selection results for single-label data and seven metrics are used to evaluate the feature selection results for multi-label data. The experimental results illustrate that MOEO-Smp has significantly better performance and overall better competitiveness in single-label feature selection and multi-label feature selection compared with 10 other state-of-the-art algorithms.

3. Methodology

3.1. Equilibrium optimizer

Equilibrium Optimizer (EO) is a metaheuristic algorithm that is proposed based on physical phenomena, which has excellent global optimization capabilities.²¹ It is inspired by the relationship between the inflow, outflow and generation of mass in a control volume based on the equilibrium equation of mass in physics.

The equilibrium pool is an important mechanism of EO, which is able to store the currently obtained solution with the best fitness performance to be used to guide other solutions to update, and is expressed as follows :

X eq, pool = {X eq 1, X eq 2, X eq 3, X eq 4, X eqave}, <math display="block" altimg="eq-00001.gif"><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">pool</mtext></mstyle></mrow></msub><mo>=</mo><mo stretchy="false">{</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mn>2</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mn>3</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mn>4</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eqave</mtext></mstyle></mrow></msub><mo stretchy="false">}</mo><mo>,</mo></math> (1)

where

$X_{eq 1}$ ,

$X_{eq 2}$ ,

$X_{eq 3}$ ,

$X_{eq 4}$ are the four solution vectors with the best fitness performance so far, and

$X_{eqave}$ is the average of these four solution vectors. After the EO is initialized, each iteration follows the update as follows :

X d i = X d eq + (X d i - X d eq) F + G (1 - F) ∕ λ V, <math display="block" altimg="eq-00007.gif"><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo>=</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msubsup><mo>+</mo><mo stretchy="false">(</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo>-</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">)</mo><mi>F</mi><mo>+</mo><mi>G</mi><mo stretchy="false">(</mo><mn>1</mn><mo>-</mo><mi>F</mi><mo stretchy="false">)</mo><mo stretchy="false">∕</mo><mi>λ</mi><mi>V</mi><mo>,</mo></math> (2)

F = a * sign (r - 0.5) [e - λ t - 1], <math display="block" altimg="eq-00008.gif"><mi>F</mi><mo>=</mo><mstyle><mtext mathvariant="normal">a</mtext></mstyle><mo>*</mo><mstyle><mtext mathvariant="normal">sign</mtext></mstyle><mo stretchy="false">(</mo><mi>r</mi><mo>-</mo><mn>0</mn><mo>.</mo><mn>5</mn><mo stretchy="false">)</mo><mo stretchy="false">[</mo><msup><mrow><mi>e</mi></mrow><mrow><mo>-</mo><mi>λ</mi><mi>t</mi></mrow></msup><mo>-</mo><mn>1</mn><mo stretchy="false">]</mo><mo>,</mo></math> (3)

t = [1 - iter ∕ T] (b * iter ∕ T), <math display="block" altimg="eq-00009.gif"><mi>t</mi><mo>=</mo><msup><mrow><mo stretchy="false">[</mo><mn>1</mn><mo>-</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">∕</mo><mi>T</mi><mo stretchy="false">]</mo></mrow><mrow><mo stretchy="false">(</mo><mi>b</mi><mo>*</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">∕</mo><mi>T</mi><mo stretchy="false">)</mo></mrow></msup><mo>,</mo></math> (4)

G = 0.5 r 1 * logic (r 2, G P) * (X d eq - λ X d i) F, <math display="block" altimg="eq-00010.gif"><mi>G</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>5</mn><msub><mrow><mi>r</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>*</mo><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><msub><mrow><mi>r</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>,</mo><mi>G</mi><mi>P</mi><mo stretchy="false">)</mo><mo>*</mo><mo stretchy="false">(</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msubsup><mo>-</mo><mi>λ</mi><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">)</mo><mi>F</mi><mo>,</mo></math> (5)

logic (A, B) = {1 if A \geq B 0 else, <math display="block" altimg="eq-00011.gif"><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><mi>A</mi><mo>,</mo><mi>B</mi><mo stretchy="false">)</mo><mo>=</mo><mfenced separators="" open="{" close=""><mrow><mtable displaystyle="true" equalrows="false" equalcolumns="false"><mtr><mtd columnalign="left"><mn>1</mn><mspace width="1em"></mspace></mtd><mtd columnalign="left"><mstyle><mtext mathvariant="normal">if</mtext></mstyle><mtext> </mtext><mi>A</mi><mo>\geq</mo><mi>B</mi></mtd></mtr><mtr><mtd columnalign="left"><mn>0</mn><mspace width="1em"></mspace></mtd><mtd columnalign="left"><mstyle><mtext mathvariant="normal">else</mtext></mstyle></mtd></mtr></mtable><mo>,</mo></mrow></mfenced></math> (6)

where

$X_{i}^{d}$ denotes the dth dimension of the ith solution in the population and

$X_{eq}^{d}$ denotes the dth dimension of a random one in the equilibrium pool

$X_{eq, pool}$ . F is exponential term, t is time. Sign(⋅) is a symbolic function, the value is 1 or

$- 1$ . Iter and T denote the current and maximum number of iterations, respectively. G is generation rate. logic(

$A, B$ ) is a logical judgment, if the result is true then it equals 1, otherwise it equals 0.

$λ$ ,

$r_{1}$ and

$r_{2}$ are all uniformly distributed random numbers between 0 and 1. a, b and GP are constants equal to 2, 1 and 0.5, respectively.

3.2. Proposed MOEO-Smp

Traditional algorithms usually use fitness as an important metric to evaluate optimization results; however, considering the improvement rate of the solution is equally important in finding a solution with potential.²² In the entire iterative process, the performance of the fitness value cannot express the complete information of the optimization rate, which may lead to a locally optimal solution instead of the solution with higher optimization potential to guide the algorithm to explore and exploit other solutions, thus falling into local optimality or accuracy degradation.

Based on this, the proposed MOEO-Smp combines the characteristics of multi-objective optimization and establishes not only the optimal equilibrium pool that can store those solutions with optimal values obtained on each objective, but also the candidate solution equilibrium pool and the candidate feature equilibrium pool by considering the information of the solutions’ improvement rate on each objective and the scoring of each solution and each feature in the population based on the four pairs of reward and penalty principles and the optimization results to be one of the important indicators.

After one iteration, MOEO-Smp compares the changes in fitness of each solution in the population on each objective, and if the updated solution is better than the original solution, the change is called positive phase (PP), and vice versa, called negative phase (NP), as shown in Fig. 1, where “0” and “1” indicate that the feature is not selected and selected into the feature subset, respectively. In order to better identify and select solutions with higher improvement rates on each objective to enter the candidate solution balancing pool, we developed two pairs of reward and penalty principles for rating the solutions: (1) encouraging all objectives are PP and penalizing all objectives are NP; (2) encouraging continuous optimization and punishing successive stagnation. In this way, when a solution is optimized multiple times continuously on a certain objective or optimized on all objectives, it has the opportunity to obtain higher rating in this iteration. The rating expression for solutions is as follows :

RS i (iter + 1) = RS i (iter) + logic ([alo PP, alo NP], [MGim, MGst]) * C TR 1, <math display="block" altimg="eq-00020.gif"><msub><mrow><mstyle><mtext mathvariant="normal">RS</mtext></mstyle></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>=</mo><msub><mrow><mstyle><mtext mathvariant="normal">RS</mtext></mstyle></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">)</mo><mo>+</mo><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><mo stretchy="false">[</mo><mstyle><mtext mathvariant="normal">alo</mtext></mstyle><mtext> </mtext><mstyle><mtext mathvariant="normal">PP</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">alo</mtext></mstyle><mtext> </mtext><mstyle><mtext mathvariant="normal">NP</mtext></mstyle><mo stretchy="false">]</mo><mo>,</mo><mo stretchy="false">[</mo><mstyle><mtext mathvariant="normal">MGim</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">MGst</mtext></mstyle><mo stretchy="false">]</mo><mo stretchy="false">)</mo><mo>*</mo><msubsup><mrow><mi>C</mi></mrow><mrow><mn>1</mn></mrow><mrow><mstyle><mtext mathvariant="normal">TR</mtext></mstyle></mrow></msubsup><mo>,</mo></math> (7)

where RS_i(iter) denotes the rating of the ith solution at the i-ter iteration, and its initial value is 0. logic(⋅) is a logical judgment. alo PP and alo NP denote the scores where all objectives are PP and all objectives are NP, respectively. MGim and MGst denote exceeding the continuous optimization threshold and exceeding the continuous stagnation threshold on the objectives, respectively.

${[\cdot]}^{Tr}$ denotes the transpose matrix.

$C_{1}$ denotes the solution rating matrix of size

$1 \times 4$ and all elements are constants, with each element being 0.25. The score of the optimization result for the ith solution at the iterative is defined as

S score i (iter + 1) = [RS i (iter + 1)] ∕ (iter + 1) + (1 - nsf i ∕ NF), <math display="block" altimg="eq-00024.gif"><mi>S</mi><msub><mrow><mstyle><mtext mathvariant="normal">score</mtext></mstyle></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>=</mo><mo stretchy="false">[</mo><msub><mrow><mstyle><mtext mathvariant="normal">RS</mtext></mstyle></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo stretchy="false">]</mo><mo stretchy="false">∕</mo><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>+</mo><mo stretchy="false">(</mo><mn>1</mn><mo>-</mo><msub><mrow><mstyle><mtext mathvariant="normal">nsf</mtext></mstyle></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">∕</mo><mstyle><mtext mathvariant="normal">NF</mtext></mstyle><mo stretchy="false">)</mo><mo>,</mo></math> (8)

where Sscore_i(iter) denotes the optimization result score of the ith solution at the iter-th iteration, and its initial value is 0. nsf_i is the number of features selected in the ith solution, and NF is the total number of features. The larger Sscore indicates the higher optimization potential of the solution.

Fig. 1. The situation of optimization results.

After one iteration, the features selected in the solution also change, and if the selection of a feature is added to the updated solution compared to the original solution, it is called select in (SI), and vice versa, called select out (SO), and the rest of the case are called unchanged (UC), as shown in Fig. 1.

If the solution after iteration has a better fitness value, it means that it has selected some important features and eliminated some redundant features in general. Therefore, two pairs of reward and penalty principles are set for rating each feature in the solution according to how the solution changes in fitness and how the selected features change after the update: (1) encouraging SO and not all objectives are NP and penalizing SI and all objectives are NP; (2) encouraging non-UC and not all objectives are NP and penalizing UC and all objectives are NP. In this way, information about changes in fitness caused by changes in the selected features after the update will be reflected in the feature rating, and features that have higher contribution to the fitness will receive higher rating. The rating expression for features is as follows :

RF d i (iter + 1) = R F d i (iter) + SUM {logic ([SI, SO, UC] TR) * logic ([alo PP, alo NP, else]) * C 2}, <math display="block" altimg="eq-00025.gif"><msubsup><mrow><mstyle><mtext mathvariant="normal">RF</mtext></mstyle></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>=</mo><mtext>R</mtext><msubsup><mrow><mtext>F</mtext></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">)</mo><mo>+</mo><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle><mfenced separators="" open="{" close="}"><mrow><mtable displaystyle="true" equalrows="false" equalcolumns="false"><mtr><mtd columnalign="left"><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><msup><mrow><mo stretchy="false">[</mo><mstyle><mtext mathvariant="normal">SI</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">SO</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">UC</mtext></mstyle><mo stretchy="false">]</mo></mrow><mrow><mstyle><mtext mathvariant="normal">TR</mtext></mstyle></mrow></msup><mo stretchy="false">)</mo></mtd></mtr><mtr><mtd columnalign="left"><mo>*</mo><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><mo stretchy="false">[</mo><mstyle><mtext mathvariant="normal">alo</mtext></mstyle><mtext> </mtext><mstyle><mtext mathvariant="normal">PP</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">alo</mtext></mstyle><mtext> </mtext><mstyle><mtext mathvariant="normal">NP</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">else</mtext></mstyle><mo stretchy="false">]</mo><mo stretchy="false">)</mo><mo>*</mo><msub><mrow><mi>C</mi></mrow><mrow><mn>2</mn></mrow></msub></mtd></mtr></mtable></mrow></mfenced><mo>,</mo></math> (9)

where RF

$_{i}^{d}$ (iter) denotes the optimization result rate of the dth feature in the ith solution at the iter-th iteration with an initial value of 0. SUM{⋅} denotes the sum of all elements in the matrix. Else denotes any case except alo PP and alo NP.

$C_{1}$ denotes the feature rating matrix of size

$3 \times 3$ and all elements are constants, as [0.25,

$- 0.25$ , 0.25; 0.5, 0, 0.5; 0, 0.5, 0]. The optimization result score of the ith solution at the iterative is defined as :

F score d (iter + 1) = [RF d (iter + 1)] ∕ [N (iter + 1)] + nsf d ∕ N, <math display="block" altimg="eq-00030.gif"><mi>F</mi><msub><mrow><mstyle><mtext mathvariant="normal">score</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msub><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>=</mo><mo stretchy="false">[</mo><msup><mrow><mstyle><mtext mathvariant="normal">RF</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msup><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo stretchy="false">]</mo><mo stretchy="false">∕</mo><mo stretchy="false">[</mo><mi>N</mi><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo stretchy="false">]</mo><mo>+</mo><msub><mrow><mstyle><mtext mathvariant="normal">nsf</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msub><mo stretchy="false">∕</mo><mi>N</mi><mo>,</mo></math> (10)

where Fscore_d(iter) denotes the optimization result score of the dth feature at the iter-th iteration, and its initial value is 0. nsf_d is the number of times the dth feature is selected by all solutions, and N is the number of solutions in the population. The larger the Fscore, the greater the influence of the feature in the optimization.

Based on the score ranking of the optimization results of solutions and features, we build the candidate solution equilibrium pool and the candidate feature equilibrium pool with the expressions as follows, respectively :

X eq, Spool = {X Seq 1, X Seq 2, X Seq 3, X Seq 4, X Seqave}, <math display="block" altimg="eq-00031.gif"><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mo>,</mo><mstyle><mtext mathvariant="normal">Spool</mtext></mstyle></mrow></msub><mo>=</mo><mo stretchy="false">{</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle><mn>2</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle><mn>3</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle><mn>4</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seqave</mtext></mstyle></mrow></msub><mo stretchy="false">}</mo><mo>,</mo></math> (11)

X eq, F pool = {randirank [F score (iter), g, h]}, <math display="block" altimg="eq-00032.gif"><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mo>,</mo><mi>F</mi><mstyle><mtext mathvariant="normal">pool</mtext></mstyle></mrow></msub><mo>=</mo><mo stretchy="false">{</mo><mstyle><mtext mathvariant="normal">randirank</mtext></mstyle><mo stretchy="false">[</mo><mi>F</mi><mstyle><mtext mathvariant="normal">score</mtext></mstyle><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">)</mo><mo>,</mo><mi>g</mi><mo>,</mo><mi>h</mi><mo stretchy="false">]</mo><mo stretchy="false">}</mo><mo>,</mo></math> (12)

where

$X_{eq, Spool}$ is the candidate solution balancing pool,

$X_{Seq 1}$ ,

$X_{Seq 2}$ ,

$X_{Seq 3}$ ,

$X_{Seq 4}$ are the four solution vectors with the best solution scores so far, and

$X_{Seqave}$ is the average of these four solution vectors.

$X_{eq, F pool}$ is the candidate feature balancing pool, randirank[

$X, g, h$ ] means sort all elements of the matrix X in descending order, and randomly selects h% elements from the first g%. In addition, we build the optimal equilibrium pool capable of storing the solutions that have obtained the best performance on a given objective, with the following expressions :

X eq, B pool = {X 1, X 2, \dots, X m, \dots, X M}, <math display="block" altimg="eq-00041.gif"><msub><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">eq</mtext></mstyle><mo>,</mo><mi>B</mi><mstyle><mtext mathvariant="normal">pool</mtext></mstyle></mrow></msub><mo>=</mo><mo stretchy="false">{</mo><msub><mrow><mi>X</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>,</mo><mo>\dots</mo><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mi>m</mi></mrow></msub><mo>,</mo><mo>\dots</mo><mo>,</mo><msub><mrow><mi>X</mi></mrow><mrow><mi>M</mi></mrow></msub><mo stretchy="false">}</mo><mo>,</mo></math> (13)

where

$X_{m}$ denotes the solution that obtains the optimal value on the mth objective and M is the number of objectives.

The updated equation of MOEO-Smp replaces the equilibrium pool in EO with the candidate solution equilibrium pool and adds the optimal equilibrium pool to guide the particle exploitation and exploration direction. The expressions are as follows :

X d i = X d Seq + (X d i - X d Seq) F + G (1 - F) ∕ λ V + [1 ∕ 2 (iter ∕ T)] λ X d m, <math display="block" altimg="eq-00043.gif"><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo>=</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msubsup><mo>+</mo><mo stretchy="false">(</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo>-</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">)</mo><mi>F</mi><mo>+</mo><mi>G</mi><mo stretchy="false">(</mo><mn>1</mn><mo>-</mo><mi>F</mi><mo stretchy="false">)</mo><mo stretchy="false">∕</mo><mi>λ</mi><mi>V</mi><mo>+</mo><mo stretchy="false">[</mo><mn>1</mn><mo stretchy="false">∕</mo><msup><mrow><mn>2</mn></mrow><mrow><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">∕</mo><mi>T</mi><mo stretchy="false">)</mo></mrow></msup><mo stretchy="false">]</mo><mi>λ</mi><msubsup><mrow><mi>X</mi></mrow><mrow><mi>m</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo>,</mo></math> (14)

G = 0.5 r 1 * logic (r 2, GP) * (X d Seq - λ X d i) F . <math display="block" altimg="eq-00044.gif"><mi>G</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>5</mn><msub><mrow><mi>r</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>*</mo><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><msub><mrow><mi>r</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>,</mo><mstyle><mtext mathvariant="italic">GP</mtext></mstyle><mo stretchy="false">)</mo><mo>*</mo><mo stretchy="false">(</mo><msubsup><mrow><mi>X</mi></mrow><mrow><mstyle><mtext mathvariant="normal">Seq</mtext></mstyle></mrow><mrow><mi>d</mi></mrow></msubsup><mo>-</mo><mi>λ</mi><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">)</mo><mi>F</mi><mo>.</mo></math> (15)

We propose an elite strategy with a nonlinear learning rate for the candidate feature equilibrium pool by: (1) generating a nonlinear learning rate lr, as in (15), after each update; (2) generating a random number $r_{0}$ uniformly distributed between 0 and 1 for each feature corresponding to the candidate solution equilibrium pool; (3) When lr is greater than $r_{0}$ , the corresponding feature in the candidate solution equilibrium pool will be removed in the solution $X_{i}$ , thus generating the elite $X_{i}$ ; (4) If elite $X_{i}$ outperforms $X_{i}$ on all objectives, $X_{i}$ is replaced by elite $X_{i}$ . If $X_{i}$ outperforms elite $X_{i}$ on all objectives, $X_{i}$ remains unchanged. In other cases, there is a probability of 1/2 that elite $X_{i}$ replaces $X_{i}$ .

lr = 1 - 0.5 \times (1 - (2 * iter ∕ T - 1)) 3 . <math display="block" altimg="eq-00058.gif"><mtext>lr</mtext><mo>=</mo><mn>1</mn><mo>-</mo><mn>0</mn><mo>.</mo><mn>5</mn><mo>\times</mo><msup><mrow><mo stretchy="false">(</mo><mn>1</mn><mo>-</mo><mo stretchy="false">(</mo><mn>2</mn><mo>*</mo><mstyle><mtext mathvariant="normal">iter</mtext></mstyle><mo stretchy="false">∕</mo><mi>T</mi><mo>-</mo><mn>1</mn><mo stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><mrow><mn>3</mn></mrow></msup><mo>.</mo></math> (16)

Figure 2 shows the variation of the nonlinear learning rate lr at different iteration periods. In the early iterations, lr is larger than the linear learning rate and increases at a slower rate, giving the algorithm sufficient opportunity to explore other subsets, preventing prematurely falling into local optima and facilitating rapid reduction of the feature subset size. In the late iteration, the lr is smaller than the linear learning rate and grows at a faster rate, allowing the algorithm to retain more optimization information and exploit it finely, which is conducive to convergence.

In addition, the pareto optimal solutions generated by MOEO-Smp based on the dominance relationship are saved in the archive, and when the archive capacity is full, other solutions are replaced based on the ranking of the crowding distance of the solutions, and the flow chart of the algorithm is shown in Fig. 3.

4. Experiment and Discussion

In this section, seven single-label datasets, and three multi-label datasets are used to test the feature selection ability of the filter-based approach MOEO-Smp. In the single-label feature selection experiments, the performances of 10 other multi-objective algorithms with KNN as classifier are compared on three metrics. In the multi-label feature selection experiments, the performances of 10 other multi-objective algorithms with MLKNN as classifier are compared on seven metrics. The results show that the feature subset selected by MOEO-Smp has better classification ability.

All experiments are run on a computer running the Windows 10 operating system with an Intel(R) Core(TM) i9-13900HX 2.20GHz and 16.00 GB of RAM. All algorithm codes are run on MATLAB 2019b software.

4.1. Metrics and parameters

The UCI dataset contains classification data in various domains and is the most commonly used standard dataset for machine learning. The Mulan dataset contains a variety of multi-label data and is widely used in multi-label feature selection studies. The experimental part of this paper contains seven UCI single-label data, two Mulan multi-label data to test the feature selection ability of the proposed method.

In addition, to further validate the multi-label feature selection performance of MOEO-Smp, the diagnosis of COVID-19 and its clinical spectrum dataset from Kaggle is used to further test the effectiveness of the proposed MOEO-Smp to solve the multi-label feature selection problem in the real world. All the dataset information is shown in Table 1.

**Table 1. Dataset for experiment.**
Datasets		Sample	Feature	Label	Classification
Single-label	Landsat²³	2000	36	1	6
	Msplice²³	3175	240		3
	Pima²³	768	8		2
	Sonar²³	208	60		2
	Spambase²³	4601	57		2
	Wine²³	178	13		3
	Wpbc²³	198	33		2
Multi-label	Emotion^24,25	593	72	6	2^Label
	Birds^24,25	645	300	19
	COVID-19^26,27	5644	100	5

In this multi-label and single-label feature selection problem, MOEO-Smp obtains a Pareto optimal solution set with the maximum mutual information value between features and labels and the minimum cosine similarity between features and features as two optimization objectives, respectively :

MI (X, Y) = SUM x \in X SUM y \in Y p (x, y) log [p (x, y) ∕ p (x) * p (y)], <math display="block" altimg="eq-00060.gif"><mstyle><mtext mathvariant="normal">MI</mtext></mstyle><mo stretchy="false">(</mo><mi>X</mi><mo>,</mo><mi>Y</mi><mo stretchy="false">)</mo><mo>=</mo><msub><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>x</mi><mo>\in</mo><mi>X</mi></mrow></msub><msub><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>y</mi><mo>\in</mo><mi>Y</mi></mrow></msub><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>log</mo><mo stretchy="false">[</mo><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo stretchy="false">)</mo><mo stretchy="false">∕</mo><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>*</mo><mi>p</mi><mo stretchy="false">(</mo><mi>y</mi><mo stretchy="false">)</mo><mo stretchy="false">]</mo><mo>,</mo></math> (17)

CS (X, Y) = | SUM d i = 1 (X i Y i) ∕ [SUM d i = 1 X 2 i * SUM d i = 1 Y 2 i] |, <math display="block" altimg="eq-00061.gif"><mstyle><mtext mathvariant="normal">CS</mtext></mstyle><mo stretchy="false">(</mo><mi>X</mi><mo>,</mo><mi>Y</mi><mo stretchy="false">)</mo><mo>=</mo><mi>|</mi><msubsup><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>d</mi></mrow></msubsup><mo stretchy="false">(</mo><msub><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow></msub><msub><mrow><mi>Y</mi></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">)</mo><mo stretchy="false">∕</mo><mo stretchy="false">[</mo><msubsup><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>d</mi></mrow></msubsup><msubsup><mrow><mi>X</mi></mrow><mrow><mi>i</mi></mrow><mrow><mn>2</mn></mrow></msubsup><mo>*</mo><msubsup><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>d</mi></mrow></msubsup><msubsup><mrow><mi>Y</mi></mrow><mrow><mi>i</mi></mrow><mrow><mn>2</mn></mrow></msubsup><mo stretchy="false">]</mo><mi>|</mi><mo>,</mo></math> (18)

where MI(

$X, Y)$ is the mutual information of X and Y,

$p (x)$ and

$p (y)$ donate probability function. CS(

$X, Y)$ is the cosine similarity of variable X and Y, d is the variable dimension.

In single-label experiments, the accuracy (Acc), number of features (NF) selected by feature subsets and the running time (time) of the algorithm are used as metrics to evaluate single-label feature selection. Acc is calculated as follows :

Acc = (TP + TN) ∕ (TP + FP + TN + FN), <math display="block" altimg="eq-00066.gif"><mstyle><mtext mathvariant="normal">Acc</mtext></mstyle><mo>=</mo><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">TP</mtext></mstyle><mo>+</mo><mstyle><mtext mathvariant="normal">TN</mtext></mstyle><mo stretchy="false">)</mo><mo stretchy="false">∕</mo><mo stretchy="false">(</mo><mstyle><mtext mathvariant="normal">TP</mtext></mstyle><mo>+</mo><mstyle><mtext mathvariant="normal">FP</mtext></mstyle><mo>+</mo><mstyle><mtext mathvariant="normal">TN</mtext></mstyle><mo>+</mo><mstyle><mtext mathvariant="normal">FN</mtext></mstyle><mo stretchy="false">)</mo><mo>,</mo></math> (19)

where TP denotes the samples whose predicted values match with the true values and both are positive; TN denotes the samples whose predicted values match with the true values and both are negative; FP denotes the samples whose true values are negative and predicted values are positive; FN denotes the samples whose true values are positive and predicted values are negative.

In multi-label experiments, the hamming loss (HL), ranking loss (RL), one-error rate (OE), coverage rate (CV) and average precision (AP) as the metrics to evaluate the performance of feature selection.²⁸ In addition, the number of features (NF) and the running time (time) are also compared, where, smaller HL, RL, OE, CV, time and NF indicate better performance of the algorithm and larger AP indicates better performance of the algorithm.

Suppose $D = {(x_{i}, Y_{i}), i = 1, 2, \dots, s}$ is the test set, $x_{i}$ is the instance, s is the label space, L is the label set, $Y_{i}$ is the correct subset of labels, and $Y_{i} \subseteq L$ . h is the multi-label classifier. The metric calculations are as follows :

HL (h) = 1 ∕ s * SUM S i = 1 | h (x i) Δ Y i |, <math display="block" altimg="eq-00071.gif"><mstyle><mtext mathvariant="normal">HL</mtext></mstyle><mo stretchy="false">(</mo><mi>h</mi><mo stretchy="false">)</mo><mo>=</mo><mn>1</mn><mo stretchy="false">∕</mo><mi>s</mi><mo>*</mo><msubsup><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>S</mi></mrow></msubsup><mi>|</mi><mi>h</mi><mo stretchy="false">(</mo><msub><mrow><mi>x</mi></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">)</mo><mi mathvariant="normal">Δ</mi><msub><mrow><mi>Y</mi></mrow><mrow><mi>i</mi></mrow></msub><mi>|</mi><mo>,</mo></math> (20)

where

$| \cdot |$ denotes the base of the returned set and

$Δ$ denotes the symmetric difference between the two sets. Hamming loss is able to calculate the percentage of misclassified labels.

where

$Y_{i}$ is complement of

${\bar{Y}}_{i ¯}, f (\cdot, \cdot)$ is a real-value function. Ranking loss indicates the proportion of irrelevant labels ranking higher than the relevant tags ranking.

OE (f) = 1 ∕ s * SUM S i = 1 logic (arg max y \in L f (x i, y) \sim \in Y i), <math display="block" altimg="eq-00077.gif"><mstyle><mtext mathvariant="normal">OE</mtext></mstyle><mo stretchy="false">(</mo><mi>f</mi><mo stretchy="false">)</mo><mo>=</mo><mn>1</mn><mo stretchy="false">∕</mo><mi>s</mi><mo>*</mo><msubsup><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>S</mi></mrow></msubsup><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">(</mo><msub><mrow><mstyle><mtext mathvariant="normal">arg max</mtext></mstyle></mrow><mrow><mi>y</mi><mo>\in</mo><mi>L</mi></mrow></msub><mi>f</mi><mo stretchy="false">(</mo><msub><mrow><mi>x</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>\sim</mo><mo>\in</mo><msub><mrow><mi>Y</mi></mrow><mrow><mi>i</mi></mrow></msub><mo stretchy="false">)</mo><mo>,</mo></math> (22)

where

$\sim \in$ indicates not belonging to the relation. One-error rate can calculate the percentage of instances where the top-ranked label is not in the set of related labels.

CV (f) = 1 ∕ s * SUM S i = 1 max rank f (x i, y) - 1, <math display="block" altimg="eq-00079.gif"><mstyle><mtext mathvariant="normal">CV</mtext></mstyle><mo stretchy="false">(</mo><mi>f</mi><mo stretchy="false">)</mo><mo>=</mo><mn>1</mn><mo stretchy="false">∕</mo><mi>s</mi><mo>*</mo><msubsup><mrow><mstyle><mtext mathvariant="normal">SUM</mtext></mstyle></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>S</mi></mrow></msubsup><mo>max</mo><msub><mrow><mstyle><mtext mathvariant="normal">rank</mtext></mstyle></mrow><mrow><mi>f</mi></mrow></msub><mo stretchy="false">(</mo><msub><mrow><mi>x</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>-</mo><mn>1</mn><mo>,</mo></math> (23)

where rank

$_{f} (x, y)$ denotes the ranking of y in L according to

$f (x, y)$ to be returned in descending order. Coverage rate can be calculated as the average number needed to cover all relevant tags down the list of labels.

Average precision indicates the average precision of the related labels ranked on above a particular label $y \in Y$ .

In addition, the running time (time) of the algorithm and the number of features (NF) selected by feature subsets are also compared, where smaller HL, RL, OE, CV, time and NF indicate better performance of the algorithm and larger AP indicates better performance of the algorithm.

NSGA-II,²⁹ MOPSO,³⁰ MOEA-D,³¹ MOALO,³² MOHHO,³³ MOTEO,³⁴ MSSA,³⁵ MOEA-NSGA-II,³⁶ MOABC³⁷ and NSGWO³⁸ are used as comparison algorithms with the same parameter settings as in the original paper. These algorithms are binarized using the following transfer function :

BX = logic [(1 ∕ (1 + exp - X)), 0.5], <math display="block" altimg="eq-00084.gif"><mstyle><mtext mathvariant="italic">BX</mtext></mstyle><mo>=</mo><mstyle><mtext mathvariant="normal">logic</mtext></mstyle><mo stretchy="false">[</mo><mo stretchy="false">(</mo><mn>1</mn><mo stretchy="false">∕</mo><mo stretchy="false">(</mo><mn>1</mn><mo>+</mo><msup><mrow><mo>exp</mo></mrow><mrow><mo>-</mo><mi>X</mi></mrow></msup><mo stretchy="false">)</mo><mo stretchy="false">)</mo><mo>,</mo><mn>0</mn><mo>.</mo><mn>5</mn><mo stretchy="false">]</mo><mo>,</mo></math> (25)

where X is the solution vector. Logic

$(\cdot)$ is a logical judgment, the value is 0 or 1. If the dimension in BX is “1”, the feature is selected; if the dimension in BX is “0”, the feature is not selected. All algorithms have a population size of 30 and a maximum number of iterations of 500, and KNN (

$k = 5$ )³⁹ and MLKNN (

$k = 10$ )¹² as classifiers for single-label and multi-label feature selection tests, respectively. Sixty percent of the data are used for training and 40% of the data are used for testing to verify the results of feature selection.

4.2. Single-label feature selection and experimental results

For the single-label feature selection test, we selected seven UCI single-label datasets and compared them with 10 other state-of-the-art algorithms to evaluate the results in terms of the accuracy (Acc), number of features (NF) selected by feature subsets and the running time (time).

Accuracy can reflect the quality of the feature subsets selected by the algorithm. Table 2 shows the correct classification rates of the feature subsets selected by MOEO-Smp and the other 10 compared algorithms on 7 single-label datasets. The results showed that MOEO-Smp obtained better performance than the other 10 comparison algorithms on the msplice, pima, sonar, spambase, wine and wpbc datasets. Although MOPAO and MOALO obtained the highest accuracy in the landsat dataset, MOEO-Smp was second only to them, obtaining the third best result.

**Table 2. Accuracy (acc) of 11 algorithms on 7 single-label datasets.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
landsat	Ave	0.86405	0.87487	0.85648	0.87278	0.86634	0.86868	0.86812	0.86927	0.86937	0.86988	0.87132
	Std	0.00709	0.00797	0.01066	0.01034	0.00519	0.01016	0.01129	0.00730	0.00828	0.00808	0.00644
msplice	Ave	0.73002	0.81594	0.76623	0.82798	0.79770	0.76906	0.82213	0.78781	0.79584	0.82266	0.82920
	Std	0.00734	0.00721	0.02258	0.01038	0.01226	0.00677	0.00737	0.01211	0.00784	0.00853	0.00449
pima	Ave	0.67766	0.69383	0.68709	0.70195	0.69537	0.69234	0.70074	0.70403	0.70662	0.69252	0.71145
	Std	0.01693	0.01378	0.01750	0.01066	0.01923	0.01013	0.01065	0.01531	0.01410	0.01307	0.01390
sonar	Ave	0.73631	0.73700	0.72968	0.74605	0.71460	0.72766	0.73256	0.73714	0.73958	0.73188	0.74633
	Std	0.02608	0.02493	0.03780	0.04301	0.02319	0.03125	0.03797	0.04022	0.03722	0.03202	0.02560
spambase	Ave	0.82361	0.83073	0.82872	0.83207	0.82723	0.81583	0.83143	0.81803	0.81679	0.83033	0.83334
	Std	0.00986	0.01008	0.03678	0.01076	0.01957	0.00557	0.01479	0.01207	0.00880	0.00995	0.00926
wine	Ave	0.73631	0.73700	0.72968	0.74605	0.71460	0.72766	0.73256	0.73714	0.73958	0.73188	0.74633
	Std	0.01649	0.03546	0.05109	0.03102	0.03677	0.02612	0.03534	0.02994	0.02812	0.03879	0.04417
wpbc	Ave	0.82361	0.83073	0.82872	0.83207	0.82723	0.81583	0.83143	0.81803	0.81679	0.83033	0.83334
	Std	0.02957	0.02836	0.02505	0.02601	0.02215	0.03154	0.02824	0.02778	0.03537	0.02659	0.02440

The lower number of features in the feature subset indicates that the algorithm performs better in dimensionality reduction. Table 3 shows the average number of features in feature subsets for MOEO-Smp and the other 10 compared algorithms on 7 single-label datasets. The feature subsets of NSGA-II obtained the least number of features on landsat, pima, sonar and spambase. The feature subset of MOEA-D has the lowest average dimensionality on wine and wpbc. The feature subset of MOEO-Smp has the lowest number of feature selections only on msplice. The number of selected features is one of the metrics to evaluate the goodness of feature subsets, but the correct classification rate is a more important evaluation metric.³ Although MOEO-Smp only has the lowest number of selected features on msplice, it obtained at least the second or third best performance on landsat, pima, sonar, spambase, wine and wpbc compared to the other 10 compared algorithms, and compared to NSGA-II and MOEA-D, which have the best dimensionality reduction, MOEO-Smp’s feature subset has significantly higher classification accuracy. Thus, the feature subset selected by MOEO-Smp not only has better data dimensionality reduction capability, but also maintains high classification accuracy.

**Table 3. Number of features selected (nf) by feature subsets of 11 algorithms on 7 single-label datasets.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
landsat	Ave	14.12167	21.89667	10.45500	23.13909	19.01309	20.23375	23.08132	23.86667	24.94667	22.19000	17.67405
	Std	0.56891	0.62976	0.51131	0.81957	1.15581	0.31217	0.62239	0.59697	0.53709	0.69012	0.54178
msplice	Ave	95.04000	144.06533	121.42500	137.50391	120.77100	132.47333	140.16219	156.96333	156.36333	149.17333	115.28873
	Std	1.14205	3.31980	1.97012	4.60109	3.82695	1.06457	2.46209	1.60372	1.33442	2.75253	3.83372
pima	Ave	3.19667	4.36393	3.24667	3.92306	3.93562	4.59865	3.99513	5.27552	5.40667	4.02667	4.03009
	Std	0.33472	0.21445	0.32810	0.30529	0.24421	0.10530	0.38382	0.32085	0.24557	0.11877	0.18821
sonar	Ave	23.75667	37.27333	22.20167	38.71833	31.87915	33.33292	38.90833	39.50500	40.80667	38.11000	29.02443
	Std	0.49265	0.64233	0.63305	1.38308	1.74063	0.42579	1.02289	0.99256	0.72100	0.62165	0.93966
spambase	Ave	22.48333	36.52097	22.78333	37.45404	31.16668	32.63417	37.34466	37.58833	38.93333	38.10833	29.28637
	Std	0.70781	0.59307	1.51254	1.58474	1.79562	0.46531	0.91914	0.38056	0.57968	0.66084	1.02480
wine	Ave	5.11833	7.69167	2.44500	7.30684	6.72307	7.41958	7.43764	8.69333	9.06500	7.01833	6.35864
	Std	0.29369	0.35752	0.30517	0.48742	0.57952	0.22111	0.34981	0.35731	0.34088	0.24266	0.33306
wpbc	Ave	12.97833	19.96333	9.13833	20.79333	16.94255	18.29917	20.97167	22.05667	23.04000	20.22000	15.84511
	Std	0.60246	0.62078	0.53642	0.80158	1.08818	0.43982	0.70165	0.50831	0.53274	0.58223	0.60241

The runtime is the speed at which the algorithm performs feature selection and reflects the complexity of the algorithm. Table 4 shows the average running time of each algorithm for each independent operation on the seven datasets, where NSGA-II and NSGWO have the fastest average running speed, while MOEO-Smp ranks in the middle of the 11 algorithms, which indicates that MOEO-Smp, similar to most of the other compared algorithms, does not have a faster running speed, which is mainly due to the multi-balance pool mechanism leads to an increase in spatial complexity, but this appears from Tables 2 and 3 that such a mechanism can effectively reduce the data dimensionality and maintain a high classification accuracy, which is especially important for subsequent learning algorithms for further data processing. In addition, MOEO-Smp outperforms MOEA-D, MOALO, MOTEO and MOABC in terms of running time, and differs from MOPSO and MOHHO by about 1 s on average, while ensuring higher classification accuracy and better dimensionality reduction.

**Table 4. Average running time (time) of 11 algorithms on 7 single-label datasets.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
landsat	Ave	0.20076	3.42221	5.10736	5.54861	3.85223	6.43558	0.34519	1.47052	5.26922	0.88518	4.27232
	Std	0.01438	0.20855	0.27260	0.09011	0.33037	0.08087	0.02326	0.08079	0.05885	0.01799	0.18835
msplice	Ave	0.90097	4.79104	4.94279	43.12827	5.19492	9.58378	1.13013	6.26653	6.97483	4.56272	3.39810
	Std	0.03504	5.33075	0.32751	3.73052	0.39579	0.13295	0.03468	0.15439	0.12132	0.18416	0.49944
pima	Ave	0.17297	2.91390	6.37377	1.39006	3.35323	6.53927	0.17543	1.70810	4.89595	0.88231	1.95560
	Std	0.00680	0.25176	0.07498	0.05787	0.34220	0.08444	0.01792	0.06515	0.14495	0.02802	0.29014
sonar	Ave	0.24248	3.95164	5.18091	8.85899	4.39555	6.52438	0.50614	1.84730	5.45427	1.07998	5.38975
	Std	0.00742	0.33660	0.32140	0.17441	0.31635	0.03434	0.02492	0.08510	0.03552	0.03598	0.10646
spambase	Ave	0.28365	3.07257	4.79768	8.48452	3.79102	7.11435	0.38618	2.02166	5.58850	1.46362	3.46477
	Std	0.01020	0.27900	0.48667	0.09091	0.36279	0.13209	0.01763	0.09352	0.08328	0.06348	0.70257
wine	Ave	0.16956	3.02029	4.81248	2.18167	4.72672	6.24559	0.28241	1.05205	5.23541	0.80653	3.21216
	Std	0.00977	0.25548	0.42102	0.07433	0.45142	0.05336	0.01351	0.08557	0.05714	0.02621	0.20952
wpbc	Ave	0.18653	3.86699	4.93058	5.15636	4.64585	6.32058	0.39513	1.35223	5.31132	0.90145	4.53122
	Std	0.01126	0.29085	0.37653	0.11396	0.39717	0.09657	0.01806	0.12494	0.07255	0.04371	0.16788

Table 5 shows the Friedman test of the Acc, Nf and time. The p-value values on all seven indicators are less than 0.05, indicating that a statistically significant the p-value values on all seven indicators are less than 0.05, indicating that a statistically significant difference between algorithms, where MR denotes the average ranking from smallest to largest on the seven datasets and Rank denotes the overall ranking. MOEO-Smp ranks 1, 3 and 6 on Acc, Nf and time, respectively, and the p-values are all less than 0.05, indicating that there is a significant difference between the results. In conclusion, this is consistent with the analysis in Tables 2–4, i.e., MOEO-Smp is able to select a suitable subset of features while maintaining optimal accuracy and better dimensionality reduction without causing too slow running time.

**Table 5. Friedman test of feature selection results.**
	Acc		Nf		Time
	MR	Rank	MR	Rank	MR	Rank
NSGA-II²⁹	3.43	9	1.57	1	1.00	1
MOPSO³⁰	7.57	3	7.00	7	5.43	5
MOEA-D³¹	3.29	10	1.71	2	8.00	8
MOALO³²	9.71	2	6.86	6	8.71	9
MOHHO³³	3.86	8	3.86	4	7.00	7
MOTEO³⁴	2.43	11	5.86	5	10.57	11
MSSA³⁵	6.71	4	7.57	9	2.00	2
MOEA-NSGA-II³⁶	5.86	7	10.00	10	4.71	4
MOABC³⁷	6.29	5	10.86	11	9.29	10
NSGWO³⁸	6.14	6	7.29	8	3.14	3
MOEO-Smp	10.71	1	3.43	3	6.14	6
p-Value	2.333E-6		3.273E-9		2.793E-9

Table 6 shows the results of feature selection using pima, sonar and wine dataset in other literatures. Faris et al.⁴⁰ use a wrapper structure, so its classification accuracy on sonar and wine is higher than the filtered structure in this paper. However, since the classifier needs to be run to validate the feature subset in each iteration, this study is significantly better than the wrapper structure in terms of computational complexity, and in terms of dimensionality reduction ability, the feature subset size of MOEO-Smp is slightly smaller than that of BSSA_S3_ CP. Sun et al.⁴¹ combine fuzzy multi-objective neighborhood rough set and whale algorithms, which are best at dealing with unbalanced data, such as pima, and comparing the two-dimensionality reduction capabilities reveals that TFSFB has a much better performance, with a feature subset of approximately 0.43 features less than that of MOEO-Smp. Ranjan and Chhabra⁴² propose a packed crow search algorithm (MO-CSACFS) for clustering and feature selection, which significantly outperforms clustering methods such as FCM, K-means and medoids in the experiments. The dimensionality reduction ability of MOEO-Smp is better than that of MO-CSACFS overall, and on the wine dataset, MOEO-Smp also has higher classification accuracy. In conclusion, MOEO-Smp still has a gap in the accuracy of single-label feature selection compared with wrapper algorithms, but it has advantages in dimensionality reduction ability and computational speed.

**Table 6. Comparison with other literature single-label datasets experimental results.**
		MOEO-Smp			BSSA_S3_CP⁴⁰			TFSFB⁴¹			MO-CSACFS⁴²
		Acc	Nf	Time	Acc	Nf	Time	Acc	Nf	Time	Acc	Nf	Time
pima	Ave	0.71145	4.03009	1.95560	—	—	—	—	3.6	—	—	5.55	—
	Std	0.01390	0.18821	0.29014	—	—	—	—	—	—	—	—	—
sonar	Ave	0.74633	29.02443	5.38975	0.9372	33.3667	—	—	—	—	—	28	—
	Std	0.02560	0.93966	0.10646	0.0097	2.8585	—	—	—	—	—	—	—
wine	Ave	0.74633	6.35864	3.21216	0.9933	6.3333	—	—	—	—	0.7398	10.5	—
	Std	0.04417	0.33306	0.20952	0.0056	0.9589	—	—	—	—	—	—	—

4.3. Multi-label feature selection and experimental results

In the multi-label feature selection test, we select two Mulan multi-label datasets and one real-world COVID-19 multi-label data. And compared with 10 other state-of-the-art algorithms to evaluate the feature selection results from the hamming loss (HL), ranking loss (RL), one-error rate (OE), coverage rate (CV) average precision (AP), the number of features (NF) and the running time (time) these seven aspects. It is worth noting that the COVID-19 dataset collects clinical data from the early stages of the infectious disease and its aims are to diagnose the disease faster and allocate medical resources according to the condition with the help of computer technology. The dataset is based on the blood sample results and personal information of 5644 patients to determine whether a patient is positive and whether need to admit a positive patient or a patient requiring admitted treatment, with four levels of care based on the patient’s condition. The problem has practical significance for reducing the pressure on health care resources in times of mass epidemic infections.

Table 7 shows the feature selection results of the 11 algorithms on the multi-label dataset Emotion. From Table 7, we can see that the feature subset selected by MOEO-Smp has the best average performance on HL, RL, OE, CV and AP. In terms of Nf metrics, the average number of features selected for the feature subset of NSGA-II is about 28.6, which is the best performance among the 11 algorithms, followed by MOEA-D with the average number of features selected for the feature subset of about 29.2, and the average number of features selected for the feature subset of MOEO-Smp is about 37.0, which is second only to NSGA-II and MOEA-D. Therefore, the proposed MOEO-Smp is equally competitive in terms of dimensionality reduction capability. In terms of average running time, MSSA has the shortest average running time among all algorithms, only about 0.43 s. However, MSSA performs worse than MOEO-Smp in all other metrics, while the running time of MOEO-Smp is not much different from MOPSO, MOEA-D, MOBO-I, MOALO and MOEA-NSGA-II, and it can overall obtain better feature selection results.

**Table 7. Results of 7 evaluation metrics for 11 algorithms on the emotion dataset.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
HL	Ave	0.27712	0.26937	0.27361	0.26924	0.26642	0.26855	0.27103	0.27095	0.26900	0.27084	0.26611
	Std	0.00587	0.00670	0.00601	0.01064	0.00594	0.00600	0.00802	0.00614	0.00757	0.00544	0.00456
RL	Ave	0.28743	0.25877	0.27505	0.26116	0.26053	0.26254	0.26338	0.26360	0.25971	0.27225	0.25735
	Std	0.01126	0.01249	0.01107	0.01885	0.00964	0.01447	0.01028	0.01086	0.01330	0.01196	0.01172
OE	Ave	0.42868	0.39047	0.40578	0.40086	0.39375	0.39481	0.39676	0.39704	0.39300	0.39939	0.38929
	Std	0.01315	0.01960	0.01562	0.02772	0.01622	0.01956	0.01476	0.01826	0.01537	0.01564	0.01746
CV	Ave	2.40813	2.28587	2.35456	2.27997	2.27338	2.28860	2.30036	2.30686	2.27557	2.34021	2.26087
	Std	0.06894	0.07309	0.05884	0.09691	0.04927	0.07215	0.07843	0.05807	0.08691	0.07356	0.06188
AP	Ave	0.68629	0.71117	0.69857	0.70705	0.70878	0.70808	0.70720	0.70542	0.71054	0.70190	0.71288
	Std	0.00941	0.01232	0.00940	0.01790	0.00911	0.01320	0.00887	0.01104	0.01071	0.01151	0.01088
Nf	Ave	28.61333	46.36167	46.84833	29.24833	45.89	46.68333	41.88333	47.20333	46.71167	47.95167	36.98029
	Std	0.65547	0.76321	0.79011	0.81459	0.85499	1.59566	1.39948	0.81685	0.67852	1.07423	0.91615
Time (/s)	Ave	0.47163	7.36018	74.11970	10.00212	8.70569	9.35303	3.40671	0.43118	3.34001	9.03118	10.60926
	Std	0.03272	0.51670	5.86598	0.65450	0.15438	0.11256	0.26351	0.01515	1.94448	3.27611	0.36725

Table 8 shows the Friedman test of the seven metrics of the 11 algorithms on the emotion dataset. the rank of MOEO-Smp on HL, RL, OE, CV and AP are all 1 and the p-value is less than 0.05, which indicates that MOEO-Smp significantly outperforms the other algorithms. On the Nf metric, the rank of MOEO-Smp is 3, which indicates that its number of selected features is competitive among the 11 algorithms. On the Time metric, although MOEO-Smp has a low rank, we can see in the actual situation in Table 7 that the actual running time differs from that of five of the algorithms by only about 1s. In addition, the sum rank of MOEO-Smp is the best among the 11 algorithms.

**Table 8. Friedman test of 7 evaluation metrics for 11 algorithms on the emotion dataset.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
HL	MR	8.65	5.50	7.80	6.00	4.45	5.35	6.45	6.05	5.35	6.25	4.15
	Rank	(11)	(5)	(10)	(6)	(2)	(4)	(9)	(7)	(4)	(8)	(1)
	p-Value	4.445E-04
RL	MR	10.05	4.40	8.75	5.10	5.20	5.20	5.30	5.60	4.70	7.45	4.25
	Rank	(11)	(2)	(10)	(4)	(6)	(6)	(7)	(8)	(3)	(9)	(1)
	p-Value	3.013E-10
OE	MR	10.1	4.93	6.85	6.50	5.05	5.18	5.90	5.80	4.90	6.20	4.60
	Rank	(11)	(3)	(10)	(9)	(4)	(5)	(7)	(6)	(2)	(8)	(1)
	p-Value	4.489E-06
CV	MR	9.65	5.15	8.65	4.70	4.95	5.50	5.55	6.20	4.72	6.75	4.18
	Rank	(11)	(5)	(10)	(2)	(4)	(7)	(6)	(8)	(3)	(9)	(1)
	p-Value	4.223E-08
AP	MR	2.05	7.55	3.85	6.15	6.70	6.35	6.30	6.40	7.60	5.00	8.05
	Rank	(11)	(3)	(10)	(8)	(4)	(6)	(7)	(5)	(2)	(9)	(1)
	p-Value	1.414E-08
Nf	MR	1.30	7.00	8.25	1.70	6.35	7.60	4.05	9.05	7.90	9.78	3.00
	Rank	(1)	(6)	(9)	(2)	(5)	(7)	(4)	(10)	(8)	(11)	(3)
	p-Value	8.662E-30
Time	MR	1.95	5.15	11.00	8.45	6.20	7.25	3.50	1.05	3.65	8.25	9.55
	Rank	(2)	(5)	(11)	(9)	(6)	(7)	(3)	(1)	(4)	(8)	(10)
	p-Value	4.029E-35
Sum rank		9	3	11	5	4	6	7	8	2	10	1

The results in Tables 7 and 8 illustrate that MOEO-Smp is able to obtain significantly better performance in most of the metrics. Although MOEO-Smp does not have a significant advantage over some algorithms in terms of runtime, MOEO-Smp is able to achieve better performance in other metrics within an acceptable time frame.

Table 9 shows the feature selection results of the 11 algorithms on the multi-label dataset Birds. The feature subset selected by MOEO-Smp has the best average performance on RL, CV and AP. For the HL metric, MOALO obtained the best performance with a value of 0.05234, while MOEO-Smp obtained the next best performance with an HL value of 0.05235, which differed from MOALO by only 0.00001. On the Nf metric, NSGA-II and MOHHO had values of about 103.0 and 109.4, respectively, and MOEO-Smp had a value of about 111.7, which differed from the former by 8.7 and 1.3, respectively, and obtained the third best performance among the 11 algorithms. In terms of Time metrics, MSSA and NSGA-II have faster running speeds of about 1s and 1.6s. The average running time of MOEO-Smp is about 4.9s, which differs from the former by about 3.9 s and 3.3s, respectively, but is in the middle of the range compared to all other algorithms.

**Table 9. Results of 7 evaluation metrics for 11 algorithms on the birds dataset.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
HL	Ave	0.05397	0.05326	0.05359	0.05443	0.05366	0.05234	0.05591	0.05352	0.05326	0.05379	0.05235
	Std	0.00310	0.00192	0.00157	0.00355	0.00305	0.00116	0.00286	0.00115	0.00224	0.00335	0.00154
RL	Ave	0.16168	0.16503	0.16998	0.16827	0.16359	0.16845	0.16828	0.16391	0.16583	0.16658	0.15933
	Std	0.01082	0.00647	0.00802	0.01325	0.00854	0.00732	0.00946	0.00694	0.01012	0.00952	0.00719
OE	Ave	0.84305	0.85577	0.84594	0.84702	0.84937	0.86099	0.84923	0.85285	0.85307	0.85514	0.84448
	Std	0.01386	0.00953	0.01199	0.01473	0.01649	0.00839	0.02008	0.00926	0.01225	0.01761	0.01072
CV	Ave	3.65860	3.70698	3.77756	3.78817	3.67613	3.75512	3.85136	3.72288	3.68905	3.73304	3.55767
	Std	0.28028	0.18134	0.14800	0.31865	0.22792	0.14450	0.23137	0.16586	0.21807	0.26541	0.16483
AP	Ave	0.21757	0.20823	0.21696	0.21502	0.21434	0.20476	0.21418	0.20850	0.20933	0.20951	0.21839
	Std	0.01330	0.00764	0.00976	0.01388	0.01251	0.00594	0.01332	0.00730	0.01167	0.01354	0.00914
Nf	Ave	103.0100	147.97697	140.90996	140.38333	145.72774	133.48565	109.36944	145.00056	147.39712	167.36167	111.67458
	Std	1.54659	6.50814	9.92121	11.91556	6.72845	10.32736	5.30168	7.77102	7.09284	2.38161	7.53877
Time(/s)	Ave	1.60461	2.64803	119.95474	10.73369	9.41563	39.96614	3.43487	1.04010	1.97633	4.39404	4.87430
	Std	0.07261	0.41798	9.60202	2.30048	0.25635	0.26476	1.54822	0.02742	0.20787	0.17188	6.17833

Table 10 shows the Friedman test of the birds dataset for seven metrics for 11 algorithms. MOEO-Smp obtained the best ranking on both RL and CV with p-values less than 0.05, which indicates that MOEO-Smp significantly outperformed the other algorithms. However, in HL, OE and AP, MOEO-Smp obtained the second best ranking after MOALO, NSGA-II and MOGWO, respectively. It is worth noting that, although in Table 9, MOEO-Smp has a slightly better AP value scale mean than MOGWO, the larger variance of MOGWO leads to a larger overall fluctuation of some AP and a better ranking of some results, resulting in a slightly higher overall ranking than MOEO-Smp. In addition, the Nf ranking of MOEO-Smp is second only to NSGA-II and MOHHO, and among all algorithms MOEO-Smp has a better dimensionality reduction ability. The Time rank of MOEO-Smp is the fifth, which performs better among 11 algorithms. The sum rank of MOEO-Smp ranks first among 11 algorithms, and its overall performance in seven metrics is better than well-known algorithms such as NSGA-II, MOPSO and MOMVO.

The results in Tables 9 and 10 illustrate that MOEO-Smp was able to obtain significantly better results on three of the seven evaluation metrics than the other 11 algorithms. Although again MOEO-Smp failed to obtain the best performance on the other four metrics, it also obtained the second best or third best performance on three of them.

Table 11 shows the feature selection results of the 11 algorithms on the multi-label dataset COVID-19. The feature subset selected by MOEO-Smp has the best average performance on HL, OE, CV and Nf. In the RL and AP metrics, MOALO obtained the best performance with values of 0.01756 and 0.98138, respectively, but MOEO-Smp was second only to MOALO with a value of 0.98131, with a difference of only 0.00029 and 0.00007, respectively. The average running time of MOEO-Smp is about 1.97s, which is second only to MSSA, MOMVO and NSGA-II, and it performs better among all algorithms

**Table 10. Friedman test of 7 evaluation metrics for 11 algorithms on the birds dataset.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
HL	MR	6.55	5.75	6.50	6.65	6.25	3.90	8.25	5.75	5.65	6.63	4.13
	Rank	(8)	(5)	(7)	(10)	(6)	(1)	(11)	(5)	(3)	(9)	(2)
	p-Value	2.939E-03
RL	MR	4.80	5.30	7.85	6.75	5.70	6.75	6.90	5.75	5.90	6.45	3.85
	Rank	(2)	(3)	(10)	(8)	(4)	(8)	(9)	(5)	(6)	(7)	(1)
	p-Value	1.356E-02
OE	MR	4.33	7.70	4.68	5.10	6.03	8.50	5.45	6.40	6.50	6.80	4.53
	Rank	(1)	(10)	(3)	(4)	(6)	(11)	(5)	(7)	(8)	(9)	(2)
	p-Value	2.909E-04
CV	MR	5.00	5.90	7.05	6.55	5.30	6.75	8.15	6.30	5.25	6.10	3.65
	Rank	(2)	(5)	(10)	(8)	(4)	(9)	(11)	(7)	(3)	(6)	(1)
	p-Value	3.802E-03
AP	MR	7.45	4.40	7.75	6.80	6.15	3.75	6.70	4.80	5.40	5.25	7.55
	Rank	(3)	(10)	(1)	(4)	(6)	(11)	(5)	(9)	(7)	(8)	(2)
	p-Value	1.840E-04
Nf	MR	1.20	8.25	6.65	6.25	7.45	4.75	2.40	7.55	8.10	10.9	2.50
	Rank	(1)	(10)	(6)	(5)	(7)	(4)	(2)	(8)	(9)	(11)	(3)
	p-Value	2.533E-29
Time	MR	2.00	4.70	11.00	8.55	8.05	10.00	5.95	1.00	3.10	6.80	4.85
	Rank	(2)	(4)	(11)	(9)	(8)	(10)	(6)	(1)	(3)	(7)	(5)
	p-Value	1.545E-35
Sum rank		2	6	8	8	4	10	9	5	3	11	1

Table 12 shows the Friedman test of the seven metrics of the 11 algorithms on the COVID-19 dataset. MOEO-Smp obtained the best ranking on HL, CV and Nf. It is noteworthy that the p-value is greater than 0.05 for the HL metric, which indicates that the difference between the 11 algorithms on the HL metric is not significant, and the reason may be that there is a sample imbalance in this data resulting in a small gap between HL. The p-values of the other indicators were less than 0.05, which indicated that the experimental results of the other indicators were significantly different. On the RL and AP indicators, although MOALO received the best ranking, MOEO-Smp received the second best ranking after MOALO. On the OE metric, MOALO and NSGA-II received the highest ranking, and MOEO-Smp received the third best ranking after them. In the Time metric, MSSA has the best rank, and MOEO-Smp has a rank of 5, which is in the middle of the 11 algorithms. And in sum rank, MOEO-Smp ranks first among the 11 algorithms, indicating that the overall performance of MOEO-Smp is better than all other compared algorithms.

The results in Tables 11 and 12 illustrate that MOEO-Smp obtained the best results in four of the seven evaluation metrics and had significant differences compared to the results of the other algorithms. Also, the rank and sum rank indicate that MOEO-Smp is highly competitive among all the compared algorithms.

In summary, the results of multi-label feature selection experiments with 10 comparison algorithms show that MOEO-Smp is able to obtain a subset of features with higher classification accuracy within an acceptable time frame compared to well-known algorithms such as NSGA-II, MOPSO and MOGWO. On some metrics that did not obtain the best performance and ranking, MOEO-Smp was able to obtain the second best or third best results in most cases and ranked proudly overall. Also, MOEO-Smp is one of the best algorithms in terms of dimensionality reduction among all the compared algorithms. Therefore, among the 11 algorithms, MOEO-Smp is the best overall performer when performing multi-label feature selection.

**Table 11. Results of 7 evaluation metrics for 11 algorithms on the COVID-19 dataset.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC³⁷	NSGWO³⁸	MOEO-Smp
HL	Ave	0.03165	0.03195	0.03172	0.03176	0.03142	0.03143	0.03177	0.03165	0.03187	0.03139	0.03101
	Std	0.00157	0.00068	0.00039	0.00157	0.00091	0.00125	0.00116	0.00167	0.00110	0.00151	0.00082
RL	Ave	0.01789	0.01877	0.01918	0.01794	0.01778	0.01756	0.01800	0.01852	0.01809	0.01833	0.01774
	Std	0.00184	0.00098	0.00078	0.00201	0.00128	0.00098	0.00116	0.00183	0.00113	0.00201	0.00057
OE	Ave	0.02939	0.03116	0.03047	0.02951	0.02937	0.02924	0.02989	0.03036	0.02959	0.03013	0.02923
	Std	0.00259	0.00155	0.00085	0.00293	0.00199	0.00175	0.00149	0.00302	0.00164	0.00254	0.00078
CV	Ave	0.16979	0.17145	0.17339	0.17067	0.16757	0.16800	0.16987	0.17072	0.17149	0.16847	0.16634
	Std	0.00912	0.00409	0.00360	0.01012	0.00631	0.00554	0.00572	0.00909	0.00647	0.00943	0.00447
AP	Ave	0.98120	0.98007	0.98012	0.98104	0.98126	0.98138	0.98099	0.98048	0.98100	0.98066	0.98131
	Std	0.00169	0.00095	0.00052	0.00189	0.00123	0.00099	0.00111	0.00181	0.00104	0.00177	0.00057
Nf	Ave	39.45500	51.86138	54.04788	43.05333	52.13429	48.55477	40.69449	50.51974	50.75303	64.98333	35.00837
	Std	0.77053	1.95281	5.67209	1.37621	1.95826	2.35458	2.71899	2.05493	2.06256	1.00754	4.32374
Time(/s)	Ave	0.76755	2.92823	66.96535	8.87605	6.18422	25.44448	2.05403	0.52357	0.74664	2.81614	1.97491
	Std	0.044	0.24419	5.06880	0.53101	0.14841	26.72942	2.01739	0.51793	0.02854	0.10381	0.22623

**Table 12. Friedman test of 7 evaluation metrics for 11 algorithms on the COVID-19 dataset.**
		NSGA-II²⁹	MOPSO³⁰	MOEA-D³¹	MOALO³²	MOHHO³³	MOTEO³⁴	MSSA³⁵	MOEA-NSGA-II³⁶	MOABC37	NSGWO³⁸	MOEO-Smp
HL	MR	6.15	7.35	6.35	6.65	5.45	5.55	6.30	5.80	6.35	5.55	4.50
	Rank	(6)	(11)	(9)	(10)	(2)	(4)	(7)	(5)	(9)	(4)	(1)
	p-Value	4.247E-1
RL	MR	5.00	7.75	8.75	5.55	5.35	4.35	5.85	6.50	5.95	5.95	5.00
	Rank	(3)	(10)	(11)	(5)	(4)	(1)	(6)	(9)	(8)	(8)	(3)
	p-Value	1.016E-03
OE	MR	5.00	8.25	7.55	5.65	5.65	4.95	6.15	6.15	5.50	6.13	5.10
	Rank	(2)	(11)	(10)	(6)	(6)	(1)	(9)	(9)	(4)	(7)	(3)
	p-Value	2.707E-02
CV	MR	5.90	7.10	7.90	6.15	4.75	5.20	6.25	6.35	6.50	5.25	4.65
	Rank	(5)	(10)	(11)	(6)	(2)	(3)	(7)	(8)	(9)	(4)	(1)
	p-Value	5.574E-02
AP	MR	6.90	3.80	3.75	6.40	6.55	7.50	6.10	5.50	6.55	5.75	7.20
	Rank	(3)	(10)	(11)	(6)	(5)	(1)	(7)	(9)	(5)	(8)	(2)
	p-Value	1.699E-03
Nf	MR	2.25	7.95	8.55	3.80	8.35	5.90	2.80	7.00	7.20	10.95	1.25
	Rank	(2)	(8)	(10)	(4)	(9)	(5)	(3)	(6)	(7)	(11)	(1)
	p-Value	1.116E-31
Time	MR	2.50	6.55	10.9	8.95	7.95	10.10	4.20	1.40	2.30	6.35	4.80
	Rank	(3)	(7)	(11)	(9)	(8)	(10)	(4)	(1)	(2)	(6)	(5)
	p-Value	7.969E-36
Sum rank		2	10	11	7	4	3	5	8	6	9	1

Table 13 shows the experimental results from literature^{43,44,45,46,47} using Emotion, birds and COVID-19 datasets to test the effectiveness of the proposed algorithm. We can see that on the Emotion dataset, the Hamming loss of MOEO-Smp outperforms RFSFS and RWFS, but is worse than MOMFS. The number of selected features is similar to that of MOMFS, which indicates that both of them are more effective in dimensionality reduction, but MOEO-Smp significantly outperforms MOMFS in terms of correctness and running time. On the Birds dataset, the HL metrics of the proposed method are similar to those of RWFS and MFSJMI, but MOEO-Smp runs about 25 times faster than MFSJMI. On the COVID-19 dataset, although the literature⁴⁷ uses data from the same source as this paper, the methods are not similar in the processing of the data, and its data size is smaller than that of the COVID-19 dataset in this paper, and then in this case, the MOEO-Smp is much higher than the BMPA-TVSinV, which may be caused by the difference in the number of unbalanced samples in the data. All in all, MOEO-Smp is not inferior to other methods proposed in the literature in terms of feature selection ability, and also has an advantage in terms of runtime.

**Table 13. Comparison with other literature multi-label datasets experimental results.**
		Emotion				Birds			COVID-19
		MOEO-Smp	RFSFS⁴³	MOMFS⁴⁴	RWFS⁴⁵	MOEO-Smp	RWFS⁴⁵	MFSJMI⁴⁶	MOEO-Smp	BMPA-TVSinV⁴⁷
HL	Ave	0.26611	0.2783	0.19	0.2677	0.05235	0.05	0.0506	0.03101	—
	Std	0.00456	0.0146	—	0.0175	0.00154	0.0017	0.0013	0.00082	—
AP	Ave	0.71288	—	0.31	—	0.21839	—	—	0.98131	0.8677
	Std	0.01088	—	—	—	0.00914	—	—	0.00057	—
Nf	Ave	36.98029	—	36	—	111.67458	—	—	35.00837	—
	Std	0.91615	—	—	—	7.53877	—	—	4.32374	—
Time	Ave	10.60926	—	15.81	—	4.87430	—	133.59	1.97491	—
	Std	0.36725	—	—	—	6.17833	—	—	0.22623	—

5. Conclusion

In this study, a filter-based improved multi-objective equilibrium optimizer (MOEO-Smp) is proposed to solve the feature selection problem for both single-label and multi-label datasets. The scoring mechanism and optimization principles are introduced so that solutions with high optimization potential and better performance are filtered out from the population and key features in the better performing solutions can be selected after updating. The optimal equilibrium pool, candidate solution equilibrium pool and candidate feature equilibrium pool are created so that the information of the most promising solutions and features are recorded to guide the update of other solutions. Also, the learning probability dynamically adjusts the convergence and search ability of the algorithm during the iterative process of MOEO-Smp, which not only enables the proposed algorithm to search the target space extensively at the beginning of the iteration to prevent falling into local optimum, but also accelerates the speed and accuracy of convergence at the later stage. Seven single-label datasets from UCI are used to test the single-label feature selection ability of MOEO-Smp, and two multi-label datasets from Mulan and one COVID-19 multi-label dataset from real-world problems are used to test the feature selection ability of MOEO-Smp for both single-label and multi-label data. The results of the single-label feature selection experiments show that MOEO-Smp has significantly better results in terms of accuracy of classification, number of selected features and running time compared to the other 10 state-of-the-art algorithms. The multi-label feature selection experiments evaluate the feature selection results in terms of seven metrics, and the results show that MOEO-Smp has the best overall competitiveness compared to the other 10 state-of-the-art algorithms and has the best performance on multiple metrics in each multi-label dataset. In summary, MOEO-Smp not only solves the single-label feature selection task excellently, but also achieves better performance for multi-label data with larger sample size. Because of the need to score the optimization process as well as the optimization results, there is a slight increase in the computing time of MOEO-Smp in practical applications, so how to reduce the amount of scoring calculations and reduce the scoring steps to complete the task of feature selection for higher dimensional data is the next step of this paper.

Acknowledgment

This study was funded by the National Natural Science Foundation of China (No. 72104069), the Science and Technology Department of Henan Province, China (No. 222102210065), and the Postgraduate Meritocracy Scheme, China (No. SYL19060145).

ORCID

Wendong Wang https://orcid.org/0009-0009-8929-4941

Yu Li https://orcid.org/0000-0001-9748-6024

Jingsen Liu https://orcid.org/0000-0002-2828-4223

Huan Zhou https://orcid.org/0000-0002-2785-8041

Remember to check out the Most Cited Articles!
Check out these titles in artificial intelligence!

References

1. M. B. Dowlatshahi, D. Vali and N. Hossein , A novel three-stage filter-wrapper framework for miRNA subset selection in cancer classification, Informatics 5(1) (2018) 2–19. https://doi.org/10.3390/informatics5010013 Crossref, Google Scholar
2. L. Ming and X. Hu , Feature selection in supervised saliency prediction, IEEE Trans. Cybern. 45(5) (2014) 914–926. https://doi.org/10.1109/tcyb.2014.2338893 Crossref, Google Scholar
3. R. R. Rajalaxmi, S. Mirjalili, E. Gothai and P. Natesan , Binary grey wolf optimizer with mutation and adaptive K-nearest neighbour for feature selection in Parkinson’s disease diagnosis, Knowl.-Based Syst. 246 (2022) 108701. https://doi.org/10.1016/j.knosys.2022.108701 Crossref, Google Scholar
4. R. E. Schapire and Y. Singer , BoosTexter: A boosting-based system for text categorization, Mach. Learn. 39 (2000) 135–168. https://doi.org/10.1023/A:1007649029923 Crossref, Google Scholar
5. Z. Barutcuoglu, R. E. Schapire and O. G. Troyanskaya , Hierarchical multi-label prediction of gene function, Bioinformatics 22(7) (2006) 830–836. https://doi.org/10.1093/bioinformatics/btk048 Crossref, Google Scholar
6. D. Manoranjan and H. Liu , Feature selection for classification, Intell. Data Anal. 1(1–4) (1997) 131–156. https://doi.org/10.1016/s1088-467x(97)00008-5 Crossref, Google Scholar
7. A. D. Li, B. Xue and M. Zhang , Multi-objective particle swarm optimization for key quality feature selection in complex manufacturing processes, Inf. Sci. 641 (2023) 119062. https://doi.org/10.1016/j.ins.2023.119062 Crossref, Google Scholar
8. K. Hussain, N. Neggaz, W. Zhu and E. H. Houssein , An efficient hybrid sine-cosine Harris hawks optimization for low and high-dimensional feature selection, Expert Syst. Appl. 176 (2021) 114778. https://doi.org/10.1016/j.eswa.2021.114778 Crossref, Google Scholar
9. X. F. Song, Y. Zhang, D. Gong and X. Z. Gao , A fast hybrid feature selection based on correlation-guided clustering and particle swarm optimization for high-dimensional data, IEEE Trans. Cybern. 52(9) (2021) 9573–9586. https://doi.org/https://doi.org/10.1109/TCYB.2021. 3061152 Crossref, Google Scholar
10. Q. Wu, Z. Ma, J. Fan, G. Xu and Y. Shen , A feature selection method based on hybrid improved binary quantum particle swarm optimization, IEEE Access 7 (2019) 80588–80601. https://doi.org/https://doi.org/10.1109/ACCESS.2019.2919956 Crossref, Google Scholar
11. M. R. Boutell, J. Luo, X. Shen and C. Brown , Learning multi-label scene classification, Pattern Recognit. 37(9) (2004) 1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009 Crossref, Google Scholar
12. M. L. Zhang and Z. H. Zhou , ML-KNN: A lazy learning approach to multi-label learning, Pattern Recognit. 40(7) (2007) 2038–2048. https://doi.org/10.1016/j.patcog.2006.12.019 Crossref, Google Scholar
13. M. Rostami, S. Forouzandeh, K. Berahmand and M. Soltani , Integration of multi-objective PSO based feature selection and node centrality for medical datasets, Genomics 112(6) (2020) 4370–4384. https://doi.org/10.1016/j.ygeno.2020.07.027 Crossref, Google Scholar
14. H. Dong, J. Sun, X. Sun and R. Ding , A many-objective feature selection for multi-label classification, Knowl.-Based Syst. 208 (2020) 106456. https://doi.org/10.1016/j.knosys.2020.106456 Crossref, Google Scholar
15. M. Paniri, M. B. Dowlatshahi and H. Nezamabadi-Pour , MLACO: A multi-label feature selection algorithm based on ant colony optimization, Knowl.-Based Syst. 192 (2020) 105285. https://doi.org/10.1016/j.knosys.2019.105285 Crossref, Google Scholar
16. C. Zhong, G. Li, Z. Meng and W. He , Opposition-based learning equilibrium optimizer with Levy flight and evolutionary population dynamics for high-dimensional global optimization problems, Expert Syst. Appl. 215 (2023) 119303. https://doi.org/10.1016/j.eswa.2022.119303 Crossref, Google Scholar
17. R. M. Rizk-Allah and E. H. Aboul , A hybrid equilibrium algorithm and pattern search technique for wind farm layout optimization problem, ISA Trans. 132 (2023) 402–418. https://doi.org/10.1016/j.isatra.2022.06.014 Crossref, Google Scholar
18. A. El-Ela, R. A. El-Sehiemy, A. M. Shaheen, W. A. Wahbi and M. T. Mouwafi , A multi-objective equilibrium optimization for optimal allocation of batteries in distribution systems with lifetime maximization, J. Energy Storage 55 (2022) 105795. https://doi.org/10.1016/j.est.2022.105795 Crossref, Google Scholar
19. Z. A. Varzaneh, S. Hossein, S. E. Mood and M. M. Javidi , A new hybrid feature selection based on Improved Equilibrium Optimization, Chemometr. Intell. Lab. Syst. 228 (2022) 104618. https://doi.org/10.1016/j.chemolab.2022.104618 Crossref, Google Scholar
20. M. Vommi and K. B. Tirumala , A binary Bi-phase mutation-based hybrid Equilibrium Optimizer for feature selection in medical datasets classification, Comput. Electr. Eng. 105 (2023) 108553. https://doi.org/10.1016/j.compeleceng.2022.108553 Crossref, Google Scholar
21. A. Faramarzi, M. Heidarinejad, B. Stephens and S. Mirjalili , Equilibrium optimizer: A novel optimization algorithm. Knowl.-Based Syst. 191 (2020) 105190. https://doi.org/10.1016/j.knosys.2019.105190 Crossref, Google Scholar
22. X. Wu, J. Han, D. Wang, P. Gao, Q. Cui, L. Chen, Y. Liang, H. Huang, H. P. Lee, C. Miao, Y. Zhou and C. Wu , Incorporating Surprisingly Popular Algorithm and Euclidean distance-based adaptive topology into PSO, Swarm Evol. Comput. 76 (2023) 101222. https://doi.org/10.1016/j.swevo.2022.101222 Crossref, Google Scholar
23. C. Blake, UCI repository of machine learning databases (1998), http://www.ics.uci.edu/∼mlearn/MLRepository.html. Google Scholar
24. G. Tsoumakas, I. Katakis and I. Vlahavas , Mining multi-label data, in Data Mining and Knowledge Discovery Handbook, eds. O. Maimon and L. Rokach (Springer, Boston, MA, 2009), pp. 667–685. https://doi.org/10.1007/978-0-387-09823-4_34 Crossref, Google Scholar
25. Mulan: A Java Library for Multi-Label Learning, https://mulan.sourceforge.net/datasets-mlc.html. Google Scholar
26. Diagnosis of COVID-19 and its clinical spectrum (2020), https://www.kaggle.com/einsteindata4u/covid19. Google Scholar
27. Blood work based COVID19 screening LGBM (2022), https://www.kaggle.com/code/andrewmvd/bloodwork-based-covid19-screening-lgbm/data. Google Scholar
28. J. Lee and D. W. Kim , Mutual information-based multi-label feature selection using interaction information, Expert Syst. Appl. 42(4) (2015) 2013–2025. https://doi.org/10.1016/j.eswa.2014.09.063 Crossref, Google Scholar
29. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan , A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6(2) (2002) 182–197. https://doi.org/10.1109/4235.996017 Crossref, Google Scholar
30. C. Coello and M. S. Lechuga , MOPSO: A proposal for multiple objective particle swarm optimization, in Proc. 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No.02TH8600), Honolulu, HI, USA, Vol. 2 (2002), pp. 1051–1056. https://doi.org/https://doi.org/10.1109/CEC.2002.1004388 Crossref, Google Scholar
31. Q. Zhang and H. Li , MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11(6) (2007) 712–731. https://doi.org/10.1109/tevc.2007.892759 Crossref, Google Scholar
32. S. Mirjalili, P. Jangir and S. Saremi , Multi-objective ant lion optimizer: A multi-objective optimization algorithm for solving engineering problems, Appl. Intell. 46 (2017) 79–95. https://doi.org/10.1007/s10489-016-0825-8 Crossref, Google Scholar
33. A. A. Heidari, S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja and H. Chen , Harris hawks optimization: Algorithm and applications, Future Gener. Comput. Syst. 97 (2019) 849–872. https://doi.org/10.1016/j.future.2019.02.028 Crossref, Google Scholar
34. N. Khodadadi, T. Siamak and D. E. Armin , MOTEO: A novel multi-objective thermal exchange optimization algorithm for engineering problems, Soft Comput. 26(14) (2022) 6659–6684. https://doi.org/10.1007/s00500-022-07050-7 Crossref, Google Scholar
35. N. Khodadadi, S. Talatahari and A. D. Eslamlou , Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems, Adv. Eng. Softw. 114 (2017) 163–191. https://doi.org/10.1016/j.advengsoft.2017.07.002 Crossref, Google Scholar
36. A. Seshadri, Multi-objective optimization using evolutionary algorithms (MOEA) (2006). Matlab Website: http://www.Mathworks.com/matlabcentral/fileexchange/10429. Google Scholar
37. Y. Zhang, S. Cheng, Y. Shi, D. Gong and X. Zhao , Cost-sensitive feature selection using two-archive multi-objective artificial bee colony algorithm, Expert Syst. Appl. 137 (2019) 46–58. https://doi.org/10.1016/j.eswa.2019.06.044 Crossref, Google Scholar
38. P. Jangir and J. Narottam , A new non-dominated sorting grey wolf optimizer (NS-GWO) algorithm: Development and application to solve engineering designs and economic constrained emission dispatch problem with integration of wind power, Eng. Appl. Artif. Intell. 72 (2018) 449–467. https://doi.org/10.1016/j.engappai.2018.04.018 Crossref, Google Scholar
39. G. Guo, H. Wang, D. Bell, Y. Bi and K. Greer , KNN model-based approach in classification, in On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003, Lecture Notes in Computer Science, Vol. 2888 (Springer, Berlin, 2003), pp. 986–996. https://doi.org/10.1007/978-3-540-39964-3_62 Crossref, Google Scholar
40. H. Faris, M. M. Mafarja, A. A. Heidari, I. Aljarah, A. Z. Ala’m, S. Mirjalili and H. Fujita , An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems, Knowl.-Based Syst. 154 (2018) 43–67. https://doi.org/https://doi.org/10.1016/j.knosys.2018.05.009 Crossref, Google Scholar
41. L. Sun, S. Si, W. Ding, X. Wang and J. Xu , TFSFB: Two-stage feature selection via fusing fuzzy multi-neighborhood rough set with binary whale optimization for imbalanced data, Inf. Fusion 95 (2023) 91–108. https://doi.org/10.1016/j.inffus.2023.02.016 Crossref, Google Scholar
42. R. Ranjan and J. K. Chhabra , Automatic clustering and feature selection using multi-objective crow search algorithm, Appl. Soft Comput. 142 (2023) 110305. https://doi.org/10.1016/j.asoc.2023.110305 Crossref, Google Scholar
43. Y. Li, L. Hu and W. Gao , Multi-label feature selection via robust flexible sparse regularization, Pattern Recognit. 134 (2023) 109074. https://doi.org/10.1016/j.patcog.2022.109074 Crossref, Google Scholar
44. H. Dong, J. Sun, T. Li, R. Ding and X. Sun , A multi-objective algorithm for multi-label filter feature selection problem, Appl. Intell. 50 (2020) 3748–3774. https://doi.org/10.1007/s10489-020-01785-2 Crossref, Google Scholar
45. L. Hu, L. Gao, Y. Li, P. Zhang and W. Gao , Feature-specific mutual information variation for multi-label feature selection, Inf. Sci. 593 (2022) 449–471. https://doi.org/10.1016/j.ins.2022.02.024 Crossref, Google Scholar
46. P. Zhang, G. Liu and J. Song , MFSJMI: Multi-label feature selection considering join mutual information and interaction weight, Pattern Recognit. 138 (2023) 109378. https://doi.org/10.1016/j.patcog.2023.109378 Crossref, Google Scholar
47. Z. Beheshti , BMPA-TVSinV: A binary marine predators algorithm using time-varying sine and V-shaped transfer functions for wrapper-based feature selection, Knowl.-Based Syst. 252 (2022) 109446. https://doi.org/10.1016/j.knosys.2022.109446 Crossref, Google Scholar