Multi-label learning (MLL) problems abound in many areas, including text categorization, protein function classification, and semantic annotation of multimedia. An issue that severely limits the applicability of many current machine learning approaches to MLL is the large scale of many problems, which strongly impacts the computational complexity of learning. This is especially pronounced for approaches that transform MLL problems into sets of binary classification problems solved with Support Vector Machines (SVMs). On the other hand, the most efficient approaches to MLL, based on decision trees, have clearly lower predictive performance. We propose a hybrid decision tree architecture in which the leaves do not give multi-label predictions directly but instead use local SVM-based classifiers that do. A binary relevance architecture is employed in the leaves, where a binary SVM classifier is built for each label relevant to that particular leaf. We evaluate the proposed method against related and state-of-the-art methods on a broad range of multi-label datasets with a variety of evaluation measures, in terms of both predictive performance and time complexity. On almost every large classification problem, our hybrid architecture outperforms the competing approaches in predictive performance, while its computational efficiency is significantly improved thanks to the integrated decision tree.
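The binary-relevance decomposition used in the leaves can be illustrated with a minimal sketch: one independent binary classifier per label, each trained and queried separately. A simple perceptron stands in here for the per-label SVMs, and the surrounding decision tree is omitted, so this is an illustration of the decomposition only, not the paper's full hybrid architecture.

```python
import numpy as np

class BinaryRelevance:
    """Binary relevance: one independent binary classifier per label.
    A perceptron stands in for the per-label SVMs (sketch only)."""

    def __init__(self, n_labels, n_features, lr=0.1, epochs=20):
        self.w = np.zeros((n_labels, n_features))
        self.b = np.zeros(n_labels)
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, Y):
        # X: (n_samples, n_features); Y: (n_samples, n_labels) 0/1 matrix.
        for _ in range(self.epochs):
            for x, y in zip(X, Y):
                pred = (self.w @ x + self.b > 0).astype(float)
                err = y - pred                 # per-label perceptron update
                self.w += self.lr * np.outer(err, x)
                self.b += self.lr * err
        return self

    def predict(self, X):
        # Each label is thresholded independently of the others.
        return (X @ self.w.T + self.b > 0).astype(int)
```

In the paper's architecture, one such set of per-label classifiers would be trained per leaf, on only the examples and labels relevant to that leaf.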
Multi-label active learning for image classification has been a popular research topic. Although related work has made great progress, several challenges remain. Existing studies on multi-label active learning pay no attention to the cleanness of the sample data. In reality, data are easily polluted by external influences that can disturb the exploration of the data space and negatively affect model training. Previous methods of label-correlation mining, which are based purely on the observed label distribution, are deficient: besides neglecting the influence of noise, they cannot acquire sufficient relevant information, since they ignore the inner mapping from example space to label space, which implicitly models label relationships. To address these issues, we develop a novel multi-label active learning with low-rank application (ENMAL) algorithm in this paper. A low-rank model is constructed to quantify the noise level, and example-label pairs that contain less noise are emphasized during sampling. A low-rank mapping matrix is learned to represent the mapping of the multi-label domain, capturing a more comprehensive and reasonable label correlation. By integrating label correlation with uncertainty and accounting for sample noise, an efficient sampling strategy is developed. We further extend ENMAL with automatic labeling (denoted AL-ENMAL) to reduce the annotation workload of active learning. Empirical results demonstrate the efficacy of our approaches.
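The idea of noise-aware uncertainty sampling can be sketched as follows. This is an illustration only, not the actual ENMAL score: the real strategy also integrates label correlation, and the noise estimates here are assumed to come from the low-rank model. The sketch scores each example-label pair by its predictive entropy, down-weights pairs with a high estimated noise level, and selects the top-k pairs for annotation.

```python
import numpy as np

def select_queries(probs, noise, k=1):
    """Noise-aware uncertainty sampling (illustrative sketch).
    probs: predicted positive-label probabilities per example-label pair.
    noise: estimated noise level in [0, 1] per pair (assumed given,
           e.g. by a low-rank reconstruction residual)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    # Binary entropy: highest when the model is least certain (p = 0.5).
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    # Emphasize pairs that are uncertain AND estimated to be clean.
    score = entropy * (1.0 - np.asarray(noise, dtype=float))
    return np.argsort(score)[::-1][:k]
```

For example, a maximally uncertain pair (p = 0.5) loses priority to a slightly less uncertain one if its noise estimate is high.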
To address the inaccuracy of age estimation datasets and the imbalance of age distributions, this paper proposes an age estimation model based on structured sparse learning. First, facial images are given a multi-label representation by age, and the age estimation model is trained by solving for the model matrix. Then, for a facial image to be tested, its correlation with all age labels is computed using the trained model, and the most correlated age is taken as the predicted age. We set up a series of verification experiments and analyze the structured sparse age estimation model from several perspectives. The proposed algorithm achieves good results on evaluation indexes such as mean absolute error, cumulative index curve, and convergence rate, and a demo system has been designed to put the model into use. The results show that the proposed age estimation model achieves a good estimation effect.
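The prediction step described above (score every age label, return the most correlated one) reduces to an argmax over label scores. The sketch below assumes a hypothetical layout where the learned model matrix has one column per age label; the actual structured sparse training that produces it is not shown.

```python
import numpy as np

def predict_age(x, W, ages):
    """Predict the age most correlated with feature vector x.
    W: learned model matrix, assumed one column per age label.
    ages: the age value each column corresponds to."""
    scores = x @ W                    # correlation with every age label
    return ages[int(np.argmax(scores))]
```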
Reading and interpreting medical images remains the most challenging task in radiology. Thanks to the achievements of deep Convolutional Neural Networks (CNNs) in medical image classification, various clinical applications have been developed to detect lesions in Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans. When diagnosing liver cancer from Dynamic Contrast-Enhanced MRI (DCE-MRI), radiologists consider three phases of contrast injection: before injection, the arterial phase, and the portal phase. Even though the contrast agent helps enhance tumoral tissues, diagnosis can still be very difficult because of possibly low contrast and the pathological tissue surrounding the tumors (cirrhosis). Meanwhile, in the medical field, ontologies have proven effective at solving several clinical problems, such as offering shareable terminologies, vocabularies, and databases. In this article, we propose a multi-label CNN classification approach based on a parallel preprocessing algorithm. This algorithm extends our previous work presented at the International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI) 2020. The aim of our approach is to improve the detection of HCC lesions and, thanks to the use of ontologies, to extract more information about the detected tumor, such as its stage, localization, size, and type. Moreover, integrating such information has improved the detection process. Experiments conducted on real patient cases show that the proposed approach reaches an accuracy of 93% using MRI patches of 64×64 pixels, an improvement over our previous works.
A code smell is an indicator of potential problems in a software design that negatively affect readability and maintainability. Detecting code smells in a timely and effective manner can therefore guide developers during refactoring. Many approaches (metric-based, heuristic-based, machine-learning-based, and deep-learning-based) have been proposed to detect code smells. However, existing methods describe different code smells with a single, one-sided code representation and cannot efficiently extract sufficiently rich information from source code. In addition, one code snippet often exhibits several code smells at the same time, and deep-learning-based multi-label code smell detection is lacking. In this paper, we present a large-scale dataset for the multi-label code smell detection task, since no sufficient public dataset yet exists for it; releasing this dataset should push forward research in the field. Based on this dataset, we propose a hybrid model with multi-level code representation to further improve code smell detection. First, we parse the code into an abstract syntax tree (AST) augmented with control- and data-flow edges, and apply a graph convolutional network to obtain a prediction at the syntactic and semantic level. In parallel, we use a bidirectional long short-term memory network with an attention mechanism to analyze the code at the token level. Finally, we fuse the predictions of the two models. Experimental results show that our proposed model outperforms state-of-the-art methods not only in single code smell detection but also in multi-label code smell detection.
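The final fusion step can be sketched as a simple late fusion over the two branches' per-smell probabilities. The averaging rule below is an assumption for illustration (the abstract does not specify the exact fusion function); each smell label is then thresholded independently, which is what makes the output multi-label.

```python
import numpy as np

def fuse_predictions(p_graph, p_token, threshold=0.5):
    """Late fusion of the two branches' per-smell probabilities.
    p_graph: probabilities from the AST + GCN branch.
    p_token: probabilities from the token-level BiLSTM + attention branch.
    Averaging is an assumed fusion rule, used here for illustration."""
    fused = (np.asarray(p_graph, dtype=float)
             + np.asarray(p_token, dtype=float)) / 2.0
    # Threshold each smell label independently -> multi-label output.
    return (fused >= threshold).astype(int)
```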
Effectively reducing the dimensionality of big data while retaining its key information is a long-standing research challenge. As an important step in data preprocessing, feature selection plays a critical role in reducing data size and increasing the overall value of the data. Many previous studies have focused on single-label feature selection; however, with the increasing variety of data types, the need for feature selection on multi-label data has also arisen. Unlike single-label data, multi-label data, with far more combinations of classes, places higher demands on feature selection algorithms. In this paper, we propose a filter-based Multi-Objective Equilibrium Optimizer algorithm (MOEO-Smp) to solve the feature selection problem for both single-label and multi-label data. MOEO-Smp rates the optimization results of solutions and features based on four pairs of optimization principles, and builds three equilibrium pools to guide exploration and exploitation based on the total scores of solutions and features and the ranking of objective fitness values, respectively. Seven UCI single-label datasets, two Mulan multi-label datasets, and one COVID-19 multi-label dataset are used to test the feature selection capability of MOEO-Smp; the results are compared with 10 other state-of-the-art algorithms and evaluated using three and seven different metrics, respectively. The feature selection experiments and comparisons with results in the literature show that MOEO-Smp not only achieves the highest classification accuracy and excellent dimensionality reduction on single-label data, but also performs better on multi-label data in terms of Hamming loss, accuracy, dimensionality reduction, and so on.
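Of the multi-label metrics mentioned above, Hamming loss is the most common and is easy to state precisely: the fraction of individual label assignments on which the prediction disagrees with the ground truth, averaged over all samples and labels (lower is better).

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Hamming loss for multi-label predictions.
    Y_true, Y_pred: (n_samples, n_labels) 0/1 indicator matrices.
    Returns the fraction of mismatched label assignments."""
    Y_true = np.asarray(Y_true)
    Y_pred = np.asarray(Y_pred)
    return float(np.mean(Y_true != Y_pred))
```

For example, with two samples and two labels, a single wrong label assignment out of four gives a Hamming loss of 0.25.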