The accurate identification of errors in machine-translated English text is both necessary and challenging, with significant implications for natural language processing applications. Current machine translation systems, while increasingly sophisticated, still fall prey to a range of errors that can compromise meaning and fluency. This paper addresses these shortcomings by proposing an improved machine learning method for the adaptive recognition of translation errors. We present a novel framework that integrates the Transformer model, renowned for its efficacy in capturing contextual relationships within text sequences, with a robust attention mechanism that prioritizes salient information during translation. The innovation of this research lies in the incorporation of meta-learning techniques, allowing the model to self-adjust in response to various classes of errors, thereby refining the precision of error recognition. We outline the methodology in detail, emphasizing the systematic steps taken to enhance the model's adaptability. The efficacy of our approach is substantiated through comparative experiments, which demonstrate notable improvements in error identification over existing techniques. The results indicate the potential of our adaptive mechanism to enhance the quality of machine translation, paving the way for more reliable and nuanced language translation tools.
In recent years, the number of machine learning algorithms and their parameters has increased significantly. On the one hand, this increases the chances of finding better models. On the other hand, it increases the complexity of training a model, as the search space expands significantly. As the size of datasets also grows, traditional approaches based on extensive search become prohibitively expensive in terms of computational resources and time, especially in data streaming scenarios. This paper describes an approach based on meta-learning that tackles two main challenges. The first is to predict key performance indicators of machine learning models. The second is to recommend the best algorithm/configuration for training a model for a given machine learning problem. When compared to a state-of-the-art method (AutoML), the proposed approach is up to 130x faster and only 4% worse in terms of average model quality. Hence, it is especially suited to scenarios in which models need to be updated regularly, such as streaming scenarios with big data, in which some accuracy can be traded for a much shorter model training time.
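As a minimal sketch of the general idea behind such a recommender (not the paper's implementation), one can describe each past dataset by a handful of meta-features and look up the most similar previously seen dataset to propose an algorithm/configuration; the meta-features, knowledge base, and names below are illustrative assumptions:

```python
# Hedged sketch of meta-learning-based algorithm recommendation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def meta_features(X, y):
    """Describe a dataset by simple statistics (a common choice)."""
    return np.array([
        X.shape[0],                  # number of instances
        X.shape[1],                  # number of attributes
        len(np.unique(y)),           # number of classes
        np.mean(np.std(X, axis=0)),  # average feature dispersion
    ])

# Knowledge base: meta-features of past datasets and the best
# algorithm/configuration observed on each of them (toy values).
past_meta = np.array([[1000, 20, 2, 0.9], [50000, 100, 10, 1.3]])
best_algo = ["random_forest(n_estimators=200)", "linear_sgd(alpha=1e-4)"]

def recommend(X, y, k=1):
    nn = NearestNeighbors(n_neighbors=k).fit(past_meta)
    _, idx = nn.kneighbors(meta_features(X, y).reshape(1, -1))
    return [best_algo[i] for i in idx[0]]
```

Because the lookup replaces an extensive search over algorithms and configurations, the recommendation cost is essentially that of computing the meta-features, which is where the large speedup over search-based AutoML comes from.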
Machine learning algorithms are commonly used to count people in a crowd quickly and efficiently. Test-time adaptation methods for crowd counting adjust model parameters and employ additional data augmentation to better adapt the model to the specific conditions encountered during testing. The majority of current studies concentrate on unsupervised domain adaptation. These approaches commonly perform hundreds of epochs of training iterations and require a sizable amount of unannotated data from every new target domain, in addition to annotated data from the source domain. Unlike these methods, we propose a meta-test-time adaptive crowd counting approach called CrowdTTA, which integrates the concept of test-time adaptation into the meta-learning framework and makes it easier for the counting model to adapt to unknown test distributions. To provide a reliable supervision signal at the pixel level, we introduce uncertainty by inserting a dropout layer into the counting model. The uncertainty is then used to generate valuable pseudo labels, which serve as effective supervisory signals for adapting the model. In the context of meta-learning, one image can be regarded as one task for crowd counting. Each iteration of our approach is a dual-level optimization process. In the inner update, we employ a self-supervised consistency loss function to optimize the model so as to simulate the parameter update process that occurs during the test phase. In the outer update, we update the parameters based on the image with ground truth, improving the model's performance and making the pseudo labels more accurate in the next iteration. At test time, the input image is used to adapt the model before it is tested. In comparison to various supervised learning and domain adaptation methods, extensive experiments on diverse datasets showcase the general adaptive capability of our approach across datasets with varying crowd densities and scales.
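The dual-level optimization can be pictured as a first-order meta-learning loop of roughly the following shape; this is a hedged PyTorch sketch under assumed loss choices, not the released CrowdTTA code:

```python
# First-order sketch of the inner/outer update described above.
# The loss functions and learning rates are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

def consistency_loss(model, image, augment):
    """Self-supervised loss: predictions should agree across augmentations."""
    return F.mse_loss(model(image), model(augment(image)))

def meta_iteration(model, outer_opt, image, gt_density, augment, inner_lr=1e-4):
    # Inner update on a clone: simulates the test-time adaptation step.
    adapted = copy.deepcopy(model)
    grads = torch.autograd.grad(consistency_loss(adapted, image, augment),
                                adapted.parameters())
    with torch.no_grad():
        for p, g in zip(adapted.parameters(), grads):
            p -= inner_lr * g

    # Outer update (first-order approximation): supervised loss of the
    # adapted model, applied as gradients on the original parameters.
    grads = torch.autograd.grad(F.mse_loss(adapted(image), gt_density),
                                adapted.parameters())
    outer_opt.zero_grad()
    for p, g in zip(model.parameters(), grads):
        p.grad = g
    outer_opt.step()
```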
Learning from only a few samples is a challenging problem, and meta-learning is an effective approach to solving it. A meta-learning model aims to learn by training on a large number of samples from other tasks; when it encounters a target task, the model can quickly adapt and achieve good performance with only a few labeled samples. However, general meta-learning only provides a universal model with a certain generalization ability over all unknown tasks, which limits its effect on specific target tasks. In this paper, we propose a Few-shot Directed Meta-learning (FSDML) model that specializes to and solves the target task by using a few labeled samples of the target task to direct the meta-learning process. FSDML divides the model parameters into shared parameters and target adaptation parameters to store prior knowledge and determine the update direction, respectively. These two groups of parameters are updated in different stages of training. We conduct image classification experiments on miniImageNet and Omniglot, and the results show that FSDML achieves better performance.
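A hedged sketch of what such a parameter split might look like in PyTorch follows; which layers count as shared versus target-adaptation parameters is an assumption made purely for illustration:

```python
# Hedged sketch of the two parameter groups updated in different stages.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 5))
shared_params = list(model[0].parameters())        # prior knowledge
target_adapt_params = list(model[2].parameters())  # directed by target samples

# Stage 1 (meta-training on other tasks) updates only the shared part;
# Stage 2 (directed by the few labeled target samples) updates the rest.
opt_stage1 = torch.optim.Adam(shared_params, lr=1e-3)
opt_stage2 = torch.optim.Adam(target_adapt_params, lr=1e-2)
```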
This paper proposes a few-shot pedestrian re-identification (Re-ID) model, named CS-ResNet50, based on an improved ResNet50 with a compression-and-stimulation module. It combines the meta-learning framework with metric learning. The method first compresses the residual network channels and then stimulates them to achieve feature weighting, ultimately making feature extraction more accurate. The model learns how to complete new tasks efficiently from the experience it has accumulated while training on former subtasks. In each subtask, the dataset is divided into a gallery set and a query set, on which the model parameters are trained. In this way, the model can be trained efficiently and adapted to new tasks rapidly, which helps solve few-shot Re-ID problems. Compared with the baseline, the proposed model efficiently improves two indicators on two Re-ID datasets and achieves better Re-ID performance in the few-shot setting.
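The compress-then-stimulate channel weighting can be sketched as a squeeze-and-excitation style block; the reduction ratio and layer sizes below are assumptions, not the CS-ResNet50 implementation:

```python
# Hedged sketch of compress (global pooling) + stimulate (channel weights).
import torch.nn as nn

class CompressStimulate(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # compress: global average pool
        return x * w[:, :, None, None]         # stimulate: re-weight channels
```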
Meta-learning stands as a prevalent framework in few-shot learning methods. Nonetheless, its efficacy hinges on substantial data availability during meta-training. Recent work adeptly tackled this hurdle by combining prompt tuning with the meta-learning paradigm, consequently attaining unparalleled performance on four benchmarks (FewRel, HuffPost, Reuters and Amazon). Nonetheless, the implementation efficiency of the previous method leaves room for enhancement, which is especially crucial when tuning larger language models. To this end, we introduce an expedited prompt tuning approach nested within the meta-learning framework. The novel approach normalizes the label information and sample information and uses regression to obtain the closed-form solution of each few-shot task, which significantly enhances inference speed, achieving a twofold improvement, while concurrently elevating average accuracy by 1.7∼3.0% on the same benchmarks. Moreover, it demonstrates enhanced stability when faced with limited meta-training data, making it more applicable in the many real scenarios where parallel data is scarce. The source code is available to reproduce the results (http://github.com/Dr-Lv/EMPT).
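For illustration, a closed-form per-task solution of the kind the abstract alludes to can be obtained with ridge regression; the choice of ridge and the regularizer value are assumptions, not necessarily the paper's exact formulation:

```python
# Hedged sketch of a closed-form per-task head via ridge regression.
import numpy as np

def closed_form_head(support_feats, support_onehot, lam=1.0):
    """Solve W = (X^T X + lam I)^{-1} X^T Y in one step per task."""
    X, Y = support_feats, support_onehot       # X: (n, d), Y: (n, c)
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    return W                                   # query scores: X_query @ W
```

Because no gradient-based inner loop is needed, per-task adaptation reduces to one linear solve, which is where an inference speedup of this kind would come from.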
Intelligent Traffic Management is a crucial issue closely tied to daily life and productivity, and traffic congestion is a complex and challenging problem faced by most cities. Traffic Signal Control (TSC) stands out as the most direct and effective method to tackle congestion. It aims to minimize travel time, enhance throughput, improve traffic safety, reduce emissions, and conserve energy by coordinating the direction and timing of vehicle movements at intersections. Traditional TSC methods mostly rely on simple rules, limited data, and expert knowledge, making them inadequate for increasingly complex traffic scenarios. In the context of TSC, an increasing number of researchers are turning to Deep Learning (DL) methods to address identification, decision-making, and optimization challenges. Although many reviews have examined TSC problems and the application of Reinforcement Learning in this field, there remains a notable gap in comprehensive analyses of TSC utilizing a wider range of DL techniques, including Deep Reinforcement Learning, Federated Learning, and Meta-learning. This paper, building upon the basic concepts and traditional approaches of TSC, provides a detailed overview of the latest research advancements employing different DL methods for this issue. Experimental settings and evaluations are also introduced. Furthermore, to spark new interest in this research field, future research directions are proposed.
The selection of an appropriate inducer is crucial for performing effective classification. In previous work we presented a system called NOEMON, which relied on a mapping between dataset characteristics and inducer performance to propose inducers for specific datasets. Instance-based learning was used to create that mapping. Here we explore the use of decision tree inducers at the meta-learning level. We believe that they possess a set of properties that match the properties of the meta-learning problem we are trying to solve. The results show that the performance of the system is indeed improved with the use of the decision tree learners.
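A toy sketch of the meta-level setup follows: a decision tree trained on dataset characteristics to predict the best-performing inducer. The characteristics and labels here are invented for illustration, not NOEMON's actual meta-data:

```python
# Hedged sketch of a decision tree as the meta-level learner.
from sklearn.tree import DecisionTreeClassifier

# Rows: [n_instances, n_attributes, n_classes] of past datasets;
# labels: the inducer that performed best on each of them.
meta_X = [[150, 4, 3], [20000, 16, 26], [700, 9, 2]]
meta_y = ["naive_bayes", "decision_tree", "k_nn"]

meta_tree = DecisionTreeClassifier().fit(meta_X, meta_y)
print(meta_tree.predict([[1000, 10, 2]]))  # proposed inducer for a new dataset
```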
Nowadays, there is no doubt that machine learning techniques can be successfully applied to data mining tasks. Currently, the combination of several classifiers is one of the most active fields within inductive machine learning. Examples of such techniques are boosting, bagging and stacking. Of these three techniques, stacking is perhaps the least used one. One of the main reasons is the difficulty of defining and parameterizing its components: selecting which combination of base classifiers to use, and which classifier to use as the meta-classifier. One could use simple search methods for that purpose (e.g. hill climbing), or more complex ones (e.g. genetic algorithms). But before search is attempted, it is important to know the properties of the search space itself. In this paper we exhaustively study the space of stacking systems that can be built from four base learning systems: C4.5, IB1, Naive Bayes, and PART, using Multiple Linear Response (MLR) as the meta-classifier. The properties of this state space obtained in this paper will be useful for designing new stacking-based algorithms and tools.
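The enumerated space can be illustrated with scikit-learn stand-ins for the original learners (C4.5 approximated by a decision tree, IB1 by 1-NN, PART by a depth-limited tree, and MLR by logistic regression); this mirrors the setup rather than reproducing it:

```python
# Hedged sketch: enumerate every stacking system over four base learners.
from itertools import combinations
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

base = {
    "c45": DecisionTreeClassifier(),
    "ib1": KNeighborsClassifier(n_neighbors=1),
    "nb": GaussianNB(),
    "part": DecisionTreeClassifier(max_depth=5),
}

# Every non-empty combination of base learners, as in an exhaustive
# exploration of the stacking state space.
stackers = [
    StackingClassifier(estimators=[(n, base[n]) for n in combo],
                       final_estimator=LogisticRegression())
    for r in range(1, len(base) + 1)
    for combo in combinations(base, r)
]
```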
Contrary to much of the research in machine learning, which concentrates on problems with relatively small volumes of data, one of the main challenges of today's data mining systems is their ability to handle data that is substantially larger than the available main memory on a single processor. In this paper, we present a distributed technique that combines the partial results of different classifiers supplied, in parallel, with different subsets of the data. It is a two-phase process. First, a number of classifiers are trained, each with a different subset of the data. Then, the trained classifiers are used to construct a new training data set which has exactly the same format as, but is substantially smaller than, the initial one. The new data set is used to train the final classifier through an iterative process guided by a threshold on the size of the data set and the achieved increase in accuracy. We present extensive empirical tests which demonstrate that the proposed technique significantly reduces time complexity, usually at the expense of somewhat lower accuracy, compared to a single classifier supplied with all the data.
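A minimal sketch of the two-phase process follows; the rule for constructing the smaller set (keeping examples on which the partial classifiers disagree) is an assumption for illustration, and the inputs are assumed to be NumPy arrays:

```python
# Hedged sketch of combining partial classifiers in two phases.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def combine_partial_classifiers(X, y, n_parts=4, keep=0.1):
    # Phase 1: one classifier per data partition (trainable in parallel).
    parts = np.array_split(np.arange(len(X)), n_parts)
    clfs = [DecisionTreeClassifier().fit(X[idx], y[idx]) for idx in parts]

    # Phase 2: a much smaller training set in the same format, here
    # biased toward examples the partial classifiers disagree on.
    votes = np.stack([c.predict(X) for c in clfs])
    chosen = np.where((votes != votes[0]).any(axis=0))[0][: int(keep * len(X))]
    if chosen.size == 0:                       # fall back to a plain subsample
        chosen = np.random.choice(len(X), int(keep * len(X)), replace=False)
    return DecisionTreeClassifier().fit(X[chosen], y[chosen])
```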
In the process of selecting a machine learning algorithm to solve a problem, questions like the following commonly arise: (1) Are some algorithms basically the same, or are they fundamentally different? (2) How different? (3) How do we measure that difference? (4) If we want to combine algorithms, which algorithms and combinators should be tried? This research proposes COD (Classifier Output Difference) distance as a diversity metric. COD separates difference from accuracy: it goes beyond accuracy to consider differences in output behavior as the basis for comparison. The paper extends earlier work on COD by giving a basic comparison to other diversity metrics, and by giving an example of using COD data as a predictive model from which to select algorithms to include in an ensemble. COD may fill a niche in metalearning as a predictive aid for selecting algorithms for ensembles and hybrid systems by providing a simple, straightforward, computationally reasonable alternative to other approaches.
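In its simplest reading, COD is the fraction of instances on which two classifiers produce different outputs, independent of whether either output is correct; the sketch below is this hedged reading, not necessarily the paper's exact estimator:

```python
# Hedged sketch of Classifier Output Difference (COD) distance.
import numpy as np

def cod_distance(preds_a, preds_b):
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

# Two classifiers can be equally accurate yet far apart under COD,
# which makes them attractive partners in an ensemble.
print(cod_distance(["a", "b", "b", "c"], ["a", "c", "b", "b"]))  # 0.5
```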
The performance of existing traditional Chinese medicine (TCM) recommendation models is generally poor because of their weak generalization ability, overfitting, and inability to use known biological networks. Building a TCM recommendation model based on artificial intelligence has therefore become an important bioinformatics task. This study aimed to design a multitask meta-learning model with good biological interpretability for TCM formula recommendation (MBI-TCMR), using deep learning regularization. The method uses the known biological network structure to sparsify the deep learning network, which alleviates overfitting and enhances the biological interpretability of the model. Furthermore, a multitask learning framework based on meta-learning was also proposed. The framework allows the MBI-TCMR model to mine knowledge of TCM formulas and quickly adapt to different types of TCM formula recommendation tasks. Finally, we used a gradient-based deep learning feature backtracking method to calculate the feature weight of each neuron. This weight can provide valuable explanatory information for researchers studying how the model makes its medicine recommendations. We designed three independent experiments. The experimental results showed that the hit ratio (HR), AUC, recall, and precision of the MBI-TCMR model outperformed existing TCM formula recommendation models. The MBI-TCMR model's HR at top 1–10 reached 0.15–0.9 on the Gynecologic Disease Dataset, and its HR@10 was an improvement of 11.1% over the best baseline model. The bio-enrichment analysis showed that the model exhibits good bio-interpretability. In summary, this study proposed a novel TCM formula recommendation model, which expands the application of artificial intelligence models and achieves good results.
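Sparsifying a network with a known biological network can be illustrated by a linear layer whose weights are masked by an adjacency matrix, so only biologically supported connections are trainable; this is a hedged sketch of the general technique, not the MBI-TCMR architecture:

```python
# Hedged sketch of biologically informed sparsification via weight masking.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, adjacency):             # adjacency: (out, in) 0/1 tensor
        super().__init__()
        self.register_buffer("mask", adjacency.float())
        self.weight = nn.Parameter(torch.randn_like(self.mask) * 0.01)
        self.bias = nn.Parameter(torch.zeros(adjacency.shape[0]))

    def forward(self, x):
        # Connections absent from the biological network stay at zero.
        return nn.functional.linear(x, self.weight * self.mask, self.bias)
```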
One of the main challenges for today's data mining systems is their ability to manage a huge volume of data, possibly generated by different sources. On the other hand, inductive learning algorithms have been extensively researched in machine learning using small amounts of judiciously chosen laboratory examples. There is increasing interest in classifiers that can handle data substantially larger than the available main memory on a single processor. One approach to the problem is to combine, in parallel, the results of different classifiers supplied with different subsets of the data. In this paper, we present an efficient algorithm for combining partial classification rules. Moreover, the proposed algorithm can be used to match classification rules in a distributed environment where different subsets of data may have different domains. The latter is achieved by using given concept hierarchies for the identification of matching classification rules. We also present empirical tests demonstrating that the proposed algorithm achieves a significant speedup with respect to the analogous non-distributed classification algorithm, at the cost of lower classification accuracy.
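A toy sketch of matching rule conditions through a concept hierarchy follows: two values match if they share an ancestor, and the most specific common ancestor can serve as the merged condition. The hierarchy below is an invented example, not from the paper:

```python
# Hedged sketch of concept-hierarchy-based rule condition matching.
parents = {"lager": "beer", "ale": "beer", "beer": "beverage",
           "cola": "soda", "soda": "beverage"}

def ancestors(v):
    chain = {v}
    while v in parents:
        v = parents[v]
        chain.add(v)
    return chain

def most_specific_common(v1, v2):
    common = ancestors(v1) & ancestors(v2)
    # The deepest common concept has the longest ancestor chain.
    return max(common, key=lambda c: len(ancestors(c))) if common else None

print(most_specific_common("lager", "ale"))   # -> 'beer'
print(most_specific_common("lager", "cola"))  # -> 'beverage'
```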
Under a federated learning environment, the training samples are generally collected and stored locally on each client's device, which means the machine learning procedure does not meet the requirement of independent and identical distribution (IID). Existing federated learning methods for dealing with non-IID data generally assume that the data is globally balanced. However, real-world multi-class data tend to exhibit a long-tail distribution, where the majority of samples fall into a few head classes and a large number of tail classes only have a small amount of data. This paper therefore focuses on handling non-IID and globally long-tailed data in a federated learning scenario. Accordingly, we propose a new federated learning method called Federated meta re-weighting networks (FedReN), which assigns weights during the local training process from the class-level and instance-level perspectives, respectively. To deal with data non-IIDness and the global long tail, both re-weighting functions are globally trained via meta-learning to acquire knowledge of the global long-tail distribution. Experiments on several long-tailed image classification benchmarks show that FedReN outperforms the state-of-the-art federated learning methods. The code is available at https://github.com/pxqian/FedReN.
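The two-level re-weighting of the local loss can be sketched as follows; the weight-network shapes are assumptions, and their meta-training against global long-tail knowledge is omitted here, so this shows only how such weights would enter the local objective:

```python
# Hedged sketch of class-level and instance-level loss re-weighting.
import torch
import torch.nn as nn

num_classes = 10
class_w = nn.Embedding(num_classes, 1)            # weight per class label
inst_w = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                       nn.Linear(16, 1), nn.Sigmoid())

def weighted_local_loss(logits, targets):
    per_sample = nn.functional.cross_entropy(logits, targets, reduction="none")
    cw = class_w(targets).squeeze(1)                          # class-level
    iw = inst_w(per_sample.detach().unsqueeze(1)).squeeze(1)  # instance-level
    return (cw * iw * per_sample).mean()
```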
Trading cryptocurrencies (digital currencies) is currently performed by applying methods similar to those applied to the stock market or commodities; however, these algorithms are not necessarily well suited to predicting cryptocurrency prices. Unlike stock exchanges, which shut down for several hours or days at a time, digital currency markets trade continuously, making prediction and trading of a more consistent and predictable nature. In this work, we draw on sentiment analysis of tweets, using both an existing sentiment analysis package and a manually tailored "objective analysis," to calculate one impact value for each analysis every 15 minutes. We then select the most appropriate training method by applying evolutionary techniques, discovering the best subset of the generated features to include, as well as other parameters. One of the unique contributions of this work is the analysis of both English and Japanese tweets with a tailored "objective analysis" tool. This resulted in predictors which yielded 28% to 122% profit in a four-week simulation, much more than simply holding a digital currency for the same period of time.
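Aggregating per-tweet sentiment into one impact value per 15-minute window might look like the following sketch; the follower-count weighting and the toy data are assumptions, not the paper's formula:

```python
# Hedged sketch: one sentiment "impact value" per 15-minute window.
import pandas as pd

tweets = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 09:03", "2024-01-01 09:11",
                            "2024-01-01 09:20"]),
    "sentiment": [0.8, -0.2, 0.5],
    "followers": [1200, 50, 300],
})

impact = (tweets.assign(weighted=tweets.sentiment * tweets.followers)
                .set_index("time")
                .resample("15min")[["weighted", "followers"]].sum()
                .eval("weighted / followers"))
print(impact)  # one follower-weighted impact value per 15-minute bucket
```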
Few-shot learning can solve new learning tasks using only a few samples. However, current few-shot learning algorithms mostly use a ResNet as the backbone, which leads to a large number of model parameters. To deal with this problem, a lightweight backbone named DenseAttentionNet, based on the Convolutional Block Attention Module, is proposed and compared with ResNet-12 in terms of parameter count and few-shot classification accuracy. Then, based on DenseAttentionNet, a few-shot learning algorithm called Meta-DenseAttention is presented to balance model size and classification performance. Dense connections and the attention mechanism are combined, for the first time, to meet the requirement of fewer parameters while achieving good classification performance. The experimental results show that DenseAttentionNet not only reduces the number of parameters by 55% compared with the ResNet-12 benchmark but also outperforms other classic backbones in classification. In addition, Meta-DenseAttention achieves an accuracy of 56.57% (5-way 1-shot) and 72.73% (5-way 5-shot) on miniImageNet with only 3.6M parameters. These results show that the proposed few-shot learning algorithm not only guarantees classification accuracy but is also lightweight.
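A block attention module of the kind DenseAttentionNet builds on applies channel attention followed by spatial attention; the layer sizes and kernel choice in this hedged PyTorch sketch are common defaults, not the paper's exact configuration:

```python
# Hedged sketch of a convolutional block attention module.
import torch
import torch.nn as nn

class BlockAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                      # x: (N, C, H, W)
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention from channel-pooled maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(s))
```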
The study and treatment of cancer is traditionally specialized to the cancer’s site of origin. However, certain phenotypes are shared across cancer types and have important implications for clinical care. To date, automating the identification of these characteristics from routine clinical data - irrespective of the type of cancer - is impaired by tissue-specific variability and limited labeled data. Whole-genome doubling is one such phenotype; whole-genome doubling events occur in nearly every type of cancer and have significant prognostic implications. Using digitized histopathology slide images of primary tumor biopsies, we train a deep neural network end-to-end to accurately generalize few-shot classification of whole-genome doubling across 17 cancer types. By taking a meta-learning approach, cancer types are treated as separate but jointly-learned tasks. This approach outperforms a traditional neural network classifier and quickly generalizes to both held-out cancer types and batch effects. These results demonstrate the unrealized potential for meta-learning to not only account for between-cancer type variability but also remedy technical variability, enabling real-time identification of cancer phenotypes that are too often costly and inefficient to obtain.
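Treating each cancer type as a separate task amounts to episodic sampling of the following shape; the data structures and the TCGA-style type codes are toy assumptions, not the authors' pipeline:

```python
# Hedged sketch of episodic sampling with one cancer type per task.
import random

slides_by_type = {"BRCA": list(range(100)), "LUAD": list(range(80)),
                  "COAD": list(range(60))}

def sample_episode(n_support=5, n_query=15):
    task = random.choice(list(slides_by_type))
    ids = random.sample(slides_by_type[task], n_support + n_query)
    return task, ids[:n_support], ids[n_support:]
```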