With the advent of the big data era, data-driven decision-making and analysis are increasingly valued across many fields. In education in particular, using big data technology to better understand student needs, optimize the educational process, and improve education quality has become an important research topic. This paper explores the application of decision trees (DT) and correlation analysis algorithms to the analysis of college students’ physical fitness, in order to provide a scientific basis for improving their physical health. A method based on big data and data mining (DM) is proposed to extract the rules contained in the data and thereby directly support decision-making in physical fitness testing and analysis. The results show that training on the training set yields good classification accuracy, and that optimizing the tree depth raises the accuracy above 85.033%. Using DM technology as a carrier, this paper uncovers the rules behind college students’ physical fitness data, extracting previously unknown, implicit, and potentially useful information and knowledge.
In recent years, with the rise of innovation and entrepreneurship, the decision tree ID3 algorithm has been used ever more widely in innovation and entrepreneurship analysis models. ID3 is an effective data mining technique that can extract useful information from large amounts of data and provide a scientific basis for innovation and entrepreneurship. However, the traditional ID3 algorithm has notable convergence limitations that restrict its application in such models, so optimizing it to improve computational efficiency and accuracy is important. This paper studies the application of the decision tree ID3 algorithm from the perspective of the innovation and entrepreneurship analysis model: the basic principle, advantages, and disadvantages of ID3 are analyzed, and a method to optimize the algorithm is proposed. Experiments verify the effect of the optimized ID3 algorithm in the innovation and entrepreneurship analysis model. When the number of records is 5200, the error rate of the optimized algorithm is 70%, versus 95% for the classical algorithm; when the number of records is 7200, the error rates are 130% and 170%, respectively. As the number of records increases, the error rates of both the optimized and the classical algorithm rise, but the error rate of the optimized ID3 algorithm grows more slowly than that of the classical algorithm. The optimized ID3 algorithm thus significantly improves computational efficiency and accuracy, providing an effective data mining technique for innovation and entrepreneurship analysis models.
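The entropy-reduction criterion at the core of ID3 can be illustrated with a short, self-contained sketch; the toy dataset and attribute indices below are invented for illustration, and this is the textbook criterion rather than the paper’s optimized variant:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, the impurity measure ID3 minimizes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting `rows` on the attribute at attr_index.
    ID3 greedily picks the attribute with the highest gain at each node."""
    total = entropy(labels)
    n = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part)
                    for part in partitions.values())
    return total - remainder

# Toy dataset: two categorical attributes, binary class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # attribute 0 separates perfectly -> 1.0
print(information_gain(rows, labels, 1))  # attribute 1 is uninformative -> 0.0
```

ID3 grows the tree by repeatedly choosing the attribute with the highest gain and recursing on each partition until the labels are pure.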
Data mining technology has great advantages in uncovering the hidden rules in data. As data mining matures, its applications in teaching are also growing. This paper analyzes the specific impact of campus football activities on students’ mental health through data mining algorithms, in particular decision trees and association rule algorithms. First, the development status of campus football is described and research on data mining algorithms is summarized. An analysis model of the influence of campus football on students’ mental quality is then established, and a decision tree and an association rule algorithm are used to analyze students’ sports quality and mental health state. The algorithm scans the database only once; after new frequent itemsets are generated, the database shrinks continuously, so subsequent operations require no rescanning, which occupies less space and reduces time complexity. Higher accuracy is achieved when the pruning severity is set between 30 and 60, and it is set to 40 in this design. Algorithm tests show that the accuracy of the design has improved and can meet the requirements. The resulting data-mining-based analysis of students’ sports quality and mental health state can provide effective data for decision-making on campus football activities.
This study endeavors to introduce a method for measuring stock market investment risk, leveraging data mining techniques alongside decision trees (DTs). By harnessing extensive stock market data and integrating steps such as data cleaning, feature selection, and model construction within data mining technology, an effective risk measurement model is formulated. Specifically, DTs serve as the primary modeling tool, adept at capturing intricate relationships and nonlinear characteristics prevalent within the stock market, thereby facilitating precise measurement of investment risks. Through empirical analysis, the efficacy and viability of the proposed method in risk measurement are substantiated, furnishing investors with a pivotal decision-making reference. Overall, this study contributes to the ongoing discourse on stock market risk assessment by integrating advanced data mining methodologies, thereby enhancing the accuracy and reliability of risk evaluation in investment decision-making processes.
An indicator system for financial monitoring is known as a financial risk early warning system. Such a system uses statistical data to predict the likelihood of shocks occurring at a given point in time, such as a regional or national economic crisis or a stock market collapse. An early warning system for enterprise financial risk presents additional challenges, such as data spikes and uncertainty in integrating risk information. A financial crisis early warning system must analyze and summarize significant data on an organization’s financial status and provide technical support for financial decision-making via Data Mining Technologies (DMT) and related techniques. The financial warning model based on Decision Tree (DT) integration has been found to be more accurate, suggesting that, provided overall warning accuracy is enhanced, the model can increase the correct identification rate of companies experiencing a financial crisis. Hence, the proposed DMT-DT method, applied in early warning systems, includes automated solutions for tracking borrowers’ credit condition to help monitor and appraise credit portfolios. Deloitte can assist in developing, implementing, improving, and maintaining indicators for detecting early warning signals of potential financial distress. People-centered early warning systems, which aim to lessen the likelihood of harm to people, property, and the environment, help communities and individuals at risk take timely and appropriate action.
In response to the pressing need for comprehensive evaluation methodologies for assessing the quality of higher education with respect to employment and entrepreneurship (Em&En) prospects for university students, this study presents a novel integration of decision tree and neural network models, reinforced by genetic algorithm parameter tuning. The convergence of these techniques addresses the limitations of standalone methods and harnesses their respective strengths. Against a backdrop of escalating competition in the job market and the need for universities to adapt their curricula to evolving industry needs, the study endeavors to provide an advanced framework for evaluating the efficacy of educational programs in fostering students’ career readiness and entrepreneurial acumen. Leveraging the robustness of decision trees in handling structured data alongside the complex pattern recognition capabilities of neural networks, the hybrid model captures intricate relationships in the educational landscape. Furthermore, the application of genetic algorithms facilitates the fine tuning of model parameters, optimizing predictive accuracy and generalization performance. Empirical evaluations on pertinent datasets demonstrate the efficacy of the proposed method, exhibiting superior predictive performance and discerning insights into the factors influencing educational quality vis-à-vis Em&En outcomes. This pioneering study not only contributes to the advancement of evaluative methodologies in educational contexts but also holds broad implications for educators seeking to enhance the efficacy of higher education systems worldwide.
With the advancement of smart grid technology, the issue of power system network security has become increasingly critical. To fully utilize the power grid’s vast data resources and enhance the efficiency of anomaly detection, this paper proposes an improved decision tree (DT)-based automatic identification approach for anomalies in electric power big data. The method employs six-dimensional features extracted from the dimensions of volatility, trend, and variability to characterize the time series of power data. These features are integrated into a hybrid DT-SVM-LSTM framework, combining the strengths of DTs, support vector machines, and long short-term memory networks. Experimental results demonstrate that the proposed method achieves an accuracy of 96.8%, a precision of 95.3%, a recall of 94.8%, and an F1-score of 95.0%, outperforming several state-of-the-art methods cited in the literature. Moreover, the approach exhibits strong robustness to noise, maintaining high detection accuracy even under low signal-to-noise ratio conditions. These findings highlight the effectiveness of the method in efficiently detecting anomalies and addressing noise interference.
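The abstract above names three feature dimensions (volatility, trend, variability) but not the six feature definitions themselves, so as a hedged illustration the sketch below computes six plausible stand-ins for a power-load series; the chosen statistics are assumptions, not the authors’ definitions:

```python
import statistics

def power_series_features(x):
    """Six illustrative time-series features spanning volatility, trend,
    and variability (stand-ins, not the paper's exact definitions)."""
    n = len(x)
    mean = statistics.fmean(x)
    diffs = [b - a for a, b in zip(x, x[1:])]
    # Volatility: dispersion of the series and of its first differences.
    volatility = statistics.pstdev(x)
    diff_volatility = statistics.pstdev(diffs)
    # Trend: least-squares slope against the time index, plus the level.
    t_mean = (n - 1) / 2
    slope = (sum((t - t_mean) * (v - mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    # Variability: range and coefficient of variation.
    rng = max(x) - min(x)
    cv = volatility / mean if mean else 0.0
    return [volatility, diff_volatility, slope, mean, rng, cv]

load = [10.0, 10.5, 11.0, 11.5, 50.0, 12.5]  # a load series with a spike
features = power_series_features(load)
print(features)
```

Feature vectors of this kind would then be fed to the hybrid DT-SVM-LSTM classifier to flag anomalous windows.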
Microfluidic technology has advanced rapidly and become widely adopted in miniaturized laboratory work. The effectiveness of laboratory tasks such as protein and glucose analysis and pharmaceutical studies depends on convenient sample preparation. These processes involve diluting the primary reactant with buffer fluid in an appropriate proportion to ensure error-free bioassay operations downstream. In this paper, a decision tree-based method is proposed to construct the mixing tree, exploring all possible combinations of concentration values to generate the target sample for different biochemical experiments. The work is also extended to concurrent multi-target sample preparation to reduce sample preparation time and cost. Simulation results show that the proposed technique not only reduces the mixing/dilution operations for single-target sample preparation but also minimizes the use of primary reactant compared to contemporary approaches, and an enhanced outcome is also achieved for multi-target sample production.
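As a hedged illustration of the underlying arithmetic (not the paper’s mixing-tree algorithm), the sketch below uses the standard (1:1) mixing model in which each mix averages two fluid concentrations, so any target of the form k/2^d is reachable in d mixing steps:

```python
from fractions import Fraction

def dilution_sequence(target, depth):
    """Plan a chain of 1:1 mixes reaching concentration k / 2**depth.

    The reactant has concentration 1 and the buffer 0, and each 1:1 mix
    averages two concentrations. Scanning the binary digits of k (least
    significant bit first) yields one valid mixing chain; returns a list
    of (operand concentration, resulting concentration) pairs."""
    k = round(target * 2 ** depth)
    chain = []
    current = Fraction(0)
    for bit in range(depth):
        operand = Fraction(1) if (k >> bit) & 1 else Fraction(0)
        current = (current + operand) / 2
        chain.append((operand, current))
    return chain

# Reach target concentration 5/8 in three mixing steps.
for operand, conc in dilution_sequence(0.625, 3):
    print(f"mix with fluid at {operand} -> concentration {conc}")
```

A mixing tree enumerates many such chains and picks one that minimizes operations and reactant usage, which is the optimization the paper addresses.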
Epilepsy is a global disease with considerable incidence, marked by recurrent unprovoked seizures. These seizures can be noninvasively diagnosed using the electroencephalogram (EEG), a measure of the brain’s neuronal electrical activity recorded along the scalp. EEG is highly nonlinear, nonstationary and non-Gaussian in nature. Nonlinear adaptive models such as empirical mode decomposition (EMD) provide an intuitive understanding of the information present in these signals. In this study a novel methodology is proposed to automatically classify the EEG of normal, inter-ictal and ictal subjects using EMD. EMD decomposes the EEG into a few intrinsic mode functions (IMF), which are amplitude- and frequency-modulated (AM and FM) waves. The Hilbert transform of these IMFs provides the AM and FM frequencies. Features such as spectral peaks, spectral entropy and spectral energy are extracted from each IMF and fed to a decision tree classifier for automated diagnosis. We compared the classification performance of two types of decision trees: (i) the classification and regression tree (CART) and (ii) C4.5. The highest average accuracy of 95.33%, average sensitivity of 98%, and average specificity of 97% were obtained using the C4.5 decision tree classifier. The developed methodology is ready for clinical validation on large databases and can be deployed for mass screening.
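Of the features named above, spectral entropy is easy to sketch: a signal whose power concentrates in one frequency bin has low entropy, while a spread spectrum has high entropy. The naive DFT below is illustrative only and is not the authors’ EMD/Hilbert pipeline:

```python
import cmath
import math

def spectral_entropy(signal):
    """Shannon entropy of the normalized power spectrum
    (naive DFT over the positive-frequency bins only)."""
    n = len(signal)
    power = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)

# A pure sine concentrates power in one bin; a two-tone mix spreads it.
pure = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
mixed = [math.sin(2 * math.pi * 4 * t / 64)
         + 0.8 * math.sin(2 * math.pi * 13 * t / 64) for t in range(64)]
print(spectral_entropy(pure), spectral_entropy(mixed))
```

In the study, such scalar features computed per IMF form the input vector for the CART and C4.5 classifiers.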
With the increasing popularity of online social networking platforms, the amount of social data has grown exponentially. Social data analysis is essential as spamming activities and spammers are escalating on online social networking platforms. This paper focuses on spammer detection on the Twitter social networking platform. Although existing researchers have developed numerous machine learning methods to detect spammers, these methods are inefficient at appropriately detecting spammers on Twitter due to the imbalance of spam and nonspam data distribution, the involvement of diverse features, and the data mechanisms spammers apply to avoid detection. This research work proposes a novel hybrid approach of the gravitational search algorithm and the decision tree (HGSDT) for detecting Twitter spammers. The individual decision tree (DT) algorithm alone cannot address these challenges, as it is unstable and ineffective when data strongly favors a particular attribute. The gravitational search algorithm (GSA) constructs DTs with improved performance, as the gravitational forces act as information-transferring agents through mass agents. Moreover, the GSA is efficient at handling data in higher-dimensional search spaces. In the HGSDT approach, the construction of the DT and the splitting of nodes are performed with a heuristic function and Newton’s laws. The performance of the proposed HGSDT approach is evaluated on the Social Honeypot dataset and the 1KS-10KN dataset through three experiments analyzing the impact of training data size, features and spammer ratio. The first experiment shows the need for a higher proportion of training data; the second shows that textual content-based features matter more than the other feature categories; and the third indicates that balanced data is required for the proposed approach to perform effectively. The overall performance comparison indicates that the proposed HGSDT approach is superior to the incorporated machine learning methods of DT, support vector machine and back propagation neural network for detecting Twitter spammers.
Online social media has made news dissemination so quick that people have shifted from traditional journalism and the press to online social media sources. The rapid circulation of news on social media makes it challenging to evaluate its reliability. Fake news not only erodes public trust but also subverts public opinion. An intelligent automated system is required to detect fake news, as the difference between fake and real news is tenuous. This paper proposes an intelligent gravitational search random forest (IGSRF) algorithm for detecting fake news. The IGSRF algorithm amalgamates the intelligent gravitational search algorithm (IGSA) and the random forest (RF) algorithm. The IGSA is an improved intelligent variant of the classical gravitational search algorithm (GSA) that adds information about the best and worst gravitational mass agents in order to retain the exploitation ability of agents at later iterations, thereby preventing the classical GSA from being trapped in a local optimum. In the proposed IGSRF algorithm, all intelligent mass agents determine the solution by generating decision trees (DTs) with a random subset of attributes, following the hypothesis of random forest. The mass agents generate the collection of solutions from the solution space using random proportional rules. The comprehensive prediction that decides the class of news (fake or real) is determined by all the agents following the attributes of random forest. The performance of the proposed algorithm is evaluated on the FakeNewsNet dataset, which has BuzzFeed and PolitiFact news sub-categories. To analyze its effectiveness, the results are also compared with the decision tree and random forest algorithms. The proposed IGSRF algorithm attains superior results compared to DT, RF and state-of-the-art techniques.
In today’s interconnected digital world, email remains a primary mode of digital interaction, valued for its suitability in official, academic and business communications. Despite its utility, however, email faces significant challenges due to the widespread presence of spam in various forms, such as phishing, suspicious attachments and deceptive content. This issue affects not only the efficiency and security of email communication but also its reliability. It is therefore essential to devise effective methods to tackle the escalating volume of spam emails. This research work presents a methodology to combat the persistent problem of email spam. The proposed method, abbreviated as AIGSADT, is an amalgamation of the intelligent variant of the gravitational search algorithm (IGSA) and decision trees (DTs). The machine learning-based DT algorithm is individually inadequate for dealing with large amounts of data favoring a certain attribute, due to its instability and ineffectiveness. The proposed AIGSADT approach integrates the IGSA, which handles large amounts of data effectively, to detect email spam. This is achieved by constructing decision trees that employ gravitational forces as the means of information transfer through mass agents. The intelligent factor of the IGSA prevents GSA agents from being trapped in local optima by updating their positions based on information provided by the best and worst agents. The performance of the presented AIGSADT approach is analyzed through experiments on the various categories of the Ling spam dataset, evaluating the significance of different pre-processing modules across dataset categories. The comparative analysis indicates the superiority of the proposed approach over state-of-the-art methodologies.
This research pertains to the classification of heart sounds using digital phonocardiogram (PCG) signals, targeted at screening for heart ailments. In this study, an existing variant of the decision tree, XGBoost, is used with unsegmented heart sound signals. The dataset provided by the PhysioNet Computing in Cardiology (CinC) Challenge 2016 is used to validate the proposed technique. The dataset comprises six databases (A–F) with 3240 heart sound recordings in all, with durations ranging from 5 to 120 s. The approach proposed in this paper is compared with 18 existing methodologies. The proposed method is accurate, with a mean score of 92.9, while the sensitivity and specificity scores are 94.5 and 91.3, respectively. Timely prediction of heart health will help specialists attain useful risk stratification of patients and assist clinicians in effective decision-making. These predictive facts may serve as a guide to improved quality of patient care through effective treatment planning and monitoring.
This paper examines the classification capability of different Gabor representations for human face recognition. Usually, Gabor filter responses for eight orientations and five scales per orientation are calculated, and all 40 basic feature vectors are concatenated to assemble the Gabor feature vector. This work explores 70 different Gabor feature vector extraction techniques for face recognition. The main goal is to determine the characteristics of the 40 basic Gabor feature vectors and to devise a faster Gabor feature extraction method. Among the 40 basic Gabor feature representations, the filter responses acquired from the largest scale at the smallest relative orientation change (with respect to the face) show the highest discriminating ability for face recognition, with classification performed using three methods: probabilistic neural networks (PNN), support vector machines (SVM) and decision trees (DT). A 40-times-faster summation-based Gabor representation, in which all 40 basic Gabor feature vectors are summed to form the summation-based Gabor feature vector, achieves about 98% recognition rate when classification is performed using SVM. In the experiments, a sixth-order data tensor containing the basic Gabor feature vectors is constructed for all the operations.
This paper deals with a supervised classification method using Galois lattices, based on a navigation strategy. In the data mining literature, most work using Galois lattices relies on selection-based strategies, which consist of selecting the concepts that encode the most relevant information from the huge amount of available data. The classification step is then generally processed by a classical classifier such as the k-nearest neighbors rule or the Bayesian classifier. Opposed to these selection-based strategies are navigation-based approaches, which perform the classification stage by navigating through the complete lattice (similar to navigation in a classification tree), without applying any selection operation. Our approach, named Navigala, proposes an original navigation-based approach to supervised classification, applied in the context of noisy symbol recognition. Building on a state of the art of Galois lattice-based classification methods, including a comparison between possible selection and navigation strategies, this paper describes Navigala and its implementation in the context of symbol recognition. Objective quantitative and qualitative evaluations of the approach are provided in order to highlight the relevance of the method.
Multi-label learning (MLL) problems abound in many areas, including text categorization, protein function classification, and semantic annotation of multimedia. An issue that severely limits the applicability of many current machine learning approaches to MLL is problem scale, which has a strong impact on the computational complexity of learning. This problem is especially pronounced for approaches that transform MLL problems into a set of binary classification problems solved with Support Vector Machines (SVMs). On the other hand, the most efficient approaches to MLL, based on decision trees, have clearly lower predictive performance. We propose a hybrid decision tree architecture in which the leaves do not give multi-label predictions directly but rather utilize local SVM-based classifiers that do. A binary relevance architecture is employed in the leaves, where a binary SVM classifier is built for each of the labels relevant to that particular leaf. We use a broad range of multi-label datasets with a variety of evaluation measures to evaluate the proposed method against related and state-of-the-art methods, in terms of both predictive performance and time complexity. On almost every large classification problem, our hybrid architecture outperforms the competing approaches in predictive performance, while its computational efficiency is significantly improved as a result of the integrated decision tree.
For virtual reality 360° videos, equirectangular projection (ERP) is a commonly used projection format. However, its high resolution brings extraordinarily high computational complexity in encoding. In order to speed up the intra coding process, a fast coding unit (CU) partitioning algorithm based on regional decision trees is proposed in this paper. The frame image is divided into two regions from a statistical point of view, and early-split and pruned decision trees are built from lightweight sample attributes for each region. With the help of these decision trees, the CU partitioning process is accelerated. Compared with the original algorithm of HM16.20, the proposed algorithm reduces the encoding time by 28%, while the BD-rate increases by only 0.27%.
Supervisory Control and Data Acquisition (SCADA) systems in modern industrial automation control networks face an increasing number of serious security threats. To meet the security defense requirements of oil and gas SCADA systems, an intelligent multi-agent-based security defense model was designed by analyzing the security risks in oil and gas SCADA systems and drawing on the advantages of multi-agent technology in distributed intrusion detection systems. First, the model was structured into three layers: a monitoring layer, a decision layer and a control layer. Then, the defense model was verified with the C4.5 decision tree algorithm, with good results. Finally, a security defense prototype for a large-scale oil and gas SCADA system was realized based on this model. Results demonstrate that applying multi-agent technology to the security defense of oil and gas SCADA systems achieves more comprehensive defense and more accurate detection, handles large-scale distributed attacks, and improves the robustness and stability of security defenses. This study makes full use of the multi-agent architecture and offers accurate detection, high detection efficiency and timely response.
Automatic detection and recognition of motorcycle license plates is a very challenging task, since they are more compact and variable in appearance than vehicle license plates. In this paper, we present an efficient detection and recognition system for motorcycle license plates based on decision trees and deep learning. It performs successfully under various conditions, such as frontal, horizontally or vertically skewed, blurry, poorly illuminated, large viewing distances or angles, distorted, multiple license plates in an image, at night, or interfered with by brake lights and headlights. Experimental results show that our system performs best when tested on images with multiple license plates under different conditions, compared against six state-of-the-art methods. Furthermore, our detection and recognition system has shown more accurate results than three commercial automatic license plate recognition systems when evaluated using accuracy, precision, recall, and F1 rates.
A new algorithm is proposed for constructing an entropy-reduction-based decision tree classifier in two steps for large reference-class sets. The d-dimensional feature space is first mapped onto a line, allowing a dynamic choice of features, and the resultant linear space is then partitioned into two sections while minimizing the average system entropy. The classes of each section, again considered as a collection of reference classes in a d-dimensional feature space, can be further split in a similar manner should the collection still be considered excessively large, thus forming a binary decision tree of nodes with overlapping members. The advantage of such a classifier is that it avoids the need to match a test feature vector exhaustively against all the references. We demonstrate in this paper that discrete syllable recognition with dynamic programming equipped with such a classifier can reduce the recognition time by a factor of 40 to 100. The recognition speed is one third to one half of that using hidden Markov models (HMM), while the recognition rate is somewhat higher. The theory is considerably simpler than that of HMM, but the decision tree can occupy a lot of memory space.
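The entropy-minimizing binary partition of the projected line can be sketched as a threshold scan; the toy data below are invented for illustration, and this shows only the split criterion, not the paper’s full two-step tree construction:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(points, labels):
    """Scan every threshold on the 1-D projection and return the one that
    minimizes the weighted average entropy of the two resulting sections."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    pts = [points[i] for i in order]
    labs = [labels[i] for i in order]
    n = len(pts)
    best_threshold, best_avg = None, float("inf")
    for i in range(1, n):
        avg = (i * entropy(labs[:i]) + (n - i) * entropy(labs[i:])) / n
        if avg < best_avg:
            best_threshold = (pts[i - 1] + pts[i]) / 2
            best_avg = avg
    return best_threshold, best_avg

points = [0.1, 0.4, 0.35, 2.0, 2.2, 2.5]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(points, labels))  # clean cut between the two clusters
```

Applying this split recursively to each section, with a fresh 1-D projection at every node, yields the binary decision tree the abstract describes.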