The t-distributed stochastic neighbor embedding (t-SNE) is a new dimension reduction and visualization technique for high-dimensional data. t-SNE is rarely applied to human genetic data, even though it is commonly used in other data-intensive biological fields, such as single-cell genomics. We explore the applicability of t-SNE to human genetic data and make these observations: (i) similar to previously used dimension reduction techniques such as principal component analysis (PCA), t-SNE is able to separate samples from different continents; (ii) t-SNE is more robust than PCA to the presence of outliers; (iii) t-SNE is able to display both continental and sub-continental patterns in a single plot. We conclude that the ability of t-SNE to reveal population stratification at different scales could be useful for human genetic association studies.
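As a hedged illustration of the comparison described in this abstract, the sketch below runs both PCA and t-SNE (via scikit-learn) on a synthetic genotype-like matrix. The two simulated populations, the sample sizes, and the PCA-then-t-SNE preprocessing are assumptions made for the example, not the study's actual pipeline.

```python
# Minimal sketch: PCA vs. t-SNE on a synthetic genotype-like matrix.
# The two simulated "populations" stand in for continental groups; all
# parameters here are illustrative, not those used in the study.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Simulate 0/1/2 genotype counts for two populations with shifted
# allele frequencies (hypothetical data).
n_per_pop, n_snps = 100, 500
freqs_a = rng.uniform(0.1, 0.9, n_snps)
freqs_b = np.clip(freqs_a + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freqs_a, (n_per_pop, n_snps))
pop_b = rng.binomial(2, freqs_b, (n_per_pop, n_snps))
X = np.vstack([pop_a, pop_b]).astype(float)

# PCA projection to 2 components.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE is commonly run on a PCA-reduced matrix to denoise first.
X_50 = PCA(n_components=50).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X_50)

print(X_pca.shape, X_tsne.shape)  # (200, 2) (200, 2)
```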
This paper presents a multivariate hybrid approach, combining Multi-Objective Optimization on the basis of Ratio Analysis (MOORA) and Principal Component Analysis (PCA), to optimize different correlated responses during the Wire Electrical Discharge Machining (WEDM) process of titanium grade 6. The response parameters selected are the average cutting speed, average kerf width and average surface roughness (Ra). All of them have been studied in terms of pulse-ON time, pulse-OFF time, wire feed and wire tension. Using Taguchi's signal-to-noise ratio, the optimum process parameters were obtained for the desired average cutting speed, average kerf width and average surface roughness, respectively. Finally, the optimum combination of process parameters was validated by a confirmation test, which showed considerable improvement across the quality characteristics. The confirmation test outcome revealed that the multivariate hybrid approach of MOORA coupled with PCA is a competent strategy for selecting cutting parameters that deliver the desired response quality in WEDM of titanium grade 6.
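To make the MOORA-PCA coupling concrete, here is a minimal sketch of one plausible reading of the approach: MOORA's vector normalization and beneficial-minus-non-beneficial aggregation, with criterion weights derived from PCA. The response matrix and the weight-extraction rule (squared first-component loadings) are assumptions for illustration; the paper's exact scheme may differ.

```python
# Hedged sketch of MOORA ranking with PCA-derived criterion weights.
import numpy as np
from sklearn.decomposition import PCA

# Rows: experimental runs; columns: cutting speed (beneficial),
# kerf width (non-beneficial), surface roughness Ra (non-beneficial).
# Values are made up for the example.
X = np.array([
    [2.1, 0.32, 1.9],
    [2.6, 0.35, 2.2],
    [1.8, 0.30, 1.7],
    [2.4, 0.33, 2.0],
])
beneficial = np.array([True, False, False])

# MOORA step 1: vector normalization of each criterion column.
X_norm = X / np.sqrt((X ** 2).sum(axis=0))

# One possible PCA weighting: squared loadings of the first principal
# component, renormalized to sum to 1 (an assumption, not the paper's rule).
w = PCA().fit(X_norm).components_[0] ** 2
w = w / w.sum()

# MOORA step 2: weighted sum of beneficial minus non-beneficial criteria.
y = (X_norm[:, beneficial] * w[beneficial]).sum(axis=1) \
    - (X_norm[:, ~beneficial] * w[~beneficial]).sum(axis=1)

ranking = np.argsort(-y)  # best run first
print("assessment values:", np.round(y, 4), "ranking:", ranking)
```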
The development of efficient stroke-detection methods is of significant importance in today's society due to the effects and impact of stroke on health and the economy worldwide. This study focuses on Human Activity Recognition (HAR), which is a key component in developing an early stroke-diagnosis tool. An overview of the proposed global approach, which is able to discriminate normal resting from stroke-related paralysis, is presented. The main contributions include an extension of the Genetic Fuzzy Finite State Machine (GFFSM) method and a new hybrid feature selection (FS) algorithm involving Principal Component Analysis (PCA) and a voting scheme that aggregates the cross-validation results. Experimental results show that the proposed approach is a well-performing HAR tool that can be successfully embedded in devices.
Cable-stayed bridges are widely used all around the world. Unfortunately, during their service life, they are exposed to adverse conditions that may cause their deterioration and, consequently, their collapse. Vibration-based structural health monitoring techniques have become the most promising alternatives for efficiently detecting and locating damage in civil structures. In this regard, this paper presents a new methodology based on statistical features, principal component analysis (PCA), and Mahalanobis distance (MD) for detecting and locating a cable loss in the Río Papaloapan bridge (RPB) using vibration signals. It is based on the extraction of a set of statistical time features (STFs) from vibration signals, which are analyzed using the autocorrelation function (ACF) to denoise and strengthen the features found in them. PCA-based models are then computed from the STFs to enhance the damage-location process, and a new damage index based on MD is proposed to indicate whether damage exists and where it is located.
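A minimal sketch of the damage-indexing idea follows: statistical time features (STFs) from vibration signals, a PCA model fitted on the healthy condition, and a Mahalanobis-distance score in PCA space. The feature selection, the simulated signals, and the absence of the ACF step are assumptions of the sketch, not the paper's exact methodology.

```python
# STFs -> PCA model of healthy state -> Mahalanobis damage index.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.decomposition import PCA

def stf(signal):
    """A few common statistical time features (hypothetical selection)."""
    return np.array([signal.mean(), signal.std(), np.sqrt((signal**2).mean()),
                     kurtosis(signal), skew(signal), np.abs(signal).max()])

rng = np.random.default_rng(1)
healthy = np.array([stf(rng.normal(0, 1.0, 2048)) for _ in range(50)])
test = np.array([stf(rng.normal(0, 1.4, 2048)) for _ in range(5)])  # "damaged"

# PCA model fitted on the healthy baseline only.
pca = PCA(n_components=3).fit(healthy)
scores_h = pca.transform(healthy)
scores_t = pca.transform(test)

# Mahalanobis distance in PCA space, relative to the healthy cloud.
cov_inv = np.linalg.inv(np.cov(scores_h, rowvar=False))
mu = scores_h.mean(axis=0)
d = np.sqrt(np.einsum("ij,jk,ik->i", scores_t - mu, cov_inv, scores_t - mu))
print("damage indices:", np.round(d, 2))  # larger => more likely damage
```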
The Traveling Wire Electro-Chemical Spark Machining (TW-ECSM) process is an innovative thermal-erosion-based machining process suitable for cutting electrically nonconductive materials using a tool electrode in the form of a wire. This article attempts experimental modeling of the TW-ECSM process using a hybrid methodology comprising Taguchi methodology (TM) and response surface methodology (RSM). The experiments were carried out on borosilicate glass using an L27 orthogonal array (OA), considering input parameters such as applied voltage, pulse on-time, pulse off-time, electrolyte concentration and wire feed velocity, along with process performances such as material removal rate (MRR), surface roughness (Ra) and kerf width (Kw). The interaction effects of the input parameters on process performance are also discussed. Further, multi-objective optimization (MOO) of the response performances of the TW-ECSM process is executed using a coupled approach of grey relational analysis (GRA) and principal component analysis (PCA). The optimal process parameter setting improves MRR by 171% and reduces Ra and Kw by 27% and 8%, respectively, against the initial parameter settings. Moreover, irregular cutting of the kerf and the surface characteristics were also scrutinized using a scanning electron microscope (SEM).
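As a hedged sketch of the GRA-PCA coupling named above, the code below normalizes the responses, computes grey relational coefficients against the ideal sequence, and aggregates them with PCA-derived weights into a grey relational grade. The run data, the distinguishing coefficient, and the weight-extraction rule are all assumptions for illustration.

```python
# Grey relational analysis (GRA) with PCA-based weights (illustrative).
import numpy as np
from sklearn.decomposition import PCA

# Rows: L27-style runs (only 5 shown); cols: MRR (larger-better),
# Ra and Kw (smaller-better). All numbers invented.
X = np.array([
    [0.81, 2.9, 0.41],
    [1.10, 3.4, 0.45],
    [0.95, 2.6, 0.39],
    [1.25, 3.8, 0.48],
    [0.88, 2.7, 0.40],
])
larger_better = np.array([True, False, False])

# Grey relational normalization to [0, 1].
mins, maxs = X.min(axis=0), X.max(axis=0)
Z = np.where(larger_better, (X - mins) / (maxs - mins),
             (maxs - X) / (maxs - mins))

# Grey relational coefficients against the ideal sequence (all ones),
# with the customary distinguishing coefficient zeta = 0.5.
delta = np.abs(1.0 - Z)
zeta = 0.5
xi = (delta.min() + zeta * delta.max()) / (delta + zeta * delta.max())

# PCA-derived weights (squared first-component loadings; an assumption).
w = PCA().fit(Z).components_[0] ** 2
w /= w.sum()

grade = xi @ w  # grey relational grade per run
print("grades:", np.round(grade, 4), "best run:", int(grade.argmax()))
```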
High-dimensional data poses unique challenges for the outlier detection process. Most of the existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on datasets of small size with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the F1-score. Our method also produces better-than-average execution times compared with the benchmark methods.
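The core PCA-plus-KDE idea can be sketched in a few lines: project the data onto a lower-dimensional space, then score points by negative log density. The bandwidth, the number of components, and the synthetic data are illustrative choices, not the paper's tuned settings.

```python
# PCA projection followed by kernel density estimation for outlier scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(42)
X_inliers = rng.normal(0, 1, (200, 50))          # high-dimensional inliers
X_outliers = rng.normal(4, 1, (5, 50))           # a few shifted outliers
X = np.vstack([X_inliers, X_outliers])

# Step 1: project onto a smaller space with PCA.
Z = PCA(n_components=5).fit_transform(X)

# Step 2: kernel density estimation in the reduced space; low density
# means high anomaly score.
kde = KernelDensity(kernel="gaussian", bandwidth=0.75).fit(Z)
scores = -kde.score_samples(Z)                   # anomaly score per point

# The 5 planted outliers should receive the largest scores.
print("top-5 anomalous indices:", np.argsort(scores)[-5:])
```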
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
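As a small illustration of one of the listed meta-analysis options, the sketch below implements Fisher's method for combining independent p-values. It shows the statistic itself, not ExAtlas's internal implementation; the example p-values are made up.

```python
# Fisher's method: X^2 = -2 * sum(ln p) ~ chi-squared with 2k d.o.f.
import numpy as np
from scipy.stats import chi2

def fishers_method(p_values):
    """Combine k independent p-values into one test statistic and p-value."""
    p = np.asarray(p_values, dtype=float)
    stat = -2.0 * np.log(p).sum()
    return stat, chi2.sf(stat, df=2 * len(p))

# Example: p-values for one gene across three studies (invented numbers).
stat, p_combined = fishers_method([0.04, 0.10, 0.03])
print(f"chi2 = {stat:.2f}, combined p = {p_combined:.4f}")
```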
Graphene oxide (GO)-doped CFRP composites possess excellent mechanical properties for high-performance products in the aircraft, defense, biomedical and chemical trades. This paper highlights a novel hybridization of the combined compromise solution and principal component analysis (CoCoSo-PCA) method to optimize multiple correlated responses during CNC milling of GO-doped epoxy/CFRP. The influence of process constraints like drill speed (S), feed rate (F), depth of cut (D) and GO wt.% (GO) on machining performances like MRR, cutting force (Fc) and surface roughness (Ra) has been investigated. A Taguchi L9 orthogonal array was considered for machining (milling) of the composite using a titanium aluminium nitride (TiAlN) milling cutter (ϕ5 mm). A multivariate hybrid approach based on the combined multiplication rule was utilized to rank the alternatives in the decision process and optimize the responses. ANOVA reveals that spindle speed (82.24%) is the most influential factor, followed by feed rate (5.02%), GO wt.% (2.17%) and depth of cut (0.55%). This module has successfully tackled critical issues such as response priority weight assignment and response correlation. Finally, CoCoSo-PCA yields the highest predicted value of 9.06, and a confirmatory test performed at the optimum settings (S = 1600 RPM, F = 160 mm/rev, D = 0.5 mm and GO = 1%) shows satisfactory agreement with the actual values, confirming a favorable machining environment.
A new method for multi-fault condition monitoring of slurry pumps based on principal component analysis (PCA) and the sequential probability ratio test (SPRT) is proposed. The method identifies the condition of the slurry pump by analyzing the vibration signal. The experimental model is established using the normal impeller and the faulty impellers, and the collected vibration signals are preprocessed using the wavelet packet transform (WPT). The characteristic parameters of the vibration signals are extracted by time-domain signal analysis, and the dimensionality of the data is reduced by PCA. The principal components with the largest contribution rates are chosen as the input to the SPRT to assess the proposed algorithm. The new methodology is reasonable and practical for the multi-fault diagnosis of slurry pumps.
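A hedged sketch of the PCA-plus-SPRT stage follows: time-domain features are reduced with PCA, and a sequential probability ratio test on the leading principal component decides between "normal" and "faulty" Gaussian hypotheses. The WPT preprocessing is omitted, and the hypothesis means, variance, and error probabilities are assumptions for illustration.

```python
# PCA feature reduction followed by a Gaussian SPRT on the first PC.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Feature matrix from the normal impeller (rows: signal segments, cols:
# time-domain features such as RMS, kurtosis, ...); values are simulated.
F_train = rng.normal(0, 1, (100, 8))
pca = PCA(n_components=1).fit(F_train)

# Hypotheses on the first principal component (assumed parameters).
mu0, mu1, sigma = 0.0, 1.5, 1.0        # H0: normal, H1: faulty
alpha, beta = 0.01, 0.01               # error probabilities
upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))

def sprt(scores):
    """Accumulate the Gaussian log-likelihood ratio until a boundary is hit."""
    llr = 0.0
    for t, x in enumerate(scores, start=1):
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma**2
        if llr >= upper:
            return "faulty", t
        if llr <= lower:
            return "normal", t
    return "undecided", len(scores)

# New segments shifted along the leading principal axis to mimic a fault.
F_new = rng.normal(0, 1, (50, 8)) + mu1 * pca.components_[0]
decision, n = sprt(pca.transform(F_new)[:, 0])
print(decision, "after", n, "samples")
```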
Currently, the entire world is fighting against the coronavirus (COVID-19). More than three million people worldwide had died due to COVID-19 by April 2021. A recent study conducted in China suggests that chest CT and X-ray images can be used as a preliminary test for COVID detection. This paper proposes a transfer learning-based mathematical COVID detection model, which integrates a pre-trained model with the Random Forest Tree (RFT) classifier. As the available COVID dataset is noisy and imbalanced, Principal Component Analysis (PCA) and Generative Adversarial Networks (GANs) are used to extract the most prominent features and to balance the dataset, respectively. The Bayesian cross-entropy loss function is used to penalize false detections differently according to class sensitivity (i.e., a COVID patient should not be classified into the Normal or Pneumonia class). Due to the small dataset, pre-trained models (VGGNet-19, ResNet50 and Inception_ResNet_V2) were chosen to extract features, which were then used to train the RFT for the classification task. The experimental results showed that ResNet50 gives the maximum accuracy of 99.51%, 98.21%, and 97.2% for the training, validation, and testing phases, respectively, and none of the COVID chest X-ray images were classified into the Normal or Pneumonia classes.
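The classification stage can be skeletonized as follows: deep features (here simulated stand-ins; in the paper they come from a pre-trained CNN such as ResNet50) are reduced with PCA and classified with a random forest. The GAN balancing and the Bayesian cross-entropy loss are not reproduced, and all data are synthetic.

```python
# Simulated CNN features -> PCA -> random forest ("RFT") classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n_per_class, n_feat = 120, 2048      # 2048-dim, like ResNet50 pooled features

# Simulated features for the three classes named in the abstract.
X = np.vstack([rng.normal(mu, 1.0, (n_per_class, n_feat))
               for mu in (0.0, 0.4, 0.8)])
y = np.repeat(["COVID", "Normal", "Pneumonia"], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# PCA keeps the most prominent directions of the noisy feature space.
pca = PCA(n_components=50).fit(X_tr)

# Random forest on the reduced features.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(pca.transform(X_tr), y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(pca.transform(X_te))))
```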
Facial expression recognition is one of the most challenging research areas in the image recognition field and has been actively studied since the 1970s. For instance, smile recognition has been studied because the smile is considered an important facial expression in human communication and is therefore likely useful for human-machine interaction. Moreover, if a smile can be detected and its intensity estimated, it will raise the possibility of new applications in the future. The goal is to quantify the emotion at low computational cost and with high accuracy. To this end, we have used a new support vector machine (SVM)-based approach that integrates a weighted combination of local binary patterns (LBP)- and principal component analysis (PCA)-based approaches. Furthermore, we construct this smile detector considering the evolution of the emotion along its natural life cycle. As a consequence, we achieved both low computational cost and high performance with video sequences.
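The detector's core, weighted LBP histograms combined with PCA projections and fed to an SVM, can be sketched as below (requires scikit-image). The stand-in images, the weight value, and all hyperparameters are assumptions for illustration, not the paper's tuned detector.

```python
# Weighted combination of LBP and PCA features classified by an SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(5)
imgs = rng.integers(0, 256, (60, 48, 48)).astype(np.uint8)  # stand-in crops
labels = rng.integers(0, 2, 60)                             # 1=smile, 0=neutral

def lbp_hist(img, P=8, R=1):
    """Uniform LBP histogram of one grayscale image."""
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

F_lbp = np.array([lbp_hist(im) for im in imgs])
F_pca = PCA(n_components=10).fit_transform(imgs.reshape(len(imgs), -1))

# Weighted combination of the two feature views (weight is an assumption).
w = 0.6
F = np.hstack([w * F_lbp, (1 - w) * F_pca])

svm = SVC(kernel="rbf").fit(F, labels)
print("training accuracy:", svm.score(F, labels))
```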
This article describes new control criteria and a robust optimization methodology to balance drilling parameters and machining characteristics. Experimentation was performed according to response surface methodology (RSM) using a TiAlN-coated SiC tool. The full drilling force signal and the cutting parameters tested are categorized into five stages, indicating the different statuses of the drill-workpiece interaction. Principal component analysis (PCA) assigns realistic response priority weights during the aggregation of conflicting characteristics. The hybrid module of combined compromise solution and PCA (CoCoSo-PCA) is used to decide the optimal parametric setting. It efficiently undertakes a trade-off between minimal thrust (Th = 30.02 N), torque (T = 0.05 Nm) and surface roughness (Ra = 1.55 μm). A regression model between the input parameters and the output function was established using an RSM quadratic model. The validation experiment shows significant improvement, and the proposed module can be recommended for controlling quality-productivity characteristics.
With Company2Vec, this paper proposes a novel application of representation learning. The model analyzes business activities from unstructured company website data using Word2Vec and dimensionality reduction. Company2Vec maintains semantic language structures and thus creates efficient company embeddings for fine-grained industries. These semantic embeddings can be used for various applications in banking.
Direct relations between companies and words allow semantic business analytics (e.g., top-n words for a company). Furthermore, industry prediction is presented as a supervised learning application and evaluation method. The vectorized structure of the embeddings allows measuring companies’ similarities with the cosine distance. Company2Vec hence offers a more fine-grained comparison of companies than the standard industry labels (NACE). This property is relevant for unsupervised learning tasks, such as clustering. An alternative industry segmentation is shown with k-means clustering on the company embeddings. Finally, this paper proposes three algorithms for (1) firm-centric, (2) industry-centric and (3) portfolio-centric peer-firm identification.
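Two of the downstream uses named above, cosine-based peer identification and k-means industry segmentation, can be sketched as follows. The embeddings and company names are random placeholders for real Company2Vec vectors, and the cluster count is an illustrative choice.

```python
# Cosine similarity for peer firms and k-means for industry segmentation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(11)
embeddings = rng.normal(0, 1, (100, 300))         # 100 companies, 300-dim
companies = [f"company_{i}" for i in range(100)]  # hypothetical names

# Firm-centric peers: nearest companies by cosine similarity.
sims = cosine_similarity(embeddings[:1], embeddings)[0]
peers = np.argsort(-sims)[1:6]                    # skip the query firm itself
print("peers of", companies[0], "->", [companies[i] for i in peers])

# Alternative industry segmentation via k-means on the embeddings.
k = 8
cluster_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
print("cluster sizes:", np.bincount(cluster_ids))
```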
Human immunodeficiency virus-1 (HIV-1) integrase appears to be a crucial target for developing new anti-HIV-1 therapeutic agents. Different quantitative structure-activity relationship (QSAR) algorithms have been used to develop efficient models for predicting the activity of new pyridinone derivatives against HIV-1 integrase. Multiple linear regression (MLR) and principal component analysis (PCA) combined with MLR have been applied to build QSAR models for a set of new pyridinone derivatives as potent anti-HIV-1 therapeutic agents. Four different approaches based on the MLR method, namely concrete-MLR, stepwise-MLR, concrete PCA-MLR and stepwise PCA-MLR, were utilized for this aim. Twenty-two different sets of descriptors containing 1613 descriptors were constructed for each optimized molecule. Comparison between the predictability of the “concrete” and “stepwise” procedures in the two different algorithms of MLR and PCA-MLR indicated the advantage of the stepwise procedure over the simple concrete method. Although PCA was employed for dimension reduction, the stepwise PCA-MLR model showed a higher ability to predict the compounds’ activity. The stepwise PCA-MLR model showed highly validated statistical results in both the fitting and prediction processes (R²test = 0.78 and Q² = 0.80). Therefore, the stepwise PCA approach is suitable for removing ineffective descriptors, leaving efficient descriptors for building the well-predicting stepwise PCA-MLR model. The stepwise hybrid approach of PCA-MLR may be useful for deriving highly predictive and interpretable QSAR models.
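A minimal sketch of the PCA-MLR idea follows: a large descriptor block is reduced with PCA and multiple linear regression is fitted on the scores, with a cross-validated Q²-style check. The simulated descriptors and activities, the component count, and the omission of the stepwise selection layer are assumptions of the sketch.

```python
# PCA scores as inputs to multiple linear regression (PCA-MLR).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
n_mols, n_desc = 40, 300                    # small set, many descriptors

# Descriptors with a low-rank latent structure and a toy activity.
latent = rng.normal(0, 1, (n_mols, 5))
X = latent @ rng.normal(0, 1, (5, n_desc)) + rng.normal(0, 0.1, (n_mols, n_desc))
y = latent @ rng.normal(1, 0.3, 5) + rng.normal(0, 0.2, n_mols)

# PCA scores as MLR inputs (number of components is an assumption).
scores = PCA(n_components=8).fit_transform(X)
mlr = LinearRegression().fit(scores, y)

# Q2-style cross-validated predictivity on the PCA scores.
q2 = cross_val_score(LinearRegression(), scores, y, cv=5, scoring="r2").mean()
print(f"fit R2 = {mlr.score(scores, y):.2f}, CV Q2 = {q2:.2f}")
```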
Financial crisis early warning is an important link in financial management. With its relatively simple structure and excellent classification performance, the support vector machine is often used for intelligent warning of financial crises. A financial warning model based on an improved support vector machine is proposed to address the shortcomings of current financial warning methods, which cannot handle large-scale data well and whose accuracy and efficiency are not ideal. To deal with the excessive number of indicators in the financial early warning indicator system, the principal component analysis method is studied for dimensionality reduction, reducing the computational complexity of the model and improving its training and operational efficiency. To deal with the limited practicality of the support vector machine, which can only search for the optimal solution under constraint conditions, the use of a smooth support vector machine in place of the standard support vector machine for predictive classification is studied. Finally, a financial crisis early warning model is constructed based on the smooth support vector machine, using principal component analysis for dimensionality reduction of the input data. The performance of the model is tested using an internet listed company as an example. The findings demonstrate that Model 1 outperforms the other three models in terms of accuracy rate (95%) and MAE, MSE and U values (0.162, 0.174 and 0.169, respectively). Therefore, the financial crisis early warning model constructed by this research can properly forecast a company's financial situation, thereby helping enterprises to develop better.
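The pipeline can be skeletonized as below: PCA reduces the indicator system and an SVM classifies crisis versus non-crisis firms. A standard scikit-learn SVC stands in for the paper's smooth support vector machine, and the indicator values and labels are simulated.

```python
# Indicator system -> standardize -> PCA -> SVM (stand-in for smooth SVM).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(21)
n_firms, n_indicators = 300, 30
X = rng.normal(0, 1, (n_firms, n_indicators))       # financial indicators
y = (X[:, :3].sum(axis=1) + rng.normal(0, 0.5, n_firms) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(
    StandardScaler(),               # put indicators on comparable scales
    PCA(n_components=8),            # dimensionality reduction step
    SVC(kernel="rbf"),              # stand-in for the smooth SVM
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```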
Singular spectrum analysis (SSA) is a powerful method that is frequently used in dynamical systems theory and time series analysis. However, the algorithm itself is only partially understood. In this paper, we tackle the problem of a thorough interpretation of the complete basic SSA algorithm. We point out the relationship between SSA and Fourier analysis (FA), linear filtering (LF), principal component analysis (PCA), and power spectral density estimation (PSDE). The main contribution is a reformulation and comprehensive interpretation of SSA by means of a novel class of data-adaptive circulant frames. Based on this class of frames, we introduce a formalism for time series decomposition and reconstruction. This mathematical framework generalizes principal component analysis and SSA. We show that SSA, PCA, FA, LF, and PSDE can be recovered as special cases of the proposed frame structures. In this way we are able to provide a formal description of SSA and its relationship to the aforementioned methods. Whereas previous approaches to the interpretation of SSA relied on approximations, we provide exact mathematical descriptions of the relations between the above-mentioned techniques.
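For readers unfamiliar with the basic SSA algorithm that the paper reinterprets, its embedding and decomposition steps can be summarized in the standard textbook formulation (not the paper's frame-based notation):

```latex
% Basic SSA: given a series x_1, ..., x_N and window length L, set K = N - L + 1.
\[
\mathbf{X} =
\begin{pmatrix}
x_1 & x_2 & \cdots & x_K \\
x_2 & x_3 & \cdots & x_{K+1} \\
\vdots & \vdots & \ddots & \vdots \\
x_L & x_{L+1} & \cdots & x_N
\end{pmatrix},
\qquad
\mathbf{X} = \sum_{i=1}^{d} \sqrt{\lambda_i}\, \mathbf{u}_i \mathbf{v}_i^{\top},
\]
% The trajectory matrix X is Hankel; lambda_1 >= ... >= lambda_d > 0 are the
% nonzero eigenvalues of X X^T, and u_i, v_i are the corresponding left and
% right singular vectors. Grouping the rank-one terms and anti-diagonal
% averaging yield the reconstructed components; the PCA connection arises
% because the u_i are eigenvectors of the lag-covariance structure.
```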
The commercial quality of Japanese Angelica radices, Angelica acutiloba Kitagawa (Yamato-toki) and A. acutiloba Kitagawa var. sugiyama Hikino (Hokkai-toki), used in Kampo traditional herbal medicines, was studied using omics technologies. Complementary and alternative medicine providers have observed in their clinical experience that differences in radix commercial quality are reflected in differences in pharmacological response; however, there has been little scientific examination of this phenomenon. The omics approach, including metabolomics, transcriptomics, genomics, and informatics, revealed a distinction between the radix-quality grades based on their metabolites, gene expression in human subjects, and plant genome sequences. Systems biology, which constructs a network of omics data to analyze this complex system, is expected to be a powerful tool for enhancing the study of radix quality and furthering a comprehensive understanding of all medicinal plants.
Different eigenspace-based approaches have been proposed for the recognition of faces. They differ mostly in the kind of projection method used and in the similarity matching criterion employed. The aim of this paper is to present a comparative study of some of these approaches. This study considers theoretical aspects as well as experiments performed using a face database with a small number of classes (Yale) and also with a large number of classes (FERET).
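A minimal recognition sketch in the spirit of the compared approaches follows: PCA ("eigenfaces") projection followed by nearest-neighbor matching. The random stand-in images, subspace dimension, and Euclidean matching criterion are illustrative assumptions; the paper compares several projection and matching variants on Yale and FERET.

```python
# Eigenface-style PCA projection with nearest-neighbor identification.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(13)
n_classes, per_class, dim = 10, 5, 32 * 32
means = rng.normal(0, 1, (n_classes, dim))
gallery = np.vstack([m + rng.normal(0, 0.3, (per_class, dim)) for m in means])
labels = np.repeat(np.arange(n_classes), per_class)

pca = PCA(n_components=20).fit(gallery)       # eigenface subspace
G = pca.transform(gallery)

def identify(face):
    """Project a probe face and return the label of the nearest gallery image."""
    z = pca.transform(face.reshape(1, -1))
    return labels[np.linalg.norm(G - z, axis=1).argmin()]

probe = means[3] + rng.normal(0, 0.3, dim)    # unseen image of class 3
print("predicted class:", identify(probe))
```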
In this paper, we examine the properties of the Jones polynomial using dimensionality reduction learning techniques combined with ideas from topological data analysis. Our data set consists of more than 10 million knots up to 17 crossings and two other special families up to 2001 crossings. We introduce and describe a method for using filtrations to analyze infinite data sets where representative sampling is impossible or impractical, an essential requirement for working with knots and the data from knot invariants. In particular, this method provides a new approach for analyzing knot invariants using Principal Component Analysis. Applying this approach to the Jones polynomial data, we find that the data can be viewed as occupying an approximately three-dimensional subspace, that this description is surprisingly stable with respect to the filtration by crossing number, and that the results suggest further structures to be examined and understood.
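Under simplifying assumptions, the PCA step can be sketched as below: each polynomial is encoded as a fixed-length coefficient vector aligned by exponent, and the explained-variance spectrum reveals the effective dimension. The random toy coefficients here will not reproduce the low-dimensional structure the paper finds in real Jones data; they only demonstrate the computation.

```python
# PCA on fixed-length polynomial coefficient vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(17)

# Coefficient vectors for exponents in a fixed window, e.g. q^-10..q^10;
# real Jones data would be sparse, integer-valued, and highly structured.
n_knots, span = 1000, 21
coeffs = rng.integers(-3, 4, (n_knots, span)).astype(float)

pca = PCA().fit(coeffs)
explained = np.cumsum(pca.explained_variance_ratio_)

# Effective dimension: components needed to capture 95% of the variance.
# (The paper reports an approximately three-dimensional structure for
# the actual Jones polynomial data.)
print("components for 95% variance:", int(np.searchsorted(explained, 0.95)) + 1)
```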
The objective of this study is to analyze and assess the multi-criteria quality of products by an integrated multivariate approach. The integrated multivariate method is based on data envelopment analysis (DEA), principal component analysis (PCA) and numerical taxonomy (NT). To achieve this objective, 51 product quality indicators were identified. These indicators (inputs and outputs) were classified as direct and indirect product quality indices according to balanced scorecard (BSC) arrangements. To show the applicability of the proposed approach, five random indicators were considered for seventeen workshops within a large machinery manufacturer, and PCA, DEA and NT were applied to the data set. PCA and NT are used to verify and validate the findings of DEA. The rankings produced by the three approaches were then compared using the non-parametric Spearman and Kendall tau correlation techniques. The results of the non-parametric analysis show a high level of correlation between the three approaches. Previous studies evaluate quality characteristics based on a set of selected criteria that does not reflect total quality characteristics, whereas this study proposes a total multi-criteria quality approach to overcome these shortcomings. Moreover, this is the first study to utilize an integrated multivariate approach based on DEA, PCA and numerical taxonomy for the assessment, ranking, verification and validation of industrial units based on multi-criteria quality characteristics: DEA is used for ranking, PCA is used to evaluate the importance of each indicator, and NT is used for validation and verification purposes. The approach of this study may be applied to other manufacturers for total assessment of quality characteristics.
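The final verification step, comparing the rankings produced by DEA, PCA and NT with non-parametric rank correlations, is simple to illustrate. The three ranking vectors below are invented for the seventeen workshops; only the comparison mechanics mirror the study.

```python
# Comparing method rankings with Spearman's rho and Kendall's tau.
import numpy as np
from scipy.stats import spearmanr, kendalltau

rng = np.random.default_rng(8)
dea_rank = rng.permutation(17)

pca_rank = dea_rank.copy()
rng.shuffle(pca_rank[:4])     # PCA ranking mostly agrees with DEA (simulated)

nt_rank = dea_rank.copy()
rng.shuffle(nt_rank[5:9])     # NT ranking mostly agrees with DEA (simulated)

for name, other in [("PCA", pca_rank), ("NT", nt_rank)]:
    rho, _ = spearmanr(dea_rank, other)
    tau, _ = kendalltau(dea_rank, other)
    print(f"DEA vs {name}: Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```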