Clothing styles and consumer demands are becoming increasingly diverse, and traditional fabric design methods struggle to keep up with the resulting need for rapid iteration and innovation because they are time-consuming, costly, and dependent on designers' subjective preferences. We therefore propose a fabric design optimization and simulation strategy that uses technological means to overcome these bottlenecks. The strategy integrates a variational autoencoder (VAE) with a generative adversarial network (GAN). First, the VAE captures and learns the complex distribution of existing clothing fabric designs, including key information such as fabric texture, color matching, and structural details. The GAN then takes the latent vectors produced by the VAE as input to generate new fabric design samples. During training, the GAN's generator and discriminator are optimized iteratively in an adversarial game: the generator tries to create samples that are as close as possible to real fabric designs, while the discriminator tries to distinguish real samples from generated ones. Both networks are updated by backpropagation (BP) of their loss functions, so that the generated fabric designs visually resemble real fabrics. Experiments show that the method not only generates high-quality, realistic clothing fabric designs but also substantially shortens the design cycle and reduces costs.
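The VAE stage described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the encoder itself is not shown, its outputs `mu` and `logvar` are hypothetical numbers, and the GAN generator that would consume `z` is omitted. The sketch shows the reparameterization trick used to draw a latent vector and the closed-form KL term that regularizes the VAE's latent distribution.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), the VAE regularization term."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# Hypothetical encoder output for one fabric image, using a 4-dimensional latent space.
mu = [0.3, -1.2, 0.0, 0.8]
logvar = [-0.5, 0.1, 0.0, -1.0]
z = reparameterize(mu, logvar, random.Random(42))  # latent vector that would be handed to the GAN generator
kl = kl_to_standard_normal(mu, logvar)
```

Sampling through `mu + sigma * eps` keeps the randomness in `eps`, so gradients can flow back into the encoder outputs during training.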
This paper presents a novel approach for clustering English verb forms (denoted VBs), a task crucial for advancing linguistic analysis and natural language processing (NLP). Our method uses a variational autoencoder (VAE) to learn low-dimensional semantic representations of verbs from high-dimensional text data drawn from large English corpora. The VAE consists of an encoder that maps the text data into a latent space and a decoder that reconstructs the input, which encourages the extracted features to be meaningful. An improved density peak clustering (DPC) algorithm is then applied to these representations; it automatically determines the number and centers of clusters based on semantic attributes. Comprehensive experiments show that our method outperforms traditional clustering methods on several metrics, including the silhouette coefficient and the adjusted Rand index. This study not only offers a robust technique for verb form clustering but also contributes to computational linguistics by providing a more nuanced understanding of verb semantics.
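As a reference point for the clustering step, the following is a minimal sketch of the standard density peak clustering algorithm of Rodriguez and Laio, not the improved DPC proposed here. It uses a cutoff kernel for the local density and takes the number of clusters as an explicit argument, whereas the improved variant determines it automatically. The function name `density_peak_clustering` and the toy 2-D points (stand-ins for VAE latent vectors) are hypothetical.

```python
import math


def density_peak_clustering(points, d_c, n_clusters):
    """Basic density peak clustering with a cutoff kernel; returns (labels, centre indices)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n           # distance to the nearest denser point
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the globally densest point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_denser[i] = dist[i][j], j
    # Centres: the densest point plus the points with the largest gamma = rho * delta.
    by_gamma = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + [i for i in by_gamma if i != order[0]][: n_clusters - 1]
    labels = [-1] * n
    for c, idx in enumerate(centres):
        labels[idx] = c
    # Every other point, in decreasing density, inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centres


blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
blob_b = [(x + 10, y + 10) for x, y in blob_a]
labels, centres = density_peak_clustering(blob_a + blob_b, d_c=2.0, n_clusters=2)
```

Cluster centres are points that are both dense and far from any denser point, which is why they stand out on a rho-versus-delta "decision graph".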
Collecting high-quality medical image data for machine learning applications remains a significant challenge due to data scarcity, privacy concerns, and high annotation costs. To address these issues, vision generative models, particularly Latent Diffusion Models (LDMs), have emerged as state-of-the-art solutions that reduce computational demands while maintaining superior performance in data generation tasks. In this study, we propose an enhanced LDM-based approach that integrates separable self-attention mechanisms within the diffusion process, positioned after residual blocks, to improve the capture of detailed features and maintain spatial consistency. This modification reduces memory usage by 82.94% and decreases the Fréchet Inception Distance (FID) by 25.01% compared to traditional self-attention models, all while preserving image quality. Our method addresses critical challenges such as data scarcity and computational efficiency in medical imaging by combining Variational Autoencoders (VAEs) for latent space mapping with U-Net for noise prediction. Evaluations on five datasets — PneumoniaMNIST, BloodMNIST, ChestMNIST, Dental4k, and HandMNIST — demonstrate significant improvements in computational efficiency, memory usage, and the quality of generated images, showcasing the potential of our approach for scalable and effective medical image synthesis.
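The abstract does not define its separable self-attention further; the sketch below assumes the linear-complexity formulation popularized by MobileViTv2, in which each token receives one scalar context score instead of a row of an N x N attention matrix, so memory grows linearly with the number of tokens. The weight names `w_i`, `w_k`, `w_v` are hypothetical, the final output projection is omitted, and plain lists stand in for tensors.

```python
import math


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def separable_self_attention(tokens, w_i, w_k, w_v):
    """Attention over N tokens of width d without forming an N x N score matrix."""
    # 1) One scalar context score per token, normalized over all tokens.
    scores = softmax([sum(a * b for a, b in zip(w_i, t)) for t in tokens])
    # 2) Context vector: score-weighted sum of the key projections.
    keys = [matvec(w_k, t) for t in tokens]
    context = [sum(s * k[j] for s, k in zip(scores, keys)) for j in range(len(keys[0]))]
    # 3) Broadcast the context vector onto the ReLU-activated value projection of every token.
    out = []
    for t in tokens:
        value = [max(0.0, x) for x in matvec(w_v, t)]
        out.append([v * c for v, c in zip(value, context)])
    return out, scores


eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tokens = [[1.0, 2.0, 3.0]] * 5
out, scores = separable_self_attention(tokens, [0.1, 0.2, 0.3], eye, eye)
```

Only N scores and one d-dimensional context vector are stored, compared with N x N weights in standard self-attention.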
Machine learning techniques can help represent and solve quantum systems. Learning the measurement outcome distribution of a quantum ansatz is useful for characterizing near-term quantum computing devices. In this work, we use a popular unsupervised machine learning model, the variational autoencoder (VAE), to reconstruct the measurement outcome distribution of a quantum ansatz. We compare the number of parameters in the VAE with the number of possible measurement outcomes. Numerical results show that a VAE can learn the measurement outcome distribution efficiently with few parameters. We also reveal how entanglement influences the task.
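To make the comparison between VAE parameters and measurement outcomes concrete, the sketch below counts the trainable parameters of a hypothetical VAE with a single hidden layer in both encoder and decoder and compares the count with the 2^n possible bitstrings of an n-qubit measurement. The layer sizes (`hidden=16`, `latent=2`) and the one-logit-per-qubit decoder output are assumptions, not the architecture used in the paper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def vae_param_count(n_qubits, hidden, latent):
    """Parameters of a hypothetical one-hidden-layer VAE over n-qubit bitstrings."""
    # Encoder: bitstring -> hidden -> (mean, log-variance), each of size `latent`
    encoder = dense_params(n_qubits, hidden) + dense_params(hidden, 2 * latent)
    # Decoder: latent -> hidden -> one Bernoulli logit per qubit
    decoder = dense_params(latent, hidden) + dense_params(hidden, n_qubits)
    return encoder + decoder


for n in (4, 8, 12, 16):
    print(f"{n} qubits: {vae_param_count(n, hidden=16, latent=2)} parameters vs {2 ** n} outcomes")
```

With these sizes the parameter count grows linearly in the number of qubits (462 parameters for 10 qubits versus 1,024 outcomes), while the number of outcomes grows exponentially.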
In recent years, semi-supervised learning has been investigated as a way to take full advantage of the growing amount of unlabeled data. Although deep learning models pretrained on massive amounts of unlabeled data have been adopted successfully, they may not be applicable in specific domains where data are limited. In this paper, we propose the Semi-supervised Variational AutoEncoder (SVAE), which uses Gated Convolutional Neural Networks (GCNN) as both the encoder and the decoder. Because the canonical VAE suffers from the Kullback–Leibler (KL) vanishing problem, we attach a layer named Scalar after Batch Normalization (BN) to scale the BN output. We conduct experiments on two domain-specific datasets with small amounts of data. The results show that SVAE outperforms alternative baselines in both language modeling and semi-supervised learning. In particular, the language modeling results validate the effect of combining BN and Scalar to tackle KL vanishing. Moreover, visualization of the latent representations confirms that SVAE performs well with limited data.
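One way to see why batch normalization can counter KL vanishing: for a diagonal Gaussian posterior, the KL term per dimension is 0.5 * (mu^2 + sigma^2 - log sigma^2 - 1), and because sigma^2 - log sigma^2 - 1 >= 0 it is at least 0.5 * mu^2. If a batch's posterior means are batch-normalized and then multiplied by a fixed scale gamma, their mean square is almost exactly gamma^2, so the batch-averaged KL cannot collapse below about 0.5 * gamma^2 per dimension. The sketch below checks this numerically. Modelling the Scalar layer as a fixed `gamma` with zero shift is an assumption about the paper's design, and the random inputs are synthetic.

```python
import math
import random


def scaled_batch_norm(values, gamma, eps=1e-5):
    """Batch-normalize one latent dimension's posterior means, then multiply by a fixed scale gamma."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) for v in values]


def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ) for a single latent dimension."""
    return 0.5 * (mu * mu + math.exp(logvar) - logvar - 1.0)


rng = random.Random(0)
raw_mu = [rng.gauss(0.0, 0.1) for _ in range(64)]   # near-collapsed posterior means
logvar = [rng.gauss(0.0, 0.1) for _ in range(64)]
gamma = 0.5

raw_kl = sum(gaussian_kl(m, lv) for m, lv in zip(raw_mu, logvar)) / len(raw_mu)
mu = scaled_batch_norm(raw_mu, gamma)
avg_kl = sum(gaussian_kl(m, lv) for m, lv in zip(mu, logvar)) / len(mu)
# exp(s) - s - 1 >= 0, so avg_kl >= 0.5 * mean(mu^2) = 0.5 * gamma^2 * var / (var + eps)
```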
When people struggle to meet daily demands, many experience an emotional pressure known as stress. Short-term stress can be beneficial to mental health, but stress that persists for a long time may lead to serious health problems such as high blood pressure, cardiovascular disease, and stroke. Long-term stress that goes unidentified and untreated may also result in personality disorders, depression, and anxiety. Early detection of stress has therefore become important for preventing stress-related health issues. Detecting stress from brain signals to analyse human emotion leads to accurate detection outcomes. EEG-based detection systems use brain waves to identify diseases, disabilities, and disorders. Sentiment Analysis (SA) helps identify emotions and mental stress in the human brain, so a system that accurately and precisely detects depression from a person's emotions using SA is highly needed. This paper develops a reliable and precise Emotion and Stress Recognition (ESR) system that detects depression in real time using deep learning and Electroencephalography (EEG) signal-based SA. First, the data needed for stress and emotion detection are gathered from benchmark databases. Next, the gathered EEG signals are pre-processed, for example by removing artifacts. Spectral attributes are then extracted from the pre-processed signals and form the first feature set. A Conditional Variational Autoencoder (CVA) then extracts deep features from the pre-processed signals, forming the second feature set.
The fusion weights are optimised with the Adaptive Egret Swarm Optimisation Algorithm (AESOA) to obtain weighted fused features from the two extracted feature sets. A Cascaded Deep Temporal Convolution Network with Attention Mechanism (CDTCN-AM) then recognises stress and emotion. The developed stress and emotion recognition approach is validated against traditional models to demonstrate its effectiveness.
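The spectral feature set and the weighted fusion step can be sketched as follows. This is an illustrative assumption rather than the paper's pipeline: the band edges are conventional EEG bands, the DFT is a naive one, the "deep" features are placeholder numbers standing in for CVA outputs, and the fusion weights are fixed values where the paper would obtain them from AESOA. `band_powers` and `weighted_fusion` are hypothetical helper names.

```python
import math

# Conventional EEG frequency bands in Hz (an assumption; the paper does not list its bands).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}


def band_powers(signal, fs, bands=BANDS):
    """Sum the DFT power of a real signal inside each frequency band (naive O(n^2) DFT)."""
    n = len(signal)
    powers = {name: 0.0 for name in bands}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi:
                powers[name] += (re * re + im * im) / n
    return powers


def weighted_fusion(spectral, deep, w_spectral, w_deep):
    """Concatenate two feature sets after scaling each by its weight (weights would come from an optimiser)."""
    return [w_spectral * x for x in spectral] + [w_deep * x for x in deep]


fs = 128
eeg = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]  # synthetic 10 Hz (alpha-band) test signal
powers = band_powers(eeg, fs)
fused = weighted_fusion(list(powers.values()), [0.2, -0.7, 1.1], w_spectral=0.6, w_deep=0.4)  # placeholder deep features
```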
Circular RNAs (circRNAs) are endogenous non-coding RNAs with a covalently closed loop structure. They have many biological functions, mainly regulatory ones, and have been shown to modulate protein-coding genes in the human genome. CircRNAs are linked to various diseases such as Alzheimer’s disease, diabetes, atherosclerosis, Parkinson’s disease, and cancer. Identifying associations between circRNAs and diseases is therefore essential for disease diagnosis, prevention, and treatment. The proposed model, variational autoencoder and genetic algorithm circular RNA-disease association (VAGA-CDA), predicts novel circRNA-disease associations. First, the experimentally verified circRNA-disease associations are augmented with the synthetic minority oversampling technique (SMOTE) and regenerated using a variational autoencoder, which effectively extracts features from the augmented samples. A genetic algorithm (GA) then performs feature selection on these vectors, effectively reducing their dimensionality. The resulting feature vectors are given to a Random Forest classifier to predict new circRNA-disease associations. The proposed model achieves AUC values of 0.9644 and 0.9628 under 5-fold and 10-fold cross-validation, respectively, and case studies indicate its robustness.
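SMOTE augments the minority class by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal version of that idea, assuming each known circRNA-disease association is already represented as a numeric feature vector; the function name `smote`, its defaults, and the toy points are hypothetical, and production code would more likely use an established implementation.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating towards random near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # k nearest minority-class neighbours of x (excluding x itself)
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment from x to nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_samples = smote(minority, n_new=10, k=2)
```

Every synthetic point lies on a segment between two existing minority points, so the toy samples above stay on the line y = x.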
Quantum state tomography (QST) is an important and challenging task in quantum information that has attracted considerable attention in recent years. Machine learning models trained on measurement outcomes can provide a classical representation of a quantum state and are among the effective techniques for solving the QST problem. In this work, we use a variational autoencoder (VAE) to learn the measurement distributions of two quantum states generated by MPS circuits. We first consider the Greenberger–Horne–Zeilinger (GHZ) state, which can be generated by a simple MPS circuit. Simulation results show that a VAE can reconstruct 3- to 8-qubit GHZ states with high fidelity (0.99) and is robust to depolarizing noise. The minimum number of training samples $N_s^*$ required to reconstruct the GHZ state to 0.99 fidelity scales approximately linearly with the number of qubits $N$. However, for the quantum state generated by a complex MPS circuit, $N_s^*$ increases exponentially with $N$, especially for states with high entanglement entropy.
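The GHZ case can be illustrated with a small calculation. In the computational basis, an n-qubit GHZ state yields all-zeros or all-ones with probability 1/2 each, and global depolarizing noise mixes this distribution with the uniform one. The sketch below compares distributions using the classical (Bhattacharyya) fidelity; this is an assumption made for illustration, since the fidelity reported above is presumably the quantum state fidelity, which also depends on phases that a single computational-basis distribution cannot capture.

```python
import math
from itertools import product


def ghz_distribution(n):
    """Computational-basis outcome probabilities of the n-qubit GHZ state (|0...0> + |1...1>)/sqrt(2)."""
    return {bits: (0.5 if len(set(bits)) == 1 else 0.0) for bits in product("01", repeat=n)}


def depolarize(dist, lam):
    """Global depolarizing noise: mix the distribution with the uniform one with weight lam."""
    uniform = 1.0 / len(dist)
    return {b: (1.0 - lam) * p + lam * uniform for b, p in dist.items()}


def classical_fidelity(p, q):
    """Bhattacharyya (classical) fidelity (sum_x sqrt(p_x q_x))^2 between two outcome distributions."""
    return sum(math.sqrt(p[b] * q.get(b, 0.0)) for b in p) ** 2


ghz = ghz_distribution(3)
noisy = depolarize(ghz, 0.1)
```

For three qubits and depolarizing strength 0.1, the all-zeros and all-ones outcomes each drop to probability 0.4625, and the classical fidelity works out to 0.925.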
Classification problems in which the number of features (dimensions) greatly exceeds the number of samples (observations) are an important research and application area in many domains, especially computational biology. This setting is known as the high-dimensional small-sample-size (HDSSS) problem. Various dimensionality reduction methods have been developed, but they perform poorly on small-sample-size, high-dimensional datasets and suffer from overfitting and high-variance gradients. To overcome these limitations of sample size and dimensionality, this study employs the variational autoencoder (VAE), which has emerged in recent years as a prominent framework for unsupervised learning. The objective of this study is to develop a reliable classification model with minimal error for high-dimensional, small-sample-size datasets. It also evaluates the strength of different VAE architectures on HDSSS datasets. In the experiments, six genomic microarray datasets from the Kent Ridge Biomedical Dataset Repository were selected, and several choices of dimensions (features) were applied during data preprocessing. To evaluate classification accuracy and find a stable, suitable classifier, nine state-of-the-art classifiers that have been successful in high-dimensional settings were selected. The experimental results demonstrate that the VAE can outperform traditional methods such as PCA, fastICA, FA, NMF, and LDA in terms of accuracy and AUROC.
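AUROC, one of the two metrics used above, can be computed directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below is a minimal, quadratic-time version for binary labels; the function name is hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

For labels [1, 1, 0, 0] and scores [0.9, 0.4, 0.6, 0.1], three of the four positive-negative pairs are ordered correctly, giving 0.75.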
Molecular dynamics (MD) simulations have been used extensively to study protein dynamics and, through them, protein functions. However, MD simulations often cannot explore enough of the conformational space relevant to protein function within accessible timescales. Many enhanced sampling methods, including methods based on variational autoencoders (VAEs), have therefore been developed to address this issue. This study evaluates the feasibility of using VAEs to assist in exploring protein conformational landscapes. Using three model systems, we show that a VAE can capture high-level hidden information that distinguishes protein conformations. These models can also generate new, physically plausible protein conformations for direct sampling in favorable regions of conformational space. We also found that VAEs work better for interpolation than for extrapolation, and that increasing the latent space dimension introduces a trade-off between performance and complexity.
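The interpolation/extrapolation distinction can be made concrete in latent space. The sketch below walks along the line between two latent codes, where 0 <= t <= 1 interpolates and other values of t extrapolate, and flags points that leave the per-dimension range of the training codes. That bounding-box criterion, the function names, and the 2-D codes are hypothetical simplifications; decoding each point back into a protein conformation would require the trained decoder, which is not shown.

```python
def latent_path(z_a, z_b, ts):
    """Points z(t) = (1 - t) * z_a + t * z_b; 0 <= t <= 1 interpolates, any other t extrapolates."""
    return [[(1 - t) * a + t * b for a, b in zip(z_a, z_b)] for t in ts]


def inside_training_box(z, training_codes):
    """Hypothetical check: does z lie within the per-dimension range of the training latent codes?"""
    for d, value in enumerate(z):
        column = [code[d] for code in training_codes]
        if not min(column) <= value <= max(column):
            return False
    return True


training_codes = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]
path = latent_path([0.0, 0.0], [2.0, 1.0], ts=[0.0, 0.5, 1.0, 1.5])
flags = [inside_training_box(z, training_codes) for z in path]
```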
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for profiling the transcriptomes of large numbers of individual cells at high resolution. These data usually contain gene expression measurements for many genes in thousands or tens of thousands of cells, and some datasets now reach the million-cell mark. Projecting high-dimensional scRNA-seq data into a low-dimensional space aids downstream analysis and data visualization. Many recent preprints accomplish this using variational autoencoders (VAEs), generative models that learn the underlying structure of data by compressing it into a constrained, low-dimensional space. The low-dimensional spaces generated by VAEs have revealed complex patterns and novel biological signals in large-scale gene expression data and in drug response predictions. Here, we evaluate Tybalt, a simple VAE approach for gene expression data, by training it on simulated scRNA-seq datasets and measuring its performance. We find a number of counter-intuitive performance features; for example, under some parameter configurations, deeper neural networks can struggle when datasets contain more observations. We show that these methods are highly sensitive to parameter tuning: when tuned, the Tybalt model, which was not optimized for scRNA-seq data, outperforms other popular dimension reduction approaches (PCA, ZIFA, UMAP, and t-SNE). Without tuning, however, its performance can be remarkably poor on the same data. Our results should discourage authors and reviewers from relying, at present, on self-reported performance comparisons to evaluate the relative value of contributions in this area.
Instead, because unequal parameter tuning can produce such large performance differences, we recommend that comparisons or benchmarks of autoencoder methods for scRNA-seq data be performed either by disinterested third parties or by method developers using only unseen benchmark data provided to all participants simultaneously.
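The concern about unequal tuning can be illustrated with a simple protocol in which every method receives the same random-search budget and seed before its validation score is compared. This is a hypothetical sketch, not a procedure from the paper: `train_and_score`, the search spaces, and the function names are placeholders, and a final comparison would still be run once on unseen benchmark data.

```python
import random


def tune_with_budget(train_and_score, search_space, budget, seed=0):
    """Random search over a discrete space with a fixed number of trials; returns (best_score, best_params)."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(budget):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_score(params)  # should be a validation score, never the final benchmark
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def compare_with_equal_budget(methods, budget, seed=0):
    """methods maps a name to (train_and_score, search_space); every method gets the same budget and seed."""
    return {name: tune_with_budget(fn, space, budget, seed)[0] for name, (fn, space) in methods.items()}


# Toy stand-ins for two dimension-reduction methods (hypothetical objectives).
methods = {
    "method_a": (lambda p: -abs(p["latent_dim"] - 8) - 0.1 * p["layers"],
                 {"latent_dim": [2, 4, 8, 16], "layers": [1, 2, 3]}),
    "method_b": (lambda p: -abs(p["n_components"] - 10), {"n_components": [5, 10, 20]}),
}
results = compare_with_equal_budget(methods, budget=20)
```

Because the generator is seeded identically, a larger budget replays the trials of a smaller one, so extra tuning can only raise a method's best score; unequal budgets therefore bias a comparison toward the better-tuned method.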
When analyzing biological data, it can be helpful to consider gene sets, or predefined groups of biologically related genes. Methods exist for identifying gene sets that are differential between conditions, but large public datasets from consortium projects and single-cell RNA-Sequencing have opened the door for gene set analysis using more sophisticated machine learning techniques, such as autoencoders and variational autoencoders. We present shallow sparsely-connected autoencoders (SSCAs) and variational autoencoders (SSCVAs) as tools for projecting gene-level data onto gene sets. We tested these approaches on single-cell RNA-Sequencing data from blood cells and on RNA-Sequencing data from breast cancer patients. Both SSCA and SSCVA can recover known biological features from these datasets and the SSCVA method often outperforms SSCA (and six existing gene set scoring algorithms) on classification and prediction tasks.
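The core idea of a shallow sparsely-connected autoencoder, as described above, is that each latent node corresponds to one gene set and receives input only from that set's genes. A minimal sketch of the masked encoding step follows; the gene names, gene sets, and weights are hypothetical, and the activation function, decoder, and (for SSCVA) the variational sampling are omitted.

```python
def gene_set_mask(genes, gene_sets):
    """mask[s][g] is 1.0 if gene g belongs to gene set s, else 0.0 (one row per gene set)."""
    return [[1.0 if g in members else 0.0 for g in genes] for members in gene_sets.values()]


def sparse_encode(expression, weights, mask):
    """Masked linear encoder: each gene-set node sums only the weighted expression of its own genes."""
    return [
        sum(w * m * x for w, m, x in zip(w_row, m_row, expression))
        for w_row, m_row in zip(weights, mask)
    ]


genes = ["GENE_A", "GENE_B", "GENE_C"]          # hypothetical gene identifiers
gene_sets = {"set_1": {"GENE_A", "GENE_B"}, "set_2": {"GENE_C"}}
mask = gene_set_mask(genes, gene_sets)
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]    # dense weights; the mask zeroes out non-members
scores = sparse_encode([1.0, 2.0, 3.0], weights, mask)
```

Because the mask removes every connection between a node and genes outside its set, each latent value can be read directly as a score for one gene set.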
Anonymized electronic health records (EHR) are often used for biomedical research. One persistent concern with this type of research is the risk that patients may be re-identified from their purportedly anonymized data. Here, we use the EHR of 731,850 de-identified patients to demonstrate that the average patient is unique from all others 98.4% of the time simply on the basis of which laboratory tests have been ordered for them. By the time a patient has visited the hospital on two separate days, they are unique in 72.3% of cases. We further present a computational study of how accurately the records from a single day of care can be used to re-identify a patient among a set of 99 other patients. We show that, given a single visit’s laboratory orders for a patient (even without result values), we can re-identify the patient at least 25% of the time, and we can place the patient among the top 10 most similar patients 47% of the time. Finally, we present a proof-of-concept technique that uses a variational autoencoder to encode laboratory results into a lower-dimensional latent space. We demonstrate that releasing latent-space-encoded laboratory orders significantly improves privacy compared with releasing raw laboratory orders (<5% re-identification), while preserving the information contained in the laboratory orders (AUC >0.9 for recreating encoded values). Our findings have potential consequences for the public release of anonymized laboratory tests to the biomedical research community. We note that our findings do not imply that laboratory tests alone are personally identifiable. In the attack scenario presented here, re-identification would require a threat actor to possess, at the start, an external source of laboratory values linked to personal identifiers.
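The uniqueness measurement and the similarity-based attack can be sketched with set operations over each patient's ordered laboratory tests. In the sketch below, the patients and test codes are hypothetical, "unique" means that no other patient has exactly the same set of tests, and candidates are ranked by Jaccard similarity; the abstract does not state which similarity measure was used, so that choice is an assumption.

```python
from collections import Counter


def unique_fraction(records):
    """Fraction of patients whose set of ordered laboratory tests matches no other patient's set."""
    keys = [frozenset(tests) for tests in records.values()]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def rank_candidates(query, candidates):
    """Order candidate patients by Jaccard similarity of their test sets to the query visit."""
    return sorted(candidates, key=lambda pid: jaccard(query, candidates[pid]), reverse=True)


records = {  # hypothetical patients and laboratory test codes
    "p1": {"CBC", "BMP"},
    "p2": {"BMP", "CBC"},
    "p3": {"TSH"},
    "p4": {"CBC"},
}
```

Here p1 and p2 share an identical test set, so only p3 and p4 are unique, giving a uniqueness fraction of 0.5.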