MALDI-based Imaging Mass Spectrometry (IMS) is an analytical technique that provides the opportunity to study the spatial distribution of biomolecules including proteins and peptides in organic tissue. IMS measures a large collection of mass spectra spread out over an organic tissue section and retains the absolute spatial location of these measurements for analysis and imaging. The classical approach to IMS imaging, producing univariate ion images, is not well suited as a first step in a prospective study where no a priori molecular target mass can be formulated. The main reasons for this are the size and the multivariate nature of IMS data. In this paper we describe the use of principal component analysis as a multivariate pre-analysis tool, to identify the major spatial and mass-related trends in the data and to guide further analysis downstream. First, a conceptual overview of principal component analysis for IMS is given. Then, we demonstrate the approach on an IMS data set collected from a transversal section of the spinal cord of a standard control rat.
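The pre-analysis idea in this abstract can be illustrated with a minimal sketch, assuming the IMS measurements are stored as a NumPy array of shape (rows, cols, m/z bins). The function name `pca_ims`, the array layout, and the synthetic data are assumptions made for illustration and are not taken from the paper. Each principal component yields a score image (a spatial trend) and a loading vector (the mass-related trend that drives it).

```python
import numpy as np

def pca_ims(cube, n_components=3):
    """PCA on a hypothetical IMS data cube of shape (rows, cols, n_mz_bins).

    Returns (score_images, loadings, explained_ratio):
      score_images: (n_components, rows, cols), one spatial map per component
      loadings:     (n_components, n_mz_bins), one pseudo-spectrum per component
      explained_ratio: fraction of total variance captured by each component
    """
    rows, cols, n_mz = cube.shape
    # Unfold the cube: one row per pixel, one column per m/z bin (astype copies).
    X = cube.reshape(rows * cols, n_mz).astype(float)
    X -= X.mean(axis=0)  # mean-center every m/z bin
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    # Fold the scores back onto the tissue grid.
    score_images = scores.T.reshape(n_components, rows, cols)
    return score_images, Vt[:n_components], explained[:n_components]

# Synthetic stand-in for a tissue section: two regions with different peaks.
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 0.01, size=(4, 6, 5))
cube[:, :3, 1] += 1.0  # left half carries a peak in m/z bin 1
cube[:, 3:, 3] += 1.0  # right half carries a peak in m/z bin 3
images, loadings, explained = pca_ims(cube, n_components=2)
```

On this toy cube the first component separates the two regions spatially, and its loading vector points at the two m/z bins responsible, which is the kind of guidance a pre-analysis step is meant to give before univariate ion images are drawn.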
Brain imaging and protein expression, from both cerebrospinal fluid and blood plasma, have been found to provide complementary information in predicting the clinical outcomes of Alzheimer’s disease (AD). However, the underlying associations that contribute to this complementary relationship have not yet been studied. In this work, we perform an imaging proteomics association analysis to explore how the two are related to each other. Because traditional association models, such as Sparse Canonical Correlation Analysis (SCCA), cannot guarantee the selection of only disease-relevant biomarkers and associations, we propose a novel discriminative SCCA (denoted as DSCCA) model with new penalty terms to account for the disease status information. Given brain imaging, proteomic and diagnostic data, the proposed model can perform a joint association and multi-class discrimination analysis, such that we can not only identify disease-relevant multimodal biomarkers, but also reveal strong associations between them. Based on a real imaging proteomic data set, the empirical results show that DSCCA and traditional SCCA have comparable association performances. In a further classification analysis, however, canonical variables of imaging and proteomic data obtained in DSCCA demonstrate much more discrimination power toward multiple pairs of diagnosis groups than those obtained in SCCA.
Identifying reproducible yet relevant features is a major challenge in biological research. This is well documented in genomics data. Using a proposed set of three reliability benchmarks, we find that this issue also exists in proteomics for commonly used feature-selection methods, e.g. t-test and recursive feature elimination. Moreover, due to high test variability, selecting the top proteins based on p-value ranks, even when restricted to high-abundance proteins, does not improve reproducibility. Statistical testing based on networks is believed to be more robust, but this does not always hold true: the commonly used hypergeometric enrichment that tests for enrichment of protein subnets performs abysmally due to its dependence on unstable protein pre-selection steps. We demonstrate here for the first time the utility of a novel suite of network-based algorithms called ranked-based network algorithms (RBNAs) on proteomics. These were originally introduced and tested extensively on genomics data. We show here that they are highly stable, reproducible and select relevant features when applied to proteomics data. It is also evident from these results that statistical feature testing on protein expression data should be used with due caution. Careless use of networks does not resolve poor-performance issues, and can even mislead. We recommend augmenting statistical feature-selection methods with concurrent analysis of stability and reproducibility to improve the quality of the selected features prior to experimental validation.
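To make the dependence on the pre-selection step concrete, the following is a minimal sketch of a one-sided hypergeometric enrichment test. The function name, its parameters, and the example numbers are illustrative assumptions and are not taken from the paper; only the standard hypergeometric tail formula is used.

```python
from math import comb

def hypergeom_enrichment_p(population, subnet_size, selected, overlap):
    """One-sided enrichment p-value P(X >= overlap).

    population:  number of proteins measured in total
    subnet_size: how many of them belong to the subnet under test
    selected:    how many proteins passed the pre-selection step
    overlap:     how many selected proteins fall inside the subnet
    """
    upper = min(subnet_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(subnet_size, k) * comb(population - subnet_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: the same subnet scored after two pre-selection runs
# that differ by a single protein landing inside the subnet.
p_three = hypergeom_enrichment_p(1000, 20, 50, 3)
p_four = hypergeom_enrichment_p(1000, 20, 50, 4)
```

Because the p-value is a function of the overlap count alone once the other sizes are fixed, any instability in which proteins pass the pre-selection step carries straight through to the enrichment result.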
Traditional Chinese medicine (TCM), an alternative medicine, focuses on the treatment of human disease via the integrity of the close relationship between body and syndrome analysis. It remains a form of primary care in most Asian countries and its characteristics showcase the great advantages of personalized medicine. Although this approach to disease diagnosis, prognosis and treatment has served the medical establishment well for thousands of years, it has serious shortcomings in the era of modern medicine that stem from its reliance on reductionist principles of experimentation and analysis. In this way, systems biology offers the potential to personalize medicine, facilitating the provision of the right care to the right patient at the right time. We expect that systems biology will have a major impact on future personalized therapeutic approaches which herald the future of medicine. Here we summarize current trends and critically review the potential limitations and future prospects of such treatments. Some characteristic examples are presented to highlight the application of this groundbreaking platform to personalized TCM as well as some of the necessary milestones for moving systems biology of a state-of-the-art nature into mainstream health care.
The next step in the understanding of the genome organization, after the determination of complete sequences, involves proteomics. The proteome includes the whole set of protein-protein interactions, and two recent independent studies have shown that its topology displays a number of surprising features shared by other complex networks, both natural and artificial. In order to understand the origins of this topology and its evolutionary implications, we present a simple model of proteome evolution that is able to reproduce many of the observed statistical regularities reported from the analysis of the yeast proteome. Our results suggest that the observed patterns can be explained by a process of gene duplication and diversification that would evolve proteome networks under a selection pressure, favoring robustness against failure of its individual components.
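A process of gene duplication and diversification can be sketched as a small network-growth simulation. This is a generic duplication-divergence toy model and not necessarily the authors' exact formulation; the function name and the parameters `delta` (probability of losing an inherited interaction) and `alpha` (rate of gaining new interactions) are assumptions chosen for illustration.

```python
import random

def grow_network(n_final, delta=0.5, alpha=0.1, seed=0):
    """Grow an undirected interaction network by duplication and divergence.

    Returns an adjacency dict {protein_id: set(partner_ids)}.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed network: two interacting proteins
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.randrange(new)  # gene chosen for duplication
        # Divergence: each interaction inherited from the parent is lost
        # with probability delta.
        kept = {p for p in adj[parent] if rng.random() >= delta}
        # Rare gain of interactions with any existing protein,
        # each with probability alpha / N.
        gained = {p for p in range(new) if rng.random() < alpha / new}
        adj[new] = set()
        for partner in kept | gained:
            adj[new].add(partner)
            adj[partner].add(new)
    return adj

network = grow_network(200)
degrees = sorted((len(partners) for partners in network.values()), reverse=True)
```

Inspecting `degrees` for different values of `delta` and `alpha` is one way to explore how such a growth rule shapes the connectivity distribution of the resulting network.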
Proteomic challenges, stirred up by the advent of high-throughput technologies, produce large amounts of MS data. Routine manual searching can no longer keep up with the pace of modern science. In our work, the need for single-thread analysis of bulky data emerged during interpretation of HepG2 proteome profiling results in a search for proteoforms. We compared the contribution of each of the eight search engines (X!Tandem, MS-GF+, MS Amanda, MyriMatch, Comet, Tide, Andromeda, and OMSSA) integrated in the open-source graphical user interface SearchGUI (http://searchgui.googlecode.com) to the total result of proteoform identification, and optimized the set of engines working simultaneously. We also compared the results of our search combination with Mascot results using the UPS2 protein kit, which contains 48 human proteins. We selected the combination of X!Tandem, MS-GF+ and OMSSA as the most time-efficient and productive. We added a homemade java-script to automate the pipeline from file picking to report generation. With these settings, our customized pipeline reached an efficiency unobtainable by manual scouting: the analysis of 192 files searched against the human proteome (42153 entries) downloaded from UniProt took 11 h.
Medicinal herbs have a long history of use in the practice of traditional Chinese medicine and a substantial body of evidence has, over recent decades, demonstrated a range of important pharmacological properties. Western biomedical researchers are examining not only the efficacy of the traditional herbal products but, through the use of a range of bioassays and analytical techniques, are developing improved methods to isolate and characterize active components. This review briefly describes the different extraction methodologies used in the preparation of herbal extracts and reviews the utility of chromatography-mass spectrometry for the analysis of their active components. In particular, applications of gas or liquid chromatography with mass spectrometry for the isolation and characterization of active components of ginseng are critically assessed. The analysis of toxic substances from herb extracts with mass spectrometric techniques is also discussed along with the potential for mass spectrometric methods to investigate the proteomics of herbal extracts.
Abundant evidence supports the key role of ultraviolet radiation (UVR) in skin cancer development. The human skin, especially the epidermal layer, is the main defense against UV radiation. Baicalin is a major bioactive component of Scutellaria baicalensis Georgi, a plant which has been found to exhibit antitumor activity. The anticarcinogenic mechanism of baicalin is not completely understood. We have reported that baicalin inhibited UVB-induced photo-damage and apoptosis in HaCaT cells (human skin keratinocytes). The aim of the present study is to investigate the cellular gene targets responsible for baicalin’s antitumor activity by performing two-dimensional electrophoresis liquid chromatography-mass spectrometry/mass spectrometry (2-DE LC-MS/MS) with HaCaT cells following UVB and baicalin exposure. Two-DE for protein separation was performed, followed by matrix-assisted laser desorption/ionization mass spectrometry and database searches. Nucleophosmin (NPM)-specific siRNA was designed and synthesized, and the small interfering RNA was transfected into skin squamous cancer A431 cells to knock down NPM expression. Proliferation and cell cycle status were assessed by CCK8 and flow cytometric analyses, respectively. We have identified 38 protein spots that are differentially expressed in HaCaT cells exposed to baicalin and/or UVB irradiation. These proteins are involved in detoxification, proliferation, metabolism, cytoskeleton and motility. In particular, we found several proteins that have been linked to tumor progression and resistance, such as NPM. Baicalin treatment reduced the cellular proliferation rate and induced arrest during the S-phase of the cell cycle in A431 cells. NPM1 silencing significantly enhanced the effect of baicalin. Our data indicate that baicalin significantly inhibits tumor growth in the A431 cell line, which may be associated with the regulation of NPM gene expression.
Proteomics technologies can be applied to simultaneously study the function, organization, diversity, and dynamic variety of a cell or a whole tissue. The integrative approach of proteomics is in line with the holistic concept and practices of traditional Chinese medicine (TCM). This review introduces proteomics technologies, whose adoption extends the depth and breadth of TCM research. This article presents some examples to illustrate the use of proteomics technologies in the study of pharmacological effects and their action mechanisms relevant to TCM. Proteomics technologies could be used to screen the target molecules of the TCM actions, identify new bioactive components, and elucidate the underlying mechanisms of their effects. With proteomics approaches, it was found that the Siwu decoction could regulate the protein expression of the bone marrow of blood (Xue) deficient mice, including some proteins and enzymes involved in the hemopoiesis system. Ganoderma lucidum spores might promote the survival and axon regeneration of injured spinal motor neurons in rats by regulating the expression levels of proteins involved in the energy and tissue regeneration system. Polygonatum zanlanscianense Pamp exhibited cytotoxicity towards human myeloblast leukemia HL-60 cells through multiple apoptosis-inducing pathways. Panax ginseng might be beneficial to patients suffering from diabetes mellitus and its complications by alleviating inflammation. Together with a discussion of the challenges and perspectives, this paper provides an overview of the recent developments of proteomics technologies in TCM research, and contends that proteomics will play an important role in the modernization and internationalization of TCM.
Podophyllotoxin, a lignan extracted from the Podophyllum plant, has been shown to inhibit the growth of various carcinoma cells. However, the molecular mechanism remains unclear. In this study, the inhibition of cell growth and changes in protein expression induced by podophyllotoxin were investigated in human cervical carcinoma HeLa cells. Our results demonstrate that podophyllotoxin inhibits HeLa cell growth and induces apoptosis. By using proteomic techniques, seven proteins were found to be significantly regulated by podophyllotoxin compared to the untreated control; among them, four were down-regulated and three were up-regulated. All seven proteins were identified with peptide mass fingerprinting using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) after in-gel trypsin digestion. Five of these proteins are involved in protein metabolism, and the other two play roles in cell communication and signal transduction pathways. It is suggested that the effect of podophyllotoxin on the growth of tumor cells is significantly related to the metabolism-associated proteins.
Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We update its performance evaluation using data derived from new proteomics technology (SWATH) and also check for reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluate the objectivity of the FCS p-value, and follow up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective, and are confounded strongly by complex size, (2) best recovery performance does not necessarily lie at standard p-value cutoffs, (3) while predicted complexes may be used for augmenting MPP, they are inferior to real complexes, and are further confounded by issues relating to network coverage and quality and (4) moderate-sized complexes of size 5 to 10 still exhibit considerable instability, and FCS works best with big complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.
We focus on the problem of prediction with confidence and describe a recently developed learning algorithm called transductive confidence machine for making qualified region predictions. Its main advantage, in comparison with other classifiers, is that it is well-calibrated, with the number of prediction errors strictly controlled by a given predefined confidence level. We apply the transductive confidence machine to the problems of acute leukaemia and ovarian cancer prediction using microarray and proteomics pattern diagnostics, respectively. We demonstrate that the algorithm performs well, yielding well-calibrated and informative predictions whilst maintaining a high level of accuracy.
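The notion of a region prediction at a given confidence level can be sketched as follows. This is a minimal illustration of the transductive, p-value-based scheme, assuming a nearest-neighbour nonconformity score; the abstract does not state which underlying score is used, and the function names and toy data are hypothetical. It also assumes at least two classes are present in the training set.

```python
import math

def _nonconformity(i, xs, ys):
    """Distance to the nearest same-label example divided by the distance to
    the nearest other-label example (larger means stranger)."""
    same = [math.dist(xs[i], xs[j]) for j in range(len(xs))
            if j != i and ys[j] == ys[i]]
    other = [math.dist(xs[i], xs[j]) for j in range(len(xs))
             if ys[j] != ys[i]]
    d_same = min(same) if same else math.inf
    d_other = min(other)
    return d_same / d_other if d_other > 0 else math.inf

def region_prediction(train_x, train_y, x, confidence=0.95):
    """Return (region, p_values): every label whose p-value exceeds
    1 - confidence is kept in the predicted region."""
    labels = sorted(set(train_y))
    p_values = {}
    for label in labels:
        # Tentatively assign the label to x and score all examples together.
        xs = train_x + [x]
        ys = train_y + [label]
        scores = [_nonconformity(i, xs, ys) for i in range(len(xs))]
        # p-value: share of examples at least as strange as the new one.
        p_values[label] = sum(s >= scores[-1] for s in scores) / len(scores)
    region = [label for label in labels if p_values[label] > 1 - confidence]
    return region, p_values

# Toy data: two well-separated classes in two dimensions.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.4),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.4)]
train_y = ["A"] * 5 + ["B"] * 5
region, p_values = region_prediction(train_x, train_y, (0.4, 0.6), confidence=0.8)
```

With only ten training examples the smallest attainable p-value is 1/11, so a label can be excluded only at confidence levels below roughly 0.91; this is why the toy call uses 0.8 rather than 0.95.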
Proteomics research programs typically comprise the identification of protein content of any given cell, their isoforms, splice variants, post-translational modifications, interacting partners and higher-order complexes under different conditions. These studies present significant analytical challenges owing to the high proteome complexity and the low abundance of the corresponding proteins, which often requires highly sensitive and resolving techniques. Mass spectrometry plays an important role in proteomics and has become an indispensable tool for molecular and cellular biology. However, the analysis of mass spectrometry data can be a daunting task in view of the complexity of the information to decipher, the accuracy and dynamic range of quantitative analysis, the availability of appropriate bioinformatics software and the overwhelming size of data files. The past ten years have witnessed significant technological advances in mass spectrometry-based proteomics and synergy with bioinformatics is vital to fulfill the expectations of biological discovery programs. We present here the technological capabilities of mass spectrometry and bioinformatics for mining the cellular proteome in the context of discovery programs aimed at trace-level protein identification and expression from microgram amounts of protein extracts from human tissues.
Cisplatin-induced drug resistance is known to involve a complex set of cellular changes whose detailed molecular mechanisms remain unclear. In this study, we developed a systems biology approach to examine proteomics- and network-level changes between cisplatin-resistant and cisplatin-sensitive cell lines. This approach involves experimental investigation of differential proteomics profiles and computational study of activated enriched proteins, protein interactions, and protein interaction networks. Our experimental platform is based on label-free liquid chromatography/mass spectrometry proteomics. Our computational methods start with an initial list of 119 differentially expressed proteins. We expanded these proteins into a cisplatin-resistant activated sub-network using a database of human protein-protein interactions. An examination of network topology features revealed that the activated responses in the network are closely coupled. By examining sub-network proteins using gene ontology categories, we found significant enrichment of proton-transporting ATPase and ATP synthase complex activities in cisplatin-resistant cells in the form of cooperative down-regulations. Using two-dimensional visualization matrices, we further found significant cascading of endogenous, abiotic, and stress-related signals. Using a visual representation of activated protein categorical sub-networks, we showed that molecular regulation of cell differentiation and development caused by responses to proteome-wide stress is a key signature of the acquired drug resistance.
Research suggests that individuals who experience prolonged exposure to stress may be at higher risk for developing psychological stress disorders. Currently, psychological stress is primarily evaluated by professional physicians using rating scales, which may be prone to subjective biases and limitations of the scales. Therefore, it is imperative to explore more objective, accurate, and efficient biomarkers for evaluating the level of psychological stress in an individual. In this study, we utilized 4D data-independent acquisition (4D-DIA) proteomics for quantitative protein analysis, and then employed a support vector machine (SVM) combined with the SHAP interpretation algorithm to identify potential biomarkers for psychological stress levels. Biomarker validation was subsequently achieved through machine learning classification and a substantial amount of a priori knowledge derived from the knowledge graph. We performed cross-validation of the biomarkers using two batches of data, and the results showed that the combination of Glyceraldehyde-3-phosphate dehydrogenase and Fibronectin yielded an average area under the curve (AUC) of 92%, an average accuracy of 86%, an average F1 score of 79%, and an average sensitivity of 83%. Therefore, this combination may represent a potential approach for detecting stress levels to prevent psychological stress disorders.
Molecular mechanisms characterizing cancer development and progression are complex and process through thousands of interacting elements in the cell. Understanding the underlying structure of interactions requires the integration of cellular networks with extensive combinations of dysregulation patterns. Recent pan-cancer studies focused on identifying common dysregulation patterns in a confined set of pathways or targeting a manually curated set of genes. However, the complex nature of the disease presents a challenge for finding pathways that would constitute a basis for tumor progression and requires evaluation of subnetworks with functional interactions. Uncovering these relationships is critical for translational medicine and the identification of future therapeutics. We present a frequent subgraph mining algorithm to find functional dysregulation patterns across the cancer spectrum. We mined frequent subgraphs coupled with biased random walks utilizing genomic alterations, gene expression profiles, and protein-protein interaction networks. In this unsupervised approach, we have recovered expert-curated pathways previously reported for explaining the underlying biology of cancer progression in multiple cancer types. Furthermore, we have clustered the genes identified in the frequent subgraphs into highly connected networks using a greedy approach and evaluated biological significance through pathway enrichment analysis. Gene clusters further elaborated on the inherent heterogeneity of cancer samples by both suggesting specific mechanisms for cancer type and common dysregulation patterns across different cancer types. Survival analysis of sample level clusters also revealed significant differences among cancer types (p < 0.001). These results could extend the current understanding of disease etiology by identifying biologically relevant interactions.
Supplementary Information: Supplementary methods, figures, tables and code are available at https://github.com/bebeklab/FSM_Pancancer.
Proteomics has become a powerful technique to investigate cellular processes and network functions. This became possible as a result of major progress in the sensitivity of mass spectrometry instrumentation and data analysis software. As proteomics technologies are now becoming available to the wider scientific community, efforts are under way to identify complete proteomes. This information is used to improve genome annotation and to identify and confirm protein splice variants. Analysis of protein modifications and protein variants uses novel scoring and prediction tools independent of established protein databases. We discuss the proteomics tools and analysis pipelines that can be applied to rice in order to facilitate our understanding of rice genome structure and function.
In this paper a technique to improve protein secondary structure prediction is proposed. The approach is based on the idea of combining the results of a set of prediction tools, choosing the most correct parts of each prediction. The correctness of the resulting prediction is measured referring to accuracy parameters used in several editions of CASP. Experimental evaluations validating the proposed approach are also reported.
Cancer is a proteomic disease. Though MALDI-TOF mass spectrometry allows direct measurement of the protein signature of tissue, blood, or other biological samples, and holds tremendous potential for disease diagnosis and treatment, key challenges remain in the processing of proteomic data. In this chapter, we introduce a wavelet-based mathematical framework and computational tools for proteomic data processing, feature selection, and statistical analysis in cancer studies.