Normalization removes or minimizes the biases of systematic variation that exist in experimental data sets. This study presents a systematic variation normalization (SVN) procedure for removing systematic variation in two-channel microarray gene expression data. Based on an analysis of how systematic variation contributes to variability in microarray data sets, our normalization procedure includes background subtraction (determined from the distribution of pixel intensity values in each data acquisition channel), log conversion, linear or non-linear regression, restoration or transformation, and multiarray normalization. When a non-linear regression is required, an empirical polynomial approximation approach is used. Either the high terminated points or their averaged values in the distributions of the pixel intensity values observed in the control channels may be used for rescaling multiarray datasets. These pre-processing steps remove systematic variation in the data attributable to variability in microarray slides, assay batches, the array process, or experimenters. Biologically meaningful comparisons of gene expression patterns between control and test channels, or among multiple arrays, are therefore unbiased when normalized, but not unnormalized, datasets are used.
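As a rough illustration of this kind of pre-processing (not the authors' SVN implementation), the sketch below applies background subtraction, log2 conversion, and a polynomial regression of the test channel on the control channel; the function name, the array arguments, and the choice of numpy.polyfit as the regression step are assumptions made for this example.

```python
import numpy as np

def normalize_two_channel(test, control, test_bg, control_bg, degree=1):
    """Illustrative two-channel pre-processing: background subtraction,
    log2 conversion, and a polynomial regression of the test channel on
    the control channel (degree=1 for a linear fit, higher degrees as a
    stand-in for an empirical polynomial approximation of a non-linear trend)."""
    # Background subtraction, clipped to keep intensities positive
    t = np.clip(np.asarray(test, float) - test_bg, 1.0, None)
    c = np.clip(np.asarray(control, float) - control_bg, 1.0, None)

    # Log conversion
    log_t, log_c = np.log2(t), np.log2(c)

    # Regression of test on control; residuals are the trend-corrected log-ratios
    coeffs = np.polyfit(log_c, log_t, deg=degree)
    return log_t - np.polyval(coeffs, log_c)

# Hypothetical intensities for three spots
log_ratios = normalize_two_channel(test=[1200, 340, 980],
                                   control=[1100, 300, 1500],
                                   test_bg=50, control_bg=60)
print(log_ratios)
```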
Chemogenomic experiments, in which genetic and chemical perturbations are combined, provide data for discovering the relationships between genotype and phenotype. Traditionally, analysis of chemogenomic datasets has considered the sensitivity of deletion strains to chemicals, which has shed light on drug mechanisms of action and helped detect drug targets. Here, we computationally analyzed a large chemogenomic dataset, which combines more than 300 chemicals with virtually all gene deletion strains in the yeast S. cerevisiae. In addition to the sensitivity relationships between deletion strains and chemicals, we also considered the deletion strains that are resistant to chemicals. We found a small set of genes whose deletion makes the cell resistant to many chemicals. Curiously, these genes were enriched for functions related to RNA metabolism. Our approach allowed us to generate a network of drugs and genes connected by resistance or sensitivity relationships. As a quality assessment, we showed that the higher-order motifs found in this network are consistent with biological expectations. Finally, we constructed a biologically relevant network projection pertaining to drug similarities and analyzed this projection in detail. We propose this drug similarity network as a useful tool for understanding drug mechanisms of action.
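A minimal sketch of how a drug-gene network and its drug-side projection could be assembled is shown below, using networkx. The toy edge list, the node names, and the use of shared-gene counts as projection weights are illustrative assumptions, not the dataset or the similarity scoring used in the study.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical chemogenomic edge list: (chemical, deletion strain, relation)
edges = [
    ("drugA", "geneX", "sensitive"),
    ("drugA", "geneY", "resistant"),
    ("drugB", "geneX", "sensitive"),
    ("drugB", "geneZ", "sensitive"),
]

# Bipartite drug-gene network, with the relation stored on each edge
G = nx.Graph()
for drug, gene, relation in edges:
    G.add_node(drug, side="drug")
    G.add_node(gene, side="gene")
    G.add_edge(drug, gene, relation=relation)

# Project onto the drug side: two drugs are linked when they share at
# least one gene, with the number of shared genes as the edge weight
drugs = {n for n, d in G.nodes(data=True) if d["side"] == "drug"}
drug_similarity = bipartite.weighted_projected_graph(G, drugs)

for u, v, data in drug_similarity.edges(data=True):
    print(u, v, "shared genes:", data["weight"])
```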
Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We update its performance evaluation using data derived from a new proteomics technology (SWATH) and check for reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluate the objectivity of the FCS p-value and follow up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective and are strongly confounded by complex size, (2) the best recovery performance does not necessarily lie at standard p-value cutoffs, (3) while predicted complexes may be used for augmenting MPP, they are inferior to real complexes and are further confounded by issues relating to network coverage and quality, and (4) moderately sized complexes of 5 to 10 members still exhibit considerable instability; we find that FCS works best with large complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.
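To make the complex-size confounding concrete, the sketch below scores complexes with a generic hypergeometric overlap test and returns unobserved members as missing-protein candidates. This is a stand-in for a complex-based scoring step, not the exact FCS statistic, and the function name, arguments, and toy protein identifiers are assumptions for this example.

```python
from scipy.stats import hypergeom

def score_complex(complex_members, observed_proteins, background_size):
    """Hypergeometric overlap p-value for one complex, plus its
    unobserved members as candidate missing proteins."""
    members, observed = set(complex_members), set(observed_proteins)
    overlap = len(members & observed)
    # P(X >= overlap) when drawing len(members) proteins at random from a
    # background of background_size proteins containing len(observed) hits
    p_value = hypergeom.sf(overlap - 1, background_size,
                           len(observed), len(members))
    return p_value, members - observed

# Same 60% coverage at different complex sizes: the p-value, and hence any
# fixed cutoff, is strongly driven by complex size.
observed = {f"P{i}" for i in range(3000)}
for size in (5, 10, 50):
    n_hit = int(size * 0.6)
    members = [f"P{i}" for i in range(n_hit)] + \
              [f"Q{i}" for i in range(size - n_hit)]
    p, missing = score_complex(members, observed, background_size=10000)
    print(f"size={size:3d}  p={p:.2e}  missing candidates={len(missing)}")
```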
Some prediction methods use probability to rank their predictions, while others do not rank their predictions and instead use p-values to support them. This disparity renders direct cross-comparison of the two kinds of methods difficult. In particular, approaches such as the Bayes Factor Upper Bound (BFB) for p-value conversion may not make correct assumptions for this kind of cross-comparison. Here, using a well-established case study on renal cancer proteomics and in the context of missing protein prediction, we demonstrate how to compare these two kinds of prediction methods using two different strategies. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naïve assumptions as BFB conversions. The second strategy is a powerful approach we colloquially call “home ground testing”. Both strategies perform better than BFB conversions. We therefore recommend comparing prediction methods by standardizing them to a common performance benchmark, such as a global FDR; where this is not possible, we recommend reciprocal “home ground testing”.
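For concreteness, the sketch below implements the standard Bayes factor upper bound formula, BFB = 1/(-e·p·ln p) for p < 1/e, alongside Benjamini-Hochberg adjustment as a simple stand-in for a global-FDR benchmark. The function names and example p-values are assumptions, and neither function reproduces the specific pipelines compared in the study.

```python
import numpy as np

def bayes_factor_upper_bound(p):
    """BFB = 1 / (-e * p * ln p), an upper bound on the Bayes factor
    against the null; only meaningful for p < 1/e."""
    p = np.asarray(p, dtype=float)
    return 1.0 / (-np.e * p * np.log(p))

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (a simple global-FDR stand-in)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # each adjusted value is the minimum of the scaled values at or above its rank
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty_like(p)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

p_values = np.array([0.0005, 0.004, 0.03, 0.20])
print(bayes_factor_upper_bound(p_values))   # evidence-style rescaling of each p-value
print(benjamini_hochberg(p_values))         # FDR-adjusted values on a common scale
```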