Computer-based systems have increased dramatically in scope, complexity, and pervasiveness. Most industries are highly dependent on computers for their basic day-to-day functioning. Safe and reliable software operation is an essential requirement for many systems across different industries. The number of functions to be included in a software system is decided during software development. Any software system must be constructed in such a way that execution can resume even after the occurrence of a failure, with minimal loss of data and time. Software systems that can continue execution even in the presence of faults are called fault-tolerant software. When a failure occurs, one of the redundant software modules is executed, preventing system failure. Fault-tolerant software systems are usually developed by integrating COTS (commercial off-the-shelf) software components. The motivation for using COTS components is that they reduce overall system development cost and time. In this paper, reliability models for fault-tolerant consensus recovery blocks are analyzed. In the first optimization model, we formulate a joint optimization problem in which maximization of system reliability and minimization of the execution time of each function of the software system are considered under a budgetary constraint. In the second model, the issue of compatibility among the alternatives available for different modules is discussed. Numerical illustrations are provided to demonstrate the developed models.
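As a rough illustration of the kind of selection problem described above, the sketch below enumerates one COTS alternative per module to maximize a simple series-system reliability under a budget, breaking ties by lower total execution time. The reliability, cost, and time figures are hypothetical, and the brute-force search and series-reliability model are assumptions for illustration, not the paper's formulation.

```python
# Minimal sketch (not the paper's exact model): choose one COTS alternative
# per module to maximize system reliability under a budget, breaking ties
# by lower total execution time. All numbers are hypothetical.
from itertools import product

# Each module offers alternatives as (reliability, cost, execution_time).
modules = [
    [(0.95, 10, 4.0), (0.99, 18, 5.5)],                   # module 1
    [(0.90, 8, 3.0), (0.97, 15, 3.5), (0.99, 22, 4.2)],   # module 2
    [(0.93, 12, 2.5), (0.96, 16, 2.8)],                   # module 3
]
BUDGET = 45

best = None
for choice in product(*modules):
    cost = sum(c for _, c, _ in choice)
    if cost > BUDGET:
        continue
    reliability = 1.0
    for r, _, _ in choice:
        reliability *= r           # series system: all modules must work
    time = sum(t for _, _, t in choice)
    key = (reliability, -time)     # maximize reliability, then minimize time
    if best is None or key > best[0]:
        best = (key, choice, cost)

(rel, neg_time), choice, cost = best
print(f"reliability={rel:.4f}, time={-neg_time:.1f}, cost={cost}")
```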
Modeling of software reliability has gained considerable importance in recent years. The use of software in critical applications has led to a tremendous increase in the amount of work being carried out on software reliability growth modeling. A number of analytic software reliability growth models (SRGMs) exist in the literature. They are based on certain assumptions; however, none of them works well across different environments. The current software reliability literature is inconclusive as to which models and techniques are best, and some researchers believe that each organization needs to try several approaches to determine what works best for it. Data-driven artificial neural network (ANN) based models, on the other hand, provide better software reliability estimation. In this paper we present a new dimension for building an ensemble of different ANNs to improve the accuracy of estimation for complex software architectures. The model has been validated on two data sets cited from the literature. The results show a fair improvement in forecasting software reliability over individual neural-network-based models.
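The sketch below illustrates the general idea of an ANN ensemble for reliability growth estimation; the network sizes, the synthetic failure data, and the simple prediction averaging are assumptions for illustration and not the authors' architecture.

```python
# Illustrative sketch of an ANN ensemble for reliability growth forecasting:
# several MLPs with different hidden-layer sizes are trained on cumulative
# failure counts and their predictions are averaged. Data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.arange(1, 21, dtype=float).reshape(-1, 1)                    # testing weeks
failures = np.cumsum(np.random.poisson(5, size=20)).astype(float)   # cumulative faults

ensemble = [
    MLPRegressor(hidden_layer_sizes=(h,), max_iter=5000, random_state=i)
    for i, h in enumerate([4, 8, 16])
]
for model in ensemble:
    model.fit(t, failures)

t_future = np.array([[21.0], [22.0]])
pred = np.mean([m.predict(t_future) for m in ensemble], axis=0)     # ensemble average
print("ensemble forecast of cumulative failures:", pred)
```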
We establish natural criteria under which normally iterable premice are iterable for stacks of normal trees. Let Ω be a regular uncountable cardinal. Let m<ω and M be an m-sound premouse and Σ be an (m,Ω+1)-iteration strategy for M (roughly, a normal (Ω+1)-strategy). We define a natural condensation property for iteration strategies, inflation condensation. We show that if Σ has inflation condensation then M is (m,Ω,Ω+1)∗-iterable (roughly, M is iterable for length ≤Ω stacks of normal trees each of length <Ω), and moreover, we define a specific such strategy Σst and a reduction of stacks via Σst to normal trees via Σ. If Σ has the Dodd-Jensen property and card(M)<Ω then Σ has inflation condensation. We also apply some of the techniques developed to prove that if Σ has strong hull condensation (introduced independently by John Steel), and G is V-generic for an Ω-cc forcing, then Σ extends to an (m,Ω+1)-strategy Σ+ for M with strong hull condensation, in the sense of V[G]. Moreover, this extension is unique. We deduce that if G is V-generic for a ccc forcing then V and V[G] have the same ω-sound, (ω,Ω+1)-iterable premice which project to ω.
Research in databases of three-dimensional (3D) objects is a special area of research in our community. This paper is in this context and more generally in information retrieval by content. In this context, the authors are particularly interested in the indexing of 3D objects, which aims to characterize the shape of objects, to allow comparison between models. The 3D shape descriptor is presented and, based on the distribution of cords. To make the descriptor invariant to geometric transformations of space, a normalization step is applied to the object, as a preprocessing step. The work is, therefore, to develop a system for indexing and searching databases of 3D objects. The hidden Markov models are well suited to for this indexing system. Using these tools, a system for learning and recognizing of 3D objects is built.
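A minimal sketch of a cord-based descriptor is given below, assuming a common variant in which a "cord" runs from the object's barycenter to each surface vertex and the descriptor is the normalized histogram of cord lengths after translation and scale normalization; the authors' exact descriptor and normalization may differ.

```python
# Sketch of a cord-length distribution descriptor (assumed variant).
import numpy as np

def cord_descriptor(vertices: np.ndarray, bins: int = 32) -> np.ndarray:
    v = vertices - vertices.mean(axis=0)          # translation normalization
    v /= np.linalg.norm(v, axis=1).max()          # scale normalization
    lengths = np.linalg.norm(v, axis=1)           # cord lengths (rotation-invariant)
    hist, _ = np.histogram(lengths, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()                      # distribution of cords

vertices = np.random.rand(500, 3)                 # stand-in for 3D mesh vertices
print(cord_descriptor(vertices)[:8])
```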
In [V. I. Arnold, Simple singularities of curves, Proc. Steklov Inst. Math. 226(3) (1999) 20–28, Sec. 5, p. 32], Arnold writes: ‘Classification of singularities of curves can be interpreted in dual terms as a description of “co-artin” subalgebras of finite co-dimension in the algebra of formal series in a single variable (up to isomorphism of the algebra of formal series)’. In the paper, such a description is obtained but up to isomorphism of algebraic curves (i.e. this description is finer).
Let K be an algebraically closed field of arbitrary characteristic. The aim of the paper is to give a classification (up to isomorphism) of the set of subalgebras 𝒜 of the polynomial algebra K[x] that contain the ideal x^m K[x] for some m ≥ 1. It is proven that the set 𝒜 = ∐_{m,Γ} 𝒜(m,Γ) is a disjoint union of affine algebraic varieties (where Γ ∐ {0, m, m+1, …} is the semigroup of the singularity and m−1 is the Frobenius number). It is proven that each set 𝒜(m,Γ) is an affine algebraic variety, and explicit generators and defining relations are given for the algebra of regular functions on 𝒜(m,Γ). An isomorphism criterion is given for the algebras in 𝒜. For each algebra A ∈ 𝒜(m,Γ), explicit sets of generators and defining relations are given and the automorphism group Aut_K(A) is explicitly described. The automorphism group of the algebra A is finite if and only if the algebra A is not isomorphic to a monomial algebra, and in this case |Aut_K(A)| < dim_K(A/𝔠_A), where 𝔠_A is the conductor of A. The set of orders of the automorphism groups of the algebras in 𝒜(m,Γ) is explicitly described.
In this study, we consider the problem of unsupervised learning from multi-dimensional datasets. In particular, we consider k-means clustering, which requires long execution times on multi-dimensional datasets. To speed up clustering while preserving accuracy, we introduce a new algorithm, which we term Canopy+. The algorithm utilizes canopies and statistical techniques. Its efficient initiation and normalization methodologies also contribute to the improvement. Furthermore, we consider early termination of the clustering computation, provided that an intermediate result of the computation is accurate enough. We compared our algorithm with four popular clustering algorithms. The results show that our algorithm speeds up the clustering computation by at least 2X. We also analyzed the contribution of early termination. The results show that a further 2X improvement can be obtained while incurring a 0.1% error rate. We also observe that our Canopy+ algorithm benefits from early termination, gaining an extra 1.2X performance improvement.
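The sketch below conveys the general flavor only: canopy-style seeding feeding a k-means loop that terminates early once the centroids stop moving. It is not the Canopy+ algorithm itself, whose statistical refinements are described in the paper; the threshold and data are illustrative.

```python
# Rough sketch: canopy-style seeding plus k-means with early termination.
import numpy as np

def canopy_centers(X, t2=0.6):
    """Pick well-separated seeds, in the spirit of classic canopy clustering."""
    remaining = list(range(len(X)))
    centers = []
    while remaining:
        i = remaining[0]
        centers.append(X[i])
        d = np.linalg.norm(X[remaining] - X[i], axis=1)
        remaining = [j for j, dist in zip(remaining, d) if dist > t2]
    return np.array(centers)

def kmeans_early_stop(X, centers, max_iter=100, tol=1e-3):
    for _ in range(max_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:          # early termination: intermediate result is good enough
            break
    return labels, centers

X = np.random.rand(1000, 8)                       # synthetic multi-dimensional data
labels, centers = kmeans_early_stop(X, canopy_centers(X))
print(len(centers), "clusters")
```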
Normalization removes or minimizes the biases of systematic variation that exist in experimental data sets. This study presents a systematic variation normalization (SVN) procedure for removing systematic variation in two-channel microarray gene expression data. Based on an analysis of how systematic variation contributes to variability in microarray data sets, our normalization procedure includes background subtraction determined from the distribution of pixel intensity values of each data acquisition channel, log conversion, linear or non-linear regression, restoration or transformation, and multiarray normalization. In cases where a non-linear regression is required, an empirical polynomial approximation approach is used. Either the high terminated points or their averaged values in the distributions of the pixel intensity values observed in control channels may be used for rescaling multiarray datasets. These pre-processing steps remove systematic variation in the data attributable to variability in microarray slides, assay batches, the array process, or experimenters. Biologically meaningful comparisons of gene expression patterns between control and test channels, or among multiple arrays, are therefore unbiased when using normalized, but not unnormalized, datasets.
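A condensed sketch of per-array steps of this kind (background subtraction, log conversion, and an empirical polynomial fit of the intensity-dependent bias) is shown below; the full SVN procedure includes additional restoration and multiarray stages, and the data here are synthetic.

```python
# Condensed per-array sketch: background subtraction, log conversion, and an
# empirical polynomial correction of the log-ratio against mean intensity.
import numpy as np

def normalize_two_channel(red, green, red_bg, green_bg, degree=3):
    # background subtraction per acquisition channel
    r = np.clip(red - red_bg, 1, None)
    g = np.clip(green - green_bg, 1, None)
    # log conversion
    m = np.log2(r) - np.log2(g)          # log-ratio
    a = 0.5 * (np.log2(r) + np.log2(g))  # mean log-intensity
    # empirical polynomial approximation of the intensity-dependent bias
    coeffs = np.polyfit(a, m, degree)
    return m - np.polyval(coeffs, a)

red = np.random.lognormal(8, 1, 5000)    # synthetic channel intensities
green = np.random.lognormal(8, 1, 5000)
print(normalize_two_channel(red, green, 100.0, 120.0).mean())
```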
We study the effects of different normalization and pre-clustering techniques on clustering quality for a novel mixed-integer nonlinear optimization-based clustering algorithm, the Global Optimum Search with Enhanced Positioning (EP_GOS_Clust). These are important issues to be addressed. DNA microarray experiments are informative tools for elucidating gene regulatory networks. But in order for gene expression levels to be comparable across microarrays, normalization procedures have to be properly undertaken. The aim of pre-clustering is to use an adequate amount of discriminatory characteristics to form rough information profiles, so that data with similar features can be pre-grouped together and outliers deemed insignificant to the clustering process can be removed. Using experimental DNA microarray data from the yeast Saccharomyces cerevisiae, we study the merits of pre-clustering genes based on distance/correlation comparisons and symbolic representations such as {+, o, -}. As a performance metric, we look at the intra- and inter-cluster error sums, two generic but intuitive measures of clustering quality. We also use publicly available Gene Ontology resources to assess the clusters' level of biological coherence. Our analysis indicates a significant effect of normalization and pre-clustering methods on the clustering results. Hence, the outcome of this study has significance for fine-tuning the EP_GOS_Clust clustering approach.
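For reference, the sketch below computes intra- and inter-cluster error sums under one common definition (the sum of point-to-centroid distances within clusters, and the sum of pairwise centroid distances between clusters); the paper's exact definitions may differ, and the data are synthetic.

```python
# Intra- and inter-cluster error sums under an assumed common definition.
import numpy as np

def cluster_error_sums(X, labels):
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    intra = sum(
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).sum()
        for i, c in enumerate(clusters)
    )
    inter = sum(
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(clusters)) for j in range(i + 1, len(clusters))
    )
    return intra, inter   # good clusterings: low intra, high inter

X = np.random.rand(300, 10)                 # stand-in for expression profiles
labels = np.random.randint(0, 5, size=300)  # stand-in for cluster assignments
print(cluster_error_sums(X, labels))
```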
Since the available microarray data for BOEC (human blood outgrowth endothelial cells), large vessel, and microvascular endothelial cells were from two different platforms, a working cross-platform normalization method was needed to make these data comparable. With six HUVEC (human umbilical vein endothelial cell) samples hybridized on two-channel cDNA arrays and six HUVEC samples on Affymetrix arrays, 64 possible combinations of a three-step normalization procedure were investigated in search of the best normalization method, which was selected based on two criteria measuring the extent to which expression profiles of biological samples of the same cell type arrayed on the two platforms were indistinguishable. Next, three discriminative gene lists distinguishing the large vessel from the microvascular endothelial cells were obtained by SAM (significance analysis of microarrays), PAM (prediction analysis for microarrays), and a combination of the SAM and PAM lists. The final discriminative gene list was selected by SVM (support vector machine). Based on this discriminative gene list, SVM classification analysis with the best tuning parameters and 10,000 rounds of validation showed that BOEC were far from large vessel cells; they either formed their own class or fell into the microvascular class. Based on all the common genes between the two platforms, SVM analysis further confirmed this conclusion.
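The final classification step can be sketched as follows, with synthetic expression profiles standing in for the SAM/PAM-derived gene lists and a small grid search standing in for the tuning procedure.

```python
# Schematic version of the final step: an SVM with tuned parameters separates
# large vessel from microvascular profiles, then classifies BOEC samples.
# All data below are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n_genes = 50                                   # size of the discriminative gene list
large_vessel = rng.normal(0.5, 1.0, (20, n_genes))
microvascular = rng.normal(-0.5, 1.0, (20, n_genes))
X = np.vstack([large_vessel, microvascular])
y = np.array([0] * 20 + [1] * 20)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)
grid.fit(X, y)                                 # "best tuning parameters"

boec = rng.normal(-0.4, 1.0, (6, n_genes))     # unknown samples to classify
print(grid.best_params_, grid.predict(boec))   # which class do BOEC fall into?
```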
The influence of genetic variations on diseases or cellular processes is the main focus of many investigations, and the results of biomedical studies are often only accessible through scientific publications. Automatic extraction of this information requires recognition of the gene names and the accompanying allelic variant information. In previous work, the OSIRIS system for the detection of allelic variation in text, based on a query expansion approach, was presented. Challenges associated with this system are the relatively low recall for variation mentions and gene name recognition. To tackle this challenge, we integrate the ProMiner system, developed for the recognition and normalization of gene and protein names, with a conditional random field (CRF)-based recognition of variation terms in biomedical text. Using the newly developed normalization of variation entities, we can link textual entities to entries of the Single Nucleotide Polymorphism database (dbSNP). The performance of this novel approach is evaluated, and improved results in comparison to state-of-the-art systems are reported.
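As a highly simplified stand-in for the recognition and normalization steps (the actual system uses a CRF together with ProMiner), the sketch below catches common protein substitution mentions with a regular expression and maps them to a canonical single-letter form that could feed a downstream dbSNP lookup.

```python
# Simplified stand-in for variation recognition and normalization:
# catch mentions such as "A123T" or "Ala123Thr" and canonicalize them.
import re

THREE_TO_ONE = {"Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
                "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
                "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
                "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V"}

PATTERN = re.compile(r"\b([A-Z]|" + "|".join(THREE_TO_ONE) +
                     r")(\d+)([A-Z]|" + "|".join(THREE_TO_ONE) + r")\b")

def normalize_variants(text):
    out = []
    for ref, pos, alt in PATTERN.findall(text):
        ref = THREE_TO_ONE.get(ref, ref)
        alt = THREE_TO_ONE.get(alt, alt)
        out.append(f"p.{ref}{pos}{alt}")   # canonical protein-level notation
    return out

print(normalize_variants("The Ala222Val (A222V) polymorphism of MTHFR ..."))
```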
New high-throughput sequencing technologies can generate millions of short sequences in a single experiment. As the size of the data increases, comparison of multiple experiments on different cell lines under different experimental conditions becomes a big challenge. In this paper, we investigate ways to compare multiple ChIP-sequencing experiments. We specifically studied the epigenetic regulation of breast cancer and the effect of estrogen using 50 ChIP-sequencing datasets from the Illumina Genome Analyzer II. First, we evaluate the correlation among different experiments, focusing on the total number of reads in transcribed and promoter regions of the genome. Then, we adopt the method used to identify the most stable genes in RT-PCR experiments to understand the background signal across all of the experiments and to identify the most variable transcribed and promoter regions of the genome. We observed that the most variable genes for transcribed regions and for promoter regions are very distinct. Gene ontology and function enrichment analysis on these most variable genes demonstrates the biological relevance of the results. In this study, we present a method that can effectively select differential regions of the genome based on protein-binding profiles over multiple experiments, using real data points without any normalization among the samples.
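The idea of ranking regions by their variability across experiments can be illustrated as below; a plain coefficient of variation on synthetic counts is used here in place of the RT-PCR-derived stability measure adapted in the paper.

```python
# Simplified illustration: rank genomic regions by read-count variability
# across many experiments (coefficient of variation on synthetic counts).
import numpy as np

rng = np.random.default_rng(1)
# rows = genomic regions (transcribed or promoter), columns = 50 experiments
counts = rng.negative_binomial(10, 0.3, size=(2000, 50)).astype(float)

log_counts = np.log2(counts + 1)                 # stabilize the dynamic range
cv = log_counts.std(axis=1) / log_counts.mean(axis=1)

most_variable = np.argsort(cv)[::-1][:100]       # candidate differential regions
most_stable = np.argsort(cv)[:100]               # background-like regions
print("top variable region indices:", most_variable[:5])
```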
Sequencing of transposon-mutant libraries using next-generation sequencing (TnSeq) has become a popular method for determining which genes and non-coding regions are essential for growth under various conditions in bacteria. For methods that rely on quantitative comparison of read counts at transposon insertion sites, proper normalization of TnSeq datasets is vitally important. Real TnSeq datasets are often noisy and exhibit a significant skew that can be dominated by high counts at a small number of sites (often for non-biological reasons). If two datasets that are not appropriately normalized are compared, this can cause the artifactual appearance of Differentially Essential (DE) genes in a statistical test, constituting type I errors (false positives). In this paper, we propose a novel method for normalization of TnSeq datasets that corrects for the skew of read-count distributions by fitting them to a Beta-Geometric distribution. We show that this read-count correction procedure reduces the number of false positives when comparing replicate datasets grown under the same conditions (for which no genuine differences in essentiality are expected). We compare these results with results obtained using other normalization procedures, and show that our method yields a greater reduction in the number of false positives. In addition, we investigate the effects of normalization on the detection of DE genes.
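A sketch of the central fitting step, maximum-likelihood estimation of a Beta-Geometric distribution for insertion-site read counts, is given below; the counts are simulated and the surrounding correction pipeline of the published method is omitted.

```python
# Sketch: fit a Beta-Geometric distribution to read counts by maximum likelihood.
import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize

def beta_geom_neg_loglik(params, counts):
    a, b = np.exp(params)                        # keep parameters positive
    # P(X = k) = B(a + 1, b + k - 1) / B(a, b), for k = 1, 2, ...
    return -np.sum(betaln(a + 1, b + counts - 1) - betaln(a, b))

rng = np.random.default_rng(0)
counts = rng.geometric(0.05, size=5000).astype(float)  # stand-in read counts (>= 1)

fit = minimize(beta_geom_neg_loglik, x0=np.log([1.0, 10.0]), args=(counts,))
a_hat, b_hat = np.exp(fit.x)
print(f"fitted alpha={a_hat:.2f}, beta={b_hat:.2f}")
# The fitted distribution could then be used to rescale skewed datasets so
# that replicate libraries become comparable before testing for DE genes.
```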
A complete decomposition of the space of curvature tensors over a tensor product of vector spaces into simple modules under the action of the group G = GL(p, ℝ) ⊗ GL(q, ℝ) is given. We use these results to study the geometry of manifolds with a Grassmann structure and of Grassmann manifolds endowed with a connection whose torsion is not zero. We show that the manifold Osc^r M is an example of a manifold with a Grassmann structure. Owing to this fact, we consider results of Miron, Atanasiu, Anastasiei, Čomić and others from a representation-theoretic point of view and connect them with some results of Alekseevsky, Cortes, and Devchand, as well as of Machida and Sato, and others. New examples of connections with torsion defined on a four-dimensional Grassmann manifold are given. Symmetries of curvatures for half-flat connections are also investigated. We use algebraic results to reveal obstructions to the existence of corresponding connections.
We introduce novel paraphrasing rules for Japanese light verb constructions (LVCs) that reduce the differences in the surface forms while retaining several of the crucial syntactic/semantic functions of these light verbs. An analysis of the linguistic properties of light verbs allows us to create paraphrasing patterns that map 151 different light verbs into 10 simple forms. Of these 10 forms, 7 convert complex noun-particle-verb structures into simple predicative forms. By constructing a list of 923 examples for ambiguous light verbs, we show that we can correctly distinguish real LVCs from those in which the light verbs were actually functioning as a main verb. The results of experiments indicate that our paraphrasing rules offer high accuracy. The experiments also reveal that our paraphrasing system works as a normalizer of complex predicates, which improves the recall rate of the predicate extraction task.
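A toy illustration of the paraphrasing idea follows, with two hypothetical rules rewriting noun-particle-light-verb patterns into simple predicative forms; the paper's actual inventory covers 151 light verbs mapped to 10 forms, and these rules are not taken from it.

```python
# Toy illustration with hypothetical paraphrasing rules for Japanese LVCs:
# noun + particle + light verb is rewritten as noun + "suru".
import re

RULES = [
    (re.compile(r"(\w+)を行う"), r"\1する"),      # e.g. 研究を行う -> 研究する
    (re.compile(r"(\w+)を実施する"), r"\1する"),   # e.g. 調査を実施する -> 調査する
]

def paraphrase_lvc(sentence: str) -> str:
    for pattern, repl in RULES:
        sentence = pattern.sub(repl, sentence)
    return sentence

print(paraphrase_lvc("新しい手法で実験を行う"))   # -> 新しい手法で実験する
```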
With the increase of Web use in Morocco today, the Internet has become an important source of information. Specifically, across social media, Moroccan people use several languages in their communication, leaving behind unstructured user-generated text (UGT) that presents several opportunities for Natural Language Processing. Among the languages found in this data, Moroccan Arabic (MA) stands out with substantial content and several distinctive features. In this paper, we investigate online written text generated by Moroccan users in social media, with an emphasis on Moroccan Arabic. For this purpose, we follow several steps, using tools such as a language identification system, in order to conduct a deep study of this data. The most interesting findings that have emerged are the use of code-switching, the use of multiple scripts, and the low word counts in Moroccan UGT. Moreover, we used the investigated data to build a new Moroccan language resource: a lexicon of orthographic variants of Moroccan words, built with an unsupervised approach using character-level neural embeddings. This lexicon can be useful for several NLP tasks such as spelling normalization.
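The grouping of orthographic variants can be illustrated roughly as below, using character n-gram vectors and cosine similarity as a crude proxy for the character neural embeddings used in the paper; the Arabizi-style spellings are illustrative examples only.

```python
# Rough illustration: group orthographic variants by character-level similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

words = ["bzaf", "bzzaf", "bezzaf", "bzzzaf", "chwiya", "chwia", "shwiya"]

# character n-gram vectors stand in for learned character embeddings
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
vectors = vectorizer.fit_transform(words)
sim = cosine_similarity(vectors)

threshold = 0.5
for i, w in enumerate(words):
    variants = [words[j] for j in range(len(words)) if j != i and sim[i, j] > threshold]
    print(f"{w}: {variants}")
```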
There have been very few studies that evaluate multiple dimensions of energy security in China. This study thus aims to extend the line of research on energy security to the case of China from 1980 to 2014. For this purpose, the 4-A quantitative evaluation framework of Yao and Chang (2014) was employed as the main analytical framework. We then constructed an overall energy security index based on a total of sixteen indicators. The results indicate that: (i) the availability and applicability dimensions show a stable increasing trend over the entire investigation period; (ii) the acceptability dimension increased strongly from the 1980s to 1995 before increasing gradually afterwards; (iii) the affordability dimension improved steadily from 1980 to 1992 and from 1999 to 2008, while declining in other periods. Notably, China's overall energy security improved during the period 1990–1995 but decreased sharply over the years 1994–2000. Furthermore, it fluctuated from 2000 to 2009 before exhibiting a rising trend in recent years. The findings of this study imply that actions need to be taken along various dimensions for China to improve its overall energy security.
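A stylized sketch of aggregating normalized indicators into dimension scores and an overall index is shown below; the indicator values, equal weights, and min-max scaling are assumptions for illustration and do not reproduce the study's 4-A computation.

```python
# Stylized index aggregation: normalize indicators, average within the four
# dimensions, then average dimensions into an overall index. Data are synthetic.
import numpy as np

years = np.arange(1980, 2015)
rng = np.random.default_rng(0)
indicators = rng.random((len(years), 16))          # stand-in indicator values
dimension_of = np.repeat([0, 1, 2, 3], 4)          # availability, applicability, ...

# min-max normalization so indicators are comparable across units
norm = (indicators - indicators.min(axis=0)) / (indicators.max(axis=0) - indicators.min(axis=0))

dimension_scores = np.column_stack(
    [norm[:, dimension_of == d].mean(axis=1) for d in range(4)]
)
overall_index = dimension_scores.mean(axis=1)      # equal weights as a simple choice
print(dict(zip(years[:3], overall_index[:3].round(3))))
```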
Averaging-normalization, applied to weakly nonlinear wave equations, provides a tool for the identification of slow manifolds in these infinite-dimensional systems. After discussing the general procedure, we demonstrate its effectiveness for a Rayleigh wave equation by finding low-dimensional invariant manifolds.
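For concreteness, one weakly nonlinear model of this type is written out below; the specific form is an assumption, since the abstract itself does not state the equation.

```latex
% Assumed model problem: a weakly nonlinear Rayleigh wave equation on a
% bounded interval with, e.g., Dirichlet conditions u(0,t) = u(\pi,t) = 0.
\[
  u_{tt} - u_{xx} = \varepsilon \Bigl( u_t - \tfrac{1}{3}\, u_t^{3} \Bigr),
  \qquad 0 < \varepsilon \ll 1 .
\]
```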
Chromatin immunoprecipitation on microarray (ChIP-chip) experiments are a powerful tool for the detection of in vivo protein-DNA binding activity essential to the regulation of gene expression. Coupled with newly introduced tiling microarrays that can interrogate select parts or even an entire genome with high resolution, ChIP-chip technology allows for the unbiased mapping of DNA-binding proteins throughout the genome. However, the increased resolution from tiling microarrays results in large, noisy, and correlated data sets that require powerful and computationally efficient analysis methods. In this chapter, we first introduce ChIP-chip technology and tiling microarrays. Then we discuss methods for analyzing ChIP-chip data to obtain a list of putative protein-DNA interaction regions. Furthermore, we consider relevant follow-up analyses such as correlating ChIP-chip data with gene expression and searching for known and novel DNA-binding motifs in the putative binding regions.
Assume that there are p samples and q metrics, which means the data has q dimensions. For the original accurate data, we employ a normalization method to transform them into numbers ranging from -1 to +1; for those indicators which cannot be presented as a number, we apply manual scoring on five levels: -1, -0.5, 0, +0.5, +1. Then, separately, we apply the Analytic Hierarchy Process and the Maximum Entropy Model to these secondary data. By multiplying the weights and the correlation coefficients, Level 3 data is obtained. Finally, Principal Component Analysis is carried out, and we can derive the function between the original data and the evaluation result. However, the connection is quite unstable if any data changes significantly. So we put forward a feedback system, which feeds the final result (as a number) back in as a new piece of original data. Apparently, this model still has some contingencies. In order to evaluate the error, we establish an implicit relationship between the original data and the final results using artificial neural network principles. Compared with the simulation without previous results, one may conclude that our model is robust and valid under most circumstances. Finally, we can work out the error percentage of our evaluation model.
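The first two stages (scaling each metric to [-1, +1] and combining the normalized metrics with weights) can be sketched as follows; the data and the uniform weights are placeholders, with the real weights coming from the AHP and Maximum Entropy steps.

```python
# Sketch of the first stages: scale each metric to [-1, +1], then combine
# the normalized metrics with weights into a single score per sample.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((30, 6)) * 100          # p = 30 samples, q = 6 metrics

lo, hi = data.min(axis=0), data.max(axis=0)
normalized = 2 * (data - lo) / (hi - lo) - 1        # each metric now in [-1, +1]

# weights would come from AHP / Maximum Entropy; uniform weights used here
weights = np.full(data.shape[1], 1 / data.shape[1])
scores = normalized @ weights
print(scores[:5].round(3))
```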