In biochemistry, tandem mass spectrometry (MS/MS) is the most common method for peptide and protein identification. One computational method for deriving a peptide sequence from MS/MS data, de novo sequencing, is becoming increasingly important in this area. However, de novo sequencing can usually determine only partial sequences with confidence, while the undetermined parts are represented by "mass gaps". We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched in a database for protein identification, the determined parts should match the database sequence exactly, while each mass gap should match a substring of amino acids whose masses add up to the value of the mass gap. In such a case, standard string matching algorithms no longer apply. In this paper, we present a new efficient algorithm for finding the matches of gapped sequence tags in a protein database.
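The matching condition described in this abstract can be sketched naively as follows. This is only an illustration of the matching problem, not the paper's efficient algorithm; the residue-mass table (a subset of monoisotopic masses) and the tolerance value are illustrative assumptions.

```python
# Hypothetical illustration: a gapped sequence tag mixes determined parts
# (strings, matched exactly) with mass gaps (floats, matched by any run of
# residues whose masses sum to the gap). Naive recursive check.

# Monoisotopic residue masses (subset, in Da)
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
        "D": 115.02694, "E": 129.04259, "K": 128.09496}

def matches_at(tag, seq, pos, tol=0.01):
    """Does the gapped tag match seq starting at position pos?"""
    if not tag:
        return True
    head, rest = tag[0], tag[1:]
    if isinstance(head, str):                  # determined part: exact match
        if seq.startswith(head, pos):
            return matches_at(rest, seq, pos + len(head), tol)
        return False
    total, i = 0.0, pos                        # mass gap: grow a residue run
    while i < len(seq) and total < head + tol:
        total += MASS[seq[i]]
        i += 1
        if abs(total - head) <= tol and matches_at(rest, seq, i, tol):
            return True
    return False

seq = "GPEPKSTLDE"
tag = ["PEP", MASS["K"] + MASS["S"], "TLDE"]   # gap equals mass of "KS"
print(any(matches_at(tag, seq, i) for i in range(len(seq))))  # True
```

Scanning every start position makes this quadratic at best, which is exactly the inefficiency the paper's dedicated algorithm is designed to avoid.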
Two formal languages are f-equivalent if their symmetric difference L1 Δ L2 is a finite set — that is, if they differ on only finitely many words. The study of f-equivalent languages, and particularly the DFAs that accept them, was recently introduced [3]. First, we restate the fundamental results in this new area of research. Second, we introduce our main result, which is a faster algorithm for the natural minimization problem: given a starting DFA D, find the smallest (by number of states) DFA D′ such that L(D) and L(D′) are f-equivalent. Finally, we suggest a technique that combines this hyper-minimization with the well-studied notion of a deterministic finite cover automaton [2, 4, 5], or DFCA, thereby extending the applicability of DFCAs from finite to infinite regular languages.
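The definition of f-equivalence is directly checkable: two DFAs are f-equivalent iff the product automaton that accepts exactly where the two disagree has a finite language, i.e. no useful state lies on a cycle. A minimal sketch, assuming a hypothetical DFA encoding as `(start, accepting_set, delta)` with a total transition dictionary (this is not the paper's hyper-minimization algorithm):

```python
def f_equivalent(d1, d2, alphabet):
    """True iff L(d1) and L(d2) differ on only finitely many words."""
    s1, f1, t1 = d1
    s2, f2, t2 = d2
    start = (s1, s2)
    seen, stack, edges = {start}, [start], {}
    while stack:                               # reachable product states
        p, q = stack.pop()
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            edges.setdefault((p, q), []).append(nxt)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # States where exactly one DFA accepts: the symmetric difference accepts.
    diff = {s for s in seen if (s[0] in f1) != (s[1] in f2)}
    co = set(diff)                             # states that can reach diff
    changed = True
    while changed:
        changed = False
        for s in seen:
            if s not in co and any(n in co for n in edges.get(s, [])):
                co.add(s)
                changed = True
    # Symmetric difference is finite iff the useful subgraph is acyclic.
    color = {}
    def acyclic(s):
        color[s] = 1                           # on the current DFS path
        for n in edges.get(s, []):
            if n in co and (color.get(n) == 1 or
                            (n not in color and not acyclic(n))):
                return False
        color[s] = 2
        return True
    return all(acyclic(s) for s in co if s not in color)

# a* vs (a* minus the single word "a"): they differ on one word -> True
t_all = {(0, 'a'): 0}
t_skip = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 2}
print(f_equivalent((0, {0}, t_all), (0, {0, 2}, t_skip), "a"))  # True
```

By contrast, a* and (aa)* differ on infinitely many words (all odd lengths), so the check returns False for them.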
The consequences of regular expression hashing as a means of finite state automaton reduction are explored, based on variations of Brzozowski's algorithm. In this approach, each hash collision results in the merging of the automaton's states, and it is subsequently shown that a super-automaton will always be constructed, regardless of the hash function used. Since a direct adaptation of the classical Brzozowski algorithm leads to a non-deterministic super-automaton, a new algorithm is put forward for constructing a deterministic FA. Approaches are proposed for measuring the quality of a hash function.
These ideas are empirically tested on a large sample of relatively small regular expressions and their associated automata, as well as on a small sample of relatively large regular expressions. Differences in the quality of tested hash functions are observed. Possible reasons for this are mentioned, but future empirical work is required to investigate the matter.
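The classical derivative construction on which this work builds can be sketched briefly. Below is a minimal derivative-based matcher over a tuple-encoded regex AST (an illustrative encoding, not the paper's implementation); in Brzozowski's automaton construction the states are such derivatives, and the hashing approach described above merges any two derivatives whose hashes collide.

```python
# Regex AST as tuples: ('0',) empty set, ('1',) empty word, ('c', x) a symbol,
# ('.', r, s) concatenation, ('+', r, s) union, ('*', r) Kleene star.

def nullable(r):
    """Does L(r) contain the empty word?"""
    op = r[0]
    if op in ('1', '*'): return True
    if op in ('0', 'c'): return False
    if op == '.': return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])    # union

def deriv(r, a):
    """Brzozowski derivative: the words w such that a.w is in L(r)."""
    op = r[0]
    if op in ('0', '1'): return ('0',)
    if op == 'c': return ('1',) if r[1] == a else ('0',)
    if op == '+': return ('+', deriv(r[1], a), deriv(r[2], a))
    if op == '*': return ('.', deriv(r[1], a), r)
    left = ('.', deriv(r[1], a), r[2])         # concatenation
    return ('+', left, deriv(r[2], a)) if nullable(r[1]) else left

def matches(r, word):
    for a in word:
        r = deriv(r, a)
    return nullable(r)

# (a+b)*a : all words over {a, b} ending in 'a'
r = ('.', ('*', ('+', ('c', 'a'), ('c', 'b'))), ('c', 'a'))
print(matches(r, "ba"), matches(r, "b"))   # True False
```

Without simplification the set of distinct derivative terms can grow along a word, which is precisely why identifying derivatives, here by hashing, is needed to obtain a small automaton.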
In 1964, Černý conjectured that a synchronizing DFA on n states always has a synchronizing word of length at most (n−1)², and he gave a sequence of DFAs for which this bound is reached. Until now, a full analysis of all DFAs reaching this bound was only given for n≤5, and with bounds on the number of symbols for n≤12. Here we give the full analysis for n≤7, without bounds on the number of symbols.
For PFAs (partial automata) on ≤7 states we do a similar analysis as for DFAs and find the maximal shortest synchronizing word lengths, exceeding (n−1)² for n≥4. Where DFAs with long synchronization typically have very few symbols, for PFAs we observe that more symbols may increase the synchronizing word length. For PFAs on ≤10 states and two symbols we investigate all occurring synchronizing word lengths.
We give series of PFAs on two and three symbols, reaching the maximal possible length for some small values of n. For n=6,7,8,9, the construction on two symbols is the unique one reaching the maximal length. For both series the growth is faster than (n−1)², although still quadratic.
Based on string rewriting, for arbitrary size we construct a PFA on three symbols with exponential shortest synchronizing word length, giving significantly better bounds than earlier exponential constructions. We give a transformation of this PFA to a PFA on two symbols keeping exponential shortest synchronizing word length, yielding a better bound than applying a similar known transformation. Both PFAs are transitive.
Finally, we show that exponential lengths are even possible with just one single undefined transition, again with transitive constructions.
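The kind of exhaustive analysis described above rests on computing shortest synchronizing words, which for small n can be done by breadth-first search over subsets of states (the standard power-set approach; exponential in n, but this is how small cases are checked — not the paper's full enumeration machinery):

```python
from collections import deque

def shortest_sync_word(n, delta):
    """Shortest synchronizing word of a DFA, or None if none exists.
    delta[i][q] = successor of state q under symbol i; symbols are written
    as their indices (fine for fewer than 10 symbols)."""
    full = frozenset(range(n))
    seen = {full}
    queue = deque([(full, "")])
    while queue:                        # BFS: first singleton found is optimal
        s, word = queue.popleft()
        if len(s) == 1:
            return word
        for i, row in enumerate(delta):
            t = frozenset(row[q] for q in s)
            if t not in seen:
                seen.add(t)
                queue.append((t, word + str(i)))
    return None

# Černý automaton C_4: symbol 0 cycles the states, symbol 1 maps 0 -> 1
# and fixes the rest; its shortest synchronizing word has length (4-1)^2.
n = 4
a = [(q + 1) % n for q in range(n)]     # [1, 2, 3, 0]
b = [1] + list(range(1, n))             # [1, 1, 2, 3]
w = shortest_sync_word(n, [a, b])
print(len(w))  # 9
```

For PFAs the same search applies with partial rows, dropping any subset on which some transition is undefined.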
For a formal language L, the problem of language enumeration asks to compute the length-lexicographically smallest word in L larger than a given input w∈Σ∗ (henceforth called the L-successor of w). We investigate this problem for regular languages from a computational complexity and state complexity perspective. We first show that if L is recognized by a DFA with n states, then 2^Θ(√(n log n)) states are (in general) necessary and sufficient for an unambiguous finite-state transducer to compute L-successors. As a byproduct, we obtain that if L is recognized by a DFA with n states, then 2^Θ(√(n log n)) states are sufficient for a DFA to recognize the subset S(L) of L composed of its lexicographically smallest words. We give a matching lower bound that holds even if S(L) is represented as an NFA. It was already known that L-successors can be computed in polynomial time, even if the regular language is given as part of the input (assuming a suitable representation of the language, such as a DFA). In this paper, we refine this result in multiple directions. We show that if the regular language is given as part of the input and encoded as a DFA, the problem is in NL. If the regular language L is fixed, we prove that the enumeration problem is reducible to deciding membership in the Myhill–Nerode equivalence classes of L under DLOGTIME-uniform AC0 reductions. In particular, this implies that fixed star-free languages can be enumerated in AC0, that arbitrary fixed regular languages can be enumerated in NC1, and that there exist regular languages for which the problem is NC1-complete.
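The polynomial-time computability of L-successors can be illustrated directly from a DFA. A sketch under an assumed `(start, accepting, delta)` encoding (this is a plain dynamic-programming illustration, not the transducer construction or the NL/NC1 algorithms of the paper; the `max_extra` cap on extra lengths is an illustrative assumption):

```python
def successor(dfa, w, alphabet, max_extra=8):
    """Length-lexicographically smallest word of L(dfa) strictly greater
    than w, searching at most max_extra additional lengths."""
    start, acc, delta = dfa
    alphabet = sorted(alphabet)

    def run(q, u):
        for a in u:
            q = delta[(q, a)]
        return q

    def smallest_from(q, length):
        """Lex-smallest accepted word of the given length from q, or None."""
        # can[l] = states from which some word of length l is accepted
        can = [set(acc)]
        for _ in range(length):
            prev = can[-1]
            can.append({p for (p, a) in delta if delta[(p, a)] in prev})
        if q not in can[length]:
            return None
        out = []
        for l in range(length, 0, -1):   # greedy: smallest viable symbol
            for a in alphabet:
                if delta[(q, a)] in can[l - 1]:
                    out.append(a)
                    q = delta[(q, a)]
                    break
        return "".join(out)

    n = len(w)
    # Same length, strictly greater: bump the latest possible position.
    for i in range(n - 1, -1, -1):
        q = run(start, w[:i])
        for a in alphabet:
            if a <= w[i]:
                continue
            tail = smallest_from(delta[(q, a)], n - i - 1)
            if tail is not None:
                return w[:i] + a + tail
    # Otherwise: smallest accepted word of each greater length.
    for l in range(n + 1, n + 1 + max_extra):
        u = smallest_from(start, l)
        if u is not None:
            return u
    return None

# L = a*b over {a, b}; state 2 is a dead state.
delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 2, (1, 'b'): 2,
         (2, 'a'): 2, (2, 'b'): 2}
dfa = (0, {1}, delta)
print(successor(dfa, "ab", "ab"))  # aab
```

Bumping the latest position first ensures the result shares the longest possible prefix with w, which makes it the length-lex smallest candidate of equal length.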
Applying statistical hypothesis testing, we investigate several nonlinear properties embedded in the return series of the Chinese Fund Market (CFM), which suggests the series is non-normal, autocorrelated and heteroskedastic. We then analyze the Hurst exponent of the return series on different timescales using the detrended fluctuation analysis (DFA) algorithm, and discuss the fractal behavior of the CFM. Furthermore, by studying the correlation of different weights in the volatility, we find that persistent long-range power-law correlation exists in the time series. We also provide hints that the above statistical properties are insensitive to the kind of fund, and may be irrelevant to the market phases. Our work may reveal the self-similarity characteristics of the financial market and offer a better understanding of the CFM.
We re-analyze historical daily atmospheric temperature time series to investigate long-range correlation. This problem is attracting much attention because assessing the statistical dependence of atmospheric phenomena on climatic scales is of deep importance in the context of climate modeling. In particular, we adopt detrended fluctuation analysis (DFA), one of the most widely used techniques for detecting scale invariance in stationary signals contaminated by external non-stationary disturbances. A standard application of this methodology seems to evidence persistent power-law exponents close to 0.65 on time scales greater than the meteorological one (>15 days). Nevertheless, more careful investigation reveals the local character of this exponent, whose value decays progressively with scale. Our results show that the simple detection of approximately straight lines in a log–log plot cannot be considered a signature of scale invariance, and local scale features have to be investigated explicitly.
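The DFA procedure used throughout these studies (integrate the series, detrend within windows of size s, and fit the fluctuation function F(s) ∝ s^α on a log–log plot) can be sketched compactly. A minimal version with order-1 (linear) detrending; the scale list is an illustrative choice:

```python
import numpy as np

def dfa_exponent(x, scales):
    """Detrended fluctuation analysis: scaling exponent alpha from a
    least-squares fit of log F(s) against log s."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())              # integrated profile
    F = []
    for s in scales:
        n = len(y) // s
        segs = y[:n * s].reshape(n, s)       # non-overlapping windows
        t = np.arange(s)
        msq = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)     # local linear trend
            msq.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(msq)))      # fluctuation function F(s)
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

rng = np.random.default_rng(0)
white = rng.standard_normal(4000)
scales = [16, 32, 64, 128, 256]
print(dfa_exponent(white, scales))           # close to 0.5 for white noise
```

Uncorrelated noise gives α ≈ 0.5, persistent long-range correlation gives α > 0.5 (as in the ≈0.65 reported above), and a random walk gives α ≈ 1.5, which is why the exponent serves as a correlation diagnostic.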
Analyzing the statistical features of fluctuation is remarkably significant for financial risk identification and measurement. In this study, the asymmetric detrended fluctuation analysis (A-DFA) method was applied to evaluate asymmetric multifractal scaling behaviors in the Shanghai and New York gold markets. Our findings showed that the multifractal features of the Chinese and international gold spot markets were asymmetric. The gold return series persisted longer in an increasing trend than in a decreasing trend. Moreover, the asymmetric degree of multifractals in the Chinese and international gold markets decreased with the increase in fluctuation range. In addition, the empirical analysis using sliding window technology indicated that multifractal asymmetry in the Chinese and international gold markets was characterized by its time-varying feature. However, the Shanghai and international gold markets basically shared a similar asymmetric degree evolution pattern. The American subprime mortgage crisis (2008) and the European debt crisis (2010) enhanced the asymmetric degree of the multifractal features of the Chinese and international gold markets. Furthermore, we also perform statistical tests on the multifractality and asymmetry results, and discuss their origin. Finally, results of the empirical analysis using the threshold autoregressive conditional heteroskedasticity (TARCH) and exponential generalized autoregressive conditional heteroskedasticity (EGARCH) models showed that good news had a more significant effect on the cyclical fluctuation of the gold market than bad news. Moreover, good news exerted a more significant effect on the Chinese gold market than on the international gold market.
The abundant asymmetric unit of the [FeBr4]2[py.H]3Br magnetic molecule in acetonitrile solvent was characterized via Debye function analysis (DFA) of the X-ray powder diffraction pattern from dilute solution. A dilute solution of the material in acetonitrile was prepared to reduce, as far as possible, the interaction between the molecular units. The X-ray diffraction from the sample was measured, and Debye function simulations of three out of ten chemically plausible molecular units were found to comply well with the experimental results. These three configurations were further optimized with first-principles methods in the framework of density functional theory (DFT), and the most stable structure according to the calculated total energy is presented.
It has been observed that Internet gateways employing the Transmission Control Protocol (TCP) and the Random Early Detection (RED) control algorithm may exhibit instability and oscillatory behavior. Most control methods proposed in the past have been based on analytical models that rely on statistical measurements of network parameters. In this paper, we apply the detrended fluctuation analysis (DFA) method to analyze the stability of the TCP-RED system. DFA is used to analyze time-series data and generate power-law scaling exponents, which indicate the long-range correlations of the time series. We quantify the stability of the TCP-RED system by examining the variation of the DFA power-law scaling exponent as the system parameters are varied. We also study the long-range power-law correlations of TCP window periods.
In this paper, the complexity-entropy causality plane approach is applied to analyze traffic data. The R/S analysis and detrended fluctuation analysis (DFA) methods are also used for comparison with this approach. Moreover, based on the concept of entropy, we propose using permutation patterns to calculate the probability distribution of the time series when constructing the representation plane. The empirical results indicate that traffic dynamics exhibit different levels of traffic congestion and demonstrate that this statistical method can give a more refined classification of traffic states than R/S analysis and DFA.
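The permutation-based probability distribution underlying this plane is the Bandt–Pompe construction: each window of d consecutive values is reduced to its ordinal pattern, and pattern frequencies define the distribution whose entropy is plotted. A minimal sketch (the embedding order d=3 is an illustrative default):

```python
import math
from collections import Counter

def permutation_distribution(x, d=3):
    """Bandt-Pompe distribution of ordinal patterns of order d."""
    counts = Counter()
    for i in range(len(x) - d + 1):
        window = x[i:i + d]
        # The pattern is the index permutation that sorts the window.
        pattern = tuple(sorted(range(d), key=lambda k: window[k]))
        counts[pattern] += 1
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def permutation_entropy(x, d=3):
    """Shannon entropy of the ordinal-pattern distribution, normalized
    by log(d!) so the result lies in [0, 1]."""
    probs = permutation_distribution(x, d).values()
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(d))
```

A monotone series produces a single pattern and hence entropy 0, while an irregular series spreads probability over many patterns and approaches 1; free-flowing versus congested traffic regimes would separate along exactly this axis.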
Based on the daily return and volatility series of the Chinese yuan (RMB)/US dollar (USD) exchange rate and the Shanghai Stock Composite Index, the time-varying long memories of the Chinese currency and stock markets are investigated by comprehensively using the rescaled range (R/S), modified R/S, and detrended fluctuation analysis methods. The results show that: (1) the efficiency of the Chinese currency market has not improved significantly, whereas the efficiency of the Chinese stock market has improved steadily; (2) volatility series present longer memory than return series in both the Chinese currency and stock markets; and (3) the time-varying Hurst exponent of the Chinese currency market is sensitive to the reform that enhanced the flexibility of the RMB exchange rate. Moreover, based on Granger causality and cointegration tests respectively, we find that a short-term bidirectional Granger causal relationship exists between the time-varying Hurst exponents of the Chinese currency and stock markets, but no long-run equilibrium relationship.
This paper aims to analyze whether the financial crises of the past 20 years have reduced efficiency, in its weak form, in 19 stock markets belonging to the 20 most developed economies (G-20). The sample period runs from 2 January 2000 to 5 February 2021 and covers the respective financial crises, namely, Dot-com, Argentina, Subprime, Sovereign debt, the China stock market crash (2015–2016), the UK's withdrawal from the European Union and the global pandemic of 2020. The results highlight that most markets show signs of (in)efficiency in each sliding window (1000 days), that is, they show asymmetries and non-Gaussian distributions, and αDFA≠0.5. These findings suggest that the random walk hypothesis is rejected in certain markets, which has implications for investors, since some returns may be expected, creating arbitrage and abnormal profit opportunities, contrary to the random walk and informational efficiency hypotheses. The results also give market regulators room to take steps to ensure better informational data across international financial markets.
This chapter uses four alternative ways to evaluate mutual fund performance and rank the domestic equity funds of three leading mutual fund companies: DFA, Fidelity, and Vanguard, while splitting the Fidelity and Vanguard funds into indexed and managed portfolios.
First, we use the Fama–French factors to show the extent to which various equally-weighted fund portfolios out-returned their empirically determined benchmarks, and characterize the funds according to size and book-to-market value. Our measure tweaks alpha, the extent to which a mutual fund outperforms its benchmark, and we offer a novel interpretation of alpha. Second, we use Sharpe’s (1962) style analysis to explore the robustness of our analysis. Third, we ask whether funds out-return the market. Fourth, we compare the fund returns with their published benchmarks and use Fama–French factors to adjust for bias in the benchmarks.
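The alpha referred to above is, in the standard Fama–French setting, the intercept of an OLS regression of a fund's excess returns on the factor returns. A minimal sketch with synthetic data (the factor series and injected coefficients here are fabricated for illustration only, not the chapter's data):

```python
import numpy as np

def fama_french_alpha(excess_ret, mkt, smb, hml):
    """Alpha = intercept of the OLS regression of fund excess returns on
    the market, size (SMB) and value (HML) factors."""
    X = np.column_stack([np.ones(len(mkt)), mkt, smb, hml])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    return coef[0]          # coef[1:] are the factor loadings

# Synthetic check: inject a known monthly alpha and recover it.
rng = np.random.default_rng(1)
mkt, smb, hml = (rng.normal(0, 0.04, 240) for _ in range(3))
ret = 0.002 + 1.1 * mkt + 0.3 * smb - 0.2 * hml + rng.normal(0, 0.001, 240)
alpha = fama_french_alpha(ret, mkt, smb, hml)
```

The recovered intercept lands close to the injected 0.002 per period; the SMB and HML loadings are also what characterize a fund by size and book-to-market, as described above.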
Return relative to the stock market reflects style choice, benchmark choice, fees, transaction efficiency, and income from lending to short sellers. Fama–French and Sharpe style adjustment abstract from style choice. Comparison with style-adjusted published benchmarks further abstracts from benchmark choice. Each calculation answers a different question.
The Fama–French and Sharpe methods, recorded in Table 6, show that after style adjustment for Fidelity 2001–2018 the indexed portfolio out-returned the managed portfolio. The indexed and managed portfolios from Vanguard and the managed portfolio from Fidelity 2001–2019 out-returned the U.S. stock market as did the Fidelity indexed portfolio 2005–2018. In comparison with the style-adjusted published benchmarks, the Vanguard managed portfolio is the best and the Fidelity managed portfolio is the worst.