In this paper a novel approach is introduced for modeling and clustering gene expression time series. Radial basis function neural networks are used to produce a generalized, smooth characterization of the expression time series. A co-expression coefficient is defined to evaluate the similarity of the models based on their temporal shapes and the distribution of the time points. The profiles are grouped using a fuzzy clustering algorithm that incorporates the proposed co-expression coefficient as its metric. Results on artificial and real data illustrate the advantages of the metric and the method in grouping temporal profiles. The proposed metric is also compared with the commonly used correlation coefficient under the same procedures, and the results show that the proposed method produces more biologically relevant clusters.
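As a rough illustration of the modeling step, the sketch below fits a Gaussian radial basis function model to two irregularly sampled profiles by least squares, then scores their similarity with a correlation of the smoothed models weighted toward densely sampled regions. The function `rbf_fit`, the weighting scheme, and the score itself are illustrative stand-ins, not the paper's exact co-expression coefficient.

```python
import numpy as np

def rbf_fit(t, y, centers, width):
    """Least-squares fit of a Gaussian radial basis function model y(t)."""
    Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return lambda tq: np.exp(-((np.asarray(tq)[:, None] - centers[None, :]) ** 2)
                             / (2 * width ** 2)) @ w

# Two irregularly sampled expression profiles
t = np.array([0.0, 1.0, 2.5, 4.0, 6.0, 9.0, 12.0])
y1 = np.sin(t / 2.0) + 0.1 * np.random.randn(len(t))
y2 = np.sin(t / 2.0 + 0.2) + 0.1 * np.random.randn(len(t))

centers = np.linspace(t.min(), t.max(), 5)
f1, f2 = rbf_fit(t, y1, centers, 1.5), rbf_fit(t, y2, centers, 1.5)

# Illustrative co-expression score: correlation of the smoothed models on a
# dense grid, weighted toward regions where the original time points are dense.
grid = np.linspace(t.min(), t.max(), 200)
weights = np.exp(-np.min(np.abs(grid[:, None] - t[None, :]), axis=1))
m1, m2 = f1(grid), f2(grid)
m1 = m1 - np.average(m1, weights=weights)
m2 = m2 - np.average(m2, weights=weights)
coexpr = np.average(m1 * m2, weights=weights) / np.sqrt(
    np.average(m1 ** 2, weights=weights) * np.average(m2 ** 2, weights=weights))
print(f"co-expression score: {coexpr:.3f}")
```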
We propose a novel neural architecture dedicated to the prediction of time series. It can be considered an adaptation of the Generative Query Network (GQN) idea to sequential data. The new approach, dubbed the Recurrent Generative Query Network (RGQN), allows for efficient prediction of time series. The predictor information (i.e., the independent variable) is one or more other time series that stand in some relationship with the predicted sequence. Each time series is accompanied by additional meta-information reflecting its selected properties. This meta-information, together with the standard dynamic component, is fed simultaneously into a recurrent neural network (RNN). During the inference phase, the meta-information becomes a query reflecting the expected properties of the predicted time series. The proposed idea is illustrated with use cases of strong practical relevance. In particular, we discuss the example of an industrial pipeline that transports liquid media. The trained RGQN model is applied to predict pressure signals, assuming that training was carried out under routine operational conditions. Comparing the prediction with data gathered under extraordinary circumstances, e.g. during a leakage, yields a characteristic residual distribution of the prediction. This information can be applied directly within a data-driven Leak Detection and Location framework. The RGQN approach applies not only to pressure time series but also to many other use cases in which a sequential quantity is accompanied by a meta-descriptor.
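A minimal sketch of the core idea, assuming a GRU that receives the meta-descriptor concatenated with every time step of the predictor series; the architecture, layer sizes, and names below are guesses for illustration, not the published RGQN.

```python
import torch
import torch.nn as nn

class MetaConditionedRNN(nn.Module):
    """A GRU predictor conditioned on a per-series meta-descriptor.

    The meta vector (the "query") is concatenated with every time step of
    the predictor series, loosely mirroring how RGQN feeds meta-information
    alongside the dynamic component. Sizes and names are illustrative.
    """
    def __init__(self, n_inputs, meta_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_inputs + meta_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, meta):
        # x: (batch, time, n_inputs); meta: (batch, meta_dim)
        meta_seq = meta[:, None, :].expand(-1, x.size(1), -1)
        h, _ = self.gru(torch.cat([x, meta_seq], dim=-1))
        return self.head(h).squeeze(-1)        # (batch, time)

model = MetaConditionedRNN(n_inputs=2, meta_dim=3)
x = torch.randn(8, 100, 2)      # e.g. two related pressure signals
meta = torch.randn(8, 3)        # query: expected properties of the target
pred = model(x, meta)           # predicted pressure series
print(pred.shape)               # torch.Size([8, 100])
```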
Deep learning for automated interictal epileptiform discharge (IED) detection has been topical, with many papers published in recent years. All existing works viewed EEG signals as time series and developed specific models for IED classification; however, general time-series classification (TSC) methods were not considered. Moreover, none of these methods were evaluated on any public datasets, making direct comparisons challenging. This paper explored two state-of-the-art convolution-based TSC algorithms, InceptionTime and Minirocket, for IED detection. We fine-tuned and cross-evaluated them on a public dataset (Temple University Events, TUEV) and two private datasets, and provide ready metrics for benchmarking future work. We observed that the optimal parameters correlated with the clinical duration of an IED, achieving the best area under the precision-recall curve (AUPRC) of 0.98 and F1 of 0.80 on the two private datasets, respectively. The AUPRC and F1 on the TUEV dataset were 0.99 and 0.97, respectively. While algorithms trained on the private sets maintained their performance when tested on the TUEV data, those trained on TUEV did not generalize well to the private data. These results stem from differences in the class distributions across datasets and indicate a need for public datasets with a better diversity of IED waveforms, background activities and artifacts, to facilitate standardization and benchmarking of algorithms.
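For benchmarking along the same lines, AUPRC and F1 can be computed with scikit-learn; the sketch below uses placeholder labels and scores in place of a trained TSC model's output.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# y_true: 1 = IED event, 0 = background; y_score: model probabilities.
# Placeholder arrays stand in for a trained classifier's output.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.7 + rng.normal(0.3, 0.25, size=1000), 0, 1)

auprc = average_precision_score(y_true, y_score)    # AUPRC as reported above
f1 = f1_score(y_true, (y_score >= 0.5).astype(int)) # F1 at a 0.5 threshold
print(f"AUPRC={auprc:.3f}  F1={f1:.3f}")
```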
Forecasting the evolution of industrial processes can be useful for discovering faults. Several techniques based on time-series analysis are used to forecast the evolution of certain critical variables; however, the number of variables makes the analysis difficult. Dimensionality reduction techniques such as the Self-Organizing Map (SOM) therefore allow us to work with fewer data to determine the evolution of the process. The SOM is a data-mining technique widely used for supervision and monitoring. Since the SOM projects data from a high-dimensional space onto a 2-D map, it reduces the number of variables. Time series built from the variables of the low-dimensional projection can thus be created to ease the prediction of future values for fault detection.
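A minimal sketch of the projection step, using the third-party MiniSom package as a stand-in SOM implementation: each high-dimensional time step is mapped to its best-matching unit, and the resulting 2-D trajectory is the low-dimensional time series to forecast.

```python
import numpy as np
from minisom import MiniSom   # third-party package: pip install minisom

# Synthetic process data: 500 time steps of 20 correlated variables.
rng = np.random.default_rng(1)
latent = np.cumsum(rng.normal(size=(500, 2)), axis=0)
X = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(500, 20))

som = MiniSom(10, 10, input_len=20, sigma=1.5, learning_rate=0.5,
              random_seed=1)
som.train(X, num_iteration=5000)

# Project each time step onto the 2-D map: the trajectory of best-matching
# units is the low-dimensional time series used for prediction/fault detection.
trajectory = np.array([som.winner(x) for x in X])   # shape (500, 2)
print(trajectory[:5])
```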
This paper is concerned with the case of an exogenous system in which a model is required to forecast a periodic output time series using a causal input. A novel approach is developed in which the wavelet packet transform is taken of both the dependent time series and the causal input. This results in two sets of basis dictionaries, from which two bases must be chosen. It is proposed that the best bases to choose are those that maximize the mutual information. Input selection is then implemented by eliminating those coefficients of the selected input basis with low mutual information. As an example, a model is constructed to forecast short-term electrical demand.
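A toy sketch of the selection idea, assuming PyWavelets for the wavelet packet transform and scikit-learn's mutual information estimator; the fixed-level node set and the crude target alignment below simplify the paper's best-basis search.

```python
import numpy as np
import pywt
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
x = rng.normal(size=512)                       # causal input
y = np.convolve(x, np.ones(8) / 8, mode="same") + 0.2 * rng.normal(size=512)

# Wavelet packet decomposition of the input; one candidate basis is the set
# of nodes at a fixed level (the paper's best-basis search is broader).
wp = pywt.WaveletPacket(x, wavelet="db4", maxlevel=3)
coeffs = {node.path: node.data for node in wp.get_level(3, order="natural")}

# Rank nodes by mutual information with a (crudely aligned) target and keep
# the top ones; low-MI coefficients would be eliminated.
target = y[::8][: len(next(iter(coeffs.values())))]
mi = {p: mutual_info_regression(c[: len(target), None], target)[0]
      for p, c in coeffs.items()}
selected = sorted(mi, key=mi.get, reverse=True)[:4]
print("selected packet nodes:", selected)
```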
We analyze variance, skewness and kurtosis risk premia and their option-implied and realized components as predictors of excess market returns and of the cross-section of stock returns. We find that the variance risk premium is the only moment-based variable to predict S&P 500 index excess returns, with a monthly out-of-sample R2 above 6% for the period between 2001 and 2014. Nonetheless, all aggregate moment-based variables are effective in predicting the cross-section of stock returns. Self-financing portfolios that are long the stocks least exposed to the aggregate moment-based variable and short the stocks most exposed to it achieve positive and significant Carhart 4-factor alphas and a considerably higher Sharpe ratio than the S&P 500 index, with positive skewness.
This paper introduces a new functional optimization approach to portfolio optimization problems by treating the unknown weight vector as a function of past values, instead of as a vector of fixed unknown coefficients, as in the majority of studies. We first show that the optimal solution is, in general, not a constant function. We give the optimality conditions for a vector function to be the solution, and hence the conditions for a plug-in solution (one replacing the unknown mean and variance by estimates based on past values) to be optimal. After showing that plug-in solutions are in general sub-optimal, we propose gradient-ascent algorithms to solve the functional optimization for mean–variance portfolio management, with convergence theorems provided. Simulations and empirical studies show that our approach can perform significantly better than the plug-in approach.
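A minimal numeric sketch contrasting gradient ascent with the closed-form plug-in solution on the standard mean–variance objective f(w) = w'mu - (gamma/2) w'Sigma w; the constant-weight, unconstrained setting is a simplification of the paper's functional formulation, where w may depend on past values.

```python
import numpy as np

def gradient_ascent_weights(mu, Sigma, gamma=5.0, lr=200.0, steps=2000):
    """Gradient ascent on f(w) = w @ mu - (gamma / 2) * w @ Sigma @ w.

    The large step size is safe here only because the curvature
    gamma * Sigma of this synthetic problem is tiny."""
    w = np.zeros_like(mu)
    for _ in range(steps):
        w = w + lr * (mu - gamma * Sigma @ w)   # gradient step
    return w

rng = np.random.default_rng(3)
returns = rng.normal(0.001, 0.02, size=(500, 4))   # synthetic daily returns
mu, Sigma = returns.mean(axis=0), np.cov(returns.T)

# Closed-form plug-in optimum: w* = (1 / gamma) * inv(Sigma) @ mu.
w_plugin = np.linalg.solve(Sigma, mu) / 5.0
w_ga = gradient_ascent_weights(mu, Sigma)
print(np.allclose(w_plugin, w_ga))   # True: ascent recovers the constant optimum
```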
Several researchers have recently shown an interaction between macroeconomic variables and stock returns. Most of these studies have concentrated on interest rates and inflation. These and other variables, of course, also influence the debt markets. Other variables that can influence the debt and equity markets include employment information. On the first Friday of each month, the government releases its employment report for the previous month. Strong growth in employment generally bodes well for economic output and growth in the economy. Any inflation and interest rate implications of a strong employment report will ultimately be reflected in bond and stock prices. It is generally observed that if payroll employment growth is moderately strong, prices in the bond market drop while prices in the stock market rise. The empirical evidence presented supports these observations. This study documents the reaction of the bond and stock markets to the employment reports. As the unemployment rate rises, so do the bond and the stock markets.
Recently, a new technique for generating linear surrogates of multichannel data was introduced. This technique, referred to as the coherent digitally filtered surrogate method, preserves both the individual power spectra and the coherence function of the original data. The method is somewhat limited in that it can only be applied to multichannel data whose cross-spectrum is real. We present here an alteration to this algorithm that can be used to test any linear-correlation model with an arbitrary complex cross-spectrum. This is of particular interest for experiments in which two channels are coupled with some time lag and it is necessary to determine whether the coupling is linear. We also demonstrate that this algorithm, used together with synchronization analysis, provides a much better estimate of the degree of phase-locking between two signals than the coherence-analysis techniques traditionally used in neuroscience.
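A sketch of the underlying trick, assuming the standard multichannel phase-randomization recipe: rotating every channel's spectrum by the same random phases preserves the individual power spectra and the full complex cross-spectrum. Details of the published algorithm may differ.

```python
import numpy as np

def multichannel_surrogate(data, rng):
    """Phase-randomized surrogates of multichannel data.

    Applying the SAME random phases to every channel preserves each power
    spectrum and the complex cross-spectrum (hence the coherence), in the
    spirit of the algorithm discussed above. Illustrative sketch only.
    """
    n_channels, n = data.shape
    spectra = np.fft.rfft(data, axis=1)
    phases = rng.uniform(0, 2 * np.pi, size=spectra.shape[1])
    phases[0] = 0.0                      # keep the mean untouched
    if n % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist bin real
    return np.fft.irfft(spectra * np.exp(1j * phases), n=n, axis=1)

rng = np.random.default_rng(4)
t = np.arange(1024)
x = np.sin(0.1 * t) + rng.normal(0, 0.3, 1024)
y = np.sin(0.1 * t - 0.7) + rng.normal(0, 0.3, 1024)   # lagged linear coupling
surr = multichannel_surrogate(np.vstack([x, y]), rng)
print(surr.shape)   # (2, 1024)
```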
Stock forecasting technology is a perennially popular research topic because accurate forecasts enable profitable investments and have social impact. Based on past research, we postulate three major drawbacks of using time series to forecast stock prices: (1) a simple time-series model provides insufficient explanations of the inner and external interactions of the stock market; (2) the variables of a time series are assumed to be strictly stationary, but economic time series are usually nonlinear or nonstationary; and (3) the forecasting factors of a multivariate time series are selected based on the researcher's knowledge, which is a "subjective" way to construct a forecasting model. Therefore, this paper proposes a causal time-series model to select forecasting factors and builds a machine learning forecast model. The Granger causality test is utilized first in the proposed model to select the critical factors from technical indicators and market indexes; next, multilayer perceptron regression (MLPR) is employed to construct the forecasting model. We collected financial data over a 13-year period (2003 to 2015) of the Taiwan stock index (TAIEX) as experimental datasets. The root mean square error (RMSE) was used as the performance indicator, and five forecasting models served as comparison models. The results reveal that the proposed model outperforms the comparison models in forecasting accuracy, and that the three key indicators LAG1, S&P500 and DJIA are critical factors in all 11 of our sliding time windows (T1–T11). We offer these results to investors to aid their decision-making processes.
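A compact sketch of the two-stage pipeline on synthetic data, assuming statsmodels' Granger causality test and scikit-learn's MLPRegressor; the variable names, lags, and train/test split are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
n = 400
sp500 = np.cumsum(rng.normal(size=n))
taiex = np.roll(sp500, 1) * 0.8 + np.cumsum(rng.normal(0, 0.5, n))

# Stage 1: Granger causality test (does sp500 help predict taiex?).
df = pd.DataFrame({"taiex": np.diff(taiex), "sp500": np.diff(sp500)})
res = grangercausalitytests(df[["taiex", "sp500"]], maxlag=2, verbose=False)
p_value = res[1][0]["ssr_ftest"][1]
print(f"Granger p-value at lag 1: {p_value:.4f}")

# Stage 2: MLP regression on the selected factor plus LAG1 of the target.
X = np.column_stack([df["taiex"].shift(1), df["sp500"].shift(1)])[1:]
y_target = df["taiex"].values[1:]
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X[:-50], y_target[:-50])
rmse = np.sqrt(np.mean((mlp.predict(X[-50:]) - y_target[-50:]) ** 2))
print(f"out-of-sample RMSE: {rmse:.4f}")
```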
Shapelets are subsequences of time series that represent local patterns and can improve both the accuracy and the interpretability of time-series classification. The major task in shapelet-based time-series classification is to discover high-quality shapelets. This is challenging, however, because local patterns may occur at various scales/lengths rather than at one unified scale. In this paper, we resolve this problem by discovering shapelets at multiple scales. We propose a novel Multi-Scale Shapelet Discovery (MSSD) algorithm that discovers expressive multi-scale shapelets by extending initial single-scale shapelets (i.e., shapelets with a unified scale). MSSD adopts a bi-directional extension process and robustly extends single-scale shapelets obtained by different methods. A supervised shapelet quality measure is further developed to evaluate the extension of shapelets. Comprehensive experiments on 25 UCR time-series datasets show that multi-scale shapelets discovered by MSSD improve classification accuracy by around 10% on average, compared with single-scale shapelets discovered by counterpart methods.
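To make the mechanics concrete, the sketch below shows the standard z-normalized shapelet matching distance and a toy bi-directional extension step; MSSD's supervised quality measure is not reproduced here.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum z-normalized Euclidean distance between a shapelet and all
    same-length subsequences of a series (the standard shapelet match)."""
    m = len(shapelet)
    s = (shapelet - shapelet.mean()) / (shapelet.std() + 1e-8)
    best = np.inf
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        w = (w - w.mean()) / (w.std() + 1e-8)
        best = min(best, np.sqrt(np.mean((w - s) ** 2)))
    return best

def extend(seed_start, seed_len, series, step=2):
    """Bi-directional extension in the spirit of MSSD: grow a seed shapelet
    left and right; MSSD would keep the extension only if a supervised
    quality measure improves."""
    return series[max(0, seed_start - step): seed_start + seed_len + step]

rng = np.random.default_rng(6)
ts = rng.normal(size=200)
ts[80:100] += np.hanning(20) * 3          # embedded local pattern
seed = ts[85:95]                          # single-scale seed shapelet
print(shapelet_distance(ts, seed), shapelet_distance(ts, extend(85, 10, ts)))
```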
This paper presents a forecasting model for the mortality rates of COVID-19 in six of the most affected countries, based on a hybrid of the Genetic Algorithm and the Autoregressive Integrated Moving Average model (GA-ARIMA). The aim was to develop an advanced, reliable predictive model providing future forecasts of possible confirmed cases and mortality rates (total deaths per 1 million population of COVID-19) that could help public health authorities develop the plans required to resolve the crisis of the pandemic threat in a timely and efficient manner. The study focused on predicting the mortality rates of COVID-19 because the mortality rate determines the prevalence of highly contagious diseases. The Genetic Algorithm (GA) improves the forecasting performance of the ARIMA model by optimizing the ARIMA model's parameters. The findings of this study reveal the high prediction accuracy of the proposed GA-ARIMA model. Moreover, it provided better and more consistent predictions than the traditional ARIMA model and can be a reliable method for predicting expected death rates as well as confirmed cases of COVID-19. Hence, it was concluded that combining ARIMA with GA is more accurate than ARIMA alone, and that GA is an effective alternative for finding the parameters and model orders of the ARIMA model.
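A toy version of the hybrid search, assuming AIC as the GA fitness (the paper's fitness criterion may differ) and statsmodels' ARIMA; the population size and genetic operators are kept deliberately small.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
y = np.cumsum(rng.normal(0.5, 1.0, 200))     # stand-in for a mortality series

def fitness(order):
    """Negative AIC of an ARIMA fit; -inf if the model fails to fit."""
    try:
        return -ARIMA(y, order=order).fit().aic
    except Exception:
        return -np.inf

def ga_search(pop_size=8, generations=5, max_order=3):
    pop = [tuple(rng.integers(0, max_order + 1, 3)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.choice(len(parents), 2, replace=False)
            child = [parents[a][i] if rng.random() < 0.5 else parents[b][i]
                     for i in range(3)]                       # crossover
            if rng.random() < 0.3:                            # mutation
                child[rng.integers(0, 3)] = int(rng.integers(0, max_order + 1))
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)

print("best (p, d, q):", ga_search())
```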
Higher organisms possess many genes that cycle under normal conditions, allowing the organism to adapt to expected environmental conditions throughout the course of a day. However, treatment-induced disruption of regular cyclic gene expression patterns presents a significant challenge in novel gene discovery experiments, because these disruptions can induce strong differential regulation events for genes that are not involved in an adaptive response to the treatment. To address this cycle disruption problem, we review the state-of-the-art periodic pattern detection algorithms and a pattern decomposition algorithm (PRIISM), a knowledge-based Fourier analysis algorithm designed to distinguish cyclic patterns from the remaining gene expression patterns, and discuss potential future improvements.
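A crude illustration of the decomposition idea: a Fourier band filter around a known period (e.g., 24 h) separates the cyclic component from the treatment response. PRIISM's knowledge-based algorithm is more elaborate than this sketch.

```python
import numpy as np

def split_cyclic(y, dt, period, band=0.2):
    """Split a profile into a cyclic component near a known period and a
    residual, via a crude Fourier band filter. Illustrative only."""
    freqs = np.fft.rfftfreq(len(y), dt)
    Y = np.fft.rfft(y)
    target = 1.0 / period
    mask = np.abs(freqs - target) < band * target
    cyclic = np.fft.irfft(Y * mask, n=len(y))
    return cyclic, y - cyclic

t = np.arange(0, 96, 2.0)                      # 4 days, 2-hour sampling
y = np.sin(2 * np.pi * t / 24) + 0.02 * t      # circadian cycle + treatment drift
cyclic, response = split_cyclic(y, dt=2.0, period=24.0)
print(np.round(response[:5], 3))               # drift survives, cycle removed
```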
Clusters of time-series data may change location and membership over time; in gene expression data, this occurs as groups of genes or samples respond differently to stimuli or experimental conditions at different times. To uncover this underlying temporal structure, we consider dynamic clusters with time-dependent parameters that split and merge over time, enabling cluster memberships to change. These time-dependent structures are useful in understanding the development of organisms or complex organs, and could not be identified using traditional clustering methods. In cell cycle data, they may provide links between genes and stages of the cell cycle, whilst in developmental data sets they may highlight key developmental transitions.
Influenza is a communicable respiratory illness that can cause serious public health hazards. Because of its huge threat to the community, accurate forecasting of influenza-like illness (ILI) can diminish the impact of an influenza season by enabling early public health interventions. Machine learning models are increasingly being applied in infectious disease modelling but are limited in their performance, particularly over longer forecasting windows. This paper proposes a novel time-series forecasting method, Randomized Ensembles of Auto-regression Chains (Reach), which implements an ensemble of random chains for multistep time-series forecasting. The new approach is evaluated on ILI case counts in Auckland, New Zealand, from 2015 to 2018 and compared with other standard methods. The results demonstrate that the proposed method outperformed the baseline methods on this ILI time-series forecasting problem.
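A rough approximation of the idea, using scikit-learn's RegressorChain with random orders as a stand-in for Reach's auto-regression chains; the synthetic data, lag count, and horizon are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

def make_supervised(y, n_lags=8, horizon=4):
    """Lagged inputs -> multistep targets for direct multistep forecasting."""
    X, Y = [], []
    for i in range(n_lags, len(y) - horizon + 1):
        X.append(y[i - n_lags:i])
        Y.append(y[i:i + horizon])
    return np.array(X), np.array(Y)

rng = np.random.default_rng(8)
t = np.arange(500)
ili = 50 + 30 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 3, 500)  # weekly counts

X, Y = make_supervised(ili)
# Ensemble of chains, each linking the horizon steps in a random order.
chains = [RegressorChain(Ridge(), order="random", random_state=s)
          .fit(X[:-50], Y[:-50]) for s in range(10)]
pred = np.mean([c.predict(X[-50:]) for c in chains], axis=0)
rmse = np.sqrt(np.mean((pred - Y[-50:]) ** 2))
print(f"4-step RMSE: {rmse:.2f}")
```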
We present techniques for the analysis and retrieval of time series under noisy conditions. This is an important topic because data obtained from various sensors (examples include GPS data and video tracking data) are typically noisy, and the performance of previously used measures generally degrades under noise. Here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS) that are very robust to noise and, furthermore, provide an intuitive notion of similarity between time series by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as is global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare the new methods with the widely used Euclidean and Time Warping distance functions (on real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest-neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
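A direct sketch of an LCSS-based similarity, with a value threshold epsilon and a time-warping window delta, normalized to [0, 1]; the parameter values are illustrative.

```python
import numpy as np

def lcss_similarity(a, b, epsilon=0.5, delta=5):
    """Longest Common Subsequence similarity for 1-D time series.

    Points match when they are within `epsilon` in value and `delta` in
    time index; the score is the match count normalized by the shorter
    length, so it stays in [0, 1] and down-weights noisy outliers.
    """
    n, m = len(a), len(b)
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(max(1, i - delta), min(m, i + delta) + 1):
            if abs(a[i - 1] - b[j - 1]) < epsilon:
                L[i, j] = L[i - 1, j - 1] + 1
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    return L[n, m] / min(n, m)

rng = np.random.default_rng(9)
t = np.linspace(0, 4 * np.pi, 100)
clean = np.sin(t)
noisy = np.sin(t) + np.where(rng.random(100) < 0.1, rng.normal(0, 5, 100), 0)
print(f"LCSS similarity: {lcss_similarity(clean, noisy):.2f}")
```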
The increasing number of knowledge-based systems built on Bayesian networks and dynamic Bayesian networks attests to the usefulness of these frameworks for addressing complex real-life problems. The usually large number of probabilities required for their application, however, is often considered a major obstacle, which the use of qualitative abstractions may to some extent remove. Qualitative Bayesian networks and associated algorithms have been developed before. Building on qualitative Bayesian networks, this paper presents three definitions for time-series dynamic Bayesian networks with feedback loops and for qualitative time-series dynamic Bayesian networks, defining qualitative influences between adjacent time slices analogously. We then apply qualitative dynamic Bayesian networks with feedback loops to an economic example and successfully make qualitative decisions.
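The qualitative influences behind such networks can be manipulated with a simple sign algebra (in the style of qualitative probabilistic networks): signs multiply along a chain of influences and add across parallel chains, with '?' for ambiguity. A minimal sketch of propagation between adjacent time slices; the paper's definitions are richer.

```python
# Sign algebra for qualitative influences: '+' (increases), '-' (decreases),
# '0' (no influence), '?' (ambiguous). Minimal illustrative sketch.

MUL = {('+', '+'): '+', ('+', '-'): '-', ('-', '+'): '-', ('-', '-'): '+'}
ADD = {('+', '+'): '+', ('-', '-'): '-', ('+', '-'): '?', ('-', '+'): '?'}

def s_mul(a, b):
    """Combine signs along a chain of influences."""
    if '0' in (a, b):
        return '0'
    if '?' in (a, b):
        return '?'
    return MUL[(a, b)]

def s_add(a, b):
    """Combine signs of parallel influence chains."""
    if a == '0':
        return b
    if b == '0':
        return a
    if '?' in (a, b):
        return '?'
    return ADD[(a, b)]

# Example: investment(t) -> growth(t+1) is '+', tax(t) -> growth(t+1) is '-'.
# An increase in both investment and tax yields an ambiguous net effect:
print(s_add(s_mul('+', '+'), s_mul('+', '-')))   # '?'
```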
This paper presents a procedure for measuring a quantity when the data are serially correlated. In this situation, the statistical method of determining the standard uncertainty that assumes independence is inaccurate and usually anti-conservative. The procedure presented here involves calculating (a) an alternative estimate of the standard deviation of the sample mean, for use as the standard uncertainty, and (b) an associated number of degrees of freedom. Simulations for large and small samples show the procedure to improve substantially on the method that assumes independence, with the amount of improvement increasing with sample size.
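One common remedy in this spirit (not necessarily the paper's exact procedure) inflates the naive s/sqrt(n) using the sample autocorrelations, i.e., an effective-sample-size correction:

```python
import numpy as np

def autocorrelated_uncertainty(x, max_lag=None):
    """Standard uncertainty of the mean for serially correlated data,
    via an effective-sample-size correction from the sample ACF."""
    n = len(x)
    max_lag = max_lag or n // 4
    xc = x - x.mean()
    acf = np.array([np.dot(xc[:-k], xc[k:]) / np.dot(xc, xc)
                    for k in range(1, max_lag + 1)])
    # Sum positive-lag correlations until they first turn negative.
    cutoff = np.argmax(acf < 0) if np.any(acf < 0) else len(acf)
    n_eff = n / (1 + 2 * acf[:cutoff].sum())
    return x.std(ddof=1) / np.sqrt(n_eff), n_eff

rng = np.random.default_rng(10)
x = np.zeros(500)
for i in range(1, 500):               # AR(1) data: strong serial correlation
    x[i] = 0.8 * x[i - 1] + rng.normal()
u, n_eff = autocorrelated_uncertainty(x)
print(f"naive u: {x.std(ddof=1) / np.sqrt(len(x)):.3f}  "
      f"corrected u: {u:.3f}  n_eff: {n_eff:.0f}")
```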