Recently, much interest has been generated in speech recognition systems based on hybrids of Hidden Markov Models (HMMs) and neural networks (NNs). Such systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we establish one more relation between the HMM and NN paradigms by introducing the time-warping network (TWN), a generalization of both an HMM-based recognizer and a backpropagation net. The basic element of such a network, the time-warping neuron, extends the operation of the formal neuron of a backpropagation network by warping the input pattern to match it optimally to its weights. We show that a single-layer network of time-warping neurons is equivalent to a Gaussian-density HMM-based recognition system. This equivalent neural representation suggests ways to improve the discriminative power of the system: backpropagation discriminative training, and/or generalizing the structure of the recognizer to a multi-layer net. The performance of the proposed network was evaluated on a highly confusable, isolated-word, multi-speaker recognition task. The results indicate that not only does recognition performance improve, but the separation between classes is also enhanced, allowing us to set up a rejection criterion that improves the confidence of the system.
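As a rough illustration of the alignment step inside a time-warping neuron, the sketch below (Python, not the authors' code) scores an input pattern against a weight sequence by the cost of its optimal dynamic-time-warping alignment rather than a fixed position-by-position comparison; the sequences, squared-error cost, and function name are illustrative assumptions.

```python
import numpy as np

def dtw_distance(x, w):
    """Classic dynamic time warping between an input pattern x and a
    weight sequence w: the score is the cost of the best monotonic
    alignment, so stretched or compressed inputs can still match."""
    n, m = len(x), len(w)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - w[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A pattern that is a time-stretched copy of the weights (e.g. `[1, 2, 3]` against `[1, 2, 2, 3]`) aligns at zero cost, which a fixed-length dot product cannot do.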
This paper proposes an indexing technique for fast retrieval of similar image subsequences using the multi-dimensional time warping distance. The time warping distance is a more suitable similarity measure than the Lp distance in many applications where sequences may have different lengths and/or different sampling rates. Our indexing scheme employs a disk-based suffix tree as an index structure and uses a lower-bound distance function to filter out dissimilar subsequences without false dismissals. It applies normalization for easier control of the relative weighting of feature dimensions, and discretization to compress the index tree. Experiments on medical and synthetic image sequences verified that the proposed method significantly outperforms the naïve method and scales well to large image sequence databases.
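The no-false-dismissals pruning idea can be illustrated with a much simpler lower bound than the paper's own index-specific one. The sketch below uses an LB_Kim-style bound (any warping path must align the first elements and the last elements, so their costs underestimate the true time warping distance); the function names and threshold are hypothetical and stand in for the paper's suffix-tree filtering step.

```python
def lb_first_last(x, w):
    """Lower bound on the (squared-cost) time warping distance:
    every warping path aligns x[0] with w[0] and x[-1] with w[-1],
    so those two costs can never exceed the true distance
    (for sequences of length >= 2)."""
    lb = (x[0] - w[0]) ** 2
    if len(x) > 1 or len(w) > 1:
        lb += (x[-1] - w[-1]) ** 2
    return lb

def filter_candidates(query, candidates, threshold):
    """Keep only candidates whose lower bound is within the threshold.
    Because lb <= true distance, no qualifying sequence is dismissed;
    survivors are then checked with the exact (expensive) distance."""
    return [c for c in candidates if lb_first_last(query, c) <= threshold]
```

Any candidate pruned here would also fail the exact distance test, which is precisely the "no false dismissals" guarantee the abstract refers to.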
This paper discusses the value added by using a single vector sensor over a conventional pressure-only hydrophone for geoacoustic inversion. Inversion methods based on genetic algorithms are used to estimate the seabed properties. Synthetic signals of impulsive arrivals are first modeled using the KRAKEN and RAM propagation models, each modified to predict components of the vector field. While KRAKEN is used to compute dispersion curves directly, RAM provides full-field results that require the application of time warping to separate the modal arrivals. Combinations of dispersion curves utilizing all vector sensor channels are compared to curves estimated with the pressure-only channel. Within the time warping analysis, both binary masking and band-pass filter masking are applied to compare the stability of the results. The environment modeled for the synthetic analysis and the inversion method use sound speed profiles measured during the Monterey Bay 2019 at-sea experiment and assume a sediment layer of constant thickness overlying a deeper sub-bottom type. White noise is added to the synthetic data at different signal-to-noise ratios to evaluate the impact of signal excess on the results. A hybrid optimization approach is used to improve the results of the genetic algorithm. The analysis with synthetic data is consistent with the analysis of broadband, impulsive data collected during the experiment, indicating that the additional information from the vertical velocity channel further improves the geoacoustic parameter estimates.
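One concrete step in the workflow above, adding white noise to synthetic data at a prescribed signal-to-noise ratio, can be sketched as follows; this is a minimal illustration assuming Gaussian noise and mean-squared-amplitude signal power, and the function name and interface are illustrative, not the authors' code.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Add zero-mean white Gaussian noise scaled so that the
    signal-to-noise ratio (in dB) matches snr_db, with signal power
    taken as the mean squared amplitude."""
    rng = np.random.default_rng(rng)
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  solve for P_noise.
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```

Sweeping `snr_db` over a range of values and rerunning the inversion is one simple way to evaluate how signal excess affects the parameter estimates.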
We present techniques for the analysis and retrieval of time series under noisy conditions. This is an important topic because the data obtained from various sensors (examples include GPS data and video tracking data) are typically noisy, and the performance of previously used measures generally degrades under noise. Here we formalize non-metric similarity functions based on the Longest Common Subsequence that are very robust to noise. Furthermore, they provide an intuitive notion of similarity between time series by giving more weight to the similar portions of the sequences. Stretching of the sequences in time is allowed, as is global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (on real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest-neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
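A minimal sketch of the Longest Common Subsequence similarity in the form commonly used for time series follows, with a value threshold epsilon and a time-warping window delta; the normalization by the shorter length and the parameter names are conventional choices in the LCSS literature, not necessarily this paper's exact definition.

```python
def lcss(x, y, delta, epsilon):
    """LCSS similarity for time series: two points match when their
    values are within epsilon and their indices within delta.
    Unmatched outliers are simply skipped rather than paid for, which
    is what makes the measure robust to noise."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= epsilon and abs(i - j) <= delta:
                L[i][j] = L[i - 1][j - 1] + 1   # points match: extend
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip an outlier
    return L[n][m] / min(n, m)  # normalized similarity in [0, 1]
```

Unlike the Euclidean or time warping distance, a single large spike (e.g. the `100` below) reduces the score only by the one point it fails to match, instead of dominating the total.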