We consider the uniform integration approximation problem in a class of Sobolev spaces. This type of problem can be transformed into the study of the discrepancy function of certain sampling sets. The main tools are L2-discrepancy methods and stratified sampling. We first extend the notion of jittered sampling (a special case of stratified sampling) to two general classes of partitions based on jittered grids. Second, we compute the expected L2-discrepancy under these two classes of partitions and derive explicit, exact formulas. These results yield a smaller expected L2-discrepancy than jittered sampling. Our method is to compare the expected L2-discrepancy under the new partition models with that under jittered sampling. Finally, this idea can also be applied to general measure partitions, such as partitions that no longer depend on jittered grids, which may allow us to obtain exact expected L2-discrepancy formulas under such partition models with appropriate conditions on the number of samples.
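For readers unfamiliar with the baseline, the minimal Python sketch below illustrates classic jittered sampling on [0, 1]^d (one uniform point per cell of an m × … × m grid) together with Warnock's formula for the L2 star discrepancy; the generalized partitions studied in the paper are not reproduced here, and all parameter values are illustrative.

```python
import numpy as np

def jittered_sample(m, d, rng=None):
    """Draw one uniform point from each of the m^d congruent subcubes of [0, 1]^d."""
    rng = np.random.default_rng(rng)
    grids = np.meshgrid(*[np.arange(m)] * d, indexing="ij")
    corners = np.stack([g.ravel() for g in grids], axis=1) / m   # lower-left corner of each cell
    return corners + rng.uniform(0.0, 1.0 / m, size=corners.shape)

def l2_star_discrepancy(x):
    """Warnock's formula for the L2 star discrepancy of a point set x of shape (N, d)."""
    n, d = x.shape
    term1 = 3.0 ** (-d)
    term2 = (2.0 / n) * np.prod((1.0 - x ** 2) / 2.0, axis=1).sum()
    pair_max = np.maximum(x[:, None, :], x[None, :, :])          # pairwise coordinate maxima
    term3 = np.prod(1.0 - pair_max, axis=2).sum() / n ** 2
    return np.sqrt(term1 - term2 + term3)

pts = jittered_sample(m=8, d=2)                                  # 64 stratified points in the unit square
print(pts.shape, l2_star_discrepancy(pts))
```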
Let us color the vertices of the grid ℤ^d or the infinite regular tree 𝕋^d using a finite number of colors, with the constraint that some predefined pairs of colors are not allowed on adjacent vertices. The set of admissible colorings is called a nearest-neighbor subshift of finite type (SFT). We study “uniform” probability measures on SFTs, with the motivation of gaining insight into “typical” admissible configurations. We recall the known results on uniform measures on SFTs on grids and complete the picture by presenting some contributions to the description of uniform measures on SFTs on 𝕋^d. Then we focus on the problem of uniform random sampling of configurations of an SFT. We propose a first method based on probabilistic cellular automata, which is valid under some restrictive conditions. Then we concentrate on the case of SFTs on ℤ, for which we propose several alternative sampling methods.
Symmetry of an object in the plane or in space is an important geometric feature in biology, chemistry, and the study of human perception of figures. We propose a randomized method for detecting symmetry in planar polygons and polyhedra that does not assume the centroids of the objects are known in advance. Using a voting process, the central idea of the Hough transform in image processing, we transform the geometric computation for symmetry detection, which is usually based on graph theory and combinatorial optimization, into a peak-detection problem in a voting space. For both planar and spatial objects, our algorithm detects the centroid after detecting the symmetry of the object.
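As an illustration of the voting idea only (not the authors' exact parameterization), the hedged sketch below lets every pair of polygon vertices vote for its perpendicular bisector as a candidate mirror axis in a Hough-style (θ, ρ) accumulator; bins that accumulate many votes indicate likely symmetry axes.

```python
import numpy as np
from collections import Counter

def reflective_symmetry_axes(vertices, angle_bins=180, rho_res=0.05, min_votes=2):
    """Vote for mirror axes: each vertex pair votes for its perpendicular bisector,
    discretized into a (theta, rho) accumulator; peaks are candidate symmetry axes."""
    pts = np.asarray(vertices, dtype=float)
    votes = Counter()
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            mid = (pts[i] + pts[j]) / 2.0
            direction = pts[j] - pts[i]
            # The bisector's normal is the direction p_j - p_i itself.
            theta = np.arctan2(direction[1], direction[0]) % np.pi
            rho = mid @ np.array([np.cos(theta), np.sin(theta)])
            key = (int(round(theta / np.pi * angle_bins)) % angle_bins,
                   int(round(rho / rho_res)))
            votes[key] += 1
    return [(k, v) for k, v in votes.most_common() if v >= min_votes]

# A unit square: the axes x = 0.5 and y = 0.5 each collect two votes.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(reflective_symmetry_axes(square))
```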
Face recognition, as an active research topic, still faces many challenges. This paper proposes a new face recognition method that fuses the advantages of fuzzy set theory, the sub-image method, and random sampling. In this method, we partition an original image into sub-images to improve robustness to different facial variations, and we extract local features from each sub-image using fuzzy 2D-Linear Discriminant Analysis (LDA), which exploits the class information hidden in neighboring samples. To increase the diversity of the component classifiers while retaining as much of the structural information of the row vectors as possible, we further randomly sample row vectors from each sub-image before performing fuzzy 2D-LDA. Experimental results on the Yale A, ORL, AR, and Extended Yale B face databases show its superiority to other related state-of-the-art methods under different variations such as illumination, occlusion, and facial expression. Furthermore, we analyze the diversity of the proposed method using Kappa diversity-error analysis and frequency histograms; the results show that the proposed method constructs more diverse component classifiers than other methods.
A dynamic geometric data stream is a sequence of m ADD/REMOVE operations on points from a discrete geometric space {1, …, Δ}^d. ADD(p) inserts a point p from {1, …, Δ}^d into the current point set P, and REMOVE(p) deletes p from P. We develop low-storage data structures to (i) maintain ε-nets and ε-approximations of range spaces of P with small VC-dimension and (ii) maintain a (1 + ε)-approximation of the weight of the Euclidean minimum spanning tree of P. Our data structure for ε-nets uses … bits of memory and returns with probability 1 − δ a set of … points that is an ε-net for an arbitrary fixed finite range space with VC-dimension …. Our data structure for ε-approximations uses … bits of memory and returns with probability 1 − δ a set of … points that is an ε-approximation for an arbitrary fixed finite range space with VC-dimension …. The data structure for approximating the weight of a Euclidean minimum spanning tree uses O(log(1/δ) · (log Δ/ε)^{O(d)}) space and is correct with probability at least 1 − δ.
Our results are based on a new data structure that maintains a set of elements chosen (almost) uniformly at random from P.
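A hedged sketch of the sampling guarantee only (it stores every live point, so it deliberately ignores the paper's low-memory contribution): assigning each point a random priority and reporting the k smallest priorities yields a sample that remains uniform over the live points under ADD and REMOVE.

```python
import heapq, random

class UniformSample:
    """Maintain a uniform random sample of size k from a set under ADD/REMOVE by
    giving each point a random priority and keeping the k smallest priorities.
    (Stores every live point, so it ignores the low-memory aspect of the paper.)"""
    def __init__(self, k, rng=random):
        self.k, self.rng, self.priority = k, rng, {}

    def add(self, p):
        self.priority[p] = self.rng.random()

    def remove(self, p):
        self.priority.pop(p, None)

    def sample(self):
        return heapq.nsmallest(self.k, self.priority, key=self.priority.get)

s = UniformSample(k=5)
for p in range(100):
    s.add(p)
for p in range(0, 100, 2):
    s.remove(p)                      # deletions do not bias the remaining sample
print(s.sample())                    # 5 points, uniform over the 50 live points
```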
We prove a general lemma on the existence of (1/r)-cuttings of geometric objects in E^d that satisfy certain properties. We use this lemma to construct (1/r)-cuttings of small size for arrangements of line segments in the plane and arrangements of triangles in 3-space; for line segments in the plane we obtain a cutting of size O(r + Ar^2/n^2), and for triangles in 3-space our cutting has size O(r^{2+c} + Ar^3/n^3). Here A is the combinatorial complexity of the arrangement. Finally, we apply these cuttings to obtain new results for several problems concerning line segments in the plane and triangles in 3-space.
We introduce the notion of expansiveness to characterize a family of robot configuration spaces whose connectivity can be effectively captured by a roadmap of randomly-sampled milestones. The analysis of expansive configuration spaces has inspired us to develop a new randomized planning algorithm. This new algorithm tries to sample only the portion of the configuration space that is relevant to the current query, avoiding the cost of precomputing a roadmap for the entire configuration space. Thus, it is well-suited for problems where only a single query is submitted for a given environment. The algorithm has been implemented and successfully applied to complex assembly maintainability problems from the automotive industry.
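A minimal sketch of a single-query, tree-growing sampler in this spirit is given below; the expansiveness-based milestone weighting and the edge collision checks of the actual planner are omitted, and all names and parameters are illustrative.

```python
import math, random

def single_query_plan(start, goal, is_free, step=0.3, goal_tol=0.3, max_iters=5000, rng=random):
    """Grow a tree of milestones from `start`, sampling only near the tree
    (the region relevant to the query), until `goal` can be connected."""
    tree = {0: None}                                  # node index -> parent index
    nodes = [start]
    for _ in range(max_iters):
        i = rng.randrange(len(nodes))                 # pick an existing milestone
        q = tuple(c + rng.uniform(-step, step) for c in nodes[i])   # sample nearby
        if not is_free(q):
            continue
        nodes.append(q)
        tree[len(nodes) - 1] = i
        # NOTE: for brevity, edges between milestones are not collision-checked.
        if math.dist(q, goal) < goal_tol and is_free(goal):
            nodes.append(goal)
            tree[len(nodes) - 1] = len(nodes) - 2
            path, k = [], len(nodes) - 1              # walk back up the tree
            while k is not None:
                path.append(nodes[k]); k = tree[k]
            return path[::-1]
    return None

# Toy 2-D world: free space is the unit square minus a central disc.
free = lambda q: 0 <= q[0] <= 1 and 0 <= q[1] <= 1 and math.dist(q, (0.5, 0.5)) > 0.2
print(single_query_plan((0.1, 0.1), (0.9, 0.9), free))
```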
Covariance matching is an effective algorithm for target tracking. In this paper, a forgetting factor and a random sampling method are proposed to improve the robustness and efficiency of covariance tracking. First, the distance function between covariance matrices is weighted using a forgetting factor based on a fuzzy membership function to overcome disturbances from similar targets. Then a random sampling method is applied to reduce the computing time of covariance matching and to facilitate real-time object tracking. Experimental results show that the proposed algorithm effectively mitigates clutter and occlusion at a high computing speed.
Survey analysis methods are widely used in many areas such as social studies, marketing research, economics, public health, clinical trials, and transportation data analysis. The minimum sample size must be determined before a survey is conducted to avoid excessive cost. Several statistical methods for finding the minimum required sample size can be found in the literature. This paper proposes a method for finding the minimum total sample size needed for a survey when the population is divided into cells. The proposed method can be used for both the infinite population case and the finite population case. A computer program is needed to carry out the sample size calculation; the authors used SAS/IML, a special integrated matrix language (IML) procedure of the Statistical Analysis System (SAS) software.
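The paper's cell-based procedure and SAS/IML program are not reproduced here; the hedged Python sketch below only shows the standard proportion-estimation sample size formula with the finite population correction, applied cell by cell and summed (the cell names and population sizes are made up for illustration).

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Classic minimum sample size for estimating a proportion within `margin`.
    Applies the finite population correction when `population` is given."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2           # infinite-population size
    if population is not None:                        # finite-population correction
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)

# Total size when the population is divided into cells: one requirement per cell.
cells = {"urban": 120_000, "suburban": 45_000, "rural": 8_000}
total = sum(min_sample_size(0.05, population=N) for N in cells.values())
print(total)
```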
This study proposes a probabilistic Hough Transform algorithm termed the Orientation Constrained Probabilistic Hough Transform. Orientation constraints for segmentation are applied to form compact and reliable sampling subsets, followed by constrained searching. Numerical experiments on both synthetic and real datasets indicate that the proposed method outperforms other algorithms in terms of correctness and omission rate. Moreover, the required computational time is much lower than that of the other algorithms.
Large complex dynamical systems behave in ways that reflect their structure. In many applications we need to control these systems by bringing them from their current state to a desired state. The state of such a system is affected by communicating with its key elements, called driver nodes, in reference to its representation as a network of nodes. Over the past decades, much attention has been paid to analytical approaches that derive optimal control configurations based on the concept of Minimum Driver node Sets (MDSs) for directed complex networks. However, the underlying control mechanisms of many complex systems rely on quickly controlling a major subspace of the system. In this work, we ask how complex networks behave when driver nodes are randomly selected. We seek to understand and employ the statistical characteristics of MDSs to randomly select driver nodes and analyze the controllability properties of complex networks. We propose an algorithm to build Random Driver node Sets (RDSs) and analyze their controllable subspace, the minimum time needed for control, and the cardinality of the RDSs. Through evaluations on a variety of synthetic and real-world networks, we show that RDSs can quickly and effectively control a major subspace of a network.
Shift-invariant spaces play an important role in approximation theory, wavelet analysis, finite elements, etc. In this paper, we consider the stability and a reconstruction algorithm for random sampling in multiply generated shift-invariant spaces V_p(Φ). Under certain decay conditions on the generator Φ, we approximate V_p(Φ) by finite-dimensional subspaces and prove that, with overwhelming probability, the sampling stability condition holds uniformly for all functions in certain compact subsets of V_p(Φ) when the sampling size is sufficiently large. Moreover, we show that this stability problem is connected with properties of the random matrix generated by Φ. Finally, we give a reconstruction algorithm for the random sampling of functions in V_p(Φ).
Sampling and reconstruction in trigonometric polynomial spaces play an important role in signal processing. However, in many applications the frequencies in trigonometric polynomial spaces are not all integers. In this paper, we consider the problem of weighted random sampling and reconstruction of functions in general multivariate trigonometric polynomial spaces. The sampling set is randomly selected on a bounded cube according to a probability distribution. We show that, with overwhelming probability, the sampling inequality holds and the explicit reconstruction formula succeeds for all functions in the general multivariate trigonometric polynomial space when the sampling size is sufficiently large.
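A univariate toy illustration of the idea, assuming nothing about the paper's explicit reconstruction formula: sample a trigonometric polynomial with non-integer frequencies at random points drawn from a density on a cube and recover its coefficients by weighted least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-integer frequencies, as in general trigonometric polynomial spaces.
freqs = np.array([0.0, 1.3, 2.7, 4.1])
coef_true = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
f = lambda x: np.exp(2j * np.pi * np.outer(x, freqs)) @ coef_true

# Random sampling points drawn on the cube [0, 1] (uniform density here).
n_samples = 200
x = rng.uniform(0.0, 1.0, n_samples)
y = f(x)

# Weighted least-squares reconstruction of the coefficients.
A = np.exp(2j * np.pi * np.outer(x, freqs))
w = np.ones(n_samples) / n_samples            # weights matching the sampling density
coef_hat, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A, np.sqrt(w) * y, rcond=None)
print(np.max(np.abs(coef_hat - coef_true)))   # small with high probability
```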
Automated text mining is an especially important task in modern data analysis, from both theoretical and experimental points of view. The problem is of major interest in the digital age and is related to artificial intelligence, machine learning, and information retrieval. Feature selection and classification of high-dimensional text data are challenging tasks. In this paper, we adopt a method for dealing with the high dimensionality of the data and then choose an appropriate strategy (learning algorithm) for efficient model training. Our empirical evaluation and experimental analysis show that the proposed method performs better than other variable-selection-based dimension reduction and text categorisation methods. We ran several systematic and careful experiments to discover which architecture works best for the BBC news dataset: we used 3 hidden layers, each with 128 neurons, and observed that this architecture was optimal for our specific problem. Moreover, our proposed method can improve efficiency and speed up the calculations on certain datasets.
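A hedged sketch of such an architecture using scikit-learn (the TF-IDF feature pipeline, vocabulary size, and toy data below are assumptions, not the paper's exact setup): a text classifier whose MLP has three hidden layers of 128 neurons each, as described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# TF-IDF features (a stand-in for the paper's dimension-reduction step) feeding an
# MLP with three hidden layers of 128 neurons each.
model = make_pipeline(
    TfidfVectorizer(max_features=5000, stop_words="english"),
    MLPClassifier(hidden_layer_sizes=(128, 128, 128), max_iter=200, random_state=0),
)

# `texts` and `labels` stand in for the BBC news articles and their categories.
texts = ["stocks rallied after the earnings report", "the midfielder scored twice"]
labels = ["business", "sport"]
model.fit(texts, labels)
print(model.predict(["the club signed a new striker"]))
```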
An oversampling-based randomized algorithm is introduced for sorting n keys of any abstract data type on the p processors of a latency-tolerant parallel system such as a bulk-synchronous parallel computer. The algorithm is asymptotically, for large n, optimal in computation and communication compared to the best available sequential sorting algorithm, even when constant factors are taken into consideration. Its parallel time is within a (1 + o(1))/p multiplicative factor of the corresponding sequential method for sorting, improving upon other approaches for a wider range of values of p relative to n. It also improves upon other randomized sorting algorithms that have been developed for latency-tolerant models of computation in the amount of parallel slack (the ratio of n to p) required to achieve optimal speedup, as well as in the associated failure probability. For values of p closer to n than other latency-tolerant randomized approaches allow, it can be turned into a PRAM-like algorithm, but in such cases a speedup of O(p) rather than p/(1 + o(1)) is achievable. Although the general framework of the proposed algorithm relies on the well-established idea of sample-based sorting, its design has some novel features, such as the choice of splitter size, the way keys are split, and the handling of the base case of the recursive sorting algorithm, that contribute to its performance.
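A sequential Python sketch of the underlying sample-based sorting idea (the splitter count, oversampling factor, and base case below are illustrative choices, not the paper's): draw a random sample, pick evenly spaced splitters from it, bucket the keys, and recurse; on a parallel machine each bucket would be handled by one processor.

```python
import random
from bisect import bisect_right

def sample_sort(keys, p=4, oversample=8, base=64):
    """Sequential sketch of oversampling-based sample sort: draw p*oversample random
    keys, pick p-1 evenly spaced splitters from the sorted sample, bucket the input,
    and sort each bucket recursively (one bucket per processor in the parallel setting)."""
    if len(keys) <= base:
        return sorted(keys)
    sample = sorted(random.sample(keys, p * oversample))
    splitters = [sample[i * oversample] for i in range(1, p)]
    buckets = [[] for _ in range(p)]
    for k in keys:
        buckets[bisect_right(splitters, k)].append(k)        # route each key to its bucket
    if max(len(b) for b in buckets) == len(keys):            # degenerate splitters: fall back
        return sorted(keys)
    out = []
    for b in buckets:
        out.extend(sample_sort(b, p, oversample, base))
    return out

data = [random.randrange(10**6) for _ in range(10_000)]
assert sample_sort(data) == sorted(data)
```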
Background: Carpal tunnel syndrome (CTS) is the most common entrapment neuropathy worldwide, but there are few reports investigating its prevalence using subjects diagnosed by both clinical symptoms and nerve conduction studies (NCSs) in a population-based cohort. This study aimed to determine the epidemiology of CTS diagnosed by sensory disturbance findings and NCSs using a randomly sampled resident population.
Methods: Subjects aged between 50 and 89 years were randomly sampled from the basic resident registry of a rural Japanese town. Subjects indicating a history of CTS surgery in a written questionnaire were classified as having past CTS. Subjects with both sensory disturbance of the median nerve area and delays in NCSs were diagnosed as having present CTS. Subjects with past or present CTS were judged as affected with CTS. We calculated the prevalence of CTS and investigated for possible risk factors.
Results: Seventeen subjects (14 female and 3 male) were affected with CTS among 379 enrolled subjects. Adjusting these results to Japanese population values, the weighted prevalence of CTS was 4.7% (female: 7.2%, male: 1.8%) in the Japanese population aged 50 to 89 years. Statistically significant positive associations were found between CTS and female sex, higher BMI, rheumatoid arthritis, and trigger digit. In females affected with CTS, the third metacarpal was significantly shorter than in those without CTS.
Conclusions: This epidemiological study established the prevalence of CTS among Japanese seniors as 4.7%. Female sex, higher BMI, rheumatoid arthritis, trigger digit, and (in females) shorter third metacarpal length were risk factors for CTS.
There has been growing interest in algorithms over data streams in recent years. This paper introduces the problem of sampling from sliding windows of recent data items in data streams and presents two random sampling algorithms for this problem. The first is a basic window-based sampling algorithm (the BWRS algorithm) for count-based sliding windows. The BWRS algorithm extends classic reservoir sampling to deal with the expiration of data elements from a count-based sliding window and avoids drawbacks of classic reservoir sampling. The second is a stratified multistage sampling algorithm for time-based sliding windows (the SMS algorithm). The SMS algorithm uses different sampling fractions in different strata of the time-based sliding window and works even when the number of data items in the sliding window varies dynamically over time. Theoretical analysis and experiments show that the algorithms are effective and efficient for continuous data stream processing.
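For reference, the classic reservoir sampling algorithm (Algorithm R) that BWRS is said to extend is sketched below; the window-expiration handling of BWRS itself is not reproduced.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Classic reservoir sampling (Algorithm R): maintain a uniform random sample
    of size k over a stream whose length is unknown in advance."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            reservoir.append(item)
        else:
            j = rng.randrange(t + 1)       # each item is kept with probability k/(t+1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(1_000_000), k=10))
```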
The average degree is a fundamental concept in complex networks. With random sampling, some properties of the whole population can be deduced from the properties of a sample. In this paper, the relationship between the average degree of a subnetwork and that of the full network is investigated. Different relationships between subnetworks and networks are described by numerical tests on several real complex networks. The results show that the average degree of the subnetwork is not directly proportional to the sampling rate for the scientific collaboration network and the neural network. However, the average degree of the subnetwork is directly proportional to the sampling rate for the American football games network. These findings are useful in revealing some properties of unknown complex networks.
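A hedged sketch of this kind of numerical test, using networkx and an Erdős–Rényi graph as a stand-in for a real network (the datasets studied in the paper are not reproduced): sample nodes uniformly at a given rate and report the average degree of the induced subnetwork.

```python
import random
import networkx as nx

def subnetwork_average_degree(G, rate, rng=random):
    """Average degree of the subgraph induced by a uniform random sample of nodes."""
    sampled = [v for v in G.nodes if rng.random() < rate]
    H = G.subgraph(sampled)
    return 2 * H.number_of_edges() / max(H.number_of_nodes(), 1)

G = nx.erdos_renyi_graph(2000, 0.01, seed=1)          # stand-in for a real network
for rate in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(rate, round(subnetwork_average_degree(G, rate), 2))
```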