Please login to be able to save your searches and receive alerts for new content matching your search criteria.
Stencil codes such as the Jacobi, Gauß-Seidel, and red-black Gauß-Seidel kernels are among the most time-consuming routines in many scientific and engineering applications. The performance of these codes critically depends on an efficient usage of caches, and can be improved by tiling. Several tiling schemes have been suggested in the literature; this paper gives an overview and comparison. Then, in the main part, we prove a lower bound on the number of cold and capacity misses. Finally, we analyze a particular tiling scheme, and show that it is off the lower bound by a factor of at most ten. Our results show up limitations to the speedup that can be gained by future research.
The paper is concerned with aperiodic linearly repetitive tilings. For such tilings, we establish a weak form of self-similarity that allows us to prove general (sub)additive ergodic theorems. Finally, we provide applications to the study of lattice gas models.
One-dimensional cellular automata are known to be able to present complex behaviors. In some cases, their evolution may be understood as movings, collisions, or creations of particles. In the case of the special Wolfram's rule 54, Boccara has previously pointed out basic particles. In this paper, we introduce a group which allows the formal study of interactions between these particles. Coming back to the complexity of rule 54 and using the new algebraic classification of Rapaport, we prove that rule 54 is not simple.
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers, they must be performed on the compiler's internal, low-level, intermediate representation (IR). With Polly, we present an infrastructure for polyhedral optimizations on such an IR. We describe the detection of program parts amenable to a polyhedral optimization (so-called static control parts), their translation to a Z-polyhedral representation, optimizations on this representation and the generation of optimized IR code. Furthermore, we define an interface for connecting external optimizers and present a novel way of using the parallelism they introduce to generate SIMD and OpenMP code. To evaluate Polly, we compile the PolyBench 2.0 benchmarks fully automatically with PLuTo as external optimizer and parallelizer. We can report on significant speedups.
Iterative stencil computations are important in scientific computing and more also in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two tiling schemes that enable concurrent start. Both have different advantages: diamond tiling has been integrated in a general purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas the greater flexibility with tile sizes for hybrid-hexagonal tiling has been exploited for effective generation of GPU code.
In this paper we undertake a comparative study of these two tiling approaches and propose a hybrid approach that combines them. We analyze the effects of tile size and wavefront choices on tile-level parallelism, and formulate constraints for optimal diamond tile shapes. We then extend, for the case of two dimensions, the diamond tiling formulation into a hexagonal tiling one, which offers both the flexibility of hexagonal tiling and the generality of the original diamond tiling implementation. We also show how to compute tile sizes that maximize the compute-to-communication ratio, and apply this result to compare the best achievable ratio and the associated synchronization overhead for diamond and hexagonal tiling.
Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block, in other words, they are not able to take advantage of the structure of dependences between different statements. In this paper, we overcame this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loan independent dependences play a particular role in our study, and we show how the way we handle them can be useful for fine-grain loop parallelization as well.
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining loop tile sizes that achieve the highest cache hit-rate. The widening disparity between a processor's peak instruction rate and main memory access time in modern computer systems makes this kind of optimization increasingly important for overall program efficiency.
Our simulation technique generates only those references of a loop nest that may generate a cache memory miss and processes them on an architecturally accurate cache model at compile-time. Processing only a small portion of the memory reference trace of a program yields simulation speeds in the millions of memory references per second on workstations, with statistics of misses per reference and inter-reference interference counts gathered at the same time. These simulation speeds and statistics allow for the accurate analysis of the impact of cache optimizations at compile-time.
We discuss the results of applying this method to guide loop tiling for such commonly used computational kernels as matrix multiplication and Jacobi iteration for various cache parameters.
This paper is a follow-up Irigoin and Triolet's earlier work and our recent work on tiling. In this paper, tiling is discussed in terms of its effects on the dependences between tiles, the dependences within a tile and the required dependence test for legality. A necessary and sufficient condition is given for enforcing the data dependences of the program, while Irigion and Triolet's atomic tile constraint is only sufficient. A condition is identified under which both Irigoin and Triolet's and our constraints are equivalent. The results of this paper are discussed in terms of their impact on dependence abstractions suitable for legality test and on tiling to optimise a certain given goal.
The Siromoney matrix model is a simple and elegant model for describing two-dimensional digital picture languages. The notion of attaching indices to nonterminals in a generative grammar, introduced and investigated by Aho. is considered in the vertical phase of a Siromoney matrix grammar (SMG). The advantage of this study is that the new model retains the simplicity and elegance of SMG but increases the generative power and enables us to describe pictures not generable by SMG. Besides certain closure properties and hierarchy results. applications of these two-dimensional grammars to describe tilings, polyominoes, distorted patterns and parquet deformations are studied.
The emptiness problem for non-overlapping Basic puzzle grammars is shown to be decidable. An alternate proof of the decidability of the non-overlapping feature for basic puzzle grammars is given. Hierarchy among the various classes of puzzle languages is also established.
A remarkable theorem is described: "It is possible to tile the plane with non-overlapping squares using exactly one square of each integral dimension". Thus, one can "square the plane".
Interpolating a piecewise-linear triangulated surface between two polygons lying in parallel planes has attracted a lot of attention in the literature over the past three decades. This problem is the simplest variant of interpolation between parallel slices, which may contain multiple polygons with unrestricted geometries and/or topologies. Its solution has important applications to medical imaging, digitization of objects, geographical information systems, and more. Practically all currently-known methods for surface reconstruction from parallel slices assume a priori the existence of a non-self-intersecting triangulated surface defined over the vertices of the two polygons, which connects them. Gitlin et al. were the first to specify a nonmatable pair of polygons. In this paper we provide proof of the nonmatability of a “simpler” pair of polygons, which is less complex than the example given by Gitlin et al. Furthermore, we provide a family of polygon pairs with unbounded complexity, which we believe to be nonmatable. We also give a few sufficient conditions for polygon matability.
A set of natural numbers tiles the plane if a square-tiling of the plane exists using exactly one square of sidelength n for every n in the set. From Ref. 8 we know that ℕ itself tiles the plane. From that and Ref. 9 we know that the set of even numbers tiles the plane while the set of odd numbers doesn't. In this paper we explore the nature of this property. We show, for example, that neither tiling nor non-tiling is preserved by superset. We show that a set with one or three odd numbers may tile the plane—but a set with two odd numbers can't.
We find examples of both tiling and non-tiling sets that can be partitioned into tiling sets, non-tiling sets or a combination. We show that any set growing faster than the Fibonacci numbers cannot tile the plane.
There exists a linear algorithm to decide whether a polyomino tessellates the plane by translation only. On the other hand, the problem of deciding whether a set of 11 or more polyominoes can tile the plane by translation is undecidable. We narrow the gap between decidable and undecidable by showing that it remains undecidable for a set of 10 polyominoes, which partially solves a conjecture posed by Ollinger.
Not much is known about the topological structure of a connected self-similar tile whose interior is disconnected, and even less is understood if the interior consists of infinitely many components. We introduce a technique to show that for a large class of self-similar tiles in ℝ2, the closure of each component of the interior is homeomorphic to a disk. This allows us to prove such a result for the Eisenstein set, the fundamental domain of a well-known quadratic canonical number system, and some other well-known fractal tiles.
In this paper, we present an idea of creating fractals by using the geometric arc as the basic element. This approach of generating fractals, through the tuning of just three parameters, gives a universal way to obtain many novel fractals including the classic ones. Although this arc-fractal system shares similar features with the well-known Lindenmayer system, such as the same set of invariant points and the ability to tile the space, they do have different properties. One of which is the generation of pseudo-random number, which is not available in the Lindenmayer system. Furthermore, by assuming that coastline formation is based purely on the processes of erosion and deposition, the arc-fractal system can also serve as a dynamical model of coastal morphology, with each level of its construction corresponds to the time evolution of the shape of the coastal features. Remarkably, our results indicate that the arc-fractal system can provide an explanation on the origin of fractality in real coastline.
The object under study is a particular closed and simple curve on the square lattice ℤ2 related with the Fibonacci sequence Fn. It belongs to a class of curves whose length is 4F3n+1, and whose interiors tile the plane by translation. The limit object, when conveniently normalized, is a fractal line for which we compute first the fractal dimension, and then give a complexity measure.
A fractal tiling is a tiling which possesses self-similarity and the boundary of which is a fractal. In this paper, we investigate the boundary dimension of sn- and un-tilings. We first derive an explicit recursion formula for the boundary edges of sn- and un-tilings. Then we present an analytical expression for their fractal boundary dimensions using matrix methods. Results indicate that, as n increases, the boundaries of sn- and un-tilings will degenerate into general Euclidean curves. The method proposed in this paper can be extended to compute the boundary dimensions of other kinds of fractal tilings.
A fractal tiling or f-tiling is a tiling which possesses self-similarity and the boundary of which is a fractal. By substitution rule of tilings, this short paper presents a very simple strategy to create a great number of f-tilings. The substitution tiling Equithirds is demonstrated to show how to achieve it in detail. The method can be generalized to every tiling that can be constructed by substitution rule.
In this paper we study the geometry of minimal networks on some Riemannian manifolds (each small part of such a network has the smallest possible length). We discuss a complete classification of planar minimal networks spanning convex boundary sets, some results concerning the investigation of minimal networks spanning the vertex set of a regular polygon. Also, to study the bifurcations of a minimal network under a deformation of the boundary of the network, we construct a bifurcation graph. The last section of the paper is devoted to the complete classification of closed (without boundary) minimal networks on a flat torus.