Please login to be able to save your searches and receive alerts for new content matching your search criteria.
Stencil codes such as the Jacobi, Gauß-Seidel, and red-black Gauß-Seidel kernels are among the most time-consuming routines in many scientific and engineering applications. The performance of these codes critically depends on an efficient usage of caches, and can be improved by tiling. Several tiling schemes have been suggested in the literature; this paper gives an overview and comparison. Then, in the main part, we prove a lower bound on the number of cold and capacity misses. Finally, we analyze a particular tiling scheme, and show that it is off the lower bound by a factor of at most ten. Our results show up limitations to the speedup that can be gained by future research.
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers, they must be performed on the compiler's internal, low-level, intermediate representation (IR). With Polly, we present an infrastructure for polyhedral optimizations on such an IR. We describe the detection of program parts amenable to a polyhedral optimization (so-called static control parts), their translation to a Z-polyhedral representation, optimizations on this representation and the generation of optimized IR code. Furthermore, we define an interface for connecting external optimizers and present a novel way of using the parallelism they introduce to generate SIMD and OpenMP code. To evaluate Polly, we compile the PolyBench 2.0 benchmarks fully automatically with PLuTo as external optimizer and parallelizer. We can report on significant speedups.
Iterative stencil computations are important in scientific computing and more also in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two tiling schemes that enable concurrent start. Both have different advantages: diamond tiling has been integrated in a general purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas the greater flexibility with tile sizes for hybrid-hexagonal tiling has been exploited for effective generation of GPU code.
In this paper we undertake a comparative study of these two tiling approaches and propose a hybrid approach that combines them. We analyze the effects of tile size and wavefront choices on tile-level parallelism, and formulate constraints for optimal diamond tile shapes. We then extend, for the case of two dimensions, the diamond tiling formulation into a hexagonal tiling one, which offers both the flexibility of hexagonal tiling and the generality of the original diamond tiling implementation. We also show how to compute tile sizes that maximize the compute-to-communication ratio, and apply this result to compare the best achievable ratio and the associated synchronization overhead for diamond and hexagonal tiling.
Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block, in other words, they are not able to take advantage of the structure of dependences between different statements. In this paper, we overcame this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loan independent dependences play a particular role in our study, and we show how the way we handle them can be useful for fine-grain loop parallelization as well.
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining loop tile sizes that achieve the highest cache hit-rate. The widening disparity between a processor's peak instruction rate and main memory access time in modern computer systems makes this kind of optimization increasingly important for overall program efficiency.
Our simulation technique generates only those references of a loop nest that may generate a cache memory miss and processes them on an architecturally accurate cache model at compile-time. Processing only a small portion of the memory reference trace of a program yields simulation speeds in the millions of memory references per second on workstations, with statistics of misses per reference and inter-reference interference counts gathered at the same time. These simulation speeds and statistics allow for the accurate analysis of the impact of cache optimizations at compile-time.
We discuss the results of applying this method to guide loop tiling for such commonly used computational kernels as matrix multiplication and Jacobi iteration for various cache parameters.
This paper is a follow-up Irigoin and Triolet's earlier work and our recent work on tiling. In this paper, tiling is discussed in terms of its effects on the dependences between tiles, the dependences within a tile and the required dependence test for legality. A necessary and sufficient condition is given for enforcing the data dependences of the program, while Irigion and Triolet's atomic tile constraint is only sufficient. A condition is identified under which both Irigoin and Triolet's and our constraints are equivalent. The results of this paper are discussed in terms of their impact on dependence abstractions suitable for legality test and on tiling to optimise a certain given goal.
The Siromoney matrix model is a simple and elegant model for describing two-dimensional digital picture languages. The notion of attaching indices to nonterminals in a generative grammar, introduced and investigated by Aho. is considered in the vertical phase of a Siromoney matrix grammar (SMG). The advantage of this study is that the new model retains the simplicity and elegance of SMG but increases the generative power and enables us to describe pictures not generable by SMG. Besides certain closure properties and hierarchy results. applications of these two-dimensional grammars to describe tilings, polyominoes, distorted patterns and parquet deformations are studied.
The emptiness problem for non-overlapping Basic puzzle grammars is shown to be decidable. An alternate proof of the decidability of the non-overlapping feature for basic puzzle grammars is given. Hierarchy among the various classes of puzzle languages is also established.
A remarkable theorem is described: "It is possible to tile the plane with non-overlapping squares using exactly one square of each integral dimension". Thus, one can "square the plane".
Interpolating a piecewise-linear triangulated surface between two polygons lying in parallel planes has attracted a lot of attention in the literature over the past three decades. This problem is the simplest variant of interpolation between parallel slices, which may contain multiple polygons with unrestricted geometries and/or topologies. Its solution has important applications to medical imaging, digitization of objects, geographical information systems, and more. Practically all currently-known methods for surface reconstruction from parallel slices assume a priori the existence of a non-self-intersecting triangulated surface defined over the vertices of the two polygons, which connects them. Gitlin et al. were the first to specify a nonmatable pair of polygons. In this paper we provide proof of the nonmatability of a “simpler” pair of polygons, which is less complex than the example given by Gitlin et al. Furthermore, we provide a family of polygon pairs with unbounded complexity, which we believe to be nonmatable. We also give a few sufficient conditions for polygon matability.
A set of natural numbers tiles the plane if a square-tiling of the plane exists using exactly one square of sidelength n for every n in the set. From Ref. 8 we know that ℕ itself tiles the plane. From that and Ref. 9 we know that the set of even numbers tiles the plane while the set of odd numbers doesn't. In this paper we explore the nature of this property. We show, for example, that neither tiling nor non-tiling is preserved by superset. We show that a set with one or three odd numbers may tile the plane—but a set with two odd numbers can't.
We find examples of both tiling and non-tiling sets that can be partitioned into tiling sets, non-tiling sets or a combination. We show that any set growing faster than the Fibonacci numbers cannot tile the plane.
There exists a linear algorithm to decide whether a polyomino tessellates the plane by translation only. On the other hand, the problem of deciding whether a set of 11 or more polyominoes can tile the plane by translation is undecidable. We narrow the gap between decidable and undecidable by showing that it remains undecidable for a set of 10 polyominoes, which partially solves a conjecture posed by Ollinger.
The emptiness problem for non-overlapping Basic puzzle grammars is shown to be decidable. An alternate proof of the decidability of the non-overlapping feature for basic puzzle grammars is given. Hierarchy among the various classes of puzzle languages is also established.
The Siromoney matrix model is a simple and elegant model for describing two-dimensional digital picture languages. The notion of attaching indices to nonterminals in a generative grammar, introduced and investigated by Aho, is considered in the vertical phase of a Siromoney matrix grammar (SMG). The advantage of this study is that the new model retains the simplicity and elegance of SMG but increases the generative power and enables us to describe pictures not generable by SMG. Besides certain closure properties and hierarchy results, applications of these two-dimensional grammars to describe tilings, polyominoes, distorted patterns and parquet deformations are studied.