[Keyword: Tiling] AND [All Categories: Computer Architecture] : Search

In this topic

Advanced Search

SEARCH GUIDE

Results: 1 - 7of7

Follow results:

refine search

Filters

per page:

Sort: Relevance

Context for search term 1Search term 1*

All Dates

LastSelect static range

Custom Range

Select starting monthSelect starting year

Select ending monthSelect ending year

Advanced

Search name	Searched On	Run search
[Keyword: Tiling] AND [All Categories: Computer Architecture] (7)	27 Mar 2025	Run
[Keyword: Policy] AND [All Categories: Environmental Economics / Energy Economics] (13)	27 Mar 2025	Run
[Keyword: Kidney] AND [All Categories: Computer Science] (2)	27 Mar 2025	Run
[Keyword: Ethics] AND [All Categories: Bioinformatics & Computational Biology] (5)	27 Mar 2025	Run
[Keyword: Xylene] AND [All Categories: Pollution / Waste Management] (2)	27 Mar 2025	Run

articleNo Access
CACHE MISS ANALYSIS OF 2D STENCIL CODES WITH TILED TIME LOOP
- CLAUDIA LEOPOLD
International Journal of Foundations of Computer Science01 Feb 2003
Preview Abstract
Stencil codes such as the Jacobi, Gauß-Seidel, and red-black Gauß-Seidel kernels are among the most time-consuming routines in many scientific and engineering applications. The performance of these codes critically depends on an efficient usage of caches, and can be improved by tiling. Several tiling schemes have been suggested in the literature; this paper gives an overview and comparison. Then, in the main part, we prove a lower bound on the number of cold and capacity misses. Finally, we analyze a particular tiling scheme, and show that it is off the lower bound by a factor of at most ten. Our results show up limitations to the speedup that can be gained by future research.
articleNo Access
POLLY — PERFORMING POLYHEDRAL OPTIMIZATIONS ON A LOW-LEVEL INTERMEDIATE REPRESENTATION
Parallel Processing Letters01 Dec 2012
Preview Abstract
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers, they must be performed on the compiler's internal, low-level, intermediate representation (IR). With Polly, we present an infrastructure for polyhedral optimizations on such an IR. We describe the detection of program parts amenable to a polyhedral optimization (so-called static control parts), their translation to a Z-polyhedral representation, optimizations on this representation and the generation of optimized IR code. Furthermore, we define an interface for connecting external optimizers and present a novel way of using the parallelism they introduce to generate SIMD and OpenMP code. To evaluate Polly, we compile the PolyBench 2.0 benchmarks fully automatically with PLuTo as external optimizer and parallelizer. We can report on significant speedups.
articleNo Access
The Relation Between Diamond Tiling and Hexagonal Tiling
- Tobias Grosser,
- Sven Verdoolaege,
- Albert Cohen, and
- P. Sadayappan
Parallel Processing Letters01 Sep 2014
Preview Abstract
Iterative stencil computations are important in scientific computing and more also in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two tiling schemes that enable concurrent start. Both have different advantages: diamond tiling has been integrated in a general purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas the greater flexibility with tile sizes for hybrid-hexagonal tiling has been exploited for effective generation of GPU code.
In this paper we undertake a comparative study of these two tiling approaches and propose a hybrid approach that combines them. We analyze the effects of tile size and wavefront choices on tile-level parallelism, and formulate constraints for optimal diamond tile shapes. We then extend, for the case of two dimensions, the diamond tiling formulation into a hexagonal tiling one, which offers both the flexibility of hexagonal tiling and the generality of the original diamond tiling implementation. We also show how to compute tile sizes that maximize the compute-to-communication ratio, and apply this result to compare the best achievable ratio and the associated synchronization overhead for diamond and hexagonal tiling.
articleNo Access
Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling
Parallel Processing Letters01 Dec 1997
Preview Abstract
Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block, in other words, they are not able to take advantage of the structure of dependences between different statements. In this paper, we overcame this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loan independent dependences play a particular role in our study, and we show how the way we handle them can be useful for fine-grain loop parallelization as well.
articleNo Access
Compile-Time Cache Performance Prediction and Its Application to Tiling
- Wesley K. Kaplow and
- Boleslaw K. Szymanski
Parallel Processing Letters01 Dec 1997
Preview Abstract
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining loop tile sizes that achieve the highest cache hit-rate. The widening disparity between a processor's peak instruction rate and main memory access time in modern computer systems makes this kind of optimization increasingly important for overall program efficiency.
Our simulation technique generates only those references of a loop nest that may generate a cache memory miss and processes them on an architecturally accurate cache model at compile-time. Processing only a small portion of the memory reference trace of a program yields simulation speeds in the millions of memory references per second on workstations, with statistics of misses per reference and inter-reference interference counts gathered at the same time. These simulation speeds and statistics allow for the accurate analysis of the impact of cache optimizations at compile-time.
We discuss the results of applying this method to guide loop tiling for such commonly used computational kernels as matrix multiplication and Jacobi iteration for various cache parameters.
articleNo Access
On Tiling as a Loop Transformation
- Jingling Xue
Parallel Processing Letters01 Dec 1997
Preview Abstract
This paper is a follow-up Irigoin and Triolet's earlier work and our recent work on tiling. In this paper, tiling is discussed in terms of its effects on the dependences between tiles, the dependences within a tile and the required dependence test for legality. A necessary and sufficient condition is given for enforcing the data dependences of the program, while Irigion and Triolet's atomic tile constraint is only sufficient. A condition is identified under which both Irigoin and Triolet's and our constraints are equivalent. The results of this paper are discussed in terms of their impact on dependence abstractions suitable for legality test and on tiling to optimise a certain given goal.
chapterNo Access
BASIC PUZZLE LANGUAGES
- K. G. SUBRAMANIAN,
- R. SIROMONEY,
- V. R. DARE, and
- A. SAOUDI
Parallel Image Analysis: Theory and Applications01 Dec 1995
Preview Abstract
The emptiness problem for non-overlapping Basic puzzle grammars is shown to be decidable. An alternate proof of the decidability of the non-overlapping feature for basic puzzle grammars is given. Hierarchy among the various classes of puzzle languages is also established.