Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

SEARCH GUIDE  Download Search Tip PDF File

  • articleNo Access

    PARALLELIZATION OF WAVELET FILTERS USING SIMD EXTENSIONS

    Much work has been done to optimize wavelet transforms for SIMD extensions of modern CPUs. However, these approaches are mostly restricted to the vertical part of 2-D transforms with line-wise organized memory layouts because this leads to a rather straight forward SIMD-implementation. This work shows for an example of a common wavelet filter new approaches to use SIMD operations on 1-D transforms that are able to produce reasonable speedups. As a result, the performance of algorithms that use wavelet transforms, such as JPEG2000, can be increased significantly. Various variants of parallelization are presented and compared. Their advantages and disadvantages for general filters are discussed.

  • articleNo Access

    USING HARDWARE MULTITHREADING TO OVERCOME BROADCAST/REDUCTION LATENCY IN AN ASSOCIATIVE SIMD PROCESSOR

    The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of threads.

  • articleNo Access

    POLLY — PERFORMING POLYHEDRAL OPTIMIZATIONS ON A LOW-LEVEL INTERMEDIATE REPRESENTATION

    The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers, they must be performed on the compiler's internal, low-level, intermediate representation (IR). With Polly, we present an infrastructure for polyhedral optimizations on such an IR. We describe the detection of program parts amenable to a polyhedral optimization (so-called static control parts), their translation to a Z-polyhedral representation, optimizations on this representation and the generation of optimized IR code. Furthermore, we define an interface for connecting external optimizers and present a novel way of using the parallelism they introduce to generate SIMD and OpenMP code. To evaluate Polly, we compile the PolyBench 2.0 benchmarks fully automatically with PLuTo as external optimizer and parallelizer. We can report on significant speedups.

  • articleNo Access

    EFFICIENT K-SELECTION IN HYPERCUBE MULTIPROCESSORS

    This paper deals with the problem of finding the K smallest elements out of a totally ordered but non-sorted set of N elements. This problem, called K-selection, arises often in statistics, image processing and distributed computing. We propose two algorithms to solve this problem in hypercubes. Our first algorithm is asymptotically optimal when K = O((log N)β), for any constant β. The second enlarges the range of optimality to K = N, ∊ < 1, using a recursive strategy. These are major improvements on previous results.

  • articleNo Access

    Parallel Image Processing Using Neural Networks: Applications in Contrast Enhancement of Medical Images

    This paper describes the implementation of a parallel image processing algorithm, the aim of which is to give good contrast enhancement in real time, especially on the boundaries of an object of interest defined by a grey homogeneity (for example, an object of medical interest having a functional or morphologic homogeneity, like a bone or tumor). The implementation of a neural network algorithm which does this contrast enhancement has been done on a SIMD massively parallel machine (a MasPar of 8192 processors) and the communication between its processors has been optimized.

  • articleNo Access

    COMPUTATION OF THE CONTINUOUS WAVELET TRANSFORM ON MASSIVELY PARALLEL SIMD ARRAYS

    Strategies for computing the continuous wavelet transform on massively parallel SIMD arrays are introduced and discussed. The different approaches are theoretically assessed and the results of implementations on a MasPar MP-2 are compared.