Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

SEARCH GUIDE  Download Search Tip PDF File

  • articleNo Access

    PROBLEM HEAPS AND THEIR EVALUATION

    Problem heaps provide a means of asynchronous communication in multiprocessor systems: Processors put problems (subtasks, messages) on the heap and retrieve problems from it. In this paper we present an abstract model of problem heaps, analyse their hardware complexity and present realizations meeting the derived lower bounds.

  • articleNo Access

    PERFORMANCE BOUNDS FOR STATIC MULTIPROCESSOR SCHEDULING OF MULTI-TASK JOBS

    Job scheduling on multiprocessor systems is studied here as a special case of oriented two-dimensional orthogonal bin packing. Each job has subtasks which can be processed in parallel, requiring multiple processors to be allocated to each job. Then each job corresponds to a rectangle with sides equal to the processor requirement and the processing time. We study two classes of algorithms: (i) Longest processing time first (LPT) algorithms, and (ii) Largest processor requirement first (LPR) algorithms. We obtain improved asymptotic upper bounds for these algorithms compared to the bounds of the corresponding algorithms for the general two-dimensional packing problem. This is due to the discrete nature of the processor requirement (dimension) of jobs. We find that the LPR algorithms have better asymptotic upper bound on the makespan compared to the LPT algorithms. Specifically, the bound is 7/4 for the LPR algorithms whereas it is 2 for the LPT algorithms. Moreover, LPR algorithms are found to be more suited for dynamic job scheduling.

  • articleNo Access

    Compiling for Scalable Multiprocessors with Polaris

    Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines. As a consequense, there has been much research on the development of compiler techniques to simplify programming, to increase reliability, and to reduce development costs. For code generation, a compiler applies a number of transformations in areas such as data privatization, data copying and replication, synchronization, and data and work distribution. In this paper, we discuss our recent work on the development and implementation of a few compiler techniques for some of these transformations. We use Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement our algorithms. The paper includes experimental results obtained by applying our techniques to several benchmark codes.

  • articleNo Access

    Mapping Fortran Programs to Single Assignment Semantics for Efficient Parallelization

    This paper presents Mustang, a system that automatically parallellizes Fortran programs by mapping them to single assignment semantics. Specifically, sequential Fortran source programs are translated into IF1, a machine-independent dataflow graph description language that is the intermediate form for the SISAL language. During this translation, Parfrase 2 is used to parse the source program perform dependency analysis and to detect opportunities for parallelization which are then explicitly introduced into the IF1 program. The resulting IF1 program is then processed by the Optimizing SISAL Compiler which produces parallel executables on multiple target platforms. A working prototype has been developed and tested. The execution results of several Livermore Loops are presented and compared against Fortran and SISAL implementations on two different platforms. The initial results obtained provide proof of concept that Fortran can be mapped to Single Assignment Semantics without sacrificing efficiency.

  • articleNo Access

    SHARED MEMORY VERSUS MESSAGE PASSING FOR ITERATIVE SOLUTION OF SPARSE, IRREGULAR PROBLEMS

    The benefits of hardware support for shared memory versus those for message passing are difficult to evaluate without an in-depth study of real applications on a common platform. We evaluate the communication mechanisms of the MIT Alewife machine, a multiprocessor which provides integrated cache-coherent shared memory, massage passing, and DMA. We perform this evaluation with "best-effort" implementations which solve several sparse, irregular benchmark problems with a preconditioned conjugate gradient sparse matrix solver (ICCG).

    We find that machines with fast global memory operations do not need message passing or bulk transfer to suport our irregular problems. This is primarily due to three reasons. First, a 5-to-1 ratio between global and local cache misses makes memory copies in bulk communication expensive relati to communication via shared memory. Second, although message passing has synchronization semantics superior to shared memory for data-driven computation, efficient shared memory can overcome this handicap by using global read-modify-writes to change from the traditional owner-computers model to a producer-computes model. Third, bulk transfers can result in high processor idle times in irregular applications.

  • articleNo Access

    A NOTE ON THE IMPLEMENTATION OF REPLICATION-BASED GARBAGE COLLECTION FOR MULTITHREADED APPLICATIONS AND MULTIPROCESSOR ENVIRONMENTS

    Replication-based incremental garbage collection is one of the more appealing concurrent garbage collection algorithms known today. It allows continuous operation of the application (the mutator) with very short pauses for garbage collection. There is a growing need for such garbage collectors suitable for a multithreaded environments such as the Java Virtual Machine. Furthermore, it is desirable to construct collectors that also work on multiprocessor computers.

    We begin by pointing out an important, yet subtle point, which arises when implementing the replication-based garbage collector for a multithreaded environment. We first show that a simple and natural implementation of the algorithm may lead to an incorrect behavior of multithreaded applications. We then show that another simple and natural implementation eliminates the problem completely. Thus, the contribution of this part is in stressing this warning to future implementors.

    Next, we address the effects of the memory coherence model on this algorithm. We show that even when the algorithm is properly implemented with respect to our first observation, a problem might still arise when a multiprocessor system is used. Adopting a naive solution to this problem results in very frequent (and expensive) synchronization. We offer a slight modification to the algorithm which eliminates the problem and requires little synchronization.

  • articleNo Access

    Global Fixed-Priority Scheduling for Parallel Real-Time Tasks with Constrained Parallelism

    With the rapid development of parallel programming techniques and the widespread use of multiprocessors, scheduling and analysis techniques supporting parallel real-time tasks become a critical topic for multiprocessor real-time systems. Global scheduling that allows the vertices of a parallel task to execute on any processor is a promising scheduling approach with guaranteed theoretical bounds and wide use in practice. However, the complex internal structure of parallel tasks leads to extensive inner- and inter-task interference, which leads to significant pessimism in the worst-case timing analysis. In this paper, a global fixed-priority (G-FP) scheduling with constrained parallelism for parallel real-time tasks based on the sporadic directed acyclic graph (DAG) model is proposed. Each DAG task is assigned a parallel threshold, such that the number of processors occupied by the task is limited to the parallel threshold of the task at a time. We propose a heuristic algorithm to set the parallel threshold and present a response-time analysis (RTA) to exploit the feature of the constrained parallel scheduling. Experiments with randomly generated tasks show that the proposed approach improves the schedulability upon G-FP and federated scheduling in terms of acceptance ratio.