Please login to be able to save your searches and receive alerts for new content matching your search criteria.
We present a new approach to parallel copying garbage collection on symmetric multiprocessor (SMP) machines appropriate for Java and other object-oriented languages. Parallel, in this setting, means that the collector runs in several parallel threads.
Our collector is based on a new idea called delayed allocation, which completely eliminates the fragmentation problem of previous parallel copying collectors while still keeping low synchronization, high efficiency, and simplicity of collection. In addition to this main idea, we also discuss several other ideas such as termination detection, balancing the distribution of work, and dealing with contention during work distribution.
The recent advent of stacked die memory and logic technologies has lead to a resurgence in research associated with fundamental architectural techniques. Many architecture research projects begin with ample simulation of the target theoretical functions and approach. However, the logical and physical nature three-dimensional stacked devices, such as the Hybrid Memory Cube (HMC) specification, fundamentally do not align with traditional memory simulation techniques. As such, there currently exists a chasm in the capabilities of modern architectural simulation frameworks. This work introduces a new simulation framework developed specifically for the Hybrid Memory Cube specification. We present a set of novel techniques implemented on an associated development framework that provide an infrastructure to flexibly simulate one or more Hybrid Memory Cube stacked die memory devices attached to an arbitrary core processor. The goal of this development infrastructure is to provide architectural simulation frameworks the ability to begin migrating current banked DRAM memory models to stacked HMC-based designs without a reduction in simulation fidelity or functionality. In addition to the core architecture, this work also presents a series of memory workload test results that represent unit stride and random access memory patterns using the HMC-Sim infrastructure. These evaluations confirm that HMC-Sim can provide insightful guidance in designing and developing highly efficient systems, algorithms, and applications, considering the next-generation three-dimensional stacked memory devices.
Shared memory multiprocessor systems need efficient dynamic storage allocators, both for system purposes and to support parallel programs. Memory managers are often based on the buddy system, which provides fast allocation and release. Previous parallel buddy memory managers made no attempt to coordinate the allocation, splitting and release of blocks, and as a result needlessly fragment memory. We present a fast and simple parallel buddy memory manager that is also as space efficient as a serial buddy memory manager. We test our algorithms using memory allocation/deallocation traces collected from a parallel sparse matrix algorithm.
Two techniques for managing memory on a parallel random access machine (PRAM) are presented. One is a scheme for an n/log n processor EREW PRAM that dynamically allocates and deallocates up to n records of the same size in O(log n) time. The other is a simulation of a PRAM with initialized memory by one with uninitialized memory. A CREW PRAM variant of the technique justifies the assumption that memory can be assumed to be appropriately initialized with no asymptotic increase in time but a factor of n increase in space. An EREW PRAM solution incurs a factor of O(log n) increase in time but only a constant factor increase in space.
Replication-based incremental garbage collection is one of the more appealing concurrent garbage collection algorithms known today. It allows continuous operation of the application (the mutator) with very short pauses for garbage collection. There is a growing need for such garbage collectors suitable for a multithreaded environments such as the Java Virtual Machine. Furthermore, it is desirable to construct collectors that also work on multiprocessor computers.
We begin by pointing out an important, yet subtle point, which arises when implementing the replication-based garbage collector for a multithreaded environment. We first show that a simple and natural implementation of the algorithm may lead to an incorrect behavior of multithreaded applications. We then show that another simple and natural implementation eliminates the problem completely. Thus, the contribution of this part is in stressing this warning to future implementors.
Next, we address the effects of the memory coherence model on this algorithm. We show that even when the algorithm is properly implemented with respect to our first observation, a problem might still arise when a multiprocessor system is used. Adopting a naive solution to this problem results in very frequent (and expensive) synchronization. We offer a slight modification to the algorithm which eliminates the problem and requires little synchronization.
It is an important problem to determine the tenuring collection time or major collection time to meet the pause time goal for a generational garbage collector. From such a viewpoint, this paper proposes two stochastic models based on the working schemes of a generational garbage collector: Garbage collections occur at a nonhomogeneous Poisson process. Minor collections are made when the garbage collector begins to work, tenuring collection is made at a planned time T or at the first collection time when surviving objects have exceeded K for the first model. Major collection is made at time T or at the Nth collection for the second model. Using the techniques of cumulative processes and reliability theory, expected cost rates are obtained, and optimal policies of tenuring and major collection times which minimize them are discussed analytically and computed numerically.
Memory is becoming one of the major power consumers in computing systems. Therefore, energy efficient memory management is essential. Modern memory systems employ sleep states for energy saving. To utilize this feature, existing research activities have concentrated on increasing spatial locality to deactivate as many blocks as possible. However, they did not count the unexpected activation of memory blocks due to cache eviction of deactivated tasks. In this paper, we suggest a software-based power state management scheme for memory, which exploits temporal locality to relieve the energy loss from the unexpected activation of memory blocks from cache eviction. The suggested scheme SW-NAP makes a memory block remain deactivated during a certain tick, which has no cache miss over the block. The evaluation shows that SW-NAP is 50% better than PAVM, which is an existing software scheme, and worse than PMU, which is another approach based on the specialized hardware by 20%. We also suggest task scheduling policies that increase the effectiveness of SW-NAP and they saved up to 7% additional energy.
This paper proposes a new architecture for efficient and minimized memory management in Viterbi Decoders based on Zig-Zag algorithm. The memory organization techniques mainly deal with the storage of survivor sequences from which the decoded information sequence is retrieved. The survivor sequences are usually stored in RAM blocks and traced back. Register Exchange (RE) Method and Trace-Back (TB) Method have been used for memory management techniques. Due to large memory area utilization of these two methods, a new architecture is implemented in this paper. This method uses only a single RAM instead of two (used in Trace-Back Method), which performs storage as well as trace-back simultaneously. The implementation shows that the memory size has been reduced to 56.09% when compared to the trace-back (TB) Method. The trade off in latency has been compensated by optimization of the parameters and it has been reduced to 50%. The architecture based on Viterbi Decoder has been implemented with a constraint length of 4, code rate of 1/6 and the traceback depth of 20.