The Fast Fourier Transform is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target of an investigation into the design of high-performance multidimensional SIMD parallel FFT algorithms. Critical algorithm design issues are discussed, the necessary machine performance measurements are identified and made, and the performance of the resulting FFT programs is measured. Our FFT programs are compared with CFFT2, currently the best Cray-2 FFT library program.
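As an aside, the butterfly structure that such SIMD FFT implementations parallelize can be sketched in a few lines of serial Python. This is a generic radix-2 Cooley-Tukey kernel, not the CM-2 code described above:

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return x
    even = fft(x[0::2])   # FFT of even-indexed samples
    odd = fft(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]          # butterfly combine
        out[k + n // 2] = even[k] - w * odd[k]
    return out

print(fft([1, 1, 1, 1, 0, 0, 0, 0]))
```

On a SIMD machine, each level of this recursion becomes one parallel butterfly step over all elements, which is where the communication-pattern questions the abstract mentions arise.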
Matrix multiplication is a computation- and communication-intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n × n matrices, the algorithms have theoretical running times of O(n² log n), O(n log n), O(n), and O(log n), and require n, n², n², and n³ processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
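One standard way to realize the O(log n)-time, n³-processor end of this design space (not necessarily the paper's formulation) is to have one virtual processor per (i, j, k) triple compute A[i,k]·B[k,j] and then sum over k by tree reduction. A minimal sketch with numpy standing in for the parallel machine:

```python
import numpy as np

n = 4                                       # n must be a power of two here
A = np.arange(n * n, dtype=float).reshape(n, n)
B = np.ones((n, n))

# Step 1: one "virtual processor" per (i, j, k) computes A[i,k] * B[k,j].
P = A[:, None, :] * B.T[None, :, :]         # shape (n, n, n): P[i, j, k]

# Step 2: tree reduction over k in O(log n) parallel steps.
k = P.shape[2]
while k > 1:
    k //= 2
    P = P[:, :, :k] + P[:, :, k:2 * k]      # halve the k-axis each step
C = P[:, :, 0]

assert np.allclose(C, A @ B)
```

The sketch also makes the tradeoff visible: the product step is embarrassingly parallel, while the reduction step is pure communication.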
We have implemented pure gauge Quantum Chromodynamics (QCD) on the massively parallel Connection Machine, in *Lisp. We describe our program in some detail and give performance measurements for it. With no tuning or optimization, the code runs at approximately 500 to 1000 Mflops on a 64K CM-2, depending on the VP ratio.
Two implementations of the molecular dynamics method on the Connection Machine are given, demonstrating the advantage of using NEWS communication over the general router. Timing comparisons are also made with the Cray-2.
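To illustrate the distinction, the toy sketch below uses numpy's roll as a stand-in for NEWS-style nearest-neighbor grid shifts (with periodic boundaries), as opposed to the arbitrary point-to-point permutations the general router supports. The stencil shown is illustrative of short-range grid communication, not the paper's molecular dynamics code:

```python
import numpy as np

# Toy 2-D grid of per-site data. On the CM-2, NEWS moves data between
# nearest grid neighbors; np.roll plays that role here.
field = np.arange(16.0).reshape(4, 4)

north = np.roll(field, -1, axis=0)   # each site receives its northern neighbor
south = np.roll(field,  1, axis=0)
east  = np.roll(field, -1, axis=1)
west  = np.roll(field,  1, axis=1)

# A nearest-neighbor stencil built purely from NEWS-style shifts:
stencil = north + south + east + west - 4 * field
print(stencil)
```

When every site needs only its neighbors' data, these regular shifts avoid the overhead of general routing, which is the advantage the abstract reports.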
Supercomputers are coming into wider use for generating realistic imagery for commercial animation, special effects, and scientific simulation. The Connection Machine requires a more radical rethinking of rendering algorithms than previous supercomputers since it is not intended to function as a scalar processor. A fascinating mix of changes from conventional approaches is emerging. Some procedures can run virtually unchanged while others must be turned completely inside out. We have confidence in the viability of the Connection Machine as an architecture for high-end computer graphics. For complex scenes resulting in at least tens of thousands of polygons per frame, most steps of the rendering pipeline can make effective use of the massive number of processors available.
Early approaches to massively parallel graphics systems have focused on processor-per-pixel organizations. We show that a dynamic mix of organizations, including processor per pixel, processor per vertex, and processor per polygon, is necessary. Additionally, we note that an apparent consequence of the style of algorithm enforced by the Connection Machine is an enormously increased appetite for memory. We explore standard algorithms for image generation and note the differences that arise in an implementation for the Connection Machine. We conclude by attempting a comparison of the viability of alternative computing environments for our application.
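A minimal sketch of the processor-per-pixel organization, assuming a conventional z-buffer depth compare done simultaneously at every pixel (the "polygons", depths, and colors below are hypothetical stand-ins, not the paper's renderer):

```python
import numpy as np

H, W = 64, 64
frame = np.zeros((H, W, 3))        # RGB framebuffer
zbuf = np.full((H, W), np.inf)     # depth buffer, one "processor" per pixel

def composite(depth, color, mask):
    """Update every pixel in parallel: keep the nearer fragment."""
    nearer = mask & (depth < zbuf)
    zbuf[nearer] = depth[nearer]
    frame[nearer] = color

ys, xs = np.mgrid[0:H, 0:W]
# Two overlapping constant-depth rectangles as stand-in polygons:
composite(np.full((H, W), 5.0), (1.0, 0.0, 0.0), (xs < 40) & (ys < 40))
composite(np.full((H, W), 2.0), (0.0, 0.0, 1.0), (xs > 20) & (ys > 20))

print(frame[30, 30], zbuf[30, 30])   # overlap region: nearer (blue) fragment wins
```

Note that this organization keeps a full-screen depth and color array resident at all times, which hints at the increased memory appetite the abstract mentions.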
The advent of the Connection Machine profoundly changes the world of supercomputers. Its highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard von Neumann architectures. Kanerva's sparse distributed memory (SDM) is an example of such an algorithm.
Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. In this paper I describe the foundations for sparse distributed memory, and give some simple examples of using the memory. I continue by showing the relationship of sparse distributed memory to random-access memory. Finally, I discuss the implementation of the algorithm for sparse distributed memory on the Connection Machine.
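A minimal autoassociative SDM sketch in Python, following Kanerva's standard formulation (random hard locations, Hamming-ball activation, counter accumulation); the sizes and activation radius below are illustrative choices, not parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, R = 256, 1000, 112    # address bits, hard locations, activation radius

hard_addrs = rng.integers(0, 2, (M, N))   # random hard-location addresses
counters = np.zeros((M, N), dtype=int)    # one counter vector per location

def active(addr):
    # A location fires if its address lies within Hamming distance R of addr.
    return (hard_addrs != addr).sum(axis=1) <= R

def write(addr, data):
    # Add +1 for each 1-bit and -1 for each 0-bit at every active location.
    counters[active(addr)] += 2 * data - 1

def read(addr):
    # Sum the counters of active locations; threshold each bit at zero.
    return (counters[active(addr)].sum(axis=0) > 0).astype(int)

pattern = rng.integers(0, 2, N)
write(pattern, pattern)                   # autoassociative storage
noisy = pattern.copy(); noisy[:20] ^= 1   # corrupt 20 address bits
print((read(noisy) == pattern).mean())    # fraction of bits recovered
```

The read path also makes the relationship to random-access memory concrete: RAM is the degenerate case where exactly one location is active per address, while SDM spreads each datum over many locations.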
We develop an analytical model of grid communication on the Connection Machine for arbitrary grid geometries over exact power-of-two distances. There is excellent agreement between the model and the measured performance of the CM-2. We propose some conceptually simple enhancements to the virtual processor mechanism and to the grid communication microcode that could lead to substantial performance improvements for several important numerical algorithms.
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures that allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message-passing overhead. Performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128-processor iPSC/860 are given. The two implementations are found to be comparable in performance, though not as good as a fully vectorized bucket sort on the Cray Y-MP.
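The core idea behind queue-sort can be sketched serially as a bucket sort in which each key is "sent" to the processor owning its value range; on the CM-2 these sends happen in parallel, with the router queueing multiple messages per destination, whereas here the queueing is only simulated:

```python
from collections import defaultdict

def queue_sort(keys, n_buckets, key_range):
    """Serial sketch of the queue-sort idea (illustrative, not the paper's
    CM-2 code): route each key to the bucket owning its value range, then
    sort each bucket locally and concatenate."""
    width = -(-key_range // n_buckets)        # ceil(key_range / n_buckets)
    buckets = defaultdict(list)
    for k in keys:                            # "send" phase (parallel on CM-2)
        buckets[k // width].append(k)
    out = []
    for b in range(n_buckets):                # local sort + concatenate
        out.extend(sorted(buckets[b]))
    return out

print(queue_sort([9, 3, 7, 1, 8, 2, 0, 5], n_buckets=4, key_range=10))
```

Barrel-sort follows the same routing idea but batches keys into fewer, larger messages, which suits machines where each message carries a high fixed overhead.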
The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered. Important issues in the architecture and programming environment of connection machines in general are discussed. These are contrasted with the same issues in MIMD multiprocessors and multicomputers.
This paper presents an algorithm for the Gaussian elimination problem that reduces the length of the critical path compared with the algorithm of Lord et al. This is done by redefining the notion of a task. In practice, the issues of communication overhead and pivoting cannot be overlooked, and we consider both for the new algorithm. Timing results for this algorithm as executed on the CM-2 model of the Connection Machine are presented. Another contribution of this paper is the use of logical pivoting for stable computation in Gaussian elimination. Pivoting is essential for producing stable results, and when it occurs, an interchange of two rows is required. A physical interchange of the values can be avoided by maintaining a permutation vector in a globally accessible location. We present experimental results that substantiate the use of logical pivoting.
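A serial sketch of the logical-pivoting idea: partial pivoting is recorded in a permutation vector p and every row access is indirected through p, so no rows are ever physically interchanged. The function name and test data are hypothetical, and this shows only the indirection scheme, not the paper's parallel task structure:

```python
import numpy as np

def ge_logical_pivot(A, b):
    """Gaussian elimination with partial pivoting recorded in a permutation
    vector p; all row accesses go through p, so rows are never swapped."""
    A = A.astype(float); b = b.astype(float)
    n = len(b)
    p = np.arange(n)                       # globally accessible permutation
    for k in range(n):
        # Pivot: position of the largest |entry| in column k among rows p[k:].
        m = k + np.argmax(np.abs(A[p[k:], k]))
        p[k], p[m] = p[m], p[k]            # swap indices, not rows
        for i in range(k + 1, n):          # eliminate below the pivot
            f = A[p[i], k] / A[p[k], k]
            A[p[i], k:] -= f * A[p[k], k:]
            b[p[i]] -= f * b[p[k]]
    x = np.zeros(n)                        # back substitution through p
    for k in range(n - 1, -1, -1):
        x[k] = (b[p[k]] - A[p[k], k + 1:] @ x[k + 1:]) / A[p[k], k]
    return x

A = np.array([[0., 2., 1.], [1., 1., 1.], [2., 1., 3.]])
b = np.array([3., 3., 6.])
print(ge_logical_pivot(A, b))              # solves Ax = b; expect [1, 1, 1]
```

On a parallel machine, avoiding the physical interchange saves a full row's worth of data movement per pivot, at the cost of one extra level of indirection on every row access.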
This paper is divided into three parts: an introduction to the Hybrid Monte Carlo (HMC) algorithm; a detailed discussion of its implementation for Quantum Chromodynamics (QCD) in CMIS on the Connection Machine (CM-2); and a summary of recent analytic results on the behavior of HMC for free field theory. We present a detailed description of the architecture of the CM-2 and explain how it can execute the Conjugate Gradient (CG) solver for HMC at 6.35 Gflops.
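For reference, the CG kernel itself is the textbook iteration for a symmetric positive-definite system; the serial numpy sketch below shows only the algorithm, not the CMIS implementation the paper describes (the test matrix is hypothetical):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook CG for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p  # conjugate new direction
        rs = rs_new
    return x

A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
print(conjugate_gradient(A, b))    # approx [0.0909, 0.6364]
```

In the QCD setting, the matrix-vector product A @ p is the lattice Dirac operator applied across the whole lattice, which is why this kernel dominates the machine's sustained flop rate.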
This meeting provides further evidence that present parallel computers are (a) real instruments of computational physics, (b) still largely in the hands of pioneers, and (c) efficiently promoted by basic-research groups with large-scale computational needs. Progress in parallel computing is carried by two types of such groups, following either a build-it-yourself or an early-use strategy. In this contribution we describe, as an example of the second approach, the Wuppertal University pilot project in applied parallel computing. In particular, we report on one of our key applications in theoretical particle physics on the Connection Machine CM-2: a high-statistics computer experiment to determine the static quark-antiquark potential from quenched quantum chromodynamics.
A detailed investigation of the temperature dependence of the spatial string tension σs in SU(2) gauge theory is presented. A sustained performance of 3 GFLOPS on a 64K Connection Machine CM-2 equivalent has been achieved. Scaling of σs between β = 2.5115 and β = 2.74, on large lattices, is demonstrated. Below the critical temperature, Tc, σs remains constant. For temperatures larger than 2Tc the temperature dependence can be parametrized by σs(T) = (0.369 ± 0.014)² g⁴(T) T², where g(T) is a 2-loop running coupling constant with the scale parameter determined as ΛT = (0.076 ± 0.013)Tc.
Semantic networks are used as a means of knowledge representation and manipulation in many artificial intelligence applications. This paper presents a number of computer architectures that have been reported for semantic network processing and develops a novel set of evaluation criteria for such architectures, covering both semantic network processing and architectural issues. We study how the reported architectures meet the requirements of each criterion. Because of its comprehensive range of issues in semantic networks and architectures, this set of evaluation criteria is useful for the design of future semantic network machines.