We evaluate the impact of the memory hierarchy of virtual shared memory computers on the design of algorithms for linear algebra. On classical shared memory multiprocessor computers, block algorithms are used for efficiency. We study here the potential and the limitations of such approaches on globally addressable distributed memory computers. The BBN TC2000 belongs to this class of computers and will be used to illustrate our discussion.

We describe the implementation of Level 3 BLAS and examine the performance of some of the LAPACK routines. We discuss how the number of processors affects the choice among the variants of classical matrix factorizations (for example, the KJI, JKI, and JIK variants of the LU factorization). We also study the factorization of sparse matrices based on a multifrontal approach. The ideas introduced for the parallelization of full linear algebra codes are applied to the sparse case. We discuss and illustrate the limitations of this approach in sparse multifrontal factorization.
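
For readers unfamiliar with the loop-order naming, the following minimal sketch shows two of the variants mentioned above on a small dense matrix. It is an illustration only, not the implementation studied here: it is unblocked, performs no pivoting, and the function names `lu_kji` and `lu_jki` are hypothetical. Both functions return the factors packed in one matrix (unit lower-triangular L below the diagonal, U on and above it).

```python
def lu_kji(a):
    """Right-looking (KJI) LU without pivoting; illustrative sketch only."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    for k in range(n - 1):
        # Scale column k below the diagonal to form the multipliers.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
        # Update the trailing submatrix one column at a time.
        for j in range(k + 1, n):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


def lu_jki(a):
    """Left-looking (JKI) LU without pivoting; illustrative sketch only."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    for j in range(n):
        # Apply all previously computed columns of L to column j.
        for k in range(j):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
        # Scale column j below the diagonal to form the multipliers.
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]
    return a


if __name__ == "__main__":
    m = [[4.0, 3.0, 2.0], [8.0, 8.0, 5.0], [4.0, 7.0, 9.0]]
    print(lu_kji(m))
    print(lu_jki(m))
```

The two orderings compute the same factors but touch memory differently: KJI updates the whole trailing submatrix at every step, whereas JKI defers all updates to a column until that column is reached. This difference in data access is what makes the choice of variant sensitive to the memory hierarchy.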

We show that the speed-ups obtained on the BBN TC2000 for the class of methods presented here are comparable to those obtained on more classical shared memory computers, such as the Alliant FX/80, the CRAY-2 and the IBM 3090/VF.

Keywords: