PARALLEL MATRIX MULTIPLICATION ALGORITHMS ON HYPERCUBE MULTIPROCESSORS
Abstract
In this paper, we present three parallel algorithms for matrix multiplication. The first, which employs pipelining techniques on a mesh grid, uses only one copy of the data matrices. The second, also on a mesh grid, uses multiple copies of the data matrices. Although it reduces data communication, it increases the local memory that each processing element requires. The third, which uses a cubic grid, exposes the trade-offs between reducing computation time and reducing communication overhead. We study performance models and the feasibility of all three algorithms, analyzing the interplay among the number of processing elements, the communication overhead, and the local memory required by each processing element. We also present experimental results for the three algorithms on a 32-node nCUBE-2 computer.
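The abstract does not spell out the algorithms. As a rough illustration of a single-copy, shift-based (pipelined) scheme on a q x q mesh, the sketch below serially simulates Cannon's algorithm. This is a well-known technique and may differ from the paper's first algorithm. The names `cannon`, `block`, and `matmul_acc` are hypothetical. Each simulated processing element holds exactly one block of A and one block of B at any time, and blocks move only by cyclic shifts to mesh neighbours.

```python
def block(M, bi, bj, b):
    """Return the b x b block at block-row bi, block-column bj of matrix M."""
    return [row[bj * b:(bj + 1) * b] for row in M[bi * b:(bi + 1) * b]]


def matmul_acc(C, A, B):
    """Accumulate the product A @ B into C in place (all b x b lists)."""
    b = len(A)
    for i in range(b):
        for k in range(b):
            a = A[i][k]
            for j in range(b):
                C[i][j] += a * B[k][j]


def cannon(A, B, q):
    """Serially simulate Cannon's algorithm on a q x q mesh of processing elements."""
    n = len(A)
    assert n % q == 0, "matrix size must be divisible by the mesh dimension"
    b = n // q
    # Initial skew: element (i, j) starts with A block (i, i+j) and B block (i+j, j), mod q.
    a_loc = [[block(A, i, (i + j) % q, b) for j in range(q)] for i in range(q)]
    b_loc = [[block(B, (i + j) % q, j, b) for j in range(q)] for i in range(q)]
    c_loc = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every element multiplies its current pair of blocks "in parallel".
        for i in range(q):
            for j in range(q):
                matmul_acc(c_loc[i][j], a_loc[i][j], b_loc[i][j])
        # Pipeline step: shift A blocks one position left and B blocks one position up (cyclic).
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Gather the distributed C blocks into one n x n result.
    C = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            for r in range(b):
                for cc in range(b):
                    C[i * b + r][j * b + cc] = c_loc[i][j][r][cc]
    return C


if __name__ == "__main__":
    print(cannon([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))  # [[19, 22], [43, 50]]
```

After s shifts, element (i, j) holds A block (i, k) and B block (k, j) with k = (i + j + s) mod q, so over q steps it accumulates every term of its output block while never storing more than one block of each input. A multi-copy variant would trade that extra memory for fewer shifts.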