SpGEMM (general sparse matrix-matrix multiplication) is a core kernel in algebraic multigrid methods, graph algorithms, and linear equation solvers. Because some sparse matrices are highly non-uniform, existing parallel SpGEMM algorithms suffer from load imbalance, which degrades computational efficiency. This paper proposes a new algorithm, SPMSD (SpGEMM Based on Minimum Standard Deviation), built on a hash table and a partition strategy. First, the intermediate results of the matrix product are divided into multiple blocks by a new partition strategy that minimizes the standard deviation among blocks. Second, the input matrix is transformed according to the result of the partition. Finally, SPMSD performs the parallel SpGEMM computation, exploiting the fast insertion and access of the hash table, and controls the insertion and merging of intermediate results by offsets to avoid costly atomic operations. Experiments indicate that SPMSD executes up to 7.4x faster than the existing cuSPARSE library. Compared with the Out-of-Core method, SPMSD improves computational performance by 1.2x while decreasing memory usage by 0.19x.
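To make the partitioning and hash-table ideas concrete, here is a minimal Python sketch under stated assumptions: CSR inputs, a greedy contiguous splitter standing in for the paper's minimum-standard-deviation partition, and a plain dict standing in for the GPU hash table. All function names and the toy matrices are hypothetical, not the paper's exact SPMSD implementation.

    from statistics import pstdev

    def row_intermediate_counts(a_indptr, a_indices, b_indptr):
        # counts[i] = sum over nonzeros A[i,k] of nnz(B[k,:]) for CSR inputs;
        # this is the per-row count of intermediate products of C = A * B.
        counts = []
        for i in range(len(a_indptr) - 1):
            work = 0
            for idx in range(a_indptr[i], a_indptr[i + 1]):
                k = a_indices[idx]
                work += b_indptr[k + 1] - b_indptr[k]
            counts.append(work)
        return counts

    def partition_rows(counts, num_blocks):
        # Greedily cut rows into contiguous blocks of roughly equal work
        # (an illustrative stand-in for the minimum-stdev partition).
        target = sum(counts) / num_blocks
        blocks, current, acc = [], [], 0
        for row, c in enumerate(counts):
            current.append(row)
            acc += c
            if acc >= target and len(blocks) < num_blocks - 1:
                blocks.append(current)
                current, acc = [], 0
        blocks.append(current)
        return blocks

    def accumulate_row(i, a_indptr, a_indices, a_data,
                       b_indptr, b_indices, b_data):
        # Hash-table accumulation of one output row (a dict plays the
        # hash table; insertions and merges need no atomic operations
        # because each row is owned by one worker).
        acc = {}
        for idx in range(a_indptr[i], a_indptr[i + 1]):
            k, val = a_indices[idx], a_data[idx]
            for jdx in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[jdx]
                acc[j] = acc.get(j, 0.0) + val * b_data[jdx]
        return acc

    # Toy 4x3 A and 3x3 B in CSR form.
    a_indptr, a_indices, a_data = [0, 2, 3, 5, 6], [0, 2, 1, 0, 1, 2], [1.0] * 6
    b_indptr, b_indices, b_data = [0, 2, 3, 6], [0, 1, 2, 0, 1, 2], [1.0] * 6
    counts = row_intermediate_counts(a_indptr, a_indices, b_indptr)  # [5, 1, 3, 3]
    blocks = partition_rows(counts, 2)                               # [[0, 1], [2, 3]]
    sums = [sum(counts[r] for r in blk) for blk in blocks]
    print(blocks, sums, "stdev:", pstdev(sums))                      # stdev: 0.0
    print(accumulate_row(0, a_indptr, a_indices, a_data,
                         b_indptr, b_indices, b_data))               # {0: 2.0, 1: 2.0, 2: 1.0}

Balancing the per-block totals of intermediate products, rather than nonzero counts, is what lets blocks with very different row sparsity patterns finish in similar time.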
In parallel systems, a number of joins from one or more queries can be executed either serially or in parallel. While serial execution assigns all processors to execute each join one after another, parallel execution distributes the joins to clusters of processors and executes them concurrently. However, data skew may cause load imbalance among processors executing the same join, and some clusters may be overloaded with more expensive joins. As a result, the completion time will be much longer than expected. In this paper, we propose an algorithm to further minimize the completion time of multiple concurrently executed joins. The algorithm decomposes all the joins to be executed concurrently into a set of tasks ordered by decreasing task size; these tasks are dynamically acquired by available processors during execution. Our performance study shows that the proposed algorithm outperforms previously proposed approaches, especially when the number of processors increases and the relations are large and highly skewed.
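As a rough illustration of this task-acquisition strategy, the Python sketch below simulates idle processors repeatedly grabbing the largest remaining task; when all task costs are known up front, this behaves like classic longest-processing-time (LPT) scheduling. The task sizes and function names are made up for illustration and are not from the paper.

    import heapq

    def schedule(task_sizes, num_procs):
        # Assign tasks, largest first, to whichever processor frees up
        # first; this models idle workers pulling from a sorted task list.
        finish = [(0.0, p) for p in range(num_procs)]  # (busy-until, proc id)
        heapq.heapify(finish)
        assignment = {p: [] for p in range(num_procs)}
        for size in sorted(task_sizes, reverse=True):
            busy_until, p = heapq.heappop(finish)
            assignment[p].append(size)
            heapq.heappush(finish, (busy_until + size, p))
        makespan = max(t for t, _ in finish)
        return assignment, makespan

    # Hypothetical task sizes from decomposing several skewed joins.
    tasks = [90, 70, 40, 35, 20, 15, 10, 5]
    assignment, makespan = schedule(tasks, 3)
    print(assignment, "completion time:", makespan)

A real implementation would estimate task sizes from relation statistics and let processors pull tasks from a shared queue at run time, so exact costs need not be known in advance; sorting by decreasing size keeps the small tasks available at the end to smooth out imbalance.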