In this study, we examine opportunities for overlapping data transfers with computation in sparse matrix-vector multiplication (SpMV) using OpenACC. We consider cases with multiple right-hand-side (RHS) vectors in which the whole sparse matrix may or may not fit into the memory of a discrete GPU at once. GPUs offer relatively high memory bandwidth, but the data must first be transferred over the relatively slow PCIe bus. We implement communication hiding to increase performance.
For a model with 2,097,152 nodes and three degrees of freedom per node, we observe a performance increase of just over 40% from applying communication hiding in our routine. This underlines the importance of applying such techniques in simulations when the algorithmic structure of the problem suits the underlying computer architecture.
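
To illustrate the general idea of communication hiding for SpMV with multiple RHS vectors, the following is a minimal sketch and not the routine evaluated in this study. It assumes a CSR matrix that is streamed to the GPU in row blocks on two alternating OpenACC async queues, so that the transfer of one block can overlap with the computation on the previous one, and each block is reused for all RHS vectors while it is resident. The function name `spmv_multi_rhs_chunked`, its parameters, the CSR layout, and the two-queue scheme are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the study.
 * Computes y_k = A * x_k for k = 0..nrhs-1, with A stored in CSR format.
 * The vectors are stored back to back: x[k*n + j] and y[k*n + i].
 * The matrix is streamed to the device in nchunks row blocks.
 * Assumption: every row block contains at least one nonzero entry. */
void spmv_multi_rhs_chunked(int n, int nrhs, int nchunks,
                            const int *row_ptr, const int *col_idx,
                            const double *val, const double *x, double *y)
{
    assert(n > 0 && nrhs > 0 && nchunks > 0 && nchunks <= n);
    size_t nvec = (size_t)n * (size_t)nrhs;

    /* Row pointers and vectors stay on the device for the whole product. */
    #pragma acc data copyin(row_ptr[0:n+1], x[0:nvec]) copyout(y[0:nvec])
    {
        for (int c = 0; c < nchunks; ++c) {
            int r0 = (int)((long long)n * c / nchunks);        /* first row of block */
            int r1 = (int)((long long)n * (c + 1) / nchunks);  /* one past last row  */
            int e0 = row_ptr[r0];                               /* first nonzero      */
            int ne = row_ptr[r1] - e0;                          /* nonzeros in block  */
            int q  = c % 2;                                     /* alternate queues   */
            (void)nvec; (void)ne; (void)q;  /* used only by the directives */

            /* Wait for block c-2 (same queue) so that at most two blocks are
             * in flight; block c-1 on the other queue may still be running. */
            #pragma acc wait(q)
            #pragma acc enter data copyin(col_idx[e0:ne], val[e0:ne]) async(q)

            /* One resident matrix block serves all RHS vectors. */
            #pragma acc parallel loop collapse(2) async(q) \
                present(row_ptr[0:n+1], x[0:nvec], y[0:nvec], col_idx[e0:ne], val[e0:ne])
            for (int i = r0; i < r1; ++i) {
                for (int k = 0; k < nrhs; ++k) {
                    double sum = 0.0;
                    #pragma acc loop seq
                    for (int e = row_ptr[i]; e < row_ptr[i + 1]; ++e)
                        sum += val[e] * x[(size_t)k * n + col_idx[e]];
                    y[(size_t)k * n + i] = sum;
                }
            }

            /* Release the block once its kernel on this queue has finished. */
            #pragma acc exit data delete(col_idx[e0:ne], val[e0:ne]) async(q)
        }
        /* Drain both queues before y is copied back to the host. */
        #pragma acc wait
    }
}
```

With an OpenACC compiler (for example `nvc -acc`), the `async` clauses let the copy of one block proceed while the kernel for the previous block runs. Without OpenACC support the directives are ignored and the function computes the same result serially. Whether the overlap pays off depends on the ratio of transfer time to compute time per block, which in turn depends on the number of RHS vectors.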