World Scientific

RDMKE: Applying Reuse Distance Analysis to Multiple GPU Kernel Executions

    https://doi.org/10.1142/S0218126619502451
    Cited by: 1 (Source: Crossref)

    Modern GPUs can execute multiple kernels concurrently to keep the hardware resources busy and to boost overall performance. This approach, called simultaneous multiple kernel execution (MKE), is promising for improving GPU hardware utilization. Although modern GPUs allow MKE, the effects of different MKE scenarios have not been adequately studied. Since cache memories have a significant effect on overall GPU performance, the effects of MKE on cache performance should be investigated properly. The present study proposes a framework, called RDMKE (short for Reuse Distance-based profiling in MKEs), that provides a method for analyzing GPU cache memory performance in MKE scenarios. The raw memory access information of each kernel is first extracted, and RDMKE then imposes an ordering on the memory accesses so that the resulting trace represents a given MKE scenario. Afterward, RDMKE employs reuse distance analysis (RDA) to generate cache-related performance metrics, including hit ratios, transaction counts, and reservation failures for cache sets and Miss Status Holding Registers (MSHRs). In addition, RDMKE provides the user with the RD profiles as a useful locality metric. The simulation results for single-kernel executions show a fair correlation between the results generated by RDMKE and GPU performance counters. Further, the simulation results for 28 two-kernel executions indicate that RDMKE can properly capture the nonlinear cache behaviors in MKE scenarios.
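
    The idea of reordering per-kernel traces and then applying reuse distance analysis can be illustrated with a minimal Python sketch. It is not the RDMKE implementation: it assumes a fully associative LRU cache, ignores cache sets and MSHRs, and uses a simple round-robin interleaving as a stand-in for whatever ordering a real MKE scenario would impose. All function names (reuse_distances, hit_ratio, interleave_round_robin) and the toy traces are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking the end of the shorter trace


def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses
    touched since the previous access to the same address.
    None stands for an infinite distance (first access)."""
    lru_stack = []  # most recently used address is at the end
    distances = []
    for addr in trace:
        if addr in lru_stack:
            idx = lru_stack.index(addr)
            distances.append(len(lru_stack) - 1 - idx)
            lru_stack.pop(idx)
        else:
            distances.append(None)
        lru_stack.append(addr)
    return distances


def hit_ratio(distances, cache_lines):
    """Hit ratio of a fully associative LRU cache with `cache_lines` lines:
    an access hits when its reuse distance is smaller than the capacity."""
    if not distances:
        return 0.0
    hits = sum(1 for d in distances if d is not None and d < cache_lines)
    return hits / len(distances)


def interleave_round_robin(trace_a, trace_b):
    """Assumed ordering: alternate accesses of two kernels one by one."""
    merged = []
    for a, b in zip_longest(trace_a, trace_b, fillvalue=_MISSING):
        if a is not _MISSING:
            merged.append(a)
        if b is not _MISSING:
            merged.append(b)
    return merged


# Toy traces of cache-line addresses for two hypothetical kernels.
kernel_a = [0, 1, 0, 1]
kernel_b = [100, 101, 100, 101]

alone = hit_ratio(reuse_distances(kernel_a), cache_lines=2)
shared = hit_ratio(
    reuse_distances(interleave_round_robin(kernel_a, kernel_b)),
    cache_lines=2,
)
print(alone, shared)
```

    In this toy case each kernel alone reuses its lines at distance 1, so half of its accesses hit in a two-line cache. Once the traces are interleaved, every reuse distance grows to 3 and no access hits, which is the kind of nonlinear interaction that co-running kernels can produce in a shared cache.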

    This paper was recommended by Regional Editor Zoran Stamenkovic.