World Scientific

High Performance Kernel Architecture for Convolutional Neural Network Acceleration

    https://doi.org/10.1142/S0218126621502662
    Cited by: 3 (Source: Crossref)

    Convolutional neural networks (CNNs) have emerged as a prominent choice for artificial intelligence tasks. Recent advances in CNN design have greatly improved the performance and energy efficiency of several computation-intensive applications. In real-time applications, however, higher CNN accuracy comes at the expense of very high computational cost and complexity. Moreover, implementing real-time CNNs on embedded platforms is highly challenging because of resource and power constraints. This paper addresses this computational complexity and presents an accelerator architecture with a novel kernel design that improves overall CNN performance. The proposed kernel design introduces a computing mechanism that reduces the data movement cost, measured in computational cycle count (latency), by parallelizing the convolution processing elements. The architecture takes advantage of the overlap of spatially adjacent data. Its performance is analyzed for multiple hyper-parameter configurations. On average, the proposed accelerator reduces execution time by 16× relative to the conventional computing architecture. We validate the architecture with the AlexNet and VGG-16 CNN models. The proposed accelerator achieves an average 1.7× throughput improvement over state-of-the-art accelerators.
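
The overlap of spatially adjacent data mentioned in the abstract can be illustrated with a small counting sketch. This is not the paper's kernel architecture: it only assumes a single-channel input, no padding, and a hypothetical buffer that keeps the columns shared by horizontally adjacent convolution windows. The function names and the 8×8 example size are illustrative assumptions.

```python
def output_size(n, k, stride=1):
    """Number of window positions along one dimension (no padding)."""
    return (n - k) // stride + 1


def naive_fetches(h, w, k, stride=1):
    """Input reads when every k x k window is fetched from scratch."""
    return output_size(h, k, stride) * output_size(w, k, stride) * k * k


def reuse_fetches(h, w, k, stride=1):
    """Input reads when adjacent windows in a row share overlapping columns.

    Assumption: the first window of each output row loads all k*k values;
    every later window in that row loads only its min(stride, k) new columns.
    """
    out_h = output_size(h, k, stride)
    out_w = output_size(w, k, stride)
    new_cols = min(stride, k)
    per_row = k * k + (out_w - 1) * k * new_cols
    return out_h * per_row


if __name__ == "__main__":
    h = w = 8   # illustrative input size (assumption)
    k = 3       # 3 x 3 kernel, stride 1
    naive = naive_fetches(h, w, k)
    reused = reuse_fetches(h, w, k)
    print(f"naive reads:  {naive}")
    print(f"reused reads: {reused}")
    print(f"reduction:    {naive / reused:.2f}x")
```

The sketch counts memory reads only; it says nothing about the cycle counts or the 16× figure reported in the abstract, which depend on the authors' hardware design.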

    This paper was recommended by Regional Editor Emre Salman.