Article (no access)
SOC DESIGN FOR WIRELESS COMMUNICATIONS
- ZORAN STAMENKOVIĆ
Journal of Circuits, Systems and Computers, 01 Dec 2011
The paper emphasizes methods, architectures, and components for system-on-chip design. It describes the basic knowledge and skills for designing high-performance low-power embedded devices whose complexity increases exponentially, as does the effort of designing them. These complexities can be mastered by relying on an appropriate design methodology that concentrates on reuse, executable specifications, and early error detection. The paper bundles these topics in order to provide a good understanding of all the problems involved. It shows how to go from description and verification to implementation and testing, presenting three systems-on-chip for three different wireless applications based on configurable processors and custom hardware accelerators.
Article (no access)
FPGA-Based Hardware Accelerator for an Embedded Factor Graph with Configurable Optimization
- Indar Sugiarto,
- Cristian Axenie, and
- Jörg Conradt
Journal of Circuits, Systems and Computers, 12 Nov 2018
A factor graph (FG) can be considered a unified model combining a Bayesian network (BN) and a Markov random field (MRF). The inference mechanism of an FG can be used to perform reasoning under incompleteness and uncertainty, which is a challenging task in many intelligent systems and robotics. Unfortunately, a complete inference mechanism requires intense computations that introduce a long delay before the reasoning process completes. Furthermore, an energy-constrained system such as a mobile robot requires a very efficient inference process. In this paper, we present an embedded FG inference engine that employs a neural-inspired discretization mechanism. The engine runs on a system-on-chip (SoC) and is accelerated by its FPGA. We optimized our design to balance the trade-off between speed and hardware resource utilization. Our fully optimized design accelerates the inference process to eight times the speed of normal execution, which is twice the speed-up gain achieved by a parallelized FG running on a PC. The experiments demonstrate that our design can be extended into an efficient reconfigurable computing machine.
Article (no access)
Feature Extraction Acceleration to Stabilize Execution Time for Real-Time Applications in Low-Cost Embedded Systems
- Taek Kyu Kim
Journal of Circuits, Systems and Computers, 19 Aug 2019
Extracted features are widely used in image processing. Much research has been undertaken to extract significant features from fast-moving images. Appropriate algorithm processing is necessary to extract features and provide them to other modules in real time on low-cost embedded systems. The features from accelerated segment test (FAST) algorithm is well known for feature extraction and is composed of simple arithmetic operators. In this study, FAST is implemented as a hardware accelerator in a field-programmable gate array for small embedded systems. The threshold value in FAST, however, affects the number of extracted features and the execution time. The resulting unstable execution time makes it difficult for the system to schedule the timing of system functions and thus degrades performance, so a method is needed to stabilize the execution time. A dynamic threshold controller in a FAST hardware accelerator is thus proposed to enable a stable execution time. A proportional integral controller composed of an adder, subtractor, and shifter is applied for low design implementation costs. The proposed approach occupies 2,263 slice flip-flops, 3,498 look-up tables, and 17 block RAMs in a Xilinx Virtex 5 FX field-programmable gate array. It requires 3.87 ms for continuous 800×480 images from the KITTI benchmark.
Article (no access)
Design of Low-Power SoC for Wearable Healthcare Device
- Ji Kwang Kim,
- Jung Hwan Oh,
- Gwan Beom Hwang,
- Oh Seong Gwon, and
- Seung Eun Lee
Journal of Circuits, Systems and Computers, 29 Aug 2019
In wearable devices, power consumption is a serious issue since the devices must remain powered on at all times. In healthcare systems, a variety of signal processing operations occupy a large portion of the overall workload because they are periodic and computationally heavy. In this paper, we propose a low-power System on Chip (SoC) architecture for wearable healthcare devices. To reduce the power consumption of the processor, we design a hardware accelerator that handles signal processing and provides computation offloading. Furthermore, to minimize the area and maximize the performance of the accelerator, we optimize the operation bit-width by analyzing the frequency response. The low-power healthcare SoC was fabricated in a 0.11 μm CMOS process. Finally, we measured the power consumption of our chip and verified the applicability of the digital filter accelerator, which reduces the energy consumption of the embedded processor.
Article (no access)
High Performance Kernel Architecture for Convolutional Neural Network Acceleration
Journal of Circuits, Systems and Computers, 25 May 2021
Convolutional neural networks (CNNs) have emerged as a prominent choice in artificial intelligence tasks. Recent advancements in CNN designs have greatly improved the performance and energy efficiency of several computation-intensive applications. However, in real-time applications, greater CNN accuracy is attained at the expense of very high computational cost and complexity. Further, the implementation of real-time CNNs on embedded platforms is highly challenging due to resource and power constraints. This paper addresses this computational complexity and presents an accelerator architecture accompanied by a novel kernel design to improve overall CNN performance. The proposed kernel design introduces a computing mechanism that reduces the data movement cost in terms of computational cycle count (latency) by parallelizing the convolution processing elements. This architecture takes advantage of the overlap of spatially adjacent data. The performance of the proposed architecture is also analyzed for multiple hyper-parameter configurations. The proposed accelerator reduces execution time by an average of 16× compared with the conventional computing architecture. To analyze the proposed architecture’s performance, we validate the architecture with AlexNet and VGG-16 CNN models. The proposed accelerator architecture achieves an average of 1.7× throughput improvement over state-of-the-art accelerators.
Article (no access)
FPQRNA: HARDWARE-ACCELERATED QRNA PACKAGE FOR NONCODING RNA GENE DETECTING ON FPGA
- FEI XIA,
- YONG DOU, and
- GUO-QING LEI
Journal of Bioinformatics and Computational Biology, 01 Aug 2010
Noncoding RNAs (ncRNAs) have important functional roles in biological processes and have become a central research interest in modern molecular biology. However, the question of how to find ncRNAs attracts much more attention, since ncRNA gene sequences, unlike protein-coding genes, do not have strong statistical signals. QRNA is a powerful program and is currently widely used as an efficient analysis tool to detect ncRNA genes. Unfortunately, the O(L³) computing requirements and complicated data dependency greatly limit the usefulness of the QRNA package given the explosive growth of gene databases. In this paper, we present a fine-grained parallel QRNA prototype system, FPQRNA, for accelerating ncRNA gene detection on an FPGA chip. We propose a systolic-like array architecture with multiple PEs (Processing Elements). We partition the tasks by columns and assign tasks to PEs for load balance. We exploit data reuse schemes to reduce the need to load matrices from external memory. The experimental results show a speedup factor of more than 18× over the QRNA-2.0.3c software running on a PC platform with an AMD Phenom 9650 Quad CPU for pairwise sequence alignment with 996 residues, while the power consumption of our FPGA accelerator is only about 30% of that of the general-purpose microprocessor.