The rapid advancement of semiconductor technology has significantly influenced the development and testing of System-on-a-Chip (SoC) solutions, particularly in the domain of Internet of Things (IoT) automation. This paper reviews the current state of key SoC testing technologies globally, addressing the challenges of integration and the strategies employed to reduce test costs and improve chip reliability. Key technical approaches are examined in three areas: system-level verification, high-speed signal testing, and test optimization. The paper also explores the evolving landscape of SoC design, highlighting unresolved issues of testability and design that affect IoT advancements. As devices and machines become increasingly efficient and compact, SoC platforms must meet specific IoT application requirements. The integration of multiple cores on a single SoC platform shows promise for IoT applications, though it poses challenges in inter-core communication efficiency. Heterogeneous Network-on-Chip Simulators address these challenges by enhancing the interconnectivity within multiprocessor systems. This paper highlights critical aspects of IoT systems, such as power consumption, throughput, and latency, and provides insights into the deployment of SoC solutions in IoT, sensor advancements, and methodological innovations. Future directions in SoC testing and design are discussed, aiming to further the capabilities and applications of SoC in IoT automation. The development and testing of SoC technology represent a paradigm shift from traditional computing systems to more efficient and connected IoT devices. By addressing advanced testing technologies and integration challenges, IoT applications are poised for significant innovation, paving the way toward a smarter, more connected future.
Power spectral density is a crucial tool in the field of signal processing, particularly in biomedical signal processing. It is also one of the most widely used tools in real-time applications. Therefore, further research on the hardware implementation of power spectral density should be prioritized. In this paper, two techniques are introduced for the implementation of power spectral density, focusing mainly on the Fourier transform block. The approaches introduced are the Coordinate Rotation Digital Computer (CORDIC) algorithm-based fast Fourier transform and the CORDIC algorithm-based sliding discrete Fourier transform. The other blocks in the modified Welch architecture are also enhanced using pipelining and approximate distributed arithmetic methods. The introduction of these techniques leads to an improvement in both power and area. There is almost a 36% decrease in the number of lookup tables compared to the existing methodology. With regard to power, there are decreases of almost 44% and 16% in the two architectures, respectively. The architectures were synthesized using Xilinx Vivado 19.1, and the design language was Verilog Hardware Description Language.
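As a point of reference for the architecture described above, the following is a minimal software sketch of Welch's PSD estimate, the computation the hardware accelerates; the function name, segment length, and window choice are illustrative assumptions, and the FFT call stands in for the CORDIC-based FFT/sliding-DFT blocks.

```python
import numpy as np

def welch_psd(x, seg_len=256, overlap=128, fs=1.0):
    """Minimal Welch PSD estimate: segment, window, FFT, average periodograms."""
    window = np.hanning(seg_len)
    scale = fs * np.sum(window ** 2)
    step = seg_len - overlap
    periodograms = []
    for start in range(0, len(x) - seg_len + 1, step):
        seg = x[start:start + seg_len] * window
        spectrum = np.fft.rfft(seg)   # the hardware replaces this with a CORDIC-based FFT/SDFT
        periodograms.append((np.abs(spectrum) ** 2) / scale)
    return np.mean(periodograms, axis=0)

# Example: PSD of a noisy 50 Hz tone sampled at 1 kHz
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)
psd = welch_psd(x, seg_len=256, overlap=128, fs=fs)
```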
The development of FPGAs that can be programmed to implement custom circuits by modifying memory has inspired researchers to investigate how FPGAs can be used as a computational resource in systems designed for high-performance applications. When such FPGA-based systems are composed of arrays of chips, or of chips that can be partially reconfigured, the programmable array space can be partitioned among several concurrently executing tasks. If partition sizes are adapted to the needs of tasks, then array resources become fragmented as tasks with varying requirements are processed. Tasks may end up waiting despite there being sufficient, albeit fragmented, resources available. We examine the problem of repartitioning the system (rearranging a subset of the executing tasks) at run-time in order to allow waiting tasks to enter the system sooner. In this paper, we introduce the problems of identifying and scheduling feasible task rearrangements when tasks are moved by reloading. It is shown that both problems are NP-complete. We develop two very different heuristic approaches to finding and scheduling suitable rearrangements. The first method, known as Local Repacking, attempts to minimize the size of the subarray needing rearrangement. Candidate subarrays are repacked using known bin-packing algorithms. Task movements are scheduled so as to minimize delays to their execution. The second approach, called Ordered Compaction, constrains the movements of tasks in order to efficiently identify and schedule feasible rearrangements. The heuristics are compared by time complexity and by resulting system performance on simulated task sets. The results indicate that considerable scheduling advantages can be gained for acceptable computational effort. However, the benefits may be jeopardized by delays to moving tasks when the average cost of reloading tasks becomes significant relative to task service periods. We indicate directions for future research to mitigate the cost of moving executing tasks.
Connected component labeling is a process that assigns unique labels to the connected components of a binary image. The main contribution of this paper is to present a low-latency hardware connected component labeling algorithm for k-concave binary images, designed and implemented on an FPGA. Pixels of a binary image are given to the FPGA in raster order, and the resulting labels are output in the same order. The advantages of our labeling algorithm are its low latency and its small use of internal FPGA storage. We have implemented our hardware labeling algorithm on an Altera Stratix family FPGA and evaluated its performance. The implementation results show that for a 10-concave binary image of 2048 × 2048 pixels, our connected component labeling algorithm runs in approximately 70 ms with a latency of approximately 750 µs.
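For readers unfamiliar with the task, a standard two-pass, raster-order labeling routine is sketched below; it is a generic software reference using union-find equivalence resolution, not the paper's low-latency k-concave hardware algorithm.

```python
import numpy as np

def label_components(img):
    """Two-pass 4-connected labeling of a binary image, processed in raster order.
    Standard software reference, not the k-concave hardware algorithm of the paper."""
    h, w = img.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = [0]                      # union-find; index 0 is a dummy for background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    next_label = 1
    for y in range(h):                # first pass: provisional labels, record equivalences
        for x in range(w):
            if img[y, x] == 0:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            neighbors = [l for l in (up, left) if l > 0]
            if not neighbors:
                parent.append(next_label)
                labels[y, x] = next_label
                next_label += 1
            else:
                m = min(neighbors)
                labels[y, x] = m
                for l in neighbors:   # merge equivalent provisional labels
                    ra, rb = find(l), find(m)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)
    for y in range(h):                # second pass: resolve equivalences
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```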
A hardware architecture of a Probabilistic Logic Neuron (PLN) is presented. The suggested model facilitates the on-chip learning of pyramidal Weightless Neural Networks using a modified probabilistic search reward/penalty training algorithm. The penalization strategy of the training algorithm depends on a predefined parameter called the probabilistic search interval. A complete Weightless Neural Network (WNN) learning system is modeled and implemented on Xilinx XC4005E Field Programmable Gate Array (FPGA), allowing its architecture to be configurable. Various experiments have been conducted to examine the feasibility and performance of the WNN learning system. Results show that the system has a fast convergence rate and good generalization ability.
This paper presents an approach that permits the effective hardware realization of a novel Evolvable Spiking Neural Network (ESNN) paradigm on Field Programmable Gate Arrays (FPGAs). The ESNN possesses a hybrid learning algorithm that consists of a Spike Timing Dependent Plasticity (STDP) mechanism fused with a Genetic Algorithm (GA). The design and implementation direction utilizes the latest advancements in FPGA technology to provide a partitioned hardware/software co-design solution. The approach achieves the maximum FPGA flexibility obtainable for the ESNN paradigm. The algorithm was applied as an embedded intelligent system robotic controller to solve an autonomous navigation and obstacle avoidance problem.
Spiking neural networks (SNN) are able to emulate real neural behavior with high confidence due to their bio-inspired nature. Many designs have been proposed for the implementation of SNN in hardware, although the realization of high-density, biologically inspired SNN is currently a complex challenge of high scientific and technical interest. In this work, we propose a compact digital design for the implementation of high-volume SNN that considers the intrinsic stochastic processes present in biological neurons and enables high-density hardware implementation. The proposed stochastic SNN model (SSNN) is compared with previous SSNN models, achieving a higher processing speed. We also show how the proposed model can be scaled to high-volume neural networks trained using backpropagation and applied to a pattern classification task. The proposed model achieves better results than other recently published SNN models configured with unsupervised STDP learning.
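As an illustration of the kind of stochastic neuron dynamics such designs target, here is a minimal sketch of a leaky integrate-and-fire neuron with a Bernoulli firing rule; the parameters and the sigmoidal spike probability are illustrative assumptions, not the SSNN model of the paper.

```python
import numpy as np

def stochastic_lif_step(v, i_in, rng, leak=0.95, v_th=1.0, beta=5.0):
    """One update of a leaky integrate-and-fire neuron with stochastic firing.
    The spike probability grows with the membrane potential (sigmoid around v_th)."""
    v = leak * v + i_in                           # leaky integration of the input current
    p_spike = 1.0 / (1.0 + np.exp(-beta * (v - v_th)))
    spike = rng.random() < p_spike                # Bernoulli draw models intrinsic noise
    if spike:
        v = 0.0                                   # reset the membrane potential after a spike
    return v, spike

rng = np.random.default_rng(0)
v, spikes = 0.0, []
for _ in range(100):
    v, s = stochastic_lif_step(v, i_in=0.1, rng=rng)
    spikes.append(s)
```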
Field Programmable Gate Arrays (FPGA), as one of the most popular circuit implementation platforms, provide a flexible and powerful way to realize different applications. IC designs are configured onto FPGAs through bitstream files. However, the configuration process can be exploited by side-channel attacks (SCA) to acquire critical design information, even under the protection of encryption. Reports have shown many successful attacks against FPGA cryptographic systems during the bitstream loading process that acquire the entire design. Current countermeasures, mostly random masking methods, are effective but also introduce large hardware complexity. They are not suitable for resource-constrained scenarios such as Internet of Things (IoT) applications. In this paper, we propose a new secure FPGA masking scheme to counter SCA. By utilizing the FPGA partial reconfiguration feature, the proposed technique provides a lightweight and flexible solution for FPGA decryption masking.
Continuous enhancement of the performance of energy harvesters in recent years has broadened their arenas of application. Meanwhile, the ample availability of IoT devices has made radio frequency (RF) a viable source for energy harvesting. Integration of a maximum power point tracking (MPPT) controller in an RF energy harvester is a necessity that ensures maximum available power transfer under variable input power conditions. In this paper, an FPGA implementation of a machine learning (ML) model for maximum power point tracking in RF energy harvesters is presented. A supervised ML model, a feedforward neural network (FNN), has been designed that is capable of tracking the maximum power point with high accuracy. The model was trained using the stochastic gradient descent (SGD) optimizer and the mean squared error (MSE) loss function. Simulation results of the VHDL-translated model demonstrated good agreement between the expected and obtained values. The proposed ML-based MPPT controller was implemented on an Artix-7 Field Programmable Gate Array (FPGA).
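The following is a minimal software sketch of how a small feedforward network could be trained with plain SGD and an MSE loss before translation to hardware; the network size, the two input features, and the placeholder training data are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: normalized input power conditions -> optimal operating point.
X = rng.random((512, 2))                 # e.g. rectified voltage and input RF power (assumed features)
Y = 0.6 * X[:, :1] + 0.4 * X[:, 1:]      # placeholder target standing in for measured MPP data

W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.05

for epoch in range(200):                 # plain SGD over single samples, MSE loss
    for i in rng.permutation(len(X)):
        x, y = X[i:i + 1], Y[i:i + 1]
        h = np.tanh(x @ W1 + b1)         # forward pass
        y_hat = h @ W2 + b2
        e = y_hat - y                    # gradient of the squared error w.r.t. y_hat (up to a constant)
        dW2 = h.T @ e; db2 = e.sum(0)
        dh = e @ W2.T * (1 - h ** 2)
        dW1 = x.T @ dh; db1 = dh.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1   # SGD updates
        W2 -= lr * dW2; b2 -= lr * db2
```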
This paper presents the FPGA implementation of the prototype for the Data-Driven Chip-Multiprocessor (D2-CMP). In particular, we study the implementation of a Thread Synchronization Unit (TSU) on FPGA, a hardware unit that enables thread execution using dataflow-like scheduling policy on a chip multiprocessor. Threads are scheduled for execution based on data availability, i.e., a thread is scheduled for execution only if its input data is available. This model of execution is called the non-blocking Data-Driven Multithreading (DDM) model of execution. The DDM model has been evaluated using an execution driven simulator. To validate the simulation results, a 2-node DDM chip multiprocessor has been implemented on a Xilinx Virtex-II Pro FPGA with two PowerPC processors hardwired on the FPGA. Measurements on the hardware prototype show that the TSU can be implemented with a moderate hardware budget. The 2-node multiprocessor has been implemented with less than half of the reconfigurable hardware available on the Xilinx Virtex-II Pro FPGA (45% slices), which corresponds to an ASIC equivalent gate count of 1.9 million gates. Measurements on the prototype showed that the delays incurred by the operation of the TSU can be tolerated.
Over the past decades, computer scientists have drawn inspiration from the study of biological organisms, and bio-inspired algorithms have been produced that can often give excellent solutions at low computational cost for complex engineering problems. In our case, the plasmodium of Physarum polycephalum is capable of finding the shortest path between two points in a labyrinth. In this study, we implement in hardware a Cellular Automata (CA) model that attempts to describe and, moreover, mimic the behavior of the plasmodium in a maze. Beyond the successful implementation of the CA-based Physarum model in software, in order to take full advantage of the inherent parallelism of CA, we focus on a Field Programmable Gate Array (FPGA) implementation of the proposed model. Namely, two different implementations are considered here; they differ in the precision of the numerical representation of the CA model parameters. Based on the corresponding results of the shortest path in the labyrinth, the modeling efficiency of both approaches was compared in terms of the resulting error propagation. The presented FPGA implementations succeed in taking advantage of the CA's inherent parallelism and improve the performance of the CA algorithm compared with software in terms of computational speed and power consumption. As a result, the implementations presented here can also be considered a preliminary CA-based Physarum polycephalum IP core that produces a biologically inspired solution to the shortest-path problem.
Field programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in the field of high-performance computing (HPC). However, they use a completely different programming paradigm and tool set compared to central processing units (CPUs) or even graphics processing units (GPUs), adding extra development steps and requiring special knowledge, which hinders widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating numerical solver implementations based on the multigrid method for FPGAs from the same code base that is also used to generate code for CPUs using a hybrid MPI and OpenMP parallelization. Our approach yields a hardware design that can compute up to 11 V-cycles per second for an input grid size of 4096×4096, with the coarsest grid solved using the conjugate gradient (CG) method, on a mid-range FPGA, beating vectorized, multi-threaded execution on an Intel Xeon processor.
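For orientation, a compact sketch of a multigrid V-cycle is given below for the 1D Poisson problem with weighted-Jacobi smoothing; it illustrates only the recursive structure, whereas the generated design targets 2D grids and uses CG on the coarsest level. Function names and parameters are illustrative.

```python
import numpy as np

def v_cycle(u, f, h, nu=2):
    """One V-cycle for -u'' = f on [0,1] with zero boundary values.
    Weighted-Jacobi smoothing, injection restriction, linear interpolation."""
    def smooth(u, f, h, steps):
        for _ in range(steps):
            u[1:-1] = (2/3) * 0.5 * (u[:-2] + u[2:] + h*h*f[1:-1]) + (1/3) * u[1:-1]
        return u

    if len(u) <= 3:                      # coarsest grid: solve (the generated design uses CG here)
        return smooth(u, f, h, 50)
    u = smooth(u, f, h, nu)              # pre-smoothing
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / (h*h)   # residual
    rc = r[::2].copy()                   # restriction by injection (full weighting in practice)
    ec = v_cycle(np.zeros_like(rc), rc, 2*h, nu)               # recursive coarse-grid correction
    e = np.zeros_like(u)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])   # linear interpolation back to the fine grid
    u += e
    return smooth(u, f, h, nu)           # post-smoothing

# Example: ten V-cycles on a 257-point grid; the solution approaches sin(pi*x)/pi**2.
n = 257
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
```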
Filtered-x Least Mean Squares (FxLMS) is an algorithm commonly used in Active Noise Control (ANC) systems in order to cancel undesired acoustic waves from a sound source. Only a small number of hardware designs have been reported in the literature, which in turn use only one reference signal, one error signal, and one output control signal. In this paper, a 3-dimensional hardware-based version of the widely used FxLMS algorithm is proposed, using one reference microphone, 18 error microphones, one output, and a 400th-order FIR filter. The FxLMS algorithm was implemented on a Xilinx Artix 7 FPGA running at 25 MHz, which allows the filter coefficients to be updated in 32.44 µs. The main idea behind this work is to propose a pipelined, parallelized architecture that achieves faster-than-real-time processing for the filter coefficient update. The main contribution of this work is not the ANC technique itself, but rather the proposed hardware implementation, which uses integer arithmetic and yields an acceptable error when benchmarked against a software implementation. This parallel system allows a scalable implementation, an advantage of using an FPGA, without compromising the computational cost and, consequently, the latency.
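A floating-point sketch of a multi-error FxLMS coefficient update is shown below to illustrate the computation being parallelized; the array shapes, step size, and sign convention are illustrative assumptions, and the hardware design described above uses integer arithmetic instead.

```python
import numpy as np

def fxlms_update(w, x_hist, e, s_hat, mu=1e-4):
    """One FxLMS coefficient update for a 1-reference, M-error, 1-output ANC system.
    w      : adaptive FIR coefficients, length L
    x_hist : most recent reference samples, newest first, length L + K - 1
    e      : current error-microphone samples, shape (M,)
    s_hat  : estimated secondary-path impulse responses, shape (M, K)"""
    L = len(w)
    M, K = s_hat.shape
    # Filter the reference signal through each estimated secondary path ("filtered-x").
    x_filt = np.empty((M, L))
    for m in range(M):
        for i in range(L):
            x_filt[m, i] = np.dot(s_hat[m], x_hist[i:i + K])
    # LMS-style gradient summed over all error sensors.
    grad = x_filt.T @ e
    return w - mu * grad
```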
This paper proposes a real-time super-resolution (SR) system. The proposed system performs a fast SR algorithm that generates a high-resolution image from a low-resolution image using direct regression functions with an up-scaling factor of 2. The algorithm consists of two stages: feature learning and SR image prediction. The feature learning stage is performed offline, in which several regression functions are trained. The SR image prediction stage is implemented on the proposed system to generate high-resolution image patches. The system, implemented on a Xilinx Virtex 7 field-programmable gate array, achieves an output resolution of 3840×2160 (UHD) at 85 fps and a throughput of 700 Mpixels/s. Structural similarity (SSIM) is measured to assess image quality. Experimental results show that the proposed system provides high image quality for real-time applications, and that it scales well with resolution.
There has been a growing interest from academia and industry in developing circuits and systems for edge computing and quality control tasks in food production lines, where image processing is frequently required. This paper outlines the considerations required for designing a fruit classification system based on image processing using Cellular Automata (CA) models and integrating it into reconfigurable hardware (HW) such as Field Programmable Gate Arrays (FPGAs). Parallel processing in CA requires numerous processing elements to be implemented, and mapping CA models to HW generally comes with limitations. Homogeneous CA arrays are easier to design and implement in HW but can be resource-demanding. To fill this gap, this study explores different alternatives for the HW implementation of CA models, particularly trading computational parallelism for a more optimized use of the available HW resources. We conducted experimental tests of the designed HW system using the Digilent Nexys development board, and its operation was validated against software-based benchmarks for image processing, particularly edge detection. The presented study provides a broader range of design solutions for the HW implementation of two-dimensional CA models and a better understanding of their advantages and disadvantages. The results show that solutions focusing on instruction parallelism add some complexity to the design and require more design effort compared to homogeneous CA models composed of identical cells. However, the instruction-parallel design solutions can significantly improve HW resource utilization, especially when implementing computationally intensive CA rules in FPGAs.
We are developing a pipelined hardware logic circuit for the fast Fourier transform (FFT) on a field programmable gate array (FPGA) for data analysis of an interferometric gravitational-wave detector. The FFT processor is connected to a personal computer (PC) through a PCI bus and will increase the speed of the FFT calculation, which is the most time-consuming step in typical gravitational-wave analyses.
This study explores the implementation of the nonlinear autoregressive Volterra (NARV) model on a field programmable gate array (FPGA)-based hardware simulation platform and accomplishes the identification of the Hodgkin–Huxley (HH) model. First, a physiologically detailed single-compartment HH model is applied to generate experimental data sets, with the electrical behavior of the neuron described by its membrane potential. Then, based on the injected input current and the output membrane potential, a second-order NARV model is constructed and implemented on the FPGA-based simulation platform. The NARV modeling method is data-driven, requiring no accurate physiological information, and the FPGA-based hardware simulation provides a real-time, high-performance platform that avoids the drawbacks of software simulation. Therefore, the proposed method is capable of handling the nonlinearities and uncertainties in nonlinear neural systems and may help promote the development of clinical treatment devices.
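For reference, a second-order Volterra expansion of the output $y(n)$ in terms of the input $x(n)$ has the general form shown below; the kernel symbols and memory length $M$ are generic, and the NARV model additionally includes autoregressive terms in past outputs.

$$
y(n) = k_0 + \sum_{m=0}^{M-1} k_1(m)\, x(n-m) + \sum_{m_1=0}^{M-1} \sum_{m_2=0}^{M-1} k_2(m_1, m_2)\, x(n-m_1)\, x(n-m_2)
$$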
A four-variable dynamical system incorporating a memristor is proposed to investigate the dependence of multi-scroll attractors on the initial setting of one variable with memory, and a description of the physical background is supplied. It is found that appropriate setting of the initial value of the memory variable can induce different numbers of attractors; as a result, resetting the initial values can change the profile of the attractors, which also depends on the calculation period. Time-delayed feedback is used to stabilize the dynamical system so that the effect of the initial-value dependence is suppressed, and the multi-scroll attractors are controlled by applying an appropriate time delay and feedback gain in the controller. Furthermore, the system is verified on an FPGA circuit platform, and the memristor is used to describe the memory effect of the variable related to magnetic flux. It is confirmed experimentally that the multi-scroll attractors can be stabilized and the dependence on the initial setting is suppressed.
This paper presents a novel hardware classification framework based on a neural network (NN). It utilizes the COordinate Rotation DIgital Computer (CORDIC) algorithm to implement the activation function of the NN. Training was performed in software using an error back-propagation algorithm (EBPA) implemented in C++, and the final weights were then loaded into the implemented hardware framework to perform classification. The hardware framework was developed in the Xilinx 9.2i environment using VHDL as the programming language. Classification tests are performed on benchmark datasets obtained from the UCI machine learning repository. The results are compared with competitive classification approaches on the same datasets. Extensive analysis reveals that the proposed hardware framework provides more efficient results than existing classifiers.
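To make the CORDIC-based activation idea concrete, the following is a minimal sketch of hyperbolic CORDIC in rotation mode computing tanh, a common NN activation; the iteration count and floating-point arithmetic are illustrative, a hardware realization would use fixed-point shift-and-add operations, and this is not necessarily the exact activation or scheme of the paper.

```python
import math

def cordic_tanh(z, n_iter=16):
    """Hyperbolic CORDIC in rotation mode, computing tanh(z) = sinh(z)/cosh(z).
    Converges for roughly |z| < 1.1; iterations 4 and 13 are repeated, as the
    hyperbolic variant requires."""
    # Standard iteration schedule with repeated indices 4, 13, 40, ...
    schedule, i, repeat_at = [], 1, 4
    while len(schedule) < n_iter:
        schedule.append(i)
        if i == repeat_at:
            schedule.append(i)
            repeat_at = 3 * repeat_at + 1
        i += 1
    x, y, residual = 1.0, 0.0, z
    for i in schedule:
        d = 1.0 if residual >= 0 else -1.0
        x, y = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i   # shift-and-add micro-rotation
        residual -= d * math.atanh(2.0 ** -i)
    # x and y equal cosh(z) and sinh(z) up to the same constant CORDIC scale
    # factor, which cancels in the ratio.
    return y / x

print(cordic_tanh(0.5), math.tanh(0.5))   # the two values should agree closely
```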
The goal of this work was to develop an improved defect detection scheme for high-speed, real-time web surface inspection. This goal was realized by splitting the task into two independent parts: feature extraction and segmentation. Both parts use efficient algorithms implemented in hardware that is suitable and fast enough to be included in a working web inspection system. The proposed scheme is based on a set of derived texture features and a new self-organizing map variant, the statistical self-organizing map. These techniques offer several improvements over the gray-level thresholding techniques traditionally used in commercial web inspection systems.
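For context, a minimal sketch of classical self-organizing map training is given below; it illustrates the best-matching-unit update that a segmentation stage could apply to texture feature vectors, and it does not implement the statistical SOM variant introduced in the paper. Grid size, learning rate, and decay schedule are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=10, lr0=0.5, sigma0=3.0, seed=0):
    """Classical SOM training (not the paper's statistical variant): each feature
    vector pulls its best-matching unit and that unit's grid neighbours toward it."""
    rng = np.random.default_rng(seed)
    h, w = grid
    dim = data.shape[1]
    weights = rng.random((h, w, dim))
    ys, xs = np.mgrid[0:h, 0:w]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for v in data[rng.permutation(len(data))]:
            # Best-matching unit: node whose weight vector is closest to the input.
            d = np.linalg.norm(weights - v, axis=2)
            by, bx = np.unravel_index(np.argmin(d), d.shape)
            # Learning rate and neighbourhood radius decay over time.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            neigh = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
            weights += lr * neigh[..., None] * (v - weights)
            step += 1
    return weights
```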