  • Article (Open Access)

    UniBoard2, A Generic Scalable High-Performance Computing Platform for Radio Astronomy

    With the ever-increasing data rates in radio astronomy, a universal Field Programmable Gate Array (FPGA)-based hardware platform which can be used at different locations in the signal processing chain, like a beamformer, data router or correlator, would reduce development time significantly. In this paper, we present the design of such a platform, the UniBoard2. With UniBoard2, both large rack-based and single-board systems can be made. Standard Quad Small Form-factor Pluggable (QSFP) input and output (IO) interfaces on the front side make it easy to interface UniBoard2 to standard 40 Gigabit Ethernet (GbE) network equipment. Hardware design challenges, like transceiver links, power supplies, power dissipation and cooling are described. The paper concludes with some examples of systems (like beamformers and correlators) that can be built using the UniBoard2 hardware platform.

  • Article (Open Access)

    Novel Binary Signed-Digit Addition Algorithm for FPGA Implementation

    Signed-digit (SD) number representation systems have been studied for high-speed arithmetic. One important property of the SD number system is that addition can be performed without a long carry chain. However, a large number of logic elements is required when the number representation system and such an adder are realized in a logic circuit. In this study, we propose a new adder for the binary SD number system. The proposed adder uses more circuit area than conventional SD adders when the adders are realized on an ASIC. However, it uses 20% fewer logic elements than the conventional SD adder when the adders are realized on a field-programmable gate array (FPGA) built from 4-input, 1-output look-up tables (LUTs), such as the Intel Cyclone IV FPGA.

  • Article (Open Access)

    The SKARAB Board in the Framework of Single-Dish Radio Astronomy

    The Square Kilometer Array Reconfigurable Application Board (SKARAB) is a Xilinx Virtex-7 FPGA-based platform designed for the MeerKAT array, to be used for both F- and X/B-engines. The MeerKAT F-engine receives digitized data, thus no samplers are required on the SKARAB-based platform. Among the alternative options available, a high-performing ADC mezzanine board can be used in conjunction with the SKARAB, which allows four 1.4-GHz-wide input channels to be digitized and processed. In this paper, we outline how the SKARAB has been successfully used for single-dish applications including imaging, spectroscopy, and spectro-polarimetry.

  • Article (Open Access)

    Implementation of a Correlator onto a Hardware Beam-Former to Calculate Beam-Weights

    One of the main technologies for opening up a wider field of view for today’s radio telescopes is the phased array. This is especially the case for radio astronomy instruments operating below 2 GHz. Existing dish-type instruments are now being upgraded with phased array feeds (PAF) in the focal plane. This increases the field of view at the expense of needing more analog electronics and digital signal processing. One of the digital signal processing functions used to combine the digitized signals from the PAF is a beam-former, which creates multiple high-sensitivity beams within the field of view of the dish. Before beams can be formed, the signals from the PAF need to be calibrated using a correlator. In this paper, we present a solution where these two operations are combined by using the beam-former also as a correlator. The statistics unit used as part of the beam-former implementation can also be used for calculating correlation products. With proper settings of the beam-former weights, each beamlet (a frequency sub-band with a direction) can be used as a single cross-correlation product. By implementing the correlator on the beam-former, the digital resources and development time can be reduced. To validate the idea, two versions of the algorithm are implemented in the Apertif PAF system on the Westerbork Synthesis Radio Telescope (WSRT). Results show that two full-bandwidth correlation matrices per beam, needed to determine the static beam weights for the calibration, and a single column of the correlation matrix, used to compensate for any drift between the receiver chains, can be computed.

  • Article (No Access)

    SARAS CD/EoR Radiometer: Design and Performance of the Digital Correlation Spectrometer

    In the currently accepted model for cosmic baryon evolution, Cosmic Dawn (CD) and the Epoch of Reionization (EoR) are significant times when first light from the first luminous objects emerged, transformed and subsequently ionized the primordial gas. The 21 cm (1420 MHz) hyperfine transition of neutral hydrogen, redshifted from these cosmic times to a frequency range of 40 MHz to 200 MHz, has been recognized as an important probe of the physics of CD/EoR. The global 21 cm signal is predicted to be a spectral distortion of a few tens to a few hundreds of mK, which is expected to be present in the cosmic radio background as a trace additive component. Shaped Antenna measurement of the background RAdio Spectrum (SARAS) is a spectral radiometer purpose-designed to detect the weak 21 cm signal from CD/EoR. An important subsystem of the radiometer, the digital correlation spectrometer, is developed around a high-speed digital signal processing platform called pSPEC. pSPEC is built around two quad 10-bit analog-to-digital converters (EV10AQ190) and a Virtex 6 (XC6VLX240T) field programmable gate array, with provision for multiple Gigabit Ethernet and 4.5 Gbps fiber-optic interfaces. Here, we describe the system design of the digital spectrometer, the pSPEC board, and the adaptation of pSPEC to implement a high spectral resolution (61 kHz), high dynamic range ($10^5$:1) correlation spectrometer covering the entire CD/EoR band. As the SARAS radiometer must be deployed in remote locations where terrestrial radio frequency interference (RFI) is minimal, the spectrometer is designed to be compact, portable and able to operate off internal batteries. The paper includes an evaluation of the spectrometer’s susceptibility to RFI and capability to detect signals from CD/EoR.

  • Article (Open Access)

    On the Appropriate Handling of Metastable Voltages in FPGAs

    The significant process, voltage and temperature (PVT) variations seen with modern technologies make strictly synchronous design inefficient. Asynchronous design with its flexible timing is a promising alternative, but prototyping is difficult on the available FPGA platforms which are clock centric and do not provide the required functional primitives like mutual exclusion or Muller C-elements. The solutions proposed in the literature so far work nicely in principle but cannot safely handle metastability issues that are inevitable even at some interfaces in asynchronous designs. In this paper, we propose reliable implementations of the fundamental function blocks required to safely convert potential intermediate voltage levels that result from metastability into late transitions that can be reliably handled in the asynchronous domain. These are high- and low-threshold buffers as well as a Schmitt-trigger. We give elaborate background analysis for the proposed circuits and also present the associated routing constraints to make the Schmitt-trigger circuit work properly in spite of the uncertain routing within FPGAs. Furthermore, we propose a procedure for an “in situ reliability assessment” of the specific Schmitt-trigger element under consideration, which also applies to metastability containment with high- or low-threshold buffers only. Our proof of concept is based on experimental results for both Xilinx and Altera FPGA platforms.

  • Article (No Access)

    Symbolic regression-based adaptive generation of implied volatility

    This research paper introduces a new form of Implied Volatility calculation with Symbolic Regression suited for high-frequency trading. The solutions are easily migratable to hardware accelerators like Field Programmable Gate Arrays. This machine learning approach is flexible and configurable for high precision, lower latency, or energy efficiency. The model evaluates each mathematical operator in terms of cycles, which then generates highly parallel yet low-depth formulas. In tests with C++, the formulas achieved higher accuracy than traditional Implied Volatility models and ran in less than a sixth of the time. The formulas were tested on the SPX dataset to validate accuracy.

  • Article (No Access)

    AM&FT: An Aging Mitigation and Fault Tolerance Framework for SRAM-Based FPGA in Space Applications

    Field programmable gate arrays (FPGAs) have become very widely used devices in space applications, and their runtime reconfigurable architecture allows area- and power-efficient acceleration of complex applications. However, FPGAs are increasingly susceptible to aging effects and failures due to harsh space environments and long operation cycles, which reduce the reliability and lifetime of such devices. Although offline aging-aware layout-based methods are effective in aging mitigation, existing studies ignore the fault tolerance needs of the task, and the layout strategy fails completely after a hard failure occurs. This paper presents a reliability framework, AM&FT, for SRAM-based FPGAs in space applications to support on-chip aging mitigation and fault tolerance. First, we use an Integer Linear Programming (ILP) model to solve mapping relationships between tasks and reconfigurable blocks (Rbs) in the offline phase to achieve an aging- and reliability-aware layout strategy. Second, the ILP model is incorporated into the Design Space Exploration (DSE) to generate a set of layout strategies to tolerate hard faults. Moreover, a state model is used for runtime fault management to handle the impact of different types of faults on the device. Experimental results demonstrate that our framework achieves FPGA on-chip aging mitigation and fault tolerance. Compared with existing methods, AM&FT can guarantee the fault tolerance requirements of tasks and prioritizes the Quality of Service (QoS) of critical tasks as hard faults accumulate. In addition, our framework delivers up to 6.5× higher mean time to failure (MTTF) than the baseline.

  • Article (No Access)

    Hardware Efficient Pseudo-Random Number Generator Using Chen Chaotic System on FPGA

    This paper introduces an FPGA implementation of a pseudo-random number generator (PRNG) using Chen’s chaotic system. It focuses mainly on the development of an efficient VLSI architecture for the PRNG in terms of bit rate, area resources, latency, maximum length sequence, and randomness. First, we analyze the dynamic behavior of the chaotic trajectories of Chen’s system and set the parameter values to maintain low hardware design complexity. A circuit realization of the proposed PRNG is presented using hardwired shifting, additions, subtractions, and multiplexing schemes. The benefit of this architecture is that all binary multiplications (except $X_iY_i$ and $X_iZ_i$) are performed using hardwired shifting. Moreover, the generated sequences pass all 15 NIST statistical tests, while the design generates pseudo-random numbers at a uniform clock rate with minimum hardware complexity. The proposed PRNG architecture is realized in Verilog HDL and prototyped on the Virtex-5 FPGA (XC5VLX50T) device, and its analysis has been done using the Matlab tool. Performance analysis confirms that the proposed Chen chaotic attractor-based PRNG scheme is simple, secure, and hardware efficient, with high potential to be adopted in cryptography applications.

  • Article (No Access)

    A Resource Efficient CNN Accelerator for Sensor Signal Processing Based on FPGA

    Convolutional Neural Networks (CNNs) applied in sensor signal processing systems usually face urgent requirements for system integration, high throughput, and hardware resource and energy efficiency. This paper introduces a resource-efficient accelerator with a general two-dimensional multiply-add array operator tailored to the characteristics of sensor signal processing, which can be applied to standard CNN, depth-wise CNN and Fully Connected (FC) layers of varied networks. A resource estimation model is also constructed to explore parallelism parameters for computing efficiency. Finally, a board-level verification is implemented to demonstrate the efficiency of the proposed accelerator on a common scenario (LeNet) and a complex scenario (MobileNetV1). Experimental results show that 332225 and 1498 Inferences Per Second (IPS), respectively, are achieved at a frequency of 100 MHz. The corresponding efficiencies are 88.84% and 61.09%, which outperform related work on CNN accelerator design for signal processing. The approach is also applicable and scalable to acceleration research in other fields.

  • Article (Open Access)

    Variable Correlation Digital Noise Source on FPGA — A Versatile Tool for Debugging Radio Telescope Backends

    Contemporary wideband radio telescope backends are generally developed on Field Programmable Gate Arrays (FPGA) or hybrid (FPGA+GPU) platforms. One of the challenges faced while developing such instruments is the functional verification of the signal processing backend at various stages of development. In the case of an interferometer or pulsar backend, the typical requirement is for one independent noise source per input, with provision for a common, correlated signal component across all the inputs, with a controllable level of correlation. This paper describes the design of an FPGA-based variable correlation Digital Noise Source (DNS), and its applications to built-in testing and debugging of correlators and beamformers. This DNS uses the Central Limit Theorem-based approach for generation of Gaussian noise, and the architecture is optimized for resource requirements and ease of integration with existing signal processing blocks on FPGA.
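
    The idea of a Central Limit Theorem-based noise source with a controllable common component can be illustrated with a small software model. This is only a sketch under stated assumptions, not the architecture from the paper: the function names, the choice of 12 uniform samples per Gaussian sample, and the square-root mixing gains are all illustrative choices made here.

```python
import math
import random


def clt_gaussian(rng, n_uniform=12):
    """Approximate a unit-variance Gaussian sample by summing uniform samples."""
    total = sum(rng.random() for _ in range(n_uniform))
    # A sum of n uniforms on [0, 1) has mean n/2 and variance n/12.
    return (total - n_uniform / 2) / math.sqrt(n_uniform / 12)


def correlated_noise(n_inputs, n_samples, rho, seed=0):
    """Return n_inputs noise streams whose pairwise correlation is about rho.

    Hypothetical model: each stream mixes one shared (common) source with its
    own independent source, weighted so that the total variance stays at 1.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)
    common_gain = math.sqrt(rho)
    private_gain = math.sqrt(1.0 - rho)
    streams = [[] for _ in range(n_inputs)]
    for _ in range(n_samples):
        common = clt_gaussian(rng)
        for stream in streams:
            stream.append(common_gain * common + private_gain * clt_gaussian(rng))
    return streams


def correlation(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

    Calling `correlated_noise(2, 20000, 0.5)` and passing the two streams to `correlation` should give a value close to 0.5, which is the kind of known input a correlator under test could be checked against.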

  • Article (No Access)

    FPGA REALIZATION OF COMPLEX LOGISTIC MAP FRACTAL BEHAVIOR

    Fractals, 26 Nov 2021

    This paper studies the capability of digital architecture to mimic fractal behavior. As digitally realized chaotic attractors have opened many research tracks, digital designs mimicking fractals may ultimately achieve the same. This study is based on a complex single-dimensional discrete chaotic system known as the generalized positive logistic map. The fractals realized from this system are linked to the results of the mathematical analysis to understand the fractal behavior under different variations. A digital hardware architecture manifesting the fractal behavior is achieved on FPGA, showing a fractal entity experimentally. With this digital realization, it is hoped that fractals can follow the example of the digital applications of chaotic attractors.

  • Article (No Access)

    An Efficient Fixed-Point Multiplier Based on CORDIC Algorithm

    Fixed-point multiplication is an important operation that is frequently used in many digital signal processing (DSP) applications. The operation is computationally intense, and very often the performance of the multiplier determines the overall performance of the DSP system. Accordingly, a wide range of approaches have been proposed for efficient implementation of fixed-point multipliers on different hardware platforms. In this paper, we use the Coordinate Rotation DIgital Computer (CORDIC) algorithm to perform the fixed-point multiplication operation. The motivation for our approach is that CORDIC is a hardware-efficient algorithm in which accuracy can be traded off for performance. Our implementation targets field programmable gate arrays (FPGAs) and focuses on exploiting the underlying general and specialized fabric to the fullest. Performance comparisons against various traditional and recent approaches show that a substantial improvement is achievable by using CORDIC-based multipliers. We have also implemented a recently proposed convolution architecture using CORDIC-based multipliers. The results show that a proper choice of CORDIC architecture can improve performance parameters such as resource utilization, throughput and dynamic power. This, however, comes at a small cost in accuracy. Our analysis of an 8-stage CORDIC multiplier reports a mean absolute percentage error (MAPE) of 6.032, a figure that decreases exponentially as the number of stages increases.
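
    How CORDIC can multiply with shifts and adds only, and how the stage count trades accuracy for work, can be sketched with the textbook linear-mode iteration. This is an assumption about the general technique, not the specific architecture evaluated in the paper; the function name `cordic_multiply` and the Q16 fixed-point format are illustrative choices.

```python
def cordic_multiply(x, z, stages=16, frac_bits=16):
    """Approximate x * z with linear-mode CORDIC (shifts and adds only).

    Values are held as integers with frac_bits fractional bits. Starting at
    shift 0, the iteration covers multipliers with |z| roughly below 2, and
    the residual error shrinks by about half with every extra stage.
    """
    if abs(z) >= 2.0:
        raise ValueError("this sketch expects |z| < 2")
    if not 1 <= stages <= frac_bits + 1:
        raise ValueError("stages must be between 1 and frac_bits + 1")
    one = 1 << frac_bits
    x_fx = round(x * one)  # multiplicand in fixed point
    z_fx = round(z * one)  # multiplier, driven towards zero
    y_fx = 0               # accumulates the product
    for i in range(stages):
        if z_fx >= 0:
            y_fx += x_fx >> i  # add x * 2^-i
            z_fx -= one >> i   # consume 2^-i of the multiplier
        else:
            y_fx -= x_fx >> i
            z_fx += one >> i
    return y_fx / one
```

    With the default 16 stages, `cordic_multiply(1.5, 0.75)` lands within about one part in a thousand of 1.125, while `stages=4` leaves an error of up to `1.5 * 2**-3`, which shows the accuracy-versus-stages trade-off in its simplest form.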

  • Article (Open Access)

    Hardware Design Strategies Oriented to Edge Computing and AgriFood Electronics for Image Processing using Cellular Automata

    There has been a growing interest from academia and industry in developing circuits and systems for edge computing and quality control tasks in food production lines, where image-processing is frequently required. This paper outlines the required considerations for designing a fruit classification system based on image-processing using Cellular Automata (CA) models and integrating it into reconfigurable hardware (HW) such as Field Programmable Gate Arrays (FPGAs). Parallel processing in CA requires numerous processing elements to be implemented and mapping CA models to HW generally comes with limitations. Homogeneous CA arrays are easier to design and implement in HW but can be resource-demanding. To fill this gap, this study explores different alternatives for the HW implementation of CA models, particularly trading computational-parallelism for a more optimized use of the available HW resources. We conducted experimental tests of the designed HW system using the Digilent Nexys development board, and the operation was validated against software-based benchmarks for image-processing, particularly concerning edge-detection. The presented study provides a broader range of design solutions for the HW implementation of two-dimensional CA models and a better understanding of their advantages and disadvantages. The results show that solutions focusing on instruction-parallelism add some complexity to the conception and require more design effort, compared to homogeneous CA models composed of identical cells. However, the instruction-parallel design solutions can significantly improve the HW resource utilization, especially when implementing computationally intensive CA rules in FPGAs.

  • Article (Open Access)

    NONLINEAR FM INDEX APPLICATION FOR ALIGNMENT OF SHORT DNA SEQUENCES USING RE-PARAMETRIZATION OF ALGORITHMS

    Fractals, 01 Jun 2018

    An innovative reconfiguration application is proposed to re-calculate the parameters of the Ferragina and Manzini exact search algorithm (or FM indexes), using a modular and efficient hardware implementation to accelerate alignment programs for short DNA sequence reads. Although these programs use multi-core execution strategies or multiple computers, they have become slow considering the very high speed at which the new massively parallel sequencing machines produce the reads to be aligned. Consequently, a search for different ways to accelerate the alignment is crucial. The proposed design runs with software functions in a hybrid system, and can align millions of reads to a reference as large as the human genome. Tests on the M505k325t card show that a single alignment core can accelerate the computation by a factor close to 10.64× relative to BWA. Owing to the low area and power consumption, multiple alignment cores can fill the Field Programmable Gate Array (FPGA), multiplying the computation speed. With a multiple-core implementation, the processing speed of the design outperforms applications that are accelerated by GPUs and competes with similar FPGA proposals whose cost is much higher.

  • Article (No Access)

    Bit-Vector-Based Hardware Accelerator for DNA Alignment Tools

    Next generation sequencing technologies have noticeably improved in the last decade. The time and cost of whole genome sequencing are important challenges; reducing them opens unprecedented opportunities in various research and development areas. The alignment or mapping of small reads produced by sequencing machines to reference genomes of billions of nucleotides is a fundamental task in this sequencing process. It is computationally highly demanding and has become the bottleneck of the DNA analysis process. This paper proposes FPGA-based hardware acceleration of the Myers bit-parallelized algorithm, appropriately modified to be used in the extend stage of DNA alignment tools. The proposed design can be employed in conjunction with software functions, constituting an extremely fast heterogeneous DNA alignment system. The implementation results show a speedup of up to 110× relative to a purely sequential software implementation. In addition, due to the limited use of FPGA resources and the modular design, multiple modules can be used to completely populate the chip, further increasing the computing speed.
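
    For readers unfamiliar with the Myers bit-parallel technique, the following software sketch computes a plain global edit distance by packing one column of the dynamic programming matrix into bit-vectors, so each text character costs a fixed number of word operations. It follows the commonly published global-distance formulation and is not the modified variant used in the paper's extend stage; the function name is illustrative.

```python
def myers_edit_distance(pattern, text):
    """Global edit distance via bit-vectors (Myers-style bit-parallelism).

    pv/mv hold the +1/-1 vertical differences of the current DP column;
    bit i corresponds to pattern position i.
    """
    m = len(pattern)
    if m == 0:
        return len(text)
    full = (1 << m) - 1  # mask keeping Python's unbounded ints to m bits
    top = 1 << (m - 1)   # bit of the last pattern position
    peq = {}             # per-character match masks
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv, score = full, 0, m
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & full  # horizontal +1 differences
        mh = pv & xh                   # horizontal -1 differences
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        # Shift in a 1: the top DP row grows by one per text character.
        ph = ((ph << 1) | 1) & full
        mh = (mh << 1) & full
        pv = (mh | ~(xv | ph)) & full
        mv = ph & xv
    return score
```

    The inner loop contains only shifts, additions and bitwise logic on fixed-width vectors, which is why this family of algorithms maps naturally onto FPGA fabric.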

  • Article (No Access)

    Caching Hybrid Rotation: A Memory Access Optimization Method for CNN on FPGA

    Custom computing architectures on field programmable gate array (FPGA) platforms are a viable solution to further accelerate convolutional neural network (CNN) inference. However, due to the large size of the feature map data, optimizing the storage and computation of CNN feature maps on FPGA remains a challenge. To overcome this challenge, an FPGA-oriented memory access optimization method for CNN is proposed. First, a feature map partition strategy is used to group the feature map efficiently. Second, input and output caching rotation methods are employed in an adaptive memory access mode. Third, a caching hybrid rotation method is proposed to optimize memory access performance, effectively reducing the access time of the CNN feature map. Experimental results based on SkyNet and VGG16 show that the inference speed of the proposed model is 7.1 times that of the previous conventional memory access optimization for CNN on FPGA. In terms of computational energy efficiency, our method achieves a 6.4 times improvement over current typical accelerators.

  • Article (No Access)

    Improving Sub-Pixel Estimation of Laser Stripe Reflection Center by Autoconvolution on FPGA

    Modern laser scanners perform high-speed real-time image processing algorithms while operating in harsh industrial environments. Their performance goal is to extract the central position of the laser line reflection with Gaussian distribution. Traditional algorithms for sub-pixel estimation, such as the Center of Gravity (CG) or Parabolic Fit (PF), perform poorly under low SNR or when the pixels are saturated. Data pre-processing usually plays a key role in suppressing the effects of various noise sources and a dynamic environment, especially when the images are overexposed and the top of the Gaussian pulse is flattened. This study presents, in both simulation and experiment, a method that improves the accuracy of estimating the laser stripe reflection center by using autoconvolution to extend the bit-width of pixel intensity. Autoconvolution of the image line is an efficient real-time pre-processing filtering method for improving the accuracy of CG calculation. The proposed algorithm is implemented on Field-Programmable Gate Arrays (FPGAs) and experimentally validated in a real operational environment. It is shown that this method can reduce the error of CG laser reflection center estimation by more than one pixel when the image is highly affected by external noise sources and ambient light.

  • Article (No Access)

    An Efficient FPGA-Based Welch Power Spectral Density for Real-Time Applications

    Power spectral density is a crucial tool in the field of signal processing, mainly in biomedical signal processing. Power spectral density is also one of the most widely used tools in real-time applications. Therefore, further research should be prioritized in the hardware implementation of power spectral density. In this paper, two techniques are introduced in the implementation of power spectral density, mainly focusing on the Fourier transform block. The approaches introduced are the adoption of the Coordinate Rotation Digital Computer algorithm-based fast Fourier transform and the Coordinate Rotation Digital Computer algorithm-based sliding discrete Fourier transform. The other blocks in the modified Welch architecture are also enhanced using pipelining and approximate distributed arithmetic methods. The introduction of all these techniques has led to an improvement in power and area. There is almost a 36% decrease in the number of lookup tables compared to the existing methodology. With regard to power, there are decreases of almost 44% and 16% in the two architectures, respectively. The overall architectures were synthesized using Xilinx Vivado 19.1, and the language used was Verilog Hardware Description Language.
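
    As background for the Welch method, the estimate can be written as a short reference model: split the signal into overlapping segments, window each one, take its squared-magnitude spectrum, and average. The sketch below is an assumption-laden software model, not the paper's architecture: it uses a direct DFT instead of the CORDIC-based FFT or sliding DFT, a periodic Hann window, 50% overlap by default, and leaves the output in relative (unscaled) units. The name `welch_psd` is illustrative.

```python
import cmath
import math


def welch_psd(x, seg_len=64, overlap=32):
    """Averaged, windowed periodogram (Welch's method) in relative units."""
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    step = seg_len - overlap
    # Periodic Hann window reduces leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / seg_len)
              for n in range(seg_len)]
    norm = sum(w * w for w in window)
    n_bins = seg_len // 2 + 1  # one-sided spectrum for real input
    psd = [0.0] * n_bins
    count = 0
    for start in range(0, len(x) - seg_len + 1, step):
        seg = [x[start + n] * window[n] for n in range(seg_len)]
        for k in range(n_bins):
            # Direct DFT of bin k; hardware would use an FFT or sliding DFT.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / seg_len)
                      for n in range(seg_len))
            psd[k] += abs(acc) ** 2 / norm
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than one segment")
    return [p / count for p in psd]
```

    Feeding in a sine wave that completes 8 cycles per 64-sample segment should produce a spectrum whose largest value sits in bin 8, which is a convenient sanity check for any hardware implementation of the same pipeline.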

  • Article (No Access)

    Dynamics and Implementation of FPGA for Memristor-Coupled Fractional-Order Hopfield Neural Networks

    The coupling between neurons can lead to diverse neural network architectures, with the Hopfield neural network (HNN) being particularly noteworthy for its resemblance to human brain function and its potential in modeling chaotic systems. This paper introduces a novel approach: a fractional-order HNN coupled with a hyperbolic tangent-type memristor. Initially, we propose a new model for the hyperbolic tangent-type memristor and its fingerprints. Subsequently, we construct a memristor-coupled fractional-order Hopfield neural network (mFOHNN) and explore its dynamic behavior using various analytical tools, including phase diagrams, bifurcation diagrams, Lyapunov exponent diagrams, Poincaré maps, and attractor basins. Our findings reveal rich coexisting bifurcation behavior in the neural network model, influenced by different initial values of coexisting attractors. Finally, we validate the model through analysis with Multisim circuit simulation software and implementation on FPGA hardware.