This paper presents a platform that, given a stochastic context-free grammar (SCFG), automatically outputs the description of the parser in a synthesizable hardware description language (HDL), which can be downloaded to a Field Programmable Gate Array (FPGA) board. In our methodology, the SCFG is first augmented with attributes that store the probability values and can be evaluated through corresponding stack actions. The architecture of the produced system is based on a proposed extension of Earley’s parallel algorithm, which, given an input string, generates the parse trees in the form of an AND-OR parse tree. This AND-OR parse tree is then traversed using a proposed tree traversal technique in order to execute the corresponding actions in the correct order and thus compute the necessary probabilities. The platform is suitable for embedded systems applications where a natural language interface is required and for pattern recognition tasks. The parser generated by the presented platform has been tested for various SCFGs and compared to software approaches. The generated hardware is one to two orders of magnitude faster than previous software approaches, depending on the application, the input string length and the number of produced trees.
This paper presents a platform that, given a Stochastic Context-Free Grammar (SCFG), automatically outputs the description of a parser in a synthesizable Hardware Description Language (HDL), which can be downloaded to a Field Programmable Gate Array (FPGA) board. Although the proposed methodology can be used for various inexact models, the probabilistic model is analyzed in detail and the extension to other inexact schemes is described. Context-Free Grammars (CFGs) are augmented with attributes that represent the probability values. First, a methodology is proposed based on the fact that the probabilities can be evaluated concurrently with parsing, during the parse table construction, by extending the fundamental parsing operation proposed by Chiang & Fu. Using this extended operation, an efficient architecture is presented based on Earley’s parallel algorithm, which, given an input string, generates the parse table while concurrently evaluating the probabilities of the dotted grammar rules generated in the table. Based on this architecture, a platform has been implemented that automatically generates the hardware design of the parser given an SCFG. The platform is suitable for embedded systems applications where a natural language interface is required and for pattern recognition tasks. The proposed hardware platform has been tested for various SCFGs and compared with a previously presented hardware parser for SCFGs based on Earley’s parallel algorithm. The hardware generated by the proposed platform is much less complicated than the one it was compared against and achieves a speed-up of one order of magnitude.
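As a minimal sketch of what the probability attributes of an SCFG compute, the Python fragment below multiplies the probabilities of the rules used in one parse tree. The toy grammar, the tree encoding and the names `RULES` and `tree_probability` are assumptions made for illustration; the abstract does not give the platform's data structures, and this software sketch does not model the parallel Earley architecture.

```python
# Hypothetical toy SCFG: each rule (lhs, rhs) maps to its probability.
# Probabilities of rules that share a left-hand side sum to 1.
RULES = {
    ("S", ("S", "S")): 0.25,
    ("S", ("a",)): 0.75,
}


def tree_probability(tree):
    """Return the probability of one parse tree.

    A tree is (lhs, children); each child is either a nested tree or a
    terminal string. The result is the product of the probabilities of
    all rules used in the derivation.
    """
    lhs, children = tree
    # Rebuild the right-hand side from the children to look up the rule.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[(lhs, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child)
    return p


leaf = ("S", ["a"])
tree = ("S", [leaf, ("S", [leaf, leaf])])  # one derivation of "a a a"
print(tree_probability(tree))
```

The derivation uses the rule S → S S twice and S → a three times, so the product is 0.25² × 0.75³.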
With the recent increase in IP traffic owing to fiber communication, previous schemes have become inadequate for link-layer processing of IP over SONET/SDH (POS). In this study, a proposal based on m processors is presented for mapping IP datagrams into, or demapping them from, SONET/SDH, where the value of m is chosen according to the link-layer rate of POS. The mathematical model of the proposed architecture is presented in detail, and the scheme is then implemented in a Field-Programmable Gate Array (FPGA). Both theoretical analysis and experimental tests show that the proposed scheme is efficient, portable and cost-effective, and that it consumes fewer hardware resources.
This paper presents a structural approach for testing SRAM-based FPGAs that takes into account the configurability of such flexible devices. When SRAM-based FPGA testing is considered, two situations must first be distinguished: the Application-Oriented Test situation and the Manufacturing-Oriented Test situation. This paper concentrates on Test Pattern Generation and DFT for an Application-Oriented Test of SRAM-based FPGAs.
This paper describes how a complete test bench for a Common Rail™ injection system has been built by means of hardware/software codesign techniques. The test bench is made up of two main components: a HW component running mainly on an FPGA device, interacting directly with the electromechanical components (namely, a high pressure pump, six electrical injectors, an electrical discharge valve, two pressure sensors), for high speed signal acquisition and generation, and for closed loop control; and a SW component, written in Visual Basic™, running on a PC, including a graphical user interface for parameter setting and system characterization. An additional signal acquisition board is also used for monitoring six load cells and for temperature control. The two components communicate through the standard PC's parallel port operated in Enhanced Parallel Port mode. The test bench is designed, simulated and implemented entirely under the CodeSimulink hardware/software codesign environment, which runs as a plug-in of The Mathworks™ Simulink™ design tool. A few other commercial HW/SW codesign tools have also been considered, but none of them offered sufficient performance, flexibility or, more importantly, ease of use and compatibility with existing Simulink simulation models of the various components of the test bench.
This paper presents a novel application-specific field-programmable gate array (FPGA) architecture that supports efficient implementation of digit-serial DSP architectures on a digit-wide basis. Digit-serial DSP designs have been an effective implementation method for FPGAs. To realize a digit-serial DSP design efficiently on FPGAs, one must create an FPGA architecture optimized for those types of systems. We examine the various circuits used in digit-serial DSP designs to extract the key features that should be reflected in the new FPGA architecture. We explain the design methodology, layout and implementation of the new digit-serial FPGA architecture. Digit-serial DSP designs using the digit-serial FPGA (DS-FPGA) are compared to those implemented on Xilinx FPGAs. We estimate that designs on the DS-FPGA are about 2.5 to 3 times more efficient in area, and faster, than the equivalent digit-serial DSP architectures implemented using Xilinx FPGAs.
This paper presents the design of a tunable heterodyne filter that can be used very effectively to eliminate narrowband interference in wide-band communications. The filter uses the heterodyne process to create a tunable notch filter from a fixed-coefficient high-pass filter. It is shown that a simple second-order IIR band-pass filter can detect the narrowband interference and generate a signal that sets the heterodyne frequency, so that the tunable heterodyne notch filter is matched to the center of the detected interference. Moreover, various options are illustrated for implementing heterodyne filters using Xilinx Virtex field programmable gate arrays (FPGAs).
The availability of Silicon Germanium (SiGe) Heterojunction Bipolar Transistor (HBT) devices has opened the door to GHz Field Programmable Gate Arrays (FPGAs) [1,2]. The integration of high-speed SiGe HBTs and low-power CMOS gives SiGe FPGAs a significant speed advantage over CMOS FPGAs. In the past, high static power consumption discouraged attempts to scale up bipolar FPGAs significantly. This paper details new ideas for reducing power in the design of high-speed SiGe BiCMOS FPGAs. It describes new methods that reduce circuitry and use a novel power management scheme to achieve a flexible trade-off between power consumption and circuit speed. In addition, new decoding logic is developed with shared address and data lines. A SiGe FPGA test chip based on the Xilinx 6200 architecture has been fabricated for demonstration.
Digital scan conversion (DSC) is the process of converting received ultrasound signals, or echoes, acquired along multiple scan lines at varying angles (polar coordinates), to a Cartesian raster format for display. In this paper, we propose a new DSC technique that uses nearest-neighbor interpolation and linear interpolation between adjacent scan lines to reduce artifacts in the far field, with smaller angular separation between the interpolated lines. A hardware implementation is described that uses only a FIFO register and a display memory. Rapid prototyping using an ARM processor with FPGA resources is used to validate the operation of the described system. Experimental results for the implemented design demonstrate the expected operation of the reduced-complexity architecture in terms of required memory. The quality of the retrieved images was also improved.
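The polar-to-Cartesian mapping behind scan conversion can be sketched in a few lines of Python. This is a software illustration only, under the assumption that the nearest sample is taken along the range direction and that values are linearly interpolated between the two adjacent scan lines; the abstract does not specify how the proposed technique combines the two, and the function name `scan_convert` and its parameters are hypothetical.

```python
import math


def scan_convert(polar, angles, dr, width, height, dx):
    """Convert polar samples polar[line][sample] to a Cartesian raster.

    angles: scan-line angles in radians, ascending, measured from the
    depth (y) axis. dr is the radial spacing between samples and dx the
    pixel size. The transducer sits at the top centre of the image.
    Pixels outside the scanned sector are left at 0.
    """
    n_samples = len(polar[0])
    image = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            x = (col - (width - 1) / 2) * dx
            y = row * dx
            r_idx = round(math.hypot(x, y) / dr)  # nearest sample in range
            theta = math.atan2(x, y)
            if r_idx >= n_samples or not angles[0] <= theta <= angles[-1]:
                continue
            # Find the pair of adjacent scan lines that encloses theta.
            k = 0
            while k < len(angles) - 2 and angles[k + 1] < theta:
                k += 1
            t = (theta - angles[k]) / (angles[k + 1] - angles[k])
            image[row][col] = (1 - t) * polar[k][r_idx] + t * polar[k + 1][r_idx]
    return image


# Three scan lines with constant echo values 0, 10 and 20.
lines = [[0.0] * 8, [10.0] * 8, [20.0] * 8]
img = scan_convert(lines, [-math.pi / 4, 0.0, math.pi / 4], 1.0, 5, 5, 1.0)
```

Pixels on the central axis take the value of the middle scan line, and pixels between two lines receive a weighted mix of both.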
Recently there has been growing interest in window joins for scenarios in which data streams arrive at very high rates and many simultaneous queries are registered with a DSMS. Most of the algorithms proposed in the literature are implemented in software. In contrast, we present a novel algorithm named M3Join, which can be implemented in software or hardware. M3Join is based on a join architecture that contains a join-routing-table and several half-join architectures. The join architecture is initialized or updated according to the simultaneous queries. Each tuple in the data streams is extended with a route tag, searches the join-routing-table, and is switched into the proper half-join architecture to perform join operations or is returned to end users. The generated join tuples, whose route tags have been updated, repeat the above search and join procedures until no further join tuples are produced. The algorithm needs only one scan over the data streams, since different join queries share the intermediate results. The FPGA implementation of M3Join is presented in detail. The experimental results show that the algorithm is effective and that hardware can accelerate join processing substantially.
This paper presents an FPGA design methodology that can be used to shorten the FPGA design-and-debug cycle, especially as gate counts increase into the multimillions. Core-based incremental placement algorithms, in conjunction with fast interactive routing, are investigated to reduce the design processing time by identifying the changes between design iterations and reprocessing only the changed blocks without affecting the remaining part of the design. When combined with a background refinement thread, the incremental approach offers the instant gratification that designers expect, while preserving the fidelity attained through batch-oriented programs. An integrated FPGA design environment is then developed based on the incremental placer and its background refiner. The results show that the incremental design methodology is orders of magnitude faster than competing approaches such as the Xilinx M3 tools without sacrificing too much quality.
This paper presents an algorithm for the allocation of on-chip FPGA Block RAMs in the implementation of real-time video processing systems. The algorithm, which is based on a heuristic, seeks the most cost-effective way of allocating memory objects to the FPGA Block RAMs. Its effectiveness is shown through the implementation of realistic image processing systems. The experimental results show that the algorithm generates results that are close to the theoretical optimum for most design cases.
Mobile communication holds great potential for users because it enables real-time multimedia communication such as voice, image, and text. The large amount of redundant data in a still image should be compressed with an appropriate image compression algorithm (ICA) before transmission over a wireless channel. An ICA should therefore be adaptive, simple, cost-effective and feasible to implement. Hardware implementations of the different algorithms have improved with modern, fast, and cost-effective technologies. The main aim of this paper is to review and demonstrate various ICAs developed for image transmission over wireless channels, as well as their hardware implementations. Finally, this review offers researchers a bridge to future comparative studies of different algorithms and architectures, and serves as a reference point for developing more controllable and flexible structures.
In this paper, we present two alternative architectures and FPGA implementations of the 64-bit MISTY1 block cipher, a NESSIE proposal. The first architecture is suitable for applications with high-performance requirements. A throughput of up to 12.6 Gbps can be achieved at a clock frequency of 168 MHz. The main characteristic of this architecture is that it uses RAM blocks embedded in modern FPGA devices to implement the S-boxes defined in the block cipher algorithm. The second architecture can be used to implement applications on area-constrained systems. It utilizes feedback logic and an inner pipeline with negative-edge-triggered registers. This technique shortens the critical path without increasing the latency of the MISTY1 algorithm execution. Compared with an implementation without the inner pipeline, a performance improvement of 97% is achieved. The measured throughput of the second architecture implementation is 561 Mbps at 79 MHz.
In this paper, we present two new hardware architectures that implement the Smith–Waterman algorithm for DNA sequence alignment. Previous low-cost approaches based on Field Programmable Gate Array (FPGA) technology are reviewed in detail and then improved with the goal of increased performance at the same cost (i.e., area). This goal is achieved through low-level optimizations aimed at adapting the systolic structure implementing the algorithm to the regular structure of FPGAs, essentially finding the optimum granularity of the systolic cells. The proposed architectures achieve processing rates close to 1 Gbps, clearly outperforming previous approaches. Compared with the reported FPGA results for computing the edit distance between two DNA sequences, throughput is doubled for the same clock frequency with a minimal area penalty. The design has been implemented on an FPGA-based prototyping board integrated into a bioinformatics system. This has allowed validating the approach in a real system (i.e., including I/O and database access), and comparing the proposed hardware solution to purely software approaches. As shown in the paper, the results are outstanding even for slow-rate buses.
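For readers unfamiliar with the recurrence that such systolic arrays evaluate, the following Python sketch computes the Smith–Waterman local alignment score with a linear gap penalty. It is a plain software reference, not the proposed architecture, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score
    matrix, since each cell depends on its left, upper and upper-left
    neighbours only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Because a cell depends only on its three neighbours, all cells on one anti-diagonal of the matrix are independent of each other, which is the property that systolic hardware implementations exploit.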
Digital noise generators are basic components of communication channel emulators, which are used to simplify and speed up the testing and verification of communication systems. In this paper, a novel scheme for implementing a high-speed, high-probability-resolution digital additive white Gaussian noise (AWGN) generator is proposed, based on a derived universal algorithm for generating white noise with any distribution f(x). The precision and the hardware complexity of the proposed AWGN generator are discussed in detail. Implemented on an FPGA, the presented AWGN generator is used to test a high-speed UWB communication system and performs well, with excellent precision in the noise probability distribution. The proposed scheme for implementing digital noise generators is helpful in building communication channel emulators to test and verify communication systems.
The general packet radio service (GPRS) has been developed to allow packet data to be transported efficiently over an existing circuit-switched radio network, such as GSM. The main applications of GPRS are in transporting Internet protocol (IP) datagrams from web servers (for telemetry or for mobile Internet browsers). Four GPRS baseband coding schemes are defined to offer a trade-off between requested data rates and propagation channel conditions. However, data rates above 100 kbit/s are only achievable if the simplest coding scheme (CS-4) is used, which offers little error detection and correction (EDC) and therefore requires excellent SNR, and if the receiver hardware is capable of full duplex, which is not currently available in the consumer market. A simple EDC scheme to improve the GPRS block error rate (BLER) performance is presented, particularly for CS-4, although gains are also seen in other coding schemes. A GPRS radio block that is corrected by the EDC scheme does not need to be retransmitted, which releases bandwidth in the channel and improves the user's application data rate. As GPRS requires intensive processing in the baseband, a viable field programmable gate array (FPGA) solution is presented in this paper.
Optimized implementation of computationally intensive cryptographic transformations is an area of active research, mainly focused on the Advanced Encryption Standard (AES). Byte substitution, implemented using substitution boxes (S-boxes), is the main transformation in AES that strains the enabling embedded platform, e.g., Field Programmable Gate Arrays. We present a novel clocking technique enabling an optimized implementation of byte substitution that enhances processing speed and reduces the area required for S-boxes on Xilinx FPGA Block RAM (BRAM).
Many problems in bioinformatics represent great computational challenges due to the huge amount of biological data to be analyzed. Reconfigurable systems can offer custom-computing machines that are orders of magnitude faster than regular software running on general-purpose processors. We present a methodology for applying a configware system to an interesting problem of molecular biology: splice junction detection in eukaryote genes. Decision trees were developed using a benchmark of DNA sequences. They were converted into logical equations, simplified, and submitted to Boolean minimization. The resulting circuit was implemented in reconfigurable parallel hardware and evaluated with a five-fold cross-validation procedure, run in a second level of parallelism. The average accuracy achieved was 90.41%, and the circuit takes 18 ns to process each data record of 60 nucleotides.
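The step from a decision tree to logical equations can be illustrated with a short Python sketch: every root-to-leaf path that ends in a positive leaf becomes one AND term, and the terms are ORed together. The tree encoding, the feature names `x0` and `x1`, and the helper names are hypothetical; the sketch assumes binary tests and performs no Boolean minimization.

```python
# A node is either a bool leaf (the predicted class) or a tuple
# (feature, subtree_if_false, subtree_if_true). Hypothetical encoding.
def paths_to_true(node, prefix=()):
    """Collect the root-to-leaf paths that end in a True leaf."""
    if isinstance(node, bool):
        return [prefix] if node else []
    feature, low, high = node
    return (paths_to_true(low, prefix + ((feature, False),))
            + paths_to_true(high, prefix + ((feature, True),)))


def to_equation(terms):
    """Render the paths as a sum-of-products Boolean expression."""
    return " | ".join(
        "(" + " & ".join(f if v else "~" + f for f, v in term) + ")"
        for term in terms)


def evaluate(terms, inputs):
    """Evaluate the sum-of-products form for a dict of feature values."""
    return any(all(inputs[f] == v for f, v in term) for term in terms)


tree = ("x0", ("x1", False, True), True)
terms = paths_to_true(tree)
print(to_equation(terms))  # (~x0 & x1) | (x0)
```

A minimization step would reduce this particular expression to `x0 | x1`; the sketch stops before that step.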
Current multimedia design processes suffer from the excessive time spent testing new IP blocks against references based on large video encoder specifications (usually several thousand lines of code). Proper testing of a single IP block may require converting the entire encoder from software to hardware, which is difficult to complete within the short time-to-market that competition demands for the adoption of a new video coding standard. This paper presents a new design flow to accelerate the conformance testing of an IP block using the H.264/AVC software reference model. An example block, the simplified 8 × 8 transformation and quantization adopted in FRExt, is provided as a case study demonstrating the effectiveness of the approach.