
  • Article (No Access)

    HIGH-PERFORMANCE MATHEMATICAL FUNCTIONS FOR SINGLE-CORE ARCHITECTURES

    High-performance computing (HPC) architectures are designed to solve sophisticated scientific and engineering problems across a growing number of HPC and professional workloads. The trigonometric functions sine and cosine are used in nearly every field, yet computing them is a fairly time-consuming task in high-performance numerical simulations. In this paper, we discuss in detail how the micro-architecture of the single-core Itanium® and Alpha 21264/21364 processors, together with manual optimization techniques, improves the computing performance of several mathematical functions. By describing the algorithm and its execution pattern on the processor in detail, we confirm that the processor micro-architecture combined with manual optimization techniques improves computing performance significantly, compared not only with the standard math library's built-in functions under compiler optimization options but also with the highly optimized mathematical functions of the Intel® Itanium® library.
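
    The abstract does not spell out the algorithm. As a rough illustration of how fast sine routines are commonly structured, the sketch below reduces the argument to a small interval around zero and then evaluates a short polynomial in Horner form, which maps well onto fused multiply-add units. The names `sin_poly`, `cos_poly`, and `sin_approx`, and the use of plain Taylor coefficients, are assumptions made for this example and are not taken from the paper.

```python
import math

HALF_PI = math.pi / 2.0

def sin_poly(r):
    """Taylor polynomial of sin(r) up to r**11, in Horner form over r*r."""
    r2 = r * r
    return r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
               + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800))))))

def cos_poly(r):
    """Taylor polynomial of cos(r) up to r**10, in Horner form over r*r."""
    r2 = r * r
    return 1.0 + r2 * (-1.0 / 2 + r2 * (1.0 / 24 + r2 * (-1.0 / 720
               + r2 * (1.0 / 40320 + r2 * (-1.0 / 3628800)))))

def sin_approx(x):
    """Approximate sin(x) for moderate |x| (illustrative, not production code).

    Step 1: write x = k * (pi/2) + r with |r| <= pi/4 (argument reduction).
    Step 2: pick sin or cos of r, with sign, from the quadrant k mod 4.
    """
    k = round(x / HALF_PI)
    r = x - k * HALF_PI
    quadrant = k % 4  # Python's % is non-negative for a positive modulus
    if quadrant == 0:
        return sin_poly(r)
    if quadrant == 1:
        return cos_poly(r)
    if quadrant == 2:
        return -sin_poly(r)
    return -cos_poly(r)
```

    A production routine would typically use minimax rather than Taylor coefficients and a more careful reduction for large arguments; this sketch only shows the overall shape of the computation.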

  • Article (No Access)

    A Fault Tolerant Parallelism Approach for Implementing High-Throughput Pipelined Advanced Encryption Standard

    The Advanced Encryption Standard (AES) is the most popular symmetric encryption method; it encrypts streams of data using symmetric keys. Currently preferred AES architectures employ effective methods to achieve two important goals: protection against power analysis attacks and high throughput. Taking a different architectural point of view, we implement a parallel architecture aimed at the latter goal, which enables more efficient pipelining on a field-programmable gate array (FPGA). To this end, all intermediate registers that serve to unroll the main loop are removed. Moreover, instead of unrolling the main loop of the AES algorithm, we build the pipeline by replicating nonpipelined AES architectures and using an auto-assigner mechanism for each AES block. The new pipelined architecture offers two valuable advantages: (a) it solves the single-point-of-failure problem when one of the replicated parts is faulty, and (b) the proposed design can be deployed as a fault-tolerant AES architecture. In addition, we emphasize area optimization of all four main AES functions to reduce the overhead of AES block replication. Simulation results show that the maximum frequency of the proposed AES architecture is 675.62 MHz, and that for AES-128 the throughput is 86.5 Gbps, which is 30.9% better than its closest existing competitor.
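
    The two reported figures can be related with a short calculation. The sketch below assumes that the design as a whole completes one 128-bit block per clock cycle; the abstract does not state this, but under that assumption the reported clock frequency reproduces the reported throughput. The function name `throughput_gbps` is hypothetical.

```python
def throughput_gbps(clock_mhz, bits_per_cycle):
    """Throughput in Gbps, assuming `bits_per_cycle` bits complete every clock cycle."""
    bits_per_second = clock_mhz * 1e6 * bits_per_cycle
    return bits_per_second / 1e9

# Reported maximum frequency, with an assumed 128 bits (one AES block) per cycle.
estimate = throughput_gbps(675.62, 128)
print(f"{estimate:.2f} Gbps")  # about 86.48 Gbps, consistent with the reported 86.5 Gbps
```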

  • Article (No Access)

    Improved Synthesis of Generalized Parallel Counters on FPGAs Using Only LUTs

    Generalized parallel counters (GPCs) are frequently used to construct high-speed compressor trees on field-programmable gate arrays (FPGAs). The introduction of fast carry chains in FPGAs has greatly improved the performance of these elements. Consequently, a large number of GPCs that use a combination of look-up tables (LUTs) and carry chains have been proposed in the literature. In this paper, we take an alternative approach and try to eliminate the carry chain from the GPC structure. We present a heuristic that aims at synthesizing GPCs on FPGAs using only the general LUT fabric. The resulting GPCs are then easily pipelined by placing registers at the output node of each LUT. We have applied our heuristic to various GPCs reported in prior work. It successfully eliminates the carry chain from the GPC structure, at the cost of an increased LUT count for some GPCs. Experiments on Xilinx FPGAs show that filter systems constructed using our GPCs improve speed and power performance while maintaining comparable area.
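
    To make the idea of a LUT-only GPC concrete, the sketch below models the simplest case: a counter that adds six equally weighted input bits into a 3-bit sum. Each output bit is a function of only six inputs, so it can be stored as one 64-entry truth table, which is what a single 6-input LUT holds. This is not the paper's heuristic; the function names and the truth-table representation are assumptions made for illustration.

```python
from itertools import product

def gpc_sum(columns):
    """Reference GPC behaviour: columns[i] holds the input bits of weight 2**i."""
    return sum(sum(bits) << i for i, bits in enumerate(columns))

def build_luts_6_to_3():
    """Build one 64-entry truth table per output bit of a six-input bit counter."""
    luts = [[0] * 64 for _ in range(3)]
    for index, bits in enumerate(product((0, 1), repeat=6)):
        total = gpc_sum([list(bits)])  # all six bits have weight 1
        for out in range(3):
            luts[out][index] = (total >> out) & 1
    return luts

def eval_6_to_3(luts, bits):
    """Evaluate the counter by table look-up only (no adder, no carry chain)."""
    index = int("".join(str(b) for b in bits), 2)  # first bit is most significant
    return sum(luts[out][index] << out for out in range(3))

luts = build_luts_6_to_3()
print(eval_6_to_3(luts, (1, 1, 0, 1, 0, 1)))  # four input bits are set, so this prints 4
```

    GPCs with inputs of several weights generally depend on more than six inputs, so their outputs no longer fit into a single LUT each; decomposing them across multiple LUTs is the kind of problem a synthesis heuristic has to solve.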

  • Article (No Access)

    Design and Implementation of Face Detection Architecture for Heterogeneous System-on-Chip

    The seminal work of Viola and Jones on automatic face detection is widely used in human–computer interaction and computer vision applications. Analyzing existing face detection architectures, we observed that integral image calculation, feature computation in the cascaded classifier, and repeated scanning of the image with a sliding window at multiple scales are the main factors that increase the memory and time complexity of the algorithm. In this paper, we therefore propose a hardware–software co-design of the Viola–Jones face detector for a System-on-Chip (SoC). In the proposed architecture, the integral image computation and cascaded classifier sub-modules are implemented in hardware on the programmable logic (PL-FPGA), while the image scaling and nonmaximum suppression sub-modules are implemented in software on the ARM processing system (PS-ARM). Pipelining, folding, and parallel processing are used to optimize the design. The proposed architecture has been tested on a PYNQ-Z1 board. The implementation achieves a processing speed of 95 fps for a QVGA-resolution image, with PL and PS clocks of 100 MHz and 650 MHz, respectively. Analysis of the results shows that the proposed architecture has the lowest resource requirements among the compared state-of-the-art implementations, which facilitates the use of resource-constrained, low-cost ZYNQ SoCs for face detection.
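
    The integral image mentioned above can be sketched in software as follows. Once it is built, the sum of the pixels inside any rectangle takes four table look-ups, which is what makes the rectangular features of the Viola–Jones detector cheap to evaluate. This is a plain software illustration, not the hardware design from the paper; the function names and the zero-padded layout are choices made for this example.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0  # running sum of the current row
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four look-ups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

    A two-rectangle feature is then just the difference of two such `rect_sum` calls over adjacent regions.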