
  • Article (No Access)

    Quantum-Dot Transistor Based Multi-Bit Multiplier Unit for In-Memory Computing

    In-memory computing is an emerging technique that meets the fast-growing demand for high-performance data processing. It provides fast processing and high throughput by operating on data stored in the memory array rather than performing complicated operations and data movement on a hard drive. The most important computation for data processing is the dot product, which is also the core computation of applications such as deep learning, neural networks, and machine learning. As multiplication is the key function within the dot product, improving its performance is critical to achieving faster in-memory processing. In this paper, we present a design capable of performing in-memory multi-bit multiplications. The proposed design is implemented with quantum-dot transistors, which enable multi-bit computations in the memory cell. Experimental results demonstrate that the proposed design provides reliable in-memory multi-bit multiplication with high density and high energy efficiency. A statistical analysis based on Monte Carlo simulations investigates process variations and error effects.
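
    The multi-bit multiply the abstract describes can be modeled functionally. The sketch below (plain Python with hypothetical names; the actual design performs this inside the QDT memory cell, not in software) accumulates shifted partial products, one per bit of the stored operand:

    ```python
    def multibit_multiply(a: int, b: int, bits: int = 4) -> int:
        """Functional model of a multi-bit multiply: sum one shifted
        partial product for each set bit of the stored operand b."""
        product = 0
        for i in range(bits):
            if (b >> i) & 1:           # bit i of the stored operand
                product += a << i      # shifted partial product
        return product

    print(multibit_multiply(5, 3))     # 15
    ```

    The 4-bit width here is only an illustrative choice; the point is that a multi-bit cell contributes a whole partial product at once instead of a single binary AND.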

  • Article (Free Access)

    A Multi-Bit Non-Volatile Compute-in-Memory Architecture with Quantum-Dot Transistor Based Unit

    Recent advances in artificial intelligence (AI) have shown remarkable success on numerous tasks, such as cloud computing, deep learning, and neural networks. Most of these applications rely on fast computation and large storage, which poses various challenges to the hardware platform. Because hardware performance has become the bottleneck, there has been considerable interest in new computation architectures in recent years. Compute-in-memory (CIM) has drawn researchers' attention as one of the most promising candidates for addressing these challenges: it is an emerging technique that meets the fast-growing demand for high-performance data processing, offering fast processing, low power, and high performance by blurring the boundary between processing cores and memory units. A key aspect of CIM is performing matrix-vector multiplication (MVM), or dot-product operations, by intertwining processing and memory elements. As the primary computational kernel in neural networks, the dot product is the operation whose performance we target. In this paper, we present the design, implementation, and analysis of a quantum-dot transistor (QDT) based CIM, from the multi-bit multiplier to the dot-product unit and then to the in-memory computing array.
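
    The MVM at the heart of CIM reduces to one dot product per weight row. A minimal functional model (plain Python with illustrative names, not the authors' circuit, which forms these sums in place in the memory array):

    ```python
    def cim_mvm(weights, x):
        """Functional model of CIM matrix-vector multiplication:
        each output is the dot product of one stored weight row
        with the input vector x."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    W = [[1, 2],
         [3, 4]]
    print(cim_mvm(W, [5, 6]))   # [17, 39]
    ```

    In an actual CIM array, each row's multiply-accumulate would happen concurrently on the bitlines; the loop here only mimics the arithmetic.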

  • Article (No Access)

    An In-Memory-Computing Structure with Quantum-Dot Transistor Toward Neural Network Applications: From Analog Circuits to Memory Arrays

    The rapid advancements in artificial intelligence (AI) have demonstrated great success in various applications, such as cloud computing, deep learning, and neural networks. However, the majority of these applications rely on fast computation and large storage, which poses significant challenges to the hardware platform. Thus, there is growing interest in exploring new computation architectures to address these challenges. Compute-in-memory (CIM) has emerged as a promising solution to the limitations of traditional computer architecture in terms of data transfer frequency and energy consumption. Non-volatile memory, such as quantum-dot transistors, has been widely used in CIM to provide high-speed processing, low power consumption, and large storage capacity. Matrix-vector multiplication (MVM), or the dot-product operation, is a primary computational kernel in neural networks, and CIM offers an effective way to optimize its performance by intertwining processing and memory elements. In this paper, we present a novel design and analysis of a quantum-dot transistor (QDT) based CIM that performs efficient MVM, or dot-product, computations inside the memory array itself. Our approach offers the energy-efficient, high-speed data processing that is critical for implementing AI applications on resource-limited platforms such as portable devices.

  • Article (No Access)

    Control of the memory cell magnetization by a combined pulse of local magnetic fields

    We study the process of controlling the states of a magnetic memory cell with a pulsed magnetic field. The magnetization dynamics of such a system is described by the Landau–Lifshitz equation. We find the optimal parameters of the time dependence, amplitude, and duration of the magnetic field pulses created by currents in the control system, which provide a fast, inertialess switching of the magnetization of the cell's functional element.
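
    The Landau–Lifshitz dynamics being optimized can be sketched numerically. The toy integrator below (plain Python, normalized units, explicit Euler with renormalization, damping constant chosen arbitrarily; the paper's pulse shapes and parameters are not reproduced) shows the magnetization relaxing toward an applied field:

    ```python
    def cross(a, b):
        # 3D vector cross product
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def ll_step(m, h, gamma=1.0, lam=0.1, dt=1e-3):
        """One explicit Euler step of the Landau-Lifshitz equation
        dm/dt = -gamma m x H - lam m x (m x H), renormalized so |m| = 1."""
        p = cross(m, h)            # precession term m x H
        d = cross(m, p)            # damping term m x (m x H)
        m_new = [m[i] - dt * (gamma * p[i] + lam * d[i]) for i in range(3)]
        n = sum(c * c for c in m_new) ** 0.5
        return [c / n for c in m_new]
    ```

    Starting near +z under a field along -z, repeated steps precess and damp the magnetization down toward -z, which is the switching process whose pulse parameters the paper optimizes.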

  • Article (No Access)

    A CONFIGURABLE BOSE–CHAUDHURI–HOCQUENGHEM CODEC ARCHITECTURE FOR FLASH CONTROLLER APPLICATIONS

    Error correction coding (ECC) has become one of the most important tasks of flash memory controllers. The gate count of the ECC unit is taking up a significant share of the overall logic. Scaling the ECC strength to the growing error correction requirements has become increasingly difficult when considering cost and area limitations.

    This work presents a configurable encoding and decoding architecture for binary Bose–Chaudhuri–Hocquenghem (BCH) codes. The proposed concept supports a wide range of code rates and facilitates a trade-off between throughput and space complexity. Hardware implementations of BCH decoding commonly perform many Galois field multiplications in parallel. We propose a new decoding technique that uses different parallelization degrees depending on the actual number of errors. This approach significantly reduces the number of required multipliers, while the average number of decoding cycles is even smaller than with a fully parallel implementation.
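
    The Galois field multiplications that such decoders parallelize can be modeled in software. The sketch below (plain Python; GF(2^4) with reduction polynomial x^4 + x + 1 is a common textbook choice, not necessarily the field used in the paper) is one such multiplier:

    ```python
    def gf_mul(a: int, b: int, poly: int = 0b10011, m: int = 4) -> int:
        """Multiply a and b in GF(2^m): carry-less (XOR) accumulation
        of shifted partial products, reducing modulo `poly` whenever
        the intermediate value overflows m bits."""
        r = 0
        while b:
            if b & 1:
                r ^= a             # XOR in the partial product
            b >>= 1
            a <<= 1
            if a & (1 << m):       # degree reached m: reduce
                a ^= poly
        return r

    print(gf_mul(2, 8))            # x * x^3 = x^4 = x + 1 = 3
    ```

    A hardware decoder instantiates many such multipliers; the paper's contribution is varying how many operate in parallel based on the detected error count.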

  • Article (No Access)

    CWC: A Companion Write Cache for Energy-Aware Multi-Level Spin-Transfer Torque RAM Cache Design

    Due to its large leakage power and low density, conventional SRAM has become less appealing for implementing large on-chip caches. Emerging non-volatile memory technologies, such as phase-change memory (PCM) and spin-transfer torque RAM (STT-RAM), offer low leakage power and high density, which makes them good candidates for on-chip caches. In particular, STT-RAM has longer endurance and shorter access latency than PCM. Two kinds of STT-RAM exist so far: single-level cell (SLC) STT-RAM and multi-level cell (MLC) STT-RAM. Compared to SLC STT-RAM, MLC STT-RAM has higher density and lower leakage power, making it an even more promising candidate for future on-chip caches. However, MLC STT-RAM gains density at the cost of almost doubled write latency and write energy compared to SLC STT-RAM. These drawbacks degrade system performance and diminish the energy benefits. To alleviate these problems, we propose a novel cache organization, the companion write cache (CWC): a small, fully associative SRAM cache that works with the main MLC STT-RAM cache in a master-and-servant fashion. The key function of the CWC is to absorb energy-consuming write updates from the MLC STT-RAM cache. Experimental results show that the CWC greatly reduces write energy and dynamic energy and improves the performance and endurance of the MLC STT-RAM cache compared to a baseline.
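
    The absorbing behavior can be illustrated with a toy model. The sketch below (plain Python; capacity, LRU eviction, and the write counter are illustrative assumptions, not the paper's exact organization) shows how repeated writes to hot lines stay in the small SRAM buffer and only evictions reach the costly MLC STT-RAM:

    ```python
    from collections import OrderedDict

    class CompanionWriteCache:
        """Toy model: a small fully associative SRAM buffer absorbs
        write hits; only LRU evictions incur an MLC STT-RAM write."""
        def __init__(self, capacity: int = 4):
            self.capacity = capacity
            self.buf = OrderedDict()     # addr -> data, LRU order
            self.stt_writes = 0          # costly writes reaching STT-RAM

        def write(self, addr, data):
            if addr in self.buf:         # absorbed: cheap SRAM update
                self.buf.move_to_end(addr)
                self.buf[addr] = data
                return
            if len(self.buf) >= self.capacity:
                self.buf.popitem(last=False)   # evict LRU line
                self.stt_writes += 1           # ...into STT-RAM
            self.buf[addr] = data
    ```

    With a capacity-2 buffer, three writes to the same address cost zero STT-RAM writes; a third distinct address forces one eviction. The real CWC must also handle reads and write-back ordering, which this sketch omits.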

  • Article (No Access)

    An Energy-Efficient Reliable Heterogeneous Uncore Architecture for Future 3D Chip-Multiprocessors

    Uncore components such as the cache hierarchy and on-chip interconnects consume a significant portion of the overall energy in emerging embedded processors. In nanoscale technologies, static power consumption due to leakage current has become a serious issue in the design of SRAM-based on-chip cache memories and interconnections. To address this issue, non-volatile memory technologies such as STT-RAM have been proposed as a replacement for SRAM cells due to their near-zero static power and high memory density. Nonetheless, STT-RAMs suffer from failures such as read disturbance and limited endurance, as well as high switching energy. One effective way to decrease STT-RAMs' switching energy is to reduce their retention time; however, reducing the retention time has a negative impact on the reliability of STT-RAM cells. In this paper, we propose a heterogeneous last-level cache (LLC) architecture for 3D embedded chip-multiprocessors (3D eCMPs) that employs two types of STT-RAM memory banks, with retention times of 1 s and 10 ms, to provide a beneficial trade-off between reliability, energy consumption, and performance. To this end, we also propose a convex optimization model to find the optimal configurations of these two kinds of memory banks. In parallel with the hybrid memory architecting, optimizing the number and placement of through-silicon vias (TSVs), a main component of on-chip interconnection in 3D CMPs, is another important target of the proposed optimization approach. Experimental results show that the proposed method improves the energy-delay product and throughput by about 69% and 34.5% on average compared with SRAM configurations.

  • Article (No Access)

    Enhancing the Lifetime of a Phase Change Memory with Bit-Flip Reversal

    Phase-change memory (PCM) has evolved as a promising alternative to dynamic random access memory (DRAM) in terms of cell density and leakage power. While non-volatility is a desirable feature, it means the data may persist even after the power is switched off. To secure the data, encryption is normally performed with the standard Advanced Encryption Standard (AES) algorithm. Encrypting the data results in a huge number of bit flips, which reduces the lifetime of a PCM. The proposed method increases the lifetime of PCM by reducing the number of bit flips: only the modified words are encrypted, and the unmodified words are left as they are. The resulting ciphertext, written using the bit-flip reversal method, reduces the number of cells involved in writing by approximately 25%. The method is implemented in the gem5 simulator and evaluated with the SPLASH-2 benchmark suite. We observe that the proposed method improves the lifetime of a PCM by 15% without consuming extra power.
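
    The core bit-flip-reversal idea can be sketched as a data-comparison write. The snippet below (plain Python, illustrative; the paper applies this to encrypted modified words, and its flag bookkeeping may differ) inverts the outgoing word whenever that flips fewer cells than writing it directly:

    ```python
    def write_with_flip(old: int, new: int, width: int = 32):
        """If writing `new` over `old` would flip more than half the
        bits, store the inverted word plus a flip flag instead, so
        fewer PCM cells are rewritten."""
        flips = bin(old ^ new).count("1")
        if flips > width // 2:
            mask = (1 << width) - 1
            return (~new) & mask, True    # inverted word, flag set
        return new, False
    ```

    On read-back, a set flag tells the controller to re-invert the word. Writing an all-ones word over an all-zeros word, for instance, is stored as all zeros with the flag set, touching no data cells.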

  • Article (No Access)

    An Efficient Sorting Algorithm for Non-Volatile Memory

    Non-volatile memory (NVM) has emerged as a candidate for next-generation memory due to its non-volatility, byte addressability, high storage density, and low energy consumption. However, NVM also has some limitations, e.g., asymmetric read and write latencies. Therefore, at present, it is not realistic to completely replace DRAM with NVM in computer systems. A more feasible scheme is a hybrid memory architecture composed of NVM and DRAM. Assuming such a hybrid memory architecture, in this paper we propose an NVM-friendly sorting algorithm called NVMSorting. In particular, we introduce a new concept called the natural run to improve the existing MONTRES algorithm. Further, we apply the proposed NVMSorting to database join algorithms to improve the performance of the existing sort-merge join. To verify the performance of our proposal, we implement six existing sorting algorithms as baselines, including the MONTRES algorithm, and conduct comparative experiments on real Intel Optane DC persistent memory. The results show that NVMSorting outperforms the other sorting algorithms in terms of execution time and NVM writes. In addition, the results of the join experiment show that the NVMSorting algorithm achieves the highest performance among all schemes. In particular, on partially ordered data, the execution time of NVMSorting is 2.9%, 2.7%, and 4.2% less than MONTRES, external sort, and quick sort, respectively, and its NVM writes are 26.1%, 43.6%, and 96.2% less than those of MONTRES, external sort, and quick sort, respectively.
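
    The natural-run idea can be illustrated with a short sketch (plain Python; this is a generic run-detect-and-merge sort, not NVMSorting itself, which additionally manages DRAM/NVM placement to limit NVM writes). Partially ordered input yields few runs, so little data needs rewriting:

    ```python
    import heapq

    def natural_runs(data):
        """Split data into maximal non-decreasing runs (natural runs)."""
        runs, start = [], 0
        for i in range(1, len(data) + 1):
            if i == len(data) or data[i] < data[i - 1]:
                runs.append(data[start:i])
                start = i
        return runs

    def run_merge_sort(data):
        """Sort by merging pre-existing natural runs; nearly sorted
        input produces few runs and hence little rewriting."""
        return list(heapq.merge(*natural_runs(data)))

    print(natural_runs([1, 3, 2, 4]))   # [[1, 3], [2, 4]]
    ```

    For fully sorted input there is a single run and the merge copies nothing out of order, which is exactly the case where a write-asymmetric NVM benefits most.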
