Approximate computing (AC) in arithmetic logic has become a viable option for error-tolerant applications on energy-efficient architectures. AC relies on approximate arithmetic functions that sacrifice accuracy to reduce delay, area, and power consumption. In this research, two novel approximate recursive multipliers (RMul-1, RMul-2) and an approximate adder are designed to reduce power consumption, area, and computational delay in error-tolerant systems. The recursive multipliers use a combination of NOR gates, AND gates, half adders, and full adders to achieve low-power, area-efficient designs. To reduce delay, the approximate adder uses an optimal combination of AND, OR, and MUX gates. The proposed multipliers are synthesized with the Cadence RTL Compiler in 28 nm technology and compared with previous approximate multipliers. Image processing applications are simulated with the Xilinx ISE 13.2 tool to verify the performance of the proposed multipliers. According to the experimental results, the proposed RMul designs outperform current techniques by up to 30.3% in area, 20.2% in power, and 43.9% in delay. In addition, the proposed multipliers outperform existing multipliers in terms of SSIM and PSNR.
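The abstract above does not give the gate-level design of its adder, so the following is only a minimal sketch of a different, well-known approximate adder (a lower-part OR adder) to illustrate how OR and AND gates can trade accuracy for a shorter carry chain. The function name `loa_add` and the parameters `width` and `k` are assumptions made for this illustration, not names from the paper.

```python
def loa_add(a: int, b: int, width: int = 8, k: int = 4) -> int:
    """Lower-part OR adder (illustrative, not the paper's design).

    The k least significant bits are approximated with a bitwise OR,
    the remaining upper bits are added exactly, and the carry into the
    upper part is approximated by ANDing the top bits of the lower part.
    """
    mask_low = (1 << k) - 1
    low = (a | b) & mask_low  # OR replaces the lower-part adder
    # Approximate carry-in for the exact upper adder (0 when k == 0).
    carry = ((a >> (k - 1)) & (b >> (k - 1)) & 1) if k > 0 else 0
    high = (a >> k) + (b >> k) + carry  # exact addition of the upper bits
    return ((high << k) | low) & ((1 << (width + 1)) - 1)


def mean_error_distance(width: int = 8, k: int = 4) -> float:
    """Average |approximate - exact| over all operand pairs of the given width."""
    n = 1 << width
    total = sum(abs(loa_add(a, b, width, k) - (a + b))
                for a in range(n) for b in range(n))
    return total / (n * n)
```

Setting `k = 0` makes the adder exact, which gives a convenient baseline when comparing error metrics for different values of `k`.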
Most real-time systems are embedded in portable, battery-powered devices that have strict limitations on power consumption. Safety-critical embedded systems, in particular, demand a high level of reliability. To effectively enhance both reliability and power consumption, it is crucial to consider both criteria with an accurate and stable model. Existing research on power and reliability models for embedded systems often lacks the accuracy required for safety-critical applications and fails to account for all hardware and software components. This paper proposes a machine learning-based optimization model designed to improve the accuracy and stability of reliability and power consumption assessments. The proposed model demonstrates a significant enhancement in accuracy compared to previous randomization models, showing a 2.75% improvement in reliability and a 0.88% improvement in power consumption relative to existing state-of-the-art models.
We study the problem of resizing gates so as to reduce overall power consumption while satisfying a circuit's timing constraints. Polynomial time algorithms for series-parallel and tree circuits are obtained. Gate resizing with multigate modules is shown to be NP-hard. Algorithms that improve upon those presented by Chen and Sarrafzadeh [4] for general circuits are also developed.
In this paper, we review some new distributed algorithms that construct sparse subgraphs with bounded degree of the unit disk graph efficiently for wireless ad hoc networks. They maintain a linear number of links while still preserving power-efficient routes for any pair of nodes. It was open whether the Yao plus reverse Yao graph and the symmetric Yao graph are spanners. We show that the Yao plus reverse Yao graph has a bounded power stretch factor 2 in civilized unit disk graph. In addition, we review a recent example by M. Grünewald et al. [6] to show that the symmetric Yao graph does not have a constant bounded stretch factor. Finally, we conduct simulations to study the practical performances of these structures. All structures have small power stretch factors for randomly generated unit disk graphs in our experiments.
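As a rough illustration of the structures named in the abstract above, the sketch below builds a directed Yao graph on a unit disk graph (each node keeps its nearest neighbor in each of `k` equal cones) and then the symmetric Yao graph (an edge survives only if both endpoints selected each other). The function names, the choice `k = 6`, and the tie-breaking rule are assumptions for this example; the paper's distributed algorithms are not reproduced here.

```python
import math


def yao_graph(points, k=6, radius=1.0):
    """Directed Yao edges (i, j): j is i's nearest neighbor within
    `radius` in one of the k cones around i."""
    edges = set()
    cone_angle = 2 * math.pi / k
    for i, (xi, yi) in enumerate(points):
        best = {}  # cone index -> (distance, neighbor index)
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xj - xi, yj - yi)
            if d > radius:  # not a unit disk graph edge
                continue
            angle = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            cone = int(angle / cone_angle) % k
            # Ties on distance are broken by the smaller node index.
            if cone not in best or (d, j) < best[cone]:
                best[cone] = (d, j)
        for _, j in best.values():
            edges.add((i, j))
    return edges


def symmetric_yao(edges):
    """Undirected edges kept only when both directions were selected."""
    return {(i, j) for (i, j) in edges if i < j and (j, i) in edges}
```

For example, with the four points `[(0, 0), (0.5, 0.1), (0.9, 0.2), (0.1, 0.6)]`, node 0 keeps node 1 but not the farther node 2, because both lie in the same cone.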
The Parallel Random Access Machine is a very strong model of parallel computing that has resisted cost-efficient implementation attempts for decades. Recently, the development of VLSI technology has provided means for indirect on-chip implementation, but there are different variants of the PRAM model that provide different performance, area and power figures and it is not known how their implementations compare to each other. In this paper we measure the performance and estimate the cost of practical implementations of four PRAM models including EREW, Limited Arbitrary CRCW, Full Arbitrary CRCW, Full Arbitrary Multioperation CRCW on our Eclipse chip multiprocessor framework. Interestingly, the most powerful model shows the lowest simulation cost and highest performance/area and performance/power figures.
In this paper, we present a new approach for monitoring power consumption in several processes. The generalization of the envSOM algorithm, a variant of Self-Organizing Map (SOM), is used to build an electrical model and visualize the information. The envSOM extended to n hierarchical phases allows us to obtain a more accurate model from real past data. The model is conditioned hierarchically on environmental variables. In this way, time variables can be used to consider seasonality and weekday/hour periodicity. Time variable maps and electrical component planes make it possible to visualize and analyze power consumption. The representation of the Best Matching Unit (BMU) or its trajectory on these maps enables the on-line monitoring.
Power consumption of multimedia applications executing on embedded cores is heavily dependent on data transfers between system memory and processing units. The purpose of this paper is to extend an existing power optimizing methodology based on data-reuse decisions, in order to determine the optimal solution in a rapid and reliable way. An analytical approach is proposed by extracting expressions for the number of accesses to each memory layer. Moreover, the design space is further reduced since these analytical expressions are calculated only for a subset of all transformations. The results concerning the power efficiency of data-reuse transformations are in agreement with those in previous studies. However, the exploration time of the design space is significantly reduced. The proposed methodology is also applied to the case of multiple parallel processing cores, proving that the relative effect of each transformation is independent of the number of processors and the applied memory architecture.
Computer-aided design (CAD) tools are frequently employed to verify the design objectives before the fabrication of an integrated circuit. An important circuit parameter that requires accurate characterization is the power consumption due to the strict constraints on the acceptable power envelope of integrated systems. Circuit simulators typically provide built-in functions to measure the power consumption. However, the accuracy of the measured power is mostly overlooked since the approximations and the methodologies used by the existing built-in power estimation tools are not well documented. The research community tends to assume that the built-in functions provide accurate power figures. This blind-trust in the CAD tools, however, may lead to gross errors in power estimation. A generic methodology to accurately measure the power and energy consumption with the circuit simulators is described in this paper. An equation to calculate the device power consumption based on the different current conduction paths in a MOSFET is presented. An expression for the total power consumption of a complex circuit is derived by explicitly considering the different circuit terminals including the inputs, the outputs, and the body-contacts. Results indicate that the power measurements with the built-in functions of widely used commercial circuit simulators can introduce significant errors in a 65 nm CMOS technology. For deeply scaled nano-CMOS circuits, a conscious power and energy measurement with the proposed explicit methodology is recommended for an accurate pre-fabrication circuit characterization.
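The paper's own equations are not given in the abstract above, so the following is only a sketch of the general idea it describes: accounting for every circuit terminal (supply, inputs, outputs, body contacts) instead of the supply alone. It sums the instantaneous product of voltage and current over all terminals and integrates with the trapezoidal rule. The function names and the sign convention (current flowing into the circuit is positive) are assumptions for this example.

```python
def total_energy(times, terminals):
    """Energy delivered to a circuit over the sampled interval.

    times:     increasing sample times in seconds
    terminals: list of (voltages, currents) pairs, one per terminal,
               each sampled at `times`; currents flowing into the
               circuit are positive (assumed convention).
    """
    # Instantaneous power is the sum of v * i over every terminal.
    power = [sum(v[n] * i[n] for v, i in terminals)
             for n in range(len(times))]
    energy = 0.0
    for n in range(1, len(times)):
        # Trapezoidal rule between consecutive samples.
        energy += 0.5 * (power[n] + power[n - 1]) * (times[n] - times[n - 1])
    return energy


def average_power(times, terminals):
    """Average power over the whole sampled interval."""
    return total_energy(times, terminals) / (times[-1] - times[0])
```

Leaving a terminal out of `terminals` (for example an input that sources current) changes the result, which is the kind of discrepancy the abstract warns about.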
In this paper, we present a bisection-based power reduction design for CMOS flash analog-to-digital converters (ADCs). A comparator-based inverter is employed along with two switches, an NMOS and a PMOS, so that the bisection method lets only half of the comparators in a flash ADC operate in each clock cycle, reducing power consumption. A practical example of a 6-bit flash ADC operating at a 200 MHz sampling rate and a 3.3 V supply voltage is demonstrated. The power consumption of the proposed circuit is only 40.75 mW in HSPICE simulation. Compared with the traditional flash ADC, our bisection method reduces power dissipation by up to 43.18%.
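A behavioral sketch of the bisection idea described above, under the assumption that a single comparison against the mid-scale reference selects which half of the comparator bank is enabled. This is not the authors' circuit; the function name, the uniform threshold ladder, and the count of active comparators are illustrative assumptions.

```python
def flash_adc_bisection(vin, vref=3.3, bits=6):
    """Return (output code, number of comparators active this cycle)."""
    levels = 1 << bits
    # Uniform reference ladder: 63 thresholds for a 6-bit converter.
    thresholds = [k * vref / levels for k in range(1, levels)]
    mid = levels // 2  # 1-based index of the mid-scale threshold
    if vin >= thresholds[mid - 1]:
        active = thresholds[mid:]       # enable only the upper half
        base = mid
    else:
        active = thresholds[:mid - 1]   # enable only the lower half
        base = 0
    code = base + sum(vin >= t for t in active)
    # The mid-scale comparator itself is always active.
    return code, len(active) + 1
```

In this model a 6-bit conversion uses 32 comparators per cycle instead of 63, while producing the same code as a full flash comparison.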
Contemporary memory system design aims to achieve high performance and low energy consumption at a reasonable cost. To balance these requirements, we propose a modular reconfigurable architecture to design memories over FPGAs. The proposed memory system can be reconfigured taking into account: the number of words, the word size of the data, the number of physical memory banks and the number of ports of the banks. Different operating modes have been defined, each one implying a certain configuration for the memory system. Simulations of these modes show the performance of our reconfigurable memory in terms of timing and power consumption.
WLAN standards use different transmission modes to cope with different channel conditions; these modes use different modulation constellations and code rates. Data encoding is done with a 64-state convolutional code of rate 1/2; some modes employ this basic rate and others puncture the encoded data to obtain a rate of 3/4. At the receiver, the decision depth needed by the Viterbi decoder is higher for punctured modes than for non-punctured modes, which means that punctured modes need a greater area and therefore consume more power. This letter exploits that fact to reduce the power consumption of the Viterbi decoder when dealing with half-rate code modes: an architecture that disables the unnecessary hardware in the non-punctured modes allows a reduction of 20% in the dynamic power consumption with an area increase of only 1%.
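To make the puncturing step concrete, the sketch below drops two of every six rate-1/2 coded bits to obtain rate 3/4 and re-inserts erasures at the receiver so a Viterbi decoder can ignore the missing positions. The puncturing pattern used here is an illustrative assumption and should be checked against the relevant standard; the function names are likewise hypothetical.

```python
# Rows correspond to the two encoder outputs (A, B); 1 = transmit, 0 = drop.
# Assumed pattern for illustration: 3 input bits -> 6 coded bits -> 4 sent.
PATTERN_3_4 = ((1, 1, 0), (1, 0, 1))


def puncture(a_bits, b_bits, pattern=PATTERN_3_4):
    """Interleave the two encoder outputs, skipping punctured positions."""
    period = len(pattern[0])
    out = []
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if pattern[0][i % period]:
            out.append(a)
        if pattern[1][i % period]:
            out.append(b)
    return out


def depuncture(received, n_pairs, pattern=PATTERN_3_4):
    """Rebuild (A, B) pairs, marking punctured positions with None (erasure)."""
    period = len(pattern[0])
    stream = iter(received)
    pairs = []
    for i in range(n_pairs):
        a = next(stream) if pattern[0][i % period] else None
        b = next(stream) if pattern[1][i % period] else None
        pairs.append((a, b))
    return pairs
```

Because the erased positions carry no information, the decoder needs a longer survivor path before its decisions become reliable, which is consistent with the higher decision depth mentioned above.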
The increasing power consumption levels of integrated circuits (ICs) have become a major concern of the semiconductor industry. Excessive power dissipation causes overheating, which can lead to soft errors or permanent damage. It also limits battery life in portable equipment. High power consumption can be reduced by properly increasing area. However, an arbitrarily large area, namely a high number of functional units (FUs) in the high-level view, dramatically increases IC cost. This paper describes a new dynamic-power-aware High Level Synthesis data path approach that considers dynamic FU allocation while attempting to minimize area, minimize power, or make a trade-off between them. The experimental results show that when the area is nearly the same, our approach delivers a 5.99% reduction in power consumption, and when the power consumption is nearly the same, it delivers an 11.81% reduction in total FU area. Different optimal power–area trade-off values can be obtained by adjusting the power and area ratios.
This paper presents a new power-efficient single-ended sense amplifier (SA). The proposed circuit is based on the direct current voltage conversion technique. It has been simulated using the Microwind3 and DSCH3 tools (advanced BSIM 4 level) for 90 nm CMOS technology, and its power consumption and sense time were compared with those of other circuits. The proposed SA circuit consumes more than 50% less power and gives 90% faster sensing speed compared to other circuits. The lower power consumption is due to lower leakage current and a lower voltage drop on the bit-line, and the faster speed is due to the positive feedback of the circuit. The proposed circuit is more robust against process and temperature variation.
In this paper, an energy efficiency (EE) analysis for elastic optical networks (EONs) is proposed that considers the adaptive allocation of the transmitted power, spectrum bandwidth, modulation format and forward error correction (FEC) type. The trade-off between the FEC coding types and the optical layer EE (OL-EE) is investigated considering the capacity of information transmission and power consumption. The power consumption model considers the elements involved in lightpath establishment, namely transmitter, receiver, bandwidth-variable optical crossconnect (BV-OXC), optical amplifiers and network control. Numerical examples demonstrate that the EE of the lightpaths increases when adaptive FEC coding is deployed; furthermore, the EE varies with the distance between the optical network nodes and with the hop count. The EE decreases as the FEC energy consumption per bit increases; however, an operating region is verified in which the overall network EE with FEC is superior to that of the system without FEC coding. Beyond this point, further increases in the FEC energy consumption per bit negatively affect the EE metric. For instance, with the proposed OL-EE model it is possible to determine the maximum energy consumption allowed for the FEC codes without OL-EE system degradation.
Dynamic frequency scaling (DFS) is a feature commonly found in modern processors. It lowers the clock frequency of a core according to the load level and reduces the power consumption. In this paper, we present a case study of tuning DFS parameters on a platform with an AMD Phenom II X6 using SPECjEnterprise2010 (jEnt10) and SPECjbb2005 (jbb05) as the workloads. In jEnt10, a longer sampling period of core utilization (up to 1.5 s) reduced the power by 6 W at the 25% load level. At the 50% load level, combining it with an increased threshold level (98%) to switch the clock frequency further reduced the power consumption by up to 10 W. In jbb05, stretching the sampling period was only effective up to 0.5 s. The maximum reduction was observed at around the 60% load level. Raising the threshold level was not effective for jbb05.
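The two parameters tuned in the study above, the utilization sampling period and the switching threshold, can be illustrated with a toy governor. This is a simplified sketch under assumed rules (jump to the highest frequency at or above the threshold, otherwise pick the lowest frequency that keeps the projected utilization under it); it is not the policy of any specific operating system or of the paper, and the frequency table in the usage note is hypothetical.

```python
def sample_utilization(busy_trace, period):
    """Average a per-tick busy flag (0 or 1) over windows of `period` ticks."""
    samples = []
    for start in range(0, len(busy_trace), period):
        window = busy_trace[start:start + period]
        samples.append(sum(window) / len(window))
    return samples


def choose_frequency(utilization, current_mhz, freqs_mhz, up_threshold=0.98):
    """Pick the next clock frequency from one utilization sample.

    Assumes `current_mhz` is one of the entries in `freqs_mhz`.
    """
    if utilization >= up_threshold:
        return max(freqs_mhz)  # saturated: switch to the highest clock
    # Frequency at which the same work would sit just under the threshold.
    needed = current_mhz * utilization / up_threshold
    return min(f for f in freqs_mhz if f >= needed)
```

With a hypothetical table `[800, 1600, 2400, 3200]`, a bursty trace such as `[1, 0, 1, 0]` sampled every tick alternates between 100% and 0% utilization, while a period of 4 ticks reports a steady 50%, so a longer sampling period triggers fewer jumps to the highest frequency.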
The conventional readout of one-transistor–one-capacitor dynamic random-access memories (1T–1C DRAMs) depends on using a sense amplifier to develop the bitline voltage and settle it to the voltage of the power supply, VDD, or to 0V depending on whether the stored data is “1” or “0,” respectively. However, using the sense amplifier makes the reading process sluggish. In this paper, a capacitive-voltage divider-based readout scheme is proposed. According to this scheme, the developed bitline voltage is converted into a pulse with a certain starting time. Specifically, this pulse appears at a later time in case of “0” storage than that if a “1” is stored, thus the proposed scheme is aptly called “time-domain readout.” The effects of parameter and component mismatches and technology scaling on the proposed scheme are investigated. The proposed scheme is analyzed quantitatively with a suggestion given to widen the time gap between the starting times of the pulses corresponding to the “0” and “1” states. The proposed scheme is verified by simulation adopting the 45 nm CMOS technology with VDD=1V. According to the simulation results, percentage savings of 68.8%, 56.8%, and 32% in the read-access time, the read-cycle time, and the average power-delay product, respectively, are shown. The proposed scheme requires approximately 40% extra area overhead for the reading circuitry. Also, a noise analysis is performed and it is found that the device noise does not affect the proposed scheme significantly.
With more cores integrated into a single chip and the fast growth of main memory capacity, DRAM memory design faces ever increasing challenges. Previous studies have shown that DRAM can consume up to 40% of the system power, which makes DRAM a major factor constraining the whole system’s growth in performance. Moreover, memory accesses from different applications are usually interleaved and interfere with each other, which further exacerbates the situation in memory system management. Therefore, reducing memory power consumption has become an urgent problem to be solved in both academia and industry. In this paper, we first propose a novel strategy called Dynamic Bank Partitioning (DBP), which allocates banks to different applications based on their memory access characteristics. DBP not only effectively eliminates the interference among applications, but also takes full advantage of bank-level parallelism. Secondly, to further reduce power consumption, we propose an adaptive method to dynamically select an optimal page policy for each bank according to the characteristics of the memory accesses that each bank receives. Our experimental results show that our strategy not only improves the system performance but also reduces the memory power consumption at the same time. Our proposed scheme can reduce memory power consumption by up to 21.2% (10% on average across all workloads) and improve the performance to some extent. In the case that workloads are built with mixed applications, our scheme reduces the power consumption by 14% on average and improves the performance by up to 12.5% (3% on average).
Current networks-on-chip (NoCs) may include many Intellectual Properties (IPs). As those IPs do not necessarily operate at the same clock frequency, a significant number of Phase Locked Loops (PLLs) are required. Since a PLL is very power consuming (e.g., a PLL delivering a 6 GHz frequency consumes 11 mW), one needs to reduce the number of PLLs. To the best of our knowledge, only one work in the literature has tackled this problem. Since the problem of interest is not polynomial in time, we developed heuristic-based methods and found that our work outperforms the one described in the literature both in the number of PLLs (30%) and in their power consumption (25%), with less CPU time.
In this paper, a fine-grained scheduling approach to enhance lifetime reliability of multiprocessor systems is presented. Lifetime reliability is an important and emerging concern arising with advances in technology due to the increase in power density. As a result, temperature variation accelerates wear-out, leading to system failures. The antagonistic relation of lifetime reliability with other design parameters of multiprocessor systems, such as power consumption and temperature, makes its improvement more challenging. Lifetime reliability enhancement approaches are considered at different levels of abstraction and for various system components. Our proposed scheduling method extracts precise low-level lifetime reliability information from determined blocks of the processing cores and uses it at the system level to study the system state criticality at a low performance cost. Based on online periodic monitoring, our proposed scheduling approach applies control actions to improve the lifetime reliability of the system according to its effective parameters. To demonstrate the effectiveness of our proposed scheduling approach in improving lifetime reliability and to compare it with previous related research, several experiments are considered. To simulate the target multiprocessor system and the proposed approach, the Enhanced Super ESCalar (ESESC) computer architecture simulator is used. The experimental results show that employing our proposed scheduling method improves lifetime reliability by about 54%. Moreover, it yields 14% and 12% improvements in temperature and power consumption, respectively. Furthermore, we perform a Monte Carlo-based simulation to validate the proposed scheduling approach and generalize it to other applications at very low performance overhead. Experimental results show that Monte Carlo simulation greatly reduces the execution time compared with ESESC, which makes our scheduling approach practical for large applications.
Continued technology scaling has led to growing concerns about various design metrics. Leakage power tends to grow with the array size, as most of the Static Random Access Memory (SRAM) cells operate in standby mode. Writing data into the SRAM becomes difficult as the supply voltage decreases, so stability in write mode requires enhancement. As SRAM is used for on-chip computations, a faster write operation is required. The half-select issue in SRAM design needs to be eliminated so that a bit-interleaving architecture can be employed for the SRAM array, enabling protection from soft errors. A new 10-transistor bit-interleaved SRAM cell has been designed to address the above concerns. The use of high-threshold-voltage devices in the read path and the absence of an NMOS device in one of the inverters reduce leakage power. A cut-off switch enables a faster write operation and enhanced write stability. Cross-point selection in write mode eliminates the half-select issue, as observed in 1000 Monte Carlo simulations. The cell has lower leakage power while holding 0 compared to the 8 Transistor, Fully Differential 8 Transistor and Write Assist Low Power 11 Transistor SRAM cells at the worst fast-fast process corner for a 0.9 V supply voltage. Its Write 1 Power Delay Product is lower than that of the 8 Transistor, Fully Differential 8 Transistor and Write Assist Low Power 11 Transistor SRAM cells at the slow-slow corner at a 0.9 V supply voltage. All the design metrics have been evaluated by performing post-layout simulation in Cadence Virtuoso in 45-nm technology.