For exploiting the inherent parallelism enclosed in System-on-Chip designs, special architectural prerequisites have to be met. These prerequisites mainly affect the communication infrastructure, as parallel processing of all hardware modules accounts for a continuous and sufficient provision of data. While traditional communication architectures may fulfill these requirements for a fixed System-on-Chip design, changing composition, number and locations of processing modules in runtime reconfigurable System-on-Chips require new communication paradigms. Special communication architectures especially for use in runtime reconfigurable System-on-Chip designs are presented in this article. Their analysis provides a basis for the design of CoNoChi, a runtime reconfigurable Network-on-Chip dedicated for the usage in FPGA-based designs. CoNoChi supports the adaptation of the network topology during runtime by providing mechanisms to add or remove switches from the network during runtime without stopping or stalling the network. The applicability of CoNoChi is shown on the basis of a complex runtime reconfigurable System-on-Chip for networking applications. Prototyping results demonstrate that CoNoChi is a promising alternative to existing communication architectures supporting both a high degree of adaptability during runtime and a high concurrency of data transfers.
Currently, most Network-on-Chip (NoC) research is based on 2D algorithms or simple 3D structures. However, congestion and faulty links in the topology can increase latency and power consumption. In this paper, we build a novel 3D topology based on a hierarchical structure and TSV links, which reduces latency and power consumption by decreasing the number of hops a packet takes. We test our method with a C++ tool, and the results show that throughput improves by about 21%–36% and latency by 3%–11%.
Network-on-Chip (NoC) is a strong candidate for scalable interconnect design of Multi-Processor System-on-Chip (MPSoC). Software tasks of MPSoC require a certain protocol to communicate with each other. In NoC such a communication protocol should be handled at Network Interface and/or Processor Element level and it is expected that different protocols show their trade-offs. In consideration of the above, we employed two types of basic protocol and investigated their performance impact. The contribution of this work is to quantitatively evaluate effectiveness of using separate communication protocols depending on the task structure.
Advances in semiconductor technologies enable seamless integration of hundreds of cores on a single silicon die, which requires high communication performance. To deal with the increasing communication complexity of System-on-Chip (SoC), Network-on-Chip (NoC) has recently been proposed as an alternative to conventional point-to-point links and bus-based communication fabrics. In practice, to facilitate NoC design evaluation and optimization, Poisson or Bernoulli traffic models are generally assumed. However, actual measurements have shown that real high-speed network traffic always has strong correlations. The objective of this paper is to investigate the impact of traffic correlations on the performance of NoC design. Experimental results show that traffic correlation degrades the performance of NoC design and that unrealistic traffic assumptions may yield unacceptable designs.
Multithreaded programming has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors. The performance bottleneck of a multithreaded program is its critical path, whose length is its total execution time. As the number of cores within a processor increases, Network-on-Chip (NoC) has been proposed as a promising approach for inter-core communication. In order to optimize the performance of a multithreaded program running on an NoC based multi-core platform, we design and implement the critical-path driven router, which prioritizes inter-thread communication on the critical path when routing packets. The experimental results show that the critical-path driven router improves the execution time of the test case by 14.8% compared to the ordinary router.
Three-dimensional Network-on-Chip (3D NoC) architectures have gained a lot of popularity as a way to reduce the on-chip communication delays of next-generation System-on-Chip (SoC) systems. However, the vertical interconnects of 3D NoC are expensive and complex to manufacture. Also, a 3D router architecture consumes more power and occupies more area in the chip floorplan than a 2D router. Hence, more efficient architectures should be designed. In this paper, we propose area-efficient and low-power 3D heterogeneous NoC architectures, which combine the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D NoC architectures. Experimental results show a negligible penalty (less than 5%) in average packet latency of the proposed heterogeneous 3D NoC architectures compared to typical homogeneous 3D NoCs, while the heterogeneity provides power and area efficiency of up to 61% and 19.7%, respectively.
Embedded distributed multimedia applications that use on-chip networks for communication and message exchange require specific, enhanced quality of service (QoS) management. To reach the desired performance at the application level, the network-on-chip (NoC) router should implement a per-flit handling strategy with wide granularity. This requires an enhanced internal architecture that, on the one hand, manages flits according to a service classification and, on the other hand, enhances the routing process.
In this context, this paper proposes a new mechanism for QoS management in NoC. The mechanism is based on a central memory in which flits are enqueued according to their class of service. This scheme enables an optimal flit scheduling phase and makes it easier to drop low-importance flits when the router shows symptoms of congestion. The paper also presents a protocol structure that fits this architecture and introduces a signaling mechanism that makes QoS management through the proposed architecture efficient. The paper studies the circuit's performance and its ability to achieve QoS with low-power processing and high bandwidth in on-chip multiprocessor systems.
Mapping constitutes a very important step in network-on-chip (NoC)-based implementation of an application. An application is often represented in the form of an application core graph. The cores of the core graph communicate among themselves using the underlying network. This paper presents a constructive heuristic to statically map applications onto a two-dimensional mesh-connected NoC. The approach corresponds to a design-time decision on the attachment of cores to routers. The mapping results, in terms of an overall communication cost metric, have been compared with many well-known techniques reported in the literature and also with an exact method built around integer linear programming (ILP). A thorough complexity analysis of the algorithm has been performed. For smaller benchmarks, the results obtained are the same as those of the ILP-generated solutions. For benchmarks containing 64 or more cores, the mapping solutions are better than the existing ones. The dynamic performance of the mapped solutions has been compared using synthetically generated self-similar traffic. In many cases, our approach requires less latency and energy per packet than the existing methods while providing higher throughput.
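The abstract above does not define its communication cost metric. As a minimal sketch, the example below assumes the formulation commonly used for mesh mapping: the sum over all communicating core pairs of bandwidth multiplied by the Manhattan hop distance between the routers they are attached to. The function names, the core graph, and the two mappings are hypothetical and only illustrate how two candidate mappings could be compared.

```python
# Illustrative sketch only: assumes cost = sum(bandwidth * Manhattan hops),
# which the abstract does not state explicitly.

def manhattan(a, b):
    """Hop distance between two (row, col) mesh positions under minimal routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def communication_cost(core_graph, mapping):
    """core_graph: {(src_core, dst_core): bandwidth}
    mapping:    {core: (row, col)} position of the router each core attaches to."""
    return sum(bw * manhattan(mapping[src], mapping[dst])
               for (src, dst), bw in core_graph.items())

# Hypothetical 4-core application on a 2x2 mesh.
core_graph = {("A", "B"): 100, ("B", "C"): 40, ("A", "D"): 10}
near = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}  # heavy pair A-B adjacent
far = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}   # heavy pair A-B two hops apart

print(communication_cost(core_graph, near))  # 150
print(communication_cost(core_graph, far))   # 250
```

Under this assumed metric, a mapping heuristic would prefer `near`, since it places the highest-bandwidth pair on adjacent routers.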
Recently, network-on-chip (NoC) researchers have focused mainly on area reduction and low power consumption through both architectural and algorithmic approaches. To achieve low power and high performance in NoC architectures, sense amplifiers (SAs) have been introduced, which can consume less power under various traffic conditions. To analyze the performance of an architectural NoC design before fabrication, this work develops a new simulator based on a multi-core processor with improved sense amplifier (MCPSA). The MCPSA simulator gives the user the flexibility to incorporate various traffic configurations and routing algorithms through a user-reconfigurable option. In addition, different SA models can be put into the simulation in a plug-and-play manner for evaluation. NoC case studies are presented to demonstrate the NoC architecture with double tail sense amplifier (DTSA) and modified-DTSA (M-DTSA) designs. Performance metrics such as delay, data rate and power consumption are evaluated. The main idea of this new simulator is to interface the multisim environment (MSE) with a NoC environment for validating any DTSA.
We present the design and evaluation of a high-performance network-on-chip (NoC) focused on telecommunication and multimedia applications that tolerate latency and bandwidth variations. The design is based on a connectionless strategy in which flits from different communication flows are interleaved in the same communication channel. Each flit carries routing information that is used by routers to perform arbitration and scheduling of the corresponding output ports in order to balance channel utilization. In order to compare our approach with others, we introduce an analytic model for the worst-case latency (WCL) of our NoC and recall those of related approaches. Analytic comparisons and experimental data show that our approach keeps average WCL lower for variable-bit-rate multimedia applications than a network based on resource reservation. For these applications, the overall throughput is larger than that of networks that perform resource reservation. A case study based on the proposed NoC shows that the average latency was 28% lower than the WCL expected for the experiment. Indeed, hard real-time flows designed considering the absolute WCL of the network will always meet the requirements of the associated hard real-time tasks, so no deadline can be lost due to network contention.
Three-dimensional network-on-chip (3D-NoC) emerges as a potential multi-core architecture delivering high performance, high energy efficiency and great scalability. However, 3D-NoC suffers from severe thermal problems due to its high power density. To solve this problem, thermal-aware scheduling is an effective solution. However, the high complexity of the thermal model of 3D-NoC becomes a major hurdle for developing efficient thermal-aware scheduling algorithms for 3D-NoC. In this paper, we propose a novel thermal-aware task scheduling scheme named as the Bottom-to-Top (B2T) approach to address this challenge. This heuristic-based method performs task allocation on processing units to efficiently minimize the peak temperature and improve the execution time of the tasks with low complexity. The algorithm is first designed for two-layer 3D-NoC and then extended to 3D-NoC with an arbitrary number of layers. When compared to traditional thermal-aware scheduling algorithms designed for 2D-NoC, our B2T algorithm can achieve significant peak temperature reduction (up to 11.9 °C) and performance improvement (up to 4%) on two-layer 3D-NoC. The improvement becomes more significant as the number of layers in 3D-NoC increases. For four-layer 3D-NoC, the improvement is up to 13.23 °C peak temperature reduction.
When designing a system-on-chip (SOC), the network-on-chip (NOC) paradigm is the backbone of the interconnection used. In recent years, however, with great improvements in silicon technology and the ability to implement billions of transistors on a wafer, wire-based communication appears to be no longer efficient, so designers intend to replace it with wireless data transfer. In the following, we show how the performance of the wireless network is improved with subnet extension. This paper focuses on the evaluation of delay and throughput, two important factors in network performance. To enhance the performance of the system, we need to reduce the number of delay cycles and improve the throughput. To balance these two parameters, the designer has to keep the packet injection rate (PIR) within a safe margin so that it does not exceed a certain point in each state; otherwise delay becomes uncontrollable. It is therefore necessary to clearly identify the take-off points. The designer also has to know the purpose for which the system is being designed, since the choice strictly depends on whether high throughput or a low number of delay cycles is desirable. Subnet extension is a way to achieve this target. Here, three networks with 64, 512 and 1024 cores, respectively, have been selected, and the effect of subnet extension is evaluated on each. The behavior of each network with different numbers of subnets and IPs is studied. The results obtained from the simulator for different ranges of PIR and subnets are significant. We highlight the take-off points for the delay cycles and the points that do not cross the PIR level. Performance evaluation is conducted with the flit-accurate, open-source SystemC simulator BookSim.
Low power consumption in three-dimensional integrated circuit (3D IC) design is becoming an important concern that cannot be neglected. In a 3D IC, multiple layers/dies are stacked and communicate with each other by means of through-silicon vias (TSVs), working as a single device to achieve high performance with minimum power dissipation. This paper demonstrates a high-level power modeling approach for estimating the power of a homogeneous integration of a Network-on-Chip (NoC)-based mesh architecture in 3D IC design. The preliminary experimental work on the power model is divided into two major parts. The first part estimates the power of the NoC architecture on each stack separately, and the second estimates the power dissipation of the uniformly distributed TSVs and input/output (I/O) pads. The model uses a linear function to estimate the average power dissipation. For an entire IC design, the average power is obtained by simply adding all power estimation results of the model. The design is operated at multiple frequencies to find the frequency that minimizes power dissipation. In experiments, the average maximum error is estimated at 18.03%.
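The additive structure described in the abstract above can be sketched in a few lines. The abstract does not say which variable the linear function depends on or give any coefficients, so the example below assumes a per-layer model that is linear in clock frequency and uses made-up coefficients, TSV counts, and pad counts. All names and numbers are hypothetical.

```python
# Illustrative sketch only: the linear-in-frequency form and every coefficient
# below are assumptions, not values from the paper.

def layer_power(freq_mhz, slope, intercept):
    """Assumed linear model for the average NoC power of one stacked layer (mW)."""
    return slope * freq_mhz + intercept

def total_power(freq_mhz, layers, tsv_count, tsv_unit_mw, pad_count, pad_unit_mw):
    """Total estimate = sum of per-layer NoC power + TSV power + I/O pad power."""
    noc = sum(layer_power(freq_mhz, slope, intercept) for slope, intercept in layers)
    tsv = tsv_count * tsv_unit_mw
    pads = pad_count * pad_unit_mw
    return noc + tsv + pads

# Two identical layers (homogeneous integration), hypothetical coefficients.
layers = [(0.5, 10.0), (0.5, 10.0)]
estimate = total_power(100.0, layers, tsv_count=16, tsv_unit_mw=0.25,
                       pad_count=8, pad_unit_mw=0.5)
print(estimate)  # 128.0
```

Evaluating `total_power` over a list of candidate frequencies would mirror the frequency sweep the abstract mentions, under the same assumptions.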
Network-on-chip (NoC) mapping algorithms significantly affect NoC system performance in terms of communication cost and energy consumption. For a specific application represented by a task graph, this paper proposes an energy-efficient mapping algorithm that searches for the mapping decision with the best communication locality and therefore the lowest energy consumption. To this end, we formulate the mapping problem as an optimization model and propose an effective meta-heuristic algorithm to solve it. During the mapping procedure, we employ a simulation-free, communication probability-based energy model to evaluate the quality of each candidate mapping. By iteratively updating the best explored mapping decision using a meta-heuristic search strategy, the mapping procedure can eventually identify a mapping decision with optimal energy efficiency in the search space. The proposed mapping algorithm has been verified on NoC systems of different sizes using a variety of benchmark applications. Simulation results demonstrate that the mapping decision produced by this algorithm achieves an energy reduction of up to 23% compared with the traditional round-robin strategy.
The aging effect induced by negative bias temperature instability (NBTI) is a universal issue in electronic equipment. NBTI aging can increase the path delay of a network-on-chip (NoC) device, decreasing the frequency of the processor cores and in turn degrading performance. Under this circumstance, aging-aware task scheduling becomes a complex and challenging problem in advanced multicore systems. This paper presents an aging-aware scheduling method that incorporates the NBTI aging effect into the task scheduling framework for mesh-based NoCs. The proposed method relies on an NBTI aging model to evaluate the degradation of each core's operating frequency and thereby establish the task scheduling model under aging. Taking into account core performance degradation and the communication overheads among cores, we develop a meta-heuristic scheduling strategy based on the particle swarm optimization algorithm to minimize the total execution time of all tasks. Experimental results show that the schedule obtained by the aging-aware algorithm has a shorter completion time and higher throughput than the non-aging-aware case. On average, the makespan is reduced by 13.55% and the throughput is increased by 21.73% for a variety of benchmark applications.
Placement of on-chip memory controllers (MCs) is a key issue in designing a Network-on-Chip (NoC) with multiple MCs. A good MC placement can efficiently decrease NoC memory-access latency. However, it is difficult to find an optimal MC placement within a feasible time because the design space of MC placements is huge and evaluating whether one placement is better than another takes a long time. This paper focuses on how to evaluate an MC placement quickly. Network simulation, the common way of evaluating an MC placement, often costs too much time. In this paper, we propose a method for fast evaluation of an MC placement. The paper makes two contributions: first, it uses an indicator, “path-load”, to approximately represent memory-access-flow transmission latency; second, it proposes a highly efficient method of calculating path-load to evaluate an MC placement in place of traditional simulation. To verify this method, we embed it into a traversal method to find optimal MC placements. Experiments show that almost all MC placements with the minimum path-load value are those with the best network performance in the same network scenarios. In short, the proposed method evaluates MC placements more efficiently than simulation, and it can be embedded into search algorithms to accelerate finding optimal solutions.
Large-scale data and applications place increasing demands on high-performance computation, which are met by Chip Multiprocessors (CMPs) and System-on-Chips (SoCs). Network-on-Chips (NoCs) have emerged as the reliable on-chip communication framework for CMPs and SoCs. NoC architectures are evaluated based on design parameters such as latency, area, and power. Cycle-accurate simulators are used to perform the design space exploration of NoC architectures, but they become too slow for interactive use as the NoC topology size increases. To overcome this limitation, we employ a Machine Learning (ML) approach to predict the NoC simulation results within a short span of time. LBF-NoC, a learning-based framework, is proposed to predict performance, power and area for direct and indirect NoC architectures. This provides chip designers with an efficient way to analyze various NoC features. LBF-NoC is modeled using distinct ML regression algorithms to predict the overall performance of NoCs under different synthetic traffic patterns. The performance metrics of five different NoC architectures (Mesh, Torus, Cmesh, Fat-Tree and Flattened Butterfly) can be analyzed using the proposed LBF-NoC framework. The BookSim simulator is employed to validate the results. Architecture sizes from 2×2 to 45×45 are used in the experiments, with various virtual channels, traffic patterns, and injection rates. The prediction error of LBF-NoC is 6% to 8%, and the overall speedup is 5000× to 5500× with respect to the BookSim simulator.
On-chip interconnection networks are primarily designed for efficient, high-performance Tiled Chip Multi-Processor (TCMP) architectures. Bufferless Network-on-Chip (NoC) is a better design option owing to its simpler router structure and its area and power efficiency. Deflection routers have network performance similar to that of buffered designs at low to medium network traffic, as deflections are minimal. But when network load increases, deflections also rise rapidly, leading to poor network performance because of increased latency, power dissipation and unbalanced traffic. In this work, we propose a subnetwork-based adaptive Concentrated Mesh (CMesh) bufferless router in which deflections are considerably reduced by redirecting a competing flit in one subnetwork to a vacant port of the other subnetwork without any additional cycle latency. Simulations conducted over two-dimensional and multidimensional CMesh networks show that our topologically independent, adaptive deflection routing mechanism provides better network load balance and improves performance by minimizing unbounded deflections when compared to the designs under consideration.
Network-on-chip multicore architectures with a large number of processing elements are becoming a reality with the recent developments in technology. In these modern systems the processing elements are interconnected with regular network-on-chip (NoC) topologies such as meshes and trees. In this paper we propose a parallel Gauss-Seidel (GS) iterative algorithm for solving large systems of linear equations on a torus NoC architecture. The proposed parallel algorithm has $O(Nn^2/k^2)$ time complexity for solving a system with a matrix of order n on a k × k torus NoC architecture with N iterations, assuming n and N are large compared to k (i.e. for large linear systems that require a large number of iterations). We show that under these conditions the proposed parallel GS algorithm has near optimal speedup.
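For reference, the sequential Gauss-Seidel iteration that the abstract above parallelizes can be sketched as follows. This is the textbook single-processor method for a dense system, not the paper's torus algorithm. Each sweep costs on the order of n² operations, so N sweeps cost on the order of Nn², and a parallel running time proportional to Nn²/k² on k × k processing elements would correspond to a speedup close to k². The function name and the 2 × 2 test system are illustrative choices.

```python
# Textbook sequential Gauss-Seidel for a dense system A x = b.
# Not the paper's parallel torus algorithm; shown only as the baseline.

def gauss_seidel(A, b, iterations):
    """Run a fixed number of Gauss-Seidel sweeps starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Uses already-updated x[j] for j < i and old values for j > i.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Small diagonally dominant system, for which Gauss-Seidel converges.
# Exact solution: x = [0.1, 0.6].
A = [[4.0, 1.0],
     [2.0, 3.0]]
b = [1.0, 2.0]
x = gauss_seidel(A, b, 50)
print(x)
```

The dependence of each `x[i]` update on values updated earlier in the same sweep is what makes the method harder to parallelize than a Jacobi iteration.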
In designing the router micro-architecture of an on-chip network, good allocation of virtual channels (VCs) governs effective resource utilization, which essentially results in an optimized number of packets received at the destination(s). Generally, VC allocation schemes take a one-way approach to allocating VCs to the contending flits. However, this approach produces non-optimal matching of flits to the available VCs on the next routers, and therefore leads to under-utilization of these VCs. This paper proposes a 2-Way VC Allocation scheme to map input VCs (requestors) to output VCs (resources). The proposed scheme is compared with the conventional VC allocation scheme under two different mesh configurations with a 100% channel load. Simulations performed under two different routing schemes in diverse traffic scenarios demonstrate an increase in the number of packets received at destinations of up to 76%. Also, the network’s latency exhibits a trade-off with total power consumption while reducing hotspots.