In symbolic computation, polynomial multiplication is a fundamental operation akin to matrix multiplication in numerical computation. We present efficient implementation strategies for FFT-based dense polynomial multiplication targeting multi-cores. We show that balanced input data can maximize parallel speedup and minimize cache complexity for bivariate multiplication. However, unbalanced input data, which are common in symbolic computation, are challenging. We provide efficient techniques, which we call contraction and extension, to reduce multivariate (and univariate) multiplication to balanced bivariate multiplication. Our implementation in Cilk++ demonstrates good speedup on multi-cores.
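The FFT-based multiplication these strategies build on can be illustrated with a minimal sequential, univariate sketch; the `fft` and `poly_mul` names are ours, and this is not the paper's Cilk++ implementation:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul(p, q):
    """Multiply integer polynomials given as coefficient lists
    (lowest degree first) by pointwise product in the FFT domain."""
    n = 1
    while n < len(p) + len(q) - 1:
        n *= 2
    fa = fft([complex(x) for x in p] + [0j] * (n - len(p)))
    fb = fft([complex(x) for x in q] + [0j] * (n - len(q)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # divide by n to complete the inverse transform; round to integers
    return [round((x / n).real) for x in prod[:len(p) + len(q) - 1]]
```

For example, `poly_mul([1, 2], [3, 4])` computes (1 + 2x)(3 + 4x) = 3 + 10x + 8x².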
In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and provide a comparison to the first-generation AMD and Intel quad-core processors, Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture incorporating QuickPath Interconnect for interconnecting processors within a node, and the first to incorporate an integrated memory controller. We evaluate the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing clusters. Our analysis of intra-processor and intra-node scalability of microbenchmarks, and a range of large-scale scientific applications, indicates that quad-core processors can deliver an improvement in performance of up to 4x over a single core, depending on the workload being processed. However, scalability can be lower when considering a full node. We show that Nehalem outperforms Barcelona on memory-intensive codes by a factor of two when comparing a Nehalem node with 8 cores to a Barcelona node containing 16 cores. Further optimizations are possible with Nehalem, including the use of Simultaneous Multithreading, which improves the performance of some applications by up to 50%.
Processor and system architectures that feature multiple memory controllers and/or ccNUMA characteristics are prone to show bottlenecks and erratic performance numbers on scientific codes. Although cache thrashing, aliasing conflicts, and ccNUMA locality and contention problems are well known for many types of systems, they take on peculiar forms on the new Sun UltraSPARC T2 and T2+ processors, which we use here as prototypical multi-core designs. We analyze performance patterns in low-level and application benchmarks and put some emphasis on a comparison of performance features between T2 and its successor. Furthermore we show ways to circumvent bottlenecks by careful data layout, placement and padding.
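One common padding trick of the kind alluded to here is choosing an array leading dimension that avoids large power-of-two strides, since such strides can map successive rows onto the same cache sets or memory banks. A minimal sketch, with a hypothetical conflict stride of 512 elements:

```python
def padded_leading_dim(n, pad=8):
    """Return a leading dimension >= n that is not a multiple of 512
    elements (a hypothetical conflict stride), so successive rows of a
    2D array do not map to the same cache sets or memory banks."""
    ld = n
    while ld % 512 == 0:
        ld += pad
    return ld

def alloc_2d(rows, cols):
    """Allocate a rows x cols matrix with a padded leading dimension."""
    ld = padded_leading_dim(cols)
    return [[0.0] * ld for _ in range(rows)], ld
```

For a 1024-column matrix this yields a leading dimension of 1032, trading a few wasted elements per row for conflict-free strided access.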
On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems.
In this paper we discuss the development of MCMD-CUBE, a new parallel data cube construction method for multi-core processors with multiple disks. We present experimental results for a Sandy Bridge multi-core processor with four parallel disks. Our experiments indicate that MCMD-CUBE achieves very close to linear speedup. A critical part of our MCMD-CUBE method is parallel sorting. We developed a new parallel sorting method, termed MCMD-SORT, for multi-core processors with multiple disks, which outperforms previous methods.
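As a generic illustration of the chunk-sort-and-merge pattern underlying multi-core sorting (not the MCMD-SORT algorithm itself, whose details are not given in this abstract), one might sketch:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(data, workers=4):
    """Sort by splitting the input into chunks, sorting the chunks
    concurrently, then k-way merging the sorted runs, as in an
    external multi-way merge sort."""
    if not data:
        return []
    chunk = -(-len(data) // workers)  # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, parts))  # sort runs in parallel
    return list(heapq.merge(*runs))           # merge sorted runs
```

With multiple disks, each run would be read from and written back to its own disk so that the sort and merge phases overlap I/O across devices.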
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines with varying access latencies between different memory banks. The organisation of cores into regions, where cores in the same region share the same memory, poses challenges to efficient shared-memory access, negatively affecting the scalability of parallel applications. This paper studies the effect of state-of-the-art physical shared-memory NUMA architectures on the performance scalability of parallel applications using a range of programs and various language technologies. In particular, different parallel programs are used with different communication libraries and patterns in two sets of experiments. The first experiment examines the performance of the mainstream, widely used parallel technologies MPI and OpenMP, which use message-passing and shared-memory communication patterns respectively. In addition, the performance implications of message passing versus shared-memory access on NUMA are compared using a concordance application. The second experiment assesses the performance of two parallel Haskell implementations as examples of a high-level language with automatic memory management. The results revealed that OpenMP scaled well up to six threads, the point at which all threads are allocated within a single NUMA node; as the number of threads increased beyond this, performance decreased dramatically, confirming the effect of inefficient memory access. MPI demonstrated similar behaviour, with its optimum speedup at six cores. However, unlike OpenMP, performance did not decrease sharply beyond that point, illustrating the benefits of message passing over shared-memory access. The standard shared-memory parallel Haskell implementation scaled only to between 10 and 25 of the 48 cores across three parallel programs, with high memory management overheads.
By contrast, our parallel Haskell implementation, GUMSMP, which combines distributed and shared-heap abstractions, scaled consistently, achieving a speedup of up to 24 on 48 cores and an overall performance improvement of up to 57% compared with the shared-memory implementation.
Increasing network traffic demands higher processing performance from network devices. Single-threaded routing table lookup cannot exploit the parallel processing capability of general-purpose multi-core processors, and current application-layer parallelisations of routing table lookup incur high overhead from data copying, network-stack traversal, and shared-memory access conflicts. The Network Processor Development Kit (NPDK) is a development environment for the network processing platform. In this paper, we propose a routing table lookup mechanism named Multicopy Parallel Processing (MCPP), provided by NPDK, for multi-core parallel processing. Based on the multi-core parallel processing architecture of NPDK, the overhead of packet copying is eliminated, and the multi-copy technique efficiently avoids routing table access conflicts. Moreover, to reduce the number of memory accesses, we implement a lookup algorithm named Configurable Two-Level Operation (CTLO). Lastly, we compare the forwarding performance of the MCPP mechanism and the traditional mechanism on an NPE platform based on the general-purpose multi-core processor FT1000A. The evaluation results show that packet forwarding based on the MCPP mechanism outperforms that based on the traditional mechanism.
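The CTLO algorithm itself is not detailed in this abstract, but a generic two-level longest-prefix-match table, in the spirit of designs such as DIR-24-8 (split here as 16 + 16 bits), can be sketched as follows; all names are ours:

```python
def ip(s):
    """Parse a dotted-quad IPv4 address into a 32-bit integer."""
    a, b, c, d = map(int, s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

class TwoLevelTable:
    """Two-level longest-prefix-match table sketch. Prefixes of length
    <= 16 are expanded into a directly indexed first-level array on the
    top 16 address bits (one memory access); longer prefixes live in
    small per-slot second-level lists (a second access). For simplicity,
    routes must be added in order of increasing prefix length."""

    def __init__(self):
        self.l1 = [None] * (1 << 16)
        self.l2 = {}

    def add(self, prefix, plen, hop):
        top = prefix >> 16
        if plen <= 16:
            # expand the prefix over every first-level slot it covers
            for i in range(top, top + (1 << (16 - plen))):
                self.l1[i] = hop
        else:
            self.l2.setdefault(top, []).append((plen, prefix, hop))

    def lookup(self, addr):
        top = addr >> 16
        # check longer prefixes first so the longest match wins
        for plen, prefix, hop in sorted(self.l2.get(top, []), reverse=True):
            if addr >> (32 - plen) == prefix >> (32 - plen):
                return hop
        return self.l1[top]
```

A per-core copy of such a table, as in the multi-copy approach described above, removes lock contention on lookups at the cost of duplicating the table in memory.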
Big integer multiplication plays a vital role in cryptography, bioinformatics, genetic engineering, and related fields, and accelerating it has become one of the main objectives of algorithm research in these areas. In this paper, we store big integers in static arrays and design parallel big integer multiplication algorithms, based on the Block Wiedemann algorithm, to improve computational efficiency. Using 32 threads on a Graphics Processing Unit to multiply a 1024-bit integer by a 32-bit integer, this implementation achieves a speedup of 7.44x over a big integer multiplication implementation on a single CPU.
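The core limb-by-word step of such a multiplication (a big integer stored as a fixed array of 32-bit limbs, multiplied by a 32-bit word) can be sketched sequentially; this is a generic illustration of the array layout, not the paper's GPU kernel:

```python
BASE = 1 << 32  # each array element is one 32-bit limb

def mul_by_word(limbs, w):
    """Multiply a big integer, stored little-endian as 32-bit limbs,
    by a 32-bit word w. Each limb product is split into a low word
    kept in the output array and a carry propagated to the next limb."""
    out = []
    carry = 0
    for limb in limbs:
        t = limb * w + carry
        out.append(t & (BASE - 1))
        carry = t >> 32
    while carry:
        out.append(carry & (BASE - 1))
        carry >>= 32
    return out

def to_limbs(n):
    """Split a Python int into little-endian 32-bit limbs."""
    limbs = []
    while n:
        limbs.append(n & (BASE - 1))
        n >>= 32
    return limbs or [0]

def to_int(limbs):
    """Reassemble little-endian 32-bit limbs into a Python int."""
    return sum(l << (32 * i) for i, l in enumerate(limbs))
```

On a GPU, each of the 32 limbs of a 1024-bit operand could be handled by its own thread, with the carry propagation becoming the synchronization-sensitive part of the kernel.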
With the rapid development of processors, multi-core processors have come to dominate the market due to their superior performance and power characteristics. Threads run concurrently on a multi-core processor and share the processor's second-level (L2) cache. Cache allocation and sharing are critical to the effective utilization of multi-core processors. Unfair CPU cache allocation can easily delay critical tasks and cause serious problems, such as thread starvation, priority inversion, and inadequate CPU accounting. To solve these problems, this paper proposes a compensation policy scheduling algorithm that decreases system variability and improves system performance effectively. The experimental results show that system variability under the compensation policy scheduling algorithm is lower than under conventional scheduling algorithms. Consequently, the compensation policy scheduling algorithm effectively improves the stability of the system and its applications on multi-core processors.
Computing approximations in the Dominance-based Rough Sets Approach (DRSA) is a necessary step for multi-criteria decision analysis and related work. This paper presents a parallel approach for computing DRSA approximations; its feasibility is validated by a numerical example.
To improve multi-beam bathymetry data processing speed, we propose a parallel filtering algorithm based on multi-thread technology. The algorithm consists of two parts. The first is a parallel data re-ordering step, in which the survey area is divided into a regular grid and the discrete bathymetry data are assigned to grid cells in parallel. The second is a parallel filtering step, which divides the grid into blocks and executes the filtering process in each block in parallel. In our experiment, the speedup of the proposed algorithm reaches about 3.67 on an 8-core computer. The results show that the method improves computing efficiency significantly compared to the traditional algorithm.
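The two-part scheme described, binning soundings into a grid and then filtering blocks of cells in parallel, might be sketched as follows. The median filter and thread pool are our assumptions; the abstract does not specify the actual filter used:

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median

def grid_filter(points, cell, block, workers=4):
    """Bin (x, y, depth) soundings into a regular grid of cell size
    `cell`, group cells into block x block tiles, and filter each tile
    in parallel, here reducing every cell to its median depth."""
    # part 1: re-order discrete soundings into grid cells
    grid = {}
    for x, y, d in points:
        grid.setdefault((int(x // cell), int(y // cell)), []).append(d)

    # group cells into independent blocks for parallel filtering
    blocks = {}
    for (i, j), depths in grid.items():
        blocks.setdefault((i // block, j // block), []).append(((i, j), depths))

    # part 2: filter each block concurrently
    def filter_block(cells):
        return {c: median(d) for c, d in cells}

    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(filter_block, blocks.values()):
            result.update(part)
    return result
```

Because each block touches a disjoint set of cells, the filtering step needs no locking, which is what makes the block decomposition scale.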