Advanced Search

Narrow Results

Results: 1 - 9of9

Follow results:

refine search

Filters

per page:

Sort: Relevance

Context for search term 1Search term 1*

All Dates

LastSelect static range

Custom Range

Select starting monthSelect starting year

Select ending monthSelect ending year

Advanced

Search name	Searched On	Run search
Keyword: QR Decomposition (9)	31 Mar 2025	Run
Keyword: Boundary Control (3)	31 Mar 2025	Run
Keyword: Rolling Friction (2)	31 Mar 2025	Run
Keyword: Oxidative Injury (1)	31 Mar 2025	Run
Keyword: Baroreflex Model (1)	31 Mar 2025	Run

articleNo Access
Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime
- Ichitaro Yamazaki,
- Jakub Kurzak,
- Piotr Luszczek, and
- Jack Dongarra
Parallel Processing Letters01 Dec 2014
Preview Abstract
A systolic array provides an alternative computing paradigm to the von Neumann architecture. Though its hardware implementation has failed as a paradigm to design integrated circuits in the past, we are now discovering that the systolic array as a software virtualization layer can lead to an extremely scalable execution paradigm. To demonstrate this scalability, in this paper, we design and implement a 3D virtual systolic array to compute a tile QR decomposition of a tall-and-skinny dense matrix. Our implementation is based on a state-of-the-art algorithm that factorizes a panel based on a tree-reduction. Freed from the constraint of a planar layout, we present a three-dimensional virtual systolic array architecture for this algorithm. Using a runtime developed as a part of the Parallel Ultra Light Systolic Array Runtime (PULSAR) project, we demonstrate on a Cray-XT5 machine how our virtual systolic array can be mapped to a large-scale machine and obtain excellent parallel performance. This is an important contribution since such a QR decomposition is used, for example, to compute a least squares solution of an overdetermined system, which arises in many scientific and engineering problems.
articleNo Access
An Improved Low-Rank Matrix Fitting Method Based on Weighted $L_{1, p}$ Norm Minimization for Matrix Completion
- Qing Liu,
- Qing Jiang,
- Jing Zhang,
- Bin Jiang, and
- Zhengyu Liu
International Journal of Pattern Recognition and Artificial Intelligence15 Mar 2023
Preview Abstract
Low-rank matrix completion, which aims to recover a matrix with many missing values, has attracted much attention in many fields of computer science. A low-rank matrix fitting (LMaFit) method has been proposed for fast matrix completion recently. However, this method cannot converge accurately on matrices of real-world images. For improving the accuracy of LMaFit method, an improved low-rank matrix fitting (ILMF) method based on the weighted $L_{1, p}$ norm minimization is proposed in this paper, where the $L_{1, p}$ norm is the summation of the $p$ -power $(0 < p < 2)$ of $L_{1}$ norms of rows in a matrix. In the proposed method, i.e. the ILMF method, the incomplete matrix that may be corrupted by noises is decomposed into the summation of a low-rank matrix and a noise matrix at first. Then, a weighted $L_{1, p}$ norm minimization problem is solved by using an alternating direction method for improving the accuracy of matrix completion. Experimental results on real-world images show that the ILMF method has much better performances in terms of both the convergence accuracy and convergence speed than the compared methods.
articleNo Access
TRANSMIT ANTENNA SELECTION FOR V-BLAST MIMO SYSTEM BASED ON EXTENDED CHANNEL MATRIX ANALYSIS
- JIANGUO LIU and
- LUXI YANG
Journal of Circuits, Systems and Computers01 Apr 2008
Preview Abstract
For the sake of reducing the hardware cost of multiple RF chains and the complexity for spatial multiplexing systems, in this paper a novel low-complexity transmit antenna selection criterion is proposed for V-BLAST system with MMSE OSIC detection. Based on MMSE extension of V-BLAST with sorted QR decomposition, an intuitional performance analysis of each sub-stream for MMSE V-BLAST transmission is done, and a new solution to the transmit antenna selection problem is suggested. Unlike most of the existing works, the proposed algorithm synthetically considers the impacts of detection order, interference cancellation, and noise amplification. Theoretical analysis and simulation results show that the proposed algorithm achieves well both in outage capacity and in symbol error rate performance with low computational complexity.
articleNo Access
DESIGN AND IMPLEMENTATION OF SYMBOL DETECTOR FOR MIMO SPATIAL MULTIPLEXING SYSTEMS
- YUNHO JUNG,
- SEONGJOO LEE, and
- JAESEOK KIM
Journal of Circuits, Systems and Computers01 Jun 2011
Preview Abstract
In this paper, we propose an efficient symbol detection algorithm for multiple-input multiple-output spatial multiplexing (MIMO-SM) systems and present its design and implementation results. By enhancing the error performance of the first detected symbol that causes error propagation, the proposed algorithm achieves a considerable performance gain compared with the conventional sorted QR decomposition (SQRD) based detection and the ordered successive detection (OSD) algorithms. The bit error rate (BER) performance of the proposed detection algorithm is evaluated by a simulation. In the case of a 16QAM MIMO-SM system with 4 transmit and 4 receive (4 × 4) antennas, at BER = 10^-3 the proposed algorithm results in a gain improvement of about 2.5–13.5 dB over the previous algorithms. The proposed detection algorithm was designed in a hardware description language (HDL) and synthesized to gate-level circuits using 0.18 μm 1.8 V CMOS standard cell library. The results show that the proposed algorithm can be implemented without increasing the hardware costs significantly.
articleNo Access
RECURSIVE QR DECOMPOSITION ARCHITECTURE FOR MIMO-OFDM DETECTION SYSTEMS
Journal of Circuits, Systems and Computers01 Feb 2013
Preview Abstract
This paper presents a modified implementation of QR decomposition for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) detection based on the Givens rotation method. The QR decomposition hardware is constructed using the coordinate rotation digital computer (CORDIC) algorithm operating with fewer gate counts and lower power consumption than do triangular systolic array (TSA) structures. Accurate signal transmission is essential to wireless communication systems. Thus, a more effective data detection algorithm and precise channel estimation method play vital roles in MIMO systems. Implementing data detection with QR decomposition helps reduce the complexity of MIMO-OFDM detection. Implementation results reveal that the proposed recursive QR decomposition (RQRD) architecture has lower clock latency than do TSA structures, and has a smaller hardware area than do Gram–Schmidt structures.
articleNo Access
VLIW DSP-Based Low-Level Instruction Scheme of Givens QR Decomposition for Real-Time Processing
- Mohamed Najoui,
- Mounir Bahtat,
- Anas Hatim,
- Said Belkouch, and
- Noureddine Chabini
Journal of Circuits, Systems and Computers24 Apr 2017
Preview Abstract
QR decomposition (QRD) is one of the most widely used numerical linear algebra (NLA) kernels in several signal processing applications. Its implementation has a considerable and an important impact on the system performance. As processor architectures continue to gain ground in the high-performance computing world, QRD algorithms have to be redesigned in order to take advantage of the architectural features on these new processors. However, in some processor architectures like very large instruction word (VLIW), compiler efficiency is not enough to make an effective use of available computational resources. This paper presents an efficient and optimized approach to implement Givens QRD in a low-power platform based on VLIW architecture. To overcome the compiler efficiency limits to parallelize the most of Givens arithmetic operations, we propose a low-level instruction scheme that could maximize the parallelism rate and minimize clock cycles. The key contributions of this work are as follows: (i) New parallel and fast version design of Givens algorithm based on the VLIW features (i.e., instruction-level parallelism (ILP) and data-level parallelism (DLP)) including the cache memory properties. (ii) Efficient data management approach to avoid cache misses and memory bank conflicts. Two DSP platforms C6678 and AK2H12 were used as targets for implementation. The introduced parallel QR implementation method achieves, in average, more than 12 $\times$ and 6 $\times$ speedups over the standard algorithm version and the optimized QR routine implementations, respectively. Compared to the state of the art, the proposed scheme implementation is at least 3.65 and 2.5 times faster than the recent CPU and DSP implementations, respectively.
articleNo Access
An Efficient and High-Speed Implementation of QRD-MGS Algorithm for STAP Application Based on Floating Point FPGAs
Journal of Circuits, Systems and Computers13 Jun 2019
Preview Abstract
Space-Time Adaptive Processing (STAP) can harness the efficacy of interference and clutter significantly. Calculations of the STAP weights involve solving linear equations which require very intensive computations. In this paper, the QR decomposition (QRD) using the modified gram-schmidt (MGS) algorithm is parameterized with vector size to create a trade-off between the hardware resources utilization and computation time. To achieve an efficient floating point structure, the proposed architecture of QRD-MGS algorithm is simulated and implemented in two modes: single-vector and multi-vector. Results show that the multi-vector method can lead to a high-performance design with higher operating frequency, lower power consumption, and less resource utilization than the single-vector method. For example, Modelism simulations show that the decomposition of a $12 \times 51$ matrix with vector size of 17 takes 7.86 $μ$ s with the maximum clock frequency of 282MHz, for implementation on the Arria10 FPGA. In real STAP applications, the matrix sizes are too large to be fit on FPGAs and the update rate of the weights are high. Therefore, this method can fit any matrix in the contemporary FPGAs with an acceptable update rate.
articleNo Access
Novel Implementation Approach with Enhanced Memory Access Performance of MGS Algorithm for VLIW Architecture
- Mohamed Najoui,
- Anas Hatim,
- Said Belkouch, and
- Noureddine Chabini
Journal of Circuits, Systems and Computers19 Feb 2020
Preview Abstract
Modified Gram–Schmidt (MGS) algorithm is one of the most-known forms of QR decomposition (QRD) algorithms. It has been used in many signal and image processing applications to solve least square problem and linear equations or to invert matrices. However, QRD is well-thought-out as a computationally expensive technique, and its sequential implementation fails to meet the requirements of many real-time applications. In this paper, we suggest a new parallel version of MGS algorithm that uses VLIW (Very Long Instruction Word) resources in an efficient way to get more performance. The presented parallel MGS is based on compact VLIW kernels that have been designed for each algorithm step taking into account architectural and algorithmic constraints. Based on instruction scheduling and software pipelining techniques, the proposed kernels exploit efficiently data, instruction and loop levels parallelism. Additionally, cache memory properties were used efficiently to enhance parallel memory access and to avoid cache misses. The robustness, accuracy and rapidity of the introduced parallel MGS implementation on VLIW enhance significantly the performance of systems under severe rea-time and low power constraints. Experimental results show great improvements over the optimized vendor QRD implementation and the state of art.
articleNo Access
Identification of Independent Patterns of COVID-19 Mortality in Brazil by a Functional QR Decomposition
- Jorge C. Lucero
Biophysical Reviews and Letters01 Mar 2022
Preview Abstract
The subset selection problem of linear algebra is applied to identify independent patterns of COVID-19 evolution within Brazil. The data consist of a set of mortality curves in states of Brazil. A subset of the most independent curves is selected by using a functional version of the QR matrix decomposition technique with column pivoting. The selected subset is used next as a basis to represent the remaining curves filtering out any data redundancy. For each independent curve, an associated epidemiological region of influence is defined. The results show two main independent curves with a similar two-peak pattern and a 50-day shift between the patterns. Two main epidemiological regions are next identified: one encompassing most of the country from the center and northeast states to the south, an another one containing the Amazonian region at the northwest.

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Narrow Results

Filters Applied

Publication Type

Article Type

Publication Date

Author

Publication/Book Series

Subjects

Access

Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime

An Improved Low-Rank Matrix Fitting Method Based on Weighted $L_{1, p}$ Norm Minimization for Matrix Completion

TRANSMIT ANTENNA SELECTION FOR V-BLAST MIMO SYSTEM BASED ON EXTENDED CHANNEL MATRIX ANALYSIS

DESIGN AND IMPLEMENTATION OF SYMBOL DETECTOR FOR MIMO SPATIAL MULTIPLEXING SYSTEMS

RECURSIVE QR DECOMPOSITION ARCHITECTURE FOR MIMO-OFDM DETECTION SYSTEMS

VLIW DSP-Based Low-Level Instruction Scheme of Givens QR Decomposition for Real-Time Processing

An Efficient and High-Speed Implementation of QRD-MGS Algorithm for STAP Application Based on Floating Point FPGAs

Novel Implementation Approach with Enhanced Memory Access Performance of MGS Algorithm for VLIW Architecture

Identification of Independent Patterns of COVID-19 Mortality in Brazil by a Functional QR Decomposition