Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.


A matrix operation is referred to as a hard-to-parallel matrix operation (HPMO) if it has serial bottlenecks that are hardly parallelizable. Otherwise, it is referred to as an easy-to-parallel matrix operation (EPMO). Empirical evidences showed the performance scalability of an HPMO is signi?cantly poorer than an EPMO on multicore and distributed architectures. As the result, the design of higher-level algorithms for applications, for the performance considerations on multicore and distributed architectures, should avoid the use of HPMOs as the computational kernels. In this paper, as a case study, we present an HPMO-avoiding algorithm for the Green’s function calculation in quantum Monte Carlo simulation. The original algorithm utilizes the QR-decomposition with column pivoting (QRP) as its computational kernel. QRP is an HPMO. The redesigned algorithm maintains the same simulation stability but employs the standard QR decomposition without pivoting (QR), which is an EPMO. Different implementations of the redesigned algorithm on multicore and distributed architectures are investigated. Although some implementations of the redesigned method use about a factor of three more ?oating-point operations than the original algorithm, they are about 20% faster on a quadcore system and 2.5 times faster on a 1024-CPU massively parallel processing system. The broader impact of the redesign of higher-level matrix algorithms to avoid HPMOs in other computational science applications is also discussed.

Questions and Answers

You need to be logged in to be able to post here.