Relative debugging for a highly parallel hybrid computer system

DeRose, Luiz, Gontarek, Andrew, Vose, Aaron, Moench, Robert, Abramson, David, Dinh, Minh Ngoc and Jin, Chao (2015). Relative debugging for a highly parallel hybrid computer system. In: SC '15 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, TX, United States. 15-20 November 2015. doi:10.1145/2807591.2807605

Author DeRose, Luiz
Gontarek, Andrew
Vose, Aaron
Moench, Robert
Abramson, David
Dinh, Minh Ngoc
Jin, Chao
Title of paper Relative debugging for a highly parallel hybrid computer system
Conference name International Conference for High Performance Computing, Networking, Storage and Analysis
Conference location Austin, TX, United States
Conference dates 15-20 November 2015
Convener ACM
Proceedings title SC '15 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Journal name International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Place of Publication New York, NY, United States
Publisher ACM
Publication Year 2015
Sub-type Fully published paper
DOI 10.1145/2807591.2807605
Open Access Status Not Open Access
ISBN 9781450337236
ISSN 2167-4337
Volume 15-20-November-2015
Total pages 12
Collection year 2016
Language eng
Abstract/Summary Relative debugging traces software errors by comparing two concurrent executions of a program: a reference version and a faulty version. It is particularly effective when code is migrated from one platform to another, which is of significant interest for hybrid computer architectures that combine CPUs with accelerators or coprocessors. In this paper we extend relative debugging to support porting stencil computation on a hybrid computer. We describe a generic data model that allows programmers to examine the global state across different types of applications, including MPI/OpenMP, MPI/OpenACC, and UPC programs. We present case studies using a hybrid version of the 'stellarator' particle simulation DELTA5D on Titan at ORNL, and the UPC version of Shallow Water Equations on Crystal, an internal Cray supercomputer. These case studies used up to 5,120 GPUs and 32,768 CPU cores to illustrate that the debugger is effective and practical.
Keyword Parallel debugging
Hybrid programming
Scalability
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status UQ

 
Citation counts Web of Science: cited 0 times
Scopus: cited 0 times
Created: Wed, 04 May 2016, 15:00:50 EST by Ms Diana Cassidy on behalf of School of Information Technol and Elec Engineering