A data-centric framework for debugging highly parallel applications

Dinh, Minh Ngoc, Abramson, David, Jin, Chao, Gontarek, Andrew, Moench, Bob and DeRose, Luiz (2015) A data-centric framework for debugging highly parallel applications. Software: Practice and Experience, 45(4): 501-526. doi:10.1002/spe.2239


Author Dinh, Minh Ngoc
Abramson, David
Jin, Chao
Gontarek, Andrew
Moench, Bob
DeRose, Luiz
Title A data-centric framework for debugging highly parallel applications
Journal name Software: Practice and Experience
ISSN 0038-0644
1097-024X
Publication date 2015-04-01
Year available 2013
Sub-type Article (original research)
DOI 10.1002/spe.2239
Open Access Status Not yet assessed
Volume 45
Issue 4
Start page 501
End page 526
Total pages 26
Place of publication West Sussex, United Kingdom
Publisher John Wiley & Sons
Language eng
Abstract Contemporary parallel debuggers allow users to control more than one processing thread while supporting the same examination and visualisation operations as sequential debuggers. This approach restricts the use of parallel debuggers when it comes to large-scale scientific applications run across hundreds of thousands of compute cores. First, manually observing the runtime data to detect errors becomes impractical because the data is too big. Second, performing expensive but useful debugging operations becomes infeasible as the computational codes become more complex, involving larger data structures, and as the machines become larger. This study explores the idea of a data-centric debugging approach, which could be used to make parallel debuggers more powerful. It discusses the use of ad hoc debug-time assertions that allow a user to reason about the state of a parallel computation. These assertions support the verification and validation of program state at runtime as a whole rather than focusing on the state of a single process. Furthermore, the debugger's performance can be improved by exploiting the underlying parallel platform because the available compute cores can execute parallel debugging functions while a program is idling at a breakpoint. We demonstrate the system with several case studies and evaluate the performance of the tool on a 20 000-core Cray XE6.
Keyword Software debugging
Parallel debugger
Assertion
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ
Additional Notes Published online ahead of print: 21 November 2013.

Document type: Journal Article
Collections: Non HERDC
School of Information Technology and Electrical Engineering Publications
 
Citation counts: Thomson Reuters Web of Science: cited 0 times
Scopus: cited 0 times
Created: Wed, 20 Aug 2014, 22:39:22 EST by Emma Petherick on behalf of School of Information Technol and Elec Engineering