Data centric highly parallel debugging

Abramson, David, Dinh, Minh Ngoc, Kurniawan, Donny, Moench, Bob and DeRose, Luiz (2010). Data centric highly parallel debugging. In: HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010, Chicago, IL, (119-129). 21 - 25 June 2010. doi:10.1145/1851476.1851491


Author Abramson, David
Dinh, Minh Ngoc
Kurniawan, Donny
Moench, Bob
DeRose, Luiz
Title of paper Data centric highly parallel debugging
Conference name 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010
Conference location Chicago, IL
Conference dates 21 - 25 June 2010
Proceedings title HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Series ACM International Conference Proceeding Series
Place of Publication New York, NY United States
Publisher ACM
Publication Year 2010
Year available 2010
Sub-type Fully published paper
DOI 10.1145/1851476.1851491
ISBN 9781605589428
Start page 119
End page 129
Total pages 11
Language eng
Formatted Abstract/Summary
Debugging parallel programs is an order of magnitude more complex than debugging sequential ones, and yet most parallel debuggers provide little more functionality than their sequential counterparts. This problem becomes more serious as computational codes become more complex, involving larger data structures, and as the machines become larger. Peta-scale machines consisting of millions of cores pose a significant challenge for existing techniques. We argue that debugging must become more data-centric, and believe that "assertions" provide a useful model. Assertions allow a user to declare their expectations about the program state as a whole rather than focusing on the state of a single process. We have previously implemented a special type of assertion that supports debugging applications as they evolve or are ported to different platforms. These assertions allow a user to compare the state of one program against that of a reference version. Such "relative debugging" assertions, whilst powerful, pose significant implementation challenges for large peta-scale machines. In this paper we discuss a hashing technique that provides a scalable solution for very large problems on very large machines. We illustrate the scheme on 65k cores of Kraken, a Cray XT5 at the University of Tennessee.
Subjects 1703 Computational Theory and Mathematics
1706 Computer Science Applications
1712 Software
Keyword Debugging
Parallel computing
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status Non-UQ

 
Scopus Citation Count Cited 11 times in Scopus
Created: Tue, 22 Oct 2013, 21:57:03 EST by Ms Diana Cassidy on behalf of Research Computing Centre