Run-time thread sorting to expose data-level parallelism

Ramdas, T., Egan, G. K., Abramson, D. and Baldridge, K. K. (2008). Run-time thread sorting to expose data-level parallelism. In: International Conference on Application-Specific Systems, Architectures and Processors, 2008. ASAP08 - IEEE 19th International Conference on Application-Specific Systems, Architectures and Processors, Leuven, Belgium, (55-60). 2-4 July 2008. doi:10.1109/ASAP.2008.4580154


Author Ramdas, T.
Egan, G. K.
Abramson, D.
Baldridge, K. K.
Title of paper Run-time thread sorting to expose data-level parallelism
Conference name ASAP08 - IEEE 19th International Conference on Application-Specific Systems, Architectures and Processors
Conference location Leuven, Belgium
Conference dates 2-4 July 2008
Proceedings title International Conference on Application-Specific Systems, Architectures and Processors, 2008
Place of Publication Piscataway, NJ, United States
Publisher IEEE (Institute of Electrical and Electronics Engineers)
Publication Year 2008
Sub-type Fully published paper
DOI 10.1109/ASAP.2008.4580154
ISBN 9781424418985
ISSN 1063-6862
Start page 55
End page 60
Total pages 6
Language eng
Abstract/Summary We address the problem of data parallel processing for computational quantum chemistry (CQC). CQC is a computationally demanding tool to study the electronic structure of molecules. An important algorithmic component of these computations is the evaluation of Electron Repulsion Integrals (ERIs). A key problem with ERI evaluation is control-flow variation between different ERI evaluations, which can only be resolved at runtime. This causes the computation to be unsuitable for data parallel execution. However, it is observed that although there is variation between ERI evaluations, the variation is limited; in fact there are a limited number of ERI classes present within any given workload. Conceptually, it is possible to classify the ERIs into sizable sets, and execute these sets in a data parallel fashion. Practically, creating these sets is computationally expensive. We describe an architecture to perform this thread sorting, where high throughput is achieved with small associative and multiport memories. The performance of the prototype is evaluated with FPGA synthesis. We go on to envision other uses for thread sorting in general-purpose manycore architectures.
Subjects 1700 Computer Science
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status Non-UQ
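
The classification step described in the abstract (grouping work items by class so that each group can run in a data parallel fashion) can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the hardware architecture the paper describes: the function name sort_threads, the classify callback, the batch_size parameter, and the string class labels are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def sort_threads(tasks, classify, batch_size):
    """Bin tasks by class key and yield (key, batch) pairs.

    A batch is emitted as soon as its bin holds batch_size tasks, so
    every task in a batch shares one class (and hence one control-flow
    path). Partially filled bins are flushed at the end.
    All names here are illustrative assumptions, not from the paper.
    """
    bins = defaultdict(list)
    for task in tasks:
        key = classify(task)
        bins[key].append(task)
        if len(bins[key]) == batch_size:
            # Bin is full: hand it off as one data parallel batch.
            yield key, bins.pop(key)
    # Flush whatever is left in partially filled bins.
    for key, batch in bins.items():
        yield key, batch


# Hypothetical workload: (task id, class label) pairs.
tasks = [(0, "ssss"), (1, "psss"), (2, "ssss"), (3, "psss"), (4, "ssss")]
batches = list(sort_threads(tasks, classify=lambda t: t[1], batch_size=2))
```

In this sketch a dictionary plays the role of the lookup from class to bin; the abstract attributes the high throughput of the actual design to small associative and multiport memories, which this software version does not model.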

 
Created: Thu, 19 Dec 2013, 13:53:36 EST by Ms Diana Cassidy on behalf of School of Information Technol and Elec Engineering