Run-time thread sorting to expose data-level parallelism
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | We address the problem of data-parallel processing for computational quantum chemistry (CQC). CQC is a computationally demanding tool for studying the electronic structure of molecules. An important algorithmic component of these computations is the evaluation of Electron Repulsion Integrals (ERIs). A key problem with ERI evaluation is control-flow variation between different ERI evaluations, which can only be resolved at run time and makes the computation unsuitable for data-parallel execution. However, although ERI evaluations vary, the variation is limited: only a limited number of ERI classes are present within any given workload. Conceptually, it is possible to classify the ERIs into sizable sets and execute these sets in a data-parallel fashion. Practically, creating these sets is computationally expensive. We describe an architecture to perform this thread sorting, where high throughput is achieved with small associative and multiport memories. The performance of the prototype is evaluated with FPGA synthesis. We go on to envision other uses for thread sorting in general-purpose manycore architectures. |
---|---|
ISSN: | 1063-6862 |
DOI: | 10.1109/ASAP.2008.4580154 |
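
The classify-then-batch idea in the summary can be illustrated with a minimal software sketch. It is not the hardware architecture the paper describes, only an analogy: work items are binned by class, and a bin is released as a batch once it is full, so every batch follows a single control-flow path. The function name `sort_into_batches`, the `class_of` callback, the `batch_size` parameter, and the toy workload are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def sort_into_batches(items, class_of, batch_size):
    """Group items by class and return a list of (class_key, batch) pairs.

    Illustrative sketch only: a bin is emitted as soon as it holds
    batch_size items; partially filled bins are flushed at the end.
    """
    bins = defaultdict(list)   # one bin per class seen so far
    batches = []
    for item in items:
        key = class_of(item)
        bins[key].append(item)
        if len(bins[key]) == batch_size:
            # Full bin: release it as one homogeneous batch.
            batches.append((key, bins.pop(key)))
    # Flush the leftovers so that no item is dropped.
    for key, rest in bins.items():
        if rest:
            batches.append((key, rest))
    return batches


# Hypothetical workload: (class label, item id) pairs standing in for ERIs.
workload = [("A", 0), ("B", 1), ("A", 2), ("C", 3), ("B", 4), ("A", 5)]
batches = sort_into_batches(workload, class_of=lambda t: t[0], batch_size=2)

for key, batch in batches:
    print(key, batch)
```

Each emitted batch contains items of one class only, so it could in principle be handed to a data-parallel unit without divergence. The cost of maintaining the bins at run time is the expense the summary refers to, and it is what the paper's hardware is meant to absorb.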