GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models

Axel Modave, Amik St-Cyr and Tim Warburton

June 2016

Publication type:

Paper in peer-reviewed journals

Journal:

Computers and Geosciences, vol. 91, pp. 64-76

Publisher:

Elsevier

DOI:

10.1016/j.cageo.2016.03.008

HAL:

hal-01378464

arXiv:

1602.07997

Abstract:

Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general-purpose graphics processing units (GPUs). However, the computational performance of such schemes strongly depends on their implementation. Several implementation strategies have been proposed in the past: they either rely exclusively on specialized compute kernels tuned for each operation, or leverage BLAS libraries that provide optimized routines for basic linear algebra operations. In this paper, we present and analyze up-to-date performance results for different implementations, tested in a unified framework on a single NVIDIA GTX980 GPU. We show that specialized kernels written with a one-node-per-thread strategy are competitive for polynomial bases up to the fifth and seventh degrees for acoustic and elastic models, respectively. For higher degrees, a strategy that makes use of the NVIDIA cuBLAS library provides better results, reaching a net arithmetic throughput of 35.7% of the theoretical peak value.
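The two strategies contrasted in the abstract differ mainly in how the same elemental work is mapped onto the hardware. The sketch below is a minimal, hypothetical illustration in plain Python, not the authors' code. It assumes tetrahedral elements with a degree-n nodal basis, a single dense differentiation matrix `D` shared by all K elements (affine elements), and nodal values stored as an Np x K matrix `U` with one column per element. All function names are invented for illustration. `apply_per_node` mimics the one-node-per-thread layout, in which each output value is an independent dot product. `apply_batched_gemm` performs the same arithmetic as one dense matrix-matrix product, the shape a single cuBLAS GEMM call would handle. Pure Python says nothing about GPU speed; the snippet only makes the structure and the operation count concrete.

```python
# Illustrative sketch only, not the authors' implementation.
# Assumptions: tetrahedral elements, degree-n nodal basis, one dense
# differentiation matrix D (Np x Np) shared by all K elements, and nodal
# values stored as an Np x K matrix U (one column per element).


def num_nodes_tet(n):
    """Number of nodes of a degree-n Lagrange basis on a tetrahedron."""
    return (n + 1) * (n + 2) * (n + 3) // 6


def apply_per_node(D, U):
    """'One node per thread' analogue: every (node, element) output is an
    independent dot product, i.e. the work one GPU thread would do."""
    np_, k = len(D), len(U[0])
    out = [[0.0] * k for _ in range(np_)]
    for e in range(k):            # on a GPU: spread over thread blocks
        for n in range(np_):      # on a GPU: one thread per node
            acc = 0.0
            for m in range(np_):
                acc += D[n][m] * U[m][e]
            out[n][e] = acc
    return out


def apply_batched_gemm(D, U):
    """BLAS-style formulation: the same work written as one dense product
    (Np x Np) @ (Np x K), the shape a single GEMM call would process."""
    rows, inner, cols = len(D), len(U), len(U[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for p in range(inner):
            a = D[i][p]
            for j in range(cols):
                out[i][j] += a * U[p][j]
    return out


def volume_flops(n, k):
    """Flop count (one multiply plus one add per entry of D, per element)
    for one derivative of one field over k elements."""
    np_ = num_nodes_tet(n)
    return 2 * np_ * np_ * k


if __name__ == "__main__":
    D = [[1.0, 2.0], [3.0, 4.0]]
    U = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
    # Both mappings perform identical arithmetic, so results must agree.
    assert apply_per_node(D, U) == apply_batched_gemm(D, U)
    # Degree, nodes per element, flops per element for one derivative.
    for n in range(1, 9):
        print(n, num_nodes_tet(n), volume_flops(n, 1))
```

Because the cost per element grows like Np^2 and Np itself grows cubically with the degree, the dense work per element rises steeply at high degree. This is consistent with, though not a derivation of, the observation that a GEMM-based strategy becomes preferable for higher polynomial degrees.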

BibTeX:

@article{Mod-StC-War-2016,
    author={Axel Modave and Amik St-Cyr and Tim Warburton},
    title={GPU performance analysis of a nodal discontinuous Galerkin
           method for acoustic and elastic models},
    doi={10.1016/j.cageo.2016.03.008},
    journal={Computers and Geosciences},
    year={2016},
    month={6},
    volume={91},
    pages={64--76},
}