Exploring Differences Between Directive-Based GPU Programming Models

Organisers: Maciej Cytowski (PawseySC), Tom Papatheodore (ORNL), Chris Daley (LBL)

OpenACC and OpenMP are often seen as competing solutions for directive-based GPU offloading. Both models allow the programmer to offload computational workloads to GPUs and to manage data transfers between CPU and GPU memories. OpenACC is commonly described as a descriptive approach to GPU programming: the programmer uses directives to tell the compiler which loops have independent iterations, and the compiler decides how and where to parallelise them for the target architecture (selected via compiler flags). OpenMP, on the other hand, is described as a prescriptive approach: the programmer uses directives to state explicitly how and where the loops should be parallelised, rather than leaving those decisions to the compiler.
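To illustrate the contrast, here is a minimal sketch (not taken from the tutorial exercises) of the same loop offloaded with each model in C; the compiler invocations mentioned in the comments are examples only.

/* Contrast of descriptive (OpenACC) and prescriptive (OpenMP) offloading
 * on the same vector-add loop.  Compile, for example, with an OpenACC
 * compiler (nvc -acc) or an OpenMP-offload compiler
 * (clang -fopenmp -fopenmp-targets=<target>). */
#include <stdio.h>

#define N 1000000

static double a[N], b[N], c[N];

int main(void)
{
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* OpenACC: descriptive -- mark the loop as parallelisable and let the
     * compiler choose the gang/worker/vector mapping for the target GPU. */
    #pragma acc parallel loop copyin(a, b) copyout(c)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* OpenMP: prescriptive -- the directive spells out how the iterations
     * are distributed across teams and threads on the device. */
    #pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[0] = %f\n", c[0]);
    return 0;
}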

It’s common to hear programmers ask: “Which programming model should I use?”, “Which approach is more portable?”, “Is one of these models going to replace the other?”, and so on.

In this tutorial, we will not argue for one programming model over the other or attempt a direct comparison of their performance. Instead, we will examine the differences between the two approaches outlined above and give participants the opportunity, through hands-on exercises, to see how these differences manifest themselves when implementing a program.

We will also give a current snapshot of the compiler implementations available for OpenACC and for OpenMP GPU offloading. This developer’s view of the programming models is intended to give participants a foundation that helps them choose the model that works best for their application.

Prerequisites

Participants are expected to be familiar with GPU architecture and the concept of offloading computations to accelerators, and should be familiar with the C/C++ or Fortran programming languages. Participants are required to bring their own laptops with an SSH client for the hands-on session; access to the HPC platform used for the exercises will be provided.

Schedule

Setup     Download the files required for the lesson
00:00  1. Talks: overview of the directives and GPU offloading models available in OpenACC and OpenMP;
          overview of compiler support for the OpenACC and OpenMP offloading models;
          introduction to the training HPC platform
01:30  2. Introduction to the Laplace Equation: a quick overview of the Laplace equation and its solver
01:40  3. Serial Implementation: a quick overview of the serial implementation of a 2D Laplace equation solver
01:55  4. Profiling: basic profiling to identify the most computationally expensive parts of the code
02:10  5. Loop parallelisation: basic OpenACC and OpenMP directives to parallelise loop nests
02:40  6. Data management: usage of OpenACC and OpenMP data mapping directives (see the sketch at the end of this section)
03:10  7. Multi-GPU implementation (extra): implementing multi-GPU parallelisation
04:25     Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
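The loop parallelisation and data management episodes build on a 2D Laplace solver. As a preview, here is a minimal sketch, not taken from the tutorial code, of a Jacobi-style update wrapped in OpenACC and OpenMP structured data regions so that the grids stay resident in GPU memory across iterations; the grid size, iteration count, and function names are illustrative assumptions.

/* A minimal sketch of a Jacobi-style update for a 2D Laplace solver.
 * Grid size, iteration count, and function names are illustrative
 * assumptions, not the tutorial's actual code. */
#include <stdio.h>

#define NX 1024
#define NY 1024
#define ITERS 1000

static double T[NX][NY], Tnew[NX][NY];

/* OpenACC: a structured data region keeps T and Tnew resident on the GPU
 * for the whole iteration loop; the kernels assert the data is present. */
void solve_openacc(void)
{
    #pragma acc data copy(T) create(Tnew)
    for (int it = 0; it < ITERS; it++) {
        #pragma acc parallel loop collapse(2) present(T, Tnew)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                Tnew[i][j] = 0.25 * (T[i-1][j] + T[i+1][j]
                                   + T[i][j-1] + T[i][j+1]);

        #pragma acc parallel loop collapse(2) present(T, Tnew)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                T[i][j] = Tnew[i][j];
    }
}

/* OpenMP: the equivalent target data region with explicit map clauses. */
void solve_openmp(void)
{
    #pragma omp target data map(tofrom: T) map(alloc: Tnew)
    for (int it = 0; it < ITERS; it++) {
        #pragma omp target teams distribute parallel for collapse(2)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                Tnew[i][j] = 0.25 * (T[i-1][j] + T[i+1][j]
                                   + T[i][j-1] + T[i][j+1]);

        #pragma omp target teams distribute parallel for collapse(2)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                T[i][j] = Tnew[i][j];
    }
}

int main(void)
{
    /* Fixed boundary condition on one edge; interior starts at zero. */
    for (int j = 0; j < NY; j++)
        T[0][j] = 100.0;

    solve_openacc();   /* or solve_openmp(), depending on the compiler */
    printf("T[1][NY/2] = %f\n", T[1][NY / 2]);
    return 0;
}

With many compilers, omitting the enclosing data region causes each offloaded loop to perform its own host-device transfers; that kind of overhead is exactly what the profiling and data management episodes examine.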