Parallel simulation-based engineering workflows using unstructured meshes require adaptive methods to ensure reliability and efficiency. Starting with a problem specification on a geometric model, an effective workflow automatically executes parallel mesh generation, analysis, and analysis-based mesh and/or model adaptation. The analyze-adapt cycle is repeated until a desired level of solution accuracy is reached. Between each step in the cycle is an opportunity to improve scalability and efficiency through dynamic partitioning.

Current dynamic load balancing methods do not effectively reduce imbalances to the levels needed by applications capable of strong scaling to the full size of leadership class petascale systems. ParMA, Partitioning using Mesh Adjacencies, is a scalable dynamic load balancer that quickly reaches the required imbalance levels for multiple criteria. These methods work directly on the unstructured mesh alongside traditional graph and geometric based methods to quickly reduce the source of imbalance the consuming application is sensitive to. This approach improves the linear algebra work performance of PHASTA computational fluid dynamics by 28% over a graph-based partition, and improves scaling from 0.82 to 1.14, on 512Ki processes of the ALCF Mira system.

The ParMA load balancing algorithms that worked directly on unstructured meshes are being generalized to support applications that use a different mesh distribution (e.g., node partitions), or simply have information and dependencies between them that can be represented with a graph. EnGPar will implement the ParMA diffusive algorithms and multi-level graph partitioning procedures using data-parallel operations on a graph structure using multiple edge-types to represent different application information dependencies. Efforts are underway to implement the diffusive algorithms and explore the use of the Kokkos programming model for performance portability.


ParMA is available via the SCOREC/core (PUMI) repository on GitHub:


Build instructions are available here:


The API documentation (doxygen) for ParMA is available here:

Figure. Geometry and mesh for simulations of blood flow in an abdominal aortic aneurysm (AAA) using the PHASTA CFD code

Partition Improvement with Multiple Criteria

Diffusive procedure driven by application-defined priority list of mesh entity types to balance. Iteration over three stages:

On the 1.2 billion element mesh shown in Figure 2, ParMA Vertex>Region balancing improves the PHASTA CFD linear algebra work performance by over 28\% versus a ParMETIS partition, and scaling is improved from 0.82 to 1.14 [1].

Figure 2. Full view (top) of the vertical stabilizer and rudder and a slice at their junction (bottom) colored by part number illustrating the complex geometry and small features of the fluid mesh [1].


This research was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under award DE-SC00066117 (FASTMath SciDAC Institute) and by the National Science Foundation under Grant No. ACI 1533581, (SI2-SSE: Fast Dynamic Load Balancing Tools for Extreme Scale Systems).

We gratefully acknowledge the use of the resources of the Rensselaer Center for Computational Innovations and the Leadership Computing Facility at Argonne National Laboratory.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.