Parallel unstructured simulations at extreme scale require that the mesh be distributed across a large number of processors with equal workload and minimum inter-part communications. ParMA's goal is to dynamically partition unstructured meshes directly using the existing mesh adjacency information to account for multiple criteria. Diffusive partition improvement procedures support large meshes (billions of mesh regions) on large core count machines (>100,000) and account for multiple criteria.
In the study of ParMA, we noticed that mesh adjacencies represent application data more completely than standard graph-partitioning models.
- All mesh entities can be considered, while graph-partitioning models use only a subset of mesh adjacency information
- Any adjacency can be obtained in O(1) time (assuming use of a complete mesh adjacency structure)
- Partition model supports efficient part adjacency queries
Figure. Geometry and mesh for simulations of blood flow in an abdominal aortic aneurysm (AAA) using the PHASTA CFD code
Partition Improvement with Multiple Criteria
Diffusive procedure driven by application-defined priority list of mesh entity types to balance. Iteration over three stages:
- Schedule load transfer from heavy parts to light parts
- Select mesh elements on the part boundaries that will smooth the part boundary
- Migrate selected mesh elements according to schedule
With ParMA Vertex>Region, PHASTA CFD strong scaling improves from 0.88 to 0.95 on 288K cores, JuGene BG/P, with 1B-region mesh.
Table. Partition improvement time and entity imbalances when applying ParMA to a hypergraph-based partition of 133M region AAA mesh with 16,384 parts on 512 cores of Jaguar