# Parallel Scientific Computation

Scalable parallel computers now allow solutions on meshes of hundreds of millions of elements. The key to effective parallel computation is balancing the computational load across the processors while keeping interprocessor communication to a minimum. To ensure the reliability of the simulations performed, adaptive simulation techniques should be used. However, effective parallelization of adaptive computations is complicated by the fact that the computational load is constantly changing. SCOREC is developing parallel processing tools to address the needs of parallel automated adaptive analysis.

A key component of parallel adaptive computation is the set of base structures used to house the domain discretizations. The structures SCOREC has developed are fully distributed and communicate via message passing (MPI). Two such structures were developed to meet the needs of automated parallel analysis:

- A mesh structure for distributed meshes based on a hierarchical serial mesh database.
- A distributed octree structure that has been used to support dynamic re-partitioning of a mesh, parallel mesh generation, and linkage to discrete models.

The constantly evolving nature of the computations in automated adaptive analysis requires the ability to dynamically re-partition the underlying structures to maintain load balance. Several alternative algorithms for this process have been developed, and each has been found to have advantages in particular situations.

Entity weightings have been used with the dynamic re-partitioning procedures to account for the computational cost per entity in two settings: an adaptive time-marching algorithm, and a predictive load balancing procedure in which the weights were based on refinement level and the mesh was re-balanced before the refinement was performed.
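The idea behind predictive load balancing can be illustrated with a minimal sketch. It is not SCOREC's implementation: the weight formula (each refinement level multiplies the element count by 2^dim), the greedy partitioner, and all function names are assumptions chosen for illustration.

```python
import heapq

def predicted_weight(refinement_levels, dim=3):
    """Assumed cost model: element count after uniform subdivision."""
    return (2 ** dim) ** refinement_levels

def greedy_partition(weights, num_parts):
    """Assign entities, heaviest first, to the currently lightest part."""
    heap = [(0.0, p) for p in range(num_parts)]
    heapq.heapify(heap)
    assignment = [None] * len(weights)
    for idx in sorted(range(len(weights)), key=lambda i: -weights[i]):
        load, part = heapq.heappop(heap)
        assignment[idx] = part
        heapq.heappush(heap, (load + weights[idx], part))
    return assignment

def part_loads(weights, assignment, num_parts):
    """Total weight held by each part."""
    loads = [0.0] * num_parts
    for w, p in zip(weights, assignment):
        loads[p] += w
    return loads

def imbalance(loads):
    """Maximum part load divided by the average part load (1.0 is ideal)."""
    return max(loads) / (sum(loads) / len(loads))

# Nine elements; only the first is marked for one level of refinement.
levels = [1, 0, 0, 0, 0, 0, 0, 0, 0]
post_refinement = [predicted_weight(l) for l in levels]

# Balance by current element count, then measure the load after refinement.
naive = greedy_partition([1] * len(levels), 2)
naive_loads = part_loads(post_refinement, naive, 2)

# Balance by predicted weight before refining.
predictive = greedy_partition(post_refinement, 2)
predictive_loads = part_loads(post_refinement, predictive, 2)

print(imbalance(naive_loads), imbalance(predictive_loads))
```

In this toy case, balancing on the predicted weights leaves both parts with equal load once the refinement is carried out, whereas balancing on the current element count does not.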

The procedures to perform automated adaptive analysis have been built on the base parallel structures and techniques. The specific components considered include parallel automatic mesh generation, parallel mesh adaptation, parallel solvers, and complete parallel solution procedures.

Our most recent efforts on the development of these structures and methods have focused on:

- Definition and implementation of a configurable mesh database that operates in serial and parallel.
- Implementation of effective parallel control mechanisms building on RPM.
- Linkage with a generalized object-oriented dynamic load balancing library that can be used to maintain load balance for adaptive multiscale computations.

The configurable mesh data structure we are developing is the recently proposed Parallel Algorithm Oriented Mesh Data structure (PAOMD), in which the application dictates which adjacencies AOMD constructs at run time. (PAOMD is available as open source from SCOREC.) Given a minimal set of entities and adjacencies, AOMD can construct any required adjacency. Current extensions allow AOMD to support conforming and non-conforming grids defined by the general subdivision of an initially conforming mesh and octree-based spatial discretizations.
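To make the idea of deriving adjacencies from a minimal representation concrete, the following sketch starts from triangle-to-vertex connectivity only and constructs two further adjacencies on request. It does not use the AOMD or PAOMD interface; the function names and the two-triangle mesh are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_edges(triangles):
    """Derive the unique edges and the edge -> triangle upward adjacency."""
    edge_to_tris = defaultdict(list)
    for t, verts in enumerate(triangles):
        # Sorting the vertices gives each edge one canonical key.
        for a, b in combinations(sorted(verts), 2):
            edge_to_tris[(a, b)].append(t)
    return dict(edge_to_tris)

def vertex_to_triangles(triangles):
    """Derive the vertex -> triangle upward adjacency."""
    v2t = defaultdict(list)
    for t, verts in enumerate(triangles):
        for v in verts:
            v2t[v].append(t)
    return dict(v2t)

def boundary_edges(edge_to_tris):
    """Edges used by exactly one triangle lie on the mesh boundary."""
    return sorted(e for e, ts in edge_to_tris.items() if len(ts) == 1)

# Minimal input: two triangles that share the edge (1, 2).
triangles = [(0, 1, 2), (1, 3, 2)]

edges = build_edges(triangles)
v2t = vertex_to_triangles(triangles)

print(len(edges))             # number of distinct edges
print(edges[(1, 2)])          # triangles sharing edge (1, 2)
print(boundary_edges(edges))  # edges on the boundary
```

An application that never queries edges would skip `build_edges` entirely, which is the kind of storage and construction cost a configurable mesh database is meant to avoid.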

The Rensselaer partition model (RPM) has a hierarchical structure, with a partition model containing the actual dissection of the domain into segments. Partitions are grouped by a process model for assignment to processes. Processes are assigned to a machine model, which represents the actual computational nodes and their communication and memory properties. RPM's dynamic partitioning capabilities make it possible to tailor task scheduling to a particular architecture. Its structure is rich enough to handle h-, p-, and r-refinement and several discretization technologies, including finite difference, finite volume, finite element, and partition of unity methods. Using RPM as a prototype, we are investigating procedures for optimal task scheduling in non-uniform computational environments and the treatment of the coupling between different computational models.

We believe the development of dynamic load balancing procedures for adaptive multiscale computations will be effectively supported by the general object-level specification of the Zoltan dynamic load balancing library. The key to this process is mapping the components of a multiscale computation to Zoltan objects, with appropriate attributes specified for the load balancing procedures.
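The partition, process, and machine levels of the hierarchy described above can be sketched with a few small classes. This is only an illustration under stated assumptions, not the RPM interface: the class names, the single `speed` attribute standing in for a node's properties, and the greedy scheduling rule are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Machine-model entry: one computational node (hypothetical)."""
    name: str
    speed: float  # assumed relative compute speed

@dataclass
class Process:
    """Process-model entry: a process placed on a node, owning partitions."""
    rank: int
    node: Node
    partitions: list = field(default_factory=list)

    def time(self, loads):
        """Estimated time: total partition load divided by node speed."""
        return sum(loads[p] for p in self.partitions) / self.node.speed

def schedule(loads, processes):
    """Give each partition, heaviest first, to the process that would
    finish it earliest; ties go to the lower rank."""
    for pid in sorted(loads, key=lambda p: -loads[p]):
        best = min(
            processes,
            key=lambda pr: (pr.time(loads) + loads[pid] / pr.node.speed, pr.rank),
        )
        best.partitions.append(pid)
    return processes

# Partition model: partition id -> computational load (made-up values).
loads = {0: 4, 1: 4, 2: 2, 3: 2}

# A non-uniform machine: one node twice as fast as the other.
procs = [Process(0, Node("fast", 2.0)), Process(1, Node("slow", 1.0))]
schedule(loads, procs)

for pr in procs:
    print(pr.rank, pr.node.name, pr.partitions, pr.time(loads))
```

Because partitions are grouped under processes rather than bound to them one-to-one, the faster node can take a larger share of the partitions, which is the sense in which task scheduling can be tailored to a non-uniform environment.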