Interfacing with external software
========================================

Grid provides a number of important modules, such as solvers and
eigensolvers, that are highly optimized for complex vector/SIMD
architectures, such as the Intel Xeon Phi KNL and Skylake processors.
This growing library, with appropriate interfacing, can be accessed
from existing code. Here we describe interfacing issues and provide
examples.

	  
MPI initialization
--------------------

Grid supports threaded MPI sends and receives and, if running with
more than one thread, requires the MPI_THREAD_MULTIPLE mode of message
passing. If the user initializes MPI before starting Grid, the
appropriate initialization call is::

  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  assert(MPI_THREAD_MULTIPLE == provided);
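
A minimal sketch of this ordering, assuming the calling program owns
MPI initialization and finalization (Grid's own initialization call is
described in the next section):

**Example**::

  #include <cassert>
  #include <mpi.h>
  #include <Grid/Grid.h>

  int main(int argc, char **argv)
  {
    int provided;

    // Calling program owns MPI: initialize it first, with threaded messaging
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    assert(provided == MPI_THREAD_MULTIPLE);

    // Initialize Grid afterwards (see the next section)
    Grid::Grid_init(&argc, &argv);

    // ... application code using Grid ...
    // ... finalization of Grid and MPI omitted from this sketch ...
    return 0;
  }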

Grid Initialization
---------------------

Grid itself is initialized with a call::

  Grid_init(&argc, &argv);

where `argc` and `argv` are constructed to simulate the command-line
options described below.  Command-line options include::

  --mpi n.n.n.n   : default MPI decomposition
  --threads n     : default number of OMP threads
  --grid n.n.n.n  : default Grid size

At a minimum one usually provides the `--grid` and `--mpi` parameters.
The former specifies the lattice dimensions and the latter specifies
the grid of processors (MPI ranks).  If these parameters are not
specified with the `Grid_init` call, they must be supplied later when
creating Grid fields.
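
If the calling program does not simply forward its own command line,
the argument vector can be synthesized by hand.  The sketch below is
one way to do this; the lattice and processor-grid values are
placeholders to be replaced by the desired decomposition.

**Example**::

  // Hypothetical values; replace with the desired lattice and processor grid
  std::vector<std::string> args = { "interface",
                                    "--grid", "16.16.16.32",
                                    "--mpi",  "2.2.2.4",
                                    "--threads", "4" };
  std::vector<char *> cargs;
  for(auto &a : args) cargs.push_back(&a[0]);

  int    argc_grid = (int)cargs.size();
  char **argv_grid = cargs.data();

  Grid::Grid_init(&argc_grid, &argv_grid);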

The following Grid procedures are useful for verifying that Grid
"default" values are properly initialized.

=============================================================   ===========================================================================================================
  Grid procedure                                                  returns 
=============================================================   ===========================================================================================================
  std::vector<int> GridDefaultLatt();                            lattice size
  std::vector<int> GridDefaultSimd(int Nd,vComplex::Nsimd());    SIMD layout
  std::vector<int> GridDefaultMpi();                             MPI layout
  int Grid::GridThread::GetThreads();                            number of threads
=============================================================   ===========================================================================================================
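
For example, after `Grid_init` one can print the defaults on each
rank.  A minimal sketch, assuming the Grid namespace is in scope as in
the examples below:

**Example**::

  auto latt = GridDefaultLatt();                      // lattice size (--grid)
  auto mpi  = GridDefaultMpi();                       // MPI layout   (--mpi)
  auto simd = GridDefaultSimd(Nd,vComplex::Nsimd());  // SIMD layout

  for(int mu=0; mu<(int)latt.size(); mu++){
    std::cout << "mu=" << mu
              << " latt=" << latt[mu]
              << " mpi="  << mpi[mu]
              << " simd=" << simd[mu] << std::endl;
  }
  std::cout << "threads=" << GridThread::GetThreads() << std::endl;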


MPI coordination
----------------

Grid uses its own numbering of MPI ranks and its own assignment of
lattice coordinates to each rank, so the calling program and Grid must
agree on these conventions.  One should use Grid's Cartesian
communicator class to discover the processor assignments. For a
four-dimensional processor grid one can define::

  static Grid::CartesianCommunicator *grid_cart = NULL;
  grid_cart = new Grid::CartesianCommunicator(processors);

where `processors` is of type `std::vector<int>`, with values matching
the MPI processor-layout dimensions specified with the `--mpi`
argument in the `Grid_init` call.  Then each MPI rank can obtain its
processor coordinate using the Cartesian communicator instantiated
above.  For example, in four dimensions::

  std::vector<int> pePos(4);    
  for(int i=0; i<4; i++)
     pePos[i] = grid_cart->_processor_coor[i];

and each MPI process can get its world rank from its processor
coordinates using::

  int peRank = grid_cart->RankFromProcessorCoor(pePos);
	  
Conversely, each MPI process can get its processor coordinates from
its world rank using::

  grid_cart->ProcessorCoorFromRank(peRank, pePos);
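
The two mappings are mutual inverses, so a quick consistency check is
to go from coordinates to rank and back.  A sketch using only the
calls shown above:

**Example**::

  std::vector<int> pePos(4);
  for(int i=0; i<4; i++)
     pePos[i] = grid_cart->_processor_coor[i];            // this rank's coordinates

  int peRank = grid_cart->RankFromProcessorCoor(pePos);   // coordinates -> rank

  std::vector<int> check(4);
  grid_cart->ProcessorCoorFromRank(peRank, check);        // rank -> coordinates
  for(int i=0; i<4; i++) assert(check[i] == pePos[i]);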

If the calling program initialized MPI before initializing Grid, it is
then important for each MPI process in the calling program to reset
its rank number so it agrees with Grid::

   MPI_Comm comm;
   MPI_Comm_split(MPI_COMM_THISJOB,jobid,peRank,&comm);
   MPI_COMM_THISJOB = comm;

where `MPI_COMM_THISJOB` is initially a copy of `MPI_COMM_WORLD` (with
`jobid = 0`), or it is a split communicator with `jobid` equal to the
index number of the subcommunicator.  Once this is done,::

  MPI_Comm_rank(MPI_COMM_THISJOB, &myrank);

returns a rank that agrees with Grid's `peRank`.
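
For concreteness, the steps above can be combined into a single
sketch.  This is only an illustration for the single-job case and
assumes `grid_cart` and `pePos` have been set up as in the previous
section.

**Example**::

  // Single-job case: MPI_COMM_THISJOB starts as a copy of MPI_COMM_WORLD
  MPI_Comm MPI_COMM_THISJOB;
  MPI_Comm_dup(MPI_COMM_WORLD, &MPI_COMM_THISJOB);

  // Grid's rank for this process, from its processor coordinates
  int peRank = grid_cart->RankFromProcessorCoor(pePos);

  // Reorder the communicator so that its ranks follow Grid's numbering
  // (all processes use the same color, jobid = 0; the key is peRank)
  MPI_Comm comm;
  MPI_Comm_split(MPI_COMM_THISJOB, 0, peRank, &comm);
  MPI_COMM_THISJOB = comm;

  int myrank;
  MPI_Comm_rank(MPI_COMM_THISJOB, &myrank);
  assert(myrank == peRank);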

QMP coordination
----------------

If the calling program uses the SciDAC QMP message-passing package, a
call to QMP_comm_split() can be used instead to reassign the ranks.
In the example below, `peGrid` gives the processor-grid dimensions,
usually set on the command line with `-qmp-geom`.

**Example**::
  
  int NDIM = QMP_get_allocated_number_of_dimensions();
  Grid::Grid_init(argc,argv);
  FgridBase::grid_initted=true;
  std::vector<int> processors;
  for(int i=0;i<NDIM;i++) processors.push_back(peGrid[i]);
  Grid::CartesianCommunicator grid_cart(processors);
  std::vector<int> pePos(NDIM);
  for(int i=NDIM-1;i>=0;i--)
     pePos[i] = grid_cart._processor_coor[i];
  int peRank = grid_cart.RankFromProcessorCoor(pePos);
  QMP_comm_t qmp_comm;
  QMP_comm_split(QMP_comm_get_default(),0,peRank,&qmp_comm);
  QMP_comm_set_default(qmp_comm);

  
Mapping fields between Grid and user layouts
---------------------------------------------

In order to map data between calling-program and Grid layouts, it is
important to know how the lattice sites are distributed across the
processor grid.  A lattice site with coordinates `r[mu]` is assigned
to the processor with processor coordinates `pePos[mu]` according to
the rule::

  pePos[mu] = r[mu]/dim[mu]

where `dim[mu]` is the dimension of the local subvolume in the `mu`
direction, i.e., the full lattice dimension divided by the number of
MPI ranks in that direction (integer division is implied).  For
performance reasons, it is important that the external data layout
follow the same rule.  Then data mapping can be done without
requiring costly communication between ranks.  We assume this is the
case here.
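
As an illustration of this rule, the following sketch computes, for a
global site coordinate `r` (assumed given), the owning processor
coordinates and the site coordinates relative to that rank's subvolume
origin.  The names `dim`, `pePos`, and `rLocal` are just local
variables for this example.

**Example**::

  auto latt = GridDefaultLatt();   // global lattice dimensions
  auto mpi  = GridDefaultMpi();    // processor-grid dimensions

  std::vector<int> dim(4), pePos(4), rLocal(4);
  for(int mu=0; mu<4; mu++){
    dim[mu]    = latt[mu] / mpi[mu];   // local subvolume dimension
    pePos[mu]  = r[mu] / dim[mu];      // owning processor coordinate
    rLocal[mu] = r[mu] % dim[mu];      // coordinate relative to that rank's origin
  }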

When mapping data to and from Grid, one must choose a lattice object
defined on the appropriate grid, whether it be a full lattice (4D
`GridCartesian`), one of the checkerboards (4D
`GridRedBlackCartesian`), a five-dimensional full grid (5D
`GridCartesian`), or a five-dimensional checkerboard (5D
`GridRedBlackCartesian`).  For example, an improved staggered-fermion
color-vector field `cv` on a single checkerboard would be constructed
using

**Example**::

  std::vector<int> latt_size   = GridDefaultLatt();
  std::vector<int> simd_layout = GridDefaultSimd(Nd,vComplex::Nsimd());
  std::vector<int> mpi_layout  = GridDefaultMpi();

  GridCartesian               Grid(latt_size,simd_layout,mpi_layout);
  GridRedBlackCartesian       RBGrid(&Grid);

  typename ImprovedStaggeredFermion::FermionField  cv(&RBGrid);

The example above assumes that the grid default values were set in the
`Grid_init` call.  If not, they can be set at this point and passed
when `GridCartesian` is instantiated here.  To map data within an MPI
rank, the external code must iterate over the sites belonging to that
rank (full or checkerboard as appropriate).  Note that the site
coordinates are specified relative to the origin of the lattice
subvolume on that rank. To import data into Grid, the external data on
a single site with coordinates `r` is first copied into the
appropriate Grid scalar object `s`.  Then it is copied into the Grid
lattice field `l` with `pokeLocalSite`::

  pokeLocalSite(const sobj &s, Lattice<vobj> &l, Coordinate &r);

To export data from Grid, the reverse operation starts with::

  peekLocalSite(sobj &s, const Lattice<vobj> &l, Coordinate &r);

and then copies the single-site data from `s` into the corresponding
external type.

Here is an example that maps a single site's worth of data in a MILC
color-vector field to a Grid scalar ColourVector object `cVec` and from
there to the lattice colour-vector field `cv`, as defined above.

**Example**::

  std::vector<int> r(4);
  indexToCoords(idx,r);
  ColourVector cVec;
  for(int col=0; col<Nc; col++)
      cVec()()(col) = 
          Complex(src[idx].c[col].real, src[idx].c[col].imag);

  pokeLocalSite(cVec, cv, r);

Here the `indexToCoords()` function is a MILC mapping of the MILC site
index `idx` to the 4D lattice coordinate `r`.
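
The export direction is the mirror image of this: `peekLocalSite`
copies the site data from the lattice field into the scalar object,
and the external field is then filled from it.  The sketch below uses
the same hypothetical MILC-side names (`idx`, `indexToCoords`) and a
hypothetical destination field `dest`.

**Example**::

  std::vector<int> r(4);
  indexToCoords(idx,r);

  ColourVector cVec;
  peekLocalSite(cVec, cv, r);     // copy the Grid site data into cVec

  for(int col=0; col<Nc; col++){
      dest[idx].c[col].real = cVec()()(col).real();
      dest[idx].c[col].imag = cVec()()(col).imag();
  }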

Grid provides block- and multiple-rhs conjugate-gradient solvers. For
this purpose it uses a 5D lattice. To map data to and from Grid data
types, the index for the right-hand-side vector becomes the zeroth
coordinate of a five-dimensional vector `r5`.  The remaining
components of `r5` contain the 4D space-time coordinates.  The
`pokeLocalSite/peekLocalSite` operations then accept the coordinate
`r5`, provided the destination/source lattice object is also 5D.  In
the example below data from a single site specified by `idx`,
belonging to a set of `Ls` MILC color-vector fields, are copied into a
Grid 5D fermion field `cv5`.

**Example**::

    GridCartesian * UGrid = SpaceTimeGrid::makeFourDimGrid(
        GridDefaultLatt(), GridDefaultSimd(Nd,vComplex::Nsimd()), GridDefaultMpi());
    GridRedBlackCartesian * FrbGrid = SpaceTimeGrid::makeFiveDimRedBlackGrid(Ls,UGrid);

    typename ImprovedStaggeredFermion5D::FermionField  cv5(FrbGrid);

    std::vector<int> r(4);
    indexToCoords(idx,r);
    std::vector<int> r5(1,0);
    for( int d = 0; d < 4; d++ ) r5.push_back(r[d]);

    for( int j = 0; j < Ls; j++ ){
      r5[0] = j;
      ColourVector cVec;
      for(int col=0; col<Nc; col++){
          cVec()()(col) =
              Complex(src[j][idx].c[col].real, src[j][idx].c[col].imag);
      }
      pokeLocalSite(cVec, cv5, r5);
    }
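
Exporting the 5D field back to a set of `Ls` color-vector fields
follows the same pattern in reverse; again `dest` is a hypothetical
MILC-side destination array.

**Example**::

    std::vector<int> r(4);
    indexToCoords(idx,r);
    std::vector<int> r5(1,0);
    for( int d = 0; d < 4; d++ ) r5.push_back(r[d]);

    for( int j = 0; j < Ls; j++ ){
      r5[0] = j;
      ColourVector cVec;
      peekLocalSite(cVec, cv5, r5);
      for(int col=0; col<Nc; col++){
        dest[j][idx].c[col].real = cVec()()(col).real();
        dest[j][idx].c[col].imag = cVec()()(col).imag();
      }
    }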
