Revision 075c02141d8c0aeff9d01a8ad75876a2d8778771 authored by Peter Raback on 05 December 2023, 13:31:00 UTC, committed by Peter Raback on 05 December 2023, 13:31:00 UTC
1 parent 6bc9811
Raw File
hutidoc.tex
%
% Documentation for the HUT-Iter library
% $Id: hutidoc.tex,v 1.1.1.1 1998/07/31 10:19:12 jim Exp $
%

\documentclass[11pt,a4paper,english,oneside]{report}
\usepackage{graphicx,float,isolatin1,t1enc}

\title{HUTI - HUT Iter Library, User's Guide}

\author{Jouni Malinen\\
CSC - IT Center for Science Ltd.\\
P.O.BOX 405, FIN-02101 Espoo, Finland\\
Jouni.Malinen@csc.fi}

\date{\today\\
Version 1.0
}

% ------------ Here begins the actual document
%              Standard stuff

\begin{document}

% Miscellaneous settings

\setcounter{secnumdepth}{4}
\setcounter{tocdepth}{4}

\pagenumbering{roman}
\pagestyle{plain}

\maketitle

\tableofcontents
% \listoffigures
\listoftables

% ------------ The actual text begins here

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Introduction}
\label{ch:intro}
\pagenumbering{arabic}
\pagestyle{headings}

\section{General}

Many computational problems require the solution of linear systems of
equations $Ax = b$, where $A$ is the coefficient matrix, $b$ is
the {\em right-hand side} \/and $x$ is the solution.

There are several methods for solving linear systems of equations. These
can be divided into two classes, direct and iterative methods. In
general direct methods are preferred because of their predictable
behaviour and robustness. However, the need to solve very large linear
systems in a reasonable time and with limited resources has been one of
the key reasons for the development of iterative solvers. 
We now know more of the behaviour and suitability of these methods
and are able to use them in different kind of applications.

HUTI is an effort to make an effective and well structured library
containing a collection of iterative methods. The methods implemented
in the library are:

\begin{itemize}
\item Conjugate Gradient (CG) \cite{Bar93}
\item Conjugate Gradient Squared (CGS) \cite{Bar93}
\item Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) \cite{Bar93}
\item Bi-Conjugate Gradient Stabilized (2) (Bi-CGSTAB(2))
\item Quasi-Minimal Residual (QMR) \cite{Bar93,Fre91,Fre94,Buc96}
\item Transpose-Free Quasi-Minimal Residual (TFQMR) \cite{Fre93b}
\item Generalized Minimum Residual (GMRES) \cite{Bar93,Saa96}
\end{itemize}

This library supports both serial and parallel execution. Parallelisation
has been made in a distributed memory environment using message passing
as a communication method between processes. User has the same interface
into the library for both of the execution models. The selection of the
model can be controlled with special library routines or via environment
variables.

\section{Why HUTI?}

There are already several implementations of various iterative
methods both as libraries and as ``plain code''
\cite{Cun95,Bal95,Saa95,Fre96}.
HUTI differs from these implementations in the sense that it has been
specially tuned for the parallel architecture it is using and is
not meant to be a general purpose code.

Another reason for making this library is the
master's thesis of the author. HUTI has also been incorporated into
a software called ELMER which in turn is made for a
TEKES\footnote{Technology Development Center in Finland}
funded project VIRKE\footnote{VIRtauslaskentaohjelmiston KEhitt�minen}.

The name HUTI comes from Helsinki University of Technology (HUT) and
Iterative solvers.

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Iterative Methods}
\label{ch:methods}

This chapter should somehow present the characteristics of different
iterative methods and their usability in different problem areas.

\section{Overview of the Methods}
\section{Preconditioning}
\section{Stopping Criteria}
\section{Convergence}
\section{Parallelism}

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Using HUTI}
\label{ch:using}

\section{Naming Conventions}

All the HUTI routine names and variables start with the 

\begin{minipage}{1in}
\begin{center}
\bigskip
{\ttfamily huti\_} \\
or \\
{\ttfamily HUTI\_}
\bigskip
\end{center}
\end{minipage}

\noindent prefix. In the routine names the precision is denoted by an
appropriate character: {\ttfamily s} for
{\em single precision},
{\ttfamily d} for {\em double precision}, {\ttfamily c} for {\em complex}
and {\ttfamily z} for {\em double complex}.

% ------------------------------------------------------------------------

\section{Driver Routines}

The key idea in HUTI is that all iterator routines have the same calling
conventions regardless of the selected method.
All matrix related operations are done externally from the iterator
library. This means that the solver doesn't need to know the exact matrix
structure. Matrix can be stored for example in well-known Compressed
Row Storage (CRS) or Compressed Comlumn Storage (CCS) formats.
This eases the optimization of memory usage in each particular case.

In parallel setting it is on users responsibility to define the storage
convention for the distribution of matrices and vectors.
Well-known distribution concepts are for example block-cyclic decomposition
and domain based decompositions. More information can be found from
\cite{Kum94,Saa96}. See also chapter \ref{ch:examples} for an example
of user supplied distribution of data.

Solver routines are called in the following way:

\noindent
\begin{tabbing}
{\ttfamily CALL HUTI\_$*$\_{\em SOLVER\_TYPE}} {\ttfamily (} \= 
{\ttfamily X, RHS, IPAR, DPAR, WORK, MATVEC,} \\
\> {\ttfamily PCONDL, PCONDR, DOTPROD, NORM,} \\
\> {\ttfamily STOPC)} \\
\end{tabbing}

\noindent
\begin{tabular*}{\textwidth}{lll}
where & $*$ & is either {\ttfamily S, D, C} or {\ttfamily Z}
depending on the precision. \\
& {\ttfamily {\em SOLVER\_TYPE}} & is either {\ttfamily CG, CGS, BICGSTAB,
BICGSTAB\_2, QMR, TFQMR} or {\ttfamily GMRES} \\
& & depending on the method. \\
\end{tabular*}

Table \ref{table:solver-param} describes the parameters for solver
routines.

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
X	& vector of 		& Vector $x$, the current iterate \\
	& type $*$		& \\
RHS 	& vector of		& $b$, the Right Hand Side \\
	& type $*$		& \\
IPAR	& vector type		& IPAR-structure, see section \ref{sec:ipars} \\
	& integer		& \\
DPAR 	& vector of type	& DPAR-structure, see section \ref{sec:dpars} \\
	& double prec.		& \\
WORK 	& matrix of		& User allocated working array, size varies\\
	& type $*$		& depending on the method, see table \ref{table:ipar-input} \\
MATVEC 	& subroutine 		& User supplied external routine, \\
	&			& must perform matrix-vector product \\
PCONDL 	& subroutine 		& User supplied routine for left side \\
	&			& preconditioning \\
PCONDR 	& subroutine 		& User supplied routine for right side \\
	&			& preconditioning \\
DOTPROD	& function 		& Used supplied routine to perform dot \\
	&			& product \\
NORM 	& function 		& User supplied routine returning norm \\
	&			& of a vector \\
STOPC 	& function 		& User supplied routine to perform stopping\\
	&			& criteria testing \\
\hline\hline
\end{tabular*}
\caption{Parameters for the solver routines}
\label{table:solver-param}
\end{table}

The external routine {\ttfamily MATVEC} is the only needed routine when
calling a solver. It should perform matrix-vector product.
Using zeros in place of the other external routine names will force the
library to use default routines applicapable to the selected execution model.
For example {\em double complex} Conjugate Gradient method could be called
from a FORTRAN program in the following way:

\medskip
\noindent
{\ttfamily CALL HUTI\_Z\_CG (X, RHS, IPAR, DPAR, WORK, MATVEC, 0, 0, 0, 0, 0)}
\medskip

\noindent
where {\ttfamily X, RHS, IPAR, DPAR} are user supplied vectors and
{\ttfamily WORK} is the user allocated work space (an array) for the iterator.
In this case the library would use BLAS-1 calls for {\ttfamily DOTPROD} and
{\ttfamily NORM} if executed in serial execution mode. There would be no
preconditioning applied. The {\ttfamily IPAR} and {\ttfamily DPAR} structures
must contain user supplied information about the dimensions of the vectors
and work array and certain control information for the iterators.

\section{External Routines}

This section describes external routines that can be given as arguments
for the solver routine. Only {\ttfamily MATVEC} routine is required,
others routines are optional.

These routines are called from the solver and type of arguments and order
is presented for each routine.

The matrix $A$ can be stored in any format because it is totally on
the user's responsibility to make it available for the external routines.

The {\ttfamily IPAR} structure is passed to some of the external routines
and is used to carry on certain control variables from the solver routine.
In the {\ttfamily IPAR} structure there is for example the assumed type
of the matrix in external operation. This applies for both the
matrix-vector operation
$Au = v$ and preconditioning operations $M_{1}^{-1}u = v$ and
$M_{2}^{-1}u = v$.

\subsection{Matrix-Vector operation}

The arguments for the external matrix-vector operation {\ttfamily MATVEC}
are given in Table \ref{table:matvec-param}. This routine should perform
matrix-vector product. In the {\ttfamily IPAR} structure the iterator
provides information about the matrix form, should it be transposed or not.
Only non-transposed forms are used in CG, CGS, Bi-CGSTAB, TFQMR and GMRES
methods.
Only QMR will need a transposed matrix-vector product, that is $A^{T}u = v$.

The calling convention for the {\ttfamily MATVEC} is:

\bigskip
\noindent
{\ttfamily SUBROUTINE MATVEC ( U, V, IPAR )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
U	& vector of 		& Vector u in $Au = v$ \\
	& type $*$		& \\
V 	& vector of		& Vector v in $Au = v$ \\
	& type $*$		& \\
IPAR	& vector of type	& IPAR-structure, see section \ref{sec:ipars} \\
	& integer		& \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external MATVEC subroutine}
\label{table:matvec-param}
\end{table}

\subsection{Preconditioning}

The routines {\ttfamily PCONDL} and {\ttfamily PCONDR} should solve
the $M_{1}u = v$ and $M_{2}u = v$ respectively if preconditioning
matrix is splitted in two parts. If only one preconditioning matrix $M$ is
available, the {\ttfamily PCONDL} routine should solve $Mu = v$ and
{\ttfamily PCONDR} should not be supplied for the solver (the argument must
be zero).
 
The arguments for the external preconditioning operations
{\ttfamily PCONDL} and {\ttfamily PCONDR} are given in
Table \ref{table:pcond-param}. Preconditioning routines should use the
information in {\ttfamily IPAR} structure to apply transposed or
non-transposed solve when needed. Only QMR method will need the
$M^{-T}u = v$ operation.

The calling convention for the {\ttfamily PCONDL} is

\bigskip
\noindent
{\ttfamily SUBROUTINE PCONDL ( U, V, IPAR )}
\bigskip

\noindent
and for the {\ttfamily PCONDR}

\bigskip
\noindent
{\ttfamily SUBROUTINE PCONDR ( U, V, IPAR )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
U	& vector of 		& Vector u in $Mu = v$ \\
	& type $*$		& \\
V 	& vector of		& Vector v in $Mu = v$ \\
	& type $*$		& \\
IPAR	& vector of type	& IPAR-structure, see section \ref{sec:ipars} \\
	& integer		& \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external PCONDL and PCONDR subroutines}
\label{table:pcond-param}
\end{table}

\subsection{Global Dot Product}

The external function {\ttfamily DOTPROD} is supposed to perform global
dot product for two given vectors. In the serial case this routine
is by default the corresponding BLAS-1 routine. In the parallel case
this is the place to do global product using for example
{\ttfamily MPI\_ALLREDUCE} function to sum up the local products
computed using for example BLAS-1 routine.

The calling convention for the function {\ttfamily DOTPROD} is

\bigskip
\noindent
{\ttfamily FUNCTION DOTPROD ( NDIM, X, INCX, Y, INCY )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
NDIM	& integer		& Dimension of vectors X and Y\\
X	& vector of 		& Vector x in $x \cdot y$ \\
	& type $*$		& \\
INCX	& integer		& The increment for the elements of X \\
Y 	& vector of		& Vector y in $x \cdot y$ \\
	& type $*$		& \\
INCY	& integer		& The increment for the elements of Y \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external DOTPROD function}
\label{table:dotprod-param}
\end{table}

The function {\ttfamily DOTPROD} must return a value of the same type as the
argument vectors.

\subsection{Global Vector Norm}

The external routine {\ttfamily NORM} is used to produce the global
vector norm, usually the vector 2-norm $\|x\|_{2}$. In the serial case
this routine is by default the corresponding BLAS-1 routine. In parallel
case this is very similar as {\ttfamily DOTPROD} function.

The calling convention for the function {\ttfamily NORM} is

\bigskip
\noindent
{\ttfamily FUNCTION NORM ( NDIM, X, INCX )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
NDIM	& integer		& Dimension of vector X \\
X	& vector of 		& Vector x in $\|x\|$ \\
	& type $*$		& \\
INCX	& integer		& The increment for the elements of X \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external NORM function}
\label{table:norm-param}
\end{table}

The function {\ttfamily NORM} must return a value that is either
real if X is single precision ({\em real or complex}) or double if X is
double precision ({\em double precision or double complex}).

\subsection{Stopping Criterion}

Stopping criterion can be selected from the built-in stopping criteria
or it can be supplied by the user. Built-in alternatives are
listed in table \ref{table:ipar-input}.

The calling convention for the user supplied function {\ttfamily STOPC} is

\bigskip
\noindent
{\ttfamily FUNCTION STOPC ( X, B, R, IPAR, DPAR )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
X	& vector of 		& Current iterate $x_{n}$ \\
	& type $*$		& \\
B	& vector of 		& The original right-hand side \\
	& type $*$		& \\
R	& vector of 		& Current residual vector $r_{n}$ \\
	& type $*$		& \\
IPAR	& vector of type	& IPAR-structure, see section \ref{sec:ipars} \\
	& integer		& \\
DPAR	& vector of type	& DPAR-structure, see section \ref{sec:dpars} \\
	& double precision	& \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external STOPC function}
\label{table:stopc-param}
\end{table}

The function {\ttfamily STOPC} must return a value of same type as
the {\ttfamily NORM} function for selected precision. See the previous
section.

The returned value should describe somehow how close the current iterate is
the user supplied tolerance. It will be tested against the tolerance and
printed if requested.

% ------------------------------------------------------------------------

\section{Iteration Parameters}

\subsection{IPAR -Structure}
\label{sec:ipars}

The {\ttfamily IPAR} structure is used to control the progress and behaviour
of the iterator routine and to get status back from it. {\ttfamily IPAR}
is also passed further to some of the user supplied routines.

Input parameters are described in table \ref{table:ipar-input} along with
their default values, output parameters are in table \ref{table:ipar-output}.

A more detailed description of the various parameters and output values
for each solver is listed on the corresponding reference pages.

\begin{table}[H]

\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Element} & {\bfseries Description} & {\bfseries Default} \\
\hline\hline
	& {\em General parameters} 				& \\
\hline
1	& Length of the IPAR structure 				& 50 \\
2	& Length of the DPAR structure 				& 10 \\
3	& Leading dimension of the matrix (and vectors)		& \\
4	& Number of vectors in the {\ttfamily work} array: 	& \\
	& CG: 4 						& \\
	& CGS: 7 						& \\
	& Bi-CGSTAB: 8						& \\
	& Bi-CGSTAB\_2: 8					& \\
	& QMR: 14 						& \\
	& TFQMR: 10 						& \\
	& GMRES: 7 + number of restart vectors			& \\
5 	& Number of iterations between debug output 		& 0 \\
6 	& Assumed matrix type in external operations		& \\
	& 0: Matrix must {\em not} be transposed 		& \\
	& 1: Matrix must be transposed 				& \\
\hline
	& {\em Iteration parameters} 				& \\
\hline
10 	& Maximum number of iterations allowed 			& 5000 \\
12 	& Stopping criterion used: 				& 0 \\
	& ($\epsilon$ is the tolerance given by the user, see table \ref{table:dpar}) & \\
	& 0: $\|r_{n}\| < \epsilon$ 				& \\
	& 1: $\|r_{n}\| < \epsilon \|b\| $ 			& \\
	& 2: $\|z_{n}\| < \epsilon$ 				& \\
	& 3: $\|z_{n}\| < \epsilon \|b\| $ 			& \\
	& 4: $M^{-1}\|z_{n}\| < \epsilon M^{-1}\|b\| $ 		& \\
	& 5: $\|x_{n} - x_{n-1}\| < \epsilon $ 			& \\
	& 6: {\em upper bound} $< \epsilon$ (only with TFQMR)	& \\
	& 10: Use the user supplied routine {\ttfamily STOPC} 	&\\
13	& Preconditioning technique used: 			& 0 \\
	& 0: None 						& \\
	& 1: Right preconditioning 				& \\
	& 2: Left preconditioning 				& \\
	& 3: Symmetric preconditioning 				& \\
14 	& Initial $x_{0}$, starting vector: 			& 0 \\
	& 0: Random $x_{0}$					& \\
	& 1: User supplied $x_{0}$, vector in {\ttfamily XVEC} 	& \\
15	& Number of restart vectors in GMRES(m)		 	& 1 \\
\hline
	& {\em Parallel environment parameters} 		& \\
\hline
20 	& Processor identification number for specific process 	& \\
21 	& Number of processors 					& 1 \\
\hline\hline
\end{tabular*}

\caption{IPAR-structure, input parameters}
\label{table:ipar-input}
\end{table}

\begin{table}[H]

\begin{tabular*}{\textwidth}{ll}
\hline\hline
{\bfseries Element} & {\bfseries Description} \\
\hline\hline
	& {\em General parameters} \\
\hline
30 	& Status information: \\
	& 0: No change \\
	& 1: Iteration converged \\
	& 2: Maximum number of iterations reached \\
	& 10: QMR breakdown in $\rho$ or $\psi$ \\
	& 11: QMR breakdown in $\delta$ \\
	& 12: QMR breakdown in $\epsilon$ \\
	& 13: QMR breakdown in $\beta$ \\
	& 14: QMR breakdown in $\gamma$ \\
	& 20: CG breakdown in $\rho$ \\
	& 25: CGS breakdown in $\rho$ \\
	& 30: TFQMR breakdown in $\rho$ \\
	& 35: Bi-CGSTAB breakdown in $\rho$ \\
	& 36: Bi-CGSTAB breakdown in $\|s\|$ \\
	& 37: Bi-CGSTAB breakdown in $\omega$ \\
31 	& Number of iterations run through \\
\hline\hline
\end{tabular*}

\caption{IPAR-structure, output parameters}
\label{table:ipar-output}
\end{table}

\subsection{DPAR -Structure}
\label{sec:dpars}

For parameters of type {\em double precision} there is a structure
called {\ttfamily DPAR}. Table \ref{table:dpar} describes the
elements of this structure.

\begin{table}[h]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Element} & {\bfseries Description} & {\bfseries Default} \\
\hline\hline
	& {\em General parameters} & \\
\hline
1 	& Tolerance used by stopping criterion & $10e^{-6}$ \\
\hline\hline
\end{tabular*}
\caption{DPAR-structure}
\label{table:dpar}
\end{table}

% ------------------------------------------------------------------------

\section{Header Files}
\subsection{{\ttfamily huti\_fdefs.h} and {\ttfamily huti\_defs.h}}

There are header files in preprocessor format for both Fortran90 and C
languages. These header files include definitions for all of the
variables described in
tables \ref{table:ipar-input}, \ref{table:ipar-output} and \ref{table:dpar}.
There are also definitions for possible flags of certain variables and
default values.

The user should use the named definitions by including header file
via {\ttfamily \#include ``huti\_defs.h''} for C defines and 
{\ttfamily \#include ``huti\_fdefs.h''} for Fortran90 defines. In that
way the compatibility is guaranteed also with the later versions of
the library.

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Examples}
\label{ch:examples}

% ------------ End of the main text

% ------------ Bibliography

\begin{thebibliography}{1}

	\bibitem{Gol89}
	Gene H. Golub and Charles F. van Loan, {\em Matrix Computations},
	second edition, The Johns Hopkins University Press, 1993.

        \bibitem{Gei93}
        Al Geist et al., {\em PVM 3 User's Guide and Reference Manual},
        Oak Ridge National Laboratory, Oak Ridge Tennessee, May 1993.

        \bibitem{Bar93}
        Richard Barrett et al.,{\em Templates for the Solution of Linear 
        Systems: Building Blocks for Iterative Methods}, SIAM, 1993

        \bibitem{Fre91}
        Roland W. Freund and No\"el Nachtigal, {\em QMR: a Quasi-Minimal
        Residual Method for Non-Hermitian Linear Systems}, Numer. Math. 
        60, 315-339, 1991

        \bibitem{Fre93a}
        Roland W. Freund, {\em An Implementation of the Look-Ahead
	Lanczos Algorithm for Non-Hermitian Matrices},
        SIAM J. Sci. Comput., Vol. 14, No. 1, pp. 137-158, January 1993

        \bibitem{Fre93b}
        Roland W. Freund, {\em A Transpose-Free Quasi-Minimal Residual
	Algorithm for Non-Hermitian Linear Systems}, 
        SIAM J. Sci. Comput., Vol. 14, No. 2, pp. 470-482, March 1993

        \bibitem{Fre94}
        Roland W. Freund and No\"el Nachtigal, {\em An Implementation of the 
        QMR Method Based on Coupled Two-Term Recurrences},
        SIAM J. Sci. Comput., Vol. 15, No. 2, pp. 313-337, March 1994

	\bibitem{Mpi94}
	{\em MPI: A Message-Passing Interface Standard}, Message
	Passing Interface Forum, April 1994

	\bibitem{Gro94}
	William Gropp, Ewing Lusk and Anthony Skjellum, {\em Using MPI,
	Portable Parallel Programming with the Message-Passing Interface},
	The MIT Press, 1994

	\bibitem{Cun95}
	Rudnei Dias da Cunha and Tim Hopkins, {\em PIM 2.0, The Parallel
	Iterative Methods package for Systems of Linear Equations, User's
	Guide}, ftp://unix.hensa.ac.uk/pub/misc/netlib/pim/ug20.ps.gz, 1995

	\bibitem{Buc96}
	H. Martin B\"{u}cker and Manfred Sauren, {\em A Parallel Version
	of the Unsymmetric Lanczos Algorithm and its Application to QMR},
	Forschungszentrum J\"{u}lich, March 1996

	\bibitem{Saa96}
	Yousef Saad, {\em Iterative Methods for Sparse Linear Systems},
	PWS Publishing Company, 1996

	\bibitem{Kor95}
	Samuel Kortas and Philippe Angot, {\em A Practical and Portable
	Model of Programming for Iterative Solvers on Distributed Memory
	Machines}, Parallel Computing, Vol. 22, No. 4, June 1996

	\bibitem{Jon95}
	Mark T. Jones and Paul E. Plassman, {\em BlockSolve95 Users Manual:
	Scalable Library Software for the Parallel Solution of Sparse
	Linear Systems}, Argonne National Laboratory ANL-95/48,
	December 1995

	\bibitem{Bal95}
	S. Balay, W. Gropp, L. C. McInnes and B. Smith, {\em PETSc 2.0
	Users Manual}, Argonne National Laboratory ANL-95/11, 1995

	\bibitem{Saa95}
	Yousef Saad and Andrei V. Malevsky, {\em P-SPARSLIB: A Portable
	Library of Distributed Memory Sparse Iterative Solvers},
	University of Minnesota, Department of Computer Science, May 1995.

	\bibitem{Kum94}
	Vipin Kumar, Ananth Grama, Anshul Gupta and George Karypis,
	{\em Introduction to Parallel Computing, Design and Analysis
	of Algorithms}, The Benjamin/Cummings Publishing Company Inc.,
	1994.

	\bibitem{Fre96}
	Roland W. Freund and No\"el Nachtigal, {\em QMRPACK: A Package
	of QMR Algorithms}, ACM Transactions on Mathematical Software,
	Vol 22, No. 1, pp. 46-77, March 1996

\end{thebibliography}

\label{page:last}
\end{document}
back to top