Sequential/Parallel SpGEMM Library
- Authors:
- Aydin Buluc
This is a Sparse Matrix Multiplication Library It targets uniprocessor, shared-memory (such as multicores), and distributed memory platforms.
It contains efficient implementations of novel algorithms as well as reimplementations of some previously known algorithms (for comparison).
It is written in C++ using:
Sequential classes:
- SparseTriplets : uses triples format to store matrices, only used for input/output and intermediate tasks (such as sorting) Does not contain a multiplication routine.
- SparseColumn : multiplication is similar to Matlab's, holds CSC.
- SparseComp : holds both CSC and CSR.
- SparseDComp : implements Alg 1A [1], holds both DCSC and DCSR.
- SparseDColumn : implements Alg 1B and Alg 2 [1] (depending on the macro in src/SpDefines.h), holds DCSC.
Parallel classes:
- SparseThreaded : shared memory implementation. Uses Boost.Threads for multithreading.
Uses a logical 2D block decomposition of sparse matrices.
Asyncronous (to migitate the severe load balancing problem)
Lock-free (since it relies on the owner computes rule, i.e.
is updated by only
) - SparsePar2D : synchronous distributed memory implementation (i.e. executes in
stages)
Based on SUMMA
Each processor stores its submatrix (block) in SparseDComp format
Uses Boost.MPI for communication
Slightly more scalable but much slower (a factor of 2) than SparseSumSync. - SparseSumSync : similar to SparseSumSync in principle, but uses SparseDColumn for storing submatrices.
- SparseSumAsync : asyncronous distributed memory implementation (i.e. no broadcasting or stages)
If a processor finished its update on
using
,
, it requests
,
from their owners right away.
A ServerThread is used for serving send_matrix requests to provide asyncronous computation.
Threading with 2 threads per processor (1 for compute, 1 for communicate) severely degrades the performance.
OpenMPI does not really work with multithreaded environments.
MPICH2 does work but it uses polling to minimize latency (no thread_yield), therefore even more degrading the performance. - SparseSumGas : asyncronous distributed memory implementation using GASNET .
For internal installation and implementation tricks, consult http://editthis.info/cs240aproject/Main_Page
Test programs demonstrating how to use the library:
[1] Aydin Buluç and John R. Gilbert, “On the Representation and Multiplication of Hypersparse Matrices”. The 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS2008), Miami, FL, April 14-18, 2008
Generated on Mon Jan 28 23:04:06 2008 for Parallel SpGEMM Library by
1.4.7