Combinatorial BLAS
Version 1.4

This material is based upon work supported by the National Science Foundation under Grant No. 0709385 and by the Department of Energy, Office of Science, ASCR, Contract No. DE-AC02-05CH11231. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF) or the Department of Energy (DOE). This software is released under the MIT license described here.
The Combinatorial BLAS is an extensible distributed-memory parallel graph library offering a small but powerful set of linear algebra primitives specifically targeting graph analytics.
Download
Requirements: You need a recent C++ compiler (gcc version 4.4+, Intel version 11.0+, or compatible; please avoid Intel 13.x with -std=c++11), a compliant MPI implementation, and a C++11 standard library (the libstdc++ that comes with g++ has the required components). If yours does not, you can use the Boost library and pass the -DCOMBBLAS_BOOST option to the compiler (cmake will do this automatically for you); it will work if you just add Boost's path to $INCADD in the makefile. The recommended tarball uses the CMake build system, but only to build the documentation and unit tests, and to automate installation. Chances are you are not going to use any of our sample applications as-is, so you can modify them, or imitate their structure, to write your own application using just the header files. There are very few binary libraries to link against, and no configured header files. Like many high-performance C++ libraries, the Combinatorial BLAS is mostly templated. CombBLAS works successfully with the GNU, Intel, and PGI compilers, using the OpenMPI, MVAPICH, Cray (MPICH-based), and Intel MPI libraries.
Documentation: This is a reference implementation of the Combinatorial BLAS library in C++/MPI. It is purposefully designed for distributed-memory platforms, though it also runs on uniprocessor and shared-memory (e.g., multicore) platforms. It contains efficient implementations of novel data structures and algorithms, as well as reimplementations of some previously known ones for convenience. More details can be found in the accompanying paper [1]. One of the distinguishing features of the Combinatorial BLAS is its decoupling of parallel logic from the sequential parts of the computation, making it possible to implement new formats and plug them in without changing the rest of the library.
The implementation supports both formatted and binary I/O. The latter is much faster but not human-readable. Formatted I/O uses a tuples format very similar to the Matrix Market format. We encourage in-memory generators for faster benchmarking. A port to the University of Florida Sparse Matrix Collection is under construction. More info on I/O formats is available here.
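As an illustration of the tuples style of formatted I/O (shown here with Matrix Market coordinate conventions; the exact CombBLAS header line may differ), a 3-by-3 matrix with four nonzeros could be written as one (row, column, value) triple per line, with 1-based indices:

```
%%MatrixMarket matrix coordinate real general
3 3 4
1 1 1.0
2 3 2.5
3 1 -3.0
3 3 4.0
```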
The main data structure is a distributed sparse matrix (SpParMat<IT,NT,DER>), which has-a sequential sparse matrix (SpMat<IT,NT>) that can be implemented in various ways, as long as it supports the interface of the base class (currently: SpTuples, SpCCols, SpDCCols).
For example, the standard way to declare a parallel sparse matrix A that uses 32-bit integers for indices, floats for numerical values (nonzeros), and SpDCCols<int,float> for the underlying sequential matrix operations is:

SpParMat<int, float, SpDCCols<int, float>> A;
The repetition of the int and float types inside SpDCCols< > is a direct consequence of C++'s static typing, and is akin to some STL constructs such as vector<int, SomeAllocator<int>>. If your compiler supports "auto", you can have the compiler infer the type.
Sparse and dense vectors can be distributed either along the diagonal processors or across all processors. The latter is more space-efficient and provides much better load balance for SpMSV (sparse matrix-sparse vector multiplication), but the former is simpler and perhaps faster for SpMV (sparse matrix-dense vector multiplication).
New in version 1.4:
New in version 1.3:
The supported operations (a growing list) are:
All the binary operations can be performed on matrices with different numerical value representations. The type-traits mechanism takes care of automatic type promotion and automatic MPI data type determination.
Some features it uses:
Important Sequential classes:
Important Parallel classes:
Applications implemented using Combinatorial BLAS:
Performance results of the first two applications can be found in the design paper [1]; Graph 500 results are in a recent BFS paper [4]. The most recent sparse matrix indexing, assignment, and multiplication results can be found in [5]. Performance of filtered graph algorithms (BFS and MIS) is reported in [7].
A subset of test programs demonstrating how to use the library (under ReleaseTests):
Citation: Please cite the design paper [1] if you end up using the Combinatorial BLAS in your research.