Original CSB paper (SPAA 2009)

This code primarily targets multicore and SMP-like machines, although it also works serially.
It is written in C++ and parallelized using Cilk++.

Library and drivers in a compressed tar file
README
Example makefile
An example input in (compressed) ASCII and in (compressed) binary


2011 Release: We have updated the original code with bitmasked register blocking and symmetry support.
We also made minor fixes that affected performance on matrices with a skewed nonzero structure.
Because CilkArts has since been acquired by Intel, this new code runs natively on any Intel compiler (version 12.0.0 or later).
Download the new code here.
Paper describing the bitmasked register blocks and the symmetric algorithm (IPDPS 2011)

How to run it?
make parspmv (the new tarball includes sample makefiles as well)
./parspmv ../BinaryMatrices/kkt_power.bin nosym binary (using the binary format for fast I/O)
./parspmv ../TextMatrices/kkt_power.mtx nosym text (using the matrix market format)
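
For readers unfamiliar with the text input, the following is a minimal sketch of reading a Matrix Market coordinate file into (row, column, value) triplets. It is not the reader shipped in the tarball: the names CooMatrix, Triplet, and read_matrix_market are hypothetical, and the sketch assumes every entry carries a numeric value. It reads only the entries stored in the file, so for a symmetric file it does not expand the missing triangle.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical containers for this sketch; indices are stored 0-based.
struct Triplet { std::size_t row, col; double val; };

struct CooMatrix {
    std::size_t rows, cols;
    std::vector<Triplet> entries;
};

// Parse a Matrix Market "coordinate" stream whose entries have values.
// Returns false on malformed or unsupported input.
bool read_matrix_market(std::istream& in, CooMatrix& out) {
    std::string line;
    // The first line is the banner, e.g.
    // "%%MatrixMarket matrix coordinate real general".
    if (!std::getline(in, line) || line.compare(0, 14, "%%MatrixMarket") != 0)
        return false;
    if (line.find("coordinate") == std::string::npos) return false;

    // Skip comment lines (starting with '%') and blank lines.
    bool have_size = false;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '%') continue;
        have_size = true;
        break;
    }
    if (!have_size) return false;

    // Size line: number of rows, columns, and stored entries.
    std::istringstream size_line(line);
    std::size_t nnz = 0;
    if (!(size_line >> out.rows >> out.cols >> nnz)) return false;

    out.entries.clear();
    out.entries.reserve(nnz);
    for (std::size_t k = 0; k < nnz; ++k) {
        std::size_t i = 0, j = 0;
        double v = 0.0;
        if (!(in >> i >> j >> v)) return false;
        // File indices are 1-based; reject anything out of range.
        if (i < 1 || j < 1 || i > out.rows || j > out.cols) return false;
        out.entries.push_back({i - 1, j - 1, v});
    }
    return true;
}
```

Parsing text this way is much slower than loading a binary dump, which is why the binary format is offered for fast I/O.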

What do those numbers mean?
BiCSB: Original CSB code with minor performance fixes, nonsymmetric and without register blocking. Quite robust.
BmCSB: Bitmasked register blocks in action. Modify RBDIM in utility.h to try different blocking sizes (8x8, 4x4, etc.). May perform better than BiCSB.
CSC: Serial CSC implementation. For reference only.
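
To illustrate the idea behind the bitmasked register blocks, the sketch below stores one 8x8 block as a 64-bit occupancy mask plus a packed array holding only its nonzero values, then multiplies it by an 8-element slice of the input vector. This is a simplified illustration, not the library's implementation: the names MaskedBlock, make_block, and block_multiply, the row-major bit order, and the fixed constant kDim (standing in for RBDIM = 8) are all assumptions made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kDim = 8;  // assumed block dimension, standing in for RBDIM = 8

// Hypothetical register block: bit (kDim * r + c) of `mask` is set when
// entry (r, c) is nonzero; `vals` holds only those nonzeros, in bit order.
struct MaskedBlock {
    std::uint64_t mask = 0;
    std::vector<double> vals;
};

// Build a block from a dense row-major 8x8 array, dropping explicit zeros.
MaskedBlock make_block(const double (&dense)[kDim][kDim]) {
    MaskedBlock b;
    for (int r = 0; r < kDim; ++r) {
        for (int c = 0; c < kDim; ++c) {
            if (dense[r][c] != 0.0) {
                b.mask |= std::uint64_t{1} << (kDim * r + c);
                b.vals.push_back(dense[r][c]);
            }
        }
    }
    return b;
}

// y[0..7] += B * x[0..7], touching only the positions flagged in the mask.
void block_multiply(const MaskedBlock& b, const double* x, double* y) {
    std::size_t next = 0;  // position of the next packed value
    for (int bit = 0; bit < kDim * kDim; ++bit) {
        if (b.mask & (std::uint64_t{1} << bit)) {
            assert(next < b.vals.size());
            y[bit / kDim] += b.vals[next++] * x[bit % kDim];
        }
    }
}
```

The mask replaces per-nonzero row and column indices inside the block with a single 64-bit word, and no zero padding is stored in the value array. A tuned kernel would process the mask with vector instructions rather than a bit-by-bit loop.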

New performance numbers on Intel's Sandy Bridge (BiCSB numbers from 2011 version only)

The X numbers on top of the bars are speedups with respect to the serial CSC code.
*: CSB was run with 6 workers. In all other cases, the CSB code was run with 12 workers.
We thank Jong-Ho Byun for the MKL numbers. While MKL definitely runs in parallel, we don't know whether it restructures its data structures internally for optimization.
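
For context on the baseline those speedups are measured against, a serial sparse matrix-vector multiply over compressed sparse column (CSC) storage can be sketched as follows. This is a generic textbook version, not the CSC code in the tarball; the names CscMatrix and csc_spmv are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSC container: the nonzeros of column j occupy positions
// col_ptr[j] .. col_ptr[j + 1] - 1 of row_idx and vals.
struct CscMatrix {
    std::size_t rows, cols;
    std::vector<std::size_t> col_ptr;  // size cols + 1
    std::vector<std::size_t> row_idx;  // size nnz, 0-based row indices
    std::vector<double> vals;          // size nnz
};

// Serial y = A * x, scattering each column's contribution into y.
std::vector<double> csc_spmv(const CscMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.cols);
    assert(a.col_ptr.size() == a.cols + 1);
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t j = 0; j < a.cols; ++j) {
        for (std::size_t k = a.col_ptr[j]; k < a.col_ptr[j + 1]; ++k) {
            y[a.row_idx[k]] += a.vals[k] * x[j];
        }
    }
    return y;
}
```

Because different columns can write to the same entry of y, this loop cannot be split across columns without synchronization or private copies of y, which is one reason a plain CSC kernel serves here as a serial reference.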

For certain matrices, optimizations from the IPDPS 2011 paper significantly improve over these results. For instance, take the Wind tunnel matrix:
CSB: 5577 Mflop/s 
CSB + bitmasked register blocking: 7352 Mflop/s
Symmetric CSB: 6961 Mflop/s
Symmetric CSB + bitmasked register blocking: 7780 Mflop/s 
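
Expressed as ratios over the plain CSB rate, the figures above correspond to roughly 1.32x (bitmasked register blocking), 1.25x (symmetric CSB), and 1.40x (both combined). The small helper below reproduces that arithmetic; the function name relative_rate is hypothetical and introduced only for this illustration.

```cpp
#include <cassert>

// Ratio of a measured rate to a baseline rate, both in Mflop/s.
double relative_rate(double rate, double baseline) {
    assert(baseline > 0.0);
    return rate / baseline;
}

// Wind tunnel matrix, using the rates listed above with CSB as baseline:
//   relative_rate(7352.0, 5577.0)  CSB + bitmasked register blocking
//   relative_rate(6961.0, 5577.0)  symmetric CSB
//   relative_rate(7780.0, 5577.0)  symmetric CSB + bitmasked register blocking
```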


© Copyright by Aydin Buluc
Code released under MIT License.
Please cite the appropriate paper(s) if you end up using the code for your research.