Dual Randomized Kaczmarz

DRK is an algorithm for solving Laplacian linear systems in the dual space (updating edges) instead of the primal space (updating vertices). We have written a toy version of this code using Python+Cython, mostly to analyze work required for convergence. This implementation is on github and can be found here.


MarkerReduce is an algorithm for efficient data reduction of genetic marker data. The goal is to reduce the set of homozygous markers to a set of bins corresponding to unique locations on the genetic map. By reducing the data in this way, a more accurate map may be constructed from the much less noisy, more complete (in terms of less missing data) bins. The code is written in C++ and is available here: MarkerReduce code


BubbleCluster is an algorithm used to efficiently cluster genetic markers into linkage groups. The current version assumes a DH input population. However, the only change required to extend the algorithm to other population types is a modified returnLOD function, which can be altered to produce a LOD score for any population type. BubbleCluster is written in C++ and is available here: BubbleCluster Code

The Knowledge Discovery Toolbox (KDT)

The Knowledge Discovery Toolbox (KDT) provides domain experts with a simple interface to analyze very large graphs quickly and effectively without requiring knowledge of the underlying graph representation or algorithms. The current version provides a selection of functions on attributed semantic graphs, directed graphs, and sparse matrices, from simple exploratory functions to complex algorithms. Because KDT is open-source, it can be customized or extended by interested (and intrepid) users.

The Combinatorial BLAS

The Combinatorial BLAS consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. This is a distributed memory reference implementation that implements scalable sparse (and some dense) matrix operations that is used to implement graph algorithms such as betweenness centrality and Markov clustering. For more information, online documentation, references and the library itself, go to the project webpage.

Compressed Sparse Blocks (CSB)

CSB is a new sparse matrix storage format that doesn't favor row over columns (or vice verse). It allows fast parallel computation of both sparse matrix times dense vector (SpMV) and sparse matrix transpose times dense vector (SpMVT) operations in multicore architectures. This library is a sample implementation using Intel Cilk Plus.  

All-Pairs Shortest Paths on GPU

Here is a CUDA program that computes the distances for all-pairs shortest paths in a dense directed graph on Graphical Processing Units. The algorithm illustrates the methods described in a parallel computing paper


PSORT [download] is a parallel sorting code for distributed and shared memory architectures. It uses MPI for communication, and is designed to minimize the volume of communication. It has been tested on terabyte datasets with up to 256 processors. It does not use sampling to locate splitters, and hence no prior knowledge of the probability distribution is required. It provides several options for splitting, sequential sorting and merging. A parallel sample sort is also included.

PSORT powers the parallel sorting functionality in Star-P. PSORT is made available under the MIT license.


GeomOct is a Geometry Generator, which splits a cubic region into octrees. It is being used by our group to evaluate preconditioners for various geometries. The code is written in C++, using templates. It is modular, and meant to be fairly extensible. We hope that you find the code useful, and that it saves you some effort. GeomOct documentation, generated through Doxygen, is viewable online. Code coverage results are also viewable online. The current set of features includes:

  • Generation of trees using an arbitrary splitting criterion.
  • Visualization with PyMol. The code generates .pdb files for use with PyMol.
  • Storage of data on both cells and corners.
  • Methods to iterate over all corners of the octrees.
  • Methods to iterate over all nodes of the octrees.
  • Unit tests to verify behavior. Needs the CppUnit library.
  • No memory leak, or strange memory access. Regularly checked against Valgrind.
  • Complete documentation, written using doxygen.
  • Well-documented, readable code.
  • Lots of comments.
  • Automated building using autoconf and automake. Should run under nearly any platform.
  • Extremely well tested code. Code coverage checked with lcov automatically.
  • Freely distributable. Code is available under the GNU Public License, version 2.

Download the latest version of the code: octree-0.80.tar.gz. octree-0.80.zip. This code was last updated on Mon Jul 2 11:57:05 PDT 2007. We welcome comments, bug reports, and bug fixes. This code is under development right now, so feel free to mail me feature requests. Please send comments to Vikram at This email address is being protected from spambots. You need JavaScript enabled to view it..

Videos of a cut away 3D octree mesh (of a sphere) are available here. We also have a mesh of a grid obtained by generating the vaidya preconditioner on the symmetric matrix here.


SSCA#2 is a part of the Synthetic Scalable Concise Applications benchmkark suite. The SSCAs are a new set of benchmarks designed by the DARPA/DOE High Productivity Computer Systems program to complement existing benchmarks. It is a graph analysis benchmark comprising of a data generator and 4 kernels which operate on the graph. The benchmark is designed to have very little locality so that the memory subsystems are tested. In case of a parallel implementation, the non-locality causes a lot of remote memory lookups.

cSSCA#2 is a concise implementation of SSCA#2 in MATLAB and Star-P - a parallel implementation of the matlab programming language.