Skip to content

project-asgard/kronmult993

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kronmult

This library implements the kronmult_batched function which computes output[K] += kron(matrix_list[K]) * input[K] (k being an index in a batch) which is a batch version of the matrix product of the kronecker product of several matrices and a given vector.

We provide efficient parallel implementations, for both CPU (using OpenMP) and GPU (using CUDA), that have been tuned for the needs of ASGarD. In particular, we expect our inputs to be * col-major* matrices and some output pointers to overlap.

Theory

We implement a variant of the backward version of algorithm 993 (Algorithm 993: Efficient Computation with Kronecker Products), chosen to perform well on col-major matrices and take into account the fact that the right side is a vector and not a matrix ( thus not needing an additional transposition).

We highly recommend reading ON KRONECKER PRODUCTS, TENSOR PRODUCTS AND MATRIX DIFFERENTIAL CALCULUS by Stephen Pollock to get more familiar with the linear algebra and reshaping tricks used in the algorithm.

Installation

You can use either the kronmult_omp (CPU paralelism) or the kronmult_gpu (GPU paralelism) CMake target to link this library. See the corresponding folders for further information on both instalation and implementations.

Usage

Include either kronmult.hpp (CPU) or kronmult.cuh (GPU) to get access to the kronmult_batched function which computes output[K] += kron(matrix_list[K]) * input[K] for 0 <= k < batchCount assuming that some output pointers will be equal (thus, requiring a thread-safe addition).

void kronmult_batched(int const matrix_number, int const matrix_size, T const * const matrix_list_batched[], int const matrix_stride,
                      T* input_batched[], T* output_batched[], T* workspace_batched[], int const nb_batch)

Inputs

  • matrix_list_batched is an array of nb_batch*matrix_count pointers to square matrices of size matrix_size by matrix_size and stride matrix_stride
  • input_batched is an array of nb_batch pointers to array of size matrix_size^matrix_count
  • output_batched is an array of nb_batch pointers to array of size matrix_size^matrix_count, to which the outputs will be added
  • workspace is an array of nb_batch pointers to array of size matrix_size^matrix_count, to be used as workspaces

Warnings

  • input_batched and workspace_batched will be used as temporary workspaces and thus modified
  • the matrices are assumed to be stored in col-major order
  • the sizes are assumed to be correct
  • the gpu version assumes that all the arrays have already been allocated on GPU (using cudaMalloc for example)