# cublas

Here are 81 public repositories matching this topic...

cupy / cupy

NumPy & SciPy for GPU

python gpu numpy cuda cublas scipy tensor cudnn rocm cupy cusolver nccl curand cusparse nvrtc cutensor nvtx cusparselt

Updated Jun 8, 2024
Python

lebedov / scikit-cuda

Python interface to GPU-powered libraries

python gpu cuda cublas blas lapack numerical cufft pycuda cusolver

Updated Oct 15, 2023
Python

coreylowman / cudarc

Safe Rust wrapper around the CUDA toolkit

rust gpu cuda cublas gpu-acceleration cuda-kernels cuda-toolkit curand cuda-programming nvrtc

Updated Jun 6, 2024
Rust

Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions.

gpu cuda cublas nvidia gemm matrix-multiply tensor-core hgemm

Updated Nov 7, 2023
Cuda

Bruce-Lee-LY / cuda_hook

Hooks CUDA-related dynamic libraries using automated code generation tools.

Updated Dec 12, 2023
C

hma02 / cublasgemm-benchmark

Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm

benchmarking gpu cuda cublas gemm gpu-performance

Updated May 20, 2022
Cuda

rbaygildin / learn-gpgpu

Algorithms implemented in CUDA + resources about GPGPU

gpu opencl parallel-computing cuda image-processing cublas nvidia gpgpu gpu-computing pycuda curand

Updated Jan 18, 2022
Cuda

hma02 / cublasHgemm-P100

Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs based on cublasHgemm

gpu cublas precision gemm half-precision float16 p100 v100

Updated Aug 20, 2019
Cuda

codingonion / awesome-cuda-tensorrt-fpga

🔥🔥🔥 A collection of some awesome public NVIDIA CUDA, cuBLAS, cuDNN, TensorRT, AMD ROCm and FPGA projects.

awesome fpga mojo gpu cuda pytorch cublas nvidia yolo blas web3 hdl cudnn tensorrt zkp yolov5 large-language-models llm llama3 yolov10

Updated Jun 6, 2024

rxwei / cuda-swift

Parallel Computing Library for Linux and macOS & NVIDIA CUDA Wrapper

swift gpu parallel cuda cublas

Updated Mar 27, 2017
Swift

devincody / DSAbeamformer

Real-time GPU Beamformer for DSA110 written in C/CUDA

gpu multiprocessing openmp cuda cublas radio-astronomy beamforming

Updated May 21, 2019
Jupyter Notebook

eth-cscs / Tiled-MM

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

amd gpu cuda cublas nvidia matrix-multiplication rocm cublasxt matmul rocblasxt rocblas

Updated May 29, 2024
C++

Smorodov / nano_bfm

Basel morphable face model mesh and texture generator using GPU.

cublas face-reconstruction face-morphing bfm face-generation basel-face-model 3d-face-reconstruction morphable-model face-generator face-morphable-model

Updated Sep 14, 2020
C

conradsnicta / bandicoot-code

Bandicoot: C++ library for GPU linear algebra & scientific computing - https://coot.sourceforge.io

c-plus-plus machine-learning gpu opencl linear-algebra cuda cublas matrix-functions scientific-computing gpu-acceleration armadillo opencl-kernels cuda-kernels gpu-computing linear-algebra-library matrix-library clblas cusolver gpu-accelerated-library

Updated Jul 19, 2023

sasagawa888 / deeppipe2

Deep learning library using GPU (CUDA/cuBLAS)

elixir deep-learning gpu cuda cublas

Updated Sep 18, 2021
Elixir

mz24cn / gemm_optimization

This repository targets performance optimization of the OpenCL gemm function. It compares the sgemm performance of several libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, vendor hardware, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided and work out of the box.

opencl cublas matrix-multiplication blas gemm mkl clblas sgemm clblast gemm-optimization clnet

Updated Mar 28, 2019
C

Bruce-Lee-LY / cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

gpu cuda cublas nvidia gemm gemv matrix-multiply tensor-core hgemm cuda-core hgemv

Updated Nov 30, 2023
Cuda

TApplencourt / mkl-verbose-toolkit

Tools to run and parse MKL verbose mode

cublas mkl oneapi

Updated Jun 28, 2022
Python

nikulukani / pycublasxt

Python interface to the NVIDIA CublasXt API

python gpu linear-algebra cuda cublas multigpu cublasxt

Updated Apr 5, 2019
C++

Bruce-Lee-LY / cuda_back2back_hgemm

Uses tensor cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.

gpu cuda cublas nvidia gemm matrix-multiply tensor-core hgemm back2back-hgemm fused-hgemm back2back-gemm fused-gemm

Updated Nov 3, 2023
Cuda
