Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Learning and practice of high performance computing and ai infra

Application

pocket-ai -- A Portable Toolkit for building AI Infra.

https://github.com/cjmcv/pocket-ai

  • engine/cl: A small computing framework based on OpenCL, designed to help you quickly call the OpenCL API for the calculations you need.

  • engine/vk: A small computing framework based on Vulkan, designed to help you quickly call Vulkan's compute API for the calculations you need.

  • engine/graph: A small multitasking scheduler that can quickly build efficient pipelines for your multiple tasks.

  • engine/infer: A tiny inference engine for microprocessors, with a library size of only 10K+.

  • eval/llm: A small tool for quickly verifying that end-to-end results are still correct when accelerating and optimizing a large language model (LLM) inference engine.

  • Other small tools.

Reading Notes

sglang, vllm

Practice

cux -- An experimental framework for performance analysis and optimization of CUDA kernel functions.

https://github.com/cjmcv/hpc/tree/master/0-frameworks/cux

tag: cuda / simd / openmp.

mrpc -- Mini-RPC, based on asio.

https://github.com/cjmcv/hpc/tree/master/0-frameworks/mrpc

tag: distributed computing.

DEPRECATED

hcs -- A heterogeneous computing system for multi-task scheduling optimization.

vky -- A Vulkan-based computing framework.

"hcs" and "vky" have been moved to pocket-ai and renamed as graph and vk respectively.


Learning

Heterogeneous computing

cuda
vulkan
opencl
  • basic_demo : Introduces the basic OpenCL API call sequence (without using pocket-ai).
  • gemm_f32 : FP32 GEMM for discrete graphics cards.
  • gemm_mobile_f32 : FP32 GEMM for integrated graphics cards.
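
For readers new to GEMM, the computation that the gemm demos accelerate can be sketched on the CPU as below. This is only an illustrative reference, not code from this repository: the function name `gemm_reference` and the row-major flat-list layout are assumptions, and Python floats are double precision rather than fp32.

```python
def gemm_reference(a, b, m, n, k):
    """Naive C = A * B, with A (m x k), B (k x n), C (m x n).

    All matrices are row-major flat lists. The name and layout are
    hypothetical, chosen for illustration only.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]  # reuse one element of A across a row of B
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


if __name__ == "__main__":
    # 2x2 example: [[1, 2], [3, 4]] * [[5, 6], [7, 8]]
    print(gemm_reference([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2))
```

A reference like this is typically used to check the output of an optimized kernel within a small tolerance, since the accumulation order on a GPU usually differs.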

SIMD

neon
sse/avx

Distributed computing

mpi/mpi4py

Thread

std
openmp
tbb

Coroutines

libco
asyncio
