Release v0.1.3 · tile-ai/tilelang

What's Changed

[Docker] Add libstdcxx-ng-12 to Dockerfiles for CUDA versions by @LeiWang1999 in #160
Add cpu jit with backend ctypes by @xs-keju in #154
[Carver] Multi-Threads Compilation for Fast Auto Tuning by @SiriusNEO in #156
[Refactor] Replace T.If with native Python if statement for mla paged kernel by @LeiWang1999 in #162
[Enhancement] Improve CUDA path detection by @xwhzz in #157
[Refactor] Replace T.thread_binding with T.get_thread_binding in examples and test cases by @LeiWang1999 in #163
[Bugfix] Cast bool dtype into int8 in blocksparse examples by @LeiWang1999 in #167
[Example] Implement NSA Decode tilelang examples by @LeiWang1999 in #168
[Release] Bump version to v0.1.2.post1 by @LeiWang1999 in #166
Use SS-GEMM for PV in mla by @YouJiacheng in #165
[Example] Implement tilelang native sparse attention varlen example by @LeiWang1999 in #170
[Bugfix] Implement boundary check for the buffer shape with dynamic symbolic by @LeiWang1999 in #173
[AutoTune] Enable config-performance trace by @LeiWang1999 in #174
[Feat] Append Pass Context and TMA lowering configuration option by @LeiWang1999 in #175
[Feat] Introduce new caching mechanism for compiled kernels by @LeiWang1999 in #176
[Refactor] Enhance GPU Kernel Launch with Environment Thread Creation by @LeiWang1999 in #178
[Bugfix] Improve Thread Variable Handling in Layout Inference by @LeiWang1999 in #179
[Examples] Implement NSA Backward kernels by @LeiWang1999 in #180
[Enhancement] Optimize CMake build process with dynamic job count calculation by @LeiWang1999 in #183
[Bugfix] Add dynamic shape support with out_idx in Cython JIT kernel compilation by @LeiWang1999 in #185
[Dev][Bugfix] Add RMS Normalization Kernels and Fix Reduce Bug by @chengyupku in #188
[Dev] Add the failed nvcc command to the exception message by @penguin-wwy in #189
[Bugfix] Fix T.copy for scalar datatypes by @LeiWang1999 in #190
[Enhancement] Simplify GEMM example with direct kernel compilation by @LeiWang1999 in #191
[Bugfix] Make quickstart work properly on cu118 by @penguin-wwy in #193
[Language] Support clamp in language by @hyx1999 in #192
[Refactor] Add SetMaxNRegCollector to Improve Register Hint Handling in Warp Specialized Rewriter by @chengyupku in #194
[Feature] Add TMA Store Synchronization Support by @chengyupku in #195
Update expired example code. by @66RING in #196
[CMake] Add CUDA Major Version Detection for Conditional Compilation by @chengyupku in #197
[Feature] Support Async Pipeline inference within if scope by @LeiWang1999 in #198
[Dev] Add new example for FlashAttention with pipelined execution by @chengyupku in #200
[Enhancement] Enhancing the handling of conditional statements in the pipeline by @LeiWang1999 in #201
[Feature] Upgrade cutlass version and support fp8 T.gemm by @zqh-wz in #202
[Docker] Update Dockerfiles to specify exact version of libstdcxx-ng by @LeiWang1999 in #203
[Dev] Add GQA backward example by @chengyupku in #205
[LICENSE] Typo fix in LICENSE by @LeiWang1999 in #208
[Enhancement] Allow mma fallback when wgmma is not supported by @LeiWang1999 in #206
[Examples] Expand tuning configurations for FlashAttention example by @chenghuaWang in #204
[Enhancement] Avoid tvm ffi handling when out_idx is specified by @LeiWang1999 in #209
[Fix] Fix K // block_K to T.ceildiv(K,block_K) and add tests by @hyx1999 in #210
[Dev] Implement IfStmtBinding and MergeIfStmt transformations by @chengyupku in #211
[Language] Introduce T.reshape and T.view by @LeiWang1999 in #212
[Enhancement] Improve device handling in Cython kernel adapter by @LeiWang1999 in #220
[Enhancement] Update format script to support force compare with upstream by @LeiWang1999 in #221
[Refactor] Introduce KernelParam integration across modules by @LeiWang1999 in #223
[Bugfix] Fix mismatch of shared memory layout and mma atom on Hopper by @zqh-wz in #224
[Refactor] Update kernel compilation and profiling in examples by @chengyupku in #225
[Examples] Add fp8 gemm 2xAcc and deepgemm example by @cherichy in #217
[Doc] Add instructions for installing nightly version by @xwhzz in #226
[Bugfix] Disable force inline for ldmatrix by @LeiWang1999 in #227
[Bugfix] Support duplicate tma desc declaration by @LeiWang1999 in #228
[Refactor] Rename clamp functions and enhance dtype handling in tests by @LeiWang1999 in #232
[Enhancement] Simplify kernel source extraction in JIT adapters by @LeiWang1999 in #230
[Feature] Add reduce_max corresponding tests by @LeiWang1999 in #236
[BugFix] Fix bug of missing MBarrierExpectTX by @chengyupku in #241
[Refactor] Refactor for Better Layout Conflict Handling by @LeiWang1999 in #240
[Refactor] Align torch_assert_close tensor comparison with torch.testing.assert_close by @xwhzz in #239
[Dev] Implement FlashAttention3 Backward by @chengyupku in #244
[BugFix] Fix bug of mismatching dtype in testing by @xwhzz in #245
[Enhancement] Add zero initialization option to GEMM operations by @chengyupku in #246
[Enhancement][CUDA] Avoid C7508 for CUDA backend via assigning default value to minBlocksPerMultiprocessor by @cherichy in #248
[Feature] Add database storage for JITKernel cache with Cython and Ctypes adapters by @Alex4210987 in #213
[Examples] Implement elementwise add kernel by @chenghuaWang in #219
[Refactor] Phaseout LLVM Dependency by Making it Optional by @LeiWang1999 in #247
[Readme] Update Bib Citation Section by @LeiWang1999 in #249
[Enhancement] Support float variable as arguments by @LeiWang1999 in #250
add autotune to example_gemm.py by @yyttt6 in #252
[Language] Introduce T.alloc_var to define a variable like int var; by @LeiWang1999 in #255
[Example] Implement Kernel Example cumsum by @LeiWang1999 in #258
[Refactor] Refactor CUDA post-processing callback registration in TileLang by @LeiWang1999 in #259
[Refactor] Move compilation outside critical section by @YouJiacheng in #260
[CI] Use auditwheel to generate manylinux wheels by @oraluben in #251
[Bugfix] Fix Benchmark/Example Code for Autotuning by @SiriusNEO in #254
[Language] Enhance alias to support blockwise memory load by @LeiWang1999 in #261
[Bugfix] Fix auto tuning tma handling by @LeiWang1999 in #263
[Release] Bump version to 0.1.3 by @LeiWang1999 in #264
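
The fix in #210 replaces `K // block_K` with `T.ceildiv(K, block_K)`. A minimal plain-Python sketch of why that matters is given here; the sizes and the `ceildiv` helper are hypothetical stand-ins for illustration, not tilelang's implementation:

```python
def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division (plain-Python stand-in for T.ceildiv)."""
    return -(-a // b)


# Hypothetical sizes: K is deliberately not a multiple of block_K.
K, block_K = 100, 32

floor_iters = K // block_K        # floor division drops the partial last tile
ceil_iters = ceildiv(K, block_K)  # ceiling division includes the partial last tile

print(f"floor: {floor_iters} tiles cover {floor_iters * block_K} of {K} elements")
print(f"ceil:  {ceil_iters} tiles cover at least {K} elements")
```

With floor division the loop stops before the final partial tile, so the tail of the K dimension is skipped whenever K is not divisible by the block size. With ceiling division the final tile reaches past K, so loads in that tile need bounds handling.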

New Contributors

@xs-keju made their first contribution in #154
@YouJiacheng made their first contribution in #165
@penguin-wwy made their first contribution in #189
@hyx1999 made their first contribution in #192
@66RING made their first contribution in #196
@zqh-wz made their first contribution in #202
@chenghuaWang made their first contribution in #204
@cherichy made their first contribution in #217
@Alex4210987 made their first contribution in #213
@yyttt6 made their first contribution in #252
@oraluben made their first contribution in #251

Full Changelog: v0.1.2...v0.1.3
