v0.1.7 #1383
LeiWang1999 announced in Announcements
What's Changed
- [Bugfix] Ensure correct handling for cases where `seq_q<seq_kv` in flash attention examples by @Rachmanino in #864
- [Fix] Fix bug 0905: tilelang doesn't vectorize `B[i,j] = c[i] + A[i,j]` by @kurisu6912 in #798
- [Bugfix] Use `ExprDeepEqual` instead of `StructuralEqual` when merge consecutive If stmt by @LeiWang1999 in #876
- [Precision] Introduce `T.ieee_rsqrt` and related high precision op by @LeiWang1999 in #882
- [Example] Introduce split+sum template, and optimize `atomic_add` performance for bwd examples by @LeiWang1999 in #940
- [Example] Add support for `bfloat16` and user-defined `sm_scale` in attention sink examples by @Rachmanino in #924
- [CI] add `pre-commit` integration by @XuehaiPan in #955
- [TileOp] Implememt `CumSum1D` by @LeiWang1999 in #978
- [Language] Enhance `T.alloc_var` for AugAssign and AnnAsign by @LeiWang1999 in #979
- [Refactor] Refactor Pass `InjectFenceProxy` and expose some warp group primitives in frontend by @LeiWang1999 in #977
- [Bugfix] Use `access_ptr("r")` instead of `access_ptr("w")` for correct pipeline analysis by @LeiWang1999 in #983
- [Bugfix] Fallback `torch.accelerator.synchronize()` to `torch.cuda.synchronize()` by @yyttt6 in #987
- [Transform] Migrate `LowerIntrin` from tvm into tilelang by @LeiWang1999 in #999
- [TIR] Revert some changes of Pass `LowerIntrin` by @LeiWang1999 in #1035
- [Env] Optimize the mechanism for locating `TL_LIBS` by @LeiWang1999 in #1038
- [Language] Expose `T.get_warp_idx_sync` and `T.shuffle_elect` for efficient thread election by @LeiWang1999 in #989
- [Refactor] Use `has_simt_copy` to decide whether to insert `set_max_nreg` by @chengyupku in #982
- [Refactor] Refactor Pass `LegalizeSafeMemoryAccess` to support recursive load/store rewrite by @SiriusNEO in #1050
- [Parallel] Support `T.Parallel` with dynamic extents by @LeiWang1999 in #990
- [Cache] raise errors for `tileang.clear_cache()` by @LeiWang1999 in #1077
- [Language] Recommend using `T.dynamic` instead of `T.symbolic` by @LeiWang1999 in #1076
- [Language] Efficient `T.reduce_` with shared memory input/output by @LeiWang1999 in #1080
- [Refactor] Rename cython output to `tilelang_cython` and relocate its path by @LeiWang1999 in #1086
- [Cleanup] Remove `tilelang.disable_cache()` calls from examples and tests by @Rachmanino in #1088
- [PassConfig] Introduce PassConfig `TL_STORAGE_REWRITE_DETECT_INPLACE` by @LeiWang1999 in #1089
- [Language] Support tilelang `alloc_var(dtype, init=x)` by @LeiWang1999 in #1092
- [Bugfix] Fix missing host `cuTensorMapEncodeIm2col` call by @chengyupku in #1094
- [CI][Lint] Retire `format.sh` and add `clang-tidy` to GHA workflow by @XuehaiPan in #1044
- [Refactor] Use forceinline in `ldmatrix` and update mamba scan kernel by @chengyupku in #1104
- [Maint] Update uncommitted change detection command in `format.sh` by @XuehaiPan in #1102
- [Feature] Support None type as input for `T.ptr` and `T.Tensor` by @xwhzz in #1114
- [Enhancement] Add missing `fence_barrier_init` primitive after mbarrier init by @chengyupku in #1121
- [CI] allow dirty workspace for `format.sh` and introduce loop carry thread sync unit test by @LeiWang1999 in #1153
- [Language] Expose `T.warpgroup_fence_operand` for nvcc code motion by @LeiWang1999 in #986
- [Release] Unify local build scripts to use `cibuildwheel` and reduce size of sdist by @oraluben in #1171
- [Feature] Add `tl.infinity` operator for infinity handling of bfloat16 by @Rachmanino in #1175
- [CI] Enable `ccache` for CIBW on Linux by @oraluben in #1184
- [Feat] Add support for `T.serial` with step and negative step by @kurisu6912 in #1188
- Fix type errors in `reduce.h` by @LJC00118 in #1204
- [Build] Explicitly add `libtvm` as a dep of `libtilelang` by @oraluben in #1215
- [Refactor] Simplify logic in the `CompleteBufferFragment` by @LeiWang1999 in #1226
- [Bugfix] Minor fix in `builder.py` by @LJC00118 in #1235
- [BugFix] Refactor attention kernel to handle OOB positions by filling with `-inf` instead of clearing accumulators. by @Rachmanino in #1222
- [Minor] Remove `from __future__ import annotations` for python 3.8 by @oraluben in #1273
- Fix various issues under `int64_t` static and dynamic shape. by @Elevator14B in #1218
- [Language] Add shape check in `T.view/reshape` by @SiriusNEO in #1277
- [Bugfix] Supply missing `T.print` for bool type by @LeiWang1999 in #1279
- [Feat] Add support for using `T.Tensor(n * 2 + 1)` in function annotation by @kurisu6912 in #1285
- [Feat] Add missing support to pass reference by `T.Var` annotation by @kurisu6912 in #1291
- Improve memory access safety and `T.assume` handling by @LJC00118 in #1292
- [Enhancement] Support more dtype in `T.print` by @xwhzz in #1329
- [Refactor] Moving `NormalizeToBufferRegion` and `MakeAccessPtrFromRegion` to utils by @LeiWang1999 in #1333
- [Fix] Fix bug copying from or to local buffer (#1304) by @kurisu6912 in #1324
- [Fix] Fix missing `not` operator in frontend (#1347) by @kurisu6912 in #1348
- [Language] support `T.gemm_sp_v2` on sm80 and sm89 by @botbw in #1056

New Contributors
- @Elevator14B made their first contribution in #1218

Full Changelog: 0.1.6...v0.1.7
This discussion was created from the release v0.1.7.