Releases: modelscope/dash-infer
v2.0.0
update sampling, prefix cache, json mode impl (#55)
- engine: stop and release the model on engine release, and remove a deprecated lock
- sampling: heavily rework generate_op, removing the dependency on global tensors
- prefix cache: fix several bugs, improve evict performance
- json mode: update the lmfe-cpp patch, add process_logits, sample with top_k / top_p (see the sketch below)
- span-attention: move span_attn decoderReshape to init
- lora: add docs, fix typos
- ubuntu: add an Ubuntu dockerfile, fix an install-dir error
- bugfix: fix a multi-batch repetition-penalty bug
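For readers unfamiliar with the sampling terms above, below is a minimal sketch of top-k / top-p (nucleus) filtering over a logits vector. It is illustrative only, not the engine's actual generate_op implementation; the function name and the apply-top-k-then-top-p ordering are assumptions for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sample a token id from `logits`, keeping at most `top_k` candidates and
// then shrinking them to the smallest prefix whose renormalized probability
// mass reaches `top_p`. Hypothetical helper, not a dash-infer API.
int SampleTopKTopP(const std::vector<float>& logits, int top_k, float top_p,
                   std::mt19937& rng) {
  // Sort token ids by logit, descending.
  std::vector<int> ids(logits.size());
  std::iota(ids.begin(), ids.end(), 0);
  std::sort(ids.begin(), ids.end(),
            [&](int a, int b) { return logits[a] > logits[b]; });

  // Keep at most top_k candidates (assumes top_k >= 1).
  size_t keep = std::min(static_cast<size_t>(top_k), ids.size());

  // Softmax over the kept candidates, subtracting the max for stability.
  std::vector<float> probs(keep);
  float max_logit = logits[ids[0]], sum = 0.f;
  for (size_t i = 0; i < keep; ++i) {
    probs[i] = std::exp(logits[ids[i]] - max_logit);
    sum += probs[i];
  }
  for (auto& p : probs) p /= sum;

  // Shrink further to the nucleus: the shortest prefix with mass >= top_p.
  float cum = 0.f;
  size_t nucleus = keep;
  for (size_t i = 0; i < keep; ++i) {
    cum += probs[i];
    if (cum >= top_p) { nucleus = i + 1; break; }
  }

  // discrete_distribution renormalizes the remaining weights itself.
  std::discrete_distribution<size_t> dist(probs.begin(),
                                          probs.begin() + nucleus);
  return ids[dist(rng)];
}
```

Applying top-k before top-p, as here, is one common convention; computing the nucleus over the full softmax first gives slightly different cutoffs, and the release notes do not say which order the engine uses.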
v2.0.0-rc3
some bugfixes
- fix a uuid crash issue
- update the LoRA implementation
- set the page size via a parameter
- delete deprecated files
v2.0.0-rc2
release script: reduce Python wheel size (#46)
v1.3.0
Highlight
- Support Baichuan-7B and Baichuan2-7B & 13B by @WangNorthSea in #38
Full Changelog: v1.2.1...v1.3.0
v1.2.1
v1.2.0
expand context length to 32K & support flash attention on the intel-avx512 platform
- remove currently unsupported cache mode
- examples: update the qwen prompt template, add a print function to the examples
- support glm-4-9b-chat
- change to size_t to avoid overflow when the sequence is long (see the sketch after this list)
- update README now that 32K context length is supported
- add flash attention on the intel-avx512 platform
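To illustrate the size_t change mentioned above: with a 32K context, intermediate size/offset products can exceed INT_MAX, so 32-bit int arithmetic overflows. The sketch below uses a hypothetical seq_len x seq_len fp32 attention-score buffer as the example shape; it is not the engine's actual shape arithmetic.

```cpp
#include <climits>
#include <cstdio>

int main() {
  int seq_len = 32768;  // 32K context

  // Computing the byte size in 64 bits shows it exceeds INT_MAX, so the
  // same expression evaluated in 32-bit int would overflow (undefined
  // behavior for signed int).
  long long bytes64 = 1LL * seq_len * seq_len * 4;
  std::printf("needed: %lld bytes, INT_MAX: %d\n", bytes64, INT_MAX);

  // The fix: do size/index arithmetic in size_t (64-bit on typical 64-bit
  // platforms) by promoting an operand before multiplying.
  size_t bytes = static_cast<size_t>(seq_len) * seq_len * sizeof(float);
  std::printf("size_t size: %zu bytes\n", bytes);  // 4294967296
  return 0;
}
```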