Releases: modelscope/dash-infer
v2.0.0
update sampling, prefix cache, json mode impl (#55)
- engine: stop and release the model on engine release, and remove a deprecated lock
- sampling: heavily rework generate_op, removing the dependency on global tensors
- prefix cache: fix several bugs, improve evict performance
- json mode: update the lmfe-cpp patch, add process_logits, sample with top_k / top_p (see the sketch below)
- span-attention: move span_attn decoderReshape to init
- lora: add docs, fix typos
- ubuntu: add an Ubuntu dockerfile, fix an install-dir error
- bugfix: fix a multi-batch repetition-penalty bug
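For readers unfamiliar with the sampling terms above, below is a minimal sketch of top-k / top-p (nucleus) filtering over a logits vector. It is illustrative only, not the engine's actual generate_op implementation; the function name and the apply-top-k-then-top-p ordering are assumptions for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sample a token id from `logits`, keeping at most `top_k` candidates and
// then shrinking them to the smallest prefix whose renormalized probability
// mass reaches `top_p`. Hypothetical helper, not a dash-infer API.
int SampleTopKTopP(const std::vector<float>& logits, int top_k, float top_p,
                   std::mt19937& rng) {
  // Sort token ids by logit, descending.
  std::vector<int> ids(logits.size());
  std::iota(ids.begin(), ids.end(), 0);
  std::sort(ids.begin(), ids.end(),
            [&](int a, int b) { return logits[a] > logits[b]; });

  // Keep at most top_k candidates (assumes top_k >= 1).
  size_t keep = std::min(static_cast<size_t>(top_k), ids.size());

  // Softmax over the kept candidates, subtracting the max for stability.
  std::vector<float> probs(keep);
  float max_logit = logits[ids[0]], sum = 0.f;
  for (size_t i = 0; i < keep; ++i) {
    probs[i] = std::exp(logits[ids[i]] - max_logit);
    sum += probs[i];
  }
  for (auto& p : probs) p /= sum;

  // Shrink further to the nucleus: the shortest prefix with mass >= top_p.
  float cum = 0.f;
  size_t nucleus = keep;
  for (size_t i = 0; i < keep; ++i) {
    cum += probs[i];
    if (cum >= top_p) { nucleus = i + 1; break; }
  }

  // discrete_distribution renormalizes the remaining weights itself.
  std::discrete_distribution<size_t> dist(probs.begin(),
                                          probs.begin() + nucleus);
  return ids[dist(rng)];
}
```

Applying top-k before top-p, as here, is one common convention; computing the nucleus over the full softmax first gives slightly different cutoffs, and the release notes do not say which order the engine uses.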
v2.0.0-rc3
some bugfixes
- fix a uuid crash issue
- update the LoRA implementation
- set the page size via a parameter
- delete deprecated files
v2.0.0-rc2
release script: reduce Python wheel size (#46)
v1.3.0
Highlight
- Support Baichuan-7B and Baichuan2-7B & 13B by @WangNorthSea in #38
Full Changelog: v1.2.1...v1.3.0
v1.2.1
v1.2.0
expand context length to 32K & support flash attention on the intel-avx512 platform
- remove currently unsupported cache mode
- examples: update the qwen prompt template, add a print function to the examples
- support glm-4-9b-chat
- change to size_t to avoid overflow when the sequence is long (see the sketch after this list)
- update README now that 32K context length is supported
- add flash attention on the intel-avx512 platform
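To illustrate the size_t change mentioned above: with a 32K context, intermediate size/offset products can exceed INT_MAX, so 32-bit int arithmetic overflows. The sketch below uses a hypothetical seq_len x seq_len fp32 attention-score buffer as the example shape; it is not the engine's actual shape arithmetic.

```cpp
#include <climits>
#include <cstdio>

int main() {
  int seq_len = 32768;  // 32K context

  // Computing the byte size in 64 bits shows it exceeds INT_MAX, so the
  // same expression evaluated in 32-bit int would overflow (undefined
  // behavior for signed int).
  long long bytes64 = 1LL * seq_len * seq_len * 4;
  std::printf("needed: %lld bytes, INT_MAX: %d\n", bytes64, INT_MAX);

  // The fix: do size/index arithmetic in size_t (64-bit on typical 64-bit
  // platforms) by promoting an operand before multiplying.
  size_t bytes = static_cast<size_t>(seq_len) * seq_len * sizeof(float);
  std::printf("size_t size: %zu bytes\n", bytes);  // 4294967296
  return 0;
}
```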