v0.5.0
This release changes the directory structure of the models cache, so that cached files from the same HuggingFace Hub repository are grouped in a separate subdirectory. This change simplifies manually removing specific models from the cache to free up space. As a result, cache contents from prior versions are invalidated, so you most likely want to remove them. To find the cache location, run `elixir -e 'Mix.install([{:bumblebee, "0.4.2"}]); IO.puts(Bumblebee.cache_dir())'` (it defaults to the standard cache location for the given operating system).
We also reduced memory usage during parameter loading (both when loading onto the CPU and directly onto the GPU). Previously, larger models sometimes required loading parameters onto the CPU first and only then transferring them to the GPU, in order to avoid running out of GPU memory during parameter transformations. With this release this is no longer necessary: loading parameters now has barely any memory footprint beyond the parameters themselves.
Added
- Notebook on LLaMA 2 to the docs (#259)
- Mistral model (#264)
- Projection head models for ClipText and ClipVision (#276)
- Support more rotary embedding options for LLaMA required for Deepseek Coder (#285)
- Temperature generation option (#290)
- GPTBigCode model (used by Starcoder) (#294)
- Automatic detection of diffusers params files (specifying `:params_filename` for Stable Diffusion models is no longer necessary) (#301)
- `:seed` option to generation serving inputs (#303)
- `:params_variant` option to `Bumblebee.load_model/2` for loading parameters of different precision (#309)
- `:type` option to `Bumblebee.load_model/2` for loading model under a specific precision policy (#311)
- LCM scheduler (#320)
- Token summary to text generation output (#336)
- DINOv2 model (#334)
- `:spec_overrides` option to `Bumblebee.load_model/2` (#340)
- Support for attention sliding window in Mistral (#341)
Changed
- (Breaking) Text generation to always return only the new text (for some models it used to include the prompt) (#302)
- Deprecated all options in `Bumblebee.apply_tokenizer/3`; these should now be set on the tokenizer using `Bumblebee.configure/2` (#310)
- Reduced memory used when the `:preallocate_params` serving option is enabled (#317)
- (Breaking) Changed image size to maps in image featurizers (#329)
- (Breaking) Renamed ViT and DeiT `:for_masked_image_modeling` output from `:logits` to `:pixel_values`
- (Breaking) Renamed CLIP outputs `:text_embeddings` and `:image_embeddings` to singular
- (Breaking) Changed ResNet `:pooled_state` output to flatten the extra 1-sized axes
- Cache directory structure to group files by repository (#332)
- (Breaking) Changed the output of `Bumblebee.Text.Generation.build_generate/4` to a map (#336)
- Reduced memory usage during parameter loading (#344)
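Following the tokenizer deprecation, options previously passed to `Bumblebee.apply_tokenizer/3` are now set on the tokenizer itself via `Bumblebee.configure/2`. A rough sketch (the repository name and the `:length` value are illustrative):

```elixir
# Before: Bumblebee.apply_tokenizer(tokenizer, texts, length: 512)
# Now: configure the tokenizer once, then apply it without options.
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})
tokenizer = Bumblebee.configure(tokenizer, length: 512)
inputs = Bumblebee.apply_tokenizer(tokenizer, ["Hello world"])
```

Keeping the options on the tokenizer struct means servings and user code no longer need to agree on tokenization options at every call site.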
Removed
- Removed the serving `:seed` option in favour of a runtime, per-input seed (#303)
- Conversational serving (#308)
- Specific tokenizer modules in favour of a single module (#310)
- Removed the deprecated `Bumblebee.Audio.speech_to_text/5` (in favour of the more specific `speech_to_text_whisper/5`)
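With the serving-level `:seed` option removed, the seed now travels with each input at run time. A hedged sketch for a text generation serving (the exact input-map shape is an assumption here, not something this changelog specifies):

```elixir
# The seed is supplied per input rather than when building the serving,
# so the same serving can produce reproducible or varied generations.
Nx.Serving.run(serving, %{text: "Once upon a time", seed: 0})
```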
Fixed
- Featurizer batch template when image size is a tuple
- Error in concatenating results when running servings as partitioned (#282)
- Decoder cache being casted with low precision policies (#299)
- Loading of more recent VAE KL checkpoints (#305)
- Tokenizers truncation to account for trailing special tokens (#307)
- Loading models with auth token from within a HuggingFace Space (#314)
- Zero-shot classification serving to handle uppercased entailment token in model config (#327)
- Text generation when using lower precision with encoder-decoder models (such as Whisper) (#346)