diff --git a/README.md b/README.md
index 5d4cefc52c..af0ede4d33 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ Mistral.rs supports several model categories:
 - [Details](docs/QUANTS.md)
 - GGML: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit, with ISQ support.
 - GPTQ: 2-bit, 3-bit, 4-bit and 8-bit
-- HQQ: 4-bit and 8 bit, with ISQ support
+- HQQ: 4-bit and 8-bit, with ISQ support
 
 **Powerful**:
 - LoRA support with weight merging
@@ -569,7 +569,7 @@ Mistral.rs will attempt to automatically load a chat template and tokenizer. Thi
 
 ## Contributing
 
-Thank you for contributing! If you have any problems or want to contribute something, please raise an issue or pull request.
+Thank you for contributing! If you have any problems or want to contribute something, please raise an issue or open a pull request.
 If you want to add a new model, please contact us via an issue and we can coordinate how to do this.
 
 ## FAQ
@@ -582,7 +582,7 @@ If you want to add a new model, please contact us via an issue and we can coordi
 - Error: `recompile with -fPIE`:
   - Some Linux distributions require compiling with `-fPIE`.
   - Set the `CUDA_NVCC_FLAGS` environment variable to `-fPIE` during build: `CUDA_NVCC_FLAGS=-fPIE`
-- Error `CUDA_ERROR_NOT_FOUND` or symbol not found when using a normal or vison model:
+- Error `CUDA_ERROR_NOT_FOUND` or symbol not found when using a normal or vision model:
   - For non-quantized models, you can specify the data type to load and run in. This must be one of `f32`, `f16`, `bf16` or `auto` to choose based on the device.
 
 ## Credits
diff --git a/docs/ADAPTER_MODELS.md b/docs/ADAPTER_MODELS.md
index 0e2286908b..60564de8db 100644
--- a/docs/ADAPTER_MODELS.md
+++ b/docs/ADAPTER_MODELS.md
@@ -59,7 +59,7 @@ An ordering JSON file for LoRA contains 2 major parts:
 - Specifies the adapter name and the model ID to find them, which may be a local path.
 
 ### Preparing the ordering file (LoRA or X-LoRA cases)
-There are 2 scripts to prepare the ordering file and which work for both X-LoRA and LoRA. The ordering file is specific to each architecture and set of target modules. Therefore, if either are changed, it is necessary to create a new ordering file using the first option. If only the adapter order or adapters changed, then it the second option should be used.
+There are 2 scripts to prepare the ordering file, which work for both X-LoRA and LoRA. The ordering file is specific to each architecture and set of target modules. Therefore, if either is changed, it is necessary to create a new ordering file using the first option. If only the adapter order or adapters changed, then the second option should be used.
 
 1) From scratch: No ordering file for the architecture and target modules
 
@@ -102,4 +102,4 @@ To use this feature, you should add a `preload_adapters` key to your ordering fi
 
 This allows mistral.rs to preload the adapter and enable runtime activation.
 
-We also provide a script to add this key to your existing order file: [`load_add_preload_adapters.py`](../scripts/lora_add_preload_adapters.py).
\ No newline at end of file
+We also provide a script to add this key to your existing order file: [`load_add_preload_adapters.py`](../scripts/lora_add_preload_adapters.py).
diff --git a/docs/ANYMOE.md b/docs/ANYMOE.md
index 07ad1c63da..83fbcf1926 100644
--- a/docs/ANYMOE.md
+++ b/docs/ANYMOE.md
@@ -12,7 +12,7 @@ Paper: https://arxiv.org/abs/2405.19076
 
 https://github.com/EricLBuehler/mistral.rs/assets/65165915/33593903-d907-4c08-a0ac-d349d7bf33de
 
-> Note: By default, this has the capability to create an csv loss image. When building from source (for Python or CLI), you may use `--no-default-features` command line to disable this. This may be necessary if networking is unavailable.
+> Note: By default, this has the capability to create a csv loss image. When building from source (for Python or CLI), you may use `--no-default-features` command line to disable this. This may be necessary if networking is unavailable.
 
 ## Dataset
 Currently, AnyMoE expects a JSON dataset with one top-level key `row`, which is an array of objects with keys `prompt` (string), `expert` (integer), and `image_urls` (optional array of strings). For example:
@@ -35,7 +35,7 @@ Currently, AnyMoE expects a JSON dataset with one top-level key `row`, which is
 For a vision model, `image_urls` may contain an array of image URLs/local paths or Base64 encoded images.
 
 ## Experts
-AnyMoE experts can be either fine-tuned models or LoRA adapter models. Only the mlp layers will be loaded from each. The experts must be homogeneous: they must be all fine-tuned or all adapter. Additionally, certain layers can be specified to apply AnyMoE.
+AnyMoE experts can be either fine-tuned models or LoRA adapter models. Only the mlp layers will be loaded from each. The experts must be homogeneous: they must be all fine-tuned or all adapters. Additionally, certain layers can be specified to apply AnyMoE.
 
 > Note: When using LoRA adapter experts, it may not be necessary to set the layers where AnyMoE will be applied due to the lower memory usage.
 
@@ -185,7 +185,7 @@ async fn main() -> Result<()> {
     let messages = TextMessages::new()
         .add_message(
             TextMessageRole::System,
-            "You are an AI agent with a speciality in programming.",
+            "You are an AI agent with a specialty in programming.",
         )
         .add_message(
             TextMessageRole::User,
diff --git a/docs/IDEFICS2.md b/docs/IDEFICS2.md
index eca2ab6ba3..7b22cc7efe 100644
--- a/docs/IDEFICS2.md
+++ b/docs/IDEFICS2.md
@@ -2,7 +2,7 @@
 
 The Idefics 2 Model has support in the Rust, Python, and HTTP APIs. The Idefics 2 Model also supports ISQ for increased performance.
 
-> Note: Some of examples use our [Cephalo model series](https://huggingface.co/collections/lamm-mit/cephalo-664f3342267c4890d2f46b33) but could be used with any model ID.
+> Note: Some of the examples use our [Cephalo model series](https://huggingface.co/collections/lamm-mit/cephalo-664f3342267c4890d2f46b33) but could be used with any model ID.
 
 The Python and HTTP APIs support sending images as:
 - URL
@@ -183,4 +183,4 @@ print(res.usage)
 ```
 
 - You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
-- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
\ No newline at end of file
+- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
diff --git a/docs/LLaVA.md b/docs/LLaVA.md
index 6bc43fa394..5981de0fe3 100644
--- a/docs/LLaVA.md
+++ b/docs/LLaVA.md
@@ -8,7 +8,7 @@ This implementation supports both LLaVA and LLaVANext(which adds multi resolutio
 * llava-hf/llava-1.5-7b-hf
 
-The LLaVA and LLaVANext Model has support in the Rust, Python, and HTTP APIs. The LLaVA and LLaVANext Model also supports ISQ for increased performance.
+The LLaVA and LLaVANext Models have support in the Rust, Python, and HTTP APIs. The LLaVA and LLaVANext Models also support ISQ for increased performance.
 
 The Python and HTTP APIs support sending images as:
 - URL
@@ -101,7 +101,7 @@ print(resp)
 
 ## Rust
 You can find this example [here](../mistralrs/examples/llava_next/main.rs).
-This is a minimal example of running the LLaVA and LLaVANext model with a dummy image.
+This is a minimal example of running the LLaVA and LLaVANext models with a dummy image.
 
 ```rust
 use anyhow::Result;
@@ -192,4 +192,4 @@ print(res.usage)
 ```
 
 - You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
-- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
\ No newline at end of file
+- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
diff --git a/docs/LORA_XLORA.md b/docs/LORA_XLORA.md
index cd1a0d02f5..5a195ff8bd 100644
--- a/docs/LORA_XLORA.md
+++ b/docs/LORA_XLORA.md
@@ -2,7 +2,7 @@
 
 - X-LoRA with no quantization
 
-To start an X-LoRA server with the exactly as presented in [the paper](https://arxiv.org/abs/2402.07148):
+To start an X-LoRA server exactly as presented in [the paper](https://arxiv.org/abs/2402.07148):
 
 ```bash
 ./mistralrs-server --port 1234 x-lora-plain -o orderings/xlora-paper-ordering.json -x lamm-mit/x-lora
@@ -15,4 +15,4 @@ To start an LoRA server with adapters from the X-LoRA paper (you should modify t
 ./mistralrs-server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -a lamm-mit/x-lora
 ```
 
-Normally with a LoRA model you would use a custom ordering file. However, for this example we use the ordering from the X-LoRA paper because we are using the adapters from the X-LoRA paper.
+Normally with a LoRA model, you would use a custom ordering file. However, for this example, we use the ordering from the X-LoRA paper because we are using the adapters from the X-LoRA paper.