Fixing the link for the advanced quantization tutorial and renaming the Quark link #389

Open · wants to merge 19 commits into ``develop``

Changes from all commits:
4 changes: 2 additions & 2 deletions docs/llm/high_level_python.rst
@@ -10,7 +10,7 @@
High-Level Python SDK
#####################

- A Python environment offers flexibility for experimenting with LLMs, profiling them, and integrating them into Python applications. We use the `Lemonade SDK <https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md>`_ to get up and running quickly.
+ A Python environment offers flexibility for experimenting with LLMs, profiling them, and integrating them into Python applications. We use the `Lemonade SDK <https://github.com/lemonade-sdk/lemonade>`_ to get up and running quickly.

To get started, follow these instructions.

@@ -35,7 +35,7 @@ To create and set up an environment, run these commands in your terminal:

conda create -n ryzenai-llm python=3.10
conda activate ryzenai-llm
- pip install turnkeyml[llm-oga-hybrid]
+ pip install lemonade-sdk[llm-oga-hybrid]
lemonade-install --ryzenai hybrid
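
Once the environment is ready, a short smoke test confirms that hybrid execution works end to end. The sketch below follows the high-level flow this page describes; the ``from_pretrained`` import path, the ``oga-hybrid`` recipe name, and the checkpoint name are assumptions to verify against the Lemonade README.

.. code-block:: python

    # Minimal sketch: load a hybrid-optimized LLM and generate a reply.
    # Assumptions: lemonade.api.from_pretrained exists, "oga-hybrid" is a
    # valid recipe, and the checkpoint below is a placeholder for any
    # hybrid-ready model from the Ryzen AI collection.
    from lemonade.api import from_pretrained

    model, tokenizer = from_pretrained(
        "amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid",
        recipe="oga-hybrid",
    )

    input_ids = tokenizer("What is Ryzen AI?", return_tensors="pt").input_ids
    response = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(response[0]))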

****************
4 changes: 2 additions & 2 deletions docs/llm/overview.rst
@@ -73,7 +73,7 @@ The Server Interface provides a convenient means to integrate with applications

To get started with the server interface, follow these instructions: :doc:`server_interface`.

- For example applications that have been tested with Lemonade Server, see the `Lemonade Server Examples <https://github.com/onnx/turnkeyml/tree/main/examples/lemonade/server>`_.
+ For example applications that have been tested with Lemonade Server, see the `Lemonade Server Examples <https://github.com/lemonade-sdk/lemonade/tree/main/docs/server/apps>`_.


OGA APIs for C++ Libraries and Python
@@ -170,7 +170,7 @@ The comprehensive set of pre-optimized models for hybrid execution used in these
- 8.9x
- 🟢

- The :ref:`ryzen-ai-oga-featured-llms` table was compiled using validation, benchmarking, and accuracy metrics as measured by the `ONNX TurnkeyML v6.1.0 <https://pypi.org/project/turnkeyml/6.1.0/>`_ ``lemonade`` commands in each example link.
+ The :ref:`ryzen-ai-oga-featured-llms` table was compiled using validation, benchmarking, and accuracy metrics as measured by the `ONNX TurnkeyML v6.1.0 <https://pypi.org/project/turnkeyml/6.1.0/>`_ ``lemonade`` commands in each example link. After this table was created, the Lemonade SDK moved to the new location found `here <https://github.com/lemonade-sdk/lemonade>`_.

Data collection details:

10 changes: 5 additions & 5 deletions docs/llm/server_interface.rst
@@ -23,10 +23,10 @@ Server Setup
Lemonade Server can be installed via the Lemonade Server Installer executable by following these steps:

1. Make sure your system has the recommended Ryzen AI driver installed as described in :ref:`install-driver`.
- 2. Download and install ``Lemonade_Server_Installer.exe`` from the `latest TurnkeyML release <https://github.com/onnx/turnkeyml/releases>`_.
+ 2. Download and install ``Lemonade_Server_Installer.exe`` from the `latest Lemonade release <https://github.com/lemonade-sdk/lemonade/releases>`_.
3. Launch the server by double-clicking the ``lemonade_server`` shortcut added to your desktop.

- See the `Lemonade Server Installation Guide <https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/lemonade_server_exe.md>`_ for more details.
+ See the `Lemonade Server README <https://github.com/lemonade-sdk/lemonade/blob/main/docs/server/README.md>`_ for more details.

************
Server Usage
@@ -38,7 +38,7 @@ The Lemonade Server provides the following OpenAI-compatible endpoints:
- POST ``/api/v0/completions`` - Text Completions (prompt to completion)
- GET ``/api/v0/models`` - List available models

- Please refer to the `server specification <https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md>`_ document in the Lemonade repository for details about the request and response formats for each endpoint.
+ Please refer to the `server specification <https://github.com/lemonade-sdk/lemonade/blob/main/docs/server/server_spec.md>`_ document in the Lemonade repository for details about the request and response formats for each endpoint.

The `OpenAI API documentation <https://platform.openai.com/docs/guides/streaming-responses?api-mode=chat>`_ also has code examples for integrating streaming completions into an application.
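
Because the endpoints mirror the OpenAI schema, the standard ``openai`` Python client can talk to a local Lemonade Server by pointing its ``base_url`` at the ``/api/v0`` prefix. A minimal streaming sketch, assuming the server listens on port 8000 and that the model name below is installed (both are assumptions; ``GET /api/v0/models`` lists what is actually available):

.. code-block:: python

    # Minimal sketch: stream a chat completion from a local Lemonade Server.
    # Assumptions: the server is at http://localhost:8000 and a model named
    # "Llama-3.2-1B-Instruct-Hybrid" is installed; adjust both as needed.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/api/v0", api_key="none")

    stream = client.chat.completions.create(
        model="Llama-3.2-1B-Instruct-Hybrid",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        stream=True,  # tokens arrive incrementally, per the OpenAI streaming docs
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)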

@@ -75,8 +75,8 @@ Instructions:
Next Steps
**********

- - See `Lemonade Server Examples <https://github.com/onnx/turnkeyml/tree/main/examples/lemonade/server>`_ to find applications that have been tested with Lemonade Server.
- - Check out the `Lemonade Server specification <https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md>`_ to learn more about supported features.
+ - See `Lemonade Server Examples <https://github.com/lemonade-sdk/lemonade/tree/main/docs/server/apps>`_ to find applications that have been tested with Lemonade Server.
+ - Check out the `Lemonade Server specification <https://github.com/lemonade-sdk/lemonade/blob/main/docs/server/server_spec.md>`_ to learn more about supported features.
- Try out your Lemonade Server install with any application that uses the OpenAI chat completions API.


Expand Down
4 changes: 2 additions & 2 deletions docs/model_quantization.rst
@@ -45,7 +45,7 @@ For more details
~~~~~~~~~~~~~~~~
- `AMD Quark Tutorial <https://github.com/amd/RyzenAI-SW/tree/main/tutorial/quark_quantization>`_ for Ryzen AI Deployment
- Running INT8 model on NPU using :doc:`Getting Started Tutorial <getstartex>`
- - Advanced quantization techniques `Fast Finetuning and Cross Layer Equalization <https://gitenterprise.xilinx.com/VitisAI/RyzenAI-SW/blob/dev/tutorial/quark_quantization/docs/advanced_quant_readme.md>`_ for INT8 model
+ - Advanced quantization techniques `Fast Finetuning and Cross Layer Equalization <https://github.com/amd/RyzenAI-SW/blob/main/tutorial/quark_quantization/docs/advanced_quant_readme.md>`_ for INT8 model
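
For orientation, the sketch below shows the general shape of an INT8 post-training quantization flow with Quark's ONNX API, in the spirit of the tutorials linked above. The import paths, the ``XINT8`` preset name, and the ``quantize_model`` signature are assumptions to verify against the Quark documentation for your installed version.

.. code-block:: python

    # Sketch of an INT8 PTQ flow with Quark for ONNX. Assumed API surface:
    # ModelQuantizer, Config, get_default_config("XINT8"); file paths and
    # tensor names are placeholders.
    import numpy as np
    from onnxruntime.quantization import CalibrationDataReader
    from quark.onnx import ModelQuantizer
    from quark.onnx.quantization.config import Config, get_default_config

    class RandomCalibrationReader(CalibrationDataReader):
        """Feeds a few random batches; replace with real calibration data."""
        def __init__(self, num_batches=8):
            self._batches = iter(
                {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
                for _ in range(num_batches)
            )

        def get_next(self):
            return next(self._batches, None)

    quant_config = get_default_config("XINT8")  # NPU-friendly INT8 preset
    config = Config(global_quant_config=quant_config)

    quantizer = ModelQuantizer(config)
    quantizer.quantize_model(
        "resnet50_fp32.onnx",   # float input model (placeholder path)
        "resnet50_int8.onnx",   # quantized output model (placeholder path)
        RandomCalibrationReader(),
    )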


BF16 Examples
@@ -65,7 +65,7 @@ For more details
- `Image Classification <https://github.com/amd/RyzenAI-SW/tree/main/example/image_classification>`_ using ResNet50 to run BF16 model on NPU
- `Finetuned DistilBERT for Text Classification <https://github.com/amd/RyzenAI-SW/tree/main/example/DistilBERT_text_classification_bf16>`_
- `Text Embedding Model Alibaba-NLP/gte-large-en-v1.5 <https://github.com/amd/RyzenAI-SW/tree/main/example/GTE>`_
- - Advanced quantization techniques `Fast Finetuning <https://quark.docs.amd.com/latest/supported_accelerators/ryzenai/tutorial_convert_fp32_or_fp16_to_bf16.html>`_ for BF16 models.
+ - Advanced quantization techniques `FP32/FP16 to BF16 Conversion <https://quark.docs.amd.com/latest/supported_accelerators/ryzenai/tutorial_convert_fp32_or_fp16_to_bf16.html>`_ for BF16 models.
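
If the INT8 flow sketched earlier on this page is available in your Quark install, the BF16 path is typically the same flow with a different preset; the ``BF16`` preset name here is an assumption to check against the conversion tutorial linked above.

.. code-block:: python

    # Same flow as the INT8 sketch, swapping the preset (assumed name "BF16").
    from quark.onnx.quantization.config import Config, get_default_config

    quant_config = get_default_config("BF16")
    config = Config(global_quant_config=quant_config)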


..
3 changes: 1 addition & 2 deletions docs/modelrun.rst
@@ -327,15 +327,14 @@ Python example:
)


- **NOTE**: When compiling with encryptionKey, ensure that any existing cache directory (either the default cache directory or the directory specified by the ``cache_dir`` provider option) is deleted before compiling.
+ |memo| **NOTE**: When compiling with encryptionKey, ensure that any existing cache directory (either the default cache directory or the directory specified by the ``cache_dir`` provider option) is deleted before compiling.

|

**************************
Operator Assignment Report
**************************


Vitis AI EP generates a file named ``vitisai_ep_report.json`` that provides a report on model operator assignments across CPU and NPU. This file is automatically generated in the cache directory if no explicit cache location is specified in the code. The report includes information such as the total number of nodes, the list of operator types in the model, and which nodes and operators run on the NPU or on the CPU. Additionally, the report includes node statistics, such as the input to a node, the applied operation, and the output from the node.
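
For a quick look at the report, standard JSON tooling suffices; nothing beyond ordinary JSON structure is assumed here, and the path is a placeholder for wherever your cache directory lives.

.. code-block:: python

    # Sketch: peek inside vitisai_ep_report.json from the cache directory.
    # Only generic JSON structure is assumed; this lists the report's
    # top-level keys so you can see which statistics (node counts, operator
    # lists, CPU/NPU assignments) your version of Vitis AI EP emits.
    import json
    from pathlib import Path

    report_path = Path("my_cache_dir") / "vitisai_ep_report.json"  # adjust to your cache_dir
    report = json.loads(report_path.read_text())

    for key, value in report.items():
        summary = f"{len(value)} entries" if isinstance(value, (list, dict)) else value
        print(f"{key}: {summary}")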


8 changes: 5 additions & 3 deletions docs/relnotes.rst
@@ -126,9 +126,11 @@ Version 1.4

- Known Issues:

-   - LT might cause warnings or crashes when running concurrently with other MSFT Copilot apps
-   - Recall app might stop functioning; NPU driver and workloads are expected to continue to work
-   - Cocreator app does not close contexts quickly and might cause contexts to be limited due to remaining contexts still open
+   - Microsoft Windows Insider Program (WIP) users may see warnings or need to restart when running all applications concurrently.
+
+     - NPU driver and workloads will continue to work.
+
+   - Context creation may appear to be limited when some applications do not close contexts quickly.


***********