diff --git a/1_python/1_getting-started/project-setup.md b/1_python/1_getting-started/project-setup.md
index 4c81fdb..f3c20ce 100644
--- a/1_python/1_getting-started/project-setup.md
+++ b/1_python/1_getting-started/project-setup.md
@@ -5,13 +5,13 @@ description: "Set up your `lmstudio-python` app or script."
 index: 2
 ---
 
-`lmstudio` is a library published on Python that allows you to use `lmstudio-python` in your own projects.
+`lmstudio` is a library published on PyPI that allows you to use `lmstudio-python` in your own projects.
 It is open source and developed on GitHub.
 You can find the source code [here](https://github.com/lmstudio-ai/lmstudio-python).
 
 ## Installing `lmstudio-python`
 
-As it is published to Python, `lmstudio-python` may be installed using `pip`
+As it is published to PyPI, `lmstudio-python` may be installed using `pip`
 or your preferred project dependency manager (`pdm` is shown, but other
 Python project management tools offer similar dependency addition commands).
 
diff --git a/1_python/1_getting-started/repl.md b/1_python/1_getting-started/repl.md
index 5721ff1..f4162b3 100644
--- a/1_python/1_getting-started/repl.md
+++ b/1_python/1_getting-started/repl.md
@@ -6,8 +6,8 @@ index: 2
 ---
 
 To enable interactive use, `lmstudio-python` offers a convenience API which manages
-its resources via `atexit` hooks, allowing the a default synchronous client session
-to be used across multiple interactive comments.
+its resources via `atexit` hooks, allowing a default synchronous client session
+to be used across multiple interactive commands.
 
 This convenience API is shown in the examples throughout the documentation as the
 `Python (convenience API)` tab (alongside the `Python (scoped resource API)` examples,
diff --git a/1_python/1_llm-prediction/chat-completion.md b/1_python/1_llm-prediction/chat-completion.md
index dbee63e..cf4822b 100644
--- a/1_python/1_llm-prediction/chat-completion.md
+++ b/1_python/1_llm-prediction/chat-completion.md
@@ -132,23 +132,23 @@ You can ask the LLM to predict the next response in the chat context using the `
 
 ```lms_code_snippet
   variants:
-    Streaming:
+    "Non-streaming":
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        prediction_stream = model.respond_stream(chat)
+        result = model.respond(chat)
 
-        for fragment in prediction_stream:
-          print(fragment.content, end="", flush=True)
-        print() # Advance to a new line at the end of the response
+        print(result)
 
-    "Non-streaming":
+    Streaming:
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        result = model.respond(chat)
+        prediction_stream = model.respond_stream(chat)
 
-        print(result)
+        for fragment in prediction_stream:
+          print(fragment.content, end="", flush=True)
+        print() # Advance to a new line at the end of the response
 ```
 
 ## Customize Inferencing Parameters
diff --git a/1_python/1_llm-prediction/completion.md b/1_python/1_llm-prediction/completion.md
index 7f63495..58c0353 100644
--- a/1_python/1_llm-prediction/completion.md
+++ b/1_python/1_llm-prediction/completion.md
@@ -39,23 +39,23 @@ Once you have a loaded model, you can generate completions by passing a string t
 
 ```lms_code_snippet
   variants:
-    Streaming:
+    "Non-streaming":
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
+        result = model.complete("My name is", config={"maxTokens": 100})
 
-        for fragment in prediction_stream:
-          print(fragment.content, end="", flush=True)
-        print() # Advance to a new line at the end of the response
+        print(result)
 
-    "Non-streaming":
+    Streaming:
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        result = model.complete("My name is", config={"maxTokens": 100})
+        prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
 
-        print(result)
+        for fragment in prediction_stream:
+          print(fragment.content, end="", flush=True)
+        print() # Advance to a new line at the end of the response
 ```
 
 ## 3. Print Prediction Stats
@@ -64,21 +64,22 @@ You can also print prediction metadata, such as the model used for generation, n
 
 ```lms_code_snippet
   variants:
-    Streaming:
+    "Non-streaming":
       language: python
      code: |
-        # After iterating through the prediction fragments,
-        # the overall prediction result may be obtained from the stream
-        result = prediction_stream.result()
-
+        # `result` is the response from the model.
         print("Model used:", result.model_info.display_name)
         print("Predicted tokens:", result.stats.predicted_tokens_count)
         print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
         print("Stop reason:", result.stats.stop_reason)
-    "Non-streaming":
+
+    Streaming:
       language: python
       code: |
-        # `result` is the response from the model.
+        # After iterating through the prediction fragments,
+        # the overall prediction result may be obtained from the stream
+        result = prediction_stream.result()
+
         print("Model used:", result.model_info.display_name)
         print("Predicted tokens:", result.stats.predicted_tokens_count)
         print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
diff --git a/1_python/1_llm-prediction/parameters.md b/1_python/1_llm-prediction/parameters.md
index bf084c5..879aaf0 100644
--- a/1_python/1_llm-prediction/parameters.md
+++ b/1_python/1_llm-prediction/parameters.md
@@ -33,7 +33,10 @@ Set inference-time parameters such as `temperature`, `maxTokens`, `topP` and mor
 
 
-Another useful inference-time configuration parameter is [`structured`](<(./structured-responses)>), which allows you to rigorously enforce the structure of the output using a JSON or Pydantic schema.
+Note that while `structured` can be set to a JSON schema definition as an inference-time configuration parameter,
+the preferred approach is to instead set the [dedicated `response_format` parameter](<(./structured-responses)>),
+which allows you to more rigorously enforce the structure of the output using a JSON or class-based schema
+definition.
 
 
 # Load Parameters
 
diff --git a/1_python/1_llm-prediction/structured-response.md b/1_python/1_llm-prediction/structured-response.md
index 6820c99..57e33cf 100644
--- a/1_python/1_llm-prediction/structured-response.md
+++ b/1_python/1_llm-prediction/structured-response.md
@@ -130,7 +130,7 @@ schema = {
 
         book = result.parsed
         print(book)
-        # ^
+        # ^
         # Note that `book` is correctly typed as { title: string, author: string, year: number }
 
     Streaming:
@@ -138,10 +138,10 @@ schema = {
       code: |
         prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=schema)
 
-        # Optionally stream the response
-        # for fragment in prediction:
-        #   print(fragment.content, end="", flush=True)
-        # print()
+        # Stream the response
+        for fragment in prediction_stream:
+          print(fragment.content, end="", flush=True)
+        print()
         # Note that even for structured responses, the *fragment* contents are still only text
 
         # Get the final structured result
@@ -149,13 +149,15 @@ schema = {
 
         book = result.parsed
         print(book)
-        # ^
+        # ^
         # Note that `book` is correctly typed as { title: string, author: string, year: number }
 ```
+
diff --git a/1_python/5_manage-models/loading.md b/1_python/5_manage-models/loading.md
index 871b178..574efc7 100644
--- a/1_python/5_manage-models/loading.md
+++ b/1_python/5_manage-models/loading.md
@@ -23,7 +23,8 @@ AI models are huge. It can take a while to load them into memory. LM Studio's SD
 
 ## Get the Current Model with `.model()`
 
-If you already have a model loaded in LM Studio (either via the GUI or `lms load`), you can use it by calling `.model()` without any arguments.
+If you already have a model loaded in LM Studio (either via the GUI or `lms load`),
+you can use it by calling `.model()` without any arguments.
 
 ```lms_code_snippet
   variants:
diff --git a/1_python/6_model-info/_get-load-config.md b/1_python/6_model-info/_get-load-config.md
index a33fedb..27294c1 100644
--- a/1_python/6_model-info/_get-load-config.md
+++ b/1_python/6_model-info/_get-load-config.md
@@ -8,7 +8,9 @@ TODO: Python SDK has this interface hidden until we can translate server config
 LM Studio allows you to configure certain parameters when loading a model
 [through the server UI](/docs/advanced/per-model) or [through the API](/docs/api/sdk/load-model).
 
-You can retrieve the config with which a given model was loaded using the SDK. In the below examples, `llm` can be replaced with an embedding model `emb`.
+You can retrieve the config with which a given model was loaded using the SDK.
+In the below examples, the LLM reference can be replaced with an
+embedding model reference without requiring any other changes.
 
 ```lms_protip
 Context length is a special case that [has its own method](/docs/api/sdk/get-context-length).
diff --git a/1_python/6_model-info/get-model-info.md b/1_python/6_model-info/get-model-info.md
index f8008b1..243a24a 100644
--- a/1_python/6_model-info/get-model-info.md
+++ b/1_python/6_model-info/get-model-info.md
@@ -7,7 +7,9 @@ You can access general information and metadata about a model itself from a loaded
 instance of that model.
 
 Currently, the SDK exposes the model's default `identifier`
-and the `path` used to [load it](/docs/api/sdk/load-model). In the below examples, `llm` can be replaced with an embedding model `emb`.
+and the `path` used to [load it](/docs/api/sdk/load-model).
+In the below examples, the LLM reference can be replaced with an
+embedding model reference without requiring any other changes.
 
 ```lms_code_snippet
   variants:
diff --git a/1_python/index.md b/1_python/index.md
index 49ddf90..8269d32 100644
--- a/1_python/index.md
+++ b/1_python/index.md
@@ -8,7 +8,7 @@ description: "Getting started with LM Studio's Python SDK"
 
 ## Installing the SDK
 
-`lmstudio-python` is available as a pypi package. You can install it using pip.
+`lmstudio-python` is available as a PyPI package. You can install it using pip.
 
 ```lms_code_snippet
   variants: