
Commit 06eee57

Python docs review (2025-03-04)
2 parents 6defd51 + 8191073

File tree

13 files changed: +61 -75 lines changed


1_python/1_getting-started/project-setup.md

Lines changed: 2 additions & 2 deletions
@@ -5,13 +5,13 @@ description: "Set up your `lmstudio-python` app or script."
 index: 2
 ---
 
-`lmstudio` is a library published on Python that allows you to use `lmstudio-python` in your own projects.
+`lmstudio` is a library published on PyPI that allows you to use `lmstudio-python` in your own projects.
 It is open source and developed on GitHub.
 You can find the source code [here](https://github.com/lmstudio-ai/lmstudio-python).
 
 ## Installing `lmstudio-python`
 
-As it is published to Python, `lmstudio-python` may be installed using `pip`
+As it is published to PyPI, `lmstudio-python` may be installed using `pip`
 or your preferred project dependency manager (`pdm` is shown, but other
 Python project management tools offer similar dependency addition commands).
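
For reference, a minimal sketch of verifying the PyPI install described in this hunk; it assumes nothing beyond the `lmstudio` package having been installed from PyPI (for example via `pip install lmstudio` or `pdm add lmstudio`):

```python
# Confirm the `lmstudio` package installed from PyPI is importable and report its version.
from importlib.metadata import version

import lmstudio  # noqa: F401  (imported only to confirm the install succeeded)

print("lmstudio version:", version("lmstudio"))
```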

1_python/1_getting-started/repl.md

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@ index: 2
 ---
 
 To enable interactive use, `lmstudio-python` offers a convenience API which manages
-its resources via `atexit` hooks, allowing the a default synchronous client session
-to be used across multiple interactive comments.
+its resources via `atexit` hooks, allowing a default synchronous client session
+to be used across multiple interactive commands.
 
 This convenience API is shown in the examples throughout the documentation as the
 `Python (convenience API)` tab (alongside the `Python (scoped resource API)` examples,
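
For reference, a sketch of what interactive use of the convenience API looks like; it assumes `lms.llm()` returns a handle to the currently loaded model through the default client session that the `atexit` hooks clean up, and that `respond()` accepts a plain string prompt:

```python
import lmstudio as lms

# Assumption: `lms.llm()` attaches to the default synchronous client session,
# so no explicit teardown is needed between interactive commands.
model = lms.llm()
print(model.respond("Hello!"))
```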

1_python/1_llm-prediction/chat-completion.md

Lines changed: 8 additions & 8 deletions
@@ -132,23 +132,23 @@ You can ask the LLM to predict the next response in the chat context using the `
 
 ```lms_code_snippet
   variants:
-    Streaming:
+    "Non-streaming":
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        prediction_stream = model.respond_stream(chat)
 
-        for fragment in prediction_stream:
-            print(fragment.content, end="", flush=True)
-        print() # Advance to a new line at the end of the response
+        result = model.respond(chat)
+
+        print(result)
 
-    "Non-streaming":
+    Streaming:
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        result = model.respond(chat)
+        prediction_stream = model.respond_stream(chat)
 
-        print(result)
+        for fragment in prediction_stream:
+            print(fragment.content, end="", flush=True)
+        print() # Advance to a new line at the end of the response
 ```
 
 ## Customize Inferencing Parameters

1_python/1_llm-prediction/completion.md

Lines changed: 16 additions & 15 deletions
@@ -39,23 +39,23 @@ Once you have a loaded model, you can generate completions by passing a string t
 
 ```lms_code_snippet
   variants:
-    Streaming:
+    "Non-streaming":
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
+        result = model.complete("My name is", config={"maxTokens": 100})
 
-        for fragment in prediction_stream:
-            print(fragment.content, end="", flush=True)
-        print() # Advance to a new line at the end of the response
+        print(result)
 
-    "Non-streaming":
+    Streaming:
       language: python
       code: |
         # The `chat` object is created in the previous step.
-        result = model.complete("My name is", config={"maxTokens": 100})
+        prediction_stream = model.complete_stream("My name is", config={"maxTokens": 100})
 
-        print(result)
+        for fragment in prediction_stream:
+            print(fragment.content, end="", flush=True)
+        print() # Advance to a new line at the end of the response
 ```
 
 ## 3. Print Prediction Stats

@@ -64,21 +64,22 @@ You can also print prediction metadata, such as the model used for generation, n
 
 ```lms_code_snippet
   variants:
-    Streaming:
+    "Non-streaming":
       language: python
       code: |
-        # After iterating through the prediction fragments,
-        # the overall prediction result may be obtained from the stream
-        result = prediction_stream.result()
-
+        # `result` is the response from the model.
         print("Model used:", result.model_info.display_name)
         print("Predicted tokens:", result.stats.predicted_tokens_count)
         print("Time to first token (seconds):", result.stats.time_to_first_token_sec)
         print("Stop reason:", result.stats.stop_reason)
-    "Non-streaming":
+
+    Streaming:
       language: python
       code: |
-        # `result` is the response from the model.
+        # After iterating through the prediction fragments,
+        # the overall prediction result may be obtained from the stream
+        result = prediction_stream.result()
+
         print("Model used:", result.model_info.display_name)
         print("Predicted tokens:", result.stats.predicted_tokens_count)
         print("Time to first token (seconds):", result.stats.time_to_first_token_sec)

1_python/1_llm-prediction/parameters.md

Lines changed: 4 additions & 1 deletion
@@ -33,7 +33,10 @@ Set inference-time parameters such as `temperature`, `maxTokens`, `topP` and mor
 
 <!-- See [`LLMPredictionConfigInput`](./../api-reference/llm-prediction-config-input) for all configurable fields. -->
 
-Another useful inference-time configuration parameter is [`structured`](<(./structured-responses)>), which allows you to rigorously enforce the structure of the output using a JSON or Pydantic schema.
+Note that while `structured` can be set to a JSON schema definition as an inference-time configuration parameter,
+the preferred approach is to instead set the [dedicated `response_format` parameter](<(./structured-responses)>),
+which allows you to more rigorously enforce the structure of the output using a JSON or class based schema
+definition.
 
 # Load Parameters
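
For reference, a sketch of the recommendation in the reworded paragraph: sampling options go in `config`, while schema-constrained output uses the dedicated `response_format` parameter rather than `structured` inside `config`. The schema contents and the `lms.llm()` model handle are illustrative assumptions:

```python
import lmstudio as lms

model = lms.llm()  # assumption: handle to the currently loaded model

# Illustrative JSON-schema-style definition (a class based schema could be used instead).
book_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title", "year"],
}

result = model.respond(
    "Name a classic novel.",
    config={"temperature": 0.6, "maxTokens": 50},  # inference-time parameters
    response_format=book_schema,                   # preferred over config={"structured": ...}
)
print(result.parsed)
```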

1_python/1_llm-prediction/structured-response.md

Lines changed: 9 additions & 7 deletions
@@ -130,32 +130,34 @@ schema = {
         book = result.parsed
 
         print(book)
-        # ^
+        # ^
         # Note that `book` is correctly typed as { title: string, author: string, year: number }
 
     Streaming:
       language: python
       code: |
         prediction_stream = model.respond_stream("Tell me about The Hobbit", response_format=schema)
 
-        # Optionally stream the response
-        # for fragment in prediction:
-        #     print(fragment.content, end="", flush=True)
-        # print()
+        # Stream the response
+        for fragment in prediction:
+            print(fragment.content, end="", flush=True)
+        print()
         # Note that even for structured responses, the *fragment* contents are still only text
 
         # Get the final structured result
         result = prediction_stream.result()
         book = result.parsed
 
         print(book)
-        # ^
+        # ^
         # Note that `book` is correctly typed as { title: string, author: string, year: number }
 ```
 
+<!--
+
 TODO: Info about structured generation caveats
 
-<!-- ## Overview
+## Overview
 
 Once you have [downloaded and loaded](/docs/basics/index) a large language model,
 you can use it to respond to input through the API. This article covers getting JSON structured output, but you can also

1_python/1_llm-prediction/working-with-chats.md

Lines changed: 3 additions & 1 deletion
@@ -24,7 +24,9 @@ variants:
 
 For more complex tasks, it is recommended to use the `Chat` helper class.
 It provides various commonly used methods to manage the chat.
-Here is an example with the `Chat` class.
+Here is an example with the `Chat` class, where the initial system prompt
+is supplied when initializing the chat instance, and then the initial user
+message is added via the corresponding method call.
 
 ```lms_code_snippet
   variants:
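
For reference, a sketch matching the reworded description above: the system prompt is passed to the `Chat` constructor and the first user message is added afterwards. The import path and the method name `add_user_message` are assumptions standing in for whatever "corresponding method call" the page goes on to show:

```python
from lmstudio import Chat  # import path assumed

# System prompt supplied when the chat instance is initialized...
chat = Chat("You are a helpful, concise assistant.")
# ...then the initial user message is added via the corresponding method call (name assumed).
chat.add_user_message("What is the capital of France?")
```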

1_python/2_agent/tools.md

Lines changed: 5 additions & 6 deletions
@@ -71,7 +71,11 @@ is typically going to be the most convenient):
 
 This means that your wording will affect the quality of the generation. Make sure to always provide a clear description of the tool so the model knows how to use it.
 
-When a tool call fails, the language model may be able to respond appropriately to the failure.
+The SDK does not yet automatically convert raised exceptions to text and report them
+to the language model, but it can be beneficial for tool implementations to do so.
+In many cases, when notified of an error, a language model is able to adjust its
+request to avoid the failure.
+
 
 ## Tools with External Effects (like Computer Use or API Calls)
 

@@ -103,11 +107,6 @@ can essentially turn your LLMs into autonomous agents that can perform tasks on
 
 ```
 
-The SDK does not yet automatically convert raised exceptions to text and report them
-to the language model, but it can be beneficial for tool implementations to do so.
-In many cases, when notified of an error, a language model is able to adjust its
-request to avoid the failure.
-
 ### Example code using the `create_file` tool:
 
 ```lms_code_snippet
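
For reference, a sketch of the pattern the reworded paragraph recommends: the tool itself catches exceptions and returns the error as text so the model can adjust its next call. The body shown here for the `create_file` tool is an illustrative assumption, not the exact example used later on the page:

```python
# Illustrative tool implementation: report failures back to the model as text,
# since the SDK does not yet convert raised exceptions into tool results automatically.
def create_file(name: str, content: str) -> str:
    """Create a file with the given name and content."""
    try:
        with open(name, "x", encoding="utf-8") as f:
            f.write(content)
    except OSError as exc:  # e.g. the file already exists or the path is invalid
        return f"Error creating {name}: {exc}"
    return f"File {name} created successfully."
```
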
1_python/4_tokenization/index.md

Lines changed: 3 additions & 29 deletions
@@ -8,7 +8,9 @@ Models use a tokenizer to internally convert text into "tokens" they can deal wi
 
 ## Tokenize
 
-You can tokenize a string with a loaded LLM or embedding model using the SDK. In the below examples, `llm` can be replaced with an embedding model `emb`.
+You can tokenize a string with a loaded LLM or embedding model using the SDK.
+In the below examples, the LLM reference can be replaced with an
+embedding model reference without requiring any other changes.
 
 ```lms_code_snippet
   variants:

@@ -74,31 +76,3 @@ You can determine if a given conversation fits into a model's context by doing t
         print("Fits in context:", does_chat_fit_in_context(model, chat))
 
 ```
-
-<!-- ### Context length comparisons
-
-The below examples check whether a conversation is over a LLM's context length
-(replace `llm` with `emb` to check for an embedding model).
-
-```lms_code_snippet
-  variants:
-    "Python (convenience API)":
-      language: python
-      code: |
-        import { LMStudioClient, Chat } from "@lmstudio/sdk";
-
-        const client = new LMStudioClient()
-        const llm = client.llm.model()
-
-        # To check for a string, simply tokenize
-        var tokens = llm.tokenize("Hello, world!")
-
-        # To check for a Chat, apply the prompt template first
-        const chat = Chat.createEmpty().withAppended("user", "Hello, world!")
-        const templatedChat = llm.applyPromptTemplate(chat)
-        tokens = llm.tokenize(templatedChat)
-
-        # If the prompt's length in tokens is less than the context length, you're good!
-        const contextLength = llm.getContextLength()
-        const isOkay = (tokens.length < contextLength)
-``` -->
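
For reference, a sketch of the tokenization call the updated wording describes; obtaining the model handle via `lms.llm()` is an assumption, and an embedding model reference could be substituted without any other changes:

```python
import lmstudio as lms

model = lms.llm()  # assumption: reference to the currently loaded LLM
tokens = model.tokenize("Hello, world!")
print(len(tokens), "tokens")
```
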
1_python/5_manage-models/loading.md

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@ AI models are huge. It can take a while to load them into memory. LM Studio's SD
 
 ## Get the Current Model with `.model()`
 
-If you already have a model loaded in LM Studio (either via the GUI or `lms load`), you can use it by calling `.model()` without any arguments.
+If you already have a model loaded in LM Studio (either via the GUI or `lms load`),
+you can use it by calling `.model()` without any arguments.
 
 ```lms_code_snippet
   variants:
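
For reference, a sketch of the `.model()` call described above, written against the scoped resource API; treating `lms.Client()` as a context manager and `client.llm.model()` as the accessor are assumptions here:

```python
import lmstudio as lms

# Assumption: a scoped client session; `.model()` with no arguments attaches to
# whatever model is already loaded in LM Studio.
with lms.Client() as client:
    model = client.llm.model()
    print(model.respond("Hello!"))
```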
