add a minimal LLM chat example + switch to mlx-swift 0.30.2 #454

davidkoski merged 9 commits into main
Conversation
```swift
self.task = nil
}
}
}
```
This and the next file (ContentView) are the full minimal chat app.
```
### Troubleshooting

If the program crashes with a very deep stack trace, you may need to build
in Release configuration. This seems to depend on the size of the model.
```
This advice was obsolete
```diff
@@ -1,6 +1,5 @@
 // Copyright © 2025 Apple Inc.

 import AsyncAlgorithms
```
```swift
.product(name: "MLXNN", package: "mlx-swift"),
.product(name: "MLXOptimizers", package: "mlx-swift"),
.product(name: "MLXRandom", package: "mlx-swift"),
.product(name: "Transformers", package: "swift-transformers"),
```
- ml-explore/mlx-swift-examples#454 - fixes #27 - move ChatSession integration tests into a new test target so we can more easily control when it runs - make a ChatSession _unit_ (more or less) test - fix Sendable / thread safety issues uncovered by LLMBasic - collect TestTokenizer and friends in its own file - fix warnings in tests - UserInputProcessors -> structs
- see #27 - a port of ml-explore/mlx-lm#463 (happened after the initial port to swift) - in support of ml-explore/mlx-swift-examples#454
- support for ml-explore/mlx-swift-examples#454 - ModelContainer appeared to provide thread-safe access to the KVCache and model - but in fact was not: async token generation could use the KVCache concurrently, and if you were to break the async stream early the previous call could still be running - swift-format
```swift
self.tokensPerSecond = Double(self.totalTokens) / elapsed
self.totalTime = elapsed
}
let lmInput = try await modelContainer.prepare(input: userInput)
```
This is a little easier with the updated API on ModelContainer
```diff
-Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(GPU.memoryLimit))
-Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(GPU.cacheLimit))
+Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(Memory.memoryLimit))
+Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(Memory.cacheLimit))
```
This was changed from GPU -> Memory to match the python side (we aren't always running on a GPU).
These are deprecation warnings, not build breaks.
```diff
 private func startInner() async throws {
     // setup
-    GPU.set(cacheLimit: 32 * 1024 * 1024)
+    Memory.cacheLimit = 32 * 1024 * 1024
```
Exposing this as a property is more Swifty -- the new Memory API presents it that way.
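The shape of that change can be sketched with a stand-in type (`MemorySketch` is hypothetical, not the MLX API):

```swift
// Hypothetical stand-in for the new API shape: a settable static
// property replaces a set(cacheLimit:) function.
enum MemorySketch {
    static var cacheLimit: Int = 0
}

// Before (function style):  GPU.set(cacheLimit: 32 * 1024 * 1024)
// After  (property style):
MemorySketch.cacheLimit = 32 * 1024 * 1024
```

A property also reads naturally at the use site (`Memory.cacheLimit` both sets and reports the limit), which is the convention the rest of the Swift standard library follows.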
|
|
```diff
 func run() async throws {
-    Device.setDefault(device: Device(device))
+    try await Device.withDefaultDevice(Device(device)) {
```
This is now Task scoped rather than global -- a better fit for the Swift concurrency model. The setDefault is deprecated.
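Task-scoped defaults like this are typically built on `@TaskLocal`; a minimal sketch with stand-in names (this is not the MLX implementation):

```swift
// Each task tree sees its own default device; nothing global is
// mutated, so concurrent tasks cannot stomp on each other's choice.
enum DeviceDefault {
    @TaskLocal static var name: String = "gpu"
}

let before = DeviceDefault.name                  // default outside any override
let inside = DeviceDefault.$name.withValue("cpu") {
    DeviceDefault.name                           // override visible only in this scope
}
let after = DeviceDefault.name                   // back to the default
```

Compared with a global setter, the override automatically unwinds when the scope exits, even on error or cancellation.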
```swift
}
if let chunk = item.chunk {
    print(chunk, terminator: "")
}
```
Another move to the updated API. Passing the UserInput (not Sendable) was an issue in the above code under Swift 6.
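The Sendable friction can be sketched with hypothetical stand-ins (these are not the real UserInput types):

```swift
// A class with mutable state is not Sendable: two tasks could share
// and mutate the same instance concurrently.
final class InputClass {
    var prompt: String
    init(prompt: String) { self.prompt = prompt }
}

// A struct of Sendable values is Sendable for free: crossing an actor
// boundary copies it, so there is no shared mutable state.
struct InputStruct: Sendable {
    var prompt: String
}

let original = InputStruct(prompt: "hi")
var copy = original
copy.prompt = "changed"   // value semantics: the original is untouched
```

This is also the motivation behind the "UserInputProcessors -> structs" commit above: value types sidestep most Swift 6 data-race diagnostics.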
I wonder if this is worth moving to ChatSession? That would make it even simpler.
```swift
var cache: [KVCache]

var printStats = false
}
```
Replace all of this with ChatSession -- much simpler.
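Roughly the shape of the simplification, using a hypothetical stand-in (not the real ChatSession API): the session owns the conversation state, so the caller no longer threads a KVCache and message history by hand.

```swift
// Hypothetical sketch of a session object that owns its history;
// the caller just sends prompts and receives replies.
struct ChatSessionSketch {
    private(set) var history: [String] = []

    mutating func respond(to prompt: String) -> String {
        history.append("user: \(prompt)")
        let reply = "echo: \(prompt)"    // stand-in for token generation
        history.append("assistant: \(reply)")
        return reply
    }
}

var session = ChatSessionSketch()
let reply = session.respond(to: "hello")
// A follow-up respond(to:) would see the prior turns in history.
```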
* fix gemma3 + attention mask - see #27 - a port of ml-explore/mlx-lm#463 (happened after the initial port to swift) - in support of ml-explore/mlx-swift-examples#454
awni left a comment
Looks great. I love having a super minimal example. I read through the docs but not most of the code. If there is anything in particular you'd like me to look at let me know!
No, nothing in particular -- there are a lot of deleted files. The command line tool is simplified using the ChatSession API instead of setting up all of that directly. It is still a little bit complicated because it is collecting stats. The new app is mostly this file: https://github.com/ml-explore/mlx-swift-examples/pull/454/files#diff-0d490c888b4ad72cfec150f14cbaecc0091e4b094686462311588ee234ea6581 which is 1) download, 2) load, 3) inference. The whole file is about 100 lines and supports asynchronous operations, progress, and cancellation. I am excited to trim it down to the basics, but it actually has more functionality than the original example since you can 1) interrupt and 2) it is a chat session with history, but less code :-)
* fix thread safety issues - support for ml-explore/mlx-swift-examples#454 - ModelContainer appeared to provide thread-safe access to the KVCache and model - but in fact was not: async token generation could use the KVCache concurrently, and if you were to break the async stream early the previous call could still be running * pick up mlx-swift 0.30.3 which has additional thread safety fixes
- LLMEval has more of a showcase of features and runtime statistics - this provides the minimum required to load a model and interact with it - also cleans up the xcodeproj (see #451) - removes VLMEval (redundant and wasn't maintained)
force-pushed from 2f82203 to acd897d
Proposed changes
mlx-swift: 0.30.2 -> 0.30.3 (and the equivalent mlx-swift-lm 2.30.3, tag not cut yet).

I updated the documentation to indicate which examples were more full featured and which ones were the minimal starting points. Both have uses.
@DePasqualeOrg FYI
Checklist

Put an `x` in the boxes that apply.

- I have read the CONTRIBUTING document
- I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes
- I have added tests that prove my fix is effective or that my feature works
- I have updated the necessary documentation (if needed)
update build dependency on mlx-swift-lm when tag is ready