add a minimal LLM chat example + switch to mlx-swift 0.30.2 #454

davidkoski merged 9 commits into main
Conversation
```swift
self.task = nil
}
}
}
```
This and the next file (ContentView) are the full minimal chat app.
```
### Troubleshooting

If the program crashes with a very deep stack trace, you may need to build
in Release configuration. This seems to depend on the size of the model.
```
This advice was obsolete
```diff
@@ -1,6 +1,5 @@
 // Copyright © 2025 Apple Inc.

 import AsyncAlgorithms
```
```swift
.product(name: "MLXNN", package: "mlx-swift"),
.product(name: "MLXOptimizers", package: "mlx-swift"),
.product(name: "MLXRandom", package: "mlx-swift"),
.product(name: "Transformers", package: "swift-transformers"),
```
- ml-explore/mlx-swift-examples#454 - fixes #27 - move ChatSession integration tests into a new test target so we can more easily control when it runs - make a ChatSession _unit_ (more or less) test - fix Sendable / thread safety issues uncovered by LLMBasic - collect TestTokenizer and friends in its own file - fix warnings in tests - UserInputProcessors -> structs
- see #27 - a port of ml-explore/mlx-lm#463 (happened after the initial port to swift) - in support of ml-explore/mlx-swift-examples#454
- support for ml-explore/mlx-swift-examples#454 - ModelContainer appeared to provide thread-safe access to the KVCache and model - but in fact was not: async token generation could use the KVCache concurrently, and if you were to break the async stream early the previous call could still be running - swift-format
```swift
self.tokensPerSecond = Double(self.totalTokens) / elapsed
self.totalTime = elapsed
}
let lmInput = try await modelContainer.prepare(input: userInput)
```
This is a little easier with the updated API on ModelContainer
```diff
-Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(GPU.memoryLimit))
-Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(GPU.cacheLimit))
+Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(Memory.memoryLimit))
+Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(Memory.cacheLimit))
```
This was changed from GPU -> Memory to match the python side (we aren't always running on a GPU).
These are deprecation warnings, not build breaks.
```diff
 private func startInner() async throws {
     // setup
-    GPU.set(cacheLimit: 32 * 1024 * 1024)
+    Memory.cacheLimit = 32 * 1024 * 1024
```
Exposing this as a property is more Swifty -- the new Memory API presents it that way.
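The shape of that change can be sketched with a stand-in type (`MemorySketch` is hypothetical, not the MLX API):

```swift
// Hypothetical stand-in for the new API shape: a settable static
// property replaces a set(cacheLimit:) function.
enum MemorySketch {
    static var cacheLimit: Int = 0
}

// Before (function style):  GPU.set(cacheLimit: 32 * 1024 * 1024)
// After  (property style):
MemorySketch.cacheLimit = 32 * 1024 * 1024
```

A property also reads naturally at the use site (`Memory.cacheLimit` both sets and reports the limit), which is the convention the rest of the Swift standard library follows.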
|
|
```diff
 func run() async throws {
-    Device.setDefault(device: Device(device))
+    try await Device.withDefaultDevice(Device(device)) {
```
This is now Task scoped rather than global -- a better fit for the Swift concurrency model. The setDefault is deprecated.
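Task-scoped defaults like this are typically built on `@TaskLocal`; a minimal sketch with stand-in names (this is not the MLX implementation):

```swift
// Each task tree sees its own default device; nothing global is
// mutated, so concurrent tasks cannot stomp on each other's choice.
enum DeviceDefault {
    @TaskLocal static var name: String = "gpu"
}

let before = DeviceDefault.name                  // default outside any override
let inside = DeviceDefault.$name.withValue("cpu") {
    DeviceDefault.name                           // override visible only in this scope
}
let after = DeviceDefault.name                   // back to the default
```

Compared with a global setter, the override automatically unwinds when the scope exits, even on error or cancellation.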
```swift
}
if let chunk = item.chunk {
    print(chunk, terminator: "")
}
```
Another move to the updated API. Passing the UserInput (not Sendable) was an issue in the above code under Swift 6.
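The Sendable friction can be sketched with hypothetical stand-ins (these are not the real UserInput types):

```swift
// A class with mutable state is not Sendable: two tasks could share
// and mutate the same instance concurrently.
final class InputClass {
    var prompt: String
    init(prompt: String) { self.prompt = prompt }
}

// A struct of Sendable values is Sendable for free: crossing an actor
// boundary copies it, so there is no shared mutable state.
struct InputStruct: Sendable {
    var prompt: String
}

let original = InputStruct(prompt: "hi")
var copy = original
copy.prompt = "changed"   // value semantics: the original is untouched
```

This is also the motivation behind the "UserInputProcessors -> structs" commit above: value types sidestep most Swift 6 data-race diagnostics.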
I wonder if this is worth moving to ChatSession? That would make it even simpler.
```swift
var cache: [KVCache]

var printStats = false
}
```
Replace all of this with ChatSession -- much simpler.
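Roughly the shape of the simplification, using a hypothetical stand-in (not the real ChatSession API): the session owns the conversation state, so the caller no longer threads a KVCache and message history by hand.

```swift
// Hypothetical sketch of a session object that owns its history;
// the caller just sends prompts and receives replies.
struct ChatSessionSketch {
    private(set) var history: [String] = []

    mutating func respond(to prompt: String) -> String {
        history.append("user: \(prompt)")
        let reply = "echo: \(prompt)"    // stand-in for token generation
        history.append("assistant: \(reply)")
        return reply
    }
}

var session = ChatSessionSketch()
let reply = session.respond(to: "hello")
// A follow-up respond(to:) would see the prior turns in history.
```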
* fix gemma3 + attention mask - see #27 - a port of ml-explore/mlx-lm#463 (happened after the initial port to swift) - in support of ml-explore/mlx-swift-examples#454
awni left a comment
Looks great. I love having a super minimal example. I read through the docs but not most of the code. If there is anything in particular you'd like me to look at let me know!
No, nothing in particular -- there are a lot of deleted files. The command line tool is simplified using the ChatSession API instead of setting up all of that directly. It is still a little bit complicated because it is collecting stats. The new app is mostly this file: https://github.com/ml-explore/mlx-swift-examples/pull/454/files#diff-0d490c888b4ad72cfec150f14cbaecc0091e4b094686462311588ee234ea6581 which is 1) download, 2) load, 3) inference. The whole file is about 100 lines and supports asynchronous operations, progress, and cancellation. I am excited to trim it down to the basics, but it actually has more functionality than the original example since you can 1) interrupt and 2) it is a chat session with history, but less code :-)
* fix thread safety issues - support for ml-explore/mlx-swift-examples#454 - ModelContainer appeared to provide thread-safe access to the KVCache and model - but in fact was not: async token generation could use the KVCache concurrently, and if you were to break the async stream early the previous call could still be running * pick up mlx-swift 0.30.3 which has additional thread safety fixes
- LLMEval has more of a showcase of features and runtime statistics - this provides the minimum required to load a model and interact with it - also cleans up the xcodeproj (see #451) - removes VLMEval (redundant and wasn't maintained)
force-pushed from 2f82203 to acd897d
Proposed changes
mlx-swift: 0.30.2 -> 0.30.3 (and the equivalent mlx-swift-lm 2.30.3, tag not cut yet).

I updated the documentation to indicate which examples were more full featured and which ones were the minimal starting points. Both have uses.
@DePasqualeOrg FYI
Checklist

Put an `x` in the boxes that apply.

- I have read the CONTRIBUTING document
- I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes
- I have added tests that prove my fix is effective or that my feature works
- I have updated the necessary documentation (if needed)
update build dependency on mlx-swift-lm when tag is ready