add a minimal LLM chat example + switch to mlx-swift 0.30.2 #454

Merged
davidkoski merged 9 commits into main from trivial-llm on Jan 22, 2026

Conversation


@davidkoski davidkoski commented Dec 17, 2025

Proposed changes

  • LLMEval has more of a showcase of features and runtime statistics
  • this provides the minimum required to load a model and interact with it
  • also cleans up the xcodeproj (see Fix Xcode project #451)
  • removes VLMEval (redundant and wasn't maintained)
  • fixes all warnings from moving to mlx-swift 0.30.2/0.30.3 (and the equivalent mlx-swift-lm release 2.30.3, whose tag is not cut yet)

I updated the documentation to indicate which examples were more full featured and which ones were the minimal starting points. Both have uses.

@DePasqualeOrg FYI

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document

  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes

  • I have added tests that prove my fix is effective or that my feature works

  • I have updated the necessary documentation (if needed)

  • update build dependency on mlx-swift-lm when tag is ready

@davidkoski davidkoski requested a review from awni December 17, 2025 22:00
self.task = nil
}
}
}
Collaborator Author:

This and the next file (ContentView) are the full minimal chat app.

### Troubleshooting

If the program crashes with a very deep stack trace, you may need to build
in Release configuration. This seems to depend on the size of the model.
Collaborator Author:

This advice was obsolete.

@@ -1,6 +1,5 @@
// Copyright © 2025 Apple Inc.

import AsyncAlgorithms
Collaborator Author:

Not used

.product(name: "MLXNN", package: "mlx-swift"),
.product(name: "MLXOptimizers", package: "mlx-swift"),
.product(name: "MLXRandom", package: "mlx-swift"),
.product(name: "Transformers", package: "swift-transformers"),
Collaborator Author:

Not used

davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Dec 18, 2025
- ml-explore/mlx-swift-examples#454

- fixes #27
- move ChatSession integration tests into new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic
davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 6, 2026
- ml-explore/mlx-swift-examples#454

- fixes #27
- move ChatSession integration tests into new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic

- collect TestTokenizer and friends in its own file.  fix warnings in tests
davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 8, 2026
- ml-explore/mlx-swift-examples#454

- fixes #27
- move ChatSession integration tests into new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic

- collect TestTokenizer and friends in its own file.  fix warnings in tests
- UserInputProcessors -> structs
davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 9, 2026
- see #27
- a port of ml-explore/mlx-lm#463 (happened after the initial port to swift)

- in support of ml-explore/mlx-swift-examples#454
davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 9, 2026
- support for ml-explore/mlx-swift-examples#454
- ModelContainer appeared to provide thread safe access to the KVCache and model
    - but in fact was not -- async token generation could use the KVCache concurrently
    - if you were to break the async stream early, the previous call could still be running
@davidkoski davidkoski changed the title add a minimal LLM chat example add a minimal LLM chat example + switch to mlx-swift 0.30.2 Jan 9, 2026
self.tokensPerSecond = Double(self.totalTokens) / elapsed
self.totalTime = elapsed
}
let lmInput = try await modelContainer.prepare(input: userInput)
Collaborator Author:

This is a little easier with the updated API on ModelContainer

Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(GPU.memoryLimit))
Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(GPU.cacheLimit))
Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(Memory.memoryLimit))
Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(Memory.cacheLimit))
Collaborator Author:

This was changed from GPU -> Memory to match the Python side (we aren't always running on a GPU).

Collaborator Author:

These are deprecation warnings, not build breaks.

private func startInner() async throws {
// setup
GPU.set(cacheLimit: 32 * 1024 * 1024)
Memory.cacheLimit = 32 * 1024 * 1024
Collaborator Author:

Exposing this as a property is more Swifty -- the new Memory API models it that way.


func run() async throws {
Device.setDefault(device: Device(device))
try await Device.withDefaultDevice(Device(device)) {
Collaborator Author:

This is now Task-scoped rather than global, which is a better fit for the Swift concurrency model. setDefault is deprecated.
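The task-scoped default pattern can be sketched in plain Swift with @TaskLocal. This is only an illustration of the idea, not the MLX implementation; ToyDevice and withDefault are hypothetical names:

```swift
import Foundation

// Toy illustration of a task-scoped default, the pattern that
// Device.withDefaultDevice follows. ToyDevice is hypothetical --
// it is NOT the MLX Device type.
struct ToyDevice: Sendable {
    let name: String

    // The default lives in task-local storage, not a global variable,
    // so concurrent tasks never see each other's override.
    @TaskLocal static var current = ToyDevice(name: "cpu")

    // Scope a different default to a single block of work.
    static func withDefault<T>(_ device: ToyDevice, _ body: () throws -> T) rethrows -> T {
        try $current.withValue(device, operation: body)
    }
}

// The override is visible inside the scope and restored afterwards.
let inside = ToyDevice.withDefault(ToyDevice(name: "gpu")) { ToyDevice.current.name }
print(inside)                  // gpu
print(ToyDevice.current.name)  // cpu
```

A global setDefault mutates state that every task observes; the task-local version confines the change to a structured scope, which is why the deprecation makes sense here.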

}
if let chunk = item.chunk {
print(chunk, terminator: "")
}
Collaborator Author:

Another move to the updated API. Passing the UserInput (which is not Sendable) was an issue in the above code under Swift 6.

Collaborator Author:

I wonder if this is worth moving to ChatSession? That would make it even simpler.

Collaborator Author:

Updated!

var cache: [KVCache]

var printStats = false
}
Collaborator Author:

Replace all of this with ChatSession -- much simpler.
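For reference, the whole loop with ChatSession comes down to a few lines. This is a hedged sketch -- the model id is just an example, and the exact loadModel / respond(to:) signatures should be checked against the MLXLMCommon version in use:

```swift
import MLXLMCommon

// Minimal chat sketch using ChatSession. Downloads the model on first
// run; the model id below is an example, not a requirement.
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

// Each turn reuses the session's history automatically.
print(try await session.respond(to: "What are two things to see in San Francisco?"))
print(try await session.respond(to: "How about a third?"))
```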

davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 13, 2026
* fix gemma3 + attention mask

- see #27
- a port of ml-explore/mlx-lm#463 (happened after the initial port to swift)

- in support of ml-explore/mlx-swift-examples#454
davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 13, 2026
- support for ml-explore/mlx-swift-examples#454
- ModelContainer appeared to provide thread safe access to the KVCache and model
    - but in fact was not -- async token generation could use the KVCache concurrently
    - if you were to break the async stream early, the previous call could still be running

swift-format
@awni awni (Member) left a comment:

Looks great. I love having a super minimal example. I read through the docs but not most of the code. If there is anything in particular you'd like me to look at let me know!

@davidkoski (Collaborator Author) replied:

No, nothing in particular -- there are a lot of deleted files. The command line tool is simplified using the ChatSession API instead of setting up all of that directly. It is still a little bit complicated because it is collecting stats.

The new app is mostly this file: https://github.com/ml-explore/mlx-swift-examples/pull/454/files#diff-0d490c888b4ad72cfec150f14cbaecc0091e4b094686462311588ee234ea6581 which is 1) download, 2) load, 3) inference. The whole file is about 100 lines and supports asynchronous operations, progress, and cancellation.

I am excited to trim it down to the basics -- and it actually has more functionality than the original example, since 1) you can interrupt generation and 2) it is a chat session with history, all with less code :-)
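The interruption side of that can be sketched without any MLX types: keep the in-flight generation in a stored Task and cancel it when a new turn (or a stop button) arrives. ToyChat and its fake token loop are hypothetical stand-ins for the real app code:

```swift
import Foundation

// Sketch of interruptible generation: the in-flight turn lives in a
// Task, and starting a new turn cancels it. ToyChat is hypothetical;
// the word loop stands in for streaming tokens from a model.
final class ToyChat {
    private var task: Task<String, Never>?

    func send(_ words: [String]) async -> String {
        task?.cancel()  // interrupt any previous turn
        let turn = Task {
            var out: [String] = []
            for word in words {
                if Task.isCancelled { break }  // cooperative cancellation
                out.append(word)
            }
            return out.joined(separator: " ")
        }
        task = turn
        return await turn.value
    }
}

let chat = ToyChat()
let reply = await chat.send(["hello", "from", "the", "toy", "model"])
print(reply)  // hello from the toy model
```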

davidkoski added a commit to ml-explore/mlx-swift-lm that referenced this pull request Jan 22, 2026
* fix thread safety issues

- support for ml-explore/mlx-swift-examples#454
- ModelContainer appeared to provide thread safe access to the KVCache and model
    - but in fact was not -- async token generation could use the KVCache concurrently
    - if you were to break the async stream early, the previous call could still be running

* pick up mlx-swift 0.30.3 which has additional thread safety fixes
@davidkoski davidkoski merged commit c684488 into main Jan 22, 2026
2 checks passed
@davidkoski davidkoski deleted the trivial-llm branch January 22, 2026 21:24
@davidkoski davidkoski mentioned this pull request Jan 22, 2026
4 tasks
ronaldmannak pushed a commit to PicoMLX/mlx-swift-lm that referenced this pull request Jan 27, 2026
* fix thread safety issues

- support for ml-explore/mlx-swift-examples#454
- ModelContainer appeared to provide thread safe access to the KVCache and model
    - but in fact was not -- async token generation could use the KVCache concurrently
    - if you were to break the async stream early, the previous call could still be running

* pick up mlx-swift 0.30.3 which has additional thread safety fixes