
Update LLMEval example #452

Merged: davidkoski merged 3 commits into ml-explore:main from timsneath:llm-eval-new on Dec 11, 2025

@timsneath (Contributor) commented Dec 10, 2025

Proposed changes

This PR modernizes the LLMEval example application by refactoring it from a monolithic ContentView into a clean MVVM architecture with several new features.

New Features

  • Improved metrics panel: Better visual hierarchy for performance statistics
  • Preset prompts: Curated prompt library with support for tools and thinking mode, including long-form prompts
  • Enhanced loading UX: Visual progress indicators for model downloads with file counts
  • Collapsible prompt area: Expandable text input for longer prompts

Architecture Refactor

  • Extracted business logic into LLMEvaluator view model with improved state management
  • Split UI into focused, reusable components: HeaderView, OutputView, PromptInputView, MetricsView, LoadingOverlayView, PresetPromptsSheet
  • Created dedicated service layer with ToolExecutor for function calling
  • Organized models into separate files (PresetPrompts, ToolDefinitions)
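The PR text does not show the ToolExecutor code itself. As a rough illustration only, a tool executor for function calling typically maps the tool name emitted by the model to a handler and returns the handler's output as text for the next model turn. Everything below is a hypothetical sketch: `ToolSketch`, `ToolExecutorSketch`, `ToolExecutionError`, and the `get_current_weather` tool are invented for illustration and are not the PR's code or any MLX Swift API.

```swift
// Hypothetical sketch of a name-based tool dispatcher; not the PR's ToolExecutor.

/// A tool the model can call by name; arguments arrive as string key/value pairs.
struct ToolSketch {
    let name: String
    let run: ([String: String]) throws -> String
}

/// Errors the dispatcher reports instead of crashing.
enum ToolExecutionError: Error, Equatable {
    case unknownTool(String)
    case missingArgument(String)
}

/// Keeps a registry of tools and routes each model-emitted call to the matching one.
final class ToolExecutorSketch {
    private var tools: [String: ToolSketch] = [:]

    /// Adds (or replaces) a tool under its name.
    func register(_ tool: ToolSketch) {
        tools[tool.name] = tool
    }

    /// Looks up the tool by name and runs it; throws if no tool has that name.
    func execute(name: String, arguments: [String: String]) throws -> String {
        guard let tool = tools[name] else {
            throw ToolExecutionError.unknownTool(name)
        }
        return try tool.run(arguments)
    }
}

// Usage: register a hypothetical weather tool that returns canned text.
let executor = ToolExecutorSketch()
executor.register(ToolSketch(name: "get_current_weather") { args in
    guard let location = args["location"] else {
        throw ToolExecutionError.missingArgument("location")
    }
    return "Sunny in \(location)"
})

do {
    let reply = try executor.execute(name: "get_current_weather",
                                     arguments: ["location": "Cupertino"])
    print(reply) // Sunny in Cupertino
} catch {
    print("Tool call failed: \(error)")
}
```

Keying the registry by tool name lets new tools be added without changing the dispatch code, and throwing typed errors lets a view model show a failed call in the UI rather than trapping.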

Configuration Updates

  • Changed default model from Phi-2 to Qwen3-8B-4bit to demonstrate improved prompt loading on M5 hardware
  • Updated README with clearer instructions for switching models
  • Added an app icon
  • Cleaned up entitlements

[Screenshot, 2025-12-10 3:47:29 PM]

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@davidkoski (Collaborator) commented

The changes look good, but some things to think about:

  • I think we want a very simple example that people can start with; LLMEval is probably that app

    • that isn't to say that it shouldn't be improved or cleaned up, but feature-wise I think it should stay pretty bare
    • and ideally with few lines of code
    • but it hasn't been touched much for a long time so it can probably be modernized a bit!
  • the MLXChatExample app, on the other hand, is a little more feature-rich and shows what you might build

    • I wonder if these changes might be more appropriate there?

OR

  • do we need a new application that better showcases features like feeding it the .md files as part of the prompt?
    • then we could actually simplify the LLMEval app to contain just the barest features

We should probably also provide more documentation about what to expect from each.

What do you think?

@davidkoski (Collaborator) commented

After offline discussion the new plan is:

  • take these improvements
  • I will make a new "hello world" LLM application as a minimal example

@davidkoski (Collaborator) left a comment

Thank you for this improvement!

@davidkoski merged commit fc3afc7 into ml-explore:main on Dec 11, 2025
2 checks passed
@timsneath deleted the llm-eval-new branch on December 11, 2025 at 18:36
@timsneath restored the llm-eval-new branch on December 11, 2025 at 18:40
@timsneath deleted the llm-eval-new branch on December 11, 2025 at 18:41

2 participants