Highlights
🏆 Stable release of Berkeley Function Calling Leaderboard V3 with Multi-step and Multi-turn function call evaluation
What's Changed
- Gorilla README and repo structure revamp by @CharlieJCJ in #799
- [BFCL] Fix `live_parallel_multiple_9-8-0` copy-paste issue by @pkesseli in #865
- [BFCL] Fix Typo in `multi_turn_base_34` Ground Truth by @HuanzhiMao in #876
- Adding New Model Haha-7B by @ZydHaha in #858
- [BFCL Chore] Implement `retry_with_backoff` for Amazon Nova Handler by @HuanzhiMao in #880
- [BFCL] Fix `live_simple_183-108-0` by @pkesseli in #872
- [BFCL] Fix live_simple_165-98-0 by @pkesseli in #871
- [BFCL] Fix `live_simple_44-18-0` and `live_simple_45-18-1` by @pkesseli in #870
- [BFCL] Fix Nova Handler for Consecutive User Prompt Issue by @HuanzhiMao in #881
- Add support for QwQ and Sky-T1-32B-Preview by @SumanthRH in #888
- add handler for Bielik by @dominikabasaj in #887
- [BFCL Chore] Align Score File `id` with Result File Test Case IDs by @HuanzhiMao in #893
- Fix minor typo in default system prompt without func by @canyon289 in #895
- Falcon3 support by @kirill-fedyanin in #894
- [BFCL] Update tool construction for Palmyra models by @samjulien in #897
- Added compute_exchange_rate to multi_turn_base entry 180 ground truth by @Raymond112514 in #892
- [BFCL] Add New Model `o3-mini-2025-01-31` and `o3-mini-2025-01-31-FC` by @HuanzhiMao in #898
- Add CALM models by @jgreer013 in #900
- [BFCL] Add New Model `gemini-2.0-flash-001`, `gemini-2.0-flash-lite-preview-02-05`, `gemini-2.0-pro-exp-02-05`. by @HuanzhiMao in #902
- chore: added snippet for hf datasets compatibility by @alt-glitch in #906
- Update model_metadata.py by @jgreer013 in #907
- Rename CALM to CoALM by @jgreer013 in #913
- Bitagent 8b submission by @VectorForger in #917
- Bitagent 8b Metadata Change by @VectorForger in #919
- [BFCL] Add New Model `gpt-4.5-preview-2025-02-27`, `gpt-4.5-preview-2025-02-27-FC` by @HuanzhiMao in #922
- [BFCL] fix bug in how score_dir is handled for bfcl evaluate by @liamcli in #924
- [BFCL] Add New Model `DeepSeek-R1` by @HuanzhiMao in #901
- Make all import paths absolute. by @fvisin in #935
- Move logic to eval a task in a separate function. by @fvisin in #933
- Fix Gorilla Paper `requirements.txt` Location to Remove Global Dependency Confusion by @HuanzhiMao in #937
- [BFCL] Add _unused Suffix to Unused Dataset Files in the BFCL Benchmark by @HuanzhiMao in #938
- [BFCL] Support Local Inference for `deepseek-ai/DeepSeek-R1` by @HuanzhiMao in #926
- [BFCL] Add Support for `Qwen2.5` Models in Function Calling Mode by @HuanzhiMao in #925
- [BFCL] Add New Model `claude-3-7-sonnet-20250219`, `claude-3-7-sonnet-20250219-FC` by @HuanzhiMao in #923
- [BFCL] Add handler and meta info for ToolACE-2-8B by @XuHwang in #941
- [BFCL] Reorganized All `constant.py` Files to a `constants` Folder by @catherineruoxiwu in #944
- [BFCL] Add New Models `gemini-2.0-flash-lite-001`, `gemini-2.0-flash-thinking-exp-01-21` by @HuanzhiMao in #942
- [BFCL] Add Google `Gemma-3` Series Models by @HuanzhiMao in #939
- [BFCL] Move `model_metadata.py` to `constants` folder by @catherineruoxiwu in #949
- Add Cohere Command A by @harry-cohere in #951
- Reformatted Supported Model Table by @JasonHuang1103 in #961
- [BFCL] Use HTTPS instead of HTTP for OMDB by @hrshtv in #960
- [BFCL] Fix ambiguity in exec_parallel_10 question by @amitojsingh2022 in #962
- [BFCL] Fix API Keys Handling by @catherineruoxiwu in #959
- [BFCL] Fix wrong date in live_simple_205-116-13 by @amitojsingh2022 in #963
- [BFCL] Moved Ground Truths for Executable Tests to `./data/possible_answer` Folder by @catherineruoxiwu in #953
- [BFCL] Reorganizing Codes in `./bfcl/eval_checker/executable_eval/data/` by @catherineruoxiwu in #954
- [BFCL] Add `gemini-2.5-pro` to the Leaderboard by @catherineruoxiwu in #974
- [BFCL] Update Retry Logic for Gemini Models by @HuanzhiMao in #976
- [BFCL] Fix Typo in `multi_turn_base_166` Ground Truth. by @HuanzhiMao in #979
- Add Salesforce xLAM-2 series of model handlers and update vLLM version from 0.6.3 to 0.6.5 by @zuxin666 in #972
- [BFCL] Retire Executable Categories from Leaderboard by @HuanzhiMao in #943
- feat. Add Novita LLM Models API by @novita-viktor in #980
- [BFCL] Add New Models `Llama-4-Scout`, `Llama-4-Maverick` by @HuanzhiMao in #981
- [BFCL] Add Support for Fully Offline Model Inference via `--local-model-path` by @catherineruoxiwu in #985
- Fix Typo in Model Name for `xLAM-2-8b-fc-r` by @HuanzhiMao in #992
- Add ThinkAgents/ThinkAgent-1B by @0xayman in #928
- [BFCL] Add Grok 3 Models to the Leaderboard by @catherineruoxiwu in #987
- [BFCL] Add mistral-large-2411 and mistral-small-2503 by @pracheeti12 in #988
- Add xiaoming-14B by @kevin2016 in #977
- [BFCL] Retire Outdated Models from the Leaderboard by @catherineruoxiwu in #997
- [BFCL] add support for microsoft/Phi-4-mini-instruct by @RobotSail in #967
- [BFCL] Add `microsoft/phi-4` to the Leaderboard by @catherineruoxiwu in #1000
- [BFCL] Add GPT 4.1 Series Models to the Leaderboard. by @catherineruoxiwu in #1002
- Bump `writer-sdk` Dependency Version by @HuanzhiMao in #1006
- [BFCL] add model config by @itea1001 in #999
- [BFCL] Add Validation for Model Names by @catherineruoxiwu in #1008
- [BFCL] Update Error Message for New Handler Mappings by @catherineruoxiwu in #1013
- [BFCL] fix entry id typo in `live_multiple_1052-279-0` by @itea1001 in #1022
- Update QwQ-32b api by @CostaliyA in #1014
- Migrate to correct testing API by @emmanuel-ferdman in #1029
- Add gemini-2.5-pro-preview-05-06 Models by @Guangyu-Joshua-Feng in #1031
- [BFCL] Add Qwen 3 Series Models to the Leaderboard by @catherineruoxiwu in #1015
- [BFCL] Remove latency data for open source models by @errorfourten in #1033
- fix treesitter setup by @CharlieJCJ in #1045
- [BFCL] Added support for Mistral Medium 3 by @errorfourten in #1040
- New colab links for gorilla hosted and openfunctions hosted by @ShishirPatil in #1036
- [BFCL] Add `version` to `bfcl` CLI by @ShishirPatil in #1038
- Add DM-Cito-8B by @kevin2016 in #1017
- fix: loosen openai requirements to be >= 1.76.0 by @TheFloatingString in #1050
- [BFCL] Packagerize for PyPI Distribution by @HuanzhiMao in #1054
- [BFCL] CI: Add “Publish to PyPI” workflow with CalVer-serial auto-versioning by @HuanzhiMao in #1055
- [BFCL] Replace Exception with SyntaxError for Java and JavaScript Parsers by @TheFloatingString in #1057
- [BFCL] Support DashScope API Inference for `Qwen3` Series by @HuanzhiMao in #1061
- [BFCL] Add type hinting by @TheFloatingString in #1058
- [BFCL] Added support for DeepSeek-R1-0528 and DeepSeek-V3-0324 by @errorfourten in #1063
- [BFCL] Add support for Ling-Lite-V1.5 by @fengzhu1 in #1056
- [BFCL] Omit Reasoning Content from Chat History for Function-Calling Models by @HuanzhiMao in #1064
- Add support for llama-3.1-nemotron-ultra-253b-v1 to BFCL by @AdityaGhai18 in #1032
- _get_item() can not handle the "." directory in path string by @YJ3329 in #1060
- [BFCL] Multi-turn TravelAPI book_flight() Fix by @amitojsingh2022 in #966
- [BFCL] Fix prompt concatenation bug in Qwen template by @nehcgs in #1068
- Add Qwen handler by @zhangyingerjelly in #1072
- Add traceback logging to json outputs by @imradawoodani in #1074
- [BFCL] Fix is_fc_model config propagation by @HuanzhiMao in #1082
- feature: Added RZN-T to the suite of models. by @KevinDayve in #1079
- Fix typo in month parameter (Febuary -> February) by @Gnav3852 in #1084
- Update irrelevance_232 question by @Gnav3852 in #1085
- [BFCL] Fixed missing airport route entries by @amitojsingh2022 in #1087
- [BFCL] Resolve duplicated 'live-relevance_3-3-0' test entry id by @gumgizoa in #1086
- Added support for Claude 4 family models to BFCL by @Swordscore in #1034
- Add DM-Cito-8B-v2 by @kevin2016 in #1088
- update ground truth for multi turn base by @Daniel-Mash in #956
- Restrict GitHub Actions Workflow to Run Only on Source Repository by @HuanzhiMao in #1089
- [BFCL] Add support for Arch-Agent by @nehcgs in #1078
- [BFCL] Introduce OpenAI Responses API Handler + o4-mini/o3 models by @errorfourten in #1062
- nit(docs): Improve README Clarity on Sample Filename by @HuanzhiMao in #1093
- [BFCL] Add support for granite-3.1-8b-instruct and granite-3.2-8b-instruct by @RobotSail in #1041
- [BFCL] Replace `system` role with `developer` role for OpenAI models by @errorfourten in #1090
- [BFCL] nit: Print traceback on generation error by @HuanzhiMao in #1100
- [BFCL] Migrate Gemini Inference to Google AI Studio by @HuanzhiMao in #1099
- [BFCL] Update Gemini model checkpoints to stable 2.5 releases by @HuanzhiMao in #1102
- [BFCL] Reintroduce latency stats for local models, update cost calculation by @Gnav3852 in #1098
- BitAgent Bounty Model Submission by @VectorForger in #1096
- [BFCL] Contact Customer Support Multi Turn & Vehicle Control #914 by @amitojsingh2022 in #1110
New Contributors
- @pkesseli made their first contribution in #865
- @ZydHaha made their first contribution in #858
- @SumanthRH made their first contribution in #888
- @dominikabasaj made their first contribution in #887
- @canyon289 made their first contribution in #895
- @kirill-fedyanin made their first contribution in #894
- @jgreer013 made their first contribution in #900
- @alt-glitch made their first contribution in #906
- @VectorForger made their first contribution in #917
- @liamcli made their first contribution in #924
- @fvisin made their first contribution in #935
- @catherineruoxiwu made their first contribution in #944
- @JasonHuang1103 made their first contribution in #961
- @hrshtv made their first contribution in #960
- @novita-viktor made their first contribution in #980
- @0xayman made their first contribution in #928
- @pracheeti12 made their first contribution in #988
- @kevin2016 made their first contribution in #977
- @RobotSail made their first contribution in #967
- @itea1001 made their first contribution in #999
- @CostaliyA made their first contribution in #1014
- @emmanuel-ferdman made their first contribution in #1029
- @Guangyu-Joshua-Feng made their first contribution in #1031
- @errorfourten made their first contribution in #1033
- @TheFloatingString made their first contribution in #1050
- @fengzhu1 made their first contribution in #1056
- @AdityaGhai18 made their first contribution in #1032
- @YJ3329 made their first contribution in #1060
- @nehcgs made their first contribution in #1068
- @zhangyingerjelly made their first contribution in #1072
- @imradawoodani made their first contribution in #1074
- @KevinDayve made their first contribution in #1079
- @Gnav3852 made their first contribution in #1084
- @gumgizoa made their first contribution in #1086
- @Swordscore made their first contribution in #1034
- @Daniel-Mash made their first contribution in #956
Full Changelog: v1.2...v1.3