Releases: ServiceNow/BrowserGym

v0.14.2

05 Aug 18:27
b9e6894

Full Changelog: v0.14.1...v0.14.2

v0.14.1: Miniwob zoom (#351)

16 Jun 16:44
9e4e07c

What's Changed

  • Some changes in preparation for agentlab's new ToolUseAgent and the new APIs by @TLSDC in #340
  • Miniwob zoom by @recursix in #351

Full Changelog: v0.13.4...v0.14.1

v0.13.3: minor fixes

27 Nov 19:56

What's Changed

browsergym-core

  • Optional method AbstractBrowserTask.get_task_id() #281
  • Fixed BrowserEnv parameter resizeable_window, now working as expected #281

browsergym-experiments

  • Metadata column fix for visualwebarena #278

Full Changelog: v0.13.2...v0.13.3

v0.13.2: experiments updates

21 Nov 20:10

What's Changed

browsergym-experiments

  • Experiment traces can now be exported into the TapeAgents format #238
  • Installs weblinx_browsergym as a dependency #261
  • WA/VWA full instance reset now issues a warning instead of crashing when not properly set up #272
  • New debug benchmark visualwebarena_tiny #271

Full Changelog: v0.13.1...v0.13.2

v0.13.1: Many small fixes

15 Nov 18:26

What's Changed

browsergym-core

  • Fixed gym warnings "obs not within observation space" #251
  • Trace downgrades from INFO to DEBUG #252
  • More robust env.close(), can now be used in a finally block even after reset failure #253
  • Optional AbstractBrowserTask.teardown() method #255
  • BrowserGym's register_task() now supports both frozen, non-overridable task_kwargs and overridable default_task_kwargs arguments #255
  • More robust frame marking #256 #258
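
The distinction between frozen task_kwargs and overridable default_task_kwargs can be illustrated with a minimal, standalone sketch. It does not call BrowserGym; resolve_task_kwargs() and the example keys (task_id, max_steps) are hypothetical names used only to show one plausible way such arguments could be merged, and the actual behavior of register_task() may differ.

```python
def resolve_task_kwargs(frozen, defaults, overrides):
    """Merge task arguments (hypothetical helper, not part of BrowserGym).

    frozen:    values fixed at registration time; callers may not change them
    defaults:  values set at registration time that callers may replace
    overrides: values supplied by the caller when creating the environment
    """
    # Refuse any attempt to change a frozen argument.
    clashes = sorted(set(frozen) & set(overrides))
    if clashes:
        raise ValueError(f"cannot override frozen task kwargs: {clashes}")
    # Caller overrides win over defaults; frozen values are added last.
    return {**defaults, **overrides, **frozen}


resolved = resolve_task_kwargs(
    frozen={"task_id": 3},        # assumed example key
    defaults={"max_steps": 10},   # assumed example key
    overrides={"max_steps": 30},
)
print(resolved)
```

In this sketch, passing overrides={"task_id": 7} would raise a ValueError, which is the intent behind calling the first kind of argument "frozen".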

browsergym-assistantbench

  • Refactored AssistantBench mechanism for saving test predictions to JSON files #242

browsergym-webarena

  • Relaxed playwright<1.40 restriction #257

browsergym-visualwebarena

  • Relaxed playwright<1.40 restriction #257

Full Changelog: v0.13.0...v0.13.1

v0.13.0: Minor updates

07 Nov 20:35

What's Changed

browsergym-core

  • More robust frame marking with lenient last try #245
  • Tasks can now choose their own locale and timezone_id #244
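
As a rough illustration of per-task locale and timezone settings, the sketch below collects a task's preferences into keyword arguments. Playwright's browser.new_context() accepts locale and timezone_id parameters, but how BrowserGym forwards a task's choices is not described in these notes, so TaskBrowserPrefs and context_kwargs() are hypothetical names and the wiring shown is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskBrowserPrefs:
    """Hypothetical container for a task's browser preferences."""
    locale: Optional[str] = None
    timezone_id: Optional[str] = None


def context_kwargs(prefs):
    """Keep only the preferences the task actually set (hypothetical helper)."""
    candidates = {"locale": prefs.locale, "timezone_id": prefs.timezone_id}
    return {key: value for key, value in candidates.items() if value is not None}


# Values taken from the browsergym-assistantbench entry in this release.
prefs = TaskBrowserPrefs(locale="en-US", timezone_id="America/New_York")
kwargs = context_kwargs(prefs)
print(kwargs)
```

A dictionary built this way could then be unpacked into a browser context constructor, leaving the browser defaults in place for any setting the task did not choose.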

browsergym-experiments

  • Pre-download WebLINX data in prepare_backend() #226
  • Increase AssistantBench max_steps to 30 #244
  • Add select_option to webarena / visualwebarena default action set #247

browsergym-visualwebarena

  • Hide huggingface progress bar when downloading the visual evaluation model #241

browsergym-assistantbench

  • Set locale="en-US" and timezone_id="America/New_York"

Full Changelog: v0.12.0...v0.13.0

v0.12.0: VisualWebarena / WebLINX bugfixes

04 Nov 19:44

Bugfixes

browsergym-experiments

  • Fixes WebLINX task list #235
  • Refactors experiment ID generation #236
  • Adds VisualWebArena task dependencies #237 #239

browsergym-visualwebarena

  • Fixes VisualWebArena tasks with visual validation (missing captioning_fn in evaluator) #240
  • Adds a torch dependency (to run the captioning model) #240

Full Changelog: v0.11.3...v0.12.0

v0.11.3: Minor fixes

01 Nov 15:30

Bugfixes

  • Fix duplicate depends_on in webarena metadata #228

Improvements

  • Easier webarena / visualwebarena setup (nltk.download() now runs at import time) #227
  • More robust full_reset() for webarena / visualwebarena #230
  • Removed ARIA extraction warnings #233
  • New benchmark configuration webarena_tiny #232

Full Changelog: v0.11.2...v0.11.3

v0.11.2: Minor fix

30 Oct 20:25

Bugfixes

  • Add incomplete ExpResult.status #225

Full Changelog: v0.11.1...v0.11.2

v0.11.1: Benchmark update

30 Oct 19:29

New features

  • Set max steps to 30 in webarena / visualwebarena benchmarks #214
  • Benchmark dependency graph utilities #220
  • Include nltk.download() in prepare_backend() for webarena / visualwebarena benchmarks #224

Bugfixes

  • Rename benchmark after subset_from_split() #221
  • ExpArgs.exp_dir sanitization #222
  • get_step_info() bugfix #223

Full Changelog: v0.11.0...v0.11.1