
@ikreymer (Member) commented Apr 29, 2025

Work on #2566
Fixes #2576

Todo:

  • How to handle crawler versions that don't auto-upload WACZ / don't respond to pause message

@ikreymer marked this pull request as draft April 29, 2025 19:18
@ikreymer force-pushed the pause-resume branch 3 times, most recently from dc2d462 to 1259990, May 1, 2025 01:53
@ikreymer marked this pull request as ready for review May 1, 2025 18:48
@ikreymer requested reviews from SuaYoo and tw4l May 5, 2025 18:23
@ikreymer requested a review from SuaYoo May 13, 2025 22:58
ikreymer and others added 10 commits May 21, 2025 08:59
- add 'pause' crawl state
- turns off crawler pods, and then redis pod when paused
- add 'paused' on crawl spec to indicate when crawl is paused
- /crawl/<id>/{un}pause apis to toggle 'paused' on crawl spec
- ui: add pause/resume button, paused state
- ui: add derived 'pausing'/'unpausing' states, shown when the crawl is running but a pause has been requested, or paused but a resume has been requested
- Hide "Pause" button when it's not relevant, instead of disabling it without a displayed reason
- Make "Resume" button primary
- Use circular icons to match other status icons
- Show toast message on successful pause/unpause
- set <crawlid>:paused key when a crawl is paused and at least one crawler pod exists
- clear <crawlid>:paused when the crawl is paused and no more crawler pods are running, ensuring the flag is cleared before redis is shut down (it is already cleared when a crawl is unpaused)
- stop crawls that have been paused for too long
- add 'paused_crawl_limit_minutes' to Helm chart
- add paused time and expiry to crawlconfig API response
- set to 'stopped_pause_expired' state
- ui: add support for 'Stopped: Paused Too Long' for stopped_pause_expired
- use 'paused_at' in CrawlJob to indicate crawl is paused and when
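The commits above describe pausing as a timestamp on the crawl job ('paused_at'), a configurable limit ('paused_crawl_limit_minutes'), a 'stopped_pause_expired' state, and derived 'pausing'/'unpausing' UI states. The following is a minimal Python sketch of that logic under stated assumptions, not the actual implementation: the `CrawlSpec` class, the function names, and the state strings 'running' and 'paused' are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CrawlSpec:
    # Hypothetical stand-in for the crawl spec / CrawlJob; only 'paused_at'
    # comes from the commit messages.
    paused_at: Optional[datetime] = None


def pause(spec: CrawlSpec, now: datetime) -> None:
    # A non-None 'paused_at' marks the crawl as paused and records when.
    # Repeated pause requests keep the original timestamp.
    if spec.paused_at is None:
        spec.paused_at = now


def unpause(spec: CrawlSpec) -> None:
    # Clearing the timestamp requests a resume.
    spec.paused_at = None


def pause_expires_at(spec: CrawlSpec, limit_minutes: int) -> Optional[datetime]:
    # Expiry time that could be reported next to the paused time.
    if spec.paused_at is None:
        return None
    return spec.paused_at + timedelta(minutes=limit_minutes)


def next_state(spec: CrawlSpec, state: str, now: datetime, limit_minutes: int) -> str:
    # Stop a crawl that has stayed paused past the configured limit.
    expires = pause_expires_at(spec, limit_minutes)
    if expires is not None and now >= expires:
        return "stopped_pause_expired"
    return state


def display_state(state: str, pause_requested: bool) -> str:
    # Derived UI states: a pause or resume was requested, but the backend
    # state has not caught up yet.
    if state == "running" and pause_requested:
        return "pausing"
    if state == "paused" and not pause_requested:
        return "unpausing"
    return state


if __name__ == "__main__":
    start = datetime(2025, 5, 21, tzinfo=timezone.utc)
    spec = CrawlSpec()
    pause(spec, start)
    later = start + timedelta(minutes=61)
    print(next_state(spec, "paused", later, limit_minutes=60))  # stopped_pause_expired
```

Keeping the pause as a single nullable timestamp means one field answers both "is it paused?" and "since when?", which is all the expiry check needs.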
…vious versions of crawler:

- set :stopping key to true
- when crawler pod exits, immediately reset done -> interrupted
- clear :stopping key when all crawler pods have exited OR the crawl is no longer paused (to allow resume)
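The fallback in this commit, for crawlers that don't respond to a pause message, can be sketched as follows. A plain dict stands in for Redis, the key name `<crawl_id>:stopping` is assumed to follow the same `<crawlid>:` prefix as the `:paused` key, and the function names are hypothetical. A later comment in this PR notes that this approach was replaced by the `:paused` key, supporting only newer crawler versions.

```python
from typing import Dict


def stopping_key(crawl_id: str) -> str:
    # Assumed key layout, mirroring '<crawlid>:paused'.
    return f"{crawl_id}:stopping"


def request_legacy_pause(store: Dict[str, str], crawl_id: str) -> None:
    # Older crawlers ignore the pause message, so ask them to stop instead.
    store[stopping_key(crawl_id)] = "1"


def status_on_exit(pod_status: str, paused: bool) -> str:
    # A pod that exits as 'done' only because of the pause-triggered stop
    # is recorded as 'interrupted', so the crawl can be resumed later.
    if paused and pod_status == "done":
        return "interrupted"
    return pod_status


def maybe_clear_stopping(
    store: Dict[str, str], crawl_id: str, pods_running: int, paused: bool
) -> None:
    # Clear the flag once every crawler pod has exited, or as soon as the
    # crawl is no longer paused, so a resumed crawl is not stopped again.
    if pods_running == 0 or not paused:
        store.pop(stopping_key(crawl_id), None)
```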
…se, except actual derived 'pausing' state in frontend

ensure resources are available for paused state
ikreymer added 2 commits May 21, 2025 12:09
…eted' after being paused, goes to running state update
- set state to 'pause' only when all crawler pods are paused
- log 'pausing' on first change, don't recheck stopping if already set
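The rule in this commit, that the crawl only enters the paused state once every crawler pod reports paused and that 'pausing' is logged once on the first change, might look roughly like this. The `PauseTracker` class, the per-pod status strings, and the 'paused' state name are illustrative assumptions, not the operator's actual types.

```python
import logging
from dataclasses import dataclass
from typing import List

logger = logging.getLogger("pause-sketch")


@dataclass
class PauseTracker:
    # Remembers whether 'pausing' was already logged for the current request.
    pausing_logged: bool = False

    def update(self, state: str, pause_requested: bool, pod_states: List[str]) -> str:
        if not pause_requested:
            # Reset so the next pause request is logged again.
            self.pausing_logged = False
            return state
        if not self.pausing_logged:
            logger.info("pausing")
            self.pausing_logged = True
        # all() is True for an empty list, so a crawl whose pods have
        # already been shut down stays paused.
        if all(s == "paused" for s in pod_states):
            return "paused"
        return state
```

Until the last pod reports paused, the crawl keeps its current state, which is what lets the UI show the derived 'pausing' state in the meantime.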
@ikreymer (Member, Author) commented:

Had to do some additional refactoring: switched back to using the :paused key and now support only newer crawler versions, since otherwise duplicate data is uploaded. This also cleans up the implementation a bit.
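The `<crawlid>:paused` key handling described in the earlier commits (set while the crawl is paused and crawler pods still exist; cleared once none remain, so it is gone before Redis shuts down; cleared on unpause) can be sketched as a single sync step. A dict stands in for Redis here; with redis-py the equivalent calls would be `set` and `delete`. The function name and the stored value "1" are assumptions.

```python
from typing import MutableMapping


def sync_paused_key(
    store: MutableMapping[str, str], crawl_id: str, paused: bool, crawler_pods: int
) -> None:
    key = f"{crawl_id}:paused"
    if paused and crawler_pods > 0:
        # Signal running (newer) crawlers to pause.
        store[key] = "1"
    else:
        # No crawler pods left, so Redis can be shut down next,
        # or the crawl was resumed.
        store.pop(key, None)
```

Calling this on every reconcile pass keeps the key idempotently in sync with the spec, rather than relying on one-off set/clear events.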

@ikreymer merged commit cb50c7c into main May 21, 2025
26 of 27 checks passed
@ikreymer deleted the pause-resume branch May 21, 2025 21:05

Successfully merging this pull request may close these issues:

- Stop paused crawl after time limit
- Add pause and resume buttons to workflow
- Add paused state to workflow

4 participants