Pause / Resume Crawls Initial Implementation. #2572

Merged
ikreymer merged 15 commits into main from pause-resume
May 21, 2025

ikreymer
Member

@ikreymer ikreymer commented Apr 29, 2025

Work on #2566
Fixes #2576

Todo:

  • How to handle crawler versions that don't auto-upload WACZ / don't respond to pause message

@ikreymer ikreymer marked this pull request as draft April 29, 2025 19:18
@ikreymer ikreymer force-pushed the pause-resume branch 3 times, most recently from dc2d462 to 1259990 Compare May 1, 2025 01:53
@ikreymer ikreymer marked this pull request as ready for review May 1, 2025 18:48
@ikreymer ikreymer requested review from tw4l and SuaYoo May 5, 2025 18:23
@ikreymer ikreymer requested a review from SuaYoo May 13, 2025 22:58
ikreymer and others added 10 commits May 21, 2025 08:59
- add 'pause' crawl state
- turns off crawler pods, and then redis pod when paused
- add 'paused' on crawl spec to indicate when crawl is paused
- /crawl/<id>/{un}pause apis to toggle 'paused' on crawl spec
- ui: add pause/resume button, paused state
- ui: add pausing/unpausing derivative states when crawl is running and pausing, or paused and not pausing
- Hide "Pause" button if it's not relevant, instead of disabling without a displayed reason
- Make "Resume" button primary
- Use circular icons to match other status icons
- Show toast message on successful pause/unpause
- set <crawlid>:paused key when a crawl is paused and at least one crawler pod exists
- clear <crawlid>:paused when crawl is paused and no more pods running
ensure flag is cleared before redis is shut down; it is already cleared when a crawl is unpaused
- stop crawls that have been paused for too long
- add 'paused_crawl_limit_minutes' to Helm chart
- add paused time and expiry to crawlconfig API response
- set to 'stopped_pause_expired' state
- ui: add support for 'Stopped: Paused Too Long' for stopped_pause_expired
- use 'paused_at' in CrawlJob to indicate crawl is paused and when
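
The expiry rule in the commits above ('paused_at' plus 'paused_crawl_limit_minutes' leading to 'stopped_pause_expired') can be sketched roughly as follows. This is a minimal illustration, not the operator code from this PR: the function names `paused_expiry` and `next_state` are hypothetical, and it assumes 'paused_at' is a timezone-aware timestamp that is unset when the crawl is not paused.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def paused_expiry(paused_at: Optional[datetime], limit_minutes: int) -> Optional[datetime]:
    """Return the time at which a paused crawl should be stopped, or None if not paused."""
    if paused_at is None:
        return None
    return paused_at + timedelta(minutes=limit_minutes)


def next_state(
    current_state: str,
    paused_at: Optional[datetime],
    limit_minutes: int,
    now: datetime,
) -> str:
    """Switch to 'stopped_pause_expired' once the pause limit has elapsed."""
    expiry = paused_expiry(paused_at, limit_minutes)
    if expiry is not None and now >= expiry:
        return "stopped_pause_expired"
    return current_state


# Hypothetical usage: a crawl paused at 12:00 UTC with a 60 minute limit.
paused_at = datetime(2025, 5, 21, 12, 0, tzinfo=timezone.utc)
print(paused_expiry(paused_at, 60))
print(next_state("paused", paused_at, 60, paused_at + timedelta(minutes=61)))
```

Returning the expiry separately from the state decision mirrors the commit note about adding both the paused time and the expiry to the crawlconfig API response.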
…vious versions of crawler:

- set :stopping key to true
- when crawler pod exits, immediately reset done -> interrupted
- clear :stopping key when all crawler pods have exited OR crawl is no longer paused (to allow resume)
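
The three steps of this fallback can be simulated in a few lines. This is only a sketch under stated assumptions: a plain dict stands in for Redis, `CrawlSim` and `sync_pause_fallback` are hypothetical names, and the pod state strings other than 'done' and 'interrupted' are invented for illustration. Per the later comment on this PR, the merged version went back to the :paused key, so this shows the intermediate approach only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CrawlSim:
    """In-memory stand-in for a crawl (hypothetical, for illustration only)."""
    crawl_id: str
    paused: bool
    kv: Dict[str, str] = field(default_factory=dict)          # stands in for Redis
    pod_states: Dict[str, str] = field(default_factory=dict)  # pod name -> state


def sync_pause_fallback(crawl: CrawlSim) -> None:
    key = f"{crawl.crawl_id}:stopping"
    if crawl.paused:
        # Step 1: ask the crawler pods to stop via the :stopping key.
        crawl.kv[key] = "true"
        # Step 2: a pod that exited reports 'done'; reset it to
        # 'interrupted' so the crawl can pick up again on resume.
        for name, state in crawl.pod_states.items():
            if state == "done":
                crawl.pod_states[name] = "interrupted"
    # Step 3: clear the key once all pods have exited, or as soon as
    # the crawl is no longer paused (to allow resume).
    all_exited = all(state != "running" for state in crawl.pod_states.values())
    if all_exited or not crawl.paused:
        crawl.kv.pop(key, None)


crawl = CrawlSim("crawl-1", paused=True, pod_states={"a": "running", "b": "done"})
sync_pause_fallback(crawl)
print(crawl.kv, crawl.pod_states)
```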
…se, except actual derived 'pausing' state in frontend

ensure resources are available for paused state
ikreymer added 2 commits May 21, 2025 12:09
…eted' after being

paused, goes to running state update
- set state to 'pause' only when all crawler pods are paused
- log 'pausing' on first change, don't recheck stopping if already set
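
A rough sketch of the state rule described in these commits: the backend moves to the paused state only once every crawler pod reports paused, while the transitional 'pausing' label is derived separately (in the frontend, per the commits above). The function names, the per-pod boolean map, and the exact state strings are assumptions for illustration, not the code from this PR.

```python
from typing import Mapping

PAUSED = "paused"  # assumed state string; the commit text writes 'pause'


def backend_state(current: str, pause_requested: bool, pod_paused: Mapping[str, bool]) -> str:
    """Move to the paused state only once every crawler pod reports paused."""
    if pause_requested and pod_paused and all(pod_paused.values()):
        return PAUSED
    return current


def display_state(state: str, pause_requested: bool) -> str:
    """Derive the transitional label from the stored state and the pause flag."""
    if state == "running" and pause_requested:
        return "pausing"
    if state == PAUSED and not pause_requested:
        return "unpausing"
    return state


# One pod has not paused yet, so the crawl stays 'running' and shows as 'pausing'.
state = backend_state("running", True, {"crawl-0": True, "crawl-1": False})
print(state, display_state(state, True))
```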
@ikreymer
Member Author

Had to do some additional refactoring: switched back to using the :paused key and supporting only newer crawler versions, since otherwise duplicate data is uploaded. This also cleans up the implementation a bit.

@ikreymer ikreymer merged commit cb50c7c into main May 21, 2025
26 of 27 checks passed
@ikreymer ikreymer deleted the pause-resume branch May 21, 2025 21:05
Development

Successfully merging this pull request may close these issues.

  • Stop paused crawl after time limit
  • Add pause and resume buttons to workflow
  • Add paused state to workflow