Skip to content

Pause / Resume Crawls Initial Implmentation. #2572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

ikreymer
Copy link
Member

@ikreymer ikreymer commented Apr 29, 2025

Work on #2566
Fixes #2576

Todo:

  • How to handle crawler versions that don't auto-upload WACZ / don't respond to pause message

@ikreymer ikreymer marked this pull request as draft April 29, 2025 19:18
@ikreymer ikreymer force-pushed the pause-resume branch 3 times, most recently from dc2d462 to 1259990 Compare May 1, 2025 01:53
@ikreymer ikreymer marked this pull request as ready for review May 1, 2025 18:48
@ikreymer ikreymer requested review from tw4l and SuaYoo May 5, 2025 18:23
Comment on lines 624 to 576
const disablePause =
this.workflow.lastCrawlPausing ===
(this.workflow.lastCrawlState === "running");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just be checking lastCrawlPausing?

Suggested change
const disablePause =
this.workflow.lastCrawlPausing ===
(this.workflow.lastCrawlState === "running");
const disablePause = this.workflow.lastCrawlPausing;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was that the button is both Pause / Resume, and possible states are:

  • pausing and running (disable, already in process of Pausing)
  • pausing and not running (don't disable, it should show Resume)
  • not pausing and not running (disable, already in process of Resuming)
  • not pausing and running (don't disable, it should show Pause)

Had to double check that, perhaps a bit more confusing then it needs to be.. maybe should rename it to disablePauseResume to be more clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lastCrawlPausing / pausing reflects if user requested crawl to be paused/not paused, while crawl state is the actual state the crawl is in, so we enable the button when they are different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a variable rename would work, hidePauseResume and disablePauseResume.

Is there any need to add and check for lastCrawlResuming?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a variable rename would work, hidePauseResume and disablePauseResume.

Is there any need to add and check for lastCrawlResuming?

Resuming is the when lastCrawlPausing is set to false. I suppose it could be renamed too - if true, it means the user wants the crawl paused, if false, it means the user wants the crawl running/resumed (the default).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to hidePauseResume and disablePauseResume. Also did a pass to replace instance of 'unpause' with 'resume', including in the API, to be consistent.

@ikreymer ikreymer requested a review from SuaYoo May 13, 2025 22:58
ikreymer and others added 9 commits May 14, 2025 10:24
- add 'pause' crawl state
- turns off crawler pods, and then redis pod when paused
- add 'paused' on crawl spec to indicate when crawl is paused
- /crawl/<id>/{un}pause apis to toggle 'paused' on crawl spec
- ui: add pause/resume button, paused state
- ui: add pausing/unpausing derivative states when crawl is running and pausing, or paused and not pausing
- Hide "Pause" button if it's not relevant, instead of disabling without
a displayed reason
- Make "Resume" button primary
- Use circular icons to match other status icons
- Show toast message on successful pause/unpause
- set <crawlid>:paused key when a crawl is paused and at least one crawler pod exists
- clear <crawlid>:paused when crawl is paused and more pods running
ensure flag is cleared before redis is shutdown, already cleared when a crawl is unpaused
- stop crawls that have been paused for too long
- add 'paused_crawl_limit_minutes' to Helm chart
- add paused time and expiry to crawlconfig API response
- set to 'stopped_pause_expired' state
- ui: add support for 'Stopped: Paused Too Long' for stopped_pause_expired
- use 'paused_at' in CrawlJob to indicate crawl is paused and when
…vious versions of crawler:

- set :stopping key to true
- when crawler pod exits, immediately reset done -> interrupted
- clear :stopping key when all crawler pods have exited OR crawl not longer paused (to allow resume)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop paused crawl after time limit Add pause and resume buttons to workflow Add paused state to workflow
3 participants