Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: EXC-1992: Fix same round install code and draining queues #4346

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

berestovskyy
Copy link
Contributor

@berestovskyy berestovskyy commented Mar 12, 2025

The current scheduler logic permits advance_long_running_install_code to complete a lengthy install process and then immediately initiate a new one within drain_subnet_queues.

This PR uses the can_execute_subnet_msg function to differentiate between heavy subnet messages, which must respect the round limit, and light subnet messages, which can be executed at any time.

Fixes RUN-386

@github-actions github-actions bot added the fix label Mar 12, 2025
@berestovskyy berestovskyy force-pushed the andriy/exc-1992-fix-same-round-install-and-drain branch from d8ca6a8 to f8b75cc Compare March 12, 2025 19:23
@berestovskyy berestovskyy changed the title fix: EXC-1992: Fix same round advancing install code and draining queues fix: EXC-1992: Fix same round install code and draining queues Mar 12, 2025
@berestovskyy berestovskyy force-pushed the andriy/exc-1992-fix-same-round-install-and-drain branch 2 times, most recently from 4035f66 to ad85db5 Compare March 12, 2025 22:08
The current scheduler logic permits `advance_long_running_install_code`
to complete a lengthy install process and then immediately initiate
a new one within `drain_subnet_queues`.

This PR uses the `can_execute_subnet_msg` function to differentiate
between heavy subnet messages, which must respect the round limit,
and light subnet messages, which can be executed at any time.

Fixes RUN-386
@berestovskyy berestovskyy force-pushed the andriy/exc-1992-fix-same-round-install-and-drain branch from ad85db5 to b1a77b0 Compare March 13, 2025 05:17
@berestovskyy berestovskyy marked this pull request as ready for review March 13, 2025 09:06
@berestovskyy berestovskyy requested a review from a team as a code owner March 13, 2025 09:06
Copy link
Member

@dsarlis dsarlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since the issue is fixed. Both comments are non-blocking and can be followed up on later if there's agreement.

match method {
// Only one install code message allowed at a time.
Ic00Method::InstallCode | Ic00Method::InstallChunkedCode => {
!ongoing_long_install_code && !effective_canister_is_aborted
!instructions_reached && !ongoing_long_install_code && !effective_canister_is_aborted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit brittle to me. Let's imagine that we add a new method, is it clear enough that someone would think about whether they need to make it a special case or they can just put it in the catch all match below?

We have the ic00_permissions which seems like a potentially promising approach to encode some of these requirements better. For example by adding a "heavy_operation" (or consuming instructions flag), maybe also one about "effective_canister_is_aborted". Probably that would require a rename to the module as these are not really permissions then. But I see a benefit in encoding in one place these requirements which would also mean that we make it easier for people to add all restrictions related to management calls (whether it's who can call them, or if they consumed instructions).

Given that this would likely be a bigger refactoring and assuming we see value, we can follow up later, I do not consider it a blocker.


#[test]
#[cfg(not(all(target_arch = "aarch64", target_vendor = "apple")))]
fn yield_for_dirty_pages_copy_works_for_install_code() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the problem would manifest even in case of long install code due to many instructions used, followed up by another install code immediately. Should the tests be a bit more general to cover that instead of focusing on the case of too many dirty pages touched? Is my understanding wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants