
8360288: Shenandoah crash at size_given_klass in op_degenerated #26256

Open (4 commits to merge into base: master)

earthling-amzn
Contributor

@earthling-amzn earthling-amzn commented Jul 10, 2025

Both degenerated and full GCs unload classes before reclaiming unmarked humongous objects. This may result in a null klass pointer dereference when reclaiming unmarked humongous objects. Prior to this change, the number of regions occupied by a humongous object was computed from the size of the object. To avoid using oop::size after class unloading on an unmarked object, Shenandoah now trashes the humongous start region followed by subsequent continuation regions.
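
The approach described above can be sketched roughly as follows. This is a minimal, self-contained model, not the HotSpot code: `RegionState`, `MockHeap`, and `trash_humongous_at` are hypothetical names invented for illustration. The point it shows is that the reclaim loop consults only region states and never reads the (possibly unloaded) class of the dead object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified region model; not the real ShenandoahHeapRegion.
enum class RegionState { Regular, HumongousStart, HumongousCont, Trash };

struct MockHeap {
  std::vector<RegionState> regions;

  // Trash the humongous object whose first region is `start`.
  // Only region states are inspected; the object's size is never queried.
  // Returns the number of regions that were trashed.
  std::size_t trash_humongous_at(std::size_t start) {
    assert(start < regions.size());
    assert(regions[start] == RegionState::HumongousStart);

    // Trash the start region first.
    regions[start] = RegionState::Trash;
    std::size_t trashed = 1;

    // Then trash continuation regions until the chain ends.
    std::size_t idx = start + 1;
    while (idx < regions.size() && regions[idx] == RegionState::HumongousCont) {
      regions[idx] = RegionState::Trash;
      ++trashed;
      ++idx;
    }
    return trashed;
  }
};
```

The chain ends at the first region that is not a continuation, so an adjacent humongous object (which begins with its own start region) is left untouched.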


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8360288: Shenandoah crash at size_given_klass in op_degenerated (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26256/head:pull/26256
$ git checkout pull/26256

Update a local copy of the PR:
$ git checkout pull/26256
$ git pull https://git.openjdk.org/jdk.git pull/26256/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26256

View PR using the GUI difftool:
$ git pr show -t 26256

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26256.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jul 10, 2025

👋 Welcome back wkemper! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 10, 2025

@earthling-amzn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8360288: Shenandoah crash at size_given_klass in op_degenerated

Reviewed-by: shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 80 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 10, 2025
@openjdk

openjdk bot commented Jul 10, 2025

@earthling-amzn The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Jul 10, 2025

Webrevs

Member

@shipilev shipilev left a comment

Yeah, this evidently does not work, see test failures.

Architecturally, the heuristics should be looking only at region data, without looking at objects. I see we often end up calling ShenandoahHeapRegion::required_regions(obj->size(), ...) just to figure out how many HC regions there are in the chain. But we might as well scan regions starting from the given HS region until we run out of HC regions.
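
To make the contrast concrete, here is a small sketch of the two ways to obtain the length of a humongous chain. All names (`RegionType`, `regions_from_size`, `regions_from_chain`) are hypothetical, and the ceiling division is only an assumption about what a size-based calculation looks like, not a copy of `ShenandoahHeapRegion::required_regions`. The size-based variant needs a readable object; the region-based variant needs only region metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region model for illustration only.
enum class RegionType { Regular, HumongousStart, HumongousCont };

// Size-based count: ceiling division of the object size by the region size.
// In a real collector the size would come from the object itself, which is
// unsafe once the object's class may have been unloaded.
constexpr std::size_t regions_from_size(std::size_t object_bytes,
                                        std::size_t region_bytes) {
  return (object_bytes + region_bytes - 1) / region_bytes;
}

// Region-based count: start at the HS region and count HC regions until
// the chain ends. No object is touched.
std::size_t regions_from_chain(const std::vector<RegionType>& regions,
                               std::size_t start) {
  assert(start < regions.size());
  assert(regions[start] == RegionType::HumongousStart);
  std::size_t count = 1;
  for (std::size_t idx = start + 1;
       idx < regions.size() && regions[idx] == RegionType::HumongousCont;
       ++idx) {
    ++count;
  }
  return count;
}
```

For a live object both calculations should agree; only the second remains usable when the object header can no longer be trusted.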

Member

@shipilev shipilev left a comment

Three things:

  1. The bug synopsis should generally reflect what is being done, not what the symptom is. There is some leeway: it can describe the problem that is being solved.

  2. See the comment, "Reclaim from tail". Have you verified reclaiming from head is fine? If not, I think it is better to find the tail first, then walk it backwards. In fact, maybe it is a good time to introduce a utility method in ShenandoahHeapRegion that returns the tail of the HC chain. There is already ShenandoahHeapRegion::humongous_start_region, which could have a symmetric ShenandoahHeapRegion::humongous_end_region.

  3. Generally, it looks brittle to touch objects during any heuristics calculation due to these lifecycle problems. There is another instance of ShenandoahHeapRegion::required_regions(obj->size() * HeapWordSize); in ShenandoahGenerationalHeuristics::choose_collection_set -- is it affected by the same issue?
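
The alternative raised in item 2 (find the tail first, then walk backwards) could look roughly like the sketch below. It is a stand-alone model: `RegionKind`, `humongous_end_index`, and `trash_from_tail` are hypothetical names, and `humongous_end_index` only imitates what a `humongous_end_region` helper might do; no such method is confirmed to exist in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region model for illustration only.
enum class RegionKind { Regular, HumongousStart, HumongousCont, Trash };

// Index of the last region in the humongous chain that begins at `start`.
std::size_t humongous_end_index(const std::vector<RegionKind>& regions,
                                std::size_t start) {
  assert(start < regions.size());
  assert(regions[start] == RegionKind::HumongousStart);
  std::size_t end = start;
  while (end + 1 < regions.size() &&
         regions[end + 1] == RegionKind::HumongousCont) {
    ++end;
  }
  return end;
}

// Reclaim from the tail: the last continuation region is trashed first and
// the start region last. Returns the indices in the order they were trashed.
std::vector<std::size_t> trash_from_tail(std::vector<RegionKind>& regions,
                                         std::size_t start) {
  std::vector<std::size_t> order;
  const std::size_t end = humongous_end_index(regions, start);
  // Counts down from `end` to `start` inclusive without unsigned underflow
  // inside the loop body.
  for (std::size_t i = end + 1; i-- > start;) {
    regions[i] = RegionKind::Trash;
    order.push_back(i);
  }
  return order;
}
```

Walking backwards keeps the start region intact until every continuation region has been processed, which is the property the "Reclaim from tail" comment appears to be protecting.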

@earthling-amzn
Contributor Author

Thanks for the review.

  1. I will update the synopsis when we settle on the approach to solve this issue. To be clear, by "synopsis" you mean the description of the pull request?
  2. Yes, I looked at all usages of ShenandoahHeapRegion::is_humongous* and didn't see any cases that would be affected by this change. All of the GHA and pipeline tests passed. The comment is over 7 years old. Also, this code to make immediate trash of unmarked humongous objects only runs on a safepoint (final-mark, degen or full-gc). I will, however, run the jtreg tests again with trace logging enabled out of an abundance of caution.
  3. Yes, I agree, the code is brittle. For reasons lost to history, STW collections (degen and full-gc) perform weak roots, weak refs and class unloading before choosing the collection set (the reverse of the order used by concurrent GCs). My initial attempt to put these steps in the same order for STW collections met with... resistance. Many assertions failed for what look like erroneous reasons. The other instance of ShenandoahHeapRegion::required_regions(obj->size() * HeapWordSize); is safe here because this is a marked humongous object (and should not have its class unloaded). I can revisit my initial approach, but I'm not sure I want to open that can of worms. I'd rather spend the time on removing STW collections.

Member

@shipilev shipilev left a comment

To be clear, by "synopsis" you mean the description of the pull request?

Yes.

All right then, improve the synopsis and then we are good to go.

}
return required_regions;
regions_trashed++;
region = get_region(region->index() + 1);
Member

Micro-optimization opportunity: track index as a local variable without introducing memory dependency on another region. Would likely pipeline a bit better.
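
A sketch of what this suggestion amounts to, using hypothetical stand-ins (`FakeRegion`, `FakeHeap`, and both `trash_via_*` functions are invented for illustration; only the `get_region(region->index() + 1)` shape is taken from the diff above). Variant A derives the next index by loading it from the current region; variant B keeps the index in a local variable, so the next lookup does not depend on a load from the previous region. Whether this actually pipelines better is the reviewer's expectation, not something the sketch measures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a heap region and the heap; not HotSpot types.
struct FakeRegion {
  std::size_t idx;
  bool humongous_cont;
  bool trashed;
  std::size_t index() const { return idx; }
};

struct FakeHeap {
  std::vector<FakeRegion> regions;

  // Returns nullptr when `i` is past the end of the heap.
  FakeRegion* get_region(std::size_t i) {
    return i < regions.size() ? &regions[i] : nullptr;
  }

  // Variant A: the next index is read back from the current region.
  std::size_t trash_via_region_index(std::size_t start) {
    FakeRegion* region = get_region(start);
    assert(region != nullptr);
    std::size_t trashed = 0;
    do {
      region->trashed = true;
      ++trashed;
      region = get_region(region->index() + 1);
    } while (region != nullptr && region->humongous_cont);
    return trashed;
  }

  // Variant B: the index lives in a local variable and is simply incremented.
  std::size_t trash_via_local_index(std::size_t start) {
    std::size_t index = start;
    FakeRegion* region = get_region(index);
    assert(region != nullptr);
    std::size_t trashed = 0;
    do {
      region->trashed = true;
      ++trashed;
      region = get_region(++index);
    } while (region != nullptr && region->humongous_cont);
    return trashed;
  }
};
```

Both variants visit the same regions and return the same count; they differ only in where the next index comes from.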

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 15, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jul 15, 2025
@pengxiaolong

There is another instance of ShenandoahHeapRegion::required_regions(obj->size() * HeapWordSize); in ShenandoahGenerationalHeuristics::choose_collection_set -- is it affected by the same issue?

I came across the same code recently while updating choose_collection_set to support CAS for mutator allocation. Yesterday I was searching for and reviewing the places where ShenandoahHeapRegion::required_regions is used. There are 9 places where it is used, of which only the one in allocation is actually needed. I was about to create a JBS bug for the improvement.

@earthling-amzn
Contributor Author

The macOS test failure looks unrelated (the test is running G1).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 16, 2025
Labels

  • hotspot-gc
  • ready (Pull request is ready to be integrated)
  • rfr (Pull request is ready for review)
  • shenandoah

3 participants