General Retrospective for September and October 2024 Releases #54

adamfarley · 2024-08-01T14:33:47Z

Summary

A retrospective for all efforts surrounding the titular releases.

All community members are welcome to contribute to the agenda via comments below.

This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.

On the day of the meeting we'll review the agenda and add a list of actions at the end.

Invited: Everyone.

Time, Date, and URL

Time: 3-4pm UTC
Date: Monday the 18th of November
URL: https://meet.google.com/uwc-iwjn-rqm

Details

Retrospective Owner Tasks (in order):

Post retro URL in #Release around the start of the new release.
Wait until most builds are released, with no signs of a respin.
Announce the retrospective's date + time on #Release a week in advance.
Host the retrospective:
- Go through the agenda.
- Create a list of actions.
Process each action:
- Create a "WIP" issue including the source comment.
- Add the issue to the current iteration.
- Add an issue link to the action list.
Create a new retrospective issue for the next release.
Set a calendar reminder so you remember to do step 1 before the next release.
Close this issue.

TLDR

Add proposed agenda items as comments below.

andrew-m-leonard · 2024-09-10T18:49:05Z

build repo release branches don't have mandatory PR review, probably as settings regex does not match...?

andrew-m-leonard · 2024-09-10T18:50:33Z

build repo code freeze check for the release branch was not enabled, but then I thought, do we really need it, especially if we get the release branch mandatory review fixed?

andrew-m-leonard · 2024-09-10T19:09:08Z

Currently dryrun tags are the tag previous to the suspected actual GA tag, since it's not easy to "reset" the auto-trigger, maybe we ought to fix that...?

fyi, a bit naff!, but to do a trigger "reset" (since I had to do one for a failed dryrun trigger!)
As a Jenkins "Admin":

Go to "jenkins-worker" node "Script Console": https://ci.adoptium.net/computer/jenkins%2Dworker/script
Remove the workspace "tracking" file that contains the last triggered SCM:

println "rm /home/jenkins/workspace/build-scripts/utils/releaseTrigger_jdk23/workspace/tracking".execute().text

andrew-m-leonard · 2024-09-11T15:35:18Z

getTestDependency was failing on temurin-compliance due to no authentication: adoptium/aqa-tests#5589
This was failing in the July release as well, but failure of this stage does not fail the job.. which means we use the workspace cache, if we have one, and whatever maybe there!

smlambert · 2024-09-12T01:30:18Z

re: #54 (comment)

This was failing in the July release as well, but failure of this stage does not fail the job.. which means we use the workspace cache, if we have one

Do not think there is anything in the dependencies list that gets used by the TC jobs (but could affect if we are using TC Grinder to verify AQAvit tests, though most dependencies do not change often, so cached versions are fine).

andrew-m-leonard · 2024-09-12T10:47:52Z

TRSS needs new JDK versions adding before release week, release-openjdk23-pipeline was missing.

SL/Sept12 - now added

andrew-m-leonard · 2024-09-19T09:22:20Z

We should be more accurate with our release process terminology:
Publish updates to the containers to dockerhub
should be:
Publish docker images to dockerhub

sophia-guo · 2024-09-26T01:08:24Z

When doing the triage, the tap files of the grinder should be attached to the triage issue , for example adoptium/aqa-tests#5598. So the job https://ci.adoptium.net/view/Test_grinder/job/TAP_Collection can collect tap files of pipeline job and tap files of grinder.

sophia-guo · 2024-09-26T01:13:37Z

For trss if rerun job passes the corresponding test job status should be set as pass, so no need to do the extra triage. For example https://trss.adoptium.net/resultSummary?parentId=66e2f744d24e1b006e88e097 aarch64_mac, extended.openjdk rerun passed, the extended.openjdk should set as success.

@adamfarley says: This issue has been raised here.

sophia-guo · 2024-09-26T01:20:11Z

AQA triage, using the auto generated rerun links of rerun test job, which has already prepopulated either failed test targets or failed test cases. https://ci.adoptium.net/job/Test_openjdk23_hs_extended.openjdk_x86-64_windows_rerun/19/

smlambert · 2024-09-26T01:39:29Z

For trss if rerun job passes the corresponding test job status should be set as pass, so no need to do the extra triage. For example https://trss.adoptium.net/resultSummary?parentId=66e2f744d24e1b006e88e097 aarch64_mac, extended.openjdk rerun passed, the extended.openjdk should set as success.

Quick checks to make when triaging, look at the rerun.tap file on the Jenkins job, if its green, nothing to do.

We should also have a different chiclet icon for this "state" where rerun job passes. Suggest a yellow chiclet with a small green circle in top right corner for that state and so forth. Related issue: adoptium/aqa-test-tools#912

sophia-guo · 2024-10-02T19:44:49Z

There are almost no tests jobs were triggered by openjdk**-pipeline or evaluation-openjdk**-pipeline during September release ( i.e, ea build triggered nightly or weekly). As we set around 10 days before and 5 days after release as the no nightly tests job window. https://github.com/adoptium/ci-jenkins-pipelines/blob/master/pipelines/build/common/trigger_beta_build.groovy#L53-L79, which might be fine with January, March, July and September releases. May not be good for October and April releases.

Due to the scheduling of releases in September and October, as well as in March and April, there is a potential overlap that could result in gaps in testing. Specifically, with releases in March and September, followed closely by April and October, there may be minimal time available for comprehensive testing between those consecutive releases. As a result, critical tests may be rushed or omitted, impacting the stability of those releases. For example, reproducible comparing tests on linux are updated in Sep 6th and after that the test was only run once with jdk24 by Oct2.

andrew-m-leonard · 2024-10-03T10:58:32Z

There are almost no tests jobs were triggered by openjdk**-pipeline or evaluation-openjdk**-pipeline during September release ( i.e, ea build triggered nightly or weekly). As we set around 10 days before and 5 days after release as the no nightly tests job window. https://github.com/adoptium/ci-jenkins-pipelines/blob/master/pipelines/build/common/trigger_beta_build.groovy#L53-L79, which might be fine with January, March, July and September releases. May not be good for October and April releases.

Due to the scheduling of releases in September and October, as well as in March and April, there is a potential overlap that could result in gaps in testing. Specifically, with releases in March and September, followed closely by April and October, there may be minimal time available for comprehensive testing between those consecutive releases. As a result, critical tests may be rushed or omitted, impacting the stability of those releases. For example, reproducible comparing tests on linux are updated in Sep 6th and after that the test was only run once with jdk24 by Oct2.

To add some extra info, for example jdk-21.0.5+7 and +8 EA builds both landed during the Sept release "disabled test" period, jdk-21.0.5+6 EA was the last build run with tests prior to release, and jdk-21.0.5+9 after:

smlambert · 2024-10-17T14:19:57Z

October release

dynamic agents for x64Linux were unexpectedly in play (due to having ci.role.test on them which should not be the case, and also because they stay around for 1hr once spun up).
ppc64le_linux problematic machine needed to be taken offline (CURL_OPENSSL_3 not found), would benefit from turning on our "automatically take problem machines offline" feature in test pipelines, to avoid sending more jobs to problem machine)

andrew-m-leonard · 2024-10-21T13:56:50Z

October:
Care needs taking when publishing binaries to check if a platform was rebuilt, for example both jdk17 macAarch64 and jdk17 pLinux were rebuilt, but binaries were still present on the original pipeline. Mac was initially published from the wrong one.

Can we remove bad build artifacts? when we rebuild...

andrew-m-leonard · 2024-10-23T09:39:30Z

October, we forgot to publish JDK11 aarch64 mac even though it had been finished for several days

andrew-m-leonard · 2024-10-23T09:46:44Z

status by platform document #60 is not always being updated...
I think we need to automate this, it's too easy to forget or update wrongly

andrew-m-leonard · 2024-10-23T10:14:28Z

misstakes were made in selecting publish job links, meaning a platform didn't get published when we said it was, due to clicking on WindowsX64 rather than Windowsx32...

smlambert · 2024-10-24T01:06:48Z

aarch64 windows was added as a platform for jdk21 and jdk23, but there were several changes required for it to be ready.

This could have happened well ahead of the release period (as per the plan discussed in past PMC mtg), it could have also been seen during a dry run, but no dry run was performed (were other checklist items not completed, seemed the release champion was not always present and in that event missed the opportunity to communicate that to others and ensure tasks were delegated).

andrew-m-leonard · 2024-10-25T13:41:35Z

We need to invest resource in making the Installers publishing a lot better and automated.
In its current form it mentally scars you !!

sophia-guo · 2024-10-28T14:59:04Z

adoptium/aqa-tests#5692 (comment)

Some arm32 jdk8 tests used to work on non-containers agents. Seems we don't have them any more https://ci.adoptium.net/label/ci.role.test&&sw.os.linux&&hw.arch.aarch32/. If the tests can only pass on non-containers we might need to do a vendor exclude due to our eclipse machine farm having limitations. https://github.com/adoptium/aqa-tests/blob/master/openjdk/excludes/vendors/eclipse/ProblemList_openjdk8.txt

andrew-m-leonard · 2024-10-30T09:54:57Z

I think this release has demonstrated the necessity of a dry-run, but also the issue with the "installers" and the new Azure VMs demonstrates the need for a dry-run installers upload possibly?

sxa · 2024-11-07T10:23:35Z

NOTE: Proposal to move the releasing guide to a wiki in either the build or one of the top level repositories: adoptium/temurin-build#3993

sxa · 2024-11-07T11:10:42Z

Memo to self: Discuss deadlock potential with x64 nodes during installer process.

adamfarley · 2024-11-18T16:45:03Z

Actions

Raise issues (@adamfarley)

General Retrospective for September and October 2024 Releases #54 (comment) - Put in backlog, assign to Andrew (tech debt - critical) - Done. Bug: Repository release branches don't have mandatory PR review temurin-build#4045
General Retrospective for September and October 2024 Releases #54 (comment) - Update docs to do this, plus a general refresh of said docs. - Done. Request for docs update: we should attach tap files during manual triage of release builds aqa-tests#5757

Other actions:
Andrew:

General Retrospective for September and October 2024 Releases #54 (comment) - To change "dryrun" to "candidate" in release docs.
General Retrospective for September and October 2024 Releases #54 (comment) - To update new jdk version.
General Retrospective for September and October 2024 Releases #54 (comment) - To update docs with "publish container images to dockerhub", or possibly "publish docker container images to dockerhub", or even "publish images to dockerhub". Whatever flows best.
General Retrospective for September and October 2024 Releases #54 (comment) - The "10 days beforehand" references are outdated. Andrew to update the docs to reference the new policy of "3 days before, 5 days afterwards".
General Retrospective for September and October 2024 Releases #54 (comment) - To prevent us putting the link in if a build is red:

Someone:

General Retrospective for September and October 2024 Releases #54 (comment) - A video/docs to be created for the 2 use cases: 1) Reruns inside a child of the test job, and 2) a "sibling" rerun of an entire test job (if the original job failed - red status).

Other data:

Add yourself to aqavit-bot if you want to get infrequent test machine updates on machines being knocked offline automatically for issues like these: General Retrospective for September and October 2024 Releases #54 (comment)

adamfarley · 2024-11-18T16:55:03Z

Next retrospective - #64

adoptium deleted a comment from llxia Sep 12, 2024

adoptium deleted a comment from andrew-m-leonard Sep 12, 2024

sophia-guo mentioned this issue Oct 2, 2024

Status of Reproducible comparing tests on linux adoptium/temurin-build#3967

Closed

adamfarley self-assigned this Nov 8, 2024

adamfarley mentioned this issue Nov 18, 2024

Bug: Repository release branches don't have mandatory PR review adoptium/temurin-build#4045

Closed

adamfarley mentioned this issue Nov 18, 2024

Request for docs update: we should attach tap files during manual triage of release builds adoptium/aqa-tests#5757

Open

adamfarley closed this as completed Nov 18, 2024

andrew-m-leonard mentioned this issue Nov 21, 2024

Improve build pipeline publish links to warn of build job failures adoptium/ci-jenkins-pipelines#1149

Closed

adamfarley added this to 2024 4Q Adoptium Plan Nov 22, 2024

smlambert moved this to Done in 2024 4Q Adoptium Plan Nov 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

General Retrospective for September and October 2024 Releases #54

General Retrospective for September and October 2024 Releases #54

adamfarley commented Aug 1, 2024 •

edited

Loading

andrew-m-leonard commented Sep 10, 2024

andrew-m-leonard commented Sep 10, 2024

andrew-m-leonard commented Sep 10, 2024 •

edited

Loading

andrew-m-leonard commented Sep 11, 2024

smlambert commented Sep 12, 2024

andrew-m-leonard commented Sep 12, 2024 •

edited by smlambert

Loading

andrew-m-leonard commented Sep 19, 2024

sophia-guo commented Sep 26, 2024

sophia-guo commented Sep 26, 2024 •

edited by adamfarley

Loading

sophia-guo commented Sep 26, 2024

smlambert commented Sep 26, 2024

sophia-guo commented Oct 2, 2024

andrew-m-leonard commented Oct 3, 2024

smlambert commented Oct 17, 2024

andrew-m-leonard commented Oct 21, 2024 •

edited

Loading

andrew-m-leonard commented Oct 23, 2024

andrew-m-leonard commented Oct 23, 2024 •

edited

Loading

andrew-m-leonard commented Oct 23, 2024

smlambert commented Oct 24, 2024

andrew-m-leonard commented Oct 25, 2024 •

edited

Loading

sophia-guo commented Oct 28, 2024 •

edited

Loading

andrew-m-leonard commented Oct 30, 2024

sxa commented Nov 7, 2024

sxa commented Nov 7, 2024

adamfarley commented Nov 18, 2024 •

edited by andrew-m-leonard

Loading

adamfarley commented Nov 18, 2024

General Retrospective for September and October 2024 Releases #54

General Retrospective for September and October 2024 Releases #54

Comments

adamfarley commented Aug 1, 2024 • edited Loading

andrew-m-leonard commented Sep 10, 2024

andrew-m-leonard commented Sep 10, 2024

andrew-m-leonard commented Sep 10, 2024 • edited Loading

andrew-m-leonard commented Sep 11, 2024

smlambert commented Sep 12, 2024

andrew-m-leonard commented Sep 12, 2024 • edited by smlambert Loading

andrew-m-leonard commented Sep 19, 2024

sophia-guo commented Sep 26, 2024

sophia-guo commented Sep 26, 2024 • edited by adamfarley Loading

sophia-guo commented Sep 26, 2024

smlambert commented Sep 26, 2024

sophia-guo commented Oct 2, 2024

andrew-m-leonard commented Oct 3, 2024

smlambert commented Oct 17, 2024

andrew-m-leonard commented Oct 21, 2024 • edited Loading

andrew-m-leonard commented Oct 23, 2024

andrew-m-leonard commented Oct 23, 2024 • edited Loading

andrew-m-leonard commented Oct 23, 2024

smlambert commented Oct 24, 2024

andrew-m-leonard commented Oct 25, 2024 • edited Loading

sophia-guo commented Oct 28, 2024 • edited Loading

andrew-m-leonard commented Oct 30, 2024

sxa commented Nov 7, 2024

sxa commented Nov 7, 2024

adamfarley commented Nov 18, 2024 • edited by andrew-m-leonard Loading

Actions

adamfarley commented Nov 18, 2024

adamfarley commented Aug 1, 2024 •

edited

Loading

andrew-m-leonard commented Sep 10, 2024 •

edited

Loading

andrew-m-leonard commented Sep 12, 2024 •

edited by smlambert

Loading

sophia-guo commented Sep 26, 2024 •

edited by adamfarley

Loading

andrew-m-leonard commented Oct 21, 2024 •

edited

Loading

andrew-m-leonard commented Oct 23, 2024 •

edited

Loading

andrew-m-leonard commented Oct 25, 2024 •

edited

Loading

sophia-guo commented Oct 28, 2024 •

edited

Loading

adamfarley commented Nov 18, 2024 •

edited by andrew-m-leonard

Loading