New Machine requirement: Windows dockerBuild containers #3286

Open
5 of 8 tasks
sxa opened this issue Dec 6, 2023 · 27 comments · Fixed by #3702

Comments

sxa commented Dec 6, 2023

I need to request a new machine:

  • New machine operating system (e.g. linux/windows/macos/solaris/aix): Windows
  • New machine architecture (e.g. x64/aarch32/arm32/ppc64/ppc64le/sparc): x64
  • Provider (leave blank if it does not matter): Docker :-)
  • Desired usage: Build containers, similar to what we have for Linux
  • Any unusual specification/setup required:
  • How many of them are required: n/a - they should be created dynamically

Please explain what this machine is needed for:
Running builds in an isolated way where we can achieve SLSA build level 3 compliance on Windows along with the other primary platforms. Ideally we'll be able to create windows-on-windows container images which we share and then download and run the builds in.

As background info:

So the tasks required would be:

  • Identify the appropriate software for running containers and ensure no licensing concerns (Likely something from the microsoft site linked above)
  • See if we can verify that a "basic" dockerfile works in that environment and whether we can map directories into it (same as -v on linux) which are read+write in the container
  • Determine whether we can create a container from the playbooks using a dockerfile equivalent to the Linux ones
  • Once we create the container, map a directory from the host into it with -v and use that to build Temurin in the container on the mapped volume so that the output is visible on the host system.
  • Understand whether we can reasonably push the resulting container images with the compiler up to dockerhub
  • Integrate this into the build pipelines
  • Implement processes to regenerate the images when playbook updates are made, likely as an addition to what we do for Linux in https://github.com/adoptium/infrastructure/blob/master/FAQ.md#what-about-the-builds-that-use-the-dockerbuild-tag
  • Declare SLSA Build level 3 on Windows :-)

Once this level of analysis and expertise is gained it will likely make windows installer testing, or any other such activities simpler and give us more options moving forward.

Related for historic reference:

@RadekCap

Please, assign this task to me. Thank you.

sxa commented Jul 4, 2024

Of the three options listed on the Microsoft website:

  • The first (Docker CE / Moby) seems to work well out of the box
  • The second (Mirantis) appears to be a commercial offering
  • The third (Containerd+nerdctl) appears functional, although networking doesn't work out of the box and it seems unable to start the eclipse-temurin container's default jshell process.

sxa commented Jul 4, 2024

OK First phase done ...

  • docker run -p 5986:5986 -v c:\Users\sxa:c:\sxa mcr.microsoft.com/windows/servercore:ltsc2022
  • Run ConfigureRemotingForAnsible.ps1 with the usual parameters with the netsh commands disabled (They require windows defender which isn't in the image)
  • Create a user to connect with for the playbooks (MyPassword is not what I've used on the live system!):
net user ansible MyPassword /ADD
net localgroup "Administrators" ansible /ADD
net localgroup "Remote Management Users" ansible /ADD

This allows the machine to be accessible via ansible running on a remote machine :-)

(Also, for my own notes, to debug powershell scripts use Set-PSDebug -Trace 2)

@sxa sxa self-assigned this Jul 4, 2024
sxa commented Jul 5, 2024

Playbook execution notes:

  • VS2013 requires the archive under /Vendor_Files/windows, otherwise MSVS_2013 needs to be skipped
  • NTP_TIME needs to be skipped as that has issues that are presumably related to running in a container: FAILED! => {"changed": false, "msg": "Unhandled exception while executing module: Service 'Windows Time (W32Time)' cannot be started due to the following error: Cannot start service W32Time on computer '.'."}
  • In the absence of the fixed layout files for VS2019 and VS2022, adoptopenjdk needs to be skipped to allow them to complete successfully

@sxa sxa added this to the 2024-07 (July) milestone Jul 5, 2024
sxa commented Jul 5, 2024

ansible can be run on the host to point at the container if you install cygwin which has ansible as one of its installable options (You probably want to include git too if it's a clean install on the host system). Noting that if you use localhost/127.0.0.1 in your hosts file you should specify -e git_sha=12345 or something appropriate otherwise the execution will trip up over

- name: Get Latest git commit SHA (Windows Localhost)

Noting that WSL could probably be used too, but that requires a system with hardware virtualization extensions available, which is not the case on all systems.

sxa commented Jul 26, 2024

Latest attempt is with:
--skip-tags adoptopenjdk,reboot,MSVS_2013,MSVS_2017,NTP_TIME
(Note: MSVS_2013 is skipped because I didn't have the installer on the machine, and MSVS_2017 did not work. Dragonwell could also be added to skip that install, which is not required for Temurin.)
Playbook changes to make it complete:

  • Set ansible_connection/ansible_winrm_transport in ansible.cfg
  • Set ansible_user/ansible_password in group_vars/all/adoptopenjdk_variables.yml
  • Remove win_reboot: from Common/roles/main.yml Line 60
  • Remove win_reboot: from MSVS_2013 role line 50
  • Remove win_reboot: from MSVS_2017 role line 37
  • Remove checksum parameters MSVS_2022 role line 103 as it's been updated
  • Remove win_reboot from WMF_5.1 role line 29
  • Remove win_reboot from cygwin role line 45 (Although it's already covered with the reboot tag)

After ansible run is complete, run the commands shown in this article

docker ps
docker stop <image>
docker commit <image> win2022_build_image

After which it can be started again and used

sxa commented Jul 29, 2024

docker commit didn't work on my image:
Error response from daemon: re-exec error: exit status 1: output: mkdir \\?\C:\Windows\SystemTemp\hcs376450290\Files: Access is denied
This is specific to the new image which has had the playbook run on it and does not occur when attempting to commit an image with only basic changes applied.

EDIT: This seems to be the temporary location where it is storing the entire image before it is committed and the machine ran out of space.

Noting that outside that directory most of the docker data is stored in C:\ProgramData\docker

EDIT 2: The docker commit command on the second machine which had adequate space used around 95GB of space in C:\windows\SystemTemp to perform the commit (excluded VS2013 and 2017) and took about 40 minutes at 40-50Mb/sec showing on resource monitor, followed by about 10 minutes of using another 15GB on C: then moving data back to the docker directory at a faster rate (Maybe ~100Mb/sec)

It did, however, hit an error: Error response from daemon: re-exec error: exit status 1: output: hcsshim::ImportLayer failed in Win32: Access is denied. (0x5) (Probably hit a zero disk space condition on C: since DOCKER_TMPDIR apparently isn't working to relocate that since docker 25)
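
Since both commit failures above came down to the staging area on C: filling up, a pre-flight check along these lines could fail fast before starting a 40+ minute commit. This is only a sketch: the 110GB threshold is an assumption derived from the ~95GB plus ~15GB figures in this comment, and the function names are hypothetical.

```python
import shutil

# Assumption: ~95GB staging + ~15GB extra, as observed in the comment above.
REQUIRED_FREE_GB = 110


def has_room_for_commit(free_bytes: int, required_gb: int = REQUIRED_FREE_GB) -> bool:
    """Return True if free_bytes covers the estimated space docker commit needs."""
    return free_bytes >= required_gb * 1024**3


def check_drive(path: str = "C:\\") -> bool:
    """Check free space on the drive that holds the SystemTemp staging directory."""
    return has_room_for_commit(shutil.disk_usage(path).free)
```

Calling check_drive() on the docker host before docker commit would return False on a machine like the first one, where the commit ran out of space part way through.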

sxa commented Jul 29, 2024

This is unfortunate. The builds aren't working because automatic shortname generation (fsutil behavior set disable8.3 0) does not appear to work within the container, but it is mandatory for the openjdk build process. Directories can have a shortname created manually with fsutil file setshortname "Long name" shortname but that is not ideal to do for each possible path.

EDIT: Noting that https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/shortNames/tasks/main.yml already has some explicit short name creation.
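
If shortnames do have to be created by hand, the fsutil file setshortname commands could at least be generated from a single mapping rather than typed one by one. A minimal sketch, assuming a hypothetical mapping of long paths to short names (the entries below are illustrative, not the list the build actually needs):

```python
def setshortname_commands(short_names: dict[str, str]) -> list[str]:
    """Build one 'fsutil file setshortname' command line per directory."""
    return [
        f'fsutil file setshortname "{long_path}" {short}'
        for long_path, short in short_names.items()
    ]


# Hypothetical mapping for illustration only.
example = {
    r"C:\Program Files": "PROGRA~1",
    r"C:\Program Files (x86)": "PROGRA~2",
}

for command in setshortname_commands(example):
    print(command)
```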

sxa commented Jul 29, 2024

Manually created a few of the shortnames that the configure step was objecting to and I have a JDK21u build complete in a container, so this seems feasible 👍🏻

sxa commented Jul 30, 2024

Noting that we should look at doing this with the MS build tools installer which is suitable for use by Open Source projects. The jdk21u builds currently use:

10:04:20  * C Compiler:     Version 19.37.32822 (at /cygdrive/c/progra~1/micros~3/2022/commun~1/vc/tools/msvc/1437~1.328/bin/hostx64/x64/cl.exe)
10:04:20  * C++ Compiler:   Version 19.37.32822 (at /cygdrive/c/progra~1/micros~3/2022/commun~1/vc/tools/msvc/1437~1.328/bin/hostx64/x64/cl.exe)

Other references (this numbering is more confusing than I realised - I thought we only had the '2022' vs '19.xx' versioning differences to worry about before today...)
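
To illustrate the numbering: in the log above the compiler reports 19.37.32822 while its directory (8.3-shortened to 1437~1.328) corresponds to 14.37.32822. The sketch below assumes that this fixed offset of 5 between the cl.exe major version and the toolset major version holds generally; that is an inference from this one log line, and the function name is hypothetical.

```python
import re


def toolset_from_cl_version(cl_version: str) -> str:
    """Map a cl.exe version (e.g. 19.37.32822) to a toolset directory name.

    Assumption: toolset major = compiler major - 5, as seen in the log above.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", cl_version)
    if match is None:
        raise ValueError(f"unexpected version format: {cl_version!r}")
    major, minor, build = match.groups()
    return f"{int(major) - 5}.{minor}.{build}"


print(toolset_from_cl_version("19.37.32822"))
```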

sxa commented Jul 30, 2024

Struggling with the GPG role at the moment which is called during the ANT role (I'm getting gnupg as a requirement which supplies gpg2 instead of gpg). Also Wix has to be skipped as I don't have ansible.builtin.runs available.

Other than that a two-phase dockerfile is looking quite promising. The first sets up WinRM (will only be invoked locally) and installs cygwin with git and ansible, then triggers a reboot to ensure the cygwin path takes effect.

The second runs the playbooks as normal, although for now I have it running in multiple layers to speed up testing, so that the caching of each layer takes effect independently:

  1. --skip-tags adoptopenjdk,reboot,ANT,NTP_TIME,Wix,MSVS_2013,MSVS_2017,MSVS_2019,MSVS_2022
  2. -t ANT
  3. -t MSVS_2019
  4. -t MSVS_2022

This is currently using the playbook branch at https://github.com/sxa/infrastructure/tree/sxa_allhosts which makes a few changes to support this execution.
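
The four layered runs listed above can be sketched as a list of ansible-playbook argument lists, one per dockerfile layer. The tag lists come from this comment; the playbook path passed in is a placeholder and the function name is hypothetical.

```python
# Tag arguments for each layer, in the order given in the list above.
LAYERS = [
    ["--skip-tags",
     "adoptopenjdk,reboot,ANT,NTP_TIME,Wix,MSVS_2013,MSVS_2017,MSVS_2019,MSVS_2022"],
    ["-t", "ANT"],
    ["-t", "MSVS_2019"],
    ["-t", "MSVS_2022"],
]


def layer_commands(playbook: str) -> list[list[str]]:
    """Return one ansible-playbook argument list per layer."""
    return [["ansible-playbook", playbook, *args] for args in LAYERS]


# "main.yml" is a placeholder for the real playbook path.
for command in layer_commands("main.yml"):
    print(" ".join(command))
```

Keeping each run in its own layer means a change to, say, the MSVS_2022 role only invalidates the final layer rather than the whole image.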

@sxa sxa modified the milestones: 2024-07 (July), 2024-08 (August) Jul 31, 2024
sxa commented Aug 1, 2024

The above approach seemed to work yesterday now that the machine is rebooted after adding cygwin to the PATH, and I had a system which was able to successfully build jdk21u using two dockerfiles (the first to configure WinRM, the second to run the playbooks using the individual layers from the previous comment). Next steps as follows:

  • Verify that on a clean image (I made some changes inside the image after my infrastructure branch was extracted, so that needs to be confirmed as captured in the branch)
  • Fix Wix install
  • Fix the git_sha detection
  • Update the MSVS_2022 role to use MS build tools to ensure reproducibility of the builds
  • Ideally test with the MSVS_2013 and 2017 installers available in the image so those roles do not need to be skipped.

Noting that the image without VS2013 or 2017 is 99GB in size.

sxa commented Aug 1, 2024

Now fixed the path setting so that it only requires one dockerfile so we have something consistent with what we have on Linux now 👍🏻

It still currently requires a username/password for the authentication, but the password can be passed into the dockerfile with --build-arg PW=SomeAcceptablePassword on the docker build command.

I haven't got it picking up the git_sha properly yet so that is currently hard-coded. Everything else is good enough to be able to run a jdk21u build on, but it's missing the compilers for some earlier versions (Will need those on the host and mapped in via Vendor_Files, similar to what we do with AWX). Also we'll want the jenkins_user role (currently skipped via adoptopenjdk) unless we're happy with the processes running as an administrator within the container (need to check how well user mapping works in these containers).

Otherwise, here is the dockerfile Dockerfile.win2022v2.txt which uses the playbook changes from https://github.com/sxa/infrastructure/tree/windows_docker_fixes

@sxa sxa pinned this issue Aug 1, 2024
sxa commented Aug 5, 2024

VS2013 install appears to complete OK (Based on the logs in C:\Windows\SystemTemp - more detailed logs are in C:\Temp) but the playbook doesn't terminate that role so it never continues.

Sizes:

| Version | Path | Total file size on file system |
|---|---|---|
| VS2022 | C:\Program Files\Microsoft Visual Studio\2022 | 19.7G |
| VS2019 | C:\Program Files\Microsoft Visual Studio\2019 | 12.5G |
| VS2017? | C:\Program Files\Microsoft Visual Studio 14.0 | 2.3G |
| n/a | C:\Program Files (x86)\Windows Kits | 14G (+7GB with VS2017) |
| n/a | C:\Program Files (x86)\Microsoft SDKs | 5.8G |

NOTE: Running the playbooks via the dockerfile with all of the Visual Studio installations excluded produces a docker image which is 15.4G in size.

NOTE 2: If the machine runs out of disk space on C: during a commit phase, there will be hcs* directories left under C:\Windows\SystemTemp which should be removed manually.

sxa commented Aug 6, 2024

Steps to set up:

  • Provision Windows Server 2022 machine - ideally with at least 250GB on the C: file system
  • Install cygwin (installer exe) with git and ansible support added
  • Ideally set up a second disk for Docker and then do something like mklink /J C:\ProgramData\docker F:\ (It's not immediately obvious how to set the data dir to a different location, so this works in the meantime)
  • Set 8.3 name support: fsutil 8dot3name set 0 (Otherwise shortnames can't be set within the containers which cygwin will need to run our automation)
  • Install docker either with the helper script (Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/Windows-Containers/Main/helpful_tools/Install-DockerCE/install-docker-ce.ps1" -o install-docker-ce.ps1 then .\install-docker-ce.ps1) or with the manual download steps at https://docs.docker.com/engine/install/binaries/#install-server-and-client-binaries-on-windows
  • Run docker run mcr.microsoft.com/windows/servercore:ltsc2022 cmd /c dir /x to verify that there is a visible shortname for the Program Files and Program Files (x86) directories.
  • Grab Dockerfile.win2022 from the infrastructure repo and run docker build --build-arg PW=Some-Pa55wd -t win2022_build_image -f Dockerfile.win2022 . 2>&1 | \cygwin64\bin\tee ansible.log (The PW parameter doesn't matter as long as it's valid for a windows user as it only exists during the ansible run)

From there you can run this to start the container:

  • mkdir %HOMEPATH%\workspace
  • docker run -it -v %HOMEPATH%\workspace:C:\workspace win2022_build_image

Then go through the normal build process:

  • cd \workspace
  • git clone https://github.com/adoptium/temurin-build
  • cd temurin-build/build-farm
  • set CONFIGURE_ARGS=--with-toolchain-version=2022
  • bash ./make-adopt-build-farm.sh jdk21u

sxa commented Aug 6, 2024

Based on adoptium/temurin-build#2922 (comment) we may be able to switch to using Visual Studio 2022 for everything which would significantly reduce the windows installation requirements. The dockerfile is currently set up to only install VS2022 and not the other versions.

sxa commented Aug 9, 2024

Next bullet on the list is to: Integrate this into the build pipelines

Initial attempts using a jenkins workspace directory with a drive on F: failed because the jenkins docker failed to map it into F: in the container as there was only a C: drive. Switched the workspace directory to C:\jenkins-workspace and we hit path limits:

10:03:48  configure: error: Your base path is too long. It is 112 characters long, but only 100 is supported

Now moved to using C:\ws for the directory and it seems to be progressing well:
Machine: dockerhost-azure-win2022-x64-1 (temporary, called sxa-win2022-3 in the Azure console)
Build job: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-docker/
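
The 100-character limit from the configure error can be checked up front before picking a workspace root. A minimal sketch: the limit is the one quoted in the error above, the function names are hypothetical, and exactly which directory configure measures as the "base path" is an assumption here.

```python
MAX_BASE_PATH = 100  # limit quoted in the configure error above


def base_path_ok(path: str, limit: int = MAX_BASE_PATH) -> bool:
    """Return True if the path fits within the configure length limit."""
    return len(path) <= limit


def headroom(workspace_root: str, limit: int = MAX_BASE_PATH) -> int:
    """Characters left for job and source directories below the workspace root."""
    return limit - len(workspace_root)


print(headroom("C:/ws"))                 # characters left below C:\ws
print(headroom("C:/jenkins-workspace"))  # characters left below the longer root
```

The shorter the root, the more of the budget is left for the job name, which is why the job name length matters as much as the workspace location.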

Noting that I have had errors like this and while I have not identified the exact cause, clearing out the build-scripts directory in the workspace resolves it:

10:24:33  Checked out HEAD commit SHA:
[Pipeline] sh
10:24:34  sh: c:/jw/workspace/build-scripts/jobs/jdk21u/jdk21u-windows-x64-docker@tmp/durable-34ace7f2/script.sh.copy: No such file or directory

The build (both jdk8u and jdk21u) then failed later on with another path length issue. I have therefore shortened the name of the job to windbld (Windows Docker Build) and the build has run through to completion. This will need further investigation but it's a good position at which to end the week :-) I've had to make some changes in the build repository to make this work, most specifically using git config --global safe.directory /cygdrive/c/jw/workspace/build-scripts/jobs/jdk21u/windbld in openjdk_build_pipeline.groovy to avoid errors such as the one in https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/35/console:

10:14:10  + git clean -fdx
10:14:10  fatal: detected dubious ownership in repository at '/cygdrive/c/jw/workspace/build-scripts/jobs/jdk21u/jdk21u-windows-x64-docker'
10:14:10  To add an exception for this directory, call:
10:14:10  
10:14:10  	git config --global --add safe.directory /cygdrive/c/jw/workspace/build-scripts/jobs/jdk21u/jdk21u-windows-x64-docker

Successful builds in jenkins with windbld job name:

sxa commented Aug 9, 2024

Job https://ci.adoptium.net/job/win2022_docker_image_updater/label=dockerhost-azure-win2022-x64-1/ is being prototyped to create the docker image. It is a stripped down copy of the rhel7/s390x one and will save to win2022_notrhel_image on the host for now, and as per earlier comments it does not include the infrastructure SHA.

@sxa sxa closed this as completed in #3702 Aug 13, 2024
sxa commented Aug 14, 2024

Summary

With the initial feasibility done, I'm going to leave this closed and create follow-on items for the subsequent tasks and the outstanding items on the list:

Jenkins job refs:

sxa commented Aug 22, 2024

Note: The HOME environment variable set when the jenkins agent is started is significant, as it affects where git picks up the .gitconfig from during the pipeline checkout on the host. On the current machine I'm using for testing this is set in the startjenkins.sh script before the agent is started.

sxa commented Aug 22, 2024

Above PR should fix the issue with long file names - I'm doing some extra tests to verify with my current job and have also initiated https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-temurin/138/console to test with the full job name. It should be good with the PR in place as it's using the same logic for overriding the default workspace location as we use in the non-docker situation on Windows.

Note that as part of this I have switched from using the C:\jw directory for the top level jenkins home on the docker host machine to C:\workspace for consistency with the non-docker case.

sxa commented Aug 22, 2024

For my own reference - the build times on the docker machine (Not as powerful as the main build machines - it's 2 core / 8GiB) are:

| Version | Time for 2-core docker build | Typical time on Azure 4-core machine |
|---|---|---|
| jdk8u | 52m | 31m |
| jdk11u | 2h14 | 1h31 |
| jdk17u | 2h20 | 1h27 |
| jdk21u | 2h32 | 1h29 |
| jdk24 | 1h45 | |
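
To put the table in relative terms, the durations can be converted to minutes and compared. A small sketch using only the four rows that have both figures; the helper name is hypothetical.

```python
def to_minutes(duration: str) -> int:
    """Convert a duration such as '52m' or '2h14' to minutes."""
    if "h" in duration:
        hours, minutes = duration.split("h")
        return int(hours) * 60 + int(minutes or 0)
    return int(duration.rstrip("m"))


# (2-core docker build, typical Azure 4-core build), taken from the table above.
TIMES = {
    "jdk8u": ("52m", "31m"),
    "jdk11u": ("2h14", "1h31"),
    "jdk17u": ("2h20", "1h27"),
    "jdk21u": ("2h32", "1h29"),
}

for version, (docker, azure) in TIMES.items():
    ratio = to_minutes(docker) / to_minutes(azure)
    print(f"{version}: {ratio:.2f}x")
```

On these figures the 2-core docker host is roughly 1.5x to 1.7x slower than the 4-core machines, so it is well short of the 2x that halving the core count might suggest.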

sxa commented Aug 28, 2024

First build using the main pipelines on the dockerhost machine: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-temurin/151/
"NODE_LABEL": "dockerhost-azure-win2022-x64-1",
"DOCKER_IMAGE": "notrhel_build_image",
USER_REMOTE_CONFIGS:

{
    "branch": "docker_windows_shortpath",
    "remotes": {
        "url": "https://github.com/sxa/ci-jenkins-pipelines.git"
    }
}

DEFAULTS_JSON:

        "pipeline_branch": "docker_windows_shortpath",
        "pipeline_url": "https://github.com/sxa/ci-jenkins-pipelines.git",

@sxa sxa added the Epic label Sep 4, 2024
sxa commented Sep 25, 2024

It's been quite a lot of work but the sign_Verification job now has a working run after a refactor of the code that does the signing and assembly within the pipelines. Ref: #3709 (comment)
A bit of cleaning up, and then verifying that it can create reproducible builds, will mean this can go in as a PR.

sxa commented Sep 26, 2024

--create-sbom wasn't working as ant is not in the PATH on the machine. For now I've added that to the path of the environment variables in the jenkins machine definition, but that's probably something we want to cover in the container image setup.

sxa commented Oct 12, 2024

Noting that when attempting to run a build using a fixed SCM_REF for a reproducibility comparison some problems occur

The create_installer_windows job needs to have the PRODUCT_*_VERSION fields matching the directory layout for the build. As an example windbld#986, which was built with an SCM_REF of jdk-21.0.4+7_adopt, produced a zip file with a top level directory of jdk-21.0.5+9 but the JDK inside has OpenJDK Runtime Environment Temurin-21.0.4+7-202410111909 (build 21.0.4-beta+7-202410111909) in the java -version output. This causes the installer job to baulk at the end of a loop searching for a path name with 21.0.4 in it (these lines are not consecutive but there's a lot of debug stuff in these logs). Once it fails to find something it shows the directory it has, which clearly has an unexpected version number in it.

looking for .\SourceDir\OpenJDK21\hotspot\x64\jdk-21.0.4
looking for .\SourceDir\OpenJDK21\hotspot\x64\jdk21u4-b7
looking for .\SourceDir\OpenJDK21\hotspot\x64\jdk-21+7
looking for .\SourceDir\OpenJDK21\hotspot\x64\jdk-21.0.4+7
looking for .\SourceDir\OpenJDK21\hotspot\x64\jdk-21.0.4.0+7
looking for .\SourceDir\OpenJDK-Latest\hotspot\x64\jdk-21.0.4+7
SOURCE Dir not found / failed
Listing directory :
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK21
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK21\hotspot
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK21\hotspot\x64
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK21\hotspot\x64\jdk-21.0.5+9

This is likely nothing to do with the docker changes, but is likely something we should look to address as a build issue in the general case when building something that isn't the latest version, such as a previous GA level. (FYI @andrew-m-leonard)
I've locked create_installer_windows#779 which shows the issue so it can be looked at and re-run if desired. The lock should be released once this is resolved. Similarly create_installer_windows#785 has been locked which was a re-run with the PRODUCT_*_VERSION fields corrected to be consistent with the directory name in the zip file.
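
The mismatch described above could be caught before the installer job runs by comparing the version in the zip's top level directory with the one reported by java -version. A minimal sketch, assuming version strings of the form 21.0.4+7 as seen in the logs; the function names are hypothetical and this is not how the installer job itself works.

```python
import re
from typing import Optional

# Matches versions such as 21.0.4+7 or 21.0.5+9.
VERSION_RE = re.compile(r"(\d+(?:\.\d+)*\+\d+)")


def extract_version(text: str) -> Optional[str]:
    """Return the first version-plus-build string found in text, if any."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(top_level_dir: str, runtime_line: str) -> bool:
    """Compare the zip's top level directory against the java -version output."""
    return extract_version(top_level_dir) == extract_version(runtime_line)


runtime = "OpenJDK Runtime Environment Temurin-21.0.4+7-202410111909"
print(versions_match("jdk-21.0.5+9", runtime))
```

For the windbld#986 values quoted above this reports a mismatch, which is the condition that later makes the installer's directory search fail.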

@sxa sxa reopened this Oct 12, 2024