feat: add reproducible central buildspec generation #1115

Open
wants to merge 9 commits into
base: main
from

Conversation

tromai
Member

@tromai tromai commented Jul 2, 2025

Summary

This Pull Request adds a new command called gen-build-spec.

usage: macaron gen-build-spec [-h] -purl PACKAGE_URL --database DATABASE
                              [--output-format OUTPUT_FORMAT]

options:
  -h, --help            show this help message and exit
  -purl PACKAGE_URL, --package-url PACKAGE_URL
                        The PURL string of the software component to generate build
                        spec for.
  --database DATABASE   Path to the database.
  --output-format OUTPUT_FORMAT
                        The output format. Can be rc-buildspec (Reproducible-central
                        build spec) (default "rc-buildspec")

This command generates a buildspec, which contains the build-related information for a PURL that Macaron has already analyzed. The output is written to output/macaron.buildspec.

An example

macaron analyze -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0
macaron gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db

The resulting output/macaron.buildspec uses the Reproducible Central buildspec format:

# Copyright (c) 2025, Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
# Generated by Macaron version 0.15.0

# Input PURL - pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0
# Initial default JDK version 8 and default build command [['mvn', '-DskipTests=true', '-Dmaven.test.skip=true', '-Dmaven.site.skip=true', '-Drat.skip=true', '-Dmaven.javadoc.skip=true', 'clean', 'package']].

groupId=org.apache.hugegraph
artifactId=computer-k8s
version=1.0.0

gitRepo=https://github.com/apache/hugegraph-computer

gitTag=d2b95262091d6572cc12dcda57d89f9cd44ac88b

tool=mvn
jdk=8

newline=lf

command="mvn -DskipTests=true -Dmaven.test.skip=true -Dmaven.site.skip=true -Drat.skip=true -Dmaven.javadoc.skip=true clean package"

buildinfo=target/computer-k8s-1.0.0.buildinfo

Ideally, this Buildspec can be used directly as part of the Reproducible Central rebuild infrastructure.

Description of changes

Macaron database extractor

The first step in generating a buildspec is to extract the build-related information from an existing Macaron SQLite database. The module macaron_db_extractor.py added in this commit does just that.

It uses SQLAlchemy SELECT statements on ORM mapped classes to load the data into instances of those classes, for example the ones defined in src/macaron/database/table_definitions.py.

Maven and Gradle CLI Command Parser

We use the build commands found in the CI/CD configuration (e.g. a GitHub Actions workflow YAML file) for the final buildspec. However, those build commands cannot be used as is; they require some additional patching to work as rebuild commands.

A proper way to patch any Maven or Gradle CLI build command is to parse it first. The Maven and Gradle CLI command parsers added in this commit leverage Python's built-in argparse library.
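To illustrate the idea, here is a minimal sketch of an argparse-based parser for a small subset of the mvn CLI. It is not the parser from this PR: the function names build_maven_parser and parse_maven_command are hypothetical, and only three real Maven options (-D, -P, -B) are modelled.

```python
import argparse
import shlex


def build_maven_parser() -> argparse.ArgumentParser:
    """Return a parser for a small, illustrative subset of the mvn CLI."""
    parser = argparse.ArgumentParser(prog="mvn", add_help=False)
    # -Dkey=value system properties; the option may be repeated.
    parser.add_argument("-D", "--define", dest="properties", action="append", default=[])
    # -P profile1,profile2
    parser.add_argument("-P", "--activate-profiles", dest="profiles")
    # -B / --batch-mode takes no value.
    parser.add_argument("-B", "--batch-mode", action="store_true")
    # Every remaining token is treated as a goal or lifecycle phase.
    parser.add_argument("goals", nargs="*")
    return parser


def parse_maven_command(command: str) -> argparse.Namespace:
    """Tokenize a shell-style mvn command and parse its options and goals."""
    tokens = shlex.split(command)
    if tokens and tokens[0] in ("mvn", "./mvnw"):
        tokens = tokens[1:]  # drop the executable name
    # parse_intermixed_args allows options to appear between goals.
    return build_maven_parser().parse_intermixed_args(tokens)


parsed = parse_maven_command("mvn -B -DskipTests=true clean package")
print(parsed.properties, parsed.goals, parsed.batch_mode)
```

Once the command is a structured Namespace rather than a string, individual properties or goals can be changed without fragile string manipulation.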

CLI Build Command Patcher

The modules added in this commit use the Maven and Gradle CLI command parsers to parse and patch the build commands obtained from the Macaron database.
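As a rough sketch of the patching step, assume the command has already been parsed into a dictionary of -D properties and a list of goals. The patch table below mirrors the options visible in the sample buildspec command; the names SKIP_PATCHES and patch_maven_command are hypothetical and are not taken from the PR.

```python
import shlex

# Hypothetical patch table: property name -> value forced in the rebuild command.
SKIP_PATCHES = {
    "skipTests": "true",
    "maven.test.skip": "true",
    "maven.site.skip": "true",
    "rat.skip": "true",
    "maven.javadoc.skip": "true",
}


def patch_maven_command(properties: dict, patches: dict, goals: list) -> str:
    """Merge patch values over parsed -D properties and rebuild the command string."""
    merged = {**properties, **patches}  # patch values win over the original ones
    tokens = ["mvn"] + [f"-D{key}={value}" for key, value in merged.items()] + goals
    return shlex.join(tokens)


patched = patch_maven_command(
    {"skipTests": "false", "gpg.skip": "true"},  # properties from the CI command
    SKIP_PATCHES,
    ["clean", "package"],
)
print(patched)
```

Properties that the patch table does not mention (gpg.skip here) are carried over unchanged, while conflicting ones are overridden.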

JDK version finding from Java Maven Central artifacts

Macaron can obtain the JDK version for a given build command from the CI/CD configuration. In some cases, the CI/CD configuration doesn't have enough information for us to obtain the JDK version. Therefore, we also rely on the JDK version recorded in META-INF/MANIFEST.MF in Java artifacts from Maven Central (https://repo1.maven.org/).

The module jdk_finder.py added in this commit downloads the Java artifact from Maven Central for a given maven-type PURL, then returns the JDK version if it is available in META-INF/MANIFEST.MF.
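The manifest lookup can be sketched with the standard zipfile module. This is an illustration only, not the code in jdk_finder.py: it skips the download step, and it assumes the version lives under the Build-Jdk-Spec or Build-Jdk attributes that Maven Archiver commonly writes. The function name jdk_version_from_jar is hypothetical.

```python
import io
import zipfile
from typing import Optional

# Assumption: these are the manifest attributes that carry the JDK version.
JDK_MANIFEST_KEYS = ("Build-Jdk-Spec", "Build-Jdk")


def jdk_version_from_jar(jar_bytes: bytes) -> Optional[str]:
    """Return the raw JDK version string from a jar's manifest, or None."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8", errors="replace")
        except KeyError:  # the archive has no manifest entry
            return None
    attributes = {}
    for line in manifest.splitlines():
        if line.startswith(" "):
            continue  # continuation lines are ignored in this sketch
        key, sep, value = line.partition(":")
        if sep:
            attributes[key.strip()] = value.strip()
    for key in JDK_MANIFEST_KEYS:
        if key in attributes:
            return attributes[key]
    return None


# Build a tiny in-memory jar so the function can be exercised without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\nBuild-Jdk: 1.8.0_292\n")
print(jdk_version_from_jar(buffer.getvalue()))
```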

In some cases, the JDK version string from META-INF/MANIFEST.MF contains more than the JDK major version. For example:

Because the Reproducible Central Buildspec requires only the JDK major version, we need to extract just that. The jdk_version_normalizer.py module contains the logic to do so. It is added in this commit.
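A minimal sketch of such a normalization, assuming the two common JDK version schemes (the legacy "1.x" form and the modern "x.y.z" form). The function name normalize_jdk_version and the sample inputs are illustrative and do not come from jdk_version_normalizer.py.

```python
import re
from typing import Optional


def normalize_jdk_version(raw: str) -> Optional[str]:
    """Reduce a JDK version string to its major version, or None if unrecognized."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", raw)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy scheme: a leading "1." means the second component is the major version.
    if first == "1" and second is not None:
        return second
    return first


for sample in ("1.8.0_292", "11.0.12", "17"):
    print(sample, "->", normalize_jdk_version(sample))
```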

Generating the Reproducible Central Buildspec

The two commits use all components listed above to generate the final Reproducible Central Buildspec.
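The final assembly step can be pictured as filling a template with the collected values. The sketch below uses string.Template and covers only the fields visible in the sample buildspec earlier in this description; the names BUILDSPEC_TEMPLATE and render_buildspec are hypothetical, and the real generator may be structured differently.

```python
from string import Template

# Assumed template: only the fields shown in the sample buildspec.
BUILDSPEC_TEMPLATE = Template(
    "groupId=$group_id\n"
    "artifactId=$artifact_id\n"
    "version=$version\n"
    "\n"
    "gitRepo=$git_repo\n"
    "\n"
    "gitTag=$git_tag\n"
    "\n"
    "tool=$tool\n"
    "jdk=$jdk\n"
    "\n"
    "newline=lf\n"
    "\n"
    'command="$command"\n'
    "\n"
    "buildinfo=target/${artifact_id}-${version}.buildinfo\n"
)


def render_buildspec(fields: dict) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return BUILDSPEC_TEMPLATE.substitute(fields)


spec = render_buildspec(
    {
        "group_id": "org.apache.hugegraph",
        "artifact_id": "computer-k8s",
        "version": "1.0.0",
        "git_repo": "https://github.com/apache/hugegraph-computer",
        "git_tag": "d2b95262091d6572cc12dcda57d89f9cd44ac88b",
        "tool": "mvn",
        "jdk": "8",
        "command": "mvn -DskipTests=true clean package",
    }
)
print(spec)
```

Using substitute() rather than safe_substitute() means an incomplete set of fields fails loudly instead of producing a partially filled buildspec.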

Testing

9d8a2a7
da329fe

A new script called compare_rc_build_spec.py is added to compare the resulting Buildspec in the integration tests.
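One plausible way to compare two buildspec files is to parse them into key/value pairs and diff the results. The sketch below is not the logic of compare_rc_build_spec.py; it assumes that "#" comment lines (such as the generated version header) should be ignored, and the function names are hypothetical.

```python
def parse_buildspec(text: str) -> dict:
    """Collect key=value pairs, skipping blank lines and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


def compare_buildspecs(actual: str, expected: str) -> list:
    """Return human-readable differences; an empty list means the specs match."""
    left, right = parse_buildspec(actual), parse_buildspec(expected)
    return [
        f"{key}: {left.get(key)!r} != {right.get(key)!r}"
        for key in sorted(left.keys() | right.keys())
        if left.get(key) != right.get(key)
    ]


actual = "# Generated by Macaron version 0.15.0\ntool=mvn\njdk=8\n"
expected = "# Generated by Macaron version 0.16.0\ntool=mvn\njdk=11\n"
print(compare_buildspecs(actual, expected))
```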

Checklist

  • I have reviewed the contribution guide.
  • My PR title and commits follow the Conventional Commits convention.
  • My commits include the "Signed-off-by" line.
  • I have signed my commits following the instructions provided by GitHub. Note that we run GitHub's commit verification tool to check the commit signatures. A green verified label should appear next to all of your commits on GitHub.
  • I have updated the relevant documentation, if applicable.
  • I have tested my changes and verified they work as expected.
  • Come up with proper patching values for Reproducible Central format. Right now it's empty.
  • Added integration tests for the feature. Make sure to test on Docker image.
  • Added unit test for the reproducible central build spec generation format
  • Revise all remaining TODOs in the PR
  • Rebase the branch to properly split the changes into different commits, making it easier to review.
  • Update the api docs in the Sphinx documentation (could be in a different PR)
  • Add a tutorial on how to support generating build spec of a different format (e.g. support new build tool patching, etc.) (could also be in a different PR)
  • Update the PR description with the technical designs and instructions on how to review the code of this PR

@tromai tromai self-assigned this Jul 2, 2025
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 2, 2025
@tromai tromai added feature A new feature request and removed OCA Verified All contributors have signed the Oracle Contributor Agreement. labels Jul 2, 2025
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 2, 2025
return True


def compare_rc_build_spec(
Member Author

Most of the logic of this function is taken from https://github.com/oracle/macaron/blob/6a712af1ebdcb435bd5b7199dc1b4f5473663090/tests/vsa/compare_vsa.py

I think the comparison functions within compare_vsa.py could be refactored out into a tests_util.py module so that all "compare" scripts used in integration tests (here) could use them if needed.
Please let me know whether this should be done in this PR or in a subsequent one.

@tromai tromai force-pushed the tromai/add-build-spec-generation branch 2 times, most recently from 2547333 to 3014540 Compare July 9, 2025 12:25
@tromai tromai changed the title feat: add build spec generation feat: add reproducible central buildspec generation Jul 9, 2025
@tromai tromai force-pushed the tromai/add-build-spec-generation branch from 3014540 to da329fe Compare July 9, 2025 12:34
@tromai tromai marked this pull request as ready for review July 9, 2025 13:21
@tromai tromai requested a review from behnazh-w as a code owner July 9, 2025 13:21
pformat(patches),
)

group = purl.namespace
Member Author

A Reproducible Central Buildspec file needs the GAV coordinates of the component, which only make sense for a maven-type PackageURL.
At the moment, we accept any type of PackageURL. Please let me know if it makes sense to require the input PackageURL to be of type maven when we are generating a Reproducible Central Buildspec. (Ideally, each Buildspec format might have different requirements on the input PURL.)
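For illustration, the check being discussed could look like the sketch below, which accepts only pkg:maven PURLs and returns their GAV coordinates. It works on the raw string with the standard library only and does no percent-decoding; the PR code appears to work with a parsed PURL object instead (purl.namespace), so the function name maven_gav_from_purl and this string-based approach are assumptions.

```python
from typing import Optional, Tuple


def maven_gav_from_purl(purl: str) -> Optional[Tuple[str, str, str]]:
    """Return (groupId, artifactId, version) for a pkg:maven PURL, else None."""
    prefix = "pkg:maven/"
    if not purl.startswith(prefix):
        return None
    # Drop the subpath (#...) and qualifiers (?...) before splitting.
    coordinates = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path, sep, version = coordinates.rpartition("@")
    if not sep or not version:
        return None  # a version is required for a GAV coordinate
    group, slash, artifact = path.partition("/")
    if not slash or not group or not artifact:
        return None
    return group, artifact, version


print(maven_gav_from_purl("pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0"))
print(maven_gav_from_purl("pkg:pypi/requests@2.31.0"))
```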

purl,
)
return None
final_jdk_version = "8"
Member Author

I think we should always try to obtain the JDK version from the Maven Central jar, and only fall back to 8 if we cannot do so. Please let me know if this is desirable.

@tromai
Member Author

tromai commented Jul 15, 2025

Because the path to the database is provided through a CLI argument, we also need to support mounting this database file into the container file system.

Labels
feature A new feature request OCA Verified All contributors have signed the Oracle Contributor Agreement.