Depend on Boost using FetchContent instead of relying on system-provided boost. #4663

Open
wants to merge 2 commits into
base: main
from

fruffy
Collaborator

@fruffy fruffy commented May 10, 2024

This PR sets up Boost to be installed via FetchContent instead of depending on the system-provided version. Because a lot of Boost is header-only, this is a little tricky.

Recent versions of boost have improved their CMake support significantly. I pushed a version which only requires downloading boost and linking the appropriate targets. The required boost headers are exported with each target, comparable to Abseil. To make sure that the linked headers are included as system headers I had to do some patching, but that is fairly straightforward.

To add additional Boost targets, set P4C_TARGET_BOOST_LIBRARIES on the command line or before invoking the Boost CMake setup.

Dependencies

We currently depend on multiprecision and format as part of our core, on iostreams for one particular optimization in the parser, and on graphs for the graphs backend.

graphs alone has an egregious number of dependencies: https://github.com/boostorg/graph/blob/develop/CMakeLists.txt#L18
iostreams dependencies: https://github.com/boostorg/iostreams/blob/develop/CMakeLists.txt#L79
format dependencies: https://github.com/boostorg/format/blob/develop/CMakeLists.txt#L17
multiprecision dependencies: https://github.com/boostorg/multiprecision/blob/develop/CMakeLists.txt#L30

The only project out of those that is currently standalone is multiprecision.

Because of this dependency problem we need to add almost all the different boost modules to the include path to make sure that we are including the local boost header first. We do this by globbing for the include folder and adding it to the include path.
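
The globbing approach described above could look roughly like the following. This is only a sketch, not the code from the PR: it assumes Boost was fetched under the FetchContent name Boost and that the archive uses the modular libs/<module>/include layout. The variable names P4C_BOOST_INCLUDE_DIRS and P4C_BOOST_NESTED_INCLUDE_DIRS are made up for illustration.

```cmake
# Sketch only. Assumes Boost has already been populated via FetchContent
# under the name "Boost", which defines boost_SOURCE_DIR.
include(FetchContent)
FetchContent_GetProperties(Boost)

# Collect every per-module include directory, e.g. libs/format/include.
file(GLOB P4C_BOOST_INCLUDE_DIRS
     LIST_DIRECTORIES true
     "${boost_SOURCE_DIR}/libs/*/include")

# Some modules are nested one level deeper, e.g. libs/numeric/conversion/include.
file(GLOB P4C_BOOST_NESTED_INCLUDE_DIRS
     LIST_DIRECTORIES true
     "${boost_SOURCE_DIR}/libs/*/*/include")

# BEFORE puts the fetched headers ahead of any system-wide Boost;
# SYSTEM suppresses compiler warnings that originate in Boost headers.
include_directories(BEFORE SYSTEM
                    ${P4C_BOOST_INCLUDE_DIRS}
                    ${P4C_BOOST_NESTED_INCLUDE_DIRS})
```

The downside of this pattern is that every target in the directory scope sees every Boost module, whether it needs it or not.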

@fruffy fruffy added the core, run-ubuntu18, run-validation, run-sanitizer, run-static, and dependencies labels and removed the core label on May 10, 2024
@fruffy fruffy force-pushed the fruffy/boost branch 2 times, most recently from b224bff to e9b0345 Compare May 10, 2024 23:19
@asl
Contributor

asl commented May 14, 2024

@fruffy How much of the Boost code we use is not header-only? If it is just headers, we could extract that piece and vendor it.

@fruffy
Collaborator Author

fruffy commented May 14, 2024

> @fruffy How much of the Boost code we use is not header-only? If it is just headers, we could extract that piece and vendor it.

iostreams and graphs are not header-only iirc.
I am currently running into this problem: https://stackoverflow.com/q/72913306 where you need to set up the includes for all the little boost projects. I have not gotten around to untangling these things yet.

We currently depend on multiprecision and format as part of our core, on iostreams for one particular optimization in the parser, and on graphs for the graphs backend.

graphs alone has an egregious number of dependencies: https://github.com/boostorg/graph/blob/develop/CMakeLists.txt#L18
iostreams dependencies: https://github.com/boostorg/iostreams/blob/develop/CMakeLists.txt#L79
format dependencies: https://github.com/boostorg/format/blob/develop/CMakeLists.txt#L17
multiprecision dependencies: https://github.com/boostorg/multiprecision/blob/develop/CMakeLists.txt#L30

The only project out of those that is currently standalone is multiprecision.

@asl
Contributor

asl commented May 14, 2024

graph is used by one particular backend, so the dependency could be moved there. Everything else besides iostreams seems to be header-only and could be vendored (there is a special boost script for this, btw).

@fruffy fruffy force-pushed the fruffy/boost branch 4 times, most recently from de834be to eebe86c Compare May 15, 2024 21:47
@vlstill
Contributor

vlstill commented May 16, 2024

Do we need a specific version of Boost that is newer than what is available on supported OSes? If not, I don't see a big reason to fetch Boost ourselves. It can be easily installed almost everywhere.

@fruffy
Collaborator Author

fruffy commented May 16, 2024

> Do we need a specific version of Boost that is newer than what is available on supported OSes? If not, I don't see a big reason to fetch Boost ourselves. It can be easily installed almost everywhere.

  1. The problem is the same as with Protobuf. Boost consistently introduces breaking or subtle changes between versions which often cause problems. There are quite a few issues around boost on this repo alone, the latest one being the Ubuntu 18.04 breakage. By controlling the Boost version we can at least make sure that we have one canonical version we write against.

  2. We ultimately want to get rid of most of Boost because it is a big dependency (Replace boost constructs with their C++17 STL equivalents #3898) and only want to pull in packages selectively. This is trying to set up an infrastructure for it.
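
To illustrate both points, pinning one canonical version and pulling in only selected modules might be sketched as follows. The version number and the variable P4C_BOOST_VERSION are assumptions chosen for illustration, not values taken from the PR. The download URL follows the naming pattern of the CMake-enabled release archives that Boost publishes on GitHub for recent releases. BOOST_INCLUDE_LIBRARIES is the variable the Boost CMake files read to decide which modules to configure.

```cmake
include(FetchContent)

# Assumed release; pin whichever version the project standardizes on.
set(P4C_BOOST_VERSION 1.85.0)

# Only these modules (and whatever Boost's own CMake files pull in for them)
# are configured, instead of all of Boost.
set(BOOST_INCLUDE_LIBRARIES format multiprecision iostreams)

FetchContent_Declare(
  Boost
  URL https://github.com/boostorg/boost/releases/download/boost-${P4C_BOOST_VERSION}/boost-${P4C_BOOST_VERSION}-cmake.tar.xz
  # A real build should also pin the archive with URL_HASH; the hash is
  # omitted here because it depends on the chosen release.
)
FetchContent_MakeAvailable(Boost)
```

With this in place, every developer and CI job builds against the same Boost sources regardless of what the operating system ships.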

@fruffy fruffy force-pushed the fruffy/boost branch 7 times, most recently from d6355d7 to 459d094 Compare May 16, 2024 19:33
@fruffy fruffy added the breaking-change label on May 29, 2024
@asl
Contributor

asl commented Jul 3, 2024

What takes a bit is downloading boost, the actual compilation is the same because boost is almost all headers.

I think I already mentioned that it is possible to extract the header-only part of Boost and vendor it. Then there will be no need to fetch anything.

boost::format and boost::multiprecision are both header-only, so we can just fetch them (not sure about their dependencies though). For iostreams, we would likely just re-implement the solution from scratch. Maybe memory-mapped files would be an even better approach here. Need to check.

Likely something else could be implemented instead of boost::random; after all, only a uniform distribution seems to be required.

So the only remaining dependency would be boost::graphs, which would only be needed when the graphs backend is selected.

@asl
Contributor

asl commented Jul 3, 2024

So, boost::random is also header-only, and so are its dependencies. The size of the Boost subset needed to support the functionality we use is ~10 MiB (this includes format, random, and multiprecision). And yes, we can patch with whatever C++20 features would be necessary :)

@fruffy
Collaborator Author

fruffy commented Jul 3, 2024

> I think I already mentioned that it is possible to extract the header-only part of Boost and vendor it. Then there will be no need to fetch anything.

I looked at this but this can quickly become a maintenance headache since you will have to vendor many dozens of files. I found just downloading boost once and then picking and choosing the parts you need is much easier. If necessary, we can clean up the includes we need to add.

> boost::format and boost::multiprecision are both header-only, so we can just fetch them (not sure about their dependencies though).

boost::multiprecision can be used in standalone mode. boost::format unfortunately pulls in a fair number of other dependencies, which in turn pull in their own dependencies.

@asl
Contributor

asl commented Jul 3, 2024

There is an official Boost tool, bcp, for making such a subset. And I already have this subset, just in case :)

@fruffy
Collaborator Author

fruffy commented Jul 3, 2024

> There is an official Boost tool, bcp, for making such a subset. And I already have this subset, just in case :)

We could add those instead! I have started with boost::format there but then gave up on it.

@fruffy
Collaborator Author

fruffy commented Jul 26, 2024

> There is an official Boost tool, bcp, for making such a subset. And I already have this subset, just in case :)

So after discussing this offline it looks like this may lead to an excessive amount of files we have to check into version control. The approach I have currently is probably the simplest. You only need to pull boost once and then can pick and choose the dependencies.

There is still a problem: I am adding all the possible include paths because of the way Boost is structured. This creates a bit of spam. Maybe there is a better way to add only one include path, but that requires some fiddling with folders.

@fruffy
Collaborator Author

fruffy commented Aug 21, 2024

Recent versions of boost have improved their CMake support significantly. I pushed a version which only requires downloading boost and linking the appropriate targets. The required boost headers are exported with each target, comparable to Abseil. To make sure that the linked headers are included as system headers I had to do some patching, but that is fairly straightforward.

If one desires to add additional boost targets, they can do so by setting P4C_TARGET_BOOST_LIBRARIES at command line or before invoking boost.
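
The comment does not show what the patching looks like. One possible way to have a target's exported headers treated as system headers is sketched below. This is an assumption about the general technique, not the PR's actual patch, and the helper name p4c_mark_system_includes is hypothetical. It copies each target's INTERFACE_INCLUDE_DIRECTORIES into INTERFACE_SYSTEM_INCLUDE_DIRECTORIES.

```cmake
# Sketch: re-export a target's include directories as system includes.
# Hypothetical helper; not part of p4c or Boost.
function(p4c_mark_system_includes target)
  # Boost::<module> names are ALIAS targets, and properties can only be
  # set on the real target behind the alias.
  get_target_property(real_target ${target} ALIASED_TARGET)
  if(NOT real_target)
    set(real_target ${target})
  endif()

  get_target_property(include_dirs ${real_target} INTERFACE_INCLUDE_DIRECTORIES)
  if(include_dirs)
    set_target_properties(${real_target} PROPERTIES
      INTERFACE_SYSTEM_INCLUDE_DIRECTORIES "${include_dirs}")
  endif()
endfunction()

# Must run after FetchContent_MakeAvailable(Boost), once the targets exist.
foreach(lib IN ITEMS Boost::format Boost::multiprecision Boost::iostreams)
  p4c_mark_system_includes(${lib})
endforeach()
```

Note that this only touches the listed targets, so headers reached through their transitive Boost dependencies would need the same treatment. CMake 3.25 and later also accept a SYSTEM keyword in FetchContent_Declare, which may make such patching unnecessary where that version can be required.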

Contributor

@asl asl left a comment

See comments. Likely some duplication and layering violations could be fixed.

set(Boost_USE_STATIC_RUNTIME OFF)
endif()

# The boost graph headers are optional and only required by the graphs back end.
Contributor

This looks like a layering violation to me. Maybe the P4C_TARGET_BOOST_LIBRARIES should be used here as well?

Collaborator Author

It might be a tradeoff between moving a bunch of boost logic to the top-level or keeping it all contained within a single CMake file. I pushed some changes which explore the other option.

Not sure which way is preferable.

cmake/Boost.cmake (outdated, resolved)

# Add boost modules.
# format, multiprecision, and iostreams are needed by P4C core.
set(BOOST_INCLUDE_LIBRARIES format multiprecision iostreams)
Contributor

Can this be specified by caller?

Collaborator Author

This is a variable that is consumed by the Boost CMakefiles to determine which libraries to enable. I think it could be specified by the caller but those three libraries are required.

Contributor

Consider the following case (take the graphs backend as a model example): assume there is a backend that requires more Boost modules. How could this be accomplished? There should be some generic way to specify Boost libraries beyond the ones currently required by the frontend.

Collaborator Author

Yes, P4C_TARGET_BOOST_LIBRARIES is intended for that. You specify it either on the command line or before invoking the Boost CMake file or p4c_obtain_boost. Unfortunately, a back end cannot retroactively add dependencies. Or at least I have not tried doing that.

I could make this a function parameter but that still has the same limitations from what I can see.

Contributor

How is P4C_TARGET_BOOST_LIBRARIES related to ADDITIONAL_P4C_BOOST_LIBRARIES? Also, what exactly does "on commandline" mean? Ideally I would want to be able to set it from a downstream's top-level CMake file before including p4c, or at least from CMake presets for the downstream.

Collaborator Author

@fruffy fruffy Aug 26, 2024

> How is P4C_TARGET_BOOST_LIBRARIES related to ADDITIONAL_P4C_BOOST_LIBRARIES?

I renamed the variable; the comment is a little outdated.

> Ideally I would want to be able to set it from a downstream's top-level CMake file before including p4c, or at least from CMake presets for the downstream.

That should be possible, similar to how bf-p4c works. The only requirement is that once we call FetchContent_MakeAvailable for Boost all the requested modules need to be present. The graph module for the graphs back end is currently initialized like this.

Contributor

In that case it should work well. I was confused by "on commandline".

Collaborator Author

@fruffy fruffy Aug 27, 2024

Yeah, both options are possible. You could also just set the required modules using cmake -DADDITIONAL_P4C_BOOST_LIBRARIES="assign;filesystem;graph;..."
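
For the other option discussed in this thread, setting the variable from a downstream top-level CMake file, a minimal sketch might look like this. The project name, the p4c subdirectory location, and the chosen module list are placeholders. How p4c consumes ADDITIONAL_P4C_BOOST_LIBRARIES internally is not shown in this conversation, so treat the snippet as an illustration of the ordering requirement only.

```cmake
# Hypothetical downstream top-level CMakeLists.txt.
cmake_minimum_required(VERSION 3.16)
project(my_p4_backend LANGUAGES CXX)  # placeholder project name

# This has to happen before p4c is added: all requested modules must be
# known by the time FetchContent_MakeAvailable is called for Boost.
set(ADDITIONAL_P4C_BOOST_LIBRARIES assign filesystem graph
    CACHE STRING "Extra Boost modules requested by this backend")

add_subdirectory(p4c)  # assumed location of the p4c checkout
```

Because the value is stored as a cache entry, a -D option given on the command line still takes precedence over the default written here.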

cmake/Boost.cmake (outdated, resolved)
cmake/Boost.cmake (outdated, resolved)
# Reset temporary variable modifications.
set(CMAKE_UNITY_BUILD ${CMAKE_UNITY_BUILD_PREV})
set(FETCHCONTENT_QUIET ${FETCHCONTENT_QUIET_PREV})
set(P4C_BOOST_LIBRARIES Boost::iostreams Boost::format Boost::multiprecision)
Contributor

Looks like a duplication. The list of libraries should be specified once IMO :)

Collaborator Author

This list describes the actual CMake targets which are used for includes/linking whereas the other list describes the various Boost modules to enable. Not sure whether there is a 1:1 correspondence between the two.

If yes, we could generate these using a variable like Boost::${Boost_Module} for each list element.
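
Generating the target list from the module list, as suggested here, could be sketched like this. It assumes the 1:1 correspondence between module names and Boost::<module> targets actually holds for the modules involved, which the comment above leaves open. P4C_BOOST_MODULES is a hypothetical variable name.

```cmake
# Single source of truth for the Boost modules p4c needs (name is hypothetical).
set(P4C_BOOST_MODULES format multiprecision iostreams)

# Tells the Boost CMake files which modules to enable.
set(BOOST_INCLUDE_LIBRARIES ${P4C_BOOST_MODULES})

# Derive the link targets from the same list, assuming each module
# exports a target named Boost::<module>.
set(P4C_BOOST_LIBRARIES "")
foreach(module IN LISTS P4C_BOOST_MODULES)
  list(APPEND P4C_BOOST_LIBRARIES "Boost::${module}")
endforeach()

message(STATUS "Boost link targets: ${P4C_BOOST_LIBRARIES}")
```

This keeps the module list in one place, which addresses the duplication raised in the review.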

@fruffy fruffy force-pushed the fruffy/boost branch 3 times, most recently from 638af95 to 171e459 Compare August 26, 2024 17:34
Signed-off-by: fruffy <[email protected]>
@asl
Contributor

asl commented Aug 26, 2024

This might be a significant regression for downstream backends, as it would require lots of patching of p4c CMake files. Adding some additional CMake options to the build command line also looks a bit hacky to me; everything should be self-contained. Essentially, before this change a backend could just do find_package (Boost COMPONENTS <whatever is necessary>) in its own CMake file and be ok. Ideally it should work in a similar way with the new functionality.

Here, it looks like many things are hardcoded. In principle we can use the in-tree graphs backend to model the new functionality; things should not be hard-coded inside a CMake macro that is supposed to be generic.

@fruffy
Collaborator Author

fruffy commented Aug 26, 2024

> This might be a significant regression for downstream backends, as it would require lots of patching of p4c CMake files. Adding some additional CMake options to the build command line also looks a bit hacky to me; everything should be self-contained. Essentially, before this change a backend could just do find_package (Boost COMPONENTS <whatever is necessary>) in its own CMake file and be ok. Ideally it should work in a similar way with the new functionality.
>
> Here, it looks like many things are hardcoded. In principle we can use the in-tree graphs backend to model the new functionality; things should not be hard-coded inside a CMake macro that is supposed to be generic.

I do not think there is any hardcoding going on, it just makes a couple implicit assumptions of the compiler build system explicit. For example, graphs and iostreams were already part of the top-level CMake. We're only improving things here.

Where things become a problem is that the back ends cannot really communicate their dependencies to the top-level FetchContent macro. At least I do not see an easy way to do that. Boost CMake targets are only visible after invoking FetchContent_MakeAvailable(Boost), but that can only be invoked after all Boost modules are known.

One way this circular dependency could be resolved is to add a "BoostDeps.cmake" file to each extension, which is then picked up by this macro...
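
The "BoostDeps.cmake" idea is only floated here, not implemented. A rough sketch of how such files could be collected before Boost is made available is shown below. The extensions/<name>/BoostDeps.cmake layout and the contents of those files are assumptions made for illustration.

```cmake
include(FetchContent)

# Modules required by the p4c core.
set(BOOST_INCLUDE_LIBRARIES format multiprecision iostreams)

# Assumed layout: each extension may ship extensions/<name>/BoostDeps.cmake.
file(GLOB boost_dep_files
     "${CMAKE_CURRENT_SOURCE_DIR}/extensions/*/BoostDeps.cmake")

foreach(dep_file IN LISTS boost_dep_files)
  # Each file is expected to extend the list in the caller's scope, e.g.
  #   list(APPEND BOOST_INCLUDE_LIBRARIES graph)
  include("${dep_file}")
endforeach()
list(REMOVE_DUPLICATES BOOST_INCLUDE_LIBRARIES)

# Only now is the module list complete. Assumes FetchContent_Declare(Boost ...)
# was called earlier in the configure step.
FetchContent_MakeAvailable(Boost)
```

Because include() runs the file in the caller's scope, each extension can append to the list without the top-level file knowing about it in advance.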

> Essentially, before this change a backend could just do find_package (Boost COMPONENTS <whatever is necessary>) in its own CMake file and be ok. Ideally it should work in a similar way with the new functionality.

The original behavior is preserved with P4C_USE_PREINSTALLED_BOOST. But even find_package requires boost to be installed previously, which is a separate installation step. So the effort of adding a CMake parameter or installing boost separately is the same. I understand, however, that it requires significant refactoring for back ends, which is a showstopper.
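
The switch between the two modes might be structured roughly as follows. Only the option name P4C_USE_PREINSTALLED_BOOST comes from the discussion. The default value, the P4C_BOOST_URL variable, and the exact target lists are assumptions for the sake of the sketch.

```cmake
# Assumed default; the conversation does not state whether it is ON or OFF.
option(P4C_USE_PREINSTALLED_BOOST
       "Use a system-installed Boost instead of fetching it" OFF)

if(P4C_USE_PREINSTALLED_BOOST)
  # Previous behavior: Boost must already be installed on the machine.
  find_package(Boost REQUIRED COMPONENTS iostreams)
  # format and multiprecision are header-only, so Boost::boost covers them.
  set(P4C_BOOST_LIBRARIES Boost::boost Boost::iostreams)
else()
  include(FetchContent)
  set(BOOST_INCLUDE_LIBRARIES format multiprecision iostreams)
  # P4C_BOOST_URL is a hypothetical variable that holds the archive location.
  FetchContent_Declare(Boost URL "${P4C_BOOST_URL}")
  FetchContent_MakeAvailable(Boost)
  set(P4C_BOOST_LIBRARIES Boost::format Boost::multiprecision Boost::iostreams)
endif()
```

Either way, the rest of the build only refers to P4C_BOOST_LIBRARIES, so consumers do not need to know which mode was chosen.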

Labels
breaking-change: This change may break assumptions of compiler back ends.
infrastructure: Topics related to code style and build and test infrastructure.
run-sanitizer: Use this tag to run a Clang+Sanitizers CI run.
run-static: Use this tag to trigger static build CI run.
run-ubuntu18: Use this tag to trigger a Ubuntu-18 CI run.
run-validation: Use this tag to trigger a Validation CI run.
3 participants