Skip to content

deps: swap in zlib-ng for zlib with ENABLE_ZLIBNG=1 #4639

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

zachlewis
Copy link
Collaborator

@zachlewis zachlewis commented Feb 14, 2025

Description

Stab at building zlib-ng (zlib api compatibility mode) by default.

Using checked_find_package with ZLIB for any version over 2.0.0 will result in attempting to build, install, and find zlib-ng as zlib.

If zlib-ng is installed and found as "zlib" this way, then the CMake variable ZLIB_VERSION overwritten to match the value of ZLIB_VERSION in zlib.h. This value will match "${ZLIB_VERSION}.zlib-ng".

In a future commit, we should remove the ZLIB_VERSION CMake variable munging, and instead check for the value of ZLIB_VERSION when assembling "build:dependencies" compiling the library.

Tests

We've run some very simple PNG write tests with Windows wheels produced by the CI checks for this PR, and have confirmed that our modifications reliably improve performance as expected.

Todos

  • Be sure when we merge to have the commit message include information about the speedups.
    A few of us should measure performance gain on our end before merging the PR.
  • We should try to confirm exactly which set of zlib symbols are picked up by which libraries, and in particular, be sure we understand whether building OIIO with zlib-ng but linking against a pre-built libpng that wasn't aware of it gives us the better performance, or not.
  • When we end up auto-building libpng, are we passing along the right flags to build libpng to use the zlib-ng? (And for that matter, same issue for libtiff, and anything else that uses zlib.)
  • Make sure an auto-built zlib-ng is not pegging ISA to the build machine, because it may end up being deployed on machines that have less capable CPUs.

Stab at building zlib-ng (zlib api compatibility mode) by default.

Using `checked_find_package` with ZLIB for any version over 2.0.0 will result in attempting to build, install, and find zlib-ng as zlib.

If zlib-ng is installed and found as "zlib" this way, then the CMake variable ZLIB_VERSION overwritten to match the value of ZLIB_VERSION in zlib.h
This value will match "${ZLIB_VERSION}.zlib-ng".

In a future commit, we should remove the `ZLIB_VERSION` CMake variable munging, and instead check for the value of ZLIB_VERSION when assembling "build:dependencies" compiling the OIIO.

Signed-off-by: Zach Lewis <[email protected]>
Comment on lines +75 to +76
set_cache (ZLIB_BUILD_SHARED_LIBS OFF
DOC "Should execute a local ZLIB build, if necessary, build shared libraries" ADVANCED)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern: by building zlib-ng as a static library, won't that prevent it from being used by libpng, which is where much of the benefit of the faster zlib implementation would come from? Any static libraries we incorporate into libOpenImageIO have their symbols hidden by default, specifically so nothing else we link will find those symbols.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah? You'd definitely know better than I; I only barely understand what I'm doing! But yes, we specifically want PNG to find and use zlib-ng; the rest doesn't really matter.

I was hoping by placing the local build prefix at the front of CMAKE_PREFIX_PATH, and passing CMAKE_IGNORE_PATH to child build processes, that would coerce the PNG build process to find and link zlib-ng before any other zlibs. I'll should turn on shared libs and turn off plugin bundling to see what's actually happening...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I see what you mean. I'm honestly not sure. You might be right that it could be picked up by libpng and other dependencies IF we build them ourselves. But I think the usual case is that some/most/all of the dependencies are already built.

One possibility is that we should build statically, but expose the symbols by adding them to the global list in hidesymbols.map, so that libpng and others will find them inside our library at actual load time. We may need to also modify libOpenImageIO/CMakeLists where we have -Wl,--exclude-libs,ALL. (By default, we hide all symbols from static library dependencies so that we only use them internally but don't expose their symbols for dependencies to link against.)

But once we expose them, they might be picked up by anything that links us as a dependency, too, depending on the link order. So, for example, an app that itself uses zlib and OIIO might end up itself using zlib-ng inadvertently. Is that always ok? Maybe it's fine, I just think we need to be confident that this is the behavior we want.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for example, an app that itself uses zlib and OIIO might end up itself using zlib-ng inadvertently. Is that always ok? Maybe it's fine, I just think we need to be confident that this is the behavior we want.

Hmm. Very interesting. If the zlib-ng documentation is to be believed, I would imagine we'd want this behavior; but I don't feel qualified to so say so for sure...

Would exposing zlib-ng-compatibility-mode symbols in OIIO impose unexpected constraints or build-time requirements on building apps linking libraries that themselves link OIIO?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: @joaovbs96 tested one of the Windows bdist artifacts produced by the last Wheels run for this thread, and has confirmed PNG write speed improvements consistent with having linked against zlib-ng instead of zlib -- in his use case (which inspired this PR), write times for very large PNGs decreased from ~20s to ~8s.

if (ENABLE_ZLIBNG)
checked_find_package (ZLIB REQUIRED
VERSION_MIN 2.2.4 # ZLIB-NG
BUILD_LOCAL missing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your rationale for putting BUILD_LOCAL missing here? We do that very sparingly. Do you think we should make zlib-ng require a specific "opt out", meaning that it's used unless you specifically disable it, and furthermore should building it automatically be opt-in or opt-out?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that old ZLIB was not 'BUILD_LOCAL missing', it would just fail if not found, unless the build was initiatied with zlib (or all) in the list of things to build automatically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is one of those things I was struggling with that keeps this PR in "draft" form -- I need to further explore, and wanted to get your opinion on this. Looking at what I've done a few days later, I'm not sure how much sense I was making. I was definitely overthinking it.

So, what I was trying to do is always try to locally build_ zlib-ng by default (i.e., users could "opt-out" by setting ENABLE_ZLIBNG=0 etc). The plan was for the "ENABLE_ZLIBNG=1" option to always attempt to build zlib-ng when the option is enabled, because find_package(zlib) will (currently) never find a zlib greater than version 1.x, if zlib-ng is installed in compatibility mode. Setting the "VERSION_MIN" to something above 2.0 causes build_ZLIB.cmake to treat zlib as missing, build zlib-ng in compatibility mode, and subsequently "find" zlib-ng 2.2.4 "as" zlib-1.3.1.

My thinking was, if the ZLIBNG building behavior was "opt-out" by default, if I didn't have the BUILD_LOCAL missing bit, users would suddenly be forced to set OpenImageIO_BUILD_MISSING_DEPS or OpenImageIO_BUILD_LOCAL_DEPS by default.

I think now maybe the best course of action is to instead make building zlib-ng "opt-in" behavior, and have users also provide the _BUILD_MISSING_DEPS / _BUILD_LOCAL_DEPS options as necessary; and then I could get rid of the BUILD_LOCAL missing bit (unless you felt that opting in to enabling zlib-ng should always imply BUILD_LOCAL missing).

Also, ENABLE_ZLIBNG as an option name is a bit of a misnomer, because, if disabled, we'd need considerably more complex logic to skip over any found (e.g. user-provided) zlib-ng-compatibility-mode libs in favor of vanilla zlib, since they cannot be differentiated by CMake without inspecting the headers... so, if you have a better name for the option, I'm all ears.

The more I think about it, the more sense it makes to "opt-in" to zlib-ng.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the logic that makes most sense and is symmetric with how we do it elsewhere (like with libjpeg-turbo) is:

  • If zlib-ng is found, it is used by default instead of zlib (but of course with ENABLE_ZLIBNG=OFF able to turn it off even if found).
  • If zlib-ng is not found, it is not auto-built by default (keeping in line with how there's very little that we auto-build by default), but we can auto-build if the right options are set (just like any other package).

To put it another way: using is opt-out, but autobuilding it is opt-in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it another way: using is opt-out, but autobuilding it is opt-in.

Aye-aye.

What makes this tricky is that find_package(ZLIB) can't distinguish zlib from zlib-ng-compatibility-mode (by design), other than that zlib-ng provides a proper CMake config, and original gangster zlib does not. That makes it easy to only find zlib-ng, but difficult to never find zlib-ng.

(but of course with ENABLE_ZLIBNG=OFF able to turn it off even if found)

I'm unsure of the best way to force find_package to always ignore zlib-ng...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it'll get you unstuck, I'm ok accepting "opt in to build", but "always use if found". We can keep it in main for a while, and if it's a problem in practice (or somebody demands to build against zlib even though they also have zlib-ng), then we can try to solve that later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No rush to get this in place. Let's be sure this does what people expect it to do, and doesn't cause problems.

Keep in mind, we're only talking about zlib-ng libs built expressly to look and quack exactly like zlib. Is it safe to assume that the presence of compatibility-mode-zlib-ng itself is indicative of intent? It's not impossible that this could happen by accident, but I feel like users would be best served by adjusting their build environments, rather than risk making OIIO's build system much smarter than anyone else's...

But who knows why people need to do things. I'll do my best to try to accommodate everyone...

This probably isn't the most intuitive way of presenting these options to users -- very open to suggestions -- but at the moment, here's what I'm doing:

  • DISABLE_ZLIBNG is the legacy behavior "opt-out" switch -- controls whether we find + autobuild zlib-ng (default) over zlib
  • DISABLE_ZLIB controls how strongly we prefer zlib-ng over zlib -- a little (we'd allow zlib if zlib-ng isn't available) (default), or a lot (we'd autobuild a missing zlib-ng)

Allowing for:

  • Default behavior: try to find zlib-ng, else try to find zlib, else try to autobuild zlib-ng.
  • Wheels: DISABLE_ZLIB=1 and OpenImageIO_BUILD_MISSING_DEPS=zlib will force the build system to always find + autobuild zlib-ng.
  • Legacy: DISABLE_ZLIBNG=1 will force the build system to only find + autobuild zlib, ignoring zlib-ng. (Setting DISABLE_ZLIB=1 isn't meaningful when DISABLE_ZLIBNG=1).

Here's the thing: all of this assumes that zlib-ng exports a CMake config, and vanilla zlib does not. And this was reliably the case, until about three weeks ago: there's a commit in zlib's development branch implementing support for exporting configs...!

So, eventually, zlib-1.3.2's exporting of CMake configs will interfere with our heuristic for differentiating zlib-ng from zlib. In theory, a system-level zlib-1.3.2 could cause an expected zlib-ng autobuild to surprisingly fail to occur. (But I think it would also allow those fearful of picking up zlib-ng to point ZLIB_ROOT to the system).

In any case, -DOpenImageIO_BUILD_LOCAL_DEPS=zlib should reliably behave as expected -- we can always force the build system to favor whatever we autobuild. It's important to note that whether zlib-ng or zlib is autobuilt is governed by the *_ZLIBNG options, rather than the value (i.e., "zlib" vs "zlib-ng") passed to the *_DEPS options.

@zachlewis
Copy link
Collaborator Author

(Not sure what's causing the linux Wheels checks to fail, either... all of a sudden, CMake can't find git on the manylinux images?!)

@lgritz
Copy link
Collaborator

lgritz commented Feb 21, 2025

Weird. Last night's scheduled builds ran the wheel workflow just fine. I just kicked this to make it re-run the tests, we'll see if it was some kind of spurious failure.

@lgritz
Copy link
Collaborator

lgritz commented Feb 21, 2025

Nope, still fails. It is failing specifically in the zlib building logic. It must be something in your new cmake code, but nothing leaps out at me.

DISABLE_ZLIB=1 to only build against zlib-ng.

DISABLE_ZLIBNG=1 will attempt to build against zlib instead of zlib-ng; but it will not necessarily _prevent_ building against zlib-ng....

Signed-off-by: Zach Lewis <[email protected]>
@lgritz
Copy link
Collaborator

lgritz commented Feb 24, 2025

Notes on this topic from today's TSC meeting:

  • Be sure when we merge to have the commit message include information about the speedups.
  • A few of us should measure performance gain on our end before merging the PR.
  • We should try to confirm exactly which set of zlib symbols are picked up by which libraries, and in particular, be sure we understand whether building OIIO with zlib-ng but linking against a pre-built libpng that wasn't aware of it gives us the better performance, or not.
  • When we end up auto-building libpng, are we passing along the right flags to build libpng to use the zlib-ng? (And for that matter, same issue for libtiff, and anything else that uses zlib.)
  • Make sure an auto-built zlib-ng is not pegging ISA to the build machine, because it may end up being deployed on machines that have less capable CPUs.

@lgritz
Copy link
Collaborator

lgritz commented Feb 25, 2025

I should have added, @zachlewis, that people at the meeting were excited about these changes and, as Thiago put it, "faster is good". We're just being cautious that there aren't any weird edge cases introduced for other complex apps that incorporate OIIO.

@zachlewis
Copy link
Collaborator Author

Indeed -- this is proper due diligence, and exactly the kind of feedback I was hoping for -- guidance on what to look into and test, so we can move forward (or not!) with confidently supporting optional zlib-ng autobuilds. So, thanks for bringing this up during the TSC.

(Also, for what it's worth, there's always a middle ground we could consider, where we only do the zlib-ng thing as part of the wheels bdists builds workflow, without going as far as supporting zlib-ng autobuilds within OIIO's own build system.)

Will dive in further over the next few days.

@zachlewis
Copy link
Collaborator Author

When we end up auto-building libpng, are we passing along the right flags to build libpng to use the zlib-ng? (And for that matter, same issue for libtiff, and anything else that uses zlib.)

This is a good question -- libpng has recently deprecated its build option for specifying a zlib in favor of ZLIB_ROOT, as set via `checked_find_package(ZLIB ...). If I understand what's happening in our build scripts -- and there's a very high probability I do not -- I believe the PNG autobuild inherits that variable automatically, cuz it's cached by the parent scope (?? i think?)

In any case, without having done anything to explicitly inform the PNG build, PNG does seem to be picking up our zlib-ng, as evidenced by Joao's tests.

That being said, the only place I'm passing zlib-specific flags to a dependency build is in the minizip-ng build (for OCIO); and I think I must be messing stuff up in doing so, cuz logs seem to suggest minizip-ng is linking a static zlib-1.2 (instead of our local "zlib"-1.3).

If that's the case, would I reliably be able to elict a segfault by invoking OCIO and OIIO features that both require zlib? I'll try pointing $OCIO to a .ocioz archive, applying an IBA which requires loading into memory a compressed LUT, and then dumping out a PNG. If I segfault here, I'll also test against a version of zlib-ng compatible with zlib-1.2, so we have a more apples-to-apples-ng comparison.

(But either way, I'll commit a fix so that minizip-ng uses the same zlib that everything else is using).

A few of us should measure performance gain on our end before merging the PR.

FWIW, the performance gain might only be significant for Windows... but maybe we could log a very-large-PNG-write-speed measure stashed in the test suite somewhere, and I can kick off a Wheels run with the tests not disabled... that should at least give us some solid metrics to point to.

Make sure an auto-built zlib-ng is not pegging ISA to the build machine, because it may end up being deployed on machines that have less capable CPUs.

zlib-ng seems to have smart build options and defaults. There's a WITH_NATIVE_INSTRUCTIONS option I'm leaving disabled for compiling with the full instruction set (-march=native); as well as a WITH_RUNTIME_CPU_DETECTION option I'm leaving enabled.

We should try to confirm exactly which set of zlib symbols are picked up by which libraries, and in particular, be sure we understand whether building OIIO with zlib-ng but linking against a pre-built libpng that wasn't aware of it gives us the better performance, or not.

To the extent that the performance difference is measurable on the machines available to me, I'll see if I can test this. Would we expect an existing libpng that links dynamic zlib to perform better if it finds a dynamic zlib-ng instead? Would we expect an existing libpng that links a static zlib to be indifferent to OIIO linking zlib-ng?

All this said, I'm feeling less and less confident that OIIO's build system is the best place for introducing something like zlib-ng, given zlib's ubiquity. It makes more sense when OIIO is building all dependencies locally, and it's something I'd like for our Windows wheels bdists. I think it's cool and convenient that it can be done. But I don't know if the benefits of a switch for better PNG performance outweighs the risks presented by potential dueling zlibs.

And I do need to stress that, from what I've observed, the performance issues that zlib-ng fixes for PNG don't seem to plague any other formats in the wild. If it's possible, and if it doesn't break anything, perhaps it's safer to build and link zlib-ng only for autobuilt PNG? Or does that present nightmares for OIIO?

Maybe the best solution of all involves convincing the png folks to drop zlib entirely in favor of libdeflate, if that's even possible?

@lgritz
Copy link
Collaborator

lgritz commented Feb 28, 2025

What's the relationship/difference between minizip-ng and zlib-ng?

@zachlewis
Copy link
Collaborator Author

What's the relationship/difference between minizip-ng and zlib-ng?

Great question. The two are complimentary libraries -- zlib handles low-level compression and decompression needs via the DEFLATE algorithm, and is a dependency of minizip. Minzip provides higher-level functionality pertaining to zip archives, including encryption and support for other compression schemes. Both minizip-ng and zlib-ng are maintained by the same person. Minizip-ng can use either zlib or zlib-ng.

For OCIO, we use minizip-ng to support .ocioz archives, which are essentially zip files of OCIO configs bundled with the color transform data (e.g. LUTs) the configs reference. OCIOZ archives can be read directly the same way as regular OCIO files. They're awesome.

@zachlewis
Copy link
Collaborator Author

Okay, so, apparently it's at least theoretically possible to get our PNG plugin to use libdeflate on a per-scanline basis, ultimately bypassing libpng's use of zlib entirely. It doesn't seem like it's been done before anywhere (that I could find)... it's possible we'd see performance gains not found in other PNG implementations. On the other hand, actually implementing something like this is far beyond anything I'd know how to do, and it sounds like maybe more effort than its worth; especially if we're happy with just linking zlib-ng as part of the windows wheels builds.

If we want to keep the zlib-ng autobuilding stuff, maybe we only allow ENABLE_ZLIBNG to self-build zlib-ng instead of zlib if and only if OpenImageIO_BUILD_LOCAL_DEPS=all or something -- i.e., only introduce zlib-ng into the mix if the OIIO build system is attempting to build all dependencies.

I remain pretty uncomfortable with the idea of OIIO's build system making it particularly easy to introduce zlib symbol shenanigans down the line.

@lgritz
Copy link
Collaborator

lgritz commented Mar 7, 2025

OK, I understand. minizip-ng is for manipulating zip files, which is not something we need in OIIO. We just want zlib-ng, strictly to directly replace use of zlib in OIIO and, with some luck, its dependencies, with something having an identical API but better performance.

Lets proceed with that plan, then.

I think as a later step, it's worth at some point exploring libdeflate, and we can decide if it gives enough additional performance over zlib-ng to warrant using it where we can, despite having to change the call points for its different API.

For OIIO's use of compression explicitly, I'm not really concerned with libdeflate's API being different -- our direct use of zlib's API is very narrow, so changing is not a big deal. One "con" of libdeflate is that it would only improve places where we explicitly use it, and would not give any improvement to libPNG or other dependencies that are trying to link against the zlib symbols. (If zlib-ng or libdeflate are substantially better, I would like to hope that libPNG or others would switch to them, but that's another story.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants