Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: detect inline keyword #13355

Closed
wants to merge 3 commits into from
Closed

Conversation

Karlson2k
Copy link
Contributor

As I discovered during work on #12897, libcurl does not support use of inline keyword because libcurl can be built with unusual combination of old compiler and cmake.
When built by old compiler and autotools (configure), inline macro is defined to a suitable replacement (or empty string) if need.
The missing part is cmake checks for inline support.
This PR adds support for cmake detection of inline.
As a fallback path, inline compiler-time detection is implemented in curl_setup.h, so even if build system does not check for inline support for any reason the header will substitute a suitable replacement.

With this PR merged, libcurl and curl code can freely use inline keyword.

@Karlson2k Karlson2k force-pushed the better_inline branch 2 times, most recently from 73ff684 to f6086a6 Compare April 12, 2024 00:00
@bagder
Copy link
Member

bagder commented Apr 12, 2024

The configure script already checks for inline since 375cdf8. If not working, it adds a #define inline in lib/curl_config.h (ie replacing it with nothing). Maybe it would make sense to make the cmake check and logic mimic this?

@bagder bagder changed the title Support inline keyword in libcurl/curl cmake: support the inline keyword Apr 12, 2024
@bagder bagder added the cmake label Apr 12, 2024
@Karlson2k
Copy link
Contributor Author

This is exactly what is implemented in this PR.
Plus workarounds if neither cmake nor configure is used

@bagder
Copy link
Member

bagder commented Apr 12, 2024

This is exactly what is implemented in this PR.

Except:

Plus workarounds if neither cmake nor configure is used

... which I think you should drop.

We already use inline in libcurl code (see lib/easy_lock.h) and it works the way configure does it. It think you should limit this approach to only be for cmake and leave other build systems to handle the case themselves.

@vszakats
Copy link
Member

vszakats commented Apr 12, 2024

What is the advantage of the CMake-level check compared to the detection done in C?

With this patch, curl would have 4 inline detections: in autotools (built-in), CMake, curl_setup.h and curl_sha512_256.c.

@vszakats
Copy link
Member

vszakats commented Apr 13, 2024

I propose to move the inline feature check to C, by using this snippet:

#ifndef CURL_INLINE
#if defined(__GNUC__) && (__GNUC__ >= 3)
#define CURL_INLINE __inline__
#elif defined(_MSC_VER) && (_MSC_VER >= 1400)
#define CURL_INLINE __inline
#elif (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)) || \
  (defined(__cplusplus) && (__cplusplus >= 199711L))
#define CURL_INLINE inline
#else
#define CURL_INLINE
#endif
#endif

This covers the vast majority of compilers without the complexity (and build performance drop) of this PR.
It also makes the code behave the same for all build systems. And it also allows customizing this by the builder via `-DCURL_INLINE=....

According to https://github.com/nemequ/hedley/blob/8fb0604a8095f6c907378cc3f0391520ae843f6f/hedley.h#L1528-L1553 (logic also re-used by Brotli) there exists a few further exotic compilers, but I suggest explicitly adding them when curl actually supports those, and if a specific need arises.

It needs to adapt this inline use in curl, like so (plus applying the macro rename to the force inline logic above):

--- curl_sha512_256.c-ORI
+++ curl_sha512_256.c
@@ -338,7 +338,7 @@
 /* Defined as a function. The macro version may duplicate the binary code
  * size as each argument is used twice, so if any calculation is used
  * as an argument, the calculation could be done twice. */
-static MHDX_INLINE curl_uint64_t
+static CURL_INLINE curl_uint64_t
 MHDx_rotr64(curl_uint64_t value, unsigned int bits)
 {
   bits %= 64;

And to handle two previous uses in easy_lock.h:

--- easy_lock.h-ORI
+++ easy_lock.h
@@ -69,7 +69,7 @@
 
 #endif
 
-static inline void curl_simple_lock_lock(curl_simple_lock *lock)
+static CURL_INLINE void curl_simple_lock_lock(curl_simple_lock *lock)
 {
   for(;;) {
     if(!atomic_exchange_explicit(lock, true, memory_order_acquire))
@@ -88,7 +88,7 @@
   }
 }
 
-static inline void curl_simple_lock_unlock(curl_simple_lock *lock)
+static CURL_INLINE void curl_simple_lock_unlock(curl_simple_lock *lock)
 {
   atomic_store_explicit(lock, false, memory_order_release);
 }

@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 13, 2024

Plus workarounds if neither cmake nor configure is used

... which I think you should drop.

We already use inline in libcurl code (see lib/easy_lock.h) and it works the way configure does it. It think you should limit this approach to only be for cmake and leave other build systems to handle the case themselves.

Right. The only inline is in easy_lock.h, which looks like never compiled with MSVC. MSVC is the only old compiler checked with cmake.
If CI would had runners with old compilers AND cmake, they would fail on easy_lock.h.
I think this is the reason why inline is not used in curl code: every attempt to use inline faced errors from cmake + old MSVC builds and the easiest solution was just to drop inline so nobody have used it in PRs and patches.

It looks like libcurl and curl still could be built without autotools and cmake. See /plan9, /winbuild, /packages/vms, /packages/OS400. So if any external build system would be used, the code always would have fallback that should prevent compiler from failing.

What is the advantage of the CMake-level check compared to the detection done in C?

It's easy: C macro detection is limited and imprecise, to be on the safe side it prefers to drop inline when it is not possible to precisely understand what is supported as there is no fallback path. As the result, for some of the compilers supporting inline (in one form or another), the inline will not be used (if only C-macro checks are used).

CMake detection always gives the precise result. Moreover, when standard inline is not supported cmake detects supported alternatives and substitute them if needed. The cmake detection does not rely on any kind of heuristic guessing.
Just like configure counterpart, which provides precise result as well.

With this patch, curl would have 4 inline detections: in autotools (built-in), CMake, curl_setup.h and curl_sha512_256.c.

Just three detections: CMake and autotools (the main build systems), curl_setup.h (in case if build system is not used for any reason). curl_sha512_256.c detects supported forms of force inline. The checks in curl_sha512_256.c could be greatly simplified, it was not include to keep the PR size minimal.
I can add it (simplify curl_sha512_256.c) for clarity.

@vszakats
Copy link
Member

Is there an evidence that the "more precise" detection in CMake/autotools has actual benefits that justifies the complexity and build-performance hit it causes?

I'd like learn which common case is missed by the proposed alternative in #13355 (comment). Once we know those, they are fairly simple to add.

If CI would had runners with old compilers AND cmake, they would fail on easy_lock.h.

Above proposal fixes that too by replacing that inline with CURL_INLINE
and using the short detection.

Unless we're talking about a genuinly old or exotic compiler hitting that
#if branch, easy_lock.h is built without issues when using CMake on
Linux. See the recently added curl-for-win / linux-musl-llvm CI job, or
dist / verify-out-of-tree-cmake (I haven't double-checked the latter).

@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 13, 2024

@vszakats If curl would care about the most common compilers, then it would be no need for any checks for inline as all common compilers support it natively nowadays.

If you want to simplify the solution, then just use

#undef inline
#define inline /* empty */

The would cover 100% of all cases, exotic or not.
Very simple.

The current proposed solution just balances autotools and cmake builds, which is desired, as I understand (at least there is a special CI runner that compare results for autotools and cmake, however it is pretty limited as the only one compiler and configuration is checked) and adds the fallback path for other used build systems.
This PR provides guaranteed result and uses all available compiler features based on the best available information. This covers any untested and not directly supported compilers even with non-standard settings (like enforced strict C89 with long long).
You do not need to care about any missed (and not checked) compiler. Everything is handled automatically in the best possible way.

@vszakats
Copy link
Member

vszakats commented Apr 13, 2024

You didn't answer the question of what is the benefit of going overboard with detecting a widely supported compiler feature using 4 complex methods, of which 2 are slow, and 2 are almost exact copies of each other, also most likely unnecessary if the CMake and autotools methods are 100% perfect as you say.

This, over a simple, and easy to customize method that covers likely 99.99% of systems out there (..and 100% with customization).

Also noting that this compiler feature is used ONCE in the whole codebase (only for non-OpenSSL builds.)

(The two pre-existing bare uses in easy_lock.h did not cause any problem or reports since they were introduced.)

To put it differently, why is curl needing this elaborate detection dance, when Brotli is happy with pure preprocessor detection, while also using inline much more for it's similarly portable and much more CPU-bound code?

@Karlson2k
Copy link
Contributor Author

You didn't answer the question of what is the benefit of going overboard with detecting a widely supported compiler feature using 4 complex methods, of which 2 are slow, and 2 are almost exact copies of each other, also most likely unnecessary if the CMake and autotools methods are 100% perfect as you say.

You didn't read my answers as all the points have been already answered.

I'll repeat for clarity:

  • autoconf detection is not touched (works as before)
  • CMake detection is added, to match autoconf version. For clarity: to make the same results with autoconf and cmake. This is the reason.
  • ONE detection in the main header is added. For clarity: it is required for builds without autotools and cmake. It is weaker then existing autotools detection and new cmake detection, so it should be used only as a last resort.
  • Another header is detecting "force inline" version, which is different from just inline. For clarity I've pushed new (larger) commit to simplify this header.

I do not see any single "complex" detection method here. All of them are quite simple, readable and some have comments for additional clarity.
There is no "copies" anymore. Actually the code from the c file is moved by this PR to the global header so all curl code can get benefits from it.

Summary: cmake and autotools now matches each other. When they are not used, the code is still compliable regardless of inline support (while inline or replacements are still used when it is safe).

Regarding "slow" detections. autotools detection is already here, for a long time. Cmake detection "slow down" the build for less than a second, while result matched autotools finally.

This, over a simple, and easy to customize method that covers likely 99.99% of systems out there (..and 100% with customization).

It is not possible (or at least not productive) to support all compilers just with macros.
As I understand, curl strives to be universally portable as much as possible, including some outdated systems. The number of compilers, compiler clones, forks is hardly countable (especially if you include old ones). Just to name a few relatively widely used, but not directly supported: tcc, Solaris Studio compilers, icc. I'm not even sure that it is possible to detect everything just by macros.

Also noting that this compiler feature is used ONCE in the whole codebase (only for non-OpenSSL builds.)

(The two pre-existing bare uses in easy_lock.h did not cause any problem or reports since they were introduced.)

To put it differently, why is curl needing this elaborate detection dance, when Brotli is happy with pure preprocessor detection, while also using inline much more for it's similarly portable and much more CPU-bound code?

One more time: autotools detection is used for more than 16 years (since 12-06-2007), cmake detection is now up to par. Plus the fallback path, with weaker detection by macros when both are not used (actually just moved from C file to the header).

I can't tell for other projects, but I can name hundreds (if not thousands) projects that use inline without any detections and fallbacks. Why curl has detection by autotools, while Brotli does not? Ask Brotli authors.

For limited usage of inline, it is a confusion between cause and effect. As I wrote ealier no code (except non-windows parts) uses inline because it throws errors in CI.
With the proper fix, inline should be used much more frequent.

@bagder
Copy link
Member

bagder commented Apr 14, 2024

With the proper fix, inline should be used much more frequent.

inline is generally of questionable value as it second guesses what the compiler can do, and should therefore be used sparsely and only if there is a proven/tested obvious benefit.

I still think this PR is overdoing it. I think the fix should be done without having to touch C code. Ie fix it for cmake, leave everything else.

@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 15, 2024

Ie fix it for cmake, leave everything else.

@bagder done, with HAVE_INLINE_CHECKED macro addition. Without this macro it is possible that the correct result (no macro for inline, natively supported by compiler) will be overridden by macros in curl_sha512_256.c with possible mapping to "empty".

I can remove it as well to further reduce the size of the PR as code will be OK, but probably suboptimal.

@Karlson2k
Copy link
Contributor Author

With the proper fix, inline should be used much more frequent.

inline is generally of questionable value as it second guesses what the compiler can do, and should therefore be used sparsely and only if there is a proven/tested obvious benefit.

inline is just a hint for the compiler. Compiler still may inline functions not marked as inline and may NOT inline functions marked with inline. Final decision is on the compiler side anyway, so it should be absolutely safe just to provide additional input as a programmer's suggestion.

@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 16, 2024

@bagder I've split the commits. The first commit is pure cmake.
You may cherry-pick it if you wish.
Or let me know if you want me to remove last two commits from this PR.

@Karlson2k
Copy link
Contributor Author

Test failures seems to be unrelated

Copy link
Member

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make the build setup deal with inline by itself, source code can assume it works. We already have code in libcurl that does this (in easy_lock.h) so adding a check here seems inconsistent/unnecessary.

configure.ac Outdated Show resolved Hide resolved
lib/curl_config.h.cmake Outdated Show resolved Hide resolved
lib/curl_sha512_256.c Outdated Show resolved Hide resolved
@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 17, 2024

@bagder Done, with further reducing the code in the curl_sha512_256.c. The code now completely relies on the build systems' detection.

@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 17, 2024

If we make the build setup deal with inline by itself, source code can assume it works. We already have code in libcurl that does this (in easy_lock.h) so adding a check here seems inconsistent/unnecessary.

As I pointed out previously, curl still supports a few other build systems. Such build systems have no checks for the inline keywords.
Probably these systems are broken already and do not work (while still not removed from the repo).

The PR was updated as requested by @bagder.

@vszakats
Copy link
Member

vszakats commented Apr 17, 2024

I believe that if detection can be reasonably made in C, we should not move and scatter/duplicate detection logic to the build level. Doing the latter has these effects:

  • needs parallel maintenance and manual sync between build-level detections. (with inevitable differences)
    (to clarify just because autotools has a built-in template for doing this, doesn't make it simple/quick and still needs syncing its logic manually with CMake.)
  • doesn't work when building without CMake or autotools.
  • slows down builds. The data here is that each feature detection in CMake takes 2-3 seconds in CI (autotools probably similar). Equivalent to adding a new curl source file to src or lib. Some of these can be optimized out for Windows with a considerable effort, but it's just easier to avoid doing it.
    The implemented CMake method uses 4 of these, worst case. This is 12 seconds. With 35 lines and custom C files compiled in each, it is complex.
  • there was no evidence that in this case a reasonable detection couldn't be done in C.
  • in fact there was evidence to the contrary, e.g. brotli.
  • there is also no evidence that for the single use of inline in curl, such costly/painful and overkill way to detect inline is justified.

I don't agree with merging this PR in its current form.

edit: I think curl should strive to keep portable C89 code and avoid doing low-level magic with hand-optimized functions, like a crypto-backend would do. When e.g. crypto perf is critical curl does best to rely on existing libraries squeezing out max performance where necessary, for those who need that, and where it fixes real bottlenecks backed by measurements. Curl already uses SHA-512/256 from OpenSSL (and forks), covering a significant portion of builds. If performance here is a critical bottleneck, adding support for more backends would seem a better long term solution. (wolfSSL supports it too)

@bagder
Copy link
Member

bagder commented Apr 18, 2024

I believe that if detection can be reasonably made in C

Having thought about it some more, I too am in favor of this (@vszakats's inline #ifdef-check in C).

@Karlson2k
Copy link
Contributor Author

Karlson2k commented Apr 18, 2024

  • slows down builds. The data here is that each feature detection in CMake takes 2-3 seconds in CI (autotools probably similar). Equivalent to adding a new curl source file to src or lib. Some of these can be optimized out for Windows with a considerable effort, but it's just easier to avoid doing it.
    The implemented CMake method uses 4 of these, worst case. This is 12 seconds. With 35 lines and custom C files compiled in each, it is complex.

Your numbers and assumptions are obviously wrong.
I made several runs of CMake configuration with clean cache under Windows.

  • With this patch: [CMake] -- Configuring done (14.5s)
  • Without this patch: [CMake] -- Configuring done (14.0s)

Stable numbers, best 5 out of 10 tries (the other five are 0.2-0.6 seconds longer, randomly).
The added delay is just 0.5 second. The whole CMake reconfiguration takes just 14 seconds.
Obviously, on modern compilers the only first case (with inline) is tried.
If the compiler is so old that it does not support any form of inline, then probably the performance is not something that the user worry about.

I also tried configure under the GNU/Linux.
Added command to print exact timestamps before and after the test.
The results:

2024-04-18 21:26:10.816365470+03:00
checking for inline... inline
2024-04-18 21:26:10.846448242+03:00
2024-04-18 21:26:25.190701707+03:00
checking for inline... inline
2024-04-18 21:26:25.237320318+03:00
2024-04-18 21:27:28.061953377+03:00
checking for inline... inline
2024-04-18 21:27:28.090844360+03:00

The "slow down" is around 0.03 seconds.

Do you really think this critical slow down needs to be optimised?

Added: numbers were taken on the laptop with 25W CPU TDP. Desktops with proper cooling may have better numbers.

Equivalent to adding a new curl source file to src or lib.

I think it's obvious that a two-lines file without any includes (the test file) is compiled much faster than a full additional C file for curl or lib.

  • there was no evidence that in this case a reasonable detection couldn't be done in C.
  • in fact there was evidence to the contrary, e.g. brotli.

Could you please explain, how the code used in some other project proves some particular needs or absence of the particular needs for this project?

Added: Could you please stop ignoring other's arguments? It was answered two times before: here and here
C-only detection would not use inline even if it is supported on pre-C99 compilers (except GCC and MSVC) or compilers in C89 mode.
Also, C-only detection would not try __inline or __inline__ on compilers other than GCC and MSVC.

@Karlson2k
Copy link
Contributor Author

@bagder If these 0.03 seconds / 0.5 seconds delays are important, and you still think that curl must use C-only detection, I'll modify this PR with removal of autotools detection, the matching lines in cmake header template and other related things.

Note: the autotools builds will be modified (by removing checks added 16 years ago), and the code will rely on C-only code.
C-only detection is less robust, especially with non-compliant compilers (that advertise C99 support, but have broken inline). Other drawbacks are mentioned already.

@bagder
Copy link
Member

bagder commented Apr 24, 2024

I prefer the C solution for these reasons:

  1. it removes complexity from configure / cmake (and the risk that they differ) and instead provides a readable version we control better ourselves
  2. it provides a single place for the logic, independent of build method
  3. it works the same for all, even for non-configure non-cmake builds

@Karlson2k
Copy link
Contributor Author

@bagder Reworked again, as you requested.

@Karlson2k
Copy link
Contributor Author

Rebased.

lib/curl_setup.h Outdated Show resolved Hide resolved
@vszakats
Copy link
Member

Regarding the CMake/autotools detection overhead: This hits Windows the most and macOS next. Linux is generally (much) faster, even when cross-compiling. After dealing with these for nearly 2 years, my numbers for sure are not "obviously wrong". They can be seen in CI runs, or any non-bleeding edge, non-Linux, local machine. Probably more with cross-builds and/or a GNU toolchain.

Either way, thanks for your updates. Aside from a minor update request, it LGTM.

@vszakats vszakats changed the title cmake: support the inline keyword build: detect inline keyword Apr 25, 2024
@vszakats vszakats added build and removed cmake labels Apr 25, 2024
@Karlson2k
Copy link
Contributor Author

Regarding the CMake/autotools detection overhead: This hits Windows the most and macOS next. Linux is generally (much) faster, even when cross-compiling. After dealing with these for nearly 2 years, my numbers for sure are not "obviously wrong". They can be seen in CI runs, or any non-bleeding edge, non-Linux, local machine. Probably more with cross-builds and/or a GNU toolchain.

After dealing with it for nearly 16 years, I can absolutely confirm that MSys2/Cygwin is the slowest detection, especially with autotools that are designed to make a lot of short compiler shots. macOS is much closer to Linux and *BSD in terms of configuration phase length.

The key message was that typically just a single form of inline was tested, not all four forms.

Either way, thanks for your updates. Aside from a minor update request, it LGTM.

Corrected as requested.

@Karlson2k
Copy link
Contributor Author

Please wait for CI to complete. Previous try had VS2022 (three different runners) failed on the same test (345), which is suspicious.

@vszakats
Copy link
Member

vszakats commented Apr 25, 2024

Test 345 has been fixed (#13458). It should be gone if you rebase on master.

I can absolutely confirm that MSys2/Cygwin is the slowest detection, especially with autotools that are designed to make a lot of short compiler shots

CMake does the same under the hood. With somewhat fewer tests in our case. "generate" takes 40 seconds on my macOS machine. (58 seconds with autotools.)

FWIW 2 years was solely dealing with curl CMake builds.

@Karlson2k
Copy link
Contributor Author

This C-header only implementation keeps failing on quiche builder all the time in pytest.
Not sure whether it is related or not.
cmake/configure versions were fine, but, again, I'm not sure that it is related.

@vszakats
Copy link
Member

vszakats commented Apr 25, 2024

The quiche issue a known one, with no final fix yet: #13439

Also macOS fell over a few hours ago: #13476. Possibly more; it seems GitHub silently started delegating ARM runners, also Xcode 15, maybe more.

@Karlson2k
Copy link
Contributor Author

OK.
Then this PR is good to go.

@bagder bagder closed this in 382717d Apr 25, 2024
@bagder
Copy link
Member

bagder commented Apr 25, 2024

Thanks!

@Karlson2k Karlson2k deleted the better_inline branch April 26, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

3 participants