Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threads_win: fix build error with VS2010 #24326

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

httpstorm
Copy link
Contributor

@httpstorm httpstorm commented May 3, 2024

As snprintf is not available everywhere, use BIO_snprintf instead.
Fixes:

        IF EXIST test\quic_multistream_test.exe.manifest DEL /F /Q test\quic_multistream_test.exe.manifest
        "link" /nologo /debug setargv.obj /subsystem:console /opt:ref  /nologo /debug @V:\_tmp\nm4.tmp
quic_multistream_test-bin-quic_multistream_test.obj : error LNK2019: unresolved external symbol _snprintf referenced in function _helper_init
test\quic_multistream_test.exe : fatal error LNK1120: 1 unresolved externals
NMAKE : fatal error U1077: '"E:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\link.EXE"' : return code '0x460'

VC 2010 or earlier compilers do not support static inline. To work around this problem, we can use the ossl_inline macro.

Fixes:

crypto\threads_win.c(171) : error C2054: expected '(' to follow 'inline'
crypto\threads_win.c(172) : error C2085: 'get_hold_current_qp' : not in formal parameter list
crypto\threads_win.c(172) : error C2143: syntax error : missing ';' before '{'
crypto\threads_win.c(228) : warning C4013: 'get_hold_current_qp' undefined; assuming extern returning int
crypto\threads_win.c(228) : warning C4047: '=' : 'rcu_qp *' differs in levels of indirection from 'int'

InterlockedAnd64 and InterlockedAdd64 are not available on VS2010 x86. We already have implemented replacements for other functions, such as InterlockedOr64. Apply the same approach to fix the errors. A CRYPTO_RWLOCK rw_lock is added to rcu_lock_st.

Note that the return values of the replacement atomic functions for VS2010 x86 are not checked, because I did not know how to respond to these errors in each location where they are used. This might happen if the lock is not initialised yet. Though the lock is initialised in the beginning when each object is constructed, so this should not occur. Please let me know if any changes are needed.

Fixes:

libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedAdd64 referenced in function _get_hold_current_qp
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedOr referenced in function _get_hold_current_qp
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedAnd64 referenced in function _update_qp
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedOr64 referenced in function _ossl_synchronize_rcu

Build and run tested: Windows 2003 x64, Windows 2003 x86, Visual Studio 2010.
@t8m @levitte @paulidale @nhorman @mattcaswell @hlandau

Checklist
  • documentation is added or updated
  • tests are added or updated

@openssl-machine openssl-machine added the hold: cla required The contributor needs to submit a license agreement label May 3, 2024
@httpstorm httpstorm marked this pull request as draft May 3, 2024 08:52
@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label May 3, 2024
@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch 2 times, most recently from 8a2db04 to 42bac3d Compare May 3, 2024 09:22
@httpstorm httpstorm marked this pull request as ready for review May 3, 2024 10:18
@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch from 42bac3d to a971c66 Compare May 3, 2024 10:19
@httpstorm
Copy link
Contributor Author

I submitted signed CLA over e-mail.

crypto/threads_win.c Show resolved Hide resolved
@@ -569,6 +576,44 @@ int CRYPTO_atomic_add(int *val, int amount, int *ret, CRYPTO_RWLOCK *lock)
return 1;
}

int CRYPTO_atomic_add64(uint64_t *val, uint64_t op, uint64_t *ret,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to run make update to get this in the exported ordinals list, its why CI is failing here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I was wondering what this error means.
That ancient piece of history Windows 2003 is using nmake to compile, so I ran the command on macOS instead. I see the two new functions were added to util/libcrypto.num. I hope that's good enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. The CI test for symbol presence is failing, need to look at why.

also, if you're going to add new exported library symbols, the docs check is going to fail if you don't document them in the appropriate pod file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc/man3/CRYPTO_THREAD_run_once.pod contains all related function, so I added the description there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on Windows works only with nmake. I do have GNU Make 4.4.1, but both fail to perform the update target. So my options are macOS 14.4.1 Intel, and Ubuntu 22.04.4 LTS.

nmake update
Microsoft (R) Program Maintenance Utility Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

NMAKE : fatal error U1073: don't know how to make 'update'
Stop.

make update
makefile:78: *** missing separator.  Stop.

I also noticed that apart from threads_win.c, the atomic functions are also defined in threads_pthread.c and threads_none.c. Do I need to implement them there as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe I need to add them to ./test/threadstest.c?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do need to add the definitions to threads_pthread.c and threads_none.c, as only one of the three (win/pthreads/none) gets compiled for the build and you need a definition for the exported symbols

As for why make update isn't working. I expect its got something to do with the version of nmake on vs2010. Its worked for me on VS2022 community edition. note VS2010 isn't an officially supported platform

I'd focus on getting the implementations defined on other platforms first, then worry about being able to make the update script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added definitions to threads_pthread.c and threads_none.c. Added tests to threadstest.c.

@nhorman
Copy link
Contributor

nhorman commented May 3, 2024

whats the output of:

make TESTS=test_threads V=1 test

with this change? I'd like to make sure this isn't going to create a performance impact to the rcu locks on windows

@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch from a971c66 to 6231612 Compare May 3, 2024 12:43
@openssl-machine openssl-machine removed the hold: cla required The contributor needs to submit a license agreement label May 3, 2024
crypto/threads_win.c Show resolved Hide resolved
@@ -569,6 +576,44 @@ int CRYPTO_atomic_add(int *val, int amount, int *ret, CRYPTO_RWLOCK *lock)
return 1;
}

int CRYPTO_atomic_add64(uint64_t *val, uint64_t op, uint64_t *ret,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. The CI test for symbol presence is failing, need to look at why.

also, if you're going to add new exported library symbols, the docs check is going to fail if you don't document them in the appropriate pod file

@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch 3 times, most recently from 92cc5b0 to 344d9a4 Compare May 3, 2024 16:36
@github-actions github-actions bot added the severity: ABI change This pull request contains ABI changes label May 3, 2024
@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch from 344d9a4 to b5ddf2e Compare May 3, 2024 18:45
@httpstorm httpstorm marked this pull request as draft May 3, 2024 19:14
@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch 7 times, most recently from 34f90ac to 896cb61 Compare May 3, 2024 23:17
@httpstorm
Copy link
Contributor Author

whats the output of:

make TESTS=test_threads V=1 test
        IF EXIST test\quic_multistream_test.exe.manifest DEL /F /Q test\quic_multistream_test.exe.manifest
        "link" /nologo /debug setargv.obj /subsystem:console /opt:ref  /nologo /debug @V:\_tmp\nm4.tmp
quic_multistream_test-bin-quic_multistream_test.obj : error LNK2019: unresolved external symbol _snprintf referenced in function _helper_init
test\quic_multistream_test.exe : fatal error LNK1120: 1 unresolved externals
NMAKE : fatal error U1077: '"E:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\link.EXE"' : return code '0x460'

with this change? I'd like to make sure this isn't going to create a performance impact to the rcu locks on windows

I agree, performance is an important factor. One approach would be to use macros, so that only VS2010 x86 target is affected. I didn't do it this way, because I tried to follow existing patterns. We can also apply this to other atomic functions (that are already replaced) if you find it necessary.

Note: macOS test ran on MacBook Pro 2019 Intel, the rest of the tests ran on Intel i7 930.
I will review the changes once again after a good sleep.

@nhorman
Copy link
Contributor

nhorman commented May 4, 2024

Something isn't right, your tests on windows 11 show that RCU lock performance is about 1/2 of corresponding rw lock performance. They should.be about equal for the light contention case and RCU should do much better in the heavy contention case.

@httpstorm
Copy link
Contributor Author

httpstorm commented May 4, 2024

@nhorman

Something isn't right, your tests on windows 11 show that RCU lock performance is about 1/2 of corresponding rw lock performance. They should.be about equal for the light contention case and RCU should do much better in the heavy contention case.

Any idea why? I see the same machine on Linux matches the behaviour you mentioned.
The Windows 11 tests from last night use configuration options I copied from the Windows 2022 runner.

perl Configure --banner=Configured enable-demos no-makedepend no-shared no-fips enable-md2 enable-rc5 enable-ssl3 enable-ssl3-method enable-weak-ssl-ciphers enable-trace enable-crypto-mdebug VC-WIN64A-masm

The computer is from 2010 i7 930 4 cores+HT. I ran that test again to compare performance before and after the patches on the same machine, while idle. RCU performance is again half of RW.
perl Configure VC-WIN64A no-shared

# torture_rw_low
- performed 47544198 reads and 6 writes over 2 read and 2 write threads in 4.056000e+00 seconds
+ performed 48842838 reads and 6 writes over 2 read and 2 write threads in 4.048000e+00 seconds
- Average read time 8.531009e-08/read
+ Average read time 8.287807e-08/read
- Averate write time 6.760000e-01/write
+ Averate write time 6.746667e-01/write

# torture_rw_high
- performed 41121498 reads and 654482 writes over 2 read and 2 write threads in 3.589000e+00 seconds
+ performed 43130985 reads and 795234 writes over 2 read and 2 write threads in 3.920000e+00 seconds
- Average read time 8.727795e-08/read
+ Average read time 9.088594e-08/read
- Averate write time 5.483726e-06/write
+ Averate write time 4.929367e-06/write

# torture_rcu_low
- performed 24620885 reads and 6 writes over 2 read and 2 write threads in 4.044000e+00 seconds
+ performed 25965189 reads and 6 writes over 2 read and 2 write threads in 4.055000e+00 seconds
- Average read time 1.642508e-07/read
+ Average read time 1.561706e-07/read
- Average write time 6.740000e-01/write
+ Average write time 6.758333e-01/write

# torture_rcu_high
- performed 21784861 reads and 1760318 writes over 2 read and 2 write threads in 3.956000e+00 seconds
+ performed 25114008 reads and 1759286 writes over 2 read and 2 write threads in 3.945000e+00 seconds
- Average read time 1.815940e-07/read
+ Average read time 1.570836e-07/read
- Average write time 2.247321e-06/write
+ Average write time 2.242387e-06/write

# Linux
# torture_rw_low
performed 29389229 reads and 6 writes over 2 read and 2 write threads in 4.001009e+00 seconds
Average read time 1.361386e-07/read
Averate write time 6.668348e-01/write

# torture_rw_high
performed 8561309 reads and 336462 writes over 2 read and 2 write threads in 3.367428e+00 seconds
Average read time 3.933310e-07/read
Averate write time 1.000835e-05/write

# torture_rcu_low
performed 26298285 reads and 6 writes over 2 read and 2 write threads in 4.000877e+00 seconds
Average read time 1.521345e-07/read
Average write time 6.668128e-01/write

# torture_rcu_high
performed 34343847 reads and 2291988 writes over 2 read and 2 write threads in 3.998745e+00 seconds
Average read time 1.164326e-07/read
Average write time 1.744662e-06/write

Good news: Windows 2003 + VS2010 pass test_threads.

Note: the IPv6 network stack was not installed. If I install it, that test passes, but later two other fail

I ran all tests on other platforms, here are the results. Note that on Windows, x64 and x86 tests were ran at the same time.

quic_multistream_test: fix undefined symbol snprintf with VS2010
Fixes building of tests during OpenSSL compile.
No regression on other platforms. The tests above are compiled with this patch.

diff --git a/test/quic_multistream_test.c b/test/quic_multistream_test.c
index e01e8ae85a..8437d879f9 100644
--- a/test/quic_multistream_test.c
+++ b/test/quic_multistream_test.c
@@ -23,6 +23,12 @@
 #endif
 #include "internal/numbers.h"  /* UINT64_C */
 
+#if (defined(_MSC_VER) && _MSC_VER <= 1600)
+# define SNPRINTF_FN sprintf_s
+#else
+# define SNPRINTF_FN snprintf
+#endif
+
 static const char *certfile, *keyfile;
 
 #if defined(OPENSSL_THREADS)
@@ -784,7 +790,7 @@ static int helper_init(struct helper *h, const char *script_name,
         goto err;
 
     /* Set title for qlog purposes. */
-    snprintf(title, sizeof(title), "quic_multistream_test: %s", script_name);
+    SNPRINTF_FN(title, sizeof(title), "quic_multistream_test: %s", script_name);
     if (!TEST_true(ossl_quic_set_diag_title(h->c_ctx, title)))
         goto err;
 
@@ -5827,7 +5833,7 @@ static int test_script(int idx)
     }
 #endif
 
-    snprintf(script_name, sizeof(script_name), "script %d", script_idx + 1);
+    SNPRINTF_FN(script_name, sizeof(script_name), "script %d", script_idx + 1);
 
     TEST_info("Running script %d (order=%d, blocking=%d)", script_idx + 1,
               free_order, blocking);
@@ -5912,7 +5918,7 @@ static ossl_unused int test_dyn_frame_types(int idx)
             s[i].arg2 = forbidden_frame_types[idx].expected_err;
         }
 
-    snprintf(script_name, sizeof(script_name),
+    SNPRINTF_FN(script_name, sizeof(script_name),
              "dyn script %d", idx);
 
     return run_script(dyn_frame_types_script, script_name, 0, 0);
-- 
2.44.0

I noticed that existing atomic functions in threads_pthread.c check for if (ret != NULL) only in one of the precompiled cases. Is this intentional or should it be corrected? I followed the same implementation in the newly added functions. Such a check is not performed in threads_none.c and threads_win.c, which is inconsistent. The documentation CRYPTO_THREAD_run_once.pod does not mention that ret is checked for NULL, only lock.

Please let me know if I need to change anything else. What about the snprintf patch?

@httpstorm httpstorm marked this pull request as ready for review May 4, 2024 17:40
@httpstorm httpstorm requested a review from nhorman May 4, 2024 17:41
@viruscamp
Copy link

@httpstorm please check #24109 , I have built openssl-3.3.0 with that patch in vs2010 x86 winxp-sp3-x86.
Another change is replacing snprintf with BIO_snprintf.

The tests results:

@httpstorm
Copy link
Contributor Author

@viruscamp
Thank you! Indeed an assembly implementation would be way faster and we won't have to make so many changes.
I am traveling for a few days. I will test the assembly patch and take care of snprintf, once I'm back.

@@ -168,17 +184,23 @@ void ossl_rcu_lock_free(CRYPTO_RCU_LOCK *lock)
OPENSSL_free(lock);
}

static inline struct rcu_qp *get_hold_current_qp(CRYPTO_RCU_LOCK *lock)
static INLINE_VS struct rcu_qp *get_hold_current_qp(CRYPTO_RCU_LOCK *lock)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have ossl_inline please use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I changed it. Test build successful on Windows 2003, VS2010.

Comment on lines 30 to 40
/*
* VC++ 2010 or earlier compilers do not support static inline.
* To work around this problem, we can use a macro.
*/

#if (defined(_MSC_VER) && _MSC_VER <= 1600)
# define INLINE_VS
#else
# define INLINE_VS inline
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop this. We have ossl_inline already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped. Now using ossl_inline.

@nhorman
Copy link
Contributor

nhorman commented May 6, 2024

no unfortunately, but something definately seems off

@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch 2 times, most recently from 8a9652f to b88b52e Compare May 8, 2024 07:02
@httpstorm httpstorm force-pushed the threads_win-vs2010.2024-05-03 branch from b88b52e to 45bd2fc Compare May 8, 2024 09:47
@httpstorm
Copy link
Contributor Author

Updated:

  • use BIO_snprintf in quic_multistream_test
  • use ossl_inline in threads_win.c

Tests: https://httpstorm.com/share/.ssl/

Now about the ASM lockless implementations of atomic functions: I can either

  1. replace the last patch with them
  2. integrate them in the replacement function, only those I added
  3. integrate them in the replacement function, all
  4. do nothing

@tom-cosgrove-arm
Copy link
Contributor

I'm becoming involved with this at a late stage, so don't want to be very demanding, but it seems that this PR does three different things:

  1. use BIO_snprintf() consistently, since not all platforms have snprintf;
  2. use ossl_inline consistently, since not all compilers support inline;
  3. deal with InterlockedAnd64() and InterlockedAdd64() not being available on older versions of Visual Studio.

While all 3 are needed to get OpenSSL building with older versions of Visual Studio, they feel like quite separate things.

I personally think that 3 separate PRs would mean (1) and (2) could be merged very quickly, while we consider the best approach for the missing lockless functions.

If other people would rather this stays as one PR that's fair enough; splitting this might get things merged sooner

@t8m t8m added branch: master Merge to master branch triaged: bug The issue/pr is/fixes a bug triaged: feature The issue/pr requests/adds a feature labels May 10, 2024
@t8m
Copy link
Member

t8m commented May 10, 2024

I would also support splitting this as 1 and 2 would be eligible for cherry pick to stable branches but 3 does not really look like so.

As snprintf is not available everywhere, use BIO_snprintf instead.

Fixes:
        IF EXIST test\quic_multistream_test.exe.manifest DEL /F /Q test\quic_multistream_test.exe.manifest
        "link" /nologo /debug setargv.obj /subsystem:console /opt:ref  /nologo /debug @v:\_tmp\nm4.tmp
quic_multistream_test-bin-quic_multistream_test.obj : error LNK2019: unresolved external symbol _snprintf referenced in function _helper_init
test\quic_multistream_test.exe : fatal error LNK1120: 1 unresolved externals
NMAKE : fatal error U1077: '"E:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\link.EXE"' : return code '0x460'

Signed-off-by: Georgi Valkov <[email protected]>
VC 2010 or earlier compilers do not support static inline.
To work around this problem, we can use the ossl_inline macro.

Fixes:
crypto\threads_win.c(171) : error C2054: expected '(' to follow 'inline'
crypto\threads_win.c(172) : error C2085: 'get_hold_current_qp' : not in formal parameter list
crypto\threads_win.c(172) : error C2143: syntax error : missing ';' before '{'
crypto\threads_win.c(228) : warning C4013: 'get_hold_current_qp' undefined; assuming extern returning int
crypto\threads_win.c(228) : warning C4047: '=' : 'rcu_qp *' differs in levels of indirection from 'int'

Signed-off-by: Georgi Valkov <[email protected]>
InterlockedAnd64 and InterlockedAdd64 are not available on VS2010 x86.
We already have implemented replacements for other functions, such as
InterlockedOr64. Apply the same approach to fix the errors.
A CRYPTO_RWLOCK rw_lock is added to rcu_lock_st.

Replace InterlockedOr64 and InterlockedOr with CRYPTO_atomic_load and
CRYPTO_atomic_load_int, using the existing design pattern.

Add documentation and tests for the new atomic functions
CRYPTO_atomic_add64, CRYPTO_atomic_and

Fixes:
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedAdd64 referenced in function _get_hold_current_qp
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedOr referenced in function _get_hold_current_qp
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedAnd64 referenced in function _update_qp
libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedOr64 referenced in function _ossl_synchronize_rcu

Signed-off-by: Georgi Valkov <[email protected]>
@httpstorm
Copy link
Contributor Author

httpstorm commented May 11, 2024

@tom-cosgrove-arm @t8m @nhorman @viruscamp

I took time to analyse the ASM code from #24109 . It is better optimised than what Visual Studio 2019 would generate for a x86 release (I stepped it in the debugger). The code reads _Ptr, adds _Val then performs atomic compare exchange:

  • if the original value at _Ptr did not change, the new value is written and the function completes
  • if the original value at _Ptr did change, it starts over

Edit: add link to part 3/3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch: master Merge to master branch severity: ABI change This pull request contains ABI changes severity: fips change The pull request changes FIPS provider sources triaged: bug The issue/pr is/fixes a bug triaged: feature The issue/pr requests/adds a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants