New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
threads_win: fix build error with VS2010 #24326
base: master
Are you sure you want to change the base?
Conversation
8a2db04
to
42bac3d
Compare
42bac3d
to
a971c66
Compare
I submitted signed CLA over e-mail. |
@@ -569,6 +576,44 @@ int CRYPTO_atomic_add(int *val, int amount, int *ret, CRYPTO_RWLOCK *lock) | |||
return 1; | |||
} | |||
|
|||
int CRYPTO_atomic_add64(uint64_t *val, uint64_t op, uint64_t *ret, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to run make update to get this in the exported ordinals list, its why CI is failing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I was wondering what this error means.
That ancient piece of history Windows 2003 is using nmake
to compile, so I ran the command on macOS instead. I see the two new functions were added to util/libcrypto.num
. I hope that's good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. The CI test for symbol presence is failing, need to look at why.
also, if you're going to add new exported library symbols, the docs check is going to fail if you don't document them in the appropriate pod file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc/man3/CRYPTO_THREAD_run_once.pod
contains all related function, so I added the description there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building on Windows works only with nmake
. I do have GNU Make 4.4.1
, but both fail to perform the update
target. So my options are macOS 14.4.1 Intel, and Ubuntu 22.04.4 LTS.
nmake update
Microsoft (R) Program Maintenance Utility Version 10.00.40219.01
Copyright (C) Microsoft Corporation. All rights reserved.
NMAKE : fatal error U1073: don't know how to make 'update'
Stop.
make update
makefile:78: *** missing separator. Stop.
I also noticed that apart from threads_win.c
, the atomic functions are also defined in threads_pthread.c
and threads_none.c
. Do I need to implement them there as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe I need to add them to ./test/threadstest.c
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do need to add the definitions to threads_pthread.c and threads_none.c, as only one of the three (win/pthreads/none) gets compiled for the build and you need a definition for the exported symbols
As for why make update isn't working. I expect its got something to do with the version of nmake on vs2010. Its worked for me on VS2022 community edition. note VS2010 isn't an officially supported platform
I'd focus on getting the implementations defined on other platforms first, then worry about being able to make the update script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added definitions to threads_pthread.c
and threads_none.c
. Added tests to threadstest.c
.
whats the output of:
with this change? I'd like to make sure this isn't going to create a performance impact to the rcu locks on windows |
a971c66
to
6231612
Compare
@@ -569,6 +576,44 @@ int CRYPTO_atomic_add(int *val, int amount, int *ret, CRYPTO_RWLOCK *lock) | |||
return 1; | |||
} | |||
|
|||
int CRYPTO_atomic_add64(uint64_t *val, uint64_t op, uint64_t *ret, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. The CI test for symbol presence is failing, need to look at why.
also, if you're going to add new exported library symbols, the docs check is going to fail if you don't document them in the appropriate pod file
92cc5b0
to
344d9a4
Compare
344d9a4
to
b5ddf2e
Compare
34f90ac
to
896cb61
Compare
IF EXIST test\quic_multistream_test.exe.manifest DEL /F /Q test\quic_multistream_test.exe.manifest
"link" /nologo /debug setargv.obj /subsystem:console /opt:ref /nologo /debug @V:\_tmp\nm4.tmp
quic_multistream_test-bin-quic_multistream_test.obj : error LNK2019: unresolved external symbol _snprintf referenced in function _helper_init
test\quic_multistream_test.exe : fatal error LNK1120: 1 unresolved externals
NMAKE : fatal error U1077: '"E:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\link.EXE"' : return code '0x460'
I agree, performance is an important factor. One approach would be to use macros, so that only VS2010 x86 target is affected. I didn't do it this way, because I tried to follow existing patterns. We can also apply this to other atomic functions (that are already replaced) if you find it necessary. Note: macOS test ran on MacBook Pro 2019 Intel, the rest of the tests ran on Intel i7 930. |
Something isn't right, your tests on windows 11 show that RCU lock performance is about 1/2 of corresponding rw lock performance. They should.be about equal for the light contention case and RCU should do much better in the heavy contention case. |
Any idea why? I see the same machine on Linux matches the behaviour you mentioned.
The computer is from 2010 i7 930 4 cores+HT. I ran that test again to compare performance before and after the patches on the same machine, while idle. RCU performance is again half of RW.
# torture_rw_low
- performed 47544198 reads and 6 writes over 2 read and 2 write threads in 4.056000e+00 seconds
+ performed 48842838 reads and 6 writes over 2 read and 2 write threads in 4.048000e+00 seconds
- Average read time 8.531009e-08/read
+ Average read time 8.287807e-08/read
- Averate write time 6.760000e-01/write
+ Averate write time 6.746667e-01/write
# torture_rw_high
- performed 41121498 reads and 654482 writes over 2 read and 2 write threads in 3.589000e+00 seconds
+ performed 43130985 reads and 795234 writes over 2 read and 2 write threads in 3.920000e+00 seconds
- Average read time 8.727795e-08/read
+ Average read time 9.088594e-08/read
- Averate write time 5.483726e-06/write
+ Averate write time 4.929367e-06/write
# torture_rcu_low
- performed 24620885 reads and 6 writes over 2 read and 2 write threads in 4.044000e+00 seconds
+ performed 25965189 reads and 6 writes over 2 read and 2 write threads in 4.055000e+00 seconds
- Average read time 1.642508e-07/read
+ Average read time 1.561706e-07/read
- Average write time 6.740000e-01/write
+ Average write time 6.758333e-01/write
# torture_rcu_high
- performed 21784861 reads and 1760318 writes over 2 read and 2 write threads in 3.956000e+00 seconds
+ performed 25114008 reads and 1759286 writes over 2 read and 2 write threads in 3.945000e+00 seconds
- Average read time 1.815940e-07/read
+ Average read time 1.570836e-07/read
- Average write time 2.247321e-06/write
+ Average write time 2.242387e-06/write
# Linux
# torture_rw_low
performed 29389229 reads and 6 writes over 2 read and 2 write threads in 4.001009e+00 seconds
Average read time 1.361386e-07/read
Averate write time 6.668348e-01/write
# torture_rw_high
performed 8561309 reads and 336462 writes over 2 read and 2 write threads in 3.367428e+00 seconds
Average read time 3.933310e-07/read
Averate write time 1.000835e-05/write
# torture_rcu_low
performed 26298285 reads and 6 writes over 2 read and 2 write threads in 4.000877e+00 seconds
Average read time 1.521345e-07/read
Average write time 6.668128e-01/write
# torture_rcu_high
performed 34343847 reads and 2291988 writes over 2 read and 2 write threads in 3.998745e+00 seconds
Average read time 1.164326e-07/read
Average write time 1.744662e-06/write Good news: Windows 2003 + VS2010 pass
Note: the IPv6 network stack was not installed. If I install it, that test passes, but later two other fail I ran all tests on other platforms, here are the results. Note that on Windows, x64 and x86 tests were ran at the same time.
quic_multistream_test: fix undefined symbol snprintf with VS2010 diff --git a/test/quic_multistream_test.c b/test/quic_multistream_test.c
index e01e8ae85a..8437d879f9 100644
--- a/test/quic_multistream_test.c
+++ b/test/quic_multistream_test.c
@@ -23,6 +23,12 @@
#endif
#include "internal/numbers.h" /* UINT64_C */
+#if (defined(_MSC_VER) && _MSC_VER <= 1600)
+# define SNPRINTF_FN sprintf_s
+#else
+# define SNPRINTF_FN snprintf
+#endif
+
static const char *certfile, *keyfile;
#if defined(OPENSSL_THREADS)
@@ -784,7 +790,7 @@ static int helper_init(struct helper *h, const char *script_name,
goto err;
/* Set title for qlog purposes. */
- snprintf(title, sizeof(title), "quic_multistream_test: %s", script_name);
+ SNPRINTF_FN(title, sizeof(title), "quic_multistream_test: %s", script_name);
if (!TEST_true(ossl_quic_set_diag_title(h->c_ctx, title)))
goto err;
@@ -5827,7 +5833,7 @@ static int test_script(int idx)
}
#endif
- snprintf(script_name, sizeof(script_name), "script %d", script_idx + 1);
+ SNPRINTF_FN(script_name, sizeof(script_name), "script %d", script_idx + 1);
TEST_info("Running script %d (order=%d, blocking=%d)", script_idx + 1,
free_order, blocking);
@@ -5912,7 +5918,7 @@ static ossl_unused int test_dyn_frame_types(int idx)
s[i].arg2 = forbidden_frame_types[idx].expected_err;
}
- snprintf(script_name, sizeof(script_name),
+ SNPRINTF_FN(script_name, sizeof(script_name),
"dyn script %d", idx);
return run_script(dyn_frame_types_script, script_name, 0, 0);
--
2.44.0 I noticed that existing atomic functions in Please let me know if I need to change anything else. What about the |
@httpstorm please check #24109 , I have built openssl-3.3.0 with that patch in vs2010 x86 winxp-sp3-x86. The tests results:
|
@viruscamp |
crypto/threads_win.c
Outdated
@@ -168,17 +184,23 @@ void ossl_rcu_lock_free(CRYPTO_RCU_LOCK *lock) | |||
OPENSSL_free(lock); | |||
} | |||
|
|||
static inline struct rcu_qp *get_hold_current_qp(CRYPTO_RCU_LOCK *lock) | |||
static INLINE_VS struct rcu_qp *get_hold_current_qp(CRYPTO_RCU_LOCK *lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have ossl_inline
please use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I changed it. Test build successful on Windows 2003, VS2010.
crypto/threads_win.c
Outdated
/* | ||
* VC++ 2010 or earlier compilers do not support static inline. | ||
* To work around this problem, we can use a macro. | ||
*/ | ||
|
||
#if (defined(_MSC_VER) && _MSC_VER <= 1600) | ||
# define INLINE_VS | ||
#else | ||
# define INLINE_VS inline | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please drop this. We have ossl_inline
already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped. Now using ossl_inline
.
no unfortunately, but something definately seems off |
8a9652f
to
b88b52e
Compare
b88b52e
to
45bd2fc
Compare
Updated:
Tests: https://httpstorm.com/share/.ssl/ Now about the ASM lockless implementations of atomic functions: I can either
|
45bd2fc
to
ab4b0bf
Compare
I'm becoming involved with this at a late stage, so don't want to be very demanding, but it seems that this PR does three different things:
While all 3 are needed to get OpenSSL building with older versions of Visual Studio, they feel like quite separate things. I personally think that 3 separate PRs would mean (1) and (2) could be merged very quickly, while we consider the best approach for the missing lockless functions. If other people would rather this stays as one PR that's fair enough; splitting this might get things merged sooner |
I would also support splitting this as 1 and 2 would be eligible for cherry pick to stable branches but 3 does not really look like so. |
As snprintf is not available everywhere, use BIO_snprintf instead. Fixes: IF EXIST test\quic_multistream_test.exe.manifest DEL /F /Q test\quic_multistream_test.exe.manifest "link" /nologo /debug setargv.obj /subsystem:console /opt:ref /nologo /debug @v:\_tmp\nm4.tmp quic_multistream_test-bin-quic_multistream_test.obj : error LNK2019: unresolved external symbol _snprintf referenced in function _helper_init test\quic_multistream_test.exe : fatal error LNK1120: 1 unresolved externals NMAKE : fatal error U1077: '"E:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\link.EXE"' : return code '0x460' Signed-off-by: Georgi Valkov <[email protected]>
VC 2010 or earlier compilers do not support static inline. To work around this problem, we can use the ossl_inline macro. Fixes: crypto\threads_win.c(171) : error C2054: expected '(' to follow 'inline' crypto\threads_win.c(172) : error C2085: 'get_hold_current_qp' : not in formal parameter list crypto\threads_win.c(172) : error C2143: syntax error : missing ';' before '{' crypto\threads_win.c(228) : warning C4013: 'get_hold_current_qp' undefined; assuming extern returning int crypto\threads_win.c(228) : warning C4047: '=' : 'rcu_qp *' differs in levels of indirection from 'int' Signed-off-by: Georgi Valkov <[email protected]>
InterlockedAnd64 and InterlockedAdd64 are not available on VS2010 x86. We already have implemented replacements for other functions, such as InterlockedOr64. Apply the same approach to fix the errors. A CRYPTO_RWLOCK rw_lock is added to rcu_lock_st. Replace InterlockedOr64 and InterlockedOr with CRYPTO_atomic_load and CRYPTO_atomic_load_int, using the existing design pattern. Add documentation and tests for the new atomic functions CRYPTO_atomic_add64, CRYPTO_atomic_and Fixes: libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedAdd64 referenced in function _get_hold_current_qp libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedOr referenced in function _get_hold_current_qp libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedAnd64 referenced in function _update_qp libcrypto.lib(libcrypto-lib-threads_win.obj) : error LNK2019: unresolved external symbol _InterlockedOr64 referenced in function _ossl_synchronize_rcu Signed-off-by: Georgi Valkov <[email protected]>
ab4b0bf
to
e8580f1
Compare
@tom-cosgrove-arm @t8m @nhorman @viruscamp
I took time to analyse the ASM code from #24109 . It is better optimised than what Visual Studio 2019 would generate for a x86 release (I stepped it in the debugger). The code reads
Edit: add link to part 3/3 |
As
snprintf
is not available everywhere, useBIO_snprintf
instead.Fixes:
VC 2010 or earlier compilers do not support
static inline
. To work around this problem, we can use theossl_inline
macro.Fixes:
InterlockedAnd64
andInterlockedAdd64
are not available on VS2010 x86. We already have implemented replacements for other functions, such asInterlockedOr64
. Apply the same approach to fix the errors. ACRYPTO_RWLOCK rw_lock
is added torcu_lock_st
.Note that the return values of the replacement atomic functions for VS2010 x86 are not checked, because I did not know how to respond to these errors in each location where they are used. This might happen if the lock is not initialised yet. Though the lock is initialised in the beginning when each object is constructed, so this should not occur. Please let me know if any changes are needed.
Fixes:
Build and run tested: Windows 2003 x64, Windows 2003 x86, Visual Studio 2010.
@t8m @levitte @paulidale @nhorman @mattcaswell @hlandau
Checklist