[osx] libcxx v18.1.8: Failure with ILLEGAL instruction at runtime on dependent project #162

Closed
jjerphan opened this issue Jul 12, 2024 · 33 comments · Fixed by #173

Comments

@jjerphan
Member

First observed in ArcticDB, here. Logs for posterity:

The following tests FAILED:
	  1 - Async.SinkBasic (ILLEGAL)
	  2 - Async.DeDupTest (ILLEGAL)
	 23 - Segment.RoundtripTimeseriesDescriptorWriteToBufferV2 (ILLEGAL)
	 27 - SegmentHeader.SerializeUnserializeV1 (ILLEGAL)
	323 - SparseTestStore.Compact (ILLEGAL)
	324 - SparseTestStore.CompactWithStrings (ILLEGAL)
	388 - VersionStoreTest.CompactIncompleteDynamicSchema (ILLEGAL)
	408 - TestEmbedded/SimpleTestSuite.Example/lmdb  # GetParam() = 24-byte object <08-6C 6D-64 62-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00> (ILLEGAL)
	412 - AllStoragesCommonTests/GenericStorageTest.WriteDuplicateKeyException/(ptr = 0x6000023e8018, value = 8-byte object <90-1F 99-0A 01-00 00-00>) (ILLEGAL)

Also observed on the feedstock with conda-forge/arcticdb-feedstock#249.

Currently solved by using libcxx<18 (see man-group/ArcticDB#1680).

We need to come up with a reproducer for it.

@h-vetinari
Member

Thanks for the issue. Gah, libcxx 18 is a handful... 😑

We need to come up with a reproducer for it.

If it's a small reproducer we can put it in a test here, if it's larger it would be great if it could be run in the test: section of a feedstock, because then we could add it as a downstreams: here (which doesn't rebuild the feedstock, it just runs the test suite of that other feedstock while installing the local libcxx).

@jjerphan
Member Author

jjerphan commented Jul 12, 2024

libcxx-18.1.8-hef8daea_0.conda is the problematic build. Should we declare it as broken for now?

PS: For now I have opened conda-forge/admin-requests#1028 as a proposal. Feel free to approve it or not.

@h-vetinari
Member

libcxx-18.1.8-hef8daea_0.conda is the problematic build. Should we declare it as broken for now?

The package was already downloaded a couple thousand times, and probably shows up in various constraints already. Given that arcticdb is so far (🤞) the only project where problems occurred, I'd prefer not to mark things as broken, unless the blast radius turns out to be bigger than a handful of feedstocks.

@h-vetinari
Member

So I tried the hypothesis that this had something to do with not shipping an up-to-date libunwind in conda-forge/arcticdb-feedstock#251, but the failure is the same both with the vanilla 18.1.8, as well as the modified version from #163. So from that angle, it looks like the failure comes from somewhere else.

Both fail with:

python/tests/integration/arcticdb/test_arctic.py::test_write_metadata_with_none[real_s3-0] SKIPPED [  1%]
python/tests/integration/arcticdb/test_arctic.py::test_write_metadata_with_none[real_s3-1] SKIPPED [  1%]
Fatal Python error: Illegal instruction

Current thread 0x0000000118f3a600 (most recent call first):
  File "$PREFIX/lib/python3.11/site-packages/arcticdb/version_store/_store.py", line 1926 in compact_incomplete
  File "$PREFIX/lib/python3.11/site-packages/arcticdb/version_store/library.py", line 925 in finalize_staged_data
  File "/Users/runner/miniforge3/conda-bld/arcticdb_1720947707758/test_tmp/python/tests/integration/arcticdb/test_arctic.py", line 229 in test_staged_data
  File "$PREFIX/lib/python3.11/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "$PREFIX/lib/python3.11/site-packages/_pytest/python.py", line 1632 in runtest
  File "$PREFIX/lib/python3.11/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "$PREFIX/lib/python3.11/site-packages/_pytest/runner.py", line 241 in <lambda>
  File "$PREFIX/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "$PREFIX/lib/python3.11/site-packages/_pytest/runner.py", line 240 in call_and_report
  File "$PREFIX/lib/python3.11/site-packages/_pytest/runner.py", line 135 in runtestprotocol
  File "$PREFIX/lib/python3.11/site-packages/pytest_rerunfailures.py", line 549 in pytest_runtest_protocol
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "$PREFIX/lib/python3.11/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "$PREFIX/lib/python3.11/site-packages/_pytest/main.py", line 339 in _main
  File "$PREFIX/lib/python3.11/site-packages/_pytest/main.py", line 285 in wrap_session
  File "$PREFIX/lib/python3.11/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "$PREFIX/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "$PREFIX/lib/python3.11/site-packages/_pytest/config/__init__.py", line 178 in main
  File "$PREFIX/lib/python3.11/site-packages/_pytest/config/__init__.py", line 206 in console_main
  File "$PREFIX/bin/pytest", line 10 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, _brotli, zstandard.backend_c, google.protobuf.pyext._message, msgpack._cmsgpack, yaml._yaml, bson._cbson, pymongo._cmessage, _cffi_backend (total: 61)
/Users/runner/miniforge3/conda-bld/arcticdb_1720947707758/test_tmp/run_test.sh: line 9: 14459 Illegal instruction: 4  ARCTICDB_RAND_SEED=$RANDOM pytest python/tests --verbose --reruns 5 --reruns-delay 1 -k "not Azure and not azurite and not test_lmdb_warnings_when_reopened and not test_force_compact_symbol_list_lock_held_past_ttl and not test_write_non_timestamp_index and not test_write_not_sorted_multi_index_exception and not test_map_size_bad_input and not test_map_size_parsing and not lmdb"
WARNING: Tests failed for arcticdb-4.5.0rc1-py311hbadd9f8_1.conda - moving package to /Users/runner/miniforge3/conda-bld/broken
TESTS FAILED: arcticdb-4.5.0rc1-py311hbadd9f8_1.conda

@jdblischak
Member

We also ran into this unexpected Illegal instruction error for our osx-64 build in TileDB-Inc/tiledb-vcf-feedstock#125 (comment)

@h-vetinari
Member

Can someone try to attach a debugger and get a better stacktrace than just "illegal instruction"?

In the meantime, I'll try to build a version targeting macOS >=12.0 for the libcxx_dev label, to see if the deployment target plays a role somehow.

@h-vetinari
Member

Also no dice in conda-forge/arcticdb-feedstock#251 with the builds from #164.

@jjerphan @jdblischak, could you check the list of ABI changes in libcxx 18 to see if anything stands out at you?

The first bullet point related to exceptions shouldn't affect arcticdb or our builds here (AFAICT from a code search), but I'm now starting to wonder about one of the oldest patches we're carrying, as it seems to do something related to this.

@isuruf
Member

isuruf commented Jul 17, 2024

Does the error happen when libcxx is updated in an existing environment or is it only when arcticdb is built with libcxx=18 ?

@jjerphan
Member Author

jjerphan commented Jul 17, 2024

I would say both, a priori, but I only know for sure that it fails when ArcticDB is built against it (and thus also uses it at runtime).

For this point, how about continuing the investigation and discussion on conda-forge/arcticdb-feedstock#253?

@h-vetinari
Member

h-vetinari commented Jul 17, 2024

Does the error happen when libcxx is updated in an existing environment or is it only when arcticdb is built with libcxx=18?

I tested this hypothesis in conda-forge/arcticdb-feedstock#251 (see CI for conda-forge/arcticdb-feedstock@3b8d221) and the problem only arises when building against libcxx 18. If we build with libcxx 17 and then run with libcxx 18, things work.

@isuruf
Member

isuruf commented Jul 18, 2024

Looks like arcticdb links to libc++abi.dylib, which I think is neither necessary nor supported. The linking order differs between libc++ 18 and 17, which might be the cause.

@h-vetinari
Member

So, just for debugging purposes (not proposing we do this), I shipped libc++abi.dylib in #166. Makes no difference though... 🤷

@h-vetinari
Member

To Isuru's point:

   INFO (arcticdb,lib/python3.10/site-packages/arcticdb_ext.cpython-310-darwin.so): Needed DSO /Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk/usr/lib/libc++abi.dylib found in $SYSROOT

(which is the only occurrence in a green build, e.g. last run on main).

@h-vetinari
Member

From staring at arcticdb's CMakeLists.txt a bit, I cannot see anything specific for building arcticdb_ext that stands out w.r.t. osx/clang, much less libc++abi. It seems to be a pretty vanilla pybind11 build. However, that recipe is currently limited to pybind11 <2.11, i.e. versions more than a year old. Not sure if that has any bearing. 🤷

@h-vetinari
Member

So this issue seems to be showing up in polars as well (found while searching for the error): pola-rs/polars#17687

In contrast to arcticdb, I couldn't reproduce this on the feedstock though: conda-forge/polars-feedstock#250

@isuruf
Member

isuruf commented Jul 19, 2024

No, that's not related. See the comment above yours in the upstream polars issue.

@h-vetinari
Member

Ah OK, I didn't know about the LTS setup for polars. Why isn't this using the new archspec capabilities, though, instead of merely warning about incompatible CPU instructions...?

In any case, if you have an idea or even just a suspicion what might be causing the libcxx issues, I'm happy to investigate a bit, but for now I'm stumbling in the dark.

@jjerphan
Member Author

jjerphan commented Jul 23, 2024

The "Illegal Instruction" might be the result of programs jumping to an inappropriate address, hence it might be something related to the use of different conventions for stack unwinding, or frame pointers.

@jjerphan
Member Author

jjerphan commented Jul 23, 2024

From this section of the release notes, some changes to the build system might need to be taken care of:

LIBCXXABI_USE_LLVM_UNWINDER and COMPILER_RT_USE_LLVM_UNWINDER switched defaults from OFF to ON. This means that by default, libc++abi and compiler-rt will link against the LLVM provided libunwind library instead of the system-provided unwinding library. If you are building the LLVM runtimes with the goal of shipping them so that they can interoperate with other system-provided libraries that might be using a different unwinding library (such as libgcc_s), you should pass LIBCXXABI_USE_LLVM_UNWINDER=OFF and COMPILER_RT_USE_LLVM_UNWINDER=OFF to make sure the system-provided unwinding library is used by the LLVM runtimes.

There are also changes regarding LIBUNWIND_EXECUTOR and LIBCXXABI_EXECUTOR (see the first item of that section), but they should not affect conda-forge's distribution since it does not use them.
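Whichever unwinder the runtimes end up linking against, one cheap check of the kind that could go in a recipe's test: section is a small exception smoke test. The sketch below is hypothetical (it is not part of the libcxx recipe, and the names `Guard`, `throw_from_depth` and `unwinding_works` are made up for illustration). It throws from a few frames deep and verifies that the exception reaches the caller and that the destructor of every intermediate frame ran during unwinding. It cannot tell you *which* unwinder was linked, only that unwinding works end to end.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Counts Guard destructors that run while the exception propagates.
static int g_destroyed = 0;

struct Guard {
    ~Guard() { ++g_destroyed; }
};

// Recurses `depth` more frames, each holding a Guard, then throws.
void throw_from_depth(int depth) {
    Guard g;
    if (depth == 0) {
        throw std::runtime_error("unwind me");
    }
    throw_from_depth(depth - 1);
}

// Returns true if the exception is caught here and all depth + 1 Guards
// were destroyed on the way up, i.e. the unwinder walked every frame.
bool unwinding_works(int depth) {
    g_destroyed = 0;
    try {
        throw_from_depth(depth);
    } catch (const std::runtime_error& e) {
        return g_destroyed == depth + 1 && std::string(e.what()) == "unwind me";
    }
    return false;
}

// Usage: assert(unwinding_works(5));
```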

@h-vetinari
Member

The same errors show up in conda-forge/cctools-and-ld64-feedstock#68; the interesting thing is that there we don't have any involvement of libc++abi or libunwind (at least based on the logs).

@h-vetinari
Member

@jdblischak, is there a way to test the tiledb failure in conda-forge itself? Could you open a PR to the tiledb feedstock that reproduces it? Then I could test out a couple things more easily.

@h-vetinari
Member

Well, I was hoping libcxx 19.1.0.rc1 would perhaps fare differently, but alas no: conda-forge/cctools-and-ld64-feedstock#72

@jjerphan
Member Author

jjerphan commented Jul 29, 2024

Some hints from Fedora's packaging: in the diff between their build script for the last LLVM 17 release and the latest revision for LLVM 18.1.8, the only new build-configuration flag is -DLIBCXXABI_USE_LLVM_UNWINDER=OFF, analogous to what we did previously with -DCOMPILER_RT_USE_LLVM_UNWINDER=OFF:

git clone https://src.fedoraproject.org/rpms/libcxx.git /tmp/fedora-libcxx
cd /tmp/fedora-libcxx
git diff 4310761716f948004c108b5273f76e7a40753073..95d5cc4ebcee624dcde61fe31aa38eea556d2bec
diff --git a/libcxx.spec b/libcxx.spec
index ff478d1..23ad663 100644
--- a/libcxx.spec
+++ b/libcxx.spec
@@ -4,8 +4,8 @@
 # https://bugzilla.redhat.com/show_bug.cgi?id=2158587
 %undefine _include_frame_pointers
 
-%global maj_ver 17
-%global libcxx_version %{maj_ver}.0.6
+%global maj_ver 18
+%global libcxx_version %{maj_ver}.1.8
 #global rc_ver 4
 %global libcxx_srcdir libcxx-%{libcxx_version}%{?rc_ver:rc%{rc_ver}}.src
 %global libcxxabi_srcdir libcxxabi-%{libcxx_version}%{?rc_ver:rc%{rc_ver}}.src
@@ -13,7 +13,7 @@
 
 Name:		libcxx
 Version:	%{libcxx_version}%{?rc_ver:~rc%{rc_ver}}
-Release:	3%{?dist}
+Release:	2%{?dist}
 Summary:	C++ standard library targeting C++11
 License:	Apache-2.0 WITH LLVM-exception OR MIT OR NCSA
 URL:		http://libcxx.llvm.org/
@@ -133,7 +133,7 @@ mv ../%{libcxx_srcdir} libcxx
 mv ../%{libcxxabi_srcdir} libcxxabi
 mv ../%{libunwind_srcdir} libunwind
 mkdir -p runtimes/cmake/Modules
-mv %{SOURCE8} %{SOURCE9} runtimes/cmake/Modules/
+cp %{SOURCE8} %{SOURCE9} runtimes/cmake/Modules/
 %autopatch -p1
 
 %py3_shebang_fix libcxx/utils/
@@ -155,6 +155,7 @@ export ASMFLAGS=$CFLAGS
 	-DLIBCXX_INCLUDE_BENCHMARKS=OFF \
 	-DLIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY=ON \
 	-DLIBCXX_ENABLE_ABI_LINKER_SCRIPT=ON \
+	-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
 	-DLLVM_BUILD_DOCS=ON \
 	-DLLVM_ENABLE_SPHINX=ON \
 	-DLIBUNWIND_INCLUDE_DOCS=ON \
@@ -240,6 +241,36 @@ rm %{buildroot}%{_pkgdocdir}/html/.buildinfo
 %doc %{_pkgdocdir}/html
 
 %changelog
+* Thu Jul 18 2024 Fedora Release Engineering - 18.1.8-2
+- Rebuilt for https://fedoraproject.org/wiki/Fedora_41_Mass_Rebuild
+
+* Thu Jul 11 2024 Jesus Checa Hidalgo - 18.1.8-1
+- 18.1.8 Release
+
+* Thu Jun 13 2024 Tom Stellard - 18.1.7-1
+- 18.1.7 Release
+
+* Tue May 21 2024 Tom Stellard - 18.1.6-1
+- 18.1.6 Release
+
+* Fri May 03 2024 Tom Stellard - 18.1.4-1
+- 18.1.4 Release
+
+* Wed Apr 17 2024 Tom Stellard - 18.1.3-1
+- 18.1.3 Release
+
+* Fri Mar 22 2024 Tom Stellard - 18.1.2-1
+- 18.1.2 Release
+
+* Wed Mar 13 2024 Tom Stellard - 18.1.1-1
+- 18.1.1 Release
+
+* Mon Mar 04 2024 Nikita Popov - 18.1.0~rc4-2
+- Disable LIBCXXABI_USE_LLVM_UNWINDER (rhbz#2267690)
+
+* Thu Feb 29 2024 Tom Stellard - 18.1.0~rc4-1
+- 18.1.0-rc4 Release
+
 * Thu Jan 25 2024 Fedora Release Engineering - 17.0.6-3
 - Rebuilt for https://fedoraproject.org/wiki/Fedora_40_Mass_Rebuild
 
diff --git a/sources b/sources
index d7d78a7..c581215 100644
--- a/sources
+++ b/sources
@@ -1,8 +1,8 @@
-SHA512 (libcxx-17.0.6.src.tar.xz) = 8be288ab14bd34a1946aeadc83c9e11ff68eb0cda707fd6bee711514d0e506112ffc2a40d0317d19f64f05b644e072f1322ed0e8ab238d9640b6b827d42126eb
-SHA512 (libcxx-17.0.6.src.tar.xz.sig) = 01ac59892c16b71b2fc1b2154e5f0af123ece7c1d55c85fa38ff9c7a2410ff0ced336f40b57235c59a480e756c87ea2e6324b056cc20368f86f8f3abde6e1bf4
-SHA512 (libcxxabi-17.0.6.src.tar.xz) = acd2b0b48eee4380625347a955fa3dfafce948aeccd87a2070e09bd91b148ae189aec12b506f3097193a8288095e99496c66ce26974772878622e5892d822cc3
-SHA512 (libcxxabi-17.0.6.src.tar.xz.sig) = f9abb5952bc6a95618eaf53df30dd9ce5dfef7dd11c1abadfc05e8dcd9f270e10f35ed4bee9163dfda0a7e508db8ddc75fdb48e724a88a924b2f150bb1e20fd8
-SHA512 (libunwind-17.0.6.src.tar.xz) = 4f0c1a38796022a476fab06d91c5c5ec76e060e0697941be83ee896a18e548d02605e4431ea9ac09dc36bc28cc5609fac6a200faff703eba2360dfab2dcf5300
-SHA512 (libunwind-17.0.6.src.tar.xz.sig) = 9c0990583855d826652ef1ecf3ecdc1f029592ddeacc0d776400b5fa8d044183193d43631c270857e6b48b501ec4f356d84399529fd9c317f69c3e4a9a377851
+SHA512 (libcxx-18.1.8.src.tar.xz) = b24f8adbc4edd0cde3a28c6ce0ae2d7cd32049c155459c669f30ee6400a7a0e789c968db7f93bb0aa0f972b47b86424b9655af00e99867e242baccece8f323e8
+SHA512 (libcxx-18.1.8.src.tar.xz.sig) = cdaee075a6d24c6c8cd2c5e80c68f30eadb73731d6a59767b7cb54f50bb22f7665707de013f04d5048d08edd807442063ca3154c5ca3c4ae236b90d135811af0
+SHA512 (libcxxabi-18.1.8.src.tar.xz) = 40f8691e86948527cd104b3b7f481757e6f8f4892fbe8632a6f86f35008e0c9d721e5f2d3629c6f0b99e9f150ee9f3d650aa111ea2c5f6df44fec481ff00a1f9
+SHA512 (libcxxabi-18.1.8.src.tar.xz.sig) = 5669aaaef6c3b228de56fa05d2099cfb23191242d9fc973f1a1939d9933dfb12124c0e1d08efddb34e6a546026648e8d9e84b3cb7dd641a688c39d2a9a08c320
+SHA512 (libunwind-18.1.8.src.tar.xz) = d6bf0e462db5d99bcdc1170c3789e84f21d86e35a6e79b75bea0a6f9aad222a25400944bb7fa030bf94e51cccc5e42b7b7002be98c0936f2852ac5688c0c3a84
+SHA512 (libunwind-18.1.8.src.tar.xz.sig) = c20f9e9fea812e94227e0b00d725b24f76e3cfcc7fc5360fa60b162429df847e30ebf5e9191c5e613082fec94ef8a4830f04732045acaf864f08dc6703da5954
 SHA512 (HandleFlags.cmake) = 525fe99751c68b93f28651440d299776ba0ea198989a9150f256ca640187551fc8050bbc27fc75e852fd126d4bcd8590ab5d3d7c224b3a5f7125876fc8e48eb1
 SHA512 (WarningFlags.cmake) = ad0d9eff0ce4ef69a8090061e6260a95effa2006a34099d68f0881ddea91724f9bc17133b00c9a0276a6ae35a4a305a3e164f17350e883152f452762cf8b88c5

Naive questions: Is one of the adaptations in our existing patches causing the problem? Are all of the patches still relevant for conda-forge's distribution?

@traversaro

traversaro commented Jul 29, 2024

I am back at work and have access to a macOS machine, and I was able to reproduce the issue on both osx-64 and osx-arm64 with a (definitely non-minimal) reproducer:

git clone -b debugcxx18 https://github.com/robotology/idyntree
cd idyntree
pixi run test

This fails with:

59/59 Test #59: IntegrationTestiCubTorqueEstimation ...........SIGTRAP***Exception:   0.00 sec

@traversaro

traversaro commented Jul 29, 2024

The backtrace (bt) is:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x100085414)
  * frame #0: 0x0000000100085414 iCubTorqueEstimationIntegrationTest`iDynTree::estimateLinkContactWrenchesFromLinkNetExternalWrenches(iDynTree::Model const&, iDynTree::LinkUnknownWrenchContacts const&, iDynTree::LinkWrenches const&, iDynTree::LinkContactWrenches&) + 1032
    frame #1: 0x0000000100004c4c iCubTorqueEstimationIntegrationTest`extractJointTorquesAndContactForces(iDynTree::BerdyHelper const&, iDynTree::VectorDynSize const&, iDynTree::VectorDynSize const&, iDynTree::LinkUnknownWrenchContacts const&, iDynTree::VectorDynSize&, iDynTree::LinkContactWrenches&) + 148
    frame #2: 0x0000000100006718 iCubTorqueEstimationIntegrationTest`compareEstimators(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, CompareEstimatorsOptions const&) + 6656
    frame #3: 0x000000010000842c iCubTorqueEstimationIntegrationTest`main + 192
    frame #4: 0x000000010012d08c dyld`start + 520

but I can't find anything useful in it.

@h-vetinari
Member

Well, I was hopeful after having found Homebrew/homebrew-core#178435 and getting the tip there to pass in #172 without having to do the install-name-tool redirects. However, it still fails with conda-forge/cctools-and-ld64-feedstock#72 😭

@jdblischak
Member

is there a way to test the tiledb failure in conda-forge itself? Could you open a PR to the tiledb feedstock that reproduces it? Then I could test out a couple things more easily.

@h-vetinari I'm sorry, it's not clear to me what you are asking me to do. We have never observed this error in tiledb-feedstock, so I don't know how to reproduce it there.

We observed the Illegal instruction in TileDB-Inc/tiledb-vcf-feedstock. While it isn't in conda-forge, it is a feedstock created by conda-smithy, so you should be able to fork the repository to your account and test it just like any conda-forge feedstock. You can even start from the branch on my fork that I used to reproduce this issue in TileDB-Inc/tiledb-vcf-feedstock#131.

Unfortunately I'm leaving for vacation today, so I won't be able to help again until next week.

@jjerphan jjerphan changed the title [osx_64] libcxx v18.1.8: Failure with ILLEGAL instruction at runtime on dependent project [osx] libcxx v18.1.8: Failure with ILLEGAL instruction at runtime on dependent project Jul 29, 2024
@traversaro

traversaro commented Jul 30, 2024

Interesting update on my side. It turns out that in the case of robotology/idyntree#1192, the crash induced by the update to libcxx 18 was actually a problem in my code (fixed in robotology/idyntree#1198), not in libcxx==18.*.

The problematic code was something like:

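#include <vector>

struct Foo
{
    int a;
    int b;
};

struct Bar
{
    int b;
    int c;
};

int main()
{
    std::vector<Foo> vecFoo;
    vecFoo.resize(0);  // vecFoo is now empty

    std::vector<Bar> vecBar;
    vecBar.resize(0);  // vecBar is now empty

    // Undefined behavior: operator[] on an empty vector accesses out of bounds.
    // In a hardened libc++ build, this is the kind of access that trips an assertion and traps.
    vecBar[0].b = vecFoo[0].b;
}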

It is clearly undefined behavior, but apparently it always worked fine on Linux, macOS and Windows, even under valgrind. I do not know whether a sanitizer would have detected this; we do not have one enabled in CI at the moment.
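For comparison, a well-defined variant of the snippet above either checks the sizes before indexing or uses `std::vector::at`, which throws `std::out_of_range` instead of accessing out of bounds. This is an illustrative sketch only: `copy_b` and `copy_b_checked` are hypothetical helpers, not the actual change made in robotology/idyntree#1198.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

struct Foo { int a; int b; };
struct Bar { int b; int c; };

// Copies vecFoo[0].b into vecBar[0].b only when both elements exist.
// Returns false instead of invoking undefined behavior on empty vectors.
bool copy_b(std::vector<Bar>& vecBar, const std::vector<Foo>& vecFoo) {
    if (vecBar.empty() || vecFoo.empty()) {
        return false;
    }
    vecBar[0].b = vecFoo[0].b;
    return true;
}

// Same copy through at(): throws std::out_of_range on an empty vector.
void copy_b_checked(std::vector<Bar>& vecBar, const std::vector<Foo>& vecFoo) {
    vecBar.at(0).b = vecFoo.at(0).b;
}

// Usage: std::vector<Foo> f; std::vector<Bar> b; assert(!copy_b(b, f));
```

Which one to prefer is a design choice: the explicit check makes "nothing to copy" a normal outcome, while `at()` turns it into an error the caller must handle.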

@jjerphan
Member Author

jjerphan commented Jul 30, 2024

mamba also has problems when libcxx 18 is used at build time on macOS, at least on arm64 (see mamba-org/mamba#3371).

mamba-org/mamba@83710ef works around it.

@h-vetinari
Member

h-vetinari commented Jul 30, 2024

OK, this looks to have been due to the hardening mode we enabled for 18.x, and therefore my fault. I'm very sorry for the troubles this has caused, and grateful for everyone's help in tracking this down (especially the relentless @jjerphan, who figured this out in #174 🙏). For context, this had been discussed in #136, and the hardening documentation during the 18.x cycle did not explain what's there now, namely that the default assertion handler is just __builtin_verbose_trap.
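A trap handler would also explain why the symptom looked different per platform: a trap builtin compiles to a dedicated trapping instruction, `ud2` on x86-64 (delivered as SIGILL, hence `Illegal instruction: 4` and ctest's `(ILLEGAL)`) and `brk` on arm64 (delivered as SIGTRAP, matching the `SIGTRAP` and `EXC_BREAKPOINT` in the osx-arm64 test output and backtrace earlier in this thread). The POSIX sketch below assumes GCC/Clang's `__builtin_trap` as a stand-in for `__builtin_verbose_trap` (which only recent Clang versions provide); it runs the trap in a forked child and returns the signal that killed it. `trap_signal_in_child` is a hypothetical helper.

```cpp
#include <cassert>
#include <csignal>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that executes a trap instruction and returns the number of
// the signal that terminated it, or -1 on failure / normal exit.
int trap_signal_in_child() {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;  // fork failed
    }
    if (pid == 0) {
        // Child: restore default dispositions so the signal terminates us,
        // then trap (ud2 on x86-64, brk on arm64). __builtin_trap never returns.
        std::signal(SIGILL, SIG_DFL);
        std::signal(SIGTRAP, SIG_DFL);
        __builtin_trap();
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status)) {
        return -1;
    }
    return WTERMSIG(status);
}

// Usage: int sig = trap_signal_in_child(); assert(sig == SIGILL || sig == SIGTRAP);
```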

The takeaway for this is that the hardening modes need to be off for production, but it would definitely make sense to provide a LIBCXX_HARDENING_MODE=debug build in a separate label, so that people can opt into testing their builds with hardened libcxx (e.g. to find the UB that seems to be present in the affected projects), but avoid this for production.
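For projects that want to confirm which mode a given build actually picked up, libc++ 18 (as we understand its hardening documentation) exposes the selected mode through the `_LIBCPP_HARDENING_MODE` macro, whose value is one of `_LIBCPP_HARDENING_MODE_NONE`, `_LIBCPP_HARDENING_MODE_FAST`, `_LIBCPP_HARDENING_MODE_EXTENSIVE` or `_LIBCPP_HARDENING_MODE_DEBUG`. The same macro can be defined per translation unit to override the vendor default. The sketch below reports the mode at compile time. `libcxx_hardening_mode` is a hypothetical name. With libstdc++, or with a libc++ older than 18 (which does not define the macro), it reports that instead.

```cpp
#include <cassert>
#include <cstddef>  // any libc++ header pulls in <__config>, which defines the macros below
#include <cstring>

// Returns a description of the libc++ hardening mode this translation unit saw.
const char* libcxx_hardening_mode() {
#if !defined(_LIBCPP_VERSION)
    return "not libc++";
#elif !defined(_LIBCPP_HARDENING_MODE)
    return "libc++ without _LIBCPP_HARDENING_MODE";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_NONE
    return "none";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
    return "fast";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
    return "extensive";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
    return "debug";
#else
    return "unknown";
#endif
}

// Usage: assert(std::strlen(libcxx_hardening_mode()) > 0);
```

Compiling the same file once normally and once with `-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG` against libc++ 18 should show the two outcomes side by side.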

@jjerphan
Member Author

Thank you for all the work on distributing compilers and their standard library implementations. Now we know about libc++'s hardening mode, and a few projects now know that they can check for UB and correct it, which is highly beneficial IMO.
