CI: try to build with HPX on Fedora #7331

Draft: wants to merge 2 commits into develop

junghans
Contributor

junghans commented Sep 12, 2024

Prompted by a build failure with Fedora's HPX package in https://koji.fedoraproject.org/koji/taskinfo?taskID=123213163

@junghans
Contributor Author

junghans commented Sep 12, 2024

OK, we can reproduce it here:

```
The following tests FAILED:
	  2 - Kokkos_CoreUnitTest_Serial2 (Failed)
	  3 - Kokkos_CoreUnitTest_HPX (Failed)
Errors while running CTest
```

This is on x86_64, not ppc64le or aarch64. Kokkos_CoreUnitTest_HPX even segfaults.

@junghans
Contributor Author

Another side note: @diehlpk figured out that setting

```sh
export TESTS_ARGUMENTS="--hpx:ini=hpx.stacks.use_guard_pages=0"
```

helps with the HPX test.

/CC @hkaiser

@masterleinad
Contributor

We might just disable the respective test, since HPX throws its own exceptions by using std::set_new_handler (or try to catch that exception and throw our own).

@hkaiser

hkaiser commented Sep 12, 2024

Yes, HPX currently throws its own exceptions in this case, which is wrong. See STEllAR-GROUP/hpx#6543 for a possible fix.
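As background for the exchange above, here is a minimal sketch, assuming a 64-bit target where a request of SIZE_MAX / 4 bytes cannot be satisfied, of a new-handler that follows the standard's rule of throwing std::bad_alloc or a type derived from it. The names hypothetical_out_of_memory, throwing_new_handler and nothrow_new_returns_null_with_handler are invented for illustration; this is not HPX's handler. With such a handler installed, the std::nothrow overload of operator new still runs the handler on failure, but reports the failure by returning nullptr, which is the path the Kokkos allocation check relies on.

```cpp
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical stand-in for a runtime-specific out-of-memory error.
// Deriving from std::bad_alloc is what the standard requires of any
// exception thrown by a new-handler.
struct hypothetical_out_of_memory : std::bad_alloc {
  const char* what() const noexcept override {
    return "hypothetical out-of-memory error";
  }
};

// Number of times the handler below has been invoked.
static int handler_calls = 0;

// A new-handler that gives up immediately by throwing.
void throwing_new_handler() {
  ++handler_calls;
  throw hypothetical_out_of_memory{};
}

// Installs throwing_new_handler, requests an allocation that cannot succeed
// on a 64-bit target, restores the previous handler, and reports whether
// the std::nothrow overload of operator new returned nullptr.
bool nothrow_new_returns_null_with_handler() {
  const std::size_t huge = std::numeric_limits<std::size_t>::max() / 4;
  std::new_handler previous = std::set_new_handler(throwing_new_handler);
  void* p = ::operator new(huge, std::nothrow);
  std::set_new_handler(previous);  // put back whatever was installed before
  if (p != nullptr) {
    ::operator delete(p);
    return false;
  }
  return true;
}
```

Calling nothrow_new_returns_null_with_handler() is expected to return true after the handler has run at least once. A handler that throws an unrelated type instead violates the new-handler requirements, which is what the HPX PR linked above is meant to address.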

Where can I see the actual runtime error generated by HPX, btw?

@masterleinad
Contributor

https://github.com/kokkos/kokkos/actions/runs/10834072931/job/30062369318?pr=7331 has

```
[ RUN ] hpx.view_bad_alloc
terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
what(): new allocator failed to allocate memory: HPX(out_of_memory)
```

and it results from

```cpp
if (arg_alloc_size)
  ptr = operator new(arg_alloc_size, std::align_val_t(alignment),
                     std::nothrow_t{});
if (!ptr || (reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0)) ||
    (reinterpret_cast<uintptr_t>(ptr) & alignment_mask)) {
  Impl::throw_bad_alloc(name(), arg_alloc_size, arg_label);
}
```

Here we are explicitly requesting that the allocation not throw (note the std::nothrow_t{} argument).

@hkaiser

hkaiser commented Sep 12, 2024

Ok, this is something else, then. Why is a handler set by std::set_new_handler invoked in this case to begin with?

@masterleinad
Contributor

It seems that https://github.com/llvm/llvm-project/blob/82a36468c74a29b6154639d659550c62457e655b/libcxx/src/new.cpp#L162-L166 is executed, but for some reason we don't hit the catch clause and instead abort with an unhandled exception.

@hkaiser

hkaiser commented Sep 12, 2024

Not sure what I can do about this... Would you have any ideas? It shouldn't matter whether the exception thrown from our new-handler is actually derived from std::bad_alloc (which it should be; see the PR I linked above).

@hkaiser

hkaiser commented Sep 13, 2024

While I'm still not sure why this is happening, I can offer to add an option to HPX that disables installing a new-handler (in the same way one can already disable HPX's signal handling). This option could be used in environments like the Kokkos HPX backend. What do you think?

@masterleinad
Copy link
Contributor

Note that we already have an HPX build in our CI that I would expect to reproduce the issue if we update the HPX version there; see

```yaml
- name: checkout hpx
  uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
  with:
    repository: STELLAR-GROUP/hpx
    ref: v1.9.0
    path: hpx
```

@junghans
Contributor Author

junghans commented Sep 16, 2024

Let's see.

@junghans
Contributor Author

@masterleinad github-Linux-hpx works, but the build with the hpx-devel package on Fedora fails. Not sure if it is worth digging into, but the Fedora build logs are here:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2540054

@masterleinad
Copy link
Contributor

Interesting. I can reproduce the issue locally with clang and libc++.

@junghans
Copy link
Contributor Author

Did you build with:

```
-DCMAKE_BUILD_TYPE=Debug \
-DHPX_WITH_UNITY_BUILD=ON \
-DHPX_WITH_MALLOC=system \
-DHPX_WITH_NETWORKING=OFF \
-DHPX_WITH_EXAMPLES=OFF \
-DHPX_WITH_TESTS=OFF \
```

@junghans
Contributor Author

@masterleinad @hkaiser any update or patch to try?

@masterleinad
Contributor

No, I think we need someone to dedicate some time to understanding the root cause: why the handler set by std::set_new_handler is called at all.

@junghans
Copy link
Contributor Author

OK, ping me if there is a patch to add to the HPX package on Fedora!

@masterleinad
Copy link
Contributor

@diehlpk Can you try to have a look at this?

@diehlpk
Copy link
Contributor

diehlpk commented Oct 2, 2024

@masterleinad Currently, I have no time, but maybe @hkaiser can have a look?

4 participants