Build failure: resulting binary over 2GB #39

Open
prusnak opened this issue Feb 11, 2025 · 3 comments

prusnak commented Feb 11, 2025

This is related to pytorch/pytorch#39968, which was resolved in pytorch/pytorch#49050 by splitting the library into smaller ones.

I am also including the Nixpkgs issue for more details: NixOS/nixpkgs#239237

In Nixpkgs we first used -Xfatbin=-compress-all (same as pytorch), but we are hitting the 2 GB limit again. Using -mcmodel=large does not seem to help, so I guess the only way is to split the magma library into smaller ones too (the same approach pytorch took).
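For readers unfamiliar with where the 2 GB figure comes from: relocations such as R_X86_64_PC32 store the distance between the referencing instruction and its target in a signed 32-bit field, so a reference fails once that distance exceeds roughly ±2 GiB. The following is a simplified sketch of that range check, not the linker's actual implementation; the function names and the example addresses are hypothetical.

```python
# Simplified model of a 32-bit PC-relative relocation (e.g. R_X86_64_PC32).
# The linker has to store (target + addend - place) in a signed 32-bit field.

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1
GIB = 2**30


def pc32_displacement(target: int, place: int, addend: int = -4) -> int:
    """Value the linker would need to store: S + A - P."""
    return target + addend - place


def fits_pc32(target: int, place: int, addend: int = -4) -> bool:
    """True if the displacement fits in a signed 32-bit field."""
    return INT32_MIN <= pc32_displacement(target, place, addend) <= INT32_MAX


# Hypothetical layout: a reference near the start of .text reaching a
# symbol placed 1 GiB away still fits.
print(fits_pc32(target=1 * GIB, place=0x1000))  # True

# The same reference once about 3 GiB of other sections sit in between
# no longer fits, which the linker reports as "relocation truncated to fit".
print(fits_pc32(target=3 * GIB, place=0x1000))  # False
```

This is why shrinking the output (compression, fewer device images) or splitting it into several smaller libraries both address the same underlying limit.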

On x86_64-linux:

/nix/store/81mi7m3k3wsiz9rrrg636sx21psj20hc-glibc-2.40-66/lib/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o: in function `magma_use_zgeqrf_batched_fused_update':
get_batched_crossover.cpp:(.text+0x35a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `zgeqrf_panel_decision_a100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o: in function `magma_use_cgeqrf_batched_fused_update':
get_batched_crossover.cpp:(.text+0x45a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `cgeqrf_panel_decision_a100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o: in function `magma_use_dgeqrf_batched_fused_update':
get_batched_crossover.cpp:(.text+0x55a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `dgeqrf_panel_decision_a100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o: in function `magma_use_sgeqrf_batched_fused_update':
get_batched_crossover.cpp:(.text+0x65a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `sgeqrf_panel_decision_a100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o: in function `__static_initialization_and_destruction_0()':
get_batched_crossover.cpp:(.text.startup+0x4f8): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `sgeqrf_panel_decision_mi100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
get_batched_crossover.cpp:(.text.startup+0x554): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::~vector()' defined in .text._ZNSt6vectorIS_IiSaIiEESaIS1_EED2Ev[_ZNSt6vectorIS_IiSaIiEESaIS1_EED5Ev] section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
get_batched_crossover.cpp:(.text.startup+0x55b): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /nix/store/dmpy95dhb2m473l4nyxgmcp4cnnial8w-gfortran-14-20241116/lib/gcc/x86_64-unknown-linux-gnu/14.2.1/crtbeginS.o
get_batched_crossover.cpp:(.text.startup+0x854): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `dgeqrf_panel_decision_mi100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
get_batched_crossover.cpp:(.text.startup+0x8a0): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `dgeqrf_panel_decision_mi100' defined in .bss section in CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o
get_batched_crossover.cpp:(.text.startup+0x8a7): additional relocation overflows omitted from the output
lib/libmagma.so: PC-relative offset overflow in GOT PLT entry for `_Z38magmablas_zapply_vector_kernel_batchediiP7double2iPS0_ii'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

On aarch64-linux:

CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x1c): relocation truncated to fit: R_AARCH64_PREL32 against `.text._ZNSt6vectorIS_IiSaIiEESaIS1_EED2Ev'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x6c): relocation truncated to fit: R_AARCH64_PREL32 against `.text.startup'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x9c): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0xb0): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0xc4): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0xd8): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0xec): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x100): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x114): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x128): relocation truncated to fit: R_AARCH64_PREL32 against `.text'
CMakeFiles/magma.dir/control/get_batched_crossover.cpp.o:(.eh_frame+0x13c): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

mgates3 (Contributor) commented Feb 21, 2025

Could you give some context here to reproduce the issue? The best would be a MAGMA Makefile (make.inc) or CMake configuration that reproduces it. I saw the Nixpkgs issue, but it isn't clear there how you are building MAGMA. It appears that you are compiling for all these CUDA capabilities:
["6.0", "6.1", "6.2", "7.0", "7.2", "7.5", "8.0", "8.6", "8.9", "9.0"]
I don't know that there's significant benefit from compiling for, say, 6.1 or 6.2 if 6.0 is available.
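As a rough illustration of this suggestion, the sketch below reduces a capability list to the lowest minor revision of each major version. It assumes that device code built for X.y also runs on later X.z devices of the same major version, which may not hold for every part in the list (embedded devices in particular), and it gives up any tuning specific to the later minor revisions. The helper name is hypothetical.

```python
def lowest_minor_per_major(capabilities):
    """Keep only the lowest minor revision for each major version."""
    by_major = {}
    for cap in capabilities:
        major, minor = (int(part) for part in cap.split("."))
        # Remember the smallest minor revision seen for this major version.
        if major not in by_major or minor < by_major[major]:
            by_major[major] = minor
    return [f"{major}.{minor}" for major, minor in sorted(by_major.items())]


caps = ["6.0", "6.1", "6.2", "7.0", "7.2", "7.5", "8.0", "8.6", "8.9", "9.0"]
print(lowest_minor_per_major(caps))  # ['6.0', '7.0', '8.0', '9.0']
```

Under those assumptions the list drops from ten entries to four, so the library would carry far fewer device code images.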

prusnak (Author) commented Feb 21, 2025

> It appears that you are compiling for all these CUDA capabilities:
> ["6.0", "6.1", "6.2", "7.0", "7.2", "7.5", "8.0", "8.6", "8.9", "9.0"]

Right

> I don't know that there's significant benefit from compiling for, say, 6.1 or 6.2 if 6.0 is available.

Maybe @ConnorBaker can explain why we do this?

ConnorBaker commented

The GPU selection done in Nixpkgs is fairly naive and driven by https://github.com/NixOS/nixpkgs/blob/daa2a442b9c82a265afaedbdf5589adadc01095c/pkgs/development/cuda-modules/gpus.nix

Essentially, every capability supported by a CUDA version is added by default to maximize compatibility and performance. (This is also done in part to reduce load on CI, as specifying a different list of capabilities involves rebuilding all CUDA-enabled packages.)
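To make the size impact of the default list concrete, here is a sketch of how a capability list maps onto nvcc `-gencode` options. Each option asks nvcc to embed one more device code image for every kernel in the fat binary. The helper is hypothetical and is not how Nixpkgs or MAGMA's build system actually generates its flags.

```python
def gencode_flags(capabilities):
    """Turn capabilities like "8.6" into nvcc -gencode options."""
    flags = []
    for cap in capabilities:
        arch = cap.replace(".", "")  # "8.6" -> "86"
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags


default_caps = ["6.0", "6.1", "6.2", "7.0", "7.2", "7.5", "8.0", "8.6", "8.9", "9.0"]
flags = gencode_flags(default_caps)
print(len(flags))  # 10
print(flags[0])    # -gencode=arch=compute_60,code=sm_60
```

With ten capabilities, every kernel in the library is compiled and stored ten times, which is how the linked output can approach the 2 GB limit.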

I recommend that users configure Nixpkgs to target their specific capability. Unfortunately, that generally requires them to rebuild everything locally, as CI only caches packages built with the default configuration.

I’ve not done any research into the various capabilities to find out what has changed between them (aside from the floating-point speedup between 8.0 and 8.6), so that would be a good place to start if we wanted to cull additional capabilities from the default set.
