
Loader is broken #170

Closed
SteffenDE opened this issue Jul 9, 2022 · 17 comments

Comments

@SteffenDE

Hi there,

while debugging the erlexec compilation error (nerves-project/nerves_ssh#84), I tried to use this base system for testing inside qemu. The problem is that there seems to be something very wrong with the loader:

Current behavior

Trying to execute any native binary results in a 127 exit code, as the loader does not seem to be found:

iex(1)> System.shell("ls -la /srv/erlang/lib/nerves_uevent*/priv/uevent", into:
 IO.binstream(:stdio, :line))
-rwxr-xr-x    1 root     root         18888 Jan  4  2020 /srv/erlang/lib/nerves_
uevent-0.1.0/priv/uevent
{%IO.Stream{device: :standard_io, line_or_bytes: :line, raw: true}, 0}
iex(2)> System.shell("ldd /srv/erlang/lib/nerves_uevent*/priv/uevent", into: IO
.binstream(:stdio, :line))
        linux-vdso.so.1 (0x00007ffdba9b1000)
        libmnl.so.0 => /usr/lib64/libmnl.so.0 (0x00007fb76f654000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fb76f45b000)
        /lib/ld-musl-x86_64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x00007fb76f65e
000)
{%IO.Stream{device: :standard_io, line_or_bytes: :line, raw: true}, 0}
iex(3)> System.shell("/srv/erlang/lib/nerves_uevent*/priv/uevent", into: IO.bin
stream(:stdio, :line))
sh: /srv/erlang/lib/nerves_uevent-0.1.0/priv/uevent: not found
                                                              {%IO.Stream{device
: :standard_io, line_or_bytes: :line, raw: true}, 127}

You can reproduce this by building an example project for x86_64; I used https://github.com/nerves-project/nerves_examples/tree/main/minimal.
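
Roughly the steps, in case it helps (a sketch assuming the standard Nerves workflow with MIX_TARGET=x86_64):

$ git clone https://github.com/nerves-project/nerves_examples.git
$ cd nerves_examples/minimal
$ export MIX_TARGET=x86_64
$ mix deps.get
$ mix firmware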

Expected behavior

Executing binary files should work.

@SteffenDE
Author

The last release with ld-musl-x86_64.so.1 is 1.18.4; since 1.19.0 the loader is named ld-linux-x86-64.so.2.

@fhunleth
Member

fhunleth commented Jul 9, 2022

We almost never use nerves_system_x86_64, so it's not well maintained. I did just build the minimal example, though. Here's what I see:

$ mix firmware
$ mix firmware.unpack
$ ~/.nerves/artifacts/nerves_toolchain_x86_64_nerves_linux_musl-darwin_arm-1.6.0/bin/x86_64-nerves-linux-musl-ldd --root minimal.unpacked/rootfs  minimal.unpacked/rootfs/srv/erlang/lib/nerves_uevent-0.1.0/priv/uevent
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
        libmnl.so.0 => /usr/lib/libmnl.so.0 (0x00000000deadbeef)
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
        libc.so.6 => /lib/libc.so.6 (0x00000000deadbeef)
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
        ld-linux-x86-64.so.2 => /lib/ld-linux-x86-64.so.2 (0x00000000deadbeef)
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
$ ls -las  minimal.unpacked/rootfs/lib/ld-linux-x86-64*
440 -rwxr-xr-x  1 fhunleth  staff  221864 Jan  4  2020 minimal.unpacked/rootfs/lib/ld-linux-x86-64.so.2

I'm not actually running it, but if I ignore the sed errors (I'm on macOS, so there's a sed/gsed issue), I think that what I have would work, i.e., no references to /lib/ld-musl-x86_64.so.1.

Could you try deleting the _build and deps directories and doing a clean rebuild?
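
Something like this, i.e. the usual clean rebuild (adjust MIX_TARGET/MIX_ENV as needed):

$ rm -rf _build deps
$ mix deps.get
$ mix firmware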

@fhunleth
Member

fhunleth commented Jul 9, 2022

I just ran the minimal example in qemu:

$ mix nerves.gen.qemu_script
$ mix firmware.image
$ ./run-qemu.sh

I only did minimal testing, but it seems to work for me.

@SteffenDE
Author

Well, that's interesting! I see exactly the same output when using mix firmware.unpack, but in qemu it's still /lib/ld-musl-x86_64.so.1:

steffen@sdBookPro ~/n/minimal (main)> ~/.nerves/artifacts/nerves_toolchain_x86_64_nerves_linux_musl-darwin_arm-1.6.0/bin/x86_64-nerves-linux-musl-ldd --root minimal.unpacked/rootfs  minimal.unpacked/rootfs/srv/erlang/lib/nerves_uevent-0.1.0/priv/uevent
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
        libmnl.so.0 => /usr/lib/libmnl.so.0 (0x00000000deadbeef)
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
        libc.so.6 => /lib/libc.so.6 (0x00000000deadbeef)
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
        ld-linux-x86-64.so.2 => /lib/ld-linux-x86-64.so.2 (0x00000000deadbeef)
sed: 1: "s/^.*Library r(|un)path ...": RE error: empty (sub)expression
steffen@sdBookPro ~/n/minimal (main)> fwup -a -d nerves_minimal.img -i /Users/steffen/nerves_examples/minimal/_build/x86_64_dev/nerves/images/minimal.fw -t complete
steffen@sdBookPro ~/n/minimal (main)> qemu-system-x86_64 -drive file=nerves_minimal.img,if=virtio,format=raw -net nic,model=virtio -net user,hostfwd=tcp::10022-:22 -curses

[screenshot of the qemu console]

@fhunleth
Member

fhunleth commented Jul 9, 2022

I can reproduce what you did. This is weird.

@fhunleth
Member

fhunleth commented Jul 9, 2022

I'll have to look at this later. You definitely found something weird. I think that the /lib/ld-musl-x86_64.so.1 part actually is OK, since ldd maps it to ld-linux-x86-64.so.2, which does exist.

Btw, I did talk with someone yesterday about reviving qemu use of Nerves. I'll encourage them to post publicly if they do anything with it. I like the idea of using qemu, but I never found a workflow that was more efficient for me than using a real device.

@jjcarstens
Member

Fwiw - QEMU needs to be told where shared libs are in some cases or it will attempt to find them in the host system.

I suspect that if you set QEMU_LD_PREFIX=/path/to/system/lib then these failures would go away. To me it seems that the uevent binary cannot locate a shared lib it needs, one which probably exists in the system but isn't referenced in the QEMU runtime. That might also explain why using rpath fixes the issue.

I haven't tested this and it would need to be verified, but in all my experience with QEMU, it usually comes down to explicitly setting that LD prefix.
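
Roughly what I have in mind, as an untested sketch (the rootfs path is a placeholder):

$ export QEMU_LD_PREFIX=/path/to/unpacked/rootfs
$ qemu-x86_64 /path/to/unpacked/rootfs/srv/erlang/lib/nerves_uevent-0.1.0/priv/uevent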

@jjcarstens
Member

Actually, I might have jumped the gun. My theory deals with qemu-user and not qemu-system, which is what's being used here.

So, back to the drawing board.

@fhunleth
Member

fhunleth commented Jul 9, 2022

I take back what I said about ld-musl-x86_64.so.1. Here's the INTERP section when running readelf on uevent:

  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x0000000000000019 0x0000000000000019  R      0x1
      [Requesting program interpreter: /lib/ld-musl-x86_64.so.1]

I don't think there are any smarts in Linux to fix that path. I.e., when it tries to run uevent, it sees that the interpreter is ld-musl-x86_64.so.1 and blindly runs that program to do all of the dynamic loading work. That also makes the error message make more sense: ld-musl-x86_64.so.1 couldn't be found.

As a test, I made a symlink from ld-musl-x86_64.so.1 to ld-linux-x86-64.so.2. That works.
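
For the record, roughly what that looked like (a sketch; how the symlink actually gets into the image depends on your setup):

$ readelf -l minimal.unpacked/rootfs/srv/erlang/lib/nerves_uevent-0.1.0/priv/uevent | grep interpreter
$ ln -s ld-linux-x86-64.so.2 minimal.unpacked/rootfs/lib/ld-musl-x86_64.so.1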

I also did another experiment with building uevent using Buildroot's toolchain wrapper. This is unfortunately a program that's too tightly coupled to Buildroot to reuse in Nerves, so Nerves replicates the critical parts. The toolchain wrapper produces ELF files with the INTERP section pointing to ld-linux-x86-64.so.2. There's a debug setting in Buildroot's toolchain wrapper to see what hidden gcc options it enables, and I really think we have all of the important ones.

I think that Buildroot is renaming ld-musl-x86_64.so.1 to ld-linux-x86-64.so.2 since the toolchain provides ld-musl-x86_64.so.1, but it doesn't exist in the Buildroot build directories. I need to dig deeper into why and where it does this since my quick greps through Buildroot didn't find it.
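
On the debug setting mentioned above: if I remember right, it's the BR2_DEBUG_WRAPPER environment variable (the compiler name below is just illustrative), e.g.:

$ BR2_DEBUG_WRAPPER=2 output/host/bin/x86_64-buildroot-linux-musl-gcc -o uevent uevent.c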

@fhunleth
Member

Found it. It looks like a surprisingly easy fix. This was a regression introduced by the Buildroot 2022.05 update: a config option is needed now and I didn't catch it. I'm fixing it, but it's getting late, so it's possibly not fixed until tomorrow morning.

@fhunleth
Member

Just to add more detail: without the config option, Buildroot reverted to using a non-Nerves gcc 11.2 glibc toolchain. Here I thought Buildroot was making the dynamic loader naming consistent everywhere for some deep reason, but it's not.

@SteffenDE
Author

Interesting, sounds like this could also fix the problem with compiling erlexec in nerves-project/nerves_ssh#84. Good detective work!

I'm wondering if there's an easy way to automatically test this system using QEMU to make sure it works as expected. I've seen that other base systems have a test directory (e.g. https://github.com/nerves-project/nerves_system_rpi3/tree/main/test; clicking on the link leads to an error page though); maybe we could add something like this here as well?
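
As a very rough, untested sketch of what I mean (image name and console settings would need adjusting; it assumes the IEx prompt ends up on qemu's stdio console):

$ mix firmware.image
$ timeout 120 qemu-system-x86_64 \
    -drive file=minimal.img,if=virtio,format=raw \
    -net nic,model=virtio -net user \
    -nographic > boot.log 2>&1 || true
$ grep -q "iex(" boot.log && echo "boot looks OK"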

@fhunleth
Member

We lost access to our whole hardware test rig, so the test programs in the other systems won't work. On the other hand, it seems like this particular system might not be as difficult to test since hardware isn't required. We'd still need to find someone to actually write and contribute the code. That's probably the hardest part.

@fhunleth
Member

Fixed by #171.

@fhunleth
Member

@SteffenDE A new x86_64 system should be out in an hour or two.
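
Once it's published, picking it up should just be the usual dependency update (assuming the version requirement in mix.exs allows it):

$ mix deps.update nerves_system_x86_64
$ mix firmware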

@SteffenDE
Author

Nice, I can confirm that building erlexec works without any changes using 1.20.1 👍

Lost access? Is there anything the community can do to change that? Having "known to work" base systems is quite important in my opinion. I'll try to find the time to play with automated QEMU testing in the future.

@fhunleth
Member

Yeah, no one disagrees that this is important. I don't want to go into what happened with the previous test rig here.

As for help from the community, the answer is yes, people can help. The Nerves project has funds to buy parts for a new test setup. That part is not a problem. The previous test software used NervesHub, but that's no longer an option since there's no longer a public NervesHub server for us to use. On the other hand, it's possible to use a WireGuard VPN to push test firmware builds to devices. Or if there's another way, that's fine too. It would be best to discuss this on #nerves-dev on Slack.
