rules_cc_toolchain

ld.lld: error: cannot open /lib/x86_64-linux-gnu/libm.so.6: No such file or directory (Arch linux)

Open DCNick3 opened this issue 4 years ago • 10 comments

First of all, thanks a lot for this wonderful project, which makes it possible to use hermetic toolchains without going through a lot of pain and trial and error to set them up.

Unfortunately, it seems that I can't build any executable due to a linker error: ld.lld: error: cannot open /lib/x86_64-linux-gnu/libm.so.6: No such file or directory.

I use Arch Linux, and the build seems to work fine on Ubuntu (tested in Docker). Also, the error goes away if I disable the sandboxing (--spawn_strategy=standalone).

I've made a repro case with Docker: you should be able to set up the environment with a simple docker build . -t bazel_rules_cc_toolchain_repro and docker run --rm -it -v $(pwd):/root bazel_rules_cc_toolchain_repro

https://gitlab.com/DCNick3/bazel_rules_cc_toolchain_repro

It seems that it has something to do with how sandboxing is implemented in Bazel, but I can't seem to find a good direction to dig in. Here are some of my thoughts, though:

Why is clang trying to look for /lib/x86_64-linux-gnu/libm.so.6? This is not where libm resides on arch. Maybe it's because the clang is built for ubuntu? But it works without sandbox, and, even more, I was fine using prebuilt clang binaries for ubuntu previously, so the problem seems to lie in interaction between clang linker and bazel sandboxing... No idea how to debug this though.

DCNick3 avatar Oct 03 '21 18:10 DCNick3

Hey, thank you for raising this. I'm glad you like the project. I've had a brief look into this, and I can repro the issue. I really do enjoy Arch, but I no longer use it as my primary dev environment, and it isn't officially supported by Bazel either. So given that there is a simple workaround (e.g. --spawn_strategy=local), I'm unlikely to prioritise fixing this issue. I am, however, working on getting this toolchain to work with sandboxfs, which may or may not fix this problem as a side effect, so I will keep this issue open in case I inadvertently fix it.

The path that you mentioned actually comes from the Debian sysroot that I am pulling all the libraries out of. The toolchain actually ignores all the system dependencies (until you run the tests, at which point it hands over to the system's dynamic linker). Given that this is happening during the build phase, I don't think that this is an issue with the system dependencies.

As a little side note, some libxyz.so files are actually small linker scripts that the linker uses to find where the actual shared library is. For example, if you run cat /usr/lib/libm.so on Arch, the output will be:

/* GNU ld script
*/
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /usr/lib/libm.so.6  AS_NEEDED ( /usr/lib/libmvec.so.1 ) )

You can sort of think about this linker script as a symlink that carries some extra version info.

The path that you have mentioned, /lib/x86_64-linux-gnu/libm.so.6, is actually the path from the top of the sysroot to the shared lib. In this case the sysroot is external/debian_stretch_amd64_sysroot, so you can print out the linker script that is causing this issue, e.g. cat $(bazel info output_base)/external/debian_stretch_amd64_sysroot/usr/lib/x86_64-linux-gnu/libm.so, which outputs:

/* GNU ld script
*/
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libm.so.6  AS_NEEDED ( /usr/lib/x86_64-linux-gnu/libmvec_nonshared.a /lib/x86_64-linux-gnu/libmvec.so.1 ) )

This shows the offending library that the linker supposedly can't find (but which is actually there). I would need to dive much deeper to work out why it can find the file in the Ubuntu sandbox but not the Arch one, and that will probably take me some time.
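To see which files a sysroot linker script pulls in, and whether they are actually present under a given root directory, something like the following Python sketch could help. It is an illustration only: the function names are made up for this example, the regex only handles simple GROUP/AS_NEEDED scripts like the one quoted above, and the demo builds a throwaway directory instead of touching the real sysroot.

```python
import re
import tempfile
from pathlib import Path

# The libm.so linker script from the Debian sysroot, as quoted above.
SCRIPT = """/* GNU ld script
*/
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libm.so.6  AS_NEEDED ( /usr/lib/x86_64-linux-gnu/libmvec_nonshared.a /lib/x86_64-linux-gnu/libmvec.so.1 ) )
"""


def referenced_paths(script_text):
    """Return the absolute paths named in a simple GNU ld script, in order."""
    # Drop /* ... */ comments so their contents are not mistaken for paths.
    body = re.sub(r"/\*.*?\*/", "", script_text, flags=re.DOTALL)
    # A path starts with '/' at the start of a token and runs until
    # whitespace or a parenthesis.
    return re.findall(r"(?<![^\s(])/[^\s()]+", body)


def missing_under(root, script_text):
    """List the referenced paths that do not exist below `root`."""
    return [
        p for p in referenced_paths(script_text)
        if not (Path(root) / p.lstrip("/")).exists()
    ]


def demo():
    # Hypothetical stand-in for a sandbox that contains libm.so.6 only.
    with tempfile.TemporaryDirectory() as root:
        lib_dir = Path(root) / "lib" / "x86_64-linux-gnu"
        lib_dir.mkdir(parents=True)
        (lib_dir / "libm.so.6").touch()
        return missing_under(root, SCRIPT)


missing = demo()
print(missing)
```

Pointing missing_under at a directory saved with --sandbox_debug (plus the sysroot path inside it) would list the files the link action could not see, which are candidates for being declared as inputs.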

As a workaround to use this toolchain on Arch, I would recommend using --spawn_strategy=local, which will allow for a successful build whilst sacrificing some (but not all) sandboxing. I also noticed that you will need to install the LLVM project's libc++ implementation (pacman -S libc++). After that, bazel run //main:hello-world --spawn_strategy=local should run as expected.

I would be happy to accept a PR if you do find a fix.

nathaniel-brough avatar Oct 04 '21 09:10 nathaniel-brough

Thanks a lot for the input. I was actually investigating this myself and had found some of the things that you pointed out, but hadn't posted them yet (whoops). I will keep them here for the sake of completeness.

It seems that the reference to /lib/x86_64-linux-gnu/libm.so.6 comes from /usr/lib/x86_64-linux-gnu/libm.so in the sysroot, and Arch Linux has a different location for libm, so this fails. This should be fine, as ld is designed to resolve such paths inside the sysroot even when a linker script is used (and this is probably true for lld). (link1, link2).

What I found out additionally may seem like a bigger problem though.

I used --sandbox_debug to make bazel save the exec prefix and look around in it. And actually it seems that bazel does not put everything required in it.

image

Here you can see the libm.so symlink, but it's just a linker script; there's no symlink to libm.so.6.

Furthermore, if I do a dirty test by creating the /usr/lib/x86_64-linux-gnu/ directory and putting all the required library filenames there (as empty text files), ld no longer complains about not finding the files, but merely about missing symbols.

$ tree /usr/lib/x86_64-linux-gnu/
/usr/lib/x86_64-linux-gnu/
├── ld-linux-x86-64.so.2
├── libc_nonshared.a
├── libc.so.6
├── libm.so.6
├── libmvec.so.1
└── libpthread.so.0
< ... >
ld.lld: error: undefined symbol: ungetc
>>> referenced by iostream.cpp
>>>               iostream.cpp.o:(std::__1::__stdinbuf<char>::pbackfail(int)) in archive external/clang_llvm_12_00_x86_64_linux_gnu_ubuntu_16_04/lib/libc++.a
>>> referenced by iostream.cpp
>>>               iostream.cpp.o:(std::__1::__stdinbuf<char>::__getchar(bool)) in archive external/clang_llvm_12_00_x86_64_linux_gnu_ubuntu_16_04/lib/libc++.a
>>> referenced by iostream.cpp
>>>               iostream.cpp.o:(std::__1::__stdinbuf<wchar_t>::pbackfail(unsigned int)) in archive external/clang_llvm_12_00_x86_64_linux_gnu_ubuntu_16_04/lib/libc++.a
>>> referenced 1 more times

ld.lld: error: undefined symbol: getc
>>> referenced by iostream.cpp
>>>               iostream.cpp.o:(std::__1::__stdinbuf<char>::__getchar(bool)) in archive external/clang_llvm_12_00_x86_64_linux_gnu_ubuntu_16_04/lib/libc++.a
>>> referenced by iostream.cpp
>>>               iostream.cpp.o:(std::__1::__stdinbuf<char>::__getchar(bool)) in archive external/clang_llvm_12_00_x86_64_linux_gnu_ubuntu_16_04/lib/libc++.a
>>> referenced by iostream.cpp
>>>               iostream.cpp.o:(std::__1::__stdinbuf<wchar_t>::__getchar(bool)) in archive external/clang_llvm_12_00_x86_64_linux_gnu_ubuntu_16_04/lib/libc++.a
>>> referenced 1 more times

ld.lld: error: too many errors emitted, stopping now (use -error-limit=0 to see all errors)
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
1633341624.351431038: src/main/tools/linux-sandbox-pid1.cc:423: wait returned pid=2, status=0x100
1633341624.351441408: src/main/tools/linux-sandbox-pid1.cc:441: child exited normally with code 1
1633341624.351645217: src/main/tools/linux-sandbox.cc:233: child exited normally with code 1
Target //main:hello-world failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.300s, Critical Path: 0.18s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

It seems that the toolchain breaches the hermeticity and uses system libraries on ubuntu with basic sandboxing enabled (without sandboxfs).

When the sandboxing is disabled though, the exec prefix is not created and everything works fine, as the linker finds all the libraries it needs in the sysroot.

Now the question is how to prevent lld from looking for libraries outside of the sysroot and make bazel put everything required in the exec prefix...

DCNick3 avatar Oct 04 '21 10:10 DCNick3

This seems to be the offending line. If it doesn't find the file in the sysroot it just falls through to look into the system libs.

DCNick3 avatar Oct 04 '21 10:10 DCNick3

GNU ld seems to have similar behavior, though I am not sure this is the function I am looking for:

/* Search for and open the file specified by ENTRY.  If it is an
   archive, use ARCH, LIB and SUFFIX to modify the file name.  */

bool
ldfile_open_file_search (const char *arch,
                         lang_input_statement_type *entry,
                         const char *lib,
                         const char *suffix)
{
  search_dirs_type *search;

  /* If this is not an archive, try to open it in the current
     directory first.  */
  if (!entry->flags.maybe_archive)
    {
      if (entry->flags.sysrooted && IS_ABSOLUTE_PATH (entry->filename))
        {
          char *name = concat (ld_sysroot, entry->filename,
                               (const char *) NULL);
          if (ldfile_try_open_bfd (name, entry))
            {
              entry->filename = name;
              return true;
            }
          free (name);
        }
      else if (ldfile_try_open_bfd (entry->filename, entry))
        return true;

      if (IS_ABSOLUTE_PATH (entry->filename))
        return false;
    }
< ... >

One possible way I see to force the linker to look inside the sysroot is to patch all the paths in the ld scripts to be prefixed with =, which will be replaced with the sysroot path.

DCNick3 avatar Oct 04 '21 10:10 DCNick3

Firstly thank you for the detailed report, it certainly helps with tracking things down.

Here you can see libm.so symlink, but it's just a linker script, there's no symlink to libm.so.6.

I think you might be onto something here. I've got a couple of things that I can try in this case. Namely, I think there needs to be an additional lib here, e.g. additional_libs = ["usr/lib/x86_64-linux-gnu/libm.so.6"], which should place that file in the sandbox.

This seems to be the offending line. If it doesn't find the file in the sysroot it just falls through to look into the system libs.

For reference, I've tracked down the file you linked with the same version of LLVM that the toolchain uses.

Now the question is how to prevent lld from looking for libraries outside of the sysroot and make bazel put everything required in the exec prefix...

I am kinda unsure about this; I find the idea of breaking hermeticity quite concerning. Currently the toolchain links with -nostdlib (man page), so even if you are right and the toolchain is somehow breaking hermeticity, it shouldn't be looking at system libs. From the man page:

--nostdlib Only search directories specified on the command line.

For reference, if you compile with --verbose_failures --linkopt=-v you should get the full command line that clang passes through to ld.lld.

I have a feeling that there is something far more difficult to debug going on between different sandboxing mechanisms. Either way, if we get this working I will try to add an Arch Linux runner to GitHub Actions to prevent regressions.

nathaniel-brough avatar Oct 05 '21 02:10 nathaniel-brough

Ok, so based on your feedback, I think I have fixed your original issue. However, I still haven't tracked down why the sandbox behaves differently on Ubuntu vs Arch. So I'm just going to wait on a review for the PR, and then I'll close this.

I'd be happy to take a PR with a GitHub action for Arch Linux via Docker if you have time. That way, I'll be less likely to break Arch support in future. Otherwise, I'll likely get to it when I have more time. I'm thinking something along the lines of:

docker
├── arch-linux
│   └── Dockerfile
└── ubuntu
    └── Dockerfile
etc....
.github
└── workflows
    ├── arch.yml
    └── blank.yml
etc....

I appreciate your help!

nathaniel-brough avatar Oct 05 '21 03:10 nathaniel-brough

Currently the toolchain links with -nostdlib (man page), so even if you are right and the toolchain is somehow breaking hermeticity, it shouldn't be looking at system libs

I think the problem is how sysroot handling logic interplays with the ld scripts:

The ld scripts ask the linker to resolve an absolute path like /lib/x86_64-linux-gnu/libm.so.6

The linker (both llvm and gnu ld) seems to use (roughly) the following algorithm:

function lookup(path) {
  if (path is absolute) {
    // look in sysroot
    if (sysrooted && exists(sysroot/path))
      return sysroot/path;
    // look in system
    if (exists(path))
      return path;
    throw FileNotFound();
  }
  <...>
}

The problem is that even though the linker looks in the sysroot first, if it does not find the file there it falls back to the absolute path as-is. -nostdlib does nothing here, because it only affects library search paths, and here we already have an absolute path which does not require any search.
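To make the fallthrough concrete, here is a small runnable Python version of the pseudocode above. It is only a model of the behavior described in this comment, not a transcription of lld or GNU ld source; the lookup and demo names and the temporary directories standing in for the sysroot and the host filesystem are all made up for the illustration.

```python
import os
import tempfile


def lookup(path, sysroot, sysrooted=True):
    """Resolve an absolute path from an ld script as described above."""
    if not os.path.isabs(path):
        raise NotImplementedError("library search paths are out of scope here")
    if sysrooted:
        # First attempt: the same path, re-rooted under the sysroot.
        candidate = os.path.join(sysroot, path.lstrip("/"))
        if os.path.exists(candidate):
            return candidate
    # Fallthrough: the path is tried as-is, outside the sysroot.
    if os.path.exists(path):
        return path
    raise FileNotFoundError(path)


def demo():
    with tempfile.TemporaryDirectory() as sysroot, \
            tempfile.TemporaryDirectory() as host:
        # A library that exists inside the (fake) sysroot.
        inside = os.path.join(sysroot, "lib", "libm.so.6")
        os.makedirs(os.path.dirname(inside))
        open(inside, "w").close()
        # A library that exists only on the (fake) host filesystem.
        host_only = os.path.join(host, "libmvec.so.1")
        open(host_only, "w").close()

        from_sysroot = lookup("/lib/libm.so.6", sysroot) == inside
        leaked = lookup(host_only, sysroot) == host_only
        try:
            lookup(os.path.join(host, "absent.so"), sysroot)
            not_found = False
        except FileNotFoundError:
            not_found = True
        return from_sysroot, leaked, not_found


from_sysroot, leaked, not_found = demo()
print(from_sysroot, leaked, not_found)
```

The second case is the one that matters for hermeticity: the file is absent from the sysroot, yet the lookup succeeds because the same absolute path happens to exist on the host.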

One way you can fix this potential issue (that is, silently using system libs when a library can't be found in the sysroot) is to replace the absolute paths in the ld scripts with paths prefixed with =. The = will be replaced with the sysroot path, and the linker would fail if the file did not exist in the sysroot.
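A rough sketch of what such a patching step could look like in Python is below. It is an assumption-laden illustration, not something the toolchain currently does: sysroot_prefix_paths is a hypothetical helper, and the regex only covers simple scripts of the GROUP/AS_NEEDED kind shown earlier in this thread.

```python
import re

# The libm.so linker script from the Debian sysroot, as quoted earlier.
SCRIPT = """/* GNU ld script
*/
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libm.so.6  AS_NEEDED ( /usr/lib/x86_64-linux-gnu/libmvec_nonshared.a /lib/x86_64-linux-gnu/libmvec.so.1 ) )
"""


def sysroot_prefix_paths(script_text):
    """Prefix each absolute path in a simple ld script with '='."""
    # Splitting on a capturing group keeps the comments: even indices are
    # script text, odd indices are /* ... */ comments left untouched.
    parts = re.split(r"(/\*.*?\*/)", script_text, flags=re.DOTALL)
    for i in range(0, len(parts), 2):
        # Only match '/' at the start of a token, so a path that already
        # begins with '=' is not prefixed a second time.
        parts[i] = re.sub(r"(?<![^\s(])(/[^\s()]+)", r"=\1", parts[i])
    return "".join(parts)


patched = sysroot_prefix_paths(SCRIPT)
print(patched)
```

Running the function on its own output leaves the script unchanged, so it would be safe to apply repeatedly, for example as a patch step when the sysroot archive is fetched.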

Maybe you can also patch LLVM (or whichever linker is used) to change the sysroot handling logic, but I think that is a bit more pain than editing the sysroot.

I think I have fixed your original issue

Doesn't this break the idea of how ld scripts should work? They have some logic in them that decides whether the additional libraries get used, i.e. mvec.so is inside AS_NEEDED, which will not link the binary to mvec.so if no symbols are used from it, but putting it in additional_libs links it unconditionally.

dcnick3@dcnick3-arch:~/git_cloned/bazel_rules_cc_toolchain_test$ ldd bazel-bin/main/hello-world
        linux-vdso.so.1 (0x00007ffe8029b000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f8618008000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f8617ec4000)
        libmvec.so.1 => /usr/lib/libmvec.so.1 (0x00007f8617e98000)
        librt.so.1 => /usr/lib/librt.so.1 (0x00007f8617e8d000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f8617e6c000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f8617ca0000)
        libc++.so.1 => /usr/lib/libc++.so.1 (0x00007f8617bc6000)
        libc++abi.so.1 => /usr/lib/libc++abi.so.1 (0x00007f8617b86000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f8617b6b000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f861806d000)

I think you might be better off adding a way of specifying additional dependencies without having to link with them, so that the ld scripts will do their thing

DCNick3 avatar Oct 05 '21 06:10 DCNick3

The linker (both llvm and gnu ld) seems to use (roughly) the following algorithm:

Yeah, that's true. This might be an upstream bug, but I'm not knowledgeable enough to say for sure. If this is what is happening, how is it that the sandbox in Arch prevented this from occurring but the sandbox in Ubuntu did not?

Maybe you can also patch the llvm or whatever to change the sysroot handling logic, but I think it is a bit more pain than editing the sysroot.

I can definitely look into this, it shouldn't be too hard to do. Though patching LLVM will likely take some time before it makes it into a binary release and I don't have much interest in publishing a forked binary distro of llvm. The sysroot that I am using is hosted by the chromium team (but based directly on the stock Debian filesystem).

@akirabaruah Do you think that there would be any resistance to hosting a modified sysroot?

Doesn't this break the idea of how ld scripts should work? They have some logic in them that decides whether the additional libraries get used, i.e. mvec.so is inside AS_NEEDED, which will not link the binary to mvec.so if no symbols are used from it, but putting it in additional_libs links it unconditionally.

Nah, the additional libs attribute doesn't affect the command line at all. It just allows bazel to track the other files that the linker needs. That's why it fixes the sandbox issue on Arch. In this case, it is still up to ld.lld to determine if the library is needed. Linking these AS_NEEDED libs is not unconditional.

Granted there is room for improvement in determining which libs are always required and which are not. e.g. libmvec should probably be an additional_lib for libm rather than explicitly linked.

I think you might be better off adding a way of specifying additional dependencies without having to link with them, so that the ld scripts will do their thing

This is the motivation behind having an additional libraries attribute, that's separate from the shared/static library attributes.

nathaniel-brough avatar Oct 06 '21 05:10 nathaniel-brough

If this is what is happening how is it that the sandbox in Arch prevented this from occurring but the sandbox in Ubuntu did not?

What I think happened is it found the libraries on ubuntu breaching the hermeticity (because the path specified exists in the system), but did not find it on arch (because the libraries have different paths).

This is the motivation behind having an additional libraries attribute, that's separate from the shared/static library attributes.

I see. Maybe you should rephrase the docstring with emphasis on this, like Additional files that will be added to link action dependencies, but not passed to linker directly.

DCNick3 avatar Oct 06 '21 07:10 DCNick3

What I think happened is it found the libraries on ubuntu breaching the hermeticity (because the path specified exists in the system), but did not find it on arch (because the libraries have different paths).

Ah yeah, righto that makes sense. Do you want to go ahead and file a bug with LLVM, then we can see if they are likely to accept patches? In the meantime, I'll have a look at what it would take to modify the sysroot.

I see. Maybe you should rephrase the docstring with emphasis on this, like Additional files that will be added to link action dependencies, but not passed to linker directly.

I've broken this out into a separate issue #27.

nathaniel-brough avatar Oct 07 '21 06:10 nathaniel-brough