error: TemporaryNameServerFailure when using package management on Termux

Open leap0x7b opened this issue 2 years ago • 14 comments

Zig Version

0.11.0-dev.1606+3c2a43fdc

Steps to Reproduce and Observed Behavior

  1. Create a Zig project using either zig init-lib or zig init-exe
  2. Create a build.zig.zon file:
.{
    .name = "viisi",
    .description = "A RISC-V hobby computer inspired by old 80s/90s UNIX workstations",
    .version = "0.1.0",
    .dependencies = .{
        .clap = .{
            .url = "https://github.com/Hejsil/zig-clap/archive/272d8e2088b2cae037349fb260dc05ec46bba422.tar.gz",
        },
    },
}
  3. Run zig build

Expected Behavior

It should be able to resolve github.com and download the file.

leap0x7b avatar Feb 13 '23 09:02 leap0x7b

I'm also getting this on Arch Linux with systemd-networkd and systemd-resolved. Working just fine on macOS though. Not sure if this helps but:

$ strace --trace=network zig build test
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(8, "1\334\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "2\245\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "1\334\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "2\245\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
error: TemporaryNameServerFailure
+++ exited with 1 +++

hryx avatar Feb 17 '23 07:02 hryx

A related issue, #14900, was solved recently. Did it solve this issue as well?

andrewrk avatar Apr 10 '23 20:04 andrewrk

I see a different error now, error.ConnectionFailed. The strace output looks identical to before. Let me know if there are any better diagnostics I can provide.

hryx avatar Apr 11 '23 02:04 hryx

Still doesn't work; I got error.ConnectionFailed as well.

leap0x7b avatar Apr 11 '23 05:04 leap0x7b

I went to debug this by altering std.http.Client, but got a strange result before I even changed any code.

I checked out 602029bb2 (commit of latest release at the time of this test), built stage3 (Release) and stage4 (Debug), and ran zig build on a project with dependencies, making sure to clear the global package cache between runs. Both of those locally built zig executables successfully fetched deps.

But I downloaded the official release with the same commit hash and got error.ConnectionFailed as before. I would run the official build through a debugger but of course it is stripped, so I'm trying to think of a next step.

hryx avatar Apr 13 '23 03:04 hryx

@hryx,

just to rule it out, does this run without error?

const std = @import("std");

pub fn main() !void {
    var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
    const gpa = general_purpose_allocator.allocator();

    var http_client: std.http.Client = .{ .allocator = gpa };
    defer http_client.deinit();

    const uri = try std.Uri.parse("http://github.com");
    var req = try http_client.request(uri, .{}, .{});
    defer req.deinit();
}

mikdusan avatar Apr 13 '23 19:04 mikdusan

@mikdusan It does not! Good idea to test by using an HTTP client directly.

Error output:

error: ConnectionFailed
/home/hryx/lib/std/net.zig:45:9: 0x46a95a in parseIp (main)
        return error.InvalidIPAddressFormat;
        ^
/home/hryx/lib/std/net.zig:75:29: 0x46a549 in parseExpectingFamily (main)
            os.AF.UNSPEC => return parseIp(name, port),
                            ^
/home/hryx/lib/std/net.zig:1414:48: 0x476584 in linuxLookupNameFromDns (main)
    if (ap[0].len < 4 or (ap[0][3] & 15) == 2) return error.TemporaryNameServerFailure;
                                               ^
/home/hryx/lib/std/net.zig:1358:5: 0x4778ec in linuxLookupNameFromDnsSearch (main)
    return linuxLookupNameFromDns(addrs, canon, name, family, rc, port);
    ^
/home/hryx/lib/std/net.zig:996:17: 0x4785e7 in linuxLookupName (main)
                try linuxLookupNameFromDnsSearch(addrs, canon, name, family, port);
                ^
/home/hryx/lib/std/net.zig:933:9: 0x428661 in getAddressList (main)
        try linuxLookupName(&lookup_addrs, &canon, name, family, flags, port);
        ^
/home/hryx/lib/std/net.zig:709:18: 0x35685f in tcpConnectToHost (main)
    const list = try getAddressList(allocator, name, port);
                 ^
/home/hryx/lib/std/http/Client.zig:896:9: 0x3209ac in connect (main)
        return error.ConnectionFailed;
        ^
/home/hryx/lib/std/http/Client.zig:994:23: 0x30f194 in request (main)
        .connection = try client.connect(host, port, protocol),
                      ^
/home/hryx/tmp/issue14636/main.zig:11:15: 0x30e942 in main (main)
    var req = try http_client.request(uri, .{}, .{});
              ^

I added a log right before the error return in linuxLookupNameFromDns with the values of ap:

info: ap0: 7c8f850200010000000000000667697468756203636f6d0000010001 = githubcom
info: ap1: 7f20850200010000000000000667697468756203636f6d00001c0001 = Tgithubcom
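As a reading aid, here is a minimal sketch that pulls the response code out of the ap0 header bytes logged above. In the DNS header layout from RFC 1035, RCODE is the low four bits of the fourth byte, which is what the `ap[0][3] & 15` check in the trace inspects; a value of 2 means "server failure" (SERVFAIL). The helper name `dnsRcode` is made up for this illustration and is not part of std.net.

```zig
const std = @import("std");

/// Return the 4-bit RCODE from the header of a raw DNS message,
/// or null if the message is too short to contain the flags field.
/// (Hypothetical helper, not a std.net function.)
fn dnsRcode(msg: []const u8) ?u8 {
    if (msg.len < 4) return null;
    return msg[3] & 0x0f;
}

pub fn main() !void {
    // First 12 bytes (the DNS header) of the `ap0` dump above.
    const header = [_]u8{
        0x7c, 0x8f, // transaction ID
        0x85, 0x02, // flags; low nibble of the second byte is RCODE
        0x00, 0x01, // QDCOUNT = 1
        0x00, 0x00, // ANCOUNT = 0
        0x00, 0x00, // NSCOUNT = 0
        0x00, 0x00, // ARCOUNT = 0
    };
    const rcode = dnsRcode(&header) orelse return error.ShortMessage;
    std.debug.print("rcode = {d}\n", .{rcode}); // prints "rcode = 2"
}
```

So both logged packets are replies in which the server at the other end reported a failure with zero answers, rather than malformed or truncated responses.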

hryx avatar Apr 13 '23 19:04 hryx

I fixed the issue on my machine. I'll report what I found in case it helps @leap0x7b or others, but I'm left to speculate about the original root cause.

The issue was mundane: my /etc/resolv.conf was supposed to be a symlink to /run/systemd/resolve/stub-resolv.conf, but it was a plain file. When systemd-resolved is used, it needs to be a link to the stub file. The solution was to remove /etc/resolv.conf and make it a link, but it could probably also be solved by reinstalling the relevant package.

My guess is that my system got into a bad state long ago but didn't show symptoms until the Zig HTTP client tried to resolve DNS by reading resolv.conf directly. It was an empty file (except for a comment), so name resolution obviously failed. I don't understand why every other program was still able to use DNS — maybe glibc falls back to discovering a local nameserver, while musl doesn't. In fact, Zig built locally with glibc was able to make these HTTP requests, even with my previously bad system state.
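To make the "empty except for a comment" case concrete, the sketch below counts `nameserver` entries in resolv.conf-style text. It is only an illustration of the file format, not the parser in std/net.zig; the function name `countNameservers` and both sample strings are invented (the actual file contents are not shown in this thread), and it assumes Zig 0.11 or later for `std.mem.tokenizeScalar` and `std.mem.tokenizeAny`.

```zig
const std = @import("std");

/// Count `nameserver` entries in resolv.conf-style text, skipping
/// blank lines and comment lines that start with '#' or ';'.
/// (Hypothetical helper for illustration only.)
fn countNameservers(text: []const u8) usize {
    var count: usize = 0;
    var lines = std.mem.tokenizeScalar(u8, text, '\n');
    while (lines.next()) |raw| {
        const line = std.mem.trim(u8, raw, " \t\r");
        if (line.len == 0 or line[0] == '#' or line[0] == ';') continue;
        var words = std.mem.tokenizeAny(u8, line, " \t");
        const keyword = words.next() orelse continue;
        // Only count the entry if an address actually follows the keyword.
        if (std.mem.eql(u8, keyword, "nameserver") and words.next() != null) count += 1;
    }
    return count;
}

pub fn main() void {
    // Invented samples: a comment-only file and a stub-style file.
    const comment_only = "# This file is managed elsewhere\n";
    const stub = "nameserver 127.0.0.53\noptions edns0 trust-ad\n";
    std.debug.print("comment-only: {d}\n", .{countNameservers(comment_only)}); // 0
    std.debug.print("stub file:    {d}\n", .{countNameservers(stub)}); // 1
}
```

A resolver that reads the file directly and finds zero entries has to fall back to some default. The strace output earlier in the thread shows queries going to 127.0.0.1, which would be consistent with such a fallback, though the thread does not confirm it.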

As for how my system got into that state, who knows, but I have had at least one service that modifies /etc/resolv.conf, and it could have been a since-fixed bug in the service or even package install script. I wasn't able to reproduce the issue after reinstalling Tailscale and NetworkManager, for example.

Anyway, no Zig bug here as far as I am concerned.

hryx avatar Apr 14 '23 22:04 hryx

On Android (termux), /etc/resolv.conf doesn't exist

const std = @import("std");

pub fn main() !void {
    var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
    const gpa = general_purpose_allocator.allocator();

    var http_client: std.http.Client = .{ .allocator = gpa };
    defer http_client.deinit();

    const uri = try std.Uri.parse("http://github.com");
    var req = try http_client.request(.GET, uri, .{
        .allocator = gpa,
    }, .{});
    defer req.deinit();
}

openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)

wget and other programs work fine

$PREFIX/etc/resolv.conf does exist (/data/data/com.termux/files/usr/etc/resolv.conf)

It seems packages in the termux package manager have to patch this path: https://github.com/termux/termux-packages/pull/14738/files

Applying these patches to std/net.zig appears to fix the issue for using http_client.request, but zig would have to be recompiled with these changes to fix zig build:

-   const file = fs.openFileAbsoluteZ("/etc/hosts", .{}) catch |err| switch (err) {
+   const file = fs.openFileAbsoluteZ("/data/data/com.termux/files/usr/etc/hosts", .{}) catch |err| switch (err) {
-   const file = fs.openFileAbsoluteZ("/etc/resolv.conf", .{}) catch |err| switch (err) {
+   const file = fs.openFileAbsoluteZ("/data/data/com.termux/files/usr/etc/resolv.conf", .{}) catch |err| switch (err) {

I'm not sure what to do about this:

  • Termux can't put things in the standard locations because of android restrictions: https://wiki.termux.com/wiki/Differences_from_Linux
  • Zig wants to ship static binaries that can run on any linux machine, but any zig application that accesses the internet will need to be rebuilt and packaged for termux to fix the hardcoded locations of these system files

Some possible solutions:

  • Do nothing, require termux users to install zig from the system package manager or build it themselves. Binaries built with zig for linux are only for distros conforming to the Filesystem Hierarchy Standard
    • This doesn't seem ideal, zig from package managers is often at least six months out of date
  • Use $PREFIX and change the paths for /etc/resolv.conf and /etc/hosts
    • Similar to #15896 but $PREFIX would have to be preferred - android has its own /etc/hosts file when termux's should be used instead
    • Is $PREFIX any kind of standard environment variable? Will some people have this set for a different reason and this change would cause undesired behaviour for them?
  • Require a special environment variable to be set to request different locations for these system files
  • Something else

pfgithub avatar Aug 01 '23 22:08 pfgithub

Until the problem is solved in zig, the likely temporary solution is laid out on the page you linked, https://wiki.termux.com/wiki/Differences_from_Linux. You can use termux-chroot from proot to get back FHS compliance for now.

truemedian avatar Aug 23 '23 23:08 truemedian

+1, I'm also hitting this problem on my Ubuntu laptop. For some reason, specifically github.com takes 10 seconds to resolve when pinging (github.io is fast). Git operations are also delayed by those 10 seconds but eventually work. The Zig package manager however fails with TemporaryNameServerFailure.

(PS: zig 0.12.0-dev.2236+32e88251e)

PS: it works after changing DNS servers from my router to 8.8.8.8 and 8.8.4.4 (make sure that systemd actually sees those changes via resolvectl status). I had to log out and back in to the desktop session after changing the DNS settings in the KDE control panel.

PPS: FWIW, on my Mac it's always fast, no matter if my router is used for DNS or the Google DNS servers.

floooh avatar Jan 16 '24 18:01 floooh

Great stuff @pfgithub!

Having looked at #14146 a bit, I also got interested in wanting to get this fixed. Unfortunately I don't have an awesome idea on how to solve it after the $PREFIX idea was rejected for that one.

I think adding an --dns x.y.z.w argument for zig fetch and zig build makes the most sense, and it should be okay in practice as well since it should only be required when (re)fetching dependencies.

There could also be fallback nameservers for popular providers like 1.1.1.1 or 8.8.8.8 but I'm not as hyped on that idea since it could bypass what the user actually wants, for example if they have a resolv.conf elsewhere (like in the case of Termux.)

JerwuQu avatar Feb 05 '24 19:02 JerwuQu

@JerwuQu Wouldn't this mean any program using zig http would need to have its own --dns argument? This is a problem with the HTTP client in the standard library, not just zig fetch and zig build

pfgithub avatar Feb 06 '24 16:02 pfgithub

@pfgithub That is very true. I honestly didn't consider that at first, only thinking about the zig build angle.

In that case I feel more strongly that an environment variable for specifying DNS servers (or an alternative resolv.conf path) is needed. I think specifying nameservers is better than pointing to an alternative resolv.conf path, since it then turns into a feature that could also be used by FHS systems rather than a workaround for non-FHS systems. This could be in the same category as the widely used HTTP_PROXY and friends, which I think are also reasonable for the stdlib HTTP client to respect.

I searched quite a bit and couldn't find any good names that are shared between multiple projects. One project is resolvconf-override, which uses NAMESERVER1 and NAMESERVER2. Another project I found is SkyDNS, which uses SKYDNS_NAMESERVERS. It seems most large projects respect resolv.conf with an argument override (e.g. cURL). I believe the main reason for this is that there isn't a universally used environment variable already.

I therefore see two options:

  • Go with NAMESERVERS or DNS_NAMESERVERS and let Zig applications set a precedent for other projects. Perhaps in the future we would see this supported as commonly as HTTP_PROXY.
  • Go with ZIG_NAMESERVERS to solve the specific Zig case.

The downside here is that package maintainers can't reasonably set this variable for their users, and would still need to patch the stdlib to get their resolv.conf as default. Requiring users of Zig and Zig-made projects to set this envvar just to have networked applications working is not desirable. Should there then be a flag for compilation (same realm as #18778) to set where the system resolv.conf is located?
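For illustration only, this is roughly what parsing the value of such a variable could look like, assuming it carried a comma-separated list of addresses. Nothing here exists in Zig today: the value format, the helper name `parseNameservers`, and the cap of three entries are all assumptions made for the sketch, and it takes the string directly rather than reading the environment. It assumes Zig 0.11 or later for `std.mem.tokenizeScalar`.

```zig
const std = @import("std");

/// Split a comma-separated nameserver list into `out` and return the
/// filled part. Entries beyond the capacity of `out` are ignored.
/// (Hypothetical helper; no such variable or parser exists in std.)
fn parseNameservers(value: []const u8, out: [][]const u8) [][]const u8 {
    var n: usize = 0;
    var it = std.mem.tokenizeScalar(u8, value, ',');
    while (it.next()) |raw| {
        const entry = std.mem.trim(u8, raw, " \t");
        if (entry.len == 0) continue;
        if (n == out.len) break;
        out[n] = entry;
        n += 1;
    }
    return out[0..n];
}

pub fn main() void {
    // Capacity of three is an arbitrary choice for this sketch.
    var buf: [3][]const u8 = undefined;
    // Stand-in for the value of a hypothetical ZIG_NAMESERVERS variable.
    const servers = parseNameservers("1.1.1.1, 8.8.8.8", &buf);
    for (servers) |s| std.debug.print("nameserver {s}\n", .{s});
}
```

A real implementation would still need to validate each entry as an IP address and decide how the override interacts with an existing resolv.conf.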

I would be keen to hear the thoughts of a Zig maintainer on this issue.

JerwuQu avatar Feb 06 '24 17:02 JerwuQu

I just use "proot-distro" within Termux https://github.com/termux/proot-distro?tab=readme-ov-file#installing

Install and log in as root:

pd install alpine
pd sh alpine

My fish config has this line. The user must be created first; the alias mounts the Termux home:

alias home="pd sh --user user --termux-home alpine"

You can also just mount resolv.conf with proot: https://github.com/termux/termux-app/issues/869#issuecomment-433985523

I tried that, but zig prints a warning about the linker and it's annoying.

tamadamas avatar Mar 12 '25 19:03 tamadamas

> Great stuff @pfgithub!
>
> Having looked at #14146 a bit, I also got interested in wanting to get this fixed. Unfortunately I don't have an awesome idea on how to solve it after the $PREFIX idea was rejected for that one.
>
> I think adding an --dns x.y.z.w argument for zig fetch and zig build makes the most sense, and it should be okay in practice as well since it should only be required when (re)fetching dependencies.
>
> There could also be fallback nameservers for popular providers like 1.1.1.1 or 8.8.8.8 but I'm not as hyped on that idea since it could bypass what the user actually wants, for example if they have a resolv.conf elsewhere (like in the case of Termux.)

I am facing this same issue on Termux: a simple fetch to example.com raises this error, and I have not found any solution.

rotleaf avatar Apr 15 '25 16:04 rotleaf

> I think adding an --dns x.y.z.w argument for zig fetch and zig build makes the most sense, and it should be okay in practice as well since it should only be required when (re)fetching dependencies.

I think fixing the DNS resolution logic makes the most sense... this workaround should not be necessary.

andrewrk avatar Apr 15 '25 16:04 andrewrk