nix Copy local flakes to the store lazily

Currently flakes are evaluated from the Nix store, so when using a local flake, it's first copied to the store. This means that

$ cd /path/to/nixpkgs
$ nix build .#hello

is a lot slower than the non-flake alternative

$ nix build -f . hello

Ideally, we would copy the flake to the store only when its outPath attribute is evaluated. However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files).
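
A minimal sketch of what "check every file against git ls-files" could look like from the command line. This is not how Nix implements or would implement the check; is_tracked is a hypothetical helper name, and the only real interface used is git ls-files with its --error-unmatch option.

```shell
# Sketch only: decide whether a path may be read, based on Git tracking.
# is_tracked is a hypothetical helper, not part of Nix.
#
# Succeeds (exit status 0) only if the path given as $2 is tracked in the
# Git checkout rooted at $1. The path is interpreted relative to that root.
is_tracked() {
  # --error-unmatch makes ls-files exit non-zero when the path is not in
  # the index, i.e. when it is untracked or does not exist.
  git -C "$1" ls-files --error-unmatch -- "$2" >/dev/null 2>&1
}
```

For example, is_tracked /path/to/nixpkgs flake.nix && echo "may be read" would print the message only for a tracked flake.nix.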

Oct 07 '19 12:10 edolstra

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/flakes-without-git-copies-entire-tree-to-nix-store/10743/2

Dec 30 '20 09:12 nixos-discourse

Still important.

Jun 28 '21 10:06 hmenke

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/my-painpoints-with-flakes/9750/20

Jul 26 '21 08:07 nixos-discourse

It would be nice if Nix could take advantage of the filesystem's native CoW functionality (if present) in order to speed up copying. We discussed this briefly in #offtopic:nixos.org.

Sep 03 '21 11:09 L-as

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-it-possible-to-make-a-flake-that-has-no-source-tree/16037/2

Nov 13 '21 04:11 nixos-discourse

I just hit this. My first attempt at a workaround was to remove the self arg from outputs (as without self I can't access the source tree at all). It turns out that makes it copy the tree and then throw an error about how the outputs function doesn't take a self arg.

Lazily copying the flake only when outPath is evaluated would be ideal, but being able to just drop the self arg to suppress the copy would be a great first step.

@L-as Taking advantage of CoW would be nice but it's not doable on macOS where the Nix store lives on a separate volume (separate volume group even).

Nov 13 '21 19:11 lilyball

Also for context, in my case the flake was not in a git repo, it was just in a folder. Copying to the nix store is unacceptable because the folder contains multiple git repos along with all their build artifacts. Copying a git repo to the Nix store at least would avoid copying untracked files, but in my case it had hundreds of thousands of files and multiple gigabytes of data to look at and copy.

Nov 13 '21 19:11 lilyball

@lilyball @L-as Taking advantage of CoW doesn't work on Linux either due to a VFS limitation: https://github.com/NixOS/nix/issues/5513

One thing I'd like to understand in this issue is why a local flake can't be evaluated "directly" just like the old default.nix-style file evaluation.
I know copying has benefits for hermetic evaluation and such but I don't need that, like, at all.
Sure, remote flakes should be copied to the Nix store and that's really great functionality but I see no point whatsoever in doing the same for local flakes that are already in the FS and not expected to change without the user's knowledge.

Nov 13 '21 21:11 Atemu

@Atemu from what I've read, it's to help enforce hermetic evaluation and avoid impurities. Presumably it also has advantages for code simplicity, because you don't need to write something separate for local flakes.

I agree it's not great UX for those of us who use flakes just to keep track of a dev shell, of course :)

Nov 14 '21 03:11 TLATER

it's to help enforce hermetic evaluation and avoid impurities.

And that's great but I don't see any point in hermetic eval on local files.

it's not great UX for those of us who use flakes just to keep track of a dev shell

It's also bad UX for anyone working on nix-built projects.
Correct me if I'm wrong here, but if I'm hacking on Nixpkgs to solve some bug in NixOS with dirty trees (because obviously, I'm hacking), Nix copies the entire 313MiB Nixpkgs checkout to the Nix store every time I eval.
Not only does that take quite a while (even on an SSD it's multiple seconds) but it also causes unnecessary writes. After 70 Nixpkgs evals, you've exhausted the expected daily writes to an SSD. That can't be good for endurance.

Is it just me or is that insane?
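
For scale, the write volume implied by the figures quoted above (313 MiB per copy, 70 evals) can be worked out as follows. This is arithmetic only; whether that volume exceeds a drive's expected daily writes depends on the drive's endurance rating, which is not stated here. store_writes_gib is a hypothetical helper name.

```shell
# Total data written to the store, in GiB, after a number of evaluations
# that each copy the whole checkout.
# $1 = checkout size in MiB, $2 = number of evaluations.
store_writes_gib() {
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v size_mib="$1" -v evals="$2" \
    'BEGIN { printf "%.1f\n", size_mib * evals / 1024 }'
}

store_writes_gib 313 70   # prints 21.4
```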

Nov 14 '21 10:11 Atemu

but I don't see any point in hermetic eval on local files.

You might not realize you're using local files, accidentally sneak in state, and then be surprised when it doesn't evaluate in deployment (and be all "wait, isn't nix supposed to prevent this?"). Even with fully local files, I'd expect things still to work if I move my directory to a new computer from a restored backup. While I've personally learned when and where local state might happen, it's still a safety net that I consider nice to have.

Of course, giant copies for the tiniest delta is way too much of a cost to incur for that, but this is why we're here - to make sure that flakes don't blow up SSDs all over the place when they finally become non-experimental ;)

Nov 14 '21 17:11 TLATER

You might not realize you're using local files, accidentally sneak in state, and then be surprised when it doesn't evaluate in deployment (and be all "wait, isn't nix supposed to prevent this?").

I don't understand what you mean by that.

How is copying the accidentally added state over to the Nix store first and then evaling it any better than just evaling it directly?

Even with fully local files, I'd expect things still to work if I move my directory to a new computer from a restored backup. While I've personally learned when and where local state might happen, it's still a safety net that I consider nice to have.

How is the location of the directory related to any of this? A direct eval of the same state of a directory in another location will have the same result. How should copying improve anything?

Nov 14 '21 21:11 Atemu

IIRC files that are tracked with git already (and changed) are being staged and then copied to the store. I can see how this ensures that at least the files are tracked and marked as updated (by staging them). I also kind of agree that I think this is the wrong solution to the problem, or perhaps a solution in search of a problem? Most of the time it is very expensive to copy my working directory into the store.

Since I can see why that feature is useful, I'd argue that it should be configurable if you want your flake repos to be copied to the store or not. As far as I know, the hash of the path that is added to the store is also currently used for the eval caching.

Perhaps the current implementation is a nice PoC of what a more proper hermetic eval could look like and what it gives us in terms of capabilities (caching, ...).

Nov 14 '21 23:11 andir

I can see how this ensures that at least the files are tracked and marked as updated (by staging them).

That sounds like a sound reason, but I can't see why that wouldn't be possible with direct eval too.

@edolstra could we get some insight from you here?

Nov 16 '21 08:11 Atemu

It's definitely possible, just more work as described in the initial post:

However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files).

Nix already has an "eval may only access these store paths" logic, but no "may only access tracked files of this Git checkout" logic yet, so I assume using the former was the simplest solution.

Nov 16 '21 11:11 Kha

Another important point @rnhmjoj mentioned in Discourse is security. A user can easily unknowingly expose private/secret information globally on a system by building a local flake.

Nov 16 '21 12:11 Atemu

However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files)

I guess this could also be implemented by creating a shallow copy of the flake directory (by creating a forest of symlinks to the original source tree rather than really copying it). That could already make things notably faster (not entirely free, but cheap-enough in most cases), and might be simpler to implement.
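
A rough sketch of the symlink-forest idea, outside of Nix. It only illustrates the shape of the approach (one symlink per tracked file, nothing for untracked files) and does not claim to match anything Nix does. link_tracked_files is a hypothetical helper name, and the loop assumes file names without newlines or other characters that git ls-files would quote; a robust version would need git ls-files -z.

```shell
# Sketch only: build a "shallow copy" of a Git checkout as a forest of
# symlinks. link_tracked_files is a hypothetical helper, not part of Nix.
# $1 = Git checkout, $2 = destination directory for the forest.
link_tracked_files() {
  src=$(cd "$1" && pwd) || return 1   # absolute path, so links resolve anywhere
  dest=$2
  mkdir -p "$dest" || return 1
  # Only tracked files are listed, so untracked files never get a link.
  git -C "$src" ls-files | while IFS= read -r f; do
    mkdir -p "$dest/$(dirname "$f")"
    ln -s "$src/$f" "$dest/$f"
  done
}
```

The cost is one symlink per tracked file rather than one full file copy, which is where the "not entirely free, but cheap enough" estimate comes from.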

Nov 17 '21 09:11 thufschmitt

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-nix-2-4-significantly-slower/16218/3

Nov 23 '21 15:11 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/locally-excluding-nix-flakes-when-using-nix-independenly-of-upstream/16480/17

Dec 12 '21 10:12 nixos-discourse

Linking here yesterday's chat, which I think is relevant.

tl;dr: IMHO nix develop should be the exception: impure and not-in-nix-store by default. Just like nix-shell.

Feb 15 '22 10:02 yajo

I believe local impure flakes are also very useful if you're constantly editing a file that is in your repo but you don't want the flake to be reevaluated and the file copied to the store every time. Eventually, when you're done, you can just stop passing a --local flag (which could imply --impure as well, I guess) or something similar.

One use case I have in mind is a flake where a Python package is in scope but is also editable. I can drop into an appropriate shell with nix develop, but it requires the path to the package, which is a relative path. Of course, for hermetic evaluation you want the whole package source to be copied to the store, but I want to edit the package as I develop and use it at the same time. This means I either have to hardcode the absolute path of my local package on my system, which makes things very non-reproducible, or I have to update the flake inputs constantly, which keeps copying the thing to the store.

Feb 21 '22 15:02 bmabsout

What @bmabsout just said precisely outlines the whole reason I still haven't adopted flakes yet.

Hermetic eval is very useful for general building etc. but not when I'm in the middle of hacking on things. Nix flakes need to offer the same speed and convenience that e.g. nixos-rebuild -I nixpkgs=... -I nixos-config=... provides.

Feb 21 '22 16:02 Atemu

I also just encountered this problem. We store a large amount of data (several TBs) with git annex (see our project at https://github.com/umd-lhcb/lhcb-ntuples-gen). Today we annexed ~100 GB of new data (so the local repo size grew to around 100 GB, without downloading any other previously annexed data) and it took a whopping 8 minutes to finish a nix develop command, without any changes in flake.nix.

If I'm reading correctly, reverting back to a nix-shell based approach with flake-compat would mitigate our problem until the lazy copy lands. Is that right?

Feb 25 '22 04:02 yipengsun

git-annex and LFS are an interesting case here. Should large files be available in flake eval?

Feb 25 '22 08:02 Atemu

@yipengsun Yes, that's correct.

Feb 25 '22 10:02 edolstra

Even outside nix develop, this seems highly problematic. Couldn't you make use of Git's information to detect what has changed? We have a Merkle tree of the files after all.
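
To illustrate what "Git's information" offers here, a small sketch: the root tree hash of the last commit changes whenever any committed file changes, and git status reports tracked files modified since then. This is only an illustration of the Merkle-tree idea, not a description of how Nix detects changes; flake_tree_key is a hypothetical helper name.

```shell
# Sketch only: derive a change-detection key from Git's own object hashes.
# flake_tree_key is a hypothetical helper, not part of Nix.
# $1 = Git checkout with at least one commit.
flake_tree_key() {
  # Hash of the root tree of the last commit (the top of Git's Merkle tree).
  tree=$(git -C "$1" rev-parse 'HEAD^{tree}') || return 1
  # Tracked files modified since that commit are not covered by the tree
  # hash, so mark the key as dirty instead of pretending nothing changed.
  if [ -n "$(git -C "$1" status --porcelain --untracked-files=no)" ]; then
    echo "$tree-dirty"
  else
    echo "$tree"
  fi
}
```

A clean checkout yields the same key until something is committed, so no file needs to be read or copied to notice that nothing changed. A dirty checkout would still need per-file work for the modified paths.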

Feb 25 '22 11:02 L-as

@L-as This behavior being problematic is why this issue exists...

Feb 25 '22 11:02 edolstra

git-annex and LFS are an interesting case here. Should large files be available in flake eval?

I'd say no, unless the flake itself is used as an output. We currently have no such use case, but it would be nice to have something like a .flakeignore file to explicitly forbid copying files in certain paths.

Feb 25 '22 18:02 yipengsun

I did a bit more investigation and found that the slowness of nix develop was due to us having accidentally added large files directly to git; copying these files took a long time.

Also, I tried to set up a minimal flake repo to test the availability of the annexed files:

flake.nix:

{
  description = "test";

  inputs = {
    nixpkgs.url = "nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs { inherit system; };
      in
      {
        devShell = pkgs.mkShell {
          name = "test-git-annex";
          buildInputs = with pkgs; [
            git-annex
          ];
        };
      }
    );
}

I generated a large file (~100 MB) with dd, added it with git annex add, and then ran nix develop.

After that, I inspected the /nix/store:

❯ ls -l
total 18
-r--r--r-- 3 root root 1001 Dec 31  1969 flake.lock
-r--r--r-- 4 root root  538 Dec 31  1969 flake.nix
lrwxrwxrwx 2 root root  202 Dec 31  1969 my_big_file.bin -> .git/annex/objects/05/12/SHA256E-s104857600--f6e654508eac102f1efecae5248ca66ea5193d5edf86c895843188d06deff947.bin/SHA256E-s104857600--f6e654508eac102f1efecae5248ca66ea5193d5edf86c895843188d06deff947.bin

The symbolic link is broken, because, well, files inside .git folders are not copied over, which is to be expected.

I then tried to unlock the file (see here for more info) with git annex unlock then git commit. Now the store looks like this:

❯ ls -la
total 16583
dr-xr-xr-x    2 root root       5 Dec 31  1969 .
drwxrwxr-t 6826 root nixbld 30212 Feb 26 00:18 ..
-r--r--r--    4 root root    1001 Dec 31  1969 flake.lock
-r--r--r--    5 root root     538 Dec 31  1969 flake.nix
-r--r--r--    2 root root     104 Dec 31  1969 my_big_file.bin

And now my_big_file.bin is a git-annex pointer file:

/annex/objects/SHA256E-s104857600--f6e654508eac102f1efecae5248ca66ea5193d5edf86c895843188d06deff947.bin

To conclude, I think annexed files will NEVER be available for flake eval.

Feb 26 '22 06:02 yipengsun

I think it would be similar for git-lfs: the files would not be available for flake eval, because one of the main goals of both git-annex and git-lfs is to NOT add large files directly to git, and only the git part of the flake gets copied.

Feb 26 '22 06:02 yipengsun