
nix: consider moving outputs to legacyPackages for lazier evaluation

Open SomeoneSerge opened this issue 1 year ago • 4 comments

(This issue is for a conversation concerning specifically the flake.nix, open here for transparency)

The proposal

The current approach to managing multiple variants of llama (BLAS, ROCm, CUDA) is to instantiate Nixpkgs several times in a flake-parts module, expose these instances as arguments for other flake-parts "perSystem" modules, and use them to populate the flake's packages and checks: https://github.com/ggerganov/llama.cpp/blob/15499eb94227401bdc8875da6eb85c15d37068f7/flake.nix#L150-L157
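For reference, the pattern described above looks roughly like this (a paraphrased sketch, not the exact flake.nix; the `llamaPackagesCuda`/`llamaPackagesRocm` shapes are illustrative):

```nix
# Paraphrased sketch of the current flake-parts pattern: each accelerated
# variant gets its own nixpkgs instantiation, exposed as module arguments
# and then wired into the flake's `packages` and `checks`.
perSystem = { system, ... }: {
  _module.args = {
    # Every one of these imports forces a separate nixpkgs evaluation.
    llamaPackagesCuda = import nixpkgs { inherit system; config.cudaSupport = true; };
    llamaPackagesRocm = import nixpkgs { inherit system; config.rocmSupport = true; };
  };
};
```

Because `packages` and `checks` are evaluated eagerly by commands like `nix flake show`, every instantiation above is paid for even when only the default variant is wanted.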

This means that running simple commands like `nix flake show` will trigger evaluation of all of these nixpkgs instances, tripling the evaluation cost.

We could instead remove all of these non-default instantiations from `_module.args`, drop the variants from `packages` and `checks`, and define the same things under `legacyPackages`. With `legacyPackages.${system}.cuda` defined, we'd still be able to run `nix build .#cuda`, but the variant wouldn't show up in `nix flake show`, and we might build it without evaluating (much of) `import nixpkgs { config.rocmSupport = true; }`. More specifically, I'm guessing what we want to avoid is evaluating `llamaPackagesRocm.llama-cpp.meta` and `llamaPackagesRocm.llama-cpp.outPath`, but this needs more investigation.
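A minimal sketch of the proposed shape (hypothetical attribute names and a placeholder package path, not the actual flake.nix):

```nix
# Hypothetical sketch: variants live under legacyPackages, which
# `nix flake show` does not recurse into, so the extra nixpkgs
# instantiations are only evaluated when a variant is explicitly
# requested, e.g. via `nix build .#cuda`.
legacyPackages.${system} = {
  cuda =
    (import nixpkgs {
      inherit system;
      config = { allowUnfree = true; cudaSupport = true; };
    }).callPackage ./path/to/package.nix { };  # placeholder path
};
```

`nix build .#cuda` still works because `nix` falls back to `legacyPackages.${system}.cuda` when `packages.${system}.cuda` is absent.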

Broader context

The broader issue is that llama.cpp shouldn't need to instantiate its own nixpkgs at all; instead, there should be an ergonomic way to pass the flake a different nixpkgs config. Cf. e.g. https://github.com/NixOS/nixpkgs/pull/160061
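One ergonomic shape for that is overlay-based consumption (a hedged sketch, assuming the flake exports an `overlays.default` that provides a `llamaPackages` scope):

```nix
# Hypothetical downstream flake: the consumer instantiates nixpkgs exactly
# once, with its own config, and gets llama.cpp via the overlay, so the
# llama.cpp flake never has to instantiate nixpkgs itself.
{
  inputs.llama-cpp.url = "github:ggerganov/llama.cpp";
  outputs = { self, nixpkgs, llama-cpp }: {
    packages.x86_64-linux.default =
      (import nixpkgs {
        system = "x86_64-linux";
        config.rocmSupport = true;                  # the consumer picks the config
        overlays = [ llama-cpp.overlays.default ];  # assumed overlay name
      }).llamaPackages.llama-cpp;                   # assumed attribute path
  };
}
```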

CC @philiptaron

SomeoneSerge avatar Feb 23 '24 11:02 SomeoneSerge

How much of an issue is this performance problem? My opinion is that the discoverability pays for the slower evaluation, and for production use, when llama.cpp is (for instance) in an overlay, none of this matters in the slightest, since it only gets instantiated at `nixos-rebuild` time.
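The overlay case described above might look like this in a NixOS configuration (a sketch, assuming a `llama-cpp` flake input passed in via `specialArgs` and the same hypothetical `overlays.default`/`llamaPackages` names as elsewhere in this thread):

```nix
# Hypothetical NixOS module: nothing here is evaluated until nixos-rebuild
# runs, which is why the extra instantiations don't matter in this setup.
{ pkgs, llama-cpp, ... }: {  # llama-cpp assumed to come from specialArgs
  nixpkgs.overlays = [ llama-cpp.overlays.default ];  # assumed overlay name
  environment.systemPackages = [ pkgs.llamaPackages.llama-cpp ];
}
```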

philiptaron avatar Feb 23 '24 17:02 philiptaron

How much of an issue is this performance problem?

https://zimbatm.com/notes/1000-instances-of-nixpkgs

SomeoneSerge avatar Mar 21 '24 18:03 SomeoneSerge

I'm asking a different question, @SomeoneSerge -- how much of an issue is this performance problem for the llama.cpp repository, in terms of actual time spent? Since the proposed change comes with what I consider to be serious discoverability issues -- it turns off `nix flake show`, basically -- I'm loath to accept it.

I get the 1000 instances point of view. I don't think it's a problem for this flake's actual audience, where getting to know that the accelerated version exists at all is the difference between "slow" and "fast".

That's how I perceive it.

philiptaron avatar Mar 21 '24 21:03 philiptaron

I don't see how `nix flake show` is of any importance, but I am concerned that we're giving Nix a poor reputation by explicitly exposing the worst (performance-wise) case to the public :)

actual time spent.

  • I'd rather measure the minimum RAM required to evaluate the outputs.
  • I suspect we end up evaluating packages even when we only consume the overlays.

SomeoneSerge avatar Mar 21 '24 21:03 SomeoneSerge

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar May 05 '24 01:05 github-actions[bot]