
Actions become substantially slower over time

Open ianb-pomelo opened this issue 1 year ago • 33 comments

I have a checkout of a git repo in Sapling that I have been using for a while, and as I've used it more, commands such as sl status and sl have become substantially slower. I made a fresh clone of the repo with Sapling: sl is almost 2x slower in my older checkout and sl status is ~8x slower. I suspect it has something to do with all the hidden local history I've accumulated. Is there a way to clean up the checkout a bit to remove these stale references that I don't want to bring back? Or do I just have to routinely check out new copies?

ianb-pomelo avatar Aug 22 '24 20:08 ianb-pomelo

I'm not sure about sl status, but in terms of interacting with Git, I just ran:

git --git-dir .sl/store/git fsck

which appeared to clean some things up?
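For a slightly fuller picture of the store's health, a sketch (path per the command above; assumes the default Sapling layout):

```shell
# Inspect the git object store backing a Sapling clone.
# The .sl/store/git path is the default layout; adjust if yours differs.
git --git-dir .sl/store/git fsck            # check object integrity/connectivity
git --git-dir .sl/store/git count-objects -v # report loose object and pack counts
```

A high pack count or many loose objects in the count-objects output is a hint that a repack would help.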

bolinfest avatar Sep 06 '24 19:09 bolinfest

Had the same issue after upgrading to the latest version a while ago. I re-cloned with sl and it worked fine until recently; now it's slower again.

nicktogo avatar Sep 17 '24 20:09 nicktogo

Is your repo a .git repo or a .sl repo? Which directory does the repo root have?

If it's a .git/ repo, does running git status make subsequent git/sl status calls faster? If it's a .sl/ repo, what's the output of sl debugstatus and sl debugtree? Do you have watchman installed? Does --config workingcopy.use-rust=true --config status.use-rust=true (or false) make a difference?

status should not be impacted by commit history.
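(To answer the first question from a shell, you can check which metadata directory the repo root contains; a minimal sketch:)

```shell
# Report whether the current checkout is Sapling-native (.sl) or git-backed (.git).
if [ -d .sl ]; then
  echo ".sl repo"
elif [ -d .git ]; then
  echo ".git repo"
else
  echo "not a repo root"
fi
```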

quark-zju avatar Sep 18 '24 00:09 quark-zju

Mine is a .git repo. I tried running git status with git --git-dir .sl/store/git status but got an error: fatal: this operation must be run in a work tree.

I've attached the debug logs for sl status showing the couple-second delay in computing the status:

2024-09-18T14:30:50.932032Z DEBUG run: status_info: status_mode="rust"
2024-09-18T14:30:50.932036Z  INFO run:status: workingcopy::workingcopy: enter
2024-09-18T14:30:50.933227Z  INFO run:status:pending_changes: workingcopy::filesystem::watchmanfs::watchmanfs: enter
2024-09-18T14:30:50.933743Z DEBUG progress_model::registry: registering ProgressBar connecting watchman
2024-09-18T14:30:50.946475Z  INFO run:status:pending_changes: workingcopy::filesystem::watchmanfs::watchmanfs: exit
2024-09-18T14:30:50.946499Z DEBUG run:status: watchman_info: watchmanfallback=1
2024-09-18T14:30:50.946506Z  WARN run:status: workingcopy::filesystem::watchmanfs::watchmanfs: watchman error - falling back to slow crawl err=While invoking the watchman CLI to discover the server connection details: No such file or directory (os error 2), stderr=``
2024-09-18T14:30:50.946571Z  INFO run:status:pending_changes: workingcopy::filesystem::physicalfs: enter
2024-09-18T14:30:50.946586Z DEBUG run:status:pending_changes: workingcopy::filesystem::physicalfs: working copy parents: [Ok(HgId("a1b73645a2d513b0b07fbbb8105a0537a04e3f8a"))]
2024-09-18T14:30:50.946737Z DEBUG run:status:pending_changes: progress_model::registry: registering ProgressBar comparing
2024-09-18T14:30:50.946744Z  INFO run:status:pending_changes: workingcopy::filesystem::physicalfs: exit
2024-09-18T14:30:50.946749Z  INFO run:status:compute_status: workingcopy::status: enter
2024-09-18T14:30:50.946758Z  INFO run:status:compute_status:compute_status: workingcopy::status: enter
2024-09-18T14:30:50.955040Z DEBUG progress_model::registry: removing 1 orphan ProgressBar
2024-09-18T14:30:52.747772Z  INFO run:status:compute_status:compute_status:into_iter: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.747795Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.747802Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}:files: manifest_tree: enter
2024-09-18T14:30:52.748272Z DEBUG tree::store::prefetch{ids="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: enter
2024-09-18T14:30:52.748327Z DEBUG tree::store::prefetch{ids="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: exit
2024-09-18T14:30:52.748335Z DEBUG tree::store::get{id="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: enter
2024-09-18T14:30:52.748374Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}:files: manifest_tree: exit
2024-09-18T14:30:52.748540Z DEBUG tree::store::get{id="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: exit
2024-09-18T14:30:52.748652Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.748848Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.748949Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}:get_content_stream: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.749031Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}:get_content_stream: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749039Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749045Z  INFO run:status:compute_status:compute_status:into_iter: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749054Z  INFO run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: enter
2024-09-18T14:30:52.749060Z DEBUG run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: pending_change_count=0 timeout_secs=Some(0)
2024-09-18T14:30:52.749065Z DEBUG run:status:compute_status:compute_status:maybe_flush_treestate: treestate::dirstate: skipping treestate flush - it is not dirty
2024-09-18T14:30:52.749068Z  INFO run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: exit
2024-09-18T14:30:52.750157Z  INFO run:status:compute_status:compute_status{pending_changes_count=0 manifest_files_count=0}: workingcopy::status: exit
2024-09-18T14:30:52.750163Z  INFO run:status:compute_status: workingcopy::status: exit
2024-09-18T14:30:52.750167Z  INFO run:status{status_len=0}: workingcopy::workingcopy: exit
2024-09-18T14:30:52.751429Z DEBUG run: clidispatch::context: maybe starting pager enable_pager=true reason="auto"
2024-09-18T14:30:52.755092Z DEBUG run: progress_model::registry: removing 1 orphan ProgressBar
2024-09-18T14:30:52.755551Z  INFO run: commands::run: exit

ianb-pomelo avatar Sep 18 '24 14:09 ianb-pomelo

I tried running git status with git --git-dir .sl/store/git status

That would be a .sl repo. A .git repo is one where you ran git init or git clone and then run sl commands in it; that is a separate issue (duplicate of #929).

Could you try the .sl instructions above?

quark-zju avatar Sep 18 '24 15:09 quark-zju

Ah sorry, the output is

sl debugstatus
len(dirstate) = 7915
len(nonnormal) = 4
len(filtered nonnormal) = 4
clock = None
sl debugtree
dirstate v2 (using treestate/60f847df-7593-4f1f-b826-66c1f92973c4, offset 2811706, 7915 files tracked)

I do not have watchman installed

ianb-pomelo avatar Sep 18 '24 15:09 ianb-pomelo

sl debugstatus
len(dirstate) = 7915
len(nonnormal) = 4
len(filtered nonnormal) = 4
clock = None

Thanks. That looks normal. Something might have regressed without us noticing (non-watchman, non-edenfs use-cases are rare in production). I can take a look when I get time.

quark-zju avatar Sep 18 '24 18:09 quark-zju

@ianb-pomelo What is the version of Sapling you're using? Are you building sl from source? Have you tried

sl status --config workingcopy.use-rust=true --config status.use-rust=true

?

quark-zju avatar Sep 18 '24 20:09 quark-zju

I installed Sapling via Homebrew and am using version Sapling 0.2.20240718-145624-f4e9df48

I tried that command and it was about the same performance as without that config.

I have also tried installing watchman and that substantially sped things up so for me I can just use that setup

ianb-pomelo avatar Sep 18 '24 20:09 ianb-pomelo

I'm using .sl now and I have watchman running. Here is my debug output:

➜  sl debugstatus                                                              
len(dirstate) = 519805
len(nonnormal) = 1
len(filtered nonnormal) = 1
clock = c:1726596135:5724:1:46787
➜  sl debugtree                                                                
dirstate v2 (using treestate/37b9b29e-9233-4a38-b688-96be70bd6b3c, offset 43057603, 519805 files tracked)

Setting the flags to true is faster than setting them to false. Plain sl status seems to behave the same as setting them to true:

sl status --config workingcopy.use-rust=true --config status.use-rust=true

nicktogo avatar Sep 18 '24 20:09 nicktogo

@quark-zju do you have some suggestions how to work around this, other than re-cloning?

nicktogo avatar Sep 20 '24 20:09 nicktogo

I have encountered this, too.

A quick run under Instruments showed that, in my case, the issue was due to multi-pack index validation. The issue only appears after some time because initially there is no multi-pack index while the pack count is low. This also means I could fix it temporarily by running git's garbage collection on the .sl/store/git directory.
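(Spelled out, that temporary fix is just git's garbage collection pointed at the backing store; the default store path is assumed:)

```shell
# A full repack consolidates the packs into one, so the multi-pack index
# (and its slow validation in old libgit2) is no longer needed.
git --git-dir .sl/store/git gc
```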

The slowness seems to come from excessive validation during multi-pack index loading, as described in https://github.com/libgit2/libgit2/issues/6166, which has been fixed upstream. Indeed, simply recompiling Sapling with the git2 Rust dependency bumped from version 0.14 to the latest 0.19 fixed the problem for me.

destroycomputers avatar Oct 01 '24 21:10 destroycomputers

Thanks for sharing! @destroycomputers this works for me too!!

nicktogo avatar Oct 03 '24 00:10 nicktogo

@quark-zju do you have some suggestions how to work around this, other than re-cloning?

As a temporary solution, you can try disabling multi-pack index by appending to .sl/store/git/config after #### End maintained by Sapling ####:

[maintenance "gc"]
enabled = true
[maintenance "incremental-repack"]
enabled = false

Then delete the multi-pack index file:

rm .sl/store/git/objects/pack/multi-pack-index

The underlying complexity: with --config git.shallow=1 (a Sapling config that matches git's "filter" concept, not git's "shallow"), the default gc can error out saying some object is not found (reported internally), but multi-pack repack works. If you're not using the git.shallow feature, you don't need the multi-pack index and can use the regular gc.

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git). So we might want to abandon its usage altogether. To do that, we might need to move off libgit2, or off git's native store. Those are non-trivial changes and would take some time...
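(For convenience, the same config change expressed as commands; the default store path is assumed, and git config appends new sections at the end of the file, which should land after the Sapling-maintained marker:)

```shell
# Disable multi-pack-index maintenance in the backing git store and
# remove the existing index file (workaround from this thread).
git --git-dir .sl/store/git config maintenance.gc.enabled true
git --git-dir .sl/store/git config maintenance.incremental-repack.enabled false
rm -f .sl/store/git/objects/pack/multi-pack-index
```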

quark-zju avatar Oct 03 '24 17:10 quark-zju

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git). So we might want to abandon its usage altogether. To do that, we might need to move off libgit2, or off git's native store. Those are non-trivial changes and would take some time...

@quark-zju meanwhile, would it be possible to bump the git2 dependency to the latest version? That would at least fix this specific issue, and given the currently used version is over two years old, it might have other positive effects as well.

destroycomputers avatar Oct 04 '24 23:10 destroycomputers

would it be possible to bump the git2 dependency to the latest version?

Good catch. I didn't realize we were using an old version. I tried upgrading, but it does not seem trivial since other third-party dependencies in the monorepo (like cargo, sqlite, etc.) want the older version. Upgrading them could be an option, but if it's too hard we might try different approaches.

UPDATE: It is more complex than I thought. Let's say it's not feasible to upgrade now. Our plan might be to migrate off libgit2.

quark-zju avatar Oct 07 '24 17:10 quark-zju

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git).

Just as a data point, I've run into this pretty consistently with a large git repo I've been working in: I get "object not found" errors after a sl pull, and disabling incremental repack and deleting the multi-pack-index file fixes it.

yiding avatar Dec 05 '24 22:12 yiding

Hi @quark-zju, I'm running into the same issues where I get "object not found" after a while. Disabling incremental repack fixes it, but now I'm running into performance issues.

I see that sapling is on libgit2-0.28.1. Would it be alright if I picked your brain about the update to libgit2? I would be willing to put in the work to upgrade this dependency but I need help understanding the repo structure and the internal Meta tooling.

johnpanos avatar Dec 30 '24 17:12 johnpanos

I see that sapling is on libgit2-0.28.1. Would it be alright if I picked your brain about the update to libgit2?

Hi @johnpanos. Thanks for the update. The complexity of upgrading is mainly that libgit2 is also used by other projects internally in a monorepo (and shared across languages, esp. C++). Upgrading it only for Sapling without breaking builds for other projects is tricky. It seems more feasible to just migrate to using git commands (ex. git cat-file --batch to read objects, being careful not to trigger remote fetches) or gitoxide.

By the way, are you using .sl mode or .git mode? For .sl mode we hope to migrate from git's storage system to our own, which will fix known issues like "object not found" (reported a few times on Discord) and "too many file descriptors" (ex. #997), and no longer require repacking to maintain performance.

quark-zju avatar Dec 30 '24 18:12 quark-zju

@quark-zju is it possible in your build system to compile the dependency statically? Given the git2 crate's setup, cargo will compile libgit2 statically unless it finds a dynamic library on the system where compilation takes place. This can also be overridden using a feature flag (vendored-libgit2) to always compile statically. That would allow upgrading the dependency only in Sapling without affecting other projects.
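(In a plain Cargo build, forcing the static build is a one-line feature flag; a sketch, where the vendored-libgit2 feature comes from the git2 crate and the version number is illustrative:)

```toml
# Cargo.toml: build and statically link the bundled libgit2 regardless of
# what the host system provides.
[dependencies]
git2 = { version = "0.19", features = ["vendored-libgit2"] }
```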

FWIW, the public distribution of Sapling found in the Releases of this repository does vendor libgit2 and doesn't depend on it dynamically.

destroycomputers avatar Dec 30 '24 18:12 destroycomputers

Thanks for the quick reply @quark-zju! I also appreciate you letting me know about the monorepo setup, I see how that can be almost insurmountable.

I'm using .sl mode, which means I'm interested in this long-term vision you've mentioned. Is there anywhere I can learn more about the new planned internal storage system?

johnpanos avatar Dec 30 '24 18:12 johnpanos

is it possible in your build system to compile the dependency statically?

Internally we use buck2 to build; the Cargo.toml files are generated from buck2 definitions. The problem is that the buck2 definition of libgit2 is shared across languages and projects in the monorepo, so updating it (regardless of static or dynamic linking) could potentially break other Rust/C++/Python projects that directly or indirectly share the same libgit2 buck2 definition. While it might be possible to "fork" the related projects for Sapling's use-case, that's usually a last resort and not considered good practice; in this case you'd have to fork C++ libgit2 and Rust git2 and configure them for all platforms (Windows, Linux, macOS-arm, macOS-intel) the buck2 way, which is non-trivial as well.

Is there anywhere I can learn more about the new planned internal storage system?

You can read some docs at https://sapling-scm.com/docs/dev/internals/indexedlog and https://sapling-scm.com/docs/dev/internals/zstdelta. The indexedlog crate was published as sapling-indexedlog. We can publish the zstdelta and zstore crate too if you're interested.

The zstore is already used for tracking metadata (i.e. references); if you want to (roughly) compare it with git, you can look at .sl/store/metalog vs a git repo created by sl debugexportmetalog /tmp/git-repo.

quark-zju avatar Dec 30 '24 19:12 quark-zju

@quark-zju fair enough, thanks for the reply.

destroycomputers avatar Dec 30 '24 19:12 destroycomputers

@quark-zju IndexedLog is fascinating. I'm definitely going to do a deep dive on this after my $DAYJOB. I definitely have a lot of questions I'd like to ask, but I don't believe this is the appropriate forum for them.

Is there a more appropriate community discord/slack that I can ask those in instead?

johnpanos avatar Dec 30 '24 19:12 johnpanos

Is there a more appropriate community discord/slack that I can ask those in instead?

There is a Discord link at the bottom of https://sapling-scm.com/

quark-zju avatar Dec 30 '24 20:12 quark-zju

Alrighty, I updated libgit2 to version 1.9.0 and the Rust git2 crate to version 0.20.0. I then added the code below and ran make oss to build a local version:

impl GitStore {
    /// `open` a Git bare repo at `git_dir`. Gain access to its odb (object database).
    pub fn open(git_dir: &Path, config: &dyn Config) -> Result<Self> {
        let version = git2::Version::get();
        let (major, minor, rev) = version.libgit2_version();
        println!("libgit2 version: {}.{}.{}", major, minor, rev);
        // ... rest of the original function body unchanged ...

I get this output when running sl pull:

➜  sapling git:(main) ✗ sl pull
libgit2 version: 1.9.0
pulling from https://github.com/facebook/sapling
remote: Enumerating objects: 5077, done.
remote: Counting objects: 100% (1909/1909), done.
remote: Compressing objects: 100% (128/128), done.
remote: Total 5077 (delta 1823), reused 1820 (delta 1778), pack-reused 3168 (from 3)
Receiving objects: 100% (5077/5077), 2.36 MiB | 7.83 MiB/s, done.
Resolving deltas: 100% (2759/2759), completed with 417 local objects.
From https://github.com/facebook/sapling
   ebbdd2d6ec..df5b760748  df5b760748517cb6e580efb6a8b985c20bdc4ba2 -> origin/main
➜  sapling git:(main) ✗

My question: if the Rust code compiles, does that mean Sapling is guaranteed to work fine? I understand the internal issues with upgrading this dependency, but @quark-zju do you think this will be fine for my own use?

johnpanos avatar Jan 17 '25 19:01 johnpanos

My question: if the Rust code compiles, does that mean Sapling is guaranteed to work fine?

Yes. Obviously there could be bugs in dependencies, but there is no compiler-level requirement for the first-party code to work correctly (unlike the Linux kernel, which might require specific gcc flags).

quark-zju avatar Jan 17 '25 19:01 quark-zju

I've been struggling with actions becoming substantially slower over time.

I'm using .git mode (because my workflow includes tools that automatically use git, and when I was using .sl mode I experienced occasional corruption).

It seems this thread mixes .sl mode and .git mode. I'm wondering if I should open a separate issue for .git mode?

Over the last few months I've done a few things, each of which temporarily sped things up (or at least seemed to) before performance regressed again:

  • Starting with a new pull/copy of the repository.
  • Installing watchman (not sure if this is expected to speed things up in .git mode).
  • Running git gc.

Things I tried but broke sapling:

  • Running git prune (sapling then complained about not being able to find commits).
  • After running git prune and breaking sapling, I ran git fetch origin '+refs/heads/*:refs/remotes/origin/*' and git fetch origin '+refs/pull/*/head:refs/remotes/origin/pr/*'; that fixed the missing commits, but sapling became extremely slow.

I'm not really sure what makes sapling slow in .git mode. I suspect it's either some sapling history in the repo (old/hidden PRs) or the git repo's complexity. I'm at a loss for how to keep it fast or what commands to run to make it fast again. The only thing that seems to reliably work is to blow away my git repo and pull a fresh copy.

Most of the sapling doctor/diagnose-like commands I've found and tried don't work in .git mode, IIUC.

Actionably I'm wondering:

  • Should I open a separate issue for this?
  • Can I help provide some info to debug this issue?
  • Are there a set of cleanup commands and/or a git repo structure to maintain that would keep sapling fast in .git mode?

tgrue-openai avatar Jul 16 '25 00:07 tgrue-openai

@tgrue-openai

If the slowness is on the working copy (affects sl status, etc), there are some related discussions at https://discord.com/channels/1042527964224557107/1337126466906886165

Basically, there are 2 known issues:

  • The usage of --untracked-files=all makes git status incompatible with git's fsmonitor. So if you benefit from git's fsmonitor, your sl status will be slower than git status.
  • The usage of --no-optional-locks makes git status gradually slower without being able to self-recover. If this is the case, running git status once when it's slow can restore performance.

We haven't decided how to fix these issues yet. For the first one, the more "reliable" fix would be to stop delegating to git status and switch to our own working copy stack (like the one used in .sl mode). That requires us to also implement Git LFS, which is non-trivial, and it hurts compatibility a bit. The (easier?) fix would be to avoid --untracked-files=all and expand untracked directories ourselves.


If the slowness is on the object storage (affects sl pull, sl sl, etc), there are some known issues:

  • Too many references hurts performance. Feel free to discuss on Discord about this issue.
  • We disabled git gc in the past to work around a bug in older versions of libgit2 (which could cause objects to go missing if you have multi-pack indexes). The latest release has an updated libgit2 without the bug, so we no longer disable git gc.

Let me know if your case falls into the known issues or if it's something different.

quark-zju avatar Jul 21 '25 16:07 quark-zju

@quark-zju thank you for the response; it helps me think about this a little more clearly.

Some notes about my slow scenarios:

  1. sl rebase and sl pull are the slow commands for me. sl pull is usually run via the VS Code ISL, and rebases are usually done in the VS Code ISL by dragging and dropping.
  2. I primarily use ISL (the command line is also slow, but maybe not as slow, and I don't use it that often). With the info in your reply, I'll keep an eye out to narrow down the issue.
  3. I may have git commands running in the background sometimes (various apps, VS Code's git integration, etc.).
  4. I use sl pr (primarily via ISL, but also via the command line). PR actions in particular seem slower in ISL than on the command line.

I also see a fair amount of transient errors that refreshing in ISL usually gets me out of.

Because this is heavily ISL, is this the right place to discuss? Let me know if not.

Sapling CLI version: Sapling 0.2.20250521-115337-25ed6ac4
My ISL version: ISL version 0.1.63 (vscode)
(I realized while writing this up that this version is on the older side, but it's the newest VS Code extension in the marketplace. I tried building a newer one, but I don't believe that's documented; I ran into errors that implied missing dependencies I wasn't sure how to work through.) I tried the sl web version of ISL (ISL version 0.2.20250521-115337-25ed6ac4 (browser)) and saw similar slowness.

A few things I notice:

  1. I see a lot of transient errors in ISL that I can usually make go away by refreshing. One random example is below, but I see many others. They eventually tend to resolve after a few minutes.
  2. I don't know how to see the output for the web version of ISL, but I see a lot of spew from work being done for the VS Code ISL in the debug output. Individual steps are what's slow, I think, but I wonder if there is a bunch of other stuff happening.
  3. I also notice a lot of sl pr output (maybe driven by the ISL UI pulling this information). Sometimes I am left with an unresolved spinning state (in the screenshot) and I also end up with

Example of Errors in ISL:

Failed to fetch Uncommitted Changes
Command /opt/homebrew/bin/sl "status" "-Tjson" "--copies"… exited with non-zero status with exit code 130
Error: Command /opt/homebrew/bin/sl "status" "-Tjson" "--copies"… exited with non-zero status with exit code 130
    at Qo (/Users/tgrue/Library/Application Support/Sapling/ISL/isl-server/dist/server-DXUZru8e.js:1:147337)
    at /Users/tgrue/Library/Application Support/Sapling/ISL/isl-server/dist/server-DXUZru8e.js:1:128687
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)

[screenshot]

Another error; this one I think is caused by leaving unresolved conflicting binary objects in a rebase. I just tried on the command line: I hit the problem if I ask to leave them unresolved, but if I choose local, I don't get the error. This error leaves my sl repo in a bad state (it makes it look like I have a new change with the files from the change I was trying to rebase, and leaves the original change alone).

abort: Resolve Git reference: ""
Caused by:
    [] is not a 40-byte hex string

====

In summary, I'm going to try using the sapling CLI more and see if that helps. I think I'm hitting a swarm of three things: (1) somewhat slow commands, (2) outright sapling bugs plus poor interactions with git output it might not be handling right, and (3) ISL kind of breaking down in light of those two things.

For actual sapling bugs/traces I'll try to open new issues.

tgrue-openai avatar Aug 08 '25 18:08 tgrue-openai