Can dpkg-query be made multithreaded, and would that help?
It's really slow. I don't mind working on it if maintainers think it's possible.
Would be quite a lot of work. Many processes rely on others running beforehand. For instance, during deployment, the copy operations may not run in parallel, since this might induce race conditions.
Some aspects can be parallelized, though I'm not sure there will be a huge benefit speed-wise. For instance, looking up copyright files on Debian and derivatives is really, really slow on some systems (less of a problem in one-time VMs as used in CI/CD environments).
The pre-copy phase (which we consider the deployment phase, i.e., looking up dependencies and registering them in queues to defer the execution of the actual copy process) could be done in multiple threads as long as access to the data storage is synchronized. But, again, I am not entirely sure there's a huge gain.
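To make the idea concrete, here is a minimal sketch in Python (not linuxdeploy's actual C++ code) of parallel dependency lookups feeding a lock-protected queue, with the copy step left sequential. `DeferredCopyQueue`, `plan_deployment` and the `find_dependencies` callable are hypothetical names made up for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DeferredCopyQueue:
    """Thread-safe store of (source, destination) pairs to copy later."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}

    def register(self, source, destination):
        # The lock serializes writers, so concurrent lookups cannot corrupt the map.
        with self._lock:
            self._pending.setdefault(source, destination)

    def drain(self):
        # Meant to be called from one thread after all lookups have finished.
        with self._lock:
            items = sorted(self._pending.items())
            self._pending.clear()
        return items


def plan_deployment(libraries, find_dependencies, workers=4):
    """Look up dependencies in parallel and return the deferred copy plan.

    find_dependencies is a hypothetical callable that maps a library path to
    an iterable of (source, destination) pairs. Copying is NOT done here.
    """
    queue = DeferredCopyQueue()

    def lookup(library):
        for source, destination in find_dependencies(library):
            queue.register(source, destination)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so exceptions raised in workers surface here.
        list(pool.map(lookup, libraries))
    return queue.drain()
```

Only the lookups run concurrently; the returned plan would still be executed by a single thread, which avoids the race conditions mentioned above. Whether this buys anything depends on where the time is actually spent.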
What exactly is your problem, though? "It is slow" doesn't tell me anything useful. There's a million ways in which an application can be slow.
I only mean it takes a long time, but I expect it to as the resulting AppImage is, in this case, 128MB.
Yeah, but why? Are you using plugins which take a long time because they redundantly run some deployment processes, for instance? Or is it your system's I/O that slows everything down? Are you on Debian and affected by the copyright files deployment slowdown I observe a lot? Come on, please be a little more helpful. https://www.chiark.greenend.org.uk/~sgtatham/bugs.html
Yes I'm on Debian and yes I did see a lot of dpkg-query notices. I am trying, I hate when users do this too, lol. I just couldn't imagine my distribution mattered.
Just for testing, re-run the process with export DISABLE_COPYRIGHT_FILES_DEPLOYMENT=1. Not recommended for production, but for A/B testing, it'll do.
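If you want numbers rather than a gut feeling, a small timing harness along these lines would do. This is a sketch: `timed_run` and `compare_copyright_deployment` are names invented here, and the command you pass in is whatever you normally use to build your AppImage.

```python
import os
import subprocess
import time

FLAG = "DISABLE_COPYRIGHT_FILES_DEPLOYMENT"


def timed_run(command, env):
    """Run command with the given environment; return (exit code, seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command, env=env, check=False)
    return completed.returncode, time.perf_counter() - start


def compare_copyright_deployment(command):
    """Run the same command with and without the flag and time both runs."""
    # Drop the flag from the baseline in case it is already exported in the shell.
    base_env = {k: v for k, v in os.environ.items() if k != FLAG}
    flag_env = dict(base_env)
    flag_env[FLAG] = "1"
    return {
        "enabled": timed_run(command, base_env),
        "disabled": timed_run(command, flag_env),
    }


if __name__ == "__main__":
    # "./build-appimage.sh" is a placeholder for your own build command.
    for label, (code, seconds) in compare_copyright_deployment(["./build-appimage.sh"]).items():
        print(f"copyright deployment {label}: exit {code}, {seconds:.1f}s")
```

Run it a couple of times, since the first run may be skewed by a cold filesystem cache.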
Yep that's the issue. The deployment flies by with that flag.
That sure is annoying. I see why it's happening: the libraries get resolved to paths under /lib rather than /usr/lib.
Compare:
fred@mapache:~/Workspace/TTAegisub/packages/appimage_bundle$ dpkg-query -S /usr/lib/x86_64-linux-gnu/libffms2.so.5.0.0
libffms2-5:amd64: /usr/lib/x86_64-linux-gnu/libffms2.so.5.0.0
fred@mapache:~/Workspace/TTAegisub/packages/appimage_bundle$ dpkg-query -S /lib/x86_64-linux-gnu/libffms2.so.5.0.0
dpkg-query: no path found matching pattern /lib/x86_64-linux-gnu/libffms2.so.5.0.0
$ ls -alh /|grep lib
lrwxrwxrwx 1 root root 7 Feb 19 2022 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Feb 19 2022 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Feb 19 2022 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Feb 19 2022 libx32 -> usr/libx32
Why not resolve symlink before the call out to dpkg-query? Am I missing something?
So just for the record, this can happen when your storage is on the slower side (e.g., HDDs, or SSDs attached via SATA), when you have hundreds of packages installed, when your filesystem induces a slowdown, etc. I'd call this an I/O bandwidth issue. As said, in the real world, it usually doesn't matter so much, since most people use automated builds to generate their release binaries (which is something I'd recommend, too). For local testing, the export workaround will help you speed things up. The copyright files deployment typically shouldn't introduce any bugs in production, as it's hardly ever touched anyway.
By the way, you should post links to public projects so devs can have a glance at it to look for typical issues.
> Why not resolve symlink before the call out to dpkg-query? Am I missing something?
Probably an oversight, I guess? On the other hand, dpkg-query should be aware of the symlinks, since they're part of the package. Does resolving the links beforehand speed up the process?
The symlinks are not part of the package in this case, the symlinks in the root are put there by the base-files package.
> hundreds of packages installed
Thousands.
fred@mapache:~$ dpkg -l | wc -l
4087
> The symlinks are not part of the package in this case, the symlinks in the root are put there by the base-files package.
I guess in that case, for peace of mind, you'd actually want to ask dpkg-query about either location...
I actually think this is a bug in linuxdeploy and you should be calling readlink.
But how can we guarantee (at least to some extent) we capture the right copyright file if we look up the paths first? And, again, does it speed things up? As said, I think actually it might make sense to even look up both locations.
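Looking up both locations could be as simple as the following Python sketch (the real code is C++, so treat this as pseudocode that happens to run). `candidate_paths` and `owning_packages` are hypothetical helpers; the only external call is `dpkg-query -S <path>`, as used earlier in this thread. The `run` parameter exists only so the logic can be exercised without dpkg installed.

```python
import os
import subprocess


def candidate_paths(path):
    """Return the path as given plus its symlink-resolved form, without duplicates."""
    resolved = os.path.realpath(path)
    return [path] if resolved == path else [path, resolved]


def owning_packages(path, run=subprocess.run):
    """Ask dpkg-query -S about each candidate path and return the first match.

    Returns a list of package names, or an empty list if no candidate is owned.
    With the default runner this raises FileNotFoundError where dpkg-query is missing.
    """
    for candidate in candidate_paths(path):
        result = run(
            ["dpkg-query", "-S", candidate],
            capture_output=True, text=True, check=False,
        )
        if result.returncode != 0:
            continue  # no package owns this spelling of the path
        for line in result.stdout.splitlines():
            if line.startswith("diversion "):
                continue  # skip diversion notices, they are not ownership lines
            # Lines look like "libffms2-5:amd64: /usr/lib/...": split on the first ": ".
            owners, separator, _ = line.partition(": ")
            if separator:
                return [name.strip() for name in owners.split(",")]
    return []
```

Trying the original path first keeps today's behaviour for paths dpkg already knows, and the resolved path only costs a second call when the first one fails.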
I don't know if it speeds it up, as I haven't patched this file or built linuxdeploy yet.
https://github.com/linuxdeploy/linuxdeploy/blob/f8d8f499123c1c68a92fb7a76af0ff0cac56039a/src/core/copyright/copyright_dpkgquery.cpp#L13
Patching this should be easy, unless you're against fixing it this way.
As I see it we now have two issues: copyright detection failure due to non-resolution of symlinks, and the original speed issue.
> Patching this should be easy, unless you're against fixing it this way.
Surely not. I expressed my concerns about changing the process. This just needs testing. Please don't hesitate to open a PR, ideally with some number crunching.
With the advent of #231, I went from almost all my libraries (except the ones I compiled myself in /opt) failing to find copyright files to none of them failing. I have renamed this issue, since #231 does nothing for the speed issue.
As said before, I'm not sure multithreading will speed things up, since it looks like some I/O bottleneck to me. We'd need an alternative frontend for dpkg, I guess... (maybe we can ask it to list multiple packages at once...?).
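For what it's worth, `dpkg-query -S` accepts several patterns in one invocation, so a batched lookup might look like the Python sketch below. `parse_search_output` and `search_many` are hypothetical names, and whether a single call is actually faster than many small ones is untested here. As far as I can tell, matches are still printed to stdout even when some paths have no owner and the exit status is non-zero, which is why the exit status is ignored.

```python
import subprocess


def parse_search_output(stdout):
    """Map each path in `dpkg-query -S` output to the packages that own it."""
    owners_by_path = {}
    for line in stdout.splitlines():
        if line.startswith("diversion "):
            continue  # diversion notices are not ownership lines
        # Format: "pkg[, pkg...]: /path". Package names may contain ":" (arch
        # qualifier) but not ": ", so the first ": " separates owners from path.
        owners, separator, path = line.partition(": ")
        if not separator:
            continue
        names = [name.strip() for name in owners.split(",")]
        owners_by_path.setdefault(path, []).extend(names)
    return owners_by_path


def search_many(paths, run=subprocess.run):
    """Look up all paths with one dpkg-query call instead of one call per path."""
    if not paths:
        return {}
    result = run(
        ["dpkg-query", "-S", *paths],
        capture_output=True, text=True, check=False,
    )
    # Assumption: a non-zero exit only means some paths were unowned.
    return parse_search_output(result.stdout)
```

Paths missing from the returned dictionary are the ones dpkg does not know about; those could then be retried with their symlink-resolved form.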