FatalError with trace, no specific information, --verbose does nothing

Open jakecoble opened this issue 4 years ago • 30 comments

 △ ~ aconfmgr --verbose save
: Collecting data...
:: Compiling user configuration...
::: Using configuration in /home/jake/.config/aconfmgr
::: Sourcing /home/jake/.config/aconfmgr/00-config.sh...
::: Done (0 native packages, 0 foreign packages, 0 files).
:: Inspecting system state...
::: Querying package list...
:::: Done.
::: Enumerating owned files...
:::: Done.
::: Searching for stray files...
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:330 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:883 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:332 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:883 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]

Latest version from the AUR. Only thing in 00-config.sh is verbose=1. No other config files.

What can I do to get more information on the error?

jakecoble avatar Jul 18 '21 16:07 jakecoble

Could you please run aconfmgr with the -x shell flag (e.g. bash -x /usr/bin/aconfmgr), and post the output? That should, at least, allow finding the failing command.
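
For reference, here is a minimal sketch of what a -x trace looks like, using a throwaway script rather than aconfmgr itself. The demo script and variable names are made up for illustration. bash writes the trace to stderr, so it can be captured separately from normal output.

```shell
#!/usr/bin/env bash
# Sketch: trace a throwaway script with `bash -x`.
# The demo script below is hypothetical; it only stands in for aconfmgr.
demo=$(mktemp)
cat > "$demo" <<'EOF'
echo start
false | cat
echo end
EOF

# -x prints each command to stderr before running it.
# Redirect so that only the trace is captured and normal output is dropped.
trace=$(bash -x "$demo" 2>&1 >/dev/null)
rm -f "$demo"

printf '%s\n' "$trace"
```

For the real case, something like `bash -x /usr/bin/aconfmgr --verbose save 2> trace.log` should put the trace in trace.log (the log file name is arbitrary).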

CyberShadow avatar Jul 18 '21 18:07 CyberShadow

Seems to be SIGPIPE, so one of the commands involved must be trying to write to a pipe after the reading command has exited. That area of the code is full of nested subshells, so it's tough to track down exactly which command is failing.
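
A minimal sketch of the SIGPIPE mechanism described above, using `yes` and `head` as stand-ins for the real commands. Whether aconfmgr runs this code path with pipefail enabled is an assumption here, not something confirmed in the thread.

```shell
#!/usr/bin/env bash
# Sketch: a writer killed by SIGPIPE when its reader exits early.
# `yes` writes forever; `head` exits after one line and closes the pipe.
yes | head -n 1 > /dev/null
statuses=("${PIPESTATUS[@]}")

# With the default SIGPIPE disposition the writer's status is 141 (128 + 13).
echo "writer=${statuses[0]} reader=${statuses[1]}"

# With pipefail, the pipeline as a whole reports the writer's failure,
# which is how an early-exiting reader can surface as an error.
set -o pipefail
if yes | head -n 1 > /dev/null; then
    pipeline_ok=yes
else
    pipeline_ok=no
fi
set +o pipefail
echo "pipeline_ok=$pipeline_ok"
```

Checking `PIPESTATUS` right after a pipeline is one way to see which stage failed when the pipeline's overall status alone is not enough.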

jakecoble avatar Sep 04 '21 03:09 jakecoble

So, how can we act on this? Do you have a way to reproduce the problem?

CyberShadow avatar Sep 04 '21 05:09 CyberShadow

I suspect some subtle misconfiguration of my system at play here, so I'll close this and open a new issue if I find something actually wrong with aconfmgr.

jakecoble avatar Sep 04 '21 15:09 jakecoble

@jakecoble did you ever manage to trace the source of this issue? I'm facing a similar issue myself.

I've got a fresh install of aconfmgr but a not-so-fresh install of Arch :smile:

:: ~ » aconfmgr --verbose save
: Collecting data...
:: Compiling user configuration...
::: Using configuration in /home/g/.config/aconfmgr
::: Done (configuration not found).
:: Inspecting system state...
::: Querying package list...
:::: Done.
::: Enumerating owned files...
:::: Done.
::: Searching for stray files...
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:346 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:900 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:348 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:900 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]

(Let me know if you want me to create a new issue for this, @CyberShadow.)

gardar avatar Feb 01 '22 18:02 gardar

It looks like the exact same issue.

Could you please run with -x, and either post the output or try to find the failing command?

CyberShadow avatar Feb 01 '22 18:02 CyberShadow

Sure thing, here's the full output with -x:

https://gist.github.com/gardar/892d26ec4de0f7bace104deadff031d8

gardar avatar Feb 01 '22 19:02 gardar

Thanks. Unfortunately I can't tell what is failing from the log.

Could you please try this patch (without -x), and post the output: https://github.com/CyberShadow/aconfmgr/compare/debug-find

CyberShadow avatar Feb 01 '22 19:02 CyberShadow

Forgot about this issue! IIRC it turned out that I had a file path on my system with some odd special characters in it. The script was choking on that.

jakecoble avatar Feb 01 '22 19:02 jakecoble

IIRC it turned out that I had a file path on my system with some odd special characters in it. The script was choking on that.

Any hints for how to recreate this problem? (I've been keeping a file with all the special characters I could think of on my real system for testing...)

CyberShadow avatar Feb 01 '22 19:02 CyberShadow

Thanks. Unfortunately I can't tell what is failing from the log.

Could you please try this patch (without -x), and post the output: https://github.com/CyberShadow/aconfmgr/compare/debug-find

Here's the output from that branch:

: Collecting data...
:: Compiling user configuration...
::: Using configuration in /home/g/.config/aconfmgr
::: Done (configuration not found).
:: Inspecting system state...
::: Querying package list...
:::: Done.
::: Enumerating owned files...
:::: Done.
::: Searching for stray files...
:::: tee failed!
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:1673 [FatalError]
::::: /usr/lib/aconfmgr/common.bash:339 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: find failed!
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:1673 [FatalError]
::::: /usr/lib/aconfmgr/common.bash:336 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:349 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:351 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]

Forgot about this issue! IIRC it turned out that I had a file path on my system with some odd special characters in it. The script was choking on that.

Interesting, since /home is ignored I would have thought such cases would be unlikely to occur.

The only uncommon/odd thing about my paths/filesystem that I can think of is that I'm using ZFS for my root, and I have some snapshots and a few ZFS volumes mounted, for containers and such. Could aconfmgr be choking on that?

gardar avatar Feb 01 '22 21:02 gardar

It's certainly possible, if GNU find is unequipped to deal with such special filesystem entries.

We can test that theory - run:

find / -regextype posix-extended -not '(' '(' -regex '/dev|/home|/media|/mnt|/proc|/root|/run|/sys|/tmp|/var/cache' -o -false ')' -printf I -print0 -prune ')' -printf O -print0 > /dev/null ; echo $?

It should print 0 on success.
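
To see what that expression does without walking the whole filesystem, here is a sketch that runs the same expression shape on a scratch tree. It assumes GNU find; the directory names are made up, with "skip" playing the role of an ignored path.

```shell
#!/usr/bin/env bash
# Sketch: the same expression shape on a scratch tree (requires GNU find).
# Directory names are made up; "skip" plays the role of an ignored path.
root=$(mktemp -d)
mkdir -p "$root/keep" "$root/skip/deep"
touch "$root/keep/a" "$root/skip/deep/b"

# Ignored paths are printed with an "I" prefix and pruned;
# everything else is printed with an "O" prefix.
find "$root" -regextype posix-extended \
    -not '(' '(' -regex "$root/skip" -o -false ')' -printf I -print0 -prune ')' \
    -printf O -print0 > "$root.out"
find_status=$?

# NUL-separated records, shown one per line with the scratch prefix replaced.
listing=$(tr '\0' '\n' < "$root.out" | sed "s#$root#<root>#" | sort)
rm -rf "$root" "$root.out"

echo "find exit status: $find_status"
printf '%s\n' "$listing"
```

The pruned directory appears once with the I prefix and nothing below it is listed, which is why ignoring a path also hides any problematic entries underneath it.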

You could also try ignoring the root of these snapshots in the aconfmgr configuration, and see if that makes any difference.

CyberShadow avatar Feb 01 '22 22:02 CyberShadow

I'm pretty certain GNU find can handle it just fine... I tried the find command and it ran successfully (printed 0).

I tried adding all mounts except / to IgnorePath with the same result.

I might go ahead and try to add just about everything to IgnorePath and then work my way up to find the problematic path (if there is one). Unless you have some ideas that might get us that result quicker?
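
A sketch of what that bisection could look like in the aconfmgr configuration. `IgnorePath` is the configuration helper mentioned above; the file name and the specific paths are hypothetical and would need to match the actual system.

```shell
# ~/.config/aconfmgr/10-bisect-ignores.sh (hypothetical file name)
# Start broad, confirm that `aconfmgr save` gets past
# "Searching for stray files...", then remove entries one at a time
# until the failure returns. The paths below are examples only.
IgnorePath '/opt/*'
IgnorePath '/srv/*'
IgnorePath '/usr/share/*'
IgnorePath '/var/lib/*'
```

Removing one entry per run keeps each step unambiguous, at the cost of more runs than a true halving strategy.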

gardar avatar Feb 01 '22 23:02 gardar

I'm pretty certain GNU find can handle it just fine... I tried the find command and it ran successfully (printed 0).

I tried adding all mounts except / to IgnorePath with the same result.

That is interesting, and I'm stumped again.

I might go ahead and try to add just about everything to IgnorePath and then work my way up to find the problematic path (if there is one).

That might be the simplest way forward.

CyberShadow avatar Feb 02 '22 11:02 CyberShadow

After tracing down a lot of directories to add to IgnorePath I figured out what the issue is!

grep is eating all my RAM and getting OOM-killed. I watched my RAM usage go from 1.8 GB to the full 16 GB in just a few seconds when running aconfmgr.
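
When investigating a runaway process like this, one option is to cap its memory so that it fails on its own instead of triggering the OOM killer. The sketch below only shows the mechanism, setting `ulimit -v` inside a subshell; the 2 GiB figure is an arbitrary example.

```shell
#!/usr/bin/env bash
# Sketch: cap virtual memory for one command by setting the limit in a subshell.
# The 2 GiB figure is an arbitrary example; ulimit -v takes KiB.
limit_kib=$((2 * 1024 * 1024))

# The limit applies only inside the parentheses; the parent shell is unchanged.
inner=$( (ulimit -v "$limit_kib" && ulimit -v) )
outer=$(ulimit -v)

echo "inside subshell: $inner KiB"
echo "parent shell:    $outer"
```

Running the suspect command inside such a subshell, after the `ulimit -v` call, should make it stop with an allocation error once it hits the cap.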

gardar avatar Feb 02 '22 23:02 gardar

Thanks, that's interesting!

How big is /tmp/aconfmgr-$UID/owned-files? (or ./tmp/owned-files if running from a checkout)

CyberShadow avatar Feb 03 '22 00:02 CyberShadow

63M - 892248 lines

gardar avatar Feb 03 '22 00:02 gardar

So far, I'm unable to reproduce this: grep uses a constant 631 MB of RSS with 1M filter lines, no matter how much data I pipe into it.

I found this report about a memory leak in grep with usage similar to ours (reading patterns from a file):

https://www.mail-archive.com/[email protected]/msg07422.html

However, that seems to have been in grep 3.4, and Arch is at 3.7.

CyberShadow avatar Feb 03 '22 00:02 CyberShadow

Strange! Could there be something else at play? If this were an issue with grep, I would expect others to be affected too.

Here's the OOM dmesg output in case it gives you any hints: https://gist.github.com/gardar/c2e7cd37382289ba3621373c9acd03e3

gardar avatar Feb 03 '22 00:02 gardar

One possible way we can try to make progress is to reproduce it in isolation. Here's my attempt to extract the invocation in question:

pacman --query --list --quiet | sed 's#\/$##' | sort --unique > owned-files

sudo find / -regextype posix-extended -not '(' '(' -regex '/dev|/home|/media|/mnt|/proc|/root|/run|/sys|/tmp|/var/cache' -o -false ')' -printf I -print0 -prune ')' -printf O -print0 | \
grep --null --null-data --invert-match --fixed-strings --line-regexp --file <( < owned-files sed -e 's#^#O#')

Does this gobble memory too?
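
For readers following along, here is a sketch of what that grep filter does, run on a tiny made-up data set instead of the real pacman and find output. The file names are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the stray-file filter on a tiny, made-up data set.
owned=$(mktemp)
printf '%s\n' /usr/bin/ls /usr/lib/libfoo.so > "$owned"

# Stand-in for find's output: NUL-terminated records with an I/O prefix.
# --null-data reads and writes NUL-terminated records, --fixed-strings and
# --line-regexp match whole records literally, and --invert-match keeps
# only the records that match no pattern.
stray=$(
    printf '%s\0' O/usr/bin/ls O/etc/stray.conf I/home O/usr/lib/libfoo.so |
    grep --null --null-data --invert-match --fixed-strings --line-regexp \
         --file <(sed -e 's#^#O#' "$owned") |
    tr '\0' '\n'
)
rm -f "$owned"

printf '%s\n' "$stray"
```

The two owned files are filtered out, leaving the unowned `O/etc/stray.conf` and the ignored `I/home` record. Every owned path becomes one literal pattern, so the pattern list grows with the number of installed files.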

CyberShadow avatar Feb 03 '22 00:02 CyberShadow

Yep, eats the memory too.

I tried it on two other machines I have that are similarly set up, and both of them seem to be unaffected by this issue.

gardar avatar Feb 03 '22 00:02 gardar

I tried it on two other machines I have that are similarly set up, and both of them seem to be unaffected by this issue.

What if you copy the input files over to the other machines?

I.e., save owned-files and the output of find to a file, and then run grep with that input.

CyberShadow avatar Feb 03 '22 00:02 CyberShadow

Found the culprit, I think. I tried using ripgrep instead of grep, but it failed with the following error:

/dev/fd/63:214314: found invalid UTF-8 in pattern at byte offset 22: O/usr/lib/aspell-0.60/\xEDslenska.alias (disable Unicode mode and use hex escape sequences to match arbitrary bytes in a pattern, e.g., '(?-u)\xFF')

Looked at the file and found this line: /usr/lib/aspell-0.60/íslenska.alias

After I removed it from the file, grep runs just fine and the RAM doesn't spike (usage just goes up by 1 GB).
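
A sketch of how such entries could be located without relying on the ripgrep error: check each line of the file list with iconv and report the ones that are not valid UTF-8. This is a swapped-in technique for illustration, not something aconfmgr does, and the sample paths other than the aspell one are made up.

```shell
#!/usr/bin/env bash
# Sketch: report line numbers in a file list that are not valid UTF-8.
# Checks each line with iconv, so it is slow on very large lists.
find_invalid_utf8() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
        if ! printf '%s' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
            echo "$n"
        fi
    done < "$1"
}

# Made-up sample: line 2 encodes "í" as UTF-8, line 3 as a lone Latin-1 byte.
list=$(mktemp)
printf '/usr/bin/ls\n' > "$list"
printf '/usr/share/doc/\xc3\xadslenska.txt\n' >> "$list"
printf '/usr/lib/aspell-0.60/\xedslenska.alias\n' >> "$list"

bad=$(find_invalid_utf8 "$list")
rm -f "$list"
echo "invalid UTF-8 on line(s): $bad"
```

Only the Latin-1 encoded name is flagged; the same character encoded as UTF-8 passes the check.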

gardar avatar Feb 03 '22 01:02 gardar

Neither tool should be doing UTF-8 decoding for case-sensitive fixed strings.
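
One experiment that follows from this is to force grep into the C locale, where input is handled as plain bytes. The sketch below only shows that the filter still matches a non-UTF-8 name byte for byte under `LC_ALL=C`. Whether this avoids the memory blow-up seen here is untested, so it is an assumption rather than a fix.

```shell
#!/usr/bin/env bash
# Sketch: the same filter with grep forced into the C locale,
# where input is handled as plain bytes.
patterns=$(mktemp)
# Pattern containing the lone Latin-1 byte 0xED, as in the reported file name.
printf 'O/usr/lib/aspell-0.60/\xedslenska.alias\n' > "$patterns"

remaining=$(
    printf 'O/usr/lib/aspell-0.60/\xedslenska.alias\0O/etc/stray.conf\0' |
    LC_ALL=C grep --null-data --invert-match --fixed-strings --line-regexp \
        --file "$patterns" |
    tr '\0' '\n'
)
rm -f "$patterns"

printf '%s\n' "$remaining"
```

The record with the 0xED byte is matched and removed, leaving only `O/etc/stray.conf`.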

I'm glad you got the problem sorted :) But, I still can't reproduce this.

By any chance have you kept a copy of the files that exhibit the grep problem?

CyberShadow avatar Feb 03 '22 01:02 CyberShadow

What about if you install https://aur.archlinux.org/packages/aspell-is/ ?

I can regenerate the file; I've done so a few times already. Do you want me to get that file to you? It's too big to paste here.

gardar avatar Feb 03 '22 01:02 gardar

Do you want me to get that file to you? It's too big to paste here.

Yes, please!

CyberShadow avatar Feb 03 '22 01:02 CyberShadow

I managed to push the file to my previous gist, see if you can download it from there (scroll down to the end of the page) https://gist.github.com/gardar/c2e7cd37382289ba3621373c9acd03e3

gardar avatar Feb 03 '22 01:02 gardar

Did you manage to replicate the issue with my filelist?

I removed the aspell-is package and the issue seems to be gone, but ideally this should be fixed, or at least detected with an error that indicates which file/package needs to be removed.

gardar avatar Feb 03 '22 15:02 gardar

Did you manage to replicate the issue with my filelist?

I did, thank you. Crashed my whole computer and everything. :D

I removed the aspell-is package and the issue seems to be gone, but ideally this should be fixed, or at least detected with an error that indicates which file/package needs to be removed.

Yep, ideally it should be fixed in GNU grep. I'll see if I can narrow it down to an exemplary test case.

CyberShadow avatar Feb 03 '22 17:02 CyberShadow

Hah, OK, so it's definitely an issue with grep and not just an issue with how grep is used in this case? I'll be damned, it's not every day you find a bug in core GNU utils like grep! :)

gardar avatar Feb 03 '22 17:02 gardar