
edb hangs or crashes on multi-threaded binaries.

Open edpil02 opened this issue 7 years ago • 11 comments

Since updating to version 1.0 and git, I get hangs or crashes on some threads: -> stop_threads(): paused thread [22233] received an event besides SIGSTOP: status=0x3057. Or I get a PTRACE_GETSIGINFO failed error. There are no issues with 0.9.xx versions. Thanks, and sorry for my English.


edpil02 avatar Sep 11 '18 08:09 edpil02

I assume when you say "crash", you don't mean that it segfaulted or aborted, but that it is "stuck"?

eteran avatar Sep 11 '18 15:09 eteran

Most of the time it gets stuck after some syscalls return (often syscall 0x38 or the futex syscall). The menu keeps working: I can restart or kill the program, but I cannot continue debugging. Sometimes dmesg shows segfaults from RIP or ptrace errors: -- enable to detach from thread 1956: PTRACE_DETACH failed -- stop_threads(): paused thread [2020] received an event besides SIGSTOP: status=0x3057 -- PTRACE_GETSIGINFO failed. Building the latest git yesterday and disabling all signals/exceptions doesn't help, so I reverted to version 0.9, which has no issue.

edpil02 avatar Sep 12 '18 07:09 edpil02

Interesting that 0.9 has no issue, since there haven't been (or shouldn't have been) any fundamental changes to how things work with regard to threads. Is there a program I can use to easily replicate it locally?

Or is it basically any threaded program?

Unfortunately, dealing with threads is a bit of a difficult task to get just right.

eteran avatar Sep 12 '18 19:09 eteran

With edb 1.0 you can catch or pass exceptions for many signals, which is not possible with version 0.9, and I'm wondering if there is a problem in this feature. I noticed it because the debugger initially got stuck on SIGCHLD signals. Passing this signal via the "signals/exceptions" menu solves that problem but causes the thread issues. Unfortunately I'm studying a closed-source binary... I'll try to test some other programs. Thanks for your help.

edpil02 avatar Sep 13 '18 05:09 edpil02

I think I see the source of the issue... and it's complicated :-P.

I've reworked how exception ignoring works because it frankly made more sense to handle it at a lower level than we were. And I was able to resolve the hang as well.

However, there is still an issue:

Suppose there are 3 threads, and a SIGUSR1 comes in, a few things happen:

  1. we see the SIGUSR1 on a random thread, and track that thread calling it the "active thread"
  2. we send a SIGSTOP to the other 2 threads so that the whole process stops (this is what the user expects)
  3. this causes more events, which alters the "active thread"! (this is the underlying thing I need to fix)
  4. user (or debugger) says "resume, just pass that exception back to the debuggee"
  5. we end up telling the "active thread" to resume with no signal, but sending the SIGUSR1 to the original thread... which ends up killing the process even if we wanted to ignore the exception.
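
The five steps above can be illustrated with a small simulation. This is only a sketch of one possible way to avoid the mix-up, not edb's actual code: the names `ToyDebugger`, `ThreadState`, `on_stop_event` and `resume_all` are hypothetical, and no real ptrace calls are made. The idea it shows is to remember the pending signal per thread and to let only stops that the debugger did not cause itself select the "active thread".

```python
import signal
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ThreadState:
    tid: int
    # Signal that stopped this thread and still has to be delivered, if any.
    pending_signal: Optional[int] = None


@dataclass
class ToyDebugger:
    """Hypothetical model of the bookkeeping only; it never calls ptrace."""
    threads: Dict[int, ThreadState] = field(default_factory=dict)
    active_tid: Optional[int] = None

    def on_stop_event(self, tid: int, signo: int,
                      initiated_by_debugger: bool = False) -> None:
        thread = self.threads.setdefault(tid, ThreadState(tid))
        if initiated_by_debugger:
            # Our own SIGSTOP: it must not be delivered later and must not
            # change which thread is considered active (step 3 above).
            return
        thread.pending_signal = signo
        if self.active_tid is None:
            self.active_tid = tid

    def resume_all(self, pass_signal: bool = True) -> Dict[int, int]:
        """Return {tid: signal to deliver on resume}, where 0 means none."""
        plan = {}
        for tid, thread in self.threads.items():
            deliver = pass_signal and thread.pending_signal is not None
            plan[tid] = thread.pending_signal if deliver else 0
            thread.pending_signal = None
        self.active_tid = None
        return plan


# Three threads: 101 receives SIGUSR1, 102 and 103 are stopped by the debugger.
dbg = ToyDebugger()
dbg.on_stop_event(101, signal.SIGUSR1)
dbg.on_stop_event(102, signal.SIGSTOP, initiated_by_debugger=True)
dbg.on_stop_event(103, signal.SIGSTOP, initiated_by_debugger=True)
active = dbg.active_tid
plan = dbg.resume_all(pass_signal=True)
```

In this model the SIGUSR1 stays attached to the thread that received it, so resuming never pairs the signal with a different thread than the one tracked as active.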

I'll have to think a bit on how to address this, but whatever I come up with, should be able to address this issue and similar ones to it completely.

eteran avatar Sep 14 '18 03:09 eteran

"My binary" is a port of a Windows program to Linux. Perhaps the code is crappy, so... Thanks again.

edpil02 avatar Sep 14 '18 06:09 edpil02

@edpil02, nothing to do with your code!

Unfortunately, handling threads just right is simply complicated. I have some ideas to experiment with which may deal with it well.

I'll keep this issue updated as I experiment.

eteran avatar Sep 14 '18 15:09 eteran

@edpil02 Let me know if the latest in master works any better for you. I've definitely addressed some "quirks" that I was able to identify, and things seem stable in my quick and dirty tests. But things like this are hard to know for sure.

eteran avatar Sep 14 '18 20:09 eteran

Just tried latest git with the same binary:

  • The debugger no longer gets stuck on syscall 0x38, but it stops on RT_SIGPROCMASK (that's weird) and futex syscalls. Running again is possible too, but it gets stuck later with stop_threads(): paused thread errors. It seems to run smoother too.

  • However, when ignoring all signals/exceptions via the preferences menu, the debugger kills the program as soon as I run it. I got a PTRACE_GETREGS failed error, and the restart menu has no effect. I will try to ignore the exceptions one by one when I have more time for testing.

Thanks for your work.

edpil02 avatar Sep 15 '18 11:09 edpil02

Today I got an edb hang. The call stack and local variables are available here: https://gist.github.com/sorokin/8d5bc5b0a978c1cc284689f49285e881

According to the call stack, edb is hanging in waitpid for process 9953, while that process is a zombie:

$ cat /proc/9953/status | grep State
State:  Z (zombie)
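
The same check can be scripted. The following is a minimal sketch, assuming a Linux /proc filesystem; `parse_state` and `is_zombie` are hypothetical helper names, not part of edb. It reads the State field the same way the grep above does.

```python
from pathlib import Path
from typing import Optional


def parse_state(status_text: str) -> Optional[str]:
    """Return the one-letter state code from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split(":", 1)[1].split()
            return fields[0] if fields else None
    return None


def is_zombie(pid: int) -> bool:
    """True if /proc/<pid>/status reports state Z; False if it is unreadable."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except OSError:
        # The process is gone, or /proc is not available on this system.
        return False
    return parse_state(text) == "Z"


# Sample content shaped like the output shown above (Name line is made up).
sample = "Name:\tdebuggee\nState:\tZ (zombie)\nTgid:\t9953\n"
sample_state = parse_state(sample)
```

A caller would use `is_zombie(9953)` for the process from this report; whether such a check belongs inside the debugger's wait loop is a separate design question.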

Should I create a separate issue or is this the same issue?

sorokin avatar Feb 17 '20 20:02 sorokin

It may or may not literally be the same issue, but it's close enough to file under the same task.

eteran avatar Feb 17 '20 23:02 eteran