
Support executing past execve with gdb

Open neon12345 opened this issue 6 years ago • 11 comments

I get a "Program stopped" (Suspended: Signal : 0:Signal 0) on execve calls in rr replays, and continuing results in a loop of stops with no progress. I was unable to find out how this stop signal is generated. A normal gdb run of the program without rr works fine.

neon12345 avatar Aug 07 '19 19:08 neon12345

You can't continue past the execve point. To debug after the execve, get the current event number with when, add some small number to it, and then try rr replay -g <event> -p <pid>.
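As a rough sketch of that workaround, the helper below builds the replay command from the event number printed by when. The function name, its parameters, and the default offset of 10 are illustrative assumptions, not part of rr; only the -g and -p flags come from the command quoted above.

```python
import shlex


def rr_replay_command(current_event: int, pid: int, offset: int = 10) -> str:
    """Build an `rr replay -g <event> -p <pid>` command line.

    current_event: the number printed by `when` at the execve stop.
    offset: the "small number" to add so replay starts after the execve.
            10 is an arbitrary illustrative default, not a documented value.
    """
    if current_event < 0 or offset <= 0:
        raise ValueError("event must be non-negative and offset positive")
    target_event = current_event + offset
    # shlex.join quotes arguments only where a shell would require it.
    return shlex.join(["rr", "replay", "-g", str(target_event), "-p", str(pid)])


if __name__ == "__main__":
    # Hypothetical values: `when` reported event 1200 for process 4321.
    print(rr_replay_command(1200, 4321))
```

If the chosen event is still before the exec completes, increasing the offset and retrying is the obvious next step.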

rocallahan avatar Aug 07 '19 22:08 rocallahan

I see now that this is done in GdbServer.cc. Would it be possible to do this step automatically from there to get a user experience similar to normal gdb?

neon12345 avatar Aug 07 '19 22:08 neon12345

Maybe. I'm not sure if the remote agent protocol can handle it.

rocallahan avatar Aug 07 '19 23:08 rocallahan

When I use only gdb and set a breakpoint for the executable after execve, it is possible to step over execve and halt at the breakpoint. Would it not be possible to make a small change to GdbServer.cc to get this behaviour or is there something else? I would try to add it then.

neon12345 avatar Aug 07 '19 23:08 neon12345

The problem is that gdb talks to rr using the gdb remote protocol. That works differently from gdb just running by itself.

rocallahan avatar Aug 07 '19 23:08 rocallahan

Using gdbserver+gdb has the same behaviour. So it should be possible I guess.

neon12345 avatar Aug 07 '19 23:08 neon12345

Great!

rocallahan avatar Aug 07 '19 23:08 rocallahan

I have implemented a first version but have to give up for now. In theory, one has to implement the exec-events extension, which sends a different stop reply on execve (this can be found in gdb/gdbserver/remote-utils.c). The register definitions can be found in gdb/gdbserver/x86-tdesc.h and gdb/amd64-tdep.c. In addition, execution has to be advanced to the next event after the execve, and the server then has to wait for the next cont. This kind of works when running rr replay normally, but not in interpreter mode with Eclipse.
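To illustrate what such a stop reply could look like on the wire, here is a minimal sketch based on the gdb remote serial protocol: a packet is framed as $payload#checksum, and an exec stop reply carries the new executable's path hex-encoded after "exec:". The function names are made up for this example, and a real reply from gdbserver would normally also carry thread and register fields, which are omitted here. This is not the code from the attempted patch.

```python
def rsp_checksum(payload: str) -> str:
    """Two-digit hex checksum: sum of the payload bytes modulo 256."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def frame_packet(payload: str) -> str:
    """Wrap a payload in the `$<payload>#<checksum>` packet framing."""
    return f"${payload}#{rsp_checksum(payload)}"


def exec_stop_reply(exec_path: str, signal: int = 5) -> str:
    """Build a minimal `T` stop reply that reports an exec event.

    exec_path: absolute path of the newly executed file.
    signal: stop signal number; 5 (SIGTRAP) is an assumption for this sketch.
    """
    # The pathname is sent as hex digits, so no packet escaping is needed.
    hex_path = exec_path.encode("utf-8").hex()
    return frame_packet(f"T{signal:02x}exec:{hex_path};")


if __name__ == "__main__":
    print(exec_stop_reply("/bin/ls"))
```

As far as I understand the protocol, gdb only expects this kind of reply when both sides have negotiated exec-events in the qSupported exchange, so a server would need to advertise the feature first.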

There are possibly multiple bugs in the gdb communication.

  1. I sometimes get errors from gdb complaining that more bytes were received than expected, for example when sending registers.

  2. https://github.com/mozilla/rr/issues/2239 also seems to be a problem here and just sending '3' makes gdb happy.

  3. While running in normal rr replay mode, I can set a breakpoint after execve and continue to step from there. With Eclipse, I can see the stop at the breakpoint, but at the same time the program continues to execute until the final kill signal.

I guess this is because of the handling of stop signals which should be batched. Meaning that

stop stop cont

should probably be translated to

stop cont + stop

but this is just a guess.

neon12345 avatar Aug 09 '19 06:08 neon12345

Summary: we don't support gdb executing past execve. You can work around it by digging event numbers out of the trace and doing rr replay -g <event>.

gdb might have some feature to debug past execve but I haven't looked into it. We would accept patches if someone figures it out. But even if this can be made to work with gdb somehow, I'm almost certain it won't be able to reverse-execute through an execve, which is one reason I think Pernosco is a much better long-term approach than trying to squeeze a little bit more functionality out of gdb.

If someone does want to work on this, this issue is where we will discuss that.

rocallahan avatar Jul 16 '21 22:07 rocallahan

With our work on shared memory recording, we created a method to record our executables individually and bypass the execve issue.

neon12345 avatar Jul 17 '21 06:07 neon12345

FTR, in the meantime I added an entry about execve to the FAQ. Hopefully it will help anyone who stumbles upon this.

Hi-Angel avatar Jul 02 '23 15:07 Hi-Angel