syscall_intercept: intercept_hook_point_clone_child() is not called in the child created by fork(), only in threads created by pthread_create()

intercept_hook_point_clone_child() is called only when the child thread is created by pthread_create(), not when the child is created with fork(), although both GLIBC's fork() and NPTL's pthread_create() go through the same family of system calls (clone()/clone3()).

[ Tested Environment ] Ubuntu 24.04 / 6.8.0-35-generic x86_64 GNU/Linux (from the official Ubuntu repository) / GLIBC 2.39-0ubuntu8.2 (also from the official Ubuntu repository)

Jun 06 '24 11:06 hurryman2212

Until someone more knowledgeable answers, maybe this can help... On my system, there is a difference between pthread_create() and fork():

$ strace -f -e trace=clone,clone3,fork,vfork ./fork
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa6a81d6a10)
$ strace -f -e trace=clone,clone3,fork,vfork ./pthread
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fe92aa70990, parent_tid=0x7fe92aa70990, exit_signal=0, stack=0x7fe92a270000, stack_size=0x7fff80, tls=0x7fe92aa706c0} => {parent_tid=[93337]}, 88)

The problem is not in the syscall itself (clone() or clone3()) but in how the stack is passed to the child. As the strace output above shows, fork() doesn't allocate a new stack for the child process: child_stack=NULL. That is exactly the argument checked in src/intercept.c, in intercept_routine() (desc.args[1] for clone(); for clone3(), the struct clone_args that desc.args[0] points to). If you want the hooks to run when fork() or clone() creates a child process that uses a duplicate of the parent's stack, you will probably have to remove the desc.args[1] check from that if statement.
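The check described above can be sketched as a small predicate. This is a hypothetical reconstruction for illustration, not the code in src/intercept.c: the helper name clone_uses_new_stack and the struct clone_args_prefix (which mirrors only the first six fields of the kernel's struct clone_args) are my own, and it assumes the x86_64 argument order of the raw clone syscall, where the child stack is the second argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>

/* Leading fields of the kernel's struct clone_args, in uapi order
 * (see clone3(2)). Declared locally, under a made-up name, only so the
 * sketch does not depend on <linux/sched.h>. */
struct clone_args_prefix {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
};
static_assert(offsetof(struct clone_args_prefix, stack) == 40,
              "stack is the sixth 64-bit field of struct clone_args");

/* Hypothetical helper: returns 1 if a clone-family call hands the child
 * a new stack (the case that gets special handling), 0 otherwise. */
static int clone_uses_new_stack(long nr, const long args[6])
{
	if (nr == SYS_clone)
		return args[1] != 0;    /* x86_64 clone(flags, child_stack, ...) */
#ifdef SYS_clone3
	if (nr == SYS_clone3) {
		/* clone3(struct clone_args *cl_args, size_t size) */
		const struct clone_args_prefix *ca =
		    (const struct clone_args_prefix *)(uintptr_t)args[0];
		return ca != NULL && ca->stack != 0;
	}
#endif
	return 0;                       /* not a clone-family syscall */
}
```

With the calls from the strace output, fork()'s clone (child_stack=NULL) yields 0, while pthread_create()'s clone3 (stack=0x7fe92a270000) yields 1.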

I don't know what the negative implications of that code change might be.

[ Tested Environment ] Debian 12 / 6.1.0-17-amd64 / GLIBC 2.36-9+deb12u7

Jun 11 '24 02:06 slate5

This is exactly the conclusion I came to. However, I also don't know why it checks for a new stack; I thought that was related to handling the separate child/parent hooks.

Jun 20 '24 02:06 hurryman2212

After clone, the child running on a new stack can't return from intercept_routine:702 because there is nothing to return to: the return address is stored on the stack, and the new stack is empty right after clone. That's why clone gets special handling before it is invoked.
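The same constraint shows up outside syscall_intercept: glibc's clone() wrapper takes a function pointer because a child started on a fresh, empty stack has no caller frame to return into, so it begins in that function and terminates when the function returns, using its return value as the exit status (see clone(2)). The sketch below contrasts this with fork(), whose child keeps a copy of the parent's stack and simply returns from the call. It uses plain glibc only (no syscall_intercept); the function names and the 64 KiB stack size are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Runs in the child on the new, empty stack. It cannot return into
 * the caller of clone(); its return value becomes the exit status. */
static int child_on_new_stack(void *arg)
{
	return *(int *)arg;
}

/* Child gets a brand-new stack (child_stack != NULL). */
static int run_clone_with_new_stack(int code)
{
	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;
	/* Stacks grow down on x86_64, so pass the top of the allocation.
	 * Without CLONE_VM the child gets its own copy of memory, so &code
	 * is still valid there. */
	pid_t pid = clone(child_on_new_stack, stack + CHILD_STACK_SIZE,
	                  SIGCHLD, &code);
	int status = 0;
	int ok = pid != -1 && waitpid(pid, &status, 0) == pid &&
	         WIFEXITED(status);
	free(stack);
	return ok ? WEXITSTATUS(status) : -1;
}

/* fork(): child_stack == NULL, so the child continues on a copy of the
 * parent's stack and returns from fork() into this very function. */
static int run_fork(int code)
{
	pid_t pid = fork();
	if (pid == 0)
		_exit(code);    /* child: we got here by returning from fork() */
	int status = 0;
	if (pid == -1 || waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both helpers return the child's exit status (or -1 on failure), e.g. run_clone_with_new_stack(42) and run_fork(7) report 42 and 7.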

Jun 22 '24 02:06 en4bz

Yes, I already know that. What the comment above and I meant is: why does it check whether child_stack is NULL? Were the intercept_hook_point_clone_* hooks meant to be used only for threads within the same thread group, not for fork()-ed processes?
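One way to see why the question matters: the condition discussed above keys on the stack argument, while membership in the caller's thread group is decided by CLONE_THREAD, and the two criteria are independent. The sketch below classifies the two calls from the strace output by both criteria, plus a made-up vfork-style call (CLONE_VM|CLONE_VFORK with its own stack) that would take the new-stack path without creating a thread. struct clone_call and both helpers are hypothetical, and the sketch says nothing about what the hooks were originally intended for.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical summary of one clone-family call. */
struct clone_call {
	unsigned long flags;
	unsigned long stack;    /* 0 means no new stack (child_stack=NULL) */
};

/* The criterion discussed in this thread: is a new stack supplied? */
static bool takes_new_stack_path(struct clone_call c)
{
	return c.stack != 0;
}

/* The thread-group criterion: does the child join the caller's group? */
static bool creates_thread(struct clone_call c)
{
	return (c.flags & CLONE_THREAD) != 0;
}

/* fork() as traced above. */
static const struct clone_call fork_like = {
	CLONE_CHILD_CLEARTID | CLONE_CHILD_SETTID | SIGCHLD, 0
};
/* pthread_create() as traced above (clone3). */
static const struct clone_call pthread_like = {
	CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD |
	CLONE_SYSVSEM | CLONE_SETTLS | CLONE_PARENT_SETTID |
	CLONE_CHILD_CLEARTID,
	0x7fe92a270000UL
};
/* Made-up example: new process sharing memory, with its own stack. */
static const struct clone_call vfork_like = {
	CLONE_VM | CLONE_VFORK | SIGCHLD, 0x10000
};
```

The first two calls happen to agree on both criteria, which is why the distinction is easy to miss; the third shows that a stack check alone does not mean "threads only".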

Jun 22 '24 17:06 hurryman2212

Now that I understand the internals of the library a bit better, I can appreciate what @en4bz meant by his comment above. In short, if the clone with a new (empty) stack were performed at intercept_routine:691, the child could neither return to intercept_wrapper.S:171 nor restore its context in intercept_wrapper.S:195, because both need data from a stack that is empty in the child. That's why the library has to return to its assembly part, restore the context there, and only then execute the syscall (clone) instruction. As a consequence, in such cases the library doesn't log the result (intercept_routine:700) after the syscall (intercept_routine:691).

I'm clarifying this for users like @hurryman2212 and me who expected intercept_hook_point_clone_* to be called for every clone. It seems, however, that these hooks are offered to the user only as a substitute for the missing post-syscall logging (intercept_routine:700) on that path.
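To make the consequence concrete, here is a toy model in plain C of the two paths as described in this thread. It is not syscall_intercept's code: the struct, the function, and the counters are hypothetical stand-ins (one counter for the post-syscall log at intercept_routine:700, one per clone hook), and model_clone() is meant to be called once per side (parent and child) of a single clone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the library's observable effects. */
struct trace {
	int post_syscall_logs;   /* the log at intercept_routine:700 */
	int clone_child_hooks;   /* intercept_hook_point_clone_child */
	int clone_parent_hooks;  /* intercept_hook_point_clone_parent */
};

/* Models what one side of a clone sees, depending on the stack argument. */
static void model_clone(bool new_stack, bool in_child, struct trace *t)
{
	if (!new_stack) {
		/* C path (the fork() case): the syscall runs inside
		 * intercept_routine, both sides return through it and
		 * log the result; no clone hook is involved. */
		t->post_syscall_logs++;
		return;
	}
	/* Assembly path: context restored, clone executed in the wrapper,
	 * then the matching hook runs; nothing is logged afterwards. */
	if (in_child)
		t->clone_child_hooks++;
	else
		t->clone_parent_hooks++;
}
```

With a fork()-style call both sides are logged and neither hook fires; with a new stack each side fires its hook and nothing is logged after the syscall, which is the behaviour reported at the top of the thread.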

Jun 23 '24 03:06 slate5
