intercept_hook_point_clone_child() is not called in a child created by fork(), only in a thread created by pthread_create().
intercept_hook_point_clone_child() is called only when the child thread is created by pthread_create(), not by fork(), even though glibc's fork() and NPTL's pthread_create() both use the same system call, clone().
[ Tested Environment ] Ubuntu 24.04 / 6.8.0-35-generic x86_64 GNU/Linux (from the official Ubuntu repository) / GLIBC 2.39-0ubuntu8.2 (also from the official repository)
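For reference, a fork()-style clone can be issued directly through syscall(). The sketch below assumes x86_64 Linux, and the helper name fork_via_raw_clone is hypothetical. It passes no CLONE_VM and a NULL child stack, so the child keeps running on a copy-on-write duplicate of the parent's stack and simply returns from the syscall, exactly like fork(). It runs without syscall_intercept and only illustrates the kernel-level call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>      /* SIGCHLD */
#include <sys/syscall.h> /* SYS_clone */
#include <sys/wait.h>    /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>      /* syscall, _exit */

/*
 * Hypothetical helper: a fork()-equivalent raw clone.
 * x86_64 argument order: flags, child_stack, parent_tid, child_tid, tls.
 * child_stack == 0 means "keep using (a copy of) the current stack".
 * Returns the child's exit status, or -1 on failure.
 */
static int fork_via_raw_clone(int child_status)
{
	long pid = syscall(SYS_clone, (long)SIGCHLD, 0L, 0L, 0L, 0L);
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: same return path as the parent. Only call _exit(),
		 * because a raw clone bypasses glibc's fork() bookkeeping. */
		_exit(child_status);
	}

	int wstatus;
	if (waitpid((pid_t)pid, &wstatus, 0) != (pid_t)pid || !WIFEXITED(wstatus))
		return -1;
	return WEXITSTATUS(wstatus);
}
```

For example, fork_via_raw_clone(7) should return 7 once the child has exited.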
Until someone more knowledgeable answers, maybe this can help...
On my system, there is a difference between pthread_create() and fork():
$ strace -f -e trace=clone,clone3,fork,vfork ./fork
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa6a81d6a10)
$ strace -f -e trace=clone,clone3,fork,vfork ./pthread
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fe92aa70990, parent_tid=0x7fe92aa70990, exit_signal=0, stack=0x7fe92a270000, stack_size=0x7fff80, tls=0x7fe92aa706c0} => {parent_tid=[93337]}, 88)
The problem is not in the syscall itself (clone() or clone3()) but in how the stack is passed to the child. In the strace output above, fork() doesn't allocate any stack for the child process: child_stack=NULL. That argument is checked in src/intercept.c, in the function intercept_routine() (desc.args[1] for clone() and desc.args[0] for clone3()). If you want to use fork() or clone() to create a child process that uses a duplicate of the parent's stack, you will probably have to remove the desc.args[1] check from that if statement.
I don't know what the negative implications of that change might be.
[ Tested Environment ] Debian 12 / 6.1.0-17-amd64 / GLIBC 2.36-9+deb12u7
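A minimal sketch of the kind of check described above. The helper clone_requests_new_stack is hypothetical, not the library's code. The thread only says that desc.args[1] is checked for clone() and desc.args[0] for clone3(). Since the clone3() argument is a pointer to struct clone_args, this sketch assumes the relevant value is that struct's stack field. It also assumes x86_64 and kernel headers >= 5.3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/sched.h>  /* struct clone_args (kernel headers >= 5.3) */
#include <sys/syscall.h>  /* SYS_clone, SYS_clone3 */

/*
 * Hypothetical helper: does this clone()/clone3() call ask for a new
 * child stack?  args[] holds the six raw syscall arguments.
 */
static bool clone_requests_new_stack(long nr, const long args[6])
{
	assert(args != NULL);

	if (nr == SYS_clone)
		return args[1] != 0;  /* x86_64: flags, child_stack, ... */

#ifdef SYS_clone3
	if (nr == SYS_clone3) {
		/* Assumption: args[0] points to the caller's struct clone_args. */
		const struct clone_args *ca =
			(const struct clone_args *)(uintptr_t)args[0];
		return ca != NULL && ca->stack != 0;
	}
#endif

	return false;
}
```

With the fork() call from the strace output (child_stack=NULL) the helper returns false. With the pthread_create() call (clone3 with stack=0x7fe92a270000) it returns true.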
This is exactly the conclusion I came to, although I also don't know why it checks for a new stack; I thought it was related to handling separate child/parent hooks.
In the child with a new stack, you can't return after calling clone at intercept_routine:702, because there is nothing to return to: the return address is stored on the stack, and the new stack is empty after clone. That's why there is special handling before clone is invoked.
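The same constraint is why glibc's own clone() wrapper takes a function pointer instead of returning like fork(): a child started on a fresh stack has no caller frame, so the wrapper calls the given function in the child and exits with its return value. A minimal sketch, assuming x86_64 Linux (downward-growing stack) and glibc; run_child_on_new_stack and child_fn are hypothetical names, and this does not involve syscall_intercept.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>     /* clone(), CLONE_VM */
#include <signal.h>    /* SIGCHLD */
#include <stdlib.h>    /* malloc, free */
#include <sys/wait.h>  /* waitpid, WIFEXITED */

#define CHILD_STACK_SIZE (64 * 1024)

/* Visible to the child because of CLONE_VM (shared address space). */
static volatile int shared_value;

/* Child entry point. It starts on a brand-new, empty stack, so it cannot
 * "return" into the caller's frame; glibc's clone() calls this function
 * and then exits the child with its return value. */
static int child_fn(void *arg)
{
	shared_value = *(int *)arg;
	return 0;
}

/* Runs child_fn on a freshly allocated stack; returns the value the child
 * stored, or -1 on failure. */
static int run_child_on_new_stack(int value)
{
	char *stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	shared_value = 0;
	/* The stack grows down on x86_64, so pass its highest address. */
	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
			  CLONE_VM | SIGCHLD, &value);
	if (pid == -1) {
		free(stack);
		return -1;
	}

	int wstatus;
	if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
		return -1;  /* leak the stack rather than free it under the child */

	free(stack);    /* safe: the child has exited */
	return shared_value;
}
```

For example, run_child_on_new_stack(42) should return 42. A raw clone syscall with a new stack, issued from an ordinary C function, would instead leave the child trying to return through an empty stack, which is the situation described above.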
Yes, I already know that. What the comment above and I were asking is why it checks whether child_stack is NULL. Were intercept_hook_point_clone_* meant to be used only for threads within the same thread group, and not for fork()-ed processes?
Now that I understand the internals of the library a bit better, I can appreciate what @en4bz meant in his comment above. In short, if the clone with a new (empty) stack were performed at intercept_routine:691, the child would not be able to return to intercept_wrapper.S:171, nor could it restore the saved context at intercept_wrapper.S:195 from an empty stack. That's why the library has to return to its assembly part, restore the context there, and only then execute the syscall (clone) instruction. As a consequence, the library doesn't log (intercept_routine:700) after the syscall (intercept_routine:691) in these cases.
I'm clarifying this for users like me and @hurryman2212 who expect intercept_hook_point_clone_* to be called for every clone. It seems, however, that these hooks are offered to users only as a substitute for the logging that is skipped (intercept_routine:700) in these new-stack cases.