parallel icon indicating copy to clipboard operation
parallel copied to clipboard

Unusual thread death with no messages

Open feamsr00 opened this issue 6 years ago • 13 comments

Hello,

Ok, so I've seen some odd error states show up when running some recent thread tests. Under certain conditions we were faced with a script that died in the middle of execution with no indication of what killed it, or for.

<?php

namespace JobTest;

use parallel\Runtime as Thread_Runtime;
use parallel\Channel as Thread_Channel;

use Closure;

const DEFAULT_CHANNEL_CAPACITY = 1024;
const THREAD_ENVIRONMENT_STUB = __DIR__.DIRECTORY_SEPARATOR.'ThreadBooter.php'; //empty files


class Job
{
    public function setChannel(Thread_Channel $oData)
	{
		$this->oData = $oData;
	}

    public function run()
    {
        return "Hello World";
    }
}


class Agent
{
	protected $hThreadRuntime = False;

	public function __construct()
	{
		if (!$this->hThreadRuntime)
		{
			$this->hThreadRuntime=new Thread_Runtime(THREAD_ENVIRONMENT_STUB);
		}
		$this->oData=new Thread_Channel(DEFAULT_CHANNEL_CAPACITY);
		$this->cExecute = function (object $oJob, $oData)
		{
			error_reporting(E_ALL);
			ini_set('error_log', __DIR__.DIRECTORY_SEPARATOR.'thread-engine.log');
			error_log("init");
			$oJob->setChannel($oData);
			return $oJob->run();
		};
	}

	public function run(Job $oJob)
	{
		$oResult = $this->hThreadRuntime->run($this->cExecute, [$oJob, $this->oData]);
		return $oResult;
	}
}

$oJob=new Job;
$oAgent = new Agent;
$hThreadFuture = $oAgent->run($oJob);
var_dump($hThreadFuture->value());
print "Not reachable";

This seems to fall over right inside ::run(). This blows the whole process away, not just the threads. Is Parallel swallowing engine errors? Is it possible for Parallel to full emit any engine error encountered? Is this related to #72 ?

PS Crazy thought, this isn't related to the object typehint for cExecute... is it?

feamsr00 avatar Jul 12 '19 19:07 feamsr00

Killing the whole process would likely indicate a segfault or some other similar issue.

dktapps avatar Jul 18 '19 10:07 dktapps

Seems like you're right! The code above produced an Process exited. Exit code - 0xc0000005 [an Access Violation, but no dump] and the code below...

<<<main test.php>>>
<?php

namespace JobTest;

use parallel\Runtime as Thread_Runtime;
use parallel\Channel as Thread_Channel;

use Closure;

const DEFAULT_CHANNEL_CAPACITY = 1024;
const THREAD_ENVIRONMENT_STUB = __DIR__.DIRECTORY_SEPARATOR.'ThreadBooter.php';

require_once __DIR__.DIRECTORY_SEPARATOR.'ReciprocalPropertyTestClasses.php';

class Agent
{
	protected $hThreadRuntime = False;

	public function __construct()
	{
		if (!$this->hThreadRuntime)
		{
			$this->hThreadRuntime=new Thread_Runtime(THREAD_ENVIRONMENT_STUB);
		}
		$this->cExecute = function ($oObj)
		{
			error_reporting(E_ALL);
			ini_set('error_log', __DIR__.DIRECTORY_SEPARATOR.'thread-engine.log');
			error_log("init");
		};
	}

	public function run($oObj)
	{
		$oResult = $this->hThreadRuntime->run($this->cExecute, [$oObj]);
		return $oResult;
	}
}

$oObj1 = new \Obj1;
$oObj2 = new \Obj2;

$oObj1->set($oObj2);


$oAgent = new Agent;
print "Run";
$hThreadFuture = $oAgent->run($oObj1);
var_dump($hThreadFuture->value());
print "Not reachable";
<<<ReciprocalPropertyTestClasses.php>>>
<?php

class Obj1
{
    public function set(Obj2 $obj)
    {
        $obj->set($this);
        $this->obj = $obj;
    }
}

class Obj2
{
    public function set(Obj1 $obj)
    {
        $this->obj = $obj;
    }
}

produced a stack overflow exception:

***********************
*  EXCEPTION DETAILS  *
***********************

DetailID = 1
	Count:    1
	Exception #:  0XC00000FD
	Stack:        


Thread 0 - System ID 15484

Entry point   php!sapi_cli_single_write+a000 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.531 
Time spent in kernel mode   0 Days 00:00:00.359 
		php_parallel!get_module+0x26d0
		php_parallel!get_module+0x2765
		php_parallel!get_module+0x2765
		php_parallel!get_module+0x2765
		php_parallel!get_module+0x2765
		php_parallel!get_module+0x2765
<///snip///>
Thread 1 - System ID 16084



Entry point   ntdll!TppWorkerThread 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.015 
Time spent in kernel mode   0 Days 00:00:00.046 




Instruction Address


[0x7ff80b4430d4] ntdll!NtWaitForWorkViaWorkerFactory+14 
[0x7ff80b3f01f4] ntdll!TppWorkerThread+274 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 



Thread 2 - System ID 14464



Entry point   ntdll!TppWorkerThread 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.015 
Time spent in kernel mode   0 Days 00:00:00.000 




Instruction Address


[0x7ff80b4430d4] ntdll!NtWaitForWorkViaWorkerFactory+14 
[0x7ff80b3f01f4] ntdll!TppWorkerThread+274 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 



Thread 3 - System ID 14112



Entry point   ntdll!TppWorkerThread 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.000 
Time spent in kernel mode   0 Days 00:00:00.000 




Instruction Address


[0x7ff80b4430d4] ntdll!NtWaitForWorkViaWorkerFactory+14 
[0x7ff80b3f01f4] ntdll!TppWorkerThread+274 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 



Thread 4 - System ID 9396



Entry point   ucrtbase!thread_start 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.000 
Time spent in kernel mode   0 Days 00:00:00.015 


This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.



Instruction Address


[0x7ff80b4402b4] ntdll!NtWaitForMultipleObjects+14 
[0x7ff8080dcc7e] KERNELBASE!WaitForMultipleObjectsEx+fe 
[0x7ff8080dcb6e] KERNELBASE!WaitForMultipleObjects+e 
[0x7fffeb286757] pthreadVC2!ptw32_cancelable_wait+67 
[0x7fffeb288cf4] pthreadVC2!sem_timedwait+f4 
[0x7fffeb286b78] pthreadVC2!ptw32_cond_timedwait+208 
[0x7fffe59396cd] php_parallel!get_module+85bd 
[0x7fffe593b3e6] php_parallel!get_module+a2d6 
[0x7fffeb2883cc] pthreadVC2!ptw32_threadStart+17c 
[0x7ff8076b1ffa] ucrtbase!thread_start<unsigned int (__cdecl*)(void * __ptr64)>+3a 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 

Let me know if I should open a second issue.

I've not needed to collect stacks from php before so I've followed (with limited success) the steps here: https://bugs.php.net/bugs-generating-backtrace-win32.php. Whats the next step?

feamsr00 avatar Jul 20 '19 07:07 feamsr00

I tried the preview version of windbg and I think I was able to tease out the trace for the access violation crash....

[0x0]   php7ts!zend_hash_find + 0xa   
[0x1]   php_parallel!get_module + 0x2079   
[0x2]   php_parallel!get_module + 0x9873   
[0x3]   php_parallel!get_module + 0x955f   
[0x4]   php_xdebug_2_7_2_7_3_vc15_x86_64 + 0x71f2   
[0x5]   php7ts!destroy_op_array + 0x7da   
[0x6]   php7ts!execute_ex + 0x5f   
[0x7]   php_pthreads!get_module + 0x42   
[0x8]   php_xdebug_2_7_2_7_3_vc15_x86_64 + 0x6a20   
[0x9]   php7ts!libiconv_set_relocation_prefix + 0x3136   
[0xa]   php7ts!execute_ex + 0x5f   
[0xb]   php_pthreads!get_module + 0x42   
[0xc]   php_xdebug_2_7_2_7_3_vc15_x86_64 + 0x6a20   
[0xd]   php7ts!zend_execute + 0x1a8   
[0xe]   php7ts!zend_execute_scripts + 0xb9   
[0xf]   php7ts!php_execute_script + 0x261   
[0x10]   php!sapi_cli_single_write + 0x1320   
[0x11]   php!sapi_cli_single_write + 0x2298   
[0x12]   php!sapi_cli_single_write + 0x9f88   
[0x13]   KERNEL32!BaseThreadInitThunk + 0x14   
[0x14]   ntdll!RtlUserThreadStart + 0x21   

I think I got a disassembly of faulting instruction... (x64aa)

   php7ts!zend_hash_find:
00007ffb`46b364a0 48896c2418         mov     qword ptr [rsp+18h], rbp
00007ffb`46b364a5 57                 push    rdi
00007ffb`46b364a6 4883ec20           sub     rsp, 20h
00007ffb`46b364aa 4c8b4208           mov     r8, qword ptr [rdx+8] ds:00000000`00000018=????????????????
00007ffb`46b364ae 488bfa             mov     rdi, rdx
00007ffb`46b364b1 4c8bc9             mov     r9, rcx
00007ffb`46b364b4 4d85c0             test    r8, r8
00007ffb`46b364b7 0f85cb000000       jne     php7ts!zend_hash_find+0xe8 (00007ffb`46b36588)

Is this helpful at all?

feamsr00 avatar Jul 25 '19 21:07 feamsr00

According to the documentation, it may not be possible to pass objects as argument https://www.php.net/manual/en/parallel-runtime.run.php

Tuckbros avatar Sep 21 '19 20:09 Tuckbros

I need a minimal, complete, simple, reproducing test case to investigate.

krakjoe avatar Nov 29 '19 04:11 krakjoe

I've got the same issue with amphp's parallel when using the parallel extension: Sudden SIGSEGV on parallel\Runtime::run(). I'm trying to build something to reproduce.

tux-rampage avatar Apr 28 '20 08:04 tux-rampage

This seems to be some complex edge case, maybe in combination with a specific extension and current state. I'm not quite sure how to build something to reproduce this behavior. I'll let you know if I can dig up something when I got some spare time. If someone finds a way in the meantime it would be great 👍

tux-rampage avatar Apr 28 '20 18:04 tux-rampage

I'll reinvestigate soon, I had to re-engineer my project to allow process models. In the meantime, has anyone with parallel installed managed to copy & paste the code I left in the 3rd comment at the top of this issue? It goes in 2 files and depends only on parallel...

Run that with the debug version of php and I see no reason why you shouldn't get the same trace (unless of course it is an environmental issue but I tried that on 2 workstations and a server and they all dumped...)

feamsr00 avatar Jun 06 '20 07:06 feamsr00

Try extension with #139 PR, at current moment can't test your scripts. Also, didnt test it on windows

dotfry avatar Jun 14 '20 02:06 dotfry

EDIT: Seems like the author already knew about this, see #64

I'll reinvestigate soon, I had to re-engineer my project to allow process models. In the meantime, has anyone with parallel installed managed to copy & paste the code I left in the 3rd comment at the top of this issue? It goes in 2 files and depends only on parallel...

Run that with the debug version of php and I see no reason why you shouldn't get the same trace (unless of course it is an environmental issue but I tried that on 2 workstations and a server and they all dumped...)

I like this test case a lot. I haven't investigated thoroughly, but I highly suspect it has to do with how objects are copied over. Pure speculation for now...

obj1 points to obj2 which has a pointer to obj1

i.e.,

obj1['obj']->obj2 and obj2['obj'] -> obj1

So when a pass-by-value is attempted, you hit a stack overflow if you naively clone objects. E.g., var_dump on obj1 and you'll see PHP knows what it's getting itself into.

I think what needs to happen is to use a reference-table/mapping when doing the copy.

E.g.,

int *Obj1Ptr = &Obj1;
Obj1New = (shallow clone) Obj1
InternalMap[Obj1Ptr] = &Obj1New;
Obj2New = (shallow clone) Obj2
if (InternalMap[&Obj2->Obj]) {
    Obj2New->Obj = InternalMap[&Obj2->Obj];
}

I know user land doesn't get to work with this stuff, but I'm not sure where there is handled in the extension.

I'm probably way over my head here, but I also don't see how #139 resolves this

awderh avatar Sep 16 '20 08:09 awderh

This causes overflow, see #64

krakjoe avatar Oct 14 '20 09:10 krakjoe

[0x1] php_parallel!get_module + 0x2079 […] [0x7] php_pthreads!get_module + 0x42

Running parallel and pthreads simultaneously might not be the best idea.

cmb69 avatar Nov 11 '21 16:11 cmb69

[0x1] php_parallel!get_module + 0x2079 […] [0x7] php_pthreads!get_module + 0x42

Running parallel and pthreads simultaneously might not be the best idea.

they didnt confilict in internals if you dont start using them both.

dotfry avatar Nov 16 '21 22:11 dotfry

Fixed in 67413b193ecda80a32335a76eb25454a4d54bb32

krakjoe avatar May 18 '24 17:05 krakjoe