
Crash issue due to refcount error with clang compiler

Open Yanfeng-Mi opened this issue 1 year ago • 9 comments

bool AbstractBuffersPool<PoolT, BufferType, BufferParentType>::isPoolBuffer(const BufferParentType *buffer) const {
    static_assert(std::is_base_of_v<BufferParentType, BufferType>);

    // With clang (libc++), the unique_ptr's stored pointer is already set to nullptr when the deleter runs
    return (buffer && this->mainStorage.get() == buffer);
}

Yanfeng-Mi avatar Jul 11 '24 02:07 Yanfeng-Mi

ref: https://stackoverflow.com/questions/54237128/does-stdunique-ptr-set-its-underlying-pointer-to-nullptr-inside-its-destructor

Yanfeng-Mi avatar Jul 11 '24 02:07 Yanfeng-Mi

Hi @Yanfeng-Mi, could you share more details of the issue? Could you share the call stack?

JablonskiMateusz avatar Jul 11 '24 15:07 JablonskiMateusz

The call stack is as follows:

#00 pc 0000000000067c0e /apex/com.android.runtime/lib64/bionic/libc.so (abort+206) (BuildId: 3f70d7b54a58b7ab204a797b00a4a7cb)
#01 pc 0000000000432da8 /vendor/lib64/libigdrcl.so (NEO::abortExecution()+8) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#02 pc 0000000000432e87 /vendor/lib64/libigdrcl.so (NEO::abortUnrecoverable(int, char const*)+55) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#03 pc 000000000050b1e5 /vendor/lib64/libigdrcl.so (NEO::ReferenceTrackedObject<NEO::Context>::decRefInternal()+117) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#04 pc 0000000000598d52 /vendor/lib64/libigdrcl.so (NEO::MemObj::~MemObj()+1458) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#05 pc 00000000005817d4 /vendor/lib64/libigdrcl.so (NEO::Buffer::~Buffer()+20) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#06 pc 0000000000603844 /vendor/lib64/libigdrcl.so (NEO::BufferHw<NEO::XeHpcCoreFamily>::~BufferHw()+20) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#07 pc 0000000000603868 /vendor/lib64/libigdrcl.so (NEO::BufferHw<NEO::XeHpcCoreFamily>::~BufferHw()+24) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#08 pc 000000000052c551 /vendor/lib64/libigdrcl.so (std::__1::default_delete<NEO::Buffer>::operator()(NEO::Buffer*) const+49) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#09 pc 0000000000520683 /vendor/lib64/libigdrcl.so (std::__1::unique_ptr<NEO::Buffer, std::__1::default_delete<NEO::Buffer> >::reset(NEO::Buffer*)+99) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#10 pc 0000000000526ac8 /vendor/lib64/libigdrcl.so (std::__1::unique_ptr<NEO::Buffer, std::__1::default_delete<NEO::Buffer> >::~unique_ptr()+24) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#11 pc 00000000005207a6 /vendor/lib64/libigdrcl.so (NEO::AbstractBuffersPool<NEO::Context::BufferPool, NEO::Buffer, NEO::MemObj>::~AbstractBuffersPool()+54) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#12 pc 0000000000520c14 /vendor/lib64/libigdrcl.so (NEO::Context::BufferPool::~BufferPool()+20) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#13 pc 0000000000522378 /vendor/lib64/libigdrcl.so (std::__1::allocator<NEO::Context::BufferPool>::destroy(NEO::Context::BufferPool*)+24) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#14 pc 000000000052234c /vendor/lib64/libigdrcl.so (void std::__1::allocator_traits<std::__1::allocator<NEO::Context::BufferPool> >::__destroy<NEO::Context::BufferPool>(std::__1::integral_constant<bool, true>, std::__1::allocator<NEO::Context::BufferPool>&, NEO::Context::BufferPool*)+28) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#15 pc 000000000052231c /vendor/lib64/libigdrcl.so (void std::__1::allocator_traits<std::__1::allocator<NEO::Context::BufferPool> >::destroy<NEO::Context::BufferPool>(std::__1::allocator<NEO::Context::BufferPool>&, NEO::Context::BufferPool*)+28) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#16 pc 00000000005222cf /vendor/lib64/libigdrcl.so (std::__1::__vector_base<NEO::Context::BufferPool, std::__1::allocator<NEO::Context::BufferPool> >::__destruct_at_end(NEO::Context::BufferPool*)+95) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#17 pc 0000000000522217 /vendor/lib64/libigdrcl.so (std::__1::__vector_base<NEO::Context::BufferPool, std::__1::allocator<NEO::Context::BufferPool> >::clear()+23) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#18 pc 0000000000528aad /vendor/lib64/libigdrcl.so (std::__1::vector<NEO::Context::BufferPool, std::__1::allocator<NEO::Context::BufferPool> >::clear()+45) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#19 pc 000000000051cff8 /vendor/lib64/libigdrcl.so (NEO::AbstractBuffersAllocator<NEO::Context::BufferPool, NEO::Buffer, NEO::MemObj>::releaseSmallBufferPool()+24) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#20 pc 000000000051ca58 /vendor/lib64/libigdrcl.so (NEO::Context::~Context()+168) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#21 pc 000000000051d228 /vendor/lib64/libigdrcl.so (NEO::Context::~Context()+24) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#22 pc 0000000000514ebd /vendor/lib64/libigdrcl.so (NEO::unique_ptr_if_unused<NEO::Context>::doDelete(NEO::Context*)+45) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#23 pc 000000000047bdd3 /vendor/lib64/libigdrcl.so (std::__1::unique_ptr<NEO::Context, void (*)(NEO::Context*)>::reset(NEO::Context*)+99) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#24 pc 000000000047bd68 /vendor/lib64/libigdrcl.so (std::__1::unique_ptr<NEO::Context, void (*)(NEO::Context*)>::~unique_ptr()+24) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#25 pc 0000000000439754 /vendor/lib64/libigdrcl.so (NEO::unique_ptr_if_unused<NEO::Context>::~unique_ptr_if_unused()+20) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)
#26 pc 000000000043944b /vendor/lib64/libigdrcl.so (clReleaseContext+571) (BuildId: 5f66ed8b63a2c1f5fbd41c24c6ddde9c)

Yanfeng-Mi avatar Jul 12 '24 02:07 Yanfeng-Mi

@Yanfeng-Mi Thanks for reporting the issue. Could you share repro steps?

JablonskiMateusz avatar Jul 16 '24 09:07 JablonskiMateusz

@JablonskiMateusz To reproduce this issue, you need to rebuild the OCL runtime driver with the clang toolchain. I found the issue on Android, where clang is the compiler in use. Rebuilding the OCL runtime with clang on Ubuntu is not easy, because many libc++-related build errors need to be resolved first. The root cause is that unique_ptr destruction behaves differently between gcc (libstdc++) and clang (libc++). You can refer to my workaround patch in the Android Celadon project: https://github.com/projectceladon/compute-runtime/commit/dcc1b6fc60518b25bb81cfd6450c129e089016b9

Yanfeng-Mi avatar Jul 19 '24 09:07 Yanfeng-Mi

Hi @Yanfeng-Mi ,

We’d like to know if this issue is still affecting you. If so, please provide an update or any additional information. If you have identified a solution, we kindly ask that you create a proper pull request (PR) with the necessary changes for review (https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md). Otherwise, we’ll close this issue after 30 days of inactivity. Your feedback is appreciated!

kgibala avatar Oct 15 '25 11:10 kgibala

@kgibala The issue is still reproducible. I will create a PR for it.

Yanfeng-Mi avatar Oct 29 '25 15:10 Yanfeng-Mi

@kgibala Could you help review the PR?

Yanfeng-Mi avatar Oct 31 '25 09:10 Yanfeng-Mi

@Yanfeng-Mi Thank you for your contribution! We appreciate your effort in submitting this pull request. Your changes will be reviewed and evaluated through our standard process. We’ll keep you updated on any progress or feedback.

kgibala avatar Oct 31 '25 12:10 kgibala