OpenImageIO icon indicating copy to clipboard operation
OpenImageIO copied to clipboard

[BUG] oiiotool crash/undefined behavior at exit

Open ThiagoIze opened this issue 4 years ago • 2 comments

Describe the bug In Windows, oiiotool returns 0 from main, but the process returns 3 to the shell. This is because the c runtime abort function is terminating the process due to an error and it returns 3.

In Linux a few random times I got segfaults at exit. Valgrind reports errors of the form:

==56783== Command: oiiotool --info testrender.tif
==56783== 
testrender.tif       :  320 x  240, 4 channel, uint8 tiff
==56783== Invalid read of size 8
==56783==    at 0x78610E: OpenImageIO_v2_4_0::ImageBufImpl::clear() (src/libOpenImageIO/imagebuf.cpp:703)
==56783==    by 0x785D95: OpenImageIO_v2_4_0::ImageBufImpl::~ImageBufImpl() (src/libOpenImageIO/imagebuf.cpp:485)
==56783==    by 0x78357D: OpenImageIO_v2_4_0::ImageBuf::impl_deleter(OpenImageIO_v2_4_0::ImageBufImpl*) (src/libOpenImageIO/imagebuf.cpp:333)
==56783==    by 0x786F6E: ~unique_ptr (unique_ptr.h:292)
==56783==    by 0x786F6E: OpenImageIO_v2_4_0::ImageBuf::~ImageBuf() (src/libOpenImageIO/imagebuf.cpp:569)
==56783==    by 0x4288D1: std::_Sp_counted_ptr<OpenImageIO_v2_4_0::ImageBuf*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:377)
==56783==    by 0x49CD2B: _M_release (shared_ptr_base.h:155)
==56783==    by 0x49CD2B: ~__shared_count (shared_ptr_base.h:730)
==56783==    by 0x49CD2B: ~__shared_ptr (shared_ptr_base.h:1169)
==56783==    by 0x49CD2B: _Destroy<std::shared_ptr<OpenImageIO_v2_4_0::ImageBuf> > (stl_construct.h:98)
==56783==    by 0x49CD2B: __destroy<std::shared_ptr<OpenImageIO_v2_4_0::ImageBuf> *> (stl_construct.h:108)
==56783==    by 0x49CD2B: _Destroy<std::shared_ptr<OpenImageIO_v2_4_0::ImageBuf> *> (stl_construct.h:136)
==56783==    by 0x49CD2B: _Destroy<std::shared_ptr<OpenImageIO_v2_4_0::ImageBuf> *, std::shared_ptr<OpenImageIO_v2_4_0::ImageBuf> > (stl_construct.h:206)
==56783==    by 0x49CD2B: ~vector (stl_vector.h:677)
==56783==    by 0x49CD2B: ~SubimageRec (src/oiiotool/oiiotool.h:375)
==56783==    by 0x49CD2B: _Destroy<OpenImageIO_v2_4_0::OiioTool::SubimageRec> (stl_construct.h:98)
==56783==    by 0x49CD2B: __destroy<OpenImageIO_v2_4_0::OiioTool::SubimageRec *> (stl_construct.h:108)
==56783==    by 0x49CD2B: _Destroy<OpenImageIO_v2_4_0::OiioTool::SubimageRec *> (stl_construct.h:136)
==56783==    by 0x49CD2B: _Destroy<OpenImageIO_v2_4_0::OiioTool::SubimageRec *, OpenImageIO_v2_4_0::OiioTool::SubimageRec> (stl_construct.h:206)
==56783==    by 0x49CD2B: ~vector (stl_vector.h:677)
==56783==    by 0x49CD2B: OpenImageIO_v2_4_0::OiioTool::ImageRec::~ImageRec() (src/oiiotool/oiiotool.h:415)
==56783==    by 0x49DB81: std::_Sp_counted_ptr<OpenImageIO_v2_4_0::OiioTool::ImageRec*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:377)
==56783==    by 0x48DE1C: _M_release (shared_ptr_base.h:155)
==56783==    by 0x48DE1C: ~__shared_count (shared_ptr_base.h:730)
==56783==    by 0x48DE1C: ~__shared_ptr (shared_ptr_base.h:1169)
==56783==    by 0x48DE1C: OpenImageIO_v2_4_0::OiioTool::Oiiotool::~Oiiotool() (src/oiiotool/oiiotool.h:69)
==56783==    by 0x5CB7C98: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==56783==    by 0x5CB7CE6: exit (in /usr/lib64/libc-2.17.so)
==56783==    by 0x5CA050B: (below main) (in /usr/lib64/libc-2.17.so)
==56783==  Address 0x78d4200 is 0 bytes inside a block of size 66,176 free'd
==56783==    at 0x4C2B06D: free (vg_replace_malloc.c:538)
==56783==    by 0x863B4A: std::_Sp_counted_deleter<OpenImageIO_v2_4_0::pvt::ImageCacheImpl*, void (*)(OpenImageIO_v2_4_0::pvt::ImageCacheImpl*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::
_M_dispose() (shared_ptr_base.h:471)
==56783==    by 0x85B87B: _M_release (shared_ptr_base.h:155)
==56783==    by 0x85B87B: ~__shared_count (shared_ptr_base.h:730)
==56783==    by 0x85B87B: std::__shared_ptr<OpenImageIO_v2_4_0::pvt::ImageCacheImpl, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==56783==    by 0x5CB7C98: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==56783==    by 0x5CB7CE6: exit (in /usr/lib64/libc-2.17.so)
==56783==    by 0x5CA050B: (below main) (in /usr/lib64/libc-2.17.so)
==56783==  Block was alloc'd at
==56783==    at 0x4C2C375: memalign (vg_replace_malloc.c:906)
==56783==    by 0x4C2C43F: posix_memalign (vg_replace_malloc.c:1070)
==56783==    by 0xABA1CB: OpenImageIO_v2_4_0::aligned_malloc(unsigned long, unsigned long) (src/libutil/sysutil.cpp:637)
==56783==    by 0x85B5CD: aligned_new<OpenImageIO_v2_4_0::pvt::ImageCacheImpl> (src/include/OpenImageIO/platform.h:572)
==56783==    by 0x85B5CD: OpenImageIO_v2_4_0::ImageCache::create(bool) (src/libtexture/imagecache.cpp:3899)
==56783==    by 0x47DFF8: main (src/oiiotool/oiiotool.cpp:7010)

This happens because of incorrect static order initialization (and consequently destruction), at least in my build of OIIO using static libraries. There's the static Oiiotool ot in oiiotool.cpp and there's static std::shared_ptr<ImageCacheImpl> shared_image_cache; in imagecache.cpp. If shared_image_cache is destroyed first, because it was initialized after ot (maybe by chance, maybe by design) it will leave the ot with a dangling pointer to the shared cache which of course will fail in undefined ways when we try to later on access it.

Ignoring the issue of mixing shared pointers with naked pointers, I can imagine two possible fixes:

  1. Get rid of the global scope ot and instead create and delete it manually (or through a smart pointer) in main(). You could still keep a global scope Oiiotool* ot so that the code changes would be minimal (. becomes ->, etc.).
  2. If ot can't be modified for some reason, we can due something a bit hacky that looks like:
index e3c87ad0..86778f47 100644
--- a/src/libOpenImageIO/imagebuf.cpp
+++ b/src/libOpenImageIO/imagebuf.cpp
@@ -28,7 +28,7 @@ OIIO_STRONG_PARAM_TYPE(DoLock, bool);
 
 static atomic_ll IB_local_mem_current;
 
-
+extern bool shared_image_cache_destroyed;
 
 ROI
 get_roi(const ImageSpec& spec)
@@ -243,6 +243,8 @@ public:
     // Invalidate the file in our imagecache and the shared one
     void invalidate(ustring filename, bool force)
     {
+       if (shared_image_cache_destroyed)
+          return; // this will leak private cache if private cache is also held in a global scope object...
         ImageCache* shared_imagecache = ImageCache::create(true);
         OIIO_DASSERT(shared_imagecache);
         if (m_imagecache)
@@ -700,7 +702,8 @@ void
 ImageBufImpl::clear()
 {
     if (storage() == ImageBuf::IMAGECACHE && m_imagecache && !m_name.empty()) {
-        m_imagecache->close(m_name);
+       if (!shared_image_cache_destroyed)
+          m_imagecache->close(m_name);
         invalidate(m_name, false);
     }
     free_pixels();
diff --git a/src/libtexture/imagecache.cpp b/src/libtexture/imagecache.cpp
index 60c07f90..ea7ccf16 100644
--- a/src/libtexture/imagecache.cpp
+++ b/src/libtexture/imagecache.cpp
@@ -43,8 +43,14 @@ spin_mutex ImageCacheImpl::m_perthread_info_mutex;
 
 }  // namespace pvt
 
+bool shared_image_cache_destroyed = false; // this must be before shared_image_cache
 namespace {  // anonymous
 
+struct SharedImageCacheDestroyedReporter {
+   SharedImageCacheDestroyedReporter(bool& ref) : m_ref(ref) {}
+   ~SharedImageCacheDestroyedReporter() {m_ref = true;}
+   bool& m_ref;
+} SharedImageCacheDestroyedReporterInstance(shared_image_cache_destroyed);
 static std::shared_ptr<ImageCacheImpl> shared_image_cache;
 static spin_mutex shared_image_cache_mutex;

I gave the second option a try, because it's what first came to mind and seemed like fewer changes and it fixes the valgrind errors, non-zero returns, and crashes. However, I would suggest going with option 1 since it's less hacky. Note that option 1 only solves the problem in oiiotool while option 2 could maybe avoid problems in other code bases that do something similar to oiiotool. What do others prefer?

To Reproduce oiiotool --info somefile.tif or probably any other oiiotool command that loads a file should do this. valgrind on linux always shows the errors for me. However, I only tested static libraries, so it's possible a shared library might by chance have the initialization of the two variables in a different order and you won't get any bugs. It could even be that your static libs are linked together in a different way from mine and so you don't get this problem...

Platform information:

  • OIIO branch/version: I'm running a fork that has 1e488aed merged in. So only a few weeks old.

IF YOU ALREADY HAVE A CODE FIX: There is no need to file a separate issue, please just go straight to making a pull request.

I do have a code fix shown above, but I don't know if it's the fix you'd want (I don't even know if it's what I'd want). Let me know what you think is the right approach.

ThiagoIze avatar Jan 23 '22 06:01 ThiagoIze

FWIW I'm running into the same issue while trying to run OIIO tests on current master (2.5) -- the python test runner gets 3 as exit code, even if main() returns zero.

aras-p avatar Oct 07 '22 10:10 aras-p

See #3591

Does that fully address this issue? Or are there other structural issues we should be thinking about?

lgritz avatar Oct 07 '22 19:10 lgritz