OpenImageIO icon indicating copy to clipboard operation
OpenImageIO copied to clipboard

Segfault with jpeg2000 threading

Open etheory opened this issue 3 years ago • 40 comments

https://github.com/OpenImageIO/oiio/blob/5767c62394872b89d3b39a1c303a4378945fc895/src/jpeg2000.imageio/jpeg2000input.cpp#L235-L242

I am reliably getting segfaults for this set of code. If I comment it out, the issue goes away:

received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafa4d700 (LWP 123835)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fffe0619d8f in opj_codec_set_threads () from /tmp/rez_context_2Z5wW6/LD_LIBRARY_PATH/libopenjp2.so.7
#2  0x00007fffeab7c86c in OpenImageIO_v2_4::Jpeg2000Input::open (this=0x7ffdd06c0de0, name=..., p_spec=...) at /scratch/dev/OpenImageIO/src/jpeg2000.imageio/jpeg2000input.cpp:241
#3  0x00007fffeaa62f01 in OpenImageIO_v2_4::ImageInput::valid_file (this=0x7ffdd06c0de0, filename=...) at /scratch/dev/OpenImageIO/src/libOpenImageIO/imageinput.cpp:97
#4  0x00007fffeaa851fa in OpenImageIO_v2_4::ImageInput::create (filename=..., do_open=false, config=<optimized out>, ioproxy=ioproxy@entry=0x0, plugin_searchpath=...) at /scratch/dev/OpenImageIO/src/libOpenImageIO/imageioplugin.cpp:743
#5  0x00007fffeaa86bc3 in OpenImageIO_v2_4::ImageInput::create (filename=..., do_open=true, config=0x40, plugin_searchpath=...) at /scratch/dev/OpenImageIO/src/libOpenImageIO/imageioplugin.cpp:618
#6  0x00007fffeaaf9666 in OpenImageIO_v2_4::pvt::ImageCacheFile::open (this=this@entry=0x7ffdd064b040, thread_info=thread_info@entry=0x7ffdd06ca260) at /scratch/dev/OpenImageIO/src/libtexture/imagecache.cpp:499

etheory avatar Apr 20 '22 05:04 etheory

Do you know which version of OpenJPEG you are using? And does it work for all jpeg2000 files? Or just particular ones?

lgritz avatar Apr 20 '22 05:04 lgritz

We seem to be using openjpeg-2.3.1. And it's happening for all of the .jpg files I've tried to load so far. Though, concerningly, some of them are actually .tif files incorrectly renamed from .jpg's (facepalm).

etheory avatar Apr 20 '22 06:04 etheory

    OIIO_ASSERT(m_image == nullptr);
    if (!opj_read_header(m_stream, m_codec, &m_image)) {
        errorfmt("Could not read Jpeg2000 header");
        close();
        return false;
    }

Now I'm getting a segfault here in opj_read_header:

received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb0a51700 (LWP 141003)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fffe0619ea7 in opj_read_header () from /tmp/rez_context_2Z5wW6/LD_LIBRARY_PATH/libopenjp2.so.7
#2  0x00007fffeab7c866 in OpenImageIO_v2_4::Jpeg2000Input::open (this=0x7ffde06134d0, name=..., p_spec=...) at /scratch/dev/OpenImageIO/src/jpeg2000.imageio/jpeg2000input.cpp:261
#3  0x00007fffeaa62eb1 in OpenImageIO_v2_4::ImageInput::valid_file (this=0x7ffde06134d0, filename=...) at /scratch/dev/OpenImageIO/src/libOpenImageIO/imageinput.cpp:97
#4  0x00007fffeaa851aa in OpenImageIO_v2_4::ImageInput::create (filename=..., do_open=false, config=<optimized out>, ioproxy=ioproxy@entry=0x0, plugin_searchpath=...) at /scratch/dev/OpenImageIO/src/libOpenImageIO/imageioplugin.cpp:743
#5  0x00007fffeaa86b73 in OpenImageIO_v2_4::ImageInput::create (filename=..., do_open=false, config=0x7ffde097a7f0, plugin_searchpath=...) at /scratch/dev/OpenImageIO/src/libOpenImageIO/imageioplugin.cpp:618
#6  0x00007fffeaaf9616 in OpenImageIO_v2_4::pvt::ImageCacheFile::open (this=this@entry=0x7ffde0009a60, thread_info=thread_info@entry=0x7ffde0000a70) at /scratch/dev/OpenImageIO/src/libtexture/imagecache.cpp:499
#7  0x00007fffeab002af in OpenImageIO_v2_4::pvt::ImageCacheImpl::verify_file (this=this@entry=0x2117d80, tf=tf@entry=0x7ffde0009a60, thread_info=thread_info@entry=0x7ffde0000a70, header_only=true) at /scratch/dev/OpenImageIO/src/libtexture/imagecache.cpp:1280
#8  0x00007fffeab0a94e in OpenImageIO_v2_4::pvt::ImageCacheImpl::get_image_info (this=this@entry=0x2117d80, file=0x7ffde0009a60, thread_info=thread_info@entry=0x7ffde0000a70, subimage=subimage@entry=0, miplevel=0, miplevel@entry=32767, dataname=..., 

I'll see if I can get you permission to send the file to you.

etheory avatar Apr 20 '22 06:04 etheory

Are the files jpeg? Or jpeg2000?

lgritz avatar Apr 20 '22 06:04 lgritz

@lgritz I have no idea. It's got a .tif extension. If I do oiiotool --info I get: : 4096 x 4096, 1 channel, uint8 jpeg

And it doesn't crash. The ONLY code that makes it crash is when I do: OIIO::ImageBuf image(inputFilename);

That's where the stack traces are coming from.

I'll get permission to forward on the file to you.

etheory avatar Apr 20 '22 06:04 etheory

I can confirm that a stand-alone single-line c++ code that ONLY calls OIIO::ImageBuf image(inputFilename); with the problematic file segfaults. When I get home I'll send you the file.

etheory avatar Apr 20 '22 08:04 etheory

stationary_pencil02_paint_mask.zip

Here is the offending file (sorry I had to compress the .tif to a .zip else I couldn't attach it). This is from our USD ALab assets. One of the artists appears to have saved what appears to be a .jpg file as a .tif extension.

Our super-super customized old version of OpenImageIO loads it flawlessly, but the latest version on github segfaults.

As mentioned, if I try and load this in custom code that ONLY calls: OIIO::ImageBuf image(inputFilename);

on it, I get a segfault at: https://github.com/OpenImageIO/oiio/blob/43a8dbbed8eafa736b08ab6eda15fb9a89be68e3/src/jpeg2000.imageio/jpeg2000input.cpp#L259, but only if I have already commented out: https://github.com/OpenImageIO/oiio/blob/43a8dbbed8eafa736b08ab6eda15fb9a89be68e3/src/jpeg2000.imageio/jpeg2000input.cpp#L235-L242 else it segfaults earlier there.

I think what is happening is that the .tif loader is invoked, fails, then OIIO then attempts to load the file with all of the registered loaders, and jpg2000 somehow wins and tries the load the file, then fails and somehow segfaults.

Please let me know if you are able to repro.

THANK YOU!

etheory avatar Apr 20 '22 11:04 etheory

Here's what I know so far:

It's a JPEG file (despite the .tif extension), and not jpeg2000. So what's happening is that when the TIFF reader fails, OIIO then just tries every reader until it finds one that will open it (or fails if none will do so). So somehow, when it tries the JPEG-2000 reader, it crashes on that file.

It works fine on my Mac laptop -- that is, it does figure out it's a jpeg file, so whatever other plugins it tries at least don't crash. But I'm using OpenJPEG 2.4, so it's possible that something has been fixed. Next I'll try at work (Linux, older version of OpenJPEG) and see if I can get it to fail there. Also, I should double check that on my Mac it really is trying the OpenJPEG reader on it.

lgritz avatar Apr 20 '22 17:04 lgritz

Thanks @lgritz let me try upgrading to OpenJPEG-2.4 and see if that helps.

etheory avatar Apr 20 '22 23:04 etheory

Well, I tried on Linux with OpenJPEG-2.3 and still couldn't repro...

But I am going to try to propose a patch anyway that will do a better job of quickly identifying that it's not really a jpeg-2000 file and taking an early out before it gets into the openjpeg code, which maybe can't handle certain not-j2k cases without getting into trouble.

lgritz avatar Apr 21 '22 01:04 lgritz

Interesting. Maybe it's some esoteric issue with our specific openjpeg library then.... that changes sounds very useful thanks. Once that's done I'll pull from it and try again. In the meantime I'll also try a different openjpeg version, maybe that will also help. I appreciate your time thanks.

etheory avatar Apr 21 '22 01:04 etheory

Trying right now with openjpeg-2.4.0. I'll let you know how it goes.

etheory avatar Apr 21 '22 02:04 etheory

No good, it still segfaults in exactly the same way :-(.

etheory avatar Apr 21 '22 02:04 etheory

Dammit, I think I found the issue. It looks like the build I'm doing isn't linking agains the correct openjpeg. So it's probably a build error. Lemme keep looking, but this would certainly be a much more sensible explanation than any of the above. I'll keep you posted....

etheory avatar Apr 21 '22 02:04 etheory

It turns out that no, that's not the issue. The build is correct, and it's linking against the correct openjpeg library. So the mystery continues. In the meantime, I'm changing my local version to just return false in the jpeg2000 input loader, and see if that helps.

etheory avatar Apr 21 '22 03:04 etheory

Skipping the jpeg2000 loader by returning false from it fixes the problem. So something on our system is definitely broken when calling any version of the jpeg2000 library. It just instantly segfaults.

etheory avatar Apr 21 '22 03:04 etheory

My hypothesis is that there is simply a flaw in openjpeg where certain nonsense bit patterns that aren't valid OpenJPEG files can cause a crash. It may be somewhat nondeterministic, not in the sense that it doesn't repeat every time for any particular file and build of OIIO, but in the sense that a different set of dependencies (even aside from OpenJPEG) might shift things around in memory just enough for it to be asymptomatic, so is hard for others to reproduce.

Malformed input should never crash a reader, but it's also understandable that OpenJPEG may not be especially hardened to deal with all possible problems that can result from reading a file that isn't jpeg-2000 at all. Some libraries are better than others.

Many of our readers in OIIO implement a ImageInput::valid_file() method that does a simple sanity check "does this appear to be a file of this type" -- usually just reading the first few bytes and looking for the right "magic number", without invoking a true open with the underlying library. The jpeg-2000 reader does not! And so any file it's asked to look at will try to read it earnestly as a jpeg-2000, and that's where things go awry.

As confirmation, @etheory, can you see if you are able to read legit j2k files? Does it only fail for files that are NOT j2k, but that it tries to read with the j2k reader?

I still think the best remedy is to implement a decent valid_file that can't crash. I'll take a stab at that and we'll see if that fixes things for you. (Today was just meetings almost all day long. I'll have more time to look at this tomorrow.)

lgritz avatar Apr 21 '22 05:04 lgritz

I completely agree with your hypothesis, that certainly seems like the most plausible theory. I'll test with some j2k files and see what happens and let you know.

etheory avatar Apr 21 '22 05:04 etheory

There don't appear to be any j2k test images in https://github.com/OpenImageIO/oiio-images. I'll go hunting online.

etheory avatar Apr 21 '22 05:04 etheory

https://www.itu.int/net/ITU-T/sigdb/speimage/ImageForm-s.aspx?val=10100803 Click on J2KP4files.zip

That's what we use for CI

lgritz avatar Apr 21 '22 05:04 lgritz

Hmmm... now I'm not sure if CI does it! But that's the set I test locally.

lgritz avatar Apr 21 '22 05:04 lgritz

BTW, you don't need to change the source code to disable j2k... you can just build with -DENABLE_jpeg2000=0 That trick works for any of our format names!

lgritz avatar Apr 21 '22 05:04 lgritz

If you never deal with j2k files at your studio, you can just skip the whole thing.

lgritz avatar Apr 21 '22 05:04 lgritz

It's also segfaulting with valid jpeg2000 files. Here is one example: https://www.fnordware.com/j2k/relax.jp2

etheory avatar Apr 21 '22 06:04 etheory

It's still segfaulting for all of the files in the examples you suggested, so I don't think it's what we originally though. For me, jpeg2000 just seems completely busted. And it doesn't matter which version of the library I use, they all do the same thing.

etheory avatar Apr 21 '22 06:04 etheory

That file works for me.

Ok, so that seems like my hypothesis is not right at all. I don't have a Plan B... since my idea was to write a nice valid_file() for jpeg2000, that wouldn't really help if you are crashing on legit files -- except if it just helps you avoid the j2k reader altogether, but in that case, simply disabling it from being built is just as good (unless you actually need to read j2k files, then it's back to the drawing board).

lgritz avatar Apr 21 '22 06:04 lgritz

It segfaults every time it calls: opj_codec_set_threads If I comment out that, it then segfaults at: opj_read_header

But none of this happens from oiiotool or from iv, it only happens when calling OIIO::ImageBuf image(inputFilename); which makes absolutely no sense at all.

So I agree, disabling the jpeg2000 plugin seems to be the only option. So weird....

etheory avatar Apr 21 '22 06:04 etheory

Oh, wow the "works from oiiotool or iv" angle is another confusing one. Because underneath both of those... is also just an ImageBuf, nothing special.

lgritz avatar Apr 21 '22 06:04 lgritz

So this is your app that's making the ImageBuf that leads to the crash?

Is it possible that something else in your app, or the way it's compiled and run, is somehow interfering weirdly with openjpeg? Some compile or link option, weird mixing of ABIs, use of nonstandard memory manager or incompatible thread_local model, something like that? I'm grasping at straws here, I know. Just trying to think of what would be different between oiiotool, say, and your app that's experiencing the crash.

lgritz avatar Apr 21 '22 06:04 lgritz

So this is your app that's making the ImageBuf that leads to the crash?

Is it possible that something else in your app, or the way it's compiled and run, is somehow interfering weirdly with openjpeg? Some compile or link option, weird mixing of ABIs, use of nonstandard memory manager or incompatible thread_local model, something like that? I'm grasping at straws here, I know. Just trying to think of what would be different between oiiotool, say, and your app that's experiencing the crash.

In the case I'm testing it now, it's literally just a single-line stand-alone application.

But the ABI angle is definitely where I'm currently at. For the moment disabling jpeg2000 is working fine for what I need to do. Thanks for all your help on this.

etheory avatar Apr 21 '22 07:04 etheory