How to use gpujpeg_huffman_gpu_decoder_decode for decoding
I ran the decode_to_raw_rgb.c example, and the decoding takes a long time. I found that it is calling gpujpeg_huffman_cpu_decoder_decode for decoding, and coder->segment_count = 1. I would like to ask how to configure it to use gpujpeg_huffman_gpu_decoder_decode for decoding in
Hi, good question. Well, I'll start from the end:
I would like to ask how to configure it to use gpujpeg_huffman_gpu_decoder_decode for decoding in
This option is not exposed via the API (the reason is outlined in the next paragraph), but you can comment this line out (I've modified the code a bit; in the former version, just remove the condition segment_count < 32 and comment out this).
But that condition is there for a reason: if you have just a single segment, the image is not well suited to decoding on the GPU, where only one thread would do the work, so forcing the GPU path doesn't make much sense. If you had at least a double-digit number of segments, we could debate the threshold of 32... But you can try it - I'd be glad if you got back to me if you find something interesting.
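To make the condition concrete, here is a minimal sketch of the kind of decision it encodes. This is not the actual GPUJPEG code: the enum, the macro and the function name are hypothetical, and only the threshold of 32 comes from the condition mentioned above.

```c
/* Hypothetical names, for illustration only; not GPUJPEG's real API. */
enum huffman_backend {
    HUFFMAN_BACKEND_CPU,
    HUFFMAN_BACKEND_GPU
};

/* Threshold taken from the "segment_count < 32" condition discussed above. */
#define GPU_SEGMENT_THRESHOLD 32

/*
 * Pick the Huffman decoder backend from the number of independently
 * decodable segments. With few segments the GPU would have only a few
 * threads to run, so the CPU path is chosen instead.
 */
static enum huffman_backend choose_huffman_backend(int segment_count)
{
    if (segment_count < GPU_SEGMENT_THRESHOLD) {
        return HUFFMAN_BACKEND_CPU;
    }
    return HUFFMAN_BACKEND_GPU;
}
```

With segment_count = 1, as in your case, this kind of check always picks the CPU path, which is why the GPU decoder is not called.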
If you tell me more about your use case, maybe I'll be able to give some advice - GPUJPEG performance depends heavily on the use case. You can also look into the FAQ - ideally, many pictures are processed in a batch and restart intervals are used; it also helps if the individual images' properties don't change from image to image.
Hi, thank you for your suggestion. I have made the changes based on your advice, and it indeed executes gpujpeg_huffman_gpu_decoder_decode, but the time is longer than gpujpeg_huffman_cpu_decoder_decode. It seems that coder->segment_count is automatically calculated rather than manually set by me. I would like to know how to set it. Additionally, my use case involves batch decoding of JPEG images, where images are sent frame by frame like a video stream, and I need to decode each frame individually.
It seems that coder->segment_count is automatically calculated rather than manually set by me.
Yes.
I would like to know how to set it.
You cannot. The segment count is given by the properties of the JPEG file. Just FYI, it is related to the restart interval, e.g. for 4000x3000 px, you have 500x375=187500 blocks (a block is 8x8 px). Assuming the restart interval is 100, you'll get 1875 segments (in the simple case). But this can be influenced only when encoding the JPEG file(s).
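The arithmetic above can be sketched as a small helper. This is only an estimate for the simple case (one 8x8 block per restart-interval unit, no subsampling or interleaving); the function names are made up for illustration and are not part of GPUJPEG.

```c
#include <stddef.h>

/* Number of 8x8 blocks needed to cover `pixels` pixels along one axis. */
static size_t blocks_along_axis(size_t pixels)
{
    return (pixels + 7) / 8;
}

/*
 * Estimate the number of segments for the simple case described above.
 * A restart interval of 0 means the file has no restart markers, so the
 * whole scan is one segment.
 */
static size_t estimate_segment_count(size_t width, size_t height,
                                     size_t restart_interval)
{
    size_t blocks = blocks_along_axis(width) * blocks_along_axis(height);

    if (restart_interval == 0) {
        return 1;
    }
    /* Round up: the last segment may hold fewer blocks than the interval. */
    return (blocks + restart_interval - 1) / restart_interval;
}
```

For example, estimate_segment_count(4000, 3000, 100) gives the 1875 segments from the example above, while a restart interval of 0 gives a single segment.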
If there is a single segment in the JPEG file, the Huffman decoder must run sequentially, and a CPU is faster than a GPU for a single thread. For any performance improvement, the CPU implementation would have to be improved; AFAIK, deploying on the GPU doesn't make much sense here.
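If you want to check whether your input files fall into this single-segment case, one option is to look for the DRI (Define Restart Interval) marker, 0xFFDD, in the JPEG header. Below is a rough sketch that is independent of GPUJPEG; the function name is hypothetical. It assumes a well-formed baseline file and only inspects the marker segments before the first SOS marker.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the restart interval declared by a DRI marker found before the
 * first start-of-scan (SOS) marker, 0 if there is no DRI marker, or -1 if
 * the buffer does not look like a well-formed JPEG header.
 */
static long jpeg_restart_interval(const uint8_t *data, size_t len)
{
    size_t pos = 2;
    long interval = 0;

    /* Every JPEG file starts with the SOI marker FF D8. */
    if (len < 2 || data[0] != 0xFF || data[1] != 0xD8) {
        return -1;
    }

    while (pos + 4 <= len) {
        uint8_t marker;
        size_t seg_len;

        if (data[pos] != 0xFF) {
            return -1;
        }
        marker = data[pos + 1];
        if (marker == 0xFF) { /* fill byte before a marker, skip it */
            pos++;
            continue;
        }

        /* Big-endian segment length; it includes its own two bytes. */
        seg_len = ((size_t)data[pos + 2] << 8) | data[pos + 3];
        if (seg_len < 2 || pos + 2 + seg_len > len) {
            return -1;
        }

        if (marker == 0xDD) { /* DRI: length 4, then a 16-bit interval */
            if (seg_len != 4) {
                return -1;
            }
            interval = ((long)data[pos + 4] << 8) | data[pos + 5];
        } else if (marker == 0xDA) { /* SOS: entropy-coded data follows */
            return interval;
        }
        pos += 2 + seg_len;
    }
    return -1; /* ran out of data before reaching SOS */
}
```

A return value of 0 means the file carries no restart markers, so the scan forms one segment and the sequential CPU path is the expected one.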