ImageSharp icon indicating copy to clipboard operation
ImageSharp copied to clipboard

Add support for HEIF based images

Open ynse01 opened this issue 2 years ago • 35 comments

Prerequisites

  • [X] I have written a descriptive pull-request title
  • [X] I have verified that there are no overlapping pull-requests open
  • [X] I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules :cop:.
  • [X] I have provided test coverage for my change (where applicable)

Description

This PR is an attempt to start with #1320

Implemented decoding and encoding of images based on a HEIF (ISO/IEC 23008-12:2022) container. These include (amongst others): HEIC, HIF and AVIF.

Please note this PR does NOT add any new compression algorithm to ImageSharp. For now it will take a JPEG thumbnail as its pixel source. Needless to say, much more work is needed after this PR to reach the goal. I'm try to help there also.

Please do let me know any comments, also on whether to support the HEIF file format in the first place.

ynse01 avatar Dec 29 '23 13:12 ynse01

Oh wow! Thanks! 🙏 I’ll have a good dig through asap!!

JimBobSquarePants avatar Jan 04 '24 08:01 JimBobSquarePants

I noticed that you used the x265 encoder. This is licensed under the GPL and cannot be used in this project.

dlemstra avatar Jan 04 '24 15:01 dlemstra

I noticed that you used the x265 encoder. This is licensed under the GPL and cannot be used in this project.

True, and that is why I used the Nokia repository as reference for my implementation, which has a license which I guess is compatible with ImageSharp. I did not start on writing an HVC Codec, due to patent concerns.

Nokia also includes a Patent waiver in its license. AVIF is supposed to be patent free, even its Codec. For AVIF I used SVT-AV1 as a reference, which has a 3-clause BSD license.

With all this, I hope the reference implementations that are mentioned now, can be used to create an AVIF Codec implementation in ImageSharp.

ynse01 avatar Jan 05 '24 22:01 ynse01

Not sure why one of the RowIntervalTests is failing, didn't touched that part of the code...

ynse01 avatar Mar 14 '24 21:03 ynse01

@JimBobSquarePants Let me know if you still want to go ahead with this PR.

ynse01 avatar Apr 01 '24 19:04 ynse01

@JimBobSquarePants Let me know if you still want to go ahead with this PR.

Hi @ynse01 I absolutely do want to continue but we'll need to get AV1 format together before I can merge the code. Thanks!

JimBobSquarePants avatar Apr 01 '24 22:04 JimBobSquarePants

@JimBobSquarePants Let me know if you still want to go ahead with this PR.

Hi @ynse01 I absolutely do want to continue but we'll need to get AV1 format together before I can merge the code. Thanks!

AV1 is quite complex with all its options, especially for implementing an efficient Encoder.

Would it be easier for you, if we focus first on getting the pure HEIF container support in, including all metadata (ICC, Exif and XMP) working ? AV1 compression could than be a separate next step.

ynse01 avatar Apr 02 '24 10:04 ynse01

My concern here would be merging an incomplete format. We’ll have to make all the parts internal if you want to do things in that order.

JimBobSquarePants avatar Apr 03 '24 08:04 JimBobSquarePants

I think I understand what you're trying to tell me: From a user perspective, parsing only metadata out of an image is not considered supporting a format. Users expect pixels out of an image. Which would mean I guess, that the minimal viable PR to be merged in would be something like a first image round-trip, using 1 chosen set of parameters. That would allow for adding the other (many) variations relatively quickly and keep up with user's expectations.

The downside I see with this approach, is that the internal design of the implementation doesn't get any feedback until the very end, making design changes a bit more painful. Not to mention the size of the first PR ...

Now that I know you would like to have the AVIF format supported, I'm motivated again to continue to implement it.

@JimBobSquarePants Let me know your thoughts on the approach, order and intermediate goals of AVIF support.

ynse01 avatar Apr 04 '24 21:04 ynse01

I think I understand what you're trying to tell me: From a user perspective, parsing only metadata out of an image is not considered supporting a format. Users expect pixels out of an image. Which would mean I guess, that the minimal viable PR to be merged in would be something like a first image round-trip, using 1 chosen set of parameters. That would allow for adding the other (many) variations relatively quickly and keep up with user's expectations.

The downside I see with this approach, is that the internal design of the implementation doesn't get any feedback until the very end, making design changes a bit more painful. Not to mention the size of the first PR ...

Now that I know you would like to have the AVIF format supported, I'm motivated again to continue to implement it.

@JimBobSquarePants Let me know your thoughts on the approach, order and intermediate goals of AVIF support.

Yes exactly. If we ship with support for a format, it must support the full decode/encode process for one compression format. Our users expect that now as that's how we've always worked.

I would implement a single compression (the most common if you know that) algorithm then iterate once we know that is complete. Addition compression algorithms can be separate PRs.

Regarding reviewing the code. This is why it's good to have the PR open early so we can talk through changes as they're written. I try to read changes as soon as they appear in PRs but if there's ever anything you explicitly want me to look at just tag me.

I must say though, the quality of code you are delivering is very high!!

JimBobSquarePants avatar Apr 06 '24 01:04 JimBobSquarePants

Sounds like a plan !

The user community of AVIF encoders seems to be quite scattered, can't identify a dominant force there. Feel free to provide real-life images, so I can direct the support towards these.

For now, I plan for the following initial limitations to AV1 compression:

  • 8 bit depth only
  • Color pixels only
  • Base Profile for now
  • Square block sizes
  • Split partition type only, to keep the blocks square
  • DCT transform type only
  • Limited selection of Prediction modes
  • Limited subsampling options
  • No filters
  • No alpha
  • No sequences / animation

I do plan to take both lossy and lossless into account from the start.

@JimBobSquarePants In the meantime, you could help my local workflow, by adding the relevant file extensions to GIT LFS in Shared-infrastructure please:

*.heic              filter=lfs diff=lfs merge=lfs -text
*.hif               filter=lfs diff=lfs merge=lfs -text
*.avif              filter=lfs diff=lfs merge=lfs -text

ynse01 avatar Apr 06 '24 09:04 ynse01

Sounds like a plan !

The user community of AVIF encoders seems to be quite scattered, can't identify a dominant force there. Feel free to provide real-life images, so I can direct the support towards these.

For now, I plan for the following initial limitations to AV1 compression:

  • 8 bit depth only
  • Color pixels only
  • Base Profile for now
  • Square block sizes
  • Split partition type only, to keep the blocks square
  • DCT transform type only
  • Limited selection of Prediction modes
  • Limited subsampling options
  • No filters
  • No alpha
  • No sequences / animation

I do plan to take both lossy and lossless into account from the start.

@JimBobSquarePants In the meantime, you could help my local workflow, by adding the relevant file extensions to GIT LFS in Shared-infrastructure please:

*.heic              filter=lfs diff=lfs merge=lfs -text
*.hif               filter=lfs diff=lfs merge=lfs -text
*.avif              filter=lfs diff=lfs merge=lfs -text

That all sounds great. Sorry about the delay, I've been very swamped.

If you use the following command in your fork you will get those changes.

git submodule foreach git pull origin main

JimBobSquarePants avatar Apr 17 '24 12:04 JimBobSquarePants

@ynse01 The Av1BitStreamReader seems to work as it should, I was able to parse Av1CodecConfiguration with it d8e7078b. Maybe I can help with this PR, but I still in the process of reading the spec, there is a lot to grasp.

brianpopow avatar Jun 23 '24 11:06 brianpopow

@ynse01 The Av1BitStreamReader seems to work as it should, I was able to parse Av1CodecConfiguration with it d8e7078. Maybe I can help with this PR, but I still in the process of reading the spec, there is a lot to grasp.

There sure is a lot of grasp for AVIF.

The parsing of ObuSequenceHeader seems to work OK indeed, but somewhere in either ObuFrameHeader or in the handling of TileGroups there is a bug. The way AVIF (or technically Av1) intertwines BitStream coding with Symbol coding is pretty hard to debug. Merged in my latest changes even though some tests are still failing, as I did some minor refactoring.

ynse01 avatar Jun 24 '24 22:06 ynse01

I fixed the bug :-) Tests are passing again.

@brianpopow There are a few TODO items in the Heif container area, if you like you could have a look at:

  • Item locations can be relative to multiple origins, this is incompletely implemented in the HeifLocation class.
  • Sorting of item locations, such that we can read the stream sequentially.
  • There is no connection between HeifDecoderCore and Av1Decoder classes. Ideally, this could be designed such that LegacyJpeg (and potentially Jpeg-XR) can be implemented similarly, thinking of an interface here. Look in 'ParseMediaData', which is currently hardcoded into LegacyJPEG.
  • We need code to split the incoming pixels into their respective Y, U and V channels, as Av1 uses the YUV pixel format exclusively.

I'll dive deeper into the Symbol coding and parsing the coefficients of the TileInfoBlocks.

ynse01 avatar Jun 25 '24 17:06 ynse01

I fixed the bug :-) Tests are passing again.

@brianpopow There are a few TODO items in the Heif container area, if you like you could have a look at: ... I'll dive deeper into the Symbol coding and parsing the coefficients of the TileInfoBlocks.

Good job in finding the bug! I had some time on the weekend too look into your code. It's impressive howmuch is already done with the OBU reader. I thought maybe I could start with something small, maybe implementing parsing 5.9.14. Segmentation params as a start?

brianpopow avatar Jun 25 '24 17:06 brianpopow

I fixed the bug :-) Tests are passing again. @brianpopow There are a few TODO items in the Heif container area, if you like you could have a look at: ... I'll dive deeper into the Symbol coding and parsing the coefficients of the TileInfoBlocks.

Good job in finding the bug! I had some time on the weekend too look into your code. It's impressive howmuch is already done with the OBU reader. I thought maybe I could start with something small, maybe implementing parsing 5.9.14. Segmentation params as a start?

Great ! I see the implement-another-spec-section-virus has gotten to you also. If you like more of these, feel free to do so. As a goal, we could support images that do not use the reduced still picture header feature and explicitly contain every OBU module out there. The test image Irvine_CA.avif is such an example, in ObuFrameHeaderTest you can uncomment it and work your way through the the OBU headers.

ynse01 avatar Jun 25 '24 18:06 ynse01

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's. From Residual and down, I don't think the spec and the code map 1-to-1. As the SVT-AV1 library has extensive intrinsic parts, I would like to leverage those and stick close to their implementation.

ynse01 avatar Jul 04 '24 20:07 ynse01

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's. From Residual and down, I don't think the spec and the code map 1-to-1. As the SVT-AV1 library has extensive intrinsic parts, I would like to leverage those and stick close to their implementation.

OK, thanks for the update, I was following the spec very closely. I will take a look into SVT-AV1 library. I have seen that I have implemented GetPlaneResidualSize() in 13173a39, which you already have done in a another place, that was a waste of time, I have overlooked that.

brianpopow avatar Jul 04 '24 21:07 brianpopow

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's.

I tried to look into the SVT-AV1 repository, but I have some understanding issues, @ynse01 maybe you can help me understand what you are using as the reference code base. Just for clarification, when you say SVT-AV1 library you mean this repo: https://gitlab.com/AOMediaCodec/SVT-AV1 right? If so, I only see an encoder here, where is the decoder part?

brianpopow avatar Jul 11 '24 16:07 brianpopow

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's.

I tried to look into the SVT-AV1 repository, but I have some understanding issues, @ynse01 maybe you can help me understand what you are using as the reference code base. Just for clarification, when you say SVT-AV1 library you mean this repo: https://gitlab.com/AOMediaCodec/SVT-AV1 right? If so, I only see an encoder here, where is the decoder part?

Indeed, that is the repository that I've used as reference. The decoder has been removed recently, in this commit. As I've used an earlier version of that exact decoder as the basis for my code here. This is a bit of a worry from a maintenance point of view.

The SVT-AV1 library is still one of the more efficient encoders out there and its license is permissive. Hopefully we can learn a couple of tricks from them on the encoder.

ynse01 avatar Jul 11 '24 18:07 ynse01

Hi @ynse01, I still have some problems understanding howto decode a single image with the SVT-AV1 library.

I am using now the tagged version v2.1.0. I can see now there is an SvtAv1DecApp, but that seems to be for video files, if I am not mistaken. Is it not possible to decode just a single avif file? This would make comparing our implementation with the reference much easier, but I struggle to understand how this can be done.

brianpopow avatar Jul 12 '24 18:07 brianpopow

@brianpopow Running the decoder app on an AVIF file will not work. I don't think SVT-AV1 has the HEIF container parsing capabilities. It seems to use the IVF file format, but it supports more, see here. It appears to me that IVF can contain an AV1 bitstream although its meant for VP8. And yes, I know this whole container / bitstream and their naming is pretty confusing.

To summarize the terminology: Heif is the ISO base container (also known as ISOBMFF), where AV1 is just one of the supported bitstream formats. AVIF is to be interpreted here as an AV1 bitstream inside a Heif container, where the bitstream contains exactly one intra frame. With the same terminology, Section 1 of AVIF spec also touched on this briefly.

The workflow I used to "debug" SVT-AV1 is considerably less user friendly from what you suggest. I mainly used modified unit tests from their own test suite, to get the information that I need. Alternatively, when you convert the AVIF file into IVF, you could get it visualized in a debugger.

ynse01 avatar Jul 12 '24 20:07 ynse01

@ynse01 I have noticed an issue in the bitstream reader in ReadLiteral(). When you read to the end of the stream the byte position would be advanced, resulting in an Index out of range issue. I have added a test demonstrating the issue: 62728b6f

Also when there is only 4 bytes of data available, the initialization would also result in an index out of range issue, because the AdvanceToNextWord would be called twice in the constructor. I did noticed this when decoding example image jpeg444_xnconvert.avif when av1CodecConfiguration was parsed in HeifDecoderCore -> ParsePropertyContainer()

I have tried to adjust the ReadLiteral() implementation to fix that, but I did not want to override your code without asking for your opinion first. I have added a new Av1BitStreamReader2 to demonstrate this. It's adapted from libgav1, because I found the implementation of Svt-AV1 too complicated.

All Bitstream tests pass with that implementation.

brianpopow avatar Jul 20 '24 12:07 brianpopow

@brianpopow I like your new Av1BitStreamReader, for 3 reasons:

  • Fixes existing bug with reading large integers
  • Doesn't need extra padding bytes at the end
  • Has less jumps, which is more efficient with modern (Intel) CPU's

So, replaced the implementation. I did rework your suggested code a bit and implemented switching to Symbol reading again.

ynse01 avatar Jul 20 '24 15:07 ynse01

@ynse01 Just a heads up. I'm going to be merging PRs #2751 #2762 very soon which contain some breaking changes to our codec APIs. If you have any questions about how to correctly implement the changes following merge, please let me know.

JimBobSquarePants avatar Jul 31 '24 12:07 JimBobSquarePants