Reading metadata methods on PNG images are discarding some custom key/value
Hi, I am using this lib to read all sorts of image metadata and while I was trying to figure it out why some metadata values on PNG images did not appear on the PngImage class data, I notice they were found but discarded inside the PngChunk::parseChunkContent. Although this seems to be by design I think this can also be interpreted as an issue. Since this information could be saved on PngImage class as a container like the other metadata types, and later on the write metadata for this type of metadata fields could also be implemented.
The reason we don't support this metadata is two fold:
- We support Exif, IPTC, XMP and ICC
- This was only brought recently to my attention: http://dev.exiv2.org/issues/1312
Your analysis of how to implement this is correct. This would involve adding another metadata container. We have a long term goal (and working prototype) to support a single "unified" metadata container. Currently Team Exiv2 has a full pipeline of scheduled activity. Without another engineer to embrace this topic, implementation isn't scheduled (hence target 1.0).
If you would like to join the team and work on this subject, I assure you of a warm welcome and great colleagues who will mentor and support you.
As this topic has only been raised recently, I'm don't even know the extend of the metadata involved. Even if you don't want to get involved in implementation, you could make a very valuable to investigate the "scope" of the metadata involved. To convert your "some" metadata into a table (or spec) of "PNG Metadata Fields".
Let me condense the wide-ranging discussion on Redmine with Arnold using his files http://dev.exiv2.org/attachments/download/1184/itxt-german.png and http://dev.exiv2.org/attachments/download/1183/itxt2.png
$ exiv2 -pR http://dev.exiv2.org/attachments/download/1183/itxt2.png
STRUCTURE OF PNG FILE: http://dev.exiv2.org/attachments/download/1183/itxt2.png
address | chunk | length | data | checksum
8 | IHDR | 13 | ...[...E..... | 0x52edaae4
33 | gAMA | 4 | .... | 0x0bfc6105
49 | sBIT | 4 | .... | 0x4da52df6
65 | bKGD | 6 | ...... | 0x95cd2f20
83 | pCAL | 44 | bogus units...........foo/bar | 0x57407b1c
139 | pHYs | 9 | ......... | 0xd2dd7efc
160 | tIME | 7 | .....:. | 0x8eff267a
179 | tEXt | 9 | Title.PNG | 0xdc017935
200 | iTXt | 39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
251 | zTXt | 26 | test.....N.)-Qx.P......... | 0xa869e99d
289 | IDAT | 4828 | x..\+........EFFb...X$.......D | 0x353e27e9
5129 | zTXt | 202 | Description....M.Mj.@...>.[... | 0xa9a15024
5343 | iTXt | 111 | Warning...de.WARNING........0. | 0xb3adddee
5466 | IEND | 0 | | 0xae426082
$ exiv2 -pR http://dev.exiv2.org/attachments/download/1184/itxt-german.png
STRUCTURE OF PNG FILE: http://dev.exiv2.org/attachments/download/1184/itxt-german.png
address | chunk | length | data | checksum
8 | IHDR | 13 | ...[...E..... | 0x52edaae4
33 | gAMA | 4 | .... | 0x0bfc6105
49 | sBIT | 4 | .... | 0x4da52df6
65 | pCAL | 44 | bogus units...........foo/bar | 0x57407b1c
121 | tIME | 7 | .....:. | 0x8eff267a
140 | bKGD | 6 | ...... | 0x95cd2f20
158 | pHYs | 9 | ......... | 0xd2dd7efc
179 | tEXt | 9 | Title.PNG | 0xdc017935
200 | iTXt | 39 | Author...fr.Auteur.La plume de | 0x4fdb72e1
251 | IDAT | 4000 | x..\+........EFFb...X$.......D | 0x199b9fd7
4263 | IDAT | 831 | .......#T..6.....`....G...(<.. | 0xa028a770
5106 | zTXt | 202 | Description..x.M.Mj.@...>.[... | 0x692a52f9
5320 | iTXt | 111 | Warning...de.WARNING.x......0. | 0xb5cd3e90
5443 | iTXt | 65 | Deutsch...de..Steinstra..e 10, | 0x23616b74
5520 | IEND | 0 | | 0xae426082
$
I believe Exiv2 ignores chunks such as Author, Description, Warning and Deutsch which are not part of the supported formats (Exif, IPTC, XMP and ICC). However, I'm sure there are other fields and we need a "spec" to understand the extend of the effort involved.
Another word of encouragement to "sign up" for this challenge is to say that I have found working with Arnold to be a pleasant and enjoyable experience. When you work with him, he'll be quick to test, comment and encourage you. A very nice guy. [email protected]
I'm going to close this for now. The issue of providing additional PNG metadata support is logged in Redmine http://dev.exiv2.org/issues/1312 I've updated Redmine#1312 with a pointer to this discussion. If @MarcoMartins86, or anybody else, wishes to discuss this matter further, we can reopen this issue.
Hi @clanmills , thank for your quick reply. So I might be able to help in the development of this feature, do I contact Arnold directly or is there another process to get more information in this matter?
Very good. Yes, let Arnold know that you are willing to work on this. He’ll be delighted and will work with you to test your code. Arnold is a ‘friendly user’ and can’t help you to write the code. It appears to me that you have understood how to implement this feature. I’ll send you an invitation to join the team. Before you write any code, we need to ‘scope’ the feature to know what’s involved.
Here's a document about PNG metadata: https://docs.oracle.com/javase/8/docs/api/javax/imageio/metadata/doc-files/png_metadata.html
That's a good start, however it's not the whole story as that spec explicitly defines chunk types such as ICC, Palette, ColorSpace and others. However zTXt and tEXt are chunks that contain structured data:
540 rmills@rmillsmbp:~/temp $ grep ELEMENT pngmetadata.txt
<!ELEMENT "javax_imageio_png_1.0" (IHDR?, PLTE?, bKGD?, cHRM?,
<!ELEMENT "IHDR" EMPTY>
<!ELEMENT "PLTE" (PLTEEntry)*>
<!ELEMENT "PLTEEntry" EMPTY>
<!ELEMENT "bKGD" (bKGD_Grayscale | bKGD_RGB | bKGD_Palette)>
<!ELEMENT "bKGD_Grayscale" EMPTY>
<!ELEMENT "bKGD_RGB" EMPTY>
<!ELEMENT "bKGD_Palette" EMPTY>
<!ELEMENT "cHRM" EMPTY>
<!ELEMENT "gAMA" EMPTY>
<!ELEMENT "hIST" (hISTEntry)*>
<!ELEMENT "hISTEntry" EMPTY>
<!ELEMENT "iCCP" EMPTY>
<!ELEMENT "iTXt" (iTXtEntry)*>
<!ELEMENT "iTXtEntry" EMPTY>
<!ELEMENT "pHYS" EMPTY>
<!ELEMENT "sBIT" (sBIT_Grayscale | sBIT_GrayAlpha | sBIT_RGB |
<!ELEMENT "sBIT_Grayscale" EMPTY>
<!ELEMENT "sBIT_GrayAlpha" EMPTY>
<!ELEMENT "sBIT_RGB" EMPTY>
<!ELEMENT "sBIT_RGBAlpha" EMPTY>
<!ELEMENT "sBIT_Palette" EMPTY>
<!ELEMENT "sPLT" (sPLTEntry)*>
<!ELEMENT "sPLTEntry" EMPTY>
<!ELEMENT "sRGB" EMPTY>
<!ELEMENT "tEXt" (tEXtEntry)*>
<!ELEMENT "tEXtEntry" EMPTY>
<!ELEMENT "tIME" EMPTY>
<!ELEMENT "tRNS" (tRNS_Grayscale | tRNS_RGB | tRNS_Palette)>
<!ELEMENT "tRNS_Grayscale" EMPTY>
<!ELEMENT "tRNS_RGB" EMPTY>
<!ELEMENT "tRNS_Palette" EMPTY>
<!ELEMENT "zTXt" (zTXtEntry)*>
<!ELEMENT "zTXtEntry" EMPTY>
<!ELEMENT "UnknownChunks" (UnknownChunk)*>
<!ELEMENT "UnknownChunk" EMPTY>
For example the metadata zTXt is a "container" for Exif, IPTC and XMP.
544 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2 $ exiv2 -pS test/data/ReaganLargePng.png
STRUCTURE OF PNG FILE: test/data/ReaganLargePng.png
address | chunk | length | data | checksum
8 | IHDR | 13 | ............ | 0x8cf910c3
33 | zTXt | 8461 | Raw profile type exif..x...iv. | 0x91fbf6a0 <--- EXIF hides here
8506 | zTXt | 636 | Raw profile type iptc..x..TKn. | 0x4e5178d3 <--- EXIF hides here
9154 | iTXt | 7156 | XML:com.adobe.xmp.....<?xpacke | 0x8d6d70ba <--- XMP hides here
16322 | gAMA | 4 | .... | 0x0bfc6105
16338 | iCCP | 1151535 | ICC profile..x...UP.........!! | 0x11f49e31 <--- ICC here (and in the "high level" spec).
1167885 | bKGD | 6 | ...... | 0xa0bda793
1167903 | pHYs | 9 | ...#...#. | 0x78a53f76
1167924 | tIME | 7 | ......2 | 0x582d32e4
1167943 | zTXt | 278 | Comment..x.}[email protected].. | 0xdb1dfff5
1168233 | IDAT | 8192 | x...k.%.u%....D......GWW...ER. | 0x929ed75c
1176437 | IDAT | 8192 | .F('.T)\....D"]..."2 '(...D%.. | 0x52c572c0
1184641 | IDAT | 8192 | y-.....>....3..p.....$....E.Bj | 0x65a90ffb
1192845 | IDAT | 8192 | ....S....?..G.....G........... | 0xf44da161
1201049 | IDAT | 7173 | .evl...3K..j.S.....x......Z .D | 0xbe6d3574
1208234 | IEND | 0 | | 0xae426082
545 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2 $
I recommend that you and Arnold discuss this together. I'm happy to join your discussion, although I don't know much about PNG.
We use Slack in Team Exiv2 as our Instant Messenger. If you provide your email address, I'll send you an invitation. If you want to keep your email address private, please drop me an email: [email protected]
Hi Robin, I have sent you a private email a couple of days ago. Since I am still not on your group in Slack I assume that you didn't receive it or that you still didn't have the time. Reach me if you need me to send it again.
Robin is currently on vacation, so it might take a little longer before he can reply.
After checking what was already done in the code and after reading http://www.libpng.org/pub/png/spec/iso/index-object.html chapter 11.3.4 Textual information. It looked to me that we are already successfully parsing mostly if not all the formats described in the mentioned chapter and then discarding the data. So my proposed approach is to add a new pngti.hpp and respective pngti.cpp where a new container for these different types of data will be defined and implemented so it can be used inside the PngImage class and filled on its readMetadata method. In the meanwhile, I will do any necessary adjustments to what is now done on parsing. So my goal will be mainly to read into an appropriate data container the Textual information metadata of PNG images. After this is done and merged I will think of how to write these types of metadata.
Hi Marco, I'll keep an eye on this topic, though I'm off chasing other 'rabbits' down their holes ;-) Whenever possible, I'll try to test any changes you make, though it will likely mean I'll have to resurrect some older code setup.
Marco:
Well done to isolate 11.3.4 in http://www.libpng.org/pub/png/spec/iso/index-object.html Very useful. That identified the following elements:
Title | Short (one line) title or caption for image Author | Name of image's creator Description | Description of image (possibly long) Copyright | Copyright notice Creation Time | Time of original image creation Software | Software used to create the image Disclaimer | Legal disclaimer Warning | Warning of nature of content Source | Device used to create the image Comment | Miscellaneous comment
And, I think pngchunk.cpp will parse and rewrite chunk for you (I can't remember, I haven't looked at the code recently). If you set a breakpoint on pngimage.cpp readMetadata() and writeMetadata() you'll discover how the code works. Let me know if you want to discuss it.
However parsing and writing the metadata is only part of the challenge. We have to solve how to "address" this metadata. Exiv2 has three "families" of metadata (Exif, Iptc and Xmp) which are subdivided by "group" and "tag". For example: Exif.Image.Make. We could use use Exif.Png.Description etc..., however this isn't Exif metadata. (Nor Iptc, nor Xmp). We could introduce a family "Png" (with a PngMetadata vector API). As things stand this would be Png.Description. Can the code deal with duplex X.Y keys, or does it have to be triple X.Y.Z key? I see there are other types of metadata in 11.3.5 Miscellaneous information. Perhaps they should also be supported. So, then we fix for the X.Y keys, as we can have Png.Text.Description etc, Png.Misc.Background.
Can you consider is how this metadata can (or cannot) be supported in the .exv file. This is an extracted metadata file (it's something like a JPEG with no image). Because it's a JPEG, it can write Exif, Iptc, Xmp (and ICC profiles) using the embedding scheme for JPEG. Perhaps we can use another APP1 segment to store the PNG metadata. ExifTool supports .exv and we should be careful to maintain compatibility.
Some other matters:
- The "unified metadata container" Perhaps we can introduce that into Exiv2 v0.27 with PNG metadata support and avoid introducing more metadata containers.
- The command-line switches. Eg -px will print only Xmp. I think -pj is available for print only Jpg.
- What updates are required to the samples?
- Update to the test harness
- Update to exiv2.1 (the man page)
Incidentally, the Video metadata was all added as Xmp.Video.XXX I wasn't involved with the Video project, however I believe the "Xmp" family was hi-jacked to present video metadata without introducing another "family". I don't think we should do that again by introducing Xmp.Png.Description etc.
As you can see, there's a little more to this than appears at first glance (and my second glance will also contain omissions). However don't get discouraged - you'll get there. I'm hoping that v0.27 with security fixes will reach RC1 in December and be released in 2018. This "Png Metadata" can target v0.28.
Robin,
Yes, I think it is best to introduce a new PngMetadata vector and so implement the "unified metadata container" concept just to this type, since in the future it seems that all metadata will be using this concept (just remind me where can I find the functional example of this code?). I was not thinking in subdividing by group or family but by a kind of known paths logical approach, so X.Y.K or X.Y.Z.K doesn't matter, from left to right the first unknown part of the path was the key.
Proposed steps:
- Add the unified metadata container for a PngMetadata vector and store the chosen metadata types (I can try to add them all but not sure if this is the correct approach).
- Implement a new command-line switch for printing PngMetadata.
- Try to solve the exported .exv and the new metadata problem.
- Check what can be done to update the tests on reading PngMetadata and add new samples.
- Merge the code to master here.
- Update the man pages (read part).
- Make it possible to write PngMetadata and deal with the different types constraints.
- Make it possible to import a .exv file to an image.
- Write test and samples related to writing PngMetadata.
- Merge the code to master here.
- Update the man pages (write part).
The "unified metadata container" code is in the "unstable" branch on SVN. Something like svn://dev.exiv2.org/svn/branches/unstable I built that code last year and I believe it's working. However, I have not studied it nor stepped it in the debugger to discover how it works. Getting that into Exiv2 would be a "headline" feature for v0.28.
I think you should add the 12 elements above (Title .. Comment) as Png.Text.Title, Png.Text.Comment You can easily add Misc and other parts of the PNG spec if requested.
However, let's not be in a hurry to merge "unified metadata container" with master. I'd like to start work on releasing master as v0.27 in December and hope to reach RC1 for Christmas. After that, we'll probably have a 0.27 branch to which we'll selectively add fixes and mods in 2018. Master will then be the 0.28 development branch.
I'm in Vietnam at the moment for the wedding of Tuan and Alice on Saturday 25 November. Tuan was a GSoC student in 2013 and contributed the "web ready" feature. I'll be home in England on 30 November. Hope to start work on releasing v0.27 on Monday 4 December.
@clanmills Do you know if there are any further branches on svn that are not on git/github? I think we should migrate them, otherwise we are going to forget / loose them.
@MarcoMartins86 Thanks for working on this. Please keep in mind to keep the PRs in a reviewable size, as others must review them before being merged. We are going to write a contribution guide soon-ish where all of this is going to be explained.
I only discovered about the "unstable" branch when I visited Andreas last year. There is a "video write" branch, however I don't think it's worth saving. The video-read code is in the code base and needs a lot of work. The video-write code is based on that. Regrettably, I don't think it's worth salvaging.
There is an SVN "team" repos: svn://dev.exiv2.org/svn/team which has useful stuff that doesn't belong in 'master' such as the Logo, some drawings and meeting notes. Importantly, I moved 'website' there a few months ago when somebody complained about it being in 'master'. If you move 'website' to Git, I will be unable to work on the release in December as I have found it impossible to work with git.
I think uploading the branches won't hurt, provided it's not too much work.
In the long term we should migrate the team repository from svn, but if it contains many binaries I am not even sure that git is the right tool for that, maybe git+lsf. But I would keep it in svn for now.
I think we can leave svn://dev.exiv2.org/svn/team (and unstable) in svn for now. If it's not broken, don't change it. What you decide for the longer term, is up to you. I believe GitHub provides a "wiki" in which we can place the team resources without them being downloaded to every clone of 'master'.
This is related to the Unified Metadata Container which is a top-level feature of v1.00. I'm closing this issue as this topic because it will be dealt with in #1505 and other related issues.
I'm going to close this to reduce the number of open issues. The Unified Metadata Container and Family PNG are a top-level feature of v1.00 and will be delivered.
Hi,
Should we reopen this issue and assign it to the project "1.0"? I think in this issue there is relevant information to be considered for extending the parsing capabilities for PNG files.
Although #1505 is a super-set capturing this PNG-parsing feature, one might think about first extending the PNG parsing capabilities without addressing the major "Unified Metadata Concept" feature.
@piponazo. Yes. I think this should be re-opened with a target of v1.00. Earlier this year I built the "unstable" branch and it works. We need to study that code and port it to 'main'. This is of great value to Exiv2 and could be the defining feature of v1.00.
I discuss PNG "Textual information" in my book (Author etc). So to apply the Exiv2 Tag syntax (Family.Group.Name), we could use "PNG.Text.Author". The PNG Specification permits users to define other keywords. I think every member of the PNG Family will be in group Text, although there could be other groups such as comments.
The reason that the Unified Metadata Container is necessary is to avoid introducing another family metadata vector similar to Exiv2::ExifData, IptcData and XmpData.
11.3.4.2 Keywords and text strings
The following keywords are predefined and should be used where appropriate.
| Title | Short (one line) title or caption for image |
|---|---|
| Author | Name of image's creator |
| Description | Description of image (possibly long) |
| Copyright | Copyright notice |
| Creation Time | Time of original image creation |
| Software | Software used to create the image |
| Disclaimer | Legal disclaimer |
| Warning | Warning of nature of content |
| Source | Device used to create the image |
| Comment | Miscellaneous comment |
Other keywords may be defined for other purposes. Keywords of general interest can be registered with the PNG Registration Authority. It is also permitted to use private unregistered keywords.
11.3.4.2 Keywords and text strings The following keywords are predefined and should be used where appropriate
There is a slightly updated list in the official PNG Third Edition spec which adds XML:com.adobe.xmp. This is widely deployed.
In addition the PNG Extensions spec, v1.6.0 adds the Collection keyword. I haven't seen this one in the wild.
The eXIf Exchangeable Image File (Exif) Profile has also been moved from the extensions spec into PNG Third Edition, as it is widely used. One clarification was added: conforming to browser handling of eXIf in particular orientation: browsers only honor this if that chunk precedes IDAT while the extensions spec left it open where eXIf could go. PNG Third Edition requires that, like most other chunks apart from animation frames, it be before IDAT. See Chunk ordering