execute_json should set the ExifTool param -struct too
Currently, calling execute_json sets only -j as an ExifTool parameter, but it does not set the -struct parameter. That's dangerous for metadata properties with structured values and multiple values.
Example: the IPTC Photo Metadata Standard defines a property Location Shown in the Image, which has a structure of City, State/Province, Country Name, Country Code, Sublocation and more. It may also have multiple values, i.e. multiple structures.
Using the -j (JSON) parameter without the -struct parameter returns such a result:
"XMP:LocationShownCity": ["City (Location shown2) (ref2021.1)"],
"XMP:LocationShownCountryCode": ["ABC","ABC"],
"XMP:LocationShownCountryName": ["CountryName (Location shown1) (ref2021.1)","CountryName (Location shown2) (ref2021.1)"],
"XMP:LocationShownSublocation": ["Sublocation (Location shown1) (ref2021.1)"],
Using the -j (JSON) parameter WITH the -struct parameter returns such a result:
"XMP:LocationShown": [{
"CountryCode": "ABC",
"CountryName": "CountryName (Location shown1) (ref2021.1)",
"Sublocation": "Sublocation (Location shown1) (ref2021.1)"
},{
"City": "City (Location shown2) (ref2021.1)",
"CountryCode": "ABC",
"CountryName": "CountryName (Location shown2) (ref2021.1)"
}],
The essential difference: XMP:LocationShownCity and XMP:LocationShownSublocation in the result without -struct each have only a single value in the array, but nobody can tell whether this is the City or Sublocation of the first location or of the second location. In contrast, XMP:LocationShown has a JSON object for each location: the first object has no City but a Sublocation, the second object has a City but no Sublocation. Which location has which structured data is crystal clear.
(Note: the results above are taken from the IPTC Photo Metadata reference image, whose values indicate which property they belong to.)
With this semantic issue as background, I suggest setting the -struct parameter together with the -j parameter in the execute_json method.
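A minimal Python sketch (standard library only) of why the flattened output is ambiguous, using the two fragments above wrapped into JSON objects. The helper name flattened_lengths is hypothetical; it just counts how many values each flattened XMP:LocationShown* tag carries. Unequal counts mean the values cannot be reliably mapped back to individual locations, while the -struct form answers the question directly.

```python
import json

# Flattened output (-j without -struct), copied from the fragment above.
FLAT = json.loads("""{
  "XMP:LocationShownCity": ["City (Location shown2) (ref2021.1)"],
  "XMP:LocationShownCountryCode": ["ABC", "ABC"],
  "XMP:LocationShownCountryName": ["CountryName (Location shown1) (ref2021.1)",
                                   "CountryName (Location shown2) (ref2021.1)"],
  "XMP:LocationShownSublocation": ["Sublocation (Location shown1) (ref2021.1)"]
}""")

# Structured output (-j -struct), copied from the fragment above.
STRUCT = json.loads("""{
  "XMP:LocationShown": [
    {"CountryCode": "ABC",
     "CountryName": "CountryName (Location shown1) (ref2021.1)",
     "Sublocation": "Sublocation (Location shown1) (ref2021.1)"},
    {"City": "City (Location shown2) (ref2021.1)",
     "CountryCode": "ABC",
     "CountryName": "CountryName (Location shown2) (ref2021.1)"}
  ]
}""")


def flattened_lengths(record, prefix="XMP:LocationShown"):
    """Return {tag: number of values} for every flattened tag with the prefix."""
    return {k: len(v) for k, v in record.items() if k.startswith(prefix)}


# Unequal counts: nothing tells us which location the lone City belongs to.
print(sorted(set(flattened_lengths(FLAT).values())))  # [1, 2]

# With -struct each location is its own object, so the mapping is explicit.
for number, loc in enumerate(STRUCT["XMP:LocationShown"], start=1):
    print(number, loc.get("City"), loc.get("Sublocation"))
```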
@nitmws interesting observation, and thanks for the detailed report. Can you attach a file with these tags so I can analyze further whether using exiftool (the utility) with different parameters might return different results (I'm thinking of testing grouping and stuff)?
Do you think adding the -struct parameter could cause any undesired behavior?
Thanks
@sylikc find attached some files:
- IPTC-PhotometadataRef-Std2021.1_2.jpg: the JPEG file with all metadata properties defined by the IPTC Photo Metadata Standard - IPTC-PhotometadataRef-Std2021 1_2
- IPTC-PhotometadataRef-Std2021.1_2-out_NOstruct.json: the JSON file with the metadata of the IPTC image without setting the -struct parameter - IPTC-PhotometadataRef-Std2021.1_2-out_NOstruct.json.txt
- IPTC-PhotometadataRef-Std2021.1_2-out.json: the JSON file with the metadata of the IPTC image with the -struct parameter - IPTC-PhotometadataRef-Std2021.1_2-out.json.txt
- (I had to append .txt to the file names as JSON files cannot be uploaded.)
Comparing the JSON files shows the big difference: the *out_NOstruct file has only properties with a single value or an array of values, while the *-out.json file has properties whose value is another object (or an array of objects) with its own properties.
This difference has an impact on processing the metadata. With the JSON of the *out_NOstruct file, people have to collect a set of properties to build the metadata of a Location Shown, and if there are multiple locations they have to assign the first value of each array to the first Location Shown and the second value to the second Location Shown. Bad experience: the XMP:LocationShownCity property has only one value, so to which of the two locations should it be assigned?
With the JSON file produced with the -struct parameter this is much easier: the properties of the first Location Shown are the properties of the first object in the XMP:LocationShown array, the second object holds the properties of the second location, and it is crystal clear which of the two locations is missing a city name. It is easier to process, but different.
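To spot this difference programmatically, a small sketch like the one below could list the tags whose value is an object or an array of objects, i.e. the ones that only appear as structures when -struct is used. It assumes the files hold ExifTool's usual -j output (a JSON array with one object per image); the function name structured_tags and the inline sample are illustrative only.

```python
import json


def structured_tags(record):
    """Return the sorted tag names whose value is a dict or a list containing dicts."""
    found = []
    for tag, value in record.items():
        if isinstance(value, dict):
            found.append(tag)
        elif isinstance(value, list) and any(isinstance(v, dict) for v in value):
            found.append(tag)
    return sorted(found)


# Illustrative sample shaped like -j output: a JSON array, one object per file.
sample = json.loads('[{"XMP:LocationShown": [{"City": "X"}], "XMP:Headline": "Y"}]')
print(structured_tags(sample[0]))  # ['XMP:LocationShown']
```

Run against the two attached files (after loading each with json.load), the *out_NOstruct file should yield an empty list, while the *-out.json file should list the structured properties.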
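With the -struct output, the processing described above reduces to iterating over the array. The sketch below turns each XMP:LocationShown object into a fixed set of fields, leaving missing sub-properties as None. The field list is an assumption limited to the sub-properties that appear in this thread, the function name locations_shown is hypothetical, and the sample record is trimmed from the fragment earlier in the thread.

```python
FIELDS = ("City", "Sublocation", "CountryName", "CountryCode")


def locations_shown(record):
    """Return one dict per Location Shown, with None for absent sub-properties."""
    # ExifTool documents that with -struct, list-type XMP tags are output as
    # JSON arrays, so a missing tag is simply treated as an empty list.
    return [{f: loc.get(f) for f in FIELDS}
            for loc in record.get("XMP:LocationShown", [])]


record = {"XMP:LocationShown": [
    {"CountryCode": "ABC",
     "Sublocation": "Sublocation (Location shown1) (ref2021.1)"},
    {"City": "City (Location shown2) (ref2021.1)",
     "CountryCode": "ABC"},
]}
for number, loc in enumerate(locations_shown(record), start=1):
    print(number, loc["City"], loc["Sublocation"])
```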
It's taken me a while to get back to this... but after thinking it over, while this could potentially be a script-breaking change for downstream users, it's just the better thing to do. Much like some of the ExifToolAlpha() changes I had made to clarify which info comes from which file, this is a good change even if it does break downstream users (until they adapt their code).
After reading through the documentation, it's still debatable whether to add the flag or not... the current design of PyExifTool makes minimal changes to the default ExifTool output.
There's already tons of grief from people trying to figure out the '-G' and '-n' flags, which are specified by default... (only one issue opened in this repo, but upstream and Stack Overflow have a good number).
If -struct were specified, lots of changes would be required of downstream users (as per the exiftool documentation):
By default XMP structures are flattened into individual tags in the JSON output, but the original structure may be preserved with the -struct option (this also causes all list-type XMP tags to be output as JSON arrays, otherwise single-item lists would be output as simple strings).
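As the quoted documentation says, without -struct a single-item list may come back as a plain string rather than an array. Downstream code that stays on the flattened output could normalize such values with a helper along these lines; as_list is a hypothetical name, not part of PyExifTool.

```python
def as_list(value):
    """Return value as a list: None -> [], list unchanged, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Without -struct a one-item list may be a bare string; with -struct it is an array.
print(as_list("sunset"))             # ['sunset']
print(as_list(["sunset", "beach"]))  # ['sunset', 'beach']
```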
In my own use case, I already pass -struct in params. And while '-G' and '-n' can be removed (they live in common_args), hard-coding result = self.execute("-j", "-struct", *params) would not allow -struct to be removed.
If I replaced the default parameters with '-G', '-n', '-struct', there could be other unintended side effects (as per the exiftool documentation):
-struct, --struct Output structured XMP information instead of flattening to individual tags. This option works well when combined with the XML (-X) and JSON (-j) output formats. For other output formats, XMP structures and lists are serialized into the same format as when writing structured information (see https://exiftool.org/struct.html for details). When copying, structured tags are copied by default unless --struct is used to disable this feature (although flattened tags may still be copied by specifying them individually unless -struct is used). These options have no effect when assigning new values since both flattened and structured tags may always be used when writing. (i.e. --struct disables copying of structured tags)
I'm not sure what the right way to do this is... As a user of PyExifTool myself, I just specify -struct in my own calls using params="-struct".
I understand your concerns, @sylikc , regarding backward compatibility.
We at IPTC are aware that many users of photo metadata prefer using simple properties without a structure. But structured properties are becoming more important: e.g. to tell from which URL a licensed image can be bought, Google uses the Web URL sub-property of the structured Licensor property.
Therefore I suggest that PyExifTool help users work with structured properties safely and easily:
- Add a function exiftool.ExifTool.execute_struct(self, *params) (or name it .execute_safestruct(...))
- This function sets the parameters -j, -G, -n and -struct
- It returns a list of dictionaries made from the JSON object(s) returned by ExifTool
- (It may ignore parameters of the function call disabling -j, -n, -struct and any variant of -G)
- The documentation of this function at https://sylikc.github.io/pyexiftool/reference/1-exiftool.html explains the features of this function:
- Properties structured in the embedded XMP metadata are returned as structures, so sub-properties can be correctly related to the wrapping property. (-j, -struct parameters)
- The simple and the structured properties, including sub-properties, are named properly (-G parameter)
- The real value of properties with values from an enumeration is returned, not an ExifTool alias. (-n parameter)
I'm still thinking about your suggestion above... I'm debating whether to add this helper function to ExifToolHelper or ExifToolAlpha. As described, it is an add-on to the base ExifTool functionality, so as an extension it wouldn't end up in the base class...
-G and -n are in common_args by default, so any invocation of execute_json already sets them.
It would be relatively trivial to write an execute_struct() into Helper or Alpha:
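The argument handling in this proposal could look roughly like the sketch below. It is not PyExifTool's API: build_struct_args is a hypothetical name, and the set of caller options treated as conflicting (duplicates of the required ones, --struct, and any -G variant) is an assumption based on the bullet about ignoring parameters that would disable the required ones. The regular expression is deliberately strict so that tag arguments such as -GPSLatitude are not mistaken for a -G option.

```python
import re

# Options the proposal says execute_struct should always set.
REQUIRED = ["-j", "-G", "-n", "-struct"]
# Assumed shape of ExifTool group options: -G, -G1, -G0:1, -G:1 ...
GROUP_OPT = re.compile(r"-G\d*(?::\d+)*")


def build_struct_args(*params):
    """Return REQUIRED followed by the caller's params, minus conflicting ones."""
    kept = [p for p in params
            if p not in REQUIRED and p != "--struct" and not GROUP_OPT.fullmatch(p)]
    return REQUIRED + kept


print(build_struct_args("--struct", "-G1", "-XMP:all", "photo.jpg"))
# ['-j', '-G', '-n', '-struct', '-XMP:all', 'photo.jpg']
```

PyExifTool already passes -G and -n through common_args by default, so in practice only -j and -struct would need to be added per call.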
def execute_struct(self, *params):
    return self.execute_json("-struct", *params)
Have you also considered suggesting to upstream exiftool that -struct be the default for -j?
To which class this function is added is up to you.
Can a parameter set by such a function be overridden by the *params? The -G parameter is a good starting point, but for the IPTC properties -G1 is recommended as it includes the XML/XMP namespace in the tag name. This makes the proper naming of properties safer.
Regarding combining -j and -struct, I had a conversation with Phil Harvey, but backward compatibility is also what stops this idea.
Can a parameter set by such a function be overridden by the *params? The -G parameter is a good starting point but for the IPTC properties a -G1 is recommended as it includes the XML/XMP namespace in the tag name. And this makes the proper naming of properties more safe.
The -G parameter was set as a default by the original author of PyExifTool, and it's set during init. It populates the common_args property, which is only writable before run() is invoked. Basically, exiftool's common_args are passed to every command. If common_args is set to ['-G1', '-n'] in the constructor, or set via the property afterwards (before run), it'll be included in all commands.
Setting -G1 in params doesn't work if -G is in common_args; I did a quick test:
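import exiftool

filepath = "IPTC-PhotometadataRef-Std2021.1_2.jpg"  # the attached reference image

# does not work: the default common_args already contain -G
with exiftool.ExifTool() as et:
    print(et.execute_json('-G1', filepath))

# works: -G left out of common_args, -G1 passed per call
with exiftool.ExifTool(common_args=['-n']) as et:
    print(et.execute_json('-G1', filepath))

# works: -G1 set in common_args via the constructor
with exiftool.ExifTool(common_args=['-G1', '-n']) as et:
    print(et.execute_json(filepath))

# works: common_args set as a property before the process starts
# (ExifToolHelper starts exiftool automatically on the first execute)
et = exiftool.ExifToolHelper()
et.common_args = ['-G1', '-n']
print(et.execute_json(filepath))
et.terminate()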
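Since the test shows that -G1 in params is not enough while -G sits in common_args, one option is to rewrite common_args before the process starts. The helper below is a hypothetical sketch (not part of PyExifTool) that replaces any -G variant with the requested one and keeps the remaining defaults; the regular expression covering -G, -G1, -G0:1 and similar forms is an assumption.

```python
import re

# Assumed shape of ExifTool group options: -G, -G1, -G0:1, -G:1 ...
GROUP_OPT = re.compile(r"-G\d*(?::\d+)*")


def with_group_option(common_args, group_opt="-G1"):
    """Return a copy of common_args with every -G variant replaced by group_opt."""
    kept = [a for a in common_args if not GROUP_OPT.fullmatch(a)]
    return [group_opt] + kept


print(with_group_option(["-G", "-n"]))  # ['-G1', '-n']
```

The result would be passed at construction time, matching one of the working variants above, e.g. exiftool.ExifTool(common_args=with_group_option(["-G", "-n"])).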
Regarding combining -j and -struct I had a conversation with Phil Harvey but it is also backward compatibility stopping this idea.
I see... yeah, backwards compatibility limits how much can change in projects with so many dependents... I took a leap of faith last year when I totally chopped up PyExifTool in https://github.com/sylikc/pyexiftool/pull/13 . Luckily, aside from code refactors, the intended output didn't change, and adoption looks good.