nifi icon indicating copy to clipboard operation
nifi copied to clipboard

NIFI-12854: Add S3 user metadata via flowfile attributes

Open gforeman02 opened this issue 1 year ago • 4 comments

Summary

NIFI-12854 The PutS3Object processor currently supports adding user metadata to objects via dynamic properties. This PR adds support for using FlowFile attributes to generate S3 user metadata.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • [x] Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • [x] Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • [x] Pull Request based on current revision of the main branch
  • [x] Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • [x] Build completed using mvn clean install -P contrib-check
  • [x] JDK 21

Licensing

  • [x] New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • [x] New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • [x] Documentation formatting appears as expected in rendered files

gforeman02 avatar Mar 06 '24 13:03 gforeman02

Hey @gforeman02 thanks for putting this forward. I'm not sure that I agree that it's a good idea to add this, though. In general, we should not support multiple different ways to do the same thing. Doing so just leads to confusion. Is there a reason that the existing method does not work for you?

markap14 avatar Mar 06 '24 19:03 markap14

Hello @markap14 , thanks for posting. Yes, the existing method (dynamic properties) requires knowing what the user metadata keys will be in advance. My client has a scenario where one flow has metadata key A and another that has key B and this is only known by reading the flowfile attributes. It would be great for the processor to work in the same way it does for tags.

gforeman02 avatar Mar 06 '24 19:03 gforeman02

@markap14 one other approach could be to have a special dynamic property that the processor uses to read key/value pairs from the dynamic property value. Maybe something like a key "metadata.json" with a value of { "a": "a_val", "b": "b_val" .... }.

gforeman02 avatar Mar 07 '24 13:03 gforeman02

@markap14 i created a custom s3 nar with the following mod:

final Map<String, String> userMetadata = new HashMap<>();
for (final Entry<PropertyDescriptor, String> entry : context.getProperties().entrySet()) {
    if (entry.getKey().isDynamic()) {
        final String value = context.getProperty(
                entry.getKey()).evaluateAttributeExpressions(ff).getValue();
        if( "metadata.json".equals(entry.getKey().getName())){
            ObjectReader reader = new ObjectMapper().readerFor(Map.class);
            Map<String, String> map = reader.readValue(value);
            userMetadata.putAll(map);
        } else {
            userMetadata.put(entry.getKey().getName(), value);
        }
    }
}

I am happy to modify this PR and resubmit it for consideration.

gforeman02 avatar Mar 14 '24 19:03 gforeman02

@exceptionfactory thanks for the idea on an approach. I guess a decision needs to be made:

Should the PutS3Object processor support dynamic metadata keys?

+1 here for supporting dynamic metadata keys since I have a scenario where this is used. @markap14 if the concern is unique to me then I have a workaround in place.

gforeman02 avatar Jun 04 '24 22:06 gforeman02

Thanks for the reply @gforeman02, if you willing to rework the PR along the lines of using dynamic properties with wildcards as a way to support dynamic metadata keys, that sounds like a good way forward, without introducing new properties.

exceptionfactory avatar Jun 20 '24 18:06 exceptionfactory

@gforeman02 Heads up: you have merge conflicts with main that need to be resolved on your branch.

MikeThomsen avatar Jun 24 '24 13:06 MikeThomsen

@gforeman02 Given the current conflicts and the different direction, I'm closing this pull request for now. Feel free to reopen when you are ready with a different approach, or open a new PR if you prefer. Thanks!

exceptionfactory avatar Jun 24 '24 22:06 exceptionfactory