aws-sdk-java-v2

Add S3 compatible OutputStream

Open maciejwalkowiak opened this issue 3 years ago • 7 comments

Describe the feature

It would be great if the SDK came with an S3-compatible implementation of java.io.OutputStream. It is a common use case, and lots of folks implement the same thing over and over again:

https://github.com/search?l=Java&q=s3outputstream&type=Code

Is your Feature Request related to a problem?

We're migrating Spring Cloud AWS to SDK v2 and found a need for an SDK v2-compatible S3 OutputStream.

Proposed Solution

No response

Describe alternatives you've considered

Upload in chunks via multipart upload: https://github.com/CI-CMG/aws-s3-outputstream/blob/trunk/src/main/java/edu/colorado/cires/cmg/s3out/S3OutputStream.java

Buffer to disk first, then submit the complete file to S3 once writing is finished: https://github.com/EnjoyLifeFund/macHighSierra-cellars/blob/49a477d42f081e52f4c5bdd39535156a2df52d09/tachyon/0.8.2/libexec/underfs/s3/src/main/java/tachyon/underfs/s3/S3OutputStream.java
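The second alternative can be sketched with the JDK alone. This is a minimal, hypothetical illustration, not a proposed SDK API: the actual S3 call is stubbed out as a `Consumer<Path>` so the class is self-contained (a real implementation would call `putObject` with the temp file there, where it can safely be retried because the full object is on disk):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical sketch of the buffer-to-disk approach: writes go to a temp
// file, and the (stubbed) upload step runs once, on close().
public class DiskBufferedOutputStream extends OutputStream {
    private final Path tempFile;
    private final OutputStream delegate;
    private final Consumer<Path> uploader; // stand-in for the real S3 put
    private boolean closed;

    public DiskBufferedOutputStream(Consumer<Path> uploader) throws IOException {
        this.tempFile = Files.createTempFile("s3-buffer-", ".tmp");
        this.delegate = new BufferedOutputStream(Files.newOutputStream(tempFile));
        this.uploader = uploader;
    }

    @Override public void write(int b) throws IOException { delegate.write(b); }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        delegate.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        delegate.close();
        try {
            uploader.accept(tempFile); // real code: retryable upload from file
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder uploaded = new StringBuilder();
        try (DiskBufferedOutputStream out = new DiskBufferedOutputStream(path -> {
            try {
                uploaded.append(Files.readString(path));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        })) {
            out.write("buffered to disk first".getBytes());
        }
        System.out.println(uploaded);
    }
}
```

The trade-off versus the multipart approach is extra disk I/O and latency in exchange for a simple, fully retryable upload of a known length.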

Acknowledge

  • [ ] I may be able to implement this feature request

AWS Java SDK version used

2

JDK version used

any

Operating System and version

any

maciejwalkowiak avatar Mar 26 '22 12:03 maciejwalkowiak

How would you handle retries when a connection is lost? A few years back, we changed our S3 logic so the file is always buffered on local disk, so the upload can be retried if something goes wrong.

prdoyle avatar Mar 26 '22 18:03 prdoyle

@prdoyle valid question. Your approach has been implemented in https://github.com/EnjoyLifeFund/macHighSierra-cellars/blob/49a477d42f081e52f4c5bdd39535156a2df52d09/tachyon/0.8.2/libexec/underfs/s3/src/main/java/tachyon/underfs/s3/S3OutputStream.java

maciejwalkowiak avatar Mar 27 '22 06:03 maciejwalkowiak

Another current alternative is a hacky adapter between an OutputStream and the InputStream the current client consumes. Something like:

    public static class CopyStream extends ByteArrayOutputStream {
        public CopyStream() { super(); }

        public InputStream toInputStream() {
            return new ByteArrayInputStream(this.buf, 0, this.count);
        }

        public int size() {
            return this.count;
        }
    }

[...]

        PDFDocument outputDocument = generateDocument();
        CopyStream outputStream = new CopyStream();
        outputDocument.save(outputStream);
        outputDocument.close();
        InputStream inputStream = outputStream.toInputStream();
        long size = outputStream.size();
        // Do S3 upload using inputStream and size

SamStephens avatar Apr 05 '22 00:04 SamStephens

The proposed solution in my duplicate issue https://github.com/aws/aws-sdk-java-v2/issues/3131 is:

        PDFDocument outputDocument = generateDocument();
        PutObjectRequest putRequest = buildS3PutRequest();

        try (OutputStream outputStream = S3_CLIENT.putObjectOutputStream(putRequest)) {
            outputDocument.save(outputStream);
            outputDocument.close();
        }

Although I'm not super familiar with Java streams and the appropriate idioms for them.
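One way such an adapter might be built internally (a hedged sketch, not an existing SDK API; `putObjectOutputStream` above is the proposal's hypothetical method) is a `PipedOutputStream`/`PipedInputStream` pair, with the read side consumed on a background thread, where the SDK could read it via `RequestBody.fromInputStream`. Here the "upload" just collects the bytes so the example is self-contained:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipedUploadSketch {
    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        InputStream in = new PipedInputStream(out, 8192);

        // Consume the read side on another thread; a real adapter would hand
        // `in` to the S3 client here instead of collecting the bytes.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<byte[]> upload = executor.submit(in::readAllBytes);

        // The caller writes to a plain OutputStream; closing it signals EOF
        // to the reader thread.
        try (OutputStream os = out) {
            os.write("hello from an OutputStream".getBytes(StandardCharsets.UTF_8));
        }

        System.out.println(new String(upload.get(), StandardCharsets.UTF_8));
        executor.shutdown();
    }
}
```

One caveat with piping into a plain `PutObject` is that S3 wants the content length up front, so a real implementation would likely need either a known length or a multipart upload under the hood.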

SamStephens avatar Apr 05 '22 00:04 SamStephens

@maciejwalkowiak thank you for reaching out; we've added this feature request to our backlog.

For everyone tracking this, add 👍 to the original description to show your support, it helps us with prioritization.

debora-ito avatar Apr 05 '22 00:04 debora-ito