
Streaming of S3 large files using S3AsyncClient and AsyncResponseTransformer.toPublisher(), CRT vs netty backpressure

Open VadimKirilchuk opened this issue 1 year ago • 12 comments

Describe the bug

To stream data directly instead of writing it to a file or into memory, #391 introduced the AsyncResponseTransformer.toPublisher() method, which essentially provides a way to return Mono&lt;ResponseEntity&lt;Flux&lt;ByteBuffer&gt;&gt;&gt; to the caller.

The Javadoc, however, clearly states: "You are responsible for subscribing to this publisher and managing the associated back-pressure. Therefore, this transformer is only recommended for advanced use cases."

I am attaching a project which contains two things: a Spring Boot server that can download a large file from S3 using S3AsyncClient (use spring.profiles.active in application.yml to pick either the CRT or the Netty client), and a very slow consumer. s3-streaming.zip

If you specify a bucket and a large file in application.yml, start the application with the CRT client (the default), and start the slow client, you can observe the problem in the log: every 10 seconds the reactor-netty direct memory consumption is printed, and it quickly grows to the size of the file while the consumer slowly keeps reading.

Now, if you have 10 slow consumers, your direct memory will quickly hit the wall and requests will start to error out, causing Premature End of Content errors and closed connections.

I was not able to find any examples of how to maintain backpressure, as Flux.limitRate didn't work at all.

We chose the CRT client based on the AWS HTTP client configuration page.

It would be nice if someone could look at this to figure out whether there is an issue with the code itself, or whether the code is fine and this use case simply doesn't play well with CRT.

I should also mention that once CRT is replaced with Netty, memory consumption stays minimal, so backpressure seems to work without any manual intervention when Netty is used on both sides (reactor-netty and the S3 SDK Netty client). However, we observed a different issue with that setup: under load, some requests idle and are closed by a timeout on the ALB side; it is under investigation.

The point is: while the documentation mentions backpressure, it seems to work when using Netty as the S3 HTTP client and not to work when using CRT. Please advise.

Expected Behavior

The CRT S3 HTTP client behaves similarly to the S3 Netty HTTP client and doesn't push all the data into the server's direct buffers.

Current Behavior

CRT instantly pushes all data into the server's Netty direct byte buffers for slow consumers. That doesn't happen when using the Netty HTTP client instead of CRT.

Reproduction Steps

Use the attached Maven project from the zip file. Update application.yml to point to an S3 bucket and a file larger than 200MB. You may also need to specify an AWS credentials provider in the code or in the environment.

Start the server. Start the SlowDrainApplication main method. Observe the server logs to see the Netty direct memory consumption: all data gets pushed into direct memory.

Go to application.yml and change spring.profiles.active from 'crt' to 'netty'. Repeat the exercise. Observe in the server logs that no excess memory consumption happens, without any additional backpressure configuration.

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.25.10

CRT Client Version

0.29.14

Reactor netty and netty version

reactor-netty 1.1.3, netty 4.1.108.Final

JDK version used

openjdk version "17.0.3" 2022-04-19 oracle jdk 17

Operating System and version

Windows 11 23H2, and a Linux image in EKS

VadimKirilchuk avatar Apr 26 '24 18:04 VadimKirilchuk

I have found something interesting here. If I add doOnRequest(n -> log.info("Requested {}", n)) to the returned Flux, I can observe that by default MonoSendMany in reactor-netty requests 1 and then 127 chunks. The CRT client uses 8MB chunks, so as a Publisher it will push (1+127)*8MB = 1024MB of data to the subscriber.

@GetMapping(path="/{filekey}")
public Mono<ResponseEntity<Flux<ByteBuffer>>> downloadFile(@PathVariable("filekey") String filekey) {    
    GetObjectRequest request = GetObjectRequest.builder()
      .bucket(s3config.getBucket())
      .key(filekey)
      .build();
    
    return Mono.fromFuture(s3client.getObject(request, AsyncResponseTransformer.toPublisher()))
      .map(response -> {
        checkResult(response.response());
        String filename = getMetadataItem(response.response(),"filename",filekey);            
        return ResponseEntity.ok()
          .header(HttpHeaders.CONTENT_TYPE, response.response().contentType())
          .header(HttpHeaders.CONTENT_LENGTH, Long.toString(response.response().contentLength()))
          .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
          .body(Flux.from(response));
      });
}

I tried adding limit(3) to the returned Flux, and the CRT publisher does respect it; however, limit() is not what can be used here, as it basically ends the stream.

As I said before, I also tried limitRate(2), and that one is not respected.

Then I tried wrapping the Flux into a MyFlux, which wraps the Subscriber into a MySubscriber, which wraps the Subscription into a MySubscription, in order to override the requested amount.

public class MySubscription implements Subscription {
    private final Subscription original; // + constructor

    @Override
    public void request(long n) {
        original.request(2); // always request 2, ignoring n (request() returns void)
    }

    @Override
    public void cancel() {
        original.cancel();
    }
}

And that does override the requested 127 down to 2; however, it seems to completely break the MonoSendMany logic, and I think it stops publishing anything to the channel/consumer. The consumer blocks (remember, I am using blocking IO for the consumer, so it's completely stuck).
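For what it's worth, a variant of this wrapper that clamps the demand instead of replacing it avoids under-delivering relative to the demand the downstream operator has already accounted for, which is my guess at why MonoSendMany stalls. A minimal self-contained sketch using the JDK's java.util.concurrent.Flow API (the real code would wrap org.reactivestreams.Subscription instead; the class name here is hypothetical):

```java
import java.util.concurrent.Flow;

// Hypothetical sketch: forward at most `cap` items per request() call,
// rather than ignoring the caller's demand entirely.
public class ClampingSubscription implements Flow.Subscription {
    private final Flow.Subscription original;
    private final long cap;

    public ClampingSubscription(Flow.Subscription original, long cap) {
        this.original = original;
        this.cap = cap;
    }

    @Override
    public void request(long n) {
        // Clamp instead of replace: requesting a fixed 2 regardless of n
        // (as in MySubscription above) can leave the downstream waiting
        // for items it believes are still outstanding.
        original.request(Math.min(n, cap));
    }

    @Override
    public void cancel() {
        original.cancel();
    }
}
```

This is only a sketch of the wrapping idea, not something I have verified against MonoSendMany's internal accounting.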

Let me check what the s3-netty chunk sizes are and how it handles a request for 127.

VadimKirilchuk avatar Apr 30 '24 03:04 VadimKirilchuk

Ok, s3-netty uses 8KB chunks instead of 8MB chunks which really helps here.

So, here is what I think could be a solution:

  1. Make reactor-netty request fewer than 128/64 (refill) chunks, but that's hardcoded in static final fields of MonoSend (the constants are defined there but used in MonoSendMany).
  2. Make the CRT client use smaller download chunks; there is an option for the upload chunk size, but I am not sure there is one for download.
  3. Figure out how to make limitRate() work, but CRT doesn't respect it, and MonoSendMany may not like it either due to internal calculations based on the 128/64 MAX_SIZE/REFILL_SIZE.

Please advise.

VadimKirilchuk avatar Apr 30 '24 03:04 VadimKirilchuk

Alright, I just tried

S3AsyncClient.crtBuilder()
    .minimumPartSizeInBytes(1L * 1024 * 1024) // 1 MB
    .build();

And even though the parameter name is "uploadPartSize" and the Javadoc refers to the Amazon Multipart Upload Limits, it controls both download and upload chunk sizes.

So the Javadoc is fine, except for the parameter name.

From here, I will let you review this issue to decide whether you want to update the Javadoc/parameter names a little, or even split them into separate download and upload chunk sizes.

Also I will wait for an answer on limitRate().

VadimKirilchuk avatar Apr 30 '24 03:04 VadimKirilchuk

Hi @VadimKirilchuk, the AWS CRT-based S3 client uses S3 multipart upload and ranged GET (a GetObject request with a Range header) to achieve parallel transfers. The default minimumPartSizeInBytes is high because S3 requires upload parts to be at least 5MB. It works for download because ranged GET doesn't have this limitation. Agreed that we should update our documentation to clarify that.

When you invoke GetObject, under the hood the AWS CRT-based S3 client initially downloads a few parts and buffers them in memory (the buffer size can be configured via https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3CrtAsyncClientBuilder.html#initialReadBufferSizeInBytes(java.lang.Long)). After that, the CRT sends more data only once the existing parts have been consumed. Maybe you can try lowering initialReadBufferSizeInBytes?
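To illustrate, here is a hedged sketch combining the two knobs discussed in this thread on the CRT builder. Both setters exist on S3CrtAsyncClientBuilder; the 16MB read-buffer value is only an example, not a recommendation, and the class and method names around the builder call are mine:

```java
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class S3CrtConfigSketch {
    // Sketch only: values are illustrative, not recommendations.
    static S3AsyncClient buildClient() {
        return S3AsyncClient.crtBuilder()
                .minimumPartSizeInBytes(1L * 1024 * 1024)        // 1 MB parts (also the download chunk size)
                .initialReadBufferSizeInBytes(16L * 1024 * 1024) // cap initial read-ahead buffering per request
                .build();
    }
}
```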

zoewangg avatar May 02 '24 22:05 zoewangg

Hi @zoewangg,

We can definitely play with that setting, and it may reduce the CRT read buffer, but the problem between reactor-netty and the S3 client is that reactor-netty requests 128 parts to be pushed from the S3 client. Since the part size is 8MB by default, that equals 1024MB per single GetObject request.

We reduced the part size to 1MB, and the overall behavior is more stable now, as at most 128*1MB = 128MB is pushed before waiting for the next request (REFILL_SIZE).

So, is there a way to limitRate on the AsyncResponseTransformer publisher instead of changing the part size? I think the answer is no, but maybe I am missing something.

Sounds like I should raise another request against reactor-netty, since these MAX_SIZE/REFILL_SIZE values are hardcoded there, which doesn't play nicely with huge chunks.
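The worst-case buffering arithmetic used throughout this thread is just prefetch times chunk size. A tiny sketch (the class and method names are mine):

```java
public class PrefetchMath {
    // Worst-case bytes buffered per GetObject stream: the number of chunks
    // requested by the subscriber (reactor-netty's MAX_SIZE, 128 by default)
    // multiplied by the chunk size.
    static long worstCaseBytes(long requestedChunks, long chunkSizeBytes) {
        return requestedChunks * chunkSizeBytes;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024;
        System.out.println(worstCaseBytes(128, 8 * MB) / MB); // 8 MB parts -> 1024 MB
        System.out.println(worstCaseBytes(128, 1 * MB) / MB); // 1 MB parts -> 128 MB
    }
}
```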

VadimKirilchuk avatar May 03 '24 15:05 VadimKirilchuk

So, is there a way to limitRate on the AsyncResponseTransformer publisher instead of changing the part size? I think the answer is no, but maybe I miss something.

Right, not for the AWS CRT-based S3 client. Currently the chunk size is the same as the part size, so it can only be configured via the part size. There is a plan on the CRT side to support streaming with smaller chunks, but we don't have a timeline at the moment.

zoewangg avatar May 03 '24 16:05 zoewangg

Hi, just posting an update on this one. We haven't experimented with initialReadBufferSizeInBytes yet. However, since we reduced the part size, we transitively reduced initialReadBufferSizeInBytes as well: an 8MB part size meant an 80MB initial buffer, while a 1MB part size means a 10MB initial buffer.

I am actually thinking about increasing it back to 80MB, or even more. Reactor-netty was requesting too many parts/chunks, so we fixed that by reducing the part size; but as a result there is less buffering/caching now. I guess we can improve that by increasing buffering/caching on the s3-crt side with a bigger initialReadBufferSizeInBytes.

Basically, by playing with both initialReadBufferSizeInBytes and minimumPartSizeInBytes, we can achieve the desired result.

I am also still behind on opening an issue against reactor-netty regarding the hardcoded MAX_SIZE/REFILL_SIZE.

VadimKirilchuk avatar May 08 '24 02:05 VadimKirilchuk

I have finally submitted a feature request on the reactor-netty side; let's see what they say. Reactor-netty feature request

VadimKirilchuk avatar May 21 '24 15:05 VadimKirilchuk

Update: I will be working on a PR for reactor-netty to make the request amount configurable by a property; it's going to take some time. I am not sure we want to close this one, since there is no big issue with the AWS SDK other than the minor Javadoc glitch.

VadimKirilchuk avatar May 29 '24 02:05 VadimKirilchuk

@zoewangg good news, my pull request on the reactor-netty side was merged. Once it's released, it will be possible to keep the default part size and simply reduce the prefetch size on the reactor-netty side by setting reactor.netty.send.maxPrefetchSize to a lower value (the default is 128). For instance, a value of 32 would make reactor-netty ask the AWS SDK to push at most 32 * 8MB = 256MB into reactor-netty buffers per request.

Once again, I think this issue can be closed, unless you want to go ahead and update some Javadocs or other docs.
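Worth noting for anyone picking this up: reactor-netty reads tuning knobs like this into static final fields at class-load time (as mentioned earlier about MonoSend), so the property has to be set before any reactor-netty class initializes, typically via a -D flag on the JVM command line. A sketch (the property name is taken from the comment above and only exists once the linked change is released; the class name is mine):

```java
public class PrefetchConfigSketch {
    public static void main(String[] args) {
        // Must run before any reactor-netty class loads; in practice prefer
        // -Dreactor.netty.send.maxPrefetchSize=32 on the JVM command line.
        System.setProperty("reactor.netty.send.maxPrefetchSize", "32");
        System.out.println(System.getProperty("reactor.netty.send.maxPrefetchSize"));
    }
}
```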

VadimKirilchuk avatar Jun 10 '24 20:06 VadimKirilchuk

@VadimKirilchuk Thank you for providing this very clear explanation on this issue.

We thought we were hitting a memory wall in CRT when downloading large files concurrently, since the crash looks like this, but possibly not.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000..., pid=1, tid=48
#
# JRE version: Java(TM) SE Runtime Environment (17.0.10+11) (build 17.0.10+11-LTS-240)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0.10+11-LTS-240, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [AWSCRT_131480...libaws-crt-jni.so+0x81...]  Java_software_amazon_awssdk_crt_http_HttpClientConnection_httpClientConnectionShutdown+0x1
#
# Core dump will be written. Default location: //core.1
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
[thread 41 also had an error]
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
 
[error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007f...]

oogetyboogety avatar Jun 14 '24 04:06 oogetyboogety

Hi @oogetyboogety, I am not sure that your issue is similar to mine.

backpressure mechanism not working in CRT

It does work. Initially I thought it wasn't working, but down the road it was discovered that, due to the big part/chunk size, reactor-netty on the Spring Boot side requests too many parts, which leads to eventual out-of-memory if multiple slow clients are downloading huge files in parallel.

The issue provides two solutions for that:

  1. Reduce part size
  2. Tune reactor.netty.send.maxPrefetchSize

why connections were shutdown prematurely, even after we manually increased all idle timeouts in an Istio Envoy proxy or an AWS ALB

You would still need to investigate the root cause of the JVM error you are seeing.

Problematic frame: C [AWSCRT_131480...libaws-crt-jni.so+0x81...]
Java_software_amazon_awssdk_crt_http_HttpClientConnection_httpClientConnectionShutdown+0x1

The crash happened outside the Java Virtual Machine in native code.

There seems to be a native code issue in the CRT library, but I don't think it's related to backpressure. Would you mind creating a separate issue for that?

I also think the current issue fell through the cracks and no one is responding to it anymore, so opening a new issue would definitely bring more attention to your problem :)

VadimKirilchuk avatar Jun 18 '24 18:06 VadimKirilchuk

@VadimKirilchuk before resolving this issue, I want to ask if there's anything pending from the SDK team. I see across some of the comments an ask to improve the documentation around backpressure; is this still the case? If so, where would this be most visible?

debora-ito avatar Nov 07 '24 22:11 debora-ito

@debora-ito

Yes, one thing is:

S3AsyncClient.crtBuilder()
    .minimumPartSizeInBytes(1L * 1024 * 1024) // 1 MB
    .build();

Even though the parameter name is "uploadPartSize" and the Javadoc refers to the Amazon Multipart Upload Limits, it controls both download and upload chunk sizes.

So the Javadoc is more or less fine, except for the parameter name itself.

As for backpressure, I am not sure there is a need to update anything.

VadimKirilchuk avatar Nov 08 '24 04:11 VadimKirilchuk