NettyNioAsyncHttpClient Channel Cache Memory Usage
Describe the issue
We're using S3AsyncClient to process customer data from S3, so we create a shared instance of NettyNioAsyncHttpClient for the S3 clients. The NettyNioAsyncHttpClient holds a pool of Netty channels for each bucket we process, which can grow significantly with the number of customers. Could there be such a cache per region instead?
Steps to Reproduce
A long-running app periodically accesses customers' S3 buckets. Memory grows with the number of customers.
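Roughly, the access pattern looks like this (bucket and region names are illustrative; the HTTP client is shared across all S3 clients):

```java
// One shared async HTTP client for the whole application.
SdkAsyncHttpClient sharedHttp = NettyNioAsyncHttpClient.builder().build();

// Periodically, for every customer bucket (names are made up):
S3AsyncClient s3 = S3AsyncClient.builder()
        .region(Region.EU_WEST_1)
        .httpClient(sharedHttp)
        .build();
s3.listObjectsV2(req -> req.bucket("customer-42-bucket")).join();
```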
Your Environment
- AWS Java SDK version used: 2.15.45
- JDK version used: 11
- Operating System and version:
Hi @langrp, thanks for reaching out. NettyNioAsyncHttpClient actually has no knowledge of S3 buckets. A channel corresponds to a TCP connection to the S3 service, and the same connection can be reused by multiple requests, which might be associated with the same bucket or different buckets. By default, NettyNioAsyncHttpClient allows at most 50 connections, configured by NettyNioAsyncHttpClient.Builder#maxConcurrency, so with the default configuration there are at most 50 channels in the pool.
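For reference, that limit is configurable on the builder (the value below is just an example):

```java
SdkAsyncHttpClient httpClient = NettyNioAsyncHttpClient.builder()
        .maxConcurrency(100) // maximum number of TCP connections (channels) in the pool
        .build();
```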
> The memory grows with number of customers.
Do you mean the objects are not getting GC'd and the memory continues to grow over time, eventually leading to out-of-memory errors? If so, could you do a heap dump? Could you also share sample code showing how you create the S3Client and NettyNioAsyncHttpClient?
Hi, thank you for coming back so quickly.
Apologies for not being more specific. I'm referring to the SdkChannelPoolMap populated with keys from here, which are based on the request URI. In the case of S3 the request host is prefixed with the bucket name, so each bucket gets its own ChannelPool (configured with the max concurrency you mentioned).
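For illustration, two buckets in the same region resolve to different request hosts, so each one ends up with its own entry in the pool map (bucket names are made up):

```java
// Virtual-hosted-style addressing: the bucket name is part of the host,
// so the pool map ends up with one key (and one ChannelPool) per bucket.
URI bucketA = URI.create("https://customer-a.s3.eu-west-1.amazonaws.com");
URI bucketB = URI.create("https://customer-b.s3.eu-west-1.amazonaws.com");
System.out.println(bucketA.getHost()); // customer-a.s3.eu-west-1.amazonaws.com
System.out.println(bucketB.getHost()); // customer-b.s3.eu-west-1.amazonaws.com
```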
The client is created this way:
NettyNioAsyncHttpClient.Builder b = NettyNioAsyncHttpClient.builder();
SdkAsyncHttpClient client = b.connectionTimeout(Duration.ofMillis(10000))
        .readTimeout(Duration.ofMillis(30000))
        .writeTimeout(Duration.ofMillis(30000))
        .maxConcurrency(20)
        .maxPendingConnectionAcquires(1000)
        .connectionAcquisitionTimeout(Duration.ofMillis(30000))
        .build();
Though the pool I'm referring to still has over 890 entries, as shown in the table below extracted from my heap dump.
| Class Name | Shallow Heap (bytes) | Retained Heap (bytes) |
|---|---|---|
| software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient @ 0x3814503c8 | 24 | 779,011,936 |
| - pools software.amazon.awssdk.http.nio.netty.internal.AwaitCloseChannelPoolMap @ 0x38357ddb8 | 56 | 779,011,888 |
| - map java.util.concurrent.ConcurrentHashMap @ 0x38357ddf0 | 64 | 779,011,568 |
| - table java.util.concurrent.ConcurrentHashMap$Node[2048] @ 0x381f551f0 - 896 entries | 8,208 | 779,011,432 |
Thank you for the clarification! I see what you mean now. Currently, we only clean up channel pools from SdkChannelPoolMap when the client is closed. We'd need to investigate to find the best way to address this. One possible solution is to have a separate thread run periodically to remove channel pools that have no open connections. I will discuss this with the team.
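Very roughly, something like the following (just a sketch to convey the idea, not actual SDK internals; `hasOpenConnections` is a hypothetical check):

```java
// pools: the Map<URI, ChannelPool> held inside SdkChannelPoolMap.
ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
cleaner.scheduleAtFixedRate(() -> pools.entrySet().removeIf(entry -> {
    if (hasOpenConnections(entry.getValue())) { // hypothetical check
        return false;
    }
    entry.getValue().close(); // release the idle pool before dropping it
    return true;
}), 1, 1, TimeUnit.MINUTES);
```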
If I may propose a solution: I'd enhance SdkHttpRequest to be region-aware and use that region-based URL as the key into the map. That way we'd reduce the size of the map while keeping the performance. It would also allow us to share the same HTTP client across service clients such as SNS and S3 without increasing the size of the map. I think most of the service clients are aware of the region they connect to. What do you and your team think about that?
Absolutely! We welcome suggestions and new ideas! 🙂 Could you elaborate on caching by region? I'm not sure how that would reduce the size of the map, because we'd still need to keep the mapping between the host and the connections.
We talked about another solution: removing a channel pool from the channel pool map automatically when its last connection is closed.
My goal is to have one channel pool per region, so any S3 bucket request within the region could use a connection from that channel pool. This would reduce the size of the map to the number of regions our app needs to connect to. Instead of the channel pool map size depending on the number of buckets, it would depend on the number of regions.
From my understanding, each service client is aware of the region it communicates with. So whenever I create an S3AsyncClient for us-east-1 and list objects in bucket my-test, it would grab a connection for that region.
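Concretely (illustration only; this is the proposed behaviour, not what the SDK does today, and `my-test` is just an example bucket):

```java
S3AsyncClient s3 = S3AsyncClient.builder()
        .region(Region.US_EAST_1)
        .httpClient(sharedHttpClient) // the shared NettyNioAsyncHttpClient instance
        .build();

// Under the proposal, this request would acquire a channel from the single
// us-east-1 pool, no matter which bucket it targets.
s3.listObjectsV2(req -> req.bucket("my-test")).join();
```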
It seems that even with the region, we would still need a map inside SdkChannelPoolMap, something like Map<Region, Map<URI, SimpleChannelPoolAwareChannelPool>>. So the inner map size would still depend on the number of URIs, i.e. the number of buckets in your case.
I think I have a similar problem where the retained heap grows, and the same for the map node. Do you know of any solution that can be applied to work around the issue?
Same problem here.
In scenarios where buckets (URIs) come and go, it gets even worse. Just think of creating and deleting (trial) customers, each with their own isolated buckets: software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient#pools turns into a severe memory leak, as I cannot see that entries ever get removed. You may end up with thousands of map entries with quite an expensive memory footprint (URI + software.amazon.awssdk.http.nio.netty.internal.SimpleChannelPoolAwareChannelPool).
I guess a "low-hanging fruit" fix for this problem would be to support a configurable TTL (time to live) for the current ever-growing map-based "pools" implementation (e.g. turn software.amazon.awssdk.http.nio.netty.internal.SdkChannelPoolMap#map into a cache that removes entries that have not been accessed within the TTL). To not break anything, the default could be no TTL.
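Purely to illustrate the idea (using Caffeine as an example cache implementation and Netty's ChannelPool as a stand-in for the SDK's internal pool type; this is not SDK code):

```java
Cache<URI, ChannelPool> pools = Caffeine.newBuilder()
        .expireAfterAccess(Duration.ofMinutes(30)) // configurable ttl
        .removalListener((URI uri, ChannelPool pool, RemovalCause cause) -> {
            if (pool != null) {
                pool.close(); // close pools evicted after being idle for the ttl
            }
        })
        .build();
```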
What do you think?