# [Bug]: Can't copy large files to container
### Testcontainers version

3.8.0

### Using the latest Testcontainers version?

Yes

### Host OS

Windows

### Host arch

x86

### .NET version

8.0.3

### Docker version
```
Client:       Podman Engine
Version:      4.9.2
API Version:  4.9.2
Go Version:   go1.21.6
Git Commit:   f9a48ebcfa9a39144be0f86f4ba842752835f945
Built:        Sat Feb 3 00:29:04 2024
OS/Arch:      windows/amd64

Server:       Podman Engine
Version:      4.7.0
API Version:  4.7.0
Go Version:   go1.20.8
Built:        Wed Sep 27 20:24:38 2023
OS/Arch:      linux/amd64
```
### Docker info

```yaml
host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.76
    systemPercent: 0.07
    userPercent: 0.17
  cpus: 16
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: container
    version: "38"
  eventLogger: journald
  freeLocks: 2036
  hostname: mlse2068
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 5.15.146.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 30362726400
  memTotal: 33512914944
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.9.2-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.9.2
      commit: 35274d346d2e9ffeacb22cc11590b0266a23d634
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231004.gf851084-1.fc38.x86_64
    version: |
      pasta 0^20231004.gf851084-1.fc38.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.fc38.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589934592
  swapTotal: 8589934592
  uptime: 25h 7m 1.00s (Approximately 1.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 31850049536
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 54
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695839078
  BuiltTime: Wed Sep 27 20:24:38 2023
  GitCommit: ""
  GoVersion: go1.20.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0
```
### What happened?
Trying to copy a large file into a container with `.WithResourceMapping` results in an `IOException` stating that the stream is too long. The file I'm copying is a few gigabytes in size.
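For reference, a minimal sketch of the kind of setup that triggers it (image and file names are illustrative, not my actual project):

```csharp
using System.IO;
using DotNet.Testcontainers.Builders;

// Mapping a multi-gigabyte file into the container; the copy happens while
// the container is starting, which is where the exception surfaces.
var container = new ContainerBuilder()
    .WithImage("alpine:3.17")
    .WithResourceMapping(new FileInfo("large-file.bin"), "/data/")
    .Build();

await container.StartAsync(); // System.IO.IOException: Stream was too long.
```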
### Relevant log output
```
System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.MemoryStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
--- End of stack trace from previous location ---
   at ICSharpCode.SharpZipLib.Tar.TarBuffer.WriteRecordAsync(CancellationToken ct, Boolean isAsync)
   at ICSharpCode.SharpZipLib.Tar.TarBuffer.WriteBlockAsync(Byte[] buffer, Int32 offset, CancellationToken ct, Boolean isAsync)
   at ICSharpCode.SharpZipLib.Tar.TarOutputStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken, Boolean isAsync)
   at System.IO.Stream.<CopyToAsync>g__Core|27_0(Stream source, Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
   at System.IO.Strategies.BufferedFileStreamStrategy.CopyToAsyncCore(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
   at DotNet.Testcontainers.Containers.TarOutputMemoryStream.AddAsync(DirectoryInfo directory, FileInfo file, UnixFileModes fileMode, CancellationToken ct) in /_/src/Testcontainers/Containers/TarOutputMemoryStream.cs:line 145
```
### Additional information

No response
Thanks for creating the issue. It looks like it fails while copying the file stream content to the underlying SharpZipLib stream:

https://github.com/testcontainers/testcontainers-dotnet/blob/ac58b9f28cceb5b9db6f30e86ea3a9dae37e3bbb/src/Testcontainers/Containers/TarOutputMemoryStream.cs#L145-L146

How large is "a few gigabytes"? It should not be difficult to reproduce, I guess 😬. I can try to reproduce it later today.
I did not remember the actual implementation, but after spending a few minutes looking at it, the exception you are seeing makes sense. The maximum size of a MemoryStream object is approximately 2 GB (2,147,483,591 bytes). The following test reproduces the issue:
```csharp
// NOTE: This test relies on internals of the Testcontainers repository
// (TarOutputMemoryStream is not public); the namespaces below are assumed.
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using DotNet.Testcontainers.Configurations;
using DotNet.Testcontainers.Containers;
using Microsoft.Extensions.Logging.Abstractions;
using Xunit;

public sealed class GitHub : IResourceMapping
{
    public MountType Type => MountType.Tmpfs;

    public AccessMode AccessMode => AccessMode.ReadOnly;

    public string Source => "foo";

    public string Target => "foo";

    UnixFileModes IResourceMapping.FileMode => Unix.FileMode755;

    [Fact]
    public async Task Issue1180()
    {
        using var memoryStream = new MemoryStream();
        // Swapping the MemoryStream for a FileStream avoids the failure (requires
        // adjusting the TarOutputMemoryStream ctor to accept the backing stream):
        // using var fileStream = new FileStream(Target, FileMode.CreateNew, FileAccess.Write, FileShare.Read);
        using var tarOutputMemoryStream = new TarOutputMemoryStream(memoryStream, NullLogger.Instance);
        await tarOutputMemoryStream.AddAsync(this);
    }

    Task IFutureResource.CreateAsync(CancellationToken ct)
    {
        return Task.CompletedTask;
    }

    Task IFutureResource.DeleteAsync(CancellationToken ct)
    {
        return Task.CompletedTask;
    }

    Task<byte[]> IResourceMapping.GetAllBytesAsync(CancellationToken ct)
    {
        // Returns an array at the managed array size limit to trigger the overflow.
        // https://learn.microsoft.com/en-us/dotnet/api/system.array.
        const int maxArrayDimension = 2147483591;
        return Task.FromResult(new byte[maxArrayDimension]);
    }
}
```
This example demonstrates it very well. If you change the MemoryStream to a FileStream (you will need to adjust the TarOutputMemoryStream ctor), the issue no longer occurs. Storing that amount of data in memory does not make a lot of sense anyway. Streaming it would be more efficient, but at this point I have no idea how to forward the data internally without taking a closer look at it. Right now, I assume supporting files larger than 2 GB (total tarball size) will require more work than I initially thought. Roughly, the FileStream variant looks like the sketch below.
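A minimal sketch of that variant, assuming a TarOutputMemoryStream ctor overload that accepts the backing stream (the current public ctor does not, as noted above):

```csharp
// Sketch only: assumes TarOutputMemoryStream can be constructed over an
// arbitrary backing stream, which requires adjusting its ctor.
var tarFilePath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

using var fileStream = new FileStream(tarFilePath, FileMode.CreateNew, FileAccess.ReadWrite, FileShare.Read);
using var tarOutputMemoryStream = new TarOutputMemoryStream(fileStream, NullLogger.Instance);

// The tarball now grows on disk instead of in a MemoryStream, so the managed
// array limit of 2,147,483,591 bytes no longer applies.
await tarOutputMemoryStream.AddAsync(this);
```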
I've only glanced at the implementation while debugging the issue, but is it possible to determine the size of the files and then choose either file or memory based on that?
> is it possible to determine the size of the files and then choose either file or memory based on that?
I do not think that is an appropriate fix. Writing the data to a file and then reading it back again won't be very performant. I think it is better to properly support and forward a stream end to end, roughly along the lines of the sketch below.
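For example, SharpZipLib can already write a tar entry directly from a source stream to any destination stream. A rough sketch of that kind of streaming (standalone code, not the actual Testcontainers internals; names are illustrative):

```csharp
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.Tar;

// Sketch: stream a large file into a tar archive chunk by chunk, so no single
// byte[] or MemoryStream ever has to hold the whole file.
public static class TarStreamingSketch
{
    public static async Task WriteFileAsync(FileInfo file, Stream destination, CancellationToken ct)
    {
        using var tarOutputStream = new TarOutputStream(destination, Encoding.UTF8);
        tarOutputStream.IsStreamOwner = false; // Leave the destination open for the caller.

        // The tar header carries the entry size up front, so SharpZipLib can
        // validate the written bytes without buffering them.
        var entry = TarEntry.CreateTarEntry(file.Name);
        entry.Size = file.Length;
        tarOutputStream.PutNextEntry(entry);

        using (var fileStream = file.OpenRead())
        {
            await fileStream.CopyToAsync(tarOutputStream, ct); // Copies in buffered chunks.
        }

        tarOutputStream.CloseEntry();
    }
}
```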