Gzip with a large buffer source doesn't raise an error
Version
v24.11.1
Platform
Linux cc96c567cc46 6.12.53 #1 SMP PREEMPT_DYNAMIC Tue Oct 28 18:05:29 UTC 2025 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
import { createReadStream, createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createGunzip, createGzip } from 'node:zlib';
import { buffer } from 'node:stream/consumers';
const buf = Buffer.allocUnsafe(4646837723);
await pipeline(Readable.from([buf]), createGzip(), createWriteStream('test.gz', { flags: 'w' }));
const out = await buffer(createReadStream('test.gz').pipe(createGunzip()));
console.log(out.length);
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior? Why is that the expected behavior?
Either throw an error or output 4646837723 (the length of the input buffer).
What do you see instead?
output 351870427
Additional information
No response
From my research into this issue, this is what I found.
Gzip uses a 32-bit field (ISIZE) to store the original uncompressed size.
This field can only represent values below:
2^32 = 4,294,967,296 bytes
Your input buffer is:
4,646,837,723 bytes
Since this exceeds the 4 GiB limit, gzip stores the size modulo 2^32, as defined in RFC 1952 (“This contains the size of the original (uncompressed) input data modulo 2^32”).
The modulo calculation is:
4646837723 - 4294967296 = 351870427
So the value stored in the gzip trailer becomes:
351,870,427 bytes
This is why gunzip() returns an output of length 351,870,427, not the real 4.6 GB size.
This is expected behavior per the gzip spec: the ISIZE field cannot represent uncompressed sizes ≥ 4 GiB.
If the original size must be preserved, gzip cannot be used; ZIP64 or Zstandard are required.
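The wrap-around arithmetic above can be checked directly, and the ISIZE field itself can be read from the last four bytes of a gzip member. This is only a minimal sketch, assuming a single-member gzip stream produced by `zlib.gzipSync`; the helper name `readIsize` is hypothetical and not part of the Node.js API.

```javascript
import { gzipSync } from 'node:zlib';

// Size modulo 2^32, which is how RFC 1952 defines ISIZE.
const wrapped = 4646837723 % (2 ** 32);
console.log(wrapped); // 351870427

// Hypothetical helper: ISIZE occupies the final 4 bytes of a gzip member
// and is stored little-endian (it follows the 4-byte CRC32).
function readIsize(gz) {
  return gz.readUInt32LE(gz.length - 4);
}

// A small input keeps the example fast; its ISIZE equals its length.
const small = gzipSync(Buffer.alloc(1000));
console.log(readIsize(small)); // 1000
```

Note that this only reads the trailer field; it says nothing about how many bytes a decompressor will actually emit.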
Related links: https://unix.stackexchange.com/questions/612905/how-portable-is-a-gzip-file-over-4-gb-in-size
So, from my understanding, this is not a Node.js bug. The behavior is due to a limitation of the gzip file format (RFC 1952): gzip stores the uncompressed size in a 32-bit field (ISIZE), which can only represent values below 4 GiB. The size of any larger input wraps around modulo 2^32, which is why the output is smaller than the original.
You can throw an error manually, without changing internals, with something like this:
import { createReadStream, createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createGunzip, createGzip } from 'node:zlib';
import { buffer } from 'node:stream/consumers';
const MAX_GZIP_SIZE = 2 ** 32;
const buf = Buffer.allocUnsafe(4646837723);
// ISIZE wraps at 2^32, so an input of exactly 2^32 bytes is already too large.
if (buf.length >= MAX_GZIP_SIZE) {
  throw new Error(`Input too large for gzip: ${buf.length} bytes`);
}
await pipeline(Readable.from([buf]), createGzip(), createWriteStream('test.gz', { flags: 'w' }));
const out = await buffer(createReadStream('test.gz').pipe(createGunzip()));
console.log(out.length);
Not true. If I split the buffer manually and write all of the chunks to the stream, everything works fine, like this:
// entire test code
import { createReadStream, createWriteStream } from 'node:fs';
import { unlink } from 'node:fs/promises';
import { Readable } from 'node:stream';
import { buffer } from 'node:stream/consumers';
import { pipeline } from 'node:stream/promises';
import { createGunzip, createGzip } from 'node:zlib';
const buf = Buffer.allocUnsafe(4646837723);
console.time('test');
await pipeline(Readable.from([buf]), createGzip(), createWriteStream('test.gz', { flags: 'w' }));
const out = await buffer(createReadStream('test.gz').pipe(createGunzip()));
await unlink('test.gz');
console.timeLog('test', 'Write with single buffer: ' + out.length);
const kChunkSize = 2 ** 31 - 1;
const parts = [];
for (let i = 0; i < buf.length; i += kChunkSize) {
  parts.push(buf.subarray(i, i + kChunkSize));
}
await pipeline(Readable.from(parts), createGzip(), createWriteStream('test-2.gz', { flags: 'w' }));
const out2 = await buffer(createReadStream('test-2.gz').pipe(createGunzip()));
await unlink('test-2.gz');
console.timeLog('test', 'Write with multiple buffer chunks: ' + out2.length + ', is same: ' + buf.equals(out2));
The output is:
test: 831.56ms Write with single buffer: 351870427
test: 19.653s Write with multiple buffer chunks: 4646837723, is same: true
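The chunked workaround above could be wrapped in a small generator so that no single write handed to gzip exceeds a chosen size. This is a minimal sketch rather than a fix for the underlying behavior; the helper name `chunks` is hypothetical, and the default chunk size simply reuses `kChunkSize` from the test above. The demonstration uses a small buffer so it runs quickly.

```javascript
import { Readable } from 'node:stream';
import { buffer } from 'node:stream/consumers';
import { createGunzip, createGzip } from 'node:zlib';

// Hypothetical helper: yield zero-copy views of `buf`, each at most `size` bytes.
function* chunks(buf, size = 2 ** 31 - 1) {
  for (let i = 0; i < buf.length; i += size) {
    yield buf.subarray(i, i + size);
  }
}

// Round-trip a small input through gzip and gunzip in 4096-byte chunks.
const input = Buffer.alloc(10_000, 'a');
const roundTrip = await buffer(
  Readable.from(chunks(input, 4096)).pipe(createGzip()).pipe(createGunzip()),
);
console.log(roundTrip.length, input.equals(roundTrip)); // 10000 true
```

Because `subarray` returns views onto the same memory, the generator does not copy the input; only the chunk boundaries change.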