tus-node-server

Upload fails - file metadata is missing in GCS

Open sumit-rd opened this issue 2 years ago • 2 comments

I have a react application in production (tus-node-server with the tus GCS store) with many active users. From time to time my server is reset by an error from tus, because file metadata is missing. The error is:

            /node_modules/@tus/gcs-store/dist/index.js:130
            const { size, metadata: meta } = metadata.metadata;
                    ^
            TypeError: Cannot destructure property 'size' of 'metadata.metadata' as it is undefined.

I can’t find a way to reproduce it, and therefore I can’t find a way to fix it. I also couldn’t find a way to catch this error and prevent the reset.
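
As a last-resort stop-gap (a sketch, not a fix for the missing metadata itself), process-level handlers can log the error instead of letting the Node process exit:

```javascript
// Last-resort guards: log unexpected errors rather than letting the
// process die. This keeps the server alive while the root cause is
// investigated, but does not repair the lost GCS metadata.
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err);
});
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
});
```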

sumit-rd avatar Nov 29 '23 11:11 sumit-rd

I added a size field to the meta object I was passing, set to file.size, but just discovered that didn't work.

dpmillerau avatar Nov 13 '24 23:11 dpmillerau

I've been experiencing this error due to ratelimiting by GCS, caused by multiple requests from the Node server to GCS within a short timeframe whenever an upload finishes.

tus's GCS implementation uses file metadata to record the total file size. Upon a request to continue an upload, the server first requests the existing file from GCS. If the file does not exist, the server begins a new upload. If the file does exist, the server reads the file's metadata to determine where to continue the upload. Problems arise if a file's metadata is lost: tus then no longer knows where to continue the upload and must restart from the beginning. The current implementation crashes outright if the metadata is not found.
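
A sketch of that resume check with the defensive guard the current code lacks (names here are illustrative, not the actual @tus/gcs-store implementation; the real code destructures `metadata.metadata` directly, which is what throws the TypeError above):

```javascript
// Illustrative sketch of the resume flow. `file` stands in for a
// @google-cloud/storage File object; function and field names are
// assumptions for illustration.
async function getUploadState(file) {
  const [gcsMetadata] = await file.getMetadata(); // request to GCS

  // The guard the current implementation lacks: if the custom metadata
  // was lost, fail with a catchable error instead of a TypeError.
  if (!gcsMetadata || !gcsMetadata.metadata) {
    throw new Error('upload metadata missing; upload must be restarted');
  }

  const { size, metadata: meta } = gcsMetadata.metadata;
  return { size: Number(size), meta };
}
```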

Problem

Every time a chunk is received from a client, tus makes at most five calls to GCS:

  1. Retrieves the existing file and its metadata
  2. Uploads new chunk to GCS
  3. Tells GCS to combine the existing file and the new chunk; this creates a new file without metadata, containing the original file's contents with the new chunk appended
  4. Sets metadata of new file to the updated metadata
  5. Deletes the newly-uploaded chunk, as it has been appended to the new file
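
The five steps above, sketched against the @google-cloud/storage client (combine, setMetadata, and delete with ignoreNotFound appear in the real code below; the overall wiring and the pipeToGCS helper are illustrative):

```javascript
// Hypothetical helper standing in for streaming the chunk body to GCS.
async function pipeToGCS(chunkBody, gcsFile) {
  // e.g. write chunkBody through gcsFile.createWriteStream() in the real client
}

// Sketch of the per-chunk write path described above.
async function writeChunk(bucket, fileName, chunkBody, updatedMetadata) {
  const file = bucket.file(fileName);
  const chunk = bucket.file(`${fileName}.part`);

  await file.getMetadata();                       // 1. fetch existing file + metadata
  await pipeToGCS(chunkBody, chunk);              // 2. upload the new chunk
  await bucket.combine([file, chunk], file);      // 3. append chunk; result has no metadata
  await file.setMetadata(updatedMetadata);        // 4. restore/update metadata (the fragile step)
  await chunk.delete({ ignoreNotFound: true });   // 5. clean up the temporary chunk
}
```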

If step 4 fails to execute, the file loses its metadata, and tus is no longer able to resume the upload. Pushing this many requests to GCS easily triggers ratelimits. Note that ratelimits are more likely with multiple concurrent uploads, as well as with a smaller chunk size, since more requests are sent overall. Currently, my 5 concurrent uploads with a 256 KB chunk size are ratelimited every 10-20 seconds.
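
For scale, a back-of-the-envelope request count (the 5 uploads, 256 KB chunks, and 5 calls per chunk are the numbers above; the 100 MB file size is an assumed example):

```javascript
// Rough request count for the scenario described above.
const uploads = 5;        // concurrent uploads (from the numbers above)
const callsPerChunk = 5;  // GCS calls per received chunk (steps 1-5)
const chunkKB = 256;      // chunk size (from the numbers above)
const fileMB = 100;       // assumed example file size

const chunksPerFile = Math.ceil((fileMB * 1024) / chunkKB); // 400 chunks
const callsPerFile = chunksPerFile * callsPerChunk;         // 2000 GCS calls per file
const totalCalls = callsPerFile * uploads;                  // 10000 calls across all uploads

console.log({ chunksPerFile, callsPerFile, totalCalls });
```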

Current working solution: backoff/retry algorithm

tus implements steps 3, 4, and 5 on lines 112-116 of @tus/gcs-store/src/index.ts. Instead of these single-attempt calls, we can wrap each call in a backoff/retry function that retries after a delay whenever the call fails with a ratelimit error. This function waits $2^{n-1}$ seconds after the $n$-th failed attempt, up to a maximum of 5 attempts and $1+2+4+8=15$ total seconds of waiting (delays only occur between attempts, so there is no delay after the 5th and final attempt).

// Retry `func` with exponential backoff when GCS responds with a
// ratelimit error (HTTP 429). Any other error is re-thrown immediately.
async function attempt(func) {
	let backoff = 1000;	// initial delay in milliseconds
	const limit = 5;	// maximum number of attempts

	for (let i = 0; i < limit; i++) {
		try {
			return await func();
		} catch (e) {
			// Give up on the last attempt, or on any non-ratelimit error
			if (i === limit - 1 || e.code !== 429) throw e;
			await new Promise(res => setTimeout(res, backoff));
			backoff *= 2;
		}
	}
}

Then within GCSStore#write, replace the following lines:

try {
	if (file !== destination) {
		await this.bucket.combine([file, destination], file)
		await Promise.all([
			file.setMetadata(options.metadata),
			destination.delete({ignoreNotFound: true}),
		])
	}

	resolve(bytes_received)
} catch (error) {
	log(error)
	reject(ERRORS.FILE_WRITE_ERROR)
}

with:

try {
	if (file !== destination) {
		await attempt(() => this.bucket.combine([file, destination], file))
		await attempt(() => file.setMetadata(options.metadata))
		await attempt(() => destination.delete({ ignoreNotFound: true }))
	}
	resolve(bytes_received);
}
catch (error) {
	log(error);
	reject(ERRORS.FILE_WRITE_ERROR);
}

Note that this implementation still fails if all calls within attempt are ratelimited. I'll add better error checking later, but it works well for my use case for the time being.
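
The retry-strategy docs linked below recommend truncated exponential backoff with random jitter, so that concurrent uploads don't all retry in lockstep. A self-contained variant of attempt along those lines (the option names limit, baseMs, and capMs are mine, not from tus):

```javascript
// Truncated exponential backoff with jitter for GCS ratelimit (429)
// errors. Any other error is re-thrown immediately.
async function attemptWithJitter(func, { limit = 5, baseMs = 1000, capMs = 32000 } = {}) {
  for (let i = 0; i < limit; i++) {
    try {
      return await func();
    } catch (e) {
      // Give up on the last attempt, or on any non-ratelimit error
      if (i === limit - 1 || e.code !== 429) throw e;
      // min(base * 2^i, cap), plus up to one base interval of random
      // jitter so concurrent uploads spread out their retries
      const delay = Math.min(baseMs * 2 ** i, capMs) + Math.random() * baseMs;
      await new Promise((res) => setTimeout(res, delay));
    }
  }
}
```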

GCS quota and retry strategy documentation:

https://cloud.google.com/storage/quotas
https://cloud.google.com/storage/docs/retry-strategy

JasonFeng365 avatar Jun 27 '25 22:06 JasonFeng365