
sha1 empty on uploading file < 5GB in size

Open theproductiveprogrammer opened this issue 6 years ago • 8 comments

When I upload files of ~4.7 GB using b2 upload-file, the upload succeeds, but the resulting file has no contentSha1.

For example, here is the output of b2 list-file-versions for one such file. Note that contentSha1 is null:

{
  "files": [
    {
      "accountId": "5e87fabdd376",
      "action": "upload",
      "bucketId": "858e38d72fdaebcd6de30716",
      "contentLength": 4718592000,
      "contentSha1": null,
      "contentType": "application/octet-stream",
      "fileId": "4_z858e38d72fdaebcd6de30716_f211ba5d67ec9dfa7_d20191214_m100048_c001_v0001131_t0026",
      "fileInfo": {
        "src_last_modified_millis": "1576071219000"
      },
      "fileName": "high/2019-12-12_00_27.2.dar",
      "uploadTimestamp": 1576317648000
    }
  ],
  "nextFileId": null,
  "nextFileName": null
}

According to the large files documentation, files of up to 5 GB are 'normal' files and should have their contentSha1 set.

I have tried uploading with and without the --sha1 flag set. It seems to make no difference.

Steps to reproduce

  1. Upload a large-ish file: b2 upload-file bucket bigfile test/bigfile
  2. Upload another large-ish file: b2 upload-file --sha1 sha bucket anotherfile test/anotherfile
  3. Check properties: b2 list-file-versions bucket

Expected: contentSha1 is set. Actual: it is null.
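Step 2 needs a real hex digest for the --sha1 flag. A minimal way to compute one for a multi-gigabyte file without loading it into memory (a generic Python sketch, not part of the b2 CLI):

```python
import hashlib

def sha1_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so multi-GB files don't need to fit in RAM."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is what the --sha1 flag of b2 upload-file expects.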

theproductiveprogrammer avatar Dec 15 '19 20:12 theproductiveprogrammer

The default part size is 100 MB, so your ~4.7 GB file is uploaded as a large file. You need to increase the minimum part size. The b2 CLI upload-file documentation states:

By default, the file is broken into as many parts as possible to maximize upload parallelism and increase speed. The minimum that B2 allows is 100MB. Setting --minPartSize to a larger value will reduce the number of parts uploaded when uploading a large file.
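The arithmetic behind that quote can be sketched as follows (an illustration of part counting, not the CLI's exact splitting logic):

```python
def num_parts(file_size: int, min_part_size: int = 100 * 1024 * 1024) -> int:
    """Rough number of parts when every part must be at least
    min_part_size bytes (illustration only, not b2's exact algorithm)."""
    return max(1, file_size // min_part_size)

# The 4718592000-byte file from the report, with the default 100 MB minimum:
print(num_parts(4_718_592_000))               # 45 parts -> large-file upload
# A minimum part size larger than the file forces a single part:
print(num_parts(4_718_592_000, 5 * 1000**3))  # 1 part -> normal upload
```

A single-part upload goes through the normal upload path, which is why raising --minPartSize above the file size results in a file with contentSha1 set.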

svonohr avatar Dec 15 '19 20:12 svonohr

Ah I see - thanks @svonohr. That makes sense now.

theproductiveprogrammer avatar Dec 16 '19 11:12 theproductiveprogrammer

Just a note: the minimum part size supported by the cloud was reduced to 5 MB, but the message in the CLI was not updated. The actual value is now set dynamically by the server during authorization.

ppolewicz avatar Dec 16 '19 14:12 ppolewicz

What I understand from this now is that SHA-1 checksums are not useful for validating that file data has not been corrupted on the server. In fact, there seems to be no way to verify that data stored on Backblaze is uncorrupted (whether due to hardware or software issues). I guess that means we need a backup for any backups on Backblaze!

theproductiveprogrammer avatar Dec 16 '19 20:12 theproductiveprogrammer

TBH the most likely place for data to be corrupted is a consumer-grade hard drive streaming data through consumer-grade (non-ECC) memory on a possibly poorly vented CPU. In that case, even if the data is hashed for transfer and storage, after you restore it and verify the hash you still end up with corrupted data, because it was corrupted on read.

@theproductiveprogrammer please see the "Managing metadata for interoperability" section of b2 integration checklist to better understand where the checksum is stored for large files.
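Per that checklist, interoperable clients record a large file's whole-file SHA-1 in its file info rather than in contentSha1. A small sketch of where to look in list-file-versions output (the 'large_file_sha1' key name comes from the checklist referenced above, so treat it as an assumption here):

```python
from typing import Optional

def stored_sha1(file_version: dict) -> Optional[str]:
    """Return the SHA-1 recorded for a file version, if any.

    Normal uploads carry it in contentSha1 (null or the string "none"
    means absent); clients following the B2 integration checklist store
    a large file's checksum in fileInfo under 'large_file_sha1'
    (assumed key name, per the checklist above).
    """
    sha = file_version.get("contentSha1")
    if sha and sha != "none":
        return sha
    return (file_version.get("fileInfo") or {}).get("large_file_sha1")

# The file version from the report: contentSha1 is null and fileInfo
# has no large_file_sha1, so no checksum is recoverable.
example = {"contentSha1": None,
           "fileInfo": {"src_last_modified_millis": "1576071219000"}}
print(stored_sha1(example))  # None
```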

If you are passing --sha1 to an upload and it does not get saved into file_info as per the document above, that's a bug. Please confirm whether that's actually the case and if it is, then we'll definitely fix it.

ppolewicz avatar Dec 16 '19 23:12 ppolewicz

If you are passing --sha1 to an upload and it does not get saved into file_info as per the document above, that's a bug. Please confirm whether that's actually the case and if it is, then we'll definitely fix it.

Yes @ppolewicz, when passing --sha1 as a parameter, the output of list-file-versions still shows contentSha1 as null for big files.

theproductiveprogrammer avatar Dec 17 '19 01:12 theproductiveprogrammer

Hello. I'm experiencing a similar issue. I am using the command line tool with a file of 434MB.

I am using the command:

b2 upload-file --sha1 <the sha1 hash> <bucket> ./<file> <file>

The resulting upload results in an object in storage with the following fields

[screenshot of the uploaded object's fields]

I'm not sure if I am doing something wrong or if there is an issue here.

Thanks

mrbluehollywoo avatar Feb 14 '20 09:02 mrbluehollywoo

I think this will be fixed with the b2sdk 1.1.0 release, where the upload management code has been rewritten.

ppolewicz avatar May 28 '20 13:05 ppolewicz