Leaving partial large files during sync
I'm syncing using the B2 API functions directly from Python (3.5).
I basically copied much of the Sync function in command_tool.py, with the actual work being done by the sync_folders function.
One thing I've noticed during testing: for various reasons the upload was forcibly cancelled (either by crashing out or, more likely, by me closing the console). This has left a number of "started large file" entries for some of my files on the browse-files list, even though the files are completed and the folders are now fully in sync (though I haven't allowed deleting).
Can the sync_folders function be improved to continue these files rather than start a new one on restart?
The resuming of unfinished files is handled in the _upload_large_file function. This function is used by all upload functions for large files, so resume should work for sync too. Are you sure that the files didn't change and sync isn't actually starting a new upload? For large files it may take a while to verify that the already-uploaded parts match the local file.
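To illustrate what that verification involves (a sketch only; the helper name below is hypothetical, not the actual code inside _upload_large_file), resuming means hashing each local byte range and comparing it against the SHA1 that B2 recorded for the already-uploaded part:

```python
import hashlib


def part_matches(local_path, offset, length, expected_sha1):
    """Check whether a byte range of the local file matches the SHA1
    reported for an already-uploaded part.

    Sketch of the kind of check resume performs; the real helper
    names and signatures in the library may differ.
    """
    sha1 = hashlib.sha1()
    with open(local_path, 'rb') as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(65536, remaining))
            if not chunk:
                break  # file shorter than expected; hashes won't match
            sha1.update(chunk)
            remaining -= len(chunk)
    return sha1.hexdigest() == expected_sha1
```

Hashing every finished part this way is why resume can take a while on large files even before any new bytes are uploaded.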
The files definitely didn't change.
Looking at list_unfinished_large_files, I can see a total of 7 entries encompassing 3 different files.
All files are showing as completed on the web interface, and don't "continue" or otherwise get overwritten or do anything when I re-run sync.
As far as sync_folders appears to be concerned, all files are finished.
What you describe is a vague scenario of a bug in resume. We don't know the precise occurrence condition though, so investigation can be very hard. Can you somehow narrow it down to a reproducible scenario?
Can you check if the unfinished files have parts that finished uploading (b2_list_parts)? Only files with at least one part are considered for resume. Maybe we should change that.
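The rule described above can be sketched like this (a toy model: the function name is hypothetical, and the dicts stand in for the combined output of b2_list_unfinished_large_files and b2_list_parts):

```python
def pick_resume_candidates(unfinished_files):
    """Keep only unfinished large files with at least one finished part.

    Each entry is a dict like
    {'fileId': '...', 'fileName': 'a.bin', 'finished_parts': [...]},
    a stand-in for what b2_list_unfinished_large_files plus
    b2_list_parts would return.
    """
    return [f for f in unfinished_files if f['finished_parts']]
```

Under this rule, an upload killed before its first part finished is never matched on restart, which would explain the orphaned zero-byte entries reported here.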
His upload speed is very limited and he was just testing it, so it is likely that no parts were finished.
How can we judge whether to close a large file upload or not, when no parts are finished?
This is 100% reproducible for me, very easily:
- Run the Sync script.
- Close the console window the script is running in.
- Run the Sync script again. It starts a fresh upload and leaves the unfinished large-file entry in place.
All of them show 0 bytes in the web interface, so I guess no parts finished (I generally terminated it very quickly).
@bwbeach @svonohr I think there is a bigger issue here. How can we know whether a given large file upload can be removed or not? Do we have a reliable way?
I think there are two things we can do:
- Also resume files with no finished parts. This doesn't help upload time, but doesn't create unnecessary unfinished files. However, this could lead to problems when uploading files with the same name in parallel (but that's a problem anyway with upload resuming).
- Clean unfinished files during sync with the --delete or --keepDays option, just like any other file. This will get rid of old unfinished files.
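The second idea could look roughly like this (a sketch, not the actual sync implementation; the function name and dict shape are assumptions, with 'uploadTimestamp' in milliseconds as the B2 API reports it):

```python
import time

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def stale_unfinished_files(unfinished_files, keep_days, now_millis=None):
    """Return unfinished uploads older than keep_days, as candidates
    for cancellation during sync (mirroring the --keepDays idea).

    Each entry is a dict with an 'uploadTimestamp' in milliseconds.
    """
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    cutoff = now_millis - keep_days * MILLIS_PER_DAY
    return [f for f in unfinished_files if f['uploadTimestamp'] < cutoff]
```

An age cutoff sidesteps the question of whether a given unfinished upload belongs to a currently running sync: anything old enough is safe to cancel.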
I experience the same behaviour with unfinished large files using the b2 command line interface. My line speed is reasonable. During sync operations, I do see errors logged to the console, but sync continues.
What I'd like is for sync (or upload...) to pick up and complete the operation. Re-uploading a file that was 9/10ths finished costs me metered bandwidth. Most of my files are 1 GB in size.
@kazsulec if the problem is reproducible, could you enable logs and share them? The b2 CLI already retries a part upload on failure, so it seems odd that it doesn't work for you.
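For anyone driving the library from Python (as the original poster does), the standard logging module can capture the retry diagnostics; note that the 'b2' logger name is an assumption here, so check the library's actual logger names:

```python
import logging

# Send DEBUG-level records (including upload retries) to a file
# before re-running sync.  The 'b2' logger name is an assumption;
# adjust it to whatever logger names the library actually uses.
logging.basicConfig(
    filename='b2_sync_debug.log',
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(name)s %(message)s',
)
logging.getLogger('b2').setLevel(logging.DEBUG)
```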