linode-cli icon indicating copy to clipboard operation
linode-cli copied to clipboard

Added --recursive support to upload_object for s3 file storage

Open mechpaul opened this issue 4 years ago • 9 comments

This commit adds --recursive and folder support to upload_object for s3 file storage.

  • Updated the argparse help documentation to show that both files and folders are supported
  • Added --recursive support.
  • Added support for parsing folders

If a user supplies a folder but does not supply --recursive, then only the files from that folder are uploaded.

If a user supplies a folder and supplies --recursive, then all files from that folder recursively are uploaded.

I did not test my code against symbolic/hard links.

mechpaul avatar Dec 04 '21 21:12 mechpaul

Working as intended.

It was coded this way for the following reasons:

  • The Linode web GUI does not allow for the creation of folders
  • The original implementation of this function did not preserve folder structure. For example, if you passed foo/bar/baz.txt, it did not upload foo/bar/baz.txt but only baz.txt. This was done to maintain parity with the older implementation.
  • There is no documentation regarding the preservation of folders in object storage.

To continue with the critique of this, now that I'm running this against larger file sets only the first 1400 files are uploaded. The rest of the files are skipped...? I'm wondering if there is some Linode limitation on the number of files uploaded in one login session.

mechpaul avatar Dec 07 '21 18:12 mechpaul

There's no login session at work here - the CLI uses Object Storage keys to perform operations through this plugin, and these are effectively stateless. There is probably an upper limit to objects in a bucket, but it would be much higher than 1400.

I don't think that preserving folder structure is a break from old behavior; in the example above, I asked the s3cmd to recursively upload tmp/ and did not get all keys prefixed with tmp/, it merely preserved the structure beneath that directory. I think the CLI should do the same thing; tmp/file1 should still end up with the key file1, but tmp/a/file2 should preserve its prefix beneath tmp, become a/file2.

Supposing all structure was flattened, what would happen if I had two files with the same name in different directories during a recursive upload? Take for example:

# set up a new folder, tmp2, with one subdirectory and a file named "pet" in each
wsmith@linode::linode-cli[mechpaul/master]$ mkdir -p tmp2/a
wsmith@linode::linode-cli[mechpaul/master]$ echo "cat" > tmp2/a/pet
wsmith@linode::linode-cli[mechpaul/master]$ echo "dog" > tmp2/pet
# recursively upload tmp2
wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj put --recursive tmp2/  277-test
pet
 |####################################################################################################| 100.0%
pet
 |####################################################################################################| 100.0%
Done.
# but only one file exists in the bucket - the second one overrode the first
wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj ls 277-test
2021-12-07 19:57  4  pet
# let's see which one it is
wsmith@linode::linode-cli[mechpaul/master]$ python3 -m linodecli obj get 277-test pet
 |####################################################################################################| 100.0%
Done.
wsmith@linode::linode-cli[mechpaul/master]$ cat pet
dog

In the above, I lost my cat. Maybe this explains where some of the files you were attempting to upload went too?

In object storage, there aren't really "folders" - these are emulated client-side using prefixes (i.e. delimitated on a /, a/ is a prefix in a/pet). Most s3 clients (i.e. s3cmd, cyberduck, even the Cloud Manager) display forward-slash delimitated prefixes as folders, but they are essentially in a flat space in object storage. Incidentally, these "folders" only exist if there are any keys under them; in my first example, removing file2 would also remove the a/ folder entirely. Maybe that's why the Cloud Manager won't let you create them yourself?

Dorthu avatar Dec 07 '21 20:12 Dorthu

Ahhhh, I didn't know those folders were just virtual folders in a flat space!

The files I was trying to upload were about 1.2M PNG files - about 14 GB worth. There is no folder structure (all flat), and each file is named {sha256}.png. So no, there's no name clobbering here.

You can reject the PR. I'll work on preserving folder structure and submit a new PR with the changes requested. Thanks for sharing your knowledge with me. Linode is my first cloud platform.

mechpaul avatar Dec 07 '21 20:12 mechpaul

Thanks for contributing! I'm fine to leave this open if you want to push changes here. I also don't mind doing it myself if you don't feel like it doing it.

Dorthu avatar Dec 07 '21 20:12 Dorthu

Go ahead and leave it open. I'll resubmit by EOWeekend.

mechpaul avatar Dec 08 '21 17:12 mechpaul

Hmmm... dealing with a lot of corner cases to support folders on the source computer and virtual folders in s3.

  • Suppose the user puts in ../../myfolder/. In this case, the user would not want to have ../ in the virtual folder name.
  • In either Windows or Linux, what if an absolute path from the directory root (in linux) or drive (D:\whatever) is provided?

Pretty sure I have these figured out now.

After working out these issues, I'm now trying to figure out another issue.

subfolder\00000e5e66a365da9134ab537cf8125f.png |####################################################################################################| 100.0% Error: SignatureDoesNotMatch

Going to dig into this error to figure out what's going on. Looks like it's an error from Boto.

I passed in the following:

linode obj put d:\maplestory\imageslinode\ --recursive happychinchilla

I threw about 70 images in here, half of which are in the folder /subfolder/ which should be preserved in the S3 object key. It's throwing this error on the first image contained within a subfolder.

mechpaul avatar Dec 10 '21 21:12 mechpaul

I took a stab at this in https://github.com/mechpaul/linode-cli/pull/1 (which is a PR against this branch) - let me know if it works for you.

Dorthu avatar Dec 14 '21 20:12 Dorthu

I commented in the above-linked PR, but I can reproduce that SignatureDoesNotMatch error on the master branch; it seems to be a bug with how some characters in file names are treated (in your case \, in mine $).

Dorthu avatar Dec 14 '21 20:12 Dorthu

I put up a fix for the bug in #278; let me know if it works for you

Dorthu avatar Dec 14 '21 20:12 Dorthu