MBrace.Azure icon indicating copy to clipboard operation
MBrace.Azure copied to clipboard

WASB support

Open isaacabraham opened this issue 8 years ago • 14 comments

This should close #158 and puts in place the groundwork for a resolution to #65.

This PR is probably not quite ready for accepting but I'd still like some review on this if possible.

isaacabraham avatar Jun 02 '17 16:06 isaacabraham

  • Is this going to be breaking change?
  • I'd also like to see some tests exercising the new (+new vs old) behaviour.
  • Can (+ does it make sense) this be implemented as a wrapper/extension methods instead of changing the core store impl? Maybe in another package/repo MBrace.Azure.Extensions
  • You will also need to make sure that the runtime itself doesn't use (or is not affected in any disruptive way) the affected store interfaces for it's own files (e.g. uploaded assemblies, results of computations, etc).

I saw a comment on a related issue:

Then, when you create a CloudFileInfo, if you forget to put a leading / in the path, it inserts the /mbraceuserdata folder at the front. I have no idea what the reasoning behind this is (or the implications of removing it). If you put a / at the front, this doesn't happen at all.

'Processes' start in a specific directory called /mbraceuserdata. That is the current working directory. If you put a leading / then your path becomes an absolute path. If you don't then it's relative to the current working directory.

The same way that you start your shell and

$ pwd
/Users/username # this is your current directory
$ touch foo.txt     # /Users/username/foo.txt
$ touch /foo.txt    # /foo.txt

krontogiannis avatar Jun 03 '17 11:06 krontogiannis

@krontogiannis Thanks for feedback :-)

  1. Yes, it's a breaking change in MBrace.Azure because it affects the way that paths are used. foo/bar.txt would be invalid, as would /foo/bar.txt. Instead, you must specify a container using WASB notation e.g. [email protected]. There are a few reasons I want to go down this route, but the main one is that the current implement is somewhat inconsistent and there's "magic" going on: -
  • If you forget to put a leading slash, it goes to a completely different location i.e. /foo/bar maps to a container foo and a file bar, but foo/bar goes to a container called $root and a file called foo/bar.
  • There's the "magic" mbraceuserdata container as well. How this is used is again a little magic and unclear for the user (and there's #158 as a bug related to this as well).
  1. This has no impact (currently) on MBrace.Core - it's just on MBrace.Azure. Clearly the build currently doesn't work because there are some tests that need to be updated, but currently the only changes are in MBrace.Azure.

  2. I'm not sure how this could be implemented as an extension.

  3. Yes, this needs to look at the runtime stuff (I think that's why the build currently fails).

Again, this PR isn't ready for consumption but I'm grateful for your time to tell me some points to look at - I'll continue looking at this next week.

isaacabraham avatar Jun 04 '17 11:06 isaacabraham

I've had a think about this over the weekend, and I'm wavering on this. The main problem I have with the current implementation is that it's easy to accidentally start working with files in another location than what you thought, simply by forgetting the leading slash. The idea of the default directory only further serves to confuse matters.

Perhaps there's a way to keep the current behaviour and optionally also allow WASB paths.

isaacabraham avatar Jun 05 '17 11:06 isaacabraham

I think this is great feature work, though I agree tests are needed

Is this going to be breaking change?

This wouldn't concern me - growing functionality and usage is most important at this point

Can (+ does it make sense) this be implemented as a wrapper/extension methods instead of changing the core store impl? Maybe in another package/repo MBrace.Azure.Extensions

For MBrace.Azure I'd like to see WASB be the default, per this comment from @eiriktsarpalis:

I think that the WASB approach would be great to adopt as the default uri scheme in MBrace.Azure, since it allows support for multiple accounts and it trims down verbosity by allowing relative paths. The current "container/file.txt" uri scheme is problematic, since it is ambiguous; this could simply be the name of a blob and not a full path.

dsyme avatar Jun 05 '17 17:06 dsyme

I've made some changes (although there's still some tests I need to write + feedback) so it's essentially backwards compatible now (hence why it's green).

The table below should show the different permutations. Default Container means the contextual "default container" of the blob store - this might be something like mbraceassemblies or mbraceuserdata depending on the case. Unfortunately, without making more drastic changes to MBrace.Core, we can't remove this "implicit" path - I understand why it's there, but it probably shouldn't have been used for user operations, rather only for system-generated ops (e.g. uploading vagabond and assembly data). In my view, we would recommend the last two rows as the "recommended" way of using the blob store in future. The others are kind of "magic" - you have to know about this default folder, you can easily forget a leading slash etc., and you can also come unstuck if for some reason you get a blob store without a default folder and start writing to the $root container.

Client Path Default Container Old Blob Path New Blob Path
foo/bar.txt vagabond vagabond@foo/bar.txt vagabond@foo/bar.txt
foo/bar.txt None $root@foo/bar.txt WASB Exception
/foo/bar.txt vagabond [email protected] vagabond@foo/bar.txt
/foo/bar.txt None [email protected] WASB Exception
bar.txt vagabond [email protected] [email protected]
bar.txt None [email protected] WASB Exception
blah@foo/bar.txt vagabond not supported blah@foo/bar.txt
blah@foo/bar.txt None not supported blah@foo/bar.txt

This applies for both CloudFile and CloudDirectory operations.

There's one hack I've had to put in currently which revolves around container names when creating a dictionary.

Also note that this isn't the full WASB spec (which also allows you to include an account as well) but it puts us in a much better place for if we address the ability to connect to multiple storage accounts from an MBrace cluster.

Feedback welcome.

isaacabraham avatar Jun 08 '17 16:06 isaacabraham

Unfortunately, without making more drastic changes to MBrace.Core, we can't remove this "implicit" path - I understand why it's there, but it probably shouldn't have been used for user operations, rather only for system-generated ops (e.g. uploading vagabond and assembly data).

I very much agree in retrospect. I recommend we just go ahead with that change?

dsyme avatar Jun 08 '17 16:06 dsyme

Which change - this one or also change MBrace.Core to not put in a "default" directory for user operations?

isaacabraham avatar Jun 08 '17 16:06 isaacabraham

... also change MBrace.Core to not put in a "default" directory for user operations?

this

dsyme avatar Jun 08 '17 16:06 dsyme

I've discovered that the default user directory is a feature that's localised to MBrace.Azure, and it was a relatively painless removal. This means that the above matrix can be simplified when working with user data as follows: -

Client Path Old Blob Path New Blob Path
foo/bar.txt $root@foo/bar.txt WASB Exception
/foo/bar.txt [email protected] WASB Exception
bar.txt [email protected] WASB Exception
blah@foo/bar.txt not supported blah@foo/bar.txt

In other words - this would now be a breaking change, but FWIW I think it's a lot less open to misinterpretation and acknowledges that trying to map conventional file paths to blobs is a somewhat leaky abstraction.

isaacabraham avatar Jun 09 '17 06:06 isaacabraham

Basic testing being done on: -

  • [x] Basic serialization (checking that assemblies + .vgb files can be accessed by mbrace itself)
  • [x] CloudFile and CloudDirectory
  • [x] CloudFlow
  • [x] CloudValue
  • [ ] CloudDictionary

I'm not sure what else is needed.

isaacabraham avatar Jun 09 '17 09:06 isaacabraham

@isaacabraham Do tests pass? Remember CI testing is not enabled on this repo, so we currently have to run tests manually

dsyme avatar Jun 09 '17 12:06 dsyme

I need to figure out how to run the tests :-) I tried build.cmd a week or so ago but it took absolutely ages to run (and failed half way through).

Once that's working I'll add more tests.

isaacabraham avatar Jun 09 '17 13:06 isaacabraham

I need to figure out how to run the tests :-) I tried build.cmd a week or so ago but it took absolutely ages to run (and failed half way through). Once that's working I'll add more tests.

@isaacabraham Getting a subset of the tests running on CI under simulated Azure storage would be absolutely fantastic progress in the long term. It's going to be hard to get contributions without that

The other thing I think we really need to help enable contributions is a way to co-develop FsPickler+Vagabond+MBrace.Core+MBrace.Azure/AWS+StarterKit (or + one's own data scripting code) in one smooth build+test+deploy+use inner-dev workflow. In some ways I regret how the repos have been split into multiple github project, as propagating a fix from FsPickler or Mono.Cecil through to actually testing it in your own data scripting code is incredibly painful currently requiring updating about 15 nuget packages.

The future of MBrace surely has to be as a go-to solution for F#-centric data scripters who have Spark-like needs and want to have control over a big data scripting stack that they can easily comprehend and contribute to. This means it must be possible to iterate from an idea "MBrace would be better if it just had XYZ" to refining, using and contributing that idea rapidly.

dsyme avatar Jun 09 '17 13:06 dsyme

@dsyme agree 100% - i'm feeling that same pain now (and is exactly why I'm shying away from changes to MBrace.Core here). One of the things I looked at a while ago was dumping reliance on service bus and moving entirely onto storage queues - this should mean that we could just use the storage emulator entirely for MBrace.Azure in a local context for all three components (storage = blobs, state = tables, messaging = queues).

isaacabraham avatar Jun 09 '17 13:06 isaacabraham