WASB support
This should close #158 and puts in place the groundwork for a resolution to #65.
This PR is probably not quite ready for accepting but I'd still like some review on this if possible.
- Is this going to be breaking change?
- I'd also like to see some tests exercising the new (+new vs old) behaviour.
- Can (+ does it make sense) this be implemented as a wrapper/extension methods instead of changing the core store impl? Maybe in another package/repo
MBrace.Azure.Extensions - You will also need to make sure that the runtime itself doesn't use (or is not affected in any disruptive way) the affected store interfaces for it's own files (e.g. uploaded assemblies, results of computations, etc).
I saw a comment on a related issue:
Then, when you create a CloudFileInfo, if you forget to put a leading / in the path, it inserts the /mbraceuserdata folder at the front. I have no idea what the reasoning behind this is (or the implications of removing it). If you put a / at the front, this doesn't happen at all.
'Processes' start in a specific directory called /mbraceuserdata. That is the current working directory. If you put a leading / then your path becomes an absolute path. If you don't then it's relative to the current working directory.
The same way that you start your shell and
$ pwd
/Users/username # this is your current directory
$ touch foo.txt # /Users/username/foo.txt
$ touch /foo.txt # /foo.txt
@krontogiannis Thanks for feedback :-)
- Yes, it's a breaking change in MBrace.Azure because it affects the way that paths are used.
foo/bar.txtwould be invalid, as would/foo/bar.txt. Instead, you must specify a container using WASB notation e.g.[email protected]. There are a few reasons I want to go down this route, but the main one is that the current implement is somewhat inconsistent and there's "magic" going on: -
- If you forget to put a leading slash, it goes to a completely different location i.e.
/foo/barmaps to a containerfooand a filebar, butfoo/bargoes to a container called$rootand a file calledfoo/bar. - There's the "magic"
mbraceuserdatacontainer as well. How this is used is again a little magic and unclear for the user (and there's #158 as a bug related to this as well).
-
This has no impact (currently) on MBrace.Core - it's just on MBrace.Azure. Clearly the build currently doesn't work because there are some tests that need to be updated, but currently the only changes are in MBrace.Azure.
-
I'm not sure how this could be implemented as an extension.
-
Yes, this needs to look at the runtime stuff (I think that's why the build currently fails).
Again, this PR isn't ready for consumption but I'm grateful for your time to tell me some points to look at - I'll continue looking at this next week.
I've had a think about this over the weekend, and I'm wavering on this. The main problem I have with the current implementation is that it's easy to accidentally start working with files in another location than what you thought, simply by forgetting the leading slash. The idea of the default directory only further serves to confuse matters.
Perhaps there's a way to keep the current behaviour and optionally also allow WASB paths.
I think this is great feature work, though I agree tests are needed
Is this going to be breaking change?
This wouldn't concern me - growing functionality and usage is most important at this point
Can (+ does it make sense) this be implemented as a wrapper/extension methods instead of changing the core store impl? Maybe in another package/repo MBrace.Azure.Extensions
For MBrace.Azure I'd like to see WASB be the default, per this comment from @eiriktsarpalis:
I think that the WASB approach would be great to adopt as the default uri scheme in MBrace.Azure, since it allows support for multiple accounts and it trims down verbosity by allowing relative paths. The current "container/file.txt" uri scheme is problematic, since it is ambiguous; this could simply be the name of a blob and not a full path.
I've made some changes (although there's still some tests I need to write + feedback) so it's essentially backwards compatible now (hence why it's green).
The table below should show the different permutations. Default Container means the contextual "default container" of the blob store - this might be something like mbraceassemblies or mbraceuserdata depending on the case. Unfortunately, without making more drastic changes to MBrace.Core, we can't remove this "implicit" path - I understand why it's there, but it probably shouldn't have been used for user operations, rather only for system-generated ops (e.g. uploading vagabond and assembly data). In my view, we would recommend the last two rows as the "recommended" way of using the blob store in future. The others are kind of "magic" - you have to know about this default folder, you can easily forget a leading slash etc., and you can also come unstuck if for some reason you get a blob store without a default folder and start writing to the $root container.
| Client Path | Default Container | Old Blob Path | New Blob Path |
|---|---|---|---|
foo/bar.txt |
vagabond |
vagabond@foo/bar.txt |
vagabond@foo/bar.txt |
foo/bar.txt |
None | $root@foo/bar.txt |
WASB Exception |
/foo/bar.txt |
vagabond |
[email protected] |
vagabond@foo/bar.txt |
/foo/bar.txt |
None | [email protected] |
WASB Exception |
bar.txt |
vagabond |
[email protected] |
[email protected] |
bar.txt |
None | [email protected] |
WASB Exception |
blah@foo/bar.txt |
vagabond |
not supported | blah@foo/bar.txt |
blah@foo/bar.txt |
None | not supported | blah@foo/bar.txt |
This applies for both CloudFile and CloudDirectory operations.
There's one hack I've had to put in currently which revolves around container names when creating a dictionary.
Also note that this isn't the full WASB spec (which also allows you to include an account as well) but it puts us in a much better place for if we address the ability to connect to multiple storage accounts from an MBrace cluster.
Feedback welcome.
Unfortunately, without making more drastic changes to MBrace.Core, we can't remove this "implicit" path - I understand why it's there, but it probably shouldn't have been used for user operations, rather only for system-generated ops (e.g. uploading vagabond and assembly data).
I very much agree in retrospect. I recommend we just go ahead with that change?
Which change - this one or also change MBrace.Core to not put in a "default" directory for user operations?
... also change MBrace.Core to not put in a "default" directory for user operations?
this
I've discovered that the default user directory is a feature that's localised to MBrace.Azure, and it was a relatively painless removal. This means that the above matrix can be simplified when working with user data as follows: -
| Client Path | Old Blob Path | New Blob Path |
|---|---|---|
foo/bar.txt |
$root@foo/bar.txt |
WASB Exception |
/foo/bar.txt |
[email protected] |
WASB Exception |
bar.txt |
[email protected] |
WASB Exception |
blah@foo/bar.txt |
not supported | blah@foo/bar.txt |
In other words - this would now be a breaking change, but FWIW I think it's a lot less open to misinterpretation and acknowledges that trying to map conventional file paths to blobs is a somewhat leaky abstraction.
Basic testing being done on: -
- [x] Basic serialization (checking that assemblies + .vgb files can be accessed by mbrace itself)
- [x] CloudFile and CloudDirectory
- [x] CloudFlow
- [x] CloudValue
- [ ] CloudDictionary
I'm not sure what else is needed.
@isaacabraham Do tests pass? Remember CI testing is not enabled on this repo, so we currently have to run tests manually
I need to figure out how to run the tests :-) I tried build.cmd a week or so ago but it took absolutely ages to run (and failed half way through).
Once that's working I'll add more tests.
I need to figure out how to run the tests :-) I tried build.cmd a week or so ago but it took absolutely ages to run (and failed half way through). Once that's working I'll add more tests.
@isaacabraham Getting a subset of the tests running on CI under simulated Azure storage would be absolutely fantastic progress in the long term. It's going to be hard to get contributions without that
The other thing I think we really need to help enable contributions is a way to co-develop FsPickler+Vagabond+MBrace.Core+MBrace.Azure/AWS+StarterKit (or + one's own data scripting code) in one smooth build+test+deploy+use inner-dev workflow. In some ways I regret how the repos have been split into multiple github project, as propagating a fix from FsPickler or Mono.Cecil through to actually testing it in your own data scripting code is incredibly painful currently requiring updating about 15 nuget packages.
The future of MBrace surely has to be as a go-to solution for F#-centric data scripters who have Spark-like needs and want to have control over a big data scripting stack that they can easily comprehend and contribute to. This means it must be possible to iterate from an idea "MBrace would be better if it just had XYZ" to refining, using and contributing that idea rapidly.
@dsyme agree 100% - i'm feeling that same pain now (and is exactly why I'm shying away from changes to MBrace.Core here). One of the things I looked at a while ago was dumping reliance on service bus and moving entirely onto storage queues - this should mean that we could just use the storage emulator entirely for MBrace.Azure in a local context for all three components (storage = blobs, state = tables, messaging = queues).