
support using WebHDFS to serve storm-mesos tarball

Open erikdw opened this issue 9 years ago • 3 comments

PR #57 was an attempt to support using WebHDFS to fetch the storm-mesos tarball. However, the implementation in #57 was not ideal for non-WebHDFS use cases: it bypasses the Mesos Fetcher, and so forfeits Fetcher features such as caching.

As to why we need any special handling for using WebHDFS, it's a bit complicated as you can read here and here. Basically there are some deficiencies in Mesos and WebHDFS which prevent using the Mesos Fetcher to download a tarball from WebHDFS. The Mesos deficiencies have some tickets for them already, but I don't think a ticket exists yet for WebHDFS.

  • https://issues.apache.org/jira/browse/MESOS-1509
  • https://issues.apache.org/jira/browse/MESOS-1686
  • https://issues.apache.org/jira/browse/MESOS-3367
  • https://issues.apache.org/jira/browse/MESOS-4735

erikdw, Feb 22 '16 10:02

Notably, Mesos ~~v0.29.0~~ v1.0+ supports setting the URI's filename, since someone fixed MESOS-4735 (a bug I filed for solving #97).

So that should give us the ability to get this working! If the URI is webhdfs, we can do some parsing of it to set the CommandInfo.URI.filename to the bare foo.tar.gz name (or foo.tgz), and the Mesos fetcher should take care of unpacking it for us. Alternatively it can just be an explicit parameter in storm.yaml:

  • mesos.executor.uri.filename
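
A minimal sketch of the two options above, assuming the executor URI is a plain string and the Storm config is a `Map`. The class and method names are hypothetical, and `mesos.executor.uri.filename` is only the key proposed in this comment, not an existing storm-mesos setting. The sketch takes the last path segment of any URI and does not try to detect whether the URI is actually WebHDFS.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical helper, not part of storm-mesos.
final class ExecutorUriFilename {
    // Key proposed in this comment; an assumption, not an existing setting.
    static final String CONF_FILENAME = "mesos.executor.uri.filename";

    // Returns the last path segment of the URI, ignoring any query string
    // (e.g. "?op=OPEN"), or null if the URI has no usable file name.
    static String bareFilename(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return null;
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // An explicit config value wins; otherwise fall back to parsing the URI.
    static String resolve(Map<String, Object> conf, String executorUri) {
        Object explicit = conf.get(CONF_FILENAME);
        if (explicit != null) {
            return explicit.toString();
        }
        return bareFilename(executorUri);
    }
}
```

The resolved name would then be what gets passed to the filename setter on the `CommandInfo.URI` builder, when that setter exists.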

I think we can even put the code in now and have it just ignore this setting for Mesos pre-0.29.0. We would somehow need to check whether CommandInfo.URI can have a filename set. I imagine the protobuf generated code gives some ability to check if a setter is available.
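
One possible way to do that check, sketched with plain Java reflection rather than any protobuf-specific API. The `SetterProbe` class is hypothetical. The setter name to probe for is also an assumption: it depends on what the field is called in the Mesos version on the classpath, following the usual protobuf naming of `setXxx` for a field `xxx`.

```java
// Hypothetical helper, not part of storm-mesos or Mesos.
final class SetterProbe {
    // Returns true if `type` has a public method with this name and
    // parameter list, false if no such method exists.
    static boolean hasPublicMethod(Class<?> type, String name, Class<?>... paramTypes) {
        try {
            type.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

In storm-mesos the probe would presumably be pointed at the generated `CommandInfo.URI.Builder` class with a `String` parameter. Note that if the code must also compile against an older Mesos jar, the setter itself would have to be invoked reflectively too, since a direct call would not compile.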

erikdw, Apr 18 '16 05:04

@echinthaka : FYI ^ we can now work on supporting WebHDFS URIs!

erikdw, Apr 18 '16 05:04

Notably, in MESOS-5119 the field was changed to CommandInfo.URI.output_file and the semantics were adjusted a bit:

Add subdirectory support to URI.output_file field.

URI.output_file allows the user to specify the path of the file that'll
be saved in the sandbox when the URI is fetched, but previously it would
fail at fetch time if "filename" had a directory component. This change
allows users to specify a relative path for custom output targets within
the sandbox.

So we should also adjust the config parameter if we decide to go that route:

  • mesos.executor.uri.output_file

Notably, that is the field name in Mesos v1.0.0:

  • https://github.com/apache/mesos/blob/1.0.0/include/mesos/mesos.proto#L439

erikdw, Apr 23 '16 19:04