apache-spark-node

Build and publish docs

Open henridf opened this issue 10 years ago • 4 comments

Use jsdoc to build docs in some local directory and publish them somewhere.
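
For illustration, a minimal sketch of what such a build-and-publish script could look like, assuming the jsdoc CLI is installed; the script name, the source file list, the docs/ output directory, and the gh-pages suggestion are all assumptions, not something the repo currently has:

```js
// Hypothetical tools/build-docs.js sketch.
const { execSync } = require("child_process");

// Generate HTML docs for the wrapper sources into a local docs/ directory.
execSync("jsdoc lib/DataFrame.js lib/sqlContext.js -d docs", { stdio: "inherit" });

// Publishing is left open by this issue; one option would be pushing docs/ to a
// gh-pages branch, e.g. via the gh-pages npm package:
// require("gh-pages").publish("docs", (err) => { if (err) throw err; });
```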

henridf avatar Nov 23 '15 20:11 henridf

Could you also add some "developer docs" describing the concepts you're following? For example, I'd like to understand why you chose to wrap the functions etc. in JS. I thought this would be usable OOTB (with some additional "ugly" JS code, yes), but maybe I didn't understand this correctly.

Some insights would be great! Thanks in advance...

tobilg avatar Dec 08 '15 10:12 tobilg

@tobilg yes, some sort of high-level docs explaining the design choices and tradeoffs would be a good idea. It's probably still too early to write those, because those concepts are still very much in flux, but in the meantime, here are some points related to your question.

I initially started out figuring that it would indeed be possible to use the imported objects without a wrapper. But pretty quickly I ran into issues that required at least some level of wrapping.

One example is dealing with defaults (two sample occurrences: https://github.com/henridf/apache-spark-node/blob/master/lib/DataFrame.js#L116 and https://github.com/henridf/apache-spark-node/blob/master/lib/sqlContext.js#L56)
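
To give a rough idea of what this kind of wrapping looks like (a hypothetical sketch, not the actual code at the lines above): node-java exposes the Java overloads as-is, so Scala-style default arguments have to be re-created on the JS side.

```js
// Hypothetical sketch of default handling; the jvm_obj property and the generated
// *Sync call style follow node-java conventions, but the real code differs.
class DataFrame {
  constructor(jvm_obj) {
    this.jvm_obj = jvm_obj; // underlying Java DataFrame handle
  }

  // Scala's show(numRows: Int = 20) has a default argument that is invisible to
  // JS callers, so the wrapper supplies it explicitly.
  show(numRows = 20) {
    this.jvm_obj.showSync(numRows);
  }
}
```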

Another is converting to/from the native representation (for example, https://github.com/henridf/apache-spark-node/blob/master/lib/DataFrame.js#L402). When using the Java objects directly, head would return a bunch of opaque Java Row objects, which the user would have no idea how to use.
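
Roughly, the conversion amounts to something like the following (again a sketch, not the actual code at the linked line); the Row accessors follow Spark's Row API and node-java's *Sync naming, but the details are assumptions:

```js
// Convert an opaque Java Row handle into a plain JS array of column values.
function rowToArray(row) {
  const values = [];
  for (let i = 0; i < row.lengthSync(); i++) {
    values.push(row.getSync(i)); // primitive columns come back as JS values
  }
  return values;
}

// A head(n) wrapper that hides the Java Row objects from the caller.
function head(jvm_df, n) {
  const rows = jvm_df.headSync(n); // Java Row[] arrives as a JS array of Row handles
  return rows.map(rowToArray);
}
```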

And of course, there's the issue of documenting which functions are available and how to use them (pointing people to the Scala documentation wouldn't work very well...).

I anticipate that over time the wrappers will do more of this "convenience" work, to make it possible to use Spark in a way that feels "node-like". This is similar to what the Python wrapper does.
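
As one purely hypothetical example of what "node-like" could mean: the blocking *Sync calls generated by node-java could be hidden behind Promises.

```js
// Hypothetical sketch: node-java also generates callback-style methods
// (e.g. count(cb) alongside countSync()), which can be wrapped in a Promise.
function count(jvm_df) {
  return new Promise((resolve, reject) => {
    jvm_df.count((err, n) => (err ? reject(err) : resolve(n)));
  });
}

// Usage: count(df.jvm_obj).then((n) => console.log("rows: " + n));
```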

henridf avatar Dec 09 '15 05:12 henridf

Thanks a lot for taking this on. I already understand your design choices much better now. As I mostly use dataframe.toJSON(), I don't really struggle with the native data types, and therefore probably didn't notice the hurdles with the native representation.

I like the idea of providing convenience wrappers, but one downside I can think of is that once Spark changes its APIs, you'll potentially have to rewrite a lot of code. Or are you using some kind of code generation/transpilation (from the Scala sources)?

tobilg avatar Dec 09 '15 08:12 tobilg

Once things are in decent shape, I'm hoping that keeping up with Spark API changes won't be a huge burden - generally they tend to avoid making breaking API changes; it's more that the API surface expands. We'll see how that plays out in practice, but it seems to have worked reasonably well for pyspark.

I did the first pass manually from the Scala sources, with heavy use of emacs/regexes, then hand-tweaking... Not an ideal process, but again it's more of a one-time thing (I hope!).

henridf avatar Dec 09 '15 21:12 henridf