node-gitteh

Gitteh needs a primary maintainer!

Open samcday opened this issue 12 years ago • 61 comments

I had an epiphany this week: I've been selfish when it comes to Gitteh.

I've acknowledged several times in issues / emails that I don't really have enough time for Gitteh. But then, like most developers, I'm presented with a problem of some kind and want to solve it. Then suddenly Gitteh seems interesting again, and I feel all inspired and start making some promises to do things that I generally don't.

Today, I think node-gitteh holds the most promise out of most git libraries for Node. Here's why:

  • It's under the official libgit2 umbrella and has compatible licensing.
  • The latest version is stable and has a decent suite of tests.
  • It has a decent amount of interest (260 stars, 37 forks, 10 watchers).
  • It's had a lot of blood, sweat, and tears poured into it :)

I think the design choices made for gitteh are sound. There's been some quibbling over a few implementation choices (such as CoffeeScript), but I think these can be addressed easily enough in the short term.

SO. What I need is for someone to step up to the plate. I need someone who wants to become the primary maintainer for gitteh.

Please mention your interest here, and we'll hash it out. The person I pick to be the primary maintainer will ideally be someone who's shown a healthy amount of interest in gitteh over the past few months and has some GitHub projects demonstrating an understanding of Node and C++ (read: the best candidate would be someone who's written a Node native module before). Once I find the right person, I will ensure they are given contributor access to this repository.

samcday avatar Aug 31 '13 09:08 samcday

@FrozenCow @lushzero @jmendeth @ben @iamwilhelm - Who's it gonna be fellas? :)

samcday avatar Aug 31 '13 09:08 samcday

Does it need to be just one?

[...] has some GitHub projects demonstrating an understanding of Node and C++ [...]

I have some overall experience with it. I've written most of Robotskirt (a Node.js wrapper for the now-frozen Sundown library), as you can see on its contributors page; from that I derived V8U, a set of micro-utilities for writing V8 addons. I've also written some other small modules, like parport.js (high-level access to parallel ports from Node).

But I wouldn't like to hold that responsibility alone.

PS: @ben is already a maintainer. :)

mildsunrise avatar Aug 31 '13 10:08 mildsunrise

I have experience with C++ and I have experience with Node/JS. However I don't have any experience writing C++ addons for Node (apart from the PR I did here), so I don't think I'm fit for the job.

Does it need to be just one?

I agree, 2 or more would be best.

FrozenCow avatar Aug 31 '13 10:08 FrozenCow

However I don't have any experience writing C++ addons for Node (apart from the PR I did here)

you'll get experience from gitteh! :D

mildsunrise avatar Aug 31 '13 15:08 mildsunrise

@samcday That's good to hear. I'd hate to see it languish. I have a vested interest in making sure gitteh is up to date and up to par, as I use it in cubehero.com. I haven't written a Node native module before, but I'm not against spending the time to learn. As one of the maintainers, I'd probably start off by making sure documentation is up to date and that PRs and issues don't just sit there.

iamwilhelm avatar Aug 31 '13 15:08 iamwilhelm

I think as one of the maintainers, I'd probably start off doing things like making sure documentation is up to date, and making sure PRs and issues don't just sit there.

That's the spirit! It's clear that we need several maintainers. :smiley:

mildsunrise avatar Aug 31 '13 15:08 mildsunrise

I'm afraid I have to bow out for the maintainer role. I'm a bit overcommitted just now, and more projects are coming down the pipeline. I'll keep watching this, and feel free to summon me for libgit2 questions.

ben avatar Aug 31 '13 20:08 ben

I'm afraid I have to bow out for the maintainer role. I'm a bit overcommitted just now, and more projects are coming down the pipeline. I'll keep watching this, and feel free to summon me for libgit2 questions.

I understand. But don't worry, we'll still @ summon you. ;)

mildsunrise avatar Aug 31 '13 21:08 mildsunrise

I'm pretty much out on gitteh at this point and working on sgit. To me, a lot of the fundamental technological decisions made in gitteh were sound at the time they were made, but not now. Libgit2 is very different now, so they don't make sense anymore, e.g. the C++ async code, thread handling and thread locking, which make up 70% of gitteh's code.

lushzero avatar Sep 01 '13 01:09 lushzero

@lushzero Fair enough, I'm not sure I'd want someone working on gitteh if they make statements like that without backing it up with proper reasoned arguments ;)

@jmendeth @FrozenCow @iamwilhelm Thanks for indicating interest. I wrote up the initial issue pretty fast, and didn't mean for it to sound like it precludes more than one active maintainer. What I meant by it is I need someone (or someones, hehe) to be involved in gitteh enough so that it keeps moving along, as it's not really something I realistically have time for. That said, I plan on remaining involved, and may be able to contribute some more features over the coming weeks / months. We'll see.

@iamwilhelm I didn't actually realise you were using gitteh in anger somewhere like that. That's pretty cool! Gitteh is actually something I wrote for a project I was working on a while ago, but I've since scrapped that project and I think that's one of the biggest reasons why I haven't had much time for gitteh.

@ben Understandable!

Okay so the guys who are interested, sounds like we need to divvy up some responsibilities. Here's a quick rough cut of what might work:

  • @jmendeth + @FrozenCow work on getting bindings up to date with latest libgit2.
  • @iamwilhelm Get new version of documentation up, implement some higher level methods in gitteh userland.

Whatcha think?

samcday avatar Sep 01 '13 03:09 samcday

Yeah, that sounds great to me.

But I think that, before we start working, we should set up a simpler build system (i.e. remove CoffeeScript), so that we (and future contributors) can focus better on the code itself.

There are also things we can simplify in gitteh's code, now that libuv is standard in every Node.

mildsunrise avatar Sep 01 '13 08:09 mildsunrise

Since @lushzero was working on a fork, out of interest I thought I'd look around to see whether there were other git libraries for node based on libgit2. I found https://github.com/nodegit/nodegit/tree/wip . Does anyone know anything about that project? They seem to do things very similar to what node-gitteh does and are still active. I'm not sure how far along they are, but they do have treebuilder stuff in there, which was something I wanted to put in gitteh too.

With sgit, there are 3 projects doing the exact same thing with slight differences in their API and the way they work. I'm a bit torn about what to do here. Focus our efforts on one project? Keep developing multiple projects side-by-side? Collaborate a bit? I'm still interested in the new methods @lushzero is suggesting, because the current way to implement methods in gitteh is quite hard for a beginner. With the code in a better state, it would be easier for people to contribute.

That being said (back on-topic), I don't mind being a maintainer, though I do not know how long it will be of interest for me. I'm working on a project for myself that uses gitteh and for that I'm adding features that are needed, but I don't know how long that project will last. I'll likely lose interest/become less active once that project dies. So, short-term: yes. long-term: I'm not sure.

FrozenCow avatar Sep 01 '13 13:09 FrozenCow

@lushzero @FrozenCow @samcday

My thoughts:

  1. @lushzero has interesting views (which I don't understand yet; I want to talk about them in a dedicated place) which can help improve Gitteh as well as SGit.

  2. I understand that @lushzero didn't back up his arguments in his comment; that's natural, since this is not the right place to talk about implementation issues, so we'd better open a dedicated issue to talk about it.

  3. Nodegit was a quick & low-level binding. Lately it has improved a lot, but Gitteh has been a serious high-level approach since the beginning. So we shouldn't focus much on nodegit.

  4. @FrozenCow About your concern over losing interest, I have to say: this is what happens to everyone. :smile: Even @samcday admits he once lost interest (and now Gitteh seems interesting again). I also wanted to do a rewrite and actually started it (and developed it for some time), and it has some functionality, but now it sits there, abandoned.

    So that "interest" problem is normal for everyone. What (I think) is important here is that we inform others about it before leaving. Other than that, I don't see any problem in being a maintainer.

PS. Sorry for my ugly English.

mildsunrise avatar Sep 01 '13 15:09 mildsunrise

I thought I might introduce myself as I'm now making active contributions to nodegit. I'm the author of the wip branch https://github.com/nodegit/nodegit/tree/wip

The wip branch is a rewrite of nodegit from the ground up. It uses codegen to generate the node bindings automatically, from a JSON description of the API. The wip branch is, I believe, already more comprehensive than both nodegit/master and node-gitteh. IMO, codegen is a better approach than writing bindings by hand because it requires less manual work to add support for a new function -- and libgit2's API is huge! Other advantages include: a bug fixed in one place is a bug fixed in all places, etc.
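To make the codegen idea concrete, here is a minimal sketch of generating binding text from a JSON-style description. It is not nodegit's actual generator: the descriptor fields (`cFunctionName`, `jsFunctionName`, `args`), the `generateBinding` helper and the emitted stub are all hypothetical. Only the libgit2 function name `git_revwalk_push` and its two parameters are real.

```javascript
// Hypothetical description of one libgit2 function. The field names are
// assumptions made for this illustration, not nodegit's real schema.
var descriptor = {
  cFunctionName: 'git_revwalk_push',
  jsFunctionName: 'push',
  args: [
    { name: 'walk', cType: 'git_revwalk *' },
    { name: 'id', cType: 'const git_oid *' }
  ]
};

// Turn a descriptor into the text of a (very incomplete) C++ call site.
// A real generator would also emit argument unwrapping, error handling
// and the async plumbing around the call.
function generateBinding(fn) {
  var params = fn.args.map(function (arg) { return arg.name; }).join(', ');
  return [
    '// JS name: ' + fn.jsFunctionName,
    'int result = ' + fn.cFunctionName + '(' + params + ');'
  ].join('\n');
}

console.log(generateBinding(descriptor));
```

The point of the sketch is the one made above: once the template is right, supporting another function means adding another descriptor, and a fix to the template reaches every generated binding.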

This wip version isn't exactly quick and low-level, although it follows this simple[istic] design philosophy: it aims to implement something like a 1:1 mapping between the node API and the C API, with a layer added in pure Javascript that is more "convenient". For example, there is a wrapper around the treebuilder to make it simple to make deep (nested) changes to trees https://github.com/nodegit/nodegit/blob/wip/example/new-commit.js and there are event emitters for walking history https://github.com/nodegit/nodegit/blob/wip/example/walk-history.js.

All of that "convenience" is implemented in Javascript so people can make nodegit as friendly as possible without having to write C++ -- and at the same time, since nearly 100% of the libgit2 functionality is available to them, people are no longer blocked waiting for someone else to implement some function (like clone or treebuilder) in C++.

Even the 1:1 "raw" API is not as bad as it sounds. It's still object-oriented in style, and is asynchronous exactly and only where libgit2 (potentially) does blocking IO:

  git.Repo.open(path.resolve(__dirname, '../.git'), function(error, repo) {
    if (error) throw error;

    repo.getCommit('59b20b8d5c6ff8d09518454d4dd8b7b30f095ab5', function(error, commit) {
      if (error) throw error;

      commit.getEntry('README.md', function(error, entry) {
        if (error) throw error;

        entry.getBlob(function(error, blob) {
          if (error) throw error;

          console.log(entry.name(), entry.sha(), blob.size());
        });
      });
    });
  });

Note that in the above code, nothing is part of the "convenience" API -- everything is 1:1 and automatically generated. Preliminary documentation of this raw API is here: http://www.nodegit.org/nodegit/ . Documentation of the convenience layer is forthcoming (but check the examples at https://github.com/nodegit/nodegit/tree/wip/example )
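As a rough illustration of what a convenience function layered on top of the raw calls could look like, here is a sketch that hides the nested lookups behind one callback. The `blobSizeAt` helper is hypothetical, and `fakeRepo` is a made-up in-memory stand-in (with invented data) so the snippet runs without nodegit installed; only the method names `getCommit`, `getEntry`, `getBlob` and `size` are taken from the example above.

```javascript
// Stand-in for the generated raw API so the sketch runs on its own.
// The data is invented; callbacks fire synchronously for simplicity.
var fakeRepo = {
  getCommit: function (sha, callback) {
    callback(null, {
      getEntry: function (name, cb) {
        cb(null, {
          name: function () { return name; },
          getBlob: function (done) {
            done(null, { size: function () { return 42; } });
          }
        });
      }
    });
  }
};

// Hypothetical convenience helper written purely in JavaScript: three raw
// calls collapsed into one function with a single (error, size) callback.
function blobSizeAt(repo, commitSha, filePath, callback) {
  repo.getCommit(commitSha, function (error, commit) {
    if (error) return callback(error);
    commit.getEntry(filePath, function (error, entry) {
      if (error) return callback(error);
      entry.getBlob(function (error, blob) {
        if (error) return callback(error);
        callback(null, blob.size());
      });
    });
  });
}

var observed = null;
blobSizeAt(fakeRepo, '59b20b8d5c6ff8d09518454d4dd8b7b30f095ab5', 'README.md', function (error, size) {
  if (error) throw error;
  observed = size;
  console.log('README.md is', size, 'bytes');
});
```

Nothing here needs C++: the helper only composes calls that the raw layer already exposes.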

I'd like to explore the idea of whether we can merge these two projects. That is, whether we can just take the codegen rewrite of nodegit, and once it is stable and well documented, call it libgit2/WHATEVER and deprecate the previous two projects. One project has the benefit of: one implementation, more maintainers, fewer bugs, better documentation.

So that's my hope.

nkallen avatar Sep 01 '13 19:09 nkallen

Hi @nkallen,

that explains many things (I remember nodegit being quite hackish and having a raw API, and the results with codegen really impressed me). However, going from manual to auto is a huge change, and has to be studied accordingly (it's also a big change in philosophy).

You see, Gitteh now wants to reorganize itself and needs some people who care for it (the reason for this thread). So I think the best approach would be to let @samcday decide, then wait a bit so we can simplify and get things stable again, and then we would all debate the merger you're proposing. I'm serious about getting a single project with all our efforts combined.

Conclusion: we need some time to get Gitteh itself to normality, and then we can talk about that change. :)

My thoughts.

mildsunrise avatar Sep 01 '13 20:09 mildsunrise

@nkallen I find the codegen approach interesting. I guess it sort of leads into the main goals of a binding project and what individual developers want to use it for. I have been making pretty swift progress on node-sgit, https://github.com/DerpKitten/node-sgit, and ended up with something relatively high level. My needs are to be able to create repositories, commit files to them and then get history and revisions of files. What particular codegen tool are you using?

I am really curious as to what everyone else's needs are and would a higher level or lower level approach be a better fit for them?

I do concur that finding someone to combine efforts would make sense if it can be worked out. Although I could definitely see a situation that results in 2 binding modules, one low level and one higher level.

lushzero avatar Sep 01 '13 21:09 lushzero

@lushzero Oohh I googled node-sgit and looked at your account and couldn't find it. Now I've caught you! >:)

I am really curious as to what everyone else's needs are and would a higher level or lower level approach be a better fit for them?

I'm also curious as to what benefits node-sgit gives and how much precision it offers over the generated bindings. Performance is also an important matter.

I do concur that finding someone to combine efforts would make sense if it can be worked out. Although I could definitely see a situation that results in 2 binding modules, one low level and one higher level.

Me too.

mildsunrise avatar Sep 01 '13 22:09 mildsunrise

@jmendeth I don't think this is something that needs me to make a decision on actually! For one, I'm putting gitteh out there for some new people to take ownership of, so my vote means a whole lot less anyway. Second of all, I think @nkallen's approach sounds very promising, as maintaining C++ bindings is tedious and error prone. If the general consensus from those whose opinions I value (basically everyone in this thread) is that a codegen approach is better, then I think that we should go for it. If we can deprecate gitteh and nodegit in favour of a unified codebase, vision and group of contributors, I think that's the best possible outcome for the Node community that just wants to load a couple of commits / blobs and walk a reflog or two :P

samcday avatar Sep 01 '13 22:09 samcday

If we can deprecate gitteh and nodegit in favour of a unified codebase, vision and group of contributors, I think that's the best possible outcome for the Node community that just wants to load a couple of commits / blobs and walk a reflog or two :P

Don't get me wrong, I'd just like to know e.g. the performance impact, modularity change, API compatibility, etc. of such a big change before jumping into it. But if you guys see it clearly, let's go on! :) So what do you suggest?

mildsunrise avatar Sep 01 '13 22:09 mildsunrise

EDIT: replies going fast, this was in response to @samcday ;) I agree with what @jmendeth just said.

Sounds good. We could also decide to start in a separate project (like nodegit's wip branch) and move it over under the libgit2 umbrella once we think things are stable/settled. We could also decide to stay api-compatible with gitteh and keep everything under the same name. People who were using gitteh can keep using gitteh that way.

Anyway, this is a pretty substantial decision. I still need to try nodegit's wip branch by running it in an application. That should give me a better idea what to do next.

FrozenCow avatar Sep 01 '13 22:09 FrozenCow

Sounds good. We could also decide to start in a separate project (like nodegit's wip branch) and move it over under the libgit2 umbrella once we think things are stable/settled. We could also decide to stay api-compatible with gitteh and keep everything under the same name. People who were using gitteh can keep using gitteh that way.

That's one option. The other is to just deprecate this and focus our efforts on nodegit as @samcday suggested.

If we were to deprecate gitteh, we could write a last gitteh version module that adapts nodegit's API to Gitteh's. (that is, a simple JS module that depends on nodegit and uses it to provide Gitteh's API, plus new features).
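A minimal sketch of such an adapter module, under loud assumptions: the backend is expected to expose `Repo.open` and `repo.getCommit` as in the nodegit/wip examples in this thread, while the facade's method names (`openRepository`, `commit`) are only illustrative guesses at a gitteh-shaped surface and should be checked against gitteh's real API. `fakeBackend` is a made-up stand-in so the snippet runs without nodegit.

```javascript
// Builds a gitteh-shaped facade on top of a nodegit-shaped backend.
// The facade method names are assumptions for illustration only.
function createCompatLayer(backend) {
  return {
    openRepository: function (repoPath, callback) {
      backend.Repo.open(repoPath, function (error, repo) {
        if (error) return callback(error);
        callback(null, {
          // Forward the old-style call to the new-style method.
          commit: function (sha, cb) { repo.getCommit(sha, cb); }
        });
      });
    }
  };
}

// Fake backend with invented behaviour: every commit lookup succeeds and
// echoes the requested sha back.
var fakeBackend = {
  Repo: {
    open: function (repoPath, callback) {
      callback(null, {
        getCommit: function (sha, cb) {
          cb(null, { sha: function () { return sha; } });
        }
      });
    }
  }
};

var compat = createCompatLayer(fakeBackend);
var seenSha = null;
compat.openRepository('/tmp/example/.git', function (error, repo) {
  if (error) throw error;
  repo.commit('abc123', function (error, commit) {
    if (error) throw error;
    seenSha = commit.sha();
    console.log('loaded commit', seenSha);
  });
});
```

Passing the backend in as an argument keeps the adapter testable and means the compatibility module itself contains no native code.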

Anyway, this is a pretty substantial decision. I still need to try nodegit's wip branch by running it in an application. That should give me a better idea what to do next.

:+1:

mildsunrise avatar Sep 01 '13 22:09 mildsunrise

Could anyone else chime in with their requirements? Mine are pretty simple, but none of the existing modules I tested supported them well or reliably. I settled on a very simple api that is probably closer to the command line git than anything else. Super simple, no nesting, calls are self contained. The result is easy to learn, use, extremely ?fast? and light on memory. https://github.com/DerpKitten/node-sgit/blob/master/example.js . I'm not sure if my use cases are anything like anyone else's though.

This isn't a reflection of the qualitative value of nodegit, just of the fact that it is much lower level: the "log" case in nodegit is roughly 38 lines (see API example, https://github.com/nodegit/nodegit) compared to only 6 in sgit. In both cases the timing is quite fast. Across ten runs against the same .git repo (sgit) with 479 commit log entries, I get a median of 27541637 nanoseconds for sgit and 107501468 nanoseconds for nodegit. Sgit is marginally faster (about 80 million nanoseconds, i.e. roughly 80 ms), but that's negligible for any real world use, and the benchmark methodology is imperfect (process.hrtime). Mostly the difference is in the approach, I think: nodegit loops over each commit, sgit gets them in a large batch.
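For anyone who wants to repeat this kind of measurement, here is a small sketch of taking a median over several runs with `process.hrtime`. It is not the script used for the numbers above; the helper names are made up, the workload is a placeholder loop, and it only times synchronous jobs (a real comparison would have to wait for each library's callback before stopping the clock).

```javascript
// Convert a process.hrtime() tuple [seconds, nanoseconds] to nanoseconds.
function toNanoseconds(hr) {
  return hr[0] * 1e9 + hr[1];
}

// Median of a list of numbers (mean of the two middle values if even).
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time a synchronous job `runs` times and return the median in nanoseconds.
function medianRuntime(job, runs) {
  var samples = [];
  for (var i = 0; i < runs; i++) {
    var start = process.hrtime();
    job();
    samples.push(toNanoseconds(process.hrtime(start)));
  }
  return median(samples);
}

// Placeholder workload standing in for the "log" case of a library.
var result = medianRuntime(function () {
  var total = 0;
  for (var i = 0; i < 1e5; i++) total += i;
  return total;
}, 10);
console.log('median ns:', result);

// Difference between the two medians quoted above, in milliseconds
// (roughly 80 ms).
console.log((107501468 - 27541637) / 1e6, 'ms');
```

The median is used rather than the mean so that a single slow run (cold disk cache, GC pause) does not dominate the result.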

If someone can provide a working example of the "log" case with gitteh I can time that as well.

To me it's sort of meaningless to talk about performance in abstract without talking about actual numbers and applicable use cases. From my read everything anyone here is talking about is pretty damn fast, there's not all that much road between the JS call, the C++ wrapping and the underlying C library no matter how you cut it. I don't think performance is going to be a big issue for anyone but please explain how that thinking is wrong.

My suggestion is that if we can all agree on the rough sketch of what the node API should look like, we can then decide what backend(s) configuration makes the most sense for most of the people most of the time.

lushzero avatar Sep 02 '13 01:09 lushzero

Hi guys,

I'm excited you are going to consider the codegen approach as an option. I agree with the commenters above ( @jmendeth , @FrozenCow) that you guys should evaluate it thoroughly before making a commitment. It's still a work in progress, and an outside evaluation is exactly what I need to get the quality to a high level. I'm sure you'll find bugs and have suggestions for improvements, but I'm confident that after I incorporate your feedback, you'll find the functionality comprehensive, stable, and performant.

Whoever is willing to volunteer to beta-test it, I'm available to help. The documentation is a bit sparse, as I said, but you can read the examples linked to above and also the source code to this application I built http://gitdb.elasticbeanstalk.com/repos/gitdb/refs/heads/master/tree/ which uses it.

Making the decision to adopt codegen won't be easy. The API will break. Yes, a compatibility layer in Javascript can be written -- and it's "easy" to do so, if incredibly tedious.

If I may "chime in" with some requirements, I think the goal first and foremost should be to support as large a portion of libgit2 functionality as possible. That supports the widest variety of use-cases, obviously. A good reason not to have 3 separate projects is that libgit2's API is large, and getting decent coverage is a lot of effort.

For example @lushzero , here is a pretty terse way to do a git log in nodegit/wip:

git.Repo.open(path.resolve(__dirname, '../.git'), function(error, repo) {
  repo.getMaster(function(error, branch) {
    var history = branch.history();
    history.on('commit', function(commit) {
      console.log('commit ' + commit.sha());
      console.log('Author:', commit.author().name() + ' <' + commit.author().email() + '>');
      console.log('Date:', commit.date());
      console.log('\n    ' + commit.message());
    }).start();
  });
});

Again, following the same approach: nearly all of libgit2's functionality is directly available in javascript-land, and the API is as friendly as you want by writing wrappers in javascript.

The reason libgit2 doesn't JUST have a simple git log command is that you might want to do git log a..b, abort after 100 entries, change the sorting to topological, or all manner of crazy combinations. Thus libgit2 provides git_revwalk_new, git_revwalk_push, and git_revwalk_next. And similarly, nodegit/wip now provides all of that to you in javascript land too -- and ALSO the simple, friendlier branch.history.on(), which is implemented in pure javascript. That is surely easier than writing a bunch of friendly code in C++, for example as you have here: https://github.com/DerpKitten/node-sgit/blob/master/src/sgit.cc#L130 .
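To show roughly how an event-style history can sit on top of a next()-style walker in pure JavaScript, here is a sketch. It is not nodegit's implementation: `createFakeWalker` is an invented stand-in that yields made-up ids (loosely modelled on repeated `git_revwalk_next` calls), and the `history` helper and its event names other than 'commit' are assumptions.

```javascript
var EventEmitter = require('events').EventEmitter;

// Stand-in for a low-level walker: next(callback) yields one id per call,
// then null when the walk is over. The ids are invented.
function createFakeWalker(ids) {
  var index = 0;
  return {
    next: function (callback) {
      var id = index < ids.length ? ids[index++] : null;
      callback(null, id);
    }
  };
}

// Hypothetical convenience layer: turn repeated next() calls into
// 'commit', 'end' and 'error' events.
function history(walker) {
  var emitter = new EventEmitter();
  emitter.start = function () {
    (function step() {
      walker.next(function (error, id) {
        if (error) return emitter.emit('error', error);
        if (id === null) return emitter.emit('end');
        emitter.emit('commit', id);
        step(); // ask for the next entry only after handling this one
      });
    })();
    return emitter;
  };
  return emitter;
}

var seen = [];
history(createFakeWalker(['c3', 'c2', 'c1']))
  .on('commit', function (id) { seen.push(id); })
  .on('end', function () { console.log('walked', seen.join(', ')); })
  .start();
```

Because each step is requested separately, a caller could stop after 100 entries or filter as it goes, which is the flexibility the low-level primitives are meant to preserve. With a real asynchronous walker each `next` would go through the event loop, so the recursion in `step` would not grow the stack the way it does with this synchronous stand-in.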

That said, your approach has, at least in theory, higher throughput because it crosses the runtime boundary less often and it has less context switching (because fewer items are put in the libuv event queue). On the other hand, your approach doesn't interleave well. It hogs the libuv thread while it looks up all of the commits -- each of which is potentially a random access on disk -- which means that a concurrent application (like a web server) will have extremely high latency variance as disproportionately large jobs clog up your thread pool.
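The interleaving argument can be illustrated with a toy model. This is only a simulation under simplifying assumptions, not a measurement of libuv: one worker, a FIFO queue, one abstract time unit per commit lookup, and zero cost for yielding or crossing the runtime boundary. The `simulate` function and the task names are made up.

```javascript
// Toy model of a single worker draining a FIFO queue. Each task has a name
// and a number of work units; `slice` is how many units run before the task
// goes back to the end of the queue. Returns finish times keyed by name.
function simulate(tasks, slice) {
  var queue = tasks.map(function (t) { return { name: t.name, left: t.units }; });
  var clock = 0;
  var finished = {};
  while (queue.length > 0) {
    var task = queue.shift();
    var run = Math.min(slice, task.left);
    clock += run;
    task.left -= run;
    if (task.left > 0) queue.push(task); // yield, continue later
    else finished[task.name] = clock;
  }
  return finished;
}

var workload = [
  { name: 'log-479-commits', units: 479 }, // one unit per commit lookup
  { name: 'small-request', units: 1 }
];

// Batch style: the whole log runs as one job before anything else.
var batched = simulate(workload, Infinity);
// Per-item style: the log yields after every commit lookup.
var interleaved = simulate(workload, 1);

// prints: small request done at 480 vs 2
console.log('small request done at', batched['small-request'], 'vs', interleaved['small-request']);
```

In this model the large job finishes at almost the same time either way (479 vs 480 units), while the small request goes from waiting behind the entire batch to finishing almost immediately. What the model leaves out is exactly the per-item overhead mentioned above, so it shows the latency side of the trade-off only.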

In summary, I think libgit2's API strikes the right balance of small, compose-able tools that provide maximum flexibility. Making it directly available in Javascript (in an OO way) provides the greatest functionality and flexibility. The only downside of the 1:1 mapping is that it means crossing the runtime barrier and context switching more often -- potentially a performance hit. However, I'm confident I can address performance issues as they arise, and my preliminary benchmarks suggest that preloading data/minimizing runtime boundary crossings provides a minimal benefit:

https://gist.github.com/nkallen/6406128

nkallen avatar Sep 02 '13 02:09 nkallen

It hogs the libuv thread while it looks up all of the commits -- each of which is potentially a random access on disk -- which means that a concurrent application (like a web server) will have extremely high latency variance as disproportionately large jobs clog up your thread pool.

Exactly. That's the big drawback I see in node-sgit.

mildsunrise avatar Sep 02 '13 07:09 mildsunrise

@nkallen Looking at the generated (because they're autogenerated?) nodegit C files, I see it uses Boost to escape values, joins them into JSON and passes that to the JS side. It'll be interesting to see how that performs against a manually written module such as Gitteh. :)

I must have some benchmarker around, will try to adapt it to nodegit.

mildsunrise avatar Sep 02 '13 08:09 mildsunrise

The generated code doesn't use boost. See https://github.com/nodegit/nodegit/blob/wip/src/revwalk.cc

This uses a custom code generator so the code can generate any style of C++

nkallen avatar Sep 02 '13 15:09 nkallen

@iamwilhelm I didn't actually realise you were using gitteh in anger somewhere like that. That's pretty cool!

@samcday thanks! Gitteh did cause me to burst veins sometimes, when the lib didn't install or build on npm, but that's over now. Re: documentation: how did you plan on generating it? Did you have a Cake task somewhere, or a particular preference? Should we move the documentation talk to a different issue, and leave this thread for handover talk?

iamwilhelm avatar Sep 02 '13 17:09 iamwilhelm

@jmendeth I don't think this is something that needs me to make a decision on actually! For one, I'm putting gitteh out there for some new people to take ownership of, so my vote means a whole lot less anyway.

@samcday I think we look to you, since you're the current maintainer, and you actually have an informed opinion of what might work for gitteh. You may not have to make the final decision, but you will probably have to guide the decision, or else the project will languish from indecision.

iamwilhelm avatar Sep 02 '13 17:09 iamwilhelm

When it comes to needs and requirements, my needs are the following:

  1. It has the higher level functions (what git refers to as porcelain). I'm not clear on the distinction between what everyone calls the higher and lower level functions right now, since I'm not as familiar with the libgit2 library, but I think it's important to have the main API match the commands you'd use on the command line, day to day--though I don't mind walking trees and commits myself. That will help new users wrap their head around how to use the library more easily. Any more than that is icing on the cake for me. That said, I do see the value in having coverage of the entire libgit2 lib, so you can implement your own stuff.

  2. The API should be relatively stable. I know we're still at 0.17.x and not 1.0, but like others before me, I find no pleasure in building on top of shifting sands. That said, as long as we agree on something, and we're not changing things every couple of months, that's fine by me.

  3. I hadn't thought about the issue with libuv and great variance in response times. But now that y'all mention it, yes, it would be bad for me to have high variance in response times when using the lib. A fast git lib with low response time variance is something I'd get behind. But we'd need to benchmark the two different approaches.

iamwilhelm avatar Sep 02 '13 18:09 iamwilhelm

I'm not clear on the distinction between what everyone calls the higher and lower level functions right now [...]

I can shed some light on this. The "Goldilocks" level of abstraction for libgit2 is to provide the building blocks you'd need to implement a full git client (command-line, visual, or otherwise), without actually being opinionated on how that client is structured. So a feature like clone is appropriate for this level, but parsing command-line arguments is not.

If you check the diff for that last link, you'll see that we ended up merging the command-line parsing stuff into the examples area, which is a great place to look if you want to know how to use libgit2, but a built binary is free from all that stuff.

The API should be relatively stable. I know we're still at 0.17.x and not 1.0 [...]

This is sort of the nature of libgit2 right now, I'm afraid. We've tried to get all the breaking changes out of the way, but there are no guarantees just yet. After we hit 1.0, we intend to fully commit to SemVer; we won't break the API without changing the major version number.

I understand. But don't worry, we'll still @ summon you. ;)

@jmendeth, I'm counting on it. :grinning:

ben avatar Sep 02 '13 19:09 ben