Initial feedback
Great work on this plugin! It looks like a great start!
I haven't pulled it down and played with it yet, but I will soon. Here's a few comments based on my first pass of the source code. It's a lot, so don't worry about tackling it all at once! And it's not criticism, just some ideas/questions from my perspective. It looks like you've done great work!
Module name: git could just as well be a git server, but this is a git client. I actually like how concise this name is and think that it's totally possible for the git app to be both a client and a server. That said, the structure of the app should be designed so that there is room for a plausible future git server. (Or, if the client and the server will be separate apps, the names should reflect that. However, I don't think that'll be strictly necessary.) This might be as simple as moving all the client-related config into a sub-struct of the Git struct type:
type Git struct {
Client *GitClient `json:"client,omitempty"
}
type GitClient struct {
Repositories []Repository `json:"repositories"`
wg *sync.WaitGroup
}
Struct name: Consider naming it App -- I haven't been super consistent about this and it's not a big deal, but then it's less repetitive than caddygit.Git -- instead, it'd be caddygit.App which makes a little more sense and is more descriptive.
Module registration: This should be changed:
func init() {
if err := caddy.RegisterModule(Git{}); err != nil {
caddy.Log().Fatal(err.Error())
}
}
to this:
func init() {
caddy.RegisterModule(Git{})
}
to work with the latest commit -- last breaking change! :+1:
Module info: Avoid intialization here. Do it in Provision() instead.
Interface guards: Looks good!
forEach: What is the purpose of this exactly? Why not just use a regular for loop?
Repository struct:
- URL: Does it support SSH as well? (Is that something that can be easily added later?)
- Path: Does it create a subfolder or does it go directly into the given path exactly as specified? What if the folder already exists and is not a git repo, etc? What is the default path?
- Tag: The
{latest}placeholder is a good idea! But it should be properly namespaced, and maybe a little more descriptive, like{git.repo.latest}for example. - Branch: Is it needful to have both
TagandBranchfields? Could there be just a single field to specify which commit/branch/tag to pull? (Related idea: In the future, users may want a way to specify a tag pattern, i.e. the latest tag that matches a pattern -- so keep that in mind. Don't have to implement that now, I'm just saying to consider it so there's room for that to be added later.) - Username and Password: As much as I like having all the config in a single document, I think there should be a way for credentials to be specified outside the config file, like in an env variable.
- SingleBranch and Depth: What are the defaults? What values are allowed? Would it make sense to keep clones/pulls lightweight by default and only clone the specified branch at a depth of 1? (Or would that break most sites?)
- Interval: It feels weird to specify -1 to disable periodic pulling IMO, what if 0 disabled it instead? Actually... what is the point of disabling it? Wouldn't it be kind of useless then?
- Then and ThenLong: I think the design of this feature could use some careful thought. Do users need to execute more than one command? If so, can we do something more like:
CommandsAfter []Command `json:"commands_after,omitempty"`
...
}
type Command struct {
// Args[0] is the command to run, Args[1:] is the arguments, just
// like exec.Command - best to not have to do any parsing, you
// won't need go-shellquote
Args []string `json:"command,omitempty"`
// Or "Async", I don't care
Background bool `json:"background,omitempty"`
// maybe an option to determine whether the command chain
// terminates if this command returns an error, but you
// might also wait until this is requested by a user
}
Struct tags: Always specify omitempty so that config adapters don't produce unnecessary, empty keys, which clutters up the resulting config. This will become noticeable once this app is usable from the Caddyfile.
Context: Consider using the caddy.Context that is passed in during Provision - you can store it in your App struct if you need to; this context value gets cancelled when the app is stopping. It looks like you've already got cancellation figured out using channels and waitgroups and there's nothing wrong with that per-se, but if using the context makes things simpler, I'd say at least look into it.
Logging: Great to see that you're using the zap.Logger facilities provided by the core!
Running commands: This looks leaky - if the goal is to run the command wait for it to terminate OR for some other event (like a timeout or cancellation), then consider something more like this: https://github.com/caddyserver/xcaddy/blob/master/environment.go#L186-L220
That's all for now -- keep up the good work! Can't wait to start using this.
Thank you @mholt for the feedback 😄
forEach: What is the purpose of this exactly? Why not just use a regular for loop?
Wanted to loop by reference and writing r.something was cleaner than repos[i].something. I could have just declared r := repos[i] but then again forEach did that for me 😅
URL: Does it support SSH as well? (Is that something that can be easily added later?)
Not yet. But I don't think it should be hard to add later on.
Path: Does it create a subfolder or does it go directly into the given path exactly as specified? What if the folder already exists and is not a git repo, etc? What is the default path?
Yes, path definitely needs more provision and validation. Basically, if the path is /abc/def and def exists and is not a git repo, it initializes def and if def does not exists, it creates def and initializes. If def exists and is a git repo, it simply opens it. Need more work with the first two cases.
Tag: The {latest} placeholder is a good idea! But it should be properly namespaced, and maybe a little more descriptive, like {git.repo.latest} for example.
I don't understand the namespacing here. Basically it's like a magic keyword that just gets the latest tag.
Branch: Is it needful to have both Tag and Branch fields? Could there be just a single field to specify which commit/branch/tag to pull? (Related idea: In the future, users may want a way to specify a tag pattern, i.e. the latest tag that matches a pattern -- so keep that in mind. Don't have to implement that now, I'm just saying to consider it so there's room for that to be added later.)
Branch and Tag are different because if the tag is specified to be latest, the latest tag is based on the branch specified. For example, Caddy v2 is on the master branch and Caddy v1 is on the v1 branch. If branch is empty (or master), it will get only the latest tags from master branch, i.e., for v2 or if the branch is v1 it gets only tags for v1 branch.
Also, about the regex, we could make a Tag struct and do something like this:
type Tag struct {
Latest bool // gets the latest tag
Name string // name of the tag (can be a regex)
}
Basically if Tag.Latest is true, it gets the latest tag based on regex and if false, it tries for the specified name.
SingleBranch and Depth: What are the defaults? What values are allowed? Would it make sense to keep clones/pulls lightweight by default and only clone the specified branch at a depth of 1? (Or would that break most sites?)
Default for SingleBranch is false and Depth is 0 (basically all the commits). I don't feel comfortable changing these defaults because at least without these we're sure nothing would break.
Interval: It feels weird to specify -1 to disable periodic pulling IMO, what if 0 disabled it instead? Actually... what is the point of disabling it? Wouldn't it be kind of useless then?
0 interval actually does disable it. This is disabled only when user specifies it, i.e., value <= 0 or if a specific tag is provided. It doesn't make sense to disable it but we can so there's no harm keeping it.
I've gone through other points. Will make changes accordingly.
Wanted to loop by reference and writing r.something was cleaner than repos[i].something. I could have just declared r := repos[i] but then again forEach did that for me 😅
Fair enough; but I think the function is more boilerplate. Instead, you might consider moving most of that logic to an unexported method on Repository so that each repository can provision itself. That's how the HTTP app works, it calls a provision method on each server.
Basically, if the path is /abc/def and def exists and is not a git repo, it initializes def and if def does not exists, it creates def and initializes. If def exists and is a git repo, it simply opens it.
Most of this sounds good, but the bolded part diverges from the git command:
Cloning into an existing directory is only allowed if the directory is empty.
Something to consider. I think the way the git command works is nice and conservative.
I don't understand the namespacing here. Basically it's like a magic keyword that just gets the latest tag.
The namespacing is necessary since technically Replacers are global throughout all of the caddy config, even across apps, although I guess in practice it's unlikely for them to cross boundaries. For consistency with the rest of Caddy, I would encourage you to namespace the placeholders properly, it also allows for future growth/expansion of the app without collisions.
Branch and Tag are different because if the tag is specified to be latest, the latest tag is based on the branch specified. For example, Caddy v2 is on the master branch and Caddy v1 is on the v1 branch. If branch is empty (or master), it will get only the latest tags from master branch, i.e., for v2 or if the branch is v1 it gets only tags for v1 branch.
Ah, I see, so specifying a branch name like master will always pull the latest commit from the branch, but specifying {latest} (or whatever the placeholder ends up being) will only pull the latest tag from the branch. Did I get that right?
It's a bit confusing I think. It also makes it easy to give invalid or conflicting config: you could specify a Tag and a Branch that don't work together. Maybe consolidate these into just one field that accepts a tag, branch, commit, or something like {git.repo.branch.latest_tag} where branch is the branch name. This also makes it clear you're referring to the latest tag, rather than the latest commit.
But there are other ways this could be solved, too. I am just suggesting one -- but I do strongly recommend changing this, it will be a bit of a footgun as-is.
Default for SingleBranch is false and Depth is 0 (basically all the commits). I don't feel comfortable changing these defaults because at least without these we're sure nothing would break.
Sounds good to me then.
0 interval actually does disable it. This is disabled only when user specifies it, i.e., value <= 0 or if a specific tag is provided. It doesn't make sense to disable it but we can so there's no harm keeping it.
But isn't the empty value the same as 0? How do you tell the difference if it's not a pointer?
Also, I would advise against adding in config features unless you know they'll be needed. Even if it is easy to disable polling, it doesn't seem useful at all. Adding it is a backwards-compatible change, so I'd suggest leaving this until later if somebody actually needs it.
Review as of commit 65c0237a3a0b9f18fa78bf470ecd62bb20614b14
It's looking a lot better!
-
There are many packages, and they are all very small. Also
package utilsis a code smell. This whole project could probably be contained in 2 package, say:package caddygitandpackage gitclient. Moveutilsintocaddygit, then I imagineservice,repository, andcommandercould all go into a newgitclientpackage. Ideallygitclientwould rely oncaddygitif necessary, and not the other way around. -
I like the idea of the
Serviceinterface, but it can be better: https://github.com/vrongmeal/caddygit/blob/65c0237a3a0b9f18fa78bf470ecd62bb20614b14/service/service.go#L24-L30 Instead of two methods, have just one:Start(ctx context.Context) (<-chan time.Time, error)- and the context should probably be passed in so the caller can also cancel/stop the service. The current API makes it possible to range over a service that hasn't been started. -
The
service.Typeandservice.Optsthing is weird. Just make the service a guest module, which makes your app a "host module": https://caddyserver.com/docs/extending-caddy#host-modules - this is exactly the kind of thing it is there for. So, as it stands, you'd have two service modules initially: poll and webhook. (And they should only have 1 name!) -
The placeholders as described in the readme are confusing:
// `{git.ref.branch.<branch>}` for branch name. Equivalent to `<branch>`.
// `{git.ref.branch.<branch>.latest_commit}` is same as above.
// `{git.ref.latest_commit}` is same as above for default branch. Equivalent to empty string.
// `{git.ref.branch.<branch>.latest_tag}` fetches latest tag for given branch.
// `{git.ref.latest_tag}` is same as above for default branch.
// `{git.ref.tag.<tag>}` for tag name.
"branch": "my-branch",
I can see why there are repetitions, but having two similar ways of doing the same thing is not great. It also just occurred to me that "latest" is ambiguous -- how is the sorting done? Version number? Lexicographically? Date? (e.g. semver's sorting is broken -- they use lexicographical sort).
I think a better approach would be to first figure out what the string value of the "branch" field is. Well, it's called "branch", so I would expect it to be a branch name. So it is confusing if I can put a tag or commit ID here too. So maybe 1) rename the field to "ref", 2) have the field's value be a ref ID, that is, anything you can git checkout like a tag, commit SHA, or branch name. I believe git's default behavior when checking out a branch is to use its latest commit, right? So I shouldn't even need a placeholder. You only need a placeholder when the value is unknown up front. So you might need a placeholder for latest tag on a certain branch (for some definition of "latest" -- you will probably find in the future that some want latest by date, others by semver...) instead of the latest commit on a certain branch, which could look like {git.ref.latest_tag.<branch>} -- and that's honestly only necessary if somebody requests that feature!
Does that make sense? It's quite likely that custom placeholders aren't necessary here at all, for now. (You can still use a replacer, in case people use an environment variable placeholder like {env.BRANCH_NAME} to configure their app, but that's already implemented by caddy core!)
-
Consider renaming "username" and "password" fields to "auth_user" and "auth_secret", respectively. This makes them more generally named for when it's not a username+password combo (i.e. an access token).
-
Just wondering, what is the purpose of customizing the remote name?
I can see why there are repetitions, but having two similar ways of doing the same thing is not great. It also just occurred to me that "latest" is ambiguous -- how is the sorting done? Version number? Lexicographically? Date? (e.g. semver's sorting is broken -- they use lexicographical sort).
The tag is latest by date.
I think a better approach would be to first figure out what the string value of the "branch" field is. Well, it's called "branch", so I would expect it to be a branch name. So it is confusing if I can put a tag or commit ID here too. So maybe 1) rename the field to "ref", 2) have the field's value be a ref ID, that is, anything you can git checkout like a tag, commit SHA, or branch name. I believe git's default behavior when checking out a branch is to use its latest commit, right? So I shouldn't even need a placeholder. You only need a placeholder when the value is unknown up front. So you might need a placeholder for latest tag on a certain branch (for some definition of "latest" -- you will probably find in the future that some want latest by date, others by semver...) instead of the latest commit on a certain branch, which could look like {git.ref.latest_tag.
} -- and that's honestly only necessary if somebody requests that feature!
The issue is the go-git package requires me to specify the kind of reference. I need to check how it's implemented in the git cli, accordingly I'll change it.
Just wondering, what is the purpose of customizing the remote name?
This was just to support an already existing repository which the module opens. But then I again reset the given remote URL according to the provided one so it is indeed useless. I should remove this.
The issue is the go-git package requires me to specify the kind of reference. I need to check how it's implemented in the git cli, accordingly I'll change it.
Ah, sorry, I didn't realize that. I guess now you know what at least one user would expect 😅 -- if it's not overly difficult, I do think this would be an awesome way to handle it: the user wouldn't have to worry about what kind of ref it is (especially if they're coming from the git CLI), it would just work.
Anyway, the rest of your post sounds good! :+1:
This module is looking a lot better already.
I think the next major thing people will need is webhook support (that's an HTTP handler) and then a Caddyfile directive to enable that. Then the rest is bug fixes probably!
Looking forward to the webhook features, seem like currently we only support poll mode?
After confirmation, I believe caddygit already supported webhook now. But i really need an example to learn how to config the webhook.
Hey! It supports webhook (github + generic). Sorry there's no documentation currently. I have been a bit busy lately. I will add documentation as soon as I can. If you're interested you could contribute in adding more pending features as well. That would be highly appreciated.
Thank you for your reply and wish you all the best. I am not familiar with golang, so I am afraid I can't make any substantial contributions. Looking forward to your documentation
Hey, thanks for this awesome plugin! I've started using caddy recently to host a personal blog, using Hugo for static content management. I plan to keep my markdown posts in a github repo and then issue a regenerate command for hugo every time I commit to the repo. I've tested it with polling and it works really well, but now I want to take a step further and try the webhook feature, but I'm not having success with the caddy webhook part in the JSON for caddygit (I'm not familiar with Go either to fully understand what's the expected structure), is there any example that you can provide of how to setup the webhook in the JSON config for Caddy ?
Thanks!
Hey! Sorry for this.
Here's an example I used while testing:
{
"apps": {
"http": {
"servers": {
"srv1": {
"listen": [
":9091"
],
"automatic_https": {
"disable": true
},
"routes": [
{
"handle": [
{
"handler": "git",
"hook_secret": "GH_TEST_REPO_SECRET",
"hook": { "type": "github" },
"repo": {
"path": "/Users/vrongmeal/Desktop/test-repo",
"url": "https://github.com/vrongmeal/test-repo.git"
},
"commands_after": [
{
"command": [
"echo",
"hello world"
]
}
]
}
]
}
]
}
}
}
}
}
This is basically an HTTP handler. You can also use webhook without the HTTP app.
// Rest of the config same as that in README
"service": {
"type": "webhook",
"secret": "GH_TEST_REPO_SECRET",
"port": 12345, // port to run the webhook server
"path": "/webhook", // http://localhost:12345/webhook
"hook": { "type": "github" }
},
// ...
Hey! Sorry for this. Here's an example I used while testing: ...
This is very helpful, thank you!