Error handling suggestion
The problem: avoid littering and ease cleanup when a deployment fails (e.g. due to a dry run failing).
The proposed solution: consider
networks:
  production:
    host:
      - 192.168.0.1

commands:
  fetch:
    desc: Fetch the build
    run: wget http://s3.amazonaws.com/my/src/code.tgz -O /tmp/code.tgz
    ensure: rm -f /tmp/code.tgz
  extract:
    desc: Extract the build
    run: cd /app/releases && tar -xvzpf /tmp/code.tgz
    on-error: rm -rf /app/releases/code
  dryrun:
    desc: Do a dry run
    run: /app/releases/code/dry-run.sh
  start:
    desc: Start the app
    run:

targets:
  deploy:
    - fetch
    - extract
    - dryrun
    - start
ensure scripts would always be run after all commands have run. They run in reverse order.
on-error scripts would run only if a command failed. They also run in reverse order.
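A minimal sketch of these semantics, as a hypothetical bash simulation (not sup itself; command strings are illustrative): each command may register an ensure hook, and after the whole target has run, the hooks execute in reverse order with their exit codes ignored.

```shell
#!/bin/bash
# Hypothetical simulation of the proposed semantics -- not sup itself.
# Each command may register an "ensure" hook; after the whole target has
# run, hooks execute in reverse order and their exit codes are ignored.

ENSURES=()

run_cmd() {                       # run_cmd <command> [ensure-hook]
  echo "run: $1"
  eval "$1"
  if [[ -n "${2:-}" ]]; then
    ENSURES+=("$2")
  fi
}

run_ensures() {
  # reverse order: the most recently registered hook runs first
  for (( i=${#ENSURES[@]} - 1; i >= 0; i-- )); do
    eval "${ENSURES[i]}" || true  # ignore hook failures
  done
}

# Simulate a "deploy" target of fetch -> extract -> dryrun
run_cmd "touch /tmp/code.tgz" "rm -f /tmp/code.tgz; echo 'ensure: fetch'"
run_cmd "true"                "echo 'ensure: extract'"
run_cmd "true"                # dryrun: no ensure hook
run_ensures                   # extract's hook runs first, then fetch's
```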
Objections and discussions are welcome.
Awesome idea, something like a rollback would be cool as well. Maybe we can use a target name for a rollback.
networks:
  production:
    host:
      - 192.168.0.1

commands:
  fetch:
    desc: Fetch the build
    run: wget http://s3.amazonaws.com/my/src/code.tgz -O /tmp/code.tgz
    ensure: rm -f /tmp/code.tgz
  extract:
    desc: Extract the build
    run: cd /app/releases && tar -xvzpf /tmp/code.tgz
    rollback: cleanup
  dryrun:
    desc: Do a dry run
    run: /app/releases/code/dry-run.sh
  remove_tmp:
    desc: Remove the temporary build
    run: rm -rf /tmp/build
  cleandb:
    desc: Clean my db
    run: dbcommand --remove-last-migration
  start:
    desc: Start the app
    run:

targets:
  deploy:
    - fetch
    - extract
    - dryrun
    - start
  cleanup:
    - remove_tmp
    - cleandb
Good ideas, but we need to make sure that the semantics are consistent and clear. For a "command" we now accept:
- desc
- run
- script
- local
- serial
- once
Just something to consider: we should keep this list as minimal and as intuitive as possible.
@eduardonunesp - When is rollback invoked? Only on errors? Or is it a special command, like `sup production rollback` (Capistrano has a default rollback command)? I tend to think the term is slightly overloaded and hence not very clear. Just my 2 cents. But maybe your approach has a simpler model, in that you don't mix things at the command level with things at the target level (except for pointing at it).
@pkieltyka Agree - as this list grows the complexity also grows. Any suggestions? I really do think error handling is needed somehow.
@stengaard Indeed, my idea adds some complexity; on-error looks good as well. My major concern is cleaning up the deploy when something goes wrong.
@pkieltyka The list looks good and small, but a fallback like on-error is still missing.
Maybe the command should be onerror, just to keep the pattern.
I like the concept of onerror roll-backs :+1:
I'm missing why we'd need ensure, though -- you can put the ensuring steps right into your existing commands, like
commands:
  fetch:
    run: >-
      mkdir -p /tmp/code &&
      curl http://s3.amazonaws.com/my/src/code.tgz | tar -xvzp -C /tmp/code || exit 1 &&
      test -d /tmp/code || exit 1
The point of ensure was that these scripts run after all other commands in the target list have run, regardless of return codes. (Also: the return code of an ensure command should be ignored.)
So they are very handy for cleanup tasks that should always run, without chaining a lot of commands together.
ensure: rm -rf /tmp/code
They should of course only be run if the command they are attached to has run.
You can use http://redsymbol.net/articles/bash-exit-traps/ for a clean-up on exit. I'm still not convinced we need ensure, since you can achieve the same thing easily using bash/sh. Let's keep the API clean unless we have a really strong consensus on a feature like that.
However, I like the onerror rollbacks, that's a great feature. That one makes sense, as it's not easily achievable by sequentially executed bash commands.
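For reference, the exit-trap pattern from the linked article looks roughly like this (a sketch; the path and commands are illustrative). The trap fires on success, failure, or an uncaught signal, which covers much of what the proposed ensure would do:

```shell
#!/bin/bash
# Cleanup via an EXIT trap: the trap fires whether the script succeeds or
# fails. Run in a subshell here so the trap's effect is visible immediately.

out=$(
  tmpfile=/tmp/code.tgz            # illustrative path
  cleanup() { rm -f "$tmpfile"; echo "cleaned up"; }
  trap cleanup EXIT

  touch "$tmpfile"                 # stand-in for the fetch step
  echo "fetched"
)                                  # subshell exits -> trap runs

echo "$out"                        # prints "fetched" then "cleaned up"
```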
I do agree with your views on API size, but I still feel I haven't explained the usefulness of ensure well enough. I'll give it another go. :)
The thing about ensure is that you might want to use the product of a command (e.g. /tmp/code) in a later command - but would then need a separate cleanup command.
Compare
commands:
  fetch:
    run: curl http://s3/build_script > /tmp/file
  build:
    run: /tmp/file
  cleanup:
    run: rm /tmp/file

targets:
  deploy:
    - fetch
    - build
    - cleanup
To
commands:
  fetch:
    run: curl http://s3/build_script > /tmp/file
    ensure: rm /tmp/file
  build:
    run: /tmp/file

targets:
  deploy:
    - fetch
    - build
The point I'm trying (badly) to get at is locality and readability. In the second example, the same piece of "code" that litters is charged with cleaning up after itself once it's done. Further, you can't compose the discrete commands in a way that would litter.
Also: maybe the name ensure is horrible. I suck at naming things. If I had a dog, its name would be: Dog. Or Arnie.
Haha dog :smile: ! Dog come here!