CF restart --rolling can cause a healthy app to vanish (downtime)

Open Rand0mF opened this issue 3 years ago • 0 comments

Please fill out the issue checklist below and provide ALL the requested information.

[x] I reviewed open and closed github issues that may be related to my problem.
[x] I tried updating to the latest version of the CF CLI to see if it fixed my problem.
[x] I attempted to run the command with CF_TRACE=1 to help debug the issue.
[x] I am reporting a bug that others will be able to reproduce.

Describe the bug and the command you saw an issue with

A healthy app is running as version V1 on CF. If a new app version V2 is pushed with cf push --strategy rolling and, while that push is still in progress, a rolling restart is triggered with cf restart --strategy=rolling, V1 is replaced by V2 even though V2 never became healthy.

What happened

On app changes, the pipeline deploys the new app version to CF with cf push --strategy rolling. It is possible that this app version never becomes healthy, in which case the command eventually fails. This is fine. However, if someone executes cf restart --strategy=rolling during this push attempt, the end result is downtime.

Expected behavior

CF --rolling commands should never cause downtime. The rolling restart should use the last healthy app version as its target, not simply the latest version. The documentation states there will be no downtime for this command.
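To make the expected selection rule concrete, here is a minimal sketch. It only illustrates the behavior requested in this report and is not how the Cloud Controller is implemented; the `versions` array, its `healthy` flag, and the function name `pickRestartTarget` are hypothetical.

```javascript
// Hypothetical model: one entry per pushed app version, oldest first.
// `healthy` records whether that version ever passed its health check.
function pickRestartTarget(versions) {
  // Walk from newest to oldest and return the first version that was healthy.
  for (let i = versions.length - 1; i >= 0; i--) {
    if (versions[i].healthy) {
      return versions[i];
    }
  }
  // No healthy version exists, so there is nothing safe to roll to.
  return null;
}

const versions = [
  { name: 'V1', healthy: true },
  { name: 'V2', healthy: false }, // the push that never became healthy
];

// Behavior observed in this report: the restart targets the latest version.
const observed = versions[versions.length - 1];
// Behavior expected in this report: the restart targets the latest healthy version.
const expected = pickRestartTarget(versions);

console.log(observed.name, expected.name); // prints "V2 V1"
```

With this rule, a rolling restart issued while V2 is still failing its health check would keep serving V1.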

Exact Steps To Reproduce

Push a healthy app to CF with HTTP as the health check type.

// Healthy version (V1): the health check endpoint answers with an empty JSON body.
app.get('/health', (req, res) => {
    res.json({});
});

---
applications:
  - name: test-app
    timeout: 180
    stack: cflinuxfs3
    command: npm start
    buildpacks:
      - "https://github.com/cloudfoundry/nodejs-buildpack.git#v1.7.40"
    processes:
      - type: web
        instances: 1
        memory: 500MB
        health-check-type: http
        health-check-http-endpoint: "/health"
        health-check-invocation-timeout: 10
        timeout: 30

Modify the source code to simulate an unhealthy app.

// Unhealthy version (V2): no response is ever sent, so the HTTP health check times out.
app.get('/health', (req, res) => {
    //res.json({});
});

Push the modified source code: cf push --strategy rolling
Notice that the command waits for the app to become healthy (which it never will).
While the first command is still waiting, execute cf restart --strategy=rolling test-app
Notice that the push command immediately exits with Deployment has been superseded
Notice that cf apps lists 3 processes: web:1/1, web:0/1, web:0/1
Wait until the restart command times out waiting for the app to become healthy.
cf app now shows a single process for the app, and it is not running: web:0/1. It seems that cf restart falls back to the latest version instead of the last healthy version.
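The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a verified reproduction harness: the helper names `reproCommands`, `cf`, and `runRepro` are made up for illustration, and `delayMs` is an assumed pause that must be long enough for the push to reach its waiting phase. The cf commands themselves are the ones listed in the steps.

```javascript
const { spawn } = require('node:child_process');

// Argument lists for the cf commands used in the steps above.
function reproCommands(appName) {
  return {
    push: ['push', '--strategy', 'rolling'],
    restart: ['restart', appName, '--strategy', 'rolling'],
    status: ['app', appName],
  };
}

// Run `cf <args>` and resolve with its exit code; never rejects, so a
// missing cf binary shows up as `error` instead of an unhandled rejection.
function cf(args) {
  return new Promise((resolve) => {
    const child = spawn('cf', args, { stdio: 'inherit' });
    child.on('error', (error) => resolve({ code: null, error }));
    child.on('exit', (code) => resolve({ code, error: null }));
  });
}

// Start the unhealthy push, wait `delayMs` (assumption), then trigger the
// rolling restart while the push is still waiting for a healthy instance.
async function runRepro(appName, delayMs) {
  const cmds = reproCommands(appName);
  const pushDone = cf(cmds.push);
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  const restart = await cf(cmds.restart);
  const push = await pushDone;
  await cf(cmds.status); // final state of the app, as in the last step
  return { push, restart };
}
```

Calling `runRepro('test-app', 60000)` from the directory containing the modified source and manifest would run the sequence; the 60 second delay is a guess and may need adjusting for the staging time of the app.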

Provide more context

CF API version: 3.113.0
CF CLI version: 8.2.0+fd8fbca64.2022-02-09 (also tested with cf cli v7)
Platform: Linux (5.4.0-89-generic #100~18.04.1-Ubuntu)

Feb 18 '22 12:02 Rand0mF