"Re-run failed jobs" will not work with a parallel workflow
After implementing a custom build ID so I could re-run workflows that are integrated with Cypress Dashboard and configured to run in parallel, I've run into an issue (oddly different from this one).
Here is my job FWIW:
```yml
- id: cypress-mocked-api-tests
  uses: cypress-io/github-action@v2
  with:
    wait-on: 'https://localhost:9001/index.js'
    start: npm run start:${{ env.NODE_ENV }}
    config-file: cypress/config/${{ env.NODE_ENV }}.json
    config: video=true,videoUploadOnPasses=false
    spec: '**/*.spec.ts'
    install: false
    record: true
    parallel: true
    group: 'Mocked-API'
    ci-build-id: ${{ needs.prepare.outputs.uuid }}
```
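For context, the `ci-build-id` above references an output from a `prepare` job. A minimal sketch of such a job (hypothetical, assuming the original workflow generates the shared build ID with `uuidgen`; the original `prepare` job is not shown in this thread):

```yml
# Hypothetical "prepare" job that produces a stable build ID shared by all
# parallel containers, so Cypress Dashboard can group them into one run.
prepare:
  runs-on: ubuntu-latest
  outputs:
    uuid: ${{ steps.uuid.outputs.value }}
  steps:
    - id: uuid
      # Write the generated UUID to the step's outputs
      run: echo "value=$(uuidgen)" >> "$GITHUB_OUTPUT"
```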
This job will load balance all my spec files across five containers under a "Mocked-API" group. This works great and I can re-run all jobs without issue.
On a recent run, one of the five containers failed because one test failed. I thought I'd test how "Re-run failed jobs" worked on just the failed container job. My hope/expectation was that it would be smart enough to know which spec files that container ran when the entire workflow executed originally (six spec files containing 22 tests) and re-run those. Instead it ran zero spec files and completed successfully. It seems like the matrix-level orchestration that is needed does not occur when only a failed container job is re-run. It looks like someone else has run into this issue too and is trying to solve it by disabling the "Re-run failed jobs" option in GitHub (which doesn't seem possible).
This is a fairly big problem because it resulted in the group (which I've configured as a status check in my trunk branch protection rule) passing and the PR being able to be merged when it had never successfully run all tests.
@mellis481 We recommend passing the GITHUB_TOKEN secret (created by the GH Action automatically) as an environment variable. This will allow correctly identifying every build and avoid confusion when re-running a build.
You can find an example here: https://github.com/cypress-io/github-action#record-test-results-on-cypress-dashboard
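For anyone following along, passing the token just means adding it to the action step's environment. A minimal sketch based on the linked README example (step names and other inputs are illustrative, not from the workflow in this thread):

```yml
- uses: cypress-io/github-action@v2
  with:
    record: true
  env:
    CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
    # GITHUB_TOKEN is created automatically by GitHub Actions; Cypress uses it
    # to associate recorded runs with the correct workflow run and re-runs.
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```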
@conversaShawn That did nothing. This is what happened:
- I added the GITHUB_TOKEN as an env variable to my cypress-io/github-action@v2 job in my PR workflow.
- I added a failing test to my suite.
- I ran the workflow, which is configured to run in parallel using 5 containers. The test failed on Machine 5.
- I executed "Re-run failed jobs".
- On the second workflow run, Machine 5 executed 0 tests and passed.
We are seeing exactly the same issue: re-running only the failed jobs does not run any tests, but marks each job as passed.
Here is our configuration:
```yml
- name: Run integration tests
  timeout-minutes: 20
  uses: cypress-io/github-action@v4
  env:
    CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  with:
    ci-build-id: ${{ needs.prepare.outputs.uuid }}
    config: baseUrl=${{ format('https://pr{0}-www.build.{1}', github.event.number, env.CBR_PROJECT_DOMAIN) }}
    wait-on: ${{ format('https://pr{0}-www.build.{1}', github.event.number, env.CBR_PROJECT_DOMAIN) }}
    wait-on-timeout: 120
    browser: chrome
    record: true
    parallel: true
    group: merge
    install: false
    working-directory: tests/web
```
Same case here. Tests pass without execution after retrying failed jobs.
There were recently some changes in our services repo that may have taken care of this issue. Can someone retest with 10.7.0 or later and post results? Thanks!
@admah Thanks for contributing to this thread! I just tested with 10.8.0 and it did NOT work correctly. What I'm seeing now is different and not nearly as problematic as the initially-reported issue (the most egregious part of which was passing a workflow after re-running a workflow with a failing Cypress test), but still incorrect. To provide more details...
I added a failing test to my repo that is currently configured to balance my 39 Cypress spec files across five containers. As expected, the job for the container that had the new failing test failed while all other jobs completed successfully.

I then selected to "Re-run failed jobs". When I did this, it created a new workflow run which essentially copied the jobs that completed successfully in the first run and started re-running the one failing job. When I went into Cypress Dashboard to inspect this re-run further, I found that it was running specs in only one container (good), but it was running all 39 specs in that container (bad/whacky).

It should have re-run only the specs that container originally ran in the first run (in my case 7 specs). The failing test in this workflow run did end up failing the job and, subsequently, the workflow, as desired, but it's, of course, undesirable for "Re-run failed jobs" to re-run all Cypress specs. It's not re-running the failed (Cypress) jobs at that point; it's re-running all Cypress tests using the number of containers that failed in the original run.
@mellis481 thanks for the screenshots and additional context. That's very helpful. I was able to get some more clarity on this from our Cloud team.
Here is the current status:
- Before, there was an issue where all re-runs got a PASS, regardless of actual status. This issue has been fixed.
- Currently, if a re-run is initiated, all specs get run on the machines available. That is not optimal. The Cloud team is looking into the connection between GH Actions and Cypress in order to set up re-runs to be accurate and efficient.
@admah I'm glad the update from your Cloud team matches my findings (in far fewer words :smile:).
Hoping additional info on the second bullet will be shared in this thread when available.
@mellis481 yes, I will be providing updates as they're available.
I will be closing this and updating in #531 since this is a duplicate of that issue.
Duplicate of #531