"Re-run failed jobs" will not work with a parallel workflow
After implementing a custom build ID so I could re-run workflows that are integrated with Cypress Dashboard and configured to run in parallel, I've run into an issue (oddly different from this one).
Here is my job FWIW:
```yml
- id: cypress-mocked-api-tests
  uses: cypress-io/github-action@v2
  with:
    wait-on: 'https://localhost:9001/index.js'
    start: npm run start:${{ env.NODE_ENV }}
    config-file: cypress/config/${{ env.NODE_ENV }}.json
    config: video=true,videoUploadOnPasses=false
    spec: '**/*.spec.ts'
    install: false
    record: true
    parallel: true
    group: 'Mocked-API'
    ci-build-id: ${{ needs.prepare.outputs.uuid }}
```
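For context, the `ci-build-id` above references an output from a `prepare` job. A minimal sketch of such a job (hypothetical, assuming the original workflow generates the shared build ID with `uuidgen`; the original `prepare` job is not shown in this thread):

```yml
# Hypothetical "prepare" job that produces a stable build ID shared by all
# parallel containers, so Cypress Dashboard can group them into one run.
prepare:
  runs-on: ubuntu-latest
  outputs:
    uuid: ${{ steps.uuid.outputs.value }}
  steps:
    - id: uuid
      # Write the generated UUID to the step's outputs
      run: echo "value=$(uuidgen)" >> "$GITHUB_OUTPUT"
```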
This job will load balance all my spec files across five containers under a "Mocked-API" group. This works great and I can re-run all jobs without issue.
On a recent run, one of the five containers failed because one test failed. I thought I'd test how "Re-run failed jobs" worked on just the failed container job. My hope/expectation was that it would be smart enough to know which spec files that container ran when the entire workflow executed originally (six spec files containing 22 tests) and re-run those. Instead it ran zero spec files and completed successfully. It seems like the matrix-level orchestration that is needed does not occur when only a failed container job is re-run. It looks like someone else has run into this issue too and is trying to solve it by disabling the "Re-run failed jobs" option in GitHub (which doesn't seem possible).
This is a fairly big problem because it resulted in the group (which I've configured as a status check in my trunk branch protection rule) passing and the PR being able to be merged when it had never successfully run all tests.
@mellis481 We recommend passing the GITHUB_TOKEN secret (created by the GH Action automatically) as an environment variable. This will allow correctly identifying every build and avoid confusion when re-running a build.
You can find an example here: https://github.com/cypress-io/github-action#record-test-results-on-cypress-dashboard
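For anyone following along, passing the token just means adding it to the action step's environment. A minimal sketch based on the linked README example (step names and other inputs are illustrative, not from the workflow in this thread):

```yml
- uses: cypress-io/github-action@v2
  with:
    record: true
  env:
    CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
    # GITHUB_TOKEN is created automatically by GitHub Actions; Cypress uses it
    # to associate recorded runs with the correct workflow run and re-runs.
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```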
@conversaShawn That did nothing. This is what happened:
- I added the GITHUB_TOKEN as an env variable to my cypress-io/github-action@v2 job in my PR workflow.
- I added a failing test to my suite.
- I ran the workflow, which is configured to run in parallel using 5 containers. The test failed on Machine 5.
- I executed "Re-run failed jobs".
- On the second workflow run, Machine 5 executed 0 tests and passed.
We are seeing exactly the same issue: re-running only the failed jobs does not run any tests, but marks each job as passed.
Here is our configuration:
```yml
- name: Run integration tests
  timeout-minutes: 20
  uses: cypress-io/github-action@v4
  env:
    CYPRESS_RECORD_KEY: ${{ secrets.CYPRESS_RECORD_KEY }}
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  with:
    ci-build-id: ${{ needs.prepare.outputs.uuid }}
    config: baseUrl=${{ format('https://pr{0}-www.build.{1}', github.event.number, env.CBR_PROJECT_DOMAIN) }}
    wait-on: ${{ format('https://pr{0}-www.build.{1}', github.event.number, env.CBR_PROJECT_DOMAIN) }}
    wait-on-timeout: 120
    browser: chrome
    record: true
    parallel: true
    group: merge
    install: false
    working-directory: tests/web
```
Same case here. Tests pass without execution after retrying failed jobs.
There were recently some changes in our services repo that may have taken care of this issue. Can someone retest with 10.7.0 or later and post results? Thanks!
@admah Thanks for contributing to this thread! I just tested with 10.8.0 and it did NOT work correctly. What I'm seeing now is different and not nearly as problematic as the initially-reported issue (the most egregious part of which was passing a workflow after re-running a workflow with a failing Cypress test), but still incorrect. To provide more details...
I added a failing test to my repo that is currently configured to balance my 39 Cypress spec files across five containers. As expected, the job for the container that had the new failing test failed while all other jobs completed successfully.

I then selected to "Re-run failed jobs". When I did this, it created a new workflow run which essentially copied the jobs that completed successfully in the first run and started re-running the one failing job. When I went into Cypress Dashboard to inspect this re-run further, I found that it was running specs in only one container (good), but it was running all 39 specs in that container (bad/whacky).

It should have re-run only the specs that container originally ran in the first run (in my case 7 specs). The failing test in this workflow run did end up failing the job and, subsequently, the workflow, as desired, but it's, of course, undesirable for "Re-run failed jobs" to re-run all Cypress specs. It's not re-running the failed (Cypress) jobs at that point; it's re-running all Cypress tests using the number of containers that failed in the original run.
@mellis481 thanks for the screenshots and additional context. That's very helpful. I was able to get some more clarity on this from our Cloud team.
Here is the current status:
- Before, there was an issue where all re-runs got a PASS, regardless of actual status. This issue has been fixed.
- Currently, if a re-run is initiated, all specs get run on the machines available. That is not optimal. The Cloud team is looking into the connection between GH Actions and Cypress in order to set up re-runs to be accurate and efficient.
@admah I'm glad the update from your Cloud team matches my findings (in far fewer words :smile:).
Hoping additional info on the second bullet will be shared in this thread when available.
@mellis481 yes, I will be providing updates as they're available.
I will be closing this and updating in #531 since this is a duplicate of that issue.
Duplicate of #531