Failing tests on the fixed version
Hi,
There seem to be some unexpected failing test cases on fixed versions, e.g., Express-1f. Here are the steps to reproduce the issue.
My system:
$ sw_vers
ProductName: macOS
ProductVersion: 13.0.1
BuildVersion: 22A400
$ git --version
git version 2.38.1
$ python3 --version
Python 3.10.8
$ node --version
v18.11.0
$ npm --version
8.19.2
# Get BugsJS
$ rm -rf /tmp/bugsjs-issue-11
$ git clone https://github.com/BugsJS/bug-dataset.git /tmp/bugsjs-issue-11
$ cd /tmp/bugsjs-issue-11
# Run `test` on Express-1f
$ python3 main.py -p Express -b 1 -t test -v fixed -o Express-1f-test/
which produces the following log
Cloning into 'express'...
remote: Enumerating objects: 30151, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 30151 (delta 0), reused 0 (delta 0), pack-reused 30149
Receiving objects: 100% (30151/30151), 8.56 MiB | 8.51 MiB/s, done.
Resolving deltas: 100% (17015/17015), done.
Note: switching to 'tags/Bug-1-full'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 8e0080e1 Bug-1 full
/bin/sh: n: command not found
npm WARN deprecated [email protected]: 'native-or-bluebird' is deprecated. Please use 'any-promise' instead.
npm WARN deprecated [email protected]: Deprecated, use jstransformer
npm WARN deprecated [email protected]: Please update to at least constantinople 3.1.1
npm WARN deprecated [email protected]: please upgrade to graceful-fs 4 for compatibility with current and future versions of Node.js
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Legacy versions of mkdirp are no longer supported. Please update to mkdirp 1.x. (Note that the API surface has changed to use Promises in 1.x.)
npm WARN deprecated [email protected]: Legacy versions of mkdirp are no longer supported. Please update to mkdirp 1.x. (Note that the API surface has changed to use Promises in 1.x.)
npm WARN deprecated [email protected]: Legacy versions of mkdirp are no longer supported. Please update to mkdirp 1.x. (Note that the API surface has changed to use Promises in 1.x.)
npm WARN deprecated [email protected]: Please upgrade to v7.0.2+ of superagent. We have fixed numerous issues with streams, form-data, attach(), filesystem errors not bubbling up (ENOENT on attach()), and all tests are now passing. See the releases tab for more information at <https://github.com/visionmedia/superagent/releases>.
npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
npm WARN deprecated [email protected]: Critical security bugs fixed in 2.5.5
npm WARN deprecated [email protected]: Please upgrade to latest, formidable@v2 or formidable@v3! Check these notes: https://bit.ly/2ZEqIau
npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
npm WARN deprecated [email protected]: Mocha v2.0.x is no longer supported.
npm WARN deprecated [email protected]: This module is no longer maintained, try this instead:
npm WARN deprecated npm i nyc
npm WARN deprecated Visit https://istanbul.js.org/integrations for other alternatives.
added 173 packages, and audited 174 packages in 12s
1 package is looking for funding
run `npm fund` for details
38 vulnerabilities (1 low, 6 moderate, 21 high, 10 critical)
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
(node:95589) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
(node:95589) [DEP0066] DeprecationWarning: OutgoingMessage.prototype._headers is deprecated
Number of tests: 737
passes: 718
failures: 21
pending: 0
/bin/sh: istanbul: command not found
21 failing tests. Given that I'm running `test` on the fixed version, I was expecting 0 failures. Please find attached the resulting test_results.json.zip file. Is this somehow expected? Do you spot any misconfiguration on my end?
Miscellaneous
A couple of errors I noticed in the log which may or may not be the reason for the 21 failing tests:
- `/bin/sh: n: command not found`, which is thrown by this line of code.
- `/bin/sh: istanbul: command not found`, which is thrown by the code-coverage step. I've tried to add `sp.call("npm install istanbul", shell=True)` to the `run_npm_install` function to somehow force the installation of `istanbul`, but it didn't work.
-- Best, Jose
After analyzing the provided Docker setup and reproducing the installation steps, I've managed to address the miscellaneous issues.
- The `/bin/sh: n: command not found` error can be addressed by running `npm install -g n`. (I was not aware of the `n` package to manage different versions of node.)
- The `/bin/sh: istanbul: command not found` error can be addressed by running `npm install -g istanbul`. (It seems that `-g` (i.e., global) did the trick.)
And I'm now down to 11 failing tests: 8 unit tests and 3 acceptance tests.
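For anyone hitting the same two errors, a small preflight check along these lines can catch missing tools before `main.py` is run. This is only a sketch: the `REQUIRED_TOOLS` list and the `missing_tools` helper are hypothetical names introduced here, not part of BugsJS, and the list covers only the two commands that were missing in the log above.

```python
import shutil

# Commands the log above reported as "command not found".
# This list is an assumption based on that log, not an official requirement list.
REQUIRED_TOOLS = ("n", "istanbul")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    for tool in missing:
        # A global install is what resolved both errors in this thread.
        print(f"missing: {tool} (try: npm install -g {tool})")
    if not missing:
        print("all required tools found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out when testing the helper.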
Well, according to the Projects/Express/Express_bugs.csv file, there are a total of 732 tests on Express-1, of which 721 pass and 11 fail, and I'm now getting those exact numbers with the `test` command. For the record, here's the list of failing tests on Express-1b and Express-1f.
| Test | JS file |
|---|---|
| "should restore req.params after leaving router" | test/app.router.js |
| "should be invoked instead of auto-responding" | test/res.format.js |
| "should respond with html" | test/res.redirect.js |
| "should escape the url" | test/res.redirect.js |
| "should respond with text" | test/res.redirect.js |
| "should encode the url" | test/res.redirect.js |
| "should set body to \"\"" | test/res.send.js |
| "should invoke the callback on 404" | test/res.sendFile.js |
| "should succeed with proper cookie" | test/acceptance/auth.js |
| "should fail without proper password" | test/acceptance/auth.js |
| "should succeed with proper credentials" | test/acceptance/auth.js |
Point is, if I have the same set of failing tests on the fixed (aka non-buggy) and buggy versions, which tests truly trigger the buggy behavior? I would say none, but I might be missing something here.
Hmm, I think I finally managed to understand the issue. There are 11 failing tests on the fixed and on the buggy version, and 12 failing tests on the fixed-only-test-change version. The 11 tests could be considered broken tests (i.e., they fail for reasons other than the Express-1 bug) and the extra one is the one that triggers the buggy behavior.
| Test | JS file | buggy | fixed | fixed-only-test-change | fault-revealing |
|---|---|---|---|---|---|
| "should restore req.params after leaving router" | test/app.router.js | Fail | Fail | Fail | No |
| "should be invoked instead of auto-responding" | test/res.format.js | Fail | Fail | Fail | No |
| "should respond with html" | test/res.redirect.js | Fail | Fail | Fail | No |
| "should escape the url" | test/res.redirect.js | Fail | Fail | Fail | No |
| "should respond with text" | test/res.redirect.js | Fail | Fail | Fail | No |
| "should encode the url" | test/res.redirect.js | Fail | Fail | Fail | No |
| "should set body to \"\"" | test/res.send.js | Fail | Fail | Fail | No |
| "should invoke the callback on 404" | test/res.sendFile.js | Fail | Fail | Fail | No |
| "should succeed with proper cookie" | test/acceptance/auth.js | Fail | Fail | Fail | No |
| "should fail without proper password" | test/acceptance/auth.js | Fail | Fail | Fail | No |
| "should succeed with proper credentials" | test/acceptance/auth.js | Fail | Fail | Fail | No |
| "should only include each method once" | test/app.options.js | --- | Pass | Fail | Yes |
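The classification in the table boils down to a set difference, which could be sketched as follows. The function name `classify` and the representation of a test as a `(title, file)` pair are assumptions made for illustration; the data is copied from the table above.

```python
def classify(fixed_fail, test_change_fail):
    """Split failing tests into broken and fault-revealing ones.

    fixed_fail:       tests failing on the fixed version
    test_change_fail: tests failing on the fixed-only-test-change version
    """
    broken = set(fixed_fail)  # fail even with the fix applied
    fault_revealing = set(test_change_fail) - broken
    return broken, fault_revealing


# Failing tests on the fixed version, as listed in the table above.
FIXED_FAIL = {
    ("should restore req.params after leaving router", "test/app.router.js"),
    ("should be invoked instead of auto-responding", "test/res.format.js"),
    ("should respond with html", "test/res.redirect.js"),
    ("should escape the url", "test/res.redirect.js"),
    ("should respond with text", "test/res.redirect.js"),
    ("should encode the url", "test/res.redirect.js"),
    ('should set body to ""', "test/res.send.js"),
    ("should invoke the callback on 404", "test/res.sendFile.js"),
    ("should succeed with proper cookie", "test/acceptance/auth.js"),
    ("should fail without proper password", "test/acceptance/auth.js"),
    ("should succeed with proper credentials", "test/acceptance/auth.js"),
}

# The fixed-only-test-change version fails the same 11 tests plus one more.
TEST_CHANGE_FAIL = FIXED_FAIL | {
    ("should only include each method once", "test/app.options.js"),
}

broken, fault_revealing = classify(FIXED_FAIL, TEST_CHANGE_FAIL)
print(len(broken), sorted(fault_revealing))
```

Keying each test by title and file avoids conflating tests that share a title across files.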
Is there any way/procedure/script to automatically remove the 11 broken tests from, at least, the fixed version? Otherwise, how would one be able to compute accurate code coverage or mutation score of a suite that has 11 failing tests?
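One possible workaround, sketched here under assumptions rather than as a BugsJS feature, is to have Mocha skip the broken tests through its `--grep` and `--invert` options. The helpers `js_regex_escape`, `exclusion_pattern`, and `mocha_command` are hypothetical, and whether the BugsJS harness allows extra Mocha flags to be passed through is not confirmed in this thread. Note also that `--grep` matches on the full test title only, so a title that occurs in more than one file would be excluded everywhere.

```python
import re

# Characters with special meaning in a JavaScript regular expression.
_JS_REGEX_SPECIALS = re.compile(r"[.*+?^${}()|\[\]\\]")


def js_regex_escape(text):
    """Backslash-escape regex metacharacters so `text` matches literally."""
    return _JS_REGEX_SPECIALS.sub(lambda m: "\\" + m.group(0), text)


def exclusion_pattern(titles):
    """Build one alternation pattern that matches any of the given titles."""
    return "|".join(js_regex_escape(title) for title in titles)


def mocha_command(titles, mocha="mocha"):
    """Return an argv list that runs Mocha on everything except `titles`."""
    return [mocha, "--grep", exclusion_pattern(titles), "--invert"]


if __name__ == "__main__":
    # Two of the broken titles from the table above, as an example.
    print(mocha_command(["should escape the url", "should encode the url"]))
```

Returning an argv list rather than a shell string avoids quoting problems with titles that contain quotes or backslashes.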
PS: A note to my future self: the fixed-only-test-change version corresponds (roughly) to the buggy version on Defects4J.
Thanks for exploring this, @jose, it really helped me. I wrote some scripts to collect the tests that we should expect to pass on the "fixed" version and fail on the "fixed-only-test-change" version. Attaching them here in case someone else comes across this and tries to use the dataset.
Run `run_all_tests.sh` first inside the BugsJS Docker container, then run `collect_all_tests.py` to read and summarize the test results. Some paths are specific to my setup.