zap-extensions icon indicating copy to clipboard operation
zap-extensions copied to clipboard

spider: do not add seeds to the found list

Open jeremychoi opened this issue 2 years ago • 27 comments

Overview

Fix zaproxy/zaproxy#7737

Checklist

  • [N/A] Update help
  • [x] Update changelog
  • [x] Run ./gradlew spotlessApply for code formatting
  • [x] Write tests
  • [x] Check code coverage
  • [x] Sign-off commits
  • [x] Squash commits
  • [x] Use a descriptive title

For more details, please refer to the developer rules and guidelines.

jeremychoi avatar Oct 04 '23 06:10 jeremychoi

Please see https://github.com/zaproxy/zap-extensions/pull/4963/checks?check_run_id=17379176664

The changelog should be updated. Better to have some tests.

thc202 avatar Oct 04 '23 07:10 thc202

Note that the robots.txt and sitemap.xml URLs still appear in the SiteMap. This makes sense if they really exist, but maybe we should not include them if they are 404s and requested by the spider? I've wondered about this for a while. That doesnt have to be part of this PR, just a comment 😁

psiinon avatar Oct 04 '23 08:10 psiinon

Better to have some tests.

maybe I could add a test like

  1. spiderController.addSeed(uri, ...)
  2. assert if the uri does not exist in foundURIs.

Does that makes sense?

BTW, since I'm yet to learn more about the ZAP code, I'm not sure at the moment how I can access 'foundURIs' in SpiderControllerUnitTest.Java? @thc202 Could you give me an example?

jeremychoi avatar Oct 04 '23 23:10 jeremychoi

maybe we should not include them if they are 404s and requested by the spider?

+1 from a user perspective :)

jeremychoi avatar Oct 04 '23 23:10 jeremychoi

The changelog should be updated.

Could you let me know where the changelog is? Maybe it'd be better if the information could be found in https://github.com/zaproxy/zaproxy/blob/main/CONTRIBUTING.md#guidelines-for-pull-request-pr-submission-and-processing

jeremychoi avatar Oct 04 '23 23:10 jeremychoi

https://github.com/zaproxy/zap-extensions/blob/main/addOns/spider/CHANGELOG.md

kingthorin avatar Oct 04 '23 23:10 kingthorin

For the record that's in https://github.com/zaproxy/zap-extensions/blob/main/CONTRIBUTING.md#changelog

thc202 avatar Oct 05 '23 07:10 thc202

maybe I could add a test like

  1. spiderController.addSeed(uri, ...)
  2. assert if the uri does not exist in foundURIs.

Does that makes sense?

BTW, since I'm yet to learn more about the ZAP code, I'm not sure at the moment how I can access 'foundURIs' in SpiderControllerUnitTest.Java? @thc202 Could you give me an example?

@thc202 or anyone, could I get any info which will be helpful?

jeremychoi avatar Oct 30 '23 02:10 jeremychoi

In SpiderControllerUnitTest something like:

@Test
void shouldNotNotifySeedAsFoundUri() {
    // Given
    URI seed = …
    // When
    spiderController.addSeed(seed, …);
    // Then
    verify(spider, times(0)).notifyListenersFoundURI(eq(seed), …);
}

although this is not correct (IMO) we should still notify about the seeds used, the referenced issue was about the count of found URIs so the fix should be to ignore seeds when counting.

thc202 avatar Oct 30 '23 07:10 thc202

@jeremychoi do you still plan to finish this?

kingthorin avatar Jan 08 '24 00:01 kingthorin

@kingthorin will try to find time to finish this week.

jeremychoi avatar Jan 08 '24 06:01 jeremychoi

@kingthorin @thc202 please review. @thc202 we should still notify about the seeds used => In that case, IMO it should be implemented in a separate way, instead of calling notifyListenersFoundURI() whose name doesn't fit here, because it's not that seeds are found.

jeremychoi avatar Jan 09 '24 00:01 jeremychoi

Won't this mean that none of the seeds are marked found, though we only want to skip the "artificial" seeds (robots, sitemap, ds_store)?

@kingthorin you're right. would this work if we checked the found-URI belongs to the aritificial seeds?

jeremychoi avatar Jan 18 '24 23:01 jeremychoi

I'd be good with that.

kingthorin avatar Jan 19 '24 00:01 kingthorin

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

github-actions[bot] avatar Feb 27 '24 06:02 github-actions[bot]

I have read the CLA Document and I hereby sign the CLA

jeremychoi avatar Feb 27 '24 06:02 jeremychoi

recheck

jeremychoi avatar Feb 27 '24 06:02 jeremychoi

Could anyone help with solving this error?

A failure occurred while executing me.champeau.gradle.japicmp.JApiCmpWorkAction 937 actionable tasks: 99 executed, 311 from cache, 527 up-to-date Detected binary changes. - current: spider-0.11.0.jar - baseline: spider-0.6.0.jar.

jeremychoi avatar Feb 28 '24 00:02 jeremychoi

@jeremychoi are you able to keep going with this?

kingthorin avatar May 14 '24 15:05 kingthorin

@kingthorin Thanks for the heads up. Yes. Sorry for the delay. I'll find time this weekend.

jeremychoi avatar May 15 '24 06:05 jeremychoi

Can we have a clarification of what the final behaviour is going to be?

thc202 avatar May 15 '24 06:05 thc202

@thc202

Can we have a clarification of what the final behaviour is going to be?

the file seeds(e.g. robots, sitemap) will not be notified as 'foundURI'.

jeremychoi avatar May 20 '24 01:05 jeremychoi

We are back to the remark in https://github.com/zaproxy/zap-extensions/pull/4963#issuecomment-1784616816.

thc202 avatar May 20 '24 06:05 thc202

@thc202 we should still notify about the seeds used, the referenced issue was about the count of found URIs so the fix should be to ignore seeds when counting. => do you mean file seeds should be notified as found URIs?

jeremychoi avatar May 22 '24 02:05 jeremychoi

Please feel free to close this if my last contribution is not good enough. Unfortunately I'm afraid I have no bandwidth to further work on this recently.

jeremychoi avatar Jun 21 '24 09:06 jeremychoi

We are working on it.

thc202 avatar Jun 21 '24 09:06 thc202

@jeremychoi thanks for getting it started and thanks for letting us know.

kingthorin avatar Jun 21 '24 09:06 kingthorin