urlchecker-python icon indicating copy to clipboard operation
urlchecker-python copied to clipboard

Add/asyncio aiohttp acceleration

Open SuperKogito opened this issue 5 years ago • 3 comments

This PR :

  • is related to #50
  • implements urls checks accelerations using asyncio and aiohttp libs instead of requests
  • adds more test files and tests for better coverage
  • drops support for python 3.5
  • attempts to fix urlchecker-action/issues/76

@vsoch I tried my best to make sure that all the features are working similarly to previous versions. However, due to the libs differences from requests, the timeout arg is now defined in minutes. This PR will need some thorough testing due to the implemented major changes in the core of urlchecker. However, no stress so take your time :wink:

SuperKogito avatar Feb 14 '21 20:02 SuperKogito

Codecov Report

Merging #52 (595a5dc) into master (08fc1bb) will increase coverage by 13.98%. The diff coverage is 94.54%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #52       +/-   ##
===========================================
+ Coverage   76.76%   90.75%   +13.98%     
===========================================
  Files          12       12               
  Lines         383      400       +17     
===========================================
+ Hits          294      363       +69     
+ Misses         89       37       -52     
Impacted Files Coverage Δ
urlchecker/client/__init__.py 78.43% <ø> (+3.92%) :arrow_up:
urlchecker/logger.py 100.00% <ø> (+57.14%) :arrow_up:
urlchecker/client/check.py 85.93% <75.00%> (+61.34%) :arrow_up:
urlchecker/core/urlproc.py 96.52% <95.91%> (+4.13%) :arrow_up:
urlchecker/version.py 100.00% <100.00%> (ø)
urlchecker/core/check.py 86.66% <0.00%> (+3.33%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 08fc1bb...595a5dc. Read the comment docs.

codecov[bot] avatar Feb 14 '21 21:02 codecov[bot]

I don't see where this comment is but I wanted to respond:

according to aiohttp the errors can happen in the session or the get call. The reason why I went with this implementation is that it was the best one to check the urls correctly. Also the tests don't necessarily provide the best coverage of all the exception possibilities that's why I am skeptical about the alternatives. One particular error I kept getting and I am not sure how to react to it other than increasing the pause variable was [429 too many requests] and that is why I tried to diversify the urls in the test files (that way I won't overwhelm the server).

This is actually my original concern - most servers implement rate limiting, so if you do too many requests too soon, you'll get a 429 and the ip address is blocked. We would want to speed up many things with aiohttp potentially, but checking (and retrying) urls is very likely to lead to being blocked (and the serial approach with a timeout is better). So this is my main concern for wanting to update this to a newer technology, aside from not supporting under python 3.5, we actually add a feature that makes it more likely to have an error.

vsoch avatar Feb 15 '21 19:02 vsoch

You are right about this, that's why I asked #52 if we should add this as an argument activated feature. Should we drop this?

SuperKogito avatar Feb 15 '21 19:02 SuperKogito