Increase timeout duration for get request in `fetch_http`
The current timeout of 5 seconds is insufficient for fetching archives like https://www.busybox.net/downloads/busybox-1.01.tar.bz2 and https://www.uclibc.org/downloads/uClibc-0.9.30.tar.gz, since these websites are a bit slow to respond.
@keshav-space is this because of https://github.com/nexB/scancode.io/blob/d6389b28841c4edf25075208eaf0708658650d06/scanpipe/pipes/fetch.py#L380 ?
@pombredanne No.
The problem is here, in `fetch_http`:
https://github.com/nexB/scancode.io/blob/d6389b28841c4edf25075208eaf0708658650d06/scanpipe/pipes/fetch.py#L99
The current timeout of 5 seconds is insufficient for fetching these archives. From the requests documentation:
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine
As a clarification, the timeout value here is not the time available to fetch the whole file, just the time allowed to get a response from the server.
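To illustrate the distinction, here is a hedged sketch of how the connect and read timeouts can be set independently in requests by passing a 2-tuple; the function name `fetch_http` and the specific timeout values are illustrative, not the actual scancode.io implementation:

```python
import requests


def fetch_http(url, connect_timeout=10, read_timeout=30):
    """Illustrative sketch: a 2-tuple timeout lets the connect timeout be
    raised for slow servers without capping the time to read the response.

    requests interprets timeout=(a, b) as (connect timeout, read timeout);
    the read timeout applies between bytes received, not to the whole
    download, so large archives are not cut off by it.
    """
    return requests.get(url, timeout=(connect_timeout, read_timeout))
```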
That being said, those server URLs are extremely slow to provide a connection answer, over 10 seconds at times. I'm not sure what would be the best timeout value here, as raising it too much may have unwanted consequences.
@keshav-space what's your take?
@tdruez Some of these URLs are very inconsistent in their response time. For example, the first request for https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.3.11.tar.gz takes around ~20 seconds, but requesting the same archive a second time takes less than ~5 seconds; some of these websites are definitely doing delivery optimization.
@keshav-space So we may want to implement a combination of an automatic retry on timeout exceptions plus raising the default timeout value to something around 10 seconds.
@tdruez yes, that should work.
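The retry-plus-higher-timeout combination suggested above could look roughly like the sketch below. The helper name, the retry count, and the injectable `get` parameter (added here so the logic can be exercised without network access) are all assumptions, not the actual fix:

```python
import requests

# Raised from 5 to 10 seconds, as suggested above.
DEFAULT_TIMEOUT = 10
MAX_RETRIES = 3


def fetch_with_retry(url, get=requests.get, retries=MAX_RETRIES, timeout=DEFAULT_TIMEOUT):
    """Retry the GET request when it times out.

    Retries up to `retries` times on requests.exceptions.Timeout, then
    re-raises the last timeout error if every attempt fails.
    """
    last_exc = None
    for _ in range(retries):
        try:
            return get(url, timeout=timeout)
        except requests.exceptions.Timeout as exc:
            last_exc = exc
    raise last_exc
```

An alternative would be mounting a `requests.adapters.HTTPAdapter` with a `urllib3` `Retry` policy on a `Session`, but connect-timeout retries are simple enough to express in a plain loop like this.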