NO_PROXY environment variable is not considered while installing the pack dependencies
SUMMARY
I have the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables set in my setup, and I also maintain an internal PyPI hosting most of the Python dependencies. I want the st2 pack install file:///path/to/pack/folder command to download Python dependencies from this internal PyPI instead of the official PyPI, so I added the internal PyPI URL to the NO_PROXY environment variable. But st2 pack install always tries to fetch the dependencies via the HTTP(S)_PROXY.
STACKSTORM VERSION
st2 3.1.0, on Python 3.6.9
Steps to reproduce the problem
Set the HTTP_PROXY and HTTPS_PROXY environment variables and make sure the NO_PROXY environment variable contains the internal PyPI URL. Now install a pack using st2 pack install; it will still try to fetch the dependencies via the proxy instead of bypassing it.
Expected Results
Ideally during the pack installation, the dependencies should be pulled directly from the internal PyPI avoiding the HTTP(S)_PROXY as the internal PyPI URL is part of the NO_PROXY environment variable.
Actual Results
Downloads the dependencies using the HTTP(S)_PROXY, or fails if the HTTP(S)_PROXY URL is not reachable.
What happened? What output did you get?
I made the proxies set via HTTP(S)_PROXY unreachable and ran st2 pack install file:///path/to/pack/folder, which failed while installing the dependencies:
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',))': /internal/blobs/six/
Ideally it should never check for the HTTP(S)_PROXY and pull the dependencies from the internal PyPI.
Possible bug:
I see that we always set --proxy flag on the pip install command if the HTTP(S)_PROXY env. variable is set or the http_proxy or https_proxy are set in the config file as described here: https://docs.stackstorm.com/packs.html#installing-packs-from-behind-a-proxy and NO_PROXY is not checked.
https://github.com/StackStorm/st2/blob/master/st2common/st2common/util/virtualenvs.py#L244
Are you setting the hostname instead of full URL for your pip registry? Like no_proxy=your-pip-repo,localhost,10.1.0.0/16?
Also make sure to use lower-case no_proxy.
@armab Thank you for the reply.
I tried with the FQDN and also the regex matching the domain like .internal.example.com. But no luck.
no_proxy in the config file? Yes.
Which config file do you mean?
The environment variables need to be set for the respective services (see https://docs.stackstorm.com/packs.html#installing-packs-from-behind-a-proxy), and then the services should be restarted to pick them up.
st2actionrunner is the actual service which runs all the pack install/download commands under the hood.
I remember no_proxy worked before in general, but it might have an edge case for pip?
Yes @armab
My installation runs on a CentOS machine, and the content of both the /etc/sysconfig/st2actionrunner and /etc/sysconfig/st2api files is as below:
http_proxy=http://http-proxy.example.com:1234
https_proxy=https://https-proxy.example.com:1234
no_proxy=127.0.0.1,localhost,.internal.example.com
Thanks for more info!
I looked deeper in the history and this might be helpful to have more context when trying to find the fix these days: https://github.com/StackStorm/st2/pull/3556#discussion_r126786840 and https://github.com/pypa/pip/issues/2440
pip doesn't have an explicit no_proxy setting, so we pass it via an environment variable. Maybe try to debug further here:
https://github.com/StackStorm/st2/blob/ce31176556be506606b83f6854630f8e6c0fd5ea/st2common/st2common/util/virtualenvs.py#L262-L266
and check if env indeed has no_proxy info passed or not.
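One way to run that check, as a minimal sketch: filter the environment mapping handed to the pip subprocess down to its proxy-related keys and log the result. The helper name proxy_related_env is hypothetical and not part of st2; where to call it inside virtualenvs.py is left to whoever is debugging.

```python
import os

# Variable names commonly consulted for proxy settings (matched case-insensitively).
PROXY_KEYS = {"http_proxy", "https_proxy", "no_proxy", "all_proxy"}


def proxy_related_env(env=None):
    """Return only the proxy-related entries of an environment mapping.

    Hypothetical debugging helper, not part of st2.
    """
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k.lower() in PROXY_KEYS}


if __name__ == "__main__":
    # Shows what this process (and any child inheriting its environment) sees.
    print(proxy_related_env())
```

If no_proxy is missing from the mapping passed to the subprocess, the problem is in how the environment is built; if it is present, the problem is more likely in how pip and requests combine it with --proxy.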
I have also encountered this issue. It is due to the way requests handles environment variables specifying proxies, and those that are directly given to the requests.Session constructor itself, via the pip install --proxy ... argument.
In short, requests does the following in requests.Session.merge_environment_settings:
- uses the environment variables https_proxy and no_proxy to decide if https_proxy needs to be respected for the particular url (in requests.utils.get_environ_proxies); and
- merges this proxy information over the explicit session proxies.
Unfortunately, StackStorm currently lifts the value of https_proxy and passes it to pip via pip --proxy ... when it does not need to. This means the following will happen:
# suppose these are in the environment
https_proxy=some-internal-proxy
no_proxy=something-inside-network.com
# and this is either in PIP_INDEX_URL (or perhaps in `/etc/pip.conf`)
PIP_INDEX_URL=https://somewhere.something-inside-network.com/pypi
# then StackStorm `packs.setup_virtualenv` will eventually attempt to install `six` using
pip install --proxy some-internal-proxy six
And then requests:
- creates a "request specific" proxies = {}, correctly indicating that no proxy is required (the url was blacklisted by no_proxy); and
- merges this on top of the explicit session proxies {"http": "some-internal-proxy", "https": "some-internal-proxy"} (generated from the pip --proxy ...); and
- the result of merging an empty dictionary on top means the final decision about proxies is still {"http": "some-internal-proxy", "https": "some-internal-proxy"}
Thus we are left still trying to contact the proxy, when we should not be.
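The merge described above can be modelled in a few lines of plain Python. This is a simplified sketch of the behaviour, not the actual requests code: the function names bypasses_proxy, environ_proxies and merge are made up for illustration, and the no_proxy matching is reduced to a plain host-suffix check.

```python
from urllib.parse import urlparse


def bypasses_proxy(url, no_proxy):
    """True if the URL's host matches an entry in a comma-separated no_proxy."""
    host = (urlparse(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lstrip(".").lower()
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False


def environ_proxies(url, env):
    """Proxies the environment asks for; empty when no_proxy excludes the URL."""
    if bypasses_proxy(url, env.get("no_proxy", "")):
        return {}
    return {
        scheme: env[scheme + "_proxy"]
        for scheme in ("http", "https")
        if env.get(scheme + "_proxy")
    }


def merge(request_setting, session_setting):
    """Session values first, request values on top (simplified model)."""
    merged = dict(session_setting)
    merged.update(request_setting)
    return merged


env = {"https_proxy": "some-internal-proxy", "no_proxy": "something-inside-network.com"}
url = "https://somewhere.something-inside-network.com/pypi"
explicit = {"http": "some-internal-proxy", "https": "some-internal-proxy"}

# pip --proxy some-internal-proxy puts the proxy on the session explicitly.
with_flag = merge(environ_proxies(url, env), explicit)
# Without --proxy the session starts with no explicit proxies.
without_flag = merge(environ_proxies(url, env), {})

print(with_flag)
print(without_flag)
```

In this model, an empty "no proxy needed" result cannot remove the entries that --proxy placed on the session, so only the run without the flag ends up with an empty proxy mapping.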
Possible Solution
StackStorm should not pass --proxy arguments to pip unless http_proxy and https_proxy are not set in the environment. Preferably it should instead let requests pick them up from the environment only, so that no_proxy is honoured.
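A rough sketch of that rule, assuming the proxy may come either from the environment or from a config value. The helper pip_install_args and its parameters are hypothetical and do not mirror the actual code in virtualenvs.py; only the pip --proxy flag itself is real.

```python
def pip_install_args(requirement, config_proxy=None, env=None):
    """Build pip install arguments, adding --proxy only as a fallback.

    Hypothetical helper: the name and parameters are illustrative, not st2's.
    """
    env = {} if env is None else env
    args = ["pip", "install"]
    env_has_proxy = any(
        env.get(name)
        for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")
    )
    # When the environment already carries proxy settings, let requests read
    # them (together with no_proxy) instead of forcing an explicit proxy.
    if config_proxy and not env_has_proxy:
        args += ["--proxy", config_proxy]
    args.append(requirement)
    return args


print(pip_install_args("six", "some-internal-proxy", {"https_proxy": "some-internal-proxy"}))
print(pip_install_args("six", "some-internal-proxy", {}))
```

The first call omits --proxy because the environment already names a proxy; the second keeps it because nothing else would tell pip which proxy to use.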
@donkopotamus Looks like you drilled down the problem really deep with the root-cause analysis. That's exactly the approach needed here.
If you could open a PR with a bugfix and test cases that cover this issue, we'd be happy to review the solution. Contributions are always welcome to StackStorm!