Switch cloudant provider from cloudant library to ibmcloudant library
closes: #21004; cloudant is no longer developed and ibmcloudant is its active replacement
This is a breaking change to the provider in the following ways:
- `get_conn` now returns a `CloudantV1` object with different function names than the previous `Cloudant` object; the mappings from the old library functions to the new ones are outlined here: https://github.com/cloudant/python-cloudant/blob/master/MIGRATION.md#reference-table
- `get_conn` now directly returns a usable object instead of a context manager; `with`-block syntax is no longer needed
- The connection is now checked for a mandatory `host` value that functions as the Cloudant account name (previously the host field was optional, but could cause a runtime failure)
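A minimal sketch of the usage change (the method names follow the migration table linked above; the hook object and the helper below are illustrative, not the provider's actual code):

```python
# Old (python-cloudant): get_conn returned a context manager.
#     with hook.get_conn() as client:
#         dbs = client.all_dbs()
#
# New (ibmcloudant): get_conn returns a CloudantV1 client directly, and
# responses are unwrapped with .get_result() -- no with-block needed.
#     client = hook.get_conn()
#     dbs = client.get_all_dbs().get_result()

def list_databases(client):
    """Illustrative helper: works with any CloudantV1-like client."""
    return client.get_all_dbs().get_result()
```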
Rats, looks like the joke is on me and this PR is blocked on another third-party...ibmcloudant has a dependency conflict with the Snowflake Python library that nerfs the Airflow build for Python 3.8 and 3.9.
There's an issue for this in the Snowflake repo (two issues actually) with an open PR, so hopefully it will be fixed soon and this Cloudant PR will unblock.
Ironic as this issue stalled for a while waiting for Flask-AppBuilder to fix a dependency pin.
Details:
`ibmcloudant` brings in `ibm-cloud-sdk-core`, which requires urllib3 2.x. That is fine for core Airflow, but the snowflake provider brings in `snowflake-connector-python`, which (for Python 3.8 and 3.9 only) requires urllib3 1.x.
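The clash can be sketched as a pair of requirement pins (specifiers paraphrased for illustration, not copied from the packages' metadata) that no resolver can satisfy on 3.8/3.9:

```text
# ibmcloudant -> ibm-cloud-sdk-core
urllib3 >= 2.0

# snowflake-connector-python (environment marker paraphrased)
urllib3 < 2.0 ; python_version < "3.10"
```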
You could potentially also add "3.8" and "3.9" to `excluded-python-versions` in the new version of the cloudant provider (in provider.yaml), with a link to the snowflake issue - and remove the exclusions once this problem is fixed.
See https://github.com/apache/airflow/pull/41548 - where we just removed such an exclusion for Papermill for Python 3.12.
I've added the "all-versions" tag to your PR - to make sure it's tested on all versions - so once you exclude the python versions you should see both CI images and tests run for cloudant only on the 3.10 -> 3.12 images and be skipped for 3.8 and 3.9.
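For illustration, the exclusion might look like this in the cloudant provider's provider.yaml (the field name comes from the comment above; the surrounding keys are omitted):

```yaml
# provider.yaml fragment (hypothetical): skip 3.8/3.9 until the
# snowflake/urllib3 conflict is resolved
excluded-python-versions:
  - "3.8"
  - "3.9"
```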
Great explanation, thank you @potiuk
Ok @potiuk I've added the exclusion and an explanatory comment, so I think this is complete. Please review at your convenience, thanks.
You should also add a note at the top of the changelog (see the comment at the top of the changelog) and bump the version to a new major release in provider.yaml.
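For illustration, the version bump might look like this in provider.yaml (the version number is assumed, not taken from the actual release):

```yaml
# provider.yaml fragment (hypothetical): newest version first; a new
# major release signals the breaking change described above
versions:
  - 4.0.0
```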
edit: moved this comment to an issue #41718
@potiuk @eladkal we have 3 persistent failures in the provider checks; tldr, I think there's a bug-ish in the provider checks commands in breeze where all providers are assumed to be supported on Python 3.8.
The command `breeze release-management verify-provider-packages --use-packages-from-dist --package-format wheel --use-airflow-version wheel --airflow-constraints-reference default --providers-constraints-location /files/constraints-3.8/constraints-source-providers-3.8.txt` installs all of the provider wheels built in the earlier `breeze release-management prepare-provider-packages ...` step.
This process all runs on 3.8 and so implicitly assumes that all of the providers are legit for 3.8; neither step appears to check `excluded-python-versions`. So the install fails due to the ibmcloudant and snowflake dependency conflict (the very reason we excluded 3.8 in the first place).
These commands happen in Provider packages wheel build and verify, Compat 2.8.4:P3.8 provider check and Compat 2.9.1:P3.8 provider check.
Addressing this is obviously your call, but maybe some options are:

- Make `prepare-provider-packages` check excluded versions before building a particular provider wheel
- Add a "Remove incompatible provider packages" task to Provider packages wheel build and verify, and have the task reference the provider dependencies json and the excluded python versions
- Run the provider checks on every supported python version (after doing 1 or 2)
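Option 2 above could be sketched roughly like this (the function and the JSON layout are assumptions for illustration, not breeze's actual code; only the `excluded-python-versions` field name comes from provider.yaml):

```python
def compatible_providers(provider_dependencies: dict, python_version: str) -> list[str]:
    """Return provider ids whose metadata does not exclude the given Python.

    `provider_dependencies` mimics the generated provider-dependencies JSON:
    a mapping of provider id to its metadata dict, where the (assumed)
    'excluded-python-versions' key mirrors the provider.yaml field.
    """
    return [
        name
        for name, meta in provider_dependencies.items()
        if python_version not in meta.get("excluded-python-versions", [])
    ]
```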
I checked the repo and none of the released providers exclude Python 3.8, I figure maybe this problem hasn't come up yet.
We also have spurious failures where one of the tests in "Lowest direct dependency resolution tests" and one in "Kubernetes tests" will fail, but it looks like a different one each time. Suspiciously the failures take about twice as long as any of the successes (2hrs vs ~55mins) so maybe a resource starvation or a hang in a testing resource.
> We also have spurious failures where one of the tests in "Lowest direct dependency resolution tests" and one in "Kubernetes tests" will fail, but it looks like a different one each time. Suspiciously the failures take about twice as long as any of the successes (2hrs vs ~55mins) so maybe a resource starvation or a hang in a testing resource.
Yes. Some flakes.
> I checked the repo and none of the released providers exclude Python 3.8, I figure maybe this problem hasn't come up yet.
Yes. As I wrote in #41718 - likely a matrix for all providers and python versions is far too big an overhead for this one - this is the first time we have a provider that does not work for 3.8. So likely removing the providers that are not 3.8 compatible is the way to go - feel free to do so. We already have code that removes non-compatible providers in compatibility checks - you can add ibmcloudant there (with a comment referring to the conflict and issue with snowflake). Similarly you can even hard-code a step after building the providers in verify providers (again - with a comment and link to the snowflake issue and a note that it should be removed after the issue is solved).
Here is the compatibility check config; you can simply add ibmcloudant there for all versions. We could theoretically also add 3.10 there and NOT remove cloudant, but taking into account the (hopefully) temporary state of the removal, I don't think we need to.
```python
BASE_PROVIDERS_COMPATIBILITY_CHECKS: list[dict[str, str | list[str]]] = [
    {
        "python-version": "3.8",
        "airflow-version": "2.8.4",
        "remove-providers": "fab",
        "run-tests": "true",
    },
    {
        "python-version": "3.8",
        "airflow-version": "2.9.3",
        "remove-providers": "",
        "run-tests": "true",
    },
    {
        "python-version": "3.8",
        "airflow-version": "2.10.0",
        "remove-providers": "",
        "run-tests": "true",
    },
]
```
There is one more issue - when a package is excluded from a certain version, the 'lowest dependencies' test goes wild - it tests all the other packages then. This should be fixed as part of the CI infra fix.
I added a fixup that is supposed to skip cloudant tests for 3.8, 3.9, 3.12 @topherinternational
Almost there. For some strange reason apache.beam "downgrade" does not work quite consistently and hangs for Python 3.12.
Need to take a look
I guess we will also have to exclude Python 3.12 version - until apache.beam fixes the conflicts.
> There is one more issue - when a package is excluded from a certain version, the 'lowest dependencies' test goes wild - it tests all the other packages then. This should be fixed as part of the CI infra fix.
I found the bug in breeze that causes this, I am coding it up into a separate PR.
Hmm. The problem is this for apache.beam
```
--verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes
--junitxml=/files/test_result-providers_apache_beam-postgres.xml
--timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60
--disable-warnings -rfEX --ignore=tests/system --ignore=tests/integration
--warning-output-path=/files/warnings-providers_apache_beam-postgres.txt
--ignore=helm_tests --with-db-init --ignore=tests/providers/apache/beam
--ignore=tests/system/providers/apache/beam
--ignore=tests/integration/providers/apache/beam --no-cov
```
It runs all other tests as beam is also excluded for Python 3.12
This one should fix it https://github.com/apache/airflow/pull/41991
Hmm, that looks like the same problem I brought up in #41967, Breeze removes the test dir argument for the excluded provider, and so the command winds up running all of the tests.
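The failure mode, and the guard a fix needs, can be sketched as follows (the function and names are illustrative, not breeze's actual code):

```python
def build_pytest_args(test_dirs, excluded_dirs):
    """Drop excluded provider test dirs from the positional selection.

    If the selection ends up empty, return None so the caller skips pytest
    entirely: invoking pytest with only --ignore flags and no positional
    paths makes it collect everything under the default rootdir, which is
    exactly the runaway "all other tests" run seen above.
    """
    selected = [d for d in test_dirs if d not in excluded_dirs]
    if not selected:
        return None  # nothing to run: skip instead of running everything
    return [f"--ignore={d}" for d in excluded_dirs] + selected
```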
I rebased to see if the fix #41991 is merged
Opened #42027 which I think fixes the current test failures.
🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞
Another cross-fix: #42030
🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞
I removed those skip functions from the Dockerfile and entrypoint files since I think that's covered now by #41991.
🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞
Woooooohoooo!
This has been...educational 😉
> This has been...educational
Oh, I am sure it was :)