airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Switch cloudant provider from cloudant library to ibmcloudant library

Open topherinternational opened this issue 1 year ago • 11 comments

closes: #21004, cloudant is no longer developed and ibmcloudant is the active replacement

This is a breaking change to the provider in the following ways:

  • get_conn now returns a CloudantV1 object with different function names than the previous Cloudant object; the mappings from the old library functions to the new are outlined here: https://github.com/cloudant/python-cloudant/blob/master/MIGRATION.md#reference-table
  • get_conn now directly returns a usable object instead of a context manager; with-block syntax is no longer needed
  • The connection is now checked for a mandatory host value that functions as the Cloudant account name (previously the host field was optional, but could cause a runtime failure)

topherinternational avatar Aug 17 '24 18:08 topherinternational

Rats, looks like the joke is on me and this PR is blocked on another third-party...ibmcloudant has a dependency conflict with the Snowflake Python library that nerfs the Airflow build for Python 3.8 and 3.9.

There's an issue for this in the Snowflake repo (two issues actually) with an open PR, so hopefully it will be fixed soon and this Cloudant PR will unblock.

Ironic as this issue stalled for a while waiting for Flask-AppBuilder to fix a dependency pin.

Details:

ibmcloudant brings in ibm-cloud-sdk-core which requires urllib3 2.x, which is fine for core Airflow, but the snowflake provider brings in snowflake-connector-python, which (for Python 3.8. and 3.9 only) requires urllib3 1.x.

topherinternational avatar Aug 17 '24 21:08 topherinternational

Rats, looks like the joke is on me and this PR is blocked on another third-party...ibmcloudant has a dependency conflict with the Snowflake Python library that nerfs the Airflow build for Python 3.8 and 3.9.

There's an issue for this in the Snowflake repo (two issues actually) with an open PR, so hopefully it will be fixed soon and this Cloudant PR will unblock.

Ironic as this issue stalled for a while waiting for Flask-AppBuilder to fix a dependency pin.

Details:

ibmcloudant brings in ibm-cloud-sdk-core which requires urllib3 2.x, which is fine for core Airflow, but the snowflake provider brings in snowflake-connector-python, which (for Python 3.8. and 3.9 only) requires urllib3 1.x.

You could potentially also add "3.8" and "3.9" to "excluded-python-versions" in the new version of cloudant provider (in provider.yaml) with the link to the snowflake issue - and remove the exclusions once this problem is fixed.

potiuk avatar Aug 18 '24 20:08 potiuk

See https://github.com/apache/airflow/pull/41548 - where we just removed such exclusion for Papermill for Python 3.12

potiuk avatar Aug 18 '24 20:08 potiuk

I've added "all-versions" tag to the PR of yours - to make sure it's tested on all versions - so once you exclude python versions you should be able to see both CI images and tests to run for cloudant only on 3.10 -> 3.12 images and be skipped for 3.8 and 3.9

potiuk avatar Aug 18 '24 20:08 potiuk

Great explanation, thank you @potiuk

topherinternational avatar Aug 19 '24 01:08 topherinternational

Ok @potiuk I've added the exclusion and an explanatory comment, so I think this is complete. Please review at your convenience, thanks.

topherinternational avatar Aug 19 '24 06:08 topherinternational

You should also add a note at the top of changelog (see comment at the top of changelog) and bump the version to the new major release in the provider.yaml.

potiuk avatar Aug 19 '24 10:08 potiuk

edit: moved this comment to an issue #41718

@potiuk @eladkal we have 3 persistent failures in the provider checks; tldr, I think there's a bug-ish in the provider checks commands in breeze where all providers are assumed to be supported on Python 3.8.

The command breeze release-management verify-provider-packages --use-packages-from-dist --package-format wheel --use-airflow-version wheel --airflow-constraints-reference default --providers-constraints-location /files/constraints-3.8/constraints-source-providers-3.8.txt installs all of the provider wheels built in the earlier breeze release-management prepare-provider-packages ... step.

This process all runs on 3.8 and so implicitly assumes that all of the providers are legit for 3.8; neither step looks like it checks excluded-python-versions. So the install fails due to the ibmcloudant and snowflake dependency conflict (the very reason we excluded 3.8 in the first place).

These commands happen in Provider packages wheel build and verify, Compat 2.8.4:P3.8 provider check and Compat 2.9.1:P3.8 provider check.

Addressing this is obviously your call, but maybe some options are?:

  1. Make prepare-provider-packages check excluded versions before building a particular provider wheel
  2. Add "Remove incompatible provider packages" task to Provider packages wheel build and verify, and have the task reference the provider dependencies json and the excluded python versions.
  3. Run the provider checks on every supported python version (after doing 1 or 2)

I checked the repo and none of the released providers exclude Python 3.8, I figure maybe this problem hasn't come up yet.

topherinternational avatar Aug 23 '24 04:08 topherinternational

We also have spurious failures where one of the tests in "Lowest direct dependency resolution tests" and one in "Kubernetes tests" will fail, but it looks like a different one each time. Suspiciously the failures take about twice as long as any of the successes (2hrs vs ~55mins) so maybe a resource starvation or a hang in a testing resource.

topherinternational avatar Aug 23 '24 04:08 topherinternational

We also have spurious failures where one of the tests in "Lowest direct dependency resolution tests" and one in "Kubernetes tests" will fail, but it looks like a different one each time. Suspiciously the failures take about twice as long as any of the successes (2hrs vs ~55mins) so maybe a resource starvation or a hang in a testing resource.

Yes. Some flakes.

potiuk avatar Aug 25 '24 10:08 potiuk

I checked the repo and none of the released providers exclude Python 3.8, I figure maybe this problem hasn't come up yet.

Yes. As I wrote in the #41718 - likely matrix for all providers and python versions is far too big an overhead for this one - this is the first time we have a provider that does not work for 3.8. So likely removing the providers that are not 3.8 compatible is the way to go - feel free to do so. We already have a code that removes non-compatible providers in compatibility checks - you can add ibmcloudant there (with comment referring to the conflict and issue with snowflake). Similarly you can even hard-code a step after builiding the providers in verify providers (again - with comment and link to the snowflake issue and note that it should be removed after the issue is solved.

Here is the compatibility check config and you can add ibmcloudant there for all versions simply. Also we could - theoretically also add 3.10 there and NOT remove cloudant, but taking into account temporary (hopefully) state for the removal, I don't think we need.

BASE_PROVIDERS_COMPATIBILITY_CHECKS: list[dict[str, str | list[str]]] = [
    {
        "python-version": "3.8",
        "airflow-version": "2.8.4",
        "remove-providers": "fab",
        "run-tests": "true",
    },
    {
        "python-version": "3.8",
        "airflow-version": "2.9.3",
        "remove-providers": "",
        "run-tests": "true",
    },
    {
        "python-version": "3.8",
        "airflow-version": "2.10.0",
        "remove-providers": "",
        "run-tests": "true",
    },
]

potiuk avatar Aug 25 '24 10:08 potiuk

There is one more issue - when package is excluded fromcertai version , the 'lowest deendencies' test goes wild - it tests all other packages then. This should be fixed as part of CI infra fix

potiuk avatar Aug 30 '24 10:08 potiuk

I added a fixup that is supposed to skip cloundant tests for 3.8, 3.9, 3.12 @topherinternational

potiuk avatar Aug 31 '24 23:08 potiuk

Almost there. For some strange reason apache.beam "downgrade" does not work quite consistently and hangs for Python 3.12.

Need to take a look

potiuk avatar Sep 01 '24 09:09 potiuk

I guess we will also have to exclude Python 3.12 version - until apache.beam fixes the conflicts.

potiuk avatar Sep 01 '24 12:09 potiuk

There is one more issue - when package is excluded fromcertai version , the 'lowest deendencies' test goes wild - it tests all other packages then. This should be fixed as part of CI infra fix

I found the bug in breeze that causes this, I am coding it up into a separate PR.

topherinternational avatar Sep 01 '24 17:09 topherinternational

Hmm. The problem is this for apache.beam

--verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes
--junitxml=/files/test_result-providers_apache_beam-postgres.xml
--timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 
--disable-warnings -rfEX --ignore=tests/system --ignore=tests/integration
--warning-output-path=/files/warnings-providers_apache_beam-postgres.txt
--ignore=helm_tests --with-db-init --ignore=tests/providers/apache/beam
--ignore=tests/system/providers/apache/beam
--ignore=tests/integration/providers/apache/beam --no-cov

It runs all other tests as beam is also excluded for Python 3.12

potiuk avatar Sep 04 '24 09:09 potiuk

This one should fix it https://github.com/apache/airflow/pull/41991

potiuk avatar Sep 04 '24 10:09 potiuk

Hmm. The problem is this for apache.beam


--verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes

--junitxml=/files/test_result-providers_apache_beam-postgres.xml

--timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 

--disable-warnings -rfEX --ignore=tests/system --ignore=tests/integration

--warning-output-path=/files/warnings-providers_apache_beam-postgres.txt

--ignore=helm_tests --with-db-init --ignore=tests/providers/apache/beam

--ignore=tests/system/providers/apache/beam

--ignore=tests/integration/providers/apache/beam --no-cov

It runs all other tests as beam is also excluded for Python 3.12

Hmm, that looks like the same problem I brought up in #41967, Breeze removes the test dir argument for the excluded provider, and so the command winds up running all of the tests.

topherinternational avatar Sep 04 '24 11:09 topherinternational

I rebased to see if the fix #41991 is merged

potiuk avatar Sep 04 '24 21:09 potiuk

Opened #42027 which I think fixes the current test failures.

topherinternational avatar Sep 05 '24 11:09 topherinternational

🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞

potiuk avatar Sep 05 '24 11:09 potiuk

Another cross-fix: #42030

topherinternational avatar Sep 05 '24 12:09 topherinternational

🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞

potiuk avatar Sep 05 '24 12:09 potiuk

I removed those skip functions from the Dockerfile and entrypoint files since I think that's covered now by #41991.

topherinternational avatar Sep 05 '24 13:09 topherinternational

🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞🤞 🤞 🤞 🤞 🤞 🤞 🤞 🤞

potiuk avatar Sep 05 '24 13:09 potiuk

Woooooohoooo!

potiuk avatar Sep 06 '24 11:09 potiuk

Woooooohoooo!

This has been...educational 😉

topherinternational avatar Sep 06 '24 11:09 topherinternational

This has been...educational

Oh, I am sure it was :)

potiuk avatar Sep 06 '24 11:09 potiuk