XComObjectStorageBackend returns the S3 path during deserialization instead of the data

Open Uture opened this issue 1 year ago • 4 comments

Apache Airflow version

2.9.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

After configuring the object storage as XCom backend, the serialization works fine above the specified threshold, but once another task consumes the previously stored XCom, the deserialization doesn't seem to work. Instead of the deserialized data, the path of the object is returned.

What you think should happen instead?

The stored object should be deserialized and returned to the downstream task.
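To make the expected and observed behavior concrete, here is a minimal, self-contained sketch of the threshold-and-reference idea. It is not Airflow's implementation: `FAKE_STORE`, `BASE_PATH`, `THRESHOLD`, and the three functions are hypothetical names invented for illustration, and an in-memory dict stands in for S3.

```python
import json

# Hypothetical in-memory stand-in for an object store such as S3.
FAKE_STORE: dict[str, bytes] = {}
BASE_PATH = "s3://example-bucket/xcom"  # made-up path, for illustration only
THRESHOLD = 1024  # bytes; made-up value


def serialize(key: str, value) -> str:
    """Return what would be kept in the metadata database for this XCom."""
    payload = json.dumps(value)
    if len(payload.encode()) < THRESHOLD:
        return payload  # small values stay inline
    path = f"{BASE_PATH}/{key}.json"
    FAKE_STORE[path] = payload.encode()  # large values go to the object store
    return json.dumps(path)  # ...and only a reference is kept


def deserialize(stored: str):
    """Expected behavior: resolve a reference back to the original data."""
    data = json.loads(stored)
    if isinstance(data, str) and data.startswith(BASE_PATH):
        return json.loads(FAKE_STORE[data].decode())
    return data


def deserialize_unresolved(stored: str):
    """Behavior reported in this issue: the reference is returned as-is."""
    return json.loads(stored)
```

With a 10,000-element list, as in the reproduction below, `deserialize` would hand the list back to the consumer, while `deserialize_unresolved` would hand back only the path string. The toy reference check is deliberately naive: a small string value that happens to start with `BASE_PATH` would be misread as a reference.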

How to reproduce

import pendulum
from airflow.decorators import dag, task


@dag(
    schedule_interval=None,
    catchup=False,
    start_date=pendulum.datetime(2024, 1, 1, tz="utc"),
)
def dag_test():

    @task()
    def producer():
        import random

        return [random.randint(0, 100) for _ in range(10_000)]

    @task()
    def consumer(obj):
        print(obj)

    producer_t = producer()
    producer_t >> consumer(producer_t)


dag_test()

Operating System

apache/airflow:2.9.0-python3.11

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

Uture avatar May 14 '24 05:05 Uture

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] avatar May 14 '24 05:05 boring-cyborg[bot]

cc @bolkedebruin I haven't looked at the code, but I was expecting deserialization to work out of the box.

https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/xcoms.html#object-storage-xcom-backend

kaxil avatar May 14 '24 09:05 kaxil

cc @uranusjr

https://github.com/apache/airflow/blob/f57de6c1836199190ab02419aa2b9d5caee33002/airflow/providers/common/io/xcom/backend.py#L151-L165

kaxil avatar May 14 '24 10:05 kaxil

I seem to recall @TJaniF had a very similar issue a while ago. I don’t remember the details, but it was some sort of configuration issue in that instance. And if it is indeed not a bug in Airflow logic, we should try to detect the configuration issue and surface it to the user more clearly, instead of returning a wrong value.

uranusjr avatar May 14 '24 10:05 uranusjr

Yes, I had the same issue and as far as I am aware this PR fixes it: https://github.com/apache/airflow/pull/39313 so it should be fixed in 2.9.2 :)

TJaniF avatar May 14 '24 10:05 TJaniF

Good to hear Astronomer is now also in the time machine business.

uranusjr avatar May 14 '24 10:05 uranusjr

Well thanks. That was quick :-).

bolkedebruin avatar May 14 '24 13:05 bolkedebruin

Note: @Uture I don't think the fix will apply retroactively to past XCom values. You will need to regenerate those.

bolkedebruin avatar May 14 '24 13:05 bolkedebruin

Great, thank you all for resolving this so quickly.

Uture avatar May 14 '24 16:05 Uture