XComObjectStorageBackend returns the S3 path during deserialization instead of the data
Apache Airflow version
2.9.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
After configuring object storage as the XCom backend, serialization works as expected for values above the configured threshold. However, when a downstream task consumes the stored XCom, deserialization does not seem to happen: the task receives the path of the object instead of the deserialized data.
What you think should happen instead?
The stored object should be deserialized and returned to the downstream task.
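To make the expected round trip concrete, here is a minimal toy model of a threshold-based backend. It is not Airflow's implementation: `THRESHOLD_BYTES`, `OBJECT_STORE`, the `s3://xcom-bucket/...` path and both functions are hypothetical names made up for illustration, and a plain dict stands in for S3.

```python
import json

# Hypothetical toy model of a threshold-based XCom backend; NOT Airflow's code.
THRESHOLD_BYTES = 1024  # assumed threshold, for illustration only
OBJECT_STORE: dict[str, bytes] = {}  # a dict standing in for S3


def serialize(key: str, value) -> str:
    """Return the string that would be kept in the metadata database."""
    payload = json.dumps(value)
    if len(payload.encode()) < THRESHOLD_BYTES:
        return payload  # small values stay inline
    path = f"s3://xcom-bucket/{key}.json"  # made-up path
    OBJECT_STORE[path] = payload.encode()
    return json.dumps(path)  # only the reference is kept in the database


def deserialize(stored: str):
    """Resolve a stored reference back to the original value."""
    data = json.loads(stored)
    if isinstance(data, str) and data in OBJECT_STORE:
        # The stored value is a reference: fetch and decode the real payload.
        return json.loads(OBJECT_STORE[data].decode())
    return data  # inline value, nothing to resolve


big = list(range(10_000))
stored = serialize("producer", big)
```

In this sketch the behaviour I expected corresponds to the consumer receiving `deserialize(stored)`, whereas what I observe looks like it receives the equivalent of `json.loads(stored)`, i.e. just the path.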
How to reproduce
import pendulum
from airflow.decorators import dag, task


@dag(
    schedule_interval=None,
    catchup=False,
    start_date=pendulum.datetime(2024, 1, 1, tz="utc"),
)
def dag_test():
    @task()
    def producer():
        import random

        return [random.randint(0, 100) for _ in range(10_000)]

    @task()
    def consumer(obj):
        print(obj)

    producer_t = producer()
    producer_t >> consumer(producer_t)


dag_test()
Operating System
apache/airflow:2.9.0-python3.11
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
cc @bolkedebruin I haven't looked at the code, but was expecting that during deserialization this works out of the box.
https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/xcoms.html#object-storage-xcom-backend
cc @uranusjr
https://github.com/apache/airflow/blob/f57de6c1836199190ab02419aa2b9d5caee33002/airflow/providers/common/io/xcom/backend.py#L151-L165
I seem to recall @TJaniF had a very similar issue a while ago. I don’t remember the details, but in that instance it was some sort of configuration issue. If this is indeed not a bug in Airflow logic, we should try to detect the configuration issue and report it clearly to the user instead of returning a wrong value.
Yes, I had the same issue and as far as I am aware this PR fixes it: https://github.com/apache/airflow/pull/39313 so it should be fixed in 2.9.2 :)
Good to hear Astronomer is now also in the time machine business.
Well thanks. That was quick :-).
Note: @Uture I don't think the fix will apply to XCom values that were already stored. You will need to regenerate those.
Great, thank you all for resolving this so quickly.