
Task fails and cannot read logs. Invalid URL 'http://:8793/log/...': No host supplied

Open pedro-cf opened this issue 1 year ago • 50 comments

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.1

What happened?

I'm having an issue with an Airflow instance where a task fails and I cannot read its logs.

Logs:

*** Could not read served logs: Invalid URL 'http://:8793/log/dag_id=my_dag/run_id=dynamic__apple_3_my_dag_cb353081__2024-09-09T14:41:22.596199__f73c5571719e4f35bf195ded40e5e25b/task_id=cleanup_temporary_directory/attempt=1.log': No host supplied

Event logs:

Executor CeleryExecutor(parallelism=128) reported that the task instance <TaskInstance: my_dag.cleanup_temporary_directory dynamic__apple_3_my_dag_cb353081__2024-09-09T14:41:22.596199__f73c5571719e4f35bf195ded40e5e25b [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally

Additionally, I checked the logs directory for the dag_id/run_id and it is missing the respective task_id folder.
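
For context on where the broken URL comes from: the webserver fetches task logs from the worker's log-serving endpoint on port 8793 using the hostname recorded on the task instance, so when that hostname is empty (e.g. the task never actually started on a worker) the resulting URL has no host. Below is a minimal sketch of that failure mode; it is illustrative only and not Airflow's actual code, and the helper name and values are assumptions.

# Illustrative sketch (assumption: not Airflow's real implementation): the served-log
# URL is composed from the hostname stored on the task instance row. An empty
# hostname yields 'http://:8793/...', which requests rejects with "No host supplied".
import requests

def build_served_log_url(hostname, log_relative_path, port=8793):
    return f"http://{hostname}:{port}/log/{log_relative_path}"

url = build_served_log_url("", "dag_id=my_dag/run_id=.../task_id=cleanup_temporary_directory/attempt=1.log")
try:
    requests.get(url, timeout=5)
except requests.exceptions.InvalidURL as exc:
    print(exc)  # Invalid URL 'http://:8793/log/...': No host supplied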

What you think should happen instead?

I should be able to access the logs.

How to reproduce

Not sure how to.

Operating System

Ubuntu 24.04 LTS

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

Deployed with docker-compose on Docker Swarm setup on 2 VMs.

Anything else?

Additionally, I checked the logs directory for the dag_id/run_id and it is missing the respective task_id folder.

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

pedro-cf avatar Sep 10 '24 15:09 pedro-cf

Having the same issue with 2.10.1 in k8s, using the CeleryKubernetesExecutor.

Could this be related to the inheritance issue that was discussed in https://github.com/apache/airflow/issues/41891?

andrew-stein-sp avatar Sep 10 '24 20:09 andrew-stein-sp

Additionally I checked the logs directory for the dag_id/run_id and it's missing the respective task_id folder.

pedro-cf avatar Sep 10 '24 20:09 pedro-cf

Having the same issue on 2.10.0 through a podman-compose

adriens avatar Sep 16 '24 03:09 adriens

We upgraded to 2.10.1 like @andrew-stein-sp and could reproduce the same behavior.

adriens avatar Sep 16 '24 04:09 adriens

Got the same behavior since upgrading from version 2.9.3 to 2.10.1. We are using the LocalExecutor.

sosystems-dev avatar Sep 17 '24 11:09 sosystems-dev

I have the same issue with 2.10.0, using the CeleryExecutor. It worked before I upgraded from version 2.9.0 to 2.10.0.

*** Could not read served logs: Invalid URL 'http://:8793/log/dag_id=service_stop/run_id=manual__2024-09-18T09:42:54+09:00/task_id=make_accountlist_task/attempt=1.log': No host supplied

Event log:

Executor CeleryExecutor(parallelism=6) reported that the task instance <TaskInstance: service_stop.make_accountlist_task manual__2024-09-18T09:42:54+09:00 [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally

The scheduler has an error log at the same time as the event log:

[2024-09-18T00:43:18.036+0000] {celery_executor.py:291} ERROR - Error sending Celery task: module 'redis' has no attribute 'client'
Celery Task ID: TaskInstanceKey(dag_id='service_stop', task_id='make_accountlist_task', run_id='manual__2024-09-18T09:42:54+09:00', try_number=1, map_index=-1)
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 220, in send_task_to_executor
    result = task_to_run.apply_async(args=[command], queue=queue)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/task.py", line 594, in apply_async
    return app.send_task(
           ^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 797, in send_task
    with self.producer_or_acquire(producer) as P:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 932, in producer_or_acquire
    producer, self.producer_pool.acquire, block=True,
              ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 1354, in producer_pool
    return self.amqp.producer_pool
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/amqp.py", line 591, in producer_pool
    self.app.connection_for_write()]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 829, in connection_for_write
    return self._connection(url or self.conf.broker_write_url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 880, in _connection
    return self.amqp.Connection(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/connection.py", line 201, in __init__
    if not get_transport_cls(transport).can_parse_url:
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/transport/__init__.py", line 91, in get_transport_cls
    _transport_cache[transport] = resolve_transport(transport)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/transport/__init__.py", line 76, in resolve_transport
    return symbol_by_name(transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/utils/imports.py", line 59, in symbol_by_name
    module = imp(module_name, package=package, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/transport/redis.py", line 282, in <module>
    class PrefixedRedisPipeline(GlobalKeyPrefixMixin, redis.client.Pipeline):
                                                      ^^^^^^^^^^^^
AttributeError: module 'redis' has no attribute 'client'

mn7k avatar Sep 18 '24 01:09 mn7k

Same issue for us when upgrading to 2.10.2.

damiah avatar Sep 29 '24 21:09 damiah

We’re encountering the same issue as well.

nikithapk avatar Oct 03 '24 06:10 nikithapk

We switched from the Bitnami docker-compose to the official Apache docker-compose and we could make it run successfully :star_struck:

adriens avatar Oct 03 '24 07:10 adriens

Try checking that the DAGs exist on the worker, scheduler and webserver. I deploy Airflow in K8s and got this error after putting my DAGs only into the scheduler (expecting that they would replicate to the other pods), but when I checked the dags folder on the worker it was empty.
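
If it helps, here is a quick sketch you can run inside each pod to see which DAG files are actually present; the /opt/airflow/dags default path is an assumption taken from the official image, so adjust it for your deployment:

# List the DAG files visible on this pod (run on scheduler, webserver and worker).
# The default path below is an assumption; it can be overridden via AIRFLOW__CORE__DAGS_FOLDER.
import os

dags_folder = os.environ.get("AIRFLOW__CORE__DAGS_FOLDER", "/opt/airflow/dags")
print(f"DAG folder: {dags_folder}")
for root, _dirs, files in os.walk(dags_folder):
    for name in files:
        if name.endswith(".py"):
            print(os.path.join(root, name))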

Dzhalolov avatar Oct 07 '24 12:10 Dzhalolov

At that time (https://github.com/apache/airflow/issues/42136#issuecomment-2357283522), I used the airflow db upgrade command, but I realized it has been deprecated. I retried the upgrade using the airflow db migrate -n "2.10.2" command, and it works for me now.

https://airflow.apache.org/docs/apache-airflow/2.10.0/installation/upgrading.html#offline-sql-migration-scripts

mn7k avatar Oct 15 '24 06:10 mn7k

We encountered the same problem in Airflow 2.9.3. Here are the worker logs at the time of the error:

[2024-10-11 10:45:38,544: WARNING/ForkPoolWorker-16] Failed operation _store_result.  Retrying 2 more times.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: could not receive data from server: Connection timed out


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/backends/database/__init__.py", line 47, in _inner
    return fun(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/backends/database/__init__.py", line 117, in _store_result
    task = list(session.query(self.task_cls).filter(self.task_cls.task_id == task_id))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2901, in __iter__
    result = self._iter()
             ^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not receive data from server: Connection timed out

[SQL: SELECT celery_taskmeta.id AS celery_taskmeta_id, celery_taskmeta.task_id AS celery_taskmeta_task_id, celery_taskmeta.status AS celery_taskmeta_status, celery_taskmeta.result AS celery_taskmeta_result, celery_taskmeta.date_done AS celery_taskmeta_date_done, celery_taskmeta.traceback AS celery_taskmeta_traceback 
FROM celery_taskmeta 
WHERE celery_taskmeta.task_id = %(task_id_1)s]
[parameters: {'task_id_1': '5d1bef21-fbf4-4feb-9f2c-a54c95b4d738'}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

I can also note that increasing the sql_alchemy_pool_size parameter to 50 reduced the number of such errors, but did not eliminate them completely.

quack39 avatar Oct 17 '24 11:10 quack39

The same issue in Airflow 2.10.2

ali-naderi avatar Oct 22 '24 09:10 ali-naderi

TL;DR look for invalid python scripts on the malfunctioning worker. Try creating a DagBag on the worker and see what happens.

# Ensure the AIRFLOW_HOME points to the right location, then run on the worker
>>> from airflow.models import DagBag
>>> DagBag(include_examples=False)

I had this issue too; it turned out I had edited one of the files through vim and pasted some code, which inserted tabs instead of spaces, so the file became an invalid Python script due to TabError: inconsistent use of tabs and spaces in indentation. After I fixed that, it all went back to normal.

Note that the problematic file doesn't have to be imported by the failing DAG/task. If I understand the issue correctly, a DagBag cannot be created if one of the DAG definition files or their imports isn't a valid Python file. The issue then manifests as DAGs supposedly not being found. In my case, the filesystem isn't shared between the scheduler and the malfunctioning Celery worker, and the affected file was unmodified on the scheduler (or modified in a correct way), so no "big red import error" was displayed in the webserver UI.
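
To make that check a bit more explicit, the DagBag keeps the parsing failures it hit, so you can print them directly on the suspect worker (a small sketch mirroring the snippet above):

# Print any DAG-file parsing errors seen on this machine. A broken .py file in the
# worker's dags folder shows up here even when the scheduler's copy is fine.
from airflow.models import DagBag

dag_bag = DagBag(include_examples=False)
if dag_bag.import_errors:
    for path, error in dag_bag.import_errors.items():
        print(f"{path}:\n{error}\n")
else:
    print(f"No import errors; {len(dag_bag.dags)} DAGs parsed.")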

Dev-iL avatar Oct 28 '24 16:10 Dev-iL

Hi @quack39 and all, I am getting the same error. I have deployed Airflow [2.9.3] in AKS, but when executing the DAGs I get the error below. I don't have any clue about what needs to be updated. I am using Helm [1.15.0] for deployment and the KubernetesExecutor.

Error

Could not read served logs: HTTPConnectionPool(host='test-dag-config-nlp8suol', port=8793): Max retries exceeded with url: /log/dag_id=test_dag/run_id=manual__2024-11-06T06:43:18.256272+00:00/task_id=config/attempt=1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8ef9eaecd0>: Failed to establish a new connection: [Errno -2] Name or service not known'))

abhijit-sarkar-ext avatar Nov 06 '24 08:11 abhijit-sarkar-ext

I had the same problem when changing from the SequentialExecutor to the LocalExecutor. After some tests, I found I had to make parallelism equal to the CPU core count.

With t3.large (2 CPU cores):

  • parallelism = 32 (default): NG
  • parallelism = 4: most of the tasks are NG, but some OK
  • parallelism = 2: all OK

With t3.xlarge (4 CPU cores):

  • parallelism = 4: all OK

But is this the expected behavior? I'm not sure.
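
For reference, a minimal sketch to read the parallelism value actually in effect on a given machine; in Docker-based deployments it is typically overridden via the AIRFLOW__CORE__PARALLELISM environment variable:

# Read the effective [core] parallelism setting on this machine. To change it in a
# Docker-based deployment, set AIRFLOW__CORE__PARALLELISM (e.g. to the CPU core count).
from airflow.configuration import conf

print(conf.getint("core", "parallelism"))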

darenpang avatar Nov 09 '24 02:11 darenpang

Same issue for us when upgrading to 2.10.3, using k8s.

liangpengfei avatar Nov 12 '24 06:11 liangpengfei

The same issue in Airflow 2.10.2 with KubernetesExecutor.

huang06 avatar Nov 14 '24 09:11 huang06

I was just able to get the following to work:

from datetime import timedelta

from airflow.operators.bash import BashOperator

task = BashOperator(
    task_id="bash_command",
    bash_command=bash_command,  # defined elsewhere in the DAG file
    retries=2,
    retry_delay=timedelta(minutes=1),
    do_xcom_push=False,
    env={
        'PYTHONUNBUFFERED': '1',
        'PYTHONFAULTHANDLER': '1',  # Helps debug crashes
        'FORCE_COLOR': '1',  # Preserves color output in logs
    },
    cwd='/tmp',
    append_env=True,
)

Initially it failed with the same error, then succeeded on retry; I believe this is because the log stream was still being created and not available for the first attempt.

kate-rodgers avatar Nov 15 '24 00:11 kate-rodgers

Same issue here, even if the task is successful

hditano avatar Nov 16 '24 07:11 hditano

We have the same problem on 2.10.3. Please help.

babaymaster avatar Nov 22 '24 12:11 babaymaster

I accidentally ran into this issue when I defined a custom volume mount in my docker-compose.yml. At first, I defined the mount as part of the airflow-worker service, which apparently overrode the volume mounts imported from x-airflow-common; that eventually led to exactly this issue. I resolved it by defining my custom mount in x-airflow-common instead. That is how I ended up with this issue; there may of course be different, completely unrelated causes. But if you've encountered this and work with custom mounts in a Docker environment, double-check your docker-compose.yml and the imports done by the <<: * operator. Just a quick heads-up in the hope that it helps somebody.

jliebers avatar Nov 22 '24 13:11 jliebers

The same issue here with 2.10.3 and CeleryExecutor. Worker log:

Nov 29 01:04:05 ubuntu-s-4vcpu-8gb-amd-fra1-01 airflow[2929312]: [2024-11-29T01:04:05.161+0000] {scheduler_job_runner.py:910} ERROR - Executor CeleryExecutor(parallelism=64) reported that the task instance <TaskInstance: my_dag.task_id scheduled__2024-11-29T00:30:00+00:00 [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally

opeida avatar Nov 29 '24 01:11 opeida

Same issue deploying on Docker with the tutorial supplied on the Airflow website. Running test_dags.py as follows:

import datetime

import pendulum

from airflow.models.dag import DAG
from airflow.operators.empty import EmptyOperator

now = pendulum.now(tz="UTC")
now_to_the_hour = (now - datetime.timedelta(0, 0, 0, 0, 0, 3)).replace(minute=0, second=0, microsecond=0)
START_DATE = now_to_the_hour
DAG_NAME = "test_dag_v2"

dag = DAG(
    DAG_NAME,
    schedule="*/10 * * * *",
    default_args={"depends_on_past": True},
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
)

run_this_1 = EmptyOperator(task_id="run_this_1", dag=dag)
run_this_2 = EmptyOperator(task_id="run_this_2", dag=dag)
run_this_2.set_upstream(run_this_1)
run_this_3 = EmptyOperator(task_id="run_this_3", dag=dag)
run_this_3.set_upstream(run_this_2)

The DAG succeeds, but I get the following log message: *** Could not read served logs: Invalid URL 'http://:8793/log/dag_id=test_dag_v1/run_id=manual__2024-11-29T10:15:31.097211+00:00/task_id=run_this_1/attempt=1.log': No host supplied. I checked my logs folder and the logs for test_dag_v2 are being written.

ClementViricel avatar Nov 29 '24 10:11 ClementViricel

Had the same issue when migrating to 2.10.3. I found this post very useful for troubleshooting: https://github.com/apache/airflow/discussions/32234

From my perspective, this also occurs with DummyOperator and EmptyOperator, as they don't have any output to log. I think there was some discussion on GitHub about this behaviour.

Hope it helps :-)

sosystems-dev avatar Nov 29 '24 10:11 sosystems-dev

I got the same issue on 2.10.3.

mahmoudmostafa0 avatar Dec 16 '24 09:12 mahmoudmostafa0

Can someone add ?keepalives=1&keepalives_idle=30&keepalives_interval=10&keepalives_count=5 to the database connection string and test the behavior?

quack39 avatar Dec 27 '24 13:12 quack39

add ?keepalives=1&keepalives_idle=30&keepalives_interval=10&keepalives_count=5 to the database connection string and test the behavior?

Tried that now and still got the same issue; I also downgraded to 2.9.3.

mahmoudmostafa0 avatar Dec 29 '24 10:12 mahmoudmostafa0

Having the same issue on 2.10.2

tuanpb99 avatar Jan 06 '25 10:01 tuanpb99

We are facing the same issue on 2.10.4

nikhilcss97 avatar Jan 07 '25 00:01 nikhilcss97