airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Support concurrent logs collection for containers in KubernetesPodOperator

Open mrk-andreev opened this issue 1 year ago • 5 comments

Description

During the discussion in PR #43853 with @dstandish, we noticed that there are two separate implementation flows for collecting logs from init_containers and containers that do similar things.

The current implementation processes logs sequentially, which is correct for init_containers because they are executed one after another—each starting only after the previous one has successfully completed.

However, for regular containers (not init_containers), this logic isn’t fully accurate since they can run in parallel.

Related code are https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L617

I suggest to implement concurrent logs collection for containers.

Use case/motivation

This might cause issues during long-running executions, as we won’t see logs from other containers until one of them has completed.

Related issues

https://github.com/apache/airflow/pull/43853

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

mrk-andreev avatar Dec 18 '24 22:12 mrk-andreev

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] avatar Dec 18 '24 22:12 boring-cyborg[bot]

@potiuk , @eladkal , @dstandish

What do you think about that?

mrk-andreev avatar Dec 18 '24 22:12 mrk-andreev

I think it would be a great idea to review and make it "common" and reused accross the components. Also we have another, related discussion about logging in https://github.com/apache/airflow/issues/45079 where a solutions is proposed to handle both - streaming logs and serving logs after task is completed - with regards to OOM issues.

Currently we merge and store logs in memory which - in some situations might in some situations lead to OOMs. And some refactoring and restructuring of the logging mechanism might be necessary to handle things better.

I think both of you @mrk-andreev and @jason810496 are very well skilled and prepared to handle that - under @dstandish @eladkal (and a little of mine) guidande and mentoring, so if you both work on this together to get both sides of logging improved.

I know it's not directly the same issue, but having a small team of skilled people to focus on both and collaborate, might be a great idea.

@dstandish - WDYT? Having the three of maintainers here to overlook it while Mark and LIU ZHE YOU work on those, might be a good idea to distract from other Airlfow 3 streams?

potiuk avatar Dec 20 '24 11:12 potiuk

Hi @potiuk, I’m happy to get involved in the refactoring of the stream logs issue!

I might start by breaking down #45079 into some TODO tasks and proposing PRs. Do we need to collaborate on Slack, or is discussing on GitHub sufficient?

I think it would be a great idea to review and make it "common" and reused across the components.

From my perspective, it’s quite challenging to define a common interface for both use cases at this stage. For #45079 , there are some additional logic to deal with metadata for logs. Perhaps Mark and I could work in parallel for now, and later consolidate the streaming logs part into common modules, which might make it easier to identify shared utilities.

jason810496 avatar Dec 20 '24 11:12 jason810496

I might start by breaking down https://github.com/apache/airflow/issues/45079 into some TODO tasks and proposing PRs. Do we need to collaborate on Slack, or is discussing on GitHub sufficient?

GitHub is enough.

From my perspective, it’s quite challenging to define a common interface for both use cases at this stage. For https://github.com/apache/airflow/issues/45079 , there are some additional logic to deal with metadata for logs. Perhaps Mark and I could work in parallel for now, and later consolidate the streaming logs part into common modules, which might make it easier to identify shared utilities.

Yeah. I was more thinking about both of you starting to work in parallel and review and comment on each-other's PRs / issues with us overlooking it - since the subjects are very close (even if not the same) - having extra pairs of eyes on the work done while it is being done is always helpful.

potiuk avatar Dec 20 '24 11:12 potiuk