airflow icon indicating copy to clipboard operation
airflow copied to clipboard

[Helm Chart] Support multiple DagProcessors

Open jtvmatos opened this issue 3 years ago • 9 comments

Description

Airflow 2.4.0 has introduced support for Support multiple DagProcessors parsing files from different locations.

However, Helm chart only supports Provision Standalone Dag Processor (or at least will support in Airflow Helm Chart 1.7.0).

Use case/motivation

For Airflow Helm Chart users take advantage of these new AIP-43 features, it is necessary to add support for provision multiple DagProcessors in Helm chart.

Related issues

Helmchart: Provision Standalone Dag Processor Support multiple DagProcessors parsing files from different locations

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

jtvmatos avatar Sep 19 '22 16:09 jtvmatos

Marked it as a good first issue. I think it would require some smart way of defining how to do it in the configuration. Even if you are not willling to implement it (or maybe?), a proposal from the user perspective how it could look like when it comes to configuration would be most welcome @jtvmatos . Any ideas and suggestions?

potiuk avatar Sep 20 '22 12:09 potiuk

Hi @potiuk thanks for the feedback.

Unfortunately in the coming weeks I won't be available to create this PR (I will be unavailable for more than a month... if no one solves it by then, I can open the PR....)

But yes, this PR can be tricky and can create a breaking change in the current processor configurations. For example, creating a nested level for multiple DagProcessors could be an option:

# Airflow Dag Processor Config
dagProcessor:
  enabled: false
  # Airflow dag processors in the deployment
  processor:
    # Processor 1
    - name: processor-1 # Alias for metadata.name
      replicas: 1
      revisionHistoryLimit: ~
      command: ~
      args: ["bash", "-c", "exec airflow dag-processor"]
      strategy:
     (...)
    # Processor 2
    # - name: processor-2
    #   args: ["bash", "-c", "exec airflow dag-processor --subdir  ..."]

With this values.yaml, will be possible to set the processors configurations as currently used for env or secrets, i.e., dagProcessor.processor[0]..., dagProcessor.processor[1]... for N processors (>=2.4.0).

Alternatively, we can just add a extraDagProcessor list:

# Airflow Dag Processor Config
dagProcessor:
  enabled: false
  
   (...)
   
  env: []

  extraDagProcessor: []

This list is not breaking for the current settings. However, it almost imply a "master" processor and "other" processors.

jtvmatos avatar Sep 21 '22 06:09 jtvmatos

No worries - we do not have to add it super-quickly I think (but if somoene will want to add it this is still an option :)).

I'd say the best approach would be to make the top-level dag processor entry optional (and fail the configuration if both "arrays" and "top level" are specified (then we could name the array "multipleDagProcessors" simply?). WDYT @jedcunningham @mhenc ?

potiuk avatar Sep 22 '22 09:09 potiuk

I create a fork, with the approach that I mention before (only for my local tests): -> https://github.com/apache/airflow/commit/0793c55be0f63d8da53d16c1097bc274bbcc1a69

For now this is all I can do. But it can be useful for the discussion :)

Thank you all!

jtvmatos avatar Sep 22 '22 10:09 jtvmatos

We have a similar problem for celery workers too. I haven't given it a ton of thought, but I feel something like the breaking change as proposed above will be a better solution for us. That does mean we need to wait for version 2 of the chart though.

Either way, removing it from the existing milestone for now.

jedcunningham avatar Oct 03 '22 18:10 jedcunningham

Hi @jedcunningham,

Should we then follow a approach like @potiuk suggested? Something like:

# Airflow Dag Processor Config
dagProcessor:
  enabled: false
  # Number of airflow dag processors in the deployment
  replicas: 1

  (...)

# Airflow Multiple Dag Processor Config
multipleDagProcessor:
  enabled: false
  processor:
    # Processor 1
    - name: processor-1
      replicas: 1
     (...)
    - name: processor-2
      args: ["bash", "-c", "exec airflow dag-processor --subdir  ..."]
     (...)

With a validation that only one (dagProcessor or multipleDagProcessor) can be activated, and a Note that in Helm Chart v2 both configs will be merge. WDYT?

jtvmatos avatar Oct 04 '22 07:10 jtvmatos

@jtvmatos I think it's better to open a draft PR and discuss code changes over the PR

eladkal avatar Jan 26 '23 13:01 eladkal

What is the current status of this? Looking forward to the feature of having multiple dag folders within the helm chart.

apinchuk1 avatar May 14 '24 16:05 apinchuk1

What is the current status of this? Looking forward to the feature of having multiple dag folders within the helm chart.

If you would like to contribute it @apinchuk1 , this this is the most certain way of making a "status change" on that. Other than - someone will have to contribute it, so there is no "status change".

potiuk avatar May 15 '24 12:05 potiuk