[Feature][Worker Group] Adding exclusive worker field for default worker group to exclude the specific worker.

Open batmanneverdie opened this issue 2 years ago • 9 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

Add an exclusive worker field for the default worker group in the t_ds_worker_group table, so that specific workers can be excluded from it.

Why: in DS 3.1.3, all active workers are added to the default worker group. But in some scenarios, such as a remote worker, the worker is a special environment meant for running specific tasks. A remote worker may not be suitable for every task, and tasks dispatched to it can fail because of environment issues.

I think ordinary users should not have to think about which worker group to choose when running tasks. If an admin could exclude unsuitable workers from default, every remaining worker in the default worker group would be usable. This would be friendlier for ordinary users.

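In DS 3.1.3, DS adds the workerNodeInfo of every live worker to the default group, which causes a problem:

Premise: a remote worker's environment is special and can only run specific tasks. Ordinary users configuring a task should not have to think about which worker group to pick; they should be able to simply leave it as default.

Problem: if the remote worker contained in default is not excluded, tasks are likely to be scheduled onto it and then fail because of environment issues. This is a very basic problem.

Solution:

Add an exclusive_worker field to the t_ds_worker_group table. Meaning: the addresses of the excluded workers, comma-separated.

Note: only the default group may modify this field, and the default group cannot be deleted.

Impact:

  1. When a new worker registers (active worker)

    Check whether the active worker is listed in exclusive_worker. If it is, exclude it from the default group's workerNodeInfo; otherwise add it.

  2. The frontend needs to change the editing logic for worker groups: worker group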
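
The registration check in impact item 1 could be sketched roughly as below. This is only an illustration under stated assumptions, not DolphinScheduler code: the class `DefaultWorkerGroupFilter` and its methods are hypothetical names, worker addresses are treated as plain `host:port` strings, and the `exclusive_worker` value is assumed to be the comma-separated string proposed above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of DolphinScheduler.
final class DefaultWorkerGroupFilter {

    private DefaultWorkerGroupFilter() {
    }

    // Splits the proposed comma-separated exclusive_worker value into a set
    // of addresses, ignoring surrounding whitespace and empty entries.
    static Set<String> parseExcluded(String exclusiveWorker) {
        if (exclusiveWorker == null || exclusiveWorker.trim().isEmpty()) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(exclusiveWorker.split(","))
                .map(String::trim)
                .filter(address -> !address.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Returns the active workers that should belong to the default group:
    // every active worker whose address is not listed in exclusive_worker.
    static List<String> defaultGroupWorkers(Collection<String> activeWorkers,
                                            String exclusiveWorker) {
        Set<String> excluded = parseExcluded(exclusiveWorker);
        return activeWorkers.stream()
                .filter(address -> !excluded.contains(address))
                .collect(Collectors.toList());
    }
}
```

With active workers `10.0.0.1:1234`, `10.0.0.2:1234` and `10.0.0.9:1234` and an `exclusive_worker` value of `10.0.0.9:1234`, the default group would keep only the first two. Because the filter runs against whatever workers are currently active, newly scaled-out workers would join default without any change to the group configuration.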

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

batmanneverdie avatar Jun 07 '23 03:06 batmanneverdie

Thank you for your feedback. We have received your issue; please wait patiently for a reply.

  • In order for us to understand your request as soon as possible, please provide detailed information, version or pictures.
  • If you haven't received a reply for a long time, you can join our slack and send your question to channel #troubleshooting

github-actions[bot] avatar Jun 07 '23 03:06 github-actions[bot]

May I ask why not use another worker group that only contains available workers? I don't think this is a common issue in most cases.

rickchengx avatar Jun 07 '23 05:06 rickchengx

As for creating a new worker group, I think it is unfriendly in the following scenarios:

  1. At present, DS selects the default group by default. I think most users should not have to care which worker groups are general-purpose and which are specific; simply choosing default should be OK. But if the default group contains a specific worker, users have to consider which worker group to choose for every task. That is unfriendly.
  2. The maintainer must know the worker addresses in advance in order to configure the worker group. When workers are scaled out or moved to different machines, the worker addresses change, and the maintainer also needs to update the worker group, which increases the mental burden.

BTW, I think the worker group management UI should display each worker's active status, such as green for alive and red for dead or non-existent. Currently they are all displayed in green.

batmanneverdie avatar Jun 07 '23 06:06 batmanneverdie

May I ask why not create a new default group to replace the original (default) one?

Radeity avatar Jun 07 '23 14:06 Radeity

@Radeity I want the default worker group to contain all active workers (including newly added workers, such as when k8s replicas are scaled up) but exclude the specific workers, regardless of whether the default group is the original one or a user-created one.

batmanneverdie avatar Jun 09 '23 01:06 batmanneverdie

Do you mean that in this scalable scenario, it's hard to maintain a worker group that replaces the default worker group (which contains all nodes) while excluding some specific workers?

Maybe https://github.com/apache/dolphinscheduler/issues/14192 can help. Once label definitions are supported, for the scenario you mentioned above you can set a specific label on the excluded workers and then create a new worker group to replace the default one, with an expression that uses the NotIn operation.

Anyway, I have no objection to adding an exclusive worker field; let's wait for others' suggestions.

Radeity avatar Jun 09 '23 04:06 Radeity

Do you mean that in this scalable scenario, it's hard to maintain a worker group that replaces the default worker group (which contains all nodes) while excluding some specific workers?

Yes, that's what I mean. 😆

batmanneverdie avatar Jun 09 '23 05:06 batmanneverdie

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Aug 12 '24 00:08 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar Oct 06 '24 00:10 github-actions[bot]