hertzbeat icon indicating copy to clipboard operation
hertzbeat copied to clipboard

[Task] <Support discovery monitoring instance through http_sd>

Open Calvin979 opened this issue 1 year ago • 12 comments

Description

  1. http request http_sd api to get service instance ip, port and more.
  2. auto create monitor to monitoring service instance throght prometheus protocol

Refer: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config https://prometheus.io/docs/prometheus/latest/http_sd/

Task List

No response

Calvin979 avatar Mar 06 '24 15:03 Calvin979

@tomsun28 Please assign this issue to me.

Calvin979 avatar Mar 06 '24 15:03 Calvin979

Hi, I feel that we need to carefully consider the following designs

  • What method is used to call the http_sd api to obtain service instance? Is it a scheduled request or a one-time request? When to update service instance data.
  • How to automatically create monitoring.
  • more

tomsun28 avatar Mar 07 '24 02:03 tomsun28

Hi, I feel that we need to carefully consider the following designs

  • What method is used to call the http_sd api to obtain service instance? Is it a scheduled request or a one-time request? When to update service instance data.
  • How to automatically create monitoring.
  • more

Thanks for your advice. I will check the code and consider how to design it better.

Calvin979 avatar Mar 07 '24 12:03 Calvin979

@tomsun28 Hi, I found that Prometheus uses http_sd or file_sd to discover service instances automatically in order to manage them better. Here is the illustration: image

Do you mean that Hertzbeat needs to support this feature? Like when somebody wants to creat a RocketMQ monitor, he/she could enter a http_sd endpoint to manage RocketMQ cluster instead of creating monitor for each RocketMQ instance. Here is the illustration: image

Calvin979 avatar Mar 17 '24 11:03 Calvin979

Hi, it’s like this👍👍. We discover and create monitoring instances automatically throght the http_sd endpoint response content.

I thinks there is a difficulty that to design how and when to create monitoring instances and how to dispaly them.

tomsun28 avatar Mar 18 '24 02:03 tomsun28

Hi, @tomsun28 I've spended some times for sorting process of collecting data and found that there are some existed design problems.

  1. All protocols in path org.dromara.hertzbeat.common.entity.job.protocol have no super class or interface, which is hard to define and modify base field or config in runtime.
  2. Every single concrete collect class in path org.dromara.hertzbeat.collector.collect has a lot of duplicate code due to org.dromara.hertzbeat.collector.collect.AbstractCollect has only one method to normative behavior of subclass. Such as preCheckParam, Remote connecting, Exception handling and so on.

Thus, here are my thoughts:

  1. Create base Protocol interface image
  2. Redesign AbstractCollector image

I write a demo as well: AbstractCollector:

public abstract class AbstractCollector<P extends CommonProtocol, D> implements HertzBeatCollector<P, D> {
    private P protocol;

    public void collect(CollectRep.MetricsData.Builder builder, long monitorId, String app, Metrics metrics) {
        protocol = constructNewProtocol();

        CollectClient<D> collectClient = null;
        try {
            // 检查参数
            if (!isValidProtocol(protocol)) {
                return;
            }

            //todo 对于通过http_sd定义的集群监控,可以在这里循环进行以下步骤,最后统一聚合到CollectRep.MetricsData.Builder即可

            // 创建客户端
            collectClient = createCollectClientTemplate(protocol);
            // 连接远程客户端
            collectClient.connect();

            // 采集数据
            processCollectedData(builder, collectClient.fetchData());
        }catch (Exception exception) {
            // 异常处理,随时可以实现并扩展
            DefaultExceptionHandler.getInstance().onHandle(exception, builder, null);
        }finally {
            closeGracefully(collectClient, null);
        }
    }

    public abstract CollectClient<D> createCollectClientTemplate(P protocol) throws Exception;

    /**
     * the protocol this collect instance support
     * @return protocol str
     */
    public abstract String supportProtocol();

    private P constructNewProtocol() {
        // clone a new protocol
        // redefine base config here, like http_sd
        // 这里直接面向CommonProtocol编码即可
        return null;
    }

    private void closeGracefully(CollectClient<D> collectClient, Logger logger) {
        if (Objects.isNull(collectClient)) {
            return;
        }

        try {
            collectClient.close();
        } catch (Exception exception) {
            if (Objects.nonNull(logger) && logger.isWarnEnabled()) {
                logger.warn("Failed to close CollectClient: {}", CommonUtil.getMessageFromThrowable(exception));
            }
        }
    }
}

SmtpDemo:

// Define the class of protocol and response data when you extend AbstractCollector
public class SmtpDemo extends AbstractCollector<SmtpProtocol, String> {
    @Override
    public boolean isValidProtocol(SmtpProtocol protocol) {
        // check SmtpProtocol here...
        return true;
    }

    @Override
    public CollectClient<String> createCollectClientTemplate(SmtpProtocol protocol) {
        // create client and define each action here...

        String host = protocol.getHost();
        String port = protocol.getPort();
        int timeout = CollectUtil.getTimeout(protocol.getTimeout());
        SMTP smtp = new SMTP();
        smtp.setConnectTimeout(timeout);
        smtp.setCharset(StandardCharsets.UTF_8);

        return new CollectClient<>() {
            @Override
            public void connect() throws Exception {
                // connect remote client...
                smtp.connect(host, Integer.parseInt(port));
            }

            @Override
            public String fetchData() throws Exception {
                // fetch data from remote...
                smtp.helo(protocol.getEmail());
                return smtp.getReplyString();
            }

            @Override
            public void close() throws Exception {
                // close client...
                smtp.disconnect();
            }
        };
    }

    @Override
    public void processCollectedData(CollectRep.MetricsData.Builder builder, String collectedData) {
        // transfer collected data and build HertzBeat data here...
    }

    @Override
    public String supportProtocol() {
        return null;
    }
}

What are the benefits if we refactor Collector as mentioned about?

  1. Friendly for expanding collector, we can directly extend AbstractCollector and code step by step. There is not necessary to worry about any other things like Catching Exception and to look for useful parameter in org.dromara.hertzbeat.common.entity.job.Metrics, all we need to do is focus on how to monitor remote server.
  2. Handling Exception in one place and good for extension. image
  3. Easy for Dynamic protocol processing and aggregating data without worrying upper code change. Before: image After: image
  4. We can enhance collecting data easily because each action is completely divided.

It will be a dramatical refactor and needs to modify each Collector. I have no idea if it is a better design. Do you have any better suggesion?

Calvin979 avatar Mar 24 '24 14:03 Calvin979

Hi, better design, 👍👍👍 I think this is a great abstraction of collect capabilities and protocol. We can do this. But there are still some points that we need to consider. In some collect, client connection need to be reused without being closed and something else.

tomsun28 avatar Mar 24 '24 15:03 tomsun28

Hi, better design, 👍👍👍 I think this is a great abstraction of collect capabilities and protocol. We can do this. But there are still some points that we need to consider. In some collect, client connection need to be reused without being closed and something else.

Maybe creating a wrapper for CollectClient can help reuse connection. I will consider it as well.

Calvin979 avatar Mar 24 '24 16:03 Calvin979

#1791

Calvin979 avatar Apr 19 '24 18:04 Calvin979

Hi, @tomsun28 I've written a demo for what you mentioned before about http_sd in pr #1791

  1. Enter http_sd url(optional)
  2. Detect http_sd through one-time job
  3. Batch create monitor by http_sd

001addmonitor 002httpsd 003monitorlist

Is it the desired result?

Calvin979 avatar May 19 '24 16:05 Calvin979

Yeah, 👍👍 Geat! Can you sumbit a pr for this, now we are publishing new version, we can merge this after release.

tomsun28 avatar May 20 '24 03:05 tomsun28

Yeah, 👍👍 Geat! Can you sumbit a pr for this, now we are publishing new version, we can merge this after release.

I will create a pr tonight.

Calvin979 avatar May 20 '24 04:05 Calvin979