
[Feature] Add virtual MQ analysis for native traces.

Open: wu-sheng opened this issue 3 years ago

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

Now that we have virtual database and virtual cache analysis in 9.3.0, let's bring the last missing core analysis, virtual MQ, into the backend analysis.

As designed years ago, the span layer includes the following options:

public enum SpanLayer {
    DB(1), RPC_FRAMEWORK(2), HTTP(3), MQ(4), CACHE(5);
    private final int code; SpanLayer(int code) { this.code = code; }
}

The MQ span layer represents a message queue server with producer and consumer sides, such as Kafka, RocketMQ, and Pulsar. The SkyWalking Java agent has also had plugins for the Java clients of these widely used MQs for years.

Now, let's analyze their access load (producing/consuming load), such as X messages per second, along with other typical metrics. Time needs special care here: MQ clients can produce and consume messages in bulk, and some clients use pull mode (typically Kafka), so a consumer span may only include the pulling time. Meanwhile, the span of a push-mode client could include the time of all further processing.
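Because a single span may carry a whole batch, a load metric such as messages per second would need to count messages rather than spans. A minimal sketch, assuming a hypothetical `MqSpan` type that records how many messages it carried (the real span model may not expose such a count). Span duration is deliberately left out, since its meaning differs between pull-mode and push-mode clients.

```java
import java.util.List;

// Hypothetical, simplified view of an MQ span; not SkyWalking's real span model.
final class MqSpan {
    final boolean consume;   // true for the consumer side, false for the producer side
    final int messageCount;  // assumed field: messages carried by this (possibly bulk) span

    MqSpan(boolean consume, int messageCount) {
        this.consume = consume;
        this.messageCount = messageCount;
    }
}

final class MqLoad {
    private MqLoad() {}

    // Messages per second for one side (produce or consume) over a window of windowMillis.
    // Counts messages, not spans, because one bulk span may carry many messages.
    static double messagesPerSecond(List<MqSpan> spans, boolean consume, long windowMillis) {
        long total = 0;
        for (MqSpan span : spans) {
            if (span.consume == consume) {
                total += span.messageCount;
            }
        }
        return total * 1000.0 / windowMillis;
    }
}
```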

Also, endpoint_mq_consume_count and endpoint_mq_consume_latency are analyzed purely through OAL, even relying on a tag-match expression. We should consider merging them in a consistent way, like the existing VirtualServiceProcessor implementations.
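Following the VirtualServiceProcessor idea, a virtual MQ analysis could inspect each span, keep an access record when the span layer is MQ, and later emit those records downstream. The sketch below is hypothetical throughout: `SimpleSpan`, `Layer`, `VirtualMQProcessor`, the `mq.topic`/`mq.queue` tag keys, and the assumption that entry spans are the consumer side are illustrative choices, not the OAP's actual types or API. `MessageQueueAccess` and `MessageQueueOperation` only reuse names from the OAL expression proposed later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified stand-ins for the real OAP types (hypothetical).
enum Layer { DB, RPC_FRAMEWORK, HTTP, MQ, CACHE }

// Constant names follow the OAL example in this thread.
enum MessageQueueOperation { Produce, Consume }

final class SimpleSpan {
    final Layer layer;
    final boolean entry;            // assumption: entry span = consumer side, exit span = producer side
    final String peer;              // broker address
    final Map<String, String> tags;
    final long latencyMillis;

    SimpleSpan(Layer layer, boolean entry, String peer, Map<String, String> tags, long latencyMillis) {
        this.layer = layer;
        this.entry = entry;
        this.peer = peer;
        this.tags = tags;
        this.latencyMillis = latencyMillis;
    }
}

final class MessageQueueAccess {
    final String broker;
    final String topic;
    final MessageQueueOperation operation;
    final long latencyMillis;

    MessageQueueAccess(String broker, String topic, MessageQueueOperation operation, long latencyMillis) {
        this.broker = broker;
        this.topic = topic;
        this.operation = operation;
        this.latencyMillis = latencyMillis;
    }
}

final class VirtualMQProcessor {
    private final List<MessageQueueAccess> prepared = new ArrayList<>();

    // Step 1: keep an access record only for MQ-layer spans.
    void prepareIfNecessary(SimpleSpan span) {
        if (span.layer != Layer.MQ) {
            return;
        }
        // Prefer the topic tag, fall back to the queue tag (assumed tag keys).
        String topic = span.tags.getOrDefault("mq.topic", span.tags.getOrDefault("mq.queue", ""));
        MessageQueueOperation op = span.entry ? MessageQueueOperation.Consume : MessageQueueOperation.Produce;
        prepared.add(new MessageQueueAccess(span.peer, topic, op, span.latencyMillis));
    }

    // Step 2: hand the prepared records to the downstream receiver, then reset.
    void emitTo(Consumer<MessageQueueAccess> receiver) {
        prepared.forEach(receiver);
        prepared.clear();
    }
}
```

Splitting "prepare" from "emit" gives one place where a consistent rule for both producer and consumer sides could live, instead of a per-metric tag match in OAL.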

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

wu-sheng, Sep 23 '22

Please assign this to me.

pg-yang, Oct 16 '22

Doesn't OAL support labeled AVG metrics?

I'm analyzing MQ metrics, and I think it's better to provide transmission latency metrics labeled by queue or topic, as follows:

mq_transmission_latency = from(MessageQueueAccess.transmissionLatency).filter(operation == MessageQueueOperation.Consume).avgLabeled(MessageQueueAccess.queue);

But AvgLabeledFunction is in the meter package and is annotated with @MeterFunction.

MetricsHolder only initializes classes annotated with @MetricsFunction: https://github.com/apache/skywalking/blob/5abe6ceb1ff0938be917f8b440c5eb3590cf31db/oap-server/oal-rt/src/main/java/org/apache/skywalking/oal/rt/parser/MetricsHolder.java#L41-L45

Could I implement labeled AVG metrics in OAL if they're not supported? If so, should I add a new labeled AVG metrics class to the metrics package instead of using the AvgLabeledFunction class in the meter package?

pg-yang, Oct 18 '22
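The proposed avgLabeled(MessageQueueAccess.queue) boils down to keeping one average per label value over raw per-request values. A minimal sketch under that reading; `LabeledRawAvg` and its methods are hypothetical and are neither SkyWalking's AvgLabeledFunction nor any existing OAL function.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical label-based average over raw per-request values (OAL-style input).
final class LabeledRawAvg {
    // label value (e.g. a queue name) -> {sum, count}
    private final Map<String, long[]> sumAndCount = new HashMap<>();

    // Called once per request with that request's raw value.
    void accept(String label, long value) {
        long[] sc = sumAndCount.computeIfAbsent(label, k -> new long[2]);
        sc[0] += value;
        sc[1]++;
    }

    // Merging two partial results just adds sums and counts per label.
    void combine(LabeledRawAvg other) {
        other.sumAndCount.forEach((label, sc) -> {
            long[] mine = sumAndCount.computeIfAbsent(label, k -> new long[2]);
            mine[0] += sc[0];
            mine[1] += sc[1];
        });
    }

    // Integer average per label.
    Map<String, Long> values() {
        Map<String, Long> out = new HashMap<>();
        sumAndCount.forEach((label, sc) -> out.put(label, sc[1] == 0 ? 0L : sc[0] / sc[1]));
        return out;
    }
}
```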

The existing function lives in the meter system because its implementation is designed that way. A MAL function (meter function) is designed for aggregating and merging statistics. An OAL function, by contrast, targets raw data, typically from agent traces and mesh ALS, where there is one value per request (e.g., latency per request).

If you dig deeper, you will notice that a meter function can't be applied in the OAL scenario.

Finally, of course, you could implement a new label-based avg OAL function, just in a different way.

wu-sheng, Oct 18 '22
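To make the difference concrete, here is a hypothetical sketch (not SkyWalking's function classes): an OAL-style average is fed one raw value per request, while a MAL-style average merges samples that were already aggregated upstream into a (sum, count) pair.

```java
// Hypothetical classes illustrating the two input shapes; not SkyWalking's real functions.

// OAL style: called once per request with a raw value, e.g. one span's latency.
final class RawAvg {
    private long sum;
    private long count;

    void accept(long latencyMillis) {
        sum += latencyMillis;
        count++;
    }

    long value() {
        return count == 0 ? 0 : sum / count;
    }
}

// MAL style: called with pre-aggregated statistics, e.g. a (sum, count) pair from an exporter.
final class MergedAvg {
    private double sum;
    private double count;

    void merge(double sampleSum, double sampleCount) {
        sum += sampleSum;
        count += sampleCount;
    }

    double value() {
        return count == 0 ? 0 : sum / count;
    }
}
```

The arithmetic is the same; what differs is the shape of the input each function expects.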

I have submitted a draft PR to discuss labeled metrics: https://github.com/apache/skywalking/pull/9813

pg-yang, Oct 19 '22

mq_transmission_latency = from(MessageQueueAccess.transmissionLatency).filter(operation == MessageQueueOperation.Consume).avgLabeled(MessageQueueAccess.queue);

Setting aside that technical discussion, I want to go back to the original proposal. Could you explain why labeled metrics are meaningful for OAL? In the virtual MQ case, the queue or topic name should be an endpoint under today's concepts. Then what else would be a label?

wu-sheng, Oct 19 '22

OAL didn't include a label system from the beginning, because the entity is usually already determined outside of labels (the service name is a field of the source), and the various metric types (a common use of labels in the meter system) actually map to different sources. Over the years, it has been hard to find more dimensions that would require labels in OAL, so it ended up as what you see today.

wu-sheng, Oct 19 '22

Emm, I want to provide consumer count, producer count, latency, and other metrics labeled by topic and queue. And I don't plan to provide an endpoint, I think.

pg-yang, Oct 19 '22

Why not provide an endpoint? The endpoint is actually designed for this. The topic of a queue plays the same role as an HTTP URI.

wu-sheng, Oct 19 '22
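Modeling the topic as an endpoint means metrics are keyed by an entity made of the virtual MQ service and the topic, just as an HTTP endpoint is keyed by service and URI. A hypothetical sketch of per-endpoint consume counts; the `service/topic` ID format and the `MqEndpointCounter` name are assumptions for illustration, not SkyWalking's endpoint ID encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-endpoint aggregation: the virtual MQ service is the broker address,
// and each topic/queue is an endpoint under it, like a URI under an HTTP service.
final class MqEndpointCounter {
    private final Map<String, Long> consumeCountByEndpoint = new HashMap<>();

    // Assumed ID format, for illustration only.
    static String endpointId(String mqService, String topic) {
        return mqService + "/" + topic;
    }

    // Adds consumed messages to the endpoint built from (service, topic).
    void onConsume(String mqService, String topic, long messages) {
        consumeCountByEndpoint.merge(endpointId(mqService, topic), messages, Long::sum);
    }

    long consumeCount(String mqService, String topic) {
        return consumeCountByEndpoint.getOrDefault(endpointId(mqService, topic), 0L);
    }
}
```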

Emm, just to reduce user operations and the number of UI pages in the browser.

pg-yang, Oct 19 '22

That is not the point of using a label system. Label systems (even in Prometheus) exist because they don't want the entity concept and want to merge the values into one metric for visualization. We have the entity concept, and the UI supports putting multiple metrics in one graph. These are just two solutions to one problem, but neither of them is trying to reduce browser operations. That is a UI/UX design matter.

wu-sheng, Oct 19 '22
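The "two solutions to one problem" point can be seen by converting between the forms: a single labeled metric and a set of per-endpoint metrics hold the same values, only keyed differently. A hypothetical sketch, assuming an illustrative `service/label` endpoint naming; `LabelVsEntity` is not part of SkyWalking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the same data in label form (one metric, values keyed by label)
// and in entity form (one value per endpoint entity).
final class LabelVsEntity {
    private LabelVsEntity() {}

    // Label form -> entity form: each label value becomes its own endpoint under the service.
    static Map<String, Long> toEntities(String service, Map<String, Long> labeled) {
        Map<String, Long> byEndpoint = new HashMap<>();
        labeled.forEach((label, value) -> byEndpoint.put(service + "/" + label, value));
        return byEndpoint;
    }

    // Entity form -> label form: strip the service prefix to recover the label value.
    static Map<String, Long> toLabels(String service, Map<String, Long> byEndpoint) {
        String prefix = service + "/";
        Map<String, Long> labeled = new HashMap<>();
        byEndpoint.forEach((id, value) -> {
            if (id.startsWith(prefix)) {
                labeled.put(id.substring(prefix.length()), value);
            }
        });
        return labeled;
    }
}
```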

Also, you could list the key metrics (if there are only a few) directly in the endpoint table, so users don't have to jump into the dashboard. We could also try to disable jumping in if necessary. Let's focus on the right direction rather than mixing two purposes, which could make the project hard to maintain and confusing to others.

wu-sheng, Oct 19 '22

OK, I'll provide the endpoint.

pg-yang, Oct 19 '22