protobuf parsing error when sending pyroscope-dotnet info to vector
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
I'm developing a PoC that connects pyroscope-dotnet with Vector. I have this example app running in a simple pod within a k8s cluster.
The PYROSCOPE_SERVER_ADDRESS env var is set and points to a Vector http_server source configured with protobuf decoding, like so:
pyroscope:
  type: http_server
  address: 0.0.0.0:4040
  path: /ingest
  decoding:
    codec: protobuf
    protobuf:
      desc_file: /vector-data-files/profile.desc
      message_type: perftools.profiles.Profile
As you can see, I'm referencing a profile.desc generated from this .proto file with the protoc CLI tool like this:
protoc --descriptor_set_out=profile.desc --include_imports profile.proto
profile.desc is turned into a ConfigMap so the pod is able to consume it.
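For reference, the descriptor can be packaged along these lines (the ConfigMap name and namespace are illustrative assumptions; it has to end up mounted at /vector-data-files, wherever Vector reads desc_file from):
# Illustrative only; the ConfigMap name and namespace are assumptions, not taken from the PoC.
kubectl create configmap profile-desc --from-file=profile.desc -n monitoring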
It seems that both sides are able to communicate, but when checking the logs I see the following.
The sample app errors as follows:
[2024-01-27 00:30:20.158 | info | PId: 1 | TId: 13] PyroscopePprofSink 400
Vector logs show:
{"error":"Error parsing protobuf: DecodeError { description: \"invalid wire type: ThirtyTwoBit (expected LengthDelimited)\", stack: [] }","error_code":"decoder_deserialize","error_type":"parser_failed","host":"aks-default-42715425-vmss000028","internal_log_rate_limit":true,"message":"Failed deserializing frame.","metadata":{"kind":"event","level":"ERROR","module_path":"vector::internal_events::codecs","target":"vector::internal_events::codecs"},"pid":1,"source_type":"internal_logs","stage":"processing","timestamp":"2024-01-27T00:30:20.158087899Z","vector":{"component_id":"pyroscope-ingest","component_kind":"source","component_type":"http_server"}}
Configuration
apiVersion: v1
kind: Namespace
metadata:
  name: poc
---
apiVersion: v1
kind: Pod
metadata:
  name: pyropoc
  namespace: poc
  annotations:
    app: pyropoc
spec:
  containers:
    - name: pyropoc
      image: ""
      env:
        - name: ASPNETCORE_URLS
          value: http://*:5000
        - name: PYROSCOPE_SERVER_ADDRESS
          value: http://vector-agent.monitoring.svc.cluster.local:4040
      ports:
        - containerPort: 5000
          protocol: TCP
          name: http
Version
0.29.0
Debug Output
No response
Example Data
No response
Additional Context
No response
References
https://github.com/grafana/pyroscope-dotnet/issues/56
Hi @jsonarso!
I'm not familiar with Pyroscope, but the error indicates that Vector isn't receiving length-delimited protobuf messages. Is it possible to configure the client to frame the protobuf messages that way?
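For context, a length-delimited stream of protobuf messages usually means each serialized message is prefixed with its byte length, often as a varint. A minimal Python sketch of that general idea (illustrative only; whether this matches exactly what Vector's http_server framing expects is worth checking against the docs):
# Generic varint length-prefix framing for a stream of protobuf messages.
# This is only an illustration of the concept, not what pyroscope-dotnet emits.
def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def frame(message: bytes) -> bytes:
    # length prefix followed by the serialized protobuf message bytes
    return encode_varint(len(message)) + message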
Hey @jszwedko, thanks for the quick response.
I don't have a lot of experience with it either. It uses an agent based on dd-trace-dotnet to retrieve continuous profiling info from dotnet apps.
I only see some basic environment variables for configuration but nothing related to framing. https://grafana.com/docs/pyroscope/latest/configure-client/language-sdks/dotnet/
Could Vector's http_server framing feature help in this scenario, or does that happen after decoding? Just wondering: https://vector.dev/docs/reference/configuration/sources/http_server/#framing
Mmm, yeah, I see. I'm struggling to find some docs on the protocol pyroscope uses to forward data. It might be worth asking their community about it.
I think things are getting a bit clearer after reading this: https://grafana.com/docs/pyroscope/latest/configure-server/about-server-api/#pprof-format
What I've found is that the dotnet profiler sends a multipart/form-data HTTP request that contains both a .pprof file and a JSON sample type config (a note on why this trips the protobuf decoder follows the dump below):
----cpp-httplib-multipart-data-yJjxnAh1Fskvttvw
Content-Disposition: form-data; name="profile"; filename="profile.pprof"
<binary pprof payload omitted; readable string-table fragments include "nanoseconds", "cpu", "Microsoft.Extensions.Hosting", "System.Private.CoreLib">
----cpp-httplib-multipart-data-yJjxnAh1Fskvttvw
Content-Disposition: form-data; name="sample_type_config"; filename="sample_type_config.json"
{
  "alloc_samples": {
    "units": "objects",
    "display-name": "alloc_objects"
  },
  "alloc_size": {
    "units": "bytes",
    "display-name": "alloc_space"
  },
  "cpu": {
    "units": "samples",
    "sampled": true
  },
  "exception": {
    "units": "exceptions",
    "display-name": "exceptions"
  },
  "lock_count": {
    "units": "lock_samples",
    "display-name": "mutex_count"
  },
  "lock_time": {
    "units": "lock_nanoseconds",
    "display-name": "mutex_duration"
  },
  "wall": {
    "units": "samples",
    "sampled": true
  },
  "inuse_objects": {
    "units": "objects",
    "display-name": "inuse_objects",
    "aggregation": "average"
  },
  "inuse_space": {
    "units": "bytes",
    "display-name": "inuse_space",
    "aggregation": "average"
  }
}
----cpp-httplib-multipart-data-yJjxnAh1Fskvttvw--
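As a side note, the multipart body above also explains the exact decode error: the very first byte of the boundary line is "-" (0x2D), which read as a protobuf tag is field number 5 with wire type 5 (32-bit), while field 5 of perftools.profiles.Profile (function, in the standard profile.proto) is a length-delimited field. A quick check in Python:
# Decode the first byte of the multipart body as a protobuf tag.
# '-' (0x2D) -> field number 5, wire type 5 ("ThirtyTwoBit" in the Vector error),
# which is why the decoder reports it expected LengthDelimited for that field.
first = ord("-")              # first byte of "----cpp-httplib-multipart-data-..."
print(first >> 3, first & 7)  # prints: 5 5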
Interesting things I've noticed:
- Without any encoding configured, Vector splits the request into multiple events/messages.
- When setting the http server encoding to binary, I receive the whole message in one event.
- There is some other encoding happening behind the scenes, because Vector shows messages as follows:
{
"message":"----cpp-httplib-multipart-data-HOQwocFKsiFUtIar\r\nContent-Disposition: form-data; name="profile"; filename="profile.pprof"\r\n\r\n\n\u0004\b\u0002\u0010\u0001\u0012\u000b\n\u0003\u0001\u0002\u0003\u0012\u0004���\u0004"\u0006\b\u0001"\u0002\b\u0001"\u0006\b\u0002"\u0002\b\u0002"\u0006\b\u0003"\u0002\b\u0003\u0006\b\u0001\u0010\u0004 \u0003\u0006\b\u0002\u0010\u0005 \u0003\u0006\b\u0003\u0010\u0006 \u00032\u00002\u000bnanoseconds2\u0003cpu2\u0016System.Private.CoreLib2System.Threading!WaitHandle.WaitOneNoCheck2>System.Threading!PortableThreadPool.GateThread.GateThreadStart2%System.Threading!Thread.StartCallbackZ\u0004\b\u0002\u0010\u0001`\u0001\r\n----cpp-httplib-multipart-data-HOQwocFKsiFUtIar\r\nContent-Disposition: form-data; name="sample_type_config"; filename="sample_type_config.json"\r\n\r\n{\n "alloc_samples": {\n "units": "objects",\n "display-name": "alloc_objects"\n },\n "alloc_size": {\n "units": "bytes",\n "display-name": "alloc_space"\n },\n "cpu": {\n "units": "samples",\n "sampled": true\n },\n "exception": {\n "units": "exceptions",\n "display-name": "exceptions"\n },\n "lock_count": {\n "units": "lock_samples",\n "display-name": "mutex_count"\n },\n "lock_time": {\n "units": "lock_nanoseconds",\n "display-name": "mutex_duration"\n },\n "wall": {\n "units": "samples",\n "sampled": true\n },\n "inuse_objects": {\n "units": "objects",\n "display-name": "inuse_objects",\n "aggregation": "average"\n },\n "inuse_space": {\n "units": "bytes",\n "display-name": "inuse_space",\n "aggregation": "average"\n }\n}\r\n----cpp-httplib-multipart-data-HOQwocFKsiFUtIar--\r\n"
}
This seems to be a receive-and-forward scenario, but of course I need to keep the HTTP request format as it's generated by the profiler... any ideas?
Interesting, thanks for those additional details. I think the best you'll be able to do, with just Vector, is to receive the whole payload as one event (the binary encoding). I'm guessing that won't enable you to do the processing you desire though? It would allow simple pass-thru.
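For reference, the "whole payload as one event" setup would look roughly like this, using decoding codec bytes as the rough equivalent of the binary encoding mentioned (untested sketch; the original Content-Type header and multipart boundary would still need to be carried or reconstructed downstream for a true pass-through):
pyroscope:
  type: http_server
  address: 0.0.0.0:4040
  path: /ingest
  decoding:
    codec: bytes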
Alternatively, this might be a place where you'd need a sidecar to receive the requests, parse them, and forward them to Vector.
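A rough Python sketch of that sidecar idea (the service name, port, paths, and the choice to forward only the pprof part are assumptions for illustration; no error handling):
# Rough sketch: terminate the profiler's multipart upload, extract the raw
# pprof bytes, and forward them to Vector. Names and ports are assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer
from email.parser import BytesParser
from email.policy import default
import urllib.request

VECTOR_URL = "http://vector-agent.monitoring.svc.cluster.local:4040/ingest"  # assumed target

class IngestProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        # Re-attach the Content-Type header (it carries the multipart boundary)
        # so the stdlib email parser can split the form-data parts.
        raw = b"Content-Type: " + self.headers["Content-Type"].encode() + b"\r\n\r\n" + body
        msg = BytesParser(policy=default).parsebytes(raw)
        for part in msg.iter_parts():
            if part.get_filename() == "profile.pprof":
                pprof = part.get_payload(decode=True)
                req = urllib.request.Request(
                    VECTOR_URL,
                    data=pprof,
                    headers={"Content-Type": "application/octet-stream"},
                )
                urllib.request.urlopen(req)
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 4040), IngestProxy).serve_forever()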