Missing "time" field for sinks.prometheus_remote_write
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
When data comes from sources.http_server, Vector receives many logs at once (for example, 1000). After processing, the data flows to sinks.prometheus_remote_write like:
2024-02-09T14:09:31.558577877Z HttpCode{ClientIP="1.2.3.4",ZoneName="mysuperdomain.com"} = 200
2024-02-09T14:09:32.558577877Z HttpCode{ClientIP="1.2.3.4",ZoneName="mysuperdomain.com"} = 200
2024-02-09T14:09:33.558577877Z HttpCode{ClientIP="1.2.3.4",ZoneName="mysuperdomain.com"} = 200
2024-02-09T14:09:34.558577877Z HttpCode{ClientIP="1.2.3.4",ZoneName="mysuperdomain.com"} = 200
As a result, we have several metrics with the same timestamp:
2024-02-09T14:09:36Z
2024-02-09T14:09:35Z
2024-02-09T14:09:34Z
2024-02-09T14:09:38Z <----- here
2024-02-09T14:09:38Z <----- here
2024-02-09T14:09:38Z <----- here
2024-02-09T14:09:41Z
First problem: Mimir refuses to accept the metrics, because it assumes something is wrong in the cluster and that the metrics are duplicates:
ts=2024-02-09T14:09:42.384115087Z
caller=grpc_logging.go:60
level=warn method=/cortex.Ingester/Push
duration=6.813224ms
msg=gRPC
err="
rpc error:
code = Code(400)
desc = user=websec-logwrap: the sample has been rejected because another sample with
the same timestamp, but a different value, has already been ingested
(err-mimir-sample-duplicate-timestamp). The affected sample has
timestamp 2024-02-09T14:09:38.17Z and is from series
{ClientIP=\"1.2.3.4\", ZoneName=\"mysuperdomain.com\", __name__=\"HttpCode\"}"
Second problem: I see gaps in the graphs, because metrics are inserted in groups.
Attempted Solutions
No response
Proposal
The Prometheus protocol supports a timestamp as the last field:
HttpCode{ClientIP="1.2.3.4",ZoneName="mysuperdomain.com"} = 200 2024-02-09T14:09:34.558577877Z
so we could pass the real metric time, not the time when Mimir accepted the metric:
log_to_metric:
inputs: [ "trans_cf_mimir_json" ]
type: "log_to_metric"
time_field: realtime <---------- here
metrics:
- type: "gauge"
field: "HttpCode"
tags:
ZoneName: "{{ZoneName}}"
ClientIP: "{{ClientIP}}"
...
References
No response
Version
No response
Hi @suslikas !
The log_to_metric transform will actually use the timestamp from the log event, if it is set. If it is not set, then it uses the "current timestamp" at the time the transform processes the log event. In your case, where an incoming HTTP request carries 1000 events, those would likely all end up with the same timestamp. The Prometheus Remote Write sink will then use that timestamp when sending the event.
Looking at your input data, maybe you need to be aggregating duplicate points? You can use the aggregate transform for that.
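For illustration, a minimal sketch of wiring in the aggregate transform between log_to_metric and the sink (component names here are placeholders, not from your config):

```yaml
transforms:
  aggregate_metrics:
    type: aggregate
    inputs: [ "log_to_metric" ]   # the transform that emits the metrics
    interval_ms: 10000            # flush aggregated metrics every 10 seconds
```

The sink would then take `aggregate_metrics` as its input instead of `log_to_metric`.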
To expand on this:
time_field: realtime <---------- here
Is the behavior of the log_to_metric transform if there is no timestamp on the event. If the issue is that your logs do have timestamps, you could use a remap transform to drop it via del(.timestamp). This will cause the log_to_metric transform to assign the current time as the timestamp. However, this could still result in duplicate data points if the timestamp happens to be the same for multiple events.
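A minimal remap sketch of that approach (the input name is a placeholder):

```yaml
transforms:
  drop_timestamp:
    type: remap
    inputs: [ "my_logs" ]   # placeholder for your upstream component
    source: |
      # Remove the log's timestamp so log_to_metric falls back to the current time
      del(.timestamp)
```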
Let me check in a week. Thank you for the idea; I just couldn't find any information about the timestamp in the documentation.
Hey, here is the documentation about log_to_metric, and based on its example I should do a transformation like:
log_to_metric:
inputs: [ "trans_cf_mimir_json" ]
type: "log_to_metric"
metrics:
- type: "histogram"
field: "EdgeResponseStatus"
tags:
ZoneName : "{{ZoneName}}"
ClientRequestHost : "{{ClientRequestHost}}"
ClientIP : "{{ClientIP}}"
- type: "gauge"
field: "EdgeResponseBytes"
tags:
ZoneName : "{{ZoneName}}"
ClientRequestHost : "{{ClientRequestHost}}"
ClientIP : "{{ClientIP}}"
- type: "set"
field: "ClientCountry"
tags:
ZoneName : "{{ZoneName}}"
ClientRequestHost : "{{ClientRequestHost}}"
ClientIP : "{{ClientIP}}"
But I have no idea how to pass the timestamp. Should there be type=timestamp or something? The documentation says nothing about it. Or am I wrong?
For log_to_metric it will choose the timestamp by attempting to pull the timestamp field from the log event. If there is none, then it will use the current time. This behavior should definitely be documented, I see that it is not 😅
Now I've got you. I prepared the transforms; give it some time and I'll report the results. Thank you!
trans_cf_mimir_json:
inputs:
- trans_cf_full_json_throttle
source: |
if .custom_log_type == "http_requests" {
structured = {
{
"EdgeResponseStatus" : .EdgeResponseStatus,
"ClientIP" : .ClientIP,
"ClientCountry" : .ClientCountry,
"ClientRequestHost" : .ClientRequestHost,
"EdgeResponseBytes" : .EdgeResponseBytes,
"ZoneName" : .ZoneName,
"timestamp" : .custom_timestamp_received
}
}
. = structured
} else {
abort
}
No success. I forwarded everything to the Vector console, and I see:
JSON prepared for log_to_metric:
{
"ClientCountry": "fr",
"ClientIP": "1.2.3.4",
"ClientRequestHost": "d.dd.com",
"EdgeResponseBytes": 6147,
"EdgeResponseStatus": 200,
"ZoneName": "dd.com",
"timestamp": "2024-03-01T10:48:26Z"
}
Output after log_to_metric:
2024-03-01T10:29:41.292898701Z EdgeResponseBytes{ClientIP="1.2.3.4",ClientRequestHost="d.ddd.com",ZoneName="ddd.com"} = 744
2024-03-01T10:29:41.292900051Z ClientCountry{ClientIP="1.2.3.5",ClientRequestHost="d.ddd.com",ZoneName="ddd.com"} + fr
2024-03-01T10:29:41.292901971Z EdgeResponseStatus{ClientIP="1.2.3.6",ClientRequestHost="my.ddd.com",ZoneName="ddd.com"} + histogram 1@302
2024-03-01T10:29:41.292903401Z EdgeResponseBytes{ClientIP="1.2.3.7",ClientRequestHost="my.ddd.com",ZoneName="ddd.com"} = 992
2024-03-01T10:29:41.292904921Z ClientCountry{ClientIP="1.2.3.8",ClientRequestHost="my.ddd.com",ZoneName="ddd.com"} + us
And in Mimir's logs I see this error message:
err="rpc error:
code = Code(400)
desc = user=websec-logwrap: the sample has been rejected
because another sample with the same timestamp, but a
different value, has already been ingested (err-mimir-sample-duplicate-timestamp).
The affected sample has timestamp 2024-03-01T10:16:43.319Z and
is from series
{ClientIP=\"1.2.3.0\", ClientRequestHost=\"dd.com\", ZoneName=\"dd.com\", __name__=\"EdgeResponseStatus_bucket\", le=\"+Inf\"}"
I think what might be happening here is that the timestamp is a string rather than a "timestamp" type. You can convert the string to a timestamp using parse_timestamp in a remap transform prior to the log_to_metric transform:
.timestamp = parse_timestamp!(.timestamp, "%+")
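For example, a remap transform placed between trans_cf_mimir_json and log_to_metric could look like this (a sketch, assuming .timestamp arrives as an RFC 3339 string; log_to_metric would then take parse_ts as its input):

```yaml
transforms:
  parse_ts:
    type: remap
    inputs: [ "trans_cf_mimir_json" ]
    source: |
      # "%+" parses RFC 3339 / ISO 8601 strings into a VRL timestamp type,
      # which log_to_metric can then use as the metric timestamp
      .timestamp = parse_timestamp!(.timestamp, "%+")
```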
Hey, I ran many tests and spent a long time playing with timestamps and time formats, but no success. The last thing I tested:
"timestamp" : now()
should work, but no: log_to_metric does not add the trailing time part to the line.
P.S. Here is a good example of how to store a timestamp in JSON; yes, in fact it should be a string.
Can you provide the full Vector configuration you tried? It's not clear to me what you tried exactly.
# cat /etc/vector/vector.yaml
api:
address: 0.0.0.0:8686
enabled: true
playground: false
data_dir: /vector-data-dir
sinks:
out_websec_mimir:
endpoint: http://websec-logwrap-general-haproxy.websec-logwrap-general.svc.cluster.local/api/v1/push
healthcheck:
enabled: false
inputs:
- log_to_metric
- vector_metrics
type: prometheus_remote_write
out_websec_opensearch:
auth:
password: $WEBSEC_OS_PASS
user: $WEBSEC_OS_USER
strategy: basic
batch:
max_events: 500
timeout_secs: 3
buffer:
max_events: 1500
bulk:
index: logwrap-cf-{{.custom_log_type}}-{{.custom_log_scope}}-{{.ZoneName}}-%Y-%m-%d-%H
compression: gzip
endpoints:
- http://websec-logwrap-general-opensearch.websec-logwrap-general.svc.cluster.local
inputs:
- trans_cf_full_json
- trans_cf_short_json
type: elasticsearch
sources:
in_https:
address: 0.0.0.0:8443
auth:
password: $WEBSEC_HTTP_PASS
username: $WEBSEC_HTTP_USER
tls:
crt_file: /tmp/secret/WEBSEC_SSL_CRT
enabled: true
key_file: /tmp/secret/WEBSEC_SSL_KEY
type: http_server
vector_metrics:
type: internal_metrics
transforms:
log_to_metric:
inputs:
- trans_cf_mimir_json
metrics:
- field: EdgeResponseStatus
tags:
ClientIP: '{{ClientIP}}'
ClientRequestHost: '{{ClientRequestHost}}'
ZoneName: '{{ZoneName}}'
type: histogram
- field: EdgeResponseBytes
tags:
ClientIP: '{{ClientIP}}'
ClientRequestHost: '{{ClientRequestHost}}'
ZoneName: '{{ZoneName}}'
type: gauge
- field: ClientCountry
tags:
ClientIP: '{{ClientIP}}'
ClientRequestHost: '{{ClientRequestHost}}'
ZoneName: '{{ZoneName}}'
type: set
type: log_to_metric
trans_cf_full_json:
inputs:
- in_https
source: |
. = parse_json!(.message)
.custom_timestamp_received = now()
.custom_log_scope = "full"
if .Kind == "firewall" { .custom_log_type = "firewall_events" } else { .custom_log_type = "http_requests" }
if .Kind == "firewall" && match(string!(.ClientRequestHost), r'aaa1\.com') { .ZoneName = "aaa1.com" }
if .Kind == "firewall" && match(string!(.ClientRequestHost), r'aaa2\.com') { .ZoneName = "aaa2.com" }
if .custom_log_type == "http_requests" {
.custom_timestamp_received = .EdgeStartTimestamp
if exists(.EdgeResponseCompressionRatio) {
.EdgeResponseCompressionRatio = to_float!(.EdgeResponseCompressionRatio)
}
} else if .custom_log_type == "firewall_events" {
.custom_timestamp_received = .Datetime
} else {
.custom_err_info = "custom_log_type error"
log(., level: "info", rate_limit_secs: 2)
abort
}
if !exists(.ZoneName) {
.custom_err_info = "ZoneName error"
log(., level: "info", rate_limit_secs: 2)
abort
}
type: remap
trans_cf_full_json_throttle:
inputs:
- trans_cf_full_json
threshold: 100
type: throttle
window_secs: 5
trans_cf_mimir_json:
inputs:
- trans_cf_full_json_throttle
source: |
if .custom_log_type == "http_requests" {
structured = {
{
"EdgeResponseStatus" : .EdgeResponseStatus,
"ClientIP" : .ClientIP,
"ClientCountry" : .ClientCountry,
"ClientRequestHost" : .ClientRequestHost,
"EdgeResponseBytes" : .EdgeResponseBytes,
"ZoneName" : .ZoneName,
"timestamp" : now() <---------------------- here last experiment, was .custom_timestamp_received, sample data "2024-03-04T15:15:55.000Z"
}
}
. = structured
} else {
abort
}
type: remap
trans_cf_short_json:
inputs:
- trans_cf_full_json
source: |
if .custom_log_type == "http_requests" {
structured = {
{
"ZoneName" : .ZoneName,
"ClientRequestHost" : .ClientRequestHost,
"custom_timestamp_received" : .custom_timestamp_received,
"custom_log_scope" : "short",
"custom_log_type" : .custom_log_type,
"ClientCountry" : .ClientCountry,
"ClientIP" : .ClientIP,
"EdgeResponseStatus" : .EdgeResponseStatus,
"EdgeResponseBytes" : .EdgeResponseBytes,
"ClientRequestMethod" : .ClientRequestMethod,
"ClientRequestURI" : .ClientRequestURI,
"ClientRequestProtocol" : .ClientRequestProtocol,
"ClientRequestUserAgent" : .ClientRequestUserAgent,
"ClientSrcPort" : .ClientSrcPort,
"WAFAttackScore" : .WAFAttackScore,
"WAFRCEAttackScore" : .WAFRCEAttackScore,
"WAFSQLiAttackScore" : .WAFSQLiAttackScore,
"WAFXSSAttackScore" : .WAFXSSAttackScore
}
}
. = structured
} else if .custom_log_type == "firewall_events" {
structured = {
{
"ZoneName" : .ZoneName,
"ClientRequestHost" : .ClientRequestHost,
"ClientRefererHost" : .ClientRefererHost,
"custom_timestamp_received" : .custom_timestamp_received,
"custom_log_scope" : "short",
"custom_log_type" : .custom_log_type,
"Action" : .Action,
"ClientCountry" : .ClientCountry,
"ClientIP" : .ClientIP,
"ClientRequestMethod" : .ClientRequestMethod,
"ClientRequestPath" : .ClientRequestPath,
"ClientRequestProtocol" : .ClientRequestProtocol,
"EdgeResponseStatus" : .EdgeResponseStatus,
"Description" : .Description,
"ClientIPClass" : .ClientIPClass
}
}
. = structured
} else {
.custom_err_info = "full-shorts error"
log(., level: "info", rate_limit_secs: 2)
abort
}
type: remap
The idea: receive HTTP and firewall logs from Cloudflare, do some normalization, put everything into OpenSearch for long (part of the data) and short (all data) storage, and push metrics to Mimir to store for 2-3 years. Cloudflare Logpush transfers up to 1k records at a time from one domain; as a result, part of the metrics get marked as duplicates. I can't aggregate or do any preprocessing, because I need all metrics as-is. That is why I need to additionally pass the real metric time, as Prometheus supports:
HttpCode{ClientIP="1.2.3.4",ZoneName="mysuperdomain.com"} = 200 2024-02-09T14:09:34.558577877Z
Thanks! The last config you shared should result in now() being used as the generated metric timestamp. Are you finding that not to be the case?
For reference, this happens in this function: https://github.com/vectordotdev/vector/blob/a59aeb921bc93bc7590265f9e4335a8d824b95b4/src/transforms/log_to_metric.rs#L703-L775
I'm very sad, but it doesn't work...
Log from the console, no additional timestamp:
2024-03-07T06:40:25.087086166Z EdgeResponseStatus{ClientIP="1.1.1.1",ClientRequestHost="d.dd.com",ZoneName="dd.com"} + histogram 1@202
2024-03-07T06:40:25.087086166Z EdgeResponseBytes{ClientIP="1.1.1.1",ClientRequestHost="d.dd.com",ZoneName="dd.com"} = 743
2024-03-07T06:40:25.087086166Z ClientCountry{ClientIP="1.1.1.1",ClientRequestHost="d.dd.com",ZoneName="dd.com"} + kr
Error from Mimir:
...the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2024-03-07T08:57:13.826Z and is from series {ClientIP=\"1.1.1.1\", ClientRequestHost=\"d.dd.com\", ZoneName=\"dd.com\", __name__=\"EdgeResponseStatu s_bucket\", le=\"+Inf\"}"...
I'm having a difficult time following all of this 😓 I think what would be useful would be to come up with a minimal reproducible example of the issue that we could run to see the problematic behavior.
No problem, I'm on Light's side ;)
- The whole solution runs on Kubernetes, now upgraded to the latest version:
vector 0.36.0 (x86_64-unknown-linux-gnu a5e48bb 2024-02-13 14:43:11.911392615)
- Test config
# cat /etc/vector/vector.yaml
data_dir: /vector-data-dir
sources:
in_demo_logs:
format: apache_common
type: demo_logs
transforms:
trans_demo_logs:
inputs:
- in_demo_logs
source: |
.mydata = parse_apache_log!(.message, format: "common")
structured = {
{
"status": .mydata.status,
"host": .mydata.host,
"timestamp": to_unix_timestamp(now()),
"timestamp_old": .mydata.timestamp,
"timestamp_now": now()
}
}
. = structured
type: remap
log_to_metric_demo_logs:
inputs:
- trans_demo_logs
metrics:
- field: status
tags:
host: '{{host}}'
type: histogram
type: log_to_metric
sinks:
out_console:
encoding:
codec: text
inputs:
- log_to_metric_demo_logs
type: console
out_console2:
encoding:
codec: json
inputs:
- trans_demo_logs
type: console
- in_demo_logs generates the events described here
- trans_demo_logs parses to JSON, where we use host, status, and timestamp in log_to_metric_demo_logs. I tried different timestamp formats, because there are several examples of how to push a timestamp to Prometheus.
- The result in out_console:
{"host":"97.126.228.41","status":301,"timestamp":1709886787,"timestamp_now":"2024-03-08T08:33:07.424016563Z","timestamp_old":"2024-03-08T08:33:07Z"}
2024-03-08T08:33:07.424052303Z status{host="97.126.228.41"} + histogram 1@301
{"host":"242.65.69.186","status":503,"timestamp":1709886788,"timestamp_now":"2024-03-08T08:33:08.422606556Z","timestamp_old":"2024-03-08T08:33:08Z"}
2024-03-08T08:33:08.422636887Z status{host="242.65.69.186"} + histogram 1@503
Based on your code and this example, the result and format should be:
2024-03-08T07:20:11Z status{host="188.139.109.176"} + histogram 1@403 2000-10-10T20:55:36Z
or maybe:
2024-03-08T07:20:11Z status{host="188.139.109.176"} + histogram 1@403 1709886787
but neither works.
Thanks @suslikas ! The runnable example really helps.
I think a point of confusion here is that the text encoding you used for the out_console sink isn't intended to exactly match Prometheus output (although it is similar). I think this is why you aren't seeing the timestamp at the end of the output like you expect. If I change both encoding.codec to native_json I see:
{"log":{"host":"30.247.74.90","status":200,"timestamp":1709935433,"timestamp_now":"2024-03-08T22:03:53.663689Z","timestamp_old":"2024-03-08T22:03:53Z"}}
{"metric":{"name":"status","tags":{"host":"30.247.74.90"},"timestamp":"2024-03-08T22:03:53.663857Z","kind":"incremental","distribution":{"samples":[{"value":200.0,"rate":1}],"statistic":"histogram"}}}
The first line is the log that is input to log_to_metric and the second is the output.
Here you can see that the timestamp field of the metric was set to the current timestamp when the transform received the log message. This is because you converted the timestamp field to a unix timestamp and so the log_to_metric transform is unable to use it. The transform will only use the timestamp if it is a "timestamp" type and not an integer or a string. If I instead use this config:
data_dir: /tmp/vector
sources:
in_demo_logs:
format: apache_common
type: demo_logs
transforms:
trans_demo_logs:
inputs:
- in_demo_logs
source: |
.mydata = parse_apache_log!(.message, format: "common")
structured = {
{
"status": .mydata.status,
"host": .mydata.host,
"timestamp": now(),
"timestamp_old": .mydata.timestamp,
"timestamp_now": now()
}
}
. = structured
type: remap
log_to_metric_demo_logs:
inputs:
- trans_demo_logs
metrics:
- field: status
tags:
host: '{{host}}'
type: histogram
type: log_to_metric
sinks:
out_console:
encoding:
codec: native_json
inputs:
- log_to_metric_demo_logs
type: console
out_console2:
encoding:
codec: native_json
inputs:
- trans_demo_logs
type: console
Where timestamp is simply set to now() which is a "timestamp" type rather than an integer unix timestamp.
I get:
{"log":{"host":"220.107.194.97","status":503,"timestamp":"2024-03-08T22:06:36.379226Z","timestamp_now":"2024-03-08T22:06:36.379226Z","timestamp_old":"2024-03-08T22:06:36Z"}}
{"metric":{"name":"status","tags":{"host":"220.107.194.97"},"timestamp":"2024-03-08T22:06:36.379226Z","kind":"incremental","distribution":{"samples":[{"value":503.0,"rate":1}],"statistic":"histogram"}}}
Here you can see the metric did pull the timestamp from the log.
Does this help clear things up?
Hi, maybe I'm wrong... You are right: if I use native_json in the console, I see normal JSON with the timestamp.
Here https://github.com/vectordotdev/vector/issues/19754#issuecomment-1923444818 I made a workaround with HAProxy to solve the problem with auth and tenant_id. And when I enable body logging, I can't find any information about the timestamp.
10.15.173.107:35624 [12/Mar/2024:09:42:54.520] mimir-wrapper mimir-wrapper/mimir-aws-lb-1 0/0/0/22/22 200 100 - - ---- 3/3/0/0/0 0/0 {#88#C6#16#88#0A#9F#01#0A#1A#0A#08ClientIP#12#0E1.1.1.1#0A'#0A#11#09#1C#B0RequestHost#12#12d.dd.com#0A#1B#0A#08ZoneName#12#0F>#1D} "POST /api/v1/push HTTP/1.1"
10.15.173.107:35624 [12/Mar/2024:09:43:30.212] mimir-wrapper mimir-wrapper/mimir-aws-lb-1 0/0/0/19/19 200 100 - - ---- 3/3/0/0/0 0/0 {#EC#8D#15#88#0A#9E#01#0A#1A#0A#08ClientIP#12#0E2.2.2.2#0A*#0A#11#09#1C#C0RequestHost#12#15d.dd.com#0A#17#0A#08ZoneName#12#0Bn#1D#19#F0>%#0A#08__name__#12#19EdgeResponseStatus_bucket#0A#0B#0A#02le#12#050.005#12#07#10#DB#D2#AE#90#E31#0A#9D#01#FE#A1} "POST /api/v1/push HTTP/1.1"
10.15.173.107:35624 [12/Mar/2024:09:45:12.725] mimir-wrapper mimir-wrapper/mimir-aws-lb-1 0/0/0/19/19 200 100 - - ---- 3/3/1/1/0 0/0 {#BB#B3#15#80#0A#9C#01#0A#18#0A#08ClientIP#12#0C3.3.3.3#0A�A#11#09#1A#B0RequestHost#12#11d.dd.com#0A#1B#0A#08ZoneName#12#0Fn:#1D} "POST /api/v1/push HTTP/1.1"
Also, in the Mimir logs I can't see any timestamps, so I have no idea how to continue debugging.
I think you may have more luck using something like Wireshark for debugging since you'll need to capture the payloads and decode them as protobuf (since the Prometheus Remote Write protocol uses protobuf). Or write a small dummy program that receives the HTTP requests and decodes the bodies as protobuf. This would help you validate the payloads contain the values you expect.
Hi,
yes, one of my ideas was to write a small wrapper and decode the data. Thank you. I'll update this issue when I have more information.
I saw this issue filed recently: https://github.com/vectordotdev/vector/issues/20119. Maybe there is some overlap with what you are experiencing?
Yes, except I'm getting batches from sources.http_server, whereas aitchjoe gets them from a Vector aggregator.
P.S. I'm waiting for a new installation of Mimir, and then I can continue testing...
@jszwedko TL;DR: you are right, the timestamp only works with parse_timestamp:
"timestamp": parse_timestamp!("2024-03-27T11:57:05.866162572Z", "%+")
And here is my code example for Go 1.21+, showing how to capture the data Vector sends to Prometheus:
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/storage/remote"
)

func main() {
	http.HandleFunc("/api/v1/push", func(w http.ResponseWriter, r *http.Request) {
		// Decode the snappy-compressed protobuf remote-write request body.
		req, err := remote.DecodeWriteRequest(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		for _, ts := range req.Timeseries {
			// Rebuild the label set, including __name__.
			m := make(model.Metric, len(ts.Labels))
			for _, l := range ts.Labels {
				m[model.LabelName(l.Name)] = model.LabelValue(l.Value)
			}
			fmt.Println(m)
			// Each sample carries a value and a timestamp (milliseconds since epoch).
			for _, s := range ts.Samples {
				fmt.Printf("\tSample: %f %d\n", s.Value, s.Timestamp)
			}
			for _, e := range ts.Exemplars {
				m := make(model.Metric, len(e.Labels))
				for _, l := range e.Labels {
					m[model.LabelName(l.Name)] = model.LabelValue(l.Value)
				}
				fmt.Printf("\tExemplar: %+v %f %d\n", m, e.Value, e.Timestamp)
			}
			for _, hp := range ts.Histograms {
				h := remote.HistogramProtoToHistogram(hp)
				fmt.Printf("\tHistogram: %s\n", h.String())
			}
		}
	})
	log.Fatal(http.ListenAndServe(":8181", nil))
}
I think many people would be thankful if you additionally added information to the documentation about how to redefine the timestamp and its format, with several examples.
Thanks for following up @suslikas ! Are things working for you now?
I think many people would be thankful if you additionally added information to the documentation about how to redefine the timestamp and its format, with several examples.
I'm not sure I understand this question 🤔
Yes, all good now. I suggest adding to the prometheus_remote_write documentation information about the default timestamp field and its correct type, with an example of how to convert a log string to metrics.
I think this issue can be closed.
how to convert a log string to metrics.
I think that would be more appropriate on the log_to_metric transform. The prometheus_remote_write sink simply uses the timestamp field from the metric, which I think is what would be expected.
Thanks for confirming. I'll close out this issue. Hopefully it may help others as well!