Vector Lookup address to DNS even if TTL is higher

Open manavadariakevin opened this issue 1 year ago • 3 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We are using Vector 0.40.0 (vector-0.40.0-1.x86_64) on our Linux setup with the configuration below to send logs to Vector aggregators. The endpoint is on Envoy Proxy.

sinks:
  vector:
    type: vector
    #healthcheck: False
    address: "https://vector-nonprod.abc.com"
    compression: True
    inputs:
      - parsing
      - nginx
    batch:
      max_bytes: 10000
      max_events: 10000
    buffer:
      type: "disk"
      max_size: 268435488
    request:
      rate_limit_num: 30
      retry_attempts: 100
      timeout_secs: 5
      retry_max_duration_secs: 5
      retry_initial_backoff_secs: 1
      retry_jitter_mode: Full

Vector keeps querying DNS for vector-nonprod.abc.com all the time, which generates far too many DNS queries. It should either cache DNS results itself or use the server's resolver configuration instead of going directly to DNS.

Here are some connections to our DNS server. This is just nonprod; in prod we see roughly 500 connections to DNS and around 300 queries per minute. This is hurting our DNS badly with too many requests. If there is any way to make this work, please advise.

netstat -n | grep 254
udp 0 0 10.10.10.17:28174 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:36843 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:47618 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:59961 10.10.10.254:53 ESTABLISHED

Configuration

sinks:
  vector:
     type: vector
     #healthcheck: False
     address: "https://vector-nonprod.abc.com:443"
     compression: True
     inputs:
       - parsing
       - nginx
     batch:
       max_bytes: 10000
       max_events: 10000
     buffer:
       type: "disk"
       max_size: 268435488
     request:
       rate_limit_num: 30
       retry_attempts: 100
       timeout_secs: 5
       retry_max_duration_secs: 5
       retry_initial_backoff_secs: 1
       retry_jitter_mode: Full

Version

vector 0.40.0 (x86_64-unknown-linux-gnu 1167aa9 2024-07-29 15:08:44.028365803)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

manavadariakevin avatar Oct 08 '24 11:10 manavadariakevin

I have also tried the address with an explicit port, "https://vector-nonprod.abc.com:443".

The behavior is still the same.

manavadariakevin avatar Oct 08 '24 12:10 manavadariakevin

I think we discussed this in Discord a bit. I mentioned there that Vector does a DNS lookup every time it initiates a connection. However, even given that, it seems like you are seeing many more lookups than might be expected (it seems unlikely, but maybe possible?, that Vector is initiating 500 connections per second).

Regardless, it does seem prudent for Vector to do DNS caching so I think adding that would be a reasonable way to address this issue.

jszwedko avatar Oct 08 '24 22:10 jszwedko

Yes, I hit the same problem. My config:

[sinks.out]
type = "loki"
inputs = [ "remove_kafka_fields" ]
endpoint = "http://distributor-loki.my.com/"
out_of_order_action = "accept"
remove_timestamp = true
tenant_id = "myapp"

Watching with tcpdump:

tcpdump -vvn port 53

shows a very large number of DNS resolutions.

killkill avatar Oct 11 '24 17:10 killkill

Just to add more here: we also use Splunk HEC as a sink and it has a similar issue. We see too many DNS connections and queries, which is heavy on our DNS setup. It would be good to get a fix for this.

manavadariakevin avatar Nov 13 '24 12:11 manavadariakevin

Hello, we don't have the capacity to get to this right now. We always welcome PRs and we do our best to review them ASAP.

> Regardless, it does seem prudent for Vector to do DNS caching so I think adding that would be a reasonable way to address this issue.

In this instance, the solution Jesse mentioned seems like the best way to fix this issue. I would start looking at https://github.com/vectordotdev/vector/blob/master/src/dns.rs, potentially introducing a caching layer there. We might also want to expose some new config options for this, such as turning caching on/off and TTL for cache entries.

pront avatar Nov 13 '24 15:11 pront

> In this instance, the solution Jesse mentioned seems like the best way to fix this issue. I would start looking at https://github.com/vectordotdev/vector/blob/master/src/dns.rs, potentially introducing a caching layer there. We might also want to expose some new config options for this, such as turning caching on/off and TTL for cache entries.

All in favor of such a modification to allow Vector to have some sense of a local cache for DNS lookups, though I would strongly warn against a TTL that is inside of Vector for DNS caching. DNS already has the concept of TTL, and layering a different (manual) TTL on top of that will be confusing and may lead to conflicting operational goals at different layers of the pointer resolution process. Please use DNS TTL as "the" TTL.
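To make the TTL point concrete, here is a minimal sketch of a cache whose entries expire according to the TTL carried by the DNS answer, with an on/off switch as suggested earlier in the thread. This is not Vector's code: `DnsCache`, `CacheEntry`, and the `lookup` callback (which stands in for a real resolver and must return the addresses together with the record TTL) are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// One resolved answer plus the instant at which its DNS TTL runs out.
struct CacheEntry {
    addrs: Vec<IpAddr>,
    expires_at: Instant,
}

/// Hypothetical in-process DNS cache (an assumption, not Vector's implementation).
struct DnsCache {
    /// When false, every call goes straight to `lookup`.
    enabled: bool,
    entries: HashMap<String, CacheEntry>,
}

impl DnsCache {
    fn new(enabled: bool) -> Self {
        Self { enabled, entries: HashMap::new() }
    }

    /// Returns the cached addresses while the record's own TTL is still valid.
    /// Otherwise calls `lookup`, which returns the addresses and the TTL from
    /// the DNS answer, and stores the result until that TTL expires.
    fn resolve<F>(&mut self, host: &str, now: Instant, lookup: F) -> Vec<IpAddr>
    where
        F: FnOnce(&str) -> (Vec<IpAddr>, Duration),
    {
        if self.enabled {
            if let Some(entry) = self.entries.get(host) {
                if now < entry.expires_at {
                    return entry.addrs.clone();
                }
            }
        }
        let (addrs, ttl) = lookup(host);
        if self.enabled {
            let entry = CacheEntry { addrs: addrs.clone(), expires_at: now + ttl };
            self.entries.insert(host.to_string(), entry);
        }
        addrs
    }
}
```

With this shape there is no separate cache lifetime to configure: a record published with a 30 second TTL is looked up again only after those 30 seconds have passed, however many connections are opened in between.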

johnhtodd avatar Dec 04 '24 17:12 johnhtodd

> Please use DNS TTL as "the" TTL.

Sounds right 👍 I suppose this concept will come up during the PR review. @johnhtodd you are welcome to help review this feature whenever we have a PR.

pront avatar Dec 04 '24 18:12 pront

Will jump on this in the next few days when I have a moment or two.

I'll likely be using Hickory's Resolver based on a recommendation from jszwedko in the Discord.

However that gets implemented, I'll be treating it as a hard requirement to be able to disable Vector's internal resolver and opt for a local resolver instead via config.
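As a rough illustration of that requirement, the sketch below switches between the operating system's resolver and an in-process one based on a config value. Everything here is an assumption made for the example: the `ResolverMode` enum, the `"system"` / `"internal"` strings, and the `internal` callback (a stand-in for something like a Hickory-based resolver) are hypothetical and do not correspond to existing Vector options.

```rust
use std::io;
use std::net::{IpAddr, ToSocketAddrs};

/// Hypothetical config value; the name and variants are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResolverMode {
    /// Delegate to the operating system's resolver via the standard library.
    System,
    /// Use an in-process resolver supplied by the caller.
    Internal,
}

/// Maps a config string to a mode; unknown values are rejected.
fn parse_mode(value: &str) -> Option<ResolverMode> {
    match value {
        "system" => Some(ResolverMode::System),
        "internal" => Some(ResolverMode::Internal),
        _ => None,
    }
}

/// Resolves `host` according to `mode`. `internal` is only called in
/// `Internal` mode and stands in for the in-process resolver.
fn resolve<F>(mode: ResolverMode, host: &str, internal: F) -> io::Result<Vec<IpAddr>>
where
    F: FnOnce(&str) -> io::Result<Vec<IpAddr>>,
{
    match mode {
        // Port 0 is a placeholder; only the IP addresses are kept.
        ResolverMode::System => Ok((host, 0).to_socket_addrs()?.map(|a| a.ip()).collect()),
        ResolverMode::Internal => internal(host),
    }
}
```

In `System` mode any caching is left to whatever the host provides (for example a local caching resolver), which is the opt-out behavior described above.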

Just making this comment so y'all are aware I'm willing/am looking at this 🙂

PriceHiller avatar Dec 21 '24 06:12 PriceHiller

Hi

Is there any update on this one? Can this be checked and fixed?

manavadariakevin avatar May 27 '25 09:05 manavadariakevin

No updates from my side, but I'd be happy to see a PR using the Hickory resolver as described above.

jszwedko avatar May 27 '25 21:05 jszwedko