gobgp icon indicating copy to clipboard operation
gobgp copied to clipboard

prometheus support

Open fujita opened this issue 8 years ago β€’ 32 comments

Is anyone interested in?

fujita avatar Nov 22 '17 15:11 fujita

Yes, but more interesting would be Zabbix.

CoreTex avatar Nov 23 '17 13:11 CoreTex

Maybe there is a common way to handle both monitoring via prometheus and other monitoring systems in a generic way?

benerudolph avatar Nov 23 '17 14:11 benerudolph

These metrics must be helpful for us.

  • routes # of adj-in, accepted, and local RIB for route server mode
  • per peer bgp packets counter
  • buffering status

nstgt avatar Nov 25 '17 08:11 nstgt

Prometheus support would be greatly appreciated.

kilo-bit avatar Dec 08 '17 02:12 kilo-bit

+1 for Prometheus

imcom avatar Dec 26 '17 07:12 imcom

+1 for Prometheus!

tonidas avatar Jan 04 '18 16:01 tonidas

+1 for Prometheus!

xiaopeng163 avatar Mar 15 '18 15:03 xiaopeng163

Is anyone interested in?

@fujita, I am interested πŸ˜ƒ There are at least two ways to do that.

  1. A separate project for gobgp_exporter, which queries gobgp API.
  2. A separate go routine in gobgp

Any preference?

greenpau avatar May 02 '18 23:05 greenpau

@fujita , I looked that gobgp code and I personally like point 1. Could you create a separate project, say gobgp_exporter for the exporter under osrg? I could put a PR for the exporter code pretty fast. The client API for gobgp is pretty straight-forward, e.g. showRibInfo() function.

greenpau avatar May 03 '18 03:05 greenpau

Sorry about the delay, I've just started to investigate implementation options.

@greenpau why can't we pick the best of option 1 and 2? Having a separate project (exporter) forces users to manage another daemon. We can't run a go routine to execute the gRPC API and handle http requests from prometheus?

I've just create a branch to only send the received number in each peer: https://github.com/fujita/gobgp/commit/090cca2967b3251d4656b06e13864de4c1f8ac4b

fujita avatar Jun 02 '18 13:06 fujita

@fujita, it depends on the circumstances.

  1. Sync vs Async polling...imagine you want to monitor a routing table contining 1000s routes. Do you want the polling to be sync or async?
  2. Containerizing daemons. Once you containerized gobgp with prom built-in, the different go routine compete for the same cpu, because 1 cpu for a container.
  3. Relid configs. How do you reload prom config?
  4. etc.

Prom exporters are designed on per app basis. Compare node_exporter and backbox_exporter and you will get the gist.

greenpau avatar Jun 02 '18 17:06 greenpau

  1. Sync vs Async polling.

I don't understand. I don't see how this is related with the exporter and exposing models.

  1. Containerizing daemons.

1 cpu for a container? Why? You can configure what you like.

  1. Relid configs. How do you reload prom config?

I had quick look at the official exporters, mysql, consul. Looks like they don't support the config reloading.

fujita avatar Jun 03 '18 10:06 fujita

@fujita , see below.

1 cpu for a container? Why? You can configure what you like.

Not in containers. With docker for example, you can configure cpusets. However, that setting just allows you to switch between CPUs, it does not allow running on multiple CPUs concurrently.

I don't understand. I don't see how this is related with the exporter and exposing models.

Polling = scraping

Sync scraping means you make a call to the URL of an exporter, and subsequent data collection starts. Once the data collected it outputs it to your scraper.

Async scraping means that data collection runs independent of the scrape calls. In that mode, the scraper gets that data from data collection buffer.

Why that matters. Consider your exporter has 100-100s of metrics and data collection takes 1-3s. That's the time your GoBGP server routine competes with the collectors' routines for single CPU time. It obviously does not matter when you collect status of BGP peers, because it probably takes 0.00001s.

Looks like they don't support the config reloading.

Right. Consider you want to change the configuration of the exporter, would you restart GoBGP and drop neighbor relations with your peers?

greenpau avatar Jun 04 '18 13:06 greenpau

Not in containers. With docker for example, you can configure cpusets. However, that setting just allows you to switch between CPUs, it does not allow running on multiple CPUs concurrently.

Hmm, really? Inside a single container, can you try to run two processes; each execute an endless loop? Then see the output of top command on a host OS.

fujita avatar Jun 04 '18 14:06 fujita

Hmm, really?

Your GoBGP and Prometheus go routines are not separate processes. They are go routines. Hence, will be on the same CPU.

greenpau avatar Jun 04 '18 15:06 greenpau

can you try to run two processes; each execute an endless loop?

@fujita , that would work :-) However, that would NOT be parent and child processes like in a container. These would be two separate parents. In a container, you have a single parent (akin init) and bunch of child processes.

greenpau avatar Jun 04 '18 15:06 greenpau

By default, processes can run on any CPU. However, processes can be instructed to run on a predetermined selection of CPUs, by changing the affinity of the process. Child processes inherent the CPU affinities of their parents. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/chap-Realtime_Reference_Guide-Affinity.html

@fujita , refreshed my memory on the subject. Here, there is a snippet to run the test you proposed.

As the sample runs below demonstrate, the amount of real and CPU time consumed when running the program will depend on intra-core caching effects and whether the processes are using the same CPU.

greenpau avatar Jun 04 '18 20:06 greenpau

can you try to run two processes; each execute an endless loop?

@fujita, that would work :-) However, that would NOT be parent and child processes like in a container. These would be two separate parents. In a container, you have a single parent (akin init) and bunch of child processes.

Please run a process that create two child processes executing an endless loop and wait forever. See the output of top on the host.

fujita avatar Jun 04 '18 22:06 fujita

Please run a process that create two child processes executing an endless loop and wait forever. See the output of top on the host.

@fujita , yes, two processes running in a container can be on separate CPUs. My mistake.

The test script:

import os

def child():
   print('\nA new child ',  os.getpid())
   os._exit(0)

def parent():
   while True:
      newpid = os.fork()
      if newpid == 0:
         child()
      else:
         pids = (os.getpid(), newpid)
         print("parent: %d, child: %d\n" % pids)
      reply = input("q for quit / c for new fork")
      if reply == 'c':
          continue
      else:
          break

parent()

Running the test:

# python test.py
parent: 29, child: 30

q for quit / c for new fork('\nA new child ', 30)

The two processes are 1847 (parent) and 1848 (child):

$ ps -ef | grep python
root      1847  1817  0 20:05 pts/3    00:00:00 python test.py
root      1848  1847  0 20:05 pts/3    00:00:00 [python] <defunct>

CPU Sets:

# cat /proc/1847/status  | grep Cpus
Cpus_allowed:	f
Cpus_allowed_list:	0-3

# cat /proc/1848/status  | grep Cpus
Cpus_allowed:	f
Cpus_allowed_list:	0-3

The CPU process:

#  cat /proc/1848/stat | cut -d' ' -f39
1

#  cat /proc/1847/stat | cut -d' ' -f39
0

greenpau avatar Jun 05 '18 00:06 greenpau

Good, now you understand goroutines DON'T compete for a single CPU inside a container. A container is just namespaces and processes from the perspective of Linux kernel so it can use as many CPUs as you have.

For me, some official exporters don't support reloading because it's not useful much. So I go with the current design. If someone needs reloading, it's pretty easy for him to implement a own gobgp exporter by using gobgp/exporter as a package; he just needs a main function.

fujita avatar Jun 05 '18 01:06 fujita

now you understand goroutines DON'T compete for a single CPU inside a container.

@fujita , overconfidence πŸ˜† they DO compete when there is only 1 CPU to compete over πŸ˜„ They also compete when you have more goroutines than CPUs. I get that an application, over time, might use all CPUs. How do you know that goroutines are being load balanced across all CPUs?

greenpau avatar Jun 05 '18 02:06 greenpau

No difference between the exporter design and the current design on that.

fujita avatar Jun 05 '18 02:06 fujita

FYI, check this article for CPU, memory profiling with Go.

greenpau avatar Jun 05 '18 02:06 greenpau

Tally allows you to define and emit metrics in different ways. Its an option that allows you to configure multiple ways of collecting/emitting metrics. Just a suggestion.

jeffbean avatar Jun 14 '18 03:06 jeffbean

@fujita , I started with external exporter here: https://github.com/greenpau/gobgp_exporter Would love to get your feedback on the code. There are probably some issues with concurrency. Still getting up to speed with Go. I spent my weekend on this πŸ˜„

greenpau avatar Aug 21 '18 00:08 greenpau

@greenpau nice, but why you modify pb.go file?

fujita avatar Aug 23 '18 02:08 fujita

why you modify pb.go file?

@fujita , I guess the primary reason is that I want to control the timeout on the client.

The Collect() will be called on each scrape. If the daemon does not respond, the srcape holds the session and will not timeout right away. This results in hanging scrapes. It is not long, but should timeout in a second or two. In my last commit, I added a logic where the GoBGP polling and scraping happen async. There is also some concurrency logic.

Additionally, ... somewhat related, but not ... since the changes to the API (hiding behind the /internal), the only way to get to some functionality (e.g. things related to ptype, any, etc.) is by adding extra functions in the way I did it with extended client. Not ideal. I would prefer to have that code in upstream, but I don't know whether you want to have it there.

greenpau avatar Aug 24 '18 00:08 greenpau

I think I am close to getting some of the things mentioned here done.

greenpau avatar Aug 24 '18 00:08 greenpau

why not using context.WithTimeout?

fujita avatar Aug 25 '18 11:08 fujita

@fujita, because I don’t know how context works. I will take time to read up on it.

greenpau avatar Aug 25 '18 13:08 greenpau