Feature: Introspection API
This issue is related to #27, and aims at identifying the requirements for a monitor's state to be accessed and manipulated by another program (which could be written in Python or not).
What we should do is:
- define use cases
- list the required features
- see why the current implementation does not allow for the above two points
Some remarks about what we need to consider:
- Identify how many instances of a monitor can run in a single Python process (I'm not sure that we can run many instances)
- Be clear about the security implications of accessing a running monitor (it might become a security weak point)
- Identify ways to do IPC to interact with a running monitor (we should be able to query a monitor status without Python)
A few notes:
Use Cases
- Viewing the status of the monitors
- Status of the process itself (a /ping URL on monitoring.web) to make sure that it's alive.
- What monitors are configured to run?
- How many times did each run?
- What is the latest status for each monitor?
- Perhaps view the history of success/failure runs
- Enable/disable a monitor. Nice to have
- An HTTP API exposing the same features as above: be able to query the monitoring service for which monitors are installed, their success/failure status, etc.
- Enable/disable through a web API. Nice to have.
- Possibly, back this data with a database so that it retains its history across restarts. If not a full-fledged DB, then perhaps a JSON file or a pickle file, although that's suboptimal.
Remarks
- How many instances can run - great question. I don't know. A related question: how do we run multiple monitors? In threads? Something else? We could spawn new processes, but that seems a bit drastic and adds possibly unneeded overhead (we'd need a way to communicate between processes). Not a showstopper, but it would be much easier if it all resided in the same memory space.
- Security - so you mean to document that, more or less? No objection, but it seems kind of trivial; anyone launching a web service knows that it's accessible and understands the security implications. What's the point of warning them in this case specifically?
- IPC - not sure I understand your point. Do you mean a REST interface in monitoring.web? If so, then I agree.
I would just like to underline that monitoring.web should be a separate project (separate codebase) and that the goal of this ticket is to make a list of requirements related to making sure that monitoring can be queried (and maybe controlled) by another program (in your case, a web-facing application).
To clarify some of the points:
- Monitors should be run as separate processes -- but it might be possible to run multiple monitor instances within the same process. It is something that could be investigated as part of this ticket.
- By security I mean to ensure that a running monitor does not offer a way to intrude into the system through the introspection API. In other words, we should think about the different options for ensuring that a process is authorized to introspect (or control) a running monitor.
- By IPC I mean the way another process would query/command a running monitor. Sockets or named pipes are options, but there are other ways to interact with a running process. It might be interesting to use FUSE to create a filesystem exposing a running monitor's status.
Ok, got it. So initially I thought we might run everything in the same process. How terrible would that be? I realize Python's concurrency isn't top notch, but I wanted to make sure we aren't missing out on a simple solution. Are threads that bad in Python? How about green threads - could they serve us better in that respect? Anyway, if the answer is NO, then FUSE seems like an interesting choice, though I have no past experience with it. If it's a file system, then perhaps using Unix users and permissions would be the answer to the security concerns. Otherwise I guess token-based authentication might work, but tokens aren't foolproof and they add a bit of overhead when setting things up. Maybe a file with strict permissions at a predefined location that holds a token. But then again, if using FUSE, maybe a token isn't required at all.
As for the use cases:
- introspect the status of each monitor: when did it run, when is it scheduled to run next, what was the status of the last run, maybe a history of all past runs, and what rules it has for success/failure.
- get a list of monitors; be able to somehow gather a list of all processes that run monitors (if using a file system, then ls...).
- possibly enable/disable a monitor.
The monitor(s) should ideally be run as true separate processes, so that you can be sure they are always running independently from the state of the UI. That being said, it's entirely possible to run monitors within the UI's process, but I don't think it's desirable (mostly for stability and security reasons). Green threads are not a good option, as some rules trigger sub-processes or have non-yielding operations that would hold the CPU too long compared to other green threads.
FUSE is an option for introspection (read-only), but probably not so much for control (write), unless we can make sure only one process writes to a given path. ZeroMQ might be a more scalable option, but the problem is that we would need to set up some kind of authentication scheme (FUSE would rely on file-system access rights).
Do you know any project that offers similar remote introspection/control features?
I like FUSE. My only concern is interoperability. How well does it run on Macs? On different Linuxes? I suppose Windows is out of the question.
As for the write API, we could use a scheme in which there's an /enabled or /disabled file for each monitor. It's limited control functionality, but it could suffice. Another option is a write-only file called /commands for each monitor, to which the UI sends text commands.
What do you think of the following scheme:
Mounting
You run each monitor with a parameter that is the name of the file that identifies it. For example:
python my-monitor.py /monitoring/my-monitor
For this to work the directory /monitoring should exist and the process has to have write access to it. You may also omit the file name, in which case it is deduced from the monitor's file name:
python my-monitor.py /monitoring/ # ends with / => file name omitted.
# File name will be the name of the python file itself
# will log to /monitoring/my-monitor
If there are conflicts (e.g. the file already exists because it is used by another monitor), then the process exits with an error. In other words, the first one wins.
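As a sketch, the name resolution could look something like this (resolve_target is a hypothetical helper, not existing code):
import os
import sys

def resolve_target(arg, script):
    # Trailing slash => derive the file name from the monitor's own file name
    if arg.endswith("/"):
        arg = os.path.join(arg, os.path.splitext(os.path.basename(script))[0])
    if os.path.exists(arg):
        # First one wins: another monitor already owns this path
        sys.exit("error: %s is already in use" % arg)
    return arg

# resolve_target("/monitoring/", "my-monitor.py") -> "/monitoring/my-monitor"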
The files aren't actual files, of course; they are FUSE-based files. So when a user reads a file, this results in a call to the FUSE read operation. The read operation returns the content of this "file", which is the result of all past runs, served from memory (we could limit this to, say, 100 entries, or make it configurable; if needed, store the history in a DB or another file on disk in order to withstand restarts).
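To make this concrete, here is a minimal sketch of the read side using a Python FUSE binding such as fusepy; the class name, the in-memory history list, and the single hard-coded file are all assumptions for illustration:
import errno
import json
import stat
from fuse import FUSE, FuseOSError, Operations

class MonitorFS(Operations):
    def __init__(self, history):
        # history: in-memory list of dicts describing past runs
        self.history = history

    def _payload(self):
        # Serve the run history as newline-delimited JSON
        return "".join(json.dumps(run) + "\n" for run in self.history).encode("utf-8")

    def getattr(self, path, fh=None):
        if path == "/":
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path == "/my-monitor.jsonline":
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=len(self._payload()))
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "my-monitor.jsonline"]

    def read(self, path, size, offset, fh):
        # Every cat/tail on the "file" lands here
        return self._payload()[offset:offset + size]

if __name__ == "__main__":
    runs = [{"type": "SystemInfo", "freq": "5s", "success": 5,
             "ts": "2014-06-07T13:00:00.590"}]
    FUSE(MonitorFS(runs), "/monitoring", foreground=True)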
So an end user (the UI process or an operator) might run:
cat /monitoring/my-monitor.jsonline
or
tail /monitoring/my-monitor.jsonline
Data Format
I suggest that the data format be selected by the suffix of the queried file.
So cat /monitoring/monitor.jsonline would result in a \n-delimited list of JSON items, and cat /monitoring/monitor.csv would result in a comma-separated file.
I suggest we start with the jsonline format and add other formats as needed.
An example log line might look something like this:
{"services":[{"name": "jenkins", "monitors": {"type": "SystemInfo", "freq": "5s", "success": 5, ts: ""2014-06-07T13:00:00.590"}}]}
Listing Monitors
A user (an operator or a UI process) may list existing monitors simply by running
ls /monitoring/
Commanding
So far we've been talking about a read-only API. There are a few ways to possibly support a write API, or a command API.
Option 1: Use a /disabled file.
So for example this operation would result in disabling the monitor:
touch /monitoring/my-monitor/disabled
So simply creating a file named /disabled would mean that the monitor is disabled.
Re-enabling a monitor means removing this file:
rm /monitoring/my-monitor/disabled
If we later want to add other types of commands then we need to come up with more semantic names.
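A sketch of how Option 1 could map onto FUSE callbacks, again assuming fusepy; the monitor object with an enabled flag is hypothetical:
import errno
from fuse import FuseOSError, Operations

class DisableFileFS(Operations):
    def __init__(self, monitor):
        self.monitor = monitor  # assumed to expose a boolean `enabled`

    def create(self, path, mode, fi=None):
        # `touch /monitoring/my-monitor/disabled` ends up here
        if path == "/disabled":
            self.monitor.enabled = False
            return 0
        raise FuseOSError(errno.EACCES)

    def unlink(self, path):
        # `rm /monitoring/my-monitor/disabled` re-enables the monitor
        if path == "/disabled":
            self.monitor.enabled = True
        else:
            raise FuseOSError(errno.ENOENT)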
Option 2: Use a /commands file and /status file
Each monitor will have a /commands file and a /status file. The /commands file is where the user pipes in commands to the monitor. For example:
echo "disabled=true" > /monitoring/my-monitor/commands
The /status file is where a user introspects the status of the monitor.
The content of the /status file might look like:
$ cat /monitoring/my-monitor/status
disabled=true
...
The upside of option 2 is that it is more generic (assuming we'd like to add more commands later on, and not just enable/disable).
Feedback from commands
If a command being executed fails for any reason, an OS-level error code is returned (I assume FUSE lets you do that). But how do we get more data about the failure? It's great to know that it failed, but how can we learn why it failed? TBD...
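To make the error-code point concrete, here is a sketch of the /commands write path with fusepy, where an unknown command surfaces to the writing process as EINVAL (the key=value syntax is the hypothetical one from Option 2):
import errno
from fuse import FuseOSError, Operations

class CommandsFS(Operations):
    def __init__(self, monitor):
        self.monitor = monitor  # hypothetical object with an `enabled` flag

    def write(self, path, data, offset, fh):
        if path != "/commands":
            raise FuseOSError(errno.EACCES)
        for line in data.decode("utf-8").splitlines():
            key, _, value = line.partition("=")
            if key.strip() == "disabled":
                self.monitor.enabled = value.strip() != "true"
            else:
                # Unknown command: the `echo ... > commands` fails with EINVAL,
                # but no richer diagnostics can travel back this way.
                raise FuseOSError(errno.EINVAL)
        return len(data)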
Security and permissions
We shall use OS-level file-system permissions. More details need to be defined. In short, a user might have read-only access, read/write access, or listing access (execute permission on a directory).
Whose responsibility is it to set proper access? The monitor itself? The operator?
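Beyond plain file modes, the FUSE layer itself could check who is calling; a sketch assuming fusepy's fuse_get_context() (the allowed-uid set is made up):
import errno
from fuse import FuseOSError, fuse_get_context

ALLOWED_WRITERS = {1000}  # example: uids permitted to command the monitor

def check_writer():
    # fuse_get_context() returns the (uid, gid, pid) of the calling process
    uid, gid, pid = fuse_get_context()
    if uid not in ALLOWED_WRITERS:
        raise FuseOSError(errno.EACCES)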
Fallback to non-FUSE systems
In case we need to serve non-FUSE systems (probably not, but go figure), we could implement a very similar protocol based on real files, or at least the read-only API: we would simply write files instead of serving FUSE API calls. Regarding commanding, extra care needs to be taken when obeying commands, since we wouldn't be able to return a result code; the read-only API, though, should be fairly easy.
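For the read-only part, the fallback could be as simple as appending to a real file; the path and record shape here are assumptions:
import json

def record_run(path, run):
    # Append one JSON object per line; readers just cat/tail the real file
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(run) + "\n")

# record_run("/monitoring/my-monitor.jsonline",
#            {"type": "SystemInfo", "success": 5, "ts": "2014-06-07T13:00:00.590"})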
This looks really good! I guess the next step would be to see how we can update the base classes (Monitor, Rule and Action) to output information that can be serialized to json and csv.
You said you wanted to check the threading model of fusepy (or other python fuse lib). Is it adequate?
I had a very brief look and it seems to support multi-threading -- I have to write a prototype to make sure it works though.
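For reference, a minimal shape such a prototype could take, reusing the hypothetical MonitorFS sketch from earlier in this thread (fusepy dispatches requests on multiple threads unless nothreads=True is passed):
import threading
import time
from datetime import datetime
from fuse import FUSE
# MonitorFS is the read-only sketch from earlier; assumed importable here

def producer(history):
    # Simulate a monitor appending a run every few seconds
    while True:
        history.append({"success": len(history),
                        "ts": datetime.utcnow().isoformat()})
        time.sleep(5)

if __name__ == "__main__":
    runs = []
    threading.Thread(target=producer, args=(runs,), daemon=True).start()
    FUSE(MonitorFS(runs), "/monitoring", foreground=True, nothreads=False)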