CU-8694wh3d5 track usage
Add a usage monitor.
It monitors:
- The input text length
- The input preprocessed text length
- The number of output entities
The usage monitor buffers its records and writes them to file once every 10 (configurable) inferences have been logged, to avoid constant IO. It also writes out the rest of the buffer when the usage monitor is dereferenced.
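The buffering behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name `UsageMonitorSketch`, its method names, and the JSON-lines record format are all assumptions made for the example.

```python
import json


class UsageMonitorSketch:
    """Hypothetical buffered usage logger (illustrative names, not MedCAT's API)."""

    def __init__(self, log_path, batch_size=10):
        self.log_path = log_path
        self.batch_size = batch_size  # flush after this many inferences
        self._buffer = []

    def log_inference(self, input_len, preprocessed_len, num_entities):
        # Record the three monitored quantities for one inference.
        self._buffer.append({
            "input_length": input_len,
            "preprocessed_length": preprocessed_len,
            "num_entities": num_entities,
        })
        # Only touch the file once the buffer is full, to avoid constant IO.
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        with open(self.log_path, "a", encoding="utf-8") as f:
            for record in self._buffer:
                f.write(json.dumps(record) + "\n")
        self._buffer.clear()

    def __del__(self):
        # Write out whatever is left when the monitor is dereferenced.
        # Guarded because builtins may be unavailable at interpreter shutdown.
        try:
            self.flush()
        except Exception:
            pass
```

With `batch_size=10`, nine calls to `log_inference` leave the file untouched, the tenth writes ten lines, and any remainder is written when the object is garbage collected.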
This PR also adds the relevant config part (config.general.usage_monitor). The usage monitoring can be disabled if needed by changing the config.
Task linked: CU-8694wh3d5 Track model usage
setting this via an env var, i.e. `MEDCAT_LOGS`. If this is False or 0, no logs / usage at all. Also, we need logs to persist outside of process exit.
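As a rough sketch of how such a switch could be read: the helper name `usage_logging_enabled` is hypothetical, and treating an unset variable as "enabled" is an assumption, since the comment above only specifies what False or 0 should do.

```python
import os

# Values that switch logging off, per the comment above ("False or 0").
_DISABLED_VALUES = {"false", "0"}


def usage_logging_enabled(environ=None, var_name="MEDCAT_LOGS", default=True):
    """Return False only when the variable is set to False or 0.

    `default` is what an unset variable means; True is an assumption here.
    """
    env = os.environ if environ is None else environ
    raw = env.get(var_name)
    if raw is None:
        return default
    return raw.strip().lower() not in _DISABLED_VALUES
```

Passing the mapping in explicitly (instead of always reading `os.environ`) keeps the helper easy to test.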
Do we want dynamic changes to the environment variables to be reflected in medcat? Or would it be sufficient to check once upon model init / load?
I think it would make sense to be able to change the behaviour dynamically. However, since `os.environ` is a snapshot of the environment variables taken when the process was started, we'd need to somehow retrieve these changes (I don't know if this is trivial).
What I'm thinking is something along the following lines:
- Allow `enabled` to be `True`, `False`, or `auto`
  - When set to `False`, no logging (obviously)
  - When set to `True`, the config-specified files are used
  - When set to `auto`, the behaviour is automatic
    - Based on the environment variable
    - The environment variable is checked periodically (i.e. no more often than every 10 seconds)
    - The file location is picked automatically based on OS
      - I.e. `~/.local/share/medcat/logs/` for *nix and `C:\Users\%USERNAME%\.cache\medcat\logs\` for Windows
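The proposal above could look something like the sketch below. All names (`default_log_dir`, `ThrottledEnvFlag`, `should_log`) are hypothetical, the user's home directory is approximated with `Path.home()`, and an unset variable is assumed to mean "enabled"; none of this is confirmed by the PR.

```python
import os
import platform
import time
from pathlib import Path


def default_log_dir(system=None):
    """Pick the log folder based on OS, following the paths proposed above."""
    system = platform.system() if system is None else system
    if system == "Windows":
        return Path.home() / ".cache" / "medcat" / "logs"
    return Path.home() / ".local" / "share" / "medcat" / "logs"


class ThrottledEnvFlag:
    """Re-read an environment variable at most once per `min_interval` seconds."""

    def __init__(self, var_name="MEDCAT_LOGS", min_interval=10.0,
                 clock=time.monotonic):
        self.var_name = var_name
        self.min_interval = min_interval
        self._clock = clock
        self._last_check = None
        self._cached = True

    def enabled(self):
        now = self._clock()
        stale = (self._last_check is None
                 or now - self._last_check >= self.min_interval)
        if stale:
            raw = os.environ.get(self.var_name)
            # Assumption: unset means enabled; only False / 0 disable logging.
            self._cached = raw is None or raw.strip().lower() not in ("false", "0")
            self._last_check = now
        return self._cached


def should_log(enabled, env_flag):
    """Resolve the tri-state `enabled` setting: True, False, or "auto"."""
    if enabled == "auto":
        return env_flag.enabled()
    return bool(enabled)
```

Note that this only picks up changes made to `os.environ` from within the same process; changes made elsewhere would need a different mechanism.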
As for persistence - the current implementation (outside the tests) is persistent.
Sounds good - happy with `auto`, just make sure the docstrings explain the behaviour.