MedCAT CU-8694wh3d5 track usage

Add a usage monitor.

For each inference, it monitors:

- The input text length
- The input preprocessed text length
- The number of output entities

The usage monitor is configured to write to file once every 10 (configurable) inferences have been logged, to avoid constant IO. It also writes out the rest of the buffer when the usage monitor is dereferenced.
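The buffering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `BufferedUsageMonitor`, its parameters, and the JSON-lines record format are all hypothetical. It also relies on CPython calling `__del__` as soon as the last reference is dropped, which other interpreters do not guarantee.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


class BufferedUsageMonitor:
    """Hypothetical stand-in for the usage monitor (not MedCAT's class)."""

    def __init__(self, log_path, batch_size=10, enabled=True):
        self.log_path = log_path
        self.batch_size = batch_size  # flush after this many inferences
        self.enabled = enabled
        self._buffer = []  # pending log lines, not yet written to file

    def log_inference(self, input_length, preprocessed_length, num_entities):
        if not self.enabled:
            return
        record = {
            "time": datetime.now(timezone.utc).isoformat(),
            "input_length": input_length,
            "preprocessed_length": preprocessed_length,
            "num_entities": num_entities,
        }
        self._buffer.append(json.dumps(record))
        # Only touch the file once a full batch has accumulated.
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write("\n".join(self._buffer) + "\n")
        self._buffer.clear()

    def __del__(self):
        # Best-effort write of whatever is left in the buffer.
        try:
            self.flush()
        except Exception:
            pass


def count_lines(path):
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


log_path = os.path.join(tempfile.mkdtemp(), "usage.log")
monitor = BufferedUsageMonitor(log_path, batch_size=3)
for n in range(4):
    monitor.log_inference(100 + n, 90 + n, n)

lines_before = count_lines(log_path)  # one full batch written, one record buffered
del monitor  # in CPython this triggers __del__, which flushes the remainder
lines_after = count_lines(log_path)
```

Since `__del__` is only best effort (it may not run on interpreter shutdown in every case), an explicit `flush()` call before exit would be the safer option in a sketch like this.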

This PR also adds the relevant config part (config.general.usage_monitor). The usage monitoring can be disabled if needed by changing the config.

Jun 21 '24 16:06 mart-r

Task linked: CU-8694wh3d5 Track model usage

Jun 21 '24 16:06 tomolopolis

Consider setting this via an env var, i.e. MEDCAT_LOGS. If this is False or 0, no logs / usage at all. Also, we need logs to persist outside of process exit.

Do we want dynamic changes to the environment variables to be reflected in medcat? Or would it be sufficient to check once upon model init / load? I think it would make sense to be able to change the behaviour dynamically. However, since os.environ is a snapshot of the environment variables taken when the process started, we'd need to somehow retrieve any later changes (I don't know if this is trivial).
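For reference, a small sketch of how os.environ behaves. The mapping is captured when the os module is first imported, so assignments made inside the process are visible immediately and are inherited by child processes, while changes made in another process never reach this one. The variable name MEDCAT_LOGS is taken from the comment above; the rest is illustrative only.

```python
import os
import subprocess
import sys

ENV_VAR = "MEDCAT_LOGS"

# An in-process assignment is visible right away and is passed to children.
os.environ[ENV_VAR] = "1"

child_saw = subprocess.run(
    [sys.executable, "-c", f"import os; print(os.environ.get('{ENV_VAR}'))"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

# A change made in another process stays in that process.
subprocess.run(
    [sys.executable, "-c", f"import os; os.environ['{ENV_VAR}'] = '0'"],
    check=True,
)
parent_after = os.environ[ENV_VAR]

print(child_saw, parent_after)
```

So a periodic re-read of os.environ would only pick up changes made from within the same Python process, not a later `export` in the shell that launched it.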

What I'm thinking is something along the following lines:

- Allow enabled to be True, False, or auto
- When set to False, no logging (obviously)
- When set to True, the config-specified files are used
- When set to auto, the behaviour is automatic
  - Based on the environment variable
  - The environment variable is checked periodically (i.e. no more often than every 10 seconds)
  - The file location is picked automatically based on OS
    - I.e. ~/.local/share/medcat/logs/ for *nix and C:\Users\%USERNAME%\.cache\medcat\logs\ for Windows
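A rough sketch of the proposal above, to make the intended behaviour concrete. The names `AutoEnabled`, `env_enabled` and `default_log_dir` are hypothetical, and one detail is an assumption not settled in this thread: an unset MEDCAT_LOGS is treated as disabled.

```python
import os
import platform
import time
from pathlib import Path

ENV_VAR = "MEDCAT_LOGS"
FALSY_VALUES = {"0", "false"}


def env_enabled():
    """Read the env var; False or 0 means no logging."""
    value = os.environ.get(ENV_VAR)
    if value is None:
        return False  # assumption: unset means disabled
    return value.strip().lower() not in FALSY_VALUES


def default_log_dir(system=None, home=None):
    """Pick the log folder based on OS, following the paths listed above."""
    system = system or platform.system()
    home = Path(home) if home is not None else Path.home()
    if system == "Windows":
        return home / ".cache" / "medcat" / "logs"
    return home / ".local" / "share" / "medcat" / "logs"


class AutoEnabled:
    """Resolve enabled = True / False / "auto" (hypothetical helper)."""

    def __init__(self, enabled="auto", min_interval=10.0, clock=time.monotonic):
        self.enabled = enabled
        self.min_interval = min_interval  # seconds between env var checks
        self._clock = clock
        self._last_check = None
        self._cached = False

    def is_enabled(self):
        if self.enabled != "auto":
            return bool(self.enabled)
        now = self._clock()
        # Re-read the env var no more often than every min_interval seconds.
        if self._last_check is None or now - self._last_check >= self.min_interval:
            self._cached = env_enabled()
            self._last_check = now
        return self._cached
```

Injecting the clock keeps the 10 second throttle testable without sleeping; in normal use the default `time.monotonic` would apply.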

As for persistence: the current implementation (outside the tests) is persistent.

Jul 10 '24 15:07 mart-r

Sounds good - happy with auto, just make sure the docstrings explain the behaviour.

Jul 16 '24 15:07 tomolopolis