
Proposal: add a global flag to pipelines to limit history log retention

Open • parkzhou0527 opened this issue 3 years ago • 3 comments

Background

We have had a pipelines installation running for almost 3 years (at one of our customers), so its build history has grown to more than 1,000 runs.

Our developers found that the pipelines homepage was very slow to open, and Chrome consumed a lot of memory.

After some quick debugging, we found that the cause was the pipeline history: the homepage tries to load all of it at once.

After we cleaned up the history to fewer than 200 runs, everything ran quickly again.

Proposal

pipelines should support a global flag, --log-limits (with its own default value, say 200), that every pipeline respects.

We don't need a complicated policy to control history log retention. In our daily development cycle, the latest 5 build logs are enough to debug errors most of the time.
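A minimal sketch of how such a global flag could be parsed, assuming a Python `argparse`-based CLI. The real pipelines CLI may parse its options differently; only the name `--log-limits` and the suggested default of 200 come from this proposal, and `positive_int` and `build_parser` are hypothetical helpers.

```python
import argparse


def positive_int(value: str) -> int:
    """Hypothetical validator: reject zero or negative limits so at least one run is kept."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {value}")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only --log-limits and its default come from the proposal.
    parser = argparse.ArgumentParser(prog="pipelines")
    parser.add_argument(
        "--log-limits",
        type=positive_int,
        default=200,  # suggested default from the proposal
        help="maximum number of history runs to keep/return per pipeline",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.log_limits)
```

Making it a single global value (rather than a per-pipeline policy) matches the "no complicated policy" request above.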

parkzhou0527 • May 16 '22 07:05

@kaleocheng

parkzhou0527 • May 16 '22 07:05

Actually, there are two issues:

  1. the backend loads the full history on every call to the get-pipelines API (https://github.com/Wiredcraft/pipelines/blob/aeb8b65d47bd77ef7b5e9e4f21b8b1933fc39952/pipelines/api/utils.py#L82), which can exhaust memory and slow down API responses;
  2. there is no pagination support in the get-pipelines API, so the frontend also renders all history metadata, which slows down web UI loading.
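A minimal sketch of offset-based pagination for the second issue, assuming hypothetical `page`/`per_page` query parameters; the get-pipelines API does not have them today. Note that this only limits what is sent to and rendered by the frontend: on its own it does not stop the backend from loading every run (the first issue).

```python
from typing import Any, Dict, List


def paginate(items: List[Any], page: int = 1, per_page: int = 20) -> Dict[str, Any]:
    """Return one page of items plus the metadata a frontend needs for pager controls."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must both be >= 1")
    start = (page - 1) * per_page
    total = len(items)
    return {
        "items": items[start:start + per_page],  # empty list if page is past the end
        "page": page,
        "per_page": per_page,
        "total": total,
        "pages": (total + per_page - 1) // per_page,  # ceiling division
    }
```

For example, `paginate(list(range(45)), page=3, per_page=20)` returns items 40 to 44 with `pages` equal to 3.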

For the first one, in the long term we should introduce database storage for pipelines (e.g. Redis or Postgres), but that is another topic. Before that, we can apply a quick fix to the current file-based storage:

  1. update https://github.com/Wiredcraft/pipelines/blob/aeb8b65d47bd77ef7b5e9e4f21b8b1933fc39952/pipelines/api/utils.py#L81 to return only the latest X items, sorted by creation or modification time;
  2. add a new flag to the pipelines CLI to set X. Let's take Park's suggested name, --log-limits.
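A minimal sketch of the first quick fix, assuming each run is stored as its own subdirectory under a per-pipeline history directory (the actual layout in `utils.py` may differ, and `latest_runs` is a hypothetical name). It still stats every run directory, but it avoids opening every run's status file.

```python
import heapq
import os
from typing import List


def latest_runs(history_dir: str, limit: int) -> List[str]:
    """Return the names of the `limit` most recently modified run directories, newest first."""
    with os.scandir(history_dir) as entries:
        # (mtime, name) pairs; plain files in the history directory are ignored.
        runs = [(entry.stat().st_mtime, entry.name) for entry in entries if entry.is_dir()]
    # nlargest avoids sorting the whole list when limit is much smaller than the history.
    newest = heapq.nlargest(limit, runs)
    return [name for _, name in newest]
```

A directory's mtime changes whenever entries are added to or removed from it, so this orders runs by last activity rather than strictly by creation. True creation time is not portably available from `stat` (on Unix, `st_ctime` is the inode change time).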

For the second one, I will create another ticket to track it.

kaleocheng • May 17 '22 06:05

For a quick mitigation, see #134.

Current run folder names are just UUIDs with no metadata, which makes them hard to sort without opening each run's status JSON file.

A further mitigation could be writing an additional metadata status file for each pipeline that lists its history runs along with their creation dates, etc. This file would only be written when new runs are created, which may still be easier than switching entirely to a DBMS.
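A minimal sketch of such a per-pipeline index, assuming a hypothetical `runs.json` file in each pipeline's directory and hypothetical helpers `record_new_run` / `latest_from_index`. The file is touched only when a run is created, and reading the latest N runs no longer requires scanning run folders or their status files. The run ID format (`uuid4().hex`) is also an assumption.

```python
import json
import os
import tempfile
import time
import uuid
from typing import Dict, List, Optional

INDEX_NAME = "runs.json"  # hypothetical file name for the per-pipeline index


def _read_index(pipeline_dir: str) -> List[Dict]:
    path = os.path.join(pipeline_dir, INDEX_NAME)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def record_new_run(pipeline_dir: str, run_id: Optional[str] = None) -> str:
    """Append a new run's metadata to the index; called only when a run is created."""
    run_id = run_id or uuid.uuid4().hex
    runs = _read_index(pipeline_dir)
    runs.append({"id": run_id, "created": time.time()})
    # Write to a temp file, then atomically replace, so readers never see a half-written index.
    fd, tmp = tempfile.mkstemp(dir=pipeline_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(runs, f)
    os.replace(tmp, os.path.join(pipeline_dir, INDEX_NAME))
    return run_id


def latest_from_index(pipeline_dir: str, limit: int) -> List[Dict]:
    """Return up to `limit` runs, newest first, reading only the index file."""
    runs = _read_index(pipeline_dir)
    # The index is append-only in creation order, so reversing it gives newest first.
    return runs[::-1][:limit]
```

This sketch does not lock the file: two runs created at the same moment could race on the read-modify-write and lose an entry, so a real version would need a lock or a single writer.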

sp3c73r2038 • May 17 '22 08:05