
RFC: Configurable Build Event Stores

Open aoldershaw opened this issue 5 years ago • 2 comments


Experimental PR: https://github.com/concourse/concourse/pull/5651

Signed-off-by: Aidan Oldershaw [email protected]

aoldershaw commented May 10 '20 17:05

Awesome!

evanchaoli commented May 11 '20 01:05

Thanks for your feedback @agurney! You bring up some really good points. I'll preface my response by saying I'm not an expert on this stuff, so please bear with me if I say anything that doesn't make sense 😄 - if you disagree with anything, please let me know!

> I'd like to probe a bit on what's envisaged here, because I am uneasy about bringing a lot of fresh complexity into Concourse itself as far as log forwarding. There are existing tools like logstash, fluentd, or NiFi, which can do all sorts of things in terms of fanout, enrichment/transformation, and high-performance streaming.

I looked briefly into logstash during my investigation into Elasticsearch, but I've never used any log forwarding tools myself, so I don't fully grasp the value-add for Concourse build events (besides performance, as I imagine they're quite well optimized). To me, one benefit seems to be turning raw unstructured logs into structured data (I suppose that's the enrichment/transformation you bring up), but Concourse build events are already structured to a certain degree. Another benefit is the decoupling of inputs from outputs - as you mention, Concourse could just emit to syslog, and logstash/fluentd/NiFi could forward the logs to the correct destination. However, given that Concourse will need to retrieve logs from these destinations as well, there's an implicit coupling that still needs to exist somewhere:

> Then the implementation is that the builds DB table grows an extra column for the remote storage URL (maybe with other data as needed for retrieval, such as auth tokens or whatever). When build logs are requested, and not found locally, we fetch them via that URL instead.

I'm not quite sure what you mean by "storage URL" - do you mean something like an https://... API endpoint that Concourse can just call out to without needing to know what's on the other side, or something with a provider-specific scheme that Concourse has to interpret, e.g. elasticsearch://{elasticsearch-url}/builds/{build_id}?

If you mean the former, I'm not convinced this will work in general. For instance, if we emit build events to Elasticsearch, we could store the /{index}/_search API endpoint with an appropriate query, but how would we go about e.g. pagination, or even parsing the results into a common form?

If you mean the latter, wouldn't it be easier to just have an implementation of an EventStore that has a Get method (that can live outside of the core-Concourse codebase)?

Let me know if I completely misunderstood what you meant here 😄

> So I think the remaining non-existing piece is retrieving logs from their remote home, for display through Concourse. I'm imagining a world where logs always go to the local Postgres for at least a few days, but some operators will configure aggressive local reaping combined with forwarding into another store.

This sounds a lot like what @jchesterpivotal was saying here: https://github.com/concourse/rfcs/pull/53#discussion_r424025339 (keep recent builds as close as possible to the ATC, and optionally configure an external event store for archiving builds). I think that's a good way to look at the problem. I did reply to his comment mentioning that I think this could be modelled with the proposed EventStore interface, but it's probably worth revisiting the proposed design with that in mind.

> I'm not sure that it should be Concourse's job to decide when remote logs should be reaped. Once we've passed the data on, it's the other system's job to implement its own retention policy, according to whatever criteria happen to make sense... We may also not want to force log deletion just because a pipeline has been deleted - maybe those logs are still useful for analytics.

That's a really good point - I hadn't considered that! Do you think there's value in Concourse letting the external system know "by the way, this pipeline no longer exists, so you may want to delete the build events", and the system can choose to delete the events or not?

EDIT: I think if you want to keep the build events around, it probably makes sense to archive the pipeline rather than delete it (in which case we wouldn't try to delete the build events in the first place). It probably doesn't always make sense to enable the "build log reaper" - but that's opt-in, anyway.

> How should Concourse be expected to authenticate itself to a remote log store, given that not all users should see all logs, and remote systems have their own idea of how access control works? (That's still the case for the current proposal as well, since a given EventStore would have to implement that logic in some fashion.)

I'm not 100% sure what types of access control remote systems have, so maybe you could speak more to that. I envisioned it working like it does currently with Postgres: the ATC is provided credentials that grant access to all build events, and Concourse handles user access control by team at the API layer. So, EventStore implementations assume they can access build events for any build using the credentials they're configured with, and it's up to Concourse not to serve build events to users who aren't allowed to see them.

aoldershaw commented May 28 '20 19:05