Better hooks for recon automation
Hi @honoki,
I would like to start a discussion on hooks and their applicability in automatic recon.
The current implementation, which monitors document changes via the CouchDB _changes feed, has some issues/limitations:
- The update event doesn't carry information about what was updated, so you cannot really trigger actions on assigning/removing tags, adding/removing IPs from domains, adding/removing scope to programs, changes to titles/return codes on services, etc.
- There is no delete event. Deleted documents trigger an update event, again without any indication that the "update" was in fact a deletion.
- Recreated documents (deleted and added again) trigger an update event where a new event should be triggered.
I'm aware that all of those problems are mainly related to limited functionality of _changes endpoint.
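To make the limitation concrete, here is a minimal sketch of what can be derived from a single _changes row. It assumes the standard row shape (`id`, `changes[].rev`, and an optional `deleted` flag); the helper name `classify_change` and the sample rows are made up for illustration and are not how bbrf listen works today.

```python
def classify_change(row):
    """Guess an event type from one CouchDB _changes row (a plain dict)."""
    # CouchDB adds "deleted": true to rows that describe a deletion.
    if row.get("deleted"):
        return "delete"
    # Revision ids look like "<generation>-<hash>"; generation 1 is a first write.
    rev = row["changes"][0]["rev"]
    generation = int(rev.split("-", 1)[0])
    return "new" if generation == 1 else "update"


# Hypothetical rows for one document that is created, deleted and recreated.
rows = [
    {"seq": "1-a", "id": "example.com", "changes": [{"rev": "1-abc"}]},
    {"seq": "2-b", "id": "example.com", "changes": [{"rev": "2-def"}], "deleted": True},
    # Assumption: the recreated document continues the old revision count.
    {"seq": "3-c", "id": "example.com", "changes": [{"rev": "3-ghi"}]},
]
events = [classify_change(r) for r in rows]
```

Under these assumptions a deletion can be told apart from an update, but a recreated document still looks like an update, and nothing in the row says which fields changed.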
So I started wondering whether we could extend the database with another type of document (e.g. queue) where all changes are stored and can be picked up by bbrf listen.
Another idea would be to extend all applicable document types (program, domain, ip, service, url) with a last_changes object that carries the information about what was changed. Based on that, bbrf listen could offer more granular event handling.
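As a rough illustration of what such a last_changes object could contain, the sketch below diffs two versions of a document. The function name `diff_fields`, the output layout and the sample field names (`ips`, `tags`) are assumptions for the example, not an existing bbrf structure.

```python
def diff_fields(old, new, ignore=("_rev", "last_changes")):
    """Describe what changed between two versions of a document (plain dicts)."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        if key in ignore:
            continue
        before, after = old.get(key), new.get(key)
        if before == after:
            continue
        if isinstance(before, list) and isinstance(after, list):
            # For list fields, record which items were added and removed.
            changes[key] = {
                "added": [x for x in after if x not in before],
                "removed": [x for x in before if x not in after],
            }
        else:
            # For everything else, keep the old and the new value.
            changes[key] = {"old": before, "new": after}
    return changes


old = {"_id": "example.com", "type": "domain", "ips": ["1.1.1.1"], "tags": {"cdn": "no"}}
new = {"_id": "example.com", "type": "domain", "ips": ["1.1.1.1", "2.2.2.2"], "tags": {"cdn": "yes"}}
last_changes = diff_fields(old, new)
```

A listener could then look at the keys of `last_changes` to decide whether, say, an IP was added or a tag was changed.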
I've been thinking about this and would like to also incorporate the following changes:
- bbrf listen only allows processing changes in real-time, i.e. does not allow processing changes at a later point in time, for example changes that occurred while no bbrf listen was running. I'd like to add a view to fetch all updates that haven't been marked as processed to bridge that gap.
I'd suggest adding a history field to each document that captures the following:
{
"processed": false,
"last_updated": 0,
"event": "NEW"
}
where processed will need to be set to true by any implementation that processes the change (e.g. bbrf listen), and event will be from a defined set: NEW, UPDATED, DELETED. There might be room for discussion here on what this set of events should be, e.g. should UPDATED be extended to include UPDATED_TAGS, UPDATED_IPS, etc.
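A minimal sketch of how a client might write and later acknowledge this history field, assuming the three events listed above. The helper names `stamp_history` and `mark_processed` are hypothetical, and both helpers return copies rather than talking to CouchDB.

```python
import time

# The event set proposed above; extending it (e.g. UPDATED_TAGS) is still open.
EVENTS = {"NEW", "UPDATED", "DELETED"}


def stamp_history(doc, event, now=None):
    """Return a copy of doc with a fresh, unprocessed history entry."""
    if event not in EVENTS:
        raise ValueError("unknown event: " + event)
    stamped = dict(doc)
    stamped["history"] = {
        "processed": False,
        "last_updated": int(time.time() if now is None else now),
        "event": event,
    }
    return stamped


def mark_processed(doc):
    """Return a copy of doc whose history is flagged as processed."""
    marked = dict(doc)
    marked["history"] = dict(doc["history"], processed=True)
    return marked


doc = stamp_history({"_id": "example.com", "type": "domain"}, "NEW", now=100)
done = mark_processed(doc)
```

Here `doc` would be picked up by a queue of unprocessed changes, while `done` is what a consumer such as bbrf listen would write back after handling it.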
To facilitate this, a new view queue can be added to the CouchDB server with the following function:
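function (doc) {
  // Queue documents whose latest change has not been processed yet.
  if (doc.history && !doc.history.processed)
    emit(doc.type, doc.history);
  // Documents created before the history field existed are queued as LEGACY.
  else if (!doc.history)
    emit(doc.type, {processed: false, last_updated: 0, event: "LEGACY"});
}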
This would support a client syntax like:
# get all queued documents
bbrf queue
# get all queued urls
bbrf queue url
# get all queued domains marked as UPDATED
bbrf queue domain updated
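One way the optional type and event arguments could map onto the view output is sketched below. It assumes rows in the usual CouchDB view shape (`id`, `key`, `value`), with the document type as key and the history object as value; `filter_queue` and the sample rows are hypothetical.

```python
def filter_queue(rows, doc_type=None, event=None):
    """Return ids of queued documents, optionally filtered by type and event."""
    selected = []
    for row in rows:
        # The view emits the document type as the row key.
        if doc_type is not None and row["key"] != doc_type:
            continue
        # The event lives in the emitted history object; match case-insensitively.
        if event is not None and row["value"]["event"] != event.upper():
            continue
        selected.append(row["id"])
    return selected


rows = [
    {"id": "example.com", "key": "domain", "value": {"event": "UPDATED"}},
    {"id": "new.example.com", "key": "domain", "value": {"event": "NEW"}},
    {"id": "https://example.com/", "key": "url", "value": {"event": "NEW"}},
]
all_queued = filter_queue(rows)                          # bbrf queue
queued_urls = filter_queue(rows, "url")                  # bbrf queue url
updated_domains = filter_queue(rows, "domain", "updated")  # bbrf queue domain updated
```

In practice the type filter could probably be pushed to the server by querying the view with a key, leaving only the event filter to the client.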
I'd love to hear your input on these thoughts.
One disadvantage I noticed while testing this is that documents deleted in CouchDB no longer show up in views. In other words, setting event: DELETED is moot, because the document will never be returned.
This could be avoided by going for a separate document type, as you suggested, e.g. a document type queue with all the history information and a reference to the original document. A disadvantage of this approach would be having to send an additional request to the CouchDB API for every created/updated/deleted document in order to create the queue document.
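For comparison, a queue document along these lines might look like the sketch below. All field names (`ref`, `ref_type`, and so on) and the `queue:` id prefix are assumptions for illustration; nothing here is an existing bbrf schema.

```python
import time
import uuid


def make_queue_entry(doc_id, doc_type, event, now=None):
    """Build a standalone queue document that points at the changed document."""
    return {
        # A random id, so several changes to the same document can coexist.
        "_id": "queue:" + uuid.uuid4().hex,
        "type": "queue",
        "ref": doc_id,
        "ref_type": doc_type,
        "event": event,
        "last_updated": int(time.time() if now is None else now),
        "processed": False,
    }


entry = make_queue_entry("example.com", "domain", "DELETED", now=5)
```

Because the entry is its own document, it stays visible in views after the referenced document is deleted, at the cost of the extra write mentioned above.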
Another complication I thought of: setting processed: false whenever a document is updated makes it tricky for bbrf to mark a document as, e.g., "processed by listener", because the update is currently (while testing locally) hard-coded to always set history.processed to false.
Maybe this can be fixed by never touching history.processed upon creation/update, and updating the CouchDB view so it only returns documents that don't have that field set.