Constructing Knowledge Graphs from Log Extraction and Integrating Them with Cartography Mappings
I would like to enrich my knowledge graph with logs from different services. For example, user A creates a bucket, or some other event occurs in my system, and I have the logs for it. Now that my cloud is mapped, it is a live system that produces logs and events, and I would like to add this information to my KG. Can it be integrated with Cartography? New intel? What are the best practices for doing this?
Given such an option, I would be able to apply graph data science models and gain useful insights. I would like to connect to all the logs, merge them with my Cartography-based knowledge graph, and build nodes and relationships from them. As a result, my graph would become a living organism.
@ramonpetgrave64
This is a great idea. One prerequisite for this is having consistent identifiers with which to correlate objects in both logs and Cartography data. As far as I can tell, Cartography does have unique identifiers for most things, but the property names in which you'll find them aren't consistent.
Just to give an example, in an AWSAccount you'd need to use id, whereas an AWSUser has a userid and arn but not an id.
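To make the inconsistency concrete, here is a minimal sketch of a lookup that hides the differing property names behind one function. The table `ID_PROPERTY_BY_LABEL` and the function `canonical_id` are hypothetical names made up for illustration, not part of Cartography. Using `arn` as the identifier for `AWSUser` is an assumption based on the example above.

```python
# Hypothetical lookup table (not part of Cartography): which property
# holds the unique identifier for each node label.
ID_PROPERTY_BY_LABEL = {
    "AWSAccount": "id",   # AWSAccount exposes its identifier as `id`
    "AWSUser": "arn",     # AWSUser has `userid` and `arn` but no `id`
}


def canonical_id(label, properties):
    """Return the unique identifier for a node, or None if it is unknown.

    `label` is the node label and `properties` is a dict of the node's
    properties, e.g. as read from a Neo4j query result.
    """
    key = ID_PROPERTY_BY_LABEL.get(label)
    if key is None:
        return None
    return properties.get(key)


account = {"id": "123456789012", "name": "prod"}
user = {"userid": "AIDAEXAMPLE", "arn": "arn:aws:iam::123456789012:user/alice"}

print(canonical_id("AWSAccount", account))
print(canonical_id("AWSUser", user))
```

A table like this has to be maintained by hand for every label, which is part of why consistent identifiers in the graph itself would be preferable.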
What about joining forces and adding this functionality?
It would be awesome. I've already been working on something similar and had to fork because Cartography was missing quite a few things (not just the IDs, but support for Neo4j 4.x among other things). I would prefer to have this in the base Cartography than to maintain it separately.
But, I'll have to see if my employer could set aside time for me to do this, first.
And, we'd need to find a way to add these IDs that is agreeable with the maintainers. For example, I don't suppose we can change existing property names as it would break backwards compatibility. We'd probably need to add a new field that mirrors IDs in some existing fields.
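A rough sketch of what such a mirrored field could look like, assuming the mirroring is done with a Cypher statement run after a sync. The property name `canonical_id` and the helper `mirror_id_statement` are hypothetical. Nothing here reflects what the maintainers have agreed to. Existing properties are left untouched, so current queries keep working.

```python
import re

# Labels and property names cannot be passed as Cypher parameters, so they
# are interpolated into the query text; restrict them to plain identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def mirror_id_statement(label, source_property, target_property="canonical_id"):
    """Build a Cypher statement that copies an existing ID property into a
    new, consistently named property (hypothetical name: `canonical_id`)."""
    for name in (label, source_property, target_property):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"MATCH (n:{label}) "
        f"WHERE n.{source_property} IS NOT NULL "
        f"SET n.{target_property} = n.{source_property}"
    )


print(mirror_id_statement("AWSAccount", "id"))
print(mirror_id_statement("AWSUser", "arn"))
```

The generated statements could be run with any Neo4j client. The helper only builds the query text and does not connect to a database.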
@danielsaporo can you share your work? I have found some related work named SLOGERT. I will be happy to arrange a Zoom meeting with you and have a look at your progress. I have some ideas.
@steve-solun I'm not sure what I can share - I'll check and get back to you. But in terms of IDs, I didn't do much more than add an extra field to some of the types as described earlier.
It's worth hearing from one of the maintainers what they think about the consistent IDs. Maybe there's a better way I haven't thought of.
Can you please tag the relevant maintainers? @danielsaporo
@achantavy
@danielsaporo @steve-solun - Filed https://github.com/lyft/cartography/issues/1024 to track consistent IDs. I've started to do this in #895. I really want to spend more cycles there but...
> I'll have to see if my employer could set aside time for me to do this, first.
I'm needing to balance this too :).
Anyway, this is a legit problem and we will fix it.
For @steve-solun's idea on correlating this with log extraction though, I wonder if that correlation would be better suited to a separate tool: it might make the most sense to have another tool pull from a Neo4j database created with Cartography, correlate that against a log source, and then put the result in a secondary data store.
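As a minimal sketch of that separate-tool idea, assume the user ARNs have already been read from the Cartography-built graph and the log source yields CloudTrail-style records. The `correlate` function and the shape of its output rows are hypothetical, and the sample event is made up. The field names `userIdentity.arn`, `eventName`, `eventTime` and `requestParameters.bucketName` follow the CloudTrail record format.

```python
def correlate(events, known_arns):
    """Match log events to graph nodes by ARN.

    `events` is an iterable of CloudTrail-style dicts and `known_arns` is a
    set of ARNs already present in the graph. Returns a list of rows that
    could be written to a secondary data store.
    """
    rows = []
    for event in events:
        arn = (event.get("userIdentity") or {}).get("arn")
        if arn not in known_arns:
            continue  # actor is not in the graph, nothing to correlate
        params = event.get("requestParameters") or {}
        rows.append({
            "actor": arn,
            "action": event.get("eventName"),
            "target": params.get("bucketName"),
            "time": event.get("eventTime"),
        })
    return rows


# Made-up sample data for illustration only.
known_arns = {"arn:aws:iam::123456789012:user/alice"}
events = [
    {
        "eventName": "CreateBucket",
        "eventTime": "2022-01-01T00:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        "requestParameters": {"bucketName": "example-bucket"},
    },
    {
        "eventName": "CreateBucket",
        "eventTime": "2022-01-01T00:05:00Z",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/unknown"},
        "requestParameters": {"bucketName": "other-bucket"},
    },
]

for row in correlate(events, known_arns):
    print(row)
```

Keeping the correlation outside Cartography means the graph stays a snapshot of infrastructure state, while the event rows can grow at log volume in a store suited to that.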
I see, what should be our action items @achantavy ?
@steve-solun - Following up on this a bit, you might want to check out https://github.com/grapl-security/grapl - https://www.youtube.com/watch?v=uErWRAJ4I4w. I haven't dug deep into the code but it seems like it accomplishes the scenario you are looking for (assuming it has a CloudTrail plugin).
Dear @achantavy thanks so much for the share, I will check it out. It's a great pleasure to cooperate with you and your team :)
Converting this to a discussion. We can then decide on concrete deliverables and create those as issues.