rita WIP: Added document to explain measurements in RITA

Added a document that explains measurements in RITA. Based on the documents I was given, I wasn't sure whether we wanted a document like this or one that literally goes through and explains the column titles shown by the show-* commands. If we want something else, I can easily rework this into that.

Jul 30 '19 22:07 carrohan

What you have is good too. I don't think the interval score and data size score are actually displayed in the beacon output, but we do store those in the database. I'm not sure if we should explain those in the doc or not.

But I would like something that goes through and literally explains the column titles in the show-* commands. Maybe something like the list below could go under each section in the document (e.g. "## Beacons"), before each of the in-depth explanations. Each list item could be a link (if applicable) to the relevant section in the document, or have a short description if there is no such section. If you need help figuring out how to get the links right, let me know. I think there are examples in our other docs, though.

The following are the column titles you will see in the output of rita show-beacons. You can click them for more information.

Score (just link to the score section)
Source IP - IP address that initiated the connections.
Destination IP - IP address that received the connections.
Connections - The total number of connections between the source and destination IPs.
Avg Bytes - etc.
Intvl Range
Size Range
Top Intvl
Top Size
Top Intvl Count
Top Size Count
Intvl Skew (just link to the section)
Size Skew (just link to the section)
Intvl Dispersion (just link to the section)
Size Dispersion (just link to the section)

Aug 14 '19 15:08 ethack

I fully agree with @ethack. I think documenting each column would provide more insight. As I suggested in issue #273, a similar approach could be useful for the analyzers, to help future or independent maintainers who might not have prior insider knowledge.

Jan 03 '20 09:01 Spriithy

As I dug through the code and tried to make sense of the various indicators and scores used in the analyzer, I really wished I had some documentation to back up my intuitions, mostly regarding the choices made.

Why pick 30 seconds in the computation of tsMadmScore and then use 32 seconds right after for dsMadmScore?

Anyways, I think the indicators are all straightforward. Just the scores might need some explanation.

Jan 03 '20 18:01 Spriithy

@Spriithy Thanks for the feedback!

To answer your question about these lines,

https://github.com/activecm/rita/blob/d7f7b17928d02b003b9de02095acda33617924a8/pkg/beacon/analyzer.go#L164-L174

ts stands for timestamp and refers to the connection interval metrics. ds stands for data size and refers to the connection size metrics. In both cases the divisor sets the value that the score is normalized against. So anything with an interval dispersion greater than or equal to 30 seconds will have the same tsMadmScore of 0. Likewise, anything with a data size dispersion greater than or equal to 32 bytes will have the same dsMadmScore of 0.
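
As a rough illustration of that normalization (a minimal sketch, not the code at the link above): assuming the score falls linearly from 1 at zero dispersion to 0 at the divisor and is clamped there, it could look like the following. The helper name madmScore and the sample dispersion values are hypothetical; only the 30 and 32 divisors come from the discussion.

```go
package main

import "fmt"

// madmScore is a hypothetical helper (not a RITA function). It assumes a
// linear mapping: a dispersion of 0 scores 1, and any dispersion at or
// above the divisor is clamped to a score of 0.
func madmScore(dispersion, divisor float64) float64 {
	score := 1.0 - dispersion/divisor
	if score < 0 {
		score = 0
	}
	return score
}

func main() {
	// Interval dispersion in seconds, normalized against 30.
	fmt.Println(madmScore(0, 30))   // 1
	fmt.Println(madmScore(15, 30))  // 0.5
	fmt.Println(madmScore(30, 30))  // 0
	fmt.Println(madmScore(120, 30)) // 0

	// Data size dispersion in bytes, normalized against 32.
	fmt.Println(madmScore(8, 32)) // 0.75
}
```

Under this assumption, a dispersion of 30 seconds and one of 120 seconds are indistinguishable in the score, which is the "same score of 0" behavior described above.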

I'm not entirely sure how 30 seconds and 32 bytes were picked. It could be they were just arbitrary choices that tended to work well.

Jan 06 '20 16:01 ethack

Thanks for the feedback! Is there some sort of roadmap for the project? Anything we could maybe contribute to?

Jan 06 '20 16:01 Spriithy

No public roadmap :( But help with any issue marked "good first issue" would be very welcome. If you're interested in any of them, just start commenting on the issue with questions or a proposed solution.

Jan 06 '20 16:01 ethack

I updated the doc I'd written before to hopefully explain some of the headers better, but didn't have a chance to get it proofread. I'm not totally sure if it's what you were hoping for, @Spriithy, so please feel free (but not obligated by any means) to give any feedback. Apologies for taking so long to get back to this!

Feb 14 '20 20:02 carrohan

I think the doc still needs to explain what counts as a good or bad beacon score.

It is not immediately clear that the score is not a probability.
It is also unclear what threshold can be used to separate beacons from non-beacons (or even whether using a threshold makes sense).

Apr 21 '21 09:04 thibaultbl