vulnerablecode CRAVEX: Collect and normalize exploit pointers

We want to collect data about exploits.

[ ] https://github.com/nexB/vulnerablecode/issues/1452
[x] https://github.com/nexB/vulnerablecode/issues/1453
[x] https://github.com/nexB/vulnerablecode/issues/1454
[ ] https://github.com/nexB/vulnerablecode/issues/1455

See discussion document at https://docs.google.com/document/d/1XtMmxthmANhr-IqXsyMgFnrOq5fTGfsE/edit?usp=sharing&ouid=117241222429542576816&rtpof=true&sd=true

See work-in-progress normalized model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing

Sep 26 '19 21:09 pombredanne

exploitdb has moved to https://gitlab.com/exploit-database/exploitdb

Apr 29 '23 12:04 armijnhemel

This is a nice dataset:

https://github.com/nomi-sec/NVD-Exploit-List-Ja (the license is TBD though, see https://github.com/nomi-sec/NVD-Exploit-List-Ja/issues/1)

Also:

https://github.com/Patrowl/PatrowlHearsData is Apache-licensed
https://github.com/CERTCC/labyrinth/ by @ahouseholder now has a license and already aggregates the above.

Jul 20 '23 21:07 pombredanne

And also https://github.com/nomi-sec/PoC-in-GitHub

Jul 20 '23 21:07 pombredanne

We're tagging vul IDs (more than just CVE) at

https://github.com/CERTCC/exploitdb
https://github.com/CERTCC/metasploit-framework

We pull updates from their respective repositories every few hours, crawl the diffs for IDs we recognize, and then tag the commit in which the ID first appeared.

https://github.com/CERTCC/metasploit-framework/tags
https://github.com/CERTCC/exploitdb/tags

The ID patterns we look for are here: https://github.com/CERTCC/git_vul_driller/blob/dd49cec61aac5ee9e84d57313a7876145e0b1522/git_vul_driller/patterns.py#L15-L56
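
As a rough illustration of the crawl-and-tag step described above, here is a minimal sketch. It is not the git_vul_driller code: the CVE-only regex, the `first_seen_commits` function name, and the `(sha, diff_text)` input shape are all assumptions made for this example, and the real pattern list linked above covers more ID types than CVE.

```python
import re

# Assumption: a simplified CVE-only pattern for illustration.
# The real pattern list (see the patterns.py link above) covers more ID types.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def first_seen_commits(commits):
    """Map each recognized ID to the first commit whose diff adds it.

    `commits` is a hypothetical input: an iterable of (sha, diff_text)
    pairs in chronological order, oldest first.
    """
    first_seen = {}
    for sha, diff_text in commits:
        for line in diff_text.splitlines():
            # Only look at added lines; skip the '+++ b/file' diff header.
            if not line.startswith("+") or line.startswith("+++"):
                continue
            for match in CVE_PATTERN.findall(line):
                # Normalize case so 'cve-...' and 'CVE-...' count as one ID.
                first_seen.setdefault(match.upper(), sha)
    return first_seen


example_commits = [
    ("a1", "+++ b/x.rb\n+ # exploit for CVE-2021-44228\n- old CVE-2014-0160"),
    ("b2", "+ cve-2021-44228 again\n+ CVE-2014-0160"),
]
tags = first_seen_commits(example_commits)
```

Each entry in `tags` would then correspond to one tag to create on the commit where the ID first appeared.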

Jul 21 '23 20:07 ahouseholder

https://github.com/CERTCC/labyrinth/ by @ahouseholder now has a license and already aggregates the above.

Just to set expectations on data quality: please be aware of the notes about signal-to-noise in the Labyrinth README. An ID might show up in Labyrinth because there's an exploit repo that mentions it, or for any number of other relatively benign reasons, because our code isn't smart enough to tell the difference. Labyrinth's findings are meant to serve as input to an analysis process, not as a production exploit feed.

Jul 21 '23 20:07 ahouseholder

On the other hand, we're more confident about the exploitdb/metasploit tags indicating exploits because there's a human vetting process involved (i.e., their developers decide what to include in their product).

Jul 21 '23 20:07 ahouseholder

@ahouseholder Thank you for the valuable insights. In the end, I want to know if my code is vulnerable. So the idea here with exploits, in combination with reachability, is this:

1. Using scancode to detect packages (PURL), or any of the many other tools that use PURLs, and a lookup in VulnerableCode or in any other vulnerability DB that uses PURL, I know I am potentially vulnerable.
2. Then I would like to automate as much as possible the work of finding out whether my usage of the vulnerable package is exploitable. For this there are two tracks that I think can help:
   2.1. Static reachability: given knowledge of a fix commit and a static analysis of the vulnerable library's interaction with my code, do I use any of the vulnerable code paths?
   2.2. Dynamic exploitability: given an exploit script (eventually curated to conform to a common interface and setup), is my configuration exploitable?

And if I am either exploitable or the vulnerable code is reachable, then I need to patch (possibly with the fix commit)
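
The decision flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Finding` class, its field names, and the `triage` function are hypothetical and are not part of VulnerableCode or scancode. The "investigate" and "no action" outcomes are also assumptions added to cover cases the comment does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record for one (package, vulnerability) match."""
    purl: str
    vulnerability_id: str
    reachable: Optional[bool] = None     # static reachability; None = not analyzed
    exploitable: Optional[bool] = None   # dynamic exploit run; None = not analyzed
    fix_commit: Optional[str] = None


def triage(finding: Finding) -> str:
    """Return 'patch' if either track is positive, as described above."""
    if finding.reachable or finding.exploitable:
        return "patch"
    # Assumption: if either track is still unknown, keep investigating.
    if finding.reachable is None or finding.exploitable is None:
        return "investigate"
    # Assumption: both tracks ran and both came back negative.
    return "no action"


finding = Finding(
    purl="pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
    vulnerability_id="CVE-2021-44228",
    reachable=True,
)
decision = triage(finding)
```

Keeping `None` distinct from `False` is a deliberate choice in this sketch: a track that has not been run should not be read as "not exploitable".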

Jul 22 '23 10:07 pombredanne

Related: https://github.com/nexB/vulnerablecode/issues/655

Jul 28 '23 09:07 armijnhemel

See discussion document at https://docs.google.com/document/d/1XtMmxthmANhr-IqXsyMgFnrOq5fTGfsE/edit?usp=sharing&ouid=117241222429542576816&rtpof=true&sd=true

See work-in-progress normalized model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing

Aug 02 '24 19:08 DennisClark

The proposed normalized Exploits model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing is ready for review.

Aug 15 '24 19:08 DennisClark

The proposed normalized Exploits model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing has been reviewed and is ready for implementation.

Aug 19 '24 16:08 DennisClark