Proposal: Dependency Freshness Score
Context
This proposal is in the context of mitigating open-source supply chain risk, a concern that many banks are looking to address.
We are proposing it at FINOS to gauge interest in collaborating on this as an open-source project, affiliated with the DevOps SIG, perhaps in conjunction with the OpenSSF Scorecard project (https://github.com/ossf/scorecard).
This proposal goes beyond known vulnerabilities and aims to quantify dependency risk in an actionable manner so that application developers can upgrade to more recent releases, and thus mitigate potential dependency risk preemptively.
Rationale
For the majority of open-source software, the concept of End Of Life (as it is defined for vendor software) does not apply: there is no official support to begin with, and thus no "end of support" either. However, OSS library releases can easily become stale as newer releases become available, and those newer releases include bug fixes and potentially security vulnerability fixes, which application developers will not benefit from unless they upgrade.
However, it is not obvious for an application developer to know which releases are stale and which need upgrading, and it is particularly difficult to quantify staleness in a way that can be aggregated and reported on for multiple projects across an organization. This poses a major problem: we cannot improve what we cannot measure.
This proposal aims to address all of these issues.
Note: end of life does exist for some OSS libraries; for example, older versions of Angular are marked EOL and superseded by newer releases. These are special cases that would be handled above and beyond this proposal.
Library Staleness Indicator
The first element is to identify, for a given library release, exactly how stale it is compared to the most recent available release. To determine this, releases must be ranked from most to least recent. Since versioning schemes of OSS libraries vary greatly, we propose two fundamental methods:
- For libraries that follow Semantic Versioning (SemVer), the major / minor / patch convention should be used to rank releases accurately. This ensures that 2.x releases are always considered "newer" than 1.x releases, irrespective of the date they were published
- For those without SemVer support, the release date can be used as a proxy to rank releases in publication order
Exceptions exist, but these two methods cover the majority of cases; a minimal ranking sketch follows below.
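As a minimal sketch in Python (the `(version, publish_date)` representation and the function names are illustrative assumptions, not part of the proposal):

```python
from datetime import date

def semver_key(version: str):
    """Parse a plain 'major.minor.patch' string into a sortable tuple,
    or return None if the version does not follow that convention."""
    parts = version.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return None
    return tuple(int(part) for part in parts)

def rank_releases(releases: list[tuple[str, date]]) -> list[tuple[str, date]]:
    """Order (version, publish_date) pairs from most to least recent.
    Use SemVer ordering when every version parses cleanly, otherwise
    fall back to publication date as a proxy."""
    keys = [semver_key(version) for version, _ in releases]
    if all(key is not None for key in keys):
        order = sorted(range(len(releases)), key=lambda i: keys[i], reverse=True)
        return [releases[i] for i in order]
    return sorted(releases, key=lambda release: release[1], reverse=True)
```

For example, `rank_releases([("2.0.0", date(2021, 1, 1)), ("1.9.0", date(2022, 6, 1))])` places 2.0.0 first even though 1.9.0 was published later, which is the intended behaviour for SemVer projects.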
Then, a library staleness indicator can be calculated by taking into account both the gap in number of months between a given release and the latest (using a date-based penalty score) and the number of releases between the release and the latest (a rank-based penalty score).
The end result is a numerical value that determines how stale a given library release is at a given moment in time (this value will evolve over time as more releases become available).
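As a rough sketch of such an indicator, assuming the ranked release list from the previous snippet (the weights and the 30-day month approximation are illustrative assumptions, and the exact formula would be up for discussion):

```python
from datetime import date

def staleness_indicator(version: str, ranked: list[tuple[str, date]],
                        months_weight: float = 1.0, releases_weight: float = 0.5) -> float:
    """Combine a date-based penalty (months behind the latest release) with a
    rank-based penalty (number of releases behind the latest) into one value.
    A result of 0 means the given version is already the latest release."""
    versions = [v for v, _ in ranked]
    position = versions.index(version)            # 0 == latest release
    latest_date = ranked[0][1]
    release_date = ranked[position][1]
    months_behind = max((latest_date - release_date).days, 0) / 30.0
    return months_weight * months_behind + releases_weight * position
```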
Release Freshness Score
The above provides a per-library score, but an application can contain hundreds of library dependencies.
Before this can be useful, it is therefore necessary to go one step further and produce an aggregate score at the application level, covering all library releases the application depends on.
Here, we consider that an application's dependency freshness is only as good as the weakest link in the chain, meaning we take the top "most stale" libraries and use those to compute an aggregate freshness score. In other words, upgrading releases that are 3 months old while not touching libraries that are 5 years old will not influence the freshness score much. This is done purposefully, to incentivize application developers to focus first on the oldest dependencies (logically the ones carrying potentially the most risk) and upgrade those.
The final result is a single numerical value representing the dependency freshness score of an application. These scores can be aggregated and averaged across all applications in an organization, and once application developers start to act on them by upgrading their library dependencies, the numbers are expected to improve over time.
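A minimal sketch of that aggregation, taking per-library staleness values as input (the choice of k, the 0-100 scale, and the decay constant are all illustrative assumptions):

```python
def application_freshness(staleness_by_library: dict[str, float], k: int = 10) -> float:
    """Map per-library staleness values to a 0-100 freshness score
    (100 == every dependency is on its latest release). Only the k most
    stale dependencies drive the score, so upgrading the oldest libraries
    moves it the most."""
    if not staleness_by_library:
        return 100.0
    worst = sorted(staleness_by_library.values(), reverse=True)[:k]
    average_worst = sum(worst) / len(worst)
    # Simple decay: the score halves when the worst offenders average a
    # staleness of 12 (roughly a year behind, under the indicator above).
    return 100.0 / (1.0 + average_worst / 12.0)
```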
Next Steps
The above focuses on actionable insights only, meaning that if a given library has had no release since 2017, and an application is using that most recent 2017 release, then it would be considered "fresh". This is important: the freshness score has to be actionable by application developers (by upgrading to newer releases).
This does not mean that a "dead" library project poses no risk: if it is not being developed anymore, it could very well have vulnerabilities that are never addressed. But it cannot simply be upgraded in place; the only option would be to remove the dependency entirely and replace it with another, which might not be feasible or desirable depending on the library in question. Hence the first focus of the freshness score is on what can be upgraded, but there is more to consider.
Beyond a mere freshness score, other criteria would need to be considered to establish a true Dependency Risk Score:
- Known CVE vulnerabilities (for obvious reasons)
- Latest Release Age (a "dead" library could be considered fresh while still posing risks)
- OSS activity (are there still fixes being merged?)
- and several other factors, out of scope here
Note: OSS library scorecards exist that can supply part of the above.
This sounds like a great idea; I am very keen to be involved and support this however I can.
As discussed in the meeting, this seems like an excellent idea and one that is broadly applicable beyond the finance industry.
Some suggestions:
- Expanding on @ashukla13's idea, it would be nice to have a set of metrics. For any given dependency it would be good to track:
  - How much work the dependency is doing across the estate
  - How many transitive dependencies are being brought in as a result of this one
  - Some measure of the "popularity" of a given dependency (perhaps based on committers / stars / NPM downloads)
... all of which are factors I use when evaluating whether to start using a dependency, but which should be considered throughout the lifetime of the software.
- Dependabot
I would take a close look at this: it is modular and isn't (necessarily) tied to GitHub, and contains plenty of tooling to support pulling the dependency data you need. The community around that tool might be willing to help with building these metrics too.
This issue was discussed at #46, but SIG members were away due to holidays in the UK and US. The decision was made to socialise this issue so it can be picked up at the next scheduled DevOps call.
Please give your feedback in the comments below.