Unusual version filtering/ sorting
While using this resource with the BOSH releases, old versions are getting discovered as latest. Where the latest release is v263 the release being detected is stable-3363.24.
After doing some investigation it became increasingly unclear what the expected behaviour of this resource is with regards to release version filtering and sourcing. After looking through commit history I found: 7331e0a7adf030efc35eb086967f0c16b47ff0c1 containing the note NOTE: There is also strict filtering on only allowing semver supported tags. (the note is also in the README Note that releases must have semver compliant tags to be detected.) using v?([^v].*) as a regex matcher. Based on that commit it appears as though it was not the intention to match tags like stable-3363.24.
In order to match only match semver compliant versions I swapped out the determineVersionFromTag with below:
func determineVersionFromTag(tag string) string {
re := regexp.MustCompile(`^v?(\d+\.)?(\d+\.)?(\*|\d+)$`)
matches := re.FindStringSubmatch(tag)
if len(matches) > 0 {
return strings.Join(matches[1:], "")
}
return ""
}
On doing this and running the tests I discovered a number of failing tests, one example is:
Expected
<[]resource.Version | len:2, cap:2>: [
{Tag: "0.4.0", ID: ""},
{Tag: "v0.4.2", ID: ""},
]
to equal
<[]resource.Version | len:6, cap:6>: [
{Tag: "0.4.0", ID: ""},
{Tag: "0.4.1-rc.9", ID: ""},
{Tag: "v0.4.1-rc.10", ID: ""},
{Tag: "v0.4.2-rc.1", ID: ""},
{Tag: "v0.4.2", ID: ""},
{Tag: "v0.4.3-rc.1", ID: ""},
]
This test appears to be matching versions that are not strictly semver versions but can easily be matched by swapping the above regex to ^v?(\d+\.)?(\d+\.)?(\*|\d+.*)$ or even using a slightly modified version of the original regex doing a less strict match ^v?([^v].*).
During my investigations I also found that the github library that is being used makes it relively easy to filter based on the latest published version using the RepositoryRelease.PublishedAt field, the sort function would look something like:
type byVersion []*github.RepositoryRelease
func (r byVersion) Len() int {
return len(r)
}
func (r byVersion) Swap(i, j int) {
r[i], r[j] = r[j], r[i]
}
func (r byVersion) Less(i, j int) bool {
return r[i].PublishedAt.Before(r[j].PublishedAt.Add(0))
}
Some example filter/ sorted outputs:
Current:
v260.1
v260.2
v260.3
v260.4
v260.5
v260.6
v261
v261.1
v261.2
v261.3
v261.4
v262
v262.1
v262.2
v262.3
v263
stable-3312.9
stable-3312.12
stable-3312.15
stable-3312.17
stable-3363
stable-3363.1
stable-3363.9
stable-3363.10
stable-3363.14
stable-3363.15
stable-3363.19
stable-3363.20
stable-3363.22
stable-3363.24
Strict Semver:
v260.1
v260.2
v260.3
v260.4
v260.5
v260.6
v261
v261.1
v261.2
v261.3
v261.4
v262
v262.1
v262.2
v262.3
v263
Timestamp:
stable-3312.9 2016-12-15 07:43:18 +0000 UTC
v260.1 2016-12-20 03:16:26 +0000 UTC
stable-3312.12 2016-12-20 19:24:14 +0000 UTC
v260.2 2017-01-12 07:56:36 +0000 UTC
stable-3312.15 2017-01-13 03:00:18 +0000 UTC
v260.3 2017-01-14 00:16:44 +0000 UTC
v260.4 2017-01-21 02:16:34 +0000 UTC
v260.5 2017-01-24 00:00:34 +0000 UTC
stable-3312.17 2017-02-02 23:23:53 +0000 UTC
v260.6 2017-02-03 21:29:34 +0000 UTC
v261 2017-02-16 02:15:09 +0000 UTC
stable-3363 2017-02-16 02:36:29 +0000 UTC
v261.1 2017-02-17 18:50:41 +0000 UTC
stable-3363.1 2017-02-18 00:46:10 +0000 UTC
v261.2 2017-02-22 02:43:15 +0000 UTC
stable-3363.9 2017-02-23 02:46:25 +0000 UTC
v261.3 2017-03-06 18:39:33 +0000 UTC
stable-3363.10 2017-03-09 00:04:27 +0000 UTC
v261.4 2017-03-14 16:56:19 +0000 UTC
stable-3363.14 2017-03-31 00:41:58 +0000 UTC
stable-3363.15 2017-04-05 23:16:00 +0000 UTC
stable-3363.19 2017-04-17 22:53:28 +0000 UTC
stable-3363.20 2017-04-25 23:31:04 +0000 UTC
stable-3363.22 2017-05-15 17:43:00 +0000 UTC
stable-3363.24 2017-05-22 21:48:14 +0000 UTC
v262 2017-06-06 13:41:50 +0000 UTC
v262.1 2017-06-10 02:40:25 +0000 UTC
v262.2 2017-06-28 22:03:41 +0000 UTC
v262.3 2017-07-06 18:00:58 +0000 UTC
v263 2017-08-14 19:33:31 +0000 UTC
Happy to raise a pull request to impliment any of the above but some guidance around what the correct direction is would be extremely helpful.
- How strict do we want to be with Semver versions?
- Should semver version matching be a configuration option?
- Should we provide options on sorting via configuration (PublishedAt or highest semver)
IMO the most reasonable result would be to just ignore the stable-3312.9 tags. As you pointed out, it seems like we're unintentionally matching them. I don't think there's a particularly correct way to interpret the stable-* vs. v* tags - to me it's weird that the single repo serves as a source of multiple types of releases. There's no clear way that they relate to each other, and thus they can't be reasonably ordered semantically.
If the releases had more than just descriptions, the reasoning for having it never return both is pretty clear: one release would contain stemcells, and the next would contain releases - there's no clear way to use that, from a CI standpoint. You'd either always want stemcells or always want releases.
PR #54 has been raised.
I'm with @vito. The fact there are two different versioning schemas in the same repo is difficult to determine. The #54 PR might resolve your issue, but it could expose a better interface.
I'd propose that we expose an interface for tag matching like the git resource does with tag_filter -- option 2 in your list above.
@jtarchie thanks for the feedback, I agree, having it optional would be best, ill rework #54. Do we want to leave the default behavior as it is in #54 right now (fixed semver matching) and have tag_filter override it? I don't want to introduce breaking change if we can avoid it.
@vito @jtarchie #54 has now been updated, it includes a few new options based on your feedback.
Default behaviour: Uses semver matching with minor improvements to filter out the stable releases etc as detailed in this issue.
New behaviour:
-
tag_filter_regex: Optional. If you want to match tags that are not semver compliant you can use this setting to filter by any regex that you like. You may also wish to match minor semver versions only via a regex of^v3.2.\d+$or something similar. By default versions will be sorted based on published date unless thesemverflag is also set, in which case versions will be sorted by number. -
semver: Optional. Defaultfalse. This is only used whentag_filter_regexis present, semver versions are used by default iftag_filter_regexis not defined. When set totrueresource ordering will be based on semver version numbers. -
date_ordered: Optional. Defaultfalse. When set to true the github release published at date will be used to find the newest version of a resource. Iftag_filter_regexis set this is the default ordering mechanism.