github-release-resource icon indicating copy to clipboard operation
github-release-resource copied to clipboard

Unusual version filtering/ sorting

Open sambryant4 opened this issue 8 years ago • 5 comments

While using this resource with the BOSH releases, old versions are getting discovered as latest. Where the latest release is v263 the release being detected is stable-3363.24.

After doing some investigation it became increasingly unclear what the expected behaviour of this resource is with regards to release version filtering and sourcing. After looking through commit history I found: 7331e0a7adf030efc35eb086967f0c16b47ff0c1 containing the note NOTE: There is also strict filtering on only allowing semver supported tags. (the note is also in the README Note that releases must have semver compliant tags to be detected.) using v?([^v].*) as a regex matcher. Based on that commit it appears as though it was not the intention to match tags like stable-3363.24.

In order to match only match semver compliant versions I swapped out the determineVersionFromTag with below:

func determineVersionFromTag(tag string) string {
  re := regexp.MustCompile(`^v?(\d+\.)?(\d+\.)?(\*|\d+)$`)
  matches := re.FindStringSubmatch(tag)
  if len(matches) > 0 {
    return strings.Join(matches[1:], "")
  }
  return ""
}

On doing this and running the tests I discovered a number of failing tests, one example is:

Expected
    <[]resource.Version | len:2, cap:2>: [
        {Tag: "0.4.0", ID: ""},
        {Tag: "v0.4.2", ID: ""},
    ]
to equal
    <[]resource.Version | len:6, cap:6>: [
        {Tag: "0.4.0", ID: ""},
        {Tag: "0.4.1-rc.9", ID: ""},
        {Tag: "v0.4.1-rc.10", ID: ""},
        {Tag: "v0.4.2-rc.1", ID: ""},
        {Tag: "v0.4.2", ID: ""},
        {Tag: "v0.4.3-rc.1", ID: ""},
    ]

This test appears to be matching versions that are not strictly semver versions but can easily be matched by swapping the above regex to ^v?(\d+\.)?(\d+\.)?(\*|\d+.*)$ or even using a slightly modified version of the original regex doing a less strict match ^v?([^v].*).

During my investigations I also found that the github library that is being used makes it relively easy to filter based on the latest published version using the RepositoryRelease.PublishedAt field, the sort function would look something like:

type byVersion []*github.RepositoryRelease

func (r byVersion) Len() int {
  return len(r)
}

func (r byVersion) Swap(i, j int) {
  r[i], r[j] = r[j], r[i]
}

func (r byVersion) Less(i, j int) bool {
  return r[i].PublishedAt.Before(r[j].PublishedAt.Add(0))
}

Some example filter/ sorted outputs:

Current:

v260.1
v260.2
v260.3
v260.4
v260.5
v260.6
v261
v261.1
v261.2
v261.3
v261.4
v262
v262.1
v262.2
v262.3
v263
stable-3312.9
stable-3312.12
stable-3312.15
stable-3312.17
stable-3363
stable-3363.1
stable-3363.9
stable-3363.10
stable-3363.14
stable-3363.15
stable-3363.19
stable-3363.20
stable-3363.22
stable-3363.24

Strict Semver:

v260.1
v260.2
v260.3
v260.4
v260.5
v260.6
v261
v261.1
v261.2
v261.3
v261.4
v262
v262.1
v262.2
v262.3
v263

Timestamp:

stable-3312.9       2016-12-15 07:43:18 +0000 UTC
v260.1              2016-12-20 03:16:26 +0000 UTC
stable-3312.12      2016-12-20 19:24:14 +0000 UTC
v260.2              2017-01-12 07:56:36 +0000 UTC
stable-3312.15      2017-01-13 03:00:18 +0000 UTC
v260.3              2017-01-14 00:16:44 +0000 UTC
v260.4              2017-01-21 02:16:34 +0000 UTC
v260.5              2017-01-24 00:00:34 +0000 UTC
stable-3312.17      2017-02-02 23:23:53 +0000 UTC
v260.6              2017-02-03 21:29:34 +0000 UTC
v261                2017-02-16 02:15:09 +0000 UTC
stable-3363         2017-02-16 02:36:29 +0000 UTC
v261.1              2017-02-17 18:50:41 +0000 UTC
stable-3363.1       2017-02-18 00:46:10 +0000 UTC
v261.2              2017-02-22 02:43:15 +0000 UTC
stable-3363.9       2017-02-23 02:46:25 +0000 UTC
v261.3              2017-03-06 18:39:33 +0000 UTC
stable-3363.10      2017-03-09 00:04:27 +0000 UTC
v261.4              2017-03-14 16:56:19 +0000 UTC
stable-3363.14      2017-03-31 00:41:58 +0000 UTC
stable-3363.15      2017-04-05 23:16:00 +0000 UTC
stable-3363.19      2017-04-17 22:53:28 +0000 UTC
stable-3363.20      2017-04-25 23:31:04 +0000 UTC
stable-3363.22      2017-05-15 17:43:00 +0000 UTC
stable-3363.24      2017-05-22 21:48:14 +0000 UTC
v262                2017-06-06 13:41:50 +0000 UTC
v262.1              2017-06-10 02:40:25 +0000 UTC
v262.2              2017-06-28 22:03:41 +0000 UTC
v262.3              2017-07-06 18:00:58 +0000 UTC
v263                2017-08-14 19:33:31 +0000 UTC

Happy to raise a pull request to impliment any of the above but some guidance around what the correct direction is would be extremely helpful.

  • How strict do we want to be with Semver versions?
  • Should semver version matching be a configuration option?
  • Should we provide options on sorting via configuration (PublishedAt or highest semver)

sambryant4 avatar Aug 31 '17 07:08 sambryant4

IMO the most reasonable result would be to just ignore the stable-3312.9 tags. As you pointed out, it seems like we're unintentionally matching them. I don't think there's a particularly correct way to interpret the stable-* vs. v* tags - to me it's weird that the single repo serves as a source of multiple types of releases. There's no clear way that they relate to each other, and thus they can't be reasonably ordered semantically.

If the releases had more than just descriptions, the reasoning for having it never return both is pretty clear: one release would contain stemcells, and the next would contain releases - there's no clear way to use that, from a CI standpoint. You'd either always want stemcells or always want releases.

vito avatar Aug 31 '17 14:08 vito

PR #54 has been raised.

sambryant4 avatar Sep 01 '17 08:09 sambryant4

I'm with @vito. The fact there are two different versioning schemas in the same repo is difficult to determine. The #54 PR might resolve your issue, but it could expose a better interface.

I'd propose that we expose an interface for tag matching like the git resource does with tag_filter -- option 2 in your list above.

jtarchie avatar Sep 05 '17 15:09 jtarchie

@jtarchie thanks for the feedback, I agree, having it optional would be best, ill rework #54. Do we want to leave the default behavior as it is in #54 right now (fixed semver matching) and have tag_filter override it? I don't want to introduce breaking change if we can avoid it.

sambryant4 avatar Sep 05 '17 18:09 sambryant4

@vito @jtarchie #54 has now been updated, it includes a few new options based on your feedback.

Default behaviour: Uses semver matching with minor improvements to filter out the stable releases etc as detailed in this issue.

New behaviour:

  • tag_filter_regex: Optional. If you want to match tags that are not semver compliant you can use this setting to filter by any regex that you like. You may also wish to match minor semver versions only via a regex of ^v3.2.\d+$ or something similar. By default versions will be sorted based on published date unless the semver flag is also set, in which case versions will be sorted by number.

  • semver: Optional. Default false. This is only used when tag_filter_regex is present, semver versions are used by default if tag_filter_regex is not defined. When set to true resource ordering will be based on semver version numbers.

  • date_ordered: Optional. Default false. When set to true the github release published at date will be used to find the newest version of a resource. If tag_filter_regex is set this is the default ordering mechanism.

srbry avatar Sep 13 '17 08:09 srbry