go-cmp icon indicating copy to clipboard operation
go-cmp copied to clipboard

Convenient FilterPath functions

Open dnephin opened this issue 7 years ago • 9 comments

Hello, I've been using go-cmp for a bit now and I've found that the two most common cases for needing an Option are:

  1. AllowUnexported/IgnoreUnexported - which is already easy to use
  2. A FilterPath for fields that have values set from time.Now() (time.Time, or time.Duration)

It wasn't immediately obvious to me how to create a correct FilterPath filter function. After a quick scan of the godoc my first approach was something like this:

func(path gocmp.Path) bool {		
    return path.Last().String() == ".Elapsed"		
}

This worked, but it seems to me like it could potentially match fields that I don't want it to match if I used it with a large nested structure where multiple types had an Elapsed field.

Digging into the godoc further I noticed that a bunch of the types embedded PathStep, which made me realize that a more correct way of using Path would probably be to iterate over the steps, type assert to the expected type, and compare to Name() or Key().

My next iteration (https://github.com/gotestyourself/gotestyourself/pull/62) looks something like this:

func fieldPathWithTypes(pathspec string) func(gocmp.Path) bool {
	return func(path gocmp.Path) bool {
		spec := strings.Split(pathspec, ".")
		index := 0

		for _, step := range path {
			fieldStep, ok := step.(gocmp.StructField)
			if !ok {
				continue
			}
			if index >= len(spec) || spec[index] != fieldStep.Name() {
				return false
			}
			index++
		}
		return index == len(spec)
	}
}

Which could be made a lot simpler if it's expected that PathStep.String() is stable and wont change between minor versions:

func fieldPathSimple(pathspec string) func(gocmp.Path) bool {
	return func(path gocmp.Path) bool {
		return path.String() == pathspec
	}
}

The same idea could be applied using GoString() if that is considered stable. The idea behind both of these functions is that it's much easier for a reader to understand a dotted path string, than it is to read a long function with a bunch of type assertions.

Questions:

  1. What is the expected/"best practice" way of using PathStep? Is it similar to fieldPathWithTypes ?
  2. Would you be interested in a PR that adds a convenience function for building the PathFilter filter function from a simple string? This could be something like fieldPathWithTypes or it could be more verbose (to match against something like GoString(). I believe the only type that can't be easily translated from a string representation would be Transform.
  3. Would you be interested in a PR that adds one or more examples for using FilterPath (based on feedback from these other questions) ?

dnephin avatar Mar 05 '18 00:03 dnephin

A FilterPath for fields that have values set from time.Now() (time.Time, or time.Duration)

If you're just trying to ignore a single struct field, does cmpopts.IgnoreFields suit your needs?

What is the expected/"best practice" way of using PathStep? Is it similar to fieldPathWithTypes ?

Iterating over the steps is the correct approach. Path.String is probably stable (hard to imagine how it would change), but Path.GoString can and has slightly changed over time.

Would you be interested in a PR that adds a convenience function for building the PathFilter filter function from a simple string?

Interestingly, when I first built cmp, I had a helper function that did exactly that. The rationale was that filtering by struct field names alone would be a very common use-case. That's actually why Path.String only returns struct fields names. However, after discussions with others and playing around with it, I decided that the lack of type safety was not worth the ease of use it brought. For the time being, I still prefer to have more type safety (e.g., the cmpopts.IgnoreFields function requires you to pass in a zero-value of the struct so that we can verify that the field specifier really does exist).

Would you be interested in a PR that adds one or more examples for using FilterPath (based on feedback from these other questions) ?

It's been on my radar for a while to write a tutorial for how to use cmp. The examples today are not great. The tutorial would either be in the form of a GitHub wiki article (with a link from godoc) or in the form of a series of godoc Examples that go from incredibly simple to more complex. I haven't figure out what I'm going to do there.

If you think you have a good example to contribute, I'm willing to take a look.

dsnet avatar Mar 05 '18 20:03 dsnet

If you're just trying to ignore a single struct field, does cmpopts.IgnoreFields suit your needs?

I would like to avoid having to ignore it. I want to use a Comparer, but only for specific fields (ex: https://github.com/gotestyourself/gotestyourself/pull/62/files#diff-3c98f0bec66867cf0fc114dece491b4aR80). The Comparer will verify that both values are non-zero and within some threshold of each other. If the field were just ignored the test would pass even if the field had a zero value, which is bad if you want the field to have a value of time.Now() .

I decided that the lack of type safety was not worth the ease of use it brought

I think it would be possible to make something that is type safe, ex: https://github.com/dnephin/gotestyourself/commit/002bdd865d689a1b2f80b253a4dc1199846843b7

Every PathStep type has a specific notation, so the type of each path is checked, not just the string matching a name.

Edit: I think a similar approach could be done with a set of functions instead of parsing a string. Each step would be matched using something like:

pathFilter := Path(IsField("Summary"), IsSlice(), IsField("Elapsed"))

which would match a path of []PathStep{StructField{Name: "Summary"}, SliceIndex{}, StructField{Name: "Elapsed"}}

I'll try to put together a clean example of FilterPath.

dnephin avatar Mar 05 '18 21:03 dnephin

What I mean by lack of type safety is that:

type Foo struct { Hello string }
var x, y Foo
cmp.Equal(x, y, cmp.FilterPath(Spec("Helllo"), cmp.Ignore()))

does not detect that "Helllo" is misspelled. That is because Spec has no type information regarding the target type specifier is for.

dsnet avatar Mar 05 '18 21:03 dsnet

Ah yes, I see what you mean. I think generally these failures to match would be obvious when the test fails, but probably not easy to debug.

Don't you still have that problem with the custom path filters? You still have to compare step.Name() == name, or is there some other way? You could just compare the type, not the field name, but that would match multiple fields.

I created a prototype of the second approach (using functions instead of parsing a string): https://github.com/gotestyourself/gotestyourself/compare/master...dnephin:gocmp-opts-2

With this approach a path would be:

	path := Path(
		Type(node{}),
		Step(Field("Ref"), Type(&node{})),
		Indirect,
		Step(Field("Children"), Type([]node{})),
		Slice,
		Step(Field("Labels"), Type(map[string]node{})),
		MapKey("first"),
		Step(Field("Value"), Type(0)),
	)

It seems to me like this is unnecessarily verbose. If you just checked the type of one step was a reference, slice, or map, then you know the next will need to be Indirect, Slice, or Map.

With less strict matching logic this could be (the naming needs some work):

	partial := PathPartial(
		Step(Field("Ref"), Type(&node{})),
		Step(Field("Children"), Type([]node{})),
		Step(Field("Labels"), Type(map[string]node{})),
		Step(Field("Value"), Type(0)),
	)

With this approach a user can check all the types if they want to, or reduce it to just field matching. Does something like this look more viable?

dnephin avatar Mar 06 '18 03:03 dnephin

I looked around cmpopts a bit and found filterField, which would perfect:

https://github.com/google/go-cmp/blob/0c93777250bd58222171888d1651979319d8b381/cmp/cmpopts/struct_filter.go#L15-L28

This is much better than having to define the entire path, and should be type safe. What's the status of "concerns of how helper filters can be composed together easily" ?

Another option, which I'm also fond of, would be to expose just the filter part:

func FieldFilter(typ interface{}, name string) func(p cmp.Path) bool {
	sf := newStructFilter(typ, name)
	return sf.filter
}

I'd like to open a PR that exports one of these.

dnephin avatar Mar 08 '18 20:03 dnephin

I remember now some of my concerns. I wished I had expanded more on them when I wrote that TODO.

1. Returning only func(cmp.Path) bool and func(interface{}) bool doesn't compose well Suppose you have a cmpopts helper that returns a filter that is a function of both the values and the path, you can't easily compose those two together. You would need to return both function signatures and require the user to manually call FilterPath and FilterValues on them. It's clunky to use:

pf, vf := cmpopts.HelperFilters(...)
opt := cmp.FilterPath(pf, cmp.FilterValues(vf, ...))

For that reason, I'm not sure a helper that returns a func(cmp.Path) bool is setting a good precedence.

2. Should FilterField(MyStruct{}, "MyField", opts) apply exactly on MyStruct.MyField or the entire sub-tree beneath it? In the case of IgnoreFields it doesn't matter since there is no sub-tree underneath. However, once you generalize it, you have to answer this question. Sometimes you want the exact node, other-times you want the entire sub-tree.

dsnet avatar Mar 13 '18 21:03 dsnet

Thanks for the explanation.

My initial reaction is:

1. Returning only func(cmp.Path) bool and func(interface{}) bool doesn't compose well

To provide a filter of both path and values why not just expose that as an cmp.Option directly (instead of returning separately filters) ?

func HelperFilters(..., opt cmp.Option) cmp.Option { 
    ...
    return cmp.FilterPath(pf, cmp.FilterValues(vf, ...))
}

This helper can be combined with any other Option. This is already the pattern used by IgnoreFields and IgnoreTypes.

I think that is a good argument against the "alternate option" from my previous comment.

2. Should FilterField(MyStruct{}, "MyField", opts) apply exactly on MyStruct.MyField or the entire sub-tree beneath it?

I was thinking about this as well. FilterField is definitely not a complete solution. To match against anything that isn't a field (slice index, map keys, indirect, or transform) FilterField would not work and you'd need something like my Path prototype, or a custom path matching function.

For the cases where FilterField is appropriate the field is either a scalar type or compound type. For scalar types there is no ambiguity, both prefix path matching and exact path matching would be equivalent.

For the case where the field has a slice or map value an exact match is appropriate because trying to match against specified indexes or keys isn't support by FilterField (as mentioned above). For fields which are struct values FilterField would need to be used with the most specific struct. If that isn't possible (because there are multiple fields with the same struct type, but you only want to match one) then once again FilterField is not appropriate.

So for all the cases where FilterField is appropriate and useful it seems like exact match is correct.

dnephin avatar Mar 13 '18 22:03 dnephin

I apologize for the delay in replying. Adding Filter helpers in cmpopts should probably come after I figure out what the right approach is in #36 as I strongly suspect that will have an impact on the API of Filter. It overlaps with first-class Filter helpers in that it needs to address the issue of single-node vs. sub-tree as well.

dsnet avatar Mar 21 '18 19:03 dsnet

Fair enough. I've read over and commented on #36 to see if I understand the problem correctly.

For now I've implemented PathField: https://github.com/gotestyourself/gotestyourself/pull/86/files#diff-3c98f0bec66867cf0fc114dece491b4aR99

dnephin avatar Mar 28 '18 20:03 dnephin