No way to query past Query.Prefix?
Hi gents!
Currently, Query.Prefix is used to do the initial seek when a Query begins, and iteration only continues while that prefix holds. There needs to be a flag that basically says, hey, don't use Prefix once iteration has begun. Otherwise, there's no way to effectively do a seek and do an arbitrary amount of scanning. This seems like a critical feature to leave behind, yikes.
Hopefully, I'm missing something!?
For example, in go-ds-badger: (line 270+)
it := txn.NewIterator(opt)
it.Seek(prefix)
...
qrb.Process.Go(func(worker goprocess.Process) {
...
for sent := 0; it.ValidForPrefix(prefix); sent++ {
...
}
}
...
could be enhanced to:
it := txn.NewIterator(opt)
it.Seek(prefix)
if q.PrefixOnlySeeks {
prefix = nil
}
...
Adding this would be easy, but all the impls would need to be updated of course. So maybe it's done as a selectively available feature?
This is a dealbreaker for us, but we love Datastore and want to use and support it! If some at PL agrees to accept our pull requests in a timely fashion, we'll help do the work. I'm sad we're now in this position, and would take less than a few mins for dbs like leveldb, badger, and bolt. What do y'all think?
We kindly need resolution on this relatively soon one way or the other so that we know we need to pass on Datastore.
Thanks, Drew
Some datastores may not be able to support this (like flatfs, but it doesn't even support ordered iterating, so it's probably fine).
:+1: from me, just make sure that it's optional and not enabled by default (default = zero value)
cc @Stebalien @bigs
Ok, cool, thanks for the quick response.
How about I whip up some changes, post them here for initial review/discussion and if they look good, we can take it from there.
Ok, so so here's my proposed change to Query.
query.go ~ line 60
type Query struct {
Prefix string // namespaces the query to results whose keys have Prefix
IterationPrefix string // if set, this overrides Prefix during query iteration (n/a for some impls)
At first I was thinking a bool as I described in my prev post, but the drawback is that client would be forced to step entry by entry in order to close the query (as she checks each Result.Entry.Key to see if iteration has proceeded sufficiently far). That would mean the client couldn't use Query.Rest() since iteration would go on and on.
So If we go with adding the IterationPrefix field (above), the behavior augmentation is clear and well-defined. Impl changes are also trivial and unambiguous.
If y'all like this, give me the green light and I'll start submitting the changes. I'm OCD when it comes to naming and comments, so don't be shy for any name, style, or comment edits -- I get it. :)
Prefix/IterationPrefix don't really seem intuitive, how about Prefix/StartPrefix:
type Query struct {
Prefix string
StartPrefix string
[...]
}
func Query[...]
if q.StartPrefix == "" {
q.StartPrefix = q.Prefix
}
if !strings.HasPrefix(q.StartPrefix, q.Prefix) {
return errors.New("StartPrefix not within Prefix")
}
[...]
it.Seek(q.StartPrefix)
for [...] it.ValidForPrefix(q.Prefix)
Love it -- great suggestion!
SeekPrefix or StartPrefix? SeekPrefix may more accurately describe the behavior for impls that literally have seek (vs prefix) behavior.
Yeah, SeekPrefix makes more sense
i'm definitely not against this—i think it could be quite useful. i think we'll need to update datastores that don't natively support this to provide it or at least heavily document its absence.
How about:
type Query struct {
Prefix string // namespaces the query to results whose keys have Prefix
SeekPrefix string // if set, an alternate to Prefix that initially positions the query (n/a for some impls)
...
Ok, so I made the discussed changes to leveldb, badger, and bolt -- changes were clean, cost-free, were only a couple lines, and have been tested. Meanwhile, I'm not familiar with S3 to know the obvious path, so I was going to pass on S3 for now. If y'all want, I can have go-ds-S3/s3.go::Query() return an unimp error if Query.SeekPrefix is set -- just let me know.
Feel free to open the PRs, it's easier to discuss changes there. (https://github.com/whyrusleeping/sql-datastore/ will also need support)
ok great
Can you link the PRs here? (makes it easier for others to follow)
Oops, yeah good idea. :D
4 PRs submitted. Will do SQL once we're past this step if that werks.
The 3 ds impl changes will of course get compile fails until the Query.SeekPrefix change is merged.
ds.Query: https://github.com/ipfs/go-datastore/pull/119
badger: https://github.com/ipfs/go-ds-badger/pull/47
leveldb: https://github.com/ipfs/go-ds-leveldb/pull/26
bolt: (I also added support for Query.Prefix) https://github.com/ipfs/go-ds-bolt/pull/1
Note: S3 calls this "StartAfter" (but SeekPrefix is fine with me).
Question: How should this interact with sorting by value?
There are two reasonable solutions I can think of:
- Apply after sorting. This is useful for resuming query.
- Apply before sorting. This is useful for, e.g., scanning a range of dates. Maybe we even want to replace this field with a range (Start, Stop)? This is also likely easier to implement everywhere.
Thoughts?
if we want to stay in line with leveldb and descendants, it should happen after the sort. it feels like the semantics of the alternative would almost be undefined
On Wed, Feb 6, 2019 at 18:40 Steven Allen [email protected] wrote:
Question: How should this interact with sorting by value?
There are two reasonable solutions I can think of:
- Apply after sorting. This is useful for resuming query.
- Apply before sorting. This is useful for, e.g., scanning a range of dates. Maybe we even want to replace this field with a range (Start, Stop)? This is also likely easier to implement everywhere.
Thoughts?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ipfs/go-datastore/issues/116#issuecomment-461232579, or mute the thread https://github.com/notifications/unsubscribe-auth/AANBWgXTFd3FTp8lT_8BHrSsA7iTTc5eks5vK2gJgaJpZM4agZRr .
Liking that Start Stop idea, and it maps well to most impls (tho I can't speak for S3). Steb, as you raised in the other thread, there's the bookkeeping factor w/ ascending vs descending order thing, so that's the main PITA factor I see. With a Stop though, that means there'd now be another reason for a query to stop (which is useful b/c even w/ the SeekPrefix I'm using for now, for example, I have to check each result key to see if it's time to stop). So I'm seeing that Start/Stop is a nice alternative to Prefix (using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).
So the bottom line seems to be that Start+Stop seems to be a wealthier alternative to Prefix, but having both may be over-defined-ish.
I'd happily help support a switch to Start+Stop if that's what others would be into. I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol.
So I'm seeing that Start/Stop is a nice alternative to Prefix
Yes.
(using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).
Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).
I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol.
We have a shared test suite in the test directory of this package (the entrypoint is SubtestAll). However, we're not using it everywhere and it's far from complete. See https://github.com/ipfs/go-ds-leveldb/blob/d9761ea1cccd53b8a27a50c1e6a2c4e975567699/ds_test.go#L325-L329 for an example.
But yeah, a complete shared test suite would be amazing.
I'd happily help support a switch to Start+Stop if that's what others would be into.
Thoughts @magik6k, @bigs? TL;DR: "seeking" implies seeking through some ordered set while a range is just a range of keys. IMO, this better matches user needs and should be easier to implement correctly.
yeah, i'm into it. we could even implement a member method, Prefix(string) that configures this for you. this definitely increases the power of datastores.
(using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).
Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).
Well, my thought was to define the Stop key as inclusive, which of course means either <= or >=, depending on ascending/descending keys. This would appear to fit in everywhere. So maybe instead of Start/Stop, it's Start/Last, First/Last, or Seek/Last?
i think the [Start, Stop) are a bit more reasonable, even if they require a little extra legwork for the Prefix case. that's not a particularly strong feeling, however, so feel free to disregard if you feel the implementation benefits. for your semantics, [Start, End] may suit as names
which of course means either <= or >=, depending on ascending/descending keys.
With start/stop, I wouldn't take order into account (we also support things like ordering by value). That is, start/stop mean "all keys between start and stop" where start <= stop. That's really the main reason I prefer start/stop over seek.
Well, my thought was to define the Stop key as inclusive,
Stop inclusive can't mimic Prefix. For example, [/a, /a/zzz] doesn't include /a/zzzz.
Note: We could also just use a custom range type:
type Bound {
Value string
Exclusive bool
}
func (b Bound) Open() bool {
return b == Bound{} // zero value is "open"
}
func (b Bound) Before(key string) bool {
if b.Exclusive {
return bound < key
} else if key == "" {
return true
} else {
return bound <= key
}
}
func (b Bound) After(key string) bool {
if b.Exclusive {
return key < bound
} else if key == "" {
return true // open range
} else {
return key <= bound
}
}
type Range struct {
Start, Stop Bound
}
func (r *Range) Contains(key string) bool {
return r.Start.Before(key) && r.Stop.After(key)
}
yeah, i'm into it. we could even implement a member method, Prefix(string) that configures this for you. this definitely increases the power of datastores.
You mean get rid of Query.Prefix?
my thinking was that if Start and Stop were still performing prefix matching, ["/a", "/a"] would be equivalent to {Prefix: "/a"}. i do agree, however, that exclusive upper bounds are, perhaps, more intuitive.
re: removing Prefix, i think i misunderstood earlier the intent of this change. i thought we were replacing it with stop/start functionality. i don't know how the two would interact.
my thinking was that if Start and Stop were still performing prefix matching, ["/a", "/a"] would be equivalent to {Prefix: "/a"}. i do agree, however, that exclusive upper bounds are, perhaps, more intuitive.
Got it. Yeah, that would also work (although I mildly prefer ranges).
re: removing Prefix, i think i misunderstood earlier the intent of this change. i thought we were replacing it with stop/start functionality. i don't know how the two would interact.
It just feels like an unnecessary breaking change (although we could deprecate it). With prefix and start/stop, I assume you'd turn the prefix into a range and then take the intersection. We can even add helper functions for this.
Range code: https://gist.github.com/66d63607ee56de9192025c8606f57e3b
(needs tests).
Really, this is starting to make me like my ranges less... (but at least that code only has to be written once).
Is there anything notable blocking from this landing in master? Would be great to use proper ranges, and if possible, let the underlying storage backend optimize range queries.
It needs to be implemented (correctly) in badger, flatfs, s3, etc. and that's a fair amount of work, especially because it interacts with other parts of the query system.
Would it make sense to scope this to another interface, something like RangeDatastore or something.
Implemented as:
type RangeDatastore interface {
Datastore
QueryRange(ctx context.Context, q query.Range)
}
Implementation based on your range code gist
This way it can be implemented by only those datastore implementations that actually support this kind of Query effectiently, and not worry about the weird edge case of implementing range queries on those that don't make sense to.
Unfortunately, no. Go has some pretty big holes in its composition story until we get generics. We'd:
- Have to implement this for, e.g., the "Mount" datastore.
- We'd then need to deal with some sub-datastores potentially not implementing it.
To be clear, I'd very much like this feature, someone just needs to submit patches to implement and test it.
I'd also be open to removing some features of our query system that we frankly don't use to make this more approachable.
- The ability to sort by anything but the key. No datastore supports this efficiently.
- The ability to seek to some offset by index. This feature isn't currently used or even all that usable and would be made redundant by the proposed feature.
Together, these two "features" are responsible for most of the complexity in our query system, and are, as far as I know, completely unused.
Are you up for doing that?