go-datastore icon indicating copy to clipboard operation
go-datastore copied to clipboard

No way to query past Query.Prefix?

Open drew-512 opened this issue 7 years ago • 31 comments

Hi gents!

Currently, Query.Prefix is used to do the initial seek when a Query begins, and iteration only continues while that prefix holds. There needs to be a flag that basically says, hey, don't use Prefix once iteration has begun. Otherwise, there's no way to effectively do a seek and do an arbitrary amount of scanning. This seems like a critical feature to leave behind, yikes.

Hopefully, I'm missing something!?

For example, in go-ds-badger: (line 270+)

it := txn.NewIterator(opt)
it.Seek(prefix)
...
qrb.Process.Go(func(worker goprocess.Process) {
    ...
    for sent := 0; it.ValidForPrefix(prefix); sent++ {
        ...
    }
}
...

could be enhanced to:

it := txn.NewIterator(opt)
it.Seek(prefix)
if q.PrefixOnlySeeks {
    prefix = nil
}
...

Adding this would be easy, but all the impls would need to be updated of course. So maybe it's done as a selectively available feature?

This is a dealbreaker for us, but we love Datastore and want to use and support it! If some at PL agrees to accept our pull requests in a timely fashion, we'll help do the work. I'm sad we're now in this position, and would take less than a few mins for dbs like leveldb, badger, and bolt. What do y'all think?

We kindly need resolution on this relatively soon one way or the other so that we know we need to pass on Datastore.

Thanks, Drew

drew-512 avatar Feb 04 '19 00:02 drew-512

Some datastores may not be able to support this (like flatfs, but it doesn't even support ordered iterating, so it's probably fine).

:+1: from me, just make sure that it's optional and not enabled by default (default = zero value)

cc @Stebalien @bigs

magik6k avatar Feb 04 '19 00:02 magik6k

Ok, cool, thanks for the quick response.

How about I whip up some changes, post them here for initial review/discussion and if they look good, we can take it from there.

drew-512 avatar Feb 04 '19 02:02 drew-512

Ok, so so here's my proposed change to Query.

query.go ~ line 60

type Query struct {
	Prefix            string   // namespaces the query to results whose keys have Prefix
	IterationPrefix   string   // if set, this overrides Prefix during query iteration (n/a for some impls)

At first I was thinking a bool as I described in my prev post, but the drawback is that client would be forced to step entry by entry in order to close the query (as she checks each Result.Entry.Key to see if iteration has proceeded sufficiently far). That would mean the client couldn't use Query.Rest() since iteration would go on and on.

So If we go with adding the IterationPrefix field (above), the behavior augmentation is clear and well-defined. Impl changes are also trivial and unambiguous.

If y'all like this, give me the green light and I'll start submitting the changes. I'm OCD when it comes to naming and comments, so don't be shy for any name, style, or comment edits -- I get it. :)

drew-512 avatar Feb 04 '19 15:02 drew-512

Prefix/IterationPrefix don't really seem intuitive, how about Prefix/StartPrefix:

type Query struct {
  Prefix            string
  StartPrefix     string
[...]
}

func Query[...]
  if q.StartPrefix == "" {
    q.StartPrefix = q.Prefix
  }
  if !strings.HasPrefix(q.StartPrefix, q.Prefix) {
    return errors.New("StartPrefix not within Prefix")
  }
  [...]

  it.Seek(q.StartPrefix)
  for [...] it.ValidForPrefix(q.Prefix)

magik6k avatar Feb 04 '19 16:02 magik6k

Love it -- great suggestion!

drew-512 avatar Feb 04 '19 16:02 drew-512

SeekPrefix or StartPrefix? SeekPrefix may more accurately describe the behavior for impls that literally have seek (vs prefix) behavior.

drew-512 avatar Feb 04 '19 16:02 drew-512

Yeah, SeekPrefix makes more sense

magik6k avatar Feb 04 '19 17:02 magik6k

i'm definitely not against this—i think it could be quite useful. i think we'll need to update datastores that don't natively support this to provide it or at least heavily document its absence.

bigs avatar Feb 04 '19 18:02 bigs

How about:

type Query struct {
	Prefix            string   // namespaces the query to results whose keys have Prefix
	SeekPrefix        string   // if set, an alternate to Prefix that initially positions the query (n/a for some impls)
...

drew-512 avatar Feb 04 '19 18:02 drew-512

Ok, so I made the discussed changes to leveldb, badger, and bolt -- changes were clean, cost-free, were only a couple lines, and have been tested. Meanwhile, I'm not familiar with S3 to know the obvious path, so I was going to pass on S3 for now. If y'all want, I can have go-ds-S3/s3.go::Query() return an unimp error if Query.SeekPrefix is set -- just let me know.

drew-512 avatar Feb 06 '19 16:02 drew-512

Feel free to open the PRs, it's easier to discuss changes there. (https://github.com/whyrusleeping/sql-datastore/ will also need support)

magik6k avatar Feb 06 '19 16:02 magik6k

ok great

drew-512 avatar Feb 06 '19 16:02 drew-512

Can you link the PRs here? (makes it easier for others to follow)

magik6k avatar Feb 06 '19 17:02 magik6k

Oops, yeah good idea. :D

4 PRs submitted. Will do SQL once we're past this step if that werks.

The 3 ds impl changes will of course get compile fails until the Query.SeekPrefix change is merged.

ds.Query: https://github.com/ipfs/go-datastore/pull/119

badger: https://github.com/ipfs/go-ds-badger/pull/47

leveldb: https://github.com/ipfs/go-ds-leveldb/pull/26

bolt: (I also added support for Query.Prefix) https://github.com/ipfs/go-ds-bolt/pull/1

drew-512 avatar Feb 06 '19 17:02 drew-512

Note: S3 calls this "StartAfter" (but SeekPrefix is fine with me).

Stebalien avatar Feb 06 '19 18:02 Stebalien

Question: How should this interact with sorting by value?

There are two reasonable solutions I can think of:

  1. Apply after sorting. This is useful for resuming query.
  2. Apply before sorting. This is useful for, e.g., scanning a range of dates. Maybe we even want to replace this field with a range (Start, Stop)? This is also likely easier to implement everywhere.

Thoughts?

Stebalien avatar Feb 06 '19 23:02 Stebalien

if we want to stay in line with leveldb and descendants, it should happen after the sort. it feels like the semantics of the alternative would almost be undefined

On Wed, Feb 6, 2019 at 18:40 Steven Allen [email protected] wrote:

Question: How should this interact with sorting by value?

There are two reasonable solutions I can think of:

  1. Apply after sorting. This is useful for resuming query.
  2. Apply before sorting. This is useful for, e.g., scanning a range of dates. Maybe we even want to replace this field with a range (Start, Stop)? This is also likely easier to implement everywhere.

Thoughts?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ipfs/go-datastore/issues/116#issuecomment-461232579, or mute the thread https://github.com/notifications/unsubscribe-auth/AANBWgXTFd3FTp8lT_8BHrSsA7iTTc5eks5vK2gJgaJpZM4agZRr .

bigs avatar Feb 07 '19 00:02 bigs

Liking that Start Stop idea, and it maps well to most impls (tho I can't speak for S3). Steb, as you raised in the other thread, there's the bookkeeping factor w/ ascending vs descending order thing, so that's the main PITA factor I see. With a Stop though, that means there'd now be another reason for a query to stop (which is useful b/c even w/ the SeekPrefix I'm using for now, for example, I have to check each result key to see if it's time to stop). So I'm seeing that Start/Stop is a nice alternative to Prefix (using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).

So the bottom line seems to be that Start+Stop seems to be a wealthier alternative to Prefix, but having both may be over-defined-ish.

I'd happily help support a switch to Start+Stop if that's what others would be into. I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol.

drew-512 avatar Feb 07 '19 15:02 drew-512

So I'm seeing that Start/Stop is a nice alternative to Prefix

Yes.

(using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).

Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).

I already have test code that tests our ds stuff (independent of the impl used), so I'd be up to make common_test.go that would run core tests for bolt, badger, SQL, leveldb, and other impls that support the whole shootin match. I was surprised it didn't already exist, but who wants to do that -- I get it. Literally the opposite of feeling productive lol.

We have a shared test suite in the test directory of this package (the entrypoint is SubtestAll). However, we're not using it everywhere and it's far from complete. See https://github.com/ipfs/go-ds-leveldb/blob/d9761ea1cccd53b8a27a50c1e6a2c4e975567699/ds_test.go#L325-L329 for an example.

But yeah, a complete shared test suite would be amazing.

I'd happily help support a switch to Start+Stop if that's what others would be into.

Thoughts @magik6k, @bigs? TL;DR: "seeking" implies seeking through some ordered set while a range is just a range of keys. IMO, this better matches user needs and should be easier to implement correctly.

Stebalien avatar Feb 12 '19 19:02 Stebalien

yeah, i'm into it. we could even implement a member method, Prefix(string) that configures this for you. this definitely increases the power of datastores.

bigs avatar Feb 12 '19 23:02 bigs

(using Prefix is equivalent to using the same key for Start and Stop if I'm seeing things right).

Almost. Stop would have to be Start+1. E.g., /a -> /b (non-inclusive).

Well, my thought was to define the Stop key as inclusive, which of course means either <= or >=, depending on ascending/descending keys. This would appear to fit in everywhere. So maybe instead of Start/Stop, it's Start/Last, First/Last, or Seek/Last?

drew-512 avatar Feb 13 '19 15:02 drew-512

i think the [Start, Stop) are a bit more reasonable, even if they require a little extra legwork for the Prefix case. that's not a particularly strong feeling, however, so feel free to disregard if you feel the implementation benefits. for your semantics, [Start, End] may suit as names

bigs avatar Feb 13 '19 15:02 bigs

which of course means either <= or >=, depending on ascending/descending keys.

With start/stop, I wouldn't take order into account (we also support things like ordering by value). That is, start/stop mean "all keys between start and stop" where start <= stop. That's really the main reason I prefer start/stop over seek.

Well, my thought was to define the Stop key as inclusive,

Stop inclusive can't mimic Prefix. For example, [/a, /a/zzz] doesn't include /a/zzzz.


Note: We could also just use a custom range type:

type Bound {
    Value string
    Exclusive bool
}

func (b Bound) Open() bool {
    return b == Bound{} // zero value is "open"
}

func (b Bound) Before(key string) bool {
    if b.Exclusive {
        return bound < key
    } else if key == "" {
        return true
    } else {
        return bound <= key
    }
}

func (b Bound) After(key string) bool {
    if b.Exclusive {
        return key < bound
    } else if key == "" {
        return true // open range
    } else {
        return key <= bound
    }
}

type Range struct {
    Start, Stop Bound
}

func (r *Range) Contains(key string) bool {
    return r.Start.Before(key) && r.Stop.After(key)
}

yeah, i'm into it. we could even implement a member method, Prefix(string) that configures this for you. this definitely increases the power of datastores.

You mean get rid of Query.Prefix?

Stebalien avatar Feb 13 '19 18:02 Stebalien

my thinking was that if Start and Stop were still performing prefix matching, ["/a", "/a"] would be equivalent to {Prefix: "/a"}. i do agree, however, that exclusive upper bounds are, perhaps, more intuitive.

re: removing Prefix, i think i misunderstood earlier the intent of this change. i thought we were replacing it with stop/start functionality. i don't know how the two would interact.

bigs avatar Feb 13 '19 19:02 bigs

my thinking was that if Start and Stop were still performing prefix matching, ["/a", "/a"] would be equivalent to {Prefix: "/a"}. i do agree, however, that exclusive upper bounds are, perhaps, more intuitive.

Got it. Yeah, that would also work (although I mildly prefer ranges).

re: removing Prefix, i think i misunderstood earlier the intent of this change. i thought we were replacing it with stop/start functionality. i don't know how the two would interact.

It just feels like an unnecessary breaking change (although we could deprecate it). With prefix and start/stop, I assume you'd turn the prefix into a range and then take the intersection. We can even add helper functions for this.

Stebalien avatar Feb 13 '19 19:02 Stebalien

Range code: https://gist.github.com/66d63607ee56de9192025c8606f57e3b

(needs tests).

Really, this is starting to make me like my ranges less... (but at least that code only has to be written once).

Stebalien avatar Feb 13 '19 20:02 Stebalien

Is there anything notable blocking from this landing in master? Would be great to use proper ranges, and if possible, let the underlying storage backend optimize range queries.

jsimnz avatar Nov 14 '21 10:11 jsimnz

It needs to be implemented (correctly) in badger, flatfs, s3, etc. and that's a fair amount of work, especially because it interacts with other parts of the query system.

Stebalien avatar Nov 15 '21 18:11 Stebalien

Would it make sense to scope this to another interface, something like RangeDatastore or something.

Implemented as:

type RangeDatastore interface {
   Datastore
   QueryRange(ctx context.Context, q query.Range)
}

Implementation based on your range code gist

This way it can be implemented by only those datastore implementations that actually support this kind of Query effectiently, and not worry about the weird edge case of implementing range queries on those that don't make sense to.

jsimnz avatar Nov 16 '21 03:11 jsimnz

Unfortunately, no. Go has some pretty big holes in its composition story until we get generics. We'd:

  1. Have to implement this for, e.g., the "Mount" datastore.
  2. We'd then need to deal with some sub-datastores potentially not implementing it.

To be clear, I'd very much like this feature, someone just needs to submit patches to implement and test it.

I'd also be open to removing some features of our query system that we frankly don't use to make this more approachable.

  1. The ability to sort by anything but the key. No datastore supports this efficiently.
  2. The ability to seek to some offset by index. This feature isn't currently used or even all that usable and would be made redundant by the proposed feature.

Together, these two "features" are responsible for most of the complexity in our query system, and are, as far as I know, completely unused.

Are you up for doing that?

Stebalien avatar Nov 16 '21 18:11 Stebalien