
Remove `sort`, `sort_reversed`, and `seed` from the protocol

Open dglazkov opened this issue 3 years ago • 6 comments

Instead of having a special kind of sort, consider sending a random vector:

```python
import polymath

import numpy as np

library = polymath.Library(filename="libraries/wdl-library.json")
query_vector = np.random.rand(1536)
query = polymath.Library.base64_from_vector(query_vector)
result = library.query(
    version=1,
    query_embedding=query,
    count=1000,
    query_embedding_model="openai.com:text-embedding-ada-002",
    sort="similarity")
print('random test\n\n')
for bit in result.bits:
    print(bit.text)
```

dglazkov avatar Jan 30 '23 04:01 dglazkov

Here's my intuition: I don't think we're going to use these. Let's just remove them?

@jkomoros WDYT?

dglazkov avatar Feb 04 '23 21:02 dglazkov

Maybe add them back as we find use cases.

dglazkov avatar Feb 04 '23 21:02 dglazkov

Yeah, I agree we can remove them. sort_reversed doesn't actually have a use case; it's a thing that might be useful in the future if there's a sort where you want a reverse order (but that's hypothetical).

In the future we might want to support other kinds of sorts, like similarity-per-token, but we can cross that bridge when we come to it.

There is a similar distinction between "pick a random point in the embedding space and return all of the bits closest to it" and "give me a totally random collection of bits". The latter is useful in theory for things like "What are some concepts in this library?" But it's also useful for scraping content from the library (see #94) by continually fetching random bits. And anyway, we can just make it "if you don't pass a query_embedding, we return a random set of results up to the content length limit".
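To make the distinction concrete, here is a minimal sketch using only the standard library. The function names and the list-of-lists toy embeddings are assumptions for illustration, not polymath's API: the first function ranks bits by cosine similarity to a random query point, the second ignores the embeddings entirely.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_to_random_point(embeddings, count, rng):
    """Pick a random point in the embedding space and return the
    indices of the bits closest to it (hypothetical helper)."""
    dim = len(embeddings[0])
    query = [rng.random() for _ in range(dim)]
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_similarity(embeddings[i], query),
        reverse=True,  # most similar first
    )
    return ranked[:count]


def random_collection(embeddings, count, rng):
    """Return indices of a uniformly random set of bits, without
    looking at the embeddings at all (hypothetical helper)."""
    return rng.sample(range(len(embeddings)), min(count, len(embeddings)))


rng = random.Random(0)
# 20 toy bits with 8-dimensional embeddings; real ones would be 1536-dimensional.
toy_embeddings = [[rng.random() for _ in range(8)] for _ in range(20)]
clustered = nearest_to_random_point(toy_embeddings, 5, rng)
scattered = random_collection(toy_embeddings, 5, rng)
```

The first result is a neighborhood (bits that resemble each other), while the second is spread across the whole library, which is why only the second suits the "fetch random bits repeatedly" case.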

The seed is only useful if you do want a random sort (it's not obvious that we do; the random embedding use case is probably a better way of doing that) and you want a way to express "don't cache the request I just did, give me a different random sort" by passing a different seed.
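As a rough illustration of what the seed buys you, assuming a random sort is implemented as a seeded shuffle (the `random_sort` helper below is hypothetical, not the library's code): the same seed reproduces the same order, so a cached response stays valid, and a new seed asks for a fresh order.

```python
import random


def random_sort(bit_ids, seed):
    """Return bit_ids in a pseudo-random order determined only by seed."""
    rng = random.Random(seed)  # private generator; global state is untouched
    shuffled = list(bit_ids)   # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return shuffled


bit_ids = list(range(100))
first = random_sort(bit_ids, seed=1)
repeat = random_sort(bit_ids, seed=1)   # same seed: same order, cacheable
fresh = random_sort(bit_ids, seed=2)    # different seed: a different order
```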

I imagine in the future that there will be use cases for different sorts, but for now it's probably best to just rip out the machinery given that we don't need it right now, in the interest of keeping semantics simpler.

jkomoros avatar Feb 05 '23 00:02 jkomoros

@dglazkov Wanderer doesn't use sort=random anymore, does it? If not, I might just start ripping this out.

jkomoros avatar Feb 05 '23 00:02 jkomoros

Yep, it doesn't use it anymore. I am using the random vector thingy now.

dglazkov avatar Feb 05 '23 00:02 dglazkov

  • [x] Document the protocol somewhere, including all arguments and what they do; now that Library.query() uses **kwds, it's hard to tell
  • [x] Remove sort from protocol
  • [ ] Shouldn't Library.query() set sort=similarity, not _produce_query_result?
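One way the last item could look, as a sketch only: the functions below are stand-ins written for illustration (bits are plain dicts, and the real `Library.query()` and `_produce_query_result` may be shaped quite differently). The point is that the public entry point resolves the default, so the helper never has to guess.

```python
DEFAULT_SORT = "similarity"  # assumed default, per the checklist item


def _produce_query_result(bits, sort):
    """Stand-in helper: expects `sort` to be resolved already."""
    if sort != "similarity":
        raise ValueError(f"unsupported sort: {sort!r}")
    # Highest similarity first.
    return sorted(bits, key=lambda bit: bit["similarity"], reverse=True)


def query(bits, **kwds):
    """Stand-in for Library.query(): applies the default in one place."""
    sort = kwds.get("sort", DEFAULT_SORT)
    return _produce_query_result(bits, sort)


toy_bits = [{"text": "a", "similarity": 0.1}, {"text": "b", "similarity": 0.9}]
result = query(toy_bits)
```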

jkomoros avatar Feb 05 '23 13:02 jkomoros