JAVA-3118: Add support for vector data type in Schema Builder, QueryBuilder

Open SiyaoIsHiding opened this issue 1 year ago • 4 comments

Currently, the SchemaBuilder works with vector like this:

    assertThat(
            createTable("foo")
                .withPartitionKey("k", DataTypes.INT)
                .withColumn("v", new DefaultVectorType(DataTypes.FLOAT, 3)))
        .hasCql("CREATE TABLE foo (k int PRIMARY KEY,v VECTOR<FLOAT, 3>)");

Or:

    assertThat(
            createTable("foo")
                .withPartitionKey("k", DataTypes.INT)
                .withColumn(
                    "v",
                    DataTypes.custom(
                        "org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType,3)")))
        .hasCql("CREATE TABLE foo (k int PRIMARY KEY,v VECTOR<FLOAT, 3>)");

Please let me know if you want something like .withColumn("v", DataTypes.vector(DataTypes.FLOAT, 3)).

SiyaoIsHiding, May 06 '24 19:05
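
To make the proposed shortcut concrete, here is a minimal, driver-free sketch of what a `DataTypes.vector(...)`-style helper would need to render. Everything in it (`VectorColumnSketch`, `vector`, `createTable`) is a hypothetical stand-in written for illustration, not the driver's API; it only reproduces the CQL string expected in the assertions above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical stand-in for the proposed shortcut; not driver code. */
final class VectorColumnSketch {

  /** Renders a CQL vector type from an element type name and a dimension count. */
  static String vector(String elementType, int dimensions) {
    if (dimensions <= 0) {
      throw new IllegalArgumentException("dimensions must be positive");
    }
    return "VECTOR<" + elementType.toUpperCase(Locale.ROOT) + ", " + dimensions + ">";
  }

  /** Builds a CREATE TABLE statement with one partition key plus the given columns. */
  static String createTable(
      String table, String keyName, String keyType, Map<String, String> columns) {
    StringJoiner body = new StringJoiner(",", "(", ")");
    body.add(keyName + " " + keyType + " PRIMARY KEY");
    columns.forEach((name, type) -> body.add(name + " " + type));
    return "CREATE TABLE " + table + " " + body;
  }
}
```

Calling `createTable("foo", "k", "int", Map.of("v", vector("float", 3)))` yields the same statement as the two assertions above, which is the behaviour a real shortcut would be expected to match.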

What's the CASSANDRA ticket for this?

I will test this downstream here:

  • https://github.com/spring-projects/spring-ai/blob/main/vector-stores/spring-ai-cassandra/src/main/java/org/springframework/ai/vectorstore/CassandraVectorStoreConfig.java#L551-L568
  • https://github.com/spring-projects/spring-ai/blob/main/vector-stores/spring-ai-cassandra/src/main/java/org/springframework/ai/vectorstore/CassandraVectorStoreConfig.java#L502-L512
  • https://github.com/spring-projects/spring-ai/blob/main/vector-stores/spring-ai-cassandra/src/main/java/org/springframework/ai/vectorstore/CassandraVectorStore.java#L342-L345

michaelsembwever, May 08 '24 13:05

I can't get this to compile:

    [ERROR] Failed to execute goal org.revapi:revapi-maven-plugin:0.10.5:check (default) on project java-driver-query-builder: The following API problems caused the build to fail:
    [ERROR] java.method.addedToInterface: method com.datastax.oss.driver.api.querybuilder.select.Select com.datastax.oss.driver.api.querybuilder.select.Select::orderBy(com.datastax.oss.driver.api.querybuilder.select.Ann): Method was added to an interface.
    [ERROR]

Am I doing something wrong?

michaelsembwever, Jun 11 '24 16:06
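
For context on the `java.method.addedToInterface` report: adding an abstract method to a published interface breaks every existing implementor, which is what the check guards against. The sketch below illustrates the general problem and one common mitigation, a default method. The types (`SelectSketch`, `LegacySelect`, `orderByAnn`) are simplified, hypothetical stand-ins, and this is not a claim about how the build failure was resolved in this PR; an API checker may still report a default method under a different classification, and projects often record such a change as an accepted difference in the checker's configuration instead.

```java
/** Hypothetical, simplified stand-in for a published builder interface. */
interface SelectSketch {
  String asCql();

  // Newly added method. Declared abstract, it would force every existing
  // implementor to change. As a default method, old implementors still compile.
  default SelectSketch orderByAnn(String column, float[] vector) {
    throw new UnsupportedOperationException("ANN ordering is not supported here");
  }
}

/** An implementation written before orderByAnn existed; it compiles unchanged. */
final class LegacySelect implements SelectSketch {
  @Override
  public String asCql() {
    return "SELECT * FROM foo";
  }
}
```

The trade-off is that a default which throws moves the failure from compile time to run time, so it only suits interfaces that third parties rarely implement themselves.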

Is there a separate ticket for vector similarity functions? https://cassandra.apache.org/doc/latest/cassandra/developing/cql/functions.html#vector-similarity-functions

michaelsembwever, Jun 11 '24 20:06

One other thing worth mentioning: the Cassandra implementation also supports a way to get "the similarity calculation of the best scoring node closest to the query data as part of the results". Take a look at the similarity_dot_product() function (and the other choices as well) in the relevant Cassandra docs. The query builder should support those as well.

absurdfarce, Sep 20 '24 23:09

The revapi failure is fixed, and the vector similarity functions are already supported by the existing Function term. I added tests for them as examples: https://github.com/apache/cassandra-java-driver/blob/19148d5cb9e2e2975a4d503358e9fae8737e0fcc/query-builder/src/test/java/com/datastax/oss/driver/api/querybuilder/select/SelectSelectorTest.java#L235-L274

As for the spring-ai downstream: since we won't actually break any API, is there anything we should test, and if so, how?

SiyaoIsHiding, Oct 03 '24 02:10
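
As a rough picture of the CQL that a similarity selector combined with ANN ordering maps to, here is a driver-free sketch. It assumes the `similarity_cosine(column, vector)` and `ORDER BY ... ANN OF ...` forms from the Cassandra docs linked above; the class and method names are hypothetical, and the spacing the real query builder emits may differ.

```java
import java.util.StringJoiner;

/** Hypothetical string-level sketch; not the query builder's API. */
final class AnnQuerySketch {

  /** Formats a float vector as a CQL list-style literal, e.g. [0.1, 0.2, 0.3]. */
  static String vectorLiteral(float... values) {
    StringJoiner joiner = new StringJoiner(", ", "[", "]");
    for (float value : values) {
      joiner.add(Float.toString(value));
    }
    return joiner.toString();
  }

  /** Renders a generic function call, the way a generic function term would. */
  static String function(String name, String... args) {
    return name + "(" + String.join(", ", args) + ")";
  }

  /** Selects the cosine similarity and orders rows by ANN distance to the query vector. */
  static String annQuery(String table, String column, float[] query, int limit) {
    String literal = vectorLiteral(query);
    return "SELECT " + function("similarity_cosine", column, literal)
        + " FROM " + table
        + " ORDER BY " + column + " ANN OF " + literal
        + " LIMIT " + limit;
  }
}
```

Because the similarity call is just a named function with two arguments, a generic function term is enough to express it, which matches the point made above that no dedicated API is needed for it.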

Good call on the similarity_* functions @SiyaoIsHiding!

absurdfarce, Oct 09 '24 16:10

I assume @lukasz-antoniak's comment about adding getAnn() in DefaultSelect should be addressed here. Thank you for the suggestion! It's added.

SiyaoIsHiding, Oct 10 '24 02:10

I'm not 💯 sure what the accessors on DefaultSelect are intended to do, but I suppose it makes sense to stay consistent and add an accessor for the Ann object.

Either way I'm satisfied with where this stands now... 👍 from me!

absurdfarce, Oct 10 '24 07:10