JAVA-3118: Add support for vector data type in Schema Builder, QueryBuilder
Currently, the SchemaBuilder works with the vector type like this:
assertThat(
    createTable("foo")
        .withPartitionKey("k", DataTypes.INT)
        .withColumn("v", new DefaultVectorType(DataTypes.FLOAT, 3)))
    .hasCql("CREATE TABLE foo (k int PRIMARY KEY,v VECTOR<FLOAT, 3>)");
Or
assertThat(
    createTable("foo")
        .withPartitionKey("k", DataTypes.INT)
        .withColumn(
            "v",
            DataTypes.custom(
                "org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType,3)")))
    .hasCql("CREATE TABLE foo (k int PRIMARY KEY,v VECTOR<FLOAT, 3>)");
Please let me know if you want something like .withColumn("v", DataTypes.vector(DataTypes.FLOAT, 3)).
What's the CASSANDRA ticket for this?
I will test this downstream here:
- https://github.com/spring-projects/spring-ai/blob/main/vector-stores/spring-ai-cassandra/src/main/java/org/springframework/ai/vectorstore/CassandraVectorStoreConfig.java#L551-L568
- https://github.com/spring-projects/spring-ai/blob/main/vector-stores/spring-ai-cassandra/src/main/java/org/springframework/ai/vectorstore/CassandraVectorStoreConfig.java#L502-L512
- https://github.com/spring-projects/spring-ai/blob/main/vector-stores/spring-ai-cassandra/src/main/java/org/springframework/ai/vectorstore/CassandraVectorStore.java#L342-L345
I can't get this to compile:
[ERROR] Failed to execute goal org.revapi:revapi-maven-plugin:0.10.5:check (default) on project java-driver-query-builder: The following API problems caused the build to fail:
[ERROR] java.method.addedToInterface: method com.datastax.oss.driver.api.querybuilder.select.Select com.datastax.oss.driver.api.querybuilder.select.Select::orderBy(com.datastax.oss.driver.api.querybuilder.select.Ann): Method was added to an interface.
[ERROR]
Am I doing something wrong?
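For context on the revapi failure: adding an abstract method such as orderBy(Ann) to a public interface like Select means any external implementation of that interface would no longer compile, which is what the java.method.addedToInterface check guards against. One general Java way to add a method without breaking existing implementors is a default method, sketched below with hypothetical Query and LegacyQuery types (not the driver's classes). This thread doesn't say whether the PR took that route or instead recorded the change as accepted in the revapi configuration.

```java
// Hypothetical stand-ins; these are NOT the driver's Select or Ann types.
interface Query {
    String cql();

    // Added in a later version. Because it has a default body, classes written
    // against the older interface still compile and keep working unchanged.
    default Query orderByAnn(String column, float[] vector) {
        throw new UnsupportedOperationException(
                "orderByAnn is not supported by " + getClass().getSimpleName());
    }
}

// A third-party implementation written before orderByAnn existed.
class LegacyQuery implements Query {
    @Override
    public String cql() {
        return "SELECT * FROM foo";
    }
}

public class DefaultMethodSketch {
    public static void main(String[] args) {
        Query q = new LegacyQuery();
        System.out.println(q.cql()); // SELECT * FROM foo
        try {
            q.orderByAnn("v", new float[] {0.1f, 0.2f, 0.3f});
        } catch (UnsupportedOperationException e) {
            // prints: caught: orderByAnn is not supported by LegacyQuery
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The trade-off is that the default body has to do something sensible for implementations that never heard of the new method; throwing is the conservative choice here.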
Is there a separate ticket for vector similarity functions? https://cassandra.apache.org/doc/latest/cassandra/developing/cql/functions.html#vector-similarity-functions
One other thing worth mentioning: the Cassandra implementation also supports a way to get "the similarity calculation of the best scoring node closest to the query data as part of the results". Take a look at the similarity_dot_product() function (and the other choices as well) in the relevant Cassandra docs. The query builder should support those as well.
The revapi issue is fixed, and the vector similarity functions are already supported by the existing Function term. I added tests for them as examples: https://github.com/apache/cassandra-java-driver/blob/19148d5cb9e2e2975a4d503358e9fae8737e0fcc/query-builder/src/test/java/com/datastax/oss/driver/api/querybuilder/select/SelectSelectorTest.java#L235-L274
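As a rough illustration of the CQL shape those tests target, the sketch below hand-assembles a SELECT that returns a similarity score alongside the row and orders by ANN OF. SimilarityCqlSketch and its methods are hypothetical string helpers, not the query builder's API, and the driver's generated CQL may differ in spacing; the accepted function names are the three similarity functions listed in the Cassandra docs linked earlier in this thread.

```java
import java.util.Set;
import java.util.StringJoiner;

public class SimilarityCqlSketch {

    // Vector similarity functions listed in the Cassandra CQL docs.
    private static final Set<String> SIMILARITY_FUNCTIONS =
            Set.of("similarity_cosine", "similarity_euclidean", "similarity_dot_product");

    // Formats a CQL vector literal, e.g. [0.1, 0.15, 0.3].
    static String vectorLiteral(double[] values) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (double v : values) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }

    // Hypothetical helper: selects a column plus a similarity score, then orders by ANN.
    // This sketch always emits a LIMIT clause alongside ANN OF.
    static String similaritySelect(
            String table, String column, String function,
            String vectorColumn, double[] query, int limit) {
        if (!SIMILARITY_FUNCTIONS.contains(function)) {
            throw new IllegalArgumentException("unknown similarity function: " + function);
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        String literal = vectorLiteral(query);
        return "SELECT " + column + ", " + function + "(" + vectorColumn + ", " + literal + ")"
                + " FROM " + table
                + " ORDER BY " + vectorColumn + " ANN OF " + literal
                + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        double[] query = {0.1, 0.15, 0.3};
        // SELECT k, similarity_cosine(v, [0.1, 0.15, 0.3]) FROM foo ORDER BY v ANN OF [0.1, 0.15, 0.3] LIMIT 1
        System.out.println(similaritySelect("foo", "k", "similarity_cosine", "v", query, 1));
    }
}
```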
As for the spring-ai downstream: since we won't actually break any API, is there anything we should test, and if so, how?
Good call on the similarity_* functions @SiyaoIsHiding!
I assume @lukasz-antoniak's comment about adding getAnn() to DefaultSelect should be addressed here.
Thank you for your suggestion! It's added.
I'm not 💯 sure what the accessors on DefaultSelect are intended to do, but I suppose it makes sense to keep things consistent and add an accessor for the Ann object.
Either way, I'm satisfied with where this stands now... 👍 from me!