kNN Query Limit Error: "k value in knn query too large, provided X and the limit is 4096"
When performing k-Nearest Neighbors (kNN) queries using the sqlite-vec extension, queries with a LIMIT or k value greater than the configured maximum (vec_max_k) result in an OperationalError. The default maximum k value is 4096, which can cause issues when trying to retrieve a larger number of results. Is there a possibility to extend the limit?
I also ran into this issue in the latest version (Python 0.1.6), and I'm surprised that a kNN query needs a hard limit. I think the maximum should be the number of rows in the table, so we can retrieve every candidate regardless of the score returned.
Hello! So that 4096 limit was added to mitigate any possible denial of service attacks, because the results of KNN queries are stored in memory.
I wanted to avoid an attacker adding a k = 99999999 clause to a query and exhausting all the memory of an application.
That being said, making this a configurable setting makes a lot of sense, so I'll take a look at including one. Probably one that can either 1) increase the limit to a new N value, or 2) remove the limit entirely. But this would be an opt-in per-table flag, since I want the default to always be safe.
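For anyone who passes a user-supplied k into a query in the meantime, here is a minimal sketch of an application-side guard that follows the same reasoning as the built-in limit. The names `DEFAULT_MAX_K` and `validate_k` are hypothetical and are not part of sqlite-vec; the only value taken from this thread is the 4096 default.

```python
# Hypothetical application-side guard; not part of the sqlite-vec API.
DEFAULT_MAX_K = 4096  # the default limit quoted in the error message


def validate_k(requested, max_k=DEFAULT_MAX_K):
    """Return `requested` as an int, or raise if it is outside 1..max_k."""
    k = int(requested)
    if k < 1:
        raise ValueError(f"k must be at least 1, got {k}")
    if k > max_k:
        raise ValueError(f"k={k} exceeds the allowed maximum of {max_k}")
    return k
```

Rejecting an oversized k before the query runs gives the caller a clear error instead of an OperationalError from the database layer, and `max_k` can be raised per call site if a larger limit ever becomes available.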
+1 for making this configurable. I am migrating a vector search PoC from usearch to sqlite-vec and was also caught by this limit 😭 Fortunately, it's not the size of any buffer, etc., so I can easily compile the lib with an increased limit and try it out in practice.
vec0 works blazingly fast!
Hi, @asg017, excellent work with this extension!
Are there plans for making this configurable in the near future?
Best, Paweł
Just curious what k value are you trying to use? Will likely temporarily bump it to 16,384 in the next release, and add a proper configurable setting soon after
Also, for folks who find this thread and need no limit: keep in mind that sqlite-vec internally uses an O(n^2) algorithm on the value of k, so even if you could use a larger number, it would likely be slow. Consider instead storing vectors "manually" outside of a vec0 virtual table, in which case you can use whatever LIMIT value you wish.
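A rough sketch of what "manual" storage could look like, assuming vectors are kept as float32 BLOBs in an ordinary table and ranked by a full scan. To stay self-contained it registers a hypothetical Python function `l2_distance` instead of loading the extension; with sqlite-vec loaded, a scalar distance function such as `vec_distance_L2` could presumably take its place. The table name `items` and the helpers `pack` and `brute_force_knn` are made up for illustration.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a compact float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def l2_distance(a, b):
    """Euclidean distance between two float32 BLOBs (hypothetical helper)."""
    n = len(a) // 4
    va = struct.unpack(f"{n}f", a)
    vb = struct.unpack(f"{n}f", b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def brute_force_knn(db, query, limit):
    """Full-scan nearest-neighbor search; LIMIT can be any value."""
    return db.execute(
        "SELECT id, l2_distance(embedding, ?) AS distance "
        "FROM items ORDER BY distance LIMIT ?",
        (pack(query), limit),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.create_function("l2_distance", 2, l2_distance)
# A plain table, not a vec0 virtual table, so the k limit does not apply.
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, embedding BLOB)")
db.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, pack([float(i), 0.0])) for i in range(5000)],
)

# Ask for more than 4096 results in a single query.
rows = brute_force_knn(db, [0.0, 0.0], 5000)
```

This trades the vec0 query path for a full table scan plus sort, so it is only a starting point for workloads that really need more results than the limit allows.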