| Possible issue |
✅ Correct inverted inner product metric
Suggestion Impact:The commit updated InnerProduct to return the negated dot product for both float32 and float64 cases, matching the suggestion. It also refactored SphericalDistance to use vek dot products instead of gonum, but the key impact is the sign change in InnerProduct.
code diff:
@@ -90,13 +88,13 @@
_v1 := any(v1).([]float32)
_v2 := any(v2).([]float32)
- return T(vek32.Dot(_v1, _v2)), nil
-
- case []float64:
- _v1 := any(v1).([]float64)
- _v2 := any(v2).([]float64)
-
- return T(vek.Dot(_v1, _v2)), nil
+ return T(-vek32.Dot(_v1, _v2)), nil
+
+ case []float64:
+ _v1 := any(v1).([]float64)
+ _v2 := any(v2).([]float64)
+
+ return T(-vek.Dot(_v1, _v2)), nil
default:
return 0, moerr.NewInternalErrorNoCtx("InnerProduct type not supported")
Correct the InnerProduct function to return the negative dot product. The current implementation returns a positive value, which inverts the logic of the distance metric.
pkg/vectorindex/metric/distance_func.go [87-104]
func InnerProduct[T types.RealNumbers](v1, v2 []T) (T, error) {
switch any(v1).(type) {
case []float32:
_v1 := any(v1).([]float32)
_v2 := any(v2).([]float32)
- return T(vek32.Dot(_v1, _v2)), nil
+ return T(-vek32.Dot(_v1, _v2)), nil
case []float64:
_v1 := any(v1).([]float64)
_v2 := any(v2).([]float64)
- return T(vek.Dot(_v1, _v2)), nil
+ return T(-vek.Dot(_v1, _v2)), nil
default:
return 0, moerr.NewInternalErrorNoCtx("InnerProduct type not supported")
}
}
[Suggestion processed]
Suggestion importance[1-10]: 10
__
Why: The suggestion correctly identifies a critical logic error where the InnerProduct distance metric was inverted. This would cause vector searches to return the most dissimilar vectors instead of the most similar, fundamentally breaking the feature.
| High
|
Handle zero vectors in cosine distance
In CosineDistance, check if the result of vek.CosineSimilarity is NaN. If it is, return an error to handle cases with zero-magnitude vectors, restoring the previous error-handling behavior.
pkg/vectorindex/metric/distance_func.go [106-124]
func CosineDistance[T types.RealNumbers](v1, v2 []T) (T, error) {
switch any(v1).(type) {
case []float32:
_v1 := any(v1).([]float32)
_v2 := any(v2).([]float32)
- return T(1 - vek32.CosineSimilarity(_v1, _v2)), nil
+ similarity := vek32.CosineSimilarity(_v1, _v2)
+ if math.IsNaN(float64(similarity)) {
+ return 0, moerr.NewInternalErrorNoCtx("cannot compute cosine similarity with zero vector")
+ }
+ return T(1 - similarity), nil
case []float64:
_v1 := any(v1).([]float64)
_v2 := any(v2).([]float64)
- return T(1 - vek.CosineSimilarity(_v1, _v2)), nil
+ similarity := vek.CosineSimilarity(_v1, _v2)
+ if math.IsNaN(similarity) {
+ return 0, moerr.NewInternalErrorNoCtx("cannot compute cosine similarity with zero vector")
+ }
+ return T(1 - similarity), nil
default:
return 0, moerr.NewInternalErrorNoCtx("CosineDistance type not supported")
}
}
- [ ] Apply / Chat <!-- /improve --apply_suggestion=2 -->
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly points out that a check for zero-magnitude vectors was removed, which can lead to NaN results from vek.CosineSimilarity. Reintroducing an error for this case prevents silent failures and improves the function's robustness.
| Medium
|
| High-level |
✅ The PR introduces a new dependency
Suggestion Impact:The commit removed the 'vek' dependency from go.mod and refactored code to stop using vek/vek32, replacing it with internal moarray implementations. This aligns with the suggestion to avoid adding/keeping the new dependency due to maintenance concerns.
code diff:
# File: go.mod
@@ -92,7 +92,6 @@
github.com/tidwall/pretty v1.2.1
github.com/tmc/langchaingo v0.1.13
github.com/unum-cloud/usearch/golang v0.0.0-20251130095425-a2f175991017
- github.com/viterin/vek v0.4.3
go.starlark.net v0.0.0-20250701195324-d457b4515e0e
go.uber.org/automaxprocs v1.5.3
go.uber.org/ratelimit v0.2.0
@@ -134,7 +133,6 @@
github.com/bytedance/gopkg v0.1.3 // indirect
github.com/bytedance/sonic/loader v0.3.0 // indirect
github.com/cespare/xxhash v1.1.0 // indirect
- github.com/chewxy/math32 v1.10.1 // indirect
github.com/cilium/ebpf v0.9.1 // indirect
github.com/clbanning/mxj v1.8.4 // indirect
github.com/cloudwego/base64x v0.1.6 // indirect
@@ -218,7 +216,6 @@
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
github.com/valyala/fastrand v1.1.0 // indirect
github.com/valyala/histogram v1.2.0 // indirect
- github.com/viterin/partial v1.1.0 // indirect
github.com/yusufpapurcu/wmi v1.2.3 // indirect
go.opentelemetry.io/proto/otlp v1.3.1 // indirect
go.uber.org/atomic v1.11.0 // indirect
# File: pkg/vectorindex/metric/distance_func.go
@@ -19,108 +19,31 @@
"github.com/matrixorigin/matrixone/pkg/common/moerr"
"github.com/matrixorigin/matrixone/pkg/container/types"
- "github.com/viterin/vek"
- "github.com/viterin/vek/vek32"
- "gonum.org/v1/gonum/blas/blas32"
- "gonum.org/v1/gonum/blas/blas64"
+ "github.com/matrixorigin/matrixone/pkg/vectorize/moarray"
)
func L2Distance[T types.RealNumbers](v1, v2 []T) (T, error) {
- switch any(v1).(type) {
- case []float32:
- _v1 := any(v1).([]float32)
- _v2 := any(v2).([]float32)
-
- return T(vek32.Distance(_v1, _v2)), nil
-
- case []float64:
- _v1 := any(v1).([]float64)
- _v2 := any(v2).([]float64)
-
- return T(vek.Distance(_v1, _v2)), nil
-
- default:
- return 0, moerr.NewInternalErrorNoCtx("L2Distance type not supported")
- }
+ return moarray.L2Distance(v1, v2)
}
func L2DistanceSq[T types.RealNumbers](v1, v2 []T) (T, error) {
- switch any(v1).(type) {
- case []float32:
- _v1 := any(v1).([]float32)
- _v2 := any(v2).([]float32)
- dist := vek32.Distance(_v1, _v2)
-
- return T(dist * dist), nil
-
- case []float64:
- _v1 := any(v1).([]float64)
- _v2 := any(v2).([]float64)
- dist := vek.Distance(_v1, _v2)
-
- return T(dist * dist), nil
-
- default:
- return 0, moerr.NewInternalErrorNoCtx("L2DistanceSq type not supported")
- }
+ return moarray.L2DistanceSq(v1, v2)
}
func L1Distance[T types.RealNumbers](v1, v2 []T) (T, error) {
- switch any(v1).(type) {
- case []float32:
- _v1 := any(v1).([]float32)
- _v2 := any(v2).([]float32)
-
- return T(vek32.ManhattanDistance(_v1, _v2)), nil
-
- case []float64:
- _v1 := any(v1).([]float64)
- _v2 := any(v2).([]float64)
-
- return T(vek.ManhattanDistance(_v1, _v2)), nil
-
- default:
- return 0, moerr.NewInternalErrorNoCtx("L1Distance type not supported")
- }
+ return moarray.L1Distance(v1, v2)
}
func InnerProduct[T types.RealNumbers](v1, v2 []T) (T, error) {
- switch any(v1).(type) {
- case []float32:
- _v1 := any(v1).([]float32)
- _v2 := any(v2).([]float32)
-
- return T(vek32.Dot(_v1, _v2)), nil
-
- case []float64:
- _v1 := any(v1).([]float64)
- _v2 := any(v2).([]float64)
-
- return T(vek.Dot(_v1, _v2)), nil
-
- default:
- return 0, moerr.NewInternalErrorNoCtx("InnerProduct type not supported")
- }
+ return moarray.InnerProduct(v1, v2)
}
func CosineDistance[T types.RealNumbers](v1, v2 []T) (T, error) {
- switch any(v1).(type) {
- case []float32:
- _v1 := any(v1).([]float32)
- _v2 := any(v2).([]float32)
+ return moarray.CosineDistance(v1, v2)
+}
- return T(1 - vek32.CosineSimilarity(_v1, _v2)), nil
-
- case []float64:
- _v1 := any(v1).([]float64)
- _v2 := any(v2).([]float64)
-
- return T(1 - vek.CosineSimilarity(_v1, _v2)), nil
-
- default:
- return 0, moerr.NewInternalErrorNoCtx("CosineDistance type not supported")
- }
-
+func CosineSimilarity[T types.RealNumbers](v1, v2 []T) (T, error) {
+ return moarray.CosineSimilarity(v1, v2)
}
// SphericalDistance is used for InnerProduct and CosineDistance in Spherical Kmeans.
@@ -132,20 +55,9 @@
// Compute the dot product of the two vectors.
// The dot product of two vectors is a measure of their similarity,
// and it can be used to calculate the angle between them.
- dp := float64(0)
-
- switch any(v1).(type) {
- case []float32:
- _v1 := blas32.Vector{N: len(v1), Inc: 1, Data: any(v1).([]float32)}
- _v2 := blas32.Vector{N: len(v2), Inc: 1, Data: any(v2).([]float32)}
- dp = float64(blas32.Dot(_v1, _v2))
-
- case []float64:
- _v1 := blas64.Vector{N: len(v1), Inc: 1, Data: any(v1).([]float64)}
- _v2 := blas64.Vector{N: len(v2), Inc: 1, Data: any(v2).([]float64)}
- dp = blas64.Dot(_v1, _v2)
- default:
- return 0, moerr.NewInternalErrorNoCtx("SphericalDistance type not supported")
+ var dp T
+ for i := range v1 {
+ dp += v1[i] * v2[i]
}
// Prevent NaN with acos with loss of precision.
@@ -155,33 +67,14 @@
dp = -1.0
}
- theta := math.Acos(dp)
+ theta := math.Acos(float64(dp))
//To scale the result to the range [0, 1], we divide by Pi.
return T(theta / math.Pi), nil
}
func NormalizeL2[T types.RealNumbers](v1 []T, normalized []T) error {
- switch any(v1).(type) {
- case []float32:
- _v1 := any(v1).([]float32)
- _normalized := any(normalized).([]float32)
-
- copy(_normalized, _v1)
- vek32.DivNumber_Inplace(_normalized, vek32.Norm(_v1))
-
- case []float64:
- _v1 := any(v1).([]float64)
- _normalized := any(normalized).([]float64)
-
- copy(_normalized, _v1)
- vek.DivNumber_Inplace(_normalized, vek.Norm(_v1))
-
- default:
- return moerr.NewInternalErrorNoCtx("NormalizeL2 type not supported")
- }
-
- return nil
+ return moarray.NormalizeL2(v1, normalized)
}
func ScaleInPlace[T types.RealNumbers](v []T, scale T) {
@@ -259,7 +152,7 @@
var distanceFunction DistanceFunction[T]
switch metric {
case Metric_L2Distance:
- distanceFunction = L2Distance[T]
+ distanceFunction = L2DistanceSq[T]
case Metric_L2sqDistance:
distanceFunction = L2DistanceSq[T]
The suggestion advises considering the long-term maintenance burden and risk of adding the new vek dependency, which appears to be maintained by a single developer, despite its significant performance improvements.
Examples:
go.mod [95]
github.com/viterin/vek v0.4.3
pkg/vectorindex/metric/distance_func.go [22-23]
"github.com/viterin/vek"
"github.com/viterin/vek/vek32"
Solution Walkthrough:
Before:
// go.mod
// ... (no 'vek' dependency)
// pkg/vectorindex/metric/distance_func.go
import "gonum.org/v1/gonum/blas/blas32"
func L2Distance[T types.RealNumbers](v1, v2 []T) (T, error) {
// ...
diff := blas32.Vector{
Data: make([]float32, len(_v1)),
// ...
}
for i := range _v1 {
diff.Data[i] = _v1[i] - _v2[i]
}
return T(blas32.Nrm2(diff)), nil
}
After:
// go.mod
require (
// ...
github.com/viterin/vek v0.4.3
)
// pkg/vectorindex/metric/distance_func.go
import "github.com/viterin/vek/vek32"
func L2Distance[T types.RealNumbers](v1, v2 []T) (T, error) {
// ...
_v1 := any(v1).([]float32)
_v2 := any(v2).([]float32)
return T(vek32.Distance(_v1, _v2)), nil
}
Suggestion importance[1-10]: 6
__
Why: The suggestion raises a valid strategic concern about adding a new dependency and its long-term maintenance risk, which is an important consideration for the project's health, even if it's not a code defect.
| Low
|