[Feature] Flexible Vector Dimension Infrastructure for Future Experiments
Summary
Migrate from hardcoded 2048-dim vectors to a configurable infrastructure that supports:
- Runtime configuration of vector dimensions
- Multiple concurrent indexes for A/B testing
- Gradual traffic routing between indexes
- Future experiments (256-dim, 512-dim, etc.) without code changes
Motivation
Currently, vector dimension is hardcoded across the codebase:
- `src/yuhgettintru/embeddings.py:23`: `DEFAULT_DIMENSION = 2048`
- `src/yuhgettintru/search/__init__.py`: hardcoded index name
- `scripts/backfill_embeddings.py`: no dimension flag
- `scripts/create_vector_index.py`: hardcoded dimension
This prevents:
- Running experiments with different dimensions (256, 512, 1024)
- Gradual cutovers with traffic splitting
- Safe rollback if a dimension change degrades quality
Background
Voyage AI's voyage-3.5 supports output dimensions of 2048, 1024, 512, and 256 via Matryoshka learning, with minimal quality loss at the smaller sizes.
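To make the Matryoshka idea concrete: a shorter embedding is taken from the leading components of a longer one and then re-normalized. The sketch below is illustrative only. In practice the dimension would normally be requested from the embedding API rather than truncated client-side, and `truncate_embedding` is a hypothetical helper, not part of any Voyage SDK.

```python
import math


def truncate_embedding(vector: list[float], dimension: int) -> list[float]:
    """Keep the first `dimension` components and rescale to unit length."""
    if not 0 < dimension <= len(vector):
        raise ValueError(f"dimension must be in 1..{len(vector)}, got {dimension}")
    head = vector[:dimension]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be rescaled
    return [x / norm for x in head]
```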
Current Architecture
Product Model → VoyageEmbeddingGenerator (2048-dim) → MongoDB `embedding_voyage` field
↓
Search Service → $vectorSearch on "vector_index_voyage" (2048-dim)
Proposed Architecture
Config.yaml → VoyageEmbeddingGenerator (configurable dim) → MongoDB
↓
Search Service → $vectorSearch
(configurable index + traffic routing)
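As a rough illustration of the "configurable index" part, the search stage could be built from the configured index name instead of a literal. The stage keys (`index`, `path`, `queryVector`, `numCandidates`, `limit`) follow the Atlas `$vectorSearch` stage; the function name and default values are assumptions made for this sketch.

```python
def vector_search_stage(
    index_name: str,
    query_vector: list[float],
    limit: int = 10,
    num_candidates: int = 100,  # assumed default, tune per workload
    path: str = "embedding_voyage",
) -> dict:
    """Build a $vectorSearch aggregation stage for the given index."""
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return {
        "$vectorSearch": {
            "index": index_name,
            "path": path,
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }
```

The resulting dict would be the first stage of the aggregation pipeline passed to the collection.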
Implementation Plan
Phase 1: Infrastructure (Flexible Foundation)
1.1 Add Vector Configuration to config.yaml
vector:
  dimension: 1024                        # Default embedding dimension
  index_name: vector_index_voyage_1024   # Current active index
  routing:
    enabled: false                       # Enable when running experiments
    weights:
      vector_index_voyage_2048: 1.0      # Legacy index
      vector_index_voyage_1024: 0.0      # New index
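A minimal sketch of how the `vector` section might be parsed and validated once the YAML has been loaded into a dict (YAML loading itself is omitted). The class names, the `parse_vector_config` helper, and the rule that enabled weights must sum to 1.0 are assumptions for illustration, not part of the existing codebase.

```python
from dataclasses import dataclass, field

SUPPORTED_DIMENSIONS = (256, 512, 1024, 2048)


@dataclass(frozen=True)
class RoutingConfig:
    enabled: bool = False
    weights: dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class VectorConfig:
    dimension: int
    index_name: str
    routing: RoutingConfig = field(default_factory=RoutingConfig)


def parse_vector_config(raw: dict) -> VectorConfig:
    """Validate the `vector` section of an already-parsed config dict."""
    section = raw["vector"]
    routing_raw = section.get("routing") or {}
    routing = RoutingConfig(
        enabled=bool(routing_raw.get("enabled", False)),
        weights=dict(routing_raw.get("weights") or {}),
    )
    config = VectorConfig(int(section["dimension"]), str(section["index_name"]), routing)
    if config.dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {config.dimension}")
    if routing.enabled:
        if any(weight < 0 for weight in routing.weights.values()):
            raise ValueError("routing weights must be non-negative")
        # Assumed rule: weights of an enabled routing block sum to 1.0.
        if abs(sum(routing.weights.values()) - 1.0) > 1e-6:
            raise ValueError("routing weights must sum to 1.0")
    return config
```

Failing fast at load time would surface a mistyped weight or dimension before any search traffic is affected.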
1.2 Update Embedding Generator to Read from Config
File: src/yuhgettintru/embeddings.py
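# Before
DEFAULT_DIMENSION = 2048

# After
class VoyageEmbeddingGenerator:
    def __init__(self, config, output_dimension: int | None = None):
        # An explicit argument wins; otherwise fall back to the configured dimension.
        self.output_dimension = (
            output_dimension if output_dimension is not None else config.vector.dimension
        )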
1.3 Update Search Service for Traffic Splitting
File: src/yuhgettintru/search/__init__.py
def build_vector_search_pipeline(self, query_embedding, config):
    indexes = []
    if config.vector.routing.enabled:
        # Query multiple indexes with weights
        for idx_name, weight in config.vector.routing.weights.items():
            if weight > 0:
                indexes.append((idx_name, weight))
    else:
        indexes = [(config.vector.index_name, 1.0)]
    # Merge results using weighted RRF
    return self._weighted_hybrid_search(query_embedding, indexes, config)
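`_weighted_hybrid_search` is not shown in this issue. As a generic sketch of the weighted reciprocal rank fusion (RRF) step it refers to, each document could score `weight / (k + rank)` per index, summed across indexes. The function name and `k = 60` (a commonly used constant) are assumptions.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[tuple[float, list[str]]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists; each entry is (weight, ids ordered best first)."""
    scores: dict[str, float] = defaultdict(float)
    for weight, doc_ids in ranked_lists:
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

One design point to keep in mind: a vector index only accepts query vectors whose length matches its configured dimension, so fusing across indexes of different dimensions would presumably need one query embedding per index rather than a single shared `query_embedding`.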
1.4 Parameterize Backfill Script
File: scripts/backfill_embeddings.py
# New usage
python scripts/backfill_embeddings.py \
    --dimension 1024 \
    --index vector_index_voyage_1024 \
    --store costuless
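The flags above could be declared with `argparse` roughly as follows. This is a sketch: making `--dimension` and `--index` required (rather than falling back to config), the `--limit` flag taken from the experiment workflow, and the `build_parser` name are assumptions.

```python
import argparse

SUPPORTED_DIMENSIONS = (256, 512, 1024, 2048)


def build_parser() -> argparse.ArgumentParser:
    """Argument parser for a parameterized backfill script."""
    parser = argparse.ArgumentParser(description="Backfill product embeddings.")
    parser.add_argument("--dimension", type=int, choices=SUPPORTED_DIMENSIONS,
                        required=True, help="Embedding dimension to generate")
    parser.add_argument("--index", required=True,
                        help="Name of the vector index the embeddings target")
    parser.add_argument("--store", default=None,
                        help="Restrict the backfill to one store")
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of products to process")
    return parser
```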
1.5 Parameterize Index Creation Script
File: scripts/create_vector_index.py
# New usage
python scripts/create_vector_index.py \
    --name vector_index_voyage_1024 \
    --dimension 1024 \
    --path embedding_voyage
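For reference, the index definition such a script would build from `--dimension` and `--path` might look like the sketch below. The `fields` layout follows the Atlas Vector Search index definition format; the `cosine` default and the helper name are assumptions, and quantization settings are left out.

```python
ALLOWED_SIMILARITIES = ("cosine", "dotProduct", "euclidean")


def vector_index_definition(path: str, dimension: int, similarity: str = "cosine") -> dict:
    """Build a vector search index definition for one embedding field."""
    if dimension <= 0:
        raise ValueError("dimension must be positive")
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimension,
                "similarity": similarity,
            }
        ]
    }
```

With a recent PyMongo, the definition could then be wrapped in a `SearchIndexModel` (with the `--name` value and `type="vectorSearch"`) and passed to `Collection.create_search_index`.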
Phase 2: Backfill (Future Step)
Not included in this issue - to be scheduled separately.
Phase 3: Gradual Cutover (Future Step)
Not included in this issue - to be scheduled separately.
Files to Modify
| File | Change | Priority |
|---|---|---|
| `.beads/config.yaml` | Add `vector` section | Required |
| `src/yuhgettintru/embeddings.py` | Read dimension from config, add config injection | Required |
| `src/yuhgettintru/search/__init__.py` | Config-driven index + weighted traffic splitting | Required |
| `src/yuhgettintru/cli.py` | Pass config to embedding generator | Required |
| `src/yuhgettintru/extract_service.py` | Use config for embedding generation | Required |
| `src/yuhgettintru/llm_enhancer.py` | Use config for embedding generation | Required |
| `src/yuhgettintru/services.py` | Inject config into services | Required |
| `scripts/backfill_embeddings.py` | Add `--dimension` and `--index` flags | Required |
| `scripts/create_vector_index.py` | Make dimension/index configurable | Required |
| `tests/conftest.py` | Provide mock config fixture | Optional |
| `tests/test_embeddings.py` | Update tests to use config | Optional |
| `tests/test_search_service.py` | Update tests for traffic splitting | Optional |
Future Experiment Workflow
Once this infrastructure is in place, running dimension experiments is straightforward:
# 1. Create new index with experimental dimension
./scripts/create_vector_index.py \
    --name vector_index_voyage_256 \
    --dimension 256 \
    --path embedding_voyage

# 2. Backfill subset of products
./scripts/backfill_embeddings.py \
    --dimension 256 \
    --index vector_index_voyage_256 \
    --limit 10000

# 3. Enable routing in config.yaml
# vector:
#   routing:
#     enabled: true
#     weights:
#       vector_index_voyage_2048: 0.8
#       vector_index_voyage_1024: 0.15
#       vector_index_voyage_256: 0.05

# 4. Monitor metrics (click-through, latency, zero-results)
# 5. Adjust weights or rollback
Configuration Options
Basic (Single Index)
vector:
  dimension: 1024
  index_name: vector_index_voyage_1024
Experiment (Multiple Indexes)
vector:
  dimension: 1024                       # For new embeddings
  index_name: vector_index_voyage_1024
  routing:
    enabled: true
    weights:
      vector_index_voyage_2048: 0.8     # 80% traffic
      vector_index_voyage_1024: 0.2     # 20% traffic
Acceptance Criteria
- [ ] `config.yaml` includes `vector` section with dimension, index_name, and optional routing
- [ ] `VoyageEmbeddingGenerator` reads dimension from config
- [ ] `ProductSearchService` uses configured index name
- [ ] Traffic splitting works (when enabled in config)
- [ ] `create_vector_index.py` accepts `--dimension` and `--name` flags
- [ ] `backfill_embeddings.py` accepts `--dimension` and `--index` flags
- [ ] All existing tests pass after refactoring
- [ ] Search quality remains unchanged (same results, same order)
Out of Scope
- Actually migrating from 2048-dim to 1024-dim (Phase 2)
- Running A/B tests or gradual cutovers (Phase 3)
- Changes to quantization strategy (currently int8, no change planned)
References
- Voyage AI Flexible Dimensions
- Voyage-3.5 Blog Post
- Current embedding implementation: `src/yuhgettintru/embeddings.py`
- Current search implementation: `src/yuhgettintru/search/__init__.py`
Metadata
| Field | Value |
|---|---|
| Priority | Medium |
| Estimated Effort | 2-3 days |
| Requires Database Migration | No (new index) |
| Requires Downtime | No |
| Risk | Low - no behavioral changes, just infrastructure |