
[Feature] Flexible Vector Dimension Infrastructure for Future Experiments

Open hammertoe opened this issue 1 month ago • 0 comments

Summary

Migrate from hardcoded 2048-dim vectors to a configurable infrastructure that supports:

  • Runtime configuration of vector dimensions
  • Multiple concurrent indexes for A/B testing
  • Gradual traffic routing between indexes
  • Future experiments (256-dim, 512-dim, etc.) without code changes

Motivation

Currently, the vector dimension is hardcoded across the codebase:

  • src/yuhgettintru/embeddings.py:23 - DEFAULT_DIMENSION = 2048
  • src/yuhgettintru/search/__init__.py - hardcoded index name
  • scripts/backfill_embeddings.py - no dimension flag
  • scripts/create_vector_index.py - hardcoded dimension

This prevents:

  • Running experiments with different dimensions (256, 512, 1024)
  • Gradual cutovers with traffic splitting
  • Safe rollback if a dimension change degrades quality

Background

Voyage AI's voyage-3.5 supports output dimensions of 2048, 1024, 512, and 256 via Matryoshka learning, with minimal quality loss at the lower dimensions.
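
Because Matryoshka-style training packs the most important information into the leading components, a lower-dimensional embedding is effectively a truncated and re-normalized prefix of the full vector. A rough illustration of the idea (in practice the target dimension would be requested directly from the API rather than truncated locally):

import numpy as np

def truncate_matryoshka(vec: np.ndarray, target_dim: int) -> np.ndarray:
    """Keep the leading target_dim components and re-normalize.

    Illustration only; the embedding API can return the smaller
    dimension directly.
    """
    prefix = vec[:target_dim]
    norm = np.linalg.norm(prefix)
    return prefix / norm if norm > 0 else prefix

# e.g. derive a 512-dim view of an existing 2048-dim embedding
small = truncate_matryoshka(np.random.rand(2048), 512)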

Current Architecture

Product Model → VoyageEmbeddingGenerator (2048-dim) → MongoDB `embedding_voyage` field
                                                              ↓
Search Service → $vectorSearch on "vector_index_voyage" (2048-dim)

Proposed Architecture

Config.yaml → VoyageEmbeddingGenerator (configurable dim) → MongoDB
                                                              ↓
                                            Search Service → $vectorSearch
                                            (configurable index + traffic routing)

Implementation Plan

Phase 1: Infrastructure (Flexible Foundation)

1.1 Add Vector Configuration to config.yaml

vector:
  dimension: 1024                      # Default embedding dimension
  index_name: vector_index_voyage_1024 # Current active index
  routing:
    enabled: false                     # Enable when running experiments
    weights:
      vector_index_voyage_2048: 1.0    # Legacy index
      vector_index_voyage_1024: 0.0    # New index
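
A minimal sketch of how that section could be parsed into a typed config object; the class and function names below are illustrative, not the project's actual config loader:

from dataclasses import dataclass, field

@dataclass
class RoutingConfig:
    enabled: bool = False
    # index name -> traffic weight; weights are expected to sum to 1.0
    weights: dict[str, float] = field(default_factory=dict)

@dataclass
class VectorConfig:
    dimension: int = 1024
    index_name: str = "vector_index_voyage_1024"
    routing: RoutingConfig = field(default_factory=RoutingConfig)

def load_vector_config(raw: dict) -> VectorConfig:
    """Build VectorConfig from the parsed `vector:` section of config.yaml."""
    routing_raw = raw.get("routing", {})
    return VectorConfig(
        dimension=int(raw.get("dimension", 1024)),
        index_name=raw["index_name"],
        routing=RoutingConfig(
            enabled=bool(routing_raw.get("enabled", False)),
            weights=dict(routing_raw.get("weights", {})),
        ),
    )

Validating at load time that routing weights sum to 1.0 (or normalizing them) would catch misconfigured experiments early.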

1.2 Update Embedding Generator to Read from Config

File: src/yuhgettintru/embeddings.py

# Before
DEFAULT_DIMENSION = 2048

# After
def __init__(self, config, output_dimension: int | None = None):
    self.output_dimension = output_dimension or config.vector.dimension
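
A slightly fuller sketch of the same change, showing how the configured dimension would flow into the embedding call. The output_dimension argument to the Voyage client is an assumption about the API in use here and should be checked against the existing code in embeddings.py:

import voyageai

class VoyageEmbeddingGenerator:
    def __init__(self, config, output_dimension: int | None = None):
        # An explicit argument wins; otherwise fall back to config.yaml.
        self.output_dimension = output_dimension or config.vector.dimension
        self.client = voyageai.Client()

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Assumption: the client accepts output_dimension for voyage-3.5.
        result = self.client.embed(
            texts,
            model="voyage-3.5",
            output_dimension=self.output_dimension,
        )
        return result.embeddings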

1.3 Update Search Service for Traffic Splitting

File: src/yuhgettintru/search/__init__.py

def build_vector_search_pipeline(self, query_embedding, config):
    indexes = []
    if config.vector.routing.enabled:
        # Query multiple indexes with weights
        for idx_name, weight in config.vector.routing.weights.items():
            if weight > 0:
                indexes.append((idx_name, weight))
    else:
        indexes = [(config.vector.index_name, 1.0)]
    
    # Merge results using weighted RRF
    return self._weighted_hybrid_search(query_embedding, indexes, config)
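
The _weighted_hybrid_search helper referenced above is not spelled out in this issue; a minimal sketch of the weighted reciprocal-rank-fusion merge it implies could look like the following (self.collection, the path name, and the constants are assumptions; the $vectorSearch fields follow Atlas Vector Search conventions):

def _weighted_hybrid_search(self, query_embedding, indexes, config,
                            k: int = 60, limit: int = 10):
    """Run $vectorSearch once per index and merge results with weighted RRF."""
    scores: dict = {}  # document _id -> fused score
    docs: dict = {}    # document _id -> document
    for index_name, weight in indexes:
        pipeline = [
            {
                "$vectorSearch": {
                    "index": index_name,
                    "path": "embedding_voyage",
                    "queryVector": query_embedding,
                    "numCandidates": limit * 10,
                    "limit": limit,
                }
            }
        ]
        results = list(self.collection.aggregate(pipeline))
        for rank, doc in enumerate(results, start=1):
            doc_id = doc["_id"]
            # Weighted reciprocal rank fusion: weight / (k + rank)
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
            docs[doc_id] = doc
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs[doc_id] for doc_id in ranked[:limit]]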

1.4 Parameterize Backfill Script

File: scripts/backfill_embeddings.py

# New usage
python scripts/backfill_embeddings.py \
  --dimension 1024 \
  --index vector_index_voyage_1024 \
  --store costuless
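
The flag handling itself is small; a sketch of the argument parsing, with defaults falling back to the config values (flag names taken from the usage above, --store and --limit as already used elsewhere in this issue):

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Backfill product embeddings.")
    parser.add_argument("--dimension", type=int, default=None,
                        help="Embedding dimension; defaults to config.vector.dimension")
    parser.add_argument("--index", default=None,
                        help="Target vector index; defaults to config.vector.index_name")
    parser.add_argument("--store", default=None,
                        help="Restrict the backfill to a single store")
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of products to backfill")
    return parser.parse_args()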

1.5 Parameterize Index Creation Script

File: scripts/create_vector_index.py

# New usage
python scripts/create_vector_index.py \
  --name vector_index_voyage_1024 \
  --dimension 1024 \
  --path embedding_voyage
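
Under the hood the script would build an Atlas Vector Search index definition from those flags. One possible shape, using pymongo's SearchIndexModel (the cosine similarity setting is an assumption about the existing index):

from pymongo.operations import SearchIndexModel

def create_vector_index(collection, name: str, dimension: int,
                        path: str = "embedding_voyage"):
    """Create an Atlas Vector Search index with the given dimension."""
    definition = {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimension,
                "similarity": "cosine",  # assumption: existing index uses cosine
            }
        ]
    }
    collection.create_search_index(
        SearchIndexModel(definition=definition, name=name, type="vectorSearch")
    )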

Phase 2: Backfill (Future Step)

Not included in this issue - to be scheduled separately.

Phase 3: Gradual Cutover (Future Step)

Not included in this issue - to be scheduled separately.

Files to Modify

File                                 | Change                                            | Priority
.beads/config.yaml                   | Add vector section                                | Required
src/yuhgettintru/embeddings.py       | Read dimension from config, add config injection  | Required
src/yuhgettintru/search/__init__.py  | Config-driven index + weighted traffic splitting  | Required
src/yuhgettintru/cli.py              | Pass config to embedding generator                | Required
src/yuhgettintru/extract_service.py  | Use config for embedding generation               | Required
src/yuhgettintru/llm_enhancer.py     | Use config for embedding generation               | Required
src/yuhgettintru/services.py         | Inject config into services                       | Required
scripts/backfill_embeddings.py       | Add --dimension and --index flags                 | Required
scripts/create_vector_index.py       | Make dimension/index configurable                 | Required
tests/conftest.py                    | Provide mock config fixture (sketched below)      | Optional
tests/test_embeddings.py             | Update tests to use config                        | Optional
tests/test_search_service.py         | Update tests for traffic splitting                | Optional
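
For the optional test changes, the mock config fixture can stay very small; a sketch, with fixture and attribute names following the config layout proposed above:

import pytest
from types import SimpleNamespace

@pytest.fixture
def mock_config():
    """Config stub exposing only the vector settings the services read."""
    return SimpleNamespace(
        vector=SimpleNamespace(
            dimension=1024,
            index_name="vector_index_voyage_1024",
            routing=SimpleNamespace(enabled=False, weights={}),
        )
    )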

Future Experiment Workflow

Once this infrastructure is in place, running dimension experiments is straightforward:

# 1. Create new index with experimental dimension
./scripts/create_vector_index.py \
  --name vector_index_voyage_256 \
  --dimension 256 \
  --path embedding_voyage

# 2. Backfill subset of products
./scripts/backfill_embeddings.py \
  --dimension 256 \
  --index vector_index_voyage_256 \
  --limit 10000

# 3. Enable routing in config.yaml
# vector:
#   routing:
#     enabled: true
#     weights:
#       vector_index_voyage_2048: 0.8
#       vector_index_voyage_1024: 0.15
#       vector_index_voyage_256: 0.05

# 4. Monitor metrics (click-through, latency, zero-results)

# 5. Adjust weights or rollback

Configuration Options

Basic (Single Index)

vector:
  dimension: 1024
  index_name: vector_index_voyage_1024

Experiment (Multiple Indexes)

vector:
  dimension: 1024                    # For new embeddings
  index_name: vector_index_voyage_1024
  routing:
    enabled: true
    weights:
      vector_index_voyage_2048: 0.8  # 80% traffic
      vector_index_voyage_1024: 0.2  # 20% traffic
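
To make the weights concrete: with the weighted RRF merge described above, a document at rank r in an index with weight w contributes w / (k + r) to its fused score, so a top hit in the 80% index still outranks a top hit that appears only in the 20% index, while documents found by both indexes get a boost. A tiny worked example (k = 60, purely illustrative):

k = 60
weights = {"vector_index_voyage_2048": 0.8, "vector_index_voyage_1024": 0.2}

# Document A: rank 1 in the 2048 index only
score_a = weights["vector_index_voyage_2048"] / (k + 1)           # ~0.0131

# Document B: rank 1 in the 1024 index, rank 3 in the 2048 index
score_b = (weights["vector_index_voyage_1024"] / (k + 1)
           + weights["vector_index_voyage_2048"] / (k + 3))       # ~0.0160

print(score_a < score_b)  # True: appearing in both indexes wins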

Acceptance Criteria

  • [ ] config.yaml includes vector section with dimension, index_name, and optional routing
  • [ ] VoyageEmbeddingGenerator reads dimension from config
  • [ ] ProductSearchService uses configured index name
  • [ ] Traffic splitting works (when enabled in config)
  • [ ] create_vector_index.py accepts --dimension and --name flags
  • [ ] backfill_embeddings.py accepts --dimension and --index flags
  • [ ] All existing tests pass after refactoring
  • [ ] Search quality remains unchanged (same results, same order)

Out of Scope

  • Actually migrating from 2048-dim to 1024-dim (Phase 2)
  • Running A/B tests or gradual cutovers (Phase 3)
  • Changes to quantization strategy (currently int8, no change planned)

References

Metadata

Field                        | Value
Priority                     | Medium
Estimated Effort             | 2-3 days
Requires Database Migration  | No (new index)
Requires Downtime            | No
Risk                         | Low - no behavioral changes, just infrastructure
