Contributor

@r-bit-rry r-bit-rry commented Nov 20, 2025

What does this PR do?

This PR fixes an issue where custom collection names for vector stores were not used correctly by the underlying storage providers, causing a mismatch between the logical API identifier (UUID) and the physical storage identifier.

Specifically, it:

  1. Updates the VectorIORouter to pass the canonical vector_store_id (UUID) to the provider in model_extra.
  2. Updates OpenAIVectorStoreMixin to use the provided vector_store_id as the logical identifier for the VectorStore resource, while retaining provider_vector_store_id (custom name) as the provider_resource_id.
  3. Updates inline providers (sqlite_vec, faiss) to use the provider_resource_id (if available) as the physical storage identifier (e.g., table name, bank ID).

This ensures that when a user specifies a collection_name, it is used for the physical storage while the API continues to return the expected UUID format, resolving the discrepancy and ensuring correct routing and storage.
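
The split between the logical and physical identifiers described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `VectorStoreRecord`, `create_record`, and `storage_id` are hypothetical names, and the `vs_` prefix is assumed from the example ID in the test plan. Only the `provider_resource_id` field name and its "use it if available" fallback come from the description.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorStoreRecord:
    """Hypothetical stand-in for the VectorStore resource."""

    # Logical identifier returned to API clients (assumed "vs_" + UUID format).
    identifier: str
    # Physical identifier for the backend (table name, bank ID);
    # None means no custom collection name was supplied.
    provider_resource_id: Optional[str] = None

    def storage_id(self) -> str:
        # Use the custom name if available, otherwise fall back to the UUID.
        return self.provider_resource_id or self.identifier


def create_record(collection_name: Optional[str] = None) -> VectorStoreRecord:
    # The API-facing ID is always generated; the caller only influences
    # the physical storage name.
    return VectorStoreRecord(
        identifier=f"vs_{uuid.uuid4()}",
        provider_resource_id=collection_name,
    )
```

With this shape, a store created without a collection name keeps the existing behavior, since `storage_id()` returns the generated identifier.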

Closes #4135

Test Plan

  • Added three integration tests to tests/integration/vector_io/test_openai_vector_stores.py:

    • test_openai_vector_store_custom_collection_name: Validates custom collection name creation and metadata storage
    • test_openai_vector_store_collection_name_validation: Validates input sanitization (alphanumeric, hyphens, underscores only)
    • test_openai_vector_store_collection_name_with_data: Validates data insertion and search operations with custom collection names
  • Verified ID synchronization across all layers:

    • Client receives UUID (vs_abc123)
    • Router maps UUID to custom collection name
    • Provider uses UUID for routing, custom name for physical storage
    • Physical storage (SQLite, FAISS, etc.) uses custom collection name
  • Ensured backward compatibility: existing code without collection_name continues to use auto-generated UUIDs

  • Tested with ollama, nomic-embed-text:latest, sqlite

The changes resolve the ID synchronization issue and enable users to specify meaningful collection names for easier management of vector stores.
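
For reference, the character rule exercised by test_openai_vector_store_collection_name_validation (alphanumeric, hyphens, underscores only) could be checked along these lines. This is a sketch under assumptions: `is_valid_collection_name` is a hypothetical helper, and whether the actual code rejects invalid names or rewrites them is not stated here.

```python
import re

# Allowed characters per the test plan: letters, digits, hyphens, underscores.
_COLLECTION_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_collection_name(name: str) -> bool:
    """Return True if the name is non-empty and uses only allowed characters."""
    # fullmatch ensures the entire string conforms, not just a prefix.
    return _COLLECTION_NAME_RE.fullmatch(name) is not None
```

Rejecting anything outside this set matters most for providers such as sqlite_vec, where the name may end up as part of a table name.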

@meta-cla meta-cla bot added the CLA Signed label Nov 20, 2025
Collaborator

@mattf mattf left a comment

hold on this. we need clarification on the use case. #4135

as written, this doesn't consider any of the remote providers.

-1 giving the api caller control over the backend ids.

Development

Successfully merging this pull request may close these issues.

Being able to set the collection name for vector dbs

2 participants