feat(vector_io): add custom collection names support for vector stores #4203
What does this PR do?
This PR fixes an issue where custom collection names for vector stores were not being correctly utilized by the underlying storage providers, resulting in a mismatch between the logical API identifier (UUID) and the physical storage identifier.
Specifically, it:
- Updates `VectorIORouter` to pass the canonical `vector_store_id` (UUID) to the provider in `model_extra`.
- Updates `OpenAIVectorStoreMixin` to use the provided `vector_store_id` as the logical identifier for the `VectorStore` resource, while retaining `provider_vector_store_id` (custom name) as the `provider_resource_id`.
- Updates the providers (`sqlite_vec`, `faiss`) to use the `provider_resource_id` (if available) as the physical storage identifier (e.g., table name, bank ID).

This ensures that when a user specifies a `collection_name`, it is used for the physical storage while the API continues to return the expected UUID format, resolving the discrepancy and ensuring correct routing and storage.

Closes #4135
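The split between the logical and physical identifiers can be sketched roughly as follows. This is an illustrative model only, not the code in this PR: `VectorStoreRecord` and `resolve_ids` are hypothetical names, and the `vs_` prefix is assumed from the `vs_abc123` example in the test plan.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorStoreRecord:
    """Hypothetical stand-in for the VectorStore resource."""

    identifier: str            # logical ID returned by the API
    provider_resource_id: str  # physical storage name (table name, bank ID, ...)


def resolve_ids(collection_name: Optional[str] = None) -> VectorStoreRecord:
    """Keep the API-facing ID generated, and use the custom name only for storage."""
    # Assumption: logical IDs look like "vs_<uuid>".
    vector_store_id = f"vs_{uuid.uuid4()}"
    # Fall back to the generated ID when no custom collection name is given.
    physical_id = collection_name or vector_store_id
    return VectorStoreRecord(
        identifier=vector_store_id,
        provider_resource_id=physical_id,
    )
```

With a custom name, `resolve_ids("my_docs")` yields a generated `identifier` and `provider_resource_id == "my_docs"`; without one, both fields hold the same generated ID, which matches the backward-compatible behavior described here.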
Test Plan
Added 3 comprehensive integration tests to `tests/integration/vector_io/test_openai_vector_stores.py`:
- `test_openai_vector_store_custom_collection_name`: Validates custom collection name creation and metadata storage
- `test_openai_vector_store_collection_name_validation`: Validates input sanitization (alphanumeric, hyphens, underscores only)
- `test_openai_vector_store_collection_name_with_data`: Validates data insertion and search operations with custom collection names

Verified ID synchronization across all layers (e.g., `vs_abc123`).

Ensured backward compatibility: existing code without `collection_name` continues to use auto-generated UUIDs.

Tested with ollama, nomic-embed-text:latest, sqlite.
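For reference, the character rule exercised by the validation test (alphanumeric, hyphens, underscores only) could be checked with something like the sketch below. The helper name `is_valid_collection_name` is hypothetical, and whether the actual implementation rejects or rewrites invalid names is not stated here; this version simply reports validity.

```python
import re

# Assumed rule from the test description: letters, digits, hyphens, underscores.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_collection_name(name: str) -> bool:
    """Return True if the whole name consists only of allowed characters."""
    return bool(_ALLOWED_NAME.fullmatch(name))
```

Names such as `my-docs_01` pass this check, while names containing spaces or punctuation, and the empty string, do not.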
The changes resolve the ID synchronization issue and enable users to specify meaningful collection names for easier management of vector stores.