[PR #58] [MERGED] rely on provider for counting of vectors and only optionally fallback on DB due to slow query #118

Closed
opened 2026-02-15 16:30:21 -05:00 by yindo · 0 comments

📋 Pull Request Information

Original PR: https://github.com/Mintplex-Labs/vector-admin/pull/58
Author: @timothycarambat
Created: 9/25/2023
Status: Merged
Merged: 9/25/2023
Merged by: @timothycarambat

Base: master ← Head: rely-on-provider-for-counts


📝 Commits (1)

  • 6544089 rely on provider for counting of vectors and only optionally fallback on DB due to slow query

📊 Changes

7 files changed (+61 additions, -17 deletions)

📝 backend/endpoints/v1/workspaces/index.js (+2 -2)
📝 backend/models/workspaceDocument.js (+23 -2)
📝 backend/utils/vectordatabases/providers/chroma/index.js (+10 -1)
📝 backend/utils/vectordatabases/providers/index.js (+7 -8)
📝 backend/utils/vectordatabases/providers/pinecone/index.js (+11 -1)
📝 backend/utils/vectordatabases/providers/qdrant/index.js (+4 -1)
📝 backend/utils/vectordatabases/providers/weaviate/index.js (+4 -2)

📄 Description

Until the data migration is done and organization_id is added to document_vectors, we will need to rely on the provider for counting vectors.

  1. This can count documents or vectors that VectorAdmin isn't aware of, because we are reading from the remote provider.
  2. The SQL query currently used is not great: with 50K documents it results in an even larger ...IN(1,2,3,) query looking for document ids that are in document_vectors. It would be easier to have a fixed organization_id key we can COUNT against, since we will very easily reach the upper limit on IN() parameters.
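
The two points above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the function names, the `document_id` column, the `?` placeholder style, and the `allowDbFallback` option are all assumptions made for the example. It contrasts a count that binds one parameter per document id with a count keyed on a single organization_id, and shows one possible reading of "provider first, database only as an opt-in fallback".

```javascript
// Hypothetical sketch: none of these names are taken from the VectorAdmin codebase.

// Current query shape: one bound parameter per document id, so the
// statement grows linearly with the number of documents.
function countByDocumentIds(documentIds) {
  if (documentIds.length === 0) {
    // An empty IN () list is invalid SQL in many engines, so short-circuit.
    return { sql: "SELECT 0 AS count", params: [] };
  }
  const placeholders = documentIds.map(() => "?").join(",");
  return {
    sql: `SELECT COUNT(*) AS count FROM document_vectors WHERE document_id IN (${placeholders})`,
    params: documentIds,
  };
}

// Post-migration query shape: a single bound parameter, no matter how
// many documents the organization owns.
function countByOrganizationId(organizationId) {
  return {
    sql: "SELECT COUNT(*) AS count FROM document_vectors WHERE organization_id = ?",
    params: [organizationId],
  };
}

// Provider-first counting (assumed behavior): ask the remote vector database,
// and run the slow database query only if the provider call fails AND the
// caller has explicitly opted in.
async function countVectors({ providerCount, dbCount, allowDbFallback = false }) {
  try {
    return { count: await providerCount(), source: "provider" };
  } catch (error) {
    if (!allowDbFallback) throw error;
    return { count: await dbCount(), source: "database" };
  }
}
```

With 50K document ids the first builder binds 50K parameters, while the organization_id variant always binds exactly one, which is why the fixed key is attractive once the migration lands.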

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

yindo added the pull-request label 2026-02-15 16:30:21 -05:00
yindo closed this issue 2026-02-15 16:30:21 -05:00

Reference: Mintplex-Labs/vector-admin#118