[PR #33] [MERGED] feat: support bm25 milvus function #56

Closed
opened 2026-02-16 06:15:58 -05:00 by yindo · 0 comments

📋 Pull Request Information

Original PR: https://github.com/langchain-ai/langchain-milvus/pull/33
Author: @zc277584121
Created: 1/3/2025
Status: Merged
Merged: 1/10/2025
Merged by: @zc277584121

Base: main ← Head: main


📝 Commits (4)

  • 8845a95 feat: support bm25 milvus function
  • 2b421ba fix ci
  • 15d4340 refine name of built-in function attribute
  • cf0f476 final refinement for BM25 builtin function

📊 Changes

9 files changed (+883 additions, -384 deletions)

View changed files

📝 libs/milvus/langchain_milvus/__init__.py (+6 -0)
➕ libs/milvus/langchain_milvus/function.py (+74 -0)
➕ libs/milvus/langchain_milvus/utils/constant.py (+4 -0)
📝 libs/milvus/langchain_milvus/vectorstores/milvus.py (+679 -236)
📝 libs/milvus/langchain_milvus/vectorstores/zilliz.py (+3 -75)
📝 libs/milvus/poetry.lock (+7 -48)
📝 libs/milvus/pyproject.toml (+1 -1)
📝 libs/milvus/tests/integration_tests/vectorstores/test_milvus.py (+107 -24)
📝 libs/milvus/tests/unit_tests/test_imports.py (+2 -0)

📄 Description

This PR introduces some major refactors:

  • Introduce the abstract class BaseMilvusBuiltInFunction, a light wrapper around the Milvus Function (https://milvus.io/docs/manage-collections.md#Function).
  • Introduce BM25BuiltInFunction, which extends BaseMilvusBuiltInFunction and holds the Milvus FunctionType.BM25 settings and the Milvus analyzer configuration. BM25BuiltInFunction can be used to implement full-text search (https://milvus.io/docs/full-text-search.md) in Milvus.
  • In the future, Milvus will support more built-in Functions that take text as input instead of vectors. The user then no longer has to convert text to embeddings on the client side, because the server does it automatically (see the FunctionType.TEXTEMBEDDING example at https://github.com/milvus-io/pymilvus/blob/master/examples/text_embedding.py). More subclasses of BaseMilvusBuiltInFunction can be added later to support these text-in functions.
  • A how-to guide is on the way; in the meantime, the test test_builtin_bm25_function() contains some usage examples. In short, any custom LangChain embedding functions or Milvus built-in functions can be passed to the Milvus class initializer to build multiple index fields in Milvus.
    Some example use cases:
from langchain_milvus import Milvus, BM25BuiltInFunction
from langchain_openai import OpenAIEmbeddings

# `docs` (a list of LangChain documents) and `URI` (the Milvus address) are assumed to be defined.
embedding = OpenAIEmbeddings()

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=embedding,
    builtin_function=BM25BuiltInFunction(
        output_field_names="sparse"
    ),
    # The "dense" field holds the OpenAI dense embedding used for similarity search;
    # the "sparse" field is used for BM25 full-text search.
    vector_field=["dense", "sparse"],
    connection_args={
        "uri": URI,
    },
    drop_old=True,
)
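
For readers unfamiliar with what the "sparse" field is scored with, here is a minimal pure-Python sketch of the standard Okapi BM25 formula. It is only an illustration: the function name `bm25_scores`, the whitespace tokenization, and the defaults `k1=1.2`, `b=0.75` are assumptions made for this sketch, and Milvus computes its own variant on the server with its configured analyzer.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, tokenized by whitespace (an assumption; real analyzers do more).
corpus = [
    "milvus supports full text search".split(),
    "dense embeddings capture semantic similarity".split(),
    "bm25 ranks documents for full text search queries".split(),
]
scores = bm25_scores("full text search".split(), corpus)
```

The second document shares no term with the query, so it scores zero. The first outranks the third because both match the same terms but the first is shorter.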

Or, with multiple embedding fields and the BM25 function:

from langchain_voyageai import VoyageAIEmbeddings

embedding = OpenAIEmbeddings()
embedding2 = VoyageAIEmbeddings(model="voyage-3")

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=[embedding, embedding2],
    builtin_function=BM25BuiltInFunction(
        input_field_names="text",
        output_field_names="sparse"
    ),
    text_field="text",
    vector_field=["dense", "dense2", "sparse"],
    connection_args={
        "uri": URI,
    },
    drop_old=True,
)
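
The class layout described above (an abstract base plus one subclass per server-side function) can be sketched in plain Python as follows. This is not the code in `function.py`: every name here (`BuiltInFunctionSketch`, `Bm25Sketch`, `plan_vector_fields`) and the default field names are hypothetical, and the rule of listing dense embedding fields first and function outputs last is an assumption inferred from the `vector_field` lists in the two examples.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Union


def _as_list(names: Union[str, List[str]]) -> List[str]:
    """Accept one field name or a list of names and always return a list."""
    return [names] if isinstance(names, str) else list(names)


class BuiltInFunctionSketch(ABC):
    """Hypothetical stand-in for a thin wrapper around a server-side function."""

    def __init__(self, input_field_names, output_field_names):
        self.input_field_names = _as_list(input_field_names)
        self.output_field_names = _as_list(output_field_names)

    @abstractmethod
    def function_type(self) -> str:
        """Name of the server-side function type this wrapper configures."""

    def describe(self) -> Dict[str, object]:
        return {
            "type": self.function_type(),
            "inputs": self.input_field_names,
            "outputs": self.output_field_names,
        }


class Bm25Sketch(BuiltInFunctionSketch):
    """Text in, sparse vector out; the scoring itself happens on the server."""

    # The default field names below are assumptions made for this sketch.
    def __init__(self, input_field_names="text", output_field_names="sparse"):
        super().__init__(input_field_names, output_field_names)

    def function_type(self) -> str:
        return "BM25"


def plan_vector_fields(
    dense_names: List[str], functions: List[BuiltInFunctionSketch]
) -> List[str]:
    """List dense embedding fields first, then each function's output fields."""
    fields = list(dense_names)
    for fn in functions:
        fields.extend(fn.output_field_names)
    return fields


bm25 = Bm25Sketch()
fields = plan_vector_fields(["dense", "dense2"], [bm25])
```

With two dense embeddings and one BM25 function, `fields` matches the `vector_field` list of the second example. A future text-in function would only need another small subclass that returns a different type name.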

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

yindo added the pull-request label 2026-02-16 06:15:58 -05:00
yindo closed this issue 2026-02-16 06:15:58 -05:00
Reference: langchain-ai/langchain-milvus#56