Adds contribution guidelines for integrations (#1369)

* Adds beginning of integration contribution guide
* Fix typo
* Adds more context to each integration category
* Adds example PRs and some integration specific information to the contributing guide
* Updates integration guide with memory, tools, and vector store pages
* Update integration contribution guide, fix and add links
* Use relative links
* Revert to absolute paths
2026-07-01 12:17:38 -04:00 · 2023-05-23 11:02:57 -07:00
parent 1290d4fe7c
commit b153681b2d
8 changed files with 272 additions and 0 deletions
@@ -0,0 +1,158 @@
+# Contributing Integrations to LangChain
+
+In addition to the [general contribution guidelines](https://github.com/hwchase17/langchainjs/blob/main/CONTRIBUTING.md), there are a few extra things to consider when contributing third-party integrations to LangChain that will be covered here. The goal of this page is to help you draft PRs that take these considerations into account, and can therefore be merged sooner.
+
+Integrations tend to fall into a fixed set of categories, each of which has its own guide. Please read the [general guidelines](#general-concepts), then follow the link at the end of this page that matches what you're building for additional information and examples.
+
+## General concepts
+
+The following guidelines apply broadly to all types of integrations:
+
+### Creating a separate entrypoint
+
+You should generally not export your new module from an `index.ts` file that contains many other exports. Instead, you should add a separate entrypoint for your integration in [`langchain/scripts/create-entrypoints.js`](https://github.com/hwchase17/langchainjs/blob/main/langchain/scripts/create-entrypoints.js) within the `entrypoints` object:
+
+```js
+import * as fs from "fs";
+import * as path from "path";
+
+// This lists all the entrypoints for the library. Each key corresponds to an
+// importable path, eg. `import { AgentExecutor } from "langchain/agents"`.
+// The value is the path to the file in `src/` that exports the entrypoint.
+// This is used to generate the `exports` field in package.json.
+// Order is not important.
+const entrypoints = {
+  // agents
+  agents: "agents/index",
+  "agents/load": "agents/load",
+  ...
+  "vectorstores/chroma": "vectorstores/chroma",
+  "vectorstores/hnswlib": "vectorstores/hnswlib",
+  ...
+};
+```
+
+The entrypoint name should conform to its path in the repo. For example, if you were adding a new vector store for a hypothetical provider "langco", you might create it under `vectorstores/langco.ts`. You should add it above as:
+
+```js
+import * as fs from "fs";
+import * as path from "path";
+
+// This lists all the entrypoints for the library. Each key corresponds to an
+// importable path, eg. `import { AgentExecutor } from "langchain/agents"`.
+// The value is the path to the file in `src/` that exports the entrypoint.
+// This is used to generate the `exports` field in package.json.
+// Order is not important.
+const entrypoints = {
+  // agents
+  agents: "agents/index",
+  "agents/load": "agents/load",
+  ...
+  "vectorstores/chroma": "vectorstores/chroma",
+  "vectorstores/hnswlib": "vectorstores/hnswlib",
+  "vectorstores/langco": "vectorstores/langco",
+  ...
+};
+```
+
+A user would then import your new vector store as `import { LangCoVectorStore } from "langchain/vectorstores/langco";`.
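
The naming convention above can be checked mechanically. The following is a minimal sketch and is not part of the LangChain repo: the helper name `findNonConformingEntrypoints` is hypothetical, and it assumes (based only on the entries shown above) that a value either equals its key or is the key followed by `/index`.

```typescript
// Hypothetical helper: returns the entrypoint names whose source path is
// neither the name itself nor `<name>/index`, the two shapes shown above.
function findNonConformingEntrypoints(
  entrypoints: Record<string, string>
): string[] {
  return Object.entries(entrypoints)
    .filter(
      ([name, sourcePath]) =>
        sourcePath !== name && sourcePath !== `${name}/index`
    )
    .map(([name]) => name);
}

// A few entries copied from the listing above, plus the hypothetical "langco" one.
const sampleEntrypoints: Record<string, string> = {
  agents: "agents/index",
  "agents/load": "agents/load",
  "vectorstores/langco": "vectorstores/langco",
};

// Logs the names that break the convention (none for the sample above).
console.log(findNonConformingEntrypoints(sampleEntrypoints));
```

A check like this is only a convenience for spotting typos before review; the entry in `create-entrypoints.js` remains the source of truth.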
+
+### Third-party dependencies
+
+You may use third-party dependencies in new integrations, but they should be added as `peerDependencies` and `devDependencies` with an entry under `peerDependenciesMeta` in [`langchain/package.json`](https://github.com/hwchase17/langchainjs/blob/main/langchain/package.json), **not under any core `dependencies` list**. This keeps the overall package size small, as only people who are using your integration will need to install them, and allows us to support a wider range of runtimes.
+
+We suggest using caret syntax (`^`) for peer dependencies. This supports a wider range of installed versions and tolerates non-major version updates, since major version updates should (theoretically) be the only breaking ones.
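
To make the caret rule concrete, here is a minimal sketch of how a caret range is matched under npm's semver rules. It is illustrative only: `satisfiesCaret` and the other helper names are hypothetical, and it handles plain `MAJOR.MINOR.PATCH` versions without prerelease tags. In real code you would rely on the `semver` package instead.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(version: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Expected MAJOR.MINOR.PATCH, got "${version}"`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero part.
function caretUpperBound(lower: Version): Version {
  if (lower.major > 0) return { major: lower.major + 1, minor: 0, patch: 0 };
  if (lower.minor > 0) return { major: 0, minor: lower.minor + 1, patch: 0 };
  return { major: 0, minor: 0, patch: lower.patch + 1 };
}

function satisfiesCaret(version: string, caretRange: string): boolean {
  if (!caretRange.startsWith("^")) {
    throw new Error(`Expected a caret range, got "${caretRange}"`);
  }
  const lower = parseVersion(caretRange.slice(1));
  const candidate = parseVersion(version);
  return (
    compareVersions(candidate, lower) >= 0 &&
    compareVersions(candidate, caretUpperBound(lower)) < 0
  );
}

console.log(satisfiesCaret("1.4.0", "^1.2.3")); // true: minor update
console.log(satisfiesCaret("2.0.0", "^1.2.3")); // false: major update
```

Note that for `0.x` versions the caret is stricter: `^0.2.3` accepts `0.2.9` but not `0.3.0`.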
+
+Please make sure all introduced dependencies are permissively licensed (MIT is recommended) and well-supported and maintained.
+
+You must also add your new entrypoint under `requiresOptionalDependency` in the [`create-entrypoints.js`](https://github.com/hwchase17/langchainjs/blob/main/langchain/scripts/create-entrypoints.js) file to avoid breaking the build:
+
+```js
+// Entrypoints in this list require an optional dependency to be installed.
+// Therefore they are not tested in the generated test-exports-* packages.
+const requiresOptionalDependency = [
+  "agents/load",
+  ...
+  "vectorstores/chroma",
+  "vectorstores/hnswlib",
+  "vectorstores/langco",
+  ...
+];
+```
+
+If you have conformed to all of the above guidelines, you can just import your dependency as normal in your integration's file in the LangChain repo. Developers who import your entrypoint will then see an error message if they are missing the required peer dependency.
+
+### Prioritize using exported third-party types for client config
+
+Many integrations initialize instances of third-party clients, which often require vendor-specific configuration and options in addition to LangChain-specific configuration. To avoid unnecessary repetition and desyncing, we suggest using imported third-party configuration types whenever available, unless there's a specific reason to only support a subset of these options.
+
+Here's a simplified example:
+
+```ts
+import {
+  LangCoClient,
+  LangCoClientOptions,
+} from "langco-client";
+
+import { BaseDocumentLoader, DocumentLoader } from "../base.js";
+
+export class LangCoDatasetLoader
+  extends BaseDocumentLoader
+  implements DocumentLoader
+{
+  protected langCoClient: LangCoClient;
+
+  protected datasetId: string;
+
+  protected verbose: boolean;
+
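+  constructor(
+    datasetId: string,
+    config: {
+      // Optional, since the constructor falls back to `false` below.
+      verbose?: boolean;
+      clientOptions?: LangCoClientOptions;
+    }
+  ) {
+    super();
+    this.datasetId = datasetId;
+    // Pass vendor options through untouched so new client options just work.
+    this.langCoClient = new LangCoClient(config.clientOptions ?? {});
+    this.verbose = config.verbose ?? false;
+  }
+  // ...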
+}
+```
+
+Above, we have a document loader that we're sure will always require a specific `datasetId`, followed by a `config` object whose properties could change in the future. It contains a LangChain-specific configuration property, `verbose`, and a `clientOptions` property that is passed directly into the third-party client. With this structure, if the underlying client adds new options, all we need to do is bump the dependency version.
+
+### Documentation and integration tests
+
+We highly appreciate documentation and integration tests showing how to set up and use your integration. Providing this will make it much easier for reviewers to verify that your integration works and will streamline the review process.
+
+New docs pages should be added as `.mdx` files in the appropriate location under `docs/` (`.mdx` is an extended markdown format that allows use of additional statements like `import`). Code examples within docs pages should be under `examples` and imported like this:
+
+```md
+import CodeBlock from "@theme/CodeBlock";
+import LangCoExample from "@examples/document_loaders/langco.ts";
+
+<CodeBlock language="typescript">{LangCoExample}</CodeBlock>
+```
+
+This allows the linter and formatter to pick up example code blocks within docs as well.
+
+### Linting and formatting
+
+As with all contributions, make sure you run `yarn lint` and `yarn format` so that everything conforms to our established style.
+
+## Integration-specific guidelines and example PRs
+
+Below are links to guides with advice and tips for specific types of integrations:
+
+- [LLM providers](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/LLMS.md) (e.g. OpenAI's GPT-3)
+- Chat model providers (TODO) (e.g. Anthropic's Claude, OpenAI's GPT-4)
+- [Memory](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/MEMORY.md) (used to give an LLM or chat model context of past conversations, e.g. Motörhead)
+- [Vector stores](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/VECTOR_STORES.md) (e.g. Pinecone)
+- [Persistent message stores](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/MESSAGE_STORES.md) (used to persistently store and load raw chat histories, e.g. Redis)
+- [Document loaders](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/DOCUMENT_LOADERS.md) (used to load documents for later storage into vector stores, e.g. Apify)
+- Embeddings (TODO) (e.g. Cohere)
+- [Tools](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/TOOLS.md) (used for agents, e.g. the SERP API tool)
+
+This is a living document, so please make a pull request if we're missing anything useful!
@@ -0,0 +1,11 @@
+# Contributing third-party document loaders
+
+This page contains some specific guidelines and examples for contributing integrations with third-party document loaders.
+
+Document loaders are classes that pull in text from a given source and load it into chunks called **documents** for later use in queryable vector stores. Some example sources include PDFs, websites, and Notion docs.
+
+**Make sure you read the [general guidelines page](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+## Example PR
+
+You can take a look at this PR adding Apify Datasets as an example when creating your own document loader integrations: https://github.com/hwchase17/langchainjs/pull/1271
@@ -0,0 +1,22 @@
+# Contributing third-party LLMs
+
+This page contains some specific guidelines and examples for contributing integrations with third-party LLM providers.
+
+**Make sure you read the [general guidelines page](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+
+## Example PR
+
+We'll be referencing this PR adding Amazon SageMaker endpoints as an example: https://github.com/hwchase17/langchainjs/pull/1267
+
+## General ideas
+
+The general idea for adding new third-party LLMs is to subclass the `LLM` class and implement the `_call` method. As the name suggests, this method should call the LLM with the given prompt and transform the LLM response into some generated string output.
+
+The example PR for Amazon SageMaker is an interesting case because SageMaker endpoints can host a wide variety of models with non-standard input and output formats. Therefore, the contributor added a [simple abstract class](https://github.com/hwchase17/langchainjs/pull/1267/files#diff-4496012d30c03b969546b14039f8deee1b5ba9152a86222100d76c4da77f060cR35) that a user implements for the specific model they are hosting. It transforms input from LangChain into the format expected by the model, and the model's output into a plain string.
+
+Other third-party providers like OpenAI and Anthropic will have a defined input and output format, and in those cases, the input and output transformations should happen within the `_call` method.
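
For a provider with a fixed format, the shape of such a subclass might look like the sketch below. It is self-contained for illustration: `SketchLLM` is a stand-in for LangChain's `LLM` base class (the real class does more), and the "langco" provider, its request and response shapes, and the injected `complete` function are all hypothetical.

```typescript
// Stand-in for LangChain's `LLM` base class so this sketch runs on its own.
abstract class SketchLLM {
  abstract _call(prompt: string): Promise<string>;

  async call(prompt: string): Promise<string> {
    return this._call(prompt);
  }
}

// Hypothetical request and response shapes for the made-up "langco" provider.
interface LangCoRequest {
  prompt: string;
  maxTokens: number;
}

interface LangCoResponse {
  choices: { text: string }[];
}

type LangCoCompletionFn = (request: LangCoRequest) => Promise<LangCoResponse>;

class LangCoLLM extends SketchLLM {
  protected complete: LangCoCompletionFn;

  protected maxTokens: number;

  constructor(complete: LangCoCompletionFn, maxTokens = 256) {
    super();
    this.complete = complete;
    this.maxTokens = maxTokens;
  }

  // Transform the prompt into the provider's input format, call the provider,
  // then transform its response into a plain string.
  async _call(prompt: string): Promise<string> {
    const response = await this.complete({
      prompt,
      maxTokens: this.maxTokens,
    });
    return response.choices.map((choice) => choice.text).join("");
  }
}
```

Injecting the completion function keeps the sketch testable without network access; a real integration would typically construct the provider's client inside the class instead.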
+
+## Wrap LLM requests in `this.caller`
+
+The base LLM class contains an instance property called `caller` that will automatically handle retries, errors, timeouts, and more. You should wrap calls to the LLM in `this.caller.call` [as shown here](https://github.com/hwchase17/langchainjs/pull/1267/files#diff-4496012d30c03b969546b14039f8deee1b5ba9152a86222100d76c4da77f060cR148).
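
The wrapping pattern can be sketched as follows. This is an illustration under stated assumptions, not LangChain's implementation: `SketchCaller` is a simplified stand-in that only retries, and `RetryingLLM` and the injected `complete` function are hypothetical names.

```typescript
// Simplified stand-in for the `caller` property; it only retries failed calls.
class SketchCaller {
  protected maxRetries: number;

  constructor(maxRetries = 2) {
    this.maxRetries = maxRetries;
  }

  // Runs `fn(...args)`, retrying up to `maxRetries` extra times on failure.
  async call<A extends unknown[], T>(
    fn: (...args: A) => Promise<T>,
    ...args: A
  ): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= this.maxRetries; attempt += 1) {
      try {
        return await fn(...args);
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  }
}

type CompletionFn = (prompt: string) => Promise<string>;

class RetryingLLM {
  protected caller: SketchCaller;

  protected complete: CompletionFn;

  constructor(complete: CompletionFn, maxRetries = 2) {
    this.caller = new SketchCaller(maxRetries);
    this.complete = complete;
  }

  async _call(prompt: string): Promise<string> {
    // Route the request through the caller instead of invoking it directly,
    // so transient failures are retried.
    return this.caller.call(this.complete, prompt);
  }
}
```

The point of the pattern is that retry and error handling live in one shared place, so each integration only has to pass its request function and arguments through.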
@@ -0,0 +1,24 @@
+# Contributing third-party memory
+
+This page contains some specific guidelines and examples for contributing integrations with third-party memory providers.
+
+In LangChain, memory differs from [message stores](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/MESSAGE_STORES.md) in that memory does not actually handle persistently storing messages, but acts as a representation of the LLM or chat model's awareness of past conversations, while message stores handle the actual message data persistence. For example, memory may perform other transformations on the messages, like summarization, or may emphasize specific pieces of pertinent information. Memory may rely on message stores as a backing class.
+
+Another key difference is that message stores are only used with chat models.
+
+Before getting started, think about whether your planned integration would be more suited as a message store or as memory!
+
+**Make sure you read the [general guidelines page](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+## Example PR
+
+You can use this PR adding Motörhead memory as a reference: https://github.com/hwchase17/langchainjs/pull/598
+
+## General ideas
+
+LangChain memory at its core contains two important methods:
+
+- `loadMemoryVariables`, which loads memory from a message store or other source and formats it.
+- `saveContext`, which stores a representation of the current input and output values in the message store.
+
+As previously mentioned, saving context does not need to involve storing a verbatim transcript of the back-and-forth with the LLM (though you can certainly do that!). It can also involve summarizing or emphasizing different parts of memory, like certain words, mentioned people, or key phrases to prompt the LLM to "remember" details in a different way.
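
The two methods listed above could be sketched as follows. This is a self-contained illustration rather than LangChain's actual classes: `InMemoryMessageStore`, `LastMessagesMemory`, the `history`, `input`, and `output` keys, and the simplified string-based signatures are all assumptions made for the example.

```typescript
type Values = Record<string, string>;

// Hypothetical stand-in for a message store; a real one would persist messages.
class InMemoryMessageStore {
  private messages: string[] = [];

  async addMessage(message: string): Promise<void> {
    this.messages.push(message);
  }

  async getMessages(): Promise<string[]> {
    return this.messages.slice();
  }
}

// Memory that only "remembers" the most recent messages.
class LastMessagesMemory {
  protected store: InMemoryMessageStore;

  protected maxMessages: number;

  constructor(store: InMemoryMessageStore, maxMessages = 4) {
    this.store = store;
    this.maxMessages = maxMessages;
  }

  // Loads messages from the store and formats them for use in a prompt.
  async loadMemoryVariables(_values: Values): Promise<Values> {
    const messages = await this.store.getMessages();
    return { history: messages.slice(-this.maxMessages).join("\n") };
  }

  // Stores a representation of the current input and output values.
  async saveContext(inputValues: Values, outputValues: Values): Promise<void> {
    await this.store.addMessage(`Human: ${inputValues["input"]}`);
    await this.store.addMessage(`AI: ${outputValues["output"]}`);
  }
}
```

Here the transformation is a simple truncation applied when loading; a summarizing memory would follow the same two-method shape but condense the messages instead.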
@@ -0,0 +1,19 @@
+# Contributing third-party persistent message stores
+
+This page contains some specific guidelines and examples for contributing integrations with third-party message stores.
+
+In LangChain, message stores differ from [memory](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/integrations/MEMORY.md) in that they simply serialize and persistently store chat messages, while memory, despite its name, does not actually handle persistently storing messages, but acts as a representation of the LLM or chat model's awareness of past conversations. For example, memory may perform other transformations on the messages, like summarization, or may emphasize specific pieces of pertinent information. Memory may rely on message stores as a backing class.
+
+Another key difference is that message stores are only used with chat models.
+
+Before getting started, think about whether your planned integration would be more suited as a message store or as memory!
+
+**Make sure you read the [general guidelines page](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+## Example PR
+
+We'll be referencing this PR adding a Redis-backed message store as an example: https://github.com/hwchase17/langchainjs/pull/951
+
+## Serializing and deserializing chat messages
+
+LangChain messages extend a `BaseChatMessage` class that contains information like the message's content and the role of the speaker. To provide a standard way to map these messages to a storable JSON format, you should use the `mapChatMessagesToStoredMessages` and `mapStoredMessagesToChatMessages` utility functions as [shown here](https://github.com/hwchase17/langchainjs/pull/951/files#diff-4c638d231a5e5bb29a149c6fb7d8f4b24aaf1b6fcc2cc2a728346eaebb6c9c47R17).
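
To illustrate the idea of a round trip through a storable JSON format, here is a minimal sketch. It deliberately does not use the real utilities: `SketchChatMessage`, `SketchStoredMessage`, `toStoredMessages`, and `fromStoredMessages` are hypothetical stand-ins, and the stored shape used by LangChain may differ.

```typescript
type Role = "human" | "ai" | "system";

// Simplified stand-ins; LangChain's real message types carry more information.
interface SketchChatMessage {
  role: Role;
  text: string;
}

interface SketchStoredMessage {
  type: string;
  data: { content: string };
}

function isRole(value: string): value is Role {
  return value === "human" || value === "ai" || value === "system";
}

// Maps in-memory messages to plain objects that are safe to JSON-serialize.
function toStoredMessages(
  messages: SketchChatMessage[]
): SketchStoredMessage[] {
  return messages.map((message) => ({
    type: message.role,
    data: { content: message.text },
  }));
}

// Rebuilds messages from stored objects, rejecting unknown message types.
function fromStoredMessages(
  stored: SketchStoredMessage[]
): SketchChatMessage[] {
  return stored.map((item) => {
    const role = item.type;
    if (!isRole(role)) {
      throw new Error(`Unknown stored message type: ${role}`);
    }
    return { role, text: item.data.content };
  });
}

// Round trip: serialize to a JSON string (as a store would), then restore.
const history: SketchChatMessage[] = [
  { role: "human", text: "What is LangChain?" },
  { role: "ai", text: "A framework for building LLM applications." },
];
const serialized = JSON.stringify(toStoredMessages(history));
const restoredHistory = fromStoredMessages(
  JSON.parse(serialized) as SketchStoredMessage[]
);
console.log(restoredHistory.length === history.length);
```

Using one shared pair of mapping functions means every message store writes the same format, so histories saved by one store implementation can be read by another.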
@@ -0,0 +1,13 @@
+# Contributing third-party tools
+
+This page contains some specific guidelines and examples for contributing integrations with third-party APIs within tools.
+
+**Make sure you read the [general guidelines page](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+## Example PR
+
+You can use this PR adding an AWSLambda tool as a reference when creating your own tools (minus the dynamic import!): https://github.com/hwchase17/langchainjs/pull/727
+
+## Guidelines
+
+Because tools are relatively simple (only requiring a well-thought-out description and a single function), and `DynamicTool` and `DynamicStructuredTool` already offer developers a high degree of flexibility for specific tasks, submitted tools should be useful to a broad group of developers and have solid use cases.
@@ -0,0 +1,9 @@
+# Contributing third-party vector stores
+
+This page contains some specific guidelines and examples for contributing integrations with third-party vector store providers.
+
+**Make sure you read the [general guidelines page](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+## Example PR
+
+You can use this PR adding Faiss as a reference when creating your own vector store integration: https://github.com/hwchase17/langchainjs/pull/685
@@ -19,6 +19,12 @@ If you are not sure what to work on, we have a few suggestions:

 We are currently trying to keep API parity between the Python and JS versions of LangChain, where possible. As such we ask that if you have an idea for a new abstraction, please open an issue first to discuss it. This will help us make sure that the API is consistent across both versions. If you're not sure what to work on, we recommend looking at the links above first.

+## Want to add a specific integration?
+
+LangChain supports several different types of integrations with third-party providers and frameworks, including LLM providers (e.g. [OpenAI](https://github.com/hwchase17/langchainjs/blob/main/langchain/src/llms/openai.ts)), vector stores (e.g. [FAISS](https://github.com/ewfian/langchainjs/blob/main/langchain/src/vectorstores/faiss.ts)), document loaders (e.g. [Apify](https://github.com/hwchase17/langchainjs/blob/main/langchain/src/document_loaders/web/apify_dataset.ts)), persistent message history stores (e.g. [Redis](https://github.com/hwchase17/langchainjs/blob/main/langchain/src/stores/message/redis.ts)), and more.
+
+We welcome such contributions, but ask that you read our dedicated [integration contribution guide](https://github.com/hwchase17/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) for specific details and patterns to consider before opening a pull request.
+
 ## 🗺️Contributing Guidelines

 ### 🚩GitHub Issues
@@ -162,6 +168,8 @@ To run only integration tests, run:
 yarn test:int
 ```

+Note that many integration tests require credentials or other setup. You may need to set up a `langchain/.env` file like the example [here](https://github.com/hwchase17/langchainjs/blob/main/langchain/.env.example).
+
 **Environment tests** test whether LangChain works across different JS environments, including Node.js (both ESM and CJS), Edge environments (eg. Cloudflare Workers), and browsers (using Webpack).

 To run the environment tests with Docker run:
@@ -170,6 +178,14 @@ To run the environment tests with Docker run:
 yarn test:exports:docker
 ```

+#### Running a single test
+
+To run a single test, run:
+
+```bash
+yarn test:single ./path/to/yourtest.test.ts
+```
+
 ### Building

 To build the project, run: