Startup Data Enrichment
This directory contains an evaluation script that measures how well an agent researches information about a startup.
Dataset
The dataset used can be found here. It contains a list of AI startups. For each startup, the agent researches the company and extracts the following fields:
name
description
website
crunchbase_profile
year_founded
ceo
total_funding_mm_usd
latest_round
latest_round_date
latest_round_amount_mm_usd
Example input
{
  "company": "LangChain",
  "extraction_schema": {
    "type": "object",
    "title": "company_info",
    "required": [
      "name",
      "description",
      "website",
      "crunchbase_profile",
      "year_founded",
      "ceo",
      "total_funding_mm_usd",
      "latest_round",
      "latest_round_date",
      "latest_round_amount_mm_usd"
    ],
    "properties": {
      "ceo": {
        "type": "string",
        "description": "Name of the company's CEO"
      },
      "name": {
        "type": "string",
        "description": "Official company name"
      },
      "website": {
        "type": "string",
        "format": "uri",
        "description": "Company's official website URL"
      },
      "description": {
        "type": "string",
        "description": "Brief description of the company and its activities"
      },
      "latest_round": {
        "type": "string",
        "description": "Type of the most recent funding round (e.g., Series A, Seed, etc.)"
      },
      "year_founded": {
        "type": "integer",
        "minimum": 1800,
        "description": "Year when the company was founded"
      },
      "latest_round_date": {
        "type": "string",
        "format": "date",
        "description": "Date of the most recent funding round (YYYY-MM-DD)"
      },
      "crunchbase_profile": {
        "type": "string",
        "format": "uri",
        "description": "Company's Crunchbase profile URL"
      },
      "total_funding_mm_usd": {
        "type": "number",
        "minimum": 0,
        "description": "Total funding raised in millions of USD"
      },
      "latest_round_amount_mm_usd": {
        "type": "number",
        "minimum": 0,
        "description": "Amount raised in the most recent funding round in millions of USD"
      }
    },
    "description": "Company information"
  }
}
Example output
{
  "info": {
    "ceo": "Harrison Chase",
    "name": "LangChain, Inc.",
    "website": "https://www.langchain.com",
    "description": "LangChain helps developers to build applications powered by large language models (LLMs). It provides tools and frameworks to integrate LLMs with external data sources and APIs, facilitating the creation of advanced AI applications.",
    "latest_round": "Series A",
    "year_founded": 2022,
    "latest_round_date": "2024-02-15",
    "crunchbase_profile": "https://www.crunchbase.com/organization/langchain",
    "total_funding_mm_usd": 35,
    "latest_round_amount_mm_usd": 25
  }
}
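Before scoring, it can help to confirm that an agent's output has the shape the extraction schema asks for. The sketch below is a minimal, stdlib-only check of the "required" list, the "type" of each property, and any "minimum" bound. The helper name find_schema_problems and the trimmed-down schema are hypothetical and not part of this repository. It also ignores "format" constraints such as uri and date, so a full JSON Schema validator would be stricter.

```python
# Hypothetical helper (not part of this repository): a lightweight structural
# check of an agent's "info" dict against the extraction schema.

# Maps the JSON Schema type names used in the example schema to Python types.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float)}


def find_schema_problems(info: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for field in schema.get("required", []):
        if field not in info:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field not in info:
            continue
        value = info[field]
        expected = JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
            continue
        minimum = spec.get("minimum")
        if minimum is not None and isinstance(value, (int, float)) and value < minimum:
            problems.append(f"{field} is below the minimum of {minimum}")
    return problems


# A trimmed-down schema with three of the ten fields, for illustration only.
schema = {
    "required": ["name", "year_founded", "total_funding_mm_usd"],
    "properties": {
        "name": {"type": "string"},
        "year_founded": {"type": "integer", "minimum": 1800},
        "total_funding_mm_usd": {"type": "number", "minimum": 0},
    },
}
good = {"name": "LangChain, Inc.", "year_founded": 2022, "total_funding_mm_usd": 35}
bad = {"name": "LangChain, Inc.", "year_founded": "2022"}

print(find_schema_problems(good, schema))
print(find_schema_problems(bad, schema))
```

In practice you would pass the full extraction_schema from the dataset input and the "info" value from the agent output.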
Using the dataset
To use the data from this dataset in your own project, you can:
(1) clone the dataset using the LangSmith SDK:
from langsmith import Client

client = Client()
cloned_dataset = client.clone_public_dataset(
    "https://smith.langchain.com/public/afabd12a-62fa-4c09-b083-6b1742b4cc3a/d",
    dataset_name="Startup Data Enrichment"
)
(2) create a new dataset with the same examples using the following command:
python startup_data_enrichment/create_dataset.py
Evaluation Metric
The extracted outputs are evaluated using an LLM-as-a-judge evaluator that compares the extracted and reference outputs for each company and produces a score between 0 and 1, where 1 is a perfect match and 0 is a complete mismatch.
You can adjust the prompt and evaluation criteria in the run_eval.py script if you're adapting this to your own dataset.
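The judge prompt itself lives in run_eval.py and is not reproduced here. As a rough illustration of the 0-to-1 scale only, the sketch below uses a deterministic stand-in: the fraction of reference fields that the extracted output matches exactly. This is explicitly not the LLM judge, and the function name exact_match_score is hypothetical.

```python
# Deterministic stand-in for illustration only; the real evaluation uses an
# LLM judge defined in run_eval.py, not exact matching.

def exact_match_score(extracted: dict, reference: dict) -> float:
    """Fraction of reference fields whose extracted value is equal (0.0 to 1.0)."""
    if not reference:
        return 0.0
    matches = sum(
        1 for key, value in reference.items() if extracted.get(key) == value
    )
    return matches / len(reference)


reference = {
    "ceo": "Harrison Chase",
    "year_founded": 2022,
    "latest_round": "Series A",
    "total_funding_mm_usd": 35,
}
extracted = {
    "ceo": "Harrison Chase",
    "year_founded": 2022,
    "latest_round": "Seed",  # the only field that disagrees
    "total_funding_mm_usd": 35.0,
}

score = exact_match_score(extracted, reference)
print(score)  # 0.75
```

Exact matching would rarely succeed on a free-text field such as description, which is the kind of case where a judge that compares meaning is more forgiving.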
Invoking the agent
The agent is invoked using a RemoteGraph:
from langgraph.pregel.remote import RemoteGraph
agent_graph = RemoteGraph(agent_id, url=agent_url)
agent_graph.invoke(inputs)
Using a different agent schema
Your agent might be using a custom input/output schema that doesn't match the dataset schema. To handle this, you can modify transform_dataset_inputs and transform_agent_outputs in run_eval.py in the following way:
def transform_dataset_inputs(inputs: dict) -> dict:
    """Transform LangSmith dataset inputs to match the agent's input schema before invoking the agent."""
    # see the `Example input` section for reference on what the dataset inputs look like
    # add any other keys your agent expects to the returned dict
    return {"my_agent_key": inputs["company"]}


def transform_agent_outputs(outputs: dict) -> dict:
    """Transform agent outputs to match the LangSmith dataset output schema."""
    # see the `Example output` section for reference on what the output should look like
    return {"info": outputs["my_agent_output_key"]}
transform_dataset_inputs will be applied to LangSmith dataset inputs before invoking the agent, and transform_agent_outputs will be applied to the agent's response before it's compared to the expected output in the LangSmith eval dataset.
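To make the order of operations concrete, the sketch below wires the two transforms around a stand-in agent. StubAgent and run_agent are hypothetical names used only for illustration. The stub mimics nothing more than the invoke method of RemoteGraph, and the actual wiring in run_eval.py may differ.

```python
# Illustrative only: StubAgent and run_agent are hypothetical, and the keys
# "my_agent_key" / "my_agent_output_key" are the placeholders used above.

def transform_dataset_inputs(inputs: dict) -> dict:
    """Map dataset inputs onto the agent's input schema."""
    return {"my_agent_key": inputs["company"]}


def transform_agent_outputs(outputs: dict) -> dict:
    """Map the agent's response onto the dataset output schema."""
    return {"info": outputs["my_agent_output_key"]}


class StubAgent:
    """Stand-in for a deployed agent; only provides invoke()."""

    def invoke(self, inputs: dict) -> dict:
        # A real agent would research the company; the stub echoes its name.
        return {"my_agent_output_key": {"name": inputs["my_agent_key"]}}


def run_agent(agent, dataset_inputs: dict) -> dict:
    """Apply the input transform, invoke the agent, apply the output transform."""
    agent_inputs = transform_dataset_inputs(dataset_inputs)
    raw_outputs = agent.invoke(agent_inputs)
    return transform_agent_outputs(raw_outputs)


result = run_agent(StubAgent(), {"company": "LangChain", "extraction_schema": {}})
print(result)  # {'info': {'name': 'LangChain'}}
```

The value returned by run_agent has the same top-level "info" key as the example output, which is the shape the evaluator compares against the reference.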
Running evals
First, make sure you have created the dataset as described in the Using the dataset section.
To evaluate the agent, run the startup_data_enrichment/run_eval.py script. This will create a new experiment in LangSmith for the dataset mentioned above.
By default this will use the Startup Data Enrichment dataset & Company mAIstro agent by LangChain.
python startup_data_enrichment/run_eval.py --experiment-prefix "My custom prefix"
You can pass the following parameters to customize the evaluation:
--dataset-name: Name of the dataset to evaluate against. Defaults to the Startup Data Enrichment dataset.
--graph-id: Graph ID of the agent to evaluate. Defaults to company_maistro.
--agent-url: URL of the deployed agent to evaluate. Defaults to the Company mAIstro deployment.
--experiment-prefix: Prefix for the experiment name.
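For orientation, here is a minimal argparse sketch of how these four flags could be declared. It is an assumption about the shape of the command-line interface, not a copy of run_eval.py. The default for --agent-url is left as None because this README does not give the URL of the Company mAIstro deployment.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(
        description="Evaluate an agent on the startup data enrichment dataset."
    )
    parser.add_argument(
        "--dataset-name",
        default="Startup Data Enrichment",
        help="Name of the dataset to evaluate against.",
    )
    parser.add_argument(
        "--graph-id",
        default="company_maistro",
        help="Graph ID of the agent to evaluate.",
    )
    parser.add_argument(
        "--agent-url",
        default=None,  # placeholder: the real default deployment URL is not stated here
        help="URL of the deployed agent to evaluate.",
    )
    parser.add_argument(
        "--experiment-prefix",
        default=None,
        help="Prefix for the experiment name.",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--experiment-prefix", "My custom prefix", "--agent-url", "http://localhost:8123"]
)
print(args.dataset_name, args.graph_id, args.agent_url, args.experiment_prefix)
```

argparse converts the dashes in each flag name to underscores, so --dataset-name is read back as args.dataset_name.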
Testing the agent locally
Import agent
You can import the compiled LangGraph graph object corresponding to your agent and use that as agent_graph in run_eval.py instead of RemoteGraph. Then you can run the evaluation script as usual. The --graph-id and --agent-url parameters will be ignored.
Run local LangGraph server
You can test the agent locally using the LangGraph CLI. From the directory that contains the langgraph.json configuration file, run:
langgraph dev
This will start a local server that you can interact with using RemoteGraph.
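Then pass the local server's URL as the --agent-url parameter and run the evaluation script as before. Use the address that the server prints on startup. The port in the command below is an example and may differ on your machine: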
python startup_data_enrichment/run_eval.py --experiment-prefix "My custom prefix" --agent-url http://localhost:8123