[PR #92] [MERGED] Added benchmarks to typewriter 1, multiverse, relational data, update evaluators #101

Closed
opened 2026-02-16 00:18:12 -05:00 by yindo · 0 comments

📋 Pull Request Information

Original PR: https://github.com/langchain-ai/langchain-benchmarks/pull/92
Author: @eyurtsev
Created: 11/28/2023
Status: Merged
Merged: 11/29/2023
Merged by: @eyurtsev

Base: main ← Head: eugene/adding_comparison_code


📝 Commits (10+)

📊 Changes

9 files changed (+2239 additions, -1111 deletions)

📝 docs/source/notebooks/tool_usage/multiverse_math.ipynb (+715 -377)
📝 docs/source/notebooks/tool_usage/relational_data.ipynb (+309 -76)
📝 docs/source/notebooks/tool_usage/typewriter_1.ipynb (+1044 -168)
📝 docs/source/notebooks/tool_usage/typewriter_26.ipynb (+66 -462)
📝 langchain_benchmarks/tool_usage/agents.py (+4 -3)
📝 langchain_benchmarks/tool_usage/evaluators.py (+65 -13)
➕ langchain_benchmarks/tool_usage/prompts.py (+24 -0)
📝 langchain_benchmarks/tool_usage/tasks/multiverse_math.py (+3 -3)
📝 poetry.lock (+9 -9)

📄 Description

  • Added benchmarks to typewriter 1, multiverse math, and relational data
  • Made the evaluator more configurable: it now grades multiverse math correctly and can skip grading `output` for typewriter tasks
  • Fixed examples in a dataset (the public dataset has already been updated)
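
To make the second bullet concrete, here is a minimal sketch of what a configurable evaluator with an optional output check could look like. It is not the code from `langchain_benchmarks/tool_usage/evaluators.py`: the names `EvalConfig`, `evaluate_run`, `grade_output`, and `numeric_tolerance` are hypothetical, and the tolerance-based numeric comparison is only an assumption about what grading math answers "correctly" might involve.

```python
from dataclasses import dataclass
from math import isclose
from typing import Any, Dict, Optional, Sequence


@dataclass
class EvalConfig:
    """Hypothetical evaluator settings (not the real langchain_benchmarks API)."""

    # When False, the final answer is not graded at all; only the
    # sequence of tool calls is compared (e.g. for typewriter-style tasks).
    grade_output: bool = True
    # When set, outputs are compared as floats within this relative tolerance
    # instead of by exact equality (an assumption for math-style tasks).
    numeric_tolerance: Optional[float] = None


def evaluate_run(
    expected_steps: Sequence[str],
    actual_steps: Sequence[str],
    expected_output: Any,
    actual_output: Any,
    config: EvalConfig,
) -> Dict[str, bool]:
    """Compare tool-call order and, optionally, the final output."""
    results = {"correct_steps": list(expected_steps) == list(actual_steps)}

    if not config.grade_output:
        # Output grading is skipped entirely, so no output key is reported.
        return results

    if config.numeric_tolerance is not None:
        try:
            results["correct_output"] = isclose(
                float(actual_output),
                float(expected_output),
                rel_tol=config.numeric_tolerance,
            )
        except (TypeError, ValueError):
            # Non-numeric answers cannot match a numeric reference.
            results["correct_output"] = False
    else:
        results["correct_output"] = actual_output == expected_output

    return results
```

With a layout like this, a typewriter-style task would pass `EvalConfig(grade_output=False)` so that only the tool-call sequence is scored, while a math-style task would pass a tolerance so that small floating-point differences do not count as wrong answers.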

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (10+)

- [`c90e1c0`](https://github.com/langchain-ai/langchain-benchmarks/commit/c90e1c095dc23d1473bd349bc9ec6a7485d28c1b) Added simple comparison code
- [`a993020`](https://github.com/langchain-ai/langchain-benchmarks/commit/a99302011ec295eff54b861ff9374cf3dcaa5c67) fix bug in agents
- [`5577f0d`](https://github.com/langchain-ai/langchain-benchmarks/commit/5577f0dca723ae4d7dad576245457aa00e7640ff) fix bug in multiverse math
- [`7ece20a`](https://github.com/langchain-ai/langchain-benchmarks/commit/7ece20adaa7607a984b054119554d2d9b3a2a147) update relational data
- [`dd55bd5`](https://github.com/langchain-ai/langchain-benchmarks/commit/dd55bd51283965ea8a513734dd4e82a9f983b5ef) x
- [`ee2f579`](https://github.com/langchain-ai/langchain-benchmarks/commit/ee2f579eb13ed5371c79dde52c4fc412199934bd) x
- [`41094fa`](https://github.com/langchain-ai/langchain-benchmarks/commit/41094fa507ada35ce77776a014af8ad4fc4219b2) x
- [`422ed56`](https://github.com/langchain-ai/langchain-benchmarks/commit/422ed5616e22db1142e2295a333576477190b73c) x
- [`c7cdff5`](https://github.com/langchain-ai/langchain-benchmarks/commit/c7cdff5422698b241feab573d77bcab770750355) x
- [`6b44f30`](https://github.com/langchain-ai/langchain-benchmarks/commit/6b44f30e76dc75379179bf2f9da6184fb32d353c) x
yindo added the pull-request label 2026-02-16 00:18:12 -05:00
yindo closed this issue 2026-02-16 00:18:12 -05:00
Reference: langchain-ai/langchain-benchmarks#101