docs: update docs to unified interface

Clelia (Astra) Bertelli
2026-02-13 17:27:44 +01:00
parent e98ed6f241
commit 993bc6a5a8
4 changed files with 105 additions and 105 deletions
+44 -43
@@ -2,12 +2,12 @@
> Semantic search and document parsing tools for the command line
A collection of high-performance CLI tools for document processing and semantic search, built with Rust for speed and reliability.
A high-performance CLI tool for document processing and semantic search, built with Rust for speed and reliability.
- **`parse`** - Parse documents (PDF, DOCX, etc.) using, by default, the LlamaParse API into markdown format
- **`search`** - Local semantic keyword search using multilingual embeddings with cosine similarity matching and per-line context matching
- **`ask`** - AI agent with search and read tools for answering questions over document collections (defaults to OpenAI, but see the [config section](#configuration) to learn more about connecting to any OpenAI-Compatible API)
- **`workspace`** - Workspace management for accelerating search over large collections
- **`semtools parse`** - Parse documents (PDF, DOCX, etc.) using, by default, the LlamaParse API into markdown format
- **`semtools search`** - Local semantic keyword search using multilingual embeddings with cosine similarity matching and per-line context matching
- **`semtools ask`** - AI agent with search and read tools for answering questions over document collections (defaults to OpenAI, but see the [config section](#configuration) to learn more about connecting to any OpenAI-Compatible API)
- **`semtools workspace`** - Workspace management for accelerating search over large collections
**NOTE:** By default, `parse` uses LlamaParse as a backend. Get your API key today for free at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai). `search` and `workspace` remain local-only. `ask` requires an OpenAI API key.
@@ -25,7 +25,8 @@ A collection of high-performance CLI tools for document processing and semantic
Prerequisites:
- For the `parse` tool: LlamaIndex Cloud API key
- For the `parse` subcommand: LlamaIndex Cloud API key
- For the `ask` subcommand: OpenAI API key
Install:
@@ -53,41 +54,41 @@ Basic Usage:
```bash
# Parse some files
parse my_dir/*.pdf
semtools parse my_dir/*.pdf
# Search some (text-based) files
search "some keywords" *.txt --max-distance 0.3 --n-lines 5
semtools search "some keywords" *.txt --max-distance 0.3 --n-lines 5
# Ask questions about your documents using an AI agent
ask "What are the main findings?" papers/*.txt
semtools ask "What are the main findings?" papers/*.txt
# Combine parsing and search
parse my_docs/*.pdf | xargs search "API endpoints"
semtools parse my_docs/*.pdf | xargs semtools search "API endpoints"
# Ask a question to a set of files
ask "Some question?" *.txt
semtools ask "Some question?" *.txt
# Combine parsing with the ask agent
parse research_papers/*.pdf | xargs ask "Summarize the key methodologies"
semtools parse research_papers/*.pdf | xargs semtools ask "Summarize the key methodologies"
# Ask based on stdin content
cat README.md | ask "How do I install SemTools?"
cat README.md | semtools ask "How do I install SemTools?"
```
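`xargs` runs whatever program name follows it, so every subcommand placed after `xargs` needs the `semtools` prefix too. A quick way to check the command line `xargs` will build is a dry run with `echo`. This sketch uses placeholder file names (`a.md`, `b.md`) and never calls `semtools`:

```bash
# Dry run: print the command xargs would execute instead of running it.
# a.md and b.md are placeholders standing in for the paths printed by `semtools parse`.
printf 'a.md\nb.md\n' | xargs echo semtools search "API endpoints"
# prints: semtools search API endpoints a.md b.md
```

`echo` drops the quotes around the query, but the argument boundaries are the same; removing `echo` runs the real pipeline.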
Advanced Usage:
```bash
# Combine with grep for exact-match pre-filtering and distance thresholding
parse *.pdf | xargs cat | grep -i "error" | search "network error" --max-distance 0.3
semtools parse *.pdf | xargs cat | grep -i "error" | semtools search "network error" --max-distance 0.3
# Pipeline with content search (note the 'xargs' on search to search files instead of stdin)
find . -name "*.md" | xargs parse | xargs search "installation"
find . -name "*.md" | xargs semtools parse | xargs semtools search "installation"
# Combine with grep for filtering (grep could be before or after parse/search!)
parse docs/*.pdf | xargs search "API" | grep -A5 "authentication"
semtools parse docs/*.pdf | xargs semtools search "API" | grep -A5 "authentication"
# Save search results from stdin search
parse report.pdf | xargs cat | search "summary" > results.txt
semtools parse report.pdf | xargs cat | semtools search "summary" > results.txt
```
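Whether `search` receives file paths or a `cat`-ed stream changes what its results can point back to. The sketch below uses plain `grep -n` as a stand-in for `semtools search`, assuming only that search, like grep, treats stdin as one continuous stream; the temporary files are throwaway examples:

```bash
# grep -n stands in for `semtools search`; the files are throwaway examples.
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/one.md"
printf 'gamma\n' > "$tmp/two.md"

# Concatenated stream (like `xargs cat | semtools search ...`):
# line numbers run across both files and the file names are lost.
cat "$tmp/one.md" "$tmp/two.md" | grep -n gamma    # prints: 3:gamma

# Paths as arguments (like `xargs semtools search ...`):
# each match keeps its own file name and per-file line number.
grep -n gamma "$tmp/one.md" "$tmp/two.md"          # prints: <tmp dir>/two.md:1:gamma

rm -r "$tmp"
```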
Using Workspaces:
@@ -95,7 +96,7 @@ Using Workspaces:
```bash
# Create or select a workspace
# Workspaces are stored in ~/.semtools/workspaces/
workspace use my-workspace
semtools workspace use my-workspace
> Workspace 'my-workspace' configured.
> To activate it, run:
> export SEMTOOLS_WORKSPACE=my-workspace
@@ -107,17 +108,17 @@ export SEMTOOLS_WORKSPACE=my-workspace
# All search commands will now use the workspace for caching embeddings
# The initial command is used to initialize the workspace
search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
# If documents change, they are automatically re-embedded and cached
echo "some new content" > ./some_large_dir/some_file.txt
search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
# If documents are removed, you can run prune to clean up stale files
workspace prune
semtools workspace prune
# You can see the stats of a workspace at any time
workspace status
semtools workspace status
> Active workspace: arxiv
> Root: /Users/loganmarkewich/.semtools/workspaces/arxiv
> Documents: 3000
@@ -127,10 +128,10 @@ workspace status
## CLI Help
```bash
$ parse --help
$ semtools parse --help
A CLI tool for parsing documents using various backends
Usage: parse [OPTIONS] <FILES>...
Usage: semtools parse [OPTIONS] <FILES>...
Arguments:
<FILES>... Files to parse
@@ -140,33 +141,32 @@ Options:
-b, --backend <BACKEND> The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
-v, --verbose Verbose output while parsing
-h, --help Print help
-V, --version Print version
```
```bash
$ search --help
$ semtools search --help
A CLI tool for fast semantic keyword search
Usage: search [OPTIONS] <QUERY> [FILES]...
Usage: semtools search [OPTIONS] <QUERY> [FILES]...
Arguments:
<QUERY> Query to search for (positional argument)
[FILES]... Files or directories to search
[FILES]... Files to search, optional if using stdin
Options:
-n, --n-lines <N_LINES> How many lines before/after to return as context [default: 3]
--top-k <TOP_K> The top-k files or texts to return (ignored if max_distance is set) [default: 3]
-m, --max-distance <MAX_DISTANCE> Return all results with distance below this threshold (0.0+)
-i, --ignore-case Perform case-insensitive search (default is false)
-j, --json Output results in JSON format
-h, --help Print help
-V, --version Print version
```
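A value for `--max-distance` is easier to pick once you see the scale. A minimal sketch, assuming the reported distance is cosine distance (1 minus cosine similarity, in line with the "cosine similarity matching" described for `search`); the 2-D vectors are made up and stand in for real embeddings:

```bash
# Cosine distance between two toy vectors: 1 - (a.b) / (|a| * |b|).
# Assumes semtools' distance is 1 - cosine similarity; the vectors are made up.
cosine_distance() {
  # $1 and $2 are comma-separated vectors of equal length, e.g. "1,0"
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) { dot += x[i] * y[i]; na += x[i] ^ 2; nb += y[i] ^ 2 }
    printf "%.4f\n", 1 - dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine_distance "1,0" "1,0"   # same direction  -> 0.0000
cosine_distance "1,0" "1,1"   # 45 degrees apart -> 0.2929
cosine_distance "1,0" "0,1"   # orthogonal       -> 1.0000
```

Under that assumption, `--max-distance 0.3` would keep the second pair but not the third; lower values demand closer matches.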
```bash
$ workspace --help
$ semtools workspace --help
Manage semtools workspaces
Usage: workspace <COMMAND>
Usage: semtools workspace [OPTIONS] <COMMAND>
Commands:
use Use or create a workspace (prints export command to run)
@@ -175,15 +175,15 @@ Commands:
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
-j, --json Output results in JSON format
-h, --help Print help
```
```bash
$ ask --help
A CLI tool for fast semantic keyword search
$ semtools ask --help
A CLI tool for document-based question-answering
Usage: ask [OPTIONS] <QUERY> [FILES]...
Usage: semtools ask [OPTIONS] <QUERY> [FILES]...
Arguments:
<QUERY> Query to prompt the agent with
@@ -194,8 +194,9 @@ Options:
--api-key <API_KEY> OpenAI API key (overrides config file and env var)
--base-url <BASE_URL> OpenAI base URL (overrides config file)
-m, --model <MODEL> Model to use for the agent (overrides config file)
--api-mode <API_MODE> API mode to use: 'chat' or 'responses' (overrides config file)
-j, --json Output results in JSON or text format
-h, --help Print help
-V, --version Print version
```
## Configuration
@@ -260,11 +261,11 @@ Configuration values are resolved in the following priority order (highest to lo
This allows you to set common defaults in the config file while overriding them on a per-command basis when needed.
### Tool-Specific Configuration
### Subcommand-Specific Configuration
#### Parse Tool
#### Parse Subcommand
The `parse` tool requires a LlamaParse API key. Get your free API key at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai).
The `parse` subcommand requires a LlamaParse API key. Get your free API key at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai).
Configuration options:
- `api_key`: Your LlamaParse API key
@@ -273,9 +274,9 @@ Configuration options:
- `parse_kwargs`: Additional parsing parameters
- `check_interval`, `max_timeout`, `max_retries`, `retry_delay_ms`, `backoff_multiplier`: Retry and timeout settings
#### Ask Tool
#### Ask Subcommand
The `ask` tool requires an OpenAI API key for the agent's LLM.
The `ask` subcommand requires an OpenAI API key for the agent's LLM.
Configuration options:
- `api_key`: Your OpenAI API key
@@ -285,7 +286,7 @@ Configuration options:
You can also override these per-command:
```bash
ask "What is this about?" docs/*.txt --model gpt-4o --api-key sk-...
semtools ask "What is this about?" docs/*.txt --model gpt-4o --api-key sk-...
```
## Agent Use Case Examples
+45 -45
@@ -1,58 +1,58 @@
# Augmented CLI Tooling
If executing bash commands, you have three very helpful utilities installed
If executing bash commands, you have three very helpful subcommands installed within `semtools`:
- `parse` -- converts any non-grep-able format into markdown and outputs a filepath for a converted markdown file for every input file to stdout
- `search` -- performs a search using static embeddings on either stdin or a list of files (very similar to grep). Works best with keyword-based search queries. Only works with text-based files, so it may require the `parse` subcommand to help preprocess into markdown.
- `workspace` -- workspace management for accelerating search over large collections
By combining these subcommands with other CLI commands, you can search large numbers of files efficiently while handling documents in various formats. Both `parse` and `search` can scale to hundreds of thousands of documents.
## Parse CLI Help
## Parse Subcommand Help
```bash
parse --help
$ semtools parse --help
A CLI tool for parsing documents using various backends
Usage: parse [OPTIONS] <FILES>...
Usage: semtools parse [OPTIONS] <FILES>...
Arguments:
<FILES>... Files to parse
Options:
-c, --parse-config <PARSE_CONFIG> Path to the config file. Defaults to ~/.parse_config.json
-b, --backend <BACKEND> The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
-h, --help Print help
-V, --version Print version
-c, --config <CONFIG> Path to the config file. Defaults to ~/.semtools_config.json
-b, --backend <BACKEND> The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
-v, --verbose Verbose output while parsing
-h, --help Print help
```
## Search CLI Help
## Search Subcommand Help
```bash
search --help
$ semtools search --help
A CLI tool for fast semantic keyword search
Usage: search [OPTIONS] <QUERY> [FILES]...
Usage: semtools search [OPTIONS] <QUERY> [FILES]...
Arguments:
<QUERY> Query to search for (positional argument)
[FILES]... Files or directories to search
[FILES]... Files to search, optional if using stdin
Options:
-n, --n-lines <N_LINES> How many lines before/after to return as context [default: 3]
--top-k <TOP_K> The top-k files or texts to return (ignored if max_distance is set) [default: 3]
-m, --max-distance <MAX_DISTANCE> Return all results with distance below this threshold (0.0+)
-i, --ignore-case Perform case-insensitive search (default is false)
-j, --json Output results in JSON format
-h, --help Print help
-V, --version Print version
```
## Workspaces CLI Help
## Workspaces Subcommand Help
```bash
workspace --help
$ semtools workspace --help
Manage semtools workspaces
Usage: workspace <COMMAND>
Usage: semtools workspace [OPTIONS] <COMMAND>
Commands:
use Use or create a workspace (prints export command to run)
@@ -61,55 +61,55 @@ Commands:
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
-j, --json Output results in JSON format
-h, --help Print help
```
## Common Usage Patterns
Here are common usage patterns with the `semtools` subcommands:
### Using Parse and Search
```bash
# Parse a PDF and search for specific content
parse document.pdf | xargs cat | search "error handling"
semtools parse document.pdf | xargs cat | semtools search "error handling"
# Search within many files after parsing
parse my_docs/*.pdf | xargs search "API endpoints"
semtools parse my_docs/*.pdf | xargs semtools search "API endpoints"
# Search with custom context lines and a distance threshold
search "machine learning" *.txt --n-lines 5 --max-distance 0.3
semtools search "machine learning" *.txt --n-lines 5 --max-distance 0.3
# Search from stdin
echo "some text content" | search "content"
echo "some text content" | semtools search "content"
# Parse multiple documents
parse report.pdf data.xlsx presentation.pptx
semtools parse report.pdf data.xlsx presentation.pptx
# Chain parsing with semantic search
parse *.pdf | xargs search "financial projections" --n-lines 3
semtools parse *.pdf | xargs semtools search "financial projections" --n-lines 3
# Search with distance threshold (lower = more similar)
parse document.pdf | xargs cat | search "revenue" --max-distance 0.2
semtools parse document.pdf | xargs cat | semtools search "revenue" --max-distance 0.2
# Search multiple files directly
search "error handling" src/*.rs --top-k 5
semtools search "error handling" src/*.rs --top-k 5
# Combine with grep for exact-match pre-filtering and distance thresholding
parse *.pdf | xargs cat | grep -i "error" | search "network error" --max-distance 0.3
semtools parse *.pdf | xargs cat | grep -i "error" | semtools search "network error" --max-distance 0.3
# Pipeline with content search (note the 'xargs' on search to search files instead of stdin)
find . -name "*.md" | xargs parse | xargs search "installation"
find . -name "*.md" | xargs semtools parse | xargs semtools search "installation"
```
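If existing scripts or prompts still call the old standalone names, one possible stopgap is a set of shell functions that forward to the unified binary. These wrappers are a hypothetical convenience, not something `semtools` ships. `xargs` and `find -exec` launch programs directly and cannot see shell functions, so pipelines such as `... | xargs search` must still be rewritten to `xargs semtools search`:

```bash
# Hypothetical compatibility shims: forward the old standalone command names
# to the unified `semtools` binary. They work for direct calls in an interactive
# shell or a sourced script, but NOT for `xargs search`, because xargs
# executes programs found on PATH and never sees shell functions.
parse()     { semtools parse "$@"; }
search()    { semtools search "$@"; }
ask()       { semtools ask "$@"; }
workspace() { semtools workspace "$@"; }
```

With these loaded, `search "error handling" src/*.rs --top-k 5` runs `semtools search "error handling" src/*.rs --top-k 5`.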
### Using with Workspaces
```bash
# Create or select a workspace
# Workspaces are stored in ~/.semtools/workspaces/
workspace use my-workspace
semtools workspace use my-workspace
> Workspace 'my-workspace' configured.
> To activate it, run:
> export SEMTOOLS_WORKSPACE=my-workspace
@@ -121,44 +121,44 @@ export SEMTOOLS_WORKSPACE=my-workspace
# All search commands will now use the workspace for caching embeddings
# The initial command is used to initialize the workspace
search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
# If documents change, they are automatically re-embedded and cached
echo "some new content" > ./some_large_dir/some_file.txt
search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
# A workspace example if you are using with parse
# create a workspace
workspace use my-workspace2
semtools workspace use my-workspace2
export SEMTOOLS_WORKSPACE=my-workspace2
# parse files, and then search over the parsed files, and cache the file embeddings
parse *.pdf | xargs search "financial projections" --n-lines 3
semtools parse *.pdf | xargs semtools search "financial projections" --n-lines 3
# If you run a different query (see options a and b) over the same set of files, search will operate
# over the cached file embeddings
# option a - parse won't rerun since the files are already cached
parse *.pdf | xargs search "balance sheet" --n-lines 3
semtools parse *.pdf | xargs semtools search "balance sheet" --n-lines 3
# option b - run search directly over the parse cache
xargs search "balance sheet" /Users/jerryliu/.parse/*.pdf.md --n-lines 3
semtools search "balance sheet" ~/.parse/*.pdf.md --n-lines 3
# If documents are removed, you can run prune to clean up stale files
workspace prune
semtools workspace prune
# You can see the stats of a workspace at any time
workspace status
semtools workspace status
> Active workspace: arxiv
> Root: /Users/loganmarkewich/.semtools/workspaces/arxiv
> Documents: 3000
> Index: Yes (IVF_PQ)
```
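The cache reuse described in options a and b can be scripted. A minimal sketch, assuming a workspace is already active and that `paths.txt` (a hypothetical file name) holds the markdown paths printed by an earlier `semtools parse` run; the helper name and the `--n-lines 3` value are illustrative:

```bash
# Hypothetical helper: run several queries over the same parsed files.
# With SEMTOOLS_WORKSPACE set, only the first query should need to embed
# the files; later queries can reuse the cached embeddings.
# Usage: multi_search paths.txt "balance sheet" "financial projections"
multi_search() {
  paths_file=$1
  shift
  for query in "$@"; do
    echo "=== $query ==="
    # Feed the saved paths to search as file arguments (not as stdin content).
    xargs semtools search "$query" --n-lines 3 < "$paths_file"
  done
}
```

Saving the output once with `semtools parse *.pdf > paths.txt` avoids re-running `parse` for every query.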
## Tips for using these tools
- If you have run / plan on running repeated `search` queries over the same file or set of files, you SHOULD create a workspace (`workspace use`) before running parse/search commands - otherwise you will be re-embedding the same document collections from scratch every time. Make sure the environment variable is set before downstream commands.
- Before you create a workspace, you can check current workspace through `workspace status` which will also give the directory where all workspaces are stored.
- You can choose to add a new workspace or prune an existing one if you are changing to a different collection of files. You can check the status through `workspace status`.
- If you have run, or plan to run, repeated `search` queries over the same file or set of files, you SHOULD create a workspace (`semtools workspace use`) before running parse/search commands; otherwise you will re-embed the same document collections from scratch every time. Make sure the environment variable is set before running downstream commands.
- Before you create a workspace, you can check the current workspace with `semtools workspace status`, which also shows the directory where all workspaces are stored.
- If you are switching to a different collection of files, you can create a new workspace or prune an existing one. You can check the status with `semtools workspace status`.
- `parse` always outputs the paths of parsed files to stdout. These parsed files are the markdown versions of the original files (for example, a PDF or DOCX file parsed into markdown).
- ALWAYS call `parse` first when working with PDF (or similar) formats so that you get the paths to the markdown versions of those files.
- `search` only works with text-based files (like markdown). A common pattern is to call `parse` first and then either pass the resulting files to `search` or `cat` them and search from stdin.
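To enforce the tip about setting the environment variable before downstream commands, a script or agent can check `SEMTOOLS_WORKSPACE` first. A minimal sketch; the function name is hypothetical, and it only checks that the variable is non-empty, not that the workspace actually exists:

```bash
# Hypothetical guard: refuse to run repeated searches without an active workspace,
# since otherwise every run re-embeds the documents from scratch.
require_workspace() {
  if [ -z "${SEMTOOLS_WORKSPACE:-}" ]; then
    echo "SEMTOOLS_WORKSPACE is not set. Run 'semtools workspace use <name>' and export the variable it prints." >&2
    return 1
  fi
  echo "Using workspace: $SEMTOOLS_WORKSPACE"
}

# Example: only parse and search if a workspace is active.
# require_workspace && semtools parse *.pdf | xargs semtools search "balance sheet" --n-lines 3
```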
+8 -8
@@ -42,10 +42,10 @@ In this example, I have a folder of 900+ papers (as PDFs) from an AI conference.
First, I parse the PDFs (which caches them to disk at `~/.parse`):
```bash
parse ./papers
semtools parse ./papers
```
Next, I invoke `claude` and ask it to summarize its takes on aspects of the papers. Using `parse`, `search`, and other common CLI commands, Claude is able to gather specific information from all of the papers.
Next, I invoke `claude` and ask it to summarize its takes on aspects of the papers. Using `semtools parse`, `semtools search`, and other common CLI commands, Claude is able to gather specific information from all of the papers.
Below is the full trace from Claude Code, which demonstrates its ability to use `semtools` subcommands like `parse` and `search` to gather information from your files.
@@ -82,7 +82,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use
⏺ I can see there are 910 ACL 2023 papers in PDF format. Now I'll parse all these PDFs to make them searchable, then look for papers
discussing LLMs and evaluations.
⏺ Bash(parse *.pdf)
⏺ Bash(semtools parse *.pdf)
⎿  /Users/loganmarkewich/.parse/2023.acl-long.0.pdf.md
/Users/loganmarkewich/.parse/2023.acl-long.1.pdf.md
/Users/loganmarkewich/.parse/2023.acl-long.10.pdf.md
@@ -96,7 +96,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use
⏺ Great! Now I have parsed all the PDFs. Let me search for papers related to LLMs and evaluations using the search tool.
⏺ Bash(parse *.pdf | xargs cat | search "large language model, evaluation, assessment, benchmark" --n-lines 5 --max-distance 0.4)
⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "large language model, evaluation, assessment, benchmark" --n-lines 5 --max-distance 0.4)
⎿  <stdin>:447094::447105 (0.16900031210055844)
447095: for Computational Linguistics.
447096:
@@ -104,7 +104,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use
⏺ Now let me search for more specific terms to get a broader view of LLM evaluation research:
⏺ Bash(parse *.pdf | xargs cat | search "GPT, BERT, language model evaluation, performance evaluation, model comparison" --n-lines 5
⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "GPT, BERT, language model evaluation, performance evaluation, model comparison" --n-lines 5
--max-distance 0.35)
⎿  <stdin>:244769::244780 (0.2719643874147169)
244770: <td>
@@ -113,7 +113,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use
⏺ Now let me search for specific types of evaluation studies and findings:
⏺ Bash(parse *.pdf | xargs cat | search "robustness, bias, fairness, hallucination, reliability" --n-lines 4 --max-distance 0.3)
⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "robustness, bias, fairness, hallucination, reliability" --n-lines 4 --max-distance 0.3)
⎿  Using cached result for: 2023.acl-long.0.pdf
Using cached result for: 2023.acl-long.1.pdf
Using cached result for: 2023.acl-long.10.pdf
@@ -129,7 +129,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use
⏺ Let me search for paper titles and abstracts to get a better understanding of the specific papers discussing LLM evaluation:
⏺ Bash(parse *.pdf | xargs cat | search "Abstract" --n-lines 10 | grep -A 10 -B 2 -i "large language model\|evaluation\|benchmark\|GPT\|BERT")
⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "Abstract" --n-lines 10 | grep -A 10 -B 2 -i "large language model\|evaluation\|benchmark\|GPT\|BERT")
⎿  <stdin>:94456::94477 (0.00000000009809730805202874)
94457: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought
94458: Reasoning by Large Language Models
@@ -203,5 +203,5 @@ This run cost $0.68 and took about 4 minutes. Not bad for researching across 900
## Tips
- Your `CLAUDE.md` / `AGENTS.md` / etc. is critical to ensure your agent knows about `semtools`. Giving it good examples of usage (maybe from this guide) is a good idea.
- Parsing files can take a while. I recommend using `parse` ahead of time if you intend to give your agent a lot of files to work with that need to be parsed. Results will be cached to disk at `~/.parse`.
- Parsing files can take a while. I recommend using `semtools parse` ahead of time if you intend to give your agent a lot of files to work with that need to be parsed. Results will be cached to disk at `~/.parse`.
- Giving your agent any context on the files it has access to will also help it understand how to help you with them.
+8 -9
@@ -59,7 +59,7 @@ Here's an example where I prompt it to interact with a directory of 900+ PDF fil
First, I parse the PDFs (which caches them to disk at `~/.parse`):
```bash
parse ./papers
semtools parse ./papers
```
Then I can write a script to call an agent to interact with the files and search for information:
@@ -109,13 +109,13 @@ Which outputs:
$ python ./agent.py
Calling tool execute_bash({'command': 'find . -name "*.pdf" | head -10'})
Calling tool execute_bash({'command': 'find . -name "*.pdf" | wc -l'})
Calling tool execute_bash({'command': 'find . -name "*.pdf" | xargs parse'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "large language model, LLM, evaluation, benchmark" --top-k 10 --n-lines 5'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "GPT, ChatGPT, instruction tuning, evaluation benchmark, model evaluation" --top-k 15 --n-lines 4'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "human evaluation, automatic evaluation, BLEU, ROUGE, BERTScore, evaluation metric" --top-k 10 --n-lines 4'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "scaling laws, emergent abilities, few-shot learning, in-context learning, prompt engineering" --top-k 10 --n-lines 4'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "title, abstract" --top-k 5 --n-lines 10 | head -50'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "# " --top-k 20 --n-lines 2 | grep -E "(LLM|evaluation|benchmark|GPT|language model)" | head -20'})
Calling tool execute_bash({'command': 'find . -name "*.pdf" | xargs semtools parse'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "large language model, LLM, evaluation, benchmark" --top-k 10 --n-lines 5'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "GPT, ChatGPT, instruction tuning, evaluation benchmark, model evaluation" --top-k 15 --n-lines 4'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "human evaluation, automatic evaluation, BLEU, ROUGE, BERTScore, evaluation metric" --top-k 10 --n-lines 4'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "scaling laws, emergent abilities, few-shot learning, in-context learning, prompt engineering" --top-k 10 --n-lines 4'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "title, abstract" --top-k 5 --n-lines 10 | head -50'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "# " --top-k 20 --n-lines 2 | grep -E "(LLM|evaluation|benchmark|GPT|language model)" | head -20'})
Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | head -5 | xargs -I {} sh -c \'echo "=== {} ===" && head -20 "{}"\''})
Based on my analysis of the ACL 2023 papers, I can now provide a comprehensive summary of the key findings about LLMs and evaluations:
@@ -182,4 +182,3 @@ The ACL 2023 papers collectively suggest that:
The research community is moving toward more nuanced, multi-dimensional evaluation approaches that better capture the complexities of language understanding and generation in real-world applications.
```