docs: update docs to unified interface

2026-06-30 21:38:00 -04:00 · 2026-02-13 17:27:44 +01:00
parent e98ed6f241
commit 993bc6a5a8
4 changed files with 105 additions and 105 deletions
@@ -2,12 +2,12 @@

 > Semantic search and document parsing tools for the command line

-A collection of high-performance CLI tools for document processing and semantic search, built with Rust for speed and reliability.
+A high-performance CLI tool for document processing and semantic search, built with Rust for speed and reliability.

-- **`parse`** - Parse documents (PDF, DOCX, etc.) using, by default, the LlamaParse API into markdown format
-- **`search`** - Local semantic keyword search using multilingual embeddings with cosine similarity matching and per-line context matching
-- **`ask`** - AI agent with search and read tools for answering questions over document collections (defaults to OpenAI, but see the [config section](#configuration) to learn more about connecting to any OpenAI-Compatible API)
-- **`workspace`** - Workspace management for accelerating search over large collections
+- **`semtools parse`** - Parse documents (PDF, DOCX, etc.) using, by default, the LlamaParse API into markdown format
+- **`semtools search`** - Local semantic keyword search using multilingual embeddings with cosine similarity matching and per-line context matching
+- **`semtools ask`** - AI agent with search and read tools for answering questions over document collections (defaults to OpenAI, but see the [config section](#configuration) to learn more about connecting to any OpenAI-Compatible API)
+- **`semtools workspace`** - Workspace management for accelerating search over large collections

 **NOTE:** By default, `parse` uses LlamaParse as a backend. Get your API key today for free at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai). `search` and `workspace` remain local-only. `ask` requires an OpenAI API key.

@@ -25,7 +25,8 @@ A collection of high-performance CLI tools for document processing and semantic

 Prerequisites:

-- For the `parse` tool: LlamaIndex Cloud API key
+- For the `parse` subcommand: LlamaIndex Cloud API key
+- For the `ask` subcommand: OpenAI API key

 Install:

@@ -53,41 +54,41 @@ Basic Usage:

 ```bash
 # Parse some files
-parse my_dir/*.pdf
+semtools parse my_dir/*.pdf

 # Search some (text-based) files
-search "some keywords" *.txt --max-distance 0.3 --n-lines 5
+semtools search "some keywords" *.txt --max-distance 0.3 --n-lines 5

 # Ask questions about your documents using an AI agent
-ask "What are the main findings?" papers/*.txt
+semtools ask "What are the main findings?" papers/*.txt

 # Combine parsing and search
-parse my_docs/*.pdf | xargs search "API endpoints"
+semtools parse my_docs/*.pdf | xargs semtools search "API endpoints"

 # Ask a question to a set of files
-ask "Some question?" *.txt 
+semtools ask "Some question?" *.txt 

 # Combine parsing with the ask agent
-parse research_papers/*.pdf | xargs ask "Summarize the key methodologies"
+semtools parse research_papers/*.pdf | xargs semtools ask "Summarize the key methodologies"

 # Ask based on stdin content
-cat README.md | ask "How do I install SemTools?"
+cat README.md | semtools ask "How do I install SemTools?"
 ```

 Advanced Usage:

 ```bash
 # Combine with grep for exact-match pre-filtering and distance thresholding
-parse *.pdf | xargs cat | grep -i "error" | search "network error" --max-distance 0.3
+semtools parse *.pdf | xargs cat | grep -i "error" | semtools search "network error" --max-distance 0.3

 # Pipeline with content search (note the 'xargs' on search to search files instead of stdin)
-find . -name "*.md" | xargs parse | xargs search "installation"
+find . -name "*.md" | xargs semtools parse | xargs semtools search "installation"

 # Combine with grep for filtering (grep could be before or after parse/search!)
-parse docs/*.pdf | xargs search "API" | grep -A5 "authentication"
+semtools parse docs/*.pdf | xargs semtools search "API" | grep -A5 "authentication"

 # Save search results from stdin search
-parse report.pdf | xargs cat | search "summary" > results.txt
+semtools parse report.pdf | xargs cat | semtools search "summary" > results.txt
 ```

 Using Workspaces:
@@ -95,7 +96,7 @@ Using Workspaces:
 ```bash
 # Create or select a workspace
 # Workspaces are stored in ~/.semtools/workspaces/
-workspace use my-workspace
+semtools workspace use my-workspace
 > Workspace 'my-workspace' configured.
 > To activate it, run:
 >   export SEMTOOLS_WORKSPACE=my-workspace
@@ -107,17 +108,17 @@ export SEMTOOLS_WORKSPACE=my-workspace

 # All search commands will now use the workspace for caching embeddings
 # The initial command is used to initialize the workspace
-search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
+semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10

 # If documents change, they are automatically re-embedded and cached
 echo "some new content" > ./some_large_dir/some_file.txt
-search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
+semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10

 # If documents are removed, you can run prune to clean up stale files
-workspace prune
+semtools workspace prune

 # You can see the stats of a workspace at any time
-workspace status
+semtools workspace status
 > Active workspace: arxiv
 > Root: /Users/loganmarkewich/.semtools/workspaces/arxiv
 > Documents: 3000
@@ -127,10 +128,10 @@ workspace status
 ## CLI Help

 ```bash
-$ parse --help
+$ semtools parse --help
 A CLI tool for parsing documents using various backends

-Usage: parse [OPTIONS] <FILES>...
+Usage: semtools parse [OPTIONS] <FILES>...

 Arguments:
  <FILES>...  Files to parse
@@ -140,33 +141,32 @@ Options:
  -b, --backend <BACKEND>  The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
  -v, --verbose            Verbose output while parsing
  -h, --help               Print help
-  -V, --version            Print version
 ```

 ```bash
-$ search --help
+$ semtools search --help
 A CLI tool for fast semantic keyword search

-Usage: search [OPTIONS] <QUERY> [FILES]...
+Usage: semtools search [OPTIONS] <QUERY> [FILES]...

 Arguments:
  <QUERY>     Query to search for (positional argument)
-  [FILES]...  Files or directories to search
+  [FILES]...  Files to search, optional if using stdin

 Options:
  -n, --n-lines <N_LINES>            How many lines before/after to return as context [default: 3]
      --top-k <TOP_K>                The top-k files or texts to return (ignored if max_distance is set) [default: 3]
  -m, --max-distance <MAX_DISTANCE>  Return all results with distance below this threshold (0.0+)
  -i, --ignore-case                  Perform case-insensitive search (default is false)
+  -j, --json                         Output results in JSON format
  -h, --help                         Print help
-  -V, --version                      Print version
 ```

 ```bash
-$ workspace --help
+$ semtools workspace --help
 Manage semtools workspaces

-Usage: workspace <COMMAND>
+Usage: semtools workspace [OPTIONS] <COMMAND>

 Commands:
  use     Use or create a workspace (prints export command to run)
@@ -175,15 +175,15 @@ Commands:
  help    Print this message or the help of the given subcommand(s)

 Options:
-  -h, --help     Print help
-  -V, --version  Print version
+  -j, --json  Output results in JSON format
+  -h, --help  Print help
 ```

 ```bash
-$ ask --help
-A CLI tool for fast semantic keyword search
+$ semtools ask --help
+A CLI tool for document-based question-answering

-Usage: ask [OPTIONS] <QUERY> [FILES]...
+Usage: semtools ask [OPTIONS] <QUERY> [FILES]...

 Arguments:
  <QUERY>     Query to prompt the agent with
@@ -194,8 +194,9 @@ Options:
      --api-key <API_KEY>    OpenAI API key (overrides config file and env var)
      --base-url <BASE_URL>  OpenAI base URL (overrides config file)
  -m, --model <MODEL>        Model to use for the agent (overrides config file)
+      --api-mode <API_MODE>  API mode to use: 'chat' or 'responses' (overrides config file)
+  -j, --json                 Output results in JSON or text format
  -h, --help                 Print help
-  -V, --version              Print version
 ```

 ## Configuration
@@ -260,11 +261,11 @@ Configuration values are resolved in the following priority order (highest to lo

 This allows you to set common defaults in the config file while overriding them on a per-command basis when needed.

-### Tool-Specific Configuration
+### Subcommand-Specific Configuration

-#### Parse Tool
+#### Parse Subcommand

-The `parse` tool requires a LlamaParse API key. Get your free API key at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai).
+The `parse` subcommand requires a LlamaParse API key. Get your free API key at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai).

 Configuration options:
 - `api_key`: Your LlamaParse API key
@@ -273,9 +274,9 @@ Configuration options:
 - `parse_kwargs`: Additional parsing parameters
 - `check_interval`, `max_timeout`, `max_retries`, `retry_delay_ms`, `backoff_multiplier`: Retry and timeout settings

-#### Ask Tool
+#### Ask Subcommand

-The `ask` tool requires an OpenAI API key for the agent's LLM.
+The `ask` subcommand requires an OpenAI API key for the agent's LLM.

 Configuration options:
 - `api_key`: Your OpenAI API key
@@ -285,7 +286,7 @@ Configuration options:

 You can also override these per-command:
 ```bash
-ask "What is this about?" docs/*.txt --model gpt-4o --api-key sk-...
+semtools ask "What is this about?" docs/*.txt --model gpt-4o --api-key sk-...
 ```

 ## Agent Use Case Examples
@@ -1,58 +1,58 @@
 # Augmented CLI Tooling

-If executing bash commands, you have three very helpful utilities installed
+If executing bash commands, you have three very helpful subcommands installed within `semtools`:
 - `parse` -- converts any non-grep-able format into markdown and writes to stdout the filepath of the converted markdown file for every input file
 - `search` -- performs a search using static embeddings on either stdin or a list of files (very similar to grep). Works best with keyword-based search queries. Only works with text-based files, so it may require the `parse` subcommand to preprocess them into markdown.
 - `workspace` - Workspace management for accelerating search over large collections

 Combined with other CLI commands, these subcommands let you search large numbers of files efficiently while handling documents in various formats. Both `parse` and `search` can scale to hundreds of thousands of documents.

-## Parse CLI Help
+## Parse Subcommand Help

 ```bash
-parse --help
+$ semtools parse --help
 A CLI tool for parsing documents using various backends

-Usage: parse [OPTIONS] <FILES>...
+Usage: semtools parse [OPTIONS] <FILES>...

 Arguments:
  <FILES>...  Files to parse

 Options:
-  -c, --parse-config <PARSE_CONFIG>  Path to the config file. Defaults to ~/.parse_config.json
-  -b, --backend <BACKEND>            The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
-  -h, --help                         Print help
-  -V, --version                      Print version
+  -c, --config <CONFIG>    Path to the config file. Defaults to ~/.semtools_config.json
+  -b, --backend <BACKEND>  The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
+  -v, --verbose            Verbose output while parsing
+  -h, --help               Print help
 ```

-## Search CLI Help
+## Search Subcommand Help

 ```bash
-search --help
+$ semtools search --help
 A CLI tool for fast semantic keyword search

-Usage: search [OPTIONS] <QUERY> [FILES]...
+Usage: semtools search [OPTIONS] <QUERY> [FILES]...

 Arguments:
  <QUERY>     Query to search for (positional argument)
-  [FILES]...  Files or directories to search
+  [FILES]...  Files to search, optional if using stdin

 Options:
  -n, --n-lines <N_LINES>            How many lines before/after to return as context [default: 3]
      --top-k <TOP_K>                The top-k files or texts to return (ignored if max_distance is set) [default: 3]
  -m, --max-distance <MAX_DISTANCE>  Return all results with distance below this threshold (0.0+)
  -i, --ignore-case                  Perform case-insensitive search (default is false)
+  -j, --json                         Output results in JSON format
  -h, --help                         Print help
-  -V, --version                      Print version
 ```

-## Workspaces CLI Help
+## Workspace Subcommand Help

 ```bash
-workspace --help
+$ semtools workspace --help
 Manage semtools workspaces

-Usage: workspace <COMMAND>
+Usage: semtools workspace [OPTIONS] <COMMAND>

 Commands:
  use     Use or create a workspace (prints export command to run)
@@ -61,55 +61,55 @@ Commands:
  help    Print this message or the help of the given subcommand(s)

 Options:
-  -h, --help     Print help
-  -V, --version  Print version
+  -j, --json  Output results in JSON format
+  -h, --help  Print help
 ```


 ## Common Usage Patterns

+Each tool below is invoked as a `semtools` subcommand (for example, `semtools parse` instead of `parse`):
+
 ### Using Parse and Search

 ```bash
 # Parse a PDF and search for specific content
-parse document.pdf | xargs cat | search "error handling"
+semtools parse document.pdf | xargs cat | semtools search "error handling"

 # Search within many files after parsing
-parse my_docs/*.pdf | xargs search "API endpoints"
+semtools parse my_docs/*.pdf | xargs semtools search "API endpoints"

 # Search with custom context and a distance threshold
-search "machine learning" *.txt --n-lines 5 --max-distance 0.3
+semtools search "machine learning" *.txt --n-lines 5 --max-distance 0.3

 # Search from stdin
-echo "some text content" | search "content"
+echo "some text content" | semtools search "content"

 # Parse multiple documents
-parse report.pdf data.xlsx presentation.pptx
+semtools parse report.pdf data.xlsx presentation.pptx

 # Chain parsing with semantic search
-parse *.pdf | xargs search "financial projections" --n-lines 3
+semtools parse *.pdf | xargs semtools search "financial projections" --n-lines 3

 # Search with distance threshold (lower = more similar)
-parse document.pdf | xargs cat | search "revenue" --max-distance 0.2
+semtools parse document.pdf | xargs cat | semtools search "revenue" --max-distance 0.2

 # Search multiple files directly
-search "error handling" src/*.rs --top-k 5
+semtools search "error handling" src/*.rs --top-k 5

 # Combine with grep for exact-match pre-filtering and distance thresholding
-parse *.pdf | xargs cat | grep -i "error" | search "network error" --max-distance 0.3
+semtools parse *.pdf | xargs cat | grep -i "error" | semtools search "network error" --max-distance 0.3

 # Pipeline with content search (note the 'xargs' on search to search files instead of stdin)
-find . -name "*.md" | xargs parse | xargs search "installation"
+find . -name "*.md" | xargs semtools parse | xargs semtools search "installation"
 ```

-
-
 ### Using with Workspaces

 ```bash
 # Create or select a workspace
 # Workspaces are stored in ~/.semtools/workspaces/
-workspace use my-workspace
+semtools workspace use my-workspace
 > Workspace 'my-workspace' configured.
 > To activate it, run:
 >   export SEMTOOLS_WORKSPACE=my-workspace
@@ -121,44 +121,44 @@ export SEMTOOLS_WORKSPACE=my-workspace

 # All search commands will now use the workspace for caching embeddings
 # The initial command is used to initialize the workspace
-search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
+semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10

 # If documents change, they are automatically re-embedded and cached
 echo "some new content" > ./some_large_dir/some_file.txt
-search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10
-
+semtools search "some keywords" ./some_large_dir/*.txt --n-lines 5 --top-k 10

 # A workspace example if you are using with parse
 # create a workspace
-workspace use my-workspace2
+semtools workspace use my-workspace2
 export SEMTOOLS_WORKSPACE=my-workspace2
+
 # parse files, and then search over the parsed files, and cache the file embeddings
-parse *.pdf | xargs search "financial projections" --n-lines 3
+semtools parse *.pdf | xargs semtools search "financial projections" --n-lines 3
+
 # If you run the command with a different query over the same set of files (see options a and b), search will operate
 # over the cached file embeddings
 # option a - parse won't rerun since the files are already cached
-parse *.pdf | xargs search "balance sheet" --n-lines 3
+semtools parse *.pdf | xargs semtools search "balance sheet" --n-lines 3
+
 # option b - run search directly over the parse cache
-xargs search "balance sheet" /Users/jerryliu/.parse/*.pdf.md --n-lines 3  
+semtools search "balance sheet" ~/.parse/*.pdf.md --n-lines 3

 # If documents are removed, you can run prune to clean up stale files
-workspace prune
+semtools workspace prune

 # You can see the stats of a workspace at any time
-workspace status
+semtools workspace status
 > Active workspace: arxiv
 > Root: /Users/loganmarkewich/.semtools/workspaces/arxiv
 > Documents: 3000
 > Index: Yes (IVF_PQ)
-
 ```

-
 ## Tips for using these tools

-- If you have run / plan on running repeated `search` queries over the same file or set of files, you SHOULD create a workspace (`workspace use`) before running parse/search commands - otherwise you will be re-embedding the same document collections from scratch every time. Make sure the environment variable is set before downstream commands.
-- Before you create a workspace, you can check current workspace through `workspace status` which will also give the directory where all workspaces are stored.
-- You can choose to add a new workspace or prune an existing one if you are changing to a different collection of files. You can check the status through `workspace status`.
+- If you have run, or plan to run, repeated `search` queries over the same file or set of files, you SHOULD create a workspace (`semtools workspace use`) before running `parse`/`search` commands; otherwise, you will re-embed the same document collections from scratch every time. Make sure `SEMTOOLS_WORKSPACE` is exported before running downstream commands.
+- Before you create a workspace, you can check the current workspace with `semtools workspace status`, which also shows the directory where workspaces are stored.
+- If you are switching to a different collection of files, you can create a new workspace or prune an existing one. You can check the result with `semtools workspace status`.
 - `parse` always writes the paths of parsed files to stdout. These parsed files are the markdown versions of their original files (for example, a PDF or DOCX file parsed into markdown).
 - ALWAYS call `parse` first when interacting with PDF (or similar) formats so that you can get the paths to the markdown versions of those files
 - `search` only works with text-based files (like markdown). It's a common pattern to first call `parse` and either feed files into `search` or cat files and search from stdin
@@ -42,10 +42,10 @@ In this example, I have a folder of 900+ papers (as PDFs) from an AI conference.
 First, I parse the PDFs (which caches them to disk at `~/.parse`):

 ```bash
-parse ./papers
+semtools parse ./papers
 ```

-Next, I invoke `claude` and ask it to summarize its takes on aspects of the papers. Using `parse`, `search`, and other common CLI commands, Claude is able to gather specific information from all of the papers.
+Next, I invoke `claude` and ask it to summarize its takes on aspects of the papers. Using `semtools parse`, `semtools search`, and other common CLI commands, Claude is able to gather specific information from all of the papers.

 Below is the full trace from Claude Code, which demonstrates its ability to use `semtools` subcommands like `parse` and `search` to gather information from your files.

@@ -82,7 +82,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use
 ⏺ I can see there are 910 ACL 2023 papers in PDF format. Now I'll parse all these PDFs to make them searchable, then look for papers
  discussing LLMs and evaluations.

-⏺ Bash(parse *.pdf)
+⏺ Bash(semtools parse *.pdf)
  ⎿  /Users/loganmarkewich/.parse/2023.acl-long.0.pdf.md 
     /Users/loganmarkewich/.parse/2023.acl-long.1.pdf.md
     /Users/loganmarkewich/.parse/2023.acl-long.10.pdf.md
@@ -96,7 +96,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use

 ⏺ Great! Now I have parsed all the PDFs. Let me search for papers related to LLMs and evaluations using the search tool.

-⏺ Bash(parse *.pdf | xargs cat | search "large language model, evaluation, assessment, benchmark" --n-lines 5 --max-distance 0.4)
+⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "large language model, evaluation, assessment, benchmark" --n-lines 5 --max-distance 0.4)
  ⎿  <stdin>:447094::447105 (0.16900031210055844)
     447095: for Computational Linguistics.
     447096:
@@ -104,7 +104,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use

 ⏺ Now let me search for more specific terms to get a broader view of LLM evaluation research:

-⏺ Bash(parse *.pdf | xargs cat | search "GPT, BERT, language model evaluation, performance evaluation, model comparison" --n-lines 5
+⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "GPT, BERT, language model evaluation, performance evaluation, model comparison" --n-lines 5
      --max-distance 0.35)
  ⎿  <stdin>:244769::244780 (0.2719643874147169)                                                             
     244770: <td>
@@ -113,7 +113,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use

 ⏺ Now let me search for specific types of evaluation studies and findings:

-⏺ Bash(parse *.pdf | xargs cat | search "robustness, bias, fairness, hallucination, reliability" --n-lines 4 --max-distance 0.3)
+⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "robustness, bias, fairness, hallucination, reliability" --n-lines 4 --max-distance 0.3)
  ⎿  Using cached result for: 2023.acl-long.0.pdf 
     Using cached result for: 2023.acl-long.1.pdf
     Using cached result for: 2023.acl-long.10.pdf
@@ -129,7 +129,7 @@ Below is the full trace from Claude-Code, which demonstrates it's ability to use

 ⏺ Let me search for paper titles and abstracts to get a better understanding of the specific papers discussing LLM evaluation:

-⏺ Bash(parse *.pdf | xargs cat | search "Abstract" --n-lines 10 | grep -A 10 -B 2 -i "large language model\|evaluation\|benchmark\|GPT\|BERT")
+⏺ Bash(semtools parse *.pdf | xargs cat | semtools search "Abstract" --n-lines 10 | grep -A 10 -B 2 -i "large language model\|evaluation\|benchmark\|GPT\|BERT")
  ⎿  <stdin>:94456::94477 (0.00000000009809730805202874)                    
     94457:   Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought
     94458:                 Reasoning by Large Language Models
@@ -203,5 +203,5 @@ This run cost $0.68 and took about 4 minutes. Not bad for researching across 900
 ## Tips

 - Your `CLAUDE.md` / `AGENTS.md` / etc. is critical to ensure your agent knows about `semtools`. Giving it good examples of usage (maybe from this guide) is a good idea.
-- Parsing files can take a while. I recommend using `parse` ahead of time if you intend to give your agent a lot of files to work with that need to be parsed. Results will be cached to disk at `~/.parse`.
+- Parsing files can take a while. I recommend using `semtools parse` ahead of time if you intend to give your agent a lot of files to work with that need to be parsed. Results will be cached to disk at `~/.parse`.
 - Giving your agent any context on the files it has access to will also help it understand how to help you with them.
@@ -59,7 +59,7 @@ Here's an example where I prompt it to interact with a directory of 900+ PDF fil
 First, I parse the PDFs (which caches them to disk at `~/.parse`):

 ```bash
-parse ./papers
+semtools parse ./papers
 ```

 Then I can write a script to call an agent to interact with the files and search for information:
@@ -109,13 +109,13 @@ Which outputs:
 $ python ./agent.py
 Calling tool execute_bash({'command': 'find . -name "*.pdf" | head -10'})
 Calling tool execute_bash({'command': 'find . -name "*.pdf" | wc -l'})
-Calling tool execute_bash({'command': 'find . -name "*.pdf" | xargs parse'})
-Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "large language model, LLM, evaluation, benchmark" --top-k 10 --n-lines 5'})
-Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "GPT, ChatGPT, instruction tuning, evaluation benchmark, model evaluation" --top-k 15 --n-lines 4'})
-Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "human evaluation, automatic evaluation, BLEU, ROUGE, BERTScore, evaluation metric" --top-k 10 --n-lines 4'})
-Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "scaling laws, emergent abilities, few-shot learning, in-context learning, prompt engineering" --top-k 10 --n-lines 4'})
-Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "title, abstract" --top-k 5 --n-lines 10 | head -50'})
-Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs search "# " --top-k 20 --n-lines 2 | grep -E "(LLM|evaluation|benchmark|GPT|language model)" | head -20'})
+Calling tool execute_bash({'command': 'find . -name "*.pdf" | xargs semtools parse'})
+Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "large language model, LLM, evaluation, benchmark" --top-k 10 --n-lines 5'})
+Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "GPT, ChatGPT, instruction tuning, evaluation benchmark, model evaluation" --top-k 15 --n-lines 4'})
+Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "human evaluation, automatic evaluation, BLEU, ROUGE, BERTScore, evaluation metric" --top-k 10 --n-lines 4'})
+Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "scaling laws, emergent abilities, few-shot learning, in-context learning, prompt engineering" --top-k 10 --n-lines 4'})
+Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "title, abstract" --top-k 5 --n-lines 10 | head -50'})
+Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | xargs semtools search "# " --top-k 20 --n-lines 2 | grep -E "(LLM|evaluation|benchmark|GPT|language model)" | head -20'})
 Calling tool execute_bash({'command': 'find /Users/loganmarkewich/.parse -name "*.md" | head -5 | xargs -I {} sh -c \'echo "=== {} ===" && head -20 "{}"\''})
 Based on my analysis of the ACL 2023 papers, I can now provide a comprehensive summary of the key findings about LLMs and evaluations:

@@ -182,4 +182,3 @@ The ACL 2023 papers collectively suggest that:

 The research community is moving toward more nuanced, multi-dimensional evaluation approaches that better capture the complexities of language understanding and generation in real-world applications.
 ```
-
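Scripts, aliases, and agent instructions written against the old standalone `parse`, `search`, `ask`, and `workspace` commands need the same `semtools` prefix that is added throughout these docs. The snippet below is a minimal sketch of automating that rename, not part of semtools: `migrate_semtools` is a hypothetical helper, and it assumes each old command name appears at the start of a line, right after a pipe, or right after `xargs` (the patterns used in these docs). Lines that already call `semtools ...` are left unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical migration helper (not shipped with semtools).
# Reads text on stdin and prefixes standalone parse/search/ask/workspace
# invocations with `semtools`. A command name is rewritten only when it appears
# at the start of a line, right after a pipe, or right after `xargs`, and is
# followed by whitespace or the end of the line (so `parsed_files` is untouched).
migrate_semtools() {
  sed -E 's/(^|\|[[:space:]]*|xargs[[:space:]]+)(parse|search|ask|workspace)([[:space:]]|$)/\1semtools \2\3/g'
}

# Example: migrate a typical pipeline from these docs.
# prints: semtools parse my_docs/*.pdf | xargs semtools search "API endpoints"
printf '%s\n' 'parse my_docs/*.pdf | xargs search "API endpoints"' | migrate_semtools
```

Running it as `migrate_semtools < old_script.sh > new_script.sh` writes a migrated copy. Because it is a plain text substitution, it can also rewrite matching text inside quoted strings (for example, a query containing `| search `) and it skips prompt-prefixed lines such as `$ parse --help`, so diff the result before replacing the original.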