remove llama-extract

remove report pdf
make e2e work
2026-07-01 21:44:37 -04:00 · 2025-02-05 12:54:43 -06:00 · 2025-02-04 21:21:02 -06:00 · 2025-02-04 21:09:29 -06:00 · 2025-02-04 15:37:32 -06:00 · 2025-02-03 22:47:13 -06:00
116 changed files with 5762 additions and 970 deletions
@@ -45,4 +45,4 @@ jobs:
      - name: Test import
        shell: bash
        working-directory: ${{ vars.RUNNER_TEMP }}
-        run: python -c "import llama_parse"
+        run: python -c "import llama_cloud_services"
@@ -23,16 +23,31 @@ jobs:
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
+
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: ${{ env.POETRY_VERSION }}
+
      - name: Install deps
        shell: bash
        run: pip install -e .
-      - name: Build and publish to pypi
+
+      - name: Build and publish llama-cloud-services
        uses: JRubics/poetry-publish@v2.1
        with:
+          poetry_version: ${{ env.POETRY_VERSION }}
+          python_version: ${{ env.PYTHON_VERSION }}
+          working_directory: "llama_cloud_services"
+          pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
+          poetry_install_options: "--without dev"
+
+      - name: Build and publish llama-parse
+        uses: JRubics/poetry-publish@v2.1
+        with:
+          poetry_version: ${{ env.POETRY_VERSION }}
+          python_version: ${{ env.PYTHON_VERSION }}
+          working_directory: "llama_parse"
          pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
          poetry_install_options: "--without dev"

@@ -52,6 +67,7 @@ jobs:
          export PKG=$(ls dist/ | grep tar)
          set -- $PKG
          echo "name=$1" >> $GITHUB_ENV
+
      - name: Upload Release Asset (sdist) to GitHub
        id: upload-release-asset
        uses: actions/upload-release-asset@v1
@@ -33,6 +33,7 @@ repos:
    rev: v1.0.1
    hooks:
      - id: mypy
+        exclude: ^tests/
        additional_dependencies:
          [
            "types-requests",
@@ -46,7 +47,7 @@ repos:
          [
            --disallow-untyped-defs,
            --ignore-missing-imports,
-            --python-version=3.8,
+            --python-version=3.10,
          ]
  - repo: https://github.com/adamchainz/blacken-docs
    rev: 1.16.0
@@ -1,158 +1,45 @@
-# LlamaParse
-
-[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
-[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-cloud-services)](https://pypi.org/project/llama-cloud-services/)
+[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_cloud_services)](https://github.com/run-llama/llama_cloud_services/graphs/contributors)
 [![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)

-LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).
+# Llama Cloud Services

-It is really good at the following:
+This repository contains the code for hand-written SDKs and clients for interacting with LlamaCloud.

- ✅ **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
- ✅ **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
- ✅ **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and return image chunks using the latest multimodal models.
- ✅ **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
+This includes:

-LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
-
-The free plan is up to 1000 pages a day. Paid plan is free 7k pages per week + 0.3c per additional page by default. There is a sandbox available to test the API [**https://cloud.llamaindex.ai/parse ↗**](https://cloud.llamaindex.ai/parse).
-
-Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).
-
-If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
+- [LlamaParse](./parse.md) - A GenAI-native document parser that can parse complex document data for any downstream LLM use case (Agents, RAG, data processing, etc.).
+- [LlamaReport (beta/invite-only)](./report.md) - A prebuilt agentic report builder that can be used to build reports from a variety of data sources.
+- [LlamaExtract (coming soon!)]() - A prebuilt agentic data extractor that can be used to transform data into a structured JSON representation.

 ## Getting Started

-First, login and get an api-key from [**https://cloud.llamaindex.ai/api-key ↗**](https://cloud.llamaindex.ai/api-key).
-
-Then, make sure you have the latest LlamaIndex version installed.
-
-**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.
-
-```
-pip uninstall llama-index  # run this if upgrading from v0.9.x or older
-pip install -U llama-index --upgrade --no-cache-dir --force-reinstall
-```
-
-Lastly, install the package:
-
-`pip install llama-parse`
-
-Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
+Install the package:

 ```bash
-export LLAMA_CLOUD_API_KEY='llx-...'
-
-# output as text
-llama-parse my_file.pdf --result-type text --output-file output.txt
-
-# output as markdown
-llama-parse my_file.pdf --result-type markdown --output-file output.md
-
-# output as raw json
-llama-parse my_file.pdf --output-raw-json --output-file output.json
+pip install llama-cloud-services
 ```

-You can also create simple scripts:
+Next, get your API key from [LlamaCloud](https://cloud.llamaindex.ai/).
+
+You can then use the services in your code:

 ```python
-import nest_asyncio
+from llama_cloud_services import LlamaParse, LlamaReport

-nest_asyncio.apply()
-
-from llama_parse import LlamaParse
-
-parser = LlamaParse(
-    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
-    result_type="markdown",  # "markdown" and "text" are available
-    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
-    verbose=True,
-    language="en",  # Optionally you can define a language, default=en
-)
-
-# sync
-documents = parser.load_data("./my_file.pdf")
-
-# sync batch
-documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
-
-# async
-documents = await parser.aload_data("./my_file.pdf")
-
-# async batch
-documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
+parser = LlamaParse(api_key="YOUR_API_KEY")
+report = LlamaReport(api_key="YOUR_API_KEY")
 ```

-## Using with file object
+See the quickstart guides for each service for more information:

-You can parse a file object directly:
-
-```python
-import nest_asyncio
-
-nest_asyncio.apply()
-
-from llama_parse import LlamaParse
-
-parser = LlamaParse(
-    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
-    result_type="markdown",  # "markdown" and "text" are available
-    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
-    verbose=True,
-    language="en",  # Optionally you can define a language, default=en
-)
-
-file_name = "my_file1.pdf"
-extra_info = {"file_name": file_name}
-
-with open(f"./{file_name}", "rb") as f:
-    # must provide extra_info with file_name key with passing file object
-    documents = parser.load_data(f, extra_info=extra_info)
-
-# you can also pass file bytes directly
-with open(f"./{file_name}", "rb") as f:
-    file_bytes = f.read()
-    # must provide extra_info with file_name key with passing file bytes
-    documents = parser.load_data(file_bytes, extra_info=extra_info)
-```
-
-## Using with `SimpleDirectoryReader`
-
-You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
-
-```python
-import nest_asyncio
-
-nest_asyncio.apply()
-
-from llama_parse import LlamaParse
-from llama_index.core import SimpleDirectoryReader
-
-parser = LlamaParse(
-    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
-    result_type="markdown",  # "markdown" and "text" are available
-    verbose=True,
-)
-
-file_extractor = {".pdf": parser}
-documents = SimpleDirectoryReader(
-    "./data", file_extractor=file_extractor
-).load_data()
-```
-
-Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
-
-## Examples
-
-Several end-to-end indexing examples can be found in the examples folder
-
- [Getting Started](examples/demo_basic.ipynb)
- [Advanced RAG Example](examples/demo_advanced.ipynb)
- [Raw API Usage](examples/demo_api.ipynb)
+- [LlamaParse](./parse.md)
+- [LlamaReport (beta/invite-only)](./report.md)
+- [LlamaExtract (coming soon!)]()

 ## Documentation

-[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
+You can see complete SDK and API documentation for each service on [our official docs](https://docs.cloud.llamaindex.ai/).

 ## Terms of Service

@@ -160,6 +47,4 @@ See the [Terms of Service Here](./TOS.pdf).

 ## Get in Touch (LlamaCloud)

-LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.
-
-LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
+You can get in touch with us by following our [contact link](https://www.llamaindex.ai/contact).
@@ -53,7 +53,7 @@
   "source": [
    "!pip install llama-index\n",
    "!pip install llama-index-core\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -190,7 +190,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(result_type=\"markdown\")"
   ]
@@ -22,7 +22,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "!pip install llama-parse llama-index llama-index-postprocessor-sbert-rerank"
+    "!pip install llama-cloud-services llama-index llama-index-postprocessor-sbert-rerank"
   ]
  },
  {
@@ -82,7 +82,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -81,7 +81,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "docs = LlamaParse(result_type=\"text\").load_data(\"./caltrain_schedule_weekend.pdf\")"
   ]
@@ -26,7 +26,7 @@
    "!pip install llama-index-embeddings-openai\n",
    "!pip install llama-index-postprocessor-flag-embedding-reranker\n",
    "!pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -108,7 +108,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"markdown\").load_data(\"./apple_2021_10k.pdf\")"
   ]
@@ -22,7 +22,7 @@
    "%pip install llama-index-embeddings-openai\n",
    "%pip install llama-index-postprocessor-flag-embedding-reranker\n",
    "%pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
-    "%pip install llama-parse\n",
+    "%pip install llama-cloud-services\n",
    "%pip install llama-index-vector-stores-astra-db"
   ]
  },
@@ -107,7 +107,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"markdown\").load_data(\"./uber_10q_march_2022.pdf\")"
   ]
@@ -176,7 +176,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"markdown\").load_data(\"./uber_10q_march_2022.pdf\")"
   ]
@@ -130,7 +130,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"text\").load_data(file_path)"
   ]
@@ -73,7 +73,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"text\").load_data(\"./attention.pdf\")"
   ]
@@ -120,7 +120,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"markdown\").load_data(\"./attention.pdf\")"
   ]
@@ -142,7 +142,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"text\").load_data(file_path)"
   ]
@@ -21,7 +21,7 @@
   "outputs": [],
   "source": [
    "%pip install llama-index\n",
-    "%pip install llama-parse"
+    "%pip install llama-cloud-services"
   ]
  },
  {
@@ -41,7 +41,7 @@
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "api_key = \"llx-\"  # get from cloud.llamaindex.ai"
   ]
@@ -32,7 +32,7 @@
   "outputs": [],
   "source": [
    "!pip install llama-index-core\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -119,7 +119,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(extract_charts=True, invalidate_cache=True)\n",
    "json_objs = parser.get_json_result(\"./agentless.pdf\")"
@@ -116,7 +116,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"markdown\").load_data(\"./policy.pdf\")"
   ]
@@ -35,7 +35,7 @@
    "!pip install llama-index-core\n",
    "!pip install llama-index-llms-anthropic llama-index-multi-modal-llms-anthropic\n",
    "!pip install llama-index-embeddings-huggingface\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -129,7 +129,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(verbose=True)\n",
    "json_objs = parser.get_json_result(\"./uber_10q_march_2022.pdf\")\n",
@@ -37,7 +37,7 @@
    "%pip install llama-index-core\n",
    "%pip install llama-index-llms-anthropic llama-index-multi-modal-llms-anthropic\n",
    "%pip install llama-index-embeddings-huggingface\n",
-    "%pip install llama-parse"
+    "%pip install llama-cloud-services"
   ]
  },
  {
@@ -110,7 +110,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(verbose=True)\n",
    "json_objs = parser.get_json_result(\"./uber_10q_march_2022.pdf\")\n",
@@ -32,7 +32,7 @@
   "outputs": [],
   "source": [
    "!pip install llama-index-core\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -125,7 +125,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(verbose=True, premium_mode=True)\n",
    "json_objs = parser.get_json_result(\"./san_francisco_budget_2023.pdf\")"
@@ -77,7 +77,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(result_type=\"text\", language=\"fr\")\n",
    "documents = parser.load_data(\"./treasury_report.pdf\")"
@@ -250,7 +250,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(result_type=\"text\", language=\"ch_sim\")\n",
    "documents = parser.load_data(\"./chinese_pdf.pdf\")"
@@ -404,7 +404,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "base_parser = LlamaParse(result_type=\"text\", language=\"en\")\n",
    "base_documents = parser.load_data(\"./chinese_pdf2.pdf\")"
@@ -69,7 +69,7 @@
    "import pymongo\n",
    "\n",
    "from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch\n",
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.core import VectorStoreIndex, StorageContext\n",
    "from llama_index.core.node_parser import SimpleNodeParser"
@@ -114,7 +114,7 @@
    }
   ],
   "source": [
-    "%pip install llama-parse"
+    "%pip install llama-cloud-services"
   ]
  },
  {
@@ -169,7 +169,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse"
+    "from llama_cloud_services import LlamaParse"
   ]
  },
  {
@@ -35,7 +35,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -160,7 +160,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -199,7 +199,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser_gpt4o = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -31,7 +31,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -117,7 +117,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(target_pages=\"0,1,2\", result_type=\"markdown\")\n",
    "\n",
@@ -34,7 +34,7 @@
    "%pip install llama-index-question-gen-openai\n",
    "%pip install llama-index-postprocessor-flag-embedding-reranker\n",
    "%pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
-    "%pip install llama-parse"
+    "%pip install llama-cloud-services"
   ]
  },
  {
@@ -109,7 +109,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "docs_2021 = LlamaParse(result_type=\"markdown\").load_data(\"./apple_2021_10k.pdf\")\n",
    "docs_2020 = LlamaParse(result_type=\"markdown\").load_data(\"./apple_2020_10k.pdf\")"
@@ -31,7 +31,7 @@
   "outputs": [],
   "source": [
    "%pip install llama-index\n",
-    "%pip install llama-parse"
+    "%pip install llama-cloud-services"
   ]
  },
  {
@@ -53,7 +53,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "# api_key = \"llx-\"  # get from cloud.llamaindex.ai"
   ]
@@ -37,7 +37,7 @@
   "outputs": [],
   "source": [
    "# !pip install llama-index\n",
-    "# !pip install llama-parse"
+    "# !pip install llama-cloud-services"
   ]
  },
  {
@@ -59,7 +59,7 @@
    "from llama_index.core import VectorStoreIndex\n",
    "from IPython.display import Image, Markdown\n",
    "\n",
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "from llama_index.core.node_parser import MarkdownElementNodeParser"
   ]
@@ -33,7 +33,7 @@
    "!pip install llama-index-postprocessor-flag-embedding-reranker\n",
    "!pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
    "!pip install llama-index-graph-stores-neo4j\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -125,7 +125,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "docs = LlamaParse(result_type=\"text\").load_data(\"./data/budget_2023.pdf\")"
   ]
@@ -141,7 +141,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -205,7 +205,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser_gpt4o = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -118,7 +118,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -181,7 +181,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser_gpt4o = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -99,7 +99,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -102,7 +102,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -169,7 +169,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "\n",
    "parser = LlamaParse(\n",
@@ -153,7 +153,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "\n",
    "parser_text = LlamaParse(result_type=\"text\")\n",
@@ -143,7 +143,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -172,7 +172,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -104,7 +104,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -25,7 +25,7 @@
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
-    "from llama_parse import LlamaParse"
+    "from llama_cloud_services import LlamaParse"
   ]
  },
  {
@@ -27,7 +27,7 @@
   "outputs": [],
   "source": [
    "%pip install llama-index\n",
-    "%pip install llama-parse\n",
+    "%pip install llama-cloud-services\n",
    "%pip install torch transformers python-pptx Pillow"
   ]
  },
@@ -85,7 +85,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse"
+    "from llama_cloud_services import LlamaParse"
   ]
  },
  {
@@ -45,7 +45,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -100,7 +100,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "vanilaParsing = LlamaParse(result_type=\"markdown\").load_data(\"./mcdonalds_receipt.png\")"
   ]
@@ -22,7 +22,7 @@
    "!pip install llama-index\n",
    "!pip install llama-index-core\n",
    "!pip install llama-index-embeddings-openai llama-index-llms-openai\n",
-    "!pip install llama-parse"
+    "!pip install llama-cloud-services"
   ]
  },
  {
@@ -111,7 +111,7 @@
    }
   ],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "file_path = \"data/nova_technical_report.pdf\"\n",
    "\n",
@@ -95,7 +95,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "# use our multimodal models for extractions\n",
    "parser = LlamaParse(\n",
@@ -107,7 +107,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "from llama_parse import LlamaParse\n",
+    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser_gpt4o = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
@@ -0,0 +1,762 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Report Generation with LlamaReport\n",
+    "\n",
+    "In this notebook, we'll walk through the basic process of generating a report with LlamaReport, and highlight some of the key features of the library.\n",
+    "\n",
+    "TLDR:\n",
+    "1. Download source data to use as knowledge base for the report\n",
+    "2. Kick off report generation with a template\n",
+    "3. Get the plan and review/accept/reject suggestions\n",
+    "4. Get the final report\n",
+    "5. Review/accept/reject suggestions to edit the final report\n",
+    "6. Print the final report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install llama-cloud-services"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Download Source Data\n",
+    "\n",
+    "Here, we download the `Attention is All You Need` paper as a PDF.\n",
+    "\n",
+    "LlamaReport currently supports up to 5 files as input, and essentially any file type that can be parsed by LlamaParse.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!wget \"https://arxiv.org/pdf/1706.03762.pdf\" -O \"./attention.pdf\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Kick off Report Generation\n",
+    "\n",
+    "Here, we kick off report generation with a template.\n",
+    "\n",
+    "The template can either be a string or a file path, but here we'll use a string.\n",
+    "\n",
+    "In our experiments, anything works as a template, but here are some general guidelines:\n",
+    "\n",
+    "- Use markdown formatting + instructions in each section to guide the report generation\n",
+    "- If using an existing file as a template, provide extra instructions to guide the report generation\n",
+    "\n",
+    "**NOTE:** Since we are in a notebook, we use async methods and `await` throughout. The synchronous equivalents are available by dropping the leading `a` from the method name and removing the `await` keyword."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_cloud_services import LlamaReport\n",
+    "\n",
+    "llama_report = LlamaReport(\n",
+    "    api_key=\"llx-...\",\n",
+    ")\n",
+    "\n",
+    "report_client = await llama_report.acreate_report(\n",
+    "    name=\"my_cool_report_on_attention\",\n",
+    "    # can pass in file paths or bytes\n",
+    "    input_files=[\"./attention.pdf\"],\n",
+    "    template_text=\"\"\"\\\n",
+    "# [Some title]\\n\\n\n",
+    "## TLDR\\n\n",
+    "A quick summary of the paper.\\n\\n\n",
+    "## Details\\n\n",
+    "More details about the paper, possibly more than one section here.\\n\n",
+    "\"\"\",\n",
+    "    # optional additional instructions for the report generation\n",
+    "    # template_instructions=None,\n",
+    "    # optional file path to an existing template instead of template_text\n",
+    "    # template_file=None,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The returned `ReportClient` object is used to interact with the report generation process for this specific report."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Report(id=0a394b33-1a3e-463c-b5cb-7ff8ab827d0a, name=my_cool_report_on_attention)\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(report_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Get the plan\n",
+    "\n",
+    "The first phases of report generation involve ingesting the source data and generating a plan.\n",
+    "\n",
+    "The plan is a list of instructions for the report generation, and can be reviewed/accepted/rejected by the user.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plan = await report_client.await_for_plan(\n",
+    "    timeout=10000,\n",
+    "    poll_interval=10,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "# {title}\n",
+      "[ReportQuery(field='title', prompt='Generate a clear and concise title for this paper about the Transformer model and attention mechanisms', context='The paper discusses the Transformer architecture for sequence transduction using attention mechanisms, focusing on machine translation applications')]\n",
+      "==================\n",
+      "## TLDR\n",
+      "\n",
+      "{tldr_content}\n",
+      "[ReportQuery(field='tldr_content', prompt='Write a brief, clear summary of the key points about the Transformer model', context='Focus on the main innovations: attention mechanisms, efficiency improvements, and state-of-the-art results in machine translation')]\n",
+      "==================\n",
+      "## Details\n",
+      "\n",
+      "{details_content}\n",
+      "[ReportQuery(field='details_content', prompt='Provide detailed information about the Transformer model architecture and its applications', context='Include information about:\\n- The attention mechanism implementation\\n- Advantages over recurrent and convolutional models\\n- Performance in machine translation tasks\\n- Training efficiency improvements')]\n",
+      "==================\n"
+     ]
+    }
+   ],
+   "source": [
+    "for plan_block in plan.blocks:\n",
+    "    print(plan_block.block.template)\n",
+    "    print(plan_block.queries)\n",
+    "    print(\"==================\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With the plan, we can either use it to kick off generation of the final report, or we can edit the plan and adjust it as needed.\n",
+    "\n",
+    "While we could manually edit the objects here and use `await report_client.aupdate_plan(action=\"edit\", updated_plan=plan)`, we can also use `LlamaReport` to agentically edit the plan."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "suggestions = await report_client.asuggest_edits(\n",
+    "    \"Can you split the details section into two sections?\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Justification for change: \n",
+      "I'll help you break down the details section into two distinct parts - one focusing on the architecture and another on the practical applications and performance. This will make the content more organized and easier to follow. The original block at index 2 will be replaced with these two new sections.\n",
+      "\n",
+      "Proposed changes:\n",
+      "\n",
+      "## Architecture Details\n",
+      "\n",
+      "{architecture_content}\n",
+      "\n",
+      "[ReportQuery(field='architecture_content', prompt='Describe the technical details of the Transformer model architecture', context='Focus on:\\n- Core components of the Transformer architecture\\n- Self-attention mechanism implementation\\n- Multi-head attention details\\n- Position encoding approach\\n- Feed-forward network structure')]\n",
+      "==================\n",
+      "\n",
+      "## Performance and Applications\n",
+      "\n",
+      "{applications_content}\n",
+      "\n",
+      "[ReportQuery(field='applications_content', prompt='Explain the practical applications and performance advantages of the Transformer model', context='Cover:\\n- Comparison with RNN and CNN models\\n- Machine translation results and benchmarks\\n- Training efficiency improvements\\n- Real-world applications and use cases\\n- Scalability benefits')]\n",
+      "==================\n"
+     ]
+    }
+   ],
+   "source": [
+    "for suggestion in suggestions:\n",
+    "    print(\"Justification for change:\", suggestion.justification)\n",
+    "    print(\"Proposed changes:\")\n",
+    "    for plan_block in suggestion.blocks:\n",
+    "        print(plan_block.block.template)\n",
+    "        print(plan_block.queries)\n",
+    "        print(\"==================\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This looks pretty good! We can also use the client to automatically accept and apply, or reject, these suggestions.\n",
+    "\n",
+    "This will (locally) keep track of the history of changes, so that future suggestions can be based on the previous changes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for suggestion in suggestions:\n",
+    "    await report_client.aaccept_edit(suggestion)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What effect did that have on the tracked local history? Let's see!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[EditAction(block_idx=2, old_content='## Details\\n\\n{details_content}\\n\\nField: details_content, Prompt: Provide detailed information about the Transformer model architecture and its applications, Context: Include information about:\\n- The attention mechanism implementation\\n- Advantages over recurrent and convolutional models\\n- Performance in machine translation tasks\\n- Training efficiency improvements\\nDepends on: none', new_content='\\n## Architecture Details\\n\\n{architecture_content}\\n\\n\\nField: architecture_content, Prompt: Describe the technical details of the Transformer model architecture, Context: Focus on:\\n- Core components of the Transformer architecture\\n- Self-attention mechanism implementation\\n- Multi-head attention details\\n- Position encoding approach\\n- Feed-forward network structure\\nDepends on: none', action='approved', timestamp=datetime.datetime(2025, 2, 4, 20, 59, 55, 773558)),\n",
+       " EditAction(block_idx=3, old_content='[No old content]', new_content='\\n## Performance and Applications\\n\\n{applications_content}\\n\\n\\nField: applications_content, Prompt: Explain the practical applications and performance advantages of the Transformer model, Context: Cover:\\n- Comparison with RNN and CNN models\\n- Machine translation results and benchmarks\\n- Training efficiency improvements\\n- Real-world applications and use cases\\n- Scalability benefits\\nDepends on: previous', action='approved', timestamp=datetime.datetime(2025, 2, 4, 20, 59, 55, 773687))]"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "report_client.edit_history"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Message(role=<MessageRole.USER: 'user'>, content='Can you split the details section into two sections?', timestamp=datetime.datetime(2025, 2, 4, 20, 59, 47, 754848)),\n",
+       " Message(role=<MessageRole.ASSISTANT: 'assistant'>, content=\"\\nI'll help you break down the details section into two distinct parts - one focusing on the architecture and another on the practical applications and performance. This will make the content more organized and easier to follow. The original block at index 2 will be replaced with these two new sections.\\n\", timestamp=datetime.datetime(2025, 2, 4, 20, 59, 55, 482070))]"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "report_client.chat_history"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Both histories are used as context for future suggestions! You can always clear them, or provide your own history."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# report_client.suggest_edits(\"....\", chat_history=[{\"role\": \"user\", \"content\": \"...\"}, ...])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Get the final report\n",
+    "\n",
+    "Now that we have a plan, we can kick off generation of the final report."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# kicks off report generation\n",
+    "await report_client.aupdate_plan(action=\"approve\")\n",
+    "\n",
+    "# waits for report generation to complete\n",
+    "report = await report_client.await_completion(\n",
+    "    timeout=10000,\n",
+    "    poll_interval=10,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "# Attention Is All You Need: A Pure Attention-Based Architecture for Neural Machine Translation\n",
+      "\n",
+      "## TLDR\n",
+      "\n",
+      "The Transformer introduced a revolutionary architecture that relies entirely on attention mechanisms, eliminating the need for recurrence or convolution in sequence processing. Its key innovations include multi-head self-attention for parallel processing of input sequences, scaled dot-product attention for efficient computation, and positional encodings for sequence order awareness. The model achieved breakthrough results in machine translation (28.4 BLEU on English-to-German, 41.8 BLEU on English-to-French) while requiring significantly less training time than previous approaches, training in 3.5 days on 8 GPUs. This architecture demonstrated that attention mechanisms alone are sufficient for state-of-the-art sequence modeling, setting a new direction for natural language processing.\n",
+      "\n",
+      "\n",
+      "## Architecture Details\n",
+      "\n",
+      "The Transformer architecture represents a groundbreaking approach to sequence processing, built entirely on attention mechanisms without recurrence or convolution. Here are its key technical details:\n",
+      "\n",
+      "Core Components:\n",
+      "- Encoder-decoder architecture with stacked self-attention and point-wise feed-forward layers\n",
+      "- Each layer contains two main sub-layers: multi-head self-attention mechanism and position-wise feed-forward network\n",
+      "- Layer normalization and residual connections between sub-layers\n",
+      "- No recurrent or convolutional elements, enabling parallel processing\n",
+      "\n",
+      "Self-Attention Mechanism:\n",
+      "- Processes relationships between all positions in a sequence simultaneously\n",
+      "- Computes attention weights using queries, keys, and values derived from input representations\n",
+      "- Implements scaled dot-product attention to prevent gradient issues with large input dimensions\n",
+      "- Allows direct modeling of dependencies regardless of positional distance\n",
+      "- Uses masking in decoder to prevent leftward information flow and maintain auto-regressive property\n",
+      "\n",
+      "Multi-Head Attention:\n",
+      "- Employs multiple attention heads operating in parallel\n",
+      "- Each head processes information in different representation subspaces\n",
+      "- Three types of attention applications:\n",
+      "  1. Encoder self-attention (all positions attend to each other)\n",
+      "  2. Decoder self-attention (each position attends to previous positions)\n",
+      "  3. Encoder-decoder attention (decoder queries attend to encoder outputs)\n",
+      "- Counteracts reduced resolution from attention averaging through parallel processing\n",
+      "\n",
+      "Position-wise Feed-Forward Network:\n",
+      "- Applied identically to each position separately\n",
+      "- Consists of two linear transformations with ReLU activation\n",
+      "- Structure: FFN(x) = max(0, xW1 + b1)W2 + b2\n",
+      "- Input and output dimensionality: dmodel = 512\n",
+      "- Inner-layer dimensionality: dff = 2048\n",
+      "- Parameters vary between layers but remain constant across positions\n",
+      "\n",
+      "Position Encoding:\n",
+      "- Adds positional information to input embeddings\n",
+      "- Enables the model to consider sequential order without recurrence\n",
+      "- Implements sinusoidal position encodings to allow model to attend to relative positions\n",
+      "- Maintains constant number of operations between any two positions, unlike convolutional approaches\n",
+      "- Allows effective modeling of both local and long-range dependencies\n",
+      "\n",
+      "\n",
+      "\n",
+      "## Performance and Applications\n",
+      "\n",
+      "The Transformer model demonstrates significant performance advantages and practical applications across multiple domains:\n",
+      "\n",
+      "Performance Advantages over RNN/CNN Models:\n",
+      "- Eliminates sequential computation constraints present in RNNs, enabling superior parallelization\n",
+      "- Reduces operations needed for relating distant positions to a constant number, compared to linear/logarithmic scaling in CNNs\n",
+      "- Processes all input and output positions simultaneously through self-attention mechanisms\n",
+      "- Achieves state-of-the-art results while requiring significantly less computational resources\n",
+      "\n",
+      "Machine Translation Benchmarks:\n",
+      "- WMT 2014 English-to-German: 28.4 BLEU score, exceeding previous best results by over 2 BLEU points\n",
+      "- WMT 2014 English-to-French: 41.8 BLEU score (single-model state-of-the-art)\n",
+      "- Surpasses performance of existing model ensembles in translation tasks\n",
+      "\n",
+      "Training Efficiency:\n",
+      "- Requires only 3.5 days of training on eight GPUs for state-of-the-art performance\n",
+      "- Achieves superior results at \"a small fraction of the training costs\" compared to previous models\n",
+      "- Enables significantly faster training through parallel processing of input/output sequences\n",
+      "- Can reach production-quality performance in as little as twelve hours on modern GPU hardware\n",
+      "\n",
+      "Real-world Applications:\n",
+      "- Machine translation systems\n",
+      "- Natural language understanding tasks\n",
+      "- Reading comprehension\n",
+      "- Abstractive summarization\n",
+      "- Text entailment analysis\n",
+      "- Constituency parsing (achieving 92.7 F1 score in semi-supervised settings)\n",
+      "- Adaptable to both large and limited training data scenarios\n",
+      "\n",
+      "Scalability Benefits:\n",
+      "- Highly parallelizable architecture enables efficient scaling across multiple GPUs\n",
+      "- Constant computational complexity for relating any input/output positions\n",
+      "- Effective handling of long-range dependencies in sequences\n",
+      "- Maintains performance quality while scaling to larger datasets and model sizes\n",
+      "- Generalizes well across different tasks and domains without architectural changes\n",
+      "- Supports efficient inference and deployment in production environments\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "report_text = \"\\n\\n\".join([block.template for block in report.blocks])\n",
+    "print(report_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Edit the final report\n",
+    "\n",
+    "Now that we have a report, we can edit it.\n",
+    "\n",
+    "We can use the `asuggest_edits` method to get suggestions for edits, and then use the `aaccept_edit`/`areject_edit` methods to apply them.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Justification for change: \n",
+      "I'd suggest changing \"TLDR\" to \"Executive Summary\" which is more appropriate for a professional or academic report. This term is widely used in formal documents and better reflects the nature of this concise overview section while maintaining the same function of providing a quick summary of the key points.\n",
+      "\n",
+      "Proposed changes:\n",
+      "## Executive Summary\n",
+      "\n",
+      "The Transformer introduced a revolutionary architecture that relies entirely on attention mechanisms, eliminating the need for recurrence or convolution in sequence processing. Its key innovations include multi-head self-attention for parallel processing of input sequences, scaled dot-product attention for efficient computation, and positional encodings for sequence order awareness. The model achieved breakthrough results in machine translation (28.4 BLEU on English-to-German, 41.8 BLEU on English-to-French) while requiring significantly less training time than previous approaches, training in 3.5 days on 8 GPUs. This architecture demonstrated that attention mechanisms alone are sufficient for state-of-the-art sequence modeling, setting a new direction for natural language processing.\n",
+      "==================\n"
+     ]
+    }
+   ],
+   "source": [
+    "suggestions = await report_client.asuggest_edits(\n",
+    "    \"Can you change the TLDR header to something more professional?\"\n",
+    ")\n",
+    "for suggestion in suggestions:\n",
+    "    print(\"Justification for change:\", suggestion.justification)\n",
+    "    print(\"Proposed changes:\")\n",
+    "    for block in suggestion.blocks:\n",
+    "        print(block.template)\n",
+    "        print(\"==================\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Changing to \"Executive Summary\" sounds reasonable, so let's accept that!\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for suggestion in suggestions:\n",
+    "    await report_client.aaccept_edit(suggestion)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Print the final report\n",
+    "\n",
+    "Now that the edit has been accepted, we can fetch the updated report and print it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "# Attention Is All You Need: A Pure Attention-Based Architecture for Neural Machine Translation\n",
+      "\n",
+      "## Executive Summary\n",
+      "\n",
+      "The Transformer introduced a revolutionary architecture that relies entirely on attention mechanisms, eliminating the need for recurrence or convolution in sequence processing. Its key innovations include multi-head self-attention for parallel processing of input sequences, scaled dot-product attention for efficient computation, and positional encodings for sequence order awareness. The model achieved breakthrough results in machine translation (28.4 BLEU on English-to-German, 41.8 BLEU on English-to-French) while requiring significantly less training time than previous approaches, training in 3.5 days on 8 GPUs. This architecture demonstrated that attention mechanisms alone are sufficient for state-of-the-art sequence modeling, setting a new direction for natural language processing.\n",
+      "\n",
+      "\n",
+      "## Architecture Details\n",
+      "\n",
+      "The Transformer architecture represents a groundbreaking approach to sequence processing, built entirely on attention mechanisms without recurrence or convolution. Here are its key technical details:\n",
+      "\n",
+      "Core Components:\n",
+      "- Encoder-decoder architecture with stacked self-attention and point-wise feed-forward layers\n",
+      "- Each layer contains two main sub-layers: multi-head self-attention mechanism and position-wise feed-forward network\n",
+      "- Layer normalization and residual connections between sub-layers\n",
+      "- No recurrent or convolutional elements, enabling parallel processing\n",
+      "\n",
+      "Self-Attention Mechanism:\n",
+      "- Processes relationships between all positions in a sequence simultaneously\n",
+      "- Computes attention weights using queries, keys, and values derived from input representations\n",
+      "- Implements scaled dot-product attention to prevent gradient issues with large input dimensions\n",
+      "- Allows direct modeling of dependencies regardless of positional distance\n",
+      "- Uses masking in decoder to prevent leftward information flow and maintain auto-regressive property\n",
+      "\n",
+      "Multi-Head Attention:\n",
+      "- Employs multiple attention heads operating in parallel\n",
+      "- Each head processes information in different representation subspaces\n",
+      "- Three types of attention applications:\n",
+      "  1. Encoder self-attention (all positions attend to each other)\n",
+      "  2. Decoder self-attention (each position attends to previous positions)\n",
+      "  3. Encoder-decoder attention (decoder queries attend to encoder outputs)\n",
+      "- Counteracts reduced resolution from attention averaging through parallel processing\n",
+      "\n",
+      "Position-wise Feed-Forward Network:\n",
+      "- Applied identically to each position separately\n",
+      "- Consists of two linear transformations with ReLU activation\n",
+      "- Structure: FFN(x) = max(0, xW1 + b1)W2 + b2\n",
+      "- Input and output dimensionality: dmodel = 512\n",
+      "- Inner-layer dimensionality: dff = 2048\n",
+      "- Parameters vary between layers but remain constant across positions\n",
+      "\n",
+      "Position Encoding:\n",
+      "- Adds positional information to input embeddings\n",
+      "- Enables the model to consider sequential order without recurrence\n",
+      "- Implements sinusoidal position encodings to allow model to attend to relative positions\n",
+      "- Maintains constant number of operations between any two positions, unlike convolutional approaches\n",
+      "- Allows effective modeling of both local and long-range dependencies\n",
+      "\n",
+      "\n",
+      "\n",
+      "## Performance and Applications\n",
+      "\n",
+      "The Transformer model demonstrates significant performance advantages and practical applications across multiple domains:\n",
+      "\n",
+      "Performance Advantages over RNN/CNN Models:\n",
+      "- Eliminates sequential computation constraints present in RNNs, enabling superior parallelization\n",
+      "- Reduces operations needed for relating distant positions to a constant number, compared to linear/logarithmic scaling in CNNs\n",
+      "- Processes all input and output positions simultaneously through self-attention mechanisms\n",
+      "- Achieves state-of-the-art results while requiring significantly less computational resources\n",
+      "\n",
+      "Machine Translation Benchmarks:\n",
+      "- WMT 2014 English-to-German: 28.4 BLEU score, exceeding previous best results by over 2 BLEU points\n",
+      "- WMT 2014 English-to-French: 41.8 BLEU score (single-model state-of-the-art)\n",
+      "- Surpasses performance of existing model ensembles in translation tasks\n",
+      "\n",
+      "Training Efficiency:\n",
+      "- Requires only 3.5 days of training on eight GPUs for state-of-the-art performance\n",
+      "- Achieves superior results at \"a small fraction of the training costs\" compared to previous models\n",
+      "- Enables significantly faster training through parallel processing of input/output sequences\n",
+      "- Can reach production-quality performance in as little as twelve hours on modern GPU hardware\n",
+      "\n",
+      "Real-world Applications:\n",
+      "- Machine translation systems\n",
+      "- Natural language understanding tasks\n",
+      "- Reading comprehension\n",
+      "- Abstractive summarization\n",
+      "- Text entailment analysis\n",
+      "- Constituency parsing (achieving 92.7 F1 score in semi-supervised settings)\n",
+      "- Adaptable to both large and limited training data scenarios\n",
+      "\n",
+      "Scalability Benefits:\n",
+      "- Highly parallelizable architecture enables efficient scaling across multiple GPUs\n",
+      "- Constant computational complexity for relating any input/output positions\n",
+      "- Effective handling of long-range dependencies in sequences\n",
+      "- Maintains performance quality while scaling to larger datasets and model sizes\n",
+      "- Generalizes well across different tasks and domains without architectural changes\n",
+      "- Supports efficient inference and deployment in production environments\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "report_response = await report_client.aget()\n",
+    "report_text = \"\\n\\n\".join([block.template for block in report_response.report.blocks])\n",
+    "print(report_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also see the sources for each block!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "0.99687636\n",
+      "# Abstract\n",
+      "\n",
+      "The dominant sequence transduction models are based on complex recurrent or convolutiona\n",
+      "==================\n",
+      "0.99591404\n",
+      "# 2 Background\n",
+      "\n",
+      "The goal of reducing sequential computation also forms the foundation of the Extende\n",
+      "==================\n",
+      "0.9951325\n",
+      "# 1 Introduction\n",
+      "\n",
+      "Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neu\n",
+      "==================\n",
+      "0.99442345\n",
+      "# 7 Conclusion\n",
+      "\n",
+      "In this work, we presented the Transformer, the first sequence transduction model ba\n",
+      "==================\n",
+      "0.9967649\n",
+      "# 3.2.3 Applications of Attention in our Model\n",
+      "\n",
+      "The Transformer uses multi-head attention in three d\n",
+      "==================\n",
+      "0.99533635\n",
+      "# 2 Background\n",
+      "\n",
+      "The goal of reducing sequential computation also forms the foundation of the Extende\n",
+      "==================\n",
+      "0.9935868\n",
+      "# Abstract\n",
+      "\n",
+      "The dominant sequence transduction models are based on complex recurrent or convolutiona\n",
+      "==================\n",
+      "0.98780584\n",
+      "# Outputs\n",
+      "\n",
+      "(shifted right)\n",
+      "\n",
+      "Figure 1: The Transformer - model architecture.\n",
+      "\n",
+      "The Transformer follows\n",
+      "==================\n",
+      "0.9205043\n",
+      "# 3.3 Position-wise Feed-Forward Networks\n",
+      "\n",
+      "In addition to attention sub-layers, each of the layers i\n",
+      "==================\n",
+      "0.79581684\n",
+      "# 1 Introduction\n",
+      "\n",
+      "Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neu\n",
+      "==================\n",
+      "0.9946774\n",
+      "# Abstract\n",
+      "\n",
+      "The dominant sequence transduction models are based on complex recurrent or convolutiona\n",
+      "==================\n",
+      "0.97079873\n",
+      "# 7 Conclusion\n",
+      "\n",
+      "In this work, we presented the Transformer, the first sequence transduction model ba\n",
+      "==================\n",
+      "0.9535353\n",
+      "# 6.3 English Constituency Parsing\n",
+      "\n",
+      "To evaluate if the Transformer can generalize to other tasks we \n",
+      "==================\n",
+      "0.9514138\n",
+      "# 2 Background\n",
+      "\n",
+      "The goal of reducing sequential computation also forms the foundation of the Extende\n",
+      "==================\n",
+      "0.9790758\n",
+      "# 1 Introduction\n",
+      "\n",
+      "Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neu\n",
+      "==================\n",
+      "0.92262185\n",
+      "# Outputs\n",
+      "\n",
+      "(shifted right)\n",
+      "\n",
+      "Figure 1: The Transformer - model architecture.\n",
+      "\n",
+      "The Transformer follows\n",
+      "==================\n"
+     ]
+    }
+   ],
+   "source": [
+    "for block in report_response.report.blocks:\n",
+    "    # Each block has a list of sources, which are the nodes that were used to generate the block\n",
+    "    for source in block.sources:\n",
+    "        print(source.score)\n",
+    "        print(source.node.text[:100])\n",
+    "        print(\"==================\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "llama-parse-aNC435Vv-py3.10",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,8 @@
+from llama_cloud_services.parse import LlamaParse
+from llama_cloud_services.report import ReportClient, LlamaReport
+
+__all__ = [
+    "LlamaParse",
+    "ReportClient",
+    "LlamaReport",
+]
@@ -0,0 +1,3 @@
+from llama_cloud_services.parse.base import LlamaParse, ResultType
+
+__all__ = ["LlamaParse", "ResultType"]
@@ -18,7 +18,7 @@ from llama_index.core.readers.base import BasePydanticReader
 from llama_index.core.readers.file.base import get_default_fs
 from llama_index.core.schema import Document

-from llama_parse.utils import (
+from llama_cloud_services.parse.utils import (
    SUPPORTED_FILE_TYPES,
    ResultType,
    nest_asyncio_err,
@@ -5,7 +5,7 @@ from pathlib import Path
 from pydantic.fields import FieldInfo
 from typing import Any, Callable, List

-from llama_parse.base import LlamaParse
+from llama_cloud_services.parse.base import LlamaParse


 def pydantic_field_to_click_option(name: str, field: FieldInfo) -> click.Option:
@@ -0,0 +1,4 @@
+from llama_cloud_services.report.report import ReportClient
+from llama_cloud_services.report.base import LlamaReport
+
+__all__ = ["ReportClient", "LlamaReport"]
@@ -0,0 +1,269 @@
+import asyncio
+import httpx
+import os
+import io
+from concurrent.futures import ThreadPoolExecutor
+from typing import Optional, List, Union, Any, Coroutine, TypeVar
+from urllib.parse import urljoin
+
+from llama_cloud.types import ReportMetadata
+from llama_cloud_services.report.report import ReportClient
+
+T = TypeVar("T")
+
+
+class LlamaReport:
+    """Client for managing reports and general report operations."""
+
+    def __init__(
+        self,
+        api_key: Optional[str] = None,
+        project_id: Optional[str] = None,
+        organization_id: Optional[str] = None,
+        base_url: Optional[str] = None,
+        timeout: Optional[int] = None,
+        async_httpx_client: Optional[httpx.AsyncClient] = None,
+    ):
+        self.api_key = api_key or os.getenv("LLAMA_CLOUD_API_KEY", None)
+        if not self.api_key:
+            raise ValueError("No API key provided.")
+
+        self.base_url = base_url or os.getenv(
+            "LLAMA_CLOUD_BASE_URL", "https://api.cloud.llamaindex.ai"
+        )
+        self.timeout = timeout or 60
+
+        # Initialize HTTP clients
+        self._aclient = async_httpx_client or httpx.AsyncClient(timeout=self.timeout)
+
+        # Set auth headers
+        self.headers = {
+            "Authorization": f"Bearer {self.api_key}",
+        }
+
+        self.organization_id = organization_id
+        self.project_id = project_id
+        self._client_params = {
+            "timeout": self._aclient.timeout,
+            "headers": self._aclient.headers,
+            "base_url": self._aclient.base_url,
+            "auth": self._aclient.auth,
+            "event_hooks": self._aclient.event_hooks,
+            "cookies": self._aclient.cookies,
+            "max_redirects": self._aclient.max_redirects,
+            "params": self._aclient.params,
+            "trust_env": self._aclient.trust_env,
+        }
+        self._thread_pool = ThreadPoolExecutor(
+            max_workers=min(10, (os.cpu_count() or 1) + 4)
+        )
+
+    @property
+    def aclient(self) -> httpx.AsyncClient:
+        if self._aclient is None:
+            self._aclient = httpx.AsyncClient(**self._client_params)
+        return self._aclient
+
+    def _run_sync(self, coro: Coroutine[Any, Any, T]) -> T:
+        """Run coroutine in a separate thread to avoid event loop issues"""
+
+        # force a new client for this thread/event loop
+        original_client = self._aclient
+        self._aclient = None
+
+        def run_coro() -> T:
+            async def wrapped_coro() -> T:
+                return await coro
+
+            return asyncio.run(wrapped_coro())
+
+        try:
+            result = self._thread_pool.submit(run_coro).result()
+        finally:
+            # restore the original client, even if the coroutine raised
+            self._aclient = original_client
+
+        return result
+
+    async def _get_default_project(self) -> str:
+        response = await self.aclient.get(
+            urljoin(str(self.base_url), "/api/v1/projects"), headers=self.headers
+        )
+        response.raise_for_status()
+        projects = response.json()
+        default_project = [p for p in projects if p.get("is_default")]
+        return default_project[0]["id"]
+
+    async def _build_url(
+        self, endpoint: str, extra_params: Optional[List[str]] = None
+    ) -> str:
+        """Helper method to build URLs with common query parameters."""
+        url = urljoin(str(self.base_url), endpoint)
+
+        if not self.project_id:
+            self.project_id = await self._get_default_project()
+
+        query_params = []
+        if self.organization_id:
+            query_params.append(f"organization_id={self.organization_id}")
+        if self.project_id:
+            query_params.append(f"project_id={self.project_id}")
+        if extra_params:
+            query_params.extend([p for p in extra_params if p is not None])
+
+        if query_params:
+            url += "?" + "&".join(query_params)
+
+        return url
+
+    async def acreate_report(
+        self,
+        name: str,
+        template_instructions: Optional[str] = None,
+        template_text: Optional[str] = None,
+        template_file: Optional[Union[str, tuple[str, bytes]]] = None,
+        input_files: Optional[List[Union[str, tuple[str, bytes]]]] = None,
+        existing_retriever_id: Optional[str] = None,
+    ) -> ReportClient:
+        """Create a new report asynchronously."""
+        url = await self._build_url("/api/v1/reports/")
+        open_files: List[io.BufferedReader] = []
+
+        data = {"name": name}
+        if template_instructions:
+            data["template_instructions"] = template_instructions
+        if template_text:
+            data["template_text"] = template_text
+        if existing_retriever_id:
+            data["existing_retriever_id"] = str(existing_retriever_id)
+
+        files: List[tuple[str, io.BufferedReader | bytes]] = []
+        if template_file:
+            if isinstance(template_file, str):
+                open_files.append(open(template_file, "rb"))
+                files.append(("template_file", open_files[-1]))
+            else:
+                files.append(("template_file", template_file[1]))
+
+        if input_files:
+            for f in input_files:
+                if isinstance(f, str):
+                    open_files.append(open(f, "rb"))
+                    files.append(("files", open_files[-1]))
+                else:
+                    files.append(("files", f[1]))
+
+        try:
+            response = await self.aclient.post(
+                url, headers=self.headers, data=data, files=files
+            )
+            response.raise_for_status()
+            report_id = response.json()["id"]
+            return ReportClient(report_id, name, self)
+        except httpx.HTTPStatusError as e:
+            raise ValueError(
+                f"Failed to create report: {e.response.text}\nError Code: {e.response.status_code}"
+            ) from e
+        finally:
+            for open_file in open_files:
+                open_file.close()
+
+    def create_report(
+        self,
+        name: str,
+        template_instructions: Optional[str] = None,
+        template_text: Optional[str] = None,
+        template_file: Optional[Union[str, tuple[str, bytes]]] = None,
+        input_files: Optional[List[Union[str, tuple[str, bytes]]]] = None,
+        existing_retriever_id: Optional[str] = None,
+    ) -> ReportClient:
+        """Create a new report."""
+        return self._run_sync(
+            self.acreate_report(
+                name=name,
+                template_instructions=template_instructions,
+                template_text=template_text,
+                template_file=template_file,
+                input_files=input_files,
+                existing_retriever_id=existing_retriever_id,
+            )
+        )
+
+    async def alist_reports(
+        self, state: Optional[str] = None, limit: int = 100, offset: int = 0
+    ) -> List[ReportClient]:
+        """List all reports asynchronously."""
+        params = []
+        if state:
+            params.append(f"state={state}")
+        if limit:
+            params.append(f"limit={limit}")
+        if offset:
+            params.append(f"offset={offset}")
+
+        url = await self._build_url(
+            "/api/v1/reports/list",
+            extra_params=params,
+        )
+
+        response = await self.aclient.get(url, headers=self.headers)
+        response.raise_for_status()
+        data = response.json()
+
+        return [
+            ReportClient(r["report_id"], r["name"], self)
+            for r in data["report_responses"]
+        ]
+
+    def list_reports(
+        self, state: Optional[str] = None, limit: int = 100, offset: int = 0
+    ) -> List[ReportClient]:
+        """Synchronous wrapper for listing reports."""
+        return self._run_sync(self.alist_reports(state, limit, offset))
+
+    async def aget_report(self, report_id: str) -> ReportClient:
+        """Get a ReportClient for working with a specific report asynchronously."""
+        url = await self._build_url(f"/api/v1/reports/{report_id}")
+
+        response = await self.aclient.get(url, headers=self.headers)
+        response.raise_for_status()
+        data = response.json()
+
+        return ReportClient(data["report_id"], data["name"], self)
+
+    def get_report(self, report_id: str) -> ReportClient:
+        """Synchronous wrapper for getting a report."""
+        return self._run_sync(self.aget_report(report_id))
+
+    async def aget_report_metadata(self, report_id: str) -> ReportMetadata:
+        """Get metadata for a specific report asynchronously.
+
+        Returns:
+            ReportMetadata containing:
+            - id: Report ID
+            - name: Report name
+            - state: Current report state
+            - report_metadata: Additional metadata
+            - template_file: Name of template file if used
+            - template_instructions: Template instructions if provided
+            - input_files: List of input file names
+        """
+        url = await self._build_url(f"/api/v1/reports/{report_id}/metadata")
+
+        response = await self.aclient.get(url, headers=self.headers)
+        response.raise_for_status()
+        return ReportMetadata(**response.json())
+
+    def get_report_metadata(self, report_id: str) -> ReportMetadata:
+        """Synchronous wrapper for getting report metadata."""
+        return self._run_sync(self.aget_report_metadata(report_id))
+
+    async def adelete_report(self, report_id: str) -> None:
+        """Delete a specific report asynchronously."""
+        url = await self._build_url(f"/api/v1/reports/{report_id}")
+
+        response = await self.aclient.delete(url, headers=self.headers)
+        response.raise_for_status()
+
+    def delete_report(self, report_id: str) -> None:
+        """Synchronous wrapper for deleting a report."""
+        return self._run_sync(self.adelete_report(report_id))
@@ -0,0 +1,527 @@
+import asyncio
+import httpx
+import time
+from typing import Optional, List, Literal, Union, TYPE_CHECKING
+from dataclasses import dataclass
+from datetime import datetime
+from enum import Enum
+
+from llama_cloud.types import (
+    ReportEventItemEventData_Progress,
+    ReportMetadata,
+    EditSuggestion,
+    ReportResponse,
+    ReportPlan,
+    ReportBlock,
+    ReportPlanBlock,
+    Report,
+)
+
+if TYPE_CHECKING:
+    from llama_cloud_services.report.base import LlamaReport
+
+
+class MessageRole(str, Enum):
+    USER = "user"
+    ASSISTANT = "assistant"
+
+
+@dataclass
+class Message:
+    role: MessageRole
+    content: str
+    timestamp: datetime
+
+
+@dataclass
+class EditAction:
+    block_idx: int
+    old_content: str
+    new_content: Optional[str]
+    action: Literal["approved", "rejected"]
+    timestamp: datetime
+
+
+DEFAULT_POLL_INTERVAL = 5
+DEFAULT_TIMEOUT = 600
+
+
+class ReportClient:
+    """Client for operations on a specific report."""
+
+    def __init__(self, report_id: str, name: str, parent_client: "LlamaReport"):
+        self.report_id = report_id
+        self.name = name
+        self._client = parent_client
+        self._headers = parent_client.headers
+        self._run_sync = parent_client._run_sync
+        self._build_url = parent_client._build_url
+        self.chat_history: List[Message] = []
+        self.edit_history: List[EditAction] = []
+
+    @property
+    def aclient(self) -> httpx.AsyncClient:
+        return self._client.aclient
+
+    def __str__(self) -> str:
+        return f"Report(id={self.report_id}, name={self.name})"
+
+    def __repr__(self) -> str:
+        return f"Report(id={self.report_id}, name={self.name})"
+
+    def _get_block_content(self, block: Union[ReportBlock, ReportPlanBlock]) -> str:
+        if isinstance(block, ReportBlock):
+            return block.template
+        elif isinstance(block, ReportPlanBlock):
+            return block.block.template
+        else:
+            raise ValueError(f"Invalid block type: {type(block)}")
+
+    def _get_block_idx(self, block: Union[ReportBlock, ReportPlanBlock]) -> int:
+        if isinstance(block, ReportBlock):
+            return block.idx
+        elif isinstance(block, ReportPlanBlock):
+            return block.block.idx
+        else:
+            raise ValueError(f"Invalid block type: {type(block)}")
+
+    async def aget(self, version: Optional[int] = None) -> ReportResponse:
+        """Get this report's details asynchronously."""
+        extra_params = []
+        if version is not None:
+            extra_params.append(f"version={version}")
+
+        url = await self._build_url(f"/api/v1/reports/{self.report_id}", extra_params)
+
+        response = await self.aclient.get(url, headers=self._headers)
+        response.raise_for_status()
+        return ReportResponse(**response.json())
+
+    def get(self, version: Optional[int] = None) -> ReportResponse:
+        """Synchronous wrapper for getting this report's details."""
+        return self._run_sync(self.aget(version))
+
+    async def aupdate_report(self, updated_report: Report) -> ReportResponse:
+        """Update this report's content asynchronously."""
+        url = await self._build_url(f"/api/v1/reports/{self.report_id}")
+        response = await self.aclient.patch(
+            url, headers=self._headers, json={"content": updated_report.dict()}
+        )
+        response.raise_for_status()
+        return ReportResponse(**response.json())
+
+    def update_report(self, updated_report: Report) -> ReportResponse:
+        """Synchronous wrapper for updating this report's content."""
+        return self._run_sync(self.aupdate_report(updated_report))
+
+    async def aupdate_plan(
+        self,
+        action: Literal["approve", "reject", "edit"],
+        updated_plan: Optional[ReportPlan] = None,
+    ) -> ReportResponse:
+        """Update this report's plan asynchronously."""
+        if action == "edit" and not updated_plan:
+            raise ValueError("updated_plan is required when action is 'edit'")
+
+        url = await self._build_url(
+            f"/api/v1/reports/{self.report_id}/plan", [f"action={action}"]
+        )
+
+        data = None
+        if updated_plan is not None:
+            plan_dict = updated_plan.dict()
+            plan_dict.pop("generated_at", None)
+            data = plan_dict
+
+        if updated_plan is None and action == "edit":
+            raise ValueError("updated_plan is required when action is 'edit'")
+
+        response = await self.aclient.patch(url, headers=self._headers, json=data)
+        response.raise_for_status()
+        return ReportResponse(**response.json())
+
+    def update_plan(
+        self,
+        action: Literal["approve", "reject", "edit"],
+        updated_plan: Optional[ReportPlan] = None,
+    ) -> ReportResponse:
+        """Synchronous wrapper for updating this report's plan."""
+        return self._run_sync(self.aupdate_plan(action, updated_plan))
+
+    async def asuggest_edits(
+        self,
+        user_query: str,
+        auto_history: bool = True,
+        chat_history: Optional[List[dict]] = None,
+    ) -> List[EditSuggestion]:
+        """Get AI suggestions for edits to this report asynchronously.
+
+        Args:
+            user_query: The user's request/question about what to edit
+            auto_history: Whether to send the automatically tracked chat history with the request (ignored if chat_history is given)
+            chat_history:
+                A list of chat messages to include in the chat history.
+                The format being a list of dictionaries with "role" and "content" keys.
+        """
+        # Add user message to history
+        self.chat_history.append(
+            Message(role=MessageRole.USER, content=user_query, timestamp=datetime.now())
+        )
+
+        # Format chat history with edit summaries
+        chat_history_dicts = []
+        for msg in self.chat_history[:-1]:  # Exclude current message
+            content = msg.content
+            if msg.role == MessageRole.USER:
+                # Add edit summary for user messages
+                edit_summary = self._get_edit_summary_after_message(msg.timestamp)
+                if edit_summary:
+                    content = f"{content}\n\nActions taken:\n{edit_summary}"
+
+            chat_history_dicts.append({"role": msg.role.value, "content": content})
+
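+        # Decide which history to send: an explicit chat_history takes
+        # precedence; otherwise use the tracked history when auto_history
+        # is enabled, and send no history at all when it is disabled.
+        if chat_history:
+            chat_history_dicts = chat_history
+        elif not auto_history:
+            chat_history_dicts = []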
+
+        # Make the API call
+        url = await self._build_url(f"/api/v1/reports/{self.report_id}/suggest_edits")
+        data = {"user_query": user_query, "chat_history": chat_history_dicts}
+
+        response = await self.aclient.post(url, headers=self._headers, json=data)
+        response.raise_for_status()
+        suggestions = response.json()
+        suggestions = [EditSuggestion(**suggestion) for suggestion in suggestions]
+
+        # Add assistant response to history
+        if suggestions:
+            for suggestion in suggestions:
+                self.chat_history.append(
+                    Message(
+                        role=MessageRole.ASSISTANT,
+                        content=suggestion.justification,
+                        timestamp=datetime.now(),
+                    )
+                )
+
+        return suggestions
+
+    def suggest_edits(
+        self,
+        user_query: str,
+        auto_history: bool = True,
+        chat_history: Optional[List[dict]] = None,
+    ) -> List[EditSuggestion]:
+        """Synchronous wrapper for getting edit suggestions."""
+        return self._run_sync(
+            self.asuggest_edits(user_query, auto_history, chat_history)
+        )
+
+    async def await_completion(
+        self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+    ) -> Report:
+        """Wait for this report to complete processing."""
+        start_time = time.time()
+        while True:
+            report_response = await self.aget()
+            status = report_response.status
+
+            if status == "completed":
+                return report_response.report
+            elif status == "error":
+                events = await self.aget_events()
+                raise ValueError(f"Report entered error state: {events[-1].msg if events else 'no progress events reported'}")
+            elif time.time() - start_time > timeout:
+                raise TimeoutError(f"Report did not complete within {timeout} seconds")
+
+            await asyncio.sleep(poll_interval)
+
+    def wait_for_completion(
+        self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+    ) -> Report:
+        """Synchronous wrapper for awaiting report completion."""
+        return self._run_sync(self.await_completion(timeout, poll_interval))
+
+    async def await_for_plan(
+        self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+    ) -> ReportPlan:
+        """Wait for this report's plan to be ready for review."""
+        start_time = time.time()
+        while True:
+            report_metadata = await self.aget_metadata()
+            state = report_metadata.state
+
+            if state == "waiting_approval":
+                report_response = await self.aget()
+                return report_response.plan
+            elif state == "error":
+                events = await self.aget_events()
+                raise ValueError(f"Report entered error state: {events[-1].msg if events else 'no progress events reported'}")
+            elif time.time() - start_time > timeout:
+                raise TimeoutError(f"Plan was not ready within {timeout} seconds")
+
+            await asyncio.sleep(poll_interval)
+
+    def wait_for_plan(
+        self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+    ) -> ReportPlan:
+        """Synchronous wrapper for awaiting plan readiness."""
+        return self._run_sync(self.await_for_plan(timeout, poll_interval))
+
+    async def aget_metadata(self) -> ReportMetadata:
+        """Get this report's metadata asynchronously."""
+        return await self._client.aget_report_metadata(self.report_id)
+
+    def get_metadata(self) -> ReportMetadata:
+        """Synchronous wrapper for getting this report's metadata."""
+        return self._run_sync(self.aget_metadata())
+
+    async def adelete(self) -> None:
+        """Delete this report asynchronously."""
+        return await self._client.adelete_report(self.report_id)
+
+    def delete(self) -> None:
+        """Synchronous wrapper for deleting this report."""
+        return self._run_sync(self.adelete())
+
+    async def aaccept_edit(self, suggestion: EditSuggestion) -> None:
+        """Accept a suggested edit.
+
+        Args:
+            suggestion: The EditSuggestion to accept, typically from suggest_edits()
+        """
+        if len(suggestion.blocks) == 0:
+            return
+
+        # Determine if we're editing a plan or report based on first block type
+        is_plan_edit = isinstance(suggestion.blocks[0], ReportPlanBlock)
+
+        # Get current content
+        report_response = await self.aget()
+        current_blocks = (
+            report_response.plan.blocks
+            if is_plan_edit
+            else report_response.report.blocks
+        )
+
+        # Track the edit
+        new_blocks = []
+        for edit_block in suggestion.blocks:
+            # Find matching block in current content
+            old_block = next(
+                (
+                    b
+                    for b in current_blocks
+                    if self._get_block_idx(b) == self._get_block_idx(edit_block)
+                ),
+                None,
+            )
+
+            old_content = (
+                self._get_block_content(old_block) if old_block else "[No old content]"
+            )
+            new_content = self._get_block_content(edit_block)
+
+            if is_plan_edit:
+                new_queries_str = "\n".join(
+                    [
+                        f"Field: {q.field}, Prompt: {q.prompt}, Context: {q.context}"
+                        for q in edit_block.queries
+                    ]
+                )
+                new_dependency_str = (
+                    f"Depends on: {edit_block.dependency}"
+                    if edit_block.dependency
+                    else ""
+                )
+                new_content += f"\n\n{new_queries_str}\n{new_dependency_str}"
+
+                if old_block:
+                    old_queries_str = "\n".join(
+                        [
+                            f"Field: {q.field}, Prompt: {q.prompt}, Context: {q.context}"
+                            for q in old_block.queries
+                        ]
+                    )
+                    old_dependency_str = (
+                        f"Depends on: {old_block.dependency}"
+                        if old_block.dependency
+                        else ""
+                    )
+                    old_content += f"\n\n{old_queries_str}\n{old_dependency_str}"
+
+            self.edit_history.append(
+                EditAction(
+                    block_idx=self._get_block_idx(edit_block),
+                    old_content=old_content,
+                    new_content=new_content,
+                    action="approved",
+                    timestamp=datetime.now(),
+                )
+            )
+
+            # Create updated block
+            if is_plan_edit:
+                new_blocks.append(
+                    ReportPlanBlock(
+                        block=ReportBlock(
+                            idx=edit_block.block.idx,
+                            template=self._get_block_content(edit_block),
+                            sources=edit_block.block.sources,
+                        ),
+                        queries=edit_block.queries,
+                        dependency=edit_block.dependency,
+                    )
+                )
+            else:
+                new_blocks.append(
+                    ReportBlock(
+                        idx=edit_block.idx,
+                        template=self._get_block_content(edit_block),
+                        sources=edit_block.sources,
+                    )
+                )
+
+        if new_blocks:
+            if is_plan_edit:
+                # Update plan in place
+                plan = report_response.plan
+
+                # Replace edited blocks and add new ones
+                for new_block in new_blocks:
+                    block_idx = self._get_block_idx(new_block)
+                    existing_block_idx = next(
+                        (
+                            i
+                            for i, b in enumerate(plan.blocks)
+                            if b.block.idx == block_idx
+                        ),
+                        None,
+                    )
+
+                    if existing_block_idx is not None:
+                        # Replace existing block
+                        plan.blocks[existing_block_idx] = new_block
+                    else:
+                        # Add new block to end
+                        plan.blocks.append(new_block)
+
+                await self.aupdate_plan("edit", plan)
+            else:
+                # Update report in place
+                report = report_response.report
+
+                # Replace edited blocks and add new ones
+                for new_block in new_blocks:
+                    block_idx = self._get_block_idx(new_block)
+                    existing_block_idx = next(
+                        (i for i, b in enumerate(report.blocks) if b.idx == block_idx),
+                        None,
+                    )
+
+                    if existing_block_idx is not None:
+                        # Replace existing block
+                        report.blocks[existing_block_idx] = new_block
+                    else:
+                        # Add new block to end
+                        report.blocks.append(new_block)
+
+                await self.aupdate_report(report)
+
+    def accept_edit(self, suggestion: EditSuggestion) -> None:
+        """Synchronous wrapper for accepting an edit."""
+        return self._run_sync(self.aaccept_edit(suggestion))
+
+    async def areject_edit(self, suggestion: EditSuggestion) -> None:
+        """Reject a suggested edit.
+
+        Args:
+            suggestion: The EditSuggestion to reject, typically from suggest_edits()
+        """
+        # Track the rejections
+        for edit_block in suggestion.blocks:
+            self.edit_history.append(
+                EditAction(
+                    block_idx=self._get_block_idx(edit_block),
+                    old_content=self._get_block_content(edit_block),
+                    new_content=None,
+                    action="rejected",
+                    timestamp=datetime.now(),
+                )
+            )
+
+    def reject_edit(self, suggestion: EditSuggestion) -> None:
+        """Synchronous wrapper for rejecting an edit."""
+        return self._run_sync(self.areject_edit(suggestion))
+
+    def _get_edit_summary_after_message(
+        self, message_timestamp: datetime
+    ) -> Optional[str]:
+        """Get a summary of edits that occurred after a specific message."""
+        relevant_edits = [
+            edit for edit in self.edit_history if edit.timestamp > message_timestamp
+        ]
+
+        if not relevant_edits:
+            return None
+
+        approved = [edit for edit in relevant_edits if edit.action == "approved"]
+        rejected = [edit for edit in relevant_edits if edit.action == "rejected"]
+
+        summary = []
+
+        if approved:
+            summary.append("Approved edits:")
+            for edit in approved:
+                summary.append(
+                    f'Block {edit.block_idx}: "{edit.old_content}" -> "{edit.new_content}"'
+                )
+
+        if rejected:
+            if approved:  # Add spacing if we had approved edits
+                summary.append("")
+            summary.append("Rejected edits:")
+            for edit in rejected:
+                summary.append(f'Block {edit.block_idx}: "{edit.old_content}"')
+
+        return "\n".join(summary)
+
+    async def aget_events(
+        self, last_sequence: Optional[int] = None
+    ) -> List[ReportEventItemEventData_Progress]:
+        """Get the progress events for this report asynchronously.
+
+        Args:
+            last_sequence: If provided, only get events after this sequence number
+
+        Returns:
+            List of progress events; other event types are skipped
+        """
+        extra_params = []
+        if last_sequence is not None:
+            extra_params.append(f"last_sequence={last_sequence}")
+
+        url = await self._build_url(
+            f"/api/v1/reports/{self.report_id}/events", extra_params
+        )
+
+        response = await self.aclient.get(url, headers=self._headers)
+        response.raise_for_status()
+        progress_events = []
+        for event in response.json():
+            if event["event_type"] == "progress":
+                progress_events.append(
+                    ReportEventItemEventData_Progress(**event["event_data"])
+                )
+
+        return progress_events
+
+    def get_events(
+        self, last_sequence: Optional[int] = None
+    ) -> List[ReportEventItemEventData_Progress]:
+        """Synchronous wrapper for getting report events."""
+        return self._run_sync(self.aget_events(last_sequence))
@@ -0,0 +1,165 @@
+# LlamaParse
+
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
+[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
+[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)
+
+LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).
+
+It is really good at the following:
+
+- ✅ **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
+- ✅ **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
+- ✅ **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
+- ✅ **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
+
+LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
+
+The free plan covers up to 1,000 pages a day. The paid plan includes 7,000 free pages per week, plus 0.3c per additional page by default. A sandbox is available to test the API at [**https://cloud.llamaindex.ai/parse ↗**](https://cloud.llamaindex.ai/parse).
+
+Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).
+
+If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
+
+## Getting Started
+
+First, log in and get an API key from [**https://cloud.llamaindex.ai/api-key ↗**](https://cloud.llamaindex.ai/api-key).
+
+Then, make sure you have the latest LlamaIndex version installed.
+
+**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.
+
+```
+pip uninstall llama-index  # run this if upgrading from v0.9.x or older
+pip install -U llama-index --no-cache-dir --force-reinstall
+```
+
+Lastly, install the package:
+
+`pip install llama-parse`
+
+Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
+
+```bash
+export LLAMA_CLOUD_API_KEY='llx-...'
+
+# output as text
+llama-parse my_file.pdf --result-type text --output-file output.txt
+
+# output as markdown
+llama-parse my_file.pdf --result-type markdown --output-file output.md
+
+# output as raw json
+llama-parse my_file.pdf --output-raw-json --output-file output.json
+```
+
+You can also create simple scripts:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+
+parser = LlamaParse(
+    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
+    result_type="markdown",  # "markdown" and "text" are available
+    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
+    verbose=True,
+    language="en",  # Optionally you can define a language, default=en
+)
+
+# sync
+documents = parser.load_data("./my_file.pdf")
+
+# sync batch
+documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
+
+# async
+documents = await parser.aload_data("./my_file.pdf")
+
+# async batch
+documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
+```
+
+## Using with file object
+
+You can parse a file object directly:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+
+parser = LlamaParse(
+    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
+    result_type="markdown",  # "markdown" and "text" are available
+    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
+    verbose=True,
+    language="en",  # Optionally you can define a language, default=en
+)
+
+file_name = "my_file1.pdf"
+extra_info = {"file_name": file_name}
+
+with open(f"./{file_name}", "rb") as f:
+    # extra_info must include a file_name key when passing a file object
+    documents = parser.load_data(f, extra_info=extra_info)
+
+# you can also pass file bytes directly
+with open(f"./{file_name}", "rb") as f:
+    file_bytes = f.read()
+    # extra_info must include a file_name key when passing file bytes
+    documents = parser.load_data(file_bytes, extra_info=extra_info)
+```
+
+## Using with `SimpleDirectoryReader`
+
+You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+from llama_index.core import SimpleDirectoryReader
+
+parser = LlamaParse(
+    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
+    result_type="markdown",  # "markdown" and "text" are available
+    verbose=True,
+)
+
+file_extractor = {".pdf": parser}
+documents = SimpleDirectoryReader(
+    "./data", file_extractor=file_extractor
+).load_data()
+```
+
+Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
+
+## Examples
+
+Several end-to-end indexing examples can be found in the examples folder:
+
+- [Getting Started](examples/demo_basic.ipynb)
+- [Advanced RAG Example](examples/demo_advanced.ipynb)
+- [Raw API Usage](examples/demo_api.ipynb)
+
+## Documentation
+
+[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
+
+## Terms of Service
+
+See the [Terms of Service](./TOS.pdf).
+
+## Get in Touch (LlamaCloud)
+
+LlamaParse is part of LlamaCloud, our end-to-end enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.
+
+LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
@@ -1,3 +0,0 @@
-from llama_parse.base import LlamaParse, ResultType
-
-__all__ = ["LlamaParse", "ResultType"]
@@ -0,0 +1,3 @@
+from llama_cloud_services.parse import LlamaParse, ResultType
+
+__all__ = ["LlamaParse", "ResultType"]
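Note on the hunk above: the legacy package entry point now appears to be a thin re-export of the class that lives in `llama_cloud_services.parse`, so existing `from llama_parse import LlamaParse` imports keep resolving to the same object. A minimal sketch of that shim pattern follows. The module names `new_services` and `legacy_parse` and the class `LlamaParseStub` are hypothetical stand-ins built in memory, not the real packages.

```python
import sys
import types

# Hypothetical stand-in for the package that now holds the implementation.
new_pkg = types.ModuleType("new_services")


class LlamaParseStub:
    """Placeholder class; the real parser is not needed for this sketch."""


new_pkg.LlamaParse = LlamaParseStub
sys.modules["new_services"] = new_pkg

# Hypothetical stand-in for the legacy package: its body only re-exports.
legacy_pkg = types.ModuleType("legacy_parse")
exec(
    "from new_services import LlamaParse\n__all__ = ['LlamaParse']",
    legacy_pkg.__dict__,
)
sys.modules["legacy_parse"] = legacy_pkg

# Both import paths resolve to the very same class object.
from legacy_parse import LlamaParse as OldPath
from new_services import LlamaParse as NewPath

print(OldPath is NewPath)
```

Because the shim re-exports rather than subclasses, `isinstance` checks and type annotations written against either import path stay interchangeable.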
@@ -0,0 +1,19 @@
+from llama_cloud_services.parse.base import (
+    LlamaParse,
+    ResultType,
+    FileInput,
+    _DEFAULT_SEPARATOR,
+    JOB_RESULT_URL,
+    JOB_STATUS_ROUTE,
+    JOB_UPLOAD_ROUTE,
+)
+
+__all__ = [
+    "LlamaParse",
+    "ResultType",
+    "FileInput",
+    "_DEFAULT_SEPARATOR",
+    "JOB_RESULT_URL",
+    "JOB_STATUS_ROUTE",
+    "JOB_UPLOAD_ROUTE",
+]
@@ -0,0 +1,4 @@
+from llama_cloud_services.parse.cli.main import parse
+
+if __name__ == "__main__":
+    parse()
@@ -0,0 +1,11 @@
+from llama_cloud_services.parse.utils import (
+    SUPPORTED_FILE_TYPES,
+    Language,
+    ResultType,
+)
+
+__all__ = [
+    "SUPPORTED_FILE_TYPES",
+    "Language",
+    "ResultType",
+]
@@ -0,0 +1,24 @@
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
+
+[tool.poetry]
+name = "llama-parse"
+version = "0.5.21"
+description = "Parse files into RAG-Optimized formats."
+authors = ["Logan Markewich <logan@llamaindex.ai>"]
+license = "MIT"
+readme = "README.md"
+packages = [{include = "llama_parse"}]
+
+[tool.poetry.dependencies]
+python = ">=3.9,<4.0"
+llama-cloud-services = "*"
+
+[tool.poetry.group.dev.dependencies]
+pytest = "^8.0.0"
+pytest-asyncio = "*"
+ipykernel = "^6.29.0"
+
+[tool.poetry.scripts]
+llama-parse = "llama_parse.cli.main:parse"
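Note on the `[tool.poetry.scripts]` entry above: the value is a `module:attribute` reference that the generated console script imports and then calls. A rough sketch of how such a reference can be resolved is shown below. The helper name `resolve_entry_point` is an assumption for illustration, and the demo resolves standard-library targets so that it runs without the package installed.

```python
import importlib
import json
import os.path


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attr' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes after the colon, if any were given.
    if attr_path:
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


# Standard-library targets are used here purely for demonstration.
dumps = resolve_entry_point("json:dumps")
join = resolve_entry_point("os.path:join")
print(dumps is json.dumps, join is os.path.join)
```

Under this reading, `llama_parse.cli.main:parse` only works if `llama_parse/cli/main.py` exposes a callable named `parse`, which is what the four-line CLI shim earlier in the diff re-exports.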
@@ -0,0 +1,142 @@
+# LlamaParse
+
+LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).
+
+It excels at the following:
+
+- ✅ **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
+- ✅ **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
+- ✅ **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
+- ✅ **Custom parsing**: Accepting custom prompt instructions to tailor the output the way you want it.
+
+LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
+
+The free plan covers up to 1000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page by default. A sandbox for testing the API is available at [**https://cloud.llamaindex.ai/parse ↗**](https://cloud.llamaindex.ai/parse).
+
+Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).
+
+If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
+
+## Getting Started
+
+First, log in and get an API key from [**https://cloud.llamaindex.ai/api-key ↗**](https://cloud.llamaindex.ai/api-key).
+
+Then, install the package:
+
+`pip install llama-cloud-services`
+
+Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
+
+```bash
+export LLAMA_CLOUD_API_KEY='llx-...'
+
+# output as text
+llama-parse my_file.pdf --result-type text --output-file output.txt
+
+# output as markdown
+llama-parse my_file.pdf --result-type markdown --output-file output.md
+
+# output as raw json
+llama-parse my_file.pdf --output-raw-json --output-file output.json
+```
+
+You can also create simple scripts:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_cloud_services import LlamaParse
+
+parser = LlamaParse(
+    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
+    result_type="markdown",  # "markdown" and "text" are available
+    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
+    verbose=True,
+    language="en",  # Optionally you can define a language, default=en
+)
+
+# sync
+documents = parser.load_data("./my_file.pdf")
+
+# sync batch
+documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
+
+# async (top-level `await` only works in a notebook or inside an async function)
+documents = await parser.aload_data("./my_file.pdf")
+
+# async batch
+documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
+```
+
+## Using with file object
+
+You can parse a file object directly:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_cloud_services import LlamaParse
+
+parser = LlamaParse(
+    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
+    result_type="markdown",  # "markdown" and "text" are available
+    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
+    verbose=True,
+    language="en",  # Optionally you can define a language, default=en
+)
+
+file_name = "my_file1.pdf"
+extra_info = {"file_name": file_name}
+
+with open(f"./{file_name}", "rb") as f:
+    # must provide extra_info with file_name key when passing a file object
+    documents = parser.load_data(f, extra_info=extra_info)
+
+# you can also pass file bytes directly
+with open(f"./{file_name}", "rb") as f:
+    file_bytes = f.read()
+    # must provide extra_info with file_name key when passing file bytes
+    documents = parser.load_data(file_bytes, extra_info=extra_info)
+```
+
+## Using with `SimpleDirectoryReader`
+
+You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_cloud_services import LlamaParse
+from llama_index.core import SimpleDirectoryReader
+
+parser = LlamaParse(
+    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
+    result_type="markdown",  # "markdown" and "text" are available
+    verbose=True,
+)
+
+file_extractor = {".pdf": parser}
+documents = SimpleDirectoryReader(
+    "./data", file_extractor=file_extractor
+).load_data()
+```
+
+Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
+
+## Examples
+
+Several end-to-end indexing examples can be found in the examples folder:
+
+- [Getting Started](examples/parse/demo_basic.ipynb)
+- [Advanced RAG Example](examples/parse/demo_advanced.ipynb)
+- [Raw API Usage](examples/parse/demo_api.ipynb)
+
+## Documentation
+
+[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
@@ -2,25 +2,38 @@
 requires = ["poetry-core"]
 build-backend = "poetry.core.masonry.api"

+[tool.mypy]
+files = ["llama_cloud_services"]
+python_version = "3.10"
+
 [tool.poetry]
-name = "llama-parse"
-version = "0.5.20"
-description = "Parse files into RAG-Optimized formats."
-authors = ["Logan Markewich <logan@llamaindex.ai>"]
+name = "llama-cloud-services"
+version = "0.1.0"
+description = "Tailored SDK clients for LlamaCloud services."
+authors = ["Logan Markewich <logan@runllama.ai>"]
 license = "MIT"
 readme = "README.md"
-packages = [{include = "llama_parse"}]
+packages = [{include = "llama_cloud_services"}]

 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
 llama-index-core = ">=0.11.0"
+llama-cloud = "^0.1.11"
 pydantic = "!=2.10"
 click = "^8.1.7"
+python-dotenv = "^1.0.1"
+eval-type-backport = {python = "<3.10", version = "^0.2.0"}

 [tool.poetry.group.dev.dependencies]
 pytest = "^8.0.0"
 pytest-asyncio = "*"
 ipykernel = "^6.29.0"
+pre-commit = "3.2.0"
+autoevals = "^0.0.114"
+deepdiff = "^8.1.1"
+ipython = "^8.12.3"
+jupyter = "^1.1.1"
+mypy = "^1.14.1"

 [tool.poetry.scripts]
-llama-parse = "llama_parse.cli.main:parse"
+llama-parse = "llama_cloud_services.parse.cli.main:parse"
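Note on the dependency pins above: Poetry's caret operator allows upgrades that leave the leftmost non-zero component unchanged, which matters for `0.x` packages such as `llama-cloud = "^0.1.11"` and `autoevals = "^0.0.114"`. The sketch below computes the implied range. The helper `caret_bounds` is hypothetical and handles only full `MAJOR.MINOR.PATCH` versions, not shortened forms like `^0.1`.

```python
def caret_bounds(version: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a caret constraint."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) != 3:
        raise ValueError("sketch handles only MAJOR.MINOR.PATCH versions")
    # Index of the leftmost non-zero component (patch if all are zero).
    i = next((idx for idx, p in enumerate(parts) if p != 0), 2)
    # Bump that component and zero out everything to its right.
    upper = parts[:i] + [parts[i] + 1] + [0] * (2 - i)
    return version, ".".join(str(p) for p in upper)


for pin in ("0.1.11", "0.0.114", "8.1.7"):
    low, high = caret_bounds(pin)
    print(f"^{pin} means >={low},<{high}")
```

By this rule the `llama-cloud` pin accepts any `0.1.x` release from `0.1.11` onward but not `0.2.0`, while the `autoevals` pin is effectively exact.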
Author	SHA1	Message	Date
Logan Markewich	59cf8d4165	remove llama-extract	2025-02-05 12:54:43 -06:00
Logan Markewich	684758b770	remove report pdf	2025-02-04 21:21:02 -06:00
Logan Markewich	c37325b496	make e2e work	2025-02-04 21:09:29 -06:00
Logan Markewich	2c4d5f5bb8	small nit	2025-02-04 15:37:32 -06:00
Logan Markewich	bf9a26b4ad	imports	2025-02-03 22:47:13 -06:00
Logan Markewich	c98239cb7a	readme organize	2025-02-03 10:04:21 -06:00
Logan Markewich	ec9b7331d6	except in teardown	2025-01-31 20:37:22 -06:00
Logan Markewich	f4b3f10c95	tests	2025-01-31 19:00:38 -06:00
Logan Markewich	02cb622d94	tests	2025-01-31 09:01:18 -06:00
Logan Markewich	f79263e25e	tests	2025-01-31 08:51:02 -06:00
Logan Markewich	9b5cc20d7f	types	2025-01-31 08:42:57 -06:00
Logan Markewich	c1f37bba2a	types	2025-01-31 08:39:10 -06:00
Logan Markewich	13cf7dbb15	wip	2025-01-30 17:33:21 -06:00
Logan Markewich	04312cc066	wip	2025-01-30 17:32:02 -06:00