cr

1 changed files with 931 additions and 0 deletions
@@ -0,0 +1,931 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Getting Started with LlamaParse: Parsing Modes Overview\n",
+    "\n",
+    "<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/parsing_modes/demo_presets.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
+    "\n",
+    "This notebook demonstrates the different parsing modes available in LlamaParse and how to use them effectively for document processing. We'll walk through three main parsing modes:\n",
+    "\n",
+    "1. **Cost-Effective Mode** (`parse_page_with_llm`) - Fast and economical parsing\n",
+    "2. **Agentic Mode** (`parse_page_with_agent` with `openai-gpt-4-1-mini`) - Enhanced parsing with agent capabilities (Default)\n",
+    "3. **Agentic Plus Mode** (`parse_page_with_agent` with `anthropic-sonnet-4.0`) - Premium parsing with advanced models\n",
+    "\n",
+    "We'll use two sample documents:\n",
+    "- Apple 2021 10-K filing (text-heavy financial document)\n",
+    "- GenAI Research Report (visual-rich document with charts and diagrams)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "First, let's set up our environment and initialize the necessary components."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from llama_cloud_services import LlamaParse\n",
+    "from llama_index.llms.openai import OpenAI\n",
+    "\n",
+    "# Environment Variables - Make sure these are set\n",
+    "# os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\"  # Set in environment\n",
+    "# os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-...\"   # Set in environment\n",
+    "\n",
+    "# Initialize LLM for question answering\n",
+    "llm = OpenAI(model=\"gpt-5-mini\")\n",
+    "\n",
+    "# Project Configuration - Replace with your actual values\n",
+    "project_id = \"<project_id>\"  # Replace with your project ID\n",
+    "organization_id = \"<organization_id>\"  # Replace with your organization ID"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Document Files\n",
+    "\n",
+    "Next, let's download our sample documents:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Create data directory if it doesn't exist\n",
+    "os.makedirs(\"data\", exist_ok=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!wget \"https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf\" -O data/apple_2021_10k.pdf"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!wget \"https://www.sas.com/content/dam/SAS/documents/marketing-whitepapers-ebooks/ebooks/en/generative-ai-global-research-report-113914.pdf\" -O data/genai_research_report.pdf"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Set file paths\n",
+    "apple_10k_path = \"./data/apple_2021_10k.pdf\"\n",
+    "genai_report_path = \"./data/genai_research_report.pdf\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Cost-Effective Mode\n",
+    "\n",
+    "The cost-effective mode (`parse_page_with_llm`) is ideal for:\n",
+    "- High-volume document processing\n",
+    "- Text-heavy documents without complex layouts\n",
+    "- Budget-conscious applications\n",
+    "\n",
+    "This mode provides fast, economical parsing while maintaining good quality for standard documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Cost-Effective Mode Parser initialized\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Initialize Cost-Effective Mode Parser\n",
+    "cost_effective_parser = LlamaParse(\n",
+    "    parse_mode=\"parse_page_with_llm\",\n",
+    "    high_res_ocr=True,\n",
+    "    adaptive_long_table=True,\n",
+    "    outlined_table_extraction=True,\n",
+    "    output_tables_as_HTML=False,\n",
+    "    result_type=\"markdown\",\n",
+    "    project_id=project_id,\n",
+    "    organization_id=organization_id,\n",
+    ")\n",
+    "\n",
+    "print(\"Cost-Effective Mode Parser initialized\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Parse Apple 10-K with Cost-Effective Mode"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Parse the Apple 10-K document\n",
+    "print(\"Parsing Apple 10-K with Cost-Effective Mode...\")\n",
+    "apple_result_cost_effective = await cost_effective_parser.aparse(apple_10k_path)\n",
+    "\n",
+    "# Get markdown nodes\n",
+    "apple_nodes_cost_effective = apple_result_cost_effective.get_markdown_nodes(\n",
+    "    split_by_page=True\n",
+    ")\n",
+    "print(f\"Number of pages extracted: {len(apple_nodes_cost_effective)}\")\n",
+    "\n",
+    "# Display sample output from page 32 (contains the consolidated statements of operations)\n",
+    "print(\"\\n=== Sample Output - Page 32 (Cost-Effective Mode) ===\")\n",
+    "print(apple_nodes_cost_effective[31].text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Agentic Mode (Default)\n",
+    "\n",
+    "The agentic mode (`parse_page_with_agent` with `openai-gpt-4-1-mini`) is the recommended default mode that offers:\n",
+    "- Enhanced understanding of document structure\n",
+    "- Better handling of complex layouts and tables\n",
+    "- Improved extraction of visual elements\n",
+    "- Balanced performance and cost"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Agentic Mode Parser initialized\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Initialize Agentic Mode Parser\n",
+    "agentic_parser = LlamaParse(\n",
+    "    parse_mode=\"parse_page_with_agent\",\n",
+    "    model=\"openai-gpt-4-1-mini\",\n",
+    "    high_res_ocr=True,\n",
+    "    adaptive_long_table=True,\n",
+    "    outlined_table_extraction=True,\n",
+    "    output_tables_as_HTML=False,\n",
+    "    result_type=\"markdown\",\n",
+    "    project_id=project_id,\n",
+    "    organization_id=organization_id,\n",
+    ")\n",
+    "\n",
+    "print(\"Agentic Mode Parser initialized\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Parse GenAI Research Report with Agentic Mode\n",
+    "\n",
+    "This document contains charts and visual elements, making it ideal for demonstrating the agentic mode's capabilities."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Parsing GenAI Research Report with Agentic Mode...\n",
+      "Started parsing the file under job_id 98d363fb-135f-4529-94a5-713f9b6b2025\n",
+      "Number of pages extracted: 38\n",
+      "\n",
+      "=== Sample Output - Page 7 (Agentic Mode) ===\n",
+      "\n",
+      "# Only one in 10 businesses has undergone the preparation needed to comply with current and upcoming regulations concerning GenAI.\n",
+      "\n",
+      "# The majority of organizations lack a comprehensive governance framework for both AI and GenAI (seven in 10 adopters admit to this).\n",
+      "\n",
+      "## How prepared is your organization to comply with current and upcoming regulations concerning GenAI?\n",
+      "\n",
+      "<table>\n",
+      "  <thead>\n",
+      "    <tr>\n",
+      "      <th></th>\n",
+      "      <th>Fully prepared</th>\n",
+      "      <th>Moderately prepared</th>\n",
+      "      <th>Slightly prepared</th>\n",
+      "      <th>Not prepared</th>\n",
+      "    </tr>\n",
+      "  </thead>\n",
+      "  <tbody>\n",
+      "    <tr>\n",
+      "      <td>All respondents using/planning to use GenAI</td>\n",
+      "      <td>10%</td>\n",
+      "      <td>48%</td>\n",
+      "      <td>40%</td>\n",
+      "      <td>2%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Using GenAI and have fully implemented it</td>\n",
+      "      <td>35%</td>\n",
+      "      <td>49%</td>\n",
+      "      <td>15%</td>\n",
+      "      <td>0%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Using GenAI but haven't yet fully implemented it</td>\n",
+      "      <td>11%</td>\n",
+      "      <td>66%</td>\n",
+      "      <td>23%</td>\n",
+      "      <td>0%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Not yet using GenAI but intend to within the next two years</td>\n",
+      "      <td>3%</td>\n",
+      "      <td>28%</td>\n",
+      "      <td>64%</td>\n",
+      "      <td>5%</td>\n",
+      "    </tr>\n",
+      "  </tbody>\n",
+      "</table>\n",
+      "\n",
+      "> *Please note that percentages on charts may not add to 100% due to rounding*\n",
+      "\n",
+      "## How would you describe your current GenAI/AI governance framework?\n",
+      "\n",
+      "### Artificial Intelligence (AI) Governance framework\n",
+      "\n",
+      "<table>\n",
+      "  <thead>\n",
+      "    <tr>\n",
+      "      <th></th>\n",
+      "      <th>Well-established and comprehensive</th>\n",
+      "      <th>In development</th>\n",
+      "      <th>Ad hoc or informal</th>\n",
+      "      <th>Nonexistent</th>\n",
+      "    </tr>\n",
+      "  </thead>\n",
+      "  <tbody>\n",
+      "    <tr>\n",
+      "      <td>All respondents using/planning to use GenAI</td>\n",
+      "      <td>13%</td>\n",
+      "      <td>61%</td>\n",
+      "      <td>21%</td>\n",
+      "      <td>6%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Using GenAI and have fully implemented it</td>\n",
+      "      <td>33%</td>\n",
+      "      <td>64%</td>\n",
+      "      <td>3%</td>\n",
+      "      <td>0%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Using GenAI but haven't yet fully implemented it</td>\n",
+      "      <td>18%</td>\n",
+      "      <td>69%</td>\n",
+      "      <td>13%</td>\n",
+      "      <td>0%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Not yet using GenAI but intend to within the next two years</td>\n",
+      "      <td>1%</td>\n",
+      "      <td>52%</td>\n",
+      "      <td>34%</td>\n",
+      "      <td>13%</td>\n",
+      "    </tr>\n",
+      "  </tbody>\n",
+      "</table>\n",
+      "\n",
+      "### GenAI Governance framework\n",
+      "\n",
+      "<table>\n",
+      "  <thead>\n",
+      "    <tr>\n",
+      "      <th></th>\n",
+      "      <th>Well-established and comprehensive</th>\n",
+      "      <th>In development</th>\n",
+      "      <th>Ad hoc or informal</th>\n",
+      "      <th>Nonexistent</th>\n",
+      "    </tr>\n",
+      "  </thead>\n",
+      "  <tbody>\n",
+      "    <tr>\n",
+      "      <td>All respondents using/planning to use GenAI</td>\n",
+      "      <td>5%</td>\n",
+      "      <td>55%</td>\n",
+      "      <td>28%</td>\n",
+      "      <td>11%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Using GenAI and have fully implemented it</td>\n",
+      "      <td>29%</td>\n",
+      "      <td>58%</td>\n",
+      "      <td>13%</td>\n",
+      "      <td>0%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Using GenAI but haven't yet fully implemented it</td>\n",
+      "      <td>4%</td>\n",
+      "      <td>78%</td>\n",
+      "      <td>17%</td>\n",
+      "      <td>0%</td>\n",
+      "    </tr>\n",
+      "<tr>\n",
+      "      <td>Not yet using GenAI but intend to within the next two years</td>\n",
+      "      <td>0%</td>\n",
+      "      <td>31%</td>\n",
+      "      <td>43%</td>\n",
+      "      <td>26%</td>\n",
+      "    </tr>\n",
+      "  </tbody>\n",
+      "</table>\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Parse the GenAI Research Report\n",
+    "print(\"Parsing GenAI Research Report with Agentic Mode...\")\n",
+    "genai_result_agentic = await agentic_parser.aparse(genai_report_path)\n",
+    "\n",
+    "# Get markdown nodes\n",
+    "genai_nodes_agentic = genai_result_agentic.get_markdown_nodes(split_by_page=True)\n",
+    "print(f\"Number of pages extracted: {len(genai_nodes_agentic)}\")\n",
+    "\n",
+    "# Display sample output from page 7 (contains regulatory compliance data)\n",
+    "print(\"\\n=== Sample Output - Page 7 (Agentic Mode) ===\")\n",
+    "print(genai_nodes_agentic[6].text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Agentic Plus Mode\n",
+    "\n",
+    "The agentic plus mode (`parse_page_with_agent` with `anthropic-sonnet-4.0`) provides premium parsing for:\n",
+    "- Highly complex documents with intricate layouts\n",
+    "- Documents requiring maximum accuracy\n",
+    "- Advanced reasoning over visual content\n",
+    "- Critical business applications where quality is paramount"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Agentic Plus Mode Parser initialized\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Initialize Agentic Plus Mode Parser\n",
+    "agentic_plus_parser = LlamaParse(\n",
+    "    parse_mode=\"parse_page_with_agent\",\n",
+    "    model=\"anthropic-sonnet-4.0\",\n",
+    "    high_res_ocr=True,\n",
+    "    adaptive_long_table=True,\n",
+    "    outlined_table_extraction=True,\n",
+    "    output_tables_as_HTML=False,\n",
+    "    result_type=\"markdown\",\n",
+    "    project_id=project_id,\n",
+    "    organization_id=organization_id,\n",
+    ")\n",
+    "\n",
+    "print(\"Agentic Plus Mode Parser initialized\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Parse Apple 10-K with Agentic Plus Mode"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Parsing Apple 10-K with Agentic Plus Mode...\n",
+      "Started parsing the file under job_id 172dd00c-c2c7-4d48-8867-ab466ecd5539\n",
+      "Number of pages extracted: 82\n",
+      "\n",
+      "=== Sample Output - Page 32 (Agentic Plus Mode) ===\n",
+      "\n",
+      "# Apple Inc.\n",
+      "\n",
+      "## CONSOLIDATED STATEMENTS OF OPERATIONS\n",
+      "*(In millions, except number of shares which are reflected in thousands and per share amounts)*\n",
+      "\n",
+      "<table>\n",
+      "<thead>\n",
+      "<tr>\n",
+      "<th></th>\n",
+      "<th colspan=\"3\">Years ended</th>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<th></th>\n",
+      "<th>September 25, 2021</th>\n",
+      "<th>September 26, 2020</th>\n",
+      "<th>September 28, 2019</th>\n",
+      "</tr>\n",
+      "</thead>\n",
+      "<tbody>\n",
+      "<tr>\n",
+      "<td><strong>Net sales:</strong></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td> Products</td>\n",
+      "<td>$ 297,392</td>\n",
+      "<td>$ 220,747</td>\n",
+      "<td>$ 213,883</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td> Services</td>\n",
+      "<td>68,425</td>\n",
+      "<td>53,768</td>\n",
+      "<td>46,291</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Total net sales</td>\n",
+      "<td>365,817</td>\n",
+      "<td>274,515</td>\n",
+      "<td>260,174</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Cost of sales:</strong></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td> Products</td>\n",
+      "<td>192,266</td>\n",
+      "<td>151,286</td>\n",
+      "<td>144,996</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td> Services</td>\n",
+      "<td>20,715</td>\n",
+      "<td>18,273</td>\n",
+      "<td>16,786</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Total cost of sales</td>\n",
+      "<td>212,981</td>\n",
+      "<td>169,559</td>\n",
+      "<td>161,782</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Gross margin</td>\n",
+      "<td>152,836</td>\n",
+      "<td>104,956</td>\n",
+      "<td>98,392</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Operating expenses:</strong></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Research and development</td>\n",
+      "<td>21,914</td>\n",
+      "<td>18,752</td>\n",
+      "<td>16,217</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Selling, general and administrative</td>\n",
+      "<td>21,973</td>\n",
+      "<td>19,916</td>\n",
+      "<td>18,245</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Total operating expenses</td>\n",
+      "<td>43,887</td>\n",
+      "<td>38,668</td>\n",
+      "<td>34,462</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Operating income</strong></td>\n",
+      "<td>108,949</td>\n",
+      "<td>66,288</td>\n",
+      "<td>63,930</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Other income/(expense), net</strong></td>\n",
+      "<td>258</td>\n",
+      "<td>803</td>\n",
+      "<td>1,807</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Income before provision for income taxes</strong></td>\n",
+      "<td>109,207</td>\n",
+      "<td>67,091</td>\n",
+      "<td>65,737</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Provision for income taxes</strong></td>\n",
+      "<td>14,527</td>\n",
+      "<td>9,680</td>\n",
+      "<td>10,481</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Net income</strong></td>\n",
+      "<td>$ 94,680</td>\n",
+      "<td>$ 57,411</td>\n",
+      "<td>$ 55,256</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Earnings per share:</strong></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Basic</td>\n",
+      "<td>$ 5.67</td>\n",
+      "<td>$ 3.31</td>\n",
+      "<td>$ 2.99</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Diluted</td>\n",
+      "<td>$ 5.61</td>\n",
+      "<td>$ 3.28</td>\n",
+      "<td>$ 2.97</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td><strong>Shares used in computing earnings per share:</strong></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "<td></td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Basic</td>\n",
+      "<td>16,701,272</td>\n",
+      "<td>17,352,119</td>\n",
+      "<td>18,471,336</td>\n",
+      "</tr>\n",
+      "<tr>\n",
+      "<td>  Diluted</td>\n",
+      "<td>16,864,919</td>\n",
+      "<td>17,528,214</td>\n",
+      "<td>18,595,651</td>\n",
+      "</tr>\n",
+      "</tbody>\n",
+      "</table>\n",
+      "\n",
+      "See accompanying Notes to Consolidated Financial Statements.\n",
+      "\n",
+      "Apple Inc. | 2021 Form 10-K | 29\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Parse the Apple 10-K document with premium mode\n",
+    "print(\"Parsing Apple 10-K with Agentic Plus Mode...\")\n",
+    "apple_result_agentic_plus = await agentic_plus_parser.aparse(apple_10k_path)\n",
+    "\n",
+    "# Get markdown nodes\n",
+    "apple_nodes_agentic_plus = apple_result_agentic_plus.get_markdown_nodes(\n",
+    "    split_by_page=True\n",
+    ")\n",
+    "print(f\"Number of pages extracted: {len(apple_nodes_agentic_plus)}\")\n",
+    "\n",
+    "# Display sample output from page 32\n",
+    "print(\"\\n=== Sample Output - Page 32 (Agentic Plus Mode) ===\")\n",
+    "print(apple_nodes_agentic_plus[31].text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Question Answering Examples\n",
+    "\n",
+    "Now let's demonstrate how to use the parsed content to answer specific questions using an LLM."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core import PromptTemplate\n",
+    "\n",
+    "\n",
+    "async def ask_question_about_page(\n",
+    "    page_content: str, question: str, document_type: str = \"document\"\n",
+    ") -> str:\n",
+    "    \"\"\"Ask the LLM a question about one page of parsed content.\"\"\"\n",
+    "    qa_template = PromptTemplate(\n",
+    "        \"\"\"\n",
+    "        Based on the following page content from a {document_type}, please answer the question:\n",
+    "\n",
+    "        Question: {question}\n",
+    "\n",
+    "        Page Content:\n",
+    "        {page_content}\n",
+    "\n",
+    "        Please provide a specific answer with numbers if available.\n",
+    "        \"\"\"\n",
+    "    )\n",
+    "\n",
+    "    prompt = qa_template.format(\n",
+    "        question=question, page_content=page_content, document_type=document_type\n",
+    "    )\n",
+    "\n",
+    "    response = await llm.acomplete(prompt)\n",
+    "    return response.text"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Question 1: Apple 10-K Financial Data\n",
+    "\n",
+    "**Question**: \"What are net sales in Q3 September 2021 including product/services breakdown?\"\n",
+    "\n",
+    "**Source**: Page 32 of Apple 10-K"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "=== Apple 10-K Financial Data Answer ===\n",
+      "The 10‑K table shows (in millions) for the year ended September 25, 2021:\n",
+      "- Products: $297,392 million\n",
+      "- Services: $68,425 million\n",
+      "- Total net sales: $365,817 million\n",
+      "\n",
+      "Note: the table is the consolidated statement of operations for the fiscal year ended Sept. 25, 2021. If you meant fiscal Q3 (a single quarter), let me know and I can pull the quarterly figures.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Use the cost-effective mode result for this example\n",
+    "page_32_content = apple_nodes_cost_effective[31].text\n",
+    "question = (\n",
+    "    \"What are net sales in Q3 September 2021 including product/services breakdown?\"\n",
+    ")\n",
+    "\n",
+    "answer = await ask_question_about_page(\n",
+    "    page_content=page_32_content, question=question, document_type=\"Apple's 10-K filing\"\n",
+    ")\n",
+    "\n",
+    "print(\"=== Apple 10-K Financial Data Answer ===\")\n",
+    "print(answer)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Question 2: GenAI Research Report Compliance\n",
+    "\n",
+    "**Question**: \"How prepared are organizations in complying with current/upcoming regulations concerning genAI?\"\n",
+    "\n",
+    "**Source**: Page 7 of GenAI Research Report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "=== GenAI Regulatory Compliance Answer ===\n",
+      "Short answer: Very poorly. Only 1 in 10 organizations say they are fully prepared to comply with current/upcoming GenAI regulations.\n",
+      "\n",
+      "Key numbers (all respondents using/planning to use GenAI)\n",
+      "- Fully prepared: 10%  \n",
+      "- Moderately prepared: 48%  \n",
+      "- Slightly prepared: 40%  \n",
+      "- Not prepared: 2%\n",
+      "\n",
+      "Breakdown by adoption stage\n",
+      "- Using GenAI and fully implemented: 35% fully prepared, 49% moderately, 15% slightly, 0% not prepared.  \n",
+      "- Using GenAI but not fully implemented: 11% fully prepared, 66% moderately, 23% slightly, 0% not prepared.  \n",
+      "- Not yet using but intend to within 2 years: 3% fully prepared, 28% moderately, 64% slightly, 5% not prepared.\n",
+      "\n",
+      "Related governance context\n",
+      "- AI governance (all respondents): 13% have a well‑established/comprehensive framework; 61% in development; 21% ad hoc; 6% nonexistent.  \n",
+      "- GenAI governance (all respondents): only 5% well‑established; 55% in development; 28% ad hoc; 11% nonexistent.  \n",
+      "- Among organizations that have fully implemented GenAI, only 29% have a well‑established GenAI governance framework — meaning ~71% of adopters lack a comprehensive GenAI governance framework (the “seven in 10” cited).\n",
+      "\n",
+      "(Percentages may not sum to exactly 100% due to rounding.)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Use the agentic mode result for this example\n",
+    "page_7_content = genai_nodes_agentic[6].text\n",
+    "question = \"How prepared are organizations in complying with current/upcoming regulations concerning genAI?\"\n",
+    "\n",
+    "answer = await ask_question_about_page(\n",
+    "    page_content=page_7_content,\n",
+    "    question=question,\n",
+    "    document_type=\"GenAI Research Report\",\n",
+    ")\n",
+    "\n",
+    "print(\"=== GenAI Regulatory Compliance Answer ===\")\n",
+    "print(answer)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## EU Server Configuration\n",
+    "\n",
+    "If you are in Europe or need EU data residency, you can point LlamaParse at the EU server by setting the `base_url` parameter.\n",
+    "\n",
+    "**NOTE**: You will need to sign up for an account at https://cloud.eu.llamaindex.ai/ and get a separate API key."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Example: EU Server Configuration\n",
+    "eu_parser = LlamaParse(\n",
+    "    parse_mode=\"parse_page_with_agent\",\n",
+    "    model=\"openai-gpt-4-1-mini\",\n",
+    "    base_url=\"https://api.cloud.eu.llamaindex.ai\",  # EU server endpoint\n",
+    "    high_res_ocr=True,\n",
+    "    adaptive_long_table=True,\n",
+    "    outlined_table_extraction=True,\n",
+    "    output_tables_as_HTML=False,\n",
+    "    result_type=\"markdown\",\n",
+    "    project_id=project_id,\n",
+    "    organization_id=organization_id,\n",
+    "    api_key=\"<llamacloud_eu_api_key>\",\n",
+    ")\n",
+    "\n",
+    "print(\"EU Server Parser configured (no document is parsed in this demo)\")\n",
+    "print(\"Set base_url='https://api.cloud.eu.llamaindex.ai' and pass your EU API key to use EU servers\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Mode Comparison Summary\n",
+    "\n",
+    "| Mode | Use Case | Cost Efficiency | Speed | Accuracy |\n",
+    "|------|----------|------|-------|----------|\n",
+    "| **Cost-Effective** | High-volume, text-heavy documents | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |\n",
+    "| **Agentic (Default)** | General purpose, balanced performance | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ |\n",
+    "| **Agentic Plus** | Complex documents, maximum accuracy | ⭐ | ⭐ | ⭐⭐⭐ |\n",
+    "\n",
+    "### Choosing the Right Mode:\n",
+    "\n",
+    "- **Start with Agentic Mode** - It's the default for good reason, offering the best balance of quality and cost\n",
+    "- **Use Cost-Effective Mode** when processing large volumes of straightforward documents\n",
+    "- **Upgrade to Agentic Plus Mode** for complex documents with intricate layouts or charts, or when maximum accuracy is required"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next Steps and Additional Resources\n",
+    "\n",
+    "Now that you've learned about LlamaParse's different modes, explore these resources for deeper dives:\n",
+    "\n",
+    "### Advanced Features\n",
+    "- **JSON Mode Analysis**: Check out `demo_json_tour.ipynb` for detailed analysis of parsing outputs through JSON mode\n",
+    "- **Auto Mode**: Explore `parsing_modes/demo_auto_mode.ipynb` for automatic mode selection based on document characteristics\n",
+    "\n",
+    "### Building Applications\n",
+    "- **LlamaCloud Getting Started**: To set up an end-to-end RAG/retrieval pipeline, visit the [LlamaCloud Getting Started Guide](https://docs.cloud.llamaindex.ai/llamacloud/how_to/getting-started-with-index)\n",
+    "- **API Documentation**: Full API reference at [LlamaCloud Documentation](https://docs.cloud.llamaindex.ai/API/llama-platform)\n",
+    "\n",
+    "### Key Configuration Options\n",
+    "- `high_res_ocr=True` - Enhanced OCR for better text extraction\n",
+    "- `adaptive_long_table=True` - Better handling of long tables\n",
+    "- `outlined_table_extraction=True` - Improved table structure detection\n",
+    "- `output_tables_as_HTML=False` - Output tables as markdown instead of HTML\n",
+    "- `result_type=\"markdown\"` - Clean, structured output format\n",
+    "\n",
+    "Happy parsing! 🚀"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "llama_parse",
+   "language": "python",
+   "name": "llama_parse"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
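
The three parser cells in this notebook differ only in `parse_mode` and `model`, and pages are read with a 0-based index (`nodes[31]` for page 32). A minimal sketch of two helpers that capture both patterns is shown here. `COMMON_OPTIONS`, `PRESETS`, `build_parser_kwargs`, and `page_text` are hypothetical names introduced for illustration and are not part of `llama_cloud_services`; the option values are copied from the notebook cells.

```python
from typing import Any, Dict, Sequence

# Options that every parser in the notebook sets identically.
COMMON_OPTIONS: Dict[str, Any] = {
    "high_res_ocr": True,
    "adaptive_long_table": True,
    "outlined_table_extraction": True,
    "output_tables_as_HTML": False,
    "result_type": "markdown",
}

# Hypothetical preset names; the mode/model pairs are the ones used in the notebook.
PRESETS: Dict[str, Dict[str, str]] = {
    "cost_effective": {"parse_mode": "parse_page_with_llm"},
    "agentic": {
        "parse_mode": "parse_page_with_agent",
        "model": "openai-gpt-4-1-mini",
    },
    "agentic_plus": {
        "parse_mode": "parse_page_with_agent",
        "model": "anthropic-sonnet-4.0",
    },
}


def build_parser_kwargs(preset: str, **overrides: Any) -> Dict[str, Any]:
    """Return keyword arguments for one preset; overrides win on conflicts."""
    if preset not in PRESETS:
        raise ValueError(
            f"Unknown preset {preset!r}; expected one of {sorted(PRESETS)}"
        )
    return {**COMMON_OPTIONS, **PRESETS[preset], **overrides}


def page_text(nodes: Sequence[Any], page_number: int) -> str:
    """Return the text for a 1-based page number from a list of per-page nodes."""
    if not 1 <= page_number <= len(nodes):
        raise IndexError(f"Page {page_number} is outside 1..{len(nodes)}")
    return nodes[page_number - 1].text
```

Assuming `LlamaParse` accepts these keyword arguments as it does in the notebook cells, a parser could be built with `LlamaParse(**build_parser_kwargs("agentic", project_id=project_id, organization_id=organization_id))`, and `page_text(apple_nodes_agentic_plus, 32)` would replace the manual `[31]` index while failing loudly on an out-of-range page.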
Author	SHA1	Message	Date
Jerry Liu	5e5276adda	cr	2025-08-19 09:20:32 -07:00
Jerry Liu	a8fe85be09	cr	2025-08-19 09:19:02 -07:00
Jerry Liu	fe779e13a4	cr	2025-08-18 12:43:47 -07:00
Clelia (Astra) Bertelli	90aaa4beff	Merge branch 'main' into jerry/add_parse_preset_notebooks	2025-08-18 11:32:35 +02:00
Jerry Liu	8faeb8bdec	cr	2025-08-17 17:49:53 -07:00
Jerry Liu	8028ee810a	cr	2025-08-17 17:39:12 -07:00
Jerry Liu	c7788c84a9	cr	2025-08-17 17:35:45 -07:00
Jerry Liu	0f9bfdf676	cr	2025-08-17 17:34:49 -07:00
Jerry Liu	8cb357f3dc	cr	2025-08-17 17:34:17 -07:00