Compare commits

9 Commits

Author SHA1 Message Date
Jerry Liu 5e5276adda cr 2025-08-19 09:20:32 -07:00
Jerry Liu a8fe85be09 cr 2025-08-19 09:19:02 -07:00
Jerry Liu fe779e13a4 cr 2025-08-18 12:43:47 -07:00
Clelia (Astra) Bertelli 90aaa4beff Merge branch 'main' into jerry/add_parse_preset_notebooks 2025-08-18 11:32:35 +02:00
Jerry Liu 8faeb8bdec cr 2025-08-17 17:49:53 -07:00
Jerry Liu 8028ee810a cr 2025-08-17 17:39:12 -07:00
Jerry Liu c7788c84a9 cr 2025-08-17 17:35:45 -07:00
Jerry Liu 0f9bfdf676 cr 2025-08-17 17:34:49 -07:00
Jerry Liu 8cb357f3dc cr 2025-08-17 17:34:17 -07:00
@@ -0,0 +1,931 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with LlamaParse: Parsing Modes Overview\n",
"\n",
"<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/parsing_modes/demo_presets.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"This notebook demonstrates the different parsing modes available in LlamaParse and how to use them effectively for document processing. We'll walk through three main parsing modes:\n",
"\n",
"1. **Cost-Effective Mode** (`parse_page_with_llm`) - Fast and economical parsing\n",
"2. **Agentic Mode** (`parse_page_with_agent` with `openai-gpt-4-1-mini`) - Enhanced parsing with agent capabilities (Default)\n",
"3. **Agentic Plus Mode** (`parse_page_with_agent` with `anthropic-sonnet-4.0`) - Premium parsing with advanced models\n",
"\n",
"We'll use two sample documents:\n",
"- Apple 2021 10-K filing (text-heavy financial document)\n",
"- GenAI Research Report (visual-rich document with charts and diagrams)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First, let's set up our environment and initialize the necessary components."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from llama_cloud_services import LlamaParse\n",
"from llama_index.llms.openai import OpenAI\n",
"\n",
"# Environment Variables - Make sure these are set\n",
"# os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\" # Set in environment\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-...\" # Set in environment\n",
"\n",
"# Initialize LLM for question answering\n",
"llm = OpenAI(model=\"gpt-5-mini\")\n",
"\n",
"# Project Configuration - Replace with your actual values\n",
"project_id = \"<project_id>\" # Replace with your project ID\n",
"organization_id = \"<organization_id>\" # Replace with your organization ID"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Document Files\n",
"\n",
"Next, let's download our sample documents:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Create data directory if it doesn't exist\n",
"os.makedirs(\"data\", exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget \"https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf\" -O data/apple_2021_10k.pdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget \"https://www.sas.com/content/dam/SAS/documents/marketing-whitepapers-ebooks/ebooks/en/generative-ai-global-research-report-113914.pdf\" -O data/genai_research_report.pdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set file paths\n",
"apple_10k_path = \"./data/apple_2021_10k.pdf\"\n",
"genai_report_path = \"./data/genai_research_report.pdf\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Cost-Effective Mode\n",
"\n",
"The cost-effective mode (`parse_page_with_llm`) is ideal for:\n",
"- High-volume document processing\n",
"- Text-heavy documents without complex layouts\n",
"- Budget-conscious applications\n",
"\n",
"This mode provides fast, economical parsing while maintaining good quality for standard documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cost-Effective Mode Parser initialized\n"
]
}
],
"source": [
"# Initialize Cost-Effective Mode Parser\n",
"cost_effective_parser = LlamaParse(\n",
" parse_mode=\"parse_page_with_llm\",\n",
" high_res_ocr=True,\n",
" adaptive_long_table=True,\n",
" outlined_table_extraction=True,\n",
" output_tables_as_HTML=False,\n",
" result_type=\"markdown\",\n",
" project_id=project_id,\n",
" organization_id=organization_id,\n",
")\n",
"\n",
"print(\"Cost-Effective Mode Parser initialized\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parse Apple 10-K with Cost-Effective Mode"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Parse the Apple 10-K document\n",
"print(\"Parsing Apple 10-K with Cost-Effective Mode...\")\n",
"apple_result_cost_effective = await cost_effective_parser.aparse(apple_10k_path)\n",
"\n",
"# Get markdown nodes\n",
"apple_nodes_cost_effective = apple_result_cost_effective.get_markdown_nodes(\n",
" split_by_page=True\n",
")\n",
"print(f\"Number of pages extracted: {len(apple_nodes_cost_effective)}\")\n",
"\n",
"# Display sample output from page 32 (consolidated statements of operations for the fiscal year)\n",
"print(\"\\n=== Sample Output - Page 32 (Cost-Effective Mode) ===\")\n",
"print(apple_nodes_cost_effective[31].text)"
]
},
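{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above reads page 32 as list index 31, because the list returned by `get_markdown_nodes(split_by_page=True)` is used here as a zero-based list with one node per page. Below is a minimal sketch of a lookup helper that accepts the 1-based page number and raises a clear error when the page is out of range. `get_page_text` is a hypothetical name introduced for illustration, not part of the LlamaParse API; it only assumes each node exposes a `.text` attribute, as the cell above does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_page_text(nodes, page_number):\n",
"    \"\"\"Return the text of a 1-based page number from a list of per-page nodes.\"\"\"\n",
"    if not 1 <= page_number <= len(nodes):\n",
"        raise IndexError(\n",
"            f\"page {page_number} is out of range; document has {len(nodes)} pages\"\n",
"        )\n",
"    return nodes[page_number - 1].text\n",
"\n",
"\n",
"# Hypothetical usage: same text as apple_nodes_cost_effective[31].text\n",
"page_32_text = get_page_text(apple_nodes_cost_effective, 32)"
]
},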
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Agentic Mode (Default)\n",
"\n",
"The agentic mode (`parse_page_with_agent` with `openai-gpt-4-1-mini`) is the recommended default. It offers:\n",
"- Enhanced understanding of document structure\n",
"- Better handling of complex layouts and tables\n",
"- Improved extraction of visual elements\n",
"- Balanced performance and cost"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Agentic Mode Parser initialized\n"
]
}
],
"source": [
"# Initialize Agentic Mode Parser\n",
"agentic_parser = LlamaParse(\n",
" parse_mode=\"parse_page_with_agent\",\n",
" model=\"openai-gpt-4-1-mini\",\n",
" high_res_ocr=True,\n",
" adaptive_long_table=True,\n",
" outlined_table_extraction=True,\n",
" output_tables_as_HTML=False,\n",
" result_type=\"markdown\",\n",
" project_id=project_id,\n",
" organization_id=organization_id,\n",
")\n",
"\n",
"print(\"Agentic Mode Parser initialized\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parse GenAI Research Report with Agentic Mode\n",
"\n",
"This document contains charts and visual elements, making it ideal for demonstrating the agentic mode's capabilities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parsing GenAI Research Report with Agentic Mode...\n",
"Started parsing the file under job_id 98d363fb-135f-4529-94a5-713f9b6b2025\n",
"Number of pages extracted: 38\n",
"\n",
"=== Sample Output - Page 7 (Agentic Mode) ===\n",
"\n",
"# Only one in 10 businesses has undergone the preparation needed to comply with current and upcoming regulations concerning GenAI.\n",
"\n",
"# The majority of organizations lack a comprehensive governance framework for both AI and GenAI (seven in 10 adopters admit to this).\n",
"\n",
"## How prepared is your organization to comply with current and upcoming regulations concerning GenAI?\n",
"\n",
"<table>\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>Fully prepared</th>\n",
" <th>Moderately prepared</th>\n",
" <th>Slightly prepared</th>\n",
" <th>Not prepared</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>All respondents using/planning to use GenAI</td>\n",
" <td>10%</td>\n",
" <td>48%</td>\n",
" <td>40%</td>\n",
" <td>2%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Using GenAI and have fully implemented it</td>\n",
" <td>35%</td>\n",
" <td>49%</td>\n",
" <td>15%</td>\n",
" <td>0%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Using GenAI but haven't yet fully implemented it</td>\n",
" <td>11%</td>\n",
" <td>66%</td>\n",
" <td>23%</td>\n",
" <td>0%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Not yet using GenAI but intend to within the next two years</td>\n",
" <td>3%</td>\n",
" <td>28%</td>\n",
" <td>64%</td>\n",
" <td>5%</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"\n",
"> *Please note that percentages on charts may not add to 100% due to rounding*\n",
"\n",
"## How would you describe your current GenAI/AI governance framework?\n",
"\n",
"### Artificial Intelligence (AI) Governance framework\n",
"\n",
"<table>\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>Well-established and comprehensive</th>\n",
" <th>In development</th>\n",
" <th>Ad hoc or informal</th>\n",
" <th>Nonexistent</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>All respondents using/planning to use GenAI</td>\n",
" <td>13%</td>\n",
" <td>61%</td>\n",
" <td>21%</td>\n",
" <td>6%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Using GenAI and have fully implemented it</td>\n",
" <td>33%</td>\n",
" <td>64%</td>\n",
" <td>3%</td>\n",
" <td>0%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Using GenAI but haven't yet fully implemented it</td>\n",
" <td>18%</td>\n",
" <td>69%</td>\n",
" <td>13%</td>\n",
" <td>0%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Not yet using GenAI but intend to within the next two years</td>\n",
" <td>1%</td>\n",
" <td>52%</td>\n",
" <td>34%</td>\n",
" <td>13%</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"\n",
"### GenAI Governance framework\n",
"\n",
"<table>\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>Well-established and comprehensive</th>\n",
" <th>In development</th>\n",
" <th>Ad hoc or informal</th>\n",
" <th>Nonexistent</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>All respondents using/planning to use GenAI</td>\n",
" <td>5%</td>\n",
" <td>55%</td>\n",
" <td>28%</td>\n",
" <td>11%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Using GenAI and have fully implemented it</td>\n",
" <td>29%</td>\n",
" <td>58%</td>\n",
" <td>13%</td>\n",
" <td>0%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Using GenAI but haven't yet fully implemented it</td>\n",
" <td>4%</td>\n",
" <td>78%</td>\n",
" <td>17%</td>\n",
" <td>0%</td>\n",
" </tr>\n",
"<tr>\n",
" <td>Not yet using GenAI but intend to within the next two years</td>\n",
" <td>0%</td>\n",
" <td>31%</td>\n",
" <td>43%</td>\n",
" <td>26%</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"\n",
"\n"
]
}
],
"source": [
"# Parse the GenAI Research Report\n",
"print(\"Parsing GenAI Research Report with Agentic Mode...\")\n",
"genai_result_agentic = await agentic_parser.aparse(genai_report_path)\n",
"\n",
"# Get markdown nodes\n",
"genai_nodes_agentic = genai_result_agentic.get_markdown_nodes(split_by_page=True)\n",
"print(f\"Number of pages extracted: {len(genai_nodes_agentic)}\")\n",
"\n",
"# Display sample output from page 7 (contains regulatory compliance data)\n",
"print(\"\\n=== Sample Output - Page 7 (Agentic Mode) ===\")\n",
"print(genai_nodes_agentic[6].text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Agentic Plus Mode\n",
"\n",
"The agentic plus mode (`parse_page_with_agent` with `anthropic-sonnet-4.0`) provides premium parsing for:\n",
"- Highly complex documents with intricate layouts\n",
"- Documents requiring maximum accuracy\n",
"- Advanced reasoning over visual content\n",
"- Critical business applications where quality is paramount"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Agentic Plus Mode Parser initialized\n"
]
}
],
"source": [
"# Initialize Agentic Plus Mode Parser\n",
"agentic_plus_parser = LlamaParse(\n",
" parse_mode=\"parse_page_with_agent\",\n",
" model=\"anthropic-sonnet-4.0\",\n",
" high_res_ocr=True,\n",
" adaptive_long_table=True,\n",
" outlined_table_extraction=True,\n",
" output_tables_as_HTML=False,\n",
" result_type=\"markdown\",\n",
" project_id=project_id,\n",
" organization_id=organization_id,\n",
")\n",
"\n",
"print(\"Agentic Plus Mode Parser initialized\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parse Apple 10-K with Agentic Plus Mode"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parsing Apple 10-K with Agentic Plus Mode...\n",
"Started parsing the file under job_id 172dd00c-c2c7-4d48-8867-ab466ecd5539\n",
"Number of pages extracted: 82\n",
"\n",
"=== Sample Output - Page 32 (Agentic Plus Mode) ===\n",
"\n",
"# Apple Inc.\n",
"\n",
"## CONSOLIDATED STATEMENTS OF OPERATIONS\n",
"*(In millions, except number of shares which are reflected in thousands and per share amounts)*\n",
"\n",
"<table>\n",
"<thead>\n",
"<tr>\n",
"<th></th>\n",
"<th colspan=\"3\">Years ended</th>\n",
"</tr>\n",
"<tr>\n",
"<th></th>\n",
"<th>September 25, 2021</th>\n",
"<th>September 26, 2020</th>\n",
"<th>September 28, 2019</th>\n",
"</tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr>\n",
"<td><strong>Net sales:</strong></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td> Products</td>\n",
"<td>$ 297,392</td>\n",
"<td>$ 220,747</td>\n",
"<td>$ 213,883</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Services</td>\n",
"<td>68,425</td>\n",
"<td>53,768</td>\n",
"<td>46,291</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Total net sales</td>\n",
"<td>365,817</td>\n",
"<td>274,515</td>\n",
"<td>260,174</td>\n",
"</tr>\n",
"<tr>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Cost of sales:</strong></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td> Products</td>\n",
"<td>192,266</td>\n",
"<td>151,286</td>\n",
"<td>144,996</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Services</td>\n",
"<td>20,715</td>\n",
"<td>18,273</td>\n",
"<td>16,786</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Total cost of sales</td>\n",
"<td>212,981</td>\n",
"<td>169,559</td>\n",
"<td>161,782</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Gross margin</td>\n",
"<td>152,836</td>\n",
"<td>104,956</td>\n",
"<td>98,392</td>\n",
"</tr>\n",
"<tr>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Operating expenses:</strong></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td> Research and development</td>\n",
"<td>21,914</td>\n",
"<td>18,752</td>\n",
"<td>16,217</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Selling, general and administrative</td>\n",
"<td>21,973</td>\n",
"<td>19,916</td>\n",
"<td>18,245</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Total operating expenses</td>\n",
"<td>43,887</td>\n",
"<td>38,668</td>\n",
"<td>34,462</td>\n",
"</tr>\n",
"<tr>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Operating income</strong></td>\n",
"<td>108,949</td>\n",
"<td>66,288</td>\n",
"<td>63,930</td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Other income/(expense), net</strong></td>\n",
"<td>258</td>\n",
"<td>803</td>\n",
"<td>1,807</td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Income before provision for income taxes</strong></td>\n",
"<td>109,207</td>\n",
"<td>67,091</td>\n",
"<td>65,737</td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Provision for income taxes</strong></td>\n",
"<td>14,527</td>\n",
"<td>9,680</td>\n",
"<td>10,481</td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Net income</strong></td>\n",
"<td>$ 94,680</td>\n",
"<td>$ 57,411</td>\n",
"<td>$ 55,256</td>\n",
"</tr>\n",
"<tr>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Earnings per share:</strong></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td> Basic</td>\n",
"<td>$ 5.67</td>\n",
"<td>$ 3.31</td>\n",
"<td>$ 2.99</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Diluted</td>\n",
"<td>$ 5.61</td>\n",
"<td>$ 3.28</td>\n",
"<td>$ 2.97</td>\n",
"</tr>\n",
"<tr>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><strong>Shares used in computing earnings per share:</strong></td>\n",
"<td></td>\n",
"<td></td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td> Basic</td>\n",
"<td>16,701,272</td>\n",
"<td>17,352,119</td>\n",
"<td>18,471,336</td>\n",
"</tr>\n",
"<tr>\n",
"<td> Diluted</td>\n",
"<td>16,864,919</td>\n",
"<td>17,528,214</td>\n",
"<td>18,595,651</td>\n",
"</tr>\n",
"</tbody>\n",
"</table>\n",
"\n",
"See accompanying Notes to Consolidated Financial Statements.\n",
"\n",
"Apple Inc. | 2021 Form 10-K | 29\n",
"\n"
]
}
],
"source": [
"# Parse the Apple 10-K document with premium mode\n",
"print(\"Parsing Apple 10-K with Agentic Plus Mode...\")\n",
"apple_result_agentic_plus = await agentic_plus_parser.aparse(apple_10k_path)\n",
"\n",
"# Get markdown nodes\n",
"apple_nodes_agentic_plus = apple_result_agentic_plus.get_markdown_nodes(\n",
" split_by_page=True\n",
")\n",
"print(f\"Number of pages extracted: {len(apple_nodes_agentic_plus)}\")\n",
"\n",
"# Display sample output from page 32\n",
"print(\"\\n=== Sample Output - Page 32 (Agentic Plus Mode) ===\")\n",
"print(apple_nodes_agentic_plus[31].text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question Answering Examples\n",
"\n",
"Now let's demonstrate how to use the parsed content to answer specific questions using an LLM."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import PromptTemplate\n",
"\n",
"\n",
"async def ask_question_about_page(\n",
" page_content: str, question: str, document_type: str = \"document\"\n",
") -> str:\n",
" \"\"\"Helper function to ask questions about page content using LLM.\"\"\"\n",
" qa_template = PromptTemplate(\n",
" \"\"\"\n",
" Based on the following page content from a {document_type}, please answer the question:\n",
"\n",
" Question: {question}\n",
"\n",
" Page Content:\n",
" {page_content}\n",
"\n",
" Please provide a specific answer with numbers if available.\n",
" \"\"\"\n",
" )\n",
"\n",
" prompt = qa_template.format(\n",
" question=question, page_content=page_content, document_type=document_type\n",
" )\n",
"\n",
" response = await llm.acomplete(prompt)\n",
" return response.text"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 1: Apple 10-K Financial Data\n",
"\n",
"**Question**: \"What are net sales in Q3 September 2021 including product/services breakdown?\"\n",
"\n",
"**Source**: Page 32 of Apple 10-K"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== Apple 10-K Financial Data Answer ===\n",
"The 10K table shows (in millions) for the year ended September 25, 2021:\n",
"- Products: $297,392 million\n",
"- Services: $68,425 million\n",
"- Total net sales: $365,817 million\n",
"\n",
"Note: the table is the consolidated statement of operations for the fiscal year ended Sept. 25, 2021. If you meant fiscal Q3 (a single quarter), let me know and I can pull the quarterly figures.\n"
]
}
],
"source": [
"# Use the cost-effective mode result for this example\n",
"page_32_content = apple_nodes_cost_effective[31].text\n",
"question = (\n",
" \"What are net sales in Q3 September 2021 including product/services breakdown?\"\n",
")\n",
"\n",
"answer = await ask_question_about_page(\n",
" page_content=page_32_content, question=question, document_type=\"Apple's 10-K filing\"\n",
")\n",
"\n",
"print(\"=== Apple 10-K Financial Data Answer ===\")\n",
"print(answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question 2: GenAI Research Report Compliance\n",
"\n",
"**Question**: \"How prepared are organizations in complying with current/upcoming regulations concerning genAI?\"\n",
"\n",
"**Source**: Page 7 of GenAI Research Report"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== GenAI Regulatory Compliance Answer ===\n",
"Short answer: Very poorly. Only 1 in 10 organizations say they are fully prepared to comply with current/upcoming GenAI regulations.\n",
"\n",
"Key numbers (all respondents using/planning to use GenAI)\n",
"- Fully prepared: 10% \n",
"- Moderately prepared: 48% \n",
"- Slightly prepared: 40% \n",
"- Not prepared: 2%\n",
"\n",
"Breakdown by adoption stage\n",
"- Using GenAI and fully implemented: 35% fully prepared, 49% moderately, 15% slightly, 0% not prepared. \n",
"- Using GenAI but not fully implemented: 11% fully prepared, 66% moderately, 23% slightly, 0% not prepared. \n",
"- Not yet using but intend to within 2 years: 3% fully prepared, 28% moderately, 64% slightly, 5% not prepared.\n",
"\n",
"Related governance context\n",
"- AI governance (all respondents): 13% have a well-established/comprehensive framework; 61% in development; 21% ad hoc; 6% nonexistent. \n",
"- GenAI governance (all respondents): only 5% well-established; 55% in development; 28% ad hoc; 11% nonexistent. \n",
"- Among organizations that have fully implemented GenAI, only 29% have a well-established GenAI governance framework, meaning ~71% of adopters lack a comprehensive GenAI governance framework (the “seven in 10” cited).\n",
"\n",
"(Percentages may not sum to exactly 100% due to rounding.)\n"
]
}
],
"source": [
"# Use the agentic mode result for this example\n",
"page_7_content = genai_nodes_agentic[6].text\n",
"question = \"How prepared are organizations in complying with current/upcoming regulations concerning genAI?\"\n",
"\n",
"answer = await ask_question_about_page(\n",
" page_content=page_7_content,\n",
" question=question,\n",
" document_type=\"GenAI Research Report\",\n",
")\n",
"\n",
"print(\"=== GenAI Regulatory Compliance Answer ===\")\n",
"print(answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## EU Server Configuration\n",
"\n",
"For users in Europe or those requiring EU data residency, point LlamaParse at the EU server by setting the `base_url` parameter and passing an API key issued for the EU region.\n",
"\n",
"**NOTE**: You will need to sign up for an account on https://cloud.eu.llamaindex.ai/ and get a separate API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example: EU Server Configuration\n",
"eu_parser = LlamaParse(\n",
" parse_mode=\"parse_page_with_agent\",\n",
" model=\"openai-gpt-4-1-mini\",\n",
" base_url=\"https://api.cloud.eu.llamaindex.ai\", # EU server endpoint\n",
" high_res_ocr=True,\n",
" adaptive_long_table=True,\n",
" outlined_table_extraction=True,\n",
" output_tables_as_HTML=False,\n",
" result_type=\"markdown\",\n",
" project_id=project_id,\n",
" organization_id=organization_id,\n",
" api_key=\"<llamacloud_eu_api_key>\",\n",
")\n",
"\n",
"print(\"EU Server Parser configured (not executed in this demo)\")\n",
"print(\"Simply add base_url='https://api.cloud.eu.llamaindex.ai' to use EU servers\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mode Comparison Summary\n",
"\n",
"| Mode | Use Case | Cost Efficiency | Speed | Accuracy |\n",
"|------|----------|-----------------|-------|----------|\n",
"| **Cost-Effective** | High-volume, text-heavy documents | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |\n",
"| **Agentic (Default)** | General purpose, balanced performance | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ |\n",
"| **Agentic Plus** | Complex documents, maximum accuracy | ⭐ | ⭐ | ⭐⭐⭐ |\n",
"\n",
"### Choosing the Right Mode\n",
"\n",
"- **Start with Agentic Mode** - It's the default for good reason, offering the best balance of quality and cost\n",
"- **Use Cost-Effective Mode** when processing large volumes of straightforward documents\n",
"- **Upgrade to Agentic Plus Mode** for complex documents with intricate layouts, charts, or when maximum accuracy is required"
]
},
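{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook the three modes compared above differ only in `parse_mode` and `model`; every other parser argument is identical. Below is a minimal sketch that keeps the shared options in one dictionary so the per-mode differences stay visible. `COMMON_OPTIONS`, `MODE_OPTIONS`, and `build_parser_kwargs` are hypothetical names introduced for illustration; the values are copied from the parser cells above, and `LlamaParse`, `project_id`, and `organization_id` come from the setup cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Options shared by every parser in this notebook\n",
"COMMON_OPTIONS = {\n",
"    \"high_res_ocr\": True,\n",
"    \"adaptive_long_table\": True,\n",
"    \"outlined_table_extraction\": True,\n",
"    \"output_tables_as_HTML\": False,\n",
"    \"result_type\": \"markdown\",\n",
"}\n",
"\n",
"# Per-mode options, copied from the three parser cells above\n",
"MODE_OPTIONS = {\n",
"    \"cost_effective\": {\"parse_mode\": \"parse_page_with_llm\"},\n",
"    \"agentic\": {\n",
"        \"parse_mode\": \"parse_page_with_agent\",\n",
"        \"model\": \"openai-gpt-4-1-mini\",\n",
"    },\n",
"    \"agentic_plus\": {\n",
"        \"parse_mode\": \"parse_page_with_agent\",\n",
"        \"model\": \"anthropic-sonnet-4.0\",\n",
"    },\n",
"}\n",
"\n",
"\n",
"def build_parser_kwargs(mode):\n",
"    \"\"\"Merge the shared options with the options for one mode.\"\"\"\n",
"    if mode not in MODE_OPTIONS:\n",
"        raise ValueError(f\"unknown mode: {mode!r}\")\n",
"    return {**COMMON_OPTIONS, **MODE_OPTIONS[mode]}\n",
"\n",
"\n",
"# Hypothetical usage: equivalent to the agentic_parser defined earlier\n",
"parser = LlamaParse(\n",
"    **build_parser_kwargs(\"agentic\"),\n",
"    project_id=project_id,\n",
"    organization_id=organization_id,\n",
")"
]
},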
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps and Additional Resources\n",
"\n",
"Now that you've learned about LlamaParse's different modes, explore these resources for deeper dives:\n",
"\n",
"### Advanced Features\n",
"- **JSON Mode Analysis**: Check out `demo_json_tour.ipynb` for detailed analysis of parsing outputs through JSON mode\n",
"- **Auto Mode**: Explore `parsing_modes/demo_auto_mode.ipynb` for automatic mode selection based on document characteristics\n",
"\n",
"### Building Applications\n",
"- **LlamaCloud Getting Started**: To set up an end-to-end RAG/retrieval pipeline, visit the [LlamaCloud Getting Started Guide](https://docs.cloud.llamaindex.ai/llamacloud/how_to/getting-started-with-index)\n",
"- **API Documentation**: Full API reference at [LlamaCloud Documentation](https://docs.cloud.llamaindex.ai/API/llama-platform)\n",
"\n",
"### Key Configuration Options\n",
"- `high_res_ocr=True` - Enhanced OCR for better text extraction\n",
"- `adaptive_long_table=True` - Better handling of long, complex tables\n",
"- `outlined_table_extraction=True` - Improved table structure detection\n",
"- `output_tables_as_HTML=False` - Request tables as markdown instead of HTML (the agentic-mode sample outputs above still contain HTML `<table>` markup)\n",
"- `result_type=\"markdown\"` - Clean, structured output format\n",
"\n",
"Happy parsing! 🚀"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llama_parse",
"language": "python",
"name": "llama_parse"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}