mirror of
https://github.com/run-llama/llama_cloud_services.git
synced 2026-07-01 21:44:37 -04:00
287 lines
11 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0db58db5-d4ee-4631-af5b-4fc53eb05170",
   "metadata": {},
   "source": [
    "# RAG with Excel Spreadsheet using LlamaParse\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/excel/dcf_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "This notebook constructs a RAG pipeline over a simple DCF template [here](https://eqvista.com/app/uploads/2020/09/Eqvista_DCF-Excel-Template.xlsx).\n",
    "\n",
    "Status:\n",
    "| Last Executed | Version | State |\n",
    "|---------------|---------|------------|\n",
    "| Aug-19-2025 | 0.6.61 | Maintained |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3636937",
   "metadata": {},
   "source": [
    "> **⚠️ DEPRECATION NOTICE**\n",
    ">\n",
    "> This example uses the deprecated `llama-cloud-services` package, which will be maintained until **May 1, 2026**.\n",
    ">\n",
    "> **Please migrate to:**\n",
    "> - **Python**: `pip install \"llama-cloud>=1.0\"` ([GitHub](https://github.com/run-llama/llama-cloud-py))\n",
    "> - **New Package Documentation**: https://docs.cloud.llamaindex.ai/\n",
    ">\n",
    "> The new package provides the same functionality with improved performance and support."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f7d99ad-6ebd-47d0-92a7-566630b0c22a",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We first set up the environment and load the data. If you haven't already, [download the template](https://eqvista.com/wp-content/uploads/2020/09/Eqvista_DCF-Excel-Template.xlsx) and save it locally as `dcf_template.xlsx`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d867d1a6-cfcf-4f53-952a-f4a6ff2fa205",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install \"llama-index>=0.13.0,<0.14.0\"\n",
    "%pip install llama-cloud-services"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9876ae6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c4693c7-c1c8-47b4-8a8c-25d7e9ef9d2c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id 1adabb9a-31d3-4732-962f-a287d5f7af2a\n"
     ]
    }
   ],
   "source": [
    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser = LlamaParse(\n",
    "    parse_mode=\"parse_page_with_agent\",\n",
    "    model=\"openai-gpt-4-1-mini\",\n",
    "    high_res_ocr=True,\n",
    "    adaptive_long_table=True,\n",
    "    outlined_table_extraction=True,\n",
    "    output_tables_as_HTML=True,\n",
    ")\n",
    "\n",
    "result = await parser.aparse(\"./dcf_template.xlsx\")\n",
    "llama_parse_documents = result.get_text_documents(split_by_page=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7302f1c8-e405-4cda-8ff7-1d55185816f7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Discounted Cash Flow Excel Template\t\t\t\t\t\t\t\t\t\t\t\n",
      "Here is a simple discounted cash flow excel template for estimating your company value based on this income valuation approach\t\t\t\t\t\t\t\t\t\t\t\n",
      "Instructions:\t\t\t\t\t\t\t\t\t\t\t\n",
      "1) Fill out the two assumptions in yellow highlight\t\t\t\t\t\t\t\t\t\t\t\n",
      "2) Fill in either the 5 year or 3 year weighted average figures in yellow highlight\t\t\t\t\t\t\t\t\t\t\t\n",
      "Assumptions\t\t\t\t\t\t\t\t\t\t\t\n",
      "Tax Rate\t20%\t\t\t\t\t\t\t\t\t\t\n",
      "Discount Rate\t15%\t\t\t\t\t\t\t\t\t\t\n",
      "5 Year Weighted Moving Average\t\t\t\t\t\t\t\t\t\t\t\n",
      "Indication of Company Value\t $242,995.43 \t\t\t\t\t\t\t\t\t\t\n",
      "3 Year Weighted Moving Average\t\t\t\t\t\t\t\t\t\t\t\n",
      "Indication of Company Value\t $158,651.07 \t\t\t\t\t\t\t\t\t\t\n",
      "\t5 Year Weighted Moving Average\t\t\t\t\t\t\t\t\t\t\n",
      "\tPast Years\t\t\t\t\tForecasted Future Years\t\t\t\t\t\n",
      "\tYear 1\tYear 2\tYear 3\tYear 4\tYear 5\tYear 6\tYear 7\tYear 8\tYear 9\tYear 10\tTerminal Value\n",
      "Pre-tax income\t 50,000.00 \t 55,000.00 \t 45,000.00 \t 52,000.00 \t 60,000.00 \t\t\t\t\t\t\n",
      "Income Taxes\t 10,000.00 \t 11,000.00 \t 9,000.00 \t 10,400.00 \t 12,000.00 \t\t\t\t\t\t\n",
      "Net Income\t 40,000.00 \t 44,000.00 \t 36,000.00 \t 41,600.00 \t 48,000.00 \t\t\t\t\t\t\n",
      "Depreciation Expense\t 5,000.00 \t 4,000.00 \t 3,000.00 \t 2,000.00 \t 1,000.00 \t\t\t\t\t\t\n",
      "Capital Expenditures\t 10,000.00 \t 8,000.00 \t 5,000.00 \t 5,000.00 \t 7,000.00 \t\t\t\t\t\t\n",
      "Debt Repayments\t 5,000.00 \t 5,000.00 \t 5,000.00 \t 5,000.00 \t 5,000.00 \t\t\t\t\t\t\n",
      "Net Cash Flow\t 20,000.00 \t 27,000.00 \t 23,000.00 \t 29,600.00 \t 35,000.00 \t 29,093.33 \t 29,817.78 \t 30,177.48 \t 30,469.23 \t 30,379.74 \t 287,188.00 \n",
      "Discounting Factor\t\t\t\t\t\t 0.8696 \t 0.7561 \t 0.6575 \t 0.5718 \t 0.4972 \t 0.4972 \n",
      "Present Value of Future Cash Flow\t\t\t\t\t\t 25,298.55 \t 22,546.52 \t 19,842.18 \t 17,420.88 \t 15,104.10 \t 142,783.19 \n",
      "\t3 Year Weighted Moving Average\t\t\t\t\t\t\t\t\t\t\n",
      "\tPast Years\t\t\tForecasted Future Years\t\t\t\t\t\t\t\n",
      "\tYear 1\tYear 2\tYear 3\tYear 4\tYear 5\tYear 6\tTerminal Value\t\t\t\t\n",
      "Pre-tax income\t 50,000.00 \t 55,000.00 \t 45,000.00 \t\t\t\t\t\t\t\t\n",
      "Income Taxes\t 10,000.00 \t 11,000.00 \t 9,000.00 \t\t\t\t\t\t\t\t\n",
      "Net Income\t 40,000.00 \t 44,000.00 \t 36,000.00 \t\t\t\t\t\t\t\t\n",
      "Depreciation Expense\t 5,000.00 \t 4,000.00 \t 3,000.00 \t\t\t\t\t\t\t\t\n",
      "Capital Expenditures\t 10,000.00 \t 8,000.00 \t 5,000.00 \t\t\t\t\t\t\t\t\n",
      "Debt Repayments\t 5,000.00 \t 5,000.00 \t 5,000.00 \t\t\t\t\t\t\t\t\n",
      "Net Cash Flow\t 20,000.00 \t 27,000.00 \t 23,000.00 \t 23,833.33 \t 24,083.33 \t 23,819.44 \t 158,253.59 \t\t\t\t\n",
      "Discounting Factor\t\t\t\t 0.8696 \t 0.7561 \t 0.6575 \t 0.6575 \t\t\t\t\n",
      "Present Value of Future Cash Flow\t\t\t\t 20,724.64 \t 18,210.46 \t 15,661.67 \t 104,054.30 \t\t\t\t\n",
      "Notes:\t\t\t\t\t\t\t\t\t\t\t\n",
      "-We based this simple discounted cash flow excel model based on the weighted moving averages (5 year or 3 year) for simplicity, in case a constant growth rate cannot be easily determined.\t\t\t\t\t\t\t\t\t\t\t\n",
      "-The factors such as Depreciation Expense, Capital Expense and Debt Repayments remain constant, so consider this when looking at the forecasted figures.\t\t\t\t\t\t\t\t\t\t\t\n",
      "-For the terminal value constant growth rate, we make the assumption of the growth from the last forecasted year compared to the first forecasted year. Adjust in the formula as needed.\t\t\t\t\t\t\t\t\t\t\t\n"
     ]
    }
   ],
   "source": [
    "print(llama_parse_documents[1].text)"
   ]
  },
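  {
   "cell_type": "markdown",
   "id": "discount-factor-check",
   "metadata": {},
   "source": [
    "As a quick sanity check on the parsed sheet, the `Discounting Factor` row printed above can be reproduced by hand. This is a sketch that assumes standard end-of-year discounting: $r$ is the 15% `Discount Rate` from the assumptions block and $t$ counts forecast years starting at 1. The symbols $r$, $t$, $\\mathrm{DF}_t$, $\\mathrm{NCF}_t$ and $\\mathrm{PV}_t$ are introduced here for illustration and do not appear in the template.\n",
    "\n",
    "$$\n",
    "\\mathrm{DF}_t = \\frac{1}{(1 + r)^t}, \\qquad \\mathrm{PV}_t = \\mathrm{NCF}_t \\times \\mathrm{DF}_t\n",
    "$$\n",
    "\n",
    "With $r = 0.15$ this gives $\\mathrm{DF}_1 \\approx 0.8696$, $\\mathrm{DF}_2 \\approx 0.7561$, $\\mathrm{DF}_3 \\approx 0.6575$, $\\mathrm{DF}_4 \\approx 0.5718$ and $\\mathrm{DF}_5 \\approx 0.4972$, which matches the 5 year table. For its first forecast year (Year 6), $\\mathrm{PV}_1 = 29{,}093.33 / 1.15 \\approx 25{,}298.55$. In both tables the `Terminal Value` column appears to reuse the factor of the last forecast year."
   ]
  },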
  {
   "cell_type": "markdown",
   "id": "1aedd4bb-7939-4fbc-8f07-d362e24d9772",
   "metadata": {},
   "source": [
    "## Configure LLM\n",
    "\n",
    "We configure the LLM to use the OpenAI API to answer questions based on the parsed data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7c056a8-d098-4ebe-9341-d9f07081067c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-5-mini\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa75f1bc-6fed-4721-ba5e-dc5408395618",
   "metadata": {},
   "source": [
    "## Ask Questions over this Data\n",
    "\n",
    "Let's now ask questions over this data using the LlamaParse-powered pipeline: the parsed text of every page is passed to the LLM as context, followed by the question.\n",
    "\n",
    "LlamaParse-powered responses:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a875a20e-a6b6-46b7-80d4-614546215ffc",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-08-19 19:35:11,505 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the 5-year WMA table, income taxes for past years (Year 3–Year 5) are:\n",
      "\n",
      "- Year 3: $9,000 \n",
      "- Year 4: $10,400 \n",
      "- Year 5: $12,000\n",
      "\n",
      "These equal 20% of pre-tax income for those years (pre-tax: $45,000; $52,000; $60,000). The taxes rise steadily: Year 3 → Year 4 is about a 15.6% increase, Year 4 → Year 5 about a 15.4% increase, and Year 3 → Year 5 is a 33.3% increase.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "query_str = \"Tell me about the income taxes in the past years (year 3-5) for the 5 year WMA table\"\n",
    "context = \"\\n\\n\".join([doc.text for doc in llama_parse_documents])\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"user\",\n",
    "        content=f\"Here is some context\\n<context>{context}</context>\\n\\nAnswer the following question: {query_str}\",\n",
    "    )\n",
    "]\n",
    "\n",
    "response = await llm.achat(messages)\n",
    "print(response.message.content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a93af5f-fcea-4f14-80eb-5dfad230cd8a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-08-19 19:36:38,456 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "For the 3‑year WMA the discount factor used in Year 5 is 0.7561.\n",
      "\n",
      "Why: the model uses a 15% discount rate (assumption). Because Years 1–3 are historical, Year 4 is discounted one period, Year 5 two periods, etc. So the Year‑5 factor = 1 / (1 + 0.15)^2 = 0.756143 (rounded to 0.7561).\n",
      "\n",
      "How it’s used: Year‑5 net cash flow 24,083.33 × 0.7561 = 18,210.46 (present value shown in the template).\n"
     ]
    }
   ],
   "source": [
    "query_str = \"Tell me about the discounting factors in year 5 for the 3 year WMA\"\n",
    "context = \"\\n\\n\".join([doc.text for doc in llama_parse_documents])\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"user\",\n",
    "        content=f\"Here is some context\\n<context>{context}</context>\\n\\nAnswer the following question: {query_str}\",\n",
    "    )\n",
    "]\n",
    "\n",
    "response = await llm.achat(messages)\n",
    "print(response.message.content)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}