mirror of
https://github.com/run-llama/cookbooks.git
synced 2026-07-01 21:34:02 -04:00
393 lines
104 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Sub Question Query Engine powered by NVIDIA NIMs\n",
|
||
"\n",
|
||
"A Sub Question Query Engine takes a single, complex question and breaks it into multiple sub-questions, each of which can be answered by a different tool. We'll use NVIDIA NIMs to power our sub-question generation and answer retrieval.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### NVIDIA NIMs\n",
|
||
"\n",
|
"NIM supports models across domains such as chat, embedding, and re-ranking,\n",
"from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA\n",
"accelerated infrastructure and are deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere with a single\n",
"command on NVIDIA accelerated infrastructure.\n",
|
||
"\n",
|
"NVIDIA-hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing,\n",
"NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud,\n",
"giving enterprises ownership and full control of their IP and AI application.\n",
"\n",
"NIMs are packaged as container images on a per-model basis and distributed through the NVIDIA NGC Catalog.\n",
"At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
"## Setup\n",
"\n",
"Import our dependencies and set up an NVIDIA API key from the [API catalog](https://build.nvidia.com/) for the two hosted models we'll use: an LLM and an embedding model.\n",
|
||
"\n",
|
||
"**To get started:**\n",
|
||
"\n",
|
||
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n",
|
||
"\n",
|
||
"2. Click on your model of choice.\n",
|
||
"\n",
|
"3. Under Input, select the Python tab and click `Get API Key`. Then click `Generate Key`.\n",
"\n",
"4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.\n",
|
||
"\n",
|
||
"**Install our dependencies:**\n",
|
"* LlamaIndex core for most things\n",
"* NVIDIA NIM LLM and embedding integrations for generation and embeddings\n",
"* `llama-index-readers-file` to power the PDF reader in `SimpleDirectoryReader`\n",
"* `llama-index-utils-workflow` to draw the workflow diagram\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"!pip install llama-index-core llama-index-llms-nvidia llama-index-embeddings-nvidia llama-index-readers-file llama-index-utils-workflow"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Bring in our dependencies as imports:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
"import os, json\n",
"from llama_index.core import (\n",
"    SimpleDirectoryReader,\n",
"    VectorStoreIndex,\n",
"    StorageContext,\n",
"    load_index_from_storage,\n",
"    Settings,\n",
")\n",
"from llama_index.core.tools import QueryEngineTool, ToolMetadata\n",
"from llama_index.core.workflow import (\n",
"    step,\n",
"    Context,\n",
"    Workflow,\n",
"    Event,\n",
"    StartEvent,\n",
"    StopEvent,\n",
")\n",
"from llama_index.core.agent.workflow import ReActAgent\n",
"from llama_index.llms.nvidia import NVIDIA\n",
"from llama_index.embeddings.nvidia import NVIDIAEmbedding\n",
"from llama_index.utils.workflow import draw_all_possible_flows"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Define the Sub Question Query Engine as a Workflow\n",
|
||
"\n",
|
"* Our StartEvent goes to `query()`, which takes care of several things:\n",
"    * Accepts and stores the original query\n",
"    * Stores the LLM to handle the queries\n",
"    * Stores the list of tools to enable sub-questions\n",
"    * Passes the original question to the LLM, asking it to split up the question into sub-questions\n",
"    * Fires off a `QueryEvent` for every sub-question generated\n",
"\n",
"* QueryEvents go to `sub_question()`, which instantiates a new ReAct agent with the full list of tools available and lets it select which one to use.\n",
"    * This is slightly better than the actual SQQE built in to LlamaIndex, which cannot use multiple tools\n",
"    * Each QueryEvent generates an `AnswerEvent`\n",
"\n",
"* AnswerEvents go to `combine_answers()`.\n",
"    * This uses `ctx.collect_events()` to wait for every QueryEvent to return an answer.\n",
"    * All the answers are then combined into a final prompt for the LLM to consolidate them into a single response\n",
"    * A StopEvent is generated to return the final result"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
"class QueryEvent(Event):\n",
"    question: str\n",
"\n",
"\n",
"class AnswerEvent(Event):\n",
"    question: str\n",
"    answer: str\n",
"\n",
"\n",
"class SubQuestionQueryEngine(Workflow):\n",
"    @step\n",
"    async def query(self, ctx: Context, ev: StartEvent) -> QueryEvent | None:\n",
"        # Store the run inputs in the workflow context so later steps can read them\n",
"        if hasattr(ev, \"query\"):\n",
"            await ctx.store.set(\"original_query\", ev.query)\n",
"            print(f\"Query is {await ctx.store.get('original_query')}\")\n",
"\n",
"        if hasattr(ev, \"llm\"):\n",
"            await ctx.store.set(\"llm\", ev.llm)\n",
"\n",
"        if hasattr(ev, \"tools\"):\n",
"            await ctx.store.set(\"tools\", ev.tools)\n",
"\n",
"        response = (await ctx.store.get(\"llm\")).complete(\n",
"            f\"\"\"\n",
"            Given a user question, and a list of tools, output a list of\n",
"            relevant sub-questions, such that the answers to all the\n",
"            sub-questions put together will answer the question. Respond\n",
"            in pure JSON without any markdown, like this:\n",
"            {{\n",
"                \"sub_questions\": [\n",
"                    \"What is the population of San Francisco?\",\n",
"                    \"What is the budget of San Francisco?\",\n",
"                    \"What is the GDP of San Francisco?\"\n",
"                ]\n",
"            }}\n",
"            Here is the user question: {await ctx.store.get('original_query')}\n",
"\n",
"            And here is the list of tools: {await ctx.store.get('tools')}\n",
"            \"\"\"\n",
"        )\n",
"\n",
"        print(f\"Sub-questions are {response}\")\n",
"\n",
"        response_obj = json.loads(str(response))\n",
"        sub_questions = response_obj[\"sub_questions\"]\n",
"\n",
"        await ctx.store.set(\"sub_question_count\", len(sub_questions))\n",
"\n",
"        # Fan out: emit one QueryEvent per sub-question via the context\n",
"        for question in sub_questions:\n",
"            ctx.send_event(QueryEvent(question=question))\n",
"\n",
"        return None\n",
"\n",
"    @step\n",
"    async def sub_question(self, ctx: Context, ev: QueryEvent) -> AnswerEvent:\n",
"        print(f\"Sub-question is {ev.question}\")\n",
"\n",
"        agent = ReActAgent(\n",
"            tools=await ctx.store.get(\"tools\"),\n",
"            llm=await ctx.store.get(\"llm\"),\n",
"        )\n",
"        response = await agent.run(ev.question)\n",
"\n",
"        return AnswerEvent(question=ev.question, answer=str(response))\n",
"\n",
"    @step\n",
"    async def combine_answers(\n",
"        self, ctx: Context, ev: AnswerEvent\n",
"    ) -> StopEvent | None:\n",
"        # Buffer AnswerEvents until one has arrived for every sub-question\n",
"        ready = ctx.collect_events(\n",
"            ev, [AnswerEvent] * await ctx.store.get(\"sub_question_count\")\n",
"        )\n",
"        if ready is None:\n",
"            return None\n",
"\n",
"        answers = \"\\n\\n\".join(\n",
"            [\n",
"                f\"Question: {event.question}: \\n Answer: {event.answer}\"\n",
"                for event in ready\n",
"            ]\n",
"        )\n",
"\n",
"        prompt = f\"\"\"\n",
"            You are given an overall question that has been split into sub-questions,\n",
"            each of which has been answered. Combine the answers to all the sub-questions\n",
"            into a single answer to the original question.\n",
"\n",
"            Original question: {await ctx.store.get('original_query')}\n",
"\n",
"            Sub-questions and answers:\n",
"            {answers}\n",
"        \"\"\"\n",
"\n",
"        print(f\"Final prompt is {prompt}\")\n",
"\n",
"        response = (await ctx.store.get(\"llm\")).complete(prompt)\n",
"\n",
"        print(\"Final response is\", response)\n",
"\n",
"        return StopEvent(result=str(response))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
"draw_all_possible_flows(\n",
"    SubQuestionQueryEngine, filename=\"sub_question_query_engine.html\"\n",
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
"The visualized flow looks fairly linear because the diagram doesn't capture that `query()` can emit multiple parallel `QueryEvents`, which are then collected in `combine_answers()`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
"# Download the demo data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"!mkdir -p \"./data/sf_budgets/\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/xt3squt47djba0j7emmjb/2016-CSF_Budget_Book_2016_FINAL_WEB_with-cover-page.pdf?rlkey=xs064cjs8cb4wma6t5pw2u2bl&dl=0\" -O \"./data/sf_budgets/2016 - CSF_Budget_Book_2016_FINAL_WEB_with-cover-page.pdf\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/jvw59g5nscu1m7f96tjre/2017-Proposed-Budget-FY2017-18-FY2018-19_1.pdf?rlkey=v988oigs2whtcy87ti9wti6od&dl=0\" -O \"./data/sf_budgets/2017 - 2017-Proposed-Budget-FY2017-18-FY2018-19_1.pdf\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/izknlwmbs7ia0lbn7zzyx/2018-o0181-18.pdf?rlkey=p5nv2ehtp7272ege3m9diqhei&dl=0\" -O \"./data/sf_budgets/2018 - 2018-o0181-18.pdf\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/1rstqm9rh5u5fr0tcjnxj/2019-Proposed-Budget-FY2019-20-FY2020-21.pdf?rlkey=3s2ivfx7z9bev1r840dlpbcgg&dl=0\" -O \"./data/sf_budgets/2019 - 2019-Proposed-Budget-FY2019-20-FY2020-21.pdf\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/7teuwxrjdyvgw0n8jjvk0/2021-AAO-FY20-21-FY21-22-09-11-2020-FINAL.pdf?rlkey=6br3wzxwj5fv1f1l8e69nbmhk&dl=0\" -O \"./data/sf_budgets/2021 - 2021-AAO-FY20-21-FY21-22-09-11-2020-FINAL.pdf\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/zhgqch4n6xbv9skgcknij/2022-AAO-FY2021-22-FY2022-23-FINAL-20210730.pdf?rlkey=h78t65dfaz3mqbpbhl1u9e309&dl=0\" -O \"./data/sf_budgets/2022 - 2022-AAO-FY2021-22-FY2022-23-FINAL-20210730.pdf\"\n",
|
||
"!wget \"https://www.dropbox.com/scl/fi/vip161t63s56vd94neqlt/2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf?rlkey=hemoce3w1jsuf6s2bz87g549i&dl=0\" -O \"./data/sf_budgets/2023 - 2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Load data and run the workflow\n",
|
||
"\n",
|
"As with the built-in Sub-Question Query Engine, we create our query tools, instantiate an LLM, and pass both in.\n",
"\n",
"Each tool is its own query engine built on a single San Francisco budget document, each of which runs to 300+ pages. To save time on repeated runs, we persist our generated indexes to disk."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
"import getpass\n",
"\n",
"if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
"    print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n",
"else:\n",
"    nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n",
"    assert nvapi_key.startswith(\n",
"        \"nvapi-\"\n",
"    ), f\"{nvapi_key[:5]}... is not a valid key\"\n",
"    os.environ[\"NVIDIA_API_KEY\"] = nvapi_key\n",
"\n",
"folder = \"./data/sf_budgets/\"\n",
"files = os.listdir(folder)\n",
"\n",
"Settings.embed_model = NVIDIAEmbedding(\n",
"    model=\"nvidia/nv-embedqa-e5-v5\", truncate=\"END\"\n",
")\n",
"Settings.llm = NVIDIA()\n",
"\n",
"query_engine_tools = []\n",
"for file in files:\n",
"    year = file.split(\" - \")[0]\n",
"    index_persist_path = f\"./storage/budget-{year}/\"\n",
"\n",
"    if os.path.exists(index_persist_path):\n",
"        storage_context = StorageContext.from_defaults(\n",
"            persist_dir=index_persist_path\n",
"        )\n",
"        index = load_index_from_storage(storage_context)\n",
"    else:\n",
"        documents = SimpleDirectoryReader(\n",
"            input_files=[folder + file]\n",
"        ).load_data()\n",
"        index = VectorStoreIndex.from_documents(documents)\n",
"        index.storage_context.persist(index_persist_path)\n",
"\n",
"    engine = index.as_query_engine()\n",
"    query_engine_tools.append(\n",
"        QueryEngineTool(\n",
"            query_engine=engine,\n",
"            metadata=ToolMetadata(\n",
"                name=f\"budget_{year}\",\n",
"                description=f\"You can ask this tool natural-language questions about San Francisco's budget in {year}\",\n",
"            ),\n",
"        )\n",
"    )"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
"engine = SubQuestionQueryEngine(timeout=120, verbose=True)\n",
"result = await engine.run(\n",
"    llm=Settings.llm,\n",
"    tools=query_engine_tools,\n",
"    query=\"How has the total amount of San Francisco's budget changed from 2016 to 2023?\",\n",
")\n",
"\n",
"print(result)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
"Our debug output is lengthy! You can see the sub-questions being generated and then `sub_question()` being invoked repeatedly, each time producing a brief log of the ReAct agent's thoughts and actions as it answers one smaller question.\n",
"\n",
"You can also see `combine_answers()` running multiple times: each `AnswerEvent` triggers it, but it returns early until all 8 `AnswerEvents` have been collected. On its final run it builds the full prompt, combines the answers, and returns the result."
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"colab": {
|
||
"provenance": []
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 0
|
||
}
|