{
"cells": [
{
"cell_type": "markdown",
"id": "6a728deb-28da-4064-8e66-bef416022207",
"metadata": {},
"source": [
"# Knowledge Graph Agent with LlamaParse\n",
"\n",
"<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/knowledge_graphs/kg_agent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"Here we build a knowledge graph agent over the SF 2023 Budget Proposal. We use LlamaIndex abstractions to construct a knowledge graph, and we store the property graph in neo4j. We then build an agent that can interact with the knowledge graph as a tool.\n",
"\n",
"Status:\n",
"| Last Executed | Version | State |\n",
"|---------------|---------|------------|\n",
"| Before Feb 2025 | N/A | Deprecated |"
]
},
{
"cell_type": "markdown",
"id": "0facb0b9",
"metadata": {},
"source": [
"> **⚠️ DEPRECATION NOTICE**\n",
">\n",
"> This example uses the deprecated `llama-cloud-services` package, which will be maintained until **May 1, 2026**.\n",
">\n",
"> **Please migrate to:**\n",
"> - **Python**: `pip install \"llama-cloud>=1.0\"` ([GitHub](https://github.com/run-llama/llama-cloud-py))\n",
"> - **New Package Documentation**: https://docs.cloud.llamaindex.ai/\n",
">\n",
"> The new package provides the same functionality with improved performance and support."
]
},
{
"cell_type": "markdown",
"id": "e8db8ac2-5221-44de-a53e-cb5ab37ac8f5",
"metadata": {},
"source": [
"## Setup (Installs, Data, Models)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "989d1cb5-5464-4d9d-ac1e-6e16276d3698",
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-index\n",
"!pip install llama-index-core==0.10.42\n",
"!pip install llama-index-embeddings-openai\n",
"!pip install llama-index-postprocessor-flag-embedding-reranker\n",
"!pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
"!pip install llama-index-graph-stores-neo4j\n",
"!pip install llama-cloud-services"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "580ecfc2-b082-4c4e-910a-7f88e8137aad",
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
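{
"cell_type": "code",
"execution_count": null,
"id": "c86e95e2-0fdf-4bb1-bf2e-b33af13ac7ef",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# API access to llama-cloud\n",
"# os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-\"\n",
"\n",
"# API access to OpenAI (used for the LLM and embeddings below)\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"sk-\""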
]
},
{
"cell_type": "markdown",
"id": "d6f683f2-a41e-4975-843c-435407132f0e",
"metadata": {},
"source": [
"#### Setup Model\n",
"\n",
"Here we use `gpt-4o` as the LLM and OpenAI's `text-embedding-3-small` model for embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d91854ee-d57a-4d7b-bcc8-c6bd7214fe84",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.llms.openai import OpenAI\n",
"from llama_index.embeddings.openai import OpenAIEmbedding\n",
"from llama_index.core import Settings\n",
"\n",
"llm = OpenAI(model=\"gpt-4o\")\n",
"embed_model = OpenAIEmbedding(model=\"text-embedding-3-small\")\n",
"\n",
"Settings.llm = llm\n",
"Settings.embed_model = embed_model"
]
},
{
"cell_type": "markdown",
"id": "5bcf33f7-b195-444d-9355-ab47c91be6ad",
"metadata": {},
"source": [
"#### Load Data\n",
"\n",
"Here we load the 2023 Budget PDF and parse it with LlamaParse."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24e7d3f1-012d-4964-8130-e9a386b70996",
"metadata": {},
"outputs": [],
"source": [
"!mkdir -p data\n",
"!wget \"https://www.dropbox.com/scl/fi/vip161t63s56vd94neqlt/2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf?rlkey=hemoce3w1jsuf6s2bz87g549i&dl=0\" -O data/budget_2023.pdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96d13f0b-1749-4c06-a4f7-b6f885db04d3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Started parsing the file under job_id a7bc360f-1625-4fb7-a950-7531a8b3447e\n"
]
}
],
"source": [
"from llama_cloud_services import LlamaParse\n",
"\n",
"docs = LlamaParse(result_type=\"text\").load_data(\"./data/budget_2023.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9afcf38e-3c7e-4b48-ae23-61b33f1a448a",
"metadata": {},
"outputs": [],
"source": [
"from copy import deepcopy\n",
"from llama_index.core.schema import TextNode, Document\n",
"from llama_index.core import VectorStoreIndex\n",
"\n",
"\n",
"def get_sub_docs(docs):\n",
"    \"\"\"Split docs into pages, by separator.\"\"\"\n",
"    sub_docs = []\n",
"    for doc in docs:\n",
"        doc_chunks = doc.text.split(\"\\n---\\n\")\n",
"        for doc_chunk in doc_chunks:\n",
"            sub_doc = Document(\n",
"                text=doc_chunk,\n",
"                metadata=deepcopy(doc.metadata),\n",
"            )\n",
"            sub_docs.append(sub_doc)\n",
"\n",
"    return sub_docs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d672fcaa-9e49-40a2-9a0b-e709351e840f",
"metadata": {},
"outputs": [],
"source": [
"# this will split into pages\n",
"sub_docs = get_sub_docs(docs)"
]
},
{
"cell_type": "markdown",
"id": "e55bcae5-43ba-4363-9109-d8080d19ce5a",
"metadata": {},
"source": [
"#### Initialize Graph Store\n",
"\n",
"Here we use Neo4j but you can also use our other integrations like Nebula (see an [example notebook](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/property_graph_advanced.ipynb))."
]
},
{
"cell_type": "markdown",
"id": "b86a0e7b-bc15-45ed-b1f7-b4c82d16b815",
"metadata": {},
"source": [
"To launch Neo4j locally, first ensure you have Docker installed. Then launch the database with the following command:\n",
"\n",
"```bash\n",
"docker run \\\n",
"    -p 7474:7474 -p 7687:7687 \\\n",
"    -v $PWD/data:/data -v $PWD/plugins:/plugins \\\n",
"    --name neo4j-apoc \\\n",
"    -e NEO4J_apoc_export_file_enabled=true \\\n",
"    -e NEO4J_apoc_import_file_enabled=true \\\n",
"    -e NEO4J_apoc_import_file_use__neo4j__config=true \\\n",
"    -e NEO4JLABS_PLUGINS=\\[\\\"apoc\\\"\\] \\\n",
"    neo4j:latest\n",
"```\n",
"\n",
"From here, you can open the database at [http://localhost:7474/](http://localhost:7474/). On this page, you will be asked to sign in. Use the default username and password, `neo4j` and `neo4j`.\n",
"\n",
"The first time you log in, you will be asked to change the password. The connection code in this notebook assumes you set it to `llamaindex`.\n",
"\n",
"After this, you are ready to create your first property graph!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e25ed865-b78e-4856-9473-7123d9924d46",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.graph_stores.neo4j import Neo4jPGStore\n",
"\n",
"graph_store = Neo4jPGStore(\n",
"    username=\"neo4j\",\n",
"    password=\"llamaindex\",\n",
"    url=\"bolt://localhost:7687\",\n",
")\n",
"vec_store = None"
]
},
{
"cell_type": "markdown",
"id": "10723825-328f-4175-ad85-637b0c28262c",
"metadata": {},
"source": [
"## Construct Knowledge Graph, Get Retrievers\n",
"\n",
"This section shows how to construct the knowledge graph over the existing documents.\n",
"\n",
"**Note**: the default extractors (implicit path and simple LLM path) are configured here. You can also use a predefined schema, as shown in this [notebook](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/property_graph_advanced.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b3d81c4-95a8-409b-a448-ac18e193effc",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.indices.property_graph import (\n",
"    ImplicitPathExtractor,\n",
"    SimpleLLMPathExtractor,\n",
")\n",
"from llama_index.core import PropertyGraphIndex\n",
"from llama_index.llms.openai import OpenAI\n",
"from llama_index.embeddings.openai import OpenAIEmbedding"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4debe49-e2a1-4092-b0e7-cc5e6604fbef",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parsing nodes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 362/362 [00:00<00:00, 1051.95it/s]\n",
"Extracting implicit paths: 100%|███████████████████████████████████████████████████████████████████████████████████████| 438/438 [00:00<00:00, 99701.79it/s]\n",
"Extracting paths from text: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 438/438 [03:53<00:00, 1.87it/s]\n",
"Generating embeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 2.89it/s]\n",
"Generating embeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 62/62 [00:01<00:00, 38.10it/s]\n"
]
}
],
"source": [
"index = PropertyGraphIndex.from_documents(\n",
"    sub_docs,\n",
"    embed_model=OpenAIEmbedding(model_name=\"text-embedding-3-small\"),\n",
"    kg_extractors=[\n",
"        ImplicitPathExtractor(),\n",
"        SimpleLLMPathExtractor(\n",
"            llm=OpenAI(model=\"gpt-3.5-turbo\", temperature=0.3),\n",
"            num_workers=4,\n",
"            max_paths_per_chunk=10,\n",
"        ),\n",
"    ],\n",
"    property_graph_store=graph_store,\n",
"    show_progress=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffc29e45-89c1-48cf-b221-c4e117cda630",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Extracting implicit paths: 0it [00:00, ?it/s]\n",
"Extracting paths from text: 0it [00:00, ?it/s]\n",
"Generating embeddings: 0it [00:00, ?it/s]\n",
"Generating embeddings: 0it [00:00, ?it/s]\n"
]
}
],
"source": [
"# Run this instead if the graph store was already populated in a previous run\n",
"index = PropertyGraphIndex.from_existing(\n",
"    graph_store,\n",
"    embed_model=OpenAIEmbedding(model_name=\"text-embedding-3-small\"),\n",
"    kg_extractors=[\n",
"        ImplicitPathExtractor(),\n",
"        SimpleLLMPathExtractor(\n",
"            llm=OpenAI(model=\"gpt-3.5-turbo\", temperature=0.3),\n",
"            num_workers=4,\n",
"            max_paths_per_chunk=10,\n",
"        ),\n",
"    ],\n",
"    show_progress=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "20cf723a-5353-4f97-95b2-c9cde940c583",
"metadata": {},
"source": [
"The constructed knowledge graph should look something like this:\n",
"\n",
"![knowledge graph](./sf2023_budget_kg_screenshot.png)"
]
},
{
"cell_type": "markdown",
"id": "0c1f0f65-2d15-471c-9e74-6cebcecee1f4",
"metadata": {},
"source": [
"#### Define Vector Retriever\n",
"\n",
"Here we define our vector context retriever. It finds the initial nodes via vector search, then traverses their relations (up to `path_depth` hops) to pull in additional nodes and context."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d4d4ce1-64df-41cd-9f36-49c779fdc2b3",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.indices.property_graph import VectorContextRetriever\n",
"\n",
"kg_retriever = VectorContextRetriever(\n",
"    index.property_graph_store,\n",
"    embed_model=OpenAIEmbedding(model_name=\"text-embedding-3-small\"),\n",
"    similarity_top_k=2,\n",
"    path_depth=1,\n",
"    # include_text=False,\n",
"    include_text=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca3cd98b-1556-4ae7-8a24-519adb48beb6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3\n",
">> IDX: 0, Here are some facts extracted from the provided text:\n",
"\n",
"Mayor's budget -> Includes -> Key changes\n",
"\n",
"first responders to petition for an individual to enter\n",
"the programs. In these procedures, a CARE Plan is\n",
"established, and a judge can use court orders with\n",
"support such as short-term stabilization medications\n",
"and beds, as well as wellness and recovery offerings.\n",
"The Mayors proposed budget includes funding\n",
"for engagement and assessment staff, new City\n",
"attorneys dedicated to CARE Court implementation,\n",
"increased capacity for treatment and housing, and\n",
"outreach and educational efforts.\n",
"Improvements at Laguna Honda Hospital\n",
"Beyond behavioral health, this budget makes\n",
"investments in DPHs budget for Laguna Honda\n",
"Hospital, which is actively working towards gaining\n",
"recertification with the Centers for Medicare\n",
"and Medicaid Services (CMS). DPH is currently\n",
"implementing the action plan submitted to CMS,\n",
"and it represents a significant facility-wide effort\n",
"and includes hundreds of process improvements.\n",
"The Mayors proposed budget includes over $3.5\n",
"million of new annual investment to support the\n",
"implementation of the action plan and sustain\n",
"the improvements, including staffing in key areas,\n",
"including education and training, patient care\n",
"experience, medication management, and leadership\n",
"within the San Francisco Health Network.\n",
"\n",
"Economic Recovery\n",
"The Mayors proposed FY 2023-24 and FY 2024-\n",
"25 budget invests $24.4 million over the two years\n",
"in support of the Roadmap for Downtown San\n",
"Franciscos Future, and broadly supports economic\n",
"recovery across the entire City. While critical\n",
"components around recovery include investments\n",
"in public safety and street conditions, there are also\n",
"targeted improvements and programs to support a\n",
"thriving economy, both downtown and throughout\n",
"the Citys neighborhood commercial corridors.\n",
"Providing Tax Relief and Incentives\n",
"To keep existing businesses stable and to recruit new\n",
"businesses, the Mayors budget includes key changes\n",
"to the Citys business taxes. In November 2020, San\n",
"20\n",
" 20 EXECUTIVE SUMMARY\n",
" EXECUTIVE SUMMARY\n",
"\n",
"\n",
"Francisco voters passed Proposition F, which phased\n",
"out the payroll expense tax, while gradually increasing\n",
"gross receipts tax for businesses across most\n",
"industries. The proposition also delayed gross receipts\n",
"tax increases until 2023 and 2024 for industries hit\n",
"hardest by the pandemic. In March 2023, the Mayor\n",
"introduced legislation to further delay tax increases\n",
"for maintenance and laundry businesses, retail trade,\n",
"food services, manufacturing, accommodations, arts,\n",
"entertainment, and recreation until 2025 and 2026.\n",
"The proposed budget includes revenue assumptions\n",
"aligned with these tax changes.\n",
"The City must also attract new businesses to fill its\n",
"office vacancies, support customer-serving businesses,\n",
"and bolster future revenue through gross receipts\n",
"tax, property tax, and contributions to other revenue\n",
"sources. The Mayors legislation will offer a discount\n",
"for up to three years on the office-based gross\n",
"receipts tax for new offices locating in San Francisco\n",
"in the information, administrative and support\n",
"services, financial services, insurance, professional\n",
"scientific and technical services industries.\n",
"\n",
"Finally, the budget proposes a change to the\n",
"Commercial Rent Tax, which was passed by voters in\n",
"June 2018. The change seeks to pause the collection\n",
"revenues on sub-leases of commercial spaces through\n",
"2029 to ensure commercial properties are only\n",
"subject to a single commercial rent tax, rather than\n",
"also being taxed for sub-leasing the space. Profits on\n",
"subleases will continue to be collected.\n",
"Supporting Small Businesses\n",
"The Mayors proposed budget continues the small\n",
"business grant program, providing $5 million in direct\n",
"grants to help small businesses across the city to\n",
"stabilize, scale, and adapt business models to changed\n",
"conditions. The program will target businesses in\n",
"commercial corridors that have experienced the\n",
"highest drop in sales tax to fill vacancies, or expand\n",
"into new storefronts, while providing business\n",
"assistance to improve operations and renegotiate\n",
"leases\n",
">> IDX: 1, Here are some facts extracted from the provided text:\n",
"\n",
"Mayor's proposed budget -> Expands investments within -> San francisco police department\n",
"Mayor's proposed budget -> Includes -> Funding\n",
"Mayor's proposed budget -> Invests in -> Facilities maintenance\n",
"Mayor's proposed budget -> Includes -> $32.0 million\n",
"Mayor's proposed budget -> Includes -> Mental health services act funds\n",
"Mayor's proposed budget -> Invests -> $24.4 million\n",
"Mayor's proposed budget -> Invests in -> Stationary engineers\n",
"Mayor's proposed budget -> Includes -> $17.7 million\n",
"Mayor's proposed budget -> Leverages -> Ocoh\n",
"Mayor's proposed budget -> Makes investments to -> Two positions\n",
"Mayor's proposed budget -> Makes investments in -> Priority areas\n",
"Mayor's proposed budget -> Continues -> Work\n",
"Mayor's proposed budget -> Includes -> $1.2 million of grant funding\n",
"Mayor's proposed budget -> Maintain -> Programs\n",
"Mayor's proposed budget -> Includes -> $0.7 million\n",
"Mayor's proposed budget -> Includes -> New funding to support efforts\n",
"Mayor's proposed budget -> Includes -> Resources\n",
"Mayor's proposed budget -> Makes investments to -> Continue supporting\n",
"Mayor's proposed budget -> Includes -> Anticipated grant\n",
"Mayor's proposed budget -> Invest in -> Expansion of recovery programs\n",
"Mayor's proposed budget -> Includes -> $0.6 million\n",
"Mayor's proposed budget -> Invests in -> Building projects\n",
"\n",
"CalAIM Expansion for People At-Risk of\n",
"Institutionalization and Justice-Involved\n",
"People\n",
"The State of California is continuing its multi-year\n",
"roll out of California Advancing and Innovating\n",
"Medi-Cal (CalAIM), a new framework that\n",
"encompasses a broad-based delivery system,\n",
"program, and payment reform across the Medi-Cal\n",
"program with a whole-person care approach. For\n",
"the proposed FY 2023-24 and FY 2024-25 budget,\n",
"the focus is on the roll-out of expanded benefits\n",
"to people at risk of long-term institutionalization\n",
"and justice-system involved people exiting jail.\n",
"As CalAIM focuses on stabilizing patients in\n",
"community settings as much as possible, the\n",
"enhanced care management (ECM) benefit allows\n",
"for Medi-Cal to pay for hands-on support to\n",
"address both the clinical and non-clinical needs of\n",
"medically complex patients to keep them out of\n",
"institutions.\n",
"\n",
"Health Equity Investments through the\n",
"Mental Health Services Act\n",
"The Mayors proposed budget includes $32.0\n",
"million in FY 2022-23 and $17.7 million of ongoing\n",
"\n",
"\n",
"in additional Mental Health Services Act (MHSA)\n",
"funds. These funds will be used to ensure the\n",
"continuity of existing MHSA programming; support\n",
"new, innovative and culturally congruent services\n",
"to meet the pressing needs of the Black/African\n",
"American community and sustain a pilot effort to\n",
"provide mental health support for Black mothers.\n",
"New initiatives in this budget include $15.0 million\n",
"for a three-year pilot from FY 2023-24 through FY\n",
"2025-26 with the Dream Keeper Initiative to create\n",
"a talk therapy, telehealth program for people in San\n",
"Francisco, with a particular focus on Black/African\n",
"American residents.\n",
"Fee for Service Transition\n",
"\n",
"In January 2023, the San Francisco Health PlanNumber of Calls(SFHP) and Zuckerberg San Francisco General\n",
"\n",
"Hospital (ZSFG) expanded its use of the fee-for-\n",
"service model to maximize revenues. By expanding\n",
"the use of fee-for-service, ZSFG will recover\n",
"more funding from the SFHP and the State, while\n",
"maintaining quality care for its patients. DPH\n",
"projects $36.7 M in additional revenues in FY 2023-\n",
"24 and $36.9 M in FY 2024-25 as result of the shift\n",
"that is included in the Mayors proposed budget.\n",
"\n",
"\n",
"\n",
"\n",
" 3,000\n",
"\n",
"\n",
" STREET RESPONSE TEAM. The goals 2,500\n",
" of the San Francisco Street Overdose\n",
" Response Team (SORT) are to reduce the 2,000\n",
" risk of opioid-related death of individuals\n",
" who have recently experienced an 1,500\n",
" overdose, contribute to an overall reduction\n",
" in overdose deaths through referrals and\n",
" care coordination with community-based 1,000\n",
" organziations, and to provide support to\n",
" people who have survived any overdose. 500\n",
"\n",
"\n",
" 0 Calls Handled Calls Including Calls That Include Clients Who\n",
" by SORT an Overdose Buprennorphrine StartsAccepted Harm\n",
" Reduction Supplies\n",
" Types of Calls\n",
"\n",
"\n",
" PUBLIC HEALTH 267\n",
">> IDX: 2, Here are some facts extracted from the provided text:\n",
"\n",
"Input -> Carefully considered in formulating -> Mayor's proposed budget\n",
"\n",
"Key Participants\n",
"• Residents provide direction for and commentary\n",
" on budget priorities throughout the annual budget\n",
" process. Input from residents through virtual\n",
" feedback forms, stakeholder working groups\n",
" convened by the Mayors Office, public budget\n",
" hearings, and communication with elected officials\n",
" are all carefully considered in formulating the\n",
" Mayors proposed budget.\n",
"• City departments prioritize needs and present\n",
" balanced budgets for review and analysis by the\n",
" Mayors Office of Public Policy and Finance.\n",
"• The multi-year budget projections described in the\n",
" previous section as well as the Capital Planning\n",
" Committee (CPC) and Committee on Information\n",
" Technology (COIT) provide guidance to the Mayors\n",
" Office on both long-term fiscal trends as well as\n",
" citywide priorities for capital and IT investments.\n",
"• The Mayor, with the assistance of the Mayors\n",
" Office of Public Policy and Finance, prepares\n",
" and submits a balanced budget to the Board of\n",
" Supervisors on an annual basis.\n",
"• The Board of Supervisors is the Citys legislative\n",
" body and is responsible for amending and\n",
" approving the Mayors proposed budget. The\n",
" Boards Budget and Legislative Analyst also\n",
" participates in reviews of city spending and\n",
" financial projections and makes recommendations\n",
" to the Board on budget modifications.\n",
"• The Controller is the Citys Chief Financial Officer\n",
" and is responsible for projecting available revenue\n",
" to fund city operations and investments in both\n",
" the near- and long-term. In addition, the City\n",
" Services Auditor Division of the Controllers Office\n",
" is responsible for working with departments to\n",
" develop, improve, and evaluate their performance\n",
" standards.\n",
"\n",
"Calendar and Process\n",
"Beginning in September and concluding in July, the\n",
"annual budget cycle can be divided into three major\n",
"stages (see calendar at the end of this section):\n",
"\n",
"50 BUDGET PROCESS\n",
"\n",
"\n",
"\n",
"• Budget Preparation: budget development and\n",
" submission to the Board of Supervisors.\n",
"• Approval: budget review and enactment by the\n",
" Board of Supervisors and budget signing by the\n",
" Mayor\n",
"• Implementation: department execution and budget\n",
" adjustments.\n",
"\n",
"Budget Preparation\n",
"Preliminary projections of Enterprise and General Fund\n",
"revenues for the next fiscal year by the Controllers\n",
"Office and Mayors Office staff begin in September.\n",
"Around this time, many departments begin budget\n",
"planning to allow adequate input from oversight\n",
"commissions and the public. In December, budget\n",
"instructions are issued by the Mayors Office and\n",
"the Controllers Office with detailed guidance on the\n",
"preparation of department budget requests. The\n",
"instructions contain a financial outlook, policy goals,\n",
"and guidelines as well as technical instructions.\n",
"Three categories of budgets are prepared:\n",
"• General Fund department budgets: General Fund\n",
" departments rely in whole or in part on discretionary\n",
" revenue comprised primarily of local taxes such as\n",
" property, sales, payroll, and other taxes. The Mayor\n",
" introduces the proposed General Fund budget to\n",
" the Board of Supervisors on June 1.\n",
"\n",
"• Enterprise department budgets: Enterprise\n",
" departments generate non-discretionary revenue\n",
" primarily from charges for services that are used\n",
" to support operations. The Mayor introduces\n",
" the proposed Enterprise budgets to the Board of\n",
" Supervisors on May 1.\n",
"• Capital and IT budgets: Capital and IT budget\n",
" requests are submitted to the CPC and COIT\n",
" for review. The recommendations for each\n",
" committee are taken into account during the budget\n",
" preparation process. The Citys Ten-Year Capital\n",
" Plan is brought before the Board of Supervisors and\n",
" Mayor for approval concurrently with the General\n",
" Fund and Enterprise department budgets.\n"
]
}
],
"source": [
"nodes = kg_retriever.retrieve(\n",
"    \"Give me all the programs that the mayor's budget includes\"\n",
")\n",
"# nodes = kg_retriever.retrieve('san francisco')\n",
"print(len(nodes))\n",
"for idx, node in enumerate(nodes):\n",
"    print(f\">> IDX: {idx}, {node.get_content()}\")"
]
},
{
"cell_type": "markdown",
"id": "1e0867c1-dd3d-4e84-93a3-30b4ec861024",
"metadata": {},
"source": [
"## Build Baseline Vector Index\n",
"\n",
"We also build a \"baseline\" vector index. This follows the \"naive\" RAG pipeline approach of chunking and vector embedding. We use this as a comparison point."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18b6e528-e1af-41a0-9f04-909e24f1584b",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import VectorStoreIndex\n",
"from llama_index.core.query_engine import RetrieverQueryEngine\n",
"\n",
"base_index = VectorStoreIndex.from_documents(sub_docs, embed_model=embed_model)\n",
"base_retriever = base_index.as_retriever(similarity_top_k=2)\n",
"base_query_engine = RetrieverQueryEngine(base_retriever)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "573bef31-0194-4532-b0b1-ea3e84e3890c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The mayor's budget includes the following programs:\n",
"\n",
"1. Ongoing programmatic support for all districts.\n",
"2. One-time $5.0 million and $250,000 ongoing investment in priority community-based organization needs, including capital and infrastructure, and public safety.\n",
"3. Support for the Mayors Office Administration, which advances Mayoral priorities through policy and budget development, communications, and advocacy.\n",
"4. Financial Capability Services.\n",
"5. Nonprofit Capacity Building.\n",
"6. Eviction Prevention and Housing Stabilization Services.\n",
"7. Community and Housing Place-Based Services.\n",
"8. Civil Legal Services.\n",
"9. Supportive Housing for Persons with HIV/AIDS.\n",
"10. Community, Coalition, and Cultural District Building.\n",
"11. Rental and Homeownership Counseling.\n",
"12. Capital Projects.\n",
"13. Housing Development Grants.\n",
"14. Creation of permanently affordable housing.\n",
"15. Foster healthy communities and neighborhoods.\n",
"16. Improve access to affordable housing.\n",
"17. Preserve affordable housing.\n",
"18. Promote self-sufficiency and protect rights.\n",
"19. Investments in capital and information technology, including urgent repairs and crucial projects across the park system and ADA needs.\n",
"20. Replacement of critical City systems like the Computer Aided Dispatch system and the Property Tax System.\n",
"21. Funding for the Sheriffs Jail Management System, JUSTIS Data Center of Excellence, Infrastructure Modernization, and digital accessibility.\n",
"22. New projects like the replacement of the legacy Legislative Management System and a new platform for the Empty Homes Tax.\n",
"23. Overdose prevention, treatment, and outreach programs funded by settlements with opioid manufacturers and distributors.\n",
"24. Savings targets in FY 2024-25 by reducing budgets for real estate expenses, software and technology licenses, and materials and supplies.\n",
"25. Lower-than-planned investments in citywide equipment, IT, and capital spending.\n",
"26. Use of reserves to help balance the budget while maintaining the bulk of the Citys reserves.\n"
]
}
],
"source": [
"response = base_query_engine.query(\n",
"    \"Give me all the programs that the mayor's budget includes\"\n",
")\n",
"print(str(response))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58dee948-240c-4523-9be1-4f770f549a7a",
"metadata": {},
"outputs": [],
"source": [
"print(len(response.source_nodes))\n",
"for node in response.source_nodes:\n",
"    print(\"---\")\n",
"    print(node.get_content())"
]
},
{
"cell_type": "markdown",
"id": "cfabc20e-bc8b-4e80-80ef-cfdee07a6e10",
"metadata": {},
"source": [
"## Build Custom Retriever\n",
"\n",
"Build a joint retriever that combines KG and vector search, deduplicating the merged results by node ID."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea75c09f-a0d1-48fe-b349-5e52ffe4df03",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.retrievers import BaseRetriever\n",
"from llama_index.core.schema import NodeWithScore\n",
"from typing import List\n",
"\n",
"\n",
"class CustomRetriever(BaseRetriever):\n",
"    \"\"\"Custom retriever that performs both KG vector search and direct vector search.\"\"\"\n",
"\n",
"    def __init__(self, kg_retriever, vector_retriever):\n",
"        self._kg_retriever = kg_retriever\n",
"        self._vector_retriever = vector_retriever\n",
"        # Initialize BaseRetriever state (e.g. the callback manager)\n",
"        super().__init__()\n",
"\n",
"    def _retrieve(self, query_bundle) -> List[NodeWithScore]:\n",
"        \"\"\"Retrieve nodes given query.\"\"\"\n",
"        kg_nodes = self._kg_retriever.retrieve(query_bundle)\n",
"        vector_nodes = self._vector_retriever.retrieve(query_bundle)\n",
"\n",
"        # Merge both result sets, keeping one entry per node ID\n",
"        unique_nodes = {n.node_id: n for n in kg_nodes}\n",
"        unique_nodes.update({n.node_id: n for n in vector_nodes})\n",
"        return list(unique_nodes.values())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3dae1e1-b3a3-4896-91e5-1de95ed32a0d",
"metadata": {},
"outputs": [],
"source": [
"custom_retriever = CustomRetriever(kg_retriever, base_retriever)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6c71c51-32a5-4420-917c-6d42b33c2c64",
"metadata": {},
"outputs": [],
"source": [
"nodes = custom_retriever.retrieve(\n",
"    \"Give me all the programs that the mayor's budget includes\"\n",
")\n",
"# len(nodes)"
]
},
{
"cell_type": "markdown",
"id": "389f6d0c-12b2-45e3-916a-f521befd6b91",
"metadata": {},
"source": [
"## Build Agent\n",
"\n",
"Now that we have the retriever, we can treat it as a RAG pipeline tool, and wrap it with an agent that can perform basic CoT reasoning and maintain conversation memory over time."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f7e7854-5d4e-49b8-baaa-8ad1cba053a0",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.tools import QueryEngineTool, ToolMetadata\n",
"from llama_index.core.query_engine import RetrieverQueryEngine\n",
"\n",
"kg_query_engine = RetrieverQueryEngine(custom_retriever)\n",
"kg_query_tool = QueryEngineTool(\n",
"    query_engine=kg_query_engine,\n",
"    metadata=ToolMetadata(\n",
"        name=\"query_tool\",\n",
"        description=\"Provides information about the 2023 SF Budget Report.\",\n",
"    ),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec7e8a74-2e9a-4e20-91a7-d997160af829",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.agent import FunctionCallingAgentWorker\n",
"\n",
"agent_worker = FunctionCallingAgentWorker.from_tools(\n",
"    [kg_query_tool],\n",
"    llm=llm,\n",
"    verbose=True,\n",
"    allow_parallel_tool_calls=False,\n",
")\n",
"agent = agent_worker.as_agent()"
]
},
{
"cell_type": "markdown",
"id": "d567f1e8-621d-4107-9e8f-75051c900ff3",
"metadata": {},
"source": [
"## Try out Queries\n",
"\n",
"Now that the agent is set up, let's try out some queries."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d0b3074-92bf-4e0e-84da-8c29d2eb3ac0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Added user message to memory: Give me all the programs that the mayor's budget includes\n",
"=== Calling Function ===\n",
"Calling function: query_tool with args: {\"input\": \"all programs included in the mayor's budget\"}\n",
"=== Function Output ===\n",
"The mayor's budget includes a variety of programs and investments:\n",
"\n",
"1. **CARE Court Implementation**: Funding for engagement and assessment staff, new City attorneys, increased capacity for treatment and housing, and outreach and educational efforts.\n",
"2. **Laguna Honda Hospital**: Over $3.5 million for staffing in key areas, including education and training, patient care experience, medication management, and leadership.\n",
"3. **Economic Recovery**: $24.4 million over two years for the Roadmap for Downtown San Franciscos Future and economic recovery across the city.\n",
"4. **Tax Relief and Incentives**: Changes to business taxes, including delaying tax increases for certain industries and offering discounts on office-based gross receipts tax for new offices.\n",
"5. **Small Business Support**: $5 million in direct grants to help small businesses stabilize, scale, and adapt.\n",
"6. **CalAIM Expansion**: Focus on expanded benefits for people at risk of long-term institutionalization and justice-system involved individuals.\n",
"7. **Mental Health Services Act**: $32.0 million in FY 2022-23 and $17.7 million ongoing for mental health services, including a $15.0 million pilot for a talk therapy telehealth program.\n",
"8. **Fee for Service Transition**: Expansion of the fee-for-service model to maximize revenues.\n",
"9. **Street Overdose Response Team (SORT)**: Efforts to reduce opioid-related deaths and provide support to overdose survivors.\n",
"10. **Capital and IT Investments**: $118 million over two years for urgent repairs and $53.9 million for vital technology projects.\n",
"11. **Dream Keeper Initiative**: Continued investment in San Franciscos Black communities, including support for small businesses, housing programs, and health services.\n",
"12. **Climate Action Plan**: $2 million over two years to support staff at the Department of Environment.\n",
"13. **Good Government Initiatives**: Funding for hiring and contracting reforms, ongoing and new IT projects, and capital maintenance and critical repairs.\n",
"14. **Minimum Compensation Ordinance (MCO)**: Investments to increase wages for the lowest-paid workers providing City services.\n",
"\n",
"These programs reflect a broad range of priorities, including health, economic recovery, small business support, mental health services, and infrastructure improvements.\n",
"=== LLM Response ===\n",
"The mayor's budget includes a comprehensive array of programs and investments, covering various sectors and priorities:\n",
"\n",
"1. **CARE Court Implementation**: Funding for engagement and assessment staff, new City attorneys, increased capacity for treatment and housing, and outreach and educational efforts.\n",
"2. **Laguna Honda Hospital**: Over $3.5 million for staffing in key areas, including education and training, patient care experience, medication management, and leadership.\n",
"3. **Economic Recovery**: $24.4 million over two years for the Roadmap for Downtown San Franciscos Future and economic recovery across the city.\n",
"4. **Tax Relief and Incentives**: Changes to business taxes, including delaying tax increases for certain industries and offering discounts on office-based gross receipts tax for new offices.\n",
"5. **Small Business Support**: $5 million in direct grants to help small businesses stabilize, scale, and adapt.\n",
"6. **CalAIM Expansion**: Focus on expanded benefits for people at risk of long-term institutionalization and justice-system involved individuals.\n",
"7. **Mental Health Services Act**: $32.0 million in FY 2022-23 and $17.7 million ongoing for mental health services, including a $15.0 million pilot for a talk therapy telehealth program.\n",
"8. **Fee for Service Transition**: Expansion of the fee-for-service model to maximize revenues.\n",
"9. **Street Overdose Response Team (SORT)**: Efforts to reduce opioid-related deaths and provide support to overdose survivors.\n",
"10. **Capital and IT Investments**: $118 million over two years for urgent repairs and $53.9 million for vital technology projects.\n",
"11. **Dream Keeper Initiative**: Continued investment in San Franciscos Black communities, including support for small businesses, housing programs, and health services.\n",
"12. **Climate Action Plan**: $2 million over two years to support staff at the Department of Environment.\n",
"13. **Good Government Initiatives**: Funding for hiring and contracting reforms, ongoing and new IT projects, and capital maintenance and critical repairs.\n",
"14. **Minimum Compensation Ordinance (MCO)**: Investments to increase wages for the lowest-paid workers providing City services.\n",
"\n",
"These programs reflect a broad range of priorities, including health, economic recovery, small business support, mental health services, and infrastructure improvements.\n"
]
}
],
"source": [
"response = agent.chat(\"Give me all the programs that the mayor's budget includes\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1d0c63e-817f-4615-a237-82d728917e4e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The mayor's budget includes a comprehensive array of programs and investments, covering various sectors and priorities:\n",
"\n",
"1. **CARE Court Implementation**: Funding for engagement and assessment staff, new City attorneys, increased capacity for treatment and housing, and outreach and educational efforts.\n",
"2. **Laguna Honda Hospital**: Over $3.5 million for staffing in key areas, including education and training, patient care experience, medication management, and leadership.\n",
"3. **Economic Recovery**: $24.4 million over two years for the Roadmap for Downtown San Franciscos Future and economic recovery across the city.\n",
"4. **Tax Relief and Incentives**: Changes to business taxes, including delaying tax increases for certain industries and offering discounts on office-based gross receipts tax for new offices.\n",
"5. **Small Business Support**: $5 million in direct grants to help small businesses stabilize, scale, and adapt.\n",
"6. **CalAIM Expansion**: Focus on expanded benefits for people at risk of long-term institutionalization and justice-system involved individuals.\n",
"7. **Mental Health Services Act**: $32.0 million in FY 2022-23 and $17.7 million ongoing for mental health services, including a $15.0 million pilot for a talk therapy telehealth program.\n",
"8. **Fee for Service Transition**: Expansion of the fee-for-service model to maximize revenues.\n",
"9. **Street Overdose Response Team (SORT)**: Efforts to reduce opioid-related deaths and provide support to overdose survivors.\n",
"10. **Capital and IT Investments**: $118 million over two years for urgent repairs and $53.9 million for vital technology projects.\n",
"11. **Dream Keeper Initiative**: Continued investment in San Franciscos Black communities, including support for small businesses, housing programs, and health services.\n",
"12. **Climate Action Plan**: $2 million over two years to support staff at the Department of Environment.\n",
"13. **Good Government Initiatives**: Funding for hiring and contracting reforms, ongoing and new IT projects, and capital maintenance and critical repairs.\n",
"14. **Minimum Compensation Ordinance (MCO)**: Investments to increase wages for the lowest-paid workers providing City services.\n",
"\n",
"These programs reflect a broad range of priorities, including health, economic recovery, small business support, mental health services, and infrastructure improvements.\n"
]
}
],
"source": [
"print(str(response))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7cb264f-70c8-4d02-a11b-2c8b8990af76",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Added user message to memory: Compare the budget for DPA Police Accountabilty from 2022-2023 to 2023-2024\n",
"=== Calling Function ===\n",
"Calling function: query_tool with args: {\"input\": \"DPA Police Accountability budget for 2022-2023\"}\n",
"=== Function Output ===\n",
"The DPA Police Accountability budget for 2022-2023 is $9,776,177.\n",
"=== Calling Function ===\n",
"Calling function: query_tool with args: {\"input\": \"DPA Police Accountability budget for 2023-2024\"}\n",
"=== Function Output ===\n",
"The budget for the Department of Police Accountability for the fiscal year 2023-2024 is $9,990,353.\n",
"=== LLM Response ===\n",
"The budget for the Department of Police Accountability (DPA) has increased from $9,776,177 in the fiscal year 2022-2023 to $9,990,353 in the fiscal year 2023-2024. This represents an increase of $214,176.\n",
"The budget for the Department of Police Accountability (DPA) has increased from $9,776,177 in the fiscal year 2022-2023 to $9,990,353 in the fiscal year 2023-2024. This represents an increase of $214,176.\n"
]
}
],
"source": [
"agent.reset()\n",
"response = agent.chat(\n",
" \"Compare the budget for DPA Police Accountabilty from 2022-2023 to 2023-2024\"\n",
")\n",
"print(str(response))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ece9079-da9c-4573-b249-2cab9d71fddf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Here are some facts extracted from the provided text:\n",
"\n",
"Dpa department of police accountability -> Has -> Total funded positions\n",
"\n",
"ORGANIZATIONAL STRUCTURE: POLICE ACCOUNTABILITY\n",
"\n",
"\n",
" Executive Director\n",
"\n",
"\n",
" Chief of Staff Chief of Investigations\n",
"\n",
"\n",
" Audit Operations Legal Investigations Mediation\n",
"\n",
"\n",
" Policy SB 1421\n",
"\n",
"\n",
"Department Total Budget Historical Comparison (Mayor's Proposed) Budget Year 2023-2024 and 2024-2025\n",
"\n",
"\n",
" Department Total Budget Historical Comparison\n",
" DPA Department Of Police AccountabilityTOTAL BUDGET HISTORICAL COMPARISON2022-2023 2023-2024 CHANGE 2024-2025 CHANGE\n",
" FUNDED POSITIONS ORIGINAL PROPOSED FROM PROPOSED FROM\n",
" 2022-2023BUDGET 2023-2024BUDGETChanges from2022-2023 2024-2025BUDGETChanges from2023-2024\n",
"Funded Positions Original Budget Proposed Budget 2022-2023 Proposed Budget 2023-2024\n",
" Total Funded 45.17 41.95 (3.22) 41.85 (0.10)\n",
" Non-Operating Positions (CAP/Other) (2.00) (1.00) 1.00 (1.00)\n",
" Net Operating Positions 43.17 40.95 (2.22) 40.85 (0.10)\n",
"\n",
"\n",
"Sources\n",
" Expenditure Recovery 128,000 332,795 204,795 332,795\n",
"\n",
"\n",
" General Fund 9,648,177 9,657,558 9,381 9,488,396 (169,162)\n",
" Sources Total 9,776,177 9,990,353 214,176 9,821,191 (169,162)\n",
"\n",
"\n",
"Uses - Operating Expenditures\n",
" Salaries 6,003,750 5,930,159 (73,591) 6,158,974 228,815\n",
" Mandatory Fringe Benefits 2,257,157 2,086,784 (170,373) 2,143,678 56,894\n",
" Non-Personnel Services 324,336 334,336 10,000 333,742 (594)\n",
" Materials & Supplies 34,918 34,918 31,426 (3,492)\n",
" Programmatic Projects 100,000 500,000 400,000 100,000 (400,000)\n",
" Services Of Other Depts 1,056,016 1,104,156 48,140 1,053,371 (50,785)\n",
" Uses Total 9,776,177 9,990,353 214,176 9,821,191 (169,162)\n",
"\n",
"\n",
"Uses - By Division Description\n",
" DPA Police Accountabilty 9,776,177 9,990,353 214,176 9,821,191 (169,162)\n",
" Uses by Division Total 9,776,177 9,990,353 214,176 9,821,191 (169,162)\n",
"\n",
"\n",
" POLICE ACCOUNTABILITY 249\n"
]
}
],
"source": [
"print(str(response.source_nodes[0].get_content()))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llama_parse",
"language": "python",
"name": "llama_parse"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}