video 1

2026-06-30 21:27:56 -04:00 · 2023-07-13 16:14:58 -06:00
parent e6365532e6
commit 5074558921
126 changed files with 11825 additions and 0 deletions
@@ -0,0 +1,339 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# LlamaIndex Bottoms-Up Development - LLMs and Prompts\n",
+    "This notebook walks through testing an LLM using the primary prompt templates used in llama-index."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openai\n",
+    "import os\n",
+    "\n",
+    "openai.api_key = \"YOUR_API_KEY\"\n",
+    "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\""
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "In this section, we load a test document, create an LLM, and copy prompts from llama-index to test with."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, let's load a quick document to test with. Right now, we will just load it as plain text, but we can do other operations later!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(\"./getting_started/starter_example.md\", \"r\") as f:\n",
+    "    text = f.read()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we create our LLM!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.llms import OpenAI\n",
+    "llm = OpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "LlamaIndex uses some simple templates under the hood for answering queries -- mainly a `text_qa_template` for obtaining initial answers, and a `refine_template` for refining an existing answer when all the text does not fit into one LLM call.\n",
+    "\n",
+    "Let's copy the default templates, and test out our LLM with a few questions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index import Prompt\n",
+    "\n",
+    "text_qa_template = Prompt(\n",
+    "    \"Context information is below.\\n\"\n",
+    "    \"---------------------\\n\"\n",
+    "    \"{context_str}\\n\"\n",
+    "    \"---------------------\\n\"\n",
+    "    \"Given the context information and not prior knowledge, \"\n",
+    "    \"answer the question: {query_str}\\n\"\n",
+    ")\n",
+    "\n",
+    "refine_template = Prompt(\n",
+    "    \"We have the opportunity to refine the original answer \"\n",
+    "    \"(only if needed) with some more context below.\\n\"\n",
+    "    \"------------\\n\"\n",
+    "    \"{context_msg}\\n\"\n",
+    "    \"------------\\n\"\n",
+    "    \"Given the new context, refine the original answer to better \"\n",
+    "    \"answer the question: {query_str}. \"\n",
+    "    \"If the context isn't useful, output the original answer again.\\n\"\n",
+    "    \"Original Answer: {existing_answer}\"\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, let's test a few questions!\n",
+    "\n",
+    "## Text QA Template Testing"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "To install LlamaIndex, you can follow the installation steps provided in the \"installation\" guide.\n"
+     ]
+    }
+   ],
+   "source": [
+    "question = \"How can I install llama-index?\"\n",
+    "prompt = text_qa_template.format(context_str=text, query_str=question)\n",
+    "response = llm.complete(prompt)\n",
+    "print(response.text)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "To create an index using LlamaIndex, you need to follow these steps:\n",
+      "\n",
+      "1. Download the LlamaIndex repository by cloning it from GitHub.\n",
+      "2. Navigate to the `examples/paul_graham_essay` folder in the cloned repository.\n",
+      "3. Create a new Python file and import the necessary modules: `VectorStoreIndex` and `SimpleDirectoryReader`.\n",
+      "4. Load the documents from the `data` folder using `SimpleDirectoryReader('data').load_data()`.\n",
+      "5. Build the index using `VectorStoreIndex.from_documents(documents)`.\n",
+      "6. To persist the index to disk, use `index.storage_context.persist()`.\n",
+      "7. To reload the index from disk, use the `StorageContext` and `load_index_from_storage` functions.\n",
+      "\n",
+      "Note: This answer assumes that you have already installed LlamaIndex and have the necessary dependencies.\n"
+     ]
+    }
+   ],
+   "source": [
+    "question = \"How do I create an index?\"\n",
+    "prompt = text_qa_template.format(context_str=text, query_str=question)\n",
+    "response = llm.complete(prompt)\n",
+    "print(response.text)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "```python\n",
+      "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
+      "\n",
+      "documents = SimpleDirectoryReader('data').load_data()\n",
+      "index = VectorStoreIndex.from_documents(documents)\n",
+      "```"
+     ]
+    }
+   ],
+   "source": [
+    "question = \"How do I create an index? Write your answer using only code.\"\n",
+    "prompt = text_qa_template.format(context_str=text, query_str=question)\n",
+    "response_gen = llm.stream_complete(prompt)\n",
+    "for response in response_gen:\n",
+    "    print(response.delta, end=\"\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Refine Template Testing"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "To create an index using LlamaIndex, follow these steps:\n",
+      "\n",
+      "```python\n",
+      "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
+      "\n",
+      "# Load the documents from the 'data' folder\n",
+      "documents = SimpleDirectoryReader('data').load_data()\n",
+      "\n",
+      "# Build the index\n",
+      "index = VectorStoreIndex.from_documents(documents)\n",
+      "\n",
+      "# Persist the index to disk\n",
+      "index.storage_context.persist()\n",
+      "\n",
+      "# Reload the index from disk\n",
+      "from llama_index import StorageContext, load_index_from_storage\n",
+      "\n",
+      "storage_context = StorageContext.from_defaults(persist_dir=\"./storage\")\n",
+      "index = load_index_from_storage(storage_context)\n",
+      "```\n",
+      "\n",
+      "Make sure you have installed LlamaIndex and have the necessary dependencies.\n"
+     ]
+    }
+   ],
+   "source": [
+    "question = \"How do I create an index? Write your answer using only code.\"\n",
+    "existing_answer = \"\"\"To create an index using LlamaIndex, you need to follow these steps:\n",
+    "\n",
+    "1. Download the LlamaIndex repository by cloning it from GitHub.\n",
+    "2. Navigate to the `examples/paul_graham_essay` folder in the cloned repository.\n",
+    "3. Create a new Python file and import the necessary modules: `VectorStoreIndex` and `SimpleDirectoryReader`.\n",
+    "4. Load the documents from the `data` folder using `SimpleDirectoryReader('data').load_data()`.\n",
+    "5. Build the index using `VectorStoreIndex.from_documents(documents)`.\n",
+    "6. To persist the index to disk, use `index.storage_context.persist()`.\n",
+    "7. To reload the index from disk, use the `StorageContext` and `load_index_from_storage` functions.\n",
+    "\n",
+    "Note: This answer assumes that you have already installed LlamaIndex and have the necessary dependencies.\"\"\"\n",
+    "prompt = refine_template.format(context_msg=text, query_str=question, existing_answer=existing_answer)\n",
+    "response = llm.complete(prompt)\n",
+    "print(response.text)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Chat Example\n",
+    "The LLM also has a `chat` method that takes in a list of messages, to simulate a chat session. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "assistant: To create an index, you will need to follow these general steps:\n",
+      "\n",
+      "1. Determine the purpose and scope of your index: Decide what information you want to include in your index and what it will be used for. This will help you determine the structure and content of your index.\n",
+      "\n",
+      "2. Identify the items to be indexed: Determine the specific items or topics that you want to include in your index. For example, if you are creating an index for a book, you might want to index chapters, sections, and important concepts.\n",
+      "\n",
+      "3. Create a list of index terms: Identify the key terms or phrases that will be used to reference each item in your index. These terms should be concise and descriptive.\n",
+      "\n",
+      "4. Organize the index terms: Determine the hierarchical structure of your index. You can use headings, subheadings, and indentation to create a logical and organized structure.\n",
+      "\n",
+      "5. Assign page numbers or locations: For each index term, identify the page number or location where the item can be found. This will help users quickly locate the information they are looking for.\n",
+      "\n",
+      "6. Format the index: Use a consistent and clear formatting style for your index. You can use software tools like Microsoft Word or Adobe InDesign to create a professional-looking index.\n",
+      "\n",
+      "7. Review and revise: Once you have created your index, review it carefully to ensure accuracy and completeness. Make any necessary revisions or updates before finalizing your index.\n",
+      "\n",
+      "Remember, creating an index can be a time-consuming process, so it's important to plan and allocate enough time to complete it accurately.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from llama_index.llms import ChatMessage\n",
+    "\n",
+    "chat_history = [\n",
+    "    ChatMessage(role=\"system\", content=\"You are a helpful QA chatbot that can answer questions about llama-index.\"),\n",
+    "    ChatMessage(role=\"user\", content=\"How do I create an index?\"),\n",
+    "]\n",
+    "\n",
+    "response = llm.chat(chat_history)\n",
+    "print(response.message)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "In this notebook, we covered the low-level LLM API and tested some basic prompts with our documentation data."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.6"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,54 @@
+# Documentation Guide
+
+## A guide for docs contributors
+
+The `docs` directory contains the Sphinx source for the LlamaIndex docs. Visit
+https://gpt-index.readthedocs.io/ to read the full documentation.
+
+This guide is for anyone interested in running the LlamaIndex documentation locally,
+making changes to it, and contributing those changes. LlamaIndex is made by the thriving
+community behind it, and you're always welcome to contribute to the project and the
+documentation.
+
+## Build Docs
+
+If you haven't already, clone the LlamaIndex GitHub repo to a local directory:
+
+```bash
+git clone https://github.com/jerryjliu/llama_index.git && cd llama_index
+```
+
+Install all dependencies required for building docs (mainly `sphinx` and its extensions):
+
+```bash
+pip install -r docs/requirements.txt
+```
+
+Build the sphinx docs:
+
+```bash
+cd docs
+make html
+```
+
+The HTML files are now generated under the `docs/_build/html` directory. You can preview
+them locally with the following command:
+
+```bash
+python -m http.server 8000 -d _build/html
+```
+
+Then open your browser at http://localhost:8000/ to view the generated docs.
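+
+The same preview can also be started from Python. The sketch below is illustrative only: `make_docs_server` is a hypothetical helper name, not something shipped in the LlamaIndex repo, and it assumes the build output lives in `_build/html` relative to the current directory.
+
+```python
+from functools import partial
+from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
+
+
+def make_docs_server(directory: str = "_build/html", port: int = 8000) -> ThreadingHTTPServer:
+    """Hypothetical helper: build a local static file server for the docs."""
+    # Pin the request handler to the build directory instead of the cwd.
+    handler = partial(SimpleHTTPRequestHandler, directory=directory)
+    # Bind to loopback only, so the preview is not exposed on the network.
+    return ThreadingHTTPServer(("127.0.0.1", port), handler)
+
+
+# Usage (blocks until interrupted with Ctrl+C):
+# make_docs_server().serve_forever()
+```
+
+Binding to `127.0.0.1` keeps the preview reachable only from your own machine, at http://127.0.0.1:8000/.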
+
+
+## Watch Docs
+
+We recommend using sphinx-autobuild during development. It provides a live-reloading
+server that rebuilds the documentation and refreshes any open pages automatically when
+changes are saved. This shortens the feedback loop when writing documentation.
+
+Run the following command from the LlamaIndex project's root directory:
+```bash
+make watch-docs
+```
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@@ -0,0 +1,103 @@
+# App Showcase
+
+Here is a sample of some of the incredible applications and tools built on top of LlamaIndex! 
+
+###### Meru - Dense Data Retrieval API
+
+Hosted API service. Includes a "Dense Data Retrieval" API built on top of LlamaIndex where users can upload their documents and query them.
+[[Website]](https://www.usemeru.com/densedataretrieval)
+
+###### Algovera
+
+Build AI workflows using building blocks. Many workflows built on top of LlamaIndex.
+
+[[Website]](https://app.algovera.ai/workflows).
+
+###### ChatGPT LlamaIndex
+
+Interface that allows users to upload long docs and chat with the bot.
+[[Tweet thread]](https://twitter.com/s_jobs6/status/1618346125697875968?s=20&t=RJhQu2mD0-zZNGfq65xodA)
+
+###### AgentHQ
+
+A web tool to build agents, interacting with LlamaIndex data structures.
+[[Website]](https://app.agent-hq.io/)
+
+
+###### PapersGPT
+
+Feed any of the following content into GPT to give it deep customized knowledge:
+- Scientific Papers
+- Substack Articles
+- Podcasts
+- GitHub Repos
+
+and more.
+
+[[Tweet thread]](https://twitter.com/thejessezhang/status/1615390646763945991?s=20&t=eHvhmIaaaoYFyPSzDRNGtA)
+[[Website]](https://jessezhang.org/llmdemo)
+
+###### VideoQues + DocsQues
+
+**VideoQues**: A tool that answers your queries on YouTube videos. 
+[[LinkedIn post here]](https://www.linkedin.com/posts/ravidesetty_ai-ml-dl-activity-7020599110953050112-EJA_/?utm_source=share&utm_medium=member_desktop).
+
+**DocsQues**: A tool that answers your questions on longer documents (including .pdfs!)
+[[LinkedIn post here]](https://www.linkedin.com/posts/ravidesetty_artificialintelligence-machinelearning-recruiters-activity-7016972785293946880-rhKC?utm_source=share&utm_medium=member_desktop).
+
+###### PaperBrain
+
+A platform to access/understand research papers.
+
+[[Tweet thread]](https://twitter.com/mdarshad1000/status/1619824637898264578?s=20&t=eHvhmIaaaoYFyPSzDRNGtA).
+
+
+###### CACTUS
+Contextual search on top of LinkedIn search results. 
+[[LinkedIn post here]](https://www.linkedin.com/posts/mathewteoh_chromeextension-chatgpt-python-activity-7019362515566403584-ryqW?utm_source=share&utm_medium=member_desktop).
+
+
+###### Personal Note Chatbot
+A chatbot that can answer questions over a directory of Obsidian notes. 
+[[Tweet thread]](https://twitter.com/Sarah_A_Bentley/status/1611069576099336207?s=20&t=IjPLK3msACQjEBYxJJxj4w).
+
+
+###### RHOBH AMA
+
+Ask questions about the Real Housewives of Beverly Hills.
+[[Tweet thread]](https://twitter.com/YourBuddyConner/status/1616504644439789568?s=20&t=bCHa3im7mjoIXLuKo5PttQ)
+[[Website]](https://realhousewivesai.com/)
+
+###### Mynd
+
+A journaling app that uses AI to uncover insights and patterns over time.
+[[Website]](https://mynd.so)
+
+###### CoFounder
+The First AI Co-Founder for Your Start-up 🙌
+
+[CoFounder](https://co-founder.ai?utm_source=llama-index&utm_medium=gallary&utm_campaign=alpha) is a platform to revolutionize the start-up ecosystem by providing founders with unparalleled tools, resources, and support. We are changing how founders build their companies from 0-1—productizing the accelerator/incubator programs using AI.
+
+Current features:
+
+* AI Investor Matching and Introduction and Tracking
+* AI Pitch Deck creation
+* Real-time Pitch Deck practice/feedback
+* Automatic Competitive Analysis / Watchlist
+* More coming soon...
+
+[[Website]](https://co-founder.ai?utm_source=llama-index&utm_medium=gallary&utm_campaign=alpha)
+
+###### Al-X by OpenExO
+
+Your Digital Transformation Co-Pilot
+[[Website]](https://chat.openexo.com)
+
+###### AnySummary
+
+Summarize any document, audio or video with AI
+[[Website]](https://anysummary.app)
+
+###### Blackmaria
+
+Python package for webscraping in Natural language.
+[[Tweet thread]](https://twitter.com/obonigwe1/status/1640080422661943298?t=aftqisb4vaudwrgwah_1oa&s=19)
+[[Github]](https://github.com/Smyja/blackmaria)
@@ -0,0 +1,16 @@
+# Integrations
+
+LlamaIndex has a number of community integrations, from vector stores, to prompt trackers, tracers, and more!
+
+```{toctree}
+---
+maxdepth: 1
+---
+integrations/graphsignal.md
+integrations/guidance.md
+integrations/trulens.md
+integrations/chatgpt_plugins.md
+integrations/using_with_langchain.md
+integrations/graph_stores.md
+integrations/vector_stores.md
+```
@@ -0,0 +1,129 @@
+# ChatGPT Plugin Integrations
+
+**NOTE**: This is a work in progress. Stay tuned for more updates on this front!
+
+## ChatGPT Retrieval Plugin Integrations
+
+The [OpenAI ChatGPT Retrieval Plugin](https://github.com/openai/chatgpt-retrieval-plugin)
+offers a centralized API specification for any document storage system to interact
+with ChatGPT. Because the plugin can be deployed on any service, more document
+retrieval services are likely to implement this spec. Doing so lets them interact
+not only with ChatGPT, but also with any LLM toolkit that uses a retrieval service.
+
+LlamaIndex provides a variety of integrations with the ChatGPT Retrieval Plugin.
+
+### Loading Data from LlamaHub into the ChatGPT Retrieval Plugin
+
+The ChatGPT Retrieval Plugin defines an `/upsert` endpoint for users to load
+documents. This offers a natural integration point with LlamaHub, which offers
+over 65 data loaders for various APIs and document formats.
+
+Here is a sample code snippet showing how to load a document from LlamaHub
+into the JSON format that `/upsert` expects:
+
+```python
+from llama_index import download_loader, Document
+from typing import Dict, List
+import json
+
+# download loader, load documents
+SimpleWebPageReader = download_loader("SimpleWebPageReader")
+loader = SimpleWebPageReader(html_to_text=True)
+url = "http://www.paulgraham.com/worked.html"
+documents = loader.load_data(urls=[url])
+
+# Convert LlamaIndex Documents to JSON format
+def dump_docs_to_json(documents: List[Document], out_path: str) -> None:
+    """Convert LlamaIndex Documents to JSON format and save it."""
+    result_json = []
+    for doc in documents:
+        cur_dict = {
+            "text": doc.get_text(),
+            "id": doc.get_doc_id(),
+            # NOTE: feel free to customize the other fields as you wish
+            # fields taken from https://github.com/openai/chatgpt-retrieval-plugin/tree/main/scripts/process_json#usage
+            # "source": ...,
+            # "source_id": ...,
+            # "url": url,
+            # "created_at": ...,
+            # "author": "Paul Graham",
+        }
+        result_json.append(cur_dict)
+    
+    # Use a context manager so the output file is flushed and closed
+    with open(out_path, "w") as f:
+        json.dump(result_json, f)
+
+```
+
+For more details, check out the [full example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatgpt_plugin/ChatGPT_Retrieval_Plugin_Upload.ipynb).
+
+### ChatGPT Retrieval Plugin Data Loader
+
+The ChatGPT Retrieval Plugin data loader [can be accessed on LlamaHub](https://llamahub.ai/l/chatgpt_plugin).
+
+It allows you to easily load data from any docstore that implements the plugin API into a LlamaIndex data structure.
+
+Example code:
+
+```python
+from llama_index.readers import ChatGPTRetrievalPluginReader
+import os
+
+# load documents
+bearer_token = os.getenv("BEARER_TOKEN")
+reader = ChatGPTRetrievalPluginReader(
+    endpoint_url="http://localhost:8000",
+    bearer_token=bearer_token
+)
+documents = reader.load_data("What did the author do growing up?")
+
+# build and query index
+from llama_index import ListIndex
+index = ListIndex(documents)
+# set Logging to DEBUG for more detailed outputs
+query_engine = index.as_query_engine(
+    response_mode="compact"
+)
+response = query_engine.query(
+    "Summarize the retrieved content and describe what the author did growing up",
+) 
+
+```
+For more details, check out the [full example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatgpt_plugin/ChatGPTRetrievalPluginReaderDemo.ipynb).
+
+### ChatGPT Retrieval Plugin Index
+
+The ChatGPT Retrieval Plugin Index allows you to easily build a vector index over any documents, with storage backed by a document store implementing the 
+ChatGPT endpoint.
+
+Note: this index is a vector index, allowing top-k retrieval.
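+
+To make "top-k retrieval" concrete, here is a minimal, self-contained sketch of the idea using cosine similarity over toy vectors. It is not the plugin's or LlamaIndex's implementation: the function names and the example vectors are made up for illustration, and a real vector store would use learned embeddings and an optimized search.
+
+```python
+import math
+from typing import Dict, List, Tuple
+
+
+def cosine(a: List[float], b: List[float]) -> float:
+    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+
+def top_k(query: List[float], vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
+    """Return the k (doc_id, score) pairs most similar to the query, best first."""
+    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
+    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
+
+
+# Toy 2-d "embeddings", invented for this example
+toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
+print([doc_id for doc_id, _ in top_k([1.0, 0.0], toy_vectors, k=2)])  # prints ['a', 'c']
+```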
+
+Example code:
+
+```python
+from llama_index.indices.vector_store import ChatGPTRetrievalPluginIndex
+from llama_index import SimpleDirectoryReader
+import os
+
+# load documents
+documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
+
+# build index
+bearer_token = os.getenv("BEARER_TOKEN")
+# initialize without metadata filter
+index = ChatGPTRetrievalPluginIndex(
+    documents, 
+    endpoint_url="http://localhost:8000",
+    bearer_token=bearer_token,
+)
+
+# query index
+query_engine = index.as_query_engine(
+    similarity_top_k=3,
+    response_mode="compact",
+)
+response = query_engine.query("What did the author do growing up?")
+
+```
+
+For more details, check out the [full example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatgpt_plugin/ChatGPTRetrievalPluginIndexDemo.ipynb).
@@ -0,0 +1,15 @@
+# Using Graph Stores
+
+## `NebulaGraphStore`
+
+We support a `NebulaGraphStore` integration, for persisting graphs directly in Nebula! Furthermore, you can generate cypher queries and return natural language responses for your Nebula graphs using the `KnowledgeGraphQueryEngine`.
+
+See the associated guides below:
+
+```{toctree}
+---
+maxdepth: 1
+---
+Nebula Graph Store </examples/index_structs/knowledge_graph/NebulaGraphKGIndexDemo.ipynb>
+Knowledge Graph Query Engine </examples/query_engine/knowledge_graph_query_engine.ipynb>
+```
@@ -0,0 +1,46 @@
+# Tracing with Graphsignal
+
+[Graphsignal](https://graphsignal.com/) provides observability for AI agents and LLM-powered applications. It helps developers ensure AI applications run as expected and users have the best experience.
+
+Graphsignal **automatically** traces and monitors LlamaIndex. Traces and metrics provide execution details for query, retrieval, and index operations. These insights include **prompts**, **completions**, **embedding statistics**, **retrieved nodes**, **parameters**, **latency**, and **exceptions**.
+
+When OpenAI APIs are used, Graphsignal provides additional insights such as **token counts** and **costs** per deployment, model or any context.
+
+
+### Installation and Setup
+
+Adding the [Graphsignal tracer](https://github.com/graphsignal/graphsignal-python) is simple. Install and configure it:
+
+```sh
+pip install graphsignal
+```
+
+```python
+import graphsignal
+
+# Provide an API key directly or via GRAPHSIGNAL_API_KEY environment variable
+graphsignal.configure(api_key='my-api-key', deployment='my-llama-index-app-prod')
+```
+
+You can get an API key [here](https://app.graphsignal.com/).
+
+See the [Quick Start guide](https://graphsignal.com/docs/guides/quick-start/), [Integration guide](https://graphsignal.com/docs/integrations/llama-index/), and an [example app](https://github.com/graphsignal/examples/blob/main/llama-index-app/main.py) for more information.
+
+
+### Tracing Other Functions
+
+To additionally trace any function or code, you can use a decorator or a context manager:
+
+```python
+with graphsignal.start_trace('load-external-data'):
+    reader.load_data()
+```
+
+See [Python API Reference](https://graphsignal.com/docs/reference/python-api/) for complete instructions.
+
+
+### Useful Links
+
+* [Tracing and Monitoring LlamaIndex Applications](https://graphsignal.com/blog/tracing-and-monitoring-llama-index-applications/)
+* [Monitor OpenAI API Latency, Tokens, Rate Limits, and More](https://graphsignal.com/blog/monitor-open-ai-api-latency-tokens-rate-limits-and-more/)
+* [OpenAI API Cost Tracking: Analyzing Expenses by Model, Deployment, and Context](https://graphsignal.com/blog/open-ai-api-cost-tracking-analyzing-expenses-by-model-deployment-and-context/)
@@ -0,0 +1,90 @@
+
+# Guidance
+
+[Guidance](https://github.com/microsoft/guidance) is a guidance language for controlling large language models developed by Microsoft.
+
+Guidance programs allow you to interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text.
+
+## Structured Output
+One particularly exciting aspect of guidance is the ability to output structured objects (think JSON following a specific schema, or a pydantic object). Instead of just "suggesting" the desired output structure to the LLM, guidance can actually "force" the LLM output to follow the desired schema. This allows the LLM to focus on the content rather than the syntax, and eliminates output parsing issues caused by malformed structure.
+
+This is particularly powerful for weaker LLMs, which may be smaller in parameter count and not trained on enough source code data to reliably produce well-formed, hierarchical structured output.
+
+### Creating a guidance program to generate pydantic objects 
+In LlamaIndex, we provide an initial integration with guidance that makes it easy to generate structured output (more specifically, pydantic objects).
+
+For example, if we want to generate an album of songs, with the following schema:
+
+```python
+from typing import List
+
+from pydantic import BaseModel
+
+
+class Song(BaseModel):
+    title: str
+    length_seconds: int
+    
+class Album(BaseModel):
+    name: str
+    artist: str
+    songs: List[Song]
+```
+
+It's as simple as creating a `GuidancePydanticProgram`, specifying our desired pydantic class `Album`, 
+and supplying a suitable prompt template.
+
+> Note: guidance uses handlebars-style templates, which use double braces for variable substitution and single braces for literal braces. This is the opposite convention of Python format strings.
+
+> Note: We provide a utility function `from llama_index.prompts.guidance_utils import convert_to_handlebars` that can convert a Python format string style template to a guidance handlebars-style template.
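+
+To illustrate the difference between the two conventions, here is a rough sketch of such a conversion using only the standard library. `to_handlebars` is a hypothetical stand-in written for this guide, not the actual implementation of `convert_to_handlebars`, and it ignores format specs and conversions such as `{x:>10}` or `{x!r}`.
+
+```python
+from string import Formatter
+
+
+def to_handlebars(template: str) -> str:
+    """Illustrative only: rewrite a Python format string in handlebars style."""
+    parts = []
+    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
+    # and already unescapes "{{" and "}}" into single literal braces.
+    for literal, field, _spec, _conversion in Formatter().parse(template):
+        parts.append(literal)
+        if field is not None:
+            # A Python placeholder {name} becomes a handlebars variable {{name}}
+            parts.append("{{" + field + "}}")
+    return "".join(parts)
+
+
+print(to_handlebars("Inspired by the movie {movie_name}"))
+# prints: Inspired by the movie {{movie_name}}
+```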
+
+
+```python
+program = GuidancePydanticProgram(
+    output_cls=Album, 
+    prompt_template_str="Generate an example album, with an artist and a list of songs. Using the movie {{movie_name}} as inspiration",
+    guidance_llm=OpenAI('text-davinci-003'),
+    verbose=True,
+)
+
+```
+
+Now we can run the program by calling it with additional user input. 
+Here let's go for something spooky and create an album inspired by The Shining.
+```python
+output = program(movie_name='The Shining')
+```
+
+We have our pydantic object:
+```python
+Album(name='The Shining', artist='Jack Torrance', songs=[Song(title='All Work and No Play', length_seconds=180), Song(title='The Overlook Hotel', length_seconds=240), Song(title='The Shining', length_seconds=210)])
+```
+
+You can play with [this notebook](/examples/output_parsing/guidance_pydantic_program.ipynb) for more details.
+
+### Using guidance to improve the robustness of our sub-question query engine
+LlamaIndex provides a toolkit of advanced query engines for tackling different use-cases.
+Several rely on structured output in intermediate steps.
+We can use guidance to improve the robustness of these query engines by making sure the
+intermediate response has the expected structure (so that it can be parsed correctly into a structured object).
+
+As an example, we implement a `GuidanceQuestionGenerator` that can be plugged into a `SubQuestionQueryEngine` to make it more robust than using the default setting.
+```python
+from llama_index.query_engine import SubQuestionQueryEngine
+from llama_index.question_gen.guidance_generator import GuidanceQuestionGenerator
+from guidance.llms import OpenAI as GuidanceOpenAI
+
+# define guidance based question generator
+question_gen = GuidanceQuestionGenerator.from_defaults(guidance_llm=GuidanceOpenAI('text-davinci-003'), verbose=False)
+
+# define query engine tools
+query_engine_tools = ...
+
+# construct sub-question query engine
+s_engine = SubQuestionQueryEngine.from_defaults(
+    question_gen=question_gen,  # use guidance based question_gen defined above
+    query_engine_tools=query_engine_tools, 
+)
+```
+
+See [this notebook](/examples/output_parsing/guidance_sub_question.ipynb) for more details.
+
+
+
+
+
+
@@ -0,0 +1,35 @@
+# Evaluating and Tracking with TruLens
+
+This page covers how to use [TruLens](https://trulens.org) to evaluate and track LLM apps built on LlamaIndex.
+
+## What is TruLens?
+
+TruLens is an [open-source](https://github.com/truera/trulens) package that provides instrumentation and evaluation tools for large language model (LLM) based applications. This includes feedback function evaluations of relevance, sentiment and more, plus in-depth tracing including cost and latency.
+
+![TruLens Architecture](https://github.com/truera/trulens/blob/main/docs/Assets/image/TruLens_Architecture.png)
+
+As you iterate on new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up. You'll also be able to view evaluations at a record level, and explore the app metadata for each record.
+
+### Installation and Setup
+
+Adding TruLens is simple. Just install it from PyPI!
+
+```sh
+pip install trulens-eval
+```
+
+```python
+from trulens_eval import TruLlama
+
+```
+
+## Try it out!
+
+[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.4.0/trulens_eval/examples/frameworks/llama_index/llama_index_quickstart.ipynb)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/google-colab/trulens_eval/examples/colab/quickstarts/llama_index_quickstart_colab.ipynb)
+
+## Read more
+
+* [Build and Evaluate LLM Apps with LlamaIndex and TruLens](https://medium.com/llamaindex-blog/build-and-evaluate-llm-apps-with-llamaindex-and-trulens-6749e030d83c)
+
+* [trulens.org](https://www.trulens.org/)
@@ -0,0 +1,69 @@
+# Using with Langchain 🦜🔗
+
+LlamaIndex provides both Tool abstractions for a Langchain agent and a memory module.
+
+The API reference for the Tool abstractions and memory modules is [here](/api_reference/langchain_integrations/base.rst).
+
+### Use any data loader as a Langchain Tool
+
+LlamaIndex allows you to use any data loader within the LlamaIndex core repo or in [LlamaHub](https://llamahub.ai/) as an "on-demand" data query Tool within a LangChain agent.
+
+The Tool will 1) load data using the data loader, 2) index the data, and 3) query the data and return the response in an ad-hoc manner.
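+
+The three steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not the actual Tool implementation: `fake_loader`, `build_index`, and `on_demand_query` are hypothetical names, and simple keyword overlap stands in for embedding-based retrieval.
+
```python
from typing import Callable, Dict, List, Set


def fake_loader(topic: str) -> List[str]:
    """Hypothetical data loader: returns raw text chunks for a topic."""
    corpus = {
        "berlin": [
            "Berlin is the capital of Germany.",
            "Berlin has many museums and galleries.",
        ],
    }
    return corpus.get(topic, [])


def build_index(chunks: List[str]) -> Dict[str, Set[str]]:
    """Toy index: map each chunk to its set of lowercase words."""
    return {chunk: set(chunk.lower().rstrip(".").split()) for chunk in chunks}


def on_demand_query(loader: Callable[[str], List[str]], topic: str, query: str) -> str:
    """Load, index, and query in a single call."""
    chunks = loader(topic)        # 1) load
    index = build_index(chunks)   # 2) index
    query_words = set(query.lower().rstrip("?").split())
    # 3) query: return the chunk that shares the most words with the query
    return max(index, key=lambda chunk: len(index[chunk] & query_words), default="")


answer = on_demand_query(fake_loader, "berlin", "What is the capital of Germany?")
```
+
+Nothing is kept between calls in this sketch: the index is rebuilt on every request, which is what "ad-hoc" means here.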
+
+**Resources**
+- [OnDemandLoaderTool Tutorial](/examples/tools/OnDemandLoaderTool.ipynb)
+
+
+### Use a query engine as a Langchain Tool
+LlamaIndex provides Tool abstractions so that you can use a LlamaIndex query engine along with a Langchain agent. 
+
+For instance, you can choose to create a "Tool" from a `QueryEngine` directly as follows:
+
+```python
+from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool
+
+tool_config = IndexToolConfig(
+    query_engine=query_engine, 
+    name="Vector Index",
+    description="useful for when you want to answer queries about X",
+    tool_kwargs={"return_direct": True}
+)
+
+tool = LlamaIndexTool.from_tool_config(tool_config)
+
+```
+
+You can also choose to provide a `LlamaToolkit`:
+
+```python
+from llama_index.langchain_helpers.agents import LlamaToolkit
+
+toolkit = LlamaToolkit(
+    index_configs=index_configs,
+)
+```
+
+Such a toolkit can be used to create a downstream Langchain-based chat agent through
+our `create_llama_agent` and `create_llama_chat_agent` functions:
+
+```python
+from llama_index.langchain_helpers.agents import create_llama_chat_agent
+
+agent_chain = create_llama_chat_agent(
+    toolkit,
+    llm,
+    memory=memory,
+    verbose=True
+)
+
+agent_chain.run(input="Query about X")
+```
+
+You can take a look at [the full tutorial notebook here](https://github.com/jerryjliu/llama_index/blob/main/examples/chatbot/Chatbot_SEC.ipynb).
+
+
+### Llama Demo Notebook: Tool + Memory module
+
+We provide another demo notebook showing how you can build a chat agent with the following components.
+- Using LlamaIndex as a generic callable tool with a Langchain agent
+- Using LlamaIndex as a memory module; this allows you to insert arbitrary amounts of conversation history with a Langchain chatbot!
+
+Please see the [notebook here](https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb).
@@ -0,0 +1,459 @@
+# Using Vector Stores
+
+LlamaIndex offers multiple integration points with vector stores / vector databases:
+
+1. LlamaIndex can use a vector store itself as an index. Like any other index, this index can store documents and be used to answer queries.
+2. LlamaIndex can load data from vector stores, similar to any other data connector. This data can then be used within LlamaIndex data structures.
+
+(vector-store-index)=
+
+## Using a Vector Store as an Index
+
+LlamaIndex also supports different vector stores
+as the storage backend for `VectorStoreIndex`.
+
+- Chroma (`ChromaVectorStore`). [Installation](https://docs.trychroma.com/getting-started).
+- DeepLake (`DeepLakeVectorStore`). [Installation](https://docs.deeplake.ai/en/latest/Installation.html).
+- Qdrant (`QdrantVectorStore`). [Installation](https://qdrant.tech/documentation/install/). [Python Client](https://qdrant.tech/documentation/install/#python-client).
+- Weaviate (`WeaviateVectorStore`). [Installation](https://weaviate.io/developers/weaviate/installation). [Python Client](https://weaviate.io/developers/weaviate/client-libraries/python).
+- Pinecone (`PineconeVectorStore`). [Installation/Quickstart](https://docs.pinecone.io/docs/quickstart).
+- Faiss (`FaissVectorStore`). [Installation](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md).
+- Milvus (`MilvusVectorStore`). [Installation](https://milvus.io/docs)
+- Zilliz (`MilvusVectorStore`). [Quickstart](https://zilliz.com/doc/quick_start)
+- MyScale (`MyScaleVectorStore`). [Quickstart](https://docs.myscale.com/en/quickstart/). [Installation/Python Client](https://docs.myscale.com/en/python-client/).
+- Supabase (`SupabaseVectorStore`). [Quickstart](https://supabase.github.io/vecs/api/).
+- DocArray (`DocArrayHnswVectorStore`, `DocArrayInMemoryVectorStore`). [Installation/Python Client](https://github.com/docarray/docarray#installation).
+- MongoDB Atlas (`MongoDBAtlasVectorSearch`). [Installation/Quickstart](https://www.mongodb.com/atlas/database).
+- Redis (`RedisVectorStore`). [Installation](https://redis.io/docs/getting-started/installation/).
+
+A detailed API reference is [found here](/api_reference/indices/vector_store.rst).
+
+Similar to any other index within LlamaIndex (tree, keyword table, list), `VectorStoreIndex` can be constructed upon any collection
+of documents. We use the vector store within the index to store embeddings for the input text chunks.
+
+Once constructed, the index can be used for querying.
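+
+To make the role of the vector store concrete, here is a minimal sketch of storing chunk embeddings and querying them by cosine similarity. `ToyVectorStore` is a hypothetical class written for this illustration (it is not `SimpleVectorStore`), and the three-dimensional embeddings are hand-written placeholders rather than real model outputs.
+
```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store: maps chunk ids to embeddings."""

    def __init__(self) -> None:
        self._embeddings: Dict[str, List[float]] = {}

    def add(self, chunk_id: str, embedding: List[float]) -> None:
        self._embeddings[chunk_id] = embedding

    def query(self, query_embedding: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
        """Return the top_k (chunk_id, score) pairs, best match first."""
        scored = [
            (chunk_id, cosine_similarity(query_embedding, emb))
            for chunk_id, emb in self._embeddings.items()
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


store = ToyVectorStore()
store.add("chunk-about-childhood", [0.9, 0.1, 0.0])  # placeholder embedding
store.add("chunk-about-startups", [0.0, 0.2, 0.9])   # placeholder embedding
top = store.query([1.0, 0.0, 0.0], top_k=1)
```
+
+In the sketch, `top` holds the id of the chunk whose embedding points in the direction closest to the query embedding; a real index would then pass the matching text chunks to the LLM.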
+
+**Default Vector Store Index Construction/Querying**
+
+By default, `VectorStoreIndex` uses an in-memory `SimpleVectorStore`
+that's initialized as part of the default storage context.
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
+
+# Load documents and build index
+documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
+index = VectorStoreIndex.from_documents(documents)
+
+# Query index
+query_engine = index.as_query_engine()
+response = query_engine.query("What did the author do growing up?")
+
+```
+
+**Custom Vector Store Index Construction/Querying**
+
+We can query over a custom vector store as follows:
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
+from llama_index.vector_stores import DeepLakeVectorStore
+
+# construct vector store and customize storage context
+storage_context = StorageContext.from_defaults(
+    vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
+)
+
+# Load documents and build index
+documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
+index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
+
+# Query index
+query_engine = index.as_query_engine()
+response = query_engine.query("What did the author do growing up?")
+```
+
+Below we show more examples of how to construct various vector stores we support.
+
+**Redis**
+
+First, start Redis Stack (or get a URL from your Redis provider):
+
+```bash
+docker run --name redis-vecdb -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
+```
+
+Then connect and use Redis as a vector database with LlamaIndex:
+
+```python
+from llama_index.vector_stores import RedisVectorStore
+vector_store = RedisVectorStore(
+    index_name="llm-project",
+    redis_url="redis://localhost:6379",
+    overwrite=True
+)
+```
+
+This can be used with the `VectorStoreIndex` to provide a query interface for retrieval, querying, deleting, persisting the index, and more.
+
+**DeepLake**
+
+```python
+import os
+import getpass
+from llama_index.vector_stores import DeepLakeVectorStore
+
+# prompt for secrets instead of hard-coding them
+os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY: ")
+os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("ACTIVELOOP_TOKEN: ")
+dataset_path = "hub://adilkhan/paul_graham_essay"
+
+# construct vector store
+vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
+```
+
+**Faiss**
+
+```python
+import faiss
+from llama_index.vector_stores import FaissVectorStore
+
+# create faiss index
+d = 1536
+faiss_index = faiss.IndexFlatL2(d)
+
+# construct vector store
+vector_store = FaissVectorStore(faiss_index)
+
+...
+
+# NOTE: since faiss index is in-memory, we need to explicitly call
+#       vector_store.persist() or storage_context.persist() to save it to disk.
+#       persist() takes an optional persist_path argument. If none is given, default paths are used.
+storage_context.persist()
+```
+
+**Weaviate**
+
+```python
+import weaviate
+from llama_index.vector_stores import WeaviateVectorStore
+
+# creating a Weaviate client
+resource_owner_config = weaviate.AuthClientPassword(
+    username="<username>",
+    password="<password>",
+)
+client = weaviate.Client(
+    "https://<cluster-id>.semi.network/", auth_client_secret=resource_owner_config
+)
+
+# construct vector store
+vector_store = WeaviateVectorStore(weaviate_client=client)
+```
+
+**Pinecone**
+
+```python
+import pinecone
+from llama_index.vector_stores import PineconeVectorStore
+
+# Creating a Pinecone index
+api_key = "api_key"
+pinecone.init(api_key=api_key, environment="us-west1-gcp")
+pinecone.create_index(
+    "quickstart",
+    dimension=1536,
+    metric="euclidean",
+    pod_type="p1"
+)
+index = pinecone.Index("quickstart")
+
+# can define filters specific to this vector index (so you can
+# reuse pinecone indexes)
+metadata_filters = {"title": "paul_graham_essay"}
+
+# construct vector store
+vector_store = PineconeVectorStore(
+    pinecone_index=index,
+    metadata_filters=metadata_filters
+)
+```
+
+**Qdrant**
+
+```python
+import qdrant_client
+from llama_index.vector_stores import QdrantVectorStore
+
+# Creating a Qdrant vector store
+client = qdrant_client.QdrantClient(
+    host="<qdrant-host>",
+    api_key="<qdrant-api-key>",
+    https=True
+)
+collection_name = "paul_graham"
+
+# construct vector store
+vector_store = QdrantVectorStore(
+    client=client,
+    collection_name=collection_name,
+)
+```
+
+**Chroma**
+
+```python
+import chromadb
+from llama_index.vector_stores import ChromaVectorStore
+
+# Creating a Chroma client
+# By default, Chroma will operate purely in-memory.
+chroma_client = chromadb.Client()
+chroma_collection = chroma_client.create_collection("quickstart")
+
+# construct vector store
+vector_store = ChromaVectorStore(
+    chroma_collection=chroma_collection,
+)
+```
+
+**Milvus**
+
+- Milvus Index offers the ability to store both Documents and their embeddings. Documents are limited to the predefined Document attributes and do not include metadata.
+
+```python
+import pymilvus
+from llama_index.vector_stores import MilvusVectorStore
+
+# construct vector store
+vector_store = MilvusVectorStore(
+    host='localhost',
+    port=19530,
+    overwrite=True
+)
+
+```
+
+**Note**: `MilvusVectorStore` depends on the `pymilvus` library.
+Use `pip install pymilvus` if not already installed.
+If you get stuck at building wheel for `grpcio`, check if you are using python 3.11
+(there's a known issue: https://github.com/milvus-io/pymilvus/issues/1308)
+and try downgrading.
+
+**Zilliz**
+
+- Zilliz Cloud (hosted version of Milvus) uses the Milvus Index with some extra arguments.
+
+```python
+import pymilvus
+from llama_index.vector_stores import MilvusVectorStore
+
+
+# construct vector store
+vector_store = MilvusVectorStore(
+    host='foo.vectordb.zillizcloud.com',
+    port=403,
+    user="db_admin",
+    password="foo",
+    use_secure=True,
+    overwrite=True
+)
+```
+
+**Note**: `MilvusVectorStore` depends on the `pymilvus` library.
+Use `pip install pymilvus` if not already installed.
+If you get stuck at building wheel for `grpcio`, check if you are using python 3.11
+(there's a known issue: https://github.com/milvus-io/pymilvus/issues/1308)
+and try downgrading.
+
+**MyScale**
+
+```python
+import clickhouse_connect
+from llama_index.vector_stores import MyScaleVectorStore
+
+# Creating a MyScale client
+client = clickhouse_connect.get_client(
+    host='YOUR_CLUSTER_HOST',
+    port=8443,
+    username='YOUR_USERNAME',
+    password='YOUR_CLUSTER_PASSWORD'
+)
+
+
+# construct vector store
+vector_store = MyScaleVectorStore(
+    myscale_client=client
+)
+```
+
+**DocArray**
+
+```python
+from llama_index.vector_stores import (
+    DocArrayHnswVectorStore, 
+    DocArrayInMemoryVectorStore,
+)
+
+# construct vector store
+vector_store = DocArrayHnswVectorStore(work_dir='hnsw_index')
+
+# alternatively, construct the in-memory vector store
+vector_store = DocArrayInMemoryVectorStore()
+```
+
+**MongoDBAtlas**
+
+```python
+# Provide URI to constructor, or use environment variable
+import pymongo
+from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
+from llama_index.indices.vector_store.base import VectorStoreIndex
+from llama_index.storage.storage_context import StorageContext
+from llama_index.readers.file.base import SimpleDirectoryReader
+
+# mongo_uri = os.environ["MONGO_URI"]
+mongo_uri = "mongodb+srv://<username>:<password>@<host>?retryWrites=true&w=majority"
+mongodb_client = pymongo.MongoClient(mongo_uri)
+
+# construct store
+store = MongoDBAtlasVectorSearch(mongodb_client)
+storage_context = StorageContext.from_defaults(vector_store=store)
+uber_docs = SimpleDirectoryReader(input_files=["../data/10k/uber_2021.pdf"]).load_data()
+
+# construct index
+index = VectorStoreIndex.from_documents(uber_docs, storage_context=storage_context)
+```
+
+[Example notebooks can be found here](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/vector_stores).
+
+## Loading Data from Vector Stores using Data Connector
+
+LlamaIndex supports loading data from the following sources. See [Data Connectors](../connector/root.md) for more details and API documentation.
+
+Chroma stores both documents and vectors. This is an example of how to use Chroma:
+
+```python
+
+from IPython.display import Markdown, display
+from llama_index.readers.chroma import ChromaReader
+from llama_index.indices import ListIndex
+
+# The chroma reader loads data from a persisted Chroma collection.
+# This requires a collection name and a persist directory.
+reader = ChromaReader(
+    collection_name="chroma_collection",
+    persist_directory="examples/data_connectors/chroma_collection"
+)
+
+query_vector=[n1, n2, n3, ...]
+
+documents = reader.load_data(collection_name="demo", query_vector=query_vector, limit=5)
+index = ListIndex.from_documents(documents)
+
+query_engine = index.as_query_engine()
+response = query_engine.query("<query_text>")
+display(Markdown(f"<b>{response}</b>"))
+```
+
+Qdrant also stores both documents and vectors. This is an example of how to use Qdrant:
+
+```python
+
+from llama_index.readers.qdrant import QdrantReader
+
+reader = QdrantReader(host="localhost")
+
+# the query_vector is an embedding representation of your query
+# Example query_vector
+# query_vector = [0.3, 0.3, 0.3, 0.3, ...]
+
+query_vector = [n1, n2, n3, ...]
+
+# NOTE: Required args are collection_name, query_vector.
+# See the Python client: https://github.com/qdrant/qdrant_client
+# for more details
+
+documents = reader.load_data(collection_name="demo", query_vector=query_vector, limit=5)
+
+```
+
+NOTE: Since Weaviate can store a hybrid of document and vector objects, the user may either choose to explicitly specify `class_name` and `properties` in order to query documents, or they may choose to specify a raw GraphQL query. See below for usage.
+
+```python
+# NOTE: `reader` is assumed to be an already-constructed Weaviate reader
+
+# option 1: load data using class_name and properties
+documents = reader.load_data(
+    class_name="<class_name>",
+    properties=["property1", "property2", "..."],
+    separate_documents=True
+)
+
+# option 2: load data using a raw GraphQL query
+query = """
+{
+    Get {
+        <class_name> {
+            <property1>
+            <property2>
+        }
+    }
+}
+"""
+
+documents = reader.load_data(graphql_query=query, separate_documents=True)
+```
+
+NOTE: Both Pinecone and Faiss data loaders assume that the respective data sources only store vectors; text content is stored elsewhere. Therefore, both data loaders require that the user specifies an `id_to_text_map` in the load_data call.
+
+For instance, this is an example usage of the Pinecone data loader `PineconeReader`:
+
+```python
+
+from llama_index.readers.pinecone import PineconeReader
+
+reader = PineconeReader(api_key=api_key, environment="us-west1-gcp")
+
+id_to_text_map = {
+    "id1": "text blob 1",
+    "id2": "text blob 2",
+}
+
+query_vector = [n1, n2, n3, ...]
+
+documents = reader.load_data(
+    index_name="quickstart", id_to_text_map=id_to_text_map, top_k=3, vector=query_vector, separate_documents=True
+)
+
+```
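+
+The role of `id_to_text_map` can be sketched without a Pinecone account. In this toy illustration, `attach_text` is a hypothetical helper and the `(id, score)` matches are made-up placeholders for what a vectors-only store would return; the real readers may structure their results differently.
+
```python
from typing import Dict, List, Tuple


def attach_text(
    matches: List[Tuple[str, float]],
    id_to_text_map: Dict[str, str],
) -> List[Dict[str, object]]:
    """Join vector-store matches (id, score) with text that is stored elsewhere."""
    documents = []
    for vector_id, score in matches:
        if vector_id not in id_to_text_map:
            raise KeyError(f"No text registered for vector id {vector_id!r}")
        documents.append(
            {"id": vector_id, "score": score, "text": id_to_text_map[vector_id]}
        )
    return documents


# Placeholder (id, score) results from a store that only keeps vectors
matches = [("id2", 0.91), ("id1", 0.47)]
id_to_text_map = {"id1": "text blob 1", "id2": "text blob 2"}
documents = attach_text(matches, id_to_text_map)
```
+
+Because the store itself holds no text, any id missing from the map cannot be turned into a document, which is why the sketch raises an error instead of skipping it silently.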
+
+[Example notebooks can be found here](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/data_connectors).
+
+
+```{toctree}
+---
+caption: Examples
+maxdepth: 1
+---
+../../examples/vector_stores/SimpleIndexDemo.ipynb
+../../examples/vector_stores/SimpleIndexDemoMMR.ipynb
+../../examples/vector_stores/RedisIndexDemo.ipynb
+../../examples/vector_stores/QdrantIndexDemo.ipynb
+../../examples/vector_stores/FaissIndexDemo.ipynb
+../../examples/vector_stores/DeepLakeIndexDemo.ipynb
+../../examples/vector_stores/MyScaleIndexDemo.ipynb
+../../examples/vector_stores/MetalIndexDemo.ipynb
+../../examples/vector_stores/WeaviateIndexDemo.ipynb
+../../examples/vector_stores/OpensearchDemo.ipynb
+../../examples/vector_stores/PineconeIndexDemo.ipynb
+../../examples/vector_stores/ChromaIndexDemo.ipynb
+../../examples/vector_stores/LanceDBIndexDemo.ipynb
+../../examples/vector_stores/MilvusIndexDemo.ipynb
+../../examples/vector_stores/WeaviateIndexDemo-Hybrid.ipynb
+../../examples/vector_stores/PineconeIndexDemo-Hybrid.ipynb
+../../examples/vector_stores/AsyncIndexCreationDemo.ipynb
+../../examples/vector_stores/SupabaseVectorIndexDemo.ipynb
+../../examples/vector_stores/DocArrayHnswIndexDemo.ipynb
+../../examples/vector_stores/DocArrayInMemoryIndexDemo.ipynb
+../../examples/vector_stores/MongoDBAtlasVectorSearch.ipynb
+../../examples/vector_stores/postgres.ipynb
+```
@@ -0,0 +1,77 @@
+"""Configuration for sphinx."""
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import os
+import sys
+
+import sphinx_rtd_theme  # noqa: F401
+
+sys.path.insert(0, os.path.abspath("../"))
+
+with open("../llama_index/VERSION") as f:
+    version = f.read()
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+
+project = "LlamaIndex 🦙"
+copyright = "2022, Jerry Liu"
+author = "Jerry Liu"
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+    "sphinx.ext.autodoc",
+    "sphinx.ext.coverage",
+    "sphinx.ext.autodoc.typehints",
+    "sphinx.ext.autosummary",
+    "sphinx.ext.napoleon",
+    "sphinx_rtd_theme",
+    "sphinx.ext.mathjax",
+    "m2r2",
+    "myst_nb",
+    "sphinxcontrib.autodoc_pydantic",
+]
+
+myst_heading_anchors = 4
+# TODO: Fix the non-consecutive header level in our docs, until then
+# disable the sphinx/myst warnings
+suppress_warnings = ["myst.header"]
+
+templates_path = ["_templates"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = "furo"
+html_title = project + " " + version
+html_static_path = ["_static"]
+
+html_css_files = [
+    "css/custom.css",
+    "css/algolia.css",
+    "https://cdn.jsdelivr.net/npm/@docsearch/css@3",
+]
+html_js_files = [
+    "js/mendablesearch.js",
+    (
+        "https://cdn.jsdelivr.net/npm/@docsearch/js@3.3.3/dist/umd/index.js",
+        {"defer": "defer"},
+    ),
+    ("js/algolia.js", {"defer": "defer"}),
+]
+
+nb_execution_mode = "off"
@@ -0,0 +1,27 @@
+# Module Guides
+
+These guides provide an overview of how to use our agent classes.
+
+For more detailed guides on how to use specific tools, check out our [tools module guides]().
+
+## OpenAI Agent
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/agent/openai_agent.ipynb
+/examples/agent/openai_agent_with_query_engine.ipynb
+/examples/agent/openai_agent_retrieval.ipynb
+/examples/agent/openai_agent_query_cookbook.ipynb
+/examples/agent/openai_agent_query_plan.ipynb
+/examples/agent/openai_agent_context_retrieval.ipynb
+```
+
+## ReAct Agent
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/agent/react_agent_with_query_engine.ipynb
+```
+
@@ -0,0 +1,69 @@
+# Data Agents
+
+## Concept
+Data Agents are LLM-powered knowledge workers in LlamaIndex that can intelligently perform various tasks over your data, in both a “read” and “write” function. They are capable of the following:
+
+- Perform automated search and retrieval over different types of data: unstructured, semi-structured, and structured.
+- Call any external service API in a structured fashion, process the response, and store it for later.
+
+In that sense, agents are a step beyond our [query engines](/core_modules/query_modules/query_engine/root.md) in that they can not only "read" from a static source of data, but can dynamically ingest and modify data from a variety of different tools.
+
+Building a data agent requires the following core components:
+
+- A reasoning loop
+- Tool abstractions
+
+A data agent is initialized with a set of APIs, or Tools, to interact with; these APIs can be called by the agent to return information or modify state. Given an input task, the data agent uses a reasoning loop to decide which tools to use, in which sequence, and with which parameters to call each tool.
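+
+A reasoning loop of this kind can be sketched in a few lines. This is a toy under stated assumptions: `scripted_planner` is a hypothetical stand-in for the LLM (it follows a fixed script instead of reasoning), and `run_agent` is not the LlamaIndex agent implementation.
+
```python
from typing import Callable, Dict, List, Optional, Tuple

# A "step" is (tool_name, kwargs); None means the agent is done.
Step = Optional[Tuple[str, Dict[str, int]]]


def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


TOOLS: Dict[str, Callable[..., int]] = {"add": add, "multiply": multiply}


def scripted_planner(task: str, observations: List[int]) -> Step:
    """Hypothetical stand-in for the LLM: computes (2 + 3) * 4 in two steps."""
    if not observations:
        return ("add", {"a": 2, "b": 3})
    if len(observations) == 1:
        return ("multiply", {"a": observations[0], "b": 4})
    return None


def run_agent(task: str, max_steps: int = 5) -> List[int]:
    """Loop: ask the planner for the next tool call, run it, record the result."""
    observations: List[int] = []
    for _ in range(max_steps):
        step = scripted_planner(task, observations)
        if step is None:
            break
        tool_name, kwargs = step
        observations.append(TOOLS[tool_name](**kwargs))
    return observations


trace = run_agent("What is (2 + 3) * 4?")
```
+
+The planner sees the results of earlier tool calls before choosing the next one, which is what lets the second step reuse the output of the first.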
+
+### Reasoning Loop
+The reasoning loop depends on the type of agent. We support the following agents:
+- An OpenAI Function agent (built on top of the OpenAI Function API)
+- A ReAct agent (which works across any chat/text completion endpoint)
+
+### Tool Abstractions
+
+You can learn more about our Tool abstractions in our [Tools section](/core_modules/agent_modules/tools/root.md).
+
+### Blog Post
+
+For full details, please check out our detailed [blog post](https://medium.com/llamaindex-blog/data-agents-eed797d7972f).
+
+
+## Usage Pattern
+
+Data agents can be used in the following manner (the example uses the OpenAI Function API):
+```python
+from llama_index.agent import OpenAIAgent
+from llama_index.llms import OpenAI
+
+# import and define tools
+...
+
+# initialize llm
+llm = OpenAI(model="gpt-3.5-turbo-0613")
+
+# initialize openai agent
+agent = OpenAIAgent.from_tools(tools, llm=llm, verbose=True)
+```
+
+See our usage pattern guide for more details.
+```{toctree}
+---
+maxdepth: 1
+---
+usage_pattern.md
+```
+
+## Modules
+
+Learn more about our different agent types in our module guides below.
+
+Also take a look at our [tools section](/core_modules/agent_modules/tools/root.md)!
+
+```{toctree}
+---
+maxdepth: 2
+---
+modules.md
+```
+
@@ -0,0 +1,193 @@
+# Usage Pattern
+
+## Get Started
+
+An agent is initialized from a set of Tools. Here's an example of instantiating a ReAct
+agent from a set of Tools.
+
+```python
+from llama_index.tools import FunctionTool
+from llama_index.llms import OpenAI
+from llama_index.agent import ReActAgent
+
+# define sample Tool
+def multiply(a: int, b: int) -> int:
+    """Multiply two integers and return the result."""
+    return a * b
+
+multiply_tool = FunctionTool.from_defaults(fn=multiply)
+
+# initialize llm
+llm = OpenAI(model="gpt-3.5-turbo-0613")
+
+# initialize ReAct agent
+agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)
+```
+
+An agent supports both `chat` and `query` endpoints, inheriting from our `ChatEngine` and `QueryEngine` respectively.
+
+Example usage:
+```python
+agent.chat("What is 2123 * 215123")
+```
+
+
+## Query Engine Tools
+
+It is easy to wrap query engines as tools for an agent as well. Simply do the following:
+
+```python
+
+from llama_index.agent import ReActAgent
+from llama_index.tools import QueryEngineTool, ToolMetadata
+
+# NOTE: lyft_index and uber_index are both VectorStoreIndex instances
+lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
+uber_engine = uber_index.as_query_engine(similarity_top_k=3)
+
+query_engine_tools = [
+    QueryEngineTool(
+        query_engine=lyft_engine,
+        metadata=ToolMetadata(
+            name="lyft_10k",
+            description="Provides information about Lyft financials for year 2021. "
+            "Use a detailed plain text question as input to the tool.",
+        ),
+    ),
+    QueryEngineTool(
+        query_engine=uber_engine,
+        metadata=ToolMetadata(
+            name="uber_10k",
+            description="Provides information about Uber financials for year 2021. "
+            "Use a detailed plain text question as input to the tool.",
+        ),
+    ),
+]
+
+# initialize ReAct agent
+agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
+
+```
+
+## Use other agents as Tools
+
+A nifty feature of our agents is that since they inherit from `BaseQueryEngine`, you can easily define other agents as tools
+through our `QueryEngineTool`. 
+
+```python
+from llama_index.agent import ReActAgent
+from llama_index.tools import QueryEngineTool, ToolMetadata
+
+query_engine_tools = [
+    QueryEngineTool(
+        query_engine=sql_agent,
+        metadata=ToolMetadata(
+            name="sql_agent",
+            description="Agent that can execute SQL queries."
+        ),
+    ),
+    QueryEngineTool(
+        query_engine=gmail_agent,
+        metadata=ToolMetadata(
+            name="gmail_agent",
+            description="Tool that can send emails on Gmail."
+        ),
+    ),
+]
+
+outer_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
+```
+
+## Advanced Concepts (for `OpenAIAgent`, in beta)
+
+You can also use agents in more advanced settings, for instance to retrieve tools from an index at query time or to
+perform query planning over an existing set of Tools.
+
+These are largely implemented with our `OpenAIAgent` classes (which depend on the OpenAI Function API). Support
+for our more general `ReActAgent` is something we're actively investigating.
+
+NOTE: these are largely still in beta. The abstractions may change and become more general over time.
+
+### Function Retrieval Agents
+
+If the set of Tools is very large, you can create an `ObjectIndex` to index the tools, and then pass in an `ObjectRetriever` to the agent during query-time, to first dynamically retrieve the relevant tools before having the agent pick from the candidate tools.
+
+We first build an `ObjectIndex` over an existing set of Tools.
+
+```python
+# define an "object" index over these tools
+from llama_index import VectorStoreIndex
+from llama_index.objects import ObjectIndex, SimpleToolNodeMapping
+
+tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
+obj_index = ObjectIndex.from_objects(
+    all_tools,
+    tool_mapping,
+    VectorStoreIndex,
+)
+```
+
+We then define our `FnRetrieverOpenAIAgent`:
+
+```python
+from llama_index.agent import FnRetrieverOpenAIAgent
+
+agent = FnRetrieverOpenAIAgent.from_retriever(obj_index.as_retriever(), verbose=True)
+```
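+
+The idea of narrowing a large tool set before the agent chooses can be sketched as follows. This is a toy illustration: `TOOL_DESCRIPTIONS` and `retrieve_tools` are hypothetical, and word overlap stands in for the embedding similarity that a vector index would use.
+
```python
from typing import Dict, List

# Hypothetical tool catalogue: name -> description
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "send_email": "send an email message to a recipient",
    "run_sql": "execute a sql query against the sales database",
    "get_weather": "look up the weather forecast for a city",
}


def retrieve_tools(query: str, top_k: int = 1) -> List[str]:
    """Rank tools by how many words their description shares with the query."""
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(query_words & set(TOOL_DESCRIPTIONS[name].split()))

    ranked = sorted(TOOL_DESCRIPTIONS, key=overlap, reverse=True)
    return ranked[:top_k]


candidates = retrieve_tools("what is the weather forecast for berlin", top_k=1)
```
+
+Only the names in `candidates` would then be offered to the agent, so the prompt stays small even when the catalogue holds many tools.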
+
+### Context Retrieval Agents
+
+Our context-augmented OpenAI Agent will always perform retrieval before calling any tools.
+
+This helps to provide additional context that can help the agent better pick Tools, versus
+just trying to make a decision without any context.
+
+```python
+from llama_index import VectorStoreIndex
+from llama_index.schema import Document
+from llama_index.agent import ContextRetrieverOpenAIAgent
+
+
+# toy index - stores a list of abbreviations
+texts = [
+    "Abbreviation: X = Revenue",
+    "Abbreviation: YZ = Risk Factors",
+    "Abbreviation: Z = Costs",
+]
+docs = [Document(text=t) for t in texts]
+context_index = VectorStoreIndex.from_documents(docs)
+
+# add context agent
+context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
+    query_engine_tools, context_index.as_retriever(similarity_top_k=1), verbose=True
+)
+response = context_agent.chat("What is the YZ of March 2022?")
+```
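+
+The "retrieve first, then act" flow can be sketched in plain Python. This is a toy under stated assumptions: `retrieve_context` and `augment` are hypothetical helpers, exact word matching stands in for vector retrieval, and the result is only a prompt string rather than a real agent call.
+
```python
from typing import List

ABBREVIATIONS: List[str] = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]


def retrieve_context(message: str, texts: List[str]) -> List[str]:
    """Return the texts whose abbreviation appears as a word in the message."""
    words = set(message.replace("?", "").split())
    hits = []
    for text in texts:
        abbreviation = text.split(":")[1].split("=")[0].strip()
        if abbreviation in words:
            hits.append(text)
    return hits


def augment(message: str, texts: List[str]) -> str:
    """Prepend retrieved context so an agent sees it before choosing tools."""
    context = retrieve_context(message, texts)
    return "\n".join(["Context:"] + context + ["Question: " + message])


prompt = augment("What is the YZ of March 2022?", ABBREVIATIONS)
```
+
+With the definition of `YZ` in the prompt, an agent has what it needs to route the question to a tool that covers risk factors.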
+
+### Query Planning
+
+OpenAI Function Agents are capable of advanced query planning. The trick is to provide the agent
+with a `QueryPlanTool`. If the agent calls the `QueryPlanTool`, it is forced to infer a full Pydantic schema representing a query
+plan over a set of subtools.
+
+```python
+# define query plan tool
+from llama_index.agent import OpenAIAgent
+from llama_index.llms import OpenAI
+from llama_index.tools import QueryPlanTool
+from llama_index import get_response_synthesizer
+
+response_synthesizer = get_response_synthesizer(service_context=service_context)
+query_plan_tool = QueryPlanTool.from_defaults(
+    query_engine_tools=[query_tool_sept, query_tool_june, query_tool_march],
+    response_synthesizer=response_synthesizer,
+)
+
+# initialize agent
+agent = OpenAIAgent.from_tools(
+    [query_plan_tool],
+    max_function_calls=10,
+    llm=OpenAI(temperature=0, model="gpt-4-0613"),
+    verbose=True,
+)
+
+# should output a query plan to call march, june, and september tools
+response = agent.query("Analyze Uber revenue growth in March, June, and September")
+
+```
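+
+To give a feel for what a query plan over subtools looks like, here is a minimal sketch that models a plan as nodes with dependencies and runs them in order. `PlanNode` and `execute_plan` are hypothetical and do not reproduce the schema used by `QueryPlanTool`; the tool outputs are placeholder strings, and the last listed node is assumed to be the final step.
+
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PlanNode:
    """One step of a toy query plan (hypothetical schema, for illustration only)."""
    id: int
    tool_name: str
    query: str
    dependencies: List[int] = field(default_factory=list)


def execute_plan(nodes: List[PlanNode], tools: Dict[str, Callable[[str], str]]) -> str:
    """Run each node once its dependencies are done; return the last node's output."""
    results: Dict[int, str] = {}
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(d in results for d in n.dependencies)]
        if not ready:
            raise ValueError("Query plan has a cycle or a missing dependency")
        for node in ready:
            # Feed the outputs of the dependencies into this node's query
            context = " | ".join(results[d] for d in node.dependencies)
            results[node.id] = tools[node.tool_name](f"{node.query} {context}".strip())
            pending.remove(node)
    return results[nodes[-1].id]


tools = {
    "march": lambda q: "march-result",  # placeholder output
    "june": lambda q: "june-result",    # placeholder output
    "summarize": lambda q: "Summary of [" + q + "]",
}
plan = [
    PlanNode(1, "march", "Revenue in March"),
    PlanNode(2, "june", "Revenue in June"),
    PlanNode(3, "summarize", "Compare:", dependencies=[1, 2]),
]
answer = execute_plan(plan, tools)
```
+
+The point of the structure is that the whole plan is produced up front, so independent nodes (here the March and June lookups) do not have to wait for one another.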
@@ -0,0 +1,65 @@
+# LlamaHub Tools Guide
+
+We offer a rich set of Tool Specs through [LlamaHub](https://llamahub.ai/) 🦙.
+![](/_static/data_connectors/llamahub.png)
+
+These tool specs represent an initial curated list of services that an agent can interact with and enrich its capability to perform different actions. 
+
+We also provide a list of **utility tools** that help to abstract away pain points when designing agents to interact with different API services that return large amounts of data.
+
+## Tool Specs
+
+Coming soon! 
+
+## Utility Tools
+
+Oftentimes, directly querying an API can return a massive volume of data, which on its own may overflow the context window of the LLM (or at the very least unnecessarily increase the number of tokens that you are using). 
+
+To tackle this, we’ve provided an initial set of “utility tools” in LlamaHub Tools - utility tools are not conceptually tied to a given service (e.g. Gmail, Notion), but rather can augment the capabilities of existing Tools. In this particular case, utility tools help to abstract away common patterns of needing to cache/index and query data that’s returned from any API request.
+
+Let’s walk through our two main utility tools below.
+
+### OnDemandLoaderTool
+
+This tool turns any existing LlamaIndex data loader (`BaseReader` class) into a tool that an agent can use. The tool can be called with all the parameters needed to trigger `load_data` from the data loader, along with a natural language query string. During execution, we first load data from the data loader, index it (for instance with a vector store), and then query it “on-demand”. All three of these steps happen in a single tool call.
+
+This is often preferable to working out how to load and index API data yourself. Loading and indexing the data yourself may make it reusable, but users often just need an ad-hoc index to work around prompt window limitations for a single API call.
+
+A usage example is given below:
+
+```python
+from llama_hub.wikipedia.base import WikipediaReader
+from llama_index.tools.on_demand_loader_tool import OnDemandLoaderTool
+
+# instantiate the data loader that the tool will wrap
+reader = WikipediaReader()
+
+tool = OnDemandLoaderTool.from_defaults(
+    reader,
+    name="Wikipedia Tool",
+    description="A tool for loading data and querying articles from Wikipedia",
+)
+```
+
+### LoadAndSearchToolSpec
+
+The LoadAndSearchToolSpec takes in any existing Tool as input. As a tool spec, it implements `to_tool_list`, and when that function is called, two tools are returned: a `load` tool and a `search` tool.
+
+The `load` Tool execution calls the underlying Tool and then indexes the output (by default with a vector index). The `search` Tool execution takes in a query string as input and calls the underlying index.
+
+This is helpful for any API endpoint that will by default return large volumes of data - for instance our WikipediaToolSpec will by default return entire Wikipedia pages, which will easily overflow most LLM context windows.
+
+Example usage is shown below:
+
+```python
+from llama_hub.tools.wikipedia.base import WikipediaToolSpec
+from llama_index.agent import OpenAIAgent
+from llama_index.tools.tool_spec.load_and_search import LoadAndSearchToolSpec
+
+wiki_spec = WikipediaToolSpec()
+# Get the search wikipedia tool
+tool = wiki_spec.to_tool_list()[1]
+
+# Create the Agent with load/search tools
+agent = OpenAIAgent.from_tools(
+    LoadAndSearchToolSpec.from_defaults(tool).to_tool_list(),
+    verbose=True,
+)
+```
@@ -0,0 +1,71 @@
+# Tools
+
+## Concept
+
+Having proper tool abstractions is at the core of building [data agents](/core_modules/agent_modules/agents/root.md). Defining a set of Tools is similar to defining any API interface, with the exception that these Tools are meant for agent rather than human use. We allow users to define both a **Tool** as well as a **ToolSpec** containing a series of functions under the hood. 
+
+A Tool implements a very generic interface - simply define `__call__` and also return some basic metadata (name, description, function schema).
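+
+As a rough illustration of that interface, the sketch below defines a callable object that also carries a name and description. `SketchTool` and `SketchToolMetadata` are hypothetical stand-ins written for this example, not the actual LlamaIndex classes.
+
+```python
+from dataclasses import dataclass
+from typing import Any, Callable
+
+
+@dataclass
+class SketchToolMetadata:
+    """Hypothetical stand-in for the metadata a Tool exposes."""
+
+    name: str
+    description: str
+
+
+class SketchTool:
+    """Hypothetical minimal Tool: callable, plus basic metadata."""
+
+    def __init__(self, fn: Callable[..., Any], name: str, description: str) -> None:
+        self._fn = fn
+        self.metadata = SketchToolMetadata(name=name, description=description)
+
+    def __call__(self, *args: Any, **kwargs: Any) -> Any:
+        # Delegate to the wrapped function
+        return self._fn(*args, **kwargs)
+
+
+def add(a: int, b: int) -> int:
+    """Add two integers."""
+    return a + b
+
+
+add_tool = SketchTool(add, name="add", description=add.__doc__ or "")
+result = add_tool(2, 3)
+```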
+
+A Tool Spec defines a full API specification of any service that can be converted into a list of Tools.
+
+We offer a few different types of Tools:
+- `FunctionTool`: A function tool allows users to easily convert any user-defined function into a Tool. It can also auto-infer the function schema.
+- `QueryEngineTool`: A tool that wraps an existing [query engine](/core_modules/query_modules/root.md). Note: since our agent abstractions inherit from `BaseQueryEngine`, these tools can also wrap other agents.
+
+We offer a rich set of Tools and Tool Specs through [LlamaHub](https://llamahub.ai/) 🦙. 
+
+### Blog Post
+
+For full details, please check out our detailed [blog post]().
+
+## Usage Pattern
+
+Our Tool Specs and Tools can be imported from the `llama-hub` package.
+
+To use with our agent,
+```python
+from llama_index.agent import OpenAIAgent
+from llama_hub.tools.gmail.base import GmailToolSpec
+
+tool_spec = GmailToolSpec()
+agent = OpenAIAgent.from_tools(tool_spec.to_tool_list(), verbose=True)
+
+```
+
+See our Usage Pattern Guide for more details.
+```{toctree}
+---
+maxdepth: 1
+---
+usage_pattern.md
+```
+
+## LlamaHub Tools Guide 🛠️
+
+Check out our guide for a full overview of the Tools/Tool Specs in LlamaHub! 
+```{toctree}
+---
+maxdepth: 1
+---
+llamahub_tools_guide.md
+```
+
+
+<!-- We offer a rich set of Tool Specs that are offered through [LlamaHub](https://llamahub.ai/) 🦙. 
+These tool specs represent an initial curated list of services that an agent can interact with and enrich its capability to perform different actions. 
+
+![](/_static/data_connectors/llamahub.png) -->
+
+
+<!-- ## Module Guides
+```{toctree}
+---
+maxdepth: 1
+---
+modules.md
+```
+
+## Tool Example Notebooks
+
+Coming soon!  -->
+
@@ -0,0 +1,35 @@
+# Usage Pattern
+
+LlamaHub Tool Specs and Tools can be imported from the `llama-hub` package. They can be plugged into our native agents, or LangChain agents.
+
+## Using with our Agents
+
+To use with our OpenAIAgent,
+```python
+from llama_index.agent import OpenAIAgent
+from llama_hub.tools.gmail.base import GmailToolSpec
+
+tool_spec = GmailToolSpec()
+agent = OpenAIAgent.from_tools(tool_spec.to_tool_list(), verbose=True)
+
+# use agent
+agent.chat("Can you create a new email to helpdesk and support @example.com about a service outage")
+```
+
+Full Tool details can be found on our [LlamaHub](https://llamahub.ai/) page. Each tool contains a "Usage" section showing how that tool can be used.
+
+
+## Using with LangChain
+To use with a LangChain agent, simply convert tools to LangChain tools with `to_langchain_tool()`.
+
+```python
+from langchain.agents import initialize_agent
+
+tools = tool_spec.to_tool_list()
+langchain_tools = [t.to_langchain_tool() for t in tools]
+
+# plug into LangChain agent
+# NOTE: `llm` and `memory` are assumed to be defined elsewhere
+# (e.g. a LangChain LLM and a conversation memory object)
+agent_executor = initialize_agent(
+    langchain_tools, llm, agent="conversational-react-description", memory=memory
+)
+
+```
@@ -0,0 +1,31 @@
+# Module Guides
+
+
+```{toctree}
+---
+maxdepth: 1
+---
+../../../examples/data_connectors/PsychicDemo.ipynb
+../../../examples/data_connectors/DeepLakeReader.ipynb
+../../../examples/data_connectors/QdrantDemo.ipynb
+../../../examples/data_connectors/DiscordDemo.ipynb
+../../../examples/data_connectors/MongoDemo.ipynb
+../../../examples/data_connectors/ChromaDemo.ipynb
+../../../examples/data_connectors/MyScaleReaderDemo.ipynb
+../../../examples/data_connectors/FaissDemo.ipynb
+../../../examples/data_connectors/ObsidianReaderDemo.ipynb
+../../../examples/data_connectors/SlackDemo.ipynb
+../../../examples/data_connectors/WebPageDemo.ipynb
+../../../examples/data_connectors/PineconeDemo.ipynb
+../../../examples/data_connectors/MboxReaderDemo.ipynb
+../../../examples/data_connectors/MilvusReaderDemo.ipynb
+../../../examples/data_connectors/NotionDemo.ipynb
+../../../examples/data_connectors/GithubRepositoryReaderDemo.ipynb
+../../../examples/data_connectors/GoogleDocsDemo.ipynb
+../../../examples/data_connectors/DatabaseReaderDemo.ipynb
+../../../examples/data_connectors/TwitterDemo.ipynb
+../../../examples/data_connectors/WeaviateDemo.ipynb
+../../../examples/data_connectors/MakeDemo.ipynb
+../../../examples/data_connectors/deplot/DeplotReader.ipynb
+```
+
@@ -0,0 +1,52 @@
+# Data Connectors (LlamaHub)
+
+## Concept
+A data connector (i.e. `Reader`) ingests data from different data sources and data formats into a simple `Document` representation (text and simple metadata).
+
+```{tip}
+Once you've ingested your data, you can build an [Index](/core_modules/data_modules/index/root.md) on top, ask questions using a [Query Engine](/core_modules/query_modules/query_engine/root.md), and have a conversation using a [Chat Engine](/core_modules/query_modules/chat_engines/root.md).
+```
+
+## LlamaHub
+Our data connectors are offered through [LlamaHub](https://llamahub.ai/) 🦙. 
+LlamaHub is an open-source repository containing data loaders that you can easily plug and play into any LlamaIndex application.
+
+![](/_static/data_connectors/llamahub.png)
+
+
+## Usage Pattern
+Get started with:
+```python
+from llama_index import download_loader
+
+GoogleDocsReader = download_loader('GoogleDocsReader')
+loader = GoogleDocsReader()
+documents = loader.load_data(document_ids=[...])
+```
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+
+## Modules
+
+Some sample data connectors:
+- local file directory (`SimpleDirectoryReader`). Can support parsing a wide range of file types: `.pdf`, `.jpg`, `.png`, `.docx`, etc.
+- [Notion](https://developers.notion.com/) (`NotionPageReader`)
+- [Google Docs](https://developers.google.com/docs/api) (`GoogleDocsReader`)
+- [Slack](https://api.slack.com/) (`SlackReader`)
+- [Discord](https://discord.com/developers/docs/intro) (`DiscordReader`)
+- [Apify Actors](https://llamahub.ai/l/apify-actor) (`ApifyActor`). Can crawl the web, scrape webpages, extract text content, download files including `.pdf`, `.jpg`, `.png`, `.docx`, etc.
+
+See below for detailed guides.
+
+```{toctree}
+---
+maxdepth: 2
+---
+modules.rst
+```
@@ -0,0 +1,20 @@
+# Usage Pattern
+
+## Get Started
+Each data loader contains a "Usage" section showing how that loader can be used. At the core of using each loader is a `download_loader` function, which
+downloads the loader file into a module that you can use within your application.
+
+Example usage:
+
+```python
+from llama_index import VectorStoreIndex, download_loader
+
+GoogleDocsReader = download_loader('GoogleDocsReader')
+
+gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
+loader = GoogleDocsReader()
+documents = loader.load_data(document_ids=gdoc_ids)
+index = VectorStoreIndex.from_documents(documents)
+query_engine = index.as_query_engine()
+query_engine.query('Where did the author go to school?')
+```
@@ -0,0 +1,64 @@
+# Documents / Nodes
+
+## Concept
+
+Document and Node objects are core abstractions within LlamaIndex.
+
+A **Document** is a generic container around any data source - for instance, a PDF, an API output, or retrieved data from a database. Documents can be constructed manually, or created automatically via our data loaders. By default, a Document stores text along with some other attributes. Some of these are listed below.
+- `metadata` - a dictionary of annotations that can be appended to the text.
+- `relationships` - a dictionary containing relationships to other Documents/Nodes.
+
+*Note*: We have beta support for allowing Documents to store images, and are actively working on improving its multimodal capabilities.
+
+A **Node** represents a "chunk" of a source Document, whether that is a text chunk, an image, or other. Similar to Documents, they contain metadata and relationship information with other nodes.
+
+Nodes are first-class citizens in LlamaIndex. You can choose to define Nodes and all of their attributes directly. You may also choose to "parse" source Documents into Nodes through our `NodeParser` classes. By default every Node derived from a Document will inherit the same metadata from that Document (e.g. a "file_name" field in the Document is propagated to every Node).
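+
+The metadata inheritance described above can be pictured with a small, library-independent sketch. `SketchDocument`, `SketchNode`, and `parse_nodes` are hypothetical names, and the fixed-size character chunking is an assumption made only to keep the example short.
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, List
+
+
+@dataclass
+class SketchDocument:
+    text: str
+    metadata: Dict[str, str] = field(default_factory=dict)
+
+
+@dataclass
+class SketchNode:
+    text: str
+    metadata: Dict[str, str] = field(default_factory=dict)
+
+
+def parse_nodes(doc: SketchDocument, chunk_size: int) -> List[SketchNode]:
+    # Split the text into fixed-size chunks; each Node gets a copy of the Document metadata
+    return [
+        SketchNode(text=doc.text[i : i + chunk_size], metadata=dict(doc.metadata))
+        for i in range(0, len(doc.text), chunk_size)
+    ]
+
+
+doc = SketchDocument(text="abcdefghij", metadata={"file_name": "notes.txt"})
+nodes = parse_nodes(doc, chunk_size=4)
+```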
+
+
+## Usage Pattern
+
+Here are some simple snippets to get started with Documents and Nodes.
+
+#### Documents
+
+```python
+from llama_index import Document, VectorStoreIndex
+
+text_list = [text1, text2, ...]
+documents = [Document(text=t) for t in text_list]
+
+# build index
+index = VectorStoreIndex.from_documents(documents)
+
+```
+
+#### Nodes
+```python
+from llama_index import VectorStoreIndex
+from llama_index.node_parser import SimpleNodeParser
+
+# load documents
+...
+
+# parse nodes
+parser = SimpleNodeParser()
+nodes = parser.get_nodes_from_documents(documents)
+
+# build index
+index = VectorStoreIndex(nodes)
+
+```
+
+### Document/Node Usage
+
+Take a look at our in-depth guides for more details on how to use Documents/Nodes.
+
+```{toctree}
+---
+maxdepth: 1
+---
+usage_documents.md
+usage_nodes.md
+usage_metadata_extractor.md
+```
+
@@ -0,0 +1,177 @@
+# Defining and Customizing Documents
+
+
+## Defining Documents
+
+Documents can either be created automatically via data loaders, or constructed manually.
+
+By default, all of our [data loaders](/core_modules/data_modules/connector/root.md) (including those offered on LlamaHub) return `Document` objects through the `load_data` function.
+
+```python
+from llama_index import SimpleDirectoryReader
+
+documents = SimpleDirectoryReader('./data').load_data()
+```
+
+You can also choose to construct documents manually. LlamaIndex exposes the `Document` struct.
+
+```python
+from llama_index import Document
+
+text_list = [text1, text2, ...]
+documents = [Document(text=t) for t in text_list]
+```
+
+To speed up prototyping and development, you can also quickly create a document using some default text:
+
+```python
+document = Document.example()
+```
+
+## Customizing Documents
+
+This section covers various ways to customize `Document` objects. Since the `Document` object is a subclass of our `TextNode` object, all these settings and details apply to the `TextNode` object class as well.
+
+### Metadata
+
+Documents also offer the chance to include useful metadata. Using the `metadata` dictionary on each document, additional information can be included to help inform responses and track down sources for query responses. This information can be anything, such as filenames or categories. If you are integrating with a vector database, keep in mind that some vector databases require that the keys must be strings, and the values must be flat (either `str`, `float`, or `int`).
+
+Any information set in the `metadata` dictionary of each document will show up in the `metadata` of each source node created from the document. Additionally, this information is included in the nodes, enabling the index to utilize it on queries and responses. By default, the metadata is injected into the text for both embedding and LLM model calls.
+
+There are a few ways to set up this dictionary:
+
+1. In the document constructor:
+
+```python
+document = Document(
+    text='text', 
+    metadata={
+        'filename': '<doc_file_name>', 
+        'category': '<category>'
+    }
+)
+```
+
+2. After the document is created:
+
+```python
+document.metadata = {'filename': '<doc_file_name>'}
+```
+
+3. Set the filename automatically using the `SimpleDirectoryReader` and `file_metadata` hook. This will automatically run the hook on each document to set the `metadata` field:
+
+```python
+from llama_index import SimpleDirectoryReader
+filename_fn = lambda filename: {'file_name': filename}
+
+# automatically sets the metadata of each document according to filename_fn
+documents = SimpleDirectoryReader('./data', file_metadata=filename_fn).load_data()
+```
+
+### Customizing the id
+
+As detailed in the section [Document Management](../index/document_management.md), the doc `id_` is used to enable efficient refreshing of documents in the index. When using the `SimpleDirectoryReader`, you can automatically set the doc `id_` to be the full path to each document:
+
+```python
+from llama_index import SimpleDirectoryReader
+
+documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
+print([x.doc_id for x in documents])
+```
+
+You can also set the `id_` of any `Document` or `TextNode` directly!
+
+```python
+document.id_ = "My new document id!"
+```
+
+### Advanced - Metadata Customization
+
+A key detail mentioned above is that by default, any metadata you set is included in both the text used for embedding generation and the text sent to the LLM.
+
+#### Customizing LLM Metadata Text
+
+Typically, a document might have many metadata keys, but you might not want all of them visible to the LLM during response synthesis. In the above examples, we may not want the LLM to read the `file_name` of our document. However, the `file_name` might include information that will help generate better embeddings. A key advantage of doing this is to bias the embeddings for retrieval without changing what the LLM ends up reading.
+
+We can exclude it like so:
+
+```python
+document.excluded_llm_metadata_keys = ['file_name']
+```
+
+Then, we can test what the LLM will actually end up reading using the `get_content()` function and specifying `MetadataMode.LLM`:
+
+```python
+from llama_index.schema import MetadataMode
+print(document.get_content(metadata_mode=MetadataMode.LLM))
+```
+
+#### Customizing Embedding Metadata Text
+
+Similar to customizing the metadata visible to the LLM, we can also customize the metadata visible to embeddings. In this case, you can specifically exclude metadata visible to the embedding model, in case you DON'T want particular text to bias the embeddings.
+
+```python
+document.excluded_embed_metadata_keys = ['file_name']
+```
+
+Then, we can test what the embedding model will actually end up reading using the `get_content()` function and specifying `MetadataMode.EMBED`:
+
+```python
+from llama_index.schema import MetadataMode
+print(document.get_content(metadata_mode=MetadataMode.EMBED))
+```
+
+#### Customizing Metadata Format
+
+As you know by now, metadata is injected into the actual text of each document/node when sent to the LLM or embedding model. By default, the format of this metadata is controlled by three attributes:
+
+1. `Document.metadata_seperator` -> default = `"\n"`
+
+When concatenating all key/value fields of your metadata, this field controls the separator between each key/value pair.
+
+2. `Document.metadata_template` -> default = `"{key}: {value}"`
+
+This attribute controls how each key/value pair in your metadata is formatted. The `key` and `value` string keys are both required.
+
+3. `Document.text_template` -> default = `"{metadata_str}\n\n{content}"`
+
+Once your metadata is converted into a string using `metadata_seperator` and `metadata_template`, this template controls what that metadata looks like when joined with the text content of your document/node. The `metadata_str` and `content` string keys are required.
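+
+To see how the three attributes interact, the following is a plain-Python sketch of the formatting rules as described in this section. `render_content` is a hypothetical function written for illustration; it mirrors the description above rather than the library's actual code, and edge cases (such as empty metadata) may be handled differently by LlamaIndex.
+
+```python
+from typing import Dict, Iterable
+
+
+def render_content(
+    text: str,
+    metadata: Dict[str, str],
+    excluded_keys: Iterable[str] = (),
+    separator: str = "\n",
+    metadata_template: str = "{key}: {value}",
+    text_template: str = "{metadata_str}\n\n{content}",
+) -> str:
+    """Hypothetical re-implementation of the formatting rules described above."""
+    excluded = set(excluded_keys)
+    # Format each visible key/value pair, then join the pairs with the separator
+    metadata_str = separator.join(
+        metadata_template.format(key=key, value=value)
+        for key, value in metadata.items()
+        if key not in excluded
+    )
+    # Combine the metadata string with the text content
+    return text_template.format(metadata_str=metadata_str, content=text)
+
+
+rendered = render_content(
+    "This is a super-customized document",
+    {"file_name": "super_secret_document.txt", "category": "finance"},
+    excluded_keys=["file_name"],
+    separator="::",
+    metadata_template="{key}=>{value}",
+    text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
+)
+print(rendered)
+```
+
+Here the excluded `file_name` key never reaches the rendered string, while `category` is formatted with the custom template.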
+
+### Summary
+
+Knowing all this, let's create a short example using all this power:
+
+```python
+from llama_index import Document
+from llama_index.schema import MetadataMode
+
+document = Document(
+    text="This is a super-customized document",
+    metadata={
+        "file_name": "super_secret_document.txt",
+        "category": "finance",
+        "author": "LlamaIndex"    
+    },
+    excluded_llm_metadata_keys=['file_name'],
+    metadata_seperator="::",
+    metadata_template="{key}=>{value}",
+    text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
+)
+
+print("The LLM sees this: \n", document.get_content(metadata_mode=MetadataMode.LLM))
+print("The Embedding model sees this: \n", document.get_content(metadata_mode=MetadataMode.EMBED))
+```
+
+
+### Advanced - Automatic Metadata Extraction
+
+We have initial examples of using LLMs themselves to perform metadata extraction.
+
+Take a look here! 
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/metadata_extraction/MetadataExtractionSEC.ipynb
+```
@@ -0,0 +1,43 @@
+# Automated Metadata Extraction for Nodes
+
+You can use LLMs to automate metadata extraction with our `MetadataExtractor` modules.
+
+Our metadata extractor modules include the following "feature extractors":
+- `SummaryExtractor` - automatically extracts a summary over a set of Nodes
+- `QuestionsAnsweredExtractor` - extracts a set of questions that each Node can answer
+- `TitleExtractor` - extracts a title over the context of each Node
+
+You can use these feature extractors within our overall `MetadataExtractor` class. Then you can plug in the `MetadataExtractor` into our node parser:
+
+```python
+from llama_index.node_parser import SimpleNodeParser
+from llama_index.node_parser.extractors import (
+    MetadataExtractor,
+    TitleExtractor,
+    QuestionsAnsweredExtractor
+)
+from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
+
+text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)
+metadata_extractor = MetadataExtractor(
+    extractors=[
+        TitleExtractor(nodes=5),
+        QuestionsAnsweredExtractor(questions=3),
+    ],
+)
+
+node_parser = SimpleNodeParser(
+    text_splitter=text_splitter,
+    metadata_extractor=metadata_extractor,
+)
+# assume documents are defined -> extract nodes
+nodes = node_parser.get_nodes_from_documents(documents)
+```
+
+
+```{toctree}
+---
+caption: Metadata Extraction Guides
+maxdepth: 1
+---
+/examples/metadata_extraction/MetadataExtractionSEC.ipynb
+```
@@ -0,0 +1,35 @@
+# Defining and Customizing Nodes
+
+Nodes represent "chunks" of source Documents, whether that is a text chunk, an image, or more. They also contain metadata and relationship information
+with other nodes and index structures.
+
+Nodes are first-class citizens in LlamaIndex. You can choose to define Nodes and all of their attributes directly. You may also choose to "parse" source Documents into Nodes through our `NodeParser` classes.
+
+For instance, you can do
+
+```python
+from llama_index.node_parser import SimpleNodeParser
+
+parser = SimpleNodeParser()
+
+nodes = parser.get_nodes_from_documents(documents)
+```
+
+You can also choose to construct Node objects manually and skip the parsing step. For instance,
+
+```python
+from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo
+
+node1 = TextNode(text="<text_chunk>", id_="<node_id>")
+node2 = TextNode(text="<text_chunk>", id_="<node_id>")
+# set relationships
+node1.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=node2.node_id)
+node2.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=node1.node_id)
+nodes = [node1, node2]
+```
+
+The `RelatedNodeInfo` class can also store additional `metadata` if needed:
+
+```python
+node2.relationships[NodeRelationship.PARENT] = RelatedNodeInfo(node_id=node1.node_id, metadata={"key": "val"})
+```
@@ -0,0 +1,156 @@
+# Composability
+
+
+LlamaIndex offers **composability** of your indices, meaning that you can build indices on top of other indices. This allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.
+
+Composability allows you to define lower-level indices for each document, and higher-order indices over a collection of documents. To see how this works, imagine defining 1) a tree index for the text within each document, and 2) a list index over each tree index (one document) within your collection.
+
+### Defining Subindices
+To see how this works, imagine you have 3 documents: `doc1`, `doc2`, and `doc3`.
+
+```python
+from llama_index import SimpleDirectoryReader
+
+doc1 = SimpleDirectoryReader('data1').load_data()
+doc2 = SimpleDirectoryReader('data2').load_data()
+doc3 = SimpleDirectoryReader('data3').load_data()
+```
+
+![](/_static/composability/diagram_b0.png)
+
+Now let's define a tree index for each document. In order to persist the graph later, each index should share the same storage context.
+
+In Python, we have:
+
+```python
+from llama_index import StorageContext, TreeIndex
+
+storage_context = StorageContext.from_defaults()
+
+index1 = TreeIndex.from_documents(doc1, storage_context=storage_context)
+index2 = TreeIndex.from_documents(doc2, storage_context=storage_context)
+index3 = TreeIndex.from_documents(doc3, storage_context=storage_context)
+```
+
+![](/_static/composability/diagram_b1.png)
+
+### Defining Summary Text
+
+You then need to explicitly define *summary text* for each subindex. This allows  
+the subindices to be used as Documents for higher-level indices.
+
+```python
+index1_summary = "<summary1>"
+index2_summary = "<summary2>"
+index3_summary = "<summary3>"
+```
+
+You may choose to manually specify the summary text, or use LlamaIndex itself to generate
+a summary, for instance with the following:
+
+```python
+# queries go through a query engine built from the index
+query_engine = index1.as_query_engine(retriever_mode="all_leaf")
+summary = query_engine.query("What is a summary of this document?")
+index1_summary = str(summary)
+```
+
+**If specified**, this summary text for each subindex can be used to refine the answer during query-time. 
+
+### Creating a Graph with a Top-Level Index
+
+We can then create a graph with a list index on top of these 3 tree indices.
+We can query, save, and load the graph to/from disk like any other index.
+
+```python
+from llama_index import ListIndex
+from llama_index.indices.composability import ComposableGraph
+
+graph = ComposableGraph.from_indices(
+    ListIndex,
+    [index1, index2, index3],
+    index_summaries=[index1_summary, index2_summary, index3_summary],
+    storage_context=storage_context,
+)
+
+```
+
+![](/_static/composability/diagram.png)
+
+
+### Querying the Graph
+
+During a query, we would start with the top-level list index. Each node in the list corresponds to an underlying tree index. 
+The query will be executed recursively, starting from the root index, then the sub-indices.
+The default query engine for each index is called under the hood (i.e. `index.as_query_engine()`), unless otherwise configured by passing `custom_query_engines` to the `ComposableGraphQueryEngine`.
+Below we show an example that configures the tree index retrievers to use `child_branch_factor=2` (instead of the default `child_branch_factor=1`).
+
+
+More detail on how to configure `ComposableGraphQueryEngine` can be found [here](/api_reference/query/query_engines/graph_query_engine.rst).
+
+
+```python
+# set custom retrievers. An example is provided below
+custom_query_engines = {
+    index.index_id: index.as_query_engine(
+        child_branch_factor=2
+    ) 
+    for index in [index1, index2, index3]
+}
+query_engine = graph.as_query_engine(
+    custom_query_engines=custom_query_engines
+)
+response = query_engine.query("Where did the author grow up?")
+```
+
+> Note that specifying a custom query engine for an index by id
+> might require you to inspect e.g. `index1.index_id`.
+> Alternatively, you can explicitly set it as follows:
+```python
+index1.set_index_id("<index_id_1>")
+index2.set_index_id("<index_id_2>")
+index3.set_index_id("<index_id_3>")
+```
+
+![](/_static/composability/diagram_q1.png)
+
+So within a node, instead of fetching the text, we would recursively query the stored tree index to retrieve our answer.
+
+![](/_static/composability/diagram_q2.png)
+
+NOTE: You can stack indices as many times as you want, depending on the hierarchies of your knowledge base! 
+
+
+### [Optional] Persisting the Graph
+
+The graph can also be persisted to storage, and then loaded again when needed. Note that you'll need to set the 
+ID of the root index, or keep track of the default.
+
+```python
+# set the ID
+graph.root_index.set_index_id("my_id")
+
+# persist to storage
+graph.root_index.storage_context.persist(persist_dir="./storage")
+
+# load 
+from llama_index import StorageContext, load_graph_from_storage
+
+storage_context = StorageContext.from_defaults(persist_dir="./storage")
+graph = load_graph_from_storage(storage_context, root_id="my_id")
+```
+
+
+We can take a look at a code example below as well. We first build two tree indices, one over the Wikipedia NYC page, and the other over Paul Graham's essay. We then define a keyword extractor index over the two tree indices.
+
+[Here is an example notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/ComposableIndices.ipynb).
+
+
+```{toctree}
+---
+caption: Examples
+maxdepth: 1
+---
+../../../../examples/composable_indices/ComposableIndices-Prior.ipynb
+../../../../examples/composable_indices/ComposableIndices-Weaviate.ipynb
+../../../../examples/composable_indices/ComposableIndices.ipynb
+```
@@ -0,0 +1,109 @@
+# Document Management
+
+Most LlamaIndex index structures allow for **insertion**, **deletion**, **update**, and **refresh** operations.
+
+## Insertion
+
+You can "insert" a new Document into any index data structure, after building the index initially. This document will be broken down into nodes and ingested into the index.
+
+The underlying mechanism behind insertion depends on the index structure. For instance, for the list index, a new Document is inserted as additional node(s) in the list.
+For the vector store index, a new Document (and embeddings) is inserted into the underlying document/embedding store.
+
+An example notebook showcasing our insert capabilities is given [here](https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/InsertDemo.ipynb).
+In this notebook we showcase how to construct an empty index, manually create Document objects, and add those to our index data structures.
+
+An example code snippet is given below:
+
+```python
+from llama_index import ListIndex, Document
+
+index = ListIndex([])
+text_chunks = ['text_chunk_1', 'text_chunk_2', 'text_chunk_3']
+
+doc_chunks = []
+for i, text in enumerate(text_chunks):
+    doc = Document(text=text, id_=f"doc_id_{i}")
+    doc_chunks.append(doc)
+
+# insert
+for doc_chunk in doc_chunks:
+    index.insert(doc_chunk)
+```
+
+## Deletion
+
+You can "delete" a Document from most index data structures by specifying a document_id. (**NOTE**: the tree index currently does not support deletion). All nodes corresponding to the document will be deleted.
+
+```python
+index.delete_ref_doc("doc_id_0", delete_from_docstore=True)
+```
+
+`delete_from_docstore` defaults to `False` in case you are sharing nodes between indexes using the same docstore. Even when it is `False`, the deleted nodes will no longer be used when querying, because they are removed from the `index_struct` of the index, which keeps track of which nodes can be used for querying.
+
+## Update
+
+If a Document is already present within an index, you can "update" a Document with the same doc `id_` (for instance, if the information in the Document has changed).
+
+```python
+# NOTE: the document has a `doc_id` specified
+doc_chunks[0].text = "Brand new document text"
+index.update_ref_doc(
+    doc_chunks[0], 
+    update_kwargs={"delete_kwargs": {'delete_from_docstore': True}}
+)
+```
+
+Here, we passed some extra kwargs to ensure the document is deleted from the docstore. This is of course optional.
+
+## Refresh
+
+If you set the doc `id_` of each document when loading your data, you can also automatically refresh the index.
+
+The `refresh()` function will only update documents that have the same doc `id_` but different text contents. Any documents not present in the index at all will also be inserted.
+
+`refresh()` also returns a boolean list, indicating which documents in the input have been refreshed in the index.
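+
+The bookkeeping behind this behaviour can be sketched without LlamaIndex. In the sketch below, `refresh` is a hypothetical function operating on a plain dictionary from doc id to text; comparing raw text directly is an assumption made for brevity and is not necessarily how the library detects changes.
+
+```python
+from typing import Dict, List, Tuple
+
+
+def refresh(index: Dict[str, str], docs: List[Tuple[str, str]]) -> List[bool]:
+    """Hypothetical refresh over an `index` mapping doc id -> text."""
+    refreshed = []
+    for doc_id, text in docs:
+        # Update when the id is new or when the stored text differs
+        changed = index.get(doc_id) != text
+        if changed:
+            index[doc_id] = text
+        refreshed.append(changed)
+    return refreshed
+
+
+index = {"doc_id_0": "text_chunk_1", "doc_id_1": "text_chunk_2"}
+flags = refresh(
+    index,
+    [
+        ("doc_id_0", "Super new document text"),  # same id, new text -> updated
+        ("doc_id_1", "text_chunk_2"),  # unchanged -> skipped
+        ("doc_id_2", "Brand new document"),  # unseen id -> inserted
+    ],
+)
+print(flags)
+```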
+
+```python
+# modify first document, with the same doc_id
+doc_chunks[0] = Document(text='Super new document text', id_="doc_id_0")
+
+# add a new document
+doc_chunks.append(Document(text="This isn't in the index yet, but it will be soon!", id_="doc_id_3"))
+
+# refresh the index
+refreshed_docs = index.refresh_ref_docs(
+    doc_chunks,
+    update_kwargs={"delete_kwargs": {'delete_from_docstore': True}}
+)
+
+# refreshed_docs[0] and refreshed_docs[-1] should be True
+```
+
+Again, we passed some extra kwargs to ensure the document is deleted from the docstore. This is of course optional.
+
+If you `print()` the output of `refresh()`, you would see which input documents were refreshed:
+
+```python
+print(refreshed_docs)
+> [True, False, False, True]
+```
+
+This is most useful when you are reading from a directory that is constantly updating with new information.
+
+To automatically set the doc `id_` when using the `SimpleDirectoryReader`, you can set the `filename_as_id` flag. More details can be found [here](../customization/custom_documents.md).
+
+## Document Tracking
+
+For any index that uses the docstore (i.e. all indexes except for most vector store integrations), you can also see which documents you have inserted into the docstore.
+
+```python
+print(index.ref_doc_info)
+> {'doc_id_1': RefDocInfo(node_ids=['071a66a8-3c47-49ad-84fa-7010c6277479'], metadata={}), 
+   'doc_id_2': RefDocInfo(node_ids=['9563e84b-f934-41c3-acfd-22e88492c869'], metadata={}), 
+   'doc_id_0': RefDocInfo(node_ids=['b53e6c2f-16f7-4024-af4c-42890e945f36'], metadata={}), 
+   'doc_id_3': RefDocInfo(node_ids=['6bedb29f-15db-4c7c-9885-7490e10aa33f'], metadata={})}
+```
+
+Each entry in the output maps an ingested doc `id_` to the `node_ids` of the nodes it was split into.
+
+Lastly, the original `metadata` dictionary of each input document is also tracked. You can read more about the `metadata` attribute in [Customizing Documents](../customization/custom_documents.md).
@@ -0,0 +1,70 @@
+# How Each Index Works
+
+This guide describes how each index works with diagrams. 
+
+Some terminology:
+- **Node**: Corresponds to a chunk of text from a Document. LlamaIndex takes in Document objects and internally parses/chunks them into Node objects.
+- **Response Synthesis**: Our module which synthesizes a response given the retrieved Node. You can see how to 
+    [specify different response modes](setting-response-mode) here. 
+
+## List Index
+
+The list index simply stores Nodes as a sequential chain.
+
+![](/_static/indices/list.png)
+
+### Querying
+
+During query time, if no other query parameters are specified, LlamaIndex simply loads all Nodes in the list into
+our Response Synthesis module.
+
+![](/_static/indices/list_query.png)
+
+The list index also offers other ways of querying, such as an embedding-based query which
+will fetch the top-k neighbors, or the addition of a keyword filter, as seen below:
+
+![](/_static/indices/list_filter_query.png)
+
+
+## Vector Store Index
+
+The vector store index stores each Node and a corresponding embedding in a [Vector Store](vector-store-index).
+
+![](/_static/indices/vector_store.png)
+
+### Querying
+
+Querying a vector store index involves fetching the top-k most similar Nodes, and passing
+those into our Response Synthesis module.
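+
+As a minimal sketch of the top-k step, the snippet below ranks stored embeddings by cosine similarity to a query embedding. The two-dimensional vectors, the node ids, and the `top_k` helper are all hypothetical; a real vector store would use model-generated embeddings and its own similarity search.
+
+```python
+import math
+from typing import Dict, List, Sequence
+
+
+def cosine(a: Sequence[float], b: Sequence[float]) -> float:
+    # Cosine similarity; returns 0.0 if either vector has zero length
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+
+def top_k(query: Sequence[float], store: Dict[str, Sequence[float]], k: int) -> List[str]:
+    # Rank node ids by similarity to the query embedding, highest first
+    ranked = sorted(store, key=lambda node_id: cosine(query, store[node_id]), reverse=True)
+    return ranked[:k]
+
+
+store = {"node_a": [1.0, 0.0], "node_b": [0.7, 0.7], "node_c": [0.0, 1.0]}
+nearest = top_k([1.0, 0.1], store, k=2)
+```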
+
+![](/_static/indices/vector_store_query.png)
+
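To make top-k retrieval concrete, here is a small sketch that ranks Nodes by cosine similarity against a query embedding. The two-dimensional embeddings and the `top_k_nodes` helper are assumptions for illustration. A real vector store would use model-generated embeddings and its own search routine.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_nodes(
    query_embedding: Sequence[float],
    node_embeddings: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[str]:
    """Return the ids of the k nodes most similar to the query, best first."""
    ranked = sorted(
        node_embeddings,
        key=lambda node_id: cosine_similarity(query_embedding, node_embeddings[node_id]),
        reverse=True,
    )
    return ranked[:k]


node_embeddings = {
    "node_a": [1.0, 0.0],
    "node_b": [0.8, 0.6],
    "node_c": [0.0, 1.0],
}
top = top_k_nodes([1.0, 0.1], node_embeddings, k=2)
```

The selected Nodes would then be passed to response synthesis.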
+## Tree Index
+
+The tree index builds a hierarchical tree from a set of Nodes (which become leaf nodes in this tree).
+
+![](/_static/indices/tree.png)
+
+### Querying
+
+Querying a tree index involves traversing from root nodes down 
+to leaf nodes. By default (`child_branch_factor=1`), a query
+chooses one child node given a parent node. If `child_branch_factor=2`, a query
+chooses two child nodes per level.
+
+![](/_static/indices/tree_query.png)
+
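The traversal can be sketched as follows. This is a toy version under stated assumptions: `TreeNode`, `overlap_score`, and `traverse` are hypothetical names, and a word-overlap score stands in for whatever selection step the real index uses to pick children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    text: str
    children: List["TreeNode"] = field(default_factory=list)


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def traverse(nodes: List[TreeNode], query: str, child_branch_factor: int = 1) -> List[str]:
    """Walk from the given level down to the leaves, keeping the best
    `child_branch_factor` candidates at each level."""
    selected = sorted(
        nodes, key=lambda n: overlap_score(query, n.text), reverse=True
    )[:child_branch_factor]
    leaves: List[str] = []
    for node in selected:
        if node.children:
            leaves.extend(traverse(node.children, query, child_branch_factor))
        else:
            leaves.append(node.text)
    return leaves


tree = [
    TreeNode("animals summary", [TreeNode("cats purr"), TreeNode("dogs bark")]),
    TreeNode("vehicles summary", [TreeNode("cars drive"), TreeNode("boats float")]),
]
one = traverse(tree, "do dogs bark among animals", child_branch_factor=1)
two = traverse(tree, "do dogs bark among animals", child_branch_factor=2)
```

With a branch factor of 1 the query follows a single path to one leaf. A branch factor of 2 widens the search at every level, so more leaves reach response synthesis.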
+## Keyword Table Index
+
+The keyword table index extracts keywords from each Node and builds a mapping from 
+each keyword to the corresponding Nodes of that keyword.
+
+![](/_static/indices/keyword.png)
+
+### Querying
+
+During query time, we extract relevant keywords from the query, and match those with pre-extracted
+Node keywords to fetch the corresponding Nodes. The extracted Nodes are passed to our 
+Response Synthesis module.
+
+![](/_static/indices/keyword_query.png)
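A minimal sketch of a keyword table, assuming a naive keyword extractor (lowercased words minus a small stopword list). `extract_keywords`, `build_keyword_table`, and `retrieve` are hypothetical helpers. The real index may extract keywords differently, for example with an LLM.

```python
from collections import defaultdict
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "is", "in", "of", "and", "what", "where"}


def extract_keywords(text: str) -> Set[str]:
    """Naive keyword extraction: lowercase words with punctuation and stopwords removed."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def build_keyword_table(nodes: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each keyword to the ids of the nodes that contain it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def retrieve(table: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of nodes sharing at least one keyword with the query."""
    matched: Set[str] = set()
    for keyword in extract_keywords(query):
        matched |= table.get(keyword, set())
    return sorted(matched)


nodes = {
    "n1": "The Eiffel Tower is in Paris.",
    "n2": "Berlin is the capital of Germany.",
    "n3": "Paris is the capital of France.",
}
table = build_keyword_table(nodes)
hits = retrieve(table, "Where is Paris?")
```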
@@ -0,0 +1,71 @@
+# Metadata Extraction
+
+
+## Introduction
+In many cases, especially with long documents, a chunk of text may lack the context necessary to disambiguate the chunk from other similar chunks of text. 
+
+To combat this, we use LLMs to extract certain contextual information relevant to the document to better help the retrieval and language models disambiguate similar-looking passages.
+
+We show this in an [example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/metadata_extraction/MetadataExtractionSEC.ipynb) and demonstrate its effectiveness in processing long documents.
+
+## Usage
+
+First, we define a metadata extractor that takes in a list of feature extractors that will be processed in sequence.
+
+We then feed this to the node parser, which will add the additional metadata to each node.
+```python
+from llama_index.node_parser import SimpleNodeParser
+from llama_index.node_parser.extractors import (
+    MetadataExtractor,
+    SummaryExtractor,
+    QuestionsAnsweredExtractor,
+    TitleExtractor,
+    KeywordExtractor,
+)
+
+metadata_extractor = MetadataExtractor(
+    extractors=[
+        TitleExtractor(nodes=5),
+        QuestionsAnsweredExtractor(questions=3),
+        SummaryExtractor(summaries=["prev", "self"]),
+        KeywordExtractor(keywords=10),
+    ],
+)
+
+node_parser = SimpleNodeParser(
+    metadata_extractor=metadata_extractor,
+)
+```
+
+Here is a sample of extracted metadata:
+
+```
+{'page_label': '2',
+ 'file_name': '10k-132.pdf',
+ 'document_title': 'Uber Technologies, Inc. 2019 Annual Report: Revolutionizing Mobility and Logistics Across 69 Countries and 111 Million MAPCs with $65 Billion in Gross Bookings',
+ 'questions_this_excerpt_can_answer': '\n\n1. How many countries does Uber Technologies, Inc. operate in?\n2. What is the total number of MAPCs served by Uber Technologies, Inc.?\n3. How much gross bookings did Uber Technologies, Inc. generate in 2019?',
+ 'prev_section_summary': "\n\nThe 2019 Annual Report provides an overview of the key topics and entities that have been important to the organization over the past year. These include financial performance, operational highlights, customer satisfaction, employee engagement, and sustainability initiatives. It also provides an overview of the organization's strategic objectives and goals for the upcoming year.",
+ 'section_summary': '\nThis section discusses a global tech platform that serves multiple multi-trillion dollar markets with products leveraging core technology and infrastructure. It enables consumers and drivers to tap a button and get a ride or work. The platform has revolutionized personal mobility with ridesharing and is now leveraging its platform to redefine the massive meal delivery and logistics industries. The foundation of the platform is its massive network, leading technology, operational excellence, and product expertise.',
+ 'excerpt_keywords': '\nRidesharing, Mobility, Meal Delivery, Logistics, Network, Technology, Operational Excellence, Product Expertise, Point A, Point B'}
+```
+
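To illustrate what "processed in sequence" means, the sketch below runs a list of toy extractors over each chunk and merges their outputs into one metadata dictionary. The rule-based `title_extractor` and `keyword_extractor` are hypothetical stand-ins. The real extractors call an LLM.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str], Dict[str, str]]


def title_extractor(text: str) -> Dict[str, str]:
    """Toy rule: treat the first sentence as the title."""
    return {"document_title": text.split(".")[0].strip()}


def keyword_extractor(text: str) -> Dict[str, str]:
    """Toy rule: treat words longer than six characters as keywords."""
    words = {w.strip(".,").lower() for w in text.split()}
    keywords = sorted(w for w in words if len(w) > 6)
    return {"excerpt_keywords": ", ".join(keywords)}


def run_extractors(chunks: List[str], extractors: List[Extractor]) -> List[Dict[str, str]]:
    """Apply every extractor to every chunk, in order, merging the results."""
    results = []
    for chunk in chunks:
        metadata: Dict[str, str] = {}
        for extractor in extractors:
            metadata.update(extractor(chunk))
        results.append(metadata)
    return results


metadata_list = run_extractors(
    ["Uber operates in many countries. It offers ridesharing."],
    [title_extractor, keyword_extractor],
)
```

Each chunk ends up with one dictionary holding a key per extractor, much like the sample output above.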
+## Custom Extractors
+
+If the provided extractors do not fit your needs, you can also define a custom extractor like so:
+```python
+from typing import Dict, List
+
+from llama_index.node_parser.extractors import MetadataFeatureExtractor
+
+class CustomExtractor(MetadataFeatureExtractor):
+    def extract(self, nodes) -> List[Dict]:
+        metadata_list = [
+            {
+                "custom": node.metadata["document_title"]
+                + "\n"
+                + node.metadata["excerpt_keywords"]
+            }
+            for node in nodes
+        ]
+        return metadata_list
+```
+
+In a more advanced example, it can also make use of an `llm_predictor` to extract features from the node content and the existing metadata. Refer to the [source code of the provided metadata extractors](https://github.com/jerryjliu/llama_index/blob/main/llama_index/node_parser/extractors/metadata_extractors.py) for more details.
@@ -0,0 +1,16 @@
+# Module Guides
+
+```{toctree}
+---
+maxdepth: 1
+---
+vector_store_guide.ipynb
+List Index <./index_guide.md>
+Tree Index <./index_guide.md>
+Keyword Table Index <./index_guide.md>
+/examples/index_structs/knowledge_graph/KnowledgeGraphDemo.ipynb
+/examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.ipynb
+SQL Index </examples/index_structs/struct_indices/SQLIndexDemo.ipynb>
+/examples/index_structs/struct_indices/duckdb_sql_query.ipynb
+/examples/index_structs/doc_summary/DocSummary.ipynb
+```
@@ -0,0 +1,56 @@
+# Indexes
+
+## Concept
+An `Index` is a data structure that allows us to quickly retrieve relevant context for a user query.
+For LlamaIndex, it's the core foundation for retrieval-augmented generation (RAG) use-cases.
+
+
+At a high level, `Indices` are built from [Documents](/core_modules/data_modules/documents_and_nodes/root.md).
+They are used to build [Query Engines](/core_modules/query_modules/query_engine/root.md) and [Chat Engines](/core_modules/query_modules/chat_engines/root.md),
+which enable question & answer and chat over your data.  
+
+Under the hood, `Indices` store data in `Node` objects (which represent chunks of the original documents), and expose a [Retriever](/core_modules/query_modules/retriever/root.md) interface that supports additional configuration and automation.
+
+For a more in-depth explanation, check out our guide below:
+```{toctree}
+---
+maxdepth: 1
+---
+index_guide.md
+```
+
+
+
+## Usage Pattern
+Get started with:
+```python
+from llama_index import VectorStoreIndex
+
+index = VectorStoreIndex.from_documents(docs)
+```
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+
+## Modules
+
+```{toctree}
+---
+maxdepth: 2
+---
+modules.md
+```
+
+## Advanced Concepts
+
+```{toctree}
+---
+maxdepth: 1
+---
+composability.md
+```
@@ -0,0 +1,88 @@
+# Usage Pattern
+
+## Get Started
+
+Build an index from documents:
+
+```python
+from llama_index import VectorStoreIndex
+
+index = VectorStoreIndex.from_documents(docs)
+```
+
+```{tip}
+To learn how to load documents, see [Data Connectors](/core_modules/data_modules/connector/root.md)
+```
+
+### What is happening under the hood?
+
+1. Documents are chunked and parsed into `Node` objects (lightweight abstractions over text strings that additionally keep track of metadata and relationships).
+2. Additional computation is performed to add each `Node` to the index data structure
+   > Note: the computation is index-specific.
+   >
+   > - For a vector store index, this means calling an embedding model (via API or locally) to compute embedding for the `Node` objects
+   > - For a document summary index, this means calling an LLM to generate a summary
+
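The first step can be sketched roughly as follows. This is a simplified illustration, not the library's parser: `NodeSketch` and `parse_document` are hypothetical names, and chunks are cut by character count rather than by tokens.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeSketch:
    """Hypothetical node: a text chunk plus metadata and neighbour links."""

    node_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def parse_document(
    doc_id: str, text: str, metadata: Dict[str, str], chunk_chars: int = 20
) -> List[NodeSketch]:
    """Split a document into fixed-size chunks that inherit its metadata."""
    pieces = [text[i : i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    nodes = [
        NodeSketch(f"{doc_id}-{i}", piece, dict(metadata))
        for i, piece in enumerate(pieces)
    ]
    # Record previous/next relationships between consecutive chunks.
    for prev_node, next_node in zip(nodes, nodes[1:]):
        prev_node.next_id = next_node.node_id
        next_node.prev_id = prev_node.node_id
    return nodes


nodes = parse_document("doc", "LlamaIndex splits documents into nodes.", {"author": "me"})
```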
+## Configuring Document Parsing
+
+The most common configuration you might want to change is how documents are parsed into `Node` objects.
+
+### High-Level API
+
+We can configure our service context to use the desired chunk size and set `show_progress` to display a progress bar during index construction.
+
+```python
+from llama_index import ServiceContext, VectorStoreIndex
+
+service_context = ServiceContext.from_defaults(chunk_size=512)
+index = VectorStoreIndex.from_documents(
+    docs,
+    service_context=service_context,
+    show_progress=True
+)
+```
+
+> Note: While the high-level API optimizes for ease of use, it does _NOT_ expose the full range of configurability.
+
+### Low-Level API
+
+You can use the low-level composition API if you need more granular control.
+
+Here we show an example where you want to modify the text chunk size, disable injecting metadata, and disable creating `Node` relationships.  
+The steps are:
+
+1. Configure a node parser
+
+```python
+from llama_index.node_parser import SimpleNodeParser
+
+parser = SimpleNodeParser.from_defaults(
+    chunk_size=512,
+    include_metadata=False,
+    include_prev_next_rel=False,
+)
+```
+
+2. Parse document into `Node` objects
+
+```python
+nodes = parser.get_nodes_from_documents(documents)
+```
+
+3. Build index from `Node` objects
+
+```python
+index = VectorStoreIndex(nodes)
+```
+
+## Handling Document Update
+
+Read more about how to deal with data sources that change over time with `Index` **insertion**, **deletion**, **update**, and **refresh** operations.
+
+```{toctree}
+---
+maxdepth: 1
+---
+metadata_extraction.md
+document_management.md
+```
@@ -0,0 +1,321 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Vector Store Index\n",
+    "\n",
+    "In this guide, we show how to use the vector store index with different vector store\n",
+    "implementations.  \n",
+    " \n",
+    "We start with a few lines of code using the default\n",
+    "in-memory vector store and default query configuration, then move on to a custom hosted vector\n",
+    "store with advanced settings such as metadata filters.\n"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Construct vector store and index\n",
+    "**Default**\n",
+    "\n",
+    "By default, `VectorStoreIndex` uses an in-memory `SimpleVectorStore`\n",
+    "that's initialized as part of the default storage context."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
+    "\n",
+    "# Load documents and build index\n",
+    "documents = SimpleDirectoryReader(\"../../examples/data/paul_graham\").load_data()\n",
+    "index = VectorStoreIndex.from_documents(documents)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "**Custom vector stores**\n",
+    "\n",
+    "You can use a custom vector store (in this case `PineconeVectorStore`) as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pinecone\n",
+    "from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext\n",
+    "from llama_index.vector_stores import PineconeVectorStore\n",
+    "\n",
+    "# init pinecone\n",
+    "pinecone.init(api_key=\"<api_key>\", environment=\"<environment>\")\n",
+    "pinecone.create_index(\"quickstart\", dimension=1536, metric=\"euclidean\", pod_type=\"p1\")\n",
+    "\n",
+    "# construct vector store and customize storage context\n",
+    "storage_context = StorageContext.from_defaults(\n",
+    "    vector_store=PineconeVectorStore(pinecone.Index(\"quickstart\"))\n",
+    ")\n",
+    "\n",
+    "# Load documents and build index\n",
+    "documents = SimpleDirectoryReader(\"../../examples/data/paul_graham\").load_data()\n",
+    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "For more examples of how to initialize different vector stores, \n",
+    "see [Vector Store Integrations](/how_to/integrations/vector_stores.md)."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Connect to external vector stores (with existing embeddings)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you have already computed embeddings and dumped them into an external vector store (e.g. Pinecone, Chroma), you can use it with LlamaIndex by:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vector_store = PineconeVectorStore(pinecone.Index(\"quickstart\"))\n",
+    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### Query\n",
+    "**Default**  \n",
+    "\n",
+    "You can start querying by getting the default query engine:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "query_engine = index.as_query_engine()\n",
+    "response = query_engine.query(\"What did the author do growing up?\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "**Configure standard query setting**  \n",
+    "\n",
+    "To configure query settings, you can pass them directly as\n",
+    "keyword args when building the query engine: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters\n",
+    "\n",
+    "query_engine = index.as_query_engine(\n",
+    "    similarity_top_k=3,\n",
+    "    vector_store_query_mode=\"default\",\n",
+    "    filters=MetadataFilters(\n",
+    "        filters=[\n",
+    "            ExactMatchFilter(key=\"name\", value=\"paul graham\"),\n",
+    "        ]\n",
+    "    ),\n",
+    "    alpha=None,\n",
+    "    doc_ids=None,\n",
+    ")\n",
+    "response = query_engine.query(\"what did the author do growing up?\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that metadata filtering is applied against metadata specified in `Node.metadata`."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, if you are using the lower-level compositional API:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index import get_response_synthesizer\n",
+    "from llama_index.indices.vector_store.retrievers import VectorIndexRetriever\n",
+    "from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine\n",
+    "\n",
+    "# build retriever\n",
+    "retriever = VectorIndexRetriever(\n",
+    "    index=index,\n",
+    "    similarity_top_k=3,\n",
+    "    vector_store_query_mode=\"default\",\n",
+    "    filters=MetadataFilters(filters=[ExactMatchFilter(key=\"name\", value=\"paul graham\")]),\n",
+    "    alpha=None,\n",
+    "    doc_ids=None,\n",
+    ")\n",
+    "\n",
+    "# build query engine\n",
+    "query_engine = RetrieverQueryEngine(\n",
+    "    retriever=retriever, response_synthesizer=get_response_synthesizer()\n",
+    ")\n",
+    "\n",
+    "# query\n",
+    "response = query_engine.query(\"what did the author do growing up?\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Configure vector store specific keyword arguments**  \n",
+    "\n",
+    "You can customize keyword arguments unique to a specific vector store implementation as well by passing in `vector_store_kwargs`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "query_engine = index.as_query_engine(\n",
+    "    similarity_top_k=3,\n",
+    "    # only works for pinecone\n",
+    "    vector_store_kwargs={\n",
+    "        \"filter\": {\"name\": \"paul graham\"},\n",
+    "    },\n",
+    ")\n",
+    "response = query_engine.query(\"what did the author do growing up?\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Use an auto retriever**\n",
+    "\n",
+    "You can also use an LLM to automatically decide query settings for you! \n",
+    "Right now, we support automatically setting exact match metadata filters and top k parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index import get_response_synthesizer\n",
+    "from llama_index.indices.vector_store.retrievers import VectorIndexAutoRetriever\n",
+    "from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine\n",
+    "from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo\n",
+    "\n",
+    "\n",
+    "vector_store_info = VectorStoreInfo(\n",
+    "    content_info=\"brief biography of celebrities\",\n",
+    "    metadata_info=[\n",
+    "        MetadataInfo(\n",
+    "            name=\"category\",\n",
+    "            type=\"str\",\n",
+    "            description=\"Category of the celebrity, one of [Sports, Entertainment, Business, Music]\",\n",
+    "        ),\n",
+    "        MetadataInfo(\n",
+    "            name=\"country\",\n",
+    "            type=\"str\",\n",
+    "            description=\"Country of the celebrity, one of [United States, Barbados, Portugal]\",\n",
+    "        ),\n",
+    "    ],\n",
+    ")\n",
+    "\n",
+    "# build retriever\n",
+    "retriever = VectorIndexAutoRetriever(index, vector_store_info=vector_store_info)\n",
+    "\n",
+    "# build query engine\n",
+    "query_engine = RetrieverQueryEngine(\n",
+    "    retriever=retriever, response_synthesizer=get_response_synthesizer()\n",
+    ")\n",
+    "\n",
+    "# query\n",
+    "response = query_engine.query(\"Tell me about two celebrities from United States\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "llama",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
@@ -0,0 +1,24 @@
+# Node Parser
+
+## Concept
+
+Node parsers are a simple abstraction that take a list of documents and chunk them into `Node` objects, such that each node is a specific size. When a document is broken into nodes, all of its attributes are inherited by the child nodes (i.e. `metadata`, text and metadata templates, etc.). You can read more about `Node` and `Document` properties [here](/core_modules/data_modules/documents_and_nodes/root.md).
+
+A node parser can configure the chunk size (in tokens) as well as any overlap between chunked nodes. The chunking is done using a `TokenTextSplitter`, which defaults to a chunk size of 1024 tokens and a chunk overlap of 20 tokens.
+
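To show how chunk size and overlap interact, here is a small sketch that splits a token list into overlapping windows. It assumes whitespace-separated words as "tokens" and uses a hypothetical `chunk_tokens` helper. The actual splitter works on model tokens and also considers separators.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of `chunk_size`, where consecutive windows
    share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


tokens = "one two three four five six seven eight nine ten".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
```

With a chunk size of 4 and an overlap of 1, each chunk begins with the last token of the previous chunk, so some context carries across chunk boundaries.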
+## Usage Pattern
+
+```python
+from llama_index.node_parser import SimpleNodeParser
+
+node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
+```
+
+You can find more usage details and available customization options below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+usage_pattern.md
+```
@@ -0,0 +1,80 @@
+# Usage Pattern
+
+## Getting Started
+
+Node parsers can be used on their own:
+
+```python
+from llama_index import Document
+from llama_index.node_parser import SimpleNodeParser
+
+node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
+
+nodes = node_parser.get_nodes_from_documents([Document(text="long text")], show_progress=False)
+```
+
+Or set inside a `ServiceContext` to be used automatically when an index is constructed using `.from_documents()`:
+
+```python
+from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
+from llama_index.node_parser import SimpleNodeParser
+
+documents = SimpleDirectoryReader("./data").load_data()
+
+node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
+service_context = ServiceContext.from_defaults(node_parser=node_parser)
+
+index = VectorStoreIndex.from_documents(documents, service_context=service_context)
+```
+
+## Customization
+
+There are several options available to customize:
+
+- `text_splitter` (defaults to `TokenTextSplitter`) - the text splitter used to split text into chunks.
+- `include_metadata` (defaults to `True`) - whether or not `Node`s should inherit the document metadata.
+- `include_prev_next_rel` (defaults to `True`) - whether or not to include previous/next relationships between chunked `Node`s
+- `metadata_extractor` (defaults to `None`) - extra processing to extract helpful metadata. See [here for details](/core_modules/data_modules/documents_and_nodes/usage_metadata_extractor.md).
+
+If you don't want to change the `text_splitter`, you can use `SimpleNodeParser.from_defaults()` to easily change the chunk size and chunk overlap. The defaults are 1024 and 20 respectively.
+
+```python
+from llama_index.node_parser import SimpleNodeParser
+
+node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
+```
+
+### Text Splitter Customization
+
+If you do customize the `text_splitter` from the default `TokenTextSplitter`, you can use any splitter from LangChain, or optionally our `SentenceSplitter`. Each text splitter has options for the default separator, as well as options for backup separators. These are useful for languages that are sufficiently different from English.
+
+`TokenTextSplitter` configuration:
+
+```python
+from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
+from llama_index.node_parser import SimpleNodeParser
+
+text_splitter = TokenTextSplitter(
+  separator=" ",
+  chunk_size=1024,
+  chunk_overlap=20,
+  backup_separators=["\n"]
+)
+
+node_parser = SimpleNodeParser(text_splitter=text_splitter)
+```
+
+`SentenceSplitter` configuration:
+
+```python
+from llama_index.langchain_helpers.text_splitter import SentenceSplitter
+from llama_index.node_parser import SimpleNodeParser
+
+text_splitter = SentenceSplitter(
+  separator=" ",
+  chunk_size=1024,
+  chunk_overlap=20,
+  backup_separators=["\n"],
+  paragraph_separator="\n\n\n"
+)
+
+node_parser = SimpleNodeParser(text_splitter=text_splitter)
+```
@@ -0,0 +1,134 @@
+# Customizing Storage
+
+By default, LlamaIndex hides away the complexities and lets you query your data in under 5 lines of code:
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
+
+documents = SimpleDirectoryReader('data').load_data()
+index = VectorStoreIndex.from_documents(documents)
+query_engine = index.as_query_engine()
+response = query_engine.query("Summarize the documents.")
+```
+
+Under the hood, LlamaIndex also supports a swappable **storage layer** that allows you to customize where ingested documents (i.e., `Node` objects), embedding vectors, and index metadata are stored.
+
+
+![](/_static/storage/storage.png)
+
+### Low-Level API
+To do this, instead of the high-level API,
+```python
+index = VectorStoreIndex.from_documents(documents)
+```
+we use a lower-level API that gives more granular control:
+```python
+from llama_index import StorageContext, VectorStoreIndex
+from llama_index.storage.docstore import SimpleDocumentStore
+from llama_index.storage.index_store import SimpleIndexStore
+from llama_index.vector_stores import SimpleVectorStore
+from llama_index.node_parser import SimpleNodeParser
+
+# create parser and parse document into nodes 
+parser = SimpleNodeParser()
+nodes = parser.get_nodes_from_documents(documents)
+
+# create storage context using default stores
+storage_context = StorageContext.from_defaults(
+    docstore=SimpleDocumentStore(),
+    vector_store=SimpleVectorStore(),
+    index_store=SimpleIndexStore(),
+)
+
+# create (or load) docstore and add nodes
+storage_context.docstore.add_documents(nodes)
+
+# build index
+index = VectorStoreIndex(nodes, storage_context=storage_context)
+
+# save index
+index.storage_context.persist(persist_dir="<persist_dir>")
+
+# can also set index_id to save multiple indexes to the same folder
+index.set_index_id("<index_id>")
+index.storage_context.persist(persist_dir="<persist_dir>")
+
+# to load index later, make sure you set up the storage context
+# this will load the persisted stores from persist_dir
+storage_context = StorageContext.from_defaults(
+    persist_dir="<persist_dir>"
+)
+
+# then load the index object
+from llama_index import load_index_from_storage
+loaded_index = load_index_from_storage(storage_context)
+
+# if loading an index from a persist_dir containing multiple indexes
+loaded_index = load_index_from_storage(storage_context, index_id="<index_id>")
+
+# if loading multiple indexes from a persist dir
+from llama_index import load_indices_from_storage
+loaded_indices = load_indices_from_storage(storage_context, index_ids=["<index_id>", ...])
+```
+
+You can customize the underlying storage with a one-line change to instantiate different document stores, index stores, and vector stores.
+See [Document Stores](./docstores.md), [Vector Stores](./vector_stores.md), [Index Stores](./index_stores.md) guides for more details.
+
+For saving and loading a graph/composable index, see the [full guide here](../index/composability.md).
+
+### Vector Store Integrations and Storage
+
+Most of our vector store integrations store the entire index (vectors + text) in the vector store itself. This comes with the major benefit of not having to explicitly persist the index as shown above, since the vector store is already hosted and persisting the data in our index.
+
+The vector stores that support this practice are:
+
+- ChatGPTRetrievalPluginClient
+- ChromaVectorStore
+- DocArrayHnswVectorStore
+- DocArrayInMemoryVectorStore
+- LanceDBVectorStore
+- MetalVectorStore
+- MilvusVectorStore
+- MyScaleVectorStore
+- OpensearchVectorStore
+- PineconeVectorStore
+- QdrantVectorStore
+- RedisVectorStore
+- WeaviateVectorStore
+
+A small example using Pinecone is below:
+
+```python
+import pinecone
+from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
+from llama_index.vector_stores import PineconeVectorStore
+
+# Creating a Pinecone index
+api_key = "api_key"
+pinecone.init(api_key=api_key, environment="us-west1-gcp")
+pinecone.create_index(
+    "quickstart",
+    dimension=1536,
+    metric="euclidean",
+    pod_type="p1"
+)
+index = pinecone.Index("quickstart")
+
+# construct vector store
+vector_store = PineconeVectorStore(pinecone_index=index)
+
+# create storage context
+storage_context = StorageContext.from_defaults(vector_store=vector_store)
+
+# load documents
+documents = SimpleDirectoryReader("./data").load_data()
+
+# create index, which will insert documents/vectors to pinecone
+index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
+```
+
+If you have an existing vector store with data already loaded in, 
+you can connect to it and directly create a `VectorStoreIndex` as follows:
+
+```python
+index = pinecone.Index("quickstart")
+vector_store = PineconeVectorStore(pinecone_index=index)
+loaded_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
+```
@@ -0,0 +1,74 @@
+# Document Stores
+Document stores contain ingested document chunks, which we call `Node` objects.
+
+See the [API Reference](/api_reference/storage/docstore.rst) for more details.
+
+
+### Simple Document Store
+By default, the `SimpleDocumentStore` stores `Node` objects in-memory. 
+They can be persisted to (and loaded from) disk by calling `docstore.persist()` (and `SimpleDocumentStore.from_persist_path(...)` respectively).
+
+### MongoDB Document Store
+We support MongoDB as an alternative document store backend that persists data as `Node` objects are ingested.
+```python
+from llama_index import StorageContext, VectorStoreIndex
+from llama_index.storage.docstore import MongoDocumentStore
+from llama_index.node_parser import SimpleNodeParser
+
+# create parser and parse document into nodes 
+parser = SimpleNodeParser()
+nodes = parser.get_nodes_from_documents(documents)
+
+# create (or load) docstore and add nodes
+docstore = MongoDocumentStore.from_uri(uri="<mongodb+srv://...>")
+docstore.add_documents(nodes)
+
+# create storage context
+storage_context = StorageContext.from_defaults(docstore=docstore)
+
+# build index
+index = VectorStoreIndex(nodes, storage_context=storage_context)
+```
+
+Under the hood, `MongoDocumentStore` connects to a fixed MongoDB database and initializes new collections (or loads existing collections) for your nodes.
+> Note: You can configure the `db_name` and `namespace` when instantiating `MongoDocumentStore`, otherwise they default to `db_name="db_docstore"` and `namespace="docstore"`.
+
+Note that it's not necessary to call `storage_context.persist()` (or `docstore.persist()`) when using a `MongoDocumentStore`
+since data is persisted by default. 
+
+You can easily reconnect to your MongoDB collection and reload the index by re-initializing a `MongoDocumentStore` with an existing `db_name` and `namespace`.
+
+A more complete example can be found [here](../../examples/docstore/MongoDocstoreDemo.ipynb)
+
+### Redis Document Store
+
+We support Redis as an alternative document store backend that persists data as `Node` objects are ingested.
+
+```python
+from llama_index import StorageContext, VectorStoreIndex
+from llama_index.storage.docstore import RedisDocumentStore
+from llama_index.node_parser import SimpleNodeParser
+
+# create parser and parse document into nodes 
+parser = SimpleNodeParser()
+nodes = parser.get_nodes_from_documents(documents)
+
+# create (or load) docstore and add nodes
+docstore = RedisDocumentStore.from_host_and_port(
+  host="127.0.0.1", 
+  port="6379", 
+  namespace='llama_index'
+)
+docstore.add_documents(nodes)
+
+# create storage context
+storage_context = StorageContext.from_defaults(docstore=docstore)
+
+# build index
+index = VectorStoreIndex(nodes, storage_context=storage_context)
+```
+
+Under the hood, `RedisDocumentStore` connects to a Redis database and adds your nodes to a namespace stored under `{namespace}/docs`.
+> Note: You can configure the `namespace` when instantiating `RedisDocumentStore`, otherwise it defaults to `namespace="docstore"`.
+
+You can easily reconnect to your Redis client and reload the index by re-initializing a `RedisDocumentStore` with an existing `host`, `port`, and `namespace`.
+
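The namespacing idea can be sketched with a plain dictionary standing in for the database. `NamespacedDocstoreSketch` is a hypothetical class for illustration only. It mimics the `{namespace}/docs` key layout described above and is not the `RedisDocumentStore` implementation.

```python
from typing import Dict, Optional


class NamespacedDocstoreSketch:
    """Toy document store that keeps nodes under a `{namespace}/docs` collection."""

    def __init__(self, backend: Dict[str, Dict[str, str]], namespace: str = "docstore") -> None:
        self._backend = backend  # stands in for a shared database
        self._collection = f"{namespace}/docs"

    def add_document(self, doc_id: str, text: str) -> None:
        self._backend.setdefault(self._collection, {})[doc_id] = text

    def get_document(self, doc_id: str) -> Optional[str]:
        return self._backend.get(self._collection, {}).get(doc_id)


shared_backend: Dict[str, Dict[str, str]] = {}

store = NamespacedDocstoreSketch(shared_backend, namespace="llama_index")
store.add_document("node-1", "hello")

# "Reconnecting" with the same namespace sees the previously stored node.
reconnected = NamespacedDocstoreSketch(shared_backend, namespace="llama_index")
# A different namespace is isolated from it.
other = NamespacedDocstoreSketch(shared_backend, namespace="other")
```

Reusing the same namespace against the same backend is what makes stored nodes visible again after re-initializing the store.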
+A more complete example can be found [here](../../examples/docstore/RedisDocstoreIndexStoreDemo.ipynb)
@@ -0,0 +1,75 @@
+# Index Stores
+
+Index stores contain lightweight index metadata (i.e. additional state information created when building an index).
+
+See the [API Reference](/api_reference/storage/index_store.rst) for more details.
+
+### Simple Index Store
+By default, LlamaIndex uses a simple index store backed by an in-memory key-value store.
+It can be persisted to (and loaded from) disk by calling `index_store.persist()` (and `SimpleIndexStore.from_persist_path(...)`, respectively).
+
+
+### MongoDB Index Store
+Similarly to document stores, we can also use `MongoDB` as the storage backend of the index store.
+
+
+```python
+from llama_index.storage.index_store import MongoIndexStore
+from llama_index import StorageContext, VectorStoreIndex
+
+# create (or load) index store
+index_store = MongoIndexStore.from_uri(uri="<mongodb+srv://...>")
+
+# create storage context
+storage_context = StorageContext.from_defaults(index_store=index_store)
+
+# build index
+index = VectorStoreIndex(nodes, storage_context=storage_context)
+
+# or alternatively, load index
+from llama_index import load_index_from_storage
+index = load_index_from_storage(storage_context)
+```
+
+Under the hood, `MongoIndexStore` connects to a fixed MongoDB database and initializes new collections (or loads existing collections) for your index metadata.
+> Note: You can configure the `db_name` and `namespace` when instantiating `MongoIndexStore`, otherwise they default to `db_name="db_docstore"` and `namespace="docstore"`.
+
+Note that it's not necessary to call `storage_context.persist()` (or `index_store.persist()`) when using a `MongoIndexStore`
+since data is persisted by default. 
+
+You can easily reconnect to your MongoDB collection and reload the index by re-initializing a `MongoIndexStore` with an existing `db_name` and `collection_name`.
+
+A more complete example can be found [here](../../examples/docstore/MongoDocstoreDemo.ipynb)
+
+### Redis Index Store
+
+We support Redis as an alternative index store backend that persists index metadata as it is created.
+
+```python
+from llama_index.storage.index_store import RedisIndexStore
+from llama_index import StorageContext, VectorStoreIndex
+
+# create (or load) index store
+index_store = RedisIndexStore.from_host_and_port(
+  host="127.0.0.1", 
+  port="6379", 
+  namespace='llama_index'
+)
+
+# create storage context
+storage_context = StorageContext.from_defaults(index_store=index_store)
+
+# build index
+index = VectorStoreIndex(nodes, storage_context=storage_context)
+
+# or alternatively, load index
+from llama_index import load_index_from_storage
+index = load_index_from_storage(storage_context)
+```
+
+Under the hood, `RedisIndexStore` connects to a Redis database and adds your index metadata to a namespace stored under `{namespace}/index`.
+> Note: You can configure the `namespace` when instantiating `RedisIndexStore`; otherwise it defaults to `namespace="index_store"`.
+
+You can easily reconnect to your Redis client and reload the index by re-initializing a `RedisIndexStore` with an existing `host`, `port`, and `namespace`.
+
+A more complete example can be found [here](../../examples/docstore/RedisDocstoreIndexStoreDemo.ipynb)
@@ -0,0 +1,11 @@
+# Key-Value Stores
+
+Key-Value stores are the underlying storage abstractions that power our [Document Stores](./docstores.md) and [Index Stores](./index_stores.md).
+
+We provide the following key-value stores:
+- **Simple Key-Value Store**: An in-memory KV store. You can call `persist` on this KV store to persist data to disk.
+- **MongoDB Key-Value Store**: A MongoDB KV store.
+
+See the [API Reference](/api_reference/storage/kv_store.rst) for more details.
+
+Note: At the moment, these storage abstractions are not externally facing.
@@ -0,0 +1,91 @@
+# Storage
+
+## Concept
+
+LlamaIndex provides a high-level interface for ingesting, indexing, and querying your external data.
+
+Under the hood, LlamaIndex also supports swappable **storage components** that allow you to customize:
+
+- **Document stores**: where ingested documents (i.e., `Node` objects) are stored,
+- **Index stores**: where index metadata are stored,
+- **Vector stores**: where embedding vectors are stored.
+
+The Document/Index stores rely on a common Key-Value store abstraction, which is also detailed below.
+
+LlamaIndex supports persisting data to any storage backend supported by [fsspec](https://filesystem-spec.readthedocs.io/en/latest/index.html). 
+We have confirmed support for the following storage backends:
+
+- Local filesystem
+- AWS S3
+- Cloudflare R2
+
+
+![](/_static/storage/storage.png)
+
+## Usage Pattern
+
+Many vector stores (except FAISS) will store both the data as well as the index (embeddings). This means that you will not need to use a separate document store or index store. This *also* means that you will not need to explicitly persist this data - this happens automatically. Usage would look something like the following to build a new index / reload an existing one.
+
+```python
+
+## build a new index
+from llama_index import VectorStoreIndex, StorageContext
+from llama_index.vector_stores import DeepLakeVectorStore
+# construct vector store and customize storage context
+vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
+storage_context = StorageContext.from_defaults(
+    vector_store = vector_store
+)
+# Load documents and build index
+index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
+
+
+## reload an existing one
+index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
+```
+
+See our [Vector Store Module Guide](vector_stores.md) below for more details.
+
+
+Note that, in general, to use storage abstractions you need to define a `StorageContext` object:
+
+```python
+from llama_index.storage.docstore import SimpleDocumentStore
+from llama_index.storage.index_store import SimpleIndexStore
+from llama_index.vector_stores import SimpleVectorStore
+from llama_index.storage import StorageContext
+
+# create storage context using default stores
+storage_context = StorageContext.from_defaults(
+    docstore=SimpleDocumentStore(),
+    vector_store=SimpleVectorStore(),
+    index_store=SimpleIndexStore(),
+)
+```
+
+More details on customization/persistence can be found in the guides below.
+
+
+```{toctree}
+---
+maxdepth: 1
+---
+customization.md
+save_load.md
+```
+
+
+
+## Modules
+
+We offer in-depth guides on the different storage components.
+
+```{toctree}
+---
+maxdepth: 1
+---
+vector_stores.md
+docstores.md
+index_stores.md
+kv_stores.md
+```
@@ -0,0 +1,93 @@
+# Persisting & Loading Data
+
+## Persisting Data
+By default, LlamaIndex stores data in-memory, and this data can be explicitly persisted if desired:
+```python
+storage_context.persist(persist_dir="<persist_dir>")
+```
+This will persist data to disk, under the specified `persist_dir` (or `./storage` by default).
+
+Multiple indexes can be persisted to and loaded from the same directory, assuming you keep track of index IDs for loading.
+
+Users can also configure alternative storage backends (e.g. `MongoDB`) that persist data by default.
+In this case, calling `storage_context.persist()` will do nothing.
+
+## Loading Data
+To load data, you simply need to re-create the storage context using the same configuration (e.g. pass in the same `persist_dir` or vector store client).
+
+```python
+storage_context = StorageContext.from_defaults(
+    docstore=SimpleDocumentStore.from_persist_dir(persist_dir="<persist_dir>"),
+    vector_store=SimpleVectorStore.from_persist_dir(persist_dir="<persist_dir>"),
+    index_store=SimpleIndexStore.from_persist_dir(persist_dir="<persist_dir>"),
+)
+```
+
+We can then load specific indices from the `StorageContext` through some convenience functions below.
+
+
+```python
+from llama_index import load_index_from_storage, load_indices_from_storage, load_graph_from_storage
+
+# load a single index
+# need to specify index_id if multiple indexes are persisted to the same directory
+index = load_index_from_storage(storage_context, index_id="<index_id>") 
+
+# don't need to specify index_id if there's only one index in storage context
+index = load_index_from_storage(storage_context) 
+
+# load multiple indices
+indices = load_indices_from_storage(storage_context) # loads all indices
+indices = load_indices_from_storage(storage_context, index_ids=[index_id1, ...]) # loads specific indices
+
+# load composable graph
+graph = load_graph_from_storage(storage_context, root_id="<root_id>") # loads graph with the specified root_id
+```
+
+Here's the full [API Reference on saving and loading](/api_reference/storage/indices_save_load.rst).
+
+## Using a remote backend
+
+By default, LlamaIndex uses a local filesystem to load and save files. However, you can override this by passing a `fsspec.AbstractFileSystem` object.
+
+Here's a simple example that builds a vector index:
+```python
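+import os
+import dotenv
+import s3fs
+from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex, load_index_from_storage
+dotenv.load_dotenv("../../../.env")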
+# load documents
+documents = SimpleDirectoryReader('../../../examples/paul_graham_essay/data/').load_data()
+print(len(documents))
+index = VectorStoreIndex.from_documents(documents)
+```
+
+Up to this point, nothing has changed. Now let's instantiate an S3 filesystem and save to (and load from) it.
+
+```python
+# set up s3fs
+AWS_KEY = os.environ['AWS_ACCESS_KEY_ID']
+AWS_SECRET = os.environ['AWS_SECRET_ACCESS_KEY']
+R2_ACCOUNT_ID = os.environ['R2_ACCOUNT_ID']
+
+assert AWS_KEY is not None and AWS_KEY != ""
+
+s3 = s3fs.S3FileSystem(
+   key=AWS_KEY,
+   secret=AWS_SECRET,
+   endpoint_url=f'https://{R2_ACCOUNT_ID}.r2.cloudflarestorage.com',
+   s3_additional_kwargs={'ACL': 'public-read'}
+)
+
+# save index to remote blob storage
+index.set_index_id("vector_index")
+# this is {bucket_name}/{index_name}
+index.storage_context.persist('llama-index/storage_demo', fs=s3)
+
+# load index from s3
+sc = StorageContext.from_defaults(persist_dir='llama-index/storage_demo', fs=s3)
+index2 = load_index_from_storage(sc, 'vector_index')
+```
+
+By default, if you do not pass a filesystem, we will assume a local filesystem.
@@ -0,0 +1,65 @@
+# Vector Stores
+
+Vector stores contain embedding vectors of ingested document chunks 
+(and sometimes the document chunks as well).
+
+## Simple Vector Store
+By default, LlamaIndex uses a simple in-memory vector store that's great for quick experimentation.
+It can be persisted to (and loaded from) disk by calling `vector_store.persist()` (and `SimpleVectorStore.from_persist_path(...)`, respectively).
+
+## Third-Party Vector Store Integrations
+We also integrate with a wide range of vector store implementations. 
+They mainly differ in 2 aspects:
+1. in-memory vs. hosted
+2. stores only vector embeddings vs. also stores documents
+
+### In-Memory Vector Stores
+* Faiss
+* Chroma
+
+### (Self) Hosted Vector Stores
+* Pinecone
+* Weaviate
+* Milvus/Zilliz
+* Qdrant
+* Chroma
+* Opensearch
+* DeepLake
+* MyScale
+* Tair
+* DocArray
+* MongoDB Atlas
+
+### Others
+* ChatGPTRetrievalPlugin
+
+For more details, see [Vector Store Integrations](/community/integrations/vector_stores.md).
+
+```{toctree}
+---
+caption: Examples
+maxdepth: 1
+---
+/examples/vector_stores/SimpleIndexDemo.ipynb
+/examples/vector_stores/QdrantIndexDemo.ipynb
+/examples/vector_stores/FaissIndexDemo.ipynb
+/examples/vector_stores/DeepLakeIndexDemo.ipynb
+/examples/vector_stores/MyScaleIndexDemo.ipynb
+/examples/vector_stores/MetalIndexDemo.ipynb
+/examples/vector_stores/WeaviateIndexDemo.ipynb
+/examples/vector_stores/OpensearchDemo.ipynb
+/examples/vector_stores/PineconeIndexDemo.ipynb
+/examples/vector_stores/ChromaIndexDemo.ipynb
+/examples/vector_stores/LanceDBIndexDemo.ipynb
+/examples/vector_stores/MilvusIndexDemo.ipynb
+/examples/vector_stores/RedisIndexDemo.ipynb
+/examples/vector_stores/WeaviateIndexDemo-Hybrid.ipynb
+/examples/vector_stores/PineconeIndexDemo-Hybrid.ipynb
+/examples/vector_stores/AsyncIndexCreationDemo.ipynb
+/examples/vector_stores/TairIndexDemo.ipynb
+/examples/vector_stores/SupabaseVectorIndexDemo.ipynb
+/examples/vector_stores/DocArrayHnswIndexDemo.ipynb
+/examples/vector_stores/DocArrayInMemoryIndexDemo.ipynb
+/examples/vector_stores/MongoDBAtlasVectorSearch.ipynb
+```
+
@@ -0,0 +1,13 @@
+# Modules
+
+We support integrations with OpenAI, Azure, and anything LangChain offers.
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/embeddings/OpenAI.ipynb
+/examples/embeddings/Langchain.ipynb
+/examples/customization/llms/AzureOpenAI.ipynb
+/examples/embeddings/custom_embeddings.ipynb
+```
@@ -0,0 +1,42 @@
+# Embeddings
+
+## Concept
+Embeddings are used in LlamaIndex to represent your documents using a sophisticated numerical representation. Embedding models take text as input and return a long list of numbers that captures the semantics of the text. These embedding models have been trained to represent text this way, and help enable many applications, including search!
+
+At a high level, if a user asks a question about dogs, then the embedding for that question will be highly similar to text that talks about dogs.
+
+When calculating the similarity between embeddings, there are many methods to use (dot product, cosine similarity, etc.). By default, LlamaIndex uses cosine similarity when comparing embeddings.
+
+There are many embedding models to pick from. By default, LlamaIndex uses `text-embedding-ada-002` from OpenAI. We also support any embedding model offered by Langchain [here](https://python.langchain.com/docs/modules/data_connection/text_embedding/), as well as providing an easy to extend base class for implementing your own embeddings.
+
+## Usage Pattern
+
+Most commonly in LlamaIndex, embedding models will be specified in the `ServiceContext` object, and then used in a vector index. The embedding model will be used to embed the documents used during index construction, as well as embedding any queries you make using the query engine later on.
+
+```python
+from llama_index import ServiceContext
+from llama_index.embeddings import OpenAIEmbedding
+
+embed_model = OpenAIEmbedding()
+service_context = ServiceContext.from_defaults(embed_model=embed_model)
+```
+
+You can find more usage details and available customization options below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+usage_pattern.md
+```
+
+## Modules
+
+We support integrations with OpenAI, Azure, and anything LangChain offers. Details below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+modules.md
+```
@@ -0,0 +1,101 @@
+# Usage Pattern
+
+## Getting Started
+
+The most common usage for an embedding model is to set it in the service context object, and then use it to construct an index and run queries. The input documents will be broken into nodes, and the embedding model will generate an embedding for each node.
+
+By default, LlamaIndex will use `text-embedding-ada-002`, which is what the example below manually sets up for you.
+
+```python
+from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
+from llama_index.embeddings import OpenAIEmbedding
+
+embed_model = OpenAIEmbedding()
+service_context = ServiceContext.from_defaults(embed_model=embed_model)
+
+# optionally set a global service context to avoid passing it into other objects every time
+from llama_index import set_global_service_context
+set_global_service_context(service_context)
+
+documents = SimpleDirectoryReader("./data").load_data()
+
+index = VectorStoreIndex.from_documents(documents)
+```
+
+Then, at query time, the embedding model will be used again to embed the query text.
+
+```python
+query_engine = index.as_query_engine()
+
+response = query_engine.query("query string")
+```
+
+## Customization
+
+### Batch Size
+
+By default, embedding requests are sent to OpenAI in batches of 10. For some users, this may (rarely) trigger a rate limit. For other users embedding many documents, this batch size may be too small.
+
+```python
+# set the batch size to 42
+embed_model = OpenAIEmbedding(embed_batch_size=42)
+```
+
+### Embedding Model Integrations
+
+We also support any embeddings offered by Langchain [here](https://python.langchain.com/docs/modules/data_connection/text_embedding/), using our `LangchainEmbedding` wrapper class.
+
+The example below loads a model from Hugging Face, using Langchain's embedding class.
+
+```python
+from langchain.embeddings.huggingface import HuggingFaceEmbeddings
+from llama_index import LangchainEmbedding, ServiceContext
+
+embed_model = LangchainEmbedding(
+  HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
+)
+service_context = ServiceContext.from_defaults(embed_model=embed_model)
+```
+
+### Custom Embedding Model
+
+If you want to use embeddings not offered by LlamaIndex or Langchain, you can also extend our base embeddings class and implement your own!
+
+The example below uses Instructor Embeddings ([install/setup details here](https://huggingface.co/hkunlp/instructor-large)), and implements a custom embeddings class. Instructor embeddings work by providing text, as well as "instructions" on the domain of the text to embed. This is helpful when embedding text from a very specific and specialized topic.
+
+```python
+from typing import Any, List
+from InstructorEmbedding import INSTRUCTOR
+from llama_index.embeddings.base import BaseEmbedding
+
+class InstructorEmbeddings(BaseEmbedding):
+  def __init__(
+    self, 
+    instructor_model_name: str = "hkunlp/instructor-large",
+    instruction: str = "Represent the Computer Science documentation or question:",
+    **kwargs: Any,
+  ) -> None:
+    self._model = INSTRUCTOR(instructor_model_name)
+    self._instruction = instruction
+    super().__init__(**kwargs)
+
+  def _get_query_embedding(self, query: str) -> List[float]:
+    embeddings = self._model.encode([[self._instruction, query]])
+    return embeddings[0]
+
+  def _get_text_embedding(self, text: str) -> List[float]:
+    embeddings = self._model.encode([[self._instruction, text]])
+    return embeddings[0]
+
+  def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
+    embeddings = self._model.encode([[self._instruction, text] for text in texts])
+    return embeddings
+```
+
+## Standalone Usage
+
+You can also use embeddings as a standalone module for your project, existing application, or general testing and exploration.
+
+```python
+embeddings = embed_model.get_text_embedding("It is raining cats and dogs here!")
+```
@@ -0,0 +1,51 @@
+# Modules
+
+We support integrations with OpenAI, Anthropic, Hugging Face, PaLM, and more.
+
+## OpenAI
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/llm/openai.ipynb
+/examples/llm/azure_openai.ipynb
+
+```
+
+## Anthropic
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/llm/anthropic.ipynb
+
+```
+
+## Hugging Face
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb
+/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.ipynb
+
+```
+
+
+## PaLM
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/llm/palm.ipynb
+
+```
+
+## LangChain
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/llm/langchain.ipynb
@@ -0,0 +1,49 @@
+# LLM
+
+## Concept
+Picking the proper Large Language Model (LLM) is one of the first steps you need to consider when building any LLM application over your data.
+
+LLMs are a core component of LlamaIndex. They can be used as standalone modules or plugged into other core LlamaIndex modules (indices, retrievers, query engines). They are always used during the response synthesis step (e.g. after retrieval). Depending on the type of index being used, LLMs may also be used during index construction, insertion, and query traversal.
+
+LlamaIndex provides a unified interface for defining LLM modules, whether it's from OpenAI, Hugging Face, or LangChain, so that you 
+don't have to write the boilerplate code of defining the LLM interface yourself. This interface consists of the following (more details below):
+- Support for **text completion** and **chat** endpoints (details below)
+- Support for **streaming** and **non-streaming** endpoints
+- Support for **synchronous** and **asynchronous** endpoints
+
+
+## Usage Pattern
+
+The following code snippet shows how you can get started using LLMs.
+
+```python
+from llama_index.llms import OpenAI
+
+# non-streaming
+resp = OpenAI().complete('Paul Graham is ')
+print(resp)
+```
+
+You can use the LLM as a standalone module or with other LlamaIndex abstractions. Check out our guide below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+usage_standalone.md
+usage_custom.md
+```
+
+
+## Modules
+
+We support integrations with OpenAI, Hugging Face, PaLM, and more.
+
+```{toctree}
+---
+maxdepth: 2
+---
+modules.md
+```
+
+
@@ -0,0 +1,248 @@
+# Customizing LLMs within LlamaIndex Abstractions
+
+You can plug these LLM abstractions into our other modules in LlamaIndex (indexes, retrievers, query engines, agents), which allows you to build advanced workflows over your data.
+
+By default, we use OpenAI's `text-davinci-003` model. But you may choose to customize
+the underlying LLM being used.
+
+Below we show a few examples of LLM customization. This includes
+
+- changing the underlying LLM
+- changing the number of output tokens (for OpenAI, Cohere, or AI21)
+- having more fine-grained control over all parameters for any LLM, from context window to chunk overlap
+
+## Example: Changing the underlying LLM
+
+An example snippet of customizing the LLM being used is shown below.
+In this example, we use `text-davinci-002` instead of `text-davinci-003`. Available models include `text-davinci-003`, `text-curie-001`, `text-babbage-001`, `text-ada-001`, `code-davinci-002`, and `code-cushman-001`.
+
+Note that
+you may also plug in any LLM shown on Langchain's
+[LLM](https://python.langchain.com/en/latest/modules/models/llms/integrations.html) page.
+
+```python
+
+from llama_index import (
+    KeywordTableIndex,
+    SimpleDirectoryReader,
+    LLMPredictor,
+    ServiceContext
+)
+from llama_index.llms import OpenAI
+# alternatively
+# from langchain.llms import ...
+
+documents = SimpleDirectoryReader('data').load_data()
+
+# define LLM
+llm = OpenAI(temperature=0, model="text-davinci-002")
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# build index
+index = KeywordTableIndex.from_documents(documents, service_context=service_context)
+
+# get response from query
+query_engine = index.as_query_engine()
+response = query_engine.query("What did the author do after his time at Y Combinator?")
+
+```
+
+## Example: Changing the number of output tokens (for OpenAI, Cohere, AI21)
+
+The number of output tokens is usually set to some low number by default (for instance,
+with OpenAI the default is 256).
+
+For OpenAI, Cohere, AI21, you just need to set the `max_tokens` parameter
+(or maxTokens for AI21). We will handle text chunking/calculations under the hood.
+
+```python
+
+from llama_index import (
+    KeywordTableIndex,
+    SimpleDirectoryReader,
+    ServiceContext
+)
+from llama_index.llms import OpenAI
+
+documents = SimpleDirectoryReader('data').load_data()
+
+# define LLM
+llm = OpenAI(temperature=0, model="text-davinci-002", max_tokens=512)
+service_context = ServiceContext.from_defaults(llm=llm)
+
+```
+
+## Example: Explicitly configure `context_window` and `num_output`
+
+If you are using other LLM classes from langchain, you may need to explicitly configure the `context_window` and `num_output` via the `ServiceContext` since the information is not available by default.
+
+```python
+
+from llama_index import (
+    KeywordTableIndex,
+    SimpleDirectoryReader,
+    ServiceContext
+)
+from llama_index.llms import OpenAI
+# alternatively
+# from langchain.llms import ...
+
+documents = SimpleDirectoryReader('data').load_data()
+
+
+# set context window
+context_window = 4096
+# set number of output tokens
+num_output = 256
+
+# define LLM
+llm = OpenAI(
+    temperature=0, 
+    model="text-davinci-002", 
+    max_tokens=num_output,
+)
+
+service_context = ServiceContext.from_defaults(
+    llm=llm,
+    context_window=context_window,
+    num_output=num_output,
+)
+
+```
+
+## Example: Using a HuggingFace LLM
+
+LlamaIndex supports using LLMs from HuggingFace directly. Note that for a completely private experience, you should also set up a local embedding model (example [here](embeddings.md#custom-embeddings)).
+
+Many open-source models from HuggingFace require some preamble before each prompt, which is a `system_prompt`. Additionally, queries themselves may need an additional wrapper around the `query_str` itself. All this information is usually available from the HuggingFace model card for the model you are using.
+
+Below, this example uses both the `system_prompt` and `query_wrapper_prompt`, using specific prompts from the model card found [here](https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b).
+
+```python
+from llama_index import ServiceContext
+from llama_index.prompts.prompts import SimpleInputPrompt
+system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
+- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
+- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
+- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
+- StableLM will refuse to participate in anything that could harm a human.
+"""
+
+# This will wrap the default prompts that are internal to llama-index
+query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")
+
+import torch
+from llama_index.llms import HuggingFaceLLM
+llm = HuggingFaceLLM(
+    context_window=4096, 
+    max_new_tokens=256,
+    generate_kwargs={"temperature": 0.7, "do_sample": False},
+    system_prompt=system_prompt,
+    query_wrapper_prompt=query_wrapper_prompt,
+    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
+    model_name="StabilityAI/stablelm-tuned-alpha-3b",
+    device_map="auto",
+    stopping_ids=[50278, 50279, 50277, 1, 0],
+    tokenizer_kwargs={"max_length": 4096},
+    # uncomment this if using CUDA to reduce memory usage
+    # model_kwargs={"torch_dtype": torch.float16}
+)
+service_context = ServiceContext.from_defaults(
+    chunk_size=1024, 
+    llm=llm,
+)
+```
+
+Some models will raise errors if all the keys from the tokenizer are passed to the model. A common tokenizer output that causes issues is `token_type_ids`. Below is an example of configuring the predictor to remove this before passing the inputs to the model:
+
+```python
+HuggingFaceLLM(
+    ...
+    tokenizer_outputs_to_remove=["token_type_ids"]
+)
+```
+
+A full API reference can be found [here](../../reference/llm_predictor.rst).
+
+Several example notebooks are also listed below:
+
+- [StableLM](/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.ipynb)
+- [Camel](/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb)
+
+## Example: Using a Custom LLM Model - Advanced
+
+To use a custom LLM model, you only need to implement the `LLM` class (or `CustomLLM` for a simpler interface).
+You will be responsible for passing the text to the model and returning the newly generated tokens.
+
+Note that for a completely private experience, you should also set up a local embedding model (example [here](embeddings.md#custom-embeddings)).
+
+Here is a small example using a locally running facebook/OPT model and Hugging Face's pipeline abstraction:
+
+```python
+import torch
+from transformers import pipeline
+from typing import Optional, List, Mapping, Any
+
+from llama_index import (
+    ServiceContext, 
+    SimpleDirectoryReader, 
+    LangchainEmbedding, 
+    ListIndex
+)
+from llama_index.llms import CustomLLM, CompletionResponse, CompletionResponseGen, LLMMetadata
+
+
+# set context window size
+context_window = 2048
+# set number of output tokens
+num_output = 256
+
+# store the pipeline/model outside of the LLM class to avoid memory issues
+model_name = "facebook/opt-iml-max-30b"
+pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype":torch.bfloat16})
+
+class OurLLM(CustomLLM):
+
+    @property
+    def metadata(self) -> LLMMetadata:
+        """Get LLM metadata."""
+        return LLMMetadata(
+            context_window=context_window, num_output=num_output
+        )
+
+    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
+        prompt_length = len(prompt)
+        response = pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
+
+        # only return newly generated tokens
+        text = response[prompt_length:]
+        return CompletionResponse(text=text)
+    
+    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
+        raise NotImplementedError()
+
+# define our LLM
+llm = OurLLM()
+
+service_context = ServiceContext.from_defaults(
+    llm=llm, 
+    context_window=context_window, 
+    num_output=num_output
+)
+
+# Load your data
+documents = SimpleDirectoryReader('./data').load_data()
+index = ListIndex.from_documents(documents, service_context=service_context)
+
+# Query and print response
+query_engine = index.as_query_engine()
+response = query_engine.query("<query_text>")
+print(response)
+```
+
+Using this method, you can use any LLM, whether it is running locally or on your own server. As long as the class is implemented and the generated tokens are returned, it should work. Note that we set `context_window` and `num_output` explicitly so that prompt sizes are calculated correctly, since every model has a slightly different context length.
+
+Note that you may have to adjust the internal prompts to get good performance. Even then, you should be using a sufficiently large LLM to ensure it's capable of handling the complex queries that LlamaIndex uses internally, so your mileage may vary.
+
+A list of all default internal prompts is available [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py), and chat-specific prompts are listed [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py). You can also implement your own custom prompts, as described [here](/core_modules/service_modules/prompts.md).
+
@@ -0,0 +1,35 @@
+# Using LLMs as standalone modules
+
+You can use our LLM modules on their own.
+
+## Text Completion Example
+
+```python
+from llama_index.llms import OpenAI
+
+# non-streaming
+resp = OpenAI().complete('Paul Graham is ')
+print(resp)
+
+# using streaming endpoint
+from llama_index.llms import OpenAI
+llm = OpenAI()
+resp = llm.stream_complete('Paul Graham is ')
+for delta in resp:
+    print(delta, end='')
+```
+
+## Chat Example
+
+```python
+from llama_index.llms import ChatMessage, OpenAI
+
+messages = [
+    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
+    ChatMessage(role="user", content="What is your name"),
+]
+resp = OpenAI().chat(messages)
+print(resp)
+```
+
+Check out our [modules section](modules.md) for usage guides for each LLM.
@@ -0,0 +1,106 @@
+# Prompts
+
+## Concept
+
+Prompting is the fundamental input that gives LLMs their expressive power. LlamaIndex uses prompts to build the index, do insertion, 
+perform traversal during querying, and to synthesize the final answer.
+
+LlamaIndex uses a set of [default prompt templates](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py) that work well out of the box.
+
+In addition, there are some prompts written and used specifically for chat models like `gpt-3.5-turbo` [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py).
+
+Users may also provide their own prompt templates to further customize the behavior of the framework. The best method for customizing is copying the default prompt from the link above, and using that as the base for any modifications.
+
+## Usage Pattern
+
+### Defining a custom prompt
+
+Defining a custom prompt is as simple as creating a format string:
+
+```python
+from llama_index import Prompt
+
+template = (
+    "We have provided context information below. \n"
+    "---------------------\n"
+    "{context_str}"
+    "\n---------------------\n"
+    "Given this information, please answer the question: {query_str}\n"
+)
+qa_template = Prompt(template)
+```
+
+> Note: you may see references to legacy prompt subclasses such as `QuestionAnswerPrompt`, `RefinePrompt`. These have been deprecated (and now are type aliases of `Prompt`). Now you can directly specify `Prompt(template)` to construct custom prompts. But you still have to make sure the template string contains the expected parameters (e.g. `{context_str}` and `{query_str}`) when replacing a default question answer prompt.
+
+### Passing custom prompts into the pipeline
+
+Since LlamaIndex is a multi-step pipeline, it's important to identify the operation that you want to modify and pass in the custom prompt at the right place.
+
+At a high level, prompts are used in 1) index construction, and 2) query engine execution.
+
+The most commonly used prompts will be the `text_qa_template` and the `refine_template`. 
+
+- `text_qa_template` - used to get an initial answer to a query using retrieved nodes
+- `refine_template` - used when the retrieved text does not fit into a single LLM call with `response_mode="compact"` (the default), or when more than one node is retrieved using `response_mode="refine"`. The answer from the first query is inserted as an `existing_answer`, and the LLM must update or repeat the existing answer based on the new context.
+
+#### Modify prompts used in index construction
+Different indices use different types of prompts during construction (some don't use prompts at all). 
+For instance, `TreeIndex` uses a `SummaryPrompt` to hierarchically
+summarize the nodes, and `KeywordTableIndex` uses a `KeywordExtractPrompt` to extract keywords.
+
+There are two equivalent ways to override the prompts:
+
+1. via the default nodes constructor 
+
+```python
+index = TreeIndex(nodes, summary_template=<custom_prompt>)
+```
+2. via the documents constructor.
+
+```python
+index = TreeIndex.from_documents(docs, summary_template=<custom_prompt>)
+```
+
+For more details on which index uses which prompts, please visit
+[Index class references](/api_reference/indices.rst).
+
+#### Modify prompts used in query engine
+More commonly, prompts are used at query-time (i.e. for executing a query against an index and synthesizing the final response). 
+
+There are also two equivalent ways to override the prompts:
+
+1. via the high-level API
+```python
+query_engine = index.as_query_engine(
+    text_qa_template=<custom_qa_prompt>,
+    refine_template=<custom_refine_prompt>
+)
+```
+2. via the low-level composition API
+
+```python
+retriever = index.as_retriever()
+synth = get_response_synthesizer(
+    text_qa_template=<custom_qa_prompt>,
+    refine_template=<custom_refine_prompt>
+)
+query_engine = RetrieverQueryEngine(retriever, synth)
+```
+
+The two approaches above are equivalent, where 1 is essentially syntactic sugar for 2 and hides away the underlying complexity. You might want to use 1 to quickly modify some common parameters, and use 2 to have more granular control.
+
+
+For more details on which classes use which prompts, please visit
+[Query class references](/api_reference/query.rst).
+
+Check out the [reference documentation](/api_reference/prompts.rst) for a full set of all prompts.
+
+## Modules
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/customization/prompts/completion_prompts.ipynb
+/examples/customization/prompts/chat_prompts.ipynb
+```
@@ -0,0 +1,16 @@
+# Module Guides
+
+We provide a few simple implementations to start, with more sophisticated modes coming soon!  
+
+More specifically, the `SimpleChatEngine` does not make use of a knowledge base, 
+whereas `CondenseQuestionChatEngine` and `ReActChatEngine` make use of a query engine over a knowledge base.
+
+```{toctree}
+---
+maxdepth: 1
+---
+Simple Chat Engine </examples/chat_engine/chat_engine_repl.ipynb>
+ReAct Chat Engine </examples/chat_engine/chat_engine_react.ipynb>
+OpenAI Chat Engine </examples/chat_engine/chat_engine_openai.ipynb>
+Condense Question Chat Engine </examples/chat_engine/chat_engine_condense_question.ipynb>
+```
@@ -0,0 +1,48 @@
+# Chat Engine
+
+## Concept
+Chat engine is a high-level interface for having a conversation with your data
+(multiple back-and-forth instead of a single question & answer).
+Think ChatGPT, but augmented with your knowledge base.  
+
+Conceptually, it is a **stateful** analog of a [Query Engine](../query_engine/root.md). 
+By keeping track of the conversation history, it can answer questions with past context in mind.  
+
+
+```{tip}
+If you want to ask a standalone question over your data (i.e. without keeping track of conversation history), use [Query Engine](../query_engine/root.md) instead.  
+```
+
+## Usage Pattern
+Get started with:
+```python
+chat_engine = index.as_chat_engine()
+response = chat_engine.chat("Tell me a joke.")
+```
+
+To stream the response:
+```python
+chat_engine = index.as_chat_engine()
+streaming_response = chat_engine.stream_chat("Tell me a joke.")
+for token in streaming_response.response_gen:
+    print(token, end="")
+```
+
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+
+## Modules
+Below you can find corresponding tutorials to see the available chat engines in action. 
+
+```{toctree}
+---
+maxdepth: 2
+---
+modules.md
+```
@@ -0,0 +1,109 @@
+# Usage Pattern
+
+## Get Started
+
+Build a chat engine from index:
+```python
+chat_engine = index.as_chat_engine()
+```
+
+```{tip}
+To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
+```
+
+Have a conversation with your data:
+```python
+response = chat_engine.chat("Tell me a joke.")
+```
+
+Reset chat history to start a new conversation:
+```python
+chat_engine.reset()
+```
+
+Enter an interactive chat REPL:
+```python
+chat_engine.chat_repl()
+```
+
+
+## Configuring a Chat Engine
+Configuring a chat engine is very similar to configuring a query engine.
+
+### High-Level API
+You can directly build and configure a chat engine from an index in 1 line of code:
+```python
+chat_engine = index.as_chat_engine(
+    chat_mode='condense_question', 
+    verbose=True
+)
+```
+> Note: you can access different chat engines by specifying the `chat_mode` as a kwarg. `condense_question` corresponds to `CondenseQuestionChatEngine`, `react` corresponds to `ReActChatEngine`.
+
+> Note: While the high-level API optimizes for ease-of-use, it does *NOT* expose the full range of configurability.  
+
+### Low-Level Composition API
+
+You can use the low-level composition API if you need more granular control.
+Concretely speaking, you would explicitly construct a `ChatEngine` object instead of calling `index.as_chat_engine(...)`.
+> Note: You may need to look at API references or example notebooks.
+
+Here's an example where we do the following:
+* configure the condense question prompt,
+* initialize the conversation with some existing history,
+* print verbose debug messages.
+
+```python
+from llama_index.prompts import Prompt
+
+custom_prompt = Prompt("""\
+Given a conversation (between Human and Assistant) and a follow up message from Human, \
+rewrite the message to be a standalone question that captures all relevant context \
+from the conversation.
+
+<Chat History> 
+{chat_history}
+
+<Follow Up Message>
+{question}
+
+<Standalone question>
+""")
+
+# list of (human_message, ai_message) tuples
+custom_chat_history = [
+    (
+        'Hello assistant, we are having an insightful discussion about Paul Graham today.', 
+        'Okay, sounds good.'
+    )
+]
+
+query_engine = index.as_query_engine()
+chat_engine = CondenseQuestionChatEngine.from_defaults(
+    query_engine=query_engine, 
+    condense_question_prompt=custom_prompt,
+    chat_history=custom_chat_history,
+    verbose=True
+)
+```
+
+
+
+### Streaming
+To enable streaming, you simply need to call the `stream_chat` endpoint instead of the `chat` endpoint. 
+
+```{warning}
+This is somewhat inconsistent with the query engine (where you pass in a `streaming=True` flag). We are working on making the behavior more consistent! 
+```
+
+```python
+chat_engine = index.as_chat_engine()
+streaming_response = chat_engine.stream_chat("Tell me a joke.")
+for token in streaming_response.response_gen:
+    print(token, end="")
+```
+
+See an [end-to-end tutorial](/examples/customization/streaming/chat_engine_condense_question_stream_response.ipynb)
+
+
+
@@ -0,0 +1,222 @@
+# Modules
+
+## SimilarityPostprocessor
+
+Used to remove nodes that are below a similarity score threshold.
+
+```python
+from llama_index.indices.postprocessor import SimilarityPostprocessor
+
+postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+## KeywordNodePostprocessor
+
+Used to ensure certain keywords are either excluded or included.
+
+```python
+from llama_index.indices.postprocessor import KeywordNodePostprocessor
+
+postprocessor = KeywordNodePostprocessor(
+  required_keywords=["word1", "word2"],
+  exclude_keywords=["word3", "word4"]
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+## SentenceEmbeddingOptimizer
+
+This postprocessor optimizes token usage by removing sentences that are not relevant to the query (this is done using embeddings).
+
+The percentile cutoff is a measure for using the top percentage of relevant sentences.
+
+The threshold cutoff can be specified instead, which uses a raw similarity cutoff for picking which sentences to keep.
+
+```python
+from llama_index.indices.postprocessor import SentenceEmbeddingOptimizer
+
+postprocessor = SentenceEmbeddingOptimizer(
+  embed_model=service_context.embed_model,
+  percentile_cutoff=0.5,
+  # threshold_cutoff=0.7
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+A full notebook guide can be found [here](/examples/node_postprocessor/OptimizerDemo.ipynb)
+
+## CohereRerank
+
+Uses the "Cohere ReRank" functionality to re-order nodes, and returns the top N nodes.
+
+```python
+from llama_index.indices.postprocessor import CohereRerank
+
+postprocessor = CohereRerank(
+  top_n=2,
+  model="rerank-english-v2.0",
+  api_key="YOUR COHERE API KEY"
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+Full notebook guide is available [here](/examples/node_postprocessor/CohereRerank.ipynb).
+
+## LLM Rerank
+
+Uses an LLM to re-order nodes by asking the LLM to return the relevant documents and a score of how relevant they are. Returns the top N ranked nodes.
+
+```python
+from llama_index.indices.postprocessor import LLMRerank
+
+postprocessor = LLMRerank(
+  top_n=2,
+  service_context=service_context,
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+Full notebook guides are available [here for Gatsby](/examples/node_postprocessor/LLMReranker-Gatsby.ipynb) and [here for Lyft 10K documents](/examples/node_postprocessor/LLMReranker-Lyft-10k.ipynb).
+
+## FixedRecencyPostprocessor
+
+This postprocessor returns the top K nodes sorted by date. This assumes there is a `date` field to parse in the metadata of each node.
+
+```python
+from llama_index.indices.postprocessor import FixedRecencyPostprocessor
+
+postprocessor = FixedRecencyPostprocessor(
+  top_k=1,
+  date_key="date"  # the key in the metadata to find the date
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+![](/_static/node_postprocessors/recency.png)
+
+A full notebook guide is available [here](/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb).
+
+## EmbeddingRecencyPostprocessor
+
+This postprocessor returns the top K nodes after sorting by date and removing older nodes that are too similar after measuring embedding similarity.
+
+```python
+from llama_index.indices.postprocessor import EmbeddingRecencyPostprocessor
+
+postprocessor = EmbeddingRecencyPostprocessor(
+  service_context=service_context,
+  date_key="date",
+  similarity_cutoff=0.7
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+A full notebook guide is available [here](/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb).
+
+## TimeWeightedPostprocessor
+
+This postprocessor returns the top K nodes after applying a time-weighted rerank to each node. Each time a node is retrieved, the time it was retrieved is recorded. This biases search to favor information that has not been returned in a query yet.
+
+```python
+from llama_index.indices.postprocessor import TimeWeightedPostprocessor
+
+postprocessor = TimeWeightedPostprocessor(
+  time_decay=0.99,
+  top_k=1
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+A full notebook guide is available [here](/examples/node_postprocessor/TimeWeightedPostprocessorDemo.ipynb).
+
+## (Beta) PIINodePostprocessor
+
+The PII (Personally Identifiable Information) postprocessor removes information that might be a security risk. It does this by using NER (either with a dedicated NER model, or with a local LLM model).
+
+### LLM Version
+
+```python
+from llama_index.indices.postprocessor import PIINodePostprocessor
+
+postprocessor = PIINodePostprocessor(
+  service_context=service_context,  # this should be setup with an LLM you trust
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+### NER Version
+
+This version uses the default local model from Hugging Face that is loaded when you run `pipeline("ner")`.
+
+```python
+from llama_index.indices.postprocessor import NERPIINodePostprocessor
+
+postprocessor = NERPIINodePostprocessor()
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+A full notebook guide for both can be found [here](/examples/node_postprocessor/PII.ipynb).
+
+## (Beta) PrevNextNodePostprocessor
+
+Uses pre-defined settings to read the `Node` relationships and fetch the nodes that come before, after, or both.
+
+This is useful when you know the relationships point to important data (either before, after, or both) that should be sent to the LLM if that node is retrieved.
+
+```python
+from llama_index.indices.postprocessor import PrevNextNodePostprocessor
+
+postprocessor = PrevNextNodePostprocessor(
+  docstore=index.docstore,
+  num_nodes=1,  # number of nodes to fetch when looking forwards or backwards
+  mode="next"   # can be either 'next', 'previous', or 'both'
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+![](/_static/node_postprocessors/prev_next.png)
+
+## (Beta) AutoPrevNextNodePostprocessor
+
+The same as PrevNextNodePostprocessor, but lets the LLM decide the mode (next, previous, or both).
+
+```python
+from llama_index.indices.postprocessor import AutoPrevNextNodePostprocessor
+
+postprocessor = AutoPrevNextNodePostprocessor(
+  docstore=index.docstore,
+  service_context=service_context,
+  num_nodes=1,  # number of nodes to fetch when looking forwards or backwards
+)
+
+postprocessor.postprocess_nodes(nodes)
+```
+
+A full example notebook is available [here](/examples/node_postprocessor/PrevNextPostprocessorDemo.ipynb).
+
+## All Notebooks
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/node_postprocessor/OptimizerDemo.ipynb
+/examples/node_postprocessor/CohereRerank.ipynb
+/examples/node_postprocessor/LLMReranker-Lyft-10k.ipynb
+/examples/node_postprocessor/LLMReranker-Gatsby.ipynb
+/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb
+/examples/node_postprocessor/TimeWeightedPostprocessorDemo.ipynb
+/examples/node_postprocessor/PII.ipynb
+/examples/node_postprocessor/PrevNextPostprocessorDemo.ipynb
+```
@@ -0,0 +1,49 @@
+# Node Postprocessor
+
+## Concept
+Node postprocessors are a set of modules that take a set of nodes, and apply some kind of transformation or filtering before returning them.
+
+In LlamaIndex, node postprocessors are most commonly applied within a query engine, after the node retrieval step and before the response synthesis step.
+
+LlamaIndex offers several node postprocessors for immediate use, while also providing a simple API for adding your own custom postprocessors.
+
+```{tip}
+Confused about where node postprocessor fits in the pipeline? Read about [high-level concepts](/getting_started/concepts.md)
+```
+
+## Usage Pattern
+
+An example of using a node postprocessor is below:
+
+```python
+from llama_index.indices.postprocessor import SimilarityPostprocessor
+from llama_index.schema import Node, NodeWithScore
+
+nodes = [
+  NodeWithScore(node=Node(text="text"), score=0.7),
+  NodeWithScore(node=Node(text="text"), score=0.8)
+]
+
+# filter nodes below 0.75 similarity score
+processor = SimilarityPostprocessor(similarity_cutoff=0.75)
+filtered_nodes = processor.postprocess_nodes(nodes)
+```
+
+You can find more details on using postprocessors and how to build your own below.
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+## Modules
+Below you can find guides for each node postprocessor.
+
+```{toctree}
+---
+maxdepth: 2
+---
+modules.md
+```
@@ -0,0 +1,93 @@
+# Usage Pattern
+
+Most commonly, node-postprocessors will be used in a query engine, where they are applied to the nodes returned from a retriever, and before the response synthesis step.
+
+
+## Using with a Query Engine
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
+from llama_index.indices.postprocessor import TimeWeightedPostprocessor
+
+documents = SimpleDirectoryReader("./data").load_data()
+
+index = VectorStoreIndex.from_documents(documents)
+
+query_engine = index.as_query_engine(
+  node_postprocessors=[
+    TimeWeightedPostprocessor(
+        time_decay=0.5, time_access_refresh=False, top_k=1
+    )
+  ]
+)
+
+# all node post-processors will be applied during each query
+response = query_engine.query("query string")
+```
+
+## Using with Retrieved Nodes
+
+Postprocessors can also be used as standalone objects to filter retrieved nodes:
+
+```python
+from llama_index.indices.postprocessor import SimilarityPostprocessor
+
+nodes = index.as_retriever().retrieve("query string")
+
+# filter nodes below 0.75 similarity score
+processor = SimilarityPostprocessor(similarity_cutoff=0.75)
+filtered_nodes = processor.postprocess_nodes(nodes)
+```
+
+## Using with your own nodes
+
+As you may have noticed, the postprocessors take `NodeWithScore` objects as inputs, which is just a wrapper class with a `Node` and a `score` value.
+
+```python
+from llama_index.indices.postprocessor import SimilarityPostprocessor
+from llama_index.schema import Node, NodeWithScore
+
+nodes = [
+  NodeWithScore(node=Node(text="text"), score=0.7),
+  NodeWithScore(node=Node(text="text"), score=0.8)
+]
+
+# filter nodes below 0.75 similarity score
+processor = SimilarityPostprocessor(similarity_cutoff=0.75)
+filtered_nodes = processor.postprocess_nodes(nodes)
+```
+
+## Custom Node PostProcessor
+
+The base class is `BaseNodePostprocessor`, and the API interface is very simple: 
+
+```python
+class BaseNodePostprocessor:
+    """Node postprocessor."""
+
+    @abstractmethod
+    def postprocess_nodes(
+        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
+    ) -> List[NodeWithScore]:
+        """Postprocess nodes."""
+```
+
+A dummy node-postprocessor can be implemented in just a few lines of code:
+
+```python
+from typing import List, Optional
+
+from llama_index import QueryBundle
+from llama_index.indices.postprocessor.base import BaseNodePostprocessor
+from llama_index.schema import NodeWithScore
+
+class DummyNodePostprocessor(BaseNodePostprocessor):
+
+    def postprocess_nodes(
+        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
+    ) -> List[NodeWithScore]:
+        
+        # subtracts 1 from the score
+        for n in nodes:
+            n.score -= 1
+
+        return nodes
+```
@@ -0,0 +1,144 @@
+# Query Transformations
+
+
+LlamaIndex allows you to perform *query transformations* over your index structures.
+Query transformations are modules that will convert a query into another query. They can be **single-step**, as in the transformation is run once before the query is executed against an index. 
+
+They can also be **multi-step**, as in: 
+1. The query is transformed and executed against an index.
+2. The response is retrieved.
+3. Subsequent queries are transformed/executed in a sequential fashion.
+
+We list some of our query transformations in more detail below.
+
+#### Use Cases
+Query transformations have multiple use cases:
+- Transforming an initial query into a form that can be more easily embedded (e.g. HyDE)
+- Transforming an initial query into a subquestion that can be more easily answered from the data (single-step query decomposition)
+- Breaking an initial query into multiple subquestions that can be more easily answered on their own. (multi-step query decomposition)
+
+
+### HyDE (Hypothetical Document Embeddings)
+
+[HyDE](http://boston.lti.cs.cmu.edu/luyug/HyDE/HyDE.pdf) is a technique where given a natural language query, a hypothetical document/answer is generated first. This hypothetical document is then used for embedding lookup rather than the raw query.
+
+To use HyDE, an example code snippet is shown below.
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
+from llama_index.indices.query.query_transform.base import HyDEQueryTransform
+from llama_index.query_engine.transform_query_engine import TransformQueryEngine
+
+# load documents, build index
+documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
+index = VectorStoreIndex.from_documents(documents)
+
+# run query with HyDE query transform
+query_str = "what did paul graham do after going to RISD"
+hyde = HyDEQueryTransform(include_original=True)
+query_engine = index.as_query_engine()
+query_engine = TransformQueryEngine(query_engine, query_transform=hyde)
+response = query_engine.query(query_str)
+print(response)
+
+```
+
+Check out our [example notebook](../../../examples/query_transformations/HyDEQueryTransformDemo.ipynb) for a full walkthrough.
+
+
+### Single-Step Query Decomposition
+
+Some recent approaches (e.g. [self-ask](https://ofir.io/self-ask.pdf), [ReAct](https://arxiv.org/abs/2210.03629)) have suggested that LLMs 
+perform better at answering complex questions when they break the question into smaller steps. We have found that this is true for queries that require knowledge augmentation as well.
+
+If your query is complex, different parts of your knowledge base may answer different "subqueries" around the overall query.
+
+Our single-step query decomposition feature transforms a **complicated** question into a simpler one over the data collection to help provide a sub-answer to the original question.
+
+This is especially helpful over a [composed graph](../../index/composability.md). Within a composed graph, a query can be routed to multiple subindexes, each representing a subset of the overall knowledge corpus. Query decomposition allows us to transform the query into a more suitable question over any given index.
+
+An example image is shown below.
+
+![](/_static/query_transformations/single_step_diagram.png)
+
+
+Here's a corresponding example code snippet over a composed graph.
+
+```python
+
+# Setting: a list index composed over multiple vector indices
+# llm_predictor_chatgpt corresponds to the ChatGPT LLM interface
+from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
+decompose_transform = DecomposeQueryTransform(
+    llm_predictor_chatgpt, verbose=True
+)
+
+# initialize indexes and graph
+...
+
+
+# configure retrievers
+vector_query_engine = vector_index.as_query_engine()
+vector_query_engine = TransformQueryEngine(
+    vector_query_engine, 
+    query_transform=decompose_transform,
+    transform_extra_info={'index_summary': vector_index.index_struct.summary}
+)
+custom_query_engines = {
+    vector_index.index_id: vector_query_engine
+} 
+
+# query
+query_str = (
+    "Compare and contrast the airports in Seattle, Houston, and Toronto. "
+)
+query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
+response = query_engine.query(query_str)
+```
+
+Check out our [example notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb) for a full walkthrough.
+
+
+
+### Multi-Step Query Transformations
+
+Multi-step query transformations are a generalization on top of existing single-step query transformation approaches.
+
+Given an initial, complex query, the query is transformed and executed against an index. The response is retrieved from the query. 
+Given the response (along with prior responses) and the query, followup questions may be asked against the index as well. This technique allows a query to be run against a single knowledge source until that query has satisfied all questions.
+
+An example image is shown below.
+
+![](/_static/query_transformations/multi_step_diagram.png)
+
+
+Here's a corresponding example code snippet.
+
+```python
+from llama_index.indices.query.query_transform.base import StepDecomposeQueryTransform
+# gpt-4
+step_decompose_transform = StepDecomposeQueryTransform(
+    llm_predictor, verbose=True
+)
+
+query_engine = index.as_query_engine()
+query_engine = MultiStepQueryEngine(query_engine, query_transform=step_decompose_transform)
+
+response = query_engine.query(
+    "Who was in the first batch of the accelerator program the author started?",
+)
+print(str(response))
+
+```
+
+Check out our [example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo-multistep.ipynb) for a full walkthrough.
+
+
+```{toctree}
+---
+caption: Examples
+maxdepth: 1
+---
+/examples/query_transformations/HyDEQueryTransformDemo.ipynb
+/examples/query_transformations/SimpleIndexDemo-multistep.ipynb
+```
@@ -0,0 +1,49 @@
+# Module Guides
+
+
+## Basic
+```{toctree}
+---
+maxdepth: 1
+---
+Retriever Query Engine </examples/query_engine/CustomRetrievers.ipynb>
+```
+
+## Structured & Semi-Structured Data
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/query_engine/json_query_engine.ipynb
+/examples/query_engine/pandas_query_engine.ipynb
+/examples/query_engine/knowledge_graph_query_engine.ipynb
+```
+
+## Advanced
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/query_engine/RouterQueryEngine.ipynb
+/examples/query_engine/RetrieverRouterQueryEngine.ipynb
+/examples/query_engine/JointQASummary.ipynb
+/examples/query_engine/sub_question_query_engine.ipynb
+/examples/query_transformations/SimpleIndexDemo-multistep.ipynb
+/examples/query_engine/SQLRouterQueryEngine.ipynb
+/examples/query_engine/SQLAutoVectorQueryEngine.ipynb
+/examples/query_engine/SQLJoinQueryEngine.ipynb
+/examples/index_structs/struct_indices/duckdb_sql_query.ipynb
+Retry Query Engine </examples/evaluation/RetryQuery.ipynb>
+Retry Source Query Engine </examples/evaluation/RetryQuery.ipynb>
+Retry Guideline Query Engine </examples/evaluation/RetryQuery.ipynb>
+/examples/query_engine/citation_query_engine.ipynb
+/examples/query_engine/pdf_tables/recursive_retriever.ipynb
+```
+
+## Experimental
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/query_engine/flare_query_engine.ipynb
+```
@@ -0,0 +1,20 @@
+# Response Modes
+
+Right now, we support the following options:
+- `default`: "create and refine" an answer by sequentially going through each retrieved `Node`; 
+    This makes a separate LLM call per Node. Good for more detailed answers.
+- `compact`: "compact" the prompt during each LLM call by stuffing in as 
+    many `Node` text chunks as can fit within the maximum prompt size. If there are 
+    too many chunks to stuff in one prompt, "create and refine" an answer by going through
+    multiple prompts.
+- `tree_summarize`: Given a set of `Node` objects and the query, recursively construct a tree 
+    and return the root node as the response. Good for summarization purposes.
+- `no_text`: Only runs the retriever to fetch the nodes that would have been sent to the LLM, 
+    without actually sending them. They can then be inspected by checking `response.source_nodes`.
+    The response object is covered in more detail in Section 5.
+- `accumulate`: Given a set of `Node` objects and the query, apply the query to each `Node` text
+    chunk while accumulating the responses into an array. Returns a concatenated string of all
+    responses. Good for when you need to run the same query separately against each text
+    chunk.
+
+See [Response Synthesizer](/core_modules/query_modules/response_synthesizers/root.md) to learn more.
@@ -0,0 +1,51 @@
+# Query Engine
+
+## Concept
+Query engine is a generic interface that allows you to ask questions over your data.
+
+A query engine takes in a natural language query, and returns a rich response.
+It is most often (but not always) built on one or many [Indices](/core_modules/data_modules/index/root.md) via [Retrievers](/core_modules/query_modules/retriever/root.md).
+You can compose multiple query engines to achieve more advanced capability.
+
+```{tip}
+If you want to have a conversation with your data (multiple back-and-forth instead of a single question & answer), take a look at [Chat Engine](/core_modules/query_modules/chat_engines/root.md)  
+```
+
+## Usage Pattern
+Get started with:
+```python
+query_engine = index.as_query_engine()
+response = query_engine.query("Who is Paul Graham?")
+```
+
+To stream the response:
+```python
+query_engine = index.as_query_engine(streaming=True)
+streaming_response = query_engine.query("Who is Paul Graham?")
+streaming_response.print_response_stream() 
+```
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+
+## Modules
+```{toctree}
+---
+maxdepth: 3
+---
+modules.md
+```
+
+
+## Supporting Modules
+```{toctree}
+---
+maxdepth: 2
+---
+supporting_modules.md
+```
@@ -0,0 +1,56 @@
+# Streaming
+
+LlamaIndex supports streaming the response as it's being generated.
+This allows you to start printing or processing the beginning of the response before the full response is finished.
+This can drastically reduce the perceived latency of queries.
+
+### Setup
+To enable streaming, you need to use an LLM that supports streaming.
+Right now, streaming is supported by `OpenAI`, `HuggingFaceLLM`, and most LangChain LLMs (via `LangChainLLM`).
+
+Configure query engine to use streaming:
+
+If you are using the high-level API, set `streaming=True` when building a query engine.
+```python
+query_engine = index.as_query_engine(
+    streaming=True,
+    similarity_top_k=1
+)
+```
+
+If you are using the low-level API to compose the query engine,
+pass `streaming=True` when constructing the `Response Synthesizer`:
+```python
+from llama_index import get_response_synthesizer
+synth = get_response_synthesizer(streaming=True, ...)
+query_engine = RetrieverQueryEngine(response_synthesizer=synth, ...)
+```
+
+### Streaming Response
+After properly configuring both the LLM and the query engine,
+calling `query` now returns a `StreamingResponse` object.
+
+```python
+streaming_response = query_engine.query(
+    "What did the author do growing up?", 
+)
+```
+
+The response is returned immediately when the LLM call *starts*, without having to wait for the full completion.
+
+> Note: In the case where the query engine makes multiple LLM calls, only the last LLM call will be streamed and the response is returned when the last LLM call starts.
+
+You can obtain a `Generator` from the streaming response and iterate over the tokens as they arrive:
+```python
+for text in streaming_response.response_gen:
+    ...  # placeholder: do something with each piece of text as it arrives
+```
+
+Alternatively, if you just want to print the text as it arrives:
+```python
+streaming_response.print_response_stream() 
+```
+
+See an [end-to-end example](/examples/customization/streaming/SimpleIndexDemo-streaming.ipynb)
+
+
@@ -0,0 +1,8 @@
+# Supporting Modules
+
+```{toctree}
+---
+maxdepth: 1
+---
+advanced/query_transformations.md
+```
@@ -0,0 +1,96 @@
+# Usage Pattern
+
+## Get Started
+Build a query engine from index:
+```python
+query_engine = index.as_query_engine()
+```
+
+```{tip}
+To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
+```
+
+Ask a question over your data:
+```python
+response = query_engine.query('Who is Paul Graham?')
+```
+
+## Configuring a Query Engine
+### High-Level API
+You can directly build and configure a query engine from an index in 1 line of code:
+```python
+query_engine = index.as_query_engine(
+    response_mode='tree_summarize',
+    verbose=True,
+)
+```
+> Note: While the high-level API optimizes for ease-of-use, it does *NOT* expose the full range of configurability.  
+
+See [**Response Modes**](./response_modes.md) for a full list of response modes and what they do.
+
+```{toctree}
+---
+maxdepth: 1
+hidden:
+---
+response_modes.md
+streaming.md
+```
+
+
+
+### Low-Level Composition API
+
+You can use the low-level composition API if you need more granular control.
+Concretely speaking, you would explicitly construct a `QueryEngine` object instead of calling `index.as_query_engine(...)`.
+> Note: You may need to look at API references or example notebooks.
+
+
+```python
+from llama_index import (
+    VectorStoreIndex,
+    get_response_synthesizer,
+)
+from llama_index.retrievers import VectorIndexRetriever
+from llama_index.query_engine import RetrieverQueryEngine
+
+# build index
+index = VectorStoreIndex.from_documents(documents)
+
+# configure retriever
+retriever = VectorIndexRetriever(
+    index=index, 
+    similarity_top_k=2,
+)
+
+# configure response synthesizer
+response_synthesizer = get_response_synthesizer(
+    response_mode="tree_summarize",
+)
+
+# assemble query engine
+query_engine = RetrieverQueryEngine(
+    retriever=retriever,
+    response_synthesizer=response_synthesizer,
+)
+
+# query
+response = query_engine.query("What did the author do growing up?")
+print(response)
+```
+### Streaming
+To enable streaming, pass in the `streaming=True` flag:
+
+```python
+query_engine = index.as_query_engine(
+    streaming=True,
+)
+streaming_response = query_engine.query(
+    "What did the author do growing up?", 
+)
+streaming_response.print_response_stream() 
+```
+
+* Read the full [streaming guide](/core_modules/query_modules/query_engine/streaming.md)
+* See an [end-to-end example](/examples/customization/streaming/SimpleIndexDemo-streaming.ipynb)
+
@@ -0,0 +1,62 @@
+# Module Guide
+
+Detailed inputs/outputs for each response synthesizer are found below. 
+
+## API Example
+
+The following shows the setup for utilizing all kwargs.
+
+- `response_mode` specifies which response synthesizer to use
+- `service_context` defines the LLM and related settings for synthesis
+- `text_qa_template` and `refine_template` are the prompts used at various stages
+- `use_async` is used for only the `tree_summarize` response mode right now, to asynchronously build the summary tree
+- `streaming` configures whether to return a streaming response object or not
+
+In the `synthesize`/`asynthesize` functions, you can optionally provide additional source nodes, which will be added to the `response.source_nodes` list.
+
+```python
+from llama_index.schema import Node, NodeWithScore
+from llama_index import get_response_synthesizer
+
+response_synthesizer = get_response_synthesizer(
+  response_mode="refine",
+  service_context=service_context,
+  text_qa_template=text_qa_template,
+  refine_template=refine_template,
+  use_async=False,
+  streaming=False
+)
+
+# synchronous
+response = response_synthesizer.synthesize(
+  "query string", 
+  nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
+  additional_source_nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
+)
+
+# asynchronous
+response = await response_synthesizer.asynthesize(
+  "query string", 
+  nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
+  additional_source_nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
+)
+```
+
+You can also directly return a string, using the lower-level `get_response` and `aget_response` functions:
+
+```python
+response_str = response_synthesizer.get_response(
+  "query string", 
+  text_chunks=["text1", "text2", ...]
+)
+```
+
+## Example Notebooks
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/response_synthesizers/refine.ipynb
+/examples/response_synthesizers/tree_summarize.ipynb
+```
@@ -0,0 +1,50 @@
+# Response Synthesizer
+
+## Concept
+A `Response Synthesizer` is what generates a response from an LLM, using a user query and a given set of text chunks. The output of a response synthesizer is a `Response` object.
+
+The method for doing this can take many forms, from as simple as iterating over text chunks, to as complex as building a tree. The main idea here is to simplify the process of generating a response using an LLM across your data.
+
+When used in a query engine, the response synthesizer is used after nodes are retrieved from a retriever, and after any node postprocessors have run.
+
+```{tip}
+Confused about where response synthesizer fits in the pipeline? Read the [high-level concepts](/getting_started/concepts.md)
+```
+
+## Usage Pattern
+Use a response synthesizer on its own:
+
+```python
+from llama_index.schema import Node, NodeWithScore
+from llama_index.response_synthesizers import get_response_synthesizer
+
+response_synthesizer = get_response_synthesizer(response_mode='compact')
+
+# synthesize expects scored nodes, so wrap each Node in a NodeWithScore
+response = response_synthesizer.synthesize(
+    "query text",
+    nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
+)
+```
+
+Or in a query engine after you've created an index:
+
+```python
+query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
+response = query_engine.query("query_text")
+```
+
+You can find more details on all available response synthesizers, modes, and how to build your own below.
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+## Modules
+Below you can find detailed API information for each response synthesis module.
+
+```{toctree}
+---
+maxdepth: 1
+---
+modules.md
+```
@@ -0,0 +1,95 @@
+# Usage Pattern
+
+## Get Started
+
+Configuring the response synthesizer for a query engine using `response_mode`:
+
+```python
+from llama_index.schema import Node, NodeWithScore
+from llama_index.response_synthesizers import get_response_synthesizer
+
+response_synthesizer = get_response_synthesizer(response_mode='compact')
+
+response = response_synthesizer.synthesize(
+  "query text", 
+  nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...]
+)
+```
+
+Or, more commonly, in a query engine after you've created an index:
+
+```python
+query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
+response = query_engine.query("query_text")
+```
+
+```{tip}
+To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
+```
+
+## Configuring the Response Mode
+Response synthesizers are typically specified through a `response_mode` kwarg setting.
+
+Several response synthesizers are implemented already in LlamaIndex:
+
+- `refine`: "create and refine" an answer by sequentially going through each retrieved text chunk. 
+    This makes a separate LLM call per Node. Good for more detailed answers.
+- `compact` (default): "compact" the prompt during each LLM call by stuffing in as 
+    many text chunks as can fit within the maximum prompt size. If there are 
+    too many chunks to stuff into one prompt, "create and refine" an answer by going through
+    multiple compact prompts. The same as `refine`, but should result in fewer LLM calls.
+- `tree_summarize`: Given a set of text chunks and the query, recursively construct a tree 
+    and return the root node as the response. Good for summarization purposes.
+- `simple_summarize`: Truncates all text chunks to fit into a single LLM prompt. Good for quick
+    summarization purposes, but may lose detail due to truncation.
+- `no_text`: Only runs the retriever to fetch the nodes that would have been sent to the LLM, 
+    without actually sending them. They can then be inspected by checking `response.source_nodes`.
+- `accumulate`: Given a set of text chunks and the query, apply the query to each text
+    chunk while accumulating the responses into an array. Returns a concatenated string of all
+    responses. Good for when you need to run the same query separately against each text
+    chunk.
+- `compact_accumulate`: The same as accumulate, but will "compact" each LLM prompt similar to
+    `compact`, and run the same query against each text chunk.
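+
To make the difference between `refine` and `compact` concrete, here is a minimal sketch, not LlamaIndex's actual implementation. It assumes prompt size is measured in characters (the library works with tokens), and `compact_chunks`, `create_and_refine`, and `counting_llm` are hypothetical names introduced only for this illustration.

```python
from typing import Callable, List


def compact_chunks(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily pack chunks into as few prompts as fit within max_chars."""
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packed.append(current)  # current prompt is full, start a new one
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def create_and_refine(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine once per remaining chunk."""
    answer = ""
    for i, chunk in enumerate(chunks):
        if i == 0:
            prompt = f"Context: {chunk}\nQuestion: {query}"
        else:
            prompt = f"Existing answer: {answer}\nNew context: {chunk}\nQuestion: {query}"
        answer = llm(prompt)
    return answer


calls: List[str] = []


def counting_llm(prompt: str) -> str:
    """Hypothetical LLM stub that records each call instead of calling an API."""
    calls.append(prompt)
    return f"answer after {len(calls)} call(s)"


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]

# refine: one LLM call per chunk
create_and_refine("query", chunks, counting_llm)
refine_calls = len(calls)

# compact: pack chunks first, then run the same create-and-refine loop
calls.clear()
create_and_refine("query", compact_chunks(chunks, max_chars=100), counting_llm)
compact_calls = len(calls)

print(refine_calls, compact_calls)
```

With four 40-character chunks and a 100-character budget, two chunks fit per prompt, so the compact variant needs half as many calls.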
+
+## Custom Response Synthesizers
+
+Each response synthesizer inherits from `llama_index.response_synthesizers.base.BaseSynthesizer`. The base API is extremely simple, which makes it easy to create your own response synthesizer.
+
+Perhaps you want to customize which template is used at each step in `tree_summarize`, or a new research paper describes a new way to generate a response to a query. In either case, you can create your own response synthesizer and plug it into any query engine, or use it on its own.
+
+Below we show the `__init__()` function, as well as the two abstract methods that every response synthesizer must implement. The basic requirements are to process a query and text chunks, and return a string (or string generator) response.
+
+```python
+class BaseSynthesizer(ABC):
+    """Response builder class."""
+
+    def __init__(
+        self,
+        service_context: Optional[ServiceContext] = None,
+        streaming: bool = False,
+    ) -> None:
+        """Init params."""
+        self._service_context = service_context or ServiceContext.from_defaults()
+        self._callback_manager = self._service_context.callback_manager
+        self._streaming = streaming
+
+    @abstractmethod
+    def get_response(
+        self,
+        query_str: str,
+        text_chunks: Sequence[str],
+        **response_kwargs: Any,
+    ) -> RESPONSE_TEXT_TYPE:
+        """Get response."""
+        ...
+
+    @abstractmethod
+    async def aget_response(
+        self,
+        query_str: str,
+        text_chunks: Sequence[str],
+        **response_kwargs: Any,
+    ) -> RESPONSE_TEXT_TYPE:
+        """Get response."""
+        ...
+```
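+
As a rough sketch of what a custom synthesizer could look like, the example below implements both abstract methods. To stay runnable without the library it subclasses `SynthesizerSketch`, a stand-in defined here that mirrors only the two method signatures shown above. `StuffAllSynthesizer` and `echo_llm` are hypothetical names, and a real subclass would inherit from `BaseSynthesizer` and use the service context's LLM instead.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, Sequence


class SynthesizerSketch(ABC):
    """Stand-in exposing the same two abstract methods as `BaseSynthesizer`."""

    @abstractmethod
    def get_response(
        self, query_str: str, text_chunks: Sequence[str], **response_kwargs: Any
    ) -> str:
        ...

    @abstractmethod
    async def aget_response(
        self, query_str: str, text_chunks: Sequence[str], **response_kwargs: Any
    ) -> str:
        ...


class StuffAllSynthesizer(SynthesizerSketch):
    """Hypothetical synthesizer: put every chunk into one prompt, make one call."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm

    def get_response(
        self, query_str: str, text_chunks: Sequence[str], **response_kwargs: Any
    ) -> str:
        context = "\n\n".join(text_chunks)
        prompt = f"Context:\n{context}\n\nQuestion: {query_str}\nAnswer:"
        return self._llm(prompt)

    async def aget_response(
        self, query_str: str, text_chunks: Sequence[str], **response_kwargs: Any
    ) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, self.get_response, query_str, text_chunks
        )


def echo_llm(prompt: str) -> str:
    """Stub LLM that reports the prompt length instead of calling a model."""
    return f"prompt had {len(prompt)} characters"


synth = StuffAllSynthesizer(echo_llm)
sync_answer = synth.get_response("What is this?", ["chunk one", "chunk two"])
async_answer = asyncio.run(synth.aget_response("What is this?", ["chunk one", "chunk two"]))
```

Delegating the async method to the sync one keeps the two code paths identical, at the cost of not using a natively asynchronous LLM client.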
@@ -0,0 +1,52 @@
+# Module Guides
+We are adding more module guides soon!
+In the meantime, please take a look at the [API References](/api_reference/query/retrievers.rst).
+
+## Vector Index Retrievers
+* VectorIndexRetriever
+```{toctree}
+---
+maxdepth: 1
+---
+VectorIndexAutoRetriever </examples/vector_stores/chroma_auto_retriever.ipynb>
+```
+
+## List Index
+* ListIndexRetriever 
+* ListIndexEmbeddingRetriever 
+* ListIndexLLMRetriever
+
+## Tree Index
+* TreeSelectLeafRetriever
+* TreeSelectLeafEmbeddingRetriever
+* TreeAllLeafRetriever
+* TreeRootRetriever
+
+
+## Keyword Table Index
+* KeywordTableGPTRetriever
+* KeywordTableSimpleRetriever
+* KeywordTableRAKERetriever
+
+
+## Knowledge Graph Index
+```{toctree}
+---
+maxdepth: 1
+---
+Custom Retriever (KG Index and Vector Store Index) </examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.ipynb>
+```
+* KGTableRetriever
+
+## Document Summary Index
+* DocumentSummaryIndexRetriever
+* DocumentSummaryIndexEmbeddingRetriever
+
+## Composed Retrievers
+* TransformRetriever
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/query_engine/pdf_tables/recursive_retriever.ipynb
+```
@@ -0,0 +1,35 @@
+# Retriever Modes
+Here we show the mapping from `retriever_mode` configuration to the selected retriever class.
+> Note that `retriever_mode` can mean different things for different index classes.
+
+## Vector Index
+Specifying `retriever_mode` has no effect (silently ignored).
+`vector_index.as_retriever(...)` always returns a VectorIndexRetriever.
+
+
+## List Index
+* `default`: ListIndexRetriever 
+* `embedding`: ListIndexEmbeddingRetriever 
+* `llm`: ListIndexLLMRetriever
+
+## Tree Index
+* `select_leaf`: TreeSelectLeafRetriever
+* `select_leaf_embedding`: TreeSelectLeafEmbeddingRetriever
+* `all_leaf`: TreeAllLeafRetriever
+* `root`: TreeRootRetriever
+
+
+## Keyword Table Index
+* `default`: KeywordTableGPTRetriever
+* `simple`: KeywordTableSimpleRetriever
+* `rake`: KeywordTableRAKERetriever
+
+
+## Knowledge Graph Index
+* `keyword`: KGTableRetriever
+* `embedding`: KGTableRetriever
+* `hybrid`: KGTableRetriever
+
+## Document Summary Index
+* `default`: DocumentSummaryIndexRetriever
+* `embedding`: DocumentSummaryIndexEmbeddingRetriever
@@ -0,0 +1,37 @@
+
+# Retriever
+
+## Concept
+
+Retrievers are responsible for fetching the most relevant context given a user query (or chat message).  
+
+A retriever can be built on top of [Indices](/core_modules/data_modules/index/root.md), but can also be defined independently.
+Retrievers are a key building block in [Query Engines](/core_modules/query_modules/query_engine/root.md) (and [Chat Engines](/core_modules/query_modules/chat_engines/root.md)) for retrieving relevant context.
+
+```{tip}
+Confused about where retriever fits in the pipeline? Read about [high-level concepts](/getting_started/concepts.md)
+```
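+
The idea of "fetching the most relevant context" can be illustrated with a toy retriever. This is a sketch under strong assumptions: it scores by plain word overlap, whereas the retrievers in LlamaIndex typically rely on embeddings, keywords, or LLM calls. The `retrieve` function and its `top_k` parameter are hypothetical and only loosely mirror `similarity_top_k`.

```python
from typing import List, Tuple


def retrieve(query: str, nodes: List[str], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k node texts ranked by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = []
    for text in nodes:
        words = set(text.lower().split())
        # Fraction of query words that also appear in the node text.
        score = len(query_words & words) / max(len(query_words), 1)
        scored.append((text, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


nodes = [
    "paul graham wrote essays",
    "the weather is sunny",
    "graham co-founded a startup accelerator",
]
results = retrieve("paul graham startup", nodes, top_k=2)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

The returned pairs play the role of scored nodes: downstream components only see the highest-ranked context.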
+
+## Usage Pattern
+
+Get started with:
+```python
+retriever = index.as_retriever()
+nodes = retriever.retrieve("Who is Paul Graham?")
+```
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage_pattern.md
+```
+
+
+## Modules
+```{toctree}
+---
+maxdepth: 2
+---
+modules.md
+```
@@ -0,0 +1,74 @@
+# Usage Pattern
+
+## Get Started
+Get a retriever from index:
+```python
+retriever = index.as_retriever()
+```
+
+Retrieve relevant context for a question:
+```python
+nodes = retriever.retrieve('Who is Paul Graham?')
+```
+
+> Note: To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
+
+## High-Level API
+
+### Selecting a Retriever
+
+You can select the index-specific retriever class via `retriever_mode`. 
+For example, with a `ListIndex`:
+```python
+retriever = list_index.as_retriever(
+    retriever_mode='llm',
+)
+```
+This creates a [ListIndexLLMRetriever](/api_reference/query/retrievers/list.rst) on top of the list index.
+
+See [**Retriever Modes**](/core_modules/query_modules/retriever/retriever_modes.md) for a full list of (index-specific) retriever modes
+and the retriever classes they map to.
+
+```{toctree}
+---
+maxdepth: 1
+hidden:
+---
+retriever_modes.md
+```
+
+### Configuring a Retriever
+In the same way, you can pass kwargs to configure the selected retriever.
+> Note: take a look at the API reference for the selected retriever class' constructor parameters for a list of valid kwargs.
+
+For example, if we selected the "llm" retriever mode, we might do the following:
+```python
+retriever = list_index.as_retriever(
+    retriever_mode='llm',
+    choice_batch_size=5,
+)
+
+```
+
+## Low-Level Composition API
+You can use the low-level composition API if you need more granular control.  
+
+To achieve the same outcome as above, you can directly import and construct the desired retriever class:
+```python
+from llama_index.indices.list import ListIndexLLMRetriever
+
+retriever = ListIndexLLMRetriever(
+    index=list_index,
+    choice_batch_size=5,
+)
+```
+
+
+## Advanced
+
+```{toctree}
+---
+maxdepth: 1
+---
+Define Custom Retriever </examples/query_engine/CustomRetrievers.ipynb>
+```
@@ -0,0 +1,156 @@
+# Output Parsing
+
+LlamaIndex supports integrations with output parsing modules offered
+by other frameworks. These output parsing modules can be used in the following ways:
+- To provide formatting instructions for any prompt / query (through `output_parser.format`)
+- To provide "parsing" for LLM outputs (through `output_parser.parse`)
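+
The two roles listed above can be sketched in a few lines. The class below is a hypothetical, dependency-free parser that exposes the same two hooks, `format` and `parse`. It is not one of the integrations described in this section, and the `raw_output` string is a made-up LLM reply used only for illustration.

```python
import json
from typing import Any, Dict, List


class JsonKeysOutputParser:
    """Hypothetical parser exposing the same two hooks: `format` and `parse`."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys

    def format(self, prompt: str) -> str:
        """Append formatting instructions to a prompt before the LLM call."""
        key_list = ", ".join(f'"{k}"' for k in self.keys)
        return f"{prompt}\n\nRespond with a single JSON object with the keys: {key_list}."

    def parse(self, output: str) -> Dict[str, Any]:
        """Extract and validate the JSON object from raw LLM output."""
        start, end = output.find("{"), output.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in LLM output")
        data = json.loads(output[start : end + 1])
        missing = [k for k in self.keys if k not in data]
        if missing:
            raise ValueError(f"missing keys: {missing}")
        return data


parser = JsonKeysOutputParser(["Education", "Work"])
prompt = parser.format("Summarize the author's background.")

# Made-up reply standing in for a real LLM response.
raw_output = 'Sure! {"Education": "example education", "Work": "example work"}'
parsed = parser.parse(raw_output)
print(parsed)
```

Raising on missing keys keeps malformed output from silently reaching downstream code.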
+
+
+### Guardrails
+
+Guardrails is an open-source Python package for specification/validation/correction of output schemas. See below for a code example.
+
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
+from llama_index.output_parsers import GuardrailsOutputParser
+from llama_index.llm_predictor import StructuredLLMPredictor
+from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
+from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL, DEFAULT_REFINE_PROMPT_TMPL
+
+
+# load documents, build index
+documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
+index = VectorStoreIndex.from_documents(documents)
+
+# specify StructuredLLMPredictor
+# this is a special LLMPredictor that allows for structured outputs
+llm_predictor = StructuredLLMPredictor()
+
+# define query / output spec
+rail_spec = ("""
+<rail version="0.1">
+
+<output>
+    <list name="points" description="Bullet points regarding events in the author's life.">
+        <object>
+            <string name="explanation" format="one-line" on-fail-one-line="noop" />
+            <string name="explanation2" format="one-line" on-fail-one-line="noop" />
+            <string name="explanation3" format="one-line" on-fail-one-line="noop" />
+        </object>
+    </list>
+</output>
+
+<prompt>
+
+Query string here.
+
+@xml_prefix_prompt
+
+{output_schema}
+
+@json_suffix_prompt_v2_wo_none
+</prompt>
+</rail>
+""")
+
+# define output parser
+output_parser = GuardrailsOutputParser.from_rail_string(rail_spec, llm=llm_predictor.llm)
+
+# format each prompt with output parser instructions
+fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
+fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)
+
+qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
+refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)
+
+# obtain a structured response
+query_engine = index.as_query_engine(
+    service_context=ServiceContext.from_defaults(
+        llm_predictor=llm_predictor
+    ),
+    text_qa_template=qa_prompt, 
+    refine_template=refine_prompt, 
+)
+response = query_engine.query(
+    "What are the three items the author did growing up?", 
+)
+print(response)
+
+```
+
+Output:
+```
+{'points': [{'explanation': 'Writing short stories', 'explanation2': 'Programming on an IBM 1401', 'explanation3': 'Using microcomputers'}]}
+```
+
+
+### Langchain
+
+Langchain also offers output parsing modules that you can use within LlamaIndex.
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
+from llama_index.output_parsers import LangchainOutputParser
+from llama_index.llm_predictor import StructuredLLMPredictor
+from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
+from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL, DEFAULT_REFINE_PROMPT_TMPL
+from langchain.output_parsers import StructuredOutputParser, ResponseSchema
+
+
+# load documents, build index
+documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
+index = VectorStoreIndex.from_documents(documents)
+llm_predictor = StructuredLLMPredictor()
+
+# define output schema
+response_schemas = [
+    ResponseSchema(name="Education", description="Describes the author's educational experience/background."),
+    ResponseSchema(name="Work", description="Describes the author's work experience/background.")
+]
+
+# define output parser
+lc_output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
+output_parser = LangchainOutputParser(lc_output_parser)
+
+# format each prompt with output parser instructions
+fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
+fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)
+qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
+refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)
+
+# query index
+query_engine = index.as_query_engine(
+    service_context=ServiceContext.from_defaults(
+        llm_predictor=llm_predictor
+    ),
+    text_qa_template=qa_prompt, 
+    refine_template=refine_prompt, 
+)
+response = query_engine.query(
+    "What are a few things the author did growing up?", 
+)
+print(str(response))
+```
+
+Output:
+
+```
+{'Education': 'Before college, the author wrote short stories and experimented with programming on an IBM 1401.', 'Work': 'The author worked on writing and programming outside of school.'}
+```
+
+### Guides
+
+```{toctree}
+---
+caption: Examples
+maxdepth: 1
+---
+
+/examples/output_parsing/GuardrailsDemo.ipynb
+/examples/output_parsing/LangchainOutputParserDemo.ipynb
+/examples/output_parsing/guidance_pydantic_program.ipynb
+/examples/output_parsing/guidance_sub_question.ipynb
+/examples/output_parsing/openai_pydantic_program.ipynb
+```
@@ -0,0 +1,35 @@
+# Pydantic Program
+
+A pydantic program is a generic abstraction that takes in an input string and converts it to a structured Pydantic object type.
+
+Because this abstraction is so generic, it encompasses a broad range of LLM workflows. The programs are composable and can be used for more generic or more specific use cases.
+
+There are a few general types of Pydantic Programs:
+- **LLM Text Completion Pydantic Programs**: These convert input text into a user-specified structured object through a text completion API + output parsing.
+- **LLM Function Calling Pydantic Programs**: These convert input text into a user-specified structured object through an LLM function calling API.
+- **Prepackaged Pydantic Programs**: These convert input text into prespecified structured objects.
+
+
+## LLM Text Completion Pydantic Programs
+TODO: Coming soon!
+
+
+## LLM Function Calling Pydantic Programs
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/output_parsing/openai_pydantic_program.ipynb
+/examples/output_parsing/guidance_pydantic_program.ipynb
+/examples/output_parsing/guidance_sub_question.ipynb
+```
+
+
+## Prepackaged Pydantic Programs
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/output_parsing/df_program.ipynb
+/examples/output_parsing/evaporate_program.ipynb
+```
@@ -0,0 +1,42 @@
+# Structured Outputs
+
+The ability of LLMs to produce structured outputs is important for downstream applications that rely on reliably parsing output values.
+LlamaIndex itself also relies on structured output in the following ways.
+- **Document retrieval**: Many data structures within LlamaIndex rely on LLM calls with a specific schema for Document retrieval. For instance, the tree index expects LLM calls to be in the format "ANSWER: (number)".
+- **Response synthesis**: Users may expect that the final response contains some degree of structure (e.g. a JSON output, a formatted SQL query, etc.)
+
+LlamaIndex provides a variety of modules enabling LLMs to produce outputs in a structured format. We provide modules at different levels of abstraction:
+- **Output Parsers**: These are modules that operate before and after an LLM text completion endpoint. They are not used with LLM function calling endpoints (since those contain structured outputs out of the box).
+- **Pydantic Programs**: These are generic modules that map an input prompt to a structured output, represented by a Pydantic object. They may use function calling APIs or text completion APIs + output parsers.
+- **Pre-defined Pydantic Programs**: We have pre-defined Pydantic programs that map inputs to specific output types (like dataframes).
+
+See the sections below for an overview of output parsers and Pydantic programs.
+
+## 🔬 Anatomy of a Structured Output Function
+
+Here we describe the different components of an LLM-powered structured output function. The pipeline depends on whether you're using a **generic LLM text completion API** or an **LLM function calling API**.
+
+![](/_static/structured_output/diagram1.png)
+
+With generic completion APIs, the inputs and outputs are handled by text prompts. The output parser plays a role before and after the LLM call in ensuring structured outputs. Before the LLM call, the output parser can
+append format instructions to the prompt. After the LLM call, the output parser can parse the output according to the specified format.
+
+With function calling APIs, the output is inherently in a structured format, and the input can take in the signature of the desired object. The structured output just needs to be cast in the right object format (e.g. Pydantic).
+
+## Output Parser Modules
+
+```{toctree}
+---
+maxdepth: 2
+---
+output_parser.md
+```
+
+## Pydantic Program Modules
+
+```{toctree}
+---
+maxdepth: 2
+---
+pydantic_program.md
+```
@@ -0,0 +1,50 @@
+# Callbacks
+
+## Concept
+LlamaIndex provides callbacks to help debug, track, and trace the inner workings of the library. 
+Using the callback manager, as many callbacks as needed can be added.
+
+In addition to logging data related to events, you can also track the duration and number of occurrences
+of each event.
+
+Furthermore, a trace map of events is also recorded, and callbacks can use this data
+however they want. For example, the `LlamaDebugHandler` will, by default, print the trace of events
+after most operations.
+
+**Callback Event Types**  
+While each callback may not leverage each event type, the following events are available to be tracked:
+
+- `CHUNKING` -> Logs for the before and after of text splitting.
+- `NODE_PARSING` -> Logs for the documents and the nodes that they are parsed into.
+- `EMBEDDING` -> Logs for the number of texts embedded.
+- `LLM` -> Logs for the template and response of LLM calls.
+- `QUERY` -> Keeps track of the start and end of each query.
+- `RETRIEVE` -> Logs for the nodes retrieved for a query.
+- `SYNTHESIZE` -> Logs for the result for synthesize calls.
+- `TREE` -> Logs for the summary and level of summaries generated.
+- `SUB_QUESTIONS` -> Logs for the sub questions and answers generated.
+
+You can implement your own callback to track and trace these events, or use an existing callback.
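+
To illustrate tracking the duration and number of occurrences of each event, here is a minimal, library-free sketch. `EventStats` is a hypothetical class. Its `on_event_start`/`on_event_end` signatures are simplified assumptions and do not match the exact interface a real LlamaIndex callback handler has to implement.

```python
import time
from collections import defaultdict
from typing import Dict, List


class EventStats:
    """Hypothetical handler recording how often and how long each event type runs."""

    def __init__(self) -> None:
        self._started: Dict[str, float] = {}
        self.durations: Dict[str, List[float]] = defaultdict(list)

    def on_event_start(self, event_type: str, event_id: str) -> None:
        self._started[event_id] = time.perf_counter()

    def on_event_end(self, event_type: str, event_id: str) -> None:
        elapsed = time.perf_counter() - self._started.pop(event_id)
        self.durations[event_type].append(elapsed)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Return the count and total duration per event type."""
        return {
            event_type: {"count": len(times), "total_seconds": sum(times)}
            for event_type, times in self.durations.items()
        }


stats = EventStats()
for event_id, event_type in [("1", "RETRIEVE"), ("2", "LLM"), ("3", "LLM")]:
    stats.on_event_start(event_type, event_id)
    time.sleep(0.01)  # stand-in for the real work
    stats.on_event_end(event_type, event_id)

report = stats.summary()
print(report)
```

Keying start times by event id rather than event type lets overlapping events of the same type be timed independently.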
+
+
+## Modules
+
+Currently supported callbacks are as follows:
+
+- [TokenCountingHandler](/examples/callbacks/TokenCountingHandler.ipynb) -> Flexible token counting for prompt, completion, and embedding token usage. See the migration details [here](/core_modules/model_modules/callbacks/token_counting_migration.md)
+- [LlamaDebugHandler](/examples/callbacks/LlamaDebugHandler.ipynb) -> Basic tracking and tracing for events. Example usage can be found in the notebook below.
+- [WandbCallbackHandler](/examples/callbacks/WandbCallbackHandler.ipynb) -> Tracking of events and traces using the Wandb Prompts frontend. More details are in the notebook below or at [Wandb](https://docs.wandb.ai/guides/prompts/quickstart)
+- [AimCallback](/examples/callbacks/AimCallback.ipynb) -> Tracking of LLM inputs and outputs. Example usage can be found in the notebook below.
+
+
+```{toctree}
+---
+maxdepth: 1
+hidden:
+---
+/examples/callbacks/TokenCountingHandler.ipynb
+/examples/callbacks/LlamaDebugHandler.ipynb
+/examples/callbacks/WandbCallbackHandler.ipynb
+/examples/callbacks/AimCallback.ipynb
+token_counting_migration.md
+```
@@ -0,0 +1,47 @@
+# Token Counting - Migration Guide
+
+The existing token counting implementation has been __deprecated__. 
+
+We know token counting is important to many users, so this guide was created to walk through a (hopefully painless) transition.
+
+Previously, token counting was tracked directly on the `llm_predictor` and `embed_model` objects, and optionally printed to the console. That implementation used a static tokenizer (gpt-2) for token counting, and the `last_token_usage` and `total_token_usage` attributes were not always tracked properly.
+
+Going forward, token counting has moved into a callback. Using the `TokenCountingHandler` callback, you now have more options for how tokens are counted, the lifetime of the token counts, and even creating separate token counters for different indexes.
+
+Here is a minimal example of using the new `TokenCountingHandler` with an OpenAI model:
+
+```python
+import tiktoken
+from llama_index.callbacks import CallbackManager, TokenCountingHandler
+from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
+
+# you can set a tokenizer directly, or optionally let it default 
+# to the same tokenizer that was used previously for token counting
+# NOTE: The tokenizer should be a function that takes in text and returns a list of tokens
+token_counter = TokenCountingHandler(
+    tokenizer=tiktoken.encoding_for_model("text-davinci-003").encode,
+    verbose=False  # set to true to see usage printed to the console
+)
+
+callback_manager = CallbackManager([token_counter])
+
+service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
+
+documents = SimpleDirectoryReader("./data").load_data()
+
+# if verbose is turned on, you will see embedding token usage printed
+index = VectorStoreIndex.from_documents(documents, service_context=service_context)
+
+# otherwise, you can access the count directly
+print(token_counter.total_embedding_token_count)
+
+# reset the counts at your discretion!
+token_counter.reset_counts()
+
+# also track prompt, completion, and total LLM tokens, in addition to embeddings
+response = index.as_query_engine().query("What did the author do growing up?")
+print('Embedding Tokens: ', token_counter.total_embedding_token_count, '\n',
+      'LLM Prompt Tokens: ', token_counter.prompt_llm_token_count, '\n',
+      'LLM Completion Tokens: ', token_counter.completion_llm_token_count, '\n',
+      'Total LLM Token Count: ', token_counter.total_llm_token_count)
+```
@@ -0,0 +1,97 @@
+# Cost Analysis
+
+## Concept
+Each call to an LLM will cost some amount of money - for instance, OpenAI's gpt-3.5-turbo costs $0.002 / 1k tokens. The cost of building an index and querying depends on 
+
+- the type of LLM used
+- the type of data structure used
+- parameters used during building 
+- parameters used during querying
+
+The cost of building and querying each index is a TODO in the reference documentation. In the meantime, we provide the following information:
+
+1. A high-level overview of the cost structure of the indices.
+2. A token predictor that you can use directly within LlamaIndex!
+
+### Overview of Cost Structure
+
+#### Indices with no LLM calls
+The following indices don't require LLM calls at all during building (0 cost):
+- `ListIndex`
+- `SimpleKeywordTableIndex` - uses a regex keyword extractor to extract keywords from each document
+- `RAKEKeywordTableIndex` - uses a RAKE keyword extractor to extract keywords from each document
+
+#### Indices with LLM calls
+The following indices do require LLM calls during build time:
+- `TreeIndex` - use LLM to hierarchically summarize the text to build the tree
+- `KeywordTableIndex` - use LLM to extract keywords from each document
+
+### Query Time
+
+There will always be >= 1 LLM call during query time, in order to synthesize the final answer. 
+Some indices contain cost tradeoffs between index building and querying. `ListIndex`, for instance,
+is free to build, but running a query over a list index (without filtering or embedding lookups), will
+call the LLM {math}`N` times.
+
+Here are some notes regarding each of the indices:
+- `ListIndex`: by default requires {math}`N` LLM calls, where N is the number of nodes.
+- `TreeIndex`: by default requires {math}`\log (N)` LLM calls, where N is the number of leaf nodes. 
+    - Setting `child_branch_factor=2` will be more expensive than the default `child_branch_factor=1` (polynomial vs logarithmic), because we traverse 2 children instead of just 1 for each parent node.
+- `KeywordTableIndex`: by default requires an LLM call to extract query keywords.
+    - Can do `index.as_retriever(retriever_mode="simple")` or `index.as_retriever(retriever_mode="rake")` to also use regex/RAKE keyword extractors on your query text.
+-  `VectorStoreIndex`: by default, requires one LLM call per query. If you increase the `similarity_top_k` or `chunk_size`, or change the `response_mode`, then this number will increase.
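+
The call counts above translate into a back-of-the-envelope cost estimate. The sketch below is illustrative only: it uses the $0.002 / 1k token price quoted earlier, while the tree branching factor (`num_children=10`) and the 500 tokens per call are assumptions chosen for the example, not measured values. It also ignores the extra call that synthesizes the final answer.

```python
import math


def list_index_calls(num_nodes: int) -> int:
    """A list index query makes one LLM call per node by default."""
    return num_nodes


def tree_index_calls(num_leaves: int, num_children: int = 10) -> int:
    """Number of tree levels above the leaves, i.e. roughly log(N) calls."""
    levels = 0
    remaining = num_leaves
    while remaining > 1:
        remaining = math.ceil(remaining / num_children)
        levels += 1
    return max(levels, 1)


def estimated_cost(num_calls: int, tokens_per_call: int, price_per_1k: float = 0.002) -> float:
    """Total cost in dollars for num_calls calls of tokens_per_call tokens each."""
    return num_calls * tokens_per_call * price_per_1k / 1000


n = 1000
list_cost = estimated_cost(list_index_calls(n), tokens_per_call=500)
tree_cost = estimated_cost(tree_index_calls(n), tokens_per_call=500)
print(f"list index: ${list_cost:.3f}, tree index: ${tree_cost:.3f}")
```

Under these assumptions the list index query is several hundred times more expensive than the tree index query, which is the build-time versus query-time tradeoff described above.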
+
+## Usage Pattern
+
+LlamaIndex offers token **predictors** to predict token usage of LLM and embedding calls.
+This allows you to estimate your costs during 1) index construction, and 2) index querying, before
+any respective LLM calls are made.
+
+Tokens are counted using the `TokenCountingHandler` callback. See the [example notebook](../../../examples/callbacks/TokenCountingHandler.ipynb) for details on the setup.
+
+### Using MockLLM
+
+To predict token usage of LLM calls, import and instantiate the `MockLLM` as shown below. The `max_tokens` parameter is used as a "worst case" prediction, where each LLM response will contain exactly that number of tokens. If `max_tokens` is not specified, then it will simply return the prompt as the prediction.
+
+```python
+from llama_index import ServiceContext, set_global_service_context
+from llama_index.llms import MockLLM
+
+llm = MockLLM(max_tokens=256)
+
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# optionally set a global service context
+set_global_service_context(service_context)
+```
+
+You can then use this predictor during both index construction and querying. 
+
+### Using MockEmbedding
+
+You may also predict the token usage of embedding calls with `MockEmbedding`. 
+
+```python
+from llama_index import ServiceContext, set_global_service_context
+from llama_index import MockEmbedding
+
+# specify a MockEmbedding
+embed_model = MockEmbedding(embed_dim=1536)
+
+service_context = ServiceContext.from_defaults(embed_model=embed_model)
+
+# optionally set a global service context
+set_global_service_context(service_context)
+```
+
+## Usage Pattern
+
+Read about the full usage pattern below!
+
+```{toctree}
+---
+caption: Examples
+maxdepth: 1
+---
+usage_pattern.md
+```
@@ -0,0 +1,97 @@
+# Usage Pattern
+
+## Estimating LLM and Embedding Token Counts
+
+In order to measure LLM and Embedding token counts, you'll need to
+
+1. Set up `MockLLM` and `MockEmbedding` objects
+
+```python
+from llama_index.llms import MockLLM
+from llama_index import MockEmbedding
+
+llm = MockLLM(max_tokens=256)
+embed_model = MockEmbedding(embed_dim=1536)
+```
+
+2. Set up the `TokenCountingHandler` callback
+
+```python
+import tiktoken
+from llama_index.callbacks import CallbackManager, TokenCountingHandler
+
+token_counter = TokenCountingHandler(
+    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
+)
+
+callback_manager = CallbackManager([token_counter])
+```
+
+3. Add them to the global `ServiceContext`
+
+```python
+from llama_index import ServiceContext, set_global_service_context
+
+set_global_service_context(
+    ServiceContext.from_defaults(
+        llm=llm, 
+        embed_model=embed_model, 
+        callback_manager=callback_manager
+    )
+)
+```
+
+4. Construct an Index 
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
+
+documents = SimpleDirectoryReader("./docs/examples/data/paul_graham").load_data()
+
+index = VectorStoreIndex.from_documents(documents)
+```
+
+5. Measure the counts!
+
+```python
+print(
+    "Embedding Tokens: ",
+    token_counter.total_embedding_token_count,
+    "\n",
+    "LLM Prompt Tokens: ",
+    token_counter.prompt_llm_token_count,
+    "\n",
+    "LLM Completion Tokens: ",
+    token_counter.completion_llm_token_count,
+    "\n",
+    "Total LLM Token Count: ",
+    token_counter.total_llm_token_count,
+    "\n",
+)
+
+# reset counts
+token_counter.reset_counts()
+```
+
+6. Run a query, measure again
+
+```python
+query_engine = index.as_query_engine()
+
+response = query_engine.query("query")
+
+print(
+    "Embedding Tokens: ",
+    token_counter.total_embedding_token_count,
+    "\n",
+    "LLM Prompt Tokens: ",
+    token_counter.prompt_llm_token_count,
+    "\n",
+    "LLM Completion Tokens: ",
+    token_counter.completion_llm_token_count,
+    "\n",
+    "Total LLM Token Count: ",
+    token_counter.total_llm_token_count,
+    "\n",
+)
+```
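+
+Once you have the counts, you may want to turn them into a rough dollar figure. The sketch below is plain Python and not part of LlamaIndex; `estimate_cost` is a hypothetical helper, and the prices are placeholders rather than real rates, so substitute the current pricing for your models.
+
+```python
+def estimate_cost(embedding_tokens, prompt_tokens, completion_tokens, prices_per_1k):
+    """Convert token counts into an estimated cost, given prices per 1,000 tokens."""
+    return (
+        embedding_tokens / 1000 * prices_per_1k["embedding"]
+        + prompt_tokens / 1000 * prices_per_1k["prompt"]
+        + completion_tokens / 1000 * prices_per_1k["completion"]
+    )
+
+
+# placeholder prices in USD per 1,000 tokens (assumed values, not real rates)
+example_prices = {"embedding": 0.0001, "prompt": 0.0015, "completion": 0.002}
+
+# in practice, pass the token_counter attributes printed above
+print(estimate_cost(20000, 1000, 256, example_prices))
+```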
@@ -0,0 +1,13 @@
+# Modules
+
+Notebooks with usage of these components can be found below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+
+../../../examples/evaluation/TestNYC-Evaluation.ipynb
+../../../examples/evaluation/TestNYC-Evaluation-Query.ipynb
+../../../examples/evaluation/QuestionGeneration.ipynb
+```
@@ -0,0 +1,64 @@
+# Evaluation
+
+## Concept
+Evaluation in generative AI and retrieval is a difficult task. Due to the unpredictable nature of text, and a general lack of "expected" outcomes to compare against, there are many blockers to getting started with evaluation.
+
+However, LlamaIndex offers a few key modules for evaluating the quality of both Document retrieval and response synthesis.
+Here are some key questions for each component:
+
+- **Document retrieval**: Are the sources relevant to the query?
+- **Response synthesis**: Does the response match the retrieved context? Does it also match the query? 
+
+This guide describes how the evaluation components within LlamaIndex work. Note that our current evaluation modules
+do *not* require ground-truth labels. Instead, evaluation uses some combination of the query, context, and response,
+combined with LLM calls.
+
+### Evaluation of the Response + Context
+
+Each `query_engine.query` call returns both the synthesized response and the source documents.
+
+We can evaluate the response against the retrieved sources - without taking into account the query!
+
+This allows you to measure hallucination - if the response does not match the retrieved sources, this means that the model may be "hallucinating" an answer since it is not rooting the answer in the context provided to it in the prompt.
+
+There are two sub-modes of evaluation here. We can either get a binary response "YES"/"NO" on whether the response matches *any* source context,
+or get a list of responses across sources to see which sources match.
+
+The `ResponseEvaluator` handles both modes for evaluating in this context.
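+
+To make the two sub-modes concrete, here is a minimal sketch in plain Python. It does not use the real `ResponseEvaluator`, which relies on LLM calls. The function names and the toy substring-based `judge` are assumptions that stand in for the LLM's "does this source support the response?" decision.
+
+```python
+from typing import Callable, List
+
+Judge = Callable[[str, str], bool]
+
+
+def evaluate_binary(response: str, sources: List[str], judge: Judge) -> str:
+    """Return "YES" if the response is supported by *any* source, else "NO"."""
+    return "YES" if any(judge(response, source) for source in sources) else "NO"
+
+
+def evaluate_per_source(response: str, sources: List[str], judge: Judge) -> List[str]:
+    """Return one "YES"/"NO" entry per source, in the same order as the sources."""
+    return ["YES" if judge(response, source) else "NO" for source in sources]
+
+
+def toy_judge(response: str, source: str) -> bool:
+    # toy stand-in for an LLM call: the source "supports" the response if it contains it
+    return response.lower() in source.lower()
+
+
+sources = ["Berlin is the capital of Germany.", "Paris is in France."]
+print(evaluate_binary("Berlin is the capital", sources, toy_judge))
+print(evaluate_per_source("Berlin is the capital", sources, toy_judge))
+```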
+
+### Evaluation of the Query + Response + Source Context
+
+This is similar to the above section, except now we also take into account the query. The goal is to determine if
+the response + source context answers the query.
+
+As with the above, there are two submodes of evaluation. 
+- We can either get a binary response "YES"/"NO" on whether
+the response matches the query, and whether any source node also matches the query.
+- We can also ignore the synthesized response, and check every source node to see
+if it matches the query.
+
+### Question Generation
+
+In addition to evaluating queries, LlamaIndex can also use your data to generate questions to evaluate on. This means that you can automatically generate questions, and then run an evaluation pipeline to test if the LLM can actually answer questions accurately using your data.
+
+## Usage Pattern
+
+For full usage details, see the usage pattern below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+usage_pattern.md
+```
+
+## Modules
+
+Notebooks with usage of these components can be found below.
+
+```{toctree}
+---
+maxdepth: 1
+---
+modules.md
+```
@@ -0,0 +1,141 @@
+# Usage Pattern
+
+## Evaluating Response for Hallucination
+
+### Binary Evaluation
+
+This mode of evaluation will return "YES"/"NO" if the synthesized response matches any source context.
+
+```python
+from llama_index import VectorStoreIndex, ServiceContext
+from llama_index.llms import OpenAI
+from llama_index.evaluation import ResponseEvaluator
+
+# build service context
+llm = OpenAI(model="gpt-4", temperature=0.0)
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# build index
+...
+
+# define evaluator
+evaluator = ResponseEvaluator(service_context=service_context)
+
+# query index
+query_engine = vector_index.as_query_engine()
+response = query_engine.query("What battles took place in New York City in the American Revolution?")
+eval_result = evaluator.evaluate(response)
+print(str(eval_result))
+
+```
+
+You'll get back either a `YES` or `NO` response.
+
+![](/_static/evaluation/eval_response_context.png)
+
+### Sources Evaluation
+
+This mode of evaluation will return "YES"/"NO" for every source node.
+
+```python
+from llama_index import VectorStoreIndex, ServiceContext
+from llama_index.llms import OpenAI
+from llama_index.evaluation import ResponseEvaluator
+
+# build service context
+llm = OpenAI(model="gpt-4", temperature=0.0)
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# build index
+...
+
+# define evaluator
+evaluator = ResponseEvaluator(service_context=service_context)
+
+# query index
+query_engine = vector_index.as_query_engine()
+response = query_engine.query("What battles took place in New York City in the American Revolution?")
+eval_result = evaluator.evaluate_source_nodes(response)
+print(str(eval_result))
+
+```
+
+You'll get back a list of "YES"/"NO", corresponding to each source node in `response.source_nodes`.
+
+## Evaluating Query + Response for Answer Quality
+
+### Binary Evaluation
+
+This mode of evaluation will return "YES"/"NO" if the synthesized response matches the query + any source context.
+
+```python
+from llama_index import VectorStoreIndex, ServiceContext
+from llama_index.llms import OpenAI
+from llama_index.evaluation import QueryResponseEvaluator
+
+# build service context
+llm = OpenAI(model="gpt-4", temperature=0.0)
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# build index
+...
+
+# define evaluator
+evaluator = QueryResponseEvaluator(service_context=service_context)
+
+# query index
+query_engine = vector_index.as_query_engine()
+response = query_engine.query("What battles took place in New York City in the American Revolution?")
+eval_result = evaluator.evaluate(response)
+print(str(eval_result))
+
+```
+
+![](/_static/evaluation/eval_query_response_context.png)
+
+### Sources Evaluation
+
+This mode of evaluation will look at each source node, and see if each source node contains an answer to the query.
+
+```python
+from llama_index import VectorStoreIndex, ServiceContext
+from llama_index.llms import OpenAI
+from llama_index.evaluation import QueryResponseEvaluator
+
+# build service context
+llm = OpenAI(model="gpt-4", temperature=0.0)
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# build index
+...
+
+# define evaluator
+evaluator = QueryResponseEvaluator(service_context=service_context)
+
+# query index
+query_engine = vector_index.as_query_engine()
+response = query_engine.query("What battles took place in New York City in the American Revolution?")
+eval_result = evaluator.evaluate_source_nodes(response)
+print(str(eval_result))
+```
+
+![](/_static/evaluation/eval_query_sources.png)
+
+## Question Generation
+
+LlamaIndex can also generate questions to answer using your data. Used in combination with the above evaluators, this lets you create a fully automated evaluation pipeline over your data.
+
+```python
+from llama_index import SimpleDirectoryReader, ServiceContext
+from llama_index.llms import OpenAI
+from llama_index.evaluation import DatasetGenerator
+
+# build service context
+llm = OpenAI(model="gpt-4", temperature=0.0)
+service_context = ServiceContext.from_defaults(llm=llm)
+
+# build documents
+documents = SimpleDirectoryReader("./data").load_data()
+
+# define generator, generate questions
+data_generator = DatasetGenerator.from_documents(documents)
+
+eval_questions = data_generator.generate_questions_from_nodes()
+```
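+
+To show how the generated questions could feed an evaluation loop, here is a minimal sketch using plain Python stand-ins. `run_eval_pipeline`, `answer_fn`, and `eval_fn` are hypothetical names: in practice `answer_fn` would wrap a query engine and `eval_fn` would wrap one of the evaluators above.
+
+```python
+from typing import Callable, List
+
+
+def run_eval_pipeline(
+    questions: List[str],
+    answer_fn: Callable[[str], str],
+    eval_fn: Callable[[str, str], str],
+) -> float:
+    """Answer every question, evaluate each answer, and return the fraction judged "YES"."""
+    if not questions:
+        return 0.0
+    passed = sum(1 for q in questions if eval_fn(q, answer_fn(q)) == "YES")
+    return passed / len(questions)
+
+
+# toy stand-ins so the sketch runs without any LLM calls
+canned_answers = {"What is the capital of Germany?": "Berlin"}
+
+
+def toy_answer(question: str) -> str:
+    return canned_answers.get(question, "")
+
+
+def toy_eval(question: str, answer: str) -> str:
+    # a non-empty answer counts as a pass in this toy example
+    return "YES" if answer else "NO"
+
+
+questions = ["What is the capital of Germany?", "Who wrote the essay?"]
+print(run_eval_pipeline(questions, toy_answer, toy_eval))
+```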
@@ -0,0 +1,44 @@
+# Playground
+
+## Concept
+
+The Playground module in LlamaIndex is a way to automatically test your data (i.e. documents) across a diverse combination of indices, models, embeddings, modes, etc. to decide which ones are best for your purposes. More options will continue to be added.
+
+For each combination, you'll be able to compare the results for any query and compare the answers, latency, tokens used, and so on.
+
+You may initialize a Playground with a list of pre-built indices, or initialize one from a list of Documents using the preset indices.
+
+## Usage Pattern
+
+A sample usage is given below.
+
+```python
+from llama_index import download_loader
+from llama_index.indices.vector_store import VectorStoreIndex
+from llama_index.indices.tree.base import TreeIndex
+from llama_index.playground import Playground
+
+# load data 
+WikipediaReader = download_loader("WikipediaReader")
+loader = WikipediaReader()
+documents = loader.load_data(pages=['Berlin'])
+
+# define multiple index data structures (vector index, tree index)
+indices = [VectorStoreIndex.from_documents(documents), TreeIndex.from_documents(documents)]
+
+# initialize playground
+playground = Playground(indices=indices)
+
+# playground compare
+playground.compare("What is the population of Berlin?")
+
+```
+
+## Modules
+
+```{toctree}
+---
+maxdepth: 1
+---
+../../../examples/analysis/PlaygroundDemo.ipynb
+```
@@ -0,0 +1,103 @@
+# ServiceContext
+
+## Concept
+The `ServiceContext` is a bundle of commonly used resources used during the indexing and querying stage in a LlamaIndex pipeline/application.
+You can use it to set the [global configuration](#setting-global-configuration), as well as [local configurations](#setting-local-configuration) at specific parts of the pipeline.
+
+## Usage Pattern
+
+### Configuring the service context
+The `ServiceContext` is a simple python dataclass that you can directly construct by passing in the desired components.
+
+```python
+@dataclass
+class ServiceContext:
+    # The LLM used to generate natural language responses to queries.
+    llm_predictor: BaseLLMPredictor
+
+    # The PromptHelper object that helps with truncating and repacking text chunks to fit in the LLM's context window.
+    prompt_helper: PromptHelper
+
+    # The embedding model used to generate vector representations of text.
+    embed_model: BaseEmbedding
+
+    # The parser that converts documents into nodes.
+    node_parser: NodeParser
+
+    # The callback manager object that calls its handlers on events. Provides basic logging and tracing capabilities.
+    callback_manager: CallbackManager
+
+    @classmethod
+    def from_defaults(cls, ...) -> "ServiceContext":
+      ... 
+```
+
+```{tip}
+Learn how to configure specific modules:
+- [LLM](/core_modules/model_modules/llms/usage_custom.md)
+- [Embedding Model](/core_modules/model_modules/embeddings/usage_pattern.md)
+- [Node Parser](/core_modules/data_modules/node_parsers/usage_pattern.md)
+
+```
+
+We also expose some common kwargs (of the above components) via the `ServiceContext.from_defaults` method
+for convenience (so you don't have to manually construct them).
+ 
+**Kwargs for node parser**:
+- `chunk_size`: The size of the text chunk for a node. Used to construct the default node parser when one isn't provided.
+- `chunk_overlap`: The amount of overlap between nodes (i.e. text chunks).
+
+**Kwargs for prompt helper**:
+- `context_window`: The size of the context window of the LLM. Typically we set this 
+  automatically with the model metadata. But we also allow explicit override via this parameter
+  for additional control (or in case the default is not available for certain latest
+  models)
+- `num_output`: The maximum number of output tokens from the LLM. Typically we set this
+  automatically given the model metadata. This parameter does not actually limit the model
+  output, it affects the amount of "space" we save for the output, when computing 
+  available context window size for packing text from retrieved Nodes.
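+
+The relationship between `context_window` and `num_output` can be sketched with simple arithmetic. This is an illustration of the idea rather than the `PromptHelper` implementation: the function name and the `prompt_tokens` argument (tokens taken up by the prompt template and query) are assumptions made for the example.
+
+```python
+def available_context_tokens(context_window: int, num_output: int, prompt_tokens: int) -> int:
+    """Tokens left for packing retrieved text after reserving space for the output."""
+    remaining = context_window - num_output - prompt_tokens
+    # never report a negative budget
+    return max(0, remaining)
+
+
+# e.g. a 4096-token window, 256 tokens reserved for output, 100 tokens of prompt text
+print(available_context_tokens(context_window=4096, num_output=256, prompt_tokens=100))
+```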
+
+Here's a complete example that sets up all objects using their default settings:
+
+```python
+from llama_index import ServiceContext, LLMPredictor, OpenAIEmbedding, PromptHelper
+from llama_index.llms import OpenAI
+from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
+from llama_index.node_parser import SimpleNodeParser
+
+llm = OpenAI(model='text-davinci-003', temperature=0, max_tokens=256)
+embed_model = OpenAIEmbedding()
+node_parser = SimpleNodeParser(
+  text_splitter=TokenTextSplitter(chunk_size=1024, chunk_overlap=20)
+)
+prompt_helper = PromptHelper(
+  context_window=4096, 
+  num_output=256, 
+  chunk_overlap_ratio=0.1, 
+  chunk_size_limit=None
+)
+
+service_context = ServiceContext.from_defaults(
+  llm=llm,
+  embed_model=embed_model,
+  node_parser=node_parser,
+  prompt_helper=prompt_helper
+)
+```
+
+### Setting global configuration
+You can set a service context as the global default that applies to the entire LlamaIndex pipeline:
+
+```python
+from llama_index import set_global_service_context
+set_global_service_context(service_context)
+```
+
+### Setting local configuration
+You can pass in a service context to a specific part of the pipeline to override the default configuration:
+
+```python
+query_engine = index.as_query_engine(service_context=service_context)
+response = query_engine.query("What did the author do growing up?")
+print(response)
+```
@@ -0,0 +1,41 @@
+# Deprecated Terms
+
+As LlamaIndex continues to evolve, many class names and APIs have been adjusted, improved, and deprecated.
+
+The following is a list of previously popular terms that have been deprecated, with links to their replacements.
+
+## GPTSimpleVectorIndex
+
+This has been renamed to `VectorStoreIndex`, and all vector indexes have been unified under a single interface. You can integrate with various vector databases by modifying the underlying `vector_store`.
+
+Please see the following links for more details on usage.
+
+- [Index Usage Pattern](/core_modules/data_modules/index/usage_pattern.md)
+- [Vector Store Guide](/core_modules/data_modules/index/vector_store_guide.ipynb)
+- [Vector Store Integrations](/community/integrations/vector_stores.md)
+
+## GPTVectorStoreIndex
+
+This has been renamed to `VectorStoreIndex`, but it is only a cosmetic change. Please see the following links for more details on usage.
+
+- [Index Usage Pattern](/core_modules/data_modules/index/usage_pattern.md)
+- [Vector Store Guide](/core_modules/data_modules/index/vector_store_guide.ipynb)
+- [Vector Store Integrations](/community/integrations/vector_stores.md)
+
+## LLMPredictor
+
+The `LLMPredictor` object is no longer intended to be used by users. Instead, you can set up an LLM directly and pass it into the `ServiceContext`.
+
+- [LLMs in LlamaIndex](/core_modules/model_modules/llms/root.md)
+- [Setting LLMs in the ServiceContext](/core_modules/supporting_modules/service_context.md)
+
+## PromptHelper and max_input_size
+
+The `max_input_size` parameter for the prompt helper has since been replaced with `context_window`.
+
+The `PromptHelper` in general has been deprecated in favour of specifying parameters directly in the `service_context` and `node_parser`.
+
+See the following links for more details.
+
+- [Configuring settings in the Service Context](/core_modules/supporting_modules/service_context.md)
+- [Parsing Documents into Nodes](/core_modules/data_modules/node_parsers/root.md)
@@ -0,0 +1 @@
+.. mdinclude:: ../../CHANGELOG.md
@@ -0,0 +1 @@
+.. mdinclude:: ../../CONTRIBUTING.md
@@ -0,0 +1 @@
+.. mdinclude:: ../DOCS_README.md
@@ -0,0 +1,8 @@
+# Privacy and Security  
+By default, LlamaIndex sends your data to OpenAI for generating embeddings and natural language responses. However, it is important to note that this can be configured according to your preferences. LlamaIndex provides the flexibility to use your own embedding model or run a large language model locally if desired.
+
+## Data Privacy
+Regarding data privacy, when using LlamaIndex with OpenAI, the privacy details and handling of your data are subject to OpenAI's policies. Each custom service other than OpenAI has its own policies as well.
+
+## Vector stores
+LlamaIndex offers modules to connect with other vector stores within indexes to store embeddings. It is worth noting that each vector store has its own privacy policies and practices, and LlamaIndex does not assume responsibility for how they handle or use your data. By default, LlamaIndex also provides an option to store your embeddings locally.
@@ -0,0 +1,96 @@
+# Agents
+
+## Context
+An "agent" is an automated reasoning and decision engine. It takes in a user input/query and can make internal decisions for executing
+that query in order to return the correct result. The key agent components can include, but are not limited to:
+- Breaking down a complex question into smaller ones
+- Choosing an external Tool to use + coming up with parameters for calling the Tool
+- Planning out a set of tasks
+- Storing previously completed tasks in a memory module
+
+Research developments in LLMs (e.g. [ChatGPT Plugins](https://openai.com/blog/chatgpt-plugins)), LLM research ([ReAct](https://arxiv.org/abs/2210.03629), [Toolformer](https://arxiv.org/abs/2302.04761)) and LLM tooling ([LangChain](https://python.langchain.com/en/latest/modules/agents.html), [Semantic Kernel](https://github.com/microsoft/semantic-kernel)) have popularized the concept of agents.
+
+
+
+## Agents + LlamaIndex
+
+LlamaIndex provides some amazing tools to manage and interact with your data within your LLM application. And it can be a core tool that you use while building an agent-based app.
+- On one hand, some components within LlamaIndex are "agent-like" - these make automated decisions to help a particular use case over your data.
+- On the other hand, LlamaIndex can be used as a core Tool within another agent framework.
+
+In general, LlamaIndex components offer more explicit, constrained behavior for more specific use cases. Agent frameworks such as ReAct (implemented in LangChain) offer agents that are more unconstrained + 
+capable of general reasoning. 
+
+There are tradeoffs for using both - less-capable LLMs typically do better with more constraints. Take a look at [our blog post on this](https://medium.com/llamaindex-blog/dumber-llm-agents-need-more-constraints-and-better-tools-17a524c59e12) for
+more information and a detailed analysis.
+
+
+### "Agent-like" Components within LlamaIndex 
+
+LlamaIndex provides core modules capable of automated reasoning for different use cases over your data. Please check out our [use cases doc](/end_to_end_tutorials/use_cases.md) for more details on high-level use cases that LlamaIndex can help fulfill.
+
+Some of these core modules are shown below along with example tutorials (not comprehensive, please click into the guides/how-tos for more details).
+
+**SubQuestionQueryEngine for Multi-Document Analysis**
+- [Usage](queries.md#multi-document-queries)
+- [Sub Question Query Engine (Intro)](/examples/query_engine/sub_question_query_engine.ipynb)
+- [10Q Analysis (Uber)](/examples/usecases/10q_sub_question.ipynb)
+- [10K Analysis (Uber and Lyft)](/examples/usecases/10k_sub_question.ipynb)
+
+
+**Query Transformations**
+- [How-To](/core_modules/query_modules/query_engine/advanced/query_transformations.md)
+- [Multi-Step Query Decomposition](/examples/query_transformations/HyDEQueryTransformDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb))
+
+**Routing**
+- [Usage](queries.md#routing-over-heterogeneous-data)
+- [Router Query Engine Guide](/examples/query_engine/RouterQueryEngine.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/RouterQueryEngine.ipynb))
+
+**LLM Reranking**
+- [Second Stage Processing How-To](/core_modules/query_modules/node_postprocessors/root.md)
+- [LLM Reranking Guide (Great Gatsby)](/examples/node_postprocessor/LLMReranker-Gatsby.ipynb)
+
+**Chat Engines**
+- [Chat Engines How-To](/core_modules/query_modules/chat_engines/root.md)
+
+
+### Using LlamaIndex as a Tool within an Agent Framework
+
+LlamaIndex can be used as a Tool within an agent framework, including LangChain and ChatGPT. These integrations are described below.
+
+#### LangChain
+
+We have deep integrations with LangChain. 
+LlamaIndex query engines can be easily packaged as Tools to be used within a LangChain agent, and LlamaIndex can also be used as a memory module / retriever. Check out our guides/tutorials below!
+
+**Resources**
+- [LangChain integration guide](/community/integrations/using_with_langchain.md)
+- [Building a Chatbot Tutorial (LangChain + LlamaIndex)](/guides/tutorials/building_a_chatbot.md)
+- [OnDemandLoaderTool Tutorial](/examples/tools/OnDemandLoaderTool.ipynb)
+
+#### ChatGPT
+
+LlamaIndex can be used as a ChatGPT retrieval plugin (we have a TODO to develop a more general plugin as well).
+
+**Resources**
+- [LlamaIndex ChatGPT Retrieval Plugin](https://github.com/openai/chatgpt-retrieval-plugin#llamaindex)
+
+
+### Native OpenAIAgent
+
+With the [new OpenAI API](https://openai.com/blog/function-calling-and-other-api-updates) that supports function calling, it’s never been easier to build your own agent!
+
+Learn how to write your own OpenAI agent in **under 50 lines of code**, or directly use our super simple
+`OpenAIAgent` implementation.
+
+```{toctree}
+---
+maxdepth: 1
+---
+/examples/agent/openai_agent.ipynb
+/examples/agent/openai_agent_with_query_engine.ipynb
+/examples/agent/openai_agent_retrieval.ipynb
+/examples/agent/openai_agent_query_cookbook.ipynb
+/examples/agent/openai_agent_query_plan.ipynb
+/examples/agent/openai_agent_context_retrieval.ipynb
+```
@@ -0,0 +1,13 @@
+
+# Full-Stack Web Application
+
+LlamaIndex can be integrated into a downstream full-stack web application. It can be used in a backend server (such as Flask), packaged into a Docker container, and/or directly used in a framework such as Streamlit.
+
+We provide tutorials and resources to help you get started in this area.
+
+Relevant Resources:
+- [Fullstack Application Guide](/end_to_end_tutorials/apps/fullstack_app_guide.md)
+- [Fullstack Application with Delphic](/end_to_end_tutorials/apps/fullstack_with_delphic.md)
+- [A Guide to Extracting Terms and Definitions](/end_to_end_tutorials/question_and_answer/terms_definitions_tutorial.md)
+- [LlamaIndex Starter Pack](https://github.com/logan-markewich/llama_index_starter_pack)
+
@@ -0,0 +1,370 @@
+# A Guide to Building a Full-Stack Web App with LlamaIndex
+
+LlamaIndex is a python library, which means that integrating it with a full-stack web application will be a little different than what you might be used to.
+
+This guide seeks to walk through the steps needed to create a basic API service written in python, and how this interacts with a TypeScript+React frontend.
+
+All code examples here are available from the [llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react) in the flask_react folder.
+
+The main technologies used in this guide are as follows:
+
+- python3.11
+- llama_index
+- flask
+- typescript
+- react
+
+## Flask Backend
+
+For this guide, our backend will use a [Flask](https://flask.palletsprojects.com/en/2.2.x/) API server to communicate with our frontend code. If you prefer, you can also easily translate this to a [FastAPI](https://fastapi.tiangolo.com/) server, or any other python server library of your choice.
+
+Setting up a server using Flask is easy. You import the package, create the app object, and then create your endpoints. Let's create a basic skeleton for the server first:
+
+```python
+from flask import Flask
+
+app = Flask(__name__)
+
+@app.route("/")
+def home():
+    return "Hello World!"
+
+if __name__ == "__main__":
+    app.run(host="0.0.0.0", port=5601)
+```
+
+_flask_demo.py_
+
+If you run this file (`python flask_demo.py`), it will launch a server on port 5601. If you visit `http://localhost:5601/`, you will see the "Hello World!" text rendered in your browser. Nice!
+
+The next step is deciding what functions we want to include in our server, and to start using LlamaIndex.
+
+To keep things simple, the most basic operation we can provide is querying an existing index. Using the [paul graham essay](https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt) from LlamaIndex, create a documents folder and download+place the essay text file inside of it.
+
+### Basic Flask - Handling User Index Queries
+
+Now, let's write some code to initialize our index:
+
+```python
+import os
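+from llama_index import (
+    SimpleDirectoryReader,
+    StorageContext,
+    VectorStoreIndex,
+    load_index_from_storage,
+)
+
+# NOTE: for local testing only, do NOT deploy with your key hardcoded
+os.environ['OPENAI_API_KEY'] = "your key here"
+
+index = None
+# NOTE: the folder name is an assumption; use any directory you like
+index_dir = "./.index"
+
+def initialize_index():
+    global index
+    if os.path.exists(index_dir):
+        # reload the previously persisted index from disk
+        storage_context = StorageContext.from_defaults(persist_dir=index_dir)
+        index = load_index_from_storage(storage_context)
+    else:
+        # build the index from the documents folder, then persist it
+        storage_context = StorageContext.from_defaults()
+        documents = SimpleDirectoryReader("./documents").load_data()
+        index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
+        storage_context.persist(index_dir)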
+```
+
+This function will initialize our index. If we call this just before starting the flask server in the `main` function, then our index will be ready for user queries!
+
+Our query endpoint will accept `GET` requests with the query text as a parameter. Here's what the full endpoint function will look like:
+
+```python
+from flask import request
+
+@app.route("/query", methods=["GET"])
+def query_index():
+  global index
+  query_text = request.args.get("text", None)
+  if query_text is None:
+    return "No text found, please include a ?text=blah parameter in the URL", 400
+  query_engine = index.as_query_engine()
+  response = query_engine.query(query_text)
+  return str(response), 200
+```
+
+Now, we've introduced a few new concepts to our server:
+
+- a new `/query` endpoint, defined by the function decorator
+- a new import from flask, `request`, which is used to get parameters from the request
+- if the `text` parameter is missing, then we return an error message and an appropriate HTTP status code
+- otherwise, we query the index, and return the response as a string
+
+A full query example that you can test in your browser might look something like this: `http://localhost:5601/query?text=what did the author do growing up` (once you press enter, the browser will convert the spaces into "%20" characters).
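+
+If you call the endpoint from a script instead of a browser, you need to encode the query text yourself. Here is a minimal sketch using only the Python standard library. The host and port match the example server above, and the variable names are just for illustration.
+
+```python
+from urllib.parse import quote
+
+base_url = "http://localhost:5601/query"
+query_text = "what did the author do growing up"
+
+# quote() percent-encodes the spaces as %20, just like the browser does
+url = f"{base_url}?text={quote(query_text)}"
+print(url)
+```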
+
+Things are looking pretty good! We now have a functional API. Using your own documents, you can easily provide an interface for any application to call the flask API and get answers to queries.
+
+### Advanced Flask - Handling User Document Uploads
+
+Things are looking pretty cool, but how can we take this a step further? What if we want to allow users to build their own indexes by uploading their own documents? Have no fear, Flask can handle it all :muscle:.
+
+To let users upload documents, we have to take some extra precautions. Instead of querying an existing index, the index will become **mutable**. If you have many users adding to the same index, we need to think about how to handle concurrency. Our Flask server is threaded, which means multiple users can ping the server with requests which will be handled at the same time.
+
+One option might be to create an index for each user or group, and store and fetch things from S3. But for this example, we will assume there is one locally stored index that users are interacting with.
+
+To handle concurrent uploads and ensure sequential inserts into the index, we can use the `BaseManager` class from Python's built-in `multiprocessing.managers` module to provide sequential access to the index using a separate server and locks. This sounds scary, but it's not so bad! We will just move all our index operations (initializing, querying, inserting) into the `BaseManager` "index_server", which will be called from our Flask server.
+
+Here's a basic example of what our `index_server.py` will look like after we've moved our code:
+
+```python
+import os
+from multiprocessing import Lock
+from multiprocessing.managers import BaseManager
+from llama_index import SimpleDirectoryReader, VectorStoreIndex, Document
+
+# NOTE: for local testing only, do NOT deploy with your key hardcoded
+os.environ['OPENAI_API_KEY'] = "your key here"
+
+index = None
+lock = Lock()
+
+def initialize_index():
+  global index
+
+  with lock:
+    # same as before ...
+    ...
+
+def query_index(query_text):
+  global index
+  query_engine = index.as_query_engine()
+  response = query_engine.query(query_text)
+  return str(response)
+
+if __name__ == "__main__":
+    # init the global index
+    print("initializing index...")
+    initialize_index()
+
+    # setup server
+    # NOTE: you might want to handle the password in a less hardcoded way
+    manager = BaseManager(('', 5602), b'password')
+    manager.register('query_index', query_index)
+    server = manager.get_server()
+
+    print("starting server...")
+    server.serve_forever()
+```
+
+_index_server.py_
+
+So, we've moved our functions, introduced the `Lock` object which ensures sequential access to the global index, registered our single function in the server, and started the server on port 5602 with the password `password`.
+
+Then, we can adjust our flask code as follows:
+
+```python
+from multiprocessing.managers import BaseManager
+from flask import Flask, request
+
+app = Flask(__name__)
+
+# initialize manager connection
+# NOTE: you might want to handle the password in a less hardcoded way
+manager = BaseManager(('', 5602), b'password')
+manager.register('query_index')
+manager.connect()
+
+@app.route("/query", methods=["GET"])
+def query_index():
+  query_text = request.args.get("text", None)
+  if query_text is None:
+    return "No text found, please include a ?text=blah parameter in the URL", 400
+  response = manager.query_index(query_text)._getvalue()
+  return str(response), 200
+
+@app.route("/")
+def home():
+    return "Hello World!"
+
+if __name__ == "__main__":
+    app.run(host="0.0.0.0", port=5601)
+
+```
+
+_flask_demo.py_
+
+The two main changes are connecting to our existing `BaseManager` server and registering the functions, as well as calling the function through the manager in the `/query` endpoint.
+
+One special thing to note is that `BaseManager` servers don't return objects quite as we expect. To resolve the return value into its original object, we call the `_getvalue()` function.
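+
+To make the proxy behaviour concrete, here is a minimal, self-contained sketch of the same pattern using only the standard library. It is not the project's code: the plain list standing in for the index, the `insert_item`/`list_items` functions, the two manager subclasses, and the `demo` auth key are all assumptions made for illustration, and the server runs in a background thread only so the example fits in one file.
+
+```python
+import threading
+from multiprocessing import Lock
+from multiprocessing.managers import BaseManager
+
+# Hypothetical stand-in for the global index: a plain list guarded by a lock.
+_items = []
+_lock = Lock()
+
+
+def insert_item(item):
+    """Append under the lock so concurrent callers are serialized."""
+    with _lock:
+        _items.append(item)
+        return len(_items)
+
+
+def list_items():
+    """Return a copy of the stored items."""
+    with _lock:
+        return list(_items)
+
+
+class ServerManager(BaseManager):
+    """Server-side manager; subclassing keeps its registry separate."""
+
+
+class ClientManager(BaseManager):
+    """Client-side manager; it registers names only, without callables."""
+
+
+def start_demo_server(authkey=b"demo"):
+    ServerManager.register("insert_item", insert_item)
+    ServerManager.register("list_items", list_items)
+    # Port 0 asks the OS for any free port.
+    server = ServerManager(("127.0.0.1", 0), authkey).get_server()
+    threading.Thread(target=server.serve_forever, daemon=True).start()
+    return server.address
+
+
+def connect_client(address, authkey=b"demo"):
+    ClientManager.register("insert_item")
+    ClientManager.register("list_items")
+    client = ClientManager(address, authkey)
+    client.connect()
+    return client
+```
+
+A call such as `client.insert_item("a")` on a connected client returns a proxy object; appending `._getvalue()` fetches the plain Python value behind it, which is the same step the `/query` endpoint performs.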
+
+If we allow users to upload their own documents, we should probably remove the Paul Graham essay from the documents folder, so let's do that first. Then, let's add an endpoint to upload files! First, let's define our Flask endpoint function:
+
+```python
+import os
+from werkzeug.utils import secure_filename
+
+...
+manager.register('insert_into_index')
+...
+
+@app.route("/uploadFile", methods=["POST"])
+def upload_file():
+    global manager
+    if 'file' not in request.files:
+        return "Please send a POST request with a file", 400
+
+    filepath = None
+    try:
+        uploaded_file = request.files["file"]
+        filename = secure_filename(uploaded_file.filename)
+        filepath = os.path.join('documents', os.path.basename(filename))
+        uploaded_file.save(filepath)
+
+        if request.form.get("filename_as_doc_id", None) is not None:
+            manager.insert_into_index(filepath, doc_id=filename)
+        else:
+            manager.insert_into_index(filepath)
+    except Exception as e:
+        # cleanup temp file
+        if filepath is not None and os.path.exists(filepath):
+            os.remove(filepath)
+        return "Error: {}".format(str(e)), 500
+
+    # cleanup temp file
+    if filepath is not None and os.path.exists(filepath):
+        os.remove(filepath)
+
+    return "File inserted!", 200
+```
+
+Not too bad! You will notice that we write the file to disk. We could skip this if we only accepted basic file formats like `txt` files, but once the file is on disk we can take advantage of LlamaIndex's `SimpleDirectoryReader` to handle a bunch of more complex file formats. Optionally, we also use a second `POST` argument to either use the filename as a doc_id or let LlamaIndex generate one for us. This will make more sense once we implement the frontend.
+
+With these more complicated requests, I also suggest using a tool like [Postman](https://www.postman.com/downloads/?utm_source=postman-home). Examples of using Postman to test our endpoints are in the [repository for this project](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react/postman_examples).
+
+Lastly, you'll notice we added a new function to the manager. Let's implement that inside `index_server.py`:
+
+```python
+def insert_into_index(doc_file_path, doc_id=None):
+    global index
+    # the first argument is a path on disk, which SimpleDirectoryReader loads for us
+    document = SimpleDirectoryReader(input_files=[doc_file_path]).load_data()[0]
+    if doc_id is not None:
+        document.doc_id = doc_id
+
+    with lock:
+        index.insert(document)
+        index.storage_context.persist()
+
+...
+manager.register('insert_into_index', insert_into_index)
+...
+```
+
+Easy! If we launch `index_server.py` and then `flask_demo.py`, we have a Flask API server that can handle multiple requests to insert documents into a vector index and respond to user queries!
+
+To support some functionality in the frontend, I've adjusted what some responses look like from the Flask API, as well as added some functionality to keep track of which documents are stored in the index (LlamaIndex doesn't currently support this in a user-friendly way, but we can augment it ourselves!). Lastly, I had to add CORS support to the server using the `Flask-cors` python package.
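+
+As a rough idea of what such a document tracker could look like, here is a minimal sketch. It is an assumption for illustration, not the repository's implementation: the `DocumentTracker` class, its JSON file, and the `preview_chars` parameter are hypothetical names. It keeps an id and a short text preview per inserted document and persists them to disk.
+
+```python
+import json
+from pathlib import Path
+
+
+class DocumentTracker:
+    """Hypothetical helper that remembers which documents were inserted."""
+
+    def __init__(self, path, preview_chars=200):
+        self.path = Path(path)
+        self.preview_chars = preview_chars
+        self.docs = {}
+        # Reload earlier entries so the listing survives a restart.
+        if self.path.exists():
+            self.docs = json.loads(self.path.read_text(encoding="utf-8"))
+
+    def add(self, doc_id, text):
+        # Store only a short preview so the listing stays small.
+        self.docs[doc_id] = text[: self.preview_chars]
+        self.path.write_text(json.dumps(self.docs), encoding="utf-8")
+
+    def list_documents(self):
+        return [{"id": doc_id, "text": text} for doc_id, text in self.docs.items()]
+```
+
+In a setup like ours, `add` would be called inside the same lock as `index.insert`, so the tracker and the index cannot drift apart under concurrent uploads.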
+
+Check out the complete `flask_demo.py` and `index_server.py` scripts in the [repository](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react) for the final minor changes, the `requirements.txt` file, and a sample `Dockerfile` to help with deployment.
+
+## React Frontend
+
+React and TypeScript are among the most popular libraries and languages for writing web apps today. This guide assumes you are familiar with how these tools work, because otherwise it would triple in length :smile:.
+
+In the [repository](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react), the frontend code is organized inside of the `react_frontend` folder.
+
+The most relevant part of the frontend will be the `src/apis` folder. This is where we make calls to the Flask server, supporting the following queries:
+
+- `/query` -- make a query to the existing index
+- `/uploadFile` -- upload a file to the Flask server for insertion into the index
+- `/getDocuments` -- list the current document titles and a portion of their texts
+
+Using these three queries, we can build a robust frontend that allows users to upload and keep track of their files, query the index, and view the query response and information about which text nodes were used to form the response.
+
+### fetchDocuments.tsx
+
+This file contains the function to, you guessed it, fetch the list of current documents in the index. The code is as follows:
+
+```typescript
+export type Document = {
+  id: string;
+  text: string;
+};
+
+const fetchDocuments = async (): Promise<Document[]> => {
+  const response = await fetch("http://localhost:5601/getDocuments", {
+    mode: "cors",
+  });
+
+  if (!response.ok) {
+    return [];
+  }
+
+  const documentList = (await response.json()) as Document[];
+  return documentList;
+};
+
+export default fetchDocuments;
+```
+
+As you can see, we make a query to the Flask server (here, it assumes the server is running on localhost). Notice that we include the `mode: 'cors'` option, as the request goes to a different origin than the one serving the frontend.
+
+Then, we check if the response was ok, and if so, get the response json and return it. Here, the response json is a list of `Document` objects that are defined in the same file.
+
+### queryIndex.tsx
+
+This file sends the user query to the Flask server and gets the response back, along with details about which nodes in our index provided the response.
+
+```typescript
+export type ResponseSources = {
+  text: string;
+  doc_id: string;
+  start: number;
+  end: number;
+  similarity: number;
+};
+
+export type QueryResponse = {
+  text: string;
+  sources: ResponseSources[];
+};
+
+const queryIndex = async (query: string): Promise<QueryResponse> => {
+  const queryURL = new URL("http://localhost:5601/query");
+  queryURL.searchParams.append("text", query);
+
+  const response = await fetch(queryURL, { mode: "cors" });
+  if (!response.ok) {
+    return { text: "Error in query", sources: [] };
+  }
+
+  const queryResponse = (await response.json()) as QueryResponse;
+
+  return queryResponse;
+};
+
+export default queryIndex;
+```
+
+This is similar to the `fetchDocuments.tsx` file, with the main difference being that we include the query text as a parameter in the URL. Then, we check if the response is ok and return it with the appropriate TypeScript type.
+
+### insertDocument.tsx
+
+Probably the most complex API call is uploading a document. The function here accepts a file object and constructs a `POST` request using `FormData`.
+
+The actual response text is not used in the app but could be used to tell the user whether the upload succeeded.
+
+```typescript
+const insertDocument = async (file: File) => {
+  const formData = new FormData();
+  formData.append("file", file);
+  formData.append("filename_as_doc_id", "true");
+
+  const response = await fetch("http://localhost:5601/uploadFile", {
+    mode: "cors",
+    method: "POST",
+    body: formData,
+  });
+
+  const responseText = await response.text();
+  return responseText;
+};
+
+export default insertDocument;
+```
+
+### All the Other Frontend Good-ness
+
+And that pretty much wraps up the frontend portion! The rest of the react frontend code is some pretty basic react components, and my best attempt to make it look at least a little nice :smile:.
+
+I encourage you to read the rest of the [codebase](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react/react_frontend) and submit any PRs for improvements!
+
+## Conclusion
+
+This guide has covered a ton of information. We went from a basic "Hello World" Flask server written in Python to a fully functioning LlamaIndex-powered backend, and saw how to connect that to a frontend application.
+
+As you can see, we can easily augment and wrap the services provided by LlamaIndex (like the little external document tracker) to help provide a good user experience on the frontend.
+
+You could take this and add many features (multi-index/user support, saving objects into S3, adding a Pinecone vector server, etc.). And when you build an app after reading this, be sure to share the final result in the Discord! Good Luck! :muscle:
@@ -0,0 +1,785 @@
+# A Guide to Building a Full-Stack LlamaIndex Web App with Delphic
+
+This guide walks you through using LlamaIndex with a production-ready web app starter template
+called [Delphic](https://github.com/JSv4/Delphic). All code examples here are available from
+the [Delphic](https://github.com/JSv4/Delphic) repo.
+
+## What We're Building
+
+Here's a quick demo of the out-of-the-box functionality of Delphic:
+
+https://user-images.githubusercontent.com/5049984/233236432-aa4980b6-a510-42f3-887a-81485c9644e6.mp4
+
+## Architectural Overview
+
+Delphic leverages the LlamaIndex python library to let users create their own document collections, which they can then
+query in a responsive frontend.
+
+We chose a stack that provides a responsive, robust mix of technologies that can (1) orchestrate complex python
+processing tasks while providing (2) a modern, responsive frontend and (3) a secure backend to build additional
+functionality upon.
+
+The core libraries are:
+
+1. [Django](https://www.djangoproject.com/)
+2. [Django Channels](https://channels.readthedocs.io/en/stable/)
+3. [Django Ninja](https://django-ninja.rest-framework.com/)
+4. [Redis](https://redis.io/)
+5. [Celery](https://docs.celeryq.dev/en/stable/getting-started/introduction.html)
+6. [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/)
+7. [Langchain](https://python.langchain.com/en/latest/index.html)
+8. [React](https://github.com/facebook/react)
+9. Docker & Docker Compose
+
+Thanks to this modern stack built on the super stable Django web framework, the starter Delphic app boasts a streamlined
+developer experience, built-in authentication and user management, asynchronous vector store processing, and
+web-socket-based query connections for a responsive UI. In addition, our frontend is built with TypeScript and is based
+on MUI React for a responsive and modern user interface.
+
+## System Requirements
+
+Celery doesn't work on Windows. It may be deployable with Windows Subsystem for Linux, but configuring that is beyond
+the scope of this tutorial. For this reason, we recommend you only follow this tutorial if you're running Linux or OSX.
+You will need Docker and Docker Compose installed to deploy the application. Local development will require node version
+manager (nvm).
+
+## Django Backend
+
+### Project Directory Overview
+
+The Delphic application has a structured backend directory organization that follows common Django project conventions.
+From the repo root, in the `./delphic` subfolder, the main folders are:
+
+1. `contrib`: This directory contains custom modifications or additions to Django's built-in `contrib` apps.
+2. `indexes`: This directory contains the core functionality related to document indexing and LLM integration. It
+   includes:
+
+   - `admin.py`: Django admin configuration for the app
+   - `apps.py`: Application configuration
+   - `models.py`: Contains the app's database models
+   - `migrations`: Directory containing database schema migrations for the app
+   - `signals.py`: Defines any signals for the app
+   - `tests.py`: Unit tests for the app
+
+3. `tasks`: This directory contains tasks for asynchronous processing using Celery. The `index_tasks.py` file includes
+   the tasks for creating vector indexes.
+4. `users`: This directory is dedicated to user management.
+5. `utils`: This directory contains utility modules and functions that are used across the application, such as custom
+   storage backends, path helpers, and collection-related utilities.
+
+### Database Models
+
+The Delphic application has two core models: `Document` and `Collection`. These models represent the central entities
+the application deals with when indexing and querying documents using LLMs. They're defined in
+[`./delphic/indexes/models.py`](https://github.com/JSv4/Delphic/blob/main/delphic/indexes/models.py).
+
+1. `Collection`:
+
+   - `api_key`: A foreign key that links a collection to an API key. This helps associate jobs with the source API key.
+   - `title`: A character field that provides a title for the collection.
+   - `description`: A text field that provides a description of the collection.
+   - `status`: A character field that stores the processing status of the collection, utilizing the `CollectionStatus`
+     enumeration.
+   - `created`: A datetime field that records when the collection was created.
+   - `modified`: A datetime field that records the last modification time of the collection.
+   - `model`: A file field that stores the model associated with the collection.
+   - `processing`: A boolean field that indicates if the collection is currently being processed.
+
+2. `Document`:
+
+   - `collection`: A foreign key that links a document to a collection. This represents the relationship between
+     documents and collections.
+   - `file`: A file field that stores the uploaded document file.
+   - `description`: A text field that provides a description of the document.
+   - `created`: A datetime field that records when the document was created.
+   - `modified`: A datetime field that records the last modification time of the document.
+
+These models provide a solid foundation for collections of documents and the indexes created from them with LlamaIndex.
+
+### Django Ninja API
+
+Django Ninja is a web framework for building APIs with Django and Python 3.7+ type hints. It provides a simple,
+intuitive, and expressive way of defining API endpoints, leveraging Python’s type hints to automatically generate input
+validation, serialization, and documentation.
+
+In the Delphic repo,
+the [`./config/api/endpoints.py`](https://github.com/JSv4/Delphic/blob/main/config/api/endpoints.py)
+file contains the API routes and logic for the API endpoints. Now, let’s briefly address the purpose of each endpoint
+in the `endpoints.py` file:
+
+1. `/heartbeat`: A simple GET endpoint to check if the API is up and running. Returns `True` if the API is accessible.
+   This is helpful for Kubernetes setups that expect to be able to query your container to ensure it's up and running.
+
+2. `/collections/create`: A POST endpoint to create a new `Collection`. Accepts form parameters such
+   as `title`, `description`, and a list of `files`. Creates a new `Collection` and `Document` instances for each file,
+   and schedules a Celery task to create an index.
+
+```python
+@collections_router.post("/create")
+async def create_collection(request,
+                            title: str = Form(...),
+                            description: str = Form(...),
+                            files: list[UploadedFile] = File(...), ):
+    key = None if getattr(request, "auth", None) is None else request.auth
+    if key is not None:
+        key = await key
+
+    collection_instance = Collection(
+        api_key=key,
+        title=title,
+        description=description,
+        status=CollectionStatusEnum.QUEUED,
+    )
+
+    await sync_to_async(collection_instance.save)()
+
+    for uploaded_file in files:
+        doc_data = uploaded_file.file.read()
+        doc_file = ContentFile(doc_data, uploaded_file.name)
+        document = Document(collection=collection_instance, file=doc_file)
+        await sync_to_async(document.save)()
+
+    create_index.si(collection_instance.id).apply_async()
+
+    return await sync_to_async(CollectionModelSchema)(
+        ...
+    )
+```
+
+3. `/collections/query`: A POST endpoint to query a document collection using the LLM. Accepts a JSON payload
+   containing `collection_id` and `query_str`, and returns a response generated by querying the collection. We don't
+   actually use this endpoint in our chat GUI (we use a websocket, see below), but you could build an app that
+   integrates with this REST endpoint to query a specific collection.
+
+```python
+@collections_router.post("/query",
+                         response=CollectionQueryOutput,
+                         summary="Ask a question of a document collection", )
+def query_collection_view(request: HttpRequest, query_input: CollectionQueryInput):
+    collection_id = query_input.collection_id
+    query_str = query_input.query_str
+    response = query_collection(collection_id, query_str)
+    return {"response": response}
+```
+
+4. `/collections/available`: A GET endpoint that returns a list of all collections created with the user's API key. The
+   output is serialized using the `CollectionModelSchema`.
+
+```python
+@collections_router.get("/available",
+                        response=list[CollectionModelSchema],
+                        summary="Get a list of all of the collections created with my api_key", )
+async def get_my_collections_view(request: HttpRequest):
+    key = None if getattr(request, "auth", None) is None else request.auth
+    if key is not None:
+        key = await key
+
+    collections = Collection.objects.filter(api_key=key)
+
+    return [
+        {
+            ...
+        }
+        async for collection in collections
+    ]
+```
+
+5. `/collections/{collection_id}/add_file`: A POST endpoint to add a file to an existing collection. Accepts
+   a `collection_id` path parameter, and form parameters such as `file` and `description`. Adds the file as a `Document`
+   instance associated with the specified collection.
+
+```python
+@collections_router.post("/{collection_id}/add_file", summary="Add a file to a collection")
+async def add_file_to_collection(request,
+                                 collection_id: int,
+                                 file: UploadedFile = File(...),
+                                 description: str = Form(...), ):
+    collection = await sync_to_async(Collection.objects.get)(id=collection_id)
+    ...
+```
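+
+For the `/collections/query` REST endpoint described above, a client only needs to POST a JSON body with `collection_id` and `query_str`. The sketch below builds such a request with the standard library without sending it. The `build_query_request` helper is hypothetical, the base URL under which the router is mounted is an assumption you would need to adapt, and authentication is deliberately left out.
+
+```python
+import json
+from urllib.request import Request
+
+
+def build_query_request(base_url, collection_id, query_str):
+    """Build (but do not send) a POST request for the query endpoint."""
+    payload = json.dumps({"collection_id": collection_id, "query_str": query_str})
+    return Request(
+        # Assumption: base_url is wherever the collections router is mounted.
+        f"{base_url.rstrip('/')}/collections/query",
+        data=payload.encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+```
+
+Passing the returned object to `urllib.request.urlopen` would send it; the response body should then be JSON with a `response` field, matching the dictionary returned by `query_collection_view`.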
+
+### Intro to Websockets
+
+WebSockets are a communication protocol that enables bidirectional and full-duplex communication between a client and a
+server over a single, long-lived connection. The WebSocket protocol is designed to work over the same ports as HTTP and
+HTTPS (ports 80 and 443, respectively) and uses a similar handshake process to establish a connection. Once the
+connection is established, data can be sent in both directions as “frames” without the need to reestablish the
+connection each time, unlike traditional HTTP requests.
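+
+To illustrate the handshake: the client sends an ordinary HTTP upgrade request containing a `Sec-WebSocket-Key` header, and the server proves it understands the protocol by replying with a `Sec-WebSocket-Accept` value derived from that key. The sketch below computes that value as specified in RFC 6455. Django Channels does this for you, so the `websocket_accept_key` helper is purely illustrative.
+
+```python
+import base64
+import hashlib
+
+# Fixed GUID defined by RFC 6455 for the opening handshake.
+WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
+
+
+def websocket_accept_key(client_key: str) -> str:
+    """Derive the Sec-WebSocket-Accept value from the client's key."""
+    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
+    return base64.b64encode(digest).decode("ascii")
+```
+
+With the sample key from the RFC, `dGhlIHNhbXBsZSBub25jZQ==`, this yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.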
+
+There are several reasons to use WebSockets, particularly when working with code that takes a long time to load into
+memory but is quick to run once loaded:
+
+1. **Performance**: WebSockets eliminate the overhead associated with opening and closing multiple connections for each
+   request, reducing latency.
+2. **Efficiency**: WebSockets allow for real-time communication without the need for polling, resulting in more
+   efficient use of resources and better responsiveness.
+3. **Scalability**: WebSockets can handle a large number of simultaneous connections, making them ideal for applications
+   that require high concurrency.
+
+In the case of the Delphic application, using WebSockets makes sense as the LLMs can be expensive to load into memory.
+By establishing a WebSocket connection, the LLM can remain loaded in memory, allowing subsequent requests to be
+processed quickly without the need to reload the model each time.
+
+The ASGI configuration file [`./config/asgi.py`](https://github.com/JSv4/Delphic/blob/main/config/asgi.py) defines how
+the application should handle incoming connections, using the Django Channels `ProtocolTypeRouter` to route connections
+based on their protocol type. In this case, we have two protocol types: "http" and "websocket".
+
+The “http” protocol type uses the standard Django ASGI application to handle HTTP requests, while the “websocket”
+protocol type uses a custom `TokenAuthMiddleware` to authenticate WebSocket connections. The `URLRouter` within
+the `TokenAuthMiddleware` defines a URL pattern for the `CollectionQueryConsumer`, which is responsible for handling
+WebSocket connections related to querying document collections.
+
+```python
+application = ProtocolTypeRouter(
+    {
+        "http": get_asgi_application(),
+        "websocket": TokenAuthMiddleware(
+            URLRouter(
+                [
+                    re_path(
+                        r"ws/collections/(?P<collection_id>\w+)/query/$",
+                        CollectionQueryConsumer.as_asgi(),
+                    ),
+                ]
+            )
+        ),
+    }
+)
+```
+
+This configuration allows clients to establish WebSocket connections with the Delphic application to efficiently query
+document collections using the LLMs, without the need to reload the models for each request.
+
+### Websocket Handler
+
+The `CollectionQueryConsumer` class
+in [`config/api/websockets/queries.py`](https://github.com/JSv4/Delphic/blob/main/config/api/websockets/queries.py) is
+responsible for handling WebSocket connections related to querying document collections. It inherits from
+the `AsyncWebsocketConsumer` class provided by Django Channels.
+
+The `CollectionQueryConsumer` class has three main methods:
+
+1. `connect`: Called when a WebSocket is handshaking as part of the connection process.
+2. `disconnect`: Called when a WebSocket closes for any reason.
+3. `receive`: Called when the server receives a message from the WebSocket.
+
+#### Websocket connect listener
+
+The `connect` method is responsible for establishing the connection, extracting the collection ID from the connection
+path, loading the collection model, and accepting the connection.
+
+```python
+async def connect(self):
+    try:
+        self.collection_id = extract_connection_id(self.scope["path"])
+        self.index = await load_collection_model(self.collection_id)
+        await self.accept()
+
+    except ValueError as e:
+        await self.accept()
+        await self.close(code=4000)
+    except Exception as e:
+        pass
+```
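+
+The body of `extract_connection_id` is not shown here. As a rough sketch of what such a helper might do, the function below pulls the collection ID out of a path using the same pattern as the route in `asgi.py`. The name `extract_collection_id_from_path` and the choice to raise `ValueError` on a mismatch are assumptions for illustration, not Delphic's actual code.
+
+```python
+import re
+
+# Same pattern as the websocket route registered in asgi.py.
+_QUERY_PATH = re.compile(r"ws/collections/(?P<collection_id>\w+)/query/$")
+
+
+def extract_collection_id_from_path(path: str) -> str:
+    """Hypothetical helper: return the collection ID embedded in the path."""
+    match = _QUERY_PATH.search(path)
+    if match is None:
+        raise ValueError(f"Not a collection query path: {path!r}")
+    return match.group("collection_id")
+```
+
+Raising `ValueError` fits the `connect` method above, which answers that exception by closing the socket with code 4000.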
+
+#### Websocket disconnect listener
+
+The `disconnect` method is empty in this case, as there are no additional actions to be taken when the WebSocket is
+closed.
+
+#### Websocket receive listener
+
+The `receive` method is responsible for processing incoming messages from the WebSocket. It takes the incoming message,
+decodes it, and then queries the loaded collection model using the provided query. The response is then formatted as a
+markdown string and sent back to the client over the WebSocket connection.
+
+```python
+async def receive(self, text_data):
+    text_data_json = json.loads(text_data)
+
+    if self.index is not None:
+        query_str = text_data_json["query"]
+        modified_query_str = f"Please return a nicely formatted markdown string to this request:\n\n{query_str}"
+        query_engine = self.index.as_query_engine()
+        response = query_engine.query(modified_query_str)
+
+        markdown_response = f"## Response\n\n{response}\n\n"
+        if response.source_nodes:
+            markdown_sources = f"## Sources\n\n{response.get_formatted_sources()}"
+        else:
+            markdown_sources = ""
+
+        formatted_response = f"{markdown_response}{markdown_sources}"
+
+        await self.send(json.dumps({"response": formatted_response}, indent=4))
+    else:
+        await self.send(json.dumps({"error": "No index loaded for this connection."}, indent=4))
+```
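+
+The message format used by `receive` is small enough to sketch in isolation: the client sends a JSON object with a `query` key, and the server replies with a JSON object holding either a `response` or an `error` key. The two helpers below are hypothetical and only mirror the string handling in the method above, without touching the index.
+
+```python
+import json
+
+
+def parse_query(text_data: str) -> str:
+    """Extract the query string from an incoming websocket message."""
+    return json.loads(text_data)["query"]
+
+
+def build_reply(answer: str, formatted_sources: str = "") -> str:
+    """Format an answer (and optional sources) as the JSON reply payload."""
+    markdown_response = f"## Response\n\n{answer}\n\n"
+    markdown_sources = (
+        f"## Sources\n\n{formatted_sources}" if formatted_sources else ""
+    )
+    return json.dumps({"response": markdown_response + markdown_sources}, indent=4)
+```
+
+Keeping the formatting in plain functions like these would also make it easy to unit test without opening a socket.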
+
+To load the collection model, the `load_collection_model` function is used, which can be found
+in [`delphic/utils/collections.py`](https://github.com/JSv4/Delphic/blob/main/delphic/utils/collections.py). This
+function retrieves the collection object with the given collection ID, checks if a JSON file for the collection model
+exists, and if not, creates one. Then, it sets up the `LLMPredictor` and `ServiceContext` before loading
+the `VectorStoreIndex` using the cache file.
+
+```python
+async def load_collection_model(collection_id: str | int) -> VectorStoreIndex:
+    """
+    Load the Collection model from cache or the database, and return the index.
+
+    Args:
+        collection_id (Union[str, int]): The ID of the Collection model instance.
+
+    Returns:
+        VectorStoreIndex: The loaded index.
+
+    This function performs the following steps:
+    1. Retrieve the Collection object with the given collection_id.
+    2. Check if a JSON file with the name '/cache/model_{collection_id}.json' exists.
+    3. If the JSON file doesn't exist, load the JSON from the Collection.model FileField and save it to
+       '/cache/model_{collection_id}.json'.
+    4. Call VectorStoreIndex.load_from_disk with the cache_file_path.
+    """
+    # Retrieve the Collection object
+    collection = await Collection.objects.aget(id=collection_id)
+    logger.info(f"load_collection_model() - loaded collection {collection_id}")
+
+    # Make sure there's a model
+    if collection.model.name:
+        logger.info("load_collection_model() - Setup local json index file")
+
+        # Check if the JSON file exists
+        cache_dir = Path(settings.BASE_DIR) / "cache"
+        cache_file_path = cache_dir / f"model_{collection_id}.json"
+        if not cache_file_path.exists():
+            cache_dir.mkdir(parents=True, exist_ok=True)
+            with collection.model.open("rb") as model_file:
+                with cache_file_path.open("w+", encoding="utf-8") as cache_file:
+                    cache_file.write(model_file.read().decode("utf-8"))
+
+        # define LLM
+        logger.info(
+            f"load_collection_model() - Setup service context with tokens {settings.MAX_TOKENS} and "
+            f"model {settings.MODEL_NAME}"
+        )
+        llm_predictor = LLMPredictor(
+            llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
+        )
+        service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
+
+        # Call VectorStoreIndex.load_from_disk
+        logger.info("load_collection_model() - Load llama index")
+        index = VectorStoreIndex.load_from_disk(
+            cache_file_path, service_context=service_context
+        )
+        logger.info(
+            "load_collection_model() - Llamaindex loaded and ready for query..."
+        )
+
+    else:
+        logger.error(
+            f"load_collection_model() - collection {collection_id} has no model!"
+        )
+        raise ValueError("No model exists for this collection!")
+
+    return index
+```
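+
+The caching step in the middle of this function follows a common pattern: copy the stored file to a local path once, then reuse that path on later calls. A minimal sketch of the pattern on its own, with a hypothetical `ensure_cached` helper and a `load_bytes` callable standing in for reading `collection.model`, might look like this:
+
+```python
+from pathlib import Path
+
+
+def ensure_cached(cache_dir, name, load_bytes):
+    """Return a local cache path, writing the file on the first call only."""
+    cache_path = Path(cache_dir) / name
+    if not cache_path.exists():
+        cache_path.parent.mkdir(parents=True, exist_ok=True)
+        # load_bytes is only invoked when the cache file is missing.
+        cache_path.write_bytes(load_bytes())
+    return cache_path
+```
+
+Note that a cache like this never refreshes on its own, so if a collection's model file changes, the stale local copy would need to be removed first.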
+
+## React Frontend
+
+### Overview
+
+We chose to use TypeScript, React and Material-UI (MUI) for the Delphic project’s frontend for a couple of reasons. First,
+as the most popular component library (MUI) for the most popular frontend framework (React), this choice makes this
+project accessible to a huge community of developers. Second, React is, at this point, a stable and generally well-liked
+framework that delivers valuable abstractions in the form of its virtual DOM while still being, in
+our opinion, pretty easy to learn, again making it accessible.
+
+### Frontend Project Structure
+
+The frontend can be found in the [`/frontend`](https://github.com/JSv4/Delphic/tree/main/frontend) directory of the
+repo, with the React-related components being in `/frontend/src`. You’ll notice there is a Dockerfile in the `frontend`
+directory and several folders and files related to configuring our frontend web
+server, [nginx](https://www.nginx.com/).
+
+The `/frontend/src/App.tsx` file serves as the entry point of the application. It defines the main components, such as
+the login form, the drawer layout, and the collection create modal. The main components are conditionally rendered based
+on whether the user is logged in and has an authentication token.
+
+The `DrawerLayout2` component is defined in the `DrawerLayout2.tsx` file. This component manages the layout of the
+application and provides the navigation and main content areas.
+
+Since the application is relatively simple, we can get away with not using a complex state management solution like
+Redux and just use React’s useState hooks.
+
+### Grabbing Collections from the Backend
+
+The collections available to the logged-in user are retrieved and displayed in the DrawerLayout2 component. The process
+can be broken down into the following steps:
+
+1. Initializing state variables:
+
+```tsx
+const [collections, setCollections] = useState<CollectionModelSchema[]>([]);
+const [loading, setLoading] = useState(true);
+```
+
+Here, we initialize two state variables: `collections` to store the list of collections and `loading` to track whether
+the collections are being fetched.
+
+2. Collections are fetched for the logged-in user with the `fetchCollections()` function:
+
+```tsx
+const fetchCollections = async () => {
+  try {
+    const accessToken = localStorage.getItem("accessToken");
+    if (accessToken) {
+      const response = await getMyCollections(accessToken);
+      setCollections(response.data);
+    }
+  } catch (error) {
+    console.error(error);
+  } finally {
+    setLoading(false);
+  }
+};
+```
+
+The `fetchCollections` function retrieves the collections for the logged-in user by calling the `getMyCollections` API
+function with the user's access token. It then updates the `collections` state with the retrieved data and sets
+the `loading` state to `false` to indicate that fetching is complete.
+
+### Displaying Collections
+
+The latest collections are displayed in the drawer like this:
+
+```tsx
+<List>
+  {collections.map((collection) => (
+    <div key={collection.id}>
+      <ListItem disablePadding>
+        <ListItemButton
+          disabled={
+            collection.status !== CollectionStatus.COMPLETE ||
+            !collection.has_model
+          }
+          onClick={() => handleCollectionClick(collection)}
+          selected={
+            selectedCollection &&
+            selectedCollection.id === collection.id
+          }
+        >
+          <ListItemText primary={collection.title} />
+          {collection.status === CollectionStatus.RUNNING ? (
+            <CircularProgress
+              size={24}
+              style={{ position: "absolute", right: 16 }}
+            />
+          ) : null}
+        </ListItemButton>
+      </ListItem>
+    </div>
+  ))}
+</List>
+```
+
+You’ll notice that the `disabled` property of a collection’s `ListItemButton` is set based on whether the collection's
+status is not `CollectionStatus.COMPLETE` or the collection does not have a model (`!collection.has_model`). If either
+of these conditions is true, the button is disabled, preventing users from selecting an incomplete or model-less
+collection. Where the CollectionStatus is RUNNING, we also show a loading wheel over the button.
+
+In a separate `useEffect` hook, we check if any collection in the `collections` state has a status
+of `CollectionStatus.RUNNING` or `CollectionStatus.QUEUED`. If so, we set up an interval to repeatedly call
+the `fetchCollections` function every 15 seconds (15,000 milliseconds) to update the collection statuses. This way, the
+application periodically checks for completed collections, and the UI is updated accordingly when the processing is
+done.
+
+```tsx
+useEffect(() => {
+  let interval: NodeJS.Timeout;
+  if (
+    collections.some(
+      (collection) =>
+        collection.status === CollectionStatus.RUNNING ||
+        collection.status === CollectionStatus.QUEUED
+    )
+  ) {
+    interval = setInterval(() => {
+      fetchCollections();
+    }, 15000);
+  }
+  return () => clearInterval(interval);
+}, [collections]);
+```
+
+### Chat View Component
+
+The `ChatView` component in `frontend/src/chat/ChatView.tsx` is responsible for handling and displaying a chat interface
+for a user to interact with a collection. The component establishes a WebSocket connection to communicate in real-time
+with the server, sending and receiving messages.
+
+Key features of the `ChatView` component include:
+
+1. Establishing and managing the WebSocket connection with the server.
+2. Displaying messages from the user and the server in a chat-like format.
+3. Handling user input to send messages to the server.
+4. Updating the messages state and UI based on received messages from the server.
+5. Displaying connection status and errors, such as loading messages, connecting to the server, or encountering errors
+   while loading a collection.
+
+Together, all of this allows users to interact with their selected collection with a very smooth, low-latency
+experience.
+
+#### Chat Websocket Client
+
+The WebSocket connection in the `ChatView` component is used to establish real-time communication between the client and
+the server. The WebSocket connection is set up and managed in the `ChatView` component as follows:
+
+First, we initialize the WebSocket reference:
+
+`const websocket = useRef<WebSocket | null>(null);`
+
+A `websocket` reference is created using `useRef`, which holds the WebSocket object that will be used for
+communication. `useRef` is a hook in React that allows you to create a mutable reference object that persists across
+renders. It is particularly useful when you need to hold a reference to a mutable object, such as a WebSocket
+connection, without causing unnecessary re-renders.
+
+In the `ChatView` component, the WebSocket connection needs to be established and maintained throughout the lifetime of
+the component, and it should not trigger a re-render when the connection state changes. By using `useRef`, you ensure
+that the WebSocket connection is kept as a reference, and the component only re-renders when there are actual state
+changes, such as updating messages or displaying errors.
+
+The `setupWebsocket` function is responsible for establishing the WebSocket connection and setting up event handlers to
+handle different WebSocket events.
+
+Overall, the `setupWebsocket` function looks like this:
+
+```tsx
+const setupWebsocket = () => {  
+  setConnecting(true);  
+  // Here, a new WebSocket object is created using the specified URL, which includes the   
+  // selected collection's ID and the user's authentication token.  
+    
+  websocket.current = new WebSocket(  
+    `ws://localhost:8000/ws/collections/${selectedCollection.id}/query/?token=${authToken}`  
+  );  
+  
+  websocket.current.onopen = (event) => {  
+    //...  
+  };  
+  
+  websocket.current.onmessage = (event) => {  
+    //...  
+  };  
+  
+  websocket.current.onclose = (event) => {  
+    //...  
+  };  
+  
+  websocket.current.onerror = (event) => {  
+    //...  
+  };  
+  
+  return () => {  
+    websocket.current?.close();  
+  };  
+};
+```
+
+In each of the handlers shown below, we update the GUI state based on information from the WebSocket client.
+
+Once the connection is successfully established, the `onopen` listener is triggered. In the
+callback, the component updates the states to reflect that the connection is established, any previous errors are
+cleared, and no messages are awaiting responses:
+
+```tsx
+websocket.current.onopen = (event) => {  
+  setError(false);  
+  setConnecting(false);  
+  setAwaitingMessage(false);  
+  
+  console.log("WebSocket connected:", event);  
+};
+```
+
+`onmessage` is triggered when a new message is received from the server through the WebSocket connection. In the
+callback, the received data is parsed and the `messages` state is updated with the new message from the server:
+
+```tsx
+websocket.current.onmessage = (event) => {  
+  const data = JSON.parse(event.data);  
+  console.log("WebSocket message received:", data);  
+  setAwaitingMessage(false);  
+  
+  if (data.response) {  
+    // Update the messages state with the new message from the server  
+    setMessages((prevMessages) => [  
+      ...prevMessages,  
+      {  
+        sender_id: "server",  
+        message: data.response,  
+        timestamp: new Date().toLocaleTimeString(),  
+      },  
+    ]);  
+  }  
+};
+```
+
+`onclose` is triggered when the WebSocket connection is closed. In the callback, the component checks for a specific
+close code (`4000`) to display a warning toast and update the component states accordingly. It also logs the close
+event:
+
+```tsx
+websocket.current.onclose = (event) => {  
+  if (event.code === 4000) {  
+    toast.warning(  
+      "Selected collection's model is unavailable. Was it created properly?"  
+    );  
+    setError(true);  
+    setConnecting(false);  
+    setAwaitingMessage(false);  
+  }  
+  console.log("WebSocket closed:", event);  
+};
+```
+
+Finally, `onerror` is triggered when an error occurs with the WebSocket connection. In the callback, the component
+updates the states to reflect the error and logs the error event:
+
+```tsx
+websocket.current.onerror = (event) => {
+  setError(true);
+  setConnecting(false);
+  setAwaitingMessage(false);
+
+  console.error("WebSocket error:", event);
+};
+```
+
+#### Rendering our Chat Messages
+
+In the `ChatView` component, the layout is determined using CSS styling and Material-UI components. The main layout
+consists of a container with a `flex` display and a column-oriented `flexDirection`. This ensures that the content
+within the container is arranged vertically.
+
+There are three primary sections within the layout:
+
+1. The chat messages area: This section takes up most of the available space and displays a list of messages exchanged
+   between the user and the server. Its `overflow-y` is set to `auto`, which allows scrolling when the content
+   overflows the available space. Each message is rendered with the `ChatMessage` component, and
+   a `ChatMessageLoading` component shows the loading state while waiting for a server response.
+2. The divider: A Material-UI `Divider` component is used to separate the chat messages area from the input area,
+   creating a clear visual distinction between the two sections.
+3. The input area: This section is located at the bottom and allows the user to type and send messages. It contains
+   a `TextField` component from Material-UI, which is set to accept multiline input with a maximum of 2 rows. The input
+   area also includes a `Button` component to send the message. The user can either click the "Send" button or press
+   "Enter" on their keyboard to send the message.
+
+The user inputs accepted in the `ChatView` component are text messages that the user types in the `TextField`. The
+component processes these text inputs and sends them to the server through the WebSocket connection.
+
+## Deployment
+
+### Prerequisites
+
+To deploy the app, you're going to need Docker and Docker Compose installed. If you're on Ubuntu or another common
+Linux distribution, DigitalOcean has
+a [great Docker tutorial](https://www.digitalocean.com/community/tutorial_collections/how-to-install-and-use-docker) and
+another great tutorial
+for [Docker Compose](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-20-04)
+you can follow. If those don't work for you, try
+the [official Docker documentation](https://docs.docker.com/engine/install/).
+
+### Build and Deploy
+
+The project is based on Cookiecutter Django, and it’s pretty easy to get it deployed on a VM and configured to serve
+HTTPS traffic for a specific domain. The configuration is somewhat involved, however, not because of this project but
+because configuring certificates, DNS, and so on is a fairly involved topic in itself.
+
+For the purposes of this guide, let’s just get running locally. Perhaps we’ll release a guide on production deployment.
+In the meantime, check out
+the [Django Cookiecutter project docs](https://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html)
+for starters.
+
+This guide assumes your goal is to get the application up and running for use. If you want to develop, you most likely
+won’t want to launch the compose stack with the `fullstack` profile and will instead want to launch the React
+frontend using the Node development server.
+
+To deploy, first clone the repo:
+
+```commandline
+git clone https://github.com/yourusername/delphic.git
+```
+
+Change into the project directory:
+
+```commandline
+cd delphic
+```
+
+Copy the sample environment files:
+
+```commandline
+mkdir -p ./.envs/.local/  
+cp -a ./docs/sample_envs/local/.frontend ./frontend  
+cp -a ./docs/sample_envs/local/.django ./.envs/.local  
+cp -a ./docs/sample_envs/local/.postgres ./.envs/.local
+```
+
+Edit the `.django` and `.postgres` configuration files to include your OpenAI API key and set a unique password for your
+database user. You can also set the response token limit in the `.django` file or switch which OpenAI model you want to
+use. GPT-4 is supported, assuming you’re authorized to access it.
+
+Build the docker compose stack with the `--profile fullstack` flag:
+
+```commandline
+sudo docker-compose --profile fullstack -f local.yml build
+```
+
+The `fullstack` profile instructs compose to build a Docker container from the frontend folder, which is launched
+along with all of the required backend containers. It takes a long time to build a production React container, however,
+so we don’t recommend you develop this way. For development environment setup, follow
+the [instructions in the project readme.md](https://github.com/JSv4/Delphic#development).
+
+Finally, bring up the application:
+
+```commandline
+sudo docker-compose -f local.yml up
+```
+
+Now, visit `localhost:3000` in your browser to see the frontend, and use the Delphic application locally.
+
+## Using the Application
+
+### Setup Users
+
+To actually use the application, you currently need a login (we intend to make it possible to share certain models with
+unauthenticated users in the future). You can use either a superuser or a non-superuser account. In either case, someone
+needs to first create a superuser using the console.
+
+**Why set up a Django superuser?** A Django superuser has all the permissions in the application and can manage all
+aspects of the system, including creating, modifying, and deleting users, collections, and other data. Setting up a
+superuser allows you to fully control and manage the application.
+
+**How to create a Django superuser:**
+
+1. Run the following command to create a superuser:
+
+   `sudo docker-compose -f local.yml run django python manage.py createsuperuser`
+
+2. You will be prompted to provide a username, email address, and password for the superuser. Enter the required
+   information.
+
+**How to create additional users using Django admin:**
+
+1. Start your Delphic application locally following the deployment instructions.
+2. Visit the Django admin interface by navigating to `http://localhost:8000/admin` in your browser.
+3. Log in with the superuser credentials you created earlier.
+4. Click on “Users” under the “Authentication and Authorization” section.
+5. Click on the “Add user +” button in the top right corner.
+6. Enter the required information for the new user, such as username and password. Click “Save” to create the user.
+7. To grant the new user additional permissions or make them a superuser, click on their username in the user list,
+   scroll down to the “Permissions” section, and configure their permissions accordingly. Save your changes.
@@ -0,0 +1,7 @@
+# Chatbots
+
+Chatbots are an incredibly popular use case for LLMs. LlamaIndex gives you the tools to build knowledge-augmented chatbots and agents.
+
+Relevant Resources:
+- [Building a Chatbot](/end_to_end_tutorials/chatbots/building_a_chatbot.md)
+- [Using with a LangChain Agent](/community/integrations/using_with_langchain.md)
@@ -0,0 +1,352 @@
+# 💬🤖 How to Build a Chatbot
+
+LlamaIndex is an interface between your data and LLMs; it offers the toolkit for you to set up a query interface around your data for any downstream task, whether it's question-answering, summarization, or more.
+
+In this tutorial, we show you how to build a context-augmented chatbot. We use LangChain for the underlying Agent/Chatbot abstractions, and we use LlamaIndex for the data retrieval/lookup/querying! The result is a chatbot agent that has access to a rich set of "data interface" Tools that LlamaIndex provides to answer queries over your data.
+
+**Note**: This is a continuation of some initial work building a query interface over SEC 10-K filings - [check it out here](https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d).
+
+### Context
+
+In this tutorial, we build a "10-K Chatbot" by downloading the raw Uber 10-K HTML filings from Dropbox. The user can then ask questions about the 10-K filings.
+
+### Ingest Data
+
+Let's first download the raw 10-K files from 2019-2022.
+
+```python
+# NOTE: the code examples assume you're operating within a Jupyter notebook.
+# download files
+!mkdir data
+!wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip
+!unzip data/UBER.zip -d data
+
+```
+
+We use the [Unstructured](https://github.com/Unstructured-IO/unstructured) library to parse the HTML files into formatted text.
+We have a direct integration with Unstructured through [LlamaHub](https://llamahub.ai/) - this allows us to convert any text into a Document format that LlamaIndex can ingest.
+
+```python
+
+from llama_index import download_loader, VectorStoreIndex, ServiceContext, StorageContext, load_index_from_storage
+from pathlib import Path
+
+years = [2022, 2021, 2020, 2019]
+UnstructuredReader = download_loader("UnstructuredReader", refresh_cache=True)
+
+loader = UnstructuredReader()
+doc_set = {}
+all_docs = []
+for year in years:
+    year_docs = loader.load_data(file=Path(f'./data/UBER/UBER_{year}.html'), split_documents=False)
+    # attach the filing year as metadata to each document
+    for d in year_docs:
+        d.metadata = {"year": year}
+    doc_set[year] = year_docs
+    all_docs.extend(year_docs)
+```
+
+### Setting up Vector Indices for each year
+
+We first set up a vector index for each year. Each vector index allows us
+to ask questions about the 10-K filing of a given year.
+
+We build each index and save it to disk.
+
+```python
+# initialize a simple vector index for each year
+service_context = ServiceContext.from_defaults(chunk_size=512)
+index_set = {}
+for year in years:
+    storage_context = StorageContext.from_defaults()
+    cur_index = VectorStoreIndex.from_documents(
+        doc_set[year], 
+        service_context=service_context,
+        storage_context=storage_context,
+    )
+    index_set[year] = cur_index
+    storage_context.persist(persist_dir=f'./storage/{year}')
+
+```
+
+To load an index from disk, do the following:
+
+```python
+# Load indices from disk
+index_set = {}
+for year in years:
+    storage_context = StorageContext.from_defaults(persist_dir=f'./storage/{year}')
+    cur_index = load_index_from_storage(storage_context=storage_context)
+    index_set[year] = cur_index
+```
+
+
+### Composing a Graph to Synthesize Answers Across 10-K Filings
+
+Since we have access to documents from four years, we may want to ask not only questions about the 10-K document of a given year, but also questions that require analysis over all 10-K filings.
+
+To address this, we compose a "graph" which consists of a list index defined over the 4 vector indices. Querying this graph would first retrieve information from each vector index, and combine information together via the list index.
+
+```python
+from llama_index import ListIndex, LLMPredictor, ServiceContext, load_graph_from_storage
+from langchain import OpenAI
+from llama_index.indices.composability import ComposableGraph
+
+# describe each index to help traversal of composed graph
+index_summaries = [f"UBER 10-k Filing for {year} fiscal year" for year in years]
+
+# define an LLMPredictor and set the number of output tokens
+llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=512))
+service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
+storage_context = StorageContext.from_defaults()
+
+# define a list index over the vector indices
+# allows us to synthesize information across each index
+graph = ComposableGraph.from_indices(
+    ListIndex,
+    [index_set[y] for y in years], 
+    index_summaries=index_summaries,
+    service_context=service_context,
+    storage_context = storage_context,
+)
+root_id = graph.root_id
+
+# [optional] save to disk
+storage_context.persist(persist_dir=f'./storage/root')
+
+# [optional] load from disk, so you don't need to build graph from scratch
+graph = load_graph_from_storage(
+    root_id=root_id, 
+    service_context=service_context,
+    storage_context=storage_context,
+)
+
+```
+
+### Setting up the Tools + LangChain Chatbot Agent
+
+We use LangChain to set up the outer chatbot agent, which has access to a set of Tools.
+LlamaIndex provides some wrappers around indices and graphs so that they can be easily used within a Tool interface.
+
+```python
+# do imports
+from langchain.chains.conversation.memory import ConversationBufferMemory
+from langchain.agents import initialize_agent
+
+from llama_index.langchain_helpers.agents import LlamaToolkit, create_llama_chat_agent, IndexToolConfig
+```
+
+We want to define a separate Tool for each index (corresponding to a given year), as well 
+as the graph. We can define all tools under a central `LlamaToolkit` interface.
+
+Below, we define an `IndexToolConfig` for our graph. Note that we also import a `DecomposeQueryTransform` module for use with each vector index in the graph. This allows us to "decompose" the overall query into a query that can be answered from each subindex (see the example below).
+
+```python
+# define a decompose transform
+from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
+decompose_transform = DecomposeQueryTransform(
+    llm_predictor, verbose=True
+)
+
+# define custom query engines (one per vector index, plus one for the graph root)
+from llama_index.query_engine.transform_query_engine import TransformQueryEngine
+
+custom_query_engines = {}
+for index in index_set.values():
+    query_engine = index.as_query_engine()
+    query_engine = TransformQueryEngine(
+        query_engine,
+        query_transform=decompose_transform,
+        transform_extra_info={'index_summary': index.index_struct.summary},
+    )
+    custom_query_engines[index.index_id] = query_engine
+custom_query_engines[graph.root_id] = graph.root_index.as_query_engine(
+    response_mode='tree_summarize',
+    verbose=True,
+)
+
+# construct query engine
+graph_query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
+
+# tool config
+graph_config = IndexToolConfig(
+    query_engine=graph_query_engine,
+    name=f"Graph Index",
+    description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber.",
+    tool_kwargs={"return_direct": True}
+)
+```
+
+Besides the `IndexToolConfig` object for the graph, we also define an `IndexToolConfig` corresponding to each index:
+
+```python
+# define toolkit
+index_configs = []
+for y in range(2019, 2023):
+    query_engine = index_set[y].as_query_engine(
+        similarity_top_k=3,
+    )
+    tool_config = IndexToolConfig(
+        query_engine=query_engine, 
+        name=f"Vector Index {y}",
+        description=f"useful for when you want to answer queries about the {y} SEC 10-K for Uber",
+        tool_kwargs={"return_direct": True}
+    )
+    index_configs.append(tool_config)
+```
+
+Next, we combine these configs with our `LlamaToolkit`:
+
+```python
+toolkit = LlamaToolkit(
+    index_configs=index_configs + [graph_config],
+)
+```
+
+
+Finally, we call `create_llama_chat_agent` to create our LangChain chatbot agent, which
+has access to the 5 Tools we defined above:
+
+```python
+memory = ConversationBufferMemory(memory_key="chat_history")
+llm=OpenAI(temperature=0)
+agent_chain = create_llama_chat_agent(
+    toolkit,
+    llm,
+    memory=memory,
+    verbose=True
+)
+```
+
+### Testing the Agent
+
+We can now test the agent with various queries.
+
+If we test it with a simple "hello" query, the agent does not use any Tools.
+
+```python
+agent_chain.run(input="hi, i am bob")
+```
+
+```
+> Entering new AgentExecutor chain...
+
+Thought: Do I need to use a tool? No
+AI: Hi Bob, nice to meet you! How can I help you today?
+
+> Finished chain.
+'Hi Bob, nice to meet you! How can I help you today?'
+```
+
+If we test it with a query regarding the 10-K of a given year, the agent will use
+the relevant vector index Tool.
+
+```python
+agent_chain.run(input="What were some of the biggest risk factors in 2020 for Uber?")
+```
+
+```
+> Entering new AgentExecutor chain...
+
+Thought: Do I need to use a tool? Yes
+Action: Vector Index 2020
+Action Input: Risk Factors
+...
+
+Observation: 
+
+Risk Factors
+
+The COVID-19 pandemic and the impact of actions to mitigate the pandemic has adversely affected and continues to adversely affect our business, financial condition, and results of operations.
+
+...
+'\n\nRisk Factors\n\nThe COVID-19 pandemic and the impact of actions to mitigate the pandemic has adversely affected and continues to adversely affect our business,
+
+```
+
+Finally, if we test it with a query to compare/contrast risk factors across years,
+the agent will use the graph index Tool.
+
+```python
+cross_query_str = (
+    "Compare/contrast the risk factors described in the Uber 10-K across years. Give answer in bullet points."
+)
+agent_chain.run(input=cross_query_str)
+```
+
+```
+> Entering new AgentExecutor chain...
+
+Thought: Do I need to use a tool? Yes
+Action: Graph Index
+Action Input: Compare/contrast the risk factors described in the Uber 10-K across years.> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2022 fiscal year?
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2022 fiscal year?
+INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 964 tokens
+INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
+> Got response: 
+The risk factors described in the Uber 10-K for the 2022 fiscal year include: the potential for changes in the classification of Drivers, the potential for increased competition, the potential for...
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2021 fiscal year?
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2021 fiscal year?
+INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 590 tokens
+INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
+> Got response: 
+1. The COVID-19 pandemic and the impact of actions to mitigate the pandemic have adversely affected and may continue to adversely affect parts of our business.
+
+2. Our business would be adversely ...
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2020 fiscal year?
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2020 fiscal year?
+INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 516 tokens
+INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
+> Got response: 
+The risk factors described in the Uber 10-K for the 2020 fiscal year include: the timing of widespread adoption of vaccines against the virus, additional actions that may be taken by governmental ...
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2019 fiscal year?
+> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
+> New query:  What are the risk factors described in the Uber 10-K for the 2019 fiscal year?
+INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1020 tokens
+INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
+INFO:llama_index.indices.common.tree.base:> Building index from nodes: 0 chunks
+> Got response: 
+Risk factors described in the Uber 10-K for the 2019 fiscal year include: competition from other transportation providers; the impact of government regulations; the impact of litigation; the impac...
+INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 7039 tokens
+INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 72 tokens
+
+Observation: 
+In 2020, the risk factors included the timing of widespread adoption of vaccines against the virus, additional actions that may be taken by governmental authorities, the further impact on the business of Drivers
+
+...
+
+```
+
+
+### Setting up the Chatbot Loop
+
+Now that we have the chatbot set up, it only takes a few more steps to set up a basic interactive loop to converse with our SEC-augmented chatbot!
+
+```python
+while True:
+    text_input = input("User: ")
+    response = agent_chain.run(input=text_input)
+    print(f'Agent: {response}')
+    
+```
+
+Here's an example of the loop in action:
+```
+User:  What were some of the legal proceedings against Uber in 2022?
+Agent: 
+
+In 2022, legal proceedings against Uber include a motion to compel arbitration, an appeal of a ruling that Proposition 22 is unconstitutional, a complaint alleging that drivers are employees and entitled to protections under the wage and labor laws, a summary judgment motion, allegations of misclassification of drivers and related employment violations in New York, fraud related to certain deductions, class actions in Australia alleging that Uber entities conspired to injure the group members during the period 2014 to 2017 by either directly breaching transport legislation or commissioning offenses against transport legislation by UberX Drivers in Australia, and claims of lost income and decreased value of certain taxi. Additionally, Uber is facing a challenge in California Superior Court alleging that Proposition 22 is unconstitutional, and a preliminary injunction order prohibiting Uber from classifying Drivers as independent contractors and from violating various wage and hour laws.
+
+User: 
+
+```
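
The interactive loop in this section runs until the process is interrupted. As a minimal sketch (not part of the original notebook), the variant below stops when the user types an exit command. `EXIT_COMMANDS`, `chat_loop`, and `echo_agent` are hypothetical names introduced for illustration, and `echo_agent` is a stand-in for `agent_chain.run(input=...)` so the example runs without an API key.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical exit commands; these are not part of LlamaIndex or LangChain.
EXIT_COMMANDS = {"exit", "quit"}


def chat_loop(respond: Callable[[str], str], inputs: Iterable[str]) -> Iterator[str]:
    """Yield one reply per user message until an exit command or the end of input."""
    for text_input in inputs:
        if text_input.strip().lower() in EXIT_COMMANDS:
            return
        yield respond(text_input)


def echo_agent(message: str) -> str:
    """Stand-in for `agent_chain.run(input=message)` (assumption for this sketch)."""
    return f"You said: {message}"


if __name__ == "__main__":
    # Scripted input instead of input(), so the sketch runs non-interactively.
    for reply in chat_loop(echo_agent, ["hello", "quit", "never reached"]):
        print(f"Agent: {reply}")
```

To drive it with the real agent, one option is to pass `lambda msg: agent_chain.run(input=msg)` as `respond` and `iter(lambda: input("User: "), None)` as `inputs`, printing each yielded reply as it arrives.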
+
+### Notebook
+
+Take a look at our [corresponding notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatbot/Chatbot_SEC.ipynb). 
@@ -0,0 +1,29 @@
+# Discover LlamaIndex Video Series
+
+This page contains links to videos + associated notebooks for our ongoing video tutorial series "Discover LlamaIndex".
+
+## SubQuestionQueryEngine + 10K Analysis
+
+This video covers the `SubQuestionQueryEngine` and how it can be applied to financial documents to help decompose complex queries into multiple sub-questions.
+
+[Youtube](https://www.youtube.com/watch?v=GT_Lsj3xj1o)
+
+[Notebook](../../examples/usecases/10k_sub_question.ipynb)
+
+## Discord Document Management
+
+This video covers managing documents from a source that is constantly updating (e.g., Discord) and how you can avoid document duplication and save embedding tokens.
+
+[Youtube](https://www.youtube.com/watch?v=j6dJcODLd_c)
+
+[Notebook + Supplementary Material](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/discover_llamaindex/document_management/)
+
+[Reference Docs](../../core_modules/data_modules/index/document_management.md)
+
+## Joint Text to SQL and Semantic Search
+
+This video covers the tools built into LlamaIndex for combining SQL and semantic search into a single unified query interface.
+
+[Youtube](https://www.youtube.com/watch?v=ZIvcVJGtCrY)
+
+[Notebook](../../examples/query_engine/SQLAutoVectorQueryEngine.ipynb)
@@ -0,0 +1,5 @@
+# Private Setup
+
+Relevant Resources:
+- [Using LlamaIndex with Local Models](https://colab.research.google.com/drive/16QMQePkONNlDpgiltOi7oRQgmB8dU5fl?usp=sharing)
+
@@ -0,0 +1,234 @@
+# Q&A over Documents
+
+At a high level, LlamaIndex gives you the ability to query your data for any downstream LLM use case,
+whether it's question-answering, summarization, or a component in a chatbot.
+
+This section describes the different ways you can query your data with LlamaIndex, roughly in order
+from the simplest (top-k semantic search) to more advanced capabilities.
+
+### Semantic Search 
+
+The most basic example usage of LlamaIndex is through semantic search. We provide
+a simple in-memory vector store for you to get started, but you can also choose
+to use any one of our [vector store integrations](/community/integrations/vector_stores.md):
+
+```python
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
+documents = SimpleDirectoryReader('data').load_data()
+index = VectorStoreIndex.from_documents(documents)
+query_engine = index.as_query_engine()
+response = query_engine.query("What did the author do growing up?")
+print(response)
+
+```
+
+**Tutorials**
+- [Starter Tutorial](/getting_started/starter_example.md)
+- [Basic Usage Pattern](/end_to_end_tutorials/usage_pattern.md)
+
+**Guides**
+- [Example](../examples/vector_stores/SimpleIndexDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/vector_stores/SimpleIndexDemo.ipynb))
+
+
+### Summarization
+
+A summarization query requires the LLM to iterate through many if not most documents in order to synthesize an answer.
+For instance, a summarization query could look like one of the following: 
+- "What is a summary of this collection of text?"
+- "Give me a summary of person X's experience with the company."
+
+In general, a list index would be suited for this use case. A list index by default goes through all the data.
+
+Empirically, setting `response_mode="tree_summarize"` also leads to better summarization results.
+
+```python
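+from llama_index import ListIndex, SimpleDirectoryReader
+
+# load documents the same way as in the semantic search example
+documents = SimpleDirectoryReader('data').load_data()
+index = ListIndex.from_documents(documents)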
+
+query_engine = index.as_query_engine(
+    response_mode="tree_summarize"
+)
+response = query_engine.query("<summarization_query>")
+```
+
+### Queries over Structured Data
+
+LlamaIndex supports queries over structured data, whether that's a Pandas DataFrame or a SQL Database.
+
+Here are some relevant resources:
+
+**Tutorials**
+
+- [Guide on Text-to-SQL](/guides/tutorials/sql_guide.md)
+
+**Guides**
+- [SQL Guide (Core)](../examples/index_structs/struct_indices/SQLIndexDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/index_structs/struct_indices/SQLIndexDemo.ipynb))
+- [Pandas Demo](../examples/query_engine/pandas_query_engine.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/pandas_query_engine.ipynb))
+
+
+### Synthesis over Heterogeneous Data
+
+LlamaIndex supports synthesizing across heterogeneous data sources. This can be done by composing a graph over your existing data.
+Specifically, compose a list index over your subindices. A list index inherently combines information for each node; therefore
+it can synthesize information across your heterogeneous data sources.
+
+```python
+from llama_index import VectorStoreIndex, ListIndex
+from llama_index.indices.composability import ComposableGraph
+
+index1 = VectorStoreIndex.from_documents(notion_docs)
+index2 = VectorStoreIndex.from_documents(slack_docs)
+
+graph = ComposableGraph.from_indices(ListIndex, [index1, index2], index_summaries=["summary1", "summary2"])
+query_engine = graph.as_query_engine()
+response = query_engine.query("<query_str>")
+
+```
+
+**Guides**
+- [Composability](/core_modules/data_modules/index/composability.md)
+- [City Analysis](/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb))
+
+
+
+### Routing over Heterogeneous Data
+
+LlamaIndex also supports routing over heterogeneous data sources with `RouterQueryEngine` - for instance, if you want to "route" a query to an 
+underlying Document or a sub-index.
+
+
+To do this, first build the sub-indices over different data sources.
+Then construct the corresponding query engines, and give each query engine a description to obtain a `QueryEngineTool`.
+
+```python
+from llama_index import TreeIndex, VectorStoreIndex
+from llama_index.tools import QueryEngineTool
+
+...
+
+# define sub-indices
+index1 = VectorStoreIndex.from_documents(notion_docs)
+index2 = VectorStoreIndex.from_documents(slack_docs)
+
+# define query engines and tools
+tool1 = QueryEngineTool.from_defaults(
+    query_engine=index1.as_query_engine(), 
+    description="Use this query engine to do...",
+)
+tool2 = QueryEngineTool.from_defaults(
+    query_engine=index2.as_query_engine(), 
+    description="Use this query engine for something else...",
+)
+```
+
+Then, we define a `RouterQueryEngine` over them.
+By default, this uses an `LLMSingleSelector` as the router, which uses the LLM to choose the best sub-index to route the query to, given the descriptions.
+
+```python
+from llama_index.query_engine import RouterQueryEngine
+
+query_engine = RouterQueryEngine.from_defaults(
+    query_engine_tools=[tool1, tool2]
+)
+
+response = query_engine.query(
+    "In Notion, give me a summary of the product roadmap."
+)
+
+```
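
To build intuition for what a selector does with the tool descriptions, here is a toy sketch in plain Python. It is not how `LLMSingleSelector` works (that selector asks the LLM to choose); this stand-in simply counts words shared between the query and each description. The `select_tool` function and the descriptions are hypothetical and exist only for illustration.

```python
import re


def select_tool(query, tool_descriptions):
    """Return the name of the tool whose description best matches the query."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def overlap(name):
        # Count the words that the query and this description have in common.
        description_words = set(re.findall(r"\w+", tool_descriptions[name].lower()))
        return len(query_words & description_words)

    return max(tool_descriptions, key=overlap)


# Hypothetical descriptions, one per data source.
tools = {
    "notion": "Use this for questions about Notion pages and the product roadmap",
    "slack": "Use this for questions about Slack messages and team chat",
}
print(select_tool("In Notion, give me a summary of the product roadmap.", tools))
```

The takeaway is that routing quality depends on the descriptions: the more clearly each one states what its data source covers, the easier the choice becomes for any selector.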
+
+**Guides**
+- [Router Query Engine Guide](../examples/query_engine/RouterQueryEngine.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/RouterQueryEngine.ipynb))
+- [City Analysis Unified Query Interface](../examples/composable_indices/city_analysis/City_Analysis-Unified-Query.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb))
+
+### Compare/Contrast Queries
+You can explicitly perform compare/contrast queries with a **query transformation** module within a ComposableGraph.
+
+```python
+from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
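+# `llm_predictor_chatgpt` is assumed to be an LLMPredictor wrapping a chat model, defined elsewhere
+decompose_transform = DecomposeQueryTransform(
+    llm_predictor_chatgpt, verbose=True
+)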
+```
+
+This module will help break down a complex query into a simpler one over your existing index structure.
+
+**Guides**
+- [Query Transformations](/core_modules/query_modules/query_engine/advanced/query_transformations.md)
+- [City Analysis Compare/Contrast Example](/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb))
+
+You can also rely on the LLM to *infer* whether to perform compare/contrast queries (see Multi-Document Queries below).
+
+### Multi-Document Queries
+
+Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well. 
+It can do this through our `SubQuestionQueryEngine` class. Given a query, this query engine will generate a "query plan" containing
+sub-queries against sub-documents before synthesizing the final answer.
+
+To do this, first define an index for each document/data source, and wrap it with a `QueryEngineTool` (similar to above):
+
+```python
+from llama_index.tools import QueryEngineTool, ToolMetadata
+
+query_engine_tools = [
+    QueryEngineTool(
+        query_engine=sept_engine, 
+        metadata=ToolMetadata(name='sept_22', description='Provides information about Uber quarterly financials ending September 2022')
+    ),
+    QueryEngineTool(
+        query_engine=june_engine, 
+        metadata=ToolMetadata(name='june_22', description='Provides information about Uber quarterly financials ending June 2022')
+    ),
+    QueryEngineTool(
+        query_engine=march_engine, 
+        metadata=ToolMetadata(name='march_22', description='Provides information about Uber quarterly financials ending March 2022')
+    ),
+]
+```
+
+Then, we define a `SubQuestionQueryEngine` over these tools:
+
+```python
+from llama_index.query_engine import SubQuestionQueryEngine
+
+query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
+
+```
+
+This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer.
+This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
+
+**Guides**
+- [Sub Question Query Engine (Intro)](../examples/query_engine/sub_question_query_engine.ipynb)
+- [10Q Analysis (Uber)](../examples/usecases/10q_sub_question.ipynb)
+- [10K Analysis (Uber and Lyft)](../examples/usecases/10k_sub_question.ipynb)
+
+
+### Multi-Step Queries
+
+LlamaIndex can also support iterative multi-step queries. Given a complex query, it breaks the query down into an initial subquestion,
+and then sequentially generates further subquestions based on the returned answers until the final answer is returned.
+
+For instance, given a question "Who was in the first batch of the accelerator program the author started?",
+the module will first decompose the query into a simpler initial question "What was the accelerator program the author started?",
+query the index, and then ask followup questions.
+
+**Guides**
+- [Query Transformations](/core_modules/query_modules/query_engine/advanced/query_transformations.md)
+- [Multi-Step Query Decomposition](../examples/query_transformations/HyDEQueryTransformDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb))
+
+
+### Temporal Queries
+
+LlamaIndex can support queries that require an understanding of time. It can do this in two ways:
+- Decide whether the query requires utilizing temporal relationships between nodes (prev/next relationships) in order to retrieve additional context to answer the question.
+- Sort by recency and filter outdated context.
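
The second approach (sort by recency and filter outdated context) can be sketched in plain Python. This is an illustration of the idea only, not the library's postprocessor classes; the `filter_by_recency` helper and the dictionary-based nodes are hypothetical.

```python
from datetime import date


def filter_by_recency(nodes, cutoff, top_k=None):
    """Drop nodes older than `cutoff` and return the rest, newest first."""
    fresh = [node for node in nodes if node["date"] >= cutoff]
    fresh.sort(key=lambda node: node["date"], reverse=True)
    return fresh if top_k is None else fresh[:top_k]


# Hypothetical retrieved nodes, each tagged with a date.
nodes = [
    {"text": "Roadmap v1", "date": date(2022, 1, 10)},
    {"text": "Roadmap v3", "date": date(2023, 6, 1)},
    {"text": "Roadmap v2", "date": date(2022, 11, 5)},
]
recent = filter_by_recency(nodes, cutoff=date(2022, 6, 1))
print([node["text"] for node in recent])
```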
+
+**Guides**
+- [Second-Stage Postprocessing Guide](/core_modules/query_modules/node_postprocessors/root.md)
+- [Prev/Next Postprocessing](/examples/node_postprocessor/PrevNextPostprocessorDemo.ipynb)
+- [Recency Postprocessing](/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb)
+
+### Additional Resources
+- [A Guide to Creating a Unified Query Framework over your Indexes](/end_to_end_tutorials/question_and_answer/unified_query.md)
+- [A Guide to Extracting Terms and Definitions](/end_to_end_tutorials/question_and_answer/terms_definitions_tutorial.md)
+- [SEC 10k Analysis](https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d)
@@ -0,0 +1,489 @@
+# A Guide to Extracting Terms and Definitions
+
+Llama Index has many use cases (semantic search, summarization, etc.) that are [well documented](/end_to_end_tutorials/use_cases.md). However, this doesn't mean we can't apply Llama Index to very specific use cases!
+
+In this tutorial, we will go through the design process of using Llama Index to extract terms and definitions from text, while allowing users to query those terms later. Using [Streamlit](https://streamlit.io/), we can provide an easy way to build a frontend for running and testing all of this, and quickly iterate on our design.
+
+This tutorial assumes you have Python 3.9+ and the following packages installed:
+
+- llama-index
+- streamlit
+
+At the base level, our objective is to take text from a document, extract terms and definitions, and then provide a way for users to query that knowledge base of terms and definitions. The tutorial will go over features from both Llama Index and Streamlit, and hopefully provide some interesting solutions for common problems that come up.
+
+The final version of this tutorial can be found [here](https://github.com/logan-markewich/llama_index_starter_pack) and a live hosted demo is available on [Huggingface Spaces](https://huggingface.co/spaces/llamaindex/llama_index_term_definition_demo).
+
+## Uploading Text
+
+Step one is giving users a way to upload documents. Let’s write some code using Streamlit to provide the interface for this! Use the following code and launch the app with `streamlit run app.py`.
+
+```python
+import streamlit as st
+
+st.title("🦙 Llama Index Term Extractor 🦙")
+
+document_text = st.text_area("Or enter raw text")
+if st.button("Extract Terms and Definitions") and document_text:
+    with st.spinner("Extracting..."):
+        extracted_terms = document_text  # this is a placeholder!
+    st.write(extracted_terms)
+```
+
+Super simple, right? But you'll notice that the app doesn't do anything useful yet. To use llama_index, we also need to set up our OpenAI LLM. There are a bunch of possible settings for the LLM, so we can let the user figure out what's best. We should also let the user set the prompt that will extract the terms (which will also help us debug what works best).
+
+## LLM Settings
+
+This next step introduces some tabs to our app, to separate it into different panes that provide different features. Let's create a tab for LLM settings and for uploading text:
+
+```python
+import os
+import streamlit as st
+
+DEFAULT_TERM_STR = (
+    "Make a list of terms and definitions that are defined in the context, "
+    "with one pair on each line. "
+    "If a term is missing its definition, use your best judgment. "
+    "Write each line as follows:\nTerm: <term> Definition: <definition>"
+)
+
+st.title("🦙 Llama Index Term Extractor 🦙")
+
+setup_tab, upload_tab = st.tabs(["Setup", "Upload/Extract Terms"])
+
+with setup_tab:
+    st.subheader("LLM Setup")
+    api_key = st.text_input("Enter your OpenAI API key here", type="password")
+    llm_name = st.selectbox('Which LLM?', ["text-davinci-003", "gpt-3.5-turbo", "gpt-4"])
+    model_temperature = st.slider("LLM Temperature", min_value=0.0, max_value=1.0, step=0.1)
+    term_extract_str = st.text_area("The query to extract terms and definitions with.", value=DEFAULT_TERM_STR)
+
+with upload_tab:
+    st.subheader("Extract and Query Definitions")
+    document_text = st.text_area("Or enter raw text")
+    if st.button("Extract Terms and Definitions") and document_text:
+        with st.spinner("Extracting..."):
+            extracted_terms = document_text  # this is a placeholder!
+        st.write(extracted_terms)
+```
+
+Now our app has two tabs, which really helps with the organization. You'll also notice I added a default prompt to extract terms -- you can change this later once you try extracting some terms; it's just the prompt I arrived at after experimenting a bit.
+
+Speaking of extracting terms, it's time to add some functions to do just that!
+
+## Extracting and Storing Terms
+
+Now that we are able to define LLM settings and upload text, we can try using Llama Index to extract the terms from text for us!
+
+We can add the following functions to both initialize our LLM, as well as use it to extract terms from the input text.
+
+```python
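+# OpenAI and ChatOpenAI are assumed to be the LangChain LLM wrappers (they accept `model_name`)
+from langchain.llms import OpenAI
+from langchain.chat_models import ChatOpenAI
+from llama_index import Document, ListIndex, LLMPredictor, ServiceContext, load_index_from_storage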
+
+def get_llm(llm_name, model_temperature, api_key, max_tokens=256):
+    os.environ['OPENAI_API_KEY'] = api_key
+    if llm_name == "text-davinci-003":
+        return OpenAI(temperature=model_temperature, model_name=llm_name, max_tokens=max_tokens)
+    else:
+        return ChatOpenAI(temperature=model_temperature, model_name=llm_name, max_tokens=max_tokens)
+
+def extract_terms(documents, term_extract_str, llm_name, model_temperature, api_key):
+    llm = get_llm(llm_name, model_temperature, api_key, max_tokens=1024)
+
+    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm),
+                                                   chunk_size=1024)
+
+    temp_index = ListIndex.from_documents(documents, service_context=service_context)
+    query_engine = temp_index.as_query_engine(response_mode="tree_summarize")
+    terms_definitions = str(query_engine.query(term_extract_str))
+    terms_definitions = [x for x in terms_definitions.split("\n") if x and 'Term:' in x and 'Definition:' in x]
+    # parse the text into a dict
+    terms_to_definition = {x.split("Definition:")[0].split("Term:")[-1].strip(): x.split("Definition:")[-1].strip() for x in terms_definitions}
+    return terms_to_definition
+```
+
+Now, using the new functions, we can finally extract our terms!
+
+```python
+...
+with upload_tab:
+    st.subheader("Extract and Query Definitions")
+    document_text = st.text_area("Or enter raw text")
+    if st.button("Extract Terms and Definitions") and document_text:
+        with st.spinner("Extracting..."):
+            extracted_terms = extract_terms([Document(text=document_text)],
+                                            term_extract_str, llm_name,
+                                            model_temperature, api_key)
+        st.write(extracted_terms)
+```
+
+There's a lot going on now, let's take a moment to go over what is happening.
+
+`get_llm()` is instantiating the LLM based on the user configuration from the setup tab. Based on the model name, we need to use the appropriate class (`OpenAI` vs. `ChatOpenAI`).
+
+`extract_terms()` is where all the good stuff happens. First, we call `get_llm()` with `max_tokens=1024`, since we don't want to limit the model too much when it is extracting our terms and definitions (the default is 256 if not set). Then, we define our `ServiceContext` object, setting `chunk_size=1024` so that each chunk is no larger than the output limit. When documents are indexed by Llama Index, they are broken into chunks (also called nodes) if they are large, and `chunk_size` sets the size for these chunks.
+
+Next, we create a temporary list index and pass in our service context. A list index will read every single piece of text in our index, which is perfect for extracting terms. Then, we use our pre-defined query text to extract terms, using `response_mode="tree_summarize"`. This response mode will generate a tree of summaries from the bottom up, where each parent summarizes its children. Finally, the top of the tree is returned, which will contain all our extracted terms and definitions.
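
To make the bottom-up idea concrete, here is a minimal sketch in plain Python. It assumes a fixed number of children per parent and uses a stand-in `fake_summarize` function in place of a real LLM call; both are simplifications for illustration, and the library's own grouping logic may differ.

```python
def tree_summarize(chunks, summarize, num_children=2):
    """Summarize groups of texts repeatedly until a single root remains."""
    level = list(chunks)
    if not level:
        return ""
    while len(level) > 1:
        # Each parent summarizes a fixed-size group of children.
        level = [
            summarize(level[i : i + num_children])
            for i in range(0, len(level), num_children)
        ]
    return level[0]


def fake_summarize(texts):
    # Stand-in for an LLM call: it just joins the inputs.
    return " | ".join(texts)


root = tree_summarize(["a", "b", "c", "d", "e"], fake_summarize)
print(root)
```

With five chunks and two children per parent, the sketch builds three levels above the leaves, so content from every chunk reaches the root.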
+
+Lastly, we do some minor post processing. We assume the model followed instructions and put a term/definition pair on each line. If a line is missing the `Term:` or `Definition:` labels, we skip it. Then, we convert this to a dictionary for easy storage!
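
The post-processing step can be tried on its own, without calling an LLM. The sketch below is a standalone version of the same parsing idea; `parse_terms` and the sample response are hypothetical and used only for illustration.

```python
def parse_terms(response_text):
    """Parse 'Term: <term> Definition: <definition>' lines into a dict."""
    terms_to_definition = {}
    for line in response_text.split("\n"):
        # Skip lines where the model did not follow the expected format.
        if "Term:" not in line or "Definition:" not in line:
            continue
        term_part, definition_part = line.split("Definition:", 1)
        term = term_part.split("Term:", 1)[-1].strip()
        terms_to_definition[term] = definition_part.strip()
    return terms_to_definition


# Hypothetical LLM output, including one line that should be skipped.
sample_response = (
    "Term: bunnyhug Definition: A hoodie.\n"
    "Here is some extra commentary.\n"
    "Term: toque Definition: A knitted winter hat."
)
print(parse_terms(sample_response))
```

Because the result is a dictionary, a term that appears twice keeps only its last definition.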
+
+## Saving Extracted Terms
+
+Now that we can extract terms, we need to put them somewhere so that we can query for them later. A `VectorStoreIndex` should be a perfect choice for now! But in addition, our app should also keep track of which terms are inserted into the index so that we can inspect them later. Using `st.session_state`, we can store the current list of terms in a session dict, unique to each user!
+
+First things first though, let's add a feature to initialize a global vector index and another function to insert the extracted terms.
+
+```python
+from llama_index import VectorStoreIndex
+...
+if 'all_terms' not in st.session_state:
+    st.session_state['all_terms'] = {}  # no default terms exist yet
+...
+
+def insert_terms(terms_to_definition):
+    for term, definition in terms_to_definition.items():
+        doc = Document(text=f"Term: {term}\nDefinition: {definition}")
+        st.session_state['llama_index'].insert(doc)
+
+@st.cache_resource
+def initialize_index(llm_name, model_temperature, api_key):
+    """Create the VectorStoreIndex object."""
+    llm = get_llm(llm_name, model_temperature, api_key)
+
+    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
+
+    index = VectorStoreIndex([], service_context=service_context)
+
+    return index
+
+...
+
+with upload_tab:
+    st.subheader("Extract and Query Definitions")
+    if st.button("Initialize Index and Reset Terms"):
+        st.session_state['llama_index'] = initialize_index(llm_name, model_temperature, api_key)
+        st.session_state['all_terms'] = {}
+
+    if "llama_index" in st.session_state:
+        st.markdown("Either upload an image/screenshot of a document, or enter the text manually.")
+        document_text = st.text_area("Or enter raw text")
+        if st.button("Extract Terms and Definitions") and document_text:
+            st.session_state['terms'] = {}
+            terms_docs = {}
+            with st.spinner("Extracting..."):
+                terms_docs.update(extract_terms([Document(text=document_text)], term_extract_str, llm_name, model_temperature, api_key))
+            st.session_state['terms'].update(terms_docs)
+
+        if "terms" in st.session_state and st.session_state["terms"]:
+            st.markdown("Extracted terms")
+            st.json(st.session_state['terms'])
+
+            if st.button("Insert terms?"):
+                with st.spinner("Inserting terms"):
+                    insert_terms(st.session_state['terms'])
+                st.session_state['all_terms'].update(st.session_state['terms'])
+                st.session_state['terms'] = {}
+                st.experimental_rerun()
+```
+
+Now you are really starting to leverage the power of Streamlit! Let's start with the code under the upload tab. We added a button to initialize the vector index, and we store it in the global Streamlit state dictionary, as well as resetting the currently extracted terms. Then, after extracting terms from the input text, we store the extracted terms in the global state again and give the user a chance to review them before inserting. If the insert button is pressed, then we call our insert terms function, update our global tracking of inserted terms, and remove the most recently extracted terms from the session state.
+
+## Querying for Extracted Terms/Definitions
+
+With the terms and definitions extracted and saved, how can we use them? And how will the user even remember what's previously been saved? We can simply add some more tabs to the app to handle these features.
+
+```python
+...
+setup_tab, terms_tab, upload_tab, query_tab = st.tabs(
+    ["Setup", "All Terms", "Upload/Extract Terms", "Query Terms"]
+)
+...
+with terms_tab:
+    st.subheader("Current Extracted Terms and Definitions")
+    st.json(st.session_state["all_terms"])
+...
+with query_tab:
+    st.subheader("Query for Terms/Definitions!")
+    st.markdown(
+        (
+            "The LLM will attempt to answer your query, and augment its answers using the terms/definitions you've inserted. "
+            "If a term is not in the index, it will answer using its internal knowledge."
+        )
+    )
+    if st.button("Initialize Index and Reset Terms", key="init_index_2"):
+        st.session_state["llama_index"] = initialize_index(
+            llm_name, model_temperature, api_key
+        )
+        st.session_state["all_terms"] = {}
+
+    if "llama_index" in st.session_state:
+        query_text = st.text_input("Ask about a term or definition:")
+        if query_text:
+            query_text = query_text + "\nIf you can't find the answer, answer the query with the best of your knowledge."
+            with st.spinner("Generating answer..."):
+                response = st.session_state["llama_index"].as_query_engine(
+                    similarity_top_k=5, response_mode="compact"
+                ).query(query_text)
+            st.markdown(str(response))
+```
+
+While this is mostly basic, some important things to note:
+
+- Our initialize button has the same text as our other button. Streamlit will complain about this, so we provide a unique key instead.
+- Some additional text has been added to the query! This is to try and compensate for times when the index does not have the answer.
+- In our index query, we've specified two options:
+  - `similarity_top_k=5` means the index will fetch the top 5 closest matching terms/definitions to the query.
+  - `response_mode="compact"` means as much text as possible from the 5 matching terms/definitions will be used in each LLM call. Without this, the index would make at least 5 calls to the LLM, which can slow things down for the user.
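
The effect of compacting can be sketched in plain Python. This is a simplified illustration that uses a word count as a stand-in for tokens; `compact_chunks` and `max_words` are hypothetical names, not part of the library.

```python
def compact_chunks(chunks, max_words):
    """Greedily pack text chunks into as few prompts as possible."""
    prompts, current, current_len = [], [], 0
    for chunk in chunks:
        n_words = len(chunk.split())
        # Start a new prompt when the next chunk would overflow the budget.
        if current and current_len + n_words > max_words:
            prompts.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(chunk)
        current_len += n_words
    if current:
        prompts.append("\n\n".join(current))
    return prompts


matches = [f"Term: t{i}\nDefinition: a short definition" for i in range(5)]
print(len(compact_chunks(matches, max_words=1000)))  # all five fit in one prompt
print(len(compact_chunks(matches, max_words=6)))     # one prompt per match
```

Fewer prompts means fewer LLM calls, which is why compacting tends to feel faster to the user when the matching terms are short.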
+
+## Dry Run Test
+
+Well, actually I hope you've been testing as we went. But now, let's try one complete test.
+
+1. Refresh the app
+2. Enter your LLM settings
+3. Head over to the query tab
+4. Ask the following: `What is a bunnyhug?`
+5. The app should give some nonsense response. If you didn't know, a bunnyhug is another word for a hoodie, used by people from the Canadian Prairies!
+6. Let's add this definition to the app. Open the upload tab and enter the following text: `A bunnyhug is a common term used to describe a hoodie. This term is used by people from the Canadian Prairies.`
+7. Click the extract button. After a few moments, the app should display the correctly extracted term/definition. Click the insert term button to save it!
+8. If we open the terms tab, the term and definition we just extracted should be displayed
+9. Go back to the query tab and try asking what a bunnyhug is. Now, the answer should be correct!
+
+## Improvement #1 - Create a Starting Index
+
+With our base app working, it might feel like a lot of work to build up a useful index. What if we gave the user some kind of starting point to show off the app's query capabilities? We can do just that! First, let's make a small change to our app so that we save the index to disk after every upload:
+
+```python
+def insert_terms(terms_to_definition):
+    for term, definition in terms_to_definition.items():
+        doc = Document(text=f"Term: {term}\nDefinition: {definition}")
+        st.session_state['llama_index'].insert(doc)
+    # TEMPORARY - save to disk
+    st.session_state['llama_index'].storage_context.persist()
+```
+
+Now, we need some document to extract from! The repository for this project used the Wikipedia page on New York City, and you can find the text [here](https://github.com/jerryjliu/llama_index/blob/main/examples/test_wiki/data/nyc_text.txt).
+
+If you paste the text into the upload tab and run it (it may take some time), we can insert the extracted terms. Make sure to also copy the text for the extracted terms into a notepad or similar before inserting into the index! We will need them in a second.
+
+After inserting, remove the line of code we used to save the index to disk. With a starting index now saved, we can modify our `initialize_index` function to look like this:
+
+```python
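+from llama_index import StorageContext
+
+@st.cache_resource
+def initialize_index(llm_name, model_temperature, api_key):
+    """Load the Index object."""
+    llm = get_llm(llm_name, model_temperature, api_key)
+
+    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
+
+    # persist() writes to ./storage by default, so load the index from there
+    index = load_index_from_storage(
+        StorageContext.from_defaults(persist_dir="./storage"),
+        service_context=service_context,
+    )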
+
+    return index
+```
+
+Did you remember to save that giant list of extracted terms in a notepad? Now when our app initializes, we want to pass in the default terms that are in the index to our global terms state:
+
+```python
+...
+if "all_terms" not in st.session_state:
+    st.session_state["all_terms"] = DEFAULT_TERMS
+...
+```
+
+Repeat the above anywhere we were previously resetting the `all_terms` values.
+
+## Improvement #2 - (Refining) Better Prompts
+
+If you play around with the app a bit now, you might notice that it stopped following our prompt! Remember, we appended an instruction to our `query_text` variable saying that if the term/definition could not be found, the model should answer to the best of its knowledge. But now if you try asking about random terms (like bunnyhug!), it may or may not follow those instructions.
+
+This is due to the concept of "refining" answers in Llama Index. Since we are querying across the top 5 matching results, sometimes all the results do not fit in a single prompt! OpenAI models typically have a max input size of 4097 tokens. So, Llama Index accounts for this by breaking up the matching results into chunks that will fit into the prompt. After Llama Index gets an initial answer from the first API call, it sends the next chunk to the API, along with the previous answer, and asks the model to refine that answer.
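
The refine loop described above can be sketched in plain Python. This is a simplified illustration with a stand-in `fake_llm` in place of a real API call; the function names and prompt wording are hypothetical and are not the library's actual templates.

```python
def create_and_refine(query, chunks, llm):
    """Answer from the first chunk, then refine with each remaining chunk."""
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}")
    for chunk in chunks[1:]:
        # Each follow-up call sees the previous answer plus new context.
        answer = llm(
            f"Question: {query}\nExisting answer: {answer}\n"
            f"New context: {chunk}\nRefine the existing answer if needed."
        )
    return answer


calls = []


def fake_llm(prompt):
    # Stand-in for a real LLM: records the prompt and returns a canned reply.
    calls.append(prompt)
    return f"answer v{len(calls)}"


final = create_and_refine(
    "What is a bunnyhug?", ["chunk A", "chunk B", "chunk C"], fake_llm
)
print(final)
print(len(calls))
```

Only the first call uses the question-answering wording; every later call uses the refine wording. Instructions that live only in the query text can therefore get diluted once refining starts.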
+
+So, the refine process seems to be messing with our results! Rather than appending extra instructions to the query text, we can remove them and provide our own custom prompts instead, which Llama Index supports! Let's create those now, using the [default prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py) and [chat specific prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py) as a guide. Using a new file `constants.py`, let's create some new query templates:
+
+```python
+from langchain.chains.prompt_selector import ConditionalPromptSelector, is_chat_model
+from langchain.prompts.chat import (
+    AIMessagePromptTemplate,
+    ChatPromptTemplate,
+    HumanMessagePromptTemplate,
+)
+
+from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
+
+# Text QA templates
+DEFAULT_TEXT_QA_PROMPT_TMPL = (
+    "Context information is below. \n"
+    "---------------------\n"
+    "{context_str}"
+    "\n---------------------\n"
+    "Given the context information answer the following question "
+    "(if you don't know the answer, use the best of your knowledge): {query_str}\n"
+)
+TEXT_QA_TEMPLATE = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
+
+# Refine templates
+DEFAULT_REFINE_PROMPT_TMPL = (
+    "The original question is as follows: {query_str}\n"
+    "We have provided an existing answer: {existing_answer}\n"
+    "We have the opportunity to refine the existing answer "
+    "(only if needed) with some more context below.\n"
+    "------------\n"
+    "{context_msg}\n"
+    "------------\n"
+    "Given the new context and using the best of your knowledge, improve the existing answer. "
+    "If you can't improve the existing answer, just repeat it again."
+)
+DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
+
+CHAT_REFINE_PROMPT_TMPL_MSGS = [
+    HumanMessagePromptTemplate.from_template("{query_str}"),
+    AIMessagePromptTemplate.from_template("{existing_answer}"),
+    HumanMessagePromptTemplate.from_template(
+        "We have the opportunity to refine the above answer "
+        "(only if needed) with some more context below.\n"
+        "------------\n"
+        "{context_msg}\n"
+        "------------\n"
+        "Given the new context and using the best of your knowledge, improve the existing answer. "
+        "If you can't improve the existing answer, just repeat it again."
+    ),
+]
+
+CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
+CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
+
+# refine prompt selector
+DEFAULT_REFINE_PROMPT_SEL_LC = ConditionalPromptSelector(
+    default_prompt=DEFAULT_REFINE_PROMPT.get_langchain_prompt(),
+    conditionals=[(is_chat_model, CHAT_REFINE_PROMPT.get_langchain_prompt())],
+)
+REFINE_TEMPLATE = RefinePrompt(
+    langchain_prompt_selector=DEFAULT_REFINE_PROMPT_SEL_LC
+)
+```
+
+That seems like a lot of code, but it's not too bad! If you looked at the default prompts, you might have noticed that there are default prompts, and prompts specific to chat models. Continuing that trend, we do the same for our custom prompts. Then, using a prompt selector, we can combine both prompts into a single object. If the LLM being used is a chat model (ChatGPT, GPT-4), then the chat prompts are used. Otherwise, use the normal prompt templates.
+
+Another thing to note is that we only defined one QA template. In a chat model, this will be converted to a single "human" message.
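
The selection logic itself is simple, and a plain-Python sketch may help. This version checks the model name against a hypothetical `CHAT_MODEL_NAMES` set (built from the models offered in the setup tab), which is a simplification; the real selector inspects the LLM object rather than its name.

```python
# Chat models offered in the setup tab (assumed list, for illustration only).
CHAT_MODEL_NAMES = {"gpt-3.5-turbo", "gpt-4"}


def select_refine_prompt(llm_name, default_prompt, chat_prompt):
    """Return the chat prompt for chat models, otherwise the default prompt."""
    if llm_name in CHAT_MODEL_NAMES:
        return chat_prompt
    return default_prompt


default_prompt = "Original question: {query_str}\nExisting answer: {existing_answer}"
chat_prompt = ["human: {query_str}", "ai: {existing_answer}", "human: refine..."]

print(select_refine_prompt("text-davinci-003", default_prompt, chat_prompt))
print(select_refine_prompt("gpt-4", default_prompt, chat_prompt))
```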
+
+So, now we can import these prompts into our app and use them during the query.
+
+```python
+from constants import REFINE_TEMPLATE, TEXT_QA_TEMPLATE
+...
+    if "llama_index" in st.session_state:
+        query_text = st.text_input("Ask about a term or definition:")
+        if query_text:
+            query_text = query_text  # Notice we removed the old instructions
+            with st.spinner("Generating answer..."):
+                response = st.session_state["llama_index"].as_query_engine(
+                    similarity_top_k=5, response_mode="compact",
+                    text_qa_template=TEXT_QA_TEMPLATE, refine_template=REFINE_TEMPLATE
+                ).query(query_text)
+            st.markdown(str(response))
+...
+```
+
+If you experiment a bit more with queries, hopefully you notice that the responses follow our instructions a little better now!
+
+## Improvement #3 - Image Support
+
+Llama Index also supports images! Using Llama Index, we can upload images of documents (papers, letters, etc.), and Llama Index handles extracting the text. We can leverage this to also allow users to upload images of their documents and extract terms and definitions from them.
+
+If you get an import error about PIL, install it using `pip install Pillow` first.
+
+```python
+from PIL import Image
+from llama_index import SimpleDirectoryReader
+from llama_index.readers.file.base import DEFAULT_FILE_EXTRACTOR, ImageParser
+
+@st.cache_resource
+def get_file_extractor():
+    image_parser = ImageParser(keep_image=True, parse_text=True)
+    file_extractor = DEFAULT_FILE_EXTRACTOR
+    file_extractor.update(
+        {
+            ".jpg": image_parser,
+            ".png": image_parser,
+            ".jpeg": image_parser,
+        }
+    )
+
+    return file_extractor
+
+file_extractor = get_file_extractor()
+...
+with upload_tab:
+    st.subheader("Extract and Query Definitions")
+    if st.button("Initialize Index and Reset Terms", key="init_index_1"):
+        st.session_state["llama_index"] = initialize_index(
+            llm_name, model_temperature, api_key
+        )
+        st.session_state["all_terms"] = DEFAULT_TERMS
+
+    if "llama_index" in st.session_state:
+        st.markdown(
+            "Either upload an image/screenshot of a document, or enter the text manually."
+        )
+        uploaded_file = st.file_uploader(
+            "Upload an image/screenshot of a document:", type=["png", "jpg", "jpeg"]
+        )
+        document_text = st.text_area("Or enter raw text")
+        if st.button("Extract Terms and Definitions") and (
+            uploaded_file or document_text
+        ):
+            st.session_state["terms"] = {}
+            terms_docs = {}
+            with st.spinner("Extracting (images may be slow)..."):
+                if document_text:
+                    terms_docs.update(
+                        extract_terms(
+                            [Document(text=document_text)],
+                            term_extract_str,
+                            llm_name,
+                            model_temperature,
+                            api_key,
+                        )
+                    )
+                if uploaded_file:
+                    Image.open(uploaded_file).convert("RGB").save("temp.png")
+                    img_reader = SimpleDirectoryReader(
+                        input_files=["temp.png"], file_extractor=file_extractor
+                    )
+                    img_docs = img_reader.load_data()
+                    os.remove("temp.png")
+                    terms_docs.update(
+                        extract_terms(
+                            img_docs,
+                            term_extract_str,
+                            llm_name,
+                            model_temperature,
+                            api_key,
+                        )
+                    )
+            st.session_state["terms"].update(terms_docs)
+
+        if "terms" in st.session_state and st.session_state["terms"]:
+            st.markdown("Extracted terms")
+            st.json(st.session_state["terms"])
+
+            if st.button("Insert terms?"):
+                with st.spinner("Inserting terms"):
+                    insert_terms(st.session_state["terms"])
+                st.session_state["all_terms"].update(st.session_state["terms"])
+                st.session_state["terms"] = {}
+                st.experimental_rerun()
+```
+
+Here, we added the option to upload a file using Streamlit. The image is opened and saved to disk (this seems hacky, but it keeps things simple). We then pass the image path to the reader, extract the documents/text, and remove the temp image file.
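+
+If the fixed `temp.png` filename is a concern (for example, two sessions uploading at the same time), one alternative is a uniquely named temporary file. The following is a minimal sketch using only the standard library. `save_upload_to_temp` is a hypothetical helper, not part of Streamlit or Llama Index, and it assumes the upload is available as raw bytes (e.g. via `uploaded_file.getvalue()`).
+
+```python
+import os
+import tempfile
+from contextlib import contextmanager
+
+
+@contextmanager
+def save_upload_to_temp(data: bytes, suffix: str = ".png"):
+    """Write bytes to a uniquely named temp file, yield its path, then delete it."""
+    # delete=False so the file can be reopened by path while the caller uses it
+    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
+    try:
+        tmp.write(data)
+        tmp.close()
+        yield tmp.name
+    finally:
+        # closing twice is harmless; remove the file even if the caller raised
+        tmp.close()
+        if os.path.exists(tmp.name):
+            os.remove(tmp.name)
+
+
+# hypothetical usage: the yielded path could be passed to input_files=[path]
+with save_upload_to_temp(b"fake image bytes") as path:
+    print(os.path.exists(path))
+```
+
+The context manager keeps the cleanup next to the write, so the temp file is removed even when extraction fails partway through.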
+
+Now that we have the documents, we can call `extract_terms()` the same as before.
+
+## Conclusion/TLDR
+
+In this tutorial, we covered a ton of information, while solving some common issues and problems along the way:
+
+- Using different indexes for different use cases (List vs. Vector index)
+- Storing global state values with Streamlit's `session_state` concept
+- Customizing internal prompts with Llama Index
+- Reading text from images with Llama Index
+
+The final version of this tutorial can be found [here](https://github.com/logan-markewich/llama_index_starter_pack) and a live hosted demo is available on [Huggingface Spaces](https://huggingface.co/spaces/llamaindex/llama_index_term_definition_demo).
@@ -0,0 +1,268 @@
+# A Guide to Creating a Unified Query Framework over your Indexes
+
+LlamaIndex offers a variety of different [use cases](/end_to_end_tutorials/use_cases.md).
+
+For simple queries, we may want to use a single index data structure, such as a `VectorStoreIndex` for semantic search, or `ListIndex` for summarization.
+
+For more complex queries, we may want to use a composable graph.
+
+But how do we integrate indexes and graphs into our LLM application? Different indexes and graphs may be better suited for different types of queries that you may want to run.
+
+In this guide, we show how you can unify the diverse use cases of different index/graph structures under a **single** query framework.
+
+### Setup
+
+In this example, we will analyze Wikipedia articles of different cities: Toronto, Seattle, Chicago, Boston, and Houston.
+
+The code snippet below downloads the relevant data into files.
+
+```python
+
+from pathlib import Path
+import requests
+
+wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
+
+for title in wiki_titles:
+    response = requests.get(
+        'https://en.wikipedia.org/w/api.php',
+        params={
+            'action': 'query',
+            'format': 'json',
+            'titles': title,
+            'prop': 'extracts',
+            # 'exintro': True,
+            'explaintext': True,
+        }
+    ).json()
+    page = next(iter(response['query']['pages'].values()))
+    wiki_text = page['extract']
+
+    # create the data directory on first use
+    data_path = Path('data')
+    data_path.mkdir(exist_ok=True)
+
+    with open(data_path / f"{title}.txt", 'w') as fp:
+        fp.write(wiki_text)
+
+```
+
+The next snippet loads all files into Document objects.
+
+```python
+from llama_index import SimpleDirectoryReader
+
+# Load all wiki documents
+city_docs = {}
+for wiki_title in wiki_titles:
+    city_docs[wiki_title] = SimpleDirectoryReader(input_files=[f"data/{wiki_title}.txt"]).load_data()
+
+```
+
+### Defining the Set of Indexes
+
+We will now define a set of indexes and graphs over our data. You can think of each index/graph as a lightweight structure
+that solves a distinct use case.
+
+We will first define a vector index over the documents of each city.
+
+```python
+from llama_index import LLMPredictor, VectorStoreIndex, ServiceContext, StorageContext
+from langchain.llms.openai import OpenAIChat
+
+# set service context
+llm_predictor_gpt4 = LLMPredictor(llm=OpenAIChat(temperature=0, model_name="gpt-4"))
+service_context = ServiceContext.from_defaults(
+    llm_predictor=llm_predictor_gpt4, chunk_size=1024
+)
+
+# Build city document index
+vector_indices = {}
+for wiki_title in wiki_titles:
+    storage_context = StorageContext.from_defaults()
+    # build vector index
+    vector_indices[wiki_title] = VectorStoreIndex.from_documents(
+        city_docs[wiki_title],
+        service_context=service_context,
+        storage_context=storage_context,
+    )
+    # set id for vector index
+    vector_indices[wiki_title].index_struct.index_id = wiki_title
+    # persist to disk
+    storage_context.persist(persist_dir=f'./storage/{wiki_title}')
+```
+
+Querying a vector index lets us easily perform semantic search over a given city's documents.
+
+```python
+response = vector_indices["Toronto"].as_query_engine().query("What are the sports teams in Toronto?")
+print(str(response))
+
+```
+
+Example response:
+
+```text
+The sports teams in Toronto are the Toronto Maple Leafs (NHL), Toronto Blue Jays (MLB), Toronto Raptors (NBA), Toronto Argonauts (CFL), Toronto FC (MLS), Toronto Rock (NLL), Toronto Wolfpack (RFL), and Toronto Rush (NARL).
+```
+
+### Defining a Graph for Compare/Contrast Queries
+
+We will now define a composed graph in order to run **compare/contrast** queries (see [use cases doc](/use_cases/queries.md)).
+This graph contains a keyword table composed on top of existing vector indexes.
+
+To do this, we first want to set the "summary text" for each vector index.
+
+```python
+index_summaries = {}
+for wiki_title in wiki_titles:
+    # set summary text for city
+    index_summaries[wiki_title] = (
+        f"This content contains Wikipedia articles about {wiki_title}. "
+        f"Use this index if you need to lookup specific facts about {wiki_title}.\n"
+        "Do not use this index if you want to analyze multiple cities."
+    )
+```
+
+Next, we compose a keyword table on top of these vector indexes, with these indexes and summaries, in order to build the graph.
+
+```python
+from llama_index import SimpleKeywordTableIndex
+from llama_index.indices.composability import ComposableGraph
+
+graph = ComposableGraph.from_indices(
+    SimpleKeywordTableIndex,
+    [index for _, index in vector_indices.items()],
+    [summary for _, summary in index_summaries.items()],
+    max_keywords_per_chunk=50
+)
+
+# get root index
+root_index = graph.get_index(graph.index_struct.root_id, SimpleKeywordTableIndex)
+# set id of root index
+root_index.set_index_id("compare_contrast")
+root_summary = (
+    "This index contains Wikipedia articles about multiple cities. "
+    "Use this index if you want to compare multiple cities. "
+)
+
+```
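+
+Conceptually, the keyword table at the root maps keywords to the child indexes whose summaries mention them, so a query that names several cities can reach several children. The toy sketch below illustrates only that idea in plain Python. It is not how `SimpleKeywordTableIndex` is implemented, and `tokenize`, `build_keyword_table`, and `lookup` are hypothetical names.
+
+```python
+import re
+from collections import defaultdict
+
+
+def tokenize(text):
+    """Lowercase the text and return its set of alphabetic tokens."""
+    return set(re.findall(r"[a-z]+", text.lower()))
+
+
+def build_keyword_table(summaries):
+    """Map each keyword found in a summary to the ids of the indexes mentioning it."""
+    table = defaultdict(set)
+    for index_id, summary in summaries.items():
+        for keyword in tokenize(summary):
+            table[keyword].add(index_id)
+    return table
+
+
+def lookup(table, query):
+    """Return the sorted ids of every child index sharing a keyword with the query."""
+    matches = set()
+    for keyword in tokenize(query):
+        matches |= table.get(keyword, set())
+    return sorted(matches)
+
+
+# shortened, made-up summaries for illustration only
+summaries = {
+    "Houston": "Wikipedia articles about Houston",
+    "Boston": "Wikipedia articles about Boston",
+    "Toronto": "Wikipedia articles about Toronto",
+}
+table = build_keyword_table(summaries)
+print(lookup(table, "Compare the culture of Houston and Boston"))
+# → ['Boston', 'Houston']
+```
+
+A real keyword extractor would also need to filter out common words shared by every summary, which this sketch omits.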
+
+Querying this graph (with a query transform module) allows us to easily compare/contrast different cities.
+An example is shown below.
+
+```python
+# define decompose_transform
+from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
+decompose_transform = DecomposeQueryTransform(
+    llm_predictor_gpt4, verbose=True
+)
+
+# define custom query engines
+from llama_index.query_engine.transform_query_engine import TransformQueryEngine
+custom_query_engines = {}
+for index in vector_indices.values():
+    query_engine = index.as_query_engine(service_context=service_context)
+    query_engine = TransformQueryEngine(
+        query_engine,
+        query_transform=decompose_transform,
+        transform_extra_info={'index_summary': index.index_struct.summary},
+    )
+    custom_query_engines[index.index_id] = query_engine
+custom_query_engines[graph.root_id] = graph.root_index.as_query_engine(
+    retriever_mode='simple',
+    response_mode='tree_summarize',
+    service_context=service_context,
+)
+
+# define query engine for the graph (reused later as a router tool)
+graph_query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
+
+# query the graph
+query_str = (
+    "Compare and contrast the arts and culture of Houston and Boston. "
+)
+response_chatgpt = graph_query_engine.query(query_str)
+```
+
+### Defining the Unified Query Interface
+
+Now that we've defined the set of indexes/graphs, we want to build an **outer abstraction** layer that provides a unified query interface
+to our data structures. This means that during query-time, we can query this outer abstraction layer and trust that the right index/graph
+will be used for the job.
+
+There are a few ways to do this, both within our framework as well as outside of it!
+
+- Build a **router query engine** on top of your existing indexes/graphs
+- Define each index/graph as a Tool within an agent framework (e.g. LangChain).
+
+For the purposes of this tutorial, we follow the former approach. If you want to see how the latter approach works,
+take a look at [our example tutorial here](/guides/tutorials/building_a_chatbot.md).
+
+Let's take a look at an example of building a router query engine to automatically "route" any query to the set of indexes/graphs that you have defined under the hood.
+
+First, we define the query engines for the set of indexes/graphs that we want to route our query to. We also give each a description (about what data it holds and what it's useful for) to help the router choose between them depending on the specific query.
+
+```python
+from llama_index.tools.query_engine import QueryEngineTool
+
+query_engine_tools = []
+
+# add vector index tools
+for wiki_title in wiki_titles:
+    index = vector_indices[wiki_title]
+    summary = index_summaries[wiki_title]
+
+    query_engine = index.as_query_engine(service_context=service_context)
+    vector_tool = QueryEngineTool.from_defaults(query_engine, description=summary)
+    query_engine_tools.append(vector_tool)
+
+
+# add graph tool
+graph_description = (
+    "This tool contains Wikipedia articles about multiple cities. "
+    "Use this tool if you want to compare multiple cities. "
+)
+graph_tool = QueryEngineTool.from_defaults(graph_query_engine, description=graph_description)
+query_engine_tools.append(graph_tool)
+```
+
+Now, we can define the routing logic and overall router query engine.
+Here, we use the `LLMSingleSelector`, which uses an LLM to choose an underlying query engine to route the query to.
+
+```python
+from llama_index.query_engine.router_query_engine import RouterQueryEngine
+from llama_index.selectors.llm_selectors import LLMSingleSelector
+
+
+router_query_engine = RouterQueryEngine(
+    selector=LLMSingleSelector.from_defaults(service_context=service_context),
+    query_engine_tools=query_engine_tools
+)
+```
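+
+To make the routing step concrete, here is a minimal sketch of the single-selection pattern in plain Python. It assumes a simple word-overlap score as a stand-in for the LLM call that the selector makes. `Tool`, `select_tool`, and `route` are hypothetical names, not LlamaIndex APIs, and the lambdas stand in for real query engines.
+
+```python
+import re
+from dataclasses import dataclass
+from typing import Callable, List
+
+
+@dataclass
+class Tool:
+    """A query function paired with a description the selector can read."""
+    description: str
+    run: Callable[[str], str]
+
+
+def words(text: str) -> set:
+    """Lowercase the text and return its set of alphabetic tokens."""
+    return set(re.findall(r"[a-z]+", text.lower()))
+
+
+def select_tool(tools: List[Tool], query: str) -> Tool:
+    """Pick the one tool whose description shares the most words with the query."""
+    return max(tools, key=lambda tool: len(words(tool.description) & words(query)))
+
+
+def route(tools: List[Tool], query: str) -> str:
+    """Send the query to the selected tool and return its answer."""
+    return select_tool(tools, query).run(query)
+
+
+# placeholder engines that only report which one was chosen
+tools = [
+    Tool("Facts about Toronto", lambda q: "toronto engine"),
+    Tool("Facts about Houston", lambda q: "houston engine"),
+    Tool("Compare and contrast multiple cities", lambda q: "graph engine"),
+]
+print(route(tools, "What are the sports teams in Toronto?"))
+print(route(tools, "Compare and contrast Houston and Boston"))
+```
+
+The descriptions do all the work here, which is why each tool's description should say both what data it holds and when to use it.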
+
+### Querying our Unified Interface
+
+The advantage of a unified query interface is that it can now handle different types of queries.
+
+It can now handle queries about specific cities (by routing to the specific city vector index), and also compare/contrast different cities.
+
+Let's take a look at a few examples!
+
+**Asking a Compare/Contrast Question**
+
+```python
+# ask a compare/contrast question
+response = router_query_engine.query(
+    "Compare and contrast the arts and culture of Houston and Boston.",
+)
+print(str(response))
+```
+
+**Asking Questions about specific Cities**
+
+```python
+
+response = router_query_engine.query("What are the sports teams in Toronto?")
+print(str(response))
+
+```
+
+This "outer" abstraction is able to handle different queries by routing to the right underlying abstractions.