This commit is contained in:
Logan Markewich
2023-07-13 16:14:58 -06:00
parent e6365532e6
commit 5074558921
126 changed files with 11825 additions and 0 deletions
+339
@@ -0,0 +1,339 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# LlamaIndex Bottoms-Up Development - LLMs and Prompts\n",
"This notebook walks through testing an LLM using the primary prompt templates used in llama-index."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"import os\n",
"\n",
"openai.api_key = \"YOUR_API_KEY\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"In this section, we load a test document, create an LLM, and copy prompts from llama-index to test with."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's load a quick document to test with. Right now, we will just load it as plain text, but we can do other operations later!"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"with open(\"./getting_started/starter_example.md\", \"r\") as f:\n",
" text = f.read()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we create our LLM!"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.llms import OpenAI\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"LlamaIndex uses some simple templates under the hood for answering queries -- mainly a `text_qa_template` for obtaining initial answers, and a `refine_template` for refining an existing answer when all the text does not fit into one LLM call.\n",
"\n",
"Let's copy the default templates, and test out our LLM with a few questions."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from llama_index import Prompt\n",
"\n",
"text_qa_template = Prompt(\n",
" \"Context information is below.\\n\"\n",
" \"---------------------\\n\"\n",
" \"{context_str}\\n\"\n",
" \"---------------------\\n\"\n",
" \"Given the context information and not prior knowledge, \"\n",
" \"answer the question: {query_str}\\n\"\n",
")\n",
"\n",
"refine_template = Prompt(\n",
" \"We have the opportunity to refine the original answer \"\n",
" \"(only if needed) with some more context below.\\n\"\n",
" \"------------\\n\"\n",
" \"{context_msg}\\n\"\n",
" \"------------\\n\"\n",
" \"Given the new context, refine the original answer to better \"\n",
" \"answer the question: {query_str}. \"\n",
" \"If the context isn't useful, output the original answer again.\\n\"\n",
" \"Original Answer: {existing_answer}\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, lets test a few questions!\n",
"\n",
"## Text QA Template Testing"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"To install LlamaIndex, you can follow the installation steps provided in the \"installation\" guide.\n"
]
}
],
"source": [
"question = \"How can I install llama-index?\"\n",
"prompt = text_qa_template.format(context_str=text, query_str=question)\n",
"response = llm.complete(prompt)\n",
"print(response.text)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"To create an index using LlamaIndex, you need to follow these steps:\n",
"\n",
"1. Download the LlamaIndex repository by cloning it from GitHub.\n",
"2. Navigate to the `examples/paul_graham_essay` folder in the cloned repository.\n",
"3. Create a new Python file and import the necessary modules: `VectorStoreIndex` and `SimpleDirectoryReader`.\n",
"4. Load the documents from the `data` folder using `SimpleDirectoryReader('data').load_data()`.\n",
"5. Build the index using `VectorStoreIndex.from_documents(documents)`.\n",
"6. To persist the index to disk, use `index.storage_context.persist()`.\n",
"7. To reload the index from disk, use the `StorageContext` and `load_index_from_storage` functions.\n",
"\n",
"Note: This answer assumes that you have already installed LlamaIndex and have the necessary dependencies.\n"
]
}
],
"source": [
"question = \"How do I create an index?\"\n",
"prompt = text_qa_template.format(context_str=text, query_str=question)\n",
"response = llm.complete(prompt)\n",
"print(response.text)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"```python\n",
"from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
"\n",
"documents = SimpleDirectoryReader('data').load_data()\n",
"index = VectorStoreIndex.from_documents(documents)\n",
"```"
]
}
],
"source": [
"question = \"How do I create an index? Write your answer using only code.\"\n",
"prompt = text_qa_template.format(context_str=text, query_str=question)\n",
"response_gen = llm.stream_complete(prompt)\n",
"for response in response_gen:\n",
" print(response.delta, end=\"\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Refine Template Testing"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"To create an index using LlamaIndex, follow these steps:\n",
"\n",
"```python\n",
"from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
"\n",
"# Load the documents from the 'data' folder\n",
"documents = SimpleDirectoryReader('data').load_data()\n",
"\n",
"# Build the index\n",
"index = VectorStoreIndex.from_documents(documents)\n",
"\n",
"# Persist the index to disk\n",
"index.storage_context.persist()\n",
"\n",
"# Reload the index from disk\n",
"from llama_index import StorageContext, load_index_from_storage\n",
"\n",
"storage_context = StorageContext.from_defaults(persist_dir=\"./storage\")\n",
"index = load_index_from_storage(storage_context)\n",
"```\n",
"\n",
"Make sure you have installed LlamaIndex and have the necessary dependencies.\n"
]
}
],
"source": [
"question = \"How do I create an index? Write your answer using only code.\"\n",
"existing_answer = \"\"\"To create an index using LlamaIndex, you need to follow these steps:\n",
"\n",
"1. Download the LlamaIndex repository by cloning it from GitHub.\n",
"2. Navigate to the `examples/paul_graham_essay` folder in the cloned repository.\n",
"3. Create a new Python file and import the necessary modules: `VectorStoreIndex` and `SimpleDirectoryReader`.\n",
"4. Load the documents from the `data` folder using `SimpleDirectoryReader('data').load_data()`.\n",
"5. Build the index using `VectorStoreIndex.from_documents(documents)`.\n",
"6. To persist the index to disk, use `index.storage_context.persist()`.\n",
"7. To reload the index from disk, use the `StorageContext` and `load_index_from_storage` functions.\n",
"\n",
"Note: This answer assumes that you have already installed LlamaIndex and have the necessary dependencies.\"\"\"\n",
"prompt = refine_template.format(context_msg=text, query_str=question, existing_answer=existing_answer)\n",
"response = llm.complete(prompt)\n",
"print(response.text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Chat Example\n",
"The LLM also has a `chat` method that takes in a list of messages, to simulate a chat session. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"assistant: To create an index, you will need to follow these general steps:\n",
"\n",
"1. Determine the purpose and scope of your index: Decide what information you want to include in your index and what it will be used for. This will help you determine the structure and content of your index.\n",
"\n",
"2. Identify the items to be indexed: Determine the specific items or topics that you want to include in your index. For example, if you are creating an index for a book, you might want to index chapters, sections, and important concepts.\n",
"\n",
"3. Create a list of index terms: Identify the key terms or phrases that will be used to reference each item in your index. These terms should be concise and descriptive.\n",
"\n",
"4. Organize the index terms: Determine the hierarchical structure of your index. You can use headings, subheadings, and indentation to create a logical and organized structure.\n",
"\n",
"5. Assign page numbers or locations: For each index term, identify the page number or location where the item can be found. This will help users quickly locate the information they are looking for.\n",
"\n",
"6. Format the index: Use a consistent and clear formatting style for your index. You can use software tools like Microsoft Word or Adobe InDesign to create a professional-looking index.\n",
"\n",
"7. Review and revise: Once you have created your index, review it carefully to ensure accuracy and completeness. Make any necessary revisions or updates before finalizing your index.\n",
"\n",
"Remember, creating an index can be a time-consuming process, so it's important to plan and allocate enough time to complete it accurately.\n"
]
}
],
"source": [
"from llama_index.llms import ChatMessage\n",
"\n",
"chat_history = [\n",
" ChatMessage(role=\"system\", content=\"You are a helpful QA chatbot that can answer questions about llama-index.\"),\n",
" ChatMessage(role=\"user\", content=\"How do I create an index?\"),\n",
"]\n",
"\n",
"response = llm.chat(chat_history)\n",
"print(response.message)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this notebook, we covered the low-level LLM API, and tested out some basic prompts with out documentation data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
+54
@@ -0,0 +1,54 @@
# Documentation Guide
## A guide for docs contributors
The `docs` directory contains the Sphinx source text for the LlamaIndex docs. Visit
https://gpt-index.readthedocs.io/ to read the full documentation.
This guide is for anyone interested in running the LlamaIndex documentation locally,
making changes to it, and contributing. LlamaIndex is made by the thriving community
behind it, and you're always welcome to contribute to the project and the
documentation.
## Build Docs
If you haven't already, clone the LlamaIndex GitHub repo to a local directory:
```bash
git clone https://github.com/jerryjliu/llama_index.git && cd llama_index
```
Install all dependencies required for building docs (mainly `sphinx` and its extensions):
```bash
pip install -r docs/requirements.txt
```
Build the sphinx docs:
```bash
cd docs
make html
```
The docs HTML files are now generated under the `docs/_build/html` directory. You can preview
them locally with the following command:
```bash
python -m http.server 8000 -d _build/html
```
Then open your browser at http://localhost:8000/ to view the generated docs.
## Watch Docs
We recommend using sphinx-autobuild during development. It provides a live-reloading
server that rebuilds the documentation and refreshes any open pages automatically when
changes are saved. This gives a much shorter feedback loop, which helps when
writing documentation.
Run the following command from the LlamaIndex project's root directory:
```bash
make watch-docs
```
+20
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+103
@@ -0,0 +1,103 @@
# App Showcase
Here is a sample of some of the incredible applications and tools built on top of LlamaIndex!
###### Meru - Dense Data Retrieval API
Hosted API service. Includes a "Dense Data Retrieval" API built on top of LlamaIndex where users can upload their documents and query them.
[[Website]](https://www.usemeru.com/densedataretrieval)
###### Algovera
Build AI workflows using building blocks. Many workflows built on top of LlamaIndex.
[[Website]](https://app.algovera.ai/workflows).
###### ChatGPT LlamaIndex
Interface that allows users to upload long docs and chat with the bot.
[[Tweet thread]](https://twitter.com/s_jobs6/status/1618346125697875968?s=20&t=RJhQu2mD0-zZNGfq65xodA)
###### AgentHQ
A web tool to build agents, interacting with LlamaIndex data structures.
[[Website]](https://app.agent-hq.io/)
###### PapersGPT
Feed any of the following content into GPT to give it deep customized knowledge:
- Scientific Papers
- Substack Articles
- Podcasts
- Github Repos
and more.
[[Tweet thread]](https://twitter.com/thejessezhang/status/1615390646763945991?s=20&t=eHvhmIaaaoYFyPSzDRNGtA)
[[Website]](https://jessezhang.org/llmdemo)
###### VideoQues + DocsQues
**VideoQues**: A tool that answers your queries on YouTube videos.
[[LinkedIn post here]](https://www.linkedin.com/posts/ravidesetty_ai-ml-dl-activity-7020599110953050112-EJA_/?utm_source=share&utm_medium=member_desktop).
**DocsQues**: A tool that answers your questions on longer documents (including .pdfs!)
[[LinkedIn post here]](https://www.linkedin.com/posts/ravidesetty_artificialintelligence-machinelearning-recruiters-activity-7016972785293946880-rhKC?utm_source=share&utm_medium=member_desktop).
###### PaperBrain
A platform to access/understand research papers.
[[Tweet thread]](https://twitter.com/mdarshad1000/status/1619824637898264578?s=20&t=eHvhmIaaaoYFyPSzDRNGtA).
###### CACTUS
Contextual search on top of LinkedIn search results.
[[LinkedIn post here]](https://www.linkedin.com/posts/mathewteoh_chromeextension-chatgpt-python-activity-7019362515566403584-ryqW?utm_source=share&utm_medium=member_desktop).
###### Personal Note Chatbot
A chatbot that can answer questions over a directory of Obsidian notes.
[[Tweet thread]](https://twitter.com/Sarah_A_Bentley/status/1611069576099336207?s=20&t=IjPLK3msACQjEBYxJJxj4w).
###### RHOBH AMA
Ask questions about the Real Housewives of Beverly Hills.
[[Tweet thread]](https://twitter.com/YourBuddyConner/status/1616504644439789568?s=20&t=bCHa3im7mjoIXLuKo5PttQ)
[[Website]](https://realhousewivesai.com/)
###### Mynd
A journaling app that uses AI to uncover insights and patterns over time.
[[Website]](https://mynd.so)
###### CoFounder
The First AI Co-Founder for Your Start-up 🙌
[CoFounder](https://co-founder.ai?utm_source=llama-index&utm_medium=gallary&utm_campaign=alpha) is a platform to revolutionize the start-up ecosystem by providing founders with unparalleled tools, resources, and support. We are changing how founders build their companies from 0-1—productizing the accelerator/incubator programs using AI.
Current features:
* AI Investor Matching and Introduction and Tracking
* AI Pitch Deck creation
* Real-time Pitch Deck practice/feedback
* Automatic Competitive Analysis / Watchlist
* More coming soon...
[[Website]](https://co-founder.ai?utm_source=llama-index&utm_medium=gallary&utm_campaign=alpha)
###### Al-X by OpenExO
Your Digital Transformation Co-Pilot
[[Website]](https://chat.openexo.com)
###### AnySummary
Summarize any document, audio or video with AI
[[Website]](https://anysummary.app)
###### Blackmaria
Python package for web scraping in natural language.
[[Tweet thread]](https://twitter.com/obonigwe1/status/1640080422661943298?t=aftqisb4vaudwrgwah_1oa&s=19)
[[Github]](https://github.com/Smyja/blackmaria)
+16
@@ -0,0 +1,16 @@
# Integrations
LlamaIndex has a number of community integrations, from vector stores, to prompt trackers, tracers, and more!
```{toctree}
---
maxdepth: 1
---
integrations/graphsignal.md
integrations/guidance.md
integrations/trulens.md
integrations/chatgpt_plugins.md
integrations/using_with_langchain.md
integrations/graph_stores.md
integrations/vector_stores.md
```
@@ -0,0 +1,129 @@
# ChatGPT Plugin Integrations
**NOTE**: This is a work-in-progress, stay tuned for more exciting updates on this front!
## ChatGPT Retrieval Plugin Integrations
The [OpenAI ChatGPT Retrieval Plugin](https://github.com/openai/chatgpt-retrieval-plugin)
offers a centralized API specification for any document storage system to interact
with ChatGPT. Since this can be deployed on any service, this means that more and more
document retrieval services will implement this spec; this allows them to not only
interact with ChatGPT, but also interact with any LLM toolkit that may use
a retrieval service.
LlamaIndex provides a variety of integrations with the ChatGPT Retrieval Plugin.
### Loading Data from LlamaHub into the ChatGPT Retrieval Plugin
The ChatGPT Retrieval Plugin defines an `/upsert` endpoint for users to load
documents. This offers a natural integration point with LlamaHub, which offers
over 65 data loaders from various APIs and document formats.
Here is a sample code snippet showing how to load a document from LlamaHub
into the JSON format that `/upsert` expects:
```python
from llama_index import download_loader, Document
from typing import Dict, List
import json
# download loader, load documents
SimpleWebPageReader = download_loader("SimpleWebPageReader")
loader = SimpleWebPageReader(html_to_text=True)
url = "http://www.paulgraham.com/worked.html"
documents = loader.load_data(urls=[url])
# Convert LlamaIndex Documents to JSON format
def dump_docs_to_json(documents: List[Document], out_path: str) -> None:
    """Convert LlamaIndex Documents to JSON format and save it."""
    result_json = []
    for doc in documents:
        cur_dict = {
            "text": doc.get_text(),
            "id": doc.get_doc_id(),
            # NOTE: feel free to customize the other fields as you wish
            # fields taken from https://github.com/openai/chatgpt-retrieval-plugin/tree/main/scripts/process_json#usage
            # "source": ...,
            # "source_id": ...,
            # "url": url,
            # "created_at": ...,
            # "author": "Paul Graham",
        }
        result_json.append(cur_dict)
    # write inside a context manager so the file is flushed and closed
    with open(out_path, "w") as f:
        json.dump(result_json, f)
```
For more details, check out the [full example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatgpt_plugin/ChatGPT_Retrieval_Plugin_Upload.ipynb).
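
As a rough illustration of the file this produces, the sketch below builds the same list-of-dicts shape from a stand-in document class and round-trips it through JSON. `FakeDoc` and `docs_to_upsert_records` are hypothetical names used only here; they are not part of LlamaIndex, and only the `text` and `id` fields are taken from the snippet above.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LlamaIndex Document (not the real class)."""

    text: str
    doc_id: str


def docs_to_upsert_records(docs: List[FakeDoc]) -> List[Dict[str, str]]:
    """Build one record per document with the two required fields."""
    return [{"text": d.text, "id": d.doc_id} for d in docs]


docs = [FakeDoc(text="An essay about early programming work.", doc_id="doc-1")]
records = docs_to_upsert_records(docs)

# Write the records to a temporary file, then read them back.
out_path = os.path.join(tempfile.mkdtemp(), "docs.json")
with open(out_path, "w") as f:
    json.dump(records, f)
with open(out_path) as f:
    loaded = json.load(f)

print(loaded)
```

The optional fields commented out in the original snippet (`source`, `url`, `author`, and so on) would simply become extra keys in each record.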
### ChatGPT Retrieval Plugin Data Loader
The ChatGPT Retrieval Plugin data loader [can be accessed on LlamaHub](https://llamahub.ai/l/chatgpt_plugin).
It allows you to easily load data from any docstore that implements the plugin API, into a LlamaIndex data structure.
Example code:
```python
from llama_index.readers import ChatGPTRetrievalPluginReader
import os
# load documents
bearer_token = os.getenv("BEARER_TOKEN")
reader = ChatGPTRetrievalPluginReader(
    endpoint_url="http://localhost:8000",
    bearer_token=bearer_token
)
documents = reader.load_data("What did the author do growing up?")
# build and query index
from llama_index import ListIndex
index = ListIndex.from_documents(documents)
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine(
    response_mode="compact"
)
response = query_engine.query(
    "Summarize the retrieved content and describe what the author did growing up",
)
```
For more details, check out the [full example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatgpt_plugin/ChatGPTRetrievalPluginReaderDemo.ipynb).
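
To give a feel for what a reader like this sends over the wire, here is a minimal sketch that only builds (and never sends) an HTTP request. It assumes the `/query` request shape described in the plugin's own repository (a `queries` list whose items carry `query` and `top_k`) and bearer-token authentication; neither detail is spelled out in this guide, and `build_query_request` is a hypothetical helper, not the reader's actual implementation.

```python
import json
import urllib.request


def build_query_request(
    endpoint_url: str, bearer_token: str, query: str, top_k: int = 3
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an assumed /query endpoint."""
    # Assumed payload shape: {"queries": [{"query": ..., "top_k": ...}]}
    payload = {"queries": [{"query": query, "top_k": top_k}]}
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    "http://localhost:8000", "my-token", "What did the author do growing up?"
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would perform the call against a running plugin server; that step is left out so the sketch runs offline.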
### ChatGPT Retrieval Plugin Index
The ChatGPT Retrieval Plugin Index allows you to easily build a vector index over any documents, with storage backed by a document store implementing the
ChatGPT endpoint.
Note: this index is a vector index, allowing top-k retrieval.
Example code:
```python
from llama_index.indices.vector_store import ChatGPTRetrievalPluginIndex
from llama_index import SimpleDirectoryReader
import os
# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
# build index
bearer_token = os.getenv("BEARER_TOKEN")
# initialize without metadata filter
index = ChatGPTRetrievalPluginIndex(
    documents,
    endpoint_url="http://localhost:8000",
    bearer_token=bearer_token,
)
# query index
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)
response = query_engine.query("What did the author do growing up?")
```
For more details, check out the [full example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatgpt_plugin/ChatGPTRetrievalPluginIndexDemo.ipynb).
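
The top-k retrieval mentioned in the note above can be sketched in plain Python. This is an illustrative brute-force version using cosine similarity over toy 2-D vectors; the function names are hypothetical, and a real vector store would typically use learned embeddings and an optimized index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([1.0, 0.1], doc_vecs, k=2)
for index_position, score in results:
    print(index_position, round(score, 3))
```

In the example code above, `similarity_top_k=3` plays the role of `k`.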
@@ -0,0 +1,15 @@
# Using Graph Stores
## `NebulaGraphStore`
We support a `NebulaGraphStore` integration, for persisting graphs directly in Nebula! Furthermore, you can generate cypher queries and return natural language responses for your Nebula graphs using the `KnowledgeGraphQueryEngine`.
See the associated guides below:
```{toctree}
---
maxdepth: 1
---
Nebula Graph Store </examples/index_structs/knowledge_graph/NebulaGraphKGIndexDemo.ipynb>
Knowledge Graph Query Engine </examples/query_engine/knowledge_graph_query_engine.ipynb>
```
@@ -0,0 +1,46 @@
# Tracing with Graphsignal
[Graphsignal](https://graphsignal.com/) provides observability for AI agents and LLM-powered applications. It helps developers ensure AI applications run as expected and users have the best experience.
Graphsignal **automatically** traces and monitors LlamaIndex. Traces and metrics provide execution details for query, retrieval, and index operations. These insights include **prompts**, **completions**, **embedding statistics**, **retrieved nodes**, **parameters**, **latency**, and **exceptions**.
When OpenAI APIs are used, Graphsignal provides additional insights such as **token counts** and **costs** per deployment, model or any context.
### Installation and Setup
Adding the [Graphsignal tracer](https://github.com/graphsignal/graphsignal-python) takes two steps. Install it, then configure it:
```sh
pip install graphsignal
```
```python
import graphsignal
# Provide an API key directly or via GRAPHSIGNAL_API_KEY environment variable
graphsignal.configure(api_key='my-api-key', deployment='my-llama-index-app-prod')
```
You can get an API key [here](https://app.graphsignal.com/).
See the [Quick Start guide](https://graphsignal.com/docs/guides/quick-start/), [Integration guide](https://graphsignal.com/docs/integrations/llama-index/), and an [example app](https://github.com/graphsignal/examples/blob/main/llama-index-app/main.py) for more information.
### Tracing Other Functions
To additionally trace any function or code, you can use a decorator or a context manager:
```python
with graphsignal.start_trace('load-external-data'):
    reader.load_data()
```
See [Python API Reference](https://graphsignal.com/docs/reference/python-api/) for complete instructions.
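
For readers unfamiliar with the pattern, the sketch below shows how a tracing context manager can record latency and exceptions around a block of code. It is a generic stand-in written with `contextlib`, not Graphsignal's implementation; `start_trace_sketch` and `recorded` are hypothetical names.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# In-memory sink for finished traces (a real tracer would export these).
recorded: List[Dict[str, object]] = []


@contextmanager
def start_trace_sketch(operation: str) -> Iterator[None]:
    """Time the wrapped block and note any exception it raises."""
    started = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = repr(exc)
        raise  # re-raise so callers still see the failure
    finally:
        recorded.append(
            {
                "operation": operation,
                "latency_s": time.perf_counter() - started,
                "error": error,
            }
        )


with start_trace_sketch("load-external-data"):
    data = [n * n for n in range(1000)]

print(recorded[0]["operation"], recorded[0]["error"])
```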
### Useful Links
* [Tracing and Monitoring LlamaIndex Applications](https://graphsignal.com/blog/tracing-and-monitoring-llama-index-applications/)
* [Monitor OpenAI API Latency, Tokens, Rate Limits, and More](https://graphsignal.com/blog/monitor-open-ai-api-latency-tokens-rate-limits-and-more/)
* [OpenAI API Cost Tracking: Analyzing Expenses by Model, Deployment, and Context](https://graphsignal.com/blog/open-ai-api-cost-tracking-analyzing-expenses-by-model-deployment-and-context/)
+90
@@ -0,0 +1,90 @@
# Guidance
[Guidance](https://github.com/microsoft/guidance) is a guidance language, developed by Microsoft, for controlling large language models.
Guidance programs allow you to interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text.
## Structured Output
One particularly exciting aspect of guidance is the ability to output structured objects (think JSON following a specific schema, or a pydantic object). Instead of just "suggesting" the desired output structure to the LLM, guidance can actually "force" the LLM output to follow the desired schema. This allows the LLM to focus on the content rather than the syntax, and eliminates the possibility of output parsing issues.
This is particularly powerful for weaker LLMs, which may be smaller in parameter count and not trained on enough source code data to reliably produce well-formed, hierarchical structured output.
### Creating a guidance program to generate pydantic objects
In LlamaIndex, we provide an initial integration with guidance, to make it easy to generate structured output (more specifically, pydantic objects).
For example, if we want to generate an album of songs, with the following schema:
```python
from typing import List
from pydantic import BaseModel

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]
```
It's as simple as creating a `GuidancePydanticProgram`, specifying our desired pydantic class `Album`,
and supplying a suitable prompt template.
> Note: guidance uses handlebars-style templates, which use double braces for variable substitution, and single braces for literal braces. This is the opposite convention of Python format strings.
> Note: We provide a utility function `from llama_index.prompts.guidance_utils import convert_to_handlebars` that can convert from the Python format string style template to guidance handlebars-style template.
```python
from guidance.llms import OpenAI
from llama_index.program import GuidancePydanticProgram

program = GuidancePydanticProgram(
    output_cls=Album,
    prompt_template_str="Generate an example album, with an artist and a list of songs. Using the movie {{movie_name}} as inspiration",
    guidance_llm=OpenAI('text-davinci-003'),
    verbose=True,
)
```
Now we can run the program by calling it with additional user input.
Here, let's go for something spooky and create an album inspired by *The Shining*.
```python
output = program(movie_name='The Shining')
```
We have our pydantic object:
```python
Album(name='The Shining', artist='Jack Torrance', songs=[Song(title='All Work and No Play', length_seconds=180), Song(title='The Overlook Hotel', length_seconds=240), Song(title='The Shining', length_seconds=210)])
```
You can play with [this notebook](/examples/output_parsing/guidance_pydantic_program.ipynb) for more details.
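
To make the brace convention from the notes above concrete, here is a small sketch that converts a Python format-string template to the handlebars style. It is an illustration built on the standard library's `string.Formatter`, not the implementation of `convert_to_handlebars`; `to_handlebars_sketch` is a hypothetical name, and format specs and conversions (such as `{x:.2f}` or `{x!r}`) are deliberately dropped.

```python
from string import Formatter


def to_handlebars_sketch(template: str) -> str:
    """Swap the brace conventions of a Python format string.

    `{name}` becomes `{{name}}`, and escaped literal braces (`{{`, `}}`)
    become single braces.
    """
    parts = []
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
    # and already unescapes doubled braces in the literal text.
    for literal, field, _spec, _conversion in Formatter().parse(template):
        parts.append(literal)
        if field is not None:
            parts.append("{{" + field + "}}")
    return "".join(parts)


converted = to_handlebars_sketch("Songs inspired by {movie_name}")
print(converted)
json_like = to_handlebars_sketch('{{"title": {title}}}')
print(json_like)
```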
### Using guidance to improve the robustness of our sub-question query engine
LlamaIndex provides a toolkit of advanced query engines for tackling different use-cases.
Several rely on structured output in intermediate steps.
We can use guidance to improve the robustness of these query engines, by making sure the
intermediate responses have the expected structure (so that they can be parsed correctly into structured objects).
As an example, we implement a `GuidanceQuestionGenerator` that can be plugged into a `SubQuestionQueryEngine` to make it more robust than using the default setting.
```python
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.question_gen.guidance_generator import GuidanceQuestionGenerator
from guidance.llms import OpenAI as GuidanceOpenAI
# define guidance based question generator
question_gen = GuidanceQuestionGenerator.from_defaults(guidance_llm=GuidanceOpenAI('text-davinci-003'), verbose=False)
# define query engine tools
query_engine_tools = ...
# construct sub-question query engine
s_engine = SubQuestionQueryEngine.from_defaults(
    question_gen=question_gen,  # use guidance based question_gen defined above
    query_engine_tools=query_engine_tools,
)
```
See [this notebook](/examples/output_parsing/guidance_sub_question.ipynb) for more details.
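
The sketch below illustrates why the structure of that intermediate response matters: a well-formed JSON list parses cleanly into objects, while a truncated one fails before any sub-question can be run. `SubQuestionSketch`, its field names, and `parse_sub_questions` are hypothetical and chosen for illustration; they are not LlamaIndex's own classes.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SubQuestionSketch:
    """Hypothetical shape of one generated sub-question."""

    sub_question: str
    tool_name: str


def parse_sub_questions(raw: str) -> List[SubQuestionSketch]:
    """Parse an LLM's JSON output into structured sub-questions."""
    items = json.loads(raw)
    return [
        SubQuestionSketch(sub_question=i["sub_question"], tool_name=i["tool_name"])
        for i in items
    ]


well_formed = '[{"sub_question": "What was revenue in 2021?", "tool_name": "report_2021"}]'
parsed = parse_sub_questions(well_formed)
print(parsed)

# Free-form generation can stop mid-structure; this input is cut off.
malformed = '[{"sub_question": "What was revenue in 2021?", "tool_name": '
try:
    parse_sub_questions(malformed)
    parse_failed = False
except (json.JSONDecodeError, KeyError):
    parse_failed = True
print(parse_failed)
```

Constraining generation to the schema is meant to rule out the second case.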
+35
@@ -0,0 +1,35 @@
# Evaluating and Tracking with TruLens
This page covers how to use [TruLens](https://trulens.org) to evaluate and track LLM apps built on LlamaIndex.
## What is TruLens?
TruLens is an [open-source](https://github.com/truera/trulens) package that provides instrumentation and evaluation tools for large language model (LLM) based applications. This includes feedback function evaluations of relevance, sentiment and more, plus in-depth tracing including cost and latency.
![TruLens Architecture](https://github.com/truera/trulens/blob/main/docs/Assets/image/TruLens_Architecture.png)
As you iterate on new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up. You'll also be able to view evaluations at a record level, and explore the app metadata for each record.
### Installation and Setup
Adding TruLens is simple. Just install it from PyPI!
```sh
pip install trulens-eval
```
```python
from trulens_eval import TruLlama
```
## Try it out!
[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.4.0/trulens_eval/examples/frameworks/llama_index/llama_index_quickstart.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/google-colab/trulens_eval/examples/colab/quickstarts/llama_index_quickstart_colab.ipynb)
## Read more
* [Build and Evaluate LLM Apps with LlamaIndex and TruLens](https://medium.com/llamaindex-blog/build-and-evaluate-llm-apps-with-llamaindex-and-trulens-6749e030d83c)
* [trulens.org](https://www.trulens.org/)
@@ -0,0 +1,69 @@
# Using with Langchain 🦜🔗
LlamaIndex provides both Tool abstractions for a Langchain agent as well as a memory module.
The API reference for the Tool abstractions and memory modules is [here](/api_reference/langchain_integrations/base.rst).
### Use any data loader as a Langchain Tool
LlamaIndex allows you to use any data loader within the LlamaIndex core repo or in [LlamaHub](https://llamahub.ai/) as an "on-demand" data query Tool within a LangChain agent.
The Tool will 1) load data using the data loader, 2) index the data, and 3) query the data and return the response in an ad-hoc manner.
**Resources**
- [OnDemandLoaderTool Tutorial](/examples/tools/OnDemandLoaderTool.ipynb)
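
The load, index, and query steps described above can be sketched with plain Python. This toy version uses a keyword lookup in place of a real LlamaIndex index and query engine, and every name in it (`make_on_demand_tool`, `fake_loader`, the `memory://` source string) is hypothetical.

```python
from typing import Callable, Dict, List, Set


def make_on_demand_tool(
    loader: Callable[[str], List[str]]
) -> Callable[[str, str], List[str]]:
    """Wrap a loader so each call loads, indexes, and queries in one go."""

    def tool(source: str, query: str) -> List[str]:
        # 1) load data using the data loader
        texts = loader(source)
        # 2) index the data (toy inverted index: word -> document positions)
        index: Dict[str, Set[int]] = {}
        for position, text in enumerate(texts):
            for word in text.lower().split():
                index.setdefault(word, set()).add(position)
        # 3) query the index and return matching texts
        hits: Set[int] = set()
        for word in query.lower().split():
            hits |= index.get(word, set())
        return [texts[position] for position in sorted(hits)]

    return tool


def fake_loader(source: str) -> List[str]:
    """Stand-in loader that ignores `source` and returns fixed texts."""
    return [
        "llamas live in the andes",
        "alpacas are smaller",
        "indexes speed up lookups",
    ]


tool = make_on_demand_tool(fake_loader)
answer = tool("memory://animals", "where do llamas live")
print(answer)
```

Nothing is cached between calls, which is what makes the tool "on-demand": the index exists only for the duration of one query.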
### Use a query engine as a Langchain Tool
LlamaIndex provides Tool abstractions so that you can use a LlamaIndex query engine along with a Langchain agent.
For instance, you can choose to create a "Tool" from a `QueryEngine` directly as follows:
```python
from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool
tool_config = IndexToolConfig(
    query_engine=query_engine,
    name="Vector Index",
    description="useful for when you want to answer queries about X",
    tool_kwargs={"return_direct": True},
)
tool = LlamaIndexTool.from_tool_config(tool_config)
```
You can also choose to provide a `LlamaToolkit`:
```python
from llama_index.langchain_helpers.agents import LlamaToolkit

toolkit = LlamaToolkit(
    index_configs=index_configs,
)
```
Such a toolkit can be used to create a downstream Langchain-based chat agent through
our `create_llama_agent` and `create_llama_chat_agent` functions:
```python
from llama_index.langchain_helpers.agents import create_llama_chat_agent
agent_chain = create_llama_chat_agent(
    toolkit,
    llm,
    memory=memory,
    verbose=True
)
agent_chain.run(input="Query about X")
```
You can take a look at [the full tutorial notebook here](https://github.com/jerryjliu/llama_index/blob/main/examples/chatbot/Chatbot_SEC.ipynb).
### Llama Demo Notebook: Tool + Memory module
We provide another demo notebook showing how you can build a chat agent with the following components.
- Using LlamaIndex as a generic callable tool with a Langchain agent
- Using LlamaIndex as a memory module; this allows you to insert arbitrary amounts of conversation history with a Langchain chatbot!
Please see the [notebook here](https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb).
@@ -0,0 +1,459 @@
# Using Vector Stores
LlamaIndex offers multiple integration points with vector stores / vector databases:
1. LlamaIndex can use a vector store itself as an index. Like any other index, this index can store documents and be used to answer queries.
2. LlamaIndex can load data from vector stores, similar to any other data connector. This data can then be used within LlamaIndex data structures.
(vector-store-index)=
## Using a Vector Store as an Index
LlamaIndex also supports different vector stores
as the storage backend for `VectorStoreIndex`.
- Chroma (`ChromaVectorStore`). [Installation](https://docs.trychroma.com/getting-started).
- DeepLake (`DeepLakeVectorStore`). [Installation](https://docs.deeplake.ai/en/latest/Installation.html).
- Qdrant (`QdrantVectorStore`). [Installation](https://qdrant.tech/documentation/install/). [Python Client](https://qdrant.tech/documentation/install/#python-client).
- Weaviate (`WeaviateVectorStore`). [Installation](https://weaviate.io/developers/weaviate/installation). [Python Client](https://weaviate.io/developers/weaviate/client-libraries/python).
- Pinecone (`PineconeVectorStore`). [Installation/Quickstart](https://docs.pinecone.io/docs/quickstart).
- Faiss (`FaissVectorStore`). [Installation](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md).
- Milvus (`MilvusVectorStore`). [Installation](https://milvus.io/docs).
- Zilliz (`MilvusVectorStore`). [Quickstart](https://zilliz.com/doc/quick_start).
- MyScale (`MyScaleVectorStore`). [Quickstart](https://docs.myscale.com/en/quickstart/). [Installation/Python Client](https://docs.myscale.com/en/python-client/).
- Supabase (`SupabaseVectorStore`). [Quickstart](https://supabase.github.io/vecs/api/).
- DocArray (`DocArrayHnswVectorStore`, `DocArrayInMemoryVectorStore`). [Installation/Python Client](https://github.com/docarray/docarray#installation).
- MongoDB Atlas (`MongoDBAtlasVectorSearch`). [Installation/Quickstart](https://www.mongodb.com/atlas/database).
- Redis (`RedisVectorStore`). [Installation](https://redis.io/docs/getting-started/installation/).
A detailed API reference is [found here](/api_reference/indices/vector_store.rst).
Similar to any other index within LlamaIndex (tree, keyword table, list), `VectorStoreIndex` can be constructed upon any collection
of documents. We use the vector store within the index to store embeddings for the input text chunks.
Once constructed, the index can be used for querying.
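To make the idea of "storing embeddings for text chunks and querying them" concrete, here is a minimal pure-Python sketch. It is not how `VectorStoreIndex` or any real vector store is implemented; `ToyVectorStore`, `cosine`, and the hand-written two-dimensional embeddings are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: maps chunk ids to (embedding, text)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str]] = {}

    def add(self, chunk_id: str, embedding: List[float], text: str) -> None:
        self._rows[chunk_id] = (embedding, text)

    def query(self, query_embedding: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        """Return the top_k (text, similarity) pairs, most similar first."""
        scored = [
            (text, cosine(query_embedding, emb))
            for emb, text in self._rows.values()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
# Toy embeddings written by hand; a real index would compute them with a model.
store.add("c1", [1.0, 0.0], "chunk about writing")
store.add("c2", [0.0, 1.0], "chunk about painting")
store.add("c3", [0.9, 0.1], "chunk about programming")

results = store.query([1.0, 0.0], top_k=2)
```

In this sketch, swapping the storage backend would mean replacing `ToyVectorStore` with a class that exposes the same `add` and `query` methods but persists its rows elsewhere.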
**Default Vector Store Index Construction/Querying**
By default, `VectorStoreIndex` uses an in-memory `SimpleVectorStore`
that's initialized as part of the default storage context.
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader
# Load documents and build index
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents)
# Query index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```
**Custom Vector Store Index Construction/Querying**
We can query over a custom vector store as follows:
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import DeepLakeVectorStore
# construct vector store and customize storage context
storage_context = StorageContext.from_defaults(
vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
)
# Load documents and build index
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
# Query index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```
Below we show more examples of how to construct various vector stores we support.
**Redis**
First, start Redis Stack (or get a URL from your Redis provider):
```bash
docker run --name redis-vecdb -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
```
Then connect and use Redis as a vector database with LlamaIndex:
```python
from llama_index.vector_stores import RedisVectorStore
vector_store = RedisVectorStore(
index_name="llm-project",
redis_url="redis://localhost:6379",
overwrite=True
)
```
This can be used with the `VectorStoreIndex` to provide a query interface for retrieval, querying, deleting, persisting the index, and more.
**DeepLake**
```python
import os
import getpass
from llama_index.vector_stores import DeepLakeVectorStore
os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY: ")
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("ACTIVELOOP_TOKEN: ")
dataset_path = "hub://adilkhan/paul_graham_essay"
# construct vector store
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
```
**Faiss**
```python
import faiss
from llama_index.vector_stores import FaissVectorStore
# create faiss index
d = 1536
faiss_index = faiss.IndexFlatL2(d)
# construct vector store
vector_store = FaissVectorStore(faiss_index)
...
# NOTE: since faiss index is in-memory, we need to explicitly call
# vector_store.persist() or storage_context.persist() to save it to disk.
# persist() takes an optional persist_path argument; if none is given, default paths are used.
storage_context.persist()
```
**Weaviate**
```python
import weaviate
from llama_index.vector_stores import WeaviateVectorStore
# creating a Weaviate client
resource_owner_config = weaviate.AuthClientPassword(
username="<username>",
password="<password>",
)
client = weaviate.Client(
"https://<cluster-id>.semi.network/", auth_client_secret=resource_owner_config
)
# construct vector store
vector_store = WeaviateVectorStore(weaviate_client=client)
```
**Pinecone**
```python
import pinecone
from llama_index.vector_stores import PineconeVectorStore
# Creating a Pinecone index
api_key = "api_key"
pinecone.init(api_key=api_key, environment="us-west1-gcp")
pinecone.create_index(
"quickstart",
dimension=1536,
metric="euclidean",
pod_type="p1"
)
index = pinecone.Index("quickstart")
# can define filters specific to this vector index (so you can
# reuse pinecone indexes)
metadata_filters = {"title": "paul_graham_essay"}
# construct vector store
vector_store = PineconeVectorStore(
pinecone_index=index,
metadata_filters=metadata_filters
)
```
**Qdrant**
```python
import qdrant_client
from llama_index.vector_stores import QdrantVectorStore
# Creating a Qdrant vector store
client = qdrant_client.QdrantClient(
host="<qdrant-host>",
api_key="<qdrant-api-key>",
https=True
)
collection_name = "paul_graham"
# construct vector store
vector_store = QdrantVectorStore(
client=client,
collection_name=collection_name,
)
```
**Chroma**
```python
import chromadb
from llama_index.vector_stores import ChromaVectorStore
# Creating a Chroma client
# By default, Chroma will operate purely in-memory.
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("quickstart")
# construct vector store
vector_store = ChromaVectorStore(
chroma_collection=chroma_collection,
)
```
**Milvus**
- Milvus Index offers the ability to store both Documents and their embeddings. Documents are limited to the predefined Document attributes and do not include metadata.
```python
import pymilvus
from llama_index.vector_stores import MilvusVectorStore
# construct vector store
vector_store = MilvusVectorStore(
host='localhost',
port=19530,
    overwrite=True
)
```
**Note**: `MilvusVectorStore` depends on the `pymilvus` library.
Use `pip install pymilvus` if not already installed.
If you get stuck at building wheel for `grpcio`, check if you are using python 3.11
(there's a known issue: https://github.com/milvus-io/pymilvus/issues/1308)
and try downgrading.
**Zilliz**
- Zilliz Cloud (hosted version of Milvus) uses the Milvus Index with some extra arguments.
```python
import pymilvus
from llama_index.vector_stores import MilvusVectorStore
# construct vector store
vector_store = MilvusVectorStore(
host='foo.vectordb.zillizcloud.com',
port=403,
user="db_admin",
password="foo",
use_secure=True,
    overwrite=True
)
```
**Note**: `MilvusVectorStore` depends on the `pymilvus` library.
Use `pip install pymilvus` if not already installed.
If you get stuck at building wheel for `grpcio`, check if you are using python 3.11
(there's a known issue: https://github.com/milvus-io/pymilvus/issues/1308)
and try downgrading.
**MyScale**
```python
import clickhouse_connect
from llama_index.vector_stores import MyScaleVectorStore
# Creating a MyScale client
client = clickhouse_connect.get_client(
host='YOUR_CLUSTER_HOST',
port=8443,
username='YOUR_USERNAME',
password='YOUR_CLUSTER_PASSWORD'
)
# construct vector store
vector_store = MyScaleVectorStore(
myscale_client=client
)
```
**DocArray**
```python
from llama_index.vector_stores import (
DocArrayHnswVectorStore,
DocArrayInMemoryVectorStore,
)
# construct vector store
vector_store = DocArrayHnswVectorStore(work_dir='hnsw_index')
# alternatively, construct the in-memory vector store
vector_store = DocArrayInMemoryVectorStore()
```
**MongoDBAtlas**
```python
# Provide URI to constructor, or use environment variable
import pymongo
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from llama_index.indices.vector_store.base import VectorStoreIndex
from llama_index.storage.storage_context import StorageContext
from llama_index.readers.file.base import SimpleDirectoryReader
# mongo_uri = os.environ["MONGO_URI"]
mongo_uri = "mongodb+srv://<username>:<password>@<host>?retryWrites=true&w=majority"
mongodb_client = pymongo.MongoClient(mongo_uri)
# construct store
store = MongoDBAtlasVectorSearch(mongodb_client)
storage_context = StorageContext.from_defaults(vector_store=store)
uber_docs = SimpleDirectoryReader(input_files=["../data/10k/uber_2021.pdf"]).load_data()
# construct index
index = VectorStoreIndex.from_documents(uber_docs, storage_context=storage_context)
```
[Example notebooks can be found here](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/vector_stores).
## Loading Data from Vector Stores using Data Connector
LlamaIndex supports loading data from the following sources. See [Data Connectors](../connector/root.md) for more details and API documentation.
Chroma stores both documents and vectors. This is an example of how to use Chroma:
```python
from IPython.display import Markdown, display
from llama_index.readers.chroma import ChromaReader
from llama_index.indices import ListIndex
# The chroma reader loads data from a persisted Chroma collection.
# This requires a collection name and a persist directory.
reader = ChromaReader(
collection_name="chroma_collection",
persist_directory="examples/data_connectors/chroma_collection"
)
query_vector=[n1, n2, n3, ...]
documents = reader.load_data(collection_name="demo", query_vector=query_vector, limit=5)
index = ListIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
display(Markdown(f"<b>{response}</b>"))
```
Qdrant also stores both documents and vectors. This is an example of how to use Qdrant:
```python
from llama_index.readers.qdrant import QdrantReader
reader = QdrantReader(host="localhost")
# the query_vector is an embedding representation of your query
# Example query_vector
# query_vector = [0.3, 0.3, 0.3, 0.3, ...]
query_vector = [n1, n2, n3, ...]
# NOTE: Required args are collection_name, query_vector.
# See the Python client: https://github.com/qdrant/qdrant_client
# for more details
documents = reader.load_data(collection_name="demo", query_vector=query_vector, limit=5)
```
NOTE: Since Weaviate can store a hybrid of document and vector objects, the user may either choose to explicitly specify `class_name` and `properties` in order to query documents, or they may choose to specify a raw GraphQL query. See below for usage.
```python
# option 1: specify class_name and properties
# 1) load data using class_name and properties
documents = reader.load_data(
class_name="<class_name>",
properties=["property1", "property2", "..."],
separate_documents=True
)
# option 2: specify a raw GraphQL query
query = """
{
Get {
<class_name> {
<property1>
<property2>
}
}
}
"""
documents = reader.load_data(graphql_query=query, separate_documents=True)
```
NOTE: Both Pinecone and Faiss data loaders assume that the respective data sources only store vectors; text content is stored elsewhere. Therefore, both data loaders require that the user specifies an `id_to_text_map` in the `load_data` call.
For instance, this is an example usage of the Pinecone data loader `PineconeReader`:
```python
from llama_index.readers.pinecone import PineconeReader
reader = PineconeReader(api_key=api_key, environment="us-west1-gcp")
id_to_text_map = {
"id1": "text blob 1",
"id2": "text blob 2",
}
query_vector = [n1, n2, n3, ...]
documents = reader.load_data(
index_name="quickstart", id_to_text_map=id_to_text_map, top_k=3, vector=query_vector, separate_documents=True
)
```
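The role of `id_to_text_map` can be sketched without any vector database: the store returns only ids and scores, and the text is joined in afterwards. The `matches` list and the `attach_text` helper below are hypothetical, and skipping ids with no map entry is an assumption of this sketch rather than documented loader behaviour.

```python
from typing import Dict, List, Tuple

# Hypothetical raw matches from a vector-only store: (vector id, similarity score).
matches: List[Tuple[str, float]] = [("id2", 0.91), ("id1", 0.78), ("id9", 0.40)]

id_to_text_map: Dict[str, str] = {
    "id1": "text blob 1",
    "id2": "text blob 2",
}


def attach_text(
    matches: List[Tuple[str, float]], id_to_text_map: Dict[str, str]
) -> List[Tuple[str, str, float]]:
    """Pair each matched id with its text, skipping ids that have no entry."""
    return [
        (vec_id, id_to_text_map[vec_id], score)
        for vec_id, score in matches
        if vec_id in id_to_text_map
    ]


documents = attach_text(matches, id_to_text_map)
```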
[Example notebooks can be found here](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/data_connectors).
```{toctree}
---
caption: Examples
maxdepth: 1
---
../../examples/vector_stores/SimpleIndexDemo.ipynb
../../examples/vector_stores/SimpleIndexDemoMMR.ipynb
../../examples/vector_stores/RedisIndexDemo.ipynb
../../examples/vector_stores/QdrantIndexDemo.ipynb
../../examples/vector_stores/FaissIndexDemo.ipynb
../../examples/vector_stores/DeepLakeIndexDemo.ipynb
../../examples/vector_stores/MyScaleIndexDemo.ipynb
../../examples/vector_stores/MetalIndexDemo.ipynb
../../examples/vector_stores/WeaviateIndexDemo.ipynb
../../examples/vector_stores/OpensearchDemo.ipynb
../../examples/vector_stores/PineconeIndexDemo.ipynb
../../examples/vector_stores/ChromaIndexDemo.ipynb
../../examples/vector_stores/LanceDBIndexDemo.ipynb
../../examples/vector_stores/MilvusIndexDemo.ipynb
../../examples/vector_stores/WeaviateIndexDemo-Hybrid.ipynb
../../examples/vector_stores/PineconeIndexDemo-Hybrid.ipynb
../../examples/vector_stores/AsyncIndexCreationDemo.ipynb
../../examples/vector_stores/SupabaseVectorIndexDemo.ipynb
../../examples/vector_stores/DocArrayHnswIndexDemo.ipynb
../../examples/vector_stores/DocArrayInMemoryIndexDemo.ipynb
../../examples/vector_stores/MongoDBAtlasVectorSearch.ipynb
../../examples/vector_stores/postgres.ipynb
```
+77
View File
@@ -0,0 +1,77 @@
"""Configuration for sphinx."""
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import sphinx_rtd_theme # noqa: F401
sys.path.insert(0, os.path.abspath("../"))
with open("../llama_index/VERSION") as f:
    version = f.read().strip()
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
project = "LlamaIndex 🦙"
copyright = "2022, Jerry Liu"
author = "Jerry Liu"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.coverage",
"sphinx.ext.autodoc.typehints",
"sphinx.ext.autosummary",
"sphinx.ext.napoleon",
"sphinx_rtd_theme",
"sphinx.ext.mathjax",
"m2r2",
"myst_nb",
"sphinxcontrib.autodoc_pydantic",
]
myst_heading_anchors = 4
# TODO: Fix the non-consecutive header level in our docs, until then
# disable the sphinx/myst warnings
suppress_warnings = ["myst.header"]
templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
html_theme = "furo"
html_title = project + " " + version
html_static_path = ["_static"]
html_css_files = [
"css/custom.css",
"css/algolia.css",
"https://cdn.jsdelivr.net/npm/@docsearch/css@3",
]
html_js_files = [
"js/mendablesearch.js",
(
"https://cdn.jsdelivr.net/npm/@docsearch/js@3.3.3/dist/umd/index.js",
{"defer": "defer"},
),
("js/algolia.js", {"defer": "defer"}),
]
nb_execution_mode = "off"
@@ -0,0 +1,27 @@
# Module Guides
These guides provide an overview of how to use our agent classes.
For more detailed guides on how to use specific tools, check out our [tools module guides]().
## OpenAI Agent
```{toctree}
---
maxdepth: 1
---
/examples/agent/openai_agent.ipynb
/examples/agent/openai_agent_with_query_engine.ipynb
/examples/agent/openai_agent_retrieval.ipynb
/examples/agent/openai_agent_query_cookbook.ipynb
/examples/agent/openai_agent_query_plan.ipynb
/examples/agent/openai_agent_context_retrieval.ipynb
```
## ReAct Agent
```{toctree}
---
maxdepth: 1
---
/examples/agent/react_agent_with_query_engine.ipynb
```
@@ -0,0 +1,69 @@
# Data Agents
## Concept
Data Agents are LLM-powered knowledge workers in LlamaIndex that can intelligently perform various tasks over your data, in both a “read” and “write” function. They are capable of the following:
- Perform automated search and retrieval over different types of data - unstructured, semi-structured, and structured.
- Call any external service API in a structured fashion, then process the response and store it for later.
In that sense, agents are a step beyond our [query engines](/core_modules/query_modules/query_engine/root.md) in that they can not only "read" from a static source of data, but can dynamically ingest and modify data from a variety of different tools.
Building a data agent requires the following core components:
- A reasoning loop
- Tool abstractions
A data agent is initialized with a set of APIs, or Tools, to interact with; these APIs can be called by the agent to return information or modify state. Given an input task, the data agent uses a reasoning loop to decide which tools to use, in which sequence, and with which parameters to call each tool.
### Reasoning Loop
The reasoning loop depends on the type of agent. We have support for the following agents:
- OpenAI Function agent (built on top of the OpenAI Function API)
- a ReAct agent (which works across any chat/text completion endpoint).
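As a rough illustration of what a reasoning loop does, the sketch below repeatedly asks a policy which tool to call next and with which arguments, until the policy decides to stop. The `scripted_policy` function is a hypothetical stand-in for the LLM, and `TOOLS` and `run_agent` are illustrative names, not part of LlamaIndex.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: plain functions keyed by name.
TOOLS: Dict[str, Callable[..., int]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

# One step of the loop: (tool name, keyword arguments).
Step = Tuple[str, Dict[str, int]]


def scripted_policy(task: str, history: List[int]) -> Optional[Step]:
    """Stand-in for the LLM: picks the next tool call, or None to stop."""
    if not history:
        return ("add", {"a": 2, "b": 3})
    if len(history) == 1:
        return ("multiply", {"a": history[-1], "b": 4})
    return None


def run_agent(
    task: str,
    policy: Callable[[str, List[int]], Optional[Step]] = scripted_policy,
    max_steps: int = 5,
) -> List[int]:
    """Call tools in the order the policy chooses and collect the observations."""
    history: List[int] = []
    for _ in range(max_steps):
        step = policy(task, history)
        if step is None:  # the policy decided the task is done
            break
        name, kwargs = step
        history.append(TOOLS[name](**kwargs))
    return history


observations = run_agent("Compute (2 + 3) * 4")
```

The `max_steps` cap is a design choice of the sketch: it keeps a misbehaving policy from looping forever.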
### Tool Abstractions
You can learn more about our Tool abstractions in our [Tools section](/core_modules/agent_modules/tools/root.md).
### Blog Post
For full details, please check out our detailed [blog post](https://medium.com/llamaindex-blog/data-agents-eed797d7972f).
## Usage Pattern
Data agents can be used in the following manner (the example uses the OpenAI Function API):
```python
from llama_index.agent import OpenAIAgent
from llama_index.llms import OpenAI
# import and define tools
...
# initialize llm
llm = OpenAI(model="gpt-3.5-turbo-0613")
# initialize openai agent
agent = OpenAIAgent.from_tools(tools, llm=llm, verbose=True)
```
See our usage pattern guide for more details.
```{toctree}
---
maxdepth: 1
---
usage_pattern.md
```
## Modules
Learn more about our different agent types in our module guides below.
Also take a look at our [tools section](/core_modules/agent_modules/tools/root.md)!
```{toctree}
---
maxdepth: 2
---
modules.md
```
@@ -0,0 +1,193 @@
# Usage Pattern
## Get Started
An agent is initialized from a set of Tools. Here's an example of instantiating a ReAct
agent from a set of Tools.
```python
from llama_index.tools import FunctionTool
from llama_index.llms import OpenAI
from llama_index.agent import ReActAgent
# define sample Tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b
multiply_tool = FunctionTool.from_defaults(fn=multiply)
# initialize llm
llm = OpenAI(model="gpt-3.5-turbo-0613")
# initialize ReAct agent
agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)
```
An agent supports both `chat` and `query` endpoints, inheriting from our `ChatEngine` and `QueryEngine` respectively.
Example usage:
```python
agent.chat("What is 2123 * 215123")
```
## Query Engine Tools
It is easy to wrap query engines as tools for an agent as well. Simply do the following:
```python
from llama_index.agent import ReActAgent
from llama_index.tools import QueryEngineTool, ToolMetadata
# NOTE: lyft_index and uber_index are both VectorStoreIndex instances
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
query_engine_tools = [
QueryEngineTool(
query_engine=lyft_engine,
metadata=ToolMetadata(
name="lyft_10k",
description="Provides information about Lyft financials for year 2021. "
"Use a detailed plain text question as input to the tool.",
),
),
QueryEngineTool(
query_engine=uber_engine,
metadata=ToolMetadata(
name="uber_10k",
description="Provides information about Uber financials for year 2021. "
"Use a detailed plain text question as input to the tool.",
),
),
]
# initialize ReAct agent
agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
```
## Use other agents as Tools
A nifty feature of our agents is that since they inherit from `BaseQueryEngine`, you can easily define other agents as tools
through our `QueryEngineTool`.
```python
from llama_index.tools import QueryEngineTool, ToolMetadata
query_engine_tools = [
QueryEngineTool(
query_engine=sql_agent,
metadata=ToolMetadata(
name="sql_agent",
description="Agent that can execute SQL queries."
),
),
QueryEngineTool(
query_engine=gmail_agent,
metadata=ToolMetadata(
name="gmail_agent",
description="Tool that can send emails on Gmail."
),
),
]
outer_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
```
## Advanced Concepts (for `OpenAIAgent`, in beta)
You can also use agents in more advanced settings, for instance to retrieve tools from an index at query time, or
to perform query planning over an existing set of Tools.
These are largely implemented with our `OpenAIAgent` classes (which depend on the OpenAI Function API). Support
for our more general `ReActAgent` is something we're actively investigating.
NOTE: these are largely still in beta. The abstractions may change and become more general over time.
### Function Retrieval Agents
If the set of Tools is very large, you can create an `ObjectIndex` to index the tools, and then pass in an `ObjectRetriever` to the agent during query-time, to first dynamically retrieve the relevant tools before having the agent pick from the candidate tools.
We first build an `ObjectIndex` over an existing set of Tools.
```python
# define an "object" index over these tools
from llama_index import VectorStoreIndex
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping
tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
all_tools,
tool_mapping,
VectorStoreIndex,
)
```
We then define our `FnRetrieverOpenAIAgent`:
```python
from llama_index.agent import FnRetrieverOpenAIAgent
agent = FnRetrieverOpenAIAgent.from_retriever(obj_index.as_retriever(), verbose=True)
```
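The retrieve-then-pick idea can be sketched in plain Python. Here word overlap between the query and each tool description stands in for the embedding-based retrieval a real `ObjectIndex` would perform; `TOOL_DESCRIPTIONS` and `retrieve_tools` are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Hypothetical tool descriptions, keyed by tool name.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "multiply": "multiply two integers",
    "add": "add two integers",
    "send_email": "send an email message to a recipient",
    "search_wiki": "search wikipedia articles for a topic",
}


def retrieve_tools(query: str, top_k: int = 2) -> List[str]:
    """Rank tools by how many query words appear in their description."""
    query_words = set(query.lower().split())
    scores = {
        name: len(query_words & set(desc.split()))
        for name, desc in TOOL_DESCRIPTIONS.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:top_k] if scores[name] > 0]


candidates = retrieve_tools("please multiply these two integers")
```

Only the returned candidates would then be offered to the agent, which keeps the prompt small even when the full tool set is large.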
### Context Retrieval Agents
Our context-augmented OpenAI Agent will always perform retrieval before calling any tools.
This helps to provide additional context that can help the agent better pick Tools, versus
just trying to make a decision without any context.
```python
from llama_index import VectorStoreIndex
from llama_index.schema import Document
from llama_index.agent import ContextRetrieverOpenAIAgent
# toy index - stores a list of abbreviations
texts = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)
# add context agent
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
query_engine_tools, context_index.as_retriever(similarity_top_k=1), verbose=True
)
response = context_agent.chat("What is the YZ of March 2022?")
```
### Query Planning
OpenAI Function Agents can be capable of advanced query planning. The trick is to provide the agent
with a `QueryPlanTool` - if the agent calls the QueryPlanTool, it is forced to infer a full Pydantic schema representing a query
plan over a set of subtools.
```python
# define query plan tool
from llama_index import get_response_synthesizer
from llama_index.agent import OpenAIAgent
from llama_index.llms import OpenAI
from llama_index.tools import QueryPlanTool
response_synthesizer = get_response_synthesizer(service_context=service_context)
query_plan_tool = QueryPlanTool.from_defaults(
query_engine_tools=[query_tool_sept, query_tool_june, query_tool_march],
response_synthesizer=response_synthesizer,
)
# initialize agent
agent = OpenAIAgent.from_tools(
[query_plan_tool],
max_function_calls=10,
llm=OpenAI(temperature=0, model="gpt-4-0613"),
verbose=True,
)
# should output a query plan to call march, june, and september tools
response = agent.query("Analyze Uber revenue growth in March, June, and September")
```
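To give a feel for what "a query plan over a set of subtools" might look like, here is a small sketch that models a plan as nodes with dependencies and executes them in dependency order. The `QueryNode` dataclass and `execute_plan` function are hypothetical and are not the schema used by `QueryPlanTool`; the sketch also assumes the plan has no cycles.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class QueryNode:
    """Hypothetical plan node: a sub-query, the tool to run, and its dependencies."""

    id: int
    query: str
    tool_name: str
    dependencies: List[int] = field(default_factory=list)


def execute_plan(
    nodes: List[QueryNode],
    tools: Dict[str, Callable[[str, List[str]], str]],
) -> Dict[int, str]:
    """Answer every node, resolving dependencies before their dependents."""
    by_id = {node.id: node for node in nodes}
    answers: Dict[int, str] = {}

    def run(node_id: int) -> str:
        if node_id not in answers:
            node = by_id[node_id]
            context = [run(dep) for dep in node.dependencies]
            answers[node_id] = tools[node.tool_name](node.query, context)
        return answers[node_id]

    for node in nodes:
        run(node.id)
    return answers


# Toy tools: each takes the sub-query and the answers of its dependencies.
tools = {
    "march": lambda q, ctx: "march-answer",
    "june": lambda q, ctx: "june-answer",
    "compare": lambda q, ctx: " vs ".join(ctx),
}
plan = [
    QueryNode(id=3, query="Compare growth", tool_name="compare", dependencies=[1, 2]),
    QueryNode(id=1, query="Revenue in March?", tool_name="march"),
    QueryNode(id=2, query="Revenue in June?", tool_name="june"),
]
answers = execute_plan(plan, tools)
```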
@@ -0,0 +1,65 @@
# LlamaHub Tools Guide
We offer a rich set of Tool Specs through [LlamaHub](https://llamahub.ai/) 🦙.
![](/_static/data_connectors/llamahub.png)
These tool specs represent an initial curated list of services that an agent can interact with and enrich its capability to perform different actions.
We also provide a list of **utility tools** that help to abstract away pain points when designing agents to interact with different API services that return large amounts of data.
## Tool Specs
Coming soon!
## Utility Tools
Oftentimes, directly querying an API can return a massive volume of data, which on its own may overflow the context window of the LLM (or at the very least unnecessarily increase the number of tokens that you are using).
To tackle this, we've provided an initial set of “utility tools” in LlamaHub Tools. Utility tools are not conceptually tied to a given service (e.g. Gmail, Notion), but rather can augment the capabilities of existing Tools. In this particular case, utility tools help to abstract away the common pattern of needing to cache/index and query data that's returned from any API request.
Let's walk through our two main utility tools below.
### OnDemandLoaderTool
This tool turns any existing LlamaIndex data loader (`BaseReader` class) into a tool that an agent can use. The tool can be called with all the parameters needed to trigger `load_data` from the data loader, along with a natural language query string. During execution, we first load data from the data loader, index it (for instance with a vector store), and then query it “on-demand”. All three of these steps happen in a single tool call.
Oftentimes this can be preferable to figuring out how to load and index API data yourself. While doing it yourself may allow for data reusability, users often just need an ad-hoc index to abstract away prompt window limitations for any API call.
A usage example is given below:
```python
from llama_hub.wikipedia.base import WikipediaReader
from llama_index.tools.on_demand_loader_tool import OnDemandLoaderTool
reader = WikipediaReader()
tool = OnDemandLoaderTool.from_defaults(
reader,
name="Wikipedia Tool",
description="A tool for loading data and querying articles from Wikipedia"
)
```
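The load, index, and query steps that happen inside a single tool call can be sketched as follows. This is only a toy under stated assumptions: `make_on_demand_tool` and `fake_loader` are hypothetical, and a word-set "index" replaces the vector index a real tool would build.

```python
from typing import Callable, List


def make_on_demand_tool(load_data: Callable[..., List[str]]) -> Callable[..., str]:
    """Wrap a loader so that one call loads, indexes, and queries."""

    def tool(query_str: str, **load_kwargs) -> str:
        # 1) load: forward the loader arguments unchanged
        documents = load_data(**load_kwargs)
        # 2) index: here just the set of words in each document
        index = [(set(doc.lower().split()), doc) for doc in documents]
        # 3) query: return the document sharing the most words with the query
        query_words = set(query_str.lower().split())
        best = max(index, key=lambda row: len(row[0] & query_words))
        return best[1]

    return tool


def fake_loader(pages: List[str]) -> List[str]:
    """Hypothetical loader standing in for a real reader's load_data."""
    corpus = {
        "Berlin": "berlin is the capital of germany",
        "Paris": "paris is the capital of france",
    }
    return [corpus[page] for page in pages]


wiki_tool = make_on_demand_tool(fake_loader)
answer = wiki_tool("capital of france", pages=["Berlin", "Paris"])
```

The index is rebuilt on every call and then discarded, which mirrors the ad-hoc nature described above at the cost of any reuse between calls.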
### LoadAndSearchToolSpec
The LoadAndSearchToolSpec takes in any existing Tool as input. As a tool spec, it implements `to_tool_list`, and when that function is called, two tools are returned: a `load` tool and then a `search` tool.
The `load` Tool execution would call the underlying Tool, and then index the output (by default with a vector index). The `search` Tool execution would take in a query string as input and call the underlying index.
This is helpful for any API endpoint that will by default return large volumes of data - for instance our WikipediaToolSpec will by default return entire Wikipedia pages, which will easily overflow most LLM context windows.
Example usage is shown below:
```python
from llama_index.agent import OpenAIAgent
from llama_hub.tools.wikipedia.base import WikipediaToolSpec
from llama_index.tools.tool_spec.load_and_search import LoadAndSearchToolSpec
wiki_spec = WikipediaToolSpec()
# Get the search wikipedia tool
tool = wiki_spec.to_tool_list()[1]
# Create the Agent with load/search tools
agent = OpenAIAgent.from_tools(
LoadAndSearchToolSpec.from_defaults(
tool
).to_tool_list(), verbose=True
)
```
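The split into a `load` tool and a `search` tool can be sketched with two closures sharing one index. Everything here is hypothetical (`make_load_and_search`, `big_page_tool`, and the sentence-level keyword index); it only illustrates the pattern, not the `LoadAndSearchToolSpec` implementation.

```python
from typing import Callable, List, Tuple


def make_load_and_search(
    tool: Callable[[str], str],
) -> Tuple[Callable[[str], str], Callable[[str], List[str]]]:
    """Split one large-output tool into a `load` tool and a `search` tool."""
    index: List[str] = []  # shared state: sentences from everything loaded so far

    def load(tool_input: str) -> str:
        output = tool(tool_input)
        index.extend(s.strip() for s in output.split(".") if s.strip())
        # Return a short status instead of the full output.
        return f"Loaded {len(index)} chunks; use search to query them."

    def search(query: str) -> List[str]:
        words = set(query.lower().split())
        return [chunk for chunk in index if words & set(chunk.lower().split())]

    return load, search


def big_page_tool(title: str) -> str:
    """Hypothetical tool that returns a whole page of text."""
    return "Llamas are domesticated camelids. They live in the Andes. Llamas eat grass."


load, search = make_load_and_search(big_page_tool)
status = load("Llama")
hits = search("andes")
```

Because `load` returns only a status string, the large page never enters the agent's context; only the chunks returned by `search` do.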
@@ -0,0 +1,71 @@
# Tools
## Concept
Having proper tool abstractions is at the core of building [data agents](/core_modules/agent_modules/agents/root.md). Defining a set of Tools is similar to defining any API interface, with the exception that these Tools are meant for agent rather than human use. We allow users to define both a **Tool** as well as a **ToolSpec** containing a series of functions under the hood.
A Tool implements a very generic interface - simply define `__call__` and also return some basic metadata (name, description, function schema).
A Tool Spec defines a full API specification of any service that can be converted into a list of Tools.
We offer a few different types of Tools:
- `FunctionTool`: A function tool allows users to easily convert any user-defined function into a Tool. It can also auto-infer the function schema.
- `QueryEngineTool`: A tool that wraps an existing [query engine](/core_modules/query_modules/root.md). Note: since our agent abstractions inherit from `BaseQueryEngine`, these tools can also wrap other agents.
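As a rough picture of the generic Tool interface and of schema auto-inference, the sketch below wraps a function, exposes `__call__`, and derives a simple schema from the function signature with the standard `inspect` module. `ToyFunctionTool` is a hypothetical class for illustration and does not reproduce how `FunctionTool` builds its schema.

```python
import inspect
from typing import Any, Callable, Dict


class ToyFunctionTool:
    """Hypothetical minimal tool: a callable plus name, description, and schema."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        self.name = fn.__name__
        self.description = inspect.getdoc(fn) or ""
        # Infer a simple schema: parameter name -> annotated type name.
        self.schema: Dict[str, str] = {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in inspect.signature(fn).parameters.items()
        }

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.fn(*args, **kwargs)


def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b


tool = ToyFunctionTool(multiply)
product = tool(3, 4)
```

The docstring doubles as the tool description here, which is why a clear docstring matters: it is the text an agent would read when deciding whether to call the tool.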
We offer a rich set of Tools and Tool Specs through [LlamaHub](https://llamahub.ai/) 🦙.
### Blog Post
For full details, please check out our detailed [blog post]().
## Usage Pattern
Our Tool Specs and Tools can be imported from the `llama-hub` package.
To use with our agent,
```python
from llama_index.agent import OpenAIAgent
from llama_hub.tools.gmail.base import GmailToolSpec
tool_spec = GmailToolSpec()
agent = OpenAIAgent.from_tools(tool_spec.to_tool_list(), verbose=True)
```
See our Usage Pattern Guide for more details.
```{toctree}
---
maxdepth: 1
---
usage_pattern.md
```
## LlamaHub Tools Guide 🛠️
Check out our guide for a full overview of the Tools/Tool Specs in LlamaHub!
```{toctree}
---
maxdepth: 1
---
llamahub_tools_guide.md
```
<!-- We offer a rich set of Tool Specs that are offered through [LlamaHub](https://llamahub.ai/) 🦙.
These tool specs represent an initial curated list of services that an agent can interact with and enrich its capability to perform different actions.
![](/_static/data_connectors/llamahub.png) -->
<!-- ## Module Guides
```{toctree}
---
maxdepth: 1
---
modules.md
```
## Tool Example Notebooks
Coming soon! -->
@@ -0,0 +1,35 @@
# Usage Pattern
LlamaHub Tool Specs and Tools can be imported from the `llama-hub` package. They can be plugged into our native agents, or LangChain agents.
## Using with our Agents
To use with our OpenAIAgent,
```python
from llama_index.agent import OpenAIAgent
from llama_hub.tools.gmail.base import GmailToolSpec
tool_spec = GmailToolSpec()
agent = OpenAIAgent.from_tools(tool_spec.to_tool_list(), verbose=True)
# use agent
agent.chat("Can you create a new email to helpdesk and support @example.com about a service outage")
```
Full Tool details can be found on our [LlamaHub](https://llamahub.ai/) page. Each tool contains a "Usage" section showing how that tool can be used.
## Using with LangChain
To use with a LangChain agent, simply convert tools to LangChain tools with `to_langchain_tool()`.
```python
tools = tool_spec.to_tool_list()
langchain_tools = [t.to_langchain_tool() for t in tools]
# plug into LangChain agent
from langchain.agents import initialize_agent
agent_executor = initialize_agent(
langchain_tools, llm, agent="conversational-react-description", memory=memory
)
```
@@ -0,0 +1,31 @@
# Module Guides
```{toctree}
---
maxdepth: 1
---
../../../examples/data_connectors/PsychicDemo.ipynb
../../../examples/data_connectors/DeepLakeReader.ipynb
../../../examples/data_connectors/QdrantDemo.ipynb
../../../examples/data_connectors/DiscordDemo.ipynb
../../../examples/data_connectors/MongoDemo.ipynb
../../../examples/data_connectors/ChromaDemo.ipynb
../../../examples/data_connectors/MyScaleReaderDemo.ipynb
../../../examples/data_connectors/FaissDemo.ipynb
../../../examples/data_connectors/ObsidianReaderDemo.ipynb
../../../examples/data_connectors/SlackDemo.ipynb
../../../examples/data_connectors/WebPageDemo.ipynb
../../../examples/data_connectors/PineconeDemo.ipynb
../../../examples/data_connectors/MboxReaderDemo.ipynb
../../../examples/data_connectors/MilvusReaderDemo.ipynb
../../../examples/data_connectors/NotionDemo.ipynb
../../../examples/data_connectors/GithubRepositoryReaderDemo.ipynb
../../../examples/data_connectors/GoogleDocsDemo.ipynb
../../../examples/data_connectors/DatabaseReaderDemo.ipynb
../../../examples/data_connectors/TwitterDemo.ipynb
../../../examples/data_connectors/WeaviateDemo.ipynb
../../../examples/data_connectors/MakeDemo.ipynb
../../../examples/data_connectors/deplot/DeplotReader.ipynb
```
@@ -0,0 +1,52 @@
# Data Connectors (LlamaHub)
## Concept
A data connector (i.e. `Reader`) ingests data from different data sources and data formats into a simple `Document` representation (text and simple metadata).
```{tip}
Once you've ingested your data, you can build an [Index](/core_modules/data_modules/index/root.md) on top, ask questions using a [Query Engine](/core_modules/query_modules/query_engine/root.md), and have a conversation using a [Chat Engine](/core_modules/query_modules/chat_engines/root.md).
```
## LlamaHub
Our data connectors are offered through [LlamaHub](https://llamahub.ai/) 🦙.
LlamaHub is an open-source repository containing data loaders that you can easily plug and play into any LlamaIndex application.
![](/_static/data_connectors/llamahub.png)
## Usage Pattern
Get started with:
```python
from llama_index import download_loader
GoogleDocsReader = download_loader('GoogleDocsReader')
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=[...])
```
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
Some sample data connectors:
- local file directory (`SimpleDirectoryReader`). Can support parsing a wide range of file types: `.pdf`, `.jpg`, `.png`, `.docx`, etc.
- [Notion](https://developers.notion.com/) (`NotionPageReader`)
- [Google Docs](https://developers.google.com/docs/api) (`GoogleDocsReader`)
- [Slack](https://api.slack.com/) (`SlackReader`)
- [Discord](https://discord.com/developers/docs/intro) (`DiscordReader`)
- [Apify Actors](https://llamahub.ai/l/apify-actor) (`ApifyActor`). Can crawl the web, scrape webpages, extract text content, download files including `.pdf`, `.jpg`, `.png`, `.docx`, etc.
See below for detailed guides.
```{toctree}
---
maxdepth: 2
---
modules.rst
```
@@ -0,0 +1,20 @@
# Usage Pattern
## Get Started
Each data loader contains a "Usage" section showing how that loader can be used. At the core of using each loader is a `download_loader` function, which
downloads the loader file into a module that you can use within your application.
Example usage:
```python
from llama_index import VectorStoreIndex, download_loader
GoogleDocsReader = download_loader('GoogleDocsReader')
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
query_engine.query('Where did the author go to school?')
```
@@ -0,0 +1,64 @@
# Documents / Nodes
## Concept
Document and Node objects are core abstractions within LlamaIndex.
A **Document** is a generic container around any data source - for instance, a PDF, an API output, or retrieved data from a database. Documents can be constructed manually, or created automatically via our data loaders. By default, a Document stores text along with some other attributes. Some of these are listed below.
- `metadata` - a dictionary of annotations that can be appended to the text.
- `relationships` - a dictionary containing relationships to other Documents/Nodes.
*Note*: We have beta support for allowing Documents to store images, and are actively working on improving its multimodal capabilities.
A **Node** represents a "chunk" of a source Document, whether that is a text chunk, an image, or other. Similar to Documents, they contain metadata and relationship information with other nodes.
Nodes are a first-class citizen in LlamaIndex. You can choose to define Nodes and all their attributes directly. You may also choose to "parse" source Documents into Nodes through our `NodeParser` classes. By default, every Node derived from a Document will inherit the same metadata from that Document (e.g. a "file_name" field in the Document is propagated to every Node).
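To make the inheritance rule concrete, here is a minimal plain-Python sketch. It is not LlamaIndex's implementation: `SketchDocument`, `SketchNode`, and `split_into_nodes` are hypothetical names, and the fixed-size character splitting is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class SketchNode:
    """Hypothetical stand-in for a Node: one chunk of a source document."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_into_nodes(doc: SketchDocument, chunk_size: int) -> List[SketchNode]:
    """Split text into fixed-size character chunks (an assumed, naive strategy)."""
    return [
        # Copy the metadata so each node carries its own dictionary.
        SketchNode(text=doc.text[i:i + chunk_size], metadata=dict(doc.metadata))
        for i in range(0, len(doc.text), chunk_size)
    ]


doc = SketchDocument(text="abcdefghij", metadata={"file_name": "notes.txt"})
nodes = split_into_nodes(doc, chunk_size=4)
print([n.text for n in nodes])  # ['abcd', 'efgh', 'ij']
print(nodes[0].metadata)        # {'file_name': 'notes.txt'}
```

Each chunk gets its own copy of the metadata, so editing one node's metadata in this sketch does not affect its siblings.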
## Usage Pattern
Here are some simple snippets to get started with Documents and Nodes.
#### Documents
```python
from llama_index import Document, VectorStoreIndex
text_list = [text1, text2, ...]
documents = [Document(text=t) for t in text_list]
# build index
index = VectorStoreIndex.from_documents(documents)
```
#### Nodes
```python
from llama_index import VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
# load documents
...
# parse nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
# build index
index = VectorStoreIndex(nodes)
```
### Document/Node Usage
Take a look at our in-depth guides for more details on how to use Documents/Nodes.
```{toctree}
---
maxdepth: 1
---
usage_documents.md
usage_nodes.md
usage_metadata_extractor.md
```
@@ -0,0 +1,177 @@
# Defining and Customizing Documents
## Defining Documents
Documents can either be created automatically via data loaders, or constructed manually.
By default, all of our [data loaders](/core_modules/data_modules/connector/root.md) (including those offered on LlamaHub) return `Document` objects through the `load_data` function.
```python
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader('./data').load_data()
```
You can also choose to construct documents manually. LlamaIndex exposes the `Document` struct.
```python
from llama_index import Document
text_list = [text1, text2, ...]
documents = [Document(text=t) for t in text_list]
```
To speed up prototyping and development, you can also quickly create a document using some default text:
```python
document = Document.example()
```
## Customizing Documents
This section covers various ways to customize `Document` objects. Since the `Document` object is a subclass of our `TextNode` object, all these settings and details apply to the `TextNode` object class as well.
### Metadata
Documents also offer the chance to include useful metadata. Using the `metadata` dictionary on each document, additional information can be included to help inform responses and track down sources for query responses. This information can be anything, such as filenames or categories. If you are integrating with a vector database, keep in mind that some vector databases require that the keys must be strings, and the values must be flat (either `str`, `float`, or `int`).
Any information set in the `metadata` dictionary of each document will show up in the `metadata` of each source node created from the document. Additionally, this information is included in the nodes, enabling the index to utilize it on queries and responses. By default, the metadata is injected into the text for both embedding and LLM model calls.
There are a few ways to set up this dictionary:
1. In the document constructor:
```python
document = Document(
text='text',
metadata={
'filename': '<doc_file_name>',
'category': '<category>'
}
)
```
2. After the document is created:
```python
document.metadata = {'filename': '<doc_file_name>'}
```
3. Set the filename automatically using the `SimpleDirectoryReader` and `file_metadata` hook. This will automatically run the hook on each document to set the `metadata` field:
```python
from llama_index import SimpleDirectoryReader
filename_fn = lambda filename: {'file_name': filename}
# automatically sets the metadata of each document according to filename_fn
documents = SimpleDirectoryReader('./data', file_metadata=filename_fn).load_data()
```
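Since some vector databases only accept string keys and flat values, it can help to check a metadata dictionary before attaching it to a document. The helper below is a hypothetical plain-Python sketch (`find_non_flat_metadata` is not part of LlamaIndex), and the exact rules your vector database enforces may differ.

```python
from typing import Any, Dict, List

# Value types treated as "flat" in this sketch.
FLAT_TYPES = (str, int, float)


def find_non_flat_metadata(metadata: Dict[Any, Any]) -> List[str]:
    """Return a description of every entry that breaks the flat-metadata rule."""
    problems = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        if not isinstance(value, FLAT_TYPES):
            problems.append(f"value for {key!r} has type {type(value).__name__}")
    return problems


good = {"filename": "report.pdf", "page": 3, "score": 0.5}
bad = {"filename": "report.pdf", "tags": ["a", "b"]}
print(find_non_flat_metadata(good))  # []
print(find_non_flat_metadata(bad))   # ["value for 'tags' has type list"]
```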
### Customizing the id
As detailed in the section [Document Management](../index/document_management.md), the doc `id_` is used to enable efficient refreshing of documents in the index. When using the `SimpleDirectoryReader`, you can automatically set the doc `id_` to be the full path to each document:
```python
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
print([x.doc_id for x in documents])
```
You can also set the `id_` of any `Document` or `TextNode` directly!
```python
document.id_ = "My new document id!"
```
### Advanced - Metadata Customization
A key detail mentioned above is that by default, any metadata you set is included in the embeddings generation and LLM.
#### Customizing LLM Metadata Text
Typically, a document might have many metadata keys, but you might not want all of them visible to the LLM during response synthesis. In the above examples, we may not want the LLM to read the `file_name` of our document. However, the `file_name` might include information that will help generate better embeddings. A key advantage of doing this is to bias the embeddings for retrieval without changing what the LLM ends up reading.
We can exclude it like so:
```python
document.excluded_llm_metadata_keys = ['file_name']
```
Then, we can test what the LLM will actually end up reading using the `get_content()` function and specifying `MetadataMode.LLM`:
```python
from llama_index.schema import MetadataMode
print(document.get_content(metadata_mode=MetadataMode.LLM))
```
#### Customizing Embedding Metadata Text
Similar to customizing the metadata visible to the LLM, we can also customize the metadata visible to embeddings. In this case, you can specifically exclude metadata visible to the embedding model, in case you DON'T want particular text to bias the embeddings.
```python
document.excluded_embed_metadata_keys = ['file_name']
```
Then, we can test what the embedding model will actually end up reading using the `get_content()` function and specifying `MetadataMode.EMBED`:
```python
from llama_index.schema import MetadataMode
print(document.get_content(metadata_mode=MetadataMode.EMBED))
```
#### Customizing Metadata Format
As you know by now, metadata is injected into the actual text of each document/node when sent to the LLM or embedding model. By default, the format of this metadata is controlled by three attributes:
1. `Document.metadata_seperator` -> default = `"\n"`
When concatenating all key/value fields of your metadata, this field controls the separator between each key/value pair.
2. `Document.metadata_template` -> default = `"{key}: {value}"`
This attribute controls how each key/value pair in your metadata is formatted. The `key` and `value` string keys are required.
3. `Document.text_template` -> default = `"{metadata_str}\n\n{content}"`
Once your metadata is converted into a string using `metadata_seperator` and `metadata_template`, this template controls what that metadata looks like when joined with the text content of your document/node. The `metadata_str` and `content` string keys are required.
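The way these three attributes interact can be sketched in plain Python. This is a simplified model under stated assumptions, not the library's code: `render_content` is a hypothetical function, and details such as how empty metadata is handled may differ in LlamaIndex.

```python
from typing import Dict, Iterable


def render_content(
    text: str,
    metadata: Dict[str, str],
    excluded_keys: Iterable[str] = (),
    metadata_seperator: str = "\n",
    metadata_template: str = "{key}: {value}",
    text_template: str = "{metadata_str}\n\n{content}",
) -> str:
    """Assumed combination logic: format each pair, join them, wrap the text."""
    excluded = set(excluded_keys)
    metadata_str = metadata_seperator.join(
        metadata_template.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded
    )
    return text_template.format(metadata_str=metadata_str, content=text)


metadata = {"file_name": "report.txt", "category": "finance"}

# Defaults, with one key hidden (similar in spirit to excluded_llm_metadata_keys).
default_view = render_content("Hello", metadata, excluded_keys=["file_name"])
print(default_view)

# Custom separator and templates.
custom_view = render_content(
    "Hello",
    metadata,
    metadata_seperator="::",
    metadata_template="{key}=>{value}",
    text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
)
print(custom_view)
```

With the defaults, the first call yields `category: finance`, a blank line, then `Hello`. The second call yields `Metadata: file_name=>report.txt::category=>finance`, a divider line, then `Content: Hello`.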
### Summary
Knowing all this, let's create a short example using all this power:
```python
from llama_index import Document
from llama_index.schema import MetadataMode
document = Document(
text="This is a super-customized document",
metadata={
"file_name": "super_secret_document.txt",
"category": "finance",
"author": "LlamaIndex"
},
excluded_llm_metadata_keys=['file_name'],
metadata_seperator="::",
metadata_template="{key}=>{value}",
text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
)
print("The LLM sees this: \n", document.get_content(metadata_mode=MetadataMode.LLM))
print("The Embedding model sees this: \n", document.get_content(metadata_mode=MetadataMode.EMBED))
```
### Advanced - Automatic Metadata Extraction
We have initial examples of using LLMs themselves to perform metadata extraction.
Take a look here!
```{toctree}
---
maxdepth: 1
---
/examples/metadata_extraction/MetadataExtractionSEC.ipynb
```
@@ -0,0 +1,43 @@
# Automated Metadata Extraction for Nodes
You can use LLMs to automate metadata extraction with our `MetadataExtractor` modules.
Our metadata extractor modules include the following "feature extractors":
- `SummaryExtractor` - automatically extracts a summary over a set of Nodes
- `QuestionsAnsweredExtractor` - extracts a set of questions that each Node can answer
- `TitleExtractor` - extracts a title over the context of each Node
You can use these feature extractors within our overall `MetadataExtractor` class. Then you can plug in the `MetadataExtractor` into our node parser:
```python
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (
MetadataExtractor,
TitleExtractor,
QuestionsAnsweredExtractor
)
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)
metadata_extractor = MetadataExtractor(
extractors=[
TitleExtractor(nodes=5),
QuestionsAnsweredExtractor(questions=3),
],
)
node_parser = SimpleNodeParser(
text_splitter=text_splitter,
metadata_extractor=metadata_extractor,
)
# assume documents are defined -> extract nodes
nodes = node_parser.get_nodes_from_documents(documents)
```
```{toctree}
---
caption: Metadata Extraction Guides
maxdepth: 1
---
/examples/metadata_extraction/MetadataExtractionSEC.ipynb
```
@@ -0,0 +1,35 @@
# Defining and Customizing Nodes
Nodes represent "chunks" of source Documents, whether that is a text chunk, an image, or more. They also contain metadata and relationship information
with other nodes and index structures.
Nodes are a first-class citizen in LlamaIndex. You can choose to define Nodes and all their attributes directly. You may also choose to "parse" source Documents into Nodes through our `NodeParser` classes.
For instance, you can do
```python
from llama_index.node_parser import SimpleNodeParser
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
```
You can also choose to construct Node objects manually and skip the first section. For instance,
```python
from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo
node1 = TextNode(text="<text_chunk>", id_="<node_id>")
node2 = TextNode(text="<text_chunk>", id_="<node_id>")
# set relationships
node1.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=node2.node_id)
node2.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=node1.node_id)
nodes = [node1, node2]
```
The `RelatedNodeInfo` class can also store additional `metadata` if needed:
```python
node2.relationships[NodeRelationship.PARENT] = RelatedNodeInfo(node_id=node1.node_id, metadata={"key": "val"})
```
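To see why previous/next relationships are useful, here is a small plain-Python sketch that follows "next" links to read chunks back in order. The dictionary layout and the `texts_in_order` helper are hypothetical and only mimic the idea of `NodeRelationship.NEXT`; they are not LlamaIndex data structures.

```python
from typing import Dict, List, Optional

# Hypothetical plain-dict nodes: each stores its text and the ids of its neighbors.
nodes: Dict[str, dict] = {
    "n1": {"text": "first chunk", "previous": None, "next": "n2"},
    "n2": {"text": "second chunk", "previous": "n1", "next": "n3"},
    "n3": {"text": "third chunk", "previous": "n2", "next": None},
}


def texts_in_order(start_id: str) -> List[str]:
    """Follow 'next' links from start_id and collect each node's text."""
    texts = []
    current: Optional[str] = start_id
    while current is not None:
        node = nodes[current]
        texts.append(node["text"])
        current = node["next"]
    return texts


print(texts_in_order("n1"))  # ['first chunk', 'second chunk', 'third chunk']
```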
@@ -0,0 +1,156 @@
# Composability
LlamaIndex offers **composability** of your indices, meaning that you can build indices on top of other indices. This allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.
Composability allows you to define lower-level indices for each document, and higher-order indices over a collection of documents. To see how this works, imagine defining 1) a tree index for the text within each document, and 2) a list index over each tree index (one document) within your collection.
### Defining Subindices
To see how this works, imagine you have 3 documents: `doc1`, `doc2`, and `doc3`.
```python
from llama_index import SimpleDirectoryReader
doc1 = SimpleDirectoryReader('data1').load_data()
doc2 = SimpleDirectoryReader('data2').load_data()
doc3 = SimpleDirectoryReader('data3').load_data()
```
![](/_static/composability/diagram_b0.png)
Now let's define a tree index for each document. In order to persist the graph later, each index should share the same storage context.
In Python, we have:
```python
from llama_index import StorageContext, TreeIndex
storage_context = StorageContext.from_defaults()
index1 = TreeIndex.from_documents(doc1, storage_context=storage_context)
index2 = TreeIndex.from_documents(doc2, storage_context=storage_context)
index3 = TreeIndex.from_documents(doc3, storage_context=storage_context)
```
![](/_static/composability/diagram_b1.png)
### Defining Summary Text
You then need to explicitly define *summary text* for each subindex. This allows
the subindices to be used as Documents for higher-level indices.
```python
index1_summary = "<summary1>"
index2_summary = "<summary2>"
index3_summary = "<summary3>"
```
You may choose to manually specify the summary text, or use LlamaIndex itself to generate
a summary, for instance with the following:
```python
summary = index1.as_query_engine(retriever_mode="all_leaf").query(
    "What is a summary of this document?"
)
index1_summary = str(summary)
```
**If specified**, this summary text for each subindex can be used to refine the answer during query-time.
### Creating a Graph with a Top-Level Index
We can then create a graph with a list index on top of these 3 tree indices.
We can query, save, and load the graph to/from disk like any other index.
```python
from llama_index import ListIndex
from llama_index.indices.composability import ComposableGraph
graph = ComposableGraph.from_indices(
ListIndex,
[index1, index2, index3],
index_summaries=[index1_summary, index2_summary, index3_summary],
storage_context=storage_context,
)
```
![](/_static/composability/diagram.png)
### Querying the Graph
During a query, we would start with the top-level list index. Each node in the list corresponds to an underlying tree index.
The query will be executed recursively, starting from the root index, then the sub-indices.
The default query engine for each index is called under the hood (i.e. `index.as_query_engine()`), unless otherwise configured by passing `custom_query_engines` to the `ComposableGraphQueryEngine`.
Below we show an example that configures the tree index retrievers to use `child_branch_factor=2` (instead of the default `child_branch_factor=1`).
More detail on how to configure `ComposableGraphQueryEngine` can be found [here](/api_reference/query/query_engines/graph_query_engine.rst).
```python
# set custom query engines, one per sub-index, keyed by index id
custom_query_engines = {
index.index_id: index.as_query_engine(
child_branch_factor=2
)
for index in [index1, index2, index3]
}
query_engine = graph.as_query_engine(
custom_query_engines=custom_query_engines
)
response = query_engine.query("Where did the author grow up?")
```
> Note that specifying a custom query engine for an index by its id
> might require you to inspect e.g., `index1.index_id`.
> Alternatively, you can explicitly set it as follows:
```python
index1.set_index_id("<index_id_1>")
index2.set_index_id("<index_id_2>")
index3.set_index_id("<index_id_3>")
```
![](/_static/composability/diagram_q1.png)
So within a node, instead of fetching the text, we would recursively query the stored tree index to retrieve our answer.
![](/_static/composability/diagram_q2.png)
NOTE: You can stack indices as many times as you want, depending on the hierarchies of your knowledge base!
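The recursive behaviour described in this section can be illustrated with a toy model in plain Python. This is only a sketch: `LeafIndex` and `ListOverIndices` are hypothetical classes, and the word-overlap "retrieval" is an assumption standing in for what real indices and LLM calls would do.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class LeafIndex:
    """Hypothetical sub-index holding plain text chunks."""
    chunks: List[str]

    def query(self, question: str) -> str:
        # Toy retrieval: keep chunks that share at least one word with the question.
        words = set(question.lower().split())
        hits = [c for c in self.chunks if words & set(c.lower().split())]
        return " ".join(hits)


@dataclass
class ListOverIndices:
    """Hypothetical top-level index whose entries are other indices."""
    children: List[Union["ListOverIndices", LeafIndex]]

    def query(self, question: str) -> str:
        # Instead of reading stored text, recursively query each child index.
        answers = [child.query(question) for child in self.children]
        return " | ".join(a for a in answers if a)


nyc = LeafIndex(["new york is large"])
essay = LeafIndex(["the author grew up in england"])
graph = ListOverIndices([nyc, essay])
stacked = ListOverIndices([graph])  # indices can be stacked again
answer = stacked.query("where did the author grow up")
print(answer)  # the author grew up in england
```

Because every level only relies on its children exposing `query`, the same pattern works regardless of how many layers are stacked.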
### [Optional] Persisting the Graph
The graph can also be persisted to storage, and then loaded again when needed. Note that you'll need to set the
ID of the root index, or keep track of the default.
```python
# set the ID
graph.root_index.set_index_id("my_id")
# persist to storage
graph.root_index.storage_context.persist(persist_dir="./storage")
# load
from llama_index import StorageContext, load_graph_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
graph = load_graph_from_storage(storage_context, root_id="my_id")
```
We can take a look at a code example below as well. We first build two tree indices, one over the Wikipedia NYC page, and the other over Paul Graham's essay. We then define a keyword extractor index over the two tree indices.
[Here is an example notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/ComposableIndices.ipynb).
```{toctree}
---
caption: Examples
maxdepth: 1
---
../../../../examples/composable_indices/ComposableIndices-Prior.ipynb
../../../../examples/composable_indices/ComposableIndices-Weaviate.ipynb
../../../../examples/composable_indices/ComposableIndices.ipynb
```
@@ -0,0 +1,109 @@
# Document Management
Most LlamaIndex index structures allow for **insertion**, **deletion**, **update**, and **refresh** operations.
## Insertion
You can "insert" a new Document into any index data structure, after building the index initially. This document will be broken down into nodes and ingested into the index.
The underlying mechanism behind insertion depends on the index structure. For instance, for the list index, a new Document is inserted as additional node(s) in the list.
For the vector store index, a new Document (and embeddings) is inserted into the underlying document/embedding store.
An example notebook showcasing our insert capabilities is given [here](https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/InsertDemo.ipynb).
In this notebook we showcase how to construct an empty index, manually create Document objects, and add those to our index data structures.
An example code snippet is given below:
```python
from llama_index import ListIndex, Document
index = ListIndex([])
text_chunks = ['text_chunk_1', 'text_chunk_2', 'text_chunk_3']
doc_chunks = []
for i, text in enumerate(text_chunks):
    doc = Document(text=text, id_=f"doc_id_{i}")
    doc_chunks.append(doc)
# insert
for doc_chunk in doc_chunks:
    index.insert(doc_chunk)
```
## Deletion
You can "delete" a Document from most index data structures by specifying a document_id. (**NOTE**: the tree index currently does not support deletion). All nodes corresponding to the document will be deleted.
```python
index.delete_ref_doc("doc_id_0", delete_from_docstore=True)
```
`delete_from_docstore` defaults to `False` in case you are sharing nodes between indexes using the same docstore. However, even when it is `False`, the deleted document's nodes will no longer be used when querying, because they are removed from the `index_struct` of the index, which keeps track of which nodes can be used for querying.
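The distinction between the index structure and the docstore can be sketched with two plain Python containers. This is a simplified, hypothetical model (the `docstore` dict, `index_struct` list, and this `delete_ref_doc` function are stand-ins, not the library's internals).

```python
from typing import Dict, List, Tuple

# Hypothetical shared docstore: node id -> (source doc id, text).
docstore: Dict[str, Tuple[str, str]] = {
    "node_a": ("doc_id_0", "text from document 0"),
    "node_b": ("doc_id_1", "text from document 1"),
}
# Hypothetical index structure: the node ids this index may use when querying.
index_struct: List[str] = ["node_a", "node_b"]


def delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False) -> None:
    """Drop a document's nodes from the index, and optionally from the docstore."""
    node_ids = [nid for nid, (doc_id, _) in docstore.items() if doc_id == ref_doc_id]
    for node_id in node_ids:
        if node_id in index_struct:
            index_struct.remove(node_id)
        if delete_from_docstore:
            del docstore[node_id]


delete_ref_doc("doc_id_0")
print(index_struct)          # ['node_b']
print("node_a" in docstore)  # True: another index sharing the docstore can still use it
```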
## Update
If a Document is already present within an index, you can "update" a Document with the same doc `id_` (for instance, if the information in the Document has changed).
```python
# NOTE: the document has a `doc_id` specified
doc_chunks[0].text = "Brand new document text"
index.update_ref_doc(
doc_chunks[0],
update_kwargs={"delete_kwargs": {'delete_from_docstore': True}}
)
```
Here, we passed some extra kwargs to ensure the document is deleted from the docstore. This is of course optional.
## Refresh
If you set the doc `id_` of each document when loading your data, you can also automatically refresh the index.
The `refresh()` function will only update documents that have the same doc `id_` but different text contents. Any documents not present in the index at all will also be inserted.
`refresh()` also returns a boolean list, indicating which documents in the input have been refreshed in the index.
```python
# modify first document, with the same doc_id
doc_chunks[0] = Document(text='Super new document text', id_="doc_id_0")
# add a new document
doc_chunks.append(Document(text="This isn't in the index yet, but it will be soon!", id_="doc_id_3"))
# refresh the index
refreshed_docs = index.refresh_ref_docs(
doc_chunks,
update_kwargs={"delete_kwargs": {'delete_from_docstore': True}}
)
# refreshed_docs[0] and refreshed_docs[-1] should be true
```
Again, we passed some extra kwargs to ensure the document is deleted from the docstore. This is of course optional.
If you `print()` the output of `refresh()`, you would see which input documents were refreshed:
```python
print(refreshed_docs)
# > [True, False, False, True]
```
This is most useful when you are reading from a directory that is constantly updating with new information.
To automatically set the doc `id_` when using the `SimpleDirectoryReader`, you can set the `filename_as_id` flag. More details can be found [here](../customization/custom_documents.md).
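The refresh rule (update on changed text, insert on unknown id, skip otherwise) can be sketched in plain Python. This sketch assumes changes are detected by comparing a hash of each document's text; the `refresh` function and `stored` mapping here are hypothetical, and the library's actual bookkeeping may differ.

```python
import hashlib
from typing import Dict, List, Tuple


def text_hash(text: str) -> str:
    """Fingerprint a document's text so changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh(stored: Dict[str, str], docs: List[Tuple[str, str]]) -> List[bool]:
    """Insert new docs, update changed ones, and report which inputs were touched.

    `stored` maps doc id -> text hash; `docs` is a list of (doc id, text) pairs.
    """
    refreshed = []
    for doc_id, text in docs:
        new_hash = text_hash(text)
        changed = stored.get(doc_id) != new_hash  # unknown ids also count as changed
        if changed:
            stored[doc_id] = new_hash
        refreshed.append(changed)
    return refreshed


stored = {f"doc_id_{i}": text_hash(f"text_chunk_{i + 1}") for i in range(3)}
docs = [
    ("doc_id_0", "Super new document text"),  # same id, new text -> updated
    ("doc_id_1", "text_chunk_2"),             # unchanged
    ("doc_id_2", "text_chunk_3"),             # unchanged
    ("doc_id_3", "Not in the index yet"),     # new id -> inserted
]
result = refresh(stored, docs)
print(result)  # [True, False, False, True]
```

Running the same refresh a second time would report no changes, which is what makes the operation cheap to repeat on a directory that updates often.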
## Document Tracking
For any index that uses the docstore (i.e. all indexes except for most vector store integrations), you can also see which documents you have inserted into the docstore.
```python
print(index.ref_doc_info)
# > {'doc_id_1': RefDocInfo(node_ids=['071a66a8-3c47-49ad-84fa-7010c6277479'], metadata={}),
#    'doc_id_2': RefDocInfo(node_ids=['9563e84b-f934-41c3-acfd-22e88492c869'], metadata={}),
#    'doc_id_0': RefDocInfo(node_ids=['b53e6c2f-16f7-4024-af4c-42890e945f36'], metadata={}),
#    'doc_id_3': RefDocInfo(node_ids=['6bedb29f-15db-4c7c-9885-7490e10aa33f'], metadata={})}
```
Each entry in the output has an ingested doc `id_` as its key, along with the `node_ids` of the nodes that document was split into.
Lastly, the original `metadata` dictionary of each input document is also tracked. You can read more about the `metadata` attribute in [Customizing Documents](../customization/custom_documents.md).
@@ -0,0 +1,70 @@
# How Each Index Works
This guide describes how each index works with diagrams.
Some terminology:
- **Node**: Corresponds to a chunk of text from a Document. LlamaIndex takes in Document objects and internally parses/chunks them into Node objects.
- **Response Synthesis**: Our module which synthesizes a response given the retrieved Node. You can see how to
[specify different response modes](setting-response-mode) here.
## List Index
The list index simply stores Nodes as a sequential chain.
![](/_static/indices/list.png)
### Querying
During query time, if no other query parameters are specified, LlamaIndex simply loads all Nodes in the list into
our Response Synthesis module.
![](/_static/indices/list_query.png)
The list index also offers other ways of querying, such as an embedding-based query that
fetches the top-k neighbors, or the addition of a keyword filter, as seen below:
![](/_static/indices/list_filter_query.png)
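As a rough illustration of the default and keyword-filtered behaviour, here is a plain-Python sketch. `list_index_retrieve` is a hypothetical function, and simple substring matching is an assumption standing in for the real keyword filter.

```python
from typing import List, Optional, Sequence


def list_index_retrieve(
    nodes: Sequence[str], required_keywords: Optional[Sequence[str]] = None
) -> List[str]:
    """Return every node in order, optionally keeping only keyword matches."""
    if not required_keywords:
        return list(nodes)
    wanted = [k.lower() for k in required_keywords]
    return [n for n in nodes if all(k in n.lower() for k in wanted)]


nodes = ["Paris is in France", "Berlin is in Germany", "France borders Germany"]
print(list_index_retrieve(nodes))              # all three nodes, in order
print(list_index_retrieve(nodes, ["france"]))  # ['Paris is in France', 'France borders Germany']
```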
## Vector Store Index
The vector store index stores each Node and a corresponding embedding in a [Vector Store](vector-store-index).
![](/_static/indices/vector_store.png)
### Querying
Querying a vector store index involves fetching the top-k most similar Nodes, and passing
those into our Response Synthesis module.
![](/_static/indices/vector_store_query.png)
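Top-k retrieval can be sketched in a few lines of plain Python. This example assumes cosine similarity as the scoring function and uses tiny hand-written vectors in place of real embeddings; `cosine_similarity`, `top_k`, and the `store` dictionary are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        store,
        key=lambda node_id: cosine_similarity(query_vec, store[node_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings": node id -> vector.
store = {
    "node_a": [1.0, 0.0],
    "node_b": [0.8, 0.6],
    "node_c": [0.0, 1.0],
}
print(top_k([1.0, 0.1], store, k=2))  # ['node_a', 'node_b']
```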
## Tree Index
The tree index builds a hierarchical tree from a set of Nodes (which become leaf nodes in this tree).
![](/_static/indices/tree.png)
### Querying
Querying a tree index involves traversing from root nodes down
to leaf nodes. By default, (`child_branch_factor=1`), a query
chooses one child node given a parent node. If `child_branch_factor=2`, a query
chooses two child nodes per level.
![](/_static/indices/tree_query.png)
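The effect of `child_branch_factor` can be sketched with a toy tree in plain Python. This is a simplification: it assumes a single root and a word-overlap score for choosing children, whereas in practice the choice is made with the help of the LLM. `TreeNode`, `overlap`, and `traverse` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    text: str
    children: List["TreeNode"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def traverse(node: TreeNode, query: str, child_branch_factor: int = 1) -> List[str]:
    """Walk from `node` down to leaves, following the best children at each level."""
    if not node.children:
        return [node.text]
    best = sorted(node.children, key=lambda c: overlap(query, c.text), reverse=True)
    leaves = []
    for child in best[:child_branch_factor]:
        leaves.extend(traverse(child, query, child_branch_factor))
    return leaves


root = TreeNode("summary of animals and plants", [
    TreeNode("animals cats dogs", [TreeNode("cats purr"), TreeNode("dogs bark")]),
    TreeNode("plants trees flowers", [TreeNode("trees grow tall"), TreeNode("flowers bloom")]),
])
print(traverse(root, "do cats purr"))                              # ['cats purr']
print(len(traverse(root, "do cats purr", child_branch_factor=2)))  # 4
```

With a branch factor of 1 the query reaches a single leaf; with 2 it explores both children at every level and reaches all four leaves of this small tree.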
## Keyword Table Index
The keyword table index extracts keywords from each Node and builds a mapping from
each keyword to the corresponding Nodes of that keyword.
![](/_static/indices/keyword.png)
### Querying
During query time, we extract relevant keywords from the query, and match those with pre-extracted
Node keywords to fetch the corresponding Nodes. The extracted Nodes are passed to our
Response Synthesis module.
![](/_static/indices/keyword_query.png)
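The keyword-to-Node mapping and its lookup can be sketched in plain Python. The keyword extraction here (lowercased words minus a tiny stopword list) is an assumption for illustration only, and `extract_keywords`, `build_keyword_table`, and `retrieve` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "is", "in", "of", "what", "and"}


def extract_keywords(text: str) -> Set[str]:
    """Toy keyword extractor: lowercase words minus a small stopword list."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS


def build_keyword_table(nodes: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each keyword to the ids of the nodes that contain it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def retrieve(table: Dict[str, Set[str]], query: str) -> List[str]:
    """Collect the ids of all nodes that share a keyword with the query."""
    matched: Set[str] = set()
    for keyword in extract_keywords(query):
        matched |= table.get(keyword, set())
    return sorted(matched)


nodes = {
    "n1": "The capital of France is Paris.",
    "n2": "Berlin is the capital of Germany.",
    "n3": "Paris hosts the Louvre.",
}
table = build_keyword_table(nodes)
print(retrieve(table, "What is in Paris?"))  # ['n1', 'n3']
```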
@@ -0,0 +1,71 @@
# Metadata Extraction
## Introduction
In many cases, especially with long documents, a chunk of text may lack the context necessary to disambiguate the chunk from other similar chunks of text.
To combat this, we use LLMs to extract certain contextual information relevant to the document to better help the retrieval and language models disambiguate similar-looking passages.
We show this in an [example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/metadata_extraction/MetadataExtractionSEC.ipynb) and demonstrate its effectiveness in processing long documents.
## Usage
First, we define a metadata extractor that takes in a list of feature extractors that will be processed in sequence.
We then feed this to the node parser, which will add the additional metadata to each node.
```python
from llama_index.node_parser import SimpleNodeParser
from llama_index.node_parser.extractors import (
MetadataExtractor,
SummaryExtractor,
QuestionsAnsweredExtractor,
TitleExtractor,
KeywordExtractor,
)
metadata_extractor = MetadataExtractor(
extractors=[
TitleExtractor(nodes=5),
QuestionsAnsweredExtractor(questions=3),
SummaryExtractor(summaries=["prev", "self"]),
KeywordExtractor(keywords=10),
],
)
node_parser = SimpleNodeParser(
metadata_extractor=metadata_extractor,
)
```
Here is a sample of extracted metadata:
```
{'page_label': '2',
'file_name': '10k-132.pdf',
'document_title': 'Uber Technologies, Inc. 2019 Annual Report: Revolutionizing Mobility and Logistics Across 69 Countries and 111 Million MAPCs with $65 Billion in Gross Bookings',
'questions_this_excerpt_can_answer': '\n\n1. How many countries does Uber Technologies, Inc. operate in?\n2. What is the total number of MAPCs served by Uber Technologies, Inc.?\n3. How much gross bookings did Uber Technologies, Inc. generate in 2019?',
'prev_section_summary': "\n\nThe 2019 Annual Report provides an overview of the key topics and entities that have been important to the organization over the past year. These include financial performance, operational highlights, customer satisfaction, employee engagement, and sustainability initiatives. It also provides an overview of the organization's strategic objectives and goals for the upcoming year.",
'section_summary': '\nThis section discusses a global tech platform that serves multiple multi-trillion dollar markets with products leveraging core technology and infrastructure. It enables consumers and drivers to tap a button and get a ride or work. The platform has revolutionized personal mobility with ridesharing and is now leveraging its platform to redefine the massive meal delivery and logistics industries. The foundation of the platform is its massive network, leading technology, operational excellence, and product expertise.',
'excerpt_keywords': '\nRidesharing, Mobility, Meal Delivery, Logistics, Network, Technology, Operational Excellence, Product Expertise, Point A, Point B'}
```
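The idea of running feature extractors in sequence and merging their results per node can be sketched in plain Python. This is a hypothetical model: the two toy extractors and `run_extractors` are stand-ins that use no LLM and are not the library's classes.

```python
from typing import Callable, Dict, List

# An extractor takes all chunks and returns one metadata dict per chunk.
Extractor = Callable[[List[str]], List[Dict[str, str]]]


def title_extractor(chunks: List[str]) -> List[Dict[str, str]]:
    """Toy stand-in: use the first three words of each chunk as its title."""
    return [{"document_title": " ".join(c.split()[:3])} for c in chunks]


def length_extractor(chunks: List[str]) -> List[Dict[str, str]]:
    """Toy stand-in: record the character count of each chunk."""
    return [{"char_count": str(len(c))} for c in chunks]


def run_extractors(chunks: List[str], extractors: List[Extractor]) -> List[Dict[str, str]]:
    """Run each extractor in sequence and merge its output per chunk."""
    merged: List[Dict[str, str]] = [{} for _ in chunks]
    for extractor in extractors:
        for target, extracted in zip(merged, extractor(chunks)):
            target.update(extracted)
    return merged


chunks = ["Uber operates in many countries", "Gross bookings grew"]
metadata = run_extractors(chunks, [title_extractor, length_extractor])
print(metadata[0])  # {'document_title': 'Uber operates in', 'char_count': '31'}
```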
## Custom Extractors
If the provided extractors do not fit your needs, you can also define a custom extractor like so:
```python
from typing import Dict, List
from llama_index.node_parser.extractors import MetadataFeatureExtractor
class CustomExtractor(MetadataFeatureExtractor):
    def extract(self, nodes) -> List[Dict]:
        # combine two existing metadata fields into a new "custom" field
        metadata_list = [
            {
                "custom": node.metadata["document_title"]
                + "\n"
                + node.metadata["excerpt_keywords"]
            }
            for node in nodes
        ]
        return metadata_list
```
In a more advanced example, it can also make use of an `llm_predictor` to extract features from the node content and the existing metadata. Refer to the [source code of the provided metadata extractors](https://github.com/jerryjliu/llama_index/blob/main/llama_index/node_parser/extractors/metadata_extractors.py) for more details.
@@ -0,0 +1,16 @@
# Module Guides
```{toctree}
---
maxdepth: 1
---
vector_store_guide.ipynb
List Index <./index_guide.md>
Tree Index <./index_guide.md>
Keyword Table Index <./index_guide.md>
/examples/index_structs/knowledge_graph/KnowledgeGraphDemo.ipynb
/examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.ipynb
SQL Index </examples/index_structs/struct_indices/SQLIndexDemo.ipynb>
/examples/index_structs/struct_indices/duckdb_sql_query.ipynb
/examples/index_structs/doc_summary/DocSummary.ipynb
```
@@ -0,0 +1,56 @@
# Indexes
## Concept
An `Index` is a data structure that allows us to quickly retrieve relevant context for a user query.
For LlamaIndex, it's the core foundation for retrieval-augmented generation (RAG) use-cases.
At a high-level, `Indices` are built from [Documents](/core_modules/data_modules/documents_and_nodes/root.md).
They are used to build [Query Engines](/core_modules/query_modules/query_engine/root.md) and [Chat Engines](/core_modules/query_modules/chat_engines/root.md),
which enable question & answer and chat over your data.
Under the hood, `Indices` store data in `Node` objects (which represent chunks of the original documents), and expose a [Retriever](/core_modules/query_modules/retriever/root.md) interface that supports additional configuration and automation.
For a more in-depth explanation, check out our guide below:
```{toctree}
---
maxdepth: 1
---
index_guide.md
```
## Usage Pattern
Get started with:
```python
from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(docs)
```
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
```{toctree}
---
maxdepth: 2
---
modules.md
```
## Advanced Concepts
```{toctree}
---
maxdepth: 1
---
composability.md
```
@@ -0,0 +1,88 @@
# Usage Pattern
## Get Started
Build an index from documents:
```python
from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(docs)
```
```{tip}
To learn how to load documents, see [Data Connectors](/core_modules/data_modules/connector/root.md)
```
### What is happening under the hood?
1. Documents are chunked up and parsed into `Node` objects (lightweight abstractions over text strings that additionally keep track of metadata and relationships).
2. Additional computation is performed to add each `Node` to the index data structure.
> Note: the computation is index-specific.
>
> - For a vector store index, this means calling an embedding model (via API or locally) to compute embeddings for the `Node` objects
> - For a document summary index, this means calling an LLM to generate a summary
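
The two steps above can be sketched in plain Python. This is a toy model under stated assumptions, not the library's implementation: `ToyNode`, `chunk_words`, `toy_embed`, and `build_toy_nodes` are hypothetical names, chunks are counted in whitespace-separated words, and the "embedding" is a placeholder vector where a real index would call an embedding model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Node: text plus metadata and relationships."""

    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    prev_index: Optional[int] = None
    next_index: Optional[int] = None
    embedding: Optional[List[float]] = None


def chunk_words(text: str, chunk_size: int) -> List[str]:
    # Split on whitespace and group the words into fixed-size chunks.
    words = text.split()
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(text: str) -> List[float]:
    # Placeholder "embedding": [character count, word count].
    return [float(len(text)), float(len(text.split()))]


def build_toy_nodes(text: str, metadata: Dict[str, str], chunk_size: int) -> List[ToyNode]:
    chunks = chunk_words(text, chunk_size)
    nodes = []
    for i, chunk in enumerate(chunks):
        nodes.append(
            ToyNode(
                text=chunk,
                metadata=dict(metadata),  # step 1: each chunk inherits the document metadata
                prev_index=i - 1 if i > 0 else None,
                next_index=i + 1 if i < len(chunks) - 1 else None,
            )
        )
    for node in nodes:
        node.embedding = toy_embed(node.text)  # step 2: index-specific computation
    return nodes


toy_nodes = build_toy_nodes("a b c d e f g", {"file_name": "demo.txt"}, chunk_size=3)
print([n.text for n in toy_nodes])
```

The second loop is the part that varies by index type: a vector store index would compute embeddings there, while a document summary index would call an LLM instead.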
## Configuring Document Parsing
The most common configuration you might want to change is how documents are parsed into `Node` objects.
### High-Level API
We can configure our service context to use the desired chunk size and set `show_progress` to display a progress bar during index construction.
```python
from llama_index import ServiceContext, VectorStoreIndex
service_context = ServiceContext.from_defaults(chunk_size=512)
index = VectorStoreIndex.from_documents(
docs,
service_context=service_context,
show_progress=True
)
```
> Note: While the high-level API optimizes for ease of use, it does _NOT_ expose the full range of configurability.
### Low-Level API
You can use the low-level composition API if you need more granular control.
Here we show an example where you want to modify the text chunk size, disable injecting metadata, and disable creating `Node` relationships.
The steps are:
1. Configure a node parser
```python
from llama_index.node_parser import SimpleNodeParser
parser = SimpleNodeParser.from_defaults(
chunk_size=512,
include_metadata=False,
include_prev_next_rel=False,
)
```
2. Parse document into `Node` objects
```python
nodes = parser.get_nodes_from_documents(docs)
```
3. Build index from `Node` objects
```python
index = VectorStoreIndex(nodes)
```
## Handling Document Update
Read more about how to deal with data sources that change over time with `Index` **insertion**, **deletion**, **update**, and **refresh** operations.
```{toctree}
---
maxdepth: 1
---
metadata_extraction.md
document_management.md
```
@@ -0,0 +1,321 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vector Store Index\n",
"\n",
"In this guide, we show how to use the vector store index with different vector store\n",
"implementations. \n",
" \n",
"We start with a few lines of code using the default\n",
"in-memory vector store and default query configuration, then move on to a custom hosted vector\n",
"store with advanced settings such as metadata filters.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Construct vector store and index\n",
"**Default**\n",
"\n",
"By default, `VectorStoreIndex` uses an in-memory `SimpleVectorStore`\n",
"that's initialized as part of the default storage context."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
"\n",
"# Load documents and build index\n",
"documents = SimpleDirectoryReader(\"../../examples/data/paul_graham\").load_data()\n",
"index = VectorStoreIndex.from_documents(documents)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Custom vector stores**\n",
"\n",
"You can use a custom vector store (in this case `PineconeVectorStore`) as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pinecone\n",
"from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext\n",
"from llama_index.vector_stores import PineconeVectorStore\n",
"\n",
"# init pinecone\n",
"pinecone.init(api_key=\"<api_key>\", environment=\"<environment>\")\n",
"pinecone.create_index(\"quickstart\", dimension=1536, metric=\"euclidean\", pod_type=\"p1\")\n",
"\n",
"# construct vector store and customize storage context\n",
"storage_context = StorageContext.from_defaults(\n",
" vector_store=PineconeVectorStore(pinecone.Index(\"quickstart\"))\n",
")\n",
"\n",
"# Load documents and build index\n",
"documents = SimpleDirectoryReader(\"../../examples/data/paul_graham\").load_data()\n",
"index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"For more examples of how to initialize different vector stores, \n",
"see [Vector Store Integrations](/how_to/integrations/vector_stores.md)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to external vector stores (with existing embeddings)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have already computed embeddings and dumped them into an external vector store (e.g. Pinecone, Chroma), you can use it with LlamaIndex by:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vector_store = PineconeVectorStore(pinecone.Index(\"quickstart\"))\n",
"index = VectorStoreIndex.from_vector_store(vector_store=vector_store)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Query\n",
"**Default** \n",
"\n",
"You can start querying by getting the default query engine:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query_engine = index.as_query_engine()\n",
"response = query_engine.query(\"What did the author do growing up?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Configure standard query setting** \n",
"\n",
"To configure query settings, you can pass them directly as\n",
"keyword args when building the query engine: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters\n",
"\n",
"query_engine = index.as_query_engine(\n",
" similarity_top_k=3,\n",
" vector_store_query_mode=\"default\",\n",
" filters=MetadataFilters(\n",
" filters=[\n",
" ExactMatchFilter(key=\"name\", value=\"paul graham\"),\n",
" ]\n",
" ),\n",
" alpha=None,\n",
" doc_ids=None,\n",
")\n",
"response = query_engine.query(\"what did the author do growing up?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that metadata filtering is applied against metadata specified in `Node.metadata`."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, if you are using the lower-level compositional API:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index import get_response_synthesizer\n",
"from llama_index.indices.vector_store.retrievers import VectorIndexRetriever\n",
"from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine\n",
"\n",
"# build retriever\n",
"retriever = VectorIndexRetriever(\n",
" index=index,\n",
" similarity_top_k=3,\n",
" vector_store_query_mode=\"default\",\n",
"    filters=MetadataFilters(\n",
"        filters=[ExactMatchFilter(key=\"name\", value=\"paul graham\")]\n",
"    ),\n",
" alpha=None,\n",
" doc_ids=None,\n",
")\n",
"\n",
"# build query engine\n",
"query_engine = RetrieverQueryEngine(\n",
" retriever=retriever, response_synthesizer=get_response_synthesizer()\n",
")\n",
"\n",
"# query\n",
"response = query_engine.query(\"what did the author do growing up?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Configure vector store specific keyword arguments** \n",
"\n",
"You can customize keyword arguments unique to a specific vector store implementation as well by passing in `vector_store_kwargs`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query_engine = index.as_query_engine(\n",
" similarity_top_k=3,\n",
" # only works for pinecone\n",
" vector_store_kwargs={\n",
" \"filter\": {\"name\": \"paul graham\"},\n",
" },\n",
")\n",
"response = query_engine.query(\"what did the author do growing up?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Use an auto retriever**\n",
"\n",
"You can also use an LLM to automatically decide query settings for you! \n",
"Right now, we support automatically setting exact match metadata filters and top k parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index import get_response_synthesizer\n",
"from llama_index.indices.vector_store.retrievers import VectorIndexAutoRetriever\n",
"from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine\n",
"from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo\n",
"\n",
"\n",
"vector_store_info = VectorStoreInfo(\n",
" content_info=\"brief biography of celebrities\",\n",
" metadata_info=[\n",
" MetadataInfo(\n",
" name=\"category\",\n",
" type=\"str\",\n",
" description=\"Category of the celebrity, one of [Sports, Entertainment, Business, Music]\",\n",
" ),\n",
" MetadataInfo(\n",
" name=\"country\",\n",
" type=\"str\",\n",
" description=\"Country of the celebrity, one of [United States, Barbados, Portugal]\",\n",
" ),\n",
" ],\n",
")\n",
"\n",
"# build retriever\n",
"retriever = VectorIndexAutoRetriever(index, vector_store_info=vector_store_info)\n",
"\n",
"# build query engine\n",
"query_engine = RetrieverQueryEngine(\n",
" retriever=retriever, response_synthesizer=get_response_synthesizer()\n",
")\n",
"\n",
"# query\n",
"response = query_engine.query(\"Tell me about two celebrities from United States\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "llama",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,24 @@
# Node Parser
## Concept
Node parsers are a simple abstraction that takes a list of documents and chunks them into `Node` objects, such that each node is a specific size. When a document is broken into nodes, all of its attributes are inherited by the child nodes (i.e. `metadata`, text and metadata templates, etc.). You can read more about `Node` and `Document` properties [here](/core_modules/data_modules/documents_and_nodes/root.md).
A node parser can configure the chunk size (in tokens) as well as any overlap between chunked nodes. The chunking is done using a `TokenTextSplitter`, which defaults to a chunk size of 1024 tokens and a chunk overlap of 20 tokens.
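To make the interaction between chunk size and overlap concrete, here is a minimal sliding-window sketch. It is an illustration only: `chunk_with_overlap` is a hypothetical helper, and plain list items stand in for tokenizer tokens. The real `TokenTextSplitter` also takes separators into account, so its chunk boundaries will differ.

```python
from typing import List


def chunk_with_overlap(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, consecutive chunks share exactly one token, which is how overlap preserves some context across chunk boundaries.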
## Usage Pattern
```python
from llama_index.node_parser import SimpleNodeParser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
```
You can find more usage details and available customization options below.
```{toctree}
---
maxdepth: 1
---
usage_pattern.md
```
@@ -0,0 +1,80 @@
# Usage Pattern
## Getting Started
Node parsers can be used on their own:
```python
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents([Document(text="long text")], show_progress=False)
```
Or set inside a `ServiceContext` to be used automatically when an index is constructed using `.from_documents()`:
```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.node_parser import SimpleNodeParser
documents = SimpleDirectoryReader("./data").load_data()
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
service_context = ServiceContext.from_defaults(node_parser=node_parser)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```
## Customization
There are several options available to customize:
- `text_splitter` (defaults to `TokenTextSplitter`) - the text splitter used to split text into chunks.
- `include_metadata` (defaults to `True`) - whether or not `Node`s should inherit the document metadata.
- `include_prev_next_rel` (defaults to `True`) - whether or not to include previous/next relationships between chunked `Node`s
- `metadata_extractor` (defaults to `None`) - extra processing to extract helpful metadata. See [here for details](/core_modules/data_modules/documents_and_nodes/usage_metadata_extractor.md).
If you don't want to change the `text_splitter`, you can use `SimpleNodeParser.from_defaults()` to easily change the chunk size and chunk overlap. The defaults are 1024 and 20 respectively.
```python
from llama_index.node_parser import SimpleNodeParser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
```
### Text Splitter Customization
If you do customize the `text_splitter` from the default `TokenTextSplitter`, you can use any splitter from langchain, or optionally our `SentenceSplitter`. Each text splitter has options for the default separator, as well as options for backup separators. These are useful for languages that are sufficiently different from English.
`TokenTextSplitter` configuration:
```python
from llama_index.node_parser import SimpleNodeParser
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
text_splitter = TokenTextSplitter(
    separator=" ",
    chunk_size=1024,
    chunk_overlap=20,
    backup_separators=["\n"]
)
node_parser = SimpleNodeParser(text_splitter=text_splitter)
```
`SentenceSplitter` configuration:
```python
from llama_index.node_parser import SimpleNodeParser
from llama_index.langchain_helpers.text_splitter import SentenceSplitter
text_splitter = SentenceSplitter(
    separator=" ",
    chunk_size=1024,
    chunk_overlap=20,
    backup_separators=["\n"],
    paragraph_separator="\n\n\n"
)
node_parser = SimpleNodeParser(text_splitter=text_splitter)
```
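The role of backup separators can be illustrated with a small stand-alone sketch. It assumes a simplified rule: split on the primary separator first, then re-split any piece that is still too long on each backup separator in turn. `split_with_backups` and `max_chars` are hypothetical, length is measured in characters instead of tokens, and the library's splitters may behave differently in detail.

```python
from typing import List


def split_with_backups(
    text: str, separator: str, backup_separators: List[str], max_chars: int
) -> List[str]:
    """Split on `separator`; pieces longer than max_chars are re-split on each backup in turn."""
    pieces = [p for p in text.split(separator) if p]
    for backup in backup_separators:
        refined = []
        for piece in pieces:
            if len(piece) > max_chars:
                refined.extend(p for p in piece.split(backup) if p)
            else:
                refined.append(piece)
        pieces = refined
    return pieces


# Japanese text has no spaces, so the primary separator " " finds nothing to split on.
text = "これは最初の文です。これは二番目の文です。"
pieces = split_with_backups(text, separator=" ", backup_separators=["。"], max_chars=12)
print(pieces)
```

Here the full-width period `。` acts as the backup separator, which is the kind of adjustment the options above are meant for. Note that this sketch drops the separators from the output pieces.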
@@ -0,0 +1,134 @@
# Customizing Storage
By default, LlamaIndex hides away the complexities and lets you query your data in under 5 lines of code:
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")
```
Under the hood, LlamaIndex also supports a swappable **storage layer** that allows you to customize where ingested documents (i.e., `Node` objects), embedding vectors, and index metadata are stored.
![](/_static/storage/storage.png)
### Low-Level API
To do this, instead of the high-level API,
```python
index = VectorStoreIndex.from_documents(documents)
```
we use a lower-level API that gives more granular control:
```python
from llama_index import StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore
from llama_index.node_parser import SimpleNodeParser
# create parser and parse document into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
# create storage context using default stores
storage_context = StorageContext.from_defaults(
docstore=SimpleDocumentStore(),
vector_store=SimpleVectorStore(),
index_store=SimpleIndexStore(),
)
# create (or load) docstore and add nodes
storage_context.docstore.add_documents(nodes)
# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
# save index
index.storage_context.persist(persist_dir="<persist_dir>")
# can also set index_id to save multiple indexes to the same folder
index.set_index_id("<index_id>")
index.storage_context.persist(persist_dir="<persist_dir>")
# to load the index later, make sure you set up the storage context
# this will load the persisted stores from persist_dir
storage_context = StorageContext.from_defaults(
persist_dir="<persist_dir>"
)
# then load the index object
from llama_index import load_index_from_storage
loaded_index = load_index_from_storage(storage_context)
# if loading an index from a persist_dir containing multiple indexes
loaded_index = load_index_from_storage(storage_context, index_id="<index_id>")
# if loading multiple indexes from a persist dir
from llama_index import load_indices_from_storage
loaded_indices = load_indices_from_storage(storage_context, index_ids=["<index_id>", ...])
```
You can customize the underlying storage with a one-line change to instantiate different document stores, index stores, and vector stores.
See [Document Stores](./docstores.md), [Vector Stores](./vector_stores.md), [Index Stores](./index_stores.md) guides for more details.
For saving and loading a graph/composable index, see the [full guide here](../index/composability.md).
### Vector Store Integrations and Storage
Most of our vector store integrations store the entire index (vectors + text) in the vector store itself. This comes with the major benefit of not having to explicitly persist the index as shown above, since the vector store is already hosted and persisting the data in our index.
The vector stores that support this practice are:
- ChatGPTRetrievalPluginClient
- ChromaVectorStore
- DocArrayHnswVectorStore
- DocArrayInMemoryVectorStore
- LanceDBVectorStore
- MetalVectorStore
- MilvusVectorStore
- MyScaleVectorStore
- OpensearchVectorStore
- PineconeVectorStore
- QdrantVectorStore
- RedisVectorStore
- WeaviateVectorStore
A small example using Pinecone is below:
```python
import pinecone
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import PineconeVectorStore
# Creating a Pinecone index
api_key = "api_key"
pinecone.init(api_key=api_key, environment="us-west1-gcp")
pinecone.create_index(
"quickstart",
dimension=1536,
metric="euclidean",
pod_type="p1"
)
index = pinecone.Index("quickstart")
# construct vector store
vector_store = PineconeVectorStore(pinecone_index=index)
# create storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# load documents
documents = SimpleDirectoryReader("./data").load_data()
# create index, which will insert documents/vectors to pinecone
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
If you have an existing vector store with data already loaded in,
you can connect to it and directly create a `VectorStoreIndex` as follows:
```python
index = pinecone.Index("quickstart")
vector_store = PineconeVectorStore(pinecone_index=index)
loaded_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
```
@@ -0,0 +1,74 @@
# Document Stores
Document stores contain ingested document chunks, which we call `Node` objects.
See the [API Reference](/api_reference/storage/docstore.rst) for more details.
### Simple Document Store
By default, the `SimpleDocumentStore` stores `Node` objects in-memory.
They can be persisted to (and loaded from) disk by calling `docstore.persist()` (and `SimpleDocumentStore.from_persist_path(...)` respectively).
### MongoDB Document Store
We support MongoDB as an alternative document store backend that persists data as `Node` objects are ingested.
```python
from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.node_parser import SimpleNodeParser
# create parser and parse document into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
# create (or load) docstore and add nodes
docstore = MongoDocumentStore.from_uri(uri="<mongodb+srv://...>")
docstore.add_documents(nodes)
# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)
# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
```
Under the hood, `MongoDocumentStore` connects to a fixed MongoDB database and initializes new collections (or loads existing collections) for your nodes.
> Note: You can configure the `db_name` and `namespace` when instantiating `MongoDocumentStore`, otherwise they default to `db_name="db_docstore"` and `namespace="docstore"`.
Note that it's not necessary to call `storage_context.persist()` (or `docstore.persist()`) when using a `MongoDocumentStore`
since data is persisted by default.
You can easily reconnect to your MongoDB collection and reload the index by re-initializing a `MongoDocumentStore` with an existing `db_name` and `namespace`.
A more complete example can be found [here](../../examples/docstore/MongoDocstoreDemo.ipynb)
### Redis Document Store
We support Redis as an alternative document store backend that persists data as `Node` objects are ingested.
```python
from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import RedisDocumentStore
from llama_index.node_parser import SimpleNodeParser
# create parser and parse document into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
# create (or load) docstore and add nodes
docstore = RedisDocumentStore.from_host_and_port(
host="127.0.0.1",
port="6379",
namespace='llama_index'
)
docstore.add_documents(nodes)
# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)
# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
```
Under the hood, `RedisDocumentStore` connects to a redis database and adds your nodes to a namespace stored under `{namespace}/docs`.
> Note: You can configure the `namespace` when instantiating `RedisDocumentStore`, otherwise it defaults to `namespace="docstore"`.
You can easily reconnect to your Redis client and reload the index by re-initializing a `RedisDocumentStore` with an existing `host`, `port`, and `namespace`.
A more complete example can be found [here](../../examples/docstore/RedisDocstoreIndexStoreDemo.ipynb)
@@ -0,0 +1,75 @@
# Index Stores
Index stores contain lightweight index metadata (i.e. additional state information created when building an index).
See the [API Reference](/api_reference/storage/index_store.rst) for more details.
### Simple Index Store
By default, LlamaIndex uses a simple index store backed by an in-memory key-value store.
It can be persisted to (and loaded from) disk by calling `index_store.persist()` (and `SimpleIndexStore.from_persist_path(...)` respectively).
### MongoDB Index Store
Similarly to document stores, we can also use `MongoDB` as the storage backend of the index store.
```python
from llama_index.storage.index_store import MongoIndexStore
from llama_index import StorageContext, VectorStoreIndex
# create (or load) index store
index_store = MongoIndexStore.from_uri(uri="<mongodb+srv://...>")
# create storage context
storage_context = StorageContext.from_defaults(index_store=index_store)
# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
# or alternatively, load index
from llama_index import load_index_from_storage
index = load_index_from_storage(storage_context)
```
Under the hood, `MongoIndexStore` connects to a fixed MongoDB database and initializes new collections (or loads existing collections) for your index metadata.
> Note: You can configure the `db_name` and `namespace` when instantiating `MongoIndexStore`, otherwise they default to `db_name="db_docstore"` and `namespace="index_store"`.
Note that it's not necessary to call `storage_context.persist()` (or `index_store.persist()`) when using a `MongoIndexStore`
since data is persisted by default.
You can easily reconnect to your MongoDB collection and reload the index by re-initializing a `MongoIndexStore` with an existing `db_name` and `namespace`.
A more complete example can be found [here](../../examples/docstore/MongoDocstoreDemo.ipynb)
### Redis Index Store
We support Redis as an alternative index store backend that persists index metadata as indexes are built.
```python
from llama_index.storage.index_store import RedisIndexStore
from llama_index import StorageContext, VectorStoreIndex
# create (or load) index store
index_store = RedisIndexStore.from_host_and_port(
host="127.0.0.1",
port="6379",
namespace='llama_index'
)
# create storage context
storage_context = StorageContext.from_defaults(index_store=index_store)
# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
# or alternatively, load index
from llama_index import load_index_from_storage
index = load_index_from_storage(storage_context)
```
Under the hood, `RedisIndexStore` connects to a Redis database and adds your index metadata to a namespace stored under `{namespace}/index`.
> Note: You can configure the `namespace` when instantiating `RedisIndexStore`, otherwise it defaults to `namespace="index_store"`.
You can easily reconnect to your Redis client and reload the index by re-initializing a `RedisIndexStore` with an existing `host`, `port`, and `namespace`.
A more complete example can be found [here](../../examples/docstore/RedisDocstoreIndexStoreDemo.ipynb)
@@ -0,0 +1,11 @@
# Key-Value Stores
Key-Value stores are the underlying storage abstractions that power our [Document Stores](./docstores.md) and [Index Stores](./index_stores.md).
We provide the following key-value stores:
- **Simple Key-Value Store**: An in-memory KV store. The user can choose to call `persist` on this kv store to persist data to disk.
- **MongoDB Key-Value Store**: A MongoDB KV store.
See the [API Reference](/api_reference/storage/kv_store.rst) for more details.
Note: At the moment, these storage abstractions are not externally facing.
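As a rough mental model of how one key-value abstraction can back both document stores and index stores, consider the sketch below. `ToyKVStore`, its method names, and the collection names are hypothetical choices made for this illustration; they are not the library's classes or key layout.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class ToyKVStore:
    """Hypothetical in-memory key-value store, with values grouped by collection name."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, dict]] = {}

    def put(self, key: str, val: dict, collection: str = "data") -> None:
        self._data.setdefault(collection, {})[key] = val

    def get(self, key: str, collection: str = "data") -> Optional[dict]:
        return self._data.get(collection, {}).get(key)

    def persist(self, path: str) -> None:
        # Write every collection to a single JSON file on disk.
        with open(path, "w") as f:
            json.dump(self._data, f)

    @classmethod
    def from_persist_path(cls, path: str) -> "ToyKVStore":
        store = cls()
        with open(path) as f:
            store._data = json.load(f)
        return store


kv = ToyKVStore()
# A document store and an index store can share one KV store by using different collections.
kv.put("node-1", {"text": "hello"}, collection="docstore/data")
kv.put("index-1", {"type": "vector_store"}, collection="index_store/data")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kvstore.json")
    kv.persist(path)
    restored = ToyKVStore.from_persist_path(path)

print(restored.get("node-1", collection="docstore/data"))
```

Swapping the dict for a database client while keeping the same `put`/`get` surface is the idea behind offering several interchangeable backends.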
@@ -0,0 +1,91 @@
# Storage
## Concept
LlamaIndex provides a high-level interface for ingesting, indexing, and querying your external data.
Under the hood, LlamaIndex also supports swappable **storage components** that allow you to customize:
- **Document stores**: where ingested documents (i.e., `Node` objects) are stored,
- **Index stores**: where index metadata are stored,
- **Vector stores**: where embedding vectors are stored.
The Document/Index stores rely on a common Key-Value store abstraction, which is also detailed below.
LlamaIndex supports persisting data to any storage backend supported by [fsspec](https://filesystem-spec.readthedocs.io/en/latest/index.html).
We have confirmed support for the following storage backends:
- Local filesystem
- AWS S3
- Cloudflare R2
![](/_static/storage/storage.png)
## Usage Pattern
Many vector stores (except FAISS) will store both the data as well as the index (embeddings). This means that you will not need to use a separate document store or index store. This *also* means that you will not need to explicitly persist this data - this happens automatically. Usage would look something like the following to build a new index / reload an existing one.
```python
## build a new index
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import DeepLakeVectorStore
# construct vector store and customize storage context
vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
storage_context = StorageContext.from_defaults(
vector_store=vector_store
)
# Load documents and build index
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
## reload an existing one
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
```
See our [Vector Store Module Guide](vector_stores.md) below for more details.
Note that in general to use storage abstractions, you need to define a `StorageContext` object:
```python
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore
from llama_index.storage import StorageContext
# create storage context using default stores
storage_context = StorageContext.from_defaults(
docstore=SimpleDocumentStore(),
vector_store=SimpleVectorStore(),
index_store=SimpleIndexStore(),
)
```
More details on customization/persistence can be found in the guides below.
```{toctree}
---
maxdepth: 1
---
customization.md
save_load.md
```
## Modules
We offer in-depth guides on the different storage components.
```{toctree}
---
maxdepth: 1
---
vector_stores.md
docstores.md
index_stores.md
kv_stores.md
```
@@ -0,0 +1,93 @@
# Persisting & Loading Data
## Persisting Data
By default, LlamaIndex stores data in-memory, and this data can be explicitly persisted if desired:
```python
storage_context.persist(persist_dir="<persist_dir>")
```
This will persist data to disk, under the specified `persist_dir` (or `./storage` by default).
Multiple indexes can be persisted and loaded from the same directory, assuming you keep track of index IDs for loading.
Users can also configure alternative storage backends (e.g. `MongoDB`) that persist data by default.
In this case, calling `storage_context.persist()` will do nothing.
## Loading Data
To load data, you simply need to re-create the storage context using the same configuration (e.g. pass in the same `persist_dir` or vector store client).
```python
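from llama_index import StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore
storage_context = StorageContext.from_defaults(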
docstore=SimpleDocumentStore.from_persist_dir(persist_dir="<persist_dir>"),
vector_store=SimpleVectorStore.from_persist_dir(persist_dir="<persist_dir>"),
index_store=SimpleIndexStore.from_persist_dir(persist_dir="<persist_dir>"),
)
```
We can then load specific indices from the `StorageContext` through some convenience functions below.
```python
from llama_index import load_index_from_storage, load_indices_from_storage, load_graph_from_storage
# load a single index
# need to specify index_id if multiple indexes are persisted to the same directory
index = load_index_from_storage(storage_context, index_id="<index_id>")
# don't need to specify index_id if there's only one index in storage context
index = load_index_from_storage(storage_context)
# load multiple indices
indices = load_indices_from_storage(storage_context) # loads all indices
indices = load_indices_from_storage(storage_context, index_ids=[index_id1, ...]) # loads specific indices
# load composable graph
graph = load_graph_from_storage(storage_context, root_id="<root_id>") # loads graph with the specified root_id
```
Here's the full [API Reference on saving and loading](/api_reference/storage/indices_save_load.rst).
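The rule described in the comments above (an index ID is only optional when exactly one index is stored) can be modelled with a few lines of plain Python. This is a toy model of that rule, not the library's loading code; `load_index_struct` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional


def load_index_struct(index_structs: Dict[str, dict], index_id: Optional[str] = None) -> dict:
    """Return one persisted index record; index_id may be omitted only if exactly one exists."""
    if index_id is None:
        if len(index_structs) != 1:
            raise ValueError(
                f"Expected exactly one index, found {len(index_structs)}; pass index_id."
            )
        return next(iter(index_structs.values()))
    if index_id not in index_structs:
        raise KeyError(f"No index with id {index_id!r}")
    return index_structs[index_id]


# Two indexes persisted side by side, keyed by their index IDs.
structs = {
    "vector_index": {"type": "vector_store"},
    "summary_index": {"type": "list"},
}
picked = load_index_struct(structs, index_id="summary_index")
print(picked)
```

This is why it helps to call `set_index_id` with a memorable name before persisting several indexes to the same directory.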
## Using a remote backend
By default, LlamaIndex uses a local filesystem to load and save files. However, you can override this by passing a `fsspec.AbstractFileSystem` object.
Here's a simple example, instantiating a vector store:
```python
import dotenv
import s3fs
import os
from llama_index import SimpleDirectoryReader, VectorStoreIndex
dotenv.load_dotenv("../../../.env")
# load documents
documents = SimpleDirectoryReader('../../../examples/paul_graham_essay/data/').load_data()
print(len(documents))
index = VectorStoreIndex.from_documents(documents)
```
At this point, everything has been the same. Now, let's instantiate an S3 filesystem and save / load from there.
```python
# set up s3fs
AWS_KEY = os.environ['AWS_ACCESS_KEY_ID']
AWS_SECRET = os.environ['AWS_SECRET_ACCESS_KEY']
R2_ACCOUNT_ID = os.environ['R2_ACCOUNT_ID']
assert AWS_KEY is not None and AWS_KEY != ""
s3 = s3fs.S3FileSystem(
key=AWS_KEY,
secret=AWS_SECRET,
endpoint_url=f'https://{R2_ACCOUNT_ID}.r2.cloudflarestorage.com',
s3_additional_kwargs={'ACL': 'public-read'}
)
# save index to remote blob storage
index.set_index_id("vector_index")
# this is {bucket_name}/{index_name}
index.storage_context.persist('llama-index/storage_demo', fs=s3)
# load index from s3
sc = StorageContext.from_defaults(persist_dir='llama-index/storage_demo', fs=s3)
index2 = load_index_from_storage(sc, 'vector_index')
```
By default, if you do not pass a filesystem, we will assume a local filesystem.
@@ -0,0 +1,65 @@
# Vector Stores
Vector stores contain embedding vectors of ingested document chunks
(and sometimes the document chunks as well).
## Simple Vector Store
By default, LlamaIndex uses a simple in-memory vector store that's great for quick experimentation.
It can be persisted to (and loaded from) disk by calling `vector_store.persist()` (and `SimpleVectorStore.from_persist_path(...)` respectively).
## Third-Party Vector Store Integrations
We also integrate with a wide range of vector store implementations.
They mainly differ in 2 aspects:
1. in-memory vs. hosted
2. stores only vector embeddings vs. also stores documents
### In-Memory Vector Stores
* Faiss
* Chroma
### (Self) Hosted Vector Stores
* Pinecone
* Weaviate
* Milvus/Zilliz
* Qdrant
* Chroma
* Opensearch
* DeepLake
* MyScale
* Tair
* DocArray
* MongoDB Atlas
### Others
* ChatGPTRetrievalPlugin
For more details, see [Vector Store Integrations](/community/integrations/vector_stores.md).
```{toctree}
---
caption: Examples
maxdepth: 1
---
/examples/vector_stores/SimpleIndexDemo.ipynb
/examples/vector_stores/QdrantIndexDemo.ipynb
/examples/vector_stores/FaissIndexDemo.ipynb
/examples/vector_stores/DeepLakeIndexDemo.ipynb
/examples/vector_stores/MyScaleIndexDemo.ipynb
/examples/vector_stores/MetalIndexDemo.ipynb
/examples/vector_stores/WeaviateIndexDemo.ipynb
/examples/vector_stores/OpensearchDemo.ipynb
/examples/vector_stores/PineconeIndexDemo.ipynb
/examples/vector_stores/ChromaIndexDemo.ipynb
/examples/vector_stores/LanceDBIndexDemo.ipynb
/examples/vector_stores/MilvusIndexDemo.ipynb
/examples/vector_stores/RedisIndexDemo.ipynb
/examples/vector_stores/WeaviateIndexDemo-Hybrid.ipynb
/examples/vector_stores/PineconeIndexDemo-Hybrid.ipynb
/examples/vector_stores/AsyncIndexCreationDemo.ipynb
/examples/vector_stores/TairIndexDemo.ipynb
/examples/vector_stores/SupabaseVectorIndexDemo.ipynb
/examples/vector_stores/DocArrayHnswIndexDemo.ipynb
/examples/vector_stores/DocArrayInMemoryIndexDemo.ipynb
/examples/vector_stores/MongoDBAtlasVectorSearch.ipynb
```
@@ -0,0 +1,13 @@
# Modules
We support integrations with OpenAI, Azure, and anything LangChain offers.
```{toctree}
---
maxdepth: 1
---
/examples/embeddings/OpenAI.ipynb
/examples/embeddings/Langchain.ipynb
/examples/customization/llms/AzureOpenAI.ipynb
/examples/embeddings/custom_embeddings.ipynb
```
@@ -0,0 +1,42 @@
# Embeddings
## Concept
Embeddings are used in LlamaIndex to represent your documents using a sophisticated numerical representation. Embedding models take text as input, and return a long list of numbers used to capture the semantics of the text. These embedding models have been trained to represent text this way, and help enable many applications, including search!
At a high level, if a user asks a question about dogs, then the embedding for that question will be highly similar to text that talks about dogs.
When calculating the similarity between embeddings, there are many methods to use (dot product, cosine similarity, etc.). By default, LlamaIndex uses cosine similarity when comparing embeddings.
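To make the comparison concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up toy values (real embeddings have far more dimensions), and `cosine_similarity` is an illustrative helper written for this example, not a LlamaIndex function.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, for illustration only)
query_about_dogs = [0.9, 0.1, 0.0]
text_about_dogs = [0.8, 0.2, 0.1]
text_about_taxes = [0.0, 0.1, 0.9]

sim_dogs = cosine_similarity(query_about_dogs, text_about_dogs)
sim_taxes = cosine_similarity(query_about_dogs, text_about_taxes)
print(sim_dogs > sim_taxes)  # the dog text is the closer match
```

Cosine similarity only looks at the direction of the vectors, so it is unaffected by their length; the plain dot product would also reward larger magnitudes.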
There are many embedding models to pick from. By default, LlamaIndex uses `text-embedding-ada-002` from OpenAI. We also support any embedding model offered by Langchain [here](https://python.langchain.com/docs/modules/data_connection/text_embedding/), as well as providing an easy to extend base class for implementing your own embeddings.
## Usage Pattern
Most commonly in LlamaIndex, embedding models will be specified in the `ServiceContext` object, and then used in a vector index. The embedding model will be used to embed the documents used during index construction, as well as embedding any queries you make using the query engine later on.
```python
from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding
embed_model = OpenAIEmbedding()
service_context = ServiceContext.from_defaults(embed_model=embed_model)
```
You can find more usage details and available customization options below.
```{toctree}
---
maxdepth: 1
---
usage_pattern.md
```
## Modules
We support integrations with OpenAI, Azure, and anything LangChain offers. Details below.
```{toctree}
---
maxdepth: 1
---
modules.md
```
@@ -0,0 +1,101 @@
# Usage Pattern
## Getting Started
The most common usage for an embedding model will be setting it in the service context object, and then using it to construct an index and query. The input documents will be broken into nodes, and the embedding model will generate an embedding for each node.
By default, LlamaIndex will use `text-embedding-ada-002`, which is what the example below manually sets up for you.
```python
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding
embed_model = OpenAIEmbedding()
service_context = ServiceContext.from_defaults(embed_model=embed_model)
# optionally set a global service context to avoid passing it into other objects every time
from llama_index import set_global_service_context
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
```
Then, at query time, the embedding model will be used again to embed the query text.
```python
query_engine = index.as_query_engine()
response = query_engine.query("query string")
```
## Customization
### Batch Size
By default, embedding requests are sent to OpenAI in batches of 10. For some users, this may (rarely) incur a rate limit. For other users embedding many documents, this batch size may be too small.
```python
# set the batch size to 42
embed_model = OpenAIEmbedding(embed_batch_size=42)
```
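As a rough picture of what the batch size controls, the sketch below splits a list of texts into consecutive batches. `batched` is a hypothetical helper written for this illustration (it is not part of LlamaIndex), and it assumes one embedding request per batch.

```python
from typing import Iterator, List


def batched(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]


texts = [f"chunk {i}" for i in range(25)]
batches = list(batched(texts, batch_size=10))
print([len(b) for b in batches])  # [10, 10, 5]
```

Under that assumption, a larger batch size means fewer but larger requests, which is the trade-off against rate limits described above.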
### Embedding Model Integrations
We also support any embeddings offered by Langchain [here](https://python.langchain.com/docs/modules/data_connection/text_embedding/), using our `LangchainEmbedding` wrapper class.
The example below loads a model from Hugging Face, using Langchain's embedding class.
```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext
embed_model = LangchainEmbedding(
HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
```
### Custom Embedding Model
If you want to use embeddings not offered by LlamaIndex or Langchain, you can also extend our base embeddings class and implement your own!
The example below uses Instructor Embeddings ([install/setup details here](https://huggingface.co/hkunlp/instructor-large)), and implements a custom embeddings class. Instructor embeddings work by providing text, as well as "instructions" on the domain of the text to embed. This is helpful when embedding text from a very specific and specialized topic.
```python
from typing import Any, List
from InstructorEmbedding import INSTRUCTOR
from llama_index.embeddings.base import BaseEmbedding
class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent the Computer Science documentation or question:",
        **kwargs: Any,
    ) -> None:
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction
        super().__init__(**kwargs)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode([[self._instruction, text] for text in texts])
        return embeddings
```
## Standalone Usage
You can also use embeddings as a standalone module for your project, existing application, or general testing and exploration.
```python
embeddings = embed_model.get_text_embedding("It is raining cats and dogs here!")
```
@@ -0,0 +1,51 @@
# Modules
We support integrations with OpenAI, Anthropic, Hugging Face, PaLM, and more.
## OpenAI
```{toctree}
---
maxdepth: 1
---
/examples/llm/openai.ipynb
/examples/llm/azure_openai.ipynb
```
## Anthropic
```{toctree}
---
maxdepth: 1
---
/examples/llm/anthropic.ipynb
```
## Hugging Face
```{toctree}
---
maxdepth: 1
---
/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb
/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.ipynb
```
## PaLM
```{toctree}
---
maxdepth: 1
---
/examples/llm/palm.ipynb
```
## LangChain
```{toctree}
---
maxdepth: 1
---
/examples/llm/langchain.ipynb
```
@@ -0,0 +1,49 @@
# LLM
## Concept
Picking the proper Large Language Model (LLM) is one of the first steps you need to consider when building any LLM application over your data.
LLMs are a core component of LlamaIndex. They can be used as standalone modules or plugged into other core LlamaIndex modules (indices, retrievers, query engines). They are always used during the response synthesis step (e.g. after retrieval). Depending on the type of index being used, LLMs may also be used during index construction, insertion, and query traversal.
LlamaIndex provides a unified interface for defining LLM modules, whether it's from OpenAI, Hugging Face, or LangChain, so that you
don't have to write the boilerplate code of defining the LLM interface yourself. This interface consists of the following (more details below):
- Support for **text completion** and **chat** endpoints (details below)
- Support for **streaming** and **non-streaming** endpoints
- Support for **synchronous** and **asynchronous** endpoints
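
The difference between the streaming and non-streaming styles can be sketched without any LLM at all. The toy functions below are stand-ins written for this illustration (they are not the LlamaIndex interface): one returns the whole completion at once, the other yields it piece by piece.

```python
from typing import Iterator, List

# Hypothetical pre-tokenized completion, standing in for model output
TOKENS: List[str] = ["Paul", " Graham", " is", " a", " writer", "."]


def complete() -> str:
    """Non-streaming: return the whole completion at once."""
    return "".join(TOKENS)


def stream_complete() -> Iterator[str]:
    """Streaming: yield the completion piece by piece as it is produced."""
    for token in TOKENS:
        yield token


full = complete()
streamed = "".join(stream_complete())
print(full == streamed)  # both styles produce the same text
```

Streaming does not change the final text; it only lets the caller start displaying output before generation has finished.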
## Usage Pattern
The following code snippet shows how you can get started using LLMs.
```python
from llama_index.llms import OpenAI
# non-streaming
resp = OpenAI().complete('Paul Graham is ')
print(resp)
```
You can use the LLM as a standalone module or with other LlamaIndex abstractions. Check out our guide below.
```{toctree}
---
maxdepth: 1
---
usage_standalone.md
usage_custom.md
```
## Modules
We support integrations with OpenAI, Hugging Face, PaLM, and more.
```{toctree}
---
maxdepth: 2
---
modules.md
```
@@ -0,0 +1,248 @@
# Customizing LLMs within LlamaIndex Abstractions
You can plug these LLM abstractions into our other modules in LlamaIndex (indexes, retrievers, query engines, agents), which allows you to build advanced workflows over your data.
By default, we use OpenAI's `text-davinci-003` model. But you may choose to customize
the underlying LLM being used.
Below we show a few examples of LLM customization. This includes
- changing the underlying LLM
- changing the number of output tokens (for OpenAI, Cohere, or AI21)
- having more fine-grained control over all parameters for any LLM, from context window to chunk overlap
## Example: Changing the underlying LLM
An example snippet of customizing the LLM being used is shown below.
In this example, we use `text-davinci-002` instead of `text-davinci-003`. Available models include `text-davinci-003`, `text-curie-001`, `text-babbage-001`, `text-ada-001`, `code-davinci-002`, `code-cushman-001`.
Note that
you may also plug in any LLM shown on Langchain's
[LLM](https://python.langchain.com/en/latest/modules/models/llms/integrations.html) page.
```python
from llama_index import (
KeywordTableIndex,
SimpleDirectoryReader,
LLMPredictor,
ServiceContext
)
from llama_index.llms import OpenAI
# alternatively
# from langchain.llms import ...
documents = SimpleDirectoryReader('data').load_data()
# define LLM
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm)
# build index
index = KeywordTableIndex.from_documents(documents, service_context=service_context)
# get response from query
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do after his time at Y Combinator?")
```
## Example: Changing the number of output tokens (for OpenAI, Cohere, AI21)
The number of output tokens is usually set to some low number by default (for instance,
with OpenAI the default is 256).
For OpenAI, Cohere, AI21, you just need to set the `max_tokens` parameter
(or maxTokens for AI21). We will handle text chunking/calculations under the hood.
```python
from llama_index import (
KeywordTableIndex,
SimpleDirectoryReader,
ServiceContext
)
from llama_index.llms import OpenAI
documents = SimpleDirectoryReader('data').load_data()
# define LLM
llm = OpenAI(temperature=0, model="text-davinci-002", max_tokens=512)
service_context = ServiceContext.from_defaults(llm=llm)
```
## Example: Explicitly configure `context_window` and `num_output`
If you are using other LLM classes from langchain, you may need to explicitly configure the `context_window` and `num_output` via the `ServiceContext` since the information is not available by default.
```python
from llama_index import (
KeywordTableIndex,
SimpleDirectoryReader,
ServiceContext
)
from llama_index.llms import OpenAI
# alternatively
# from langchain.llms import ...
documents = SimpleDirectoryReader('data').load_data()
# set context window
context_window = 4096
# set number of output tokens
num_output = 256
# define LLM
llm = OpenAI(
temperature=0,
model="text-davinci-002",
max_tokens=num_output,
)
service_context = ServiceContext.from_defaults(
llm=llm,
context_window=context_window,
num_output=num_output,
)
```
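As a rough illustration of why these two numbers matter: the prompt and the generated output share the same context window, so the space left for retrieved text is approximately the window minus the reserved output minus the prompt template's own tokens. The helper below is a hypothetical back-of-the-envelope calculation, not LlamaIndex's internal logic, and `prompt_template_tokens=100` is a made-up figure.

```python
def available_context_tokens(
    context_window: int, num_output: int, prompt_template_tokens: int
) -> int:
    """Tokens left for retrieved text after reserving output and template space."""
    remaining = context_window - num_output - prompt_template_tokens
    if remaining <= 0:
        raise ValueError("num_output and the prompt template exceed the context window")
    return remaining


budget = available_context_tokens(
    context_window=4096, num_output=256, prompt_template_tokens=100
)
print(budget)  # 3740
```

If `context_window` is set larger than what the model really supports, this budget is overestimated and prompts may be rejected or truncated by the model.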
## Example: Using a HuggingFace LLM
LlamaIndex supports using LLMs from HuggingFace directly. Note that for a completely private experience, you should also set up a local embedding model (example [here](embeddings.md#custom-embeddings)).
Many open-source models from HuggingFace require some preamble before each prompt, which is a `system_prompt`. Additionally, queries themselves may need an additional wrapper around the `query_str` itself. All this information is usually available from the HuggingFace model card for the model you are using.
Below, this example uses both the `system_prompt` and `query_wrapper_prompt`, using specific prompts from the model card found [here](https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b).
```python
from llama_index.prompts.prompts import SimpleInputPrompt
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""
# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")
import torch
from llama_index import ServiceContext
from llama_index.llms import HuggingFaceLLM
llm = HuggingFaceLLM(
context_window=4096,
max_new_tokens=256,
generate_kwargs={"temperature": 0.7, "do_sample": False},
system_prompt=system_prompt,
query_wrapper_prompt=query_wrapper_prompt,
tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
model_name="StabilityAI/stablelm-tuned-alpha-3b",
device_map="auto",
stopping_ids=[50278, 50279, 50277, 1, 0],
tokenizer_kwargs={"max_length": 4096},
# uncomment this if using CUDA to reduce memory usage
# model_kwargs={"torch_dtype": torch.float16}
)
service_context = ServiceContext.from_defaults(
chunk_size=1024,
llm=llm,
)
```
Some models will raise errors if all the keys from the tokenizer are passed to the model. A common tokenizer output that causes issues is `token_type_ids`. Below is an example of configuring the predictor to remove this before passing the inputs to the model:
```python
HuggingFaceLLM(
...
tokenizer_outputs_to_remove=["token_type_ids"]
)
```
A full API reference can be found [here](../../reference/llm_predictor.rst).
Several example notebooks are also listed below:
- [StableLM](/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.ipynb)
- [Camel](/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb)
## Example: Using a Custom LLM Model - Advanced
To use a custom LLM model, you only need to implement the `LLM` class (or `CustomLLM` for a simpler interface).
You will be responsible for passing the text to the model and returning the newly generated tokens.
Note that for a completely private experience, you should also set up a local embedding model (example [here](embeddings.md#custom-embeddings)).
Here is a small example using a locally running facebook/OPT model and Huggingface's pipeline abstraction:
```python
import torch
from transformers import pipeline
from typing import Optional, List, Mapping, Any
from llama_index import (
ServiceContext,
SimpleDirectoryReader,
LangchainEmbedding,
ListIndex
)
from llama_index.llms import CustomLLM, CompletionResponse, CompletionResponseGen, LLMMetadata
# set context window size
context_window = 2048
# set number of output tokens
num_output = 256
# store the pipeline/model outside of the LLM class to avoid memory issues
model_name = "facebook/opt-iml-max-30b"
pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype":torch.bfloat16})
class OurLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=context_window, num_output=num_output
        )

    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        prompt_length = len(prompt)
        response = pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # only return newly generated tokens
        text = response[prompt_length:]
        return CompletionResponse(text=text)

    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        raise NotImplementedError()
# define our LLM
llm = OurLLM()
service_context = ServiceContext.from_defaults(
llm=llm,
context_window=context_window,
num_output=num_output
)
# Load your data
documents = SimpleDirectoryReader('./data').load_data()
index = ListIndex.from_documents(documents, service_context=service_context)
# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
print(response)
```
Using this method, you can use any LLM. Maybe you have one running locally, or running on your own server. As long as the class is implemented and the generated tokens are returned, it should work out. Note that we need to use the prompt helper to customize the prompt sizes, since every model has a slightly different context length.
Note that you may have to adjust the internal prompts to get good performance. Even then, you should be using a sufficiently large LLM to ensure it's capable of handling the complex queries that LlamaIndex uses internally, so your mileage may vary.
A list of all default internal prompts is available [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py), and chat-specific prompts are listed [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py). You can also implement your own custom prompts, as described [here](/core_modules/service_modules/prompts.md).
@@ -0,0 +1,35 @@
# Using LLMs as standalone modules
You can use our LLM modules on their own.
## Text Completion Example
```python
from llama_index.llms import OpenAI
# non-streaming
resp = OpenAI().complete('Paul Graham is ')
print(resp)
# using streaming endpoint
from llama_index.llms import OpenAI
llm = OpenAI()
resp = llm.stream_complete('Paul Graham is ')
for delta in resp:
    print(delta, end='')
```
## Chat Example
```python
from llama_index.llms import ChatMessage, OpenAI
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = OpenAI().chat(messages)
print(resp)
```
Check out our [modules section](modules.md) for usage guides for each LLM.
@@ -0,0 +1,106 @@
# Prompts
## Concept
Prompting is the fundamental input that gives LLMs their expressive power. LlamaIndex uses prompts to build the index, do insertion,
perform traversal during querying, and to synthesize the final answer.
LlamaIndex uses a set of [default prompt templates](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py) that work well out of the box.
In addition, there are some prompts written and used specifically for chat models like `gpt-3.5-turbo` [here](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py).
Users may also provide their own prompt templates to further customize the behavior of the framework. The best method for customizing is copying the default prompt from the link above, and using that as the base for any modifications.
## Usage Pattern
### Defining a custom prompt
Defining a custom prompt is as simple as creating a format string:
```python
from llama_index import Prompt
template = (
"We have provided context information below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given this information, please answer the question: {query_str}\n"
)
qa_template = Prompt(template)
```
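To see what the placeholders do, the sketch below fills the same template string with plain `str.format`. This is only an illustration of the template variables; it assumes nothing about how `Prompt` performs the substitution internally, and the context and question are made-up values.

```python
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)

# Hypothetical values; at query time the context would come from retrieved nodes
filled = template.format(
    context_str="LlamaIndex is a data framework for LLM applications.",
    query_str="What is LlamaIndex?",
)
print(filled)

# Leaving out a variable the template expects raises KeyError
try:
    template.format(context_str="only context")
except KeyError as err:
    missing = err.args[0]
    print(f"missing template variable: {missing}")
```

This is why a custom template that replaces a default question-answer prompt needs to keep both `{context_str}` and `{query_str}`.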
> Note: you may see references to legacy prompt subclasses such as `QuestionAnswerPrompt`, `RefinePrompt`. These have been deprecated (and now are type aliases of `Prompt`). Now you can directly specify `Prompt(template)` to construct custom prompts. But you still have to make sure the template string contains the expected parameters (e.g. `{context_str}` and `{query_str}`) when replacing a default question answer prompt.
### Passing custom prompts into the pipeline
Since LlamaIndex is a multi-step pipeline, it's important to identify the operation that you want to modify and pass in the custom prompt at the right place.
At a high level, prompts are used in 1) index construction, and 2) query engine execution.
The most commonly used prompts will be the `text_qa_template` and the `refine_template`.
- `text_qa_template` - used to get an initial answer to a query using retrieved nodes
- `refine_template` - used when the retrieved text does not fit into a single LLM call with `response_mode="compact"` (the default), or when more than one node is retrieved using `response_mode="refine"`. The answer from the first query is inserted as an `existing_answer`, and the LLM must update or repeat the existing answer based on the new context.
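
The interplay of the two templates can be sketched as a simple loop. Everything here is a simplified stand-in: the template strings are shortened, `create_and_refine` is a hypothetical helper, and `fake_llm` replaces a real model so the control flow can be followed. It is not the LlamaIndex implementation.

```python
from typing import Callable, List

QA_TEMPLATE = "Context: {context_str}\nQuestion: {query_str}\nAnswer:"
REFINE_TEMPLATE = (
    "Question: {query_str}\nExisting answer: {existing_answer}\n"
    "New context: {context_msg}\nRefined answer:"
)


def create_and_refine(
    llm: Callable[[str], str], query_str: str, chunks: List[str]
) -> str:
    """Answer with the first chunk, then refine the answer once per remaining chunk."""
    answer = llm(QA_TEMPLATE.format(context_str=chunks[0], query_str=query_str))
    for chunk in chunks[1:]:
        answer = llm(
            REFINE_TEMPLATE.format(
                query_str=query_str, existing_answer=answer, context_msg=chunk
            )
        )
    return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: records the prompt and returns a numbered answer."""
    calls.append(prompt)
    return f"answer v{len(calls)}"


final = create_and_refine(fake_llm, "What is X?", ["chunk A", "chunk B", "chunk C"])
print(final)  # answer v3
```

Only the first call uses the question-answer template; every later call sees the previous answer and one more chunk of context.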
#### Modify prompts used in index construction
Different indices use different types of prompts during construction (some don't use prompts at all).
For instance, `TreeIndex` uses a `SummaryPrompt` to hierarchically
summarize the nodes, and `KeywordTableIndex` uses a `KeywordExtractPrompt` to extract keywords.
There are two equivalent ways to override the prompts:
1. via the default nodes constructor
```python
index = TreeIndex(nodes, summary_template=<custom_prompt>)
```
2. via the documents constructor.
```python
index = TreeIndex.from_documents(docs, summary_template=<custom_prompt>)
```
For more details on which index uses which prompts, please visit
[Index class references](/api_reference/indices.rst).
#### Modify prompts used in query engine
More commonly, prompts are used at query-time (i.e. for executing a query against an index and synthesizing the final response).
There are also two equivalent ways to override the prompts:
1. via the high-level API
```python
query_engine = index.as_query_engine(
text_qa_template=<custom_qa_prompt>,
refine_template=<custom_refine_prompt>
)
```
2. via the low-level composition API
```python
retriever = index.as_retriever()
response_synthesizer = get_response_synthesizer(
    text_qa_template=<custom_qa_prompt>,
    refine_template=<custom_refine_prompt>
)
query_engine = RetrieverQueryEngine(retriever, response_synthesizer)
```
The two approaches above are equivalent, where 1 is essentially syntactic sugar for 2 and hides away the underlying complexity. You might want to use 1 to quickly modify some common parameters, and use 2 to have more granular control.
For more details on which classes use which prompts, please visit
[Query class references](/api_reference/query.rst).
Check out the [reference documentation](/api_reference/prompts.rst) for a full set of all prompts.
## Modules
```{toctree}
---
maxdepth: 1
---
/examples/customization/prompts/completion_prompts.ipynb
/examples/customization/prompts/chat_prompts.ipynb
```
@@ -0,0 +1,16 @@
# Module Guides
We provide a few simple implementations to start, with more sophisticated modes coming soon!
More specifically, the `SimpleChatEngine` does not make use of a knowledge base,
whereas `CondenseQuestionChatEngine` and `ReActChatEngine` make use of a query engine over a knowledge base.
```{toctree}
---
maxdepth: 1
---
Simple Chat Engine </examples/chat_engine/chat_engine_repl.ipynb>
ReAct Chat Engine </examples/chat_engine/chat_engine_react.ipynb>
OpenAI Chat Engine </examples/chat_engine/chat_engine_openai.ipynb>
Condense Question Chat Engine </examples/chat_engine/chat_engine_condense_question.ipynb>
```
@@ -0,0 +1,48 @@
# Chat Engine
## Concept
Chat engine is a high-level interface for having a conversation with your data
(multiple back-and-forth instead of a single question & answer).
Think ChatGPT, but augmented with your knowledge base.
Conceptually, it is a **stateful** analog of a [Query Engine](../query_engine/root.md).
By keeping track of the conversation history, it can answer questions with past context in mind.
```{tip}
If you want to ask a standalone question over your data (i.e. without keeping track of conversation history), use [Query Engine](../query_engine/root.md) instead.
```
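The "stateful" part can be pictured with a toy wrapper around a stateless query function. `ToyChatEngine` and `echo_query` are hypothetical names invented for this sketch; the real chat engines are more involved (for example, they may use an LLM to condense the history into a standalone question).

```python
from typing import Callable, List, Tuple


class ToyChatEngine:
    """Wraps a stateless query function and remembers the conversation."""

    def __init__(self, query_fn: Callable[[str], str]) -> None:
        self._query_fn = query_fn
        self.history: List[Tuple[str, str]] = []

    def chat(self, message: str) -> str:
        # Prepend past turns so the stateless query function sees the context
        past = " ".join(f"Human: {h} Assistant: {a}" for h, a in self.history)
        question = f"{past} Human: {message}".strip()
        answer = self._query_fn(question)
        self.history.append((message, answer))
        return answer

    def reset(self) -> None:
        """Forget the conversation so far."""
        self.history.clear()


def echo_query(question: str) -> str:
    """Stand-in for a query engine: reports how many words it received."""
    return f"saw {len(question.split())} words"


engine = ToyChatEngine(echo_query)
first = engine.chat("Who is Paul Graham?")
second = engine.chat("What did he write?")
print(first)
print(second)
```

The second call receives more words than the first because the earlier turn travels with it, which is what lets a follow-up like "What did he write?" be resolved.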
## Usage Pattern
Get started with:
```python
chat_engine = index.as_chat_engine()
response = chat_engine.chat("Tell me a joke.")
```
To stream response:
```python
chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("Tell me a joke.")
for token in streaming_response.response_gen:
    print(token, end="")
```
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
Below you can find corresponding tutorials to see the available chat engines in action.
```{toctree}
---
maxdepth: 2
---
modules.md
```
@@ -0,0 +1,109 @@
# Usage Pattern
## Get Started
Build a chat engine from index:
```python
chat_engine = index.as_chat_engine()
```
```{tip}
To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
```
Have a conversation with your data:
```python
response = chat_engine.chat("Tell me a joke.")
```
Reset chat history to start a new conversation:
```python
chat_engine.reset()
```
Enter an interactive chat REPL:
```python
chat_engine.chat_repl()
```
## Configuring a Chat Engine
Configuring a chat engine is very similar to configuring a query engine.
### High-Level API
You can directly build and configure a chat engine from an index in 1 line of code:
```python
chat_engine = index.as_chat_engine(
chat_mode='condense_question',
verbose=True
)
```
> Note: you can access different chat engines by specifying the `chat_mode` as a kwarg. `condense_question` corresponds to `CondenseQuestionChatEngine`, `react` corresponds to `ReActChatEngine`.
> Note: While the high-level API optimizes for ease-of-use, it does *NOT* expose the full range of configurability.
### Low-Level Composition API
You can use the low-level composition API if you need more granular control.
Concretely speaking, you would explicitly construct a `ChatEngine` object instead of calling `index.as_chat_engine(...)`.
> Note: You may need to look at API references or example notebooks.
Here's an example where we configure the following:
* configure the condense question prompt,
* initialize the conversation with some existing history,
* print verbose debug message.
```python
from llama_index.prompts import Prompt
from llama_index.chat_engine.condense_question import CondenseQuestionChatEngine
custom_prompt = Prompt("""\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.
<Chat History>
{chat_history}
<Follow Up Message>
{question}
<Standalone question>
""")
# list of (human_message, ai_message) tuples
custom_chat_history = [
    (
        'Hello assistant, we are having an insightful discussion about Paul Graham today.',
        'Okay, sounds good.'
    )
]
query_engine = index.as_query_engine()
chat_engine = CondenseQuestionChatEngine.from_defaults(
query_engine=query_engine,
condense_question_prompt=custom_prompt,
chat_history=custom_chat_history,
verbose=True
)
```
### Streaming
To enable streaming, you simply need to call the `stream_chat` endpoint instead of the `chat` endpoint.
```{warning}
This is somewhat inconsistent with the query engine (where you pass in a `streaming=True` flag). We are working on making the behavior more consistent!
```
```python
chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("Tell me a joke.")
for token in streaming_response.response_gen:
    print(token, end="")
```
See an [end-to-end tutorial](/examples/customization/streaming/chat_engine_condense_question_stream_response.ipynb).
@@ -0,0 +1,222 @@
# Modules
## SimilarityPostprocessor
Used to remove nodes that are below a similarity score threshold.
```python
from llama_index.indices.postprocessor import SimilarityPostprocessor
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)
postprocessor.postprocess_nodes(nodes)
```
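The idea behind the cutoff can be sketched in a few lines of plain Python. `ScoredNode` and `filter_by_similarity` are hypothetical stand-ins written for this illustration, and treating a score equal to the cutoff as "kept" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node with a similarity score."""

    text: str
    score: float


def filter_by_similarity(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    """Keep only nodes whose score is at least `cutoff`."""
    return [node for node in nodes if node.score >= cutoff]


nodes = [ScoredNode("a", 0.91), ScoredNode("b", 0.65), ScoredNode("c", 0.70)]
kept = filter_by_similarity(nodes, cutoff=0.7)
print([node.text for node in kept])  # ['a', 'c']
```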
## KeywordNodePostprocessor
Used to ensure certain keywords are either excluded or included.
```python
from llama_index.indices.postprocessor import KeywordNodePostprocessor
postprocessor = KeywordNodePostprocessor(
required_keywords=["word1", "word2"],
exclude_keywords=["word3", "word4"]
)
postprocessor.postprocess_nodes(nodes)
```
## SentenceEmbeddingOptimizer
This postprocessor optimizes token usage by removing sentences that are not relevant to the query (this is done using embeddings).
The percentile cutoff is a measure for using the top percentage of relevant sentences.
The threshold cutoff can be specified instead, which uses a raw similarity cutoff for picking which sentences to keep.
```python
from llama_index.indices.postprocessor import SentenceEmbeddingOptimizer
postprocessor = SentenceEmbeddingOptimizer(
embed_model=service_context.embed_model,
percentile_cutoff=0.5,
# threshold_cutoff=0.7
)
postprocessor.postprocess_nodes(nodes)
```
A full notebook guide can be found [here](/examples/node_postprocessor/OptimizerDemo.ipynb)
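The two cutoffs can be contrasted with a small sketch. `select_sentences` is a hypothetical helper, the similarity scores are made up (in practice they would come from comparing sentence embeddings with the query embedding), and the exact rounding of the percentile is an assumption of this example.

```python
from typing import List, Optional, Tuple


def select_sentences(
    scored: List[Tuple[str, float]],
    percentile_cutoff: Optional[float] = None,
    threshold_cutoff: Optional[float] = None,
) -> List[str]:
    """Keep the most query-relevant sentences, preserving their original order."""
    # Indices of sentences, most similar first
    ranked = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)
    if percentile_cutoff is not None:
        # Keep the top fraction of sentences (at least one)
        ranked = ranked[: max(1, int(len(scored) * percentile_cutoff))]
    if threshold_cutoff is not None:
        # Keep only sentences at or above the raw similarity threshold
        ranked = [i for i in ranked if scored[i][1] >= threshold_cutoff]
    return [scored[i][0] for i in sorted(ranked)]


scored = [("A", 0.9), ("B", 0.2), ("C", 0.75), ("D", 0.4)]
print(select_sentences(scored, percentile_cutoff=0.5))  # ['A', 'C']
print(select_sentences(scored, threshold_cutoff=0.3))  # ['A', 'C', 'D']
```

The percentile cutoff always keeps a fixed share of the text, while the threshold cutoff keeps however many sentences clear the bar, which may be none.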
## CohereRerank
Uses the "Cohere ReRank" functionality to re-order nodes, and returns the top N nodes.
```python
from llama_index.indices.postprocessor import CohereRerank
postprocessor = CohereRerank(
    top_n=2,
    model="rerank-english-v2.0",
    api_key="YOUR COHERE API KEY"
)
postprocessor.postprocess_nodes(nodes)
```
Full notebook guide is available [here](/examples/node_postprocessor/CohereRerank.ipynb).
## LLM Rerank
Uses an LLM to re-order nodes by asking the LLM to return the relevant documents and a score of how relevant they are. Returns the top N ranked nodes.
```python
from llama_index.indices.postprocessor import LLMRerank
postprocessor = LLMRerank(
    top_n=2,
    service_context=service_context,
)
postprocessor.postprocess_nodes(nodes)
```
Full notebook guide is available [here for Gatsby](/examples/node_postprocessor/LLMReranker-Gatsby.ipynb) and [here for Lyft 10K documents](/examples/node_postprocessor/LLMReranker-Lyft-10k.ipynb).
## FixedRecencyPostprocessor
This postprocessor returns the top K nodes sorted by date. This assumes there is a `date` field to parse in the metadata of each node.
```python
from llama_index.indices.postprocessor import FixedRecencyPostprocessor
postprocessor = FixedRecencyPostprocessor(
top_k=1,
date_key="date" # the key in the metadata to find the date
)
postprocessor.postprocess_nodes(nodes)
```
![](/_static/node_postprocessors/recency.png)
A full notebook guide is available [here](/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb).
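The underlying idea can be sketched without LlamaIndex. In the snippet below, nodes are plain dictionaries, `most_recent` is a hypothetical helper, and dates are assumed to be ISO-formatted strings; the real postprocessor may parse dates differently.

```python
from datetime import date
from typing import Dict, List


def most_recent(nodes: List[Dict], top_k: int = 1, date_key: str = "date") -> List[Dict]:
    """Return the top_k nodes with the newest date in their metadata (illustrative only)."""
    ordered = sorted(
        nodes,
        # Assumption: the metadata value is an ISO date string such as "2023-06-15".
        key=lambda node: date.fromisoformat(node["metadata"][date_key]),
        reverse=True,
    )
    return ordered[:top_k]


nodes = [
    {"text": "v1 notes", "metadata": {"date": "2021-03-01"}},
    {"text": "v3 notes", "metadata": {"date": "2023-06-15"}},
    {"text": "v2 notes", "metadata": {"date": "2022-01-20"}},
]
latest = most_recent(nodes, top_k=1)
```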
## EmbeddingRecencyPostprocessor
This postprocessor returns the top K nodes after sorting by date and removing older nodes that are too similar, as measured by embedding similarity.
```python
from llama_index.indices.postprocessor import EmbeddingRecencyPostprocessor
postprocessor = EmbeddingRecencyPostprocessor(
service_context=service_context,
date_key="date",
similarity_cutoff=0.7
)
postprocessor.postprocess_nodes(nodes)
```
A full notebook guide is available [here](/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb).
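As a rough illustration of "sort by date, then drop older near-duplicates", consider the sketch below. It is not the library's code: `dedupe_by_recency` is a hypothetical helper, the embeddings are tiny hand-written vectors, and dates are assumed to be ISO strings so that they sort lexicographically.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def dedupe_by_recency(nodes: List[Dict], similarity_cutoff: float = 0.7) -> List[Dict]:
    """Keep newer nodes; drop older ones that are too similar to a kept node."""
    newest_first = sorted(nodes, key=lambda n: n["date"], reverse=True)
    kept: List[Dict] = []
    for node in newest_first:
        # A node survives only if no newer, already-kept node is too similar.
        if all(cosine(node["embedding"], k["embedding"]) < similarity_cutoff for k in kept):
            kept.append(node)
    return kept


# Hand-written toy vectors, for illustration only.
nodes = [
    {"text": "pricing (old)", "date": "2021-01-01", "embedding": [1.0, 0.0]},
    {"text": "pricing (new)", "date": "2023-01-01", "embedding": [0.9, 0.1]},
    {"text": "team bios", "date": "2020-01-01", "embedding": [0.0, 1.0]},
]
result = dedupe_by_recency(nodes)
```

Here the older pricing node is dropped because it is nearly identical to the newer one, while the unrelated node is kept despite being the oldest.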
## TimeWeightedPostprocessor
This postprocessor returns the top K nodes after applying a time-weighted rerank to each node. Each time a node is retrieved, the time it was retrieved is recorded. This biases search to favor information that has not been returned in a query yet.
```python
from llama_index.indices.postprocessor import TimeWeightedPostprocessor
postprocessor = TimeWeightedPostprocessor(
time_decay=0.99,
top_k=1
)
postprocessor.postprocess_nodes(nodes)
```
A full notebook guide is available [here](/examples/node_postprocessor/TimeWeightedPostprocessorDemo.ipynb).
## (Beta) PIINodePostprocessor
The PII (Personally Identifiable Information) postprocessor removes information that might be a security risk. It does this by using NER (either with a dedicated NER model, or with a local LLM).
### LLM Version
```python
from llama_index.indices.postprocessor import PIINodePostprocessor
postprocessor = PIINodePostprocessor(
service_context=service_context, # this should be setup with an LLM you trust
)
postprocessor.postprocess_nodes(nodes)
```
### NER Version
This version uses the default local model from Hugging Face that is loaded when you run `pipeline("ner")`.
```python
from llama_index.indices.postprocessor import NERPIINodePostprocessor
postprocessor = NERPIINodePostprocessor()
postprocessor.postprocess_nodes(nodes)
```
A full notebook guide for both can be found [here](/examples/node_postprocessor/PII.ipynb).
## (Beta) PrevNextNodePostprocessor
Uses pre-defined settings to read the `Node` relationships and fetch the nodes that come before, after, or both.
This is useful when you know the relationships point to important data (either before, after, or both) that should be sent to the LLM if that node is retrieved.
```python
from llama_index.indices.postprocessor import PrevNextNodePostprocessor
postprocessor = PrevNextNodePostprocessor(
docstore=index.docstore,
num_nodes=1,  # number of nodes to fetch when looking forwards or backwards
mode="next" # can be either 'next', 'previous', or 'both'
)
postprocessor.postprocess_nodes(nodes)
```
![](/_static/node_postprocessors/prev_next.png)
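The three modes can be sketched with a toy docstore. In this illustration the docstore is a plain dictionary whose entries hold hypothetical `prev` and `next` links; the real postprocessor reads `Node` relationships instead, so treat this only as a picture of the traversal.

```python
from typing import Dict, List


def expand_with_neighbors(
    node_ids: List[str],
    docstore: Dict[str, Dict],
    num_nodes: int = 1,
    mode: str = "next",
) -> List[str]:
    """Follow 'next'/'prev' links from each retrieved node (illustrative only)."""
    if mode not in ("next", "previous", "both"):
        raise ValueError(f"unknown mode: {mode}")
    result: List[str] = []
    for node_id in node_ids:
        before: List[str] = []
        after: List[str] = []
        if mode in ("previous", "both"):
            current = node_id
            for _ in range(num_nodes):
                current = docstore[current].get("prev")
                if current is None:
                    break
                before.insert(0, current)  # keep document order
        if mode in ("next", "both"):
            current = node_id
            for _ in range(num_nodes):
                current = docstore[current].get("next")
                if current is None:
                    break
                after.append(current)
        # Add the node and its neighbors, skipping anything already collected.
        for candidate in before + [node_id] + after:
            if candidate not in result:
                result.append(candidate)
    return result


docstore = {
    "a": {"prev": None, "next": "b"},
    "b": {"prev": "a", "next": "c"},
    "c": {"prev": "b", "next": None},
}
only_next = expand_with_neighbors(["b"], docstore, mode="next")
both_sides = expand_with_neighbors(["b"], docstore, mode="both")
```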
## (Beta) AutoPrevNextNodePostprocessor
The same as PrevNextNodePostprocessor, but lets the LLM decide the mode (next, previous, or both).
```python
from llama_index.indices.postprocessor import AutoPrevNextNodePostprocessor
postprocessor = AutoPrevNextNodePostprocessor(
docstore=index.docstore,
service_context=service_context,
num_nodes=1,  # number of nodes to fetch when looking forwards or backwards
)
postprocessor.postprocess_nodes(nodes)
```
A full example notebook is available [here](/examples/node_postprocessor/PrevNextPostprocessorDemo.ipynb).
## All Notebooks
```{toctree}
---
maxdepth: 1
---
/examples/node_postprocessor/OptimizerDemo.ipynb
/examples/node_postprocessor/CohereRerank.ipynb
/examples/node_postprocessor/LLMReranker-Lyft-10k.ipynb
/examples/node_postprocessor/LLMReranker-Gatsby.ipynb
/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb
/examples/node_postprocessor/TimeWeightedPostprocessorDemo.ipynb
/examples/node_postprocessor/PII.ipynb
/examples/node_postprocessor/PrevNextPostprocessorDemo.ipynb
```
@@ -0,0 +1,49 @@
# Node Postprocessor
## Concept
Node postprocessors are a set of modules that take a set of nodes, and apply some kind of transformation or filtering before returning them.
In LlamaIndex, node postprocessors are most commonly applied within a query engine, after the node retrieval step and before the response synthesis step.
LlamaIndex offers several node postprocessors for immediate use, while also providing a simple API for adding your own custom postprocessors.
```{tip}
Confused about where node postprocessor fits in the pipeline? Read about [high-level concepts](/getting_started/concepts.md)
```
## Usage Pattern
An example of using a node postprocessor is shown below:
```python
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.schema import Node, NodeWithScore
nodes = [
NodeWithScore(node=Node(text="text"), score=0.7),
NodeWithScore(node=Node(text="text"), score=0.8)
]
# filter nodes below 0.75 similarity score
processor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = processor.postprocess_nodes(nodes)
```
You can find more details on using postprocessors and how to build your own below.
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
Below you can find guides for each node postprocessor.
```{toctree}
---
maxdepth: 2
---
modules.md
```
@@ -0,0 +1,93 @@
# Usage Pattern
Most commonly, node postprocessors are used in a query engine, where they are applied to the nodes returned from a retriever, before the response synthesis step.
## Using with a Query Engine
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.indices.postprocessor import TimeWeightedPostprocessor
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
node_postprocessors=[
TimeWeightedPostprocessor(
time_decay=0.5, time_access_refresh=False, top_k=1
)
]
)
# all node post-processors will be applied during each query
response = query_engine.query("query string")
```
## Using with Retrieved Nodes
Node postprocessors can also be used as standalone objects for filtering retrieved nodes:
```python
from llama_index.indices.postprocessor import SimilarityPostprocessor
nodes = index.as_retriever().retrieve("query string")
# filter nodes below 0.75 similarity score
processor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = processor.postprocess_nodes(nodes)
```
## Using with your own nodes
As you may have noticed, the postprocessors take `NodeWithScore` objects as inputs, which is just a wrapper class with a `Node` and a `score` value.
```python
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.schema import Node, NodeWithScore
nodes = [
NodeWithScore(node=Node(text="text"), score=0.7),
NodeWithScore(node=Node(text="text"), score=0.8)
]
# filter nodes below 0.75 similarity score
processor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = processor.postprocess_nodes(nodes)
```
## Custom Node PostProcessor
The base class is `BaseNodePostprocessor`, and the API interface is very simple:
```python
class BaseNodePostprocessor:
    """Node postprocessor."""

    @abstractmethod
    def postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
    ) -> List[NodeWithScore]:
        """Postprocess nodes."""
```
A dummy node-postprocessor can be implemented in just a few lines of code:
```python
from typing import List, Optional

from llama_index import QueryBundle
from llama_index.indices.postprocessor.base import BaseNodePostprocessor
from llama_index.schema import NodeWithScore


class DummyNodePostprocessor(BaseNodePostprocessor):
    def postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        # subtract 1 from the score of every node
        for n in nodes:
            n.score -= 1
        return nodes
```
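Because every postprocessor exposes the same `postprocess_nodes` method, a list of them can be applied one after another, which is roughly what happens when several are passed to a query engine. The sketch below shows that chaining with stand-in classes so it runs without LlamaIndex installed; `ScoredText`, `CutoffPostprocessor`, `TopKPostprocessor`, and `apply_postprocessors` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredText:
    """Hypothetical stand-in for NodeWithScore: some text plus a score."""

    text: str
    score: float


class CutoffPostprocessor:
    """Drops items scoring below a similarity cutoff."""

    def __init__(self, similarity_cutoff: float) -> None:
        self.similarity_cutoff = similarity_cutoff

    def postprocess_nodes(self, nodes: List[ScoredText]) -> List[ScoredText]:
        return [n for n in nodes if n.score >= self.similarity_cutoff]


class TopKPostprocessor:
    """Keeps only the top_k highest-scoring items."""

    def __init__(self, top_k: int) -> None:
        self.top_k = top_k

    def postprocess_nodes(self, nodes: List[ScoredText]) -> List[ScoredText]:
        return sorted(nodes, key=lambda n: n.score, reverse=True)[: self.top_k]


def apply_postprocessors(nodes: List[ScoredText], postprocessors: list) -> List[ScoredText]:
    # Each postprocessor receives the output of the previous one.
    for postprocessor in postprocessors:
        nodes = postprocessor.postprocess_nodes(nodes)
    return nodes


nodes = [ScoredText("a", 0.7), ScoredText("b", 0.8), ScoredText("c", 0.9)]
result = apply_postprocessors(nodes, [CutoffPostprocessor(0.75), TopKPostprocessor(1)])
```

Order matters in a chain like this: filtering before truncating can give a different result from truncating before filtering.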
@@ -0,0 +1,144 @@
# Query Transformations
LlamaIndex allows you to perform *query transformations* over your index structures.
Query transformations are modules that will convert a query into another query. They can be **single-step**, as in the transformation is run once before the query is executed against an index.
They can also be **multi-step**, as in:
1. The query is transformed and executed against an index.
2. The response is retrieved.
3. Subsequent queries are transformed/executed in a sequential fashion.
We list some of our query transformations in more detail below.
#### Use Cases
Query transformations have multiple use cases:
- Transforming an initial query into a form that can be more easily embedded (e.g. HyDE)
- Transforming an initial query into a subquestion that can be more easily answered from the data (single-step query decomposition)
- Breaking an initial query into multiple subquestions that can be more easily answered on their own. (multi-step query decomposition)
### HyDE (Hypothetical Document Embeddings)
[HyDE](http://boston.lti.cs.cmu.edu/luyug/HyDE/HyDE.pdf) is a technique where given a natural language query, a hypothetical document/answer is generated first. This hypothetical document is then used for embedding lookup rather than the raw query.
To use HyDE, an example code snippet is shown below.
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.indices.query.query_transform.base import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine
# load documents, build index
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents)
# run query with HyDE query transform
query_str = "what did paul graham do after going to RISD"
hyde = HyDEQueryTransform(include_original=True)
query_engine = index.as_query_engine()
query_engine = TransformQueryEngine(query_engine, query_transform=hyde)
response = query_engine.query(query_str)
print(response)
```
Check out our [example notebook](../../../examples/query_transformations/HyDEQueryTransformDemo.ipynb) for a full walkthrough.
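To see why embedding a hypothetical answer can help, here is a toy sketch that runs without LlamaIndex. Everything in it is an assumption made for illustration: `embed` is a bag-of-words counter rather than a real embedding model, `generate` stands in for the LLM call that writes the hypothetical document, and the two documents are invented.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_lookup(query: str, documents: List[str], generate: Callable[[str], str]) -> str:
    """Embed a hypothetical answer instead of the raw query, then pick the closest document."""
    hypothetical_answer = generate(query)
    target = embed(hypothetical_answer)
    return max(documents, key=lambda doc: cosine(target, embed(doc)))


documents = [
    "he moved to new york and painted",
    "what to do when going to college",
]
query = "what did he do after going to RISD"

# Plain lookup: the question's wording overlaps most with the unrelated document.
raw_match = max(documents, key=lambda doc: cosine(embed(query), embed(doc)))

# HyDE-style lookup: the hypothetical answer is worded like the relevant document.
hyde_match = hyde_lookup(query, documents, generate=lambda q: "he moved to new york to paint")
```

The point of the toy example is that a question and its answer often share little vocabulary, whereas a plausible answer, even an imperfect one, tends to resemble the passage that contains the real answer.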
### Single-Step Query Decomposition
Some recent approaches (e.g. [self-ask](https://ofir.io/self-ask.pdf), [ReAct](https://arxiv.org/abs/2210.03629)) have suggested that LLMs
perform better at answering complex questions when they break the question into smaller steps. We have found that this is true for queries that require knowledge augmentation as well.
If your query is complex, different parts of your knowledge base may answer different "subqueries" around the overall query.
Our single-step query decomposition feature transforms a **complicated** question into a simpler one over the data collection to help provide a sub-answer to the original question.
This is especially helpful over a [composed graph](../../index/composability.md). Within a composed graph, a query can be routed to multiple subindexes, each representing a subset of the overall knowledge corpus. Query decomposition allows us to transform the query into a more suitable question over any given index.
An example image is shown below.
![](/_static/query_transformations/single_step_diagram.png)
Here's a corresponding example code snippet over a composed graph.
```python
# Setting: a list index composed over multiple vector indices
# llm_predictor_chatgpt corresponds to the ChatGPT LLM interface
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine
decompose_transform = DecomposeQueryTransform(
llm_predictor_chatgpt, verbose=True
)
# initialize indexes and graph
...
# configure retrievers
vector_query_engine = vector_index.as_query_engine()
vector_query_engine = TransformQueryEngine(
vector_query_engine,
query_transform=decompose_transform,
transform_extra_info={'index_summary': vector_index.index_struct.summary}
)
custom_query_engines = {
vector_index.index_id: vector_query_engine
}
# query
query_str = (
"Compare and contrast the airports in Seattle, Houston, and Toronto. "
)
query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
response = query_engine.query(query_str)
```
Check out our [example notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb) for a full walkthrough.
### Multi-Step Query Transformations
Multi-step query transformations are a generalization on top of existing single-step query transformation approaches.
Given an initial, complex query, the query is transformed and executed against an index. The response is retrieved from the query.
Given the response (along with prior responses) and the query, followup questions may be asked against the index as well. This technique allows a query to be run against a single knowledge source until that query has satisfied all questions.
An example image is shown below.
![](/_static/query_transformations/multi_step_diagram.png)
Here's a corresponding example code snippet.
```python
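from llama_index.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.query_engine.multistep_query_engine import MultiStepQueryEngine
# llm_predictor is assumed to wrap GPT-4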
step_decompose_transform = StepDecomposeQueryTransform(
llm_predictor, verbose=True
)
query_engine = index.as_query_engine()
query_engine = MultiStepQueryEngine(query_engine, query_transform=step_decompose_transform)
response = query_engine.query(
"Who was in the first batch of the accelerator program the author started?",
)
print(str(response))
```
Check out our [example notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo-multistep.ipynb) for a full walkthrough.
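The control flow of a multi-step transformation can be sketched as a simple loop. This is only an outline under stated assumptions: `multi_step_query` is a hypothetical helper, `transform` stands in for the LLM-backed step that proposes the next sub-question, `run_query` stands in for querying the index, and the placeholder answers are invented.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]


def multi_step_query(
    query: str,
    transform: Callable[[str, History], Optional[str]],
    run_query: Callable[[str], str],
    max_steps: int = 3,
) -> History:
    """Repeatedly derive a sub-question from prior answers and run it (illustrative only)."""
    history: History = []
    for _ in range(max_steps):
        sub_question = transform(query, history)
        if sub_question is None:  # the transform signals nothing is left to ask
            break
        history.append((sub_question, run_query(sub_question)))
    return history


# Toy stand-ins for the index; the answers are placeholders, not real data.
ANSWERS = {
    "Which accelerator did the author start?": "Acme Accelerator",
    "Who was in the first batch of Acme Accelerator?": "Team A and Team B",
}


def toy_transform(query: str, history: History) -> Optional[str]:
    if not history:
        return "Which accelerator did the author start?"
    if len(history) == 1:
        # Use the previous answer to form the follow-up question.
        return f"Who was in the first batch of {history[0][1]}?"
    return None


steps = multi_step_query(
    "Who was in the first batch of the accelerator program the author started?",
    toy_transform,
    ANSWERS.__getitem__,
)
```

Each entry in `steps` pairs a sub-question with its answer, so the final response can be synthesized from the whole chain rather than from a single lookup.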
```{toctree}
---
caption: Examples
maxdepth: 1
---
/examples/query_transformations/HyDEQueryTransformDemo.ipynb
/examples/query_transformations/SimpleIndexDemo-multistep.ipynb
```
@@ -0,0 +1,49 @@
# Module Guides
## Basic
```{toctree}
---
maxdepth: 1
---
Retriever Query Engine </examples/query_engine/CustomRetrievers.ipynb>
```
## Structured & Semi-Structured Data
```{toctree}
---
maxdepth: 1
---
/examples/query_engine/json_query_engine.ipynb
/examples/query_engine/pandas_query_engine.ipynb
/examples/query_engine/knowledge_graph_query_engine.ipynb
```
## Advanced
```{toctree}
---
maxdepth: 1
---
/examples/query_engine/RouterQueryEngine.ipynb
/examples/query_engine/RetrieverRouterQueryEngine.ipynb
/examples/query_engine/JointQASummary.ipynb
/examples/query_engine/sub_question_query_engine.ipynb
/examples/query_transformations/SimpleIndexDemo-multistep.ipynb
/examples/query_engine/SQLRouterQueryEngine.ipynb
/examples/query_engine/SQLAutoVectorQueryEngine.ipynb
/examples/query_engine/SQLJoinQueryEngine.ipynb
/examples/index_structs/struct_indices/duckdb_sql_query.ipynb
Retry Query Engine </examples/evaluation/RetryQuery.ipynb>
Retry Source Query Engine </examples/evaluation/RetryQuery.ipynb>
Retry Guideline Query Engine </examples/evaluation/RetryQuery.ipynb>
/examples/query_engine/citation_query_engine.ipynb
/examples/query_engine/pdf_tables/recursive_retriever.ipynb
```
## Experimental
```{toctree}
---
maxdepth: 1
---
/examples/query_engine/flare_query_engine.ipynb
```
@@ -0,0 +1,20 @@
# Response Modes
Right now, we support the following options:
- `default`: "create and refine" an answer by sequentially going through each retrieved `Node`;
This makes a separate LLM call per Node. Good for more detailed answers.
- `compact`: "compact" the prompt during each LLM call by stuffing as
many `Node` text chunks as can fit within the maximum prompt size. If there are
too many chunks to stuff in one prompt, "create and refine" an answer by going through
multiple prompts.
- `tree_summarize`: Given a set of `Node` objects and the query, recursively construct a tree
and return the root node as the response. Good for summarization purposes.
- `no_text`: Only runs the retriever to fetch the nodes that would have been sent to the LLM,
without actually sending them. They can then be inspected by checking `response.source_nodes`.
The response object is covered in more detail in Section 5.
- `accumulate`: Given a set of `Node` objects and the query, apply the query to each `Node` text
chunk while accumulating the responses into an array. Returns a concatenated string of all
responses. Good for when you need to run the same query separately against each text
chunk.
See [Response Synthesizer](/core_modules/query_modules/response_synthesizers/root.md) to learn more.
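The practical difference between `default` and `compact` is the number of LLM calls. The sketch below illustrates the packing idea only: `pack_chunks` is a hypothetical helper, and it measures prompt size in characters, whereas a real implementation would count tokens and account for the prompt template.

```python
from typing import List


def pack_chunks(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily merge consecutive chunks so each prompt stays within max_chars."""
    prompts: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + "\n\n" + chunk
        if current and len(candidate) > max_chars:
            # The next chunk does not fit: close this prompt and start a new one.
            prompts.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        prompts.append(current)
    return prompts


chunks = ["a" * 40, "b" * 40, "c" * 40]
default_calls = len(chunks)  # one LLM call per chunk
compact_calls = len(pack_chunks(chunks, max_chars=100))  # fewer, fuller prompts
```

In this toy case the first two chunks share a prompt, so the compacted run needs two calls instead of three.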
@@ -0,0 +1,51 @@
# Query Engine
## Concept
A query engine is a generic interface that allows you to ask questions over your data.
A query engine takes in a natural language query, and returns a rich response.
It is most often (but not always) built on one or many [Indices](/core_modules/data_modules/index/root.md) via [Retrievers](/core_modules/query_modules/retriever/root.md).
You can compose multiple query engines to achieve more advanced capability.
```{tip}
If you want to have a conversation with your data (multiple back-and-forth instead of a single question & answer), take a look at [Chat Engine](/core_modules/query_modules/chat_engines/root.md)
```
## Usage Pattern
Get started with:
```python
query_engine = index.as_query_engine()
response = query_engine.query("Who is Paul Graham.")
```
To stream the response:
```python
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Who is Paul Graham.")
streaming_response.print_response_stream()
```
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
```{toctree}
---
maxdepth: 3
---
modules.md
```
## Supporting Modules
```{toctree}
---
maxdepth: 2
---
supporting_modules.md
```
@@ -0,0 +1,56 @@
# Streaming
LlamaIndex supports streaming the response as it's being generated.
This allows you to start printing or processing the beginning of the response before the full response is finished.
This can drastically reduce the perceived latency of queries.
### Setup
To enable streaming, you need to use an LLM that supports streaming.
Right now, streaming is supported by `OpenAI`, `HuggingFaceLLM`, and most LangChain LLMs (via `LangChainLLM`).
Configure query engine to use streaming:
If you are using the high-level API, set `streaming=True` when building a query engine.
```python
query_engine = index.as_query_engine(
streaming=True,
similarity_top_k=1
)
```
If you are using the low-level API to compose the query engine,
pass `streaming=True` when constructing the `Response Synthesizer`:
```python
from llama_index import get_response_synthesizer
synth = get_response_synthesizer(streaming=True, ...)
query_engine = RetrieverQueryEngine(response_synthesizer=synth, ...)
```
### Streaming Response
After properly configuring both the LLM and the query engine,
calling `query` now returns a `StreamingResponse` object.
```python
streaming_response = query_engine.query(
"What did the author do growing up?",
)
```
The response is returned immediately when the LLM call *starts*, without having to wait for the full completion.
> Note: In the case where the query engine makes multiple LLM calls, only the last LLM call will be streamed and the response is returned when the last LLM call starts.
You can obtain a `Generator` from the streaming response and iterate over the tokens as they arrive:
```python
for text in streaming_response.response_gen:
    # do something with each piece of text as it arrives
    pass
```
Alternatively, if you just want to print the text as it arrives:
```python
streaming_response.print_response_stream()
```
See an [end-to-end example](/examples/customization/streaming/SimpleIndexDemo-streaming.ipynb)
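If generators are unfamiliar, the consumption pattern can be tried without an LLM. In this sketch `fake_token_stream` is a made-up stand-in for `response_gen`; it simply yields words one at a time so the loop body runs before the full text exists.

```python
from typing import Iterator, List


def fake_token_stream(text: str) -> Iterator[str]:
    """Yield whitespace-delimited pieces, standing in for LLM tokens."""
    for word in text.split():
        yield word + " "


received: List[str] = []
for token in fake_token_stream("streaming reduces perceived latency"):
    # A real application might print or forward each token here.
    received.append(token)

full_text = "".join(received).strip()
```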
@@ -0,0 +1,8 @@
# Supporting Modules
```{toctree}
---
maxdepth: 1
---
advanced/query_transformations.md
```
@@ -0,0 +1,96 @@
# Usage Pattern
## Get Started
Build a query engine from index:
```python
query_engine = index.as_query_engine()
```
```{tip}
To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
```
Ask a question over your data:
```python
response = query_engine.query('Who is Paul Graham?')
```
## Configuring a Query Engine
### High-Level API
You can directly build and configure a query engine from an index in 1 line of code:
```python
query_engine = index.as_query_engine(
response_mode='tree_summarize',
verbose=True,
)
```
> Note: While the high-level API optimizes for ease-of-use, it does *NOT* expose the full range of configurability.
See [**Response Modes**](./response_modes.md) for a full list of response modes and what they do.
```{toctree}
---
maxdepth: 1
hidden:
---
response_modes.md
streaming.md
```
### Low-Level Composition API
You can use the low-level composition API if you need more granular control.
Concretely speaking, you would explicitly construct a `QueryEngine` object instead of calling `index.as_query_engine(...)`.
> Note: You may need to look at API references or example notebooks.
```python
from llama_index import (
VectorStoreIndex,
get_response_synthesizer,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
# build index
index = VectorStoreIndex.from_documents(documents)
# configure retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=2,
)
# configure response synthesizer
response_synthesizer = get_response_synthesizer(
response_mode="tree_summarize",
)
# assemble query engine
query_engine = RetrieverQueryEngine(
retriever=retriever,
response_synthesizer=response_synthesizer,
)
# query
response = query_engine.query("What did the author do growing up?")
print(response)
```
### Streaming
To enable streaming, you simply need to pass in a `streaming=True` flag:
```python
query_engine = index.as_query_engine(
streaming=True,
)
streaming_response = query_engine.query(
"What did the author do growing up?",
)
streaming_response.print_response_stream()
```
* Read the full [streaming guide](/core_modules/query_modules/query_engine/streaming.md)
* See an [end-to-end example](/examples/customization/streaming/SimpleIndexDemo-streaming.ipynb)
@@ -0,0 +1,62 @@
# Module Guide
Detailed inputs/outputs for each response synthesizer are found below.
## API Example
The following shows the setup for utilizing all kwargs.
- `response_mode` specifies which response synthesizer to use
- `service_context` defines the LLM and related settings for synthesis
- `text_qa_template` and `refine_template` are the prompts used at various stages
- `use_async` is currently used only by the `tree_summarize` response mode, to asynchronously build the summary tree
- `streaming` configures whether to return a streaming response object or not
In the `synthesize`/`asynthesize` functions, you can optionally provide additional source nodes, which will be added to the `response.source_nodes` list.
```python
from llama_index.schema import Node, NodeWithScore
from llama_index import get_response_synthesizer
response_synthesizer = get_response_synthesizer(
response_mode="refine",
service_context=service_context,
text_qa_template=text_qa_template,
refine_template=refine_template,
use_async=False,
streaming=False
)
# synchronous
response = response_synthesizer.synthesize(
"query string",
nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
additional_source_nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
)
# asynchronous
response = await response_synthesizer.asynthesize(
"query string",
nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
additional_source_nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
)
```
You can also directly return a string, using the lower-level `get_response` and `aget_response` functions:
```python
response_str = response_synthesizer.get_response(
"query string",
text_chunks=["text1", "text2", ...]
)
```
## Example Notebooks
```{toctree}
---
maxdepth: 1
---
/examples/response_synthesizers/refine.ipynb
/examples/response_synthesizers/tree_summarize.ipynb
```
@@ -0,0 +1,50 @@
# Response Synthesizer
## Concept
A `Response Synthesizer` is what generates a response from an LLM, using a user query and a given set of text chunks. The output of a response synthesizer is a `Response` object.
The method for doing this can take many forms, from as simple as iterating over text chunks, to as complex as building a tree. The main idea here is to simplify the process of generating a response using an LLM across your data.
When used in a query engine, the response synthesizer is used after nodes are retrieved from a retriever, and after any node postprocessors are run.
```{tip}
Confused about where response synthesizer fits in the pipeline? Read the [high-level concepts](/getting_started/concepts.md)
```
## Usage Pattern
Use a response synthesizer on its own:
```python
from llama_index.schema import Node, NodeWithScore
from llama_index.response_synthesizers import get_response_synthesizer
response_synthesizer = get_response_synthesizer(response_mode='compact')
response = response_synthesizer.synthesize(
    "query text",
    nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...],
)
```
Or in a query engine after you've created an index:
```python
query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
response = query_engine.query("query_text")
```
You can find more details on all available response synthesizers, modes, and how to build your own below.
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
Below you can find detailed API information for each response synthesis module.
```{toctree}
---
maxdepth: 1
---
modules.md
```
@@ -0,0 +1,95 @@
# Usage Pattern
## Get Started
Configuring the response synthesizer for a query engine using `response_mode`:
```python
from llama_index.schema import Node, NodeWithScore
from llama_index.response_synthesizers import get_response_synthesizer
response_synthesizer = get_response_synthesizer(response_mode='compact')
response = response_synthesizer.synthesize(
"query text",
nodes=[NodeWithScore(node=Node(text="text"), score=1.0), ...]
)
```
Or, more commonly, in a query engine after you've created an index:
```python
query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
response = query_engine.query("query_text")
```
```{tip}
To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
```
## Configuring the Response Mode
Response synthesizers are typically specified through a `response_mode` kwarg setting.
Several response synthesizers are implemented already in LlamaIndex:
- `refine`: "create and refine" an answer by sequentially going through each retrieved text chunk.
This makes a separate LLM call per Node. Good for more detailed answers.
- `compact` (default): "compact" the prompt during each LLM call by stuffing as
many text chunks as can fit within the maximum prompt size. If there are
too many chunks to stuff in one prompt, "create and refine" an answer by going through
multiple compact prompts. The same as `refine`, but should result in fewer LLM calls.
- `tree_summarize`: Given a set of text chunks and the query, recursively construct a tree
and return the root node as the response. Good for summarization purposes.
- `simple_summarize`: Truncates all text chunks to fit into a single LLM prompt. Good for quick
summarization purposes, but may lose detail due to truncation.
- `no_text`: Only runs the retriever to fetch the nodes that would have been sent to the LLM,
without actually sending them. They can then be inspected by checking `response.source_nodes`.
- `accumulate`: Given a set of text chunks and the query, apply the query to each text
chunk while accumulating the responses into an array. Returns a concatenated string of all
responses. Good for when you need to run the same query separately against each text
chunk.
- `compact_accumulate`: The same as accumulate, but will "compact" each LLM prompt similar to
`compact`, and run the same query against each text chunk.
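Two of the modes above are easy to picture in plain Python. The sketch below is not the library's implementation: `tree_summarize` and `accumulate` are hypothetical helpers, the `summarize` and `answer` callables stand in for LLM calls, and the `fan_in` parameter and separator string are assumptions chosen for illustration.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2
) -> str:
    """Summarize groups of chunks level by level until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        # Each pass replaces groups of up to fan_in items with one summary.
        level = [summarize(level[i : i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def accumulate(
    chunks: List[str], answer: Callable[[str], str], separator: str = "\n---\n"
) -> str:
    """Run the same 'query' against every chunk and join the responses."""
    return separator.join(answer(chunk) for chunk in chunks)


# Toy stand-ins for LLM calls, so the structure of the result is visible.
root = tree_summarize(["a", "b", "c"], summarize=lambda group: "(" + "+".join(group) + ")")
joined = accumulate(["x", "y"], answer=str.upper, separator=" | ")
```

The nesting in `root` mirrors the tree: leaves are combined pairwise, then the intermediate summaries are combined again until a single root is left.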
## Custom Response Synthesizers
Each response synthesizer inherits from `llama_index.response_synthesizers.base.BaseSynthesizer`. The base API is extremely simple, which makes it easy to create your own response synthesizer.
Perhaps you want to customize which template is used at each step in `tree_summarize`, or a new research paper describes a new way to generate a response to a query. In either case, you can create your own response synthesizer and plug it into any query engine, or use it on its own.
Below we show the `__init__()` function, as well as the two abstract methods that every response synthesizer must implement. The basic requirements are to process a query and text chunks, and return a string (or string generator) response.
```python
class BaseSynthesizer(ABC):
    """Response builder class."""

    def __init__(
        self,
        service_context: Optional[ServiceContext] = None,
        streaming: bool = False,
    ) -> None:
        """Init params."""
        self._service_context = service_context or ServiceContext.from_defaults()
        self._callback_manager = self._service_context.callback_manager
        self._streaming = streaming

    @abstractmethod
    def get_response(
        self,
        query_str: str,
        text_chunks: Sequence[str],
        **response_kwargs: Any,
    ) -> RESPONSE_TEXT_TYPE:
        """Get response."""
        ...

    @abstractmethod
    async def aget_response(
        self,
        query_str: str,
        text_chunks: Sequence[str],
        **response_kwargs: Any,
    ) -> RESPONSE_TEXT_TYPE:
        """Get response."""
        ...
```
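To show the shape of a custom synthesizer, here is a toy class with the same two methods. It deliberately does not inherit from `BaseSynthesizer` so that it runs without LlamaIndex installed; `FirstChunkSynthesizer` and its `complete` callable (a stand-in for an LLM) are hypothetical names, and a real implementation would subclass the base class and use the service context's LLM.

```python
import asyncio
from typing import Any, Callable, Sequence


class FirstChunkSynthesizer:
    """Toy synthesizer: answers using only the first text chunk."""

    def __init__(self, complete: Callable[[str], str]) -> None:
        # `complete` is a hypothetical prompt -> text callable standing in for an LLM.
        self._complete = complete

    def get_response(
        self, query_str: str, text_chunks: Sequence[str], **response_kwargs: Any
    ) -> str:
        context = text_chunks[0] if text_chunks else ""
        return self._complete(f"Context: {context}\nQuestion: {query_str}")

    async def aget_response(
        self, query_str: str, text_chunks: Sequence[str], **response_kwargs: Any
    ) -> str:
        # No real async work here; simply delegate to the sync path.
        return self.get_response(query_str, text_chunks, **response_kwargs)


synth = FirstChunkSynthesizer(complete=lambda prompt: prompt.upper())
answer = synth.get_response("why?", ["because", "ignored"])
async_answer = asyncio.run(synth.aget_response("why?", ["because"]))
```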
@@ -0,0 +1,52 @@
# Module Guides
We are adding more module guides soon!
In the meantime, please take a look at the [API References](/api_reference/query/retrievers.rst).
## Vector Index Retrievers
* VectorIndexRetriever
```{toctree}
---
maxdepth: 1
---
VectorIndexAutoRetriever </examples/vector_stores/chroma_auto_retriever.ipynb>
```
## List Index
* ListIndexRetriever
* ListIndexEmbeddingRetriever
* ListIndexLLMRetriever
## Tree Index
* TreeSelectLeafRetriever
* TreeSelectLeafEmbeddingRetriever
* TreeAllLeafRetriever
* TreeRootRetriever
## Keyword Table Index
* KeywordTableGPTRetriever
* KeywordTableSimpleRetriever
* KeywordTableRAKERetriever
## Knowledge Graph Index
```{toctree}
---
maxdepth: 1
---
Custom Retriever (KG Index and Vector Store Index) </examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.ipynb>
```
* KGTableRetriever
## Document Summary Index
* DocumentSummaryIndexRetriever
* DocumentSummaryIndexEmbeddingRetriever
## Composed Retrievers
* TransformRetriever
```{toctree}
---
maxdepth: 1
---
/examples/query_engine/pdf_tables/recursive_retriever.ipynb
```
@@ -0,0 +1,35 @@
# Retriever Modes
Here we show the mapping from `retriever_mode` configuration to the selected retriever class.
> Note that `retriever_mode` can mean different things for different index classes.
## Vector Index
Specifying `retriever_mode` has no effect (silently ignored).
`vector_index.as_retriever(...)` always returns a VectorIndexRetriever.
## List Index
* `default`: ListIndexRetriever
* `embedding`: ListIndexEmbeddingRetriever
* `llm`: ListIndexLLMRetriever
## Tree Index
* `select_leaf`: TreeSelectLeafRetriever
* `select_leaf_embedding`: TreeSelectLeafEmbeddingRetriever
* `all_leaf`: TreeAllLeafRetriever
* `root`: TreeRootRetriever
## Keyword Table Index
* `default`: KeywordTableGPTRetriever
* `simple`: KeywordTableSimpleRetriever
* `rake`: KeywordTableRAKERetriever
## Knowledge Graph Index
* `keyword`: KGTableRetriever
* `embedding`: KGTableRetriever
* `hybrid`: KGTableRetriever
## Document Summary Index
* `default`: DocumentSummaryIndexRetriever
* `embedding`: DocumentSummaryIndexEmbeddingRetriever
@@ -0,0 +1,37 @@
# Retriever
## Concept
Retrievers are responsible for fetching the most relevant context given a user query (or chat message).
They can be built on top of [Indices](/core_modules/data_modules/index/root.md), but can also be defined independently.
They are used as a key building block in [Query Engines](/core_modules/query_modules/query_engine/root.md) (and [Chat Engines](/core_modules/query_modules/chat_engines/root.md)) for retrieving relevant context.
```{tip}
Confused about where retriever fits in the pipeline? Read about [high-level concepts](/getting_started/concepts.md)
```
## Usage Pattern
Get started with:
```python
retriever = index.as_retriever()
nodes = retriever.retrieve("Who is Paul Graham?")
```
```{toctree}
---
maxdepth: 2
---
usage_pattern.md
```
## Modules
```{toctree}
---
maxdepth: 2
---
modules.md
```
@@ -0,0 +1,74 @@
# Usage Pattern
## Get Started
Get a retriever from index:
```python
retriever = index.as_retriever()
```
Retrieve relevant context for a question:
```python
nodes = retriever.retrieve('Who is Paul Graham?')
```
> Note: To learn how to build an index, see [Index](/core_modules/data_modules/index/root.md)
## High-Level API
### Selecting a Retriever
You can select the index-specific retriever class via `retriever_mode`.
For example, with a `ListIndex`:
```python
retriever = list_index.as_retriever(
    retriever_mode='llm',
)
```
This creates a [ListIndexLLMRetriever](/api_reference/query/retrievers/list.rst) on top of the list index.
See [**Retriever Modes**](/core_modules/query_modules/retriever/retriever_modes.md) for a full list of (index-specific) retriever modes
and the retriever classes they map to.
```{toctree}
---
maxdepth: 1
hidden:
---
retriever_modes.md
```
### Configuring a Retriever
In the same way, you can pass kwargs to configure the selected retriever.
> Note: take a look at the API reference for the selected retriever class' constructor parameters for a list of valid kwargs.
For example, if we selected the "llm" retriever mode, we might do the following:
```python
retriever = list_index.as_retriever(
    retriever_mode='llm',
    choice_batch_size=5,
)
```
## Low-Level Composition API
You can use the low-level composition API if you need more granular control.
To achieve the same outcome as above, you can directly import and construct the desired retriever class:
```python
from llama_index.indices.list import ListIndexLLMRetriever
retriever = ListIndexLLMRetriever(
    index=list_index,
    choice_batch_size=5,
)
```
## Advanced
```{toctree}
---
maxdepth: 1
---
Define Custom Retriever </examples/query_engine/CustomRetrievers.ipynb>
```
@@ -0,0 +1,156 @@
# Output Parsing
LlamaIndex supports integrations with output parsing modules offered
by other frameworks. These output parsing modules can be used in the following ways:
- To provide formatting instructions for any prompt / query (through `output_parser.format`)
- To provide "parsing" for LLM outputs (through `output_parser.parse`)
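To make the two roles concrete, here is a minimal, standard-library-only sketch of a parser with a `format` step and a `parse` step. `JsonKeysOutputParser` is a hypothetical class written for illustration, not a LlamaIndex or third-party parser; only the `format`/`parse` method names mirror the interface described above, and a hard-coded string stands in for the LLM response.

```python
import json


class JsonKeysOutputParser:
    """Toy parser: asks for a JSON object with fixed keys, then parses it.

    Hypothetical class for illustration only.
    """

    def __init__(self, keys):
        self.keys = list(keys)

    def format(self, prompt_template: str) -> str:
        """Append formatting instructions to a prompt template."""
        instructions = (
            "Respond with a JSON object containing exactly these keys: "
            + ", ".join(self.keys)
        )
        return prompt_template + "\n\n" + instructions

    def parse(self, output: str) -> dict:
        """Parse raw LLM text into a dict and check the expected keys."""
        data = json.loads(output)
        missing = [key for key in self.keys if key not in data]
        if missing:
            raise ValueError(f"missing keys: {missing}")
        return data


parser = JsonKeysOutputParser(["Education", "Work"])

# Before the LLM call: add format instructions to the prompt.
prompt = parser.format("Question: {query_str}")

# After the LLM call: a hard-coded string stands in for the model output.
fake_llm_output = '{"Education": "placeholder text", "Work": "placeholder text"}'
parsed = parser.parse(fake_llm_output)
print(parsed["Education"])
```

Real parsers also have to cope with malformed output (for example, extra prose around the JSON); this sketch simply raises in that case.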
### Guardrails
Guardrails is an open-source Python package for specification/validation/correction of output schemas. See below for a code example.
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.output_parsers import GuardrailsOutputParser
from llama_index.llm_predictor import StructuredLLMPredictor
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL, DEFAULT_REFINE_PROMPT_TMPL
# load documents, build index
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents)
# specify StructuredLLMPredictor
# this is a special LLMPredictor that allows for structured outputs
llm_predictor = StructuredLLMPredictor()
# define query / output spec
rail_spec = ("""
<rail version="0.1">
<output>
    <list name="points" description="Bullet points regarding events in the author's life.">
        <object>
            <string name="explanation" format="one-line" on-fail-one-line="noop" />
            <string name="explanation2" format="one-line" on-fail-one-line="noop" />
            <string name="explanation3" format="one-line" on-fail-one-line="noop" />
        </object>
    </list>
</output>
<prompt>
Query string here.
@xml_prefix_prompt
{output_schema}
@json_suffix_prompt_v2_wo_none
</prompt>
</rail>
""")
# define output parser
output_parser = GuardrailsOutputParser.from_rail_string(rail_spec, llm=llm_predictor.llm)
# format each prompt with output parser instructions
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)
# obtain a structured response
query_engine = index.as_query_engine(
    service_context=ServiceContext.from_defaults(
        llm_predictor=llm_predictor
    ),
    text_qa_template=qa_prompt,
    refine_template=refine_prompt,
)
response = query_engine.query(
    "What are the three items the author did growing up?",
)
print(response)
```
Output:
```
{'points': [{'explanation': 'Writing short stories', 'explanation2': 'Programming on an IBM 1401', 'explanation3': 'Using microcomputers'}]}
```
### Langchain
Langchain also offers output parsing modules that you can use within LlamaIndex.
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.output_parsers import LangchainOutputParser
from llama_index.llm_predictor import StructuredLLMPredictor
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL, DEFAULT_REFINE_PROMPT_TMPL
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
# load documents, build index
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents)
llm_predictor = StructuredLLMPredictor()
# define output schema
response_schemas = [
    ResponseSchema(name="Education", description="Describes the author's educational experience/background."),
    ResponseSchema(name="Work", description="Describes the author's work experience/background.")
]
# define output parser
lc_output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
output_parser = LangchainOutputParser(lc_output_parser)
# format each prompt with output parser instructions
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)
# query index
query_engine = index.as_query_engine(
    service_context=ServiceContext.from_defaults(
        llm_predictor=llm_predictor
    ),
    text_qa_template=qa_prompt,
    refine_template=refine_prompt,
)
response = query_engine.query(
    "What are a few things the author did growing up?",
)
print(str(response))
```
Output:
```
{'Education': 'Before college, the author wrote short stories and experimented with programming on an IBM 1401.', 'Work': 'The author worked on writing and programming outside of school.'}
```
### Guides
```{toctree}
---
caption: Examples
maxdepth: 1
---
/examples/output_parsing/GuardrailsDemo.ipynb
/examples/output_parsing/LangchainOutputParserDemo.ipynb
/examples/output_parsing/guidance_pydantic_program.ipynb
/examples/output_parsing/guidance_sub_question.ipynb
/examples/output_parsing/openai_pydantic_program.ipynb
```
@@ -0,0 +1,35 @@
# Pydantic Program
A pydantic program is a generic abstraction that takes in an input string and converts it to a structured Pydantic object type.
Because this abstraction is so generic, it encompasses a broad range of LLM workflows. The programs are composable and can be used for more generic or more specific use cases.
There are a few general types of Pydantic Programs:
- **LLM Text Completion Pydantic Programs**: These convert input text into a user-specified structured object through a text completion API + output parsing.
- **LLM Function Calling Pydantic Programs**: These convert input text into a user-specified structured object through an LLM function calling API.
- **Prepackaged Pydantic Programs**: These convert input text into prespecified structured objects.
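The core idea (input string in, typed object out) can be sketched without any LLM at all. The example below is a rough illustration under stated assumptions: it uses a standard-library `dataclass` as a stand-in for a Pydantic model, and `Album`, `make_program`, and `fake_llm` are hypothetical names, not LlamaIndex APIs. The stub `fake_llm` plays the part of a function calling endpoint that returns structured arguments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Album:
    """Target output type (a stand-in for a Pydantic model)."""

    name: str
    songs: List[str]


def make_program(output_cls, llm_fn: Callable[[str], Dict]):
    """Build a callable that maps an input string to an `output_cls` instance."""

    def program(text: str):
        fields = llm_fn(text)        # structured arguments from the "LLM"
        return output_cls(**fields)  # cast into the target type

    return program


def fake_llm(text: str) -> Dict:
    # Stub standing in for a function-calling LLM; returns fixed arguments.
    return {"name": f"Songs about {text}", "songs": ["Track 1", "Track 2"]}


program = make_program(Album, fake_llm)
album = program("rivers")
print(album.name, len(album.songs))
```

Because a program is just a callable from text to a typed object, programs can be chained or wrapped, which is what makes them composable.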
## LLM Text Completion Pydantic Programs
TODO: Coming soon!
## LLM Function Calling Pydantic Programs
```{toctree}
---
maxdepth: 1
---
/examples/output_parsing/openai_pydantic_program.ipynb
/examples/output_parsing/guidance_pydantic_program.ipynb
/examples/output_parsing/guidance_sub_question.ipynb
```
## Prepackaged Pydantic Programs
```{toctree}
---
maxdepth: 1
---
/examples/output_parsing/df_program.ipynb
/examples/output_parsing/evaporate_program.ipynb
```
@@ -0,0 +1,42 @@
# Structured Outputs
The ability of LLMs to produce structured outputs is important for downstream applications that rely on reliably parsing output values.
LlamaIndex itself also relies on structured output in the following ways.
- **Document retrieval**: Many data structures within LlamaIndex rely on LLM calls with a specific schema for Document retrieval. For instance, the tree index expects LLM calls to be in the format "ANSWER: (number)".
- **Response synthesis**: Users may expect that the final response contains some degree of structure (e.g. a JSON output, a formatted SQL query, etc.)
LlamaIndex provides a variety of modules enabling LLMs to produce outputs in a structured format. We provide modules at different levels of abstraction:
- **Output Parsers**: These are modules that operate before and after an LLM text completion endpoint. They are not used with LLM function calling endpoints (since those contain structured outputs out of the box).
- **Pydantic Programs**: These are generic modules that map an input prompt to a structured output, represented by a Pydantic object. They may use function calling APIs or text completion APIs + output parsers.
- **Pre-defined Pydantic Programs**: We have pre-defined Pydantic programs that map inputs to specific output types (like dataframes).
See the sections below for an overview of output parsers and Pydantic programs.
## 🔬 Anatomy of a Structured Output Function
Here we describe the different components of an LLM-powered structured output function. The pipeline depends on whether you're using a **generic LLM text completion API** or an **LLM function calling API**.
![](/_static/structured_output/diagram1.png)
With generic completion APIs, the inputs and outputs are handled by text prompts. The output parser plays a role before and after the LLM call in ensuring structured outputs. Before the LLM call, the output parser can
append format instructions to the prompt. After the LLM call, the output parser can parse the output according to the specified instructions.
With function calling APIs, the output is inherently in a structured format, and the input can take in the signature of the desired object. The structured output just needs to be cast in the right object format (e.g. Pydantic).
## Output Parser Modules
```{toctree}
---
maxdepth: 2
---
output_parser.md
```
## Pydantic Program Modules
```{toctree}
---
maxdepth: 2
---
pydantic_program.md
```
@@ -0,0 +1,50 @@
# Callbacks
## Concept
LlamaIndex provides callbacks to help debug, track, and trace the inner workings of the library.
Using the callback manager, as many callbacks as needed can be added.
In addition to logging data related to events, you can also track the duration and number of occurrences
of each event.
Furthermore, a trace map of events is also recorded, and callbacks can use this data
however they want. For example, the `LlamaDebugHandler` will, by default, print the trace of events
after most operations.
**Callback Event Types**
While each callback may not leverage each event type, the following events are available to be tracked:
- `CHUNKING` -> Logs for the before and after of text splitting.
- `NODE_PARSING` -> Logs for the documents and the nodes that they are parsed into.
- `EMBEDDING` -> Logs for the number of texts embedded.
- `LLM` -> Logs for the template and response of LLM calls.
- `QUERY` -> Keeps track of the start and end of each query.
- `RETRIEVE` -> Logs for the nodes retrieved for a query.
- `SYNTHESIZE` -> Logs for the result for synthesize calls.
- `TREE` -> Logs for the summary and level of summaries generated.
- `SUB_QUESTIONS` -> Logs for the sub questions and answers generated.
You can implement your own callback to track and trace these events, or use an existing callback.
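To give a feel for what a custom callback does, here is a small standard-library sketch that counts events and accumulates their durations. It is an illustration under stated assumptions: `EventStatsHandler` is a hypothetical class that does not subclass any LlamaIndex base class, the `on_event_start`/`on_event_end` signatures are simplified, and event types are plain strings rather than the library's own enum.

```python
import time
from collections import defaultdict


class EventStatsHandler:
    """Toy handler that records how often each event type occurs and for how long.

    Hypothetical class for illustration only.
    """

    def __init__(self):
        self.counts = defaultdict(int)
        self.durations = defaultdict(float)
        self._starts = {}

    def on_event_start(self, event_type: str, event_id: str) -> None:
        # Remember when this event started, keyed by its id.
        self._starts[event_id] = (event_type, time.perf_counter())

    def on_event_end(self, event_type: str, event_id: str) -> None:
        started_type, started_at = self._starts.pop(event_id)
        if started_type != event_type:
            raise ValueError("event type changed between start and end")
        self.counts[event_type] += 1
        self.durations[event_type] += time.perf_counter() - started_at


handler = EventStatsHandler()
for event_id, event_type in enumerate(["RETRIEVE", "LLM", "LLM"]):
    handler.on_event_start(event_type, str(event_id))
    handler.on_event_end(event_type, str(event_id))

print(dict(handler.counts))
```

A real handler would be registered with the callback manager so that the library, rather than your own loop, triggers the start and end calls.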
## Modules
Currently supported callbacks are as follows:
- [TokenCountingHandler](/examples/callbacks/TokenCountingHandler.ipynb) -> Flexible token counting for prompt, completion, and embedding token usage. See the migration details [here](/core_modules/model_modules/callbacks/token_counting_migration.md)
- [LlamaDebugHandler](/examples/callbacks/LlamaDebugHandler.ipynb) -> Basic tracking and tracing for events. Example usage can be found in the notebook below.
- [WandbCallbackHandler](/examples/callbacks/WandbCallbackHandler.ipynb) -> Tracking of events and traces using the Wandb Prompts frontend. More details are in the notebook below or at [Wandb](https://docs.wandb.ai/guides/prompts/quickstart)
- [AimCallback](/examples/callbacks/AimCallback.ipynb) -> Tracking of LLM inputs and outputs. Example usage can be found in the notebook below.
```{toctree}
---
maxdepth: 1
hidden:
---
/examples/callbacks/TokenCountingHandler.ipynb
/examples/callbacks/LlamaDebugHandler.ipynb
/examples/callbacks/WandbCallbackHandler.ipynb
/examples/callbacks/AimCallback.ipynb
token_counting_migration.md
```
@@ -0,0 +1,47 @@
# Token Counting - Migration Guide
The existing token counting implementation has been __deprecated__.
We know token counting is important to many users, so this guide was created to walk through a (hopefully painless) transition.
Previously, token counting was kept track of on the `llm_predictor` and `embed_model` objects directly, and optionally printed to the console. This implementation used a static tokenizer for token counting (gpt-2), and the `last_token_usage` and `total_token_usage` attributes were not always kept track of properly.
Going forward, token counting has moved into a callback. Using the `TokenCountingHandler` callback, you now have more options for how tokens are counted, the lifetime of the token counts, and even creating separate token counters for different indexes.
Here is a minimal example of using the new `TokenCountingHandler` with an OpenAI model:
```python
import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
# you can set a tokenizer directly, or optionally let it default
# to the same tokenizer that was used previously for token counting
# NOTE: The tokenizer should be a function that takes in text and returns a list of tokens
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("text-davinci-003").encode,
    verbose=False,  # set to True to see usage printed to the console
)
callback_manager = CallbackManager([token_counter])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
documents = SimpleDirectoryReader("./data").load_data()
# if verbose is turned on, you will see embedding token usage printed
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
# otherwise, you can access the count directly
print(token_counter.total_embedding_token_count)
# reset the counts at your discretion!
token_counter.reset_counts()
# also track prompt, completion, and total LLM tokens, in addition to embeddings
response = index.as_query_engine().query("What did the author do growing up?")
print('Embedding Tokens: ', token_counter.total_embedding_token_count, '\n',
      'LLM Prompt Tokens: ', token_counter.prompt_llm_token_count, '\n',
      'LLM Completion Tokens: ', token_counter.completion_llm_token_count, '\n',
      'Total LLM Token Count: ', token_counter.total_llm_token_count)
```
@@ -0,0 +1,97 @@
# Cost Analysis
## Concept
Each call to an LLM will cost some amount of money - for instance, OpenAI's gpt-3.5-turbo costs $0.002 / 1k tokens. The cost of building an index and querying depends on
- the type of LLM used
- the type of data structure used
- parameters used during building
- parameters used during querying
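As a back-of-the-envelope illustration, the dollar cost for a given token count follows directly from the per-1k-token price. The helper below is a hypothetical sketch (`estimate_cost` is not a LlamaIndex function), and the default price is simply the gpt-3.5-turbo figure quoted above; check your provider's current pricing before relying on it.

```python
def estimate_cost(num_tokens: int, price_per_1k_tokens: float = 0.002) -> float:
    """Return the estimated cost in dollars for `num_tokens` tokens.

    The default price is the example figure from the text, not a live quote.
    """
    return num_tokens / 1000 * price_per_1k_tokens


# 50,000 tokens at $0.002 per 1k tokens comes to roughly ten cents.
cost = estimate_cost(50_000)
print(f"${cost:.2f}")
```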
The cost of building and querying each index is a TODO in the reference documentation. In the meantime, we provide the following information:
1. A high-level overview of the cost structure of the indices.
2. A token predictor that you can use directly within LlamaIndex!
### Overview of Cost Structure
#### Indices with no LLM calls
The following indices don't require LLM calls at all during building (0 cost):
- `ListIndex`
- `SimpleKeywordTableIndex` - uses a regex keyword extractor to extract keywords from each document
- `RAKEKeywordTableIndex` - uses a RAKE keyword extractor to extract keywords from each document
#### Indices with LLM calls
The following indices do require LLM calls during build time:
- `TreeIndex` - use LLM to hierarchically summarize the text to build the tree
- `KeywordTableIndex` - use LLM to extract keywords from each document
### Query Time
There will always be >= 1 LLM call during query time, in order to synthesize the final answer.
Some indices contain cost tradeoffs between index building and querying. `ListIndex`, for instance,
is free to build, but running a query over a list index (without filtering or embedding lookups), will
call the LLM {math}`N` times.
Here are some notes regarding each of the indices:
- `ListIndex`: by default requires {math}`N` LLM calls, where N is the number of nodes.
- `TreeIndex`: by default requires {math}`\log (N)` LLM calls, where N is the number of leaf nodes.
  - Setting `child_branch_factor=2` will be more expensive than the default `child_branch_factor=1` (polynomial vs logarithmic), because we traverse 2 children instead of just 1 for each parent node.
- `KeywordTableIndex`: by default requires an LLM call to extract query keywords.
  - You can use `index.as_retriever(retriever_mode="simple")` or `index.as_retriever(retriever_mode="rake")` to use regex/RAKE keyword extractors on your query text as well.
- `VectorStoreIndex`: by default, requires one LLM call per query. If you increase the `similarity_top_k` or `chunk_size`, or change the `response_mode`, then this number will increase.
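The difference between linear and logarithmic scaling is easier to see with numbers. The sketch below is a rough model, not a measurement: both function names are hypothetical, and the tree estimate assumes one LLM call per tree level with a fixed number of children per parent (the value 10 is an assumption chosen for illustration).

```python
import math


def list_index_llm_calls(num_nodes: int) -> int:
    """One LLM call per node, as in the default `ListIndex` note above."""
    return num_nodes


def tree_index_llm_calls(num_leaf_nodes: int, num_children: int = 10) -> int:
    """Approximate depth of a tree with `num_children` children per parent.

    Assumes one LLM call per level when a single child is followed
    at each step; `num_children=10` is an illustrative assumption.
    """
    depth = 0
    remaining = num_leaf_nodes
    while remaining > 1:
        remaining = math.ceil(remaining / num_children)
        depth += 1
    return max(depth, 1)


for n in (10, 1000, 100000):
    print(n, list_index_llm_calls(n), tree_index_llm_calls(n))
```

Under these assumptions, growing the data by a factor of 100 multiplies the list index call count by 100 but adds only two calls for the tree index.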
## Usage Pattern
LlamaIndex offers token **predictors** to predict token usage of LLM and embedding calls.
This allows you to estimate your costs during 1) index construction, and 2) index querying, before
any respective LLM calls are made.
Tokens are counted using the `TokenCountingHandler` callback. See the [example notebook](../../../examples/callbacks/TokenCountingHandler.ipynb) for details on the setup.
### Using MockLLM
To predict token usage of LLM calls, import and instantiate the MockLLM as shown below. The `max_tokens` parameter is used as a "worst case" prediction, where each LLM response will contain exactly that number of tokens. If `max_tokens` is not specified, then it will simply predict back the prompt.
```python
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import MockLLM
llm = MockLLM(max_tokens=256)
service_context = ServiceContext.from_defaults(llm=llm)
# optionally set a global service context
set_global_service_context(service_context)
```
You can then use this predictor during both index construction and querying.
### Using MockEmbedding
You may also predict the token usage of embedding calls with `MockEmbedding`.
```python
from llama_index import ServiceContext, set_global_service_context
from llama_index import MockEmbedding
# specify a MockEmbedding
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
# optionally set a global service context
set_global_service_context(service_context)
```
## Usage Pattern
Read about the full usage pattern below!
```{toctree}
---
caption: Examples
maxdepth: 1
---
usage_pattern.md
```
@@ -0,0 +1,97 @@
# Usage Pattern
## Estimating LLM and Embedding Token Counts
In order to measure LLM and Embedding token counts, you'll need to
1. Set up `MockLLM` and `MockEmbedding` objects
```python
from llama_index.llms import MockLLM
from llama_index import MockEmbedding
llm = MockLLM(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)
```
2. Set up the `TokenCountingHandler` callback handler
```python
import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])
```
3. Add them to the global `ServiceContext`
```python
from llama_index import ServiceContext, set_global_service_context
set_global_service_context(
    ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        callback_manager=callback_manager
    )
)
```
4. Construct an Index
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./docs/examples/data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
```
5. Measure the counts!
```python
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
# reset counts
token_counter.reset_counts()
```
6. Run a query, measure again
```python
query_engine = index.as_query_engine()
response = query_engine.query("query")
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
```
@@ -0,0 +1,13 @@
# Modules
Notebooks with usage of these components can be found below.
```{toctree}
---
maxdepth: 1
---
../../../examples/evaluation/TestNYC-Evaluation.ipynb
../../../examples/evaluation/TestNYC-Evaluation-Query.ipynb
../../../examples/evaluation/QuestionGeneration.ipynb
```
@@ -0,0 +1,64 @@
# Evaluation
## Concept
Evaluation in generative AI and retrieval is a difficult task. Due to the unpredictable nature of text, and a general lack of "expected" outcomes to compare against, there are many blockers to getting started with evaluation.
However, LlamaIndex offers a few key modules for evaluating the quality of both Document retrieval and response synthesis.
Here are some key questions for each component:
- **Document retrieval**: Are the sources relevant to the query?
- **Response synthesis**: Does the response match the retrieved context? Does it also match the query?
This guide describes how the evaluation components within LlamaIndex work. Note that our current evaluation modules
do *not* require ground-truth labels. Evaluation can be done with some combination of the query, context, and response,
combined with LLM calls.
### Evaluation of the Response + Context
Each `query_engine.query` call returns both the synthesized response and the source documents.
We can evaluate the response against the retrieved sources - without taking into account the query!
This allows you to measure hallucination - if the response does not match the retrieved sources, this means that the model may be "hallucinating" an answer since it is not rooting the answer in the context provided to it in the prompt.
There are two sub-modes of evaluation here. We can either get a binary response "YES"/"NO" on whether the response matches *any* source context,
or get a list of responses across sources to see which sources match.
The `ResponseEvaluator` handles both modes for evaluating in this context.
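The two sub-modes differ only in how per-source judgments are aggregated. The sketch below shows that shape with plain Python; it is not how `ResponseEvaluator` is implemented. All names are hypothetical, and `substring_judge` is a deliberately crude stand-in for the LLM call that would decide whether a source supports the response.

```python
from typing import Callable, List


def evaluate_binary(
    response: str, sources: List[str], judge: Callable[[str, str], bool]
) -> str:
    """Return "YES" if the response is supported by any source, else "NO"."""
    return "YES" if any(judge(response, source) for source in sources) else "NO"


def evaluate_per_source(
    response: str, sources: List[str], judge: Callable[[str, str], bool]
) -> List[str]:
    """Return one "YES"/"NO" verdict per source, in order."""
    return ["YES" if judge(response, source) else "NO" for source in sources]


def substring_judge(response: str, source: str) -> bool:
    # Stub standing in for an LLM call: checks literal containment only.
    return response.lower() in source.lower()


sources = ["The sky is blue today.", "Grass is green."]
response = "grass is green"

print(evaluate_binary(response, sources, substring_judge))
print(evaluate_per_source(response, sources, substring_judge))
```

The binary mode answers "is the response grounded anywhere?", while the per-source mode tells you which retrieved sources actually carry the support.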
### Evaluation of the Query + Response + Source Context
This is similar to the above section, except now we also take into account the query. The goal is to determine if
the response + source context answers the query.
As with the above, there are two submodes of evaluation.
- We can either get a binary response "YES"/"NO" on whether
the response matches the query, and whether any source node also matches the query.
- We can also ignore the synthesized response, and check every source node to see
if it matches the query.
### Question Generation
In addition to evaluating queries, LlamaIndex can also use your data to generate questions to evaluate on. This means that you can automatically generate questions, and then run an evaluation pipeline to test if the LLM can actually answer questions accurately using your data.
## Usage Pattern
For full usage details, see the usage pattern below.
```{toctree}
---
maxdepth: 1
---
usage_pattern.md
```
## Modules
Notebooks with usage of these components can be found below.
```{toctree}
---
maxdepth: 1
---
modules.md
```
@@ -0,0 +1,141 @@
# Usage Pattern
## Evaluating Response for Hallucination
### Binary Evaluation
This mode of evaluation returns "YES" if the synthesized response matches any source context, and "NO" otherwise.
```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import ResponseEvaluator
# build service context
llm = OpenAI(model="gpt-4", temperature=0.0)
service_context = ServiceContext.from_defaults(llm=llm)
# build index
...
# define evaluator
evaluator = ResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American Revolution?")
eval_result = evaluator.evaluate(response)
print(str(eval_result))
```
You'll get back either a `YES` or `NO` response.
![](/_static/evaluation/eval_response_context.png)
### Sources Evaluation
This mode of evaluation will return "YES"/"NO" for every source node.
```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import ResponseEvaluator
# build service context
llm = OpenAI(model="gpt-4", temperature=0.0)
service_context = ServiceContext.from_defaults(llm=llm)
# build index
...
# define evaluator
evaluator = ResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American Revolution?")
eval_result = evaluator.evaluate_source_nodes(response)
print(str(eval_result))
```
You'll get back a list of "YES"/"NO", corresponding to each source node in `response.source_nodes`.
## Evaluating Query + Response for Answer Quality
### Binary Evaluation
This mode of evaluation returns "YES" if the synthesized response matches the query and any source context, and "NO" otherwise.
```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import QueryResponseEvaluator
# build service context
llm = OpenAI(model="gpt-4", temperature=0.0)
service_context = ServiceContext.from_defaults(llm=llm)
# build index
...
# define evaluator
evaluator = QueryResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American Revolution?")
eval_result = evaluator.evaluate(response)
print(str(eval_result))
```
![](/_static/evaluation/eval_query_response_context.png)
### Sources Evaluation
This mode of evaluation will look at each source node, and see if each source node contains an answer to the query.
```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import QueryResponseEvaluator
# build service context
llm = OpenAI(model="gpt-4", temperature=0.0)
service_context = ServiceContext.from_defaults(llm=llm)
# build index
...
# define evaluator
evaluator = QueryResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American Revolution?")
eval_result = evaluator.evaluate_source_nodes(response)
print(str(eval_result))
```
![](/_static/evaluation/eval_query_sources.png)
## Question Generation
LlamaIndex can also generate questions to answer using your data. Used in combination with the above evaluators, this lets you create a fully automated evaluation pipeline over your data.
```python
from llama_index import SimpleDirectoryReader, ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import DatasetGenerator
# build service context
llm = OpenAI(model="gpt-4", temperature=0.0)
service_context = ServiceContext.from_defaults(llm=llm)
# build documents
documents = SimpleDirectoryReader("./data").load_data()
# define generator, generate questions
data_generator = DatasetGenerator.from_documents(documents, service_context=service_context)
eval_questions = data_generator.generate_questions_from_nodes()
```
@@ -0,0 +1,44 @@
# Playground
## Concept
The Playground module in LlamaIndex is a way to automatically test your data (i.e. documents) across a diverse combination of indices, models, embeddings, modes, etc. to decide which ones are best for your purposes. More options will continue to be added.
For each combination, you'll be able to compare the results for any query and compare the answers, latency, tokens used, and so on.
You may initialize a Playground with a list of pre-built indices, or initialize one from a list of Documents using the preset indices.
## Usage Pattern
A sample usage is given below.
```python
from llama_index import download_loader
from llama_index.indices.vector_store import VectorStoreIndex
from llama_index.indices.tree.base import TreeIndex
from llama_index.playground import Playground
# load data
WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin'])
# define multiple index data structures (vector index, tree index)
indices = [VectorStoreIndex.from_documents(documents), TreeIndex.from_documents(documents)]
# initialize playground
playground = Playground(indices=indices)
# playground compare
playground.compare("What is the population of Berlin?")
```
## Modules
```{toctree}
---
maxdepth: 1
---
../../../examples/analysis/PlaygroundDemo.ipynb
```
@@ -0,0 +1,103 @@
# ServiceContext
## Concept
The `ServiceContext` is a bundle of commonly used resources used during the indexing and querying stage in a LlamaIndex pipeline/application.
You can use it to set the [global configuration](#setting-global-configuration), as well as [local configurations](#setting-local-configuration) at specific parts of the pipeline.
## Usage Pattern
### Configuring the service context
The `ServiceContext` is a simple Python dataclass that you can directly construct by passing in the desired components.
```python
@dataclass
class ServiceContext:
    # The LLM used to generate natural language responses to queries.
    llm_predictor: BaseLLMPredictor
    # The PromptHelper object that helps with truncating and repacking text chunks to fit in the LLM's context window.
    prompt_helper: PromptHelper
    # The embedding model used to generate vector representations of text.
    embed_model: BaseEmbedding
    # The parser that converts documents into nodes.
    node_parser: NodeParser
    # The callback manager object that calls its handlers on events. Provides basic logging and tracing capabilities.
    callback_manager: CallbackManager
    @classmethod
    def from_defaults(cls, ...) -> "ServiceContext":
        ...
```
```{tip}
Learn how to configure specific modules:
- [LLM](/core_modules/model_modules/llms/usage_custom.md)
- [Embedding Model](/core_modules/model_modules/embeddings/usage_pattern.md)
- [Node Parser](/core_modules/data_modules/node_parsers/usage_pattern.md)
```
We also expose some common kwargs (of the above components) via the `ServiceContext.from_defaults` method
for convenience (so you don't have to manually construct them).
**Kwargs for node parser**:
- `chunk_size`: The size of the text chunk for a node. Used to construct the default node parser when one isn't provided.
- `chunk_overlap`: The amount of overlap between nodes (i.e. text chunks).
**Kwargs for prompt helper**:
- `context_window`: The size of the context window of the LLM. Typically we set this
automatically from the model metadata, but we also allow an explicit override via this parameter
for additional control (or in case the default is not available for a newly released
model).
- `num_output`: The maximum number of output tokens expected from the LLM. Typically we set this
automatically from the model metadata. This parameter does not actually limit the model
output; it affects the amount of "space" we reserve for the output when computing the
available context window size for packing text from retrieved Nodes.
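To make the role of `num_output` concrete, here is a minimal sketch of the budgeting arithmetic described above. The function name `available_context_size` and the example numbers are assumptions made for illustration; this is a simplified model, not the library's actual implementation.

```python
def available_context_size(context_window: int, num_output: int, prompt_tokens: int) -> int:
    """Tokens left for retrieved text after reserving room for the prompt and the output.

    Simplified illustration only; the real PromptHelper may differ in detail.
    """
    remaining = context_window - prompt_tokens - num_output
    if remaining < 0:
        raise ValueError("Prompt and reserved output exceed the context window")
    return remaining


# Assumed example numbers: a 4096-token window, 256 tokens reserved for the
# output, and a prompt template plus question that uses 100 tokens.
budget = available_context_size(context_window=4096, num_output=256, prompt_tokens=100)
print(budget)  # 3740
```

Raising `num_output` shrinks the space available for retrieved text, even though it places no hard cap on what the model generates.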
Here's a complete example that sets up all objects using their default settings:
```python
from llama_index import ServiceContext, LLMPredictor, OpenAIEmbedding, PromptHelper
from llama_index.llms import OpenAI
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser
llm = OpenAI(model='text-davinci-003', temperature=0, max_tokens=256)
embed_model = OpenAIEmbedding()
node_parser = SimpleNodeParser(
    text_splitter=TokenTextSplitter(chunk_size=1024, chunk_overlap=20)
)
prompt_helper = PromptHelper(
    context_window=4096,
    num_output=256,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=None
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    node_parser=node_parser,
    prompt_helper=prompt_helper
)
```
### Setting global configuration
You can set a service context as the global default that applies to the entire LlamaIndex pipeline:
```python
from llama_index import set_global_service_context
set_global_service_context(service_context)
```
### Setting local configuration
You can pass in a service context to a specific part of the pipeline to override the default configuration:
```python
query_engine = index.as_query_engine(service_context=service_context)
response = query_engine.query("What did the author do growing up?")
print(response)
```
@@ -0,0 +1,41 @@
# Deprecated Terms
As LlamaIndex continues to evolve, many class names and APIs have been adjusted, improved, and deprecated.
The following is a list of previously popular terms that have been deprecated, with links to their replacements.
## GPTSimpleVectorIndex
This has been renamed to `VectorStoreIndex`, and all vector indexes have been unified under a single interface. You can integrate with various vector databases by modifying the underlying `vector_store`.
Please see the following links for more details on usage.
- [Index Usage Pattern](/core_modules/data_modules/index/usage_pattern.md)
- [Vector Store Guide](/core_modules/data_modules/index/vector_store_guide.ipynb)
- [Vector Store Integrations](/community/integrations/vector_stores.md)
## GPTVectorStoreIndex
This has been renamed to `VectorStoreIndex`, but it is only a cosmetic change. Please see the following links for more details on usage.
- [Index Usage Pattern](/core_modules/data_modules/index/usage_pattern.md)
- [Vector Store Guide](/core_modules/data_modules/index/vector_store_guide.ipynb)
- [Vector Store Integrations](/community/integrations/vector_stores.md)
## LLMPredictor
The `LLMPredictor` object is no longer intended to be used by users. Instead, you can set up an LLM directly and pass it into the `ServiceContext`.
- [LLMs in LlamaIndex](/core_modules/model_modules/llms/root.md)
- [Setting LLMs in the ServiceContext](/core_modules/supporting_modules/service_context.md)
## PromptHelper and max_input_size
The `max_input_size` parameter for the prompt helper has since been replaced with `context_window`.
The `PromptHelper` in general has been deprecated in favour of specifying parameters directly in the `service_context` and `node_parser`.
See the following links for more details.
- [Configuring settings in the Service Context](/core_modules/supporting_modules/service_context.md)
- [Parsing Documents into Nodes](/core_modules/data_modules/node_parsers/root.md)
@@ -0,0 +1 @@
.. mdinclude:: ../../CHANGELOG.md
@@ -0,0 +1 @@
.. mdinclude:: ../../CONTRIBUTING.md
@@ -0,0 +1 @@
.. mdinclude:: ../DOCS_README.md
@@ -0,0 +1,8 @@
# Privacy and Security
By default, LlamaIndex sends your data to OpenAI for generating embeddings and natural language responses. However, this can be configured according to your preferences: LlamaIndex provides the flexibility to use your own embedding model or run a large language model locally if desired.
## Data Privacy
When using LlamaIndex with OpenAI, the privacy details and handling of your data are subject to OpenAI's policies. Any custom service other than OpenAI has its own policies as well.
## Vector stores
LlamaIndex offers modules to connect with other vector stores within indexes to store embeddings. It is worth noting that each vector store has its own privacy policies and practices, and LlamaIndex does not assume responsibility for how they handle or use your data. By default, LlamaIndex also offers an option to store your embeddings locally.
@@ -0,0 +1,96 @@
# Agents
## Context
An "agent" is an automated reasoning and decision engine. It takes in a user input/query and can make internal decisions for executing
that query in order to return the correct result. The key agent components can include, but are not limited to:
- Breaking down a complex question into smaller ones
- Choosing an external Tool to use + coming up with parameters for calling the Tool
- Planning out a set of tasks
- Storing previously completed tasks in a memory module
Research developments in LLMs (e.g. [ChatGPT Plugins](https://openai.com/blog/chatgpt-plugins)), LLM research ([ReAct](https://arxiv.org/abs/2210.03629), [Toolformer](https://arxiv.org/abs/2302.04761)) and LLM tooling ([LangChain](https://python.langchain.com/en/latest/modules/agents.html), [Semantic Kernel](https://github.com/microsoft/semantic-kernel)) have popularized the concept of agents.
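The components listed above can be sketched as a toy loop. This is only an illustration under stated assumptions: the tools (`calculator`, `word_count`), the rule-based `choose_tool` function standing in for the LLM's decision, and the semicolon-based task splitting are all hypothetical and are not part of LlamaIndex or any agent framework.

```python
from typing import Callable, Dict, List


# Hypothetical tools: plain functions that take a single string argument.
def calculator(expression: str) -> str:
    a, op, b = expression.split()
    x, y = float(a), float(b)
    return str({"+": x + y, "-": x - y, "*": x * y}[op])


def word_count(text: str) -> str:
    return str(len(text.split()))


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "word_count": word_count,
}


def choose_tool(task: str) -> str:
    """Stand-in for the LLM's decision: pick a tool name for a task."""
    return "calculator" if any(op in task.split() for op in "+-*") else "word_count"


def run_agent(query: str) -> List[str]:
    memory: List[str] = []  # previously completed tasks
    # Break the complex query into smaller tasks (here: split on ';').
    tasks = [t.strip() for t in query.split(";") if t.strip()]
    for task in tasks:
        tool_name = choose_tool(task)    # choose an external tool
        output = TOOLS[tool_name](task)  # call it with the task as its parameter
        memory.append(f"{tool_name}({task!r}) -> {output}")
    return memory


print(run_agent("2 + 3; count these four words"))
```

In a real agent, an LLM would replace both the task splitting and `choose_tool`; the surrounding loop and memory list would stay structurally similar.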
## Agents + LlamaIndex
LlamaIndex provides some amazing tools to manage and interact with your data within your LLM application. And it can be a core tool that you use while building an agent-based app.
- On one hand, some components within LlamaIndex are "agent-like" - these make automated decisions to help a particular use case over your data.
- On the other hand, LlamaIndex can be used as a core Tool within another agent framework.
In general, LlamaIndex components offer more explicit, constrained behavior for more specific use cases. Agent frameworks such as ReAct (implemented in LangChain) offer agents that are less constrained and
capable of more general reasoning.
There are tradeoffs to each approach: less-capable LLMs typically do better with more constraints. Take a look at [our blog post on this](https://medium.com/llamaindex-blog/dumber-llm-agents-need-more-constraints-and-better-tools-17a524c59e12) for
more information and a detailed analysis.
### "Agent-like" Components within LlamaIndex
LlamaIndex provides core modules capable of automated reasoning for different use cases over your data. Please check out our [use cases doc](/end_to_end_tutorials/use_cases.md) for more details on high-level use cases that LlamaIndex can help fulfill.
Some of these core modules are shown below along with example tutorials (not comprehensive, please click into the guides/how-tos for more details).
**SubQuestionQueryEngine for Multi-Document Analysis**
- [Usage](queries.md#multi-document-queries)
- [Sub Question Query Engine (Intro)](/examples/query_engine/sub_question_query_engine.ipynb)
- [10Q Analysis (Uber)](/examples/usecases/10q_sub_question.ipynb)
- [10K Analysis (Uber and Lyft)](/examples/usecases/10k_sub_question.ipynb)
**Query Transformations**
- [How-To](/core_modules/query_modules/query_engine/advanced/query_transformations.md)
- [Multi-Step Query Decomposition](/examples/query_transformations/HyDEQueryTransformDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb))
**Routing**
- [Usage](queries.md#routing-over-heterogeneous-data)
- [Router Query Engine Guide](/examples/query_engine/RouterQueryEngine.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/RouterQueryEngine.ipynb))
**LLM Reranking**
- [Second Stage Processing How-To](/core_modules/query_modules/node_postprocessors/root.md)
- [LLM Reranking Guide (Great Gatsby)](/examples/node_postprocessor/LLMReranker-Gatsby.ipynb)
**Chat Engines**
- [Chat Engines How-To](/core_modules/query_modules/chat_engines/root.md)
### Using LlamaIndex as a Tool within an Agent Framework
LlamaIndex can be used as a Tool within an agent framework, including LangChain and ChatGPT. These integrations are described below.
#### LangChain
We have deep integrations with LangChain.
LlamaIndex query engines can be easily packaged as Tools to be used within a LangChain agent, and LlamaIndex can also be used as a memory module / retriever. Check out our guides/tutorials below!
**Resources**
- [LangChain integration guide](/community/integrations/using_with_langchain.md)
- [Building a Chatbot Tutorial (LangChain + LlamaIndex)](/guides/tutorials/building_a_chatbot.md)
- [OnDemandLoaderTool Tutorial](/examples/tools/OnDemandLoaderTool.ipynb)
#### ChatGPT
LlamaIndex can be used as a ChatGPT retrieval plugin (we have a TODO to develop a more general plugin as well).
**Resources**
- [LlamaIndex ChatGPT Retrieval Plugin](https://github.com/openai/chatgpt-retrieval-plugin#llamaindex)
### Native OpenAIAgent
With the [new OpenAI API](https://openai.com/blog/function-calling-and-other-api-updates) that supports function calling, it's never been easier to build your own agent!
Learn how to write your own OpenAI agent in **under 50 lines of code**, or directly use our super simple
`OpenAIAgent` implementation.
```{toctree}
---
maxdepth: 1
---
/examples/agent/openai_agent.ipynb
/examples/agent/openai_agent_with_query_engine.ipynb
/examples/agent/openai_agent_retrieval.ipynb
/examples/agent/openai_agent_query_cookbook.ipynb
/examples/agent/openai_agent_query_plan.ipynb
/examples/agent/openai_agent_context_retrieval.ipynb
```
@@ -0,0 +1,13 @@
# Full-Stack Web Application
LlamaIndex can be integrated into a downstream full-stack web application. It can be used in a backend server (such as Flask), packaged into a Docker container, and/or directly used in a framework such as Streamlit.
We provide tutorials and resources to help you get started in this area.
Relevant Resources:
- [Fullstack Application Guide](/end_to_end_tutorials/apps/fullstack_app_guide.md)
- [Fullstack Application with Delphic](/end_to_end_tutorials/apps/fullstack_with_delphic.md)
- [A Guide to Extracting Terms and Definitions](/end_to_end_tutorials/question_and_answer/terms_definitions_tutorial.md)
- [LlamaIndex Starter Pack](https://github.com/logan-markewich/llama_index_starter_pack)
@@ -0,0 +1,370 @@
# A Guide to Building a Full-Stack Web App with LlamaIndex
LlamaIndex is a python library, which means that integrating it with a full-stack web application will be a little different than what you might be used to.
This guide seeks to walk through the steps needed to create a basic API service written in python, and how this interacts with a TypeScript+React frontend.
All code examples here are available from the [llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react) in the flask_react folder.
The main technologies used in this guide are as follows:
- python3.11
- llama_index
- flask
- typescript
- react
## Flask Backend
For this guide, our backend will use a [Flask](https://flask.palletsprojects.com/en/2.2.x/) API server to communicate with our frontend code. If you prefer, you can also easily translate this to a [FastAPI](https://fastapi.tiangolo.com/) server, or any other python server library of your choice.
Setting up a server using Flask is easy. You import the package, create the app object, and then create your endpoints. Let's create a basic skeleton for the server first:
```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello World!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5601)
```
_flask_demo.py_
If you run this file (`python flask_demo.py`), it will launch a server on port 5601. If you visit `http://localhost:5601/`, you will see the "Hello World!" text rendered in your browser. Nice!
The next step is deciding what functions we want to include in our server, and to start using LlamaIndex.
To keep things simple, the most basic operation we can provide is querying an existing index. Using the [paul graham essay](https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt) from LlamaIndex, create a documents folder and download+place the essay text file inside of it.
### Basic Flask - Handling User Index Queries
Now, let's write some code to initialize our index:
```python
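import os
from llama_index import SimpleDirectoryReader, VectorStoreIndex, StorageContext, load_index_from_storage

# NOTE: for local testing only, do NOT deploy with your key hardcoded
os.environ['OPENAI_API_KEY'] = "your key here"

# directory where the index is persisted between runs (example path, adjust as needed)
index_dir = "./.index"

index = None

def initialize_index():
    global index
    if os.path.exists(index_dir):
        # reload the previously persisted index from disk
        storage_context = StorageContext.from_defaults(persist_dir=index_dir)
        index = load_index_from_storage(storage_context)
    else:
        # build the index from the documents folder and persist it
        storage_context = StorageContext.from_defaults()
        documents = SimpleDirectoryReader("./documents").load_data()
        index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
        storage_context.persist(index_dir)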
```
This function will initialize our index. If we call this just before starting the flask server in the `main` function, then our index will be ready for user queries!
Our query endpoint will accept `GET` requests with the query text as a parameter. Here's what the full endpoint function will look like:
```python
from flask import request

@app.route("/query", methods=["GET"])
def query_index():
    global index
    query_text = request.args.get("text", None)
    if query_text is None:
        return "No text found, please include a ?text=blah parameter in the URL", 400
    query_engine = index.as_query_engine()
    response = query_engine.query(query_text)
    return str(response), 200
```
Now, we've introduced a few new concepts to our server:
- a new `/query` endpoint, defined by the function decorator
- a new import from flask, `request`, which is used to get parameters from the request
- if the `text` parameter is missing, then we return an error message and an appropriate HTTP response code
- otherwise, we query the index, and return the response as a string
A full query example that you can test in your browser might look something like this: `http://localhost:5601/query?text=what did the author do growing up` (once you press enter, the browser will convert the spaces into "%20" characters).
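If you build the query URL from a script rather than a browser, the encoding step has to be done explicitly. The following is a small sketch using only the Python standard library; the helper name `build_query_url` is an assumption for illustration, and the base URL simply mirrors the local setup used in this guide.

```python
from urllib.parse import quote, urlencode

# Host and port follow the local Flask server from this guide.
BASE_URL = "http://localhost:5601/query"


def build_query_url(query_text: str) -> str:
    """Return the /query URL with the `text` parameter percent-encoded."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    return f"{BASE_URL}?{urlencode({'text': query_text}, quote_via=quote)}"


url = build_query_url("what did the author do growing up")
print(url)
# http://localhost:5601/query?text=what%20did%20the%20author%20do%20growing%20up
```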
Things are looking pretty good! We now have a functional API. Using your own documents, you can easily provide an interface for any application to call the flask API and get answers to queries.
### Advanced Flask - Handling User Document Uploads
Things are looking pretty cool, but how can we take this a step further? What if we want to allow users to build their own indexes by uploading their own documents? Have no fear, Flask can handle it all :muscle:.
To let users upload documents, we have to take some extra precautions. Instead of querying an existing index, the index will become **mutable**. If you have many users adding to the same index, we need to think about how to handle concurrency. Our Flask server is threaded, which means multiple users can ping the server with requests which will be handled at the same time.
One option might be to create an index for each user or group, and store and fetch things from S3. But for this example, we will assume there is one locally stored index that users are interacting with.
To handle concurrent uploads and ensure sequential inserts into the index, we can use `BaseManager` from Python's built-in `multiprocessing.managers` module to provide sequential access to the index using a separate server and locks. This sounds scary, but it's not so bad! We will just move all our index operations (initializing, querying, inserting) into the `BaseManager` "index_server", which will be called from our Flask server.
Here's a basic example of what our `index_server.py` will look like after we've moved our code:
```python
import os
from multiprocessing import Lock
from multiprocessing.managers import BaseManager
from llama_index import SimpleDirectoryReader, VectorStoreIndex, Document

# NOTE: for local testing only, do NOT deploy with your key hardcoded
os.environ['OPENAI_API_KEY'] = "your key here"

index = None
lock = Lock()

def initialize_index():
    global index
    with lock:
        # same as before ...
        ...

def query_index(query_text):
    global index
    query_engine = index.as_query_engine()
    response = query_engine.query(query_text)
    return str(response)

if __name__ == "__main__":
    # init the global index
    print("initializing index...")
    initialize_index()

    # setup server
    # NOTE: you might want to handle the password in a less hardcoded way
    manager = BaseManager(('', 5602), b'password')
    manager.register('query_index', query_index)
    server = manager.get_server()

    print("starting server...")
    server.serve_forever()
```
_index_server.py_
So, we've moved our functions, introduced the `Lock` object which ensures sequential access to the global index, registered our single function in the server, and started the server on port 5602 with the password `password`.
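The effect of the lock can be sketched in isolation. Note the swap: this example uses `threading.Lock` and a plain list as a stand-in for the index so that it runs in a single process, whereas the guide's server uses `multiprocessing.Lock`. The names `index_store` and `insert_into_store` are hypothetical.

```python
import threading

index_store = []         # stand-in for the global index
lock = threading.Lock()  # plays the role of the Lock in index_server.py


def insert_into_store(doc_id: str) -> None:
    # Only one thread at a time may run this block, so the read-then-write
    # below is never interleaved with another insert.
    with lock:
        position = len(index_store)
        index_store.append((position, doc_id))


threads = [
    threading.Thread(target=insert_into_store, args=(f"doc-{i}",))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every insert saw a consistent length, so positions are exactly 0..49.
positions = [pos for pos, _ in index_store]
print(positions == list(range(50)))  # True
```

The same idea applies to the index server: each operation that mutates the shared index acquires the lock first, so inserts are applied one at a time.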
Then, we can adjust our flask code as follows:
```python
from multiprocessing.managers import BaseManager
from flask import Flask, request

app = Flask(__name__)

# initialize manager connection
# NOTE: you might want to handle the password in a less hardcoded way
manager = BaseManager(('', 5602), b'password')
manager.register('query_index')
manager.connect()

@app.route("/query", methods=["GET"])
def query_index():
    global manager
    query_text = request.args.get("text", None)
    if query_text is None:
        return "No text found, please include a ?text=blah parameter in the URL", 400
    # the call goes through the manager proxy; _getvalue() resolves the result
    response = manager.query_index(query_text)._getvalue()
    return str(response), 200

@app.route("/")
def home():
    return "Hello World!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5601)
```
_flask_demo.py_
The two main changes are connecting to our existing `BaseManager` server and registering the functions, as well as calling the function through the manager in the `/query` endpoint.
One special thing to note is that `BaseManager` servers don't return objects quite as we expect. To resolve the return value into its original object, we call the `_getvalue()` function.
If we allow users to upload their own documents, we should probably remove the Paul Graham essay from the documents folder, so let's do that first. Then, let's add an endpoint to upload files! First, let's define our Flask endpoint function:
```python
import os
from werkzeug.utils import secure_filename
...
manager.register('insert_into_index')
...

@app.route("/uploadFile", methods=["POST"])
def upload_file():
    global manager
    if 'file' not in request.files:
        return "Please send a POST request with a file", 400

    filepath = None
    try:
        uploaded_file = request.files["file"]
        # sanitize the client-supplied filename before using it in a path
        filename = secure_filename(uploaded_file.filename)
        filepath = os.path.join('documents', os.path.basename(filename))
        uploaded_file.save(filepath)

        if request.form.get("filename_as_doc_id", None) is not None:
            manager.insert_into_index(filepath, doc_id=filename)
        else:
            manager.insert_into_index(filepath)
    except Exception as e:
        # cleanup temp file
        if filepath is not None and os.path.exists(filepath):
            os.remove(filepath)
        return "Error: {}".format(str(e)), 500

    # cleanup temp file
    if filepath is not None and os.path.exists(filepath):
        os.remove(filepath)

    return "File inserted!", 200
```
Not too bad! You will notice that we write the file to disk. We could skip this if we only accept basic file formats like `txt` files, but written to disk we can take advantage of LlamaIndex's `SimpleDirectoryReader` to take care of a bunch of more complex file formats. Optionally, we also use a second `POST` argument to either use the filename as a doc_id or let LlamaIndex generate one for us. This will make more sense once we implement the frontend.
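The write-to-disk, insert, then clean-up flow can also be expressed with `try`/`finally`, which avoids repeating the cleanup code. Below is a minimal standard-library sketch of that pattern; `process_upload` and `fake_insert` are hypothetical names, and `fake_insert` merely stands in for the call to `manager.insert_into_index`.

```python
import os
import tempfile


def process_upload(data: bytes, filename: str, handler) -> str:
    """Write uploaded bytes to disk, pass the path to `handler`, then always clean up."""
    # basename() drops any directory components supplied by the client.
    safe_name = os.path.basename(filename) or "upload"
    tmp_dir = tempfile.mkdtemp()
    filepath = os.path.join(tmp_dir, safe_name)
    try:
        with open(filepath, "wb") as f:
            f.write(data)
        return handler(filepath)
    finally:
        # Runs on success and on error, so the cleanup is written only once.
        if os.path.exists(filepath):
            os.remove(filepath)
        os.rmdir(tmp_dir)


seen = {}


def fake_insert(path: str) -> str:
    # Stand-in for manager.insert_into_index; records what it was given.
    seen["path"] = path
    with open(path, "rb") as f:
        seen["size"] = len(f.read())
    return "File inserted!"


print(process_upload(b"hello", "../notes.txt", fake_insert))  # File inserted!
```

Whether the handler succeeds or raises, the temporary file is gone by the time `process_upload` returns or propagates the error.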
With these more complicated requests, I also suggest using a tool like [Postman](https://www.postman.com/downloads/?utm_source=postman-home). Examples of using Postman to test our endpoints are in the [repository for this project](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react/postman_examples).
Lastly, you'll notice we added a new function to the manager. Let's implement that inside `index_server.py`:
```python
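def insert_into_index(doc_file_path, doc_id=None):
    global index
    # load the uploaded file from disk; the reader handles the file format
    document = SimpleDirectoryReader(input_files=[doc_file_path]).load_data()[0]
    if doc_id is not None:
        document.doc_id = doc_id

    with lock:
        index.insert(document)
        # persist to the same directory the index is loaded from
        index.storage_context.persist(persist_dir=index_dir)

...
manager.register('insert_into_index', insert_into_index)
...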
```
Easy! If we launch both the `index_server.py` and then the `flask_demo.py` python files, we have a Flask API server that can handle multiple requests to insert documents into a vector index and respond to user queries!
To support some functionality in the frontend, I've adjusted what some responses look like from the Flask API, as well as added some functionality to keep track of which documents are stored in the index (LlamaIndex doesn't currently support this in a user-friendly way, but we can augment it ourselves!). Lastly, I had to add CORS support to the server using the `Flask-cors` python package.
Check out the complete `flask_demo.py` and `index_server.py` scripts in the [repository](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react) for the final minor changes, the `requirements.txt` file, and a sample `Dockerfile` to help with deployment.
## React Frontend
React and TypeScript are among the most popular libraries and languages for writing webapps today. This guide will assume you are familiar with how these tools work, because otherwise this guide will triple in length :smile:.
In the [repository](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react), the frontend code is organized inside of the `react_frontend` folder.
The most relevant part of the frontend will be the `src/apis` folder. This is where we make calls to the Flask server, supporting the following queries:
- `/query` -- make a query to the existing index
- `/uploadFile` -- upload a file to the flask server for insertion into the index
- `/getDocuments` -- list the current document titles and a portion of their texts
Using these three queries, we can build a robust frontend that allows users to upload and keep track of their files, query the index, and view the query response and information about which text nodes were used to form the response.
### fetchDocuments.tsx
This file contains the function to, you guessed it, fetch the list of current documents in the index. The code is as follows:
```typescript
export type Document = {
  id: string;
  text: string;
};

const fetchDocuments = async (): Promise<Document[]> => {
  const response = await fetch("http://localhost:5601/getDocuments", {
    mode: "cors",
  });

  if (!response.ok) {
    return [];
  }

  const documentList = (await response.json()) as Document[];
  return documentList;
};
```
As you can see, we make a query to the Flask server (here, it assumes running on localhost). Notice that we need to include the `mode: 'cors'` option, as we are making an external request.
Then, we check if the response was ok, and if so, get the response json and return it. Here, the response json is a list of `Document` objects that are defined in the same file.
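On the server side, the JSON this function expects is a list of objects with `id` and `text` fields. The following is a hedged sketch of how such a payload could be assembled in Python; `build_document_list`, the `stored_docs` mapping, and the `preview_chars` parameter are assumptions made for illustration and are not taken from the repository's actual `/getDocuments` implementation.

```python
import json
from typing import Dict, List


def build_document_list(stored_docs: Dict[str, str], preview_chars: int = 200) -> List[Dict[str, str]]:
    """Return one {"id", "text"} entry per stored document, with a truncated text preview."""
    return [
        {"id": doc_id, "text": text[:preview_chars]}
        for doc_id, text in stored_docs.items()
    ]


# Hypothetical tracker contents: document id -> full text.
stored = {
    "pangram.txt": "The quick brown fox jumps over the lazy dog",
    "notes.md": "short",
}
documents = build_document_list(stored, preview_chars=9)
payload = json.dumps(documents)  # what the endpoint would send back
print(payload)
```

Each entry matches the `Document` type defined in `fetchDocuments.tsx`, so the frontend can cast the parsed JSON directly.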
### queryIndex.tsx
This file sends the user query to the flask server, and gets the response back, as well as details about which nodes in our index provided the response.
```typescript
export type ResponseSources = {
  text: string;
  doc_id: string;
  start: number;
  end: number;
  similarity: number;
};

export type QueryResponse = {
  text: string;
  sources: ResponseSources[];
};

const queryIndex = async (query: string): Promise<QueryResponse> => {
  // start from the bare endpoint so that `text` appears only once in the URL
  const queryURL = new URL("http://localhost:5601/query");
  queryURL.searchParams.append("text", query);

  const response = await fetch(queryURL, { mode: "cors" });
  if (!response.ok) {
    return { text: "Error in query", sources: [] };
  }

  const queryResponse = (await response.json()) as QueryResponse;

  return queryResponse;
};

export default queryIndex;
```
This is similar to the `fetchDocuments.tsx` file, with the main difference being we include the query text as a parameter in the URL. Then, we check if the response is ok and return it with the appropriate typescript type.
### insertDocument.tsx
Probably the most complex API call is uploading a document. The function here accepts a file object and constructs a `POST` request using `FormData`.
The actual response text is not used in the app but could be utilized to provide some user feedback on if the file failed to upload or not.
```typescript
const insertDocument = async (file: File) => {
  const formData = new FormData();
  formData.append("file", file);
  formData.append("filename_as_doc_id", "true");

  const response = await fetch("http://localhost:5601/uploadFile", {
    mode: "cors",
    method: "POST",
    body: formData,
  });

  // wait for the body to be read so callers receive the text itself
  const responseText = await response.text();
  return responseText;
};

export default insertDocument;
```
### All the Other Frontend Good-ness
And that pretty much wraps up the frontend portion! The rest of the react frontend code is some pretty basic react components, and my best attempt to make it look at least a little nice :smile:.
I encourage you to read the rest of the [codebase](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react/react_frontend) and submit any PRs for improvements!
## Conclusion
This guide has covered a ton of information. We went from a basic "Hello World" Flask server written in python, to a fully functioning LlamaIndex powered backend and how to connect that to a frontend application.
As you can see, we can easily augment and wrap the services provided by LlamaIndex (like the little external document tracker) to help provide a good user experience on the frontend.
You could take this and add many features (multi-index/user support, saving objects into S3, adding a Pinecone vector server, etc.). And when you build an app after reading this, be sure to share the final result in the Discord! Good Luck! :muscle:
@@ -0,0 +1,785 @@
# A Guide to Building a Full-Stack LlamaIndex Web App with Delphic
This guide seeks to walk you through using LlamaIndex with a production-ready web app starter template
called [Delphic](https://github.com/JSv4/Delphic). All code examples here are available from
the [Delphic](https://github.com/JSv4/Delphic) repo.
## What We're Building
Here's a quick demo of the out-of-the-box functionality of Delphic:
https://user-images.githubusercontent.com/5049984/233236432-aa4980b6-a510-42f3-887a-81485c9644e6.mp4
## Architectural Overview
Delphic leverages the LlamaIndex python library to let users create their own document collections they can then
query in a responsive frontend.
We chose a stack that provides a responsive, robust mix of technologies that can (1) orchestrate complex python
processing tasks while providing (2) a modern, responsive frontend and (3) a secure backend to build additional
functionality upon.
The core libraries are:
1. [Django](https://www.djangoproject.com/)
2. [Django Channels](https://channels.readthedocs.io/en/stable/)
3. [Django Ninja](https://django-ninja.rest-framework.com/)
4. [Redis](https://redis.io/)
5. [Celery](https://docs.celeryq.dev/en/stable/getting-started/introduction.html)
6. [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/)
7. [Langchain](https://python.langchain.com/en/latest/index.html)
8. [React](https://github.com/facebook/react)
9. Docker & Docker Compose
Thanks to this modern stack built on the super stable Django web framework, the starter Delphic app boasts a streamlined
developer experience, built-in authentication and user management, asynchronous vector store processing, and
web-socket-based query connections for a responsive UI. In addition, our frontend is built with TypeScript and is based
on MUI React for a responsive and modern user interface.
## System Requirements
Celery doesn't work on Windows. It may be deployable with Windows Subsystem for Linux, but configuring that is beyond
the scope of this tutorial. For this reason, we recommend you only follow this tutorial if you're running Linux or OSX.
You will need Docker and Docker Compose installed to deploy the application. Local development will require node version
manager (nvm).
## Django Backend
### Project Directory Overview
The Delphic application has a structured backend directory organization that follows common Django project conventions.
From the repo root, in the `./delphic` subfolder, the main folders are:
1. `contrib`: This directory contains custom modifications or additions to Django's built-in `contrib` apps.
2. `indexes`: This directory contains the core functionality related to document indexing and LLM integration. It
includes:
- `admin.py`: Django admin configuration for the app
- `apps.py`: Application configuration
- `models.py`: Contains the app's database models
- `migrations`: Directory containing database schema migrations for the app
- `signals.py`: Defines any signals for the app
- `tests.py`: Unit tests for the app
3. `tasks`: This directory contains tasks for asynchronous processing using Celery. The `index_tasks.py` file includes
the tasks for creating vector indexes.
4. `users`: This directory is dedicated to user management.
5. `utils`: This directory contains utility modules and functions that are used across the application, such as custom
storage backends, path helpers, and collection-related utilities.
### Database Models
The Delphic application has two core models: `Document` and `Collection`. These models represent the central entities
the application deals with when indexing and querying documents using LLMs. They're defined in
[`./delphic/indexes/models.py`](https://github.com/JSv4/Delphic/blob/main/delphic/indexes/models.py).
1. `Collection`:
- `api_key`: A foreign key that links a collection to an API key. This helps associate jobs with the source API key.
- `title`: A character field that provides a title for the collection.
- `description`: A text field that provides a description of the collection.
- `status`: A character field that stores the processing status of the collection, utilizing the `CollectionStatus`
enumeration.
- `created`: A datetime field that records when the collection was created.
- `modified`: A datetime field that records the last modification time of the collection.
- `model`: A file field that stores the model associated with the collection.
- `processing`: A boolean field that indicates if the collection is currently being processed.
2. `Document`:
- `collection`: A foreign key that links a document to a collection. This represents the relationship between documents
and collections.
- `file`: A file field that stores the uploaded document file.
- `description`: A text field that provides a description of the document.
- `created`: A datetime field that records when the document was created.
- `modified`: A datetime field that records the last modification time of the document.
These models provide a solid foundation for collections of documents and the indexes created from them with LlamaIndex.
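To make the shape of these two entities easier to picture, here is a plain-Python sketch using dataclasses. It is not the Delphic source (which uses Django models), it omits some of the fields listed above, and the `CollectionStatus` members other than `QUEUED` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class CollectionStatus(Enum):
    # QUEUED appears in the endpoint code; the other members are assumed.
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"


@dataclass
class Collection:
    title: str
    description: str
    status: CollectionStatus = CollectionStatus.QUEUED
    api_key: Optional[str] = None  # a foreign key to an API key in the real model
    processing: bool = False
    created: datetime = field(default_factory=datetime.now)


@dataclass
class Document:
    collection: Collection  # a foreign key in the real model
    file: str               # path to the uploaded file
    description: str = ""
    created: datetime = field(default_factory=datetime.now)


reports = Collection(title="Reports", description="Quarterly filings")
docs: List[Document] = [
    Document(collection=reports, file="q1.pdf"),
    Document(collection=reports, file="q2.pdf"),
]

# Equivalent of following the foreign key in reverse: all documents of a collection.
files_in_reports = [d.file for d in docs if d.collection is reports]
print(files_in_reports)  # ['q1.pdf', 'q2.pdf']
```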
### Django Ninja API
Django Ninja is a web framework for building APIs with Django and Python 3.7+ type hints. It provides a simple,
intuitive, and expressive way of defining API endpoints, leveraging Python's type hints to automatically generate input
validation, serialization, and documentation.
In the Delphic repo,
the [`./config/api/endpoints.py`](https://github.com/JSv4/Delphic/blob/main/config/api/endpoints.py)
file contains the API routes and logic for the API endpoints. Now, let's briefly address the purpose of each endpoint
in the `endpoints.py` file:
1. `/heartbeat`: A simple GET endpoint to check if the API is up and running. Returns `True` if the API is accessible.
This is helpful for Kubernetes setups that expect to be able to query your container to ensure it's up and running.
2. `/collections/create`: A POST endpoint to create a new `Collection`. Accepts form parameters such
as `title`, `description`, and a list of `files`. Creates a new `Collection` and `Document` instances for each file,
and schedules a Celery task to create an index.
```python
@collections_router.post("/create")
async def create_collection(
    request,
    title: str = Form(...),
    description: str = Form(...),
    files: list[UploadedFile] = File(...),
):
    key = None if getattr(request, "auth", None) is None else request.auth
    if key is not None:
        key = await key

    collection_instance = Collection(
        api_key=key,
        title=title,
        description=description,
        status=CollectionStatusEnum.QUEUED,
    )
    await sync_to_async(collection_instance.save)()

    for uploaded_file in files:
        doc_data = uploaded_file.file.read()
        doc_file = ContentFile(doc_data, uploaded_file.name)
        document = Document(collection=collection_instance, file=doc_file)
        await sync_to_async(document.save)()

    create_index.si(collection_instance.id).apply_async()

    return await sync_to_async(CollectionModelSchema)(
        ...
    )
```
3. `/collections/query`: A POST endpoint to query a document collection using the LLM. Accepts a JSON payload
containing `collection_id` and `query_str`, and returns a response generated by querying the collection. We don't
actually use this endpoint in our chat GUI (we use a WebSocket; see below), but you could build an app that integrates
with this REST endpoint to query a specific collection.
```python
@collections_router.post(
    "/query",
    response=CollectionQueryOutput,
    summary="Ask a question of a document collection",
)
def query_collection_view(request: HttpRequest, query_input: CollectionQueryInput):
    collection_id = query_input.collection_id
    query_str = query_input.query_str
    response = query_collection(collection_id, query_str)
    return {"response": response}
```
4. `/collections/available`: A GET endpoint that returns a list of all collections created with the user's API key. The
output is serialized using the `CollectionModelSchema`.
```python
@collections_router.get(
    "/available",
    response=list[CollectionModelSchema],
    summary="Get a list of all of the collections created with my api_key",
)
async def get_my_collections_view(request: HttpRequest):
    key = None if getattr(request, "auth", None) is None else request.auth
    if key is not None:
        key = await key

    collections = Collection.objects.filter(api_key=key)

    return [
        {
            ...
        }
        async for collection in collections
    ]
```
5. `/collections/{collection_id}/add_file`: A POST endpoint to add a file to an existing collection. Accepts
a `collection_id` path parameter, and form parameters such as `file` and `description`. Adds the file as a `Document`
instance associated with the specified collection.
```python
@collections_router.post("/{collection_id}/add_file", summary="Add a file to a collection")
async def add_file_to_collection(
    request,
    collection_id: int,
    file: UploadedFile = File(...),
    description: str = Form(...),
):
    collection = await sync_to_async(Collection.objects.get)(id=collection_id)
```
### Intro to Websockets
WebSockets are a communication protocol that enables bidirectional and full-duplex communication between a client and a
server over a single, long-lived connection. The WebSocket protocol is designed to work over the same ports as HTTP and
HTTPS (ports 80 and 443, respectively) and uses a similar handshake process to establish a connection. Once the
connection is established, data can be sent in both directions as “frames” without the need to reestablish the
connection each time, unlike traditional HTTP requests.
There are several reasons to use WebSockets, particularly when working with code that takes a long time to load into
memory but is quick to run once loaded:
1. **Performance**: WebSockets eliminate the overhead associated with opening and closing multiple connections for each
request, reducing latency.
2. **Efficiency**: WebSockets allow for real-time communication without the need for polling, resulting in more
efficient use of resources and better responsiveness.
3. **Scalability**: WebSockets can handle a large number of simultaneous connections, making it ideal for applications
that require high concurrency.
In the case of the Delphic application, using WebSockets makes sense as the LLMs can be expensive to load into memory.
By establishing a WebSocket connection, the LLM can remain loaded in memory, allowing subsequent requests to be
processed quickly without the need to reload the model each time.
The ASGI configuration file [`./config/asgi.py`](https://github.com/JSv4/Delphic/blob/main/config/asgi.py) defines how
the application should handle incoming connections, using the Django Channels `ProtocolTypeRouter` to route connections
based on their protocol type. In this case, we have two protocol types: "http" and "websocket".
The “http” protocol type uses the standard Django ASGI application to handle HTTP requests, while the “websocket”
protocol type uses a custom `TokenAuthMiddleware` to authenticate WebSocket connections. The `URLRouter` within
the `TokenAuthMiddleware` defines a URL pattern for the `CollectionQueryConsumer`, which is responsible for handling
WebSocket connections related to querying document collections.
```python
application = ProtocolTypeRouter(
    {
        "http": get_asgi_application(),
        "websocket": TokenAuthMiddleware(
            URLRouter(
                [
                    re_path(
                        r"ws/collections/(?P<collection_id>\w+)/query/$",
                        CollectionQueryConsumer.as_asgi(),
                    ),
                ]
            )
        ),
    }
)
```
This configuration allows clients to establish WebSocket connections with the Delphic application to efficiently query
document collections using the LLMs, without the need to reload the models for each request.
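To see what the route pattern above accepts, here is a small sketch using only Python's standard `re` module. The `match_collection_id` helper is hypothetical (it is not part of Delphic or Django Channels), and it assumes the router matches against the path with its leading slash removed.

```python
import re
from typing import Optional

# The same regular expression used in the re_path() route above.
ROUTE = re.compile(r"ws/collections/(?P<collection_id>\w+)/query/$")


def match_collection_id(path: str) -> Optional[str]:
    """Return the collection_id captured from a WebSocket path, or None if the path does not match."""
    match = ROUTE.search(path.lstrip("/"))
    return match.group("collection_id") if match else None


print(match_collection_id("/ws/collections/42/query/"))
print(match_collection_id("/ws/collections/42/"))
```

Because the capture group is `\w+`, the ID may contain letters, digits, and underscores, but not hyphens or slashes.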
### Websocket Handler
The `CollectionQueryConsumer` class
in [`config/api/websockets/queries.py`](https://github.com/JSv4/Delphic/blob/main/config/api/websockets/queries.py) is
responsible for handling WebSocket connections related to querying document collections. It inherits from
the `AsyncWebsocketConsumer` class provided by Django Channels.
The `CollectionQueryConsumer` class has three main methods:
1. `connect`: Called when a WebSocket is handshaking as part of the connection process.
2. `disconnect`: Called when a WebSocket closes for any reason.
3. `receive`: Called when the server receives a message from the WebSocket.
#### Websocket connect listener
The `connect` method is responsible for establishing the connection, extracting the collection ID from the connection
path, loading the collection model, and accepting the connection.
```python
async def connect(self):
    try:
        self.collection_id = extract_connection_id(self.scope["path"])
        self.index = await load_collection_model(self.collection_id)
        await self.accept()
    except ValueError as e:
        await self.accept()
        await self.close(code=4000)
    except Exception as e:
        pass
```
#### Websocket disconnect listener
The `disconnect` method is empty in this case, as there are no additional actions to be taken when the WebSocket is
closed.
#### Websocket receive listener
The `receive` method is responsible for processing incoming messages from the WebSocket. It takes the incoming message,
decodes it, and then queries the loaded collection model using the provided query. The response is then formatted as a
markdown string and sent back to the client over the WebSocket connection.
```python
async def receive(self, text_data):
    text_data_json = json.loads(text_data)

    if self.index is not None:
        query_str = text_data_json["query"]
        modified_query_str = f"Please return a nicely formatted markdown string to this request:\n\n{query_str}"
        query_engine = self.index.as_query_engine()
        response = query_engine.query(modified_query_str)

        markdown_response = f"## Response\n\n{response}\n\n"
        if response.source_nodes:
            markdown_sources = f"## Sources\n\n{response.get_formatted_sources()}"
        else:
            markdown_sources = ""

        formatted_response = f"{markdown_response}{markdown_sources}"

        await self.send(json.dumps({"response": formatted_response}, indent=4))
    else:
        await self.send(json.dumps({"error": "No index loaded for this connection."}, indent=4))
```
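The message format produced by `receive` can be reproduced in isolation. The sketch below uses only the standard library; `build_reply` and `build_error` are hypothetical helper names introduced for illustration, and the answer and sources are passed in as plain strings rather than coming from a query engine.

```python
import json
from typing import Optional


def build_reply(answer: str, formatted_sources: Optional[str] = None) -> str:
    """Build the JSON text sent to the client for a successful query."""
    markdown_response = f"## Response\n\n{answer}\n\n"
    # The sources section is only added when there is something to show.
    markdown_sources = f"## Sources\n\n{formatted_sources}" if formatted_sources else ""
    return json.dumps({"response": f"{markdown_response}{markdown_sources}"}, indent=4)


def build_error(message: str) -> str:
    """Build the JSON text sent to the client when no index is loaded."""
    return json.dumps({"error": message}, indent=4)


print(build_reply("Hello", "> Source (Doc id: 1): ..."))
print(build_error("No index loaded for this connection."))
```

The client can tell the two cases apart by checking whether the parsed payload has a `response` key or an `error` key.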
To load the collection model, the `load_collection_model` function is used, which can be found
in [`delphic/utils/collections.py`](https://github.com/JSv4/Delphic/blob/main/delphic/utils/collections.py). This
function retrieves the collection object with the given collection ID, checks if a JSON file for the collection model
exists, and if not, creates one. Then, it sets up the `LLMPredictor` and `ServiceContext` before loading
the `VectorStoreIndex` using the cache file.
```python
async def load_collection_model(collection_id: str | int) -> VectorStoreIndex:
    """
    Load the Collection model from cache or the database, and return the index.

    Args:
        collection_id (Union[str, int]): The ID of the Collection model instance.

    Returns:
        VectorStoreIndex: The loaded index.

    This function performs the following steps:
    1. Retrieve the Collection object with the given collection_id.
    2. Check if a JSON file with the name '/cache/model_{collection_id}.json' exists.
    3. If the JSON file doesn't exist, load the JSON from the Collection.model FileField and save it to
       '/cache/model_{collection_id}.json'.
    4. Call VectorStoreIndex.load_from_disk with the cache_file_path.
    """
    # Retrieve the Collection object
    collection = await Collection.objects.aget(id=collection_id)
    logger.info(f"load_collection_model() - loaded collection {collection_id}")

    # Make sure there's a model
    if collection.model.name:
        logger.info("load_collection_model() - Setup local json index file")

        # Check if the JSON file exists
        cache_dir = Path(settings.BASE_DIR) / "cache"
        cache_file_path = cache_dir / f"model_{collection_id}.json"
        if not cache_file_path.exists():
            cache_dir.mkdir(parents=True, exist_ok=True)
            with collection.model.open("rb") as model_file:
                with cache_file_path.open("w+", encoding="utf-8") as cache_file:
                    cache_file.write(model_file.read().decode("utf-8"))

        # define LLM
        logger.info(
            f"load_collection_model() - Setup service context with tokens {settings.MAX_TOKENS} and "
            f"model {settings.MODEL_NAME}"
        )
        llm_predictor = LLMPredictor(
            llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
        )
        service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

        # Call VectorStoreIndex.load_from_disk
        logger.info("load_collection_model() - Load llama index")
        index = VectorStoreIndex.load_from_disk(
            cache_file_path, service_context=service_context
        )
        logger.info(
            "load_collection_model() - Llamaindex loaded and ready for query..."
        )
    else:
        logger.error(
            f"load_collection_model() - collection {collection_id} has no model!"
        )
        raise ValueError("No model exists for this collection!")

    return index
```
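The caching step in `load_collection_model` can be tried on its own. The following is a minimal sketch using only the standard library; `ensure_cache_file` is a hypothetical helper, and the model content is passed in as bytes instead of being read from a Django file field.

```python
import tempfile
from pathlib import Path


def ensure_cache_file(cache_dir: Path, collection_id, model_bytes: bytes) -> Path:
    """Write the serialized index to the cache directory unless a cached copy already exists."""
    cache_file_path = cache_dir / f"model_{collection_id}.json"
    if not cache_file_path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file_path.write_text(model_bytes.decode("utf-8"), encoding="utf-8")
    return cache_file_path


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "cache"
    first = ensure_cache_file(cache_dir, 7, b'{"index": "v1"}')
    # The second call finds the existing file and leaves it untouched.
    second = ensure_cache_file(cache_dir, 7, b'{"index": "v2"}')
    cached_text = second.read_text(encoding="utf-8")
```

One consequence of this write-once pattern is that a cached file is never refreshed: if a collection's model changes, the stale cache file has to be removed before the new model is picked up.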
## React Frontend
### Overview
We chose to use TypeScript, React and Material-UI (MUI) for the Delphic project's frontend for a couple of reasons. First,
as the most popular component library (MUI) for the most popular frontend framework (React), this choice makes this
project accessible to a huge community of developers. Second, React is, at this point, a stable and generally well-liked
framework that delivers valuable abstractions in the form of its virtual DOM while still being, in our opinion, pretty
easy to learn, again making it accessible.
### Frontend Project Structure
The frontend can be found in the [`/frontend`](https://github.com/JSv4/Delphic/tree/main/frontend) directory of the
repo, with the React-related components being in `/frontend/src`. You'll notice there is a Dockerfile in the `frontend`
directory and several folders and files related to configuring our frontend web
server, [nginx](https://www.nginx.com/).
The `/frontend/src/App.tsx` file serves as the entry point of the application. It defines the main components, such as
the login form, the drawer layout, and the collection create modal. The main components are conditionally rendered based
on whether the user is logged in and has an authentication token.
The `DrawerLayout2` component is defined in the `DrawerLayout2.tsx` file. This component manages the layout of the
application and provides the navigation and main content areas.
Since the application is relatively simple, we can get away with not using a complex state management solution like
Redux and just use React's `useState` hooks.
### Grabbing Collections from the Backend
The collections available to the logged-in user are retrieved and displayed in the DrawerLayout2 component. The process
can be broken down into the following steps:
1. Initializing state variables:
```tsx
const [collections, setCollections] = useState<CollectionModelSchema[]>([]);
const [loading, setLoading] = useState(true);
```
Here, we initialize two state variables: `collections` to store the list of collections and `loading` to track whether
the collections are being fetched.
2. Collections are fetched for the logged-in user with the `fetchCollections()` function:
```tsx
const fetchCollections = async () => {
  try {
    const accessToken = localStorage.getItem("accessToken");
    if (accessToken) {
      const response = await getMyCollections(accessToken);
      setCollections(response.data);
    }
  } catch (error) {
    console.error(error);
  } finally {
    setLoading(false);
  }
};
```
The `fetchCollections` function retrieves the collections for the logged-in user by calling the `getMyCollections` API
function with the user's access token. It then updates the `collections` state with the retrieved data and sets
the `loading` state to `false` to indicate that fetching is complete.
### Displaying Collections
The latest collections are displayed in the drawer like this:
```tsx
<List>
  {collections.map((collection) => (
    <div key={collection.id}>
      <ListItem disablePadding>
        <ListItemButton
          disabled={
            collection.status !== CollectionStatus.COMPLETE ||
            !collection.has_model
          }
          onClick={() => handleCollectionClick(collection)}
          selected={
            selectedCollection &&
            selectedCollection.id === collection.id
          }
        >
          <ListItemText primary={collection.title} />
          {collection.status === CollectionStatus.RUNNING ? (
            <CircularProgress
              size={24}
              style={{ position: "absolute", right: 16 }}
            />
          ) : null}
        </ListItemButton>
      </ListItem>
    </div>
  ))}
</List>
```
You'll notice that the `disabled` property of a collection's `ListItemButton` is set based on whether the collection's
status is not `CollectionStatus.COMPLETE` or the collection does not have a model (`!collection.has_model`). If either
of these conditions is true, the button is disabled, preventing users from selecting an incomplete or model-less
collection. When the status is `CollectionStatus.RUNNING`, we also show a loading wheel over the button.
In a separate `useEffect` hook, we check if any collection in the `collections` state has a status
of `CollectionStatus.RUNNING` or `CollectionStatus.QUEUED`. If so, we set up an interval to repeatedly call
the `fetchCollections` function every 15 seconds (15,000 milliseconds) to update the collection statuses. This way, the
application periodically checks for completed collections, and the UI is updated accordingly when the processing is
done.
```tsx
useEffect(() => {
  let interval: NodeJS.Timeout;
  if (
    collections.some(
      (collection) =>
        collection.status === CollectionStatus.RUNNING ||
        collection.status === CollectionStatus.QUEUED
    )
  ) {
    interval = setInterval(() => {
      fetchCollections();
    }, 15000);
  }
  return () => clearInterval(interval);
}, [collections]);
```
### Chat View Component
The `ChatView` component in `frontend/src/chat/ChatView.tsx` is responsible for handling and displaying a chat interface
for a user to interact with a collection. The component establishes a WebSocket connection to communicate in real-time
with the server, sending and receiving messages.
Key features of the `ChatView` component include:
1. Establishing and managing the WebSocket connection with the server.
2. Displaying messages from the user and the server in a chat-like format.
3. Handling user input to send messages to the server.
4. Updating the messages state and UI based on received messages from the server.
5. Displaying connection status and errors, such as loading messages, connecting to the server, or encountering errors
while loading a collection.
Together, all of this allows users to interact with their selected collection with a very smooth, low-latency
experience.
#### Chat Websocket Client
The WebSocket connection in the `ChatView` component is used to establish real-time communication between the client and
the server. The WebSocket connection is set up and managed in the `ChatView` component as follows:
First, we initialize the WebSocket reference:

    const websocket = useRef<WebSocket | null>(null);

A `websocket` reference is created using `useRef`, which holds the WebSocket object that will be used for
communication. `useRef` is a hook in React that allows you to create a mutable reference object that persists across
renders. It is particularly useful when you need to hold a reference to a mutable object, such as a WebSocket
connection, without causing unnecessary re-renders.
In the `ChatView` component, the WebSocket connection needs to be established and maintained throughout the lifetime of
the component, and it should not trigger a re-render when the connection state changes. By using `useRef`, you ensure
that the WebSocket connection is kept as a reference, and the component only re-renders when there are actual state
changes, such as updating messages or displaying errors.
The `setupWebsocket` function is responsible for establishing the WebSocket connection and setting up event handlers to
handle different WebSocket events.
Overall, the `setupWebsocket` function looks like this:
```tsx
const setupWebsocket = () => {
  setConnecting(true);
  // Here, a new WebSocket object is created using the specified URL, which includes the
  // selected collection's ID and the user's authentication token.
  websocket.current = new WebSocket(
    `ws://localhost:8000/ws/collections/${selectedCollection.id}/query/?token=${authToken}`
  );

  websocket.current.onopen = (event) => {
    //...
  };

  websocket.current.onmessage = (event) => {
    //...
  };

  websocket.current.onclose = (event) => {
    //...
  };

  websocket.current.onerror = (event) => {
    //...
  };

  return () => {
    websocket.current?.close();
  };
};
```
Notice that in several places we trigger updates to the GUI based on information from the WebSocket client.
When the component first opens and we try to establish a connection, the `onopen` listener is triggered. In the
callback, the component updates the states to reflect that the connection is established, any previous errors are
cleared, and no messages are awaiting responses:
```tsx
websocket.current.onopen = (event) => {
  setError(false);
  setConnecting(false);
  setAwaitingMessage(false);
  console.log("WebSocket connected:", event);
};
```
`onmessage` is triggered when a new message is received from the server through the WebSocket connection. In the
callback, the received data is parsed and the `messages` state is updated with the new message from the server:
```tsx
websocket.current.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("WebSocket message received:", data);
  setAwaitingMessage(false);

  if (data.response) {
    // Update the messages state with the new message from the server
    setMessages((prevMessages) => [
      ...prevMessages,
      {
        sender_id: "server",
        message: data.response,
        timestamp: new Date().toLocaleTimeString(),
      },
    ]);
  }
};
```
`onclose` is triggered when the WebSocket connection is closed. In the callback, the component checks for a specific
close code (`4000`) to display a warning toast and update the component states accordingly. It also logs the close
event:
```tsx
websocket.current.onclose = (event) => {
  if (event.code === 4000) {
    toast.warning(
      "Selected collection's model is unavailable. Was it created properly?"
    );
    setError(true);
    setConnecting(false);
    setAwaitingMessage(false);
  }
  console.log("WebSocket closed:", event);
};
```
Finally, `onerror` is triggered when an error occurs with the WebSocket connection. In the callback, the component
updates the states to reflect the error and logs the error event:
```tsx
websocket.current.onerror = (event) => {
  setError(true);
  setConnecting(false);
  setAwaitingMessage(false);
  console.error("WebSocket error:", event);
};
```
#### Rendering our Chat Messages
In the `ChatView` component, the layout is determined using CSS styling and Material-UI components. The main layout
consists of a container with a `flex` display and a column-oriented `flexDirection`. This ensures that the content
within the container is arranged vertically.
There are three primary sections within the layout:
1. The chat messages area: This section takes up most of the available space and displays a list of messages exchanged
between the user and the server. It has an overflow-y set to auto, which allows scrolling when the content
overflows the available space. The messages are rendered using the `ChatMessage` component for each message and
a `ChatMessageLoading` component to show the loading state while waiting for a server response.
2. The divider: A Material-UI `Divider` component is used to separate the chat messages area from the input area,
creating a clear visual distinction between the two sections.
3. The input area: This section is located at the bottom and allows the user to type and send messages. It contains
a `TextField` component from Material-UI, which is set to accept multiline input with a maximum of 2 rows. The input
area also includes a `Button` component to send the message. The user can either click the "Send" button or press
"Enter" on their keyboard to send the message.
The user inputs accepted in the `ChatView` component are text messages that the user types in the `TextField`. The
component processes these text inputs and sends them to the server through the WebSocket connection.
## Deployment
### Prerequisites
To deploy the app, you're going to need Docker and Docker Compose installed. If you're on Ubuntu or another common
Linux distribution, DigitalOcean has
a [great Docker tutorial](https://www.digitalocean.com/community/tutorial_collections/how-to-install-and-use-docker) and
another great tutorial
for [Docker Compose](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-20-04)
you can follow. If those don't work for you, try
the [official Docker documentation](https://docs.docker.com/engine/install/).
### Build and Deploy
The project is based on django-cookiecutter, and it's pretty easy to get it deployed on a VM and configured to serve
HTTPS traffic for a specific domain. The configuration is somewhat involved, however, not because of this project, but
because configuring your certificates, DNS, etc. is a fairly involved topic.
For the purposes of this guide, let's just get running locally. Perhaps we'll release a guide on production deployment.
In the meantime, check out
the [Django Cookiecutter project docs](https://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html)
for starters.
This guide assumes your goal is to get the application up and running for use. If you want to develop, most likely you
won't want to launch the compose stack with the `--profiles fullstack` flag and will instead want to launch the React
frontend using the Node development server.
To deploy, first clone the repo:
```commandline
git clone https://github.com/JSv4/Delphic.git
```
Change into the project directory:
```commandline
cd Delphic
```
Copy the sample environment files:
```commandline
mkdir -p ./.envs/.local/
cp -a ./docs/sample_envs/local/.frontend ./frontend
cp -a ./docs/sample_envs/local/.django ./.envs/.local
cp -a ./docs/sample_envs/local/.postgres ./.envs/.local
```
Edit the `.django` and `.postgres` configuration files to include your OpenAI API key and set a unique password for your
database user. You can also set the response token limit in the `.django` file or switch which OpenAI model you want to
use. GPT-4 is supported, assuming you're authorized to access it.
Build the docker compose stack with the `--profiles fullstack` flag:
```commandline
sudo docker-compose --profiles fullstack -f local.yml build
```
The `fullstack` flag instructs compose to build a Docker container from the frontend folder, which will be launched
along with all of the needed backend containers. It takes a long time to build a production React container, however,
so we don't recommend you develop this way. Follow
the [instructions in the project readme.md](https://github.com/JSv4/Delphic#development) to set up a development
environment.
Finally, bring up the application:
```commandline
sudo docker-compose -f local.yml up
```
Now, visit `localhost:3000` in your browser to see the frontend, and use the Delphic application locally.
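If the page does not load, a quick way to check whether the containers are accepting connections is to probe the ports. The sketch below uses only Python's standard library; `is_port_open` is a hypothetical helper, and it assumes the frontend listens on port 3000 and the Django backend on port 8000, as used elsewhere in this guide.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed local ports: 3000 for the React frontend, 8000 for the Django backend.
    for name, port in {"frontend": 3000, "backend": 8000}.items():
        state = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening; it does not confirm that the application behind it has finished starting up.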
## Using the Application
### Setup Users
In order to actually use the application, you need a login (at the moment; we intend to make it possible to share
certain models with unauthenticated users). You can use either a superuser or non-superuser. In either case, someone
needs to first create a superuser using the console.
**Why set up a Django superuser?** A Django superuser has all the permissions in the application and can manage all
aspects of the system, including creating, modifying, and deleting users, collections, and other data. Setting up a
superuser allows you to fully control and manage the application.
**How to create a Django superuser:**
1. Run the following command to create a superuser:

       sudo docker-compose -f local.yml run django python manage.py createsuperuser

2. You will be prompted to provide a username, email address, and password for the superuser. Enter the required
   information.
**How to create additional users using Django admin:**
1. Start your Delphic application locally following the deployment instructions.
2. Visit the Django admin interface by navigating to `http://localhost:8000/admin` in your browser.
3. Log in with the superuser credentials you created earlier.
4. Click on “Users” under the “Authentication and Authorization” section.
5. Click on the “Add user +” button in the top right corner.
6. Enter the required information for the new user, such as username and password. Click “Save” to create the user.
7. To grant the new user additional permissions or make them a superuser, click on their username in the user list,
scroll down to the “Permissions” section, and configure their permissions accordingly. Save your changes.
+7
View File
@@ -0,0 +1,7 @@
# Chatbots
Chatbots are an incredibly popular use case for LLMs. LlamaIndex gives you the tools to build knowledge-augmented chatbots and agents.
Relevant Resources:
- [Building a Chatbot](/end_to_end_tutorials/chatbots/building_a_chatbot.md)
- [Using with a LangChain Agent](/community/integrations/using_with_langchain.md)
@@ -0,0 +1,352 @@
# 💬🤖 How to Build a Chatbot
LlamaIndex is an interface between your data and LLMs; it offers the toolkit for you to set up a query interface around your data for any downstream task, whether it's question-answering, summarization, or more.
In this tutorial, we show you how to build a context augmented chatbot. We use Langchain for the underlying Agent/Chatbot abstractions, and we use LlamaIndex for the data retrieval/lookup/querying! The result is a chatbot agent that has access to a rich set of "data interface" Tools that LlamaIndex provides to answer queries over your data.
**Note**: This is a continuation of some initial work building a query interface over SEC 10-K filings - [check it out here](https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d).
### Context
In this tutorial, we build a "10-K Chatbot" by downloading the raw UBER 10-K HTML filings from Dropbox. The user can choose to ask questions regarding the 10-K filings.
### Ingest Data
Let's first download the raw 10-K files, from 2019-2022.
```python
# NOTE: the code examples assume you're operating within a Jupyter notebook.
# download files
!mkdir data
!wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip
!unzip data/UBER.zip -d data
```
We use the [Unstructured](https://github.com/Unstructured-IO/unstructured) library to parse the HTML files into formatted text.
We have a direct integration with Unstructured through [LlamaHub](https://llamahub.ai/) - this allows us to convert any text into a Document format that LlamaIndex can ingest.
```python
from llama_index import download_loader, VectorStoreIndex, ServiceContext, StorageContext, load_index_from_storage
from pathlib import Path
years = [2022, 2021, 2020, 2019]
UnstructuredReader = download_loader("UnstructuredReader", refresh_cache=True)
loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(file=Path(f'./data/UBER/UBER_{year}.html'), split_documents=False)
    # insert year metadata into each document
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)
```
### Setting up Vector Indices for each year
We first set up a vector index for each year. Each vector index allows us
to ask questions about the 10-K filing of a given year.
We build each index and save it to disk.
```python
# initialize simple vector indices + global vector index
service_context = ServiceContext.from_defaults(chunk_size=512)
index_set = {}
for year in years:
    storage_context = StorageContext.from_defaults()
    cur_index = VectorStoreIndex.from_documents(
        doc_set[year],
        service_context=service_context,
        storage_context=storage_context,
    )
    index_set[year] = cur_index
    storage_context.persist(persist_dir=f'./storage/{year}')
```
To load an index from disk, do the following:
```python
# Load indices from disk
index_set = {}
for year in years:
    storage_context = StorageContext.from_defaults(persist_dir=f'./storage/{year}')
    cur_index = load_index_from_storage(storage_context=storage_context)
    index_set[year] = cur_index
```
### Composing a Graph to Synthesize Answers Across 10-K Filings
Since we have access to documents of 4 years, we may not only want to ask questions regarding the 10-K document of a given year, but ask questions that require analysis over all 10-K filings.
To address this, we compose a "graph" which consists of a list index defined over the 4 vector indices. Querying this graph would first retrieve information from each vector index, and combine information together via the list index.
```python
from llama_index import ListIndex, LLMPredictor, ServiceContext, load_graph_from_storage
from langchain import OpenAI
from llama_index.indices.composability import ComposableGraph
# describe each index to help traversal of composed graph
index_summaries = [f"UBER 10-k Filing for {year} fiscal year" for year in years]
# define an LLMPredictor and set the number of output tokens
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
storage_context = StorageContext.from_defaults()
# define a list index over the vector indices
# allows us to synthesize information across each index
graph = ComposableGraph.from_indices(
    ListIndex,
    [index_set[y] for y in years],
    index_summaries=index_summaries,
    service_context=service_context,
    storage_context=storage_context,
)
root_id = graph.root_id

# [optional] save to disk
storage_context.persist(persist_dir='./storage/root')

# [optional] load from disk, so you don't need to build graph from scratch
graph = load_graph_from_storage(
    root_id=root_id,
    service_context=service_context,
    storage_context=storage_context,
)
```
### Setting up the Tools + Langchain Chatbot Agent
We use Langchain to set up the outer chatbot agent, which has access to a set of Tools.
LlamaIndex provides some wrappers around indices and graphs so that they can be easily used within a Tool interface.
```python
# do imports
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.agents import initialize_agent
from llama_index.langchain_helpers.agents import LlamaToolkit, create_llama_chat_agent, IndexToolConfig
```
We want to define a separate Tool for each index (corresponding to a given year), as well
as the graph. We can define all tools under a central `LlamaToolkit` interface.
Below, we define an `IndexToolConfig` for our graph. Note that we also import a `DecomposeQueryTransform` module for use within each vector index within the graph. This allows us to "decompose" the overall query into a query that can be answered from each subindex (see the example below).
```python
# define a decompose transform
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
    llm_predictor, verbose=True
)
# define custom query engines
from llama_index.query_engine.transform_query_engine import TransformQueryEngine
custom_query_engines = {}
for index in index_set.values():
    query_engine = index.as_query_engine()
    query_engine = TransformQueryEngine(
        query_engine,
        query_transform=decompose_transform,
        transform_extra_info={'index_summary': index.index_struct.summary},
    )
    custom_query_engines[index.index_id] = query_engine
custom_query_engines[graph.root_id] = graph.root_index.as_query_engine(
    response_mode='tree_summarize',
    verbose=True,
)
# construct query engine
graph_query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
# tool config
graph_config = IndexToolConfig(
    query_engine=graph_query_engine,
    name="Graph Index",
    description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber.",
    tool_kwargs={"return_direct": True}
)
```
Besides the `IndexToolConfig` object for the graph, we also define an `IndexToolConfig` corresponding to each index:
```python
# define toolkit
index_configs = []
for y in range(2019, 2023):
    query_engine = index_set[y].as_query_engine(
        similarity_top_k=3,
    )
    tool_config = IndexToolConfig(
        query_engine=query_engine,
        name=f"Vector Index {y}",
        description=f"useful for when you want to answer queries about the {y} SEC 10-K for Uber",
        tool_kwargs={"return_direct": True}
    )
    index_configs.append(tool_config)
```
Next, we combine these configs with our `LlamaToolkit`:
```python
toolkit = LlamaToolkit(
index_configs=index_configs + [graph_config],
)
```
Finally, we call `create_llama_chat_agent` to create our Langchain chatbot agent, which
has access to the 5 Tools we defined above:
```python
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI(temperature=0)
agent_chain = create_llama_chat_agent(
    toolkit,
    llm,
    memory=memory,
    verbose=True
)
```
### Testing the Agent
We can now test the agent with various queries.
If we test it with a simple "hello" query, the agent does not use any Tools.
```python
agent_chain.run(input="hi, i am bob")
```
```
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No
AI: Hi Bob, nice to meet you! How can I help you today?
> Finished chain.
'Hi Bob, nice to meet you! How can I help you today?'
```
If we test it with a query regarding the 10-k of a given year, the agent will use
the relevant vector index Tool.
```python
agent_chain.run(input="What were some of the biggest risk factors in 2020 for Uber?")
```
```
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: Vector Index 2020
Action Input: Risk Factors
...
Observation:
Risk Factors
The COVID-19 pandemic and the impact of actions to mitigate the pandemic has adversely affected and continues to adversely affect our business, financial condition, and results of operations.
...
'\n\nRisk Factors\n\nThe COVID-19 pandemic and the impact of actions to mitigate the pandemic has adversely affected and continues to adversely affect our business,
```
Finally, if we test it with a query to compare/contrast risk factors across years,
the agent will use the graph index Tool.
```python
cross_query_str = (
"Compare/contrast the risk factors described in the Uber 10-K across years. Give answer in bullet points."
)
agent_chain.run(input=cross_query_str)
```
```
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: Graph Index
Action Input: Compare/contrast the risk factors described in the Uber 10-K across years.> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2022 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2022 fiscal year?
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 964 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
> Got response:
The risk factors described in the Uber 10-K for the 2022 fiscal year include: the potential for changes in the classification of Drivers, the potential for increased competition, the potential for...
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2021 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2021 fiscal year?
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 590 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
> Got response:
1. The COVID-19 pandemic and the impact of actions to mitigate the pandemic have adversely affected and may continue to adversely affect parts of our business.
2. Our business would be adversely ...
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2020 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2020 fiscal year?
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 516 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
> Got response:
The risk factors described in the Uber 10-K for the 2020 fiscal year include: the timing of widespread adoption of vaccines against the virus, additional actions that may be taken by governmental ...
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2019 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2019 fiscal year?
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1020 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
INFO:llama_index.indices.common.tree.base:> Building index from nodes: 0 chunks
> Got response:
Risk factors described in the Uber 10-K for the 2019 fiscal year include: competition from other transportation providers; the impact of government regulations; the impact of litigation; the impac...
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 7039 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 72 tokens
Observation:
In 2020, the risk factors included the timing of widespread adoption of vaccines against the virus, additional actions that may be taken by governmental authorities, the further impact on the business of Drivers
...
```
### Setting up the Chatbot Loop
Now that we have the chatbot set up, it only takes a few more steps to set up a basic interactive loop to converse with our SEC-augmented chatbot!
```python
while True:
    text_input = input("User: ")
    response = agent_chain.run(input=text_input)
    print(f'Agent: {response}')
```
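The loop above runs until the process is interrupted. As a minimal sketch of a variant that stops on an exit command, the following may help. The `chat_loop` function, the `quit`/`exit` commands, and the stub `echo_agent` are assumptions made for illustration; in the real app you would pass something like `lambda text: agent_chain.run(input=text)` as `run_agent`.

```python
from typing import Callable, Iterable, List


def chat_loop(run_agent: Callable[[str], str], user_inputs: Iterable[str]) -> List[str]:
    """Feed each user input to the agent until a quit command is seen."""
    transcript = []
    for text_input in user_inputs:
        if text_input.strip().lower() in {"quit", "exit"}:
            break  # leave the loop cleanly instead of running forever
        response = run_agent(text_input)
        transcript.append(f"Agent: {response}")
    return transcript


# Stub agent standing in for agent_chain.run (hypothetical, for illustration only).
def echo_agent(text: str) -> str:
    return f"you said '{text}'"


transcript = chat_loop(echo_agent, ["hi, i am bob", "exit", "never reached"])
print("\n".join(transcript))
```

For interactive use, the scripted list can be swapped for an iterator over `input("User: ")` calls, for example `iter(lambda: input("User: "), "")`, which stops at the first empty line.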
Here's an example of the loop in action:
```
User: What were some of the legal proceedings against Uber in 2022?
Agent:
In 2022, legal proceedings against Uber include a motion to compel arbitration, an appeal of a ruling that Proposition 22 is unconstitutional, a complaint alleging that drivers are employees and entitled to protections under the wage and labor laws, a summary judgment motion, allegations of misclassification of drivers and related employment violations in New York, fraud related to certain deductions, class actions in Australia alleging that Uber entities conspired to injure the group members during the period 2014 to 2017 by either directly breaching transport legislation or commissioning offenses against transport legislation by UberX Drivers in Australia, and claims of lost income and decreased value of certain taxi. Additionally, Uber is facing a challenge in California Superior Court alleging that Proposition 22 is unconstitutional, and a preliminary injunction order prohibiting Uber from classifying Drivers as independent contractors and from violating various wage and hour laws.
User:
```
### Notebook
Take a look at our [corresponding notebook](https://github.com/jerryjliu/llama_index/blob/main/examples/chatbot/Chatbot_SEC.ipynb).
@@ -0,0 +1,29 @@
# Discover LlamaIndex Video Series
This page contains links to videos + associated notebooks for our ongoing video tutorial series "Discover LlamaIndex".
## SubQuestionQueryEngine + 10K Analysis
This video covers the `SubQuestionQueryEngine` and how it can be applied to financial documents to help decompose complex queries into multiple sub-questions.
[Youtube](https://www.youtube.com/watch?v=GT_Lsj3xj1o)
[Notebook](../../examples/usecases/10k_sub_question.ipynb)
## Discord Document Management
This video covers managing documents from a source that is constantly updating (i.e. Discord) and how you can avoid document duplication and save embedding tokens.
[Youtube](https://www.youtube.com/watch?v=j6dJcODLd_c)
[Notebook + Supplementary Material](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/discover_llamaindex/document_management/)
[Reference Docs](../../core_modules/data_modules/index/document_management.md)
## Joint Text to SQL and Semantic Search
This video covers the tools built into LlamaIndex for combining SQL and semantic search into a single unified query interface.
[Youtube](https://www.youtube.com/watch?v=ZIvcVJGtCrY)
[Notebook](../../examples/query_engine/SQLAutoVectorQueryEngine.ipynb)
@@ -0,0 +1,5 @@
# Private Setup
Relevant Resources:
- [Using LlamaIndex with Local Models](https://colab.research.google.com/drive/16QMQePkONNlDpgiltOi7oRQgmB8dU5fl?usp=sharing)
@@ -0,0 +1,234 @@
# Q&A over Documents
At a high level, LlamaIndex gives you the ability to query your data for any downstream LLM use case,
whether it's question-answering, summarization, or a component in a chatbot.
This section describes the different ways you can query your data with LlamaIndex, roughly in order
of simplest (top-k semantic search), to more advanced capabilities.
### Semantic Search
The most basic example usage of LlamaIndex is through semantic search. We provide
a simple in-memory vector store for you to get started, but you can also choose
to use any one of our [vector store integrations](/community/integrations/vector_stores.md):
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
```
**Tutorials**
- [Starter Tutorial](/getting_started/starter_example.md)
- [Basic Usage Pattern](/end_to_end_tutorials/usage_pattern.md)
**Guides**
- [Example](../examples/vector_stores/SimpleIndexDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/tree/main/docs/examples/vector_stores/SimpleIndexDemo.ipynb))
### Summarization
A summarization query requires the LLM to iterate through many if not most documents in order to synthesize an answer.
For instance, a summarization query could look like one of the following:
- "What is a summary of this collection of text?"
- "Give me a summary of person X's experience with the company."
In general, a list index would be suited for this use case. A list index by default goes through all the data.
Empirically, setting `response_mode="tree_summarize"` also leads to better summarization results.
```python
from llama_index import ListIndex
# `documents` is loaded as in the semantic search example above
index = ListIndex.from_documents(documents)
query_engine = index.as_query_engine(
    response_mode="tree_summarize"
)
response = query_engine.query("<summarization_query>")
```
### Queries over Structured Data
LlamaIndex supports queries over structured data, whether that's a Pandas DataFrame or a SQL Database.
Here are some relevant resources:
**Tutorials**
- [Guide on Text-to-SQL](/guides/tutorials/sql_guide.md)
**Guides**
- [SQL Guide (Core)](../examples/index_structs/struct_indices/SQLIndexDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/index_structs/struct_indices/SQLIndexDemo.ipynb))
- [Pandas Demo](../examples/query_engine/pandas_query_engine.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/pandas_query_engine.ipynb))
### Synthesis over Heterogeneous Data
LlamaIndex supports synthesizing across heterogeneous data sources. This can be done by composing a graph over your existing data.
Specifically, compose a list index over your subindices. A list index inherently combines information for each node; therefore
it can synthesize information across your heterogeneous data sources.
```python
from llama_index import VectorStoreIndex, ListIndex
from llama_index.indices.composability import ComposableGraph
index1 = VectorStoreIndex.from_documents(notion_docs)
index2 = VectorStoreIndex.from_documents(slack_docs)
graph = ComposableGraph.from_indices(ListIndex, [index1, index2], index_summaries=["summary1", "summary2"])
query_engine = graph.as_query_engine()
response = query_engine.query("<query_str>")
```
**Guides**
- [Composability](/core_modules/data_modules/index/composability.md)
- [City Analysis](/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb))
### Routing over Heterogeneous Data
LlamaIndex also supports routing over heterogeneous data sources with `RouterQueryEngine` - for instance, if you want to "route" a query to an
underlying Document or a sub-index.
To do this, first build the sub-indices over different data sources.
Then construct the corresponding query engines, and give each query engine a description to obtain a `QueryEngineTool`.
```python
from llama_index import TreeIndex, VectorStoreIndex
from llama_index.tools import QueryEngineTool
...
# define sub-indices
index1 = VectorStoreIndex.from_documents(notion_docs)
index2 = VectorStoreIndex.from_documents(slack_docs)
# define query engines and tools
tool1 = QueryEngineTool.from_defaults(
query_engine=index1.as_query_engine(),
description="Use this query engine to do...",
)
tool2 = QueryEngineTool.from_defaults(
query_engine=index2.as_query_engine(),
description="Use this query engine for something else...",
)
```
Then, we define a `RouterQueryEngine` over them.
By default, this uses an `LLMSingleSelector` as the router, which uses the LLM to choose the best sub-index to route the query to, given the descriptions.
```python
from llama_index.query_engine import RouterQueryEngine
query_engine = RouterQueryEngine.from_defaults(
query_engine_tools=[tool1, tool2]
)
response = query_engine.query(
"In Notion, give me a summary of the product roadmap."
)
```
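The selection step can be pictured with a small stand-in. The sketch below is not `LLMSingleSelector`: it replaces the LLM call with a naive word-overlap score between the query and each tool description, purely to illustrate how the descriptions drive routing. The `Tool` dataclass and `select_tool` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    description: str
    run: Callable[[str], str]


def select_tool(tools: List[Tool], query: str) -> Tool:
    """Pick the tool whose description shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        return len(query_words & set(tool.description.lower().split()))

    return max(tools, key=overlap)


tools = [
    Tool("answers questions about notion product docs", lambda q: "notion: " + q),
    Tool("answers questions about slack messages", lambda q: "slack: " + q),
]
query = "summarize the product roadmap in notion"
chosen = select_tool(tools, query)
print(chosen.run(query))
```

A real selector asks the LLM to read the descriptions, so writing specific, distinguishing descriptions for each tool matters more than sharing exact keywords with the query.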
**Guides**
- [Router Query Engine Guide](../examples/query_engine/RouterQueryEngine.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/RouterQueryEngine.ipynb))
- [City Analysis Unified Query Interface](../examples/composable_indices/city_analysis/City_Analysis-Unified-Query.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb))
### Compare/Contrast Queries
You can explicitly perform compare/contrast queries with a **query transformation** module within a ComposableGraph.
```python
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
llm_predictor_chatgpt, verbose=True
)
```
This module will help break down a complex query into a simpler one over your existing index structure.
**Guides**
- [Query Transformations](/core_modules/query_modules/query_engine/advanced/query_transformations.md)
- [City Analysis Compare/Contrast Example](/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb))
You can also rely on the LLM to *infer* whether to perform compare/contrast queries (see Multi-Document Queries below).
### Multi-Document Queries
Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well.
It can do this through our `SubQuestionQueryEngine` class. Given a query, this query engine will generate a "query plan" containing
sub-queries against sub-documents before synthesizing the final answer.
To do this, first define an index for each document/data source, and wrap it with a `QueryEngineTool` (similar to above):
```python
from llama_index.tools import QueryEngineTool, ToolMetadata
query_engine_tools = [
QueryEngineTool(
query_engine=sept_engine,
metadata=ToolMetadata(name='sept_22', description='Provides information about Uber quarterly financials ending September 2022')
),
QueryEngineTool(
query_engine=june_engine,
metadata=ToolMetadata(name='june_22', description='Provides information about Uber quarterly financials ending June 2022')
),
QueryEngineTool(
query_engine=march_engine,
metadata=ToolMetadata(name='march_22', description='Provides information about Uber quarterly financials ending March 2022')
),
]
```
Then, we define a `SubQuestionQueryEngine` over these tools:
```python
from llama_index.query_engine import SubQuestionQueryEngine
query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
```
This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer.
This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
**Guides**
- [Sub Question Query Engine (Intro)](../examples/query_engine/sub_question_query_engine.ipynb)
- [10Q Analysis (Uber)](../examples/usecases/10q_sub_question.ipynb)
- [10K Analysis (Uber and Lyft)](../examples/usecases/10k_sub_question.ipynb)
### Multi-Step Queries
LlamaIndex can also support iterative multi-step queries. Given a complex query, it breaks the query down into an initial subquestion,
and then sequentially generates further subquestions based on the returned answers until the final answer is returned.
For instance, given a question "Who was in the first batch of the accelerator program the author started?",
the module will first decompose the query into a simpler initial question "What was the accelerator program the author started?",
query the index, and then ask followup questions.
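The control flow can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the LlamaIndex module: `FACTS` is a lookup table with placeholder answers standing in for a real index, and `next_question` is a hypothetical stand-in for the LLM step that proposes each follow-up.

```python
from typing import List, Optional, Tuple

# Toy lookup table standing in for a real index; the answers are placeholders.
FACTS = {
    "What was the accelerator program the author started?": "Program X",
    "Who was in the first batch of Program X?": "Founders A and B",
}


def query_index(question: str) -> str:
    return FACTS.get(question, "unknown")


def next_question(history: List[Tuple[str, str]]) -> Optional[str]:
    """Hypothetical stand-in for the LLM step that proposes the next subquestion."""
    if not history:
        return "What was the accelerator program the author started?"
    if len(history) == 1:
        program = history[0][1]  # reuse the previous answer in the follow-up
        return f"Who was in the first batch of {program}?"
    return None  # nothing left to ask


def multi_step_query() -> Tuple[str, List[Tuple[str, str]]]:
    history: List[Tuple[str, str]] = []
    while (question := next_question(history)) is not None:
        history.append((question, query_index(question)))
    return history[-1][1], history


answer, steps = multi_step_query()
print(answer)
```

The key point is that each follow-up question is built from the answers gathered so far, which is what distinguishes this flow from issuing all sub-queries up front.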
**Guides**
- [Query Transformations](/core_modules/query_modules/query_engine/advanced/query_transformations.md)
- [Multi-Step Query Decomposition](../examples/query_transformations/HyDEQueryTransformDemo.ipynb) ([Notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb))
### Temporal Queries
LlamaIndex can support queries that require an understanding of time. It can do this in two ways:
- Decide whether the query requires utilizing temporal relationships between nodes (prev/next relationships) in order to retrieve additional context to answer the question.
- Sort by recency and filter outdated context.
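As a rough illustration of the second approach, the sketch below sorts retrieved nodes newest-first and drops the rest. The `Node` dataclass, its `created` field, and `keep_most_recent` are hypothetical names for this example and are not the LlamaIndex postprocessor API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Node:
    text: str
    created: date


def keep_most_recent(nodes: List[Node], top_k: int = 1) -> List[Node]:
    """Sort retrieved nodes newest-first and keep only the top_k most recent."""
    return sorted(nodes, key=lambda n: n.created, reverse=True)[:top_k]


nodes = [
    Node("Office is in Building A.", date(2021, 3, 1)),
    Node("Office moved to Building B.", date(2023, 6, 15)),
]
latest = keep_most_recent(nodes)
print(latest[0].text)
```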
**Guides**
- [Second-Stage Postprocessing Guide](/core_modules/query_modules/node_postprocessors/root.md)
- [Prev/Next Postprocessing](/examples/node_postprocessor/PrevNextPostprocessorDemo.ipynb)
- [Recency Postprocessing](/examples/node_postprocessor/RecencyPostprocessorDemo.ipynb)
### Additional Resources
- [A Guide to Creating a Unified Query Framework over your Indexes](/end_to_end_tutorials/question_and_answer/unified_query.md)
- [A Guide to Extracting Terms and Definitions](/end_to_end_tutorials/question_and_answer/terms_definitions_tutorial.md)
- [SEC 10k Analysis](https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d)
@@ -0,0 +1,489 @@
# A Guide to Extracting Terms and Definitions
Llama Index has many use cases (semantic search, summarization, etc.) that are [well documented](/end_to_end_tutorials/use_cases.md). However, this doesn't mean we can't apply Llama Index to very specific use cases!
In this tutorial, we will go through the design process of using Llama Index to extract terms and definitions from text, while allowing users to query those terms later. Using [Streamlit](https://streamlit.io/), we can provide an easy way to build a frontend for running and testing all of this, and quickly iterate on our design.
This tutorial assumes you have Python 3.9+ and the following packages installed:
- llama-index
- streamlit
At the base level, our objective is to take text from a document, extract terms and definitions, and then provide a way for users to query that knowledge base of terms and definitions. The tutorial will go over features from both Llama Index and Streamlit, and hopefully provide some interesting solutions for common problems that come up.
The final version of this tutorial can be found [here](https://github.com/logan-markewich/llama_index_starter_pack) and a live hosted demo is available on [Huggingface Spaces](https://huggingface.co/spaces/llamaindex/llama_index_term_definition_demo).
## Uploading Text
Step one is giving users a way to upload documents. Let's write some code using Streamlit to provide the interface for this! Use the following code and launch the app with `streamlit run app.py`.
```python
import streamlit as st
st.title("🦙 Llama Index Term Extractor 🦙")
document_text = st.text_area("Or enter raw text")
if st.button("Extract Terms and Definitions") and document_text:
    with st.spinner("Extracting..."):
        extracted_terms = document_text  # this is a placeholder!
    st.write(extracted_terms)
```
Super simple, right? But you'll notice that the app doesn't do anything useful yet. To use llama_index, we also need to set up our OpenAI LLM. There are a bunch of possible settings for the LLM, so we can let the user figure out what's best. We should also let the user set the prompt that will extract the terms (which will also help us debug what works best).
## LLM Settings
This next step introduces some tabs to our app, to separate it into different panes that provide different features. Let's create a tab for LLM settings and for uploading text:
```python
import os
import streamlit as st
DEFAULT_TERM_STR = (
    "Make a list of terms and definitions that are defined in the context, "
    "with one pair on each line. "
    "If a term is missing its definition, use your best judgment. "
    "Write each line as follows:\nTerm: <term> Definition: <definition>"
)
st.title("🦙 Llama Index Term Extractor 🦙")
setup_tab, upload_tab = st.tabs(["Setup", "Upload/Extract Terms"])
with setup_tab:
    st.subheader("LLM Setup")
    api_key = st.text_input("Enter your OpenAI API key here", type="password")
    llm_name = st.selectbox('Which LLM?', ["text-davinci-003", "gpt-3.5-turbo", "gpt-4"])
    model_temperature = st.slider("LLM Temperature", min_value=0.0, max_value=1.0, step=0.1)
    term_extract_str = st.text_area("The query to extract terms and definitions with.", value=DEFAULT_TERM_STR)
with upload_tab:
    st.subheader("Extract and Query Definitions")
    document_text = st.text_area("Or enter raw text")
    if st.button("Extract Terms and Definitions") and document_text:
        with st.spinner("Extracting..."):
            extracted_terms = document_text  # this is a placeholder!
        st.write(extracted_terms)
```
Now our app has two tabs, which really helps with the organization. You'll also notice I added a default prompt to extract terms. You can change this later once you try extracting some terms; it's just the prompt I arrived at after experimenting a bit.
Speaking of extracting terms, it's time to add some functions to do just that!
## Extracting and Storing Terms
Now that we are able to define LLM settings and upload text, we can try using Llama Index to extract the terms from text for us!
We can add the following functions to both initialize our LLM, as well as use it to extract terms from the input text.
```python
from llama_index import Document, ListIndex, LLMPredictor, ServiceContext, load_index_from_storage
# assumed imports: the LLM classes used below are the LangChain wrappers
from langchain import OpenAI
from langchain.chat_models import ChatOpenAI
def get_llm(llm_name, model_temperature, api_key, max_tokens=256):
    os.environ['OPENAI_API_KEY'] = api_key
    if llm_name == "text-davinci-003":
        return OpenAI(temperature=model_temperature, model_name=llm_name, max_tokens=max_tokens)
    else:
        return ChatOpenAI(temperature=model_temperature, model_name=llm_name, max_tokens=max_tokens)
def extract_terms(documents, term_extract_str, llm_name, model_temperature, api_key):
    llm = get_llm(llm_name, model_temperature, api_key, max_tokens=1024)
    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm),
                                                   chunk_size=1024)
    temp_index = ListIndex.from_documents(documents, service_context=service_context)
    query_engine = temp_index.as_query_engine(response_mode="tree_summarize")
    terms_definitions = str(query_engine.query(term_extract_str))
    # keep only the lines that contain both labels
    terms_definitions = [x for x in terms_definitions.split("\n") if x and 'Term:' in x and 'Definition:' in x]
    # parse the text into a dict
    terms_to_definition = {x.split("Definition:")[0].split("Term:")[-1].strip(): x.split("Definition:")[-1].strip() for x in terms_definitions}
    return terms_to_definition
```
Now, using the new functions, we can finally extract our terms!
```python
...
with upload_tab:
    st.subheader("Extract and Query Definitions")
    document_text = st.text_area("Or enter raw text")
    if st.button("Extract Terms and Definitions") and document_text:
        with st.spinner("Extracting..."):
            extracted_terms = extract_terms([Document(text=document_text)],
                                            term_extract_str, llm_name,
                                            model_temperature, api_key)
        st.write(extracted_terms)
```
There's a lot going on now, let's take a moment to go over what is happening.
`get_llm()` is instantiating the LLM based on the user configuration from the setup tab. Based on the model name, we need to use the appropriate class (`OpenAI` vs. `ChatOpenAI`).
`extract_terms()` is where all the good stuff happens. First, we call `get_llm()` with `max_tokens=1024`, since we don't want to limit the model too much when it is extracting our terms and definitions (the default is 256 if not set). Then, we define our `ServiceContext` object, setting `chunk_size=1024` so that each chunk is no larger than the output limit. When documents are indexed by Llama Index, they are broken into chunks (also called nodes) if they are large, and `chunk_size` sets the size for these chunks.
Next, we create a temporary list index and pass in our service context. A list index will read every single piece of text in our index, which is perfect for extracting terms. We then use our pre-defined query text to extract terms, using `response_mode="tree_summarize"`. This response mode will generate a tree of summaries from the bottom up, where each parent summarizes its children. Finally, the top of the tree is returned, which will contain all our extracted terms and definitions.
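The bottom-up idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: `tree_summarize` here is a hypothetical function, the `fanout` parameter is an assumption, and `join_summarizer` is a stub that stands in for the LLM call so the tree shape stays visible in the output.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fanout: int = 2,
) -> str:
    """Repeatedly summarize groups of `fanout` texts until one root summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2 so each level shrinks")
    level = list(chunks)
    while len(level) > 1:
        # Each parent summarizes up to `fanout` children from the level below.
        level = [summarize(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Stub summarizer: joins its children so the tree shape is visible in the output.
def join_summarizer(texts: List[str]) -> str:
    return "(" + " + ".join(texts) + ")"


root = tree_summarize(["a", "b", "c", "d"], join_summarizer)
print(root)
```

With a real LLM as the summarizer, each level costs one call per group, so the number of calls grows with the number of chunks while every individual prompt stays small.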
Lastly, we do some minor post processing. We assume the model followed instructions and put a term/definition pair on each line. If a line is missing the `Term:` or `Definition:` labels, we skip it. Then, we convert this to a dictionary for easy storage!
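To see the post-processing step in isolation, here is a small standalone sketch that follows the same idea as the parsing in `extract_terms()`. The `parse_terms` name and the sample LLM output are made up for this example, and unlike the one-liner in the tutorial it splits only on the first occurrence of each label.

```python
from typing import Dict


def parse_terms(llm_output: str) -> Dict[str, str]:
    """Turn 'Term: <term> Definition: <definition>' lines into a dict."""
    terms = {}
    for line in llm_output.split("\n"):
        # Skip lines that do not contain both labels.
        if "Term:" not in line or "Definition:" not in line:
            continue
        term_part, definition = line.split("Definition:", 1)
        term = term_part.split("Term:", 1)[-1].strip()
        terms[term] = definition.strip()
    return terms


# Made-up model output, including two lines that should be ignored.
sample_output = (
    "Here are the terms:\n"
    "Term: Node Definition: A chunk of a document.\n"
    "Term: Index Definition: A data structure for retrieval.\n"
    "not a pair"
)
parsed = parse_terms(sample_output)
print(parsed)
```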
## Saving Extracted Terms
Now that we can extract terms, we need to put them somewhere so that we can query for them later. A `VectorStoreIndex` should be a perfect choice for now! But in addition, our app should also keep track of which terms are inserted into the index so that we can inspect them later. Using `st.session_state`, we can store the current list of terms in a session dict, unique to each user!
First things first though, let's add a feature to initialize a global vector index and another function to insert the extracted terms.
```python
...
if 'all_terms' not in st.session_state:
    st.session_state['all_terms'] = DEFAULT_TERMS
...
def insert_terms(terms_to_definition):
    for term, definition in terms_to_definition.items():
        doc = Document(text=f"Term: {term}\nDefinition: {definition}")
        st.session_state['llama_index'].insert(doc)
@st.cache_resource
def initialize_index(llm_name, model_temperature, api_key):
    """Create the VectorStoreIndex object."""
    llm = get_llm(llm_name, model_temperature, api_key)
    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
    index = VectorStoreIndex([], service_context=service_context)
    return index
...
with upload_tab:
    st.subheader("Extract and Query Definitions")
    if st.button("Initialize Index and Reset Terms"):
        st.session_state['llama_index'] = initialize_index(llm_name, model_temperature, api_key)
        st.session_state['all_terms'] = {}
    if "llama_index" in st.session_state:
        st.markdown("Either upload an image/screenshot of a document, or enter the text manually.")
        document_text = st.text_area("Or enter raw text")
        # `uploaded_file` is not defined at this stage, so only the text area is checked
        if st.button("Extract Terms and Definitions") and document_text:
            st.session_state['terms'] = {}
            terms_docs = {}
            with st.spinner("Extracting..."):
                terms_docs.update(extract_terms([Document(text=document_text)], term_extract_str, llm_name, model_temperature, api_key))
            st.session_state['terms'].update(terms_docs)
        if "terms" in st.session_state and st.session_state["terms"]:
            st.markdown("Extracted terms")
            st.json(st.session_state['terms'])
            if st.button("Insert terms?"):
                with st.spinner("Inserting terms"):
                    insert_terms(st.session_state['terms'])
                st.session_state['all_terms'].update(st.session_state['terms'])
                st.session_state['terms'] = {}
                st.experimental_rerun()
```
Now you are really starting to leverage the power of Streamlit! Let's start with the code under the upload tab. We added a button to initialize the vector index, and we store it in the global Streamlit state dictionary, as well as resetting the currently extracted terms. Then, after extracting terms from the input text, we store the extracted terms in the global state again and give the user a chance to review them before inserting. If the insert button is pressed, then we call our insert terms function, update our global tracking of inserted terms, and remove the most recently extracted terms from the session state.
## Querying for Extracted Terms/Definitions
With the terms and definitions extracted and saved, how can we use them? And how will the user even remember what's previously been saved?? We can simply add some more tabs to the app to handle these features.
```python
...
setup_tab, terms_tab, upload_tab, query_tab = st.tabs(
    ["Setup", "All Terms", "Upload/Extract Terms", "Query Terms"]
)
...
with terms_tab:
    st.subheader("Current Extracted Terms and Definitions")
    st.json(st.session_state["all_terms"])
...
with query_tab:
    st.subheader("Query for Terms/Definitions!")
    st.markdown(
        (
            "The LLM will attempt to answer your query, and augment its answers using the terms/definitions you've inserted. "
            "If a term is not in the index, it will answer using its internal knowledge."
        )
    )
    if st.button("Initialize Index and Reset Terms", key="init_index_2"):
        st.session_state["llama_index"] = initialize_index(
            llm_name, model_temperature, api_key
        )
        st.session_state["all_terms"] = {}
    if "llama_index" in st.session_state:
        query_text = st.text_input("Ask about a term or definition:")
        if query_text:
            query_text = query_text + "\nIf you can't find the answer, answer the query with the best of your knowledge."
            with st.spinner("Generating answer..."):
                response = st.session_state["llama_index"].query(
                    query_text, similarity_top_k=5, response_mode="compact"
                )
            st.markdown(str(response))
```
While this is mostly basic, some important things to note:
- Our initialize button has the same text as our other button. Streamlit will complain about this, so we provide a unique key instead.
- Some additional text has been added to the query! This is to try and compensate for times when the index does not have the answer.
- In our index query, we've specified two options:
- `similarity_top_k=5` means the index will fetch the top 5 closest matching terms/definitions to the query.
- `response_mode="compact"` means as much text as possible from the 5 matching terms/definitions will be used in each LLM call. Without this, the index would make at least 5 calls to the LLM, which can slow things down for the user.
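To illustrate why compacting reduces the number of LLM calls, here is a rough sketch of greedy packing. It is not Llama Index's implementation: the function `pack_chunks` is hypothetical, and a character budget stands in for the token counting a real prompt would need.

```python
def pack_chunks(chunks, max_chars):
    """Greedily merge retrieved chunks so each prompt holds as much text as fits."""
    prompts, current = [], ""
    for chunk in chunks:
        candidate = chunk if not current else current + "\n\n" + chunk
        if current and len(candidate) > max_chars:
            # The next chunk does not fit: close the current prompt and start a new one.
            prompts.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        prompts.append(current)
    return prompts


# Five identical 21-character "term/definition" chunks, as if top_k were 5.
retrieved = ["Term: a\nDefinition: x"] * 5
packed = pack_chunks(retrieved, max_chars=60)
```

With a generous budget all five chunks share one prompt, so a single LLM call is enough; with a tiny budget each chunk needs its own call.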
## Dry Run Test
Well, actually I hope you've been testing as we went. But now, let's try one complete test.
1. Refresh the app
2. Enter your LLM settings
3. Head over to the query tab
4. Ask the following: `What is a bunnyhug?`
5. The app should give some nonsense response. If you didn't know, a bunnyhug is another word for a hoodie, used by people from the Canadian Prairies!
6. Let's add this definition to the app. Open the upload tab and enter the following text: `A bunnyhug is a common term used to describe a hoodie. This term is used by people from the Canadian Prairies.`
7. Click the extract button. After a few moments, the app should display the correctly extracted term/definition. Click the insert term button to save it!
8. If we open the terms tab, the term and definition we just extracted should be displayed
9. Go back to the query tab and try asking what a bunnyhug is. Now, the answer should be correct!
## Improvement #1 - Create a Starting Index
With our base app working, it might feel like a lot of work to build up a useful index. What if we gave the user some kind of starting point to show off the app's query capabilities? We can do just that! First, let's make a small change to our app so that we save the index to disk after every upload:
```python
def insert_terms(terms_to_definition):
    for term, definition in terms_to_definition.items():
        doc = Document(text=f"Term: {term}\nDefinition: {definition}")
        st.session_state['llama_index'].insert(doc)
    # TEMPORARY - save to disk
    st.session_state['llama_index'].storage_context.persist()
```
Now, we need some document to extract from! The repository for this project used the wikipedia page on New York City, and you can find the text [here](https://github.com/jerryjliu/llama_index/blob/main/examples/test_wiki/data/nyc_text.txt).
If you paste the text into the upload tab and run it (it may take some time), we can insert the extracted terms. Make sure to also copy the text for the extracted terms into a notepad or similar before inserting into the index! We will need them in a second.
After inserting, remove the line of code we used to save the index to disk. With a starting index now saved, we can modify our `initialize_index` function to look like this:
```python
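@st.cache_resource
def initialize_index(llm_name, model_temperature, api_key):
    """Load the Index object."""
    llm = get_llm(llm_name, model_temperature, api_key)
    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
    # Assumes the index was persisted to the default "./storage" directory
    # and that StorageContext is imported from llama_index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)
    return index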
```
Did you remember to save that giant list of extracted terms in a notepad? Now when our app initializes, we want to pass in the default terms that are in the index to our global terms state:
```python
...
if "all_terms" not in st.session_state:
    st.session_state["all_terms"] = DEFAULT_TERMS
...
```
Repeat the above anywhere where we were previously resetting the `all_terms` values.
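One thing to watch for: assigning `DEFAULT_TERMS` directly means later `.update()` calls on `all_terms` also modify `DEFAULT_TERMS` itself, so a "reset" would no longer restore the original defaults. The sketch below shows one way around that by copying. The contents of `DEFAULT_TERMS`, the helper `reset_terms`, and the plain dictionary standing in for `st.session_state` are all assumptions for illustration.

```python
import copy

# Hypothetical stand-in for the terms you saved in a notepad.
DEFAULT_TERMS = {"New York City": "The most populous city in the United States."}

session_state = {}  # stand-in for st.session_state


def reset_terms(state):
    """Reset to the defaults using a copy, so later updates do not mutate DEFAULT_TERMS."""
    state["all_terms"] = copy.deepcopy(DEFAULT_TERMS)


reset_terms(session_state)
session_state["all_terms"].update({"bunnyhug": "A hoodie."})
```

After the update, the new term lives only in the session copy, and calling `reset_terms` again brings back the original defaults.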
## Improvement #2 - (Refining) Better Prompts
If you play around with the app a bit now, you might notice that it stopped following our prompt! Remember, we appended instructions to our `query_text` variable saying that if the term/definition could not be found, the model should answer to the best of its knowledge. But now if you try asking about random terms (like bunnyhug!), it may or may not follow those instructions.
This is due to the concept of "refining" answers in Llama Index. Since we are querying across the top 5 matching results, sometimes all the results do not fit in a single prompt! OpenAI models typically have a max input size of 4097 tokens. So, Llama Index accounts for this by breaking up the matching results into chunks that will fit into the prompt. After Llama Index gets an initial answer from the first API call, it sends the next chunk to the API, along with the previous answer, and asks the model to refine that answer.
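The refine loop described above can be sketched roughly as follows. This is a simplified illustration, not Llama Index's code: `answer_with_refine` and `fake_llm` are hypothetical names, the prompt wording is made up, and the fake LLM only records the prompts it receives.

```python
def answer_with_refine(query, chunks, llm):
    """Ask once with the first chunk, then refine the answer with each later chunk."""
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {query}")
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {query}\n"
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n"
            "Refine the existing answer only if the new context helps."
        )
    return answer


calls = []


def fake_llm(prompt):
    """Stand-in for a real LLM call; returns a numbered placeholder answer."""
    calls.append(prompt)
    return f"answer-{len(calls)}"


result = answer_with_refine(
    "What is a bunnyhug?", ["chunk one", "chunk two", "chunk three"], fake_llm
)
```

Notice that only the first call sees the original question framing. Every later call is a refine prompt, which is why instructions appended to the query can get lost along the way.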
So, the refine process seems to be messing with our results! Rather than appending extra instructions to `query_text`, we can remove them and use our own custom prompts, which Llama Index lets us provide. Let's create those now, using the [default prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/default_prompts.py) and [chat specific prompts](https://github.com/jerryjliu/llama_index/blob/main/llama_index/prompts/chat_prompts.py) as a guide. Using a new file `constants.py`, let's create some new query templates:
```python
from langchain.chains.prompt_selector import ConditionalPromptSelector, is_chat_model
from langchain.prompts.chat import (
AIMessagePromptTemplate,
ChatPromptTemplate,
HumanMessagePromptTemplate,
)
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
# Text QA templates
DEFAULT_TEXT_QA_PROMPT_TMPL = (
"Context information is below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given the context information answer the following question "
"(if you don't know the answer, use the best of your knowledge): {query_str}\n"
)
TEXT_QA_TEMPLATE = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
# Refine templates
DEFAULT_REFINE_PROMPT_TMPL = (
"The original question is as follows: {query_str}\n"
"We have provided an existing answer: {existing_answer}\n"
"We have the opportunity to refine the existing answer "
"(only if needed) with some more context below.\n"
"------------\n"
"{context_msg}\n"
"------------\n"
"Given the new context and using the best of your knowledge, improve the existing answer. "
"If you can't improve the existing answer, just repeat it again."
)
DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
CHAT_REFINE_PROMPT_TMPL_MSGS = [
HumanMessagePromptTemplate.from_template("{query_str}"),
AIMessagePromptTemplate.from_template("{existing_answer}"),
HumanMessagePromptTemplate.from_template(
"We have the opportunity to refine the above answer "
"(only if needed) with some more context below.\n"
"------------\n"
"{context_msg}\n"
"------------\n"
"Given the new context and using the best of your knowledge, improve the existing answer. "
"If you can't improve the existing answer, just repeat it again."
),
]
CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
# refine prompt selector
DEFAULT_REFINE_PROMPT_SEL_LC = ConditionalPromptSelector(
default_prompt=DEFAULT_REFINE_PROMPT.get_langchain_prompt(),
conditionals=[(is_chat_model, CHAT_REFINE_PROMPT.get_langchain_prompt())],
)
REFINE_TEMPLATE = RefinePrompt(
langchain_prompt_selector=DEFAULT_REFINE_PROMPT_SEL_LC
)
```
That seems like a lot of code, but it's not too bad! If you looked at the default prompts, you might have noticed that there are default prompts, and prompts specific to chat models. Continuing that trend, we do the same for our custom prompts. Then, using a prompt selector, we can combine both prompts into a single object. If the LLM being used is a chat model (ChatGPT, GPT-4), then the chat prompts are used. Otherwise, use the normal prompt templates.
Another thing to note is that we only defined one QA template. In a chat model, this will be converted to a single "human" message.
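If the prompt selector feels opaque, the idea behind it can be sketched in a few lines. The class `SimplePromptSelector` and the check `looks_like_chat_model` below are hypothetical and only illustrate the pattern; LangChain's `is_chat_model` works on the LLM object itself rather than on a model name.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SimplePromptSelector:
    """Return the first prompt whose condition matches, else the default prompt."""

    default_prompt: str
    conditionals: List[Tuple[Callable[[str], bool], str]] = field(default_factory=list)

    def get_prompt(self, model_name: str) -> str:
        for condition, prompt in self.conditionals:
            if condition(model_name):
                return prompt
        return self.default_prompt


def looks_like_chat_model(model_name: str) -> bool:
    # Hypothetical name-based check, used here only to keep the sketch self-contained.
    return model_name.startswith(("gpt-3.5-turbo", "gpt-4"))


selector = SimplePromptSelector(
    default_prompt="plain refine prompt",
    conditionals=[(looks_like_chat_model, "chat refine prompt")],
)
```

Calling `selector.get_prompt("gpt-4")` picks the chat prompt, while a completion-style model name falls through to the default.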
So, now we can import these prompts into our app and use them during the query.
```python
from constants import REFINE_TEMPLATE, TEXT_QA_TEMPLATE
...
    if "llama_index" in st.session_state:
        query_text = st.text_input("Ask about a term or definition:")
        if query_text:
            query_text = query_text  # Notice we removed the old instructions
            with st.spinner("Generating answer..."):
                response = st.session_state["llama_index"].query(
                    query_text, similarity_top_k=5, response_mode="compact",
                    text_qa_template=TEXT_QA_TEMPLATE, refine_template=REFINE_TEMPLATE
                )
            st.markdown(str(response))
...
```
If you experiment a bit more with queries, hopefully you notice that the responses follow our instructions a little better now!
## Improvement #3 - Image Support
Llama Index also supports images! We can upload images of documents (papers, letters, etc.), and Llama Index handles extracting the text. We can leverage this to let users upload images of their documents and extract terms and definitions from them.
If you get an import error about PIL, install it using `pip install Pillow` first.
```python
from PIL import Image
from llama_index.readers.file.base import DEFAULT_FILE_EXTRACTOR, ImageParser

@st.cache_resource
def get_file_extractor():
    image_parser = ImageParser(keep_image=True, parse_text=True)
    file_extractor = DEFAULT_FILE_EXTRACTOR
    file_extractor.update(
        {
            ".jpg": image_parser,
            ".png": image_parser,
            ".jpeg": image_parser,
        }
    )
    return file_extractor

file_extractor = get_file_extractor()
...
with upload_tab:
    st.subheader("Extract and Query Definitions")
    if st.button("Initialize Index and Reset Terms", key="init_index_1"):
        st.session_state["llama_index"] = initialize_index(
            llm_name, model_temperature, api_key
        )
        st.session_state["all_terms"] = DEFAULT_TERMS
    if "llama_index" in st.session_state:
        st.markdown(
            "Either upload an image/screenshot of a document, or enter the text manually."
        )
        uploaded_file = st.file_uploader(
            "Upload an image/screenshot of a document:", type=["png", "jpg", "jpeg"]
        )
        document_text = st.text_area("Or enter raw text")
        if st.button("Extract Terms and Definitions") and (
            uploaded_file or document_text
        ):
            st.session_state["terms"] = {}
            terms_docs = {}
            with st.spinner("Extracting (images may be slow)..."):
                if document_text:
                    terms_docs.update(
                        extract_terms(
                            [Document(text=document_text)],
                            term_extract_str,
                            llm_name,
                            model_temperature,
                            api_key,
                        )
                    )
                if uploaded_file:
                    Image.open(uploaded_file).convert("RGB").save("temp.png")
                    img_reader = SimpleDirectoryReader(
                        input_files=["temp.png"], file_extractor=file_extractor
                    )
                    img_docs = img_reader.load_data()
                    os.remove("temp.png")
                    terms_docs.update(
                        extract_terms(
                            img_docs,
                            term_extract_str,
                            llm_name,
                            model_temperature,
                            api_key,
                        )
                    )
            st.session_state["terms"].update(terms_docs)
        if "terms" in st.session_state and st.session_state["terms"]:
            st.markdown("Extracted terms")
            st.json(st.session_state["terms"])
            if st.button("Insert terms?"):
                with st.spinner("Inserting terms"):
                    insert_terms(st.session_state["terms"])
                st.session_state["all_terms"].update(st.session_state["terms"])
                st.session_state["terms"] = {}
                st.experimental_rerun()
```
Here, we added the option to upload a file using Streamlit. Then the image is opened and saved to disk (this seems hacky but it keeps things simple). Then we pass the image path to the reader, extract the documents/text, and remove our temp image file.
Now that we have the documents, we can call `extract_terms()` the same as before.
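A fixed file name like `temp.png` keeps things simple, but it could collide if two sessions upload at the same moment. As a hedged alternative, here is a sketch that writes to a uniquely named temporary file and always cleans up afterwards. The helper `with_temp_file` and the `fake_loader` callback are hypothetical; in the app, the callback would be whatever reads the image from the given path.

```python
import os
import tempfile


def with_temp_file(data: bytes, suffix: str, load_fn):
    """Write bytes to a unique temp file, pass its path to load_fn, then clean up."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return load_fn(path)
    finally:
        # Runs even if load_fn raises, so no temp files are left behind.
        os.remove(path)


seen = {}


def fake_loader(path):
    """Stand-in for a reader; records the path and returns the file contents."""
    seen["path"] = path
    with open(path, "rb") as f:
        return f.read()


result = with_temp_file(b"fake image bytes", ".png", fake_loader)
```

This sketch writes raw bytes, so unlike the tutorial code it does not convert the image to RGB first.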
## Conclusion/TLDR
In this tutorial, we covered a ton of information, while solving some common issues and problems along the way:
- Using different indexes for different use cases (List vs. Vector index)
- Storing global state values with Streamlit's `session_state` concept
- Customizing internal prompts with Llama Index
- Reading text from images with Llama Index
The final version of this tutorial can be found [here](https://github.com/logan-markewich/llama_index_starter_pack) and a live hosted demo is available on [Huggingface Spaces](https://huggingface.co/spaces/llamaindex/llama_index_term_definition_demo).
@@ -0,0 +1,268 @@
# A Guide to Creating a Unified Query Framework over your Indexes
LlamaIndex offers a variety of different [use cases](/end_to_end_tutorials/use_cases.md).
For simple queries, we may want to use a single index data structure, such as a `VectorStoreIndex` for semantic search, or `ListIndex` for summarization.
For more complex queries, we may want to use a composable graph.
But how do we integrate indexes and graphs into our LLM application? Different indexes and graphs may be better suited for different types of queries that you may want to run.
In this guide, we show how you can unify the diverse use cases of different index/graph structures under a **single** query framework.
### Setup
In this example, we will analyze Wikipedia articles of different cities: Toronto, Seattle, Chicago, Boston, and Houston.
The below code snippet downloads the relevant data into files.
```python
from pathlib import Path

import requests

wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]

for title in wiki_titles:
    response = requests.get(
        'https://en.wikipedia.org/w/api.php',
        params={
            'action': 'query',
            'format': 'json',
            'titles': title,
            'prop': 'extracts',
            # 'exintro': True,
            'explaintext': True,
        }
    ).json()
    page = next(iter(response['query']['pages'].values()))
    wiki_text = page['extract']

    data_path = Path('data')
    if not data_path.exists():
        Path.mkdir(data_path)

    with open(data_path / f"{title}.txt", 'w') as fp:
        fp.write(wiki_text)
```
The next snippet loads all files into Document objects.
```python
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
city_docs[wiki_title] = SimpleDirectoryReader(input_files=[f"data/{wiki_title}.txt"]).load_data()
```
### Defining the Set of Indexes
We will now define a set of indexes and graphs over our data. You can think of each index/graph as a lightweight structure
that solves a distinct use case.
We will first define a vector index over the documents of each city.
```python
from llama_index import LLMPredictor, VectorStoreIndex, ServiceContext, StorageContext
from langchain.llms.openai import OpenAIChat

# set service context
llm_predictor_gpt4 = LLMPredictor(llm=OpenAIChat(temperature=0, model_name="gpt-4"))
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor_gpt4, chunk_size=1024
)

# Build city document index
vector_indices = {}
for wiki_title in wiki_titles:
    storage_context = StorageContext.from_defaults()
    # build vector index
    vector_indices[wiki_title] = VectorStoreIndex.from_documents(
        city_docs[wiki_title],
        service_context=service_context,
        storage_context=storage_context,
    )
    # set id for vector index
    vector_indices[wiki_title].index_struct.index_id = wiki_title
    # persist to disk
    storage_context.persist(persist_dir=f'./storage/{wiki_title}')
```
Querying a vector index lets us easily perform semantic search over a given city's documents.
```python
response = vector_indices["Toronto"].as_query_engine().query("What are the sports teams in Toronto?")
print(str(response))
```
Example response:
```text
The sports teams in Toronto are the Toronto Maple Leafs (NHL), Toronto Blue Jays (MLB), Toronto Raptors (NBA), Toronto Argonauts (CFL), Toronto FC (MLS), Toronto Rock (NLL), Toronto Wolfpack (RFL), and Toronto Rush (NARL).
```
### Defining a Graph for Compare/Contrast Queries
We will now define a composed graph in order to run **compare/contrast** queries (see [use cases doc](/use_cases/queries.md)).
This graph contains a keyword table composed on top of existing vector indexes.
To do this, we first want to set the "summary text" for each vector index.
```python
index_summaries = {}
for wiki_title in wiki_titles:
    # set summary text for city
    index_summaries[wiki_title] = (
        f"This content contains Wikipedia articles about {wiki_title}. "
        f"Use this index if you need to lookup specific facts about {wiki_title}.\n"
        "Do not use this index if you want to analyze multiple cities."
    )
```
Next, we compose a keyword table on top of these vector indexes, with these indexes and summaries, in order to build the graph.
```python
from llama_index import SimpleKeywordTableIndex
from llama_index.indices.composability import ComposableGraph

graph = ComposableGraph.from_indices(
    SimpleKeywordTableIndex,
    [index for _, index in vector_indices.items()],
    [summary for _, summary in index_summaries.items()],
    max_keywords_per_chunk=50
)

# get root index
root_index = graph.get_index(graph.index_struct.root_id, SimpleKeywordTableIndex)
# set id of root index
root_index.set_index_id("compare_contrast")
root_summary = (
    "This index contains Wikipedia articles about multiple cities. "
    "Use this index if you want to compare multiple cities. "
)
```
Querying this graph (with a query transform module) allows us to easily compare and contrast different cities.
An example is shown below.
```python
# define decompose_transform
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform

# reuse the GPT-4 predictor defined above
decompose_transform = DecomposeQueryTransform(
    llm_predictor_gpt4, verbose=True
)

# define custom query engines
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

custom_query_engines = {}
for index in vector_indices.values():
    query_engine = index.as_query_engine(service_context=service_context)
    query_engine = TransformQueryEngine(
        query_engine,
        query_transform=decompose_transform,
        transform_extra_info={'index_summary': index.index_struct.summary},
    )
    custom_query_engines[index.index_id] = query_engine
custom_query_engines[graph.root_id] = graph.root_index.as_query_engine(
    retriever_mode='simple',
    response_mode='tree_summarize',
    service_context=service_context,
)

# define the graph query engine (reused later as a router tool)
graph_query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)

# query the graph
query_str = (
    "Compare and contrast the arts and culture of Houston and Boston. "
)
response = graph_query_engine.query(query_str)
```
### Defining the Unified Query Interface
Now that we've defined the set of indexes/graphs, we want to build an **outer abstraction** layer that provides a unified query interface
to our data structures. This means that during query-time, we can query this outer abstraction layer and trust that the right index/graph
will be used for the job.
There are a few ways to do this, both within our framework as well as outside of it!
- Build a **router query engine** on top of your existing indexes/graphs
- Define each index/graph as a Tool within an agent framework (e.g. LangChain).
For the purposes of this tutorial, we follow the former approach. If you want to take a look at how the latter approach works,
take a look at [our example tutorial here](/guides/tutorials/building_a_chatbot.md).
Let's take a look at an example of building a router query engine to automatically "route" any query to the set of indexes/graphs that you have defined under the hood.
First, we define the query engines for the set of indexes/graph that we want to route our query to. We also give each a description (about what data it holds and what it's useful for) to help the router choose between them depending on the specific query.
```python
from llama_index.tools.query_engine import QueryEngineTool

query_engine_tools = []

# add vector index tools
for wiki_title in wiki_titles:
    index = vector_indices[wiki_title]
    summary = index_summaries[wiki_title]
    query_engine = index.as_query_engine(service_context=service_context)
    vector_tool = QueryEngineTool.from_defaults(query_engine, description=summary)
    query_engine_tools.append(vector_tool)

# add graph tool
graph_description = (
    "This tool contains Wikipedia articles about multiple cities. "
    "Use this tool if you want to compare multiple cities. "
)
graph_tool = QueryEngineTool.from_defaults(graph_query_engine, description=graph_description)
query_engine_tools.append(graph_tool)
```
Now, we can define the routing logic and overall router query engine.
Here, we use the `LLMSingleSelector`, which uses an LLM to choose an underlying query engine to route the query to.
```python
from llama_index.query_engine.router_query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
router_query_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(service_context=service_context),
query_engine_tools=query_engine_tools
)
```
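To make the routing idea concrete without calling an LLM, here is a toy sketch. Everything in it is a stand-in: the `Tool` class, `keyword_selector`, and `route` are hypothetical, and simple word overlap replaces the LLM-based choice that `LLMSingleSelector` makes from the tool descriptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    description: str
    run: Callable[[str], str]


def words(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Stand-in for an LLM selector: pick the tool whose description overlaps the query most."""
    scores = [len(words(query) & words(tool.description)) for tool in tools]
    return scores.index(max(scores))


def route(query: str, tools: List[Tool], selector=keyword_selector) -> str:
    """Select one tool for the query and delegate the query to it."""
    return tools[selector(query, tools)].run(query)


tools = [
    Tool("Facts about Toronto", lambda q: "toronto engine"),
    Tool("Compare multiple cities", lambda q: "graph engine"),
]
```

The descriptions do the heavy lifting here, just as in the real router: the clearer each description is about what its tool holds, the easier the selection becomes.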
### Querying our Unified Interface
The advantage of a unified query interface is that it can now handle different types of queries.
It can now handle queries about specific cities (by routing to the specific city vector index), and also compare/contrast different cities.
Let's take a look at a few examples!
**Asking a Compare/Contrast Question**
```python
# ask a compare/contrast question
response = router_query_engine.query(
"Compare and contrast the arts and culture of Houston and Boston.",
)
print(str(response))
```
**Asking Questions about specific Cities**
```python
response = router_query_engine.query("What are the sports teams in Toronto?")
print(str(response))
```
This "outer" abstraction is able to handle different queries by routing to the right underlying abstractions.
