mirror of
https://github.com/langchain-ai/rag-from-scratch.git
synced 2026-07-01 20:14:06 -04:00
Merge branch 'main' into patch-1
@@ -1,7 +1,8 @@
 # RAG From Scratch
 
-Retrieval augmented generation (RAG) comes is a general methodology for connecting LLMs with external data sources. These notebooks accompany a video series will build up an understanding of RAG from scratch, starting with the basics of indexing, retrieval, and generation. It will build up to more advanced techniques to address edge cases or challenges in RAG:
+LLMs are trained on a large but fixed corpus of data, limiting their ability to reason about private or recent information. Fine-tuning is one way to mitigate this, but is often [not well-suited for factual recall](https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts) and [can be costly](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise).
+Retrieval augmented generation (RAG) has emerged as a popular and powerful mechanism to expand an LLM's knowledge base, using documents retrieved from an external data source to ground the LLM generation via in-context learning.
+These notebooks accompany a [video playlist](https://youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x&feature=shared) that builds up an understanding of RAG from scratch, starting with the basics of indexing, retrieval, and generation.
 
 
-Video playlist:
-https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x
+[Video playlist](https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x)
@@ -14,7 +14,7 @@
 "\n",
 "\n",
 "\n",
-"## Enviornment\n",
+"## Environment\n",
 "\n",
 "`(1) Packages`"
 ]
@@ -470,7 +470,7 @@
 "1. Allows us to perform unstructured search over the `contents` and `title` of each document\n",
 "2. And to use range filtering on `view count`, `publication date`, and `length`.\n",
 "\n",
-"We want to convert natural langugae into structured search queries.\n",
+"We want to convert natural language into structured search queries.\n",
 "\n",
 "We can define a schema for structured search queries."
 ]

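Note on the hunk above: the notebook text describes a schema that combines free-text search over `contents` and `title` with range filters on view count, publication date, and length. A minimal sketch of what such a schema could look like is below. It uses only the standard library; the class name `VideoSearch`, its field names, and the `matches_filters` helper are hypothetical and are not taken from the notebook.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VideoSearch:
    """Hypothetical structured query: free-text search plus optional range filters."""

    content_search: str                      # unstructured search over document contents
    title_search: str                        # unstructured search over document titles
    min_view_count: Optional[int] = None     # inclusive lower bound
    max_view_count: Optional[int] = None     # exclusive upper bound
    earliest_publish_date: Optional[date] = None  # inclusive lower bound
    latest_publish_date: Optional[date] = None    # exclusive upper bound
    min_length_sec: Optional[int] = None     # inclusive lower bound
    max_length_sec: Optional[int] = None     # exclusive upper bound


def matches_filters(doc: dict, query: VideoSearch) -> bool:
    """Check only the range filters; the text search is left to a retriever."""
    checks = [
        (query.min_view_count, lambda lo: doc["view_count"] >= lo),
        (query.max_view_count, lambda hi: doc["view_count"] < hi),
        (query.earliest_publish_date, lambda lo: doc["publish_date"] >= lo),
        (query.latest_publish_date, lambda hi: doc["publish_date"] < hi),
        (query.min_length_sec, lambda lo: doc["length_sec"] >= lo),
        (query.max_length_sec, lambda hi: doc["length_sec"] < hi),
    ]
    # Filters left as None are treated as "no constraint".
    return all(test(bound) for bound, test in checks if bound is not None)


# Example: "short RAG videos published in 2023 or later"
query = VideoSearch(
    content_search="rag from scratch",
    title_search="rag",
    earliest_publish_date=date(2023, 1, 1),
    max_length_sec=300,
)
```

In a query-construction pipeline, an LLM would be asked to fill in an object of this shape from the user's natural-language question; the filter fields then translate directly into metadata filters on the index.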
@@ -16,13 +16,13 @@
 "\n",
 "## Preface: Chunking\n",
 "\n",
-"We don't explicity cover document chunking / splitting.\n",
+"We don't explicitly cover document chunking / splitting.\n",
 "\n",
 "For an excellent review of document chunking, see this video from Greg Kamradt:\n",
 "\n",
 "https://www.youtube.com/watch?v=8OJC21T2SL4\n",
 "\n",
-"## Enviornment\n",
+"## Environment\n",
 "\n",
 "`(1) Packages`"
 ]

@@ -20,7 +20,7 @@
 "id": "a6656c51-25c7-490b-b76c-a506fab8892b",
 "metadata": {},
 "source": [
-"## Enviornment\n",
+"## Environment\n",
 "\n",
 "`(1) Packages`"
 ]

@@ -18,7 +18,7 @@
 "\n",
 "\n",
 "\n",
-"## Enviornment\n",
+"## Environment\n",
 "\n",
 "`(1) Packages`"
 ]
@@ -227,7 +227,7 @@
 "id": "f5e0e35f-6861-4c5e-9301-04fd5408f8f8",
 "metadata": {},
 "source": [
-"[Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions) is reccomended (1 indicates identical) for OpenAI embeddings."
+"[Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions) is recommended (1 indicates identical) for OpenAI embeddings."
 ]
 },
 {

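Note on the hunk above: the cell says cosine similarity is recommended and that a score of 1 indicates identical direction. A small standard-library sketch of the computation follows; the function name `cosine_similarity` and the example vectors are illustrative and are not copied from the notebook.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score ~1 regardless of magnitude;
# perpendicular vectors score 0; opposite vectors score ~-1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

Because the score depends only on direction, two embeddings of different magnitude can still compare as identical, which is why it suits comparing a question embedding against document embeddings.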
@@ -16,7 +16,7 @@
 "\n",
 "\n",
 "\n",
-"## Enviornment\n",
+"## Environment\n",
 "\n",
 "`(1) Packages`"
 ]
@@ -818,7 +818,7 @@
 "source": [
 "from langchain.prompts import ChatPromptTemplate\n",
 "\n",
-"# HyDE document genration\n",
+"# HyDE document generation\n",
 "template = \"\"\"Please write a scientific paper passage to answer the question\n",
 "Question: {question}\n",
 "Passage:\"\"\"\n",
@@ -845,8 +845,8 @@
 "source": [
 "# Retrieve\n",
 "retrieval_chain = generate_docs_for_retrieval | retriever \n",
-"retireved_docs = retrieval_chain.invoke({\"question\":question})\n",
-"retireved_docs"
+"retrieved_docs = retrieval_chain.invoke({\"question\":question})\n",
+"retrieved_docs"
 ]
 },
 {
@@ -872,7 +872,7 @@
 " | StrOutputParser()\n",
 ")\n",
 "\n",
-"final_rag_chain.invoke({\"context\":retireved_docs,\"question\":question})"
+"final_rag_chain.invoke({\"context\":retrieved_docs,\"question\":question})"
 ]
 },
 {
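Note on the three hunks above: they touch the HyDE cells, where a hypothetical passage is generated for the question and that passage, rather than the question itself, is used for retrieval. The flow can be sketched without any external services as below. This is a toy under stated assumptions: `fake_llm` is a hypothetical stand-in for the LLM call, `embed` is a bag-of-words counter standing in for a real embedding model, and none of these names come from the notebook or from LangChain.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    question: str,
    generate_passage: Callable[[str], str],
    documents: List[str],
    k: int = 1,
) -> List[str]:
    """HyDE: embed a generated hypothetical passage, then rank real documents by it."""
    hypothetical = generate_passage(question)   # step 1: write a hypothetical answer
    query_vec = embed(hypothetical)             # step 2: embed the passage, not the question
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]                           # step 3: return the nearest real documents


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned passage."""
    return "task decomposition splits a complex task into smaller steps"


docs = [
    "agents use task decomposition to break a complex task into smaller steps",
    "vector stores index embeddings for similarity search",
]
top = hyde_retrieve("What is task decomposition?", fake_llm, docs)
```

The retrieved documents would then be passed as `context` to the final answer chain, which is the step the last hunk corrects.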