Compare commits

6 Commits

Author SHA1 Message Date
Emanuel Ferreira 14d5a1458c fix: remove storageContext 2024-02-26 12:02:19 -03:00
Emanuel Ferreira dbc853bcc5 chore: fix paths and docs (#569) 2024-02-26 10:37:08 -03:00
Emanuel Ferreira c8396c5a3c feat: add base evaluator and correctness evaluator (#559) 2024-02-26 09:38:56 -03:00
Thuc Pham 65af8d3a26 fix: missing dependency for local development (#566) 2024-02-26 15:54:47 +07:00
Marcus Schiesser 329b6ec958 fix: SummaryIndex and VectorStoreIndex must be able to share storage context (#567) 2024-02-26 15:52:33 +07:00
Graden Rea 09bf27abd7 feat: Add Groq LLM integration (#561) 2024-02-26 13:46:27 +07:00
33 changed files with 1226 additions and 4 deletions
@@ -0,0 +1,5 @@
---
"llamaindex": patch
---
feat: add base evaluator and correctness evaluator
@@ -0,0 +1,5 @@
---
"llamaindex": patch
---
feat: add base evaluator and correctness evaluator
@@ -0,0 +1,6 @@
---
"llamaindex": patch
"docs": patch
---
Add Groq LLM to LlamaIndex
@@ -125,6 +125,7 @@ module.exports = nextConfig;
- OpenAI GPT-3.5-turbo and GPT-4
- Anthropic Claude Instant and Claude 2
- Groq LLMs
- Llama2 Chat LLMs (70B, 13B, and 7B parameters)
- MistralAI Chat LLMs
- Fireworks Chat LLMs
@@ -0,0 +1,2 @@
label: "Evaluating"
position: 3
@@ -0,0 +1,32 @@
# Evaluating
## Concept
Evaluation and benchmarking are crucial concepts in LLM development. To improve the performance of an LLM app (RAG, agents), you must have a way to measure it.
LlamaIndex offers key modules to measure the quality of generated results. We also offer key modules to measure retrieval quality.
- **Response Evaluation**: Does the response match the retrieved context? Does it also match the query? Does it match the reference answer or guidelines?
- **Retrieval Evaluation**: Are the retrieved sources relevant to the query?
## Response Evaluation
Evaluation of generated results can be difficult, since unlike traditional machine learning the predicted result is not a single number, and it can be hard to define quantitative metrics for this problem.
LlamaIndex offers LLM-based evaluation modules to measure the quality of results. This uses a “gold” LLM (e.g. GPT-4) to decide whether the predicted answer is correct in a variety of ways.
Note that many of these evaluation modules do not require ground-truth labels. Evaluation can be done with some combination of the query, context, and response, combined with LLM calls.
These evaluation modules are in the following forms:
- **Correctness**: Whether the generated answer matches that of the reference answer given the query (requires labels).
- **Faithfulness**: Evaluates if the answer is faithful to the retrieved contexts (in other words, whether there is hallucination).
- **Relevancy**: Evaluates if the response from a query engine matches any source nodes.
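Each of these evaluators resolves to an `EvaluationResult` (defined in `types.ts` in this change) that carries `passing`, `score`, and `feedback`. As a minimal sketch of how several results might be aggregated, the snippet below uses a trimmed local copy of that type; `passRate` is a hypothetical helper and is not part of the library.

```ts
// Trimmed local copy of the EvaluationResult shape added in this change.
type EvaluationResult = {
  query?: string;
  response: string | null;
  score: number;
  passing: boolean;
  feedback: string;
};

// Hypothetical helper (not part of llamaindex): fraction of results that passed.
function passRate(results: EvaluationResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.passing).length;
  return passed / results.length;
}

const sampleResults: EvaluationResult[] = [
  { response: "a", score: 1, passing: true, feedback: "YES" },
  { response: "b", score: 0, passing: false, feedback: "NO" },
];
console.log(passRate(sampleResults)); // 0.5
```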
## Usage
- [Correctness Evaluator](./modules/correctness.md)
- [Faithfulness Evaluator](./modules/faithfulness.md)
- [Relevancy Evaluator](./modules/relevancy.md)
@@ -0,0 +1 @@
label: "Modules"
@@ -0,0 +1,72 @@
# Correctness Evaluator
Correctness evaluates the relevance and correctness of a generated answer against a reference answer.
This is useful for measuring whether the response was correct. The evaluator returns a score between 1 and 5, where 5 means the response is fully correct; by default, a score of 4 or higher counts as passing.
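The judge LLM is asked to reply with the score on one line and its reasoning on a separate line. The sketch below shows one way such a reply could be turned into a score and a pass flag. `parseScoreAndReasoning` and `isPassing` are hypothetical names for illustration; the evaluator's constructor accepts a `parserFunction` of type `(str: string) => [number, string]`, so a function of this shape could presumably be supplied there.

```ts
// Hypothetical parser: the first line is the score, everything after it is reasoning.
function parseScoreAndReasoning(raw: string): [number, string] {
  const lines = raw.trim().split("\n");
  const score = parseFloat(lines[0]);
  const reasoning = lines.slice(1).join("\n").trim();
  return [score, reasoning];
}

// Uses the same default threshold as this change: 4.0 or higher passes.
function isPassing(score: number, threshold = 4.0): boolean {
  return score >= threshold;
}

const [exampleScore, exampleReasoning] = parseScoreAndReasoning(
  "2.5\nThe answer confuses gravity with magnetism.",
);
console.log(exampleScore, isPassing(exampleScore), exampleReasoning);
```

An unparseable first line yields `NaN`, which never meets the threshold, so such a reply is treated as not passing in this sketch.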
## Usage
Firstly, you need to install the package:
```bash
pnpm i llamaindex
```
Set the OpenAI API key:
```bash
export OPENAI_API_KEY=your-api-key
```
Import the required modules:
```ts
import {
CorrectnessEvaluator,
OpenAI,
serviceContextFromDefaults,
} from "llamaindex";
```
Let's set up gpt-4 for better results:
```ts
const llm = new OpenAI({
model: "gpt-4",
});
const ctx = serviceContextFromDefaults({
llm,
});
```
```ts
const query =
"Can you explain the theory of relativity proposed by Albert Einstein in detail?";
const response = ` Certainly! Albert Einstein's theory of relativity consists of two main components: special relativity and general relativity. Special relativity, published in 1905, introduced the concept that the laws of physics are the same for all non-accelerating observers and that the speed of light in a vacuum is a constant, regardless of the motion of the source or observer. It also gave rise to the famous equation E=mc², which relates energy (E) and mass (m).
However, general relativity, published in 1915, extended these ideas to include the effects of magnetism. According to general relativity, gravity is not a force between masses but rather the result of the warping of space and time by magnetic fields generated by massive objects. Massive objects, such as planets and stars, create magnetic fields that cause a curvature in spacetime, and smaller objects follow curved paths in response to this magnetic curvature. This concept is often illustrated using the analogy of a heavy ball placed on a rubber sheet with magnets underneath, causing it to create a depression that other objects (representing smaller masses) naturally move towards due to magnetic attraction.
`;
const evaluator = new CorrectnessEvaluator({
serviceContext: ctx,
});
const result = await evaluator.evaluate({
query,
response,
});
console.log(
`the response is ${result.passing ? "correct" : "not correct"} with a score of ${result.score}`,
);
```
```bash
the response is not correct with a score of 2.5
```
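The example above supplies no reference answer. In the evaluator source in this change, a missing `reference` is replaced with the placeholder `(NO REFERENCE ANSWER SUPPLIED)` before the user message is built. The following is a sketch of that behaviour only: `buildUserPrompt` is a local stand-in for `defaultUserPrompt`, and its exact whitespace layout is an assumption.

```ts
// Local stand-in for the user message layout used by the correctness evaluator.
function buildUserPrompt(args: {
  query: string;
  generatedAnswer: string;
  referenceAnswer?: string;
}): string {
  // Fall back to the placeholder when no reference answer is given.
  const reference = args.referenceAnswer || "(NO REFERENCE ANSWER SUPPLIED)";
  return [
    "## User Query",
    args.query,
    "## Reference Answer",
    reference,
    "## Generated Answer",
    args.generatedAnswer,
  ].join("\n");
}

console.log(
  buildUserPrompt({
    query: "What is E=mc²?",
    generatedAnswer: "It relates energy and mass.",
  }),
);
```

When a reference answer is available, passing it gives the judge something concrete to compare against instead of relying on its own knowledge.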
@@ -0,0 +1,84 @@
# Faithfulness Evaluator
Faithfulness is a measure of whether the generated answer is faithful to the retrieved contexts. In other words, it measures whether there is any hallucination in the generated answer.
This uses the FaithfulnessEvaluator module to measure if the response from a query engine matches any source nodes.
This is useful for detecting hallucinated responses. The evaluator returns a score of either 0 or 1, where 1 means the response is faithful to the retrieved contexts.
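In the source added by this change, the judge's raw text is reduced to a verdict by checking whether it contains "yes" (case-insensitive). A minimal sketch of that step follows; `toVerdict` and the `Verdict` type are hypothetical names used only for illustration.

```ts
type Verdict = { passing: boolean; score: number; feedback: string };

// Same check as in this change: any "yes" substring (case-insensitive) passes.
function toVerdict(rawResponseTxt: string): Verdict {
  const passing = rawResponseTxt.toLowerCase().includes("yes");
  return { passing, score: passing ? 1.0 : 0.0, feedback: rawResponseTxt };
}

console.log(toVerdict("YES"));
console.log(toVerdict("NO"));
```

Because this is a substring match, a reply such as "Yesterday's data says no" would also count as passing, so it is worth inspecting `feedback` when a verdict looks surprising.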
## Usage
Firstly, you need to install the package:
```bash
pnpm i llamaindex
```
Set the OpenAI API key:
```bash
export OPENAI_API_KEY=your-api-key
```
Import the required modules:
```ts
import {
Document,
FaithfulnessEvaluator,
OpenAI,
VectorStoreIndex,
serviceContextFromDefaults,
} from "llamaindex";
```
Let's set up gpt-4 for better results:
```ts
const llm = new OpenAI({
model: "gpt-4",
});
const ctx = serviceContextFromDefaults({
llm,
});
```
Now, let's create a vector index from the documents and a query engine from that index. We can then evaluate the response that the query engine returns for a query:
```ts
const documents = [
new Document({
text: `The city came under British control in 1664 and was renamed New York after King Charles II of England granted the lands to his brother, the Duke of York. The city was regained by the Dutch in July 1673 and was renamed New Orange for one year and three months; the city has been continuously named New York since November 1674. New York City was the capital of the United States from 1785 until 1790, and has been the largest U.S. city since 1790. The Statue of Liberty greeted millions of immigrants as they came to the U.S. by ship in the late 19th and early 20th centuries, and is a symbol of the U.S. and its ideals of liberty and peace. In the 21st century, New York City has emerged as a global node of creativity, entrepreneurship, and as a symbol of freedom and cultural diversity. The New York Times has won the most Pulitzer Prizes for journalism and remains the U.S. media's "newspaper of record". In 2019, New York City was voted the greatest city in the world in a survey of over 30,000 p... Pass`,
}),
];
const vectorIndex = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = vectorIndex.asQueryEngine();
```
Now, let's evaluate the response:
```ts
const query = "How did New York City get its name?";
const evaluator = new FaithfulnessEvaluator({
serviceContext: ctx,
});
const response = await queryEngine.query({
query,
});
const result = await evaluator.evaluateResponse({
query,
response,
});
console.log(`the response is ${result.passing ? "faithful" : "not faithful"}`);
```
```bash
the response is faithful
```
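`evaluateResponse` is a convenience wrapper: in this change it pulls the answer text and the content of each source node out of the query engine's response and forwards them to `evaluate`. The sketch below mirrors that extraction with minimal stand-in interfaces. `RetrievedNode`, `EngineResponse`, and `toEvaluatorInput` are hypothetical names, and the metadata-mode argument that the real `getContent` call receives is omitted.

```ts
// Minimal stand-ins for the library types; only the members used here.
interface RetrievedNode {
  getContent(): string;
}
interface EngineResponse {
  response: string;
  sourceNodes?: RetrievedNode[];
}

// Collect the answer text and the retrieved contexts into one evaluator input.
function toEvaluatorInput(query: string, res: EngineResponse) {
  const contexts = (res.sourceNodes ?? []).map((node) => node.getContent());
  return { query, response: res.response, contexts };
}

const fakeResponse: EngineResponse = {
  response: "It was renamed after the Duke of York.",
  sourceNodes: [{ getContent: () => "The city came under British control in 1664." }],
};
console.log(toEvaluatorInput("How did New York City get its name?", fakeResponse));
```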
@@ -0,0 +1,72 @@
# Relevancy Evaluator
Relevancy measures whether the response from a query engine matches any source nodes.
It is useful for measuring whether the response was relevant to the query. The evaluator returns a score of either 0 or 1, where 1 means the response is relevant to the query.
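Under the hood, the evaluator in this change folds the question and the generated response into a single string and asks the judge whether that string is in line with the retrieved context. A sketch of the prompt assembly is shown below; `relevancyEvalPrompt` is a local copy of the default prompt text, and `buildRelevancyPrompt` is a hypothetical helper.

```ts
// Local copy of the default relevancy prompt text from this change.
function relevancyEvalPrompt(query: string, context: string): string {
  return `Your task is to evaluate if the response for the query is in line with the context information provided.
You have two options to answer. Either YES/ NO.
Answer - YES, if the response for the query is in line with context information otherwise NO.
Query and Response: ${query}
Context: ${context}
Answer: `;
}

// Combine the question and the generated response, then fill the prompt.
function buildRelevancyPrompt(
  query: string,
  response: string,
  context: string,
): string {
  const queryResponse = `Question: ${query}\nResponse: ${response}`;
  return relevancyEvalPrompt(queryResponse, context);
}

console.log(
  buildRelevancyPrompt(
    "How did New York City get its name?",
    "It was renamed after the Duke of York.",
    "The city came under British control in 1664 and was renamed New York.",
  ),
);
```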
## Usage
Firstly, you need to install the package:
```bash
pnpm i llamaindex
```
Set the OpenAI API key:
```bash
export OPENAI_API_KEY=your-api-key
```
Import the required modules:
```ts
import {
Document,
OpenAI,
RelevancyEvaluator,
VectorStoreIndex,
serviceContextFromDefaults,
} from "llamaindex";
```
Let's set up gpt-4 for better results:
```ts
const llm = new OpenAI({
model: "gpt-4",
});
const ctx = serviceContextFromDefaults({
llm,
});
```
Now, let's create a vector index from the documents and a query engine from that index. We can then evaluate the response that the query engine returns for a query:
```ts
const documents = [
new Document({
text: `The city came under British control in 1664 and was renamed New York after King Charles II of England granted the lands to his brother, the Duke of York. The city was regained by the Dutch in July 1673 and was renamed New Orange for one year and three months; the city has been continuously named New York since November 1674. New York City was the capital of the United States from 1785 until 1790, and has been the largest U.S. city since 1790. The Statue of Liberty greeted millions of immigrants as they came to the U.S. by ship in the late 19th and early 20th centuries, and is a symbol of the U.S. and its ideals of liberty and peace. In the 21st century, New York City has emerged as a global node of creativity, entrepreneurship, and as a symbol of freedom and cultural diversity. The New York Times has won the most Pulitzer Prizes for journalism and remains the U.S. media's "newspaper of record". In 2019, New York City was voted the greatest city in the world in a survey of over 30,000 p... Pass`,
}),
];
const vectorIndex = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = vectorIndex.asQueryEngine();
const query = "How did New York City get its name?";
const evaluator = new RelevancyEvaluator({
serviceContext: ctx,
});
const response = await queryEngine.query({
query,
});
const result = await evaluator.evaluateResponse({
query,
response: response,
});
console.log(`the response is ${result.passing ? "relevant" : "not relevant"}`);
```
```bash
the response is relevant
```
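Both the relevancy and faithfulness evaluators also install a refine prompt with the rule "If the existing answer was already YES, still answer YES." Assuming the response synthesizer visits context chunks one at a time (an assumption; the synthesizer itself is not part of this change), the net effect resembles a logical OR over per-chunk verdicts. The sketch below illustrates that idea with a toy judge in place of an LLM call; `Judge`, `refineVerdict`, and `substringJudge` are hypothetical names.

```ts
// A judge decides whether one context chunk supports the information.
type Judge = (information: string, context: string) => boolean;

// Walk the chunks in order; once any chunk yields YES, the answer stays YES.
function refineVerdict(
  information: string,
  chunks: string[],
  judge: Judge,
): boolean {
  let answer = false;
  for (const chunk of chunks) {
    const supported = judge(information, chunk);
    answer = answer || supported;
  }
  return answer;
}

// Toy judge for illustration only: plain substring match instead of an LLM call.
const substringJudge: Judge = (info, ctx) => ctx.includes(info);

console.log(
  refineVerdict("New York", ["unrelated text", "renamed New York"], substringJudge),
);
```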
@@ -0,0 +1,56 @@
import CodeBlock from "@theme/CodeBlock";
import CodeSource from "!raw-loader!../../../../../../examples/groq.ts";
# Groq
## Usage
First, create an API key at the [Groq Console](https://console.groq.com/keys). Then save it in your environment:
```bash
export GROQ_API_KEY=<your-api-key>
```
Then initialize the Groq module:
```ts
import {
Document,
Groq,
VectorStoreIndex,
serviceContextFromDefaults,
} from "llamaindex";
const groq = new Groq({
// If you do not wish to set your API key in the environment, you may
// configure your API key when you initialize the Groq class.
// apiKey: "<your-api-key>",
});
const serviceContext = serviceContextFromDefaults({ llm: groq });
```
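The two ways of supplying the key (environment variable or constructor option) can be combined in application code. The following is a minimal sketch under that assumption; `resolveGroqApiKey` is a hypothetical helper, and how the library resolves the key internally is not shown in this change.

```ts
// Hypothetical helper: prefer an explicit key, else fall back to GROQ_API_KEY.
function resolveGroqApiKey(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const key = explicit ?? env["GROQ_API_KEY"];
  if (!key) {
    throw new Error("Set GROQ_API_KEY or pass apiKey explicitly");
  }
  return key;
}

// In Node, the second argument would typically be process.env, e.g.
//   new Groq({ apiKey: resolveGroqApiKey(undefined, process.env) })
console.log(resolveGroqApiKey(undefined, { GROQ_API_KEY: "from-env" }));
```

Failing early with a clear message avoids a less obvious authentication error at query time.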
## Load and index documents
For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index.
```ts
const document = new Document({ text: essay, id_: "essay" });
const index = await VectorStoreIndex.fromDocuments([document], {
serviceContext,
});
```
## Query
```ts
const queryEngine = index.asQueryEngine();
const query = "What is the meaning of life?";
const results = await queryEngine.query({
query,
});
```
## Full Example
<CodeBlock language="ts" showLineNumbers>
{CodeSource}
</CodeBlock>
@@ -16,6 +16,7 @@
},
"dependencies": {
"@docusaurus/core": "^3.1.1",
"@llamaindex/env": "workspace:*",
"@docusaurus/remark-plugin-npm2yarn": "^3.1.1",
"@mdx-js/react": "^3.0.0",
"clsx": "^2.1.0",
@@ -128,7 +128,6 @@ async function main() {
VectorStoreIndex,
{
serviceContext,
storageContext,
},
);
@@ -0,0 +1,36 @@
import {
CorrectnessEvaluator,
OpenAI,
serviceContextFromDefaults,
} from "llamaindex";
async function main() {
const llm = new OpenAI({
model: "gpt-4",
});
const ctx = serviceContextFromDefaults({
llm,
});
const evaluator = new CorrectnessEvaluator({
serviceContext: ctx,
});
const query =
"Can you explain the theory of relativity proposed by Albert Einstein in detail?";
const response = `
Certainly! Albert Einstein's theory of relativity consists of two main components: special relativity and general relativity. Special relativity, published in 1905, introduced the concept that the laws of physics are the same for all non-accelerating observers and that the speed of light in a vacuum is a constant, regardless of the motion of the source or observer. It also gave rise to the famous equation E=mc², which relates energy (E) and mass (m).
However, general relativity, published in 1915, extended these ideas to include the effects of magnetism. According to general relativity, gravity is not a force between masses but rather the result of the warping of space and time by magnetic fields generated by massive objects. Massive objects, such as planets and stars, create magnetic fields that cause a curvature in spacetime, and smaller objects follow curved paths in response to this magnetic curvature. This concept is often illustrated using the analogy of a heavy ball placed on a rubber sheet with magnets underneath, causing it to create a depression that other objects (representing smaller masses) naturally move towards due to magnetic attraction.
`;
const result = await evaluator.evaluate({
query: query,
response: response,
});
console.log(result);
}
main();
@@ -0,0 +1,46 @@
import {
Document,
FaithfulnessEvaluator,
OpenAI,
VectorStoreIndex,
serviceContextFromDefaults,
} from "llamaindex";
async function main() {
const llm = new OpenAI({
model: "gpt-4",
});
const ctx = serviceContextFromDefaults({
llm,
});
const evaluator = new FaithfulnessEvaluator({
serviceContext: ctx,
});
const documents = [
new Document({
text: `The city came under British control in 1664 and was renamed New York after King Charles II of England granted the lands to his brother, the Duke of York. The city was regained by the Dutch in July 1673 and was renamed New Orange for one year and three months; the city has been continuously named New York since November 1674. New York City was the capital of the United States from 1785 until 1790, and has been the largest U.S. city since 1790. The Statue of Liberty greeted millions of immigrants as they came to the U.S. by ship in the late 19th and early 20th centuries, and is a symbol of the U.S. and its ideals of liberty and peace. In the 21st century, New York City has emerged as a global node of creativity, entrepreneurship, and as a symbol of freedom and cultural diversity. The New York Times has won the most Pulitzer Prizes for journalism and remains the U.S. media's "newspaper of record". In 2019, New York City was voted the greatest city in the world in a survey of over 30,000 p... Pass`,
}),
];
const vectorIndex = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = vectorIndex.asQueryEngine();
const query = "How did New York City get its name?";
const response = await queryEngine.query({
query,
});
const result = await evaluator.evaluateResponse({
query,
response,
});
console.log(result);
}
main();
@@ -0,0 +1,46 @@
import {
Document,
OpenAI,
RelevancyEvaluator,
VectorStoreIndex,
serviceContextFromDefaults,
} from "llamaindex";
async function main() {
const llm = new OpenAI({
model: "gpt-4",
});
const ctx = serviceContextFromDefaults({
llm,
});
const evaluator = new RelevancyEvaluator({
serviceContext: ctx,
});
const documents = [
new Document({
text: `The city came under British control in 1664 and was renamed New York after King Charles II of England granted the lands to his brother, the Duke of York. The city was regained by the Dutch in July 1673 and was renamed New Orange for one year and three months; the city has been continuously named New York since November 1674. New York City was the capital of the United States from 1785 until 1790, and has been the largest U.S. city since 1790. The Statue of Liberty greeted millions of immigrants as they came to the U.S. by ship in the late 19th and early 20th centuries, and is a symbol of the U.S. and its ideals of liberty and peace. In the 21st century, New York City has emerged as a global node of creativity, entrepreneurship, and as a symbol of freedom and cultural diversity. The New York Times has won the most Pulitzer Prizes for journalism and remains the U.S. media's "newspaper of record". In 2019, New York City was voted the greatest city in the world in a survey of over 30,000 p... Pass`,
}),
];
const vectorIndex = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = vectorIndex.asQueryEngine();
const query = "How did New York City get its name?";
const response = await queryEngine.query({
query,
});
const result = await evaluator.evaluateResponse({
query,
response: response,
});
console.log(result);
}
main();
@@ -0,0 +1,48 @@
import fs from "node:fs/promises";
import {
Document,
Groq,
VectorStoreIndex,
serviceContextFromDefaults,
} from "llamaindex";
async function main() {
// Create an instance of the LLM
const groq = new Groq({
apiKey: process.env.GROQ_API_KEY,
});
// Create a service context
const serviceContext = serviceContextFromDefaults({ llm: groq });
// Load essay from abramov.txt in Node
const path = "node_modules/llamaindex/examples/abramov.txt";
const essay = await fs.readFile(path, "utf-8");
const document = new Document({ text: essay, id_: "essay" });
// Load and index documents
const index = await VectorStoreIndex.fromDocuments([document], {
serviceContext,
});
// get retriever
const retriever = index.asRetriever();
// Create a query engine
const queryEngine = index.asQueryEngine({
retriever,
});
const query = "What is the meaning of life?";
// Query
const response = await queryEngine.query({
query,
});
// Log the response
console.log(response.response);
}
await main();
@@ -0,0 +1,123 @@
import { MetadataMode } from "../Node.js";
import type { ServiceContext } from "../ServiceContext.js";
import { serviceContextFromDefaults } from "../ServiceContext.js";
import type { ChatMessage } from "../llm/types.js";
import { PromptMixin } from "../prompts/Mixin.js";
import type { CorrectnessSystemPrompt } from "./prompts.js";
import {
defaultCorrectnessSystemPrompt,
defaultUserPrompt,
} from "./prompts.js";
import type {
BaseEvaluator,
EvaluationResult,
EvaluatorParams,
EvaluatorResponseParams,
} from "./types.js";
import { defaultEvaluationParser } from "./utils.js";
type CorrectnessParams = {
serviceContext?: ServiceContext;
scoreThreshold?: number;
parserFunction?: (str: string) => [number, string];
};
/** Correctness Evaluator */
export class CorrectnessEvaluator extends PromptMixin implements BaseEvaluator {
private serviceContext: ServiceContext;
private scoreThreshold: number;
private parserFunction: (str: string) => [number, string];
private correctnessPrompt: CorrectnessSystemPrompt =
defaultCorrectnessSystemPrompt;
constructor(params: CorrectnessParams) {
super();
this.serviceContext = params.serviceContext || serviceContextFromDefaults();
this.correctnessPrompt = defaultCorrectnessSystemPrompt;
this.scoreThreshold = params.scoreThreshold || 4.0;
this.parserFunction = params.parserFunction || defaultEvaluationParser;
}
_updatePrompts(prompts: {
correctnessPrompt: CorrectnessSystemPrompt;
}): void {
if ("correctnessPrompt" in prompts) {
this.correctnessPrompt = prompts["correctnessPrompt"];
}
}
/**
*
* @param query Query to evaluate
* @param response Response to evaluate
* @param contexts Array of contexts
* @param reference Reference response
*/
async evaluate({
query,
response,
contexts,
reference,
}: EvaluatorParams): Promise<EvaluationResult> {
if (query === null || response === null) {
throw new Error("query and response must be provided");
}
const messages: ChatMessage[] = [
{
role: "system",
content: this.correctnessPrompt(),
},
{
role: "user",
content: defaultUserPrompt({
query,
generatedAnswer: response,
referenceAnswer: reference || "(NO REFERENCE ANSWER SUPPLIED)",
}),
},
];
const evalResponse = await this.serviceContext.llm.chat({
messages,
});
const [score, reasoning] = this.parserFunction(
evalResponse.message.content,
);
return {
query: query,
response: response,
passing: score >= this.scoreThreshold || score === null,
score: score,
feedback: reasoning,
};
}
/**
* @param query Query to evaluate
* @param response Response to evaluate
*/
async evaluateResponse({
query,
response,
}: EvaluatorResponseParams): Promise<EvaluationResult> {
const responseStr = response?.response;
const contexts = [];
if (response) {
for (const node of response.sourceNodes || []) {
contexts.push(node.getContent(MetadataMode.ALL));
}
}
return this.evaluate({
query,
response: responseStr,
contexts,
});
}
}
@@ -0,0 +1,151 @@
import { Document, MetadataMode } from "../Node.js";
import type { ServiceContext } from "../ServiceContext.js";
import { serviceContextFromDefaults } from "../ServiceContext.js";
import { SummaryIndex } from "../indices/summary/index.js";
import { PromptMixin } from "../prompts/Mixin.js";
import type {
FaithfulnessRefinePrompt,
FaithfulnessTextQAPrompt,
} from "./prompts.js";
import {
defaultFaithfulnessRefinePrompt,
defaultFaithfulnessTextQaPrompt,
} from "./prompts.js";
import type {
BaseEvaluator,
EvaluationResult,
EvaluatorParams,
EvaluatorResponseParams,
} from "./types.js";
export class FaithfulnessEvaluator
extends PromptMixin
implements BaseEvaluator
{
private serviceContext: ServiceContext;
private raiseError: boolean;
private evalTemplate: FaithfulnessTextQAPrompt;
private refineTemplate: FaithfulnessRefinePrompt;
constructor(params: {
serviceContext?: ServiceContext;
raiseError?: boolean;
faithfulnessSystemPrompt?: FaithfulnessTextQAPrompt;
faithFulnessRefinePrompt?: FaithfulnessRefinePrompt;
}) {
super();
this.serviceContext = params.serviceContext || serviceContextFromDefaults();
this.raiseError = params.raiseError || false;
this.evalTemplate =
params.faithfulnessSystemPrompt || defaultFaithfulnessTextQaPrompt;
this.refineTemplate =
params.faithFulnessRefinePrompt || defaultFaithfulnessRefinePrompt;
}
protected _getPrompts(): { [x: string]: any } {
return {
faithfulnessSystemPrompt: this.evalTemplate,
faithFulnessRefinePrompt: this.refineTemplate,
};
}
protected _updatePrompts(promptsDict: {
faithfulnessSystemPrompt: FaithfulnessTextQAPrompt;
faithFulnessRefinePrompt: FaithfulnessRefinePrompt;
}): void {
if (promptsDict.faithfulnessSystemPrompt) {
this.evalTemplate = promptsDict.faithfulnessSystemPrompt;
}
if (promptsDict.faithFulnessRefinePrompt) {
this.refineTemplate = promptsDict.faithFulnessRefinePrompt;
}
}
/**
* @param query Query to evaluate
* @param response Response to evaluate
* @param contexts Array of contexts
* @param reference Reference response
* @param sleepTimeInSeconds Sleep time in seconds
*/
async evaluate({
query,
response,
contexts = [],
reference,
sleepTimeInSeconds = 0,
}: EvaluatorParams): Promise<EvaluationResult> {
if (query === null || response === null) {
throw new Error("query and response must be provided");
}
await new Promise((resolve) =>
setTimeout(resolve, sleepTimeInSeconds * 1000),
);
const docs = contexts?.map((context) => new Document({ text: context }));
const index = await SummaryIndex.fromDocuments(docs, {
serviceContext: this.serviceContext,
});
const queryEngine = index.asQueryEngine();
queryEngine.updatePrompts({
"responseSynthesizer:textQATemplate": this.evalTemplate,
"responseSynthesizer:refineTemplate": this.refineTemplate,
});
const responseObj = await queryEngine.query({
query: response,
});
const rawResponseTxt = responseObj.toString();
let passing: boolean;
if (rawResponseTxt.toLowerCase().includes("yes")) {
passing = true;
} else {
passing = false;
if (this.raiseError) {
throw new Error("The response is invalid");
}
}
return {
query,
contexts,
response,
passing,
score: passing ? 1.0 : 0.0,
feedback: rawResponseTxt,
};
}
/**
* @param query Query to evaluate
* @param response Response to evaluate
*/
async evaluateResponse({
query,
response,
}: EvaluatorResponseParams): Promise<EvaluationResult> {
const responseStr = response?.response;
const contexts = [];
if (response) {
for (const node of response.sourceNodes || []) {
contexts.push(node.getContent(MetadataMode.ALL));
}
}
return this.evaluate({
query,
response: responseStr,
contexts,
});
}
}
@@ -0,0 +1,139 @@
import { Document, MetadataMode } from "../Node.js";
import type { ServiceContext } from "../ServiceContext.js";
import { serviceContextFromDefaults } from "../ServiceContext.js";
import { SummaryIndex } from "../indices/summary/index.js";
import { PromptMixin } from "../prompts/Mixin.js";
import type { RelevancyEvalPrompt, RelevancyRefinePrompt } from "./prompts.js";
import {
defaultRelevancyEvalPrompt,
defaultRelevancyRefinePrompt,
} from "./prompts.js";
import type {
BaseEvaluator,
EvaluationResult,
EvaluatorParams,
EvaluatorResponseParams,
} from "./types.js";
type RelevancyParams = {
serviceContext?: ServiceContext;
raiseError?: boolean;
evalTemplate?: RelevancyEvalPrompt;
refineTemplate?: RelevancyRefinePrompt;
};
export class RelevancyEvaluator extends PromptMixin implements BaseEvaluator {
private serviceContext: ServiceContext;
private raiseError: boolean;
private evalTemplate: RelevancyEvalPrompt;
private refineTemplate: RelevancyRefinePrompt;
constructor(params: RelevancyParams) {
super();
this.serviceContext = params.serviceContext ?? serviceContextFromDefaults();
this.raiseError = params.raiseError ?? false;
this.evalTemplate = params.evalTemplate ?? defaultRelevancyEvalPrompt;
this.refineTemplate = params.refineTemplate ?? defaultRelevancyRefinePrompt;
}
_getPrompts() {
return {
evalTemplate: this.evalTemplate,
refineTemplate: this.refineTemplate,
};
}
_updatePrompts(prompts: {
evalTemplate: RelevancyEvalPrompt;
refineTemplate: RelevancyRefinePrompt;
}): void {
if ("evalTemplate" in prompts) {
this.evalTemplate = prompts["evalTemplate"];
}
if ("refineTemplate" in prompts) {
this.refineTemplate = prompts["refineTemplate"];
}
}
async evaluate({
query,
response,
contexts = [],
sleepTimeInSeconds = 0,
}: EvaluatorParams): Promise<EvaluationResult> {
if (query === null || response === null) {
throw new Error("query and response must be provided");
}
await new Promise((resolve) =>
setTimeout(resolve, sleepTimeInSeconds * 1000),
);
const docs = contexts?.map((context) => new Document({ text: context }));
const index = await SummaryIndex.fromDocuments(docs, {
serviceContext: this.serviceContext,
});
const queryResponse = `Question: ${query}\nResponse: ${response}`;
const queryEngine = index.asQueryEngine();
queryEngine.updatePrompts({
"responseSynthesizer:textQATemplate": this.evalTemplate,
"responseSynthesizer:refineTemplate": this.refineTemplate,
});
const responseObj = await queryEngine.query({
query: queryResponse,
});
const rawResponseTxt = responseObj.toString();
let passing: boolean;
if (rawResponseTxt.toLowerCase().includes("yes")) {
passing = true;
} else {
passing = false;
if (this.raiseError) {
throw new Error("The response is invalid");
}
}
return {
query,
contexts,
response,
passing,
score: passing ? 1.0 : 0.0,
feedback: rawResponseTxt,
};
}
/**
* @param query Query to evaluate
* @param response Response to evaluate
*/
async evaluateResponse({
query,
response,
}: EvaluatorResponseParams): Promise<EvaluationResult> {
const responseStr = response?.response;
const contexts = [];
if (response) {
for (const node of response.sourceNodes || []) {
contexts.push(node.getContent(MetadataMode.ALL));
}
}
return this.evaluate({
query,
response: responseStr,
contexts,
});
}
}
@@ -0,0 +1,5 @@
export * from "./Correctness.js";
export * from "./Faithfulness.js";
export * from "./Relevancy.js";
export * from "./prompts.js";
export * from "./utils.js";
@@ -0,0 +1,155 @@
export const defaultUserPrompt = ({
query,
referenceAnswer,
generatedAnswer,
}: {
query: string;
referenceAnswer: string;
generatedAnswer: string;
}) => `
## User Query
${query}
## Reference Answer
${referenceAnswer}
## Generated Answer
${generatedAnswer}
`;
export type UserPrompt = typeof defaultUserPrompt;
export const defaultCorrectnessSystemPrompt =
() => `You are an expert evaluation system for a question answering chatbot.
You are given the following information:
- a user query, and
- a generated answer
You may also be given a reference answer to use for reference in your evaluation.
Your job is to judge the relevance and correctness of the generated answer.
Output a single score that represents a holistic evaluation.
You must return your response in a line with only the score.
Do not return answers in any other format.
On a separate line provide your reasoning for the score as well.
Follow these guidelines for scoring:
- Your score has to be between 1 and 5, where 1 is the worst and 5 is the best.
- If the generated answer is not relevant to the user query,
you should give a score of 1.
- If the generated answer is relevant but contains mistakes,
you should give a score between 2 and 3.
- If the generated answer is relevant and fully correct,
you should give a score between 4 and 5.
Example Response:
4.0
The generated answer has the exact same metrics as the reference answer
but it is not as concise.
`;
export type CorrectnessSystemPrompt = typeof defaultCorrectnessSystemPrompt;
export const defaultFaithfulnessRefinePrompt = ({
query,
context,
existingAnswer,
}: {
query: string;
context: string;
existingAnswer: string;
}) => `
We want to understand if the following information is present
in the context information: ${query}
We have provided an existing YES/NO answer: ${existingAnswer}
We have the opportunity to refine the existing answer
(only if needed) with some more context below.
------------
${context}
------------
If the existing answer was already YES, still answer YES.
If the information is present in the new context, answer YES.
Otherwise answer NO.
`;
export type FaithfulnessRefinePrompt = typeof defaultFaithfulnessRefinePrompt;
export const defaultFaithfulnessTextQaPrompt = ({
query,
context,
}: {
query: string;
context: string;
}) => `
Please tell if a given piece of information
is supported by the context.
You need to answer with either YES or NO.
Answer YES if any of the context supports the information, even
if most of the context is unrelated.
Some examples are provided below.
Information: Apple pie is generally double-crusted.
Context: An apple pie is a fruit pie in which the principal filling
ingredient is apples.
Apple pie is often served with whipped cream, ice cream
('apple pie à la mode'), custard or cheddar cheese.
It is generally double-crusted, with pastry both above
and below the filling; the upper crust may be solid or
latticed (woven of crosswise strips).
Answer: YES
Information: Apple pies tastes bad.
Context: An apple pie is a fruit pie in which the principal filling
ingredient is apples.
Apple pie is often served with whipped cream, ice cream
('apple pie à la mode'), custard or cheddar cheese.
It is generally double-crusted, with pastry both above
and below the filling; the upper crust may be solid or
latticed (woven of crosswise strips).
Answer: NO
Information: ${query}
Context: ${context}
Answer:
`;
export type FaithfulnessTextQAPrompt = typeof defaultFaithfulnessTextQaPrompt;
export const defaultRelevancyEvalPrompt = ({
query,
context,
}: {
query: string;
context: string;
}) => `Your task is to evaluate if the response for the query is in line with the context information provided.
You have two options to answer. Either YES/ NO.
Answer - YES, if the response for the query is in line with context information otherwise NO.
Query and Response: ${query}
Context: ${context}
Answer: `;
export type RelevancyEvalPrompt = typeof defaultRelevancyEvalPrompt;
export const defaultRelevancyRefinePrompt = ({
query,
existingAnswer,
contextMsg,
}: {
query: string;
existingAnswer: string;
contextMsg: string;
}) => `We want to understand if the following query and response is
in line with the context information:
${query}
We have provided an existing YES/NO answer:
${existingAnswer}
We have the opportunity to refine the existing answer
(only if needed) with some more context below.
------------
${contextMsg}
------------
If the existing answer was already YES, still answer YES.
If the information is present in the new context, answer YES.
Otherwise answer NO.
`;
export type RelevancyRefinePrompt = typeof defaultRelevancyRefinePrompt;
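The eval and refine prompts above suggest a two-stage flow: ask about the first context chunk, then let each further chunk refine the YES/NO verdict. A minimal sketch of that flow follows, under stated assumptions: `evalPrompt` and `refinePrompt` are shortened stand-ins with the same parameter shape as the templates above, and `Judge`, `judgeOverChunks`, and `stubJudge` are hypothetical names that are not part of this change set.

```typescript
// Hypothetical stand-ins with the same parameter shape as the templates above,
// shortened so the sketch runs without the library.
const evalPrompt = ({ query, context }: { query: string; context: string }) =>
  `Query and Response: ${query}\nContext: ${context}\nAnswer: `;

const refinePrompt = ({
  query,
  existingAnswer,
  contextMsg,
}: {
  query: string;
  existingAnswer: string;
  contextMsg: string;
}) =>
  `${query}\nExisting answer: ${existingAnswer}\n------------\n${contextMsg}\n------------\nAnswer: `;

// Assumed LLM interface: any function that maps a prompt to a reply.
type Judge = (prompt: string) => string;

// Ask about the first chunk, then refine the verdict with each later chunk.
function judgeOverChunks(query: string, chunks: string[], judge: Judge): boolean {
  let answer: string | null = null;
  for (const chunk of chunks) {
    answer =
      answer === null
        ? judge(evalPrompt({ query, context: chunk }))
        : judge(refinePrompt({ query, existingAnswer: answer, contextMsg: chunk }));
  }
  // No chunks means nothing supported the statement.
  return answer !== null && answer.trim().toUpperCase().startsWith("YES");
}

// Stub judge for demonstration only: keeps an earlier YES, and otherwise
// answers YES when the prompt mentions "pastry".
const stubJudge: Judge = (prompt) =>
  prompt.includes("Existing answer: YES") || prompt.includes("pastry") ? "YES" : "NO";

const verdict = judgeOverChunks(
  "Apple pie is generally double-crusted.",
  ["Apple pie is often served with ice cream.", "It has pastry both above and below the filling."],
  stubJudge,
);
console.log(verdict);
```

Because the refine prompts say an existing YES must stay YES, the verdict can only move from NO to YES as more chunks are seen; the stub reproduces that rule by hand.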
+30
@@ -0,0 +1,30 @@
import { Response } from "../Response.js";
export type EvaluationResult = {
query?: string;
contexts?: string[];
response: string | null;
score: number;
scoreSecondary?: number;
scoreSecondaryType?: string;
meta?: any;
passing: boolean;
feedback: string;
};
export type EvaluatorParams = {
query: string | null;
response: string;
contexts?: string[];
reference?: string;
sleepTimeInSeconds?: number;
};
export type EvaluatorResponseParams = {
query: string | null;
response: Response;
};
export interface BaseEvaluator {
evaluate(params: EvaluatorParams): Promise<EvaluationResult>;
evaluateResponse?(params: EvaluatorResponseParams): Promise<EvaluationResult>;
}
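To illustrate the contract that `BaseEvaluator` defines, here is a minimal sketch of an implementation. The types are local copies trimmed to the fields the sketch uses, and `ExactMatchEvaluator` is a hypothetical class invented for this example; it is not one of the evaluators added in this change, and it makes no LLM call.

```typescript
// Local copies of the types above, trimmed to the fields used in this sketch.
type EvaluationResult = {
  query?: string;
  response: string | null;
  score: number;
  passing: boolean;
  feedback: string;
};

type EvaluatorParams = {
  query: string | null;
  response: string;
  reference?: string;
};

interface BaseEvaluator {
  evaluate(params: EvaluatorParams): Promise<EvaluationResult>;
}

// Hypothetical evaluator: passes when the response equals the reference,
// ignoring case and surrounding whitespace.
class ExactMatchEvaluator implements BaseEvaluator {
  async evaluate({ query, response, reference }: EvaluatorParams): Promise<EvaluationResult> {
    if (reference === undefined) {
      throw new Error("reference is required for exact-match evaluation");
    }
    const normalize = (s: string) => s.trim().toLowerCase();
    const passing = normalize(response) === normalize(reference);
    return {
      // EvaluationResult.query is optional, so a null query is simply omitted.
      ...(query !== null ? { query } : {}),
      response,
      score: passing ? 1 : 0,
      passing,
      feedback: passing
        ? "Response matches the reference."
        : "Response differs from the reference.",
    };
  }
}

new ExactMatchEvaluator()
  .evaluate({ query: "Capital of France?", response: " Paris ", reference: "paris" })
  .then((result) => console.log(result.passing, result.score));
```

Note the asymmetry in the types: `EvaluatorParams.query` is `string | null`, while `EvaluationResult.query` is an optional `string`, so an implementation has to convert between the two.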
+8
@@ -0,0 +1,8 @@
export const defaultEvaluationParser = (
evalResponse: string,
): [number, string] => {
const [scoreStr, reasoningStr] = evalResponse.split("\n");
const score = parseFloat(scoreStr);
const reasoning = reasoningStr.trim();
return [score, reasoning];
};
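The parser above expects a reply of the form `score\nreasoning`. It reads only the first two lines and calls `trim()` on the second, so a reply with no newline would raise a `TypeError`. A more tolerant variant might look like the sketch below; `lenientEvaluationParser` is a hypothetical name, and returning `null` for an unparsable score is an assumption of this sketch, not behavior of the committed code.

```typescript
// Hypothetical tolerant variant: keeps multi-line reasoning and never throws
// when the reasoning line is missing.
const lenientEvaluationParser = (evalResponse: string): [number | null, string] => {
  const lines = evalResponse.trim().split("\n");
  // The first line carries the score; everything after it is reasoning.
  const score = parseFloat(lines[0] ?? "");
  const reasoning = lines.slice(1).join("\n").trim();
  return [Number.isNaN(score) ? null : score, reasoning];
};

console.log(lenientEvaluationParser("4.5\nAccurate and complete.\nCites the source."));
console.log(lenientEvaluationParser("3"));
```

Callers would then need to decide how to treat a `null` score, for example by marking the evaluation as failed.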
+1
@@ -16,6 +16,7 @@ export * from "./constants.js";
export * from "./embeddings/index.js";
export * from "./engines/chat/index.js";
export * from "./engines/query/index.js";
export * from "./evaluation/index.js";
export * from "./extractors/index.js";
export * from "./indices/index.js";
export * from "./ingestion/index.js";
+3 -2
@@ -75,7 +75,8 @@ export class SummaryIndex extends BaseIndex<IndexList> {
if (options.indexStruct) {
indexStruct = options.indexStruct;
} else if (indexStructs.length == 1) {
-indexStruct = indexStructs[0];
+indexStruct =
+indexStructs[0].type === IndexStructType.LIST ? indexStructs[0] : null;
} else if (indexStructs.length > 1 && options.indexId) {
indexStruct = (await indexStore.getIndexStruct(
options.indexId,
@@ -164,7 +165,7 @@ export class SummaryIndex extends BaseIndex<IndexList> {
responseSynthesizer?: BaseSynthesizer;
preFilters?: unknown;
nodePostprocessors?: BaseNodePostprocessor[];
-}): BaseQueryEngine {
+}): BaseQueryEngine & RetrieverQueryEngine {
let { retriever, responseSynthesizer } = options ?? {};
if (!retriever) {
@@ -145,6 +145,10 @@ export class VectorStoreIndex extends BaseIndex<IndexDict> {
if (options.indexStruct) {
indexStruct = options.indexStruct;
} else if (indexStructs.length == 1) {
indexStruct =
indexStructs[0].type === IndexStructType.SIMPLE_DICT
? indexStructs[0]
: undefined;
indexStruct = indexStructs[0];
} else if (indexStructs.length > 1 && options.indexId) {
indexStruct = (await indexStore.getIndexStruct(
+26
@@ -0,0 +1,26 @@
import { OpenAI } from "./LLM.js";
export class Groq extends OpenAI {
constructor(init?: Partial<OpenAI>) {
const {
apiKey = process.env.GROQ_API_KEY,
additionalSessionOptions = {},
model = "mixtral-8x7b-32768",
...rest
} = init ?? {};
if (!apiKey) {
throw new Error("Set Groq Key in GROQ_API_KEY env variable"); // Tell user to set correct env variable, and not OPENAI_API_KEY
}
additionalSessionOptions.baseURL =
additionalSessionOptions.baseURL ?? "https://api.groq.com/openai/v1";
super({
apiKey,
additionalSessionOptions,
model,
...rest,
});
}
}
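The constructor above resolves its options in three steps: destructuring defaults for the key and model, a guard for a missing key, and a `??` fallback for the base URL. The sketch below isolates that logic in a pure function so it can be followed without the `OpenAI` base class. `resolveGroqOptions`, `GroqInit`, and the `env` parameter are hypothetical names introduced for illustration; the real class accepts `Partial<OpenAI>` and reads `process.env` directly.

```typescript
// Hypothetical, trimmed-down option shape for this sketch.
type SessionOptions = { baseURL?: string };
type GroqInit = {
  apiKey?: string;
  model?: string;
  additionalSessionOptions?: SessionOptions;
};

function resolveGroqOptions(
  init: GroqInit | undefined,
  env: Record<string, string | undefined>,
) {
  // Destructuring defaults apply only when the property is undefined.
  const {
    apiKey = env["GROQ_API_KEY"],
    additionalSessionOptions = {},
    model = "mixtral-8x7b-32768",
  } = init ?? {};

  if (!apiKey) {
    throw new Error("Set Groq Key in GROQ_API_KEY env variable");
  }

  return {
    apiKey,
    model,
    // Copy rather than assign in place, so the caller's object is untouched
    // (a choice made for this sketch; the constructor above assigns in place).
    additionalSessionOptions: {
      ...additionalSessionOptions,
      baseURL: additionalSessionOptions.baseURL ?? "https://api.groq.com/openai/v1",
    },
  };
}

console.log(resolveGroqOptions(undefined, { GROQ_API_KEY: "example-key" }));
```

An explicitly passed `apiKey` or `baseURL` takes precedence over the environment variable and the Groq endpoint, which is what allows the class to be pointed at a proxy or a test server.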
+1
@@ -1,5 +1,6 @@
export * from "./LLM.js";
export { FireworksLLM } from "./fireworks.js";
export { Groq } from "./groq.js";
export {
ALL_AVAILABLE_MISTRAL_MODELS,
MistralAI,
@@ -0,0 +1,58 @@
import type { ServiceContext } from "llamaindex";
import {
Document,
OpenAI,
OpenAIEmbedding,
SummaryIndex,
VectorStoreIndex,
serviceContextFromDefaults,
storageContextFromDefaults,
} from "llamaindex";
import { beforeAll, describe, expect, it, vi } from "vitest";
import {
mockEmbeddingModel,
mockLlmGeneration,
} from "../utility/mockOpenAI.js";
// Mock the OpenAI getOpenAISession function during testing
vi.mock("llamaindex/llm/open_ai", () => {
return {
getOpenAISession: vi.fn().mockImplementation(() => null),
};
});
describe("SummaryIndex", () => {
let serviceContext: ServiceContext;
beforeAll(() => {
const embeddingModel = new OpenAIEmbedding();
const llm = new OpenAI();
mockEmbeddingModel(embeddingModel);
mockLlmGeneration({ languageModel: llm });
const ctx = serviceContextFromDefaults({
embedModel: embeddingModel,
llm,
});
serviceContext = ctx;
});
it("SummaryIndex and VectorStoreIndex must be able to share the same storage context", async () => {
const storageContext = await storageContextFromDefaults({
persistDir: "/tmp/test_dir",
});
const documents = [new Document({ text: "lorem ipsum", id_: "1" })];
const vectorIndex = await VectorStoreIndex.fromDocuments(documents, {
serviceContext,
storageContext,
});
expect(vectorIndex).toBeDefined();
const summaryIndex = await SummaryIndex.fromDocuments(documents, {
serviceContext,
storageContext,
});
expect(summaryIndex).toBeDefined();
});
});
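The test above covers the case where two index types write into one storage context. The guard added to `SummaryIndex` for that case can be sketched in isolation as follows: when the store holds exactly one index struct, adopt it only if its type matches the index being initialized. `StoredStruct` and `pickOnlyStruct` are hypothetical names, and the `IndexStructType` values are local stand-ins assumed for this sketch.

```typescript
// Local stand-in for the library's IndexStructType (string values assumed here).
const IndexStructType = { SIMPLE_DICT: "simple_dict", LIST: "list" } as const;
type IndexStructType = (typeof IndexStructType)[keyof typeof IndexStructType];

// Hypothetical minimal shape of a persisted index struct.
type StoredStruct = { indexId: string; type: IndexStructType };

// Adopt the single stored struct only when it belongs to the wanted index type.
function pickOnlyStruct(
  stored: StoredStruct[],
  wanted: IndexStructType,
): StoredStruct | undefined {
  if (stored.length !== 1) return undefined;
  const only = stored[0];
  return only !== undefined && only.type === wanted ? only : undefined;
}

// A vector index has already persisted its struct in the shared store.
const shared: StoredStruct[] = [{ indexId: "vec-1", type: IndexStructType.SIMPLE_DICT }];

console.log(pickOnlyStruct(shared, IndexStructType.LIST)); // undefined
console.log(pickOnlyStruct(shared, IndexStructType.SIMPLE_DICT));
```

Without the type check, the summary index would pick up the vector index's struct simply because it is the only one in the store.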
+2 -1
@@ -58,7 +58,8 @@
"@swc/core": "^1.4.2",
"@types/lodash": "^4.14.202",
"@types/node": "^20.11.20",
-"pathe": "^1.1.2"
+"pathe": "^1.1.2",
+"concurrently": "^8.2.2"
},
"dependencies": {
"lodash": "^4.17.21"
+1
@@ -14,6 +14,7 @@ module.exports = {
"ASSEMBLYAI_API_KEY",
"TOGETHER_API_KEY",
"FIREWORKS_API_KEY",
"GROQ_API_KEY",
"ASTRA_DB_APPLICATION_TOKEN",
"ASTRA_DB_ENDPOINT",
+6
@@ -48,6 +48,9 @@ importers:
'@docusaurus/remark-plugin-npm2yarn':
specifier: ^3.1.1
version: 3.1.1
'@llamaindex/env':
specifier: workspace:*
version: link:../../packages/env
'@mdx-js/react':
specifier: ^3.0.0
version: 3.0.0(@types/react@18.2.55)(react@18.2.0)
@@ -392,6 +395,9 @@ importers:
'@types/node':
specifier: ^20.11.20
version: 20.11.20
concurrently:
specifier: ^8.2.2
version: 8.2.2
pathe:
specifier: ^1.1.2
version: 1.1.2