LlamaIndex Server
LlamaIndexServer is a Next.js-based application that allows you to quickly launch your LlamaIndex Workflows and Agent Workflows as an API server with an optional chat UI. It provides a complete environment for running LlamaIndex workflows with both API endpoints and a user interface for interaction.
Features
- Add a sophisticated chatbot UI to your LlamaIndex workflow
- Edit code and document artifacts in an OpenAI Canvas-style UI
- Extendable UI components for events and headers
- Built on Next.js for high performance and easy API development
Installation
npm i @llamaindex/server
Quick Start
Create an index.ts file and add the following code:
import { LlamaIndexServer } from "@llamaindex/server";
import { openai } from "@llamaindex/openai";
import { agent } from "@llamaindex/workflow";
import { wiki } from "@llamaindex/tools"; // or any other tool
const createWorkflow = () => agent({ tools: [wiki()], llm: openai("gpt-4o") });
new LlamaIndexServer({
workflow: createWorkflow,
uiConfig: {
appTitle: "LlamaIndex App",
starterQuestions: ["Who is the first president of the United States?"],
},
}).start();
The createWorkflow function is a factory that creates an Agent Workflow; in this case, the agent has a tool that retrieves information from Wikipedia. For more details, read about the Workflow factory contract.
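Passing a factory instead of a ready-made workflow means every request gets a fresh instance, so per-run state cannot leak between chats. The dependency-free sketch below illustrates the idea with a hypothetical `Counter` stand-in for a workflow and a hypothetical `handleRequest` function standing in for the server; neither is part of @llamaindex/server.

```typescript
// Hypothetical stand-in for a workflow instance that keeps per-run state.
interface Counter {
  run(): number;
}

// A factory returns a new instance on every call, like `createWorkflow` above.
const createCounter = (): Counter => {
  let steps = 0; // private to this instance
  return { run: () => ++steps };
};

// Hypothetical request handler: calls the factory once per request,
// mirroring how the server initializes a workflow for each chat request.
function handleRequest(factory: () => Counter): number {
  return factory().run();
}

const first = handleRequest(createCounter);
const second = handleRequest(createCounter);
console.log(first, second); // 1 1: no state is shared between requests
```

If the factory returned one shared instance instead, the second request would observe the first request's state.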
Running the Server
In the same directory as index.ts, run the following command to start the server:
tsx index.ts
The server starts at http://localhost:3000.
You can also make a request to the server:
curl -X POST "http://localhost:3000/api/chat" -H "Content-Type: application/json" -d '{"message": "Who is the first president of the United States?"}'
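The same request can be sent from TypeScript with the built-in `fetch` (Node.js 18+). This sketch copies the `{ "message": ... }` body from the curl example above; the helper names `buildChatRequest` and `sendChatMessage` are hypothetical, and the response body is returned as raw text because its exact format is not described here.

```typescript
// Hypothetical helper: builds the same request as the curl example above.
function buildChatRequest(message: string): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// Hypothetical helper: sends the request to a running server (default port 3000).
async function sendChatMessage(
  message: string,
  baseUrl = "http://localhost:3000",
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, buildChatRequest(message));
  if (!res.ok) {
    throw new Error(`Chat request failed with HTTP ${res.status}`);
  }
  // Return the raw body; parsing it is left to the caller.
  return res.text();
}
```

With the server running, `await sendChatMessage("Who is the first president of the United States?")` returns the raw response body.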
Configuration Options
The LlamaIndexServer accepts the following configuration options:
- workflow: A callable function that creates a workflow instance for each request. See Workflow factory contract for more details.
- uiConfig: An object to configure the chat UI, containing the following properties:
  - appTitle: The title of the application (default: "LlamaIndex App")
  - starterQuestions: List of starter questions for the chat UI (default: [])
  - componentsDir: The directory for custom UI components rendering events emitted by the workflow. The default is undefined, which does not render custom UI components.
  - llamaCloudIndexSelector: Whether to show the LlamaCloud index selector in the chat UI (requires LLAMA_CLOUD_API_KEY to be set in the environment variables) (default: false)
  - dev_mode: When enabled, you can update workflow code in the UI and see the changes immediately. It's currently in beta and only supports updating workflow code at app/src/workflow.ts. Start the server in dev mode (npm run dev) to enable this reload feature.
- suggestNextQuestions: Whether to suggest next questions after the assistant's response (default: true). You can change the prompt for the next questions by setting the NEXT_QUESTION_PROMPT environment variable.
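The NEXT_QUESTION_PROMPT override can be pictured as a simple lookup: use the environment variable when it is set, otherwise fall back to the built-in prompt. A minimal sketch, where `resolveNextQuestionPrompt` is hypothetical, the server's real default prompt is not reproduced, and treating a blank value as unset is an assumption rather than documented behavior:

```typescript
// Hypothetical: choose the prompt used for next-question suggestions.
// `env` is passed in (e.g. `process.env`) so the logic is easy to test.
function resolveNextQuestionPrompt(
  env: Record<string, string | undefined>,
  fallback: string,
): string {
  const override = env["NEXT_QUESTION_PROMPT"]?.trim();
  // Assumption: empty or whitespace-only values count as "not set".
  return override ? override : fallback;
}
```

For example, `resolveNextQuestionPrompt(process.env, defaultPrompt)` returns the override if one is configured.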
LlamaIndexServer also accepts all configuration options of a Next.js custom server, such as port, hostname, and dev.
See the Next.js Custom Server documentation for the full list of options.
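To make the documented defaults concrete, this sketch models a subset of the options and fills in the defaults listed above (appTitle "LlamaIndex App", starterQuestions [], llamaCloudIndexSelector false, suggestNextQuestions true), plus port 3000 from the Quick Start. The `ServerOptionsSketch` and `UIConfigSketch` types and the `withDefaults` function are hypothetical illustrations, not types exported by @llamaindex/server.

```typescript
// Hypothetical subset of the documented options (not the package's own types).
interface UIConfigSketch {
  appTitle?: string;
  starterQuestions?: string[];
  componentsDir?: string;
  llamaCloudIndexSelector?: boolean;
}

interface ServerOptionsSketch {
  uiConfig?: UIConfigSketch;
  suggestNextQuestions?: boolean;
  port?: number; // one of the Next.js custom server options
}

// Applies the defaults described in this README to a partial configuration.
function withDefaults(opts: ServerOptionsSketch) {
  const ui = opts.uiConfig ?? {};
  return {
    uiConfig: {
      appTitle: ui.appTitle ?? "LlamaIndex App",
      starterQuestions: ui.starterQuestions ?? [],
      componentsDir: ui.componentsDir, // undefined: no custom UI components
      llamaCloudIndexSelector: ui.llamaCloudIndexSelector ?? false,
    },
    suggestNextQuestions: opts.suggestNextQuestions ?? true,
    port: opts.port ?? 3000,
  };
}
```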
Workflow factory contract
The provided workflow factory is called for each chat request to initialize a new workflow instance. The generated workflow must follow the same contract as an Agent Workflow.
This means that the workflow must handle a startAgentEvent event, which is the entry point of the workflow and contains the following information in its data property:
{
userInput: MessageContent;
chatHistory?: ChatMessage[] | undefined;
};
The userInput is the latest user message and the chatHistory is the list of messages exchanged between the user and the workflow so far.
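Assuming, as the description above suggests, that chatHistory holds the earlier turns and userInput the newest one, a handler usually merges both before calling the LLM. The sketch uses simplified local stand-ins for MessageContent and ChatMessage (the real llamaindex types are richer, e.g. content can also be multi-part) so it runs without dependencies; `toMessages` is a hypothetical helper.

```typescript
// Simplified stand-ins for the llamaindex types (assumption: text-only content).
type MessageContent = string;
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: MessageContent;
}

// Shape of the startAgentEvent data described above.
interface StartAgentData {
  userInput: MessageContent;
  chatHistory?: ChatMessage[] | undefined;
}

// Hypothetical helper: earlier turns first, then the newest user message.
function toMessages(data: StartAgentData): ChatMessage[] {
  return [...(data.chatHistory ?? []), { role: "user", content: data.userInput }];
}
```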
Furthermore, the workflow must stop with a stopAgentEvent event to mark the end of the workflow. In between, the workflow can emit UI events to render custom UI components and Artifact events to send structured data like generated documents or code snippets to the UI.
import {
  createStatefulMiddleware,
  createWorkflow,
  startAgentEvent,
} from "@llamaindex/workflow";
import { type ChatMessage, Settings } from "llamaindex";
import { openai } from "@llamaindex/openai";
// Assumption: `chatWithTools` is exported by @llamaindex/tools in your installed version.
import { chatWithTools, wiki } from "@llamaindex/tools";

Settings.llm = openai("gpt-4o");

export const workflowFactory = async () => {
  // The stateful middleware provides `getContext`, which exposes `state` and `sendEvent`.
  const { withState, getContext } = createStatefulMiddleware(() => ({}));
  const workflow = withState(createWorkflow());

  workflow.handle([startAgentEvent], async ({ data }) => {
    const { state, sendEvent } = getContext();
    // Combine the previous turns with the latest user message.
    const messages: ChatMessage[] = [
      ...(data.chatHistory ?? []),
      { role: "user", content: data.userInput },
    ];
    const toolCallResponse = await chatWithTools(Settings.llm, [wiki()], messages);
    // use the result from the tool call and `sendEvent` to emit the next event...
  });

  // define more workflow handling logic here...
  // Finally, a handler returns a `stopAgentEvent` (also exported by @llamaindex/workflow)
  // to mark the end of the workflow:
  // return stopAgentEvent.with({ result: "This is the end!" });

  return workflow;
};
For more sophisticated workflow examples, use the create-llama project to generate them.
AI-generated UI Components
The LlamaIndex server provides support for rendering workflow events using custom UI components, allowing you to extend and customize the chat interface. These components can be auto-generated using an LLM by providing a JSON schema of the workflow event.
UI Event Schema
To display custom UI components, your workflow needs to emit UI events that have an event type for identification and a data object:
import { workflowEvent } from "@llamaindex/workflow";

const uiEvent = workflowEvent<{
  type: "ui_event";
  data: UIEventData;
}>();
The data object can be any JSON object. To enable AI generation of the UI component, you need to provide a schema for that data (here we're using Zod):
import { z } from "zod";

export const MyEventDataSchema = z
  .object({
    stage: z
      .enum(["retrieve", "analyze", "answer"])
      .describe("The current stage the workflow process is in."),
    progress: z
      .number()
      .min(0)
      .max(1)
      .describe("The progress of the current stage, as a fraction from 0 to 1"),
  })
  .describe("WorkflowStageProgress");

export type UIEventData = z.infer<typeof MyEventDataSchema>;
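Zod enforces these constraints at runtime when you call `MyEventDataSchema.parse(...)`. For comparison, an equivalent hand-written type guard for the same stage/progress payload and the surrounding `ui_event` wrapper could look like the following; `isUIEventData` and `isUIEvent` are hypothetical names, and the sketch is dependency-free.

```typescript
// Same payload as MyEventDataSchema, written without Zod.
type Stage = "retrieve" | "analyze" | "answer";
interface UIEventData {
  stage: Stage;
  progress: number; // fraction in [0, 1]
}

const STAGES: readonly string[] = ["retrieve", "analyze", "answer"];

// Mirrors the enum and the min(0)/max(1) constraints (NaN fails both checks).
function isUIEventData(value: unknown): value is UIEventData {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.stage === "string" &&
    STAGES.includes(v.stage) &&
    typeof v.progress === "number" &&
    v.progress >= 0 &&
    v.progress <= 1
  );
}

// Checks the outer event shape: { type: "ui_event", data: UIEventData }.
function isUIEvent(value: unknown): value is { type: "ui_event"; data: UIEventData } {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.type === "ui_event" && isUIEventData(v.data);
}
```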
Generate UI Components
The generateEventComponent function uses an LLM to generate a custom UI component based on the JSON schema of a workflow event. The schema should contain accurate descriptions of each field so that the LLM can generate matching components for your use case. We've done this for you in the example above using the describe function from Zod:
import { OpenAI } from "@llamaindex/openai";
import { generateEventComponent } from "@llamaindex/server";
import { MyEventDataSchema } from "./your-workflow";
// Also works well with Claude 3.5 Sonnet and Google Gemini 2.5 Pro
const llm = new OpenAI({ model: "gpt-4.1" });
// The LLM call is asynchronous, so await the generated code.
const code = await generateEventComponent(MyEventDataSchema, llm);
After generating the code, we need to save it to a file. The file name must match the event type from your workflow (e.g., ui_event.jsx for handling events with ui_event type):
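import { mkdirSync, writeFileSync } from "node:fs";

mkdirSync("components", { recursive: true }); // make sure the folder exists
writeFileSync("components/ui_event.jsx", code);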
Feel free to modify the generated code to match your needs. If you're not satisfied with the generated code, we suggest improving the provided JSON schema first or trying another LLM.
Note that generateEventComponent generates JSX code, but you can also provide a TSX file.
Server Setup
To use the generated UI components, you need to initialize the LlamaIndex server with the componentsDir that contains your custom UI components:
new LlamaIndexServer({
workflow: createWorkflow,
uiConfig: {
appTitle: "LlamaIndex App",
componentsDir: "components",
},
}).start();
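Because the component file name must match the event type, the file the server looks for in componentsDir can be derived from the event type alone. A small sketch of that naming rule, assuming the `.jsx`/`.tsx` extensions mentioned above; `componentPath` is a hypothetical helper, and `posix.join` keeps forward slashes on every platform.

```typescript
import { posix } from "node:path";

// Hypothetical helper: events of type `ui_event` are rendered by
// `<componentsDir>/ui_event.jsx` (or `.tsx`).
function componentPath(
  componentsDir: string,
  eventType: string,
  ext: "jsx" | "tsx" = "jsx",
): string {
  return posix.join(componentsDir, `${eventType}.${ext}`);
}
```

For the setup above, `componentPath("components", "ui_event")` gives the same path used when saving the generated code.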
Sending Artifacts to the UI
In addition to UI events for custom components, LlamaIndex Server supports a special ArtifactEvent to send structured data like generated documents or code snippets to the UI. These artifacts are displayed in a dedicated "Canvas" panel in the chat interface.
Artifact Event Structure
To send an artifact, your workflow needs to emit an event with type: "artifact". The data payload of this event should include:
- type: A string indicating the type of artifact (e.g., "document", "code").
- created_at: A timestamp (e.g., Date.now()) indicating when the artifact was created.
- data: An object containing the specific details of the artifact. The structure of this object depends on the artifact type.
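Since the structure of the inner data depends on the artifact type, the payload maps naturally onto a TypeScript discriminated union. The document variant follows the fields used in this README; the code variant's fields (`file_name`, `code`, `language`) are hypothetical placeholders, not a documented schema, and `describeArtifact` is a hypothetical helper showing how narrowing on `type` gives typed access to `data`.

```typescript
// Document artifact, using the fields shown in this README.
interface DocumentArtifact {
  type: "document";
  created_at: number; // e.g. Date.now()
  data: { title: string; content: string; type: "markdown" | "html" };
}

// Hypothetical code artifact; field names are placeholders.
interface CodeArtifact {
  type: "code";
  created_at: number;
  data: { file_name: string; code: string; language: string };
}

type Artifact = DocumentArtifact | CodeArtifact;

// Switching on `type` narrows `artifact.data` to the matching shape.
function describeArtifact(artifact: Artifact): string {
  switch (artifact.type) {
    case "document":
      return `${artifact.data.type} document "${artifact.data.title}"`;
    case "code":
      return `${artifact.data.language} file ${artifact.data.file_name}`;
    default: {
      // Compile-time check that every variant is handled.
      const unreachable: never = artifact;
      return unreachable;
    }
  }
}
```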
Defining and Sending an ArtifactEvent
First, define your artifact event using workflowEvent from @llamaindex/workflow:
import { workflowEvent } from "@llamaindex/workflow";
// Example for a document artifact
const artifactEvent = workflowEvent<{
type: "artifact"; // Must be "artifact"
data: {
type: "document"; // Custom type for your artifact (e.g., "document", "code")
created_at: number;
data: {
// Specific data for the document artifact type
title: string;
content: string;
type: "markdown" | "html"; // document format
};
};
}>();
Then, within your workflow logic, use sendEvent (obtained from getContext()) to emit the event:
// Assuming 'sendEvent' is available in your workflow handler
// and 'documentDetails' contains the content for the artifact.
sendEvent(
artifactEvent.with({
type: "artifact", // This top-level type must be "artifact"
data: {
type: "document", // This is your specific artifact type
created_at: Date.now(),
data: {
title: "My Generated Document",
content: "# Hello World\nThis is a markdown document.",
type: "markdown",
},
},
}),
);
This sends the artifact to the LlamaIndex Server UI, where the ChatCanvasPanel renders it with a renderer chosen by the artifact type. For the document type, the renderer is the DocumentArtifactViewer.
Default Endpoints and Features
Chat Endpoint
The server includes a default chat endpoint at /api/chat for handling chat interactions.
Chat UI
The server always provides a chat interface at the root path (/) with:
- Configurable starter questions
- Real-time chat interface
- API endpoint integration
Static File Serving
- The server automatically mounts the data and output folders at {server_url}{api_prefix}/files/data (default: /api/files/data) and {server_url}{api_prefix}/files/output (default: /api/files/output), respectively.
- Your workflows can use both folders to store and access files. By convention, the data folder is used for documents that are ingested, and the output folder is used for documents generated by the workflow.
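Given these mount points, a file's public URL follows from the server URL, the API prefix, the folder, and the file name. A sketch with a hypothetical `fileUrl` helper that defaults to the `/api` prefix; each path segment is URL-encoded so names with spaces or nested folders still produce valid links.

```typescript
type ServedFolder = "data" | "output";

// Hypothetical helper: `{server_url}{api_prefix}/files/{folder}/{fileName}`.
function fileUrl(
  serverUrl: string,
  folder: ServedFolder,
  fileName: string,
  apiPrefix = "/api",
): string {
  const base = serverUrl.replace(/\/+$/, ""); // drop trailing slashes
  // Encode each segment separately so "/" between folders is preserved.
  const path = fileName.split("/").map(encodeURIComponent).join("/");
  return `${base}${apiPrefix}/files/${folder}/${path}`;
}
```

For example, a report written by the workflow to output/report.md would be served at `fileUrl("http://localhost:3000", "output", "report.md")`.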