mirror of
https://github.com/run-llama/template-workflow-extract-reconcile-invoice.git
synced 2026-06-30 22:17:53 -04:00
Update readmes on UI starters to link to datasets and explain application in more detail (#64)
* Update readmes on UI starters to link to datasets and be a little more explanatory
* Update README.md
* Update README.md
* Fix typo, version
* version
# Invoice Extraction and Contract Reconciliation

This template provides a LlamaAgents application for extracting structured data from invoices
and reconciling it against contract documents using LlamaExtract, LlamaCloud Index, and Agent Data.
It helps finance and operations teams validate that incoming invoices comply with agreed contract terms
by automatically detecting mismatches in payment terms, totals, and other key fields.
## Running the application

This is a starter for LlamaAgents. See the
[LlamaAgents (llamactl) getting started guide](https://developers.llamaindex.ai/python/llamaagents/llamactl/getting-started/)
for context on local development and deployment.

To run the application locally, clone this repo, install [`uv`](https://docs.astral.sh/uv/) and run `uvx llamactl serve`.
This application can also be deployed directly to [LlamaCloud](https://cloud.llamaindex.ai) via the UI,
or with `llamactl deployment create`.
## Features

- **Invoice data extraction**: Uses a Pydantic `InvoiceExtractionSchema` to extract key invoice fields
  (vendor, dates, PO number, line items, subtotals, tax, totals, and more) via a LlamaExtract agent.
- **Contract indexing and retrieval**: Includes an `index-contract` workflow that downloads contract files
  from LlamaCloud and indexes them into a dedicated `contracts` LlamaCloud Index for retrieval.
- **Automated reconciliation**: Matches invoices to the most relevant contracts using retrieval plus an LLM,
  then produces an `InvoiceWithReconciliation` record with match confidence, rationale, and structured discrepancies.
- **Agent Data storage**: Stores reconciled invoice records in LlamaCloud Agent Data, deduplicated by file hash,
  so that re-processing the same file replaces prior results instead of duplicating them.
- **UI integration**: A web UI lets you upload invoices and contracts, monitor workflow progress,
  and review or edit extracted and reconciled data.
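The **Agent Data storage** feature keys records by file hash, so re-processing the same file replaces the earlier result instead of adding a second one. The following is a minimal sketch of that idea. It uses an in-memory dictionary as a stand-in for an Agent Data collection and does not call the Agent Data API. The names `file_hash` and `InMemoryRecordStore` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Any, Dict, Optional


def file_hash(data: bytes) -> str:
    """Return a hex digest identifying the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class InMemoryRecordStore:
    """Hypothetical stand-in for an Agent Data collection, keyed by file hash."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def upsert(self, file_bytes: bytes, record: Dict[str, Any]) -> str:
        """Store a record for this file, replacing any earlier record with the same hash."""
        key = file_hash(file_bytes)
        self._records[key] = record  # overwrite instead of appending a duplicate
        return key

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self._records.get(key)

    def __len__(self) -> int:
        return len(self._records)


if __name__ == "__main__":
    store = InMemoryRecordStore()
    pdf = b"%PDF-1.4 example invoice bytes"
    first = store.upsert(pdf, {"total": "100.00", "status": "first run"})
    second = store.upsert(pdf, {"total": "100.00", "status": "re-processed"})
    print(len(store), store.get(second)["status"])  # 1 re-processed
    print(first == second)  # True
```

Keying on the content hash rather than on the file name means that an identical file uploaded under a different name is also treated as the same document.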
## Example Documents

Sample invoice and contract PDF files for testing the application are available in the
[llama-datasets repository](https://github.com/run-llama/llama-datasets/tree/main/llama_agents/invoice-contracts).
## Configuration

All main configuration is in `src/extraction_review/config.py`.
## How It Works

The application uses a multi-step workflow powered by LlamaIndex:

1. **File Upload**: Users upload invoice or contract documents through the UI; the files are stored in LlamaCloud.
2. **Index Contracts**: Contract files are processed by the `index-contract` workflow and indexed into
   the `contracts` LlamaCloud Index.
3. **Download Invoice**: The `process-file` workflow downloads the selected invoice file from LlamaCloud storage.
4. **Extraction**: A LlamaExtract agent runs against the invoice using `InvoiceExtractionSchema`, returning
   structured invoice data plus field-level metadata.
5. **Contract Retrieval**: The workflow queries the `contracts` index with a query built from invoice fields
   (vendor, PO number, invoice number, etc.) and retrieves the most relevant contracts.
6. **Reconciliation**: An LLM compares the invoice to the retrieved contracts, selects the best match,
   and produces an `InvoiceWithReconciliation` object with match confidence, rationale, and a list of discrepancies.
7. **Storage**: The reconciled invoice data is wrapped in an `ExtractedData` record (including the file hash)
   and stored in Agent Data, replacing any previous records for the same file hash.
8. **Review**: The UI displays the stored data for review, editing, and export.
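The Contract Retrieval step builds a query from extracted invoice fields and uses it to find related contracts. The sketch below illustrates that step under two assumptions. First, the extraction result is a plain dictionary with hypothetical keys `vendor_name`, `po_number` and `invoice_number`. Second, a toy keyword-overlap ranker stands in for the `contracts` LlamaCloud Index so the example runs without any services. The ranker does not reflect how the real index scores documents.

```python
import re
from typing import Dict, List, Mapping, Optional, Set

# Hypothetical field names; the real InvoiceExtractionSchema may name them differently.
QUERY_FIELDS = ("vendor_name", "po_number", "invoice_number")


def build_contract_query(invoice: Mapping[str, Optional[str]]) -> str:
    """Join the non-empty identifying invoice fields into one query string."""
    parts = []
    for name in QUERY_FIELDS:
        value = invoice.get(name)
        if value:  # skip fields the extraction left empty (None or "")
            parts.append(f"{name.replace('_', ' ')}: {value}")
    return "; ".join(parts)


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens, used only by the toy ranker below."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_contracts(query: str, contracts: Dict[str, str], top_k: int = 1) -> List[str]:
    """Toy retrieval: order contract IDs by how many query tokens their text shares."""
    query_tokens = _tokens(query)
    ranked = sorted(
        contracts,
        key=lambda contract_id: len(query_tokens & _tokens(contracts[contract_id])),
        reverse=True,  # most shared tokens first; ties keep insertion order
    )
    return ranked[:top_k]


if __name__ == "__main__":
    invoice = {"vendor_name": "Acme Corp", "po_number": "PO-1234", "invoice_number": None}
    query = build_contract_query(invoice)
    print(query)  # vendor name: Acme Corp; po number: PO-1234
    contracts = {
        "globex-supply": "Supply agreement with Globex Inc, PO-9999, Net 45",
        "acme-msa": "Master services agreement with Acme Corp, PO-1234, Net 30",
    }
    print(rank_contracts(query, contracts))  # ['acme-msa']
```

Skipping empty fields keeps a partially extracted invoice from adding noise such as "invoice number: None" to the query.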
### Workflows

The application includes three main workflows:

- **`process-file`** (`src/extraction_review/process_file.py`): Main workflow for processing invoices
  end-to-end (download → extract → reconcile → store).
- **`index-contract`** (`src/extraction_review/index_contract.py`): Workflow for downloading and indexing
  contract documents into a LlamaCloud Index for later retrieval during reconciliation.
- **`metadata`** (`src/extraction_review/metadata_workflow.py`): Exposes configuration metadata to the UI,
  returning the JSON Schema for `InvoiceWithReconciliation` and the Agent Data collection name.
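The `process-file` workflow stores an `InvoiceWithReconciliation` record with match confidence, rationale and structured discrepancies. The `metadata` workflow exposes that record's JSON Schema to the UI. The sketch below models a simplified record of this kind with standard-library dataclasses. It fills the discrepancy list using a plain field-by-field comparison.

All class names, field names and example values here are hypothetical. In this template, matching and discrepancy analysis are done by an LLM, not by exact comparison.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Discrepancy:
    """One contract term that the invoice does not satisfy (hypothetical shape)."""
    field_name: str
    invoice_value: Any
    contract_value: Any


@dataclass
class ReconciliationSketch:
    """Hypothetical, simplified shape of a reconciliation result."""
    matched_contract_id: Optional[str]
    match_confidence: float
    rationale: str
    discrepancies: List[Discrepancy] = field(default_factory=list)


def compare_terms(invoice: Dict[str, Any], contract_terms: Dict[str, Any]) -> List[Discrepancy]:
    """Flag every contract term whose invoice value is missing or different."""
    return [
        Discrepancy(name, invoice.get(name), expected)
        for name, expected in contract_terms.items()
        if invoice.get(name) != expected
    ]


if __name__ == "__main__":
    invoice = {"payment_terms": "Net 45", "currency": "USD"}
    terms = {"payment_terms": "Net 30", "currency": "USD"}
    result = ReconciliationSketch(
        matched_contract_id="acme-msa",  # illustrative value
        match_confidence=0.9,  # illustrative value
        rationale="Vendor and PO number match the contract.",
        discrepancies=compare_terms(invoice, terms),
    )
    # asdict() converts nested dataclasses to dicts, so the record serializes to JSON.
    print(json.dumps(asdict(result), indent=2))
```

Because the record serializes to plain JSON, a UI could render it generically from its schema instead of relying on hard-coded fields.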
## Linting and type checking

Python and JavaScript packages contain helpful scripts to lint, format, and type check the code.
To check and fix Python code:

…

```
pnpm run test
# run all at once
pnpm run all-fix
```