[PR #31] [MERGED] Refactoring utils.py and creating a document management UI #36

Closed
opened 2026-02-16 03:17:25 -05:00 by yindo · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/run-llama/notebookllama/pull/31
Author: @AstraBert
Created: 7/13/2025
Status: Merged
Merged: 7/17/2025
Merged by: @AstraBert

Base: main ← Head: clelia/utils-refactoring-and-document-management


📝 Commits (10+)

  • c9d8db8 refactor: refactoring utils; feat: adding document management class
  • f25b3c2 feat: add UI
  • 666a89f chore: delete try.html and vbump
  • d1f3944 ci: typecheck
  • a40a08d chore: implementing suggestions
  • e99ea0c feat: first implementation of parametrized SQL (untested)
  • ecd9973 chore: resolve suggestions + tests
  • 2c889ac Fix boolean evaluation error
  • ca8799b Merge branch 'main' into clelia/utils-refactoring-and-document-management
  • ec7479c ci: linting

📊 Changes

17 files changed (+778 additions, -337 deletions)

📝 pyproject.toml (+2 -1)
📝 src/notebookllama/Home.py (+46 -13)
➕ src/notebookllama/documents.py (+136 -0)
➕ src/notebookllama/mindmap.py (+109 -0)
➕ src/notebookllama/pages/1_Document_Management_UI.py (+109 -0)
📝 src/notebookllama/pages/2_Document_Chat.py (+1 -1)
📝 src/notebookllama/pages/3_Interactive_Table_and_Plot_Visualization.py (+1 -1)
📝 src/notebookllama/pages/4_Observability_Dashboard.py (+0 -0)
➕ src/notebookllama/processing.py (+148 -0)
➕ src/notebookllama/querying.py (+43 -0)
📝 src/notebookllama/server.py (+3 -1)
➖ src/notebookllama/utils.py (+0 -314)
➕ src/notebookllama/verifying.py (+50 -0)
➕ tests/test_document_management.py (+77 -0)
📝 tests/test_models.py (+30 -3)
📝 tests/test_utils.py (+2 -2)
📝 uv.lock (+21 -1)

📄 Description

Description

Refactoring changes

The utils module had grown to hundreds of lines of miscellaneous code doing many unrelated things.
To keep it clean and maintainable, I moved the different functionalities into their own modules:

  • processing.py: everything concerning document processing and related operations
  • verifying.py: everything concerning claim verification
  • querying.py: everything concerning querying the LlamaCloud index

Adding a Document Management UI

As highlighted in #21, we have no UI that stores documents processed in the past.
Adding the pages/1_Document_Management_UI.py page and the documents.py module addresses this issue.

The solution implemented is the following:

  • We create a Postgres database table (we are already using Postgres for observability, so there is nothing new to set up) where we can store all the details about the processed documents.
  • We create a DocumentManager class that can import ManagedDocument instances into the database. Each instance contains all the details of a processed document (content, summary, Q&A, bullet points, mind map). The import is done after the document processing workflow in Home.py has completed its run.
  • The same DocumentManager can export the documents, and we then render them through Streamlit.
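
The import/export flow described above can be sketched roughly as follows. This is a minimal illustration and not the code from documents.py: it uses the standard-library sqlite3 module as a stand-in for Postgres so that it runs without a database server, and the field names, the table layout, and the method names import_documents and export_documents are assumptions made for the example. The queries use bound parameters, in the spirit of the "parametrized SQL" commit listed above.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagedDocument:
    # Hypothetical field names: the PR only lists what is stored
    # (content, summary, Q&A, bullet points, mind map).
    document_name: str
    content: str
    summary: str
    q_and_a: str
    bullet_points: str
    mindmap: str


class DocumentManager:
    """Stores and retrieves ManagedDocument rows (sqlite3 stands in for Postgres)."""

    _COLUMNS = "document_name, content, summary, q_and_a, bullet_points, mindmap"

    def __init__(self, conn: sqlite3.Connection, table_name: str = "documents") -> None:
        # Table names cannot be bound as parameters, so validate before interpolating.
        if not table_name.isidentifier():
            raise ValueError(f"invalid table name: {table_name!r}")
        self._conn = conn
        self._table = table_name
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self._table} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "document_name TEXT NOT NULL, content TEXT, summary TEXT, "
            "q_and_a TEXT, bullet_points TEXT, mindmap TEXT)"
        )
        self._conn.commit()

    def import_documents(self, documents: List[ManagedDocument]) -> None:
        # Values are passed as bound parameters, never formatted into the SQL string.
        rows = [
            (d.document_name, d.content, d.summary, d.q_and_a, d.bullet_points, d.mindmap)
            for d in documents
        ]
        self._conn.executemany(
            f"INSERT INTO {self._table} ({self._COLUMNS}) VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
        self._conn.commit()

    def export_documents(self, limit: Optional[int] = None) -> List[ManagedDocument]:
        query = f"SELECT {self._COLUMNS} FROM {self._table} ORDER BY id"
        params: tuple = ()
        if limit is not None:
            query += " LIMIT ?"
            params = (limit,)
        cursor = self._conn.execute(query, params)
        return [ManagedDocument(*row) for row in cursor.fetchall()]
```

In this sketch, a caller would build a ManagedDocument once the processing workflow finishes, pass it to import_documents, and later call export_documents to obtain the list that a Streamlit page would render. The rendering step itself is not shown.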

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

yindo added the pull-request label 2026-02-16 03:17:25 -05:00
yindo closed this issue 2026-02-16 03:17:25 -05:00
Reference: run-llama/notebookllama#36