482 Commits

Author SHA1 Message Date
Logan f385e96ab8 Delete parse.md 2026-03-24 19:27:52 -06:00
Logan c3e4696b5f Delete index.md 2026-03-24 19:27:41 -06:00
Logan 1e40c9cf94 Delete extract.md 2026-03-24 19:27:25 -06:00
Logan 802bc2a9f8 Add deprecation notice and clean up README
Added deprecation notice and removed outdated content.
2026-03-24 19:26:59 -06:00
Neeraj Pradhan 5ea758b853 More robust extract tests with pytest xdist (#1117) 2026-02-16 16:16:15 -08:00
dependabot[bot] 208b6f2fa5 build(deps): bump slackapi/slack-github-action from 1.27.0 to 2.1.1 (#1092)
Bumps [slackapi/slack-github-action](https://github.com/slackapi/slack-github-action) from 1.27.0 to 2.1.1.
- [Release notes](https://github.com/slackapi/slack-github-action/releases)
- [Commits](https://github.com/slackapi/slack-github-action/compare/v1.27.0...v2.1.1)

---
updated-dependencies:
- dependency-name: slackapi/slack-github-action
  dependency-version: 2.1.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-14 21:03:05 -06:00
github-actions[bot] e1b9143f79 chore: version packages (#1116)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-02-13 15:29:09 -08:00
Neeraj Pradhan 232c55bd6a Bump up patch version (#1115) 2026-02-13 15:20:52 -08:00
Neeraj Pradhan ab6f2f8da5 Allows xlsx files in the sdk for extract (#1114) 2026-02-13 14:44:25 -08:00
github-actions[bot] 66c2639ec8 chore: version packages (#1112) 2026-02-11 15:18:43 -06:00
Logan da1916c69f more loudly deprecate ancient llama-parse package (#1111) 2026-02-11 15:16:01 -06:00
Neeraj Pradhan 345e272573 Lower frequency for e2e tests (#1110) 2026-02-11 09:07:15 -08:00
github-actions[bot] d70fbac1ce chore: version packages (#1103) 2026-02-02 11:46:39 -06:00
Logan 2358df10c6 add notice (don't merge until ready) (#1065) 2026-02-02 11:42:47 -06:00
Neeraj Pradhan 829628cc86 Use unique filenames when running dist tests (#1101) 2026-01-30 14:00:27 -08:00
Neeraj Pradhan 42b7bbd1ae Use sonnet when testing premium mode in extract e2e (#1098)
* Use sonnet when testing premium mode in extract e2e

* fix parse model
2026-01-27 16:16:48 -08:00
Neeraj Pradhan 38da9a52d7 Invalidate cache when running extract tests (#1097) 2026-01-26 17:33:23 -08:00
Neeraj Pradhan 1e7ec40ee7 Fix verbose logging on slack channel (#1096) 2026-01-26 17:12:50 -08:00
Neeraj Pradhan dd83c1a9d0 Add retries to all extract sdk functions uniformly (#1095) 2026-01-26 12:05:16 -08:00
Neeraj Pradhan 7cb83f5cd3 Change cron schedule for hourly extract tests (#1094) 2026-01-26 10:15:34 -08:00
Neeraj Pradhan b05266be6d Try to reparse scheduled workflow (#1093) 2026-01-26 09:56:22 -08:00
Neeraj Pradhan eab4798165 Force github reparse of the workflow (#1090) 2026-01-23 11:36:28 -08:00
Neeraj Pradhan b174fa8fab Run hourly extract tests to catch SDK schema drifts (#1089)
* Run hourly extract tests to catch SDK schema drifts

* fix url

* fix prod/staging env
2026-01-22 18:18:45 -08:00
github-actions[bot] b12ffef916 chore: version packages (#1087)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-01-21 12:44:43 -08:00
Neeraj Pradhan 07ec282257 Bump up patch version for python packages (#1086) 2026-01-21 12:30:23 -08:00
Neeraj Pradhan 013b689812 Bump up minor version for python packages (#1085) 2026-01-21 12:13:13 -08:00
Adrian Lyjak 3040951cb8 Use error description in invalid extraction error (#1081)
* fix: display extraction job error in InvalidExtractionData exception

Refactored InvalidExtractionData to read the `error` field from
ExtractRun and prominently display it in the exception message.
The job-level error is now stored in the `extraction_error` attribute
and included in the invalid_item's metadata as `job_error`.

* Create three-yaks-beg.md

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-01-18 17:43:21 -05:00
github-actions[bot] 9239498945 chore: version packages (#1076)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-01-14 19:15:05 +01:00
Pierre-Loic Doulcet 19cbb25631 remove extension filter (#1075)
* remove extension filter

* changeset

* Update ninety-goats-look.md

Make it a patch version

* Update package.json

back out of version bump

* Update pyproject.toml

back out of version bump

* Update package.json

back out of version bump

* Update pyproject.toml

back out of version bump

---------

Co-authored-by: Adrian Lyjak <adrianlyjak@gmail.com>
2026-01-14 19:13:39 +01:00
github-actions[bot] 812e2f7d72 chore: version packages (#1073)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-01-12 19:03:13 +01:00
Clelia (Astra) Bertelli d7864afe3f fix: bug fix retry logic in Classify and Extract (#1066)
* fix: bug fix retry logic in Classify and Extract

* chore: apply suggestion

* chore: add PARTIAL_SUCCESS to classify
2026-01-12 18:57:40 +01:00
github-actions[bot] ade8d027a5 chore: version packages (#1071)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-01-09 20:29:00 -05:00
Adrian Lyjak 997bcc8531 forgot ts changeset (#1070) 2026-01-09 20:23:29 -05:00
github-actions[bot] 8be554c234 chore: version packages (#1068)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-01-09 18:56:51 -05:00
Adrian Lyjak f777cab0c5 Add bounding box type support to TS too (#1069)
ts too
2026-01-09 18:55:16 -05:00
Adrian Lyjak b9b83c953d Parse bounding boxes from extract jobs results in agent data (#1067) 2026-01-09 18:47:57 -05:00
github-actions[bot] 3ec7024626 chore: version packages (#1058) 2025-12-10 11:53:30 -06:00
Logan d5b18a03fa Remove generate from build path to fix publishing (#1057) 2025-12-10 11:52:43 -06:00
Clelia (Astra) Bertelli 18dd04b6de docs: correct links in readme (#1056) 2025-12-10 17:08:58 +01:00
github-actions[bot] 685a5e6ccc chore: version packages (#1054) 2025-12-09 15:30:13 -06:00
Jim Geurts 576c3d9076 feat: support zod v4 & v3 (#1052) 2025-12-09 15:29:23 -06:00
Logan c8321d2bc5 improve parse ts polling (#1053) 2025-12-09 15:21:19 -06:00
Tuana Çelik 131bbed7aa batch parse sctript with asyncio (#1051)
* batch parse sctript with asyncio

* lint

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
2025-12-08 18:50:11 +01:00
Javier Torres 41c8ac2348 docs: Split Example Notebook (#1044)
* split notebook

* Lint
2025-12-08 13:57:20 +01:00
github-actions[bot] 32c53cdf96 chore: version packages (#1046) 2025-12-04 20:43:29 -06:00
Logan 71db318fc2 add tier/version to api (#1045) 2025-12-04 20:42:17 -06:00
George He dac0f79e51 Fix sheets API client (#1032) 2025-12-03 16:39:47 -06:00
github-actions[bot] 32487763d5 chore: version packages (#1043) 2025-12-03 14:52:26 -06:00
Daniel Bustamante Ospina 06c3c556e6 Add new fields to SpreadsheetParsingConfig and update validation tests (#1042) 2025-12-03 14:50:23 -06:00
github-actions[bot] e5dcaa83df chore: version packages (#1041)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-12-03 11:03:36 -08:00
Neeraj Pradhan 1b7198dc62 Bump llama cloud services and parse versions (#1040) 2025-12-03 10:39:35 -08:00
github-actions[bot] 9cfe074206 chore: version packages (#1039) 2025-12-02 12:16:50 -06:00
Logan ae30990ada line level bbox (#1038) 2025-12-02 12:12:17 -06:00
github-actions[bot] 8f1c359abc chore: version packages (#1037) 2025-12-02 09:50:07 -06:00
Logan 0a110de9c7 Dummy release (#1036) 2025-12-02 09:45:52 -06:00
github-actions[bot] d705b16923 chore: version packages (#1035) 2025-12-02 09:43:20 -06:00
Logan ca781132c8 No more presigned URLs by default (#1034) 2025-12-02 09:41:49 -06:00
Roman Isecke 7a68b0fb68 docs: add batch parse directory example notebook (#1009)
* create notebook to parse a batch of documents

* remove local dev code

* tidy

* don't git track the sample pdfs

* update notebook to use client

* add logic to fetch parse results using job id from batch item

* generate example for fetching results via parse job id

* fix linting

* convert notebook to use httpx rather than client for now

* fix linting
2025-12-01 13:57:18 -05:00
George He 87dec5433d Add timeouts to E2E GHA (#1031)
* Add timeouts

* Session timeouts too
2025-11-27 14:57:59 -08:00
Pierre-Loic Doulcet 99f4eba8d0 Pierre/more parse parameters (#1027)
* up python sdk

* bupmVErsion

* Update py/llama_cloud_services/parse/base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update py/llama_cloud_services/parse/base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-25 14:43:27 +01:00
github-actions[bot] 54561e2dd2 chore: version packages (#1025) 2025-11-24 16:41:22 -06:00
Logan Markewich bfaec79a8f changeset 2025-11-24 16:37:58 -06:00
Logan Markewich 3e0e522a6b update ts 2025-11-24 16:36:31 -06:00
Logan Markewich f70b6d87ec update py 2025-11-24 16:31:15 -06:00
Logan Markewich 693b5b83b1 improve llama-sheets example 2025-11-24 09:44:11 -06:00
Neeraj Pradhan ad38ef5cd7 Add notebook for tabular extraction (#1017) 2025-11-18 09:47:07 -08:00
Logan Markewich 4c4c6e6575 fix sheets test 2025-11-17 16:14:29 -06:00
github-actions[bot] 740b47d9dc chore: version packages (#1016)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-11-17 16:11:18 -06:00
Logan f3233deb2e propagate retrieval metadata to retrieved nodes (#1015) 2025-11-17 16:06:52 -06:00
github-actions[bot] fd45127678 chore: version packages (#1014)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-11-17 21:18:09 +01:00
Clelia (Astra) Bertelli 0506c88735 chore: rename classifyclient and keep it backward compatible (#1013)
* chore: rename classifyclient and keep it backward compatible

* chore: Replace ClassifyClient in notebooks

* chore: changesets
2025-11-17 21:16:23 +01:00
Logan 4bc9eb6c0d beta sheets API (#992) 2025-11-17 11:32:06 -06:00
Patricia 5a3dac655c Add support for custom metadata in file upload methods (#1012) 2025-11-17 11:18:11 -06:00
github-actions[bot] 519254efbe chore: version packages (#999)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-11-04 14:18:27 -05:00
Adrian Lyjak 6ab56b79f3 fix version breaking (#998) 2025-11-04 14:14:38 -05:00
Adrian Lyjak e020e3e2b1 Remove organization id from classify (#997) 2025-11-04 14:05:19 -05:00
Adrian Lyjak f293547910 destructured keyword params for classify (#996) 2025-11-04 14:04:41 -05:00
github-actions[bot] 662bc37462 chore: version packages (#995) 2025-11-03 20:15:50 -06:00
Neeraj Pradhan 9f1ef4ef1f Bump to version 0.6.78 (#994) 2025-11-03 20:11:18 -06:00
github-actions[bot] 1243573924 chore: version packages (#991) 2025-10-30 10:11:16 -06:00
Preston Carlson 407292b177 Fix: Return partial results on job failure (#990)
* Return partial result on failed job, especially job id

* Maintains NO_DATA_FOUND_IN_FILE throw behavior
2025-10-23 13:44:41 -07:00
Clelia (Astra) Bertelli a7df7c0912 docs: add llamaclassify demo (#989) 2025-10-23 17:38:57 +02:00
github-actions[bot] c758144bfe chore: version packages (#988)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-10-22 14:41:44 +02:00
Clelia (Astra) Bertelli fee516dd19 feat: add classify to ts sdk (#985)
* feat: add classify to ts sdk

* ci: changesets

* chore: camelCase for everyone; refactor: slimmer logic for fileContents/filePaths handling

* chore: implement claude suggestions
2025-10-22 14:39:20 +02:00
Neeraj Pradhan 032fbd5768 Add common SourceText class for classify/extract text inputs (#986) 2025-10-21 13:37:41 -07:00
Jerry Liu 970e864514 improve classify notebook (#983) 2025-10-20 10:07:35 -07:00
github-actions[bot] d0649ece6e chore: version packages (#982) 2025-10-16 16:58:29 -06:00
MartijnLeplae 5d4cabd843 Add ImageNode support in TypeScript (#969) 2025-10-16 16:56:28 -06:00
github-actions[bot] 9070a6ac16 chore: version packages (#981) 2025-10-15 12:01:34 -06:00
Bogdan Gheorghe 4f24f537f6 Add agressive table extraction argument (#980) 2025-10-15 11:57:34 -06:00
github-actions[bot] 8859a203e2 chore: version packages (#977) 2025-10-14 19:03:36 -06:00
dependabot[bot] b091364054 build(deps): bump astral-sh/setup-uv from 6 to 7 (#974) 2025-10-14 19:02:32 -06:00
dependabot[bot] 43b1a013ca build(deps): bump github/codeql-action from 3 to 4 (#973) 2025-10-14 19:02:20 -06:00
Logan f81532e7f2 safest types possible for parse (#976) 2025-10-14 19:02:07 -06:00
github-actions[bot] 986d3987d3 chore: version packages (#965)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-10-14 08:14:49 -06:00
Logan 1bf522311f fix default bbox values (#975) 2025-10-14 07:44:35 -06:00
Preston Carlson 24166dcfc8 Only escape single dollar sign in notebook md (#964)
* Limit escaping to lone dollar signs - preserve double dollar for latex equations

* Updated uv.lock via make lint

* Patch bump

* Unit test for _format_markdown_for_notebook

Test doesn't depend on getting real results/is just testing a string manipulation function, so inserting before other tests. Should move to its own file if we add additional formatting configurations
2025-10-07 08:06:03 -07:00
github-actions[bot] bfb7f3973f chore: version packages (#956)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-10-06 11:15:55 -04:00
dependabot[bot] 979f643c77 build(deps): bump actions/checkout from 4 to 5 (#961) 2025-10-06 09:12:38 -06:00
dependabot[bot] aefd89cf1b build(deps): bump actions/setup-python from 5 to 6 (#960) 2025-10-06 09:12:30 -06:00
dependabot[bot] 8ea2b2c64e build(deps): bump pnpm/action-setup from 3 to 4 (#959) 2025-10-06 09:12:20 -06:00
dependabot[bot] 4a9a2a21d8 build(deps): bump astral-sh/setup-uv from 3 to 6 (#958) 2025-10-06 09:12:08 -06:00
Logan e6a7939206 loosen packaging requirements (#962) 2025-10-06 09:11:57 -06:00
Adrian Lyjak 104a03e829 fix: re-enable js publishing (#963) 2025-10-06 11:10:46 -04:00
Terry Zhao 6e0f2f4ca0 citation can be null (#869)
* citation can be null

* Add changeset

---------

Co-authored-by: Terry Zhao <terryzhao@runllama.ai>
Co-authored-by: Adrian Lyjak <adrianlyjak@gmail.com>
2025-10-04 16:26:11 -04:00
dependabot[bot] 0708d11f8a Bump actions/setup-node from 4 to 5 (#909)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 5.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-04 16:21:50 -04:00
github-actions[bot] be19185503 chore: version packages (#954)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-10-03 20:14:04 -04:00
Adrian Lyjak 7571b0d6c4 Missed some things again with tag fixes (#955)
guh
2025-10-03 20:12:53 -04:00
Adrian Lyjak ad6734bf80 fixup tagging more better (#953)
* fix: correct private field type in py/package.json to be recognized by pnpm

* use packages more directly, make public

* add bump

* fix crash
2025-10-03 19:53:57 -04:00
github-actions[bot] 9ec2a8322e chore: version packages (#952) 2025-10-03 15:11:14 -06:00
Logan 51011b9f30 fix changeset harder (#951) 2025-10-03 15:09:58 -06:00
Logan 09805f9e15 swap changesets (#949) 2025-10-03 15:06:00 -06:00
Adrian Lyjak 8ced6f6eab fix: explicitly tag. I thought the action did this (#948) 2025-10-03 16:59:41 -04:00
Preston Carlson 081ddeca34 Escaping dollar signs in md output when running in a jupyter notebook (#945) 2025-10-03 14:52:26 -06:00
Adrian Lyjak 2460908789 Disable npm release (#946) 2025-10-03 16:13:16 -04:00
Adrian Lyjak c226d6a54c Fix more bugs in publishing (#944) 2025-10-03 11:16:43 -04:00
Adrian Lyjak 5d4c682eb2 fix: theres just one publish token (#943) 2025-10-03 10:56:10 -04:00
github-actions[bot] f72d3535c8 chore: version packages (#941)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-10-03 10:25:11 -04:00
Adrian Lyjak 1ea09a366e Update llama-cloud dep (#940) 2025-10-03 09:56:56 -04:00
Adrian Lyjak d4bbeb6389 ignore nvmrc (#942)
ignore npmrc
2025-10-03 00:21:32 -04:00
Adrian Lyjak d028397603 version and release via changesets (#849) 2025-10-03 00:08:52 -04:00
Emanuel Ferreira 35ea8476db docs: parse -> classify -> extract (#931) 2025-09-24 18:52:15 -03:00
Logan 3e5f7c4f1e Update parse.md 2025-09-24 11:35:13 -06:00
Adrian Lyjak 9d9b816644 Handle reasoning field conflict (#929)
* Handle reasoning field conflict

* update version to 0.6.69
2025-09-22 11:29:11 -04:00
Adrian Lyjak 83555f76e6 Handle validation errors for agent data retrieval (#928)
* feat: Add untyped agent data retrieval and handling

Introduces methods to retrieve agent data as untyped dictionaries,
handling validation errors gracefully. This allows for more flexible
data access when strict typing is not required or when data may be
malformed.

Co-authored-by: adrian <adrian@runllama.ai>

* Expose raw api result

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-09-22 11:28:49 -04:00
Adrian Lyjak 5edf5f914a Support creating indexes in a specified project_id (#924)
* Support creating indexes in a specified project_id

* Bump
2025-09-18 11:07:07 -04:00
Adrian Lyjak 22e4975cb2 Refactor agent fields in llama_cloud_services (#921) 2025-09-17 15:14:40 -04:00
Peter Rowlands (변기호) bc2f04379b py: bump version to v.0.6.66 (#920) 2025-09-16 19:34:18 +09:00
Peter Rowlands (변기호) f9f951d5d8 parse: expose spreadsheet_force_formula_computation option (#919) 2025-09-16 19:28:03 +09:00
Emmanuel Ferdman 355129fea5 Fix colab broken links (#750)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-09-14 23:10:21 +02:00
Adrian Lyjak d9aed80ded fix: v prefix goes deeper. Fix more (#899) 2025-09-08 17:45:06 -04:00
Pierre-Loic Doulcet c07d2d70a8 update parse package (#911) 2025-09-08 09:46:32 -06:00
Neeraj Pradhan ed6937a5a9 Fix uv sync; remove poetry lock (#906) 2025-09-05 17:13:31 -07:00
Neeraj Pradhan 34c15932a3 Bump version to 0.6.64 (#904) 2025-09-05 17:05:21 -07:00
Neeraj Pradhan b18ea96d11 Remove report generation related code from llama_cloud_services (#905) 2025-09-05 16:41:28 -07:00
Clelia (Astra) Bertelli 196ab827f5 fix: make ts release beautiful again (#902) 2025-09-05 10:41:39 -06:00
Peter Rowlands (변기호) ba4cb4d5e9 parse: expose page.slideSpeakerNotes (#889) 2025-09-05 15:48:44 +09:00
Adrian Lyjak 58d883b825 fix: "v" prefix being added to js versions (#898) 2025-09-04 15:39:27 -04:00
Adrian Lyjak 5fc5ebfc6c client unification (#895)
read from the shared client
2025-09-04 14:12:28 -04:00
Adrian Lyjak fe3e20fd53 Update version script, and unify the linting so that prettier is more consistent (#897)
Add version script, and unify the linting so that prettier is more consistent
2025-09-04 14:09:27 -04:00
Jerry Liu e7e59459ab getting started LlamaCloudIndex notebook (#891) 2025-09-02 14:52:39 -06:00
Logan Markewich f4d7c84e19 remove stale param 2025-09-02 13:37:16 -06:00
Yannis Panagis 9050a346e4 Added "SourceText" to __init__.py (#892) 2025-09-02 13:28:24 -06:00
Sourabh Desai 9690ccf4ea Fix tag push command in CONTRIBUTING.md (#894)
seems to be missing one little `v`
2025-09-02 10:49:14 -07:00
Sourabh Desai 97745f0f1c version bump to 0.6.63 (#893) 2025-09-02 10:36:51 -07:00
Sourabh Desai 61a696b9db add file names in return values (#888) 2025-08-29 15:55:18 -07:00
Sourabh Desai 3e01adaf0e add alternative builder method (#887)
* add alternative builder method

* fix test
2025-08-29 15:55:04 -07:00
Adrian Lyjak 37393b7e98 fix: Make env based api url overrideable (#881) 2025-08-20 20:51:09 -06:00
Jerry Liu ecd859a67c fix preset notebook: give outputs in markdown (#883) 2025-08-20 20:50:11 -06:00
Logan decca8e671 update all example notebooks (#882) 2025-08-20 20:49:53 -06:00
Jerry Liu 5ea0815187 add a starter notebook for llamaparse presets (#874) 2025-08-19 09:22:07 -07:00
Sourabh Desai cf149650f5 add acreate_classify_job (#878) 2025-08-18 15:36:01 -07:00
dependabot[bot] 4c6c231ea4 Bump actions/checkout from 4 to 5 (#875) 2025-08-18 12:58:31 -06:00
Jerry Liu 5955b26509 fix composite retriever (#873)
* cr

* cr
2025-08-18 11:23:24 +02:00
Adrian Lyjak 31f54bca55 feat: support passing a pre-uploaded file directly (#871)
* feat: support passing a pre-uploaded file directly

* bump version
2025-08-14 15:32:55 -04:00
Adrian Lyjak b1ae7bb736 handle extract error field (#870) 2025-08-14 11:08:50 -04:00
Adrian Lyjak 31fe12e0da parallelize e2e tests (#867)
parallelise e2e tests
2025-08-14 10:00:12 -04:00
Terry Zhao 90b0c5e295 feat: export ExtractedFieldMetadata and ExtractedFieldMetadataDict types (#868)
* feat: export ExtractedFieldMetadata and ExtractedFieldMetadataDict types from beta/agent module

- Add missing type exports for ExtractedFieldMetadata and ExtractedFieldMetadataDict
- These types are used by ExtractedData interface but were not accessible externally
- Fixes issue where dependent types could not be imported separately

* bump version

* fix lint

---------

Co-authored-by: Terry Zhao <terryzhao@runllama.ai>
2025-08-13 14:43:48 -07:00
Adrian Lyjak 79fe1930cf Re-order extraction metadata union for better parsing (#865)
* Re-order args so that pydantic doesn't parse nested dict to a empty extraction result

* Use a citations array instead
2025-08-13 16:22:06 -04:00
Sourabh Desai ab225c3eab Classifier SDK (#837)
* add files client

* add classification SDK (beta/experimental)

* lint

* lint

* update files client

* add polling timeout

* move e2e test settings to conftest.py

* unused params

* use e2e settings class

* make org id optional

* ordering params

* fix tests

* add sync support
2025-08-13 09:50:39 -07:00
Sourabh Desai 6f1de75909 fix presigned urls + add very necessary test (#864) 2025-08-12 15:28:54 -07:00
Sourabh Desai 230ed64e41 missing await (#863)
missed this await
2025-08-12 13:54:34 -07:00
Logan ef126c3a93 remove print (#861) 2025-08-11 17:42:55 -07:00
Logan 51a7534733 support llama parse audio (#859) 2025-08-11 12:57:01 -07:00
Sourabh Desai 4f5d2bde13 add files client (#836)
* add files client

* lint

* update files client

* move e2e test settings to conftest.py

* unused params

* make org id optional
2025-08-08 15:54:00 -07:00
Clelia (Astra) Bertelli 3d05fe5d77 chore: bump ts version for parse (#855) 2025-08-08 11:43:28 +02:00
Clelia (Astra) Bertelli c16ca673af feat: add parse and getTables methods to LlamaParseReader (#851)
* feat: add parse and getTables methods to LlamaParseReader

* feat: add tests

* fix: loop logic to fix test 🙈

* chore: implement suggestions
2025-08-08 11:35:54 +02:00
Neeraj Pradhan 6619034bce Bump version to 0.6.56 (#853) 2025-08-07 15:42:19 -07:00
Neeraj Pradhan c56fb5d8f7 Update docs for extract (#852)
* Update docs for extract

* add more details on async
2025-08-07 13:59:53 -07:00
Peter Rowlands (변기호) b407a5edb5 parse: expose HTML output for result table items when possible (#850) 2025-08-07 08:44:09 -06:00
Clelia (Astra) Bertelli e6a27d17fb wip: implementing Extract in TS (#839)
* wip: implementing Extract in TS

* feat: main implementation (untested)

* ci: lint

* feat: add stateless api support and retries mechanisms

* refactor: working LlamaExtract + tests

* refactor: working LlamaExtract + tests

* correct stateless extraction test

* correct stateless extraction test

* chore: intervals are now in seconds, extractStateless -> extract, support for multiple file types

* fix: infer file type

* fix: infer file type

* fix: change agent name

* docs: adding example

* docs: add link to example in extract.md
2025-08-07 12:18:58 +02:00
Peter Rowlands (변기호) 34077fd479 py: bump version to 0.6.55 (#846) 2025-08-06 13:02:35 +09:00
Peter Rowlands (변기호) 7a68ad5a7f utils/parse: add method to check pypi for package updates (#844)
add utils method to check pypi for package updates
2025-08-06 12:36:42 +09:00
Neeraj Pradhan 74a1b6c2f2 Update Extract with stateless API (#840) 2025-08-05 13:33:07 -07:00
Clelia (Astra) Bertelli 9a90ae5264 fix: run e2e only on 3.12 (#838)
* fix: run e2e only on 3.12

* ci: workflow name and linting

* ci: job name correction 🤦

* fix: test e2e only on PR

* chore: differentiate between e2e and non-e2e tests

* ci: run all tests using explicit patterns

* chore: moving tests

* fix: change name to test_index in unit_tests
2025-08-05 21:45:16 +02:00
Clelia (Astra) Bertelli 310c1bc105 docs: move ts examples in their own top-level folder (#845) 2025-08-05 19:06:32 +02:00
Marcus Schiesser cd20b29299 chore: build before releaes (#843)
* chore: add e2e tests and use monorepo for TS

* chore: build main package to run e2e tests

* chore: add build before releasing

* fix linting

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
2025-08-05 10:09:27 +02:00
Neeraj Pradhan 0cb7aeb81c Add claude code workflow with restricted access (#841) 2025-08-04 17:02:41 -07:00
Marcus Schiesser 98db5eeeae chore: remove llamaindex dep (#826)
* chore: remove llamaindex dep

* chore: remove all dependency on llamaindex

* feat: restructure docs/examples

* chore: remove llamaindex dep

* chore: remove all dependency on llamaindex

* simplify querytool

* fix tests

* revert version

* add missing import

* remove unused file

* feat: change default description to adapt it to LlamaCloud Index

---------

Co-authored-by: Clelia (Astra) Bertelli <clelia@runllama.ai>
2025-08-04 11:48:24 +02:00
Adrian Lyjak c21cb34ff6 fix: Fix bugs in ExtractedFieldMetadata parser (#834)
* fix: Fix bugs in ExtractedFieldMetadata parser

- Wasn't recursing through lists properly
- Fix field names, names changed or I copied incorrectly
- Handle reasoning on a parent object

* version script fixes

* update versions

* skip the unrelated failing test for now
2025-08-01 16:08:16 -04:00
Adrian Lyjak e28c7b9d92 Copy extracted citations to the new repo (#832)
* Copy extracted citations to the new repo

* fix spell check

* ignore examples too

* tweak timeout

* add changes to github actions

* shrug
2025-07-31 19:34:24 +02:00
Clelia (Astra) Bertelli ee4e565604 Example Notebooks (#829)
* fix: add symlink to avoid breaking links

* feat: copy examples
2025-07-31 16:54:12 +02:00
Clelia (Astra) Bertelli 6dbb089f4c delete examples (#830) 2025-07-31 16:53:54 +02:00
Logan Markewich c4b694db8d update symlink 2025-07-31 08:44:30 -06:00
Clelia (Astra) Bertelli 97f428ad06 fix: add symlink to avoid breaking links (#828) 2025-07-31 08:39:44 -06:00
Clelia (Astra) Bertelli ef92ee5408 feat: add ts examples (clean) (#822)
* feat: add ts examples (clean)

* chore: correct title
2025-07-31 11:25:29 +02:00
Logan d094668d03 Update extract.md 2025-07-30 14:58:25 -06:00
Logan 5bb5fc1625 Update parse.md 2025-07-30 14:58:09 -06:00
Logan 1d57e0071d Update parse.md 2025-07-30 14:57:31 -06:00
Logan 2a344c4f5c Update extract.md 2025-07-30 14:56:33 -06:00
Logan ce02559b8d Update README.md (#824) 2025-07-30 14:55:21 -06:00
Harshit Budhiraja e42746e372 docs(readme): update hyperlinks to correct targets (#820) 2025-07-30 14:53:43 -06:00
Clelia (Astra) Bertelli 3149dfd03a fix: no git checks on pnpm publish (#823) 2025-07-30 21:25:23 +02:00
Clelia (Astra) Bertelli e499fdbdab fix: add release to NPM (#819) 2025-07-30 20:55:41 +02:00
Clelia (Astra) Bertelli e57df39248 Merge index into main (#821)
* wip: monorepo changes

* fix ci for the time being

* fix ci for the time being pt2

* wip: first cloud refactoring for ts

* chore: restore original package

* fix: imports, package.json, tsconfig.json, client, reader

* feat: adjustments after local testing

* ci: github actions for typescript

* ci: typescript ci

* ci: nvmrc 🤦

* ci: remove cache 🤦

* ci: actions

* ci: actions (i lost count)

* ci: pnpm run format

* ci: pnpm run format

* chore: migrate llama-parse to uv

* add tests

* remove unneeded readme

* update workflows

* feat: modify py release workflow, adding uv version, bump version for llama-cloud-services to latest

* uv lock

* ci: python tests all tests

* fix: lock file pulling in wrong version of numpy

* feat: add index to llama-cloud-services (#817)

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
Co-authored-by: Adrian Lyjak <adrianlyjak@gmail.com>
2025-07-30 19:46:36 +02:00
Clelia (Astra) Bertelli 09b192b98b Adding TS llama-cloud-services and moving llama-parse to uv (#811)
* wip: monorepo changes

* fix ci for the time being

* fix ci for the time being pt2

* wip: first cloud refactoring for ts

* chore: restore original package

* fix: imports, package.json, tsconfig.json, client, reader

* feat: adjustments after local testing

* ci: github actions for typescript

* ci: typescript ci

* ci: nvmrc 🤦

* ci: remove cache 🤦

* ci: actions

* ci: actions (i lost count)

* ci: pnpm run format

* ci: pnpm run format

* chore: migrate llama-parse to uv

* add tests

* remove unneeded readme

* update workflows

* feat: modify py release workflow, adding uv version, bump version for llama-cloud-services to latest

* uv lock

* ci: python tests all tests

* fix: lock file pulling in wrong version of numpy

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
Co-authored-by: Adrian Lyjak <adrianlyjak@gmail.com>
2025-07-30 17:59:08 +02:00
Adrian Lyjak 13f01a0621 Adding support for page citations, and refactor the confidence into the field metadata (#815) 2025-07-30 10:55:29 -04:00
Javier Torres cf879a1a58 Bump llama-cloud version (#814) 2025-07-28 16:06:31 -05:00
Tuana Çelik fcdf2ab63e Fixes to multimodal report generation (#809) 2025-07-23 16:28:53 -06:00
Adrian Lyjak 083d8109c2 Make versioning a little easier, and fix llama_parse version (#808)
* Make versioning a little easier

* fix up ci
2025-07-21 18:49:07 -04:00
Adrian Lyjak 89cfc8b25f feat: default to _public agent data (#803)
* feat: default to _public agent data
* version bump
2025-07-21 15:58:03 -04:00
Peter Rowlands (변기호) c46e157f92 parse: expose preserve_very_small_text option (#806) 2025-07-21 14:19:15 +09:00
Peter Rowlands (변기호) 05d6026d37 bump version to v0.6.50 (#802) 2025-07-18 18:59:25 +09:00
Peter Rowlands (변기호) 8e98d5c146 parse: expose functionality to get raw job results (#801)
* add LlamaParse.get_result()

* add JobResult.get_text/get_markdown/get_json

* add tests
2025-07-18 18:50:29 +09:00
Adrian Lyjak 3f311c0669 Bump v0.6.49 (#797) 2025-07-16 19:42:09 -04:00
Adrian Lyjak b1a2f9d42b Add new method to fetch the full, non-paginated markdown (#796)
Add new method to fetch the full, non-paginated markdown for proper merge_tables_across_pages_in_markdown support
2025-07-16 19:29:57 -04:00
Neeraj Pradhan 142f55c94c Update to version 0.6.48 (#795)
* Update to version 0.6.48

* pin version

* poetry lock

* adjust warnings

* collect all agents for cleanup
2025-07-16 13:24:44 -07:00
Clelia (Astra) Bertelli 230a110e52 chore: vbump to 0.6.47 and example notebook (#794)
* chore: vbump to 0.6.47 and example notebook

* chore: update llama-parse pyproject.toml
2025-07-16 19:08:44 +02:00
Clelia (Astra) Bertelli 83e2b031cd feat: add table extraction for LlamaParse as CSV files (#793)
* feat: add table extraction for LlamaParse as CSV files

* chore: poetry lock

* chore: add tests

* fix: handle the case where no tables are present

* chore: implement suggestions
2025-07-16 17:08:09 +02:00
Adrian Lyjak 4844e26e5c Improve Agent Data interface, and add file related fields to extracted data for file tracking (#785)
Add file related fields for file tracking. Simplify API
2025-07-09 14:27:24 -04:00
Pierre-Loic Doulcet 70a049af3c merge_tables_across_pages_in_markdown parse parameter (#786)
* merge_tables_across_pages_in_markdown parse parameter

* base.py
2025-07-09 19:03:48 +02:00
Adrian Lyjak dc11776c86 Add nicer hand-written agent data interface (#782)
* Add nicer hand-written agent data interface

* bump to 0.6.44
2025-07-08 17:49:00 -04:00
Logan 2448a42b90 relax pydantic job object (#784) 2025-07-08 12:12:56 -06:00
Neeraj Pradhan c75a900174 Bump up version to 0.6.42 (#783) 2025-07-08 09:16:46 -07:00
Peter Rowlands (변기호) 2fb7adfe0e parse: loosen PageItem.rows type hint (v0.6.41) (#776)
* parse: loosen PageItem.rows type hint

* bump version to 0.6.41
2025-06-30 21:47:40 +09:00
Pierre-Loic Doulcet dc82270724 header footer control in llamaparse (#775) 2025-06-30 16:02:59 +08:00
Neeraj Pradhan d880a48dd0 Bump to version 0.6.39 (#772)
* Bump to version 0.6.39

* lock file update
2025-06-27 16:04:40 -07:00
Logan 7567e8b45e except one more error type (#771) 2025-06-27 10:17:57 -06:00
Neeraj Pradhan 0d59a90151 Relax tenacity version; bump up version to 0.6.37 (#769) 2025-06-25 15:32:20 -07:00
Neeraj Pradhan 98ad550b1a Manage extract agent lifecycle in pytest (#766) 2025-06-24 08:59:38 -07:00
Neeraj Pradhan b58f43ce9f Bump up version to 0.6.36 (#763) 2025-06-23 14:26:05 -07:00
Neeraj Pradhan acf6adcd91 Make job fetching more robust to connection errors (#764) 2025-06-23 13:17:28 -07:00
Neeraj Pradhan daf6576c3c Bump version to 0.6.35 (#762) 2025-06-20 09:33:21 -07:00
Logan 8caa4defa6 fix partition (#758) 2025-06-16 17:37:52 -06:00
Pierre-Loic Doulcet 26918b8de4 add high_res_ocr to the package (#757) 2025-06-16 16:28:23 +08:00
Pierre-Loic Doulcet 6fb5ebe2f9 6.32 warning on unused parameters (#755) 2025-06-12 22:35:48 -06:00
dependabot[bot] c0aa67995b Bump requests from 2.32.3 to 2.32.4 in /llama_parse (#754) 2025-06-10 18:14:44 -06:00
dependabot[bot] 9f841f8328 Bump tornado from 6.4.2 to 6.5.1 in /llama_parse (#753) 2025-06-10 18:14:35 -06:00
dependabot[bot] 99c75eece9 Bump h11 from 0.14.0 to 0.16.0 in /llama_parse (#752) 2025-06-10 18:14:27 -06:00
Logan 57d2586ee3 v0.6.31 (#751) 2025-06-10 17:58:36 -06:00
Jerry Liu 4280a43ec8 add multi-fund analysis notebook (#739) 2025-06-07 11:25:25 -07:00
Neeraj Pradhan 7f1082bbb2 Bump to version 0.6.30 (#748) 2025-06-05 14:34:20 -07:00
Simon Suo 57cfc45804 Directly pass None project_id (#743) 2025-06-05 14:16:54 -07:00
Soumil.Binhani 30e8913875 0.6.29: Standardize the parsing input format for both .aget_json() and .aload_data() (#745) 2025-06-05 10:58:07 -06:00
Logan 0ce6d4d7a4 more optional types marked (#747) 2025-06-05 10:50:29 -06:00
Peter Rowlands (변기호) 584ba8d48e 0.6.28: fix job result format after partitioning changes (#741)
* parse: fix job result format

* bump to 0.6.28
2025-06-02 15:25:30 -07:00
Peter Rowlands (변기호) 925805ee11 parse: support partitioning files before parsing (#709)
* parse: add utils for handling target_pages

* parse: support partitioning docs into multiple parse jobs

* tests: add tests for partitioned parse

* drop unneeded get_job_result call

* add parse JobFailedException and expected error handling

* bump to 0.6.27
2025-06-02 12:27:58 -07:00
Logan 76fb73c971 v0.6.26 (#740) 2025-06-02 09:59:45 -06:00
Abhik Bhattacharjee 6d19ea9ac0 parse: fix the "model" parameter mismatch between playground and Python client (#737) 2025-06-02 09:35:30 -06:00
Pierre-Loic Doulcet 90431090e9 0.6.25 outlined_table_extraction (#736) 2025-05-30 11:37:21 +02:00
Neeraj Pradhan 6dff35b204 Add notebook for Form 4 extraction (#731)
* Add notebook for Form 4 extraction

* fix comments

* heavier caching; add mermaid diag

* add output directory

* save notebook
2025-05-29 18:31:56 -07:00
Logan e634c7978d v0.6.24 (#732) 2025-05-28 20:11:51 -06:00
Neeraj Pradhan 7a9e99bba2 Bump to version 0.6.23 (#729) 2025-05-20 09:43:06 -07:00
Adrian Lyjak efcdd4405b Pass through verify and timeout config to the extraction agent (#726) 2025-05-17 12:51:16 -07:00
Javier Torres bf3614690f Remove credits from parse metadata (#720) 2025-05-09 16:03:09 -05:00
Logan 7463e00da3 v0.6.22 (#718) 2025-05-08 11:44:41 -06:00
Tuana Çelik cbe9de0c57 Adding example for extracting with citations (#716)
* Adding example for extracting with citations

* removing TOC and installation output
2025-05-06 23:32:17 +02:00
Logan a023507d42 even more optional (#711) 2025-05-01 15:52:38 -06:00
Peter Rowlands (변기호) e48f544ddc parse: fix num_workers/parse job batching (#708) 2025-05-01 09:30:35 -06:00
Logan 4aa7ad5642 v0.6.20 (#707) 2025-04-29 08:53:55 -06:00
Sacha Bron c39cdbcd01 v0.6.19 (#706) 2025-04-29 12:28:21 +02:00
Pierre-Loic Doulcet 71eaa8bcc6 add auto_mode_configuration_jon for llamaParse (#704) 2025-04-29 12:23:03 +02:00
Pierre-Loic Doulcet 1e1cbdfc79 add support for presets (#703) 2025-04-29 11:54:54 +08:00
Logan cc8af4a43a make original height + width optional in the parse result (#702) 2025-04-27 18:31:35 -06:00
dependabot[bot] 43fbd48ab8 Bump actions/setup-python from 4 to 5 (#701)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-27 13:08:44 -06:00
dependabot[bot] 5ec66e9452 Bump actions/checkout from 3 to 4 (#700)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-27 13:08:31 -06:00
Scott Brenner 211521c82e Dependabot configuration to update actions in workflow (#698) 2025-04-27 12:52:11 -06:00
Scott Brenner 4ddaab1efb Refactor CodeQL workflow (#699)
* Refactor CodeQL workflow

* Update .github/workflows/codeql.yml
2025-04-27 12:51:56 -06:00
Neeraj Pradhan 53e5ce2e83 Bump to v0.6.16 (#697) 2025-04-25 14:39:52 -07:00
Neeraj Pradhan 9f4bd1cb64 Update to latest version of llama-cloud (#696)
update to latest version of llama-cloud
2025-04-25 14:14:49 -07:00
Logan 456863752b small enum nit for FailedPageMode (#693) 2025-04-23 21:34:26 -06:00
Pierre-Loic Doulcet c2dc34bbd6 Page error parameters (#691) 2025-04-23 20:47:57 -06:00
Logan fcabb04baf skip llama-report tests in cicd (#692)
* skip llama-report tests in cicd

* skip llama-report tests in cicd
2025-04-23 20:47:00 -06:00
Sacha Bron 8e7c32d3d6 Add markdown_table_multiline_header_separator support (#683)
* Add markdown_table_multiline_header_separator support

* Lint
2025-04-15 17:39:46 +02:00
Neeraj Pradhan 7e3013d914 Use unique filename to avoid db collisions later (#682)
* Use unique filename to avoid db collisions later

* add xfail marker to test_create_and_delete_report
2025-04-11 11:03:15 -07:00
Logan 4a664c33d2 parse readme nits (#681) 2025-04-10 19:25:06 -06:00
Logan 6d049ee2e4 v0.6.12 (#680) 2025-04-10 19:18:49 -06:00
Logan fa73e73664 new result object (#650) 2025-04-10 19:17:23 -06:00
Neeraj Pradhan bf67ee6056 Update docs for LlamaExtract (#679) 2025-04-10 12:16:32 -07:00
Neeraj Pradhan a1abef2ee9 Bump version to v0.6.11 (#678) 2025-04-10 11:23:06 -07:00
Neeraj Pradhan a753e01d3c Support text as input directly in the SDK (#676) 2025-04-09 21:40:56 -07:00
Logan 9b15065b24 v0.6.10 (#677) 2025-04-09 19:30:59 -06:00
Pierre-Loic Doulcet 6e4150537c Add compact_markdown_table parameter (#675) 2025-04-09 19:19:19 -06:00
Neeraj Pradhan 233d715a14 Better connection management on llamaextract client (#674) 2025-04-09 14:26:52 -07:00
Neeraj Pradhan 77ac385dfe Fix bytes input for LlamaExtract (#673)
* Fix bytes input for LlamaExtract

* backwards compatibility

* compat python 3.9
2025-04-09 10:37:22 -07:00
Neeraj Pradhan 53b78fcd7d Rename test endpoint to match functionality (#668) 2025-04-08 17:42:20 -07:00
Jerry Liu 16f81bd7ee add due diligence notebook (#670) 2025-04-08 09:13:11 -07:00
Marplex 0ee049fd11 Add layout agent mode visual citation demo notebook (#672) 2025-04-07 09:54:06 -06:00
Neeraj Pradhan 7dba17e5bc Update extract.md (#671) 2025-04-06 22:18:03 -07:00
Jerry Liu eeb678b937 solar panel extraction workflow (#667)
* cr

* cr

* cr
2025-04-02 17:28:13 -07:00
Emanuel Ferreira fe4eb664fd chore: add base url documentation (#666)
* wip

* newline

* wip

* docs
2025-04-01 18:43:17 -03:00
Jerry Liu 257720e443 fix notebook (#665)
cr
2025-04-01 08:05:34 -07:00
Jerry Liu e7afaedf3e create llamaextract demo with lm317 datasheet (#664) 2025-03-31 17:38:24 -07:00
Neeraj Pradhan b66b47a708 Bump to version 0.6.9 (#663)
* Bump to version 0.6.8

* add banks as dep

* Add platformdirs to poetry

* Fix version number
2025-03-28 17:07:46 -07:00
George He fe485ff62e fix:Add retry handling to parse and backoff patterns - catching 5XX errors and HTTP errors (#648)
* Add parse retry logic

* Update code cleanliness

* Update errors

* Fix lint

* Fix backoff strategies

* Update docs

* Fix errors

* Add base
2025-03-26 12:09:56 +01:00
Pierre-Loic Doulcet 1ebe1cee67 Add new parameter, fix parse_mode (#660)
* update with new parameters

* lint
2025-03-25 11:14:37 +01:00
Neeraj Pradhan e9252eb48a Update notebook for extract (#658) 2025-03-22 09:34:40 -07:00
Neeraj Pradhan dad7728135 Bump to version 0.6.7 (#656) 2025-03-21 21:26:54 -07:00
Neeraj Pradhan c5111e3335 Revert httpx_client as argument (#657) 2025-03-21 21:16:56 -07:00
Neeraj Pradhan bbbdb98362 Add provision for custom httpx client for LlamaExtract (#654) 2025-03-21 11:37:40 -07:00
Neeraj Pradhan 60cdc2af84 Add xfail for timeout errors in report gen (#655) 2025-03-21 11:06:49 -07:00
Neeraj Pradhan 344c20f331 Bump up version for release (#652) 2025-03-18 15:54:32 -07:00
Neeraj Pradhan 2b0496e947 Update llama cloud for extract endpoints (#651) 2025-03-18 15:43:43 -07:00
Laurie Voss 6c63dba6fb Typos and removing staging URL (#647) 2025-03-13 08:11:09 -07:00
Neeraj Pradhan 734c021a2e Add notebook for extraction from SEC 10-K/Q filings (#646)
* Add notebook for extraction from SEC 10-K/Q filings

* Add notebook for 10 k/q extraction

* Remove unnecessary cell

* fix file link

* fix code rendering

* Add notes for clarity

* fix notes
2025-03-12 20:42:17 -07:00
Neeraj Pradhan eeb034896f Bump to version 0.6.5 (updating llama-cloud dependency) (#645)
* Bump to version 0.6.5 (updating llama-cloud dependency)

* fix other endpoints
2025-03-06 18:22:42 -08:00
Sacha Bron 4c977e8384 Bump version 2025-03-06 17:04:56 +01:00
Sacha Bron c6137713c7 Add adaptive_long_table option (#638) 2025-03-04 22:42:05 +01:00
Neeraj Pradhan fd4b1893f1 Bump version to v0.6.3 (#636) 2025-02-26 15:09:39 -08:00
Neeraj Pradhan e542e6136b Update README.md (#635) 2025-02-26 15:41:19 -06:00
Neeraj Pradhan 393451e304 Add LlamaExtract to llama-cloud-services (#628) 2025-02-25 18:17:29 -08:00
Logan 5084ba27ab v0.6.2 (#632) 2025-02-25 18:35:44 -06:00
Pierre-Loic Doulcet c82771f841 add new parsing mode and prompt parameters (#622) 2025-02-25 18:24:04 -06:00
Logan dc6860535a fix publish flow (#617) 2025-02-24 22:59:02 -10:00
Logan c872617b4e add organization id and project id as args (#616) 2025-02-11 17:46:50 -06:00
Jen Person 47c8682761 fixing colab links (#611) 2025-02-10 11:29:02 -06:00
Jerry Liu 683400788b add gemini2 flash notebook (#606) 2025-02-07 14:48:40 -06:00
Logan 05065a8329 v0.6.0 (#603)
* v0.6.0

* nit release

* nit
2025-02-06 17:39:40 -06:00
Logan 1ae4d2bbc7 Refactor into llama-cloud-services (#597) 2025-02-06 16:15:57 -06:00
Sacha Bron ae38f406fd fix release pipeline no-dev issue (#592) 2025-01-24 15:17:01 +01:00
Pierre-Loic Doulcet 4897d01cb0 add new formatting instruction parameters (#582)
* add new formatting instruction parameters

* bump version

* wip

* s3 region

* update test
2025-01-22 15:56:57 +01:00
Logan bd7b563463 v0.5.19 (#569) 2024-12-27 13:07:01 -06:00
apostoli 530241dd0b Stoli/feat/connection handling (#568) 2024-12-27 11:34:00 -06:00
Pierre-Loic Doulcet 6338641107 Extract layout, audio files (#557) 2024-12-18 16:29:17 +01:00
Bharath Lakshman Kumar 6d62fb89c3 Fix docstring for aget_xlsx method (#551)
Updated docstring to describe xlsx download instead of image download
2024-12-13 20:35:18 +05:30
Ravi Theja 7d4df3b6e5 Add cookbook for parsing instructions (#550) 2024-12-13 06:44:17 -08:00
Ravi Theja bc28db5b92 Update cache parameter (#548) 2024-12-11 16:15:59 +01:00
Jerry Liu f78186c0f7 update auto-mode (#545) 2024-12-09 16:13:09 -06:00
Laurie Voss e3292f5566 Expanding auto mode notebook with strings and regex triggers (#544) 2024-12-09 12:03:09 -08:00
Jerry Liu 58f980f411 auto-mode notebook (#540)
Co-authored-by: Laurie Voss <github@seldo.com>
2024-12-09 08:59:21 -08:00
Ravi Theja 4740d0611d Add get charts function (#542)
* Add get charts function

* code refactoring

* solve linting

* Add cookbook
2024-12-09 21:28:48 +05:30
Laurie Voss 3651a10e80 JSON mode tour notebook (#531) 2024-12-06 14:21:15 -08:00
Pierre-Loic Doulcet 483b51c51c Add support for html_remove_navigation_elements. (#532) 2024-12-06 12:05:46 +01:00
Ravi Theja cdbddef86d Add demo videos notebooks (#529) 2024-12-05 08:38:34 -08:00
Pierre-Loic Doulcet 3690109abf Add more parameters (#525)
* add after revert

* 3.8 so numpy work

* change defaults

* change requested

* change requested
2024-12-04 15:39:00 +01:00
Pierre-Loic Doulcet 2e322b4fc8 Revert "Add more paramerters"
This reverts commit 735e5f3ddc.
2024-12-04 10:20:07 +01:00
Pierre-Loic Doulcet 735e5f3ddc Add more paramerters 2024-12-04 10:17:08 +01:00
Logan e4cb4c75e5 add test for downloading images (#506) 2024-11-21 13:08:29 -06:00
Jerry Liu 1693deff72 dynamic section retrieval nb (#484) 2024-11-13 13:29:30 +01:00
Jerry Liu 3270f1228d multimodal report generation image (#461)
* cr

* cr
2024-11-13 13:28:07 +01:00
Pierre-Loic Doulcet eeabf48d29 add input url and http_proxy (#475) 2024-11-12 12:56:58 -06:00
Pierre-Loic Doulcet 89348aa8e5 add xlsx support (#472) 2024-11-01 10:09:17 -06:00
Thiago Salvatore 3ab2ce27b5 Add PurePosixPath to list of allowed file-paths (#464) 2024-10-25 10:45:47 -06:00
Sacha Bron 265261862f Add continuous_mode (#460) 2024-10-22 19:45:46 +02:00
Sacha Bron 66cf052b8c Update issue templates (#457)
* Update issue templates

* Update issue templates
2024-10-21 19:51:46 +02:00
Jerry Liu 2ca2d81e58 fix RFP example (#455) 2024-10-21 09:13:24 -07:00
Sacha Bron 951ba4dfd8 Release is_formatting_instruction parameter (#446)
* Release is_formatting_instruction parameter

* Add annotate links
2024-10-17 12:29:05 +02:00
Adam Reichert 386d210e8b CLI Testing Tool for Parsing Results to Standard Output (#363) 2024-10-16 12:40:00 -06:00
Sacha Bron 9321602845 Add missing parameters (#441) 2024-10-15 10:57:32 -06:00
Jerry Liu 26c06353f0 Add RFP Response generation workflow (#438) 2024-10-14 08:45:04 -07:00
Jerry Liu 62cf12d6eb add multimodal RAG pipeline with contextual retrieval (#429) 2024-10-06 15:25:57 -07:00
Logan 253ee61463 improve error handling for jobs (#426) 2024-10-02 18:57:46 -06:00
Sourabh Desai 2ccd2a9397 Update README.md to convey need to specify extra_info["file_name"] (#417) 2024-09-29 17:07:12 -07:00
Jerry Liu c139e8e3e6 fix excel notebook (#416) 2024-09-24 17:11:33 -07:00
Ravi Theja 6e6e96c422 Update excel rag with o1 notebook (#415) 2024-09-24 07:42:00 -07:00
Jerry Liu b677e5226d nit: move o1 excel notebook (#414) 2024-09-23 10:51:51 -07:00
Ravi Theja df723584b6 Compare Excel RAG with o1 models (#409) 2024-09-23 10:42:47 -07:00
Sacha Bron efe06ffff0 Bump to v0.5.6 2024-09-19 14:33:30 +02:00
Pierre-Loic Doulcet 6ba052d58f add premium mode support (#406) 2024-09-18 11:48:45 +02:00
Sacha Bron 8cf52058b5 Remove JSON from valid result types (#400) 2024-09-18 11:48:22 +02:00
Jerry Liu 1bae09126c fix multimodal RAG over slide deck (#402) 2024-09-17 13:08:10 +08:00
Pierre-Loic Doulcet bbbae9de9d do not attach a filepath when a stream of bytes is passed (#394) 2024-09-10 11:53:53 -06:00
Thiago Salvatore 7cb6d06316 Enable support for custom filesystem (#117) 2024-09-10 10:39:46 -06:00
Jerry Liu bca5492829 update README (#386) 2024-09-09 17:31:58 -06:00
Sourabh Desai f6a4d8681f bump to 0.5.3 (#388) 2024-09-09 11:06:58 -06:00
Sourabh Desai fd3836ec95 allow custom httpx client (#384)
* allow custom httpx client

* split into aget_images + unit test

* typo
2024-09-07 10:51:18 -07:00
Ravi Theja f304c2dc08 Add timeout to the image request using httpx (#378) 2024-09-04 11:46:30 -06:00
Simon Suo f13a1a2fc3 Support take_screenshot (#372)
* wip

* wip]

* wip
2024-08-28 22:24:14 -07:00
Pierre-Loic Doulcet df1453e30c Update README.md 2024-08-28 09:28:49 +08:00
Pierre-Loic Doulcet bac204f800 Update README.md 2024-08-28 09:28:31 +08:00
Pierre-Loic Doulcet dbf24a7daa Update README.md 2024-08-28 09:26:00 +08:00
Logan Markewich 6a29b1ac96 update to use v0.11.0 of core 2024-08-22 20:48:33 -06:00
Jerry Liu 6c700d9e0f add report generation agent (#349) 2024-08-16 09:27:23 -07:00
Jerry Liu ab69e87c2a add report generation example (#340) 2024-08-10 00:30:28 -07:00
Jerry Liu d1f97531dc small edits (#342)
cr
2024-08-09 14:22:11 -07:00
Jonathan Liu 17ebbca6ea Adds more use case notebooks to multimodal parsing (#330)
* add auto insurance example

* adds additional claims to insurance example

* add -o flag to unzip

* add example for legal docs

* adds product manual use case

* revert gitignore

* Adds explanation to insurance and legal rag
2024-08-08 17:37:25 -07:00
Jerry Liu 08ddaaaa2f add gpt-4o-mini example (#316) 2024-07-25 17:11:02 -07:00
Sacha Bron 7515fe5f3e Update issue templates 2024-07-19 15:10:41 +02:00
Sacha Bron cd49dae7ed Update issue templates 2024-07-17 23:40:14 +02:00
Sacha Bron 2977f56061 Exclude Github generated files from pre-commit 2024-07-17 23:38:08 +02:00
Adam Reichert 8938286862 Modify _get_sub_docs to use Custom Separator (#254)
Move _get_sub_docs to private function
2024-07-17 10:04:03 -07:00
Pierre-Loic Doulcet 7b90d03f28 Pierre/fix page separator (#297)
* change page_separator logic add page_prefix and page_suffix

* up

* lint

* bump version
2024-07-17 19:03:21 +02:00
Sacha Bron 9dfe4d6d79 Update issue templates 2024-07-17 17:28:38 +02:00
Sacha Bron 3a781a453e Update issue templates 2024-07-17 16:32:20 +02:00
Sacha Bron e23487b1d8 Update issue templates 2024-07-17 16:25:00 +02:00
Sacha Bron ccee75721b Update issue templates 2024-07-17 16:23:26 +02:00
Sacha Bron 0a4147116c Update issue templates 2024-07-17 16:21:20 +02:00
Jerry Liu e58b40b34c nit fix multimodal (#292) 2024-07-16 09:17:36 -07:00
Jerry Liu 0db05b9b96 Add sonnet cookbook + llamaparse fixes (#289) 2024-07-16 09:16:24 -07:00
Jerry Liu a8a191ae87 nit: slide deck fix (#288) 2024-07-15 14:44:19 -07:00
Hemant Malik 4d92775aa8 llama-parse with elasticsearch vector database example notebook (#258)
* llama-parse with elasticsearch vector database example notebook

* lint

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
2024-07-15 14:29:13 -07:00
Jerry Liu 477847111e create multi-modal RAG notebook (#284) 2024-07-15 14:29:01 -07:00
Adam Reichert efbcfb1d2e Make File Extension Check Case-Insensitive (#277)
Make File Type Check Case-Insensitive
2024-07-13 13:43:41 -07:00
Logan a9b01c761c v0.4.7 2024-07-13 10:56:24 -06:00
Pierre-Loic Doulcet dac2f7c84e New parameters (#285)
wip
2024-07-12 09:06:21 +02:00
Logan Markewich 478142e509 lock 2024-07-08 09:45:18 -06:00
Logan Markewich e76d5ba679 v0.4.6 2024-07-08 09:42:25 -06:00
Huu Le 23dc9c0f68 fix file_input type issue (#271) 2024-07-08 09:31:31 -06:00
Sourabh Desai 8b96176d8a allow for bytes or buffer as input (#259)
* allow for bytes or buffer as input

* format & readme update

* lint
2024-06-28 23:31:35 -07:00
Logan 1bbf5f4823 v0.4.5 (#251) 2024-06-26 16:38:33 -07:00
Pierre-Loic Doulcet 2f57682035 add target_pages and bounding_box parameters (#250) 2024-06-26 16:25:44 -07:00
Jerry Liu 4c05160f98 add baseline to dcf excel RAG (#234) 2024-06-12 00:19:35 -07:00
Jerry Liu 3e9ad64a7f add split by page mode (#212) 2024-06-11 13:52:33 -07:00
Jerry Liu 21fa19c73b rewrite advanced example (#231) 2024-06-11 13:52:24 -07:00
Pierre-Loic Doulcet 58257d546b Pierre/new options (#216)
* Add spreadsheet extensions

* linting

* fix tests

* Add new parameters to the parser

* add option to API call

* lint

* lint remove trailing space

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
2024-06-07 18:17:54 -06:00
Adam Reichert 5e99d810fd Fix Typo in description of invalidate_cache argument of LlamaParse constructor (#221)
Fix Typo
2024-06-07 17:32:57 -06:00
Jerry Liu a2dc717d85 add simple excel notebook (with dcf) (#215) 2024-06-06 08:48:33 -07:00
Logan 87062e6ca8 add link to docs 2024-06-05 16:10:28 -06:00
Jerry Liu ae3a21c5ff add KG agent (#211) 2024-06-05 12:58:07 -07:00
Ravi Theja ccebb8a2fa Add Excel spreadsheet example with llamaparse (#204) 2024-05-30 09:48:33 -06:00
Pierre-Loic Doulcet 2d21d6e688 Add spreadsheet extensions (#203)
* Add spreadsheet extensions

* linting

* fix tests

---------

Co-authored-by: Logan Markewich <logan.markewich@live.com>
2024-05-29 20:11:28 -06:00
Logan 270d96b7e3 bump version, add image support (#201) 2024-05-28 15:54:54 -06:00
Ajit Mistry ea21daa96f Add example for Weaviate (#181)
add weaviate example
2024-05-28 13:40:11 -07:00
dependabot[bot] 42845f8d07 Bump requests from 2.31.0 to 2.32.2 (#198)
Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 20:44:57 +02:00
dependabot[bot] 8b63ae9c46 Bump tqdm from 4.66.2 to 4.66.3 (#197)
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.66.2...v4.66.3)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 20:43:04 +02:00
Pierre-Loic Doulcet 173060dc50 New option skip diagonal text and invalidate cache (#178)
* New option skip diagonal text and invalidate cache
2024-05-23 20:42:38 +02:00
Pierre-Loic Doulcet d19b35cd48 allow for .jpg images (#195) 2024-05-23 20:23:09 +02:00
Jerry Liu 0c83fbd679 nit: add caltrain weekend doc (#193) 2024-05-22 00:57:18 -07:00
Jerry Liu 6ae9c1d9cb clean up gpt4o notebook (#192) 2024-05-21 08:40:50 -07:00
Jerry Liu 27523b657a fix colab badge (#186) 2024-05-17 20:49:27 -07:00
Jerry Liu 56d73c1a3f llamaparse example over caltrain schedule (#185) 2024-05-17 09:22:16 -07:00
Jerry Liu 0d2ad9faab gpt4o notebook over tesla impact docs (#180)
Co-authored-by: Logan Markewich <logan.markewich@live.com>
2024-05-16 00:07:54 -07:00
Logan 4572f00657 add new mode and params for openai (#179)
add new mode
2024-05-14 12:24:25 -06:00
Logan 9ed208131f Ensure image extensions, vbump (#159) 2024-04-24 20:39:39 -06:00
Pierre-Loic Doulcet 91b03b2ea7 add html support (#154) 2024-04-23 13:14:13 -06:00
Yi Ding 70b5dc3a63 some notebook updates (#148) 2024-04-18 21:57:10 -06:00
Anoop Sharma d6ab0aa232 Added utils files (#91) 2024-04-14 15:17:31 -06:00
Logan f679e1c76b Skip tests if CICD doesn't populate the secrets (#142) 2024-04-14 15:12:22 -06:00
Logan b91f86ba3d v0.4.1 (#141) 2024-04-14 13:09:00 -06:00
Logan 0f2302fda4 QoL Changes (#140) 2024-04-14 13:01:39 -06:00
henrycunh ff729c05af docs: update readme with link to llamacloud (#139)
Having copy paste a link is silly!
2024-04-14 11:56:47 -07:00
Gautam Kumar 76a6821fb8 Fixing paths of data in example notebook (#136) 2024-04-10 15:52:35 -06:00
Logan 97c7a38a69 vbump (#111) 2024-03-21 10:51:50 -06:00
Pierre-Loic Doulcet 5d398a8a64 Extend supported formats (#110) 2024-03-21 10:44:18 -06:00
Jerry Liu 4252f6186b fixes to insurance demo (#97) 2024-03-21 00:02:47 -07:00
Jerry Liu 22148ade9f fix advanced RAG notebook title (#98) 2024-03-21 00:02:38 -07:00
Jerry Liu b8332fe8e1 nit: add colab badge to mongodb notebook (#109) 2024-03-21 00:02:29 -07:00
Ravi Theja e40e92a133 Add mongodb llamaparse example (#107) 2024-03-20 23:37:42 -07:00
Jerry Liu ba8f345f80 Revert "cr"
This reverts commit 2ddbf1ba0d.
2024-03-19 00:21:29 -07:00
Jerry Liu 2ddbf1ba0d cr 2024-03-19 00:20:42 -07:00
Haotian Zhang 23567c8f98 Init LlamaParseJsonNodeParser example (#93) 2024-03-18 15:15:36 -04:00
Logan 8d39ae7763 add agent demo (#88)
* add agent demo

* remove mention of react agent

* agents folder
2024-03-18 16:12:56 +01:00
Jerry Liu a2edc41fc7 nit: fix grammar in insurance cookbook (#89)
cr
2024-03-18 16:11:03 +01:00
Ikko Eltociear Ashimine 591b6fc44d Update demo_parsing_instructions.ipynb (#86)
usefull -> useful
2024-03-16 18:15:30 +01:00
Pierre-Loic Doulcet f8a3d92ce0 demo insurance + parsing instructions (#84) 2024-03-16 18:14:49 +01:00
Laurie Voss a1d18d83da Adding parsing instructions demo (#82) 2024-03-15 10:07:53 +01:00
Jerry Liu 1ad881e9fc add financial powerpoint cookbook (#78)
cr
2024-03-14 13:59:35 +01:00
Jerry Liu 2de26be464 add basic ppt demo (#75) 2024-03-14 00:28:49 -07:00
Logan Markewich 36f09543b2 v0.3.9 2024-03-13 08:59:00 -06:00
Jerry Liu 393acf8557 let client support more file types (#74) 2024-03-12 23:29:16 -07:00
Logan ab27f2ab79 fix to json bug (#68) 2024-03-07 08:53:37 -06:00
Jerry Liu d13b5ea30a add json cookbook (#64) 2024-03-06 10:06:55 -08:00
Pierre-Loic Doulcet 81843b9285 Pierre/json images (#62) 2024-03-06 08:38:45 -08:00
Logan b19f85234b fix language (#60) 2024-03-05 12:27:34 -06:00
Stefano Lottini 9dee30a616 Astra DB vector store imports from newly-named package (#26)
* Astra DB vector store imports from newly-named package

* removed explicit astrapy installation (comes with the astra vector store)

* fix (new) package renaming
2024-03-03 22:08:45 -08:00
Jerry Liu 4489eb1291 add language cookbook (#48)
* cr

* cr
2024-03-03 22:06:58 -08:00
Igor Udot dde72e3800 fix(base): add file_path to exception msg (#53)
Co-authored-by: igor.udot <igor.udot@espressif.com>
2024-03-03 22:06:09 -08:00
Jerry Liu e49fca4b51 add table comparisons cookbook (#55) 2024-03-01 16:36:00 -08:00
Jerry Liu 0411b08c07 [version] bump version to 0.3.5 (#49) 2024-02-28 17:05:22 -08:00
Stefano Lottini 6c490ab781 Astra DB notebooks allow optional namespace specification (#27) 2024-02-28 15:14:33 -08:00
Pierre-Loic Doulcet 737884d297 ADD: Support for language parameter in the package (#37)
* ADD: Support for language parameter in the package

* add language enum
2024-02-28 15:14:04 -08:00
Pascal Gula 00c63b046d Update demo_advanced.ipynb (#24) 2024-02-22 11:58:06 -06:00
Jerry Liu d04bf78335 add advanced rag example for astra (#23) 2024-02-20 14:47:02 -08:00
Logan Markewich 7dbe12b893 update to handle posix path objects 2024-02-20 14:58:52 -06:00
Eric Hare 4caa9cbc02 Astra DB Example Notebook updated for 0.10.x, remove ServiceContext (#22)
* cr

* Remove deprecated ServiceContext from notebook

---------

Co-authored-by: Jerry Liu <jerryjliu98@gmail.com>
2024-02-20 12:42:13 -08:00
Eric Hare af1f61186a Add an example for Astra DB Integration (#7) 2024-02-19 21:00:04 -08:00
Logan 13dea6ae1a Update README.md 2024-02-19 21:17:35 -06:00
Logan fa9698a681 Update README.md 2024-02-19 20:47:28 -06:00
Logan 4bed7c7895 Merge pull request #20 from run-llama/logan/async_batch 2024-02-19 18:06:37 -06:00
Logan Markewich a3954f3dda async batch 2024-02-19 17:38:09 -06:00
Logan Markewich b779e231bf version bump 2024-02-19 13:27:26 -06:00
Haotian Zhang de12de1437 Merge pull request #18 from run-llama/hz/demo_v5
upd demo results
2024-02-18 23:35:37 -05:00
Haotian Zhang 5274ea5277 cr 2024-02-18 23:32:38 -05:00
Haotian Zhang 398d775122 cr 2024-02-18 23:29:38 -05:00
Haotian Zhang aca18e12ef Merge pull request #17 from run-llama/hz/demo_v4
clean up notebook baseline method
2024-02-18 22:55:18 -05:00
Haotian Zhang 270c2ed0aa clean up notebook baseline 2024-02-18 22:49:49 -05:00
Logan 2cf960196f Merge pull request #15 from run-llama/logan/update_docs
Logan/update docs
2024-02-18 19:22:57 -06:00
Logan 6b83edc8fd Merge branch 'main' into logan/update_docs 2024-02-18 19:22:50 -06:00
Logan Markewich 711822223a [version] v0.3.0 2024-02-18 18:18:49 -06:00
Logan 8e6872b57c Merge pull request #16 from run-llama/jerry/fix_v10
nit: fix README instructions for v0.10 compatibility
2024-02-18 18:17:42 -06:00
Jerry Liu e14224b05d cr 2024-02-18 16:04:52 -08:00
Logan Markewich c3a30898af prioritize os env 2024-02-18 16:01:59 -06:00
Logan Markewich f3f6fb0444 ensure URL 2024-02-18 15:51:42 -06:00
Logan Markewich 641b6d61a7 simon comment 2024-02-18 15:50:44 -06:00
Logan Markewich c0996a64a7 allow api key 2024-02-17 23:05:22 -06:00
Logan Markewich a8904f39e2 poetry lock 2024-02-17 22:58:20 -06:00
Logan Markewich 81011f6336 update advanced demo notebook 2024-02-17 21:58:08 -06:00
Haotian Zhang 117b193ee9 Merge pull request #13 from run-llama/hz/demo_V3
Refactor demo structure
2024-02-15 22:23:35 -05:00
Haotian Zhang bd724a7939 cr 2024-02-15 22:19:48 -05:00
Haotian Zhang 866cdca216 Merge pull request #12 from run-llama/hz/demo_v1
New Llama Parser Demo
2024-02-15 15:48:59 -05:00
Haotian Zhang ba321ac5b0 cr 2024-02-15 15:38:30 -05:00
Haotian Zhang 8fab0ac2ad New Llama Parse Demo 2024-02-15 15:36:45 -05:00
364 changed files with 165176 additions and 3971 deletions
+8
View File
@@ -0,0 +1,8 @@
# Changesets
Hello and welcome! This folder has been automatically generated by `@changesets/cli`, a build tool that works
with multi-package repos, or single-package repos, to help you version and publish your code. You can
find the full documentation for it [in our repository](https://github.com/changesets/changesets).
We have a quick list of common questions to get you started engaging with this project in
[our documentation](https://github.com/changesets/changesets/blob/main/docs/common-questions.md).
+11
View File
@@ -0,0 +1,11 @@
{
  "$schema": "https://unpkg.com/@changesets/config@3.1.1/schema.json",
  "changelog": "@changesets/cli/changelog",
  "commit": false,
  "fixed": [],
  "linked": [],
  "access": "restricted",
  "baseBranch": "main",
  "updateInternalDependencies": "patch",
  "ignore": []
}
+31
View File
@@ -0,0 +1,31 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
**Describe the bug**
Write a concise description of what the bug is.
**Files**
If possible, please provide the PDF file causing the issue.
**Job ID**
If you have it, please provide the ID of the job you ran.
You can find it here: https://cloud.llamaindex.ai/parse in the "History" tab.
**Client:**
Please remove untested options:
- Python Library
- API
- Frontend (cloud.llamaindex.ai)
- Typescript Library
- Notebook
**Additional context**
Add any additional context about the problem here.
What options did you use? Premium mode, multimodal, fast mode, parsing instructions, etc.
Screenshots, code snippets, etc.
+10
View File
@@ -0,0 +1,10 @@
---
name: Custom issue
about: Not a bug nor a feature request
title: ''
labels: ''
assignees: ''
---
+10
View File
@@ -0,0 +1,10 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''
---
+11
@@ -0,0 +1,11 @@
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
# and
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
+53
@@ -0,0 +1,53 @@
name: Build Package - Python
# Build package on its own without additional pip install
on:
push:
branches:
- main
paths:
- "py/**"
pull_request:
paths:
- "py/**"
env:
UV_VERSION: "0.7.20"
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
# You can use PyPy versions in python-version.
# For example, pypy-2.7 and pypy-3.8
matrix:
os: [ubuntu-latest, windows-latest]
python-version: ["3.9"]
steps:
- uses: actions/checkout@v5
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
version: ${{ env.UV_VERSION }}
- name: Set up Python
run: uv python install
- name: Display Python version
run: python --version
- name: Build
working-directory: py
run: uv build
- name: Test installing built package
shell: bash
working-directory: py
run: |
uv venv
uv pip install dist/*.whl
- name: Test import
working-directory: py
run: uv run -- python -c "import llama_cloud_services"
+34
@@ -0,0 +1,34 @@
name: Build Package - TypeScript
on:
push:
branches:
- main
paths:
- "ts/**"
pull_request:
paths:
- "ts/**"
jobs:
pre_release:
name: Pre Release
runs-on: ubuntu-latest
steps:
- name: Checkout Repo
uses: actions/checkout@v5
- uses: pnpm/action-setup@v4
- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version-file: "ts/llama_cloud_services/.nvmrc"
- name: Install dependencies
working-directory: ts/llama_cloud_services/
run: pnpm install --no-frozen-lockfile
- name: Build
working-directory: ts/llama_cloud_services/
run: pnpm run build
+95
@@ -0,0 +1,95 @@
name: Claude Code
on:
issue_comment:
types: [created]
pull_request_review_comment:
types: [created]
issues:
types: [opened, assigned]
pull_request_review:
types: [submitted]
jobs:
claude:
if: |
(github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
(github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: read
issues: read
id-token: write
steps:
- name: Check repository access
id: check-access
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Get the user who triggered the event
case "${{ github.event_name }}" in
"issue_comment")
USER="${{ github.event.comment.user.login }}"
;;
"pull_request_review_comment")
USER="${{ github.event.comment.user.login }}"
;;
"pull_request_review")
USER="${{ github.event.review.user.login }}"
;;
"issues")
USER="${{ github.event.issue.user.login }}"
;;
esac
echo "Checking repository access for user: $USER"
# Check if user has write access to the repository
REPO="${{ github.repository }}"
if gh api repos/$REPO/collaborators/$USER/permission --jq '.permission' | grep -E "(admin|write)" > /dev/null 2>&1; then
echo "User $USER has write access to the repository"
echo "authorized=true" >> $GITHUB_OUTPUT
else
echo "User $USER does not have write access to the repository"
echo "authorized=false" >> $GITHUB_OUTPUT
exit 1
fi
- name: Checkout repository
if: steps.check-access.outputs.authorized == 'true'
uses: actions/checkout@v5
with:
fetch-depth: 1
- name: Run Claude Code
if: steps.check-access.outputs.authorized == 'true'
id: claude
uses: anthropics/claude-code-action@beta
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_GITHUB_API_KEY }}
# Optional: Specify model (defaults to Claude Sonnet 4, uncomment for Claude Opus 4)
# model: "claude-opus-4-20250514"
# Optional: Customize the trigger phrase (default: @claude)
# trigger_phrase: "/claude"
# Optional: Trigger when specific user is assigned to an issue
# assignee_trigger: "claude-bot"
# Optional: Allow Claude to run specific commands
# Allow bash commands to be run, for things like running tests, linting, etc.
allowed_tools: "Bash(rg:*),Bash(find:*),Bash(grep:*),Bash(pnpm:*),Bash(npm:*),Bash(uv:*),Bash(pip:*),Bash(pipx:*),Bash(make:*),Bash(cd:*),WebFetch"
# Optional: Add custom instructions for Claude to customize its behavior for your project
# custom_instructions: |
# Follow our coding standards
# Ensure all new code has tests
# Use TypeScript for new files
# Optional: Custom environment variables for Claude
# claude_env: |
# NODE_ENV: test
+41
@@ -0,0 +1,41 @@
name: "CodeQL"
on:
push:
branches: ["main"]
pull_request:
# The branches below must be a subset of the branches above
branches: ["main"]
schedule:
- cron: "30 16 * * 4"
jobs:
analyze:
name: Analyze
# Runner size impacts CodeQL analysis time. To learn more, please see:
# - https://gh.io/recommended-hardware-resources-for-running-codeql
# - https://gh.io/supported-runners-and-hardware-resources
# - https://gh.io/using-larger-runners
# Consider using larger runners for possible analysis time improvements.
runs-on: "ubuntu-latest"
timeout-minutes: 360
permissions:
actions: read
contents: read
security-events: write
steps:
- name: Checkout repository
uses: actions/checkout@v5
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v4
with:
languages: python
dependency-caching: true
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v4
with:
category: "/language:python"
+162
@@ -0,0 +1,162 @@
name: Extract E2E Tests (every 4 hours)
on:
schedule:
- cron: "0 */4 * * *"
workflow_dispatch:
# Allows manual triggering
inputs:
environment:
description: "Environment to run the tests in"
required: false
default: staging
type: choice
options:
- staging
- production
notify_slack:
description: "Notify Slack"
required: false
default: false
type: boolean
workflow_call:
env:
UV_VERSION: "0.7.20"
PYTHON_VERSION: "3.12"
SLACK_CHANNEL_ID: C078PHNTF44 # Extract channel ID
API_E2E_LOG_PATH: ${{ github.workspace }}/extract-e2e.log
jobs:
extract-e2e:
name: "Extract E2E Tests (${{ matrix.environment }})"
runs-on: ubuntu-latest
timeout-minutes: 30
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.environment }}
cancel-in-progress: true
strategy:
fail-fast: false
matrix:
environment: ${{ github.event_name == 'schedule' && fromJson('["staging", "production"]') || fromJson(format('["{0}"]', github.event.inputs.environment || 'staging')) }}
steps:
- name: Set runtime inputs
id: runtime
run: |
environment=${{ matrix.environment }}
notify_slack=${{ github.event.inputs.notify_slack || github.event_name == 'schedule' }}
echo "environment=${environment}" >> $GITHUB_OUTPUT
echo "notify_slack=${notify_slack}" >> $GITHUB_OUTPUT
if [ "${environment}" = "production" ]; then
echo "LLAMA_CLOUD_BASE_URL=https://api.cloud.llamaindex.ai" >> $GITHUB_ENV
api_key_secret="${{ secrets.LLAMA_CLOUD_API_KEY }}"
project_id_secret="${{ secrets.LLAMA_CLOUD_PROJECT_ID }}"
else
echo "LLAMA_CLOUD_BASE_URL=https://api.staging.llamaindex.ai" >> $GITHUB_ENV
api_key_secret="${{ secrets.LLAMA_CLOUD_API_KEY_STAGING }}"
project_id_secret="${{ secrets.LLAMA_CLOUD_PROJECT_ID_STAGING }}"
fi
if [ -n "$api_key_secret" ]; then
echo "LLAMA_CLOUD_API_KEY=$api_key_secret" >> $GITHUB_ENV
fi
if [ -n "$project_id_secret" ]; then
echo "LLAMA_CLOUD_PROJECT_ID=$project_id_secret" >> $GITHUB_ENV
fi
- uses: actions/checkout@v5
with:
fetch-depth: 0
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
version: ${{ env.UV_VERSION }}
- name: Set up Python
run: uv python install ${{ env.PYTHON_VERSION }} && uv python pin ${{ env.PYTHON_VERSION }}
- name: Run Extract E2E tests
id: extract-tests
continue-on-error: true
working-directory: py
run: |
set -o pipefail
rm -f "$API_E2E_LOG_PATH"
uv run pytest -v -n 8 --timeout=300 --session-timeout=1740 tests/extract/ 2>&1 | tee "$API_E2E_LOG_PATH"
- name: Extract pytest failure summary
id: failed-tests
if: steps.extract-tests.outcome == 'failure' || cancelled()
run: |
summary="$(python3 - <<'PY'
import os
import re
from pathlib import Path
log_path = Path(os.environ["API_E2E_LOG_PATH"])
if not log_path.exists():
print("Test log not found.")
raise SystemExit(0)
lines = log_path.read_text(errors="ignore").splitlines()
# Find the "short test summary info" section
start = None
for i, line in enumerate(lines):
if line.startswith("=") and "short test summary info" in line:
start = i + 1
break
if start is None:
print("No test summary found.")
raise SystemExit(0)
# Extract just the FAILED/ERROR lines (test name + short reason)
failed_tests = []
for line in lines[start:]:
if line.startswith("="):
break # End of section
if line.startswith("FAILED ") or line.startswith("ERROR "):
# Extract test name and truncate the error message
match = re.match(r"(FAILED|ERROR) ([\w/:.\[\]_-]+)", line)
if match:
failed_tests.append(f"{match.group(1)}: {match.group(2)}")
if failed_tests:
print("\n".join(failed_tests[:20])) # Limit to 20 tests max
else:
print("No failed tests found in summary.")
PY
)"
if [ -z "$summary" ]; then
summary="Failed test summary not available. Review the full run logs."
fi
{
printf 'summary<<EOF\n%s\nEOF\n' "$summary"
} >> "$GITHUB_OUTPUT"
- name: Check test results
if: always()
run: |
if [ "${{ steps.extract-tests.outcome }}" == "failure" ]; then
echo "Extract E2E tests failed"
exit 1
fi
- name: Post to Extract Slack channel
id: slack
if: (failure() || cancelled()) && steps.runtime.outputs.notify_slack == 'true'
uses: slackapi/slack-github-action@v2.1.1
with:
channel-id: ${{ env.SLACK_CHANNEL_ID }}
slack-message: |
:red_circle: *Extract E2E Failed* (${{ steps.runtime.outputs.environment }})
```
${{ steps.failed-tests.outputs.summary }}
```
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View Run>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
+46
@@ -0,0 +1,46 @@
name: Lint
on:
push:
branches:
- main
pull_request:
env:
UV_VERSION: "0.7.20"
jobs:
build:
runs-on: ubuntu-latest
strategy:
# You can use PyPy versions in python-version.
# For example, pypy-2.7 and pypy-3.8
matrix:
python-version: ["3.9"]
steps:
- uses: actions/checkout@v5
with:
fetch-depth: ${{ github.event_name == 'pull_request' && 2 || 0 }}
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
version: ${{ env.UV_VERSION }}
- name: Set up Python
run: uv python install ${{ matrix.python-version }}
- uses: pnpm/action-setup@v4
- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version-file: "ts/llama_cloud_services/.nvmrc"
- name: Install dependencies
run: pnpm install --no-frozen-lockfile
- name: Run linter
shell: bash
working-directory: py
run: uv run -- pre-commit run -a
# the js checks are run roundaboutly through lint-staged, and -a doesn't run it. Run them directly.
- run: pnpm -w --filter llama-cloud-services run lint
- run: pnpm -w --filter llama-cloud-services run format:check
+39
@@ -0,0 +1,39 @@
name: Test end-to-end - Python
on:
pull_request:
paths:
- "py/**"
env:
UV_VERSION: "0.7.20"
LLAMA_CLOUD_API_KEY: ${{ secrets.LLAMA_CLOUD_API_KEY }}
jobs:
test_e2e:
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
# You can use PyPy versions in python-version.
# For example, pypy-2.7 and pypy-3.8
matrix:
python-version: ["3.12"]
steps:
- uses: actions/checkout@v5
with:
fetch-depth: 0
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
version: ${{ env.UV_VERSION }}
- name: Set up Python
run: uv python install ${{ matrix.python-version }} && uv python pin ${{ matrix.python-version }}
- name: Run Tests
working-directory: py
run: make e2e
- name: Remove virtual environment
working-directory: py
run: rm -rf .venv/
+42
@@ -0,0 +1,42 @@
name: Test - Python
on:
push:
branches:
- main
paths:
- "py/**"
pull_request:
paths:
- "py/**"
env:
UV_VERSION: "0.7.20"
jobs:
test:
runs-on: ubuntu-latest
strategy:
# You can use PyPy versions in python-version.
# For example, pypy-2.7 and pypy-3.8
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v5
with:
fetch-depth: 0
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
version: ${{ env.UV_VERSION }}
- name: Set up Python
run: uv python install ${{ matrix.python-version }} && uv python pin ${{ matrix.python-version }}
- name: Run Tests
working-directory: py
run: uv run pytest unit_tests/ -v
- name: Remove virtual environment
working-directory: py
run: rm -rf .venv/
+39
@@ -0,0 +1,39 @@
name: Test - TypeScript
on:
push:
branches:
- main
paths:
- "ts/**"
pull_request:
paths:
- "ts/**"
env:
TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
TURBO_TEAM: ${{ vars.TURBO_TEAM }}
TURBO_REMOTE_ONLY: true
LLAMA_CLOUD_API_KEY: ${{ secrets.LLAMA_CLOUD_API_KEY }}
jobs:
test:
name: Test - TypeScript
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: pnpm/action-setup@v4
- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version-file: "ts/llama_cloud_services/.nvmrc"
- name: Install dependencies
run: pnpm -r install --no-frozen-lockfile
- name: Build package
run: pnpm --filter llama-cloud-services build
- name: Run Tests
working-directory: ts/llama_cloud_services/
run: pnpm test
- name: Run e2e tests
working-directory: ts/e2e-tests/
run: pnpm test
@@ -0,0 +1,61 @@
name: Version Bump and Release
on:
push:
branches:
- main
concurrency: ${{ github.workflow }}-${{ github.ref }}
jobs:
release:
name: Release
runs-on: ubuntu-latest
# Only run on main branch pushes
if: github.ref == 'refs/heads/main'
steps:
- name: Checkout Repo
uses: actions/checkout@v5
- uses: pnpm/action-setup@v4
- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version: "22"
cache: "pnpm"
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: "3.11"
- name: Install uv
uses: astral-sh/setup-uv@v7
- name: Install dependencies
run: pnpm install
- name: Add auth token to .npmrc file
run: |
cat << EOF >> ".npmrc"
//registry.npmjs.org/:_authToken=$NPM_TOKEN
EOF
env:
NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
- name: Create Release Pull Request or Publish packages
id: changesets
uses: changesets/action@v1
with:
commit: "chore: version packages"
title: "chore: version packages"
# Custom version script
version: pnpm -w run version
# Custom publish script
publish: pnpm -w run publish
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }}
LLAMA_PARSE_PYPI_TOKEN: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
+10 -1
@@ -1,3 +1,12 @@
.git
__pycache__/
*.pyc
.DS_Store
.idea
.env*
.ipynb_checkpoints*
*_cache/
node_modules/
.turbo/
dist/
.npmrc
+92
@@ -0,0 +1,92 @@
---
default_language_version:
python: python3
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-byte-order-marker
- id: check-merge-conflict
- id: check-symlinks
- id: check-toml
- id: check-yaml
- id: detect-private-key
- id: end-of-file-fixer
- id: mixed-line-ending
- id: trailing-whitespace
exclude: ^ts/llama_cloud_services/src/client/
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.1.5
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
exclude: ".*uv.lock|examples/"
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 23.10.1
hooks:
- id: black-jupyter
name: black-src
alias: black
exclude: ".*uv.lock|examples/extract/solar_panel_e2e_comparison.ipynb"
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.0.1
hooks:
- id: mypy
exclude: ^py/tests|^py/unit_tests|^examples
additional_dependencies:
[
"types-requests",
"types-Deprecated",
"types-redis",
"types-setuptools",
"types-PyYAML",
"types-protobuf==4.24.0.4",
]
args:
[
--disallow-untyped-defs,
--ignore-missing-imports,
--python-version=3.10,
]
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.16.0
hooks:
- id: blacken-docs
name: black-docs-text
alias: black
types_or: [rst, markdown, tex]
additional_dependencies: [black==23.10.1]
# Using PEP 8's line length in docs prevents excess left/right scrolling
args: [--line-length=79]
- repo: local
hooks:
- id: lint-staged
name: Run lint-staged for TS files
entry: pnpm -w exec lint-staged
language: system
pass_filenames: false
- repo: https://github.com/codespell-project/codespell
rev: v2.2.6
hooks:
- id: codespell
additional_dependencies: [tomli]
exclude: ^(uv.lock|docs|ts|examples|pnpm-lock.yaml)
args:
[
"--ignore-words-list",
"astroid,gallary,momento,narl,ot,rouge,nin,gere,te,inh,vor",
]
- repo: https://github.com/srstevenson/nb-clean
rev: 3.1.0
hooks:
- id: nb-clean
args: [--preserve-cell-outputs, --remove-empty-cells]
- repo: https://github.com/pappasam/toml-sort
rev: v0.23.1
hooks:
- id: toml-sort-fix
exclude: ".*uv.lock"
exclude: ^(.github/ISSUE_TEMPLATE|ts/llama_cloud_services/src/client|pnpm-lock.yaml)
+33
@@ -0,0 +1,33 @@
# Python
## Installation
This project uses uv. Create a virtual environment and run `uv sync`.
## Versioning (Maintainers only)
Before merging your changes, make sure to bump the versions.
Make a version bump to `pyproject.toml`. If the underlying dependency on the llamacloud platform OpenAPI
sdk needs bumping, make sure to bring that in as well. If updating dependencies, run `uv lock`.
The legacy `llama_parse` package re-exports some of `llama_cloud_services` in the old namespace. The
versions need to be kept consistent to sidecar it with `llama_cloud_services`. Bump its version in `llama_parse/pyproject.toml`, and also bump its dependency version of `llama-cloud-services` to match.
**Note**: Don't worry about updating the `llama_parse/poetry.lock` file when bumping versions. The GitHub action will automatically run `poetry lock` for the llama_parse package during the build process (though it doesn't commit the updated lockfile back to the repo).
You can also do this with `./scripts/version-bump.py set 0.x.x` if you have `uv` installed.
Once the change is merged, push a tag `git tag -a v0.x.x -m 0.x.x` and `git push origin v0.x.x`.
This tagging step can be done with `./scripts/version-bump.py tag`.
# Typescript
## Installation
...
## Versioning
...
+13 -71
@@ -1,73 +1,15 @@
# LlamaParse (Preview)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-cloud-services)](https://pypi.org/project/llama-cloud-services/)
[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_cloud_services)](https://github.com/run-llama/llama_cloud_services/graphs/contributors)
[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)
LlamaParse is an API created by LlamaIndex to efficiently parse and represent files for efficient retrieval and context augmentation using LlamaIndex frameworks.
# Llama Cloud Services
LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
Currently available in preview mode for **free**. Try it out today!
**NOTE:** Currently, only PDF files are supported.
## Getting Started
First, login and get an api-key from `https://cloud.llamaindex.ai`.
Install the package:
`pip install llama-parse`
Then, you can run the following to parse your first PDF file:
```python
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
verbose=True
)
# sync
documents = parser.load_data("./my_file.pdf")
# async
documents = await parser.aload_data("./my_file.pdf")
```
## Using with `SimpleDirectoryReader`
You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
```python
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
from llama_index import SimpleDirectoryReader
parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
verbose=True
)
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader("./data", file_extractor=file_extractor).load_data()
```
Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
## Examples
Several end-to-end indexing examples can be found in the examples folder
- [Getting Started](examples/demo_basic.ipynb)
- [Advanced RAG Example](examples/demo_advanced.ipynb)
- [Raw API Usage](examples/demo_api.ipynb)
## Terms of Service
See the [Terms of Service Here](./TOS.pdf).
> **⚠️ DEPRECATION NOTICE**
>
> This repository and its packages are deprecated and will be maintained until **May 1, 2026**.
>
> **Please migrate to the new packages:**
> - **Python**: `pip install "llama-cloud>=1.0"` ([GitHub](https://github.com/run-llama/llama-cloud-py))
> - **TypeScript**: `npm install @llamaindex/llama-cloud` ([GitHub](https://github.com/run-llama/llama-cloud-ts))
>
> The new packages provide the same functionality with improved performance, better support, and active development.
+8
@@ -0,0 +1,8 @@
# LlamaCloud Services Examples - TypeScript
In this folder you will find several TypeScript end-to-end applications that contain examples regarding:
- [LlamaParse](./parse/)
- [LlamaCloud Index](./index/)
Follow the instructions in each example folder to get started!
+21
@@ -0,0 +1,21 @@
node_modules
package-lock.json
yarn.lock
.DS_Store
.cache
.env
.vercel
.output
.nitro
/build/
/api/
/server/build
/public/build
# Sentry Config File
.env.sentry-build-plugin
/test-results/
/playwright-report/
/blob-report/
/playwright/.cache/
.tanstack
.vscode
+4
@@ -0,0 +1,4 @@
**/build
**/public
pnpm-lock.yaml
routeTree.gen.ts
+88
@@ -0,0 +1,88 @@
# LlamaClassify Demo
A TypeScript demo application showcasing **LlamaClassify**, an agentic document classification service from [LlamaCloud](https://cloud.llamaindex.ai). This demo classifies financial documents into one of three types: cash flow statement, income statement, or balance sheet.
## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Start the Demo](#start-the-demo)
- [How It Works](#how-it-works)
- [Troubleshooting](#troubleshooting)
- [Common Issues](#common-issues)
- [License](#license)
- [Contributing](#contributing)
## Features
- 📄 **Document Classification**: Classify files based on well-defined rules that you can customize and experiment with.
- 🤖 **Reasoning-based Actionable Insights**: Get in-depth, reasoning-based insights on the document classification, accompanied by confidence scores.
- 🎨 **Beautiful UI**: [DaisyUI](https://daisyui.com)-based interface powered by [TanStack](https://tanstack.com)
- **Fast Development**: Hot reload support in development mode
- 🛠️ **TypeScript**: Full TypeScript support with strict type checking
## Prerequisites
- Node.js (version 22 or higher)
- pnpm package manager
- LlamaCloud API key
## Installation
1. Clone the repository:
```bash
git clone https://github.com/run-llama/llama_cloud_services
cd llama_cloud_services/examples-ts/classify/
```
2. Install dependencies:
```bash
npm install
```
3. Set up your environment variables:
```bash
# Add your API key to your environment
export LLAMA_CLOUD_API_KEY="your-llamacloud-api-key"
```
## Usage
### Start the Demo
```bash
npm run dev
```
The application will be up and running on http://localhost:3000
## How It Works
1. **Document Input**: Enter the path to your document when prompted
2. **Classification**: LlamaClassify processes the document and classifies it according to the rules defined [here](./src/utils/classifier.ts)
3. **Results**: The classification outcome, as well as the reasoning behind it and the confidence score, are displayed in the UI.
## Troubleshooting
### Common Issues
1. **Module Resolution Errors**: Ensure you're using Node.js 22+ and have all dependencies installed
2. **API Key Issues**: Verify your LlamaCloud API key is correctly set
3. **File Path Errors**: Use absolute paths or ensure relative paths are correct from the project root
## License
MIT License - see the [LICENSE](../../LICENSE) file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `npm run format` and `npm run lint`
5. Submit a pull request
+34
@@ -0,0 +1,34 @@
{
"name": "tanstack-start-example-basic",
"private": true,
"sideEffects": false,
"type": "module",
"scripts": {
"dev": "vite dev",
"build": "vite build && tsc --noEmit",
"start": "node .output/server/index.mjs"
},
"dependencies": {
"@tanstack/react-router": "^1.133.22",
"@tanstack/react-router-devtools": "^1.133.22",
"@tanstack/react-start": "^1.133.22",
"llama-cloud-services": "file:../../ts/llama_cloud_services",
"react": "^19.0.0",
"react-dom": "^19.0.0",
"tailwind-merge": "^2.6.0",
"zod": "^3.24.2"
},
"devDependencies": {
"@tailwindcss/postcss": "^4.1.15",
"@types/node": "^22.5.4",
"@types/react": "^19.0.8",
"@types/react-dom": "^19.0.3",
"@vitejs/plugin-react": "^4.6.0",
"daisyui": "^5.3.7",
"postcss": "^8.5.1",
"tailwindcss": "^4.1.15",
"typescript": "^5.7.2",
"vite": "^7.1.7",
"vite-tsconfig-paths": "^5.1.4"
}
}
+5
@@ -0,0 +1,5 @@
export default {
plugins: {
'@tailwindcss/postcss': {},
},
}
Binary file not shown. Size: 3.3 KiB
Binary file not shown. Size: 21 KiB
Binary file not shown. Size: 3.8 KiB
Binary file not shown. Size: 862 B
Binary file not shown. Size: 1.1 KiB
Binary file not shown. Size: 1.1 KiB
Binary file not shown. Size: 2.0 KiB

@@ -0,0 +1,19 @@
{
"name": "",
"short_name": "",
"icons": [
{
"src": "/android-chrome-192x192.png",
"sizes": "192x192",
"type": "image/png"
},
{
"src": "/android-chrome-512x512.png",
"sizes": "512x512",
"type": "image/png"
}
],
"theme_color": "#ffffff",
"background_color": "#ffffff",
"display": "standalone"
}
@@ -0,0 +1,53 @@
import {
ErrorComponent,
Link,
rootRouteId,
useMatch,
useRouter,
} from '@tanstack/react-router'
import type { ErrorComponentProps } from '@tanstack/react-router'
export function DefaultCatchBoundary({ error }: ErrorComponentProps) {
const router = useRouter()
const isRoot = useMatch({
strict: false,
select: (state) => state.id === rootRouteId,
})
console.error('DefaultCatchBoundary Error:', error)
return (
<div className="min-w-0 flex-1 p-4 flex flex-col items-center justify-center gap-6">
<ErrorComponent error={error} />
<div className="flex gap-2 items-center flex-wrap">
<button
onClick={() => {
router.invalidate()
}}
className={`px-2 py-1 bg-gray-600 dark:bg-gray-700 rounded-sm text-white uppercase font-extrabold`}
>
Try Again
</button>
{isRoot ? (
<Link
to="/"
className={`px-2 py-1 bg-gray-600 dark:bg-gray-700 rounded-sm text-white uppercase font-extrabold`}
>
Home
</Link>
) : (
<Link
to="/"
className={`px-2 py-1 bg-gray-600 dark:bg-gray-700 rounded-sm text-white uppercase font-extrabold`}
onClick={(e) => {
e.preventDefault()
window.history.back()
}}
>
Go Back
</Link>
)}
</div>
</div>
)
}
@@ -0,0 +1,25 @@
import { Link } from '@tanstack/react-router'
export function NotFound({ children }: { children?: any }) {
return (
<div className="space-y-2 p-2">
<div className="text-gray-600 dark:text-gray-400">
{children || <p>The page you are looking for does not exist.</p>}
</div>
<p className="flex items-center gap-2 flex-wrap">
<button
onClick={() => window.history.back()}
className="bg-emerald-500 text-white px-2 py-1 rounded-sm uppercase font-black text-sm"
>
Go back
</button>
<Link
to="/"
className="bg-cyan-600 text-white px-2 py-1 rounded-sm uppercase font-black text-sm"
>
Start Over
</Link>
</p>
</div>
)
}
+225
@@ -0,0 +1,225 @@
/* eslint-disable */
// @ts-nocheck
// noinspection JSUnusedGlobalSymbols
// This file was automatically generated by TanStack Router.
// You should NOT make any changes in this file as it will be overwritten.
// Additionally, you should also exclude this file from your linter and/or formatter to prevent it from being checked or modified.
import { Route as rootRouteImport } from './routes/__root'
import { Route as UsersRouteImport } from './routes/users'
import { Route as IndexRouteImport } from './routes/index'
import { Route as UsersIndexRouteImport } from './routes/users.index'
import { Route as PostsIndexRouteImport } from './routes/posts.index'
import { Route as UsersUserIdRouteImport } from './routes/users.$userId'
import { Route as PostsPostIdRouteImport } from './routes/posts.$postId'
import { Route as ApiClassifyRouteImport } from './routes/api/classify'
import { Route as PostsPostIdDeepRouteImport } from './routes/posts_.$postId.deep'
const UsersRoute = UsersRouteImport.update({
id: '/users',
path: '/users',
getParentRoute: () => rootRouteImport,
} as any)
const IndexRoute = IndexRouteImport.update({
id: '/',
path: '/',
getParentRoute: () => rootRouteImport,
} as any)
const UsersIndexRoute = UsersIndexRouteImport.update({
id: '/',
path: '/',
getParentRoute: () => UsersRoute,
} as any)
const PostsIndexRoute = PostsIndexRouteImport.update({
id: '/posts/',
path: '/posts/',
getParentRoute: () => rootRouteImport,
} as any)
const UsersUserIdRoute = UsersUserIdRouteImport.update({
id: '/$userId',
path: '/$userId',
getParentRoute: () => UsersRoute,
} as any)
const PostsPostIdRoute = PostsPostIdRouteImport.update({
id: '/posts/$postId',
path: '/posts/$postId',
getParentRoute: () => rootRouteImport,
} as any)
const ApiClassifyRoute = ApiClassifyRouteImport.update({
id: '/api/classify',
path: '/api/classify',
getParentRoute: () => rootRouteImport,
} as any)
const PostsPostIdDeepRoute = PostsPostIdDeepRouteImport.update({
id: '/posts_/$postId/deep',
path: '/posts/$postId/deep',
getParentRoute: () => rootRouteImport,
} as any)
export interface FileRoutesByFullPath {
'/': typeof IndexRoute
'/users': typeof UsersRouteWithChildren
'/api/classify': typeof ApiClassifyRoute
'/posts/$postId': typeof PostsPostIdRoute
'/users/$userId': typeof UsersUserIdRoute
'/posts': typeof PostsIndexRoute
'/users/': typeof UsersIndexRoute
'/posts/$postId/deep': typeof PostsPostIdDeepRoute
}
export interface FileRoutesByTo {
'/': typeof IndexRoute
'/api/classify': typeof ApiClassifyRoute
'/posts/$postId': typeof PostsPostIdRoute
'/users/$userId': typeof UsersUserIdRoute
'/posts': typeof PostsIndexRoute
'/users': typeof UsersIndexRoute
'/posts/$postId/deep': typeof PostsPostIdDeepRoute
}
export interface FileRoutesById {
__root__: typeof rootRouteImport
'/': typeof IndexRoute
'/users': typeof UsersRouteWithChildren
'/api/classify': typeof ApiClassifyRoute
'/posts/$postId': typeof PostsPostIdRoute
'/users/$userId': typeof UsersUserIdRoute
'/posts/': typeof PostsIndexRoute
'/users/': typeof UsersIndexRoute
'/posts_/$postId/deep': typeof PostsPostIdDeepRoute
}
export interface FileRouteTypes {
fileRoutesByFullPath: FileRoutesByFullPath
fullPaths:
| '/'
| '/users'
| '/api/classify'
| '/posts/$postId'
| '/users/$userId'
| '/posts'
| '/users/'
| '/posts/$postId/deep'
fileRoutesByTo: FileRoutesByTo
to:
| '/'
| '/api/classify'
| '/posts/$postId'
| '/users/$userId'
| '/posts'
| '/users'
| '/posts/$postId/deep'
id:
| '__root__'
| '/'
| '/users'
| '/api/classify'
| '/posts/$postId'
| '/users/$userId'
| '/posts/'
| '/users/'
| '/posts_/$postId/deep'
fileRoutesById: FileRoutesById
}
export interface RootRouteChildren {
IndexRoute: typeof IndexRoute
UsersRoute: typeof UsersRouteWithChildren
ApiClassifyRoute: typeof ApiClassifyRoute
PostsPostIdRoute: typeof PostsPostIdRoute
PostsIndexRoute: typeof PostsIndexRoute
PostsPostIdDeepRoute: typeof PostsPostIdDeepRoute
}
declare module '@tanstack/react-router' {
interface FileRoutesByPath {
'/users': {
id: '/users'
path: '/users'
fullPath: '/users'
preLoaderRoute: typeof UsersRouteImport
parentRoute: typeof rootRouteImport
}
'/': {
id: '/'
path: '/'
fullPath: '/'
preLoaderRoute: typeof IndexRouteImport
parentRoute: typeof rootRouteImport
}
'/users/': {
id: '/users/'
path: '/'
fullPath: '/users/'
preLoaderRoute: typeof UsersIndexRouteImport
parentRoute: typeof UsersRoute
}
'/posts/': {
id: '/posts/'
path: '/posts'
fullPath: '/posts'
preLoaderRoute: typeof PostsIndexRouteImport
parentRoute: typeof rootRouteImport
}
'/users/$userId': {
id: '/users/$userId'
path: '/$userId'
fullPath: '/users/$userId'
preLoaderRoute: typeof UsersUserIdRouteImport
parentRoute: typeof UsersRoute
}
'/posts/$postId': {
id: '/posts/$postId'
path: '/posts/$postId'
fullPath: '/posts/$postId'
preLoaderRoute: typeof PostsPostIdRouteImport
parentRoute: typeof rootRouteImport
}
'/api/classify': {
id: '/api/classify'
path: '/api/classify'
fullPath: '/api/classify'
preLoaderRoute: typeof ApiClassifyRouteImport
parentRoute: typeof rootRouteImport
}
'/posts_/$postId/deep': {
id: '/posts_/$postId/deep'
path: '/posts/$postId/deep'
fullPath: '/posts/$postId/deep'
preLoaderRoute: typeof PostsPostIdDeepRouteImport
parentRoute: typeof rootRouteImport
}
}
}
interface UsersRouteChildren {
UsersUserIdRoute: typeof UsersUserIdRoute
UsersIndexRoute: typeof UsersIndexRoute
}
const UsersRouteChildren: UsersRouteChildren = {
UsersUserIdRoute: UsersUserIdRoute,
UsersIndexRoute: UsersIndexRoute,
}
const UsersRouteWithChildren = UsersRoute._addFileChildren(UsersRouteChildren)
const rootRouteChildren: RootRouteChildren = {
IndexRoute: IndexRoute,
UsersRoute: UsersRouteWithChildren,
ApiClassifyRoute: ApiClassifyRoute,
PostsPostIdRoute: PostsPostIdRoute,
PostsIndexRoute: PostsIndexRoute,
PostsPostIdDeepRoute: PostsPostIdDeepRoute,
}
export const routeTree = rootRouteImport
._addFileChildren(rootRouteChildren)
._addFileTypes<FileRouteTypes>()
import type { getRouter } from './router.tsx'
import type { createStart } from '@tanstack/react-start'
declare module '@tanstack/react-start' {
interface Register {
ssr: true
router: Awaited<ReturnType<typeof getRouter>>
}
}
+15
@@ -0,0 +1,15 @@
import { createRouter } from '@tanstack/react-router'
import { routeTree } from './routeTree.gen'
import { DefaultCatchBoundary } from './components/DefaultCatchBoundary'
import { NotFound } from './components/NotFound'
export function getRouter() {
const router = createRouter({
routeTree,
defaultPreload: 'intent',
defaultErrorComponent: DefaultCatchBoundary,
defaultNotFoundComponent: () => <NotFound />,
scrollRestoration: true,
})
return router
}
+128
@@ -0,0 +1,128 @@
/// <reference types="vite/client" />
import {
HeadContent,
Scripts,
createRootRoute,
} from '@tanstack/react-router'
import * as React from 'react'
import { DefaultCatchBoundary } from '~/components/DefaultCatchBoundary'
import { NotFound } from '~/components/NotFound'
import { seo } from '~/utils/seo'
export const Route = createRootRoute({
head: () => ({
meta: [
{
charSet: 'utf-8',
},
{
name: 'viewport',
content: 'width=device-width, initial-scale=1',
},
...seo({
title:
'Financial Documents Classification Agent',
description: `Classify financial documents as balance sheets, income statements, and cash flow statements.`,
}),
],
links: [
{ rel: 'stylesheet', href: "https://cdn.jsdelivr.net/npm/daisyui@5" },
{
rel: 'apple-touch-icon',
sizes: '180x180',
href: '/apple-touch-icon.png',
},
{
rel: 'icon',
type: 'image/png',
sizes: '32x32',
href: '/favicon-32x32.png',
},
{
rel: 'icon',
type: 'image/png',
sizes: '16x16',
href: '/favicon-16x16.png',
},
{ rel: 'manifest', href: '/site.webmanifest', color: '#ffffff' },
{ rel: 'icon', href: '/favicon.ico' },
],
scripts: [
{
src: '/customScript.js',
type: 'text/javascript',
},
{
src: "https://cdn.jsdelivr.net/npm/@tailwindcss/browser@4",
type: "text/javascript",
}
],
}),
errorComponent: DefaultCatchBoundary,
notFoundComponent: () => <NotFound />,
shellComponent: RootDocument,
})
function RootDocument({ children }: { children: React.ReactNode }) {
return (
<html>
<head>
<HeadContent />
</head>
<body>
<div className="navbar bg-base-100 shadow-sm">
<div className="navbar-start">
<div className="dropdown">
<div tabIndex={0} role="button" className="btn btn-ghost btn-circle">
<svg
xmlns="http://www.w3.org/2000/svg"
className="h-5 w-5"
fill="none"
viewBox="0 0 24 24"
stroke="currentColor"
>
<path
strokeLinecap="round"
strokeLinejoin="round"
strokeWidth="2"
d="M4 6h16M4 12h16M4 18h7"
/>
</svg>
</div>
<ul
tabIndex={0}
className="menu menu-lg dropdown-content bg-base-100 rounded-box z-1 mt-3 w-80 p-2 shadow"
>
<li><a href="/">Home</a></li>
<li><a href="https://cloud.llamaindex.ai">Get Started with LlamaCloud</a></li>
<li><a href="https://developers.llamaindex.ai/python/cloud/llamaclassify/getting_started/">LlamaClassify Docs</a></li>
</ul>
</div>
</div>
<div className="navbar-center">
<a className="btn btn-ghost text-xl" href="/">Financial Documents Classification Agent</a>
</div>
<div className="navbar-end">
<a href="https://github.com/run-llama/llama_cloud_services/tree/main/examples-ts/classify">
<button className="btn btn-ghost btn-circle">
<div className="indicator">
<svg
xmlns="http://www.w3.org/2000/svg"
className="h-10 w-10"
fill="currentColor"
viewBox="0 0 640 512"
>
<path d="M237.9 461.4C237.9 463.4 235.6 465 232.7 465C229.4 465.3 227.1 463.7 227.1 461.4C227.1 459.4 229.4 457.8 232.3 457.8C235.3 457.5 237.9 459.1 237.9 461.4zM206.8 456.9C206.1 458.9 208.1 461.2 211.1 461.8C213.7 462.8 216.7 461.8 217.3 459.8C217.9 457.8 216 455.5 213 454.6C210.4 453.9 207.5 454.9 206.8 456.9zM251 455.2C248.1 455.9 246.1 457.8 246.4 460.1C246.7 462.1 249.3 463.4 252.3 462.7C255.2 462 257.2 460.1 256.9 458.1C256.6 456.2 253.9 454.9 251 455.2zM316.8 72C178.1 72 72 177.3 72 316C72 426.9 141.8 521.8 241.5 555.2C254.3 557.5 258.8 549.6 258.8 543.1C258.8 536.9 258.5 502.7 258.5 481.7C258.5 481.7 188.5 496.7 173.8 451.9C173.8 451.9 162.4 422.8 146 415.3C146 415.3 123.1 399.6 147.6 399.9C147.6 399.9 172.5 401.9 186.2 425.7C208.1 464.3 244.8 453.2 259.1 446.6C261.4 430.6 267.9 419.5 275.1 412.9C219.2 406.7 162.8 398.6 162.8 302.4C162.8 274.9 170.4 261.1 186.4 243.5C183.8 237 175.3 210.2 189 175.6C209.9 169.1 258 202.6 258 202.6C278 197 299.5 194.1 320.8 194.1C342.1 194.1 363.6 197 383.6 202.6C383.6 202.6 431.7 169 452.6 175.6C466.3 210.3 457.8 237 455.2 243.5C471.2 261.2 481 275 481 302.4C481 398.9 422.1 406.6 366.2 412.9C375.4 420.8 383.2 435.8 383.2 459.3C383.2 493 382.9 534.7 382.9 542.9C382.9 549.4 387.5 557.3 400.2 555C500.2 521.8 568 426.9 568 316C568 177.3 455.5 72 316.8 72zM169.2 416.9C167.9 417.9 168.2 420.2 169.9 422.1C171.5 423.7 173.8 424.4 175.1 423.1C176.4 422.1 176.1 419.8 174.4 417.9C172.8 416.3 170.5 415.6 169.2 416.9zM158.4 408.8C157.7 410.1 158.7 411.7 160.7 412.7C162.3 413.7 164.3 413.4 165 412C165.7 410.7 164.7 409.1 162.7 408.1C160.7 407.5 159.1 407.8 158.4 408.8zM190.8 444.4C189.2 445.7 189.8 448.7 192.1 450.6C194.4 452.9 197.3 453.2 198.6 451.6C199.9 450.3 199.3 447.3 197.3 445.4C195.1 443.1 192.1 442.8 190.8 444.4zM179.4 429.7C177.8 430.7 177.8 433.3 179.4 435.6C181 437.9 183.7 438.9 185 437.9C186.6 436.6 186.6 434 185 431.7C183.6 429.4 181 428.4 179.4 429.7z" />
</svg>
</div>
</button>
</a>
</div>
</div>
<hr />
{children}
<Scripts />
</body>
</html>
)
}
@@ -0,0 +1,45 @@
import { createFileRoute } from '@tanstack/react-router'
import { classifier, classificationRules, parsingConfig } from '~/utils/classifier'
export const Route = createFileRoute('/api/classify')({
component: RouteComponent,
server: {
handlers: {
POST: async ({ request }) => {
const body = await request.formData()
const fl = body.get("file") as File;
if (!fl) {
return new Response(JSON.stringify({"result": "you need to provide a file"}), { status: 400 })
}
const buff = await fl.arrayBuffer()
const rawRes = await classifier.classify(
classificationRules,
parsingConfig,
{ fileContents: [new Uint8Array(buff)] },
)
const results = rawRes.items
let classification = ""
for (const result of results) {
if ("result" in result && result.result) {
classification += `
<div class="card bg-base-100 shadow-xl p-6 mb-4">
<div class="space-y-3">
<p><span class="font-semibold">📄 Document:</span> ${fl.name}</p>
<p><span class="font-semibold">🏷️ Type:</span> <span class="badge badge-primary">${result.result.type}</span></p>
<p><span class="font-semibold">📊 Confidence:</span> ${(result.result.confidence * 100).toFixed(1)}%</p>
<p><span class="font-semibold">💭 Reasoning:</span> ${result.result.reasoning}</p>
</div>
</div>
`
}
}
return new Response(JSON.stringify({"result": classification}))
},
},
},
})
function RouteComponent() {
// API-only route: there is nothing to render.
return null
}
+99
@@ -0,0 +1,99 @@
import { createFileRoute } from '@tanstack/react-router'
import { useRef, useState } from 'react'
export const Route = createFileRoute('/')({
component: Home,
})
function Home() {
const [file, setFile] = useState<null | File>(null)
const fileInputRef = useRef<HTMLInputElement>(null)
const [reply, setReply] = useState<null | string>(null)
const [loading, setLoading] = useState<boolean>(false)
const handleFileChange = (event: React.ChangeEvent<HTMLInputElement>) => {
const selectedFile = event.target.files?.[0]
if (selectedFile) {
setFile(selectedFile)
}
}
const handleClearFile = () => {
if (file) {
setFile(null)
}
if (fileInputRef.current) {
fileInputRef.current.value = ''
}
if (reply) {
setReply(null)
}
}
const handleClassify = async () => {
if (!file) return
if (reply) {
setReply(null)
}
setLoading(true)
try {
const formData = new FormData()
formData.append('file', file)
const res = await fetch('/api/classify', {
method: 'POST',
body: formData,
})
const data = await res.json()
setReply(data.result)
} catch (error) {
console.error('Error:', error)
} finally {
setLoading(false)
}
}
return (
<div className="flex flex-col justify-center items-center gap-y-8">
<br />
<h1 className="text-xl font-bold text-gray-700">AI-Powered financial document classification</h1>
<h2 className="text-lg font-semibold text-gray-500">Need help sorting out the financial documents jungle? Let our classification agent handle it!</h2>
<fieldset className="fieldset bg-base-100 border-base-300 rounded-box w-200 border p-4">
<legend className="fieldset-legend text-lg">Upload your financial document here</legend>
<label className="label flex justify-center">
<input type="file" className="file-input" onChange={handleFileChange} accept='application/pdf' ref={fileInputRef} />
</label>
</fieldset>
{file && (
<div className="flex flex-col justify-center items-center gap-y-8">
<p className="text-sm text-gray-600">Selected file: {file.name}</p>
<div className='grid grid-cols-2 gap-x-6'>
<button
type="button"
className='btn bg-gray-500 text-white shadow-lg hover:bg-gray-600 hover:shadow-xl rounded'
onClick={handleClassify}
>
Classify
</button>
<button
onClick={handleClearFile}
type="button"
className="px-4 py-2 bg-red-300 text-black rounded hover:bg-red-400 hover:shadow-xl shadow-lg"
>
Clear
</button>
</div>
</div>
)}
{loading && (
<span className="loading loading-spinner text-primary"></span>
)}
{reply && (
<div
className="max-w-2xl w-full"
dangerouslySetInnerHTML={{ __html: reply }}
/>
)}
</div>
)
}
@@ -0,0 +1,23 @@
import { LlamaClassify, ClassifierRule, ClassifyParsingConfiguration } from "llama-cloud-services"
export const classifier = new LlamaClassify(process.env.LLAMA_CLOUD_API_KEY);
export const classificationRules: ClassifierRule[] = [
{
description: "Shows a company's assets, liabilities, and shareholders' equity at a specific point in time, providing a snapshot of financial position.",
type: "balance_sheet"
},
{
description: "Reports cash inflows and outflows from operating, investing, and financing activities, highlighting liquidity and cash management.",
type: "cash_flow_statement"
},
{
description: "Summarizes revenues, expenses, and profits over a period, indicating financial performance and profitability.",
type: "income_statement"
},
];
export const parsingConfig: ClassifyParsingConfiguration = {
lang: "en",
max_pages: 20,
}
+33
@@ -0,0 +1,33 @@
export const seo = ({
title,
description,
keywords,
image,
}: {
title: string
description?: string
image?: string
keywords?: string
}) => {
const tags = [
{ title },
{ name: 'description', content: description },
{ name: 'keywords', content: keywords },
{ name: 'twitter:title', content: title },
{ name: 'twitter:description', content: description },
{ name: 'twitter:creator', content: '@tannerlinsley' },
{ name: 'twitter:site', content: '@tannerlinsley' },
{ name: 'og:type', content: 'website' },
{ name: 'og:title', content: title },
{ name: 'og:description', content: description },
...(image
? [
{ name: 'twitter:image', content: image },
{ name: 'twitter:card', content: 'summary_large_image' },
{ name: 'og:image', content: image },
]
: []),
]
return tags
}
+22
@@ -0,0 +1,22 @@
{
"include": ["**/*.ts", "**/*.tsx"],
"compilerOptions": {
"strict": true,
"esModuleInterop": true,
"jsx": "react-jsx",
"module": "ESNext",
"moduleResolution": "Bundler",
"lib": ["DOM", "DOM.Iterable", "ES2022"],
"isolatedModules": true,
"resolveJsonModule": true,
"skipLibCheck": true,
"target": "ES2022",
"allowJs": true,
"forceConsistentCasingInFileNames": true,
"baseUrl": ".",
"paths": {
"~/*": ["./src/*"]
},
"noEmit": true
}
}
+19
@@ -0,0 +1,19 @@
import { tanstackStart } from '@tanstack/react-start/plugin/vite'
import { defineConfig } from 'vite'
import tsConfigPaths from 'vite-tsconfig-paths'
import viteReact from '@vitejs/plugin-react'
export default defineConfig({
server: {
port: 3000,
},
plugins: [
tsConfigPaths({
projects: ['./tsconfig.json'],
}),
tanstackStart({
srcDirectory: 'src',
}),
viteReact(),
],
})
+122
@@ -0,0 +1,122 @@
# LlamaExtract Demo
A TypeScript demo application showcasing the power of **LlamaExtract**, an agentic structured data extraction service from [LlamaCloud](https://cloud.llamaindex.ai). This demo extracts structured information from scientific papers and renders it as markdown.
## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Start the Demo](#start-the-demo)
- [Development Mode](#development-mode)
- [Build the Project](#build-the-project)
- [Code Quality](#code-quality)
- [How It Works](#how-it-works)
- [Troubleshooting](#troubleshooting)
- [Common Issues](#common-issues)
- [License](#license)
- [Contributing](#contributing)
## Features
- 📄 **Structured Data Extraction**: Extract data from your files effortlessly, and structure them the way you want!
- 🤖 **Markdown Rendering**: Generate markdown directly from your extracted data
- 🎨 **Beautiful CLI**: Styled console interface with colors and ASCII art
- **Fast Development**: Hot reload support with watch mode
- 🛠️ **TypeScript**: Full TypeScript support with strict type checking
## Prerequisites
- Node.js (version 18 or higher)
- npm (bundled with Node.js)
- LlamaCloud API key
## Installation
1. Clone the repository:
```bash
git clone https://github.com/run-llama/llama_cloud_services
cd llama_cloud_services/examples-ts/extract/
```
2. Install dependencies:
```bash
npm install
```
3. Set up your environment variables:
```bash
# Add your API key to your environment
export LLAMA_CLOUD_API_KEY="your-llamacloud-api-key"
```
## Usage
### Start the Demo
```bash
npm run start
```
The application will display a welcome screen and prompt you to enter the path to a document you'd like to process.
### Development Mode
For development with hot reload:
```bash
npm run dev
```
### Build the Project
```bash
npm run build
```
### Code Quality
Format code:
```bash
npm run format
```
Lint code:
```bash
npm run lint
```
## How It Works
1. **Document Input**: Enter the path to your document when prompted
2. **Extraction**: LlamaExtract processes the document according to the schema in [src/schema.ts](./src/schema.ts) and returns structured data
3. **Markdown Rendering**: The extracted content is rendered into beautiful markdown
4. **Results**: View the results directly in your terminal
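The flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the demo's actual code: `PaperSummary`, `Extractor`, `fakeExtractor`, `toMarkdown`, and `processDocument` are hypothetical names, and the stub stands in for the real LlamaExtract call so the snippet runs without an API key.

```typescript
// Hypothetical, trimmed-down result shape (the real schema lives in src/schema.ts).
type PaperSummary = { title: string; mainFindings: string[] };

// Hypothetical extractor signature: takes a file path, resolves to structured data.
type Extractor = (filePath: string) => Promise<PaperSummary>;

// Stub standing in for the LlamaExtract call, so the sketch needs no API key.
const fakeExtractor: Extractor = async (filePath) => ({
  title: `Paper at ${filePath}`,
  mainFindings: ["First finding", "Second finding"],
});

// Step 3: turn the structured data into markdown.
function toMarkdown(data: PaperSummary): string {
  const findings = data.mainFindings.map((f) => `- ${f}`).join("\n");
  return `# ${data.title}\n\n## Main Findings\n${findings}`;
}

// Steps 1 to 4 for a single document: input path, extraction, markdown.
async function processDocument(
  filePath: string,
  extract: Extractor,
): Promise<string> {
  const data = await extract(filePath);
  return toMarkdown(data);
}

// Step 4: print the rendered result.
processDocument("paper.pdf", fakeExtractor).then((markdown) =>
  console.log(markdown),
);
```

Passing the extractor in as a parameter keeps the rendering step testable without network access; in the demo itself the extraction is done by the `LlamaExtract` client.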
## Troubleshooting
### Common Issues
1. **Module Resolution Errors**: Ensure you're using Node.js 18+ and have all dependencies installed
2. **API Key Issues**: Verify your LlamaCloud API key is correctly set
3. **File Path Errors**: Use absolute paths or ensure relative paths are correct from the project root
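For file path errors in particular, it can help to resolve the typed path against a known root and check that the file exists before sending it for extraction. The sketch below assumes a Node.js runtime; `resolveInputPath` is a hypothetical helper and is not part of the demo.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical helper, not part of the demo: turns whatever the user typed
// into an absolute path and reports whether something exists at that location.
function resolveInputPath(
  userInput: string,
  projectRoot: string = process.cwd(),
): { absolutePath: string; exists: boolean } {
  const trimmed = userInput.trim();
  // path.resolve keeps an absolute input (normalized) and anchors a
  // relative input at projectRoot.
  const absolutePath = path.resolve(projectRoot, trimmed);
  return { absolutePath, exists: existsSync(absolutePath) };
}

const check = resolveInputPath("./papers/example.pdf");
if (!check.exists) {
  console.error(`No file found at ${check.absolutePath}`);
}
```

Printing the resolved absolute path in the error message makes it clear which directory a relative path was interpreted against.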
## License
MIT License - see the [LICENSE](../../LICENSE) file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `npm run format` and `npm run lint`
5. Submit a pull request
+14
@@ -0,0 +1,14 @@
import js from "@eslint/js";
import globals from "globals";
import tseslint from "typescript-eslint";
import { defineConfig } from "eslint/config";
export default defineConfig([
{
files: ["**/*.{js,mjs,cjs,ts,mts,cts}"],
plugins: { js },
extends: ["js/recommended"],
languageOptions: { globals: globals.node },
},
tseslint.configs.recommended,
]);
File diff suppressed because it is too large
+37
@@ -0,0 +1,37 @@
{
"name": "llama-extract-demo",
"version": "0.1.0",
"description": "Demo for LlamaExtract in TypeScript",
"main": "index.js",
"scripts": {
"test": "echo \"There are no tests\"",
"start": "npm exec tsx src/index.ts",
"lint": "eslint ./src/",
"format": "prettier --write ./src/",
"build": "tsc",
"dev": "npm exec tsx --watch src/index.ts"
},
"author": "LlamaIndex",
"license": "MIT",
"dependencies": {
"cli-markdown": "^3.5.1",
"consola": "^3.4.2",
"figlet": "^1.8.2",
"llama-cloud-services": "file:../../ts/llama_cloud_services",
"marked": "^15.0.12",
"marked-terminal": "^7.3.0",
"picocolors": "^1.1.1"
},
"devDependencies": {
"@eslint/js": "^9.32.0",
"@types/figlet": "^1.7.0",
"@types/marked-terminal": "^6.1.1",
"@types/node": "^24.2.0",
"eslint": "^9.32.0",
"globals": "^16.3.0",
"jiti": "^2.5.1",
"prettier": "^3.6.2",
"typescript": "^5.9.2",
"typescript-eslint": "^8.39.0"
}
}
+47
@@ -0,0 +1,47 @@
import { LlamaExtract, ExtractConfig } from "llama-cloud-services";
import cliMarkdown from "cli-markdown";
import { logger } from "./logger";
import pc from "picocolors";
import { consoleInput, renderLogo } from "./utils";
import { dataSchema } from "./schema";
import { renderMarkdown, ResearchData } from "./markdown";
export async function main(): Promise<number> {
const extractClient = new LlamaExtract(
process.env.LLAMA_CLOUD_API_KEY!,
"https://api.cloud.llamaindex.ai",
);
await renderLogo();
logger.log(
`Welcome to ${pc.bold(
pc.magentaBright("LlamaExtract Demo✨"),
)}, our demo for ${pc.bold(pc.green("LlamaExtract"))}, a ${pc.bold(
pc.cyan("LlamaCloud☁️"),
)} (https://cloud.llamaindex.ai) product!\nIn this demo we are going to try extracting relevant information ${pc.bold(
pc.yellowBright("from scientific papers"),
)}. Type the path to the paper you would like to process below👇\nIf you wish to exit, just type ${pc.bold(
pc.gray("quit"),
)}.\n`,
);
while (true) {
const userInput = await consoleInput();
if (userInput.toLowerCase() === "quit") {
break;
}
try {
const generatedData = await extractClient.extract(
dataSchema,
{} as ExtractConfig,
userInput,
);
const research = renderMarkdown(generatedData?.data as ResearchData);
logger.log(`${pc.bold(pc.cyan("Extracted information:✨"))}:\n`);
logger.log(cliMarkdown(research));
} catch (error) {
logger.error(`Error processing file: ${error}`);
}
}
return 0;
}
main().catch(console.error);
+8
@@ -0,0 +1,8 @@
import { createConsola } from "consola";
import type { ConsolaInstance } from "consola";
export const logger: ConsolaInstance = createConsola({
formatOptions: {
date: false,
},
});
+172
@@ -0,0 +1,172 @@
type Author = {
name: string;
affiliation?: string;
email?: string;
};
type Methodology = {
approach?: string;
participants?: string;
methods?: string[];
};
type Result = {
finding?: string;
significance?: string;
supportingData?: string;
};
type Reference = {
title: string;
authors: string;
year?: string;
relevance?: string;
};
type Discussion = {
implications?: string[];
limitations?: string[];
futureWork?: string[];
};
type Publication = {
journal?: string;
year: string;
doi?: string;
url?: string;
};
export type ResearchData = {
title: string;
authors: Author[];
abstract: string;
keywords?: string[];
mainFindings: string[];
methodology?: Methodology;
results?: Result[];
discussion?: Discussion;
references?: Reference[];
publication?: Publication;
};
export function renderMarkdown(data: ResearchData): string {
const {
title,
authors,
abstract,
keywords,
mainFindings,
methodology,
results,
discussion,
references,
publication,
} = data;
const md: string[] = [];
md.push(`# ${title}\n`);
// Authors
md.push(`## Authors`);
md.push(
authors
.map(
(author) =>
`- **${author.name}**${
author.affiliation ? `, *${author.affiliation}*` : ""
}${author.email ? ` (${author.email})` : ""}`,
)
.join("\n"),
);
// Abstract
md.push(`\n## Abstract\n${abstract}`);
// Keywords
if (keywords && keywords.length > 0) {
md.push(`\n## Keywords\n${keywords.map((k) => `- ${k}`).join("\n")}`);
}
// Main Findings
md.push(
`\n## Main Findings\n${mainFindings.map((f) => `- ${f}`).join("\n")}`,
);
// Methodology
if (methodology) {
md.push(`\n## Methodology`);
if (methodology.approach) md.push(`**Approach:** ${methodology.approach}`);
if (methodology.participants)
md.push(`**Participants:** ${methodology.participants}`);
if (methodology.methods?.length) {
md.push(
`**Methods:**\n${methodology.methods.map((m) => `- ${m}`).join("\n")}`,
);
}
}
// Results
if (results?.length) {
md.push(`\n## Results`);
results.forEach((result, i) => {
md.push(`\n### Result ${i + 1}`);
if (result.finding) md.push(`- **Finding:** ${result.finding}`);
if (result.significance)
md.push(`- **Significance:** ${result.significance}`);
if (result.supportingData)
md.push(`- **Supporting Data:** ${result.supportingData}`);
});
}
// Discussion
if (discussion) {
md.push(`\n## Discussion`);
if (discussion.implications?.length) {
md.push(
`### Implications\n${discussion.implications
.map((d) => `- ${d}`)
.join("\n")}`,
);
}
if (discussion.limitations?.length) {
md.push(
`### Limitations\n${discussion.limitations
.map((d) => `- ${d}`)
.join("\n")}`,
);
}
if (discussion.futureWork?.length) {
md.push(
`### Future Work\n${discussion.futureWork
.map((d) => `- ${d}`)
.join("\n")}`,
);
}
}
// References
if (references?.length) {
md.push(`\n## References`);
references.forEach((ref, i) => {
md.push(
`\n**[${i + 1}]** ${ref.title} — *${ref.authors}*${
ref.year ? ` (${ref.year})` : ""
}`,
);
if (ref.relevance) md.push(`> ${ref.relevance}`);
});
}
// Publication Info
if (publication) {
md.push(`\n## Publication`);
if (publication.journal) md.push(`- **Journal:** ${publication.journal}`);
if (publication.year) md.push(`- **Year:** ${publication.year}`);
if (publication.doi) md.push(`- **DOI:** ${publication.doi}`);
if (publication.url)
md.push(`- **URL:** [${publication.url}](${publication.url})`);
}
return md.join("\n");
}
+169
@@ -0,0 +1,169 @@
export const dataSchema = {
type: "object",
required: ["title", "authors", "abstract", "mainFindings"],
properties: {
title: {
type: "string",
description: "The full title of the research paper",
},
authors: {
type: "array",
description: "List of all authors of the paper",
items: {
type: "object",
properties: {
name: {
type: "string",
description: "Full name of the author",
},
affiliation: {
type: "string",
description:
"Institution or organization the author is affiliated with",
},
email: {
type: "string",
description: "Contact email of the author if provided",
},
},
},
},
abstract: {
type: "string",
description: "Complete abstract or summary of the paper",
},
keywords: {
type: "array",
description:
"Key terms and phrases that describe the paper's main topics",
items: {
type: "string",
},
},
mainFindings: {
type: "array",
description: "Key findings, conclusions, or contributions of the paper",
items: {
type: "string",
},
},
methodology: {
type: "object",
description: "Research methods and approaches used",
properties: {
approach: {
type: "string",
description: "Overall research approach or study design",
},
participants: {
type: "string",
description: "Description of study participants or data sources",
},
methods: {
type: "array",
description: "Specific methods, techniques, or tools used",
items: {
type: "string",
},
},
},
},
results: {
type: "array",
description: "Main results and outcomes of the research",
items: {
type: "object",
properties: {
finding: {
type: "string",
description: "Description of the specific result or finding",
},
significance: {
type: "string",
description:
"Statistical significance or importance of the finding",
},
supportingData: {
type: "string",
description: "Relevant statistics, measurements, or data points",
},
},
},
},
discussion: {
type: "object",
properties: {
implications: {
type: "array",
description: "Theoretical or practical implications of the findings",
items: {
type: "string",
},
},
limitations: {
type: "array",
description: "Study limitations or constraints",
items: {
type: "string",
},
},
futureWork: {
type: "array",
description: "Suggested future research directions",
items: {
type: "string",
},
},
},
},
references: {
type: "array",
description:
"Key papers cited that are crucial to understanding this work",
items: {
type: "object",
properties: {
title: {
type: "string",
description: "Title of the cited paper",
},
authors: {
type: "string",
description: "Authors of the cited paper",
},
year: {
type: "string",
description: "Publication year",
},
relevance: {
type: "string",
description: "Why this reference is important to the current paper",
},
},
required: ["title", "authors"],
},
},
publication: {
type: "object",
properties: {
journal: {
type: "string",
description: "Name of the journal or conference",
},
year: {
type: "string",
description: "Year of publication",
},
doi: {
type: "string",
description: "Digital Object Identifier (DOI) of the paper",
},
url: {
type: "string",
description: "URL where the paper can be accessed",
},
},
required: ["year"],
},
},
};
+4
@@ -0,0 +1,4 @@
declare module "cli-markdown" {
function cliMarkdown(input: string): string;
export default cliMarkdown;
}
+33
@@ -0,0 +1,33 @@
import * as readline from "readline/promises";
import figlet from "figlet";
import pc from "picocolors";
export async function renderLogo(): Promise<void> {
const logoText = figlet.textSync("Extract Demo", {
font: "ANSI Shadow",
horizontalLayout: "default",
verticalLayout: "default",
width: 100,
whitespaceBreak: true,
});
// Add some styling with picocolors
const styledLogo = pc.bold(pc.redBright(logoText));
// Add some padding/margin
console.log("\n");
console.log(styledLogo);
console.log(pc.gray("─".repeat(60)));
console.log("\n");
}
export async function consoleInput(): Promise<string> {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const answer = await rl.question("Path to your file: ");
rl.close();
return answer;
}
+131
@@ -0,0 +1,131 @@
# LlamaCloud Index Demo
A TypeScript demo application showcasing the power of **LlamaCloud Index**, a fully automated document ingestion and retrieval service offered within [LlamaCloud](https://cloud.llamaindex.ai). This demo lets you ask questions, retrieve relevant context, and generate AI-powered responses using OpenAI's GPT models.
## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Start the Demo](#start-the-demo)
- [Development Mode](#development-mode)
- [Build the Project](#build-the-project)
- [Code Quality](#code-quality)
- [How It Works](#how-it-works)
- [Troubleshooting](#troubleshooting)
- [Common Issues](#common-issues)
- [License](#license)
- [Contributing](#contributing)
## Features
- 🤖 **RAG**: Simple-yet-effective Retrieval Augmented Generation pipeline built on top of LlamaCloud Index and OpenAI
- 🎨 **Beautiful CLI**: Styled console interface with colors and ASCII art
- **Fast Development**: Hot reload support with watch mode
- 🛠️ **TypeScript**: Full TypeScript support with strict type checking
## Prerequisites
- Node.js (version 18 or higher)
- pnpm package manager
- OpenAI API key
- LlamaCloud API key
- An existing LlamaCloud Index pipeline
## Installation
1. Clone the repository:
```bash
git clone https://github.com/run-llama/llama_cloud_services
cd llama_cloud_services/examples-ts/index/
```
2. Install dependencies:
```bash
pnpm install
```
3. Set up your environment variables:
```bash
export OPENAI_API_KEY="your-openai-api-key"
export LLAMA_CLOUD_API_KEY="your-llamacloud-api-key"
export PIPELINE_NAME="your-pipeline-name"
```
4. Or write them into a `.env` file:
```env
OPENAI_API_KEY="your-openai-api-key"
LLAMA_CLOUD_API_KEY="your-llamacloud-api-key"
PIPELINE_NAME="your-pipeline-name"
```
## Usage
### Start the Demo
```bash
pnpm run start
```
The application will display a welcome screen and prompt you to start chatting!
### Development Mode
For development with hot reload:
```bash
pnpm run dev
```
### Build the Project
```bash
pnpm run build
```
### Code Quality
Format code:
```bash
pnpm run format
```
Lint code:
```bash
pnpm run lint
```
## How It Works
1. **Message Input**: Enter a message
2. **Retrieval**: Several nodes are retrieved from the LlamaCloud index you specified
3. **AI Response Generation**: The retrieved nodes, together with their relevance scores, are passed to the AI model, which uses them as context to generate a reply to your original message
4. **Results**: View the AI-generated summary in your terminal
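The retrieval and generation steps above amount to packing the retrieved text and scores into a prompt. The sketch below shows one way that prompt could be assembled; `RetrievedChunk` and `buildPrompt` are hypothetical names, the prompt wording is an assumption, and the demo itself works with `NodeWithScore` objects rather than this simplified shape.

```typescript
// Hypothetical, simplified shape of a retrieved node.
type RetrievedChunk = { text: string; score?: number };

// Builds one prompt string: numbered context chunks with their relevance
// scores, followed by the user's question.
function buildPrompt(chunks: RetrievedChunk[], question: string): string {
  const context = chunks
    .map((chunk, i) => {
      const score = chunk.score !== undefined ? chunk.score.toFixed(2) : "n/a";
      return `[${i + 1}] (relevance: ${score}) ${chunk.text}`;
    })
    .join("\n");
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const examplePrompt = buildPrompt(
  [
    { text: "LlamaCloud Index handles ingestion and retrieval.", score: 0.91 },
    { text: "Pipelines are identified by name.", score: 0.57 },
  ],
  "What does LlamaCloud Index do?",
);
console.log(examplePrompt);
```

Including the score next to each chunk lets the model weigh stronger matches more heavily, which is the reason the relevance score is passed along in step 3.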
## Troubleshooting
### Common Issues
1. **Module Resolution Errors**: Ensure you're using Node.js 18+ and have all dependencies installed
2. **API Key Issues**: Verify your OpenAI and LlamaCloud API keys are correctly set
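A quick way to rule out API key issues is to check the required variables before starting the chat loop. The snippet below is a minimal sketch; `missingEnvVars` is a hypothetical helper and is not part of the demo.

```typescript
// Hypothetical helper: returns the names of required variables that are
// unset or contain only whitespace.
function missingEnvVars(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((key) => (env[key] ?? "").trim() === "");
}

// The three variables the Installation section asks for.
const REQUIRED_VARS = ["OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY", "PIPELINE_NAME"];

// Example with a partially configured environment.
const missingVars = missingEnvVars(REQUIRED_VARS, {
  OPENAI_API_KEY: "sk-example",
  LLAMA_CLOUD_API_KEY: "",
});
console.log(missingVars); // [ 'LLAMA_CLOUD_API_KEY', 'PIPELINE_NAME' ]
```

In the demo you would pass `process.env` as the second argument and exit with a clear message if the returned list is not empty.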
## License
MIT License - see the [LICENSE](../../LICENSE) file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `pnpm run format` and `pnpm run lint`
5. Submit a pull request
+15
@@ -0,0 +1,15 @@
import js from "@eslint/js";
import globals from "globals";
import tseslint from "typescript-eslint";
import { defineConfig } from "eslint/config";
export default defineConfig([
{
files: ["**/*.{js,mjs,cjs,ts,mts,cts}"],
plugins: { js },
extends: ["js/recommended"],
languageOptions: { globals: globals.node },
},
{ files: ["**/*.js"], languageOptions: { sourceType: "script" } },
tseslint.configs.recommended,
]);
+48
@@ -0,0 +1,48 @@
{
"name": "llama-chat",
"version": "0.1.0",
"description": "Demo for LlamaCloud Index in TypeScript",
"type": "module",
"main": "index.js",
"scripts": {
"test": "echo \"There are no tests\"",
"start": "pnpm exec tsx src/index.ts",
"lint": "eslint ./src/",
"format": "prettier --write ./src/",
"build": "tsc",
"dev": "pnpm exec tsx --watch src/index.ts"
},
"keywords": [
"ai",
"rag",
"retrieval",
"pipeline",
"llms",
"chatbot"
],
"author": "LlamaIndex",
"license": "MIT",
"packageManager": "pnpm@10.12.4",
"devDependencies": {
"@eslint/js": "^9.32.0",
"@types/figlet": "^1.7.0",
"@types/node": "^24.1.0",
"@typescript-eslint/eslint-plugin": "^8.38.0",
"@typescript-eslint/parser": "^8.38.0",
"eslint": "^9.32.0",
"globals": "^16.3.0",
"jiti": "^2.5.1",
"prettier": "^3.6.2",
"typescript": "^5.8.3",
"typescript-eslint": "^8.38.0"
},
"dependencies": {
"@ai-sdk/openai": "^1.3.23",
"ai": "^4.3.19",
"consola": "^3.4.2",
"dotenv": "^17.2.1",
"figlet": "^1.8.2",
"llama-cloud-services": "link:../../ts/llama_cloud_services",
"picocolors": "^1.1.1"
}
}
+1770
File diff suppressed because it is too large
+48
@@ -0,0 +1,48 @@
import { LlamaCloudIndex } from "llama-cloud-services";
import { logger } from "./logger";
import pc from "picocolors";
import {
consoleInput,
retrievalAugmentedGeneration,
renderLogo,
} from "./utils";
import dotenv from "dotenv";
dotenv.config();
export async function main(): Promise<number> {
const index = new LlamaCloudIndex({
name: process.env.PIPELINE_NAME as string,
projectName: "Default",
apiKey: process.env.LLAMA_CLOUD_API_KEY, // can provide API-key in the constructor or in the env
});
const retriever = index.asRetriever({
similarityTopK: 5,
});
await renderLogo();
logger.log(
`Welcome to ${pc.bold(
pc.magentaBright("✨LlamaChat✨"),
)}, our demo for ${pc.bold(pc.green("Index🦙"))}, a ${pc.bold(
pc.cyan("LlamaCloud☁️"),
)} (https://cloud.llamaindex.ai) product!\nType a question below, and you will get an answer!👇\nIf you wish to exit, just type ${pc.bold(
pc.gray("quit"),
)}.\n`,
);
while (true) {
const userInput = await consoleInput();
if (userInput.trim().toLowerCase() === "quit") {
break;
}
try {
const nodes = await retriever.retrieve(userInput);
const summary = await retrievalAugmentedGeneration(nodes, userInput);
logger.log(`${pc.bold(pc.magentaBright("LlamaChat✨:"))}\n${summary}`);
} catch (error) {
logger.error(`Error processing your request: ${error}`);
}
}
return 0;
}
main().catch(console.error);
+8
@@ -0,0 +1,8 @@
import { createConsola } from "consola";
import type { ConsolaInstance } from "consola";
export const logger: ConsolaInstance = createConsola({
formatOptions: {
date: false,
},
});
+56
@@ -0,0 +1,56 @@
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { NodeWithScore, MetadataMode } from "llamaindex";
import * as readline from "readline/promises";
import figlet from "figlet";
import pc from "picocolors";
export async function renderLogo(): Promise<void> {
const logoText = figlet.textSync("LlamaChat", {
font: "ANSI Shadow",
horizontalLayout: "default",
verticalLayout: "default",
width: 100,
whitespaceBreak: true,
});
// Add some styling with picocolors
const styledLogo = pc.bold(pc.yellowBright(logoText));
// Add some padding/margin
console.log("\n");
console.log(styledLogo);
console.log(pc.gray("─".repeat(60)));
console.log("\n");
}
export async function consoleInput(): Promise<string> {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const answer = await rl.question(pc.cyanBright("You✨:"));
rl.close();
return answer;
}
export async function retrievalAugmentedGeneration(
nodes: NodeWithScore[],
prompt: string,
): Promise<string> {
let mainText: string = "";
for (const node of nodes) {
mainText += `\t{information: '${node.node.getContent(
MetadataMode.ALL,
)}', relevanceScore: '${node.score ?? "no score"}'}\n`;
}
const { text } = await generateText({
model: openai("gpt-4.1"),
prompt: `[\n${mainText}\n]\n\nBased on the information you are given and on its relevance score (where 'no score' means that no score is available), answer this user prompt: '${prompt}'`,
});
return text;
}
+22
@@ -0,0 +1,22 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ES2022",
"lib": ["ES2022"],
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"types": ["node"],
"moduleResolution": "bundler",
"allowSyntheticDefaultImports": true,
"resolveJsonModule": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}
+124
@@ -0,0 +1,124 @@
# LlamaParse Demo
A TypeScript demo application showcasing the power of **LlamaParse** - an intelligent document parsing service from [LlamaCloud](https://cloud.llamaindex.ai). This demo allows you to parse various document formats and generate AI-powered summaries using OpenAI's GPT models.
## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Start the Demo](#start-the-demo)
- [Development Mode](#development-mode)
- [Build the Project](#build-the-project)
- [Code Quality](#code-quality)
- [Quick Commands Reference](#quick-commands-reference)
- [How It Works](#how-it-works)
- [API Dependencies](#api-dependencies)
- [Troubleshooting](#troubleshooting)
- [Common Issues](#common-issues)
- [License](#license)
- [Contributing](#contributing)
## Features
- 📄 **Document Parsing**: Parse PDFs, Word docs, and other formats using LlamaParse
- 🤖 **AI Summaries**: Generate summaries of the parsed content using OpenAI GPT-4.1
- 🎨 **Beautiful CLI**: Styled console interface with colors and ASCII art
- ⚡ **Fast Development**: Hot reload support with watch mode
- 🛠️ **TypeScript**: Full TypeScript support with strict type checking
## Prerequisites
- Node.js (version 18 or higher)
- pnpm package manager
- OpenAI API key
- LlamaCloud API key
## Installation
1. Clone the repository:
```bash
git clone https://github.com/run-llama/llama_cloud_services
cd llama_cloud_services/examples-ts/parse/
```
2. Install dependencies:
```bash
pnpm install
```
3. Set up your environment variables:
```bash
# Add your API keys to your environment
export OPENAI_API_KEY="your-openai-api-key"
export LLAMA_CLOUD_API_KEY="your-llamacloud-api-key"
```
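Before starting the demo, it can help to confirm that both keys are actually visible to the process. The sketch below is an illustration rather than part of the demo: `findMissingEnvVars` is a hypothetical helper, and the environment is passed in as a plain object so the check stays easy to test.

```typescript
// Hypothetical helper (not part of the demo): reports which required
// environment variables are unset or blank.
type Env = Record<string, string | undefined>;

const REQUIRED_KEYS = ["OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY"];

function findMissingEnvVars(
  env: Env,
  required: string[] = REQUIRED_KEYS,
): string[] {
  // A variable counts as missing when it is undefined or only whitespace.
  return required.filter((key) => (env[key] ?? "").trim() === "");
}

// Example with an explicit object; in the demo you would pass process.env.
const missingKeys = findMissingEnvVars({ OPENAI_API_KEY: "sk-example" });
console.log(missingKeys);
```

Calling `findMissingEnvVars(process.env)` at startup and exiting when the result is non-empty gives a clearer failure than an authentication error later on.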
## Usage
### Start the Demo
```bash
pnpm run start
```
The application will display a welcome screen and prompt you to enter the path to a document you'd like to process.
### Development Mode
For development with hot reload:
```bash
pnpm run dev
```
### Build the Project
```bash
pnpm run build
```
### Code Quality
Format code:
```bash
pnpm run format
```
Lint code:
```bash
pnpm run lint
```
## How It Works
1. **Document Input**: Enter the path to your document when prompted
2. **Parsing**: LlamaParse processes the document and extracts structured content
3. **AI Summary**: The extracted content is sent to OpenAI GPT-4.1 for summarization
4. **Results**: View the AI-generated summary in your terminal
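The summarization step can be sketched as a pure function that joins the parsed documents and wraps them in a prompt. This is a simplified illustration, not the demo's exact code: `ParsedDocument`, `joinDocuments`, and `buildSummaryPrompt` are hypothetical names, and the interface is a minimal stand-in for the `Document` objects returned by the reader.

```typescript
// Minimal stand-in for a parsed document (assumption: only the `text`
// field of the real Document type is needed here).
interface ParsedDocument {
  text: string;
}

// Joins the parsed documents with a visible separator between them.
function joinDocuments(documents: ParsedDocument[]): string {
  return documents.map((doc) => doc.text).join("\n\n---\n\n");
}

// Wraps the joined text and the instruction in simple XML-like tags.
function buildSummaryPrompt(documents: ParsedDocument[]): string {
  const body = joinDocuments(documents);
  return [
    "<chat>",
    `\t<text>${body}</text>`,
    "\t<instructions>Could you please generate a summary of the given text?</instructions>",
    "</chat>",
  ].join("\n");
}

const examplePrompt = buildSummaryPrompt([
  { text: "Page one." },
  { text: "Page two." },
]);
console.log(examplePrompt);
```

The resulting string is what would be handed to the model call; separating prompt construction from the API call makes the former easy to test offline.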
## Troubleshooting
### Common Issues
1. **Module Resolution Errors**: Ensure you're using Node.js 18+ and have all dependencies installed
2. **API Key Issues**: Verify your OpenAI and LlamaCloud API keys are correctly set
3. **File Path Errors**: Use absolute paths or ensure relative paths are correct from the project root
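For file path errors in particular, resolving the input to an absolute path before handing it to the parser makes failures easier to diagnose. The following is a minimal sketch using Node's built-in `path` and `fs` modules; `resolveDocumentPath` is a hypothetical helper and is not part of the demo.

```typescript
import * as path from "node:path";
import { existsSync } from "node:fs";

// Hypothetical helper: turns user input into an absolute path.
// Relative input is resolved against `baseDir` (the current working
// directory by default); absolute input is returned normalized.
function resolveDocumentPath(
  input: string,
  baseDir: string = process.cwd(),
): string {
  return path.resolve(baseDir, input.trim());
}

const resolvedPath = resolveDocumentPath("./docs/report.pdf");
if (!existsSync(resolvedPath)) {
  // Printing the absolute path shows exactly where the file was expected.
  console.log(`File not found: ${resolvedPath}`);
}
```

Checking for the file up front separates "wrong path" problems from genuine parsing or API errors.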
## License
MIT License - see the [LICENSE](../../LICENSE) file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `pnpm run format` and `pnpm run lint`
5. Submit a pull request
Binary file not shown.
+15
@@ -0,0 +1,15 @@
import js from "@eslint/js";
import globals from "globals";
import tseslint from "typescript-eslint";
import { defineConfig } from "eslint/config";
export default defineConfig([
{
files: ["**/*.{js,mjs,cjs,ts,mts,cts}"],
plugins: { js },
extends: ["js/recommended"],
languageOptions: { globals: globals.browser },
},
{ files: ["**/*.js"], languageOptions: { sourceType: "script" } },
tseslint.configs.recommended,
]);
+47
@@ -0,0 +1,47 @@
{
"name": "llamaparse-demo",
"version": "0.1.0",
"description": "Demo for LlamaParse in TypeScript",
"type": "module",
"main": "index.js",
"scripts": {
"test": "echo \"There are no tests\"",
"start": "pnpm exec tsx src/index.ts",
"lint": "eslint ./src/",
"format": "prettier --write ./src/",
"build": "tsc",
"dev": "pnpm exec tsx --watch src/index.ts"
},
"keywords": [
"ai",
"ocr",
"parsing",
"intelligent-document-processing",
"pdf",
"llms"
],
"author": "LlamaIndex",
"license": "MIT",
"packageManager": "pnpm@10.12.4",
"devDependencies": {
"@eslint/js": "^9.32.0",
"@types/figlet": "^1.7.0",
"@types/node": "^24.1.0",
"@typescript-eslint/eslint-plugin": "^8.38.0",
"@typescript-eslint/parser": "^8.38.0",
"eslint": "^9.32.0",
"globals": "^16.3.0",
"jiti": "^2.5.1",
"prettier": "^3.6.2",
"typescript": "^5.8.3",
"typescript-eslint": "^8.38.0"
},
"dependencies": {
"@ai-sdk/openai": "^1.3.23",
"ai": "^4.3.19",
"consola": "^3.4.2",
"figlet": "^1.8.2",
"llama-cloud-services": "link:../../ts/llama_cloud_services",
"picocolors": "^1.1.1"
}
}
+1758
File diff suppressed because it is too large
+34
@@ -0,0 +1,34 @@
import { LlamaParseReader } from "llama-cloud-services";
import { logger } from "./logger";
import pc from "picocolors";
import { consoleInput, generateSummary, renderLogo } from "./utils";
export async function main(): Promise<number> {
const reader = new LlamaParseReader({ resultType: "markdown" });
await renderLogo();
logger.log(
`Welcome to ${pc.bold(
pc.magentaBright("✨LlamaParse Demo✨"),
)}, our demo for ${pc.bold(pc.green("LlamaParse🦙"))}, a ${pc.bold(
pc.cyan("LlamaCloud☁️"),
)} (https://cloud.llamaindex.ai) product!\nType the path to the document you would like to process below👇\nIf you wish to exit, just type ${pc.bold(
pc.gray("quit"),
)}.\n`,
);
while (true) {
const userInput = await consoleInput();
if (userInput.trim().toLowerCase() === "quit") {
break;
}
try {
const documents = await reader.loadData(userInput);
const summary = await generateSummary(documents);
logger.log(`${pc.bold(pc.cyan("AI-generated summary✨"))}:\n${summary}`);
} catch (error) {
logger.error(`Error processing file: ${error}`);
}
}
return 0;
}
main().catch(console.error);
+8
@@ -0,0 +1,8 @@
import { createConsola } from "consola";
import type { ConsolaInstance } from "consola";
export const logger: ConsolaInstance = createConsola({
formatOptions: {
date: false,
},
});
+51
@@ -0,0 +1,51 @@
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { Document } from "llamaindex";
import * as readline from "readline/promises";
import figlet from "figlet";
import pc from "picocolors";
export async function renderLogo(): Promise<void> {
const logoText = figlet.textSync("LlamaParse Demo", {
font: "ANSI Shadow",
horizontalLayout: "default",
verticalLayout: "default",
width: 100,
whitespaceBreak: true,
});
// Add some styling with picocolors
const styledLogo = pc.bold(pc.magentaBright(logoText));
// Add some padding/margin
console.log("\n");
console.log(styledLogo);
console.log(pc.gray("─".repeat(60)));
console.log("\n");
}
export async function consoleInput(): Promise<string> {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const answer = await rl.question("Path to your file: ");
rl.close();
return answer;
}
export async function generateSummary(documents: Document[]): Promise<string> {
let mainText: string = "";
for (const document of documents) {
mainText += `${document.text}\n\n---\n\n`;
}
const { text } = await generateText({
model: openai("gpt-4.1"),
prompt: `<chat>\n\t<text>${mainText}</text>\n\t<instructions>Could you please generate a summary of the given text?</instructions>\n</chat>`,
});
return text;
}
+22
@@ -0,0 +1,22 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ES2022",
"lib": ["ES2022"],
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"types": ["node"],
"moduleResolution": "bundler",
"allowSyntheticDefaultImports": true,
"resolveJsonModule": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}
+19
@@ -0,0 +1,19 @@
# LlamaCloud Services Examples - Python
> **⚠️ DEPRECATION NOTICE**
>
> This repository and its packages are deprecated and will be maintained until **May 1, 2026**.
>
> **Please migrate to the new packages:**
> - **Python**: `pip install llama-cloud>=1.0` ([GitHub](https://github.com/run-llama/llama-cloud-py))
> - **TypeScript**: `npm install @llamaindex/llama-cloud` ([GitHub](https://github.com/run-llama/llama-cloud-ts))
>
> The new packages provide the same functionality with improved performance, better support, and active development.
This folder contains Python notebooks with examples for:
- [LlamaParse](./parse/)
- [LlamaExtract](./extract/)
- [LlamaCloudIndex](./index/)
Follow the instructions in each notebook to get started!
+1
@@ -0,0 +1 @@
sample_files/
@@ -0,0 +1,815 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cell-0",
"metadata": {},
"source": [
"# Batch Parse with LlamaCloud Directories\n",
"\n",
"This notebook demonstrates how to use LlamaCloud's batch processing API to parse multiple files in a directory. The workflow includes:\n",
"\n",
"1. **Creating a Directory** - Set up a directory to organize your files\n",
"2. **Uploading Files** - Upload multiple files to the directory\n",
"3. **Starting a Batch Parse Job** - Kick off batch processing on all files\n",
"4. **Monitoring Progress** - Check the status and view results\n",
"\n",
"This is useful when you need to parse many documents at once, as the batch API handles the orchestration and provides progress tracking."
]
},
{
"cell_type": "markdown",
"id": "0c2b5e1a",
"metadata": {},
"source": [
"> **⚠️ DEPRECATION NOTICE**\n",
">\n",
"> This example uses the deprecated `llama-cloud-services` package, which will be maintained until **May 1, 2026**.\n",
">\n",
"> **Please migrate to:**\n",
"> - **Python**: `pip install llama-cloud>=1.0` ([GitHub](https://github.com/run-llama/llama-cloud-py))\n",
"> - **New Package Documentation**: https://docs.cloud.llamaindex.ai/\n",
">\n",
"> The new package provides the same functionality with improved performance and support."
]
},
{
"cell_type": "markdown",
"id": "cell-1",
"metadata": {},
"source": [
"## Setup and Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-2",
"metadata": {},
"outputs": [],
"source": [
"%pip install llama-cloud python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-3",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"import httpx\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Set your API key\n",
"LLAMA_CLOUD_API_KEY = os.environ.get(\"LLAMA_CLOUD_API_KEY\", \"llx-...\")\n",
"\n",
"# Optional: Set base URL (defaults to https://api.cloud.llamaindex.ai if not set)\n",
"LLAMA_CLOUD_BASE_URL = os.environ.get(\n",
" \"LLAMA_CLOUD_BASE_URL\", \"https://api.cloud.llamaindex.ai\"\n",
")\n",
"\n",
"# Optional: Set project_id if you have one, otherwise it will use your default project\n",
"PROJECT_ID = os.environ.get(\"LLAMA_CLOUD_PROJECT_ID\", None)\n",
"\n",
"print(\"✅ API key configured\")\n",
"print(f\" Base URL: {LLAMA_CLOUD_BASE_URL}\")"
]
},
{
"cell_type": "markdown",
"id": "cell-4",
"metadata": {},
"source": [
"## Setup HTTP Client\n",
"\n",
"Since the current version of the llama-cloud SDK has some issues with the beta endpoints, we'll use direct HTTP requests with httpx for reliability."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-5",
"metadata": {},
"outputs": [],
"source": [
"# Create HTTP client with authentication\n",
"headers = {\n",
" \"Authorization\": f\"Bearer {LLAMA_CLOUD_API_KEY}\",\n",
"}\n",
"\n",
"print(\"✅ HTTP client configured\")\n",
"print(f\" Using base URL: {LLAMA_CLOUD_BASE_URL}\")"
]
},
{
"cell_type": "markdown",
"id": "cell-6",
"metadata": {},
"source": [
"## Step 1: Create a Directory\n",
"\n",
"First, we'll create a directory to organize our files. Directories help you group related files together for batch processing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-7",
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"# Create a directory with a timestamp in the name\n",
"timestamp = datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
"directory_name = f\"batch-parse-demo-{timestamp}\"\n",
"\n",
"# Create directory using HTTP request\n",
"response = httpx.post(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/directories\",\n",
" headers=headers,\n",
" params={\"project_id\": PROJECT_ID},\n",
" json={\n",
" \"name\": directory_name,\n",
" \"description\": \"Demo directory for batch parse example\",\n",
" },\n",
" timeout=60.0,\n",
")\n",
"\n",
"if response.status_code in [200, 201]:\n",
" directory = response.json()\n",
" directory_id = directory[\"id\"]\n",
" project_id = directory[\"project_id\"]\n",
"\n",
" print(f\"✅ Created directory: {directory['name']}\")\n",
" print(f\" Directory ID: {directory_id}\")\n",
" print(f\" Project ID: {project_id}\")\n",
"else:\n",
" raise Exception(\n",
" f\"Failed to create directory: {response.status_code} - {response.text}\"\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "cell-8",
"metadata": {},
"source": [
"## Step 2: Upload Files to the Directory\n",
"\n",
"Now we'll upload some files to our directory. For this demo, we'll download some sample PDFs and upload them.\n",
"\n",
"You can replace these with your own files."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-9",
"metadata": {},
"outputs": [],
"source": [
"# Create a directory for sample files\n",
"import requests\n",
"\n",
"os.makedirs(\"sample_files\", exist_ok=True)\n",
"\n",
"# Sample documents to download\n",
"sample_docs = {\n",
" \"attention.pdf\": \"https://arxiv.org/pdf/1706.03762.pdf\",\n",
" \"bert.pdf\": \"https://arxiv.org/pdf/1810.04805.pdf\",\n",
"}\n",
"\n",
"# Download sample documents\n",
"for filename, url in sample_docs.items():\n",
" filepath = f\"sample_files/{filename}\"\n",
" if not os.path.exists(filepath):\n",
" print(f\"📥 Downloading {filename}...\")\n",
" response = requests.get(url)\n",
" if response.status_code == 200:\n",
" with open(filepath, \"wb\") as f:\n",
" f.write(response.content)\n",
" print(f\" ✅ Downloaded {filename}\")\n",
" else:\n",
" print(f\" ❌ Failed to download {filename}\")\n",
" else:\n",
" print(f\"📁 {filename} already exists\")\n",
"\n",
"print(\"\\n✅ Sample files ready!\")"
]
},
{
"cell_type": "markdown",
"id": "cell-10",
"metadata": {},
"source": [
"### Upload Files to Directory\n",
"\n",
"Now let's upload the files to our directory using the `upload_file_to_directory` endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-11",
"metadata": {},
"outputs": [],
"source": [
"uploaded_files = []\n",
"\n",
"# Workaround: Use direct HTTP requests instead of SDK due to SDK bug\n",
"import httpx\n",
"\n",
"for filename in os.listdir(\"sample_files\"):\n",
" if filename.endswith(\".pdf\"):\n",
" filepath = f\"sample_files/{filename}\"\n",
"\n",
" print(f\"📤 Uploading {filename}...\")\n",
"\n",
" # Upload file using direct HTTP request (SDK has a bug with file uploads)\n",
" with open(filepath, \"rb\") as f:\n",
" # Prepare the multipart form data correctly\n",
" files = {\"upload_file\": (filename, f, \"application/pdf\")}\n",
"\n",
" # Make the request directly\n",
" response = httpx.post(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/directories/{directory_id}/files/upload\",\n",
" params={\"project_id\": project_id},\n",
" files=files,\n",
" headers={\"Authorization\": f\"Bearer {LLAMA_CLOUD_API_KEY}\"},\n",
" timeout=60.0,\n",
" )\n",
"\n",
" if response.status_code in [200, 201]:\n",
" directory_file = response.json()\n",
" uploaded_files.append(directory_file)\n",
" print(f\" ✅ Uploaded: {directory_file.get('display_name')}\")\n",
" print(f\" File ID: {directory_file.get('id')}\")\n",
" else:\n",
" print(f\" ❌ Upload failed: {response.status_code}\")\n",
" print(f\" Error: {response.text[:200]}\")\n",
"\n",
"print(f\"\\n✅ Uploaded {len(uploaded_files)} files to directory\")"
]
},
{
"cell_type": "markdown",
"id": "cell-12",
"metadata": {},
"source": [
"## Step 3: Create a Batch Parse Job\n",
"\n",
"Now that we have files in our directory, let's create a batch parse job to process them all at once.\n",
"\n",
"The batch processing API uses the same configuration as LlamaParse."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-13",
"metadata": {},
"outputs": [],
"source": [
"# Configure the parse job\n",
"# This configuration will apply to all files in the directory\n",
"job_config = {\n",
" \"job_name\": \"parse_raw_file_job\", # Must match the JobNames enum value\n",
" \"partitions\": {},\n",
" \"parameters\": {\n",
" \"type\": \"parse\",\n",
" \"lang\": \"en\",\n",
" \"fast_mode\": True,\n",
" },\n",
"}\n",
"\n",
"print(\"✅ Job configuration created\")\n",
"print(f\" Language: {job_config['parameters']['lang']}\")\n",
"print(f\" Fast mode: {job_config['parameters']['fast_mode']}\")"
]
},
{
"cell_type": "markdown",
"id": "cell-14",
"metadata": {},
"source": [
"### Submit the Batch Job\n",
"\n",
"Now let's submit the batch job to process all files in the directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-15",
"metadata": {},
"outputs": [],
"source": [
"print(f\"🚀 Submitting batch parse job for directory: {directory_id}\")\n",
"print(f\" Processing {len(uploaded_files)} files...\\n\")\n",
"\n",
"# Submit batch job using HTTP request\n",
"response = httpx.post(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/batch-processing\",\n",
" headers=headers,\n",
" params={\"project_id\": project_id},\n",
" json={\n",
" \"directory_id\": directory_id,\n",
" \"job_config\": job_config,\n",
" \"page_size\": 100, # Number of files to fetch per batch\n",
" \"continue_as_new_threshold\": 10, # Workflow continuation threshold\n",
" },\n",
" timeout=60.0,\n",
")\n",
"\n",
"if response.status_code in [200, 201]:\n",
" batch_job = response.json()\n",
" batch_job_id = batch_job[\"id\"]\n",
"\n",
" print(\"✅ Batch job submitted successfully!\")\n",
" print(f\" Batch Job ID: {batch_job_id}\")\n",
" print(f\" Workflow ID: {batch_job.get('workflow_id')}\")\n",
" print(f\" Status: {batch_job.get('status')}\")\n",
" print(f\" Total Items: {batch_job.get('total_items')}\")\n",
"else:\n",
" raise Exception(\n",
" f\"Failed to create batch job: {response.status_code} - {response.text}\"\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "cell-16",
"metadata": {},
"source": [
"## Step 4: Monitor Job Progress\n",
"\n",
"Now let's monitor the batch job progress. We'll poll the status endpoint to see how the job is progressing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-17",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"\n",
"def print_job_status(status_data):\n",
" \"\"\"Helper function to print job status in a readable format.\"\"\"\n",
" job = status_data[\"job\"]\n",
" progress_pct = status_data[\"progress_percentage\"]\n",
"\n",
" print(f\"\\n{'='*60}\")\n",
" print(f\"Job Status: {job['status']}\")\n",
" print(f\"{'='*60}\")\n",
" print(f\"Total Items: {job['total_items']}\")\n",
" print(f\"Completed: {job['processed_items']}\")\n",
" print(f\"Failed: {job['failed_items']}\")\n",
" print(f\"Skipped: {job['skipped_items']}\")\n",
" print(f\"Progress: {progress_pct:.1f}%\")\n",
"\n",
" if job.get(\"completed_at\"):\n",
" print(f\"Completed At: {job['completed_at']}\")\n",
" elif job.get(\"started_at\"):\n",
" print(f\"Started At: {job['started_at']}\")\n",
"\n",
" print(f\"{'='*60}\")\n",
"\n",
"\n",
"# Poll for status updates\n",
"print(\"🔄 Monitoring batch job progress...\")\n",
"print(\n",
" \"Note: It may take a few seconds for the workflow to initialize and count files.\\n\"\n",
")\n",
"\n",
"max_polls = 60 # Maximum number of status checks (increased for longer jobs)\n",
"poll_interval = 10 # Seconds between checks\n",
"\n",
"for i in range(max_polls):\n",
" response = httpx.get(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/batch-processing/{batch_job_id}\",\n",
" headers=headers,\n",
" params={\"project_id\": project_id},\n",
" timeout=60.0,\n",
" )\n",
"\n",
" if response.status_code == 200:\n",
" status_data = response.json()\n",
" print_job_status(status_data)\n",
"\n",
" # Check if job is complete\n",
" job_status = status_data[\"job\"][\"status\"]\n",
" if job_status in [\"completed\", \"failed\", \"cancelled\"]:\n",
" print(f\"\\n✅ Job finished with status: {job_status}\")\n",
" break\n",
"\n",
" if i < max_polls - 1:\n",
" print(f\"\\n⏳ Waiting {poll_interval} seconds before next check...\")\n",
" time.sleep(poll_interval)\n",
" else:\n",
" print(f\"Error getting status: {response.status_code} - {response.text}\")\n",
" break\n",
"else:\n",
" print(f\"\\n⚠️ Reached maximum polling attempts. Job may still be running.\")"
]
},
{
"cell_type": "markdown",
"id": "cell-18",
"metadata": {},
"source": [
"## Step 5: View Job Items\n",
"\n",
"Let's look at the individual items in the batch job to see which files were processed successfully."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-19",
"metadata": {},
"outputs": [],
"source": [
"# Get all items in the batch job\n",
"response = httpx.get(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/batch-processing/{batch_job_id}/items\",\n",
" headers=headers,\n",
" params={\"project_id\": project_id, \"limit\": 100},\n",
" timeout=60.0,\n",
")\n",
"\n",
"if response.status_code == 200:\n",
" items_response = response.json()\n",
"\n",
" print(f\"\\n📋 Batch Job Items ({items_response['total_size']} total)\")\n",
" print(f\"{'='*80}\\n\")\n",
"\n",
" for item in items_response[\"items\"]:\n",
" status_emoji = (\n",
" \"✅\"\n",
" if item[\"status\"] == \"completed\"\n",
" else \"❌\"\n",
" if item[\"status\"] == \"failed\"\n",
" else \"⏳\"\n",
" )\n",
" print(f\"{status_emoji} {item['item_name']}\")\n",
" print(f\" Status: {item['status']}\")\n",
" print(f\" Item ID: {item['item_id']}\")\n",
"\n",
" if item.get(\"error_message\"):\n",
" print(f\" Error: {item['error_message']}\")\n",
"\n",
" print()\n",
"else:\n",
" print(f\"Error listing items: {response.status_code} - {response.text}\")"
]
},
{
"cell_type": "markdown",
"id": "cell-20",
"metadata": {},
"source": [
"## Step 6: Retrieve Processing Results\n",
"\n",
"For each completed file, we can retrieve the processing results to see where the parsed output is stored."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-21",
"metadata": {},
"outputs": [],
"source": [
"# Get processing results for a specific item\n",
"if items_response[\"items\"]:\n",
" first_item = items_response[\"items\"][0]\n",
"\n",
" print(f\"\\n🔍 Processing results for: {first_item['item_name']}\")\n",
" print(f\"{'='*80}\\n\")\n",
"\n",
" response = httpx.get(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/batch-processing/items/{first_item['item_id']}/processing-results\",\n",
" headers=headers,\n",
" params={\"project_id\": project_id},\n",
" timeout=60.0,\n",
" )\n",
"\n",
" if response.status_code == 200:\n",
" results = response.json()\n",
"\n",
" print(f\"Item: {results['item_name']}\")\n",
" print(f\"Total processing runs: {len(results['processing_results'])}\\n\")\n",
"\n",
" for i, result in enumerate(results[\"processing_results\"], 1):\n",
" print(f\"Run {i}:\")\n",
" print(f\" Job Type: {result['job_type']}\")\n",
" print(f\" Processed At: {result['processed_at']}\")\n",
" print(f\" Parameters Hash: {result['parameters_hash']}\")\n",
"\n",
" if result.get(\"output_s3_path\"):\n",
" print(f\" Output S3 Path: {result['output_s3_path']}\")\n",
"\n",
" if result.get(\"output_metadata\"):\n",
" print(f\" Output Metadata: {result['output_metadata']}\")\n",
"\n",
" print()\n",
" else:\n",
" print(f\"Error getting results: {response.status_code} - {response.text}\")"
]
},
{
"cell_type": "markdown",
"id": "cell-22",
"metadata": {},
"source": [
"## Optional: List All Batch Jobs\n",
"\n",
"You can also list all batch jobs in your project to see the history of batch processing operations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-23",
"metadata": {},
"outputs": [],
"source": [
"# List all parse jobs in the project\n",
"response = httpx.get(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/beta/batch-processing\",\n",
" headers=headers,\n",
" params={\"project_id\": project_id, \"job_type\": \"parse\", \"limit\": 10},\n",
" timeout=60.0,\n",
")\n",
"\n",
"if response.status_code == 200:\n",
" jobs_response = response.json()\n",
"\n",
" print(f\"\\n📊 Recent Batch Parse Jobs ({jobs_response['total_size']} total)\")\n",
" print(f\"{'='*80}\\n\")\n",
"\n",
" for job in jobs_response[\"items\"]:\n",
" status_emoji = (\n",
" \"✅\"\n",
" if job[\"status\"] == \"completed\"\n",
" else \"❌\"\n",
" if job[\"status\"] == \"failed\"\n",
" else \"⏳\"\n",
" )\n",
" print(f\"{status_emoji} Job ID: {job['id']}\")\n",
" print(f\" Status: {job['status']}\")\n",
" print(f\" Directory: {job['directory_id']}\")\n",
" print(f\" Total Items: {job['total_items']}\")\n",
" print(f\" Completed: {job['processed_items']}\")\n",
" print(f\" Created: {job['created_at']}\")\n",
" print()\n",
"else:\n",
" print(f\"Error listing jobs: {response.status_code} - {response.text}\")"
]
},
{
"cell_type": "markdown",
"id": "uug7591rkq",
"metadata": {},
"source": [
"## Step 7: Retrieve Parsed Text Results\n",
"\n",
"Once the batch job is complete, each BatchJobItem will have a `job_id` field that maps to a parse job ID. We can use this ID with the standard parse client methods to fetch the actual parsed text results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "vpp0vxtc0y",
"metadata": {},
"outputs": [],
"source": [
"# Get all completed items and their job IDs\n",
"completed_items = [\n",
" item for item in items_response[\"items\"] if item[\"status\"] == \"completed\"\n",
"]\n",
"\n",
"print(f\"📄 Found {len(completed_items)} completed items\\n\")\n",
"print(f\"{'='*80}\\n\")\n",
"\n",
"# Display the job_id for each completed item\n",
"for item in completed_items:\n",
" print(f\"📝 {item['item_name']}\")\n",
" print(f\" Item ID: {item['item_id']}\")\n",
" print(f\" Parse Job ID: {item['job_id']}\")\n",
" print()"
]
},
{
"cell_type": "markdown",
"id": "4gck6hwpnl6",
"metadata": {},
"source": [
"### Fetch Parsed Text for a Specific Document\n",
"\n",
"Now let's use the `job_id` to retrieve the actual parsed text content using the parse client methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "g191kvgxxvk",
"metadata": {},
"outputs": [],
"source": [
"# Get the parsed text for the first completed item\n",
"if completed_items:\n",
" first_completed = completed_items[0]\n",
"\n",
" print(f\"📖 Retrieving parsed text for: {first_completed['item_name']}\")\n",
" print(f\" Using Parse Job ID: {first_completed['job_id']}\\n\")\n",
" print(f\"{'='*80}\\n\")\n",
"\n",
" # Use the job_id to fetch the parse result\n",
" response = httpx.get(\n",
" f\"{LLAMA_CLOUD_BASE_URL}/api/v1/parsing/job/{first_completed['job_id']}/result/text\",\n",
" headers=headers,\n",
" params={\"project_id\": project_id},\n",
" timeout=60.0,\n",
" )\n",
"\n",
" if response.status_code == 200:\n",
" parse_result = response.text\n",
"\n",
" print(f\"✅ Retrieved parsed text ({len(parse_result)} characters)\\n\")\n",
"\n",
" # Display first 1000 characters as a preview\n",
" print(\"Preview (first 1000 characters):\")\n",
" print(\"-\" * 80)\n",
" print(parse_result[:1000])\n",
" print(\"-\" * 80)\n",
"\n",
" if len(parse_result) > 1000:\n",
" print(f\"\\n... and {len(parse_result) - 1000} more characters\")\n",
" else:\n",
" print(\n",
" f\"Error retrieving parse result: {response.status_code} - {response.text}\"\n",
" )\n",
"else:\n",
" print(\"⚠️ No completed items found to retrieve results from\")"
]
},
{
"cell_type": "markdown",
"id": "2olccb4l8fj",
"metadata": {},
"source": [
"### Retrieve Parsed Results in Other Formats\n",
"\n",
"You can also retrieve the parsed results in JSON or Markdown format using different client methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "lcqsfxiw0sr",
"metadata": {},
"outputs": [],
"source": [
"if completed_items:\n",
"    first_completed = completed_items[0]\n",
"\n",
"    print(\n",
"        f\"📋 Retrieving parse results in different formats for: {first_completed['item_name']}\\n\"\n",
"    )\n",
"\n",
"    # Get as JSON (includes structured data with pages, images, etc.)\n",
"    print(\"1️⃣ Retrieving as JSON...\")\n",
"    response = httpx.get(\n",
"        f\"{LLAMA_CLOUD_BASE_URL}/api/v1/parsing/job/{first_completed['job_id']}/result/json\",\n",
"        headers=headers,\n",
"        params={\"project_id\": project_id},\n",
"        timeout=60.0,\n",
"    )\n",
"\n",
"    if response.status_code == 200:\n",
"        json_result = response.json()\n",
"        print(f\" ✅ JSON result with {len(json_result['pages'])} pages\")\n",
"        print(f\" Keys: {list(json_result.keys())}\\n\")\n",
"    else:\n",
"        print(f\" Error: {response.status_code}\\n\")\n",
"\n",
"    # Get as Markdown\n",
"    print(\"2️⃣ Retrieving as Markdown...\")\n",
"    response = httpx.get(\n",
"        f\"{LLAMA_CLOUD_BASE_URL}/api/v1/parsing/job/{first_completed['job_id']}/result/markdown\",\n",
"        headers=headers,\n",
"        params={\"project_id\": project_id},\n",
"        timeout=60.0,\n",
"    )\n",
"\n",
"    if response.status_code == 200:\n",
"        markdown_result = response.text\n",
"        print(f\" ✅ Markdown result ({len(markdown_result)} characters)\\n\")\n",
"\n",
"        # Display markdown preview\n",
"        print(\"Markdown Preview (first 500 characters):\")\n",
"        print(\"-\" * 80)\n",
"        print(markdown_result[:500])\n",
"        print(\"-\" * 80)\n",
"\n",
"        if len(markdown_result) > 500:\n",
"            print(f\"\\n... and {len(markdown_result) - 500} more characters\")\n",
"    else:\n",
"        print(f\" Error: {response.status_code}\")\n",
"else:\n",
"    print(\"⚠️ No completed items found to retrieve results from\")"
]
},
{
"cell_type": "markdown",
"id": "lr61wqkfq3",
"metadata": {},
"source": [
"### Batch Process All Parsed Results\n",
"\n",
"You can also loop through all completed items to retrieve and process all the parsed results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "kltydf9xzkl",
"metadata": {},
"outputs": [],
"source": [
"# Process all completed items\n",
"print(f\"🔄 Processing all {len(completed_items)} completed items...\\n\")\n",
"print(f\"{'='*80}\\n\")\n",
"\n",
"# Maps item name -> parsed text (or the error seen while fetching it)\n",
"all_results = {}\n",
"\n",
"for item in completed_items:\n",
"    print(f\"📄 Processing: {item['item_name']}\")\n",
"    print(f\" Parse Job ID: {item['job_id']}\")\n",
"\n",
"    try:\n",
"        # Retrieve the parsed text for this item\n",
"        response = httpx.get(\n",
"            f\"{LLAMA_CLOUD_BASE_URL}/api/v1/parsing/job/{item['job_id']}/result/text\",\n",
"            headers=headers,\n",
"            params={\"project_id\": project_id},\n",
"            timeout=60.0,\n",
"        )\n",
"\n",
"        if response.status_code == 200:\n",
"            parsed_text = response.text\n",
"\n",
"            all_results[item[\"item_name\"]] = {\n",
"                \"job_id\": item[\"job_id\"],\n",
"                \"text\": parsed_text,\n",
"                \"length\": len(parsed_text),\n",
"            }\n",
"\n",
"            print(f\" ✅ Retrieved {len(parsed_text)} characters\")\n",
"        else:\n",
"            all_results[item[\"item_name\"]] = {\n",
"                \"job_id\": item[\"job_id\"],\n",
"                \"error\": f\"HTTP {response.status_code}\",\n",
"            }\n",
"            print(f\" ❌ Error: HTTP {response.status_code}\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\" ❌ Error: {str(e)}\")\n",
"        all_results[item[\"item_name\"]] = {\"job_id\": item[\"job_id\"], \"error\": str(e)}\n",
"\n",
"    print()\n",
"\n",
"print(f\"{'='*80}\")\n",
"print(f\"\\n✅ Processed {len(all_results)} items\")\n",
"print(\"\\nSummary:\")\n",
"for name, result in all_results.items():\n",
"    if \"error\" in result:\n",
"        print(f\" ❌ {name}: Error - {result['error']}\")\n",
"    else:\n",
"        print(f\" ✅ {name}: {result['length']:,} characters\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
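The notebook above builds the same result URL by hand in three cells and records successes and failures in one dictionary. As a rough sketch only, the two steps could be factored out as below. The helper names `build_result_url` and `split_results` are hypothetical and do not come from the notebook or any SDK; the URL path and the `error` / `length` keys are copied from the cells above.

```python
from typing import Dict, List, Tuple

# Result formats requested in the notebook cells above.
RESULT_FORMATS = ("text", "markdown", "json")


def build_result_url(base_url: str, job_id: str, result_format: str) -> str:
    """Build the parse-result URL used in the notebook (hypothetical helper)."""
    if result_format not in RESULT_FORMATS:
        raise ValueError(f"unsupported result format: {result_format!r}")
    return f"{base_url.rstrip('/')}/api/v1/parsing/job/{job_id}/result/{result_format}"


def split_results(all_results: Dict[str, dict]) -> Tuple[List[str], Dict[str, str]]:
    """Separate item names that succeeded from those that recorded an error."""
    succeeded = [name for name, res in all_results.items() if "error" not in res]
    failed = {name: res["error"] for name, res in all_results.items() if "error" in res}
    return succeeded, failed
```

With these in place, the batch loop would only need to call `build_result_url(LLAMA_CLOUD_BASE_URL, item["job_id"], "text")` and hand the finished `all_results` dictionary to `split_results` for the summary.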
File diff suppressed because one or more lines are too long
-138
View File
@@ -1,138 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using the Raw API\n",
"\n",
"This notebook walks through how to use the raw API to upload a file and poll for the parsed result."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-02-02 11:11:39-- https://arxiv.org/pdf/1706.03762.pdf\n",
"Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.3.42, 151.101.67.42, ...\n",
"Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 2215244 (2.1M) [application/pdf]\n",
"Saving to: ./attention.pdf\n",
"\n",
"./attention.pdf 100%[===================>] 2.11M --.-KB/s in 0.08s \n",
"\n",
"2024-02-02 11:11:39 (27.3 MB/s) - ./attention.pdf saved [2215244/2215244]\n",
"\n"
]
}
],
"source": [
"!wget \"https://arxiv.org/pdf/1706.03762.pdf\" -O \"./attention.pdf\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"api_key = \"llx-...\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import mimetypes\n",
"import requests\n",
"import time\n",
"\n",
"headers = {\"Authorization\": f\"Bearer {api_key}\"}\n",
"file_path = \"./attention.pdf\"\n",
"base_url = \"https://api.cloud.llamaindex.ai/api/parsing\"\n",
"\n",
"with open(file_path, \"rb\") as f:\n",
"    mime_type = mimetypes.guess_type(file_path)[0]\n",
"    files = {\"file\": (f.name, f, mime_type)}\n",
"\n",
"    # send the request, upload the file\n",
"    url = f\"{base_url}/upload\"\n",
"    response = requests.post(url, headers=headers, files=files)\n",
"\n",
"response.raise_for_status()\n",
"# get the job id for the result_url\n",
"job_id = response.json()[\"id\"]\n",
"result_type = \"text\" # or \"markdown\"\n",
"result_url = f\"{base_url}/job/{job_id}/result/{result_type}\"\n",
"\n",
"# check for the result until it's ready\n",
"while True:\n",
"    response = requests.get(result_url, headers=headers)\n",
"    if response.status_code == 200:\n",
"        break\n",
"\n",
"    time.sleep(2)\n",
"\n",
"# download the result\n",
"result = response.json()\n",
"output = result[result_type]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Provided proper attribution is provided, Google hereby grants permission to\n",
" reproduce the tables and figures in this paper solely for use in journalistic or\n",
" scholarly works.\n",
" Attention Is All You Need\n",
"arXiv:1706.03762v7 [cs.CL] 2 Aug 2023\n",
" Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit\n",
" Google Brain Google Brain Google Research Google Research\n",
" avaswani@google.com noam@google.com nikip@google.com usz@google.com\n",
" Llion Jones Aidan N. Gomez † Łukasz Kaiser\n",
" Google Research University of Toronto \n"
]
}
],
"source": [
"print(output[:1000])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llama-parse-aNC435Vv-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
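The polling loop in the raw API notebook above retries every two seconds with no upper bound, so it never returns if the job fails. A minimal sketch of a bounded alternative follows, assuming the caller supplies a `fetch` callable that returns `None` while the result is not ready. The name `poll_until_ready` and all of its parameters are hypothetical, not part of any LlamaParse API.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[str]],
    timeout_s: float = 120.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it returns a non-None result or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        result = fetch()
        if result is not None:
            return result
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"result not ready after {timeout_s} seconds")
        sleep(delay)
        # Exponential backoff, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)


# Possible wiring to the notebook's variables (assumes `requests`, `result_url`,
# `headers` and `result_type` are defined as in the cell above):
#
#   def fetch():
#       response = requests.get(result_url, headers=headers)
#       return response.json()[result_type] if response.status_code == 200 else None
#
#   output = poll_until_ready(fetch)
```

Passing `sleep` as a parameter is a design choice made for this sketch: it lets the backoff schedule be checked without waiting in real time.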
-191
View File
@@ -1,191 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LlamaParse Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-index llama-parse"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-02-02 11:10:10-- https://arxiv.org/pdf/1706.03762.pdf\n",
"Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.3.42, 151.101.67.42, ...\n",
"Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 2215244 (2.1M) [application/pdf]\n",
"Saving to: ./attention.pdf\n",
"\n",
"./attention.pdf 100%[===================>] 2.11M --.-KB/s in 0.08s \n",
"\n",
"2024-02-02 11:10:10 (25.9 MB/s) - ./attention.pdf saved [2215244/2215244]\n",
"\n"
]
}
],
"source": [
"!wget \"https://arxiv.org/pdf/1706.03762.pdf\" -O \"./attention.pdf\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# llama-parse is async-first; running the sync code in a notebook requires nest_asyncio\n",
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()\n",
"\n",
"import os\n",
"os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-...\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Started parsing the file under job_id dd0b8e31-0c09-4497-b78a-cc1c92f1d6cf\n"
]
}
],
"source": [
"from llama_parse import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(\"./attention.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ad\n",
"relying entirely on an attention mechanism to draw global dependencies between input and output.\n",
"The Transformer allows for significantly more parallelization and can reach a new state of the art in\n",
"translation quality after being trained for as little as twelve hours on eight P100 GPUs.\n",
"2 Background\n",
"The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU\n",
"[16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building\n",
"block, computing hidden representations in parallel for all input and output positions. In these models,\n",
"the number of operations required to relate signals from two arbitrary input or output positions grows\n",
"in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes\n",
"it more difficult to learn dependencies between distant positions [12]. In the Transformer this is\n",
"reduced to a constant number of operations, albeit at the cost of reduced effective res\n"
]
}
],
"source": [
"print(documents[0].text[6000:7000])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Started parsing the file under job_id d4531453-1bbb-48c4-8324-ae9fea9f2fa2\n"
]
}
],
"source": [
"from llama_parse import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./attention.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ction describes the training regime for our models.\n",
"\n",
"##### Training Data and Batching\n",
"\n",
"We trained on the standard WMT 2014 English-German dataset consisting of about 4.5 million\n",
"sentence pairs. Sentences were encoded using byte-pair encoding [3], which has a shared source-\n",
"target vocabulary of about 37000 tokens. For English-French, we used the significantly larger WMT\n",
"2014 English-French dataset consisting of 36M sentences and split tokens into a 32000 word-piece\n",
"vocabulary [38]. Sentence pairs were batched together by approximate sequence length. Each training\n",
"batch contained a set of sentence pairs containing approximately 25000 source tokens and 25000\n",
"target tokens.\n",
"\n",
"##### Hardware and Schedule\n",
"\n",
"We trained our models on one machine with 8 NVIDIA P100 GPUs. For our base models using\n",
"the hyperparameters described throughout the paper, each training step took about 0.4 seconds. We\n",
"trained the base models for a total of 100,000 steps or 12 hours. For our big models,(described on the\n",
"bo...\n"
]
}
],
"source": [
"print(documents[0].text[20000:21000] + \"...\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "llama-parse-aNC435Vv-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,10 @@
# Financial Modeling Assumptions
Discount Rate: 8%
Terminal Growth Rate: 2%
Tax Rate: 25%
Revenue Growth (Years 1-5): 10% per annum
Revenue Growth (Years 6-10): 5% per annum
Capital Expenditures as % of Revenue: 7%
Working Capital Assumption: 3% of Revenue
Depreciation Rate: 10% per annum
Cost of Capital Assumption: 8%
@@ -0,0 +1 @@
sec_form_4_dump.json