delta-rs

mirror of https://github.com/langchain-ai/delta-rs.git synced 2026-07-01 20:34:35 -04:00

Author	SHA1	Message	Date
Sam Wright	dc9c51b878	doc: content and formatting improvements to API docs Signed-off-by: Sam Wright <samuel@plaindocs.com>	2026-03-24 09:19:46 -04:00
Ethan Urbanski	08222ea17b	fix: preserve recent tombstones in full vacuum Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-03-21 09:26:16 -07:00
R. Tyler Croy	7437c1e9d2	fix: rely on u64 for all references to delta versions There never has been a valid negative version in the Delta protocol. I'm not sure why this was even here as i64. Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>	2026-03-21 08:40:41 -07:00
Ethan Urbanski	481fddeffd	chore: tighten duplicate argument validation in create() and vacuum() (#4299) # Description Tighten the Python compatibility handling in `DeltaTable.create()` and `DeltaTable.vacuum()` ensuring duplicate values are rejected when legacy positional arguments are mixed with keywords. Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-03-20 10:31:59 +01:00
Bhavana Sundar	c85075e427	fix(python): make commit control args keyword-only across mutating APIs (#4283) # Description Added `deprecate_positional_commit_args` to `_util.py` as a helper that preserves legacy positional behavior per method, emits a DeprecationWarning, rejects invalid usage, and normalizes to canonical commit_properties-first order. Wired into all public mutating APIs across `table.py`, `writer/convert_to.py`, and `transaction.py`. `write_deltalake` params reordered to canonical order (already keyword-only, no shim needed). `restore` left untouched (already keyword-only). Unit tests added in `tests/test_util.py`. # Related Issue(s) - closes #4252 Notes: - `_internal.pyi` stubs are already in canonical order - `create()` and `vacuum()` handle legacy positional args inline (they had extra trailing params beyond the commit args, so the shared helper is bypassed for those two) Follow-up PR will make all APIs keyword-only and remove the compatibility path AI disclosure: I used Claude as a coding assistant to help map out affected methods. I reviewed every modification, ran the full test suite locally, and understand each change made. --------- Signed-off-by: Bhavana Sundar <bhavana7899@gmail.com> Co-authored-by: Ethan Urbanski <ethanurbanski@gmail.com>	2026-03-18 23:49:12 -04:00
Khalid Mammadov	87405d48c7	fix(tests): replace tempfile with tmp_path in date partitioned table test (#4297) # Description This test fails locally when run repeatedly. `tempfile.gettempdir()` always returns the same path, which causes the test to fail on the second run. The existing `tmp_path` fixture returns a unique path for each test. # Related Issue(s) # Documentation NA Signed-off-by: Khalid Mammadov <khalidmammadov9@gmail.com>	2026-03-18 18:32:41 +00:00
Ethan Urbanski	f15752b611	fix merge schema evolution Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-03-17 17:04:25 -04:00
Ethan Urbanski	db87e212c5	fix: make keep_versions order independent Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-03-16 18:12:30 -04:00
Ion Koutsouris	5871decd6b	chore: bump version (#4277) Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2026-03-12 11:46:10 -04:00
Abhi Agarwal	c3fbf38e99	Fix python type signatures Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>	2026-03-11 07:59:32 -07:00
Abhi Agarwal	16f9048c6d	type signature Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>	2026-03-11 07:59:32 -07:00
Abhi Agarwal	c8ce0d0cce	Make write with `None` snapshot write to default size Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>	2026-03-11 07:59:32 -07:00
Abhi Agarwal	0126a0126e	Move to u64 to represent file sizes Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>	2026-03-11 07:59:32 -07:00
Abhi Agarwal	7bd28f5370	Consolidate `target_file_size` and allow unbounded writes Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>	2026-03-11 07:59:32 -07:00
Florian Valeye	49865edaf8	fix: add a central arrow delta type normalization (#4254) # Description When we write data with unsupported Arrow types (`Date64`, `Timestamp(ns)`), writes either fail with confusing kernel errors or produce tables with incompatible schemas. The goal was to centralize the normalization (including the opportunity to refactor `convert_to_delta`) into `normalize_for_delta` to convert unsupported Arrow types to their Delta-compatible equivalents before writing. This matches the Delta protocol specification and is consistent with how Spark handles these types. # Related Issue(s) - Fixes #3877 - Fixes #1721 ## Types conversion (Arrow type to Delta-compatible type) - `Date64` to `Date32` - `Timestamp(s/ms/ns, tz)` to `Timestamp(us, tz)` Nested types (Struct, List, FixedSizeList, Map) are normalized recursively. --------- Signed-off-by: Florian Valeye <florian.valeye@gmail.com>	2026-03-10 14:46:48 +00:00
Ethan Urbanski	675936ae44	fix: coerce decimal literals in target subset filters (#4267) # Description Found this while working on #4266. Merge target subset filters can retain decimal precision/scale from the source expression instead of the target schema. For example `decimal(4, 1)` when the column is `decimal(6, 1)`. The newer file skipping path rejects the mismatch. The fix normalizes `target_subset_filter` against the target schema before simplification and extends literal coercion to handle between bounds. Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-03-10 10:03:02 +01:00
R. Tyler Croy	e6ffb54e84	fix: create_write_transaction works again, now with 100% more coverage During some experimentation somewhere along the line the actual implementation was commented out. This wasn't caught in CI it seems because we have no test coverage of the function! Welp, now we do! Fixes #4126 Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>	2026-03-08 13:04:02 -07:00
R. Tyler Croy	9a1cff93fc	WIP: testing something in CI Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>	2026-03-07 10:36:43 -08:00
Manish Sogiyawar	7a378c210b	feat(python): add post_commithook_properties to alter metadata apis (#4249) # Description Adds `post_commithook_properties` support to the missing `table.alter` metadata methods, so cleanup and checkpoint creation behavior can be controlled consistently via `cleanup_expired_logs` and `create_checkpoint` respectively ## Validation - `make tests` - `cargo check -p deltalake-python` - `cargo fmt --all --check` Signed-off-by: vsmanish1772 <smanish1772@gmail.com>	2026-03-06 11:58:51 +00:00
anshulbaliga7	f0bbccf45f	docs: fix several nit issues in docs	2026-03-01 09:50:59 -08:00
Avinash Changrani	dd3b1aabe1	feat: use batched delete_stream in delete_dir - Refactor delete_dir to delete objects via ObjectStore::delete_stream for improved performance. - Add tests for nested deletes, sibling-prefix safety, and missing-prefix no-op behavior. Signed-off-by: Avinash Changrani <avinashchangrani99@gmail.com>	2026-03-01 09:01:04 -08:00
Ethan Urbanski	8e227bcc78	fix: pad DV keep masks to numRecords (#4236) # Description Fixes #4235 - `DeltaTable.deletion_vectors()` returned truncated selection vectors when the highest deleted row index was below the file's total row count. Kernel returns a sparse DV mask (up to highest deleted index). The api returned that raw mask directly, which could be shorter than numRecords. What Changed - Plumb `num_records: Option<u64>` through scan replay DV side channel - Pad short masks with `true` up to `numRecords` at the API boundary - Error if mask exceeds `numRecords` or `numRecords` is missing This is now a stricter contract with `deletion_vectors()` now failing if a DV file is missing `numRecords` instead of returning a truncated mask. Upstream Kernel Note If kernel can return full length selection vectors, this normalization will not be needed. Will look into if an upstream feature on delta-kernel is welcomed for a length aware selection vector api # Related Issue(s) - #4235 Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-02-27 16:45:05 +01:00
Ethan Urbanski	4b6fd8f05e	fix: delete partition fallback batching and add action coalescing (#4211) # Description some follow up/hardening changes from the partition only delete work done recently DELETE partition only fallback and add action evaluation could materialize all actions into a single batch, which breaks on large tables Changes: - DELETE fallback uses batched partition metadata instead of single batch materialization - Shared partition metadata MemTable builder across scan and DELETE paths - Snapshot fast path for partition only column projection - add_actions coalescing streams directly into BatchCoalescer instead of pre-collecting - Python docs note get_add_actions() return type migration --------- Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-02-26 18:02:57 +01:00
Khalid Mammadov	d364f62619	feat: vacuum lite mode to avoid storage listing (#4227) # Description Vacuum lite mode only deletes stale tombstone files, but the current implementation does a full file listing regardless of lite or full mode. This change avoids listing storage in lite mode and simplifies and clarifies the logic by separating the concerns of each mode. # Related Issue(s) - closes [#4228](https://github.com/delta-io/delta-rs/issues/4228) # Documentation Added test cases to test and clarify intent --------- Signed-off-by: Khalid Mammadov <khalidmammadov9@gmail.com> Signed-off-by: R. Tyler Croy <rtyler@brokenco.de> Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>	2026-02-25 08:50:42 +01:00
Ion Koutsouris	65b50e0405	feat: multi part upload in compaction log Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2026-02-24 06:19:00 -08:00
Ion Koutsouris	56b4a00373	chore: resolve feedback Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2026-02-24 06:19:00 -08:00
Ion Koutsouris	8323ddf63f	feat: log compaction Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2026-02-24 06:19:00 -08:00
Thomas Frederik Hoeck	e7b7885140	feat: added disk spilling for merge (#4219) Added disk spilling for `merge`, similar to the optimize functions, to allow for merges which touch many files in the target # Description I have added functionality for spilling to disk similar to how it works in the optimize functions. If nothing is provided it works as before. I have added tests similar to those for the other spill functions. I have tested my cases in #4217, which now successfully complete the merge without OOM. I have used AI (Opus 4.6) for getting an overview of the project structure and for writing most of the code. I have reviewed and verified the code myself. Work done: - create `create_session_state_with_spill_config` (which is just a move and rename of `create_session_state_for_optimize`) - use `create_session_state_with_spill_config` in existing optimize functions - use `create_session_state_with_spill_config` for `merge` # Related Issue(s) Closes #4217 --------- Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com> Co-authored-by: Thomas Frederik Hoeck <tfh@norden.com>	2026-02-24 10:53:53 +00:00
Ethan Urbanski	1724f89548	chore: python datafusion 52 upgrade (#4226) # Description Upgrades the Python DataFusion path to 52.x and makes the integration lane blocking in CI --------- Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-02-24 01:19:58 +01:00
vsmanish1772	cb80248f39	get add action return arrow table Signed-off-by: vsmanish1772 <smanish1772@gmail.com>	2026-02-16 07:13:40 -08:00
Khalid Mammadov	4ac9f71e2c	fix: clarify vacuum command documentation for DeltaTable (#4196) # Description Improve documentation for vacuum command and specifically for full=False case. # Related Issue(s) Related to this discussion: https://github.com/delta-io/delta-rs/issues/3644 # Documentation Updated Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-13 09:11:17 +00:00
Ethan Urbanski	f157cdca8d	fix: preserve generated column metadata during schema merge (#4191) # Description Fixes a regression where `schema_mode="merge"` appends strip `delta.generationExpression` from the table schema. Once lost, subsequent writes compute NULL instead of generated values. # Related Issue(s) - closes #4186 --------- Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-02-12 16:04:04 +09:00
Ion Koutsouris	0d088c6fe0	chore: Bump version from 1.4.1 to 1.4.2 (#4182) Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2026-02-09 05:30:38 -08:00
Robert Pack	c4f9d42423	refactor: remove table level stats on TableProvider (#4174) # Description For delta tables, table level statistics are not quite as useful as file level stats. However we do go through quite some trouble to expose table level stats, which also assumes we always have a materialised log to expose these stats. As such, it hinders our migration to a lazy architecture. In fact the datafusion native file-based table implementations (parquet, json, csv, ...) only expose stats on the execution plan level, and not on the table provider level. In this PR we therefore remove the table level stats from the current table provider and remove the associated code. Signed-off-by: Robert Pack <robstar.pack@gmail.com>	2026-02-09 02:31:51 +01:00
khalidmammadov	bbce1a1c08	add path existing check test case Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
khalidmammadov	ebc09fd885	remove line Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
khalidmammadov	306239de10	remove docs Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
khalidmammadov	81d102eee4	Avoid raising exceptions when path does not exist Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
khalidmammadov	facb08b3e8	Add test case for negative case and simplify error case Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
khalidmammadov	2e4b1aae2e	revert unnecessary change Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
khalidmammadov	323fc80070	fix: stopping is_deltatable path creation for invalid path Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>	2026-02-08 12:19:44 -08:00
Ethan Urbanski	ecc8beb20b	feat: expose DV metadata and payloads as Arrow streams (#4168) # Description Adds `DeltaTable.deletion_vectors() -> RecordBatchReader` returning one row per data file with a deletion vector. Schema: `filepath: utf8`, `selection_vector: list[bool]` (true = keep, false = deleted). Reuses the existing DataFusion replay path via `replay_deletion_vectors(...)`. Results are deterministic and sorted by filepath. Core changes: - `DeletionVectorSelection` struct, `DeltaScan::deletion_vectors()`, shared `scan_metadata_stream()` helper to avoid drift between scan paths - Replaced internal DV `expect(...)` with typed error propagation Python binding: - `cloned_table_and_state()` to avoid TOCTOU on table + snapshot - Chunked Arrow batch output with non-nullable list items - Preserves `without_files` guard behavior # Related Issue(s) - Closes #4159 cc @ion-elgreco --------- Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-02-07 17:13:17 +01:00
Stephen Carman	9261e3982e	Add the unity catalog dependency back Signed-off-by: Stephen Carman <shcarman@gmail.com>	2026-02-05 10:39:20 -08:00
Stephen Carman	3244f73671	Add the unity catalog dependency back Signed-off-by: Stephen Carman <shcarman@gmail.com>	2026-02-05 10:39:20 -08:00
Ethan Urbanski	f7a4bc2a3c	fix: preserve kernel column segments (#4164) # Description Fix kernel to DataFusion column expression conversion to preserve exact `ColumnName` path segments. Fixes #4082 Fix: Use DataFusion `ident(...)` for the base column segment when converting `Expression::Column`, then `.field(...)` for remaining path segments. Preserves exact segment names, avoids SQL style normalization. # Related Issue(s) - closes #4082 --------- Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-02-05 04:04:09 +01:00
Ion Koutsouris	c095787589	fix: nested runtimes in stream adapter (#4148) # Description - closes https://github.com/delta-io/delta-rs/issues/4147 Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2026-01-30 14:58:25 +00:00
Ethan Urbanski	5121983552	fix(python): guard DataFusion FFI export on datafusion major version (#4142) # Description - Add a runtime version check in __datafusion_table_provider__ to prevent FFI ABI mismatch segfaults - Block capsule export when installed datafusion major != 52 - Provide actionable error text with QueryBuilder workaround Changes: - lib.rs: add REQUIRED_DATAFUSION_PY_MAJOR, datafusion_python_version(), guard at method start - test_datafusion.py: add incompatible version and not installed tests Note: This guard is a temporary safety net to prevent segfaults until DataFusion 52 Python wheels are available on PyPI. Once wheels land, users can install datafusion==52.* and use SessionContext registration normally. # Related Issue(s) - #4135 Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>	2026-01-28 23:14:57 +01:00
R. Tyler Croy	6ee66e2a7c	chore: upgrade python version for a patch release Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>	2026-01-28 07:00:33 -08:00
R. Tyler Croy	2256561540	chore: update Python formatting for rust 2024 Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>	2026-01-26 07:20:53 -08:00
R. Tyler Croy	957f5267bd	chore: upgrade python version for the next release This change also brings the python library onto rust 2024 edition Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>	2026-01-26 07:20:53 -08:00
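
Commit 8e227bcc78 in the list above describes padding short deletion-vector keep masks with `true` up to `numRecords`, and erroring when the mask exceeds `numRecords` or `numRecords` is missing. A minimal Python sketch of that rule follows, assuming the mask is a plain list of booleans. The function name `pad_keep_mask` is hypothetical and is not part of the delta-rs API; the actual change is described as living in the scan replay path.

```python
from typing import List, Optional


def pad_keep_mask(mask: List[bool], num_records: Optional[int]) -> List[bool]:
    """Pad a sparse keep mask (True = keep, False = deleted) to num_records.

    Hypothetical helper illustrating the rule in the commit message;
    it is not the delta-rs implementation.
    """
    if num_records is None:
        # The commit describes a stricter contract: a missing numRecords is an error.
        raise ValueError("numRecords is missing; cannot size the selection vector")
    if len(mask) > num_records:
        # A mask longer than the file's row count is also rejected.
        raise ValueError(
            f"mask length {len(mask)} exceeds numRecords {num_records}"
        )
    # Rows past the highest deleted index were never deleted, so they are kept.
    return mask + [True] * (num_records - len(mask))


# A sparse mask that stops at the highest deleted index (row 1) in a 5-row file.
padded = pad_keep_mask([True, False], 5)
print(padded)
```

Padding with `True` is safe here because the sparse mask only needs to extend as far as the highest deleted row; every row after that is, by construction, still live.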

893 Commits