893 Commits

Author SHA1 Message Date
Sam Wright dc9c51b878 doc: content and formatting improvements to API docs
Signed-off-by: Sam Wright <samuel@plaindocs.com>
2026-03-24 09:19:46 -04:00
Ethan Urbanski 08222ea17b fix: preserve recent tombstones in full vacuum
Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-03-21 09:26:16 -07:00
R. Tyler Croy 7437c1e9d2 fix: rely on u64 for all references to delta versions
There never has been a valid negative version in the Delta protocol. I'm
not sure why this was even here as i64.

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
2026-03-21 08:40:41 -07:00
Ethan Urbanski 481fddeffd chore: tighten duplicate argument validation in create() and vacuum() (#4299)
# Description
Tighten the Python compatibility handling in `DeltaTable.create()` and
`DeltaTable.vacuum()` ensuring duplicate values are rejected when legacy
positional arguments are mixed with keywords.

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-03-20 10:31:59 +01:00
Bhavana Sundar c85075e427 fix(python): make commit control args keyword-only across mutating APIs (#4283)
# Description

Added `deprecate_positional_commit_args` to `_util.py` as a helper that
preserves legacy positional behavior per method, emits a
DeprecationWarning, rejects invalid usage, and normalizes to canonical
commit_properties-first order.

Wired into all public mutating APIs across `table.py`,
`writer/convert_to.py`, and `transaction.py`. `write_deltalake` params
reordered to canonical order (already keyword-only, no shim needed).
`restore` left untouched (already keyword-only).
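
The compatibility shim described above can be sketched roughly as follows. This is only an illustration, not the helper added in `_util.py`: the decorator name `deprecate_positional`, the `Table` class, and the legacy positional order are all assumptions made for the example.

```python
import functools
import warnings


def deprecate_positional(legacy_names):
    """Hypothetical decorator: accept legacy positional args, warn, and
    forward everything to a keyword-only function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if len(args) > len(legacy_names):
                raise TypeError(
                    f"{func.__name__}() takes at most {len(legacy_names)} positional arguments"
                )
            if args:
                warnings.warn(
                    f"Passing {legacy_names[:len(args)]} positionally is deprecated; use keywords.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            for name, value in zip(legacy_names, args):
                # Reject a value supplied both positionally and by keyword.
                if name in kwargs:
                    raise TypeError(
                        f"{func.__name__}() got multiple values for argument '{name}'"
                    )
                kwargs[name] = value
            return func(self, **kwargs)

        return wrapper

    return decorator


class Table:
    # The legacy positional order below is assumed purely for illustration.
    @deprecate_positional(("post_commithook_properties", "commit_properties"))
    def delete(self, *, commit_properties=None, post_commithook_properties=None):
        return {
            "commit_properties": commit_properties,
            "post_commithook_properties": post_commithook_properties,
        }
```

Calling `Table().delete("hooks", "props")` in this sketch emits a `DeprecationWarning` and forwards both values by keyword, while mixing a positional value with the same keyword raises `TypeError`.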

Unit tests added in `tests/test_util.py`.
# Related Issue(s)

- closes #4252 

Notes:
- `_internal.pyi` stubs are already in canonical order
- `create()` and `vacuum()` handle legacy positional args inline (they
had extra trailing params beyond the commit args, so the shared helper
is bypassed for those two)
- A follow-up PR will make all APIs keyword-only and remove the
compatibility path

AI disclosure: I used Claude as a coding assistant to help map out
affected methods. I reviewed every modification, ran the full test suite
locally, and understand each change made.

---------

Signed-off-by: Bhavana Sundar <bhavana7899@gmail.com>
Co-authored-by: Ethan Urbanski <ethanurbanski@gmail.com>
2026-03-18 23:49:12 -04:00
Khalid Mammadov 87405d48c7 fix(tests): replace tempfile with tmp_path in date partitioned table test (#4297)
# Description
This test fails locally when run repeatedly.

`tempfile.gettempdir()` always returns the same path, which causes the
test to fail on the second run.
The existing `tmp_path` fixture returns a unique path for each test.
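
The difference can be seen with the standard library alone. In this small sketch, `tempfile.TemporaryDirectory` stands in for pytest's `tmp_path` fixture, which is an assumption made so the example runs without pytest.

```python
import tempfile

# gettempdir() reports the shared system temp directory, so repeated calls
# (and repeated test runs) see the same path and any files left in it.
shared_first = tempfile.gettempdir()
shared_second = tempfile.gettempdir()
shared_paths_match = shared_first == shared_second

# TemporaryDirectory() creates a fresh directory each time, similar in
# spirit to the per-test directory that tmp_path provides.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    unique_paths_differ = first != second
```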

# Related Issue(s)

# Documentation
NA

Signed-off-by: Khalid Mammadov <khalidmammadov9@gmail.com>
2026-03-18 18:32:41 +00:00
Ethan Urbanski f15752b611 fix merge schema evolution
Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-03-17 17:04:25 -04:00
Ethan Urbanski db87e212c5 fix: make keep_versions order independent
Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-03-16 18:12:30 -04:00
Ion Koutsouris 5871decd6b chore: bump version (#4277)
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
2026-03-12 11:46:10 -04:00
Abhi Agarwal c3fbf38e99 Fix python type signatures
Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>
2026-03-11 07:59:32 -07:00
Abhi Agarwal 16f9048c6d type signature
Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>
2026-03-11 07:59:32 -07:00
Abhi Agarwal c8ce0d0cce Make write with None snapshot write to default size
Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>
2026-03-11 07:59:32 -07:00
Abhi Agarwal 0126a0126e Move to u64 to represent file sizes
Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>
2026-03-11 07:59:32 -07:00
Abhi Agarwal 7bd28f5370 Consolidate target_file_size and allow unbounded writes
Signed-off-by: Abhi Agarwal <abhi@airspace-intelligence.com>
2026-03-11 07:59:32 -07:00
Florian Valeye 49865edaf8 fix: add a central arrow delta type normalization (#4254)
# Description
When we write data with unsupported Arrow types (`Date64`,
`Timestamp(ns)`), writes either fail with confusing kernel errors or
produce tables with incompatible schemas.
The goal was to centralize the normalization (including the opportunity
to refactor `convert_to_delta`) into `normalize_for_delta` to convert
unsupported Arrow types to their Delta-compatible equivalents before
writing.
This matches the Delta protocol specification and is consistent with how
Spark handles these types.

# Related Issue(s)
- Fixes #3877
- Fixes #1721

## Types conversion
| Arrow type | Delta-compatible type |
|---|---|
| `Date64` | `Date32` |
| `Timestamp(s/ms/ns, tz)` | `Timestamp(us, tz)` |

Nested types (Struct, List, FixedSizeList, Map) are normalized
recursively.
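
The recursive mapping in the table above can be sketched in plain Python. The tuple encoding of types and the function name `normalize_type` are hypothetical stand-ins chosen for the example. They are not the Arrow API or the `normalize_for_delta` implementation.

```python
def normalize_type(t):
    """Map a toy Arrow-like type description to its Delta-compatible form.

    Types are encoded as tuples, e.g. ("date64",), ("timestamp", "ns", "UTC"),
    ("list", item), ("fixed_size_list", item, size), ("map", key, value),
    ("struct", {field_name: field_type}).
    """
    kind = t[0]
    if kind == "date64":
        return ("date32",)
    if kind == "timestamp":
        _, _unit, tz = t
        return ("timestamp", "us", tz)  # every unit is normalized to microseconds
    if kind in ("list", "fixed_size_list"):
        # Keep any trailing parameters (such as the fixed size) unchanged.
        return (kind, normalize_type(t[1])) + t[2:]
    if kind == "map":
        return ("map", normalize_type(t[1]), normalize_type(t[2]))
    if kind == "struct":
        return ("struct", {name: normalize_type(ft) for name, ft in t[1].items()})
    return t  # already Delta-compatible in this sketch
```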

---------

Signed-off-by: Florian Valeye <florian.valeye@gmail.com>
2026-03-10 14:46:48 +00:00
Ethan Urbanski 675936ae44 fix: coerce decimal literals in target subset filters (#4267)
# Description

Found this while working on #4266.

Merge target subset filters can retain decimal precision/scale from the
source expression instead of the target schema, for example `decimal(4,
1)` when the column is `decimal(6, 1)`. The newer file skipping path
rejects the mismatch.

This fix normalizes `target_subset_filter` against the target schema
before simplification, and extends literal coercion to handle between
bounds.
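
As a rough illustration of the idea, the sketch below re-types a decimal literal to the target column's precision and scale, and applies the same step to both bounds of a between predicate. The function names are hypothetical and the logic is a simplification using Python's `decimal` module, not the DataFusion-based code in this PR.

```python
from decimal import Decimal


def coerce_decimal_literal(value, target_precision, target_scale):
    """Re-type a decimal literal to the target column's (precision, scale)."""
    step = Decimal(1).scaleb(-target_scale)  # scale 1 gives Decimal("0.1")
    rescaled = value.quantize(step)
    if rescaled != value:
        raise ValueError(f"{value} does not fit scale {target_scale}")
    if len(rescaled.as_tuple().digits) > target_precision:
        raise ValueError(f"{value} does not fit precision {target_precision}")
    return rescaled, (target_precision, target_scale)


def coerce_between(low, high, target_precision, target_scale):
    """Apply the same coercion to both bounds of a BETWEEN predicate."""
    return (
        coerce_decimal_literal(low, target_precision, target_scale),
        coerce_decimal_literal(high, target_precision, target_scale),
    )
```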

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-03-10 10:03:02 +01:00
R. Tyler Croy e6ffb54e84 fix: create_write_transaction works again, now with 100% more coverage
During some experimentation somewhere along the line, the actual
implementation was commented out. This wasn't caught in CI, it seems,
because we have no test coverage of the function!

Welp, now we do!

Fixes #4126

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
2026-03-08 13:04:02 -07:00
R. Tyler Croy 9a1cff93fc WIP: testing something in CI
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
2026-03-07 10:36:43 -08:00
Manish Sogiyawar 7a378c210b feat(python): add post_commithook_properties to alter metadata apis (#4249)
# Description
Adds `post_commithook_properties` support to the `table.alter` metadata
methods that were missing it, so log cleanup and checkpoint creation
can be controlled consistently via `cleanup_expired_logs` and
`create_checkpoint`, respectively.

## Validation
- `make tests`
- `cargo check -p deltalake-python`
- `cargo fmt --all --check`

Signed-off-by: vsmanish1772 <smanish1772@gmail.com>
2026-03-06 11:58:51 +00:00
anshulbaliga7 f0bbccf45f docs: fix several nit issues in docs
2026-03-01 09:50:59 -08:00
Avinash Changrani dd3b1aabe1 feat: use batched delete_stream in delete_dir
- Refactor delete_dir to delete objects via ObjectStore::delete_stream for improved performance.
- Add tests for nested deletes, sibling-prefix safety, and missing-prefix no-op behavior.

Signed-off-by: Avinash Changrani <avinashchangrani99@gmail.com>
2026-03-01 09:01:04 -08:00
Ethan Urbanski 8e227bcc78 fix: pad DV keep masks to numRecords (#4236)
# Description
Fixes #4235 - `DeltaTable.deletion_vectors()` returned truncated
selection vectors when the highest deleted row index was below the
file's total row count.

Kernel returns a sparse DV mask (up to the highest deleted index). The API
returned that raw mask directly, which could be shorter than `numRecords`.

**What Changed**
- Plumb `num_records: Option<u64>` through scan replay DV side channel
- Pad short masks with `true` up to `numRecords` at the API boundary
- Error if mask exceeds `numRecords` or `numRecords` is missing

This is a stricter contract: `deletion_vectors()` now fails if a DV
file is missing `numRecords` instead of returning a truncated mask.
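
The padding rule listed under "What Changed" can be sketched in a few lines of Python. The function name `pad_keep_mask` is hypothetical, and the real change lives in the Rust scan replay path. This only mirrors the three rules.

```python
def pad_keep_mask(mask, num_records):
    """Pad a sparse keep mask (True = keep) up to the file's row count."""
    if num_records is None:
        raise ValueError("numRecords is required to size the selection vector")
    if len(mask) > num_records:
        raise ValueError(
            f"mask has {len(mask)} entries but the file has {num_records} rows"
        )
    # Rows past the highest deleted index were never deleted, so keep them.
    return list(mask) + [True] * (num_records - len(mask))
```

For example, a mask of `[True, False]` for a file with five rows becomes `[True, False, True, True, True]`.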


**Upstream Kernel Note**
If kernel can return full-length selection vectors, this normalization
will not be needed. I will look into whether an upstream delta-kernel
feature for a length-aware selection vector API would be welcome.

# Related Issue(s)
- #4235  

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-02-27 16:45:05 +01:00
Ethan Urbanski 4b6fd8f05e fix: delete partition fallback batching and add action coalescing (#4211)
# Description
Follow-up hardening changes from the recent partition-only delete
work.

The DELETE partition-only fallback and add action evaluation could
materialize all actions into a single batch, which breaks on large
tables.

Changes:
- DELETE fallback uses batched partition metadata instead of single
batch materialization
- Shared partition metadata MemTable builder across scan and DELETE
paths
- Snapshot fast path for partition only column projection
- add_actions coalescing streams directly into BatchCoalescer instead of
pre-collecting
- Python docs note get_add_actions() return type migration
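
The streaming coalescing idea in the list above can be illustrated with a small generator. This is a plain-Python sketch under the assumption that batches are lists of rows. It is not the Arrow `BatchCoalescer`, and the name `coalesce` is made up for the example.

```python
def coalesce(batches, target_rows):
    """Re-chunk a stream of small batches into batches of target_rows rows
    without collecting the whole stream first."""
    buffer = []
    for batch in batches:
        buffer.extend(batch)
        # Emit full batches as soon as enough rows have accumulated.
        while len(buffer) >= target_rows:
            yield buffer[:target_rows]
            buffer = buffer[target_rows:]
    if buffer:
        yield buffer  # final partial batch
```

Because this is a generator, at most one incoming batch plus a partial buffer is held in memory at a time.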

---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-02-26 18:02:57 +01:00
Khalid Mammadov d364f62619 feat: vacuum lite mode to avoid storage listing (#4227)
# Description
Vacuum lite mode only deletes stale tombstone files, but the current
implementation lists all files in storage regardless of lite or full mode.

This change avoids listing storage in lite mode and tries to simplify
and clarify the logic by separating the concerns of each mode.
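
A simplified view of the two modes is sketched below. The function names, the dictionary of tombstone timestamps, and the retention cutoff are assumptions made for illustration. The actual Rust implementation may differ in details.

```python
def lite_candidates(tombstones, cutoff):
    """Lite mode: pick deletions from the log alone, with no storage listing.

    tombstones maps a file path to its deletion timestamp.
    """
    return sorted(path for path, deleted_at in tombstones.items() if deleted_at < cutoff)


def full_candidates(listed_files, active_files, tombstones, cutoff):
    """Full mode: list storage, then keep active files and recent tombstones."""
    recent = {path for path, deleted_at in tombstones.items() if deleted_at >= cutoff}
    return sorted(
        path for path in listed_files if path not in active_files and path not in recent
    )
```

In this sketch only the full mode needs `listed_files`, which is the storage listing that lite mode avoids.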

# Related Issue(s)
- closes [#4228](https://github.com/delta-io/delta-rs/issues/4228)

# Documentation

Added test cases to test and clarify intent

---------

Signed-off-by: Khalid Mammadov <khalidmammadov9@gmail.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
2026-02-25 08:50:42 +01:00
Ion Koutsouris 65b50e0405 feat: multi part upload in compaction log
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
2026-02-24 06:19:00 -08:00
Ion Koutsouris 56b4a00373 chore: resolve feedback
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
2026-02-24 06:19:00 -08:00
Ion Koutsouris 8323ddf63f feat: log compaction
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
2026-02-24 06:19:00 -08:00
Thomas Frederik Hoeck e7b7885140 feat: added disk spilling for merge (#4219)
Added disk spilling for `merge`, similar to the optimize functions, to
allow merges that touch many files in the target

# Description
I have added functionality for spilling to disk similar to how it works
in the optimize functions. If nothing is provided it works as before.

I have added tests similar to those for the other spill functions.

I have tested my cases in #4217 which now successfully completes the
merge without OOM.

I have used AI (Opus 4.6) to get an overview of the project
structure and to write most of the code. I have reviewed and verified
the code myself.

Work done:

- create `create_session_state_with_spill_config` (which is just a move
and rename of `create_session_state_for_optimize`)
- use `create_session_state_with_spill_config` in existing optimize
functions
- use `create_session_state_with_spill_config` for `merge`

# Related Issue(s)
Closes #4217


---------

Signed-off-by: Thomas Frederik Hoeck <tfh@norden.com>
Co-authored-by: Thomas Frederik Hoeck <tfh@norden.com>
2026-02-24 10:53:53 +00:00
Ethan Urbanski 1724f89548 chore: python datafusion 52 upgrade (#4226)
# Description
Upgrades the Python DataFusion path to 52.x and makes the integration
lane blocking in CI

---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-02-24 01:19:58 +01:00
vsmanish1772 cb80248f39 get add action return arrow table
Signed-off-by: vsmanish1772 <smanish1772@gmail.com>
2026-02-16 07:13:40 -08:00
Khalid Mammadov 4ac9f71e2c fix: clarify vacuum command documentation for DeltaTable (#4196)
# Description
Improve documentation for vacuum command and specifically for
**full=False** case.

# Related Issue(s)
Related to this discussion:
https://github.com/delta-io/delta-rs/issues/3644

# Documentation
Updated

Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-13 09:11:17 +00:00
Ethan Urbanski f157cdca8d fix: preserve generated column metadata during schema merge (#4191)
# Description
Fixes a regression where `schema_mode="merge"` appends strip
`delta.generationExpression` from the table schema. Once lost,
subsequent writes compute NULL instead of generated values.

# Related Issue(s)
- closes #4186 
---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-02-12 16:04:04 +09:00
Ion Koutsouris 0d088c6fe0 chore: Bump version from 1.4.1 to 1.4.2 (#4182)
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
2026-02-09 05:30:38 -08:00
Robert Pack c4f9d42423 refactor: remove table level stats on TableProvider (#4174)
# Description

For delta tables, table level statistics are not quite as useful as file
level stats. However we do go through quite some trouble to expose table
level stats which also assume we always have a materialised log to
expose these stats. As such, it hinders us in migration to a lazy
architecture.

In fact, the DataFusion native file-based table implementations (parquet,
json, csv, ...) only expose stats at the execution plan level, and not
at the table provider level.

In this PR we therefore remove the table level stats from the current
table provider, along with the associated code.

Signed-off-by: Robert Pack <robstar.pack@gmail.com>
2026-02-09 02:31:51 +01:00
khalidmammadov bbce1a1c08 add path existing check test case
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
khalidmammadov ebc09fd885 remove line
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
khalidmammadov 306239de10 remove docs
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
khalidmammadov 81d102eee4 Avoid raising exceptions when path does not exist
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
khalidmammadov facb08b3e8 Add test case for negative case and simplify error case
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
khalidmammadov 2e4b1aae2e revert unnecessary change
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
khalidmammadov 323fc80070 fix: stopping is_deltatable path creation for invalid path
Signed-off-by: khalidmammadov <khalidmammadov9@gmail.com>
2026-02-08 12:19:44 -08:00
Ethan Urbanski ecc8beb20b feat: expose DV metadata and payloads as Arrow streams (#4168)
# Description
Adds `DeltaTable.deletion_vectors() -> RecordBatchReader` returning one
row per data file with a deletion vector.

Schema: `filepath: utf8`, `selection_vector: list[bool]` (true = keep,
false = deleted).

Reuses the existing DataFusion replay path via
`replay_deletion_vectors(...)`. Results are deterministic and sorted by
filepath.
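
To show how a consumer might use the `selection_vector` column, the sketch below applies a keep mask to the rows of one file. The row values and the mask are made-up sample data, and plain lists stand in for Arrow arrays.

```python
from itertools import compress

# Hypothetical rows read from one data file, in file order.
rows = ["r0", "r1", "r2", "r3"]

# Selection vector for that file: True = keep, False = deleted.
selection_vector = [True, False, True, True]

# compress() keeps only the rows whose mask entry is True.
live_rows = list(compress(rows, selection_vector))
```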

**Core changes:**
- `DeletionVectorSelection` struct, `DeltaScan::deletion_vectors()`,
shared
  `scan_metadata_stream()` helper to avoid drift between scan paths
- Replaced internal DV `expect(...)` with typed error propagation

**Python binding:**
- `cloned_table_and_state()` to avoid TOCTOU on table + snapshot
- Chunked Arrow batch output with non-nullable list items
- Preserves `without_files` guard behavior

# Related Issue(s)
- Closes #4159

cc @ion-elgreco

---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-02-07 17:13:17 +01:00
Stephen Carman 9261e3982e Add the unity catalog dependency back
Signed-off-by: Stephen Carman <shcarman@gmail.com>
2026-02-05 10:39:20 -08:00
Stephen Carman 3244f73671 Add the unity catalog dependency back
Signed-off-by: Stephen Carman <shcarman@gmail.com>
2026-02-05 10:39:20 -08:00
Ethan Urbanski f7a4bc2a3c fix: preserve kernel column segments (#4164)
# Description
Fix kernel to DataFusion column expression conversion to preserve exact
`ColumnName` path segments.

Fixes #4082

**Fix:**
Use DataFusion `ident(...)` for the base column segment when converting
`Expression::Column`, then `.field(...)` for remaining path segments.
Preserves exact segment names, avoids SQL style normalization.

# Related Issue(s)
- closes #4082

---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-02-05 04:04:09 +01:00
Ion Koutsouris c095787589 fix: nested runtimes in stream adapter (#4148)
# Description
- closes https://github.com/delta-io/delta-rs/issues/4147

Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
2026-01-30 14:58:25 +00:00
Ethan Urbanski 5121983552 fix(python): guard DataFusion FFI export on datafusion major version (#4142)
# Description

- Add a runtime version check in __datafusion_table_provider__ to
prevent FFI ABI mismatch segfaults
- Block capsule export when installed datafusion major != 52
- Provide actionable error text with QueryBuilder workaround

Changes:

- lib.rs: add REQUIRED_DATAFUSION_PY_MAJOR, datafusion_python_version(),
guard at method start
- test_datafusion.py: add incompatible version and not installed tests

Note: This guard is a temporary safety net to prevent segfaults until
DataFusion 52 Python wheels are available on PyPI. Once wheels land,
users can install datafusion==52.* and use SessionContext registration
normally.
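
The shape of such a guard can be sketched in Python. The real check is implemented in `lib.rs`. The function name `check_datafusion_major` and the error messages here are hypothetical, and only the required major version of 52 comes from the description above.

```python
REQUIRED_MAJOR = 52  # major version named in this PR description


def check_datafusion_major(installed_version):
    """Raise before any FFI export if the installed major version differs.

    installed_version is a version string such as "52.0.0", or None when
    the datafusion package is not installed.
    """
    if installed_version is None:
        raise RuntimeError("datafusion is not installed")
    major_text = installed_version.split(".", 1)[0]
    try:
        major = int(major_text)
    except ValueError:
        raise RuntimeError(f"cannot parse datafusion version {installed_version!r}")
    if major != REQUIRED_MAJOR:
        raise RuntimeError(
            f"datafusion {installed_version} is incompatible; "
            f"major version {REQUIRED_MAJOR} is required"
        )
    return major
```

Failing early with a clear error is the point of the guard: an ABI mismatch would otherwise surface as a segfault rather than a Python exception.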

# Related Issue(s)
- #4135 

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
2026-01-28 23:14:57 +01:00
R. Tyler Croy 6ee66e2a7c chore: upgrade python version for a patch release
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
2026-01-28 07:00:33 -08:00
R. Tyler Croy 2256561540 chore: update Python formatting for rust 2024
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
2026-01-26 07:20:53 -08:00
R. Tyler Croy 957f5267bd chore: upgrade python version for the next release
This change also brings the python library onto rust 2024 edition

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
2026-01-26 07:20:53 -08:00