Home page refactor

Signed-off-by: Sam Wright <samuel@plaindocs.com>
Sam Wright
2026-02-26 13:15:14 +02:00
committed by R. Tyler Croy
parent 7a378c210b
commit b014e5e0d4
5 changed files with 66 additions and 78 deletions
+23
@@ -0,0 +1,23 @@
`deltalake` is a Rust-based re-implementation of the Delta Lake protocol originally developed at Databricks. The `deltalake` library has APIs in Rust and Python. The `deltalake` implementation has no dependencies on Java, Spark or Databricks.
## Contributing
The Delta Lake community welcomes contributions from all developers, regardless of experience or programming background.
You can write Rust code, Python code, documentation, submit bugs, or give talks to the community. We welcome all of these contributions.
Feel free to [join our Slack](https://go.delta.io/slack) and message us in the #delta-rs channel any time!
We value kind communication and building a productive, friendly environment for maximum collaboration and fun.
## Important terminology
* `deltalake` refers to the Rust or Python API of delta-rs
* "Delta Spark" refers to the Scala implementation of the Delta Lake transaction log protocol. This depends on Spark and Java.
## Why implement the Delta Lake transaction log protocol in Rust?
Delta Spark depends on Java and Spark, which is fine for many use cases, but not all Delta Lake users want to depend on these libraries. `deltalake` allows you to manage your dataset using a Delta Lake approach without any Java or Spark dependencies.
A `DeltaTable` on disk is simply a directory that stores metadata in JSON files and data in Parquet files.
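The layout described above can be inspected with a few lines of standard-library Python. This is a minimal sketch, assuming the usual Delta Lake convention that transaction log entries live in a `_delta_log` subdirectory of the table directory; `summarize_delta_dir` is a hypothetical helper written for illustration and is not part of the `deltalake` API.

```python
from pathlib import Path


def summarize_delta_dir(table_path):
    """Split the files under a Delta table directory into log and data files.

    Hypothetical helper for illustration; not part of the `deltalake` API.
    """
    root = Path(table_path)
    # Transaction log entries are JSON files inside the `_delta_log` subdirectory.
    log_files = sorted(p.name for p in (root / "_delta_log").glob("*.json"))
    # Data files are Parquet files; partitioned tables nest them in subdirectories.
    # Parquet files inside `_delta_log` (checkpoints) are not data files, so skip them.
    data_files = sorted(
        str(p.relative_to(root))
        for p in root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(root).parts
    )
    return {"log": log_files, "data": data_files}
```

Calling `summarize_delta_dir("tmp/some-table")` on a table written with `write_deltalake` should list at least one JSON log entry and one Parquet data file. The exact file names depend on the writer.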
+5
@@ -0,0 +1,5 @@
## Project history
Check out this video by Denny Lee & QP Hou to learn about the genesis of the delta-rs project:
<iframe width="560" height="315" src="https://www.youtube.com/embed/ZQdEdifcBh8?si=ytGW7FB-kwl6VqsV" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+34 -77
@@ -1,91 +1,48 @@
`deltalake` is an open source library that makes working with tabular datasets easier, more robust and more performant. With deltalake you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can optimize dataset storage from small files to large files.
`deltalake` is an open source library that makes working with tabular datasets easier, more robust and more performant. With `deltalake` you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can optimize dataset storage from small files to large files.
`deltalake` can be used to manage data stored on a local file system or in the cloud. `deltalake` integrates with data manipulation libraries such as Pandas, Polars, DuckDB and DataFusion.
With `deltalake` you can manage data stored on a local file system or in the cloud. `deltalake` integrates with data manipulation libraries such as Pandas, Polars, DuckDB and DataFusion.
`deltalake` uses a lakehouse framework for managing datasets. With this lakehouse approach you manage your datasets with a `DeltaTable` object and then `deltalake` takes care of the underlying files. Within a `DeltaTable` your data is stored in high performance Parquet files while metadata is stored in a set of JSON files called a transaction log.
`deltalake` is a Rust-based re-implementation of the Delta Lake protocol originally developed at Databricks. The `deltalake` library has APIs in Rust and Python. The `deltalake` implementation has no dependencies on Java, Spark or Databricks.
## Important terminology
* `deltalake` refers to the Rust or Python API of delta-rs
* "Delta Spark" refers to the Scala implementation of the Delta Lake transaction log protocol. This depends on Spark and Java.
## Why implement the Delta Lake transaction log protocol in Rust?
Delta Spark depends on Java and Spark, which is fine for many use cases, but not all Delta Lake users want to depend on these libraries. `deltalake` allows you to manage your dataset using a Delta Lake approach without any Java or Spark dependencies.
A `DeltaTable` on disk is simply a directory that stores metadata in JSON files and data in Parquet files.
`deltalake` uses a lakehouse framework where you manage your datasets with a `DeltaTable` object and `deltalake` takes care of the underlying files.
## Quick start
You can install `deltalake` in Python with `pip`
```bash
pip install deltalake
```
We create a Pandas `DataFrame` and write it to a `DeltaTable`:
```python
import pandas as pd
from deltalake import DeltaTable,write_deltalake
1. Install the Python dependencies with `pip`:
df = pd.DataFrame(
{
"id": [1, 2, 3],
"name": ["Aadhya", "Bob", "Chen"],
}
)
```bash
pip install deltalake pyarrow tabulate
```
(
write_deltalake(
table_or_uri="delta_table_dir",
data=df,
)
)
```
We create a `DeltaTable` object that holds the metadata for the Delta table:
```python
dt = DeltaTable("delta_table_dir")
```
We load the `DeltaTable` into a Pandas `DataFrame` with `to_pandas` on a `DeltaTable`:
```python
new_df = dt.to_pandas()
```
- `pyarrow` is needed for the DataFrame import
- `tabulate` is needed to print the DataFrame in the example
Or we can load the data into a Polars `DataFrame` with `pl.read_delta`:
```python
import polars as pl
new_df = pl.read_delta("delta_table_dir")
```
1. Create a Pandas `DataFrame` and write it to a `DeltaTable`:
Or we can load the data with DuckDB:
```python
import duckdb
duckdb.query("SELECT * FROM delta_scan('./delta_table_dir')")
```
```python
from deltalake import write_deltalake, DeltaTable
import pandas as pd
Or we can load the data with DataFusion:
```python
from datafusion import SessionContext
# Create a Pandas DataFrame and write it to a DeltaTable:
df = pd.DataFrame({"num": [8, 9], "letter": ["aa", "bb"]})
write_deltalake("tmp/some-table", df)
ctx = SessionContext()
ctx.register_dataset("my_delta_table", dt.to_pyarrow_dataset())
ctx.sql("select * from my_delta_table")
```
# Create a DeltaTable object to track metadata for the Delta table
dt = DeltaTable("tmp/some-table")
# Overwrite the DataFrame with new data
df = pd.DataFrame({"num": [11, 22], "letter": ["dd", "ee"]})
write_deltalake("tmp/some-table", df, mode="overwrite")
# Load version 0 of the table (time travel); the table on disk is not modified
dt_v0 = DeltaTable("tmp/some-table", version=0)
# Print the original version 0 data
print(dt_v0.to_pandas().to_markdown())
```
## Contributing
## Next steps
The Delta Lake community welcomes contributions from all developers, regardless of experience or programming background.
You can write Rust code, Python code, documentation, submit bugs, or give talks to the community. We welcome all of these contributions.
Feel free to [join our Slack](https://go.delta.io/slack) and message us in the #delta-rs channel any time!
We value kind communication and building a productive, friendly environment for maximum collaboration and fun.
## Project history
Check out this video by Denny Lee & QP Hou to learn about the genesis of the delta-rs project:
<iframe width="560" height="315" src="https://www.youtube.com/embed/ZQdEdifcBh8?si=ytGW7FB-kwl6VqsV" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
- Learn about Querying Delta Tables
- Learn about using `deltalake` with Polars
- Learn about using `deltalake` with DuckDB
- Learn about using `deltalake` with DataFusion
+4 -1
@@ -49,7 +49,6 @@ nav:
- Home: index.md
- Why Use Delta Lake: why-use-delta-lake.md
- Delta Lake for big and small data: delta-lake-big-data-small-data.md
- Best practices: delta-lake-best-practices.md
- Usage:
- Installation: usage/installation.md
- Overview: usage/overview.md
@@ -66,6 +65,7 @@ nav:
- usage/writing/index.md
- usage/writing/writing-to-s3-with-locking-provider.md
- Deleting rows from a table: usage/deleting-rows-from-delta-lake-table.md
- Best practices: usage/delta-lake-best-practices.md
- Optimize:
- Small file compaction: usage/optimize/small-file-compaction-with-optimize.md
- Z Order: usage/optimize/delta-lake-z-order.md
@@ -104,6 +104,9 @@ nav:
- File skipping: how-delta-lake-works/delta-lake-file-skipping.md
- Upgrade guides:
- Version 1.0.0: upgrade-guides/guide-1.0.0.md
- About:
- Contributing: about/contributing.md
- History: about/history.md
not_in_nav: |
/_build/