Home page refactor

Signed-off-by: Sam Wright <samuel@plaindocs.com>
2026-07-01 20:34:35 -04:00 · 2026-02-26 13:15:14 +02:00
parent 7a378c210b
commit b014e5e0d4
5 changed files with 66 additions and 78 deletions
@@ -0,0 +1,23 @@
+`deltalake` is a Rust-based reimplementation of the Delta Lake protocol originally developed at Databricks. The `deltalake` library has APIs in Rust and Python. The `deltalake` implementation has no dependencies on Java, Spark or Databricks.
+
+## Contributing
+
+The Delta Lake community welcomes contributions from all developers, regardless of experience or programming background.
+
+You can write Rust code, Python code, or documentation, submit bug reports, or give talks to the community. We welcome all of these contributions.
+
+Feel free to [join our Slack](https://go.delta.io/slack) and message us in the #delta-rs channel any time!
+
+We value kind communication and building a productive, friendly environment for maximum collaboration and fun.
+
+
+## Important terminology
+
+* `deltalake` refers to the Rust or Python API of delta-rs
+* "Delta Spark" refers to the Scala implementation of the Delta Lake transaction log protocol.  This depends on Spark and Java.
+
+## Why implement the Delta Lake transaction log protocol in Rust?
+
+Delta Spark depends on Java and Spark, which is fine for many use cases, but not all Delta Lake users want to depend on these libraries.  `deltalake` allows you to manage your dataset using a Delta Lake approach without any Java or Spark dependencies.
+
+A `DeltaTable` on disk is simply a directory that stores metadata in JSON files and data in Parquet files.
@@ -0,0 +1,5 @@
+## Project history
+
+Check out this video by Denny Lee & QP Hou to learn about the genesis of the delta-rs project:
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/ZQdEdifcBh8?si=ytGW7FB-kwl6VqsV" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
@@ -1,91 +1,48 @@
-`deltalake` is an open source library that makes working with tabular datasets easier, more robust and more performant. With deltalake you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can optimize dataset storage from small files to large files. 
+`deltalake` is an open source library that makes working with tabular datasets easier, more robust and more performant. With `deltalake` you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can optimize dataset storage by compacting small files into larger ones.

-`deltalake` can be used to manage data stored on a local file system or in the cloud. `deltalake` integrates with data manipulation libraries such as Pandas, Polars, DuckDB and DataFusion.
+With `deltalake` you can manage data stored on a local file system or in the cloud. `deltalake` integrates with data manipulation libraries such as Pandas, Polars, DuckDB and DataFusion.

-`deltalake` uses a lakehouse framework for managing datasets. With this lakehouse approach you manage your datasets with a `DeltaTable` object and then `deltalake` takes care of the underlying files. Within a `DeltaTable` your data is stored in high performance Parquet files while metadata is stored in a set of JSON files called a transaction log.
-
-`deltalake` is a Rust-based re-implementation of the DeltaLake protocol originally developed at DataBricks. The `deltalake` library has APIs in Rust and Python. The `deltalake` implementation has no dependencies on Java, Spark or DataBricks.
-
-
-## Important terminology
-
-* `deltalake` refers to the Rust or Python API of delta-rs
-* "Delta Spark" refers to the Scala implementation of the Delta Lake transaction log protocol.  This depends on Spark and Java.
-
-## Why implement the Delta Lake transaction log protocol in Rust?
-
-Delta Spark depends on Java and Spark, which is fine for many use cases, but not all Delta Lake users want to depend on these libraries.  `deltalake` allows you to manage your dataset using a Delta Lake approach without any Java or Spark dependencies.
-
-A `DeltaTable` on disk is simply a directory that stores metadata in JSON files and data in Parquet files.  
+`deltalake` uses a lakehouse framework where you manage your datasets with a `DeltaTable` object and `deltalake` takes care of the underlying files.

 ## Quick start

-You can install `deltalake` in Python with `pip`
-```bash
-pip install deltalake
-```
-We create a Pandas `DataFrame` and write it to a `DeltaTable`:
-```python
-import pandas as pd
-from deltalake import DeltaTable,write_deltalake
+1. Install the Python dependencies with `pip`:

-df = pd.DataFrame(
-    {
-        "id": [1, 2, 3],
-        "name": ["Aadhya", "Bob", "Chen"],
-    }
-)
+    ```bash
+    pip install deltalake pyarrow tabulate
+    ```

-(
-    write_deltalake(
-        table_or_uri="delta_table_dir",
-        data=df,
-    )
-)
-```
-We create a `DeltaTable` object that holds the metadata for the Delta table:
-```python
-dt = DeltaTable("delta_table_dir")
-```
-We load the `DeltaTable` into a Pandas `DataFrame` with `to_pandas` on a `DeltaTable`:
-```python
-new_df = dt.to_pandas()
-```
+    - `pyarrow` is needed for the DataFrame import
+    - `tabulate` is needed to print the DataFrame in the example

-Or we can load the data into a Polars `DataFrame` with `pl.read_delta`:
-```python
-import polars as pl
-new_df = pl.read_delta("delta_table_dir")
-```
+1. Create a Pandas `DataFrame`, write it to a `DeltaTable`, overwrite it, and read back the original version:

-Or we can load the data with DuckDB:
-```python
-import duckdb
-duckdb.query("SELECT * FROM delta_scan('./delta_table_dir')")
-```
+    ```python
+    from deltalake import write_deltalake, DeltaTable
+    import pandas as pd

-Or we can load the data with DataFusion:
-```python
-from datafusion import SessionContext
+    # Create a Pandas DataFrame and write it to a DeltaTable
+    df = pd.DataFrame({"num": [8, 9], "letter": ["aa", "bb"]})
+    write_deltalake("tmp/some-table", df)

-ctx = SessionContext()
-ctx.register_dataset("my_delta_table", dt.to_pyarrow_dataset())
-ctx.sql("select * from my_delta_table")
-```
+    # Create a DeltaTable object to track metadata for the Delta table
+    dt = DeltaTable("tmp/some-table")
+
+    # Overwrite the table with new data
+    df = pd.DataFrame({"num": [11, 22], "letter": ["dd", "ee"]})
+    write_deltalake("tmp/some-table", df, mode="overwrite")
+
+    # Time travel: load version 0 of the table
+    dt = DeltaTable("tmp/some-table", version=0)
+
+    # Print the original version 0 data
+    print(dt.to_pandas().to_markdown())
+    ```


-## Contributing
+## Next steps

-The Delta Lake community welcomes contributors from all developers, regardless of your experience or programming background.
-
-You can write Rust code, Python code, documentation, submit bugs, or give talks to the community.  We welcome all of these contributions.
-
-Feel free to [join our Slack](https://go.delta.io/slack) and message us in the #delta-rs channel any time!
-
-We value kind communication and building a productive, friendly environment for maximum collaboration and fun.
-
-## Project history
-
-Check out this video by Denny Lee & QP Hou to learn about the genesis of the delta-rs project:
-
-<iframe width="560" height="315" src="https://www.youtube.com/embed/ZQdEdifcBh8?si=ytGW7FB-kwl6VqsV" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+- Learn about querying Delta tables
+- Learn about using `deltalake` with Polars
+- Learn about using `deltalake` with DuckDB
+- Learn about using `deltalake` with DataFusion
@@ -49,7 +49,6 @@ nav:
      - Home: index.md
      - Why Use Delta Lake: why-use-delta-lake.md
      - Delta Lake for big and small data: delta-lake-big-data-small-data.md
-      - Best practices: delta-lake-best-practices.md
  - Usage:
      - Installation: usage/installation.md
      - Overview: usage/overview.md
@@ -66,6 +65,7 @@ nav:
          - usage/writing/index.md
          - usage/writing/writing-to-s3-with-locking-provider.md
      - Deleting rows from a table: usage/deleting-rows-from-delta-lake-table.md
+      - Best practices: usage/delta-lake-best-practices.md
      - Optimize:
          - Small file compaction: usage/optimize/small-file-compaction-with-optimize.md
          - Z Order: usage/optimize/delta-lake-z-order.md
@@ -104,6 +104,9 @@ nav:
      - File skipping: how-delta-lake-works/delta-lake-file-skipping.md
  - Upgrade guides:
      - Version 1.0.0: upgrade-guides/guide-1.0.0.md
+  - About:
+      - Contributing: about/contributing.md
+      - History: about/history.md
 not_in_nav: |
  /_build/