chore: Update READMEs of crates to be more consistent (#17691)

* chore: Update READMEs of crates to be more consistent

* Add some more Apache project links

* Minor formatting

* Formatting

* Update datafusion/pruning/README.md

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* suggestion

* formatting

* formatting

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Jeffrey Vo
2025-09-22 20:35:29 +10:00
committed by GitHub
parent 1629420162
commit b63ca3e09a
52 changed files with 278 additions and 152 deletions
+5 -2
@@ -19,12 +19,15 @@
<!-- Note this file is included in the crates.io page as well https://crates.io/crates/datafusion-cli -->
# DataFusion Command-line Interface
# Apache DataFusion Command-line Interface
[DataFusion](https://datafusion.apache.org/) is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
DataFusion CLI (`datafusion-cli`) is a small command line utility that runs SQL queries using the DataFusion engine.
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
# Frequently Asked Questions
## Where can I find more information?
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-catalog-listing"
description = "datafusion-catalog-listing"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+4 -4
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion catalog-listing
# Apache DataFusion Catalog Listing
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion with [ListingTable], an implementation
of [TableProvider] based on files in a directory (either locally or on remote
@@ -29,8 +29,8 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[listingtable]: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html
[tableprovider]: https://docs.rs/datafusion/latest/datafusion/datasource/trait.TableProvider.html
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-catalog"
description = "datafusion-catalog"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Catalog
# Apache DataFusion Catalog
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides catalog management functionality, including catalogs, schemas, and tables.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Common Runtime
# Apache DataFusion Common Runtime
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides common utilities.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Common
# Apache DataFusion Common
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides common data types and utilities.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+5 -8
@@ -17,15 +17,12 @@
under the License.
-->
# DataFusion Core
<!--
Note the main crates.io landing page https://crates.io/crates/datafusion
uses the workspace README.md file, not this file
-->
DataFusion is an extensible query execution framework, written in Rust,
that uses Apache Arrow as its in-memory format.
# Apache DataFusion Core
This crate contains the main entry points and high level DataFusion APIs such as
`SessionContext`, `DataFrame` and `ListingTable`.
For more information, please see:
- [DataFusion Website](https://datafusion.apache.org)
- [DataFusion API Docs](https://docs.rs/datafusion/latest/datafusion/)
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-datasource-avro"
description = "datafusion-datasource-avro"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+6 -4
@@ -17,15 +17,17 @@
under the License.
-->
# DataFusion datasource
# Apache DataFusion Avro DataSource
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that defines a Avro based file source.
This crate is a submodule of DataFusion that defines an [Apache Avro] based file source.
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[apache avro]: https://avro.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-datasource-csv"
description = "datafusion-datasource-csv"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion datasource
# Apache DataFusion CSV DataSource
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that defines a CSV based file source.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-datasource-json"
description = "datafusion-datasource-json"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion datasource
# Apache DataFusion JSON DataSource
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that defines a JSON based file source.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-datasource-parquet"
description = "datafusion-datasource-parquet"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+6 -4
@@ -17,15 +17,17 @@
under the License.
-->
# DataFusion datasource
# Apache DataFusion Parquet DataSource
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that defines a Parquet based file source.
This crate is a submodule of DataFusion that defines an [Apache Parquet] based file source.
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[apache parquet]: https://parquet.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-datasource"
description = "datafusion-datasource"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion datasource
# Apache DataFusion DataSource
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that defines common DataSource related components like FileScanConfig, FileCompression etc.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1
@@ -19,6 +19,7 @@
name = "datafusion-doc"
description = "Documentation module for DataFusion query engine"
keywords = ["datafusion", "query", "sql"]
readme = "README.md"
version = { workspace = true }
edition = { workspace = true }
homepage = { workspace = true }
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Execution
# Apache DataFusion Documentation
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides structures and macros
for documenting user defined functions.
@@ -28,5 +28,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Execution
# Apache DataFusion Execution
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides execution runtime such as the memory pools and disk manager.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1
@@ -19,6 +19,7 @@
name = "datafusion-expr-common"
description = "Logical plan and expression representation for DataFusion query engine"
keywords = ["datafusion", "logical", "plan", "expressions"]
readme = "README.md"
version = { workspace = true }
edition = { workspace = true }
homepage = { workspace = true }
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Logical Plan and Expressions
# Apache DataFusion Common Logical Plan and Expressions
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides common logical expressions
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Logical Plan and Expressions
# Apache DataFusion Logical Plan and Expressions
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides data types and utilities for logical plans and expressions.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+11 -11
@@ -17,10 +17,10 @@
under the License.
-->
# `datafusion-ffi`: Apache DataFusion Foreign Function Interface
# Apache DataFusion Foreign Function Interface
This crate contains code to allow interoperability of Apache [DataFusion] with
functions from other libraries and/or [DataFusion] versions using a stable
This crate contains code to allow interoperability of [Apache DataFusion] with
functions from other libraries and/or DataFusion versions using a stable
interface.
One of the limitations of the Rust programming language is that there is no
@@ -28,10 +28,10 @@ stable [Rust ABI] (Application Binary Interface). If a library is compiled with
one version of the Rust compiler and you attempt to use that library with a
program compiled by a different Rust compiler, there is no guarantee that you
can access the data structures. In order to share code between libraries loaded
at runtime, you need to use Rust's [FFI](Foreign Function Interface (FFI)).
at runtime, you need to use Rust's [FFI] (Foreign Function Interface (FFI)).
The purpose of this crate is to define interfaces between [DataFusion] libraries
that will remain stable across different versions of [DataFusion]. This allows
The purpose of this crate is to define interfaces between DataFusion libraries
that will remain stable across different versions of DataFusion. This allows
users to write libraries that can interface between each other at runtime rather
than require compiling all of the code into a single executable.
@@ -46,7 +46,7 @@ See [API Docs] for details and examples.
Two use cases have been identified for this crate, but they are not intended to
be all inclusive.
1. `datafusion-python` which will use the FFI to provide external services such
1. [`datafusion-python`] which will use the FFI to provide external services such
as a `TableProvider` without needing to re-export the entire `datafusion-python`
code base. With `datafusion-ffi` these packages do not need `datafusion-python`
as a dependency at all.
@@ -68,8 +68,8 @@ stable interfaces that closely mirror the Rust native approach. To learn more
about this approach see the [abi_stable] and [async-ffi] crates.
If you have a library in another language that you wish to interface to
[DataFusion] the recommendation is to create a Rust wrapper crate to interface
with your library and then to connect it to [DataFusion] using this crate.
DataFusion the recommendation is to create a Rust wrapper crate to interface
with your library and then to connect it to DataFusion using this crate.
Alternatively, you could use [bindgen] to interface directly to the [FFI] provided
by this crate, but that is currently not supported.
@@ -101,12 +101,12 @@ In this crate we have a variety of structs which closely mimic the behavior of
their internal counterparts. To see detailed notes about how to use them, see
the example in `FFI_TableProvider`.
[datafusion]: https://datafusion.apache.org
[apache datafusion]: https://datafusion.apache.org/
[api docs]: http://docs.rs/datafusion-ffi/latest
[rust abi]: https://doc.rust-lang.org/reference/abi.html
[ffi]: https://doc.rust-lang.org/nomicon/ffi.html
[abi_stable]: https://crates.io/crates/abi_stable
[async-ffi]: https://crates.io/crates/async-ffi
[bindgen]: https://crates.io/crates/bindgen
[datafusion-python]: https://datafusion.apache.org/python/
[`datafusion-python`]: https://datafusion.apache.org/python/
[datafusion-contrib]: https://github.com/datafusion-contrib
@@ -19,6 +19,7 @@
name = "datafusion-functions-aggregate-common"
description = "Utility functions for implementing aggregate functions for the DataFusion query engine"
keywords = ["datafusion", "logical", "plan", "expressions"]
readme = "README.md"
version = { workspace = true }
edition = { workspace = true }
homepage = { workspace = true }
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Aggregate Function Library
# Apache DataFusion Aggregate Function Common Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains common functionality for implementing aggregate and window functions.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Aggregate Function Library
# Apache DataFusion Aggregate Function Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains implementations of aggregate functions.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+6 -4
@@ -17,16 +17,18 @@
under the License.
-->
# DataFusion Nested Type Function Library
# Apache DataFusion Nested Type Function Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains functions for working with arrays, maps and structs, such as `array_append` that work with
`ListArray`, `LargeListArray` and `FixedListArray` types from the `arrow` crate.
`ListArray`, `LargeListArray` and `FixedListArray` types from the [`arrow`] crate.
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`arrow`]: https://crates.io/crates/arrow
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Table Function Library
# Apache DataFusion Table Function Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains table functions that can be used in DataFusion queries.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Window Function Common Library
# Apache DataFusion Window Function Common Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains common functions for implementing window functions.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Window Function Library
# Apache DataFusion Window Function Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains window function definitions.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Function Library
# Apache DataFusion Function Library
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains packages of functions that can be used to customize the
functionality of DataFusion.
@@ -28,5 +28,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1
@@ -19,6 +19,7 @@
name = "datafusion-macros"
description = "Procedural macros for DataFusion query engine"
keywords = ["datafusion", "query", "sql"]
readme = "README.md"
version = { workspace = true }
edition = { workspace = true }
homepage = { workspace = true }
+5 -6
@@ -17,15 +17,14 @@
under the License.
-->
# DataFusion Window Function Common Library
# Apache DataFusion Macros
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains common macros used in DataFusion
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
Most projects should use the [`datafusion`] crate directly.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+5 -2
@@ -17,7 +17,9 @@
under the License.
-->
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
# Apache DataFusion Optimizer
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains the DataFusion logical optimizer.
Please see [Query Optimizer] in the Library User Guide for more information.
@@ -26,6 +28,7 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
[query optimizer]: https://datafusion.apache.org/library-user-guide/query-optimizer.html
+31 -1
@@ -1,4 +1,25 @@
# DataFusion Physical Expression Adapter
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Apache DataFusion Physical Expression Adapter
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate provides utilities for adapting physical expressions to different schemas in DataFusion.
@@ -6,3 +27,12 @@ It handles schema differences in file scans by rewriting expressions to match th
including type casting, missing columns, and partition values.
For detailed documentation, see the [`PhysicalExprAdapter`] trait documentation.
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
[`physicalexpradapter`]: https://docs.rs/datafusion/latest/datafusion/physical_expr_adapter/trait.PhysicalExprAdapter.html
+7 -4
@@ -17,16 +17,19 @@
under the License.
-->
# DataFusion Core Physical Expressions
# Apache DataFusion Core Physical Expressions
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides shared APIs for implementing
physical expressions such as `PhysicalExpr` and `PhysicalSortExpr`.
physical expressions such as [`PhysicalExpr`] and [`PhysicalSortExpr`].
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
[`physicalexpr`]: https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
[`physicalsortexpr`]: https://docs.rs/datafusion/latest/datafusion/physical_expr/struct.PhysicalSortExpr.html
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Physical Expressions
# Apache DataFusion Physical Expressions
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides data types and utilities for physical expressions.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -4
@@ -17,10 +17,9 @@
under the License.
-->
# DataFusion Physical Optimizer
# Apache DataFusion Physical Optimizer
DataFusion is an extensible query execution framework, written in Rust,
that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains the physical optimizer for DataFusion.
@@ -28,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Physical Plan
# Apache DataFusion Physical Plan
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that contains the `ExecutionPlan` trait and the various implementations of that
trait for built-in operators such as filters, projections, joins, aggregations, etc.
@@ -28,5 +28,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -19,9 +19,9 @@
name = "datafusion-proto-common"
description = "Protobuf serialization of DataFusion common types"
keywords = ["arrow", "query", "sql"]
readme = "README.md"
version = { workspace = true }
edition = { workspace = true }
readme = { workspace = true }
homepage = { workspace = true }
repository = { workspace = true }
license = { workspace = true }
+9 -5
@@ -17,17 +17,21 @@
under the License.
-->
# `datafusion-proto-common`: Apache DataFusion Protobuf Serialization / Deserialization
# Apache DataFusion Protobuf Common Serialization / Deserialization
This crate contains code to convert Apache [DataFusion] primitive types to and from
bytes, which can be useful for sending data over the network.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains code to convert DataFusion primitive types to and from
bytes using [Protocol Buffers], which can be useful for sending data over the network.
See [API Docs] for details and examples.
Most projects should use the [`datafusion-proto`] crate directly, which re-exports
this module. If you are already using the [`datafusion-protp`] crate, there is no
this module. If you are already using the [`datafusion-proto`] crate, there is no
reason to use this crate directly in your project as well.
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[protocol buffers]: https://protobuf.dev/
[`datafusion-proto`]: https://crates.io/crates/datafusion-proto
[datafusion]: https://datafusion.apache.org
[api docs]: http://docs.rs/datafusion-proto/latest
+9 -5
@@ -17,13 +17,17 @@
under the License.
-->
# `datafusion-proto`: Apache DataFusion Protobuf Serialization / Deserialization
# Apache DataFusion Protobuf Serialization / Deserialization
This crate contains code to convert [Apache DataFusion] plans to and from
bytes, which can be useful for sending plans over the network, for example
when building a distributed query engine.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate contains code to convert DataFusion plans to and from bytes using [Protocol Buffers],
which can be useful for sending plans over the network, for example when building a distributed
query engine.
See [API Docs] for details and examples.
[apache datafusion]: https://datafusion.apache.org
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[protocol buffers]: https://protobuf.dev/
[api docs]: http://docs.rs/datafusion-proto/latest
+1
@@ -1,6 +1,7 @@
[package]
name = "datafusion-pruning"
description = "DataFusion Pruning Logic"
readme = "README.md"
version = { workspace = true }
edition = { workspace = true }
homepage = { workspace = true }
+34
@@ -0,0 +1,34 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Apache DataFusion Pruning Logic
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that contains pruning logic: it analyzes filter expressions against
statistics such as min/max values and null counts to prove that files, or large subsections of files, can be skipped
without reading the actual data.
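As a rough illustration of the idea (a minimal sketch, not this crate's actual API), the following shows how min/max values and null counts can prove that a file cannot match a filter. The `ColumnStats` struct and the `may_match_gt` / `may_match_eq` helpers are hypothetical names introduced only for this example.

```rust
/// Hypothetical per-file statistics for a single integer column.
/// This type is an assumption for illustration, not part of DataFusion.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
    null_count: u64,
    row_count: u64,
}

/// Returns `false` only when the statistics prove that no row can
/// satisfy `col > threshold`; `true` means the file must still be read.
fn may_match_gt(stats: &ColumnStats, threshold: i64) -> bool {
    // If every value is NULL, `col > threshold` is never true.
    if stats.null_count == stats.row_count {
        return false;
    }
    // If even the largest value is not above the threshold, no row matches.
    stats.max > threshold
}

/// Returns `false` only when the statistics prove that no row can
/// satisfy `col = value`.
fn may_match_eq(stats: &ColumnStats, value: i64) -> bool {
    if stats.null_count == stats.row_count {
        return false;
    }
    // The value can only be present if it lies within [min, max].
    stats.min <= value && value <= stats.max
}

fn main() {
    let files = [
        ColumnStats { min: 0, max: 10, null_count: 0, row_count: 100 },
        ColumnStats { min: 20, max: 30, null_count: 5, row_count: 100 },
        ColumnStats { min: 0, max: 0, null_count: 100, row_count: 100 },
    ];

    // Keep only the files that might contain rows where `col > 15`.
    let kept: Vec<usize> = files
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match_gt(stats, 15))
        .map(|(index, _)| index)
        .collect();

    println!("files to scan for `col > 15`: {:?}", kept);
    println!("file 1 may contain `col = 25`: {}", may_match_eq(&files[1], 25));
}
```

The checks are deliberately conservative: a `true` result only means the file cannot be ruled out, so skipping a file on `false` never drops matching rows.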
Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+1 -1
@@ -18,11 +18,11 @@
[package]
name = "datafusion-session"
description = "datafusion-session"
readme = "README.md"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
license.workspace = true
readme.workspace = true
repository.workspace = true
rust-version.workspace = true
version.workspace = true
+4 -3
@@ -17,9 +17,9 @@
under the License.
-->
# DataFusion Session
# Apache DataFusion Session
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate provides **session-related abstractions** used in the DataFusion query engine. A _session_ represents the runtime context for query execution, including configuration, runtime environment, function registry, and planning.
@@ -27,5 +27,6 @@ Most projects should use the [`datafusion`] crate directly, which re-exports
this module. If you are already using the [`datafusion`] crate, there is no
reason to use this crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
+14 -5
@@ -17,9 +17,15 @@ specific language governing permissions and limitations
under the License.
-->
# datafusion-spark: Spark-compatible Expressions
# Apache DataFusion Spark-compatible Expressions
This crate provides Apache Spark-compatible expressions for use with DataFusion.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides [Apache Spark] compatible expressions for use with DataFusion.
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[apache spark]: https://spark.apache.org/
## Testing Guide
@@ -29,12 +35,15 @@ or `coerce_types`) is not applied.
Therefore, direct invocation tests should only be used to verify that the function is correctly implemented.
Please be sure to add additional tests beyond direct invocation.
For more detailed testing guidelines, refer to
the [Spark SQLLogicTest README](../sqllogictest/test_files/spark/README.md).
For more detailed testing guidelines, refer to the [Spark SQLLogicTest README].
## Implementation References
When implementing Spark-compatible functions, you can check if there are existing implementations in
the [Sail](https://github.com/lakehq/sail) or [Comet](https://github.com/apache/datafusion-comet) projects first.
the [Sail] or [Comet] projects first.
If you do port functionality from these sources, make sure to port over the corresponding tests too, to ensure
correctness and compatibility.
[spark sqllogictest readme]: ../sqllogictest/test_files/spark/README.md
[sail]: https://github.com/lakehq/sail
[comet]: https://github.com/apache/datafusion-comet
+8 -7
@@ -17,10 +17,10 @@
under the License.
-->
# DataFusion SQL Query Planner
# Apache DataFusion SQL Query Planner
This crate provides a general purpose SQL query planner that can parse SQL and translate queries into logical
plans. Although this crate is used by the [DataFusion][df] query engine, it was designed to be easily usable from any
plans. Although this crate is used by the [Apache DataFusion] query engine, it was designed to be easily usable from any
project that requires a SQL query planner and does not make any assumptions about how the resulting logical plan
will be translated to a physical plan. For example, there is no concept of row-based versus columnar execution in the
logical plan.
@@ -29,12 +29,12 @@ Note that the [`datafusion`] crate re-exports this module. If you are already
using the [`datafusion`] crate in your project, there is no reason to use this
crate directly in your project as well.
[df]: https://crates.io/crates/datafusion
[apache datafusion]: https://datafusion.apache.org/
[`datafusion`]: https://crates.io/crates/datafusion
## Example Usage
See the [examples](examples) directory for fully working examples.
See the [examples] directory for fully working examples.
Here is an example of producing a logical plan from a SQL string.
@@ -69,8 +69,8 @@ fn main() {
```
This is the logical plan that is produced from this example. Note that this is an **unoptimized**
logical plan. The [datafusion-optimizer](https://crates.io/crates/datafusion-optimizer) crate provides a query
optimizer that can be applied to plans produced by this crate.
logical plan. The [datafusion-optimizer] crate provides a query optimizer that can be applied to
plans produced by this crate.
```
Sort: state_tax DESC NULLS FIRST
@@ -87,4 +87,5 @@ Sort: state_tax DESC NULLS FIRST
TableScan: orders
```
[df]: https://crates.io/crates/datafusion
[examples]: examples
[datafusion-optimizer]: https://crates.io/crates/datafusion-optimizer
+14 -8
@@ -17,23 +17,29 @@
under the License.
-->
# DataFusion sqllogictest
# Apache DataFusion sqllogictest
[DataFusion][df] is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that contains an implementation of [sqllogictest](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki).
This crate is a submodule of DataFusion that contains an implementation of [sqllogictest].
[df]: https://crates.io/crates/datafusion
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[sqllogictest]: https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki
## Overview
This crate uses [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) to parse and run `.slt` files in the
[`test_files`](test_files) directory of this crate or the [`data/sqlite`](https://github.com/apache/datafusion-testing/tree/main/data/sqlite)
directory of the [datafusion-testing](https://github.com/apache/datafusion-testing) crate.
This crate uses [sqllogictest-rs] to parse and run `.slt` files in the [`test_files`] directory of
this crate or the [`data/sqlite`] directory of the [datafusion-testing] repository.
[sqllogictest-rs]: https://github.com/risinglightdb/sqllogictest-rs
[`test_files`]: test_files
[`data/sqlite`]: https://github.com/apache/datafusion-testing/tree/main/data/sqlite
[datafusion-testing]: https://github.com/apache/datafusion-testing
## Testing setup
1. `rustup update stable` DataFusion uses the latest stable release of rust
1. `rustup update stable` DataFusion uses the latest stable release of Rust
2. `git submodule init`
3. `git submodule update --init --remote --recursive`
+5 -2
@@ -19,9 +19,12 @@
# Apache DataFusion Substrait
This crate contains a [Substrait] producer and consumer for [Apache DataFusion]
[Apache DataFusion] is an extensible query execution framework, written in Rust, that uses [Apache Arrow] as its in-memory format.
This crate is a submodule of DataFusion that provides a [Substrait] producer and consumer for DataFusion
plans. See [API Docs] for details and examples.
[apache arrow]: https://arrow.apache.org/
[apache datafusion]: https://datafusion.apache.org/
[substrait]: https://substrait.io
[apache datafusion]: https://datafusion.apache.org
[api docs]: https://docs.rs/datafusion-substrait/latest