mirror of
https://github.com/langchain-ai/datafusion.git
synced 2026-07-01 21:24:06 -04:00
34dad2ccee
- closes https://github.com/apache/datafusion/issues/19796 This patch aims to implement a fast-path for the ExecutionPlan::with_new_children function for some plans, moving closer to a physical plan re-use implementation and improving planning performance. If the passed children properties are the same as in self, we do not actually recompute self's properties (which could be costly if projection mapping is required). Instead, we just replace the children and re-use self's properties as-is. To be able to compare two different properties -- ExecutionPlan::properties(...) signature is modified and now returns `&Arc<PlanProperties>`. If `children` properties are the same in `with_new_children` -- we clone our properties arc and then a parent plan will consider our properties as unchanged, doing the same. - Return `&Arc<PlanProperties>` from `ExecutionPlan::properties(...)` instead of a reference. - Implement `with_new_children` fast-path if there is no children properties changes for all major plans. Note: currently, `reset_plan_states` does not allow to re-use plan in general: it is not supported for dynamic filters and recursive queries features, as in this case state reset should update pointers in the children plans. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
DataFusion Examples
This crate includes end to end, highly commented examples of how to use various DataFusion APIs to help you get started.
Prerequisites
Run git submodule update --init to init test files.
Running Examples
To run an example, use the cargo run command, such as:
git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init
# Change to the examples directory
cd datafusion-examples/examples
# Run all examples in a group
cargo run --example <group> -- all
# Run a specific example within a group
cargo run --example <group> -- <subcommand>
# Run all examples in the `dataframe` group
cargo run --example dataframe -- all
# Run a single example from the `dataframe` group
# (apply the same pattern for any other group)
cargo run --example dataframe -- dataframe
Builtin Functions Examples
Group: builtin_functions
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| date_time | builtin_functions/date_time.rs |
Examples of date-time related functions and queries |
| function_factory | builtin_functions/function_factory.rs |
Register CREATE FUNCTION handler to implement SQL macros |
| regexp | builtin_functions/regexp.rs |
Examples of using regular expression functions |
Custom Data Source Examples
Group: custom_data_source
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| adapter_serialization | custom_data_source/adapter_serialization.rs |
Preserve custom PhysicalExprAdapter information during plan serialization using PhysicalExtensionCodec interception |
| csv_json_opener | custom_data_source/csv_json_opener.rs |
Use low-level FileOpener APIs for CSV/JSON |
| csv_sql_streaming | custom_data_source/csv_sql_streaming.rs |
Run a streaming SQL query against CSV data |
| custom_datasource | custom_data_source/custom_datasource.rs |
Query a custom TableProvider |
| custom_file_casts | custom_data_source/custom_file_casts.rs |
Implement custom casting rules |
| custom_file_format | custom_data_source/custom_file_format.rs |
Write to a custom file format |
| default_column_values | custom_data_source/default_column_values.rs |
Custom default values using metadata |
| file_stream_provider | custom_data_source/file_stream_provider.rs |
Read/write via FileStreamProvider for streams |
Data IO Examples
Group: data_io
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| catalog | data_io/catalog.rs |
Register tables into a custom catalog |
| json_shredding | data_io/json_shredding.rs |
Implement filter rewriting for JSON shredding |
| parquet_adv_idx | data_io/parquet_advanced_index.rs |
Create a secondary index across multiple parquet files |
| parquet_emb_idx | data_io/parquet_embedded_index.rs |
Store a custom index inside Parquet files |
| parquet_enc | data_io/parquet_encrypted.rs |
Read & write encrypted Parquet files |
| parquet_enc_with_kms | data_io/parquet_encrypted_with_kms.rs |
Encrypted Parquet I/O using a KMS-backed factory |
| parquet_exec_visitor | data_io/parquet_exec_visitor.rs |
Extract statistics by visiting an ExecutionPlan |
| parquet_idx | data_io/parquet_index.rs |
Create a secondary index |
| query_http_csv | data_io/query_http_csv.rs |
Query CSV files via HTTP |
| remote_catalog | data_io/remote_catalog.rs |
Interact with a remote catalog |
DataFrame Examples
Group: dataframe
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| cache_factory | dataframe/cache_factory.rs |
Custom lazy caching for DataFrames using CacheFactory |
| dataframe | dataframe/dataframe.rs |
Query DataFrames from various sources and write output |
| deserialize_to_struct | dataframe/deserialize_to_struct.rs |
Convert Arrow arrays into Rust structs |
Execution Monitoring Examples
Group: execution_monitoring
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| mem_pool_exec_plan | execution_monitoring/memory_pool_execution_plan.rs |
Memory-aware ExecutionPlan with spilling |
| mem_pool_tracking | execution_monitoring/memory_pool_tracking.rs |
Demonstrates memory tracking |
| tracing | execution_monitoring/tracing.rs |
Demonstrates tracing integration |
External Dependency Examples
Group: external_dependency
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| dataframe_to_s3 | external_dependency/dataframe_to_s3.rs |
Query DataFrames and write results to S3 |
| query_aws_s3 | external_dependency/query_aws_s3.rs |
Query S3-backed data using object_store |
Flight Examples
Group: flight
Category: Distributed
| Subcommand | File Path | Description |
|---|---|---|
| client | flight/client.rs |
Execute SQL queries via Arrow Flight protocol |
| server | flight/server.rs |
Run DataFusion server accepting FlightSQL/JDBC queries |
| sql_server | flight/sql_server.rs |
Standalone SQL server for JDBC clients |
Proto Examples
Group: proto
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| composed_extension_codec | proto/composed_extension_codec.rs |
Use multiple extension codecs for serialization/deserialization |
| expression_deduplication | proto/expression_deduplication.rs |
Example of expression caching/deduplication using the codec decorator pattern |
Query Planning Examples
Group: query_planning
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| analyzer_rule | query_planning/analyzer_rule.rs |
Custom AnalyzerRule to change query semantics |
| expr_api | query_planning/expr_api.rs |
Create, execute, analyze, and coerce Exprs |
| optimizer_rule | query_planning/optimizer_rule.rs |
Replace predicates via a custom OptimizerRule |
| parse_sql_expr | query_planning/parse_sql_expr.rs |
Parse SQL into DataFusion Expr |
| plan_to_sql | query_planning/plan_to_sql.rs |
Generate SQL from expressions or plans |
| planner_api | query_planning/planner_api.rs |
APIs for logical and physical plan manipulation |
| pruning | query_planning/pruning.rs |
Use pruning to skip irrelevant files |
| thread_pools | query_planning/thread_pools.rs |
Configure custom thread pools for DataFusion execution |
Relation Planner Examples
Group: relation_planner
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| match_recognize | relation_planner/match_recognize.rs |
Implement MATCH_RECOGNIZE pattern matching |
| pivot_unpivot | relation_planner/pivot_unpivot.rs |
Implement PIVOT / UNPIVOT |
| table_sample | relation_planner/table_sample.rs |
Implement TABLESAMPLE |
SQL Ops Examples
Group: sql_ops
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| analysis | sql_ops/analysis.rs |
Analyze SQL queries |
| custom_sql_parser | sql_ops/custom_sql_parser.rs |
Implement a custom SQL parser to extend DataFusion |
| frontend | sql_ops/frontend.rs |
Build LogicalPlans from SQL |
| query | sql_ops/query.rs |
Query data using SQL |
UDF Examples
Group: udf
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| adv_udaf | udf/advanced_udaf.rs |
Advanced User Defined Aggregate Function (UDAF) |
| adv_udf | udf/advanced_udf.rs |
Advanced User Defined Scalar Function (UDF) |
| adv_udwf | udf/advanced_udwf.rs |
Advanced User Defined Window Function (UDWF) |
| async_udf | udf/async_udf.rs |
Asynchronous User Defined Scalar Function |
| udaf | udf/simple_udaf.rs |
Simple UDAF example |
| udf | udf/simple_udf.rs |
Simple UDF example |
| udtf | udf/simple_udtf.rs |
Simple UDTF example |
| udwf | udf/simple_udwf.rs |
Simple UDWF example |