## Which issue does this PR close?

- Closes #20384.
- See #18181 for related context.

## Rationale for this change

When `array_has_any` is passed a scalar for either of its arguments, we can use a much faster algorithm: rather than doing O(N*M) comparisons for each row of the columnar arg, we can build a hash table on the scalar argument and probe it instead.

## What changes are included in this PR?

* Add benchmark to cover the one-scalar-arg case
* Implement optimization as described above

Note that we fall back to a linear scan when the scalar arg is no larger than a threshold (<= 8 elements), because benchmarks suggested probing a HashSet is not profitable for very small arrays.

## Are these changes tested?

Yes. Tests pass and benchmarked.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
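The idea in the rationale above (build a hash table on the scalar argument once, probe it for each row, and fall back to a linear scan for very small scalars) can be illustrated with a minimal sketch. This is not the code from the PR: the function name `has_any_per_row` and the constant `SMALL_SCALAR_THRESHOLD` are hypothetical, plain `Vec<i64>` rows stand in for Arrow list arrays, and null handling is ignored.

```rust
use std::collections::HashSet;

// Hypothetical cutoff mirroring the "<= 8 elements" threshold in the description.
const SMALL_SCALAR_THRESHOLD: usize = 8;

/// For each row, report whether it shares at least one element with `scalar`.
/// Illustrative only: plain vectors instead of Arrow arrays, no null handling.
fn has_any_per_row(rows: &[Vec<i64>], scalar: &[i64]) -> Vec<bool> {
    if scalar.len() <= SMALL_SCALAR_THRESHOLD {
        // Small scalar: a linear scan avoids the cost of building and hashing a set.
        rows.iter()
            .map(|row| row.iter().any(|v| scalar.contains(v)))
            .collect()
    } else {
        // Larger scalar: build the set once, then probe it for every row.
        let set: HashSet<i64> = scalar.iter().copied().collect();
        rows.iter()
            .map(|row| row.iter().any(|v| set.contains(v)))
            .collect()
    }
}

fn main() {
    let rows = vec![vec![1, 2, 3], vec![40, 50], vec![]];
    let small = [3, 99]; // takes the linear-scan path
    let large: Vec<i64> = (40..60).collect(); // takes the hash-probe path
    println!("{:?}", has_any_per_row(&rows, &small));
    println!("{:?}", has_any_per_row(&rows, &large));
}
```

Both branches return the same answers; only the cost per row differs, which is why the choice between them can be made purely on the size of the scalar argument.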
DataFusion Documentation
This folder contains the source content of the User Guide and Contributor Guide. These are both published to https://datafusion.apache.org/ as part of the release process.
Dependencies
Install build dependencies and build the documentation using uv:
uv sync
uv run bash build.sh
The docs build regenerates the workspace dependency graph via
docs/scripts/generate_dependency_graph.sh, so ensure cargo, cargo-depgraph
(cargo install cargo-depgraph --version ^1.6 --locked), and Graphviz dot
(brew install graphviz or sudo apt-get install -y graphviz) are available.
Build & Preview
Run the provided script to build the HTML pages.
# If using venv, ensure you have activated it
./build.sh
The HTML will be generated into a build directory. Open build/html/index.html
in your preferred browser, for example:
# On macOS
open build/html/index.html
# On Linux with Firefox
firefox build/html/index.html
Making Changes
To change the docs, open a Pull Request with your proposed changes as normal. When the PR is merged, the docs are updated automatically.
Release Process
This documentation is hosted at https://datafusion.apache.org/
When a PR is merged to the main branch of the DataFusion
repository, a GitHub workflow runs which:
- Builds the html content
- Pushes the html content to the
asf-site branch in this repository.
The Apache Software Foundation provides https://datafusion.apache.org/, which serves content according to the configuration in .asf.yaml; that file specifies https://datafusion.apache.org/ as the target.