DataFusion Documentation

This folder contains the source content of the User Guide and Contributor Guide. These are both published to https://datafusion.apache.org/ as part of the release process.

Dependencies

Install build dependencies and build the documentation using uv:

uv sync
uv run bash build.sh

The docs build regenerates the workspace dependency graph via docs/scripts/generate_dependency_graph.sh, so ensure cargo, cargo-depgraph (cargo install cargo-depgraph --version ^1.6 --locked), and Graphviz dot (brew install graphviz or sudo apt-get install -y graphviz) are available.
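Before running the build, it can help to confirm those tools are on your PATH. The following is a minimal sketch, not part of the repository: check_tools is a hypothetical helper name, and it assumes cargo-depgraph was installed with cargo install, so that its cargo-depgraph binary sits in a directory on PATH.

```shell
#!/usr/bin/env bash
# Sketch only: check_tools is a hypothetical helper, not a script shipped
# with DataFusion. It reports which of the named commands are missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one tool was not found.
  return "$missing"
}

# cargo subcommands are installed as binaries named cargo-<name>;
# dot is provided by Graphviz.
check_tools cargo cargo-depgraph dot \
  || echo "Install the missing tools before running build.sh" >&2
```

The helper only checks that each command resolves on PATH; it does not verify versions, so the ^1.6 requirement for cargo-depgraph still has to be checked separately.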

Build & Preview

Run the provided script to build the HTML pages.

# If using venv, ensure you have activated it
./build.sh

The HTML will be generated into a build directory. Open build/html/index.html in your preferred browser, for example:

# On macOS
open build/html/index.html
# On Linux with Firefox
firefox build/html/index.html

Making Changes

To change the docs, open a Pull Request with your proposed changes as normal. When the PR is merged, the docs are updated automatically.

Release Process

This documentation is hosted at https://datafusion.apache.org/

When a PR is merged to the main branch of the DataFusion repository, a GitHub workflow runs which:

  1. Builds the html content
  2. Pushes the html content to the asf-site branch in this repository.

The Apache Software Foundation hosts https://datafusion.apache.org/ and serves its content according to the configuration in .asf.yaml, which specifies https://datafusion.apache.org/ as the target.