
DataFusion Documentation

This folder contains the source content of the User Guide and Contributor Guide. These are both published to https://datafusion.apache.org/ as part of the release process.

Dependencies

Install build dependencies and build the documentation using uv:

uv sync
uv run bash build.sh

The docs build regenerates the workspace dependency graph via docs/scripts/generate_dependency_graph.sh, so make sure the following tools are available: cargo, cargo-depgraph (cargo install cargo-depgraph --version ^1.6 --locked), and Graphviz dot (brew install graphviz with Homebrew, or sudo apt-get install -y graphviz with apt).
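A quick way to confirm these tools are on your PATH before building is sketched below. This is a minimal, hypothetical helper and not part of the repository: the function name missing_tools is made up for illustration, and it assumes cargo-depgraph is installed as a cargo-depgraph binary (which is how cargo install places subcommands).

```shell
# Hypothetical pre-flight check; not shipped with the DataFusion docs.
# Prints the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools named in the paragraph above: cargo, cargo-depgraph, and Graphviz dot.
missing="$(missing_tools cargo cargo-depgraph dot)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All docs build tools found"
fi
```

The check only reports what is missing; it does not install anything or verify versions, so the cargo-depgraph version constraint still needs to be checked by hand.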

Build & Preview

Run the provided script to build the HTML pages.

# If using venv, ensure you have activated it
./build.sh

The HTML will be generated into a build directory. Open build/html/index.html in your preferred browser, for example:

# On macOS
open build/html/index.html
# On Linux with Firefox
firefox build/html/index.html
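If you switch between platforms, the choice of command can be scripted. The following is only a sketch: pick_opener is a hypothetical helper, and it assumes xdg-open (from xdg-utils) is installed on Linux, which the instructions above do not require.

```shell
# Hypothetical helper; maps an OS name (as printed by `uname -s`)
# to a command that opens a file in the default browser.
pick_opener() {
  case "$1" in
    Darwin) echo "open" ;;
    Linux)  echo "xdg-open" ;;  # assumption: xdg-utils is installed
    *)      echo "" ;;
  esac
}

# Print a suggestion instead of launching a browser directly.
opener="$(pick_opener "$(uname -s)")"
if [ -n "$opener" ]; then
  echo "Preview with: $opener build/html/index.html"
else
  echo "Open build/html/index.html manually in a browser"
fi
```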

Making Changes

To change the docs, open a pull request with your proposed changes as usual. Once the PR is merged, the docs are updated automatically.

Release Process

This documentation is hosted at https://datafusion.apache.org/

When a PR is merged to the main branch of the DataFusion repository, a GitHub workflow runs which:

  1. Builds the HTML content
  2. Pushes the HTML content to the asf-site branch in this repository.

The Apache Software Foundation serves the site from the asf-site branch according to the configuration in .asf.yaml, which specifies https://datafusion.apache.org/ as the publish target.