From e894a03bea638e35677eaf27876966013dd64bf4 Mon Sep 17 00:00:00 2001
From: Neil Conway
Date: Wed, 25 Feb 2026 13:12:42 -0500
Subject: [PATCH] perf: Use Hashbrown for array_distinct (#20538)

## Which issue does this PR close?

N/A

## Rationale for this change

#20364 recently optimized `array_distinct` to use batched row conversion. That PR used `std::collections::HashSet`. This PR replaces it with `hashbrown::HashSet`, which measurably improves performance.

## What changes are included in this PR?

The `HashSet` import in `datafusion/functions-nested/src/set_ops.rs` now comes from `hashbrown` instead of `std::collections`.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
---
 datafusion/functions-nested/src/set_ops.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/datafusion/functions-nested/src/set_ops.rs b/datafusion/functions-nested/src/set_ops.rs
index 2348b3c53..150559111 100644
--- a/datafusion/functions-nested/src/set_ops.rs
+++ b/datafusion/functions-nested/src/set_ops.rs
@@ -34,8 +34,8 @@ use datafusion_expr::{
     ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
 };
 use datafusion_macros::user_doc;
+use hashbrown::HashSet;
 use std::any::Any;
-use std::collections::HashSet;
 use std::fmt::{Display, Formatter};
 use std::sync::Arc;