Changelog
55.2.0 (2025-06-22)
Implemented enhancements:
- Do not populate nulls for `NullArray` for `MutableArrayData` #7725
- Implement `PartialEq` for RunArray #7691
- `interleave_views` is really slow #7688 [arrow]
- Add min max aggregates for FixedSizeBinary #7674 [arrow]
- Deliver pyarrow as a standalone crate #7668 [arrow]
- [Variant] Implement `VariantObject::field` and `VariantObject::fields` #7665 [parquet]
- [Variant] Implement read support for remaining primitive types #7630 [parquet]
- Fast and ergonomic method to add metadata to a `RecordBatch` #7628 [arrow]
- Add efficient way to change the keys of string dictionary builder #7610 [arrow]
- Support `add_nulls` on additional builder types #7605 [arrow]
- Add `into_inner` for `AsyncArrowWriter` #7603 [parquet]
- Optimize `PrimitiveBuilder::append_trusted_len_iter` #7591 [arrow]
- Benchmark for filter+concat and take+concat into even sized record batches #7589 [arrow]
- `max_statistics_truncate_length` is ignored when writing statistics to data page headers #7579 [parquet]
- Feature Request: Encoding in `parquet-rewrite` #7575 [parquet]
- Add a `strong_count` method to `Buffer` #7568 [arrow]
- Create version of LexicographicalComparator that compares fixed number of columns #7531 [arrow]
- parquet-show-bloom-filter should work with integer typed columns #7528 [parquet]
- Allow merging primitive dictionary values in concat and interleave kernels #7518 [arrow]
- Add efficient concatenation of StructArrays #7516 [arrow]
- Rename `flight-sql-experimental` to `flight-sql` #7498 [arrow] [arrow-flight]
- Consider moving from ryu to lexical-core for string formatting / casting floats to string. #7496
- Arithmetic kernels can be safer and faster #7494 [arrow]
- Speedup `filter_bytes` by precalculating capacity #7465 [arrow]
- [Variant]: Rust API to Create Variant Values #7424 [parquet] [arrow]
- [Variant] Rust API to Read Variant Values #7423 [arrow]
- Release arrow-rs / parquet Minor version `55.1.0` (May 2025) #7393 [parquet]
- Support create_random_array for Decimal data types #7343 [arrow]
- Truncate Parquet page data page statistics #7555 [parquet] (etseidl)
Fixed bugs:
- In arrow_json, Decoder::decode can panic if it encounters two high surrogates in a row. #7712
- FlightSQL "GetDbSchemas" and "GetTables" schemas do not fully match the protocol #7637 [arrow] [arrow-flight]
- Cannot read encrypted Parquet file if page index reading is enabled #7629 [parquet]
- `encoding_stats` not present in Parquet generated by `parquet-rewrite` #7616 [parquet]
- When writing parquet plaintext footer files `footer_signing_key_metadata` is not included, encryption alghoritm is always written in footer #7599 [parquet]
- `new_null_array` panics when constructing a struct of a dictionary #7571
- Parquet derive fails to build when Result is aliased #7547
- Unable to read `Dictionary(u8, FixedSizeBinary(_))` using datafusion. #7545 [parquet]
- filter_record_batch panics with empty struct array. #7538 [arrow]
- Panic in `pretty_format` function when displaying DurationSecondsArray with `i64::MIN`/`i64::MAX` #7533 [arrow]
- Record API unable to parse TIME_MILLIS when encoded as INT32 #7510 [parquet]
- The `read_record_batch` func of the `RecordBatchDecoder` does not respect the `skip_validation` property #7508 [arrow]
- `arrow-55.1.0` breaks `filter_record_batch` #7500
- Files containing binary data with >=8_388_855 bytes per row written with `arrow-rs` can't be read with `pyarrow` #7489 [parquet]
- [Bug] Ingestion with Arrow Flight Sql panic when the input stream is empty or fallible #7329 [arrow] [arrow-flight]
- Ensure page encoding statistics are written to Parquet file #7643 [parquet] (etseidl)
Documentation updates:
- arrow_reader_row_filter benchmark doesn't capture page cache improvements #7460 [parquet] [arrow]
- chore: fix a typo in `ExtensionType::supports_data_type` docs #7682 [arrow] (mbrobbel)
- [Variant] Add variant docs and examples #7661 [parquet] (alamb)
- Minor: Add version to deprecation notice for `ParquetMetaDataReader::decode_footer` #7639 [parquet] (etseidl)
- Add references for defaults in `WriterPropertiesBuilder` #7558 [parquet] (etseidl)
- Clarify Docs: NullBuffer::len is in bits #7556 [arrow] (alamb)
- docs: fix typo for `Decimal128Array` #7525 [arrow] (burmecia)
- Minor: Add examples to ProjectionMask documentation #7523 [parquet] (alamb)
- Improve documentation for Parquet `WriterProperties` #7491 [parquet] (alamb)
Closed issues:
- [Variant] More efficient determination of String vs ShortString #7700
- [Variant] Improve API for iterating over values of a VariantList #7685 [parquet]
- [Variant] Consider validating variants on creation (rather than read) #7684 [parquet]
- Miri test_native_type_pow test failing #7641 [arrow]
- Improve performance of `coalesce` and `concat` for views #7615 [arrow]
- Bad min value in row group statistics in some special cases #7593
- Feature Request: BloomFilter Position Flexibility in `parquet-rewrite` #7552 [parquet]
Merged pull requests:
- arrow-array: Implement PartialEq for RunArray #7727 [arrow] (brancz)
- fix: Do not add null buffer for `NullArray` in MutableArrayData #7726 [arrow] (comphead)
- fix JSON decoder error checking for UTF16 / surrogate parsing panic #7721 [arrow] (nicklan)
- [Variant] Introduce new type over &str for ShortString #7718 [parquet] (friendlymatthew)
- Split out variant code into several new sub-modules #7717 [parquet] (scovich)
- Support write to buffer api for SerializedFileWriter #7714 [parquet] (zhuqi-lucas)
- Make variant iterators safely infallible #7704 [parquet] (scovich)
- Speedup `interleave_views` (4-7x faster) #7695 [arrow] (Dandandan)
- Define a "arrow-pyrarrow" crate to implement the "pyarrow" feature. #7694 [arrow] (brunal)
- Document REE row format and add some more tests #7680 [arrow] (alamb)
- feat: add min max aggregate support for FixedSizeBinary #7675 [arrow] (alexwilcoxson-rel)
- arrow-data: Add REE support for `build_extend` and `build_extend_nulls` #7671 [arrow] (brancz)
- Remove `lazy_static` dependency #7669 [arrow] (Expyron)
- Finish implementing Variant::Object and Variant::List #7666 [parquet] (scovich)
- Add `RecordBatch::schema_metadata_mut` and `Field::metadata_mut` #7664 [arrow] (emilk)
- [Variant] Simplify creation of Variants from metadata and value #7663 [parquet] (alamb)
- chore: group prost dependabot updates #7659 (mbrobbel)
- Initial Builder API for Creating Variant Values #7653 [parquet] (PinkCrow007)
- Add `BatchCoalescer::push_filtered_batch` and docs #7652 [arrow] (alamb)
- Optimize coalesce kernel for StringView (10-50% faster) #7650 [arrow] (alamb)
- arrow-row: Add support for REE #7649 [arrow] (brancz)
- Use approximate comparisons for pow tests #7646 [arrow] (adamreeve)
- [Variant] Implement read support for remaining primitive types #7644 [parquet] (superserious-dev)
- Add `pretty_format_batches_with_schema` function #7642 [arrow] (lewiszlw)
- Deprecate old Parquet page index parsing functions #7640 [parquet] (etseidl)
- Update FlightSQL `GetDbSchemas` and `GetTables` schemas to fully match the protocol #7638 [arrow] [arrow-flight] (sgrebnov)
- Minor: Remove outdated FIXME from `ParquetMetaDataReader` #7635 [parquet] (etseidl)
- Fix the error info of `StructArray::try_new` #7634 [arrow] (xudong963)
- Fix reading encrypted Parquet pages when using the page index #7633 [parquet] (adamreeve)
- [Variant] Add commented out primitive test casees #7631 [parquet] (alamb)
- Improve `coalesce` kernel tests #7626 [arrow] (alamb)
- Revert "Revert "Improve `coalesce` and `concat` performance for views… #7625 [arrow] (Dandandan)
- Revert "Improve `coalesce` and `concat` performance for views (#7614)" #7623 [arrow] (Dandandan)
- Improve coalesce_kernel benchmark to capture inline vs non inline views #7619 [arrow] (alamb)
- Improve `coalesce` and `concat` performance for views #7614 [arrow] (Dandandan)
- feat: add constructor to help efficiently upgrade key for GenericBytesDictionaryBuilder #7611 [arrow] (albertlockett)
- feat: support append_nulls on additional builders #7606 [arrow] (albertlockett)
- feat: add AsyncArrowWriter::into_inner #7604 [parquet] (jpopesculian)
- Move variant interop test to Rust integration test #7602 [parquet] (alamb)
- Include footer key metadata when writing encrypted Parquet with a plaintext footer #7600 [parquet] (rok)
- Add `coalesce` kernel and `BatchCoalescer` for statefully combining selected b…atches: #7597 [arrow] (alamb)
- Add FixedSizeBinary to `take_kernel` benchmark #7592 [arrow] (alamb)
- Fix GenericBinaryArray docstring. #7588 [arrow] (brunal)
- fix: error reading multiple batches of `Dict(_, FixedSizeBinary(_))` #7585 [parquet] (albertlockett)
- Revert "Minor: remove filter code deprecated in 2023 (#7554)" #7583 [arrow] (alamb)
- Fixed a warning build build: function never used. #7577 [parquet] (JigaoLuo)
- Adding Encoding argument in `parquet-rewrite` #7576 [parquet] (JigaoLuo)
- feat: add `row_group_is_[max/min]_value_exact` to StatisticsConverter #7574 [parquet] (CookiePieWw)
- [array] Remove unwrap checks from GenericByteArray::value_unchecked #7573 [arrow] (ctsk)
- [benches/row_format] fix typo in array lengths #7572 [arrow] (ctsk)
- Add a strong_count method to Buffer #7569 [arrow] (westonpace)
- Minor: Enable byte view for clickbench benchmark #7565 [parquet] (zhuqi-lucas)
- Optimize length calculation in row encoding for fixed-length columns #7564 [arrow] (ctsk)
- Use PR title and description for commit message #7563 (kou)
- Use apache/arrow-{go,java,js} in integration test #7561 (kou)
- Implement Array Decoding in arrow-avro #7559 [arrow] (jecsand838)
- Minor: remove filter code deprecated in 2023 #7554 [arrow] (alamb)
- fix: Correct docs for `WriterPropertiesBuilder::set_column_index_truncate_length` #7553 [parquet] (etseidl)
- Adding Bloom Filter Position argument in parquet-rewrite #7550 [parquet] (JigaoLuo)
- Fix `Result` name collision in parquet_derive #7548 (jspaezp)
- Fix: Converted feature flight-sql-experimental to flight-sql #7546 [arrow] [arrow-flight] (kunalsinghdadhwal)
- Fix CI on main due to logical conflict #7542 [arrow] (alamb)
- Fix `filter_record_batch` panics with empty struct array #7539 [arrow] (thorfour)
- [Variant] Initial API for reading Variant data and metadata #7535 (mkarbo)
- fix: Panic in pretty_format function when displaying DurationSecondsA… #7534 [arrow] (zhuqi-lucas)
- Create version of LexicographicalComparator that compares fixed number of columns (~ -15%) #7530 [arrow] (Dandandan)
- Make parquet-show-bloom-filter work with integer typed columns #7529 [parquet] (adamreeve)
- chore(deps): update criterion requirement from 0.5 to 0.6 #7527 [parquet] [arrow] (mbrobbel)
- Minor: Add a parquet row_filter test, reduce some test boiler plate #7522 [parquet] (alamb)
- Refactor `build_array_reader` into a struct #7521 [parquet] (alamb)
- arrow: add concat structs benchmark #7520 [arrow] (asubiotto)
- arrow-select: add support for merging primitive dictionary values #7519 [arrow] (asubiotto)
- arrow-select: add support for optimized concatenation of struct arrays #7517 [arrow] (asubiotto)
- Fix Clippy in CI for Rust 1.87 release #7514 [parquet] [arrow] [arrow-flight] (alamb)
- Simplify `ParquetRecordBatchReader::next` control logic #7512 [parquet] (alamb)
- Fix record API support for reading INT32 encoded TIME_MILLIS #7511 [parquet] (njaremko)
- RecordBatchDecoder: skip RecordBatch validation when `skip_validation` property is enabled #7509 [arrow] (nilskch)
- Introduce `ReadPlan` to encapsulate the calculation of what parquet rows to decode #7502 [parquet] (alamb)
- Update documentation for ParquetReader #7501 [parquet] (alamb)
- Improve `Field` docs, add missing `Field::set_*` methods #7497 [arrow] (alamb)
- Speed up arithmetic kernels, reduce `unsafe` usage #7493 [arrow] (Dandandan)
- Prevent FlightSQL server panics for `do_put` when stream is empty or 1st stream element is an Err #7492 [arrow] [arrow-flight] (superserious-dev)
- arrow-ipc: add `StreamDecoder::schema` #7488 [arrow] (lidavidm)
- arrow-select: Implement concat for `RunArray`s #7487 [arrow] (brancz)
- [Variant] Add (empty) `parquet-variant` crate, update `parquet-testing` pin #7485 (alamb)
- Improve error messages if schema hint mismatches with parquet schema #7481 [parquet] [arrow] (alamb)
- Add `arrow_reader_clickbench` benchmark #7470 [parquet] (alamb)
- Speedup `filter_bytes` -20-40%, `filter_native` low selectivity (-37%) #7463 [arrow] (Dandandan)
- Update arrow_reader_row_filter benchmark to reflect ClickBench distribution #7461 [parquet] (alamb)
- Add Map support to arrow-avro #7451 [arrow] (jecsand838)
- Support Utf8View for Avro #7434 [arrow] (kumarlokesh)
- Add support for creating random Decimal128 and Decimal256 arrays #7427 [arrow] (Weijun-H)
* This Changelog was automatically generated by github_changelog_generator