# Which issue does this PR close?

- Part of https://github.com/apache/arrow-rs/issues/7394

# Rationale for this change

Prepare for the next software release.

# What changes are included in this PR?

1. Update version to `55.2.0`
2. See rendered changelog here: https://github.com/alamb/arrow-rs/blob/alamb/prepare_55.2.0/CHANGELOG.md

# Are there any user-facing changes?

New release version
Historical Changelog
55.1.0 (2025-05-09)
Breaking changes:
- refactor!: do not default the struct array length to 0 in Struct::try_new #7247 [arrow] (westonpace)
Implemented enhancements:
- Add a way to get max `usize` from `OffsetSizeTrait` #7474 [arrow]
- Deterministic metadata encoding #7448 [arrow]
- Support Arrow type Dictionary with value FixedSizeBinary in Parquet #7445
- Parquet: Add ability to project rowid in parquet reader #7444
- Move parquet::file::metadata::reader::FooterTail to parquet::file::metadata so that it is public #7438 [parquet]
- Speedup take_bytes by precalculating capacity #7432 [arrow]
- Improve performance of interleave_primitive and interleave_bytes #7421 [arrow]
- Implement `Eq` and `Default` for `ScalarBuffer` #7411 [arrow]
- Add decryption support for column index and offset index #7390 [parquet]
- Support writing encrypted Parquet files with plaintext footers #7320 [parquet]
- Support Parquet key management tools #7256 [parquet]
- Verify footer tags when reading encrypted Parquet files with plaintext footers #7255 [parquet]
- StructArray::try_new behavior can be unexpected when there are no child arrays #7246 [arrow]
- Parquet performance: improve performance of reading int8/int16 #7097 [parquet]
Fixed bugs:
- StructArray::try_new validation incorrectly returns an error when `logical_nulls()` returns Some() && null_count == 0 #7435
- Reading empty DataPageV2 fails with `snappy: corrupt input (empty)` #7388 [parquet]
Documentation updates:
- Improve documentation and add examples for ArrowPredicateFn #7480 [parquet] (alamb)
- Document Arrow <--> Parquet schema conversion better #7479 [parquet] (alamb)
- Fix a typo in arrow/examples/README.md #7473 [arrow] (Mottl)
Closed issues:
- Refactor Parquet DecryptionPropertiesBuilder to fix use of unreachable #7476 [parquet]
- Implement `Eq` and `Default` for `OffsetBuffer` #7417 [arrow]
Merged pull requests:
- Add Parquet `arrow_reader` benchmarks for {u}int{8,16} columns #7484 [parquet] (alamb)
- fix: `rustdoc::unportable_markdown` was removed #7483 [arrow] [arrow-flight] (crepererum)
- Support round trip reading / writing Arrow `Duration` type to parquet #7482 [parquet] (Liyixin95)
- Add const MAX_OFFSET to OffsetSizeTrait #7478 [arrow] (thinkharderdev)
- Refactor Parquet DecryptionPropertiesBuilder #7477 [parquet] (adamreeve)
- Support parsing and display pretty for StructType #7469 [arrow] (goldmedal)
- chore(deps): update sysinfo requirement from 0.34.0 to 0.35.0 #7462 [parquet] (dependabot[bot])
- Verify footer tags when reading encrypted Parquet files with plaintext footers #7459 [parquet] (rok)
- Improve comments for avro #7449 [arrow] (kumarlokesh)
- feat: Support round trip reading/writing Arrow type `Dictionary(_, FixedSizeBinary(_))` to Parquet #7446 [parquet] (albertlockett)
- Fix out of bounds crash in RleValueDecoder #7441 [parquet] (apilloud)
- Make `FooterTail` public #7440 [parquet] (masonh22)
- Support writing encrypted Parquet files with plaintext footers #7439 [parquet] (rok)
- feat: deterministic metadata encoding #7437 [arrow] (timsaucer)
- Fix validation logic in `StructArray::try_new` to account for array.logical_nulls() returning Some() and null_count == 0 #7436 [arrow] (phillipleblanc)
- Minor: Fix typo in async_reader comment #7433 [parquet] (amoeba)
- feat: coerce fixed size binary to binary view #7431 [arrow] (chenkovsky)
- chore(deps): update brotli requirement from 7.0 to 8.0 #7430 [parquet] (dependabot[bot])
- Speedup take_bytes (-35% -69%) by precalculating capacity #7422 [arrow] (Dandandan)
- Improve performance of interleave_primitive (-15% - 45%) / interleave_bytes (-10-25%) #7420 [arrow] (Dandandan)
- Implement `Eq` and `Default` for `OffsetBuffer` #7418 [arrow] (kylebarron)
- Implement `Default` for `Buffer` & `ScalarBuffer` #7413 [arrow] (emilk)
- Implement `Eq` for `ScalarBuffer` when `T: Eq` #7412 [arrow] (emilk)
- Skip page should also support skip dict page #7409 [parquet] (zhuqi-lucas)
- Replace `RecordBatch::with_schema_unchecked` with `RecordBatch::new_unchecked` #7405 [arrow] (tustvold)
- feat: Adding `with_schema_unchecked` method for `RecordBatch` #7402 [arrow] (comphead)
- Add benchmark for parquet reader with row_filter and project settings #7401 [parquet] (zhuqi-lucas)
- Parquet: Expose accessors from `ArrowReaderOptions` #7400 (kylebarron)
- Support decryption of Parquet column and offset indexes #7399 [parquet] (adamreeve)
- Handle compressed empty DataPage v2 #7389 [parquet] (EnricoMi)
- Improve performance of reading int8/int16 Parquet data #7055 [parquet] (etseidl)
55.0.0 (2025-04-08)
Breaking changes:
- Change Parquet API interaction to use `u64` (support files larger than 4GB in WASM) #7371 [parquet] (kylebarron)
- Remove `AsyncFileReader::get_metadata_with_options`, add `options` to `AsyncFileReader::get_metadata` #7342 [parquet] (corwinjoy)
- Parquet: Support reading Parquet metadata via suffix range requests #7334 [parquet] (kylebarron)
- Upgrade `object_store` to `0.12.0` #7328 [parquet] (mbrobbel)
- Upgrade `pyo3` to `0.24` #7324 [arrow] (mbrobbel)
- Reapply Box `FlightError::tonic` to reduce size (fixes nightly clippy) #7277 [arrow] [arrow-flight] (alamb)
- Improve parquet gzip compression performance using zlib-rs #7200 [parquet] (psvri)
- Fix: `date_part` to extract only the requested part (not the overall interval) #7189 [arrow] (delamarch3)
- chore: upgrade flatbuffer version to `25.2.10` #7134 [arrow] (tisonkun)
- Add hooks to json encoder to override default encoding or add support for unsupported types #7015 [arrow] (adriangb)
Implemented enhancements:
- Improve the performance of `concat` #7357 [arrow]
- Pushdown predicates to Parquet in-memory row group fetches #7348 [parquet]
- Improve CSV parsing errors: Print the row that causes CSV parsing to fail #7344 [arrow]
- Support ColumnMetaData `encoding_stats` in Parquet Writing #7341 [parquet]
- Support writing Parquet with modular encryption #7327 [parquet]
- Parquet Use U64 Instead of Usize (wasm support for files greater than 4GB) #7238 [parquet]
- Support different TimeUnits and timezones when reading Timestamps from INT96 #7220 [parquet]
Fixed bugs:
- New clippy failures in code base with release of rustc 1.86 #7381 [parquet] [arrow]
- Fix bug in `ParquetMetaDataReader` and add test of suffix metadata reads with encryption #7372 [parquet] (etseidl)
Documentation updates:
- Improve documentation on `ArrayData::offset` #7385 [arrow] (alamb)
- Improve documentation for `AsyncFileReader::get_metadata` #7380 [parquet] (alamb)
- Improve documentation on implementing Parquet predicate pushdown #7370 [parquet] (alamb)
- Add documentation and examples for pretty printing, make `pretty_format_columns_with_options` pub #7346 [arrow] (alamb)
- Improve documentation on writing parquet, including multiple threads #7321 [parquet] (alamb)
Merged pull requests:
- chore: apply clippy suggestions newly introduced in rust 1.86 #7382 [parquet] [arrow] (westonpace)
- bench: add more {boolean, string, int} benchmarks for concat kernel #7376 [arrow] (rluvaton)
- Add more examples of using Parquet encryption #7374 [parquet] (adamreeve)
- Clean up `ArrowReaderMetadata::load_async` #7369 [parquet] (etseidl)
- bump pyo3 for RUSTSEC-2025-0020 #7368 [arrow] (onursatici)
- Test int96 Parquet file from Spark #7367 [parquet] (mbutrovich)
- fix: respect offset/length when converting ArrayData to StructArray #7366 [arrow] (westonpace)
- Print row, data present, expected type, and row number in error messages for arrow-csv #7361 [arrow] (psiayn)
- Use rust builtins for round_upto_multiple_of_64 and ceil #7358 [arrow] (psvri)
- Write parquet PageEncodingStats #7354 [parquet] (jhorstmann)
- Move `sysinfo` to `dev-dependencies` #7353 [parquet] (mbrobbel)
- chore(deps): update sysinfo requirement from 0.33.0 to 0.34.0 #7352 [parquet] (dependabot[bot])
- Add additional benchmarks for utf8view comparison kernels #7351 [arrow] (zhuqi-lucas)
- Upgrade to twox-hash 2.0 #7347 [parquet] (alamb)
- refactor: apply borrowed chunk reader to Sbbf::read_from_column_chunk #7345 [parquet] (ethe)
- Merge changelog and version from 54.3.1 into main #7340 [parquet] [arrow] (timsaucer)
- Remove `object-store` label from `.asf.yaml` #7339 (mbrobbel)
- Encapsulate encryption code more in readers #7337 [parquet] (alamb)
- Bump MSRV to 1.81 #7336 [parquet] [arrow] [arrow-flight] (mbrobbel)
- Add an option to show column type #7335 [arrow] (blaginin)
- Add missing type annotation #7326 [parquet] (mbrobbel)
- Minor: Improve parallel parquet encoding example #7323 [parquet] (alamb)
- feat: allow if expressions for fallbacks in downcast macro #7322 [arrow] (rluvaton)
- Minor: rename `ParquetRecordBatchStream::reader` to `ParquetRecordBatchStream::reader_factory` #7319 [parquet] (alamb)
- bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset #7318 [arrow] (timsaucer)
- Remove object_store references in Readme.md #7317 (alamb)
- Adopt MSRV policy #7314 (psvri)
- fix: correct array length validation error message #7313 [arrow] (wkalt)
- chore: remove trailing space in debug print #7311 [arrow] (xxchan)
- Improve `concat` performance, and add `append_array` for some array builder implementations #7309 [arrow] (rluvaton)
- feat: add `append_buffer` for `NullBufferBuilder` #7308 [arrow] (rluvaton)
- MINOR: fix incorrect method name in deprecation note #7306 [arrow] (waynexia)
- Allow retrieving Parquet decryption keys using the key metadata #7286 [parquet] (adamreeve)
- Support different TimeUnits and timezones when reading Timestamps from INT96 #7285 [parquet] (mbutrovich)
- Add Parquet Modular encryption support (write) #7111 [parquet] (rok)
54.3.1 (2025-03-26)
Fixed bugs:
- Round trip encoding of list of fixed list fails when offset is not zero #7315
Merged pull requests:
- Add missing type annotation #7326 [parquet] (mbrobbel)
- bugfix: correct offsets when serializing a list of fixed sized list and non-zero start offset #7318 [arrow] (timsaucer)
54.3.0 (2025-03-17)
Implemented enhancements:
- Using column chunk offset index in `InMemoryRowGroup::fetch` #7300
- Support reading parquet with modular encryption #7296 [parquet]
- Add example for how to read/write encrypted parquet files #7281 [parquet]
- Have writer return parsed `ParquetMetadata` #7254 [parquet]
- feat: Support Utf8View in JSON reader #7244 [arrow]
- StructBuilder should provide a way to get a &dyn ArrayBuilder of a field builder #7193 [arrow]
- Support div_wrapping/rem_wrapping for numeric arithmetic kernels #7158 [arrow]
- Improve RleDecoder performance #7195 [parquet] (Dandandan)
- Improve arrow-json deserialization performance by 30% #7157 [arrow] (mwylde)
- Add `with_skip_validation` flag to IPC `StreamReader`, `FileReader` and `FileDecoder` #7120 [arrow] (alamb)
Fixed bugs:
- Archery integration CI test is failing on main: error: package `half v2.5.0` cannot be built because it requires rustc 1.81 or newer, while the currently active rustc version is 1.77.2 #7291
- MSRV CI check is failing on main #7289
- Incorrect IPC schema encoding for multiple dictionaries #7058 [arrow] [arrow-flight]
Documentation updates:
- Add example for how to read encrypted parquet files #7283 [parquet] (rok)
- Update the relative path of the test data in docs #7221 (Ziy1-Tan)
- Minor: fix doc and remove unused code #7194 [arrow] (lewiszlw)
- doc: modify wrong comment #7190 [arrow] (YichiZhang0613)
- doc: fix IPC file reader/writer docs #7178 [arrow] (Jefffrey)
Merged pull requests:
- chore: require ffi feature in arrow-schema benchmark #7298 [arrow] (ethe)
- Fix archery integration test #7292 (alamb)
- Minor: run `test_decimal_list` again #7282 [parquet] (alamb)
- Move Parquet encryption tests into the arrow_reader integration tests #7279 [parquet] (adamreeve)
- Include license and notice files in published crates, part 2 #7275 [arrow] (ankane)
- feat: Support Utf8View in JSON reader #7263 [arrow] (zhuqi-lucas)
- feat: use `force_validate` feature flag when creating arrays #7241 [arrow] (rluvaton)
- fix: take on empty struct array returns empty array #7224 [arrow] (westonpace)
- fix: correct `bloom_filter_position` description #7223 [parquet] (romanz)
- Minor: Move `make_builder` into mod.rs #7218 (lewiszlw)
- Expose `field_builders` in `StructBuilder` #7217 [arrow] (lewiszlw)
- Minor: Fix json StructMode docs links #7215 [arrow] (gstvg)
- [main] Bump arrow version to 54.2.1 (#7207) #7212 (alamb)
- feat: add `downcast_integer_array` macro helper #7211 [arrow] (rluvaton)
- Remove zstd pin #7199 [parquet] (tustvold)
- fix: Use chrono's quarter() to avoid conflict #7198 [arrow] (yutannihilation)
- Fix some Clippy 1.85 warnings #7167 [parquet] [arrow] (mbrobbel)
- feat: add to concat different data types error message the data types #7166 [arrow] (rluvaton)
- Add Week ISO, Year ISO computation #7163 [arrow] (kosiew)
- fix: create_random_batch fails with timestamp types having a timezone #7162 [arrow] (niebayes)
- Avoid overflow of remainder #7159 [arrow] (wForget)
- fix: Data type inference for NaN, inf and -inf in csv files #7150 [arrow] (Mottl)
- Preserve null dictionary values in `interleave` and `concat` kernels #7144 [arrow] (kawadakk)
- Support casting `Date` to a time zone-specific timestamp #7141 [arrow] (friendlymatthew)
- Minor: Add doctest to ArrayDataBuilder::build_unchecked #7139 [arrow] (gstvg)
- arrow-ord: add support for nested types to `partition` #7131 [arrow] (asubiotto)
- Update prost-build requirement from =0.13.4 to =0.13.5 #7127 [arrow] [arrow-flight] (dependabot[bot])
- Avoid use of `flatbuffers::size_prefixed_root`, fix validation error in arrow-flight #7109 [arrow] [arrow-flight] (bkietz)
- Optimise decimal casting for infallible conversions #7021 [arrow] (aweltsch)
53.4.1 (2025-03-04)
Fixed bugs:
- Take empty struct array would get array with length 0 #7225
Closed issues:
54.2.1 (2025-02-27)
Fixed bugs:
- Use chrono >= 0.4.34, < 0.4.40 to avoid breaking #7210
54.2.0 (2025-02-12)
Implemented enhancements:
- Casting from Utf8View to Dict(k, Utf8View) #7114
- Support creating map arrays with key metadata #7100 [arrow]
- [parquet] Print Parquet BasicTypeInfo id when present #7081 [parquet]
- Add arrow-ipc benchmarks for the IPC reader and writer #6968 [arrow]
Fixed bugs:
- NullBufferBuilder::allocated_size Returns Size in Bits #7121 [arrow]
- [Regression in 54.0.0]. Decimal cast to smaller precision gives invalid (off-by-one) result in some cases #7069 [arrow]
- Minor: Fix deprecated note to point to the correct const #7067 [arrow]
- incorrect error message for reading definition levels #7056 [parquet]
- First None in ListArray panics in `cast_with_options` #7043 [arrow]
Documentation updates:
- Minor: Clarify documentation on `NullBufferBuilder::allocated_size` #7089 [arrow] (alamb)
- Minor: Update release schedule #7086 (alamb)
- Improve `ListArray` documentation for slices #7039 [arrow] (alamb)
Merged pull requests:
- fix: NullBufferBuilder::allocated_size should return Size in Bytes #7122 [arrow] (shuozel)
- minor: fix deprecated_note #7105 [arrow] (Chen-Yuan-Lai)
- Minor: Fix ArrayDataBuilder::build_unchecked docs #7103 [arrow] (gstvg)
- Support setting key field in MapBuilder #7101 [arrow] (rshkv)
- Add tests that arrow IPC data is validated #7096 [arrow] (alamb)
- Print Parquet BasicTypeInfo id when present #7094 [parquet] (devinrsmith)
- Expose record boundary information in JSON decoder #7092 [arrow] (scovich)
- Benchmarks for Arrow IPC reader #7091 [arrow] (alamb)
- Benchmarks for Arrow IPC writer #7090 [arrow] (alamb)
- Add another decimal cast edge test case #7078 [arrow] (findepi)
- minor: re-export `OffsetBufferBuilder` in `arrow` crate #7077 [arrow] (alamb)
- Support converting large dates (e.g. +10999-12-31) from string to Date32 #7074 [arrow] (phillipleblanc)
- fix: issue introduced in #6833 - less than equal check for scale in decimal conversion #7070 [arrow] (himadripal)
- perf: inline `from_iter` for `ScalarBuffer` #7066 [arrow] (0ax1)
- fix: first none/empty list in `ListArray` panics in `cast_with_options` #7065 [arrow] (irenjj)
- Minor: add ticket reference for todo #7064 [parquet] (alamb)
- Refactor some decimal-related code and tests #7062 [arrow] (CurtHagenlocher)
- fix error message for reading definition levels #7057 [parquet] (jp0317)
- Update release schedule README.md #7053 (alamb)
- Support both 0x01 and 0x02 as type for list of booleans in thrift metadata #7052 [parquet] (jhorstmann)
- Refactor arrow-ipc: Move `create_*_array` methods into `RecordBatchDecoder` #7029 [arrow] (alamb)
54.1.0 (2025-01-29)
Implemented enhancements:
- Create GitHub releases automatically on tagging #7041
- Add required methods to access inner builder for `NullBufferBuilder` #7002 [arrow]
- Re-export `NullBufferBuilder` in the arrow crate #6975 [arrow]
- `arrow-string` function should support binary input as well #6923 [arrow]
- MMap support for IPC files #6709 [arrow]
- fix: mark (Large)ListView as nested and support in equal data type #6995 [arrow] (rluvaton)
- Expose min/max values for Decimal128/256 and improve docs #6992 [arrow] (alamb)
- [Parquet] Improve speed of dictionary encoding NaN float values #6953 [parquet] (adamreeve)
- Optimize `BooleanBufferBuilder` for non nullable columns #6973 [arrow]
- `arrow::compute::concat` should merge dictionary type when concatenating list of dictionaries #6888 [arrow]
- Improve error message for unsupported cast between struct and other types #6724 [arrow]
- implement regexp_match, regexp_scalar_match and regexp_array_match for StringViewArray #6717 [arrow]
- Speed up Parquet utf8 validation #6667 [parquet]
Fixed bugs:
- Regression: Concatenating sliced `ListArray`s is broken #7034
- `PrimitiveDictionaryBuilder` with specific value data type and capacity #7011 [arrow]
- Arrow IPC Writer Panics for sliced nested arrays #6997 [arrow]
- RecordBatch with no columns cannot be roundtripped through Parquet #6988 [parquet]
- StringView: Using the Interleave kernel (and potentially others) results in many repeated buffers in variadic_buffers #6780 [arrow]
- fix prefetch of page index #6999 [parquet] (adriangb)
- fix: Parquet column writer `Dictionary(_, Decimal128)` and `Dictionary(_, Decimal256)` #6987 [parquet] (korowa)
- Writing floating point values containing NaN to Parquet is slow when using dictionary encoding #6952 [parquet] [arrow]
- Public API using private types: `Buffer::from_bytes` takes unexported `Bytes` #6754 [parquet] [arrow] [arrow-flight]
- Some MSRVs are inaccurate #6741 [parquet] [arrow] [arrow-flight]
Documentation updates:
- docs: add to bit slice iterator docs that the start value is inclusive and end value is exclusive #7022 [arrow] (rluvaton)
- Fix duplicate link references in README #7020 (Jefffrey)
- Enhance ListViewArray related docs #7007 [arrow] (Jefffrey)
- Document data type support and examples to predicates `*like`, `starts_with`, `ends_with`, `contains` #7003 [arrow] (alamb)
- Minor: improve documentation on timezone representations #7000 [arrow] (alamb)
- Add additional documentation for UTC representation of timestamps #6994 [arrow] (Abdullahsab3)
- Improve `ParquetRecordBatchStreamBuilder` docs / examples #6948 [parquet] (alamb)
- Document the `ParquetRecordBatchStream` buffering #6947 [parquet] (alamb)
- Minor: improve `zip` kernel docs, add examples #6928 [arrow] (alamb)
- Add doctest example for `Buffer::from_bytes` #6920 [arrow] (kylebarron)
- [object store] Add planned object_store release schedule to crate readme #6904 (alamb)
- Avoid panics? #6737 [parquet]
Merged pull requests:
- Create GitHub releases automatically on tagging #7042 (kou)
- Fix `concat` for sliced `ListArrays` #7037 [arrow] (alamb)
- Minor: Clarify NullBufferBuilder::new capacity parameter #7016 [arrow] (alamb)
- Add `is_valid` and `truncate` methods to `NullBufferBuilder` #7013 [arrow] (Chen-Yuan-Lai)
- fix: use the values builder capacity for the hash map in `PrimitiveDictionaryBuilder::new_from_builders` #7012 [arrow] (rluvaton)
- Refactor ipc reading code into methods on `ArrayReader` #7006 [arrow] (alamb)
- Minor: make it clear Predicate is crate private #7001 [arrow] (alamb)
- fix: Panic on reencoding offsets in arrow-ipc with sliced nested arrays #6998 [arrow] (HawaiianSpork)
- Add check for empty schema in `parquet::schema::types::from_thrift_helper` #6990 [parquet] (etseidl)
- Add example reading data from an `mmap`ed IPC file #6986 [arrow] (alamb)
- Improve `arrow-ipc` documentation #6983 [arrow] (alamb)
- Add `simdutf8` feature to make `simdutf8` optional, consolidate `check_valid_utf8` #6979 [parquet] (alamb)
- Export NullBufferBuilder along with BooleanBufferBuilder in `arrow` crate #6976 [arrow] (alamb)
- Minor: improve the documentation of NullBuffer and BooleanBuffer #6974 [arrow] (alamb)
- Simplify Validation/Alignment APIs of `ArrayDataBuilder`: validate and align #6966 [arrow] (alamb)
- Fix WASM CI for Rust 1.84 release #6963 (alamb)
- [Parquet] Add benchmark and test for writing NaNs to Parquet #6955 [parquet] [arrow] (adamreeve)
- Add `peek_next_page_offset` to `SerializedPageReader` #6945 [parquet] (XiangpengHao)
- Improve `Buffer` documentation, deprecate `Buffer::from_bytes`, add `From<Bytes>` and `From<bytes::Bytes>` impls #6939 [parquet] [arrow] [arrow-flight] (alamb)
- minor: fix test and remove println in tests #6935 [arrow] (himadripal)
- Document how to use Extend for generic methods on ArrayBuilders #6932 [arrow] (wiedld)
- [Parquet] Add projection utility functions #6931 [parquet] (XiangpengHao)
- [Parquet] Reuse buffer in `ByteViewArrayDecoderPlain` #6930 [parquet] (XiangpengHao)
- Support `Binary` arrays in `starts_with`, `ends_with` and `contains` #6926 [arrow] (rluvaton)
- Improve the error message for casting between struct and non-struct types #6919 [arrow] (takaebato)
- Fix error message typos with Parquet compression #6918 [parquet] (orf)
- Expose arrow-schema methods, for use when writing parquet outside of ArrowWriter #6916 [parquet] (wiedld)
- feat(arrow-ord): support boolean in `rank` and add tests for sorting lists of booleans #6912 [arrow] (rluvaton)
- chore(arrow-ord): move `can_rank` to the `rank` file #6910 [arrow] (rluvaton)
- feat(parquet): Add next_row_group API for ParquetRecordBatchStream #6907 [parquet] (Xuanwo)
- feat(arrow-select): `concat` kernel will merge dictionary values for list of dictionaries #6893 [arrow] (rluvaton)
- add `extend_dictionary` in dictionary builder for improved performance #6875 [arrow] (rluvaton)
- [arrow-string] Implement string view support for `regexp_match` #6849 [arrow] (tlm365)
- Add support for `StringView` / `BinaryView` in `interleave` kernel #6779 [arrow] (onursatici)
- `RecordBatch` normalization (flattening) #6758 [arrow] (ngli-me)
54.0.0 (2024-12-18)
Breaking changes:
- avoid redundant parsing of repeated value in RleDecoder #6834 [parquet] (jp0317)
- Handling nullable DictionaryArray in CSV parser #6830 [arrow] (edmondop)
- fix(flightsql): remove Any encoding of DoPutUpdateResult #6825 [arrow] [arrow-flight] (davisp)
- arrow-ipc: Default to not preserving dict IDs #6788 [arrow] (brancz)
- Remove some very old deprecated functions #6774 [parquet] [arrow] (alamb)
- update to pyo3 0.23.0 #6745 [arrow] (psvri)
- Remove APIs deprecated since v 4.4.0 #6722 [arrow] [arrow-flight] (findepi)
- Return `None` when Parquet page indexes are not present in file #6639 [parquet] (etseidl)
- Add `ParquetError::NeedMoreData`, mark `ParquetError` as `non_exhaustive` #6630 [parquet] (etseidl)
- Remove APIs deprecated since v 2.0.0 #6609 [arrow] (findepi)
Implemented enhancements:
- Parquet schema hint doesn't support integer types upcasting #6891 [parquet]
- Parquet UTF-8 max statistics are overly pessimistic #6867 [parquet]
- Add builder support for Int8 keys #6844 [arrow]
- Formalize the name of the nested `Field` in a list #6784 [parquet] [arrow] [arrow-flight]
- Allow disabling the writing of Parquet Offset Index #6778 [parquet]
- `parquet::record::make_row` is not exposed to users, leaving no option to users to manually create `Row` objects #6761 [parquet]
- Avoid `from_num_days_from_ce_opt` calls in `timestamp_s_to_datetime` if we don't need #6746 [arrow]
- Support Temporal -> Utf8View casting #6734 [arrow]
- Add Option To Coerce List Type on Parquet Write #6733 [parquet] [arrow]
- Support Numeric -> Utf8View casting #6714 [arrow]
- Support Utf8View <=> boolean casting #6713 [arrow]
Fixed bugs:
- `Buffer::bit_slice` loses length with byte-aligned offsets #6895 [arrow]
- parquet arrow writer doesn't track memory size correctly for fixed sized lists #6839 [parquet]
- Casting Decimal128 to Decimal128 with smaller precision produces incorrect results in some cases #6833 [arrow]
- Should empty nullable dictionary be parsed as null from arrow-csv? #6821 [arrow]
- Array take doesn't make fields nullable #6809
- Arrow Flight Encodes a Slice's List Offsets If the slice offset starts with zero #6803 [arrow]
- Parquet readers incorrectly interpret legacy nested lists #6756 [parquet]
- filter_bits under-allocates resulting boolean buffer #6750 [arrow]
- Multi-language support issues with Arrow FlightSQL client's execute_update and execute_ingest methods #6545 [arrow] [arrow-flight]
Documentation updates:
- Should we document at what rate deprecated APIs are removed? #6851 [parquet] [arrow]
- Fix docstring for `Format::with_header` in `arrow-csv` #6856 [arrow] (kylebarron)
- Add deprecation / API removal policy #6852 [parquet] [arrow] (alamb)
- Minor: add example for creating `SchemaDescriptor` #6841 [parquet] (alamb)
- chore: enrich panic context when BooleanBuffer fails to create #6810 [arrow] (tisonkun)
Closed issues:
- [FlightSQL] GetCatalogsBuilder does not sort the catalog names #6807 [arrow] [arrow-flight]
- Add a lint to automatically check for unused dependencies #6796 [arrow] [arrow-flight]
Merged pull requests:
- doc: add comment for timezone string #6899 [arrow] (xxchan)
- docs: fix typo #6890 [arrow] (rluvaton)
- Minor: Fix deprecation notice for `arrow_to_parquet_schema` #6889 [parquet] (etseidl)
- Add Field::with_dict_is_ordered #6885 [arrow] (alamb)
- Deprecate "max statistics size" property in `WriterProperties` #6884 [parquet] (etseidl)
- Add deprecation warnings for everything related to `dict_id` #6873 [parquet] [arrow] [arrow-flight] (brancz)
- Enable matching temporal as from_type to Utf8View #6872 [arrow] (Kev1n8)
- Enable string-based column projections from Parquet files #6871 [parquet] (etseidl)
- Improvements to UTF-8 statistics truncation #6870 [parquet] (etseidl)
- fix: make GetCatalogsBuilder sort catalog names #6864 [arrow] [arrow-flight] (niebayes)
- add buffered data_pages to parquet column writer total bytes estimation #6862 [parquet] (onursatici)
- Update prost-build requirement from =0.13.3 to =0.13.4 #6860 [arrow] [arrow-flight] (dependabot[bot])
- Minor: add comments explaining bad MSRV, output in json #6857 (alamb)
- perf: Use Cow in get_format_string in FFI_ArrowSchema #6853 [arrow] (andygrove)
- chore: add cast_decimal benchmark #6850 [arrow] (andygrove)
- arrow-array::builder: support Int8, Int16 and Int64 keys #6845 [arrow] (ajwerner)
- Add `ArrowToParquetSchemaConverter`, deprecate `arrow_to_parquet_schema` #6840 [parquet] (alamb)
- Remove APIs deprecated in 50.0.0 #6838 [arrow] (findepi)
- fix: decimal conversion loses value on lower precision #6836 [arrow] (himadripal)
- Update sysinfo requirement from 0.32.0 to 0.33.0 #6835 [parquet] (dependabot[bot])
- Optionally coerce names of maps and lists to match Parquet specification #6828 [parquet] (etseidl)
- Remove deprecated unary_dyn and try_unary_dyn #6824 [arrow] (findepi)
- Remove deprecated flight_data_from_arrow_batch #6823 [arrow] [arrow-flight] (findepi)
- [arrow-cast] Support cast boolean from/to string view #6822 [arrow] (tlm365)
- Hook up Avro Decoder #6820 [arrow] (tustvold)
- Fix arrow-avro compilation without default features #6819 [arrow] (findepi)
- Support shrink to empty #6817 [arrow] (tustvold)
- [arrow-cast] Support cast numeric to string view (alternate) #6816 [arrow] (alamb)
- Hide implicit optional dependency features in arrow-flight #6806 [arrow] [arrow-flight] (findepi)
- fix: Encoding of List offsets was incorrect when slice offsets begin with zero #6805 [arrow] (HawaiianSpork)
- Enable unused_crate_dependencies Rust lint, remove unused dependencies #6804 [arrow] [arrow-flight] (findepi)
- Minor: Fix docstrings for `ColumnProperties::statistics_enabled` property #6798 [parquet] (etseidl)
- Add option to disable writing of Parquet offset index #6797 [parquet] (etseidl)
- Remove unused dependencies #6792 [arrow] [arrow-flight] (findepi)
- Add `Array::shrink_to_fit(&mut self)` #6790 [arrow] (emilk)
- Formalize the default nested list field name to `item` #6785 [parquet] [arrow] [arrow-flight] (gruuya)
- Improve UnionArray logical_nulls tests #6781 [arrow] (gstvg)
- Improve list builder usage example in docs #6775 [arrow] (findepi)
- Update proc-macro2 requirement from =1.0.89 to =1.0.92 #6772 [arrow] [arrow-flight] (dependabot[bot])
- Allow NullBuffer construction directly from array #6769 [parquet] [arrow] (findepi)
- Include license and notice files in published crates #6767 [parquet] [arrow] [arrow-flight] (ankane)
- fix: remove redundant `bit_util::ceil` #6766 [arrow] (miroim)
- Remove 'make_row', expose a 'Row::new' method instead. #6763 [parquet] (jonded94)
- Read nested Parquet 2-level lists correctly #6757 [parquet] (etseidl)
- Split `timestamp_s_to_datetime` to `date` and `time` to avoid unnecessary computation #6755 [arrow] (jayzhan211)
- More trivial implementation of `Box<dyn AsyncArrowWriter>` and `Box<dyn AsyncArrowReader>` #6748 [parquet] (ethe)
- Update cache action to v4 #6744 (findepi)
- Remove redundant implementation of `StringArrayType` #6743 [arrow] (tlm365)
- Fix Dictionary logical nulls for RunArray/UnionArray Values #6740 [arrow] (findepi)
- Allow reading Parquet maps that lack a `values` field #6730 [parquet] (etseidl)
- Improve default implementation of Array::is_nullable #6721 [arrow] (findepi)
- Fix Buffer::bit_slice losing length with byte-aligned offsets #6707 [arrow] [arrow-flight] (itsjunetime)
53.3.0 (2024-11-17)
- Signed decimal e-notation parsing bug #6728 [arrow]
- Add support for Utf8View -> numeric in can_cast_types #6715
- IPC file writer produces incorrect footer when not preserving dict ID #6710 [arrow]
- parquet from_thrift_helper incorrectly checks index #6693 [parquet]
- Primitive REPEATED fields not contained in LIST annotated groups aren't read as lists by record reader #6648 [parquet]
- DictionaryHandling does not recurse into Map fields #6644 [arrow] [arrow-flight]
- Array writer output empty when no record is written #6613 [arrow]
- Archery Integration Test with c# failing on main #6577 [arrow]
- Potential unsoundness in `filter_run_end_array` #6569 [arrow]
- Parquet reader can generate incorrect validity buffer information for nested structures #6510 [parquet]
- arrow-array ffi: FFI_ArrowArray.null_count is always interpreted as unsigned and initialized during conversion from C to Rust. #6497 [arrow]
Documentation updates:
- Minor: Document pattern for accessing views in StringView #6673 [arrow] (alamb)
- Improve Array::is_nullable documentation #6615 [arrow] (findepi)
- Minor: improve docs for ByteViewArray->ByteArray From impl #6610 [arrow] (alamb)
Performance improvements:
Closed issues:
- Incorrect like results for pattern starting/ending with `%` percent and containing escape characters #6702 [arrow]
Merged pull requests:
- Fix signed decimal e-notation parsing #6729 [arrow] (gruuya)
- Clean up some arrow-flight tests and duplicated code #6725 [arrow] [arrow-flight] (itsjunetime)
- Update PR template section about API breaking changes #6723 (findepi)
- Support for casting `StringViewArray` to `DecimalArray` #6720 [arrow] (tlm365)
- File writer preserve dict bug #6711 [arrow] (brancz)
- Add filter_kernel benchmark for run array #6706 [arrow] (delamarch3)
- Fix string view ILIKE checks with NULL values #6705 [arrow] (findepi)
- Implement logical_null_count for more array types #6704 [arrow] (findepi)
- Fix LIKE with escapes #6703 [arrow] (findepi)
- Speed up
filter_bytes#6699 [arrow] (Dandandan) - Minor: fix misleading comment in byte view #6695 [arrow] (jayzhan211)
- minor fix on checking index #6694 [parquet] (jp0317)
- Undo run end filter performance regression #6691 [arrow] (delamarch3)
- Reimplement
PartialEqofGenericByteViewArraycompares by logical value #6689 [arrow] (tlm365) - feat: expose known_schema from FlightDataEncoder #6688 [arrow] [arrow-flight] (nathanielc)
- Update hashbrown requirement from 0.14.2 to 0.15.1 #6684 [parquet] [arrow] (dependabot[bot])
- Support Duration in JSON Reader #6683 [arrow] (simonvandel)
- Check predicate and values are the same length for run end array filter safety #6675 [arrow] (delamarch3)
- [ffi] Fix arrow-array null_count error during conversion from C to Rust #6674 [arrow] (adbmal)
- Support
Utf8Viewforbit_lengthkernel #6671 [arrow] (austin362667) - Fix string view LIKE checks with NULL values #6662 [arrow] (findepi)
- Improve documentation for
nullifkernel #6658 [arrow] (alamb) - Improve test_auth error message when contains() fails #6657 [arrow] [arrow-flight] (findepi)
- Let std::fmt::Debug for StructArray output Null/Validity info #6655 [arrow] (XinyuZeng)
- Include offending line number when processing CSV file fails #6653 [arrow] (findepi)
- feat: add write_bytes for GenericBinaryBuilder #6652 [arrow] (tisonkun)
- feat: Support Utf8View in JSON serialization #6651 [arrow] (jonmmease)
- fix: include chrono-tz in flight sql cli #6650 [arrow] [arrow-flight] (crepererum)
- Handle primitive REPEATED field not contained in LIST annotated group #6649 [parquet] (zeevm)
- Implement
append_nforBooleanBuilder#6646 [arrow] (delamarch3) - fix: recurse into Map datatype when hydrating dictionaries #6645 [arrow] [arrow-flight] (nathanielc)
- fix: enable TLS roots for flight CLI client #6640 [arrow] [arrow-flight] (crepererum)
- doc: Clarify take kernel semantics #6632 [arrow] (viirya)
- Return error rather than panic when too many row groups are written #6629 [parquet] (etseidl)
- Fix test feature selection so all feature combinations work as expected #6626 [parquet] (itsjunetime)
- Add Parquet RowSelection benchmark #6623 [parquet] (XiangpengHao)
- Optimize
take_bitsto optimizetake_boolean/take_primitive/take_byte_view: up to -25% #6622 [arrow] (Dandandan) - Make downcast macros hygenic (#6400) #6620 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.88 to =1.0.89 #6618 [arrow] [arrow-flight] (dependabot[bot])
- Fix arrow-json writer empty #6614 [arrow] (gwik)
- Add
ParquetObjectReader::with_runtime#6612 [parquet] [arrow] (itsjunetime) - Re-enable
C#arrow flight integration test #6611 [arrow] (alamb)
53.3.0 (2024-11-17)
Implemented enhancements:
- `PartialEq` of GenericByteViewArray (StringViewArray / ByteViewArray) that compares on equality rather than logical value #6679 [arrow]
- Need a mechanism to handle schema changes due to dictionary hydration in FlightSQL server implementations #6672 [arrow] [arrow-flight]
- Support encoding Utf8View columns to JSON #6642 [arrow]
- Implement `append_n` for `BooleanBuilder` #6634 [arrow]
- Some take optimizations #6621 [arrow]
- Error Instead of Panic On Attempting to Write More Than 32769 Row Groups #6591 [parquet]
- Make casting from a timestamp without timezone to a timestamp with timezone configurable #6555
- Add `record_batch!` macro for easy record batch creation #6553 [arrow]
- Support `Binary` --> `Utf8View` casting #6531 [arrow]
- `downcast_primitive_array` and `downcast_dictionary_array` are not hygienic wrt imports #6400 [arrow]
- Implement interleave_record_batch #6731 [arrow] (waynexia)
- feat: `record_batch!` macro #6588 [arrow] (ByteBaker)
Fixed bugs:
- Signed decimal e-notation parsing bug #6728 [arrow]
- Add support for Utf8View -> numeric in can_cast_types #6715
- IPC file writer produces incorrect footer when not preserving dict ID #6710 [arrow]
- parquet from_thrift_helper incorrectly checks index #6693 [parquet]
- Primitive REPEATED fields not contained in LIST annotated groups aren't read as lists by record reader #6648 [parquet]
- DictionaryHandling does not recurse into Map fields #6644 [arrow] [arrow-flight]
- Array writer output empty when no record is written #6613 [arrow]
- Archery Integration Test with c# failing on main #6577 [arrow]
- Potential unsoundness in `filter_run_end_array` #6569 [arrow]
- Parquet reader can generate incorrect validity buffer information for nested structures #6510 [parquet]
- arrow-array ffi: FFI_ArrowArray.null_count is always interpreted as unsigned and initialized during conversion from C to Rust. #6497 [arrow]
Documentation updates:
- Minor: Document pattern for accessing views in StringView #6673 [arrow] (alamb)
- Improve Array::is_nullable documentation #6615 [arrow] (findepi)
- Minor: improve docs for ByteViewArray->ByteArray From impl #6610 [arrow] (alamb)
Performance improvements:
Closed issues:
- Incorrect like results for pattern starting/ending with `%` percent and containing escape characters #6702 [arrow]
Merged pull requests:
- Fix signed decimal e-notation parsing #6729 [arrow] (gruuya)
- Clean up some arrow-flight tests and duplicated code #6725 [arrow] [arrow-flight] (itsjunetime)
- Update PR template section about API breaking changes #6723 (findepi)
- Support for casting `StringViewArray` to `DecimalArray` #6720 [arrow] (tlm365)
- File writer preserve dict bug #6711 [arrow] (brancz)
- Add filter_kernel benchmark for run array #6706 [arrow] (delamarch3)
- Fix string view ILIKE checks with NULL values #6705 [arrow] (findepi)
- Implement logical_null_count for more array types #6704 [arrow] (findepi)
- Fix LIKE with escapes #6703 [arrow] (findepi)
- Speed up `filter_bytes` #6699 [arrow] (Dandandan)
- Minor: fix misleading comment in byte view #6695 [arrow] (jayzhan211)
- minor fix on checking index #6694 [parquet] (jp0317)
- Undo run end filter performance regression #6691 [arrow] (delamarch3)
- Reimplement `PartialEq` of `GenericByteViewArray` compares by logical value #6689 [arrow] (tlm365)
- feat: expose known_schema from FlightDataEncoder #6688 [arrow] [arrow-flight] (nathanielc)
- Update hashbrown requirement from 0.14.2 to 0.15.1 #6684 [parquet] [arrow] (dependabot[bot])
- Support Duration in JSON Reader #6683 [arrow] (simonvandel)
- Check predicate and values are the same length for run end array filter safety #6675 [arrow] (delamarch3)
- [ffi] Fix arrow-array null_count error during conversion from C to Rust #6674 [arrow] (adbmal)
- Support `Utf8View` for `bit_length` kernel #6671 [arrow] (austin362667)
- Fix string view LIKE checks with NULL values #6662 [arrow] (findepi)
- Improve documentation for `nullif` kernel #6658 [arrow] (alamb)
- Improve test_auth error message when contains() fails #6657 [arrow] [arrow-flight] (findepi)
- Let std::fmt::Debug for StructArray output Null/Validity info #6655 [arrow] (XinyuZeng)
- Include offending line number when processing CSV file fails #6653 [arrow] (findepi)
- feat: add write_bytes for GenericBinaryBuilder #6652 [arrow] (tisonkun)
- feat: Support Utf8View in JSON serialization #6651 [arrow] (jonmmease)
- fix: include chrono-tz in flight sql cli #6650 [arrow] [arrow-flight] (crepererum)
- Handle primitive REPEATED field not contained in LIST annotated group #6649 [parquet] (zeevm)
- Implement `append_n` for `BooleanBuilder` #6646 [arrow] (delamarch3)
- fix: recurse into Map datatype when hydrating dictionaries #6645 [arrow] [arrow-flight] (nathanielc)
- fix: enable TLS roots for flight CLI client #6640 [arrow] [arrow-flight] (crepererum)
- doc: Clarify take kernel semantics #6632 [arrow] (viirya)
- Return error rather than panic when too many row groups are written #6629 [parquet] (etseidl)
- Fix test feature selection so all feature combinations work as expected #6626 [parquet] (itsjunetime)
- Add Parquet RowSelection benchmark #6623 [parquet] (XiangpengHao)
- Optimize `take_bits` to optimize `take_boolean` / `take_primitive` / `take_byte_view`: up to -25% #6622 [arrow] (Dandandan)
- Make downcast macros hygienic (#6400) #6620 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.88 to =1.0.89 #6618 [arrow] [arrow-flight] (dependabot[bot])
- Fix arrow-json writer empty #6614 [arrow] (gwik)
- Add `ParquetObjectReader::with_runtime` #6612 [parquet] [arrow] (itsjunetime)
- Re-enable `C#` arrow flight integration test #6611 [arrow] (alamb)
- Add Array::logical_null_count for inspecting number of null values #6608 [parquet] [arrow] (findepi)
- Added casting from Binary/LargeBinary to Utf8View #6592 [arrow] (ngli-me)
- Parquet AsyncReader: Don't panic when empty offset_index is Some([]) #6582 [parquet] (jroddev)
- Skip writing down null buffers for non-nullable primitive arrays #6524 [parquet] (bkirwi)
53.2.0 (2024-10-21)
Implemented enhancements:
- Implement arrow_json encoder for Decimal128 & Decimal256 DataTypes #6605 [arrow]
- Support DataType::FixedSizeList in make_builder within struct_builder.rs #6594 [arrow]
- Support DataType::Dictionary in `make_builder` within struct_builder.rs #6589 [arrow]
- Interval parsing from string - accept "mon" and "mons" token #6548 [arrow]
- `AsyncArrowWriter` API to get the total size of a written parquet file #6530 [parquet]
- `append_many` for Dictionary builders #6529 [arrow]
- Missing tonic `GRPC_STATUS` with tonic 0.12.1 #6515 [arrow] [arrow-flight]
- Add example of how to use parquet metadata reader APIs for a local cache #6504 [parquet]
- Remove reliance on `raw-entry` feature of Hashbrown #6498 [parquet] [arrow] [arrow-flight]
- Improve page index metadata loading in `SerializedFileReader::new_with_options` #6491 [parquet]
- Release arrow-rs / parquet minor version `53.1.0` (October 2024) #6340 [arrow]
Fixed bugs:
- Compilation fail where `c_char = u8` #6571 [arrow]
- Arrow flight CI test failing on `master` #6568 [arrow] [arrow-flight]
Documentation updates:
Closed issues:
Merged pull requests:
- Minor: more comments for `RecordBatch.get_array_memory_size()` #6607 [arrow] (2010YOUY01)
- Implement arrow_json encoder for Decimal128 & Decimal256 #6606 [arrow] (phillipleblanc)
- Add support for building FixedSizeListBuilder in struct_builder's mak… #6595 [arrow] (kszlim)
- Add limited support for dictionary builders in `make_builders` for stru… #6593 [arrow] (kszlim)
- Fix CI with new valid certificates and add script for future usage #6585 [arrow] [arrow-flight] (itsjunetime)
- Update proc-macro2 requirement from =1.0.87 to =1.0.88 #6579 [arrow] [arrow-flight] (dependabot[bot])
- Fix clippy complaints #6573 [parquet] [arrow] [arrow-flight] (itsjunetime)
- Use c_char instead of i8 to compile on platforms where c_char = u8 #6572 [arrow] (itsjunetime)
- Bump pyspark from 3.3.1 to 3.3.2 in /parquet/pytest #6564 [parquet] (dependabot[bot])
- `unsafe` improvements #6551 [arrow] (ssbr)
- Update README.md #6550 [arrow] [arrow-flight] (Abdullahsab3)
- Fix string '0' cast to decimal with scale 0 #6547 [arrow] (findepi)
- Add finish to `AsyncArrowWriter::finish` #6543 [parquet] (etseidl)
- Add append_nulls to dictionary builders #6542 [arrow] (adriangb)
- Improve UnionArray::is_nullable #6540 [arrow] (tustvold)
- Allow to read parquet binary column as UTF8 type #6539 [parquet] (goldmedal)
- Use HashTable instead of raw_entry_mut #6537 [parquet] [arrow] (tustvold)
- Add append_many to dictionary arrays to allow adding repeated values #6534 [arrow] (adriangb)
- Adds documentation and example recommending Vec<ArrayRef> over ChunkedArray #6527 [arrow] (efredine)
- Update proc-macro2 requirement from =1.0.86 to =1.0.87 #6526 [arrow] [arrow-flight] (dependabot[bot])
- Add `ColumnChunkMetadataBuilder` clear APIs #6523 [parquet] (alamb)
- Update sysinfo requirement from 0.31.2 to 0.32.0 #6521 [parquet] (dependabot[bot])
- Update Tonic to 0.12.3 #6517 [arrow] [arrow-flight] (cisaacson)
- Detect missing page indexes while reading Parquet metadata #6507 [parquet] (etseidl)
- Use ParquetMetaDataReader to load page indexes in `SerializedFileReader::new_with_options` #6506 [parquet] (etseidl)
- Improve parquet `MetadataFetch` and `AsyncFileReader` docs #6505 [parquet] (alamb)
- fix arrow-json encoding with dictionary including nulls #6503 [arrow] (samuelcolvin)
- Update brotli requirement from 6.0 to 7.0 #6499 [parquet] (dependabot[bot])
- Benchmark both scenarios, with records skipped and without skipping, for delta-bin-packed primitive arrays with half nulls. #6489 [parquet] (wiedld)
- Add round trip tests for reading/writing parquet metadata #6463 [parquet] (alamb)
53.1.0 (2024-10-02)
Implemented enhancements:
- Write null counts in Parquet statistics when they are known to be zero #6502 [parquet]
- Make it easier to find / work with `ByteView` #6478 [arrow]
- Update lexical-core version due to soundness issues with current version #6468
- Add builder style API for manipulating `ParquetMetaData` #6465 [parquet]
- `ArrayData.align_buffers` should support `Struct` data type / child data #6461 [arrow]
- Add a method to return the number of skipped rows in a `RowSelection` #6428 [parquet]
- Bump lexical-core to 1.0 #6397 [arrow]
- Add union_extract kernel #6386 [arrow]
- implement `regexp_is_match_utf8` and `regexp_is_match_utf8_scalar` for `StringViewArray` #6370 [arrow]
- Add support for BinaryView in arrow_string::length #6358 [arrow]
- Add `as_union` to `AsArray` #6351
- Ability to append non contiguous strings to `StringBuilder` #6347 [arrow]
- Add Catalog DB Schema subcommands to `flight_sql_client` #6331 [arrow] [arrow-flight]
- Add support for Utf8View in arrow_string::length #6305 [arrow]
- Reading FIXED_LEN_BYTE_ARRAY columns with nulls is inefficient #6296 [parquet]
- Optionally verify 32-bit CRC checksum when decoding parquet pages #6289 [parquet]
- Speed up `pad_nulls` for `FixedLenByteArrayBuffer` #6297 [parquet] (etseidl)
- Improve performance of set_bits by avoiding to set individual bits #6288 [arrow] (kazuyukitanimura)
Fixed bugs:
- BitIterator panics when retrieving length #6480 [arrow]
- Flight data retrieved via Python client (wrapping C++) cannot be used by Rust Arrow #6471 [arrow]
- CI integration test failing: Archery test With other arrows #6448 [parquet] [arrow] [arrow-flight]
- IPC not respecting not preserving dict ID #6443 [parquet] [arrow] [arrow-flight]
- Failing CI: Prost requires Rust 1.71.1 #6436 [arrow] [arrow-flight]
- Invalid struct arrays in IPC data causes panic during read #6416 [arrow]
- REE Dicts cannot be encoded/decoded with streaming IPC #6398 [arrow]
- Reading json `map` with non-nullable value schema doesn't error if values are actually null #6391
- StringViewBuilder with deduplication does not clear observed values #6384 [arrow]
- Cast from Decimal(p, s) to dictionary-encoded Decimal(p, s) loses precision and scale #6381 [arrow]
- LocalFileSystem `list` operation returns objects in wrong order #6375
- `compute::binary_mut` returns `Err(PrimitiveArray<T>)` only with certain arrays #6374 [arrow]
- Exporting Binary/Utf8View from arrow-rs to pyarrow fails #6366 [arrow]
- warning: methods `as_any` and `next_batch` are never used in `parquet` crate #6143 [parquet]
Documentation updates:
- chore: add docs, part of #37 #6496 [parquet] [arrow] [arrow-flight] (ByteBaker)
- Minor: improve `ChunkedReader` docs #6477 [parquet] (alamb)
- Minor: Add some missing documentation to fix CI errors #6445 [arrow] (etseidl)
- Fix doc "bit width" to "byte width" #6434 [arrow] (kylebarron)
- chore: add docs, part of #37 #6433 [arrow] (ByteBaker)
- chore: add docs, part of #37 #6424 [arrow] (ByteBaker)
- Rephrase doc comment #6421 [parquet] [arrow] [arrow-flight] (waynexia)
- Remove "NOT YET FULLY SUPPORTED" comment from DataType::Utf8View/BinaryView #6380 [arrow] (alamb)
- Improve `GenericStringBuilder` documentation #6372 [arrow] (alamb)
Closed issues:
- Columnar json writer for arrow-json #6411
- Primitive `binary`/`unary` are not as fast as they could be #6364 [arrow]
- Different numeric type may be able to compare #6357
Merged pull requests:
- fix: override `size_hint` for `BitIterator` to return the exact remaining size #6495 [arrow] (Beihao-Zhou)
- Minor: Fix path in format command in CONTRIBUTING.md #6494 (etseidl)
- Write null counts in Parquet statistics when they are known #6490 [parquet] (etseidl)
- Add configuration option to `StatisticsConverter` to control interpretation of missing null counts in Parquet statistics #6485 [parquet] (etseidl)
- fix: check overflow numbers while inferring type for csv files #6481 [arrow] (CookiePieWw)
- Add better documentation, examples and builder-style API to `ByteView` #6479 [arrow] (alamb)
- Add take_arrays util for getting entries from 2d arrays #6475 [arrow] (akurmustafa)
- Deprecate `MetadataLoader` #6474 [parquet] (etseidl)
- Update tonic-build requirement from =0.12.2 to =0.12.3 #6473 [arrow] [arrow-flight] (dependabot[bot])
- Align buffers from Python (FFI) #6472 [arrow] (EnricoMi)
- Add `ParquetMetaDataBuilder` #6466 [parquet] (alamb)
- Make `ArrayData.align_buffers` align child data buffers recursively #6462 [arrow] (EnricoMi)
- Minor: Silence compiler warnings for `parquet::file::metadata::reader` #6457 [parquet] (etseidl)
- Minor: Error rather than panic for unsupported dictionary casting #6456 [arrow] (goldmedal)
- Support cast between Durations + between Durations and all numeric types #6452 [arrow] (tisonkun)
- Deprecate methods from footer.rs in favor of `ParquetMetaDataReader` #6451 [parquet] (etseidl)
- Workaround for missing Parquet page indexes in `ParquetMetaDataReader` #6450 [parquet] (etseidl)
- Fix CI by disabling newly failing rust <> nanoarrow integration test in CI #6449 (alamb)
- Add `IpcSchemaEncoder`, deprecate ipc schema functions, Fix IPC not respecting not preserving dict ID #6444 [parquet] [arrow] [arrow-flight] (brancz)
- Add additional documentation and builder APIs to `SortOptions` #6441 [arrow] (alamb)
- Update prost-build requirement from =0.13.2 to =0.13.3 #6440 [arrow] [arrow-flight] (dependabot[bot])
- Bump arrow-flight MSRV to 1.71.1 #6437 [arrow] [arrow-flight] (gstvg)
- Silence warnings that `as_any` and `next_batch` are never used #6432 [parquet] (etseidl)
- Add `ParquetMetaDataReader` #6431 [parquet] (etseidl)
- Add RowSelection::skipped_row_count #6429 [parquet] (progval)
- perf: Faster decimal precision overflow checks #6419 [arrow] (andygrove)
- fix: don't panic in IPC reader if struct child arrays have different lengths #6417 [arrow] (alexwilcoxson-rel)
- Reduce integration test matrix #6407 (kou)
- Move lifetime of `take_iter` from iterator to its items #6403 [arrow] (dariocurr)
- Update lexical-core requirement from 0.8 to 1.0 (to resolve RUSTSEC-2023-0086) #6402 [arrow] (dariocurr)
- Fix encoding/decoding REE Dicts when using streaming IPC #6399 [arrow] (brancz)
- fix: binary_mut should work if only one input array has null buffer #6396 [arrow] (viirya)
- Add `set_bits` fuzz test #6394 [arrow] (alamb)
- impl `From<ScalarBuffer<T>>` for `Buffer` #6389 [arrow] (mbrobbel)
- Add `union_extract` kernel #6387 [arrow] (gstvg)
- Clear string-tracking hash table when ByteView deduplication is enabled #6385 [arrow] (shanesveller)
- fix: Stop losing precision and scale when casting decimal to dictionary #6383 [arrow] (andygrove)
- Add `ARROW_VERSION` const #6379 [arrow] (samuelcolvin)
- parquet writer: Raise an error when the row_group_index overflows i16 #6378 [parquet] (progval)
- Implement native support StringViewArray for `regexp_is_match` and `regexp_is_match_scalar` function, deprecate `regexp_is_match_utf8` and `regexp_is_match_utf8_scalar` #6376 [arrow] (tlm365)
- Update chrono-tz requirement from 0.9 to 0.10 #6371 [arrow] (dependabot[bot])
- Support StringViewArray interop with python: fix lingering C Data Interface issues for *ViewArray #6368 [arrow] (a10y)
- stop panic in `MetadataLoader` on invalid data #6367 [parquet] (samuelcolvin)
- Add support for BinaryView in arrow_string::length #6359 [arrow] (Omega359)
- impl `From<Vec<T>>` for `Buffer` #6355 [arrow] (mbrobbel)
- Add breaking change from #6043 to `CHANGELOG` #6354 (mbrobbel)
- Benchmark for bit_mask (set_bits) #6353 [arrow] (kazuyukitanimura)
- Update prost-build requirement from =0.13.1 to =0.13.2 #6350 [arrow] [arrow-flight] (dependabot[bot])
- fix: clippy warnings from nightly rust 1.82 #6348 [parquet] [arrow] (waynexia)
- Add support for Utf8View in arrow_string::length #6345 [arrow] (Omega359)
- feat: add catalog/schema subcommands to flight_sql_client. #6332 [arrow] [arrow-flight] (nathanielc)
- Manually run fmt on all files under parquet #6328 [parquet] (etseidl)
- Implement UnionArray logical_nulls #6303 [arrow] (gstvg)
- Parquet: Verify 32-bit CRC checksum when decoding pages #6290 [parquet] (xmakro)
53.0.0 (2024-08-31)
Breaking changes:
- parquet_derive: Match fields by name, support reading selected fields rather than all #6269 (double-free)
- Update parquet object_store dependency to 0.11.0 #6264 [parquet] (alamb)
- parquet Statistics - deprecate `has_*` APIs and add `_opt` functions that return `Option<T>` #6216 [parquet] (Michael-J-Ward)
- Expose bulk ingest in flight sql client and server #6201 [arrow] [arrow-flight] (djanderson)
- Upgrade protobuf definitions to flightsql 17.0 (#6133) #6169 [arrow-flight] (alamb)
- Remove automatic buffering in `ipc::reader::FileReader` for consistent buffering #6132 [arrow] (V0ldek)
- No longer write Parquet column metadata after column chunks *and* in the footer #6117 [parquet] (etseidl)
- Remove `impl<T: AsRef<[u8]>> From<T> for Buffer` that easily accidentally copies data #6043 [arrow] (XiangpengHao)
Implemented enhancements:
- Derive `PartialEq` and `Eq` for `parquet::arrow::ProjectionMask` #6329 [parquet]
- Allow converting empty `pyarrow.RecordBatch` to `arrow::RecordBatch` #6318 [arrow]
- Parquet writer should not write any min/max data to ColumnIndex when all values are null #6315 [parquet]
- Parquet: Add `union` method to `RowSelection` #6307 [parquet]
- Support writing `UTC adjusted time` arrow array to parquet #6277 [parquet]
- A better way to resize the buffer for the snappy encode/decode #6276 [parquet]
- parquet_derive: support reading selected columns from parquet file #6268
- Tests for invalid parquet files #6261 [parquet]
- Implement `date_part` for `Duration` #6245 [arrow]
- Avoid unnecessary null buffer construction when converting arrays to a different type #6243 [parquet] [arrow]
- Add `parquet_opendal` in related projects #6235
- Look into optimizing reading FixedSizeBinary arrays from parquet #6219 [parquet] [arrow]
- Add benchmarks for `BYTE_STREAM_SPLIT` encoded Parquet `FIXED_LEN_BYTE_ARRAY` data #6203 [parquet]
- Make it easy to write parquet to object_store -- Implement `AsyncFileWriter` for a type that implements `obj_store::MultipartUpload` for `AsyncArrowWriter` #6200 [parquet]
- Remove test duplication in parquet statistics tests #6185 [parquet]
- Support BinaryView Types in C Schema FFI #6170 [arrow]
- speedup take_byte_view kernel #6167 [arrow]
- Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter` #6164 [parquet]
- Support casting `BinaryView` --> `Utf8` and `LargeUtf8` #6162 [arrow]
- Implement `filter` kernel specially for `FixedSizeByteArray` #6153 [arrow]
- Use `LevelHistogram` throughout Parquet metadata #6134 [parquet]
- Support DoPutStatementIngest from Arrow Flight SQL 17.0 #6124 [arrow] [arrow-flight]
- ColumnMetaData should no longer be written inline with data #6115 [parquet]
- Implement date_part for `Interval` #6113 [arrow]
- Implement `Into<Arc<dyn Array>>` for `ArrayData` #6104
- Allow flushing or non-buffered writes from `arrow::ipc::writer::StreamWriter` #6099 [arrow]
- Default block_size for `StringViewArray` #6094 [arrow]
- Remove `Statistics::has_min_max_set` and `ValueStatistics::has_min_max_set` and use `Option` instead #6093 [parquet]
- Upgrade arrow-flight to tonic 0.12 #6072
- Improve speed of row converter by skipping utf8 checks #6058 [arrow]
- Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types #6048 [parquet]
- Release arrow-rs / parquet minor version `52.2.0` (August 2024) #5998 [parquet] [arrow]
Fixed bugs:
- Invalid `ColumnIndex` written in parquet #6310 [parquet]
- comparison_kernels benchmarks panic #6283 [arrow]
- Printing schema metadata includes possibly incorrect compression level #6270 [parquet]
- Don't panic when creating `Field` from `FFI_ArrowSchema` with no name #6251 [arrow]
- lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6226 [arrow]
- Parquet Statistics null_count does not distinguish between `0` and not specified #6215 [parquet]
- Using a take kernel on a dense union can result in reaching "unreachable" code #6206 [arrow]
- Adding sub day seconds to Date64 is ignored. #6198 [arrow]
- mismatch between parquet type `is_optional` codes and comment #6191 [parquet]
Documentation updates:
- Minor: improve filter documentation #6317 [arrow] (alamb)
- Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306 [arrow] (alamb)
- Minor: improve `RowFilter` and `ArrowPredicate` docs #6301 [parquet] (alamb)
- Improve documentation for `MutableArrayData` #6272 [arrow] (alamb)
- Add examples to `StringViewBuilder` and `BinaryViewBuilder` #6240 [arrow] (alamb)
- minor: enhance document for ParquetField #6239 [parquet] (mapleFU)
- Minor: Improve Type documentation #6224 [arrow] (alamb)
- Minor: Update `DataType::Date64` docs #6223 [arrow] (alamb)
- Add (more) Parquet Metadata Documentation #6184 [parquet] (alamb)
- Add additional documentation and examples to `ArrayAccessor` #6141 [arrow] (alamb)
- Minor: improve comments in temporal.rs tests #6140 [arrow] (alamb)
- Minor: Update release schedule in README #6125 (alamb)
Closed issues:
- Simplify take octokit workflow #6279
- Make the bearer token visible in FlightSqlServiceClient #6253 [arrow] [arrow-flight]
- Port `take` workflow to use `octokit` #6242
- Remove `SchemaBuilder` dependency from `StructArray` constructors #6138 [arrow]
Merged pull requests:
- Derive PartialEq and Eq for parquet::arrow::ProjectionMask #6330 [parquet] (thinkharderdev)
- Support zero column `RecordBatch`es in pyarrow integration (use RecordBatchOptions when converting a pyarrow RecordBatch) #6320 [arrow] (Michael-J-Ward)
- Fix writing of invalid Parquet ColumnIndex when row group contains null pages #6319 [parquet] (adriangb)
- Pass empty vectors as min/max for all null pages when building ColumnIndex #6316 [parquet] (etseidl)
- Update tonic-build requirement from =0.12.0 to =0.12.2 #6314 [arrow] [arrow-flight] (dependabot[bot])
- Parquet: add `union` method to `RowSelection` #6308 [parquet] (sdd)
- Specialize filter for structs and sparse unions #6304 [arrow] (gstvg)
- Err on `try_from_le_slice` #6295 [parquet] (samuelcolvin)
- fix reference in doctest to size_of which is not imported by default #6286 [arrow] (rtyler)
- Support writing UTC adjusted time arrays to parquet #6278 [parquet] (aykut-bozkurt)
- Minor: `pub use ByteView` in arrow and improve documentation #6275 [arrow] (alamb)
- Fix accessing name from ffi schema #6273 [arrow] (kylebarron)
- Do not print compression level in schema printer #6271 [parquet] (ttencate)
- ci: use octokit to add assignee #6267 (dsgibbons)
- Add tests for bad parquet files #6262 [parquet] (alamb)
- Add `Statistics::distinct_count_opt` and deprecate `Statistics::distinct_count` #6259 [parquet] (alamb)
- Minor: move `FallibleRequestStream` and `FallibleTonicResponseStream` to a module #6258 [arrow] [arrow-flight] (alamb)
- Make the bearer token visible in FlightSqlServiceClient #6254 [arrow] [arrow-flight] (ccciudatu)
- Use `unary()` for array conversion in Parquet array readers, speed up `Decimal128`, `Decimal256` and `Float16` #6252 [parquet] [arrow] (etseidl)
- Update tower requirement from 0.4.13 to 0.5.0 #6250 [arrow] [arrow-flight] (dependabot[bot])
- Implement date_part for durations #6246 [arrow] (nrc)
- Remove unnecessary null buffer construction when converting arrays to a different type #6244 [parquet] [arrow] (etseidl)
- Implement PartialEq for GenericByteViewArray #6241 [arrow] (alamb)
- Minor: Remove non standard footer from LICENSE.txt / reference to Apache Aurora #6237 (alamb)
- docs: Add parquet_opendal in related projects #6236 (Xuanwo)
- Avoid infinite loop in bad parquet by checking the number of rep levels #6232 [parquet] (jp0317)
- Specialize Prefix/Suffix Match for `Like/ILike` between Array and Scalar for StringViewArray #6231 [arrow] (xinlifoobar)
- fix: lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6225 [arrow] (viirya)
- Modest improvement to FixedLenByteArray BYTE_STREAM_SPLIT arrow decoder #6222 [parquet] (etseidl)
- Improve performance of `FixedLengthBinary` decoding #6220 [parquet] (etseidl)
- Update documentation for Parquet BYTE_STREAM_SPLIT encoding #6212 [parquet] (etseidl)
- Improve interval parsing #6211 [arrow] (samuelcolvin)
- minor: Suggest take on interleave docs #6210 [arrow] (gstvg)
- fix: Correctly handle take on dense union of a single selected type #6209 [arrow] (gstvg)
- Add time dictionary coercions #6208 [arrow] (adriangb)
- fix(arrow): restrict the range of temporal values produced via `data_gen` #6205 [arrow] (kyle-mccarthy)
- Add benchmarks for `BYTE_STREAM_SPLIT` encoded Parquet `FIXED_LEN_BYTE_ARRAY` data #6204 [parquet] (etseidl)
- Move `ParquetMetadataWriter` to its own module, update documentation #6202 [parquet] (alamb)
- Add `ThriftMetadataWriter` for writing Parquet metadata #6197 [parquet] (adriangb)
- Update zstd-sys requirement from >=2.0.0, <2.0.13 to >=2.0.0, <2.0.14 #6196 [parquet] (dependabot[bot])
- fix parquet type `is_optional` comments #6192 [parquet] (jp0317)
- Remove duplicated statistics tests in parquet #6190 [parquet] (Kev1n8)
- Benchmarks for `bool_and` #6189 [arrow] (simonvandel)
- Fix typo in documentation of Float64Array #6188 [arrow] (mesejo)
- Make it clear that `StatisticsConverter` can not panic #6187 [parquet] (alamb)
- add filter benchmark for `FixedSizeBinaryArray` #6186 [arrow] (chloro-pn)
- Update sysinfo requirement from 0.30.12 to 0.31.2 #6182 [parquet] (dependabot[bot])
- Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter` #6181 [parquet] (Kev1n8)
- Support casting between BinaryView <--> Utf8 and LargeUtf8 #6180 [arrow] (xinlifoobar)
- Implement specialized filter kernel for `FixedSizeByteArray` #6178 [arrow] (chloro-pn)
- Support `StringView` and `BinaryView` in CDataInterface #6171 [arrow] (a10y)
- Optimize `take` kernel for `BinaryViewArray` and `StringViewArray` #6168 [arrow] (a10y)
- Support Parquet `BYTE_STREAM_SPLIT` for INT32, INT64, and FIXED_LEN_BYTE_ARRAY primitive types #6159 [parquet] (etseidl)
- Fix comparison kernel benchmarks #6147 [arrow] (samuelcolvin)
- improve `LIKE` regex performance up to 12x #6145 [arrow] (samuelcolvin)
- Optimize `min_boolean` and `bool_and` #6144 [arrow] (simonvandel)
- Reduce bounds check in `RowIter`, add `unsafe Rows::row_unchecked` #6142 [arrow] (XiangpengHao)
- Minor: Simplify `StructArray` constructors #6139 [arrow] (Rafferty97)
- Implement exponential block size growing strategy for `StringViewBuilder` #6136 [arrow] (XiangpengHao)
- Use `LevelHistogram` in `PageIndex` #6135 [parquet] (etseidl)
- Add ArrowError::ArithmeticError #6130 [arrow] (andygrove)
- Improve `LIKE` performance for "contains" style queries #6128 [arrow] (samuelcolvin)
- Add `BooleanArray::new_from_packed` and `BooleanArray::new_from_u8` #6127 [arrow] (chloro-pn)
- improvements to `(i)starts_with` and `(i)ends_with` performance #6118 [arrow] (samuelcolvin)
- Fix Clippy for the Rust 1.80 release #6116 [parquet] [arrow] [arrow-flight] (alamb)
- added a flush method to IPC writers #6108 [arrow] (V0ldek)
- Add support for level histograms added in PARQUET-2261 to `ParquetMetaData` #6105 [parquet] (etseidl)
- Implement date_part for intervals #6071 [arrow] (nrc)
- feat(parquet): Implement AsyncFileWriter for `object_store::buffered::BufWriter` #6013 [parquet] (Xuanwo)
52.2.0 (2024-07-24)
Implemented enhancements:
- Faster min/max for string/binary view arrays #6088 [arrow]
- Support casting to/from Utf8View #6076 [arrow]
- Min/max support for String/BinaryViewArray #6052 [arrow]
- Improve performance of constructing `ByteView`s for small strings #6034 [parquet] [arrow]
- Fast UTF-8 validation when reading StringViewArray from Parquet #5995 [parquet]
- Optimize StringView row decoding #5945 [arrow]
- Implementing `deduplicate`/`intern` functionality for StringView #5910 [arrow]
- Add `FlightSqlServiceClient::new_from_inner` #6003 [arrow] [arrow-flight] (lewiszlw)
- Complete `StringViewArray` and `BinaryViewArray` parquet decoder: #6004 [parquet] (XiangpengHao)
- Add begin/end_transaction methods in FlightSqlServiceClient #6026 [arrow] [arrow-flight] (lewiszlw)
- Read Parquet statistics as arrow `Arrays` #6046 [parquet] (efredine)
Fixed bugs:
- Panic in `ParquetMetadata::memory_size` if no min/max set #6091 [parquet]
- BinaryViewArray doesn't roundtrip a single `Some(&[])` through parquet #6086 [parquet]
- Parquet `ColumnIndex` for null columns is written even when statistics are disabled #6010 [parquet]
Documentation updates:
- Fix typo in GenericByteViewArray documentation #6054 [arrow] (progval)
- Minor: Improve parquet PageIndex documentation #6042 [parquet] (alamb)
Closed issues:
- Potential performance improvements for reading Parquet to StringViewArray/BinaryViewArray #5904 [parquet] [arrow]
Merged pull requests:
- Faster `GenericByteView` construction #6102 [parquet] [arrow] (XiangpengHao)
- Add benchmark to track byte-view construction performance #6101 [parquet] (XiangpengHao)
- Optimize `bool_or` using `max_boolean` #6100 [arrow] (simonvandel)
- Optimize `max_boolean` by operating on u64 chunks #6098 [arrow] (simonvandel)
- fix panic in `ParquetMetadata::memory_size`: check has_min_max_set before invoking min()/max() #6092 [parquet] (Fischer0522)
- Implement specialized min/max for `GenericBinaryView` (`StringView` and `BinaryView`) #6089 [arrow] (XiangpengHao)
- Add PartialEq to ParquetMetaData and FileMetadata #6082 [parquet] (adriangb)
- Enable casting from Utf8View #6077 [arrow] (a10y)
- StringView support in arrow-csv #6062 [arrow] (2010YOUY01)
- Implement min max support for string/binary view types #6053 [arrow] (XiangpengHao)
- Minor: clarify the relationship between `file::metadata` and `format` in docs #6049 [parquet] (alamb)
- Minor API adjustments for StringViewBuilder #6047 [arrow] (XiangpengHao)
- Add parquet `StatisticsConverter` for arrow reader #6046 [parquet] (efredine)
- Directly decode String/BinaryView types from arrow-row format #6044 [arrow] (XiangpengHao)
- Clean up unused code for view types in offset buffer #6040 [parquet] (XiangpengHao)
- Avoid using Buffer api that accidentally copies data #6039 [parquet] [arrow] [arrow-flight] (XiangpengHao)
- MINOR: Fix `hashbrown` version in `arrow-array`, remove from `arrow-row` #6035 [arrow] (mbrobbel)
- Improve performance reading `ByteViewArray` from parquet by removing an implicit copy #6031 [parquet] (XiangpengHao)
- Add begin/end_transaction methods in FlightSqlServiceClient #6026 [arrow] [arrow-flight] (lewiszlw)
- Unsafe improvements: core `parquet` crate. #6024 [parquet] (veluca93)
- Additional tests for parquet reader utf8 validation #6023 [parquet] (alamb)
- Update zstd-sys requirement from >=2.0.0, <2.0.12 to >=2.0.0, <2.0.13 #6019 [parquet] (dependabot[bot])
- fix doc ci in latest rust nightly version #6012 [arrow] [arrow-flight] (Rachelint)
- Do not write `ColumnIndex` for null columns when not writing page statistics #6011 [parquet] (etseidl)
- Fast utf8 validation when loading string view from parquet #6009 [parquet] (XiangpengHao)
- Deduplicate strings/binaries when building view types #6005 [arrow] (XiangpengHao)
- Complete `StringViewArray` and `BinaryViewArray` parquet decoder: implement delta byte array and delta length byte array encoding #6004 [parquet] (XiangpengHao)
- Add `FlightSqlServiceClient::new_from_inner` #6003 [arrow] [arrow-flight] (lewiszlw)
- Rename `Schema::all_fields` to `flattened_fields` #6001 [parquet] [arrow] [arrow-flight] (lewiszlw)
- Refine documentation and examples for `DataType` #5997 [arrow] (alamb)
- implement `DataType::try_from(&str)` #5994 [arrow] (samuelcolvin)
- Implement dictionary support for reading ByteView from parquet #5973 [parquet] (XiangpengHao)
52.1.0 (2024-07-02)
Implemented enhancements:
- Implement `eq` comparison for StructArray #5960 [arrow]
- A new feature as a workaround hack to unavailable offset support in Arrow Java #5959 [arrow]
- Add `min_bytes` and `max_bytes` to `PageIndex` #5949 [parquet]
- Error message in ArrowNativeTypeOp::neg_checked doesn't include the operation #5944 [arrow]
- Add object_store_opendal as related projects #5925
- Opaque retry errors make debugging difficult #5923
- Implement arrow-row en/decoding for GenericByteView types #5921 [arrow]
- The arrow-rs repo is very large #5908
- [DISCUSS] Release arrow-rs / parquet patch release `52.0.1` #5906 [arrow]
- Implement `compare_op` for `GenericBinaryView` #5897 [arrow]
- New null with view types are not supported #5893 [arrow]
- Cleanup ByteView construction #5878 [parquet] [arrow]
- `cast` kernel support for `StringViewArray` and `BinaryViewArray` <--> `DictionaryArray` #5861 [arrow]
- parquet::ArrowWriter should allow writing Bloom filters before the end of the file #5859 [parquet]
- API to get memory usage for parquet ArrowWriter #5851 [parquet]
- Support writing `IntervalMonthDayNanoArray` to parquet via Arrow Writer #5849 [parquet]
- Write parquet statistics for `IntervalDayTimeArray`, `IntervalMonthDayNanoArray` and `IntervalYearMonthArray` #5847 [parquet]
- Make `RowSelection::from_consecutive_ranges` public #5846 [parquet]
- `Schema::try_merge` should be able to merge List of any data type with List of Null data type #5843 [arrow]
- Add a way to move `fields` out of parquet `Row` #5841 [parquet]
- Make `TimeUnit` and `IntervalUnit` `Copy` #5839 [arrow]
- Limit Parquet Page Row Count By Default to reduce writer memory requirements with highly compressible columns #5797 [parquet]
- Report / blog on parquet metadata sizes for "large" (1000+) numbers of columns #5770 [parquet] [arrow]
- Structured ByteView Access (underlying StringView/BinaryView representation) #5736 [arrow]
- [parquet_derive] support OPTIONAL (def_level = 1) columns by default #5716
- Maps cast to other Maps with different Elements, Key and Value Names #5702 [arrow]
- Provide Arrow Schema Hint to Parquet Reader #5657 [parquet] [arrow]
Fixed bugs:
- Wrong error type in case of invalid amount in Interval components #5986 [arrow]
- Empty and Null structarray fails to IPC roundtrip #5920
- FixedSizeList got out of range when the total length of the underlying values over i32::MAX #5901 [arrow]
- Out of range when extending on a slice of string array imported through FFI #5896 [arrow]
- cargo msrv test is failing on main for `object_store` #5864 [parquet]
Documentation updates:
- chore: update RunArray reference in run_iterator.rs #5892 [arrow] (Weijun-H)
- Minor: Clarify when page index structures are read #5886 [parquet] (alamb)
- Improve Parquet reader/writer properties docs #5863 [parquet] (alamb)
- Refine documentation for `unary_mut` and `binary_mut` #5798 [arrow] (alamb)
Closed issues:
Merged pull requests:
- fix: error in case of invalid amount interval component #5987 [arrow] (DDtKey)
- Minor: fix clippy complaint in parquet_derive #5984 (alamb)
- Reduce repo size by removing accumulative commits in CI job #5982 (Owen-CH-Leung)
- Add operation in ArrowNativeTypeOp::neg_check error message (#5944) #5980 [arrow] (zhao-gang)
- Implement directly build byte view array on top of parquet buffer #5972 [parquet] (XiangpengHao)
- Handle flight dictionary ID assignment automatically #5971 [arrow] [arrow-flight] (thinkharderdev)
- Add view buffer for parquet reader #5970 [parquet] [arrow] (XiangpengHao)
- Add benchmark for reading binary/binary view from parquet #5968 [parquet] (XiangpengHao)
- feat(5851): ArrowWriter memory usage #5967 [parquet] (wiedld)
- Add ParquetMetadata::memory_size size estimation #5965 [parquet] (alamb)
- Fix FFI array offset handling #5964 [arrow] (tustvold)
- Implement sort for String/BinaryViewArray #5963 [arrow] (XiangpengHao)
- Improve error message for unsupported nested comparison #5961 [arrow] (alamb)
- chore(5797): change default parquet data_page_row_limit to 20k #5957 [parquet] (wiedld)
- Document process for PRs with breaking changes #5953 (alamb)
- Minor: fixup contribution guide about clippy #5952 (alamb)
- feat: add max_bytes and min_bytes on PageIndex #5950 [parquet] (tshauck)
- test: Add unit test for extending slice of list array #5948 [arrow] (viirya)
- minor: row format benches for bool & nullable int #5943 [arrow] (korowa)
- Better document support for nested comparison #5942 [arrow] (tustvold)
- Provide Arrow Schema Hint to Parquet Reader - Alternative 2 #5939 [parquet] (efredine)
- `like` benchmark for StringView #5936 [arrow] (alamb)
- Fix typo in benchmark name `egexp` --> `regexp` #5935 [arrow] (alamb)
- Revert "Write Bloom filters between row groups instead of the end" #5932 [parquet] (alamb)
- Implement like/ilike etc for StringViewArray #5931 [arrow] (XiangpengHao)
- docs: Fix broken links of object_store_opendal README #5929 (Xuanwo)
- Expose `IntervalMonthDayNano` and `IntervalDayTime` and update docs #5928 [arrow] (alamb)
- Update proc-macro2 requirement from =1.0.85 to =1.0.86 #5927 [arrow] [arrow-flight] (dependabot[bot])
- docs: Add object_store_opendal as related projects #5926 (Xuanwo)
- Add eq benchmark for StringArray/StringViewArray #5924 [arrow] (XiangpengHao)
- Implement arrow-row encoding/decoding for view types #5922 [arrow] (XiangpengHao)
- fix(ipc): set correct row count when reading struct arrays with zero fields #5918 [arrow] (kawadakk)
- Update zstd-sys requirement from >=2.0.0, <2.0.10 to >=2.0.0, <2.0.12 #5913 [parquet] (dependabot[bot])
- fix: prevent potential out-of-range access in FixedSizeListArray #5902 [arrow] (BubbleCal)
- Implement compare operations for view types #5900 [arrow] (XiangpengHao)
- minor: use as_primitive replace downcast_ref #5898 [arrow] (Kikkon)
- fix: Adjust FFI_ArrowArray offset based on the offset of offset buffer #5895 [arrow] (viirya)
- implement `new_null_array` for view types #5894 [arrow] (XiangpengHao)
- chore: add view type single column tests #5891 [parquet] (ariesdevil)
- Minor: expose timestamp_tz_format for csv writing #5890 [arrow] (tmi)
- chore: implement parquet error handling for object_store #5889 [parquet] (abhiaagarwal)
- Document when the ParquetRecordBatchReader will re-read metadata #5887 [parquet] (alamb)
- Add simple GC for view array types #5885 [arrow] (XiangpengHao)
- Update for new clippy rules #5881 [parquet] [arrow] (XiangpengHao)
- clean up ByteView construction #5879 [parquet] [arrow] (XiangpengHao)
- Avoid copy/allocation when read view types from parquet #5877 [parquet] (XiangpengHao)
- Document parquet ArrowWriter type limitations #5875 [parquet] (alamb)
- Benchmark for casting view to dict arrays (and the reverse) #5874 [arrow] (XiangpengHao)
- Implement Take for Dense UnionArray #5873 [arrow] (gstvg)
- Improve performance of casting `StringView`/`BinaryView` to `DictionaryArray` #5872 [arrow] (XiangpengHao)
- Improve performance of casting `DictionaryArray` to `StringViewArray` #5871 [arrow] (XiangpengHao)
- fix: msrv CI for object_store #5866 (korowa)
- parquet: Fix warning about unused import #5865 [parquet] (progval)
- Preallocate for `FixedSizeList` in `concat` #5862 [arrow] (judahrand)
- Faster primitive arrays encoding into row format #5858 [arrow] (korowa)
- Added panic message to docs. #5857 [arrow] (SeeRightThroughMe)
- feat: call try_merge recursively for list field #5852 [arrow] (mnpw)
- Minor: refine row selection example more #5850 [parquet] (alamb)
- Make RowSelection's from_consecutive_ranges public #5848 [parquet] (advancedxy)
- Add exposing fields from parquet row #5842 [parquet] (SHaaD94)
- Derive `Copy` for `TimeUnit` and `IntervalUnit` #5840 [arrow] (mbrobbel)
- feat: support reading OPTIONAL column in parquet_derive #5717 (double-free)
- Add the ability for Maps to cast to another case where the field names are different #5703 [arrow] (HawaiianSpork)
52.0.0 (2024-06-03)
Breaking changes:
- chore: Make binary_mut kernel accept different type for second arg #5833 [arrow] (viirya)
- fix(flightsql): remove Any encoding of `DoPutPreparedStatementResult` #5817 [arrow] [arrow-flight] (erratic-pattern)
- Encode UUID as FixedLenByteArray in parquet_derive #5773 (conradludgate)
- Structured interval types for `IntervalMonthDayNano` or `IntervalDayTime` (#3125) (#5654) #5769 [parquet] [arrow] (tustvold)
- Fallible stream for arrow-flight do_exchange call (#3462) #5698 [arrow] [arrow-flight] (opensourcegeek)
- Update object_store dependency in arrow to `0.10.0` #5675 [parquet] (tustvold)
- Remove deprecated JSON writer #5651 [arrow] (tustvold)
- Change `UnionArray` constructors #5623 [arrow] [arrow-flight] (mbrobbel)
- Update pyo3 from 0.20 to 0.21 #5566 [arrow] (Jefffrey)
- Optionally require alignment when reading IPC, respect alignment when writing #5554 [arrow] [arrow-flight] (hzuo)
Implemented enhancements:
- Serialize `Binary` and `LargeBinary` as HEX with JSON Writer #5783 [arrow]
- Some optimizations in arrow_buffer::util::bit_util do more harm than good #5771 [arrow]
- Support skipping comments in CSV files #5758 [arrow]
- `parquet-derive` should be included in repository README. #5751
- proposal: Make AsyncArrowWriter accepts AsyncFileWriter trait instead #5738 [parquet]
- Nested nullable fields do not get treated as nullable in data_gen #5712 [arrow]
- Optionally support flexible column lengths #5678 [arrow]
- Arrow Flight SQL example server: do_handshake should include auth header #5665 [arrow] [arrow-flight]
- Add support for the "r+" datatype in the C Data interface / `RunArray` #5631 [arrow]
- Serialize `FixedSizeBinary` as HEX with JSON Writer #5620 [arrow]
- Cleanup UnionArray Constructors #5613 [arrow] [arrow-flight]
- Zero Copy Support #5593
- ObjectStore bulk delete #5591
- Retry on Broken Connection #5589
- `StreamReader` is not zero-copy #5584 [arrow]
- Create `ArrowReaderMetadata` from externalized metadata #5582 [parquet]
- Make `filter` in `filter_leaves` API propagate error #5574 [arrow]
- Support `List` in `compare_op` #5572
- Make FixedSizedList Json serializable #5568 [arrow]
- arrow-ord: Support sorting StructArray #5559 [arrow]
- Add scientific notation decimal parsing in `parse_decimal` #5549 [arrow]
- `take` kernel support for `StringViewArray` and `BinaryViewArray` #5511 [arrow]
- `filter` kernel support for `StringViewArray` and `BinaryViewArray` #5510 [arrow]
- Display support for `StringViewArray` and `BinaryViewArray` #5509 [arrow]
- Arrow Flight format support for `StringViewArray` and `BinaryViewArray` #5507 [arrow] [arrow-flight]
- IPC format support for `StringViewArray` and `BinaryViewArray` #5506 [parquet] [arrow]
Fixed bugs:
- List Row Encoding Sorts Incorrectly #5807 [arrow]
- Schema Root Message Name Ignored by parquet-fromcsv #5804 [parquet]
- Compute data buffer length by using start and end values in offset buffer #5756 [arrow]
- parquet: ByteArrayEncoder allocates large unused FallbackEncoder for Parquet 2 #5755 [parquet]
- The CI pipeline `Archery test With other arrow` is broken #5742 [arrow]
- Unable to parse scientific notation string to decimal when scale is 0 #5739 [arrow]
- Stateless prepared statements wrap `DoPutPreparedStatementResult` with `Any` which differs from Go implementation #5731 [arrow] [arrow-flight]
- "Rustdocs are clean (amd64, nightly)" CI check is failing #5725 [parquet] [arrow]
- "Archery test With other arrows" integration tests are failing #5719 [parquet] [arrow]
- parquet_derive: invalid examples/documentation #5687
- Arrow Flight SQL: invalid location in get_flight_info_prepared_statement #5669 [arrow] [arrow-flight]
- Rust Interval definition incorrect #5654 [parquet] [arrow]
- DECIMAL regex in csv reader does not accept positive exponent specifier #5648 [arrow]
- panic when casting `ListArray` to `FixedSizeList` #5642 [arrow]
- FixedSizeListArray::try_new Errors on Entirely Null Array With Size 0 #5614 [arrow]
- `parquet / Build wasm32 (pull_request)` CI check failing on main #5565 [parquet] [arrow]
- Documentation fix: example in parquet/src/column/mod.rs is incorrect #5560 [parquet]
- IPC code writes data with insufficient alignment #5553 [arrow] [arrow-flight]
- Cannot access example Flight SQL Server from dbeaver #5540 [arrow] [arrow-flight]
- parquet: "not yet implemented" error when codec is actually implemented but disabled #5520 [parquet]
Documentation updates:
- Minor: Improve arrow_cast documentation #5825 [arrow] (alamb)
- Minor: Improve `ArrowReaderBuilder::with_row_selection` docs #5824 [parquet] (alamb)
- Minor: Add examples for ColumnPath::from #5813 [parquet] (alamb)
- Minor: Clarify docs on `EnabledStatistics` #5812 [parquet] (alamb)
- Add parquet-derive to repository README #5795 (konjac)
- Refine ParquetRecordBatchReaderBuilder docs #5774 [parquet] (alamb)
- docs: add sizing explanation to bloom filter docs in parquet #5705 [parquet] (hiltontj)
Closed issues:
- `binary_mut` kernel requires both args to be the same type (which is inconsistent with `binary`) #5818 [arrow]
- Panic when displaying debug the results via log::info in the browser. #5599 [arrow]
Merged pull requests:
- feat: impl *Assign ops for types in arrow-buffer #5832 [arrow] (waynexia)
- Relax zstd-sys Version Pin #5829 [parquet] (waynexia)
- Minor: Document timestamp with/without cast behavior #5826 [arrow] (alamb)
- fix: invalid examples/documentation in parquet_derive doc #5823 (Weijun-H)
- Check length of `FIXED_LEN_BYTE_ARRAY` for `uuid` logical parquet type #5821 [parquet] (mbrobbel)
- Allow overriding the inferred parquet schema root #5814 [parquet] (tustvold)
- Revisit List Row Encoding (#5807) #5811 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.83 to =1.0.84 #5805 [arrow] [arrow-flight] (dependabot[bot])
- Fix typo continuation maker -> marker #5802 [arrow] (djanderson)
- fix: serialization of decimal #5801 [arrow] (yjshen)
- Allow constructing ByteViewArray from existing blocks #5796 [arrow] (tustvold)
- Push SortOptions into DynComparator Allowing Nested Comparisons (#5426) #5792 [arrow] (tustvold)
- Fix incorrect URL to Parquet CPP types.h #5790 [parquet] (viirya)
- Update proc-macro2 requirement from =1.0.82 to =1.0.83 #5789 [arrow] [arrow-flight] (dependabot[bot])
- Update prost-build requirement from =0.12.4 to =0.12.6 #5788 [arrow] [arrow-flight] (dependabot[bot])
- Refine parquet documentation on types and metadata #5786 [parquet] (alamb)
- feat(arrow-json): encode `Binary` and `LargeBinary` types as hex when writing JSON #5785 [arrow] (hiltontj)
- fix broken link to ballista crate in README.md #5784 (navicore)
- feat(arrow-csv): support encoding of binary in CSV writer #5782 [arrow] (hiltontj)
- Fix documentation for parquet `parse_metadata`, `decode_metadata` and `decode_footer` #5781 [parquet] (alamb)
- Support casting a `FixedSizedList<T>[1]` to `T` #5779 [arrow] (sadboy)
- [parquet] Set the default size of BitWriter in DeltaBitPackEncoder to 1MB #5776 [parquet] (AdamGS)
- Remove harmful table lookup optimization for bitmap operations #5772 [arrow] (HadrienG2)
- Remove deprecated comparison kernels (#4733) #5768 [arrow] (tustvold)
- Add environment variable definitions to run the nanoarrow integration tests #5764 (paleolimbot)
- Downgrade to Rust 1.77 in integration pipeline to fix CI (#5719) #5761 (tustvold)
- Expose boolean builder contents #5760 [arrow] (HadrienG2)
- Allow specifying comment character for CSV reader #5759 [arrow] (bbannier)
- Expose the null buffer of every builder that has one #5754 [arrow] (HadrienG2)
- feat: Make AsyncArrowWriter accepts AsyncFileWriter #5753 [parquet] (Xuanwo)
- Improve repository readme #5752 (alamb)
- Document object store release cadence #5750 (alamb)
- Compute data buffer length by using start and end values in offset buffer #5741 [arrow] (viirya)
- fix: parse string of scientific notation to decimal when the scale is 0 #5740 [arrow] (yjshen)
- Minor: avoid (likely unreachable) panic in FlightClient #5734 [arrow] [arrow-flight] (alamb)
- Update proc-macro2 requirement from =1.0.81 to =1.0.82 #5732 [arrow] [arrow-flight] (dependabot[bot])
- Improve error message for timestamp queries outside supported range #5730 [arrow] (Abdi-29)
- Refactor to share code between do_put and do_exchange calls #5728 [arrow] [arrow-flight] (opensourcegeek)
- Update brotli requirement from 5.0 to 6.0 #5726 [parquet] (dependabot[bot])
- Fix `GenericListBuilder` test typo #5724 [arrow] (Kikkon)
- Deprecate NullBuilder capacity, as it behaves in a surprising way #5721 [arrow] (HadrienG2)
- Fix nested nullability when randomly generating arrays #5713 [arrow] (alexwilcoxson-rel)
- Fix up clippy for Rust 1.78 #5710 [parquet] [arrow] (alamb)
- Support casting `StringView`/`BinaryView` --> `StringArray`/`BinaryArray`. #5704 [arrow] (RinChanNOWWW)
- Fix documentation around handling of nulls in cmp kernels #5697 [arrow] (Jefffrey)
- Support casting `StringArray`/`BinaryArray` --> `StringView`/`BinaryView` #5686 [arrow] (RinChanNOWWW)
- Add support for flexible column lengths #5679 [arrow] (Posnet)
- Move ffi stream and utils from arrow to arrow-array #5670 [arrow] (alexandreyc)
- Arrow Flight SQL example JDBC driver incompatibility #5666 [arrow] [arrow-flight] (istvan-fodor)
- Add `ListView` & `LargeListView` basic construction and validation #5664 [arrow] (Kikkon)
- Update proc-macro2 requirement from =1.0.80 to =1.0.81 #5659 [arrow] [arrow-flight] (dependabot[bot])
- Modify decimal regex to accept positive exponent specifier #5649 [arrow] (jdcasale)
- feat: JSON encoding of `FixedSizeList` #5646 [arrow] (hiltontj)
- Update proc-macro2 requirement from =1.0.79 to =1.0.80 #5644 [arrow] [arrow-flight] (dependabot[bot])
- fix: panic when casting `ListArray` to `FixedSizeList` #5643 [arrow] (jonahgao)
- Add more invalid utf8 parquet reader tests #5639 [parquet] (alamb)
- Update brotli requirement from 4.0 to 5.0 #5637 [parquet] (dependabot[bot])
- Update flatbuffers requirement from 23.1.21 to 24.3.25 #5636 [arrow] (dependabot[bot])
- Increase `BinaryViewArray` test coverage #5635 [arrow] (alamb)
- PrettyPrint support for `StringViewArray` and `BinaryViewArray` #5634 [arrow] (alamb)
- feat(ffi): add run end encoded arrays #5632 [arrow] (notfilippo)
- Accept parquet schemas without explicitly required Map keys #5630 [parquet] (jupiter)
- Implement `filter` kernel for byte view arrays. #5624 [arrow] (RinChanNOWWW)
- feat: encode FixedSizeBinary in JSON as hex string #5622 [arrow] (hiltontj)
- Update Flight crate README version #5621 [arrow] [arrow-flight] (phillipleblanc)
- feat: support reading and writing `StringView` and `BinaryView` in parquet (part 1) #5618 [parquet] [arrow] (alamb)
- Use FixedSizeListArray::new in FixedSizeListBuilder #5612 [arrow] (tustvold)
- String to decimal conversion written using E/scientific notation #5611 [arrow] (Nekit2217)
- Account for Timezone when Casting Timestamp to Date32 #5605 [arrow] (Lordworms)
- Update prost-build requirement from =0.12.3 to =0.12.4 #5604 [arrow] [arrow-flight] (dependabot[bot])
- Fix panic when displaying dates on 32-bit platforms #5603 [arrow] (ivanceras)
- Implement `take` kernel for byte view array. #5602 [arrow] (RinChanNOWWW)
- Add tests for Arrow Flight support for `StringViewArray` and `BinaryViewArray` #5601 [arrow] [arrow-flight] (XiangpengHao)
- test: Add a test for RowFilter with nested type #5600 [parquet] (viirya)
- Minor: Add docs for GenericBinaryBuilder, links to `GenericStringBuilder` #5597 [arrow] (alamb)
- Bump chrono-tz from 0.8 to 0.9 #5596 [arrow] (Jefffrey)
- Update brotli requirement from 3.3 to 4.0 #5586 [parquet] (dependabot[bot])
- Add `UnionArray::into_parts` #5585 [arrow] (mbrobbel)
- Expose ArrowReaderMetadata::try_new #5583 [parquet] (kylebarron)
- Add `try_filter_leaves` to propagate error from filter closure #5575 [arrow] (viirya)
- filter for run end array #5573 [arrow] (fabianmurariu)
- Pin zstd-sys to `v2.0.9` in parquet #5567 [parquet] (Jefffrey)
- Split arrow_cast::cast::string into its own submodule #5563 [arrow] (monkwire)
- Correct example code for column (#5560) #5561 [parquet] (zgershkoff)
- Split arrow_cast::cast::dictionary into its own submodule #5555 [arrow] (monkwire)
- Split arrow_cast::cast::decimal into its own submodule #5552 [arrow] (monkwire)
- Fix new clippy lints for Rust 1.77 #5544 [parquet] [arrow] (alamb)
- fix: correctly encode ticket #5543 [arrow] [arrow-flight] (freddieptf)
- feat: implemented with_field() for FixedSizeListBuilder #5541 [arrow] (istvan-fodor)
- Split arrow_cast::cast::list into its own submodule #5537 [arrow] (monkwire)
- Bump black from 22.10.0 to 24.3.0 in /parquet/pytest #5535 [parquet] (dependabot[bot])
- Add OffsetBufferBuilder #5532 [arrow] (tustvold)
- Add IPC StreamDecoder #5531 [arrow] (tustvold)
- IPC format support for StringViewArray and BinaryViewArray #5525 [arrow] (XiangpengHao)
- parquet: Use specific error variant when codec is disabled #5521 [parquet] (progval)
- impl `From<ScalarBuffer<T>>` for `Vec<T>` #5518 [arrow] (mbrobbel)
51.0.0 (2024-03-15)
Breaking changes:
- Remove internal buffering from AsyncArrowWriter (#5484) #5485 [parquet] (tustvold)
- Make ArrayBuilder also Sync #5353 [arrow] (dvic)
- Raw JSON writer (~10x faster) (#5314) #5318 [arrow] (tustvold)
Implemented enhancements:
- Prototype Arrow over HTTP in Rust #5496 [arrow]
- Add DataType::ListView and DataType::LargeListView #5492 [parquet] [arrow]
- Improve documentation around handling of dictionary arrays in arrow flight #5487 [arrow] [arrow-flight]
- Better memory limiting in parquet `ArrowWriter` #5484 [parquet]
- Support Creating Non-Nullable Lists and Maps within a Struct #5482 [arrow]
- [DISCUSSION] Better borrow propagation (e.g. `RecordBatch::schema()` to return `&SchemaRef` vs `SchemaRef`) #5463 [parquet] [arrow] [arrow-flight]
- Build Scalar with ArrayRef #5459
- AsyncArrowWriter doesn't limit underlying ArrowWriter to respect buffer-size #5450 [parquet]
- Refine `Display` implementation for `FlightError` #5438 [arrow] [arrow-flight]
- Better ergonomics for `FixedSizeList` and `LargeList` #5372 [arrow]
- Update Flight proto #5367 [arrow] [arrow-flight]
- Support check similar datatype but with different magnitudes #5358 [arrow]
- Buffer memory usage for custom allocations is reported as 0 #5346 [arrow]
- Can the ArrayBuilder trait be made Sync? #5344 [arrow]
- support cast 'UTF8' to `FixedSizeList` #5339 [arrow]
- Support Creating Non-Nullable Lists with ListBuilder #5330 [arrow]
- `ParquetRecordBatchStreamBuilder::new()` panics instead of erroring out when opening a corrupted file #5315 [parquet]
- Raw JSON Writer #5314 [arrow]
- Add support for more fused boolean operations #5297 [arrow]
- parquet: Allow disabling embed `ARROW_SCHEMA_META_KEY` added by the `ArrowWriter` #5296 [parquet]
- Support casting strings like '2001-01-01 01:01:01' to Date32 #5280 [arrow]
- Temporal Extract/Date Part Kernel #5266 [arrow]
- Support for extracting hours/minutes/seconds/etc. from `Time32`/`Time64` type in temporal kernels #5261 [arrow]
- parquet: add method to get both the inner writer and the file metadata when closing SerializedFileWriter #5253 [parquet]
- Release arrow-rs version 50.0.0 #5234
Fixed bugs:
- Empty String Parses as Zero in Unreleased Arrow #5504 [arrow]
- Unused import in nightly rust #5476 [parquet] [arrow] [arrow-flight]
- Error `The data type type List .. has no natural order` when using `arrow::compute::lexsort_to_indices` with list and more than one column #5454 [arrow]
- Wrong size assertion in arrow_buffer::builder::NullBufferBuilder::new_from_buffer #5445 [arrow]
- Inconsistency between comments and code implementation #5430 [arrow]
- OOB access in `Buffer::from_iter` #5412 [arrow]
- Cast kernel doesn't return null for string to integral cases when overflowing under safe option enabled #5397 [arrow]
- Make ffi consume variable layout arrays with empty offsets #5391 [arrow]
- RecordBatch conversion from pyarrow loses Schema's metadata #5354 [arrow]
- Debug output of Time32/Time64 arrays with invalid values has confusing nulls #5336 [arrow]
- Removing a column from a `RecordBatch` drops schema metadata #5327 [arrow]
- Panic when read an empty parquet file #5304 [parquet]
- How to enable statistics for string columns? #5270 [parquet]
- `concat::tests::test_string_dictionary_merge failure` fails on Mac / has different results in different platforms #5255 [arrow]
Documentation updates:
- Minor: Add doc comments to `GenericByteViewArray` #5512 [arrow] (alamb)
- Improve docs for logical and physical nulls even more #5434 [arrow] (alamb)
- Add example of converting RecordBatches to JSON objects #5364 [arrow] (alamb)
Performance improvements:
Closed issues:
- Add `StringViewArray` implementation and layout and basic construction + tests #5469 [parquet] [arrow]
- Add `DataType::Utf8View` and `DataType::BinaryView` #5468 [parquet] [arrow]
Merged pull requests:
- Deprecate array_to_json_array #5515 [arrow] (tustvold)
- Fix integer parsing of empty strings (#5504) #5505 [arrow] (tustvold)
- feat: clarifying comments in struct_builder.rs #5494 #5499 [arrow] (istvan-fodor)
- Update proc-macro2 requirement from =1.0.78 to =1.0.79 #5498 [arrow] [arrow-flight] (dependabot[bot])
- Add DataType::ListView and DataType::LargeListView #5493 [parquet] [arrow] (Kikkon)
- Better document parquet pushdown #5491 [parquet] (tustvold)
- Fix NullBufferBuilder::new_from_buffer wrong size assertion #5489 [arrow] (Kikkon)
- Support dictionary encoding in structures for `FlightDataEncoder`, add documentation for `arrow_flight::encode::Dictionary` #5488 [arrow] [arrow-flight] (thinkharderdev)
- Add MapBuilder::with_values_field to support non-nullable values (#5482) #5483 [arrow] (lasantosr)
- feat: initial support string_view and binary_view, supports layout and basic construction + tests #5481 [arrow] (ariesdevil)
- Add more comprehensive documentation on testing and benchmarking to CONTRIBUTING.md #5478 (monkwire)
- Remove unused import detected by nightly rust #5477 [parquet] [arrow] [arrow-flight] (XiangpengHao)
- Add RecordBatch::schema_ref #5474 [parquet] [arrow] [arrow-flight] (monkwire)
- Provide access to inner Write for parquet writers #5471 [parquet] (tustvold)
- Add DataType::Utf8View and DataType::BinaryView #5470 [parquet] [arrow] (XiangpengHao)
- Update base64 requirement from 0.21 to 0.22 #5467 [parquet] [arrow] [arrow-flight] (dependabot[bot])
- Minor: Fix formatting typo in `Field::new_list_field` #5464 [arrow] (alamb)
- Fix test_string_dictionary_merge (#5255) #5461 [arrow] (tustvold)
- Use Vec::from_iter in Buffer::from_iter #5460 [arrow] (Kikkon)
- Document parquet writer memory limiting (#5450) #5457 [parquet] (tustvold)
- Document UnionArray Panics #5456 [arrow] (Kikkon)
- fix: lexsort_to_indices unsupported mixed types with list #5455 [arrow] (alamb)
- Refine `Display` and `Source` implementation for error types #5439 [arrow] [arrow-flight] (BugenZhao)
- Improve debug output of Time32/Time64 arrays #5428 [arrow] (monkwire)
- Miri fix: Rename invalid_mut to without_provenance_mut #5418 [arrow] (Jefffrey)
- Ensure addition/multiplications in when allocating buffers don't overflow #5417 [arrow] (Jefffrey)
- Update Flight proto: PollFlightInfo & expiration time #5413 [arrow] [arrow-flight] (Jefffrey)
- Add tests for serializing lists of dictionary encoded values to json #5399 [arrow] (jhorstmann)
- Return null for overflow when casting string to integer under safe option enabled #5398 [arrow] (viirya)
- Propagate error instead of panic for `take_bytes` #5395 [arrow] (viirya)
- Improve like kernel by ~2% #5390 [arrow] (psvri)
- Enable running arrow-array and arrow-arith with miri and avoid strict provenance warning #5387 [arrow] (jhorstmann)
- Update to chrono 0.4.34 #5385 [arrow] (tustvold)
- Return error instead of panic when reading invalid Parquet metadata #5382 [parquet] (mmaitre314)
- Update tonic requirement from 0.10.0 to 0.11.0 #5380 [arrow] [arrow-flight] (dependabot[bot])
- Update tonic-build requirement from =0.10.2 to =0.11.0 #5379 [arrow] [arrow-flight] (dependabot[bot])
- Fix latest clippy lints #5376 [arrow] (tustvold)
- feat: utility functions for creating `FixedSizeList` and `LargeList` dtypes #5373 [arrow] (universalmind303)
- Minor(docs): update master to main for DataFusion/Ballista #5363 (caicancai)
- Return an error instead of a panic when reading a corrupted Parquet file with mismatched column counts #5362 [parquet] (mmaitre314)
- feat: support casting FixedSizeList with new child type #5360 [arrow] (wjones127)
- Add more debugging info to StructBuilder validate_content #5357 [arrow] (viirya)
- pyarrow: Preserve RecordBatch's schema metadata #5355 [arrow] (atwam)
- Mark Encoding::BIT_PACKED as deprecated and document its compatibility issues #5348 [parquet] (jhorstmann)
- Track the size of custom allocations for use via Array::get_buffer_memory_size #5347 [arrow] (jhorstmann)
- fix: Return an error on type mismatch rather than panic (#4995) #5341 [parquet] (carols10cents)
- Minor: support cast values to fixedsizelist #5340 [arrow] (Weijun-H)
- Enhance Time32/Time64 support in date_part #5337 [arrow] (Jefffrey)
- feat: add `take_record_batch`. #5333 [arrow] (RinChanNOWWW)
- Add ListBuilder::with_field to support non nullable list fields (#5330) #5331 [arrow] (tustvold)
- Don't omit schema metadata when removing column #5328 [arrow] (kylebarron)
- Update proc-macro2 requirement from =1.0.76 to =1.0.78 #5324 [arrow] [arrow-flight] (dependabot[bot])
- Enhance Date64 type documentation #5323 [arrow] (Jefffrey)
- fix panic when decode a group with no child #5322 [parquet] (Liyixin95)
- Minor/Doc Expand FlightSqlServiceClient::handshake doc #5321 [arrow] [arrow-flight] (devinjdangelo)
- Refactor temporal extract date part kernels #5319 [arrow] (Jefffrey)
- Add JSON writer benchmarks (#5314) #5317 [arrow] (tustvold)
- Bump actions/cache from 3 to 4 #5308 (dependabot[bot])
- Avro block decompression #5306 [arrow] (tustvold)
- Result into error in case of endianness mismatches #5301 [arrow] (pangiole)
- parquet: Add ArrowWriterOptions to skip embedding the arrow metadata #5299 [parquet] (evenyag)
- Add support for more fused boolean operations #5298 [arrow] (RTEnzyme)
- Support Parquet Byte Stream Split Encoding #5293 [parquet] (mwlon)
- Extend string parsing support for Date32 #5282 [arrow] (gruuya)
- Bring some methods over from ArrowWriter to the async version #5251 [parquet] (AdamGS)
50.0.0 (2024-01-08)
Breaking changes:
- Make regexp_match take scalar pattern and flag #5245 [arrow] (viirya)
- Use Vec in ColumnReader (#5177) #5193 [parquet] (tustvold)
- Remove SIMD Feature #5184 [arrow] (tustvold)
- Use Total Ordering for Aggregates and Refactor for Better Auto-Vectorization #5100 [arrow] (jhorstmann)
- Allow the `zip` compute function to operate on `Scalar` values via `Datum` #5086 [arrow] (Nathan-Fenner)
- Improve C Data Interface and Add Integration Testing Entrypoints #5080 [arrow] (pitrou)
- Parquet: read/write f16 for Arrow #5003 [parquet] (Jefffrey)
Implemented enhancements:
- Support get offsets or blocks info from arrow file. #5252 [arrow]
- Make regexp_match take scalar pattern and flag #5246 [arrow]
- Cannot access pen state website on arrow-row #5238 [arrow]
- RecordBatch with_schema's error message is hard to read #5227 [arrow]
- Support cast between StructArray. #5219 [arrow]
- Remove nightly-only simd feature and related code in ArrowNumericType #5185 [arrow]
- Use Vec instead of Slice in ColumnReader #5177 [parquet]
- Request to Memmap Arrow IPC files on disk #5153 [arrow]
- GenericColumnReader::read_records Yields Truncated Records #5150 [parquet]
- Nested Schema Projection #5148 [parquet] [arrow]
- Support specifying `quote` and `escape` in Csv `WriterBuilder` #5146 [arrow]
- Support casting of Float16 with other numeric types #5138 [arrow]
- Parquet: read parquet metadata with page index in async and with size hints #5129 [parquet]
- Cast from floating/timestamp to timestamp/floating #5122 [arrow]
- Support Casting List To/From LargeList in Cast Kernel #5113 [arrow]
- Expose a path for converting `bytes::Bytes` into `arrow_buffer::Buffer` without copy #5104 [arrow]
- API inconsistency of ListBuilder make it hard to use as nested builder #5098 [arrow]
- Parquet: don't truncate min/max statistics for float16 and decimal when writing file #5075 [parquet]
- Parquet: derive boundary order when writing columns #5074 [parquet]
- Support new Arrow PyCapsule Interface for Python FFI #5067 [arrow]
- `48.0.1` arrow patch release #5050 [parquet] [arrow]
- Binary columns do not receive truncated statistics #5037 [parquet]
- Re-evaluate Explicit SIMD Aggregations #5032 [arrow]
- Min/Max Kernels Should Use Total Ordering #5031 [arrow]
- Allow `zip` compute kernel to take `Scalar` / `Datum` #5011 [arrow]
- Add Float16/Half-float logical type to Parquet #4986 [parquet]
- feat: cast (Large)List to FixedSizeList #5081 [arrow] (wjones127)
- Update Parquet Encoding Documentation #5051 [parquet]
Fixed bugs:
- json schema inference can't handle null field turned into object field in subsequent rows #5215 [arrow]
- Invalid trailing content after `Z` in timezone is ignored #5182 [arrow]
- Take panics on a fixed size list array when given null indices #5169 [arrow]
- EnabledStatistics::Page does not take effect on ByteArrayEncoder #5162 [parquet]
- Parquet: ColumnOrder not being written when writing parquet files #5152 [parquet]
- Parquet: Interval columns shouldn't write min/max stats #5145 [parquet]
- cast `Utf8` to decimal failure #5127 [arrow]
- coerce_primitive not honored when decoding from serde object #5095 [arrow]
- Unsound MutableArrayData Constructor #5091 [arrow]
- RowGroupReader.get_row_iter() fails with Path ColumnPath not found #5064 [parquet]
- cast format 'yyyymmdd' to Date32 give a error #5044 [arrow]
Performance improvements:
Closed issues:
- Working example of list_flights with ObjectStore #5116
- (object_store) Error broken pipe on S3 multipart upload #5106
Merged pull requests:
- Update parquet object_store dependency to 0.9.0 #5290 [parquet] (tustvold)
- Update proc-macro2 requirement from =1.0.75 to =1.0.76 #5289 [arrow] [arrow-flight] (dependabot[bot])
- Enable JS tests again #5287 (domoritz)
- Update proc-macro2 requirement from =1.0.74 to =1.0.75 #5279 [arrow] [arrow-flight] (dependabot[bot])
- Update proc-macro2 requirement from =1.0.73 to =1.0.74 #5271 [arrow] [arrow-flight] (dependabot[bot])
- Update proc-macro2 requirement from =1.0.71 to =1.0.73 #5265 [arrow] [arrow-flight] (dependabot[bot])
- Update docs for datatypes #5260 [arrow] (Jefffrey)
- Don't suppress errors in ArrowArrayStreamReader #5256 [arrow] (tustvold)
- Add IPC FileDecoder #5249 [arrow] (tustvold)
- optimize the next function of ArrowArrayStreamReader #5248 [arrow] (doki23)
- ci: Fail Miri CI on first failure #5243 (Jefffrey)
- Remove 'unwrap' from Result #5241 [parquet] (zeevm)
- Update arrow-row docs URL #5239 [arrow] (thomas-k-cameron)
- Improve regexp kernels performance by avoiding cloning Regex #5235 [arrow] (viirya)
- Update proc-macro2 requirement from =1.0.70 to =1.0.71 #5231 [arrow] [arrow-flight] (dependabot[bot])
- Minor: Improve comments and errors for ArrowPredicate #5230 [parquet] (alamb)
- Bump actions/upload-pages-artifact from 2 to 3 #5229 (dependabot[bot])
- make with_schema's error more readable #5228 [arrow] (shuoli84)
- Use `try_new` when casting between structs to propagate error #5226 [arrow] (viirya)
- feat(cast): support cast between struct #5221 [arrow] (my-vegetable-has-exploded)
- Add `entries` to `MapBuilder` to return both key and value array builders #5218 [arrow] (viirya)
- fix(json): fix inferring object after field was null #5216 [arrow] (kskalski)
- Support MapBuilder in make_builder #5210 [arrow] (viirya)
- impl `From<OffsetBuffer<T>>` for `ScalarBuffer<T>` #5203 [arrow] (mbrobbel)
- impl `From<BufferBuilder<T>>` for `Buffer` #5202 [arrow] (mbrobbel)
- impl `From<BufferBuilder<T>>` for `ScalarBuffer<T>` #5201 [arrow] (mbrobbel)
- feat: Support quote and escape in Csv WriterBuilder #5196 [arrow] (my-vegetable-has-exploded)
- chore: simplify cast_string_to_interval #5195 [arrow] (jackwener)
- Clarify interval comparison behavior with documentation and tests #5192 [arrow] (alamb)
- Add `BooleanArray::into_parts` method #5191 [arrow] (mbrobbel)
- Fix deprecated note for `Buffer::from_raw_parts` #5190 [arrow] (mbrobbel)
- Fix: Ensure Timestamp Parsing Rejects Characters After 'Z' #5189 [arrow] (razeghi71)
- Simplify parquet statistics generation #5183 [parquet] (tustvold)
- Parquet: Ensure page statistics are written only when configured from the Arrow Writer #5181 [parquet] (AdamGS)
- Blockwise IO in IPC FileReader (#5153) #5179 [arrow] (tustvold)
- Replace ScalarBuffer in Parquet with Vec (#1849) (#5177) #5178 [parquet] (tustvold)
- Bump actions/setup-python from 4 to 5 #5175 (dependabot[bot])
- Add `LargeListBuilder` to `make_builder` #5171 [arrow] (viirya)
- fix: ensure take_fixed_size_list can handle null indices #5170 (westonpace)
- Removing redundant `as casts` in parquet #5168 [parquet] (psvri)
- Bump actions/labeler from 4.3.0 to 5.0.0 #5167 (dependabot[bot])
- improve: make RunArray displayable #5166 [arrow] (yukkit)
- ci: Add cargo audit CI action #5160 [arrow] (Jefffrey)
- Parquet: write column_orders in FileMetaData #5158 [parquet] (Jefffrey)
- Adding `is_null` datatype shortcut method #5157 [arrow] (comphead)
- Parquet: don't truncate f16/decimal min/max stats #5154 [parquet] (Jefffrey)
- Support nested schema projection (#5148) #5149 [arrow] (tustvold)
- Parquet: omit min/max for interval columns when writing stats #5147 [parquet] (Jefffrey)
- Deprecate Fields::remove and Schema::remove #5144 [arrow] (tustvold)
- Support casting of Float16 with other numeric types #5139 [arrow] (viirya)
- Parquet: Make `MetadataLoader` public #5137 [parquet] (AdamGS)
- Add FileReaderBuilder for arrow-ipc to allow reading large no. of column files #5136 [arrow] (Jefffrey)
- Parquet: clear metadata and project fields of ParquetRecordBatchStream::schema #5135 [parquet] (Jefffrey)
- JSON: write struct array nulls as null #5133 [arrow] (Jefffrey)
- Update proc-macro2 requirement from =1.0.69 to =1.0.70 #5131 [arrow] [arrow-flight] (dependabot[bot])
- Fix negative decimal string #5128 [arrow] (viirya)
- Cleanup list casting and support nested lists (#5113) #5124 [arrow] (tustvold)
- Cast from numeric/timestamp to timestamp/numeric #5123 [arrow] (viirya)
- Improve cast docs #5114 [arrow] (tustvold)
- Update prost-build requirement from =0.12.2 to =0.12.3 #5112 [arrow] [arrow-flight] (dependabot[bot])
- Parquet: derive boundary order when writing #5110 [parquet] (Jefffrey)
- Implementing `ArrayBuilder` for `Box<dyn ArrayBuilder>` #5109 [arrow] (viirya)
- Fix 'ColumnPath not found' error reading Parquet files with nested REPEATED fields #5102 [parquet] (mmaitre314)
- fix: coerce_primitive for serde decoded data #5101 [arrow] (fansehep)
- Extend aggregation benchmarks #5096 [arrow] (jhorstmann)
- Expand parquet crate overview doc #5093 [parquet] (mmaitre314)
- Ensure arrays passed to MutableArrayData have same type (#5091) #5092 [arrow] (tustvold)
- Update prost-build requirement from =0.12.1 to =0.12.2 #5088 [arrow] [arrow-flight] (dependabot[bot])
- Add FFI from_raw #5082 [arrow] (tustvold)
- [fix #5044] Support converting 'yyyymmdd' format to date #5078 [arrow] (Tangruilin)
- Enable truncation of binary statistics columns #5076 [parquet] (emcake)
49.0.0 (2023-11-07)
Breaking changes:
- Return row count when inferring schema from JSON #5008 [arrow] (asayers)
- Update object_store 0.8.0 #5043 [parquet] (tustvold)
Implemented enhancements:
- Cast from integer/timestamp to timestamp/integer #5039 [arrow]
- Support casting from integer to binary #5014 [arrow]
- Return row count when inferring schema from JSON #5007 [arrow]
- [FlightSQL] Allow custom commands in get-flight-info #4996 [arrow] [arrow-flight]
- Support `RecordBatch::remove_column()` and `Schema::remove_field()` #4952 [arrow]
- `arrow_json`: support `binary` deserialization #4945 [arrow]
- Support StructArray in Cast Kernel #4908 [arrow]
- There exists a `ParquetRecordWriter` proc macro in `parquet_derive`, but `ParquetRecordReader` is missing #4772 [parquet]
Fixed bugs:
- Regression when serializing large json numbers #5038 [arrow]
- RowSelection::intersection Produces Invalid RowSelection #5036 [parquet]
- Incorrect comment on arrow::compute::kernels::sort::sort_to_indices #5029 [arrow]
Documentation updates:
Merged pull requests:
- Parquet f32/f64 handle signed zeros in statistics #5048 [parquet] (Jefffrey)
- Fix serialization of large integers in JSON (#5038) #5042 [arrow] (tustvold)
- Fix RowSelection::intersection (#5036) #5041 [parquet] (tustvold)
- Cast from integer/timestamp to timestamp/integer #5040 [arrow] (viirya)
- doc: update comment on sort_to_indices to reflect correct ordering #5033 [arrow] (westonpace)
- Support casting from integer to binary #5015 [arrow] (viirya)
- Update tracing-log requirement from 0.1 to 0.2 #4998 [arrow] [arrow-flight] (dependabot[bot])
- feat(flight-sql): Allow custom commands in get-flight-info #4997 [arrow] [arrow-flight] (amartins23)
- [MINOR] No need to jump to web pages #4994 (smallzhongfeng)
- Support metadata in SchemaBuilder #4987 [arrow] (tustvold)
- feat: support schema change by idx and reverse #4985 [arrow] (fansehep)
- Bump actions/setup-node from 3 to 4 #4982 (dependabot[bot])
- Add arrow_cast::base64 and document usage in arrow_json #4975 [arrow] (tustvold)
- Add SchemaBuilder::remove (#4952) #4964 [arrow] (tustvold)
- Add `Field::remove()`, `Schema::remove()`, and `RecordBatch::remove_column()` APIs #4959 [arrow] (Folyd)
- Add `RecordReader` trait and proc macro to implement it for a struct #4773 [parquet] (Joseph-Rance)
48.0.0 (2023-10-18)
Breaking changes:
- Evaluate null_regex for string type in csv (now such values will be parsed as `Null` rather than `""`) #4942 [arrow] (haohuaijin)
- fix(csv)!: infer null for empty column. #4910 [arrow] (kskalski)
- feat: log headers/trailers in flight CLI (+ minor fixes) #4898 [arrow] [arrow-flight] (crepererum)
- fix(arrow-json)!: include null fields in schema inference with a type of Null #4894 [arrow] (kskalski)
- Mark OnCloseRowGroup Send #4893 [parquet] (devinjdangelo)
- Specialize Thrift Decoding (~40% Faster) (#4891) #4892 [parquet] (tustvold)
- Make ArrowRowGroupWriter Public and SerializedRowGroupWriter Send #4850 [parquet] (devinjdangelo)
Implemented enhancements:
- Allow schema fields to merge with `Null` datatype #4901 [arrow]
- Add option to FlightDataEncoder to always send dictionaries #4895 [arrow] [arrow-flight]
- Rework Thrift Encoding / Decoding of Parquet Metadata #4891 [parquet]
- Plans for supporting Extension Array to support Fixed shape tensor Array #4890
- Implement Take for UnionArray #4882 [arrow]
- Check precision overflow for casting floating to decimal #4865 [arrow]
- Replace lexical #4774 [arrow]
- Add read access to settings in `csv::WriterBuilder` #4735 [arrow]
- Improve the performance of "DictionaryValue" row encoding #4712 [arrow] [arrow-flight]
Fixed bugs:
- Should we make blank values and empty string to `None` in csv? #4939 [arrow]
- [FlightSQL] SubstraitPlan structure is not exported #4932 [arrow] [arrow-flight]
- Loading page index breaks skipping of pages with nested types #4921 [parquet]
- CSV schema inference assumes `Utf8` for empty columns #4903 [arrow]
- parquet: Field Ids are not read from a Parquet file without serialized arrow schema #4877 [parquet]
- make_primitive_scalar function loses DataType Internal information #4851 [arrow]
- StructBuilder doesn't handle nulls correctly for empty structs #4842 [arrow]
- `NullArray::is_null()` returns `false` incorrectly #4835 [arrow]
- cast_string_to_decimal should check precision overflow #4829 [arrow]
- Null fields are omitted by `infer_json_schema_from_seekable` #4814 [arrow]
Closed issues:
Merged pull requests:
- Assume Pages Delimit Records When Offset Index Loaded (#4921) #4943 [parquet] (tustvold)
- Update pyo3 requirement from 0.19 to 0.20 #4941 [arrow] (crepererum)
- Add `FileWriter` schema getter #4940 [arrow] (haixuanTao)
- feat: support parsing for parquet writer option #4938 [parquet] (fansehep)
- Export `SubstraitPlan` structure in arrow_flight::sql (#4932) #4933 [arrow] [arrow-flight] (amartins23)
- Update zstd requirement from 0.12.0 to 0.13.0 #4923 [parquet] [arrow] (dependabot[bot])
- feat: add method for async read bloom filter #4917 [parquet] (hengfeiyang)
- Minor: Clarify rationale for `FlightDataEncoder` API, add examples #4916 [arrow] [arrow-flight] (alamb)
- Update regex-syntax requirement from 0.7.1 to 0.8.0 #4914 [arrow] (dependabot[bot])
- feat: document & streamline flight SQL CLI #4912 [arrow] [arrow-flight] (crepererum)
- Support Arbitrary JSON values in JSON Reader (#4905) #4911 [arrow] (tustvold)
- Cleanup CSV WriterBuilder, Default to AutoSI Second Precision (#4735) #4909 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.68 to =1.0.69 #4907 [arrow] [arrow-flight] (dependabot[bot])
- chore: add csv example #4904 [arrow] (fansehep)
- feat(schema): allow null fields to be merged with other datatypes #4902 [arrow] (kskalski)
- Update proc-macro2 requirement from =1.0.67 to =1.0.68 #4900 [arrow] [arrow-flight] (dependabot[bot])
- Add option to `FlightDataEncoder` to always resend batch dictionaries #4896 [arrow] [arrow-flight] (alexwilcoxson-rel)
- Fix integration tests #4889 (tustvold)
- Support Parsing Avro File Headers #4888 (tustvold)
- Support parquet bloom filter length #4885 [parquet] (letian-jiang)
- Replace lz4 with lz4_flex Allowing Compilation for WASM #4884 [parquet] [arrow] (tustvold)
- Implement Take for UnionArray #4883 [arrow] (avantgardnerio)
- Update tonic-build requirement from =0.10.1 to =0.10.2 #4881 [arrow] [arrow-flight] (dependabot[bot])
- parquet: Read field IDs from Parquet Schema #4878 [parquet] (Samrose-Ahmed)
- feat: improve flight CLI error handling #4873 [arrow] [arrow-flight] (crepererum)
- Support Encoding Parquet Columns in Parallel #4871 [parquet] (tustvold)
- Check precision overflow for casting floating to decimal #4866 [arrow] (viirya)
- Make align_buffers as public API #4863 [arrow] (viirya)
- Enable new integration tests (#4828) #4862 (tustvold)
- Faster Serde Integration (~80% faster) #4861 [arrow] (tustvold)
- fix: make_primitive_scalar bug #4852 [arrow] (JasonLi-cn)
- Update tonic-build requirement from =0.10.0 to =0.10.1 #4846 [arrow] [arrow-flight] (dependabot[bot])
- Allow Constructing Non-Empty StructArray with no Fields (#4842) #4845 [arrow] (tustvold)
- Refine documentation to `Array::is_null` #4838 [arrow] (alamb)
- fix: add missing precision overflow checking for `cast_string_to_decimal` #4830 [arrow] (jonahgao)
47.0.0 (2023-09-19)
Breaking changes:
- Make FixedSizeBinaryArray value_data return a reference #4820 [arrow]
- Update prost to v0.12.1 #4825 [arrow] [arrow-flight] (tustvold)
- feat: FixedSizeBinaryArray::value_data return reference #4821 [arrow] (wjones127)
- Stateless Row Encoding / Don't Preserve Dictionaries in `RowConverter` (#4811) #4819 [arrow] [arrow-flight] (tustvold)
- fix: entries field is non-nullable #4808 [arrow] (wjones127)
- Fix flight sql do put handling, add bind parameter support to FlightSQL cli client #4797 [arrow] [arrow-flight] (suremarc)
- Remove unused dyn_cmp_dict feature #4766 [arrow] (tustvold)
- Add underlying `std::io::Error` to `IoError` and add `IpcError` variant #4726 [arrow] [arrow-flight] (alexandreyc)
Implemented enhancements:
- Row Format Adaptive Block Size #4812 [arrow]
- Stateless Row Conversion #4811 [arrow] [arrow-flight]
- Add option to specify custom null values for CSV reader #4794 [arrow]
- parquet::record::RowIter cannot be customized with batch_size and defaults to 1024 #4782 [parquet]
- `DynScalar` abstraction (something that makes it easy to create scalar `Datum`s) #4781 [arrow]
- `Datum` is not exported as part of `arrow` (it is only exported in `arrow_array`) #4780 [arrow]
- `Scalar` is not exported as part of `arrow` (it is only exported in `arrow_array`) #4779 [arrow]
- Support IntoPyArrow for impl RecordBatchReader #4730 [arrow]
- Datum Based String Kernels #4595 [arrow] [arrow-flight]
Fixed bugs:
- MapArray::new_from_strings creates nullable entries field #4807 [arrow]
- pyarrow module can't roundtrip tensor arrays #4805 [arrow]
- `concat_batches` errors with "schema mismatch" error when only metadata differs #4799 [arrow]
- panic in `cmp` kernels with DictionaryArrays: `Option::unwrap()` on a `None` value #4788 [arrow]
- stream ffi panics if schema metadata values aren't valid utf8 #4750 [arrow]
- Regression: Incorrect Sorting of `*ListArray` in 46.0.0 #4746 [arrow]
- Row is no longer comparable after reuse #4741 [arrow]
- DoPut FlightSQL handler inadvertently consumes schema at start of Request<Streaming<FlightData>> #4658
- Return error when converting schema #4752 [arrow] (wjones127)
- Implement PyArrowType for `Box<dyn RecordBatchReader + Send>` #4751 [arrow] (wjones127)
Closed issues:
- Building arrow-rust for target wasm32-wasi failed to compile packed_simd_2 #4717
Merged pull requests:
- Respect FormatOption::nulls for NullArray #4836 [arrow] (tustvold)
- Fix merge_dictionary_values in selection kernels #4833 [arrow] (tustvold)
- Fix like scalar null #4832 [arrow] (tustvold)
- More chrono deprecations #4822 [arrow] (tustvold)
- Adaptive Row Block Size (#4812) #4818 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.66 to =1.0.67 #4816 [arrow] [arrow-flight] (dependabot[bot])
- Do not check schema for equality in concat_batches #4815 [arrow] (alamb)
- fix: export record batch through stream #4806 [arrow] (wjones127)
- Improve CSV Reader Benchmark Coverage of Small Primitives #4803 [arrow] (tustvold)
- csv: Add option to specify custom null values #4795 [arrow] (vrongmeal)
- Expand docstring and add example to `Scalar` #4793 [arrow] (alamb)
- Re-export array crate root (#4780) (#4779) #4791 [arrow] (tustvold)
- Fix DictionaryArray::normalized_keys (#4788) #4789 [arrow] (tustvold)
- Allow custom tree builder for parquet::record::RowIter #4783 [parquet] (YuraKotov)
- Bump actions/checkout from 3 to 4 #4767 (dependabot[bot])
- fix: avoid panic if offset index not exists. #4761 [parquet] (RinChanNOWWW)
- Relax constraints on PyArrowType #4757 (tustvold)
- Chrono deprecations #4748 [arrow] (tustvold)
- Fix List Sorting, Revert Removal of Rank Kernels #4747 [arrow] (tustvold)
- Clear row buffer before reuse #4742 [arrow] (yjshen)
- Datum based like kernels (#4595) #4732 [arrow] [arrow-flight] (tustvold)
- feat: expose DoGet response headers & trailers #4727 [arrow] [arrow-flight] (crepererum)
- Cleanup length and bit_length kernels #4718 [arrow] (tustvold)
46.0.0 (2023-08-21)
Breaking changes:
- API improvement: `batches_to_flight_data` forces clone #4656 [arrow]
- Add AnyDictionary Abstraction and Take ArrayRef in DictionaryArray::with_values #4707 [arrow] (tustvold)
- Cleanup parquet type builders #4706 [parquet] (tustvold)
- Take kernel dyn Array #4705 [arrow] (tustvold)
- Improve ergonomics of Scalar #4704 [arrow] (tustvold)
- Datum based comparison kernels (#4596) #4701 [parquet] [arrow] [arrow-flight] (tustvold)
- Improve `Array` Logical Nullability #4691 [parquet] [arrow] (tustvold)
- Validate ArrayData Buffer Alignment and Automatically Align IPC buffers (#4255) #4681 [arrow] (tustvold)
- More intuitive bool-to-string casting #4666 [arrow] (fsdvh)
- enhancement: batches_to_flight_data use a schema ref as param. #4665 [arrow] [arrow-flight] (jackwener)
- fix: from_thrift avoid panic when stats in invalid. #4642 [parquet] (jackwener)
- bug: Add some missing field in row group metadata: ordinal, total co… #4636 [parquet] (liurenjie1024)
- Remove deprecated limit kernel #4597 [arrow] (tustvold)
Implemented enhancements:
- parquet: support setting the field_id with an ArrowWriter #4702 [parquet]
- Support references in i256 arithmetic ops #4694 [arrow]
- Precision-Loss Decimal Arithmetic #4664 [arrow]
- Faster i256 Division #4663 [arrow]
- Support `concat_batches` for 0 columns #4661 [arrow]
- `filter_record_batch` should support filtering record batch without columns #4647 [arrow]
- Improve speed of `lexicographical_partition_ranges` #4614 [arrow]
- object_store: multipart ranges for HTTP #4612
- Add Rank Function #4606 [arrow]
- Datum Based Comparison Kernels #4596 [parquet] [arrow] [arrow-flight]
- Convenience method to create `DataType::List` correctly #4544 [arrow]
- Remove Deprecated Arithmetic Kernels #4481 [arrow]
- Equality kernel where null==null gives true #4438 [arrow]
Fixed bugs:
- Parquet ArrowWriter Ignores Nulls in Dictionary Values #4690 [parquet] [arrow]
- Schema Nullability Validation Fails to Account for Dictionary Nulls #4689 [parquet] [arrow]
- Comparison Kernels Ignore Nulls in Dictionary Values #4688 [parquet] [arrow]
- Casting List to String Ignores Format Options #4669 [arrow]
- Double free in C Stream Interface #4659 [arrow]
- CI Failing On Packed SIMD #4651 [arrow]
- `RowInterner::size()` much too low for high cardinality dictionary columns #4645 [arrow]
- Decimal PrimitiveArray change datatype after try_unary #4644
- Better explanation in docs for Dictionary field encoding using RowConverter #4639 [arrow]
- `List(FixedSizeBinary)` array equality check may return wrong result #4637 [arrow]
- `arrow::compute::nullif` panics if `NullArray` is provided #4634 [arrow]
- Empty lists in FixedSizeListArray::try_new is not handled #4623 [arrow]
- Bounds checking in `MutableBuffer::set_null_bits` can be bypassed #4620 [arrow]
- TypedDictionaryArray Misleading Null Behaviour #4616 [parquet] [arrow]
- bug: Parquet writer missing row group metadata fields such as `compressed_size`, `file offset`. #4610 [parquet]
- `new_null_array` generates an invalid union array #4600 [arrow]
- Footer parsing fails for very large parquet file. #4592 [parquet]
- bug(parquet): Disabling global statistics but enabling for particular column breaks reading #4587 [parquet]
- `arrow::compute::concat` panics for dense union arrays with non-trivial type IDs #4578 [arrow]
Closed issues:
- [object_store] when Create a AmazonS3 instance work with MinIO without set endpoint got error MissingRegion #4617
Merged pull requests:
- Add distinct kernels (#960) (#4438) #4716 [arrow] (tustvold)
- Update parquet object_store 0.7 #4715 [parquet] (tustvold)
- Support Field ID in ArrowWriter (#4702) #4710 [parquet] (tustvold)
- Remove rank kernels #4703 [arrow] (tustvold)
- Support references in i256 arithmetic ops #4692 [arrow] (viirya)
- Cleanup DynComparator (#2654) #4687 [arrow] (tustvold)
- Separate metadata fetch from `ArrowReaderBuilder` construction (#4674) #4676 [parquet] (tustvold)
- cleanup some assert() with error propagation #4673 [parquet] (zeevm)
- Faster i256 Division (2-100x) (#4663) #4672 [arrow] (tustvold)
- Fix MSRV CI #4671 (tustvold)
- Fix equality of nested nullable FixedSizeBinary (#4637) #4670 [arrow] (tustvold)
- Use ArrayFormatter in cast kernel #4668 [arrow] (tustvold)
- Minor: Improve API docs for FlightSQL metadata builders #4667 [arrow] [arrow-flight] (alamb)
- Support `concat_batches` for 0 columns #4662 [arrow] (Dandandan)
- fix ownership of c stream error #4660 [arrow] (wjones127)
- Minor: Fix illustration for dict encoding #4657 [arrow] (JayjeetAtGithub)
- minor: move comment to the correct location #4655 [arrow] (jackwener)
- Update packed_simd and run miri tests on simd code #4654 [arrow] (jhorstmann)
- impl `From<Vec<T>>` for `BufferBuilder` and `MutableBuffer` #4650 [arrow] (mbrobbel)
- Filter record batch with 0 columns #4648 [arrow] (Dandandan)
- Account for child `Bucket` size in OrderPreservingInterner #4646 [arrow] (alamb)
- Implement `Default`, `Extend` and `FromIterator` for `BufferBuilder` #4638 [arrow] (mbrobbel)
- fix(select): handle `NullArray` in `nullif` #4635 [arrow] (kawadakk)
- Move `BufferBuilder` to `arrow-buffer` #4630 [arrow] (mbrobbel)
- allow zero sized empty fixed #4626 [arrow] (smiklos)
- fix: compute_dictionary_mapping use wrong offsetSize #4625 [arrow] (jackwener)
- impl `FromIterator` for `MutableBuffer` #4624 [arrow] (mbrobbel)
- expand docs for FixedSizeListArray #4622 [arrow] (smiklos)
- fix(buffer): panic on end index overflow in `MutableBuffer::set_null_bits` #4621 [arrow] (kawadakk)
- impl `Default` for `arrow_buffer::buffer::MutableBuffer` #4619 [arrow] (mbrobbel)
- Minor: improve docs and add example for lexicographical_partition_ranges #4615 [arrow] (alamb)
- Cleanup sort #4613 [arrow] (tustvold)
- Add rank function (#4606) #4609 [arrow] (tustvold)
- Add more docs and examples for ListArray and OffsetsBuffer #4607 [arrow] (alamb)
- Simplify dictionary sort #4605 [arrow] (tustvold)
- Consolidate sort benchmarks #4604 [arrow] (tustvold)
- Don't Reorder Nulls in sort_to_indices (#4545) #4603 [arrow] (tustvold)
- fix(data): create child arrays of correct length when building a sparse union null array #4601 [arrow] (kawadakk)
- Use u32 metadata_len when parsing footer of parquet. #4599 [parquet] (Berrysoft)
- fix(data): map type ID to child index before indexing a union child array #4598 [arrow] (kawadakk)
- Remove deprecated arithmetic kernels (#4481) #4594 [arrow] (tustvold)
- Test Disabled Page Statistics (#4587) #4589 [parquet] (tustvold)
- Cleanup ArrayData::buffers #4583 [arrow] (tustvold)
- Use contains_nulls in ArrayData equality of byte arrays #4582 [arrow] (tustvold)
- Vectorized lexicographical_partition_ranges (~80% faster) #4575 [arrow] (tustvold)
- chore: add datatype new_list #4561 [arrow] (fansehep)
45.0.0 (2023-07-30)
Breaking changes:
- Fix timezoned timestamp arithmetic #4546 [arrow] (alexandreyc)
Implemented enhancements:
- Use FormatOptions in Const Contexts #4580 [arrow]
- Human Readable Duration Display #4554 [arrow]
- `BooleanBuilder`: Add `validity_slice` method for accessing validity bits #4535 [arrow]
- Support `FixedSizedListArray` for `length` kernel #4517 [arrow]
- `RowConverter::convert` that targets an existing `Rows` #4479 [arrow]
Fixed bugs:
- Panic `assertion failed: idx < self.len` when casting DictionaryArrays with nulls #4576 [arrow]
- arrow-arith is_null is buggy with NullArray #4565 [arrow]
- Incorrect Interval to Duration Casting #4553 [arrow]
- Too large validity buffer pre-allocation in `FixedSizeListBuilder::new` #4549 [arrow]
- Like with wildcards fail to match fields with new lines. #4547 [arrow]
- Timestamp Interval Arithmetic Ignores Timezone #4457 [arrow]
Merged pull requests:
- refactor: simplify hour_dyn() with time_fraction_dyn() #4588 [arrow] (jackwener)
- Move from_iter_values to GenericByteArray #4586 [arrow] (tustvold)
- Mark GenericByteArray::new_unchecked unsafe #4584 [arrow] (tustvold)
- Configurable Duration Display #4581 [arrow] (tustvold)
- Fix take_bytes Null and Overflow Handling (#4576) #4579 [arrow] (tustvold)
- Move chrono-tz arithmetic tests to integration #4571 [arrow] (tustvold)
- Write Page Offset Index For All-Nan Pages #4567 [parquet] (MachaelLee)
- support NullArray in arith/boolean kernel #4566 [arrow] (smiklos)
- Remove Sync from arrow-flight example #4564 [arrow] [arrow-flight] (tustvold)
- Fix interval to duration casting (#4553) #4562 [arrow] (tustvold)
- docs: fix wrong parameter name #4559 [parquet] (SteveLauC)
- Fix FixedSizeListBuilder capacity (#4549) #4552 [arrow] (tustvold)
- docs: fix wrong inline code snippet in parquet document #4550 [parquet] (SteveLauC)
- fix multiline wildcard likes (fixes #4547) #4548 [arrow] (nl5887)
- Provide default `is_empty` impl for `arrow::array::ArrayBuilder` #4543 [arrow] (mbrobbel)
- Add RowConverter::append (#4479) #4541 [arrow] (tustvold)
- Clarify GenericColumnReader::read_records #4540 [parquet] (tustvold)
- Initial loongarch port #4538 [arrow] (xiangzhai)
- Update proc-macro2 requirement from =1.0.64 to =1.0.66 #4537 [arrow] [arrow-flight] (dependabot[bot])
- add a validity slice access for boolean array builders #4536 [arrow] (ChristianBeilschmidt)
- use new num version instead of explicit num-complex dependency #4532 [arrow] (mwlon)
- feat: Support `FixedSizedListArray` for `length` kernel #4520 [arrow] (Weijun-H)
44.0.0 (2023-07-14)
Breaking changes:
- Use Parser for cast kernel (#4512) #4513 [arrow] (tustvold)
- Add Datum based arithmetic kernels (#3999) #4465 [arrow] (tustvold)
Implemented enhancements:
- eq_dyn_binary_scalar should support FixedSizeBinary types #4491 [arrow]
- Port Tests from Deprecated Arithmetic Kernels #4480 [arrow]
- Implement RecordBatchReader for Boxed trait object #4474 [arrow]
- Support `Date - Date` kernel #4383 [arrow]
- Default FlightSqlService Implementations #4372 [arrow] [arrow-flight]
Fixed bugs:
- Parquet: `AsyncArrowWriter` to a file corrupts the footer for large columns #4526 [parquet]
- [object_store] Failure to send bytes to azure #4522
- Cannot cast string '2021-01-02' to value of Date64 type #4512 [arrow]
- Incorrect Interval Subtraction #4489 [arrow]
- Interval Negation Incorrect #4488 [arrow]
- Parquet: AsyncArrowWriter inner buffer is not correctly limited and causes OOM #4477 [parquet]
Merged pull requests:
- Fix AsyncArrowWriter flush for large buffer sizes (#4526) #4527 [parquet] (tustvold)
- Cleanup cast_primitive_to_list #4511 [arrow] (tustvold)
- Bump actions/upload-pages-artifact from 1 to 2 #4508 (dependabot[bot])
- Support Date - Date (#4383) #4504 [arrow] (tustvold)
- Bump actions/labeler from 4.2.0 to 4.3.0 #4501 (dependabot[bot])
- Update proc-macro2 requirement from =1.0.63 to =1.0.64 #4500 [arrow] [arrow-flight] (dependabot[bot])
- Add negate kernels (#4488) #4494 [arrow] (tustvold)
- Add Datum Arithmetic tests, Fix Interval Subtraction (#4480) #4493 [arrow] (tustvold)
- support FixedSizeBinary types in eq_dyn_binary_scalar/neq_dyn_binary_scalar #4492 [arrow] (maxburke)
- Add default implementations to the FlightSqlService trait #4485 [arrow] [arrow-flight] (rossjones)
- add num-complex requirement #4482 [arrow] (mwlon)
- fix incorrect buffer size limiting in parquet async writer #4478 [parquet] (richox)
- feat: support RecordBatchReader on boxed trait objects #4475 [arrow] (wjones127)
- Improve in-place primitive sorts by 13-67% #4473 [arrow] (psvri)
- Add Scalar/Datum abstraction (#1047) #4393 [arrow] (tustvold)
43.0.0 (2023-06-30)
Breaking changes:
- Simplify ffi import/export #4447 [arrow] (Virgiel)
- Return Result from Parquet Row APIs #4428 [parquet] (zeevm)
- Remove Binary Dictionary Arithmetic Support #4407 [arrow] (tustvold)
Implemented enhancements:
- Request: a way to copy a `Row` to `Rows` #4466 [arrow]
- Reuse schema when importing from FFI #4444 [arrow]
- [FlightSQL] Allow implementations of `FlightSqlService` to handle custom actions and commands #4439
- Support `NullBuilder` #4429 [arrow]
Fixed bugs:
- Regression in parquet `42.0.0`: Bad parquet column indexes for All Null Columns, resulting in `Parquet error: StructArrayReader out of sync` on read #4459 [parquet]
- Regression in 42.0.0: Parsing fractional intervals without leading 0 is not supported #4424 [arrow]
Documentation updates:
Merged pull requests:
- Append Row to Rows (#4466) #4470 [arrow] (tustvold)
- feat(flight-sql): Allow implementations of FlightSqlService to handle custom actions and commands #4463 [arrow] [arrow-flight] (amartins23)
- Docs: Add clearer API doc links #4461 [parquet] [arrow] [arrow-flight] (alamb)
- Fix empty offset index for all null columns (#4459) #4460 [parquet] (tustvold)
- Bump peaceiris/actions-gh-pages from 3.9.2 to 3.9.3 #4455 (dependabot[bot])
- Convince the compiler to auto-vectorize the range check in parquet DictionaryBuffer #4453 [parquet] (jhorstmann)
- fix docs deployment #4452 [parquet] [arrow] (xxchan)
- Update indexmap requirement from 1.9 to 2.0 #4451 [arrow] (dependabot[bot])
- Update proc-macro2 requirement from =1.0.60 to =1.0.63 #4450 [arrow] [arrow-flight] (dependabot[bot])
- Bump actions/deploy-pages from 1 to 2 #4449 (dependabot[bot])
- Revise error message in From<Buffer> for ScalarBuffer #4446 [arrow] (viirya)
- minor: remove useless mut #4443 [parquet] [arrow] (jackwener)
- unify substring for binary&utf8 #4442 [arrow] (jackwener)
- Casting fixedsizelist to list/largelist #4433 [arrow] (jayzhan211)
- feat: support `NullBuilder` #4430 [arrow] (izveigor)
- Remove Float64 -> Float32 cast in IPC Reader #4427 [arrow] (ming08108)
- Parse intervals like `.5` the same as `0.5` #4425 [arrow] (alamb)
- feat: add strict mode to json reader #4421 [arrow] (blinkseb)
- Add DictionaryArray::occupancy #4415 [arrow] (tustvold)
42.0.0 (2023-06-16)
Breaking changes:
- Remove 64-bit to 32-bit Cast from IPC Reader #4412 [arrow] (ming08108)
- Truncate Min/Max values in the Column Index #4389 [parquet] (AdamGS)
- feat(flight): harmonize server metadata APIs #4384 [arrow] [arrow-flight] (roeap)
- Move record delimiting into ColumnReader (#4365) #4376 [parquet] (tustvold)
- Changed array_to_json_array to take &dyn Array #4370 [arrow] (dadepo)
- Make PrimitiveArray::with_timezone consuming #4366 [parquet] [arrow] (tustvold)
Implemented enhancements:
- Add doc example of constructing a MapArray #4385 [arrow]
- Support `millisecond` and `microsecond` functions #4374 [arrow]
- Changed array_to_json_array to take &dyn Array #4369 [arrow]
- compute::ord kernel for getting min and max of two scalar/array values #4347 [arrow]
- Release 41.0.0 of arrow/arrow-flight/parquet/parquet-derive #4346
- Refactor CAST tests to use new cast array syntax #4336 [arrow]
- pass bytes directly to parquet's KeyValue #4317
- PyArrow conversions could return TypeError if provided incorrect Python type #4312 [arrow]
- Have array_to_json_array support Map #4297 [arrow]
- FlightSQL: Add helpers to create `CommandGetXdbcTypeInfo` responses (`XdbcInfoValue` and builders) #4257 [arrow] [arrow-flight]
- Have array_to_json_array support FixedSizeList #4248 [arrow]
- Truncate ColumnIndex ByteArray Statistics #4126 [parquet]
- Arrow compute kernel regards selection vector #4095 [arrow]
Fixed bugs:
- Wrongly calculated data compressed length in IPC writer #4410 [arrow]
- Take Kernel Handles Nullable Indices Incorrectly #4404 [arrow]
- StructBuilder::new Doesn't Validate Builder DataTypes #4397 [arrow]
- Parquet error: Not all children array length are the same! when using RowSelection to read a parquet file #4396
- RecordReader::skip_records Is Incorrect for Repeated Columns #4368 [parquet]
- List-of-String Array panics in the presence of row filters #4365 [parquet]
- Fail to read block compressed gzip files with parquet-fromcsv #4173 [parquet]
Closed issues:
- Have a parquet file not able to be deduped via arrow-rs, complains about Decimal precision? #4356
- Question: Could we move `dict_id, dict_is_ordered` into DataType? #4325
Merged pull requests:
- Fix reading gzip file with multiple gzip headers in parquet-fromcsv. #4419 [parquet] (ghuls)
- Cleanup nullif kernel #4416 [arrow] (tustvold)
- Fix bug in IPC logic that determines if the buffer should be compressed or not #4411 [arrow] (lwpyr)
- Faster unpacking of Int32Type dictionary #4406 [arrow] (tustvold)
- Improve `take` kernel performance on primitive arrays, fix bad null index handling (#4404) #4405 [arrow] (tustvold)
- More take benchmarks #4403 [arrow] (tustvold)
- Add `BooleanBuffer::new_unset` and `BooleanBuffer::new_set` and `BooleanArray::new_null` constructors #4402 [arrow] (tustvold)
- Add PrimitiveBuilder type constructors #4401 [arrow] (tustvold)
- StructBuilder Validate Child Data (#4397) #4400 [arrow] (tustvold)
- Faster UTF-8 truncation #4399 [parquet] (tustvold)
- Minor: Derive `Hash` impls for `CastOptions` and `FormatOptions` #4395 [arrow] (alamb)
- Fix typo in README #4394 [arrow] [arrow-flight] (okue)
- Improve parquet `WriterProperties` and `ReaderProperties` docs #4392 [parquet] (alamb)
- Cleanup downcast macros #4391 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.59 to =1.0.60 #4388 [arrow] [arrow-flight] (dependabot[bot])
- Consolidate ByteArray::from_iterator #4386 [arrow] (tustvold)
- Add MapArray constructors and doc example #4382 [arrow] (tustvold)
- Documentation Improvements #4381 [arrow] (tustvold)
- Add NullBuffer and BooleanBuffer From conversions #4380 [arrow] (tustvold)
- Add more examples of constructing Boolean, Primitive, String, and Decimal Arrays, and From impl for i256 #4379 [arrow] (alamb)
- Add ListArrayReader benchmarks #4378 [parquet] (tustvold)
- Update comfy-table requirement from 6.0 to 7.0 #4377 [arrow] (dependabot[bot])
- feat: Add `microsecond` and `millisecond` kernels #4375 [arrow] (izveigor)
- Update hashbrown requirement from 0.13 to 0.14 #4373 [parquet] [arrow] (dependabot[bot])
- minor: use as_boolean to resolve TODO #4367 [arrow] (jackwener)
- Have array_to_json_array support MapArray #4364 [arrow] (dadepo)
- deprecate: as_decimal_array #4363 [arrow] (izveigor)
- Add support for FixedSizeList in array_to_json_array #4361 [arrow] (dadepo)
- refact: use as_primitive in cast.rs test #4360 [arrow] (Weijun-H)
- feat(flight): add xdbc type info helpers #4359 [arrow] [arrow-flight] (roeap)
- Minor: float16 to json #4358 [arrow] (izveigor)
- Raise TypeError on PyArrow import #4316 [arrow] (wjones127)
- Arrow Cast: Fixed Point Arithmetic for Interval Parsing #4291 [arrow] (mr-brobot)
41.0.0 (2023-06-02)
Breaking changes:
- Rename list contains kernels to in_list (#4289) #4342 [parquet] [arrow] (tustvold)
- Move BooleanBufferBuilder and NullBufferBuilder to arrow_buffer #4338 [arrow] (tustvold)
- Add separate row_count and level_count to PageMetadata (#4321) #4326 [parquet] (tustvold)
- Treat legacy TIMESTAMP_X converted types as UTC #4309 [parquet] (sergiimk)
- Simplify parquet PageIterator #4306 [parquet] (tustvold)
- Add Builder style APIs and docs for `FlightData`, `FlightInfo`, `FlightEndpoint`, `Location` and `Ticket` #4294 [arrow] [arrow-flight] (alamb)
- Make GenericColumnWriter Send #4287 [parquet] (tustvold)
- feat: update flight-sql to latest specs #4250 [arrow] [arrow-flight] (roeap)
- feat(api!): make ArrowArrayStreamReader Send #4232 [arrow] (wjones127)
Implemented enhancements:
- Make SerializedRowGroupReader::new() Public #4330 [parquet]
- Speed up i256 division and remainder operations #4302 [arrow]
- export function parquet_to_array_schema_and_fields #4298 [parquet]
- FlightSQL: add helpers to create `CommandGetCatalogs`, `CommandGetSchemas`, and `CommandGetTables` requests #4295 [arrow] [arrow-flight]
- Make ColumnWriter Send #4286 [parquet]
- Add Builder for `FlightInfo` to make it easier to create new requests #4281 [arrow] [arrow-flight]
- Support Writing/Reading Decimal256 to/from Parquet #4264 [parquet]
- FlightSQL: Add helpers to create `CommandGetSqlInfo` responses (`SqlInfoValue` and builders) #4256 [arrow] [arrow-flight]
- Update flight-sql implementation to latest specs #4249 [arrow] [arrow-flight]
- Make ArrowArrayStreamReader Send #4222 [arrow]
- Support writing FixedSizeList to Parquet #4214 [parquet]
- Cast between `Intervals` #4181 [arrow]
- Splice Parquet Data #4155 [parquet]
- CSV Schema More Flexible Timestamp Inference #4131 [arrow]
Fixed bugs:
- Doc for arrow_flight::sql is missing enums that are Xdbc related #4339 [arrow] [arrow-flight]
- concat_batches panics with total_len <= bit_len assertion for records with lists #4324 [arrow]
- Incorrect PageMetadata Row Count returned for V1 DataPage #4321 [parquet]
- [parquet] Not following the spec for TIMESTAMP_MILLIS legacy converted types #4308 [parquet]
- ambiguous glob re-exports of contains_utf8 #4289 [parquet] [arrow]
- flight_sql_client --header "key: value" yields a value with a leading whitespace #4270 [arrow] [arrow-flight]
- Casting Timestamp to date is off by one day for dates before 1970-01-01 #4211 [arrow]
Merged pull requests:
- Don't infer 16-byte decimal as decimal256 #4349 [parquet] (tustvold)
- Fix MutableArrayData::extend_nulls (#1230) #4343 [arrow] (tustvold)
- Update FlightSQL metadata locations, names and docs #4341 [arrow] [arrow-flight] (alamb)
- chore: expose Xdbc related FlightSQL enums #4340 [arrow] [arrow-flight] (appletreeisyellow)
- Update pyo3 requirement from 0.18 to 0.19 #4335 [arrow] (dependabot[bot])
- Skip unnecessary null checks in MutableArrayData #4333 [arrow] (tustvold)
- feat: add read parquet by custom rowgroup examples #4332 [parquet] (sundy-li)
- Make SerializedRowGroupReader::new() public #4331 [parquet] (burmecia)
- Don't split record across pages (#3680) #4327 [parquet] (tustvold)
- fix date conversion if timestamp below unixtimestamp #4323 [arrow] (comphead)
- Short-circuit on exhausted page in skip_records #4320 [parquet] (tustvold)
- Handle trailing padding when skipping repetition levels (#3911) #4319 [parquet] (tustvold)
- Use `page_size` consistently, deprecate `pagesize` in parquet WriterProperties #4313 [parquet] (alamb)
- Add roundtrip tests for Decimal256 and fix issues (#4264) #4311 [parquet] (tustvold)
- Expose page-level arrow reader API (#4298) #4307 [parquet] (tustvold)
- Speed up i256 division and remainder operations #4303 [arrow] (viirya)
- feat(flight): support int32_to_int32_list_map in sql infos #4300 [arrow] [arrow-flight] (roeap)
- feat(flight): add helpers to handle `CommandGetCatalogs`, `CommandGetSchemas`, and `CommandGetTables` requests #4296 [arrow] [arrow-flight] (roeap)
- Improve docs and tests for `SqlInfoList` #4293 [arrow] [arrow-flight] (alamb)
- minor: fix arrow_row docs.rs links #4292 [arrow] (roeap)
- Update proc-macro2 requirement from =1.0.58 to =1.0.59 #4290 [arrow] [arrow-flight] (dependabot[bot])
- Improve `ArrowWriter` memory usage: Buffer Pages in ArrowWriter instead of RecordBatch (#3871) #4280 [parquet] (tustvold)
- Minor: Add more docstrings in arrow-flight #4279 [arrow] [arrow-flight] (alamb)
- Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter` #4278 [parquet] (alamb)
- Expose `RecordBatchWriter` to `arrow` crate #4277 [arrow] (alexandreyc)
- Update criterion requirement from 0.4 to 0.5 #4275 [parquet] [arrow] (dependabot[bot])
- Add parquet-concat #4274 [parquet] (tustvold)
- Convert FixedSizeListArray to GenericListArray #4273 [arrow] (tustvold)
- feat: support 'Decimal256' for parquet #4272 [parquet] (Weijun-H)
- Strip leading whitespace from flight_sql_client custom header values #4271 [arrow] [arrow-flight] (mkmik)
- Add Append Column API (#4155) #4269 [parquet] (tustvold)
- Derive Default for WriterProperties #4268 [parquet] (tustvold)
- Parquet Reader/writer for fixed-size list arrays #4267 [parquet] (dexterduck)
- feat(flight): add sql-info helpers #4266 [arrow] [arrow-flight] (roeap)
- Convert parquet metadata back to builders #4265 [parquet] (tustvold)
- Add constructors for FixedSize array types (#3879) #4263 [arrow] (tustvold)
- Extract IPC ArrayReader struct #4259 [arrow] (tustvold)
- Update object_store requirement from 0.5 to 0.6 #4258 [parquet] (dependabot[bot])
- Support Absolute Timestamps in CSV Schema Inference (#4131) #4217 [arrow] (tustvold)
- feat: cast between `Intervals` #4182 [arrow] (izveigor)
40.0.0 (2023-05-19)
Breaking changes:
- Prefetch page index (#4090) #4216 [parquet] (tustvold)
- Add RecordBatchWriter trait and implement it for CSV, JSON, IPC and P… #4206 [parquet] [arrow] (alexandreyc)
- Remove powf_scalar kernel #4187 [arrow] (tustvold)
- Allow format specification in cast #4169 [arrow] (parthchandra)
Implemented enhancements:
- ObjectStore with_url Should Handle Path #4199
- Support `Interval` +/- `Interval` #4178 [arrow]
- [parquet] add compression info to `print_column_chunk_metadata()` #4172 [parquet]
- Allow cast to take in a format specification #4168 [arrow]
- Support extended pow arithmetic #4166 [arrow]
- Preload page index for async ParquetObjectReader #4090 [parquet]
Fixed bugs:
Merged pull requests:
- Arrow Arithmetic: Subtract timestamps #4244 [arrow] (mr-brobot)
- Update proc-macro2 requirement from =1.0.57 to =1.0.58 #4236 [arrow] [arrow-flight] (dependabot[bot])
- Fix Nightly Clippy Lints #4233 [arrow] (tustvold)
- Minor: use all primitive types in test_layouts #4229 [arrow] (izveigor)
- Add close method to RecordBatchWriter trait #4228 [parquet] [arrow] (alexandreyc)
- Update proc-macro2 requirement from =1.0.56 to =1.0.57 #4219 [arrow] [arrow-flight] (dependabot[bot])
- Feat docs #4215 [parquet] [arrow] (Folyd)
- feat: Support bitwise and boolean aggregate functions #4210 [arrow] (izveigor)
- Document how to sort a RecordBatch #4204 [arrow] (tustvold)
- Fix incorrect cast Timestamp with Timezone #4201 [arrow] (aprimadi)
- Add implementation of `RecordBatchReader` for CSV reader #4195 [arrow] (alexandreyc)
- Add Sliced ListArray test (#3748) #4186 [arrow] (tustvold)
- refactor: simplify can_cast_types code. #4185 [arrow] (jackwener)
- Minor: support new types in struct_builder.rs #4177 [arrow] (izveigor)
- feat: add compression info to print_column_chunk_metadata() #4176 [parquet] (SteveLauC)
39.0.0 (2023-05-05)
Breaking changes:
- Allow creating unbuffered streamreader #4165 [arrow] (ming08108)
- Cleanup ChunkReader (#4118) #4156 [parquet] (tustvold)
- Remove Type from NativeIndex #4146 [parquet] (tustvold)
- Don't Duplicate Offset Index on RowGroupMetadata #4142 [parquet] (tustvold)
- Return BooleanBuffer from BooleanBufferBuilder #4140 [parquet] [arrow] (tustvold)
- Cleanup CSV schema inference (#4129) (#4130) #4133 [parquet] [arrow] (tustvold)
- Remove deprecated parquet ArrowReader #4125 [parquet] (tustvold)
- refactor: construct `StructArray` w/ `FieldRef` #4116 [parquet] [arrow] (crepererum)
- Ignore Field Metadata in equals_datatype for Dictionary, RunEndEncoded, Map and Union #4111 [arrow] (izveigor)
- Add StructArray Constructors (#3879) #4064 [arrow] (tustvold)
Implemented enhancements:
- Release 39.0.0 of arrow/arrow-flight/parquet/parquet-derive (next release after 38.0.0) #4170 [arrow] [arrow-flight]
- Fixed point decimal multiplication for DictionaryArray #4135 [arrow]
- Remove Seek Requirement from CSV ReaderBuilder #4130 [parquet] [arrow]
- Inconsistent CSV Inference and Parsing DateTime Handling #4129 [parquet] [arrow]
- Support accessing ipc Reader/Writer inner by reference #4121
- Add Type Declarations for All Primitive Tensors and Buffer Builders #4112 [arrow]
- Support `Interval + Timestamp` and `Interval + Date` in addition to `Timestamp + Interval` and `Date + Interval` #4094 [arrow]
- Enable setting FlightDescriptor on FlightDataEncoderBuilder #3855 [arrow] [arrow-flight]
Fixed bugs:
- Parquet Page Index Reader Assumes Consecutive Offsets #4149 [parquet]
- Equality of nested data types #4110 [arrow]
Documentation updates:
- Improve Documentation of Parquet ChunkReader #4118
Closed issues:
Merged pull requests:
- Prep for 39.0.0 #4171 [arrow] [arrow-flight] (iajoiner)
- Support Compression in parquet-fromcsv #4160 [parquet] (suxiaogang223)
- feat: support bitwise shift left/right with scalars #4159 [arrow] (izveigor)
- Cleanup reading page index (#4149) (#4090) #4151 [parquet] (tustvold)
- feat: support `bitwise` shift left/right #4148 [arrow] (Weijun-H)
- Don't hardcode port in FlightSQL tests #4145 [arrow] [arrow-flight] (tustvold)
- Better flight SQL example codes #4144 [arrow] [arrow-flight] (sundy-li)
- chore: clean the code by using `as_primitive` #4143 [arrow] (Weijun-H)
- docs: fix the wrong ln command in CONTRIBUTING.md #4139 (SteveLauC)
- Infer Float64 for JSON Numerics Beyond Bounds of i64 #4138 [arrow] (SteveLauC)
- Support fixed point multiplication for DictionaryArray of Decimals #4136 [arrow] (viirya)
- Make arrow_json::ReaderBuilder method names consistent #4128 [arrow] (tustvold)
- feat: add get_{ref, mut} to arrow_ipc Reader and Writer #4122 (sticnarf)
- feat: support `Interval`+`Timestamp` and `Interval`+`Date` #4117 [arrow] (Weijun-H)
- Support NullArray in JSON Reader #4114 [arrow] (jiangzhx)
- Add Type Declarations for All Primitive Tensors and Buffer Builders #4113 [arrow] (izveigor)
- Update regex-syntax requirement from 0.6.27 to 0.7.1 #4107 [arrow] (dependabot[bot])
- feat: set FlightDescriptor on FlightDataEncoderBuilder #4101 [arrow] [arrow-flight] (Weijun-H)
- optimize cast for same decimal type and same scale #4088 [arrow] (liukun4515)
38.0.0 (2023-04-21)
Breaking changes:
- Remove DataType from PrimitiveArray constructors #4098 [arrow] (tustvold)
- Use Into<Arc<str>> for PrimitiveArray::with_timezone #4097 [arrow] (tustvold)
- Store StructArray entries in MapArray #4085 [parquet] [arrow] (tustvold)
- Add DictionaryArray Constructors (#3879) #4068 [arrow] [arrow-flight] (tustvold)
- Relax JSON schema inference generics #4063 [arrow] (tustvold)
- Remove ArrayData from Array (#3880) #4061 [arrow] (tustvold)
- Add CommandGetXdbcTypeInfo to Flight SQL Server #4055 [arrow] [arrow-flight] (c-thiel)
- Remove old JSON Reader and Decoder (#3610) #4052 [parquet] [arrow] (tustvold)
- Use BufRead for JSON Schema Inference #4041 [arrow] (WenyXu)
Implemented enhancements:
- Support dyn_compare_scalar for Decimal256 #4083 [arrow]
- Better JSON Reader Error Messages #4076 [arrow]
- Additional data type groups #4056 [arrow]
- Async JSON reader #4043 [arrow]
- Field::contains Should Recurse into DataType #4029 [arrow]
- Prevent UnionArray with Repeated Type IDs #3982 [parquet] [arrow]
- Support `Timestamp` +/- `Interval` types #3963 [arrow]
- First-Class Array Abstractions #3880 [parquet] [arrow] [arrow-flight]
Fixed bugs:
- Update readme to remove reference to Jira #4091
- OffsetBuffer::new Rejects 0 Offsets #4066 [arrow]
- Parquet AsyncArrowWriter not shutting down inner async writer. #4058 [parquet]
- Flight SQL Server missing command type.googleapis.com/arrow.flight.protocol.sql.CommandGetXdbcTypeInfo #4054 [arrow] [arrow-flight]
- RawJsonReader Errors with Empty Schema #4053 [parquet] [arrow]
- RawJsonReader Integer Truncation #4049 [arrow]
- Sparse UnionArray Equality Incorrect Offset Handling #4044 [arrow]
Documentation updates:
Closed issues:
- Parquet reader of Int96 columns and coercion to timestamps #4075
- Serializing timestamp from int (json raw decoder) #4069 [arrow]
- Support casting to/from Interval and Duration #3998 [arrow]
Merged pull requests:
- Fix Docs Typos #4100 [parquet] (rnarkk)
- Update tonic-build requirement from =0.9.1 to =0.9.2 #4099 [arrow] [arrow-flight] (dependabot[bot])
- Increase minimum chrono version to 0.4.24 #4093 [arrow] (alamb)
- Simplify reference to GitHub issues #4092 (bkmgit)
- [Minor]: Add `Hash` trait to SortOptions. #4089 [arrow] (mustafasrepo)
- Include byte offsets in parquet-layout #4086 [parquet] (tustvold)
- feat: Support dyn_compare_scalar for Decimal256 #4084 [arrow] (izveigor)
- Add ByteArray constructors (#3879) #4081 [arrow] (tustvold)
- Update prost-build requirement from =0.11.8 to =0.11.9 #4080 [arrow] [arrow-flight] (dependabot[bot])
- Improve JSON decoder errors (#4076) #4079 [arrow] (tustvold)
- Fix Timestamp Numeric Truncation in JSON Reader #4074 [arrow] (tustvold)
- Serialize numeric to tape (#4069) #4073 [arrow] (tustvold)
- feat: Prevent UnionArray with Repeated Type IDs #4070 [arrow] (Weijun-H)
- Add PrimitiveArray::try_new (#3879) #4067 [arrow] (tustvold)
- Add ListArray Constructors (#3879) #4065 [arrow] (tustvold)
- Shutdown parquet async writer #4059 [parquet] (kindly)
- feat: additional data type groups #4057 [arrow] (izveigor)
- Fix precision loss in Raw JSON decoder (#4049) #4051 [arrow] (tustvold)
- Use lexical_core in CSV and JSON parser (~25% faster) #4050 [arrow] (tustvold)
- Add offsets accessors to variable length arrays (#3879) #4048 [arrow] (tustvold)
- Document Async decoder usage (#4043) (#78) #4046 [arrow] (tustvold)
- Fix sparse union array equality (#4044) #4045 [arrow] (tustvold)
- feat: DataType::contains support nested type #4042 [arrow] (Weijun-H)
- feat: Support Timestamp +/- Interval types #4038 [arrow] (Weijun-H)
- Fix object_store CI #4037 (tustvold)
- feat: cast from/to interval and duration #4020 [arrow] (Weijun-H)
37.0.0 (2023-04-07)
Breaking changes:
- Fix timestamp handling in cast kernel (#1936) (#4033) #4034 [arrow] (tustvold)
- Update tonic 0.9.1 #4011 [arrow] [arrow-flight] (tustvold)
- Use FieldRef in DataType (#3955) #3983 [parquet] [arrow] (tustvold)
- Store Timezone as Arc<str> #3976 [parquet] [arrow] (tustvold)
- Panic instead of discarding nulls converting StructArray to RecordBatch - (#3951) #3953 [parquet] [arrow] (tustvold)
- Fix(flight_sql): PreparedStatement has no token for auth. #3948 [arrow] [arrow-flight] (youngsofun)
- Add Strongly Typed Array Slice (#3929) #3930 [parquet] [arrow] (tustvold)
- Add Zero-Copy Conversion between Vec and MutableBuffer #3920 [arrow] (tustvold)
Implemented enhancements:
- Support Decimals cast to Utf8/LargeUtf8 #3991 [arrow]
- Support Date32/Date64 minus Interval #3962 [arrow]
- Reduce Cloning of Field #3955 [parquet] [arrow] [arrow-flight]
- Support Deserializing Serde DataTypes to Arrow #3949 [arrow]
- Add multiply_fixed_point #3946 [arrow]
- Strongly Typed Array Slicing #3929 [parquet] [arrow]
- Make it easier to match FlightSQL messages #3874 [arrow] [arrow-flight]
- Support Casting Between Binary / LargeBinary and FixedSizeBinary #3826 [arrow]
Fixed bugs:
- Incorrect Overflow Casting String to Timestamp #4033
- f16::ZERO and f16::ONE are mixed up #4016 [arrow]
- Handle overflow precision when casting from integer to decimal #3995 [arrow]
- PrimitiveDictionaryBuilder.finish should use actual value type #3971 [arrow]
- RecordBatch From StructArray Silently Discards Nulls #3952 [parquet] [arrow]
- I256 Checked Subtraction Overflows for i256::MINUS_ONE #3942 [arrow]
- I256 Checked Multiply Overflows for i256::MIN #3941 [arrow]
Closed issues:
Merged pull requests:
- Prep for 37.0.0 #4031 [arrow] [arrow-flight] (iajoiner)
- Add RecordBatch::with_schema #4028 [arrow] (tustvold)
- Only require compatible batch schema in ArrowWriter #4027 [parquet] (tustvold)
- Add Fields::contains #4026 [arrow] (tustvold)
- Minor: add methods "is_positive" and "signum" to i256 #4024 [arrow] (izveigor)
- Deprecate Array::data (#3880) #4019 [arrow] (tustvold)
- feat: add tests for ArrowNativeTypeOp #4018 [arrow] (izveigor)
- fix: f16::ZERO and f16::ONE are mixed up #4017 [arrow] (izveigor)
- Minor: Float16Tensor #4013 [arrow] (izveigor)
- Add FlightSQL module docs and links to `arrow-flight` crates #4012 [arrow] [arrow-flight] (alamb)
- Update proc-macro2 requirement from =1.0.54 to =1.0.56 #4008 [arrow] [arrow-flight] (dependabot[bot])
- Cleanup Primitive take #4006 [arrow] (tustvold)
- Deprecate combine_option_bitmap #4005 [arrow] (tustvold)
- Minor: add tests for BooleanBuffer #4004 [arrow] (izveigor)
- feat: support to read/write customized metadata in ipc files #4003 [arrow] (framlog)
- Cleanup more uses of Array::data (#3880) #4002 [parquet] [arrow] (tustvold)
- Remove js feature from README #4001 [arrow] (akazukin5151)
- feat: add the implementation BitXor to BooleanBuffer #3997 [arrow] (izveigor)
- Handle precision overflow when casting from integer to decimal #3996 [arrow] (viirya)
- Support CAST from Decimal datatype to String #3994 [arrow] (comphead)
- Add Field Constructors for Complex Fields #3992 [parquet] [arrow] [arrow-flight] (tustvold)
- fix: remove unused type parameters. #3986 [arrow] (youngsofun)
- Add UnionFields (#3955) #3981 [parquet] [arrow] (tustvold)
- Cleanup Fields Serde #3980 [arrow] (tustvold)
- Support Rust structures --> `RecordBatch` by adding `Serde` support to `RawDecoder` (#3949) #3979 [arrow] (tustvold)
- Convert string_to_timestamp_nanos to doctest #3978 [arrow] (tustvold)
- Fix documentation of string_to_timestamp_nanos #3977 [arrow] (byteink)
- add Date32/Date64 support to subtract_dyn #3974 [arrow] (SinanGncgl)
- PrimitiveDictionaryBuilder.finish should use actual value type #3972 [arrow] (viirya)
- Update proc-macro2 requirement from =1.0.53 to =1.0.54 #3968 [arrow] [arrow-flight] (dependabot[bot])
- Async writer tweaks #3967 [parquet] (tustvold)
- Fix reading ipc files with unordered projections #3966 [arrow] (framlog)
- Add Fields abstraction (#3955) #3965 [parquet] [arrow] [arrow-flight] (tustvold)
- feat: cast between `Binary`/`LargeBinary` and `FixedSizeBinary` #3961 [arrow] (Weijun-H)
- feat: support async writer (#1269) #3957 [parquet] (ShiKaiWi)
- Add ListBuilder::append_value (#3949) #3954 [arrow] (tustvold)
- Improve array builder documentation (#3949) #3951 [arrow] (tustvold)
- Faster i256 parsing #3950 [arrow] (tustvold)
- Add multiply_fixed_point #3945 [arrow] (viirya)
- feat: enable metadata import/export through C data interface #3944 [arrow] (wjones127)
- Fix checked i256 arithmetic (#3942) (#3941) #3943 [arrow] (tustvold)
- Avoid memory copies in take_list #3940 [arrow] (tustvold)
- Faster decimal parsing (30-60%) #3939 [arrow] (spebern)
- Fix: FlightSqlClient panic when execute_update. #3938 [arrow] [arrow-flight] (youngsofun)
- Cleanup row count handling in JSON writer #3934 [arrow] (tustvold)
- Add typed buffers to UnionArray (#3880) #3933 [arrow] (tustvold)
- feat: add take for MapArray #3925 [arrow] (wjones127)
- Deprecate Array::data_ref (#3880) #3923 [arrow] (tustvold)
- Zero-copy conversion from Vec to PrimitiveArray #3917 [arrow] (tustvold)
- feat: Add Commands enum to decode prost messages to strong type #3887 [arrow] [arrow-flight] (stuartcarnie)
36.0.0 (2023-03-24)
Breaking changes:
- Use dyn Array in sort kernels #3931 [arrow] (tustvold)
- Enforce struct nullability in JSON raw reader (#3900) (#3904) #3906 [arrow] (tustvold)
- Return ScalarBuffer from PrimitiveArray::values (#3879) #3896 [arrow] (tustvold)
- Use BooleanBuffer in BooleanArray (#3879) #3895 [arrow] (tustvold)
- Seal ArrowPrimitiveType #3882 [arrow] (tustvold)
- Support compression levels #3847 [parquet] (spebern)
Implemented enhancements:
- Improve speed of parsing string to Times #3919 [arrow]
- feat: add comparison/sort support for Float16 #3914
- Pinned version in arrow-flight's build-dependencies are causing conflicts #3876
- Add compression options (levels) #3844 [parquet] [arrow]
- Use Unsigned Integer for Fixed Size DataType #3815
- Common trait for RecordBatch and StructArray #3764 [arrow]
- Allow precision loss on multiplying decimal arrays #3689 [arrow]
Fixed bugs:
- Raw JSON Reader Allows Non-Nullable Struct Children to Contain Nulls #3904
- Nullable field with nested not nullable map in json #3900
- parquet_derive doesn't support Vec<u8> #3864 [parquet]
- [REGRESSION] Parsing timestamps with lower case time separator #3863 [arrow]
- [REGRESSION] Parsing timestamps with leap seconds #3861 [arrow]
- [REGRESSION] Parsing timestamps with fractional seconds / microseconds / milliseconds / nanoseconds #3859 [arrow]
- CSV Reader Doesn't set Timezone #3841
- PyArrowConvert Leaks Memory #3683 [arrow]
Merged pull requests:
- Derive RunArray Clone #3932 [arrow] (tustvold)
- Move protoc generation to binary crate, unpin prost/tonic build (#3876) #3927 [arrow] [arrow-flight] (tustvold)
- Fix JSON Temporal Encoding of Multiple Batches #3924 [arrow] (doki23)
- Cleanup uses of Array::data_ref (#3880) #3918 [parquet] [arrow] (tustvold)
- Support microsecond and nanosecond in interval parsing #3916 [arrow] (alamb)
- feat: add comparison/sort support for Float16 #3915 [arrow] (izveigor)
- Add AsArray trait for more ergonomic downcasting #3912 [parquet] [arrow] (tustvold)
- Add OffsetBuffer::new #3910 [arrow] (tustvold)
- Add PrimitiveArray::new (#3879) #3909 [arrow] (tustvold)
- Support timezones in CSV reader (#3841) #3908 [arrow] (tustvold)
- Improve ScalarBuffer debug output #3907 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.52 to =1.0.53 #3905 [arrow] [arrow-flight] (dependabot[bot])
- Re-export parquet compression level structs #3903 [parquet] (tustvold)
- Fix parsing timestamps of exactly 32 characters #3902 [arrow] (tustvold)
- Add iterators to BooleanBuffer and NullBuffer #3901 [arrow] (tustvold)
- Array equality for &dyn Array (#3880) #3899 [arrow] (tustvold)
- Add BooleanArray::new (#3879) #3898 [arrow] (tustvold)
- Revert structured ArrayData (#3877) #3894 (tustvold)
- Fix pyarrow memory leak (#3683) #3893 [arrow] (tustvold)
- Minor: add examples for `ListBuilder` and `GenericListBuilder` #3891 [arrow] (alamb)
- Update syn requirement from 1.0 to 2.0 #3890 (dependabot[bot])
- Use of `mul_checked` to avoid silent overflow in interval arithmetic #3886 [arrow] (Weijun-H)
- Flesh out NullBuffer abstraction (#3880) #3885 [parquet] [arrow] (tustvold)
- Implement Bit Operations for i256 #3884 [arrow] (tustvold)
- Flatten arrow_buffer #3883 [arrow] (tustvold)
- Add Array::to_data and Array::nulls (#3880) #3881 [arrow] (tustvold)
- Added support for byte vectors and slices to parquet_derive (#3864) #3878 [parquet] (waymost)
- chore: remove LevelDecoder #3872 [parquet] (Weijun-H)
- Parse timestamps with leap seconds (#3861) #3862 [arrow] (tustvold)
- Faster time parsing (~93% faster) #3860 [arrow] (tustvold)
- Parse timestamps with arbitrary seconds fraction #3858 [arrow] (tustvold)
- Add BitIterator #3856 [arrow] (tustvold)
- Improve decimal parsing performance #3854 [arrow] (spebern)
- Update proc-macro2 requirement from =1.0.51 to =1.0.52 #3853 [arrow] [arrow-flight] (dependabot[bot])
- Update bitflags requirement from 1.2.1 to 2.0.0 #3852 [arrow] (dependabot[bot])
- Add offset pushdown to parquet #3848 [parquet] (tustvold)
- Add timezone support to JSON reader #3845 [arrow] (tustvold)
- Allow precision loss on multiplying decimal arrays #3690 [arrow] (viirya)
35.0.0 (2023-03-10)
Breaking changes:
- Add RunEndBuffer (#1799) #3817 [arrow] (tustvold)
- Restrict DictionaryArray to ArrowDictionaryKeyType #3813 [arrow] (tustvold)
- refactor: assorted `FlightSqlServiceClient` improvements #3788 [arrow] [arrow-flight] (crepererum)
- minor: make Parquet CLI input args consistent #3786 [parquet] (XinyuZeng)
- Return Buffers from ArrayData::buffers instead of slice (#1799) #3783 [arrow] (tustvold)
- Use NullBuffer in ArrayData (#3775) #3778 [parquet] [arrow] (tustvold)
Implemented enhancements:
- Support timestamp/time and date types in json decoder #3834 [arrow]
- Support decoding decimals in new raw json decoder #3819 [arrow]
- Timezone Aware Timestamp Parsing #3794 [arrow]
- Preallocate buffers for FixedSizeBinary array creation #3792 [arrow]
- Make Parquet CLI args consistent #3785 [parquet]
- Creates PrimitiveDictionaryBuilder from provided keys and values builders #3776 [arrow]
- Use NullBuffer in ArrayData #3775 [parquet] [arrow]
- Support unary_dict_mut in arth #3710 [arrow]
- Support cast <> String to interval #3643 [arrow]
- Support Zero-Copy Conversion from Vec to/from MutableBuffer #3516 [arrow]
Fixed bugs:
- Timestamp Unit Casts are Unchecked #3833 [arrow]
- regexp_match skips first match when returning match #3803 [arrow]
- Cast to timestamp with time zone returns timestamp #3800 [arrow]
- Schema-level metadata is not encoded in Flight responses #3779 [arrow] [arrow-flight]
Closed issues:
- FlightSQL CLI client: simple test #3814 [arrow] [arrow-flight]
Merged pull requests:
- refactor: timestamp overflow check #3840 [arrow] (Weijun-H)
- Prep for 35.0.0 #3836 [parquet] [arrow] [arrow-flight] (iajoiner)
- Support timestamp/time and date json decoding #3835 [arrow] (spebern)
- Make dictionary preservation optional in row encoding #3831 [arrow] (tustvold)
- Move prettyprint to arrow-cast #3828 [arrow] [arrow-flight] (tustvold)
- Support decoding decimals in raw decoder #3820 [arrow] (spebern)
- Add ArrayDataLayout, port validation (#1799) #3818 [arrow] (tustvold)
- test: add test for FlightSQL CLI client #3816 [arrow] [arrow-flight] (crepererum)
- Add regexp_match docs #3812 [arrow] (tustvold)
- fix: Ensure Flight schema includes parent metadata #3811 [arrow] [arrow-flight] (stuartcarnie)
- fix: regexp_match skips first match #3807 [arrow] (Weijun-H)
- fix: change utf8 to timestamp with timezone #3806 [arrow] (Weijun-H)
- Support reading decimal arrays from json #3805 [arrow] (spebern)
- Add unary_dict_mut #3804 [arrow] (viirya)
- Faster timestamp parsing (~70-90% faster) #3801 [arrow] (tustvold)
- Add concat_elements_bytes #3798 [arrow] (tustvold)
- Timezone aware timestamp parsing (#3794) #3795 [arrow] (tustvold)
- Preallocate buffers for FixedSizeBinary array creation #3793 [arrow] (maxburke)
- feat: simple flight sql CLI client #3789 [arrow] [arrow-flight] (crepererum)
- Creates PrimitiveDictionaryBuilder from provided keys and values builders #3777 [arrow] (viirya)
- ArrayData Enumeration for Remaining Layouts #3769 [arrow] (tustvold)
- Update prost-build requirement from =0.11.7 to =0.11.8 #3767 [arrow] [arrow-flight] (dependabot[bot])
- Implement concat_elements_dyn kernel #3763 [arrow] (Weijun-H)
- Support for casting `Utf8` and `LargeUtf8` --> `Interval` #3762 [arrow] (doki23)
- into_inner() for CSV Writer #3759 [arrow] (Weijun-H)
- Zero-copy Vec conversion (#3516) (#1176) #3756 [arrow] (tustvold)
- ArrayData Enumeration for Primitive, Binary and UTF8 #3749 [arrow] (tustvold)
- Add `into_primitive_dict_builder` to `DictionaryArray` #3715 [arrow] (viirya)
34.0.0 (2023-02-24)
Breaking changes:
- Infer 2020-03-19 00:00:00 as timestamp not Date64 in CSV (#3744) #3746 [arrow] (tustvold)
- Implement fallible streams for `FlightClient::do_put` #3464 [arrow] [arrow-flight] (alamb)
Implemented enhancements:
- Support casting string to timestamp with microsecond resolution #3751
- Add datatime/interval/duration into comparison kernels #3729 [arrow]
- ! (not) operator overload for SortOptions #3726 [arrow]
- parquet: convert Bytes to ByteArray directly #3719 [parquet]
- Implement simple RecordBatchReader #3704
- Is possible to implement GenericListArray::from_iter ? #3702
- `take_run` improvements #3701 [arrow]
- Support `as_mut_any` in Array trait #3655
- `Array` --> `Display` formatter that supports more options and is configurable #3638 [parquet] [arrow]
- arrow-csv: support decimal256 #3474 [arrow]
Fixed bugs:
- CSV reader infers Date64 type for fields like "2020-03-19 00:00:00" that it can't parse to Date64 #3744 [arrow]
Merged pull requests:
- Update to 34.0.0 and update changelog #3757 [parquet] [arrow] [arrow-flight] (iajoiner)
- Update MIRI for split crates (#2594) #3754 (tustvold)
- Update prost-build requirement from =0.11.6 to =0.11.7 #3753 [arrow] [arrow-flight] (dependabot[bot])
- Enable casting of string to timestamp with microsecond resolution #3752 [arrow] (gruuya)
- Use Typed Buffers in Arrays (#1811) (#1176) #3743 [arrow] (tustvold)
- Cleanup arithmetic kernel type constraints #3739 [arrow] (tustvold)
- Make dictionary kernels optional for comparison benchmark #3738 [arrow] (tustvold)
- Support String Coercion in Raw JSON Reader #3736 [arrow] (rguerreiromsft)
- replace for loop by try_for_each #3734 [arrow] (suxiaogang223)
- feat: implement generic record batch reader #3733 (wjones127)
- [minor] fix doc test fail #3732 [arrow] (Ted-Jiang)
- Add datetime/interval/duration into dyn scalar comparison #3730 [arrow] (viirya)
- Using Borrow<Value> on infer_json_schema_from_iterator #3728 [arrow] (rguerreiromsft)
- Not operator overload for SortOptions #3727 [arrow] (berkaysynnada)
- fix: encoding batch with no columns #3724 [arrow] [arrow-flight] (wangrunji0408)
- feat: impl `Ord`/`PartialOrd` for `SortOptions` #3723 [arrow] (crepererum)
- Add From<Bytes> for ByteArray #3720 [parquet] (tustvold)
- Deprecate old JSON reader (#3610) #3718 [parquet] [arrow] (tustvold)
- Add pretty format with options #3717 [arrow] (tustvold)
- Remove unreachable decimal take #3716 [arrow] (tustvold)
- Feat: arrow csv decimal256 #3711 [arrow] (suxiaogang223)
- perf: `take_run` improvements #3705 [arrow] (askoa)
- Add raw MapArrayReader #3703 [arrow] (tustvold)
- feat: Sort kernel for `RunArray` #3695 [arrow] (askoa)
- perf: Remove sorting to yield sorted_rank #3693 [arrow] (askoa)
- fix: Handle sliced array in run array iterator #3681 [arrow] (askoa)
33.0.0 (2023-02-10)
Breaking changes:
- Use ArrayFormatter in Cast Kernel #3668 [arrow] (tustvold)
- Use dyn Array in cast kernels #3667 [arrow] (tustvold)
- Return references from FixedSizeListArray and MapArray #3652 [parquet] [arrow] (tustvold)
- Lazy array display (#3638) #3647 [parquet] [arrow] (tustvold)
- Use array_value_to_string in arrow-csv #3514 [arrow] (JayjeetAtGithub)
Implemented enhancements:
- Support UTF8 cast to Timestamp with timezone #3664
- Add modulus_dyn and modulus_scalar_dyn #3648 [arrow]
- A trait for append_value and append_null on ArrayBuilders #3644
- Improve error message "batches[0] schema is different with argument schema" #3628 [arrow]
- Specified version of helper function to cast binary to string #3623 [arrow]
- Casting generic binary to generic string #3606 [arrow]
- Use `array_value_to_string` in `arrow-csv` #3483 [arrow]
Fixed bugs:
- ArrowArray::try_from_raw Misleading Signature #3684 [arrow]
- PyArrowConvert Leaks Memory #3683 [arrow]
- Arrow-csv reader cannot produce RecordBatch even if the bytes are necessary #3674
- FFI Fails to Account For Offsets #3671 [arrow]
- Regression in CSV reader error handling #3656 [arrow]
- UnionArray Child and Value Fail to Account for non-contiguous Type IDs #3653 [arrow]
- Panic when accessing RecordBatch from pyarrow #3646 [arrow]
- Multiplication for decimals is incorrect #3645
- Inconsistent output between pretty print and CSV writer for Arrow #3513 [arrow]
Closed issues:
- Release 33.0.0 of arrow/arrow-flight/parquet/parquet-derive (next release after 32.0.0) #3682
- Release `32.0.0` of `arrow`/`arrow-flight`/`parquet`/`parquet-derive` (next release after `31.0.0`) #3584 [parquet] [arrow] [arrow-flight]
Merged pull requests:
- Move FFI to sub-crates #3687 [arrow] (tustvold)
- Update to 33.0.0 and update changelog #3686 [parquet] [arrow] [arrow-flight] (iajoiner)
- Cleanup FFI interface (#3684) (#3683) #3685 [arrow] (tustvold)
- fix: take_run benchmark parameter #3679 [arrow] (askoa)
- Minor: Add some examples to Date*Array and Time*Array #3678 [arrow] (alamb)
- Add CSV Decoder::capacity (#3674) #3677 [arrow] (tustvold)
- Add ArrayData::new_null and DataType::primitive_width #3676 [arrow] (tustvold)
- Fix FFI which fails to account for offsets #3675 [arrow] (viirya)
- Support UTF8 cast to Timestamp with timezone #3673 [arrow] (comphead)
- Fix Date64Array docs #3670 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.50 to =1.0.51 #3669 [arrow] [arrow-flight] (dependabot[bot])
- Add timezone accessor for Timestamp*Array #3666 [arrow] (tustvold)
- Faster timezone cast #3665 [arrow] (tustvold)
- feat + fix: IPC support for run encoded array. #3662 [arrow] (askoa)
- Implement std::fmt::Write for StringBuilder (#3638) #3659 [arrow] (tustvold)
- Include line and field number in CSV UTF-8 error (#3656) #3657 [arrow] (tustvold)
- Handle non-contiguous type_ids in UnionArray (#3653) #3654 [arrow] (tustvold)
- Add modulus_dyn and modulus_scalar_dyn #3649 [arrow] (viirya)
- Improve error message with detailed schema #3637 [arrow] (Veeupup)
- Add limit to ArrowReaderBuilder to push limit down to parquet reader #3633 [parquet] (thinkharderdev)
- chore: delete wrong comment and refactor set_metadata in `Field` #3630 [arrow] (chunshao90)
- Fix typo in comment #3627 [parquet] (kjschiroo)
- Minor: Update doc strings about Page Index / Column Index #3625 [parquet] (alamb)
- Specified version of helper function to cast binary to string #3624 [arrow] (viirya)
- feat: take kernel for RunArray #3622 [arrow] (askoa)
- Remove BitSliceIterator specialization from try_for_each_valid_idx #3621 [arrow] (tustvold)
- Reduce PrimitiveArray::try_unary codegen #3619 [arrow] (tustvold)
- Reduce Dictionary Builder Codegen #3616 [arrow] (tustvold)
- Minor: Add test for dictionary encoding of batches #3608 [arrow-flight] (alamb)
- Casting generic binary to generic string #3607 [arrow] (viirya)
- Add ArrayAccessor, Iterator, Extend and benchmarks for RunArray #3603 [arrow] (askoa)
32.0.0 (2023-01-27)
Breaking changes:
- Allow `StringArray` construction with `Vec<Option<String>>` #3602 [arrow] (sinistersnare)
- Use native types in PageIndex (#3575) #3578 [parquet] (tustvold)
- Add external variant to ParquetError (#3285) #3574 [parquet] (tustvold)
- Return reference from ListArray::values #3561 [arrow] (tustvold)
- feat: Add `RunEndEncodedArray` #3553 [parquet] [arrow] (askoa)
Implemented enhancements:
- There should be a `From<Vec<Option<String>>>` impl for `GenericStringArray<OffsetSize>` #3599 [arrow]
- FlightDataEncoder Optionally send Schema even when no record batches #3591 [arrow-flight]
- Use Native Types in PageIndex #3575 [parquet]
- Packing array into dictionary of generic byte array #3571 [arrow]
- Implement `Error::Source` for ArrowError and FlightError #3566 [arrow] [arrow-flight]
- [FlightSQL] Allow access to underlying FlightClient #3551 [arrow-flight]
- Arrow CSV writer should not fail when cannot cast the value #3547 [arrow]
- Write Deprecated Min Max Statistics When ColumnOrder Signed #3526 [parquet]
- Improve Performance of JSON Reader #3441
- Support footer kv metadata for IPC file #3432
- Add `External` variant to ParquetError #3285 [parquet]
Fixed bugs:
- Nullif of NULL Predicate is not NULL #3589
- BooleanBufferBuilder Fails to Clear Set Bits On Truncate #3587 [arrow]
- `nullif` incorrectly calculates `null_count`, sometimes panics with subtraction overflow error #3579 [arrow]
- Meet warning when use pyarrow #3543 [arrow]
- Incorrect row group total_byte_size written to parquet file #3530 [parquet]
- Overflow when casting timestamps prior to the epoch #3512 [arrow]
Closed issues:
- Panic on Key Overflow in Dictionary Builders #3562 [parquet] [arrow]
- Bumping version gives compilation error (arrow-array) #3525
Merged pull requests:
- Add Push-Based CSV Decoder #3604 [arrow] (tustvold)
- Update to flatbuffers 23.1.21 #3597 [arrow] (tustvold)
- Faster BooleanBufferBuilder::append_n for true values #3596 [arrow] (tustvold)
- Support sending schemas for empty streams #3594 [arrow-flight] (alamb)
- Faster ListArray to StringArray conversion #3593 [arrow] (tustvold)
- Add conversion from StringArray to BinaryArray #3592 [arrow] (tustvold)
- Fix nullif null count (#3579) #3590 [arrow] (tustvold)
- Clear bits in BooleanBufferBuilder (#3587) #3588 [arrow] (tustvold)
- Iterate all dictionary key types in cast test #3585 [arrow] (viirya)
- Propagate EOF Error from AsyncRead #3576 [parquet] (Sach1nAgarwal)
- Show row_counts also for (FixedLen)ByteArray #3573 [parquet] (bmmeijers)
- Packing array into dictionary of generic byte array #3572 [arrow] (viirya)
- Remove unwrap on datetime cast for CSV writer #3570 [arrow] (comphead)
- Implement `std::error::Error::source` for `ArrowError` and `FlightError` #3567 [arrow] [arrow-flight] (alamb)
- Improve GenericBytesBuilder offset overflow panic message (#139) #3564 [arrow] (tustvold)
- Implement Extend for ArrayBuilder (#1841) #3563 [arrow] (tustvold)
- Update pyarrow method call with kwargs #3560 [arrow] (Frankonly)
- Update pyo3 requirement from 0.17 to 0.18 #3557 [arrow] (viirya)
- Expose Inner FlightServiceClient on FlightSqlServiceClient (#3551) #3556 [arrow-flight] (tustvold)
- Fix final page row count in parquet-index binary #3554 [parquet] (tustvold)
- Parquet Avoid Reading 8 Byte Footer Twice from AsyncRead #3550 [parquet] (Sach1nAgarwal)
- Improve concat kernel capacity estimation #3546 [arrow] (tustvold)
- Update proc-macro2 requirement from =1.0.49 to =1.0.50 #3545 [arrow-flight] (dependabot[bot])
- Update pyarrow method call to avoid warning #3544 [arrow] (Frankonly)
- Enable casting between Utf8/LargeUtf8 and Binary/LargeBinary #3542 [arrow] (viirya)
- Use GHA concurrency groups (#3495) #3538 (tustvold)
- set sum of uncompressed column size as row group size for parquet files #3531 [parquet] (sidred)
- Minor: Add documentation about memory use for ArrayData #3529 [arrow] (alamb)
- Upgrade to clap 4.1 + fix test #3528 [parquet] (tustvold)
- Write backwards compatible row group statistics (#3526) #3527 [parquet] (tustvold)
- No panic on timestamp buffer overflow #3519 [arrow] (comphead)
- Support casting from binary to dictionary of binary #3482 [arrow] (viirya)
- Add Raw JSON Reader (~2.5x faster) #3479 [arrow] (tustvold)
31.0.0 (2023-01-13)
Breaking changes:
- support RFC3339 style timestamps in `arrow-json` #3449 [arrow] (JayjeetAtGithub)
- Improve arrow flight batch splitting and naming #3444 [arrow-flight] (alamb)
- Parquet record API: timestamp as signed integer #3437 [parquet] (ByteBaker)
- Support decimal int32/64 for writer #3431 [parquet] (liukun4515)
Implemented enhancements:
- Support casting Date32 to timestamp #3504 [arrow]
- Support casting strings like `'2001-01-01'` to timestamp #3492 [arrow]
- CLI to "rewrite" parquet files #3476 [parquet]
- Add more dictionary value type support to `build_compare` #3465
- Allow `concat_batches` to take non owned RecordBatch #3456 [arrow]
- Release Arrow `30.0.1` (maintenance release for `30.0.0`) #3455
- Add string comparisons (starts_with, ends_with, and contains) to kernel #3442 [arrow]
- make_builder Loses Timezone and Decimal Scale Information #3435 [arrow]
- Use RFC3339 style timestamps in arrow-json #3416 [arrow]
- ArrayData `get_slice_memory_size` or similar #3407 [arrow] [arrow-flight]
Fixed bugs:
- Unable to read CSV with null boolean value #3521 [arrow]
- Make consistent behavior on zeros equality on floating point types #3509
- Sliced batch w/ bool column doesn't roundtrip through IPC #3496 [arrow] [arrow-flight]
- take kernel on List array introduces nulls instead of empty lists #3471 [arrow]
- Infinite Loop If Skipping More CSV Lines than Present #3469 [arrow]
Merged pull requests:
- Fix reading null booleans from CSV #3523 [arrow] (tustvold)
- minor fix: use the unified decimal type builder #3522 [parquet] (liukun4515)
- Update version to `31.0.0` and add changelog #3518 [parquet] [arrow] [arrow-flight] (iajoiner)
- Additional nullif re-export #3515 [arrow] (tustvold)
- Make consistent behavior on zeros equality on floating point types #3510 (viirya)
- Enable cast Date32 to Timestamp #3508 [arrow] (comphead)
- Update prost-build requirement from =0.11.5 to =0.11.6 #3507 [arrow-flight] (dependabot[bot])
- minor fix for the comments #3505 [arrow] (liukun4515)
- Fix DataTypeLayout for LargeList #3503 [arrow] (viirya)
- Add string comparisons (starts_with, ends_with, and contains) to kernel #3502 [arrow] (snmvaughan)
- Add a function to get memory size of array slice #3501 [arrow] (askoa)
- Fix IPCWriter for Sliced BooleanArray #3498 [arrow] (crepererum)
- Fix: Added support to cast string without time #3494 [arrow] (gaelwjl)
- Fix negative interval prettyprint #3491 [arrow] (Jefffrey)
- Fixes a broken link in the arrow lib.rs rustdoc #3487 [arrow] (AdamGS)
- Refactoring build_compare for decimal and using downcast_primitive #3484 (viirya)
- Add tests for record batch size splitting logic in FlightClient #3481 [arrow-flight] (alamb)
- change `concat_batches` parameter to non owned reference #3480 [arrow] (askoa)
- feat: add `parquet-rewrite` CLI #3477 [parquet] (crepererum)
- Preserve empty list array elements in take kernel #3473 [arrow] (jonmmease)
- Add a test for stream writer for writing sliced array #3472 [arrow] (viirya)
- Fix CSV infinite loop and improve error messages #3470 [arrow] (tustvold)
- Add more dictionary value type support to `build_compare` #3466 (viirya)
- Add tests for `FlightClient::{list_flights, list_actions, do_action, get_schema}` #3463 [arrow-flight] (alamb)
- Minor: add ticket links to failing ipc integration tests #3461 (alamb)
- feat: `column_name` based index access for `RecordBatch` and `StructArray` #3458 [arrow] (askoa)
- Support Decimal256 in FFI #3453 [arrow] (viirya)
- Remove multiversion dependency #3452 [arrow] (tustvold)
- Re-export nullif kernel #3451 [arrow] (tustvold)
- Meaningful error message for map builder with null keys #3450 [arrow] (Jefffrey)
- Parquet writer v2: clear buffer after page flush #3447 [parquet] (askoa)
- Verify ArrayData::data_type compatible in PrimitiveArray::from #3440 [arrow] (tustvold)
- Preserve DataType metadata in make_builder #3438 [arrow] (tustvold)
- Consolidate arrow ipc tests and increase coverage #3427 [arrow] (alamb)
- Generic bytes dictionary builder #3426 [arrow] (viirya)
- Minor: Improve docs for arrow-ipc, remove clippy ignore #3421 [arrow] (alamb)
- refactor: convert `*like_dyn`, `*like_utf8_scalar_dyn` and `*like_dict` functions to macros #3411 [arrow] (askoa)
- Add parquet-index binary #3405 [parquet] (tustvold)
- Complete mid-level `FlightClient` #3402 [arrow-flight] (alamb)
- Implement `RecordBatch` <--> `FlightData` encode/decode + tests #3391 [arrow] [arrow-flight] (alamb)
- Provide `into_builder` for bytearray #3326 [arrow] (viirya)
30.0.1 (2023-01-04)
Implemented enhancements:
- Generic bytes dictionary builder #3425 [arrow]
- Derive Clone for the builders in object-store. #3419
- Mid-level `ArrowFlight` Client #3371 [arrow-flight]
- Improve performance of the CSV parser #3338 [arrow]
Fixed bugs:
- `nullif` kernel no longer exported #3454 [arrow]
- PrimitiveArray from ArrayData Unsound For IntervalArray #3439 [arrow]
- LZ4-compressed PQ files unreadable by Pandas and ClickHouse #3433 [parquet]
- Parquet Record API: Cannot convert date before Unix epoch to json #3430 [parquet]
- parquet-fromcsv with writer version v2 does not stop #3408 [parquet]
30.0.0 (2022-12-29)
Breaking changes:
- Infer Parquet JSON Logical and Converted Type as UTF-8 #3376 [parquet] (tustvold)
- Use custom Any instead of prost_types #3360 [arrow-flight] (tustvold)
- Use bytes in arrow-flight #3359 [arrow-flight] (tustvold)
Implemented enhancements:
- Add derived implementations of Clone and Debug for `ParquetObjectReader` #3381 [parquet]
- Speed up TrackedWrite #3366 [parquet]
- Is it possible for ArrowWriter to write key_value_metadata after write all records #3356 [parquet]
- Add UnionArray test to arrow-pyarrow integration test #3346
- Document / Deprecate arrow_flight::utils::flight_data_from_arrow_batch #3312 [arrow] [arrow-flight]
- [FlightSQL] Support HTTPs #3309 [arrow-flight]
- Support UnionArray in ffi #3304 [arrow]
- Add support for Azure Data Lake Storage Gen2 (aka: ADLS Gen2) in Object Store library #3283
- Support casting from String to Decimal #3280 [arrow]
- Allow ArrowCSV writer to control the display of NULL values #3268 [arrow]
Fixed bugs:
- FlightSQL example is broken #3386 [arrow-flight]
- CSV Reader Bounds Incorrectly Handles Header #3364 [arrow]
- Incorrect output string from `try_to_type` #3350
- Decimal arithmetic computation fails to run because decimal type equality #3344 [arrow]
- Pretty print not implemented for Map #3322 [arrow]
- ILIKE Kernels Inconsistent Case Folding #3311 [arrow]
Documentation updates:
- minor: Improve arrow-flight docs #3372 [arrow] [arrow-flight] (alamb)
Merged pull requests:
- Version 30.0.0 release notes and changelog #3406 [parquet] [arrow] [arrow-flight] (alamb)
- Ends ParquetRecordBatchStream when polling on StreamState::Error #3404 [parquet] (viirya)
- fix clippy issues #3398 (Jimexist)
- Upgrade multiversion to 0.7.1 #3396 (viirya)
- Make FlightSQL Support HTTPs #3388 [arrow-flight] (viirya)
- Fix broken FlightSQL example #3387 [arrow-flight] (viirya)
- Update prost-build #3385 [arrow-flight] (tustvold)
- Split out arrow-arith (#2594) #3384 [arrow] (tustvold)
- Add derive for Clone and Debug for `ParquetObjectReader` #3382 [parquet] (kszlim)
- Initial Mid-level `FlightClient` #3378 [arrow-flight] (alamb)
- Document all features on docs.rs #3377 [arrow] [arrow-flight] (tustvold)
- Split out arrow-row (#2594) #3375 [arrow] (tustvold)
- Remove unnecessary flush calls on TrackedWrite #3374 [parquet] (viirya)
- Update proc-macro2 requirement from =1.0.47 to =1.0.49 #3369 [arrow-flight] (dependabot[bot])
- Add CSV build_buffered (#3338) #3368 [arrow] (tustvold)
- feat: add append_key_value_metadata #3367 [parquet] (jiacai2050)
- Add csv-core based reader (#3338) #3365 [arrow] (tustvold)
- Put BufWriter into TrackedWrite #3361 [parquet] (viirya)
- Add CSV reader benchmark (#3338) #3357 [arrow] (tustvold)
- Use ArrayData::ptr_eq in DictionaryTracker #3354 [arrow] (tustvold)
- Deprecate flight_data_from_arrow_batch #3353 [arrow] [arrow-flight] (Dandandan)
- Fix incorrect output string from try_to_type #3351 (viirya)
- Fix unary_dyn for decimal scalar arithmetic computation #3345 [arrow] (viirya)
- Add UnionArray test to arrow-pyarrow integration test #3343 (viirya)
- feat: configure null value in arrow csv writer #3342 [arrow] (askoa)
- Optimize bulk writing of all blocks of bloom filter #3340 [parquet] (viirya)
- Add MapArray to pretty print #3339 [arrow] (askoa)
- Update prost-build 0.11.4 #3334 [arrow-flight] (tustvold)
- Faster Parquet Bloom Writer #3333 (tustvold)
- Add bloom filter benchmark for parquet writer #3323 [parquet] (viirya)
- Add ASCII fast path for ILIKE scalar (90% faster) #3306 [arrow] (tustvold)
- Support UnionArray in ffi #3305 [arrow] (viirya)
- Support casting from String to Decimal #3281 [arrow] (viirya)
- add more integration test for parquet bloom filter round trip tests #3210 [parquet] (Jimexist)
29.0.0 (2022-12-09)
Breaking changes:
- Minor: Allow `Field::new` and `Field::new_with_dict` to take existing `String` as well as `&str` #3288 [arrow] (alamb)
- update `&Option<T>` to `Option<&T>` #3249 [parquet] [arrow] (Jimexist)
- Hide `*_dict_scalar` kernels behind `*_dyn` kernels #3202 [arrow] (viirya)
Implemented enhancements:
- Support writing BloomFilter in arrow_writer #3275 [parquet]
- Support casting from unsigned numeric to Decimal256 #3272 [arrow]
- Support casting from Decimal256 to float types #3266 [arrow]
- Make arithmetic kernels supports DictionaryArray of DecimalType #3254 [arrow]
- Casting from Decimal256 to unsigned numeric #3239 [arrow]
- precision is not considered when cast value to decimal #3223 [arrow]
- Use RegexSet in arrow_csv::infer_field_schema #3211 [arrow]
- Implement FlightSQL Client #3206 [arrow-flight]
- Add binary_mut and try_binary_mut #3143 [arrow]
- Add try_unary_mut #3133 [arrow]
Fixed bugs:
- Skip null buffer when importing FFI ArrowArray struct if no null buffer in the spec #3290 [arrow]
- using ahash `compile-time-rng` kills reproducible builds #3271 [parquet]
- Decimal128 to Decimal256 Overflows #3265 [arrow]
- `nullif` panics on empty array #3261 [arrow]
- Some more inconsistency between can_cast_types and cast_with_options #3250 [arrow]
- Enable casting between Dictionary of DecimalArray and DecimalArray #3237 [arrow]
- new_null_array Panics creating StructArray with non-nullable fields #3226 [arrow]
- bool should cast from/to Float16Type as `can_cast_types` returns true #3221 [arrow]
- Utf8 and LargeUtf8 cannot cast from/to Float16 but can_cast_types returns true #3220 [arrow]
- Re-enable some tests in `arrow-cast` crate #3219 [arrow]
- Off-by-one buffer size error triggers Panic when constructing RecordBatch from IPC bytes (should return an Error) #3215 [arrow]
- arrow to and from pyarrow conversion results in changes in schema #3136 [arrow]
Documentation updates:
Merged pull requests:
- Use BufWriter when writing bloom filters and limit tests (#3318) #3319 [parquet] (tustvold)
- Use take for dictionary like comparisons #3313 [arrow] (tustvold)
- Update versions to 29.0.0 and update CHANGELOG #3315 [parquet] [arrow] [arrow-flight] (alamb)
- refactor: Merge similar functions `ilike_scalar` and `nilike_scalar` #3303 [arrow] (askoa)
- Split out arrow-ord (#2594) #3299 [arrow] (tustvold)
- Split out arrow-string (#2594) #3295 [arrow] (tustvold)
- Skip null buffer when importing FFI ArrowArray struct if no null buffer in the spec #3293 [arrow] (viirya)
- Don't use dangling NonNull as sentinel #3289 [arrow] (tustvold)
- Set bloom filter on byte array #3284 [parquet] (viirya)
- Fix ipc schema custom_metadata serialization #3282 [arrow] (Jefffrey)
- Disable const-random ahash feature on non-WASM (#3271) #3277 [parquet] (tustvold)
- fix(ffi): handle null data buffers from empty arrays #3276 [arrow] (wjones127)
- Support casting from unsigned numeric to Decimal256 #3273 [arrow] (viirya)
- Add parquet-layout binary #3269 [parquet] (tustvold)
- Support casting from Decimal256 to float types #3267 [arrow] (viirya)
- Simplify decimal cast logic #3264 [arrow] (tustvold)
- Fix panic on nullif empty array (#3261) #3263 [arrow] (tustvold)
- Add BooleanArray::from_unary and BooleanArray::from_binary #3258 [arrow] (tustvold)
- Minor: Remove parquet build script #3257 [parquet] (tustvold)
- Make arithmetic kernels supports DictionaryArray of DecimalType #3255 [arrow] (viirya)
- Support List and LargeList in Row format (#3159) #3251 [arrow] (tustvold)
- Don't recurse to children in ArrayData::try_new #3248 [arrow] (tustvold)
- Validate dictionaries read over IPC #3247 [arrow] (tustvold)
- Fix MapBuilder example #3246 [arrow] (tustvold)
- Loosen nullability restrictions added in #3205 (#3226) #3244 [arrow] (tustvold)
- Better document implications of offsets (#3228) #3243 [arrow] (tustvold)
- Add new API to validate the precision for decimal array #3242 [arrow] (liukun4515)
- Move nullif to arrow-select (#2594) #3241 [arrow] (tustvold)
- Casting from Decimal256 to unsigned numeric #3240 [arrow] (viirya)
- Enable casting between Dictionary of DecimalArray and DecimalArray #3238 [arrow] (viirya)
- Remove unwraps from 'create_primitive_array' #3232 [arrow] (aarashy)
- Fix CI build by upgrading tonic-build to 0.8.4 #3231 [arrow-flight] (viirya)
- Remove negative scale check #3230 [arrow] (viirya)
- Update prost-build requirement from =0.11.2 to =0.11.3 #3225 [arrow-flight] (dependabot[bot])
- Get the round result for decimal to a decimal with smaller scale #3224 [arrow] (liukun4515)
- Move tests which require chrono-tz feature from `arrow-cast` to `arrow` #3222 [arrow] (viirya)
- add test cases for extracting week with/without timezone #3218 [arrow] (waitingkuo)
- Use RegexSet for matching DataType #3217 [arrow] (askoa)
- Update tonic-build to 0.8.3 #3214 [arrow-flight] (tustvold)
- Support StructArray in Row Format (#3159) #3212 [arrow] (tustvold)
- Infer timestamps from CSV files #3209 [arrow] (Jefffrey)
- fix bug: cast decimal256 to other decimal with no-safe #3208 [arrow] (liukun4515)
- FlightSQL Client & integration test #3207 [arrow-flight] (avantgardnerio)
- Ensure StructArrays check nullability of fields #3205 [arrow] (Jefffrey)
- Remove special case ArrayData equality for decimals #3204 [arrow] (tustvold)
- Add a cast test case for decimal negative scale #3203 [arrow] (viirya)
- Move zip and shift kernels to arrow-select #3201 [arrow] (tustvold)
- Deprecate limit kernel #3200 [arrow] (tustvold)
- Use SlicesIterator for ArrayData Equality #3198 [arrow] (viirya)
- Add _dyn kernels of like, ilike, nlike, nilike kernels for dictionary support #3197 [arrow] (viirya)
- Adding scalar nlike_dyn, ilike_dyn, nilike_dyn kernels #3195 [arrow] (psvri)
- Use self capture in DataType #3190 [arrow] (tustvold)
- To pyarrow with schema #3188 [arrow] (doki23)
- Support Duration in array_value_to_string #3183 [arrow] (psvri)
- Support `FixedSizeBinary` in Row format #3182 [arrow] (tustvold)
- Add binary_mut and try_binary_mut #3144 [arrow] (viirya)
- Add try_unary_mut #3134 [arrow] (viirya)
28.0.0 (2022-11-25)
Breaking changes:
- StructArray::columns return slice #3186 [parquet] [arrow] (tustvold)
- Return slice from GenericByteArray::value_data #3171 [arrow] (tustvold)
- Support decimal negative scale #3152 [arrow] (viirya)
- refactor: convert `Field::metadata` to `HashMap` #3148 [parquet] [arrow] (crepererum)
- Don't Skip Serializing Empty Metadata (#3082) #3126 [arrow] (askoa)
- Add Decimal128, Decimal256, Float16 to DataType::is_numeric #3121 [arrow] (tustvold)
- Upgrade to thrift 0.17 and fix issues #3104 [parquet] [arrow] (Jimexist)
- Fix prettyprint for Interval second fractions #3093 [arrow] (Jefffrey)
- Remove Option from `Field::metadata` #3091 [parquet] [arrow] (askoa)
Implemented enhancements:
- Add iterator to RowSelection #3172 [parquet]
- create an integration test set for parquet crate against pyspark for working with bloom filters #3167 [parquet]
- Row Format Size Tracking #3160 [arrow]
- Add ArrayBuilder::finish_cloned() #3154 [arrow]
- Optimize memory usage of json reader #3150
- Add `Field::size` and `DataType::size` #3147 [parquet] [arrow]
- Add like_utf8_scalar_dyn kernel #3145 [arrow]
- support comparison for decimal128 array with scalar in kernel #3140 [arrow]
- audit and create a document for bloom filter configurations #3138 [parquet]
- Should be the rounding vs truncation when cast decimal to smaller scale #3137 [arrow]
- Upgrade chrono to 0.4.23 #3120
- Implements more temporal kernels using time_fraction_dyn #3108 [arrow]
- Upgrade to thrift 0.17 #3105 [parquet] [arrow]
- Be able to parse time formatted strings #3100 [arrow]
- Improve "Fail to merge schema" error messages #3095 [arrow]
- Expose `SortingColumn` when reading and writing parquet metadata #3090 [parquet]
- Change Field::metadata to HashMap #3086 [parquet] [arrow]
- Support bloom filter reading and writing for parquet #3023 [parquet]
- API to take back ownership of an ArrayRef #2901 [arrow]
- Specialized Interleave Kernel #2864 [arrow]
Fixed bugs:
- arithmetic overflow leads to segfault in `concat_batches` #3123 [arrow]
- Clippy failing on master : error: use of deprecated associated function chrono::NaiveDate::from_ymd: use from_ymd_opt() instead #3097 [parquet] [arrow]
- Pretty print for interval types has wrong formatting #3092 [arrow]
- Field is not serializable with binary formats #3082 [arrow]
- Decimal Casts are Unchecked #2986 [arrow]
Closed issues:
- Release Arrow `27.0.0` (next release after `26.0.0`) #3045 [parquet] [arrow] [arrow-flight]
- Perf about ParquetRecordBatchStream vs ParquetRecordBatchReader #2916
Merged pull requests:
- Improve regex related kernels by up to 85% #3192 [arrow] (psvri)
- Derive clone for arrays #3184 [arrow] (tustvold)
- Row decode cleanups #3180 [arrow] (tustvold)
- Update zstd requirement from 0.11.1 to 0.12.0 #3178 [parquet] [arrow] (dependabot[bot])
- Move decimal constants from arrow-data to arrow-schema crate #3177 [arrow] (mbrobbel)
- bloom filter part V: add an integration with pytest against pyspark #3176 [parquet] (Jimexist)
- Bloom filter config tweaks (#3023) #3175 [parquet] (tustvold)
- Add RowParser #3174 [arrow] (tustvold)
- Add RowSelection::iter(), Into<Vec<RowSelector>> and example #3173 [parquet] (alamb)
- Add read parquet examples #3170 [parquet] (xudong963)
- Faster BinaryArray to StringArray conversion (~67%) #3168 [arrow] (tustvold)
- Remove unnecessary downcasts in builders #3166 [arrow] (tustvold)
- bloom filter part IV: adjust writer properties, bloom filter properties, and incorporate into column encoder #3165 [parquet] (Jimexist)
- Fix parquet decimal precision #3164 [parquet] (psvri)
- Add Row size methods (#3160) #3163 [arrow] (tustvold)
- Prevent precision=0 for decimal type #3162 [arrow] (psvri)
- Remove unnecessary Buffer::from_slice_ref reference #3161 [arrow] (tustvold)
- Add finish_cloned to ArrayBuilder #3158 [arrow] (askoa)
- Check overflow in MutableArrayData extend offsets (#3123) #3157 [arrow] (tustvold)
- Extend Decimal256 as Primitive #3156 [arrow] (tustvold)
- Doc improvements #3155 [arrow] (psvri)
- Add collect.rs example #3153 [arrow] (viirya)
- Implement Neg for i256 #3151 [arrow] (tustvold)
- feat: {Field,DataType}::size #3149 [arrow] (crepererum)
- Add like_utf8_scalar_dyn kernel #3146 [arrow] (viirya)
- comparison op: decimal128 array with scalar #3141 [arrow] (liukun4515)
- Cast: should get the round result for decimal to a decimal with smaller scale #3139 [arrow] (liukun4515)
- Fix Panic on Reading Corrupt Parquet Schema (#2855) #3130 [parquet] (psvri)
- Clippy parquet fixes #3124 [parquet] [arrow] (psvri)
- Add GenericByteBuilder (#2969) #3122 [arrow] (tustvold)
- parquet bloom filter part III: add sbbf writer, remove bloom default feature, add reader properties #3119 [parquet] (Jimexist)
- Add downcast_array (#2901) #3117 [arrow] (tustvold)
- Add COW conversion for Buffer and PrimitiveArray and unary_mut #3115 [arrow] (viirya)
- Include field name in merge error message #3113 [arrow] (andygrove)
- Add PrimitiveArray::unary_opt #3110 [arrow] (tustvold)
- Implements more temporal kernels using time_fraction_dyn #3107 [arrow] (viirya)
- cast: support unsigned numeric type to decimal128 #3106 [arrow] (liukun4515)
- Expose SortingColumn in parquet files #3103 [parquet] (askoa)
- parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo #3102 [parquet] (Jimexist)
- Parse Time32/Time64 from formatted string #3101 [arrow] (Jefffrey)
- Cleanup temporal _internal functions #3099 [arrow] (viirya)
- Improve schema mismatch error message #3098 [arrow] (askoa)
- Fix clippy by avoiding deprecated functions in chrono #3096 [parquet] [arrow] (viirya)
- Minor: Add diagrams and documentation to row format #3094 [arrow] (alamb)
- Minor: Use ArrowNativeTypeOp instead of total_cmp directly #3087 [arrow] (viirya)
- Check overflow while casting between decimal types #3076 [arrow] (viirya)
- add bloom filter implementation based on split block (sbbf) spec #3057 [parquet] (Jimexist)
- Add FixedSizeBinaryArray::try_from_sparse_iter_with_size #3054 [arrow] (maxburke)
27.0.0 (2022-11-11)
Breaking changes:
- Recurse into Dictionary value type in DataType::is_nested #3083 [arrow] (tustvold)
- early type checks in RowConverter #3080 [arrow] (crepererum)
- Add Decimal128 and Decimal256 to downcast_primitive #3056 [arrow] (viirya)
- Replace remaining _generic temporal kernels with _dyn kernels #3046 [arrow] (viirya)
- Replace year_generic with year_dyn #3041 [arrow] (viirya)
- Validate decimal256 with i256 directly #3025 [arrow] (viirya)
- Hadoop LZ4 Support for LZ4 Codec #3013 [parquet] (marioloko)
- Replace hour_generic with hour_dyn #3006 [arrow] (viirya)
- Accept any &dyn Array in nullif kernel #2940 [arrow] (tustvold)
Implemented enhancements:
- Row Format: Option to detach/own a row #3078 [arrow]
- Row Format: API to check if datatypes are supported #3077 [arrow]
- Deprecate Buffer::count_set_bits #3067 [arrow]
- Add Decimal128 and Decimal256 to downcast_primitive #3055 [arrow]
- Improved UX of creating TimestampNanosecondArray with timezones #3042 [arrow]
- Cast decimal256 to signed integer #3039 [arrow]
- Support casting Date64 to Timestamp #3037 [arrow]
- Check overflow when casting floating point value to decimal256 #3032 [arrow]
- Compare i256 in validate_decimal256_precision #3024 [arrow]
- Check overflow when casting floating point value to decimal128 #3020 [arrow]
- Add macro downcast_temporal_array #3008 [arrow]
- Replace hour_generic with hour_dyn #3005 [arrow]
- Replace temporal _generic kernels with dyn #3004 [arrow]
- Add RowSelection::intersection #3003 [parquet]
- I would like to round rather than truncate when casting f64 to decimal #2997 [arrow]
- arrow::compute::kernels::temporal should support nanoseconds #2995 [arrow]
- Release Arrow 26.0.0 (next release after 25.0.0) #2953 [parquet] [arrow] [arrow-flight]
- Add timezone offset for debug format of Timestamp with Timezone #2917 [arrow]
- Support merge RowSelectors when creating RowSelection #2858 [parquet]
Fixed bugs:
- Inconsistent Nan Handling Between Scalar and Non-Scalar Comparison Kernels #3074 [arrow]
- Debug format for timestamp ignores timezone #3069 [arrow]
- Row format decode loses timezone #3063 [arrow]
- binary operator produces incorrect result on arrays with resized null buffer #3061 [arrow]
- RLEDecoder Panics on Null Padded Pages #3035 [parquet]
- Nullif with incorrect valid_count #3031 [arrow]
- RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024 #3029 [parquet]
- Converted type is None according to Parquet Tools then utilizing logical types #3017
- CompressionCodec LZ4 incompatible with C++ implementation #2988 [parquet]
Documentation updates:
Merged pull requests:
- Improved UX of creating TimestampNanosecondArray with timezones #3088 [arrow] (src255)
- Remove unused range module #3085 [parquet] (tustvold)
- Make intersect_row_selections a member function #3084 [parquet] (tustvold)
- Update hashbrown requirement from 0.12 to 0.13 #3081 [parquet] [arrow] (dependabot[bot])
- feat: add OwnedRow #3079 [arrow] (crepererum)
- Use ArrowNativeTypeOp on non-scalar comparison kernels #3075 [arrow] (viirya)
- Add missing inline to ArrowNativeTypeOp #3073 [arrow] (tustvold)
- fix debug information for Timestamp with Timezone #3072 [arrow] (waitingkuo)
- Deprecate Buffer::count_set_bits (#3067) #3071 [arrow] (tustvold)
- Add compare to ArrowNativeTypeOp #3070 [arrow] (tustvold)
- Minor: Improve docstrings on WriterPropertiesBuilder #3068 [parquet] (alamb)
- Faster f64 inequality #3065 [arrow] (tustvold)
- Fix row format decode loses timezone (#3063) #3064 [arrow] (tustvold)
- Fix null_count computation in binary #3062 [arrow] (viirya)
- Faster f64 equality #3060 [arrow] (tustvold)
- Update arrow-flight subcrates (#3044) #3052 [arrow-flight] (tustvold)
- Minor: Remove cloning ArrayData in with_precision_and_scale #3050 [arrow] (viirya)
- Split out arrow-json (#3044) #3049 [arrow] (tustvold)
- Move intersect_row_selections from datafusion to arrow-rs. #3047 [parquet] (Ted-Jiang)
- Split out arrow-csv (#2594) #3044 [arrow] (tustvold)
- Move reader_parser to arrow-cast (#3022) #3043 [arrow] (tustvold)
- Cast decimal256 to signed integer #3040 [arrow] (viirya)
- Enable casting from Date64 to Timestamp #3038 [arrow] (gruuya)
- Fix decoding long and/or padded RLE data (#3029) (#3035) #3036 [parquet] (tustvold)
- Fix nullif when existing array has no nulls #3034 [arrow] (tustvold)
- Check overflow when casting floating point value to decimal256 #3033 [arrow] (viirya)
- Update parquet to depend on arrow subcrates #3028 [parquet] (tustvold)
- Make various i256 methods const #3026 [arrow] (tustvold)
- Split out arrow-ipc #3022 [arrow] (tustvold)
- Check overflow while casting floating point value to decimal128 #3021 [arrow] (viirya)
- Update arrow-flight #3019 [arrow-flight] (tustvold)
- Move ArrowNativeTypeOp to arrow-array (#2594) #3018 [arrow] (tustvold)
- Support cast timestamp to time #3016 [arrow] (naosense)
- Add filter example #3014 [arrow] (tustvold)
- Check overflow when casting integer to decimal #3009 [arrow] (viirya)
- Add macro downcast_temporal_array #3007 [arrow] (viirya)
- Parquet Writer: Make column descriptor public on the writer #3002 [parquet] (pier-oliviert)
- Update chrono-tz requirement from 0.7 to 0.8 #3001 [arrow] (dependabot[bot])
- Round instead of Truncate while casting float to decimal #3000 [arrow] (waitingkuo)
- Support Predicate Pushdown for Parquet Lists (#2108) #2999 [parquet] (tustvold)
- Split out arrow-cast (#2594) #2998 [arrow] (tustvold)
- arrow::compute::kernels::temporal should support nanoseconds #2996 [arrow] (comphead)
- Add RowSelection::from_selectors_and_combine to merge RowSelectors #2994 [parquet] (Ted-Jiang)
- Simplify Single-Column Dictionary Sort #2993 [arrow] (tustvold)
- Minor: Add entry to changelog for 26.0.0 RC2 fix #2992 (alamb)
- Fix ignored limit on lexsort_to_indices #2991 [arrow] (alamb)
- Add clone and equal functions for CastOptions #2985 [arrow] (askoa)
- minor: remove redundant prefix #2983 [arrow] [arrow-flight] (jackwener)
- Compare dictionary decimal arrays #2982 [arrow] (viirya)
- Compare dictionary and non-dictionary decimal arrays #2980 [arrow] (viirya)
- Add decimal comparison kernel support #2978 [arrow] (viirya)
- Move concat kernel to arrow-select (#2594) #2976 [arrow] (tustvold)
- Specialize interleave for byte arrays (#2864) #2975 [arrow] (tustvold)
- Use unary function for numeric to decimal cast #2973 [arrow] (viirya)
- Specialize filter kernel for binary arrays (#2969) #2971 [arrow] (tustvold)
- Combine take_utf8 and take_binary (#2969) #2970 [arrow] (tustvold)
- Faster Scalar Dictionary Comparison ~10% #2968 [arrow] (tustvold)
- Move byte_size from datafusion::physical_expr #2965 [arrow] (avantgardnerio)
- Pass decompressed size to parquet Codec::decompress (#2956) #2959 [parquet] (marioloko)
- Add Decimal Arithmetic #2881 [arrow] (tustvold)
26.0.0 (2022-10-28)
Breaking changes:
- Cast Timestamps to RFC3339 strings #2934
- Remove Unused NativeDecimalType #2945 [arrow] (tustvold)
- Format Timestamps as RFC3339 #2939 [arrow] (waitingkuo)
- Update flatbuffers to resolve RUSTSEC-2021-0122 #2895 [arrow] (tustvold)
- replace from_timestamp by from_timestamp_opt #2894 [arrow] (waitingkuo)
Implemented enhancements:
- Optimized way to count the numbers of true and false values in a BooleanArray #2963 [arrow]
- Add pow to i256 #2954 [arrow]
- Write Generic Code over [Large]BinaryArray and [Large]StringArray #2946 [arrow]
- Add Page Row Count Limit #2941 [parquet]
- prettyprint to show timezone offset for timestamp with timezone #2937 [arrow]
- Cast numeric to decimal256 #2922 [arrow]
- Add freeze_with_dictionary API to MutableArrayData #2914 [arrow]
- Support decimal256 array in sort kernels #2911 [arrow]
- support [+/-]hhmm and [+/-]hh as fixedoffset timezone format #2910 [arrow]
- Cleanup decimal sort function #2907 [arrow]
- replace from_timestamp by from_timestamp_opt #2892 [arrow]
- Move Primitive arity kernels to arrow-array #2787 [arrow]
- add overflow-checking for negative arithmetic kernel #2662 [arrow]
Fixed bugs:
- Subtle compatibility issue with serve_arrow #2952
- error[E0599]: no method named total_cmp found for struct f16 in the current scope #2926 [arrow]
- Fail at rowSelection and_then method #2925 [parquet]
- Ordering not implemented for FixedSizeBinary types #2904 [arrow]
- Parquet API: Could not convert timestamp before unix epoch to string/json #2897 [parquet]
- Overly Pessimistic RLE Size Estimation #2889 [parquet]
- Memory alignment error in RawPtrBox::new #2882 [arrow]
- Compilation error under chrono-tz feature #2878 [arrow]
- AHash Statically Allocates 64 bytes #2875 [parquet]
- parquet::arrow::arrow_writer::ArrowWriter ignores page size properties #2853 [parquet]
Documentation updates:
Closed issues:
- SerializedFileWriter comments about multiple call on consumed self #2935 [parquet]
- Pointer freed error when deallocating ArrayData with shared memory buffer #2874
- Release Arrow 25.0.0 (next release after 24.0.0) #2820 [parquet] [arrow] [arrow-flight]
- Replace DecimalArray with PrimitiveArray #2637 [parquet] [arrow]
Merged pull requests:
- Fix ignored limit on lexsort_to_indices (#2991) #2991 [arrow] (alamb)
- Fix GenericListArray::try_new_from_array_data error message (#526) #2961 [arrow] (tustvold)
- Fix take string on sliced indices #2960 [arrow] (tustvold)
- Add BooleanArray::true_count and BooleanArray::false_count #2957 [arrow] (tustvold)
- Add pow to i256 #2955 [arrow] (viirya)
- fix datatype for timestamptz debug fmt #2948 [arrow] (waitingkuo)
- Add GenericByteArray (#2946) #2947 [arrow] (tustvold)
- Specialize interleave string ~2-3x faster #2944 [arrow] (tustvold)
- Added support for LZ4_RAW compression. (#1604) #2943 [parquet] (marioloko)
- Add optional page row count limit for parquet WriterProperties (#2941) #2942 [parquet] (tustvold)
- Cleanup orphaned doc comments (#2935) #2938 [parquet] (tustvold)
- support more fixedoffset tz format #2936 [arrow] (waitingkuo)
- Benchmark with prepared row converter #2930 [arrow] (tustvold)
- Add lexsort benchmark (#2871) #2929 [arrow] (tustvold)
- Improve panic messages for RowSelection::and_then (#2925) #2928 [parquet] (tustvold)
- Update required half from 2.0 --> 2.1 #2927 [arrow] (alamb)
- Cast numeric to decimal256 #2923 [arrow] (viirya)
- Cleanup generated proto code #2921 [arrow-flight] (tustvold)
- Deprecate TimestampArray from_vec and from_opt_vec #2919 [parquet] [arrow] (tustvold)
- Support decimal256 array in sort kernels #2912 [arrow] (viirya)
- Add timezone abstraction #2909 [arrow] (tustvold)
- Cleanup decimal sort function #2908 [arrow] (viirya)
- Simplify TimestampArray from_vec with timezone #2906 [arrow] (tustvold)
- Implement ord for FixedSizeBinary types #2905 [arrow] (maxburke)
- Update chrono-tz requirement from 0.6 to 0.7 #2903 [arrow] (dependabot[bot])
- Parquet record api support timestamp before epoch #2899 [parquet] (AnthonyPoncet)
- Specialize interleave integer #2898 [arrow] (tustvold)
- Support overflow-checking variant of negate kernel #2893 [arrow] (viirya)
- Respect Page Size Limits in ArrowWriter (#2853) #2890 [parquet] (tustvold)
- Improve row format docs #2888 [arrow] (tustvold)
- Add FixedSizeList::from_iter_primitive #2887 [arrow] (tustvold)
- Simplify ListArray::from_iter_primitive #2886 [arrow] (tustvold)
- Split out value selection kernels into arrow-select (#2594) #2885 [arrow] (tustvold)
- Increase default IPC alignment to 64 (#2883) #2884 [arrow] (tustvold)
- Copying inappropriately aligned buffer in ipc reader #2883 [arrow] (viirya)
- Validate decimal IPC read (#2387) #2880 [arrow] (tustvold)
- Fix compilation error under chrono-tz feature #2879 [arrow] (viirya)
- Don't validate decimal precision in ArrayData (#2637) #2873 [arrow] (tustvold)
- Add downcast_integer and downcast_primitive #2872 [arrow] (tustvold)
- Filter DecimalArray as PrimitiveArray ~5x Faster (#2637) #2870 [arrow] (tustvold)
- Treat DecimalArray as PrimitiveArray in row format #2866 [arrow] (tustvold)
25.0.0 (2022-10-14)
Breaking changes:
- Make DecimalArray as PrimitiveArray #2857 [parquet] [arrow] (viirya)
- fix timestamp parsing while no explicit timezone given #2814 [arrow] (waitingkuo)
- Support Arbitrary Number of Arrays in downcast_primitive_array #2809 (tustvold)
Implemented enhancements:
- Restore Integration test JSON schema serialization #2876 [arrow]
- Fix various invalid_html_tags clippy error #2861 [parquet] [arrow] [arrow-flight]
- Replace complicated temporal macro with generic functions #2851 [arrow]
- Add NaN handling in dyn scalar comparison kernels #2829 [arrow]
- Add overflow-checking variant of sum kernel #2821 [arrow]
- Update to Clap 4 #2817 [parquet]
- Safe API to Operate on Dictionary Values #2797 [arrow]
- Add modulus op into ArrowNativeTypeOp #2753 [arrow]
- Allow creating of TimeUnit instances without direct dependency on parquet-format #2708 [parquet]
- Arrow Row Format #2677 [arrow]
Fixed bugs:
- Don't try to infer nulls in CSV schema inference #2859 [arrow]
- parquet::arrow::arrow_writer::ArrowWriter ignores page size properties #2853 [parquet]
- Introducing ArrowNativeTypeOp made it impossible to call kernels from generics #2839 [arrow]
- Unsound ArrayData to Array Conversions #2834 [parquet] [arrow]
- Regression: the trait bound for<'de> arrow::datatypes::Schema: serde::de::Deserialize<'de> is not satisfied #2825 [arrow]
- convert string to timestamp shouldn't apply local timezone offset if there's no explicit timezone info in the string #2813 [arrow]
Closed issues:
Merged pull requests:
- Take decimal as primitive (#2637) #2869 [arrow] (tustvold)
- Split out arrow-integration-test crate #2868 [arrow] (tustvold)
- Decimal cleanup (#2637) #2865 [parquet] [arrow] (tustvold)
- Fix various invalid_html_tags clippy errors #2862 [parquet] [arrow] [arrow-flight] (viirya)
- Don't try to infer nullability in CSV reader #2860 [arrow] (Dandandan)
- Fix page size on dictionary fallback #2854 [parquet] (thinkharderdev)
- Replace complicated temporal macro with generic functions #2850 [arrow] (viirya)
- [feat] Add pub api for checking column index is sorted. #2849 [parquet] (Ted-Jiang)
- parquet: Add snap option to README #2847 [parquet] (exyi)
- Cleanup cast kernel #2846 [arrow] (tustvold)
- Simplify ArrowNativeType #2841 [arrow] (tustvold)
- Expose ArrowNativeTypeOp trait to make it useful for type bound #2840 [arrow] (viirya)
- Add interleave kernel (#1523) #2838 [arrow] (tustvold)
- Handle empty offsets buffer (#1824) #2836 [arrow] (tustvold)
- Validate ArrayData type when converting to Array (#2834) #2835 [parquet] [arrow] (tustvold)
- Derive ArrowPrimitiveType for Decimal128Type and Decimal256Type (#2637) #2833 [arrow] (tustvold)
- Add NaN handling in dyn scalar comparison kernels #2830 [arrow] (viirya)
- Simplify OrderPreservingInterner allocation strategy ~97% faster (#2677) #2827 [arrow] (tustvold)
- Convert rows to arrays (#2677) #2826 [arrow] (tustvold)
- Add overflow-checking variant of sum kernel #2822 [arrow] (viirya)
- Update Clap dependency to version 4 #2819 [parquet] (jgoday)
- Fix i256 checked multiplication #2818 [arrow] (tustvold)
- Add string_dictionary benches for row format (#2677) #2816 [arrow] (tustvold)
- Add OrderPreservingInterner::lookup (#2677) #2815 [arrow] (tustvold)
- Simplify FixedLengthEncoding #2812 [arrow] (tustvold)
- Implement ArrowNumericType for Float16Type #2810 [arrow] (tustvold)
- Add DictionaryArray::with_values to make it easier to operate on dictionary values #2798 [arrow] (tustvold)
- Add i256 (#2637) #2781 [arrow] (tustvold)
- Add modulus ops into ArrowNativeTypeOp #2756 [arrow] (HaoYang670)
- feat: cast List / LargeList to Utf8 / LargeUtf8 #2588 [arrow] (gandronchik)
24.0.0 (2022-09-30)
Breaking changes:
- Cleanup ArrowNativeType (#1918) #2793 [parquet] [arrow] (tustvold)
- Remove ArrowNativeType::FromStr #2775 [arrow] (tustvold)
- Split out arrow-array crate (#2594) #2769 [arrow] (tustvold)
- Add dyn_arith_dict feature flag #2760 [arrow] (tustvold)
- Split out arrow-data into a separate crate #2746 [arrow] (tustvold)
- Split out arrow-schema (#2594) #2711 [arrow] (tustvold)
Implemented enhancements:
- Include field name in Parquet PrimitiveTypeBuilder error messages #2804 [parquet]
- Add PrimitiveArray::reinterpret_cast #2785
- BinaryBuilder and StringBuilder initialization parameters in struct_builder may be wrong #2783 [arrow]
- Add divide scalar dyn kernel which produces null for division by zero #2767 [arrow]
- Add divide dyn kernel which produces null for division by zero #2763 [arrow]
- Improve performance of checked kernels on non-null data #2747 [arrow]
- Add overflow-checking variants of arithmetic dyn kernels #2739 [arrow]
- The binary function should not panic on unequal array length. #2721 [arrow]
Fixed bugs:
- min compute kernel is incorrect with sliced buffers in arrow 23 #2779 [arrow]
- try_unary_dict should check value type of dictionary array #2754 [arrow]
Closed issues:
- Add back JSON import/export for schema #2762
- null casting and coercion for Decimal128 #2761
- Json decoder behavior changed from versions 21 to 21 and returns non-sensical num_rows for RecordBatch #2722 [arrow]
- Release Arrow 23.0.0 (next release after 22.0.0) #2665 [parquet] [arrow] [arrow-flight]
Merged pull requests:
- add field name to parquet PrimitiveTypeBuilder error messages #2805 [parquet] (andygrove)
- Add struct equality test case (#514) #2791 [arrow] (tustvold)
- Move unary kernels to arrow-array (#2787) #2789 [arrow] (tustvold)
- Disable test harness for string_dictionary_builder benchmark #2788 [arrow] (tustvold)
- Add PrimitiveArray::reinterpret_cast (#2785) #2786 (tustvold)
- Fix BinaryBuilder and StringBuilder Capacity Allocation in StructBuilder #2784 (chunshao90)
- Fix min/max computation for sliced arrays (#2779) #2780 [arrow] (tustvold)
- Fix Backwards Compatible Parquet List Encodings (#1915) #2774 [parquet] (tustvold)
- MINOR: Fix clippy for rust 1.64.0 #2772 [parquet] [arrow] (viirya)
- MINOR: Fix clippy for rust 1.64.0 #2771 (viirya)
- Add divide scalar dyn kernel which produces null for division by zero #2768 [arrow] (viirya)
- Add divide dyn kernel which produces null for division by zero #2764 [arrow] (viirya)
- Add value type check in try_unary_dict #2755 [arrow] (viirya)
- Fix verify_release_candidate.sh for new arrow subcrates #2752 (alamb)
- Fix: Issue 2721 : binary function should not panic but return error w… #2750 [arrow] (aksharau)
- Speed up checked kernels for non-null data (~1.4-5x faster) #2749 [arrow] (Dandandan)
- Add overflow-checking variants of arithmetic dyn kernels #2740 [arrow] (viirya)
- Trim parquet row selection #2705 [parquet] (tustvold)
23.0.0 (2022-09-16)
Breaking changes:
- Move JSON Test Format To integration-testing #2724 [arrow] (tustvold)
- Split out arrow-buffer crate (#2594) #2693 [arrow] (tustvold)
- Simplify DictionaryBuilder constructors (#2684) (#2054) #2685 [parquet] [arrow] (tustvold)
- Deprecate RecordBatch::concat replace with concat_batches (#2594) #2683 [arrow] (tustvold)
- Add overflow-checking variant for primitive arithmetic kernels and explicitly define overflow behavior #2643 [arrow] (viirya)
- Update thrift v0.16 and vendor parquet-format (#2502) #2626 [parquet] (tustvold)
- Update flight definitions including backwards-incompatible change to GetSchema #2586 [arrow] [arrow-flight] (liukun4515)
Implemented enhancements:
- Cleanup like and nlike utf8 kernels #2744 [arrow]
- Speedup eq and neq kernels for utf8 arrays #2742 [arrow]
- API for more ergonomic construction of RecordBatchOptions #2728 [arrow]
- Automate updates to CHANGELOG-old.md #2726
- Don't check the DivideByZero error for float modulus #2720 [arrow]
- try_binary should not panic on unequal array length. #2715 [arrow]
- Add benchmark for bitwise operation #2714 [arrow]
- Add overflow-checking variants of arithmetic scalar dyn kernels #2712 [arrow]
- Add divide_opt kernel which produce null values on division by zero error #2709 [arrow]
- Add DataType function to detect nested types #2704 [arrow]
- Add support of sorting dictionary of other primitive types #2700 [arrow]
- Sort indices of dictionary string values #2697 [arrow]
- Support empty projection in RecordBatch::project #2690 [arrow]
- Support sorting dictionary encoded primitive integer arrays #2679 [arrow]
- Use BitIndexIterator in min_max_helper #2674 [arrow]
- Support building comparator for dictionaries of primitive integer values #2672 [arrow]
- Change max/min string macro to generic helper function min_max_helper #2657 [arrow]
- Add overflow-checking variant of arithmetic scalar kernels #2651 [arrow]
- Compare dictionary with binary array #2644 [arrow]
- Add overflow-checking variant for primitive arithmetic kernels #2642 [arrow]
- Use downcast_primitive_array in arithmetic kernels #2639 [arrow]
- Support DictionaryArray in temporal kernels #2622 [arrow]
- Inline Generated Thrift Code Into Parquet Crate #2502 [parquet]
Fixed bugs:
- Escape contains patterns for utf8 like kernels #2745 [arrow]
- Float Array should not panic on DivideByZero in the Divide kernel #2719 [arrow]
- DictionaryBuilders can Create Invalid DictionaryArrays #2684 [parquet] [arrow]
- arrow crate does not build with features = ["ffi"] and default_features = false. #2670 [arrow]
- Invalid results with RowSelector having row_count of 0 #2669 [parquet]
- clippy error: unresolved import crate::array::layout #2659 [arrow]
- Cast the numeric without the CastOptions #2648 [arrow]
- Explicitly define overflow behavior for primitive arithmetic kernels #2641 [arrow]
- update the flight.proto and fix schema to SchemaResult #2571 [arrow] [arrow-flight]
- Panic when first data page is skipped using ColumnChunkData::Sparse #2543 [parquet]
- SchemaResult in IPC deviates from other implementations #2445 [arrow] [arrow-flight]
Closed issues:
Merged pull requests:
- Speedup string equal/not equal to empty string, cleanup like/ilike kernels, fix escape bug #2743 [arrow] (Dandandan)
- Partially flatten arrow-buffer #2737 [arrow] (tustvold)
- Automate updates to CHANGELOG-old.md #2732 (iajoiner)
- Update read parquet example in parquet/arrow home #2730 [parquet] (datapythonista)
- Better construction of RecordBatchOptions #2729 [arrow] (askoa)
- benchmark: bitwise operation #2718 [arrow] (liukun4515)
- Update try_binary and checked_ops, and remove math_checked_op #2717 [arrow] (HaoYang670)
- Support bitwise op in kernel: or,xor,not #2716 [arrow] (liukun4515)
- Add overflow-checking variants of arithmetic scalar dyn kernels #2713 [arrow] (viirya)
- Add divide_opt kernel which produce null values on division by zero error #2710 [arrow] (viirya)
- Add DataType::is_nested() #2707 [arrow] (kfastov)
- Update criterion requirement from 0.3 to 0.4 #2706 [parquet] [arrow] (dependabot[bot])
- Support bitwise and operation in the kernel #2703 [arrow] (liukun4515)
- Add support of sorting dictionary of other primitive arrays #2701 [arrow] (viirya)
- Clarify docs of binary and string builders #2699 [arrow] (datapythonista)
- Sort indices of dictionary string values #2698 [arrow] (viirya)
- Add support for empty projection in RecordBatch::project #2691 [arrow] (Dandandan)
- Temporarily disable Golang integration tests re-enable JS #2689 (tustvold)
- Verify valid UTF-8 when converting byte array (#2205) #2686 [arrow] (tustvold)
- Support sorting dictionary encoded primitive integer arrays #2680 [arrow] (viirya)
- Skip RowSelectors with zero rows #2678 [parquet] (askoa)
- Faster Null Path Selection in ArrayData Equality #2676 [arrow] (dhruv9vats)
- Use BitIndexIterator in min_max_helper #2675 [arrow] (viirya)
- Support building comparator for dictionaries of primitive integer values #2673 [arrow] (viirya)
- json feature always requires base64 feature #2668 [parquet] (eagletmt)
- Add try_unary, binary, try_binary kernels ~90% faster #2666 [arrow] (tustvold)
- Use downcast_dictionary_array in unary_dyn #2663 [arrow] (tustvold)
- optimize the numeric_cast_with_error #2661 [arrow] (liukun4515)
- ffi feature also requires layout #2660 [arrow] (viirya)
- Change max/min string macro to generic helper function min_max_helper #2658 [arrow] (viirya)
- Fix flaky test test_fuzz_async_reader_selection #2656 [parquet] (thinkharderdev)
- MINOR: Ignore flaky test test_fuzz_async_reader_selection #2655 [parquet] (viirya)
- MutableBuffer::typed_data - shared ref access to the typed slice #2652 [arrow] (medwards)
- Overflow-checking variant of arithmetic scalar kernels #2650 [arrow] (viirya)
- support CastOption for casting numeric #2649 [arrow] (liukun4515)
- Help LLVM vectorize comparison kernel ~50-80% faster #2646 [arrow] (tustvold)
- Support comparison between dictionary array and binary array #2645 [arrow] (viirya)
- Use downcast_primitive_array in arithmetic kernels #2640 [arrow] (viirya)
- Fully qualifying parquet items #2638 (dingxiangfei2009)
- Support DictionaryArray in temporal kernels #2623 [arrow] (viirya)
- Comparable Row Format #2593 [arrow] (tustvold)
- Fix bug in page skipping #2552 [parquet] (thinkharderdev)
22.0.0 (2022-09-02)
Breaking changes:
- Use total_cmp for floating value ordering and remove nan_ordering feature flag #2614 [arrow] (viirya)
- Gate dyn comparison of dictionary arrays behind dyn_cmp_dict #2597 [arrow] (tustvold)
- Move JsonSerializable to json module (#2300) #2595 [arrow] (tustvold)
- Decimal precision scale datatype change #2532 [parquet] [arrow] (psvri)
- Refactor PrimitiveBuilder Constructors #2518 [parquet] [arrow] (psvri)
- Refactoring DecimalBuilder constructors #2517 [arrow] (psvri)
- Refactor FixedSizeBinaryBuilder Constructors #2516 [parquet] [arrow] (psvri)
- Refactor BooleanBuilder Constructors #2515 [arrow] (psvri)
- Refactor UnionBuilder Constructors #2488 [arrow] (psvri)
Implemented enhancements:
- Add Macros to assist with static dispatch #2635 [arrow]
- Support comparison between DictionaryArray and BooleanArray #2617 [arrow]
- Use total_cmp for floating value ordering and remove nan_ordering feature flag #2613 [arrow]
- Support empty projection in CSV, JSON readers #2603 [arrow]
- Support SQL-compliant NaN ordering between for DictionaryArray and non-DictionaryArray #2599 [arrow]
- Add dyn_cmp_dict feature flag to gate dyn comparison of dictionary arrays #2596 [arrow]
- Add max_dyn and min_dyn for max/min for dictionary array #2584 [arrow]
- Allow FlightSQL implementers to extend do_get() #2581 [arrow-flight]
- Support SQL-compliant behavior on eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #2569 [arrow]
- Add sql-compliant feature for enabling sql-compliant kernel behavior #2568
- Calculate sum for dictionary array #2565 [arrow]
- Add test for float nan comparison #2556 [arrow]
- Compare dictionary with string array #2548 [arrow]
- Compare dictionary with primitive array in lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #2538 [arrow]
- Compare dictionary with primitive array in eq_dyn and neq_dyn #2535 [arrow]
- UnionBuilder Create Children With Capacity #2523 [arrow]
- Speed up like_utf8_scalar for %pat% #2519 [arrow]
- Replace macro with TypedDictionaryArray in comparison kernels #2513 [arrow]
- Use same codebase for boolean kernels #2507 [arrow]
- Use u8 for Decimal Precision and Scale #2496 [arrow]
- Integrate skip row without pageIndex in SerializedPageReader in Fuzz Test #2475 [parquet]
- Avoid unnecessary copies in Arrow IPC reader #2437 [arrow]
- Add GenericColumnReader::skip_records Missing OffsetIndex Fallback #2433 [parquet]
- Support Reading PageIndex with ParquetRecordBatchStream #2430 [parquet]
- Specialize FixedLenByteArrayReader for Parquet #2318 [parquet]
- Make JSON support Optional via Feature Flag #2300 [arrow]
Fixed bugs:
- Casting timestamp array to string should not ignore timezone #2607 [arrow]
- ilike_utf8_scalar kernels have incorrect logic #2544 [arrow]
- Always validate the array data when creating array in IPC reader #2541 [arrow]
- Int96Converter Truncates Timestamps #2480 [parquet]
- Error Reading Page Index When Not Available #2434 [parquet]
- ParquetFileArrowReader::get_record_reader[_by_column] batch_size overallocates #2321 [parquet]
Documentation updates:
Closed issues:
- Add support for CAST from Interval(DayTime) to Timestamp(Nanosecond, None) #2606 [arrow]
- Why do we check for null in TypedDictionaryArray value function #2564 [arrow]
- Add the length field for Buffer #2524 [arrow]
- Avoid large over allocate buffer in async reader #2512 [parquet]
- Rewriting Decimal Builders using const_generic. #2390 [arrow]
- Rewrite Decimal Array using const_generic #2384 [arrow]
Merged pull requests:
- Add downcast macros (#2635) #2636 [arrow] (tustvold)
- Document all arrow features in docs.rs (#2633) #2634 [arrow] (tustvold)
- Document dyn_cmp_dict #2624 [arrow] (tustvold)
- Support comparison between DictionaryArray and BooleanArray #2618 [arrow] (viirya)
- Cast timestamp array to string array with timezone #2608 [arrow] (viirya)
- Support empty projection in CSV and JSON readers #2604 [arrow] (Dandandan)
- Make JSON support optional via a feature flag (#2300) #2601 [parquet] [arrow] (tustvold)
- Support SQL-compliant NaN ordering for DictionaryArray and non-DictionaryArray #2600 [arrow] (viirya)
- Split out integration test plumbing (#2594) (#2300) #2598 [arrow] (tustvold)
- Refactor Binary Builder and String Builder Constructors #2592 [parquet] [arrow] (psvri)
- Dictionary like scalar kernels #2591 [arrow] (psvri)
- Validate dictionary key in TypedDictionaryArray (#2578) #2589 [arrow] (tustvold)
- Add max_dyn and min_dyn for max/min for dictionary array #2585 [arrow] (viirya)
- Code cleanup of array value functions #2583 [arrow] (psvri)
- Allow overriding of do_get & export useful macro #2582 [arrow-flight] (avantgardnerio)
- MINOR: Upgrade to pyo3 0.17 #2576 [arrow] (andygrove)
- Support SQL-compliant NaN behavior on eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #2570 [arrow] (viirya)
- Add sum_dyn to calculate sum for dictionary array #2566 [arrow] (viirya)
- struct UnionBuilder will create child buffers with capacity #2560 [arrow] (kastolars)
- Don't panic on RleValueEncoder::flush_buffer if empty (#2558) #2559 [parquet] (tustvold)
- Add the `length` field for Buffer and use more `Buffer` in IPC reader to avoid memory copy. #2557 [arrow] [arrow-flight] (HaoYang670)
- Add test for float nan comparison #2555 [arrow] (viirya)
- Compare dictionary array with string array #2549 [arrow] (viirya)
- Always validate the array data (except the `Decimal`) when creating array in IPC reader #2547 [arrow] (HaoYang670)
- MINOR: Fix test_row_type_validation test #2546 [arrow] (viirya)
- Fix ilike_utf8_scalar kernels #2545 [arrow] (psvri)
- fix typo #2540 (00Masato)
- Compare dictionary array and primitive array in lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn kernels #2539 [arrow] (viirya)
- [MINOR]Avoid large over allocate buffer in async reader #2537 [parquet] (Ted-Jiang)
- Compare dictionary with primitive array in `eq_dyn` and `neq_dyn` #2533 [arrow] (viirya)
- Add iterator for FixedSizeBinaryArray #2531 [arrow] (tustvold)
- add bench: decimal with byte array and fixed length byte array #2529 [parquet] (liukun4515)
- Add FixedLengthByteArrayReader Remove ComplexObjectArrayReader #2528 [parquet] (tustvold)
- Split out byte array decoders (#2318) #2527 [parquet] (tustvold)
- Use offset index in ParquetRecordBatchStream #2526 [parquet] (thinkharderdev)
- Clean the `create_array` in IPC reader. #2525 [arrow] (HaoYang670)
- Remove DecimalByteArrayConvert (#2480) #2522 [parquet] (tustvold)
- Improve performance of `%pat%` (>3x speedup) #2521 [arrow] (Dandandan)
- remove len field from MapBuilder #2520 [arrow] (psvri)
- Replace macro with TypedDictionaryArray in comparison kernels #2514 [arrow] (viirya)
- Avoid large over allocate buffer in sync reader #2511 [parquet] (Ted-Jiang)
- Avoid useless memory copies in IPC reader. #2510 [arrow] (HaoYang670)
- Refactor boolean kernels to use same codebase #2508 [arrow] (viirya)
- Remove Int96Converter (#2480) #2481 [parquet] (tustvold)
21.0.0 (2022-08-18)
Breaking changes:
- Return structured `ColumnCloseResult` (#2465) #2466 [parquet] (tustvold)
- Push `ChunkReader` into `SerializedPageReader` (#2463) #2464 [parquet] (tustvold)
- Revise FromIterator for Decimal128Array to use Into instead of Borrow #2442 [parquet] [arrow] (viirya)
- Use Fixed-Length Array in BasicDecimal new and raw_value #2405 [arrow] (HaoYang670)
- Remove deprecated ParquetWriter #2380 [parquet] (tustvold)
- Remove deprecated SliceableCursor and InMemoryWriteableCursor #2378 [parquet] (tustvold)
Implemented enhancements:
- add into_inner method to ArrowWriter #2491 [parquet]
- Remove byteorder dependency #2472 [parquet]
- Return Structured ColumnCloseResult from GenericColumnWriter::close #2465 [parquet]
- Push `ChunkReader` into `SerializedPageReader` #2463 [parquet]
- Support SerializedPageReader::skip_page without OffsetIndex #2459 [parquet]
- Support Time64/Time32 comparison #2457 [arrow]
- Revise FromIterator for Decimal128Array to use Into instead of Borrow #2441 [parquet]
- Support `RowFilter` within `ParquetRecordBatchReader` #2431 [parquet]
- Remove the field `StructBuilder::len` #2429 [arrow]
- Standardize creation and configuration of parquet --> Arrow readers (`ParquetRecordBatchReaderBuilder`) #2427 [parquet]
- Use `OffsetIndex` to Prune IO in `ParquetRecordBatchStream` #2426 [parquet]
- Support `peek_next_page` and `skip_next_page` in `InMemoryPageReader` #2406 [parquet]
- Support casting from `Utf8`/`LargeUtf8` to `Binary`/`LargeBinary` #2402 [arrow]
- Support casting between `Decimal128` and `Decimal256` arrays #2375 [arrow]
- Combine multiple selections into the same batch size in `skip_records` #2358 [parquet]
- Add API to change timezone for timestamp array #2346 [arrow]
- Change the output of `read_buffer` Arrow IPC API to return `Result<_>` #2342 [arrow]
- Allow `skip_records` in `GenericColumnReader` to skip across row groups #2331 [parquet]
- Optimize the validation of `Decimal256` #2320 [arrow]
- Implement Skip for `DeltaBitPackDecoder` #2281 [parquet]
- Changes to `ParquetRecordBatchStream` to support row filtering in DataFusion #2270 [parquet]
- Add `ArrayReader::skip_records` API #2197 [parquet]
Fixed bugs:
- Panic in SerializedPageReader without offset index #2503 [parquet]
- MapArray columns don't handle null values correctly #2484 [arrow]
- There is no compiler error when using an invalid Decimal type. #2440 [arrow]
- Flight SQL Server sends incorrect response for `DoPutUpdateResult` #2403 [arrow-flight]
- `AsyncFileReader` No Longer Object-Safe #2372 [parquet]
- StructBuilder Does not Verify Child Lengths #2252 [arrow]
Closed issues:
Merged pull requests:
- Fix bug in page skipping #2504 [parquet] (thinkharderdev)
- Fix `MapArrayReader` (#2484) (#1699) (#1561) #2500 [parquet] (tustvold)
- Add API to Retrieve Finished Writer from Parquet Writer #2498 [parquet] (jiacai2050)
- Derive Copy,Clone for BasicDecimal #2495 [arrow] (tustvold)
- remove byteorder dependency from parquet #2486 [parquet] (psvri)
- parquet-read: add support to read parquet data from stdin #2482 [parquet] (nvartolomei)
- Remove Position trait (#1163) #2479 [parquet] (tustvold)
- Add ChunkReader::get_bytes #2478 [parquet] (tustvold)
- RFC: Simplify decimal (#2440) #2477 [arrow] (tustvold)
- Use Parquet OffsetIndex to prune IO with RowSelection #2473 [parquet] (thinkharderdev)
- Remove unnecessary Option from Int96 #2471 [parquet] (tustvold)
- remove len field from StructBuilder #2468 [arrow] (psvri)
- Make Parquet reader filter APIs public (#1792) #2467 [parquet] (tustvold)
- enable ipc compression feature for integration test #2462 (liukun4515)
- Simplify implementation of Schema #2461 [arrow] (HaoYang670)
- Support skip_page missing OffsetIndex Fallback in SerializedPageReader #2460 [parquet] (Ted-Jiang)
- support time32/time64 comparison #2458 [arrow] (waitingkuo)
- Utf8array casting #2456 [arrow] (psvri)
- Remove outdated license text #2455 (alamb)
- Support RowFilter within ParquetRecordBatchReader (#2431) #2452 [parquet] (tustvold)
- benchmark: decimal builder and vec to decimal array #2450 [arrow] (liukun4515)
- Collocate Decimal Array Validation Logic #2446 [arrow] (liukun4515)
- Minor: Move From trait for Decimal256 impl to decimal.rs #2443 [arrow] (liukun4515)
- decimal benchmark: arrow reader decimal from parquet int32 and int64 #2438 [parquet] (liukun4515)
- MINOR: Simplify `split_second` function #2436 [arrow] (viirya)
- Add ParquetRecordBatchReaderBuilder (#2427) #2435 [parquet] (tustvold)
- refactor: refine validation for decimal128 array #2428 [arrow] (liukun4515)
- Benchmark of casting decimal arrays #2424 [arrow] (viirya)
- Test non-annotated repeated fields (#2394) #2422 [parquet] (tustvold)
- Fix #2416 Automatic version updates for github actions with dependabot #2417 (iemejia)
- Add validation logic for StructBuilder::finish #2413 [arrow] (psvri)
- test: add test for reading decimal value from primitive array reader #2411 [parquet] (liukun4515)
- Upgrade ahash to 0.8 #2410 [parquet] [arrow] (Dandandan)
- Support peek_next_page and skip_next_page in InMemoryPageReader #2407 [parquet] (Ted-Jiang)
- Fix DoPutUpdateResult #2404 [arrow-flight] (avantgardnerio)
- Implement Skip for DeltaBitPackDecoder #2393 [parquet] (Ted-Jiang)
- fix: Don't instantiate the scalar composition code quadratically for dictionaries #2391 [arrow] (Marwes)
- MINOR: Remove unused trait and some cleanup #2389 [arrow] (viirya)
- Decouple parquet fuzz tests from converter (#1661) #2386 [parquet] (tustvold)
- Rewrite `Decimal` and `DecimalArray` using `const_generic` #2383 [parquet] [arrow] (HaoYang670)
- Simplify BitReader (~5-10% faster) #2381 [parquet] (tustvold)
- Fix parquet clippy lints (#1254) #2377 [parquet] (tustvold)
- Cast between `Decimal128` and `Decimal256` arrays #2376 [arrow] (viirya)
- support compression for IPC with revamped feature flags #2369 [arrow] (alamb)
- Implement AsyncFileReader for `Box<dyn AsyncFileReader>` #2368 [parquet] (tustvold)
- Remove get_byte_ranges where bound #2366 [parquet] (tustvold)
- refactor: Make read_num_bytes a function instead of a macro #2364 [parquet] (Marwes)
- refactor: Group metrics into page and column metrics structs #2363 [parquet] (Marwes)
- Speed up `Decimal256` validation based on bytes comparison and add benchmark test #2360 [parquet] [arrow] (liukun4515)
- Combine multiple selections into the same batch size in skip_records #2359 [parquet] (Ted-Jiang)
- Add API to change timezone for timestamp array #2347 [arrow] (viirya)
- Clean the code in `field.rs` and add more tests #2345 [arrow] (HaoYang670)
- Add Parquet RowFilter API #2335 [parquet] (tustvold)
- Make skip_records in complex_object_array able to skip across row groups #2332 [parquet] (Ted-Jiang)
- Integrate Record Skipping into Column Reader Fuzz Test #2315 [parquet] (Ted-Jiang)
20.0.0 (2022-08-05)
Breaking changes:
- Add more const evaluation for `GenericBinaryArray` and `GenericListArray`: add `PREFIX` and data type constructor #2327 [parquet] [arrow] (HaoYang670)
- Make FFI support optional, change APIs to be `safe` (#2302) #2303 [arrow] (tustvold)
- Remove `test_utils` from default features (#2298) #2299 [arrow] (tustvold)
- Rename `DataType::Decimal` to `DataType::Decimal128` #2229 [parquet] [arrow] (viirya)
- Add `Decimal128Iter` and `Decimal256Iter` and do maximum precision/scale check #2140 [arrow] (viirya)
Implemented enhancements:
- Add the constant data type constructors for `ListArray` #2311 [arrow]
- Update `FlightSqlService` trait to pass session info along #2308 [arrow-flight]
- Optimize `take_bits` for non-null indices #2306 [arrow]
- Make FFI support optional via Feature Flag `ffi` #2302 [arrow]
- Mark `ffi::ArrowArray::try_new` as safe #2301 [arrow]
- Remove test_utils from default arrow-rs features #2298 [arrow]
- Remove `JsonEqual` trait #2296 [arrow]
- Move `with_precision_and_scale` to `Decimal` array traits #2291 [arrow]
- Improve readability and maybe performance of string --> numeric/time/date/timestamp cast kernels #2285 [arrow]
- Add vectorized unpacking for 8, 16, and 64 bit integers #2276 [parquet]
- Use initial capacity for interner hashmap #2273 [arrow]
- Impl FromIterator for Decimal256Array #2248 [arrow]
- Separate `ArrayReader::next_batch` with `ArrayReader::read_records` and `ArrayReader::consume_batch` #2236 [parquet]
- Rename `DataType::Decimal` to `DataType::Decimal128` #2228 [arrow]
- Automatically Grow Parquet BitWriter Buffer #2226 [parquet]
- Add `append_option` support to `Decimal128Builder` and `Decimal256Builder` #2224 [arrow]
- Split the `FixedSizeBinaryArray` and `FixedSizeListArray` from `array_binary.rs` and `array_list.rs` #2217 [arrow]
- Don't `Box` Values in `PrimitiveDictionaryBuilder` #2215 [arrow]
- Use BitChunks in equal_bits #2186 [arrow]
- Implement `Hash` for `Schema` #2182 [arrow]
- read decimal data type from parquet file with binary physical type #2159 [parquet]
- The `GenericStringBuilder` should use `GenericBinaryBuilder` #2156 [arrow]
- Update Rust version to 1.62 #2143 [parquet] [arrow] [arrow-flight]
- Check precision and scale against maximum value when constructing `Decimal128` and `Decimal256` #2139 [arrow]
- Use `ArrayAccessor` in `Decimal128Iter` and `Decimal256Iter` #2138 [arrow]
- Use `ArrayAccessor` and `FromIterator` in Cast Kernels #2137 [arrow]
- Add `TypedDictionaryArray` for more ergonomic interaction with `DictionaryArray` #2136 [arrow]
- Use `ArrayAccessor` in Comparison Kernels #2135 [arrow]
- Support `peek_next_page()` and `skip_next_page` in `InMemoryColumnChunkReader` #2129 [parquet]
- Lazily materialize the null buffer builder for all array builders. #2125 [arrow]
- Do value validation for `Decimal256` #2112 [arrow]
- Support `skip_def_levels` for `ColumnLevelDecoder` #2107 [parquet]
- Add integration test for scan rows with selection #2106 [parquet]
- Support for casting from Utf8/String to `Time32`/`Time64` #2053 [arrow]
- Update prost and tonic related crates #2268 [arrow-flight] (carols10cents)
Fixed bugs:
- temporal conversion functions cannot work on negative input properly #2325 [arrow]
- IPC writer should truncate string array with all empty string #2312 [arrow]
- Error order for comparing `Decimal128` or `Decimal256` #2256 [arrow]
- Fix maximum and minimum for decimal values for precision greater than 38 #2246 [arrow]
- `IntervalMonthDayNanoType::make_value()` does not match C implementation #2234 [arrow]
- `FlightSqlService` trait does not allow `impl`s to do handshake #2210 [arrow-flight]
- `EnabledStatistics::None` not working #2185 [parquet]
- Boolean ArrayData Equality Incorrect Slice Handling #2184 [arrow]
- Publicly export MapFieldNames #2118 [arrow]
Documentation updates:
- Update instructions on How to join the slack #arrow-rust channel -- or maybe try to switch to discord?? #2192
- [Minor] Improve arrow and parquet READMEs, document parquet feature flags #2324 [parquet] [arrow] (alamb)
Performance improvements:
- Improve speed of writing string dictionaries to parquet by skipping a copy(#1764) #2322 [parquet] [arrow] (tustvold)
Closed issues:
- Fix wrong logic in calculate_row_count when skipping values #2328 [parquet]
- Support filter for parquet data type #2126 [parquet]
- Make skip value in ByteArrayDecoderDictionary avoid decoding #2088 [parquet]
Merged pull requests:
- fix: Fix skip error in calculate_row_count. #2329 [parquet] (Ted-Jiang)
- temporal conversion functions should work on negative input properly #2326 [arrow] (viirya)
- Increase DeltaBitPackEncoder miniblock size to 64 for 64-bit integers (#2282) #2319 [parquet] (tustvold)
- Remove JsonEqual #2317 [parquet] [arrow] (viirya)
- fix: IPC writer should truncate string array with all empty string #2314 [arrow] (JasonLi-cn)
- Pass pull `Request<FlightDescriptor>` to `FlightSqlService` `impl`s #2309 [parquet] [arrow-flight] (avantgardnerio)
- Speedup take_boolean / take_bits for non-null indices (~4 - 5x speedup) #2307 [arrow] (Dandandan)
- Add typed dictionary (#2136) #2297 [arrow] (tustvold)
- [Minor] Improve types shown in cast error messages #2295 [arrow] (alamb)
- Move `with_precision_and_scale` to `BasicDecimalArray` trait #2292 [parquet] [arrow] (viirya)
- Replace the `fn get_data_type` by `const DATA_TYPE` in BinaryArray and StringArray #2289 [arrow] (HaoYang670)
- Clean up string casts and improve performance #2284 [arrow] (alamb)
- [Minor] Add tests for temporal cast error paths #2283 [arrow] (alamb)
- Add unpack8, unpack16, unpack64 (#2276) ~10-50% faster #2278 [parquet] (tustvold)
- Fix bugs in the `from_list` function. #2277 [arrow] (HaoYang670)
- fix: use signed comparator to compare decimal128 and decimal256 #2275 [arrow] (liukun4515)
- Use initial capacity for interner hashmap #2272 [parquet] (Dandandan)
- Remove fallibility from parquet RleEncoder (#2226) #2259 [parquet] (tustvold)
- Fix escaped like wildcards in `like_utf8` / `nlike_utf8` kernels #2258 [arrow] (daniel-martinez-maqueda-sap)
- Add tests for reading nested decimal arrays from parquet #2254 [parquet] (tustvold)
- feat: Implement string cast operations for Time32 and Time64 #2251 [arrow] (stuartcarnie)
- move `FixedSizeList` to `array_fixed_size_list.rs` #2250 [arrow] (HaoYang670)
- Impl FromIterator for Decimal256Array #2247 [arrow] (viirya)
- Fix max and min value for decimal precision greater than 38 #2245 [arrow] (viirya)
- Make `Schema::fields` and `Schema::metadata` `pub` (public) #2239 [arrow] (alamb)
- [Minor] Improve Schema metadata mismatch error #2238 [arrow] (alamb)
- Separate ArrayReader::next_batch with read_records and consume_batch #2237 [parquet] (Ted-Jiang)
- Update `IntervalMonthDayNanoType::make_value()` to conform to specifications #2235 [arrow] (avantgardnerio)
- Disable value validation for Decimal256 case #2232 [arrow] (viirya)
- Automatically grow parquet BitWriter (#2226) (~10% faster) #2231 [parquet] (tustvold)
- Only trigger `arrow` CI on changes to arrow #2227 (alamb)
- Add append_option support to decimal builders #2225 [arrow] (bphillips-exos)
- Optimized writing of byte array to parquet (#1764) (2x faster) #2221 [parquet] (tustvold)
- Increase test coverage of ArrowWriter #2220 [parquet] (tustvold)
- Update instructions on how to join the Slack channel #2219 (HaoYang670)
- Move `FixedSizeBinaryArray` to `array_fixed_size_binary.rs` #2218 [arrow] (HaoYang670)
- Avoid boxing in PrimitiveDictionaryBuilder #2216 [arrow] (tustvold)
- remove redundant CI benchmark check, cleanups #2212 [parquet] (alamb)
- Update `FlightSqlService` trait to proxy handshake #2211 [arrow-flight] (avantgardnerio)
- parquet: export json api with `serde_json` feature name #2209 [parquet] (flisky)
- Cleanup record skipping logic and tests (#2158) #2199 [parquet] (tustvold)
- Use BitChunks in equal_bits #2194 [arrow] (tustvold)
- Fix disabling parquet statistics (#2185) #2191 [parquet] (tustvold)
- Change CI names to match crate names #2189 (alamb)
- Fix offset handling in boolean_equal (#2184) #2187 [arrow] (tustvold)
- Implement `Hash` for `Schema` #2183 [arrow] (crepererum)
- Let the `StringBuilder` use `BinaryBuilder` #2181 [arrow] (HaoYang670)
- Use ArrayAccessor and FromIterator in Cast Kernels #2169 [arrow] (viirya)
- Split most arrow specific CI checks into their own workflows (reduce common CI time to 21 minutes) #2168 (alamb)
- Remove another attempt to cache target directory in action.yaml #2167 (alamb)
- Run actions on push to master, pull requests #2166 (alamb)
- Break parquet_derive and arrow_flight tests into their own workflows #2165 (alamb)
- [minor] use type aliases refine code. #2161 [parquet] (Ted-Jiang)
- parquet reader: Support reading decimals from parquet `BYTE_ARRAY` type #2160 [parquet] (liukun4515)
- Add integration test for scan rows with selection #2158 [parquet] (Ted-Jiang)
- Use ArrayAccessor in Comparison Kernels #2157 [arrow] (viirya)
- Implement `peek_next_page` and `skip_next_page` for `InMemoryColumnCh…` #2155 [parquet] (thinkharderdev)
- Avoid decoding unneeded values in ByteArrayDecoderDictionary #2154 [parquet] (thinkharderdev)
- Only run integration tests when `arrow` changes #2152 (alamb)
- Break out docs CI job to its own github action #2151 (alamb)
- Do not pretend to cache rust build artifacts, speed up CI by ~20% #2150 (alamb)
- Update rust version to 1.62 #2144 [parquet] [arrow] [arrow-flight] (Ted-Jiang)
- Make MapFieldNames public (#2118) #2134 [arrow] (tustvold)
- Add ArrayAccessor trait, remove duplication in array iterators (#1948) #2133 [arrow] (tustvold)
- Lazily materialize the null buffer builder for all array builders. #2127 [arrow] (HaoYang670)
- Faster parquet DictEncoder (~20%) #2123 [parquet] (tustvold)
- Add validation for Decimal256 #2113 [arrow] (viirya)
- Support skip_def_levels for ColumnLevelDecoder #2111 [parquet] (Ted-Jiang)
- Donate `object_store` code from object_store_rs to arrow-rs #2081 (alamb)
- Improve `validate_utf8` performance #2048 [arrow] (tfeda)
19.0.0 (2022-07-22)
Breaking changes:
- Rename `DecimalArray`/`DecimalBuilder` to `Decimal128Array`/`Decimal128Builder` #2101 [arrow]
- Change builder `append` methods to be infallible where possible #2103 [parquet] [arrow] (jhorstmann)
- Return reference from `UnionArray::child` (#2035) #2099 [arrow] (tustvold)
- Remove `preserve_order` feature from `serde_json` dependency (#2095) #2098 [parquet] [arrow] (tustvold)
- Rename `weekday` and `weekday0` kernels to `num_days_from_monday` and `num_days_since_sunday` #2066 [arrow] (alamb)
- Remove `null_count` from `write_batch_with_statistics` #2047 [parquet] (tustvold)
Implemented enhancements:
- Use `total_cmp` from std #2130 [arrow]
- Permit parallel fetching of column chunks in `ParquetRecordBatchStream` #2110 [parquet]
- The `GenericBinaryBuilder` should use buffer builders directly. #2104 [arrow]
- Pass `generate_decimal256_case` arrow integration test #2093 [arrow]
- Rename `weekday` and `weekday0` kernels to `num_days_from_monday` and `days_since_sunday` #2065 [arrow]
- Improve performance of `filter_dict` #2062 [arrow]
- Improve performance of `set_bits` #2060 [arrow]
- Lazily materialize the null buffer builder of `BooleanBuilder` #2058 [arrow]
- `BooleanArray::from_iter` should omit validity buffer if all values are valid #2055 [arrow]
- FFI_ArrowSchema should set `DICTIONARY_ORDERED` flag if a field's dictionary is ordered #2049 [arrow]
- Support `peek_next_page()` and `skip_next_page` in `SerializedPageReader` #2043 [parquet]
- Support FFI / C Data Interface for `MapType` #2037 [arrow]
- The `DecimalArrayBuilder` should use `FixedSizeBinaryBuilder` #2026 [arrow]
- Enable `serialized_reader` read specific Page by passing row ranges. #1976 [parquet]
Fixed bugs:
- `type_id` and `value_offset` are incorrect for sliced `UnionArray` #2086 [arrow]
- Boolean `take` kernel does not handle null indices correctly #2057 [arrow]
- Don't double-count nulls in `write_batch_with_statistics` #2046 [parquet]
- Parquet Writer Ignores Statistics specification in `WriterProperties` #2014 [parquet]
Documentation updates:
Closed issues:
- Why does `serde_json` specify the `preserve_order` feature in `arrow` package #2095 [arrow]
- Support `skip_values` in DictionaryDecoder #2079 [parquet]
- Support skip_values in ColumnValueDecoderImpl #2078 [parquet]
- Support `skip_values` in `ByteArrayColumnValueDecoder` #2072 [parquet]
- Several `Builder::append` methods returning results even though they are infallible #2071
- Improve formatting of logical plans containing subqueries #2059
- Return reference from `UnionArray::child` #2035
- support write page index #1777 [parquet]
Merged pull requests:
- Use `total_cmp` from std #2131 [arrow] (Dandandan)
- fix clippy #2124 (alamb)
- Fix logical merge conflict: `match` arms have incompatible types #2121 (alamb)
- Update `GenericBinaryBuilder` to use buffer builders directly. #2117 [arrow] (HaoYang670)
- Simplify null mask preservation in parquet reader #2116 [parquet] (tustvold)
- Add get_byte_ranges method to AsyncFileReader trait #2115 [parquet] (thinkharderdev)
- add test for skip_values in DictionaryDecoder and fix it #2105 [parquet] (Ted-Jiang)
- Define Decimal128Builder and Decimal128Array #2102 [parquet] [arrow] (viirya)
- Support skip_values in DictionaryDecoder #2100 [parquet] (thinkharderdev)
- Pass generate_decimal256_case integration test, add `DataType::Decimal256` #2094 [parquet] [arrow] (viirya)
- `DecimalBuilder` should use `FixedSizeBinaryBuilder` #2092 [arrow] (HaoYang670)
- Array writer indirection #2091 [parquet] (tustvold)
- Remove doc hidden from GenericColumnReader #2090 [parquet] (tustvold)
- Support skip_values in ColumnValueDecoderImpl #2089 [parquet] (thinkharderdev)
- type_id and value_offset are incorrect for sliced UnionArray #2087 [arrow] (viirya)
- Add IPC truncation test case for StructArray #2083 [arrow] (viirya)
- Improve performance of set_bits by using copy_from_slice instead of setting individual bytes #2077 [arrow] (jhorstmann)
- Support skip_values in ByteArrayColumnValueDecoder #2076 [parquet] (Ted-Jiang)
- Lazily materialize the null buffer builder of boolean builder #2073 [arrow] (HaoYang670)
- Fix windows CI (#2069) #2070 (tustvold)
- Test utf8_validation checks char boundaries #2068 [arrow] (tustvold)
- feat(compute): Support doy (day of year) for temporal #2067 [arrow] (ovr)
- Support nullable indices in boolean take kernel and some optimizations #2064 [arrow] (jhorstmann)
- Improve performance of filter_dict #2063 [arrow] (viirya)
- Ignore null buffer when creating ArrayData if null count is zero #2056 [arrow] (jhorstmann)
- feat(compute): Support week0 (PostgreSQL behaviour) for temporal #2052 [arrow] (ovr)
- Set DICTIONARY_ORDERED flag for FFI_ArrowSchema #2050 [arrow] (viirya)
- Generify parquet write path (#1764) #2045 [parquet] (tustvold)
- Support peek_next_page() and skip_next_page in serialized_reader. #2044 [parquet] (Ted-Jiang)
- Support MapType in FFI #2042 [arrow] (viirya)
- Add support of converting `FixedSizeBinaryArray` to `DecimalArray` #2041 [arrow] (HaoYang670)
- Truncate IPC record batch #2040 [arrow] (viirya)
- Refine the List builder #2034 [arrow] (HaoYang670)
- Add more tests of RecordReader Batch Size Edge Cases (#2025) #2032 [parquet] (tustvold)
- Add support for adding intervals to dates #2031 [arrow] (avantgardnerio)
18.0.0 (2022-07-08)
Breaking changes:
- Fix several bugs in parquet writer statistics generation, add `EnabledStatistics` to control level of statistics generated #2022 [parquet] (tustvold)
- Add page index reader test for all types and support empty index. #2012 [parquet] (Ted-Jiang)
- Add `Decimal256Builder` and `Decimal256Array`; Decimal arrays now implement `BasicDecimalArray` trait #2000 [parquet] [arrow] (viirya)
- Simplify `ColumnReader::read_batch` #1995 [parquet] [arrow] (tustvold)
- Remove `PrimitiveBuilder::finish_dict` (#1978) #1980 [arrow] (tustvold)
- Disallow cast from other datatypes to `NullType` #1942 [arrow] (liukun4515)
- Add column index writer for parquet #1935 [parquet] (liukun4515)
Implemented enhancements:
- Add `DataType::Dictionary` support to `subtract_scalar`, `multiply_scalar`, `divide_scalar` #2019 [arrow]
- Support DictionaryArray in `add_scalar` kernel #2017 [arrow]
- Enable column page index read test for all types #2010 [parquet]
- Simplify `FixedSizeBinaryBuilder` #2007 [arrow]
- Support `Decimal256Builder` and `Decimal256Array` #1999 [arrow]
- Support `DictionaryArray` in `unary` kernel #1989 [arrow]
- Add kernel to quickly compute comparisons on `Array`s #1987 [arrow]
- Support `DictionaryArray` in `divide` kernel #1982 [arrow]
- Implement `Into<ArrayData>` for `T: Array` #1979 [arrow]
- Support `DictionaryArray` in `multiply` kernel #1972 [arrow]
- Support `DictionaryArray` in `subtract` kernel #1970 [arrow]
- Declare `DecimalArray::length` as a constant #1967 [arrow]
- Support `DictionaryArray` in `add` kernel #1950 [arrow]
- Add builder style methods to `Field` #1934 [arrow]
- Make `StringDictionaryBuilder` faster #1851 [arrow]
- `concat_elements_utf8` should accept arbitrary number of input arrays #1748 [arrow]
Fixed bugs:
- Array reader for list columns fails to decode if batches fall on row group boundaries #2025 [parquet]
- `ColumnWriterImpl::write_batch_with_statistics` incorrect distinct count in statistics #2016 [parquet]
- `ColumnWriterImpl::write_batch_with_statistics` can write incorrect page statistics #2015 [parquet]
- `RowFormatter` is not part of the public api #2008 [parquet]
- Infinite Loop possible in `ColumnReader::read_batch` For Corrupted Files #1997 [parquet]
- `PrimitiveBuilder::finish_dict` does not validate dictionary offsets #1978 [arrow]
- Incorrect `n_buffers` in `FFI_ArrowArray` #1959 [arrow]
- `DecimalArray::from_fixed_size_list_array` fails when `offset > 0` #1958 [arrow]
- Incorrect (but ignored) metadata written after ColumnChunk #1946 [parquet]
- `Send` + `Sync` impl for `Allocation` may not be sound unless `Allocation` is `Send` + `Sync` as well #1944 [arrow]
- Disallow cast from other datatypes to `NullType` #1923 [arrow]
Documentation updates:
Closed issues:
- Column chunk statistics of `min_bytes` and `max_bytes` return wrong size #2021 [parquet]
- [Discussion] Refactor the `Decimal`s by using constant generic. #2001
- Move `DecimalArray` to a new file #1985 [arrow]
- Support `DictionaryArray` in `multiply` kernel #1974
- close function instead of mutable reference #1969 [parquet]
- Incorrect `null_count` of DictionaryArray #1962 [arrow]
- Support multi diskRanges for ChunkReader #1955 [parquet]
- Persisting Arrow timestamps with Parquet produces missing `TIMESTAMP` in schema #1920 [parquet]
- Separate get_next_page_header from get_next_page in PageReader #1834 [parquet]
Merged pull requests:
- Consistent case in Index enumeration #2029 [parquet] (tustvold)
- Fix record delimiting on row group boundaries (#2025) #2027 [parquet] (tustvold)
- Add builder style APIs For `Field`: `with_name`, `with_data_type` and `with_nullable` #2024 [arrow] (alamb)
- Add dictionary support to subtract_scalar, multiply_scalar, divide_scalar #2020 [arrow] (viirya)
- Support DictionaryArray in add_scalar kernel #2018 [arrow] (viirya)
- Refine the `FixedSizeBinaryBuilder` #2013 [arrow] (HaoYang670)
- Add RowFormatter to record public API #2009 [parquet] (FabioBatSilva)
- Fix parquet test_common feature flags #2003 [parquet] (tustvold)
- Stub out Skip Records API (#1792) #1998 [parquet] [arrow-flight] (tustvold)
- Implement `Into<ArrayData>` for `T: Array` #1992 [parquet] [arrow] (heyrutvik)
- Add unary_cmp #1991 [arrow] (viirya)
- Support DictionaryArray in unary kernel #1990 [arrow] (viirya)
- Refine `FixedSizeListBuilder` #1988 [arrow] (HaoYang670)
- Move `DecimalArray` to array_decimal.rs #1986 [arrow] (HaoYang670)
- MINOR: Fix clippy error after updating rust toolchain #1984 [parquet] [arrow] [arrow-flight] (viirya)
- Support dictionary array for divide kernel #1983 [arrow] (viirya)
- Support dictionary array for subtract and multiply kernel #1971 [arrow] (viirya)
- Declare the value_length of decimal array as a `const` #1968 [arrow] (HaoYang670)
- Fix the behavior of `from_fixed_size_list` when offset > 0 #1964 [arrow] (HaoYang670)
- Calculate n_buffers in FFI_ArrowArray by data layout #1960 [arrow] (viirya)
- Fix the doc of `FixedSizeListArray::value_length` #1957 [arrow] (HaoYang670)
- Use InMemoryColumnChunkReader (~20% faster) #1956 [parquet] (tustvold)
- Unpin clap (#1867) #1954 [parquet] (tustvold)
- Set is_adjusted_to_utc if any timezone set (#1932) #1953 [parquet] [arrow] (tustvold)
- Add add_dyn for DictionaryArray support #1951 [arrow] (viirya)
- write `ColumnMetadata` after the column chunk data, not the `ColumnChunk` #1947 [parquet] (liukun4515)
- Require Send+Sync bounds for Allocation trait #1945 [arrow] (jhorstmann)
- Faster StringDictionaryBuilder (~60% faster) (#1851) #1861 [arrow] (tustvold)
- Arbitrary size concat elements utf8 #1787 [arrow] (Ismail-Maj)
17.0.0 (2022-06-24)
Breaking changes:
- Add validation to `RecordBatch` for non-nullable fields containing null values #1890 [arrow] (andygrove)
- Rename `ArrayData::validate_dict_offsets` to `ArrayData::validate_values` #1889 [arrow] (frolovdev)
- Add `Decimal128` API and use it in DecimalArray and DecimalBuilder #1871 [parquet] [arrow] (viirya)
- Mark typed buffer APIs `safe` (#996) (#1027) #1866 [parquet] [arrow] (tustvold)
Implemented enhancements:
- add a small doc example showing `ArrowWriter` being used with a cursor #1927 [parquet]
- Support `cast` to/from `NULL` and `DataType::Decimal` #1921 [arrow]
- Add `Decimal256` API #1913 [arrow]
- Add `DictionaryArray::key` function #1911 [arrow]
- Support specifying capacities for `ListArrays` in `MutableArrayData` #1884 [arrow]
- Explicitly declare the features used for each dependency #1876 [parquet] [arrow] [arrow-flight]
- Add Decimal128 API and use it in DecimalArray and DecimalBuilder #1870 [arrow]
- `PrimitiveArray::from_iter` should omit validity buffer if all values are valid #1856 [arrow]
- Add `from(v: Vec<Option<&[u8]>>)` and `from(v: Vec<&[u8]>)` for `FixedSizeBinaryArray` #1852 [arrow]
- Add `Vec`-inspired APIs to `BufferBuilder` #1850 [arrow]
- PyArrow integration test for C Stream Interface #1847 [arrow]
- Add `nilike` support in `comparison` #1845 [arrow]
- Split up `arrow::array::builder` module #1843 [arrow]
- Add `quarter` support in `temporal` kernels #1835 [arrow]
- Rename `ArrayData::validate_dictionary_offset` to `ArrayData::validate_values` #1812 [arrow]
- Clean up the testing code for `substring` kernel #1801 [arrow]
- Speed up `substring_by_char` kernel #1800 [arrow]
Fixed bugs:
- unable to write parquet file with UTC timestamp #1932 [parquet]
- Incorrect max and min decimals #1916 [arrow]
- `dynamic_types` example does not print the projection #1902 [arrow]
- `log2(0)` panicked at `'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5` #1901 [parquet]
- Final slicing in `combine_option_bitmap` needs to use bit slices #1899 [arrow]
- Dictionary IPC writer writes incorrect schema #1892 [arrow]
- Creating a `RecordBatch` with null values in non-nullable fields does not cause an error #1888 [arrow]
- Upgrade `regex` dependency #1874 [arrow]
- Miri reports leaks in ffi tests #1872 [arrow]
- AVX512 + simd binary and/or kernels slower than autovectorized version #1829 [arrow]
Documentation updates:
- Blog post about arrow 10.0.0 - 16.0.0 #1808
- Add README for the compute module. #1940 [arrow] (HaoYang670)
- minor: clarify docstring on `DictionaryArray::lookup_key` #1910 [arrow] (alamb)
- minor: add a diagram to docstring for DictionaryArray #1909 [arrow] (alamb)
- Closes #1902: Print the original and projected RecordBatch in dynamic_types example #1903 [arrow] (martin-g)
Closed issues:
Merged pull requests:
- Set adjusted to UTC if UTC timezone (#1932) #1937 [parquet] (tustvold)
- Split up parquet::arrow::array_reader (#1483) #1933 [parquet] (tustvold)
- Add ArrowWriter doctest (#1927) #1930 [parquet] (tustvold)
- Update indexmap dependency #1929 [arrow] (tustvold)
- Complete and fixup split of `arrow::array::builder` module (#1843) #1928 [arrow] (tustvold)
- MINOR: Replace `checked_add/sub().unwrap()` with `+/-` #1924 [arrow] (HaoYang670)
- Support casting `NULL` to/from `Decimal` #1922 [arrow] (liukun4515)
- Update half requirement from 1.8 to 2.0 #1919 [arrow] (dependabot[bot])
- Fix max and min decimal for max precision #1917 [arrow] (viirya)
- Add `Decimal256` API #1914 [arrow] (viirya)
- Add `DictionaryArray::key` function #1912 [arrow] (alamb)
- Fix misaligned reference and logic error in crc32 #1906 [parquet] (saethlin)
- Refine the `bit_util` of Parquet. #1905 [parquet] (HaoYang670)
- Use bit_slice in combine_option_bitmap #1900 [arrow] (jhorstmann)
- Issue #1876: Explicitly declare the used features for each dependency in integration_testing #1898 (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test #1897 [parquet] (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in parquet_derive #1896 (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in parquet #1895 [parquet] (martin-g)
- Minor: Add examples to docstring for `weekday` #1894 [arrow] (alamb)
- Correct nullable in read_dictionary #1893 [arrow] (viirya)
- Feature add weekday temporal kernel #1891 [arrow] (nl5887)
- Support specifying list capacities for `MutableArrayData` #1885 [arrow] (jhorstmann)
- Issue #1876: Explicitly declare the used features for each dependency in parquet #1881 [parquet] (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in arrow-flight #1880 [arrow-flight] (martin-g)
- Split up arrow::array::builder module (#1843) #1879 [arrow] (DaltonModlin)
- Fix memory leak in ffi test #1878 [arrow] (viirya)
- Issue #1876 - Explicitly declare the used features for each dependency #1877 [arrow] (martin-g)
- Fixes #1874 - Upgrade `regex` dependency to 1.5.6 #1875 [arrow] (martin-g)
- Do not print exit code from miri, instead it should be the return value of the script #1873 (jhorstmann)
- Update vendored gRPC #1869 [arrow-flight] (tustvold)
- Expose `BitSliceIterator` and `BitIndexIterator` (#1864) #1865 [arrow] (tustvold)
- Exclude some long-running tests when running under miri #1863 [arrow] (jhorstmann)
- Add vec-inspired APIs to BufferBuilder (#1850) #1860 [arrow] (tustvold)
- Omit validity buffer in PrimitiveArray::from_iter when all values are valid #1859 [arrow] (jhorstmann)
- Add two `from` methods for `FixedSizeBinaryArray` #1854 [arrow] (HaoYang670)
- Clean up the test code of `substring` kernel. #1853 [arrow] (HaoYang670)
- Add PyArrow integration test for C Stream Interface #1848 [arrow] (viirya)
- Add `nilike` support in `comparison` #1846 [arrow] (MazterQyou)
- MINOR: Remove version check from `test_command_help` #1844 [parquet] (viirya)
- Implement UnionArray FieldData using Type Erasure #1842 [arrow] (tustvold)
- Add `quarter` support in `temporal` #1836 [arrow] (MazterQyou)
- speed up `substring_by_char` by about 2.5x #1832 [arrow] (HaoYang670)
- Remove simd and avx512 bitwise kernels in favor of autovectorization #1830 [arrow] (jhorstmann)
- Refactor parquet::arrow module #1827 [parquet] (tustvold)
- docs: remove experimental marker on C Stream Interface #1821 [arrow] (wjones127)
- Separate Page IO from Page Decode #1810 [parquet] (tustvold)
16.0.0 (2022-06-10)
Breaking changes:
- Seal `ArrowNativeType` and `OffsetSizeTrait` for safety (#1028) #1819 [arrow] (tustvold)
- Improve API for `csv::infer_file_schema` by removing redundant ref #1776 [arrow] (tustvold)
Implemented enhancements:
- List equality method should work on empty offset `ListArray` #1817 [arrow]
- Command line tool for converting CSV to Parquet #1797 [parquet]
- IPC writer should write validity buffer for `UnionArray` in V4 IPC message #1793 [arrow]
- Add function for row alignment with page mask #1790 [parquet]
- Rust IPC Read should be able to read V4 UnionType Array #1788 [arrow]
- `combine_option_bitmap` should accept arbitrary number of input arrays. #1780 [arrow]
- Add `substring_by_char` kernels for slicing on character boundaries #1768 [arrow]
- Support reading `PageIndex` from column metadata #1761 [parquet]
- Support casting from `DataType::Utf8` to `DataType::Boolean` #1740 [arrow]
- Make current position available in `FileWriter`. #1691 [parquet]
- Support writing parquet to `stdout` #1687 [parquet]
Fixed bugs:
- Incorrect Offset Validation for Sliced List Array Children #1814 [arrow]
- Parquet Snappy Codec overwrites Existing Data in Decompression Buffer #1806 [parquet]
- `flight_data_to_arrow_batch` does not support `RecordBatch`es with no columns #1783 [arrow-flight]
- parquet does not compile with `features=["zstd"]` #1630 [parquet]
Documentation updates:
- Update arrow module docs #1840 [arrow] (tustvold)
- Update safety disclaimer #1837 [arrow] (tustvold)
- Update ballista readme link #1765 (tustvold)
- Move changelog archive to `CHANGELOG-old.md` #1759 (alamb)
Closed issues:
- `DataType::Decimal` Non-Compliant? #1779 [arrow]
- Further simplify the offset validation #1770 [arrow]
- Best way to convert arrow to Rust native type #1760 [arrow]
- Why `Parquet` is a part of `Arrow`? #1715 [parquet] [arrow]
Merged pull requests:
- Make equals_datatype method public, enabling other modules #1838 [arrow] (nl5887)
- [Minor] Clarify `PageIterator` Documentation #1831 [parquet] (Ted-Jiang)
- Update MIRI pin #1828 (tustvold)
- Change to use `resolver v2`, test more feature flag combinations in CI, fix errors (#1630) #1822 [parquet] [arrow] (tustvold)
- Add ScalarBuffer abstraction (#1811) #1820 [arrow] (tustvold)
- Fix list equal for empty offset list array #1818 [arrow] (viirya)
- Fix Decimal and List ArrayData Validation (#1813) (#1814) #1816 [arrow] (tustvold)
- Don't overwrite existing data on snappy decompress (#1806) #1807 [parquet] (tustvold)
- Rename `arrow/benches/string_kernels.rs` to `arrow/benches/substring_kernels.rs` #1805 [arrow] (HaoYang670)
- Add public API for decoding parquet footer #1804 [parquet] (tustvold)
- Add AsyncFileReader trait #1803 [parquet] (tustvold)
- add parquet-fromcsv (#1) #1798 [parquet] (kazuk)
- Use IPC row count info in IPC reader #1796 [arrow] (viirya)
- Fix typos in the Memory and Buffers section of the docs home #1795 [arrow] (datapythonista)
- Write validity buffer for UnionArray in V4 IPC message #1794 [arrow] (viirya)
- feat:Add function for row alignment with page mask #1791 [parquet] (Ted-Jiang)
- Read and skip validity buffer of UnionType Array for V4 ipc message #1789 [arrow] [arrow-flight] (viirya)
- Add `Substring_by_char` #1784 [arrow] (HaoYang670)
- Add `ParquetFileArrowReader::try_new` #1782 [parquet] (tustvold)
- Arbitrary size combine option bitmap #1781 [arrow] (Ismail-Maj)
- Implement `ChunkReader` for `Bytes`, deprecate `SliceableCursor` #1775 [parquet] (tustvold)
- Access metadata of flushed row groups on write (#1691) #1774 [parquet] (tustvold)
- Simplify ParquetFileArrowReader Metadata API #1773 [parquet] (tustvold)
- MINOR: Unpin nightly version as packed_simd releases new version #1771 (viirya)
- Update comfy-table requirement from 5.0 to 6.0 #1769 [arrow] (dependabot[bot])
- Optionally disable `validate_decimal_precision` check in `DecimalBuilder.append_value` for interop test #1767 [arrow] (viirya)
- Minor: Clean up the code of MutableArrayData #1763 [arrow] (HaoYang670)
- Support reading PageIndex from parquet metadata, prepare for skipping pages at reading #1762 [parquet] (Ted-Jiang)
- Support casting `Utf8` to `Boolean` #1738 [arrow] (MazterQyou)
15.0.0 (2022-05-27)
Breaking changes:
- Change `ArrayDataBuilder::null_bit_buffer` to accept `Option<Buffer>` rather than `Buffer` #1739 [arrow] (HaoYang670)
- Remove `null_count` from `ArrayData::try_new()` #1721 [arrow] (HaoYang670)
- Change parquet writers to use standard `std::io::Write` rather than custom `ParquetWriter` trait (#1717) (#1163) #1719 [parquet] (tustvold)
- Add explicit column mask for selection in parquet: `ProjectionMask` (#1701) #1716 [parquet] (tustvold)
- Add type_ids in Union datatype #1703 [parquet] [arrow] (viirya)
- Fix Parquet Reader's Arrow Schema Inference #1682 [parquet] [arrow] (tustvold)
Implemented enhancements:
- Rename the `string` kernel to `concatenate_elements` #1747 [arrow]
- `ArrayDataBuilder::null_bit_buffer` should accept `Option<Buffer>` as input type #1737 [arrow]
- Fix schema comparison for non_canonical_map when running flight test #1730 [arrow]
- Add support in aggregate kernel for `BinaryArray` #1724 [arrow]
- Fix incorrect null_count in `generate_unions_case` integration test #1712 [arrow]
- Keep type ids in Union datatype to follow Arrow spec and integrate with other implementations #1690 [arrow]
- Support Reading Alternative List Representations to Arrow From Parquet #1680 [parquet]
- Speed up the offsets checking #1675 [arrow]
- Separate Parquet -> Arrow Schema Conversion From ArrayBuilder #1655 [parquet]
- Add `leaf_columns` argument to `ArrowReader::get_record_reader_by_columns` #1653 [parquet]
- Implement `string_concat` kernel #1540 [arrow]
- Improve Unit Test Coverage of ArrayReaderBuilder #1484 [parquet]
Fixed bugs:
- Parquet write failure (from record batches) when data is nested two levels deep #1744 [parquet]
- IPC reader may break on projection #1735 [arrow]
- Latest nightly fails to build with feature simd #1734 [arrow]
- Trying to write parquet file in parallel results in corrupt file #1717 [parquet]
- Roundtrip failure when using DELTA_BINARY_PACKED #1708 [parquet]
- `ArrayData::try_new` cannot always return expected error. #1707 [arrow]
- "out of order projection is not supported" after Fix Parquet Arrow Schema Inference #1701 [parquet]
- Rust is not interoperable with C++ for IPC schemas with dictionaries #1694 [arrow]
- Incorrect Repeated Field Schema Inference #1681 [parquet]
- Parquet Treats Embedded Arrow Schema as Authoritative #1663 [parquet]
- parquet_to_arrow_schema_by_columns Incorrectly Handles Nested Types #1654 [parquet]
- Inconsistent Arrow Schema When Projecting Nested Parquet File #1652 [parquet]
- StructArrayReader Cannot Handle Nested Lists #1651 [parquet]
- Bug (`substring` kernel): The null buffer is not aligned when `offset != 0` #1639 [arrow]
Documentation updates:
- Parquet command line tool does not install "globally" #1710 [parquet]
- Improve integration test document to follow Arrow C++ repo CI #1742 [arrow] (viirya)
Merged pull requests:
- Test for list array equality with different offsets #1756 [arrow] (alamb)
- Rename `string_concat` to `concat_elements_utf8` #1754 [arrow] (alamb)
- Rename the `string` kernel to `concat_elements`. #1752 [arrow] (HaoYang670)
- Support writing nested lists to parquet #1746 [parquet] (tustvold)
- Pin nightly version to bypass packed_simd build error #1743 (viirya)
- Fix projection in IPC reader #1736 [arrow] (iyupeng)
- `cargo install` does not install globally #1732 [parquet] (kazuk)
- Fix schema comparison for non_canonical_map when running flight test #1731 (viirya)
- Add `min_binary` and `max_binary` aggregate kernels #1725 [arrow] (HaoYang670)
- Fix parquet benchmarks #1723 [parquet] (tustvold)
- Fix BitReader::get_batch zero extension (#1708) #1722 [parquet] (tustvold)
- Implementation string concat #1720 [arrow] (Ismail-Maj)
- Check the length of `null_bit_buffer` in `ArrayData::try_new()` #1714 [arrow] (HaoYang670)
- Fix incorrect null_count in `generate_unions_case` integration test #1713 [arrow] (viirya)
- Fix: Null buffer accounts for `offset` in `substring` kernel. #1704 [arrow] (HaoYang670)
- Minor: Refine `OffsetSizeTrait` to extend `num::Integer` #1702 [arrow] (HaoYang670)
- Fix StructArrayReader handling nested lists (#1651) #1700 [parquet] (tustvold)
- Speed up the offsets checking #1684 [arrow] (HaoYang670)
14.0.0 (2022-05-13)
Breaking changes:
- Use `bytes` in parquet rather than custom Buffer implementation (#1474) #1683 [parquet] (tustvold)
- Rename `OffsetSize::fn is_large` to `const OffsetSize::IS_LARGE` #1664 [parquet] [arrow] (HaoYang670)
- Remove `StringOffsetTrait` and `BinaryOffsetTrait` #1645 [arrow] (HaoYang670)
- Fix `generate_nested_dictionary_case` integration test failure #1636 [arrow] [arrow-flight] (viirya)
Implemented enhancements:
- Add support for `DataType::Duration` in ffi interface #1688 [arrow]
- Fix `generate_unions_case` integration test #1676 [arrow]
- Add `DictionaryArray` support for `bit_length` kernel #1673 [arrow]
- Add `DictionaryArray` support for `length` kernel #1672 [arrow]
- flight_client_scenarios integration test should receive schema from flight data #1669 [arrow]
- Unpin Flatbuffer version dependency #1667 [arrow]
- Add dictionary array support for substring function #1656 [arrow]
- Exclude dict_id and dict_is_ordered from equality comparison of `Field` #1646 [arrow]
- Remove `StringOffsetTrait` and `BinaryOffsetTrait` #1644 [arrow]
- Add tests and examples for `UnionArray::from(data: ArrayData)` #1643 [arrow]
- Add methods `pub fn offsets_buffer`, `pub fn types_ids_buffer` and `pub fn data_buffer` for `ArrayDataBuilder` #1640 [arrow]
- Fix `generate_nested_dictionary_case` integration test failure for Rust cases #1635 [arrow]
- Expose `ArrowWriter` row group flush in public API #1626 [parquet]
- Add `substring` support for `FixedSizeBinaryArray` #1618 [arrow]
- Add PrettyPrint for `UnionArray`s #1594 [arrow]
- Add SIMD support for the `length` kernel #1489 [arrow]
- Support dictionary arrays in length and bit_length #1674 [arrow] (viirya)
- Add dictionary array support for substring function #1665 [arrow] (sunchao)
- Add `DecimalType` support in `new_null_array` #1659 [arrow] (yjshen)
Fixed bugs:
- Docs.rs build is broken #1695
- Interoperability with C++ for IPC schemas with dictionaries #1694
- `UnionArray::is_null` incorrect #1625 [arrow]
- Published Parquet documentation missing `arrow::async_reader` #1617 [parquet]
- Files written with Julia's Arrow.jl in IPC format cannot be read by arrow-rs #1335 [arrow]
Documentation updates:
- Correct arrow-flight readme version #1641 [arrow-flight] (alamb)
Closed issues:
- Make `OffsetSizeTrait::IS_LARGE` as a const value #1658
- Question: Why are there 3 types of `OffsetSizeTrait`s? #1638
- Written Parquet file way bigger than input files #1627
- Ensure there is a single zero in the offsets buffer for an empty ListArray. #1620
- Filtering `UnionArray` Changes DataType #1595
Merged pull requests:
- Fix docs.rs build #1696 [parquet] (alamb)
- support duration in ffi #1689 [arrow] (ryan-jacobs1)
- fix bench command line options #1685 [parquet] [arrow] (kazuk)
- Enable branch protection #1679 (tustvold)
- Fix logical merge conflict in #1588 #1678 [parquet] (tustvold)
- Fix generate_unions_case for Rust case #1677 [arrow] (viirya)
- Receive schema from flight data #1670 (viirya)
- unpin flatbuffers dependency version #1668 [arrow] (Cheappie)
- Remove parquet dictionary converters (#1661) #1662 [parquet] (tustvold)
- Minor: simplify the function `GenericListArray::get_type` #1650 [arrow] (HaoYang670)
- Pretty Print `UnionArray`s #1648 [arrow] (tfeda)
- Exclude `dict_id` and `dict_is_ordered` from equality comparison of `Field` #1647 [arrow] (viirya)
- expose row-group flush in public api #1634 [parquet] (Cheappie)
- Add `substring` support for `FixedSizeBinaryArray` #1633 [arrow] (HaoYang670)
- Fix UnionArray is_null #1632 [arrow] (viirya)
- Do not assume dictionaries exists in footer #1631 [arrow] (pcjentsch)
- Add support for nested list arrays from parquet to arrow arrays (#993) #1588 [parquet] (tustvold)
- Add `async` into doc features #1349 [parquet] (HaoYang670)
13.0.0 (2022-04-29)
Breaking changes:
- Update `parquet::basic::LogicalType` to be more idiomatic #1612 [parquet] (tfeda)
- Fix Null Mask Handling in `ArrayData`, `UnionArray`, and `MapArray` #1589 [arrow] (tustvold)
- Replace `&Option<T>` with `Option<&T>` in several `arrow` and `parquet` APIs #1571 [parquet] [arrow] (tfeda)
Implemented enhancements:
- Read/write nested dictionary under fixed size list in ipc stream reader/writer #1609 [arrow]
- Add support for `BinaryArray` in `substring` kernel #1593 [arrow]
- Read/write nested dictionary under large list in ipc stream reader/writer #1584 [arrow]
- Read/write nested dictionary under map in ipc stream reader/writer #1582 [arrow]
- Implement `Clone` for JSON `DecoderOptions` #1580 [arrow]
- Add utf-8 validation checking to `substring` kernel #1575 [arrow]
- Support casting to/from `DataType::Null` in `cast` kernel #1572 [arrow] (WinkerDu)
Fixed bugs:
- Parquet schema should allow scale == precision for decimal type #1606 [parquet]
- ListArray::from(ArrayData) dereferences invalid pointer when offsets are empty #1601 [arrow]
- ArrayData Equality Incorrect Null Mask Offset Handling #1599
- Filtering UnionArray Incorrectly Handles Runs #1598
- [Safety] Filtering Dense UnionArray Produces Invalid Offsets #1596
- [Safety] UnionBuilder Doesn't Check Types #1591
- Union Layout Should Not Support Separate Validity Mask #1590
- Incorrect nullable flag when reading maps (`test_read_maps` fails when `force_validate` is active) #1587 [parquet]
- Output of `ipc::reader::tests::projection_should_work` fails validation #1548 [arrow]
- Incorrect min/max statistics for decimals with byte-array notation #1532
Documentation updates:
Closed issues:
- Dense UnionArray Offsets Are i32 not i8 #1597 [arrow]
- Replace `&Option<T>` with `Option<&T>` in some APIs #1556 [parquet] [arrow]
- Improve ergonomics of `parquet::basic::LogicalType` #1554 [parquet]
- Mark the current `substring` function as `unsafe` and rename it. #1541 [arrow]
- Requirements for Async Parquet API #1473 [parquet]
Merged pull requests:
- Nit: use the standard function `div_ceil` #1629 [arrow] (HaoYang670)
- Update flatbuffers requirement from =2.1.1 to =2.1.2 #1622 [arrow] (dependabot[bot])
- Fix decimals min max statistics #1621 [parquet] (atefsawaed)
- Add example readme #1615 [arrow] (alamb)
- Improve docs and examples links on main readme #1614 [arrow] (alamb)
- Read/Write nested dictionaries under FixedSizeList in IPC #1610 [arrow] (viirya)
- Add `substring` support for binary #1608 [arrow] (HaoYang670)
- Parquet: schema validation should allow scale == precision for decimal type #1607 [parquet] (sunchao)
- Don't access and validate offset buffer in ListArray::from(ArrayData) #1602 [arrow] (jhorstmann)
- Fix map nullable flag in `ParquetTypeConverter` #1592 [parquet] (viirya)
- Read/write nested dictionary under large list in ipc stream reader/writer #1585 [arrow] (viirya)
- Read/write nested dictionary under map in ipc stream reader/writer #1583 [arrow] (viirya)
- Derive `Clone` and `PartialEq` for json `DecoderOptions` #1581 [arrow] (alamb)
- Add utf-8 validation checking for `substring` #1577 [arrow] (HaoYang670)
- Use `Option<T>` rather than `Option<&T>` for copy types in substring kernel #1576 [arrow] (tustvold)
- Use littleendian arrow files for `projection_should_work` #1573 [arrow] (viirya)
12.0.0 (2022-04-15)
Breaking changes:
- Add `ArrowReaderOptions` to `ParquetFileArrowReader`, add option to skip decoding arrow metadata from parquet (#1459) #1558 [parquet] (tustvold)
- Support `RecordBatch` with zero columns but non zero row count, add field to `RecordBatchOptions` (#1536) #1552 [arrow] (tustvold)
- Consolidate JSON Reader options and `DecoderOptions` #1539 [arrow] (alamb)
- Update `prost`, `prost-derive` and `prost-types` to 0.10, `tonic`, and `tonic-build` to `0.7` #1510 [arrow-flight] (alamb)
- Add Json `DecoderOptions` and support custom `format_string` for each field #1451 [arrow] (sum12)
Implemented enhancements:
- Read/write nested dictionary in ipc stream reader/writer #1565 [arrow]
- Support `FixedSizeBinary` in the Arrow C data interface #1553 [arrow]
- Support Empty Column Projection in `ParquetRecordBatchReader` #1537 [parquet]
- Support `RecordBatch` with zero columns but non zero row count #1536 [arrow]
- Add support for `Date32`/`Date64` <--> `String`/`LargeString` in `cast` kernel #1535 [arrow]
- Support creating arrays from externally owned memory like `Vec` or `String` #1516 [arrow]
- Speed up the `substring` kernel #1511 [arrow]
- Handle Parquet Files With Inconsistent Timestamp Units #1459 [parquet]
Fixed bugs:
- Error Inferring Schema for LogicalType::UNKNOWN #1557 [parquet]
- Read dictionary from nested struct in ipc stream reader panics #1549 [arrow]
- `filter` produces invalid sparse `UnionArray`s #1547 [arrow]
- Documentation for `GenericListBuilder` is not exposed. #1518 [arrow]
- cannot read parquet file #1515 [parquet]
- The `substring` kernel panics when chars > U+0x007F #1478 [arrow]
- Hang due to infinite loop when reading some parquet files with RLE encoding and bit packing #1458 [parquet]
Documentation updates:
- Improve JSON reader documentation #1559 [arrow] (alamb)
- Improve doc string for `substring` kernel #1529 [arrow] (HaoYang670)
- Expose documentation of `GenericListBuilder` #1525 [arrow] (comath)
- Add a diagram to `take` kernel documentation #1524 [arrow] (alamb)
Closed issues:
- Interesting benchmark results of `min_max_helper` #1400
Merged pull requests:
- Fix incorrect `into_buffers` for UnionArray #1567 [arrow] (viirya)
- Read/write nested dictionary in ipc stream reader/writer #1566 [arrow] (viirya)
- Support FixedSizeBinary and FixedSizeList for the C data interface #1564 [arrow] (sunchao)
- Split out ListArrayReader into separate module (#1483) #1563 [parquet] (tustvold)
- Split out `MapArray` into separate module (#1483) #1562 [parquet] (tustvold)
- Support empty projection in `ParquetRecordBatchReader` #1560 [parquet] (tustvold)
- fix infinite loop in not fully packed bit-packed runs #1555 [parquet] (tustvold)
- Add test for creating FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones #1551 [arrow] (alamb)
- Fix reading dictionaries from nested structs in ipc `StreamReader` #1550 [arrow] (dispanser)
- Add support for Date32/64 <--> String/LargeString in `cast` kernel #1534 [arrow] (yjshen)
- fix clippy errors in 1.60 #1527 [parquet] [arrow] (alamb)
- Mark `remove-old-releases.sh` executable #1522 (alamb)
- Delete duplicate code in the `sort` kernel #1519 [arrow] (HaoYang670)
- Fix reading nested lists from parquet files #1517 [parquet] (viirya)
- Speed up the `substring` kernel by about 2x #1512 [arrow] (HaoYang670)
- Add `new_from_strings` to create `MapArray`s #1507 [arrow] (viirya)
- Decouple buffer deallocation from ffi and allow creating buffers from rust vec #1494 [arrow] (jhorstmann)
11.1.0 (2022-03-31)
Implemented enhancements:
- Implement `size_hint` and `ExactSizeIterator` for DecimalArray #1505 [arrow]
- Support calculating length by chars for `StringArray` #1493 [arrow]
- Add `length` kernel support for `ListArray` #1470 [arrow]
- The length kernel should work with `BinaryArray`s #1464 [arrow]
- FFI for Arrow C Stream Interface #1348 [arrow]
- Improve performance of `DictionaryArray::try_new()` #1313 [arrow]
Fixed bugs:
- MIRI error in math_checked_divide_op/try_from_trusted_len_iter #1496 [arrow]
- Parquet Writer Incorrect Definition Levels for Nested NullArray #1480 [parquet]
- FFI: ArrowArray::try_from_raw shouldn't clone #1425 [arrow]
- Parquet reader fails to read null list. #1399 [parquet]
Documentation updates:
- A small mistake in the doc of `BinaryArray` and `LargeBinaryArray` #1455 [arrow]
- A small mistake in the doc of `GenericBinaryArray::take_iter_unchecked` #1454 [arrow]
- Add links in the doc of `BinaryOffsetSizeTrait` #1453 [arrow]
- The doc of `FixedSizeBinaryArray` is confusing. #1452 [arrow]
- Clarify docs that SlicesIterator ignores null values #1504 [arrow] (alamb)
- Update the doc of `BinaryArray` and `LargeBinaryArray` #1471 [arrow] (HaoYang670)
Closed issues:
- `packed_simd` v.s. `portable_simd`, which should be used? #1492
- Cleanup: Use Arrow take kernel Within parquet ListArrayReader #1482 [parquet]
Merged pull requests:
- Implement `size_hint` and `ExactSizeIterator` for `DecimalArray` #1506 [arrow] (alamb)
- Add `StringArray::num_chars` for calculating number of characters #1503 [arrow] (HaoYang670)
- Workaround nightly miri error in `try_from_trusted_len_iter` #1497 [arrow] (jhorstmann)
- update doc of array_binary and array_string #1491 [arrow] (HaoYang670)
- Use Arrow take kernel within ListArrayReader #1490 [parquet] (viirya)
- Add `length` kernel support for List Array #1488 [arrow] (HaoYang670)
- Support sort for `Decimal` data type #1487 [arrow] (yjshen)
- Fix reading/writing nested null arrays (#1480) (#1036) (#1399) #1481 [parquet] (tustvold)
- Implement ArrayEqual for UnionArray #1469 [arrow] (viirya)
- Support the `length` kernel on Binary Array #1465 [arrow] (HaoYang670)
- Remove Clone and copy source structs internally #1449 [arrow] (viirya)
- Fix Parquet reader for null lists #1448 [parquet] (viirya)
- Improve performance of DictionaryArray::try_new() #1435 [arrow] (jackwener)
- Add FFI for Arrow C Stream Interface #1384 [arrow] (viirya)
11.0.0 (2022-03-17)
Breaking changes:
- Replace `filter_row_groups` with `ReadOptions` in parquet SerializedFileReader #1389 [parquet] (yjshen)
- Implement projection for arrow `IPC Reader` file / streams #1339 [arrow] [arrow-flight] (Dandandan)
Implemented enhancements:
- Fix generate_interval_case integration test failure #1445
- Make the doc examples of `ListArray` and `LargeListArray` more readable #1433
- Redundant `if` and `abs` in `shift()` #1427
- Improve substring kernel performance #1422 [arrow]
- Add missing value_unchecked() of `FixedSizeBinaryArray` #1419
- Remove duplicate bound check in function `shift` #1408
- Support dictionary array in C data interface #1397
- filter kernel should work with `UnionArray`s #1394 [arrow]
- filter kernel should work with `FixedSizeListArray`s #1393 [arrow]
- Add doc examples for creating FixedSizeListArray #1392 [arrow]
- Update `rust-version` to 1.59 #1377
- Arrow IPC projection support #1338
- Implement basic FlightSQL Server #1386 [arrow-flight] (wangfenjin)
Fixed bugs:
- DictionaryArray::try_new ignores validity bitmap of the keys #1429 [arrow]
- The doc of `GenericListArray` is confusing #1424
- DeltaBitPackDecoder Incorrectly Handles Non-Zero MiniBlock Bit Width Padding #1417 [parquet]
- DeltaBitPackEncoder Pads Miniblock BitWidths With Arbitrary Values #1416 [parquet]
- Possible unaligned write with MutableBuffer::push #1410 [arrow]
- Integration Test is failing on master branch #1398 [arrow]
Documentation updates:
- Rewrite doc of `GenericListArray` #1450 [arrow] (HaoYang670)
- Fix integration doc about build.ninja location #1438 (viirya)
Merged pull requests:
- Rewrite doc example of `ListArray` and `LargeListArray` #1447 [arrow] (HaoYang670)
- Fix generate_interval_case in integration test #1446 [arrow] (viirya)
- Fix generate_decimal128_case in integration test #1440 (viirya)
- `filter` kernel should work with FixedSizeListArrays #1434 [arrow] (viirya)
- Support nullable keys in DictionaryArray::try_new #1430 [arrow] (jhorstmann)
- remove redundant if/clamp_min/abs #1428 [arrow] (jackwener)
- Add doc example for creating `FixedSizeListArray` #1426 [arrow] (HaoYang670)
- Directly write to MutableBuffer in substring #1423 [arrow] (viirya)
- Fix possibly unaligned writes in MutableBuffer #1421 [arrow] (jhorstmann)
- Add value_unchecked() and unit test #1420 [arrow] (jackwener)
- Fix DeltaBitPack MiniBlock Bit Width Padding #1418 [parquet] (tustvold)
- Update zstd requirement from 0.10 to 0.11 #1415 [parquet] (dependabot[bot])
- Set `default-features = false` for `zstd` in the parquet crate to support `wasm32-unknown-unknown` #1414 [parquet] (kylebarron)
- Add support for `UnionArray` in `filter` kernel #1412 [arrow] (viirya)
- Remove duplicate bound check in the function `shift` #1409 [arrow] (HaoYang670)
- Add dictionary support for C data interface #1407 [arrow] (sunchao)
- Fix a small spelling mistake in docs. #1406 [arrow] (HaoYang670)
- Add unit test to check `FixedSizeBinaryArray` input all none #1405 [arrow] (jackwener)
- Move csv Parser trait and its implementations to utils module #1385 [arrow] (sum12)
10.0.0 (2022-03-04)
Breaking changes:
- Remove existing has_ methods for optional fields in `ColumnChunkMetaData` #1346 [parquet] (shanisolomon)
- Remove redundant `has_` methods in `ColumnChunkMetaData` #1345 [parquet] (shanisolomon)
Implemented enhancements:
- Add extract month and day in temporal.rs #1387
- Add clone to `IpcWriteOptions` #1381 [arrow]
- Support `MapArray` in `filter` kernel #1378 [arrow]
- Add `week` temporal kernel #1375 [arrow]
- Improve performance of `compare_dict_op` #1371 [arrow]
- Add support for LargeUtf8 in json writer #1357 [parquet]
- Make `arrow::array::builder::MapBuilder` public #1354 [arrow]
- Refactor `StructArray::from` #1351 [arrow]
- Refactor `RecordBatch::validate_new_batch` #1350 [arrow]
- Remove redundant has_ methods for optional column metadata fields #1344 [parquet]
- Add `write` method to JsonWriter #1340 [arrow]
- Refactor the code of `Bitmap::new` #1337 [arrow]
- Use DictionaryArray's iterator in `compare_dict_op` #1329 [arrow]
- Add `as_decimal_array(arr: &dyn Array) -> &DecimalArray` #1312 [arrow]
- More ergonomic / idiomatic primitive array creation from iterators #1298 [arrow]
- Implement DictionaryArray support in `eq_dyn`, `neq_dyn`, `lt_dyn`, `lt_eq_dyn`, `gt_dyn`, `gt_eq_dyn` #1201 [arrow]
Fixed bugs:
- `cargo clippy` fails on the `master` branch #1362 [arrow]
- `ArrowArray::try_from_raw` should not assume the pointers are from Arc #1333 [arrow]
- Fix CSV Writer::new to accept delimiter and make WriterBuilder::build use it #1328 [arrow]
- Make bounds configurable via builder when reading CSV #1327 [arrow]
- Add `with_datetime_format()` to CSV WriterBuilder #1272 [arrow]
Performance improvements:
Closed issues:
- Consider removing redundant has_XXX metadata functions in `ColumnChunkMetadata` #1332
Merged pull requests:
- Support extract `day` and `month` in temporal.rs #1388 [arrow] (Ted-Jiang)
- Add write method to Json Writer #1383 [arrow] (matthewmturner)
- Derive `Clone` for `IpcWriteOptions` #1382 [arrow] (matthewmturner)
- feat: support maps in MutableArrayData #1379 [arrow] (helgikrs)
- Support extract `week` in temporal.rs #1376 [arrow] (Ted-Jiang)
- Speed up the function `min_max_string` #1374 [arrow] (HaoYang670)
- Improve performance of dictionary kernels, add benchmark and add `take_iter_unchecked` #1372 [arrow] (viirya)
- Update pyo3 requirement from 0.15 to 0.16 #1369 [arrow] (dependabot[bot])
- Update contributing guide #1368 (HaoYang670)
- Allow primitive array creation from iterators of PrimitiveTypes (as well as `Option`) #1367 [arrow] (viirya)
- Update flatbuffers requirement from =2.1.0 to =2.1.1 #1364 [arrow] (dependabot[bot])
- Fix clippy lints #1363 [parquet] [arrow] (HaoYang670)
- Refactor `RecordBatch::validate_new_batch` #1361 [arrow] (HaoYang670)
- Refactor `StructArray::from` #1360 [arrow] (HaoYang670)
- Update flatbuffers requirement from =2.0.0 to =2.1.0 #1359 [arrow] (dependabot[bot])
- fix: add LargeUtf8 support in json writer #1358 [arrow] (tiphaineruy)
- Add `as_decimal_array` function #1356 [arrow] (liukun4515)
- Publicly export arrow::array::MapBuilder #1355 [arrow] (tjwilson90)
- Add with_datetime_format to csv WriterBuilder #1347 [arrow] (gsserge)
- Refactor `Bitmap::new` #1343 [arrow] (HaoYang670)
- Remove delimiter from csv Writer #1342 [arrow] (gsserge)
- Make bounds configurable in csv ReaderBuilder #1341 [arrow] (gsserge)
- `ArrowArray::try_from_raw` should not assume the pointers are from Arc #1334 [arrow] (viirya)
- Use DictionaryArray's iterator in `compare_dict_op` #1330 [arrow] (viirya)
- Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #1326 [arrow] (viirya)
- Arrow Rust + Conbench Integration #1289 (dianaclarke)
9.1.0 (2022-02-19)
Implemented enhancements:
- Exposing page encoding stats #1321
- Improve filter performance by special casing high and low selectivity predicates #1288 [arrow]
- Speed up DeltaBitPackDecoder #1281 [parquet]
- Fix all clippy lints in arrow crate #1255 [arrow]
- Expose page encoding ColumnChunkMetadata #1322 [parquet] (shanisolomon)
- Expose column index and offset index in ColumnChunkMetadata #1318 [parquet] (shanisolomon)
- Expose bloom filter offset in ColumnChunkMetadata #1309 [parquet] (shanisolomon)
- Add DictionaryArray::try_new() to create dictionaries from pre existing arrays #1300 [arrow] (alamb)
- Add DictionaryArray::keys_iter, and take_iter for other array types #1296 [arrow] (viirya)
- Make rle decoder public under experimental feature #1271 [parquet] (zeevm)
- Add DictionaryArray support in eq_dyn kernel #1263 [arrow] (viirya)
Fixed bugs:
- len is not a parameter of MutableArrayData::extend #1316
- module data_type is private in Rust Parquet 8.0.0 #1302 [parquet]
- Test failure: bit_chunk_iterator #1294
- csv_writer benchmark fails with "no such file or directory" #1292
Documentation updates:
Performance improvements:
- Vectorize DeltaBitPackDecoder, up to 5x faster decoding #1284 [parquet] (tustvold)
- Skip zero-ing primitive nulls #1280 [parquet] (tustvold)
- Add specialized filter kernels in compute module (up to 10x faster) #1248 [parquet] [arrow] (tustvold)
Closed issues:
- Expose column and offset index metadata offset #1317
- Expose bloom filter metadata offset #1308
- Improve ergonomics to construct DictionaryArrays from Key and Value arrays #1299
- Make it easier to iterate over DictionaryArray #1295 [arrow]
- (WON'T FIX) Don't Intertwine Bit and Byte Aligned Operations in BitReader #1282
- how to create arrow::array from streamReader #1278
- Remove scientific notation when converting floats to strings. #983
Merged pull requests:
- Update the document of function MutableArrayData::extend #1336 [arrow] (HaoYang670)
- Fix clippy lint dead_code #1324 [arrow] (gsserge)
- fix test bug and ensure that bloom filter metadata is serialized in to_thrift #1320 [parquet] (shanisolomon)
- Enable more clippy lints in arrow #1315 [arrow] (gsserge)
- Fix clippy lint clippy::type_complexity #1310 [arrow] (gsserge)
- Fix clippy lint clippy::float_equality_without_abs #1305 [arrow] (gsserge)
- Fix clippy clippy::vec_init_then_push lint #1303 [arrow] (gsserge)
- Fix failing csv_writer bench #1293 [arrow] (andygrove)
- Changes for 9.0.2 #1291 [parquet] [arrow] [arrow-flight] (alamb)
- Fix bitmask creation also for simd comparisons with scalar #1290 [arrow] (jhorstmann)
- Fix simd comparison kernels #1286 [arrow] (jhorstmann)
- Restrict Decoder to compatible types (#1276) #1277 [parquet] (tustvold)
- Fix some clippy lints in parquet crate, rename LevelEncoder variants to conform to Rust standards #1273 [parquet] (HaoYang670)
- Use new DecimalArray creation API in arrow crate #1249 [arrow] (alamb)
- Improve DecimalArray API ergonomics: add iter(), FromIterator, with_precision_and_scale #1223 [arrow] (alamb)
9.0.2 (2022-02-09)
Breaking changes:
- Add Send + Sync to DataType, RowGroupReader, FileReader, ChunkReader. #1264
- Rename the function Bitmap::len to Bitmap::bit_len to clarify its meaning #1242 [parquet] [arrow] (HaoYang670)
- Remove unused / broken memory-check feature #1222 [arrow] (jhorstmann)
- Potentially buffer multiple RecordBatches before writing a parquet row group in ArrowWriter #1214 [parquet] [arrow] (tustvold)
Implemented enhancements:
- Add async arrow parquet reader #1154 [parquet] [arrow] (tustvold)
- Rename Bitmap::len to Bitmap::bit_len #1233
- Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
- Write Multiple RecordBatch to Parquet Row Group #1211
- Add doc examples for eq_dyn etc. #1202 [arrow]
- Add comparison kernels for BinaryArray #1108
- impl ArrowNativeType for i128 #1098
- Remove Copy trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
- Add into_inner for IPC FileWriter #1236 [arrow] (yjshen)
- [Minor] Re-export array::builder::make_builder to make it available for downstream #1235 [arrow] (yjshen)
Fixed bugs:
- Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
- Get Unknown configuration option rust-version when running the rust format command #1240
- Bitmap Length Validation is Incorrect #1231 [arrow]
- Writing sliced ListArray or MapArray ignore offsets #1226 [parquet]
- Remove broken memory-tracking crate feature #1171
- Revert making parquet::data_type and parquet::arrow::schema experimental #1244 [parquet] (tustvold)
Documentation updates:
- Update parquet crate documentation and examples #1253 [parquet] [arrow] (alamb)
- Refresh parquet readme / contributing guide #1252 [parquet] (alamb)
- Add docs examples for dynamically compare functions #1250 [arrow] (HaoYang670)
- Add Rust Docs examples for UnionArray #1241 [arrow] (HaoYang670)
- Improve documentation for Bitmap #1237 [arrow] (alamb)
Performance improvements:
- Improve performance for arithmetic kernels with simd feature enabled (except for division/modulo) #1221 [arrow] (jhorstmann)
- Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
- Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)
Closed issues:
- UnalignedBitChunkIterator to that iterates through already aligned u64 blocks #1227
- Remove unused ArrowArrayReader in parquet #1197 [parquet]
Merged pull requests:
- Upgrade clap to 3.0.0 #1261 [parquet] (Jimexist)
- Update chrono-tz requirement from 0.4 to 0.6 #1259 [arrow] (dependabot[bot])
- Update zstd requirement from 0.9 to 0.10 #1257 [parquet] (dependabot[bot])
- Fix NullArrayReader (#1245) #1246 [parquet] (tustvold)
- dyn compare for binary array #1238 [arrow] (HaoYang670)
- Remove arrow array reader (#1197) #1234 [parquet] (tustvold)
- Fix null bitmap length validation (#1231) #1232 [arrow] (tustvold)
- Faster bitmask iteration #1228 [parquet] [arrow] (tustvold)
- Add non utf8 values into the test cases of BinaryArray comparison #1220 [arrow] (HaoYang670)
- Update DECIMAL_RE to allow scientific notation in auto inferred schemas #1216 [arrow] (pjmore)
- Fix simd comparison kernels #1286 [arrow] (jhorstmann)
- Fix bitmask creation also for simd comparisons with scalar #1290 [arrow] (jhorstmann)
8.0.0 (2022-01-20)
Breaking changes:
- Return error from JSON writer rather than panic #1205 [arrow] (Ted-Jiang)
- Remove ArrowSignedNumericType to Simplify and reduce code duplication in arithmetic kernels #1161 [arrow] (jhorstmann)
- Restrict RecordReader and friends to scalar types (#1132) #1155 [parquet] (tustvold)
- Move more parquet functionality behind experimental feature flag (#1032) #1134 [parquet] (tustvold)
Implemented enhancements:
- Parquet reader should be able to read structs within list #1186 [parquet]
- Disable serde_json arbitrary_precision feature flag #1174 [arrow]
- Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
- Return Err from JSON writer rather than panic! for unsupported types #1157 [arrow]
- Support scalar mathematics kernels for Array and scalar value #1153 [arrow]
- Support DecimalArray in sort kernel #1137
- Parquet Fuzz Tests #1053
- BooleanBufferBuilder Append Packed #1038 [arrow]
- parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
- Reduce Public Parquet API #1032 [parquet]
- Add from_iter_values for binary array #1188 [arrow] (Jimexist)
- Add support for MapArray in json writer #1149 [arrow] (helgikrs)
Fixed bugs:
- Empty string arrays with no nulls are not equal #1208 [arrow]
- Pretty print a RecordBatch containing Float16 triggers a panic #1193 [arrow]
- Writing structs nested in lists produces an incorrect output #1184 [parquet]
- Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect #1144 [arrow]
- Interval comparisons with simd feature asserts #1136 [arrow]
- RecordReader Permits Illegal Types #1132 [parquet]
Security fixes:
- Fix undefined behavior in GenericStringArray::from_iter_values #1145 [arrow] (alamb)
- parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) #1082 [parquet] [arrow] (tustvold)
Documentation updates:
- Update parquet crate readme #1192 [parquet] (alamb)
- Document safety justification of some uses of from_trusted_len_iter #1148 [arrow] (alamb)
Performance improvements:
- Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
- Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)
Closed issues:
Merged pull requests:
- fix a bug in variable sized equality #1209 [arrow] (helgikrs)
- Pin WASM / packed SIMD tests to nightly-2022-01-17 #1204 (alamb)
- feat: add support for casting Duration/Interval to Int64Array #1196 [arrow] (e-dard)
- Add comparison support for fully qualified BinaryArray #1195 [arrow] (HaoYang670)
- Fix in display of Float16Array #1194 [arrow] (helgikrs)
- update nightly version for miri #1189 (Jimexist)
- feat(parquet): support for reading structs nested within lists #1187 [parquet] (helgikrs)
- fix: Fix a bug in how definition levels are calculated for nested structs in a list #1185 [parquet] (helgikrs)
- Truncate bitmask on BooleanBufferBuilder::resize: #1183 [parquet] [arrow] (tustvold)
- Add ticket reference for false positive in clippy #1181 [arrow] (alamb)
- Fix record formatting in 1.58 #1178 [parquet] (tustvold)
- Serialize i128 as JSON string #1175 [arrow] (tustvold)
- Support DecimalType in sort and take kernels #1172 [arrow] (liukun4515)
- Fix new clippy lints introduced in Rust 1.58 #1170 [parquet] [arrow] (alamb)
- Fix compilation error with simd feature #1169 [arrow] (jhorstmann)
- Fix bug while writing parquet with empty lists of structs #1166 [parquet] (helgikrs)
- Use tempfile for parquet tests #1165 [parquet] (tustvold)
- Remove left over dev/README.md file from arrow/arrow-rs split #1162 (alamb)
- Add multiply_scalar kernel #1159 [arrow] (viirya)
- Fuzz test different parquet encodings #1156 [parquet] (tustvold)
- Add subtract_scalar kernel #1152 [arrow] (viirya)
- Add add_scalar kernel #1151 [arrow] (viirya)
- Move simd right out of for_each loop #1150 [arrow] (viirya)
- Internal Remove GenericStringArray::from_vec and GenericStringArray::from_opt_vec #1147 [arrow] (alamb)
- Implement SIMD comparison operations for types with less than 4 lanes (i128) #1146 [arrow] (jhorstmann)
- Extends parquet fuzz tests to also tests nulls, dictionaries and row groups with multiple pages (#1053) #1110 [parquet] (tustvold)
- Generify ColumnReaderImpl and RecordReader (#1040) #1041 [parquet] (tustvold)
- BooleanBufferBuilder::append_packed (#1038) #1039 [arrow] (tustvold)
7.0.0 (2022-01-07)
Arrow
Breaking changes:
- pretty_format_batches now returns Result<impl Display> rather than String: #975
- MutableBuffer::typed_data_mut is marked unsafe: #1029
- UnionArray updated to match latest Arrow spec, added UnionMode, UnionArray::new() marked unsafe: #885
New Features:
- Support for Float16Array types #888
- IPC support for UnionArray #654
- Dynamic comparison kernels for scalars (e.g. eq_dyn_scalar), including DictionaryArray: #1113
Enhancements:
- Added Schema::with_metadata and Field::with_metadata #1092
- Support for custom datetime format for inference and parsing csv files #1112
- Implement Array for ArrayRef for easier use #1129
- Pretty printing display support for FixedSizeBinaryArray #1097
- Dependency Upgrades: pyo3, parquet-format, prost, tonic
- Avoid allocating vector of indices in lexicographical_partition_ranges #998
Parquet
Fixed bugs:
- (parquet) Fix reading of dictionary encoded pages with null values: #1130
Changelog
6.5.0 (2021-12-23)
- 092fc64bbb019244887ebd0d9c9a2d3e3a9aebc0 support cast decimal to decimal (#1084) (#1093)
- 01459762ed18b504e00e7b2818fce91f19188b1e Fix like regex escaping (#1085) (#1090)
- 7c748bfccbc2eac0c1138378736b70dcb7e26a5b support cast decimal to signed numeric (#1073) (#1089)
- bd3600b6483c253ae57a38928a636d39a6b7cb02 parquet: Use constant for RLE decoder buffer size (#1070) (#1088)
- 2b5c53ecd92468fd95328637a15de7f35b6fcf28 Box RleDecoder index buffer (#1061) (#1062) (#1081)
- 78721bc1a467177679ad6196b994759cf4d73377 BooleanBufferBuilder correct buffer length (#1051) (#1052) (#1080)
- 3a5e3541d3a4db61a828011ed95c8539adf1d57c support cast signed numeric to decimal (#1044) (#1079)
- 000bdb3053098255d43288aa3e8665e8b1892a6c fix(compute): LIKE escape parenthesis (#1042) (#1078)
- e0abdb9e62772a2f853974e68e744246e7f47569 Add Schema::project and RecordBatch::project functions (#1033) (#1077)
- 31911a4d6328d889d98796b896412b3997f73e13 Remove outdated safety example from doc (#1050) (#1058)
- 71ac8620993a65a7f1f57278c3495556625356b3 Use existing array type in take kernel (#1046) (#1057)
- 1c5902376b7f7d56cb5249db4f98a6a370ead919 Extract method to drive PageIterator -> RecordReader (#1031) (#1056)
- 7ca39361f8733b86bc0cef5ed5d74093e2c6b14d Clarify governance of arrow crate (#1030) (#1055)
6.4.0 (2021-12-10)
- 049f48559f578243935b6e512d06c4c2df360bf1 Force new cargo and target caching to fix CI (#1023) (#1024)
- ef37da3b60f71a52d5ad67e9ca810dca38b29f00 Fix a broken link and some missing styling in the main arrow crate docs (#1013) (#1019)
- f2c746a9b968714cfe05d35fcee8658371acd899 Remove out of date comment (#1008) (#1018)
- 557fc11e3b2a09a680c0cfbf38d27b13101b63fe Remove unneeded rc feature of serde (#990) (#1016)
- b28385e096b1cf8f5fb2773d49b160f93d94fbac Docstrings for Timestamp*Array. (#988) (#1015)
- a92672e40217670d2566a85d70b0b59fffac594c Add full data validation for ArrayData::try_new() (#1007)
- 6c8b2936d7b07e1e2f5d1d48eea425a385382dfb Add boolean comparison to scalar kernels for less than, greater than (#977) (#1005)
- 14d140aeca608a23a8a6b2c251c8f53ffd377e61 Fix some typos in code and comments (#985) (#1006)
- b4507f562fb0eddfb79840871cd2733dc0e337cd Fix warnings introduced by Rust/Clippy 1.57.0 (#1004)
6.3.0 (2021-11-26)
Changes:
- 7e51df015ce851a5de444ca08b57b38e7ee959a3 add more error test case and change the code style (#952) (#976)
- 6c570cfe98d6a7a4ec74b139b733c5c72ed10015 Support read decimal data from csv reader if user provide the schema with decimal data type (#941) (#974)
- 4fa0d4d7f7d9ca0a3da2a6dfe3eae6dc2d51a79a Adding Pretty Print Support For Fixed Size List (#958) (#968)
- 9d453a3128013c03e8ed854ded76b15cc6f28be4 Fix bug in temporal utilities due to DST being ignored. (#955) (#967)
- 1b9fd9e3fb2653236513bb7dda5aa2fa14d1d831 Inferring 2. as Float64 for issue #929 (#950) (#966)
- e6c5e1c877bd94b3d6e545567f901d9962257cf8 Fix CI for latest nightly (#970) (#973)
- c96e8de457442806e18944f0b26dd06ba4cb1aee Fix primitive sort when input contains more nulls than the given sort limit (#954) (#965)
- 094037d418381584178db1d886cad3b5024b414a Update comfy-table to 5.0 (#957) (#964)
- 9f635021eee6786c5377c891218c5f88ebce07c3 Fix csv writing of timestamps to show timezone. (#849) (#963)
- f7deba4c3a050a52608462ee8a827bb8f6364140 Adding ability to parse float from number with leading decimal (#831) (#962)
- 59f96e842d05b63882f7ba285c66a9739761cf84 add ilike comparator (#874) (#961)
- 54023c8a5543c9f9fa4955afa01189029f3e96f5 Remove unpassable cargo publish check from verify-release-candidate.sh (#882) (#949)
6.2.0 (2021-11-12)
Features / Fixes:
- 4037933e43cad9e4de027039ce14caa65f78300a Fix validation for offsets of StructArrays (#942) (#946)
- 1af9ca5d363d870550026a7b1abcb749befbb371 implement take kernel for null arrays (#939) (#944)
- 320de1c20aefbf204f6888e2ad3663863afeba9f add checker for appending i128 to decimal builder (#928) (#943)
- dff14113884ad4246a8cafb9be579ebdb4e1481f Validate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)
- c3eae1ec56303b97c9e15263063a6a13122ef194 fix some warning about unused variables in panic tests (#894) (#933)
- e80bb018450f13a30811ffd244c42917d8bf8a62 fix some clippy warnings (#896) (#930)
- bde89463b627be3f60b5569d038ca36c434da71d feat(ipc): add support for deserializing messages with nested dictionary fields (#923) (#931)
- 792544b5fb7b84224ef9745ecb9f330663c14fb4 refactor regexp_is_match_utf8_scalar to try to mitigate miri failures (#895) (#932)
- 3f0e252811cbb6e3f7c774959787dcfec985d03e Automatically retry failed MIRI runs to work around intermittent failures (#934)
- c9a9515c46d560ced00e23ff57cb10a1c97573cb Update mod.rs (#909) (#919)
- 64ed79ece67141b92dc45b8a1d43cb9d909aa6a9 Mark boolean kernels public (#913) (#920)
- 8b95fe0bbf03588c5cc00f67365c5b0dac4d7a34 doc example mistype (#904) (#918)
- 34c5eab4862cab16fdfd5f5ed6c68dce6298dfa4 allow null array to be cast to all other types (#884) (#917)
- 3c69752e55ed0c58f5a8faed918a22b45cd93766 Fix instances of UB that cause tests to not pass under miri (#878) (#916)
- 85402148c3af03d0855e81f855715ea98a7491c5 feat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)
- 03d95e626cb0e654775fefa77786674ea41be4a2 Fix references to changelog (#905)
6.1.0 (2021-10-29)
Features / Fixes:
- b42649b0088fe7762c713a41a23c1abdf8d0496d implement eq_dyn and neq_dyn (#858) (#867)
- 01743f3f10a377c1ca857cd554acbf84155766d8 fix: fix a bug in offset calculation for unions (#863) (#871)
- 8bfff793a23f0e71008c7a9eea7a54d6b913ecff add lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)
- 8845e91d4ab584c822e9ee903db7069551b124af fix(ipc): Support serializing structs containing dictionaries (#848) (#865)
- 620282a0d9fdd2a8ed7e8313d17ba3dec64c80e5 Implement boolean equality kernels (#844) (#857)
- 94cddcacf785be982e69689291ce034ef00220b4 Cherry pick fix parquet_derive with default features (and fix cargo publish) (#856)
- 733fd583ddb3dbe6b4d58a809c444ee16ac0eae8 Use kernel utility for parsing timestamps in csv reader. (#832) (#853)
- 2cc64937a153f632796915d2d9869d5c2a501d28 [Minor] Fix clippy errors with new rust version (1.56) and float formatting with nightly (#845) (#850)
Other:
- bfac9e5a027e3bd78b7a1ec90c75a3e385bd66bb Test out new tarpaulin version (#852) (#866)
- 809350ced392cfc78d8a1a46228d4ffc25dea9ff Update README.md (#834) (#854)
- 70582f40dd21f5c710c4946266d0563a92b92337 [MINOR] Delete temp file from docs (#836) (#855)
- a721e00014015a7e598946b6efb9b1da8080ec85 Force fresh cargo cache key in CI (#839) (#851)
6.0.0 (2021-10-13)
Breaking changes:
- Replace ArrayData::new() with ArrayData::try_new() and unsafe ArrayData::new_unchecked #822 [parquet] [arrow] (alamb)
- Update Bitmap::len to return bits rather than bytes #749 [arrow] (matthewmturner)
- use sort_unstable_by in primitive sorting #552 [arrow] (Jimexist)
- New MapArray support #491 [parquet] [arrow] (nevi-me)
Implemented enhancements:
- Improve parquet binary writer speed by reducing allocations #819
- Expose buffer operations #808
- Add doc examples of writing parquet files using ArrowWriter #788
Fixed bugs:
- JSON reader can create null struct children on empty lists #825
- Incorrect null count for cast kernel for list arrays #815
- minute and second temporal kernels do not respect timezone #500
- Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)
Documentation updates:
- Doctest for PrimitiveArray using from_iter_values. #694 [arrow] (novemberkilo)
- Doctests for BinaryArray and LargeBinaryArray. #625 [arrow] (novemberkilo)
- Add links in docstrings #605 [arrow] (alamb)
5.5.0 (2021-09-24)
Implemented enhancements:
Fixed bugs:
- Converting from string to timestamp uses microseconds instead of milliseconds #780
- Document has no link to RowColumnIter #762
- length on slices with null doesn't work #744
5.4.0 (2021-09-10)
Implemented enhancements:
- Upgrade lexical-core to 0.8 #747
- append_nulls and append_trusted_len_iter for PrimitiveBuilder #725
- Optimize MutableArrayData::extend for null buffers #397
Fixed bugs:
- Arithmetic with scalars doesn't work on slices #742
- Comparisons with scalar don't work on slices #740
- unary kernel doesn't respect offset #738
- new_null_array creates invalid struct arrays #734
- --no-default-features is broken for parquet #733 [parquet]
- Bitmap::len returns the number of bytes, not bits. #730
- Decimal logical type is formatted incorrectly by print_schema #713
- parquet_derive does not support chrono time values #711
- Numeric overflow when formatting Decimal type #710
- The integration tests are not running #690
Closed issues:
- Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729
5.3.0 (2021-08-26)
Implemented enhancements:
- Add optimized filter kernel for regular expression matching #697
- Can't cast from timestamp array to string array #587
Fixed bugs:
- 'Encoding DELTA_BYTE_ARRAY is not supported' with parquet arrow readers #708
- Support reading json string into binary data type. #701
Closed issues:
5.2.0 (2021-08-12)
Implemented enhancements:
- Make rand an optional dependency #671
- Remove undefined behavior in value method of boolean and primitive arrays #645
- Avoid materialization of indices in filter_record_batch for single arrays #636
- Add a note about arrow crate security / safety #627
- Allow the creation of String arrays from an iterator of &Option<&str> #598
- Support arrow map datatype #395
Fixed bugs:
- Parquet fixed length byte array columns write byte array statistics #660 [parquet]
- Parquet boolean columns write Int32 statistics #659 [parquet]
- Writing Parquet with a boolean column fails #657
- JSON decoder data corruption for large i64/u64 #653
- Incorrect min/max statistics for strings in parquet files #641 [parquet]
Closed issues:
5.1.0 (2021-07-29)
Implemented enhancements:
- Make FFI_ArrowArray empty() public #602
- exponential sort can be used to speed up lexico partition kernel #586
- Implement sort() for binary array #568
- primitive sorting can be improved and more consistent with and without limit if sorted unstably #553
Fixed bugs:
- Confusing memory usage with CSV reader #623
- FFI implementation deviates from specification for array release #595
- Parquet file content is different if ~/.cargo is in a git checkout #589
- Ensure output of MIRI is checked for success #581
- MIRI failure in array::ffi::tests::test_struct and other ffi tests #580
- ListArray equality check may return wrong result #570
- cargo audit failed #561
- ArrayData::slice() does not work for nested types such as StructArray #554
Documentation updates:
- More examples of how to construct Arrays #301
Closed issues:
5.0.0 (2021-07-14)
Breaking changes:
- Remove lifetime from DynComparator #543 [arrow]
- Simplify interactions with arrow flight APIs #376 [arrow-flight]
- refactor: remove lifetime from DynComparator #542 [arrow] (e-dard)
- use iterator for partition kernel instead of generating vec #438 [arrow] (Jimexist)
- Remove DictionaryArray::keys_array method #419 [arrow] (jhorstmann)
- simplify interactions with arrow flight APIs #377 [arrow-flight] (garyanaplan)
- return reference from DictionaryArray::values() (#313) #314 [arrow] (tustvold)
Implemented enhancements:
- Allow creation of StringArrays from Vec<String> #519 [arrow]
- Implement RecordBatch::concat #461 [arrow]
- Implement RecordBatch::slice() to slice RecordBatches #460 [arrow]
- Add a RecordBatch::split to split large batches into a set of smaller batches #343
- generate parquet schema from rust struct #539 [parquet] (nevi-me)
- Implement RecordBatch::concat #537 [arrow] (silathdiir)
- Implement function slice for RecordBatch #490 [arrow] (b41sh)
- add lexicographically partition points and ranges #424 [arrow] (Jimexist)
- allow to read non-standard CSV #326 [arrow] (kazuk)
- parquet: Speed up BitReader/DeltaBitPackDecoder #325 [parquet] (kornholi)
- ARROW-12343: [Rust] Support auto-vectorization for min/max #9 [arrow] (Dandandan)
- ARROW-12411: [Rust] Create RecordBatches from Iterators #7 [arrow] (alamb)
Fixed bugs:
- Error building on master - error: cyclic package dependency: package ahash v0.7.4 depends on itself. Cycle #544
- IPC reader panics with out of bounds error #541
- Take kernel doesn't handle nulls and structs correctly #530 [arrow]
- master fails to compile with default-features=false #529
- README developer instructions out of date #523
- Update rustc and packed_simd in CI before 5.0 release #517
- Incorrect memory usage calculation for dictionary arrays #503 [arrow]
- sliced null buffers lead to incorrect result in take kernel (and probably on other places) #502
- Cast of utf8 types and list container types don't respect offset #334 [arrow]
- fix take kernel null handling on structs #531 [arrow] (bjchambers)
- Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
- parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
- Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
- respect offset in utf8 and list casts #335 [arrow] (ritchie46)
- Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
- ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
- fix invalid null handling in filter #296 [arrow] (ritchie46)
- fix NaN handling in parquet statistics #256 (crepererum)
Documentation updates:
- Improve arrow's crate's readme on crates.io #463
- Clean up README.md in advance of the 5.0 release #536 [arrow] [arrow-flight] [parquet] (alamb)
- fix readme instructions to reflect new structure #524 (marcvanheerden)
- Improve docs for NullArray, new_null_array and new_empty_array #240 [arrow] (alamb)
Merged pull requests:
- Fix default arrow build #533 [arrow] (alamb)
- Add tests for building applications using arrow with different feature flags #532 [arrow] (alamb)
- Remove unused futures dependency from arrow-flight #528 [arrow-flight] (alamb)
- CI: update rust nightly and packed_simd #525 [arrow] (ritchie46)
- Support StringArray creation from String Vec #522 [arrow] (silathdiir)
- Fix parquet benchmark schema #513 [parquet] (nevi-me)
- Fix parquet definition levels #511 [parquet] (nevi-me)
- Fix for primitive and boolean take kernel for nullable indices with an offset #509 [arrow] (jhorstmann)
- Bump flatbuffers #499 [arrow] (PsiACE)
- implement second/minute helpers for temporal #493 [arrow] (ovr)
- special case concatenating single element array shortcut #492 [arrow] (Jimexist)
- update docs to reflect recent changes (joins and window functions) #489 (Jimexist)
- Update rand, proc-macro and zstd dependencies #488 [arrow] [arrow-flight] [parquet] (alamb)
- Doctest for GenericListArray. #474 [arrow] (novemberkilo)
- remove stale comment on ArrayData equality and update unit tests #472 (Jimexist)
- remove unused patch file #471 (Jimexist)
- fix clippy warnings for rust 1.53 #470 (Jimexist)
- Fix PR labeler #468 (Dandandan)
- Tweak dev backporting docs #466 (alamb)
- Unvendor Archery #459 (kszucs)
- Add sort boolean benchmark #457 (alamb)
- Add C data interface for decimal128 and timestamp #453 [arrow] (alippai)
- Implement the Iterator trait for the json Reader. #451 [arrow] (LaurentMazare)
- Update release docs + release email template #450 (alamb)
- remove clippy unnecessary wraps suppression in cast kernel #449 (Jimexist)
- Use partition for bool sort #448 (Jimexist)
- remove unnecessary wraps in sort #445 (Jimexist)
- Python FFI bridge for Schema, Field and DataType #439 [arrow] (kszucs)
- Update release Readme.md #436 (alamb)
- Derive Eq and PartialEq for SortOptions #425 (tustvold)
- refactor lexico sort for future code reuse #423 (Jimexist)
- Reenable MIRI check on PRs #421 (alamb)
- Sort by float lists #420 (medwards)
- Fix out of bounds read in bit chunk iterator #416 (jhorstmann)
- Doctests for DecimalArray. #414 (novemberkilo)
- Add Decimal to CsvWriter and improve debug display #406 (alippai)
- MINOR: update install instruction #400 (alippai)
- use prettier to auto format md files #398 (Jimexist)
- window::shift to work for all array types #388 (Jimexist)
- add more tests for window::shift and handle boundary cases #386 (Jimexist)
- Implement faster arrow array reader #384 (yordan-pavlov)
- Add set_bit to BooleanBufferBuilder to allow mutating bit in index #383 (boazberman)
- make sure that only concat preallocates buffers #382 (ritchie46)
- Respect max rowgroup size in Arrow writer #381 [parquet] (nevi-me)
- Fix typo in release script, update release location #380 (alamb)
- Doctests for FixedSizeBinaryArray #378 (novemberkilo)
- Simplify shift kernel using new_null_array #370 (Dandandan)
- allow SliceableCursor to be constructed from an Arc directly #369 (crepererum)
- Add doctest for ArrayBuilder #367 (alippai)
- Fix version in readme #365 (domoritz)
- Remove superfluous space #363 (domoritz)
- Add crate badges #362 (domoritz)
- Disable MIRI check until it runs cleanly on CI #360 (alamb)
- Only register Flight.proto with cargo if it exists #351 (tustvold)
- Reduce memory usage of concat (large)utf8 #348 (ritchie46)
- Fix filter UB and add fast path #341 (ritchie46)
- Automatic cherry-pick script #339 (alamb)
- Doctests for BooleanArray. #338 (novemberkilo)
- feature gate ipc reader/writer #336 (ritchie46)
- Add ported Rust release verification script #331 (wesm)
- Doctests for StringArray and LargeStringArray. #330 (novemberkilo)
- inline PrimitiveArray::value #329 (ritchie46)
- Enable wasm32 as a target architecture for the SIMD feature #324 (roee88)
- Fix undefined behavior in FFI and enable MIRI checks on CI #323 (roee88)
- Mutablebuffer::shrink_to_fit #318 [arrow] (ritchie46)
- Add (simd) modulus op #317 (gangliao)
- feature gate csv functionality #312 [arrow] (ritchie46)
- [Minor] Version upgrades #304 (Dandandan)
- Remove old release scripts #293 (alamb)
- Add Send to the ArrayBuilder trait #291 (Max-Meldrum)
- Added changelog generator script and configuration. #289 (jorgecarleitao)
- manually bump development version #288 (nevi-me)
- Fix FFI and add support for Struct type #287 (roee88)
- Fix subtraction underflow when sorting string arrays with many nulls #285 (medwards)
- Speed up bound checking in take #281 (Dandandan)
- Update PR template by commenting out instructions #278 (nevi-me)
- Added Decimal support to pretty-print display utility (#230) #273 (mgill25)
- Fix null struct and list roundtrip #270 (nevi-me)
- 1.52 clippy fixes #267 (nevi-me)
- Fix typo in csv/reader.rs #265 (domoritz)
- Fix empty Schema::metadata deserialization error #260 (hulunbier)
- update datafusion and ballista doc links #259 (Jimexist)
- support full u32 and u64 roundtrip through parquet #258 [parquet] (crepererum)
- [MINOR] Added env to run rust in integration. #253 (jorgecarleitao)
- [Minor] Made integration tests always run. #248 (jorgecarleitao)
- fix parquet max_definition for non-null structs #246 (nevi-me)
- Disabled rebase needed until demonstrate working. #243 (jorgecarleitao)
- pin flatbuffers to 0.8.4 #239 (ritchie46)
- sort_primitive result is capped to the min of limit or values.len #236 (medwards)
- Read list field correctly #234 [parquet] (nevi-me)
- Fix code examples for RecordBatch::try_from_iter #231 (alamb)
- Support string dictionaries in csv reader (#228) #229 (tustvold)
- support LargeUtf8 in sort kernel #26 (ritchie46)
- Removed unused files #22 (jorgecarleitao)
- ARROW-12504: Buffer::from_slice_ref set correct capacity #18 [arrow] (tustvold)
- Add GitHub templates #17 (andygrove)
- ARROW-12493: Add support for writing dictionary arrays to CSV and JSON #16 [arrow] (tustvold)
- ARROW-12426: [Rust] Fix concatenation of arrow dictionaries #15 [arrow] (tustvold)
- Update repository and homepage urls #14 [arrow] [arrow-flight] [parquet] (Dandandan)
- Added rebase-needed bot #13 (jorgecarleitao)
- Added Integration tests against arrow #10 (jorgecarleitao)
4.4.0 (2021-06-24)
Breaking changes:
- migrate partition kernel to use Iterator trait #437 [arrow]
- Remove DictionaryArray::keys_array #391 [arrow]
Implemented enhancements:
- sort kernel boolean sort can be O(n) #447 [arrow]
- C data interface for decimal128, timestamp, date32 and date64 #413
- Add Decimal to CsvWriter #405
- Use iterators to increase performance of creating Arrow arrays #200 [parquet]
Fixed bugs:
- Release Audit Tool (RAT) is not being triggered #481
- Security Vulnerabilities: flatbuffers: `read_scalar` and `read_scalar_at` allow transmuting values without `unsafe` blocks #476
- Clippy broken after upgrade to rust 1.53 #467
- Pull Request Labeler is not working #462
- Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API #456
- parquet reading hangs when row_group contains more than 2048 rows of data #349
- Fail to build arrow #247
- JSON reader does not implement iterator #193 [arrow]
Security fixes:
- Ensure a successful MIRI Run on CI #227
Closed issues:
- sort kernel has a lot of unnecessary wrapping #446
- [Parquet] Plain encoded boolean column chunks limited to 2048 values #48 [parquet]
4.3.0 (2021-06-10)
Implemented enhancements:
- Add partitioning kernel for sorted arrays #428 [arrow]
- Implement sort by float lists #427 [arrow]
- Derive Eq and PartialEq for SortOptions #426 [arrow]
- use prettier and github action to normalize markdown document syntax #399
- window::shift can work for more than just primitive array type #392
- Doctest for ArrayBuilder #366
Fixed bugs:
- Boolean `not` kernel does not take offset of null buffer into account #417
- my contribution not merged in 4.2 release #394
- window::shift shall properly handle boundary cases #387
- Parquet `WriterProperties.max_row_group_size` not wired up #257
- Out of bound reads in chunk iterator #198 [arrow]
4.2.0 (2021-05-29)
Breaking changes:
Implemented enhancements:
- Simplify shift kernel using null array #371
- Provide `Arc`-based constructor for `parquet::util::cursor::SliceableCursor` #368
- Add badges to crates #361
- Consider inlining PrimitiveArray::value #328
- Implement automated release verification script #327
- Add wasm32 to the list of target architectures of the simd feature #316
- add with_escape for csv::ReaderBuilder #315 [arrow]
- IPC feature gate #310
- csv feature gate #309 [arrow]
- Add `shrink_to` / `shrink_to_fit` to `MutableBuffer` #297
Fixed bugs:
- Incorrect crate setup instructions #364
- Arrow-flight only register rerun-if-changed if file exists #350
- Dictionary Comparison Uses Wrong Values Array #332
- Undefined behavior in FFI implementation #322
- All-null column get wrong parquet null-counts #306 [parquet]
- Filter has inconsistent null handling #295
4.1.0 (2021-05-17)
Implemented enhancements:
- Add Send to ArrayBuilder #290 [arrow]
- Improve performance of bound checking option #280 [arrow]
- extend compute kernel arity to include nullary functions #276
- Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
- Add support for pretty-printing Decimal numbers #230 [arrow]
- CSV Reader String Dictionary Support #228 [arrow]
- Add Builder interface for adding Arrays to record batches #210 [arrow]
- Support auto-vectorization for min/max #209 [arrow]
- Support LargeUtf8 in sort kernel #25 [arrow]
Fixed bugs:
- no method named `select_nth_unstable_by` found for mutable reference `&mut [T]` #283
- Rust 1.52 Clippy error #266
- NaNs can break parquet statistics #255 [parquet]
- u64::MAX does not roundtrip through parquet #254 [parquet]
- Integration tests failing to compile (flatbuffer) #249 [arrow]
- Fix compatibility quirks between arrow and parquet structs #245 [parquet]
- Unable to write non-null Arrow structs to Parquet #244 [parquet]
- schema: missing field `metadata` when deserialize #241 [arrow]
- Arrow does not compile due to flatbuffers upgrade #238 [arrow]
- Sort with limit panics for the limit includes some but not all nulls, for large arrays #235 [arrow]
- arrow-rs contains a copy of the "format" directory #233 [arrow]
- Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
- Read list field correctly in <struct<list>> #167 [parquet]
- FFI listarray lead to undefined behavior. #20
Security fixes:
Documentation updates:
- Comment out the instructions in the PR template #277
- Update links to datafusion and ballista in README.md #19
- Update "repository" in Cargo.toml #12
Closed issues:
- Arrow Aligned Vec #268
- [Rust]: Tracking issue for AVX-512 #220 [arrow]
- Umbrella issue for clippy integration #217 [arrow]
- Support sort #215 [arrow]
- Support stable Rust #214 [arrow]
- Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
- ArrayData buffers are inconsistent across implementations #207
- 3.0.1 patch release #204
- Document patch release process #202
- Simplify Offset #186 [arrow]
- Typed Bytes #185 [arrow]
- [CI]docker-compose setup should enable caching #175
- Improve take primitive performance #174
- [CI] Try out buildkite #165 [arrow]
- Update assignees in JIRA where missing #160
- [Rust]: From<ArrayDataRef> implementations should validate data type #103 [arrow]
- [DataFusion] Verify that projection push down does not remove aliases columns #99 [arrow]
- [Rust][DataFusion] Implement modulus expression #98 [arrow]
- [DataFusion] Add constant folding to expressions during logically planning #96 [arrow]
- [DataFusion] DataFrame.collect should return RecordBatchReader #95 [arrow]
- [Rust][DataFusion] Add FORMAT to explain plan and an easy to visualize format #94 [arrow]
- [DataFusion] Implement metrics framework #90 [arrow]
- [DataFusion] Implement micro benchmarks for each operator #89 [arrow]
- [DataFusion] Implement pretty print for physical query plan #88 [arrow]
- [Archery] Support rust clippy in the lint command #83
- [rust][datafusion] optimize count(*) queries on parquet sources #75 [arrow]
- [Rust][DataFusion] Improve like/nlike performance #71 [arrow]
- [DataFusion] Implement optimizer rule to remove redundant projections #56 [arrow]
- [DataFusion] Parquet data source does not support complex types #39 [arrow]
- Merge utils from Parquet and Arrow #32 [arrow] [parquet]
- Add benchmarks for Parquet #30 [parquet]
- Mark methods that do not perform bounds checking as unsafe #28 [arrow]
- Test issue #24 [arrow]
- This is a test issue #11