Changes for 0.8.2 'Eurasian hobby':
-----------------------------------

0.8.2 is a medium-sized update of the 0.8.0 branch:
- ARM32 optimizations for ipred and itx in 10/12bits,
  completing the 10b/12b work on ARM64 and ARM32
- Give the post-filters their own threads
- ARM64: rewrite the wiener functions
- Speed up coefficient decoding, 0.5%-3% global decoding gain
- x86 optimizations for CDEF_filter and wiener in 10/12bit
- x86: rewrite the SGR AVX2 asm
- x86: improve msac speed on SSE2+ machines
- ARM32: improve speed of ipred and warp
- ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
- ARM32/64: improve speed of looprestoration
- Add seeking and pausing to the player
- Update the player for rendering of 10b/12b
- Misc speed improvements and fixes on all platforms
- Add a xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
- Keep references to buffers valid after dav1d_close(). Fixes a regression
  caused by the picture buffer pool added in 0.8.0.
- ARM32 optimizations for 10bit bitdepth for SGR
- ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
- ARM64 optimizations for 10bit bitdepth for SGR
- x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
- Improve the performance by using a picture buffer pool;
  the improvements can reach 10% in some cases on Windows.
- Support for Apple ARM Silicon
- ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
- ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
  put/prep 8tap/bilin, wiener and CDEF filters
- ARM64 optimizations for cfl_ac 444 for all bitdepths
- x86 optimizations for MC 8-tap, mc_scaled in AVX2
- x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
------------------------------

0.7.1 is a minor update on 0.7.0:
- ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
- SSE2 optimizations for prep_bilin and prep_8tap
- AVX2 optimizations for MC scaled
- Fix a clamping issue in motion vector projection
- Fix an issue in the ipred_z AVX2 functions on some specific Haswell CPUs
- Improvements to the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
------------------------------

0.7.0 is a major release for dav1d:
- Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
- 10b/12b ARM64 optimizations are mostly complete:
  - ipred (paeth, smooth, dc, pal, filter, cfl)
  - itxfm (only 10b)
- AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
- AVX2 for cfl 4:4:4
- AVX-512 CDEF filter
- ARM64 8b improvements for cfl_ac and itxfm
- ARM64 implementation for emu_edge in 8b/10b/12b
- ARM32 implementation for emu_edge in 8b
- Improvements to the dav1dplay utility player to support 10 bit,
  non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
- New ARM64 optimizations for the 10/12bit depth:
  - mc_avg, mc_w_avg, mc_mask
  - mc_put/mc_prep 8tap/bilin
  - mc_warp_8x8
  - mc_w_mask
  - mc_blend
  - wiener
  - SGR
  - loopfilter
  - cdef
- New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
- New SSSE3 optimizations for film grain
- New AVX2 optimizations for msac_adapt16
- Fix rare mismatches against the reference decoder, notably because of clipping
- Improvements on ARM64 on msac, cdef and looprestoration optimizations
- Improvements on AVX2 optimizations for cdef_filter
- Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
- ARM32 optimizations for loopfilter, ipred_dc|h|v
- Add section-5 raw OBU demuxer
- Improve the speed by reducing the L2 cache collisions
- Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
- SSE2 optimizations for CDEF, wiener and warp_affine
- NEON optimizations for SGR on ARM32
- Fix mismatch issue in x86 asm in inverse identity transforms
- Fix build issue in ARM64 assembly if debug info was enabled
- Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
- Export ITU T.35 metadata
- Speed improvements on blend_ on ARM
- Speed improvements on decode_coef and MSAC
- NEON optimizations for blend*, w_mask_, ipred functions for ARM64
- NEON optimizations for CDEF and warp on ARM32
- SSE2 optimizations for MSAC hi_tok decoding
- SSSE3 optimizations for deblocking loopfilters and warp_affine
- AVX2 optimizations for film grain and ipred_z2
- SSE4 optimizations for warp_affine
- VSX optimizations for wiener
- Fix inverse transform overflows in x86 and NEON asm
- Fix integer overflows with large frames
- Improve film grain generation to match reference code
- Improve compatibility with older binutils for ARM
- More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

- Fix playback with unknown OBUs
- Add an option to limit the maximum frame size
- SSE2 and ARM64 optimizations for MSAC
- Improve speed on 32-bit systems
- Optimization in obmc blend
- Reduce RAM usage significantly
- The initial PPC SIMD code, cdef_filter
- NEON optimizations for blend functions on ARM
- NEON optimizations for w_mask functions on ARM
- NEON optimizations for inverse transforms on ARM64
- VSX optimizations for CDEF filter
- Improve handling of malloc failures
- Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

- Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
- Reduce binary size, notably on Windows
- SSSE3 optimizations for ipred_filter
- ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
- Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
-----------------------------

- Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
  The impact is significant on SSSE3, SSE4 and AVX2 CPUs
- SSSE3 optimizations for all block sizes in itx
- SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
- Speed improvements on CDEF for SSE4 CPUs
- NEON optimizations for SGR and loop filter
- Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

- SSSE3 optimization for cdef_dir
- AVX2 improvements of the existing CDEF optimizations
- NEON improvements of the existing CDEF and wiener optimizations
- Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
----------------------------

- ARM64 and ARM optimizations using NEON instructions
- SSSE3 optimizations for both 32 and 64bits
- More AVX2 assembly, almost reaching completion
- Fix installation of includes
- Rewrite inverse transforms to avoid overflows
- Snap packaging for Linux
- Updated API (ABI and API break)
- Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
- Support for all features of the AV1 bitstream
- Support for all bitdepths: 8, 10 and 12 bits
- Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
- Full acceleration for AVX2 64bits processors, making it the fastest decoder
- Partial acceleration for SSSE3 processors
- Partial acceleration for NEON processors
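
As an illustration of the chroma subsamplings listed above, here is a minimal
sketch of how the chroma plane dimensions follow from the luma dimensions for
each layout. This is not dav1d code: the enum and the chroma_size() function
are hypothetical names chosen for this example, and the sketch assumes that
odd luma dimensions round up when halved.

```c
#include <assert.h>

/* Hypothetical layout names for this sketch; not the dav1d API. */
enum layout { LAYOUT_GRAY, LAYOUT_420, LAYOUT_422, LAYOUT_444 };

/*
 * Compute the chroma plane size (cw x ch) for a luma plane of w x h.
 * 4:2:0 halves both dimensions, 4:2:2 halves the width only,
 * 4:4:4 keeps both, and grayscale has no chroma planes (0 x 0).
 */
void chroma_size(enum layout l, int w, int h, int *cw, int *ch)
{
    /* Subsampling shifts: 1 means "halved", 0 means "full resolution". */
    const int ss_hor = (l == LAYOUT_420 || l == LAYOUT_422);
    const int ss_ver = (l == LAYOUT_420);

    assert(w > 0 && h > 0);

    if (l == LAYOUT_GRAY) {
        *cw = 0;
        *ch = 0;
        return;
    }
    /* Adding the shift before shifting rounds odd dimensions up. */
    *cw = (w + ss_hor) >> ss_hor;
    *ch = (h + ss_ver) >> ss_ver;
}
```

Under these assumptions, a 1921x1081 frame would get 961x541 chroma planes in
4:2:0, 961x1081 in 4:2:2 and 1921x1081 in 4:4:4.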