Bug 1890370 - Remove libtheora from the tree. r=sylvestre,frontend-codestyle-reviewers,zeid

Differential Revision: https://phabricator.services.mozilla.com/D215395
Paul Adenot 2024-07-15 14:20:37 +00:00
parent f507ce0633
commit b81bab7544
71 changed files with 3 additions and 19157 deletions


@ -159,14 +159,11 @@ media/libopus/.*
media/libpng/.*
media/libsoundtouch/.*
media/libspeex_resampler/.*
media/libtheora/.*
media/libtremor/.*
media/libvorbis/.*
media/libvpx/.*
media/libwebp/.*
media/libyuv/.*
media/mozva/va/.*
media/openmax_dl/.*
media/openmax_il/.*
media/webrtc/signaling/src/sdp/sipcc/.*
media/webrtc/trunk/.*


@ -1382,14 +1382,12 @@ media/libopus/
media/libpng/
media/libsoundtouch/
media/libspeex_resampler/
media/libtheora/
media/libvorbis/
media/libvpx/
media/libwebp/
media/libyuv/
media/mozva/va
media/mp4parse-rust/
media/openmax_dl/
media/openmax_il/
media/webrtc/signaling/gtest/MockCall.h
mfbt/double-conversion/double-conversion/


@ -100,20 +100,6 @@ opus_packet_parse
opus_strerror
opus_multistream_encode_float
opus_multistream_surround_encoder_create
# libtheora symbols
th_comment_clear
th_comment_init
th_decode_alloc
th_decode_free
th_decode_headerin
th_decode_packetin
th_decode_ycbcr_out
th_granule_frame
th_info_clear
th_info_init
th_packet_isheader
th_packet_iskeyframe
th_setup_free
vorbis_block_clear
vorbis_block_init
vorbis_comment_clear


@ -60,7 +60,6 @@ external_dirs += [
"media/libnestegg",
"media/libogg",
"media/libopus",
"media/libtheora",
"media/libspeex_resampler",
"media/libsoundtouch",
"media/mp4parse-rust",


@ -1,56 +0,0 @@
Monty <monty@xiph.org>
- Original VP3 port
Timothy B. Terriberry
Gregory Maxwell
Ralph Giles
Monty
- Ongoing development
Dan B. Miller
- Pre alpha3 development
Rudolf Marek
Wim Tayman
Dan Lenski
Nils Pipenbrinck
Monty
- MMX optimized functions
David Schleef
- C64x port
Aaron Colwell
Thomas Vander Stichele
Jan Gerber
Conrad Parker
Cristian Adam
Sebastian Pippin
Simon Hosie
Brad Smith
- Bug fixes, enhancements, build systems.
Mauricio Piacentini
- Original win32 projects and example ports
- VP3->Theora transcoder
Silvia Pfeiffer
- Figures for the spec
Michael Smith
Andre Pang
calc
Chris Cheney
Brendan Cully
Edward Hervey
Adam Moss
Colin Ward
Jeremy C. Reed
Arc Riley
Rodolphe Ortalo
- Bug fixes
Robin Watts
- ARM code optimisations
and other Xiph.org contributors


@ -1,255 +0,0 @@
libtheora 1.2.0alpha1 (2010 September 23)
- New 'ptalarbvorm' encoder with better rate/distortion optimization
- New th_encode_ctl option for copying configuration from an existing
setup header, useful for splicing streams.
- Returns TH_DUPFRAME in more cases.
- Add ARM optimizations
- Add TI C64x+ DSP optimizations
- Other performance improvements
- Rename speedlevel 2 to 3 and provide a new speedlevel 2
- Various minor bug fixes
libtheora 1.1.2 (unreleased snapshot)
- Fix Huffman table decoding when OC_HUFF_SLUSH is set to 0
- Fix a frame size bug in player_example
- Add support for passing a buffer the size of the picture
region, rather than a full padded frame to th_encode_ycbcr_in()
as was possible with the legacy pre-1.0 API.
- 4:4:4 support in player_example using software yuv->rgb
- Better rgb->yuv conversion in png2theora
- Clean up warnings and local variables
- Build and documentation fixes
libtheora 1.1.1 (2009 October 1)
- Fix problems with MSVC inline assembly
- Add the missing encoder_disabled.c to the distribution
- build updates: autogen.sh should work better after switching systems
and the MSVC project now defaults to the dynamic runtime library
- Namespace some variables to avoid conflicts on wince.
libtheora 1.1.0 (2009 September 24)
- Fix various small issues with the example and telemetry code
- Fix handling a zero-byte packet as the first frame
- Documentation cleanup
- Two minor build fixes
libtheora 1.1beta3 (2009 August 22)
- Rate control fixes to smooth quality
- MSVC build now exports all of the 1.0 api
- Assorted small bug fixes
libtheora 1.1beta2 (2009 August 12)
- Fix a rate control problem with difficult input
- Build fixes for OpenBSD and Apple Xcode
- Examples now all use the 1.0 api
- TH_ENCCTL_SET_SPLEVEL works again
- Various bug fixes and source tree rearrangement
libtheora 1.1beta1 (2009 August 5)
- Support for two-pass encoding
- Performance optimization of both encoder and decoder
- Encoder supports dynamic adjustment of quality and
bitrate targets
- Encoder is generally more configurable, and all
rate control modes perform better
- Encoder now accepts 4:2:2 and 4:4:4 chroma sampling
- Decoder telemetry output shows quantization choice
and a breakdown of bitrate usage in the frame
- MSVC assembly optimizations up to date and functional
libtheora 1.1alpha2 (2009 May 26)
- Reduce lambda for small quantizers.
- New encoder fDCT does better on smooth gradients
- Use SATD for mode decisions (1-2% bitrate reduction)
- Assembly rewrite for new features and general speed up
- Share code between the encoder and decoder for performance
- Fix 4:2:2 decoding and telemetry
- MSVC project files updated, but assembly is disabled.
- New configure option --disable-spec to work around toolchain
detection failures.
- Limit symbol exports on MacOS X.
- Port remaining unit tests from the 1.0 release.
libtheora 1.1alpha1 (2009 March 27)
- Encoder rewrite with much improved vbr quality/bitrate and
better tracking of the target rate in cbr mode.
- MSVC project files do not work in this release.
libtheora 1.0 (2008 November 3)
- Merge x86 assembly for forward DCT from Thusnelda branch.
- Update 32 bit MMX with loop filter fix.
- Check for an uninitialized state before dereferencing in propagating
decode calls.
- Remove all TH_DEBUG statements.
- Rename the bitpacker source files copied from libogg to avoid
confusing simple build systems using both libraries.
- Declare bitfield entries to be explicitly signed for Solaris cc.
- Set quantization parameters to default values when an empty buffer is
passed with TH_ENCCTL_SET_QUANT_PARAMS.
- Split encoder and decoder tests depending on configure settings.
- Return lstylex.sty to the distribution.
- Disable inline assembly on gcc versions prior to 3.1.
- Remove extern references for OC_*_QUANT_MIN.
- Make various data tables static const so they can be read-only.
- Remove ENCCTL codes from the old encoder API.
- Implement TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE ctl.
- Fix segfault when exactly one of the width or height is not a multiple
of 16, but the other is.
- Compute the correct vertical offset for chroma.
- cpuid assembly fix for MSVC.
- Add VS2008 project files.
- Build updates for 64-bit platforms, Mingw32, VS and XCode.
- Do not clobber the cropping rectangle.
- Declare ourselves 1.0final to pkg-config to sort after beta releases.
- Fix the scons build to include asm in libtheoradec/enc.
libtheora 1.0beta3 (2008 April 16)
- Build new libtheoradec and libtheoraenc libraries
supporting the new API from theora-exp. This API should
not be considered stable yet.
- Change granule_frame() to return an index as documented.
This is a change of behaviour from 1.0beta1.
- Document that granule_time() returns the end of the
presentation interval.
- Use a custom copy of the libogg bitpacker in the decoder
to avoid function call overhead.
- MMX code improved and ported to MSVC.
- Fix a problem with the MMX code on SELinux.
- Fix a problem with decoder quantizer initialization.
- Fix a page queue problem with png2theora.
- Improved robustness.
- Updated VS2005 project files.
- Dropped build support for Microsoft VS2003.
- Dropped build support for the unreleased libogg2.
- Added the specification to the autotools build.
- Specification corrections.
libtheora 1.0beta2 (2007 October 12)
- Fix a crash bug on char-is-unsigned architectures (PowerPC)
- Fix a buffer sizing issue that caused rare encoder crashes
- Fix a buffer alignment issue
- Build fixes for MingW32, MSVC
- Improved format documentation.
libtheora 1.0beta1 (2007 September 22)
- Granulepos scheme modified to match other codecs. This bumps
the bitstream revision to 3.2.1. Bitstreams marked 3.2.0 are
handled correctly by this decoder. Older decoders will show
a one frame sync error in the less noticeable direction.
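
The granulepos change above can be illustrated with a small sketch. It
assumes the usual Theora layout, in which the bits above the granule shift
count frames up to the last keyframe and the bits below it count frames
since that keyframe, and it assumes that streams marked 3.2.1 or later store
a frame count where 3.2.0 streams stored a frame index. The function and
parameter names are illustrative and are not part of the libtheora API.

```python
def granule_frame(granpos, granule_shift, version=(3, 2, 1)):
    """Return the zero-based frame index for a granule position, or -1.

    Hypothetical helper: granule_shift is the keyframe granule shift from
    the stream's info header, version is the bitstream version triple.
    """
    if granpos < 0:
        return -1
    # Upper bits: frames up to and including the last keyframe.
    iframe = granpos >> granule_shift
    # Lower bits: frames since that keyframe.
    pframe = granpos - (iframe << granule_shift)
    # Assumed: 3.2.1+ streams store a count, so subtract one for an index.
    adjust = 1 if tuple(version) >= (3, 2, 1) else 0
    return iframe + pframe - adjust
```

Under these assumptions the same granule position maps to index N in a
3.2.1 stream and N+1 in a 3.2.0 stream, which is the one-frame offset the
entry describes.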
libtheora 1.0alpha8 (2007 September 18)
- Switch to new spec compliant decoder from theora-exp branch.
Written by Dr. Timothy Terriberry.
- Add support to the encoder for using quantization settings
provided by the application.
- more assembly optimizations
libtheora 1.0alpha7 (2006 June 20)
- Enable mmx assembly by default
- Avoid some relocations that caused problems on SELinux
- Other build fixes
- time testing mode (-f) for the dump_video example
libtheora 1.0alpha6 (2006 May 30)
* Merge theora-mmx simd acceleration (x86_32 and x86_64)
* Major RTP payload specification update
* Minor format specification updates
* Fix some spurious calls to free() instead of _ogg_free()
* Fix invalid array indexing in PixelLineSearch()
* Improve robustness against invalid input
* General warning cleanup
* The offset_y member now means what every application thought it meant
(offset from the top). This will mean some old files (those with a
non-centered image created with a buggy encoder) will display differently.
libtheora 1.0alpha5 (2005 August 20)
* Fixed bitrate management bugs that caused popping and encode
errors
* Fixed a crash problem with the theora_state internals not
being initialized properly.
* new utility function:
- theora_granule_shift()
* dump_video example now makes YUV4MPEG files by default, so
the results can be fed back to encoder_example and similar
tools. The old behavior is restored through the '-r' switch.
* ./configure now prints a summary
* simple unit test of the comment api under 'make check'
* misc code cleanup, warning and leak fixes
libtheora 1.0alpha4 (2004 December 15)
* first draft of the Theora I Format Specification
* API documentation generated from theora.h with Doxygen
* fix a double-update bug in the motion analysis
* apply the loop filter before filling motion vector border
in the reference frame
* new utility functions:
- theora_packet_isheader(),
- theora_packet_iskeyframe()
- theora_granule_frame()
* optional support for building without floating point
* optional support for building without encode support
* various build and packaging fixes
* pkg-config support
* SymbianOS build support
libtheora 1.0alpha3 (2004 March 20)
UPDATE: on 2004 July 1 the Theora I bitstream format was frozen. Files
produced by the libtheora 1.0alpha3 reference encoder will always be
decodable by the Theora I spec.
* Bitstream info header FORMAT CHANGES:
- move the granulepos shift field to maintain byte alignment longer.
- reserve 5 additional bits for subsampling and interlace flags.
* Bitstream setup header FORMAT CHANGES:
- support for a range of interpolated quant matrices.
- include the in-loop block filter coeff.
* Bitstream data packet FORMAT CHANGES:
- Reserve a bit for per-block Q index selection.
- Flip the coded image orientation for compatibility with VP3.
This allows lossless transcoding of VP3 content, but files
encoded with earlier theora releases would play upside down.
* example VP3 lossless transcoder
* optional support for libogg2
* timing improvements in the example player
* packaging and build system updates and fixes
libtheora 1.0alpha2 (2003 June 9)
* bitstream FORMAT CHANGES:
- store the quant tables in a third setup header for
future encoder flexibility
- store the huffman tables in the third setup header
- add a field for marking the colorspace to the info header
- add crop parameters for non-multiple-of-16 frame sizes
- add a second vorbiscomment-style metadata header
* API changes to handle multiple headers with a single
theora_decode_header() call, like libvorbis
* code cleanup and minor fixes
* new dump_video code example/utility
* experimental win32 code examples
libtheora 1.0alpha1 (2002 September 25)
* First release of the theora reference implementation
* Port of the newly opened VP3 code to the Ogg container
* Rewrite of the code for portability and to use the libogg bitpacker


@ -1,28 +0,0 @@
Copyright (C) 2002-2009 Xiph.org Foundation
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of the Xiph.org Foundation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


@ -1,18 +0,0 @@
Please see the file COPYING for the copyright license for this software.
In addition to and irrespective of the copyright license associated
with this software, On2 Technologies, Inc. makes the following statement
regarding technology used in this software:
On2 represents and warrants that it shall not assert any rights
relating to infringement of On2's registered patents, nor initiate
any litigation asserting such rights, against any person who, or
entity which utilizes the On2 VP3 Codec Software, including any
use, distribution, and sale of said Software; which make changes,
modifications, and improvements in said Software; and to use,
distribute, and sell said changes as well as applications for other
fields of use.
This reference implementation is originally derived from the On2 VP3
Codec Software, and the Theora video format is essentially compatible
with the VP3 video format, consisting of a backward-compatible superset.


@ -1,24 +0,0 @@
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
ifdef GNU_AS
ifeq ($(TARGET_CPU),arm)
armfrag-gnu.s: armopts-gnu.S
armidct-gnu.s: armopts-gnu.S
armloop-gnu.s: armopts-gnu.S
# armopts needs a specific rule, because arm2gnu.pl will always add the .S
# suffix when translating the files that include it.
armopts-gnu.S: $(srcdir)/lib/arm/armopts.s
$(PERL) $(srcdir)/lib/arm/arm2gnu.pl < $< > $@
# For all others, we can use an implicit rule
%-gnu.s: $(srcdir)/lib/arm/%.s
$(PERL) $(srcdir)/lib/arm/arm2gnu.pl < $< > $@
endif
endif
include $(topsrcdir)/config/rules.mk


@ -1,148 +0,0 @@
# Xiph.org Foundation's libtheora
### What is Theora?
Theora was Xiph.Org's first publicly released video codec, intended
for use within the Foundation's Ogg multimedia streaming system.
Theora is derived directly from On2's VP3 codec, adding new features
while allowing it a longer useful lifetime.
The 1.0 release decoder supported all the new features, but the
encoder is nearly identical to the VP3 code.
The 1.1 release featured a completely rewritten encoder, offering
better performance and compression, and making more complete use
of the format's feature set.
The 1.2 release features significant additional improvements in
compression and performance. Files produced by newer encoders can
be decoded by earlier releases.
### Where is Theora?
Theora's main site is https://www.theora.org. Releases of Theora
and related libraries can be found on the
[download page](https://www.theora.org/downloads/) or the
[main Xiph.Org site](https://xiph.org/downloads/).
Development source is kept at https://gitlab.xiph.org/xiph/theora.
## Getting started with the code
### What do I need to build the source?
Requirements summary:
For libtheora:
* libogg 1.1 or newer.
For example encoder:
* as above,
* libvorbis and libvorbisenc 1.0.1 or newer.
(libvorbis 1.3.1 or newer for 5.1 audio)
For creating a source distribution package:
* as above,
* Doxygen to build the API documentation,
* pdflatex and fig2dev to build the format specification
(transfig package in Ubuntu).
For the player only:
* as above,
* SDL (Simple Direct media Layer) libraries and headers,
* OSS audio driver and development headers.
The provided build system is the GNU automake/autoconf system, and
the main library, libtheora, should already build smoothly on any
system. Failure of libtheora to build on a GNU-enabled system is
considered a bug; please report problems to theora-dev@xiph.org.
Windows build support is included in the win32 directory.
Project files for Apple XCode are included in the macosx directory.
There is also a more limited scons build.
### How do I use the sample encoder?
The sample encoder takes raw video in YUV4MPEG2 format, as used by
lavtools, mjpeg-tools and other packages. The encoder expects audio,
if any, in a separate WAV file. Try 'encoder_example -h' for a
complete list of options.
An easy way to get raw video and audio files is to use MPlayer as an
export utility. The options " -ao pcm -vo yuv4mpeg " will export a
wav file named audiodump.wav and a YUV video file in the correct
format for encoder_example as stream.yuv. Be careful when exporting
video alone; MPlayer may drop frames to 'keep up' with the audio
timer. The example encoder can't properly synchronize input audio and
video files that aren't in sync to begin with.
The encoder will also take video or audio on stdin if '-' is specified
as the input file name.
There is also a 'png2theora' example which accepts a set of image
files in that format.
### How do I use the sample player?
The sample player takes an Ogg file on standard input; the file may be
audio alone, video alone or video with audio.
### What other tools are available?
The programs in the examples directory are intended as tutorial source
for developers using the library. As such they sacrifice features and
robustness in the interests of comprehension and should not be
considered serious applications.
If you just want to use Theora, consider the programs linked
from https://www.theora.org/. There is playback support in a number
of common free players, and plugins for major media frameworks.
Jan Gerber's ffmpeg2theora is an excellent encoding front end.
## Troubleshooting the build process
### Compile error, such as:
encoder_internal.h:664: parse error before `ogg_uint16_t`
This means you have a version of libogg prior to 1.1. A *complete* new Ogg
install, libs and headers, is needed.
Also be sure that there aren't multiple copies of Ogg installed in
/usr and /usr/local; an older one might be first on the search path
for libs and headers.
### Link error, such as:
undefined reference to `oggpackB_stream`
See above; you need libogg 1.1 or later.
### Link error, such as:
undefined reference to `vorbis_granule_time`
You need libvorbis and libvorbisenc from the 1.0.1 release or later.
### Link error, such as:
/usr/lib/libSDL.a(SDL_esdaudio.lo): In function `ESD_OpenAudio`:
SDL_esdaudio.lo(.text+0x25d): undefined reference to `esd_play_stream`
Be sure to use an SDL that's built to work with OSS. If you use an
SDL that is also built with ESD and/or ALSA support, it will try to
suck in all those extra libraries at link time too. That will only
work if the extra libraries are also installed.
### Link warning, such as:
libtool: link: warning: library `/usr/lib/libogg.la` was moved.
libtool: link: warning: library `/usr/lib/libogg.la` was moved.
Re-run theora/autogen.sh after an Ogg or Vorbis rebuild/reinstall


@ -1,277 +0,0 @@
diff --git a/lib/arm/arm2gnu.pl b/lib/arm/arm2gnu.pl
index 8cb68e4..d6fe09c 100755
--- a/lib/arm/arm2gnu.pl
+++ b/lib/arm/arm2gnu.pl
@@ -25,6 +25,8 @@ $n=0;
$thumb = 0; # ARM mode by default, not Thumb.
@proc_stack = ();
+printf (" .syntax unified\n");
+
LINE:
while (<>) {
diff --git a/lib/arm/armbits.s b/lib/arm/armbits.s
index 9400722..fd6e444 100644
--- a/lib/arm/armbits.s
+++ b/lib/arm/armbits.s
@@ -67,28 +67,28 @@ oc_pack_read_refill
; negative.
CMP r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
- LDRHIB r14,[r11],#1 ; r14 = *ptr++
+ LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
- ORRHI r2,r14,LSL r3 ; r2 = window|=r14<<32-available
+ ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
CMPHI r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
- LDRHIB r14,[r11],#1 ; r14 = *ptr++
+ LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
- ORRHI r2,r14,LSL r3 ; r2 = window|=r14<<32-available
+ ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
CMPHI r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
- LDRHIB r14,[r11],#1 ; r14 = *ptr++
+ LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
- ORRHI r2,r14,LSL r3 ; r2 = window|=r14<<32-available
+ ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
CMPHI r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
- LDRHIB r14,[r11],#1 ; r14 = *ptr++
+ LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
- ORRHI r2,r14,LSL r3 ; r2 = window|=r14<<32-available
+ ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
SUBS r3,r0,r3 ; r3 = available-=_bits, available<bits => GT
BLT oc_pack_read_refill_last
MOV r0,r2,LSR r0 ; r0 = window>>32-_bits
@@ -104,14 +104,14 @@ oc_pack_read_refill_last
CMP r11,r10 ; ptr<stop => LO
; If we didn't hit the end of the packet, then pull enough of the next byte to
; to fill up the window.
- LDRLOB r14,[r11] ; (LO) r14 = *ptr
+ LDRBLO r14,[r11] ; (LO) r14 = *ptr
; Otherwise, set the EOF flag and pretend we have lots of available bits.
MOVHS r14,#1 ; (HS) r14 = 1
ADDLO r10,r3,r1 ; (LO) r10 = available
STRHS r14,[r12,#8] ; (HS) eof = 1
ANDLO r10,r10,#7 ; (LO) r10 = available&7
MOVHS r3,#1<<30 ; (HS) available = OC_LOTS_OF_BITS
- ORRLO r2,r14,LSL r10 ; (LO) r2 = window|=*ptr>>(available&7)
+ ORRLO r2,r2,r14,LSL r10 ; (LO) r2 = window|=*ptr>>(available&7)
MOV r0,r2,LSR r0 ; r0 = window>>32-_bits
MOV r2,r2,LSL r1 ; r2 = window<<=_bits
STR r11,[r12,#-4] ; ptr = r11
@@ -183,32 +183,32 @@ oc_huff_token_decode_refill
; We can't possibly need more than 15 bits, so available must be <= 15.
; Therefore we can load at least two bytes without checking it.
CMP r2,r3 ; ptr<stop => HI
- LDRHIB r14,[r3],#1 ; r14 = *ptr++
+ LDRBHI r14,[r3],#1 ; r14 = *ptr++
RSBHI r5,r5,#24 ; (HI) available = 32-(available+=8)
RSBLS r5,r5,#32 ; (LS) r5 = 32-available
- ORRHI r4,r14,LSL r5 ; r4 = window|=r14<<32-available
+ ORRHI r4,r4,r14,LSL r5 ; r4 = window|=r14<<32-available
CMPHI r2,r3 ; ptr<stop => HI
- LDRHIB r14,[r3],#1 ; r14 = *ptr++
+ LDRBHI r14,[r3],#1 ; r14 = *ptr++
SUBHI r5,#8 ; available += 8
; (HI) Stall...
- ORRHI r4,r14,LSL r5 ; r4 = window|=r14<<32-available
+ ORRHI r4,r4,r14,LSL r5 ; r4 = window|=r14<<32-available
; We can use unsigned compares for both the pointers and for available
; (allowing us to chain condition codes) because available will never be
; larger than 32 (or we wouldn't be here), and thus 32-available will never be
; negative.
CMPHI r2,r3 ; ptr<stop => HI
CMPHI r5,#7 ; available<=24 => HI
- LDRHIB r14,[r3],#1 ; r14 = *ptr++
+ LDRBHI r14,[r3],#1 ; r14 = *ptr++
SUBHI r5,#8 ; available += 8
; (HI) Stall...
- ORRHI r4,r14,LSL r5 ; r4 = window|=r14<<32-available
+ ORRHI r4,r4,r14,LSL r5 ; r4 = window|=r14<<32-available
CMP r2,r3 ; ptr<stop => HI
MOVLS r5,#-1<<30 ; (LS) available = OC_LOTS_OF_BITS+32
CMPHI r5,#7 ; (HI) available<=24 => HI
- LDRHIB r14,[r3],#1 ; (HI) r14 = *ptr++
+ LDRBHI r14,[r3],#1 ; (HI) r14 = *ptr++
SUBHI r5,#8 ; (HI) available += 8
; (HI) Stall...
- ORRHI r4,r14,LSL r5 ; (HI) r4 = window|=r14<<32-available
+ ORRHI r4,r4,r14,LSL r5 ; (HI) r4 = window|=r14<<32-available
RSB r14,r10,#32 ; r14 = 32-n
MOV r14,r4,LSR r14 ; r14 = bits=window>>32-n
ADD r12,r12,r14 ;
diff --git a/lib/arm/armfrag.s b/lib/arm/armfrag.s
index 38627ed..38ee775 100644
--- a/lib/arm/armfrag.s
+++ b/lib/arm/armfrag.s
@@ -357,7 +357,7 @@ ofrintra_v6_lp
ORR r5, r5, r5, LSR #8 ; r5 = __777766
PKHBT r2, r2, r3, LSL #16 ; r2 = 33221100
PKHBT r3, r4, r5, LSL #16 ; r3 = 77665544
- STRD r2, [r0], r1
+ STRD r2, r3, [r0], r1
BGT ofrintra_v6_lp
LDMFD r13!,{r4-r6,PC}
ENDP
@@ -397,7 +397,7 @@ ofrinter_v6_lp
USAT16 r12,#8, r12 ; r12= __66__44
USAT16 r5, #8, r5 ; r4 = __77__55
ORR r5, r12,r5, LSL #8 ; r5 = 33221100
- STRD r4, [r0], r2
+ STRD r4, r5, [r0], r2
BGT ofrinter_v6_lp
LDMFD r13!,{r4-r7,PC}
ENDP
@@ -439,7 +439,7 @@ ofrinter2_v6_lp
USAT16 r8, #8, r8 ; r8 = __22__00
USAT16 r7, #8, r7 ; r7 = __33__11
ORR r8, r8, r7, LSL #8 ; r8 = 33221100
- STRD r8, [r0], r3
+ STRD r8, r9, [r0], r3
BGT ofrinter2_v6_lp
LDMFD r13!,{r4-r9,PC}
ENDP
diff --git a/lib/arm/armidct.s b/lib/arm/armidct.s
index 68530c7..269f74b 100644
--- a/lib/arm/armidct.s
+++ b/lib/arm/armidct.s
@@ -875,7 +875,7 @@ idct2_1core_v6 PROC
LDR r3, OC_C4S4
LDRSH r6, [r1], #16 ; r6 = x[1,0]
SMULWB r12,r3, r2 ; r12= t[0,0]=OC_C4S4*x[0,0]>>16
- LDRD r4, OC_C7S1 ; r4 = OC_C7S1; r5 = OC_C1S7
+ LDRD r4, r5, OC_C7S1 ; r4 = OC_C7S1; r5 = OC_C1S7
SMULWB r6, r3, r6 ; r6 = t[1,0]=OC_C4S4*x[1,0]>>16
SMULWT r4, r4, r2 ; r4 = t[0,4]=OC_C7S1*x[0,1]>>16
SMULWT r7, r5, r2 ; r7 = t[0,7]=OC_C1S7*x[0,1]>>16
@@ -937,7 +937,7 @@ idct2_2core_down_v6 PROC
MOV r7 ,#8 ; r7 = 8
LDR r6, [r1], #16 ; r6 = <x[1,1]|x[1,0]>
SMLAWB r12,r3, r2, r7 ; r12= (t[0,0]=OC_C4S4*x[0,0]>>16)+8
- LDRD r4, OC_C7S1 ; r4 = OC_C7S1; r5 = OC_C1S7
+ LDRD r4, r5, OC_C7S1 ; r4 = OC_C7S1; r5 = OC_C1S7
SMLAWB r7, r3, r6, r7 ; r7 = (t[1,0]=OC_C4S4*x[1,0]>>16)+8
SMULWT r5, r5, r2 ; r2 = t[0,7]=OC_C1S7*x[0,1]>>16
PKHBT r12,r12,r7, LSL #16 ; r12= <t[1,0]+8|t[0,0]+8>
@@ -1053,7 +1053,7 @@ idct3_2core_v6 PROC
; r1 = const ogg_int16_t *_x (source)
; Stage 1:
LDRD r4, [r1], #16 ; r4 = <x[0,1]|x[0,0]>; r5 = <*|x[0,2]>
- LDRD r10,OC_C6S2_3_v6 ; r10= OC_C6S2; r11= OC_C2S6
+ LDRD r10, r11, OC_C6S2_3_v6 ; r10= OC_C6S2; r11= OC_C2S6
; Stall
SMULWB r3, r11,r5 ; r3 = t[0,3]=OC_C2S6*x[0,2]>>16
LDR r11,OC_C4S4
@@ -1132,12 +1132,12 @@ idct4_3core_v6 PROC
; r1 = const ogg_int16_t *_x (source)
; Stage 1:
LDRD r10,[r1], #16 ; r10= <x[0,1]|x[0,0]>; r11= <x[0,3]|x[0,2]>
- LDRD r2, OC_C5S3_4_v6 ; r2 = OC_C5S3; r3 = OC_C3S5
+ LDRD r2, r3, OC_C5S3_4_v6 ; r2 = OC_C5S3; r3 = OC_C3S5
LDRD r4, [r1], #16 ; r4 = <x[1,1]|x[1,0]>; r5 = <??|x[1,2]>
SMULWT r9, r3, r11 ; r9 = t[0,6]=OC_C3S5*x[0,3]>>16
SMULWT r8, r2, r11 ; r8 = -t[0,5]=OC_C5S3*x[0,3]>>16
PKHBT r9, r9, r2 ; r9 = <0|t[0,6]>
- LDRD r6, OC_C6S2_4_v6 ; r6 = OC_C6S2; r7 = OC_C2S6
+ LDRD r6, r7, OC_C6S2_4_v6 ; r6 = OC_C6S2; r7 = OC_C2S6
PKHBT r8, r8, r2 ; r9 = <0|-t[0,5]>
SMULWB r3, r7, r11 ; r3 = t[0,3]=OC_C2S6*x[0,2]>>16
SMULWB r2, r6, r11 ; r2 = t[0,2]=OC_C6S2*x[0,2]>>16
@@ -1148,7 +1148,7 @@ idct4_3core_v6 PROC
SMULWB r12,r11,r10 ; r12= t[0,0]=OC_C4S4*x[0,0]>>16
PKHBT r2, r2, r5, LSL #16 ; r2 = <t[1,2]|t[0,2]>
SMULWB r5, r11,r4 ; r5 = t[1,0]=OC_C4S4*x[1,0]>>16
- LDRD r6, OC_C7S1_4_v6 ; r6 = OC_C7S1; r7 = OC_C1S7
+ LDRD r6, r7, OC_C7S1_4_v6 ; r6 = OC_C7S1; r7 = OC_C1S7
PKHBT r12,r12,r5, LSL #16 ; r12= <t[1,0]|t[0,0]>
SMULWT r5, r7, r4 ; r5 = t[1,7]=OC_C1S7*x[1,1]>>16
SMULWT r7, r7, r10 ; r7 = t[0,7]=OC_C1S7*x[0,1]>>16
@@ -1216,10 +1216,10 @@ idct4_4core_down_v6 PROC
; r1 = const ogg_int16_t *_x (source)
; Stage 1:
LDRD r10,[r1], #16 ; r10= <x[0,1]|x[0,0]>; r11= <x[0,3]|x[0,2]>
- LDRD r2, OC_C5S3_4_v6 ; r2 = OC_C5S3; r3 = OC_C3S5
+ LDRD r2, r3, OC_C5S3_4_v6 ; r2 = OC_C5S3; r3 = OC_C3S5
LDRD r4, [r1], #16 ; r4 = <x[1,1]|x[1,0]>; r5 = <x[1,3]|x[1,2]>
SMULWT r9, r3, r11 ; r9 = t[0,6]=OC_C3S5*x[0,3]>>16
- LDRD r6, OC_C6S2_4_v6 ; r6 = OC_C6S2; r7 = OC_C2S6
+ LDRD r6, r7, OC_C6S2_4_v6 ; r6 = OC_C6S2; r7 = OC_C2S6
SMULWT r8, r2, r11 ; r8 = -t[0,5]=OC_C5S3*x[0,3]>>16
; Here we cheat: row 3 had just a DC, so x[0,3]==x[1,3] by definition.
PKHBT r9, r9, r9, LSL #16 ; r9 = <t[0,6]|t[0,6]>
@@ -1234,7 +1234,7 @@ idct4_4core_down_v6 PROC
SMLAWB r12,r11,r10,r7 ; r12= t[0,0]+8=(OC_C4S4*x[0,0]>>16)+8
PKHBT r2, r2, r5, LSL #16 ; r2 = <t[1,2]|t[0,2]>
SMLAWB r5, r11,r4 ,r7 ; r5 = t[1,0]+8=(OC_C4S4*x[1,0]>>16)+8
- LDRD r6, OC_C7S1_4_v6 ; r6 = OC_C7S1; r7 = OC_C1S7
+ LDRD r6, r7, OC_C7S1_4_v6 ; r6 = OC_C7S1; r7 = OC_C1S7
PKHBT r12,r12,r5, LSL #16 ; r12= <t[1,0]+8|t[0,0]+8>
SMULWT r5, r7, r4 ; r5 = t[1,7]=OC_C1S7*x[1,1]>>16
SMULWT r7, r7, r10 ; r7 = t[0,7]=OC_C1S7*x[0,1]>>16
@@ -1264,7 +1264,7 @@ idct8_8core_v6 PROC
STMFD r13!,{r0,r14}
; Stage 1:
;5-6 rotation by 3pi/16
- LDRD r10,OC_C5S3_4_v6 ; r10= OC_C5S3, r11= OC_C3S5
+ LDRD r10, r11, OC_C5S3_4_v6 ; r10= OC_C5S3, r11= OC_C3S5
LDR r4, [r1,#8] ; r4 = <x[0,5]|x[0,4]>
LDR r7, [r1,#24] ; r7 = <x[1,5]|x[1,4]>
SMULWT r5, r11,r4 ; r5 = OC_C3S5*x[0,5]>>16
@@ -1281,7 +1281,7 @@ idct8_8core_v6 PROC
PKHBT r6, r6, r11,LSL #16 ; r6 = <t[1,6]|t[0,6]>
SMULWT r8, r10,r12 ; r8 = OC_C5S3*x[1,3]>>16
;2-3 rotation by 6pi/16
- LDRD r10,OC_C6S2_4_v6 ; r10= OC_C6S2, r11= OC_C2S6
+ LDRD r10, r11, OC_C6S2_4_v6 ; r10= OC_C6S2, r11= OC_C2S6
PKHBT r3, r3, r8, LSL #16 ; r3 = <r8|r3>
LDR r8, [r1,#12] ; r8 = <x[0,7]|x[0,6]>
SMULWB r2, r10,r0 ; r2 = OC_C6S2*x[0,2]>>16
@@ -1297,7 +1297,7 @@ idct8_8core_v6 PROC
PKHBT r3, r3, r10,LSL #16 ; r3 = <t[1,6]|t[0,6]>
SMULWB r12,r11,r7 ; r12= OC_C2S6*x[1,6]>>16
;4-7 rotation by 7pi/16
- LDRD r10,OC_C7S1_8_v6 ; r10= OC_C7S1, r11= OC_C1S7
+ LDRD r10, r11, OC_C7S1_8_v6 ; r10= OC_C7S1, r11= OC_C1S7
PKHBT r9, r9, r12,LSL #16 ; r9 = <r9|r12>
LDR r0, [r1],#16 ; r0 = <x[0,1]|x[0,0]>
PKHTB r7, r7, r8, ASR #16 ; r7 = <x[1,7]|x[0,7]>
@@ -1363,7 +1363,7 @@ idct8_8core_down_v6 PROC
STMFD r13!,{r0,r14}
; Stage 1:
;5-6 rotation by 3pi/16
- LDRD r10,OC_C5S3_8_v6 ; r10= OC_C5S3, r11= OC_C3S5
+ LDRD r10, r11, OC_C5S3_8_v6 ; r10= OC_C5S3, r11= OC_C3S5
LDR r4, [r1,#8] ; r4 = <x[0,5]|x[0,4]>
LDR r7, [r1,#24] ; r7 = <x[1,5]|x[1,4]>
SMULWT r5, r11,r4 ; r5 = OC_C3S5*x[0,5]>>16
@@ -1380,7 +1380,7 @@ idct8_8core_down_v6 PROC
PKHBT r6, r6, r11,LSL #16 ; r6 = <t[1,6]|t[0,6]>
SMULWT r8, r10,r12 ; r8 = OC_C5S3*x[1,3]>>16
;2-3 rotation by 6pi/16
- LDRD r10,OC_C6S2_8_v6 ; r10= OC_C6S2, r11= OC_C2S6
+ LDRD r10, r11, OC_C6S2_8_v6 ; r10= OC_C6S2, r11= OC_C2S6
PKHBT r3, r3, r8, LSL #16 ; r3 = <r8|r3>
LDR r8, [r1,#12] ; r8 = <x[0,7]|x[0,6]>
SMULWB r2, r10,r0 ; r2 = OC_C6S2*x[0,2]>>16
@@ -1396,7 +1396,7 @@ idct8_8core_down_v6 PROC
PKHBT r3, r3, r10,LSL #16 ; r3 = <t[1,6]|t[0,6]>
SMULWB r12,r11,r7 ; r12= OC_C2S6*x[1,6]>>16
;4-7 rotation by 7pi/16
- LDRD r10,OC_C7S1_8_v6 ; r10= OC_C7S1, r11= OC_C1S7
+ LDRD r10, r11, OC_C7S1_8_v6 ; r10= OC_C7S1, r11= OC_C1S7
PKHBT r9, r9, r12,LSL #16 ; r9 = <r9|r12>
LDR r0, [r1],#16 ; r0 = <x[0,1]|x[0,0]>
PKHTB r7, r7, r8, ASR #16 ; r7 = <x[1,7]|x[0,7]>
--
2.39.1
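
Most of the mnemonic changes in the patch above follow one pattern: with
`.syntax unified`, the condition code moves after the size suffix, so
`LDRHIB` becomes `LDRBHI`. The sketch below shows that one rewrite as a
regular-expression substitution. It is an illustration only and is not how
the patch was produced; the name `to_unified` is hypothetical, and the sketch
does not cover the other two changes in the patch (the explicit destination
register on `ORR` and the explicit second register on `LDRD`/`STRD`).

```python
import re

# ARM condition codes that may appear as a mnemonic suffix.
CONDS = "EQ|NE|CS|HS|CC|LO|MI|PL|VS|VC|HI|LS|GE|LT|GT|LE"

# Older syntax: LDR<cond><size>.  Unified syntax: LDR<size><cond>.
_SWAP = re.compile(rf"\b(LDR|STR)({CONDS})(B|H|SB|SH|D)\b")


def to_unified(line):
    """Move the condition code after the size suffix on LDR/STR mnemonics."""
    return _SWAP.sub(lambda m: m.group(1) + m.group(3) + m.group(2), line)
```

Lines that are already in unified form, or that carry a condition code with
no size suffix (such as `STRHS`), pass through unchanged.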


@ -1,606 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: theora.h,v 1.8 2004/03/15 22:17:32 derf Exp $
********************************************************************/
/**\mainpage
*
* \section intro Introduction
*
* This is the documentation for the <tt>libtheora</tt> C API.
*
* The \c libtheora package is the current reference
* implementation for <a href="http://www.theora.org/">Theora</a>, a free,
* patent-unencumbered video codec.
* Theora is derived from On2's VP3 codec with additional features and
* integration with Ogg multimedia formats by
* <a href="http://www.xiph.org/">the Xiph.Org Foundation</a>.
* Complete documentation of the format itself is available in
* <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>.
*
* \section Organization
*
* The functions documented here are divided between two
* separate libraries:
* - \c libtheoraenc contains the encoder interface,
* described in \ref encfuncs.
* - \c libtheoradec contains the decoder interface,
* described in \ref decfuncs, \n
* and additional \ref basefuncs.
*
* New code should link to \c libtheoradec. If using encoder
* features, it must also link to \c libtheoraenc.
*
* During initial development, prior to the 1.0 release,
* \c libtheora exported a different \ref oldfuncs which
* combined both encode and decode functions.
* In general, legacy API symbols can be identified
* by their \c theora_ or \c OC_ namespace prefixes.
* The current API uses \c th_ or \c TH_ instead.
*
* While deprecated, \c libtheoraenc and \c libtheoradec
* together export the legacy API as well as the one documented above.
* Likewise, the legacy \c libtheora included with this package
* exports the new 1.x API. Older code and build scripts can therefore
* be updated independently to the current scheme.
*/
/**\file
* The shared <tt>libtheoradec</tt> and <tt>libtheoraenc</tt> C API.
* You don't need to include this directly.*/
#if !defined(_O_THEORA_CODEC_H_)
# define _O_THEORA_CODEC_H_ (1)
# include <ogg/ogg.h>
#if defined(__cplusplus)
extern "C" {
#endif
/**\name Return codes*/
/*@{*/
/**An invalid pointer was provided.*/
#define TH_EFAULT (-1)
/**An invalid argument was provided.*/
#define TH_EINVAL (-10)
/**The contents of the header were incomplete, invalid, or unexpected.*/
#define TH_EBADHEADER (-20)
/**The header does not belong to a Theora stream.*/
#define TH_ENOTFORMAT (-21)
/**The bitstream version is too high.*/
#define TH_EVERSION (-22)
/**The specified function is not implemented.*/
#define TH_EIMPL (-23)
/**There were errors in the video data packet.*/
#define TH_EBADPACKET (-24)
/**The decoded packet represented a dropped frame.
The player can continue to display the current frame, as the contents of the
decoded frame buffer have not changed.*/
#define TH_DUPFRAME (1)
/*@}*/
/**The currently defined color space tags.
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Chapter 4, for exact details on the meaning
* of each of these color spaces.*/
typedef enum{
/**The color space was not specified at the encoder.
It may be conveyed by an external means.*/
TH_CS_UNSPECIFIED,
/**A color space designed for NTSC content.*/
TH_CS_ITU_REC_470M,
/**A color space designed for PAL/SECAM content.*/
TH_CS_ITU_REC_470BG,
/**The total number of currently defined color spaces.*/
TH_CS_NSPACES
}th_colorspace;
/**The currently defined pixel format tags.
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Section 4.4, for details on the precise sample
* locations.*/
typedef enum{
/**Chroma decimation by 2 in both the X and Y directions (4:2:0).
The Cb and Cr chroma planes are half the width and half the
height of the luma plane.*/
TH_PF_420,
/**Currently reserved.*/
TH_PF_RSVD,
/**Chroma decimation by 2 in the X direction (4:2:2).
The Cb and Cr chroma planes are half the width of the luma plane, but full
height.*/
TH_PF_422,
/**No chroma decimation (4:4:4).
The Cb and Cr chroma planes are full width and full height.*/
TH_PF_444,
/**The total number of currently defined pixel formats.*/
TH_PF_NFORMATS
}th_pixel_fmt;
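To illustrate the pixel format tags above, the chroma plane size can be derived from the luma plane size. This is only a sketch: the enum and the helper `sketch_chroma_size` are local stand-ins invented for this example and are not part of libtheora. It assumes luma dimensions that are multiples of 16, so the halving is exact.

```c
#include <assert.h>

/* Local mirror of the pixel format tags, for illustration only. */
typedef enum {
  SK_PF_420,
  SK_PF_RSVD,
  SK_PF_422,
  SK_PF_444
} sketch_pixel_fmt;

/* Computes chroma plane dimensions from luma plane dimensions.
   Returns 0 on success, or -1 for the reserved format tag. */
int sketch_chroma_size(sketch_pixel_fmt fmt, int luma_w, int luma_h,
                       int *chroma_w, int *chroma_h) {
  assert(chroma_w != 0 && chroma_h != 0);
  switch (fmt) {
    case SK_PF_420: /* decimated by 2 in both X and Y */
      *chroma_w = luma_w / 2;
      *chroma_h = luma_h / 2;
      return 0;
    case SK_PF_422: /* decimated by 2 in X only */
      *chroma_w = luma_w / 2;
      *chroma_h = luma_h;
      return 0;
    case SK_PF_444: /* no decimation */
      *chroma_w = luma_w;
      *chroma_h = luma_h;
      return 0;
    default:
      return -1;
  }
}
```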
/**A buffer for a single color plane in an uncompressed image.
* This contains the image data in a left-to-right, top-down format.
* Each row of pixels is stored contiguously in memory, but successive
* rows need not be.
* Use \a stride to compute the offset of the next row.
* The encoder accepts both positive \a stride values (top-down in memory)
* and negative (bottom-up in memory).
* The decoder currently always generates images with positive strides.*/
typedef struct{
/**The width of this plane.*/
int width;
/**The height of this plane.*/
int height;
/**The offset in bytes between successive rows.*/
int stride;
/**A pointer to the beginning of the first row.*/
unsigned char *data;
}th_img_plane;
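The stride rule described above can be sketched as follows. The struct `sketch_plane` mirrors the plane layout for illustration and `sketch_sample_at` is a hypothetical helper, neither being part of libtheora. It assumes `data` points at the first (top) row, as documented, so the same expression should work for both positive and negative strides.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the plane layout, for illustration only. */
typedef struct {
  int width;
  int height;
  int stride;          /* bytes between successive rows; may be negative */
  unsigned char *data; /* start of the first (top) row */
} sketch_plane;

/* Returns a pointer to the sample at column x, row y.
   Each later row sits `stride` bytes further along from `data`,
   which moves backwards in memory when the stride is negative. */
unsigned char *sketch_sample_at(const sketch_plane *p, int x, int y) {
  assert(x >= 0 && x < p->width);
  assert(y >= 0 && y < p->height);
  return p->data + (ptrdiff_t)y * p->stride + x;
}
```

Indexing rows through the stride, rather than through `width`, is what allows rows to be padded or stored bottom-up without changing the caller.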
/**A complete image buffer for an uncompressed frame.
* The chroma planes may be decimated by a factor of two in either
* direction, as indicated by th_info#pixel_fmt.
* The width and height of the Y' plane must be multiples of 16.
* They may need to be cropped for display, using the rectangle
* specified by th_info#pic_x, th_info#pic_y, th_info#pic_width,
* and th_info#pic_height.
* All samples are 8 bits.
* \note The term YUV often used to describe a colorspace is ambiguous.
* The exact parameters of the RGB to YUV conversion process aside, in
* many contexts the U and V channels actually have opposite meanings.
* To avoid this confusion, we are explicit: the name of the color
* channels are Y'CbCr, and they appear in that order, always.
* The prime symbol denotes that the Y channel is non-linear.
* Cb and Cr stand for "Chroma blue" and "Chroma red", respectively.*/
typedef th_img_plane th_ycbcr_buffer[3];
/**Theora bitstream information.
* This contains the basic playback parameters for a stream, and corresponds to
* the initial 'info' header packet.
* To initialize an encoder, the application fills in this structure and
* passes it to th_encode_alloc().
* A default encoding mode is chosen based on the values of the #quality and
* #target_bitrate fields.
* On decode, it is filled in by th_decode_headerin(), and then passed to
* th_decode_alloc().
*
* Encoded Theora frames must be a multiple of 16 in size;
* this is what the #frame_width and #frame_height members represent.
* To handle arbitrary picture sizes, a crop rectangle is specified in the
* #pic_x, #pic_y, #pic_width and #pic_height members.
*
* All frame buffers contain pointers to the full, padded frame.
* However, the current encoder <em>will not</em> reference pixels outside of
* the cropped picture region, and the application does not need to fill them
* in.
* The decoder <em>will</em> allocate storage for a full frame, but the
* application <em>should not</em> rely on the padding containing sensible
* data.
*
* It is also generally recommended that the offsets and sizes should still be
* multiples of 2 to avoid chroma sampling shifts when chroma is sub-sampled.
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Section 4.4, for more details.
*
* Frame rate, in frames per second, is stored as a rational fraction, as is
* the pixel aspect ratio.
* Note that this refers to the aspect ratio of the individual pixels, not of
* the overall frame itself.
* The frame aspect ratio can be computed from pixel aspect ratio using the
* image dimensions.*/
typedef struct{
/**\name Theora version
* Bitstream version information.*/
/*@{*/
unsigned char version_major;
unsigned char version_minor;
unsigned char version_subminor;
/*@}*/
/**The encoded frame width.
* This must be a multiple of 16, and less than 1048576.*/
ogg_uint32_t frame_width;
/**The encoded frame height.
* This must be a multiple of 16, and less than 1048576.*/
ogg_uint32_t frame_height;
/**The displayed picture width.
* This must be no larger than #frame_width.*/
ogg_uint32_t pic_width;
/**The displayed picture height.
* This must be no larger than #frame_height.*/
ogg_uint32_t pic_height;
/**The X offset of the displayed picture.
* This must be no larger than #frame_width-#pic_width or 255, whichever is
* smaller.*/
ogg_uint32_t pic_x;
/**The Y offset of the displayed picture.
* This must be no larger than #frame_height-#pic_height, and
* #frame_height-#pic_height-#pic_y must be no larger than 255.
* This slightly funny restriction is due to the fact that the offset is
* specified from the top of the image for consistency with the standard
* graphics left-handed coordinate system used throughout this API, while
* it is stored in the encoded stream as an offset from the bottom.*/
ogg_uint32_t pic_y;
/**\name Frame rate
* The frame rate, as a fraction.
* If either is 0, the frame rate is undefined.*/
/*@{*/
ogg_uint32_t fps_numerator;
ogg_uint32_t fps_denominator;
/*@}*/
/**\name Aspect ratio
* The aspect ratio of the pixels.
* If either value is zero, the aspect ratio is undefined.
* If not specified by any external means, 1:1 should be assumed.
* The aspect ratio of the full picture can be computed as
* \code
* aspect_numerator*pic_width/(aspect_denominator*pic_height).
* \endcode */
/*@{*/
ogg_uint32_t aspect_numerator;
ogg_uint32_t aspect_denominator;
/*@}*/
/**The color space.*/
th_colorspace colorspace;
/**The pixel format.*/
th_pixel_fmt pixel_fmt;
/**The target bit-rate in bits per second.
If initializing an encoder with this struct, set this field to a non-zero
value to activate CBR encoding by default.*/
int target_bitrate;
/**The target quality level.
Valid values range from 0 to 63, inclusive, with higher values giving
higher quality.
If initializing an encoder with this struct, and #target_bitrate is set
to zero, VBR encoding at this quality will be activated by default.*/
/*Currently this is set so that a qi of 0 corresponds to distortions of 24
times the JND, and each increase by 16 halves that value.
This gives us fine discrimination at low qualities, yet effective rate
control at high qualities.
The qi value 63 is special, however.
For this, the highest quality, we use one half of a JND for our threshold.
Due to the lower bounds placed on allowable quantizers in Theora, we will
not actually be able to achieve quality this good, but this should
provide as close to visually lossless quality as Theora is capable of.
We could lift the quantizer restrictions without breaking VP3.1
compatibility, but this would result in quantized coefficients that are
too large for the current bitstream to be able to store.
We'd have to redesign the token syntax to store these large coefficients,
which would make transcoding complex.*/
int quality;
/**The amount to shift to extract the last keyframe number from the granule
* position.
* This can be at most 31.
* th_info_init() will set this to a default value (currently <tt>6</tt>,
* which is good for streaming applications), but you can set it to 0 to
* make every frame a keyframe.
* The maximum distance between key frames is
* <tt>1<<#keyframe_granule_shift</tt>.
* The keyframe frequency can be more finely controlled with
* #TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE, which can also be adjusted
* during encoding (for example, to force the next frame to be a keyframe),
* but it cannot be set larger than the amount permitted by this field after
* the headers have been output.*/
int keyframe_granule_shift;
}th_info;
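Two pieces of arithmetic from the comments above, sketched under stated assumptions: rounding a picture dimension up to the next multiple of 16 to obtain an encoded frame dimension, and computing the full-picture aspect ratio from the pixel aspect ratio. Both helper names are hypothetical and not part of libtheora. The fallback to 1:1 follows the note that 1:1 should be assumed when the ratio is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a picture dimension up to the next multiple of 16. */
uint32_t sketch_pad16(uint32_t size) {
  assert(size <= UINT32_MAX - 15u);
  return (size + 15u) & ~(uint32_t)15u;
}

/* Computes the aspect ratio of the full picture as an unreduced fraction:
   aspect_num*pic_width / (aspect_den*pic_height).
   An undefined pixel aspect ratio (either term zero) is treated as 1:1. */
void sketch_frame_aspect(uint32_t aspect_num, uint32_t aspect_den,
                         uint32_t pic_width, uint32_t pic_height,
                         uint64_t *out_num, uint64_t *out_den) {
  assert(out_num != 0 && out_den != 0);
  if (aspect_num == 0 || aspect_den == 0) {
    aspect_num = 1;
    aspect_den = 1;
  }
  /* Widen before multiplying so the products cannot overflow 32 bits. */
  *out_num = (uint64_t)aspect_num * pic_width;
  *out_den = (uint64_t)aspect_den * pic_height;
}
```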
/**The comment information.
*
* This structure holds the in-stream metadata corresponding to
* the 'comment' header packet.
* The comment header is meant to be used much like someone jotting a quick
* note on the label of a video.
* It should be a short, to the point text note that can be more than a couple
* words, but not more than a short paragraph.
*
* The metadata is stored as a series of (tag, value) pairs, in
* length-encoded string vectors.
* The first occurrence of the '=' character delimits the tag and value.
* A particular tag may occur more than once, and order is significant.
* The character set encoding for the strings is always UTF-8, but the tag
* names are limited to ASCII, and treated as case-insensitive.
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Section 6.3.3 for details.
*
* In filling in this structure, th_decode_headerin() will null-terminate
* the user_comment strings for safety.
* However, the bitstream format itself treats them as 8-bit clean vectors,
* possibly containing null characters, so the length array should be
* treated as their authoritative length.
*/
typedef struct th_comment{
/**The array of comment string vectors.*/
char **user_comments;
/**An array of the corresponding length of each vector, in bytes.*/
int *comment_lengths;
/**The total number of comment strings.*/
int comments;
/**The null-terminated vendor string.
This identifies the software used to encode the stream.*/
char *vendor;
}th_comment;
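The tag and value convention described above (first '=' delimits, tag names compared as case-insensitive ASCII) can be sketched as below. `sketch_comment_value` is a hypothetical helper and not the library's own lookup routine. For brevity it assumes a null-terminated comment string, whereas the length array remains the authoritative length for comments that contain embedded null characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the value part of a "TAG=value" comment if its tag
   matches `tag` (ASCII case-insensitive), or NULL otherwise.
   Only the first '=' is treated as the delimiter, so the value itself may
   contain further '=' characters. */
const char *sketch_comment_value(const char *comment, const char *tag) {
  size_t i;
  assert(comment != NULL && tag != NULL);
  for (i = 0; tag[i] != '\0'; i++) {
    if (comment[i] == '\0' ||
        toupper((unsigned char)comment[i]) != toupper((unsigned char)tag[i])) {
      return NULL;
    }
  }
  /* The tag must be followed immediately by the delimiter. */
  return comment[i] == '=' ? comment + i + 1 : NULL;
}
```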
/**A single base matrix.*/
typedef unsigned char th_quant_base[64];
/**A set of \a qi ranges.*/
typedef struct{
/**The number of ranges in the set.*/
int nranges;
/**The size of each of the #nranges ranges.
These must sum to 63.*/
const int *sizes;
/**#nranges <tt>+1</tt> base matrices.
Matrices \a i and <tt>i+1</tt> form the endpoints of range \a i.*/
const th_quant_base *base_matrices;
}th_quant_ranges;
/**A complete set of quantization parameters.
The quantizer for each coefficient is calculated as:
\code
Q=MIN(MAX(qmin[qti][ci!=0],scale[ci!=0][qi]*base[qti][pli][qi][ci]/100),
1024).
\endcode
\a qti is the quantization type index: 0 for intra, 1 for inter.
<tt>ci!=0</tt> is 0 for the DC coefficient and 1 for AC coefficients.
\a qi is the quality index, ranging between 0 (low quality) and 63 (high
quality).
\a pli is the color plane index: 0 for Y', 1 for Cb, 2 for Cr.
\a ci is the DCT coefficient index.
Coefficient indices correspond to the normal 2D DCT block
ordering--row-major with low frequencies first--\em not zig-zag order.
Minimum quantizers are constant, and are given by:
\code
qmin[2][2]={{4,2},{8,4}}.
\endcode
Parameters that can be stored in the bitstream are as follows:
- The two scale matrices ac_scale and dc_scale.
\code
scale[2][64]={dc_scale,ac_scale}.
\endcode
- The base matrices for each \a qi, \a qti and \a pli (up to 384 in all).
In order to avoid storing a full 384 base matrices, only a sparse set of
matrices are stored, and the rest are linearly interpolated.
This is done as follows.
For each \a qti and \a pli, a series of \a n \a qi ranges is defined.
The size of each \a qi range can vary arbitrarily, but they must sum to
63.
Then, <tt>n+1</tt> matrices are specified, one for each endpoint of the
ranges.
For interpolation purposes, each range's endpoints are the first \a qi
value it contains and one past the last \a qi value it contains.
Fractional values are rounded to the nearest integer, with ties rounded
away from zero.
Base matrices are stored by reference, so if the same matrices are used
multiple times, they will only appear once in the bitstream.
The bitstream is also capable of omitting an entire set of ranges and
its associated matrices if they are the same as either the previous
set (indexed in row-major order) or if the inter set is the same as the
intra set.
- Loop filter limit values.
The same limits are used for the loop filter in all color planes, despite
potentially differing levels of quantization in each.
For the current encoder, <tt>scale[ci!=0][qi]</tt> must be no greater
than <tt>scale[ci!=0][qi-1]</tt> and <tt>base[qti][pli][qi][ci]</tt> must
be no greater than <tt>base[qti][pli][qi-1][ci]</tt>.
These two conditions ensure that the actual quantizer for a given \a qti,
\a pli, and \a ci does not increase as \a qi increases.
This is not required by the decoder.*/
typedef struct{
/**The DC scaling factors.*/
ogg_uint16_t dc_scale[64];
/**The AC scaling factors.*/
ogg_uint16_t ac_scale[64];
/**The loop filter limit values.*/
unsigned char loop_filter_limits[64];
/**The \a qi ranges for each \a qti and \a pli.*/
th_quant_ranges qi_ranges[2][3];
}th_quant_info;
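A minimal sketch of the per-coefficient quantizer calculation, reading the formula in the comment above as a clamp of the scaled base value between the minimum quantizer and 1024. That reading is an assumption of this example, as is the helper name `sketch_quantizer`. The caller passes in the already selected `scale[ci!=0][qi]` and `base[qti][pli][qi][ci]` values, and the interpolation of base matrices is not shown.

```c
#include <assert.h>

/* qmin[qti][ci!=0], as listed in the comment above. */
static const int SKETCH_QMIN[2][2] = {{4, 2}, {8, 4}};

/* Computes the quantizer for one coefficient.
   qti:   0 for intra, 1 for inter.
   ci:    DCT coefficient index, 0..63 (0 is the DC coefficient).
   scale: the DC or AC scale value for the current qi.
   base:  the base matrix entry for this coefficient. */
int sketch_quantizer(int qti, int ci, int scale, int base) {
  long q;
  int qmin;
  assert(qti == 0 || qti == 1);
  assert(ci >= 0 && ci < 64);
  qmin = SKETCH_QMIN[qti][ci != 0];
  q = (long)scale * base / 100;
  if (q < qmin) q = qmin; /* lower bound: minimum quantizer */
  if (q > 1024) q = 1024; /* upper bound */
  return (int)q;
}
```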
/**The number of Huffman tables used by Theora.*/
#define TH_NHUFFMAN_TABLES (80)
/**The number of DCT token values in each table.*/
#define TH_NDCT_TOKENS (32)
/**A Huffman code for a Theora DCT token.
* Each set of Huffman codes in a given table must form a complete, prefix-free
* code.
* There is no requirement that all the tokens in a table have a valid code,
* but the current encoder is not optimized to take advantage of this.
* If each of the five groups of 16 tables does not contain at least one table
* with a code for every token, then the encoder may fail to encode certain
* frames.
* The complete table in the first group of 16 does not have to be in the same
* place as the complete table in the other groups, but the complete tables in
* the remaining four groups must all be in the same place.*/
typedef struct{
/**The bit pattern for the code, with the LSbit of the pattern aligned in
* the LSbit of the word.*/
ogg_uint32_t pattern;
/**The number of bits in the code.
* This must be between 0 and 32, inclusive.*/
int nbits;
}th_huff_code;
/**\defgroup basefuncs Functions Shared by Encode and Decode*/
/*@{*/
/**\name Basic shared functions
* These functions return information about the library itself,
* or provide high-level information about codec state
* and packet type.
*
* You must link to \c libtheoradec if you use any of the
* functions in this section.*/
/*@{*/
/**Retrieves a human-readable string to identify the library vendor and
* version.
* \return the version string.*/
extern const char *th_version_string(void);
/**Retrieves the library version number.
* This is the highest bitstream version that the encoder library will produce,
* or that the decoder library can decode.
* This number is composed of a 16-bit major version, 8-bit minor version
* and 8-bit sub-version, composed as follows:
* \code
* (VERSION_MAJOR<<16)+(VERSION_MINOR<<8)+(VERSION_SUBMINOR)
* \endcode
* \return the version number.*/
extern ogg_uint32_t th_version_number(void);
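The version number layout described above can be packed and unpacked as in this sketch. The helper names are hypothetical and not part of libtheora.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a 16-bit major, 8-bit minor and 8-bit sub-minor version into the
   single number layout (major<<16)+(minor<<8)+sub. */
uint32_t sketch_pack_version(unsigned major, unsigned minor, unsigned sub) {
  assert(major <= 0xFFFFu && minor <= 0xFFu && sub <= 0xFFu);
  return ((uint32_t)major << 16) + ((uint32_t)minor << 8) + (uint32_t)sub;
}

/* Splits a packed version number back into its three fields. */
void sketch_unpack_version(uint32_t version, unsigned *major,
                           unsigned *minor, unsigned *sub) {
  assert(major != 0 && minor != 0 && sub != 0);
  *major = (unsigned)(version >> 16);
  *minor = (unsigned)((version >> 8) & 0xFFu);
  *sub = (unsigned)(version & 0xFFu);
}
```

Because the fields occupy disjoint bit ranges, two packed versions can be compared with ordinary integer comparison.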
/**Converts a granule position to an absolute frame index, starting at
* <tt>0</tt>.
* The granule position is interpreted in the context of a given
* #th_enc_ctx or #th_dec_ctx handle (either will suffice).
* \param _encdec A previously allocated #th_enc_ctx or #th_dec_ctx
* handle.
* \param _granpos The granule position to convert.
* \returns The absolute frame index corresponding to \a _granpos.
* \retval -1 The given granule position was invalid (i.e. negative).*/
extern ogg_int64_t th_granule_frame(void *_encdec,ogg_int64_t _granpos);
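As a rough sketch of how a granule position relates to a frame index: the upper bits hold the number of the last keyframe, extracted by shifting right by the keyframe granule shift, and the remaining low bits count the frames since that keyframe. `sketch_granule_frame` is a hypothetical stand-in written for this example. The real function takes a codec context rather than a shift value, and may apply bitstream-version-dependent adjustments that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a granule position to a frame count, given the keyframe
   granule shift. Returns -1 for an invalid (negative) granule position. */
int64_t sketch_granule_frame(int64_t granpos, int shift) {
  int64_t keyframe;
  int64_t delta;
  assert(shift >= 0 && shift <= 31);
  if (granpos < 0) return -1;
  keyframe = granpos >> shift;           /* number of the last keyframe */
  delta = granpos - (keyframe << shift); /* frames since that keyframe */
  return keyframe + delta;
}
```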
/**Converts a granule position to an absolute time in seconds.
* The granule position is interpreted in the context of a given
* #th_enc_ctx or #th_dec_ctx handle (either will suffice).
* \param _encdec A previously allocated #th_enc_ctx or #th_dec_ctx
* handle.
* \param _granpos The granule position to convert.
* \return The absolute time in seconds corresponding to \a _granpos.
* This is the "end time" for the frame, or the latest time it should
* be displayed.
* It is not the presentation time.
* \retval -1 The given granule position was invalid (i.e. negative).*/
extern double th_granule_time(void *_encdec,ogg_int64_t _granpos);
/**Determines whether a Theora packet is a header or not.
* This function does no verification beyond checking the packet type bit, so
* it should not be used for bitstream identification; use
* th_decode_headerin() for that.
* As per the Theora specification, an empty (0-byte) packet is treated as a
* data packet (a delta frame with no coded blocks).
* \param _op An <tt>ogg_packet</tt> containing encoded Theora data.
* \retval 1 The packet is a header packet
* \retval 0 The packet is a video data packet.*/
extern int th_packet_isheader(ogg_packet *_op);
/**Determines whether a theora packet is a key frame or not.
* This function does no verification beyond checking the packet type and
* key frame bits, so it should not be used for bitstream identification; use
* th_decode_headerin() for that.
* As per the Theora specification, an empty (0-byte) packet is treated as a
* delta frame (with no coded blocks).
* \param _op An <tt>ogg_packet</tt> containing encoded Theora data.
* \retval 1 The packet contains a key frame.
* \retval 0 The packet contains a delta frame.
* \retval -1 The packet is not a video data packet.*/
extern int th_packet_iskeyframe(ogg_packet *_op);
/*@}*/
/**\name Functions for manipulating header data
* These functions manipulate the #th_info and #th_comment structures
* which describe video parameters and key-value metadata, respectively.
*
* You must link to \c libtheoradec if you use any of the
* functions in this section.*/
/*@{*/
/**Initializes a th_info structure.
* This should be called on a freshly allocated #th_info structure before
* attempting to use it.
* \param _info The #th_info struct to initialize.*/
extern void th_info_init(th_info *_info);
/**Clears a #th_info structure.
* This should be called on a #th_info structure after it is no longer
* needed.
* \param _info The #th_info struct to clear.*/
extern void th_info_clear(th_info *_info);
/**Initialize a #th_comment structure.
* This should be called on a freshly allocated #th_comment structure
* before attempting to use it.
* \param _tc The #th_comment struct to initialize.*/
extern void th_comment_init(th_comment *_tc);
/**Add a comment to an initialized #th_comment structure.
* \note Neither th_comment_add() nor th_comment_add_tag() support
* comments containing null values, although the bitstream format does
* support them.
* To add such comments you will need to manipulate the #th_comment
* structure directly.
* \param _tc The #th_comment struct to add the comment to.
* \param _comment Must be a null-terminated UTF-8 string containing the
* comment in "TAG=the value" form.*/
extern void th_comment_add(th_comment *_tc,const char *_comment);
/**Add a comment to an initialized #th_comment structure.
* \note Neither th_comment_add() nor th_comment_add_tag() support
* comments containing null values, although the bitstream format does
* support them.
* To add such comments you will need to manipulate the #th_comment
* structure directly.
* \param _tc The #th_comment struct to add the comment to.
* \param _tag A null-terminated string containing the tag associated with
* the comment.
* \param _val The corresponding value as a null-terminated string.*/
extern void th_comment_add_tag(th_comment *_tc,const char *_tag,
const char *_val);
/**Look up a comment value by its tag.
* \param _tc An initialized #th_comment structure.
* \param _tag The tag to look up.
* \param _count The instance of the tag.
* The same tag can appear multiple times, each with a distinct
* value, so an index is required to retrieve them all.
* The order in which these values appear is significant and
* should be preserved.
* Use th_comment_query_count() to get the legal range for
* the \a _count parameter.
* \return A pointer to the queried tag's value.
* This points directly to data in the #th_comment structure.
* It should not be modified or freed by the application, and
* modifications to the structure may invalidate the pointer.
* \retval NULL If no matching tag is found.*/
extern char *th_comment_query(th_comment *_tc,const char *_tag,int _count);
/**Look up the number of instances of a tag.
* Call this first when querying for a specific tag and then iterate over the
* number of instances with separate calls to th_comment_query() to
* retrieve all the values for that tag in order.
* \param _tc An initialized #th_comment structure.
* \param _tag The tag to look up.
* \return The number of instances of this particular tag.*/
extern int th_comment_query_count(th_comment *_tc,const char *_tag);
/**Clears a #th_comment structure.
* This should be called on a #th_comment structure after it is no longer
* needed.
* It will free all memory used by the structure members.
* \param _tc The #th_comment struct to clear.*/
extern void th_comment_clear(th_comment *_tc);
/*@}*/
/*@}*/
#if defined(__cplusplus)
}
#endif
#endif

View File

@ -1,786 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: theora.h,v 1.17 2003/12/06 18:06:19 arc Exp $
********************************************************************/
#ifndef _O_THEORA_H_
#define _O_THEORA_H_
#ifdef __cplusplus
extern "C"
{
#endif /* __cplusplus */
#include <stddef.h> /* for size_t */
#include <ogg/ogg.h>
/** \file
* The libtheora pre-1.0 legacy C API.
*
* \ingroup oldfuncs
*
* \section intro Introduction
*
* This is the documentation for the libtheora legacy C API, declared in
* the theora.h header, which describes the old interface used before
* the 1.0 release. This API was widely deployed for several years and
* remains supported, but for new code we recommend the cleaner API
* declared in theoradec.h and theoraenc.h.
*
* libtheora is the reference implementation for
* <a href="http://www.theora.org/">Theora</a>, a free video codec.
* Theora is derived from On2's VP3 codec with improved integration with
* Ogg multimedia formats by <a href="http://www.xiph.org/">Xiph.Org</a>.
*
* \section overview Overview
*
* This library will both decode and encode theora packets to/from raw YUV
* frames. In either case, the packets will most likely either come from or
* need to be embedded in an Ogg stream. Use
* <a href="http://xiph.org/ogg/">libogg</a> or
* <a href="http://www.annodex.net/software/liboggz/index.html">liboggz</a>
* to extract/package these packets.
*
* \section decoding Decoding Process
*
* Decoding can be separated into the following steps:
* -# initialise theora_info and theora_comment structures using
* theora_info_init() and theora_comment_init():
\verbatim
theora_info info;
theora_comment comment;
theora_info_init(&info);
theora_comment_init(&comment);
\endverbatim
* -# retrieve header packets from Ogg stream (there should be 3) and decode
* into theora_info and theora_comment structures using
* theora_decode_header(). See \ref identification for more information on
* identifying which packets are theora packets.
\verbatim
int i;
for (i = 0; i < 3; i++)
{
(get a theora packet "op" from the Ogg stream)
theora_decode_header(&info, &comment, op);
}
\endverbatim
* -# initialise the decoder based on the information retrieved into the
* theora_info struct by theora_decode_header(). You will need a
* theora_state struct.
\verbatim
theora_state state;
theora_decode_init(&state, &info);
\endverbatim
* -# pass in packets and retrieve decoded frames! See the yuv_buffer
* documentation for information on how to retrieve raw YUV data.
\verbatim
yuv_buffer buffer;
while (last packet was not e_o_s) {
(get a theora packet "op" from the Ogg stream)
theora_decode_packetin(&state, op);
theora_decode_YUVout(&state, &buffer);
}
\endverbatim
*
*
* \subsection identification Identifying Theora Packets
*
* All streams inside an Ogg file have a unique serial_no attached to the
* stream. Typically, you will want to
* - retrieve the serial_no for each b_o_s (beginning of stream) page
* encountered within the Ogg file;
* - test the first (only) packet on that page to determine if it is a theora
* packet;
* - once you have found a theora b_o_s page then use the retrieved serial_no
* to identify future packets belonging to the same theora stream.
*
* Note that you \e cannot use theora_packet_isheader() to determine if a
* packet is a theora packet or not, as this function does not perform any
* checking beyond whether a header bit is present. Instead, use the
* theora_decode_header() function and check the return value; or examine the
* header bytes at the beginning of the Ogg page.
*/
/** \defgroup oldfuncs Legacy pre-1.0 C API */
/* @{ */
/**
* A YUV buffer for passing uncompressed frames to and from the codec.
* This holds a Y'CbCr frame in planar format. The CbCr planes can be
* subsampled and have their own separate dimensions and row stride
* offsets. Note that the strides may be negative in some
* configurations. For theora the width and height of the largest plane
* must be a multiple of 16. The actual meaningful picture size and
* offset are stored in the theora_info structure; frames returned by
* the decoder may need to be cropped for display.
*
* All samples are 8 bits. Within each plane samples are ordered by
* row from the top of the frame to the bottom. Within each row samples
* are ordered from left to right.
*
* During decode, the yuv_buffer struct is allocated by the user, but all
* fields (including luma and chroma pointers) are filled by the library.
* These pointers address library-internal memory and their contents should
* not be modified.
*
* Conversely, during encode the user allocates the struct and fills out all
* fields. The user also manages the data addressed by the luma and chroma
* pointers. See the encoder_example.c and dump_video.c example files in
* theora/examples/ for more information.
*/
typedef struct {
int y_width; /**< Width of the Y' luminance plane */
int y_height; /**< Height of the luminance plane */
int y_stride; /**< Offset in bytes between successive rows */
int uv_width; /**< Width of the Cb and Cr chroma planes */
int uv_height; /**< Height of the chroma planes */
int uv_stride; /**< Offset between successive chroma rows */
unsigned char *y; /**< Pointer to start of luminance data */
unsigned char *u; /**< Pointer to start of Cb data */
unsigned char *v; /**< Pointer to start of Cr data */
} yuv_buffer;
/**
* A Colorspace.
*/
typedef enum {
OC_CS_UNSPECIFIED, /**< The colorspace is unknown or unspecified */
OC_CS_ITU_REC_470M, /**< This is the best option for 'NTSC' content */
OC_CS_ITU_REC_470BG, /**< This is the best option for 'PAL' content */
OC_CS_NSPACES /**< This marks the end of the defined colorspaces */
} theora_colorspace;
/**
* A Chroma subsampling
*
* These enumerate the available chroma subsampling options supported
* by the theora format. See Section 4.4 of the specification for
* exact definitions.
*/
typedef enum {
OC_PF_420, /**< Chroma subsampling by 2 in each direction (4:2:0) */
OC_PF_RSVD, /**< Reserved value */
OC_PF_422, /**< Horizontal chroma subsampling by 2 (4:2:2) */
OC_PF_444 /**< No chroma subsampling at all (4:4:4) */
} theora_pixelformat;
/**
* Theora bitstream info.
* Contains the basic playback parameters for a stream,
* corresponding to the initial 'info' header packet.
*
* Encoded theora frames must be a multiple of 16 in width and height.
* To handle other frame sizes, a crop rectangle is specified in
* frame_height and frame_width, offset_x and offset_y. The offset
* and size should still be a multiple of 2 to avoid chroma sampling
* shifts. Offset values in this structure are measured from the
* upper left of the image.
*
* Frame rate, in frames per second, is stored as a rational
* fraction. Aspect ratio is also stored as a rational fraction, and
* refers to the aspect ratio of the frame pixels, not of the
* overall frame itself.
*
* See <a href="http://svn.xiph.org/trunk/theora/examples/encoder_example.c">
* examples/encoder_example.c</a> for usage examples of the
* other parameters and good default settings for the encoder parameters.
*/
typedef struct {
ogg_uint32_t width; /**< encoded frame width */
ogg_uint32_t height; /**< encoded frame height */
ogg_uint32_t frame_width; /**< display frame width */
ogg_uint32_t frame_height; /**< display frame height */
ogg_uint32_t offset_x; /**< horizontal offset of the displayed frame */
ogg_uint32_t offset_y; /**< vertical offset of the displayed frame */
ogg_uint32_t fps_numerator; /**< frame rate numerator **/
ogg_uint32_t fps_denominator; /**< frame rate denominator **/
ogg_uint32_t aspect_numerator; /**< pixel aspect ratio numerator */
ogg_uint32_t aspect_denominator; /**< pixel aspect ratio denominator */
theora_colorspace colorspace; /**< colorspace */
int target_bitrate; /**< nominal bitrate in bits per second */
int quality; /**< Nominal quality setting, 0-63 */
int quick_p; /**< Quick encode/decode */
/* decode only */
unsigned char version_major;
unsigned char version_minor;
unsigned char version_subminor;
void *codec_setup;
/* encode only */
int dropframes_p;
int keyframe_auto_p;
ogg_uint32_t keyframe_frequency;
ogg_uint32_t keyframe_frequency_force; /* also used for decode init to
get granpos shift correct */
ogg_uint32_t keyframe_data_target_bitrate;
ogg_int32_t keyframe_auto_threshold;
ogg_uint32_t keyframe_mindistance;
ogg_int32_t noise_sensitivity;
ogg_int32_t sharpness;
theora_pixelformat pixelformat; /**< chroma subsampling mode to expect */
} theora_info;
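/* Illustrative sketch, not part of libtheora: the note above says encoded
 * frame dimensions must be multiples of 16, with the visible picture
 * described by a crop rectangle. Assuming the display size is known, the
 * hypothetical helper pad_to_16() shows one way the encoded size could be
 * derived; the name and the overflow guard are assumptions made here. */
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a display dimension up to the next multiple
   of 16, suitable for the encoded width or height. */
static uint32_t pad_to_16(uint32_t display_size) {
  /* Guard against unsigned wraparound for absurdly large inputs. */
  assert(display_size <= UINT32_MAX - 15u);
  return (display_size + 15u) & ~(uint32_t)15u;
}
```
/* Under these assumptions a 1920x1080 picture would be encoded as 1920x1088,
 * with frame_width and frame_height keeping the 1920x1080 display size. */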
/** Codec internal state and context.
*/
typedef struct{
theora_info *i;
ogg_int64_t granulepos;
void *internal_encode;
void *internal_decode;
} theora_state;
/**
* Comment header metadata.
*
* This structure holds the in-stream metadata corresponding to
* the 'comment' header packet.
*
* Meta data is stored as a series of (tag, value) pairs, in
* length-encoded string vectors. The first occurrence of the
* '=' character delimits the tag and value. A particular tag
* may occur more than once. The character set encoding for
* the strings is always UTF-8, but the tag names are limited
* to case-insensitive ASCII. See the spec for details.
*
* In filling in this structure, theora_decode_header() will
* null-terminate the user_comment strings for safety. However,
* the bitstream format itself treats them as 8-bit clean,
* and so the length array should be treated as authoritative
* for their length.
*/
typedef struct theora_comment{
char **user_comments; /**< An array of comment string vectors */
int *comment_lengths; /**< An array of corresponding string vector lengths in bytes */
int comments; /**< The total number of comment string vectors */
char *vendor; /**< The vendor string identifying the encoder, null terminated */
} theora_comment;
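/* Illustrative sketch, not part of libtheora: the comment above says the
 * first '=' separates tag from value and that comment_lengths, not the
 * terminating NUL, is authoritative. The hypothetical helper
 * comment_tag_length() shows one way to locate that separator while
 * respecting the stored length. */
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the tag, i.e. the offset of the
   first '=' within the first `length` bytes, or -1 if there is none.
   The value then starts at comment + result + 1 and spans
   length - result - 1 bytes (it may contain further '=' or NUL bytes). */
static int comment_tag_length(const char *comment, int length) {
  const char *eq;
  assert(length >= 0);
  if (comment == NULL) return -1;
  eq = (const char *)memchr(comment, '=', (size_t)length);
  return eq == NULL ? -1 : (int)(eq - comment);
}
```
/* A caller would pass user_comments[i] and comment_lengths[i] for each i in
 * [0, comments). */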
/**\name theora_control() codes */
/* \anchor decctlcodes_old
* These are the available request codes for theora_control()
* when called with a decoder instance.
* By convention decoder control codes are odd, to distinguish
* them from \ref encctlcodes_old "encoder control codes" which
* are even.
*
* Note that since the 1.0 release, both the legacy and the final
* implementation accept all the same control codes, but only the
* final API declares the newer codes.
*
* Keep any experimental or vendor-specific values above \c 0x8000.*/
/*@{*/
/**Get the maximum post-processing level.
* The decoder supports a post-processing filter that can improve
* the appearance of the decoded images. This returns the highest
* level setting for this post-processor, corresponding to maximum
* improvement and computational expense.
*/
#define TH_DECCTL_GET_PPLEVEL_MAX (1)
/**Set the post-processing level.
* Sets the level of post-processing to use when decoding the
* compressed stream. This must be a value between zero (off)
* and the maximum returned by TH_DECCTL_GET_PPLEVEL_MAX.
*/
#define TH_DECCTL_SET_PPLEVEL (3)
/**Sets the maximum distance between key frames.
* This can be changed during an encode, but will be bounded by
* <tt>1<<th_info#keyframe_granule_shift</tt>.
* If it is set before encoding begins, th_info#keyframe_granule_shift will
* be enlarged appropriately.
*
* \param[in] buf <tt>ogg_uint32_t</tt>: The maximum distance between key
* frames.
* \param[out] buf <tt>ogg_uint32_t</tt>: The actual maximum distance set.
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(ogg_uint32_t)</tt>.
* \retval OC_IMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE (4)
/**Set the granule position.
* Call this after a seek, to update the internal granulepos
* in the decoder, to ensure that subsequent frames are marked
* properly. If you track timestamps yourself and do not use
* the granule position returned by the decoder, then you do
* not need to use this control.
*/
#define TH_DECCTL_SET_GRANPOS (5)
/**\anchor encctlcodes_old */
/**Sets the quantization parameters to use.
* The parameters are copied, not stored by reference, so they can be freed
* after this call.
* <tt>NULL</tt> may be specified to revert to the default parameters.
*
* \param[in] buf #th_quant_info
* \retval OC_FAULT \a theora_state is <tt>NULL</tt>.
* \retval OC_EINVAL Encoding has already begun, the quantization parameters
* are not acceptable to this version of the encoder,
* \a buf is <tt>NULL</tt> and \a buf_sz is not zero,
* or \a buf is non-<tt>NULL</tt> and \a buf_sz is
* not <tt>sizeof(#th_quant_info)</tt>.
* \retval OC_IMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_QUANT_PARAMS (2)
/**Disables any encoder features that would prevent lossless transcoding back
* to VP3.
* This primarily means disabling block-level QI values and not using 4MV mode
* when any of the luma blocks in a macro block are not coded.
* It also includes using the VP3 quantization tables and Huffman codes; if you
* set them explicitly after calling this function, the resulting stream will
* not be VP3-compatible.
* If you enable VP3-compatibility when encoding 4:2:2 or 4:4:4 source
* material, or when using a picture region smaller than the full frame (e.g.
* a non-multiple-of-16 width or height), then non-VP3 bitstream features will
* still be disabled, but the stream will still not be VP3-compatible, as VP3
* was not capable of encoding such formats.
* If you call this after encoding has already begun, then the quantization
* tables and codebooks cannot be changed, but the frame-level features will
* be enabled or disabled as requested.
*
* \param[in] buf <tt>int</tt>: a non-zero value to enable VP3 compatibility,
* or 0 to disable it (the default).
* \param[out] buf <tt>int</tt>: 1 if all bitstream features required for
* VP3-compatibility could be set, and 0 otherwise.
* The latter will be returned if the pixel format is not
* 4:2:0, the picture region is smaller than the full frame,
* or if encoding has begun, preventing the quantization
* tables and codebooks from being set.
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>.
* \retval OC_IMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_VP3_COMPATIBLE (10)
/**Gets the maximum speed level.
* Higher speed levels favor quicker encoding over better quality per bit.
* Depending on the encoding mode, and the internal algorithms used, quality
* may actually improve, but in this case bitrate will also likely increase.
* In any case, overall rate/distortion performance will probably decrease.
* The maximum value, and the meaning of each value, may change depending on
* the current encoding mode (VBR vs. CQI, etc.).
*
* \param[out] buf int: The maximum encoding speed level.
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>.
* \retval OC_IMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_GET_SPLEVEL_MAX (12)
/**Sets the speed level.
* By default a speed value of 1 is used.
*
* \param[in] buf int: The new encoding speed level.
* 0 is slowest, larger values use less CPU.
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>, or the
* encoding speed level is out of bounds.
* The maximum encoding speed level may be
* implementation- and encoding mode-specific, and can be
* obtained via #TH_ENCCTL_GET_SPLEVEL_MAX.
* \retval OC_IMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_SET_SPLEVEL (14)
/*@}*/
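/* Illustrative sketch, not part of libtheora: by the convention described
 * for these control codes, decoder requests are odd and encoder requests are
 * even. The hypothetical helper is_decoder_ctl_code() only checks that
 * parity; it does not verify that a code is actually defined. */
```c
#include <assert.h>

/* Hypothetical helper: returns 1 for odd (decoder) request codes and 0 for
   even (encoder) request codes. */
static int is_decoder_ctl_code(int req) {
  /* All control codes defined in this header are positive. */
  assert(req > 0);
  return (req & 1) != 0;
}
```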
#define OC_FAULT -1 /**< General failure */
#define OC_EINVAL -10 /**< Library encountered invalid internal data */
#define OC_DISABLED -11 /**< Requested action is disabled */
#define OC_BADHEADER -20 /**< Header packet was corrupt/invalid */
#define OC_NOTFORMAT -21 /**< Packet is not a theora packet */
#define OC_VERSION -22 /**< Bitstream version is not handled */
#define OC_IMPL -23 /**< Feature or action not implemented */
#define OC_BADPACKET -24 /**< Packet is corrupt */
#define OC_NEWPACKET -25 /**< Packet is an (ignorable) unhandled extension */
#define OC_DUPFRAME 1 /**< Packet is a dropped frame */
/**
* Retrieve a human-readable string to identify the encoder vendor and version.
* \returns A version string.
*/
extern const char *theora_version_string(void);
/**
* Retrieve a 32-bit version number.
* This number is composed of a 16-bit major version, 8-bit minor version
* and 8-bit sub-version, composed as follows:
<pre>
(VERSION_MAJOR<<16) + (VERSION_MINOR<<8) + (VERSION_SUB)
</pre>
* \returns The version number.
*/
extern ogg_uint32_t theora_version_number(void);
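/* Illustrative sketch, not part of libtheora: the layout documented above
 * (16-bit major, 8-bit minor, 8-bit sub-version) can be packed and unpacked
 * as follows. The helper names are hypothetical, and uint32_t stands in for
 * ogg_uint32_t so the sketch compiles on its own. */
```c
#include <assert.h>
#include <stdint.h>

/* Compose a version number as (major<<16) + (minor<<8) + sub. */
static uint32_t pack_version(unsigned major, unsigned minor, unsigned sub) {
  assert(major <= 0xFFFFu && minor <= 0xFFu && sub <= 0xFFu);
  return ((uint32_t)major << 16) + ((uint32_t)minor << 8) + (uint32_t)sub;
}

/* Extract the individual fields again. */
static unsigned unpack_major(uint32_t v) { return (unsigned)(v >> 16); }
static unsigned unpack_minor(uint32_t v) { return (unsigned)((v >> 8) & 0xFFu); }
static unsigned unpack_sub(uint32_t v) { return (unsigned)(v & 0xFFu); }
```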
/**
* Initialize the theora encoder.
* \param th The theora_state handle to initialize for encoding.
* \param ti A theora_info struct filled with the desired encoding parameters.
* \retval 0 Success
*/
extern int theora_encode_init(theora_state *th, theora_info *ti);
/**
* Submit a YUV buffer to the theora encoder.
* \param t A theora_state handle previously initialized for encoding.
* \param yuv A buffer of YUV data to encode. Note that both the yuv_buffer
* struct and the luma/chroma buffers within should be allocated by
* the user.
* \retval OC_EINVAL Encoder is not ready, or is finished.
* \retval -1 The size of the given frame differs from those previously input
* \retval 0 Success
*/
extern int theora_encode_YUVin(theora_state *t, yuv_buffer *yuv);
/**
* Request the next packet of encoded video.
* The encoded data is placed in a user-provided ogg_packet structure.
* \param t A theora_state handle previously initialized for encoding.
* \param last_p whether this is the last packet the encoder should produce.
* \param op An ogg_packet structure to fill. libtheora will set all
* elements of this structure, including a pointer to encoded
* data. The memory for the encoded data is owned by libtheora.
* \retval 0 No internal storage exists OR no packet is ready
* \retval -1 The encoding process has completed
* \retval 1 Success
*/
extern int theora_encode_packetout( theora_state *t, int last_p,
ogg_packet *op);
/**
* Request a packet containing the initial header.
* A pointer to the header data is placed in a user-provided ogg_packet
* structure.
* \param t A theora_state handle previously initialized for encoding.
* \param op An ogg_packet structure to fill. libtheora will set all
* elements of this structure, including a pointer to the header
* data. The memory for the header data is owned by libtheora.
* \retval 0 Success
*/
extern int theora_encode_header(theora_state *t, ogg_packet *op);
/**
* Request a comment header packet from provided metadata.
* A pointer to the comment data is placed in a user-provided ogg_packet
* structure.
* \param tc A theora_comment structure filled with the desired metadata
* \param op An ogg_packet structure to fill. libtheora will set all
* elements of this structure, including a pointer to the encoded
* comment data. The memory for the comment data is owned by
* the application, and must be freed by it using _ogg_free().
* On some systems (such as Windows when using dynamic linking), this
* may mean the free is executed in a different module from the
* malloc, which will crash; there is no way to free this memory on
* such systems.
* \retval 0 Success
*/
extern int theora_encode_comment(theora_comment *tc, ogg_packet *op);
/**
* Request a packet containing the codebook tables for the stream.
* A pointer to the codebook data is placed in a user-provided ogg_packet
* structure.
* \param t A theora_state handle previously initialized for encoding.
* \param op An ogg_packet structure to fill. libtheora will set all
* elements of this structure, including a pointer to the codebook
* data. The memory for the header data is owned by libtheora.
* \retval 0 Success
*/
extern int theora_encode_tables(theora_state *t, ogg_packet *op);
/**
* Decode an Ogg packet, with the expectation that the packet contains
* an initial header, comment data or codebook tables.
*
* \param ci A theora_info structure to fill. This must have been previously
* initialized with theora_info_init(). If \a op contains an initial
* header, theora_decode_header() will fill \a ci with the
* parsed header values. If \a op contains codebook tables,
* theora_decode_header() will parse these and attach an internal
* representation to \a ci->codec_setup.
* \param cc A theora_comment structure to fill. If \a op contains comment
* data, theora_decode_header() will fill \a cc with the parsed
* comments.
* \param op An ogg_packet structure which you expect contains an initial
* header, comment data or codebook tables.
*
* \retval OC_BADHEADER \a op is NULL; OR the first byte of \a op->packet
* has the signature of an initial packet, but op is
* not a b_o_s packet; OR this packet has the signature
* of an initial header packet, but an initial header
* packet has already been seen; OR this packet has the
* signature of a comment packet, but the initial header
* has not yet been seen; OR this packet has the signature
* of a comment packet, but contains invalid data; OR
* this packet has the signature of codebook tables,
* but the initial header or comments have not yet
* been seen; OR this packet has the signature of codebook
* tables, but contains invalid data;
* OR the stream being decoded has a compatible version
* but this packet does not have the signature of a
* theora initial header, comments, or codebook packet
* \retval OC_VERSION The packet data of \a op is an initial header with
* a version which is incompatible with this version of
* libtheora.
* \retval OC_NEWPACKET the stream being decoded has an incompatible (future)
* version and contains an unknown signature.
* \retval 0 Success
*
* \note The normal usage is that theora_decode_header() be called on the
* first three packets of a theora logical bitstream in succession.
*/
extern int theora_decode_header(theora_info *ci, theora_comment *cc,
ogg_packet *op);
/**
* Initialize a theora_state handle for decoding.
* \param th The theora_state handle to initialize.
* \param c A theora_info struct filled with the desired decoding parameters.
* This is of course usually obtained from a previous call to
* theora_decode_header().
* \retval 0 Success
*/
extern int theora_decode_init(theora_state *th, theora_info *c);
/**
* Input a packet containing encoded data into the theora decoder.
* \param th A theora_state handle previously initialized for decoding.
* \param op An ogg_packet containing encoded theora data.
* \retval 0 Success
* \retval OC_BADPACKET \a op does not contain encoded video data
*/
extern int theora_decode_packetin(theora_state *th,ogg_packet *op);
/**
* Output the next available frame of decoded YUV data.
* \param th A theora_state handle previously initialized for decoding.
* \param yuv A yuv_buffer in which libtheora should place the decoded data.
* Note that the buffer struct itself is allocated by the user, but
* that the luma and chroma pointers will be filled in by the
* library. Also note that these luma and chroma regions should be
* considered read-only by the user.
* \retval 0 Success
*/
extern int theora_decode_YUVout(theora_state *th,yuv_buffer *yuv);
/**
* Report whether a theora packet is a header or not
* This function does no verification beyond checking the header
* flag bit so it should not be used for bitstream identification;
* use theora_decode_header() for that.
*
* \param op An ogg_packet containing encoded theora data.
* \retval 1 The packet is a header packet
* \retval 0 The packet is not a header packet (and so contains frame data)
*
* This function was added in the 1.0alpha4 release.
*/
extern int theora_packet_isheader(ogg_packet *op);
/**
* Report whether a theora packet is a keyframe or not
*
* \param op An ogg_packet containing encoded theora data.
* \retval 1 The packet contains a keyframe image
* \retval 0 The packet contains an interframe delta
* \retval -1 The packet is not an image data packet at all
*
* This function was added in the 1.0alpha4 release.
*/
extern int theora_packet_iskeyframe(ogg_packet *op);
/**
* Report the granulepos shift radix
*
* When embedded in Ogg, Theora uses a two-part granulepos,
* splitting the 64-bit field into two pieces. The more-significant
* section represents the frame count at the last keyframe,
* and the less-significant section represents the count of
* frames since the last keyframe. In this way the overall
* field is still non-decreasing with time, but usefully encodes
* a pointer to the last keyframe, which is necessary for
* correctly restarting decode after a seek.
*
* This function reports the number of bits used to represent
* the distance to the last keyframe, and thus how the granulepos
* field must be shifted or masked to obtain the two parts.
*
* Since libtheora returns compressed data in an ogg_packet
* structure, this may be generally useful even if the Theora
* packets are not being used in an Ogg container.
*
* \param ti A previously initialized theora_info struct
* \returns The bit shift dividing the two granulepos fields
*
* This function was added in the 1.0alpha5 release.
*/
int theora_granule_shift(theora_info *ti);
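/* Illustrative sketch, not part of libtheora: given the shift reported by
 * theora_granule_shift(), the two parts of a granulepos described above can
 * be separated with a shift and a mask. The helper names are hypothetical,
 * int64_t stands in for ogg_int64_t, and returning -1 for a negative
 * (undefined) granulepos is an assumption made here. */
```c
#include <assert.h>
#include <stdint.h>

/* More-significant part: frame count at the last keyframe. */
static int64_t granule_keyframe_count(int64_t granulepos, int shift) {
  assert(shift >= 0 && shift < 63);
  if (granulepos < 0) return -1;
  return granulepos >> shift;
}

/* Less-significant part: number of frames since the last keyframe. */
static int64_t granule_frames_since_keyframe(int64_t granulepos, int shift) {
  assert(shift >= 0 && shift < 63);
  if (granulepos < 0) return -1;
  return granulepos & (((int64_t)1 << shift) - 1);
}
```
/* The sum of the two parts is the frame count encoded by the granulepos. */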
/**
* Convert a granulepos to an absolute frame index, starting at 0.
* The granulepos is interpreted in the context of a given theora_state handle.
*
* Note that while the granulepos encodes the frame count (i.e. starting
* from 1), this call returns the frame index, starting from zero. Thus
* one can calculate the presentation time by multiplying the index by
* the frame duration (the reciprocal of the frame rate).
*
* \param th A previously initialized theora_state handle (encode or decode)
* \param granulepos The granulepos to convert.
* \returns The frame index corresponding to \a granulepos.
* \retval -1 The given granulepos is undefined (i.e. negative)
*
* This function was added in the 1.0alpha4 release.
*/
extern ogg_int64_t theora_granule_frame(theora_state *th,ogg_int64_t granulepos);
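/* Illustrative sketch, not part of libtheora: a zero-based frame index can
 * be turned into a presentation time using the rational frame rate stored in
 * theora_info (fps_numerator / fps_denominator). The helper name is
 * hypothetical, and returning -1 for a negative index mirrors the -1 result
 * documented above only by assumption. */
```c
#include <assert.h>
#include <stdint.h>

/* Presentation time in seconds = index * fps_denominator / fps_numerator. */
static double frame_index_to_seconds(int64_t index, uint32_t fps_numerator,
                                     uint32_t fps_denominator) {
  assert(fps_numerator != 0 && fps_denominator != 0);
  if (index < 0) return -1.0;
  return (double)index * (double)fps_denominator / (double)fps_numerator;
}
```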
/**
* Convert a granulepos to absolute time in seconds. The granulepos is
* interpreted in the context of a given theora_state handle, and gives
* the end time of a frame's presentation as used in Ogg mux ordering.
*
* \param th A previously initialized theora_state handle (encode or decode)
* \param granulepos The granulepos to convert.
* \returns The absolute time in seconds corresponding to \a granulepos.
* This is the "end time" for the frame, or the latest time it should
* be displayed.
* It is not the presentation time.
* \retval -1. The given granulepos is undefined (i.e. negative).
*/
extern double theora_granule_time(theora_state *th,ogg_int64_t granulepos);
/**
* Initialize a theora_info structure. All values within the given theora_info
* structure are initialized, and space is allocated within libtheora for
* internal codec setup data.
* \param c A theora_info struct to initialize.
*/
extern void theora_info_init(theora_info *c);
/**
* Clear a theora_info structure. All values within the given theora_info
* structure are cleared, and associated internal codec setup data is freed.
* \param c A theora_info struct to clear.
*/
extern void theora_info_clear(theora_info *c);
/**
* Free all internal data associated with a theora_state handle.
* \param t A theora_state handle.
*/
extern void theora_clear(theora_state *t);
/**
* Initialize an allocated theora_comment structure
* \param tc An allocated theora_comment structure
**/
extern void theora_comment_init(theora_comment *tc);
/**
* Add a comment to an initialized theora_comment structure
* \param tc A previously initialized theora comment structure
* \param comment A null-terminated string encoding the comment in the form
* "TAG=the value"
*
* Neither theora_comment_add() nor theora_comment_add_tag() supports
* comments containing null values, although the bitstream format
* supports this. To add such comments you will need to manipulate
* the theora_comment structure directly.
**/
extern void theora_comment_add(theora_comment *tc, char *comment);
/**
* Add a comment to an initialized theora_comment structure.
* \param tc A previously initialized theora comment structure
* \param tag A null-terminated string containing the tag
* associated with the comment.
* \param value The corresponding value as a null-terminated string
*
* Neither theora_comment_add() nor theora_comment_add_tag() supports
* comments containing null values, although the bitstream format
* supports this. To add such comments you will need to manipulate
* the theora_comment structure directly.
**/
extern void theora_comment_add_tag(theora_comment *tc,
char *tag, char *value);
/**
* Look up a comment value by tag.
* \param tc An initialized theora_comment structure
* \param tag The tag to look up
* \param count The instance of the tag. The same tag can appear multiple
* times, each with a distinct and ordered value, so an index
* is required to retrieve them all.
* \returns A pointer to the queried tag's value
* \retval NULL No matching tag is found
*
* \note Use theora_comment_query_count() to get the legal range for the
* count parameter.
**/
extern char *theora_comment_query(theora_comment *tc, char *tag, int count);
/** Look up the number of instances of a tag.
* \param tc An initialized theora_comment structure
* \param tag The tag to look up
* \returns The number of instances of a particular tag.
*
* Call this first when querying for a specific tag and then iterate
* over the number of instances with separate calls to
* theora_comment_query() to retrieve all instances in order.
**/
extern int theora_comment_query_count(theora_comment *tc, char *tag);
/**
* Clear an allocated theora_comment struct so that it can be freed.
* \param tc An allocated theora_comment structure.
**/
extern void theora_comment_clear(theora_comment *tc);
/**Encoder control function.
* This is used to provide advanced control of the encoding process.
* \param th A #theora_state handle.
* \param req The control code to process.
* See \ref encctlcodes_old "the list of available
* control codes" for details.
* \param buf The parameters for this control code.
* \param buf_sz The size of the parameter buffer.*/
extern int theora_control(theora_state *th,int req,void *buf,size_t buf_sz);
/* @} */ /* end oldfuncs doxygen group */
#ifdef __cplusplus
}
#endif /* __cplusplus */
#endif /* _O_THEORA_H_ */

View File

@ -1,333 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: theora.h,v 1.8 2004/03/15 22:17:32 derf Exp $
********************************************************************/
/**\file
* The <tt>libtheoradec</tt> C decoding API.*/
#if !defined(_O_THEORA_THEORADEC_H_)
# define _O_THEORA_THEORADEC_H_ (1)
# include <stddef.h>
# include <ogg/ogg.h>
# include "codec.h"
#if defined(__cplusplus)
extern "C" {
#endif
/**\name th_decode_ctl() codes
* \anchor decctlcodes
* These are the available request codes for th_decode_ctl().
* By convention, these are odd, to distinguish them from the
* \ref encctlcodes "encoder control codes".
* Keep any experimental or vendor-specific values above \c 0x8000.*/
/*@{*/
/**Gets the maximum post-processing level.
* The decoder supports a post-processing filter that can improve
* the appearance of the decoded images. This returns the highest
* level setting for this post-processor, corresponding to maximum
* improvement and computational expense.
*
* \param[out] _buf int: The maximum post-processing level.
* \retval TH_EFAULT \a _dec_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_DECCTL_GET_PPLEVEL_MAX (1)
/**Sets the post-processing level.
* By default, post-processing is disabled.
*
* Sets the level of post-processing to use when decoding the
* compressed stream. This must be a value between zero (off)
* and the maximum returned by TH_DECCTL_GET_PPLEVEL_MAX.
*
* \param[in] _buf int: The new post-processing level.
* 0 to disable; larger values use more CPU.
* \retval TH_EFAULT \a _dec_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>, or the
* post-processing level is out of bounds.
* The maximum post-processing level may be
* implementation-specific, and can be obtained via
* #TH_DECCTL_GET_PPLEVEL_MAX.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_DECCTL_SET_PPLEVEL (3)
/**Sets the granule position.
* Call this after a seek, before decoding the first frame, to ensure that the
* proper granule position is returned for all subsequent frames.
* If you track timestamps yourself and do not use the granule position
* returned by the decoder, then you need not call this function.
*
* \param[in] _buf <tt>ogg_int64_t</tt>: The granule position of the next
* frame.
* \retval TH_EFAULT \a _dec_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(ogg_int64_t)</tt>, or the
* granule position is negative.*/
#define TH_DECCTL_SET_GRANPOS (5)
/**Sets the striped decode callback function.
* If set, this function will be called as each piece of a frame is fully
* decoded in th_decode_packetin().
* You can pass in a #th_stripe_callback with
* th_stripe_callback#stripe_decoded set to <tt>NULL</tt> to disable the
* callbacks at any point.
* Enabling striped decode does not prevent you from calling
* th_decode_ycbcr_out() after the frame is fully decoded.
*
* \param[in] _buf #th_stripe_callback: The callback parameters.
* \retval TH_EFAULT \a _dec_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not
* <tt>sizeof(th_stripe_callback)</tt>.*/
#define TH_DECCTL_SET_STRIPE_CB (7)
/**Sets the macroblock display mode. Set to 0 to disable displaying
* macroblocks.*/
#define TH_DECCTL_SET_TELEMETRY_MBMODE (9)
/**Sets the motion vector display mode. Set to 0 to disable displaying motion
* vectors.*/
#define TH_DECCTL_SET_TELEMETRY_MV (11)
/**Sets the adaptive quantization display mode. Set to 0 to disable displaying
* adaptive quantization. */
#define TH_DECCTL_SET_TELEMETRY_QI (13)
/**Sets the bitstream breakdown visualization mode. Set to 0 to disable
* displaying bitstream breakdown.*/
#define TH_DECCTL_SET_TELEMETRY_BITS (15)
/*@}*/
/**A callback function for striped decode.
* This is a function pointer to an application-provided function that will be
* called each time a section of the image is fully decoded in
* th_decode_packetin().
* This allows the application to process the section immediately, while it is
* still in cache.
* Note that the frame is decoded bottom to top, so \a _yfrag0 will steadily
* decrease with each call until it reaches 0, at which point the full frame
* is decoded.
* The number of fragment rows made available in each call depends on the pixel
* format and the number of post-processing filters enabled, and may not even
* be constant for the entire frame.
* If a non-<tt>NULL</tt> \a _granpos pointer is passed to
* th_decode_packetin(), the granule position for the frame will be stored
* in it before the first callback is made.
* If an entire frame is dropped (a 0-byte packet), then no callbacks will be
* made at all for that frame.
* \param _ctx An application-provided context pointer.
* \param _buf The image buffer for the decoded frame.
* \param _yfrag0 The Y coordinate of the first row of 8x8 fragments
* decoded.
* Multiply this by 8 to obtain the pixel row number in the
* luma plane.
* If the chroma planes are subsampled in the Y direction,
* this will always be divisible by two.
* \param _yfrag_end The Y coordinate of the first row of 8x8 fragments past
* the newly decoded section.
* If the chroma planes are subsampled in the Y direction,
* this will always be divisible by two.
* I.e., this section contains fragment rows
* <tt>\a _yfrag0 ...\a _yfrag_end -1</tt>.*/
typedef void (*th_stripe_decoded_func)(void *_ctx,th_ycbcr_buffer _buf,
int _yfrag0,int _yfrag_end);
/**The striped decode callback data to pass to #TH_DECCTL_SET_STRIPE_CB.*/
typedef struct{
/**An application-provided context pointer.
* This will be passed back verbatim to the application.*/
void *ctx;
/**The callback function pointer.*/
th_stripe_decoded_func stripe_decoded;
}th_stripe_callback;
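/* Illustrative sketch, not part of libtheora: inside a striped decode
 * callback, the fragment row bounds can be converted to pixel rows as the
 * documentation above describes (8 pixel rows per fragment row in the luma
 * plane). The helper names and the chroma_vshift parameter (1 when chroma
 * is subsampled vertically, 0 otherwise) are assumptions made here. */
```c
#include <assert.h>

/* First luma pixel row covered by fragment row `yfrag`. */
static int stripe_luma_row(int yfrag) {
  assert(yfrag >= 0);
  return yfrag * 8;
}

/* Corresponding chroma pixel row; halved when chroma is subsampled in the
   Y direction. */
static int stripe_chroma_row(int yfrag, int chroma_vshift) {
  assert(yfrag >= 0);
  assert(chroma_vshift == 0 || chroma_vshift == 1);
  return (yfrag * 8) >> chroma_vshift;
}
```
/* A callback would apply these to both _yfrag0 and _yfrag_end to obtain the
 * half-open range of pixel rows that is ready for processing. */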
/**\name Decoder state
The following data structures are opaque, and their contents are not
publicly defined by this API.
Referring to their internals directly is unsupported, and may break without
warning.*/
/*@{*/
/**The decoder context.*/
typedef struct th_dec_ctx th_dec_ctx;
/**Setup information.
This contains auxiliary information (Huffman tables and quantization
parameters) decoded from the setup header by th_decode_headerin() to be
passed to th_decode_alloc().
It can be re-used to initialize any number of decoders, and can be freed
via th_setup_free() at any time.*/
typedef struct th_setup_info th_setup_info;
/*@}*/
/**\defgroup decfuncs Functions for Decoding*/
/*@{*/
/**\name Functions for decoding
* You must link to <tt>libtheoradec</tt> if you use any of the
* functions in this section.
*
* The functions are listed in the order they are used in a typical decode.
* The basic steps are:
* - Parse the header packets by repeatedly calling th_decode_headerin().
* - Allocate a #th_dec_ctx handle with th_decode_alloc().
* - Call th_setup_free() to free any memory used for codec setup
* information.
* - Perform any additional decoder configuration with th_decode_ctl().
* - For each video data packet:
* - Submit the packet to the decoder via th_decode_packetin().
* - Retrieve the uncompressed video data via th_decode_ycbcr_out().
* - Call th_decode_free() to release all decoder memory.*/
/*@{*/
/**Decodes the header packets of a Theora stream.
* This should be called on the initial packets of the stream, in succession,
* until it returns <tt>0</tt>, indicating that all headers have been
* processed, or an error is encountered.
* At least three header packets are required, and additional optional header
* packets may follow.
* This can be used on the first packet of any logical stream to determine if
* that stream is a Theora stream.
* \param _info A #th_info structure to fill in.
* This must have been previously initialized with
* th_info_init().
* The application may immediately begin using the contents of
* this structure after the first header is decoded, though it
* must continue to be passed in on all subsequent calls.
* \param _tc A #th_comment structure to fill in.
* The application may immediately begin using the contents of
* this structure after the second header is decoded, though it
* must continue to be passed in on all subsequent calls.
* \param _setup Returns a pointer to additional, private setup information
* needed by the decoder.
* The contents of this pointer must be initialized to
* <tt>NULL</tt> on the first call, and the returned value must
* continue to be passed in on all subsequent calls.
* \param _op An <tt>ogg_packet</tt> structure which contains one of the
* initial packets of an Ogg logical stream.
* \return A positive value indicates that a Theora header was successfully
* processed.
* \retval 0 The first video data packet was encountered after all
* required header packets were parsed.
* The packet just passed in on this call should be saved
* and fed to th_decode_packetin() to begin decoding
* video data.
* \retval TH_EFAULT One of \a _info, \a _tc, or \a _setup was
* <tt>NULL</tt>.
* \retval TH_EBADHEADER \a _op was <tt>NULL</tt>, the packet was not the next
* header packet in the expected sequence, or the format
* of the header data was invalid.
* \retval TH_EVERSION The packet data was a Theora info header, but for a
* bitstream version not decodable with this version of
* <tt>libtheoradec</tt>.
* \retval TH_ENOTFORMAT The packet was not a Theora header.
*/
extern int th_decode_headerin(th_info *_info,th_comment *_tc,
th_setup_info **_setup,ogg_packet *_op);
/**Allocates a decoder instance.
*
* <b>Security Warning:</b> The Theora format supports very large frame sizes,
* potentially even larger than the address space of a 32-bit machine, and
* creating a decoder context allocates the space for several frames of data.
* If the allocation fails here, your program will crash, possibly at some
* future point because the OS kernel returned a valid memory range and will
* only fail when it tries to map the pages in it the first time they are
* used.
* Even if it succeeds, you may experience a denial of service if the frame
* size is large enough to cause excessive paging.
* If you are integrating libtheora in a larger application where such things
* are undesirable, it is highly recommended that you check the frame size in
* \a _info before calling this function and refuse to decode streams where it
* is larger than some reasonable maximum.
* libtheora will not check this for you, because there may be machines that
* can handle such streams and applications that wish to.
* \param _info A #th_info struct filled via th_decode_headerin().
* \param _setup A #th_setup_info handle returned via
* th_decode_headerin().
* \return The initialized #th_dec_ctx handle.
* \retval NULL If the decoding parameters were invalid.*/
extern th_dec_ctx *th_decode_alloc(const th_info *_info,
const th_setup_info *_setup);
/**Releases all storage used for the decoder setup information.
* This should be called after you no longer want to create any decoders for
* a stream whose headers you have parsed with th_decode_headerin().
* \param _setup The setup information to free.
* This can safely be <tt>NULL</tt>.*/
extern void th_setup_free(th_setup_info *_setup);
/**Decoder control function.
* This is used to provide advanced control of the decoding process.
* \param _dec A #th_dec_ctx handle.
* \param _req The control code to process.
* See \ref decctlcodes "the list of available control codes"
* for details.
* \param _buf The parameters for this control code.
* \param _buf_sz The size of the parameter buffer.
* \return Possible return values depend on the control code used.
* See \ref decctlcodes "the list of control codes" for
* specific values. Generally 0 indicates success.*/
extern int th_decode_ctl(th_dec_ctx *_dec,int _req,void *_buf,
size_t _buf_sz);
/**Submits a packet containing encoded video data to the decoder.
* \param _dec A #th_dec_ctx handle.
* \param _op An <tt>ogg_packet</tt> containing encoded video data.
* \param _granpos Returns the granule position of the decoded packet.
* If non-<tt>NULL</tt>, the granule position for this specific
* packet is stored in this location.
* This is computed incrementally from previously decoded
* packets.
* After a seek, the correct granule position must be set via
* #TH_DECCTL_SET_GRANPOS for this to work properly.
* \retval 0 Success.
* A new decoded frame can be retrieved by calling
* th_decode_ycbcr_out().
* \retval TH_DUPFRAME The packet represented a dropped frame (either a
* 0-byte frame or an INTER frame with no coded blocks).
* The player can skip the call to th_decode_ycbcr_out(),
* as the contents of the decoded frame buffer have not
* changed.
* \retval TH_EFAULT \a _dec or \a _op was <tt>NULL</tt>.
* \retval TH_EBADPACKET \a _op does not contain encoded video data.
* \retval TH_EIMPL The video data uses bitstream features which this
* library does not support.*/
extern int th_decode_packetin(th_dec_ctx *_dec,const ogg_packet *_op,
ogg_int64_t *_granpos);
/**Outputs the next available frame of decoded Y'CbCr data.
* If a striped decode callback has been set with #TH_DECCTL_SET_STRIPE_CB,
* then the application does not need to call this function.
* \param _dec A #th_dec_ctx handle.
* \param _ycbcr A video buffer structure to fill in.
* <tt>libtheoradec</tt> will fill in all the members of this
* structure, including the pointers to the uncompressed video
* data.
* The memory for this video data is owned by
* <tt>libtheoradec</tt>.
* It may be freed or overwritten without notification when
* subsequent frames are decoded.
* \retval 0 Success
* \retval TH_EFAULT \a _dec or \a _ycbcr was <tt>NULL</tt>.
*/
extern int th_decode_ycbcr_out(th_dec_ctx *_dec,
th_ycbcr_buffer _ycbcr);
/**Frees an allocated decoder instance.
* \param _dec A #th_dec_ctx handle.*/
extern void th_decode_free(th_dec_ctx *_dec);
/*@}*/
/*@}*/
#if defined(__cplusplus)
}
#endif
#endif
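
The security warning on th_decode_alloc() recommends checking the frame size in _info before allocating a decoder. A minimal sketch of such a check follows, assuming the caller passes in the frame dimensions it read from the parsed headers. The function name frame_size_acceptable and the max_pixels limit are hypothetical and not part of libtheora.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libtheora: returns 1 if the frame is
   non-empty and its pixel count does not exceed the caller's limit.
   The product is computed in 64 bits so it cannot wrap around. */
int frame_size_acceptable(uint32_t frame_width, uint32_t frame_height,
                          uint64_t max_pixels) {
  if (frame_width == 0 || frame_height == 0) return 0;
  return (uint64_t)frame_width * frame_height <= max_pixels;
}
```

An application would call this after the headers are parsed and skip th_decode_alloc() when it returns 0. The right limit depends on the memory the application is willing to commit.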

@ -1,306 +0,0 @@
#!/usr/bin/perl
my $bigend; # little/big endian
my $nxstack;
$nxstack = 0;
eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
while ($ARGV[0] =~ /^-/) {
$_ = shift;
last if /^--/;
if (/^-n/) {
$nflag++;
next;
}
die "I don't recognize this switch: $_\n";
}
$printit++ unless $nflag;
$\ = "\n"; # automatically add newline on print
$n=0;
$thumb = 0; # ARM mode by default, not Thumb.
@proc_stack = ();
printf (" .syntax unified\n");
LINE:
while (<>) {
# For ADRLs we need to add a new line after the substituted one.
$addPadding = 0;
# First, we do not dare to touch *anything* inside double quotes, do we?
# Second, if you want a dollar character in the string,
# insert two of them -- that's how ARM C and assembler treat strings.
s/^([A-Za-z_]\w*)[ \t]+DCB[ \t]*\"/$1: .ascii \"/ && do { s/\$\$/\$/g; next };
s/\bDCB\b[ \t]*\"/.ascii \"/ && do { s/\$\$/\$/g; next };
s/^(\S+)\s+RN\s+(\S+)/$1 .req r$2/ && do { s/\$\$/\$/g; next };
# If there's nothing on a line but a comment, don't try to apply any further
# substitutions (this is a cheap hack to avoid mucking up the license header)
s/^([ \t]*);/$1@/ && do { s/\$\$/\$/g; next };
# If substituted -- leave immediately !
s/@/,:/;
s/;/@/;
while ( /@.*'/ ) {
s/(@.*)'/$1/g;
}
s/\{FALSE\}/0/g;
s/\{TRUE\}/1/g;
s/\{(\w\w\w\w+)\}/$1/g;
s/\bINCLUDE[ \t]*([^ \t\n]+)/.include \"$1\"/;
s/\bGET[ \t]*([^ \t\n]+)/.include \"${ my $x=$1; $x =~ s|\.s|-gnu.S|; \$x }\"/;
s/\bIMPORT\b/.extern/;
s/\bEXPORT\b/.global/;
s/^(\s+)\[/$1IF/;
s/^(\s+)\|/$1ELSE/;
s/^(\s+)\]/$1ENDIF/;
s/IF *:DEF:/ .ifdef/;
s/IF *:LNOT: *:DEF:/ .ifndef/;
s/ELSE/ .else/;
s/ENDIF/ .endif/;
if( /\bIF\b/ ) {
s/\bIF\b/ .if/;
s/=/==/;
}
if ( $n == 2) {
s/\$/\\/g;
}
if ($n == 1) {
s/\$//g;
s/label//g;
$n = 2;
}
if ( /MACRO/ ) {
s/MACRO *\n/.macro/;
$n=1;
}
if ( /\bMEND\b/ ) {
s/\bMEND\b/.endm/;
$n=0;
}
# ".rdata" doesn't work in 'as' version 2.13.2, as it is ".rodata" there.
#
if ( /\bAREA\b/ ) {
my $align;
$align = "2";
if ( /ALIGN=(\d+)/ ) {
$align = $1;
}
if ( /CODE/ ) {
$nxstack = 1;
}
s/^(.+)CODE(.+)READONLY(.*)/ .text/;
s/^(.+)DATA(.+)READONLY(.*)/ .section .rdata/;
s/^(.+)\|\|\.data\|\|(.+)/ .data/;
s/^(.+)\|\|\.bss\|\|(.+)/ .bss/;
s/$/; .p2align $align/;
}
s/\|\|\.constdata\$(\d+)\|\|/.L_CONST$1/; # ||.constdata$3||
s/\|\|\.bss\$(\d+)\|\|/.L_BSS$1/; # ||.bss$2||
s/\|\|\.data\$(\d+)\|\|/.L_DATA$1/; # ||.data$2||
s/\|\|([a-zA-Z0-9_]+)\@([a-zA-Z0-9_]+)\|\|/@ $&/;
s/^(\s+)\%(\s)/ .space $1/;
s/\|(.+)\.(\d+)\|/\.$1_$2/; # |L80.123| -> .L80_123
s/\bCODE32\b/.code 32/ && do {$thumb = 0};
s/\bCODE16\b/.code 16/ && do {$thumb = 1};
if (/\bPROC\b/)
{
my $prefix;
my $proc;
/^([A-Za-z_\.]\w+)\b/;
$proc = $1;
$prefix = "";
if ($proc)
{
$prefix = $prefix.sprintf("\t.type\t%s, %%function; ",$proc);
push(@proc_stack, $proc);
s/^[A-Za-z_\.]\w+/$&:/;
}
$prefix = $prefix."\t.thumb_func; " if ($thumb);
s/\bPROC\b/@ $&/;
$_ = $prefix.$_;
}
s/^(\s*)(S|Q|SH|U|UQ|UH)ASX\b/$1$2ADDSUBX/;
s/^(\s*)(S|Q|SH|U|UQ|UH)SAX\b/$1$2SUBADDX/;
if (/\bENDP\b/)
{
my $proc;
s/\bENDP\b/@ $&/;
$proc = pop(@proc_stack);
$_ = "\t.size $proc, .-$proc".$_ if ($proc);
}
s/\bSUBT\b/@ $&/;
s/\bDATA\b/@ $&/; # DATA directive is deprecated -- Asm guide, p.7-25
s/\bKEEP\b/@ $&/;
s/\bEXPORTAS\b/@ $&/;
s/\|\|(.)+\bEQU\b/@ $&/;
s/\|\|([\w\$]+)\|\|/$1/;
s/\bENTRY\b/@ $&/;
s/\bASSERT\b/@ $&/;
s/\bGBLL\b/@ $&/;
s/\bGBLA\b/@ $&/;
s/^\W+OPT\b/@ $&/;
s/:OR:/|/g;
s/:SHL:/<</g;
s/:SHR:/>>/g;
s/:AND:/&/g;
s/:LAND:/&&/g;
s/CPSR/cpsr/;
s/SPSR/spsr/;
s/ALIGN$/.balign 4/;
s/ALIGN\s+([0-9x]+)$/.balign $1/;
s/psr_cxsf/psr_all/;
s/LTORG/.ltorg/;
s/^([A-Za-z_]\w*)[ \t]+EQU/ .set $1,/;
s/^([A-Za-z_]\w*)[ \t]+SETL/ .set $1,/;
s/^([A-Za-z_]\w*)[ \t]+SETA/ .set $1,/;
s/^([A-Za-z_]\w*)[ \t]+\*/ .set $1,/;
# {PC} + 0xdeadfeed --> . + 0xdeadfeed
s/\{PC\} \+/ \. +/;
# Single hex constant on the line !
#
# >>> NOTE <<<
# Double-precision floats in gcc are always mixed-endian, which means
# bytes in two words are little-endian, but words are big-endian.
# So, 0x0000deadfeed0000 would be stored as 0x0000dead at low address
# and 0xfeed0000 at high address.
#
s/\bDCFD\b[ \t]+0x([a-fA-F0-9]{8})([a-fA-F0-9]{8})/.long 0x$1, 0x$2/;
# Only decimal constants on the line, no hex !
s/\bDCFD\b[ \t]+([0-9\.\-]+)/.double $1/;
# Single hex constant on the line !
# s/\bDCFS\b[ \t]+0x([a-f0-9]{8})([a-f0-9]{8})/.long 0x$1, 0x$2/;
# Only decimal constants on the line, no hex !
# s/\bDCFS\b[ \t]+([0-9\.\-]+)/.double $1/;
s/\bDCFS[ \t]+0x/.word 0x/;
s/\bDCFS\b/.float/;
s/^([A-Za-z_]\w*)[ \t]+DCD/$1 .word/;
s/\bDCD\b/.word/;
s/^([A-Za-z_]\w*)[ \t]+DCW/$1 .short/;
s/\bDCW\b/.short/;
s/^([A-Za-z_]\w*)[ \t]+DCB/$1 .byte/;
s/\bDCB\b/.byte/;
s/^([A-Za-z_]\w*)[ \t]+\%/.comm $1,/;
s/^[A-Za-z_\.]\w+/$&:/;
s/^(\d+)/$1:/;
s/\%(\d+)/$1b_or_f/;
s/\%[Bb](\d+)/$1b/;
s/\%[Ff](\d+)/$1f/;
s/\%[Ff][Tt](\d+)/$1f/;
s/&([\dA-Fa-f]+)/0x$1/;
if ( /\b2_[01]+\b/ ) {
s/\b2_([01]+)\b/conv$1&&&&/g;
while ( /[01][01][01][01]&&&&/ ) {
s/0000&&&&/&&&&0/g;
s/0001&&&&/&&&&1/g;
s/0010&&&&/&&&&2/g;
s/0011&&&&/&&&&3/g;
s/0100&&&&/&&&&4/g;
s/0101&&&&/&&&&5/g;
s/0110&&&&/&&&&6/g;
s/0111&&&&/&&&&7/g;
s/1000&&&&/&&&&8/g;
s/1001&&&&/&&&&9/g;
s/1010&&&&/&&&&A/g;
s/1011&&&&/&&&&B/g;
s/1100&&&&/&&&&C/g;
s/1101&&&&/&&&&D/g;
s/1110&&&&/&&&&E/g;
s/1111&&&&/&&&&F/g;
}
s/000&&&&/&&&&0/g;
s/001&&&&/&&&&1/g;
s/010&&&&/&&&&2/g;
s/011&&&&/&&&&3/g;
s/100&&&&/&&&&4/g;
s/101&&&&/&&&&5/g;
s/110&&&&/&&&&6/g;
s/111&&&&/&&&&7/g;
s/00&&&&/&&&&0/g;
s/01&&&&/&&&&1/g;
s/10&&&&/&&&&2/g;
s/11&&&&/&&&&3/g;
s/0&&&&/&&&&0/g;
s/1&&&&/&&&&1/g;
s/conv&&&&/0x/g;
}
if ( /commandline/)
{
if( /-bigend/)
{
$bigend=1;
}
}
if ( /\bDCDU\b/ )
{
my $cmd=$_;
my $value;
my $prefix;
my $w1;
my $w2;
my $w3;
my $w4;
s/\s+DCDU\b/@ $&/;
$cmd =~ /\bDCDU\b\s+0x([0-9a-fA-F]+)/;
$value = $1;
$value =~ /(\w\w)(\w\w)(\w\w)(\w\w)/;
$w1 = $1;
$w2 = $2;
$w3 = $3;
$w4 = $4;
if( $bigend ne "")
{
# big endian
$prefix = "\t.byte\t0x".$w1.";".
"\t.byte\t0x".$w2.";".
"\t.byte\t0x".$w3.";".
"\t.byte\t0x".$w4."; ";
}
else
{
# little endian
$prefix = "\t.byte\t0x".$w4.";".
"\t.byte\t0x".$w3.";".
"\t.byte\t0x".$w2.";".
"\t.byte\t0x".$w1."; ";
}
$_=$prefix.$_;
}
if ( /\badrl\b/i )
{
s/\badrl\s+(\w+)\s*,\s*(\w+)/ldr $1,=$2/i;
$addPadding = 1;
}
s/\bEND\b/@ END/;
} continue {
printf ("%s", $_) if $printit;
if ($addPadding != 0)
{
printf (" mov r0,r0\n");
$addPadding = 0;
}
}
#If we had a code section, mark that this object doesn't need an executable
# stack.
if ($nxstack) {
printf (" .section\t.note.GNU-stack,\"\",\%\%progbits\n");
}

@ -1,32 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: x86int.h 17344 2010-07-21 01:42:18Z tterribe $
********************************************************************/
#if !defined(_arm_armbits_H)
# define _arm_armbits_H (1)
# include "../bitpack.h"
# include "armcpu.h"
# if defined(OC_ARM_ASM)
# define oc_pack_read oc_pack_read_arm
# define oc_pack_read1 oc_pack_read1_arm
# define oc_huff_token_decode oc_huff_token_decode_arm
# endif
long oc_pack_read_arm(oc_pack_buf *_b,int _bits);
int oc_pack_read1_arm(oc_pack_buf *_b);
int oc_huff_token_decode_arm(oc_pack_buf *_b,const ogg_int16_t *_tree);
#endif

@ -1,230 +0,0 @@
;********************************************************************
;* *
;* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
;* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
;* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
;* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
;* *
;* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
;* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
;* *
;********************************************************************
;
; function:
; last mod: $Id$
;
;********************************************************************
AREA |.text|, CODE, READONLY
EXPORT oc_pack_read_arm
EXPORT oc_pack_read1_arm
EXPORT oc_huff_token_decode_arm
oc_pack_read1_arm PROC
; r0 = oc_pack_buf *_b
ADD r12,r0,#8
LDMIA r12,{r2,r3} ; r2 = window
; Stall... ; r3 = available
; Stall...
SUBS r3,r3,#1 ; r3 = available-1, available<1 => LT
BLT oc_pack_read1_refill
MOV r0,r2,LSR #31 ; r0 = window>>31
MOV r2,r2,LSL #1 ; r2 = window<<=1
STMIA r12,{r2,r3} ; window = r2
; available = r3
MOV PC,r14
ENDP
oc_pack_read_arm PROC
; r0 = oc_pack_buf *_b
; r1 = int _bits
ADD r12,r0,#8
LDMIA r12,{r2,r3} ; r2 = window
; Stall... ; r3 = available
; Stall...
SUBS r3,r3,r1 ; r3 = available-_bits, available<_bits => LT
BLT oc_pack_read_refill
RSB r0,r1,#32 ; r0 = 32-_bits
MOV r0,r2,LSR r0 ; r0 = window>>32-_bits
MOV r2,r2,LSL r1 ; r2 = window<<=_bits
STMIA r12,{r2,r3} ; window = r2
; available = r3
MOV PC,r14
; We need to refill window.
oc_pack_read1_refill
MOV r1,#1
oc_pack_read_refill
STMFD r13!,{r10,r11,r14}
LDMIA r0,{r10,r11} ; r10 = stop
; r11 = ptr
RSB r0,r1,#32 ; r0 = 32-_bits
RSB r3,r3,r0 ; r3 = 32-available
; We can use unsigned compares for both the pointers and for available
; (allowing us to chain condition codes) because available will never be
; larger than 32 (or we wouldn't be here), and thus 32-available will never be
; negative.
CMP r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
CMPHI r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
CMPHI r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
CMPHI r10,r11 ; ptr<stop => HI
CMPHI r3,#7 ; available<=24 => HI
LDRBHI r14,[r11],#1 ; r14 = *ptr++
SUBHI r3,#8 ; available += 8
; (HI) Stall...
ORRHI r2,r2,r14,LSL r3 ; r2 = window|=r14<<32-available
SUBS r3,r0,r3 ; r3 = available-=_bits, available<_bits => LT
BLT oc_pack_read_refill_last
MOV r0,r2,LSR r0 ; r0 = window>>32-_bits
MOV r2,r2,LSL r1 ; r2 = window<<=_bits
STR r11,[r12,#-4] ; ptr = r11
STMIA r12,{r2,r3} ; window = r2
; available = r3
LDMFD r13!,{r10,r11,PC}
; Either we wanted to read more than 24 bits and didn't have enough room to
; stuff the last byte into the window, or we hit the end of the packet.
oc_pack_read_refill_last
CMP r11,r10 ; ptr<stop => LO
; If we didn't hit the end of the packet, then pull enough of the next byte to
; fill up the window.
LDRBLO r14,[r11] ; (LO) r14 = *ptr
; Otherwise, set the EOF flag and pretend we have lots of available bits.
MOVHS r14,#1 ; (HS) r14 = 1
ADDLO r10,r3,r1 ; (LO) r10 = available
STRHS r14,[r12,#8] ; (HS) eof = 1
ANDLO r10,r10,#7 ; (LO) r10 = available&7
MOVHS r3,#1<<30 ; (HS) available = OC_LOTS_OF_BITS
ORRLO r2,r2,r14,LSL r10 ; (LO) r2 = window|=*ptr>>(available&7)
MOV r0,r2,LSR r0 ; r0 = window>>32-_bits
MOV r2,r2,LSL r1 ; r2 = window<<=_bits
STR r11,[r12,#-4] ; ptr = r11
STMIA r12,{r2,r3} ; window = r2
; available = r3
LDMFD r13!,{r10,r11,PC}
ENDP
oc_huff_token_decode_arm PROC
; r0 = oc_pack_buf *_b
; r1 = const ogg_int16_t *_tree
STMFD r13!,{r4,r5,r10,r14}
LDRSH r10,[r1] ; r10 = n=_tree[0]
LDMIA r0,{r2-r5} ; r2 = stop
; Stall... ; r3 = ptr
; Stall... ; r4 = window
; r5 = available
CMP r10,r5 ; n>available => GT
BGT oc_huff_token_decode_refill0
RSB r14,r10,#32 ; r14 = 32-n
MOV r14,r4,LSR r14 ; r14 = bits=window>>32-n
ADD r14,r1,r14,LSL #1 ; r14 = _tree+bits
LDRSH r12,[r14,#2] ; r12 = node=_tree[1+bits]
; Stall...
; Stall...
RSBS r14,r12,#0 ; r14 = -node, node>0 => MI
BMI oc_huff_token_decode_continue
MOV r10,r14,LSR #8 ; r10 = n=node>>8
MOV r4,r4,LSL r10 ; r4 = window<<=n
SUB r5,r10 ; r5 = available-=n
STMIB r0,{r3-r5} ; ptr = r3
; window = r4
; available = r5
AND r0,r14,#255 ; r0 = node&255
LDMFD r13!,{r4,r5,r10,pc}
; The first tree node wasn't enough to reach a leaf, read another
oc_huff_token_decode_continue
ADD r12,r1,r12,LSL #1 ; r12 = _tree+node
MOV r4,r4,LSL r10 ; r4 = window<<=n
SUB r5,r5,r10 ; r5 = available-=n
LDRSH r10,[r12],#2 ; r10 = n=_tree[node]
; Stall... ; r12 = _tree+node+1
; Stall...
CMP r10,r5 ; n>available => GT
BGT oc_huff_token_decode_refill
RSB r14,r10,#32 ; r14 = 32-n
MOV r14,r4,LSR r14 ; r14 = bits=window>>32-n
ADD r12,r12,r14 ;
LDRSH r12,[r12,r14] ; r12 = node=_tree[node+1+bits]
; Stall...
; Stall...
RSBS r14,r12,#0 ; r14 = -node, node>0 => MI
BMI oc_huff_token_decode_continue
MOV r10,r14,LSR #8 ; r10 = n=node>>8
MOV r4,r4,LSL r10 ; r4 = window<<=n
SUB r5,r10 ; r5 = available-=n
STMIB r0,{r3-r5} ; ptr = r3
; window = r4
; available = r5
AND r0,r14,#255 ; r0 = node&255
LDMFD r13!,{r4,r5,r10,pc}
oc_huff_token_decode_refill0
ADD r12,r1,#2 ; r12 = _tree+1
oc_huff_token_decode_refill
; We can't possibly need more than 15 bits, so available must be <= 15.
; Therefore we can load at least two bytes without checking it.
CMP r2,r3 ; ptr<stop => HI
LDRBHI r14,[r3],#1 ; r14 = *ptr++
RSBHI r5,r5,#24 ; (HI) available = 32-(available+=8)
RSBLS r5,r5,#32 ; (LS) r5 = 32-available
ORRHI r4,r4,r14,LSL r5 ; r4 = window|=r14<<32-available
CMPHI r2,r3 ; ptr<stop => HI
LDRBHI r14,[r3],#1 ; r14 = *ptr++
SUBHI r5,#8 ; available += 8
; (HI) Stall...
ORRHI r4,r4,r14,LSL r5 ; r4 = window|=r14<<32-available
; We can use unsigned compares for both the pointers and for available
; (allowing us to chain condition codes) because available will never be
; larger than 32 (or we wouldn't be here), and thus 32-available will never be
; negative.
CMPHI r2,r3 ; ptr<stop => HI
CMPHI r5,#7 ; available<=24 => HI
LDRBHI r14,[r3],#1 ; r14 = *ptr++
SUBHI r5,#8 ; available += 8
; (HI) Stall...
ORRHI r4,r4,r14,LSL r5 ; r4 = window|=r14<<32-available
CMP r2,r3 ; ptr<stop => HI
MOVLS r5,#-1<<30 ; (LS) available = OC_LOTS_OF_BITS+32
CMPHI r5,#7 ; (HI) available<=24 => HI
LDRBHI r14,[r3],#1 ; (HI) r14 = *ptr++
SUBHI r5,#8 ; (HI) available += 8
; (HI) Stall...
ORRHI r4,r4,r14,LSL r5 ; (HI) r4 = window|=r14<<32-available
RSB r14,r10,#32 ; r14 = 32-n
MOV r14,r4,LSR r14 ; r14 = bits=window>>32-n
ADD r12,r12,r14 ;
LDRSH r12,[r12,r14] ; r12 = node=_tree[node+1+bits]
RSB r5,r5,#32 ; r5 = available
; Stall...
RSBS r14,r12,#0 ; r14 = -node, node>0 => MI
BMI oc_huff_token_decode_continue
MOV r10,r14,LSR #8 ; r10 = n=node>>8
MOV r4,r4,LSL r10 ; r4 = window<<=n
SUB r5,r10 ; r5 = available-=n
STMIB r0,{r3-r5} ; ptr = r3
; window = r4
; available = r5
AND r0,r14,#255 ; r0 = node&255
LDMFD r13!,{r4,r5,r10,pc}
ENDP
END
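
For readers following the assembly above, here is a simplified C model of the window-based bit reader. It is a sketch under stated assumptions, not the library's implementation: the type and function names are hypothetical, the window is fixed at 32 bits, and reads are limited to 1..25 bits so the partial-byte top-up in oc_pack_read_refill_last is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOTS_OF_BITS (1 << 30)

/* Hypothetical model of the reader state: bits are consumed MSB-first
   from a 32-bit window that is refilled one byte at a time. */
typedef struct {
  const unsigned char *stop; /* one past the last byte of the packet */
  const unsigned char *ptr;  /* next byte to load into the window */
  uint32_t window;           /* unread bits, left-aligned */
  int available;             /* number of valid bits in window */
  int eof;                   /* set once a read runs past the packet */
} bit_reader;

void bit_reader_init(bit_reader *b, const unsigned char *buf, size_t len) {
  b->ptr = buf;
  b->stop = buf + len;
  b->window = 0;
  b->available = 0;
  b->eof = 0;
}

long bit_reader_read(bit_reader *b, int bits) {
  uint32_t window = b->window;
  int available = b->available;
  long result;
  assert(bits >= 1 && bits <= 25);
  if (available < bits) {
    /* Refill: append whole bytes below the bits already in the window. */
    int shift = 32 - available;
    while (shift > 7 && b->ptr < b->stop) {
      shift -= 8;
      window |= (uint32_t)*b->ptr++ << shift;
    }
    available = 32 - shift;
    if (bits > available) {
      /* End of packet: flag it and pretend there are plenty of zero bits. */
      b->eof = 1;
      available = LOTS_OF_BITS;
    }
  }
  result = (long)(window >> (32 - bits));
  b->window = window << bits;
  b->available = available - bits;
  return result;
}
```

Reading past the end of the buffer returns zero bits and sets eof, so a caller can check the flag once after parsing instead of after every read.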

@ -1,154 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
CPU capability detection for ARM processors.
function:
last mod: $Id: cpu.c 17344 2010-07-21 01:42:18Z tterribe $
********************************************************************/
#include "armcpu.h"
#if !defined(OC_ARM_ASM)|| \
!defined(OC_ARM_ASM_EDSP)&&!defined(OC_ARM_ASM_MEDIA)&& \
!defined(OC_ARM_ASM_NEON)
ogg_uint32_t oc_cpu_flags_get(void){
return 0;
}
#elif defined(_MSC_VER)
/*For GetExceptionCode() and EXCEPTION_ILLEGAL_INSTRUCTION.*/
# define WIN32_LEAN_AND_MEAN
# define WIN32_EXTRA_LEAN
# include <windows.h>
ogg_uint32_t oc_cpu_flags_get(void){
ogg_uint32_t flags;
flags=0;
/*MSVC has no inline __asm support for ARM, but it does let you __emit
instructions via their assembled hex code.
All of these instructions should be essentially nops.*/
# if defined(OC_ARM_ASM_EDSP)
__try{
/*PLD [r13]*/
__emit(0xF5DDF000);
flags|=OC_CPU_ARM_EDSP;
}
__except(GetExceptionCode()==EXCEPTION_ILLEGAL_INSTRUCTION){
/*Ignore exception.*/
}
# if defined(OC_ARM_ASM_MEDIA)
__try{
/*SHADD8 r3,r3,r3*/
__emit(0xE6333F93);
flags|=OC_CPU_ARM_MEDIA;
}
__except(GetExceptionCode()==EXCEPTION_ILLEGAL_INSTRUCTION){
/*Ignore exception.*/
}
# if defined(OC_ARM_ASM_NEON)
__try{
/*VORR q0,q0,q0*/
__emit(0xF2200150);
flags|=OC_CPU_ARM_NEON;
}
__except(GetExceptionCode()==EXCEPTION_ILLEGAL_INSTRUCTION){
/*Ignore exception.*/
}
# endif
# endif
# endif
return flags;
}
#elif defined(__linux__)
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
ogg_uint32_t oc_cpu_flags_get(void){
ogg_uint32_t flags;
FILE *fin;
flags=0;
/*Reading /proc/self/auxv would be easier, but that doesn't work reliably on
Android.
This also means that detection will fail in Scratchbox.*/
fin=fopen("/proc/cpuinfo","r");
if(fin!=NULL){
/*512 should be enough for anybody (it's even enough for all the flags that
x86 has accumulated... so far).*/
char buf[512];
while(fgets(buf,511,fin)!=NULL){
if(memcmp(buf,"Features",8)==0){
char *p;
p=strstr(buf," edsp");
if(p!=NULL&&(p[5]==' '||p[5]=='\n'))flags|=OC_CPU_ARM_EDSP;
p=strstr(buf," neon");
if(p!=NULL&&(p[5]==' '||p[5]=='\n'))flags|=OC_CPU_ARM_NEON;
}
if(memcmp(buf,"CPU architecture:",17)==0){
int version;
version=atoi(buf+17);
if(version>=6)flags|=OC_CPU_ARM_MEDIA;
}
}
fclose(fin);
}
return flags;
}
#elif defined(__riscos__)
#include <kernel.h>
#include <swis.h>
ogg_uint32_t oc_cpu_flags_get(void) {
ogg_uint32_t flags = 0;
#if defined(OC_ARM_ASM_EDSP) || defined(OC_ARM_ASM_MEDIA)
if (_swi(OS_Byte,_IN(0)|_IN(2)|_RETURN(1), 129, 0xFF) <= 0xA9)
_swix(OS_Module, _INR(0,1), 1, "System:Modules.CallASWI");
ogg_uint32_t features;
_kernel_oserror* test = _swix(OS_PlatformFeatures, _IN(0)|_OUT(0), 0, &features);
if (test == NULL) {
#if defined(OC_ARM_ASM_EDSP)
if((features>>10 & 1) == 1)flags|=OC_CPU_ARM_EDSP;
#endif
#if defined(OC_ARM_ASM_MEDIA)
if ((features>>31 & 1) == 1) {
ogg_uint32_t shadd = 0;
test =_swix(OS_PlatformFeatures, _INR(0,1)|_OUT(0), 34, 29, &shadd);
if (test==NULL && shadd==1)flags|=OC_CPU_ARM_MEDIA;
}
#endif
}
#endif
#if defined(OC_ARM_ASM_NEON)
ogg_uint32_t mvfr1;
test = _swix(VFPSupport_Features, _IN(0)|_OUT(2), 0, &mvfr1);
if (test==NULL && (mvfr1 & 0xFFF00)==0x11100)flags|=OC_CPU_ARM_NEON;
#endif
return flags;
}
#else
/*The feature registers which can tell us what the processor supports are
accessible in privileged modes only, so we can't have a general user-space
detection method like on x86.*/
# error "Configured to use ARM asm but no CPU detection method available for " \
"your platform. Reconfigure with --disable-asm (or send patches)."
#endif
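
The Linux branch above looks for feature names as space-delimited words on the "Features" line of /proc/cpuinfo. Below is a minimal standalone sketch of that whole-word match. The helper name cpuinfo_has_feature is hypothetical, and unlike the code above it keeps scanning after a partial match instead of testing only the first occurrence.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` occurs in `line` as a whole
   token, i.e. preceded by a space or tab and followed by a space,
   newline, or end of string. */
int cpuinfo_has_feature(const char *line, const char *name) {
  size_t n = strlen(name);
  const char *p = line;
  if (n == 0) return 0;
  while ((p = strstr(p, name)) != NULL) {
    int starts = p > line && (p[-1] == ' ' || p[-1] == '\t');
    int ends = p[n] == ' ' || p[n] == '\n' || p[n] == '\0';
    if (starts && ends) return 1;
    p++; /* partial match (e.g. "vfp" inside "vfpv3"): keep looking */
  }
  return 0;
}
```

Requiring a delimiter on both sides keeps a short name such as "vfp" from matching inside a longer one such as "vfpv3".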

@ -1,29 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: cpu.h 17344 2010-07-21 01:42:18Z tterribe $
********************************************************************/
#if !defined(_arm_armcpu_H)
# define _arm_armcpu_H (1)
#include "../internal.h"
/*"Parallel instructions" from ARM v6 and above.*/
#define OC_CPU_ARM_MEDIA (1<<24)
/*Flags chosen to match arch/arm/include/asm/hwcap.h in the Linux kernel.*/
#define OC_CPU_ARM_EDSP (1<<7)
#define OC_CPU_ARM_NEON (1<<12)
ogg_uint32_t oc_cpu_flags_get(void);
#endif

@ -1,655 +0,0 @@
;********************************************************************
;* *
;* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
;* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
;* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
;* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
;* *
;* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
;* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
;* *
;********************************************************************
; Original implementation:
; Copyright (C) 2009 Robin Watts for Pinknoise Productions Ltd
; last mod: $Id$
;********************************************************************
AREA |.text|, CODE, READONLY
GET armopts.s
; Vanilla ARM v4 versions
EXPORT oc_frag_copy_list_arm
EXPORT oc_frag_recon_intra_arm
EXPORT oc_frag_recon_inter_arm
EXPORT oc_frag_recon_inter2_arm
oc_frag_copy_list_arm PROC
; r0 = _dst_frame
; r1 = _src_frame
; r2 = _ystride
; r3 = _fragis
; <> = _nfragis
; <> = _frag_buf_offs
LDR r12,[r13] ; r12 = _nfragis
STMFD r13!,{r4-r6,r11,r14}
SUBS r12, r12, #1
LDR r4,[r3],#4 ; r4 = _fragis[fragii]
LDRGE r14,[r13,#4*6] ; r14 = _frag_buf_offs
BLT ofcl_arm_end
SUB r2, r2, #4
ofcl_arm_lp
LDR r11,[r14,r4,LSL #2] ; r11 = _frag_buf_offs[_fragis[fragii]]
SUBS r12, r12, #1
; Stall (on XScale)
ADD r4, r1, r11 ; r4 = _src_frame+frag_buf_off
LDR r6, [r4], #4
ADD r11,r0, r11 ; r11 = _dst_frame+frag_buf_off
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4], r2
STR r6, [r11],#4
LDR r6, [r4], #4
STR r5, [r11],r2
LDR r5, [r4]
LDRGE r4,[r3],#4 ; r4 = _fragis[fragii]
STR r6, [r11],#4
STR r5, [r11]
BGE ofcl_arm_lp
ofcl_arm_end
LDMFD r13!,{r4-r6,r11,PC}
oc_frag_recon_intra_arm
; r0 = unsigned char *_dst
; r1 = int _ystride
; r2 = const ogg_int16_t _residue[64]
STMFD r13!,{r4,r5,r14}
MOV r14,#8
MOV r5, #255
SUB r1, r1, #7
ofrintra_lp_arm
LDRSH r3, [r2], #2
LDRSH r4, [r2], #2
LDRSH r12,[r2], #2
ADDS r3, r3, #128
CMPGT r5, r3
EORLT r3, r5, r3, ASR #32
STRB r3, [r0], #1
ADDS r4, r4, #128
CMPGT r5, r4
EORLT r4, r5, r4, ASR #32
LDRSH r3, [r2], #2
STRB r4, [r0], #1
ADDS r12,r12,#128
CMPGT r5, r12
EORLT r12,r5, r12,ASR #32
LDRSH r4, [r2], #2
STRB r12,[r0], #1
ADDS r3, r3, #128
CMPGT r5, r3
EORLT r3, r5, r3, ASR #32
LDRSH r12,[r2], #2
STRB r3, [r0], #1
ADDS r4, r4, #128
CMPGT r5, r4
EORLT r4, r5, r4, ASR #32
LDRSH r3, [r2], #2
STRB r4, [r0], #1
ADDS r12,r12,#128
CMPGT r5, r12
EORLT r12,r5, r12,ASR #32
LDRSH r4, [r2], #2
STRB r12,[r0], #1
ADDS r3, r3, #128
CMPGT r5, r3
EORLT r3, r5, r3, ASR #32
STRB r3, [r0], #1
ADDS r4, r4, #128
CMPGT r5, r4
EORLT r4, r5, r4, ASR #32
STRB r4, [r0], r1
SUBS r14,r14,#1
BGT ofrintra_lp_arm
LDMFD r13!,{r4,r5,PC}
ENDP
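
The ADDS/CMPGT/EORLT triple in oc_frag_recon_intra_arm adds a bias of 128 to each residue value and clamps the result to the range 0..255 without branching. A plain C model of the same computation is sketched below for reference. The function names are hypothetical, and the code is written for clarity rather than speed.

```c
#include <stdint.h>

/* Clamp an integer to the range of an unsigned 8-bit pixel. */
unsigned char clamp255(int v) {
  return (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical C model of the intra reconstruction loop: each of the
   64 residue values is biased by 128, clamped, and written to an 8x8
   block whose rows are ystride bytes apart. */
void frag_recon_intra_model(unsigned char *dst, int ystride,
                            const int16_t residue[64]) {
  int i;
  int j;
  for (i = 0; i < 8; i++) {
    for (j = 0; j < 8; j++) {
      dst[j] = clamp255(residue[i * 8 + j] + 128);
    }
    dst += ystride;
  }
}
```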
oc_frag_recon_inter_arm PROC
; r0 = unsigned char *dst
; r1 = const unsigned char *src
; r2 = int ystride
; r3 = const ogg_int16_t residue[64]
STMFD r13!,{r5,r9-r11,r14}
MOV r9, #8
MOV r5, #255
SUB r2, r2, #7
ofrinter_lp_arm
LDRSH r12,[r3], #2
LDRB r14,[r1], #1
LDRSH r11,[r3], #2
LDRB r10,[r1], #1
ADDS r12,r12,r14
CMPGT r5, r12
EORLT r12,r5, r12,ASR #32
STRB r12,[r0], #1
ADDS r11,r11,r10
CMPGT r5, r11
LDRSH r12,[r3], #2
LDRB r14,[r1], #1
EORLT r11,r5, r11,ASR #32
STRB r11,[r0], #1
ADDS r12,r12,r14
CMPGT r5, r12
LDRSH r11,[r3], #2
LDRB r10,[r1], #1
EORLT r12,r5, r12,ASR #32
STRB r12,[r0], #1
ADDS r11,r11,r10
CMPGT r5, r11
LDRSH r12,[r3], #2
LDRB r14,[r1], #1
EORLT r11,r5, r11,ASR #32
STRB r11,[r0], #1
ADDS r12,r12,r14
CMPGT r5, r12
LDRSH r11,[r3], #2
LDRB r10,[r1], #1
EORLT r12,r5, r12,ASR #32
STRB r12,[r0], #1
ADDS r11,r11,r10
CMPGT r5, r11
LDRSH r12,[r3], #2
LDRB r14,[r1], #1
EORLT r11,r5, r11,ASR #32
STRB r11,[r0], #1
ADDS r12,r12,r14
CMPGT r5, r12
LDRSH r11,[r3], #2
LDRB r10,[r1], r2
EORLT r12,r5, r12,ASR #32
STRB r12,[r0], #1
ADDS r11,r11,r10
CMPGT r5, r11
EORLT r11,r5, r11,ASR #32
STRB r11,[r0], r2
SUBS r9, r9, #1
BGT ofrinter_lp_arm
LDMFD r13!,{r5,r9-r11,PC}
ENDP
oc_frag_recon_inter2_arm PROC
; r0 = unsigned char *dst
; r1 = const unsigned char *src1
; r2 = const unsigned char *src2
; r3 = int ystride
LDR r12,[r13]
; r12= const ogg_int16_t residue[64]
STMFD r13!,{r4-r8,r14}
MOV r14,#8
MOV r8, #255
SUB r3, r3, #7
ofrinter2_lp_arm
LDRB r5, [r1], #1
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
LDRB r7, [r1], #1
ADD r5, r5, r6
ADDS r5, r4, r5, LSR #1
CMPGT r8, r5
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
EORLT r5, r8, r5, ASR #32
STRB r5, [r0], #1
ADD r7, r7, r6
ADDS r7, r4, r7, LSR #1
CMPGT r8, r7
LDRB r5, [r1], #1
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
EORLT r7, r8, r7, ASR #32
STRB r7, [r0], #1
ADD r5, r5, r6
ADDS r5, r4, r5, LSR #1
CMPGT r8, r5
LDRB r7, [r1], #1
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
EORLT r5, r8, r5, ASR #32
STRB r5, [r0], #1
ADD r7, r7, r6
ADDS r7, r4, r7, LSR #1
CMPGT r8, r7
LDRB r5, [r1], #1
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
EORLT r7, r8, r7, ASR #32
STRB r7, [r0], #1
ADD r5, r5, r6
ADDS r5, r4, r5, LSR #1
CMPGT r8, r5
LDRB r7, [r1], #1
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
EORLT r5, r8, r5, ASR #32
STRB r5, [r0], #1
ADD r7, r7, r6
ADDS r7, r4, r7, LSR #1
CMPGT r8, r7
LDRB r5, [r1], #1
LDRB r6, [r2], #1
LDRSH r4, [r12],#2
EORLT r7, r8, r7, ASR #32
STRB r7, [r0], #1
ADD r5, r5, r6
ADDS r5, r4, r5, LSR #1
CMPGT r8, r5
LDRB r7, [r1], r3
LDRB r6, [r2], r3
LDRSH r4, [r12],#2
EORLT r5, r8, r5, ASR #32
STRB r5, [r0], #1
ADD r7, r7, r6
ADDS r7, r4, r7, LSR #1
CMPGT r8, r7
EORLT r7, r8, r7, ASR #32
STRB r7, [r0], r3
SUBS r14,r14,#1
BGT ofrinter2_lp_arm
LDMFD r13!,{r4-r8,PC}
ENDP
[ OC_ARM_ASM_EDSP
EXPORT oc_frag_copy_list_edsp
oc_frag_copy_list_edsp PROC
; r0 = _dst_frame
; r1 = _src_frame
; r2 = _ystride
; r3 = _fragis
; <> = _nfragis
; <> = _frag_buf_offs
LDR r12,[r13] ; r12 = _nfragis
STMFD r13!,{r4-r11,r14}
SUBS r12, r12, #1
LDRGE r5, [r3],#4 ; r5 = _fragis[fragii]
LDRGE r14,[r13,#4*10] ; r14 = _frag_buf_offs
BLT ofcl_edsp_end
ofcl_edsp_lp
MOV r4, r1
LDR r5, [r14,r5, LSL #2] ; r5 = _frag_buf_offs[_fragis[fragii]]
SUBS r12, r12, #1
; Stall (on XScale)
LDRD r6, [r4, r5]! ; r4 = _src_frame+frag_buf_off
LDRD r8, [r4, r2]!
; Stall
STRD r6, [r5, r0]! ; r5 = _dst_frame+frag_buf_off
STRD r8, [r5, r2]!
; Stall
LDRD r6, [r4, r2]! ; On Xscale at least, doing 3 consecutive
LDRD r8, [r4, r2]! ; loads causes a stall, but that's no worse
LDRD r10,[r4, r2]! ; than us only doing 2, and having to do
; another pair of LDRD/STRD later on.
; Stall
STRD r6, [r5, r2]!
STRD r8, [r5, r2]!
STRD r10,[r5, r2]!
LDRD r6, [r4, r2]!
LDRD r8, [r4, r2]!
LDRD r10,[r4, r2]!
STRD r6, [r5, r2]!
STRD r8, [r5, r2]!
STRD r10,[r5, r2]!
LDRGE r5, [r3],#4 ; r5 = _fragis[fragii]
BGE ofcl_edsp_lp
ofcl_edsp_end
LDMFD r13!,{r4-r11,PC}
ENDP
]
[ OC_ARM_ASM_MEDIA
EXPORT oc_frag_recon_intra_v6
EXPORT oc_frag_recon_inter_v6
EXPORT oc_frag_recon_inter2_v6
oc_frag_recon_intra_v6 PROC
; r0 = unsigned char *_dst
; r1 = int _ystride
; r2 = const ogg_int16_t _residue[64]
STMFD r13!,{r4-r6,r14}
MOV r14,#8
MOV r12,r2
LDR r6, =0x00800080
ofrintra_v6_lp
LDRD r2, [r12],#8 ; r2 = 11110000 r3 = 33332222
LDRD r4, [r12],#8 ; r4 = 55554444 r5 = 77776666
SUBS r14,r14,#1
QADD16 r2, r2, r6
QADD16 r3, r3, r6
QADD16 r4, r4, r6
QADD16 r5, r5, r6
USAT16 r2, #8, r2 ; r2 = __11__00
USAT16 r3, #8, r3 ; r3 = __33__22
USAT16 r4, #8, r4 ; r4 = __55__44
USAT16 r5, #8, r5 ; r5 = __77__66
ORR r2, r2, r2, LSR #8 ; r2 = __111100
ORR r3, r3, r3, LSR #8 ; r3 = __333322
ORR r4, r4, r4, LSR #8 ; r4 = __555544
ORR r5, r5, r5, LSR #8 ; r5 = __777766
PKHBT r2, r2, r3, LSL #16 ; r2 = 33221100
PKHBT r3, r4, r5, LSL #16 ; r3 = 77665544
STRD r2, r3, [r0], r1
BGT ofrintra_v6_lp
LDMFD r13!,{r4-r6,PC}
ENDP
oc_frag_recon_inter_v6 PROC
; r0 = unsigned char *_dst
; r1 = const unsigned char *_src
; r2 = int _ystride
; r3 = const ogg_int16_t _residue[64]
STMFD r13!,{r4-r7,r14}
MOV r14,#8
ofrinter_v6_lp
LDRD r6, [r3], #8 ; r6 = 11110000 r7 = 33332222
SUBS r14,r14,#1
[ OC_ARM_CAN_UNALIGN_LDRD
LDRD r4, [r1], r2 ; Unaligned ; r4 = 33221100 r5 = 77665544
|
LDR r5, [r1, #4]
LDR r4, [r1], r2
]
PKHBT r12,r6, r7, LSL #16 ; r12= 22220000
PKHTB r7, r7, r6, ASR #16 ; r7 = 33331111
UXTB16 r6,r4 ; r6 = __22__00
UXTB16 r4,r4, ROR #8 ; r4 = __33__11
QADD16 r12,r12,r6 ; r12= xx22xx00
QADD16 r4, r7, r4 ; r4 = xx33xx11
LDRD r6, [r3], #8 ; r6 = 55554444 r7 = 77776666
USAT16 r4, #8, r4 ; r4 = __33__11
USAT16 r12,#8,r12 ; r12= __22__00
ORR r4, r12,r4, LSL #8 ; r4 = 33221100
PKHBT r12,r6, r7, LSL #16 ; r12= 66664444
PKHTB r7, r7, r6, ASR #16 ; r7 = 77775555
UXTB16 r6,r5 ; r6 = __66__44
UXTB16 r5,r5, ROR #8 ; r5 = __77__55
QADD16 r12,r12,r6 ; r12= xx66xx44
QADD16 r5, r7, r5 ; r5 = xx77xx55
USAT16 r12,#8, r12 ; r12= __66__44
USAT16 r5, #8, r5 ; r5 = __77__55
ORR r5, r12,r5, LSL #8 ; r5 = 77665544
STRD r4, r5, [r0], r2
BGT ofrinter_v6_lp
LDMFD r13!,{r4-r7,PC}
ENDP
oc_frag_recon_inter2_v6 PROC
; r0 = unsigned char *_dst
; r1 = const unsigned char *_src1
; r2 = const unsigned char *_src2
; r3 = int _ystride
LDR r12,[r13]
; r12= const ogg_int16_t _residue[64]
STMFD r13!,{r4-r9,r14}
MOV r14,#8
ofrinter2_v6_lp
LDRD r6, [r12,#8] ; r6 = 55554444 r7 = 77776666
SUBS r14,r14,#1
LDR r4, [r1, #4] ; Unaligned ; r4 = src1[1] = 77665544
LDR r5, [r2, #4] ; Unaligned ; r5 = src2[1] = 77665544
PKHBT r8, r6, r7, LSL #16 ; r8 = 66664444
PKHTB r9, r7, r6, ASR #16 ; r9 = 77775555
UHADD8 r4, r4, r5 ; r4 = (src1[7,6,5,4] + src2[7,6,5,4])>>1
UXTB16 r5, r4 ; r5 = __66__44
UXTB16 r4, r4, ROR #8 ; r4 = __77__55
QADD16 r8, r8, r5 ; r8 = xx66xx44
QADD16 r9, r9, r4 ; r9 = xx77xx55
LDRD r6,[r12],#16 ; r6 = 11110000 r7 = 33332222
USAT16 r8, #8, r8 ; r8 = __66__44
LDR r4, [r1], r3 ; Unaligned ; r4 = src1[0] = 33221100
USAT16 r9, #8, r9 ; r9 = __77__55
LDR r5, [r2], r3 ; Unaligned ; r5 = src2[0] = 33221100
ORR r9, r8, r9, LSL #8 ; r9 = 77665544
PKHBT r8, r6, r7, LSL #16 ; r8 = 22220000
UHADD8 r4, r4, r5 ; r4 = (src1[3,2,1,0] + src2[3,2,1,0])>>1
PKHTB r7, r7, r6, ASR #16 ; r7 = 33331111
UXTB16 r5, r4 ; r5 = __22__00
UXTB16 r4, r4, ROR #8 ; r4 = __33__11
QADD16 r8, r8, r5 ; r8 = xx22xx00
QADD16 r7, r7, r4 ; r7 = xx33xx11
USAT16 r8, #8, r8 ; r8 = __22__00
USAT16 r7, #8, r7 ; r7 = __33__11
ORR r8, r8, r7, LSL #8 ; r8 = 33221100
STRD r8, r9, [r0], r3
BGT ofrinter2_v6_lp
LDMFD r13!,{r4-r9,PC}
ENDP
]
[ OC_ARM_ASM_NEON
EXPORT oc_frag_copy_list_neon
EXPORT oc_frag_recon_intra_neon
EXPORT oc_frag_recon_inter_neon
EXPORT oc_frag_recon_inter2_neon
oc_frag_copy_list_neon PROC
; r0 = _dst_frame
; r1 = _src_frame
; r2 = _ystride
; r3 = _fragis
; <> = _nfragis
; <> = _frag_buf_offs
LDR r12,[r13] ; r12 = _nfragis
STMFD r13!,{r4-r7,r14}
CMP r12, #1
LDRGE r6, [r3] ; r6 = _fragis[fragii]
LDRGE r14,[r13,#4*6] ; r14 = _frag_buf_offs
BLT ofcl_neon_end
; Stall (2 on Xscale)
LDR r6, [r14,r6, LSL #2] ; r6 = _frag_buf_offs[_fragis[fragii]]
; Stall (on XScale)
MOV r7, r6 ; Guarantee PLD points somewhere valid.
ofcl_neon_lp
ADD r4, r1, r6
VLD1.64 {D0}, [r4@64], r2
ADD r5, r0, r6
VLD1.64 {D1}, [r4@64], r2
SUBS r12, r12, #1
VLD1.64 {D2}, [r4@64], r2
LDRGT r6, [r3,#4]! ; r6 = _fragis[fragii]
VLD1.64 {D3}, [r4@64], r2
LDRGT r6, [r14,r6, LSL #2] ; r6 = _frag_buf_offs[_fragis[fragii]]
VLD1.64 {D4}, [r4@64], r2
ADDGT r7, r1, r6
VLD1.64 {D5}, [r4@64], r2
PLD [r7]
VLD1.64 {D6}, [r4@64], r2
PLD [r7, r2]
VLD1.64 {D7}, [r4@64]
PLD [r7, r2, LSL #1]
VST1.64 {D0}, [r5@64], r2
ADDGT r7, r7, r2, LSL #2
VST1.64 {D1}, [r5@64], r2
PLD [r7, -r2]
VST1.64 {D2}, [r5@64], r2
PLD [r7]
VST1.64 {D3}, [r5@64], r2
PLD [r7, r2]
VST1.64 {D4}, [r5@64], r2
PLD [r7, r2, LSL #1]
VST1.64 {D5}, [r5@64], r2
ADDGT r7, r7, r2, LSL #2
VST1.64 {D6}, [r5@64], r2
PLD [r7, -r2]
VST1.64 {D7}, [r5@64]
BGT ofcl_neon_lp
ofcl_neon_end
LDMFD r13!,{r4-r7,PC}
ENDP
oc_frag_recon_intra_neon PROC
; r0 = unsigned char *_dst
; r1 = int _ystride
; r2 = const ogg_int16_t _residue[64]
VMOV.I16 Q0, #128
VLDMIA r2, {D16-D31} ; D16= 3333222211110000 etc ; 9(8) cycles
VQADD.S16 Q8, Q8, Q0
VQADD.S16 Q9, Q9, Q0
VQADD.S16 Q10,Q10,Q0
VQADD.S16 Q11,Q11,Q0
VQADD.S16 Q12,Q12,Q0
VQADD.S16 Q13,Q13,Q0
VQADD.S16 Q14,Q14,Q0
VQADD.S16 Q15,Q15,Q0
VQMOVUN.S16 D16,Q8 ; D16= 7766554433221100 ; 1 cycle
VQMOVUN.S16 D17,Q9 ; D17= FFEEDDCCBBAA9988 ; 1 cycle
VQMOVUN.S16 D18,Q10 ; D18= NNMMLLKKJJIIHHGG ; 1 cycle
VST1.64 {D16},[r0@64], r1
VQMOVUN.S16 D19,Q11 ; D19= VVUUTTSSRRQQPPOO ; 1 cycle
VST1.64 {D17},[r0@64], r1
VQMOVUN.S16 D20,Q12 ; D20= ddccbbaaZZYYXXWW ; 1 cycle
VST1.64 {D18},[r0@64], r1
VQMOVUN.S16 D21,Q13 ; D21= llkkjjiihhggffee ; 1 cycle
VST1.64 {D19},[r0@64], r1
VQMOVUN.S16 D22,Q14 ; D22= ttssrrqqppoonnmm ; 1 cycle
VST1.64 {D20},[r0@64], r1
VQMOVUN.S16 D23,Q15 ; D23= !!@@zzyyxxwwvvuu ; 1 cycle
VST1.64 {D21},[r0@64], r1
VST1.64 {D22},[r0@64], r1
VST1.64 {D23},[r0@64], r1
MOV PC,R14
ENDP
oc_frag_recon_inter_neon PROC
; r0 = unsigned char *_dst
; r1 = const unsigned char *_src
; r2 = int _ystride
; r3 = const ogg_int16_t _residue[64]
VLDMIA r3, {D16-D31} ; D16= 3333222211110000 etc ; 9(8) cycles
VLD1.64 {D0}, [r1], r2
VLD1.64 {D2}, [r1], r2
VMOVL.U8 Q0, D0 ; Q0 = __77__66__55__44__33__22__11__00
VLD1.64 {D4}, [r1], r2
VMOVL.U8 Q1, D2 ; etc
VLD1.64 {D6}, [r1], r2
VMOVL.U8 Q2, D4
VMOVL.U8 Q3, D6
VQADD.S16 Q8, Q8, Q0
VLD1.64 {D0}, [r1], r2
VQADD.S16 Q9, Q9, Q1
VLD1.64 {D2}, [r1], r2
VQADD.S16 Q10,Q10,Q2
VLD1.64 {D4}, [r1], r2
VQADD.S16 Q11,Q11,Q3
VLD1.64 {D6}, [r1], r2
VMOVL.U8 Q0, D0
VMOVL.U8 Q1, D2
VMOVL.U8 Q2, D4
VMOVL.U8 Q3, D6
VQADD.S16 Q12,Q12,Q0
VQADD.S16 Q13,Q13,Q1
VQADD.S16 Q14,Q14,Q2
VQADD.S16 Q15,Q15,Q3
VQMOVUN.S16 D16,Q8
VQMOVUN.S16 D17,Q9
VQMOVUN.S16 D18,Q10
VST1.64 {D16},[r0@64], r2
VQMOVUN.S16 D19,Q11
VST1.64 {D17},[r0@64], r2
VQMOVUN.S16 D20,Q12
VST1.64 {D18},[r0@64], r2
VQMOVUN.S16 D21,Q13
VST1.64 {D19},[r0@64], r2
VQMOVUN.S16 D22,Q14
VST1.64 {D20},[r0@64], r2
VQMOVUN.S16 D23,Q15
VST1.64 {D21},[r0@64], r2
VST1.64 {D22},[r0@64], r2
VST1.64 {D23},[r0@64], r2
MOV PC,R14
ENDP
oc_frag_recon_inter2_neon PROC
; r0 = unsigned char *_dst
; r1 = const unsigned char *_src1
; r2 = const unsigned char *_src2
; r3 = int _ystride
LDR r12,[r13]
; r12= const ogg_int16_t _residue[64]
VLDMIA r12,{D16-D31}
VLD1.64 {D0}, [r1], r3
VLD1.64 {D4}, [r2], r3
VLD1.64 {D1}, [r1], r3
VLD1.64 {D5}, [r2], r3
VHADD.U8 Q2, Q0, Q2 ; Q2 = FFEEDDCCBBAA99887766554433221100
VLD1.64 {D2}, [r1], r3
VLD1.64 {D6}, [r2], r3
VMOVL.U8 Q0, D4 ; Q0 = __77__66__55__44__33__22__11__00
VLD1.64 {D3}, [r1], r3
VMOVL.U8 Q2, D5 ; etc
VLD1.64 {D7}, [r2], r3
VHADD.U8 Q3, Q1, Q3
VQADD.S16 Q8, Q8, Q0
VQADD.S16 Q9, Q9, Q2
VLD1.64 {D0}, [r1], r3
VMOVL.U8 Q1, D6
VLD1.64 {D4}, [r2], r3
VMOVL.U8 Q3, D7
VLD1.64 {D1}, [r1], r3
VQADD.S16 Q10,Q10,Q1
VLD1.64 {D5}, [r2], r3
VQADD.S16 Q11,Q11,Q3
VLD1.64 {D2}, [r1], r3
VHADD.U8 Q2, Q0, Q2
VLD1.64 {D6}, [r2], r3
VLD1.64 {D3}, [r1], r3
VMOVL.U8 Q0, D4
VLD1.64 {D7}, [r2], r3
VMOVL.U8 Q2, D5
VHADD.U8 Q3, Q1, Q3
VQADD.S16 Q12,Q12,Q0
VQADD.S16 Q13,Q13,Q2
VMOVL.U8 Q1, D6
VMOVL.U8 Q3, D7
VQADD.S16 Q14,Q14,Q1
VQADD.S16 Q15,Q15,Q3
VQMOVUN.S16 D16,Q8
VQMOVUN.S16 D17,Q9
VQMOVUN.S16 D18,Q10
VST1.64 {D16},[r0@64], r3
VQMOVUN.S16 D19,Q11
VST1.64 {D17},[r0@64], r3
VQMOVUN.S16 D20,Q12
VST1.64 {D18},[r0@64], r3
VQMOVUN.S16 D21,Q13
VST1.64 {D19},[r0@64], r3
VQMOVUN.S16 D22,Q14
VST1.64 {D20},[r0@64], r3
VQMOVUN.S16 D23,Q15
VST1.64 {D21},[r0@64], r3
VST1.64 {D22},[r0@64], r3
VST1.64 {D23},[r0@64], r3
MOV PC,R14
ENDP
]
END
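
For orientation, the three fragment reconstruction routines above (ARMv4, ARMv6 media, and NEON variants) all compute the same per-pixel result. The following is a minimal portable C sketch of that arithmetic, not the library's own C fallback. The names `clamp255`, `frag_recon_intra_ref`, `frag_recon_inter_ref`, and `frag_recon_inter2_ref` are hypothetical and chosen only for this illustration; the sketch assumes an 8x8 fragment with the residue stored row by row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the unsigned 8-bit range [0,255]. */
static unsigned char clamp255(int x){
  return (unsigned char)(x<0?0:x>255?255:x);
}

/* Intra: dst = clamp(residue+128), matching the QADD16/USAT16 and
   VQADD/VQMOVUN sequences in the assembly (hypothetical name). */
static void frag_recon_intra_ref(unsigned char *dst,int ystride,
 const int16_t residue[64]){
  assert(dst!=NULL);
  for(int i=0;i<8;i++){
    for(int j=0;j<8;j++)dst[j]=clamp255(residue[i*8+j]+128);
    dst+=ystride;
  }
}

/* Inter: dst = clamp(src+residue) (hypothetical name). */
static void frag_recon_inter_ref(unsigned char *dst,const unsigned char *src,
 int ystride,const int16_t residue[64]){
  assert(dst!=NULL&&src!=NULL);
  for(int i=0;i<8;i++){
    for(int j=0;j<8;j++)dst[j]=clamp255(src[j]+residue[i*8+j]);
    dst+=ystride;
    src+=ystride;
  }
}

/* Inter2: dst = clamp(((src1+src2)>>1)+residue).
   The average truncates, as UHADD8 and VHADD.U8 do (hypothetical name). */
static void frag_recon_inter2_ref(unsigned char *dst,
 const unsigned char *src1,const unsigned char *src2,int ystride,
 const int16_t residue[64]){
  assert(dst!=NULL&&src1!=NULL&&src2!=NULL);
  for(int i=0;i<8;i++){
    for(int j=0;j<8;j++){
      dst[j]=clamp255(((src1[j]+src2[j])>>1)+residue[i*8+j]);
    }
    dst+=ystride;
    src1+=ystride;
    src2+=ystride;
  }
}
```

A scalar reference like this is mainly useful for checking a SIMD port pixel by pixel: the saturating 16-bit adds in the assembly give the same result as plain `int` arithmetic followed by the final clamp.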

File diff suppressed because it is too large

View File

@ -1,126 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: x86int.h 17344 2010-07-21 01:42:18Z tterribe $
********************************************************************/
#if !defined(_arm_armint_H)
# define _arm_armint_H (1)
# include "../internal.h"
# if defined(OC_ARM_ASM)
# if defined(__ARMEB__)
# error "Big-endian configurations are not supported by the ARM asm. " \
"Reconfigure with --disable-asm or undefine OC_ARM_ASM."
# endif
# define oc_state_accel_init oc_state_accel_init_arm
/*This function is implemented entirely in asm, so it's helpful to pull out all
of the things that depend on structure offsets.
We reuse the function pointer with the wrong prototype, though.*/
# define oc_state_loop_filter_frag_rows(_state,_bv,_refi,_pli, \
_fragy0,_fragy_end) \
((oc_loop_filter_frag_rows_arm_func) \
(_state)->opt_vtable.state_loop_filter_frag_rows)( \
(_state)->ref_frame_data[(_refi)],(_state)->ref_ystride[(_pli)], \
(_bv), \
(_state)->frags, \
(_state)->fplanes[(_pli)].froffset \
+(_fragy0)*(ptrdiff_t)(_state)->fplanes[(_pli)].nhfrags, \
(_state)->fplanes[(_pli)].froffset \
+(_fragy_end)*(ptrdiff_t)(_state)->fplanes[(_pli)].nhfrags, \
(_state)->fplanes[(_pli)].froffset, \
(_state)->fplanes[(_pli)].froffset+(_state)->fplanes[(_pli)].nfrags, \
(_state)->frag_buf_offs, \
(_state)->fplanes[(_pli)].nhfrags)
/*For everything else the default vtable macros are fine.*/
# define OC_STATE_USE_VTABLE (1)
# endif
# include "../state.h"
# include "armcpu.h"
# if defined(OC_ARM_ASM)
typedef void (*oc_loop_filter_frag_rows_arm_func)(
unsigned char *_ref_frame_data,int _ystride,signed char _bv[256],
const oc_fragment *_frags,ptrdiff_t _fragi0,ptrdiff_t _fragi0_end,
ptrdiff_t _fragi_top,ptrdiff_t _fragi_bot,
const ptrdiff_t *_frag_buf_offs,int _nhfrags);
void oc_state_accel_init_arm(oc_theora_state *_state);
void oc_frag_copy_list_arm(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
void oc_frag_recon_intra_arm(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_arm(unsigned char *_dst,const unsigned char *_src,
int _ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_arm(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue);
void oc_idct8x8_1_arm(ogg_int16_t _y[64],ogg_uint16_t _dc);
void oc_idct8x8_arm(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_state_frag_recon_arm(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void oc_loop_filter_frag_rows_arm(unsigned char *_ref_frame_data,
int _ystride,signed char *_bv,const oc_fragment *_frags,ptrdiff_t _fragi0,
ptrdiff_t _fragi0_end,ptrdiff_t _fragi_top,ptrdiff_t _fragi_bot,
const ptrdiff_t *_frag_buf_offs,int _nhfrags);
# if defined(OC_ARM_ASM_EDSP)
void oc_frag_copy_list_edsp(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
# if defined(OC_ARM_ASM_MEDIA)
void oc_frag_recon_intra_v6(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_v6(unsigned char *_dst,const unsigned char *_src,
int _ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_v6(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue);
void oc_idct8x8_1_v6(ogg_int16_t _y[64],ogg_uint16_t _dc);
void oc_idct8x8_v6(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_state_frag_recon_v6(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void oc_loop_filter_init_v6(signed char *_bv,int _flimit);
void oc_loop_filter_frag_rows_v6(unsigned char *_ref_frame_data,
int _ystride,signed char *_bv,const oc_fragment *_frags,ptrdiff_t _fragi0,
ptrdiff_t _fragi0_end,ptrdiff_t _fragi_top,ptrdiff_t _fragi_bot,
const ptrdiff_t *_frag_buf_offs,int _nhfrags);
# if defined(OC_ARM_ASM_NEON)
void oc_frag_copy_list_neon(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
void oc_frag_recon_intra_neon(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_neon(unsigned char *_dst,const unsigned char *_src,
int _ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_neon(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue);
void oc_idct8x8_1_neon(ogg_int16_t _y[64],ogg_uint16_t _dc);
void oc_idct8x8_neon(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_state_frag_recon_neon(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void oc_loop_filter_init_neon(signed char *_bv,int _flimit);
void oc_loop_filter_frag_rows_neon(unsigned char *_ref_frame_data,
int _ystride,signed char *_bv,const oc_fragment *_frags,ptrdiff_t _fragi0,
ptrdiff_t _fragi0_end,ptrdiff_t _fragi_top,ptrdiff_t _fragi_bot,
const ptrdiff_t *_frag_buf_offs,int _nhfrags);
# endif
# endif
# endif
# endif
#endif
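
The loop filter entry points declared in this header take a filter limit `_flimit` (L in the Theora spec) and a `_bv` bounding-value table. As a rough guide to what those routines compute on each line of four pixels across a fragment edge, here is a small C sketch. It is an assumption-laden illustration rather than the library's implementation: the names `floor_div8`, `lflim_ref`, `clamp255`, and `loop_filter_4_ref` are hypothetical, and the limit function is evaluated directly instead of through a lookup table.

```c
#include <assert.h>
#include <stdlib.h>

/* Floor division by 8 that does not rely on the implementation-defined
   behaviour of >> on negative ints (hypothetical helper). */
static int floor_div8(int x){
  return x>=0?x>>3:-((-x+7)>>3);
}

/* Limit function: sign(r)*min(|r|,max(2*l-|r|,0)).
   Small corrections pass through, larger ones taper off and reach zero
   once |r|>=2*l (hypothetical name). */
static int lflim_ref(int r,int l){
  int a;
  int lim;
  assert(l>=0);
  a=abs(r);
  lim=2*l-a;
  if(lim<0)lim=0;
  if(a>lim)a=lim;
  return r<0?-a:a;
}

/* Clamp an int to the unsigned 8-bit range [0,255]. */
static unsigned char clamp255(int x){
  return (unsigned char)(x<0?0:x>255?255:x);
}

/* Filter one line of four pixels; the edge lies between pix[1] and pix[2].
   Only the two pixels adjacent to the edge are modified (hypothetical name). */
static void loop_filter_4_ref(unsigned char pix[4],int l){
  int f=floor_div8(pix[0]-pix[3]+3*(pix[2]-pix[1])+4);
  int d=lflim_ref(f,l);
  pix[1]=clamp255(pix[1]+d);
  pix[2]=clamp255(pix[2]-d);
}
```

A horizontal filter would apply `loop_filter_4_ref` to eight consecutive rows and a vertical filter to eight consecutive columns; the assembly versions differ mainly in how they vectorise that loop and how they encode the limit.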

View File

@ -1,676 +0,0 @@
;********************************************************************
;* *
;* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
;* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
;* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
;* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
;* *
;* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
;* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
;* *
;********************************************************************
; Original implementation:
; Copyright (C) 2009 Robin Watts for Pinknoise Productions Ltd
; last mod: $Id$
;********************************************************************
AREA |.text|, CODE, READONLY
GET armopts.s
EXPORT oc_loop_filter_frag_rows_arm
; Which bit this is depends on the order of packing within a bitfield.
; Hopefully that doesn't change among any of the relevant compilers.
OC_FRAG_CODED_FLAG * 1
; Vanilla ARM v4 version
loop_filter_h_arm PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int *_bv
; preserves r0-r3
STMFD r13!,{r3-r6,r14}
MOV r14,#8
MOV r6, #255
lfh_arm_lp
LDRB r3, [r0, #-2] ; r3 = _pix[0]
LDRB r12,[r0, #1] ; r12= _pix[3]
LDRB r4, [r0, #-1] ; r4 = _pix[1]
LDRB r5, [r0] ; r5 = _pix[2]
SUB r3, r3, r12 ; r3 = _pix[0]-_pix[3]
ADD r3, r3, #4 ; r3 = _pix[0]-_pix[3]+4
SUB r12,r5, r4 ; r12= _pix[2]-_pix[1]
ADD r12,r12,r12,LSL #1 ; r12= 3*(_pix[2]-_pix[1])
ADD r12,r12,r3 ; r12= _pix[0]-_pix[3]+3*(_pix[2]-_pix[1])+4
MOV r12,r12,ASR #3
LDRSB r12,[r2, r12]
; Stall (2 on Xscale)
ADDS r4, r4, r12
CMPGT r6, r4
EORLT r4, r6, r4, ASR #32
SUBS r5, r5, r12
CMPGT r6, r5
EORLT r5, r6, r5, ASR #32
STRB r4, [r0, #-1]
STRB r5, [r0], r1
SUBS r14,r14,#1
BGT lfh_arm_lp
SUB r0, r0, r1, LSL #3
LDMFD r13!,{r3-r6,PC}
ENDP
loop_filter_v_arm PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int *_bv
; preserves r0-r3
STMFD r13!,{r3-r6,r14}
MOV r14,#8
MOV r6, #255
lfv_arm_lp
LDRB r3, [r0, -r1, LSL #1] ; r3 = _pix[0]
LDRB r12,[r0, r1] ; r12= _pix[3]
LDRB r4, [r0, -r1] ; r4 = _pix[1]
LDRB r5, [r0] ; r5 = _pix[2]
SUB r3, r3, r12 ; r3 = _pix[0]-_pix[3]
ADD r3, r3, #4 ; r3 = _pix[0]-_pix[3]+4
SUB r12,r5, r4 ; r12= _pix[2]-_pix[1]
ADD r12,r12,r12,LSL #1 ; r12= 3*(_pix[2]-_pix[1])
ADD r12,r12,r3 ; r12= _pix[0]-_pix[3]+3*(_pix[2]-_pix[1])+4
MOV r12,r12,ASR #3
LDRSB r12,[r2, r12]
; Stall (2 on Xscale)
ADDS r4, r4, r12
CMPGT r6, r4
EORLT r4, r6, r4, ASR #32
SUBS r5, r5, r12
CMPGT r6, r5
EORLT r5, r6, r5, ASR #32
STRB r4, [r0, -r1]
STRB r5, [r0], #1
SUBS r14,r14,#1
BGT lfv_arm_lp
SUB r0, r0, #8
LDMFD r13!,{r3-r6,PC}
ENDP
oc_loop_filter_frag_rows_arm PROC
; r0 = _ref_frame_data
; r1 = _ystride
; r2 = _bv
; r3 = _frags
; r4 = _fragi0
; r5 = _fragi0_end
; r6 = _fragi_top
; r7 = _fragi_bot
; r8 = _frag_buf_offs
; r9 = _nhfrags
MOV r12,r13
STMFD r13!,{r0,r4-r11,r14}
LDMFD r12,{r4-r9}
ADD r2, r2, #127 ; _bv += 127
CMP r4, r5 ; if(_fragi0>=_fragi0_end)
BGE oslffri_arm_end ; bail
SUBS r9, r9, #1 ; r9 = _nhfrags-1 if (r9<=0)
BLE oslffri_arm_end ; bail
ADD r3, r3, r4, LSL #2 ; r3 = &_frags[fragi]
ADD r8, r8, r4, LSL #2 ; r8 = &_frag_buf_offs[fragi]
SUB r7, r7, r9 ; _fragi_bot -= _nhfrags;
oslffri_arm_lp1
MOV r10,r4 ; r10= fragi = _fragi0
ADD r11,r4, r9 ; r11= fragi_end-1=fragi+_nhfrags-1
oslffri_arm_lp2
LDR r14,[r3], #4 ; r14= _frags[fragi] _frags++
LDR r0, [r13] ; r0 = _ref_frame_data
LDR r12,[r8], #4 ; r12= _frag_buf_offs[fragi] _frag_buf_offs++
TST r14,#OC_FRAG_CODED_FLAG
BEQ oslffri_arm_uncoded
CMP r10,r4 ; if (fragi>_fragi0)
ADD r0, r0, r12 ; r0 = _ref_frame_data + _frag_buf_offs[fragi]
BLGT loop_filter_h_arm
CMP r4, r6 ; if (_fragi0>_fragi_top)
BLGT loop_filter_v_arm
CMP r10,r11 ; if(fragi+1<fragi_end)===(fragi<fragi_end-1)
LDRLT r12,[r3] ; r12 = _frags[fragi+1]
ADD r0, r0, #8
ADD r10,r10,#1 ; r10 = fragi+1;
ANDLT r12,r12,#OC_FRAG_CODED_FLAG
CMPLT r12,#OC_FRAG_CODED_FLAG ; && _frags[fragi+1].coded==0
BLLT loop_filter_h_arm
CMP r10,r7 ; if (fragi<_fragi_bot)
LDRLT r12,[r3, r9, LSL #2] ; r12 = _frags[fragi+1+_nhfrags-1]
SUB r0, r0, #8
ADD r0, r0, r1, LSL #3
ANDLT r12,r12,#OC_FRAG_CODED_FLAG
CMPLT r12,#OC_FRAG_CODED_FLAG
BLLT loop_filter_v_arm
CMP r10,r11 ; while(fragi<=fragi_end-1)
BLE oslffri_arm_lp2
MOV r4, r10 ; r4 = fragi0 += _nhfrags
CMP r4, r5
BLT oslffri_arm_lp1
oslffri_arm_end
LDMFD r13!,{r0,r4-r11,PC}
oslffri_arm_uncoded
ADD r10,r10,#1
CMP r10,r11
BLE oslffri_arm_lp2
MOV r4, r10 ; r4 = _fragi0 += _nhfrags
CMP r4, r5
BLT oslffri_arm_lp1
LDMFD r13!,{r0,r4-r11,PC}
ENDP
[ OC_ARM_ASM_MEDIA
EXPORT oc_loop_filter_init_v6
EXPORT oc_loop_filter_frag_rows_v6
oc_loop_filter_init_v6 PROC
; r0 = _bv
; r1 = _flimit (=L from the spec)
MVN r1, r1, LSL #1 ; r1 = <0xFFFFFF|255-2*L>
AND r1, r1, #255 ; r1 = ll=r1&0xFF
ORR r1, r1, r1, LSL #8 ; r1 = <ll|ll>
PKHBT r1, r1, r1, LSL #16 ; r1 = <ll|ll|ll|ll>
STR r1, [r0]
MOV PC,r14
ENDP
; We could use the same strategy as the v filter below, but that would require
; 40 instructions to load the data and transpose it into columns and another
; 32 to write out the results at the end, plus the 52 instructions to do the
; filtering itself.
; This is slightly less, and less code, even assuming we could have shared the
; 52 instructions in the middle with the other function.
; It executes slightly fewer instructions than the ARMv6 approach David Conrad
; proposed for FFmpeg, but not by much:
; http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2010-February/083141.html
; His is a lot less code, though, because it only does two rows at once instead
; of four.
loop_filter_h_v6 PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int _ll
; preserves r0-r3
STMFD r13!,{r4-r11,r14}
LDR r12,=0x10003
BL loop_filter_h_core_v6
ADD r0, r0, r1, LSL #2
BL loop_filter_h_core_v6
SUB r0, r0, r1, LSL #2
LDMFD r13!,{r4-r11,PC}
ENDP
loop_filter_h_core_v6 PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int _ll
; r12= 0x10003
; Preserves r0-r3, r12; Clobbers r4-r11.
LDR r4,[r0, #-2]! ; r4 = <p3|p2|p1|p0>
; Single issue
LDR r5,[r0, r1]! ; r5 = <q3|q2|q1|q0>
UXTB16 r6, r4, ROR #16 ; r6 = <p0|p2>
UXTB16 r4, r4, ROR #8 ; r4 = <p3|p1>
UXTB16 r7, r5, ROR #16 ; r7 = <q0|q2>
UXTB16 r5, r5, ROR #8 ; r5 = <q3|q1>
PKHBT r8, r4, r5, LSL #16 ; r8 = <__|q1|__|p1>
PKHBT r9, r6, r7, LSL #16 ; r9 = <__|q2|__|p2>
SSUB16 r6, r4, r6 ; r6 = <p3-p0|p1-p2>
SMLAD r6, r6, r12,r12 ; r6 = <????|(p3-p0)+3*(p1-p2)+3>
SSUB16 r7, r5, r7 ; r7 = <q3-q0|q1-q2>
SMLAD r7, r7, r12,r12 ; r7 = <????|(q3-q0)+3*(q1-q2)+3>
LDR r4,[r0, r1]! ; r4 = <r3|r2|r1|r0>
MOV r6, r6, ASR #3 ; r6 = <??????|(p3-p0)+3*(p1-p2)+3>>3>
LDR r5,[r0, r1]! ; r5 = <s3|s2|s1|s0>
PKHBT r11,r6, r7, LSL #13 ; r11= <??|-R_q|??|-R_p>
UXTB16 r6, r4, ROR #16 ; r6 = <r0|r2>
UXTB16 r11,r11 ; r11= <__|-R_q|__|-R_p>
UXTB16 r4, r4, ROR #8 ; r4 = <r3|r1>
UXTB16 r7, r5, ROR #16 ; r7 = <s0|s2>
PKHBT r10,r6, r7, LSL #16 ; r10= <__|s2|__|r2>
SSUB16 r6, r4, r6 ; r6 = <r3-r0|r1-r2>
UXTB16 r5, r5, ROR #8 ; r5 = <s3|s1>
SMLAD r6, r6, r12,r12 ; r6 = <????|(r3-r0)+3*(r1-r2)+3>
SSUB16 r7, r5, r7 ; r7 = <s3-s0|s1-s2>
SMLAD r7, r7, r12,r12 ; r7 = <????|(s3-s0)+3*(s1-s2)+3>
ORR r9, r9, r10, LSL #8 ; r9 = <s2|q2|r2|p2>
MOV r6, r6, ASR #3 ; r6 = <??????|(r3-r0)+3*(r1-r2)+3>>3>
PKHBT r10,r4, r5, LSL #16 ; r10= <__|s1|__|r1>
PKHBT r6, r6, r7, LSL #13 ; r6 = <??|-R_s|??|-R_r>
ORR r8, r8, r10, LSL #8 ; r8 = <s1|q1|r1|p1>
UXTB16 r6, r6 ; r6 = <__|-R_s|__|-R_r>
MOV r10,#0
ORR r6, r11,r6, LSL #8 ; r6 = <-R_s|-R_q|-R_r|-R_p>
; Single issue
; There's no min, max or abs instruction.
; SSUB8 and SEL will work for abs, and we can do all the rest with
; unsigned saturated adds, which means the GE flags are still all
; set when we're done computing lflim(abs(R_i),L).
; This allows us to both add and subtract, and split the results by
; the original sign of R_i.
SSUB8 r7, r10,r6
; Single issue
SEL r7, r7, r6 ; r7 = abs(R_i)
; Single issue
UQADD8 r4, r7, r2 ; r4 = 255-max(2*L-abs(R_i),0)
; Single issue
UQADD8 r7, r7, r4
; Single issue
UQSUB8 r7, r7, r4 ; r7 = min(abs(R_i),max(2*L-abs(R_i),0))
; Single issue
UQSUB8 r4, r8, r7
UQADD8 r5, r9, r7
UQADD8 r8, r8, r7
UQSUB8 r9, r9, r7
SEL r8, r8, r4 ; r8 = p1+lflim(R_i,L)
SEL r9, r9, r5 ; r9 = p2-lflim(R_i,L)
MOV r5, r9, LSR #24 ; r5 = s2
STRB r5, [r0,#2]!
MOV r4, r8, LSR #24 ; r4 = s1
STRB r4, [r0,#-1]
MOV r5, r9, LSR #8 ; r5 = r2
STRB r5, [r0,-r1]!
MOV r4, r8, LSR #8 ; r4 = r1
STRB r4, [r0,#-1]
MOV r5, r9, LSR #16 ; r5 = q2
STRB r5, [r0,-r1]!
MOV r4, r8, LSR #16 ; r4 = q1
STRB r4, [r0,#-1]
; Single issue
STRB r9, [r0,-r1]!
; Single issue
STRB r8, [r0,#-1]
MOV PC,r14
ENDP
; This uses the same strategy as the MMXEXT version for x86, except that UHADD8
; computes (a+b>>1) instead of (a+b+1>>1) like PAVGB.
; This works just as well, with the following procedure for computing the
; filter value, f:
; u = ~UHADD8(p1,~p2);
; v = UHADD8(~p1,p2);
; m = v-u;
; a = m^UHADD8(m^p0,m^~p3);
; f = UHADD8(UHADD8(a,u),v);
; where f = 127+R, with R in [-127,128] defined as in the spec.
; This is exactly the same amount of arithmetic as the version that uses PAVGB
; as the basic operator.
; It executes about 2/3 the number of instructions of David Conrad's approach,
; but requires more code, because it does all eight columns at once, instead
; of four at a time.
loop_filter_v_v6 PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int _ll
; preserves r0-r11
STMFD r13!,{r4-r11,r14}
LDRD r6, [r0, -r1]! ; r7, r6 = <p5|p1>
LDRD r4, [r0, -r1] ; r5, r4 = <p4|p0>
LDRD r8, [r0, r1]! ; r9, r8 = <p6|p2>
MVN r14,r6 ; r14= ~p1
LDRD r10,[r0, r1] ; r11,r10= <p7|p3>
; Filter the first four columns.
MVN r12,r8 ; r12= ~p2
UHADD8 r14,r14,r8 ; r14= v1=~p1+p2>>1
UHADD8 r12,r12,r6 ; r12= p1+~p2>>1
MVN r10, r10 ; r10=~p3
MVN r12,r12 ; r12= u1=~p1+p2+1>>1
SSUB8 r14,r14,r12 ; r14= m1=v1-u1
; Single issue
EOR r4, r4, r14 ; r4 = m1^p0
EOR r10,r10,r14 ; r10= m1^~p3
UHADD8 r4, r4, r10 ; r4 = (m1^p0)+(m1^~p3)>>1
; Single issue
EOR r4, r4, r14 ; r4 = a1=m1^((m1^p0)+(m1^~p3)>>1)
SADD8 r14,r14,r12 ; r14= v1=m1+u1
UHADD8 r4, r4, r12 ; r4 = a1+u1>>1
MVN r12,r9 ; r12= ~p6
UHADD8 r4, r4, r14 ; r4 = f1=(a1+u1>>1)+v1>>1
; Filter the second four columns.
MVN r14,r7 ; r14= ~p5
UHADD8 r12,r12,r7 ; r12= p5+~p6>>1
UHADD8 r14,r14,r9 ; r14= v2=~p5+p6>>1
MVN r12,r12 ; r12= u2=~p5+p6+1>>1
MVN r11,r11 ; r11=~p7
SSUB8 r10,r14,r12 ; r10= m2=v2-u2
; Single issue
EOR r5, r5, r10 ; r5 = m2^p4
EOR r11,r11,r10 ; r11= m2^~p7
UHADD8 r5, r5, r11 ; r5 = (m2^p4)+(m2^~p7)>>1
; Single issue
EOR r5, r5, r10 ; r5 = a2=m2^((m2^p4)+(m2^~p7)>>1)
; Single issue
UHADD8 r5, r5, r12 ; r5 = a2+u2>>1
LDR r12,=0x7F7F7F7F ; r12 = {127}x4
UHADD8 r5, r5, r14 ; r5 = f2=(a2+u2>>1)+v2>>1
; Now split f[i] by sign.
; There's no min or max instruction.
; We could use SSUB8 and SEL, but this is just as many instructions and
; dual issues more (for v7 without NEON).
UQSUB8 r10,r4, r12 ; r10= R_i>0?R_i:0
UQSUB8 r4, r12,r4 ; r4 = R_i<0?-R_i:0
UQADD8 r11,r10,r2 ; r11= 255-max(2*L-abs(R_i>0),0)
UQADD8 r14,r4, r2 ; r14= 255-max(2*L-abs(R_i<0),0)
UQADD8 r10,r10,r11
UQADD8 r4, r4, r14
UQSUB8 r10,r10,r11 ; r10= min(abs(R_i>0),max(2*L-abs(R_i>0),0))
UQSUB8 r4, r4, r14 ; r4 = min(abs(R_i<0),max(2*L-abs(R_i<0),0))
UQSUB8 r11,r5, r12 ; r11= R_i>0?R_i:0
UQADD8 r6, r6, r10
UQSUB8 r8, r8, r10
UQSUB8 r5, r12,r5 ; r5 = R_i<0?-R_i:0
UQSUB8 r6, r6, r4 ; r6 = p1+lflim(R_i,L)
UQADD8 r8, r8, r4 ; r8 = p2-lflim(R_i,L)
UQADD8 r10,r11,r2 ; r10= 255-max(2*L-abs(R_i>0),0)
UQADD8 r14,r5, r2 ; r14= 255-max(2*L-abs(R_i<0),0)
UQADD8 r11,r11,r10
UQADD8 r5, r5, r14
UQSUB8 r11,r11,r10 ; r11= min(abs(R_i>0),max(2*L-abs(R_i>0),0))
UQSUB8 r5, r5, r14 ; r5 = min(abs(R_i<0),max(2*L-abs(R_i<0),0))
UQADD8 r7, r7, r11
UQSUB8 r9, r9, r11
UQSUB8 r7, r7, r5 ; r7 = p5+lflim(R_i,L)
STRD r6, [r0, -r1] ; [p5:p1] = [r7: r6]
UQADD8 r9, r9, r5 ; r9 = p6-lflim(R_i,L)
STRD r8, [r0] ; [p6:p2] = [r9: r8]
LDMFD r13!,{r4-r11,PC}
ENDP
oc_loop_filter_frag_rows_v6 PROC
; r0 = _ref_frame_data
; r1 = _ystride
; r2 = _bv
; r3 = _frags
; r4 = _fragi0
; r5 = _fragi0_end
; r6 = _fragi_top
; r7 = _fragi_bot
; r8 = _frag_buf_offs
; r9 = _nhfrags
MOV r12,r13
STMFD r13!,{r0,r4-r11,r14}
LDMFD r12,{r4-r9}
LDR r2, [r2] ; ll = *(int *)_bv
CMP r4, r5 ; if(_fragi0>=_fragi0_end)
BGE oslffri_v6_end ; bail
SUBS r9, r9, #1 ; r9 = _nhfrags-1 if (r9<=0)
BLE oslffri_v6_end ; bail
ADD r3, r3, r4, LSL #2 ; r3 = &_frags[fragi]
ADD r8, r8, r4, LSL #2 ; r8 = &_frag_buf_offs[fragi]
SUB r7, r7, r9 ; _fragi_bot -= _nhfrags;
oslffri_v6_lp1
MOV r10,r4 ; r10= fragi = _fragi0
ADD r11,r4, r9 ; r11= fragi_end-1=fragi+_nhfrags-1
oslffri_v6_lp2
LDR r14,[r3], #4 ; r14= _frags[fragi] _frags++
LDR r0, [r13] ; r0 = _ref_frame_data
LDR r12,[r8], #4 ; r12= _frag_buf_offs[fragi] _frag_buf_offs++
TST r14,#OC_FRAG_CODED_FLAG
BEQ oslffri_v6_uncoded
CMP r10,r4 ; if (fragi>_fragi0)
ADD r0, r0, r12 ; r0 = _ref_frame_data + _frag_buf_offs[fragi]
BLGT loop_filter_h_v6
CMP r4, r6 ; if (fragi0>_fragi_top)
BLGT loop_filter_v_v6
CMP r10,r11 ; if(fragi+1<fragi_end)===(fragi<fragi_end-1)
LDRLT r12,[r3] ; r12 = _frags[fragi+1]
ADD r0, r0, #8
ADD r10,r10,#1 ; r10 = fragi+1;
ANDLT r12,r12,#OC_FRAG_CODED_FLAG
CMPLT r12,#OC_FRAG_CODED_FLAG ; && _frags[fragi+1].coded==0
BLLT loop_filter_h_v6
CMP r10,r7 ; if (fragi<_fragi_bot)
LDRLT r12,[r3, r9, LSL #2] ; r12 = _frags[fragi+1+_nhfrags-1]
SUB r0, r0, #8
ADD r0, r0, r1, LSL #3
ANDLT r12,r12,#OC_FRAG_CODED_FLAG
CMPLT r12,#OC_FRAG_CODED_FLAG
BLLT loop_filter_v_v6
CMP r10,r11 ; while(fragi<=fragi_end-1)
BLE oslffri_v6_lp2
MOV r4, r10 ; r4 = fragi0 += nhfrags
CMP r4, r5
BLT oslffri_v6_lp1
oslffri_v6_end
LDMFD r13!,{r0,r4-r11,PC}
oslffri_v6_uncoded
ADD r10,r10,#1
CMP r10,r11
BLE oslffri_v6_lp2
MOV r4, r10 ; r4 = fragi0 += nhfrags
CMP r4, r5
BLT oslffri_v6_lp1
LDMFD r13!,{r0,r4-r11,PC}
ENDP
]
[ OC_ARM_ASM_NEON
EXPORT oc_loop_filter_init_neon
EXPORT oc_loop_filter_frag_rows_neon
oc_loop_filter_init_neon PROC
; r0 = _bv
; r1 = _flimit (=L from the spec)
MOV r1, r1, LSL #1 ; r1 = 2*L
VDUP.S16 Q15, r1 ; Q15= 2L in U16s
VST1.64 {D30,D31}, [r0@128]
MOV PC,r14
ENDP
loop_filter_h_neon PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int *_bv
; preserves r0-r3
; We assume Q15= 2*L in U16s
; My best guesses at cycle counts (and latency)--vvv
SUB r12,r0, #2
; Doing a 2-element structure load saves doing two VTRN's below, at the
; cost of using two more slower single-lane loads vs. the faster
; all-lane loads.
; It's less code this way, though, and benches a hair faster, but it
; leaves D2 and D4 swapped.
VLD2.16 {D0[],D2[]}, [r12], r1 ; D0 = ____________1100 2,1
; D2 = ____________3322
VLD2.16 {D4[],D6[]}, [r12], r1 ; D4 = ____________5544 2,1
; D6 = ____________7766
VLD2.16 {D0[1],D2[1]},[r12], r1 ; D0 = ________99881100 3,1
; D2 = ________BBAA3322
VLD2.16 {D4[1],D6[1]},[r12], r1 ; D4 = ________DDCC5544 3,1
; D6 = ________FFEE7766
VLD2.16 {D0[2],D2[2]},[r12], r1 ; D0 = ____HHGG99881100 3,1
; D2 = ____JJIIBBAA3322
VLD2.16 {D4[2],D6[2]},[r12], r1 ; D4 = ____LLKKDDCC5544 3,1
; D6 = ____NNMMFFEE7766
VLD2.16 {D0[3],D2[3]},[r12], r1 ; D0 = PPOOHHGG99881100 3,1
; D2 = RRQQJJIIBBAA3322
VLD2.16 {D4[3],D6[3]},[r12], r1 ; D4 = TTSSLLKKDDCC5544 3,1
; D6 = VVUUNNMMFFEE7766
VTRN.8 D0, D4 ; D0 = SSOOKKGGCC884400 D4 = TTPPLLHHDD995511 1,1
VTRN.8 D2, D6 ; D2 = UUQQMMIIEEAA6622 D6 = VVRRNNJJFFBB7733 1,1
VSUBL.U8 Q0, D0, D6 ; Q0 = 00 - 33 in S16s 1,3
VSUBL.U8 Q8, D2, D4 ; Q8 = 22 - 11 in S16s 1,3
ADD r12,r0, #8
VADD.S16 Q0, Q0, Q8 ; 1,3
PLD [r12]
VADD.S16 Q0, Q0, Q8 ; 1,3
PLD [r12,r1]
VADD.S16 Q0, Q0, Q8 ; Q0 = [0-3]+3*[2-1] 1,3
PLD [r12,r1, LSL #1]
VRSHR.S16 Q0, Q0, #3 ; Q0 = f = ([0-3]+3*[2-1]+4)>>3 1,4
ADD r12,r12,r1, LSL #2
; We want to do
; f = CLAMP(MIN(-2L-f,0), f, MAX(2L-f,0))
; = ((f >= 0) ? MIN( f ,MAX(2L- f ,0)) : MAX( f , MIN(-2L- f ,0)))
; = ((f >= 0) ? MIN(|f|,MAX(2L-|f|,0)) : MAX(-|f|, MIN(-2L+|f|,0)))
; = ((f >= 0) ? MIN(|f|,MAX(2L-|f|,0)) :-MIN( |f|,-MIN(-2L+|f|,0)))
; = ((f >= 0) ? MIN(|f|,MAX(2L-|f|,0)) :-MIN( |f|, MAX( 2L-|f|,0)))
; So we've reduced the left and right hand terms to be the same, except
; for a negation.
; Stall x3
VABS.S16 Q9, Q0 ; Q9 = |f| in U16s 1,4
PLD [r12,-r1]
VSHR.S16 Q0, Q0, #15 ; Q0 = -1 or 0 according to sign 1,3
PLD [r12]
VQSUB.U16 Q10,Q15,Q9 ; Q10= MAX(2L-|f|,0) in U16s 1,4
PLD [r12,r1]
VMOVL.U8 Q1, D2 ; Q1 = __UU__QQ__MM__II__EE__AA__66__22 2,3
PLD [r12,r1,LSL #1]
VMIN.U16 Q9, Q10,Q9 ; Q9 = MIN(|f|,MAX(2L-|f|,0)) 1,4
ADD r12,r12,r1, LSL #2
; Now we need to correct for the sign of f.
; For negative elements of Q0, we want to subtract the appropriate
; element of Q9. For positive elements we want to add them. No NEON
; instruction exists to do this, so we need to negate the negative
; elements, and we can then just add them. -b = ~(b-1) = (b+s)^s for s = -1.
VADD.S16 Q9, Q9, Q0 ; 1,3
PLD [r12,-r1]
VEOR.S16 Q9, Q9, Q0 ; Q9 = real value of f 1,3
; Bah. No VRSBW.U8
; Stall (just 1 as Q9 not needed to second pipeline stage. I think.)
VADDW.U8 Q2, Q9, D4 ; Q2 = xxTTxxPPxxLLxxHHxxDDxx99xx55xx11 1,3
VSUB.S16 Q1, Q1, Q9 ; Q1 = xxUUxxQQxxMMxxIIxxEExxAAxx66xx22 1,3
VQMOVUN.S16 D4, Q2 ; D4 = TTPPLLHHDD995511 1,1
VQMOVUN.S16 D2, Q1 ; D2 = UUQQMMIIEEAA6622 1,1
SUB r12,r0, #1
VTRN.8 D4, D2 ; D4 = QQPPIIHHAA992211 D2 = UUTTMMLLEEDD6655 1,1
VST1.16 {D4[0]}, [r12], r1
VST1.16 {D2[0]}, [r12], r1
VST1.16 {D4[1]}, [r12], r1
VST1.16 {D2[1]}, [r12], r1
VST1.16 {D4[2]}, [r12], r1
VST1.16 {D2[2]}, [r12], r1
VST1.16 {D4[3]}, [r12], r1
VST1.16 {D2[3]}, [r12], r1
MOV PC,r14
ENDP
loop_filter_v_neon PROC
; r0 = unsigned char *_pix
; r1 = int _ystride
; r2 = int *_bv
; preserves r0-r3
; We assume Q15= 2*L in U16s
; My best guesses at cycle counts (and latency)--vvv
SUB r12,r0, r1, LSL #1
VLD1.64 {D0}, [r12@64], r1 ; D0 = SSOOKKGGCC884400 2,1
VLD1.64 {D2}, [r12@64], r1 ; D2 = TTPPLLHHDD995511 2,1
VLD1.64 {D4}, [r12@64], r1 ; D4 = UUQQMMIIEEAA6622 2,1
VLD1.64 {D6}, [r12@64] ; D6 = VVRRNNJJFFBB7733 2,1
VSUBL.U8 Q8, D4, D2 ; Q8 = 22 - 11 in S16s 1,3
VSUBL.U8 Q0, D0, D6 ; Q0 = 00 - 33 in S16s 1,3
ADD r12, #8
VADD.S16 Q0, Q0, Q8 ; 1,3
PLD [r12]
VADD.S16 Q0, Q0, Q8 ; 1,3
PLD [r12,r1]
VADD.S16 Q0, Q0, Q8 ; Q0 = [0-3]+3*[2-1] 1,3
SUB r12, r0, r1
VRSHR.S16 Q0, Q0, #3 ; Q0 = f = ([0-3]+3*[2-1]+4)>>3 1,4
; We want to do
; f = CLAMP(MIN(-2L-f,0), f, MAX(2L-f,0))
; = ((f >= 0) ? MIN( f ,MAX(2L- f ,0)) : MAX( f , MIN(-2L- f ,0)))
; = ((f >= 0) ? MIN(|f|,MAX(2L-|f|,0)) : MAX(-|f|, MIN(-2L+|f|,0)))
; = ((f >= 0) ? MIN(|f|,MAX(2L-|f|,0)) :-MIN( |f|,-MIN(-2L+|f|,0)))
; = ((f >= 0) ? MIN(|f|,MAX(2L-|f|,0)) :-MIN( |f|, MAX( 2L-|f|,0)))
; So we've reduced the left and right hand terms to be the same, except
; for a negation.
; Stall x3
VABS.S16 Q9, Q0 ; Q9 = |f| in U16s 1,4
VSHR.S16 Q0, Q0, #15 ; Q0 = -1 or 0 according to sign 1,3
; Stall x2
VQSUB.U16 Q10,Q15,Q9 ; Q10= MAX(2L-|f|,0) in U16s 1,4
VMOVL.U8 Q2, D4 ; Q2 = __UU__QQ__MM__II__EE__AA__66__22 2,3
; Stall x2
VMIN.U16 Q9, Q10,Q9 ; Q9 = MIN(|f|,MAX(2L-|f|,0)) 1,4
; Now we need to correct for the sign of f.
; For negative elements of Q0, we want to subtract the appropriate
; element of Q9. For positive elements we want to add them. No NEON
; instruction exists to do this, so we need to negate the negative
; elements, and we can then just add them. a-b = a-(1+!b) = a-1+!b
; Stall x3
VADD.S16 Q9, Q9, Q0 ; 1,3
; Stall x2
VEOR.S16 Q9, Q9, Q0 ; Q9 = real value of f 1,3
; Bah. No VRSBW.U8
; Stall (just 1 as Q9 not needed to second pipeline stage. I think.)
VADDW.U8 Q1, Q9, D2 ; Q1 = xxTTxxPPxxLLxxHHxxDDxx99xx55xx11 1,3
VSUB.S16 Q2, Q2, Q9 ; Q2 = xxUUxxQQxxMMxxIIxxEExxAAxx66xx22 1,3
VQMOVUN.S16 D2, Q1 ; D2 = TTPPLLHHDD995511 1,1
VQMOVUN.S16 D4, Q2 ; D4 = UUQQMMIIEEAA6622 1,1
VST1.64 {D2}, [r12@64], r1
VST1.64 {D4}, [r12@64], r1
MOV PC,r14
ENDP
oc_loop_filter_frag_rows_neon PROC
; r0 = _ref_frame_data
; r1 = _ystride
; r2 = _bv
; r3 = _frags
; r4 = _fragi0
; r5 = _fragi0_end
; r6 = _fragi_top
; r7 = _fragi_bot
; r8 = _frag_buf_offs
; r9 = _nhfrags
MOV r12,r13
STMFD r13!,{r0,r4-r11,r14}
LDMFD r12,{r4-r9}
CMP r4, r5 ; if(_fragi0>=_fragi0_end)
BGE oslffri_neon_end; bail
SUBS r9, r9, #1 ; r9 = _nhfrags-1 if (r9<=0)
BLE oslffri_neon_end ; bail
VLD1.64 {D30,D31}, [r2@128] ; Q15= 2L in U16s
ADD r3, r3, r4, LSL #2 ; r3 = &_frags[fragi]
ADD r8, r8, r4, LSL #2 ; r8 = &_frag_buf_offs[fragi]
SUB r7, r7, r9 ; _fragi_bot -= _nhfrags;
oslffri_neon_lp1
MOV r10,r4 ; r10= fragi = _fragi0
ADD r11,r4, r9 ; r11= fragi_end-1=fragi+_nhfrags-1
oslffri_neon_lp2
LDR r14,[r3], #4 ; r14= _frags[fragi] _frags++
LDR r0, [r13] ; r0 = _ref_frame_data
LDR r12,[r8], #4 ; r12= _frag_buf_offs[fragi] _frag_buf_offs++
TST r14,#OC_FRAG_CODED_FLAG
BEQ oslffri_neon_uncoded
CMP r10,r4 ; if (fragi>_fragi0)
ADD r0, r0, r12 ; r0 = _ref_frame_data + _frag_buf_offs[fragi]
BLGT loop_filter_h_neon
CMP r4, r6 ; if (_fragi0>_fragi_top)
BLGT loop_filter_v_neon
CMP r10,r11 ; if(fragi+1<fragi_end)===(fragi<fragi_end-1)
LDRLT r12,[r3] ; r12 = _frags[fragi+1]
ADD r0, r0, #8
ADD r10,r10,#1 ; r10 = fragi+1;
ANDLT r12,r12,#OC_FRAG_CODED_FLAG
CMPLT r12,#OC_FRAG_CODED_FLAG ; && _frags[fragi+1].coded==0
BLLT loop_filter_h_neon
CMP r10,r7 ; if (fragi<_fragi_bot)
LDRLT r12,[r3, r9, LSL #2] ; r12 = _frags[fragi+1+_nhfrags-1]
SUB r0, r0, #8
ADD r0, r0, r1, LSL #3
ANDLT r12,r12,#OC_FRAG_CODED_FLAG
CMPLT r12,#OC_FRAG_CODED_FLAG
BLLT loop_filter_v_neon
CMP r10,r11 ; while(fragi<=fragi_end-1)
BLE oslffri_neon_lp2
MOV r4, r10 ; r4 = _fragi0 += _nhfrags
CMP r4, r5
BLT oslffri_neon_lp1
oslffri_neon_end
LDMFD r13!,{r0,r4-r11,PC}
oslffri_neon_uncoded
ADD r10,r10,#1
CMP r10,r11
BLE oslffri_neon_lp2
MOV r4, r10 ; r4 = _fragi0 += _nhfrags
CMP r4, r5
BLT oslffri_neon_lp1
LDMFD r13!,{r0,r4-r11,PC}
ENDP
]
END
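
The clamp derivation in the loop filter comments above can be checked with a scalar model. The following is a minimal sketch in C, not code from libtheora: the names `lf_clamp`, `lf_clamp_ref` and `two_l` are ours, and `two_l` stands for the 2*L value the assembly keeps in Q15. It assumes CLAMP(lo,x,hi) means max(lo,min(x,hi)).

```c
#include <assert.h>
#include <stdlib.h>

/* Reference form: CLAMP(MIN(-2L-f,0), f, MAX(2L-f,0)). */
static int lf_clamp_ref(int f, int two_l) {
  int lo = -two_l - f < 0 ? -two_l - f : 0;
  int hi = two_l - f > 0 ? two_l - f : 0;
  int x = f < hi ? f : hi;
  return x > lo ? x : lo;
}

/* Scalar model of the VABS / VSHR / VQSUB / VMIN / VADD / VEOR sequence. */
static int lf_clamp(int f, int two_l) {
  int a, s, t, m;
  assert(two_l >= 0);
  a = abs(f);                      /* VABS.S16: |f| */
  s = f < 0 ? -1 : 0;              /* VSHR.S16 #15: -1 or 0 by sign */
  t = two_l > a ? two_l - a : 0;   /* VQSUB.U16: MAX(2L-|f|,0) */
  m = a < t ? a : t;               /* VMIN.U16 */
  return (m + s) ^ s;              /* VADD then VEOR: negates m when s is -1 */
}
```

The last line relies on two's complement: for s equal to -1, (m-1)^(-1) is the bitwise complement of m-1, which equals -m. For s equal to 0 the value is unchanged.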

@ -1,39 +0,0 @@
;********************************************************************
;* *
;* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
;* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
;* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
;* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
;* *
;* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
;* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
;* *
;********************************************************************
; Original implementation:
; Copyright (C) 2009 Robin Watts for Pinknoise Productions Ltd
; last mod: $Id$
;********************************************************************
; Set the following to 1 if we have EDSP instructions
; (LDRD/STRD, etc., ARMv5E and later).
OC_ARM_ASM_EDSP * 1
; Set the following to 1 if we have ARMv6 media instructions.
OC_ARM_ASM_MEDIA * 1
; Set the following to 1 if we have NEON (some ARMv7)
OC_ARM_ASM_NEON * 1
; Set the following to 1 if LDR/STR can work on unaligned addresses
; This is assumed to be true for ARMv6 and later code
OC_ARM_CAN_UNALIGN * 0
; Large unaligned loads and stores are often configured to cause an exception.
; They cause an 8 cycle stall when they cross a 128-bit (load) or 64-bit (store)
; boundary, so it's usually a bad idea to use them anyway if they can be
; avoided.
; Set the following to 1 if LDRD/STRD can work on unaligned addresses
OC_ARM_CAN_UNALIGN_LDRD * 0
END

@ -1,219 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2010 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: x86state.c 17344 2010-07-21 01:42:18Z tterribe $
********************************************************************/
#include "armint.h"
#if defined(OC_ARM_ASM)
# if defined(OC_ARM_ASM_NEON)
/*This table has been modified from OC_FZIG_ZAG by baking an 8x8 transpose into
the destination.*/
static const unsigned char OC_FZIG_ZAG_NEON[128]={
0, 8, 1, 2, 9,16,24,17,
10, 3, 4,11,18,25,32,40,
33,26,19,12, 5, 6,13,20,
27,34,41,48,56,49,42,35,
28,21,14, 7,15,22,29,36,
43,50,57,58,51,44,37,30,
23,31,38,45,52,59,60,53,
46,39,47,54,61,62,55,63,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64
};
# endif
void oc_state_accel_init_arm(oc_theora_state *_state){
oc_state_accel_init_c(_state);
_state->cpu_flags=oc_cpu_flags_get();
# if defined(OC_STATE_USE_VTABLE)
_state->opt_vtable.frag_copy_list=oc_frag_copy_list_arm;
_state->opt_vtable.frag_recon_intra=oc_frag_recon_intra_arm;
_state->opt_vtable.frag_recon_inter=oc_frag_recon_inter_arm;
_state->opt_vtable.frag_recon_inter2=oc_frag_recon_inter2_arm;
_state->opt_vtable.idct8x8=oc_idct8x8_arm;
_state->opt_vtable.state_frag_recon=oc_state_frag_recon_arm;
/*Note: We _must_ set this function pointer, because the macro in armint.h
calls it with different arguments, so the C version will segfault.*/
_state->opt_vtable.state_loop_filter_frag_rows=
(oc_state_loop_filter_frag_rows_func)oc_loop_filter_frag_rows_arm;
# endif
# if defined(OC_ARM_ASM_EDSP)
if(_state->cpu_flags&OC_CPU_ARM_EDSP){
# if defined(OC_STATE_USE_VTABLE)
_state->opt_vtable.frag_copy_list=oc_frag_copy_list_edsp;
# endif
}
# if defined(OC_ARM_ASM_MEDIA)
if(_state->cpu_flags&OC_CPU_ARM_MEDIA){
# if defined(OC_STATE_USE_VTABLE)
_state->opt_vtable.frag_recon_intra=oc_frag_recon_intra_v6;
_state->opt_vtable.frag_recon_inter=oc_frag_recon_inter_v6;
_state->opt_vtable.frag_recon_inter2=oc_frag_recon_inter2_v6;
_state->opt_vtable.idct8x8=oc_idct8x8_v6;
_state->opt_vtable.state_frag_recon=oc_state_frag_recon_v6;
_state->opt_vtable.loop_filter_init=oc_loop_filter_init_v6;
_state->opt_vtable.state_loop_filter_frag_rows=
(oc_state_loop_filter_frag_rows_func)oc_loop_filter_frag_rows_v6;
# endif
}
# if defined(OC_ARM_ASM_NEON)
if(_state->cpu_flags&OC_CPU_ARM_NEON){
# if defined(OC_STATE_USE_VTABLE)
_state->opt_vtable.frag_copy_list=oc_frag_copy_list_neon;
_state->opt_vtable.frag_recon_intra=oc_frag_recon_intra_neon;
_state->opt_vtable.frag_recon_inter=oc_frag_recon_inter_neon;
_state->opt_vtable.frag_recon_inter2=oc_frag_recon_inter2_neon;
_state->opt_vtable.state_frag_recon=oc_state_frag_recon_neon;
_state->opt_vtable.loop_filter_init=oc_loop_filter_init_neon;
_state->opt_vtable.state_loop_filter_frag_rows=
(oc_state_loop_filter_frag_rows_func)oc_loop_filter_frag_rows_neon;
_state->opt_vtable.idct8x8=oc_idct8x8_neon;
# endif
_state->opt_data.dct_fzig_zag=OC_FZIG_ZAG_NEON;
}
# endif
# endif
# endif
}
void oc_state_frag_recon_arm(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant){
unsigned char *dst;
ptrdiff_t frag_buf_off;
int ystride;
int refi;
/*Apply the inverse transform.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
ogg_uint16_t p;
/*We round this dequant product (and not any of the others) because there's
no iDCT rounding.*/
p=(ogg_uint16_t)(_dct_coeffs[0]*(ogg_int32_t)_dc_quant+15>>5);
oc_idct8x8_1_arm(_dct_coeffs+64,p);
}
else{
/*First, dequantize the DC coefficient.*/
_dct_coeffs[0]=(ogg_int16_t)(_dct_coeffs[0]*(int)_dc_quant);
oc_idct8x8_arm(_dct_coeffs+64,_dct_coeffs,_last_zzi);
}
/*Fill in the target buffer.*/
frag_buf_off=_state->frag_buf_offs[_fragi];
refi=_state->frags[_fragi].refi;
ystride=_state->ref_ystride[_pli];
dst=_state->ref_frame_data[OC_FRAME_SELF]+frag_buf_off;
if(refi==OC_FRAME_SELF)oc_frag_recon_intra_arm(dst,ystride,_dct_coeffs+64);
else{
const unsigned char *ref;
int mvoffsets[2];
ref=_state->ref_frame_data[refi]+frag_buf_off;
if(oc_state_get_mv_offsets(_state,mvoffsets,_pli,
_state->frag_mvs[_fragi])>1){
oc_frag_recon_inter2_arm(dst,ref+mvoffsets[0],ref+mvoffsets[1],ystride,
_dct_coeffs+64);
}
else oc_frag_recon_inter_arm(dst,ref+mvoffsets[0],ystride,_dct_coeffs+64);
}
}
# if defined(OC_ARM_ASM_MEDIA)
void oc_state_frag_recon_v6(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant){
unsigned char *dst;
ptrdiff_t frag_buf_off;
int ystride;
int refi;
/*Apply the inverse transform.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
ogg_uint16_t p;
/*We round this dequant product (and not any of the others) because there's
no iDCT rounding.*/
p=(ogg_uint16_t)(_dct_coeffs[0]*(ogg_int32_t)_dc_quant+15>>5);
oc_idct8x8_1_v6(_dct_coeffs+64,p);
}
else{
/*First, dequantize the DC coefficient.*/
_dct_coeffs[0]=(ogg_int16_t)(_dct_coeffs[0]*(int)_dc_quant);
oc_idct8x8_v6(_dct_coeffs+64,_dct_coeffs,_last_zzi);
}
/*Fill in the target buffer.*/
frag_buf_off=_state->frag_buf_offs[_fragi];
refi=_state->frags[_fragi].refi;
ystride=_state->ref_ystride[_pli];
dst=_state->ref_frame_data[OC_FRAME_SELF]+frag_buf_off;
if(refi==OC_FRAME_SELF)oc_frag_recon_intra_v6(dst,ystride,_dct_coeffs+64);
else{
const unsigned char *ref;
int mvoffsets[2];
ref=_state->ref_frame_data[refi]+frag_buf_off;
if(oc_state_get_mv_offsets(_state,mvoffsets,_pli,
_state->frag_mvs[_fragi])>1){
oc_frag_recon_inter2_v6(dst,ref+mvoffsets[0],ref+mvoffsets[1],ystride,
_dct_coeffs+64);
}
else oc_frag_recon_inter_v6(dst,ref+mvoffsets[0],ystride,_dct_coeffs+64);
}
}
# if defined(OC_ARM_ASM_NEON)
void oc_state_frag_recon_neon(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant){
unsigned char *dst;
ptrdiff_t frag_buf_off;
int ystride;
int refi;
/*Apply the inverse transform.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
ogg_uint16_t p;
/*We round this dequant product (and not any of the others) because there's
no iDCT rounding.*/
p=(ogg_uint16_t)(_dct_coeffs[0]*(ogg_int32_t)_dc_quant+15>>5);
oc_idct8x8_1_neon(_dct_coeffs+64,p);
}
else{
/*First, dequantize the DC coefficient.*/
_dct_coeffs[0]=(ogg_int16_t)(_dct_coeffs[0]*(int)_dc_quant);
oc_idct8x8_neon(_dct_coeffs+64,_dct_coeffs,_last_zzi);
}
/*Fill in the target buffer.*/
frag_buf_off=_state->frag_buf_offs[_fragi];
refi=_state->frags[_fragi].refi;
ystride=_state->ref_ystride[_pli];
dst=_state->ref_frame_data[OC_FRAME_SELF]+frag_buf_off;
if(refi==OC_FRAME_SELF)oc_frag_recon_intra_neon(dst,ystride,_dct_coeffs+64);
else{
const unsigned char *ref;
int mvoffsets[2];
ref=_state->ref_frame_data[refi]+frag_buf_off;
if(oc_state_get_mv_offsets(_state,mvoffsets,_pli,
_state->frag_mvs[_fragi])>1){
oc_frag_recon_inter2_neon(dst,ref+mvoffsets[0],ref+mvoffsets[1],ystride,
_dct_coeffs+64);
}
else oc_frag_recon_inter_neon(dst,ref+mvoffsets[0],ystride,_dct_coeffs+64);
}
}
# endif
# endif
#endif
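
The initialization pattern in oc_state_accel_init_arm above (install a baseline, then let each detected CPU feature override it in turn) can be sketched in isolation. This is a hypothetical, reduced model: the flag values, the `impl` enum, the kernel stubs and the two-entry `opt_vtable` are placeholders of ours and not the real libtheora definitions.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for OC_CPU_ARM_EDSP/MEDIA/NEON. */
enum { CPU_EDSP = 1, CPU_MEDIA = 2, CPU_NEON = 4 };
enum impl { IMPL_ARM, IMPL_EDSP, IMPL_V6, IMPL_NEON };

typedef enum impl (*kernel_fn)(void);

/* Stub kernels that only report which variant they are. */
static enum impl copy_arm(void) { return IMPL_ARM; }
static enum impl copy_edsp(void) { return IMPL_EDSP; }
static enum impl copy_neon(void) { return IMPL_NEON; }
static enum impl idct_arm(void) { return IMPL_ARM; }
static enum impl idct_v6(void) { return IMPL_V6; }
static enum impl idct_neon(void) { return IMPL_NEON; }

typedef struct {
  kernel_fn frag_copy_list;
  kernel_fn idct8x8;
} opt_vtable;

/* Later checks overwrite earlier ones, so the most capable variant wins. */
static void accel_init(opt_vtable *vt, unsigned cpu_flags) {
  assert(vt != 0);
  vt->frag_copy_list = copy_arm;
  vt->idct8x8 = idct_arm;
  if (cpu_flags & CPU_EDSP) vt->frag_copy_list = copy_edsp;
  if (cpu_flags & CPU_MEDIA) vt->idct8x8 = idct_v6;
  if (cpu_flags & CPU_NEON) {
    vt->frag_copy_list = copy_neon;
    vt->idct8x8 = idct_neon;
  }
}
```

As in the original, the checks are sequential rather than nested, so an entry that a feature level does not provide (for example an EDSP-specific idct8x8) simply keeps the value set by an earlier level.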

@ -1,114 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE OggTheora SOURCE CODE IS (C) COPYRIGHT 1994-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function: packing variable sized words into an octet stream
last mod: $Id$
********************************************************************/
#include <string.h>
#include <stdlib.h>
#include "bitpack.h"
/*We're 'MSb' endian; if we write a word but read individual bits,
then we'll read the MSb first.*/
void oc_pack_readinit(oc_pack_buf *_b,unsigned char *_buf,long _bytes){
memset(_b,0,sizeof(*_b));
_b->ptr=_buf;
_b->stop=_buf+_bytes;
}
static oc_pb_window oc_pack_refill(oc_pack_buf *_b,int _bits){
const unsigned char *ptr;
const unsigned char *stop;
oc_pb_window window;
int available;
unsigned shift;
stop=_b->stop;
ptr=_b->ptr;
window=_b->window;
available=_b->bits;
shift=OC_PB_WINDOW_SIZE-available;
while(7<shift&&ptr<stop){
shift-=8;
window|=(oc_pb_window)*ptr++<<shift;
}
_b->ptr=ptr;
available=OC_PB_WINDOW_SIZE-shift;
if(_bits>available){
if(ptr>=stop){
_b->eof=1;
available=OC_LOTS_OF_BITS;
}
else window|=*ptr>>(available&7);
}
_b->bits=available;
return window;
}
int oc_pack_look1(oc_pack_buf *_b){
oc_pb_window window;
int available;
window=_b->window;
available=_b->bits;
if(available<1)_b->window=window=oc_pack_refill(_b,1);
return window>>OC_PB_WINDOW_SIZE-1;
}
void oc_pack_adv1(oc_pack_buf *_b){
_b->window<<=1;
_b->bits--;
}
/*Here we assume that 0<=_bits&&_bits<=32.*/
long oc_pack_read_c(oc_pack_buf *_b,int _bits){
oc_pb_window window;
int available;
long result;
window=_b->window;
available=_b->bits;
if(_bits==0)return 0;
if(available<_bits){
window=oc_pack_refill(_b,_bits);
available=_b->bits;
}
result=window>>OC_PB_WINDOW_SIZE-_bits;
available-=_bits;
window<<=1;
window<<=_bits-1;
_b->window=window;
_b->bits=available;
return result;
}
int oc_pack_read1_c(oc_pack_buf *_b){
oc_pb_window window;
int available;
int result;
window=_b->window;
available=_b->bits;
if(available<1){
window=oc_pack_refill(_b,1);
available=_b->bits;
}
result=window>>OC_PB_WINDOW_SIZE-1;
available--;
window<<=1;
_b->window=window;
_b->bits=available;
return result;
}
long oc_pack_bytes_left(oc_pack_buf *_b){
if(_b->eof)return -1;
return _b->stop-_b->ptr+(_b->bits>>3);
}
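
The MSb-first convention used by the bit packer above can be illustrated with a smaller reader. This is a simplified sketch under our own assumptions, not the libtheora implementation: `msb_reader`, `msb_init` and `msb_read` are hypothetical names, the window is fixed at 64 bits, and bits past the end of the buffer simply read as zero instead of setting an EOF flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  const unsigned char *ptr;  /* next unread byte */
  const unsigned char *stop; /* one past the last byte */
  uint64_t window;           /* unread bits, left-aligned */
  int bits;                  /* number of valid bits in window */
} msb_reader;

static void msb_init(msb_reader *r, const unsigned char *buf, size_t len) {
  r->ptr = buf;
  r->stop = buf + len;
  r->window = 0;
  r->bits = 0;
}

/* Reads n bits (1..32), most significant bit first. */
static uint32_t msb_read(msb_reader *r, int n) {
  uint32_t result;
  assert(n >= 1 && n <= 32);
  /* Refill whole bytes while at least 8 bits of the window are free. */
  while (r->bits <= 56 && r->ptr < r->stop) {
    r->window |= (uint64_t)*r->ptr++ << (56 - r->bits);
    r->bits += 8;
  }
  result = (uint32_t)(r->window >> (64 - n));
  r->window <<= n;
  r->bits -= n;
  if (r->bits < 0) r->bits = 0; /* ran past the end: remaining bits are zero */
  return result;
}
```

Because n is at most 32 and the window here is 64 bits wide, a single shift by n is always well defined. The original splits the shift into `window<<=1` and `window<<=_bits-1`, which avoids shifting by the full window width when `oc_pb_window` is only 32 bits.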

@ -1,76 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE OggTheora SOURCE CODE IS (C) COPYRIGHT 1994-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function: packing variable sized words into an octet stream
last mod: $Id: bitwise.c 7675 2004-09-01 00:34:39Z xiphmont $
********************************************************************/
#if !defined(_bitpack_H)
# define _bitpack_H (1)
# include <stddef.h>
# include <limits.h>
# include "internal.h"
typedef size_t oc_pb_window;
typedef struct oc_pack_buf oc_pack_buf;
/*Custom bitpacker implementations.*/
# if defined(OC_ARM_ASM)
# include "arm/armbits.h"
# endif
# if !defined(oc_pack_read)
# define oc_pack_read oc_pack_read_c
# endif
# if !defined(oc_pack_read1)
# define oc_pack_read1 oc_pack_read1_c
# endif
# if !defined(oc_huff_token_decode)
# define oc_huff_token_decode oc_huff_token_decode_c
# endif
# define OC_PB_WINDOW_SIZE ((int)sizeof(oc_pb_window)*CHAR_BIT)
/*This is meant to be a large, positive constant that can still be efficiently
loaded as an immediate (on platforms like ARM, for example).
Even relatively modest values like 100 would work fine.*/
# define OC_LOTS_OF_BITS (0x40000000)
struct oc_pack_buf{
const unsigned char *stop;
const unsigned char *ptr;
oc_pb_window window;
int bits;
int eof;
};
void oc_pack_readinit(oc_pack_buf *_b,unsigned char *_buf,long _bytes);
int oc_pack_look1(oc_pack_buf *_b);
void oc_pack_adv1(oc_pack_buf *_b);
/*Here we assume 0<=_bits&&_bits<=32.*/
long oc_pack_read_c(oc_pack_buf *_b,int _bits);
int oc_pack_read1_c(oc_pack_buf *_b);
/* returns -1 for read beyond EOF, or the number of whole bytes available */
long oc_pack_bytes_left(oc_pack_buf *_b);
/*These two functions are implemented locally in huffdec.c*/
/*Read in bits without advancing the bitptr.
Here we assume 0<=_bits&&_bits<=32.*/
/*static int oc_pack_look(oc_pack_buf *_b,int _bits);*/
/*static void oc_pack_adv(oc_pack_buf *_b,int _bits);*/
#endif

@ -1,103 +0,0 @@
/* config.h. Generated from config.h.in by configure. */
/* config.h.in. Generated from configure.ac by autoheader. */
/* libcairo is available for visual debugging output */
/* #undef HAVE_CAIRO */
/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1
/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1
/* Define to 1 if you have the <machine/soundcard.h> header file. */
/* #undef HAVE_MACHINE_SOUNDCARD_H */
/* Abort if size exceeds 16384x16384 (for fuzzing only) */
/* #undef HAVE_MEMORY_CONSTRAINT */
/* Define to 1 if you have the <soundcard.h> header file. */
/* #undef HAVE_SOUNDCARD_H */
/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1
/* Define to 1 if you have the <stdio.h> header file. */
#define HAVE_STDIO_H 1
/* Define to 1 if you have the <stdlib.h> header file. */
#define HAVE_STDLIB_H 1
/* Define to 1 if you have the <strings.h> header file. */
#define HAVE_STRINGS_H 1
/* Define to 1 if you have the <string.h> header file. */
#define HAVE_STRING_H 1
/* Define to 1 if you have the <sys/soundcard.h> header file. */
#define HAVE_SYS_SOUNDCARD_H 1
/* Define to 1 if you have the <sys/stat.h> header file. */
#define HAVE_SYS_STAT_H 1
/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1
/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H 1
/* Define to the sub-directory where libtool stores uninstalled libraries. */
/* #undef LT_OBJDIR */
/* make use of arm asm optimization */
/* #undef OC_ARM_ASM */
/* Define if assembler supports EDSP instructions */
/* #undef OC_ARM_ASM_EDSP */
/* Define if assembler supports ARMv6 media instructions */
/* #undef OC_ARM_ASM_MEDIA */
/* Define if compiler supports NEON instructions */
/* #undef OC_ARM_ASM_NEON */
/* make use of c64x+ asm optimization */
/* #undef OC_C64X_ASM */
/* make use of x86_64 asm optimization */
/* #undef OC_X86_64_ASM */
/* make use of x86 asm optimization */
/* #undef OC_X86_ASM */
/* Name of package */
#define PACKAGE "libtheora"
/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT ""
/* Define to the full name of this package. */
#define PACKAGE_NAME "libtheora"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "libtheora 1.2.0alpha1+git"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "libtheora"
/* Define to the home page for this package. */
#define PACKAGE_URL ""
/* Define to the version of this package. */
#define PACKAGE_VERSION "1.2.0alpha1+git"
/* Define to 1 if all of the C90 standard headers exist (not just the ones
required in a freestanding environment). This macro is provided for
backward compatibility; new code need not use it. */
#define STDC_HEADERS 1
/* Define to exclude encode support from the build */
/* #undef THEORA_DISABLE_ENCODE */
/* Version number of package */
#define VERSION "1.2.0alpha1+git"

@ -1,31 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*Definitions shared by the forward and inverse DCT transforms.*/
#if !defined(_dct_H)
# define _dct_H (1)
/*cos(n*pi/16) (resp. sin(m*pi/16)) scaled by 65536.*/
#define OC_C1S7 ((ogg_int32_t)64277)
#define OC_C2S6 ((ogg_int32_t)60547)
#define OC_C3S5 ((ogg_int32_t)54491)
#define OC_C4S4 ((ogg_int32_t)46341)
#define OC_C5S3 ((ogg_int32_t)36410)
#define OC_C6S2 ((ogg_int32_t)25080)
#define OC_C7S1 ((ogg_int32_t)12785)
#endif
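
Constants scaled by 65536 like the ones above are typically used as 16.16 fixed-point multipliers: multiply, then shift right by 16. The sketch below assumes that usage. `fixmul16` is a hypothetical helper of ours, restricted to non-negative 15-bit inputs so the product fits in 32 bits and the shift is fully defined; the two constants repeat the values of OC_C4S4 and OC_C1S7.

```c
#include <assert.h>
#include <stdint.h>

#define C4S4 ((int32_t)46341) /* cos(4*pi/16)*65536, same value as OC_C4S4 */
#define C1S7 ((int32_t)64277) /* cos(1*pi/16)*65536, same value as OC_C1S7 */

/* Returns x*c/65536, truncated, for 0 <= x <= 32767. */
static int32_t fixmul16(int32_t x, int32_t c) {
  assert(x >= 0 && x <= 32767);
  return (x * c) >> 16;
}
```

For example, `fixmul16(1000, C4S4)` yields 707, matching 1000*cos(pi/4) truncated to an integer.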

@ -1,274 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include "decint.h"
/*Only used for fuzzing.*/
#if defined(HAVE_MEMORY_CONSTRAINT)
static const int MAX_FUZZING_WIDTH = 16384;
static const int MAX_FUZZING_HEIGHT = 16384;
#endif
/*Unpacks a series of octets from the pack buffer into a given byte array.
No checking is done to ensure the buffer contains enough data.
_opb: The pack buffer to read the octets from.
_buf: The byte array to store the unpacked bytes in.
_len: The number of octets to unpack.*/
static void oc_unpack_octets(oc_pack_buf *_opb,char *_buf,size_t _len){
while(_len-->0){
long val;
val=oc_pack_read(_opb,8);
*_buf++=(char)val;
}
}
/*Unpacks a 32-bit integer encoded by octets in little-endian form.*/
static long oc_unpack_length(oc_pack_buf *_opb){
long ret[4];
int i;
for(i=0;i<4;i++)ret[i]=oc_pack_read(_opb,8);
return ret[0]|ret[1]<<8|ret[2]<<16|ret[3]<<24;
}
static int oc_info_unpack(oc_pack_buf *_opb,th_info *_info){
long val;
/*Check the codec bitstream version.*/
val=oc_pack_read(_opb,8);
_info->version_major=(unsigned char)val;
val=oc_pack_read(_opb,8);
_info->version_minor=(unsigned char)val;
val=oc_pack_read(_opb,8);
_info->version_subminor=(unsigned char)val;
/*verify we can parse this bitstream version.
We accept earlier minors and all subminors, by spec*/
if(_info->version_major>TH_VERSION_MAJOR||
(_info->version_major==TH_VERSION_MAJOR&&
_info->version_minor>TH_VERSION_MINOR)){
return TH_EVERSION;
}
/*Read the encoded frame description.*/
val=oc_pack_read(_opb,16);
_info->frame_width=(ogg_uint32_t)val<<4;
val=oc_pack_read(_opb,16);
_info->frame_height=(ogg_uint32_t)val<<4;
val=oc_pack_read(_opb,24);
_info->pic_width=(ogg_uint32_t)val;
val=oc_pack_read(_opb,24);
_info->pic_height=(ogg_uint32_t)val;
val=oc_pack_read(_opb,8);
_info->pic_x=(ogg_uint32_t)val;
val=oc_pack_read(_opb,8);
_info->pic_y=(ogg_uint32_t)val;
val=oc_pack_read(_opb,32);
_info->fps_numerator=(ogg_uint32_t)val;
val=oc_pack_read(_opb,32);
_info->fps_denominator=(ogg_uint32_t)val;
if(_info->frame_width==0||_info->frame_height==0||
_info->pic_width+_info->pic_x>_info->frame_width||
_info->pic_height+_info->pic_y>_info->frame_height||
_info->fps_numerator==0||_info->fps_denominator==0){
return TH_EBADHEADER;
}
#if defined(HAVE_MEMORY_CONSTRAINT)
if(_info->frame_width>=MAX_FUZZING_WIDTH&&_info->frame_height>=MAX_FUZZING_HEIGHT){
return TH_EBADHEADER;
}
#endif
/*Note: The sense of pic_y is inverted in what we pass back to the
application compared to how it is stored in the bitstream.
This is because the bitstream uses a right-handed coordinate system, while
applications expect a left-handed one.*/
_info->pic_y=_info->frame_height-_info->pic_height-_info->pic_y;
val=oc_pack_read(_opb,24);
_info->aspect_numerator=(ogg_uint32_t)val;
val=oc_pack_read(_opb,24);
_info->aspect_denominator=(ogg_uint32_t)val;
val=oc_pack_read(_opb,8);
_info->colorspace=(th_colorspace)val;
val=oc_pack_read(_opb,24);
_info->target_bitrate=(int)val;
val=oc_pack_read(_opb,6);
_info->quality=(int)val;
val=oc_pack_read(_opb,5);
_info->keyframe_granule_shift=(int)val;
val=oc_pack_read(_opb,2);
_info->pixel_fmt=(th_pixel_fmt)val;
if(_info->pixel_fmt==TH_PF_RSVD)return TH_EBADHEADER;
val=oc_pack_read(_opb,3);
if(val!=0||oc_pack_bytes_left(_opb)<0)return TH_EBADHEADER;
return 0;
}
static int oc_comment_unpack(oc_pack_buf *_opb,th_comment *_tc){
long len;
int i;
/*Read the vendor string.*/
len=oc_unpack_length(_opb);
if(len<0||len>oc_pack_bytes_left(_opb))return TH_EBADHEADER;
_tc->vendor=_ogg_malloc((size_t)len+1);
if(_tc->vendor==NULL)return TH_EFAULT;
oc_unpack_octets(_opb,_tc->vendor,len);
_tc->vendor[len]='\0';
/*Read the user comments.*/
_tc->comments=(int)oc_unpack_length(_opb);
len=_tc->comments;
if(len<0||len>(LONG_MAX>>2)||len<<2>oc_pack_bytes_left(_opb)){
_tc->comments=0;
return TH_EBADHEADER;
}
_tc->comment_lengths=(int *)_ogg_malloc(
_tc->comments*sizeof(_tc->comment_lengths[0]));
_tc->user_comments=(char **)_ogg_malloc(
_tc->comments*sizeof(_tc->user_comments[0]));
if(_tc->comment_lengths==NULL||_tc->user_comments==NULL){
_tc->comments=0;
return TH_EFAULT;
}
for(i=0;i<_tc->comments;i++){
len=oc_unpack_length(_opb);
if(len<0||len>oc_pack_bytes_left(_opb)){
_tc->comments=i;
return TH_EBADHEADER;
}
_tc->comment_lengths[i]=len;
_tc->user_comments[i]=_ogg_malloc((size_t)len+1);
if(_tc->user_comments[i]==NULL){
_tc->comments=i;
return TH_EFAULT;
}
oc_unpack_octets(_opb,_tc->user_comments[i],len);
_tc->user_comments[i][len]='\0';
}
return oc_pack_bytes_left(_opb)<0?TH_EBADHEADER:0;
}
static int oc_setup_unpack(oc_pack_buf *_opb,th_setup_info *_setup){
int ret;
/*Read the quantizer tables.*/
ret=oc_quant_params_unpack(_opb,&_setup->qinfo);
if(ret<0)return ret;
/*Read the Huffman trees.*/
return oc_huff_trees_unpack(_opb,_setup->huff_tables);
}
static void oc_setup_clear(th_setup_info *_setup){
oc_quant_params_clear(&_setup->qinfo);
oc_huff_trees_clear(_setup->huff_tables);
}
static int oc_dec_headerin(oc_pack_buf *_opb,th_info *_info,
th_comment *_tc,th_setup_info **_setup,ogg_packet *_op){
char buffer[6];
long val;
int packtype;
int ret;
val=oc_pack_read(_opb,8);
packtype=(int)val;
/*If we're at a data packet...*/
if(!(packtype&0x80)){
/*Check to make sure we received all three headers...
If we haven't seen any valid headers, assume this is not actually
Theora.*/
if(_info->frame_width<=0)return TH_ENOTFORMAT;
/*Follow our documentation, which says we'll return TH_EFAULT if these
are NULL (_info was checked by our caller).*/
if(_tc==NULL)return TH_EFAULT;
/*And if any other headers were missing, declare this packet "out of
sequence" instead.*/
if(_tc->vendor==NULL)return TH_EBADHEADER;
/*Don't check this until it's needed, since we allow passing NULL for the
arguments that we're not expecting the next header to fill in yet.*/
if(_setup==NULL)return TH_EFAULT;
if(*_setup==NULL)return TH_EBADHEADER;
/*If we got everything, we're done.*/
return 0;
}
/*Check the codec string.*/
oc_unpack_octets(_opb,buffer,6);
if(memcmp(buffer,"theora",6)!=0)return TH_ENOTFORMAT;
switch(packtype){
/*Codec info header.*/
case 0x80:{
/*This should be the first packet, and we should not already be
initialized.*/
if(!_op->b_o_s||_info->frame_width>0)return TH_EBADHEADER;
ret=oc_info_unpack(_opb,_info);
if(ret<0)th_info_clear(_info);
else ret=3;
}break;
/*Comment header.*/
case 0x81:{
if(_tc==NULL)return TH_EFAULT;
/*We should have already decoded the info header, and should not yet have
decoded the comment header.*/
if(_info->frame_width==0||_tc->vendor!=NULL)return TH_EBADHEADER;
ret=oc_comment_unpack(_opb,_tc);
if(ret<0)th_comment_clear(_tc);
else ret=2;
}break;
/*Codec setup header.*/
case 0x82:{
oc_setup_info *setup;
if(_tc==NULL||_setup==NULL)return TH_EFAULT;
/*We should have already decoded the info header and the comment header,
and should not yet have decoded the setup header.*/
if(_info->frame_width==0||_tc->vendor==NULL||*_setup!=NULL){
return TH_EBADHEADER;
}
setup=(oc_setup_info *)_ogg_calloc(1,sizeof(*setup));
if(setup==NULL)return TH_EFAULT;
ret=oc_setup_unpack(_opb,setup);
if(ret<0){
oc_setup_clear(setup);
_ogg_free(setup);
}
else{
*_setup=setup;
ret=1;
}
}break;
default:{
/*We don't know what this header is.*/
return TH_EBADHEADER;
}break;
}
return ret;
}
/*Decodes one header packet.
This should be called repeatedly with the packets at the beginning of the
stream until it returns 0.*/
int th_decode_headerin(th_info *_info,th_comment *_tc,
th_setup_info **_setup,ogg_packet *_op){
oc_pack_buf opb;
if(_op==NULL)return TH_EBADHEADER;
if(_info==NULL)return TH_EFAULT;
oc_pack_readinit(&opb,_op->packet,_op->bytes);
return oc_dec_headerin(&opb,_info,_tc,_setup,_op);
}
void th_setup_free(th_setup_info *_setup){
if(_setup!=NULL){
oc_setup_clear(_setup);
_ogg_free(_setup);
}
}
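
The packet-type test at the start of oc_dec_headerin above can be sketched as a standalone classifier. This is an illustrative model only: `pkt_kind` and `classify_packet` are hypothetical names, the sketch does no state tracking (it does not check header order or b_o_s), and treating an empty packet as invalid is our own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum pkt_kind { PKT_DATA, PKT_INFO, PKT_COMMENT, PKT_SETUP, PKT_INVALID };

/* Classifies a packet by its first byte and the "theora" codec string. */
static enum pkt_kind classify_packet(const unsigned char *p, size_t len) {
  assert(p != NULL);
  if (len < 1) return PKT_INVALID;
  /* A clear high bit in the first byte marks a data packet. */
  if (!(p[0] & 0x80)) return PKT_DATA;
  /* Header packets carry the six-byte codec string after the type byte. */
  if (len < 7 || memcmp(p + 1, "theora", 6) != 0) return PKT_INVALID;
  switch (p[0]) {
    case 0x80: return PKT_INFO;
    case 0x81: return PKT_COMMENT;
    case 0x82: return PKT_SETUP;
    default: return PKT_INVALID;
  }
}
```

In the original, the three header types map to the positive return values 3, 2 and 1, and a return of 0 signals the first data packet once all headers have been seen.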

@ -1,185 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <limits.h>
#if !defined(_decint_H)
# define _decint_H (1)
# include "theora/theoradec.h"
# include "state.h"
# include "bitpack.h"
# include "huffdec.h"
# include "dequant.h"
typedef struct th_setup_info oc_setup_info;
typedef struct oc_dec_opt_vtable oc_dec_opt_vtable;
typedef struct oc_dec_pipeline_state oc_dec_pipeline_state;
typedef struct th_dec_ctx oc_dec_ctx;
/*Decoder-specific accelerated functions.*/
# if defined(OC_C64X_ASM)
# include "c64x/c64xdec.h"
# endif
# if !defined(oc_dec_accel_init)
# define oc_dec_accel_init oc_dec_accel_init_c
# endif
# if defined(OC_DEC_USE_VTABLE)
# if !defined(oc_dec_dc_unpredict_mcu_plane)
# define oc_dec_dc_unpredict_mcu_plane(_dec,_pipe,_pli) \
((*(_dec)->opt_vtable.dc_unpredict_mcu_plane)(_dec,_pipe,_pli))
# endif
# else
# if !defined(oc_dec_dc_unpredict_mcu_plane)
# define oc_dec_dc_unpredict_mcu_plane oc_dec_dc_unpredict_mcu_plane_c
# endif
# endif
/*Constants for the packet-in state machine specific to the decoder.*/
/*Next packet to read: Data packet.*/
#define OC_PACKET_DATA (0)
struct th_setup_info{
/*The Huffman codes.*/
ogg_int16_t *huff_tables[TH_NHUFFMAN_TABLES];
/*The quantization parameters.*/
th_quant_info qinfo;
};
/*Decoder specific functions with accelerated variants.*/
struct oc_dec_opt_vtable{
void (*dc_unpredict_mcu_plane)(oc_dec_ctx *_dec,
oc_dec_pipeline_state *_pipe,int _pli);
};
struct oc_dec_pipeline_state{
/*Decoded DCT coefficients.
These are placed here instead of on the stack so that they can persist
between blocks, which makes clearing them back to zero much faster when
only a few non-zero coefficients were decoded.
It requires at least 65 elements because the zig-zag index array uses the
65th element as a dumping ground for out-of-range indices to protect us
from buffer overflow.
We make it fully twice as large so that the second half can serve as the
reconstruction buffer, which saves passing another parameter to all the
acceleration functions.
It also solves problems with 16-byte alignment for NEON on ARM.
gcc (as of 4.2.1) only seems to be able to give stack variables 8-byte
alignment, and silently produces incorrect results if you ask for 16.
Finally, keeping it off the stack means there's less likely to be a data
hazard between the NEON co-processor and the regular ARM core, which avoids
unnecessary stalls.*/
OC_ALIGN16(ogg_int16_t dct_coeffs[128]);
OC_ALIGN16(signed char bounding_values[256]);
ptrdiff_t ti[3][64];
ptrdiff_t ebi[3][64];
ptrdiff_t eob_runs[3][64];
const ptrdiff_t *coded_fragis[3];
const ptrdiff_t *uncoded_fragis[3];
ptrdiff_t ncoded_fragis[3];
ptrdiff_t nuncoded_fragis[3];
const ogg_uint16_t *dequant[3][3][2];
int fragy0[3];
int fragy_end[3];
int pred_last[3][4];
int mcu_nvfrags;
int loop_filter;
int pp_level;
};
struct th_dec_ctx{
/*Shared encoder/decoder state.*/
oc_theora_state state;
/*Whether or not packets are ready to be emitted.
This takes on negative values while there are remaining header packets to
be emitted, reaches 0 when the codec is ready for input, and goes to 1
when a frame has been processed and a data packet is ready.*/
int packet_state;
/*Buffer in which to assemble packets.*/
oc_pack_buf opb;
/*Huffman decode trees.*/
ogg_int16_t *huff_tables[TH_NHUFFMAN_TABLES];
/*The index of the first token in each plane for each coefficient.*/
ptrdiff_t ti0[3][64];
/*The number of outstanding EOB runs at the start of each coefficient in each
plane.*/
ptrdiff_t eob_runs[3][64];
/*The DCT token lists.*/
unsigned char *dct_tokens;
/*The extra bits associated with DCT tokens.*/
unsigned char *extra_bits;
/*The number of dct tokens unpacked so far.*/
int dct_tokens_count;
/*The out-of-loop post-processing level.*/
int pp_level;
/*The DC scale used for out-of-loop deblocking.*/
int pp_dc_scale[64];
/*The sharpen modifier used for out-of-loop deringing.*/
int pp_sharp_mod[64];
/*The DC quantization index of each block.*/
unsigned char *dc_qis;
/*The variance of each block.*/
int *variances;
/*The storage for the post-processed frame buffer.*/
unsigned char *pp_frame_data;
/*Whether or not the post-processed frame buffer has space for chroma.*/
int pp_frame_state;
/*The buffer used for the post-processed frame.
Note that this is _not_ guaranteed to have the same strides and offsets as
the reference frame buffers.*/
th_ycbcr_buffer pp_frame_buf;
/*The striped decode callback function.*/
th_stripe_callback stripe_cb;
oc_dec_pipeline_state pipe;
# if defined(OC_DEC_USE_VTABLE)
/*Table for decoder acceleration functions.*/
oc_dec_opt_vtable opt_vtable;
# endif
# if defined(HAVE_CAIRO)
/*Output metrics for debugging.*/
int telemetry_mbmode;
int telemetry_mv;
int telemetry_qi;
int telemetry_bits;
int telemetry_frame_bytes;
int telemetry_coding_bytes;
int telemetry_mode_bytes;
int telemetry_mv_bytes;
int telemetry_qi_bytes;
int telemetry_dc_bytes;
unsigned char *telemetry_frame_data;
# endif
};
/*Default pure-C implementations of decoder-specific accelerated functions.*/
void oc_dec_accel_init_c(oc_dec_ctx *_dec);
void oc_dec_dc_unpredict_mcu_plane_c(oc_dec_ctx *_dec,
oc_dec_pipeline_state *_pipe,int _pli);
#endif

File diff suppressed because it is too large.

View File

@ -1,182 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include <ogg/ogg.h>
#include "dequant.h"
#include "decint.h"
int oc_quant_params_unpack(oc_pack_buf *_opb,th_quant_info *_qinfo){
th_quant_base *base_mats;
long val;
int nbase_mats;
int sizes[64];
int indices[64];
int nbits;
int bmi;
int ci;
int qti;
int pli;
int qri;
int qi;
int i;
val=oc_pack_read(_opb,3);
nbits=(int)val;
for(qi=0;qi<64;qi++){
val=oc_pack_read(_opb,nbits);
_qinfo->loop_filter_limits[qi]=(unsigned char)val;
}
val=oc_pack_read(_opb,4);
nbits=(int)val+1;
for(qi=0;qi<64;qi++){
val=oc_pack_read(_opb,nbits);
_qinfo->ac_scale[qi]=(ogg_uint16_t)val;
}
val=oc_pack_read(_opb,4);
nbits=(int)val+1;
for(qi=0;qi<64;qi++){
val=oc_pack_read(_opb,nbits);
_qinfo->dc_scale[qi]=(ogg_uint16_t)val;
}
val=oc_pack_read(_opb,9);
nbase_mats=(int)val+1;
base_mats=_ogg_malloc(nbase_mats*sizeof(base_mats[0]));
if(base_mats==NULL)return TH_EFAULT;
for(bmi=0;bmi<nbase_mats;bmi++){
for(ci=0;ci<64;ci++){
val=oc_pack_read(_opb,8);
base_mats[bmi][ci]=(unsigned char)val;
}
}
nbits=oc_ilog(nbase_mats-1);
for(i=0;i<6;i++){
th_quant_ranges *qranges;
th_quant_base *qrbms;
int *qrsizes;
qti=i/3;
pli=i%3;
qranges=_qinfo->qi_ranges[qti]+pli;
if(i>0){
val=oc_pack_read1(_opb);
if(!val){
int qtj;
int plj;
if(qti>0){
val=oc_pack_read1(_opb);
if(val){
qtj=qti-1;
plj=pli;
}
else{
qtj=(i-1)/3;
plj=(i-1)%3;
}
}
else{
qtj=(i-1)/3;
plj=(i-1)%3;
}
*qranges=*(_qinfo->qi_ranges[qtj]+plj);
continue;
}
}
val=oc_pack_read(_opb,nbits);
indices[0]=(int)val;
for(qi=qri=0;qi<63;){
val=oc_pack_read(_opb,oc_ilog(62-qi));
sizes[qri]=(int)val+1;
qi+=(int)val+1;
val=oc_pack_read(_opb,nbits);
indices[++qri]=(int)val;
}
/*Note: The caller is responsible for cleaning up any partially
constructed qinfo.*/
if(qi>63){
_ogg_free(base_mats);
return TH_EBADHEADER;
}
qranges->nranges=qri;
qranges->sizes=qrsizes=(int *)_ogg_malloc(qri*sizeof(qrsizes[0]));
if(qranges->sizes==NULL){
/*Note: The caller is responsible for cleaning up any partially
constructed qinfo.*/
_ogg_free(base_mats);
return TH_EFAULT;
}
memcpy(qrsizes,sizes,qri*sizeof(qrsizes[0]));
qrbms=(th_quant_base *)_ogg_malloc((qri+1)*sizeof(qrbms[0]));
if(qrbms==NULL){
/*Note: The caller is responsible for cleaning up any partially
constructed qinfo.*/
_ogg_free(base_mats);
return TH_EFAULT;
}
qranges->base_matrices=(const th_quant_base *)qrbms;
do{
bmi=indices[qri];
/*Note: The caller is responsible for cleaning up any partially
constructed qinfo.*/
if(bmi>=nbase_mats){
_ogg_free(base_mats);
return TH_EBADHEADER;
}
memcpy(qrbms[qri],base_mats[bmi],sizeof(qrbms[qri]));
}
while(qri-->0);
}
_ogg_free(base_mats);
return 0;
}
void oc_quant_params_clear(th_quant_info *_qinfo){
int i;
for(i=6;i-->0;){
int qti;
int pli;
qti=i/3;
pli=i%3;
/*Clear any duplicate pointer references.*/
if(i>0){
int qtj;
int plj;
qtj=(i-1)/3;
plj=(i-1)%3;
if(_qinfo->qi_ranges[qti][pli].sizes==
_qinfo->qi_ranges[qtj][plj].sizes){
_qinfo->qi_ranges[qti][pli].sizes=NULL;
}
if(_qinfo->qi_ranges[qti][pli].base_matrices==
_qinfo->qi_ranges[qtj][plj].base_matrices){
_qinfo->qi_ranges[qti][pli].base_matrices=NULL;
}
}
if(qti>0){
if(_qinfo->qi_ranges[1][pli].sizes==
_qinfo->qi_ranges[0][pli].sizes){
_qinfo->qi_ranges[1][pli].sizes=NULL;
}
if(_qinfo->qi_ranges[1][pli].base_matrices==
_qinfo->qi_ranges[0][pli].base_matrices){
_qinfo->qi_ranges[1][pli].base_matrices=NULL;
}
}
/*Now free all the non-duplicate storage.*/
_ogg_free((void *)_qinfo->qi_ranges[qti][pli].sizes);
_ogg_free((void *)_qinfo->qi_ranges[qti][pli].base_matrices);
}
}
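
Note on the range-size loop in oc_quant_params_unpack() above: each size field is read with oc_ilog(62-qi) bits and stored as val+1, so at qi=0 a 6-bit field can yield sizes 1..64, and the running total can overshoot 63. That is why the function rejects qi>63 after the loop. The sketch below restates that invariant in isolation. It assumes oc_ilog() returns the number of bits needed to store its argument (its definition is not part of this diff); ilog() and check_range_sizes() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*Assumed behaviour of oc_ilog(), which is not shown in this diff:
   the number of bits needed to store _v (0 when _v is 0).*/
static int ilog(unsigned _v){
  int ret;
  for(ret=0;_v;ret++)_v>>=1;
  return ret;
}

/*Hypothetical helper mirroring the accumulation loop: walks the range sizes
   until they cover 63 quantizer steps.
  Returns the number of ranges if the sizes tile [0,63] exactly, or -1 if
   they fall short, overshoot, or leave unused entries.*/
static int check_range_sizes(const int *_sizes,int _nsizes){
  int qi;
  int i;
  assert(_sizes!=NULL);
  qi=0;
  for(i=0;i<_nsizes&&qi<63;i++){
    /*Every stored size is val+1, so it can never be smaller than 1.*/
    if(_sizes[i]<1)return -1;
    qi+=_sizes[i];
  }
  return qi==63&&i==_nsizes?i:-1;
}
```

With two ranges of sizes 32 and 31 the helper returns 2, which corresponds to nranges=2 and three base matrices (qri+1) in the real function.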

View File

@ -1,27 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_dequant_H)
# define _dequant_H (1)
# include "quant.h"
# include "bitpack.h"
int oc_quant_params_unpack(oc_pack_buf *_opb,
th_quant_info *_qinfo);
void oc_quant_params_clear(th_quant_info *_qinfo);
#endif

View File

@ -1,82 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <string.h>
#include "internal.h"
void oc_frag_copy_c(unsigned char *_dst,const unsigned char *_src,int _ystride){
int i;
for(i=8;i-->0;){
memcpy(_dst,_src,8*sizeof(*_dst));
_dst+=_ystride;
_src+=_ystride;
}
}
/*Copies the fragments specified by the lists of fragment indices from one
frame to another.
_dst_frame: The reference frame to copy to.
_src_frame: The reference frame to copy from.
_ystride: The row stride of the reference frames.
_fragis: A pointer to a list of fragment indices.
_nfragis: The number of fragment indices to copy.
_frag_buf_offs: The offsets of fragments in the reference frames.*/
void oc_frag_copy_list_c(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs){
ptrdiff_t fragii;
for(fragii=0;fragii<_nfragis;fragii++){
ptrdiff_t frag_buf_off;
frag_buf_off=_frag_buf_offs[_fragis[fragii]];
oc_frag_copy_c(_dst_frame+frag_buf_off,
_src_frame+frag_buf_off,_ystride);
}
}
void oc_frag_recon_intra_c(unsigned char *_dst,int _ystride,
const ogg_int16_t _residue[64]){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_dst[j]=OC_CLAMP255(_residue[i*8+j]+128);
_dst+=_ystride;
}
}
void oc_frag_recon_inter_c(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_dst[j]=OC_CLAMP255(_residue[i*8+j]+_src[j]);
_dst+=_ystride;
_src+=_ystride;
}
}
void oc_frag_recon_inter2_c(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t _residue[64]){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_dst[j]=OC_CLAMP255(_residue[i*8+j]+(_src1[j]+_src2[j]>>1));
_dst+=_ystride;
_src1+=_ystride;
_src2+=_ystride;
}
}
void oc_restore_fpu_c(void){}
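
Note on the reconstruction helpers above: all three add a residue to a predictor and saturate the result to 8 bits. Intra blocks use a constant predictor of 128, and the two-reference inter case averages its predictors with a truncating shift. The sketch below reproduces that arithmetic in a self-contained form. clamp255() is a hypothetical stand-in for OC_CLAMP255, whose definition is not part of this diff, and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*Hypothetical stand-in for OC_CLAMP255 (not shown in this diff):
   saturates an int to the range [0,255].*/
static unsigned char clamp255(int _x){
  return (unsigned char)(_x<0?0:_x>255?255:_x);
}

/*Same shape as oc_frag_recon_intra_c(): the predictor is a flat 128.*/
static void recon_intra(unsigned char *_dst,int _ystride,
 const int16_t _residue[64]){
  int i;
  assert(_dst!=NULL);
  for(i=0;i<8;i++){
    int j;
    for(j=0;j<8;j++)_dst[j]=clamp255(_residue[i*8+j]+128);
    _dst+=_ystride;
  }
}

/*Same shape as oc_frag_recon_inter2_c(): the predictor is the average of two
   reference blocks, rounded down by the right shift.*/
static void recon_inter2(unsigned char *_dst,const unsigned char *_src1,
 const unsigned char *_src2,int _ystride,const int16_t _residue[64]){
  int i;
  assert(_dst!=NULL&&_src1!=NULL&&_src2!=NULL);
  for(i=0;i<8;i++){
    int j;
    for(j=0;j<8;j++){
      _dst[j]=clamp255(_residue[i*8+j]+((_src1[j]+_src2[j])>>1));
    }
    _dst+=_ystride;
    _src1+=_ystride;
    _src2+=_ystride;
  }
}
```

The explicit parentheses around the sum make the precedence visible; the original expression relies on + binding tighter than >>, which gives the same result.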

View File

@ -1,515 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include <ogg/ogg.h>
#include "huffdec.h"
#include "decint.h"
/*Instead of storing every branching in the tree, subtrees can be collapsed
into one node, with a table of size 1<<nbits pointing directly to its
descendants nbits levels down.
This allows more than one bit to be read at a time, and avoids following all
the intermediate branches with next to no increased code complexity once
the collapsed tree has been built.
We do _not_ require that a subtree be complete to be collapsed, but instead
store duplicate pointers in the table, and record the actual depth of the
node below its parent.
This tells us the number of bits to advance the stream after reaching it.
This turns out to be equivalent to the method described in \cite{Hash95},
without the requirement that codewords be sorted by length.
If the codewords were sorted by length (so-called ``canonical-codes''), they
could be decoded much faster via either Lindell and Moffat's approach or
Hashemian's Condensed Huffman Code approach, the latter of which has an
extremely small memory footprint.
We can't use Choueka et al.'s finite state machine approach, which is
extremely fast, because we can't allow multiple symbols to be output at a
time; the codebook can and does change between symbols.
It also has very large memory requirements, which impairs cache coherency.
We store the tree packed in an array of 16-bit integers (words).
Each node consists of a single word, followed consecutively by two or more
indices of its children.
Let n be the value of this first word.
This is the number of bits that need to be read to traverse the node, and
must be positive.
1<<n entries follow in the array, each an index to a child node.
If the child is positive, then it is the index of another internal node in
the table.
If the child is negative or zero, then it is a leaf node.
These are stored directly in the child pointer to save space, since they only
require a single word.
If a leaf node would have been encountered before reading n bits, then it is
duplicated the necessary number of times in this table.
Leaf nodes pack both a token value and their actual depth in the tree.
The token in the leaf node is (-leaf&255).
The number of bits that need to be consumed to reach the leaf, starting from
the current node, is (-leaf>>8).
@ARTICLE{Hash95,
author="Reza Hashemian",
title="Memory Efficient and High-Speed Search {Huffman} Coding",
journal="{IEEE} Transactions on Communications",
volume=43,
number=10,
pages="2576--2581",
month=Oct,
year=1995
}*/
/*The map from external spec-defined tokens to internal tokens.
This is constructed so that any extra bits read with the original token value
can be masked off the least significant bits of its internal token index.
In addition, all of the tokens which require additional extra bits are placed
at the start of the list, and grouped by type.
OC_DCT_REPEAT_RUN3_TOKEN is placed first, as it is an extra-special case, so
giving it index 0 may simplify comparisons on some architectures.
These requirements require some substantial reordering.*/
static const unsigned char OC_DCT_TOKEN_MAP[TH_NDCT_TOKENS]={
/*OC_DCT_EOB1_TOKEN (0 extra bits)*/
15,
/*OC_DCT_EOB2_TOKEN (0 extra bits)*/
16,
/*OC_DCT_EOB3_TOKEN (0 extra bits)*/
17,
/*OC_DCT_REPEAT_RUN0_TOKEN (2 extra bits)*/
88,
/*OC_DCT_REPEAT_RUN1_TOKEN (3 extra bits)*/
80,
/*OC_DCT_REPEAT_RUN2_TOKEN (4 extra bits)*/
1,
/*OC_DCT_REPEAT_RUN3_TOKEN (12 extra bits)*/
0,
/*OC_DCT_SHORT_ZRL_TOKEN (3 extra bits)*/
48,
/*OC_DCT_ZRL_TOKEN (6 extra bits)*/
14,
/*OC_ONE_TOKEN (0 extra bits)*/
56,
/*OC_MINUS_ONE_TOKEN (0 extra bits)*/
57,
/*OC_TWO_TOKEN (0 extra bits)*/
58,
/*OC_MINUS_TWO_TOKEN (0 extra bits)*/
59,
/*OC_DCT_VAL_CAT2 (1 extra bit)*/
60,
62,
64,
66,
/*OC_DCT_VAL_CAT3 (2 extra bits)*/
68,
/*OC_DCT_VAL_CAT4 (3 extra bits)*/
72,
/*OC_DCT_VAL_CAT5 (4 extra bits)*/
2,
/*OC_DCT_VAL_CAT6 (5 extra bits)*/
4,
/*OC_DCT_VAL_CAT7 (6 extra bits)*/
6,
/*OC_DCT_VAL_CAT8 (10 extra bits)*/
8,
/*OC_DCT_RUN_CAT1A (1 extra bit)*/
18,
20,
22,
24,
26,
/*OC_DCT_RUN_CAT1B (3 extra bits)*/
32,
/*OC_DCT_RUN_CAT1C (4 extra bits)*/
12,
/*OC_DCT_RUN_CAT2A (2 extra bits)*/
28,
/*OC_DCT_RUN_CAT2B (3 extra bits)*/
40
};
/*The log base 2 of number of internal tokens associated with each of the spec
tokens (i.e., how many of the extra bits are folded into the token value).
Increasing the maximum value beyond 3 will enlarge the amount of stack
required for tree construction.*/
static const unsigned char OC_DCT_TOKEN_MAP_LOG_NENTRIES[TH_NDCT_TOKENS]={
0,0,0,2,3,0,0,3,0,0,0,0,0,1,1,1,1,2,3,1,1,1,2,1,1,1,1,1,3,1,2,3
};
/*The size a lookup table is allowed to grow to relative to the number of
unique nodes it contains.
E.g., if OC_HUFF_SLUSH is 4, then at most 75% of the space in the tree is
wasted (1/4 of the space must be used).
Larger numbers can decode tokens with fewer read operations, while smaller
numbers may save more space.
With a sample file:
32233473 read calls are required when no tree collapsing is done (100.0%).
19269269 read calls are required when OC_HUFF_SLUSH is 1 (59.8%).
11144969 read calls are required when OC_HUFF_SLUSH is 2 (34.6%).
10538563 read calls are required when OC_HUFF_SLUSH is 4 (32.7%).
10192578 read calls are required when OC_HUFF_SLUSH is 8 (31.6%).
Since a value of 2 gets us the vast majority of the speed-up with only a
small amount of wasted memory, this is what we use.
This value must be less than 128, or you could create a tree with more than
32767 entries, which would overflow the 16-bit words used to index it.*/
#define OC_HUFF_SLUSH (2)
/*The root of the tree is on the fast path, and a larger value here is more
beneficial than elsewhere in the tree.
7 appears to give the best performance, trading off between increased use of
the single-read fast path and cache footprint for the tables, though
obviously this will depend on your cache size.
Using 7 here, the VP3 tables are about twice as large compared to using 2.*/
#define OC_ROOT_HUFF_SLUSH (7)
/*Unpacks a Huffman codebook.
_opb: The buffer to unpack from.
_tokens: Stores a list of internal tokens, in the order they were found in
the codebook, and the lengths of their corresponding codewords.
This is enough to completely define the codebook, while minimizing
stack usage and avoiding temporary allocations (for platforms
where free() is a no-op).
Return: The number of internal tokens in the codebook, or a negative value
on error.*/
int oc_huff_tree_unpack(oc_pack_buf *_opb,unsigned char _tokens[256][2]){
ogg_uint32_t code;
int len;
int ntokens;
int nleaves;
code=0;
len=ntokens=nleaves=0;
for(;;){
long bits;
bits=oc_pack_read1(_opb);
/*Only process nodes so long as there's more bits in the buffer.*/
if(oc_pack_bytes_left(_opb)<0)return TH_EBADHEADER;
/*Read an internal node:*/
if(!bits){
len++;
/*Don't allow codewords longer than 32 bits.*/
if(len>32)return TH_EBADHEADER;
}
/*Read a leaf node:*/
else{
ogg_uint32_t code_bit;
int neb;
int nentries;
int token;
/*Don't allow more than 32 spec-tokens per codebook.*/
if(++nleaves>32)return TH_EBADHEADER;
bits=oc_pack_read(_opb,OC_NDCT_TOKEN_BITS);
neb=OC_DCT_TOKEN_MAP_LOG_NENTRIES[bits];
token=OC_DCT_TOKEN_MAP[bits];
nentries=1<<neb;
while(nentries-->0){
_tokens[ntokens][0]=(unsigned char)token++;
_tokens[ntokens][1]=(unsigned char)(len+neb);
ntokens++;
}
code_bit=0x80000000U>>len-1;
while(len>0&&(code&code_bit)){
code^=code_bit;
code_bit<<=1;
len--;
}
if(len<=0)break;
code|=code_bit;
}
}
return ntokens;
}
/*Count how many tokens would be required to fill a subtree at depth _depth.
_tokens: A list of internal tokens, in the order they are found in the
codebook, and the lengths of their corresponding codewords.
_depth: The depth of the desired node in the corresponding tree structure.
Return: The number of tokens that belong to that subtree.*/
static int oc_huff_subtree_tokens(unsigned char _tokens[][2],int _depth){
ogg_uint32_t code;
int ti;
code=0;
ti=0;
do{
if(_tokens[ti][1]-_depth<32)code+=0x80000000U>>_tokens[ti++][1]-_depth;
else{
/*Because of the expanded internal tokens, we can have codewords as long
as 35 bits.
A single recursion here is enough to advance past them.*/
code++;
ti+=oc_huff_subtree_tokens(_tokens+ti,_depth+31);
}
}
while(code<0x80000000U);
return ti;
}
/*Compute the number of bits to use for a collapsed tree node at the given
depth.
_tokens: A list of internal tokens, in the order they are found in the
codebook, and the lengths of their corresponding codewords.
_ntokens: The number of tokens corresponding to this tree node.
_depth: The depth of this tree node.
Return: The number of bits to use for a collapsed tree node rooted here.
This is always at least one, even if this was a leaf node.*/
static int oc_huff_tree_collapse_depth(unsigned char _tokens[][2],
int _ntokens,int _depth){
int got_leaves;
int loccupancy;
int occupancy;
int slush;
int nbits;
int best_nbits;
slush=_depth>0?OC_HUFF_SLUSH:OC_ROOT_HUFF_SLUSH;
/*It's legal to have a tree with just a single node, which requires no bits
to decode and always returns the same token.
However, no encoder actually does this (yet).
To avoid a special case in oc_huff_token_decode(), we force the number of
lookahead bits to be at least one.
This will produce a tree that looks ahead one bit and then advances the
stream zero bits.*/
nbits=1;
occupancy=2;
got_leaves=1;
do{
int ti;
if(got_leaves)best_nbits=nbits;
nbits++;
got_leaves=0;
loccupancy=occupancy;
for(occupancy=ti=0;ti<_ntokens;occupancy++){
if(_tokens[ti][1]<_depth+nbits)ti++;
else if(_tokens[ti][1]==_depth+nbits){
got_leaves=1;
ti++;
}
else ti+=oc_huff_subtree_tokens(_tokens+ti,_depth+nbits);
}
}
while(occupancy>loccupancy&&occupancy*slush>=1<<nbits);
return best_nbits;
}
/*Determines the size in words of a Huffman tree node that represents a
subtree of depth _nbits.
_nbits: The depth of the subtree.
This must be greater than zero.
Return: The number of words required to store the node.*/
static size_t oc_huff_node_size(int _nbits){
return 1+(1<<_nbits);
}
/*Produces a collapsed-tree representation of the given token list.
_tree: The storage for the collapsed Huffman tree.
This may be NULL to compute the required storage size instead of
constructing the tree.
_tokens: A list of internal tokens, in the order they are found in the
codebook, and the lengths of their corresponding codewords.
_ntokens: The number of tokens corresponding to this tree node.
Return: The number of words required to store the tree.*/
static size_t oc_huff_tree_collapse(ogg_int16_t *_tree,
unsigned char _tokens[][2],int _ntokens){
ogg_int16_t node[34];
unsigned char depth[34];
unsigned char last[34];
size_t ntree;
int ti;
int l;
depth[0]=0;
last[0]=(unsigned char)(_ntokens-1);
ntree=0;
ti=0;
l=0;
do{
int nbits;
nbits=oc_huff_tree_collapse_depth(_tokens+ti,last[l]+1-ti,depth[l]);
node[l]=(ogg_int16_t)ntree;
ntree+=oc_huff_node_size(nbits);
if(_tree!=NULL)_tree[node[l]++]=(ogg_int16_t)nbits;
do{
while(ti<=last[l]&&_tokens[ti][1]<=depth[l]+nbits){
if(_tree!=NULL){
ogg_int16_t leaf;
int nentries;
nentries=1<<depth[l]+nbits-_tokens[ti][1];
leaf=(ogg_int16_t)-(_tokens[ti][1]-depth[l]<<8|_tokens[ti][0]);
while(nentries-->0)_tree[node[l]++]=leaf;
}
ti++;
}
if(ti<=last[l]){
/*We need to recurse*/
depth[l+1]=(unsigned char)(depth[l]+nbits);
if(_tree!=NULL)_tree[node[l]++]=(ogg_int16_t)ntree;
l++;
last[l]=
(unsigned char)(ti+oc_huff_subtree_tokens(_tokens+ti,depth[l])-1);
break;
}
/*Pop back up a level of recursion.*/
else if(l-->0)nbits=depth[l+1]-depth[l];
}
while(l>=0);
}
while(l>=0);
return ntree;
}
/*Unpacks a set of Huffman trees, and reduces them to a collapsed
representation.
_opb: The buffer to unpack the trees from.
_nodes: The table to fill with the Huffman trees.
Return: 0 on success, or a negative value on error.
The caller is responsible for cleaning up any partially initialized
_nodes on failure.*/
int oc_huff_trees_unpack(oc_pack_buf *_opb,
ogg_int16_t *_nodes[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++){
unsigned char tokens[256][2];
int ntokens;
ogg_int16_t *tree;
size_t size;
/*Unpack the full tree into a temporary buffer.*/
ntokens=oc_huff_tree_unpack(_opb,tokens);
if(ntokens<0)return ntokens;
/*Figure out how big the collapsed tree will be and allocate space for it.*/
size=oc_huff_tree_collapse(NULL,tokens,ntokens);
/*This should never happen; if it does it means you set OC_HUFF_SLUSH or
OC_ROOT_HUFF_SLUSH too large.*/
if(size>32767)return TH_EIMPL;
tree=(ogg_int16_t *)_ogg_malloc(size*sizeof(*tree));
if(tree==NULL)return TH_EFAULT;
/*Construct the collapsed tree.*/
oc_huff_tree_collapse(tree,tokens,ntokens);
_nodes[i]=tree;
}
return 0;
}
/*Determines the size in words of a Huffman subtree.
_tree: The complete Huffman tree.
_node: The index of the root of the desired subtree.
Return: The number of words required to store the tree.*/
static size_t oc_huff_tree_size(const ogg_int16_t *_tree,int _node){
size_t size;
int nchildren;
int n;
int i;
n=_tree[_node];
size=oc_huff_node_size(n);
nchildren=1<<n;
i=0;
do{
int child;
child=_tree[_node+i+1];
if(child<=0)i+=1<<n-(-child>>8);
else{
size+=oc_huff_tree_size(_tree,child);
i++;
}
}
while(i<nchildren);
return size;
}
/*Makes a copy of the given set of Huffman trees.
_dst: The array to store the copy in.
_src: The array of trees to copy.
Return: 0 on success, or TH_EFAULT if an allocation fails.*/
int oc_huff_trees_copy(ogg_int16_t *_dst[TH_NHUFFMAN_TABLES],
const ogg_int16_t *const _src[TH_NHUFFMAN_TABLES]){
int total;
int i;
total=0;
for(i=0;i<TH_NHUFFMAN_TABLES;i++){
size_t size;
size=oc_huff_tree_size(_src[i],0);
total+=size;
_dst[i]=(ogg_int16_t *)_ogg_malloc(size*sizeof(*_dst[i]));
if(_dst[i]==NULL){
while(i-->0)_ogg_free(_dst[i]);
return TH_EFAULT;
}
memcpy(_dst[i],_src[i],size*sizeof(*_dst[i]));
}
return 0;
}
/*Frees the memory used by a set of Huffman trees.
_nodes: The array of trees to free.*/
void oc_huff_trees_clear(ogg_int16_t *_nodes[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++)_ogg_free(_nodes[i]);
}
/*Unpacks a single token using the given Huffman tree.
_opb: The buffer to unpack the token from.
_tree: The tree to unpack the token with.
Return: The token value.*/
int oc_huff_token_decode_c(oc_pack_buf *_opb,const ogg_int16_t *_tree){
const unsigned char *ptr;
const unsigned char *stop;
oc_pb_window window;
int available;
long bits;
int node;
int n;
ptr=_opb->ptr;
window=_opb->window;
stop=_opb->stop;
available=_opb->bits;
node=0;
for(;;){
n=_tree[node];
if(n>available){
unsigned shift;
shift=OC_PB_WINDOW_SIZE-available;
do{
/*We don't bother setting eof because we won't check for it after we've
started decoding DCT tokens.*/
if(ptr>=stop){
shift=(unsigned)-OC_LOTS_OF_BITS;
break;
}
shift-=8;
window|=(oc_pb_window)*ptr++<<shift;
}
while(shift>=8);
/*Note: We never request more than 24 bits, so there's no need to fill in
the last partial byte here.*/
available=OC_PB_WINDOW_SIZE-shift;
}
bits=window>>OC_PB_WINDOW_SIZE-n;
node=_tree[node+1+bits];
if(node<=0)break;
window<<=n;
available-=n;
}
node=-node;
n=node>>8;
window<<=n;
available-=n;
_opb->ptr=ptr;
_opb->window=window;
_opb->bits=available;
return node&255;
}
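
Note on the collapsed-tree layout described at the top of this file: a node is one word n followed by 1<<n child entries; a positive child is the index of another node, and a child that is zero or negative is a leaf packing the token in the low 8 bits and the number of bits to consume above them. The sketch below decodes against that layout using a simple bit-at-a-time reader instead of the windowed reader in oc_huff_token_decode_c(). bit_reader, peek_bits(), pack_leaf() and decode_token() are hypothetical names introduced for illustration; bits past the end of the buffer are read as zero, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*Hypothetical MSB-first bit reader used only by this sketch.*/
typedef struct{
  const unsigned char *buf;
  size_t               nbits;
  size_t               pos;
}bit_reader;

/*Packs a leaf as described above: token is (-leaf&255), depth is (-leaf>>8).*/
static int16_t pack_leaf(int _depth,int _token){
  assert(_depth>=0&&_depth<128&&_token>=0&&_token<256);
  return (int16_t)-(_depth<<8|_token);
}

/*Looks ahead _n bits without consuming them; missing bits read as zero.*/
static int peek_bits(const bit_reader *_br,int _n){
  int v;
  int i;
  v=0;
  for(i=0;i<_n;i++){
    size_t p;
    int    bit;
    p=_br->pos+(size_t)i;
    bit=p<_br->nbits?(_br->buf[p>>3]>>(7-(int)(p&7)))&1:0;
    v=v<<1|bit;
  }
  return v;
}

/*Walks the collapsed tree: look ahead n bits, index the child table, and
   consume either the leaf's own depth or the full n bits of the node.*/
static int decode_token(bit_reader *_br,const int16_t *_tree){
  int node;
  node=0;
  for(;;){
    int n;
    int child;
    n=_tree[node];
    child=_tree[node+1+peek_bits(_br,n)];
    if(child<=0){
      int leaf;
      leaf=-child;
      _br->pos+=(size_t)(leaf>>8);
      return leaf&255;
    }
    _br->pos+=(size_t)n;
    node=child;
  }
}
```

For the code {0, 10, 11} collapsed into a single 2-bit node, the codeword "0" occupies two table slots with depth 1, so only one bit is consumed when either slot is hit.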

View File

@ -1,32 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_huffdec_H)
# define _huffdec_H (1)
# include "huffman.h"
# include "bitpack.h"
int oc_huff_trees_unpack(oc_pack_buf *_opb,
ogg_int16_t *_nodes[TH_NHUFFMAN_TABLES]);
int oc_huff_trees_copy(ogg_int16_t *_dst[TH_NHUFFMAN_TABLES],
const ogg_int16_t *const _src[TH_NHUFFMAN_TABLES]);
void oc_huff_trees_clear(ogg_int16_t *_nodes[TH_NHUFFMAN_TABLES]);
int oc_huff_token_decode_c(oc_pack_buf *_opb,const ogg_int16_t *_node);
#endif

View File

@ -1,70 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_huffman_H)
# define _huffman_H (1)
# include "theora/codec.h"
# include "ocintrin.h"
/*The range of valid quantized DCT coefficient values.
VP3 used 511 in the encoder, but the bitstream is capable of 580.*/
#define OC_DCT_VAL_RANGE (580)
#define OC_NDCT_TOKEN_BITS (5)
#define OC_DCT_EOB1_TOKEN (0)
#define OC_DCT_EOB2_TOKEN (1)
#define OC_DCT_EOB3_TOKEN (2)
#define OC_DCT_REPEAT_RUN0_TOKEN (3)
#define OC_DCT_REPEAT_RUN1_TOKEN (4)
#define OC_DCT_REPEAT_RUN2_TOKEN (5)
#define OC_DCT_REPEAT_RUN3_TOKEN (6)
#define OC_DCT_SHORT_ZRL_TOKEN (7)
#define OC_DCT_ZRL_TOKEN (8)
#define OC_ONE_TOKEN (9)
#define OC_MINUS_ONE_TOKEN (10)
#define OC_TWO_TOKEN (11)
#define OC_MINUS_TWO_TOKEN (12)
#define OC_DCT_VAL_CAT2 (13)
#define OC_DCT_VAL_CAT3 (17)
#define OC_DCT_VAL_CAT4 (18)
#define OC_DCT_VAL_CAT5 (19)
#define OC_DCT_VAL_CAT6 (20)
#define OC_DCT_VAL_CAT7 (21)
#define OC_DCT_VAL_CAT8 (22)
#define OC_DCT_RUN_CAT1A (23)
#define OC_DCT_RUN_CAT1B (28)
#define OC_DCT_RUN_CAT1C (29)
#define OC_DCT_RUN_CAT2A (30)
#define OC_DCT_RUN_CAT2B (31)
#define OC_NDCT_EOB_TOKEN_MAX (7)
#define OC_NDCT_ZRL_TOKEN_MAX (9)
#define OC_NDCT_VAL_MAX (23)
#define OC_NDCT_VAL_CAT1_MAX (13)
#define OC_NDCT_VAL_CAT2_MAX (17)
#define OC_NDCT_VAL_CAT2_SIZE (OC_NDCT_VAL_CAT2_MAX-OC_DCT_VAL_CAT2)
#define OC_NDCT_RUN_MAX (32)
#define OC_NDCT_RUN_CAT1A_MAX (28)
extern const unsigned char OC_DCT_TOKEN_EXTRA_BITS[TH_NDCT_TOKENS];
#endif

View File

@ -1,330 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <string.h>
#include "internal.h"
#include "dct.h"
/*Performs an inverse 8 point Type-II DCT transform.
The output is scaled by a factor of 2 relative to the orthonormal version of
the transform.
_y: The buffer to store the result in.
Data will be placed in every 8th entry (e.g., in a column of an 8x8
block).
_x: The input coefficients.
The first 8 entries are used (e.g., from a row of an 8x8 block).*/
static void idct8(ogg_int16_t *_y,const ogg_int16_t _x[8]){
ogg_int32_t t[8];
ogg_int32_t r;
/*Stage 1:*/
/*0-1 butterfly.*/
t[0]=OC_C4S4*(ogg_int16_t)(_x[0]+_x[4])>>16;
t[1]=OC_C4S4*(ogg_int16_t)(_x[0]-_x[4])>>16;
/*2-3 rotation by 6pi/16.*/
t[2]=(OC_C6S2*_x[2]>>16)-(OC_C2S6*_x[6]>>16);
t[3]=(OC_C2S6*_x[2]>>16)+(OC_C6S2*_x[6]>>16);
/*4-7 rotation by 7pi/16.*/
t[4]=(OC_C7S1*_x[1]>>16)-(OC_C1S7*_x[7]>>16);
/*5-6 rotation by 3pi/16.*/
t[5]=(OC_C3S5*_x[5]>>16)-(OC_C5S3*_x[3]>>16);
t[6]=(OC_C5S3*_x[5]>>16)+(OC_C3S5*_x[3]>>16);
t[7]=(OC_C1S7*_x[1]>>16)+(OC_C7S1*_x[7]>>16);
/*Stage 2:*/
/*4-5 butterfly.*/
r=t[4]+t[5];
t[5]=OC_C4S4*(ogg_int16_t)(t[4]-t[5])>>16;
t[4]=r;
/*7-6 butterfly.*/
r=t[7]+t[6];
t[6]=OC_C4S4*(ogg_int16_t)(t[7]-t[6])>>16;
t[7]=r;
/*Stage 3:*/
/*0-3 butterfly.*/
r=t[0]+t[3];
t[3]=t[0]-t[3];
t[0]=r;
/*1-2 butterfly.*/
r=t[1]+t[2];
t[2]=t[1]-t[2];
t[1]=r;
/*6-5 butterfly.*/
r=t[6]+t[5];
t[5]=t[6]-t[5];
t[6]=r;
/*Stage 4:*/
/*0-7 butterfly.*/
_y[0<<3]=(ogg_int16_t)(t[0]+t[7]);
/*1-6 butterfly.*/
_y[1<<3]=(ogg_int16_t)(t[1]+t[6]);
/*2-5 butterfly.*/
_y[2<<3]=(ogg_int16_t)(t[2]+t[5]);
/*3-4 butterfly.*/
_y[3<<3]=(ogg_int16_t)(t[3]+t[4]);
_y[4<<3]=(ogg_int16_t)(t[3]-t[4]);
_y[5<<3]=(ogg_int16_t)(t[2]-t[5]);
_y[6<<3]=(ogg_int16_t)(t[1]-t[6]);
_y[7<<3]=(ogg_int16_t)(t[0]-t[7]);
}
/*Performs an inverse 8 point Type-II DCT transform.
The output is scaled by a factor of 2 relative to the orthonormal version of
the transform.
_y: The buffer to store the result in.
Data will be placed in every 8th entry (e.g., in a column of an 8x8
block).
_x: The input coefficients.
Only the first 4 entries are used.
The other 4 are assumed to be 0.*/
static void idct8_4(ogg_int16_t *_y,const ogg_int16_t _x[8]){
ogg_int32_t t[8];
ogg_int32_t r;
/*Stage 1:*/
t[0]=OC_C4S4*_x[0]>>16;
t[2]=OC_C6S2*_x[2]>>16;
t[3]=OC_C2S6*_x[2]>>16;
t[4]=OC_C7S1*_x[1]>>16;
t[5]=-(OC_C5S3*_x[3]>>16);
t[6]=OC_C3S5*_x[3]>>16;
t[7]=OC_C1S7*_x[1]>>16;
/*Stage 2:*/
r=t[4]+t[5];
t[5]=OC_C4S4*(ogg_int16_t)(t[4]-t[5])>>16;
t[4]=r;
r=t[7]+t[6];
t[6]=OC_C4S4*(ogg_int16_t)(t[7]-t[6])>>16;
t[7]=r;
/*Stage 3:*/
t[1]=t[0]+t[2];
t[2]=t[0]-t[2];
r=t[0]+t[3];
t[3]=t[0]-t[3];
t[0]=r;
r=t[6]+t[5];
t[5]=t[6]-t[5];
t[6]=r;
/*Stage 4:*/
_y[0<<3]=(ogg_int16_t)(t[0]+t[7]);
_y[1<<3]=(ogg_int16_t)(t[1]+t[6]);
_y[2<<3]=(ogg_int16_t)(t[2]+t[5]);
_y[3<<3]=(ogg_int16_t)(t[3]+t[4]);
_y[4<<3]=(ogg_int16_t)(t[3]-t[4]);
_y[5<<3]=(ogg_int16_t)(t[2]-t[5]);
_y[6<<3]=(ogg_int16_t)(t[1]-t[6]);
_y[7<<3]=(ogg_int16_t)(t[0]-t[7]);
}
/*Performs an inverse 8 point Type-II DCT transform.
The output is scaled by a factor of 2 relative to the orthonormal version of
the transform.
_y: The buffer to store the result in.
Data will be placed in every 8th entry (e.g., in a column of an 8x8
block).
_x: The input coefficients.
Only the first 3 entries are used.
The other 5 are assumed to be 0.*/
static void idct8_3(ogg_int16_t *_y,const ogg_int16_t _x[8]){
ogg_int32_t t[8];
ogg_int32_t r;
/*Stage 1:*/
t[0]=OC_C4S4*_x[0]>>16;
t[2]=OC_C6S2*_x[2]>>16;
t[3]=OC_C2S6*_x[2]>>16;
t[4]=OC_C7S1*_x[1]>>16;
t[7]=OC_C1S7*_x[1]>>16;
/*Stage 2:*/
t[5]=OC_C4S4*t[4]>>16;
t[6]=OC_C4S4*t[7]>>16;
/*Stage 3:*/
t[1]=t[0]+t[2];
t[2]=t[0]-t[2];
r=t[0]+t[3];
t[3]=t[0]-t[3];
t[0]=r;
r=t[6]+t[5];
t[5]=t[6]-t[5];
t[6]=r;
/*Stage 4:*/
_y[0<<3]=(ogg_int16_t)(t[0]+t[7]);
_y[1<<3]=(ogg_int16_t)(t[1]+t[6]);
_y[2<<3]=(ogg_int16_t)(t[2]+t[5]);
_y[3<<3]=(ogg_int16_t)(t[3]+t[4]);
_y[4<<3]=(ogg_int16_t)(t[3]-t[4]);
_y[5<<3]=(ogg_int16_t)(t[2]-t[5]);
_y[6<<3]=(ogg_int16_t)(t[1]-t[6]);
_y[7<<3]=(ogg_int16_t)(t[0]-t[7]);
}
/*Performs an inverse 8 point Type-II DCT transform.
The output is scaled by a factor of 2 relative to the orthonormal version of
the transform.
_y: The buffer to store the result in.
Data will be placed in every 8th entry (e.g., in a column of an 8x8
block).
_x: The input coefficients.
Only the first 2 entries are used.
The other 6 are assumed to be 0.*/
static void idct8_2(ogg_int16_t *_y,const ogg_int16_t _x[8]){
ogg_int32_t t[8];
ogg_int32_t r;
/*Stage 1:*/
t[0]=OC_C4S4*_x[0]>>16;
t[4]=OC_C7S1*_x[1]>>16;
t[7]=OC_C1S7*_x[1]>>16;
/*Stage 2:*/
t[5]=OC_C4S4*t[4]>>16;
t[6]=OC_C4S4*t[7]>>16;
/*Stage 3:*/
r=t[6]+t[5];
t[5]=t[6]-t[5];
t[6]=r;
/*Stage 4:*/
_y[0<<3]=(ogg_int16_t)(t[0]+t[7]);
_y[1<<3]=(ogg_int16_t)(t[0]+t[6]);
_y[2<<3]=(ogg_int16_t)(t[0]+t[5]);
_y[3<<3]=(ogg_int16_t)(t[0]+t[4]);
_y[4<<3]=(ogg_int16_t)(t[0]-t[4]);
_y[5<<3]=(ogg_int16_t)(t[0]-t[5]);
_y[6<<3]=(ogg_int16_t)(t[0]-t[6]);
_y[7<<3]=(ogg_int16_t)(t[0]-t[7]);
}
/*Performs an inverse 8 point Type-II DCT transform.
The output is scaled by a factor of 2 relative to the orthonormal version of
the transform.
_y: The buffer to store the result in.
Data will be placed in every 8th entry (e.g., in a column of an 8x8
block).
_x: The input coefficients.
Only the first entry is used.
The other 7 are assumed to be 0.*/
static void idct8_1(ogg_int16_t *_y,const ogg_int16_t _x[1]){
_y[0<<3]=_y[1<<3]=_y[2<<3]=_y[3<<3]=
_y[4<<3]=_y[5<<3]=_y[6<<3]=_y[7<<3]=(ogg_int16_t)(OC_C4S4*_x[0]>>16);
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the orthonormal
version of the transform.
All coefficients but the first 3 in zig-zag scan order are assumed to be 0:
x x 0 0 0 0 0 0
x 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
_y: The buffer to store the result in.
This may be the same as _x.
_x: The input coefficients.*/
static void oc_idct8x8_3(ogg_int16_t _y[64],ogg_int16_t _x[64]){
ogg_int16_t w[64];
int i;
/*Transform rows of x into columns of w.*/
idct8_2(w,_x);
idct8_1(w+1,_x+8);
/*Transform rows of w into columns of y.*/
for(i=0;i<8;i++)idct8_2(_y+i,w+i*8);
/*Adjust for the scale factor.*/
for(i=0;i<64;i++)_y[i]=(ogg_int16_t)(_y[i]+8>>4);
/*Clear input data for next block.*/
_x[0]=_x[1]=_x[8]=0;
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the orthonormal
version of the transform.
All coefficients but the first 10 in zig-zag scan order are assumed to be 0:
x x x x 0 0 0 0
x x x 0 0 0 0 0
x x 0 0 0 0 0 0
x 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
_y: The buffer to store the result in.
This may be the same as _x.
_x: The input coefficients.*/
static void oc_idct8x8_10(ogg_int16_t _y[64],ogg_int16_t _x[64]){
ogg_int16_t w[64];
int i;
/*Transform rows of x into columns of w.*/
idct8_4(w,_x);
idct8_3(w+1,_x+8);
idct8_2(w+2,_x+16);
idct8_1(w+3,_x+24);
/*Transform rows of w into columns of y.*/
for(i=0;i<8;i++)idct8_4(_y+i,w+i*8);
/*Adjust for the scale factor.*/
for(i=0;i<64;i++)_y[i]=(ogg_int16_t)(_y[i]+8>>4);
/*Clear input data for next block.*/
_x[0]=_x[1]=_x[2]=_x[3]=_x[8]=_x[9]=_x[10]=_x[16]=_x[17]=_x[24]=0;
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the orthonormal
version of the transform.
_y: The buffer to store the result in.
This may be the same as _x.
_x: The input coefficients.*/
static void oc_idct8x8_slow(ogg_int16_t _y[64],ogg_int16_t _x[64]){
ogg_int16_t w[64];
int i;
/*Transform rows of x into columns of w.*/
for(i=0;i<8;i++)idct8(w+i,_x+i*8);
/*Transform rows of w into columns of y.*/
for(i=0;i<8;i++)idct8(_y+i,w+i*8);
/*Adjust for the scale factor.*/
for(i=0;i<64;i++)_y[i]=(ogg_int16_t)(_y[i]+8>>4);
/*Clear input data for next block.*/
for(i=0;i<64;i++)_x[i]=0;
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the orthonormal
version of the transform.*/
void oc_idct8x8_c(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi){
/*_last_zzi is subtly different from an actual count of the number of
coefficients we decoded for this block.
It contains the value of zzi BEFORE the final token in the block was
decoded.
In most cases this is an EOB token (the continuation of an EOB run from a
previous block counts), and so this is the same as the coefficient count.
However, in the case that the last token was NOT an EOB token, but filled
the block up with exactly 64 coefficients, _last_zzi will be less than 64.
Provided the last token was not a pure zero run, the minimum value it can
be is 46, and so that doesn't affect any of the cases in this routine.
However, if the last token WAS a pure zero run of length 63, then _last_zzi
will be 1 while the number of coefficients decoded is 64.
Thus, we will trigger the following special case, where the real
coefficient count would not.
Note also that a zero run of length 64 will give _last_zzi a value of 0,
but we still process the DC coefficient, which might have a non-zero value
due to DC prediction.
Although convoluted, this is arguably the correct behavior: it allows us to
use a smaller transform when the block ends with a long zero run instead
of a normal EOB token.
It could be smarter... multiple separate zero runs at the end of a block
will fool it, but an encoder that generates these really deserves what it
gets.
Needless to say we inherited this approach from VP3.*/
/*Then perform the iDCT.*/
if(_last_zzi<=3)oc_idct8x8_3(_y,_x);
else if(_last_zzi<=10)oc_idct8x8_10(_y,_x);
else oc_idct8x8_slow(_y,_x);
}
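
As a rough illustration of the smallest case handled above, the sketch below follows a lone DC coefficient through the two 1-D passes (each a multiply by OC_C4S4 followed by a 16-bit right shift) and the final (y+8)>>4 rounding. The constant 46341, i.e. cos(pi/4) in 16.16 fixed point, and the helper name dc_only_idct are assumptions made for this example, since dct.h is not part of this excerpt. Only non-negative inputs are considered, because right-shifting a negative value is implementation-defined in C.

```c
#include <stdint.h>

/* Assumed value of OC_C4S4: cos(pi/4) in 16.16 fixed point. */
#define SKETCH_C4S4 ((int32_t)46341)

/* Hypothetical helper: returns the value that every pixel of an 8x8 block
   takes when only the DC coefficient _dc is non-zero. */
static int16_t dc_only_idct(int16_t _dc){
  int16_t w;
  int16_t y;
  /* Row pass: with only _x[0] set, idct8_2() writes this value to a whole
     column of w. */
  w=(int16_t)(SKETCH_C4S4*_dc>>16);
  /* Column pass: each row of w again has a single non-zero entry. */
  y=(int16_t)(SKETCH_C4S4*w>>16);
  /* Adjust for the scale factor, rounding to nearest. */
  return (int16_t)((y+8)>>4);
}
```

Two multiplies by cos(pi/4) give a factor of 1/2, and the final shift divides by 16, so the output is roughly _dc/32.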

@ -1,131 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include "internal.h"
/*This is more or less the same as strncasecmp, but that doesn't exist
everywhere, and this is a fairly trivial function, so we include it.
Note: We take advantage of the fact that we know _n is less than or equal to
the length of at least one of the strings.*/
static int oc_tagcompare(const char *_s1,const char *_s2,int _n){
int c;
for(c=0;c<_n;c++){
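/*Cast to unsigned char: passing a negative char to toupper() is undefined.*/
if(toupper((unsigned char)_s1[c])!=toupper((unsigned char)_s2[c]))return !0;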
}
return _s1[c]!='=';
}
void th_info_init(th_info *_info){
memset(_info,0,sizeof(*_info));
_info->version_major=TH_VERSION_MAJOR;
_info->version_minor=TH_VERSION_MINOR;
_info->version_subminor=TH_VERSION_SUB;
_info->keyframe_granule_shift=6;
}
void th_info_clear(th_info *_info){
memset(_info,0,sizeof(*_info));
}
void th_comment_init(th_comment *_tc){
memset(_tc,0,sizeof(*_tc));
}
void th_comment_add(th_comment *_tc,const char *_comment){
char **user_comments;
int *comment_lengths;
int comment_len;
user_comments=_ogg_realloc(_tc->user_comments,
(_tc->comments+2)*sizeof(*_tc->user_comments));
if(user_comments==NULL)return;
_tc->user_comments=user_comments;
comment_lengths=_ogg_realloc(_tc->comment_lengths,
(_tc->comments+2)*sizeof(*_tc->comment_lengths));
if(comment_lengths==NULL)return;
_tc->comment_lengths=comment_lengths;
comment_len=strlen(_comment);
comment_lengths[_tc->comments]=comment_len;
user_comments[_tc->comments]=_ogg_malloc(comment_len+1);
if(user_comments[_tc->comments]==NULL)return;
memcpy(_tc->user_comments[_tc->comments],_comment,comment_len+1);
_tc->comments++;
_tc->user_comments[_tc->comments]=NULL;
}
void th_comment_add_tag(th_comment *_tc,const char *_tag,const char *_val){
char *comment;
int tag_len;
int val_len;
tag_len=strlen(_tag);
val_len=strlen(_val);
/*+2 for '=' and '\0'.*/
comment=_ogg_malloc(tag_len+val_len+2);
if(comment==NULL)return;
memcpy(comment,_tag,tag_len);
comment[tag_len]='=';
memcpy(comment+tag_len+1,_val,val_len+1);
th_comment_add(_tc,comment);
_ogg_free(comment);
}
char *th_comment_query(th_comment *_tc,const char *_tag,int _count){
long i;
int found;
int tag_len;
tag_len=strlen(_tag);
found=0;
for(i=0;i<_tc->comments;i++){
if(!oc_tagcompare(_tc->user_comments[i],_tag,tag_len)){
/*We return a pointer to the data, not a copy.*/
if(_count==found++)return _tc->user_comments[i]+tag_len+1;
}
}
/*Didn't find anything.*/
return NULL;
}
int th_comment_query_count(th_comment *_tc,const char *_tag){
long i;
int tag_len;
int count;
tag_len=strlen(_tag);
count=0;
for(i=0;i<_tc->comments;i++){
if(!oc_tagcompare(_tc->user_comments[i],_tag,tag_len))count++;
}
return count;
}
void th_comment_clear(th_comment *_tc){
if(_tc!=NULL){
long i;
for(i=0;i<_tc->comments;i++)_ogg_free(_tc->user_comments[i]);
_ogg_free(_tc->user_comments);
_ogg_free(_tc->comment_lengths);
_ogg_free(_tc->vendor);
memset(_tc,0,sizeof(*_tc));
}
}
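
The tag lookup in th_comment_query() can be sketched without libogg or libtheora as follows. The names tag_matches and tag_value are hypothetical and exist only for this example; they mirror the comparison in oc_tagcompare() (case-insensitive prefix followed by '=') but return 1 on a match rather than 0.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if _comment starts with _tag (ignoring
   case) immediately followed by '=', and 0 otherwise. */
static int tag_matches(const char *_comment,const char *_tag){
  size_t n;
  size_t i;
  n=strlen(_tag);
  for(i=0;i<n;i++){
    /* A shorter _comment mismatches at its terminating NUL, so we never read
       past the end of either string. */
    if(toupper((unsigned char)_comment[i])!=toupper((unsigned char)_tag[i])){
      return 0;
    }
  }
  return _comment[i]=='=';
}

/* Hypothetical helper: returns a pointer to the value part of "TAG=value"
   (not a copy), or NULL if the tag does not match. */
static const char *tag_value(const char *_comment,const char *_tag){
  return tag_matches(_comment,_tag)?_comment+strlen(_tag)+1:NULL;
}
```

As in the original, the returned pointer aliases the stored comment, so it stays valid only as long as the comment itself.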

@ -1,210 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include "internal.h"
/*A map from the index in the zig zag scan to the coefficient number in a
block.
All zig zag indices beyond 63 are sent to coefficient 64, so that zero runs
past the end of a block in bogus streams get mapped to a known location.*/
const unsigned char OC_FZIG_ZAG[128]={
0, 1, 8,16, 9, 2, 3,10,
17,24,32,25,18,11, 4, 5,
12,19,26,33,40,48,41,34,
27,20,13, 6, 7,14,21,28,
35,42,49,56,57,50,43,36,
29,22,15,23,30,37,44,51,
58,59,52,45,38,31,39,46,
53,60,61,54,47,55,62,63,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64
};
/*A map from the coefficient number in a block to its index in the zig zag
scan.*/
const unsigned char OC_IZIG_ZAG[64]={
0, 1, 5, 6,14,15,27,28,
2, 4, 7,13,16,26,29,42,
3, 8,12,17,25,30,41,43,
9,11,18,24,31,40,44,53,
10,19,23,32,39,45,52,54,
20,22,33,38,46,51,55,60,
21,34,37,47,50,56,59,61,
35,36,48,49,57,58,62,63
};
/*A map from physical macro block ordering to bitstream macro block
ordering within a super block.*/
const unsigned char OC_MB_MAP[2][2]={{0,3},{1,2}};
/*A list of the indices in the oc_mb.map array that can be valid for each of
the various chroma decimation types.*/
const unsigned char OC_MB_MAP_IDXS[TH_PF_NFORMATS][12]={
{0,1,2,3,4,8},
{0,1,2,3,4,5,8,9},
{0,1,2,3,4,6,8,10},
{0,1,2,3,4,5,6,7,8,9,10,11}
};
/*The number of indices in the oc_mb.map array that can be valid for each of
the various chroma decimation types.*/
const unsigned char OC_MB_MAP_NIDXS[TH_PF_NFORMATS]={6,8,8,12};
/*The number of extra bits that are coded with each of the DCT tokens.
Each DCT token has some fixed number of additional bits (possibly 0) stored
after the token itself, containing, for example, coefficient magnitude,
sign bits, etc.*/
const unsigned char OC_DCT_TOKEN_EXTRA_BITS[TH_NDCT_TOKENS]={
0,0,0,2,3,4,12,3,6,
0,0,0,0,
1,1,1,1,2,3,4,5,6,10,
1,1,1,1,1,3,4,
2,3
};
int oc_ilog(unsigned _v){
int ret;
for(ret=0;_v;ret++)_v>>=1;
return ret;
}
void *oc_aligned_malloc(size_t _sz,size_t _align){
unsigned char *p;
if(_align-1>UCHAR_MAX||(_align&_align-1)||_sz>~(size_t)0-_align)return NULL;
p=(unsigned char *)_ogg_malloc(_sz+_align);
if(p!=NULL){
int offs;
offs=((p-(unsigned char *)0)-1&_align-1);
p[offs]=offs;
p+=offs+1;
}
return p;
}
void oc_aligned_free(void *_ptr){
unsigned char *p;
p=(unsigned char *)_ptr;
if(p!=NULL){
int offs;
offs=*--p;
_ogg_free(p-offs);
}
}
void **oc_malloc_2d(size_t _height,size_t _width,size_t _sz){
size_t rowsz;
size_t colsz;
size_t datsz;
char *ret;
colsz=_height*sizeof(void *);
rowsz=_sz*_width;
datsz=rowsz*_height;
/*Alloc array and row pointers.*/
ret=(char *)_ogg_malloc(datsz+colsz);
/*Initialize the array.*/
if(ret!=NULL){
size_t i;
void **p;
char *datptr;
p=(void **)ret;
i=_height;
for(datptr=ret+colsz;i-->0;p++,datptr+=rowsz)*p=(void *)datptr;
}
return (void **)ret;
}
void **oc_calloc_2d(size_t _height,size_t _width,size_t _sz){
size_t colsz;
size_t rowsz;
size_t datsz;
char *ret;
colsz=_height*sizeof(void *);
rowsz=_sz*_width;
datsz=rowsz*_height;
/*Alloc array and row pointers.*/
ret=(char *)_ogg_calloc(datsz+colsz,1);
/*Initialize the array.*/
if(ret!=NULL){
size_t i;
void **p;
char *datptr;
p=(void **)ret;
i=_height;
for(datptr=ret+colsz;i-->0;p++,datptr+=rowsz)*p=(void *)datptr;
}
return (void **)ret;
}
void oc_free_2d(void *_ptr){
_ogg_free(_ptr);
}
/*Fills in a Y'CbCr buffer with a pointer to the image data in the first
buffer, but with the opposite vertical orientation.
_dst: The destination buffer.
This can be the same as _src.
_src: The source buffer.*/
void oc_ycbcr_buffer_flip(th_ycbcr_buffer _dst,
const th_ycbcr_buffer _src){
int pli;
for(pli=0;pli<3;pli++){
_dst[pli].width=_src[pli].width;
_dst[pli].height=_src[pli].height;
_dst[pli].stride=-_src[pli].stride;
_dst[pli].data=_src[pli].data
+(1-_dst[pli].height)*(ptrdiff_t)_dst[pli].stride;
}
}
const char *th_version_string(void){
return OC_VENDOR_STRING;
}
ogg_uint32_t th_version_number(void){
return (TH_VERSION_MAJOR<<16)+(TH_VERSION_MINOR<<8)+TH_VERSION_SUB;
}
/*Determines the packet type.
Note that this correctly interprets a 0-byte packet as a video data packet.
Return: 1 for a header packet, 0 for a data packet.*/
int th_packet_isheader(ogg_packet *_op){
return _op->bytes>0?_op->packet[0]>>7:0;
}
/*Determines the frame type of a video data packet.
Note that this correctly interprets a 0-byte packet as a delta frame.
Return: 1 for a key frame, 0 for a delta frame, and -1 for a header
packet.*/
int th_packet_iskeyframe(ogg_packet *_op){
return _op->bytes<=0?0:_op->packet[0]&0x80?-1:!(_op->packet[0]&0x40);
}
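
The two packet tests above only inspect the top two bits of the first byte. A minimal sketch of the same decision on a raw byte buffer, assuming a hypothetical helper packet_kind that stands in for th_packet_iskeyframe() so that no ogg_packet is needed:

```c
#include <stddef.h>

/* Hypothetical stand-in for th_packet_iskeyframe() operating on raw bytes.
   Returns -1 for a header packet, 1 for a key frame, 0 for a delta frame. */
static int packet_kind(const unsigned char *_data,long _bytes){
  /* A 0-byte packet is treated as a delta frame. */
  if(_bytes<=0)return 0;
  /* High bit set: header packet. */
  if(_data[0]&0x80)return -1;
  /* Bit 6 clear: key frame; bit 6 set: delta frame. */
  return !(_data[0]&0x40);
}
```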

@ -1,116 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_internal_H)
# define _internal_H (1)
# include <stdlib.h>
# include <limits.h>
# if defined(HAVE_CONFIG_H)
# include "config.h"
# endif
# include "theora/codec.h"
# include "theora/theora.h"
# include "ocintrin.h"
# if !defined(__GNUC_PREREQ)
# if defined(__GNUC__)&&defined(__GNUC_MINOR__)
# define __GNUC_PREREQ(_maj,_min) \
((__GNUC__<<16)+__GNUC_MINOR__>=((_maj)<<16)+(_min))
# else
# define __GNUC_PREREQ(_maj,_min) 0
# endif
# endif
# if defined(_MSC_VER)
/*Disable missing EMMS warnings.*/
# pragma warning(disable:4799)
/*Thank you Microsoft, I know the order of operations.*/
# pragma warning(disable:4554)
# endif
/*You, too, gcc.*/
# if __GNUC_PREREQ(4,2)
# pragma GCC diagnostic ignored "-Wparentheses"
# endif
/*Some assembly constructs require aligned operands.
The following macros are _only_ intended for structure member declarations.
Although they will sometimes work on stack variables, gcc will often silently
ignore them.
A separate set of macros could be made for manual stack alignment, but we
don't actually require it anywhere.*/
# if defined(OC_X86_ASM)||defined(OC_ARM_ASM)
# if defined(__GNUC__)
# define OC_ALIGN8(expr) expr __attribute__((aligned(8)))
# define OC_ALIGN16(expr) expr __attribute__((aligned(16)))
# elif defined(_MSC_VER)
# define OC_ALIGN8(expr) __declspec (align(8)) expr
# define OC_ALIGN16(expr) __declspec (align(16)) expr
# else
# error "Alignment macros required for this platform."
# endif
# endif
# if !defined(OC_ALIGN8)
# define OC_ALIGN8(expr) expr
# endif
# if !defined(OC_ALIGN16)
# define OC_ALIGN16(expr) expr
# endif
/*This library's version.*/
# define OC_VENDOR_STRING "Xiph.Org libtheora 1.2.0alpha 20100924 (Ptalarbvorm)"
/*Theora bitstream version.*/
# define TH_VERSION_MAJOR (3)
# define TH_VERSION_MINOR (2)
# define TH_VERSION_SUB (1)
# define TH_VERSION_CHECK(_info,_maj,_min,_sub) \
((_info)->version_major>(_maj)||(_info)->version_major==(_maj)&& \
((_info)->version_minor>(_min)||(_info)->version_minor==(_min)&& \
(_info)->version_subminor>=(_sub)))
/*A map from the index in the zig zag scan to the coefficient number in a
block.*/
extern const unsigned char OC_FZIG_ZAG[128];
/*A map from the coefficient number in a block to its index in the zig zag
scan.*/
extern const unsigned char OC_IZIG_ZAG[64];
/*A map from physical macro block ordering to bitstream macro block
ordering within a super block.*/
extern const unsigned char OC_MB_MAP[2][2];
/*A list of the indices in the oc_mb_map array that can be valid for each of
the various chroma decimation types.*/
extern const unsigned char OC_MB_MAP_IDXS[TH_PF_NFORMATS][12];
/*The number of indices in the oc_mb_map array that can be valid for each of
the various chroma decimation types.*/
extern const unsigned char OC_MB_MAP_NIDXS[TH_PF_NFORMATS];
int oc_ilog(unsigned _v);
void *oc_aligned_malloc(size_t _sz,size_t _align);
void oc_aligned_free(void *_ptr);
void **oc_malloc_2d(size_t _height,size_t _width,size_t _sz);
void **oc_calloc_2d(size_t _height,size_t _width,size_t _sz);
void oc_free_2d(void *_ptr);
void oc_ycbcr_buffer_flip(th_ycbcr_buffer _dst,
const th_ycbcr_buffer _src);
#endif

@ -1,143 +0,0 @@
#if !defined(_mathops_H)
# define _mathops_H (1)
# include <ogg/ogg.h>
# if __GNUC_PREREQ(3,4)
# include <limits.h>
/*Note the casts to (int) below: this prevents OC_CLZ{32|64}_OFFS from
"upgrading" the type of an entire expression to an (unsigned) size_t.*/
# if INT_MAX>=2147483647
# define OC_CLZ32_OFFS ((int)sizeof(unsigned)*CHAR_BIT)
# define OC_CLZ32(_x) (__builtin_clz(_x))
# elif LONG_MAX>=2147483647L
# define OC_CLZ32_OFFS ((int)sizeof(unsigned long)*CHAR_BIT)
# define OC_CLZ32(_x) (__builtin_clzl(_x))
# endif
# if INT_MAX>=9223372036854775807LL
# define OC_CLZ64_OFFS ((int)sizeof(unsigned)*CHAR_BIT)
# define OC_CLZ64(_x) (__builtin_clz(_x))
# elif LONG_MAX>=9223372036854775807LL
# define OC_CLZ64_OFFS ((int)sizeof(unsigned long)*CHAR_BIT)
# define OC_CLZ64(_x) (__builtin_clzl(_x))
# elif LLONG_MAX>=9223372036854775807LL|| \
__LONG_LONG_MAX__>=9223372036854775807LL
# define OC_CLZ64_OFFS ((int)sizeof(unsigned long long)*CHAR_BIT)
# define OC_CLZ64(_x) (__builtin_clzll(_x))
# endif
# endif
/**
* oc_ilog32 - Integer binary logarithm of a 32-bit value.
* @_v: A 32-bit value.
* Returns floor(log2(_v))+1, or 0 if _v==0.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
* The OC_ILOG_32() or OC_ILOGNZ_32() macros may be able to use a builtin
* function instead, which should be faster.
*/
int oc_ilog32(ogg_uint32_t _v);
/**
* oc_ilog64 - Integer binary logarithm of a 64-bit value.
* @_v: A 64-bit value.
* Returns floor(log2(_v))+1, or 0 if _v==0.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
* The OC_ILOG_64() or OC_ILOGNZ_64() macros may be able to use a builtin
* function instead, which should be faster.
*/
int oc_ilog64(ogg_int64_t _v);
# if defined(OC_CLZ32)
/**
* OC_ILOGNZ_32 - Integer binary logarithm of a non-zero 32-bit value.
* @_v: A non-zero 32-bit value.
* Returns floor(log2(_v))+1.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
* If _v is zero, the return value is undefined; use OC_ILOG_32() instead.
*/
# define OC_ILOGNZ_32(_v) (OC_CLZ32_OFFS-OC_CLZ32(_v))
/**
* OC_ILOG_32 - Integer binary logarithm of a 32-bit value.
* @_v: A 32-bit value.
* Returns floor(log2(_v))+1, or 0 if _v==0.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
*/
# define OC_ILOG_32(_v) (OC_ILOGNZ_32(_v)&-!!(_v))
# else
# define OC_ILOGNZ_32(_v) (oc_ilog32(_v))
# define OC_ILOG_32(_v) (oc_ilog32(_v))
# endif
# if defined(OC_CLZ64)
/**
* OC_ILOGNZ_64 - Integer binary logarithm of a non-zero 64-bit value.
* @_v: A non-zero 64-bit value.
* Returns floor(log2(_v))+1.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
* If _v is zero, the return value is undefined; use OC_ILOG_64() instead.
*/
# define OC_ILOGNZ_64(_v) (OC_CLZ64_OFFS-OC_CLZ64(_v))
/**
* OC_ILOG_64 - Integer binary logarithm of a 64-bit value.
* @_v: A 64-bit value.
* Returns floor(log2(_v))+1, or 0 if _v==0.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
*/
# define OC_ILOG_64(_v) (OC_ILOGNZ_64(_v)&-!!(_v))
# else
# define OC_ILOGNZ_64(_v) (oc_ilog64(_v))
# define OC_ILOG_64(_v) (oc_ilog64(_v))
# endif
# define OC_STATIC_ILOG0(_v) (!!(_v))
# define OC_STATIC_ILOG1(_v) (((_v)&0x2)?2:OC_STATIC_ILOG0(_v))
# define OC_STATIC_ILOG2(_v) \
(((_v)&0xC)?2+OC_STATIC_ILOG1((_v)>>2):OC_STATIC_ILOG1(_v))
# define OC_STATIC_ILOG3(_v) \
(((_v)&0xF0)?4+OC_STATIC_ILOG2((_v)>>4):OC_STATIC_ILOG2(_v))
# define OC_STATIC_ILOG4(_v) \
(((_v)&0xFF00)?8+OC_STATIC_ILOG3((_v)>>8):OC_STATIC_ILOG3(_v))
# define OC_STATIC_ILOG5(_v) \
(((_v)&0xFFFF0000)?16+OC_STATIC_ILOG4((_v)>>16):OC_STATIC_ILOG4(_v))
# define OC_STATIC_ILOG6(_v) \
(((_v)&0xFFFFFFFF00000000ULL)?32+OC_STATIC_ILOG5((_v)>>32):OC_STATIC_ILOG5(_v))
/**
* OC_STATIC_ILOG_32 - The integer logarithm of an (unsigned, 32-bit) constant.
* @_v: A non-negative 32-bit constant.
* Returns floor(log2(_v))+1, or 0 if _v==0.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
* This macro is suitable for evaluation at compile time, but it should not be
* used on values that can change at runtime, as it operates via exhaustive
* search.
*/
# define OC_STATIC_ILOG_32(_v) (OC_STATIC_ILOG5((ogg_uint32_t)(_v)))
/**
* OC_STATIC_ILOG_64 - The integer logarithm of an (unsigned, 64-bit) constant.
* @_v: A non-negative 64-bit constant.
* Returns floor(log2(_v))+1, or 0 if _v==0.
* This is the number of bits that would be required to represent _v in two's
* complement notation with all of the leading zeros stripped.
* This macro is suitable for evaluation at compile time, but it should not be
* used on values that can change at runtime, as it operates via exhaustive
* search.
*/
# define OC_STATIC_ILOG_64(_v) (OC_STATIC_ILOG6((ogg_int64_t)(_v)))
#define OC_Q57(_v) ((ogg_int64_t)(_v)<<57)
#define OC_Q10(_v) ((_v)<<10)
ogg_int64_t oc_bexp64(ogg_int64_t _z);
ogg_int64_t oc_blog64(ogg_int64_t _w);
ogg_uint32_t oc_bexp32_q10(int _z);
int oc_blog32_q10(ogg_uint32_t _w);
#endif
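
The relationship between the loop-based integer logarithm and the count-leading-zeros form used by OC_ILOGNZ_32 can be sketched as below. The function names are hypothetical, and the builtin path assumes a GCC-compatible compiler whose unsigned int is at least 32 bits wide; other compilers fall back to the loop.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical portable helper: floor(log2(_v))+1, or 0 if _v==0. */
static int ilog32_loop(uint32_t _v){
  int ret;
  for(ret=0;_v;ret++)_v>>=1;
  return ret;
}

/* Hypothetical helper computing the same result from a leading-zero count
   where one is available. */
static int ilog32_clz(uint32_t _v){
#if defined(__GNUC__)
  /* __builtin_clz(0) is undefined, so zero is handled separately. */
  return _v?(int)(sizeof(unsigned)*CHAR_BIT)-__builtin_clz(_v):0;
#else
  return ilog32_loop(_v);
#endif
}
```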

@ -1,128 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*Some common macros for potential platform-specific optimization.*/
#include <math.h>
#if !defined(_ocintrin_H)
# define _ocintrin_H (1)
/*Some specific platforms may have optimized intrinsic or inline assembly
versions of these functions which can substantially improve performance.
We define macros for them to allow easy incorporation of these non-ANSI
features.*/
/*Note that we do not provide a macro for abs(), because it is provided as a
library function, which we assume is translated into an intrinsic to avoid
the function call overhead and then implemented in the smartest way for the
target platform.
With modern gcc (4.x), this is true: it uses cmov instructions if the
architecture supports it and branchless bit-twiddling if it does not (the
speed difference between the two approaches is not measurable).
Interestingly, the bit-twiddling method was patented in 2000 (US 6,073,150)
by Sun Microsystems, despite prior art dating back to at least 1996:
http://web.archive.org/web/19961201174141/www.x86.org/ftp/articles/pentopt/PENTOPT.TXT
On gcc 3.x, however, our assumption is not true, as abs() is translated to a
conditional jump, which is horrible on deeply pipelined architectures (e.g.,
all consumer architectures for the past decade or more).
Also be warned that -C*abs(x) where C is a constant is mis-optimized as
abs(C*x) on every gcc release before 4.2.3.
See bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34130 */
/*Modern gcc (4.x) can compile the naive versions of min and max with cmov if
given an appropriate architecture, but the branchless bit-twiddling versions
are just as fast, and do not require any special target architecture.
Earlier gcc versions (3.x) compiled both versions to the same assembly
instructions, because of the way they represented ((_b)>(_a)) internally.*/
#define OC_MAXI(_a,_b) ((_a)-((_a)-(_b)&-((_b)>(_a))))
#define OC_MINI(_a,_b) ((_a)+((_b)-(_a)&-((_b)<(_a))))
/*Clamps an integer into the given range.
If _a>_c, then the lower bound _a is respected over the upper bound _c (this
behavior is required to meet our documented API behavior).
_a: The lower bound.
_b: The value to clamp.
_c: The upper bound.*/
#define OC_CLAMPI(_a,_b,_c) (OC_MAXI(_a,OC_MINI(_b,_c)))
#define OC_CLAMP255(_x) ((unsigned char)((((_x)<0)-1)&((_x)|-((_x)>255))))
/*This has a chance of compiling branchless, and is just as fast as the
bit-twiddling method, which is slightly less portable, since it relies on a
sign-extended rightshift, which is not guaranteed by ANSI (but present on
every relevant platform).*/
#define OC_SIGNI(_a) (((_a)>0)-((_a)<0))
/*Slightly more portable than relying on a sign-extended right-shift (which is
not guaranteed by ANSI), and just as fast, since gcc (3.x and 4.x both)
compile it into the right-shift anyway.*/
#define OC_SIGNMASK(_a) (-((_a)<0))
/*Divides an integer by a power of two, truncating towards 0.
_dividend: The integer to divide.
_shift: The non-negative power of two to divide by.
_rmask: (1<<_shift)-1*/
#define OC_DIV_POW2(_dividend,_shift,_rmask)\
((_dividend)+(OC_SIGNMASK(_dividend)&(_rmask))>>(_shift))
/*Divides _x by 65536, truncating towards 0.*/
#define OC_DIV2_16(_x) OC_DIV_POW2(_x,16,0xFFFF)
/*Divides _x by 2, truncating towards 0.*/
#define OC_DIV2(_x) OC_DIV_POW2(_x,1,0x1)
/*Divides _x by 8, truncating towards 0.*/
#define OC_DIV8(_x) OC_DIV_POW2(_x,3,0x7)
/*Divides _x by 16, truncating towards 0.*/
#define OC_DIV16(_x) OC_DIV_POW2(_x,4,0xF)
/*Right shifts _dividend by _shift, adding _rval, and subtracting one for
negative dividends first.
When _rval is (1<<_shift-1), this is equivalent to division with rounding
ties away from zero.*/
#define OC_DIV_ROUND_POW2(_dividend,_shift,_rval)\
((_dividend)+OC_SIGNMASK(_dividend)+(_rval)>>(_shift))
/*Divides _x by 2, rounding towards even numbers.*/
#define OC_DIV2_RE(_x) ((_x)+((_x)>>1&1)>>1)
/*Divides _x by (1<<(_shift)), rounding towards even numbers.*/
#define OC_DIV_POW2_RE(_x,_shift) \
((_x)+((_x)>>(_shift)&1)+((1<<(_shift))-1>>1)>>(_shift))
/*Swaps two integers _a and _b if _a>_b.*/
#define OC_SORT2I(_a,_b) \
do{ \
int t__; \
t__=((_a)^(_b))&-((_b)<(_a)); \
(_a)^=t__; \
(_b)^=t__; \
} \
while(0)
/*Accesses one of four (signed) bytes given an index.
This can be used to avoid small lookup tables.*/
#define OC_BYTE_TABLE32(_a,_b,_c,_d,_i) \
((signed char) \
(((_a)&0xFF|((_b)&0xFF)<<8|((_c)&0xFF)<<16|((_d)&0xFF)<<24)>>(_i)*8))
/*Accesses one of eight (unsigned) nibbles given an index.
This can be used to avoid small lookup tables.*/
#define OC_UNIBBLE_TABLE32(_a,_b,_c,_d,_e,_f,_g,_h,_i) \
((((_a)&0xF|((_b)&0xF)<<4|((_c)&0xF)<<8|((_d)&0xF)<<12| \
((_e)&0xF)<<16|((_f)&0xF)<<20|((_g)&0xF)<<24|((_h)&0xF)<<28)>>(_i)*4)&0xF)
/*All of these macros should expect floats as arguments.*/
#define OC_MAXF(_a,_b) ((_a)<(_b)?(_b):(_a))
#define OC_MINF(_a,_b) ((_a)>(_b)?(_b):(_a))
#define OC_CLAMPF(_a,_b,_c) (OC_MINF(_a,OC_MAXF(_b,_c)))
#define OC_FABSF(_f) ((float)fabs(_f))
#define OC_SQRTF(_f) ((float)sqrt(_f))
#define OC_POWF(_b,_e) ((float)pow(_b,_e))
#define OC_LOGF(_f) ((float)log(_f))
#define OC_IFLOORF(_f) ((int)floor(_f))
#define OC_ICEILF(_f) ((int)ceil(_f))
#endif
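/*The integer helpers above are easier to trust with a few hand-checked
   values.
  The following is a minimal sketch, not part of the removed sources: it
   repeats four of the macros (with extra parentheses added for readability)
   and checks them against known results.
  The function name sketch_check_intrinsics is hypothetical, and the negative
   cases assume that >> on a negative int is an arithmetic shift.*/

```c
#include <assert.h>

/*Copies of the macros above, with extra parentheses added for readability.*/
#define OC_SIGNMASK(_a) (-((_a)<0))
#define OC_CLAMP255(_x) ((unsigned char)((((_x)<0)-1)&((_x)|-((_x)>255))))
#define OC_DIV_POW2(_dividend,_shift,_rmask)\
  (((_dividend)+(OC_SIGNMASK(_dividend)&(_rmask)))>>(_shift))
#define OC_DIV2_RE(_x) (((_x)+(((_x)>>1)&1))>>1)

/*Hypothetical helper: exercises the macros on a few hand-checked inputs.
  Assumes >> on a negative int is an arithmetic (sign-extending) shift.*/
void sketch_check_intrinsics(void){
  /*Clamping to [0,255]: values below, inside, and above the range.*/
  assert(OC_CLAMP255(-5)==0);
  assert(OC_CLAMP255(100)==100);
  assert(OC_CLAMP255(300)==255);
  /*Division by 2 truncating towards zero, for both signs.*/
  assert(OC_DIV_POW2(7,1,0x1)==3);
  assert(OC_DIV_POW2(-7,1,0x1)==-3);
  /*Division by 2 with ties going to the even neighbor.*/
  assert(OC_DIV2_RE(1)==0);
  assert(OC_DIV2_RE(3)==2);
  assert(OC_DIV2_RE(5)==2);
}
```

/*A plain right shift would round -7/2 down to -4; adding the remainder mask
   for negative dividends first is what gives -3.*/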

View File

@ -1,127 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include <ogg/ogg.h>
#include "quant.h"
#include "decint.h"
/*The maximum output of the DCT with +/- 255 inputs is +/- 8157.
These minimum quantizers ensure the result after quantization (and after
prediction for DC) will be no more than +/- 510.
The tokenization system can handle values up to +/- 580, so there is no need
to do any coefficient clamping.
I would rather have allowed smaller quantizers and had to clamp, but these
minimums were required when constructing the original VP3 matrices and have
been formalized in the spec.*/
static const unsigned OC_DC_QUANT_MIN[2]={4<<2,8<<2};
static const unsigned OC_AC_QUANT_MIN[2]={2<<2,4<<2};
/*Initializes the dequantization tables from a set of quantizer info.
Currently the dequantizer (and elsewhere enquantizer) tables are expected to
be initialized as pointing to the storage reserved for them in the
oc_theora_state (resp. oc_enc_ctx) structure.
If some tables are duplicates of others, the pointers will be adjusted to
point to a single copy of the tables, but the storage for them will not be
freed.
If you're concerned about the memory footprint, the obvious thing to do is
to move the storage out of its fixed place in the structures and allocate
it on demand.
However, a much, much better option is to only store the quantization
matrices being used for the current frame, and to recalculate these as the
qi values change between frames (this is what VP3 did).*/
void oc_dequant_tables_init(ogg_uint16_t *_dequant[64][3][2],
int _pp_dc_scale[64],const th_quant_info *_qinfo){
/*Coding mode: intra or inter.*/
int qti;
/*Y', C_b, C_r*/
int pli;
for(qti=0;qti<2;qti++)for(pli=0;pli<3;pli++){
/*Quality index.*/
int qi;
/*Range iterator.*/
int qri;
for(qi=0,qri=0;qri<=_qinfo->qi_ranges[qti][pli].nranges;qri++){
th_quant_base base;
ogg_uint32_t q;
int qi_start;
int qi_end;
memcpy(base,_qinfo->qi_ranges[qti][pli].base_matrices[qri],
sizeof(base));
qi_start=qi;
if(qri==_qinfo->qi_ranges[qti][pli].nranges)qi_end=qi+1;
else qi_end=qi+_qinfo->qi_ranges[qti][pli].sizes[qri];
/*Iterate over quality indices in this range.*/
for(;;){
ogg_uint32_t qfac;
int zzi;
int ci;
/*In the original VP3.2 code, the rounding offset and the size of the
dead zone around 0 were controlled by a "sharpness" parameter.
The size of our dead zone is now controlled by the per-coefficient
quality thresholds returned by our HVS module.
We round down from a more accurate value when the quality of the
reconstruction does not fall below our threshold and it saves bits.
Hence, all of that VP3.2 code is gone from here, and the remaining
floating point code has been implemented as equivalent integer code
with exact precision.*/
qfac=(ogg_uint32_t)_qinfo->dc_scale[qi]*base[0];
/*For postprocessing, not dequantization.*/
if(_pp_dc_scale!=NULL)_pp_dc_scale[qi]=(int)(qfac/160);
/*Scale the DC coefficient from the proper table.*/
q=(qfac/100)<<2;
q=OC_CLAMPI(OC_DC_QUANT_MIN[qti],q,OC_QUANT_MAX);
_dequant[qi][pli][qti][0]=(ogg_uint16_t)q;
/*Now scale AC coefficients from the proper table.*/
for(zzi=1;zzi<64;zzi++){
q=((ogg_uint32_t)_qinfo->ac_scale[qi]*base[OC_FZIG_ZAG[zzi]]/100)<<2;
q=OC_CLAMPI(OC_AC_QUANT_MIN[qti],q,OC_QUANT_MAX);
_dequant[qi][pli][qti][zzi]=(ogg_uint16_t)q;
}
/*If this is a duplicate of a previous matrix, use that instead.
This simple check helps us improve cache coherency later.*/
{
int dupe;
int qtj;
int plj;
dupe=0;
for(qtj=0;qtj<=qti;qtj++){
for(plj=0;plj<(qtj<qti?3:pli);plj++){
if(!memcmp(_dequant[qi][pli][qti],_dequant[qi][plj][qtj],
sizeof(oc_quant_table))){
dupe=1;
break;
}
}
if(dupe)break;
}
if(dupe)_dequant[qi][pli][qti]=_dequant[qi][plj][qtj];
}
if(++qi>=qi_end)break;
/*Interpolate the next base matrix.*/
for(ci=0;ci<64;ci++){
base[ci]=(unsigned char)(
(2*((qi_end-qi)*_qinfo->qi_ranges[qti][pli].base_matrices[qri][ci]+
(qi-qi_start)*_qinfo->qi_ranges[qti][pli].base_matrices[qri+1][ci])
+_qinfo->qi_ranges[qti][pli].sizes[qri])/
(2*_qinfo->qi_ranges[qti][pli].sizes[qri]));
}
}
}
}
}
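/*The two arithmetic steps inside oc_dequant_tables_init() can be looked at in
   isolation.
  The following is a minimal sketch under stated assumptions, not the removed
   implementation: sketch_scale_quant() mirrors the scale, shift, and clamp
   applied to each coefficient, and sketch_interp_base() mirrors the rounded
   linear interpolation between two base matrices.
  Both function names and SKETCH_QUANT_MAX are hypothetical; the sketch assumes
   qi_end-qi_start equals the range size used as the divisor in the source.*/

```c
#include <assert.h>

/*Mirrors OC_QUANT_MAX from quant.h.*/
#define SKETCH_QUANT_MAX (1024<<2)

/*Scales one base-matrix entry by scale/100, shifts it left by 2 as the source
   does, and clamps it.
  As with OC_CLAMPI, the lower bound qmin wins if the bounds ever conflict.*/
unsigned sketch_scale_quant(unsigned scale,unsigned base,unsigned qmin){
  unsigned q;
  q=(scale*base/100)<<2;
  if(q>SKETCH_QUANT_MAX)q=SKETCH_QUANT_MAX;
  if(q<qmin)q=qmin;
  return q;
}

/*Interpolates between entry b0 (at qi_start) and entry b1 (at qi_end) for a
   quality index qi in [qi_start,qi_end), rounding to the nearest integer.
  Adding size to the doubled numerator before dividing by 2*size is the
   integer form of adding 0.5.*/
unsigned char sketch_interp_base(int qi,int qi_start,int qi_end,
 unsigned char b0,unsigned char b1){
  int size;
  size=qi_end-qi_start;
  return (unsigned char)(
   (2*((qi_end-qi)*b0+(qi-qi_start)*b1)+size)/(2*size));
}
```

/*For example, a range of size 4 running from 16 to 24 yields 16, 18, 20, 22
   for the four quality indices it covers.*/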

View File

@ -1,33 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_quant_H)
# define _quant_H (1)
# include "theora/codec.h"
# include "ocintrin.h"
typedef ogg_uint16_t oc_quant_table[64];
/*Maximum scaled quantizer value.*/
#define OC_QUANT_MAX (1024<<2)
void oc_dequant_tables_init(ogg_uint16_t *_dequant[64][3][2],
int _pp_dc_scale[64],const th_quant_info *_qinfo);
#endif

File diff suppressed because it is too large

View File

@ -1,552 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: internal.h 17337 2010-07-19 16:08:54Z tterribe $
********************************************************************/
#if !defined(_state_H)
# define _state_H (1)
# include "internal.h"
# include "huffman.h"
# include "quant.h"
/*A single quadrant of the map from a super block to fragment numbers.*/
typedef ptrdiff_t oc_sb_map_quad[4];
/*A map from a super block to fragment numbers.*/
typedef oc_sb_map_quad oc_sb_map[4];
/*A single plane of the map from a macro block to fragment numbers.*/
typedef ptrdiff_t oc_mb_map_plane[4];
/*A map from a macro block to fragment numbers.*/
typedef oc_mb_map_plane oc_mb_map[3];
/*A motion vector.*/
typedef ogg_int16_t oc_mv;
typedef struct oc_sb_flags oc_sb_flags;
typedef struct oc_border_info oc_border_info;
typedef struct oc_fragment oc_fragment;
typedef struct oc_fragment_plane oc_fragment_plane;
typedef struct oc_base_opt_vtable oc_base_opt_vtable;
typedef struct oc_base_opt_data oc_base_opt_data;
typedef struct oc_state_dispatch_vtable oc_state_dispatch_vtable;
typedef struct oc_theora_state oc_theora_state;
/*Shared accelerated functions.*/
# if defined(OC_X86_ASM)
# if defined(_MSC_VER)
# include "x86_vc/x86int.h"
# else
# include "x86/x86int.h"
# endif
# endif
# if defined(OC_ARM_ASM)
# include "arm/armint.h"
# endif
# if defined(OC_C64X_ASM)
# include "c64x/c64xint.h"
# endif
# if !defined(oc_state_accel_init)
# define oc_state_accel_init oc_state_accel_init_c
# endif
# if defined(OC_STATE_USE_VTABLE)
# if !defined(oc_frag_copy)
# define oc_frag_copy(_state,_dst,_src,_ystride) \
((*(_state)->opt_vtable.frag_copy)(_dst,_src,_ystride))
# endif
# if !defined(oc_frag_copy_list)
# define oc_frag_copy_list(_state,_dst_frame,_src_frame,_ystride, \
_fragis,_nfragis,_frag_buf_offs) \
((*(_state)->opt_vtable.frag_copy_list)(_dst_frame,_src_frame,_ystride, \
_fragis,_nfragis,_frag_buf_offs))
# endif
# if !defined(oc_frag_recon_intra)
# define oc_frag_recon_intra(_state,_dst,_dst_ystride,_residue) \
((*(_state)->opt_vtable.frag_recon_intra)(_dst,_dst_ystride,_residue))
# endif
# if !defined(oc_frag_recon_inter)
# define oc_frag_recon_inter(_state,_dst,_src,_ystride,_residue) \
((*(_state)->opt_vtable.frag_recon_inter)(_dst,_src,_ystride,_residue))
# endif
# if !defined(oc_frag_recon_inter2)
# define oc_frag_recon_inter2(_state,_dst,_src1,_src2,_ystride,_residue) \
((*(_state)->opt_vtable.frag_recon_inter2)(_dst, \
_src1,_src2,_ystride,_residue))
# endif
# if !defined(oc_idct8x8)
# define oc_idct8x8(_state,_y,_x,_last_zzi) \
((*(_state)->opt_vtable.idct8x8)(_y,_x,_last_zzi))
# endif
# if !defined(oc_state_frag_recon)
# define oc_state_frag_recon(_state,_fragi, \
_pli,_dct_coeffs,_last_zzi,_dc_quant) \
((*(_state)->opt_vtable.state_frag_recon)(_state,_fragi, \
_pli,_dct_coeffs,_last_zzi,_dc_quant))
# endif
# if !defined(oc_loop_filter_init)
# define oc_loop_filter_init(_state,_bv,_flimit) \
((*(_state)->opt_vtable.loop_filter_init)(_bv,_flimit))
# endif
# if !defined(oc_state_loop_filter_frag_rows)
# define oc_state_loop_filter_frag_rows(_state, \
_bv,_refi,_pli,_fragy0,_fragy_end) \
((*(_state)->opt_vtable.state_loop_filter_frag_rows)(_state, \
_bv,_refi,_pli,_fragy0,_fragy_end))
# endif
# if !defined(oc_restore_fpu)
# define oc_restore_fpu(_state) \
((*(_state)->opt_vtable.restore_fpu)())
# endif
# else
# if !defined(oc_frag_copy)
# define oc_frag_copy(_state,_dst,_src,_ystride) \
oc_frag_copy_c(_dst,_src,_ystride)
# endif
# if !defined(oc_frag_copy_list)
# define oc_frag_copy_list(_state,_dst_frame,_src_frame,_ystride, \
_fragis,_nfragis,_frag_buf_offs) \
oc_frag_copy_list_c(_dst_frame,_src_frame,_ystride, \
_fragis,_nfragis,_frag_buf_offs)
# endif
# if !defined(oc_frag_recon_intra)
# define oc_frag_recon_intra(_state,_dst,_dst_ystride,_residue) \
oc_frag_recon_intra_c(_dst,_dst_ystride,_residue)
# endif
# if !defined(oc_frag_recon_inter)
# define oc_frag_recon_inter(_state,_dst,_src,_ystride,_residue) \
oc_frag_recon_inter_c(_dst,_src,_ystride,_residue)
# endif
# if !defined(oc_frag_recon_inter2)
# define oc_frag_recon_inter2(_state,_dst,_src1,_src2,_ystride,_residue) \
oc_frag_recon_inter2_c(_dst,_src1,_src2,_ystride,_residue)
# endif
# if !defined(oc_idct8x8)
# define oc_idct8x8(_state,_y,_x,_last_zzi) oc_idct8x8_c(_y,_x,_last_zzi)
# endif
# if !defined(oc_state_frag_recon)
# define oc_state_frag_recon oc_state_frag_recon_c
# endif
# if !defined(oc_loop_filter_init)
# define oc_loop_filter_init(_state,_bv,_flimit) \
oc_loop_filter_init_c(_bv,_flimit)
# endif
# if !defined(oc_state_loop_filter_frag_rows)
# define oc_state_loop_filter_frag_rows oc_state_loop_filter_frag_rows_c
# endif
# if !defined(oc_restore_fpu)
# define oc_restore_fpu(_state) do{}while(0)
# endif
# endif
/*A keyframe.*/
# define OC_INTRA_FRAME (0)
/*A predicted frame.*/
# define OC_INTER_FRAME (1)
/*A frame of unknown type (frame type decision has not yet been made).*/
# define OC_UNKWN_FRAME (-1)
/*The amount of padding to add to the reconstructed frame buffers on all
sides.
This is used to allow unrestricted motion vectors without special casing.
This must be a multiple of 2.*/
# define OC_UMV_PADDING (16)
/*Frame classification indices.*/
/*The previous golden frame.*/
# define OC_FRAME_GOLD (0)
/*The previous frame.*/
# define OC_FRAME_PREV (1)
/*The current frame.*/
# define OC_FRAME_SELF (2)
/*Used to mark uncoded fragments (for DC prediction).*/
# define OC_FRAME_NONE (3)
/*The input or output buffer.*/
# define OC_FRAME_IO (3)
/*Uncompressed prev golden frame.*/
# define OC_FRAME_GOLD_ORIG (4)
/*Uncompressed previous frame. */
# define OC_FRAME_PREV_ORIG (5)
/*Macroblock modes.*/
/*Macro block is invalid: It is never coded.*/
# define OC_MODE_INVALID (-1)
/*Encoded difference from the same macro block in the previous frame.*/
# define OC_MODE_INTER_NOMV (0)
/*Encoded with no motion compensated prediction.*/
# define OC_MODE_INTRA (1)
/*Encoded difference from the previous frame offset by the given motion
vector.*/
# define OC_MODE_INTER_MV (2)
/*Encoded difference from the previous frame offset by the last coded motion
vector.*/
# define OC_MODE_INTER_MV_LAST (3)
/*Encoded difference from the previous frame offset by the second to last
coded motion vector.*/
# define OC_MODE_INTER_MV_LAST2 (4)
/*Encoded difference from the same macro block in the previous golden
frame.*/
# define OC_MODE_GOLDEN_NOMV (5)
/*Encoded difference from the previous golden frame offset by the given motion
vector.*/
# define OC_MODE_GOLDEN_MV (6)
/*Encoded difference from the previous frame offset by the individual motion
vectors given for each block.*/
# define OC_MODE_INTER_MV_FOUR (7)
/*The number of (coded) modes.*/
# define OC_NMODES (8)
/*Determines the reference frame used for a given MB mode.*/
# define OC_FRAME_FOR_MODE(_x) \
OC_UNIBBLE_TABLE32(OC_FRAME_PREV,OC_FRAME_SELF,OC_FRAME_PREV,OC_FRAME_PREV, \
OC_FRAME_PREV,OC_FRAME_GOLD,OC_FRAME_GOLD,OC_FRAME_PREV,(_x))
/*Constants for the packet state machine common between encoder and decoder.*/
/*Next packet to emit/read: Codec info header.*/
# define OC_PACKET_INFO_HDR (-3)
/*Next packet to emit/read: Comment header.*/
# define OC_PACKET_COMMENT_HDR (-2)
/*Next packet to emit/read: Codec setup header.*/
# define OC_PACKET_SETUP_HDR (-1)
/*No more packets to emit/read.*/
# define OC_PACKET_DONE (INT_MAX)
#define OC_MV(_x,_y) ((oc_mv)((_x)&0xFF|(_y)<<8))
#define OC_MV_X(_mv) ((signed char)(_mv))
#define OC_MV_Y(_mv) ((_mv)>>8)
#define OC_MV_ADD(_mv1,_mv2) \
OC_MV(OC_MV_X(_mv1)+OC_MV_X(_mv2), \
OC_MV_Y(_mv1)+OC_MV_Y(_mv2))
#define OC_MV_SUB(_mv1,_mv2) \
OC_MV(OC_MV_X(_mv1)-OC_MV_X(_mv2), \
OC_MV_Y(_mv1)-OC_MV_Y(_mv2))
/*Super blocks are 32x32 segments of pixels in a single color plane indexed
in image order.
Internally, super blocks are broken up into four quadrants, each of which
contains a 2x2 pattern of blocks, each of which is an 8x8 block of pixels.
Quadrants, and the blocks within them, are indexed in a special order called
a "Hilbert curve" within the super block.
In order to differentiate between the Hilbert-curve indexing strategy and
the regular image order indexing strategy, blocks indexed in image order
are called "fragments".
Fragments are indexed in image order, left to right, then bottom to top,
from Y' plane to Cb plane to Cr plane.
The co-located fragments in all image planes corresponding to the location
of a single quadrant of a luma plane super block form a macro block.
Thus there is only a single set of macro blocks for all planes, each of which
contains between 6 and 12 fragments, depending on the pixel format.
Therefore macro block information is kept in a separate set of arrays from
super blocks to avoid unused space in the other planes.
The lists are indexed in super block order.
That is, the macro block corresponding to the macro block mbi in (luma plane)
super block sbi is at index (sbi<<2|mbi).
Thus the number of macro blocks in each dimension is always twice the number
of super blocks, even when only an odd number fall inside the coded frame.
These "extra" macro blocks are just an artifact of our internal data layout,
and not part of the coded stream; they are flagged with a negative MB mode.*/
/*Super block information.*/
struct oc_sb_flags{
unsigned char coded_fully:1;
unsigned char coded_partially:1;
unsigned char quad_valid:4;
};
/*Information about a fragment which intersects the border of the displayable
region.
This marks which pixels belong to the displayable region.*/
struct oc_border_info{
/*A bit mask marking which pixels are in the displayable region.
Pixel (x,y) corresponds to bit (y<<3|x).*/
ogg_int64_t mask;
/*The number of pixels in the displayable region.
This is always positive, and always less than 64.*/
int npixels;
};
/*Fragment information.*/
struct oc_fragment{
/*A flag indicating whether or not this fragment is coded.*/
unsigned coded:1;
/*A flag indicating that this entire fragment lies outside the displayable
region of the frame.
Note the contrast with an invalid macro block, which is outside the coded
frame, not just the displayable one.
There are no fragments outside the coded frame by construction.*/
unsigned invalid:1;
/*The index of the quality index used for this fragment's AC coefficients.*/
unsigned qii:4;
/*The index of the reference frame this fragment is predicted from.*/
unsigned refi:2;
/*The mode of the macroblock this fragment belongs to.*/
unsigned mb_mode:3;
/*The index of the associated border information for fragments which lie
partially outside the displayable region.
For fragments completely inside or outside this region, this is -1.
Note that the C standard requires an explicit signed keyword for bitfield
types, since some compilers may treat them as unsigned without it.*/
signed int borderi:5;
/*The prediction-corrected DC component.
Note that the C standard requires an explicit signed keyword for bitfield
types, since some compilers may treat them as unsigned without it.*/
signed int dc:16;
};
/*A description of each fragment plane.*/
struct oc_fragment_plane{
/*The number of fragments in the horizontal direction.*/
int nhfrags;
/*The number of fragments in the vertical direction.*/
int nvfrags;
/*The offset of the first fragment in the plane.*/
ptrdiff_t froffset;
/*The total number of fragments in the plane.*/
ptrdiff_t nfrags;
/*The number of super blocks in the horizontal direction.*/
unsigned nhsbs;
/*The number of super blocks in the vertical direction.*/
unsigned nvsbs;
/*The offset of the first super block in the plane.*/
unsigned sboffset;
/*The total number of super blocks in the plane.*/
unsigned nsbs;
};
typedef void (*oc_state_loop_filter_frag_rows_func)(
const oc_theora_state *_state,signed char _bv[256],int _refi,int _pli,
int _fragy0,int _fragy_end);
/*The shared (encoder and decoder) functions that have accelerated variants.*/
struct oc_base_opt_vtable{
void (*frag_copy)(unsigned char *_dst,
const unsigned char *_src,int _ystride);
void (*frag_copy_list)(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
void (*frag_recon_intra)(unsigned char *_dst,int _ystride,
const ogg_int16_t _residue[64]);
void (*frag_recon_inter)(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]);
void (*frag_recon_inter2)(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t _residue[64]);
void (*idct8x8)(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void (*state_frag_recon)(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void (*loop_filter_init)(signed char _bv[256],int _flimit);
oc_state_loop_filter_frag_rows_func state_loop_filter_frag_rows;
void (*restore_fpu)(void);
};
/*The shared (encoder and decoder) tables that vary according to which variants
of the above functions are used.*/
struct oc_base_opt_data{
const unsigned char *dct_fzig_zag;
};
/*State information common to both the encoder and decoder.*/
struct oc_theora_state{
/*The stream information.*/
th_info info;
# if defined(OC_STATE_USE_VTABLE)
/*Table for shared accelerated functions.*/
oc_base_opt_vtable opt_vtable;
# endif
/*Table for shared data used by accelerated functions.*/
oc_base_opt_data opt_data;
/*CPU flags to detect the presence of extended instruction sets.*/
ogg_uint32_t cpu_flags;
/*The fragment plane descriptions.*/
oc_fragment_plane fplanes[3];
/*The list of fragments, indexed in image order.*/
oc_fragment *frags;
/*The offset into the reference frame buffer to the upper-left pixel of
each fragment.*/
ptrdiff_t *frag_buf_offs;
/*The motion vector for each fragment.*/
oc_mv *frag_mvs;
/*The total number of fragments in a single frame.*/
ptrdiff_t nfrags;
/*The list of super block maps, indexed in image order.*/
oc_sb_map *sb_maps;
/*The list of super block flags, indexed in image order.*/
oc_sb_flags *sb_flags;
/*The total number of super blocks in a single frame.*/
unsigned nsbs;
/*The fragments from each color plane that belong to each macro block.
Fragments are stored in image order (left to right then top to bottom).
When chroma components are decimated, the extra fragments have an index of
-1.*/
oc_mb_map *mb_maps;
/*The list of macro block modes.
A negative number indicates the macro block lies entirely outside the
coded frame.*/
signed char *mb_modes;
/*The number of macro blocks in the X direction.*/
unsigned nhmbs;
/*The number of macro blocks in the Y direction.*/
unsigned nvmbs;
/*The total number of macro blocks.*/
size_t nmbs;
/*The list of coded fragments, in coded order.
Uncoded fragments are stored in reverse order from the end of the list.*/
ptrdiff_t *coded_fragis;
/*The number of coded fragments in each plane.*/
ptrdiff_t ncoded_fragis[3];
/*The total number of coded fragments.*/
ptrdiff_t ntotal_coded_fragis;
/*The actual buffers used for the reference frames.*/
th_ycbcr_buffer ref_frame_bufs[6];
/*The index of the buffers being used for each OC_FRAME_* reference frame.*/
int ref_frame_idx[6];
/*The storage for the reference frame buffers.
This is just ref_frame_bufs[ref_frame_idx[i]][0].data, but is cached here
for faster look-up.*/
unsigned char *ref_frame_data[6];
/*The handle used to allocate the reference frame buffers.*/
unsigned char *ref_frame_handle;
/*The strides for each plane in the reference frames.*/
int ref_ystride[3];
/*The number of unique border patterns.*/
int nborders;
/*The unique border patterns for all border fragments.
The borderi field of fragments which straddle the border indexes this
list.*/
oc_border_info borders[16];
/*The frame number of the last keyframe.*/
ogg_int64_t keyframe_num;
/*The frame number of the current frame.*/
ogg_int64_t curframe_num;
/*The granpos of the current frame.*/
ogg_int64_t granpos;
/*The type of the current frame.*/
signed char frame_type;
/*The bias to add to the frame count when computing granule positions.*/
unsigned char granpos_bias;
/*The number of quality indices used in the current frame.*/
unsigned char nqis;
/*The quality indices of the current frame.*/
unsigned char qis[3];
/*The dequantization tables, stored in zig-zag order, and indexed by
qi, pli, qti, and zzi.*/
ogg_uint16_t *dequant_tables[64][3][2];
OC_ALIGN16(oc_quant_table dequant_table_data[64][3][2]);
/*Loop filter strength parameters.*/
unsigned char loop_filter_limits[64];
};
/*The function type used to fill in the chroma plane motion vectors for a
macro block when 4 different motion vectors are specified in the luma
plane.
_cbmvs: The chroma block-level motion vectors to fill in.
_lmbmv: The luma macro-block level motion vector to fill in for use in
prediction.
_lbmvs: The luma block-level motion vectors.*/
typedef void (*oc_set_chroma_mvs_func)(oc_mv _cbmvs[4],const oc_mv _lbmvs[4]);
/*A table of functions used to fill in the Cb,Cr plane motion vectors for a
macro block when 4 different motion vectors are specified in the luma
plane.*/
extern const oc_set_chroma_mvs_func OC_SET_CHROMA_MVS_TABLE[TH_PF_NFORMATS];
int oc_state_init(oc_theora_state *_state,const th_info *_info,int _nrefs);
void oc_state_clear(oc_theora_state *_state);
void oc_state_accel_init_c(oc_theora_state *_state);
void oc_state_borders_fill_rows(oc_theora_state *_state,int _refi,int _pli,
int _y0,int _yend);
void oc_state_borders_fill_caps(oc_theora_state *_state,int _refi,int _pli);
void oc_state_borders_fill(oc_theora_state *_state,int _refi);
void oc_state_fill_buffer_ptrs(oc_theora_state *_state,int _buf_idx,
th_ycbcr_buffer _img);
int oc_state_mbi_for_pos(oc_theora_state *_state,int _mbx,int _mby);
int oc_state_get_mv_offsets(const oc_theora_state *_state,int _offsets[2],
int _pli,oc_mv _mv);
void oc_loop_filter_init_c(signed char _bv[256],int _flimit);
void oc_state_loop_filter(oc_theora_state *_state,int _frame);
# if defined(OC_DUMP_IMAGES)
int oc_state_dump_frame(const oc_theora_state *_state,int _frame,
const char *_suf);
# endif
/*Default pure-C implementations of shared accelerated functions.*/
void oc_frag_copy_c(unsigned char *_dst,
const unsigned char *_src,int _src_ystride);
void oc_frag_copy_list_c(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
void oc_frag_recon_intra_c(unsigned char *_dst,int _dst_ystride,
const ogg_int16_t _residue[64]);
void oc_frag_recon_inter_c(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]);
void oc_frag_recon_inter2_c(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t _residue[64]);
void oc_idct8x8_c(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_state_frag_recon_c(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void oc_state_loop_filter_frag_rows_c(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end);
void oc_restore_fpu_c(void);
/*We need a way to call a few encoder functions without introducing a link-time
dependency into the decoder, while still allowing the old alpha API which
does not distinguish between encoder and decoder objects to be used.
We do this by placing a function table at the start of the encoder object
which can dispatch into the encoder library.
We do a similar thing for the decoder in case we ever decide to split off a
common base library.*/
typedef void (*oc_state_clear_func)(theora_state *_th);
typedef int (*oc_state_control_func)(theora_state *th,int _req,
void *_buf,size_t _buf_sz);
typedef ogg_int64_t (*oc_state_granule_frame_func)(theora_state *_th,
ogg_int64_t _granulepos);
typedef double (*oc_state_granule_time_func)(theora_state *_th,
ogg_int64_t _granulepos);
struct oc_state_dispatch_vtable{
oc_state_clear_func clear;
oc_state_control_func control;
oc_state_granule_frame_func granule_frame;
oc_state_granule_time_func granule_time;
};
#endif
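/*Two of the packing tricks in this header can be tried out on their own: the
   16-bit motion vector (x in the low byte, y in the high byte) and the
   nibble table behind OC_FRAME_FOR_MODE.
  The following is a minimal sketch, not the removed code: every SKETCH_ and
   sketch_ name is hypothetical, sketch_mv stands in for oc_mv, and the y
   component is multiplied by 256 instead of shifted so that negative values
   avoid a left shift of a negative int.
  It assumes arithmetic right shifts and the usual wrap-around when narrowing
   to signed char.*/

```c
#include <assert.h>
#include <stdint.h>

/*Stands in for oc_mv (ogg_int16_t).*/
typedef int16_t sketch_mv;

/*Packs x into the low byte and y into the high byte.*/
#define SKETCH_MV(_x,_y) ((sketch_mv)(((_x)&0xFF)|((_y)*256)))
/*Recovers x by narrowing to a signed byte.*/
#define SKETCH_MV_X(_mv) ((signed char)(_mv))
/*Recovers y with a sign-extending right shift.*/
#define SKETCH_MV_Y(_mv) ((_mv)>>8)

/*Returns the reference frame index for a macro block mode in 0..7, using the
   same order as OC_FRAME_FOR_MODE: GOLD is 0, PREV is 1, SELF is 2.
  Each mode owns one 4-bit slot of a 32-bit constant.*/
int sketch_frame_for_mode(int mode){
  static const uint32_t table=
   1u|(2u<<4)|(1u<<8)|(1u<<12)|(1u<<16)|(0u<<20)|(0u<<24)|(1u<<28);
  return (int)((table>>(mode*4))&0xF);
}
```

/*Packing both components into one integer lets a motion vector be copied and
   compared as a single value, and the nibble table replaces an eight-entry
   array with one constant.*/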

View File

@ -1,368 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*MMX acceleration of fragment reconstruction for motion compensation.
Originally written by Rudolf Marek.
Additional optimization by Nils Pipenbrinck.
Note: Loops are unrolled for best performance.
The iteration each instruction belongs to is marked in the comments as #i.*/
#include <stddef.h>
#include "x86int.h"
#if defined(OC_X86_ASM)
/*Copies an 8x8 block of pixels from _src to _dst, assuming _ystride bytes
between rows.*/
# define OC_FRAG_COPY_MMX(_dst,_src,_ystride) \
do{ \
const unsigned char *src; \
unsigned char *dst; \
ptrdiff_t ystride3; \
src=(_src); \
dst=(_dst); \
__asm__ __volatile__( \
/*src+0*ystride*/ \
"movq (%[src]),%%mm0\n\t" \
/*src+1*ystride*/ \
"movq (%[src],%[ystride]),%%mm1\n\t" \
/*ystride3=ystride*3*/ \
"lea (%[ystride],%[ystride],2),%[ystride3]\n\t" \
/*src+2*ystride*/ \
"movq (%[src],%[ystride],2),%%mm2\n\t" \
/*src+3*ystride*/ \
"movq (%[src],%[ystride3]),%%mm3\n\t" \
/*dst+0*ystride*/ \
"movq %%mm0,(%[dst])\n\t" \
/*dst+1*ystride*/ \
"movq %%mm1,(%[dst],%[ystride])\n\t" \
/*Pointer to next 4.*/ \
"lea (%[src],%[ystride],4),%[src]\n\t" \
/*dst+2*ystride*/ \
"movq %%mm2,(%[dst],%[ystride],2)\n\t" \
/*dst+3*ystride*/ \
"movq %%mm3,(%[dst],%[ystride3])\n\t" \
/*Pointer to next 4.*/ \
"lea (%[dst],%[ystride],4),%[dst]\n\t" \
/*src+0*ystride*/ \
"movq (%[src]),%%mm0\n\t" \
/*src+1*ystride*/ \
"movq (%[src],%[ystride]),%%mm1\n\t" \
/*src+2*ystride*/ \
"movq (%[src],%[ystride],2),%%mm2\n\t" \
/*src+3*ystride*/ \
"movq (%[src],%[ystride3]),%%mm3\n\t" \
/*dst+0*ystride*/ \
"movq %%mm0,(%[dst])\n\t" \
/*dst+1*ystride*/ \
"movq %%mm1,(%[dst],%[ystride])\n\t" \
/*dst+2*ystride*/ \
"movq %%mm2,(%[dst],%[ystride],2)\n\t" \
/*dst+3*ystride*/ \
"movq %%mm3,(%[dst],%[ystride3])\n\t" \
:[dst]"+r"(dst),[src]"+r"(src),[ystride3]"=&r"(ystride3) \
:[ystride]"r"((ptrdiff_t)(_ystride)) \
:"memory" \
); \
} \
while(0)
/*Copies an 8x8 block of pixels from _src to _dst, assuming _ystride bytes
between rows.*/
void oc_frag_copy_mmx(unsigned char *_dst,
const unsigned char *_src,int _ystride){
OC_FRAG_COPY_MMX(_dst,_src,_ystride);
}
/*Copies the fragments specified by the lists of fragment indices from one
frame to another.
_dst_frame: The reference frame to copy to.
_src_frame: The reference frame to copy from.
_ystride: The row stride of the reference frames.
_fragis: A pointer to a list of fragment indices.
_nfragis: The number of fragment indices to copy.
_frag_buf_offs: The offsets of fragments in the reference frames.*/
void oc_frag_copy_list_mmx(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs){
ptrdiff_t fragii;
for(fragii=0;fragii<_nfragis;fragii++){
ptrdiff_t frag_buf_off;
frag_buf_off=_frag_buf_offs[_fragis[fragii]];
OC_FRAG_COPY_MMX(_dst_frame+frag_buf_off,
_src_frame+frag_buf_off,_ystride);
}
}
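/*For readers who do not follow the inline assembly, the copy routines above
   can be expressed in portable C.
  The following is a minimal sketch, assuming the same 8x8 fragment size and
   the same meaning of the arguments; it is not the oc_frag_copy_c or
   oc_frag_copy_list_c source, and the sketch_ names are hypothetical.*/

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*Copies one 8x8 block of pixels, one 8-byte row at a time.*/
void sketch_frag_copy(unsigned char *dst,const unsigned char *src,
 int ystride){
  int i;
  for(i=0;i<8;i++){
    memcpy(dst,src,8);
    dst+=ystride;
    src+=ystride;
  }
}

/*Copies every fragment named in fragis from src_frame to dst_frame.
  frag_buf_offs maps a fragment index to the byte offset of its upper-left
   pixel, and both frames share the same layout.*/
void sketch_frag_copy_list(unsigned char *dst_frame,
 const unsigned char *src_frame,int ystride,
 const ptrdiff_t *fragis,ptrdiff_t nfragis,const ptrdiff_t *frag_buf_offs){
  ptrdiff_t fragii;
  for(fragii=0;fragii<nfragis;fragii++){
    ptrdiff_t frag_buf_off;
    frag_buf_off=frag_buf_offs[fragis[fragii]];
    sketch_frag_copy(dst_frame+frag_buf_off,src_frame+frag_buf_off,ystride);
  }
}
```

/*The MMX version does the same work with eight 64-bit loads and stores per
   fragment, split into two groups of four rows.*/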
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue){
__asm__ __volatile__(
/*Set mm0 to 0xFFFFFFFFFFFFFFFF.*/
"pcmpeqw %%mm0,%%mm0\n\t"
/*#0 Load low residue.*/
"movq 0*8(%[residue]),%%mm1\n\t"
/*#0 Load high residue.*/
"movq 1*8(%[residue]),%%mm2\n\t"
/*Set mm0 to 0x8000800080008000.*/
"psllw $15,%%mm0\n\t"
/*#1 Load low residue.*/
"movq 2*8(%[residue]),%%mm3\n\t"
/*#1 Load high residue.*/
"movq 3*8(%[residue]),%%mm4\n\t"
/*Set mm0 to 0x0080008000800080.*/
"psrlw $8,%%mm0\n\t"
/*#2 Load low residue.*/
"movq 4*8(%[residue]),%%mm5\n\t"
/*#2 Load high residue.*/
"movq 5*8(%[residue]),%%mm6\n\t"
/*#0 Bias low residue.*/
"paddsw %%mm0,%%mm1\n\t"
/*#0 Bias high residue.*/
"paddsw %%mm0,%%mm2\n\t"
/*#0 Pack to byte.*/
"packuswb %%mm2,%%mm1\n\t"
/*#1 Bias low residue.*/
"paddsw %%mm0,%%mm3\n\t"
/*#1 Bias high residue.*/
"paddsw %%mm0,%%mm4\n\t"
/*#1 Pack to byte.*/
"packuswb %%mm4,%%mm3\n\t"
/*#2 Bias low residue.*/
"paddsw %%mm0,%%mm5\n\t"
/*#2 Bias high residue.*/
"paddsw %%mm0,%%mm6\n\t"
/*#2 Pack to byte.*/
"packuswb %%mm6,%%mm5\n\t"
/*#0 Write row.*/
"movq %%mm1,(%[dst])\n\t"
/*#1 Write row.*/
"movq %%mm3,(%[dst],%[ystride])\n\t"
/*#2 Write row.*/
"movq %%mm5,(%[dst],%[ystride],2)\n\t"
/*#3 Load low residue.*/
"movq 6*8(%[residue]),%%mm1\n\t"
/*#3 Load high residue.*/
"movq 7*8(%[residue]),%%mm2\n\t"
/*#4 Load low residue.*/
"movq 8*8(%[residue]),%%mm3\n\t"
/*#4 Load high residue.*/
"movq 9*8(%[residue]),%%mm4\n\t"
/*#5 Load low residue.*/
"movq 10*8(%[residue]),%%mm5\n\t"
/*#5 Load high residue.*/
"movq 11*8(%[residue]),%%mm6\n\t"
/*#3 Bias low residue.*/
"paddsw %%mm0,%%mm1\n\t"
/*#3 Bias high residue.*/
"paddsw %%mm0,%%mm2\n\t"
/*#3 Pack to byte.*/
"packuswb %%mm2,%%mm1\n\t"
/*#4 Bias low residue.*/
"paddsw %%mm0,%%mm3\n\t"
/*#4 Bias high residue.*/
"paddsw %%mm0,%%mm4\n\t"
/*#4 Pack to byte.*/
"packuswb %%mm4,%%mm3\n\t"
/*#5 Bias low residue.*/
"paddsw %%mm0,%%mm5\n\t"
/*#5 Bias high residue.*/
"paddsw %%mm0,%%mm6\n\t"
/*#5 Pack to byte.*/
"packuswb %%mm6,%%mm5\n\t"
/*#3 Write row.*/
"movq %%mm1,(%[dst],%[ystride3])\n\t"
/*#4 Write row.*/
"movq %%mm3,(%[dst4])\n\t"
/*#5 Write row.*/
"movq %%mm5,(%[dst4],%[ystride])\n\t"
/*#6 Load low residue.*/
"movq 12*8(%[residue]),%%mm1\n\t"
/*#6 Load high residue.*/
"movq 13*8(%[residue]),%%mm2\n\t"
/*#7 Load low residue.*/
"movq 14*8(%[residue]),%%mm3\n\t"
/*#7 Load high residue.*/
"movq 15*8(%[residue]),%%mm4\n\t"
/*#6 Bias low residue.*/
"paddsw %%mm0,%%mm1\n\t"
/*#6 Bias high residue.*/
"paddsw %%mm0,%%mm2\n\t"
/*#6 Pack to byte.*/
"packuswb %%mm2,%%mm1\n\t"
/*#7 Bias low residue.*/
"paddsw %%mm0,%%mm3\n\t"
/*#7 Bias high residue.*/
"paddsw %%mm0,%%mm4\n\t"
/*#7 Pack to byte.*/
"packuswb %%mm4,%%mm3\n\t"
/*#6 Write row.*/
"movq %%mm1,(%[dst4],%[ystride],2)\n\t"
/*#7 Write row.*/
"movq %%mm3,(%[dst4],%[ystride3])\n\t"
:
:[residue]"r"(_residue),
[dst]"r"(_dst),
[dst4]"r"(_dst+(_ystride<<2)),
[ystride]"r"((ptrdiff_t)_ystride),
[ystride3]"r"((ptrdiff_t)_ystride*3)
:"memory"
);
}
void oc_frag_recon_inter_mmx(unsigned char *_dst,const unsigned char *_src,
int _ystride,const ogg_int16_t *_residue){
int i;
/*Zero mm0.*/
__asm__ __volatile__("pxor %%mm0,%%mm0\n\t"::);
for(i=4;i-->0;){
__asm__ __volatile__(
/*#0 Load source.*/
"movq (%[src]),%%mm3\n\t"
/*#1 Load source.*/
"movq (%[src],%[ystride]),%%mm7\n\t"
/*#0 Get copy of src.*/
"movq %%mm3,%%mm4\n\t"
/*#0 Expand high source.*/
"punpckhbw %%mm0,%%mm4\n\t"
/*#0 Expand low source.*/
"punpcklbw %%mm0,%%mm3\n\t"
/*#0 Add residue high.*/
"paddsw 8(%[residue]),%%mm4\n\t"
/*#1 Get copy of src.*/
"movq %%mm7,%%mm2\n\t"
/*#0 Add residue low.*/
"paddsw (%[residue]), %%mm3\n\t"
/*#1 Expand high source.*/
"punpckhbw %%mm0,%%mm2\n\t"
/*#0 Pack final row pixels.*/
"packuswb %%mm4,%%mm3\n\t"
/*#1 Expand low source.*/
"punpcklbw %%mm0,%%mm7\n\t"
/*#1 Add residue low.*/
"paddsw 16(%[residue]),%%mm7\n\t"
/*#1 Add residue high.*/
"paddsw 24(%[residue]),%%mm2\n\t"
/*Advance residue.*/
"lea 32(%[residue]),%[residue]\n\t"
/*#1 Pack final row pixels.*/
"packuswb %%mm2,%%mm7\n\t"
/*Advance src.*/
"lea (%[src],%[ystride],2),%[src]\n\t"
/*#0 Write row.*/
"movq %%mm3,(%[dst])\n\t"
/*#1 Write row.*/
"movq %%mm7,(%[dst],%[ystride])\n\t"
/*Advance dst.*/
"lea (%[dst],%[ystride],2),%[dst]\n\t"
:[residue]"+r"(_residue),[dst]"+r"(_dst),[src]"+r"(_src)
:[ystride]"r"((ptrdiff_t)_ystride)
:"memory"
);
}
}
void oc_frag_recon_inter2_mmx(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue){
int i;
/*Zero mm7.*/
__asm__ __volatile__("pxor %%mm7,%%mm7\n\t"::);
for(i=4;i-->0;){
__asm__ __volatile__(
/*#0 Load src1.*/
"movq (%[src1]),%%mm0\n\t"
/*#0 Load src2.*/
"movq (%[src2]),%%mm2\n\t"
/*#0 Copy src1.*/
"movq %%mm0,%%mm1\n\t"
/*#0 Copy src2.*/
"movq %%mm2,%%mm3\n\t"
/*#1 Load src1.*/
"movq (%[src1],%[ystride]),%%mm4\n\t"
/*#0 Unpack lower src1.*/
"punpcklbw %%mm7,%%mm0\n\t"
/*#1 Load src2.*/
"movq (%[src2],%[ystride]),%%mm5\n\t"
/*#0 Unpack higher src1.*/
"punpckhbw %%mm7,%%mm1\n\t"
/*#0 Unpack lower src2.*/
"punpcklbw %%mm7,%%mm2\n\t"
/*#0 Unpack higher src2.*/
"punpckhbw %%mm7,%%mm3\n\t"
/*Advance src1 ptr.*/
"lea (%[src1],%[ystride],2),%[src1]\n\t"
/*Advance src2 ptr.*/
"lea (%[src2],%[ystride],2),%[src2]\n\t"
/*#0 Lower src1+src2.*/
"paddsw %%mm2,%%mm0\n\t"
/*#0 Higher src1+src2.*/
"paddsw %%mm3,%%mm1\n\t"
/*#1 Copy src1.*/
"movq %%mm4,%%mm2\n\t"
/*#0 Build lo average.*/
"psraw $1,%%mm0\n\t"
/*#1 Copy src2.*/
"movq %%mm5,%%mm3\n\t"
/*#1 Unpack lower src1.*/
"punpcklbw %%mm7,%%mm4\n\t"
/*#0 Build hi average.*/
"psraw $1,%%mm1\n\t"
/*#1 Unpack higher src1.*/
"punpckhbw %%mm7,%%mm2\n\t"
/*#0 low+=residue.*/
"paddsw (%[residue]),%%mm0\n\t"
/*#1 Unpack lower src2.*/
"punpcklbw %%mm7,%%mm5\n\t"
/*#0 high+=residue.*/
"paddsw 8(%[residue]),%%mm1\n\t"
/*#1 Unpack higher src2.*/
"punpckhbw %%mm7,%%mm3\n\t"
/*#1 Lower src1+src2.*/
"paddsw %%mm4,%%mm5\n\t"
/*#0 Pack and saturate.*/
"packuswb %%mm1,%%mm0\n\t"
/*#1 Higher src1+src2.*/
"paddsw %%mm2,%%mm3\n\t"
/*#0 Write row.*/
"movq %%mm0,(%[dst])\n\t"
/*#1 Build lo average.*/
"psraw $1,%%mm5\n\t"
/*#1 Build hi average.*/
"psraw $1,%%mm3\n\t"
/*#1 low+=residue.*/
"paddsw 16(%[residue]),%%mm5\n\t"
/*#1 high+=residue.*/
"paddsw 24(%[residue]),%%mm3\n\t"
/*#1 Pack and saturate.*/
"packuswb %%mm3,%%mm5\n\t"
/*#1 Write row.*/
"movq %%mm5,(%[dst],%[ystride])\n\t"
/*Advance residue ptr.*/
"add $32,%[residue]\n\t"
/*Advance dest ptr.*/
"lea (%[dst],%[ystride],2),%[dst]\n\t"
:[dst]"+r"(_dst),[residue]"+r"(_residue),
[src1]"+r"(_src1),[src2]"+r"(_src2)
:[ystride]"r"((ptrdiff_t)_ystride)
:"memory"
);
}
}
void oc_restore_fpu_mmx(void){
__asm__ __volatile__("emms\n\t");
}
#endif
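
The three reconstruction routines above can be summarized by a scalar sketch. This is not part of libtheora; the `*_ref` names and `clamp255` are hypothetical helpers introduced here for illustration. The sketch assumes the residue is stored as 64 row-major 16-bit values, as the 16-byte-per-row loads in the assembly suggest, and that `packuswb` amounts to clamping to 0..255.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp to the 8-bit pixel range, which is what the
   packuswb instruction does to each 16-bit lane in the MMX code. */
unsigned char clamp255(int v){
  return (unsigned char)(v<0?0:v>255?255:v);
}

/* Intra: each pixel is the residue biased by 128, then clamped. */
void frag_recon_intra_ref(unsigned char *dst,int ystride,
 const int16_t *residue){
  int i,j;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++)dst[j]=clamp255(residue[i*8+j]+128);
    dst+=ystride;
  }
}

/* Inter: each pixel is the predictor plus the residue, then clamped. */
void frag_recon_inter_ref(unsigned char *dst,const unsigned char *src,
 int ystride,const int16_t *residue){
  int i,j;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++)dst[j]=clamp255(src[j]+residue[i*8+j]);
    dst+=ystride;
    src+=ystride;
  }
}

/* Inter with two predictors: average them (rounding down, like psraw $1 on
   the 16-bit sum), add the residue, then clamp. */
void frag_recon_inter2_ref(unsigned char *dst,const unsigned char *src1,
 const unsigned char *src2,int ystride,const int16_t *residue){
  int i,j;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++){
      dst[j]=clamp255(((src1[j]+src2[j])>>1)+residue[i*8+j]);
    }
    dst+=ystride;
    src1+=ystride;
    src2+=ystride;
  }
}
```

A sketch like this is mainly useful as a test oracle: run it and the MMX routine on the same inputs and compare the 8x8 outputs.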

@ -1,558 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*MMX acceleration of Theora's iDCT.
Originally written by Rudolf Marek, based on code from On2's VP3.*/
#include "x86int.h"
#include "../dct.h"
#if defined(OC_X86_ASM)
/*These are offsets into the table of constants below.*/
/*7 rows of cosines, in order: pi/16 * (1 ... 7).*/
#define OC_COSINE_OFFSET (0)
/*A row of 8's.*/
#define OC_EIGHT_OFFSET (56)
/*38 cycles*/
#define OC_IDCT_BEGIN(_y,_x) \
"#OC_IDCT_BEGIN\n\t" \
"movq "OC_I(3,_x)",%%mm2\n\t" \
"movq "OC_MEM_OFFS(0x30,c)",%%mm6\n\t" \
"movq %%mm2,%%mm4\n\t" \
"movq "OC_J(5,_x)",%%mm7\n\t" \
"pmulhw %%mm6,%%mm4\n\t" \
"movq "OC_MEM_OFFS(0x50,c)",%%mm1\n\t" \
"pmulhw %%mm7,%%mm6\n\t" \
"movq %%mm1,%%mm5\n\t" \
"pmulhw %%mm2,%%mm1\n\t" \
"movq "OC_I(1,_x)",%%mm3\n\t" \
"pmulhw %%mm7,%%mm5\n\t" \
"movq "OC_MEM_OFFS(0x10,c)",%%mm0\n\t" \
"paddw %%mm2,%%mm4\n\t" \
"paddw %%mm7,%%mm6\n\t" \
"paddw %%mm1,%%mm2\n\t" \
"movq "OC_J(7,_x)",%%mm1\n\t" \
"paddw %%mm5,%%mm7\n\t" \
"movq %%mm0,%%mm5\n\t" \
"pmulhw %%mm3,%%mm0\n\t" \
"paddw %%mm7,%%mm4\n\t" \
"pmulhw %%mm1,%%mm5\n\t" \
"movq "OC_MEM_OFFS(0x70,c)",%%mm7\n\t" \
"psubw %%mm2,%%mm6\n\t" \
"paddw %%mm3,%%mm0\n\t" \
"pmulhw %%mm7,%%mm3\n\t" \
"movq "OC_I(2,_x)",%%mm2\n\t" \
"pmulhw %%mm1,%%mm7\n\t" \
"paddw %%mm1,%%mm5\n\t" \
"movq %%mm2,%%mm1\n\t" \
"pmulhw "OC_MEM_OFFS(0x20,c)",%%mm2\n\t" \
"psubw %%mm5,%%mm3\n\t" \
"movq "OC_J(6,_x)",%%mm5\n\t" \
"paddw %%mm7,%%mm0\n\t" \
"movq %%mm5,%%mm7\n\t" \
"psubw %%mm4,%%mm0\n\t" \
"pmulhw "OC_MEM_OFFS(0x20,c)",%%mm5\n\t" \
"paddw %%mm1,%%mm2\n\t" \
"pmulhw "OC_MEM_OFFS(0x60,c)",%%mm1\n\t" \
"paddw %%mm4,%%mm4\n\t" \
"paddw %%mm0,%%mm4\n\t" \
"psubw %%mm6,%%mm3\n\t" \
"paddw %%mm7,%%mm5\n\t" \
"paddw %%mm6,%%mm6\n\t" \
"pmulhw "OC_MEM_OFFS(0x60,c)",%%mm7\n\t" \
"paddw %%mm3,%%mm6\n\t" \
"movq %%mm4,"OC_I(1,_y)"\n\t" \
"psubw %%mm5,%%mm1\n\t" \
"movq "OC_MEM_OFFS(0x40,c)",%%mm4\n\t" \
"movq %%mm3,%%mm5\n\t" \
"pmulhw %%mm4,%%mm3\n\t" \
"paddw %%mm2,%%mm7\n\t" \
"movq %%mm6,"OC_I(2,_y)"\n\t" \
"movq %%mm0,%%mm2\n\t" \
"movq "OC_I(0,_x)",%%mm6\n\t" \
"pmulhw %%mm4,%%mm0\n\t" \
"paddw %%mm3,%%mm5\n\t" \
"movq "OC_J(4,_x)",%%mm3\n\t" \
"psubw %%mm1,%%mm5\n\t" \
"paddw %%mm0,%%mm2\n\t" \
"psubw %%mm3,%%mm6\n\t" \
"movq %%mm6,%%mm0\n\t" \
"pmulhw %%mm4,%%mm6\n\t" \
"paddw %%mm3,%%mm3\n\t" \
"paddw %%mm1,%%mm1\n\t" \
"paddw %%mm0,%%mm3\n\t" \
"paddw %%mm5,%%mm1\n\t" \
"pmulhw %%mm3,%%mm4\n\t" \
"paddw %%mm0,%%mm6\n\t" \
"psubw %%mm2,%%mm6\n\t" \
"paddw %%mm2,%%mm2\n\t" \
"movq "OC_I(1,_y)",%%mm0\n\t" \
"paddw %%mm6,%%mm2\n\t" \
"paddw %%mm3,%%mm4\n\t" \
"psubw %%mm1,%%mm2\n\t" \
"#end OC_IDCT_BEGIN\n\t" \
/*38+8=46 cycles.*/
#define OC_ROW_IDCT(_y,_x) \
"#OC_ROW_IDCT\n" \
OC_IDCT_BEGIN(_y,_x) \
/*r3=D'*/ \
"movq "OC_I(2,_y)",%%mm3\n\t" \
/*r4=E'=E-G*/ \
"psubw %%mm7,%%mm4\n\t" \
/*r1=H'+H'*/ \
"paddw %%mm1,%%mm1\n\t" \
/*r7=G+G*/ \
"paddw %%mm7,%%mm7\n\t" \
/*r1=R1=A''+H'*/ \
"paddw %%mm2,%%mm1\n\t" \
/*r7=G'=E+G*/ \
"paddw %%mm4,%%mm7\n\t" \
/*r4=R4=E'-D'*/ \
"psubw %%mm3,%%mm4\n\t" \
"paddw %%mm3,%%mm3\n\t" \
/*r6=R6=F'-B''*/ \
"psubw %%mm5,%%mm6\n\t" \
"paddw %%mm5,%%mm5\n\t" \
/*r3=R3=E'+D'*/ \
"paddw %%mm4,%%mm3\n\t" \
/*r5=R5=F'+B''*/ \
"paddw %%mm6,%%mm5\n\t" \
/*r7=R7=G'-C'*/ \
"psubw %%mm0,%%mm7\n\t" \
"paddw %%mm0,%%mm0\n\t" \
/*Save R1.*/ \
"movq %%mm1,"OC_I(1,_y)"\n\t" \
/*r0=R0=G'+C'*/ \
"paddw %%mm7,%%mm0\n\t" \
"#end OC_ROW_IDCT\n\t" \
/*The following macro does two 4x4 transposes in place.
At entry, we assume:
r0 = a3 a2 a1 a0
I(1) = b3 b2 b1 b0
r2 = c3 c2 c1 c0
r3 = d3 d2 d1 d0
r4 = e3 e2 e1 e0
r5 = f3 f2 f1 f0
r6 = g3 g2 g1 g0
r7 = h3 h2 h1 h0
At exit, we have:
I(0) = d0 c0 b0 a0
I(1) = d1 c1 b1 a1
I(2) = d2 c2 b2 a2
I(3) = d3 c3 b3 a3
J(4) = h0 g0 f0 e0
J(5) = h1 g1 f1 e1
J(6) = h2 g2 f2 e2
J(7) = h3 g3 f3 e3
I(0) I(1) I(2) I(3) is the transpose of r0 I(1) r2 r3.
J(4) J(5) J(6) J(7) is the transpose of r4 r5 r6 r7.
Since r1 is free at entry, we calculate the Js first.*/
/*19 cycles.*/
#define OC_TRANSPOSE(_y) \
"#OC_TRANSPOSE\n\t" \
"movq %%mm4,%%mm1\n\t" \
"punpcklwd %%mm5,%%mm4\n\t" \
"movq %%mm0,"OC_I(0,_y)"\n\t" \
"punpckhwd %%mm5,%%mm1\n\t" \
"movq %%mm6,%%mm0\n\t" \
"punpcklwd %%mm7,%%mm6\n\t" \
"movq %%mm4,%%mm5\n\t" \
"punpckldq %%mm6,%%mm4\n\t" \
"punpckhdq %%mm6,%%mm5\n\t" \
"movq %%mm1,%%mm6\n\t" \
"movq %%mm4,"OC_J(4,_y)"\n\t" \
"punpckhwd %%mm7,%%mm0\n\t" \
"movq %%mm5,"OC_J(5,_y)"\n\t" \
"punpckhdq %%mm0,%%mm6\n\t" \
"movq "OC_I(0,_y)",%%mm4\n\t" \
"punpckldq %%mm0,%%mm1\n\t" \
"movq "OC_I(1,_y)",%%mm5\n\t" \
"movq %%mm4,%%mm0\n\t" \
"movq %%mm6,"OC_J(7,_y)"\n\t" \
"punpcklwd %%mm5,%%mm0\n\t" \
"movq %%mm1,"OC_J(6,_y)"\n\t" \
"punpckhwd %%mm5,%%mm4\n\t" \
"movq %%mm2,%%mm5\n\t" \
"punpcklwd %%mm3,%%mm2\n\t" \
"movq %%mm0,%%mm1\n\t" \
"punpckldq %%mm2,%%mm0\n\t" \
"punpckhdq %%mm2,%%mm1\n\t" \
"movq %%mm4,%%mm2\n\t" \
"movq %%mm0,"OC_I(0,_y)"\n\t" \
"punpckhwd %%mm3,%%mm5\n\t" \
"movq %%mm1,"OC_I(1,_y)"\n\t" \
"punpckhdq %%mm5,%%mm4\n\t" \
"punpckldq %%mm5,%%mm2\n\t" \
"movq %%mm4,"OC_I(3,_y)"\n\t" \
"movq %%mm2,"OC_I(2,_y)"\n\t" \
"#end OC_TRANSPOSE\n\t" \
/*38+19=57 cycles.*/
#define OC_COLUMN_IDCT(_y) \
"#OC_COLUMN_IDCT\n" \
OC_IDCT_BEGIN(_y,_y) \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm2\n\t" \
/*r1=H'+H'*/ \
"paddw %%mm1,%%mm1\n\t" \
/*r1=R1=A''+H'*/ \
"paddw %%mm2,%%mm1\n\t" \
/*r2=NR2*/ \
"psraw $4,%%mm2\n\t" \
/*r4=E'=E-G*/ \
"psubw %%mm7,%%mm4\n\t" \
/*r1=NR1*/ \
"psraw $4,%%mm1\n\t" \
/*r3=D'*/ \
"movq "OC_I(2,_y)",%%mm3\n\t" \
/*r7=G+G*/ \
"paddw %%mm7,%%mm7\n\t" \
/*Store NR2 at I(2).*/ \
"movq %%mm2,"OC_I(2,_y)"\n\t" \
/*r7=G'=E+G*/ \
"paddw %%mm4,%%mm7\n\t" \
/*Store NR1 at I(1).*/ \
"movq %%mm1,"OC_I(1,_y)"\n\t" \
/*r4=R4=E'-D'*/ \
"psubw %%mm3,%%mm4\n\t" \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm4\n\t" \
/*r3=D'+D'*/ \
"paddw %%mm3,%%mm3\n\t" \
/*r3=R3=E'+D'*/ \
"paddw %%mm4,%%mm3\n\t" \
/*r4=NR4*/ \
"psraw $4,%%mm4\n\t" \
/*r6=R6=F'-B''*/ \
"psubw %%mm5,%%mm6\n\t" \
/*r3=NR3*/ \
"psraw $4,%%mm3\n\t" \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm6\n\t" \
/*r5=B''+B''*/ \
"paddw %%mm5,%%mm5\n\t" \
/*r5=R5=F'+B''*/ \
"paddw %%mm6,%%mm5\n\t" \
/*r6=NR6*/ \
"psraw $4,%%mm6\n\t" \
/*Store NR4 at J(4).*/ \
"movq %%mm4,"OC_J(4,_y)"\n\t" \
/*r5=NR5*/ \
"psraw $4,%%mm5\n\t" \
/*Store NR3 at I(3).*/ \
"movq %%mm3,"OC_I(3,_y)"\n\t" \
/*r7=R7=G'-C'*/ \
"psubw %%mm0,%%mm7\n\t" \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm7\n\t" \
/*r0=C'+C'*/ \
"paddw %%mm0,%%mm0\n\t" \
/*r0=R0=G'+C'*/ \
"paddw %%mm7,%%mm0\n\t" \
/*r7=NR7*/ \
"psraw $4,%%mm7\n\t" \
/*Store NR6 at J(6).*/ \
"movq %%mm6,"OC_J(6,_y)"\n\t" \
/*r0=NR0*/ \
"psraw $4,%%mm0\n\t" \
/*Store NR5 at J(5).*/ \
"movq %%mm5,"OC_J(5,_y)"\n\t" \
/*Store NR7 at J(7).*/ \
"movq %%mm7,"OC_J(7,_y)"\n\t" \
/*Store NR0 at I(0).*/ \
"movq %%mm0,"OC_I(0,_y)"\n\t" \
"#end OC_COLUMN_IDCT\n\t" \
static void oc_idct8x8_slow_mmx(ogg_int16_t _y[64],ogg_int16_t _x[64]){
int i;
/*This routine accepts an 8x8 matrix, but in partially transposed form.
Every 4x4 block is transposed.*/
__asm__ __volatile__(
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16,_y)
#define OC_J(_k,_y) OC_MEM_OFFS(((_k)-4)*16+8,_y)
OC_ROW_IDCT(y,x)
OC_TRANSPOSE(y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16+64,_y)
#define OC_J(_k,_y) OC_MEM_OFFS(((_k)-4)*16+72,_y)
OC_ROW_IDCT(y,x)
OC_TRANSPOSE(y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16,_y)
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT(y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16+8,_y)
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT(y)
#undef OC_I
#undef OC_J
:[y]"=m"OC_ARRAY_OPERAND(ogg_int16_t,_y,64)
:[x]"m"OC_CONST_ARRAY_OPERAND(ogg_int16_t,_x,64),
[c]"m"OC_CONST_ARRAY_OPERAND(ogg_int16_t,OC_IDCT_CONSTS,128)
);
__asm__ __volatile__("pxor %%mm0,%%mm0\n\t"::);
for(i=0;i<4;i++){
__asm__ __volatile__(
"movq %%mm0,"OC_MEM_OFFS(0x00,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x08,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x10,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x18,x)"\n\t"
:[x]"=m"OC_ARRAY_OPERAND(ogg_int16_t,_x+16*i,16)
);
}
}
/*25 cycles.*/
#define OC_IDCT_BEGIN_10(_y,_x) \
"#OC_IDCT_BEGIN_10\n\t" \
"movq "OC_I(3,_x)",%%mm2\n\t" \
"nop\n\t" \
"movq "OC_MEM_OFFS(0x30,c)",%%mm6\n\t" \
"movq %%mm2,%%mm4\n\t" \
"movq "OC_MEM_OFFS(0x50,c)",%%mm1\n\t" \
"pmulhw %%mm6,%%mm4\n\t" \
"movq "OC_I(1,_x)",%%mm3\n\t" \
"pmulhw %%mm2,%%mm1\n\t" \
"movq "OC_MEM_OFFS(0x10,c)",%%mm0\n\t" \
"paddw %%mm2,%%mm4\n\t" \
"pxor %%mm6,%%mm6\n\t" \
"paddw %%mm1,%%mm2\n\t" \
"movq "OC_I(2,_x)",%%mm5\n\t" \
"pmulhw %%mm3,%%mm0\n\t" \
"movq %%mm5,%%mm1\n\t" \
"paddw %%mm3,%%mm0\n\t" \
"pmulhw "OC_MEM_OFFS(0x70,c)",%%mm3\n\t" \
"psubw %%mm2,%%mm6\n\t" \
"pmulhw "OC_MEM_OFFS(0x20,c)",%%mm5\n\t" \
"psubw %%mm4,%%mm0\n\t" \
"movq "OC_I(2,_x)",%%mm7\n\t" \
"paddw %%mm4,%%mm4\n\t" \
"paddw %%mm5,%%mm7\n\t" \
"paddw %%mm0,%%mm4\n\t" \
"pmulhw "OC_MEM_OFFS(0x60,c)",%%mm1\n\t" \
"psubw %%mm6,%%mm3\n\t" \
"movq %%mm4,"OC_I(1,_y)"\n\t" \
"paddw %%mm6,%%mm6\n\t" \
"movq "OC_MEM_OFFS(0x40,c)",%%mm4\n\t" \
"paddw %%mm3,%%mm6\n\t" \
"movq %%mm3,%%mm5\n\t" \
"pmulhw %%mm4,%%mm3\n\t" \
"movq %%mm6,"OC_I(2,_y)"\n\t" \
"movq %%mm0,%%mm2\n\t" \
"movq "OC_I(0,_x)",%%mm6\n\t" \
"pmulhw %%mm4,%%mm0\n\t" \
"paddw %%mm3,%%mm5\n\t" \
"paddw %%mm0,%%mm2\n\t" \
"psubw %%mm1,%%mm5\n\t" \
"pmulhw %%mm4,%%mm6\n\t" \
"paddw "OC_I(0,_x)",%%mm6\n\t" \
"paddw %%mm1,%%mm1\n\t" \
"movq %%mm6,%%mm4\n\t" \
"paddw %%mm5,%%mm1\n\t" \
"psubw %%mm2,%%mm6\n\t" \
"paddw %%mm2,%%mm2\n\t" \
"movq "OC_I(1,_y)",%%mm0\n\t" \
"paddw %%mm6,%%mm2\n\t" \
"psubw %%mm1,%%mm2\n\t" \
"nop\n\t" \
"#end OC_IDCT_BEGIN_10\n\t" \
/*25+8=33 cycles.*/
#define OC_ROW_IDCT_10(_y,_x) \
"#OC_ROW_IDCT_10\n\t" \
OC_IDCT_BEGIN_10(_y,_x) \
/*r3=D'*/ \
"movq "OC_I(2,_y)",%%mm3\n\t" \
/*r4=E'=E-G*/ \
"psubw %%mm7,%%mm4\n\t" \
/*r1=H'+H'*/ \
"paddw %%mm1,%%mm1\n\t" \
/*r7=G+G*/ \
"paddw %%mm7,%%mm7\n\t" \
/*r1=R1=A''+H'*/ \
"paddw %%mm2,%%mm1\n\t" \
/*r7=G'=E+G*/ \
"paddw %%mm4,%%mm7\n\t" \
/*r4=R4=E'-D'*/ \
"psubw %%mm3,%%mm4\n\t" \
"paddw %%mm3,%%mm3\n\t" \
/*r6=R6=F'-B''*/ \
"psubw %%mm5,%%mm6\n\t" \
"paddw %%mm5,%%mm5\n\t" \
/*r3=R3=E'+D'*/ \
"paddw %%mm4,%%mm3\n\t" \
/*r5=R5=F'+B''*/ \
"paddw %%mm6,%%mm5\n\t" \
/*r7=R7=G'-C'*/ \
"psubw %%mm0,%%mm7\n\t" \
"paddw %%mm0,%%mm0\n\t" \
/*Save R1.*/ \
"movq %%mm1,"OC_I(1,_y)"\n\t" \
/*r0=R0=G'+C'*/ \
"paddw %%mm7,%%mm0\n\t" \
"#end OC_ROW_IDCT_10\n\t" \
/*25+19=44 cycles.*/
#define OC_COLUMN_IDCT_10(_y) \
"#OC_COLUMN_IDCT_10\n\t" \
OC_IDCT_BEGIN_10(_y,_y) \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm2\n\t" \
/*r1=H'+H'*/ \
"paddw %%mm1,%%mm1\n\t" \
/*r1=R1=A''+H'*/ \
"paddw %%mm2,%%mm1\n\t" \
/*r2=NR2*/ \
"psraw $4,%%mm2\n\t" \
/*r4=E'=E-G*/ \
"psubw %%mm7,%%mm4\n\t" \
/*r1=NR1*/ \
"psraw $4,%%mm1\n\t" \
/*r3=D'*/ \
"movq "OC_I(2,_y)",%%mm3\n\t" \
/*r7=G+G*/ \
"paddw %%mm7,%%mm7\n\t" \
/*Store NR2 at I(2).*/ \
"movq %%mm2,"OC_I(2,_y)"\n\t" \
/*r7=G'=E+G*/ \
"paddw %%mm4,%%mm7\n\t" \
/*Store NR1 at I(1).*/ \
"movq %%mm1,"OC_I(1,_y)"\n\t" \
/*r4=R4=E'-D'*/ \
"psubw %%mm3,%%mm4\n\t" \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm4\n\t" \
/*r3=D'+D'*/ \
"paddw %%mm3,%%mm3\n\t" \
/*r3=R3=E'+D'*/ \
"paddw %%mm4,%%mm3\n\t" \
/*r4=NR4*/ \
"psraw $4,%%mm4\n\t" \
/*r6=R6=F'-B''*/ \
"psubw %%mm5,%%mm6\n\t" \
/*r3=NR3*/ \
"psraw $4,%%mm3\n\t" \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm6\n\t" \
/*r5=B''+B''*/ \
"paddw %%mm5,%%mm5\n\t" \
/*r5=R5=F'+B''*/ \
"paddw %%mm6,%%mm5\n\t" \
/*r6=NR6*/ \
"psraw $4,%%mm6\n\t" \
/*Store NR4 at J(4).*/ \
"movq %%mm4,"OC_J(4,_y)"\n\t" \
/*r5=NR5*/ \
"psraw $4,%%mm5\n\t" \
/*Store NR3 at I(3).*/ \
"movq %%mm3,"OC_I(3,_y)"\n\t" \
/*r7=R7=G'-C'*/ \
"psubw %%mm0,%%mm7\n\t" \
"paddw "OC_MEM_OFFS(0x00,c)",%%mm7\n\t" \
/*r0=C'+C'*/ \
"paddw %%mm0,%%mm0\n\t" \
/*r0=R0=G'+C'*/ \
"paddw %%mm7,%%mm0\n\t" \
/*r7=NR7*/ \
"psraw $4,%%mm7\n\t" \
/*Store NR6 at J(6).*/ \
"movq %%mm6,"OC_J(6,_y)"\n\t" \
/*r0=NR0*/ \
"psraw $4,%%mm0\n\t" \
/*Store NR5 at J(5).*/ \
"movq %%mm5,"OC_J(5,_y)"\n\t" \
/*Store NR7 at J(7).*/ \
"movq %%mm7,"OC_J(7,_y)"\n\t" \
/*Store NR0 at I(0).*/ \
"movq %%mm0,"OC_I(0,_y)"\n\t" \
"#end OC_COLUMN_IDCT_10\n\t" \
static void oc_idct8x8_10_mmx(ogg_int16_t _y[64],ogg_int16_t _x[64]){
__asm__ __volatile__(
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16,_y)
#define OC_J(_k,_y) OC_MEM_OFFS(((_k)-4)*16+8,_y)
/*Done with dequant, descramble, and partial transpose.
Now do the iDCT itself.*/
OC_ROW_IDCT_10(y,x)
OC_TRANSPOSE(y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16,_y)
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT_10(y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) OC_MEM_OFFS((_k)*16+8,_y)
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT_10(y)
#undef OC_I
#undef OC_J
:[y]"=m"OC_ARRAY_OPERAND(ogg_int16_t,_y,64)
:[x]"m"OC_CONST_ARRAY_OPERAND(ogg_int16_t,_x,64),
[c]"m"OC_CONST_ARRAY_OPERAND(ogg_int16_t,OC_IDCT_CONSTS,128)
);
__asm__ __volatile__(
"pxor %%mm0,%%mm0\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x00,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x10,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x20,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x30,x)"\n\t"
:[x]"+m"OC_ARRAY_OPERAND(ogg_int16_t,_x,28)
);
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the
orthonormal version of the transform.*/
void oc_idct8x8_mmx(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi){
/*_last_zzi is subtly different from an actual count of the number of
coefficients we decoded for this block.
It contains the value of zzi BEFORE the final token in the block was
decoded.
In most cases this is an EOB token (the continuation of an EOB run from a
previous block counts), and so this is the same as the coefficient count.
However, in the case that the last token was NOT an EOB token, but filled
the block up with exactly 64 coefficients, _last_zzi will be less than 64.
Provided the last token was not a pure zero run, the minimum value it can
be is 46, and so that doesn't affect any of the cases in this routine.
However, if the last token WAS a pure zero run of length 63, then _last_zzi
will be 1 while the number of coefficients decoded is 64.
Thus, we will trigger the following special case, where the real
coefficient count would not.
Note also that a zero run of length 64 will give _last_zzi a value of 0,
but we still process the DC coefficient, which might have a non-zero value
due to DC prediction.
Although convoluted, this is arguably the correct behavior: it allows us to
use a smaller transform when the block ends with a long zero run instead
of a normal EOB token.
It could be smarter... multiple separate zero runs at the end of a block
will fool it, but an encoder that generates these really deserves what it
gets.
Needless to say we inherited this approach from VP3.*/
/*Then perform the iDCT.*/
if(_last_zzi<=10)oc_idct8x8_10_mmx(_y,_x);
else oc_idct8x8_slow_mmx(_y,_x);
}
#endif
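
The lane bookkeeping in the OC_TRANSPOSE comment is easier to follow in scalar form. The sketch below is illustrative only and is not part of libtheora; `transpose4x4_ref` is a hypothetical name. It assumes each MMX register holds four 16-bit lanes with lane 0 in the low word, so that "r0 = a3 a2 a1 a0" corresponds to `in[0..3]` and "I(0) = d0 c0 b0 a0" corresponds to `out[0..3]`.

```c
#include <assert.h>
#include <stdint.h>

/* Transpose one 4x4 block of 16-bit values.
   in[4*r+k] is lane k of input row r (a, b, c, d for r = 0..3);
   out[4*k+r] receives it, so output row k collects lane k of every input
   row, matching "I(k) = dk ck bk ak" in the macro comment. */
void transpose4x4_ref(int16_t out[16],const int16_t in[16]){
  int r,k;
  for(r=0;r<4;r++){
    for(k=0;k<4;k++)out[4*k+r]=in[4*r+k];
  }
}
```

OC_TRANSPOSE performs this operation twice per invocation, once for rows a..d and once for rows e..h, using punpck* instructions instead of a loop.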

@ -1,318 +0,0 @@
#if !defined(_x86_mmxloop_H)
# define _x86_mmxloop_H (1)
# include <stddef.h>
# include "x86int.h"
#if defined(OC_X86_ASM)
/*On entry, mm0={a0,...,a7}, mm1={b0,...,b7}, mm2={c0,...,c7}, mm3={d0,...d7}.
On exit, mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)} and
mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}; mm0 and mm3 are clobbered.*/
#define OC_LOOP_FILTER8_MMX \
"#OC_LOOP_FILTER8_MMX\n\t" \
/*mm7=0*/ \
"pxor %%mm7,%%mm7\n\t" \
/*mm6:mm0={a0,...,a7}*/ \
"movq %%mm0,%%mm6\n\t" \
"punpcklbw %%mm7,%%mm0\n\t" \
"punpckhbw %%mm7,%%mm6\n\t" \
/*mm5:mm3={d0,...,d7}*/ \
"movq %%mm3,%%mm5\n\t" \
"punpcklbw %%mm7,%%mm3\n\t" \
"punpckhbw %%mm7,%%mm5\n\t" \
/*mm6:mm0={a0-d0,...,a7-d7}*/ \
"psubw %%mm3,%%mm0\n\t" \
"psubw %%mm5,%%mm6\n\t" \
/*mm3:mm1={b0,...,b7}*/ \
"movq %%mm1,%%mm3\n\t" \
"punpcklbw %%mm7,%%mm1\n\t" \
"movq %%mm2,%%mm4\n\t" \
"punpckhbw %%mm7,%%mm3\n\t" \
/*mm5:mm4={c0,...,c7}*/ \
"movq %%mm2,%%mm5\n\t" \
"punpcklbw %%mm7,%%mm4\n\t" \
"punpckhbw %%mm7,%%mm5\n\t" \
/*mm7={3}x4 \
mm5:mm4={c0-b0,...,c7-b7}*/ \
"pcmpeqw %%mm7,%%mm7\n\t" \
"psubw %%mm1,%%mm4\n\t" \
"psrlw $14,%%mm7\n\t" \
"psubw %%mm3,%%mm5\n\t" \
/*Scale by 3.*/ \
"pmullw %%mm7,%%mm4\n\t" \
"pmullw %%mm7,%%mm5\n\t" \
/*mm7={4}x4 \
mm5:mm4=f={a0-d0+3*(c0-b0),...,a7-d7+3*(c7-b7)}*/ \
"psrlw $1,%%mm7\n\t" \
"paddw %%mm0,%%mm4\n\t" \
"psllw $2,%%mm7\n\t" \
"movq (%[ll]),%%mm0\n\t" \
"paddw %%mm6,%%mm5\n\t" \
/*R_i has the range [-127,128], so we compute -R_i instead. \
mm4=-R_i=-(f+4>>3)=0xFF^(f-4>>3)*/ \
"psubw %%mm7,%%mm4\n\t" \
"psubw %%mm7,%%mm5\n\t" \
"psraw $3,%%mm4\n\t" \
"psraw $3,%%mm5\n\t" \
"pcmpeqb %%mm7,%%mm7\n\t" \
"packsswb %%mm5,%%mm4\n\t" \
"pxor %%mm6,%%mm6\n\t" \
"pxor %%mm7,%%mm4\n\t" \
"packuswb %%mm3,%%mm1\n\t" \
/*Now compute lflim of -mm4 cf. Section 7.10 of the spec.*/ \
/*There's no unsigned byte+signed byte with unsigned saturation op code, so \
we have to split things by sign (the other option is to work in 16 bits, \
but working in 8 bits gives much better parallelism). \
We compute abs(R_i), but save a mask of which terms were negative in mm6. \
Then we compute mm4=abs(lflim(R_i,L))=min(abs(R_i),max(2*L-abs(R_i),0)). \
Finally, we split mm4 into positive and negative pieces using the mask in \
mm6, and add and subtract them as appropriate.*/ \
/*mm4=abs(-R_i)*/ \
/*mm7=255-2*L*/ \
"pcmpgtb %%mm4,%%mm6\n\t" \
"psubb %%mm0,%%mm7\n\t" \
"pxor %%mm6,%%mm4\n\t" \
"psubb %%mm0,%%mm7\n\t" \
"psubb %%mm6,%%mm4\n\t" \
/*mm7=255-max(2*L-abs(R_i),0)*/ \
"paddusb %%mm4,%%mm7\n\t" \
/*mm4=min(abs(R_i),max(2*L-abs(R_i),0))*/ \
"paddusb %%mm7,%%mm4\n\t" \
"psubusb %%mm7,%%mm4\n\t" \
/*Now split mm4 by the original sign of -R_i.*/ \
"movq %%mm4,%%mm5\n\t" \
"pand %%mm6,%%mm4\n\t" \
"pandn %%mm5,%%mm6\n\t" \
/*mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)}*/ \
/*mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}*/ \
"paddusb %%mm4,%%mm1\n\t" \
"psubusb %%mm4,%%mm2\n\t" \
"psubusb %%mm6,%%mm1\n\t" \
"paddusb %%mm6,%%mm2\n\t" \
/*On entry, mm0={a0,...,a7}, mm1={b0,...,b7}, mm2={c0,...,c7}, mm3={d0,...d7}.
On exit, mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)} and
mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}.
All other MMX registers are clobbered.*/
#define OC_LOOP_FILTER8_MMXEXT \
"#OC_LOOP_FILTER8_MMXEXT\n\t" \
/*R_i=(a_i-3*b_i+3*c_i-d_i+4>>3) has the range [-127,128], so we compute \
-R_i=(-a_i+3*b_i-3*c_i+d_i+3>>3) instead.*/ \
/*This first part is based on the transformation \
f = -(3*(c-b)+a-d+4>>3) \
= -(3*(c+255-b)+(a+255-d)+4-1020>>3) \
= -(3*(c+~b)+(a+~d)-1016>>3) \
= 127-(3*(c+~b)+(a+~d)>>3) \
= 128+~(3*(c+~b)+(a+~d)>>3) (mod 256). \
Although pavgb(a,b) = (a+b+1>>1) (biased up), we rely heavily on the \
fact that ~pavgb(~a,~b) = (a+b>>1) (biased down). \
Using this, the last expression above can be computed in 8 bits of working \
precision via: \
u = ~pavgb(~b,c); \
v = pavgb(b,~c); \
This mask is 0 or 0xFF, and controls whether t is biased up or down: \
m = u-v; \
t = m^pavgb(m^~a,m^d); \
f = 128+pavgb(pavgb(t,u),v); \
This required some careful analysis to ensure that carries are propagated \
correctly in all cases, but has been checked exhaustively.*/ \
/*input (a, b, c, d, ., ., ., .)*/ \
/*ff=0xFF; \
u=b; \
v=c; \
ll=255-2*L;*/ \
"pcmpeqb %%mm7,%%mm7\n\t" \
"movq %%mm1,%%mm4\n\t" \
"movq %%mm2,%%mm5\n\t" \
"movq (%[ll]),%%mm6\n\t" \
/*allocated u, v, ll, ff: (a, b, c, d, u, v, ll, ff)*/ \
/*u^=ff; \
v^=ff;*/ \
"pxor %%mm7,%%mm4\n\t" \
"pxor %%mm7,%%mm5\n\t" \
/*allocated ll: (a, b, c, d, u, v, ll, ff)*/ \
/*u=pavgb(u,c); \
v=pavgb(v,b);*/ \
"pavgb %%mm2,%%mm4\n\t" \
"pavgb %%mm1,%%mm5\n\t" \
/*u^=ff; \
a^=ff;*/ \
"pxor %%mm7,%%mm4\n\t" \
"pxor %%mm7,%%mm0\n\t" \
/*m=u-v;*/ \
"psubb %%mm5,%%mm4\n\t" \
/*freed u, allocated m: (a, b, c, d, m, v, ll, ff)*/ \
/*a^=m; \
d^=m;*/ \
"pxor %%mm4,%%mm0\n\t" \
"pxor %%mm4,%%mm3\n\t" \
/*t=pavgb(a,d);*/ \
"pavgb %%mm3,%%mm0\n\t" \
"psllw $7,%%mm7\n\t" \
/*freed a, d, ff, allocated t, of: (t, b, c, ., m, v, ll, of)*/ \
/*t^=m; \
u=m+v;*/ \
"pxor %%mm4,%%mm0\n\t" \
"paddb %%mm5,%%mm4\n\t" \
/*freed t, m, allocated f, u: (f, b, c, ., u, v, ll, of)*/ \
/*f=pavgb(f,u); \
of=128;*/ \
"pavgb %%mm4,%%mm0\n\t" \
"packsswb %%mm7,%%mm7\n\t" \
/*freed u, ff, allocated ll: (f, b, c, ., ll, v, ll, of)*/ \
/*f=pavgb(f,v);*/ \
"pavgb %%mm5,%%mm0\n\t" \
"movq %%mm7,%%mm3\n\t" \
"movq %%mm6,%%mm4\n\t" \
/*freed v, allocated of: (f, b, c, of, ll, ., ll, of)*/ \
/*Now compute lflim of R_i=-(128+mm0) cf. Section 7.10 of the spec.*/ \
/*There's no unsigned byte+signed byte with unsigned saturation op code, so \
we have to split things by sign (the other option is to work in 16 bits, \
but staying in 8 bits gives much better parallelism).*/ \
/*Instead of adding the offset of 128 in mm3, we use it to split mm0. \
This is the same number of instructions as computing a mask and splitting \
after the lflim computation, but has shorter dependency chains.*/ \
/*mm0=R_i<0?-R_i:0 (denoted abs(R_i<0))\
mm3=R_i>0?R_i:0 (denoted abs(R_i>0))*/ \
"psubusb %%mm0,%%mm3\n\t" \
"psubusb %%mm7,%%mm0\n\t" \
/*mm6=255-max(2*L-abs(R_i<0),0) \
mm4=255-max(2*L-abs(R_i>0),0)*/ \
"paddusb %%mm3,%%mm4\n\t" \
"paddusb %%mm0,%%mm6\n\t" \
/*mm0=min(abs(R_i<0),max(2*L-abs(R_i<0),0)) \
mm3=min(abs(R_i>0),max(2*L-abs(R_i>0),0))*/ \
"paddusb %%mm4,%%mm3\n\t" \
"paddusb %%mm6,%%mm0\n\t" \
"psubusb %%mm4,%%mm3\n\t" \
"psubusb %%mm6,%%mm0\n\t" \
/*mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)}*/ \
/*mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}*/ \
"paddusb %%mm3,%%mm1\n\t" \
"psubusb %%mm3,%%mm2\n\t" \
"psubusb %%mm0,%%mm1\n\t" \
"paddusb %%mm0,%%mm2\n\t" \
#define OC_LOOP_FILTER_V(_filter,_pix,_ystride,_ll) \
do{ \
ptrdiff_t ystride3__; \
__asm__ __volatile__( \
/*mm0={a0,...,a7}*/ \
"movq (%[pix]),%%mm0\n\t" \
/*ystride3=_ystride*3*/ \
"lea (%[ystride],%[ystride],2),%[ystride3]\n\t" \
/*mm3={d0,...,d7}*/ \
"movq (%[pix],%[ystride3]),%%mm3\n\t" \
/*mm1={b0,...,b7}*/ \
"movq (%[pix],%[ystride]),%%mm1\n\t" \
/*mm2={c0,...,c7}*/ \
"movq (%[pix],%[ystride],2),%%mm2\n\t" \
_filter \
/*Write it back out.*/ \
"movq %%mm1,(%[pix],%[ystride])\n\t" \
"movq %%mm2,(%[pix],%[ystride],2)\n\t" \
:[ystride3]"=&r"(ystride3__) \
:[pix]"r"(_pix-_ystride*2),[ystride]"r"((ptrdiff_t)(_ystride)), \
[ll]"r"(_ll) \
:"memory" \
); \
} \
while(0)
#define OC_LOOP_FILTER_H(_filter,_pix,_ystride,_ll) \
do{ \
unsigned char *pix__; \
ptrdiff_t ystride3__; \
ptrdiff_t d__; \
pix__=(_pix)-2; \
__asm__ __volatile__( \
/*x x x x d0 c0 b0 a0*/ \
"movd (%[pix]),%%mm0\n\t" \
/*x x x x d1 c1 b1 a1*/ \
"movd (%[pix],%[ystride]),%%mm1\n\t" \
/*ystride3=_ystride*3*/ \
"lea (%[ystride],%[ystride],2),%[ystride3]\n\t" \
/*x x x x d2 c2 b2 a2*/ \
"movd (%[pix],%[ystride],2),%%mm2\n\t" \
/*x x x x d3 c3 b3 a3*/ \
"lea (%[pix],%[ystride],4),%[d]\n\t" \
"movd (%[pix],%[ystride3]),%%mm3\n\t" \
/*x x x x d4 c4 b4 a4*/ \
"movd (%[d]),%%mm4\n\t" \
/*x x x x d5 c5 b5 a5*/ \
"movd (%[d],%[ystride]),%%mm5\n\t" \
/*x x x x d6 c6 b6 a6*/ \
"movd (%[d],%[ystride],2),%%mm6\n\t" \
/*x x x x d7 c7 b7 a7*/ \
"movd (%[d],%[ystride3]),%%mm7\n\t" \
/*mm0=d1 d0 c1 c0 b1 b0 a1 a0*/ \
"punpcklbw %%mm1,%%mm0\n\t" \
/*mm2=d3 d2 c3 c2 b3 b2 a3 a2*/ \
"punpcklbw %%mm3,%%mm2\n\t" \
/*mm3=d1 d0 c1 c0 b1 b0 a1 a0*/ \
"movq %%mm0,%%mm3\n\t" \
/*mm0=b3 b2 b1 b0 a3 a2 a1 a0*/ \
"punpcklwd %%mm2,%%mm0\n\t" \
/*mm3=d3 d2 d1 d0 c3 c2 c1 c0*/ \
"punpckhwd %%mm2,%%mm3\n\t" \
/*mm1=b3 b2 b1 b0 a3 a2 a1 a0*/ \
"movq %%mm0,%%mm1\n\t" \
/*mm4=d5 d4 c5 c4 b5 b4 a5 a4*/ \
"punpcklbw %%mm5,%%mm4\n\t" \
/*mm6=d7 d6 c7 c6 b7 b6 a7 a6*/ \
"punpcklbw %%mm7,%%mm6\n\t" \
/*mm5=d5 d4 c5 c4 b5 b4 a5 a4*/ \
"movq %%mm4,%%mm5\n\t" \
/*mm4=b7 b6 b5 b4 a7 a6 a5 a4*/ \
"punpcklwd %%mm6,%%mm4\n\t" \
/*mm5=d7 d6 d5 d4 c7 c6 c5 c4*/ \
"punpckhwd %%mm6,%%mm5\n\t" \
/*mm2=d3 d2 d1 d0 c3 c2 c1 c0*/ \
"movq %%mm3,%%mm2\n\t" \
/*mm0=a7 a6 a5 a4 a3 a2 a1 a0*/ \
"punpckldq %%mm4,%%mm0\n\t" \
/*mm1=b7 b6 b5 b4 b3 b2 b1 b0*/ \
"punpckhdq %%mm4,%%mm1\n\t" \
/*mm2=c7 c6 c5 c4 c3 c2 c1 c0*/ \
"punpckldq %%mm5,%%mm2\n\t" \
/*mm3=d7 d6 d5 d4 d3 d2 d1 d0*/ \
"punpckhdq %%mm5,%%mm3\n\t" \
_filter \
/*mm0={b0+R_0'',...,b7+R_7''}*/ \
"movq %%mm1,%%mm0\n\t" \
/*mm1={b0+R_0'',c0-R_0'',...,b3+R_3'',c3-R_3''}*/ \
"punpcklbw %%mm2,%%mm1\n\t" \
/*mm0={b4+R_4'',c4-R_4'',...,b7+R_7'',c7-R_7''}*/ \
"punpckhbw %%mm2,%%mm0\n\t" \
/*[d]=c1 b1 c0 b0*/ \
"movd %%mm1,%[d]\n\t" \
"movw %w[d],1(%[pix])\n\t" \
"psrlq $32,%%mm1\n\t" \
"shr $16,%[d]\n\t" \
"movw %w[d],1(%[pix],%[ystride])\n\t" \
/*[d]=c3 b3 c2 b2*/ \
"movd %%mm1,%[d]\n\t" \
"movw %w[d],1(%[pix],%[ystride],2)\n\t" \
"shr $16,%[d]\n\t" \
"movw %w[d],1(%[pix],%[ystride3])\n\t" \
"lea (%[pix],%[ystride],4),%[pix]\n\t" \
/*[d]=c5 b5 c4 b4*/ \
"movd %%mm0,%[d]\n\t" \
"movw %w[d],1(%[pix])\n\t" \
"psrlq $32,%%mm0\n\t" \
"shr $16,%[d]\n\t" \
"movw %w[d],1(%[pix],%[ystride])\n\t" \
/*[d]=c7 b7 c6 b6*/ \
"movd %%mm0,%[d]\n\t" \
"movw %w[d],1(%[pix],%[ystride],2)\n\t" \
"shr $16,%[d]\n\t" \
"movw %w[d],1(%[pix],%[ystride3])\n\t" \
:[pix]"+r"(pix__),[ystride3]"=&r"(ystride3__),[d]"=&r"(d__) \
:[ystride]"r"((ptrdiff_t)(_ystride)),[ll]"r"(_ll) \
:"memory" \
); \
} \
while(0)
# endif
#endif
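
The filter arithmetic described in the macro comments can be written out in scalar C. The sketch below is not libtheora code; `lflim_ref`, `loop_filter_px_ref` and `pavgb_identity_holds` are hypothetical names. It uses the formulas quoted in the comments: R=(a-3*b+3*c-d+4>>3), abs(lflim(R,L))=min(abs(R),max(2*L-abs(R),0)), and ~pavgb(~a,~b)=(a+b>>1). It assumes that right-shifting a negative int is an arithmetic shift, which is true of mainstream compilers but not guaranteed by the C standard.

```c
#include <assert.h>

/* lflim(R,L): the magnitude is min(|R|,max(2*L-|R|,0)); the sign follows R. */
int lflim_ref(int r,int l){
  int a=r<0?-r:r;
  int m=2*l-a;
  if(m<0)m=0;
  if(m>a)m=a;
  return r<0?-m:m;
}

/* Filter one line of four pixels p[0..3] = a, b, c, d across an edge that
   lies between b and c. Only b and c are modified, with saturation to
   0..255 (the role of paddusb/psubusb in the MMX code). */
void loop_filter_px_ref(unsigned char p[4],int l){
  int r;
  int f;
  int b;
  int c;
  /*Assumes an arithmetic right shift for negative values.*/
  r=(p[0]-3*p[1]+3*p[2]-p[3]+4)>>3;
  f=lflim_ref(r,l);
  b=p[1]+f;
  c=p[2]-f;
  p[1]=(unsigned char)(b<0?0:b>255?255:b);
  p[2]=(unsigned char)(c<0?0:c>255?255:c);
}

/* Exhaustively check the identity the MMXEXT variant relies on:
   ~pavgb(~a,~b) == (a+b)>>1, where pavgb(x,y) = (x+y+1)>>1 on bytes.
   Returns 1 if it holds for all 65536 byte pairs, 0 otherwise. */
int pavgb_identity_holds(void){
  int a;
  int b;
  for(a=0;a<256;a++){
    for(b=0;b<256;b++){
      int avg_up=((255-a)+(255-b)+1)>>1;
      if(255-avg_up!=((a+b)>>1))return 0;
    }
  }
  return 1;
}
```

For example, with L=4 the line {10,10,20,20} gives R=3, so b and c move toward each other by 3, while a hard edge such as {0,0,255,255} gives R=64, which lflim maps to 0 and leaves the pixels untouched.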

@ -1,226 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*MMX acceleration of complete fragment reconstruction algorithm.
Originally written by Rudolf Marek.*/
#include <string.h>
#include "x86int.h"
#include "mmxloop.h"
#if defined(OC_X86_ASM)
void oc_state_frag_recon_mmx(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant){
unsigned char *dst;
ptrdiff_t frag_buf_off;
int ystride;
int refi;
/*Apply the inverse transform.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
/*Note that this value must be unsigned, to keep the __asm__ block from
sign-extending it when it puts it in a register.*/
ogg_uint16_t p;
int i;
/*We round this dequant product (and not any of the others) because there's
no iDCT rounding.*/
p=(ogg_int16_t)(_dct_coeffs[0]*(ogg_int32_t)_dc_quant+15>>5);
/*Fill _dct_coeffs with p.*/
__asm__ __volatile__(
/*mm0=0000 0000 0000 AAAA*/
"movd %[p],%%mm0\n\t"
/*mm0=0000 0000 AAAA AAAA*/
"punpcklwd %%mm0,%%mm0\n\t"
/*mm0=AAAA AAAA AAAA AAAA*/
"punpckldq %%mm0,%%mm0\n\t"
:
:[p]"r"((unsigned)p)
);
for(i=0;i<4;i++){
__asm__ __volatile__(
"movq %%mm0,"OC_MEM_OFFS(0x00,y)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x08,y)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x10,y)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x18,y)"\n\t"
:[y]"=m"OC_ARRAY_OPERAND(ogg_int16_t,_dct_coeffs+64+16*i,16)
);
}
}
else{
/*Dequantize the DC coefficient.*/
_dct_coeffs[0]=(ogg_int16_t)(_dct_coeffs[0]*(int)_dc_quant);
oc_idct8x8(_state,_dct_coeffs+64,_dct_coeffs,_last_zzi);
}
/*Fill in the target buffer.*/
frag_buf_off=_state->frag_buf_offs[_fragi];
refi=_state->frags[_fragi].refi;
ystride=_state->ref_ystride[_pli];
dst=_state->ref_frame_data[OC_FRAME_SELF]+frag_buf_off;
if(refi==OC_FRAME_SELF)oc_frag_recon_intra_mmx(dst,ystride,_dct_coeffs+64);
else{
const unsigned char *ref;
int mvoffsets[2];
ref=_state->ref_frame_data[refi]+frag_buf_off;
if(oc_state_get_mv_offsets(_state,mvoffsets,_pli,
_state->frag_mvs[_fragi])>1){
oc_frag_recon_inter2_mmx(dst,ref+mvoffsets[0],ref+mvoffsets[1],ystride,
_dct_coeffs+64);
}
else oc_frag_recon_inter_mmx(dst,ref+mvoffsets[0],ystride,_dct_coeffs+64);
}
}
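The DC-only special case above can be read as the scalar sketch below. It is only an illustration under stated assumptions: sketch_dc_only_fill is a hypothetical name, int16_t and uint16_t stand in for ogg_int16_t and ogg_uint16_t, and a right shift of a negative value is assumed to be arithmetic, as the original expression also assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*Scalar sketch of the DC-only path: dequantize the DC coefficient with
   rounding, then replicate it over all 64 outputs.
  sketch_dc_only_fill is a hypothetical name, not a libtheora function.*/
void sketch_dc_only_fill(int16_t _out[64],int16_t _dc,uint16_t _dc_quant){
  int16_t p;
  int     i;
  assert(_out!=NULL);
  /*Add 15 and shift right by 5, mirroring the original expression.
    This assumes >> on a negative value is an arithmetic shift.*/
  p=(int16_t)(((int32_t)_dc*_dc_quant+15)>>5);
  /*The MMX version broadcasts p into mm0 and stores it 16 times; a plain
     loop gives the same 64 values.*/
  for(i=0;i<64;i++)_out[i]=p;
}
```

With a DC coefficient of 10 and a quantizer of 16, every output would be (160+15)>>5, which is 5.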
/*We copy this entire function to inline the actual MMX routines so that we
use only a single indirect call.*/
void oc_loop_filter_init_mmx(signed char _bv[256],int _flimit){
memset(_bv,_flimit,8);
}
/*Apply the loop filter to a given set of fragment rows in the given plane.
The filter may be run on the bottom edge, affecting pixels in the next row of
fragments, so this row also needs to be available.
_bv: The bounding values array.
_refi: The index of the frame buffer to filter.
_pli: The color plane to filter.
_fragy0: The Y coordinate of the first fragment row to filter.
_fragy_end: The Y coordinate of the fragment row to stop filtering at.*/
void oc_state_loop_filter_frag_rows_mmx(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end){
OC_ALIGN8(unsigned char ll[8]);
const oc_fragment_plane *fplane;
const oc_fragment *frags;
const ptrdiff_t *frag_buf_offs;
unsigned char *ref_frame_data;
ptrdiff_t fragi_top;
ptrdiff_t fragi_bot;
ptrdiff_t fragi0;
ptrdiff_t fragi0_end;
int ystride;
int nhfrags;
memset(ll,_state->loop_filter_limits[_state->qis[0]],sizeof(ll));
fplane=_state->fplanes+_pli;
nhfrags=fplane->nhfrags;
fragi_top=fplane->froffset;
fragi_bot=fragi_top+fplane->nfrags;
fragi0=fragi_top+_fragy0*(ptrdiff_t)nhfrags;
fragi0_end=fragi0+(_fragy_end-_fragy0)*(ptrdiff_t)nhfrags;
ystride=_state->ref_ystride[_pli];
frags=_state->frags;
frag_buf_offs=_state->frag_buf_offs;
ref_frame_data=_state->ref_frame_data[_refi];
/*The following loops are constructed somewhat non-intuitively on purpose.
The main idea is: if a block boundary has at least one coded fragment on
it, the filter is applied to it.
However, the order that the filters are applied in matters, and VP3 chose
the somewhat strange ordering used below.*/
while(fragi0<fragi0_end){
ptrdiff_t fragi;
ptrdiff_t fragi_end;
fragi=fragi0;
fragi_end=fragi+nhfrags;
while(fragi<fragi_end){
if(frags[fragi].coded){
unsigned char *ref;
ref=ref_frame_data+frag_buf_offs[fragi];
if(fragi>fragi0){
OC_LOOP_FILTER_H(OC_LOOP_FILTER8_MMX,ref,ystride,ll);
}
if(fragi0>fragi_top){
OC_LOOP_FILTER_V(OC_LOOP_FILTER8_MMX,ref,ystride,ll);
}
if(fragi+1<fragi_end&&!frags[fragi+1].coded){
OC_LOOP_FILTER_H(OC_LOOP_FILTER8_MMX,ref+8,ystride,ll);
}
if(fragi+nhfrags<fragi_bot&&!frags[fragi+nhfrags].coded){
OC_LOOP_FILTER_V(OC_LOOP_FILTER8_MMX,ref+(ystride<<3),ystride,ll);
}
}
fragi++;
}
fragi0+=nhfrags;
}
}
void oc_loop_filter_init_mmxext(signed char _bv[256],int _flimit){
memset(_bv,~(_flimit<<1),8);
}
/*Apply the loop filter to a given set of fragment rows in the given plane.
The filter may be run on the bottom edge, affecting pixels in the next row of
fragments, so this row also needs to be available.
_bv: The bounding values array.
_refi: The index of the frame buffer to filter.
_pli: The color plane to filter.
_fragy0: The Y coordinate of the first fragment row to filter.
_fragy_end: The Y coordinate of the fragment row to stop filtering at.*/
void oc_state_loop_filter_frag_rows_mmxext(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end){
const oc_fragment_plane *fplane;
const oc_fragment *frags;
const ptrdiff_t *frag_buf_offs;
unsigned char *ref_frame_data;
ptrdiff_t fragi_top;
ptrdiff_t fragi_bot;
ptrdiff_t fragi0;
ptrdiff_t fragi0_end;
int ystride;
int nhfrags;
fplane=_state->fplanes+_pli;
nhfrags=fplane->nhfrags;
fragi_top=fplane->froffset;
fragi_bot=fragi_top+fplane->nfrags;
fragi0=fragi_top+_fragy0*(ptrdiff_t)nhfrags;
fragi0_end=fragi_top+_fragy_end*(ptrdiff_t)nhfrags;
ystride=_state->ref_ystride[_pli];
frags=_state->frags;
frag_buf_offs=_state->frag_buf_offs;
ref_frame_data=_state->ref_frame_data[_refi];
/*The following loops are constructed somewhat non-intuitively on purpose.
The main idea is: if a block boundary has at least one coded fragment on
it, the filter is applied to it.
However, the order that the filters are applied in matters, and VP3 chose
the somewhat strange ordering used below.*/
while(fragi0<fragi0_end){
ptrdiff_t fragi;
ptrdiff_t fragi_end;
fragi=fragi0;
fragi_end=fragi+nhfrags;
while(fragi<fragi_end){
if(frags[fragi].coded){
unsigned char *ref;
ref=ref_frame_data+frag_buf_offs[fragi];
if(fragi>fragi0){
OC_LOOP_FILTER_H(OC_LOOP_FILTER8_MMXEXT,ref,ystride,_bv);
}
if(fragi0>fragi_top){
OC_LOOP_FILTER_V(OC_LOOP_FILTER8_MMXEXT,ref,ystride,_bv);
}
if(fragi+1<fragi_end&&!frags[fragi+1].coded){
OC_LOOP_FILTER_H(OC_LOOP_FILTER8_MMXEXT,ref+8,ystride,_bv);
}
if(fragi+nhfrags<fragi_bot&&!frags[fragi+nhfrags].coded){
OC_LOOP_FILTER_V(OC_LOOP_FILTER8_MMXEXT,ref+(ystride<<3),ystride,_bv);
}
}
fragi++;
}
fragi0+=nhfrags;
}
}
#endif
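The edge-selection rule shared by both loop filter routines above can be sketched in portable C as follows. This is not the library's code: the EDGE_* flags and sketch_filter_edges are hypothetical names, and the sketch only records which edges would be filtered over a whole plane instead of running the filter on a range of rows.

```c
#include <assert.h>
#include <stddef.h>

/*Hypothetical edge flags for this sketch; not part of libtheora.*/
enum{
  EDGE_LEFT=1,
  EDGE_TOP=2,
  EDGE_RIGHT=4,
  EDGE_BOTTOM=8
};

/*For each fragment in an _nh by _nv plane, record which of its edges the
   loop filter would touch, visiting fragments in raster order.
  _coded: _nh*_nv flags, nonzero if the fragment is coded.
  _edges: _nh*_nv output bitmasks of EDGE_* values.*/
void sketch_filter_edges(const unsigned char *_coded,unsigned char *_edges,
 int _nh,int _nv){
  int x;
  int y;
  assert(_nh>0&&_nv>0);
  for(y=0;y<_nv;y++){
    for(x=0;x<_nh;x++){
      ptrdiff_t     fragi;
      unsigned char e;
      fragi=y*(ptrdiff_t)_nh+x;
      e=0;
      if(_coded[fragi]){
        /*Left and top edges are filtered whenever a neighbor exists there.*/
        if(x>0)e|=EDGE_LEFT;
        if(y>0)e|=EDGE_TOP;
        /*Right and bottom edges are filtered only when that neighbor is not
           coded; a coded neighbor filters the shared edge itself.*/
        if(x+1<_nh&&!_coded[fragi+1])e|=EDGE_RIGHT;
        if(y+1<_nv&&!_coded[fragi+_nh])e|=EDGE_BOTTOM;
      }
      _edges[fragi]=e;
    }
  }
}
```

Each shared boundary with at least one coded fragment ends up filtered exactly once, which matches the intent stated in the comments above.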


@ -1,456 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: mmxidct.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
/*SSE2 acceleration of Theora's iDCT.*/
#include "x86int.h"
#include "sse2trans.h"
#include "../dct.h"
#if defined(OC_X86_ASM)
/*A table of constants used by the MMX routines.*/
const unsigned short __attribute__((aligned(16),used)) OC_IDCT_CONSTS[64]={
8, 8, 8, 8, 8, 8, 8, 8,
OC_C1S7,OC_C1S7,OC_C1S7,OC_C1S7,OC_C1S7,OC_C1S7,OC_C1S7,OC_C1S7,
OC_C2S6,OC_C2S6,OC_C2S6,OC_C2S6,OC_C2S6,OC_C2S6,OC_C2S6,OC_C2S6,
OC_C3S5,OC_C3S5,OC_C3S5,OC_C3S5,OC_C3S5,OC_C3S5,OC_C3S5,OC_C3S5,
OC_C4S4,OC_C4S4,OC_C4S4,OC_C4S4,OC_C4S4,OC_C4S4,OC_C4S4,OC_C4S4,
OC_C5S3,OC_C5S3,OC_C5S3,OC_C5S3,OC_C5S3,OC_C5S3,OC_C5S3,OC_C5S3,
OC_C6S2,OC_C6S2,OC_C6S2,OC_C6S2,OC_C6S2,OC_C6S2,OC_C6S2,OC_C6S2,
OC_C7S1,OC_C7S1,OC_C7S1,OC_C7S1,OC_C7S1,OC_C7S1,OC_C7S1,OC_C7S1
};
/*Performs the first three stages of the iDCT.
xmm2, xmm6, xmm3, and xmm5 must contain the corresponding rows of the input
(accessed in that order).
The remaining rows must be in _x at their corresponding locations.
On output, xmm7 down to xmm4 contain rows 0 through 3, and xmm0 up to xmm3
contain rows 4 through 7.*/
#define OC_IDCT_8x8_ABC(_x) \
"#OC_IDCT_8x8_ABC\n\t" \
/*Stage 1:*/ \
/*2-3 rotation by 6pi/16. \
xmm4=xmm7=C6, xmm0=xmm1=C2, xmm2=X2, xmm6=X6.*/ \
"movdqa "OC_MEM_OFFS(0x20,c)",%%xmm1\n\t" \
"movdqa "OC_MEM_OFFS(0x60,c)",%%xmm4\n\t" \
"movdqa %%xmm1,%%xmm0\n\t" \
"pmulhw %%xmm2,%%xmm1\n\t" \
"movdqa %%xmm4,%%xmm7\n\t" \
"pmulhw %%xmm6,%%xmm0\n\t" \
"pmulhw %%xmm2,%%xmm7\n\t" \
"pmulhw %%xmm6,%%xmm4\n\t" \
"paddw %%xmm6,%%xmm0\n\t" \
"movdqa "OC_MEM_OFFS(0x30,c)",%%xmm6\n\t" \
"paddw %%xmm1,%%xmm2\n\t" \
"psubw %%xmm0,%%xmm7\n\t" \
"movdqa %%xmm7,"OC_MEM_OFFS(0x00,buf)"\n\t" \
"paddw %%xmm4,%%xmm2\n\t" \
"movdqa "OC_MEM_OFFS(0x50,c)",%%xmm4\n\t" \
"movdqa %%xmm2,"OC_MEM_OFFS(0x10,buf)"\n\t" \
/*5-6 rotation by 3pi/16. \
xmm4=xmm2=C5, xmm1=xmm6=C3, xmm3=X3, xmm5=X5.*/ \
"movdqa %%xmm4,%%xmm2\n\t" \
"movdqa %%xmm6,%%xmm1\n\t" \
"pmulhw %%xmm3,%%xmm4\n\t" \
"pmulhw %%xmm5,%%xmm1\n\t" \
"pmulhw %%xmm3,%%xmm6\n\t" \
"pmulhw %%xmm5,%%xmm2\n\t" \
"paddw %%xmm3,%%xmm4\n\t" \
"paddw %%xmm5,%%xmm3\n\t" \
"paddw %%xmm6,%%xmm3\n\t" \
"movdqa "OC_MEM_OFFS(0x70,_x)",%%xmm6\n\t" \
"paddw %%xmm5,%%xmm1\n\t" \
"movdqa "OC_MEM_OFFS(0x10,_x)",%%xmm5\n\t" \
"paddw %%xmm3,%%xmm2\n\t" \
"movdqa "OC_MEM_OFFS(0x70,c)",%%xmm3\n\t" \
"psubw %%xmm4,%%xmm1\n\t" \
"movdqa "OC_MEM_OFFS(0x10,c)",%%xmm4\n\t" \
/*4-7 rotation by 7pi/16. \
xmm4=xmm7=C1, xmm3=xmm0=C7, xmm5=X1, xmm6=X7.*/ \
"movdqa %%xmm3,%%xmm0\n\t" \
"movdqa %%xmm4,%%xmm7\n\t" \
"pmulhw %%xmm5,%%xmm3\n\t" \
"pmulhw %%xmm5,%%xmm7\n\t" \
"pmulhw %%xmm6,%%xmm4\n\t" \
"pmulhw %%xmm6,%%xmm0\n\t" \
"paddw %%xmm6,%%xmm4\n\t" \
"movdqa "OC_MEM_OFFS(0x40,_x)",%%xmm6\n\t" \
"paddw %%xmm5,%%xmm7\n\t" \
"psubw %%xmm4,%%xmm3\n\t" \
"movdqa "OC_MEM_OFFS(0x40,c)",%%xmm4\n\t" \
"paddw %%xmm7,%%xmm0\n\t" \
"movdqa "OC_MEM_OFFS(0x00,_x)",%%xmm7\n\t" \
/*0-1 butterfly. \
xmm4=xmm5=C4, xmm7=X0, xmm6=X4.*/ \
"paddw %%xmm7,%%xmm6\n\t" \
"movdqa %%xmm4,%%xmm5\n\t" \
"pmulhw %%xmm6,%%xmm4\n\t" \
"paddw %%xmm7,%%xmm7\n\t" \
"psubw %%xmm6,%%xmm7\n\t" \
"paddw %%xmm6,%%xmm4\n\t" \
/*Stage 2:*/ \
/*4-5 butterfly: xmm3=t[4], xmm1=t[5] \
7-6 butterfly: xmm2=t[6], xmm0=t[7]*/ \
"movdqa %%xmm3,%%xmm6\n\t" \
"paddw %%xmm1,%%xmm3\n\t" \
"psubw %%xmm1,%%xmm6\n\t" \
"movdqa %%xmm5,%%xmm1\n\t" \
"pmulhw %%xmm7,%%xmm5\n\t" \
"paddw %%xmm7,%%xmm5\n\t" \
"movdqa %%xmm0,%%xmm7\n\t" \
"paddw %%xmm2,%%xmm0\n\t" \
"psubw %%xmm2,%%xmm7\n\t" \
"movdqa %%xmm1,%%xmm2\n\t" \
"pmulhw %%xmm6,%%xmm1\n\t" \
"pmulhw %%xmm7,%%xmm2\n\t" \
"paddw %%xmm6,%%xmm1\n\t" \
"movdqa "OC_MEM_OFFS(0x00,buf)",%%xmm6\n\t" \
"paddw %%xmm7,%%xmm2\n\t" \
"movdqa "OC_MEM_OFFS(0x10,buf)",%%xmm7\n\t" \
/*Stage 3: \
6-5 butterfly: xmm1=t[5], xmm2=t[6] -> xmm1=t[6]+t[5], xmm2=t[6]-t[5] \
0-3 butterfly: xmm4=t[0], xmm7=t[3] -> xmm7=t[0]+t[3], xmm4=t[0]-t[3] \
1-2 butterfly: xmm5=t[1], xmm6=t[2] -> xmm6=t[1]+t[2], xmm5=t[1]-t[2]*/ \
"paddw %%xmm2,%%xmm1\n\t" \
"paddw %%xmm5,%%xmm6\n\t" \
"paddw %%xmm4,%%xmm7\n\t" \
"paddw %%xmm2,%%xmm2\n\t" \
"paddw %%xmm4,%%xmm4\n\t" \
"paddw %%xmm5,%%xmm5\n\t" \
"psubw %%xmm1,%%xmm2\n\t" \
"psubw %%xmm7,%%xmm4\n\t" \
"psubw %%xmm6,%%xmm5\n\t" \
/*Performs the last stage of the iDCT.
On input, xmm7 down to xmm4 contain rows 0 through 3, and xmm0 up to xmm3
contain rows 4 through 7.
On output, xmm0 through xmm7 contain the corresponding rows.*/
#define OC_IDCT_8x8_D \
"#OC_IDCT_8x8_D\n\t" \
/*Stage 4: \
0-7 butterfly: xmm7=t[0], xmm0=t[7] -> xmm0=t[0]+t[7], xmm7=t[0]-t[7] \
1-6 butterfly: xmm6=t[1], xmm1=t[6] -> xmm1=t[1]+t[6], xmm6=t[1]-t[6] \
2-5 butterfly: xmm5=t[2], xmm2=t[5] -> xmm2=t[2]+t[5], xmm5=t[2]-t[5] \
3-4 butterfly: xmm4=t[3], xmm3=t[4] -> xmm3=t[3]+t[4], xmm4=t[3]-t[4]*/ \
"psubw %%xmm0,%%xmm7\n\t" \
"psubw %%xmm1,%%xmm6\n\t" \
"psubw %%xmm2,%%xmm5\n\t" \
"psubw %%xmm3,%%xmm4\n\t" \
"paddw %%xmm0,%%xmm0\n\t" \
"paddw %%xmm1,%%xmm1\n\t" \
"paddw %%xmm2,%%xmm2\n\t" \
"paddw %%xmm3,%%xmm3\n\t" \
"paddw %%xmm7,%%xmm0\n\t" \
"paddw %%xmm6,%%xmm1\n\t" \
"paddw %%xmm5,%%xmm2\n\t" \
"paddw %%xmm4,%%xmm3\n\t" \
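/*Performs the last stage of the iDCT, then rounds, shifts, and stores the
result.
On input, xmm7 down to xmm4 contain rows 0 through 3, and xmm0 up to xmm3
contain rows 4 through 7.
On output, each row has had the rounding constant 8 added, been shifted right
by 4, and been stored to its corresponding location in y.*/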
#define OC_IDCT_8x8_D_STORE \
"#OC_IDCT_8x8_D_STORE\n\t" \
/*Stage 4: \
0-7 butterfly: xmm7=t[0], xmm0=t[7] -> xmm0=t[0]+t[7], xmm7=t[0]-t[7] \
1-6 butterfly: xmm6=t[1], xmm1=t[6] -> xmm1=t[1]+t[6], xmm6=t[1]-t[6] \
2-5 butterfly: xmm5=t[2], xmm2=t[5] -> xmm2=t[2]+t[5], xmm5=t[2]-t[5] \
3-4 butterfly: xmm4=t[3], xmm3=t[4] -> xmm3=t[3]+t[4], xmm4=t[3]-t[4]*/ \
"psubw %%xmm3,%%xmm4\n\t" \
"movdqa %%xmm4,"OC_MEM_OFFS(0x40,y)"\n\t" \
"movdqa "OC_MEM_OFFS(0x00,c)",%%xmm4\n\t" \
"psubw %%xmm0,%%xmm7\n\t" \
"psubw %%xmm1,%%xmm6\n\t" \
"psubw %%xmm2,%%xmm5\n\t" \
"paddw %%xmm4,%%xmm7\n\t" \
"paddw %%xmm4,%%xmm6\n\t" \
"paddw %%xmm4,%%xmm5\n\t" \
"paddw "OC_MEM_OFFS(0x40,y)",%%xmm4\n\t" \
"paddw %%xmm0,%%xmm0\n\t" \
"paddw %%xmm1,%%xmm1\n\t" \
"paddw %%xmm2,%%xmm2\n\t" \
"paddw %%xmm3,%%xmm3\n\t" \
"paddw %%xmm7,%%xmm0\n\t" \
"paddw %%xmm6,%%xmm1\n\t" \
"psraw $4,%%xmm0\n\t" \
"paddw %%xmm5,%%xmm2\n\t" \
"movdqa %%xmm0,"OC_MEM_OFFS(0x00,y)"\n\t" \
"psraw $4,%%xmm1\n\t" \
"paddw %%xmm4,%%xmm3\n\t" \
"movdqa %%xmm1,"OC_MEM_OFFS(0x10,y)"\n\t" \
"psraw $4,%%xmm2\n\t" \
"movdqa %%xmm2,"OC_MEM_OFFS(0x20,y)"\n\t" \
"psraw $4,%%xmm3\n\t" \
"movdqa %%xmm3,"OC_MEM_OFFS(0x30,y)"\n\t" \
"psraw $4,%%xmm4\n\t" \
"movdqa %%xmm4,"OC_MEM_OFFS(0x40,y)"\n\t" \
"psraw $4,%%xmm5\n\t" \
"movdqa %%xmm5,"OC_MEM_OFFS(0x50,y)"\n\t" \
"psraw $4,%%xmm6\n\t" \
"movdqa %%xmm6,"OC_MEM_OFFS(0x60,y)"\n\t" \
"psraw $4,%%xmm7\n\t" \
"movdqa %%xmm7,"OC_MEM_OFFS(0x70,y)"\n\t" \
static void oc_idct8x8_slow_sse2(ogg_int16_t _y[64],ogg_int16_t _x[64]){
OC_ALIGN16(ogg_int16_t buf[16]);
int i;
/*This routine accepts an 8x8 matrix pre-transposed.*/
__asm__ __volatile__(
/*Load rows 2, 3, 5, and 6 for the first stage of the iDCT.*/
"movdqa "OC_MEM_OFFS(0x20,x)",%%xmm2\n\t"
"movdqa "OC_MEM_OFFS(0x60,x)",%%xmm6\n\t"
"movdqa "OC_MEM_OFFS(0x30,x)",%%xmm3\n\t"
"movdqa "OC_MEM_OFFS(0x50,x)",%%xmm5\n\t"
OC_IDCT_8x8_ABC(x)
OC_IDCT_8x8_D
OC_TRANSPOSE_8x8
/*Spill rows 0, 1, 4, and 7 to y; the first stage of the second iDCT pass
reloads them from there.*/
"movdqa %%xmm7,"OC_MEM_OFFS(0x70,y)"\n\t"
"movdqa %%xmm4,"OC_MEM_OFFS(0x40,y)"\n\t"
"movdqa %%xmm1,"OC_MEM_OFFS(0x10,y)"\n\t"
"movdqa %%xmm0,"OC_MEM_OFFS(0x00,y)"\n\t"
OC_IDCT_8x8_ABC(y)
OC_IDCT_8x8_D_STORE
:[buf]"=m"(OC_ARRAY_OPERAND(ogg_int16_t,buf,16)),
[y]"=m"(OC_ARRAY_OPERAND(ogg_int16_t,_y,64))
:[x]"m"(OC_CONST_ARRAY_OPERAND(ogg_int16_t,_x,64)),
[c]"m"(OC_CONST_ARRAY_OPERAND(ogg_int16_t,OC_IDCT_CONSTS,128))
);
__asm__ __volatile__("pxor %%xmm0,%%xmm0\n\t"::);
/*Clear input data for next block (decoder only).*/
for(i=0;i<2;i++){
__asm__ __volatile__(
"movdqa %%xmm0,"OC_MEM_OFFS(0x00,x)"\n\t"
"movdqa %%xmm0,"OC_MEM_OFFS(0x10,x)"\n\t"
"movdqa %%xmm0,"OC_MEM_OFFS(0x20,x)"\n\t"
"movdqa %%xmm0,"OC_MEM_OFFS(0x30,x)"\n\t"
:[x]"=m"(OC_ARRAY_OPERAND(ogg_int16_t,_x+i*32,32))
);
}
}
/*For the first step of the 10-coefficient version of the 8x8 iDCT, we only
need to work with four columns at a time.
Doing this in MMX is faster on processors with a 64-bit data path.*/
#define OC_IDCT_8x8_10_MMX \
"#OC_IDCT_8x8_10_MMX\n\t" \
/*Stage 1:*/ \
/*2-3 rotation by 6pi/16. \
mm7=C6, mm6=C2, mm2=X2, X6=0.*/ \
"movq "OC_MEM_OFFS(0x60,c)",%%mm7\n\t" \
"movq "OC_MEM_OFFS(0x20,c)",%%mm6\n\t" \
"pmulhw %%mm2,%%mm6\n\t" \
"pmulhw %%mm2,%%mm7\n\t" \
"movq "OC_MEM_OFFS(0x50,c)",%%mm5\n\t" \
"paddw %%mm6,%%mm2\n\t" \
"movq %%mm2,"OC_MEM_OFFS(0x10,buf)"\n\t" \
"movq "OC_MEM_OFFS(0x30,c)",%%mm2\n\t" \
"movq %%mm7,"OC_MEM_OFFS(0x00,buf)"\n\t" \
/*5-6 rotation by 3pi/16. \
mm5=C5, mm2=C3, mm3=X3, X5=0.*/ \
"pmulhw %%mm3,%%mm5\n\t" \
"pmulhw %%mm3,%%mm2\n\t" \
"movq "OC_MEM_OFFS(0x10,c)",%%mm7\n\t" \
"paddw %%mm3,%%mm5\n\t" \
"paddw %%mm3,%%mm2\n\t" \
"movq "OC_MEM_OFFS(0x70,c)",%%mm3\n\t" \
/*4-7 rotation by 7pi/16. \
mm7=C1, mm3=C7, mm1=X1, X7=0.*/ \
"pmulhw %%mm1,%%mm3\n\t" \
"pmulhw %%mm1,%%mm7\n\t" \
"movq "OC_MEM_OFFS(0x40,c)",%%mm4\n\t" \
"movq %%mm3,%%mm6\n\t" \
"paddw %%mm1,%%mm7\n\t" \
/*0-1 butterfly. \
mm4=C4, mm0=X0, X4=0.*/ \
/*Stage 2:*/ \
/*4-5 butterfly: mm3=t[4], mm5=t[5] \
7-6 butterfly: mm2=t[6], mm7=t[7]*/ \
"psubw %%mm5,%%mm3\n\t" \
"paddw %%mm5,%%mm6\n\t" \
"movq %%mm4,%%mm1\n\t" \
"pmulhw %%mm0,%%mm4\n\t" \
"paddw %%mm0,%%mm4\n\t" \
"movq %%mm7,%%mm0\n\t" \
"movq %%mm4,%%mm5\n\t" \
"paddw %%mm2,%%mm0\n\t" \
"psubw %%mm2,%%mm7\n\t" \
"movq %%mm1,%%mm2\n\t" \
"pmulhw %%mm6,%%mm1\n\t" \
"pmulhw %%mm7,%%mm2\n\t" \
"paddw %%mm6,%%mm1\n\t" \
"movq "OC_MEM_OFFS(0x00,buf)",%%mm6\n\t" \
"paddw %%mm7,%%mm2\n\t" \
"movq "OC_MEM_OFFS(0x10,buf)",%%mm7\n\t" \
/*Stage 3: \
6-5 butterfly: mm1=t[5], mm2=t[6] -> mm1=t[6]+t[5], mm2=t[6]-t[5] \
0-3 butterfly: mm4=t[0], mm7=t[3] -> mm7=t[0]+t[3], mm4=t[0]-t[3] \
1-2 butterfly: mm5=t[1], mm6=t[2] -> mm6=t[1]+t[2], mm5=t[1]-t[2]*/ \
"paddw %%mm2,%%mm1\n\t" \
"paddw %%mm5,%%mm6\n\t" \
"paddw %%mm4,%%mm7\n\t" \
"paddw %%mm2,%%mm2\n\t" \
"paddw %%mm4,%%mm4\n\t" \
"paddw %%mm5,%%mm5\n\t" \
"psubw %%mm1,%%mm2\n\t" \
"psubw %%mm7,%%mm4\n\t" \
"psubw %%mm6,%%mm5\n\t" \
/*Stage 4: \
0-7 butterfly: mm7=t[0], mm0=t[7] -> mm0=t[0]+t[7], mm7=t[0]-t[7] \
1-6 butterfly: mm6=t[1], mm1=t[6] -> mm1=t[1]+t[6], mm6=t[1]-t[6] \
2-5 butterfly: mm5=t[2], mm2=t[5] -> mm2=t[2]+t[5], mm5=t[2]-t[5] \
3-4 butterfly: mm4=t[3], mm3=t[4] -> mm3=t[3]+t[4], mm4=t[3]-t[4]*/ \
"psubw %%mm0,%%mm7\n\t" \
"psubw %%mm1,%%mm6\n\t" \
"psubw %%mm2,%%mm5\n\t" \
"psubw %%mm3,%%mm4\n\t" \
"paddw %%mm0,%%mm0\n\t" \
"paddw %%mm1,%%mm1\n\t" \
"paddw %%mm2,%%mm2\n\t" \
"paddw %%mm3,%%mm3\n\t" \
"paddw %%mm7,%%mm0\n\t" \
"paddw %%mm6,%%mm1\n\t" \
"paddw %%mm5,%%mm2\n\t" \
"paddw %%mm4,%%mm3\n\t" \
#define OC_IDCT_8x8_10_ABC \
"#OC_IDCT_8x8_10_ABC\n\t" \
/*Stage 1:*/ \
/*2-3 rotation by 6pi/16. \
xmm7=C6, xmm6=C2, xmm2=X2, X6=0.*/ \
"movdqa "OC_MEM_OFFS(0x60,c)",%%xmm7\n\t" \
"movdqa "OC_MEM_OFFS(0x20,c)",%%xmm6\n\t" \
"pmulhw %%xmm2,%%xmm6\n\t" \
"pmulhw %%xmm2,%%xmm7\n\t" \
"movdqa "OC_MEM_OFFS(0x50,c)",%%xmm5\n\t" \
"paddw %%xmm6,%%xmm2\n\t" \
"movdqa %%xmm2,"OC_MEM_OFFS(0x10,buf)"\n\t" \
"movdqa "OC_MEM_OFFS(0x30,c)",%%xmm2\n\t" \
"movdqa %%xmm7,"OC_MEM_OFFS(0x00,buf)"\n\t" \
/*5-6 rotation by 3pi/16. \
xmm5=C5, xmm2=C3, xmm3=X3, X5=0.*/ \
"pmulhw %%xmm3,%%xmm5\n\t" \
"pmulhw %%xmm3,%%xmm2\n\t" \
"movdqa "OC_MEM_OFFS(0x10,c)",%%xmm7\n\t" \
"paddw %%xmm3,%%xmm5\n\t" \
"paddw %%xmm3,%%xmm2\n\t" \
"movdqa "OC_MEM_OFFS(0x70,c)",%%xmm3\n\t" \
/*4-7 rotation by 7pi/16. \
xmm7=C1, xmm3=C7, xmm1=X1, X7=0.*/ \
"pmulhw %%xmm1,%%xmm3\n\t" \
"pmulhw %%xmm1,%%xmm7\n\t" \
"movdqa "OC_MEM_OFFS(0x40,c)",%%xmm4\n\t" \
"movdqa %%xmm3,%%xmm6\n\t" \
"paddw %%xmm1,%%xmm7\n\t" \
/*0-1 butterfly. \
xmm4=C4, xmm0=X0, X4=0.*/ \
/*Stage 2:*/ \
/*4-5 butterfly: xmm3=t[4], xmm5=t[5] \
7-6 butterfly: xmm2=t[6], xmm7=t[7]*/ \
"psubw %%xmm5,%%xmm3\n\t" \
"paddw %%xmm5,%%xmm6\n\t" \
"movdqa %%xmm4,%%xmm1\n\t" \
"pmulhw %%xmm0,%%xmm4\n\t" \
"paddw %%xmm0,%%xmm4\n\t" \
"movdqa %%xmm7,%%xmm0\n\t" \
"movdqa %%xmm4,%%xmm5\n\t" \
"paddw %%xmm2,%%xmm0\n\t" \
"psubw %%xmm2,%%xmm7\n\t" \
"movdqa %%xmm1,%%xmm2\n\t" \
"pmulhw %%xmm6,%%xmm1\n\t" \
"pmulhw %%xmm7,%%xmm2\n\t" \
"paddw %%xmm6,%%xmm1\n\t" \
"movdqa "OC_MEM_OFFS(0x00,buf)",%%xmm6\n\t" \
"paddw %%xmm7,%%xmm2\n\t" \
"movdqa "OC_MEM_OFFS(0x10,buf)",%%xmm7\n\t" \
/*Stage 3: \
6-5 butterfly: xmm1=t[5], xmm2=t[6] -> xmm1=t[6]+t[5], xmm2=t[6]-t[5] \
0-3 butterfly: xmm4=t[0], xmm7=t[3] -> xmm7=t[0]+t[3], xmm4=t[0]-t[3] \
1-2 butterfly: xmm5=t[1], xmm6=t[2] -> xmm6=t[1]+t[2], xmm5=t[1]-t[2]*/ \
"paddw %%xmm2,%%xmm1\n\t" \
"paddw %%xmm5,%%xmm6\n\t" \
"paddw %%xmm4,%%xmm7\n\t" \
"paddw %%xmm2,%%xmm2\n\t" \
"paddw %%xmm4,%%xmm4\n\t" \
"paddw %%xmm5,%%xmm5\n\t" \
"psubw %%xmm1,%%xmm2\n\t" \
"psubw %%xmm7,%%xmm4\n\t" \
"psubw %%xmm6,%%xmm5\n\t" \
static void oc_idct8x8_10_sse2(ogg_int16_t _y[64],ogg_int16_t _x[64]){
OC_ALIGN16(ogg_int16_t buf[16]);
/*This routine accepts an 8x8 matrix pre-transposed.*/
__asm__ __volatile__(
"movq "OC_MEM_OFFS(0x20,x)",%%mm2\n\t"
"movq "OC_MEM_OFFS(0x30,x)",%%mm3\n\t"
"movq "OC_MEM_OFFS(0x10,x)",%%mm1\n\t"
"movq "OC_MEM_OFFS(0x00,x)",%%mm0\n\t"
OC_IDCT_8x8_10_MMX
OC_TRANSPOSE_8x4_MMX2SSE
OC_IDCT_8x8_10_ABC
OC_IDCT_8x8_D_STORE
:[buf]"=m"(OC_ARRAY_OPERAND(short,buf,16)),
[y]"=m"(OC_ARRAY_OPERAND(ogg_int16_t,_y,64))
:[x]"m"OC_CONST_ARRAY_OPERAND(ogg_int16_t,_x,64),
[c]"m"(OC_CONST_ARRAY_OPERAND(ogg_int16_t,OC_IDCT_CONSTS,128))
);
/*Clear input data for next block (decoder only).*/
__asm__ __volatile__(
"pxor %%mm0,%%mm0\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x00,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x10,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x20,x)"\n\t"
"movq %%mm0,"OC_MEM_OFFS(0x30,x)"\n\t"
:[x]"+m"(OC_ARRAY_OPERAND(ogg_int16_t,_x,28))
);
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the orthonormal
version of the transform.*/
void oc_idct8x8_sse2(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi){
/*_last_zzi is subtly different from an actual count of the number of
coefficients we decoded for this block.
It contains the value of zzi BEFORE the final token in the block was
decoded.
In most cases this is an EOB token (the continuation of an EOB run from a
previous block counts), and so this is the same as the coefficient count.
However, in the case that the last token was NOT an EOB token, but filled
the block up with exactly 64 coefficients, _last_zzi will be less than 64.
Provided the last token was not a pure zero run, the minimum value it can
be is 46, and so that doesn't affect any of the cases in this routine.
However, if the last token WAS a pure zero run of length 63, then _last_zzi
will be 1 while the number of coefficients decoded is 64.
Thus, we will trigger the following special case, where the real
coefficient count would not.
Note also that a zero run of length 64 will give _last_zzi a value of 0,
but we still process the DC coefficient, which might have a non-zero value
due to DC prediction.
Although convoluted, this is arguably the correct behavior: it allows us to
use a smaller transform when the block ends with a long zero run instead
of a normal EOB token.
It could be smarter... multiple separate zero runs at the end of a block
will fool it, but an encoder that generates these really deserves what it
gets.
Needless to say we inherited this approach from VP3.*/
/*Then perform the iDCT.*/
if(_last_zzi<=10)oc_idct8x8_10_sse2(_y,_x);
else oc_idct8x8_slow_sse2(_y,_x);
}
#endif
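Several rotations in OC_IDCT_8x8_ABC follow a pmulhw with a paddw of the same input. A likely reading is that the constant involved is larger than 0x7FFF and therefore wraps to a negative value in a signed 16-bit lane, so the add corrects the product. The sketch below emulates one lane in scalar C. The function names are hypothetical, the values 46341 and 64277 are assumed to be OC_C4S4 and OC_C1S7 from dct.h (not shown in this diff), and an arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

/*One 16-bit lane of pmulhw: the high half of a signed 16x16-bit product.
  Assumes >> on a negative value is an arithmetic shift.*/
int16_t sketch_pmulhw(int16_t _a,int16_t _b){
  return (int16_t)(((int32_t)_a*_b)>>16);
}

/*Computes (_x*_c)>>16 for a constant _c in [0x8000,0xFFFF].
  Such a constant does not fit in a signed lane: the register effectively
   holds _c-65536, so pmulhw yields ((_x*_c)>>16)-_x, and adding _x back
   (the paddw) restores the intended product.*/
int16_t sketch_mul_hi_const(int16_t _x,uint16_t _c){
  int16_t c_signed;
  assert(_c>=0x8000);
  c_signed=(int16_t)((int32_t)_c-65536);
  return (int16_t)(sketch_pmulhw(_x,c_signed)+_x);
}
```

Constants below 0x8000 need no correction, which would explain why some multiplies above have no matching paddw.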


@ -1,242 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: sse2trans.h 15675 2009-02-06 09:43:27Z tterribe $
********************************************************************/
#if !defined(_x86_sse2trans_H)
# define _x86_sse2trans_H (1)
# include "x86int.h"
# if defined(OC_X86_64_ASM)
/*On x86-64 we can transpose in-place without spilling registers.
By clever choices of the order to apply the butterflies and the order of
their outputs, we can take the rows in order and output the columns in order
without any extra operations and using just one temporary register.*/
# define OC_TRANSPOSE_8x8 \
"#OC_TRANSPOSE_8x8\n\t" \
"movdqa %%xmm4,%%xmm8\n\t" \
/*xmm4 = f3 e3 f2 e2 f1 e1 f0 e0*/ \
"punpcklwd %%xmm5,%%xmm4\n\t" \
/*xmm8 = f7 e7 f6 e6 f5 e5 f4 e4*/ \
"punpckhwd %%xmm5,%%xmm8\n\t" \
/*xmm5 is free.*/ \
"movdqa %%xmm0,%%xmm5\n\t" \
/*xmm0 = b3 a3 b2 a2 b1 a1 b0 a0*/ \
"punpcklwd %%xmm1,%%xmm0\n\t" \
/*xmm5 = b7 a7 b6 a6 b5 a5 b4 a4*/ \
"punpckhwd %%xmm1,%%xmm5\n\t" \
/*xmm1 is free.*/ \
"movdqa %%xmm6,%%xmm1\n\t" \
/*xmm6 = h3 g3 h2 g2 h1 g1 h0 g0*/ \
"punpcklwd %%xmm7,%%xmm6\n\t" \
/*xmm1 = h7 g7 h6 g6 h5 g5 h4 g4*/ \
"punpckhwd %%xmm7,%%xmm1\n\t" \
/*xmm7 is free.*/ \
"movdqa %%xmm2,%%xmm7\n\t" \
/*xmm2 = d7 c7 d6 c6 d5 c5 d4 c4*/ \
"punpckhwd %%xmm3,%%xmm2\n\t" \
/*xmm7 = d3 c3 d2 c2 d1 c1 d0 c0*/ \
"punpcklwd %%xmm3,%%xmm7\n\t" \
/*xmm3 is free.*/ \
"movdqa %%xmm0,%%xmm3\n\t" \
/*xmm0 = d1 c1 b1 a1 d0 c0 b0 a0*/ \
"punpckldq %%xmm7,%%xmm0\n\t" \
/*xmm3 = d3 c3 b3 a3 d2 c2 b2 a2*/ \
"punpckhdq %%xmm7,%%xmm3\n\t" \
/*xmm7 is free.*/ \
"movdqa %%xmm5,%%xmm7\n\t" \
/*xmm5 = d5 c5 b5 a5 d4 c4 b4 a4*/ \
"punpckldq %%xmm2,%%xmm5\n\t" \
/*xmm7 = d7 c7 b7 a7 d6 c6 b6 a6*/ \
"punpckhdq %%xmm2,%%xmm7\n\t" \
/*xmm2 is free.*/ \
"movdqa %%xmm4,%%xmm2\n\t" \
/*xmm4 = h3 g3 f3 e3 h2 g2 f2 e2*/ \
"punpckhdq %%xmm6,%%xmm4\n\t" \
/*xmm2 = h1 g1 f1 e1 h0 g0 f0 e0*/ \
"punpckldq %%xmm6,%%xmm2\n\t" \
/*xmm6 is free.*/ \
"movdqa %%xmm8,%%xmm6\n\t" \
/*xmm6 = h5 g5 f5 e5 h4 g4 f4 e4*/ \
"punpckldq %%xmm1,%%xmm6\n\t" \
/*xmm8 = h7 g7 f7 e7 h6 g6 f6 e6*/ \
"punpckhdq %%xmm1,%%xmm8\n\t" \
/*xmm1 is free.*/ \
"movdqa %%xmm0,%%xmm1\n\t" \
/*xmm0 = h0 g0 f0 e0 d0 c0 b0 a0*/ \
"punpcklqdq %%xmm2,%%xmm0\n\t" \
/*xmm1 = h1 g1 f1 e1 d1 c1 b1 a1*/ \
"punpckhqdq %%xmm2,%%xmm1\n\t" \
/*xmm2 is free.*/ \
"movdqa %%xmm3,%%xmm2\n\t" \
/*xmm3 = h3 g3 f3 e3 d3 c3 b3 a3*/ \
"punpckhqdq %%xmm4,%%xmm3\n\t" \
/*xmm2 = h2 g2 f2 e2 d2 c2 b2 a2*/ \
"punpcklqdq %%xmm4,%%xmm2\n\t" \
/*xmm4 is free.*/ \
"movdqa %%xmm5,%%xmm4\n\t" \
/*xmm5 = h5 g5 f5 e5 d5 c5 b5 a5*/ \
"punpckhqdq %%xmm6,%%xmm5\n\t" \
/*xmm4 = h4 g4 f4 e4 d4 c4 b4 a4*/ \
"punpcklqdq %%xmm6,%%xmm4\n\t" \
/*xmm6 is free.*/ \
"movdqa %%xmm7,%%xmm6\n\t" \
/*xmm7 = h7 g7 f7 e7 d7 c7 b7 a7*/ \
"punpckhqdq %%xmm8,%%xmm7\n\t" \
/*xmm6 = h6 g6 f6 e6 d6 c6 b6 a6*/ \
"punpcklqdq %%xmm8,%%xmm6\n\t" \
/*xmm8 is free.*/ \
# else
/*Otherwise, we need to spill some values to %[buf] temporarily.
Again, the butterflies are carefully arranged to get the columns to come out
in order, minimizing register spills and maximizing the delay between a load
and when the value loaded is actually used.*/
# define OC_TRANSPOSE_8x8 \
"#OC_TRANSPOSE_8x8\n\t" \
/*buf[0] = a7 a6 a5 a4 a3 a2 a1 a0*/ \
"movdqa %%xmm0,"OC_MEM_OFFS(0x00,buf)"\n\t" \
/*xmm0 is free.*/ \
"movdqa %%xmm2,%%xmm0\n\t" \
/*xmm2 = d7 c7 d6 c6 d5 c5 d4 c4*/ \
"punpckhwd %%xmm3,%%xmm2\n\t" \
/*xmm0 = d3 c3 d2 c2 d1 c1 d0 c0*/ \
"punpcklwd %%xmm3,%%xmm0\n\t" \
/*xmm3 = a7 a6 a5 a4 a3 a2 a1 a0*/ \
"movdqa "OC_MEM_OFFS(0x00,buf)",%%xmm3\n\t" \
/*buf[1] = d7 c7 d6 c6 d5 c5 d4 c4*/ \
"movdqa %%xmm2,"OC_MEM_OFFS(0x10,buf)"\n\t" \
/*xmm2 is free.*/ \
"movdqa %%xmm6,%%xmm2\n\t" \
/*xmm6 = h3 g3 h2 g2 h1 g1 h0 g0*/ \
"punpcklwd %%xmm7,%%xmm6\n\t" \
/*xmm2 = h7 g7 h6 g6 h5 g5 h4 g4*/ \
"punpckhwd %%xmm7,%%xmm2\n\t" \
/*xmm7 is free.*/ \
"movdqa %%xmm4,%%xmm7\n\t" \
/*xmm4 = f3 e3 f2 e2 f1 e1 f0 e0*/ \
"punpcklwd %%xmm5,%%xmm4\n\t" \
/*xmm7 = f7 e7 f6 e6 f5 e5 f4 e4*/ \
"punpckhwd %%xmm5,%%xmm7\n\t" \
/*xmm5 is free.*/ \
"movdqa %%xmm3,%%xmm5\n\t" \
/*xmm3 = b3 a3 b2 a2 b1 a1 b0 a0*/ \
"punpcklwd %%xmm1,%%xmm3\n\t" \
/*xmm5 = b7 a7 b6 a6 b5 a5 b4 a4*/ \
"punpckhwd %%xmm1,%%xmm5\n\t" \
/*xmm1 is free.*/ \
"movdqa %%xmm7,%%xmm1\n\t" \
/*xmm7 = h5 g5 f5 e5 h4 g4 f4 e4*/ \
"punpckldq %%xmm2,%%xmm7\n\t" \
/*xmm1 = h7 g7 f7 e7 h6 g6 f6 e6*/ \
"punpckhdq %%xmm2,%%xmm1\n\t" \
/*xmm2 = d7 c7 d6 c6 d5 c5 d4 c4*/ \
"movdqa "OC_MEM_OFFS(0x10,buf)",%%xmm2\n\t" \
/*buf[0] = h7 g7 f7 e7 h6 g6 f6 e6*/ \
"movdqa %%xmm1,"OC_MEM_OFFS(0x00,buf)"\n\t" \
/*xmm1 is free.*/ \
"movdqa %%xmm3,%%xmm1\n\t" \
/*xmm3 = d3 c3 b3 a3 d2 c2 b2 a2*/ \
"punpckhdq %%xmm0,%%xmm3\n\t" \
/*xmm1 = d1 c1 b1 a1 d0 c0 b0 a0*/ \
"punpckldq %%xmm0,%%xmm1\n\t" \
/*xmm0 is free.*/ \
"movdqa %%xmm4,%%xmm0\n\t" \
/*xmm4 = h3 g3 f3 e3 h2 g2 f2 e2*/ \
"punpckhdq %%xmm6,%%xmm4\n\t" \
/*xmm0 = h1 g1 f1 e1 h0 g0 f0 e0*/ \
"punpckldq %%xmm6,%%xmm0\n\t" \
/*xmm6 is free.*/ \
"movdqa %%xmm5,%%xmm6\n\t" \
/*xmm5 = d5 c5 b5 a5 d4 c4 b4 a4*/ \
"punpckldq %%xmm2,%%xmm5\n\t" \
/*xmm6 = d7 c7 b7 a7 d6 c6 b6 a6*/ \
"punpckhdq %%xmm2,%%xmm6\n\t" \
/*xmm2 is free.*/ \
"movdqa %%xmm1,%%xmm2\n\t" \
/*xmm1 = h1 g1 f1 e1 d1 c1 b1 a1*/ \
"punpckhqdq %%xmm0,%%xmm1\n\t" \
/*xmm2 = h0 g0 f0 e0 d0 c0 b0 a0*/ \
"punpcklqdq %%xmm0,%%xmm2\n\t" \
/*xmm0 = h7 g7 f7 e7 h6 g6 f6 e6*/ \
"movdqa "OC_MEM_OFFS(0x00,buf)",%%xmm0\n\t" \
/*buf[1] = h0 g0 f0 e0 d0 c0 b0 a0*/ \
"movdqa %%xmm2,"OC_MEM_OFFS(0x10,buf)"\n\t" \
/*xmm2 is free.*/ \
"movdqa %%xmm3,%%xmm2\n\t" \
/*xmm3 = h3 g3 f3 e3 d3 c3 b3 a3*/ \
"punpckhqdq %%xmm4,%%xmm3\n\t" \
/*xmm2 = h2 g2 f2 e2 d2 c2 b2 a2*/ \
"punpcklqdq %%xmm4,%%xmm2\n\t" \
/*xmm4 is free.*/ \
"movdqa %%xmm5,%%xmm4\n\t" \
/*xmm5 = h5 g5 f5 e5 d5 c5 b5 a5*/ \
"punpckhqdq %%xmm7,%%xmm5\n\t" \
/*xmm4 = h4 g4 f4 e4 d4 c4 b4 a4*/ \
"punpcklqdq %%xmm7,%%xmm4\n\t" \
/*xmm7 is free.*/ \
"movdqa %%xmm6,%%xmm7\n\t" \
/*xmm6 = h6 g6 f6 e6 d6 c6 b6 a6*/ \
"punpcklqdq %%xmm0,%%xmm6\n\t" \
/*xmm7 = h7 g7 f7 e7 d7 c7 b7 a7*/ \
"punpckhqdq %%xmm0,%%xmm7\n\t" \
/*xmm0 = h0 g0 f0 e0 d0 c0 b0 a0*/ \
"movdqa "OC_MEM_OFFS(0x10,buf)",%%xmm0\n\t" \
# endif
/*Transpose 4 values in each of 8 MMX registers into 8 values in the first
four SSE registers.
No need to be clever here; we have plenty of room.*/
# define OC_TRANSPOSE_8x4_MMX2SSE \
"#OC_TRANSPOSE_8x4_MMX2SSE\n\t" \
"movq2dq %%mm0,%%xmm0\n\t" \
"movq2dq %%mm1,%%xmm1\n\t" \
/*xmm0 = b3 a3 b2 a2 b1 a1 b0 a0*/ \
"punpcklwd %%xmm1,%%xmm0\n\t" \
"movq2dq %%mm2,%%xmm3\n\t" \
"movq2dq %%mm3,%%xmm2\n\t" \
/*xmm3 = d3 c3 d2 c2 d1 c1 d0 c0*/ \
"punpcklwd %%xmm2,%%xmm3\n\t" \
"movq2dq %%mm4,%%xmm4\n\t" \
"movq2dq %%mm5,%%xmm5\n\t" \
/*xmm4 = f3 e3 f2 e2 f1 e1 f0 e0*/ \
"punpcklwd %%xmm5,%%xmm4\n\t" \
"movq2dq %%mm6,%%xmm7\n\t" \
"movq2dq %%mm7,%%xmm6\n\t" \
/*xmm7 = h3 g3 h2 g2 h1 g1 h0 g0*/ \
"punpcklwd %%xmm6,%%xmm7\n\t" \
"movdqa %%xmm0,%%xmm2\n\t" \
/*xmm0 = d1 c1 b1 a1 d0 c0 b0 a0*/ \
"punpckldq %%xmm3,%%xmm0\n\t" \
/*xmm2 = d3 c3 b3 a3 d2 c2 b2 a2*/ \
"punpckhdq %%xmm3,%%xmm2\n\t" \
"movdqa %%xmm4,%%xmm5\n\t" \
/*xmm4 = h1 g1 f1 e1 h0 g0 f0 e0*/ \
"punpckldq %%xmm7,%%xmm4\n\t" \
/*xmm5 = h3 g3 f3 e3 h2 g2 f2 e2*/ \
"punpckhdq %%xmm7,%%xmm5\n\t" \
"movdqa %%xmm0,%%xmm1\n\t" \
/*xmm0 = h0 g0 f0 e0 d0 c0 b0 a0*/ \
"punpcklqdq %%xmm4,%%xmm0\n\t" \
/*xmm1 = h1 g1 f1 e1 d1 c1 b1 a1*/ \
"punpckhqdq %%xmm4,%%xmm1\n\t" \
"movdqa %%xmm2,%%xmm3\n\t" \
/*xmm2 = h2 g2 f2 e2 d2 c2 b2 a2*/ \
"punpcklqdq %%xmm5,%%xmm2\n\t" \
/*xmm3 = h3 g3 f3 e3 d3 c3 b3 a3*/ \
"punpckhqdq %%xmm5,%%xmm3\n\t" \
#endif
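The transposes above all use the same three-stage pattern: interleave 16-bit words, then 32-bit pairs, then 64-bit halves. The scalar sketch below emulates that pattern so the data movement can be checked without SSE2. It ignores register allocation and spilling entirely; sketch_unpack and sketch_transpose8x8 are hypothetical names, and index 0 of each array is taken to be the lowest lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*Emulates punpckl/punpckh at a granularity of _g 16-bit words:
   _g=1 is wd, _g=2 is dq, _g=4 is qdq.
  Takes the low (or, if _high is nonzero, the high) four words of _a and _b
   and interleaves them in groups of _g, with _a's group first.*/
void sketch_unpack(int16_t _out[8],const int16_t _a[8],const int16_t _b[8],
 int _g,int _high){
  int16_t tmp[8];
  int     base;
  int     i;
  int     j;
  int     k;
  assert(_g==1||_g==2||_g==4);
  base=_high?4:0;
  k=0;
  for(i=0;i<4;i+=_g){
    for(j=0;j<_g;j++)tmp[k++]=_a[base+i+j];
    for(j=0;j<_g;j++)tmp[k++]=_b[base+i+j];
  }
  memcpy(_out,tmp,sizeof(tmp));
}

/*Transposes an 8x8 block of 16-bit values in place using three rounds of
   interleaves.*/
void sketch_transpose8x8(int16_t _m[8][8]){
  int16_t t[8][8];
  int16_t u[8][8];
  int     i;
  /*Stage 1: interleave words of adjacent row pairs (a,b), (c,d), ...*/
  for(i=0;i<4;i++){
    sketch_unpack(t[2*i],_m[2*i],_m[2*i+1],1,0);
    sketch_unpack(t[2*i+1],_m[2*i],_m[2*i+1],1,1);
  }
  /*Stage 2: interleave 32-bit pairs, giving two columns of four rows each.*/
  for(i=0;i<2;i++){
    sketch_unpack(u[4*i],t[4*i],t[4*i+2],2,0);
    sketch_unpack(u[4*i+1],t[4*i],t[4*i+2],2,1);
    sketch_unpack(u[4*i+2],t[4*i+1],t[4*i+3],2,0);
    sketch_unpack(u[4*i+3],t[4*i+1],t[4*i+3],2,1);
  }
  /*Stage 3: join the 64-bit halves from rows a-d and rows e-h.*/
  for(i=0;i<4;i++){
    sketch_unpack(_m[2*i],u[i],u[i+4],4,0);
    sketch_unpack(_m[2*i+1],u[i],u[i+4],4,1);
  }
}
```

The macros above perform the same interleaves but order them so that each result overwrites a register that has just become free.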


@ -1,182 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
CPU capability detection for x86 processors.
Originally written by Rudolf Marek.
function:
last mod: $Id$
********************************************************************/
#include "x86cpu.h"
#if !defined(OC_X86_ASM)
ogg_uint32_t oc_cpu_flags_get(void){
return 0;
}
#else
# if defined(__amd64__)||defined(__x86_64__)
/*On x86-64, gcc seems to be able to figure out how to save %rbx for us when
compiling with -fPIC.*/
# define cpuid(_op,_eax,_ebx,_ecx,_edx) \
__asm__ __volatile__( \
"cpuid\n\t" \
:[eax]"=a"(_eax),[ebx]"=b"(_ebx),[ecx]"=c"(_ecx),[edx]"=d"(_edx) \
:"a"(_op) \
:"cc" \
)
# else
/*On x86-32, not so much.*/
# define cpuid(_op,_eax,_ebx,_ecx,_edx) \
__asm__ __volatile__( \
"xchgl %%ebx,%[ebx]\n\t" \
"cpuid\n\t" \
"xchgl %%ebx,%[ebx]\n\t" \
:[eax]"=a"(_eax),[ebx]"=r"(_ebx),[ecx]"=c"(_ecx),[edx]"=d"(_edx) \
:"a"(_op) \
:"cc" \
)
# endif
static ogg_uint32_t oc_parse_intel_flags(ogg_uint32_t _edx,ogg_uint32_t _ecx){
ogg_uint32_t flags;
/*If there isn't even MMX, give up.*/
if(!(_edx&0x00800000))return 0;
flags=OC_CPU_X86_MMX;
if(_edx&0x02000000)flags|=OC_CPU_X86_MMXEXT|OC_CPU_X86_SSE;
if(_edx&0x04000000)flags|=OC_CPU_X86_SSE2;
if(_ecx&0x00000001)flags|=OC_CPU_X86_PNI;
if(_ecx&0x00000100)flags|=OC_CPU_X86_SSSE3;
if(_ecx&0x00080000)flags|=OC_CPU_X86_SSE4_1;
if(_ecx&0x00100000)flags|=OC_CPU_X86_SSE4_2;
return flags;
}
static ogg_uint32_t oc_parse_amd_flags(ogg_uint32_t _edx,ogg_uint32_t _ecx){
ogg_uint32_t flags;
/*If there isn't even MMX, give up.*/
if(!(_edx&0x00800000))return 0;
flags=OC_CPU_X86_MMX;
if(_edx&0x00400000)flags|=OC_CPU_X86_MMXEXT;
if(_edx&0x80000000)flags|=OC_CPU_X86_3DNOW;
if(_edx&0x40000000)flags|=OC_CPU_X86_3DNOWEXT;
if(_ecx&0x00000040)flags|=OC_CPU_X86_SSE4A;
if(_ecx&0x00000800)flags|=OC_CPU_X86_SSE5;
return flags;
}
ogg_uint32_t oc_cpu_flags_get(void){
ogg_uint32_t flags;
ogg_uint32_t eax;
ogg_uint32_t ebx;
ogg_uint32_t ecx;
ogg_uint32_t edx;
# if !defined(__amd64__)&&!defined(__x86_64__)
/*Not all x86-32 chips support cpuid, so we have to check.*/
__asm__ __volatile__(
"pushfl\n\t"
"pushfl\n\t"
"popl %[a]\n\t"
"movl %[a],%[b]\n\t"
"xorl $0x200000,%[a]\n\t"
"pushl %[a]\n\t"
"popfl\n\t"
"pushfl\n\t"
"popl %[a]\n\t"
"popfl\n\t"
:[a]"=r"(eax),[b]"=r"(ebx)
:
:"cc"
);
/*No cpuid.*/
if(eax==ebx)return 0;
# endif
cpuid(0,eax,ebx,ecx,edx);
/* l e t n I e n i u n e G*/
if(ecx==0x6C65746E&&edx==0x49656E69&&ebx==0x756E6547||
/* 6 8 x M T e n i u n e G*/
ecx==0x3638784D&&edx==0x54656E69&&ebx==0x756E6547){
int family;
int model;
/*Intel, Transmeta (tested with Crusoe TM5800):*/
cpuid(1,eax,ebx,ecx,edx);
flags=oc_parse_intel_flags(edx,ecx);
family=(eax>>8)&0xF;
model=(eax>>4)&0xF;
/*The SSE unit on the Pentium M and Core Duo is much slower than the MMX
unit, so don't use it.*/
if(family==6&&(model==9||model==13||model==14)){
flags&=~(OC_CPU_X86_SSE2|OC_CPU_X86_PNI);
}
}
/* D M A c i t n e h t u A*/
else if(ecx==0x444D4163&&edx==0x69746E65&&ebx==0x68747541||
/* C S N y b e d o e G*/
ecx==0x43534e20&&edx==0x79622065&&ebx==0x646f6547){
/*AMD, Geode:*/
cpuid(0x80000000,eax,ebx,ecx,edx);
if(eax<0x80000001)flags=0;
else{
cpuid(0x80000001,eax,ebx,ecx,edx);
flags=oc_parse_amd_flags(edx,ecx);
}
/*Also check for SSE.*/
cpuid(1,eax,ebx,ecx,edx);
flags|=oc_parse_intel_flags(edx,ecx);
}
/*Technically some VIA chips can be configured in the BIOS to return any
string here the user wants.
There is a special detection method that can be used to identify such
processors, but in my opinion, if the user really wants to change it, they
deserve what they get.*/
/* s l u a H r u a t n e C*/
else if(ecx==0x736C7561&&edx==0x48727561&&ebx==0x746E6543){
/*VIA:*/
/*I only have documentation for the C7 (Esther) and Isaiah (forthcoming)
chips (thanks to the engineers from Centaur Technology who provided it).
These chips support Intel-like cpuid info.
The C3-2 (Nehemiah) cores appear to, as well.*/
cpuid(1,eax,ebx,ecx,edx);
flags=oc_parse_intel_flags(edx,ecx);
if(eax>=0x80000001){
/*The (non-Nehemiah) C3 processors support AMD-like cpuid info.
We need to check this even if the Intel test succeeds to pick up 3DNow!
support on these processors.
Unlike actual AMD processors, we cannot _rely_ on this info, since
some cores (e.g., the 693 stepping of the Nehemiah) claim to support
this function, yet return edx=0, despite the Intel test indicating
MMX support.
Therefore the features detected here are strictly added to those
detected by the Intel test.*/
/*TODO: How about earlier chips?*/
cpuid(0x80000001,eax,ebx,ecx,edx);
/*Note: As of the C7, this function returns Intel-style extended feature
flags, not AMD-style.
Currently, this only defines bits 11, 20, and 29 (0x20100800), which
do not conflict with any of the AMD flags we inspect.
For the remaining bits, Intel tells us, "Do not count on their value",
but VIA assures us that they will all be zero (at least on the C7 and
Isaiah chips).
In the (unlikely) event a future processor uses bits 18, 19, 30, or 31
(0xC0C00000) for something else, we will have to add code to detect
the model to decide when it is appropriate to inspect them.*/
flags|=oc_parse_amd_flags(edx,ecx);
}
}
else{
/*Implement me.*/
flags=0;
}
return flags;
}
#endif
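/*A minimal, portable sketch (not part of libtheora) of how the vendor checks
above relate register values to the 12-character cpuid vendor string, and of
the family/model extraction used for the Pentium M test.
The names sketch_vendor_from_regs, sketch_cpuid_family, and sketch_cpuid_model
are hypothetical; in the real code the register values come from the cpuid()
macro.*/

```c
#include <stdint.h>

/*Unpacks the vendor string returned by cpuid function 0.
  The 12 characters are stored least-significant byte first in ebx, edx, ecx
   (in that order), which is why the hex constants in the comparisons above
   read backwards.*/
void sketch_vendor_from_regs(uint32_t _ebx,uint32_t _edx,uint32_t _ecx,
 char _out[13]){
  uint32_t regs[3];
  int      i;
  int      j;
  regs[0]=_ebx;
  regs[1]=_edx;
  regs[2]=_ecx;
  for(i=0;i<3;i++){
    for(j=0;j<4;j++)_out[4*i+j]=(char)((regs[i]>>(8*j))&0xFF);
  }
  _out[12]='\0';
}

/*Base family and model fields of eax from cpuid function 1, extracted the
   same way as in oc_cpu_flags_get() (extended fields are ignored).*/
int sketch_cpuid_family(uint32_t _eax){
  return (int)((_eax>>8)&0xF);
}

int sketch_cpuid_model(uint32_t _eax){
  return (int)((_eax>>4)&0xF);
}
```

/*For example, ebx=0x756E6547, edx=0x49656E69, ecx=0x6C65746E unpacks to
"GenuineIntel".*/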


@ -1,36 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_x86_x86cpu_H)
# define _x86_x86cpu_H (1)
#include "../internal.h"
#define OC_CPU_X86_MMX (1<<0)
#define OC_CPU_X86_3DNOW (1<<1)
#define OC_CPU_X86_3DNOWEXT (1<<2)
#define OC_CPU_X86_MMXEXT (1<<3)
#define OC_CPU_X86_SSE (1<<4)
#define OC_CPU_X86_SSE2 (1<<5)
#define OC_CPU_X86_PNI (1<<6)
#define OC_CPU_X86_SSSE3 (1<<7)
#define OC_CPU_X86_SSE4_1 (1<<8)
#define OC_CPU_X86_SSE4_2 (1<<9)
#define OC_CPU_X86_SSE4A (1<<10)
#define OC_CPU_X86_SSE5 (1<<11)
ogg_uint32_t oc_cpu_flags_get(void);
#endif


@ -1,122 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_x86_x86int_H)
# define _x86_x86int_H (1)
# include "../internal.h"
# if defined(OC_X86_ASM)
# define oc_state_accel_init oc_state_accel_init_x86
# if defined(OC_X86_64_ASM)
/*x86-64 guarantees SIMD support up through at least SSE2.
If the best routine we have available only needs SSE2 (which at the moment
covers all of them), then we can avoid runtime detection and the indirect
call.*/
# define oc_frag_copy(_state,_dst,_src,_ystride) \
oc_frag_copy_mmx(_dst,_src,_ystride)
# define oc_frag_copy_list(_state,_dst_frame,_src_frame,_ystride, \
_fragis,_nfragis,_frag_buf_offs) \
oc_frag_copy_list_mmx(_dst_frame,_src_frame,_ystride, \
_fragis,_nfragis,_frag_buf_offs)
# define oc_frag_recon_intra(_state,_dst,_ystride,_residue) \
oc_frag_recon_intra_mmx(_dst,_ystride,_residue)
# define oc_frag_recon_inter(_state,_dst,_src,_ystride,_residue) \
oc_frag_recon_inter_mmx(_dst,_src,_ystride,_residue)
# define oc_frag_recon_inter2(_state,_dst,_src1,_src2,_ystride,_residue) \
oc_frag_recon_inter2_mmx(_dst,_src1,_src2,_ystride,_residue)
# define oc_idct8x8(_state,_y,_x,_last_zzi) \
oc_idct8x8_sse2(_y,_x,_last_zzi)
# define oc_state_frag_recon oc_state_frag_recon_mmx
# define oc_loop_filter_init(_state,_bv,_flimit) \
oc_loop_filter_init_mmxext(_bv,_flimit)
# define oc_state_loop_filter_frag_rows oc_state_loop_filter_frag_rows_mmxext
# define oc_restore_fpu(_state) \
oc_restore_fpu_mmx()
# else
# define OC_STATE_USE_VTABLE (1)
# endif
# endif
# include "../state.h"
# include "x86cpu.h"
/*Converts the expression in the argument to a string.*/
#define OC_M2STR(_s) #_s
/*Memory operands do not always include an offset.
To avoid warnings, we force an offset with %H (which adds 8).*/
# if __GNUC_PREREQ(4,0)
# define OC_MEM_OFFS(_offs,_name) \
OC_M2STR(_offs-8+%H[_name])
# endif
/*If your gcc version doesn't support %H, then you get to suffer the warnings.
Note that Apple's gas breaks on things like _offs+(%esp): it throws away the
whole offset, instead of substituting in 0 for the missing operand to +.*/
# if !defined(OC_MEM_OFFS)
# define OC_MEM_OFFS(_offs,_name) \
OC_M2STR(_offs+%[_name])
# endif
/*Declare an array operand with an exact size.
This tells gcc we're going to clobber this memory region, without having to
clobber all of "memory" and lets us access local buffers directly using the
stack pointer, without allocating a separate register to point to them.*/
#define OC_ARRAY_OPERAND(_type,_ptr,_size) \
(*({ \
struct{_type array_value__[(_size)];} *array_addr__=(void *)(_ptr); \
array_addr__; \
}))
/*Declare a read-only (const) array operand with an exact size.
This tells gcc we're going to read this memory region, without having to
clobber all of "memory" and lets us access local buffers directly using the
stack pointer, without allocating a separate register to point to them.*/
#define OC_CONST_ARRAY_OPERAND(_type,_ptr,_size) \
(*({ \
const struct{_type array_value__[(_size)];} *array_addr__= \
(const void *)(_ptr); \
array_addr__; \
}))
extern const unsigned short __attribute__((aligned(16))) OC_IDCT_CONSTS[64];
void oc_state_accel_init_x86(oc_theora_state *_state);
void oc_frag_copy_mmx(unsigned char *_dst,
const unsigned char *_src,int _ystride);
void oc_frag_copy_list_mmx(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_mmx(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_mmx(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue);
void oc_idct8x8_mmx(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_idct8x8_sse2(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_state_frag_recon_mmx(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void oc_loop_filter_init_mmx(signed char _bv[256],int _flimit);
void oc_loop_filter_init_mmxext(signed char _bv[256],int _flimit);
void oc_state_loop_filter_frag_rows_mmx(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end);
void oc_state_loop_filter_frag_rows_mmxext(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end);
void oc_restore_fpu_mmx(void);
#endif


@ -1,97 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include "x86int.h"
#if defined(OC_X86_ASM)
#if defined(OC_STATE_USE_VTABLE)
/*This table has been modified from OC_FZIG_ZAG by baking a 4x4 transpose into
each quadrant of the destination.*/
static const unsigned char OC_FZIG_ZAG_MMX[128]={
0, 8, 1, 2, 9,16,24,17,
10, 3,32,11,18,25, 4,12,
5,26,19,40,33,34,41,48,
27, 6,13,20,28,21,14, 7,
56,49,42,35,43,50,57,36,
15,22,29,30,23,44,37,58,
51,59,38,45,52,31,60,53,
46,39,47,54,61,62,55,63,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64
};
#endif
/*This table has been modified from OC_FZIG_ZAG by baking an 8x8 transpose into
the destination.*/
static const unsigned char OC_FZIG_ZAG_SSE2[128]={
0, 8, 1, 2, 9,16,24,17,
10, 3, 4,11,18,25,32,40,
33,26,19,12, 5, 6,13,20,
27,34,41,48,56,49,42,35,
28,21,14, 7,15,22,29,36,
43,50,57,58,51,44,37,30,
23,31,38,45,52,59,60,53,
46,39,47,54,61,62,55,63,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64
};
void oc_state_accel_init_x86(oc_theora_state *_state){
oc_state_accel_init_c(_state);
_state->cpu_flags=oc_cpu_flags_get();
# if defined(OC_STATE_USE_VTABLE)
if(_state->cpu_flags&OC_CPU_X86_MMX){
_state->opt_vtable.frag_copy=oc_frag_copy_mmx;
_state->opt_vtable.frag_copy_list=oc_frag_copy_list_mmx;
_state->opt_vtable.frag_recon_intra=oc_frag_recon_intra_mmx;
_state->opt_vtable.frag_recon_inter=oc_frag_recon_inter_mmx;
_state->opt_vtable.frag_recon_inter2=oc_frag_recon_inter2_mmx;
_state->opt_vtable.idct8x8=oc_idct8x8_mmx;
_state->opt_vtable.state_frag_recon=oc_state_frag_recon_mmx;
_state->opt_vtable.loop_filter_init=oc_loop_filter_init_mmx;
_state->opt_vtable.state_loop_filter_frag_rows=
oc_state_loop_filter_frag_rows_mmx;
_state->opt_vtable.restore_fpu=oc_restore_fpu_mmx;
_state->opt_data.dct_fzig_zag=OC_FZIG_ZAG_MMX;
}
if(_state->cpu_flags&OC_CPU_X86_MMXEXT){
_state->opt_vtable.loop_filter_init=oc_loop_filter_init_mmxext;
_state->opt_vtable.state_loop_filter_frag_rows=
oc_state_loop_filter_frag_rows_mmxext;
}
if(_state->cpu_flags&OC_CPU_X86_SSE2){
_state->opt_vtable.idct8x8=oc_idct8x8_sse2;
# endif
_state->opt_data.dct_fzig_zag=OC_FZIG_ZAG_SSE2;
# if defined(OC_STATE_USE_VTABLE)
}
# endif
}
#endif
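/*A sketch (not part of libtheora) of how the two tables above can be derived.
It assumes OC_FZIG_ZAG is the conventional 8x8 zig-zag scan order, which is
not shown in this file; the names SKETCH_ZIG_ZAG, sketch_fzig_zag_sse2, and
sketch_fzig_zag_mmx are hypothetical.
Only the first 64 entries are generated; the real tables pad the remaining 64
entries with 64.*/

```c
/*The conventional 8x8 zig-zag scan order (assumed to match OC_FZIG_ZAG).*/
static const unsigned char SKETCH_ZIG_ZAG[64]={
   0, 1, 8,16, 9, 2, 3,10,
  17,24,32,25,18,11, 4, 5,
  12,19,26,33,40,48,41,34,
  27,20,13, 6, 7,14,21,28,
  35,42,49,56,57,50,43,36,
  29,22,15,23,30,37,44,51,
  58,59,52,45,38,31,39,46,
  53,60,61,54,47,55,62,63
};

/*Bakes a full 8x8 transpose into the destination index: (r,c) -> (c,r).*/
void sketch_fzig_zag_sse2(unsigned char _out[64]){
  int i;
  for(i=0;i<64;i++){
    int r;
    int c;
    r=SKETCH_ZIG_ZAG[i]>>3;
    c=SKETCH_ZIG_ZAG[i]&7;
    _out[i]=(unsigned char)(c<<3|r);
  }
}

/*Bakes a 4x4 transpose into each quadrant: bit 2 of the row and column
   selects the quadrant and stays put, while the low two bits are swapped.*/
void sketch_fzig_zag_mmx(unsigned char _out[64]){
  int i;
  for(i=0;i<64;i++){
    int r;
    int c;
    r=SKETCH_ZIG_ZAG[i]>>3;
    c=SKETCH_ZIG_ZAG[i]&7;
    _out[i]=(unsigned char)(((r&4)|(c&3))<<3|((c&4)|(r&3)));
  }
}
```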


@ -1,416 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*MMX acceleration of fragment reconstruction for motion compensation.
Originally written by Rudolf Marek.
Additional optimization by Nils Pipenbrinck.
Note: Loops are unrolled for best performance.
The iteration each instruction belongs to is marked in the comments as #i.*/
#include <stddef.h>
#include "x86int.h"
#if defined(OC_X86_ASM)
/*Copies an 8x8 block of pixels from _src to _dst, assuming _ystride bytes
between rows.*/
# define OC_FRAG_COPY_MMX(_dst,_src,_ystride) \
do{ \
const unsigned char *src; \
unsigned char *dst; \
src=(_src); \
dst=(_dst); \
__asm mov SRC,src \
__asm mov DST,dst \
__asm mov YSTRIDE,_ystride \
/*src+0*ystride*/ \
__asm movq mm0,[SRC] \
/*src+1*ystride*/ \
__asm movq mm1,[SRC+YSTRIDE] \
/*ystride3=ystride*3*/ \
__asm lea YSTRIDE3,[YSTRIDE+YSTRIDE*2] \
/*src+2*ystride*/ \
__asm movq mm2,[SRC+YSTRIDE*2] \
/*src+3*ystride*/ \
__asm movq mm3,[SRC+YSTRIDE3] \
/*dst+0*ystride*/ \
__asm movq [DST],mm0 \
/*dst+1*ystride*/ \
__asm movq [DST+YSTRIDE],mm1 \
/*Pointer to next 4.*/ \
__asm lea SRC,[SRC+YSTRIDE*4] \
/*dst+2*ystride*/ \
__asm movq [DST+YSTRIDE*2],mm2 \
/*dst+3*ystride*/ \
__asm movq [DST+YSTRIDE3],mm3 \
/*Pointer to next 4.*/ \
__asm lea DST,[DST+YSTRIDE*4] \
/*src+0*ystride*/ \
__asm movq mm0,[SRC] \
/*src+1*ystride*/ \
__asm movq mm1,[SRC+YSTRIDE] \
/*src+2*ystride*/ \
__asm movq mm2,[SRC+YSTRIDE*2] \
/*src+3*ystride*/ \
__asm movq mm3,[SRC+YSTRIDE3] \
/*dst+0*ystride*/ \
__asm movq [DST],mm0 \
/*dst+1*ystride*/ \
__asm movq [DST+YSTRIDE],mm1 \
/*dst+2*ystride*/ \
__asm movq [DST+YSTRIDE*2],mm2 \
/*dst+3*ystride*/ \
__asm movq [DST+YSTRIDE3],mm3 \
} \
while(0)
/*Copies an 8x8 block of pixels from _src to _dst, assuming _ystride bytes
between rows.*/
void oc_frag_copy_mmx(unsigned char *_dst,
const unsigned char *_src,int _ystride){
#define SRC edx
#define DST eax
#define YSTRIDE ecx
#define YSTRIDE3 esi
OC_FRAG_COPY_MMX(_dst,_src,_ystride);
#undef SRC
#undef DST
#undef YSTRIDE
#undef YSTRIDE3
}
/*Copies the fragments specified by the lists of fragment indices from one
frame to another.
_dst_frame: The reference frame to copy to.
_src_frame: The reference frame to copy from.
_ystride: The row stride of the reference frames.
_fragis: A pointer to a list of fragment indices.
_nfragis: The number of fragment indices to copy.
_frag_buf_offs: The offsets of fragments in the reference frames.*/
void oc_frag_copy_list_mmx(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs){
ptrdiff_t fragii;
for(fragii=0;fragii<_nfragis;fragii++){
ptrdiff_t frag_buf_off;
frag_buf_off=_frag_buf_offs[_fragis[fragii]];
#define SRC edx
#define DST eax
#define YSTRIDE ecx
#define YSTRIDE3 edi
OC_FRAG_COPY_MMX(_dst_frame+frag_buf_off,
_src_frame+frag_buf_off,_ystride);
#undef SRC
#undef DST
#undef YSTRIDE
#undef YSTRIDE3
}
}
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue){
__asm{
#define DST edx
#define DST4 esi
#define YSTRIDE eax
#define YSTRIDE3 edi
#define RESIDUE ecx
mov DST,_dst
mov YSTRIDE,_ystride
mov RESIDUE,_residue
lea DST4,[DST+YSTRIDE*4]
lea YSTRIDE3,[YSTRIDE+YSTRIDE*2]
/*Set mm0 to 0xFFFFFFFFFFFFFFFF.*/
pcmpeqw mm0,mm0
/*#0 Load low residue.*/
movq mm1,[0*8+RESIDUE]
/*#0 Load high residue.*/
movq mm2,[1*8+RESIDUE]
/*Set mm0 to 0x8000800080008000.*/
psllw mm0,15
/*#1 Load low residue.*/
movq mm3,[2*8+RESIDUE]
/*#1 Load high residue.*/
movq mm4,[3*8+RESIDUE]
/*Set mm0 to 0x0080008000800080.*/
psrlw mm0,8
/*#2 Load low residue.*/
movq mm5,[4*8+RESIDUE]
/*#2 Load high residue.*/
movq mm6,[5*8+RESIDUE]
/*#0 Bias low residue.*/
paddsw mm1,mm0
/*#0 Bias high residue.*/
paddsw mm2,mm0
/*#0 Pack to byte.*/
packuswb mm1,mm2
/*#1 Bias low residue.*/
paddsw mm3,mm0
/*#1 Bias high residue.*/
paddsw mm4,mm0
/*#1 Pack to byte.*/
packuswb mm3,mm4
/*#2 Bias low residue.*/
paddsw mm5,mm0
/*#2 Bias high residue.*/
paddsw mm6,mm0
/*#2 Pack to byte.*/
packuswb mm5,mm6
/*#0 Write row.*/
movq [DST],mm1
/*#1 Write row.*/
movq [DST+YSTRIDE],mm3
/*#2 Write row.*/
movq [DST+YSTRIDE*2],mm5
/*#3 Load low residue.*/
movq mm1,[6*8+RESIDUE]
/*#3 Load high residue.*/
movq mm2,[7*8+RESIDUE]
/*#4 Load low residue.*/
movq mm3,[8*8+RESIDUE]
/*#4 Load high residue.*/
movq mm4,[9*8+RESIDUE]
/*#5 Load low residue.*/
movq mm5,[10*8+RESIDUE]
/*#5 Load high residue.*/
movq mm6,[11*8+RESIDUE]
/*#3 Bias low residue.*/
paddsw mm1,mm0
/*#3 Bias high residue.*/
paddsw mm2,mm0
/*#3 Pack to byte.*/
packuswb mm1,mm2
/*#4 Bias low residue.*/
paddsw mm3,mm0
/*#4 Bias high residue.*/
paddsw mm4,mm0
/*#4 Pack to byte.*/
packuswb mm3,mm4
/*#5 Bias low residue.*/
paddsw mm5,mm0
/*#5 Bias high residue.*/
paddsw mm6,mm0
/*#5 Pack to byte.*/
packuswb mm5,mm6
/*#3 Write row.*/
movq [DST+YSTRIDE3],mm1
/*#4 Write row.*/
movq [DST4],mm3
/*#5 Write row.*/
movq [DST4+YSTRIDE],mm5
/*#6 Load low residue.*/
movq mm1,[12*8+RESIDUE]
/*#6 Load high residue.*/
movq mm2,[13*8+RESIDUE]
/*#7 Load low residue.*/
movq mm3,[14*8+RESIDUE]
/*#7 Load high residue.*/
movq mm4,[15*8+RESIDUE]
/*#6 Bias low residue.*/
paddsw mm1,mm0
/*#6 Bias high residue.*/
paddsw mm2,mm0
/*#6 Pack to byte.*/
packuswb mm1,mm2
/*#7 Bias low residue.*/
paddsw mm3,mm0
/*#7 Bias high residue.*/
paddsw mm4,mm0
/*#7 Pack to byte.*/
packuswb mm3,mm4
/*#6 Write row.*/
movq [DST4+YSTRIDE*2],mm1
/*#7 Write row.*/
movq [DST4+YSTRIDE3],mm3
#undef DST
#undef DST4
#undef YSTRIDE
#undef YSTRIDE3
#undef RESIDUE
}
}
void oc_frag_recon_inter_mmx(unsigned char *_dst,const unsigned char *_src,
int _ystride,const ogg_int16_t *_residue){
int i;
/*Zero mm0.*/
__asm pxor mm0,mm0;
for(i=4;i-->0;){
__asm{
#define DST edx
#define SRC ecx
#define YSTRIDE edi
#define RESIDUE eax
mov DST,_dst
mov SRC,_src
mov YSTRIDE,_ystride
mov RESIDUE,_residue
/*#0 Load source.*/
movq mm3,[SRC]
/*#1 Load source.*/
movq mm7,[SRC+YSTRIDE]
/*#0 Get copy of src.*/
movq mm4,mm3
/*#0 Expand high source.*/
punpckhbw mm4,mm0
/*#0 Expand low source.*/
punpcklbw mm3,mm0
/*#0 Add residue high.*/
paddsw mm4,[8+RESIDUE]
/*#1 Get copy of src.*/
movq mm2,mm7
/*#0 Add residue low.*/
paddsw mm3,[RESIDUE]
/*#1 Expand high source.*/
punpckhbw mm2,mm0
/*#0 Pack final row pixels.*/
packuswb mm3,mm4
/*#1 Expand low source.*/
punpcklbw mm7,mm0
/*#1 Add residue low.*/
paddsw mm7,[16+RESIDUE]
/*#1 Add residue high.*/
paddsw mm2,[24+RESIDUE]
/*Advance residue.*/
lea RESIDUE,[32+RESIDUE]
/*#1 Pack final row pixels.*/
packuswb mm7,mm2
/*Advance src.*/
lea SRC,[SRC+YSTRIDE*2]
/*#0 Write row.*/
movq [DST],mm3
/*#1 Write row.*/
movq [DST+YSTRIDE],mm7
/*Advance dst.*/
lea DST,[DST+YSTRIDE*2]
mov _residue,RESIDUE
mov _dst,DST
mov _src,SRC
#undef DST
#undef SRC
#undef YSTRIDE
#undef RESIDUE
}
}
}
void oc_frag_recon_inter2_mmx(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue){
int i;
/*Zero mm7.*/
__asm pxor mm7,mm7;
for(i=4;i-->0;){
__asm{
#define SRC1 ecx
#define SRC2 edi
#define YSTRIDE esi
#define RESIDUE edx
#define DST eax
mov YSTRIDE,_ystride
mov DST,_dst
mov RESIDUE,_residue
mov SRC1,_src1
mov SRC2,_src2
/*#0 Load src1.*/
movq mm0,[SRC1]
/*#0 Load src2.*/
movq mm2,[SRC2]
/*#0 Copy src1.*/
movq mm1,mm0
/*#0 Copy src2.*/
movq mm3,mm2
/*#1 Load src1.*/
movq mm4,[SRC1+YSTRIDE]
/*#0 Unpack lower src1.*/
punpcklbw mm0,mm7
/*#1 Load src2.*/
movq mm5,[SRC2+YSTRIDE]
/*#0 Unpack higher src1.*/
punpckhbw mm1,mm7
/*#0 Unpack lower src2.*/
punpcklbw mm2,mm7
/*#0 Unpack higher src2.*/
punpckhbw mm3,mm7
/*Advance src1 ptr.*/
lea SRC1,[SRC1+YSTRIDE*2]
/*Advance src2 ptr.*/
lea SRC2,[SRC2+YSTRIDE*2]
/*#0 Lower src1+src2.*/
paddsw mm0,mm2
/*#0 Higher src1+src2.*/
paddsw mm1,mm3
/*#1 Copy src1.*/
movq mm2,mm4
/*#0 Build lo average.*/
psraw mm0,1
/*#1 Copy src2.*/
movq mm3,mm5
/*#1 Unpack lower src1.*/
punpcklbw mm4,mm7
/*#0 Build hi average.*/
psraw mm1,1
/*#1 Unpack higher src1.*/
punpckhbw mm2,mm7
/*#0 low+=residue.*/
paddsw mm0,[RESIDUE]
/*#1 Unpack lower src2.*/
punpcklbw mm5,mm7
/*#0 high+=residue.*/
paddsw mm1,[8+RESIDUE]
/*#1 Unpack higher src2.*/
punpckhbw mm3,mm7
/*#1 Lower src1+src2.*/
paddsw mm5,mm4
/*#0 Pack and saturate.*/
packuswb mm0,mm1
/*#1 Higher src1+src2.*/
paddsw mm3,mm2
/*#0 Write row.*/
movq [DST],mm0
/*#1 Build lo average.*/
psraw mm5,1
/*#1 Build hi average.*/
psraw mm3,1
/*#1 low+=residue.*/
paddsw mm5,[16+RESIDUE]
/*#1 high+=residue.*/
paddsw mm3,[24+RESIDUE]
/*#1 Pack and saturate.*/
packuswb mm5,mm3
/*#1 Write row.*/
movq [DST+YSTRIDE],mm5
/*Advance residue ptr.*/
add RESIDUE,32
/*Advance dest ptr.*/
lea DST,[DST+YSTRIDE*2]
mov _dst,DST
mov _residue,RESIDUE
mov _src1,SRC1
mov _src2,SRC2
#undef SRC1
#undef SRC2
#undef YSTRIDE
#undef RESIDUE
#undef DST
}
}
}
void oc_restore_fpu_mmx(void){
__asm emms;
}
#endif
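/*A scalar reference sketch (not part of libtheora) of what the three MMX
reconstruction routines above compute, assuming 8x8 blocks and a 64-entry
residue stored row by row.
The sketch_* names are hypothetical.
The MMX code gets the final clamp from packuswb, which saturates each 16-bit
sum to an unsigned byte; here it is written out explicitly.*/

```c
#include <stdint.h>

/*Clamps a value to the range of an unsigned byte, like packuswb.*/
static unsigned char sketch_clamp255(int _v){
  return (unsigned char)(_v<0?0:_v>255?255:_v);
}

/*Intra: the residue is biased by 128 and clamped.*/
void sketch_frag_recon_intra(unsigned char *_dst,int _ystride,
 const int16_t *_residue){
  int i;
  int j;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++)_dst[j]=sketch_clamp255(_residue[i*8+j]+128);
    _dst+=_ystride;
  }
}

/*Inter: the residue is added to a single predictor block.*/
void sketch_frag_recon_inter(unsigned char *_dst,const unsigned char *_src,
 int _ystride,const int16_t *_residue){
  int i;
  int j;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++)_dst[j]=sketch_clamp255(_src[j]+_residue[i*8+j]);
    _dst+=_ystride;
    _src+=_ystride;
  }
}

/*Inter2: the residue is added to the truncated average of two predictors.*/
void sketch_frag_recon_inter2(unsigned char *_dst,const unsigned char *_src1,
 const unsigned char *_src2,int _ystride,const int16_t *_residue){
  int i;
  int j;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++){
      _dst[j]=sketch_clamp255(((_src1[j]+_src2[j])>>1)+_residue[i*8+j]);
    }
    _dst+=_ystride;
    _src1+=_ystride;
    _src2+=_ystride;
  }
}
```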


@ -1,592 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*MMX acceleration of Theora's iDCT.
Originally written by Rudolf Marek, based on code from On2's VP3.*/
#include "x86int.h"
#include "../dct.h"
#if defined(OC_X86_ASM)
/*These are offsets into the table of constants below.*/
/*7 rows of cosines, in order: pi/16 * (1 ... 7).*/
#define OC_COSINE_OFFSET (8)
/*A row of 8's.*/
#define OC_EIGHT_OFFSET (0)
/*A table of constants used by the MMX routines.*/
static const OC_ALIGN16(ogg_uint16_t) OC_IDCT_CONSTS[(1+7)*4]={
8, 8, 8, 8,
(ogg_uint16_t)OC_C1S7,(ogg_uint16_t)OC_C1S7,
(ogg_uint16_t)OC_C1S7,(ogg_uint16_t)OC_C1S7,
(ogg_uint16_t)OC_C2S6,(ogg_uint16_t)OC_C2S6,
(ogg_uint16_t)OC_C2S6,(ogg_uint16_t)OC_C2S6,
(ogg_uint16_t)OC_C3S5,(ogg_uint16_t)OC_C3S5,
(ogg_uint16_t)OC_C3S5,(ogg_uint16_t)OC_C3S5,
(ogg_uint16_t)OC_C4S4,(ogg_uint16_t)OC_C4S4,
(ogg_uint16_t)OC_C4S4,(ogg_uint16_t)OC_C4S4,
(ogg_uint16_t)OC_C5S3,(ogg_uint16_t)OC_C5S3,
(ogg_uint16_t)OC_C5S3,(ogg_uint16_t)OC_C5S3,
(ogg_uint16_t)OC_C6S2,(ogg_uint16_t)OC_C6S2,
(ogg_uint16_t)OC_C6S2,(ogg_uint16_t)OC_C6S2,
(ogg_uint16_t)OC_C7S1,(ogg_uint16_t)OC_C7S1,
(ogg_uint16_t)OC_C7S1,(ogg_uint16_t)OC_C7S1
};
/*38 cycles*/
#define OC_IDCT_BEGIN(_y,_x) __asm{ \
__asm movq mm2,OC_I(3,_x) \
__asm movq mm6,OC_C(3) \
__asm movq mm4,mm2 \
__asm movq mm7,OC_J(5,_x) \
__asm pmulhw mm4,mm6 \
__asm movq mm1,OC_C(5) \
__asm pmulhw mm6,mm7 \
__asm movq mm5,mm1 \
__asm pmulhw mm1,mm2 \
__asm movq mm3,OC_I(1,_x) \
__asm pmulhw mm5,mm7 \
__asm movq mm0,OC_C(1) \
__asm paddw mm4,mm2 \
__asm paddw mm6,mm7 \
__asm paddw mm2,mm1 \
__asm movq mm1,OC_J(7,_x) \
__asm paddw mm7,mm5 \
__asm movq mm5,mm0 \
__asm pmulhw mm0,mm3 \
__asm paddw mm4,mm7 \
__asm pmulhw mm5,mm1 \
__asm movq mm7,OC_C(7) \
__asm psubw mm6,mm2 \
__asm paddw mm0,mm3 \
__asm pmulhw mm3,mm7 \
__asm movq mm2,OC_I(2,_x) \
__asm pmulhw mm7,mm1 \
__asm paddw mm5,mm1 \
__asm movq mm1,mm2 \
__asm pmulhw mm2,OC_C(2) \
__asm psubw mm3,mm5 \
__asm movq mm5,OC_J(6,_x) \
__asm paddw mm0,mm7 \
__asm movq mm7,mm5 \
__asm psubw mm0,mm4 \
__asm pmulhw mm5,OC_C(2) \
__asm paddw mm2,mm1 \
__asm pmulhw mm1,OC_C(6) \
__asm paddw mm4,mm4 \
__asm paddw mm4,mm0 \
__asm psubw mm3,mm6 \
__asm paddw mm5,mm7 \
__asm paddw mm6,mm6 \
__asm pmulhw mm7,OC_C(6) \
__asm paddw mm6,mm3 \
__asm movq OC_I(1,_y),mm4 \
__asm psubw mm1,mm5 \
__asm movq mm4,OC_C(4) \
__asm movq mm5,mm3 \
__asm pmulhw mm3,mm4 \
__asm paddw mm7,mm2 \
__asm movq OC_I(2,_y),mm6 \
__asm movq mm2,mm0 \
__asm movq mm6,OC_I(0,_x) \
__asm pmulhw mm0,mm4 \
__asm paddw mm5,mm3 \
__asm movq mm3,OC_J(4,_x) \
__asm psubw mm5,mm1 \
__asm paddw mm2,mm0 \
__asm psubw mm6,mm3 \
__asm movq mm0,mm6 \
__asm pmulhw mm6,mm4 \
__asm paddw mm3,mm3 \
__asm paddw mm1,mm1 \
__asm paddw mm3,mm0 \
__asm paddw mm1,mm5 \
__asm pmulhw mm4,mm3 \
__asm paddw mm6,mm0 \
__asm psubw mm6,mm2 \
__asm paddw mm2,mm2 \
__asm movq mm0,OC_I(1,_y) \
__asm paddw mm2,mm6 \
__asm paddw mm4,mm3 \
__asm psubw mm2,mm1 \
}
/*38+8=46 cycles.*/
#define OC_ROW_IDCT(_y,_x) __asm{ \
OC_IDCT_BEGIN(_y,_x) \
/*r3=D'*/ \
__asm movq mm3,OC_I(2,_y) \
/*r4=E'=E-G*/ \
__asm psubw mm4,mm7 \
/*r1=H'+H'*/ \
__asm paddw mm1,mm1 \
/*r7=G+G*/ \
__asm paddw mm7,mm7 \
/*r1=R1=A''+H'*/ \
__asm paddw mm1,mm2 \
/*r7=G'=E+G*/ \
__asm paddw mm7,mm4 \
/*r4=R4=E'-D'*/ \
__asm psubw mm4,mm3 \
__asm paddw mm3,mm3 \
/*r6=R6=F'-B''*/ \
__asm psubw mm6,mm5 \
__asm paddw mm5,mm5 \
/*r3=R3=E'+D'*/ \
__asm paddw mm3,mm4 \
/*r5=R5=F'+B''*/ \
__asm paddw mm5,mm6 \
/*r7=R7=G'-C'*/ \
__asm psubw mm7,mm0 \
__asm paddw mm0,mm0 \
/*Save R1.*/ \
__asm movq OC_I(1,_y),mm1 \
/*r0=R0=G'+C'*/ \
__asm paddw mm0,mm7 \
}
/*The following macro does two 4x4 transposes in place.
At entry, we assume:
r0 = a3 a2 a1 a0
I(1) = b3 b2 b1 b0
r2 = c3 c2 c1 c0
r3 = d3 d2 d1 d0
r4 = e3 e2 e1 e0
r5 = f3 f2 f1 f0
r6 = g3 g2 g1 g0
r7 = h3 h2 h1 h0
At exit, we have:
I(0) = d0 c0 b0 a0
I(1) = d1 c1 b1 a1
I(2) = d2 c2 b2 a2
I(3) = d3 c3 b3 a3
J(4) = h0 g0 f0 e0
J(5) = h1 g1 f1 e1
J(6) = h2 g2 f2 e2
J(7) = h3 g3 f3 e3
I(0) I(1) I(2) I(3) is the transpose of r0 I(1) r2 r3.
J(4) J(5) J(6) J(7) is the transpose of r4 r5 r6 r7.
Since r1 is free at entry, we calculate the Js first.*/
/*19 cycles.*/
#define OC_TRANSPOSE(_y) __asm{ \
__asm movq mm1,mm4 \
__asm punpcklwd mm4,mm5 \
__asm movq OC_I(0,_y),mm0 \
__asm punpckhwd mm1,mm5 \
__asm movq mm0,mm6 \
__asm punpcklwd mm6,mm7 \
__asm movq mm5,mm4 \
__asm punpckldq mm4,mm6 \
__asm punpckhdq mm5,mm6 \
__asm movq mm6,mm1 \
__asm movq OC_J(4,_y),mm4 \
__asm punpckhwd mm0,mm7 \
__asm movq OC_J(5,_y),mm5 \
__asm punpckhdq mm6,mm0 \
__asm movq mm4,OC_I(0,_y) \
__asm punpckldq mm1,mm0 \
__asm movq mm5,OC_I(1,_y) \
__asm movq mm0,mm4 \
__asm movq OC_J(7,_y),mm6 \
__asm punpcklwd mm0,mm5 \
__asm movq OC_J(6,_y),mm1 \
__asm punpckhwd mm4,mm5 \
__asm movq mm5,mm2 \
__asm punpcklwd mm2,mm3 \
__asm movq mm1,mm0 \
__asm punpckldq mm0,mm2 \
__asm punpckhdq mm1,mm2 \
__asm movq mm2,mm4 \
__asm movq OC_I(0,_y),mm0 \
__asm punpckhwd mm5,mm3 \
__asm movq OC_I(1,_y),mm1 \
__asm punpckhdq mm4,mm5 \
__asm punpckldq mm2,mm5 \
__asm movq OC_I(3,_y),mm4 \
__asm movq OC_I(2,_y),mm2 \
}
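/*A scalar sketch (not part of libtheora) of the unpack-based 4x4 transpose
that OC_TRANSPOSE performs twice.
Each MMX register is modeled as four 16-bit lanes with w[0] the lowest lane;
the sketch_* names are hypothetical, and the instruction scheduling of the
real macro is not reproduced.*/

```c
#include <stdint.h>

/*One 64-bit MMX register viewed as four 16-bit lanes (w[0] is lowest).*/
typedef struct{int16_t w[4];} sketch_mmx;

/*punpcklwd: interleave the low two words of d and s.*/
sketch_mmx sketch_punpcklwd(sketch_mmx _d,sketch_mmx _s){
  sketch_mmx r;
  r.w[0]=_d.w[0];r.w[1]=_s.w[0];r.w[2]=_d.w[1];r.w[3]=_s.w[1];
  return r;
}

/*punpckhwd: interleave the high two words of d and s.*/
sketch_mmx sketch_punpckhwd(sketch_mmx _d,sketch_mmx _s){
  sketch_mmx r;
  r.w[0]=_d.w[2];r.w[1]=_s.w[2];r.w[2]=_d.w[3];r.w[3]=_s.w[3];
  return r;
}

/*punpckldq: low doubleword of d followed by low doubleword of s.*/
sketch_mmx sketch_punpckldq(sketch_mmx _d,sketch_mmx _s){
  sketch_mmx r;
  r.w[0]=_d.w[0];r.w[1]=_d.w[1];r.w[2]=_s.w[0];r.w[3]=_s.w[1];
  return r;
}

/*punpckhdq: high doubleword of d followed by high doubleword of s.*/
sketch_mmx sketch_punpckhdq(sketch_mmx _d,sketch_mmx _s){
  sketch_mmx r;
  r.w[0]=_d.w[2];r.w[1]=_d.w[3];r.w[2]=_s.w[2];r.w[3]=_s.w[3];
  return r;
}

/*Transposes four rows a,b,c,d in place using two rounds of unpacks.*/
void sketch_transpose4x4(sketch_mmx _rows[4]){
  sketch_mmx t0;
  sketch_mmx t1;
  sketch_mmx t2;
  sketch_mmx t3;
  /*t0 = b1 a1 b0 a0, t1 = b3 a3 b2 a2 (written high lane to low lane).*/
  t0=sketch_punpcklwd(_rows[0],_rows[1]);
  t1=sketch_punpckhwd(_rows[0],_rows[1]);
  /*t2 = d1 c1 d0 c0, t3 = d3 c3 d2 c2.*/
  t2=sketch_punpcklwd(_rows[2],_rows[3]);
  t3=sketch_punpckhwd(_rows[2],_rows[3]);
  /*Row k of the result is dk ck bk ak.*/
  _rows[0]=sketch_punpckldq(t0,t2);
  _rows[1]=sketch_punpckhdq(t0,t2);
  _rows[2]=sketch_punpckldq(t1,t3);
  _rows[3]=sketch_punpckhdq(t1,t3);
}
```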
/*38+19=57 cycles.*/
#define OC_COLUMN_IDCT(_y) __asm{ \
OC_IDCT_BEGIN(_y,_y) \
__asm paddw mm2,OC_8 \
/*r1=H'+H'*/ \
__asm paddw mm1,mm1 \
/*r1=R1=A''+H'*/ \
__asm paddw mm1,mm2 \
/*r2=NR2*/ \
__asm psraw mm2,4 \
/*r4=E'=E-G*/ \
__asm psubw mm4,mm7 \
/*r1=NR1*/ \
__asm psraw mm1,4 \
/*r3=D'*/ \
__asm movq mm3,OC_I(2,_y) \
/*r7=G+G*/ \
__asm paddw mm7,mm7 \
/*Store NR2 at I(2).*/ \
__asm movq OC_I(2,_y),mm2 \
/*r7=G'=E+G*/ \
__asm paddw mm7,mm4 \
/*Store NR1 at I(1).*/ \
__asm movq OC_I(1,_y),mm1 \
/*r4=R4=E'-D'*/ \
__asm psubw mm4,mm3 \
__asm paddw mm4,OC_8 \
/*r3=D'+D'*/ \
__asm paddw mm3,mm3 \
/*r3=R3=E'+D'*/ \
__asm paddw mm3,mm4 \
/*r4=NR4*/ \
__asm psraw mm4,4 \
/*r6=R6=F'-B''*/ \
__asm psubw mm6,mm5 \
/*r3=NR3*/ \
__asm psraw mm3,4 \
__asm paddw mm6,OC_8 \
/*r5=B''+B''*/ \
__asm paddw mm5,mm5 \
/*r5=R5=F'+B''*/ \
__asm paddw mm5,mm6 \
/*r6=NR6*/ \
__asm psraw mm6,4 \
/*Store NR4 at J(4).*/ \
__asm movq OC_J(4,_y),mm4 \
/*r5=NR5*/ \
__asm psraw mm5,4 \
/*Store NR3 at I(3).*/ \
__asm movq OC_I(3,_y),mm3 \
/*r7=R7=G'-C'*/ \
__asm psubw mm7,mm0 \
__asm paddw mm7,OC_8 \
/*r0=C'+C'*/ \
__asm paddw mm0,mm0 \
/*r0=R0=G'+C'*/ \
__asm paddw mm0,mm7 \
/*r7=NR7*/ \
__asm psraw mm7,4 \
/*Store NR6 at J(6).*/ \
__asm movq OC_J(6,_y),mm6 \
/*r0=NR0*/ \
__asm psraw mm0,4 \
/*Store NR5 at J(5).*/ \
__asm movq OC_J(5,_y),mm5 \
/*Store NR7 at J(7).*/ \
__asm movq OC_J(7,_y),mm7 \
/*Store NR0 at I(0).*/ \
__asm movq OC_I(0,_y),mm0 \
}
#define OC_MID(_m,_i) [CONSTS+_m+(_i)*8]
#define OC_C(_i) OC_MID(OC_COSINE_OFFSET,_i-1)
#define OC_8 OC_MID(OC_EIGHT_OFFSET,0)
static void oc_idct8x8_slow(ogg_int16_t _y[64],ogg_int16_t _x[64]){
int i;
/*This routine accepts an 8x8 matrix, but in partially transposed form.
Every 4x4 block is transposed.*/
__asm{
#define CONSTS eax
#define Y edx
#define X ecx
mov CONSTS,offset OC_IDCT_CONSTS
mov Y,_y
mov X,_x
#define OC_I(_k,_y) [(_y)+(_k)*16]
#define OC_J(_k,_y) [(_y)+((_k)-4)*16+8]
OC_ROW_IDCT(Y,X)
OC_TRANSPOSE(Y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) [(_y)+(_k)*16+64]
#define OC_J(_k,_y) [(_y)+((_k)-4)*16+72]
OC_ROW_IDCT(Y,X)
OC_TRANSPOSE(Y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) [(_y)+(_k)*16]
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT(Y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) [(_y)+(_k)*16+8]
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT(Y)
#undef OC_I
#undef OC_J
#undef CONSTS
#undef Y
#undef X
}
__asm pxor mm0,mm0;
for(i=0;i<4;i++){
ogg_int16_t *x;
x=_x+16*i;
#define X ecx
__asm{
mov X,x
movq [X+0x00],mm0
movq [X+0x08],mm0
movq [X+0x10],mm0
movq [X+0x18],mm0
}
#undef X
}
}
/*25 cycles.*/
#define OC_IDCT_BEGIN_10(_y,_x) __asm{ \
__asm movq mm2,OC_I(3,_x) \
__asm nop \
__asm movq mm6,OC_C(3) \
__asm movq mm4,mm2 \
__asm movq mm1,OC_C(5) \
__asm pmulhw mm4,mm6 \
__asm movq mm3,OC_I(1,_x) \
__asm pmulhw mm1,mm2 \
__asm movq mm0,OC_C(1) \
__asm paddw mm4,mm2 \
__asm pxor mm6,mm6 \
__asm paddw mm2,mm1 \
__asm movq mm5,OC_I(2,_x) \
__asm pmulhw mm0,mm3 \
__asm movq mm1,mm5 \
__asm paddw mm0,mm3 \
__asm pmulhw mm3,OC_C(7) \
__asm psubw mm6,mm2 \
__asm pmulhw mm5,OC_C(2) \
__asm psubw mm0,mm4 \
__asm movq mm7,OC_I(2,_x) \
__asm paddw mm4,mm4 \
__asm paddw mm7,mm5 \
__asm paddw mm4,mm0 \
__asm pmulhw mm1,OC_C(6) \
__asm psubw mm3,mm6 \
__asm movq OC_I(1,_y),mm4 \
__asm paddw mm6,mm6 \
__asm movq mm4,OC_C(4) \
__asm paddw mm6,mm3 \
__asm movq mm5,mm3 \
__asm pmulhw mm3,mm4 \
__asm movq OC_I(2,_y),mm6 \
__asm movq mm2,mm0 \
__asm movq mm6,OC_I(0,_x) \
__asm pmulhw mm0,mm4 \
__asm paddw mm5,mm3 \
__asm paddw mm2,mm0 \
__asm psubw mm5,mm1 \
__asm pmulhw mm6,mm4 \
__asm paddw mm6,OC_I(0,_x) \
__asm paddw mm1,mm1 \
__asm movq mm4,mm6 \
__asm paddw mm1,mm5 \
__asm psubw mm6,mm2 \
__asm paddw mm2,mm2 \
__asm movq mm0,OC_I(1,_y) \
__asm paddw mm2,mm6 \
__asm psubw mm2,mm1 \
__asm nop \
}
/*25+8=33 cycles.*/
#define OC_ROW_IDCT_10(_y,_x) __asm{ \
OC_IDCT_BEGIN_10(_y,_x) \
/*r3=D'*/ \
__asm movq mm3,OC_I(2,_y) \
/*r4=E'=E-G*/ \
__asm psubw mm4,mm7 \
/*r1=H'+H'*/ \
__asm paddw mm1,mm1 \
/*r7=G+G*/ \
__asm paddw mm7,mm7 \
/*r1=R1=A''+H'*/ \
__asm paddw mm1,mm2 \
/*r7=G'=E+G*/ \
__asm paddw mm7,mm4 \
/*r4=R4=E'-D'*/ \
__asm psubw mm4,mm3 \
__asm paddw mm3,mm3 \
/*r6=R6=F'-B''*/ \
__asm psubw mm6,mm5 \
__asm paddw mm5,mm5 \
/*r3=R3=E'+D'*/ \
__asm paddw mm3,mm4 \
/*r5=R5=F'+B''*/ \
__asm paddw mm5,mm6 \
/*r7=R7=G'-C'*/ \
__asm psubw mm7,mm0 \
__asm paddw mm0,mm0 \
/*Save R1.*/ \
__asm movq OC_I(1,_y),mm1 \
/*r0=R0=G'+C'*/ \
__asm paddw mm0,mm7 \
}
/*25+19=44 cycles.*/
#define OC_COLUMN_IDCT_10(_y) __asm{ \
OC_IDCT_BEGIN_10(_y,_y) \
__asm paddw mm2,OC_8 \
/*r1=H'+H'*/ \
__asm paddw mm1,mm1 \
/*r1=R1=A''+H'*/ \
__asm paddw mm1,mm2 \
/*r2=NR2*/ \
__asm psraw mm2,4 \
/*r4=E'=E-G*/ \
__asm psubw mm4,mm7 \
/*r1=NR1*/ \
__asm psraw mm1,4 \
/*r3=D'*/ \
__asm movq mm3,OC_I(2,_y) \
/*r7=G+G*/ \
__asm paddw mm7,mm7 \
/*Store NR2 at I(2).*/ \
__asm movq OC_I(2,_y),mm2 \
/*r7=G'=E+G*/ \
__asm paddw mm7,mm4 \
/*Store NR1 at I(1).*/ \
__asm movq OC_I(1,_y),mm1 \
/*r4=R4=E'-D'*/ \
__asm psubw mm4,mm3 \
__asm paddw mm4,OC_8 \
/*r3=D'+D'*/ \
__asm paddw mm3,mm3 \
/*r3=R3=E'+D'*/ \
__asm paddw mm3,mm4 \
/*r4=NR4*/ \
__asm psraw mm4,4 \
/*r6=R6=F'-B''*/ \
__asm psubw mm6,mm5 \
/*r3=NR3*/ \
__asm psraw mm3,4 \
__asm paddw mm6,OC_8 \
/*r5=B''+B''*/ \
__asm paddw mm5,mm5 \
/*r5=R5=F'+B''*/ \
__asm paddw mm5,mm6 \
/*r6=NR6*/ \
__asm psraw mm6,4 \
/*Store NR4 at J(4).*/ \
__asm movq OC_J(4,_y),mm4 \
/*r5=NR5*/ \
__asm psraw mm5,4 \
/*Store NR3 at I(3).*/ \
__asm movq OC_I(3,_y),mm3 \
/*r7=R7=G'-C'*/ \
__asm psubw mm7,mm0 \
__asm paddw mm7,OC_8 \
/*r0=C'+C'*/ \
__asm paddw mm0,mm0 \
/*r0=R0=G'+C'*/ \
__asm paddw mm0,mm7 \
/*r7=NR7*/ \
__asm psraw mm7,4 \
/*Store NR6 at J(6).*/ \
__asm movq OC_J(6,_y),mm6 \
/*r0=NR0*/ \
__asm psraw mm0,4 \
/*Store NR5 at J(5).*/ \
__asm movq OC_J(5,_y),mm5 \
/*Store NR7 at J(7).*/ \
__asm movq OC_J(7,_y),mm7 \
/*Store NR0 at I(0).*/ \
__asm movq OC_I(0,_y),mm0 \
}
static void oc_idct8x8_10(ogg_int16_t _y[64],ogg_int16_t _x[64]){
__asm{
#define CONSTS eax
#define Y edx
#define X ecx
mov CONSTS,offset OC_IDCT_CONSTS
mov Y,_y
mov X,_x
#define OC_I(_k,_y) [(_y)+(_k)*16]
#define OC_J(_k,_y) [(_y)+((_k)-4)*16+8]
/*Done with dequant, descramble, and partial transpose.
Now do the iDCT itself.*/
OC_ROW_IDCT_10(Y,X)
OC_TRANSPOSE(Y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) [(_y)+(_k)*16]
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT_10(Y)
#undef OC_I
#undef OC_J
#define OC_I(_k,_y) [(_y)+(_k)*16+8]
#define OC_J(_k,_y) OC_I(_k,_y)
OC_COLUMN_IDCT_10(Y)
#undef OC_I
#undef OC_J
#undef CONSTS
#undef Y
#undef X
}
#define X ecx
__asm{
pxor mm0,mm0;
mov X,_x
movq [X+0x00],mm0
movq [X+0x10],mm0
movq [X+0x20],mm0
movq [X+0x30],mm0
}
#undef X
}
/*Performs an inverse 8x8 Type-II DCT transform.
The input is assumed to be scaled by a factor of 4 relative to the
orthonormal version of the transform.*/
void oc_idct8x8_mmx(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi){
/*_last_zzi is subtly different from an actual count of the number of
coefficients we decoded for this block.
It contains the value of zzi BEFORE the final token in the block was
decoded.
In most cases this is an EOB token (the continuation of an EOB run from a
previous block counts), and so this is the same as the coefficient count.
However, in the case that the last token was NOT an EOB token, but filled
the block up with exactly 64 coefficients, _last_zzi will be less than 64.
Provided the last token was not a pure zero run, the minimum value it can
be is 46, and so that doesn't affect any of the cases in this routine.
However, if the last token WAS a pure zero run of length 63, then _last_zzi
will be 1 while the number of coefficients decoded is 64.
Thus, we will trigger the following special case, where the real
coefficient count would not.
Note also that a zero run of length 64 will give _last_zzi a value of 0,
but we still process the DC coefficient, which might have a non-zero value
due to DC prediction.
Although convoluted, this is arguably the correct behavior: it allows us to
use a smaller transform when the block ends with a long zero run instead
of a normal EOB token.
It could be smarter... multiple separate zero runs at the end of a block
will fool it, but an encoder that generates these really deserves what it
gets.
Needless to say we inherited this approach from VP3.*/
/*Perform the iDCT.*/
if(_last_zzi<=10)oc_idct8x8_10(_y,_x);
else oc_idct8x8_slow(_y,_x);
}
#endif
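
The `_last_zzi` test in `oc_idct8x8_mmx` can be read alongside the geometry of the zig-zag scan. The sketch below is illustrative only: the helper names are hypothetical and not part of libtheora, and it assumes `_last_zzi` counts positions in the conventional 8x8 zig-zag order. Under that assumption the first 10 positions all fall in the top-left 4x4 corner, which is one plausible reason the reduced transform is used up to a threshold of 10.

```c
/* Illustrative sketch only; none of these names exist in libtheora. */

/* First 11 entries of the conventional 8x8 zig-zag scan, as row-major
   indices (assumed to match the order _last_zzi counts in). */
static const unsigned char ZIGZAG_HEAD[11] = {
  0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32
};

/* Returns 1 if the first n zig-zag positions (n <= 11) all lie in the
   top-left 4x4 corner of the block, and 0 otherwise. */
static int fits_top_left_4x4(int n) {
  int i;
  for (i = 0; i < n; i++) {
    int row = ZIGZAG_HEAD[i] >> 3;
    int col = ZIGZAG_HEAD[i] & 7;
    if (row > 3 || col > 3) return 0;
  }
  return 1;
}

/* Mirrors the test in oc_idct8x8_mmx: 1 selects the reduced transform
   (oc_idct8x8_10), 0 the full one (oc_idct8x8_slow). */
static int use_reduced_idct(int last_zzi) {
  return last_zzi <= 10;
}
```

With these helpers, a block that ended with a zero run of length 63 (`_last_zzi` of 1) selects the reduced path, while the minimum value of 46 mentioned in the comment selects the full path.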

@ -1,219 +0,0 @@
#if !defined(_x86_vc_mmxloop_H)
# define _x86_vc_mmxloop_H (1)
# include <stddef.h>
# include "x86int.h"
#if defined(OC_X86_ASM)
/*On entry, mm0={a0,...,a7}, mm1={b0,...,b7}, mm2={c0,...,c7}, mm3={d0,...,d7}.
On exit, mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)} and
mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}; mm0 and mm3 are clobbered.*/
#define OC_LOOP_FILTER8_MMX __asm{ \
/*mm7=0*/ \
__asm pxor mm7,mm7 \
/*mm6:mm0={a0,...,a7}*/ \
__asm movq mm6,mm0 \
__asm punpcklbw mm0,mm7 \
__asm punpckhbw mm6,mm7 \
/*mm3:mm5={d0,...,d7}*/ \
__asm movq mm5,mm3 \
__asm punpcklbw mm3,mm7 \
__asm punpckhbw mm5,mm7 \
/*mm6:mm0={a0-d0,...,a7-d7}*/ \
__asm psubw mm0,mm3 \
__asm psubw mm6,mm5 \
/*mm3:mm1={b0,...,b7}*/ \
__asm movq mm3,mm1 \
__asm punpcklbw mm1,mm7 \
__asm movq mm4,mm2 \
__asm punpckhbw mm3,mm7 \
/*mm5:mm4={c0,...,c7}*/ \
__asm movq mm5,mm2 \
__asm punpcklbw mm4,mm7 \
__asm punpckhbw mm5,mm7 \
/*mm7={3}x4 \
mm5:mm4={c0-b0,...,c7-b7}*/ \
__asm pcmpeqw mm7,mm7 \
__asm psubw mm4,mm1 \
__asm psrlw mm7,14 \
__asm psubw mm5,mm3 \
/*Scale by 3.*/ \
__asm pmullw mm4,mm7 \
__asm pmullw mm5,mm7 \
/*mm7={4}x4 \
mm5:mm4=f={a0-d0+3*(c0-b0),...,a7-d7+3*(c7-b7)}*/ \
__asm psrlw mm7,1 \
__asm paddw mm4,mm0 \
__asm psllw mm7,2 \
__asm movq mm0,[LL] \
__asm paddw mm5,mm6 \
/*R_i has the range [-127,128], so we compute -R_i instead. \
mm4=-R_i=-(f+4>>3)=0xFF^(f-4>>3)*/ \
__asm psubw mm4,mm7 \
__asm psubw mm5,mm7 \
__asm psraw mm4,3 \
__asm psraw mm5,3 \
__asm pcmpeqb mm7,mm7 \
__asm packsswb mm4,mm5 \
__asm pxor mm6,mm6 \
__asm pxor mm4,mm7 \
__asm packuswb mm1,mm3 \
/*Now compute lflim of -mm4 cf. Section 7.10 of the spec.*/ \
/*There's no unsigned byte+signed byte with unsigned saturation op code, so \
we have to split things by sign (the other option is to work in 16 bits, \
but working in 8 bits gives much better parallelism). \
We compute abs(R_i), but save a mask of which terms were negative in mm6. \
Then we compute mm4=abs(lflim(R_i,L))=min(abs(R_i),max(2*L-abs(R_i),0)). \
Finally, we split mm4 into positive and negative pieces using the mask in \
mm6, and add and subtract them as appropriate.*/ \
/*mm4=abs(-R_i)*/ \
/*mm7=255-2*L*/ \
__asm pcmpgtb mm6,mm4 \
__asm psubb mm7,mm0 \
__asm pxor mm4,mm6 \
__asm psubb mm7,mm0 \
__asm psubb mm4,mm6 \
/*mm7=255-max(2*L-abs(R_i),0)*/ \
__asm paddusb mm7,mm4 \
/*mm4=min(abs(R_i),max(2*L-abs(R_i),0))*/ \
__asm paddusb mm4,mm7 \
__asm psubusb mm4,mm7 \
/*Now split mm4 by the original sign of -R_i.*/ \
__asm movq mm5,mm4 \
__asm pand mm4,mm6 \
__asm pandn mm6,mm5 \
/*mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)}*/ \
/*mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}*/ \
__asm paddusb mm1,mm4 \
__asm psubusb mm2,mm4 \
__asm psubusb mm1,mm6 \
__asm paddusb mm2,mm6 \
}
#define OC_LOOP_FILTER_V_MMX(_pix,_ystride,_ll) \
do{ \
/*Used local variable pix__ in order to fix compilation errors like: \
"error C2425: 'SHL' : non-constant expression in 'second operand'".*/ \
unsigned char *pix__; \
unsigned char *ll__; \
ll__=(_ll); \
pix__=(_pix); \
__asm mov YSTRIDE,_ystride \
__asm mov LL,ll__ \
__asm mov PIX,pix__ \
__asm sub PIX,YSTRIDE \
__asm sub PIX,YSTRIDE \
/*mm0={a0,...,a7}*/ \
__asm movq mm0,[PIX] \
/*ystride3=_ystride*3*/ \
__asm lea YSTRIDE3,[YSTRIDE+YSTRIDE*2] \
/*mm3={d0,...,d7}*/ \
__asm movq mm3,[PIX+YSTRIDE3] \
/*mm1={b0,...,b7}*/ \
__asm movq mm1,[PIX+YSTRIDE] \
/*mm2={c0,...,c7}*/ \
__asm movq mm2,[PIX+YSTRIDE*2] \
OC_LOOP_FILTER8_MMX \
/*Write it back out.*/ \
__asm movq [PIX+YSTRIDE],mm1 \
__asm movq [PIX+YSTRIDE*2],mm2 \
} \
while(0)
#define OC_LOOP_FILTER_H_MMX(_pix,_ystride,_ll) \
do{ \
/*Used local variable ll__ in order to fix compilation errors like: \
"error C2443: operand size conflict".*/ \
unsigned char *ll__; \
unsigned char *pix__; \
ll__=(_ll); \
pix__=(_pix)-2; \
__asm mov PIX,pix__ \
__asm mov YSTRIDE,_ystride \
__asm mov LL,ll__ \
/*x x x x d0 c0 b0 a0*/ \
__asm movd mm0,[PIX] \
/*x x x x d1 c1 b1 a1*/ \
__asm movd mm1,[PIX+YSTRIDE] \
/*ystride3=_ystride*3*/ \
__asm lea YSTRIDE3,[YSTRIDE+YSTRIDE*2] \
/*x x x x d2 c2 b2 a2*/ \
__asm movd mm2,[PIX+YSTRIDE*2] \
/*x x x x d3 c3 b3 a3*/ \
__asm lea D,[PIX+YSTRIDE*4] \
__asm movd mm3,[PIX+YSTRIDE3] \
/*x x x x d4 c4 b4 a4*/ \
__asm movd mm4,[D] \
/*x x x x d5 c5 b5 a5*/ \
__asm movd mm5,[D+YSTRIDE] \
/*x x x x d6 c6 b6 a6*/ \
__asm movd mm6,[D+YSTRIDE*2] \
/*x x x x d7 c7 b7 a7*/ \
__asm movd mm7,[D+YSTRIDE3] \
/*mm0=d1 d0 c1 c0 b1 b0 a1 a0*/ \
__asm punpcklbw mm0,mm1 \
/*mm2=d3 d2 c3 c2 b3 b2 a3 a2*/ \
__asm punpcklbw mm2,mm3 \
/*mm3=d1 d0 c1 c0 b1 b0 a1 a0*/ \
__asm movq mm3,mm0 \
/*mm0=b3 b2 b1 b0 a3 a2 a1 a0*/ \
__asm punpcklwd mm0,mm2 \
/*mm3=d3 d2 d1 d0 c3 c2 c1 c0*/ \
__asm punpckhwd mm3,mm2 \
/*mm1=b3 b2 b1 b0 a3 a2 a1 a0*/ \
__asm movq mm1,mm0 \
/*mm4=d5 d4 c5 c4 b5 b4 a5 a4*/ \
__asm punpcklbw mm4,mm5 \
/*mm6=d7 d6 c7 c6 b7 b6 a7 a6*/ \
__asm punpcklbw mm6,mm7 \
/*mm5=d5 d4 c5 c4 b5 b4 a5 a4*/ \
__asm movq mm5,mm4 \
/*mm4=b7 b6 b5 b4 a7 a6 a5 a4*/ \
__asm punpcklwd mm4,mm6 \
/*mm5=d7 d6 d5 d4 c7 c6 c5 c4*/ \
__asm punpckhwd mm5,mm6 \
/*mm2=d3 d2 d1 d0 c3 c2 c1 c0*/ \
__asm movq mm2,mm3 \
/*mm0=a7 a6 a5 a4 a3 a2 a1 a0*/ \
__asm punpckldq mm0,mm4 \
/*mm1=b7 b6 b5 b4 b3 b2 b1 b0*/ \
__asm punpckhdq mm1,mm4 \
/*mm2=c7 c6 c5 c4 c3 c2 c1 c0*/ \
__asm punpckldq mm2,mm5 \
/*mm3=d7 d6 d5 d4 d3 d2 d1 d0*/ \
__asm punpckhdq mm3,mm5 \
OC_LOOP_FILTER8_MMX \
/*mm0={b0+R_0'',...,b7+R_7''}*/ \
__asm movq mm0,mm1 \
/*mm1={b0+R_0'',c0-R_0'',...,b3+R_3'',c3-R_3''}*/ \
__asm punpcklbw mm1,mm2 \
/*mm0={b4+R_4'',c4-R_4'',...,b7+R_7'',c7-R_7''}*/ \
__asm punpckhbw mm0,mm2 \
/*[d]=c1 b1 c0 b0*/ \
__asm movd D,mm1 \
__asm mov [PIX+1],D_WORD \
__asm psrlq mm1,32 \
__asm shr D,16 \
__asm mov [PIX+YSTRIDE+1],D_WORD \
/*[d]=c3 b3 c2 b2*/ \
__asm movd D,mm1 \
__asm mov [PIX+YSTRIDE*2+1],D_WORD \
__asm shr D,16 \
__asm mov [PIX+YSTRIDE3+1],D_WORD \
__asm lea PIX,[PIX+YSTRIDE*4] \
/*[d]=c5 b5 c4 b4*/ \
__asm movd D,mm0 \
__asm mov [PIX+1],D_WORD \
__asm psrlq mm0,32 \
__asm shr D,16 \
__asm mov [PIX+YSTRIDE+1],D_WORD \
/*[d]=c7 b7 c6 b6*/ \
__asm movd D,mm0 \
__asm mov [PIX+YSTRIDE*2+1],D_WORD \
__asm shr D,16 \
__asm mov [PIX+YSTRIDE3+1],D_WORD \
} \
while(0)
# endif
#endif
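
The comments in `OC_LOOP_FILTER8_MMX` describe the filter arithmetic for eight pixels at once. A scalar, one-pixel version may make the formulas easier to follow. This is a sketch derived from those comments, not the library's reference implementation: the function names are hypothetical, and the explicit floor division stands in for the arithmetic shift (`psraw`) used by the MMX code.

```c
/* Illustrative scalar sketch; names are hypothetical. */

/* lflim(R,L): sign(R)*min(|R|,max(2*L-|R|,0)), as in the MMX comments. */
static int lflim(int r, int l) {
  int mag = r < 0 ? -r : r;
  int lim = 2 * l - mag;
  if (lim < 0) lim = 0;
  if (lim > mag) lim = mag;
  return r < 0 ? -lim : lim;
}

/* Floor division by 8, so negative values round toward minus infinity
   the same way an arithmetic right shift by 3 would. */
static int floor_div8(int v) {
  return v >= 0 ? v / 8 : -((-v + 7) / 8);
}

/* Saturate to the unsigned 8-bit range, like paddusb/psubusb. */
static unsigned char clamp255(int v) {
  return (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Filters one position across an edge whose four pixels are a, b, c, d
   with the edge lying between b and c. Only b and c are modified. */
static void filter_pixel(int a, unsigned char *b, unsigned char *c, int d,
 int l) {
  int f = a - d + 3 * (*c - *b);
  int r = floor_div8(f + 4);
  int adj = lflim(r, l);
  *b = clamp255(*b + adj);
  *c = clamp255(*c - adj);
}
```

A small step across the edge is smoothed, while a step large enough that `2*L-|R|` goes negative is left untouched, which is the behaviour the `max(...,0)` term in the comments encodes.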

@ -1,176 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
/*MMX acceleration of complete fragment reconstruction algorithm.
Originally written by Rudolf Marek.*/
#include <string.h>
#include "x86int.h"
#include "mmxloop.h"
#if defined(OC_X86_ASM)
void oc_state_frag_recon_mmx(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant){
unsigned char *dst;
ptrdiff_t frag_buf_off;
int ystride;
int refi;
/*Apply the inverse transform.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
/*Note that this value must be unsigned, to keep the __asm__ block from
sign-extending it when it puts it in a register.*/
ogg_uint16_t p;
/*We round this dequant product (and not any of the others) because there's
no iDCT rounding.*/
p=(ogg_int16_t)(_dct_coeffs[0]*(ogg_int32_t)_dc_quant+15>>5);
/*Fill _dct_coeffs with p.*/
__asm{
#define Y eax
#define P ecx
mov Y,_dct_coeffs
movzx P,p
lea Y,[Y+128]
/*mm0=0000 0000 0000 AAAA*/
movd mm0,P
/*mm0=0000 0000 AAAA AAAA*/
punpcklwd mm0,mm0
/*mm0=AAAA AAAA AAAA AAAA*/
punpckldq mm0,mm0
movq [Y],mm0
movq [8+Y],mm0
movq [16+Y],mm0
movq [24+Y],mm0
movq [32+Y],mm0
movq [40+Y],mm0
movq [48+Y],mm0
movq [56+Y],mm0
movq [64+Y],mm0
movq [72+Y],mm0
movq [80+Y],mm0
movq [88+Y],mm0
movq [96+Y],mm0
movq [104+Y],mm0
movq [112+Y],mm0
movq [120+Y],mm0
#undef Y
#undef P
}
}
else{
/*Dequantize the DC coefficient.*/
_dct_coeffs[0]=(ogg_int16_t)(_dct_coeffs[0]*(int)_dc_quant);
oc_idct8x8_mmx(_dct_coeffs+64,_dct_coeffs,_last_zzi);
}
/*Fill in the target buffer.*/
frag_buf_off=_state->frag_buf_offs[_fragi];
refi=_state->frags[_fragi].refi;
ystride=_state->ref_ystride[_pli];
dst=_state->ref_frame_data[OC_FRAME_SELF]+frag_buf_off;
if(refi==OC_FRAME_SELF)oc_frag_recon_intra_mmx(dst,ystride,_dct_coeffs+64);
else{
const unsigned char *ref;
int mvoffsets[2];
ref=_state->ref_frame_data[refi]+frag_buf_off;
if(oc_state_get_mv_offsets(_state,mvoffsets,_pli,
_state->frag_mvs[_fragi])>1){
oc_frag_recon_inter2_mmx(dst,ref+mvoffsets[0],ref+mvoffsets[1],ystride,
_dct_coeffs+64);
}
else oc_frag_recon_inter_mmx(dst,ref+mvoffsets[0],ystride,_dct_coeffs+64);
}
}
/*We copy this entire function to inline the actual MMX routines so that we
use only a single indirect call.*/
void oc_loop_filter_init_mmx(signed char _bv[256],int _flimit){
memset(_bv,~(_flimit<<1),8);
}
/*Apply the loop filter to a given set of fragment rows in the given plane.
The filter may be run on the bottom edge, affecting pixels in the next row of
fragments, so this row also needs to be available.
_bv: The bounding values array.
_refi: The index of the frame buffer to filter.
_pli: The color plane to filter.
_fragy0: The Y coordinate of the first fragment row to filter.
_fragy_end: The Y coordinate of the fragment row to stop filtering at.*/
void oc_state_loop_filter_frag_rows_mmx(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end){
const oc_fragment_plane *fplane;
const oc_fragment *frags;
const ptrdiff_t *frag_buf_offs;
unsigned char *ref_frame_data;
ptrdiff_t fragi_top;
ptrdiff_t fragi_bot;
ptrdiff_t fragi0;
ptrdiff_t fragi0_end;
int ystride;
int nhfrags;
fplane=_state->fplanes+_pli;
nhfrags=fplane->nhfrags;
fragi_top=fplane->froffset;
fragi_bot=fragi_top+fplane->nfrags;
fragi0=fragi_top+_fragy0*(ptrdiff_t)nhfrags;
fragi0_end=fragi_top+_fragy_end*(ptrdiff_t)nhfrags;
ystride=_state->ref_ystride[_pli];
frags=_state->frags;
frag_buf_offs=_state->frag_buf_offs;
ref_frame_data=_state->ref_frame_data[_refi];
/*The following loops are constructed somewhat non-intuitively on purpose.
The main idea is: if a block boundary has at least one coded fragment on
it, the filter is applied to it.
However, the order that the filters are applied in matters, and VP3 chose
the somewhat strange ordering used below.*/
while(fragi0<fragi0_end){
ptrdiff_t fragi;
ptrdiff_t fragi_end;
fragi=fragi0;
fragi_end=fragi+nhfrags;
while(fragi<fragi_end){
if(frags[fragi].coded){
unsigned char *ref;
ref=ref_frame_data+frag_buf_offs[fragi];
#define PIX eax
#define YSTRIDE3 edi
#define YSTRIDE ecx
#define LL edx
#define D esi
#define D_WORD si
if(fragi>fragi0)OC_LOOP_FILTER_H_MMX(ref,ystride,_bv);
if(fragi0>fragi_top)OC_LOOP_FILTER_V_MMX(ref,ystride,_bv);
if(fragi+1<fragi_end&&!frags[fragi+1].coded){
OC_LOOP_FILTER_H_MMX(ref+8,ystride,_bv);
}
if(fragi+nhfrags<fragi_bot&&!frags[fragi+nhfrags].coded){
OC_LOOP_FILTER_V_MMX(ref+(ystride<<3),ystride,_bv);
}
#undef PIX
#undef YSTRIDE3
#undef YSTRIDE
#undef LL
#undef D
#undef D_WORD
}
fragi++;
}
fragi0+=nhfrags;
}
}
#endif
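
The DC-only branch of `oc_state_frag_recon_mmx` rounds the dequantized DC value and broadcasts it to all 64 residue slots. A plain C equivalent might look like the sketch below. The function name is hypothetical, `residue` stands for the second half of `_dct_coeffs` (the `_dct_coeffs+64` region), and the shift assumes arithmetic behaviour for negative products, as the original expression does.

```c
#include <stdint.h>

/* Illustrative sketch; fill_dc_only is not a libtheora function. */
static void fill_dc_only(int16_t residue[64], int16_t dc_coeff,
 uint16_t dc_quant) {
  /* Round the dequantized product here, because no iDCT (and hence no
     iDCT rounding) follows on this path. */
  int32_t product = dc_coeff * (int32_t)dc_quant + 15;
  int16_t p = (int16_t)(product >> 5);
  int i;
  /* The MMX code does this with sixteen 8-byte stores. */
  for (i = 0; i < 64; i++) residue[i] = p;
}
```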

@ -1,192 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
CPU capability detection for x86 processors.
Originally written by Rudolf Marek.
function:
last mod: $Id$
********************************************************************/
#include "x86cpu.h"
#if !defined(OC_X86_ASM)
ogg_uint32_t oc_cpu_flags_get(void){
return 0;
}
#else
/*Why does MSVC need this complicated rigamarole?
At this point I honestly do not care.*/
/*Visual C cpuid helper function.
For VS2005 we could as well use the _cpuid builtin, but that wouldn't work
for VS2003 users, so we do it in inline assembler.*/
static void oc_cpuid_helper(ogg_uint32_t _cpu_info[4],ogg_uint32_t _op){
_asm{
mov eax,[_op]
mov esi,_cpu_info
cpuid
mov [esi+0],eax
mov [esi+4],ebx
mov [esi+8],ecx
mov [esi+12],edx
}
}
# define cpuid(_op,_eax,_ebx,_ecx,_edx) \
do{ \
ogg_uint32_t cpu_info[4]; \
oc_cpuid_helper(cpu_info,_op); \
(_eax)=cpu_info[0]; \
(_ebx)=cpu_info[1]; \
(_ecx)=cpu_info[2]; \
(_edx)=cpu_info[3]; \
}while(0)
static void oc_detect_cpuid_helper(ogg_uint32_t *_eax,ogg_uint32_t *_ebx){
_asm{
pushfd
pushfd
pop eax
mov ebx,eax
xor eax,200000h
push eax
popfd
pushfd
pop eax
popfd
mov ecx,_eax
mov [ecx],eax
mov ecx,_ebx
mov [ecx],ebx
}
}
static ogg_uint32_t oc_parse_intel_flags(ogg_uint32_t _edx,ogg_uint32_t _ecx){
ogg_uint32_t flags;
/*If there isn't even MMX, give up.*/
if(!(_edx&0x00800000))return 0;
flags=OC_CPU_X86_MMX;
if(_edx&0x02000000)flags|=OC_CPU_X86_MMXEXT|OC_CPU_X86_SSE;
if(_edx&0x04000000)flags|=OC_CPU_X86_SSE2;
if(_ecx&0x00000001)flags|=OC_CPU_X86_PNI;
if(_ecx&0x00000100)flags|=OC_CPU_X86_SSSE3;
if(_ecx&0x00080000)flags|=OC_CPU_X86_SSE4_1;
if(_ecx&0x00100000)flags|=OC_CPU_X86_SSE4_2;
return flags;
}
static ogg_uint32_t oc_parse_amd_flags(ogg_uint32_t _edx,ogg_uint32_t _ecx){
ogg_uint32_t flags;
/*If there isn't even MMX, give up.*/
if(!(_edx&0x00800000))return 0;
flags=OC_CPU_X86_MMX;
if(_edx&0x00400000)flags|=OC_CPU_X86_MMXEXT;
if(_edx&0x80000000)flags|=OC_CPU_X86_3DNOW;
if(_edx&0x40000000)flags|=OC_CPU_X86_3DNOWEXT;
if(_ecx&0x00000040)flags|=OC_CPU_X86_SSE4A;
if(_ecx&0x00000800)flags|=OC_CPU_X86_SSE5;
return flags;
}
ogg_uint32_t oc_cpu_flags_get(void){
ogg_uint32_t flags;
ogg_uint32_t eax;
ogg_uint32_t ebx;
ogg_uint32_t ecx;
ogg_uint32_t edx;
# if !defined(__amd64__)&&!defined(__x86_64__)
/*Not all x86-32 chips support cpuid, so we have to check.*/
oc_detect_cpuid_helper(&eax,&ebx);
/*No cpuid.*/
if(eax==ebx)return 0;
# endif
cpuid(0,eax,ebx,ecx,edx);
/* l e t n I e n i u n e G*/
if(ecx==0x6C65746E&&edx==0x49656E69&&ebx==0x756E6547||
/* 6 8 x M T e n i u n e G*/
ecx==0x3638784D&&edx==0x54656E69&&ebx==0x756E6547){
int family;
int model;
/*Intel, Transmeta (tested with Crusoe TM5800):*/
cpuid(1,eax,ebx,ecx,edx);
flags=oc_parse_intel_flags(edx,ecx);
family=(eax>>8)&0xF;
model=(eax>>4)&0xF;
/*The SSE unit on the Pentium M and Core Duo is much slower than the MMX
unit, so don't use it.*/
if(family==6&&(model==9||model==13||model==14)){
flags&=~(OC_CPU_X86_SSE2|OC_CPU_X86_PNI);
}
}
/* D M A c i t n e h t u A*/
else if(ecx==0x444D4163&&edx==0x69746E65&&ebx==0x68747541||
/* C S N y b e d o e G*/
ecx==0x43534e20&&edx==0x79622065&&ebx==0x646f6547){
/*AMD, Geode:*/
cpuid(0x80000000,eax,ebx,ecx,edx);
if(eax<0x80000001)flags=0;
else{
cpuid(0x80000001,eax,ebx,ecx,edx);
flags=oc_parse_amd_flags(edx,ecx);
}
/*Also check for SSE.*/
cpuid(1,eax,ebx,ecx,edx);
flags|=oc_parse_intel_flags(edx,ecx);
}
/*Technically some VIA chips can be configured in the BIOS to return any
string here the user wants.
There is a special detection method that can be used to identify such
processors, but in my opinion, if the user really wants to change it, they
deserve what they get.*/
/* s l u a H r u a t n e C*/
else if(ecx==0x736C7561&&edx==0x48727561&&ebx==0x746E6543){
/*VIA:*/
/*I only have documentation for the C7 (Esther) and Isaiah (forthcoming)
chips (thanks to the engineers from Centaur Technology who provided it).
These chips support Intel-like cpuid info.
The C3-2 (Nehemiah) cores appear to, as well.*/
cpuid(1,eax,ebx,ecx,edx);
flags=oc_parse_intel_flags(edx,ecx);
if(eax>=0x80000001){
/*The (non-Nehemiah) C3 processors support AMD-like cpuid info.
We need to check this even if the Intel test succeeds to pick up 3DNow!
support on these processors.
Unlike actual AMD processors, we cannot _rely_ on this info, since
some cores (e.g., the 693 stepping of the Nehemiah) claim to support
this function, yet return edx=0, despite the Intel test indicating
MMX support.
Therefore the features detected here are strictly added to those
detected by the Intel test.*/
/*TODO: How about earlier chips?*/
cpuid(0x80000001,eax,ebx,ecx,edx);
/*Note: As of the C7, this function returns Intel-style extended feature
flags, not AMD-style.
Currently, this only defines bits 11, 20, and 29 (0x20100800), which
do not conflict with any of the AMD flags we inspect.
For the remaining bits, Intel tells us, "Do not count on their value",
but VIA assures us that they will all be zero (at least on the C7 and
Isaiah chips).
In the (unlikely) event a future processor uses bits 18, 19, 30, or 31
(0xC0C00000) for something else, we will have to add code to detect
the model to decide when it is appropriate to inspect them.*/
flags|=oc_parse_amd_flags(edx,ecx);
}
}
else{
/*Implement me.*/
flags=0;
}
return flags;
}
#endif
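
The hexadecimal constants compared in `oc_cpu_flags_get` are the cpuid vendor string packed four characters per register, in little-endian byte order, which is why the comments spell the names backwards. The sketch below decodes them. The helper names are hypothetical; the register order (ebx, edx, ecx) and the MMX bit mask are taken from the code above.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch; these helpers are not part of libtheora. */

/* Decodes the 12-character vendor string returned by cpuid leaf 0.
   out must have room for 13 bytes, including the terminating NUL. */
static void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
 char out[13]) {
  uint32_t regs[3];
  int r;
  int b;
  regs[0] = ebx;
  regs[1] = edx;
  regs[2] = ecx;
  for (r = 0; r < 3; r++) {
    for (b = 0; b < 4; b++) {
      out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    }
  }
  out[12] = '\0';
}

/* Same test as the first check in oc_parse_intel_flags: bit 23 of edx
   from cpuid leaf 1 indicates MMX. */
static int has_mmx(uint32_t edx) {
  return (edx & 0x00800000u) != 0;
}
```

Feeding in the three sets of constants from the comparisons above yields `GenuineIntel`, `AuthenticAMD` and `CentaurHauls` respectively.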

@ -1,36 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_x86_vc_x86cpu_H)
# define _x86_vc_x86cpu_H (1)
#include "../internal.h"
#define OC_CPU_X86_MMX (1<<0)
#define OC_CPU_X86_3DNOW (1<<1)
#define OC_CPU_X86_3DNOWEXT (1<<2)
#define OC_CPU_X86_MMXEXT (1<<3)
#define OC_CPU_X86_SSE (1<<4)
#define OC_CPU_X86_SSE2 (1<<5)
#define OC_CPU_X86_PNI (1<<6)
#define OC_CPU_X86_SSSE3 (1<<7)
#define OC_CPU_X86_SSE4_1 (1<<8)
#define OC_CPU_X86_SSE4_2 (1<<9)
#define OC_CPU_X86_SSE4A (1<<10)
#define OC_CPU_X86_SSE5 (1<<11)
ogg_uint32_t oc_cpu_flags_get(void);
#endif

@ -1,49 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#if !defined(_x86_vc_x86int_H)
# define _x86_vc_x86int_H (1)
# include "../internal.h"
# if defined(OC_X86_ASM)
# define oc_state_accel_init oc_state_accel_init_x86
# define OC_STATE_USE_VTABLE (1)
# endif
# include "../state.h"
# include "x86cpu.h"
void oc_state_accel_init_x86(oc_theora_state *_state);
void oc_frag_copy_mmx(unsigned char *_dst,
const unsigned char *_src,int _ystride);
void oc_frag_copy_list_mmx(unsigned char *_dst_frame,
const unsigned char *_src_frame,int _ystride,
const ptrdiff_t *_fragis,ptrdiff_t _nfragis,const ptrdiff_t *_frag_buf_offs);
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_mmx(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_mmx(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t *_residue);
void oc_idct8x8_mmx(ogg_int16_t _y[64],ogg_int16_t _x[64],int _last_zzi);
void oc_state_frag_recon_mmx(const oc_theora_state *_state,ptrdiff_t _fragi,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,ogg_uint16_t _dc_quant);
void oc_loop_filter_init_mmx(signed char _bv[256],int _flimit);
void oc_state_loop_filter_frag_rows_mmx(const oc_theora_state *_state,
signed char _bv[256],int _refi,int _pli,int _fragy0,int _fragy_end);
void oc_restore_fpu_mmx(void);
#endif

@ -1,61 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id$
********************************************************************/
#include "x86int.h"
#if defined(OC_X86_ASM)
/*This table has been modified from OC_FZIG_ZAG by baking a 4x4 transpose into
each quadrant of the destination.*/
static const unsigned char OC_FZIG_ZAG_MMX[128]={
0, 8, 1, 2, 9,16,24,17,
10, 3,32,11,18,25, 4,12,
5,26,19,40,33,34,41,48,
27, 6,13,20,28,21,14, 7,
56,49,42,35,43,50,57,36,
15,22,29,30,23,44,37,58,
51,59,38,45,52,31,60,53,
46,39,47,54,61,62,55,63,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
64,64,64,64,64,64,64,64,
};
void oc_state_accel_init_x86(oc_theora_state *_state){
_state->cpu_flags=oc_cpu_flags_get();
if(_state->cpu_flags&OC_CPU_X86_MMX){
_state->opt_vtable.frag_copy=oc_frag_copy_mmx;
_state->opt_vtable.frag_copy_list=oc_frag_copy_list_mmx;
_state->opt_vtable.frag_recon_intra=oc_frag_recon_intra_mmx;
_state->opt_vtable.frag_recon_inter=oc_frag_recon_inter_mmx;
_state->opt_vtable.frag_recon_inter2=oc_frag_recon_inter2_mmx;
_state->opt_vtable.idct8x8=oc_idct8x8_mmx;
_state->opt_vtable.state_frag_recon=oc_state_frag_recon_mmx;
_state->opt_vtable.loop_filter_init=oc_loop_filter_init_mmx;
_state->opt_vtable.state_loop_filter_frag_rows=
oc_state_loop_filter_frag_rows_mmx;
_state->opt_vtable.restore_fpu=oc_restore_fpu_mmx;
_state->opt_data.dct_fzig_zag=OC_FZIG_ZAG_MMX;
}
else oc_state_accel_init_c(_state);
}
#endif
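
The comment on `OC_FZIG_ZAG_MMX` says a 4x4 transpose is baked into each quadrant of the destination. One way to express that index mapping is sketched below. The function name is hypothetical, and the claim that applying it to the conventional zig-zag order reproduces the table was only checked by hand against the leading entries, so treat it as an assumption rather than a statement about how the table was generated.

```c
/* Illustrative sketch; transpose_in_quadrant is not a libtheora function. */

/* Maps a row-major index into an 8x8 block to the index reached by
   transposing the element within its own 4x4 quadrant. Bit 2 of the row
   and column selects the quadrant and is left alone; the low two bits
   are the position inside the quadrant and are swapped. */
static int transpose_in_quadrant(int idx) {
  int row = idx >> 3;
  int col = idx & 7;
  int new_row = (row & 4) | (col & 3);
  int new_col = (col & 4) | (row & 3);
  return new_row * 8 + new_col;
}
```

For example, the conventional zig-zag order begins 0, 1, 8, 16, 9, 2, 3, 10, and mapping each value through this function gives 0, 8, 1, 2, 9, 16, 24, 17, which matches the first row of the table. The mapping is its own inverse, so applying it twice returns the original index.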

@ -1,106 +0,0 @@
# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*-
# vim: set filetype=python:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
with Files('*'):
BUG_COMPONENT = ('Core', 'Audio/Video')
EXPORTS.theora += [
'include/theora/codec.h',
'include/theora/theoradec.h',
]
# We allow warnings for third-party code that can be updated from upstream.
AllowCompilerWarnings()
FINAL_LIBRARY = 'gkcodecs'
NoVisibilityFlags()
# The encoder is currently not included.
DEFINES['THEORA_DISABLE_ENCODE'] = True
# Suppress warnings in third-party code.
CFLAGS += ['-Wno-type-limits']
if CONFIG['CC_TYPE'] in ('clang', 'clang-cl'):
CFLAGS += [
'-Wno-shift-negative-value',
'-Wno-tautological-compare',
]
if CONFIG['CC_TYPE'] == 'clang-cl':
CFLAGS += [
'-Wno-parentheses',
'-Wno-pointer-sign',
]
UNIFIED_SOURCES += [
'lib/bitpack.c',
'lib/decinfo.c',
'lib/decode.c',
'lib/dequant.c',
'lib/fragment.c',
'lib/huffdec.c',
'lib/idct.c',
'lib/info.c',
'lib/internal.c',
'lib/quant.c',
'lib/state.c',
]
LOCAL_INCLUDES += ['include']
if CONFIG['INTEL_ARCHITECTURE']:
if CONFIG['OS_ARCH'] != 'SunOS':
if CONFIG['CC_TYPE'] == 'clang-cl':
# clang-cl can't handle libtheora's inline asm.
pass
elif CONFIG['OS_ARCH'] != 'WINNT' or CONFIG['TARGET_CPU'] != 'x86_64':
DEFINES['OC_X86_ASM'] = True
if CONFIG['TARGET_CPU'] == 'x86_64':
DEFINES['OC_X86_64_ASM'] = True
if CONFIG['CC_TYPE'] == 'clang-cl':
# clang-cl can't handle libtheora's inline asm.
pass
#SOURCES += [
# 'lib/x86_vc/mmxfrag.c',
# 'lib/x86_vc/mmxidct.c',
# 'lib/x86_vc/mmxstate.c',
# 'lib/x86_vc/x86cpu.c',
# 'lib/x86_vc/x86state.c',
#]
else:
SOURCES += [
'lib/x86/mmxfrag.c',
'lib/x86/mmxidct.c',
'lib/x86/mmxstate.c',
'lib/x86/sse2idct.c',
'lib/x86/x86cpu.c',
'lib/x86/x86state.c',
]
if CONFIG['GNU_AS']:
if CONFIG['TARGET_CPU'] == 'arm':
SOURCES += [
'lib/arm/armcpu.c',
'lib/arm/armstate.c',
]
for var in ('OC_ARM_ASM',
'OC_ARM_ASM_EDSP',
'OC_ARM_ASM_MEDIA',
'OC_ARM_ASM_NEON'):
DEFINES[var] = True
SOURCES += [ '!%s.s' % f for f in [
'armbits-gnu',
'armfrag-gnu',
'armidct-gnu',
'armloop-gnu',
]]
# These flags are a lie; they're just used to enable the requisite
# opcodes; actual arch detection is done at runtime.
ASFLAGS += [
'-march=armv7-a',
]
ASFLAGS += CONFIG['NEON_FLAGS']

@ -1,109 +0,0 @@
schema: 1
bugzilla:
product: Core
component: "Audio/Video: Playback"
origin:
name: theora
description: Video compression format from Xiph
url: https://www.theora.org/
release: 7180717276af1ebc7da15c83162d6c5d6203aabf (2020-10-27T09:17:42.000-07:00).
revision: 7180717276af1ebc7da15c83162d6c5d6203aabf
license: BSD-3-Clause-Clear
license-file: COPYING
updatebot:
maintainer-phab: kinetik
maintainer-bz: kinetik
tasks:
- type: vendoring
enabled: true
frequency: every
vendoring:
url: https://gitlab.xiph.org/xiph/theora
source-hosting: gitlab
exclude:
- doc
- examples
- lib/c64x
- m4
- macosx
- symbian
- tests
- tools
- win32
- autogen.sh
- .travis.yml
- configure.ac
- SConstruct
- Makefile.am
- "*.pc.in"
- "*.spec.in"
- include/theora/theoraenc.h
- include/theora/Makefile.*
- include/Makefile.*
- lib/analyze.c
- lib/apiwrapper.c
- lib/apiwrapper.h
- lib/arm/armenc.c
- lib/arm/armenc.h
- lib/arm/armencfrag.s
- lib/arm/armenquant.s
- lib/collect.c
- lib/collect.h
- lib/decapiwrapper.c
- lib/encapiwrapper.c
- lib/encfrag.c
- lib/encinfo.c
- lib/encint.h
- lib/encode.c
- lib/encoder_disabled.c
- lib/enquant.c
- lib/enquant.h
- lib/fdct.c
- lib/huffenc.c
- lib/huffenc.h
- lib/mathops.c
- lib/mcenc.c
- lib/modedec.h
- lib/rate.c
- lib/tokenize.c
- lib/x86/mmxencfrag.c
- lib/x86/mmxfdct.c
- lib/x86/sse2encfrag.c
- lib/x86/sse2fdct.c
- lib/x86/x86enc.c
- lib/x86/x86enc.h
- lib/x86/x86enquant.c
- lib/x86/x86zigzag.h
- lib/x86_vc/mmxencfrag.c
- lib/x86_vc/mmxfdct.c
- lib/x86_vc/x86enc.c
- lib/x86_vc/x86enc.h
- lib/x86_vc/x86zigzag.h
- lib/Makefile.*
- lib/Version_script*
- lib/*.awk
- lib/*.def
- lib/*.exp
keep:
- Makefile.in
- lib/config.h
patches:
- clang-arm.patch
update-actions:
- action: move-file
from: '{vendor_dir}/lib/arm/armopts.s.in'
to: '{vendor_dir}/lib/arm/armopts.s'
- action: replace-in-file-regex
file: '{vendor_dir}/lib/arm/armopts.s'
pattern: '@HAVE_ARM_ASM_((EDSP)|(MEDIA)|(NEON))@'
with: '1'

@ -8,7 +8,7 @@
# documentation and how to modify this file.
repo: mozilla-central
created_at: '2021-10-14T12:50:40.073465'
updated_at: '2024-07-10T15:47:32.256847+00:00'
updated_at: '2024-07-11T11:59:38.343950+00:00'
export:
path: ./docs/mots/index.rst
format: rst
@ -2109,7 +2109,6 @@ modules:
- media/libnestegg/**/*
- media/libogg/**/*
- media/libopus/**/*
- media/libtheora/**/*
- media/libtremor/**/*
- media/libvorbis/**/*
- media/libvpx/**/*
@ -4359,5 +4358,5 @@ modules:
- Ryan Tilder
group: dev-platform
hashes:
config: c187a82ea9772e8a0e65611c0d64c164ccd7f60c
export: 7101ade973617f1b4aa16978462a432b059ffdcd
config: 9d863547e5bafe974d1808b92938fd54c2442a77
export: fb113b325e4913583a923e07c6489f81b5821e4d

@ -5482,7 +5482,6 @@ OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
with the specified copyright year ranges:</p>
<ul>
<li><code>media/libogg/</code>, 2002</li>
<li><code>media/libtheora/</code>, 2002-2007</li>
<li><code>media/libvorbis/</code>, 2002-2004</li>
<li><code>media/libspeex_resampler/</code>, 2002-2008</li>
</ul>

@ -123,7 +123,6 @@ media/libopus/
media/libpng/
media/libsoundtouch/
media/libspeex_resampler/
media/libtheora/
media/libvorbis/
media/libvpx/
media/libwebp/