mirror of
https://github.com/mozilla/gecko-dev.git
synced 2024-11-27 14:52:16 +00:00
Bug 1879873 - Remove kiss fft and openmax dl. r=karlt,sylvestre
Differential Revision: https://phabricator.services.mozilla.com/D201600
parent be66068593
commit a71183d09c
4
config/external/moz.build
vendored
@@ -49,9 +49,6 @@ if not CONFIG["MOZ_SYSTEM_PNG"]:
 if not CONFIG["MOZ_SYSTEM_WEBP"]:
     external_dirs += ["media/libwebp"]

-if CONFIG["TARGET_CPU"] == "arm":
-    external_dirs += ["media/openmax_dl/dl"]
-
 if CONFIG["MOZ_FFVPX"]:
     external_dirs += ["media/ffvpx"]

@@ -59,7 +56,6 @@ if CONFIG["MOZ_JXL"]:
     external_dirs += ["media/libjxl", "media/highway"]

 external_dirs += [
-    "media/kiss_fft",
     "media/libcubeb",
     "media/libmkv",
     "media/libnestegg",
@@ -130,8 +130,6 @@ if CONFIG["TARGET_CPU"] == "aarch64" or CONFIG["BUILD_ARM_NEON"]:
     LOCAL_INCLUDES += ["/third_party/xsimd/include"]
     SOURCES += ["AudioNodeEngineNEON.cpp"]
     SOURCES["AudioNodeEngineNEON.cpp"].flags += CONFIG["NEON_FLAGS"]
-    if CONFIG["BUILD_ARM_NEON"]:
-        LOCAL_INCLUDES += ["/media/openmax_dl/dl/api/"]

 # Are we targeting x86 or x64?  If so, build SSEX files.
 if CONFIG["INTEL_ARCHITECTURE"]:
@@ -1,123 +0,0 @@
|
||||
1.3.0 2012-07-18
|
||||
removed non-standard malloc.h from kiss_fft.h
|
||||
|
||||
moved -lm to end of link line
|
||||
|
||||
checked various return values
|
||||
|
||||
converted python Numeric code to NumPy
|
||||
|
||||
fixed test of int32_t on 64 bit OS
|
||||
|
||||
added padding in a couple of places to allow SIMD alignment of structs
|
||||
|
||||
1.2.9 2010-05-27
|
||||
threadsafe ( including OpenMP )
|
||||
|
||||
first edition of kissfft.hh the C++ template fft engine
|
||||
|
||||
1.2.8
|
||||
Changed memory.h to string.h -- apparently more standard
|
||||
|
||||
Added openmp extensions. This can have fairly linear speedups for larger FFT sizes.
|
||||
|
||||
1.2.7
|
||||
Shrank the real-fft memory footprint. Thanks to Galen Seitz.
|
||||
|
||||
1.2.6 (Nov 14, 2006) The "thanks to GenArts" release.
|
||||
Added multi-dimensional real-optimized FFT, see tools/kiss_fftndr
|
||||
Thanks go to GenArts, Inc. for sponsoring the development.
|
||||
|
||||
1.2.5 (June 27, 2006) The "release for no good reason" release.
|
||||
Changed some harmless code to make some compilers' warnings go away.
|
||||
Added some more digits to pi -- why not.
|
||||
Added kiss_fft_next_fast_size() function to help people decide how much to pad.
|
||||
Changed multidimensional test from 8 dimensions to only 3 to avoid testing
|
||||
problems with fixed point (sorry Buckaroo Banzai).
|
||||
|
||||
1.2.4 (Oct 27, 2005) The "oops, inverse fixed point real fft was borked" release.
|
||||
Fixed scaling bug for inverse fixed point real fft -- also fixed test code that should've been failing.
|
||||
Thanks to Jean-Marc Valin for bug report.
|
||||
|
||||
Use sys/types.h for more portable types than short,int,long => int16_t,int32_t,int64_t
|
||||
If your system does not have these, you may need to define them -- but at least it breaks in a
|
||||
loud and easily fixable way -- unlike silently using the wrong size type.
|
||||
|
||||
Hopefully tools/psdpng.c is fixed -- thanks to Steve Kellog for pointing out the weirdness.
|
||||
|
||||
1.2.3 (June 25, 2005) The "you want to use WHAT as a sample" release.
|
||||
Added ability to use 32 bit fixed point samples -- requires a 64 bit intermediate result, a la 'long long'
|
||||
|
||||
Added ability to do 4 FFTs in parallel by using SSE SIMD instructions. This is accomplished by
|
||||
using the __m128 (vector of 4 floats) as kiss_fft_scalar. Define USE_SIMD to use this.
|
||||
|
||||
I know, I know ... this is drifting a bit from the "kiss" principle, but the speed advantages
|
||||
make it worth it for some. Also recent gcc makes it SOO easy to use vectors of 4 floats like a POD type.
|
||||
|
||||
1.2.2 (May 6, 2005) The Matthew release
|
||||
Replaced fixed point division with multiply&shift. Thanks to Jean-Marc Valin for
|
||||
discussions regarding. Considerable speedup for fixed-point.
|
||||
|
||||
Corrected overflow protection in real fft routines when using fixed point.
|
||||
Finder's Credit goes to Robert Oschler of robodance for pointing me at the bug.
|
||||
This also led to the CHECK_OVERFLOW_OP macro.
|
||||
|
||||
1.2.1 (April 4, 2004)
|
||||
compiles cleanly with just about every -W warning flag under the sun
|
||||
|
||||
reorganized kiss_fft_state so it could be read-only/const. This may be useful for embedded systems
|
||||
that are willing to predeclare twiddle factors, factorization.
|
||||
|
||||
Fixed C_MUL,S_MUL on 16-bit platforms.
|
||||
|
||||
tmpbuf will only be allocated if input & output buffers are same
|
||||
scratchbuf will only be allocated for ffts that are not multiples of 2,3,5
|
||||
|
||||
NOTE: The tmpbuf,scratchbuf changes may require synchronization code for multi-threaded apps.
|
||||
|
||||
|
||||
1.2 (Feb 23, 2004)
|
||||
interface change -- cfg object is forward declaration of struct instead of void*
|
||||
This maintains type safety and lets the compiler warn/error about stupid mistakes.
|
||||
(prompted by suggestion from Erik de Castro Lopo)
|
||||
|
||||
small speed improvements
|
||||
|
||||
added psdpng.c -- sample utility that will create png spectrum "waterfalls" from an input file
|
||||
( not terribly useful yet)
|
||||
|
||||
1.1.1 (Feb 1, 2004 )
|
||||
minor bug fix -- only affects odd rank, in-place, multi-dimensional FFTs
|
||||
|
||||
1.1 : (Jan 30,2004)
|
||||
split sample_code/ into test/ and tools/
|
||||
|
||||
Removed 2-D fft and added N-D fft (arbitrary)
|
||||
|
||||
modified fftutil.c to allow multi-d FFTs
|
||||
|
||||
Modified core fft routine to allow an input stride via kiss_fft_stride()
|
||||
(eased support of multi-D ffts)
|
||||
|
||||
Added fast convolution filtering (FIR filtering using overlap-scrap method, with tail scrap)
|
||||
|
||||
Add kfc.[ch]: the KISS FFT Cache. It takes care of allocs for you ( suggested by Oscar Lesta ).
|
||||
|
||||
1.0.1 (Dec 15, 2003)
|
||||
fixed bug that occurred when nfft==1. Thanks to Steven Johnson.
|
||||
|
||||
1.0 : (Dec 14, 2003)
|
||||
changed kiss_fft function from using a single buffer, to two buffers.
|
||||
If the same buffer pointer is supplied for both in and out, kiss will
|
||||
manage the buffer copies.
|
||||
|
||||
added kiss_fft2d and kiss_fftr as separate source files (declarations in kiss_fft.h )
|
||||
|
||||
0.4 :(Nov 4,2003) optimized for radix 2,3,4,5
|
||||
|
||||
0.3 :(Oct 28, 2003) woops, version 2 didn't actually factor out any radices other than 2.
|
||||
Thanks to Steven Johnson for finding this one.
|
||||
|
||||
0.2 :(Oct 27, 2003) added mixed radix, only radix 2,4 optimized versions
|
||||
|
||||
0.1 :(May 19 2003) initial release, radix 2 only
|
@@ -1,11 +0,0 @@
|
||||
Copyright (c) 2003-2010 Mark Borgerding
|
||||
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
|
||||
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
|
||||
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
@@ -1,134 +0,0 @@
|
||||
KISS FFT - A mixed-radix Fast Fourier Transform based upon the principle,
"Keep It Simple, Stupid."

There are many great fft libraries already around. Kiss FFT is not trying
to be better than any of them. It only attempts to be a reasonably efficient,
moderately useful FFT that can use fixed or floating data types and can be
incorporated into someone's C program in a few minutes with trivial licensing.
|
||||
|
||||
USAGE:
|
||||
|
||||
The basic usage for 1-d complex FFT is:

    #include "kiss_fft.h"

    kiss_fft_cfg cfg = kiss_fft_alloc( nfft ,is_inverse_fft ,0,0 );

    while ...

        ... // put kth sample in cx_in[k].r and cx_in[k].i

        kiss_fft( cfg , cx_in , cx_out );

        ... // transformed. DC is in cx_out[0].r and cx_out[0].i

    free(cfg);

Note: frequency-domain data is stored from dc up to 2pi.
    so cx_out[0] is the dc bin of the FFT
    and cx_out[nfft/2] is the Nyquist bin (if exists)
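The bin layout in the note above can be turned into frequencies with a small helper. This is only a sketch: `bin_to_hz` and the `fs_hz` sample-rate parameter are hypothetical names and are not part of kiss_fft. It assumes the usual convention that bins above nfft/2 represent negative frequencies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of kiss_fft): centre frequency in Hz of
 * output bin k of an nfft-point complex FFT taken at sample rate fs_hz.
 * Bins 0..nfft/2 run from DC up to Nyquist; bins above nfft/2 are
 * reported as negative frequencies. */
static double bin_to_hz(size_t k, size_t nfft, double fs_hz)
{
    assert(nfft > 0 && k < nfft);
    if (k <= nfft / 2)
        return (double)k * fs_hz / (double)nfft;
    return ((double)k - (double)nfft) * fs_hz / (double)nfft;
}
```

For example, with nfft=1024 and a 48 kHz sample rate, bin 0 is DC and bin 512 is the Nyquist bin at 24 kHz.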
|
||||
|
||||
Declarations are in "kiss_fft.h", along with a brief description of the
|
||||
functions you'll need to use.
|
||||
|
||||
Code definitions for 1d complex FFTs are in kiss_fft.c.
|
||||
|
||||
You can do other cool stuff with the extras you'll find in tools/

    * multi-dimensional FFTs
    * real-optimized FFTs (returns the positive half-spectrum: (nfft/2+1) complex frequency bins)
    * fast convolution FIR filtering (not available for fixed point)
    * spectrum image creation

The core fft and most tools/ code can be compiled to use float, double,
Q15 short or Q31 samples. The default is float.
|
||||
|
||||
|
||||
BACKGROUND:
|
||||
|
||||
I started coding this because I couldn't find a fixed point FFT that didn't
|
||||
use assembly code. I started with floating point numbers so I could get the
|
||||
theory straight before working on fixed point issues. In the end, I had a
|
||||
little bit of code that could be recompiled easily to do ffts with short, float
|
||||
or double (other types should be easy too).
|
||||
|
||||
Once I got my FFT working, I was curious about the speed compared to
|
||||
a well respected and highly optimized fft library. I don't want to criticize
|
||||
this great library, so let's call it FFT_BRANDX.
|
||||
During this process, I learned:

    1. FFT_BRANDX has more than 100K lines of code. The core of kiss_fft is about 500 lines (cpx 1-d).
    2. It took me an embarrassingly long time to get FFT_BRANDX working.
    3. A simple program using FFT_BRANDX is 522KB. A similar program using kiss_fft is 18KB (without optimizing for size).
    4. FFT_BRANDX is roughly twice as fast as KISS FFT in default mode.
|
||||
|
||||
It is wonderful that free, highly optimized libraries like FFT_BRANDX exist.
|
||||
But such libraries carry a huge burden of complexity necessary to extract every
|
||||
last bit of performance.
|
||||
|
||||
Sometimes simpler is better, even if it's not better.
|
||||
|
||||
FREQUENTLY ASKED QUESTIONS:
    Q: Can I use kissfft in a project with a ___ license?
    A: Yes.  See LICENSE below.

    Q: Why don't I get the output I expect?
    A: The two most common causes of this are
        1) scaling : is there a constant multiplier between what you got and what you want?
        2) mixed build environment -- all code must be compiled with same preprocessor
        definitions for FIXED_POINT and kiss_fft_scalar
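The "constant multiplier" in cause 1 is easy to see with an unscaled transform: a forward pass followed by an inverse pass returns the input multiplied by the transform length. The sketch below does not call kiss_fft. It is a hand-written 4-point DFT with exact integer twiddles, and `cpx4` and `dft4` are hypothetical names chosen for this illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for kiss_fft_cpx, using int so the arithmetic is exact. */
typedef struct { int r, i; } cpx4;

/* Unscaled 4-point DFT. The forward twiddles exp(-2*pi*j*q/4), q = 0..3,
 * are exactly 1, -j, -1, +j; the inverse uses their complex conjugates. */
static void dft4(const cpx4 in[4], cpx4 out[4], int inverse)
{
    static const cpx4 w_fwd[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    assert(in != out); /* out-of-place only */
    for (int k = 0; k < 4; ++k) {
        cpx4 acc = { 0, 0 };
        for (int t = 0; t < 4; ++t) {
            cpx4 w = w_fwd[(k * t) % 4];
            if (inverse)
                w.i = -w.i;
            /* complex multiply-accumulate: acc += in[t] * w */
            acc.r += in[t].r * w.r - in[t].i * w.i;
            acc.i += in[t].r * w.i + in[t].i * w.r;
        }
        out[k] = acc;
    }
}
```

Running `dft4` forward and then inverse on {1, 2, 3, 4} gives {4, 8, 12, 16}, so the caller would divide by 4 to recover the input. If your output is off by a factor like this, scaling is the likely cause.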
|
||||
|
||||
Q: Will you write/debug my code for me?
|
||||
A: Probably not unless you pay me. I am happy to answer pointed and topical questions, but
|
||||
I may refer you to a book, a forum, or some other resource.
|
||||
|
||||
|
||||
PERFORMANCE:
|
||||
(on Athlon XP 2100+, with gcc 2.96, float data type)
|
||||
|
||||
Kiss performed 10000 1024-pt cpx ffts in .63 s of cpu time.
|
||||
For comparison, it took md5sum twice as long to process the same amount of data.
|
||||
|
||||
Transforming 5 minutes of CD quality audio takes less than a second (nfft=1024).
|
||||
|
||||
DO NOT:
|
||||
... use Kiss if you need the Fastest Fourier Transform in the World
|
||||
... ask me to add features that will bloat the code
|
||||
|
||||
UNDER THE HOOD:
|
||||
|
||||
Kiss FFT uses a time decimation, mixed-radix, out-of-place FFT. If you give it an input buffer
and output buffer that are the same, a temporary buffer will be created to hold the data.

No static data is used.  The core routines of kiss_fft are thread-safe (but not all of the tools directory).

No scaling is done for the floating point version (for speed).
Scaling is done both ways for the fixed-point version (for overflow prevention).
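For the fixed-point case, the scaling is built from a rounded Q15 multiply. The sketch below rewrites the smul/sround/DIVSCALAR macros from _kiss_fft_guts.h as plain functions for the 16-bit configuration. The function names `q15_mul` and `q15_div_by_radix` are hypothetical. It assumes inputs stay within +/-32767 (the header defines SAMP_MIN as -SAMP_MAX) and that right-shifting a negative value is an arithmetic shift, as on common compilers.

```c
#include <assert.h>
#include <stdint.h>

#define FRACBITS 15

/* Q15 multiply with round-to-nearest, mirroring sround( smul(a,b) ). */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b; /* Q30 product */
    return (int16_t)((prod + (1 << (FRACBITS - 1))) >> FRACBITS);
}

/* Scale x by 1/k using a multiply by 32767/k instead of a division,
 * in the same way DIVSCALAR does for a butterfly of radix k. */
static int16_t q15_div_by_radix(int16_t x, int k)
{
    assert(k > 0);
    return q15_mul(x, (int16_t)(32767 / k));
}
```

The reciprocal is truncated, so the result can be one step below the exact quotient: `q15_div_by_radix(30000, 4)` yields 7499 rather than 7500.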
|
||||
|
||||
Optimized butterflies are used for factors 2,3,4, and 5.
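Because only those factors get optimized butterflies, it can pay to pad a transform up to a length whose prime factors are all 2, 3, or 5. The changelog mentions kiss_fft_next_fast_size() for this purpose. The function below is a hypothetical stand-in written for illustration, and is not claimed to match the library's implementation.

```c
#include <assert.h>

/* Hypothetical helper: smallest size >= n whose only prime factors
 * are 2, 3 and 5 (4 is covered because it is 2*2). */
static int next_size_235(int n)
{
    assert(n > 0);
    for (;;) {
        int m = n;
        while (m % 2 == 0) m /= 2;
        while (m % 3 == 0) m /= 3;
        while (m % 5 == 0) m /= 5;
        if (m == 1)
            return n; /* nothing left but factors of 2, 3, 5 */
        ++n;
    }
}
```

For example, a request for 1001 points would be padded to 1024, while 1000 (2^3 * 5^3) is already a fast size.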
|
||||
|
||||
The real (i.e. not complex) optimization code only works for even length ffts.  It does two half-length
FFTs in parallel (packed into real&imag), and then combines them via twiddling.  The result is
nfft/2+1 complex frequency bins from DC to Nyquist.  If you don't know what this means, search the web.

The fast convolution filtering uses the overlap-scrap method, slightly
modified to put the scrap at the tail.
|
||||
|
||||
LICENSE:
|
||||
Revised BSD License, see COPYING for verbiage.
|
||||
Basically, "free to use&change, give credit where due, no guarantees"
|
||||
Note this license is compatible with GPL at one end of the spectrum and closed, commercial software at
|
||||
the other end. See http://www.fsf.org/licensing/licenses
|
||||
|
||||
A commercial license is available which removes the requirement for attribution. Contact me for details.
|
||||
|
||||
|
||||
TODO:
|
||||
*) Add real optimization for odd length FFTs
|
||||
*) Document/revisit the input/output fft scaling
|
||||
*) Make doc describing the overlap (tail) scrap fast convolution filtering in kiss_fastfir.c
|
||||
*) Test all the ./tools/ code with fixed point (kiss_fastfir.c doesn't work, maybe others)
|
||||
|
||||
AUTHOR:
|
||||
Mark Borgerding
|
||||
Mark@Borgerding.net
|
@@ -1,78 +0,0 @@
|
||||
If you are reading this, it means you think you may be interested in using the SIMD extensions in kissfft
|
||||
to do 4 *separate* FFTs at once.
|
||||
|
||||
Beware! Beyond here there be dragons!
|
||||
|
||||
This API is not easy to use, is not well documented, and breaks the KISS principle.
|
||||
|
||||
|
||||
Still reading? Okay, you may get rewarded for your patience with a considerable speedup
|
||||
(2-3x) on intel x86 machines with SSE if you are willing to jump through some hoops.
|
||||
|
||||
The basic idea is to use the packed 4 float __m128 data type as a scalar element.
|
||||
This means that the format is pretty convoluted. It performs 4 FFTs per fft call on signals A,B,C,D.
|
||||
|
||||
For complex data, the data is interlaced as follows:
    rA0,rB0,rC0,rD0,      iA0,iB0,iC0,iD0,   rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1 ...
where "rA0" is the real part of the zeroth sample for signal A

Real-only data is laid out:
    rA0,rB0,rC0,rD0,     rA1,rB1,rC1,rD1,      ...
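The complex layout can be produced in portable C, without intrinsics, by writing four real parts and then four imaginary parts for each sample index. This is a sketch under assumptions: `cpxf` and `pack4_cpx` are hypothetical names, and for actual SIMD use the output buffer would also need 16-byte alignment.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float r, i; } cpxf; /* hypothetical plain complex sample */

/* Interleave four complex signals of length n into out[8*n] as
 * rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,... */
static void pack4_cpx(float *out, const cpxf *a, const cpxf *b,
                      const cpxf *c, const cpxf *d, size_t n)
{
    assert(out && a && b && c && d);
    for (size_t k = 0; k < n; ++k) {
        float *re = out + 8 * k; /* four real parts of sample k */
        float *im = re + 4;      /* four imaginary parts of sample k */
        re[0] = a[k].r; re[1] = b[k].r; re[2] = c[k].r; re[3] = d[k].r;
        im[0] = a[k].i; im[1] = b[k].i; im[2] = c[k].i; im[3] = d[k].i;
    }
}
```

Each group of four floats then corresponds to one __m128 scalar in the layout described above.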
|
||||
|
||||
Compile with gcc flags something like
|
||||
-O3 -mpreferred-stack-boundary=4 -DUSE_SIMD=1 -msse
|
||||
|
||||
Be aware of SIMD alignment. This is the most likely cause of segfaults.
|
||||
The code within kissfft uses scratch variables on the stack.
|
||||
With SIMD, these must have addresses on 16 byte boundaries.
|
||||
Search on "SIMD alignment" for more info.
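The note above concerns scratch variables on the stack, which the compiler flags address. Heap buffers handed to the SIMD build presumably need the same 16-byte alignment. One way to get it, assuming a C11 runtime that provides aligned_alloc, is sketched below. `alloc_simd_floats` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n_floats floats on a 16-byte
 * boundary. Returns NULL on failure; release the buffer with free(). */
static float *alloc_simd_floats(size_t n_floats)
{
    assert(n_floats <= (SIZE_MAX - 15u) / sizeof(float));
    size_t bytes = n_floats * sizeof(float);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment */
    bytes = (bytes + 15u) & ~(size_t)15u;
    if (bytes == 0)
        bytes = 16;
    return (float *)aligned_alloc(16, bytes);
}
```

A quick sanity check is that `(uintptr_t)ptr % 16` is zero for the returned pointer.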
|
||||
|
||||
|
||||
|
||||
Robin at Divide Concept was kind enough to share his code for formatting to/from the SIMD kissfft.
|
||||
I have not run it -- use it at your own risk. It appears to do 4xN and Nx4 transpositions
|
||||
(out of place).
|
||||
|
||||
void SSETools::pack128(float* target, float* source, unsigned long size128)
|
||||
{
|
||||
__m128* pDest = (__m128*)target;
|
||||
__m128* pDestEnd = pDest+size128;
|
||||
float* source0=source;
|
||||
float* source1=source0+size128;
|
||||
float* source2=source1+size128;
|
||||
float* source3=source2+size128;
|
||||
|
||||
while(pDest<pDestEnd)
|
||||
{
|
||||
*pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
|
||||
source0++;
|
||||
source1++;
|
||||
source2++;
|
||||
source3++;
|
||||
pDest++;
|
||||
}
|
||||
}
|
||||
|
||||
void SSETools::unpack128(float* target, float* source, unsigned long size128)
|
||||
{
|
||||
|
||||
float* pSrc = source;
|
||||
float* pSrcEnd = pSrc+size128*4;
|
||||
float* target0=target;
|
||||
float* target1=target0+size128;
|
||||
float* target2=target1+size128;
|
||||
float* target3=target2+size128;
|
||||
|
||||
while(pSrc<pSrcEnd)
|
||||
{
|
||||
*target0=pSrc[0];
|
||||
*target1=pSrc[1];
|
||||
*target2=pSrc[2];
|
||||
*target3=pSrc[3];
|
||||
target0++;
|
||||
target1++;
|
||||
target2++;
|
||||
target3++;
|
||||
pSrc+=4;
|
||||
}
|
||||
}
|
@@ -1,164 +0,0 @@
|
||||
/*
|
||||
Copyright (c) 2003-2010, Mark Borgerding
|
||||
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
|
||||
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
|
||||
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*/
|
||||
|
||||
/* kiss_fft.h
|
||||
defines kiss_fft_scalar as either short or a float type
|
||||
and defines
|
||||
typedef struct { kiss_fft_scalar r; kiss_fft_scalar i; }kiss_fft_cpx; */
|
||||
#include "kiss_fft.h"
|
||||
#include <limits.h>
|
||||
|
||||
#define MAXFACTORS 32
|
||||
/* e.g. an fft of length 128 has 4 factors
|
||||
as far as kissfft is concerned
|
||||
4*4*4*2
|
||||
*/
|
||||
|
||||
struct kiss_fft_state{
|
||||
int nfft;
|
||||
int inverse;
|
||||
int factors[2*MAXFACTORS];
|
||||
kiss_fft_cpx twiddles[1];
|
||||
};
|
||||
|
||||
/*
|
||||
Explanation of macros dealing with complex math:
|
||||
|
||||
C_MUL(m,a,b) : m = a*b
|
||||
C_FIXDIV( c , div ) : if a fixed point impl., c /= div. noop otherwise
|
||||
C_SUB( res, a,b) : res = a - b
|
||||
C_SUBFROM( res , a) : res -= a
|
||||
C_ADDTO( res , a) : res += a
|
||||
* */
|
||||
#ifdef FIXED_POINT
|
||||
#if (FIXED_POINT==32)
|
||||
# define FRACBITS 31
|
||||
# define SAMPPROD int64_t
|
||||
#define SAMP_MAX 2147483647
|
||||
#else
|
||||
# define FRACBITS 15
|
||||
# define SAMPPROD int32_t
|
||||
#define SAMP_MAX 32767
|
||||
#endif
|
||||
|
||||
#define SAMP_MIN -SAMP_MAX
|
||||
|
||||
#if defined(CHECK_OVERFLOW)
|
||||
# define CHECK_OVERFLOW_OP(a,op,b) \
|
||||
if ( (SAMPPROD)(a) op (SAMPPROD)(b) > SAMP_MAX || (SAMPPROD)(a) op (SAMPPROD)(b) < SAMP_MIN ) { \
|
||||
fprintf(stderr,"WARNING:overflow @ " __FILE__ "(%d): (%d " #op" %d) = %ld\n",__LINE__,(a),(b),(SAMPPROD)(a) op (SAMPPROD)(b) ); }
|
||||
#endif
|
||||
|
||||
|
||||
# define smul(a,b) ( (SAMPPROD)(a)*(b) )
|
||||
# define sround( x ) (kiss_fft_scalar)( ( (x) + (1<<(FRACBITS-1)) ) >> FRACBITS )
|
||||
|
||||
# define S_MUL(a,b) sround( smul(a,b) )
|
||||
|
||||
# define C_MUL(m,a,b) \
|
||||
do{ (m).r = sround( smul((a).r,(b).r) - smul((a).i,(b).i) ); \
|
||||
(m).i = sround( smul((a).r,(b).i) + smul((a).i,(b).r) ); }while(0)
|
||||
|
||||
# define DIVSCALAR(x,k) \
|
||||
(x) = sround( smul( x, SAMP_MAX/k ) )
|
||||
|
||||
# define C_FIXDIV(c,div) \
|
||||
do { DIVSCALAR( (c).r , div); \
|
||||
DIVSCALAR( (c).i , div); }while (0)
|
||||
|
||||
# define C_MULBYSCALAR( c, s ) \
|
||||
do{ (c).r = sround( smul( (c).r , s ) ) ;\
|
||||
(c).i = sround( smul( (c).i , s ) ) ; }while(0)
|
||||
|
||||
#else /* not FIXED_POINT*/
|
||||
|
||||
# define S_MUL(a,b) ( (a)*(b) )
|
||||
#define C_MUL(m,a,b) \
|
||||
do{ (m).r = (a).r*(b).r - (a).i*(b).i;\
|
||||
(m).i = (a).r*(b).i + (a).i*(b).r; }while(0)
|
||||
# define C_FIXDIV(c,div) /* NOOP */
|
||||
# define C_MULBYSCALAR( c, s ) \
|
||||
do{ (c).r *= (s);\
|
||||
(c).i *= (s); }while(0)
|
||||
#endif
|
||||
|
||||
#ifndef CHECK_OVERFLOW_OP
|
||||
# define CHECK_OVERFLOW_OP(a,op,b) /* noop */
|
||||
#endif
|
||||
|
||||
#define C_ADD( res, a,b)\
|
||||
do { \
|
||||
CHECK_OVERFLOW_OP((a).r,+,(b).r)\
|
||||
CHECK_OVERFLOW_OP((a).i,+,(b).i)\
|
||||
(res).r=(a).r+(b).r; (res).i=(a).i+(b).i; \
|
||||
}while(0)
|
||||
#define C_SUB( res, a,b)\
|
||||
do { \
|
||||
CHECK_OVERFLOW_OP((a).r,-,(b).r)\
|
||||
CHECK_OVERFLOW_OP((a).i,-,(b).i)\
|
||||
(res).r=(a).r-(b).r; (res).i=(a).i-(b).i; \
|
||||
}while(0)
|
||||
#define C_ADDTO( res , a)\
|
||||
do { \
|
||||
CHECK_OVERFLOW_OP((res).r,+,(a).r)\
|
||||
CHECK_OVERFLOW_OP((res).i,+,(a).i)\
|
||||
(res).r += (a).r; (res).i += (a).i;\
|
||||
}while(0)
|
||||
|
||||
#define C_SUBFROM( res , a)\
|
||||
do {\
|
||||
CHECK_OVERFLOW_OP((res).r,-,(a).r)\
|
||||
CHECK_OVERFLOW_OP((res).i,-,(a).i)\
|
||||
(res).r -= (a).r; (res).i -= (a).i; \
|
||||
}while(0)
|
||||
|
||||
|
||||
#ifdef FIXED_POINT
|
||||
# define KISS_FFT_COS(phase) floor(.5+SAMP_MAX * cos (phase))
|
||||
# define KISS_FFT_SIN(phase) floor(.5+SAMP_MAX * sin (phase))
|
||||
# define HALF_OF(x) ((x)>>1)
|
||||
#elif defined(USE_SIMD)
|
||||
# define KISS_FFT_COS(phase) _mm_set1_ps( cos(phase) )
|
||||
# define KISS_FFT_SIN(phase) _mm_set1_ps( sin(phase) )
|
||||
# define HALF_OF(x) ((x)*_mm_set1_ps(.5))
|
||||
#else
|
||||
# define KISS_FFT_COS(phase) (kiss_fft_scalar) cos(phase)
|
||||
# define KISS_FFT_SIN(phase) (kiss_fft_scalar) sin(phase)
|
||||
# define HALF_OF(x) ((x)*.5)
|
||||
#endif
|
||||
|
||||
#define kf_cexp(x,phase) \
|
||||
do{ \
|
||||
(x)->r = KISS_FFT_COS(phase);\
|
||||
(x)->i = KISS_FFT_SIN(phase);\
|
||||
}while(0)
|
||||
|
||||
|
||||
/* a debugging function */
|
||||
#define pcpx(c)\
|
||||
fprintf(stderr,"%g + %gi\n",(double)((c)->r),(double)((c)->i) )
|
||||
|
||||
|
||||
#ifdef KISS_FFT_USE_ALLOCA
|
||||
// define this to allow use of alloca instead of malloc for temporary buffers
|
||||
// Temporary buffers are used in two cases:
|
||||
// 1. FFT sizes that have "bad" factors. i.e. not 2,3 and 5
|
||||
// 2. "in-place" FFTs. Notice the quotes, since kissfft does not really do an in-place transform.
|
||||
#include <alloca.h>
|
||||
#define KISS_FFT_TMP_ALLOC(nbytes) alloca(nbytes)
|
||||
#define KISS_FFT_TMP_FREE(ptr)
|
||||
#else
|
||||
#define KISS_FFT_TMP_ALLOC(nbytes) KISS_FFT_MALLOC(nbytes)
|
||||
#define KISS_FFT_TMP_FREE(ptr) KISS_FFT_FREE(ptr)
|
||||
#endif
|
@@ -1,408 +0,0 @@
|
||||
/*
|
||||
Copyright (c) 2003-2010, Mark Borgerding
|
||||
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
|
||||
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
|
||||
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*/
|
||||
|
||||
|
||||
#include "_kiss_fft_guts.h"
|
||||
/* The guts header contains all the multiplication and addition macros that are defined for
|
||||
fixed or floating point complex numbers.  It also declares the kf_ internal functions.
|
||||
*/
|
||||
|
||||
static void kf_bfly2(
|
||||
kiss_fft_cpx * Fout,
|
||||
const size_t fstride,
|
||||
const kiss_fft_cfg st,
|
||||
int m
|
||||
)
|
||||
{
|
||||
kiss_fft_cpx * Fout2;
|
||||
kiss_fft_cpx * tw1 = st->twiddles;
|
||||
kiss_fft_cpx t;
|
||||
Fout2 = Fout + m;
|
||||
do{
|
||||
C_FIXDIV(*Fout,2); C_FIXDIV(*Fout2,2);
|
||||
|
||||
C_MUL (t, *Fout2 , *tw1);
|
||||
tw1 += fstride;
|
||||
C_SUB( *Fout2 , *Fout , t );
|
||||
C_ADDTO( *Fout , t );
|
||||
++Fout2;
|
||||
++Fout;
|
||||
}while (--m);
|
||||
}
|
||||
|
||||
static void kf_bfly4(
|
||||
kiss_fft_cpx * Fout,
|
||||
const size_t fstride,
|
||||
const kiss_fft_cfg st,
|
||||
const size_t m
|
||||
)
|
||||
{
|
||||
kiss_fft_cpx *tw1,*tw2,*tw3;
|
||||
kiss_fft_cpx scratch[6];
|
||||
size_t k=m;
|
||||
const size_t m2=2*m;
|
||||
const size_t m3=3*m;
|
||||
|
||||
|
||||
tw3 = tw2 = tw1 = st->twiddles;
|
||||
|
||||
do {
|
||||
C_FIXDIV(*Fout,4); C_FIXDIV(Fout[m],4); C_FIXDIV(Fout[m2],4); C_FIXDIV(Fout[m3],4);
|
||||
|
||||
C_MUL(scratch[0],Fout[m] , *tw1 );
|
||||
C_MUL(scratch[1],Fout[m2] , *tw2 );
|
||||
C_MUL(scratch[2],Fout[m3] , *tw3 );
|
||||
|
||||
C_SUB( scratch[5] , *Fout, scratch[1] );
|
||||
C_ADDTO(*Fout, scratch[1]);
|
||||
C_ADD( scratch[3] , scratch[0] , scratch[2] );
|
||||
C_SUB( scratch[4] , scratch[0] , scratch[2] );
|
||||
C_SUB( Fout[m2], *Fout, scratch[3] );
|
||||
tw1 += fstride;
|
||||
tw2 += fstride*2;
|
||||
tw3 += fstride*3;
|
||||
C_ADDTO( *Fout , scratch[3] );
|
||||
|
||||
if(st->inverse) {
|
||||
Fout[m].r = scratch[5].r - scratch[4].i;
|
||||
Fout[m].i = scratch[5].i + scratch[4].r;
|
||||
Fout[m3].r = scratch[5].r + scratch[4].i;
|
||||
Fout[m3].i = scratch[5].i - scratch[4].r;
|
||||
}else{
|
||||
Fout[m].r = scratch[5].r + scratch[4].i;
|
||||
Fout[m].i = scratch[5].i - scratch[4].r;
|
||||
Fout[m3].r = scratch[5].r - scratch[4].i;
|
||||
Fout[m3].i = scratch[5].i + scratch[4].r;
|
||||
}
|
||||
++Fout;
|
||||
}while(--k);
|
||||
}
|
||||
|
||||
static void kf_bfly3(
|
||||
kiss_fft_cpx * Fout,
|
||||
const size_t fstride,
|
||||
const kiss_fft_cfg st,
|
||||
size_t m
|
||||
)
|
||||
{
|
||||
size_t k=m;
|
||||
const size_t m2 = 2*m;
|
||||
kiss_fft_cpx *tw1,*tw2;
|
||||
kiss_fft_cpx scratch[5];
|
||||
kiss_fft_cpx epi3;
|
||||
epi3 = st->twiddles[fstride*m];
|
||||
|
||||
tw1=tw2=st->twiddles;
|
||||
|
||||
do{
|
||||
C_FIXDIV(*Fout,3); C_FIXDIV(Fout[m],3); C_FIXDIV(Fout[m2],3);
|
||||
|
||||
C_MUL(scratch[1],Fout[m] , *tw1);
|
||||
C_MUL(scratch[2],Fout[m2] , *tw2);
|
||||
|
||||
C_ADD(scratch[3],scratch[1],scratch[2]);
|
||||
C_SUB(scratch[0],scratch[1],scratch[2]);
|
||||
tw1 += fstride;
|
||||
tw2 += fstride*2;
|
||||
|
||||
Fout[m].r = Fout->r - HALF_OF(scratch[3].r);
|
||||
Fout[m].i = Fout->i - HALF_OF(scratch[3].i);
|
||||
|
||||
C_MULBYSCALAR( scratch[0] , epi3.i );
|
||||
|
||||
C_ADDTO(*Fout,scratch[3]);
|
||||
|
||||
Fout[m2].r = Fout[m].r + scratch[0].i;
|
||||
Fout[m2].i = Fout[m].i - scratch[0].r;
|
||||
|
||||
Fout[m].r -= scratch[0].i;
|
||||
Fout[m].i += scratch[0].r;
|
||||
|
||||
++Fout;
|
||||
}while(--k);
|
||||
}
|
||||
|
||||
static void kf_bfly5(
|
||||
kiss_fft_cpx * Fout,
|
||||
const size_t fstride,
|
||||
const kiss_fft_cfg st,
|
||||
int m
|
||||
)
|
||||
{
|
||||
kiss_fft_cpx *Fout0,*Fout1,*Fout2,*Fout3,*Fout4;
|
||||
int u;
|
||||
kiss_fft_cpx scratch[13];
|
||||
kiss_fft_cpx * twiddles = st->twiddles;
|
||||
kiss_fft_cpx *tw;
|
||||
kiss_fft_cpx ya,yb;
|
||||
ya = twiddles[fstride*m];
|
||||
yb = twiddles[fstride*2*m];
|
||||
|
||||
Fout0=Fout;
|
||||
Fout1=Fout0+m;
|
||||
Fout2=Fout0+2*m;
|
||||
Fout3=Fout0+3*m;
|
||||
Fout4=Fout0+4*m;
|
||||
|
||||
tw=st->twiddles;
|
||||
for ( u=0; u<m; ++u ) {
|
||||
C_FIXDIV( *Fout0,5); C_FIXDIV( *Fout1,5); C_FIXDIV( *Fout2,5); C_FIXDIV( *Fout3,5); C_FIXDIV( *Fout4,5);
|
||||
scratch[0] = *Fout0;
|
||||
|
||||
C_MUL(scratch[1] ,*Fout1, tw[u*fstride]);
|
||||
C_MUL(scratch[2] ,*Fout2, tw[2*u*fstride]);
|
||||
C_MUL(scratch[3] ,*Fout3, tw[3*u*fstride]);
|
||||
C_MUL(scratch[4] ,*Fout4, tw[4*u*fstride]);
|
||||
|
||||
C_ADD( scratch[7],scratch[1],scratch[4]);
|
||||
C_SUB( scratch[10],scratch[1],scratch[4]);
|
||||
C_ADD( scratch[8],scratch[2],scratch[3]);
|
||||
C_SUB( scratch[9],scratch[2],scratch[3]);
|
||||
|
||||
Fout0->r += scratch[7].r + scratch[8].r;
|
||||
Fout0->i += scratch[7].i + scratch[8].i;
|
||||
|
||||
scratch[5].r = scratch[0].r + S_MUL(scratch[7].r,ya.r) + S_MUL(scratch[8].r,yb.r);
|
||||
scratch[5].i = scratch[0].i + S_MUL(scratch[7].i,ya.r) + S_MUL(scratch[8].i,yb.r);
|
||||
|
||||
scratch[6].r = S_MUL(scratch[10].i,ya.i) + S_MUL(scratch[9].i,yb.i);
|
||||
scratch[6].i = -S_MUL(scratch[10].r,ya.i) - S_MUL(scratch[9].r,yb.i);
|
||||
|
||||
C_SUB(*Fout1,scratch[5],scratch[6]);
|
||||
C_ADD(*Fout4,scratch[5],scratch[6]);
|
||||
|
||||
scratch[11].r = scratch[0].r + S_MUL(scratch[7].r,yb.r) + S_MUL(scratch[8].r,ya.r);
|
||||
scratch[11].i = scratch[0].i + S_MUL(scratch[7].i,yb.r) + S_MUL(scratch[8].i,ya.r);
|
||||
scratch[12].r = - S_MUL(scratch[10].i,yb.i) + S_MUL(scratch[9].i,ya.i);
|
||||
scratch[12].i = S_MUL(scratch[10].r,yb.i) - S_MUL(scratch[9].r,ya.i);
|
||||
|
||||
C_ADD(*Fout2,scratch[11],scratch[12]);
|
||||
C_SUB(*Fout3,scratch[11],scratch[12]);
|
||||
|
||||
++Fout0;++Fout1;++Fout2;++Fout3;++Fout4;
|
||||
}
|
||||
}
|
||||
|
||||
/* perform the butterfly for one stage of a mixed radix FFT */
|
||||
static void kf_bfly_generic(
|
||||
kiss_fft_cpx * Fout,
|
||||
const size_t fstride,
|
||||
const kiss_fft_cfg st,
|
||||
int m,
|
||||
int p
|
||||
)
|
||||
{
|
||||
int u,k,q1,q;
|
||||
kiss_fft_cpx * twiddles = st->twiddles;
|
||||
kiss_fft_cpx t;
|
||||
int Norig = st->nfft;
|
||||
|
||||
kiss_fft_cpx * scratch = (kiss_fft_cpx*)KISS_FFT_TMP_ALLOC(sizeof(kiss_fft_cpx)*p);
|
||||
|
||||
for ( u=0; u<m; ++u ) {
|
||||
k=u;
|
||||
for ( q1=0 ; q1<p ; ++q1 ) {
|
||||
scratch[q1] = Fout[ k ];
|
||||
C_FIXDIV(scratch[q1],p);
|
||||
k += m;
|
||||
}
|
||||
|
||||
k=u;
|
||||
for ( q1=0 ; q1<p ; ++q1 ) {
|
||||
int twidx=0;
|
||||
Fout[ k ] = scratch[0];
|
||||
for (q=1;q<p;++q ) {
|
||||
twidx += fstride * k;
|
||||
if (twidx>=Norig) twidx-=Norig;
|
||||
C_MUL(t,scratch[q] , twiddles[twidx] );
|
||||
C_ADDTO( Fout[ k ] ,t);
|
||||
}
|
||||
k += m;
|
||||
}
|
||||
}
|
||||
KISS_FFT_TMP_FREE(scratch);
|
||||
}
|
||||
|
||||
static
|
||||
void kf_work(
|
||||
kiss_fft_cpx * Fout,
|
||||
const kiss_fft_cpx * f,
|
||||
const size_t fstride,
|
||||
int in_stride,
|
||||
int * factors,
|
||||
const kiss_fft_cfg st
|
||||
)
|
||||
{
|
||||
kiss_fft_cpx * Fout_beg=Fout;
|
||||
const int p=*factors++; /* the radix */
|
||||
const int m=*factors++; /* stage's fft length/p */
|
||||
const kiss_fft_cpx * Fout_end = Fout + p*m;
|
||||
|
||||
#ifdef _OPENMP
|
||||
// use openmp extensions at the
|
||||
// top-level (not recursive)
|
||||
if (fstride==1 && p<=5)
|
||||
{
|
||||
int k;
|
||||
|
||||
// execute the p different work units in different threads
|
||||
# pragma omp parallel for
|
||||
for (k=0;k<p;++k)
|
||||
kf_work( Fout +k*m, f+ fstride*in_stride*k,fstride*p,in_stride,factors,st);
|
||||
// all threads have joined by this point
|
||||
|
||||
switch (p) {
|
||||
case 2: kf_bfly2(Fout,fstride,st,m); break;
|
||||
case 3: kf_bfly3(Fout,fstride,st,m); break;
|
||||
case 4: kf_bfly4(Fout,fstride,st,m); break;
|
||||
case 5: kf_bfly5(Fout,fstride,st,m); break;
|
||||
default: kf_bfly_generic(Fout,fstride,st,m,p); break;
|
||||
}
|
||||
return;
|
||||
}
|
||||
#endif
|
||||
|
||||
if (m==1) {
|
||||
do{
|
||||
*Fout = *f;
|
||||
f += fstride*in_stride;
|
||||
}while(++Fout != Fout_end );
|
||||
}else{
|
||||
do{
|
||||
// recursive call:
|
||||
// DFT of size m*p performed by doing
|
||||
// p instances of smaller DFTs of size m,
|
||||
// each one takes a decimated version of the input
|
||||
kf_work( Fout , f, fstride*p, in_stride, factors,st);
|
||||
f += fstride*in_stride;
|
||||
}while( (Fout += m) != Fout_end );
|
||||
}
|
||||
|
||||
Fout=Fout_beg;
|
||||
|
||||
// recombine the p smaller DFTs
|
||||
switch (p) {
|
||||
case 2: kf_bfly2(Fout,fstride,st,m); break;
|
||||
case 3: kf_bfly3(Fout,fstride,st,m); break;
|
||||
case 4: kf_bfly4(Fout,fstride,st,m); break;
|
||||
case 5: kf_bfly5(Fout,fstride,st,m); break;
|
||||
default: kf_bfly_generic(Fout,fstride,st,m,p); break;
|
||||
}
|
||||
}
|
||||
|
||||
/* facbuf is populated by p1,m1,p2,m2, ...
|
||||
where
|
||||
p[i] * m[i] = m[i-1]
|
||||
m0 = n */
|
||||
static
|
||||
void kf_factor(int n,int * facbuf)
|
||||
{
|
||||
int p=4;
|
||||
double floor_sqrt;
|
||||
floor_sqrt = floor( sqrt((double)n) );
|
||||
|
||||
/*factor out powers of 4, powers of 2, then any remaining primes */
|
||||
do {
|
||||
while (n % p) {
|
||||
switch (p) {
|
||||
case 4: p = 2; break;
|
||||
case 2: p = 3; break;
|
||||
default: p += 2; break;
|
||||
}
|
||||
if (p > floor_sqrt)
|
||||
p = n; /* no more factors, skip to end */
|
||||
}
|
||||
n /= p;
|
||||
*facbuf++ = p;
|
||||
*facbuf++ = n;
|
||||
} while (n > 1);
|
||||
}
|
||||
|
||||
/*
|
||||
*
|
||||
* User-callable function to allocate all necessary storage space for the fft.
|
||||
*
|
||||
* The return value is a contiguous block of memory, allocated with malloc. As such,
|
||||
* it can be freed with free(), rather than a kiss_fft-specific function.
|
||||
* */
|
||||
kiss_fft_cfg kiss_fft_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem )
|
||||
{
|
||||
kiss_fft_cfg st=NULL;
|
||||
size_t memneeded = sizeof(struct kiss_fft_state)
|
||||
+ sizeof(kiss_fft_cpx)*(nfft-1); /* twiddle factors*/
|
||||
|
||||
if ( lenmem==NULL ) {
|
||||
st = ( kiss_fft_cfg)KISS_FFT_MALLOC( memneeded );
|
||||
}else{
|
||||
if (mem != NULL && *lenmem >= memneeded)
|
||||
st = (kiss_fft_cfg)mem;
|
||||
*lenmem = memneeded;
|
||||
}
|
||||
if (st) {
|
||||
int i;
|
||||
st->nfft=nfft;
|
||||
st->inverse = inverse_fft;
|
||||
|
||||
for (i=0;i<nfft;++i) {
|
||||
const double pi=3.141592653589793238462643383279502884197169399375105820974944;
|
||||
double phase = -2*pi*i / nfft;
|
||||
if (st->inverse)
|
||||
phase *= -1;
|
||||
kf_cexp(st->twiddles+i, phase );
|
||||
}
|
||||
|
||||
kf_factor(nfft,st->factors);
|
||||
}
|
||||
return st;
|
||||
}
|
||||
|
||||
|
||||
void kiss_fft_stride(kiss_fft_cfg st,const kiss_fft_cpx *fin,kiss_fft_cpx *fout,int in_stride)
|
||||
{
|
||||
if (fin == fout) {
|
||||
//NOTE: this is not really an in-place FFT algorithm.
|
||||
//It just performs an out-of-place FFT into a temp buffer
|
||||
kiss_fft_cpx * tmpbuf = (kiss_fft_cpx*)KISS_FFT_TMP_ALLOC( sizeof(kiss_fft_cpx)*st->nfft);
|
||||
kf_work(tmpbuf,fin,1,in_stride, st->factors,st);
|
||||
memcpy(fout,tmpbuf,sizeof(kiss_fft_cpx)*st->nfft);
|
||||
KISS_FFT_TMP_FREE(tmpbuf);
|
||||
}else{
|
||||
kf_work( fout, fin, 1,in_stride, st->factors,st );
|
||||
}
|
||||
}
|
||||
|
||||
void kiss_fft(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout)
|
||||
{
|
||||
kiss_fft_stride(cfg,fin,fout,1);
|
||||
}
|
||||
|
||||
|
||||
void kiss_fft_cleanup(void)
|
||||
{
|
||||
// nothing needed any more
|
||||
}
|
||||
|
||||
int kiss_fft_next_fast_size(int n)
|
||||
{
|
||||
while(1) {
|
||||
int m=n;
|
||||
while ( (m%2) == 0 ) m/=2;
|
||||
while ( (m%3) == 0 ) m/=3;
|
||||
while ( (m%5) == 0 ) m/=5;
|
||||
if (m<=1)
|
||||
break; /* n is completely factorable by twos, threes, and fives */
|
||||
n++;
|
||||
}
|
||||
return n;
|
||||
}
|
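The factoring and size-rounding routines at the end of the removed kiss_fft.c can be sketched in isolation. The block below is a minimal sketch, not the removed code itself: `isqrt_floor`, `factor_stages` and `next_fast_size` are hypothetical names, the integer square root replaces the `floor(sqrt())` call so that no math library is needed, and the `assert` preconditions are additions. It shows that a size made only of factors 2, 3 and 5 decomposes into radices 2, 3, 4 and 5, which are the radices handled by the specialized butterflies in the `switch` above.

```c
#include <assert.h>

/* Hypothetical helper: integer floor(sqrt(n)) without <math.h>. */
static int isqrt_floor(int n)
{
    int r = 0;
    while ((long)(r + 1) * (r + 1) <= n)
        ++r;
    return r;
}

/* Writes (radix, remaining length) pairs into facbuf in the same order as
 * the factoring loop above: radix 4 first, then 2, then odd candidates.
 * Returns the number of stages; facbuf must hold 2 * stages ints. */
static int factor_stages(int n, int *facbuf)
{
    int p = 4;
    int stages = 0;
    const int limit = isqrt_floor(n);

    assert(n > 0);
    do {
        while (n % p) {
            switch (p) {
            case 4:  p = 2; break;
            case 2:  p = 3; break;
            default: p += 2; break;
            }
            if (p > limit)
                p = n; /* no smaller factor remains: n itself is the radix */
        }
        n /= p;
        facbuf[2 * stages] = p;
        facbuf[2 * stages + 1] = n;
        ++stages;
    } while (n > 1);
    return stages;
}

/* Smallest k >= n whose only prime factors are 2, 3 and 5. */
static int next_fast_size(int n)
{
    assert(n > 0); /* added guard: n == 0 would never leave the first loop */
    for (;;) {
        int m = n;
        while (m % 2 == 0) m /= 2;
        while (m % 3 == 0) m /= 3;
        while (m % 5 == 0) m /= 5;
        if (m <= 1)
            return n;
        ++n;
    }
}
```

For example, a length of 30 factors into the stages (2, 15), (3, 5), (5, 1), while a length of 13 would be rounded up to 15 before planning the transform.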
@ -1,124 +0,0 @@
|
||||
#ifndef KISS_FFT_H
|
||||
#define KISS_FFT_H
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <math.h>
|
||||
#include <string.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
ATTENTION!
If you would like any of the following:
-- a utility that will handle the caching of fft objects
-- a real-only (no imaginary time component) FFT
-- a multi-dimensional FFT
-- a command-line utility to perform ffts
-- a command-line utility to perform fast-convolution filtering

then see kfc.h kiss_fftr.h kiss_fftnd.h fftutil.c kiss_fastfir.c
in the tools/ directory.
*/
|
||||
|
||||
#ifdef USE_SIMD
|
||||
# include <xmmintrin.h>
|
||||
# define kiss_fft_scalar __m128
|
||||
#define KISS_FFT_MALLOC(nbytes) _mm_malloc(nbytes,16)
|
||||
#define KISS_FFT_FREE _mm_free
|
||||
#else
|
||||
#define KISS_FFT_MALLOC malloc
|
||||
#define KISS_FFT_FREE free
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef FIXED_POINT
|
||||
#include <sys/types.h>
|
||||
# if (FIXED_POINT == 32)
|
||||
# define kiss_fft_scalar int32_t
|
||||
# else
|
||||
# define kiss_fft_scalar int16_t
|
||||
# endif
|
||||
#else
|
||||
# ifndef kiss_fft_scalar
|
||||
/* default is float */
|
||||
# define kiss_fft_scalar float
|
||||
# endif
|
||||
#endif
|
||||
|
||||
typedef struct {
|
||||
kiss_fft_scalar r;
|
||||
kiss_fft_scalar i;
|
||||
}kiss_fft_cpx;
|
||||
|
||||
typedef struct kiss_fft_state* kiss_fft_cfg;
|
||||
|
||||
/*
* kiss_fft_alloc
*
* Initialize an FFT (or IFFT) algorithm's cfg/state buffer.
*
* typical usage: kiss_fft_cfg mycfg=kiss_fft_alloc(1024,0,NULL,NULL);
*
* The return value from kiss_fft_alloc is a cfg buffer used internally
* by the fft routine, or NULL.
*
* If lenmem is NULL, then kiss_fft_alloc will allocate a cfg buffer using malloc.
* The returned value should be free()d when done to avoid memory leaks.
*
* The state can be placed in a user supplied buffer 'mem':
* If lenmem is not NULL and mem is not NULL and *lenmem is large enough,
* then the function places the cfg in mem and the size used in *lenmem
* and returns mem.
*
* If lenmem is not NULL and ( mem is NULL or *lenmem is not large enough),
* then the function returns NULL and places the minimum cfg
* buffer size in *lenmem.
* */
|
||||
|
||||
kiss_fft_cfg kiss_fft_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem);
|
||||
|
||||
/*
|
||||
* kiss_fft(cfg,fin,fout)
|
||||
*
|
||||
* Perform an FFT on a complex input buffer.
|
||||
* for a forward FFT,
|
||||
* fin should be f[0] , f[1] , ... ,f[nfft-1]
|
||||
* fout will be F[0] , F[1] , ... ,F[nfft-1]
|
||||
* Note that each element is complex and can be accessed like
|
||||
f[k].r and f[k].i
|
||||
* */
|
||||
void kiss_fft(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout);
|
||||
|
||||
/*
|
||||
A more generic version of the above function. It reads its input from every Nth sample.
|
||||
* */
|
||||
void kiss_fft_stride(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout,int fin_stride);
|
||||
|
||||
/* If kiss_fft_alloc allocated a buffer, it is one contiguous
|
||||
buffer and can be simply free()d when no longer needed*/
|
||||
#define kiss_fft_free free
|
||||
|
||||
/*
|
||||
Cleans up some memory that gets managed internally. Not necessary to call, but it might clean up
|
||||
your compiler output to call this before you exit.
|
||||
*/
|
||||
void kiss_fft_cleanup(void);
|
||||
|
||||
|
||||
/*
|
||||
* Returns the smallest integer k, such that k>=n and k has only "fast" factors (2,3,5)
|
||||
*/
|
||||
int kiss_fft_next_fast_size(int n);
|
||||
|
||||
/* for real ffts, we need an even size */
|
||||
#define kiss_fftr_next_fast_size_real(n) \
|
||||
(kiss_fft_next_fast_size( ((n)+1)>>1)<<1)
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
|
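The caller-supplied-buffer contract documented for kiss_fft_alloc in the removed header implies a two-call pattern: ask for the size first, then call again with a buffer of that size. The block below is a minimal sketch of that pattern under stated assumptions: `toy_state`, `toy_alloc` and `toy_alloc_in_caller_buffer` are hypothetical stand-ins that only mimic the documented behaviour, and they are not part of kiss_fft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the opaque FFT state; not the real kiss_fft_state. */
struct toy_state {
    int nfft;
    float table[1]; /* grows with nfft, like a twiddle table */
};

/* Mimics the documented contract: allocate when lenmem is NULL, otherwise
 * use mem if it is large enough and always report the size in *lenmem. */
static struct toy_state *toy_alloc(int nfft, void *mem, size_t *lenmem)
{
    struct toy_state *st = NULL;
    size_t needed;

    assert(nfft > 0);
    needed = sizeof(struct toy_state) + sizeof(float) * (size_t)(nfft - 1);

    if (lenmem == NULL) {
        st = malloc(needed);
    } else {
        if (mem != NULL && *lenmem >= needed)
            st = mem;
        *lenmem = needed;
    }
    if (st)
        st->nfft = nfft;
    return st;
}

/* Two-call pattern: query the size, obtain a buffer, then initialise in it.
 * The caller owns *buf_out and releases it with free(). */
static struct toy_state *toy_alloc_in_caller_buffer(int nfft, void **buf_out)
{
    size_t len = 0;

    toy_alloc(nfft, NULL, &len); /* returns NULL, stores the required size */
    *buf_out = malloc(len);
    if (*buf_out == NULL)
        return NULL;
    return toy_alloc(nfft, *buf_out, &len);
}
```

The point of the second form is that the state can live inside a larger allocation owned by the caller, which is how the real-FFT wrapper later in this diff embeds its complex sub-state.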
@ -1,159 +0,0 @@
|
||||
/*
|
||||
Copyright (c) 2003-2004, Mark Borgerding
|
||||
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
|
||||
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
|
||||
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*/
|
||||
|
||||
#include "kiss_fftr.h"
|
||||
#include "_kiss_fft_guts.h"
|
||||
|
||||
struct kiss_fftr_state{
|
||||
kiss_fft_cfg substate;
|
||||
kiss_fft_cpx * tmpbuf;
|
||||
kiss_fft_cpx * super_twiddles;
|
||||
#ifdef USE_SIMD
|
||||
void * pad;
|
||||
#endif
|
||||
};
|
||||
|
||||
kiss_fftr_cfg kiss_fftr_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem)
|
||||
{
|
||||
int i;
|
||||
kiss_fftr_cfg st = NULL;
|
||||
size_t subsize, memneeded;
|
||||
|
||||
if (nfft & 1) {
|
||||
fprintf(stderr,"Real FFT optimization must be even.\n");
|
||||
return NULL;
|
||||
}
|
||||
nfft >>= 1;
|
||||
|
||||
kiss_fft_alloc (nfft, inverse_fft, NULL, &subsize);
|
||||
memneeded = sizeof(struct kiss_fftr_state) + subsize + sizeof(kiss_fft_cpx) * ( nfft * 3 / 2);
|
||||
|
||||
if (lenmem == NULL) {
|
||||
st = (kiss_fftr_cfg) KISS_FFT_MALLOC (memneeded);
|
||||
} else {
|
||||
if (*lenmem >= memneeded)
|
||||
st = (kiss_fftr_cfg) mem;
|
||||
*lenmem = memneeded;
|
||||
}
|
||||
if (!st)
|
||||
return NULL;
|
||||
|
||||
st->substate = (kiss_fft_cfg) (st + 1); /*just beyond kiss_fftr_state struct */
|
||||
st->tmpbuf = (kiss_fft_cpx *) (((char *) st->substate) + subsize);
|
||||
st->super_twiddles = st->tmpbuf + nfft;
|
||||
kiss_fft_alloc(nfft, inverse_fft, st->substate, &subsize);
|
||||
|
||||
for (i = 0; i < nfft/2; ++i) {
|
||||
double phase =
|
||||
-3.14159265358979323846264338327 * ((double) (i+1) / nfft + .5);
|
||||
if (inverse_fft)
|
||||
phase *= -1;
|
||||
kf_cexp (st->super_twiddles+i,phase);
|
||||
}
|
||||
return st;
|
||||
}
|
||||
|
||||
void kiss_fftr(kiss_fftr_cfg st,const kiss_fft_scalar *timedata,kiss_fft_cpx *freqdata)
|
||||
{
|
||||
/* input buffer timedata is stored row-wise */
|
||||
int k,ncfft;
|
||||
kiss_fft_cpx fpnk,fpk,f1k,f2k,tw,tdc;
|
||||
|
||||
if ( st->substate->inverse) {
|
||||
fprintf(stderr,"kiss fft usage error: improper alloc\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
ncfft = st->substate->nfft;
|
||||
|
||||
/*perform the parallel fft of two real signals packed in real,imag*/
|
||||
kiss_fft( st->substate , (const kiss_fft_cpx*)timedata, st->tmpbuf );
|
||||
/* The real part of the DC element of the frequency spectrum in st->tmpbuf
|
||||
* contains the sum of the even-numbered elements of the input time sequence
|
||||
* The imag part is the sum of the odd-numbered elements
|
||||
*
|
||||
* The sum of tdc.r and tdc.i is the sum of the input time sequence.
|
||||
* yielding DC of input time sequence
|
||||
* The difference of tdc.r - tdc.i is the sum of the input (dot product) [1,-1,1,-1...
|
||||
* yielding Nyquist bin of input time sequence
|
||||
*/
|
||||
|
||||
tdc.r = st->tmpbuf[0].r;
|
||||
tdc.i = st->tmpbuf[0].i;
|
||||
C_FIXDIV(tdc,2);
|
||||
CHECK_OVERFLOW_OP(tdc.r ,+, tdc.i);
|
||||
CHECK_OVERFLOW_OP(tdc.r ,-, tdc.i);
|
||||
freqdata[0].r = tdc.r + tdc.i;
|
||||
freqdata[ncfft].r = tdc.r - tdc.i;
|
||||
#ifdef USE_SIMD
|
||||
freqdata[ncfft].i = freqdata[0].i = _mm_set1_ps(0);
|
||||
#else
|
||||
freqdata[ncfft].i = freqdata[0].i = 0;
|
||||
#endif
|
||||
|
||||
for ( k=1;k <= ncfft/2 ; ++k ) {
|
||||
fpk = st->tmpbuf[k];
|
||||
fpnk.r = st->tmpbuf[ncfft-k].r;
|
||||
fpnk.i = - st->tmpbuf[ncfft-k].i;
|
||||
C_FIXDIV(fpk,2);
|
||||
C_FIXDIV(fpnk,2);
|
||||
|
||||
C_ADD( f1k, fpk , fpnk );
|
||||
C_SUB( f2k, fpk , fpnk );
|
||||
C_MUL( tw , f2k , st->super_twiddles[k-1]);
|
||||
|
||||
freqdata[k].r = HALF_OF(f1k.r + tw.r);
|
||||
freqdata[k].i = HALF_OF(f1k.i + tw.i);
|
||||
freqdata[ncfft-k].r = HALF_OF(f1k.r - tw.r);
|
||||
freqdata[ncfft-k].i = HALF_OF(tw.i - f1k.i);
|
||||
}
|
||||
}
|
||||
|
||||
void kiss_fftri(kiss_fftr_cfg st,const kiss_fft_cpx *freqdata,kiss_fft_scalar *timedata)
|
||||
{
|
||||
/* output buffer timedata is stored row-wise */
|
||||
int k, ncfft;
|
||||
|
||||
if (st->substate->inverse == 0) {
|
||||
fprintf (stderr, "kiss fft usage error: improper alloc\n");
|
||||
exit (1);
|
||||
}
|
||||
|
||||
ncfft = st->substate->nfft;
|
||||
|
||||
st->tmpbuf[0].r = freqdata[0].r + freqdata[ncfft].r;
|
||||
st->tmpbuf[0].i = freqdata[0].r - freqdata[ncfft].r;
|
||||
C_FIXDIV(st->tmpbuf[0],2);
|
||||
|
||||
for (k = 1; k <= ncfft / 2; ++k) {
|
||||
kiss_fft_cpx fk, fnkc, fek, fok, tmp;
|
||||
fk = freqdata[k];
|
||||
fnkc.r = freqdata[ncfft - k].r;
|
||||
fnkc.i = -freqdata[ncfft - k].i;
|
||||
C_FIXDIV( fk , 2 );
|
||||
C_FIXDIV( fnkc , 2 );
|
||||
|
||||
C_ADD (fek, fk, fnkc);
|
||||
C_SUB (tmp, fk, fnkc);
|
||||
C_MUL (fok, tmp, st->super_twiddles[k-1]);
|
||||
C_ADD (st->tmpbuf[k], fek, fok);
|
||||
C_SUB (st->tmpbuf[ncfft - k], fek, fok);
|
||||
#ifdef USE_SIMD
|
||||
st->tmpbuf[ncfft - k].i *= _mm_set1_ps(-1.0);
|
||||
#else
|
||||
st->tmpbuf[ncfft - k].i *= -1;
|
||||
#endif
|
||||
}
|
||||
kiss_fft (st->substate, st->tmpbuf, (kiss_fft_cpx *) timedata);
|
||||
}
|
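The comment in kiss_fftr about the DC and Nyquist bins can be checked with plain arithmetic. The block below is a minimal sketch, assuming the floating-point build, where the C_FIXDIV scaling is understood to be a no-op (that macro lives in _kiss_fft_guts.h, which is not shown in this part of the diff). `dc_and_nyquist` and `nyquist_direct` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Packing a real sequence of even length n as n/2 complex samples puts the
 * even-indexed samples in the real parts and the odd-indexed samples in the
 * imaginary parts. Bin 0 of the half-length complex FFT is then
 * (sum of evens) + i * (sum of odds), from which both real-FFT edge bins follow. */
static void dc_and_nyquist(const float *x, int n, float *dc, float *nyquist)
{
    float even_sum = 0.0f;
    float odd_sum = 0.0f;
    int i;

    assert(n > 0 && (n % 2) == 0);
    for (i = 0; i < n; i += 2) {
        even_sum += x[i];
        odd_sum += x[i + 1];
    }
    *dc = even_sum + odd_sum;      /* bin 0: sum of all samples */
    *nyquist = even_sum - odd_sum; /* bin n/2: dot product with 1,-1,1,-1,... */
}

/* Direct evaluation of bin n/2 of the DFT, used here only as a cross-check. */
static float nyquist_direct(const float *x, int n)
{
    float acc = 0.0f;
    int i;

    for (i = 0; i < n; ++i)
        acc += (i % 2 == 0) ? x[i] : -x[i];
    return acc;
}
```

Both edge bins are purely real for real input, which is why the routine above sets their imaginary parts to zero.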
@ -1,46 +0,0 @@
|
||||
#ifndef KISS_FTR_H
|
||||
#define KISS_FTR_H
|
||||
|
||||
#include "kiss_fft.h"
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
|
||||
/*
|
||||
|
||||
Real optimized version can save about 45% cpu time vs. complex fft of a real seq.
|
||||
|
||||
|
||||
|
||||
*/
|
||||
|
||||
typedef struct kiss_fftr_state *kiss_fftr_cfg;
|
||||
|
||||
|
||||
kiss_fftr_cfg kiss_fftr_alloc(int nfft,int inverse_fft,void * mem, size_t * lenmem);
|
||||
/*
|
||||
nfft must be even
|
||||
|
||||
If you don't care to allocate space, use mem = lenmem = NULL
|
||||
*/
|
||||
|
||||
|
||||
void kiss_fftr(kiss_fftr_cfg cfg,const kiss_fft_scalar *timedata,kiss_fft_cpx *freqdata);
|
||||
/*
|
||||
input timedata has nfft scalar points
|
||||
output freqdata has nfft/2+1 complex points
|
||||
*/
|
||||
|
||||
void kiss_fftri(kiss_fftr_cfg cfg,const kiss_fft_cpx *freqdata,kiss_fft_scalar *timedata);
|
||||
/*
|
||||
input freqdata has nfft/2+1 complex points
|
||||
output timedata has nfft scalar points
|
||||
*/
|
||||
|
||||
#define kiss_fftr_free free
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
#endif
|
@ -1,20 +0,0 @@
|
||||
# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*-
|
||||
# vim: set filetype=python:
|
||||
# This Source Code Form is subject to the terms of the Mozilla Public
|
||||
# License, v. 2.0. If a copy of the MPL was not distributed with this
|
||||
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
|
||||
|
||||
with Files("**"):
|
||||
BUG_COMPONENT = ("Core", "Web Audio")
|
||||
|
||||
EXPORTS.kiss_fft += [
|
||||
'kiss_fft.h',
|
||||
'kiss_fftr.h',
|
||||
]
|
||||
|
||||
SOURCES += [
|
||||
'kiss_fft.c',
|
||||
'kiss_fftr.c',
|
||||
]
|
||||
|
||||
FINAL_LIBRARY = 'xul'
|
@ -1,49 +0,0 @@
|
||||
schema: 1
|
||||
|
||||
bugzilla:
|
||||
product: Core
|
||||
component: "Web Audio"
|
||||
|
||||
origin:
|
||||
name: kiss_fft
|
||||
description: A mixed-radix Fast Fourier Transform
|
||||
|
||||
url: https://github.com/mborgerding/kissfft
|
||||
|
||||
release: 1c3d6f5aa9eb2bf2f18641f0a7e3e6f5e523a156 (2017-10-25T13:50:40Z).
|
||||
revision: 1c3d6f5aa9eb2bf2f18641f0a7e3e6f5e523a156
|
||||
|
||||
license: BSD-3-Clause
|
||||
license-file: COPYING
|
||||
|
||||
vendoring:
|
||||
url: https://github.com/mborgerding/kissfft
|
||||
source-hosting: github
|
||||
tracking: commit
|
||||
|
||||
exclude:
|
||||
- ".*"
|
||||
- test
|
||||
- tools/fftutil.c
|
||||
- tools/psdpng.c
|
||||
- "tools/kiss_fftnd*"
|
||||
- tools/kiss_fastfir.c
|
||||
- "tools/kfc.*"
|
||||
- "tools/.*"
|
||||
- TIPS
|
||||
- kissfft.hh
|
||||
- tools/Makefile
|
||||
- Makefile
|
||||
|
||||
keep:
|
||||
- COPYING
|
||||
- _kiss_fft_guts.h
|
||||
- kiss_fft.c
|
||||
- kiss_fft.h
|
||||
- tools/kiss_fftr.c
|
||||
- tools/kiss_fftr.h
|
||||
|
||||
update-actions:
|
||||
- action: move-dir
|
||||
from: '{vendor_dir}/tools'
|
||||
to: '{vendor_dir}'
|
@ -1,39 +0,0 @@
|
||||
Use of this source code is governed by a BSD-style license that can be
|
||||
found in the LICENSE file in the root of the source tree. All
|
||||
contributing project authors may be found in the AUTHORS file in the
|
||||
root of the source tree.
|
||||
|
||||
The files were originally licensed by ARM Limited.
|
||||
|
||||
The following files:
|
||||
|
||||
* dl/api/omxtypes.h
|
||||
* dl/sp/api/omxSP.h
|
||||
|
||||
are licensed by Khronos:
|
||||
|
||||
Copyright (c) 2005-2008,2015 The Khronos Group Inc.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a
|
||||
copy of this software and/or associated documentation files (the
|
||||
"Materials"), to deal in the Materials without restriction, including
|
||||
without limitation the rights to use, copy, modify, merge, publish,
|
||||
distribute, sublicense, and/or sell copies of the Materials, and to
|
||||
permit persons to whom the Materials are furnished to do so, subject to
|
||||
the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included
|
||||
in all copies or substantial portions of the Materials.
|
||||
|
||||
MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
|
||||
KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
|
||||
SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
|
||||
https://www.khronos.org/registry/
|
||||
|
||||
THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
||||
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
|
||||
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||
MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
|
@ -1,3 +0,0 @@
|
||||
ajm@google.com
|
||||
kma@google.com
|
||||
rtoy@google.com
|
@ -1,19 +0,0 @@
|
||||
Name: OpenMAX DL
Short Name: OpenMax DL
URL: https://silver.arm.com/download/Software/Graphics/OX000-BU-00010-r1p0-00bet0/OX000-BU-00010-r1p0-00bet0.tgz
Version: 1.0.2
License: BSD
License File: LICENSE
Security Critical: yes

Description:
Implementation of OpenMAX DL spec from ARM. This is used to support
WebAudio for Chromium on Android.

Local Modifications:
Only the FFT routines from the OpenMAX DL package are included. The
code was modified to work with gcc and a new implementation for a
floating-point FFT was added.

The original ARM license is unclear, but Google has obtained
permission to relicense this code under a BSD license.
|
@ -1,9 +0,0 @@
|
||||
Bug 1158741 added an omxSP_FFTInv_CCSToR_F32_Sfs_unscaled function as an
optimization which performs the same operation as
omxSP_FFTInv_CCSToR_F32_Sfs except it doesn't scale the results by the
length of the FFT. For consistency with other FFT routines used, it does
multiply the results by two.

The affected files are:
media/openmax_dl/dl/sp/api/omxSP.h
media/openmax_dl/dl/sp/src/omxSP_FFTInv_CCSToR_F32_Sfs_unscaled_s.S
|
@ -1,417 +0,0 @@
|
||||
@// -*- Mode: asm; -*-
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This file was originally licensed as follows. It has been
|
||||
@// relicensed with permission from the copyright holders.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// File Name: armCOMM_s.h
|
||||
@// OpenMAX DL: v1.0.2
|
||||
@// Last Modified Revision: 13871
|
||||
@// Last Modified Date: Fri, 09 May 2008
|
||||
@//
|
||||
@// (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
|
||||
@//
|
||||
@//
|
||||
@//
|
||||
@// ARM optimized OpenMAX common header file
|
||||
@//
|
||||
|
||||
.set _SBytes, 0 @ Number of scratch bytes on stack
|
||||
.set _Workspace, 0 @ Stack offset of scratch workspace
|
||||
|
||||
.set _RRegList, 0 @ R saved register list (last register number)
|
||||
.set _DRegList, 0 @ D saved register list (last register number)
|
||||
|
||||
@// Work out a list of R saved registers, and how much stack space is needed.
|
||||
@// gas doesn't support setting a variable to a string, so we set _RRegList to
|
||||
@// the register number.
|
||||
.macro _M_GETRREGLIST rreg
|
||||
.ifeqs "\rreg", ""
|
||||
@ Nothing needs to be saved
|
||||
.exitm
|
||||
.endif
|
||||
@ If rreg is lr or r4, save lr and r4
|
||||
.ifeqs "\rreg", "lr"
|
||||
.set _RRegList, 4
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\rreg", "r4"
|
||||
.set _RRegList, 4
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
@ If rreg = r5 or r6, save up to register r6
|
||||
.ifeqs "\rreg", "r5"
|
||||
.set _RRegList, 6
|
||||
.exitm
|
||||
.endif
|
||||
.ifeqs "\rreg", "r6"
|
||||
.set _RRegList, 6
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
@ If rreg = r7 or r8, save up to register r8
|
||||
.ifeqs "\rreg", "r7"
|
||||
.set _RRegList, 8
|
||||
.exitm
|
||||
.endif
|
||||
.ifeqs "\rreg", "r8"
|
||||
.set _RRegList, 8
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
@ If rreg = r9 or r10, save up to register r10
|
||||
.ifeqs "\rreg", "r9"
|
||||
.set _RRegList, 10
|
||||
.exitm
|
||||
.endif
|
||||
.ifeqs "\rreg", "r10"
|
||||
.set _RRegList, 10
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
@ If rreg = r11 or r12, save up to register r12
|
||||
.ifeqs "\rreg", "r11"
|
||||
.set _RRegList, 12
|
||||
.exitm
|
||||
.endif
|
||||
.ifeqs "\rreg", "r12"
|
||||
.set _RRegList, 12
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.warning "Unrecognized saved r register limit: \rreg"
|
||||
.endm
|
||||
|
||||
@ Work out list of D saved registers, like for R registers.
|
||||
.macro _M_GETDREGLIST dreg
|
||||
.ifeqs "\dreg", ""
|
||||
.set _DRegList, 0
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d8"
|
||||
.set _DRegList, 8
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d9"
|
||||
.set _DRegList, 9
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d10"
|
||||
.set _DRegList, 10
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d11"
|
||||
.set _DRegList, 11
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d12"
|
||||
.set _DRegList, 12
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d13"
|
||||
.set _DRegList, 13
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d14"
|
||||
.set _DRegList, 14
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.ifeqs "\dreg", "d15"
|
||||
.set _DRegList, 15
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.warning "Unrecognized saved d register limit: \dreg"
|
||||
.endm
|
||||
|
||||
@//////////////////////////////////////////////////////////
|
||||
@// Function header and footer macros
|
||||
@//////////////////////////////////////////////////////////
|
||||
|
||||
@ Function Header Macro
|
||||
@ Generates the function prologue
|
||||
@ Note that functions should all be "stack-moves-once"
|
||||
@ The FNSTART and FNEND macros should be the only places
|
||||
@ where the stack moves.
|
||||
@
|
||||
@ name = function name
|
||||
@ rreg = "" don't stack any registers
|
||||
@ "lr" stack "lr" only
|
||||
@ "rN" stack registers "r4-rN,lr"
|
||||
@ dreg = "" don't stack any D registers
|
||||
@ "dN" stack registers "d8-dN"
|
||||
@
|
||||
@ Note: ARM Architecture procedure call standard AAPCS
|
||||
@ states that r4-r11, sp, d8-d15 must be preserved by
|
||||
@ a compliant function.
|
||||
.macro M_START name, rreg, dreg
|
||||
.set _Workspace, 0
|
||||
|
||||
@ Define the function and make it external.
|
||||
.global \name
|
||||
#ifndef __clang__
|
||||
.func \name
|
||||
#endif
|
||||
.section .text.\name,"ax",%progbits
|
||||
.arch armv7-a
|
||||
.fpu neon
|
||||
.syntax unified
|
||||
.object_arch armv4
|
||||
.align 2
|
||||
\name :
|
||||
.fnstart
|
||||
@ Save specified R registers
|
||||
_M_GETRREGLIST \rreg
|
||||
_M_PUSH_RREG
|
||||
|
||||
@ Save specified D registers
|
||||
_M_GETDREGLIST \dreg
|
||||
_M_PUSH_DREG
|
||||
|
||||
@ Ensure size claimed on stack is 8-byte aligned
|
||||
.if (_SBytes & 7) != 0
|
||||
.set _SBytes, _SBytes + (8 - (_SBytes & 7))
|
||||
.endif
|
||||
.if _SBytes != 0
|
||||
sub sp, sp, #_SBytes
|
||||
.endif
|
||||
.endm
|
||||
|
||||
@ Function Footer Macro
|
||||
@ Generates the function epilogue
|
||||
.macro M_END
|
||||
@ Restore the stack pointer to its original value on function entry
|
||||
.if _SBytes != 0
|
||||
add sp, sp, #_SBytes
|
||||
.endif
|
||||
@ Restore any saved R or D registers.
|
||||
_M_RET
|
||||
.fnend
|
||||
#ifndef __clang__
|
||||
.endfunc
|
||||
#endif
|
||||
@ Reset the global stack tracking variables back to their
|
||||
@ initial values.
|
||||
.set _SBytes, 0
|
||||
.endm
|
||||
|
||||
@// Based on the value of _DRegList, push the specified set of registers
|
||||
@// to the stack. Is there a better way?
|
||||
.macro _M_PUSH_DREG
|
||||
.if _DRegList == 8
|
||||
vpush {d8}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 9
|
||||
vpush {d8-d9}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 10
|
||||
vpush {d8-d10}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 11
|
||||
vpush {d8-d11}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 12
|
||||
vpush {d8-d12}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 13
|
||||
vpush {d8-d13}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 14
|
||||
vpush {d8-d14}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 15
|
||||
vpush {d8-d15}
|
||||
.exitm
|
||||
.endif
|
||||
.endm
|
||||
|
||||
@// Based on the value of _RRegList, push the specified set of registers
|
||||
@// to the stack. Is there a better way?
|
||||
.macro _M_PUSH_RREG
|
||||
.if _RRegList == 4
|
||||
stmfd sp!, {r4, lr}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 6
|
||||
stmfd sp!, {r4-r6, lr}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 8
|
||||
stmfd sp!, {r4-r8, lr}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 10
|
||||
stmfd sp!, {r4-r10, lr}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 12
|
||||
stmfd sp!, {r4-r12, lr}
|
||||
.exitm
|
||||
.endif
|
||||
.endm
|
||||
|
||||
@// The opposite of _M_PUSH_DREG
|
||||
.macro _M_POP_DREG
|
||||
.if _DRegList == 8
|
||||
vpop {d8}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 9
|
||||
vpop {d8-d9}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 10
|
||||
vpop {d8-d10}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 11
|
||||
vpop {d8-d11}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 12
|
||||
vpop {d8-d12}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 13
|
||||
vpop {d8-d13}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 14
|
||||
vpop {d8-d14}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _DRegList == 15
|
||||
vpop {d8-d15}
|
||||
.exitm
|
||||
.endif
|
||||
.endm
|
||||
|
||||
@// The opposite of _M_PUSH_RREG
|
||||
.macro _M_POP_RREG cc
|
||||
.if _RRegList == 0
|
||||
bx\cc lr
|
||||
.exitm
|
||||
.endif
|
||||
.if _RRegList == 4
|
||||
ldm\cc\()fd sp!, {r4, pc}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 6
|
||||
ldm\cc\()fd sp!, {r4-r6, pc}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 8
|
||||
ldm\cc\()fd sp!, {r4-r8, pc}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 10
|
||||
ldm\cc\()fd sp!, {r4-r10, pc}
|
||||
.exitm
|
||||
.endif
|
||||
|
||||
.if _RRegList == 12
|
||||
ldm\cc\()fd sp!, {r4-r12, pc}
|
||||
.exitm
|
||||
.endif
|
||||
.endm
|
||||
|
||||
@ Produce function return instructions
|
||||
.macro _M_RET cc
|
||||
_M_POP_DREG \cc
|
||||
_M_POP_RREG \cc
|
||||
.endm
|
||||
|
||||
@// Allocate 4-byte aligned area of name
|
||||
@// |name| and size |size| bytes.
|
||||
.macro M_ALLOC4 name, size
|
||||
.if (_SBytes & 3) != 0
|
||||
.set _SBytes, _SBytes + (4 - (_SBytes & 3))
|
||||
.endif
|
||||
.set \name\()_F, _SBytes
|
||||
.set _SBytes, _SBytes + \size
|
||||
|
||||
.endm
|
||||
|
||||
@ Load word from stack
|
||||
.macro M_LDR r, a0, a1, a2, a3
|
||||
_M_DATA "ldr", 4, \r, \a0, \a1, \a2, \a3
|
||||
.endm
|
||||
|
||||
@ Store word to stack
|
||||
.macro M_STR r, a0, a1, a2, a3
|
||||
_M_DATA "str", 4, \r, \a0, \a1, \a2, \a3
|
||||
.endm
|
||||
|
||||
@ Macro to perform a data access operation
|
||||
@ Such as LDR or STR
|
||||
@ The addressing mode is modified such that
|
||||
@ 1. If no address is given then the name is taken
|
||||
@ as a stack offset
|
||||
@ 2. If the addressing mode is not available for the
|
||||
@ state being assembled for (eg Thumb) then a suitable
|
||||
@ addressing mode is substituted.
|
||||
@
|
||||
@ On Entry:
|
||||
@ $i = Instruction to perform (eg "LDRB")
|
||||
@ $a = Required byte alignment
|
||||
@ $r = Register(s) to transfer (eg "r1")
|
||||
@ $a0,$a1,$a2. Addressing mode and condition. One of:
|
||||
@ label {,cc}
|
||||
@ [base] {,,,cc}
|
||||
@ [base, offset]{!} {,,cc}
|
||||
@ [base, offset, shift]{!} {,cc}
|
||||
@ [base], offset {,,cc}
|
||||
@ [base], offset, shift {,cc}
|
||||
@
|
||||
@ WARNING: Most of the above are not supported, except the first case.
|
||||
.macro _M_DATA i, a, r, a0, a1, a2, a3
|
||||
.set _Offset, _Workspace + \a0\()_F
|
||||
\i\a1 \r, [sp, #_Offset]
|
||||
.endm
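The stack-slot macros in this removed file (M_ALLOC4 and _M_DATA) reduce to simple offset arithmetic. The following is a minimal C sketch of that arithmetic, not the original implementation. The struct `stack_layout` and the functions `alloc4` and `slot_offset` are hypothetical names introduced for illustration; the fields stand in for the assembler symbols `_SBytes` and `_Workspace`, and the meaning of `_Workspace` is an assumption because its definition lies outside this excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the assembler symbols used by the macros. */
typedef struct {
    size_t sbytes;     /* running size of the local area (_SBytes) */
    size_t workspace;  /* base offset added by _M_DATA (_Workspace, assumed) */
} stack_layout;

/* Mirrors M_ALLOC4: round sbytes up to a multiple of 4, record the slot
 * offset (the \name\()_F symbol), then reserve `size` bytes after it. */
static size_t alloc4(stack_layout *s, size_t size)
{
    assert(s != NULL);
    if ((s->sbytes & 3) != 0) {
        s->sbytes += 4 - (s->sbytes & 3);
    }
    size_t slot = s->sbytes;
    s->sbytes += size;
    return slot;
}

/* Mirrors _M_DATA: the sp-relative offset placed in [sp, #_Offset]. */
static size_t slot_offset(const stack_layout *s, size_t slot)
{
    assert(s != NULL);
    return s->workspace + slot;
}
```

With a layout starting at zero bytes, allocating 2 bytes and then 4 bytes places the second slot at offset 4, because the running size is padded from 2 up to 4 before the slot is recorded.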
|
@@ -1,289 +0,0 @@
|
||||
/*
|
||||
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
*
|
||||
* Use of this source code is governed by a BSD-style license
|
||||
* that can be found in the LICENSE file in the root of the source
|
||||
* tree. An additional intellectual property rights grant can be found
|
||||
* in the file PATENTS. All contributing project authors may
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*
|
||||
* This file was originally licensed as follows. It has been
|
||||
* relicensed with permission from the copyright holders.
|
||||
*/
|
||||
|
||||
/*
|
||||
*
|
||||
* File Name: armOMX_ReleaseVersion.h
|
||||
* OpenMAX DL: v1.0.2
|
||||
* Last Modified Revision: 15322
|
||||
* Last Modified Date: Wed, 15 Oct 2008
|
||||
*
|
||||
* (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
|
||||
*
|
||||
*
|
||||
*
|
||||
* This file allows a version of the OMX DL libraries to be built where some or
|
||||
* all of the function names can be given a user specified suffix.
|
||||
*
|
||||
* You might want to use it where:
|
||||
*
|
||||
* - you want to rename a function "out of the way" so that you could replace
|
||||
* a function with a different version (the original version would still be
|
||||
* in the library just with a different name - so you could debug the new
|
||||
* version by comparing it to the output of the old)
|
||||
*
|
||||
* - you want to rename all the functions to versions with a suffix so that
|
||||
* you can include two versions of the library and choose between functions
|
||||
* at runtime.
|
||||
*
|
||||
* e.g. omxIPBM_Copy_U8_C1R could be renamed omxIPBM_Copy_U8_C1R_CortexA8
|
||||
*
|
||||
*/
|
||||
|
||||
|
||||
#ifndef _armOMX_H_
|
||||
#define _armOMX_H_
|
||||
|
||||
#define ARMOMX_ENABLE_RENAMING 0
|
||||
#if ARMOMX_ENABLE_RENAMING
|
||||
|
||||
/* We need to define these two macros in order to expand and concatenate the names */
|
||||
#define OMXCAT2BAR(A, B) omx ## A ## B
|
||||
#define OMXCATBAR(A, B) OMXCAT2BAR(A, B)
|
||||
|
||||
/* Define the suffix to add to all functions - the default is no suffix */
|
||||
#define BARE_SUFFIX
|
||||
|
||||
|
||||
|
||||
/* Define what happens to the bare suffix-less functions, down to the sub-domain accuracy */
|
||||
#define OMXACAAC_SUFFIX BARE_SUFFIX
|
||||
#define OMXACMP3_SUFFIX BARE_SUFFIX
|
||||
#define OMXICJP_SUFFIX BARE_SUFFIX
|
||||
#define OMXIPBM_SUFFIX BARE_SUFFIX
|
||||
#define OMXIPCS_SUFFIX BARE_SUFFIX
|
||||
#define OMXIPPP_SUFFIX BARE_SUFFIX
|
||||
#define OMXSP_SUFFIX BARE_SUFFIX
|
||||
#define OMXVCCOMM_SUFFIX BARE_SUFFIX
|
||||
#define OMXVCM4P10_SUFFIX BARE_SUFFIX
|
||||
#define OMXVCM4P2_SUFFIX BARE_SUFFIX
|
||||
|
||||
|
||||
|
||||
|
||||
/* Define what each bare, un-suffixed OpenMAX API function name is to be renamed to */
|
||||
#define omxACAAC_DecodeChanPairElt OMXCATBAR(ACAAC_DecodeChanPairElt, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodeDatStrElt OMXCATBAR(ACAAC_DecodeDatStrElt, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodeFillElt OMXCATBAR(ACAAC_DecodeFillElt, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodeIsStereo_S32 OMXCATBAR(ACAAC_DecodeIsStereo_S32, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodeMsPNS_S32_I OMXCATBAR(ACAAC_DecodeMsPNS_S32_I, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodeMsStereo_S32_I OMXCATBAR(ACAAC_DecodeMsStereo_S32_I, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodePrgCfgElt OMXCATBAR(ACAAC_DecodePrgCfgElt, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DecodeTNS_S32_I OMXCATBAR(ACAAC_DecodeTNS_S32_I, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_DeinterleaveSpectrum_S32 OMXCATBAR(ACAAC_DeinterleaveSpectrum_S32, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_EncodeTNS_S32_I OMXCATBAR(ACAAC_EncodeTNS_S32_I, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_LongTermPredict_S32 OMXCATBAR(ACAAC_LongTermPredict_S32, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_LongTermReconstruct_S32_I OMXCATBAR(ACAAC_LongTermReconstruct_S32_I, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_MDCTFwd_S32 OMXCATBAR(ACAAC_MDCTFwd_S32, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_MDCTInv_S32_S16 OMXCATBAR(ACAAC_MDCTInv_S32_S16, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_NoiselessDecode OMXCATBAR(ACAAC_NoiselessDecode, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_QuantInv_S32_I OMXCATBAR(ACAAC_QuantInv_S32_I, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_UnpackADIFHeader OMXCATBAR(ACAAC_UnpackADIFHeader, OMXACAAC_SUFFIX)
|
||||
#define omxACAAC_UnpackADTSFrameHeader OMXCATBAR(ACAAC_UnpackADTSFrameHeader, OMXACAAC_SUFFIX)
|
||||
|
||||
|
||||
#define omxACMP3_HuffmanDecode_S32 OMXCATBAR(ACMP3_HuffmanDecode_S32, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_HuffmanDecodeSfb_S32 OMXCATBAR(ACMP3_HuffmanDecodeSfb_S32, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_HuffmanDecodeSfbMbp_S32 OMXCATBAR(ACMP3_HuffmanDecodeSfbMbp_S32, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_MDCTInv_S32 OMXCATBAR(ACMP3_MDCTInv_S32, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_ReQuantize_S32_I OMXCATBAR(ACMP3_ReQuantize_S32_I, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_ReQuantizeSfb_S32_I OMXCATBAR(ACMP3_ReQuantizeSfb_S32_I, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_SynthPQMF_S32_S16 OMXCATBAR(ACMP3_SynthPQMF_S32_S16, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_UnpackFrameHeader OMXCATBAR(ACMP3_UnpackFrameHeader, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_UnpackScaleFactors_S8 OMXCATBAR(ACMP3_UnpackScaleFactors_S8, OMXACMP3_SUFFIX)
|
||||
#define omxACMP3_UnpackSideInfo OMXCATBAR(ACMP3_UnpackSideInfo, OMXACMP3_SUFFIX)
|
||||
|
||||
#define omxICJP_CopyExpand_U8_C3 OMXCATBAR(ICJP_CopyExpand_U8_C3, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTFwd_S16 OMXCATBAR(ICJP_DCTFwd_S16, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTFwd_S16_I OMXCATBAR(ICJP_DCTFwd_S16_I, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTInv_S16 OMXCATBAR(ICJP_DCTInv_S16, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTInv_S16_I OMXCATBAR(ICJP_DCTInv_S16_I, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantFwd_Multiple_S16 OMXCATBAR(ICJP_DCTQuantFwd_Multiple_S16, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantFwd_S16 OMXCATBAR(ICJP_DCTQuantFwd_S16, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantFwd_S16_I OMXCATBAR(ICJP_DCTQuantFwd_S16_I, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantFwdTableInit OMXCATBAR(ICJP_DCTQuantFwdTableInit, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantInv_Multiple_S16 OMXCATBAR(ICJP_DCTQuantInv_Multiple_S16, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantInv_S16 OMXCATBAR(ICJP_DCTQuantInv_S16, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantInv_S16_I OMXCATBAR(ICJP_DCTQuantInv_S16_I, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DCTQuantInvTableInit OMXCATBAR(ICJP_DCTQuantInvTableInit, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DecodeHuffman8x8_Direct_S16_C1 OMXCATBAR(ICJP_DecodeHuffman8x8_Direct_S16_C1, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DecodeHuffmanSpecGetBufSize_U8 OMXCATBAR(ICJP_DecodeHuffmanSpecGetBufSize_U8, OMXICJP_SUFFIX)
|
||||
#define omxICJP_DecodeHuffmanSpecInit_U8 OMXCATBAR(ICJP_DecodeHuffmanSpecInit_U8, OMXICJP_SUFFIX)
|
||||
#define omxICJP_EncodeHuffman8x8_Direct_S16_U1_C1 OMXCATBAR(ICJP_EncodeHuffman8x8_Direct_S16_U1_C1, OMXICJP_SUFFIX)
|
||||
#define omxICJP_EncodeHuffmanSpecGetBufSize_U8 OMXCATBAR(ICJP_EncodeHuffmanSpecGetBufSize_U8, OMXICJP_SUFFIX)
|
||||
#define omxICJP_EncodeHuffmanSpecInit_U8 OMXCATBAR(ICJP_EncodeHuffmanSpecInit_U8, OMXICJP_SUFFIX)
|
||||
|
||||
#define omxIPBM_AddC_U8_C1R_Sfs OMXCATBAR(IPBM_AddC_U8_C1R_Sfs, OMXIPBM_SUFFIX)
|
||||
#define omxIPBM_Copy_U8_C1R OMXCATBAR(IPBM_Copy_U8_C1R, OMXIPBM_SUFFIX)
|
||||
#define omxIPBM_Copy_U8_C3R OMXCATBAR(IPBM_Copy_U8_C3R, OMXIPBM_SUFFIX)
|
||||
#define omxIPBM_Mirror_U8_C1R OMXCATBAR(IPBM_Mirror_U8_C1R, OMXIPBM_SUFFIX)
|
||||
#define omxIPBM_MulC_U8_C1R_Sfs OMXCATBAR(IPBM_MulC_U8_C1R_Sfs, OMXIPBM_SUFFIX)
|
||||
|
||||
#define omxIPCS_ColorTwistQ14_U8_C3R OMXCATBAR(IPCS_ColorTwistQ14_U8_C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_BGR565ToYCbCr420LS_MCU_U16_S16_C3P3R OMXCATBAR(IPCS_BGR565ToYCbCr420LS_MCU_U16_S16_C3P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_BGR565ToYCbCr422LS_MCU_U16_S16_C3P3R OMXCATBAR(IPCS_BGR565ToYCbCr422LS_MCU_U16_S16_C3P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_BGR565ToYCbCr444LS_MCU_U16_S16_C3P3R OMXCATBAR(IPCS_BGR565ToYCbCr444LS_MCU_U16_S16_C3P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_BGR888ToYCbCr420LS_MCU_U8_S16_C3P3R OMXCATBAR(IPCS_BGR888ToYCbCr420LS_MCU_U8_S16_C3P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_BGR888ToYCbCr422LS_MCU_U8_S16_C3P3R OMXCATBAR(IPCS_BGR888ToYCbCr422LS_MCU_U8_S16_C3P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_BGR888ToYCbCr444LS_MCU_U8_S16_C3P3R OMXCATBAR(IPCS_BGR888ToYCbCr444LS_MCU_U8_S16_C3P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr420RszCscRotBGR_U8_P3C3R OMXCATBAR(IPCS_YCbCr420RszCscRotBGR_U8_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr420RszRot_U8_P3R OMXCATBAR(IPCS_YCbCr420RszRot_U8_P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr420ToBGR565_U8_U16_P3C3R OMXCATBAR(IPCS_YCbCr420ToBGR565_U8_U16_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr420ToBGR565LS_MCU_S16_U16_P3C3R OMXCATBAR(IPCS_YCbCr420ToBGR565LS_MCU_S16_U16_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr420ToBGR888LS_MCU_S16_U8_P3C3R OMXCATBAR(IPCS_YCbCr420ToBGR888LS_MCU_S16_U8_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr422RszCscRotBGR_U8_P3C3R OMXCATBAR(IPCS_YCbCr422RszCscRotBGR_U8_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_CbYCrY422RszCscRotBGR_U8_U16_C2R OMXCATBAR(IPCS_CbYCrY422RszCscRotBGR_U8_U16_C2R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr422RszRot_U8_P3R OMXCATBAR(IPCS_YCbCr422RszRot_U8_P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbYCr422ToBGR565_U8_U16_C2C3R OMXCATBAR(IPCS_YCbYCr422ToBGR565_U8_U16_C2C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr422ToBGR565LS_MCU_S16_U16_P3C3R OMXCATBAR(IPCS_YCbCr422ToBGR565LS_MCU_S16_U16_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbYCr422ToBGR888_U8_C2C3R OMXCATBAR(IPCS_YCbYCr422ToBGR888_U8_C2C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr422ToBGR888LS_MCU_S16_U8_P3C3R OMXCATBAR(IPCS_YCbCr422ToBGR888LS_MCU_S16_U8_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_CbYCrY422ToYCbCr420Rotate_U8_C2P3R OMXCATBAR(IPCS_CbYCrY422ToYCbCr420Rotate_U8_C2P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr422ToYCbCr420Rotate_U8_P3R OMXCATBAR(IPCS_YCbCr422ToYCbCr420Rotate_U8_P3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr444ToBGR565_U8_U16_C3R OMXCATBAR(IPCS_YCbCr444ToBGR565_U8_U16_C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr444ToBGR565_U8_U16_P3C3R OMXCATBAR(IPCS_YCbCr444ToBGR565_U8_U16_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr444ToBGR565LS_MCU_S16_U16_P3C3R OMXCATBAR(IPCS_YCbCr444ToBGR565LS_MCU_S16_U16_P3C3R, OMXIPCS_SUFFIX)
|
||||
#define omxIPCS_YCbCr444ToBGR888_U8_C3R OMXCATBAR(IPCS_YCbCr444ToBGR888_U8_C3R, OMXIPCS_SUFFIX)
|
||||
|
||||
#define omxIPPP_Deblock_HorEdge_U8_I OMXCATBAR(IPPP_Deblock_HorEdge_U8_I, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_Deblock_VerEdge_U8_I OMXCATBAR(IPPP_Deblock_VerEdge_U8_I, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_FilterFIR_U8_C1R OMXCATBAR(IPPP_FilterFIR_U8_C1R, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_FilterMedian_U8_C1R OMXCATBAR(IPPP_FilterMedian_U8_C1R, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_GetCentralMoment_S64 OMXCATBAR(IPPP_GetCentralMoment_S64, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_GetSpatialMoment_S64 OMXCATBAR(IPPP_GetSpatialMoment_S64, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_MomentGetStateSize OMXCATBAR(IPPP_MomentGetStateSize, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_MomentInit OMXCATBAR(IPPP_MomentInit, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_Moments_U8_C1R OMXCATBAR(IPPP_Moments_U8_C1R, OMXIPPP_SUFFIX)
|
||||
#define omxIPPP_Moments_U8_C3R OMXCATBAR(IPPP_Moments_U8_C3R, OMXIPPP_SUFFIX)
|
||||
|
||||
#define omxSP_BlockExp_S16 OMXCATBAR(SP_BlockExp_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_BlockExp_S32 OMXCATBAR(SP_BlockExp_S32, OMXSP_SUFFIX)
|
||||
#define omxSP_Copy_S16 OMXCATBAR(SP_Copy_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_DotProd_S16 OMXCATBAR(SP_DotProd_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_DotProd_S16_Sfs OMXCATBAR(SP_DotProd_S16_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTFwd_CToC_SC16_Sfs OMXCATBAR(SP_FFTFwd_CToC_SC16_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTFwd_CToC_SC32_Sfs OMXCATBAR(SP_FFTFwd_CToC_SC32_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTFwd_RToCCS_S16S32_Sfs OMXCATBAR(SP_FFTFwd_RToCCS_S16S32_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTFwd_RToCCS_S32_Sfs OMXCATBAR(SP_FFTFwd_RToCCS_S32_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTGetBufSize_C_SC16 OMXCATBAR(SP_FFTGetBufSize_C_SC16, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTGetBufSize_C_SC32 OMXCATBAR(SP_FFTGetBufSize_C_SC32, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTGetBufSize_R_S16S32 OMXCATBAR(SP_FFTGetBufSize_R_S16S32, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTGetBufSize_R_S32 OMXCATBAR(SP_FFTGetBufSize_R_S32, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInit_C_SC16 OMXCATBAR(SP_FFTInit_C_SC16, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInit_C_SC32 OMXCATBAR(SP_FFTInit_C_SC32, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInit_R_S16S32 OMXCATBAR(SP_FFTInit_R_S16S32, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInit_R_S32 OMXCATBAR(SP_FFTInit_R_S32, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInv_CCSToR_S32_Sfs OMXCATBAR(SP_FFTInv_CCSToR_S32_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInv_CCSToR_S32S16_Sfs OMXCATBAR(SP_FFTInv_CCSToR_S32S16_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInv_CToC_SC16_Sfs OMXCATBAR(SP_FFTInv_CToC_SC16_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FFTInv_CToC_SC32_Sfs OMXCATBAR(SP_FFTInv_CToC_SC32_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FilterMedian_S32 OMXCATBAR(SP_FilterMedian_S32, OMXSP_SUFFIX)
|
||||
#define omxSP_FilterMedian_S32_I OMXCATBAR(SP_FilterMedian_S32_I, OMXSP_SUFFIX)
|
||||
#define omxSP_FIR_Direct_S16 OMXCATBAR(SP_FIR_Direct_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_FIR_Direct_S16_I OMXCATBAR(SP_FIR_Direct_S16_I, OMXSP_SUFFIX)
|
||||
#define omxSP_FIR_Direct_S16_ISfs OMXCATBAR(SP_FIR_Direct_S16_ISfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FIR_Direct_S16_Sfs OMXCATBAR(SP_FIR_Direct_S16_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FIROne_Direct_S16 OMXCATBAR(SP_FIROne_Direct_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_FIROne_Direct_S16_I OMXCATBAR(SP_FIROne_Direct_S16_I, OMXSP_SUFFIX)
|
||||
#define omxSP_FIROne_Direct_S16_ISfs OMXCATBAR(SP_FIROne_Direct_S16_ISfs, OMXSP_SUFFIX)
|
||||
#define omxSP_FIROne_Direct_S16_Sfs OMXCATBAR(SP_FIROne_Direct_S16_Sfs, OMXSP_SUFFIX)
|
||||
#define omxSP_IIR_BiQuadDirect_S16 OMXCATBAR(SP_IIR_BiQuadDirect_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_IIR_BiQuadDirect_S16_I OMXCATBAR(SP_IIR_BiQuadDirect_S16_I, OMXSP_SUFFIX)
|
||||
#define omxSP_IIR_Direct_S16 OMXCATBAR(SP_IIR_Direct_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_IIR_Direct_S16_I OMXCATBAR(SP_IIR_Direct_S16_I, OMXSP_SUFFIX)
|
||||
#define omxSP_IIROne_BiQuadDirect_S16 OMXCATBAR(SP_IIROne_BiQuadDirect_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_IIROne_BiQuadDirect_S16_I OMXCATBAR(SP_IIROne_BiQuadDirect_S16_I, OMXSP_SUFFIX)
|
||||
#define omxSP_IIROne_Direct_S16 OMXCATBAR(SP_IIROne_Direct_S16, OMXSP_SUFFIX)
|
||||
#define omxSP_IIROne_Direct_S16_I OMXCATBAR(SP_IIROne_Direct_S16_I, OMXSP_SUFFIX)
|
||||
|
||||
#define omxVCCOMM_Average_16x OMXCATBAR(VCCOMM_Average_16x, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_Average_8x OMXCATBAR(VCCOMM_Average_8x, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_ComputeTextureErrorBlock OMXCATBAR(VCCOMM_ComputeTextureErrorBlock, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_ComputeTextureErrorBlock_SAD OMXCATBAR(VCCOMM_ComputeTextureErrorBlock_SAD, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_Copy16x16 OMXCATBAR(VCCOMM_Copy16x16, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_Copy8x8 OMXCATBAR(VCCOMM_Copy8x8, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_ExpandFrame_I OMXCATBAR(VCCOMM_ExpandFrame_I, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_LimitMVToRect OMXCATBAR(VCCOMM_LimitMVToRect, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_SAD_16x OMXCATBAR(VCCOMM_SAD_16x, OMXVCCOMM_SUFFIX)
|
||||
#define omxVCCOMM_SAD_8x OMXCATBAR(VCCOMM_SAD_8x, OMXVCCOMM_SUFFIX)
|
||||
|
||||
#define omxVCM4P10_Average_4x OMXCATBAR(VCM4P10_Average_4x, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_BlockMatch_Half OMXCATBAR(VCM4P10_BlockMatch_Half, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_BlockMatch_Integer OMXCATBAR(VCM4P10_BlockMatch_Integer, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_BlockMatch_Quarter OMXCATBAR(VCM4P10_BlockMatch_Quarter, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_DeblockChroma_I OMXCATBAR(VCM4P10_DeblockChroma_I, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_DeblockLuma_I OMXCATBAR(VCM4P10_DeblockLuma_I, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_DecodeChromaDcCoeffsToPairCAVLC OMXCATBAR(VCM4P10_DecodeChromaDcCoeffsToPairCAVLC, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_DecodeCoeffsToPairCAVLC OMXCATBAR(VCM4P10_DecodeCoeffsToPairCAVLC, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_DequantTransformResidualFromPairAndAdd OMXCATBAR(VCM4P10_DequantTransformResidualFromPairAndAdd, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_FilterDeblockingChroma_HorEdge_I OMXCATBAR(VCM4P10_FilterDeblockingChroma_HorEdge_I, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_FilterDeblockingChroma_VerEdge_I OMXCATBAR(VCM4P10_FilterDeblockingChroma_VerEdge_I, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_FilterDeblockingLuma_HorEdge_I OMXCATBAR(VCM4P10_FilterDeblockingLuma_HorEdge_I, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_FilterDeblockingLuma_VerEdge_I OMXCATBAR(VCM4P10_FilterDeblockingLuma_VerEdge_I, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_GetVLCInfo OMXCATBAR(VCM4P10_GetVLCInfo, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InterpolateChroma OMXCATBAR(VCM4P10_InterpolateChroma, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InterpolateHalfHor_Luma OMXCATBAR(VCM4P10_InterpolateHalfHor_Luma, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InterpolateHalfVer_Luma OMXCATBAR(VCM4P10_InterpolateHalfVer_Luma, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InterpolateLuma OMXCATBAR(VCM4P10_InterpolateLuma, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InvTransformDequant_ChromaDC OMXCATBAR(VCM4P10_InvTransformDequant_ChromaDC, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InvTransformDequant_LumaDC OMXCATBAR(VCM4P10_InvTransformDequant_LumaDC, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_InvTransformResidualAndAdd OMXCATBAR(VCM4P10_InvTransformResidualAndAdd, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_MEGetBufSize OMXCATBAR(VCM4P10_MEGetBufSize, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_MEInit OMXCATBAR(VCM4P10_MEInit, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_MotionEstimationMB OMXCATBAR(VCM4P10_MotionEstimationMB, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_PredictIntra_16x16 OMXCATBAR(VCM4P10_PredictIntra_16x16, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_PredictIntra_4x4 OMXCATBAR(VCM4P10_PredictIntra_4x4, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_PredictIntraChroma_8x8 OMXCATBAR(VCM4P10_PredictIntraChroma_8x8, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_SAD_4x OMXCATBAR(VCM4P10_SAD_4x, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_SADQuar_16x OMXCATBAR(VCM4P10_SADQuar_16x, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_SADQuar_4x OMXCATBAR(VCM4P10_SADQuar_4x, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_SADQuar_8x OMXCATBAR(VCM4P10_SADQuar_8x, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_SATD_4x4 OMXCATBAR(VCM4P10_SATD_4x4, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_SubAndTransformQDQResidual OMXCATBAR(VCM4P10_SubAndTransformQDQResidual, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_TransformDequantChromaDCFromPair OMXCATBAR(VCM4P10_TransformDequantChromaDCFromPair, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_TransformDequantLumaDCFromPair OMXCATBAR(VCM4P10_TransformDequantLumaDCFromPair, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_TransformQuant_ChromaDC OMXCATBAR(VCM4P10_TransformQuant_ChromaDC, OMXVCM4P10_SUFFIX)
|
||||
#define omxVCM4P10_TransformQuant_LumaDC OMXCATBAR(VCM4P10_TransformQuant_LumaDC, OMXVCM4P10_SUFFIX)
|
||||
|
||||
#define omxVCM4P2_BlockMatch_Half_16x16 OMXCATBAR(VCM4P2_BlockMatch_Half_16x16, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_BlockMatch_Half_8x8 OMXCATBAR(VCM4P2_BlockMatch_Half_8x8, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_BlockMatch_Integer_16x16 OMXCATBAR(VCM4P2_BlockMatch_Integer_16x16, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_BlockMatch_Integer_8x8 OMXCATBAR(VCM4P2_BlockMatch_Integer_8x8, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DCT8x8blk OMXCATBAR(VCM4P2_DCT8x8blk, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DecodeBlockCoef_Inter OMXCATBAR(VCM4P2_DecodeBlockCoef_Inter, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DecodeBlockCoef_Intra OMXCATBAR(VCM4P2_DecodeBlockCoef_Intra, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DecodePadMV_PVOP OMXCATBAR(VCM4P2_DecodePadMV_PVOP, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DecodeVLCZigzag_Inter OMXCATBAR(VCM4P2_DecodeVLCZigzag_Inter, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DecodeVLCZigzag_IntraACVLC OMXCATBAR(VCM4P2_DecodeVLCZigzag_IntraACVLC, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_DecodeVLCZigzag_IntraDCVLC OMXCATBAR(VCM4P2_DecodeVLCZigzag_IntraDCVLC, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_EncodeMV OMXCATBAR(VCM4P2_EncodeMV, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_EncodeVLCZigzag_Inter OMXCATBAR(VCM4P2_EncodeVLCZigzag_Inter, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_EncodeVLCZigzag_IntraACVLC OMXCATBAR(VCM4P2_EncodeVLCZigzag_IntraACVLC, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_EncodeVLCZigzag_IntraDCVLC OMXCATBAR(VCM4P2_EncodeVLCZigzag_IntraDCVLC, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_FindMVpred OMXCATBAR(VCM4P2_FindMVpred, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_IDCT8x8blk OMXCATBAR(VCM4P2_IDCT8x8blk, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_MCReconBlock OMXCATBAR(VCM4P2_MCReconBlock, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_MEGetBufSize OMXCATBAR(VCM4P2_MEGetBufSize, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_MEInit OMXCATBAR(VCM4P2_MEInit, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_MotionEstimationMB OMXCATBAR(VCM4P2_MotionEstimationMB, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_PredictReconCoefIntra OMXCATBAR(VCM4P2_PredictReconCoefIntra, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_QuantInter_I OMXCATBAR(VCM4P2_QuantInter_I, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_QuantIntra_I OMXCATBAR(VCM4P2_QuantIntra_I, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_QuantInvInter_I OMXCATBAR(VCM4P2_QuantInvInter_I, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_QuantInvIntra_I OMXCATBAR(VCM4P2_QuantInvIntra_I, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_TransRecBlockCoef_inter OMXCATBAR(VCM4P2_TransRecBlockCoef_inter, OMXVCM4P2_SUFFIX)
|
||||
#define omxVCM4P2_TransRecBlockCoef_intra OMXCATBAR(VCM4P2_TransRecBlockCoef_intra, OMXVCM4P2_SUFFIX)
|
||||
|
||||
#endif /* endif ARMOMX_ENABLE_RENAMING */
|
||||
#endif /* _armOMX_H_ */
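The renaming header above relies on a two-level concatenation macro (OMXCAT2BAR wrapped by OMXCATBAR) so that the suffix macro is expanded before `##` pastes the tokens together. The sketch below shows the difference under stated assumptions: the names `CAT2`, `CAT`, `DEMO_SUFFIX`, `STR2`, `STR` and the `demo` prefix are hypothetical and are not part of the removed header, and the suffix value is borrowed from the header's own `_CortexA8` example.

```c
#include <assert.h>
#include <string.h>

/* Same two-level pattern as OMXCAT2BAR / OMXCATBAR, with hypothetical names. */
#define CAT2(A, B) demo ## A ## B
#define CAT(A, B)  CAT2(A, B)

/* Hypothetical suffix, modelled on the header's _CortexA8 example. */
#define DEMO_SUFFIX _CortexA8

/* Stringify after full expansion so the resulting identifier can be inspected. */
#define STR2(x) #x
#define STR(x)  STR2(x)

/* Two levels: DEMO_SUFFIX is expanded before the tokens are pasted. */
static const char *renamed_two_level(void)
{
    return STR(CAT(SP_Copy_S16, DEMO_SUFFIX));
}

/* One level: ## pastes the unexpanded macro name itself. */
static const char *renamed_one_level(void)
{
    return STR(CAT2(SP_Copy_S16, DEMO_SUFFIX));
}

/* Small self-check documenting the two results. */
static void demo_self_check(void)
{
    assert(strcmp(renamed_two_level(), "demoSP_Copy_S16_CortexA8") == 0);
    assert(strcmp(renamed_one_level(), "demoSP_Copy_S16DEMO_SUFFIX") == 0);
}
```

The one-level form yields an identifier containing the literal text `DEMO_SUFFIX`, which is why the header routes every rename through the outer macro.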
|
@@ -1,286 +0,0 @@
|
||||
/**
|
||||
* File: omxtypes.h
|
||||
* Brief: Defines basic Data types used in OpenMAX v1.0.2 header files.
|
||||
*
|
||||
* Copyright (c) 2005-2008,2015 The Khronos Group Inc.
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and/or associated documentation files (the
|
||||
* "Materials"), to deal in the Materials without restriction, including
|
||||
* without limitation the rights to use, copy, modify, merge, publish,
|
||||
* distribute, sublicense, and/or sell copies of the Materials, and to
|
||||
* permit persons to whom the Materials are furnished to do so, subject to
|
||||
* the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included
|
||||
* in all copies or substantial portions of the Materials.
|
||||
*
|
||||
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
|
||||
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
|
||||
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
|
||||
* https://www.khronos.org/registry/
|
||||
*
|
||||
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
||||
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
|
||||
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
||||
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
|
||||
*
|
||||
*/
|
||||
|
||||
#ifndef _OMXTYPES_H_
|
||||
#define _OMXTYPES_H_
|
||||
|
||||
#include <limits.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Maximum FFT order supported by the twiddle table. Only used by the
|
||||
* float FFT routines. Must be consistent with the table in
|
||||
* armSP_FFT_F32TwiddleTable.c.
|
||||
*/
|
||||
#ifdef BIG_FFT_TABLE
|
||||
#define TWIDDLE_TABLE_ORDER 15
|
||||
#else
|
||||
#define TWIDDLE_TABLE_ORDER 12
|
||||
#endif
|
||||
|
||||
#define OMX_IN
|
||||
#define OMX_OUT
|
||||
#define OMX_INOUT
|
||||
|
||||
|
||||
typedef enum {
|
||||
|
||||
/* Mandatory return codes - use cases are explicitly described for each function */
|
||||
OMX_Sts_NoErr = 0, /* No error, the function completed successfully */
|
||||
OMX_Sts_Err = -2, /* Unknown/unspecified error */
|
||||
OMX_Sts_InvalidBitstreamValErr = -182, /* Invalid value detected during bitstream processing */
|
||||
OMX_Sts_MemAllocErr = -9, /* Not enough memory allocated for the operation */
|
||||
OMX_StsACAAC_GainCtrErr = -159, /* AAC: Unsupported gain control data detected */
|
||||
OMX_StsACAAC_PrgNumErr = -167, /* AAC: Invalid number of elements for one program */
|
||||
OMX_StsACAAC_CoefValErr = -163, /* AAC: Invalid quantized coefficient value */
|
||||
OMX_StsACAAC_MaxSfbErr = -162, /* AAC: Invalid maxSfb value in relation to numSwb */
|
||||
OMX_StsACAAC_PlsDataErr = -160, /* AAC: pulse escape sequence data error */
|
||||
|
||||
/* Optional return codes - use cases are explicitly described for each function*/
|
||||
OMX_Sts_BadArgErr = -5, /* Bad Arguments */
|
||||
|
||||
OMX_StsACAAC_TnsNumFiltErr = -157, /* AAC: Invalid number of TNS filters */
|
||||
OMX_StsACAAC_TnsLenErr = -156, /* AAC: Invalid TNS region length */
|
||||
OMX_StsACAAC_TnsOrderErr = -155, /* AAC: Invalid order of TNS filter */
|
||||
OMX_StsACAAC_TnsCoefResErr = -154, /* AAC: Invalid bit-resolution for TNS filter coefficients */
|
||||
OMX_StsACAAC_TnsCoefErr = -153, /* AAC: Invalid TNS filter coefficients */
|
||||
OMX_StsACAAC_TnsDirectErr = -152, /* AAC: Invalid TNS filter direction */
|
||||
|
||||
OMX_StsICJP_JPEGMarkerErr = -183, /* JPEG marker encountered within an entropy-coded block; */
|
||||
/* Huffman decoding operation terminated early. */
|
||||
OMX_StsICJP_JPEGMarker = -181, /* JPEG marker encountered; Huffman decoding */
|
||||
/* operation terminated early. */
|
||||
OMX_StsIPPP_ContextMatchErr = -17, /* Context parameter doesn't match the operation */
|
||||
|
||||
OMX_StsSP_EvenMedianMaskSizeErr = -180, /* Even size of the Median Filter mask was replaced by the odd one */
|
||||
|
||||
OMX_Sts_MaximumEnumeration = INT_MAX /*Placeholder, forces enum of size OMX_INT*/
|
||||
|
||||
} OMXResult; /** Return value or error value returned from a function. Identical to OMX_INT */
|
||||
|
||||
|
||||
/* OMX_U8 */
|
||||
#if UCHAR_MAX == 0xff
|
||||
typedef unsigned char OMX_U8;
|
||||
#elif USHRT_MAX == 0xff
|
||||
typedef unsigned short int OMX_U8;
|
||||
#else
|
||||
#error OMX_U8 undefined
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_S8 */
|
||||
#if SCHAR_MAX == 0x7f
|
||||
typedef signed char OMX_S8;
|
||||
#elif SHRT_MAX == 0x7f
|
||||
typedef signed short int OMX_S8;
|
||||
#else
|
||||
#error OMX_S8 undefined
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_U16 */
|
||||
#if USHRT_MAX == 0xffff
|
||||
typedef unsigned short int OMX_U16;
|
||||
#elif UINT_MAX == 0xffff
|
||||
typedef unsigned int OMX_U16;
|
||||
#else
|
||||
#error OMX_U16 undefined
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_S16 */
|
||||
#if SHRT_MAX == 0x7fff
|
||||
typedef signed short int OMX_S16;
|
||||
#elif INT_MAX == 0x7fff
|
||||
typedef signed int OMX_S16;
|
||||
#else
|
||||
#error OMX_S16 undefined
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_U32 */
|
||||
#if UINT_MAX == 0xffffffff
|
||||
typedef unsigned int OMX_U32;
|
||||
#elif ULONG_MAX == 0xffffffff
|
||||
typedef unsigned long int OMX_U32;
|
||||
#else
|
||||
#error OMX_U32 undefined
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_S32 */
|
||||
#if INT_MAX == 0x7fffffff
|
||||
typedef signed int OMX_S32;
|
||||
#elif LONG_MAX == 0x7fffffff
|
||||
typedef long signed int OMX_S32;
|
||||
#else
|
||||
#error OMX_S32 undefined
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_U64 & OMX_S64 */
|
||||
#if defined( _WIN32 ) || defined ( _WIN64 )
|
||||
typedef __int64 OMX_S64; /** Signed 64-bit integer */
|
||||
typedef unsigned __int64 OMX_U64; /** Unsigned 64-bit integer */
|
||||
#define OMX_MIN_S64 (0x8000000000000000i64)
|
||||
#define OMX_MIN_U64 (0x0000000000000000i64)
|
||||
#define OMX_MAX_S64 (0x7FFFFFFFFFFFFFFFi64)
|
||||
#define OMX_MAX_U64 (0xFFFFFFFFFFFFFFFFi64)
|
||||
#else
|
||||
typedef long long OMX_S64; /** Signed 64-bit integer */
|
||||
typedef unsigned long long OMX_U64; /** Unsigned 64-bit integer */
|
||||
#define OMX_MIN_S64 (0x8000000000000000LL)
|
||||
#define OMX_MIN_U64 (0x0000000000000000LL)
|
||||
#define OMX_MAX_S64 (0x7FFFFFFFFFFFFFFFLL)
|
||||
#define OMX_MAX_U64 (0xFFFFFFFFFFFFFFFFLL)
|
||||
#endif
|
||||
|
||||
|
||||
/* OMX_SC8 */
|
||||
typedef struct
|
||||
{
|
||||
OMX_S8 Re; /** Real part */
|
||||
OMX_S8 Im; /** Imaginary part */
|
||||
|
||||
} OMX_SC8; /** Signed 8-bit complex number */
|
||||
|
||||
|
||||
/* OMX_SC16 */
|
||||
typedef struct
|
||||
{
|
||||
OMX_S16 Re; /** Real part */
|
||||
OMX_S16 Im; /** Imaginary part */
|
||||
|
||||
} OMX_SC16; /** Signed 16-bit complex number */
|
||||
|
||||
|
||||
/* OMX_SC32 */
|
||||
typedef struct
|
||||
{
|
||||
OMX_S32 Re; /** Real part */
|
||||
OMX_S32 Im; /** Imaginary part */
|
||||
|
||||
} OMX_SC32; /** Signed 32-bit complex number */
|
||||
|
||||
|
||||
/* OMX_SC64 */
|
||||
typedef struct
|
||||
{
|
||||
OMX_S64 Re; /** Real part */
|
||||
OMX_S64 Im; /** Imaginary part */
|
||||
|
||||
} OMX_SC64; /** Signed 64-bit complex number */
|
||||
|
||||
|
||||
/* OMX_F32 */
|
||||
typedef float OMX_F32; /** Single precision floating point, IEEE 754 */
|
||||
|
||||
/* OMX_F64 */
|
||||
typedef double OMX_F64; /** Double precision floating point, IEEE 754 */
|
||||
|
||||
/* OMX_FC32 */
|
||||
typedef struct
|
||||
{
|
||||
OMX_F32 Re; /** Real part */
|
||||
OMX_F32 Im; /** Imaginary part */
|
||||
|
||||
} OMX_FC32; /** single precision floating point complex number */
|
||||
|
||||
/* OMX_FC64 */
|
||||
typedef struct
|
||||
{
|
||||
OMX_F64 Re; /** Real part */
|
||||
OMX_F64 Im; /** Imaginary part */
|
||||
|
||||
} OMX_FC64; /** double precision floating point complex number */
|
||||
|
||||
/* OMX_INT */
|
||||
typedef int OMX_INT; /** signed integer corresponding to machine word length, has maximum signed value INT_MAX*/
|
||||
|
||||
|
||||
#define OMX_MIN_S8 (-128)
#define OMX_MIN_U8 0
#define OMX_MIN_S16 (-32768)
#define OMX_MIN_U16 0
#define OMX_MIN_S32 (-2147483647-1)
#define OMX_MIN_U32 0

#define OMX_MAX_S8 (127)
#define OMX_MAX_U8 (255)
#define OMX_MAX_S16 (32767)
#define OMX_MAX_U16 (0xFFFF)
#define OMX_MAX_S32 (2147483647)
#define OMX_MAX_U32 (0xFFFFFFFF)
|
||||
|
||||
typedef void OMXVoid;
|
||||
|
||||
#ifndef NULL
|
||||
#define NULL ((void*)0)
|
||||
#endif
|
||||
|
||||
/** Defines the geometric position and size of a rectangle,
 * where (x, y) gives the coordinates of the top left corner
 * of the rectangle, with dimensions width in the x-direction
 * and height in the y-direction */
typedef struct {
    OMX_INT x; /** x-coordinate of top left corner of rectangle */
    OMX_INT y; /** y-coordinate of top left corner of rectangle */
    OMX_INT width; /** Width in the x-direction. */
    OMX_INT height; /** Height in the y-direction. */
} OMXRect;
|
||||
|
||||
|
||||
/** Defines the geometric position of a point */
typedef struct
{
    OMX_INT x; /** x-coordinate */
    OMX_INT y; /** y-coordinate */

} OMXPoint;


/** Defines the dimensions of a rectangle, or region of interest in an image */
typedef struct
{
    OMX_INT width; /** Width of the rectangle, in the x-direction */
    OMX_INT height; /** Height of the rectangle, in the y-direction */

} OMXSize;
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
#endif /* _OMXTYPES_H_ */
|
@ -1,76 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This file was originally licensed as follows. It has been
|
||||
@// relicensed with permission from the copyright holders.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// File Name: omxtypes_s.h
|
||||
@// OpenMAX DL: v1.0.2
|
||||
@// Last Modified Revision: 9622
|
||||
@// Last Modified Date: Wed, 06 Feb 2008
|
||||
@//
|
||||
@// (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
|
||||
@//
|
||||
@//
|
||||
|
||||
@// Mandatory return codes - use cases are explicitly described for each function
|
||||
.equ OMX_Sts_NoErr, 0 @// No error; the function completed successfully
|
||||
.equ OMX_Sts_Err, -2 @// Unknown/unspecified error
|
||||
.equ OMX_Sts_InvalidBitstreamValErr, -182 @// Invalid value detected during bitstream processing
|
||||
.equ OMX_Sts_MemAllocErr, -9 @// Not enough memory allocated for the operation
|
||||
.equ OMX_StsACAAC_GainCtrErr, -159 @// AAC: Unsupported gain control data detected
|
||||
.equ OMX_StsACAAC_PrgNumErr, -167 @// AAC: Invalid number of elements for one program
|
||||
.equ OMX_StsACAAC_CoefValErr, -163 @// AAC: Invalid quantized coefficient value
|
||||
.equ OMX_StsACAAC_MaxSfbErr, -162 @// AAC: Invalid maxSfb value in relation to numSwb
|
||||
.equ OMX_StsACAAC_PlsDataErr, -160 @// AAC: pulse escape sequence data error
|
||||
|
||||
@// Optional return codes - use cases are explicitly described for each function
|
||||
.equ OMX_Sts_BadArgErr, -5 @// Bad Arguments
|
||||
|
||||
.equ OMX_StsACAAC_TnsNumFiltErr, -157 @// AAC: Invalid number of TNS filters
|
||||
.equ OMX_StsACAAC_TnsLenErr, -156 @// AAC: Invalid TNS region length
|
||||
.equ OMX_StsACAAC_TnsOrderErr, -155 @// AAC: Invalid order of TNS filter
|
||||
.equ OMX_StsACAAC_TnsCoefResErr, -154 @// AAC: Invalid bit-resolution for TNS filter coefficients
|
||||
.equ OMX_StsACAAC_TnsCoefErr, -153 @// AAC: Invalid TNS filter coefficients
|
||||
.equ OMX_StsACAAC_TnsDirectErr, -152 @// AAC: Invalid TNS filter direction
|
||||
.equ OMX_StsICJP_JPEGMarkerErr, -183 @// JPEG marker encountered within an entropy-coded block;
|
||||
@// Huffman decoding operation terminated early.
|
||||
.equ OMX_StsICJP_JPEGMarker, -181 @// JPEG marker encountered; Huffman decoding
|
||||
@// operation terminated early.
|
||||
.equ OMX_StsIPPP_ContextMatchErr, -17 @// Context parameter doesn't match the operation
|
||||
|
||||
.equ OMX_StsSP_EvenMedianMaskSizeErr, -180 @// Even size of the Median Filter mask was replaced by the odd one
|
||||
|
||||
.equ OMX_Sts_MaximumEnumeration, 0x7FFFFFFF
|
||||
|
||||
|
||||
|
||||
.equ OMX_MIN_S8, (-128)
|
||||
.equ OMX_MIN_U8, 0
|
||||
.equ OMX_MIN_S16, (-32768)
|
||||
.equ OMX_MIN_U16, 0
|
||||
|
||||
|
||||
.equ OMX_MIN_S32, (-2147483647-1)
|
||||
.equ OMX_MIN_U32, 0
|
||||
|
||||
.equ OMX_MAX_S8, (127)
|
||||
.equ OMX_MAX_U8, (255)
|
||||
.equ OMX_MAX_S16, (32767)
|
||||
.equ OMX_MAX_U16, (0xFFFF)
|
||||
.equ OMX_MAX_S32, (2147483647)
|
||||
.equ OMX_MAX_U32, (0xFFFFFFFF)
|
||||
|
||||
.equ OMX_VC_UPPER, 0x1 @// Used by the PredictIntra functions
|
||||
.equ OMX_VC_LEFT, 0x2 @// Used by the PredictIntra functions
|
||||
.equ OMX_VC_UPPER_RIGHT, 0x40 @// Used by the PredictIntra functions
|
||||
|
||||
.equ NULL, 0
|
@ -1,49 +0,0 @@
|
||||
# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*-
# vim: set filetype=python:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

if CONFIG['TARGET_CPU'] == 'arm' and CONFIG['BUILD_ARM_NEON']:
    Library('openmax_dl')

    EXPORTS.dl.api += [
        'api/armCOMM_s.h',
        'api/armOMX.h',
        'api/omxtypes.h',
        'api/omxtypes_s.h',
    ]

    EXPORTS.dl.sp.api += [
        'sp/api/armSP.h',
        'sp/api/omxSP.h',
    ]

    SOURCES += [
        'sp/src/armSP_FFT_F32TwiddleTable.c',
        'sp/src/omxSP_FFTGetBufSize_R_F32.c',
        'sp/src/omxSP_FFTGetBufSize_R_S32.c',
        'sp/src/omxSP_FFTInit_R_F32.c',
    ]

    SOURCES += [
        'sp/src/armSP_FFT_CToC_FC32_Radix2_fs_unsafe_s.S',
        'sp/src/armSP_FFT_CToC_FC32_Radix2_ls_unsafe_s.S',
        'sp/src/armSP_FFT_CToC_FC32_Radix2_unsafe_s.S',
        'sp/src/armSP_FFT_CToC_FC32_Radix4_fs_unsafe_s.S',
        'sp/src/armSP_FFT_CToC_FC32_Radix4_ls_unsafe_s.S',
        'sp/src/armSP_FFT_CToC_FC32_Radix4_unsafe_s.S',
        'sp/src/armSP_FFT_CToC_FC32_Radix8_fs_unsafe_s.S',
        'sp/src/armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe_s.S',
        'sp/src/omxSP_FFTFwd_RToCCS_F32_Sfs_s.S',
        'sp/src/omxSP_FFTInv_CCSToR_F32_Sfs_unscaled_s.S',
    ]

    LOCAL_INCLUDES += [
        '..',
        'api'
    ]

    DEFINES['BIG_FFT_TABLE'] = True

    FINAL_LIBRARY = 'xul'
|
@ -1,92 +0,0 @@
|
||||
/*
|
||||
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
*
|
||||
* Use of this source code is governed by a BSD-style license
|
||||
* that can be found in the LICENSE file in the root of the source
|
||||
* tree. An additional intellectual property rights grant can be found
|
||||
* in the file PATENTS. All contributing project authors may
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*
|
||||
* This file was originally licensed as follows. It has been
|
||||
* relicensed with permission from the copyright holders.
|
||||
*/
|
||||
|
||||
/**
|
||||
*
|
||||
* File Name: armSP.h
|
||||
* OpenMAX DL: v1.0.2
|
||||
* Last Modified Revision: 7014
|
||||
* Last Modified Date: Wed, 01 Aug 2007
|
||||
*
|
||||
* (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
|
||||
*
|
||||
*
|
||||
*
|
||||
* File: armSP.h
|
||||
* Brief: Declares APIs and basic data types used across the OpenMAX Signal Processing domain
|
||||
*
|
||||
*/
|
||||
#ifndef _armSP_H_
|
||||
#define _armSP_H_
|
||||
|
||||
#include "dl/api/omxtypes.h"
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/** FFT Specific declarations */
|
||||
extern OMX_S32 armSP_FFT_S32TwiddleTable[1026];
|
||||
extern OMX_F32 armSP_FFT_F32TwiddleTable[];
|
||||
|
||||
typedef struct ARMsFFTSpec_SC32_Tag
{
    OMX_U32 N;
    OMX_U16 *pBitRev;
    OMX_SC32 *pTwiddle;
    OMX_SC32 *pBuf;
}ARMsFFTSpec_SC32;


typedef struct ARMsFFTSpec_SC16_Tag
{
    OMX_U32 N;
    OMX_U16 *pBitRev;
    OMX_SC16 *pTwiddle;
    OMX_SC16 *pBuf;
}ARMsFFTSpec_SC16;

typedef struct ARMsFFTSpec_R_SC32_Tag
{
    OMX_U32 N;
    OMX_U16 *pBitRev;
    OMX_SC32 *pTwiddle;
    OMX_S32 *pBuf;
}ARMsFFTSpec_R_SC32;

typedef struct ARMsFFTSpec_R_FC32_Tag
{
    OMX_U32 N;
    OMX_U16* pBitRev;
    OMX_FC32* pTwiddle;
    OMX_F32* pBuf;
} ARMsFFTSpec_R_FC32;

typedef struct ARMsFFTSpec_FC32_Tag
{
    OMX_U32 N;
    OMX_U16* pBitRev;
    OMX_FC32* pTwiddle;
    OMX_FC32* pBuf;
} ARMsFFTSpec_FC32;
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
||||
/*End of File*/
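The assembly sources removed in this commit address the FFT spec structures through hard-coded byte offsets (`.set ARMsFFTSpec_N, 0` through `.set ARMsFFTSpec_pBuf, 12`), which only hold when pointers are 4 bytes wide. The sketch below models that 32-bit layout so the assumption can be checked on any host. `FFTSpecLayout32` is a hypothetical stand-in, not a type from OpenMAX DL, and each pointer member is replaced by a 32-bit slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ILP32 model of ARMsFFTSpec_R_FC32: every pointer member is
 * represented by a 32-bit slot, as it would be on a 32-bit ARM target. */
typedef struct {
    uint32_t N;        /* OMX_U32 N                 */
    uint32_t pBitRev;  /* stands in for OMX_U16*    */
    uint32_t pTwiddle; /* stands in for OMX_FC32*   */
    uint32_t pBuf;     /* stands in for OMX_F32*    */
} FFTSpecLayout32;

/* Compile-time checks mirroring the offsets used by the assembly. */
_Static_assert(offsetof(FFTSpecLayout32, N) == 0, "N at offset 0");
_Static_assert(offsetof(FFTSpecLayout32, pBitRev) == 4, "pBitRev at offset 4");
_Static_assert(offsetof(FFTSpecLayout32, pTwiddle) == 8, "pTwiddle at offset 8");
_Static_assert(offsetof(FFTSpecLayout32, pBuf) == 12, "pBuf at offset 12");
```

On a 64-bit target the real structures would place `pTwiddle` and `pBuf` at different offsets, which is one reason code like this is tied to a single architecture.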
|
||||
|
||||
|
||||
|
File diff suppressed because it is too large
@ -1,294 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of
|
||||
@// armSP_FFTInv_CCSToR_S32_preTwiddleRadix2_unsafe_s.s to support float
|
||||
@// instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute the "preTwiddleRadix2" stage prior to the call to the complexFFT
|
||||
@// It does a Z(k) = Feven(k) + jW^(-k) FOdd(k); k=0,1,2,...N/2-1 computation
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r1
|
||||
#define pFFTSpec r2
|
||||
#define scale r3
|
||||
|
||||
|
||||
@// Output registers
|
||||
#define result r0
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define argTwiddle r1
|
||||
#define argDst r2
|
||||
#define argScale r4
|
||||
#define tmpOrder r4
|
||||
#define pTwiddle r4
|
||||
#define pOut r5
|
||||
#define subFFTSize r7
|
||||
#define subFFTNum r6
|
||||
#define N r6
|
||||
#define order r14
|
||||
#define diff r9
|
||||
@// Total num of radix stages required to complete the FFT
|
||||
#define count r8
|
||||
#define x0r r4
|
||||
#define x0i r5
|
||||
#define diffMinusOne r2
|
||||
#define round r3
|
||||
|
||||
#define pOut1 r2
|
||||
#define size r7
|
||||
#define step r8
|
||||
#define step1 r9
|
||||
#define twStep r10
|
||||
#define pTwiddleTmp r11
|
||||
#define argTwiddle1 r12
|
||||
#define zero r14
|
||||
|
||||
@// Neon registers
|
||||
|
||||
#define dX0 D0
|
||||
#define dShift D1
|
||||
#define dX1 D1
|
||||
#define dY0 D2
|
||||
#define dY1 D3
|
||||
#define dX0r D0
|
||||
#define dX0i D1
|
||||
#define dX1r D2
|
||||
#define dX1i D3
|
||||
#define dW0r D4
|
||||
#define dW0i D5
|
||||
#define dW1r D6
|
||||
#define dW1i D7
|
||||
#define dT0 D8
|
||||
#define dT1 D9
|
||||
#define dT2 D10
|
||||
#define dT3 D11
|
||||
#define qT0 D12
|
||||
#define qT1 D14
|
||||
#define qT2 D16
|
||||
#define qT3 D18
|
||||
#define dY0r D4
|
||||
#define dY0i D5
|
||||
#define dY1r D6
|
||||
#define dY1i D7
|
||||
|
||||
#define dY2 D4
|
||||
#define dY3 D5
|
||||
#define dW0 D6
|
||||
#define dW1 D7
|
||||
#define dW0Tmp D10
|
||||
#define dW1Neg D11
|
||||
|
||||
#define half D13
|
||||
|
||||
@ Structure offsets for the FFTSpec
|
||||
.set ARMsFFTSpec_N, 0
|
||||
.set ARMsFFTSpec_pBitRev, 4
|
||||
.set ARMsFFTSpec_pTwiddle, 8
|
||||
.set ARMsFFTSpec_pBuf, 12
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse, name
|
||||
|
||||
@// Read the size from structure and take log
|
||||
LDR N, [pFFTSpec, #ARMsFFTSpec_N]
|
||||
|
||||
@// Read other structure parameters
|
||||
LDR pTwiddle, [pFFTSpec, #ARMsFFTSpec_pTwiddle]
|
||||
LDR pOut, [pFFTSpec, #ARMsFFTSpec_pBuf]
|
||||
|
||||
VMOV.F32 half, #0.5
|
||||
|
||||
|
||||
MOV size,N,ASR #1 @// preserve the contents of N
|
||||
MOV step,N,LSL #2 @// step = N/2 * 8 bytes
|
||||
|
||||
|
||||
@// Z(k) = 1/2 {[F(k) + F'(N/2-k)] +j*W^(-k) [F(k) - F'(N/2-k)]}
|
||||
@// Note: W^(k) is stored as negated value and also need to
|
||||
@// conjugate the values from the table
|
||||
|
||||
@// Z(0) : no need of twiddle multiply
|
||||
@// Z(0) = 1/2 { [F(0) + F'(N/2)] +j [F(0) - F'(N/2)] }
|
||||
|
||||
VLD1.F32 dX0,[pSrc],step
|
||||
ADD pOut1,pOut,step @// pOut1 = pOut+ N/2*8 bytes
|
||||
|
||||
VLD1.F32 dX1,[pSrc]!
|
||||
@// twStep = 3N/8 * 8 bytes pointing to W^1
|
||||
SUB twStep,step,size,LSL #1
|
||||
|
||||
MOV step1,size,LSL #2 @// step1 = N/4 * 8 = N/2*4 bytes
|
||||
SUB step1,step1,#8 @// (N/4-1)*8 bytes
|
||||
|
||||
VADD.F32 dY0,dX0,dX1 @// [b+d | a+c]
|
||||
VSUB.F32 dY1,dX0,dX1 @// [b-d | a-c]
|
||||
VMUL.F32 dY0, dY0, half[0]
|
||||
VMUL.F32 dY1, dY1, half[0]
|
||||
|
||||
@// dY0= [a-c | a+c] ;dY1= [b-d | b+d]
|
||||
VZIP.F32 dY0,dY1
|
||||
|
||||
VSUB.F32 dX0,dY0,dY1
|
||||
SUBS size,size,#2
|
||||
VADD.F32 dX1,dY0,dY1
|
||||
|
||||
SUB pSrc,pSrc,step
|
||||
|
||||
VST1.F32 dX0[0],[pOut1]!
|
||||
ADD pTwiddleTmp,pTwiddle,#8 @// W^2
|
||||
VST1.F32 dX1[1],[pOut1]!
|
||||
ADD argTwiddle1,pTwiddle,twStep @// W^1
|
||||
|
||||
|
||||
BLT decrementScale\name
|
||||
BEQ lastElement\name
|
||||
|
||||
|
||||
@// Z(k) = 1/2 {[F(k) + F'(N/2-k)] +j*W^(-k) [F(k) - F'(N/2-k)]}
|
||||
@// Note: W^k is stored as negative values in the table and also
|
||||
@// need to conjugate the values from the table.
|
||||
@//
|
||||
@// Process 4 elements at a time. E.g: Z(1),Z(2) and Z(N/2-2),Z(N/2-1)
|
||||
@// since both of them require F(1),F(2) and F(N/2-2),F(N/2-1)
|
||||
|
||||
|
||||
SUB step,step,#24
|
||||
evenOddButterflyLoop\name :
|
||||
|
||||
|
||||
VLD1.F32 dW0r,[argTwiddle1],step1
|
||||
VLD1.F32 dW1r,[argTwiddle1]!
|
||||
|
||||
VLD2.F32 {dX0r,dX0i},[pSrc],step
|
||||
SUB argTwiddle1,argTwiddle1,step1
|
||||
VLD2.F32 {dX1r,dX1i},[pSrc]!
|
||||
|
||||
SUB step1,step1,#8 @// (N/4-2)*8 bytes
|
||||
VLD1.F32 dW0i,[pTwiddleTmp],step1
|
||||
VLD1.F32 dW1i,[pTwiddleTmp]!
|
||||
SUB pSrc,pSrc,step
|
||||
|
||||
SUB pTwiddleTmp,pTwiddleTmp,step1
|
||||
VREV64.F32 dX1r,dX1r
|
||||
VREV64.F32 dX1i,dX1i
|
||||
SUBS size,size,#4
|
||||
|
||||
|
||||
VSUB.F32 dT2,dX0r,dX1r @// a-c
|
||||
VADD.F32 dT3,dX0i,dX1i @// b+d
|
||||
VADD.F32 dT0,dX0r,dX1r @// a+c
|
||||
VSUB.F32 dT1,dX0i,dX1i @// b-d
|
||||
SUB step1,step1,#8
|
||||
|
||||
VMUL.F32 dT2, dT2, half[0]
|
||||
VMUL.F32 dT3, dT3, half[0]
|
||||
|
||||
VMUL.F32 dT0, dT0, half[0]
|
||||
VMUL.F32 dT1, dT1, half[0]
|
||||
|
||||
VZIP.F32 dW1r,dW1i
|
||||
VZIP.F32 dW0r,dW0i
|
||||
|
||||
|
||||
VMUL.F32 dX1r,dW1r,dT2
|
||||
VMUL.F32 dX1i,dW1r,dT3
|
||||
VMUL.F32 dX0r,dW0r,dT2
|
||||
VMUL.F32 dX0i,dW0r,dT3
|
||||
|
||||
VMLS.F32 dX1r,dW1i,dT3
|
||||
VMLA.F32 dX1i,dW1i,dT2
|
||||
|
||||
VMLA.F32 dX0r,dW0i,dT3
|
||||
VMLS.F32 dX0i,dW0i,dT2
|
||||
|
||||
|
||||
VADD.F32 dY1r,dT0,dX1i @// F(N/2 -1)
|
||||
VSUB.F32 dY1i,dX1r,dT1
|
||||
|
||||
VREV64.F32 dY1r,dY1r
|
||||
VREV64.F32 dY1i,dY1i
|
||||
|
||||
|
||||
VADD.F32 dY0r,dT0,dX0i @// F(1)
|
||||
VSUB.F32 dY0i,dT1,dX0r
|
||||
|
||||
|
||||
VST2.F32 {dY0r,dY0i},[pOut1],step
|
||||
VST2.F32 {dY1r,dY1i},[pOut1]!
|
||||
SUB pOut1,pOut1,step
|
||||
SUB step,step,#32 @// (N/2-4)*8 bytes
|
||||
|
||||
|
||||
BGT evenOddButterflyLoop\name
|
||||
|
||||
|
||||
@// set both the ptrs to the last element
|
||||
SUB pSrc,pSrc,#8
|
||||
SUB pOut1,pOut1,#8
|
||||
|
||||
@// Last element can be expanded as follows
|
||||
@// 1/2 {[Z(k) + Z'(k)] - j w^-k [Z(k) - Z'(k)]} (since W^k is stored as
@// -ve)
@// 1/2 {[(a+jb) + (a-jb)] - j w^-k [(a+jb) - (a-jb)]}
@// 1/2 {[2a+j0] - j (c-jd) [0+j2b]}
|
||||
@// (a+bc, -bd)
|
||||
@// Since (c,d) = (0,1) for the last element, result is just (a,-b)
|
||||
|
||||
lastElement\name :
|
||||
VLD1.F32 dX0r,[pSrc]
|
||||
|
||||
VST1.F32 dX0r[0],[pOut1]!
|
||||
VNEG.F32 dX0r,dX0r
|
||||
VST1.F32 dX0r[1],[pOut1]
|
||||
|
||||
|
||||
|
||||
decrementScale\name :
|
||||
|
||||
.endm
|
||||
|
||||
M_START armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe,r4
|
||||
|
||||
FFTSTAGE "FALSE","TRUE",Inv
|
||||
M_END
|
||||
|
||||
.end
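As a scalar reference for the pre-twiddle formula quoted in the comments above, Z(k) = 1/2 {[F(k) + F'(N/2-k)] + j*W^(-k) [F(k) - F'(N/2-k)]}, the following C sketch computes the same quantity without NEON. It is an illustration only: `cf32`, `pretwiddle_ref` and the `w_inv` parameter are hypothetical names, and the caller is assumed to pass W^(-k) directly, whereas the assembly reads a table that stores the values negated and conjugates them itself.

```c
#include <assert.h>

typedef struct { float re, im; } cf32; /* same field order as OMX_FC32 */

static cf32 cf_add(cf32 a, cf32 b) { cf32 r = {a.re + b.re, a.im + b.im}; return r; }
static cf32 cf_sub(cf32 a, cf32 b) { cf32 r = {a.re - b.re, a.im - b.im}; return r; }
static cf32 cf_conj(cf32 a) { cf32 r = {a.re, -a.im}; return r; }
static cf32 cf_mul(cf32 a, cf32 b)
{
    cf32 r = {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
    return r;
}

/* F holds n/2 + 1 spectrum values, w_inv[k] holds W^(-k) for k < n/2,
 * and Z receives n/2 pre-twiddled values. */
void pretwiddle_ref(const cf32 *F, const cf32 *w_inv, cf32 *Z, int n)
{
    const cf32 j = {0.0f, 1.0f};
    assert(n > 0 && n % 2 == 0);
    for (int k = 0; k < n / 2; ++k) {
        cf32 mirrored = cf_conj(F[n / 2 - k]);          /* F'(N/2-k) */
        cf32 sum = cf_add(F[k], mirrored);              /* F(k) + F'(N/2-k) */
        cf32 rot = cf_mul(cf_mul(j, w_inv[k]),          /* j*W^(-k) * ... */
                          cf_sub(F[k], mirrored));      /* ... [F(k) - F'(N/2-k)] */
        Z[k].re = 0.5f * (sum.re + rot.re);
        Z[k].im = 0.5f * (sum.im + rot.im);
    }
}
```

For the real signal {1, 2, 3, 4} the spectrum is F = {10, -2+2j, -2}. Feeding it through this function with W^(-k) = {1, j} yields {4+6j, -2-2j}, which is the 2-point FFT of the packed sequence {1+2j, 3+4j}.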
|
@ -1,134 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix2_fs_unsafe_s.S
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute the first stage of a Radix 2 DIT in-order out-of-place FFT
|
||||
@// stage for a N point complex signal.
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define pPingPongBuf r5
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define pointStep r3
|
||||
#define outPointStep r3
|
||||
#define grpSize r4
|
||||
#define setCount r4
|
||||
#define step r8
|
||||
#define dstStep r8
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dX0 D0
|
||||
#define dX1 D1
|
||||
#define dY0 D2
|
||||
#define dY1 D3
|
||||
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse, name
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
|
||||
@// update subFFTSize and subFFTNum into RN6 and RN7 for the next stage
|
||||
|
||||
|
||||
MOV subFFTSize,#2
|
||||
LSR grpSize,subFFTNum,#1
|
||||
MOV subFFTNum,grpSize
|
||||
|
||||
|
||||
@// pT0+1 increments pT0 by 8 bytes
|
||||
@// pT0+pointStep = increment of 8*pointStep bytes = 4*grpSize bytes
|
||||
@// Note: outPointStep = pointStep for the first stage
|
||||
@// Note: setCount = grpSize/2 (reuse the updated grpSize for setCount)
|
||||
|
||||
MOV pointStep,grpSize,LSL #3
|
||||
RSB step,pointStep,#8
|
||||
|
||||
|
||||
@// Loop on the sets for grp zero
|
||||
|
||||
grpZeroSetLoop\name :
|
||||
|
||||
VLD1.F32 dX0,[pSrc],pointStep
|
||||
VLD1.F32 dX1,[pSrc],step @// step = -pointStep + 8
|
||||
SUBS setCount,setCount,#1
|
||||
|
||||
VADD.F32 dY0,dX0,dX1
|
||||
VSUB.F32 dY1,dX0,dX1
|
||||
|
||||
VST1.F32 dY0,[pDst],outPointStep
|
||||
@// dstStep = step = -pointStep + 8
|
||||
VST1.F32 dY1,[pDst],dstStep
|
||||
|
||||
BGT grpZeroSetLoop\name
|
||||
|
||||
|
||||
@// reset pSrc to pDst for the next stage
|
||||
SUB pSrc,pDst,pointStep @// pDst -= 2*grpSize
|
||||
MOV pDst,pPingPongBuf
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","FALSE",fwd
|
||||
M_END
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",inv
|
||||
M_END
|
||||
|
||||
.end
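The first-stage loop above needs no twiddle multiply: each iteration reads two points half a transform apart and writes their sum and difference at the same spacing. A scalar C sketch of that data flow follows, assuming the element spacing implied by `pointStep` and `outPointStep` is N/2 complex values. The names `cf32` and `radix2_first_stage` are hypothetical and do not appear in the removed sources.

```c
#include <assert.h>

typedef struct { float re, im; } cf32; /* same field order as OMX_FC32 */

/* Out-of-place first radix-2 stage: dst[i] = src[i] + src[i + n/2] and
 * dst[i + n/2] = src[i] - src[i + n/2] for i in [0, n/2). */
void radix2_first_stage(const cf32 *src, cf32 *dst, int n)
{
    int half = n / 2;
    assert(n >= 2 && n % 2 == 0);
    for (int i = 0; i < half; ++i) {
        cf32 a = src[i];
        cf32 b = src[i + half];
        dst[i].re = a.re + b.re;
        dst[i].im = a.im + b.im;
        dst[i + half].re = a.re - b.re;
        dst[i + half].im = a.im - b.im;
    }
}
```

The NEON version processes one complex value per `VLD1.F32` and uses post-indexed addressing to hop between the two halves, which is why `step` is computed as `-pointStep + 8`.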
|
@ -1,153 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix2_ls_unsafe_s.S
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute the last stage of a Radix 2 DIT in-order out-of-place FFT
|
||||
@// stage for a N point complex signal.
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
|
||||
#define outPointStep r3
|
||||
#define grpCount r4
|
||||
#define dstStep r5
|
||||
#define pTmp r4
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dWr d0
|
||||
#define dWi d1
|
||||
#define dXr0 d2
|
||||
#define dXi0 d3
|
||||
#define dXr1 d4
|
||||
#define dXi1 d5
|
||||
#define dYr0 d6
|
||||
#define dYi0 d7
|
||||
#define dYr1 d8
|
||||
#define dYi1 d9
|
||||
#define qT0 d10
|
||||
#define qT1 d12
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse, name
|
||||
|
||||
|
||||
MOV outPointStep,subFFTSize,LSL #3
|
||||
@// Update grpCount and grpSize right away
|
||||
|
||||
MOV subFFTNum,#1 @//after the last stage
|
||||
LSL grpCount,subFFTSize,#1
|
||||
|
||||
@// update subFFTSize for the next stage
|
||||
MOV subFFTSize,grpCount
|
||||
|
||||
RSB dstStep,outPointStep,#16
|
||||
|
||||
|
||||
@// Loop on 2 grps at a time for the last stage
|
||||
|
||||
radix2lsGrpLoop\name :
|
||||
@ dWr = [pTwiddle[0].Re, pTwiddle[1].Re]
|
||||
@ dWi = [pTwiddle[0].Im, pTwiddle[1].Im]
|
||||
VLD2.F32 {dWr,dWi},[pTwiddle, :64]!
|
||||
|
||||
@ dXr0 = [pSrc[0].Re, pSrc[2].Re]
|
||||
@ dXi0 = [pSrc[0].Im, pSrc[2].Im]
|
||||
@ dXr1 = [pSrc[1].Re, pSrc[3].Re]
|
||||
@ dXi1 = [pSrc[1].Im, pSrc[3].Im]
|
||||
VLD4.F32 {dXr0,dXi0,dXr1,dXi1},[pSrc, :128]!
|
||||
SUBS grpCount,grpCount,#4 @// grpCount is multiplied by 2
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
VMUL.F32 qT0,dWr,dXr1
|
||||
VMLA.F32 qT0,dWi,dXi1 @// real part
|
||||
VMUL.F32 qT1,dWr,dXi1
|
||||
VMLS.F32 qT1,dWi,dXr1 @// imag part
|
||||
|
||||
.else
|
||||
|
||||
VMUL.F32 qT0,dWr,dXr1
|
||||
VMLS.F32 qT0,dWi,dXi1 @// real part
|
||||
VMUL.F32 qT1,dWr,dXi1
|
||||
VMLA.F32 qT1,dWi,dXr1 @// imag part
|
||||
|
||||
.endif
|
||||
|
||||
VSUB.F32 dYr0,dXr0,qT0
|
||||
VSUB.F32 dYi0,dXi0,qT1
|
||||
VADD.F32 dYr1,dXr0,qT0
|
||||
VADD.F32 dYi1,dXi0,qT1
|
||||
|
||||
VST2.F32 {dYr0,dYi0},[pDst],outPointStep
|
||||
VST2.F32 {dYr1,dYi1},[pDst],dstStep @// dstStep = step = -outPointStep + 16
|
||||
|
||||
BGT radix2lsGrpLoop\name
|
||||
|
||||
|
||||
@// Reset and Swap pSrc and pDst for the next stage
|
||||
MOV pTmp,pDst
|
||||
SUB pDst,pSrc,outPointStep,LSL #1 @// pDst -= 4*size; pSrc -= 8*size bytes
|
||||
SUB pSrc,pTmp,outPointStep
|
||||
|
||||
@// Reset pTwiddle for the next stage
|
||||
SUB pTwiddle,pTwiddle,outPointStep @// pTwiddle -= 4*size bytes
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe,r4,""
|
||||
FFTSTAGE "FALSE","FALSE",fwd
|
||||
M_END
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix2_ls_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",inv
|
||||
M_END
|
||||
|
||||
.end
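The forward and inverse variants of the macro above differ only in the sign of the cross terms of the twiddle multiply: the forward path computes w*x1, the inverse path computes conj(w)*x1. A scalar C sketch of one such butterfly follows. It keeps the same output order as the assembly (difference first, sum second), and the names `cf32` and `radix2_butterfly` are hypothetical.

```c
#include <assert.h>

typedef struct { float re, im; } cf32; /* same field order as OMX_FC32 */

/* One twiddled radix-2 butterfly.
 * Forward: t = w * x1. Inverse: t = conj(w) * x1.
 * Outputs follow the assembly: y0 = x0 - t, y1 = x0 + t. */
void radix2_butterfly(cf32 x0, cf32 x1, cf32 w, int inverse, cf32 *y0, cf32 *y1)
{
    cf32 t;
    if (inverse) {
        t.re = w.re * x1.re + w.im * x1.im; /* VMUL + VMLA */
        t.im = w.re * x1.im - w.im * x1.re; /* VMUL + VMLS */
    } else {
        t.re = w.re * x1.re - w.im * x1.im; /* VMUL + VMLS */
        t.im = w.re * x1.im + w.im * x1.re; /* VMUL + VMLA */
    }
    y0->re = x0.re - t.re;
    y0->im = x0.im - t.im;
    y1->re = x0.re + t.re;
    y1->im = x0.im + t.im;
}
```

With x0 = 1+2j, x1 = 3+4j and w = j, the forward call gives y0 = 5-j and y1 = -3+5j; the inverse call swaps the two results.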
|
@ -1,191 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix2_unsafe_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@// Description:
|
||||
@// Compute a Radix 2 DIT in-order out-of-place FFT stage for an N point
|
||||
@// complex signal. This handles the general stage, not the first or last
|
||||
@// stage.
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define outPointStep r3
|
||||
#define pointStep r4
|
||||
#define grpCount r5
|
||||
#define setCount r8
|
||||
@//const RN 9
|
||||
#define step r10
|
||||
#define dstStep r11
|
||||
#define pTable r9
|
||||
#define pTmp r9
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dW D0
|
||||
#define dX0 D2
|
||||
#define dX1 D3
|
||||
#define dX2 D4
|
||||
#define dX3 D5
|
||||
#define dY0 D6
|
||||
#define dY1 D7
|
||||
#define dY2 D8
|
||||
#define dY3 D9
|
||||
#define qT0 D10
|
||||
#define qT1 D11
|
||||
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse, name
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
|
||||
@// Update grpCount and grpSize right away in order to reuse the pGrpCount
@// and pGrpSize regs
|
||||
|
||||
LSR subFFTNum,subFFTNum,#1 @//grpSize
|
||||
LSL grpCount,subFFTSize,#1
|
||||
|
||||
|
||||
@// pT0+1 increments pT0 by 8 bytes
|
||||
@// pT0+pointStep = increment of 8*pointStep bytes = 4*grpSize bytes
|
||||
MOV pointStep,subFFTNum,LSL #2
|
||||
|
||||
@// update subFFTSize for the next stage
|
||||
MOV subFFTSize,grpCount
|
||||
|
||||
@// pOut0+1 increments pOut0 by 8 bytes
|
||||
@// pOut0+outPointStep == increment of 8*outPointStep bytes =
|
||||
@// 4*size bytes
|
||||
SMULBB outPointStep,grpCount,pointStep
|
||||
LSL pointStep,pointStep,#1
|
||||
|
||||
|
||||
RSB step,pointStep,#16
|
||||
RSB dstStep,outPointStep,#16
|
||||
|
||||
@// Loop on the groups
|
||||
|
||||
radix2GrpLoop\name :
|
||||
MOV setCount,pointStep,LSR #3
|
||||
VLD1.F32 dW,[pTwiddle],pointStep @//[wi | wr]
|
||||
|
||||
|
||||
@// Loop on the sets
|
||||
|
||||
|
||||
radix2SetLoop\name :
|
||||
|
||||
|
||||
@// point0: dX0-real part dX1-img part
|
||||
VLD2.F32 {dX0,dX1},[pSrc],pointStep
|
||||
@// point1: dX2-real part dX3-img part
|
||||
VLD2.F32 {dX2,dX3},[pSrc],step
|
||||
|
||||
SUBS setCount,setCount,#2
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
VMUL.F32 qT0,dX2,dW[0]
|
||||
VMLA.F32 qT0,dX3,dW[1] @// real part
|
||||
VMUL.F32 qT1,dX3,dW[0]
|
||||
VMLS.F32 qT1,dX2,dW[1] @// imag part
|
||||
|
||||
.else
|
||||
|
||||
VMUL.F32 qT0,dX2,dW[0]
|
||||
VMLS.F32 qT0,dX3,dW[1] @// real part
|
||||
VMUL.F32 qT1,dX3,dW[0]
|
||||
VMLA.F32 qT1,dX2,dW[1] @// imag part
|
||||
|
||||
.endif
|
||||
|
||||
VSUB.F32 dY0,dX0,qT0
|
||||
VSUB.F32 dY1,dX1,qT1
|
||||
VADD.F32 dY2,dX0,qT0
|
||||
VADD.F32 dY3,dX1,qT1
|
||||
|
||||
VST2.F32 {dY0,dY1},[pDst],outPointStep
|
||||
@// dstStep = -outPointStep + 16
|
||||
VST2.F32 {dY2,dY3},[pDst],dstStep
|
||||
|
||||
BGT radix2SetLoop\name
|
||||
|
||||
SUBS grpCount,grpCount,#2
|
||||
ADD pSrc,pSrc,pointStep
|
||||
BGT radix2GrpLoop\name
|
||||
|
||||
|
||||
@// Reset and Swap pSrc and pDst for the next stage
|
||||
MOV pTmp,pDst
|
||||
@// pDst -= 4*size; pSrc -= 8*size bytes
|
||||
SUB pDst,pSrc,outPointStep,LSL #1
|
||||
SUB pSrc,pTmp,outPointStep
|
||||
|
||||
@// Reset pTwiddle for the next stage
|
||||
@// pTwiddle -= 4*size bytes
|
||||
SUB pTwiddle,pTwiddle,outPointStep
|
||||
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix2_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","FALSE",FWD
|
||||
M_END
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix2_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",INV
|
||||
M_END
|
||||
|
||||
|
||||
.end
|
@ -1,251 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix4_fs_unsafe_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute a first stage Radix 4 FFT stage for a N point complex signal
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define pPingPongBuf r5
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define grpSize r3
|
||||
@// Reuse grpSize as setCount
|
||||
#define setCount r3
|
||||
#define pointStep r4
|
||||
#define outPointStep r4
|
||||
#define setStep r8
|
||||
#define step1 r9
|
||||
#define step3 r10
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dXr0 D0
|
||||
#define dXi0 D1
|
||||
#define dXr1 D2
|
||||
#define dXi1 D3
|
||||
#define dXr2 D4
|
||||
#define dXi2 D5
|
||||
#define dXr3 D6
|
||||
#define dXi3 D7
|
||||
#define dYr0 D8
|
||||
#define dYi0 D9
|
||||
#define dYr1 D10
|
||||
#define dYi1 D11
|
||||
#define dYr2 D12
|
||||
#define dYi2 D13
|
||||
#define dYr3 D14
|
||||
#define dYi3 D15
|
||||
#define qX0 Q0
|
||||
#define qX1 Q1
|
||||
#define qX2 Q2
|
||||
#define qX3 Q3
|
||||
#define qY0 Q4
|
||||
#define qY1 Q5
|
||||
#define qY2 Q6
|
||||
#define qY3 Q7
|
||||
#define dZr0 D16
|
||||
#define dZi0 D17
|
||||
#define dZr1 D18
|
||||
#define dZi1 D19
|
||||
#define dZr2 D20
|
||||
#define dZi2 D21
|
||||
#define dZr3 D22
|
||||
#define dZi3 D23
|
||||
#define qZ0 Q8
|
||||
#define qZ1 Q9
|
||||
#define qZ2 Q10
|
||||
#define qZ3 Q11
|
||||
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse, name
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
@// pT0+1 increments pT0 by 8 bytes
|
||||
@// pT0+pointStep = increment of 8*pointStep bytes = 2*grpSize bytes
|
||||
@// Note: outPointStep = pointStep for the first stage
|
||||
|
||||
MOV pointStep,subFFTNum,LSL #1
|
||||
|
||||
|
||||
@// Update pSubFFTSize and pSubFFTNum regs
|
||||
VLD2.F32 {dXr0,dXi0},[pSrc, :128],pointStep @// data[0]
|
||||
@// subFFTSize is 1 on entry to the first stage; set it to 4 for the next stage
|
||||
MOV subFFTSize,#4
|
||||
|
||||
@// Note: setCount = subFFTNum/4 (reuse the grpSize reg for setCount)
|
||||
LSR grpSize,subFFTNum,#2
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc, :128],pointStep @// data[1]
|
||||
MOV subFFTNum,grpSize
|
||||
|
||||
|
||||
@// Calculate the step of input data for the next set
|
||||
@//MOV setStep,pointStep,LSL #1
|
||||
MOV setStep,grpSize,LSL #4
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc, :128],pointStep @// data[2]
|
||||
@// setStep = 3*pointStep
|
||||
ADD setStep,setStep,pointStep
|
||||
@// setStep = - 3*pointStep+16
|
||||
RSB setStep,setStep,#16
|
||||
|
||||
@// data[3] & update pSrc for the next set
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc, :128],setStep
|
||||
@// step1 = 2*pointStep
|
||||
MOV step1,pointStep,LSL #1
|
||||
|
||||
VADD.F32 qY0,qX0,qX2
|
||||
|
||||
@// step3 = -pointStep
|
||||
RSB step3,pointStep,#0
|
||||
|
||||
@// grp = 0 a special case since all the twiddle factors are 1
|
||||
@// Loop on the sets : 2 sets at a time
|
||||
|
||||
radix4fsGrpZeroSetLoop\name :
|
||||
|
||||
|
||||
|
||||
@// Decrement setCount (two sets are processed per iteration)
|
||||
SUBS setCount,setCount,#2
|
||||
|
||||
|
||||
@// finish first stage of 4 point FFT
|
||||
|
||||
|
||||
VSUB.F32 qY2,qX0,qX2
|
||||
|
||||
VLD2.F32 {dXr0,dXi0},[pSrc, :128],step1 @// data[0]
|
||||
VADD.F32 qY1,qX1,qX3
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc, :128],step3 @// data[2]
|
||||
VSUB.F32 qY3,qX1,qX3
|
||||
|
||||
|
||||
@// finish second stage of 4 point FFT
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc, :128],step1 @// data[1]
|
||||
VADD.F32 qZ0,qY0,qY1
|
||||
|
||||
@// data[3] & update pSrc for the next set, but not if it's the
|
||||
@// last iteration so that we don't read past the end of the
|
||||
@// input array.
|
||||
BEQ radix4SkipLastUpdateInv\name
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc, :128],setStep
|
||||
radix4SkipLastUpdateInv\name:
|
||||
VSUB.F32 dZr3,dYr2,dYi3
|
||||
|
||||
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
|
||||
VADD.F32 dZi3,dYi2,dYr3
|
||||
|
||||
VSUB.F32 qZ1,qY0,qY1
|
||||
VST2.F32 {dZr3,dZi3},[pDst, :128],outPointStep
|
||||
|
||||
VADD.F32 dZr2,dYr2,dYi3
|
||||
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
|
||||
VSUB.F32 dZi2,dYi2,dYr3
|
||||
|
||||
VADD.F32 qY0,qX0,qX2 @// u0 for next iteration
|
||||
VST2.F32 {dZr2,dZi2},[pDst, :128],setStep
|
||||
|
||||
|
||||
.else
|
||||
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc, :128],step1 @// data[1]
|
||||
VADD.F32 qZ0,qY0,qY1
|
||||
|
||||
@// data[3] & update pSrc for the next set, but not if it's the
|
||||
@// last iteration so that we don't read past the end of the
|
||||
@// input array.
|
||||
BEQ radix4SkipLastUpdateFwd\name
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc, :128],setStep
|
||||
radix4SkipLastUpdateFwd\name:
|
||||
VADD.F32 dZr2,dYr2,dYi3
|
||||
|
||||
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
|
||||
VSUB.F32 dZi2,dYi2,dYr3
|
||||
|
||||
VSUB.F32 qZ1,qY0,qY1
|
||||
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
|
||||
|
||||
VSUB.F32 dZr3,dYr2,dYi3
|
||||
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
|
||||
VADD.F32 dZi3,dYi2,dYr3
|
||||
|
||||
VADD.F32 qY0,qX0,qX2 @// u0 for next iteration
|
||||
VST2.F32 {dZr3,dZi3},[pDst, :128],setStep
|
||||
|
||||
.endif
|
||||
|
||||
BGT radix4fsGrpZeroSetLoop\name
|
||||
|
||||
@// reset pSrc to pDst for the next stage
|
||||
SUB pSrc,pDst,pointStep @// pDst -= 2*grpSize
|
||||
MOV pDst,pPingPongBuf
|
||||
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","FALSE",fwd
|
||||
M_END
|
||||
|
||||
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",inv
|
||||
M_END
|
||||
|
||||
|
||||
.end
|
@ -1,339 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix4_ls_unsafe_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute a Radix 4 FFT stage for a N point complex signal
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
@//IMPORT armAAC_constTable
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define outPointStep r3
|
||||
#define grpCount r4
|
||||
#define dstStep r5
|
||||
#define grpTwStep r8
|
||||
#define stepTwiddle r9
|
||||
#define twStep r10
|
||||
#define pTmp r4
|
||||
#define step16 r11
|
||||
#define step24 r12
|
||||
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dButterfly1Real02 D0
|
||||
#define dButterfly1Imag02 D1
|
||||
#define dButterfly1Real13 D2
|
||||
#define dButterfly1Imag13 D3
|
||||
#define dButterfly2Real02 D4
|
||||
#define dButterfly2Imag02 D5
|
||||
#define dButterfly2Real13 D6
|
||||
#define dButterfly2Imag13 D7
|
||||
#define dXr0 D0
|
||||
#define dXi0 D1
|
||||
#define dXr1 D2
|
||||
#define dXi1 D3
|
||||
#define dXr2 D4
|
||||
#define dXi2 D5
|
||||
#define dXr3 D6
|
||||
#define dXi3 D7
|
||||
|
||||
#define dYr0 D16
|
||||
#define dYi0 D17
|
||||
#define dYr1 D18
|
||||
#define dYi1 D19
|
||||
#define dYr2 D20
|
||||
#define dYi2 D21
|
||||
#define dYr3 D22
|
||||
#define dYi3 D23
|
||||
|
||||
#define dW1r D8
|
||||
#define dW1i D9
|
||||
#define dW2r D10
|
||||
#define dW2i D11
|
||||
#define dW3r D12
|
||||
#define dW3i D13
|
||||
#define qT0 d14
|
||||
#define qT1 d16
|
||||
#define qT2 d18
|
||||
#define qT3 d20
|
||||
#define qT4 d22
|
||||
#define qT5 d24
|
||||
|
||||
#define dZr0 D14
|
||||
#define dZi0 D15
|
||||
#define dZr1 D26
|
||||
#define dZi1 D27
|
||||
#define dZr2 D28
|
||||
#define dZi2 D29
|
||||
#define dZr3 D30
|
||||
#define dZi3 D31
|
||||
|
||||
#define qX0 Q0
|
||||
#define qY0 Q8
|
||||
#define qY1 Q9
|
||||
#define qY2 Q10
|
||||
#define qY3 Q11
|
||||
#define qZ0 Q7
|
||||
#define qZ1 Q13
|
||||
#define qZ2 Q14
|
||||
#define qZ3 Q15
|
||||
|
||||
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse , name
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
|
||||
@// pOut0+1 increments pOut0 by 8 bytes
|
||||
@// pOut0+outPointStep == increment of 8*outPointStep bytes
|
||||
MOV outPointStep,subFFTSize,LSL #3
|
||||
|
||||
@// Update grpCount and grpSize right away
|
||||
|
||||
VLD2.F32 {dW1r,dW1i},[pTwiddle, :128] @// [wi|wr]
|
||||
MOV step16,#16
|
||||
LSL grpCount,subFFTSize,#2
|
||||
|
||||
VLD1.F32 dW2r,[pTwiddle, :64] @// [wi|wr]
|
||||
MOV subFFTNum,#1 @//after the last stage
|
||||
|
||||
VLD1.F32 dW3r,[pTwiddle, :64],step16 @// [wi|wr]
|
||||
MOV stepTwiddle,#0
|
||||
|
||||
VLD1.F32 dW2i,[pTwiddle, :64]! @// [wi|wr]
|
||||
SUB grpTwStep,stepTwiddle,#8 @// grpTwStep = -8 to start with
|
||||
|
||||
@// update subFFTSize for the next stage
|
||||
MOV subFFTSize,grpCount
|
||||
VLD1.F32 dW3i,[pTwiddle, :64],grpTwStep @// [wi|wr]
|
||||
MOV dstStep,outPointStep,LSL #1
|
||||
|
||||
@// AC.r AC.i BD.r BD.i
|
||||
VLD4.F32 {dButterfly1Real02,dButterfly1Imag02,dButterfly1Real13,dButterfly1Imag13},[pSrc, :256]!
|
||||
ADD dstStep,dstStep,outPointStep @// dstStep = 3*outPointStep
|
||||
RSB dstStep,dstStep,#16 @// dstStep = - 3*outPointStep+16
|
||||
MOV step24,#24
|
||||
|
||||
@// AC.r AC.i BD.r BD.i
|
||||
VLD4.F32 {dButterfly2Real02,dButterfly2Imag02,dButterfly2Real13,dButterfly2Imag13},[pSrc, :256]!
|
||||
|
||||
|
||||
@// Process two groups at a time
|
||||
|
||||
radix4lsGrpLoop\name :
|
||||
|
||||
VZIP.F32 dW2r,dW2i
|
||||
ADD stepTwiddle,stepTwiddle,#16
|
||||
VZIP.F32 dW3r,dW3i
|
||||
ADD grpTwStep,stepTwiddle,#4
|
||||
VUZP.F32 dButterfly1Real13, dButterfly2Real13 @// B.r D.r
|
||||
SUB twStep,stepTwiddle,#16 @// -16+stepTwiddle
|
||||
VUZP.F32 dButterfly1Imag13, dButterfly2Imag13 @// B.i D.i
|
||||
MOV grpTwStep,grpTwStep,LSL #1
|
||||
VUZP.F32 dButterfly1Real02, dButterfly2Real02 @// A.r C.r
|
||||
RSB grpTwStep,grpTwStep,#0 @// -8-2*stepTwiddle
|
||||
|
||||
|
||||
VUZP.F32 dButterfly1Imag02, dButterfly2Imag02 @// A.i C.i
|
||||
|
||||
|
||||
@// grpCount is multiplied by 4
|
||||
SUBS grpCount,grpCount,#8
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
VMUL.F32 dZr1,dW1r,dXr1
|
||||
VMLA.F32 dZr1,dW1i,dXi1 @// real part
|
||||
VMUL.F32 dZi1,dW1r,dXi1
|
||||
VMLS.F32 dZi1,dW1i,dXr1 @// imag part
|
||||
|
||||
.else
|
||||
|
||||
VMUL.F32 dZr1,dW1r,dXr1
|
||||
VMLS.F32 dZr1,dW1i,dXi1 @// real part
|
||||
VMUL.F32 dZi1,dW1r,dXi1
|
||||
VMLA.F32 dZi1,dW1i,dXr1 @// imag part
|
||||
|
||||
.endif
|
||||
|
||||
VLD2.F32 {dW1r,dW1i},[pTwiddle, :128],stepTwiddle @// [wi|wr]
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
VMUL.F32 dZr2,dW2r,dXr2
|
||||
VMLA.F32 dZr2,dW2i,dXi2 @// real part
|
||||
VMUL.F32 dZi2,dW2r,dXi2
|
||||
VLD1.F32 dW2r,[pTwiddle, :64],step16 @// [wi|wr]
|
||||
VMLS.F32 dZi2,dW2i,dXr2 @// imag part
|
||||
|
||||
.else
|
||||
|
||||
VMUL.F32 dZr2,dW2r,dXr2
|
||||
VMLS.F32 dZr2,dW2i,dXi2 @// real part
|
||||
VMUL.F32 dZi2,dW2r,dXi2
|
||||
VLD1.F32 dW2r,[pTwiddle, :64],step16 @// [wi|wr]
|
||||
VMLA.F32 dZi2,dW2i,dXr2 @// imag part
|
||||
|
||||
.endif
|
||||
|
||||
|
||||
VLD1.F32 dW2i,[pTwiddle, :64],twStep @// [wi|wr]
|
||||
|
||||
@// move qX0 so as to load for the next iteration
|
||||
VMOV qZ0,qX0
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
VMUL.F32 dZr3,dW3r,dXr3
|
||||
VMLA.F32 dZr3,dW3i,dXi3 @// real part
|
||||
VMUL.F32 dZi3,dW3r,dXi3
|
||||
VLD1.F32 dW3r,[pTwiddle, :64],step24
|
||||
VMLS.F32 dZi3,dW3i,dXr3 @// imag part
|
||||
|
||||
.else
|
||||
|
||||
VMUL.F32 dZr3,dW3r,dXr3
|
||||
VMLS.F32 dZr3,dW3i,dXi3 @// real part
|
||||
VMUL.F32 dZi3,dW3r,dXi3
|
||||
VLD1.F32 dW3r,[pTwiddle, :64],step24
|
||||
VMLA.F32 dZi3,dW3i,dXr3 @// imag part
|
||||
|
||||
.endif
|
||||
|
||||
VLD1.F32 dW3i,[pTwiddle, :64],grpTwStep @// [wi|wr]
|
||||
|
||||
@// Don't do the load on the last iteration so we don't read past the end
|
||||
@// of pSrc.
|
||||
addeq pSrc, pSrc, #64
|
||||
beq radix4lsSkipRead\name
|
||||
@// AC.r AC.i BD.r BD.i
|
||||
VLD4.F32 {dButterfly1Real02,dButterfly1Imag02,dButterfly1Real13,dButterfly1Imag13},[pSrc, :256]!
|
||||
|
||||
@// AC.r AC.i BD.r BD.i
|
||||
VLD4.F32 {dButterfly2Real02,dButterfly2Imag02,dButterfly2Real13,dButterfly2Imag13},[pSrc, :256]!
|
||||
radix4lsSkipRead\name:
|
||||
|
||||
@// finish first stage of 4 point FFT
|
||||
|
||||
VADD.F32 qY0,qZ0,qZ2
|
||||
VSUB.F32 qY2,qZ0,qZ2
|
||||
VADD.F32 qY1,qZ1,qZ3
|
||||
VSUB.F32 qY3,qZ1,qZ3
|
||||
|
||||
|
||||
@// finish second stage of 4 point FFT
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
|
||||
VSUB.F32 qZ0,qY2,qY1
|
||||
|
||||
VADD.F32 dZr3,dYr0,dYi3
|
||||
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
|
||||
VSUB.F32 dZi3,dYi0,dYr3
|
||||
|
||||
VADD.F32 qZ2,qY2,qY1
|
||||
VST2.F32 {dZr3,dZi3},[pDst, :128],outPointStep
|
||||
|
||||
VSUB.F32 dZr1,dYr0,dYi3
|
||||
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
|
||||
VADD.F32 dZi1,dYi0,dYr3
|
||||
|
||||
@// dstStep = -3*outPointStep + 16
|
||||
VST2.F32 {dZr1,dZi1},[pDst, :128],dstStep
|
||||
|
||||
|
||||
.else
|
||||
|
||||
VSUB.F32 qZ0,qY2,qY1
|
||||
|
||||
VSUB.F32 dZr1,dYr0,dYi3
|
||||
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
|
||||
VADD.F32 dZi1,dYi0,dYr3
|
||||
|
||||
VADD.F32 qZ2,qY2,qY1
|
||||
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
|
||||
|
||||
VADD.F32 dZr3,dYr0,dYi3
|
||||
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
|
||||
VSUB.F32 dZi3,dYi0,dYr3
|
||||
|
||||
@// dstStep = -3*outPointStep + 16
|
||||
VST2.F32 {dZr3,dZi3},[pDst, :128],dstStep
|
||||
|
||||
|
||||
.endif
|
||||
|
||||
BGT radix4lsGrpLoop\name
|
||||
|
||||
|
||||
@// Reset and Swap pSrc and pDst for the next stage
|
||||
MOV pTmp,pDst
|
||||
@// Extra increment done in final iteration of the loop
|
||||
SUB pSrc,pSrc,#64
|
||||
@// pDst -= 4*size; pSrc -= 8*size bytes
|
||||
SUB pDst,pSrc,outPointStep,LSL #2
|
||||
SUB pSrc,pTmp,outPointStep
|
||||
SUB pTwiddle,pTwiddle,subFFTSize,LSL #1
|
||||
@// Extra increment done in final iteration of the loop
|
||||
SUB pTwiddle,pTwiddle,#16
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix4_ls_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","FALSE",fwd
|
||||
M_END
|
||||
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix4_ls_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",inv
|
||||
M_END
|
||||
|
||||
|
||||
.end
|
@ -1,331 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix4_unsafe_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute a Radix 4 FFT stage for a N point complex signal
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define grpCount r3
|
||||
#define pointStep r4
|
||||
#define outPointStep r5
|
||||
#define stepTwiddle r12
|
||||
#define setCount r14
|
||||
#define srcStep r8
|
||||
#define setStep r9
|
||||
#define dstStep r10
|
||||
#define twStep r11
|
||||
#define t1 r3
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dW1 D0
|
||||
#define dW2 D1
|
||||
#define dW3 D2
|
||||
|
||||
#define dXr0 D4
|
||||
#define dXi0 D5
|
||||
#define dXr1 D6
|
||||
#define dXi1 D7
|
||||
#define dXr2 D8
|
||||
#define dXi2 D9
|
||||
#define dXr3 D10
|
||||
#define dXi3 D11
|
||||
#define dYr0 D12
|
||||
#define dYi0 D13
|
||||
#define dYr1 D14
|
||||
#define dYi1 D15
|
||||
#define dYr2 D16
|
||||
#define dYi2 D17
|
||||
#define dYr3 D18
|
||||
#define dYi3 D19
|
||||
#define qT0 d16
|
||||
#define qT1 d18
|
||||
#define qT2 d12
|
||||
#define qT3 d14
|
||||
#define dZr0 D20
|
||||
#define dZi0 D21
|
||||
#define dZr1 D22
|
||||
#define dZi1 D23
|
||||
#define dZr2 D24
|
||||
#define dZi2 D25
|
||||
#define dZr3 D26
|
||||
#define dZi3 D27
|
||||
|
||||
#define qY0 Q6
|
||||
#define qY1 Q7
|
||||
#define qY2 Q8
|
||||
#define qY3 Q9
|
||||
#define qX0 Q2
|
||||
#define qZ0 Q10
|
||||
#define qZ1 Q11
|
||||
#define qZ2 Q12
|
||||
#define qZ3 Q13
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse , name
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
|
||||
@// Update grpCount and grpSize right away in order to reuse
|
||||
@// pGrpCount and pGrpSize regs
|
||||
|
||||
LSL grpCount,subFFTSize,#2
|
||||
LSR subFFTNum,subFFTNum,#2
|
||||
MOV subFFTSize,grpCount
|
||||
|
||||
VLD1.F32 dW1,[pTwiddle] @//[wi | wr]
|
||||
@// pT0+1 increments pT0 by 8 bytes
|
||||
@// pT0+pointStep = increment of 8*pointStep bytes = 2*grpSize bytes
|
||||
MOV pointStep,subFFTNum,LSL #1
|
||||
|
||||
|
||||
@// pOut0+1 increments pOut0 by 8 bytes
|
||||
@// pOut0+outPointStep == increment of 8*outPointStep bytes
|
||||
@// = 2*size bytes
|
||||
|
||||
MOV stepTwiddle,#0
|
||||
VLD1.F32 dW2,[pTwiddle] @//[wi | wr]
|
||||
SMULBB outPointStep,grpCount,pointStep
|
||||
LSL pointStep,pointStep,#2 @// 2*grpSize
|
||||
|
||||
VLD1.F32 dW3,[pTwiddle] @//[wi | wr]
|
||||
MOV srcStep,pointStep,LSL #1 @// srcStep = 2*pointStep
|
||||
ADD setStep,srcStep,pointStep @// setStep = 3*pointStep
|
||||
|
||||
RSB setStep,setStep,#0 @// setStep = - 3*pointStep
|
||||
SUB srcStep,srcStep,#16 @// srcStep = 2*pointStep-16
|
||||
|
||||
MOV dstStep,outPointStep,LSL #1
|
||||
ADD dstStep,dstStep,outPointStep @// dstStep = 3*outPointStep
|
||||
@// dstStep = - 3*outPointStep+16
|
||||
RSB dstStep,dstStep,#16
|
||||
|
||||
|
||||
|
||||
radix4GrpLoop\name :
|
||||
|
||||
VLD2.F32 {dXr0,dXi0},[pSrc],pointStep @// data[0]
|
||||
ADD stepTwiddle,stepTwiddle,pointStep
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc],pointStep @// data[1]
|
||||
@// set pTwiddle to the first point
|
||||
ADD pTwiddle,pTwiddle,stepTwiddle
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc],pointStep @// data[2]
|
||||
MOV twStep,stepTwiddle,LSL #2
|
||||
|
||||
@// data[3] & update pSrc for the next set
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc],setStep
|
||||
SUB twStep,stepTwiddle,twStep @// twStep = -3*stepTwiddle
|
||||
|
||||
MOV setCount,pointStep,LSR #3
|
||||
@// set pSrc to data[0] of the next set
|
||||
ADD pSrc,pSrc,#16
|
||||
@// increment to data[1] of the next set
|
||||
ADD pSrc,pSrc,pointStep
|
||||
|
||||
|
||||
@// Loop on the sets
|
||||
|
||||
radix4SetLoop\name :
|
||||
|
||||
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
VMUL.F32 dZr1,dXr1,dW1[0]
|
||||
VMUL.F32 dZi1,dXi1,dW1[0]
|
||||
VMUL.F32 dZr2,dXr2,dW2[0]
|
||||
VMUL.F32 dZi2,dXi2,dW2[0]
|
||||
VMUL.F32 dZr3,dXr3,dW3[0]
|
||||
VMUL.F32 dZi3,dXi3,dW3[0]
|
||||
|
||||
VMLA.F32 dZr1,dXi1,dW1[1] @// real part
|
||||
VMLS.F32 dZi1,dXr1,dW1[1] @// imag part
|
||||
|
||||
@// data[1] for next iteration
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc],pointStep
|
||||
|
||||
VMLA.F32 dZr2,dXi2,dW2[1] @// real part
|
||||
VMLS.F32 dZi2,dXr2,dW2[1] @// imag part
|
||||
|
||||
@// data[2] for next iteration
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc],pointStep
|
||||
|
||||
VMLA.F32 dZr3,dXi3,dW3[1] @// real part
|
||||
VMLS.F32 dZi3,dXr3,dW3[1] @// imag part
|
||||
.else
|
||||
VMUL.F32 dZr1,dXr1,dW1[0]
|
||||
VMUL.F32 dZi1,dXi1,dW1[0]
|
||||
VMUL.F32 dZr2,dXr2,dW2[0]
|
||||
VMUL.F32 dZi2,dXi2,dW2[0]
|
||||
VMUL.F32 dZr3,dXr3,dW3[0]
|
||||
VMUL.F32 dZi3,dXi3,dW3[0]
|
||||
|
||||
VMLS.F32 dZr1,dXi1,dW1[1] @// real part
|
||||
VMLA.F32 dZi1,dXr1,dW1[1] @// imag part
|
||||
|
||||
@// data[1] for next iteration
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc],pointStep
|
||||
|
||||
VMLS.F32 dZr2,dXi2,dW2[1] @// real part
|
||||
VMLA.F32 dZi2,dXr2,dW2[1] @// imag part
|
||||
|
||||
@// data[2] for next iteration
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc],pointStep
|
||||
|
||||
VMLS.F32 dZr3,dXi3,dW3[1] @// real part
|
||||
VMLA.F32 dZi3,dXr3,dW3[1] @// imag part
|
||||
.endif
|
||||
|
||||
@// data[3] & update pSrc to data[0]
|
||||
@// But don't read on the very last iteration because that reads past
|
||||
@// the end of pSrc. The last iteration is grpCount = 4, setCount = 2.
|
||||
cmp grpCount, #4
|
||||
cmpeq setCount, #2 @// Test setCount if grpCount = 4
|
||||
@// These are executed only if both grpCount = 4 and setCount = 2
|
||||
addeq pSrc, pSrc, setStep
|
||||
beq radix4SkipRead\name
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc],setStep
|
||||
radix4SkipRead\name:
|
||||
SUBS setCount,setCount,#2
|
||||
|
||||
@// finish first stage of 4 point FFT
|
||||
VADD.F32 qY0,qX0,qZ2
|
||||
VSUB.F32 qY2,qX0,qZ2
|
||||
|
||||
@// data[0] for next iteration
|
||||
VLD2.F32 {dXr0,dXi0},[pSrc, :128]!
|
||||
VADD.F32 qY1,qZ1,qZ3
|
||||
VSUB.F32 qY3,qZ1,qZ3
|
||||
|
||||
@// finish second stage of 4 point FFT
|
||||
|
||||
VSUB.F32 qZ0,qY2,qY1
|
||||
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
|
||||
VADD.F32 dZr3,dYr0,dYi3
|
||||
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
|
||||
VSUB.F32 dZi3,dYi0,dYr3
|
||||
|
||||
VADD.F32 qZ2,qY2,qY1
|
||||
VST2.F32 {dZr3,dZi3},[pDst, :128],outPointStep
|
||||
|
||||
VSUB.F32 dZr1,dYr0,dYi3
|
||||
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
|
||||
VADD.F32 dZi1,dYi0,dYr3
|
||||
|
||||
VST2.F32 {dZr1,dZi1},[pDst, :128],dstStep
|
||||
|
||||
|
||||
.else
|
||||
|
||||
VSUB.F32 dZr1,dYr0,dYi3
|
||||
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
|
||||
VADD.F32 dZi1,dYi0,dYr3
|
||||
|
||||
VADD.F32 qZ2,qY2,qY1
|
||||
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
|
||||
|
||||
VADD.F32 dZr3,dYr0,dYi3
|
||||
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
|
||||
VSUB.F32 dZi3,dYi0,dYr3
|
||||
|
||||
VST2.F32 {dZr3,dZi3},[pDst, :128],dstStep
|
||||
|
||||
|
||||
.endif
|
||||
|
||||
@// increment to data[1] of the next set
|
||||
ADD pSrc,pSrc,pointStep
|
||||
BGT radix4SetLoop\name
|
||||
|
||||
|
||||
VLD1.F32 dW1,[pTwiddle, :64],stepTwiddle @//[wi | wr]
|
||||
@// subtract 4 since grpCount multiplied by 4
|
||||
SUBS grpCount,grpCount,#4
|
||||
VLD1.F32 dW2,[pTwiddle, :64],stepTwiddle @//[wi | wr]
|
||||
@// increment pSrc for the next grp
|
||||
ADD pSrc,pSrc,srcStep
|
||||
VLD1.F32 dW3,[pTwiddle, :64],twStep @//[wi | wr]
|
||||
BGT radix4GrpLoop\name
|
||||
|
||||
|
||||
@// Reset and Swap pSrc and pDst for the next stage
|
||||
MOV t1,pDst
|
||||
@// pDst -= 2*size; pSrc -= 8*size bytes
|
||||
SUB pDst,pSrc,outPointStep,LSL #2
|
||||
SUB pSrc,t1,outPointStep
|
||||
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix4_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","FALSE",FWD
|
||||
M_END
|
||||
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix4_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",INV
|
||||
M_END
|
||||
|
||||
|
||||
.end
|
@ -1,422 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of armSP_FFT_CToC_SC32_Radix8_fs_unsafe_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute a first stage Radix 8 FFT stage for a N point complex signal
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r2
|
||||
#define pTwiddle r1
|
||||
#define subFFTNum r6
|
||||
#define subFFTSize r7
|
||||
@// dest buffer for the next stage (not pSrc for first stage)
|
||||
#define pPingPongBuf r5
|
||||
|
||||
|
||||
@//Output Registers
|
||||
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define grpSize r3
|
||||
@// Reuse grpSize as setCount
|
||||
#define setCount r3
|
||||
#define pointStep r4
|
||||
#define outPointStep r4
|
||||
#define setStep r8
|
||||
#define step1 r9
|
||||
#define step2 r10
|
||||
#define t0 r11
|
||||
|
||||
|
||||
@// Neon Registers
|
||||
|
||||
#define dXr0 D0
|
||||
#define dXi0 D1
|
||||
#define dXr1 D2
|
||||
#define dXi1 D3
|
||||
#define dXr2 D4
|
||||
#define dXi2 D5
|
||||
#define dXr3 D6
|
||||
#define dXi3 D7
|
||||
#define dXr4 D8
|
||||
#define dXi4 D9
|
||||
#define dXr5 D10
|
||||
#define dXi5 D11
|
||||
#define dXr6 D12
|
||||
#define dXi6 D13
|
||||
#define dXr7 D14
|
||||
#define dXi7 D15
|
||||
#define qX0 Q0
|
||||
#define qX1 Q1
|
||||
#define qX2 Q2
|
||||
#define qX3 Q3
|
||||
#define qX4 Q4
|
||||
#define qX5 Q5
|
||||
#define qX6 Q6
|
||||
#define qX7 Q7
|
||||
|
||||
#define dUr0 D16
|
||||
#define dUi0 D17
|
||||
#define dUr2 D18
|
||||
#define dUi2 D19
|
||||
#define dUr4 D20
|
||||
#define dUi4 D21
|
||||
#define dUr6 D22
|
||||
#define dUi6 D23
|
||||
#define dUr1 D24
|
||||
#define dUi1 D25
|
||||
#define dUr3 D26
|
||||
#define dUi3 D27
|
||||
#define dUr5 D28
|
||||
#define dUi5 D29
|
||||
@// reuse dXr7 and dXi7
|
||||
#define dUr7 D30
|
||||
#define dUi7 D31
|
||||
#define qU0 Q8
|
||||
#define qU1 Q12
|
||||
#define qU2 Q9
|
||||
#define qU3 Q13
|
||||
#define qU4 Q10
|
||||
#define qU5 Q14
|
||||
#define qU6 Q11
|
||||
#define qU7 Q15
|
||||
|
||||
|
||||
#define dVr0 D24
|
||||
#define dVi0 D25
|
||||
#define dVr2 D26
|
||||
#define dVi2 D27
|
||||
#define dVr4 D28
|
||||
#define dVi4 D29
|
||||
#define dVr6 D30
|
||||
#define dVi6 D31
|
||||
#define dVr1 D16
|
||||
#define dVi1 D17
|
||||
#define dVr3 D18
|
||||
#define dVi3 D19
|
||||
#define dVr5 D20
|
||||
#define dVi5 D21
|
||||
#define dVr7 D22
|
||||
#define dVi7 D23
|
||||
#define qV0 Q12
|
||||
#define qV1 Q8
|
||||
#define qV2 Q13
|
||||
#define qV3 Q9
|
||||
#define qV4 Q14
|
||||
#define qV5 Q10
|
||||
#define qV6 Q15
|
||||
#define qV7 Q11
|
||||
|
||||
#define dYr0 D16
|
||||
#define dYi0 D17
|
||||
#define dYr2 D18
|
||||
#define dYi2 D19
|
||||
#define dYr4 D20
|
||||
#define dYi4 D21
|
||||
#define dYr6 D22
|
||||
#define dYi6 D23
|
||||
#define dYr1 D24
|
||||
#define dYi1 D25
|
||||
#define dYr3 D26
|
||||
#define dYi3 D27
|
||||
#define dYr5 D28
|
||||
#define dYi5 D29
|
||||
#define dYr7 D30
|
||||
#define dYi7 D31
|
||||
#define qY0 Q8
|
||||
#define qY1 Q12
|
||||
#define qY2 Q9
|
||||
#define qY3 Q13
|
||||
#define qY4 Q10
|
||||
#define qY5 Q14
|
||||
#define qY6 Q11
|
||||
#define qY7 Q15
|
||||
|
||||
#define dT0 D14
|
||||
#define dT1 D15
|
||||
|
||||
|
||||
.MACRO FFTSTAGE scaled, inverse, name
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
@// Update pSubFFTSize and pSubFFTNum regs
|
||||
@// subFFTSize is 1 on entry to the first stage; set it to 8 for the next stage
|
||||
MOV subFFTSize,#8
|
||||
ADR t0,ONEBYSQRT2\name
|
||||
|
||||
@// Note: setCount = subFFTNum/8 (reuse the grpSize reg for setCount)
|
||||
LSR grpSize,subFFTNum,#3
|
||||
MOV subFFTNum,grpSize
|
||||
|
||||
|
||||
@// pT0+1 increments pT0 by 8 bytes
|
||||
@// pT0+pointStep = increment of 8*pointStep bytes = grpSize bytes
|
||||
@// Note: outPointStep = pointStep for the first stage
|
||||
|
||||
MOV pointStep,grpSize,LSL #3
|
||||
|
||||
|
||||
@// Calculate the step of input data for the next set
|
||||
@//MOV step1,pointStep,LSL #1 @// step1 = 2*pointStep
|
||||
VLD2.F32 {dXr0,dXi0},[pSrc, :128],pointStep @// data[0]
|
||||
MOV step1,grpSize,LSL #4
|
||||
|
||||
MOV step2,pointStep,LSL #3
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc, :128],pointStep @// data[1]
|
||||
SUB step2,step2,pointStep @// step2 = 7*pointStep
|
||||
@// setStep = - 7*pointStep+16
|
||||
RSB setStep,step2,#16
|
||||
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc, :128],pointStep @// data[2]
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc, :128],pointStep @// data[3]
|
||||
VLD2.F32 {dXr4,dXi4},[pSrc, :128],pointStep @// data[4]
|
||||
VLD2.F32 {dXr5,dXi5},[pSrc, :128],pointStep @// data[5]
|
||||
VLD2.F32 {dXr6,dXi6},[pSrc, :128],pointStep @// data[6]
|
||||
@// data[7] & update pSrc for the next set
|
||||
@// setStep = -7*pointStep + 16
|
||||
VLD2.F32 {dXr7,dXi7},[pSrc, :128],setStep
|
||||
@// grp = 0 a special case since all the twiddle factors are 1
|
||||
@// Loop on the sets
|
||||
|
||||
radix8fsGrpZeroSetLoop\name :
|
||||
|
||||
@// Decrement setCount (two sets are processed per iteration)
|
||||
SUBS setCount,setCount,#2
|
||||
|
||||
|
||||
@// finish first stage of 8 point FFT
|
||||
|
||||
VADD.F32 qU0,qX0,qX4
|
||||
VADD.F32 qU2,qX1,qX5
|
||||
VADD.F32 qU4,qX2,qX6
|
||||
VADD.F32 qU6,qX3,qX7
|
||||
|
||||
@// finish second stage of 8 point FFT
|
||||
|
||||
VADD.F32 qV0,qU0,qU4
|
||||
VSUB.F32 qV2,qU0,qU4
|
||||
VADD.F32 qV4,qU2,qU6
|
||||
VSUB.F32 qV6,qU2,qU6
|
||||
|
||||
@// finish third stage of 8 point FFT
|
||||
|
||||
VADD.F32 qY0,qV0,qV4
|
||||
VSUB.F32 qY4,qV0,qV4
|
||||
VST2.F32 {dYr0,dYi0},[pDst, :128],step1 @// store y0
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
|
||||
VSUB.F32 dYr2,dVr2,dVi6
|
||||
VADD.F32 dYi2,dVi2,dVr6
|
||||
|
||||
VADD.F32 dYr6,dVr2,dVi6
|
||||
VST2.F32 {dYr2,dYi2},[pDst, :128],step1 @// store y2
|
||||
VSUB.F32 dYi6,dVi2,dVr6
|
||||
|
||||
VSUB.F32 qU1,qX0,qX4
|
||||
VST2.F32 {dYr4,dYi4},[pDst, :128],step1 @// store y4
|
||||
|
||||
VSUB.F32 qU3,qX1,qX5
|
||||
VSUB.F32 qU5,qX2,qX6
|
||||
VST2.F32 {dYr6,dYi6},[pDst, :128],step1 @// store y6
|
||||
|
||||
.ELSE
|
||||
|
||||
VADD.F32 dYr6,dVr2,dVi6
|
||||
VSUB.F32 dYi6,dVi2,dVr6
|
||||
|
||||
VSUB.F32 dYr2,dVr2,dVi6
|
||||
VST2.F32 {dYr6,dYi6},[pDst, :128],step1 @// store y2
|
||||
VADD.F32 dYi2,dVi2,dVr6
|
||||
|
||||
|
||||
VSUB.F32 qU1,qX0,qX4
|
||||
VST2.F32 {dYr4,dYi4},[pDst, :128],step1 @// store y4
|
||||
VSUB.F32 qU3,qX1,qX5
|
||||
VSUB.F32 qU5,qX2,qX6
|
||||
VST2.F32 {dYr2,dYi2},[pDst, :128],step1 @// store y6
|
||||
|
||||
|
||||
.ENDIF
|
||||
|
||||
@// finish first stage of 8 point FFT
|
||||
|
||||
VSUB.F32 qU7,qX3,qX7
|
||||
VLD1.F32 dT0[0], [t0]
|
||||
|
||||
@// finish second stage of 8 point FFT
|
||||
|
||||
VSUB.F32 dVr1,dUr1,dUi5
|
||||
@// data[0] for next iteration
|
||||
VLD2.F32 {dXr0,dXi0},[pSrc, :128],pointStep
|
||||
VADD.F32 dVi1,dUi1,dUr5
|
||||
VADD.F32 dVr3,dUr1,dUi5
|
||||
VLD2.F32 {dXr1,dXi1},[pSrc, :128],pointStep @// data[1]
|
||||
VSUB.F32 dVi3,dUi1,dUr5
|
||||
|
||||
VSUB.F32 dVr5,dUr3,dUi7
|
||||
VLD2.F32 {dXr2,dXi2},[pSrc, :128],pointStep @// data[2]
|
||||
VADD.F32 dVi5,dUi3,dUr7
|
||||
VADD.F32 dVr7,dUr3,dUi7
|
||||
VLD2.F32 {dXr3,dXi3},[pSrc, :128],pointStep @// data[3]
|
||||
VSUB.F32 dVi7,dUi3,dUr7
|
||||
|
||||
@// finish third stage of 8 point FFT
|
||||
|
||||
.ifeqs "\inverse", "TRUE"
|
||||
|
||||
@// calculate a*v5
|
||||
VMUL.F32 dT1,dVr5,dT0[0] @// use dVi0 for dT1
|
||||
|
||||
VLD2.F32 {dXr4,dXi4},[pSrc, :128],pointStep @// data[4]
|
||||
VMUL.F32 dVi5,dVi5,dT0[0]
|
||||
|
||||
VLD2.F32 {dXr5,dXi5},[pSrc, :128],pointStep @// data[5]
|
||||
VSUB.F32 dVr5,dT1,dVi5 @// a * V5
|
||||
VADD.F32 dVi5,dT1,dVi5
|
||||
|
||||
VLD2.F32 {dXr6,dXi6},[pSrc, :128],pointStep @// data[6]
|
||||
|
||||
@// calculate b*v7
|
||||
VMUL.F32 dT1,dVr7,dT0[0]
|
||||
VMUL.F32 dVi7,dVi7,dT0[0]
|
||||
|
||||
VADD.F32 qY1,qV1,qV5
|
||||
VSUB.F32 qY5,qV1,qV5
|
||||
|
||||
|
||||
VADD.F32 dVr7,dT1,dVi7 @// b * V7
|
||||
VSUB.F32 dVi7,dVi7,dT1
|
||||
SUB pDst, pDst, step2 @// set pDst to y1
|
||||
|
||||
@// On the last iteration, this will read past the end of pSrc,
|
||||
@// so skip this read.
|
||||
BEQ radix8SkipLastUpdateInv\name
|
||||
VLD2.F32 {dXr7,dXi7},[pSrc, :128],setStep @// data[7]
|
||||
radix8SkipLastUpdateInv\name:
|
||||
|
||||
VSUB.F32 dYr3,dVr3,dVr7
|
||||
VSUB.F32 dYi3,dVi3,dVi7
|
||||
VST2.F32 {dYr1,dYi1},[pDst, :128],step1 @// store y1
|
||||
VADD.F32 dYr7,dVr3,dVr7
|
||||
VADD.F32 dYi7,dVi3,dVi7
|
||||
|
||||
|
||||
VST2.F32 {dYr3,dYi3},[pDst, :128],step1 @// store y3
|
||||
VST2.F32 {dYr5,dYi5},[pDst, :128],step1 @// store y5
|
||||
VST2.F32 {dYr7,dYi7},[pDst, :128] @// store y7
|
||||
ADD pDst, pDst, #16
|
||||
|
||||
.ELSE
|
||||
|
||||
@// calculate b*v7
|
||||
VMUL.F32 dT1,dVr7,dT0[0]
|
||||
VLD2.F32 {dXr4,dXi4},[pSrc, :128],pointStep @// data[4]
|
||||
VMUL.F32 dVi7,dVi7,dT0[0]
|
||||
|
||||
VLD2.F32 {dXr5,dXi5},[pSrc, :128],pointStep @// data[5]
|
||||
VADD.F32 dVr7,dT1,dVi7 @// b * V7
|
||||
VSUB.F32 dVi7,dVi7,dT1
|
||||
|
||||
VLD2.F32 {dXr6,dXi6},[pSrc, :128],pointStep @// data[6]
|
||||
|
||||
@// calculate a*v5
|
||||
VMUL.F32 dT1,dVr5,dT0[0] @// use dVi0 for dT1
|
||||
VMUL.F32 dVi5,dVi5,dT0[0]
|
||||
|
||||
VADD.F32 dYr7,dVr3,dVr7
|
||||
VADD.F32 dYi7,dVi3,dVi7
|
||||
SUB pDst, pDst, step2 @// set pDst to y1
|
||||
|
||||
VSUB.F32 dVr5,dT1,dVi5 @// a * V5
|
||||
VADD.F32 dVi5,dT1,dVi5
|
||||
|
||||
@// On the last iteration, this will read past the end of pSrc,
|
||||
@// so skip this read.
|
||||
BEQ radix8SkipLastUpdateFwd\name
|
||||
VLD2.F32 {dXr7,dXi7},[pSrc, :128],setStep @// data[7]
|
||||
radix8SkipLastUpdateFwd\name:
|
||||
|
||||
VSUB.F32 qY5,qV1,qV5
|
||||
|
||||
VSUB.F32 dYr3,dVr3,dVr7
|
||||
VST2.F32 {dYr7,dYi7},[pDst, :128],step1 @// store y1
|
||||
VSUB.F32 dYi3,dVi3,dVi7
|
||||
VADD.F32 qY1,qV1,qV5
|
||||
|
||||
|
||||
VST2.F32 {dYr5,dYi5},[pDst, :128],step1 @// store y3
|
||||
VST2.F32 {dYr3,dYi3},[pDst, :128],step1 @// store y5
|
||||
VST2.F32 {dYr1,dYi1},[pDst, :128]! @// store y7
|
||||
|
||||
.ENDIF
|
||||
|
||||
|
||||
@// update pDst for the next set
|
||||
SUB pDst, pDst, step2
|
||||
BGT radix8fsGrpZeroSetLoop\name
|
||||
|
||||
|
||||
@// reset pSrc to pDst for the next stage
|
||||
SUB pSrc,pDst,pointStep @// pDst -= 2*grpSize
|
||||
MOV pDst,pPingPongBuf
|
||||
|
||||
|
||||
|
||||
.endm
|
||||
|
||||
|
||||
@// Allocate stack memory required by the function
|
||||
|
||||
|
||||
M_START armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","FALSE",FWD
|
||||
M_END
|
||||
ONEBYSQRT2FWD: .float 0.7071067811865476e0
|
||||
|
||||
M_START armSP_FFTInv_CToC_FC32_Radix8_fs_OutOfPlace_unsafe,r4
|
||||
FFTSTAGE "FALSE","TRUE",INV
|
||||
M_END
|
||||
ONEBYSQRT2INV: .float 0.7071067811865476e0
|
||||
|
||||
|
||||
.end
|
File diff suppressed because it is too large
@ -1,404 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of omxSP_FFTFwd_RToCCS_S32_Sfs_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute FFT for a real signal
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix4_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix2_OutOfPlace_unsafe
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
|
||||
.extern armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r1
|
||||
#define pFFTSpec r2
|
||||
#define scale r3
|
||||
|
||||
|
||||
@// Output registers
|
||||
#define result r0
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define argTwiddle r1
|
||||
#define argDst r2
|
||||
#define argScale r4
|
||||
#define tmpOrder r4
|
||||
#define pTwiddle r4
|
||||
#define pOut r5
|
||||
#define subFFTSize r7
|
||||
#define subFFTNum r6
|
||||
#define N r6
|
||||
#define order r14
|
||||
#define diff r9
|
||||
@// Total number of radix stages required to complete the FFT
|
||||
#define count r8
|
||||
#define x0r r4
|
||||
#define x0i r5
|
||||
#define diffMinusOne r2
|
||||
#define subFFTSizeTmp r6
|
||||
#define step r3
|
||||
#define step1 r4
|
||||
#define twStep r8
|
||||
#define zero r9
|
||||
#define pTwiddleTmp r5
|
||||
#define t0 r10
|
||||
|
||||
@// Neon registers
|
||||
|
||||
#define dX0 d0
|
||||
#define dzero d1
|
||||
#define dZero d2
|
||||
#define dShift d3
|
||||
#define dX0r d2
|
||||
#define dX0i d3
|
||||
#define dX1r d4
|
||||
#define dX1i d5
|
||||
#define dT0 d6
|
||||
#define dT1 d7
|
||||
#define dT2 d8
|
||||
#define dT3 d9
|
||||
#define qT0 d10
|
||||
#define qT1 d12
|
||||
#define dW0r d14
|
||||
#define dW0i d15
|
||||
#define dW1r d16
|
||||
#define dW1i d17
|
||||
#define dY0r d14
|
||||
#define dY0i d15
|
||||
#define dY1r d16
|
||||
#define dY1i d17
|
||||
#define dY0rS64 d14.s64
|
||||
#define dY0iS64 d15.s64
|
||||
#define qT2 d18
|
||||
#define qT3 d20
|
||||
@// lastThreeelements
|
||||
#define dX1 d3
|
||||
#define dW0 d4
|
||||
#define dW1 d5
|
||||
#define dY0 d10
|
||||
#define dY1 d11
|
||||
#define dY2 d12
|
||||
#define dY3 d13
|
||||
|
||||
#define half d0
|
||||
|
||||
@// Allocate stack memory required by the function
|
||||
|
||||
@// Write function header
|
||||
M_START omxSP_FFTFwd_RToCCS_F32_Sfs,r11,d15
|
||||
|
||||
@ Structure offsets for the FFTSpec
|
||||
.set ARMsFFTSpec_N, 0
|
||||
.set ARMsFFTSpec_pBitRev, 4
|
||||
.set ARMsFFTSpec_pTwiddle, 8
|
||||
.set ARMsFFTSpec_pBuf, 12
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
@// Read the size from structure and take log
|
||||
LDR N, [pFFTSpec, #ARMsFFTSpec_N]
|
||||
|
||||
@// Read other structure parameters
|
||||
LDR pTwiddle, [pFFTSpec, #ARMsFFTSpec_pTwiddle]
|
||||
LDR pOut, [pFFTSpec, #ARMsFFTSpec_pBuf]
|
||||
|
||||
@// N=1 Treat separately
|
||||
CMP N,#1
|
||||
BGT sizeGreaterThanOne
|
||||
VLD1.F32 dX0[0],[pSrc]
|
||||
MOV zero,#0
|
||||
VMOV.F32 dzero[0],zero
|
||||
VMOV.F32 dZero[0],zero
|
||||
VST3.F32 {dX0[0],dzero[0],dZero[0]},[pDst]
|
||||
|
||||
B End
|
||||
|
||||
|
||||
|
||||
sizeGreaterThanOne:
|
||||
@// Do a N/2 point complex FFT including the scaling
|
||||
|
||||
MOV N,N,ASR #1 @// N/2 point complex FFT
|
||||
|
||||
CLZ order,N @// N = 2^order
|
||||
RSB order,order,#31
|
||||
MOV subFFTSize,#1
|
||||
@//MOV subFFTNum,N
|
||||
|
||||
CMP order,#3
|
||||
BGT orderGreaterthan3 @// order > 3
|
||||
|
||||
CMP order,#1
|
||||
BGE orderGreaterthan0 @// order > 0
|
||||
VLD1.F32 dX0,[pSrc]
|
||||
VST1.F32 dX0,[pOut]
|
||||
MOV pSrc,pOut
|
||||
MOV argDst,pDst
|
||||
BLT FFTEnd
|
||||
|
||||
orderGreaterthan0:
|
||||
@// set the buffers appropriately for various orders
|
||||
CMP order,#2
|
||||
MOVEQ argDst,pDst
|
||||
MOVNE argDst,pOut
|
||||
@// Pass the first stage destination in RN5
|
||||
MOVNE pOut,pDst
|
||||
MOV argTwiddle,pTwiddle
|
||||
|
||||
CMP order,#1
|
||||
BGT orderGreaterthan1
|
||||
@// order = 1
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
|
||||
B FFTEnd
|
||||
|
||||
orderGreaterthan1:
|
||||
CMP order,#2
|
||||
BGT orderGreaterthan2
|
||||
@// order =2
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
|
||||
B FFTEnd
|
||||
|
||||
orderGreaterthan2:@// order =3
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix2_OutOfPlace_unsafe
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
|
||||
|
||||
B FFTEnd
|
||||
|
||||
|
||||
|
||||
orderGreaterthan3:
|
||||
specialScaleCase:
|
||||
|
||||
@// Set input args to fft stages
|
||||
TST order, #2
|
||||
MOVEQ argDst,pDst
|
||||
MOVNE argDst,pOut
|
||||
@// Pass the first stage destination in RN5
|
||||
MOVNE pOut,pDst
|
||||
MOV argTwiddle,pTwiddle
|
||||
|
||||
@//check for even or odd order
|
||||
@// NOTE: The following combination of BL's would work fine even though
|
||||
@// the first BL would corrupt the flags. This is because the end of
|
||||
@// the "grpZeroSetLoop" loop inside
|
||||
@// armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe sets the Z flag
|
||||
@// to EQ
|
||||
|
||||
TST order,#0x00000001
|
||||
BLEQ armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
|
||||
BLNE armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
|
||||
|
||||
CMP subFFTNum,#4
|
||||
BLT FFTEnd
|
||||
|
||||
|
||||
unscaledRadix4Loop:
|
||||
BEQ lastStageUnscaledRadix4
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix4_OutOfPlace_unsafe
|
||||
CMP subFFTNum,#4
|
||||
B unscaledRadix4Loop
|
||||
|
||||
lastStageUnscaledRadix4:
|
||||
BL armSP_FFTFwd_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
|
||||
B FFTEnd
|
||||
|
||||
|
||||
FFTEnd:
|
||||
finalComplexToRealFixup:
|
||||
|
||||
|
||||
@// F(0) = 1/2[Z(0) + Z'(0)] - j [Z(0) - Z'(0)]
|
||||
@// 1/2[(a+jb) + (a-jb)] - j [(a+jb) - (a-jb)]
|
||||
@// 1/2[2a+j0] - j [0+j2b]
|
||||
@// (a+b, 0)
|
||||
|
||||
@// F(N/2) = 1/2[Z(0) + Z'(0)] + j [Z(0) - Z'(0)]
|
||||
@// 1/2[(a+jb) + (a-jb)] + j [(a+jb) - (a-jb)]
|
||||
@// 1/2[2a+j0] + j [0+j2b]
|
||||
@// (a-b, 0)
|
||||
|
||||
@// F(0) and F(N/2)
|
||||
VLD2.F32 {dX0r[0],dX0i[0]},[pSrc]!
|
||||
MOV zero,#0
|
||||
VMOV.F32 dX0r[1],zero
|
||||
MOV step,subFFTSize,LSL #3 @// step = N/2 * 8 bytes
|
||||
VMOV.F32 dX0i[1],zero
|
||||
@// twStep = 3N/8 * 8 bytes pointing to W^1
|
||||
SUB twStep,step,subFFTSize,LSL #1
|
||||
|
||||
VADD.F32 dY0r,dX0r,dX0i @// F(0) = ((Z0.r+Z0.i) , 0)
|
||||
MOV step1,subFFTSize,LSL #2 @// step1 = N/2 * 4 bytes
|
||||
VSUB.F32 dY0i,dX0r,dX0i @// F(N/2) = ((Z0.r-Z0.i) , 0)
|
||||
SUBS subFFTSize,subFFTSize,#2
|
||||
|
||||
VST1.F32 dY0r,[argDst],step
|
||||
ADD pTwiddleTmp,argTwiddle,#8 @// W^2
|
||||
VST1.F32 dY0i,[argDst]!
|
||||
ADD argTwiddle,argTwiddle,twStep @// W^1
|
||||
|
||||
VDUP.F32 dzero,zero
|
||||
SUB argDst,argDst,step
|
||||
|
||||
BLT End
|
||||
BEQ lastElement
|
||||
SUB step,step,#24
|
||||
SUB step1,step1,#8 @// (N/4-1)*8 bytes
|
||||
|
||||
@// F(k) = 1/2[Z(k) + Z'(N/2-k)] -j*W^(k) [Z(k) - Z'(N/2-k)]
|
||||
@// Note: W^k is stored as negative values in the table
|
||||
@// Process 4 elements at a time. E.g: F(1),F(2) and F(N/2-2),F(N/2-1)
|
||||
@// since both of them require Z(1),Z(2) and Z(N/2-2),Z(N/2-1)
|
||||
|
||||
|
||||
ADR t0, HALF
|
||||
VLD1.F32 half[0], [t0]
|
||||
|
||||
evenOddButterflyLoop:
|
||||
|
||||
|
||||
VLD1.F32 dW0r,[argTwiddle],step1
|
||||
VLD1.F32 dW1r,[argTwiddle]!
|
||||
|
||||
VLD2.F32 {dX0r,dX0i},[pSrc],step
|
||||
SUB argTwiddle,argTwiddle,step1
|
||||
VLD2.F32 {dX1r,dX1i},[pSrc]!
|
||||
|
||||
|
||||
|
||||
SUB step1,step1,#8 @// (N/4-2)*8 bytes
|
||||
VLD1.F32 dW0i,[pTwiddleTmp],step1
|
||||
VLD1.F32 dW1i,[pTwiddleTmp]!
|
||||
SUB pSrc,pSrc,step
|
||||
|
||||
SUB pTwiddleTmp,pTwiddleTmp,step1
|
||||
VREV64.F32 dX1r,dX1r
|
||||
VREV64.F32 dX1i,dX1i
|
||||
SUBS subFFTSize,subFFTSize,#4
|
||||
|
||||
|
||||
|
||||
VSUB.F32 dT2,dX0r,dX1r @// a-c
|
||||
SUB step1,step1,#8
|
||||
VADD.F32 dT0,dX0r,dX1r @// a+c
|
||||
VSUB.F32 dT1,dX0i,dX1i @// b-d
|
||||
VADD.F32 dT3,dX0i,dX1i @// b+d
|
||||
VMUL.F32 dT0,dT0,half[0]
|
||||
VMUL.F32 dT1,dT1,half[0]
|
||||
VZIP.F32 dW1r,dW1i
|
||||
VZIP.F32 dW0r,dW0i
|
||||
|
||||
|
||||
VMUL.F32 qT0,dW1r,dT2
|
||||
VMUL.F32 qT1,dW1r,dT3
|
||||
VMUL.F32 qT2,dW0r,dT2
|
||||
VMUL.F32 qT3,dW0r,dT3
|
||||
|
||||
VMLA.F32 qT0,dW1i,dT3
|
||||
VMLS.F32 qT1,dW1i,dT2
|
||||
|
||||
VMLS.F32 qT2,dW0i,dT3
|
||||
VMLA.F32 qT3,dW0i,dT2
|
||||
|
||||
|
||||
VMUL.F32 dX1r,qT0,half[0]
|
||||
VMUL.F32 dX1i,qT1,half[0]
|
||||
|
||||
VSUB.F32 dY1r,dT0,dX1i @// F(N/2 -1)
|
||||
VADD.F32 dY1i,dT1,dX1r
|
||||
VNEG.F32 dY1i,dY1i
|
||||
|
||||
VREV64.F32 dY1r,dY1r
|
||||
VREV64.F32 dY1i,dY1i
|
||||
|
||||
|
||||
VMUL.F32 dX0r,qT2,half[0]
|
||||
VMUL.F32 dX0i,qT3,half[0]
|
||||
|
||||
VSUB.F32 dY0r,dT0,dX0i @// F(1)
|
||||
VADD.F32 dY0i,dT1,dX0r
|
||||
|
||||
|
||||
VST2.F32 {dY0r,dY0i},[argDst],step
|
||||
VST2.F32 {dY1r,dY1i},[argDst]!
|
||||
SUB argDst,argDst,step
|
||||
SUB step,step,#32 @// (N/2-4)*8 bytes
|
||||
|
||||
|
||||
BGT evenOddButterflyLoop
|
||||
|
||||
@// set both the ptrs to the last element
|
||||
SUB pSrc,pSrc,#8
|
||||
SUB argDst,argDst,#8
|
||||
|
||||
|
||||
|
||||
@// Last element can be expanded as follows
|
||||
@// 1/2[Z(k) + Z'(k)] + j w^k [Z(k) - Z'(k)]
|
||||
@// 1/2[(a+jb) + (a-jb)] + j w^k [(a+jb) - (a-jb)]
|
||||
@// 1/2[2a+j0] + j (c+jd) [0+j2b]
|
||||
@// (a-bc, -bd)
|
||||
@// Since (c,d) = (0,1) for the last element, result is just (a,-b)
|
||||
|
||||
lastElement:
|
||||
VLD1.F32 dX0r,[pSrc]
|
||||
|
||||
VST1.F32 dX0r[0],[argDst]!
|
||||
VNEG.F32 dX0r,dX0r
|
||||
VST1.F32 dX0r[1],[argDst]!
|
||||
|
||||
End:
|
||||
@// Set return value
|
||||
MOV result, #OMX_Sts_NoErr
|
||||
|
||||
@// Write function tail
|
||||
M_END
|
||||
HALF: .float 0.5
|
||||
.end
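The `finalComplexToRealFixup` comments in the removed file above derive F(0) = (a+b, 0) and F(N/2) = (a-b, 0) from Z(0) = a + jb. A minimal C sketch of that one step follows. The `cpx` type and both function names are hypothetical, introduced only for illustration; this is not the library's code, and it assumes the input is packed as z[k] = x[2k] + j*x[2k+1].

```c
#include <stddef.h>

/* Hypothetical complex type for this sketch; not the library's OMX_FC32. */
typedef struct { float re; float im; } cpx;

/*
 * Given Z(0) = a + jb of the N/2-point complex FFT of the packed input,
 * produce the two purely real bins F(0) and F(N/2) of the N-point real FFT.
 */
static void real_fft_edge_bins(cpx z0, cpx *f0, cpx *f_half) {
    f0->re = z0.re + z0.im;     /* F(0)   = (a + b, 0) */
    f0->im = 0.0f;
    f_half->re = z0.re - z0.im; /* F(N/2) = (a - b, 0) */
    f_half->im = 0.0f;
}

/* Z(0) is the plain sum of the packed samples, so no twiddles are needed. */
static cpx packed_dc(const float *x, size_t n) {
    cpx z0 = {0.0f, 0.0f};
    for (size_t k = 0; k < n / 2; ++k) {
        z0.re += x[2 * k];      /* even samples feed the real part */
        z0.im += x[2 * k + 1];  /* odd samples feed the imaginary part */
    }
    return z0;
}
```

For x = {1, 2, 3, 4} this gives Z(0) = 4 + 6j, so F(0) is the sum of all samples and F(N/2) is the alternating sum, which matches the direct DFT definition.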
|
@ -1,49 +0,0 @@
|
||||
/*
|
||||
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
*
|
||||
* Use of this source code is governed by a BSD-style license
|
||||
* that can be found in the LICENSE file in the root of the source
|
||||
* tree. An additional intellectual property rights grant can be found
|
||||
* in the file PATENTS. All contributing project authors may
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*
|
||||
*/
|
||||
|
||||
#include "dl/api/armOMX.h"
|
||||
#include "dl/api/omxtypes.h"
|
||||
#include "dl/sp/api/armSP.h"
|
||||
#include "dl/sp/api/omxSP.h"
|
||||
|
||||
/**
|
||||
* Function: omxSP_FFTGetBufSize_R_F32
|
||||
*
|
||||
* Description:
|
||||
* Computes the size of the specification structure required for the length
|
||||
* 2^order real FFT and IFFT functions.
|
||||
*
|
||||
* Remarks:
|
||||
* This function is used in conjunction with the 32-bit functions
|
||||
* <FFTFwd_RToCCS_F32_Sfs> and <FFTInv_CCSToR_F32_Sfs>.
|
||||
*
|
||||
* Parameters:
|
||||
* [in] order base-2 logarithm of the length; valid in the range
|
||||
* [1,12]. ([1,15] if BIG_FFT_TABLE is defined.)
|
||||
* [out] pSize pointer to the number of bytes required for the
|
||||
* specification structure.
|
||||
*
|
||||
* Return Value:
|
||||
* Standard omxError result. See enumeration for possible result codes.
|
||||
*
|
||||
*/
|
||||
|
||||
OMXResult omxSP_FFTGetBufSize_R_F32(OMX_INT order, OMX_INT *pSize) {
|
||||
if (!pSize || (order < 1) || (order > TWIDDLE_TABLE_ORDER))
|
||||
return OMX_Sts_BadArgErr;
|
||||
|
||||
/*
|
||||
* The required size is the same as for R_S32, because the
|
||||
* elements are the same size and because ARMsFFTSpec_R_SC32 is
|
||||
* the same size as ARMsFFTSpec_R_FC32.
|
||||
*/
|
||||
return omxSP_FFTGetBufSize_R_S32(order, pSize);
|
||||
}
|
@ -1,91 +0,0 @@
|
||||
/*
|
||||
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
*
|
||||
* Use of this source code is governed by a BSD-style license
|
||||
* that can be found in the LICENSE file in the root of the source
|
||||
* tree. An additional intellectual property rights grant can be found
|
||||
* in the file PATENTS. All contributing project authors may
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*
|
||||
* This file was originally licensed as follows. It has been
|
||||
* relicensed with permission from the copyright holders.
|
||||
*/
|
||||
|
||||
/**
|
||||
*
|
||||
* File Name: omxSP_FFTGetBufSize_R_S32.c
|
||||
* OpenMAX DL: v1.0.2
|
||||
* Last Modified Revision: 7777
|
||||
* Last Modified Date: Thu, 27 Sep 2007
|
||||
*
|
||||
* (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
|
||||
*
|
||||
*
|
||||
* Description:
|
||||
* Computes the size of the specification structure required.
|
||||
*/
|
||||
|
||||
#include "dl/api/armOMX.h"
|
||||
#include "dl/api/omxtypes.h"
|
||||
#include "dl/sp/api/armSP.h"
|
||||
#include "dl/sp/api/omxSP.h"
|
||||
|
||||
/**
|
||||
* Function: omxSP_FFTGetBufSize_R_S32
|
||||
*
|
||||
* Description:
|
||||
* Computes the size of the specification structure required for the length
|
||||
* 2^order real FFT and IFFT functions.
|
||||
*
|
||||
* Remarks:
|
||||
* This function is used in conjunction with the 32-bit functions
|
||||
* <FFTFwd_RToCCS_S32_Sfs> and <FFTInv_CCSToR_S32_Sfs>.
|
||||
*
|
||||
* Parameters:
|
||||
* [in] order base-2 logarithm of the length; valid in the range
|
||||
* [0,12].
|
||||
* [out] pSize pointer to the number of bytes required for the
|
||||
* specification structure.
|
||||
*
|
||||
* Return Value:
|
||||
* Standard omxError result. See enumeration for possible result codes.
|
||||
*
|
||||
*/
|
||||
|
||||
OMXResult omxSP_FFTGetBufSize_R_S32(
|
||||
OMX_INT order,
|
||||
OMX_INT *pSize
|
||||
)
|
||||
{
|
||||
OMX_INT NBy2,N,twiddleSize;
|
||||
|
||||
|
||||
/* Check for order zero */
|
||||
if (order == 0)
|
||||
{
|
||||
*pSize = sizeof(ARMsFFTSpec_R_SC32)
|
||||
+ sizeof(OMX_S32) * (2); /* Extra size 'N' is used in FFTInv_CCSToR_S32S16_Sfs as a temporary buf */
|
||||
|
||||
return OMX_Sts_NoErr;
|
||||
}
|
||||
|
||||
NBy2 = 1 << (order - 1);
|
||||
N = NBy2<<1;
|
||||
twiddleSize = 5*N/8; /* 3/4(N/2) + N/4 */
|
||||
|
||||
/* 2 pointers to store bitreversed array and twiddle factor array */
|
||||
*pSize = sizeof(ARMsFFTSpec_R_SC32)
|
||||
/* Twiddle factors */
|
||||
+ sizeof(OMX_SC32) * twiddleSize
|
||||
/* Ping Pong buffer for doing the N/2 point complex FFT */
|
||||
+ sizeof(OMX_S32) * (N<<1) /* Extra size 'N' is used in FFTInv_CCSToR_S32_Sfs as a temporary buf */
|
||||
+ 62 ; /* Extra bytes to get 32 byte alignment of ptwiddle and pBuf */
|
||||
|
||||
|
||||
return OMX_Sts_NoErr;
|
||||
}
|
||||
|
||||
/*****************************************************************************
|
||||
* END OF FILE
|
||||
*****************************************************************************/
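The size arithmetic in the removed `omxSP_FFTGetBufSize_R_S32` above can be restated as a small standalone C sketch. The byte sizes below are assumptions for a 32-bit ARM build: the 16-byte spec size is inferred from the field offsets 0, 4, 8, 12 used by the assembly, not read from the headers. The function name `fft_spec_bytes` is hypothetical.

```c
/* Assumed sizes for a 32-bit ARM build; not taken from the real headers. */
enum {
    SPEC_BYTES = 16,    /* N, pBitRev, pTwiddle, pBuf at offsets 0, 4, 8, 12 */
    COMPLEX_BYTES = 8,  /* one complex element: two 32-bit values */
    SCALAR_BYTES = 4,   /* one 32-bit scalar */
    ALIGN_SLACK = 62    /* two buffers, up to 31 bytes of padding each */
};

/* Returns the number of bytes needed for a 2^order real FFT spec, or -1. */
static int fft_spec_bytes(int order) {
    if (order < 0)
        return -1;
    if (order == 0)
        return SPEC_BYTES + SCALAR_BYTES * 2;

    int n = 1 << order;
    int twiddle_count = 5 * n / 8;  /* 3/4 * (N/2) + N/4 */

    return SPEC_BYTES
         + COMPLEX_BYTES * twiddle_count  /* twiddle factors */
         + SCALAR_BYTES * (n << 1)        /* ping-pong buffer plus temp space */
         + ALIGN_SLACK;
}
```

Under these assumptions an order-3 (N = 8) transform needs 16 + 40 + 64 + 62 = 182 bytes.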
|
||||
|
@ -1,210 +0,0 @@
|
||||
/*
|
||||
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
*
|
||||
* Use of this source code is governed by a BSD-style license
|
||||
* that can be found in the LICENSE file in the root of the source
|
||||
* tree. An additional intellectual property rights grant can be found
|
||||
* in the file PATENTS. All contributing project authors may
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*
|
||||
* This is a modification of omxSP_FFTInit_R_S32.c to support float
|
||||
* instead of S32.
|
||||
*/
|
||||
|
||||
#include "dl/api/armOMX.h"
|
||||
#include "dl/api/omxtypes.h"
|
||||
#include "dl/sp/api/armSP.h"
|
||||
#include "dl/sp/api/omxSP.h"
|
||||
|
||||
/**
|
||||
* Function: omxSP_FFTInit_R_F32
|
||||
*
|
||||
* Description:
|
||||
* Initialize the real forward-FFT specification information struct.
|
||||
*
|
||||
* Remarks:
|
||||
* This function is used to initialize the specification structures
|
||||
* for functions <ippsFFTFwd_RToCCS_F32_Sfs> and
|
||||
* <ippsFFTInv_CCSToR_F32_Sfs>. Memory for *pFFTSpec must be
|
||||
* allocated prior to calling this function. The number of bytes
|
||||
* required for *pFFTSpec can be determined using
|
||||
* <FFTGetBufSize_R_F32>.
|
||||
*
|
||||
* Parameters:
|
||||
* [in] order base-2 logarithm of the desired block length;
|
||||
* valid in the range [1,12]. ([1,15] if
|
||||
* BIG_FFT_TABLE is defined.)
|
||||
* [out] pFFTFwdSpec pointer to the initialized specification structure.
|
||||
*
|
||||
* Return Value:
|
||||
* Standard omxError result. See enumeration for possible result codes.
|
||||
*
|
||||
*/
|
||||
OMXResult omxSP_FFTInit_R_F32(OMXFFTSpec_R_F32* pFFTSpec, OMX_INT order) {
|
||||
OMX_INT i;
|
||||
OMX_INT j;
|
||||
OMX_FC32* pTwiddle;
|
||||
OMX_FC32* pTwiddle1;
|
||||
OMX_FC32* pTwiddle2;
|
||||
OMX_FC32* pTwiddle3;
|
||||
OMX_FC32* pTwiddle4;
|
||||
OMX_F32* pBuf;
|
||||
OMX_U16* pBitRev;
|
||||
OMX_U32 pTmp;
|
||||
OMX_INT Nby2;
|
||||
OMX_INT N;
|
||||
OMX_INT M;
|
||||
OMX_INT diff;
|
||||
OMX_INT step;
|
||||
OMX_F32 x;
|
||||
OMX_F32 y;
|
||||
OMX_F32 xNeg;
|
||||
ARMsFFTSpec_R_FC32* pFFTStruct = 0;
|
||||
|
||||
pFFTStruct = (ARMsFFTSpec_R_FC32 *) pFFTSpec;
|
||||
|
||||
/* Validate args */
|
||||
if (!pFFTSpec || (order < 1) || (order > TWIDDLE_TABLE_ORDER))
|
||||
return OMX_Sts_BadArgErr;
|
||||
|
||||
/* Do the initializations */
|
||||
Nby2 = 1 << (order - 1);
|
||||
N = Nby2 << 1;
|
||||
|
||||
/* optimized implementations don't use bitreversal */
|
||||
pBitRev = NULL;
|
||||
|
||||
pTwiddle = (OMX_FC32 *) (sizeof(ARMsFFTSpec_R_SC32) + (OMX_S8*) pFFTSpec);
|
||||
|
||||
/* Align to 32 byte boundary */
|
||||
pTmp = ((OMX_U32)pTwiddle) & 31;
|
||||
if (pTmp)
|
||||
pTwiddle = (OMX_FC32*) ((OMX_S8*)pTwiddle + (32 - pTmp));
|
||||
|
||||
pBuf = (OMX_F32*) (sizeof(OMX_FC32)*(5*N/8) + (OMX_S8*) pTwiddle);
|
||||
|
||||
/* Align to 32 byte boundary */
|
||||
pTmp = ((OMX_U32)pBuf)&31; /* (OMX_U32)pBuf % 32 */
|
||||
if (pTmp)
|
||||
pBuf = (OMX_F32*) ((OMX_S8*)pBuf + (32 - pTmp));
|
||||
|
||||
/*
|
||||
* Filling Twiddle factors :
|
||||
*
|
||||
* exp^(-j*2*PI*k/ (N/2) ) ; k=0,1,2,...,3/4(N/2)
|
||||
*
|
||||
* N/2 point complex FFT is used to compute N point real FFT. The
|
||||
* original twiddle table "armSP_FFT_F32TwiddleTable" is of size
|
||||
* (MaxSize/8 + 1). The rest of the values, i.e., up to MaxSize, are
|
||||
* calculated using the symmetries of sin and cos. The max size of
|
||||
* the twiddle table needed is 3/4(N/2) for a radix-4 stage
|
||||
*
|
||||
* W = (-2 * PI) / N
|
||||
* N = 1 << order
|
||||
* W = -PI >> (order - 1)
|
||||
*/
|
||||
|
||||
M = Nby2 >> 3;
|
||||
diff = TWIDDLE_TABLE_ORDER - (order - 1);
|
||||
/* step into the twiddle table for the current order */
|
||||
step = 1 << diff;
|
||||
|
||||
x = armSP_FFT_F32TwiddleTable[0];
|
||||
y = armSP_FFT_F32TwiddleTable[1];
|
||||
xNeg = 1;
|
||||
|
||||
if ((order - 1) >= 3) {
|
||||
/* i = 0 case */
|
||||
pTwiddle[0].Re = x;
|
||||
pTwiddle[0].Im = y;
|
||||
pTwiddle[2*M].Re = -y;
|
||||
pTwiddle[2*M].Im = xNeg;
|
||||
pTwiddle[4*M].Re = xNeg;
|
||||
pTwiddle[4*M].Im = y;
|
||||
|
||||
for (i = 1; i <= M; i++) {
|
||||
j = i*step;
|
||||
|
||||
x = armSP_FFT_F32TwiddleTable[2*j];
|
||||
y = armSP_FFT_F32TwiddleTable[2*j+1];
|
||||
|
||||
pTwiddle[i].Re = x;
|
||||
pTwiddle[i].Im = y;
|
||||
pTwiddle[2*M-i].Re = -y;
|
||||
pTwiddle[2*M-i].Im = -x;
|
||||
pTwiddle[2*M+i].Re = y;
|
||||
pTwiddle[2*M+i].Im = -x;
|
||||
pTwiddle[4*M-i].Re = -x;
|
||||
pTwiddle[4*M-i].Im = y;
|
||||
pTwiddle[4*M+i].Re = -x;
|
||||
pTwiddle[4*M+i].Im = -y;
|
||||
pTwiddle[6*M-i].Re = y;
|
||||
pTwiddle[6*M-i].Im = x;
|
||||
}
|
||||
} else if ((order - 1) == 2) {
|
||||
pTwiddle[0].Re = x;
|
||||
pTwiddle[0].Im = y;
|
||||
pTwiddle[1].Re = -y;
|
||||
pTwiddle[1].Im = xNeg;
|
||||
pTwiddle[2].Re = xNeg;
|
||||
pTwiddle[2].Im = y;
|
||||
} else if ((order-1) == 1) {
|
||||
pTwiddle[0].Re = x;
|
||||
pTwiddle[0].Im = y;
|
||||
}
|
||||
|
||||
/*
|
||||
* Now fill the last N/4 values : exp^(-j*2*PI*k/N) ;
|
||||
* k=1,3,5,...,N/2-1 These are used for the final twiddle fix-up for
|
||||
* converting complex to real FFT
|
||||
*/
|
||||
|
||||
M = N >> 3;
|
||||
diff = TWIDDLE_TABLE_ORDER - order;
|
||||
step = 1 << diff;
|
||||
|
||||
pTwiddle1 = pTwiddle + 3*N/8;
|
||||
pTwiddle4 = pTwiddle1 + (N/4 - 1);
|
||||
pTwiddle3 = pTwiddle1 + N/8;
|
||||
pTwiddle2 = pTwiddle1 + (N/8 - 1);
|
||||
|
||||
x = armSP_FFT_F32TwiddleTable[0];
|
||||
y = armSP_FFT_F32TwiddleTable[1];
|
||||
xNeg = 1;
|
||||
|
||||
if (order >=3) {
|
||||
for (i = 1; i <= M; i += 2) {
|
||||
j = i*step;
|
||||
|
||||
x = armSP_FFT_F32TwiddleTable[2*j];
|
||||
y = armSP_FFT_F32TwiddleTable[2*j+1];
|
||||
|
||||
pTwiddle1[0].Re = x;
|
||||
pTwiddle1[0].Im = y;
|
||||
pTwiddle1 += 1;
|
||||
pTwiddle2[0].Re = -y;
|
||||
pTwiddle2[0].Im = -x;
|
||||
pTwiddle2 -= 1;
|
||||
pTwiddle3[0].Re = y;
|
||||
pTwiddle3[0].Im = -x;
|
||||
pTwiddle3 += 1;
|
||||
pTwiddle4[0].Re = -x;
|
||||
pTwiddle4[0].Im = y;
|
||||
pTwiddle4 -= 1;
|
||||
}
|
||||
} else {
|
||||
if (order == 2) {
|
||||
pTwiddle1[0].Re = -y;
|
||||
pTwiddle1[0].Im = xNeg;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/* Update the structure */
|
||||
pFFTStruct->N = N;
|
||||
pFFTStruct->pTwiddle = pTwiddle;
|
||||
pFFTStruct->pBitRev = pBitRev;
|
||||
pFFTStruct->pBuf = pBuf;
|
||||
|
||||
return OMX_Sts_NoErr;
|
||||
}
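The removed `omxSP_FFTInit_R_F32` above rounds `pTwiddle` and `pBuf` up to 32-byte boundaries by casting the pointer to `OMX_U32`, which only holds a full address where pointers are 32 bits wide. A minimal C sketch of the same rounding step, written with `uintptr_t` instead, is shown below. The helper name `align_up_32` is hypothetical and not part of the library.

```c
#include <stdint.h>

/* Round a pointer up to the next 32-byte boundary (no-op if already aligned). */
static void *align_up_32(void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t rem = addr & 31u;  /* same as addr % 32 */
    if (rem != 0)
        addr += 32u - rem;       /* skip forward by at most 31 bytes */
    return (void *)addr;
}
```

Because each pointer can move by at most 31 bytes, two aligned buffers need at most 62 bytes of slack, which is the constant added in `omxSP_FFTGetBufSize_R_S32`.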
|
@ -1,284 +0,0 @@
|
||||
@//
|
||||
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
|
||||
@//
|
||||
@// Copyright 2016, Mozilla Foundation and contributors
|
||||
@//
|
||||
@// Use of this source code is governed by a BSD-style license
|
||||
@// that can be found in the LICENSE file in the root of the source
|
||||
@// tree. An additional intellectual property rights grant can be found
|
||||
@// in the file PATENTS. All contributing project authors may
|
||||
@// be found in the AUTHORS file in the root of the source tree.
|
||||
@//
|
||||
@// This is a modification of omxSP_FFTInv_CCSToR_S32_Sfs_s.s
|
||||
@// to support float instead of SC32.
|
||||
@//
|
||||
@// It is further modified to produce an "unscaled" version, which
|
||||
@// actually multiplies by two for consistency with the other FFT functions
|
||||
@// in use.
|
||||
@//
|
||||
|
||||
@//
|
||||
@// Description:
|
||||
@// Compute an inverse FFT for a complex signal
|
||||
@//
|
||||
@//
|
||||
|
||||
|
||||
@// Include standard headers
|
||||
|
||||
#include "dl/api/armCOMM_s.h"
|
||||
#include "dl/api/omxtypes_s.h"
|
||||
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix4_OutOfPlace_unsafe
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix2_OutOfPlace_unsafe
|
||||
.extern armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe
|
||||
|
||||
|
||||
@// Set debugging level
|
||||
@//DEBUG_ON SETL {TRUE}
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
|
||||
|
||||
@// Guarding implementation by the processor name
|
||||
|
||||
@// Import symbols required from other files
|
||||
@// (For example tables)
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
|
||||
.extern armSP_FFTInv_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
|
||||
|
||||
|
||||
@//Input Registers
|
||||
|
||||
#define pSrc r0
|
||||
#define pDst r1
|
||||
#define pFFTSpec r2
|
||||
#define scale r3
|
||||
|
||||
|
||||
@// Output registers
|
||||
#define result r0
|
||||
|
||||
@//Local Scratch Registers
|
||||
|
||||
#define argTwiddle r1
|
||||
#define argDst r2
|
||||
#define argScale r4
|
||||
#define tmpOrder r4
|
||||
#define pTwiddle r4
|
||||
#define pOut r5
|
||||
#define subFFTSize r7
|
||||
#define subFFTNum r6
|
||||
#define N r6
|
||||
#define order r14
|
||||
#define diff r9
|
||||
@// Total number of radix stages required to complete the FFT
|
||||
#define count r8
|
||||
#define x0r r4
|
||||
#define x0i r5
|
||||
#define diffMinusOne r2
|
||||
#define round r3
|
||||
|
||||
#define pOut1 r2
|
||||
#define size r7
|
||||
#define step r8
|
||||
#define step1 r9
|
||||
#define twStep r10
|
||||
#define pTwiddleTmp r11
|
||||
#define argTwiddle1 r12
|
||||
#define zero r14
|
||||
|
||||
@// Neon registers
|
||||
|
||||
#define dX0 D0
|
||||
#define dShift D1
|
||||
#define dX1 D1
|
||||
#define dY0 D2
|
||||
#define dY1 D3
|
||||
#define dX0r D0
|
||||
#define dX0i D1
|
||||
#define dX1r D2
|
||||
#define dX1i D3
|
||||
#define dW0r D4
|
||||
#define dW0i D5
|
||||
#define dW1r D6
|
||||
#define dW1i D7
|
||||
#define dT0 D8
|
||||
#define dT1 D9
|
||||
#define dT2 D10
|
||||
#define dT3 D11
|
||||
#define qT0 d12
|
||||
#define qT1 d14
|
||||
#define qT2 d16
|
||||
#define qT3 d18
|
||||
#define dY0r D4
|
||||
#define dY0i D5
|
||||
#define dY1r D6
|
||||
#define dY1i D7
|
||||
#define dzero D20
|
||||
|
||||
#define dY2 D4
|
||||
#define dY3 D5
|
||||
#define dW0 D6
|
||||
#define dW1 D7
|
||||
#define dW0Tmp D10
|
||||
#define dW1Neg D11
|
||||
|
||||
#define sN S0.S32
|
||||
#define fN S1
|
||||
@// two must be the same as dScale[0]!
|
||||
#define dScale D2
|
||||
#define two S4
|
||||
|
||||
|
||||
@// Allocate stack memory required by the function
|
||||
M_ALLOC4 complexFFTSize, 4
|
||||
|
||||
@// Write function header
|
||||
M_START omxSP_FFTInv_CCSToR_F32_Sfs_unscaled,r11,d15
|
||||
|
||||
@ Structure offsets for the FFTSpec
|
||||
.set ARMsFFTSpec_N, 0
|
||||
.set ARMsFFTSpec_pBitRev, 4
|
||||
.set ARMsFFTSpec_pTwiddle, 8
|
||||
.set ARMsFFTSpec_pBuf, 12
|
||||
|
||||
@// Define stack arguments
|
||||
|
||||
@// Read the size from structure and take log
|
||||
LDR N, [pFFTSpec, #ARMsFFTSpec_N]
|
||||
|
||||
@// Read other structure parameters
|
||||
LDR pTwiddle, [pFFTSpec, #ARMsFFTSpec_pTwiddle]
|
||||
LDR pOut, [pFFTSpec, #ARMsFFTSpec_pBuf]
|
||||
|
||||
@// N=1 Treat separately
|
||||
CMP N,#1
|
||||
BGT sizeGreaterThanOne
|
||||
VLD1.F32 dX0[0],[pSrc]
|
||||
VST1.F32 dX0[0],[pDst]
|
||||
|
||||
B End
|
||||
|
||||
sizeGreaterThanOne:
|
||||
|
||||
@// Call the preTwiddle Radix2 stage before doing the complex IFFT
|
||||
|
||||
|
||||
BL armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe
|
||||
|
||||
|
||||
complexIFFT:
|
||||
|
||||
ASR N,N,#1 @// N/2 point complex IFFT
|
||||
M_STR N, complexFFTSize @ Save N for scaling later
|
||||
ADD pSrc,pOut,N,LSL #3 @// set pSrc as pOut1
|
||||
|
||||
CLZ order,N @// N = 2^order
|
||||
RSB order,order,#31
|
||||
MOV subFFTSize,#1
|
||||
@//MOV subFFTNum,N
|
||||
|
||||
CMP order,#3
|
||||
BGT orderGreaterthan3 @// order > 3
|
||||
|
||||
CMP order,#1
|
||||
BGE orderGreaterthan0 @// order > 0
|
||||
|
||||
VLD1.F32 dX0,[pSrc]
|
||||
VST1.F32 dX0,[pDst]
|
||||
MOV pSrc,pDst
|
||||
BLT FFTEnd
|
||||
|
||||
orderGreaterthan0:
|
||||
@// set the buffers appropriately for various orders
|
||||
CMP order,#2
|
||||
MOVNE argDst,pDst
|
||||
MOVEQ argDst,pOut
|
||||
@// Pass the first stage destination in RN5
|
||||
MOVEQ pOut,pDst
|
||||
MOV argTwiddle,pTwiddle
|
||||
|
||||
BGE orderGreaterthan1
|
||||
BLLT armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe @// order = 1
|
||||
B FFTEnd
|
||||
|
||||
orderGreaterthan1:
|
||||
MOV tmpOrder,order @// tmpOrder = RN 4
|
||||
BL armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
|
||||
CMP tmpOrder,#2
|
||||
BLGT armSP_FFTInv_CToC_FC32_Radix2_OutOfPlace_unsafe
|
||||
BL armSP_FFTInv_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
|
||||
B FFTEnd
|
||||
|
||||
|
||||
orderGreaterthan3:
|
||||
specialScaleCase:
|
||||
|
||||
@// Set input args to fft stages
|
||||
TST order, #2
|
||||
MOVNE argDst,pDst
|
||||
MOVEQ argDst,pOut
|
||||
@// Pass the first stage destination in RN5
|
||||
MOVEQ pOut,pDst
|
||||
MOV argTwiddle,pTwiddle
|
||||
|
||||
@//check for even or odd order
|
||||
@// NOTE: The following combination of BL's would work fine even though
|
||||
@// the first BL would corrupt the flags. This is because the end of
|
||||
@// the "grpZeroSetLoop" loop inside
|
||||
@// armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe sets the Z flag
|
||||
@// to EQ
|
||||
|
||||
TST order,#0x00000001
|
||||
BLEQ armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
|
||||
BLNE armSP_FFTInv_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
|
||||
|
||||
CMP subFFTNum,#4
|
||||
BLT FFTEnd
|
||||
|
||||
|
||||
unscaledRadix4Loop:
|
||||
BEQ lastStageUnscaledRadix4
|
||||
BL armSP_FFTInv_CToC_FC32_Radix4_OutOfPlace_unsafe
|
||||
CMP subFFTNum,#4
|
||||
B unscaledRadix4Loop
|
||||
|
||||
lastStageUnscaledRadix4:
|
||||
BL armSP_FFTInv_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
|
||||
B FFTEnd
|
||||
|
||||
FFTEnd: @// Does only the scaling
|
||||
@ Scale inverse FFT result by 2 for consistency with other FFTs
|
||||
VMOV.F32 two, #2.0 @ two = dScale[0]
|
||||
|
||||
@// N = subFFTSize ; dataptr = pDst
|
||||
scaleFFTData:
|
||||
VLD1.F32 {dX0},[pSrc] @// pSrc contains pDst pointer
|
||||
SUBS subFFTSize,subFFTSize,#1
|
||||
VMUL.F32 dX0, dX0, dScale[0]
|
||||
VST1.F32 {dX0},[pSrc]!
|
||||
|
||||
BGT scaleFFTData
|
||||
|
||||
|
||||
End:
|
||||
@// Set return value
|
||||
MOV result, #OMX_Sts_NoErr
|
||||
|
||||
@// Write function tail
|
||||
M_END
|
||||
|
||||
|
||||
|
||||
.end
|
@ -104,7 +104,6 @@
|
||||
<li><a href="about:license#jquery">jQuery License</a></li>
|
||||
<li><a href="about:license#k_exp">k_exp License</a></li>
|
||||
<li><a href="about:license#khronos">Khronos group License</a></li>
|
||||
<li><a href="about:license#kiss_fft">Kiss FFT License</a></li>
|
||||
#ifdef MOZ_USE_LIBCXX
|
||||
<li><a href="about:license#libc++">libc++ License</a></li>
|
||||
#endif
|
||||
@ -2041,7 +2040,6 @@ WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
|
||||
<li><code>gfx/ots/</code></li>
|
||||
<li><code>gfx/ycbcr/</code></li>
|
||||
<li><code>ipc/chromium/</code></li>
|
||||
<li><code>media/openmax_dl/</code></li>
|
||||
<li><code>toolkit/components/reputationservice/</code></li>
|
||||
<li><code>toolkit/components/url-classifier/chromium/</code></li>
|
||||
<li><code>tools/profiler/</code></li>
|
||||
@ -3116,80 +3114,6 @@ OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
SUCH DAMAGE.
|
||||
</pre>
|
||||
|
||||
|
||||
<hr>
|
||||
|
||||
<h1><a id="khronos"></a>Khronos group License</h1>
|
||||
|
||||
<p>This license applies to the following files:</p>
|
||||
|
||||
<ul>
|
||||
<li><code>media/openmax_dl/dl/api/omxtypes.h</code></li>
|
||||
<li><code>media/openmax_dl/dl/sp/api/omxSP.h</code></li>
|
||||
</ul>
|
||||
|
||||
<pre>
|
||||
Copyright 2005-2008 The Khronos Group Inc. All Rights Reserved.
|
||||
|
||||
These materials are protected by copyright laws and contain material
|
||||
proprietary to the Khronos Group, Inc. You may use these materials
|
||||
for implementing Khronos specifications, without altering or removing
|
||||
any trademark, copyright or other notice from the specification.
|
||||
|
||||
Khronos Group makes no, and expressly disclaims any, representations
|
||||
or warranties, express or implied, regarding these materials, including,
|
||||
without limitation, any implied warranties of merchantability or fitness
|
||||
for a particular purpose or non-infringement of any intellectual property.
|
||||
Khronos Group makes no, and expressly disclaims any, warranties, express
|
||||
or implied, regarding the correctness, accuracy, completeness, timeliness,
|
||||
and reliability of these materials.
|
||||
|
||||
Under no circumstances will the Khronos Group, or any of its Promoters,
|
||||
Contributors or Members or their respective partners, officers, directors,
|
||||
employees, agents or representatives be liable for any damages, whether
|
||||
direct, indirect, special or consequential damages for lost revenues,
|
||||
lost profits, or otherwise, arising from or in connection with these
|
||||
materials.
|
||||
|
||||
Khronos and OpenMAX are trademarks of the Khronos Group Inc.
|
||||
</pre>
|
||||
|
||||
<hr>
|
||||
|
||||
<h1><a id="kiss_fft"></a>Kiss FFT License</h1>
|
||||
|
||||
<p>This license applies to files in the directory
|
||||
<code>media/kiss_fft/</code>.</p>
|
||||
|
||||
<pre>
|
||||
Copyright (c) 2003-2010 Mark Borgerding
|
||||
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice,
|
||||
this list of conditions and the following disclaimer.
|
||||
* Redistributions in binary form must reproduce the above copyright notice,
|
||||
this list of conditions and the following disclaimer in the documentation
|
||||
and/or other materials provided with the distribution.
|
||||
* Neither the author nor the names of any contributors may be used to
|
||||
endorse or promote products derived from this software without specific
|
||||
prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
||||
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
||||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
||||
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
||||
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
||||
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
||||
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
</pre>
|
||||
|
||||
<hr>
|
||||
|
||||
#ifdef MOZ_USE_LIBCXX
|
||||
|