Bug 1879873 - Remove kiss fft and openmax dl. r=karlt,sylvestre

Differential Revision: https://phabricator.services.mozilla.com/D201600
Paul Adenot 2024-02-28 12:50:26 +00:00
parent 26f74e078a
commit d309b4eb34
39 changed files with 0 additions and 13093 deletions

@@ -49,9 +49,6 @@ if not CONFIG["MOZ_SYSTEM_PNG"]:
if not CONFIG["MOZ_SYSTEM_WEBP"]:
    external_dirs += ["media/libwebp"]
if CONFIG["TARGET_CPU"] == "arm":
    external_dirs += ["media/openmax_dl/dl"]
if CONFIG["MOZ_FFVPX"]:
    external_dirs += ["media/ffvpx"]
@@ -59,7 +56,6 @@ if CONFIG["MOZ_JXL"]:
    external_dirs += ["media/libjxl", "media/highway"]
external_dirs += [
    "media/kiss_fft",
    "media/libcubeb",
    "media/libmkv",
    "media/libnestegg",

@@ -130,8 +130,6 @@ if CONFIG["TARGET_CPU"] == "aarch64" or CONFIG["BUILD_ARM_NEON"]:
    LOCAL_INCLUDES += ["/third_party/xsimd/include"]
    SOURCES += ["AudioNodeEngineNEON.cpp"]
    SOURCES["AudioNodeEngineNEON.cpp"].flags += CONFIG["NEON_FLAGS"]
    if CONFIG["BUILD_ARM_NEON"]:
        LOCAL_INCLUDES += ["/media/openmax_dl/dl/api/"]
# Are we targeting x86 or x64? If so, build SSEX files.
if CONFIG["INTEL_ARCHITECTURE"]:

@@ -1,123 +0,0 @@
1.3.0 2012-07-18
removed non-standard malloc.h from kiss_fft.h
moved -lm to end of link line
checked various return values
converted python Numeric code to NumPy
fixed test of int32_t on 64 bit OS
added padding in a couple of places to allow SIMD alignment of structs
1.2.9 2010-05-27
threadsafe ( including OpenMP )
first edition of kissfft.hh the C++ template fft engine
1.2.8
Changed memory.h to string.h -- apparently more standard
Added openmp extensions. This can have fairly linear speedups for larger FFT sizes.
1.2.7
Shrank the real-fft memory footprint. Thanks to Galen Seitz.
1.2.6 (Nov 14, 2006) The "thanks to GenArts" release.
Added multi-dimensional real-optimized FFT, see tools/kiss_fftndr
Thanks go to GenArts, Inc. for sponsoring the development.
1.2.5 (June 27, 2006) The "release for no good reason" release.
Changed some harmless code to make some compilers' warnings go away.
Added some more digits to pi -- why not.
Added kiss_fft_next_fast_size() function to help people decide how much to pad.
Changed multidimensional test from 8 dimensions to only 3 to avoid testing
problems with fixed point (sorry Buckaroo Banzai).
1.2.4 (Oct 27, 2005) The "oops, inverse fixed point real fft was borked" release.
Fixed scaling bug for inverse fixed point real fft -- also fixed test code that should've been failing.
Thanks to Jean-Marc Valin for bug report.
Use sys/types.h for more portable types than short,int,long => int16_t,int32_t,int64_t
If your system does not have these, you may need to define them -- but at least it breaks in a
loud and easily fixable way -- unlike silently using the wrong size type.
Hopefully tools/psdpng.c is fixed -- thanks to Steve Kellog for pointing out the weirdness.
1.2.3 (June 25, 2005) The "you want to use WHAT as a sample" release.
Added ability to use 32 bit fixed point samples -- requires a 64 bit intermediate result, a la 'long long'
Added ability to do 4 FFTs in parallel by using SSE SIMD instructions. This is accomplished by
using the __m128 (vector of 4 floats) as kiss_fft_scalar. Define USE_SIMD to use this.
I know, I know ... this is drifting a bit from the "kiss" principle, but the speed advantages
make it worth it for some. Also recent gcc makes it SOO easy to use vectors of 4 floats like a POD type.
1.2.2 (May 6, 2005) The Matthew release
Replaced fixed point division with multiply&shift. Thanks to Jean-Marc Valin for
discussions regarding. Considerable speedup for fixed-point.
Corrected overflow protection in real fft routines when using fixed point.
Finder's Credit goes to Robert Oschler of robodance for pointing me at the bug.
This also led to the CHECK_OVERFLOW_OP macro.
1.2.1 (April 4, 2004)
compiles cleanly with just about every -W warning flag under the sun
reorganized kiss_fft_state so it could be read-only/const. This may be useful for embedded systems
that are willing to predeclare twiddle factors, factorization.
Fixed C_MUL,S_MUL on 16-bit platforms.
tmpbuf will only be allocated if input & output buffers are same
scratchbuf will only be allocated for ffts that are not multiples of 2,3,5
NOTE: The tmpbuf,scratchbuf changes may require synchronization code for multi-threaded apps.
1.2 (Feb 23, 2004)
interface change -- cfg object is forward declaration of struct instead of void*
This maintains type safety and lets the compiler warn/error about stupid mistakes.
(prompted by suggestion from Erik de Castro Lopo)
small speed improvements
added psdpng.c -- sample utility that will create png spectrum "waterfalls" from an input file
( not terribly useful yet)
1.1.1 (Feb 1, 2004 )
minor bug fix -- only affects odd rank, in-place, multi-dimensional FFTs
1.1 : (Jan 30,2004)
split sample_code/ into test/ and tools/
Removed 2-D fft and added N-D fft (arbitrary)
modified fftutil.c to allow multi-d FFTs
Modified core fft routine to allow an input stride via kiss_fft_stride()
(eased support of multi-D ffts)
Added fast convolution filtering (FIR filtering using overlap-scrap method, with tail scrap)
Add kfc.[ch]: the KISS FFT Cache. It takes care of allocs for you ( suggested by Oscar Lesta ).
1.0.1 (Dec 15, 2003)
fixed bug that occurred when nfft==1. Thanks to Steven Johnson.
1.0 : (Dec 14, 2003)
changed kiss_fft function from using a single buffer, to two buffers.
If the same buffer pointer is supplied for both in and out, kiss will
manage the buffer copies.
added kiss_fft2d and kiss_fftr as separate source files (declarations in kiss_fft.h )
0.4 :(Nov 4,2003) optimized for radix 2,3,4,5
0.3 :(Oct 28, 2003) woops, version 2 didn't actually factor out any radices other than 2.
Thanks to Steven Johnson for finding this one.
0.2 :(Oct 27, 2003) added mixed radix, only radix 2,4 optimized versions
0.1 :(May 19 2003) initial release, radix 2 only

@@ -1,11 +0,0 @@
Copyright (c) 2003-2010 Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@@ -1,134 +0,0 @@
KISS FFT - A mixed-radix Fast Fourier Transform based on the principle,
"Keep It Simple, Stupid."
There are many great fft libraries already around. Kiss FFT is not trying
to be better than any of them. It only attempts to be a reasonably efficient,
moderately useful FFT that can use fixed or floating data types and can be
incorporated into someone's C program in a few minutes with trivial licensing.
USAGE:
The basic usage for 1-d complex FFT is:
    #include "kiss_fft.h"
    kiss_fft_cfg cfg = kiss_fft_alloc( nfft ,is_inverse_fft ,0,0 );
    while ...
        ... // put kth sample in cx_in[k].r and cx_in[k].i
        kiss_fft( cfg , cx_in , cx_out );
        ... // transformed. DC is in cx_out[0].r and cx_out[0].i
    free(cfg);
Note: frequency-domain data is stored from DC up to 2pi,
so cx_out[0] is the DC bin of the FFT
and cx_out[nfft/2] is the Nyquist bin (if it exists, i.e. when nfft is even).
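As a rough illustration of the bin layout described in the note above, the following self-contained C sketch computes an unnormalized 4-point DFT using a hard-coded twiddle table. It does not call kiss_fft; the names cpx and dft4 are made up for this example, and the struct only mimics the r/i field layout of kiss_fft_cpx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch (same r/i fields as kiss_fft_cpx). */
typedef struct { double r; double i; } cpx;

/* Unnormalized forward 4-point DFT: out[k] = sum_j in[j] * exp(-2*pi*i*j*k/4).
 * For n = 4 the twiddles are exactly 1, -i, -1, +i, so no trig calls are needed. */
void dft4(const cpx in[4], cpx out[4])
{
    static const cpx w[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

    assert(in != out); /* out-of-place only: writing into the input would corrupt it */

    for (size_t k = 0; k < 4; ++k) {
        out[k].r = 0.0;
        out[k].i = 0.0;
        for (size_t j = 0; j < 4; ++j) {
            const cpx t = w[(j * k) % 4]; /* twiddle for this (j, k) pair */
            out[k].r += in[j].r * t.r - in[j].i * t.i;
            out[k].i += in[j].r * t.i + in[j].i * t.r;
        }
    }
}
```

Feeding a constant signal {1,1,1,1} puts all the energy in out[0] (DC), while the alternating signal {1,-1,1,-1} puts it all in out[2], which is the nfft/2 (Nyquist) bin for nfft=4.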
Declarations are in "kiss_fft.h", along with a brief description of the
functions you'll need to use.
Code definitions for 1d complex FFTs are in kiss_fft.c.
You can do other cool stuff with the extras you'll find in tools/
* multi-dimensional FFTs
* real-optimized FFTs (returns the positive half-spectrum: (nfft/2+1) complex frequency bins)
* fast convolution FIR filtering (not available for fixed point)
* spectrum image creation
The core fft and most tools/ code can be compiled to use float, double,
Q15 short or Q31 samples. The default is float.
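For readers unfamiliar with Q15, here is a minimal sketch of a rounded Q15 multiply, in the spirit of the fixed-point S_MUL macro that ships with kiss_fft. The function name q15_mul and the Q15_FRACBITS macro are assumptions made for this example, not part of the kiss_fft API.

```c
#include <assert.h>
#include <stdint.h>

#define Q15_FRACBITS 15 /* Q15: 1 sign bit, 15 fractional bits */

/* Multiply two Q15 values and round to the nearest Q15 result.
 * The product is formed in 32 bits, half an LSB is added, and the
 * result is shifted back down by 15 bits.
 * Note: right-shifting a negative value is implementation-defined in C;
 * common compilers use an arithmetic shift, which this sketch assumes. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    int32_t rounded = (prod + (1 << (Q15_FRACBITS - 1))) >> Q15_FRACBITS;

    /* -32768 * -32768 would round to +32768, which does not fit in Q15. */
    assert(rounded >= INT16_MIN && rounded <= INT16_MAX);
    return (int16_t)rounded;
}
```

With this convention 16384 represents 0.5, so q15_mul(16384, 16384) yields 8192, i.e. 0.25.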
BACKGROUND:
I started coding this because I couldn't find a fixed point FFT that didn't
use assembly code. I started with floating point numbers so I could get the
theory straight before working on fixed point issues. In the end, I had a
little bit of code that could be recompiled easily to do ffts with short, float
or double (other types should be easy too).
Once I got my FFT working, I was curious about the speed compared to
a well respected and highly optimized fft library. I don't want to criticize
this great library, so let's call it FFT_BRANDX.
During this process, I learned:
1. FFT_BRANDX has more than 100K lines of code. The core of kiss_fft is about 500 lines (cpx 1-d).
2. It took me an embarrassingly long time to get FFT_BRANDX working.
3. A simple program using FFT_BRANDX is 522KB. A similar program using kiss_fft is 18KB (without optimizing for size).
4. FFT_BRANDX is roughly twice as fast as KISS FFT in default mode.
It is wonderful that free, highly optimized libraries like FFT_BRANDX exist.
But such libraries carry a huge burden of complexity necessary to extract every
last bit of performance.
Sometimes simpler is better, even if it's not better.
FREQUENTLY ASKED QUESTIONS:
Q: Can I use kissfft in a project with a ___ license?
A: Yes. See LICENSE below.
Q: Why don't I get the output I expect?
A: The two most common causes of this are
1) scaling : is there a constant multiplier between what you got and what you want?
2) mixed build environment -- all code must be compiled with same preprocessor
definitions for FIXED_POINT and kiss_fft_scalar
Q: Will you write/debug my code for me?
A: Probably not unless you pay me. I am happy to answer pointed and topical questions, but
I may refer you to a book, a forum, or some other resource.
PERFORMANCE:
(on Athlon XP 2100+, with gcc 2.96, float data type)
Kiss performed 10000 1024-pt cpx ffts in .63 s of cpu time.
For comparison, it took md5sum twice as long to process the same amount of data.
Transforming 5 minutes of CD quality audio takes less than a second (nfft=1024).
DO NOT:
... use Kiss if you need the Fastest Fourier Transform in the World
... ask me to add features that will bloat the code
UNDER THE HOOD:
Kiss FFT uses a time decimation, mixed-radix, out-of-place FFT. If you give it an input buffer
and output buffer that are the same, a temporary buffer will be created to hold the data.
No static data is used. The core routines of kiss_fft are thread-safe (but not all of the tools directory).
No scaling is done for the floating point version (for speed).
Scaling is done both ways for the fixed-point version (for overflow prevention).
Optimized butterflies are used for factors 2,3,4, and 5.
The real (i.e. not complex) optimization code only works for even length ffts. It does two half-length
FFTs in parallel (packed into real&imag), and then combines them via twiddling. The result is
nfft/2+1 complex frequency bins from DC to Nyquist. If you don't know what this means, search the web.
The fast convolution filtering uses the overlap-scrap method, slightly
modified to put the scrap at the tail.
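Which butterfly runs at each stage depends on how the transform length is factored. The sketch below is a standalone rewrite modeled on the kf_factor routine in kiss_fft.c: it pulls out factors of 4 first, then 2, then odd candidates. The name factor_radices, its signature, and the integer p*p bound (used here in place of floor(sqrt(n))) are choices made for this example only.

```c
#include <assert.h>

/* Factor n into the sequence of radices a mixed-radix FFT would use:
 * as many 4s as possible, then 2s, then odd factors 3, 5, 7, ...
 * Writes the radices to radix[] and returns how many were written. */
int factor_radices(int n, int *radix, int max_stages)
{
    const int n0 = n; /* original length, used for the search bound */
    int p = 4;
    int count = 0;

    assert(n > 0 && radix && max_stages > 0);

    do {
        while (n % p) {
            switch (p) {
                case 4:  p = 2;  break;
                case 2:  p = 3;  break;
                default: p += 2; break;
            }
            if ((long long)p * p > n0)
                p = n; /* no smaller factor is left: n itself is the last radix */
        }
        n /= p;
        assert(count < max_stages);
        radix[count++] = p;
    } while (n > 1);

    return count;
}
```

For n = 128 this produces the four stages 4, 4, 4, 2, and for n = 30 it produces 2, 3, 5. A prime length such as 7 yields a single stage of radix 7, which is the case that would fall through to the generic butterfly.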
LICENSE:
Revised BSD License, see COPYING for verbiage.
Basically, "free to use&change, give credit where due, no guarantees"
Note this license is compatible with GPL at one end of the spectrum and closed, commercial software at
the other end. See http://www.fsf.org/licensing/licenses
A commercial license is available which removes the requirement for attribution. Contact me for details.
TODO:
*) Add real optimization for odd length FFTs
*) Document/revisit the input/output fft scaling
*) Make doc describing the overlap (tail) scrap fast convolution filtering in kiss_fastfir.c
*) Test all the ./tools/ code with fixed point (kiss_fastfir.c doesn't work, maybe others)
AUTHOR:
Mark Borgerding
Mark@Borgerding.net

@@ -1,78 +0,0 @@
If you are reading this, it means you think you may be interested in using the SIMD extensions in kissfft
to do 4 *separate* FFTs at once.
Beware! Beyond here there be dragons!
This API is not easy to use, is not well documented, and breaks the KISS principle.
Still reading? Okay, you may get rewarded for your patience with a considerable speedup
(2-3x) on intel x86 machines with SSE if you are willing to jump through some hoops.
The basic idea is to use the packed 4 float __m128 data type as a scalar element.
This means that the format is pretty convoluted. It performs 4 FFTs per fft call on signals A,B,C,D.
For complex data, the data is interlaced as follows:
rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1 ...
where "rA0" is the real part of the zeroth sample for signal A
Real-only data is laid out:
rA0,rB0,rC0,rD0, rA1,rB1,rC1,rD1, ...
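To make the complex layout above concrete, here is a small plain-C sketch (no SSE intrinsics) that interleaves four complex signals into that order. The function name pack4_cpx, the LANES macro, and the planar input convention (signal s, sample k stored at index s*n + k) are assumptions for illustration; the sketch also ignores the 16-byte alignment a real __m128 buffer would need.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* signals A, B, C, D processed side by side */

/* Interleave four complex signals into the order
 *   rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,...
 * re and im are planar: signal s, sample k lives at index s*n + k.
 * packed must have room for 2 * LANES * n floats. */
void pack4_cpx(const float *re, const float *im, float *packed, size_t n)
{
    assert(re && im && packed);

    for (size_t k = 0; k < n; ++k) {
        for (size_t s = 0; s < LANES; ++s) {
            packed[(2 * k) * LANES + s]     = re[s * n + k]; /* real parts of sample k */
            packed[(2 * k + 1) * LANES + s] = im[s * n + k]; /* imag parts of sample k */
        }
    }
}
```

Each group of four consecutive floats in packed corresponds to one __m128 value, so sample k of all four signals occupies two such vectors: one holding the real parts and one holding the imaginary parts.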
Compile with gcc flags something like
-O3 -mpreferred-stack-boundary=4 -DUSE_SIMD=1 -msse
Be aware of SIMD alignment. This is the most likely cause of segfaults.
The code within kissfft uses scratch variables on the stack.
With SIMD, these must have addresses on 16 byte boundaries.
Search on "SIMD alignment" for more info.
Robin at Divide Concept was kind enough to share his code for formatting to/from the SIMD kissfft.
I have not run it -- use it at your own risk. It appears to do 4xN and Nx4 transpositions
(out of place).
void SSETools::pack128(float* target, float* source, unsigned long size128)
{
    __m128* pDest = (__m128*)target;
    __m128* pDestEnd = pDest+size128;
    float* source0=source;
    float* source1=source0+size128;
    float* source2=source1+size128;
    float* source3=source2+size128;
    while(pDest<pDestEnd)
    {
        *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
        source0++;
        source1++;
        source2++;
        source3++;
        pDest++;
    }
}
void SSETools::unpack128(float* target, float* source, unsigned long size128)
{
    float* pSrc = source;
    float* pSrcEnd = pSrc+size128*4;
    float* target0=target;
    float* target1=target0+size128;
    float* target2=target1+size128;
    float* target3=target2+size128;
    while(pSrc<pSrcEnd)
    {
        *target0=pSrc[0];
        *target1=pSrc[1];
        *target2=pSrc[2];
        *target3=pSrc[3];
        target0++;
        target1++;
        target2++;
        target3++;
        pSrc+=4;
    }
}

@@ -1,164 +0,0 @@
/*
Copyright (c) 2003-2010, Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/* kiss_fft.h
defines kiss_fft_scalar as either short or a float type
and defines
typedef struct { kiss_fft_scalar r; kiss_fft_scalar i; }kiss_fft_cpx; */
#include "kiss_fft.h"
#include <limits.h>
#define MAXFACTORS 32
/* e.g. an fft of length 128 has 4 factors
as far as kissfft is concerned
4*4*4*2
*/
struct kiss_fft_state{
int nfft;
int inverse;
int factors[2*MAXFACTORS];
kiss_fft_cpx twiddles[1];
};
/*
Explanation of macros dealing with complex math:
C_MUL(m,a,b) : m = a*b
C_FIXDIV( c , div ) : if a fixed point impl., c /= div. noop otherwise
C_SUB( res, a,b) : res = a - b
C_SUBFROM( res , a) : res -= a
C_ADDTO( res , a) : res += a
* */
#ifdef FIXED_POINT
#if (FIXED_POINT==32)
# define FRACBITS 31
# define SAMPPROD int64_t
#define SAMP_MAX 2147483647
#else
# define FRACBITS 15
# define SAMPPROD int32_t
#define SAMP_MAX 32767
#endif
#define SAMP_MIN -SAMP_MAX
#if defined(CHECK_OVERFLOW)
# define CHECK_OVERFLOW_OP(a,op,b) \
if ( (SAMPPROD)(a) op (SAMPPROD)(b) > SAMP_MAX || (SAMPPROD)(a) op (SAMPPROD)(b) < SAMP_MIN ) { \
fprintf(stderr,"WARNING:overflow @ " __FILE__ "(%d): (%d " #op" %d) = %ld\n",__LINE__,(a),(b),(SAMPPROD)(a) op (SAMPPROD)(b) ); }
#endif
# define smul(a,b) ( (SAMPPROD)(a)*(b) )
# define sround( x ) (kiss_fft_scalar)( ( (x) + (1<<(FRACBITS-1)) ) >> FRACBITS )
# define S_MUL(a,b) sround( smul(a,b) )
# define C_MUL(m,a,b) \
do{ (m).r = sround( smul((a).r,(b).r) - smul((a).i,(b).i) ); \
(m).i = sround( smul((a).r,(b).i) + smul((a).i,(b).r) ); }while(0)
# define DIVSCALAR(x,k) \
(x) = sround( smul( x, SAMP_MAX/k ) )
# define C_FIXDIV(c,div) \
do { DIVSCALAR( (c).r , div); \
DIVSCALAR( (c).i , div); }while (0)
# define C_MULBYSCALAR( c, s ) \
do{ (c).r = sround( smul( (c).r , s ) ) ;\
(c).i = sround( smul( (c).i , s ) ) ; }while(0)
#else /* not FIXED_POINT*/
# define S_MUL(a,b) ( (a)*(b) )
#define C_MUL(m,a,b) \
do{ (m).r = (a).r*(b).r - (a).i*(b).i;\
(m).i = (a).r*(b).i + (a).i*(b).r; }while(0)
# define C_FIXDIV(c,div) /* NOOP */
# define C_MULBYSCALAR( c, s ) \
do{ (c).r *= (s);\
(c).i *= (s); }while(0)
#endif
#ifndef CHECK_OVERFLOW_OP
# define CHECK_OVERFLOW_OP(a,op,b) /* noop */
#endif
#define C_ADD( res, a,b)\
do { \
CHECK_OVERFLOW_OP((a).r,+,(b).r)\
CHECK_OVERFLOW_OP((a).i,+,(b).i)\
(res).r=(a).r+(b).r; (res).i=(a).i+(b).i; \
}while(0)
#define C_SUB( res, a,b)\
do { \
CHECK_OVERFLOW_OP((a).r,-,(b).r)\
CHECK_OVERFLOW_OP((a).i,-,(b).i)\
(res).r=(a).r-(b).r; (res).i=(a).i-(b).i; \
}while(0)
#define C_ADDTO( res , a)\
do { \
CHECK_OVERFLOW_OP((res).r,+,(a).r)\
CHECK_OVERFLOW_OP((res).i,+,(a).i)\
(res).r += (a).r; (res).i += (a).i;\
}while(0)
#define C_SUBFROM( res , a)\
do {\
CHECK_OVERFLOW_OP((res).r,-,(a).r)\
CHECK_OVERFLOW_OP((res).i,-,(a).i)\
(res).r -= (a).r; (res).i -= (a).i; \
}while(0)
#ifdef FIXED_POINT
# define KISS_FFT_COS(phase) floor(.5+SAMP_MAX * cos (phase))
# define KISS_FFT_SIN(phase) floor(.5+SAMP_MAX * sin (phase))
# define HALF_OF(x) ((x)>>1)
#elif defined(USE_SIMD)
# define KISS_FFT_COS(phase) _mm_set1_ps( cos(phase) )
# define KISS_FFT_SIN(phase) _mm_set1_ps( sin(phase) )
# define HALF_OF(x) ((x)*_mm_set1_ps(.5))
#else
# define KISS_FFT_COS(phase) (kiss_fft_scalar) cos(phase)
# define KISS_FFT_SIN(phase) (kiss_fft_scalar) sin(phase)
# define HALF_OF(x) ((x)*.5)
#endif
#define kf_cexp(x,phase) \
do{ \
(x)->r = KISS_FFT_COS(phase);\
(x)->i = KISS_FFT_SIN(phase);\
}while(0)
/* a debugging function */
#define pcpx(c)\
fprintf(stderr,"%g + %gi\n",(double)((c)->r),(double)((c)->i) )
#ifdef KISS_FFT_USE_ALLOCA
// define this to allow use of alloca instead of malloc for temporary buffers
// Temporary buffers are used in two cases:
// 1. FFT sizes that have "bad" factors. i.e. not 2,3 and 5
// 2. "in-place" FFTs. Notice the quotes, since kissfft does not really do an in-place transform.
#include <alloca.h>
#define KISS_FFT_TMP_ALLOC(nbytes) alloca(nbytes)
#define KISS_FFT_TMP_FREE(ptr)
#else
#define KISS_FFT_TMP_ALLOC(nbytes) KISS_FFT_MALLOC(nbytes)
#define KISS_FFT_TMP_FREE(ptr) KISS_FFT_FREE(ptr)
#endif

@@ -1,408 +0,0 @@
/*
Copyright (c) 2003-2010, Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "_kiss_fft_guts.h"
/* The guts header contains all the multiplication and addition macros that are defined for
fixed or floating point complex numbers. It also declares the kf_ internal functions.
*/
static void kf_bfly2(
kiss_fft_cpx * Fout,
const size_t fstride,
const kiss_fft_cfg st,
int m
)
{
kiss_fft_cpx * Fout2;
kiss_fft_cpx * tw1 = st->twiddles;
kiss_fft_cpx t;
Fout2 = Fout + m;
do{
C_FIXDIV(*Fout,2); C_FIXDIV(*Fout2,2);
C_MUL (t, *Fout2 , *tw1);
tw1 += fstride;
C_SUB( *Fout2 , *Fout , t );
C_ADDTO( *Fout , t );
++Fout2;
++Fout;
}while (--m);
}
static void kf_bfly4(
kiss_fft_cpx * Fout,
const size_t fstride,
const kiss_fft_cfg st,
const size_t m
)
{
kiss_fft_cpx *tw1,*tw2,*tw3;
kiss_fft_cpx scratch[6];
size_t k=m;
const size_t m2=2*m;
const size_t m3=3*m;
tw3 = tw2 = tw1 = st->twiddles;
do {
C_FIXDIV(*Fout,4); C_FIXDIV(Fout[m],4); C_FIXDIV(Fout[m2],4); C_FIXDIV(Fout[m3],4);
C_MUL(scratch[0],Fout[m] , *tw1 );
C_MUL(scratch[1],Fout[m2] , *tw2 );
C_MUL(scratch[2],Fout[m3] , *tw3 );
C_SUB( scratch[5] , *Fout, scratch[1] );
C_ADDTO(*Fout, scratch[1]);
C_ADD( scratch[3] , scratch[0] , scratch[2] );
C_SUB( scratch[4] , scratch[0] , scratch[2] );
C_SUB( Fout[m2], *Fout, scratch[3] );
tw1 += fstride;
tw2 += fstride*2;
tw3 += fstride*3;
C_ADDTO( *Fout , scratch[3] );
if(st->inverse) {
Fout[m].r = scratch[5].r - scratch[4].i;
Fout[m].i = scratch[5].i + scratch[4].r;
Fout[m3].r = scratch[5].r + scratch[4].i;
Fout[m3].i = scratch[5].i - scratch[4].r;
}else{
Fout[m].r = scratch[5].r + scratch[4].i;
Fout[m].i = scratch[5].i - scratch[4].r;
Fout[m3].r = scratch[5].r - scratch[4].i;
Fout[m3].i = scratch[5].i + scratch[4].r;
}
++Fout;
}while(--k);
}
static void kf_bfly3(
kiss_fft_cpx * Fout,
const size_t fstride,
const kiss_fft_cfg st,
size_t m
)
{
size_t k=m;
const size_t m2 = 2*m;
kiss_fft_cpx *tw1,*tw2;
kiss_fft_cpx scratch[5];
kiss_fft_cpx epi3;
epi3 = st->twiddles[fstride*m];
tw1=tw2=st->twiddles;
do{
C_FIXDIV(*Fout,3); C_FIXDIV(Fout[m],3); C_FIXDIV(Fout[m2],3);
C_MUL(scratch[1],Fout[m] , *tw1);
C_MUL(scratch[2],Fout[m2] , *tw2);
C_ADD(scratch[3],scratch[1],scratch[2]);
C_SUB(scratch[0],scratch[1],scratch[2]);
tw1 += fstride;
tw2 += fstride*2;
Fout[m].r = Fout->r - HALF_OF(scratch[3].r);
Fout[m].i = Fout->i - HALF_OF(scratch[3].i);
C_MULBYSCALAR( scratch[0] , epi3.i );
C_ADDTO(*Fout,scratch[3]);
Fout[m2].r = Fout[m].r + scratch[0].i;
Fout[m2].i = Fout[m].i - scratch[0].r;
Fout[m].r -= scratch[0].i;
Fout[m].i += scratch[0].r;
++Fout;
}while(--k);
}
static void kf_bfly5(
kiss_fft_cpx * Fout,
const size_t fstride,
const kiss_fft_cfg st,
int m
)
{
kiss_fft_cpx *Fout0,*Fout1,*Fout2,*Fout3,*Fout4;
int u;
kiss_fft_cpx scratch[13];
kiss_fft_cpx * twiddles = st->twiddles;
kiss_fft_cpx *tw;
kiss_fft_cpx ya,yb;
ya = twiddles[fstride*m];
yb = twiddles[fstride*2*m];
Fout0=Fout;
Fout1=Fout0+m;
Fout2=Fout0+2*m;
Fout3=Fout0+3*m;
Fout4=Fout0+4*m;
tw=st->twiddles;
for ( u=0; u<m; ++u ) {
C_FIXDIV( *Fout0,5); C_FIXDIV( *Fout1,5); C_FIXDIV( *Fout2,5); C_FIXDIV( *Fout3,5); C_FIXDIV( *Fout4,5);
scratch[0] = *Fout0;
C_MUL(scratch[1] ,*Fout1, tw[u*fstride]);
C_MUL(scratch[2] ,*Fout2, tw[2*u*fstride]);
C_MUL(scratch[3] ,*Fout3, tw[3*u*fstride]);
C_MUL(scratch[4] ,*Fout4, tw[4*u*fstride]);
C_ADD( scratch[7],scratch[1],scratch[4]);
C_SUB( scratch[10],scratch[1],scratch[4]);
C_ADD( scratch[8],scratch[2],scratch[3]);
C_SUB( scratch[9],scratch[2],scratch[3]);
Fout0->r += scratch[7].r + scratch[8].r;
Fout0->i += scratch[7].i + scratch[8].i;
scratch[5].r = scratch[0].r + S_MUL(scratch[7].r,ya.r) + S_MUL(scratch[8].r,yb.r);
scratch[5].i = scratch[0].i + S_MUL(scratch[7].i,ya.r) + S_MUL(scratch[8].i,yb.r);
scratch[6].r = S_MUL(scratch[10].i,ya.i) + S_MUL(scratch[9].i,yb.i);
scratch[6].i = -S_MUL(scratch[10].r,ya.i) - S_MUL(scratch[9].r,yb.i);
C_SUB(*Fout1,scratch[5],scratch[6]);
C_ADD(*Fout4,scratch[5],scratch[6]);
scratch[11].r = scratch[0].r + S_MUL(scratch[7].r,yb.r) + S_MUL(scratch[8].r,ya.r);
scratch[11].i = scratch[0].i + S_MUL(scratch[7].i,yb.r) + S_MUL(scratch[8].i,ya.r);
scratch[12].r = - S_MUL(scratch[10].i,yb.i) + S_MUL(scratch[9].i,ya.i);
scratch[12].i = S_MUL(scratch[10].r,yb.i) - S_MUL(scratch[9].r,ya.i);
C_ADD(*Fout2,scratch[11],scratch[12]);
C_SUB(*Fout3,scratch[11],scratch[12]);
++Fout0;++Fout1;++Fout2;++Fout3;++Fout4;
}
}
/* perform the butterfly for one stage of a mixed radix FFT */
static void kf_bfly_generic(
kiss_fft_cpx * Fout,
const size_t fstride,
const kiss_fft_cfg st,
int m,
int p
)
{
int u,k,q1,q;
kiss_fft_cpx * twiddles = st->twiddles;
kiss_fft_cpx t;
int Norig = st->nfft;
kiss_fft_cpx * scratch = (kiss_fft_cpx*)KISS_FFT_TMP_ALLOC(sizeof(kiss_fft_cpx)*p);
for ( u=0; u<m; ++u ) {
k=u;
for ( q1=0 ; q1<p ; ++q1 ) {
scratch[q1] = Fout[ k ];
C_FIXDIV(scratch[q1],p);
k += m;
}
k=u;
for ( q1=0 ; q1<p ; ++q1 ) {
int twidx=0;
Fout[ k ] = scratch[0];
for (q=1;q<p;++q ) {
twidx += fstride * k;
if (twidx>=Norig) twidx-=Norig;
C_MUL(t,scratch[q] , twiddles[twidx] );
C_ADDTO( Fout[ k ] ,t);
}
k += m;
}
}
KISS_FFT_TMP_FREE(scratch);
}
static
void kf_work(
kiss_fft_cpx * Fout,
const kiss_fft_cpx * f,
const size_t fstride,
int in_stride,
int * factors,
const kiss_fft_cfg st
)
{
kiss_fft_cpx * Fout_beg=Fout;
const int p=*factors++; /* the radix */
const int m=*factors++; /* stage's fft length/p */
const kiss_fft_cpx * Fout_end = Fout + p*m;
#ifdef _OPENMP
// use openmp extensions at the
// top-level (not recursive)
if (fstride==1 && p<=5)
{
int k;
// execute the p different work units in different threads
# pragma omp parallel for
for (k=0;k<p;++k)
kf_work( Fout +k*m, f+ fstride*in_stride*k,fstride*p,in_stride,factors,st);
// all threads have joined by this point
switch (p) {
case 2: kf_bfly2(Fout,fstride,st,m); break;
case 3: kf_bfly3(Fout,fstride,st,m); break;
case 4: kf_bfly4(Fout,fstride,st,m); break;
case 5: kf_bfly5(Fout,fstride,st,m); break;
default: kf_bfly_generic(Fout,fstride,st,m,p); break;
}
return;
}
#endif
if (m==1) {
do{
*Fout = *f;
f += fstride*in_stride;
}while(++Fout != Fout_end );
}else{
do{
// recursive call:
// DFT of size m*p performed by doing
// p instances of smaller DFTs of size m,
// each one takes a decimated version of the input
kf_work( Fout , f, fstride*p, in_stride, factors,st);
f += fstride*in_stride;
}while( (Fout += m) != Fout_end );
}
Fout=Fout_beg;
// recombine the p smaller DFTs
switch (p) {
case 2: kf_bfly2(Fout,fstride,st,m); break;
case 3: kf_bfly3(Fout,fstride,st,m); break;
case 4: kf_bfly4(Fout,fstride,st,m); break;
case 5: kf_bfly5(Fout,fstride,st,m); break;
default: kf_bfly_generic(Fout,fstride,st,m,p); break;
}
}
/* facbuf is populated by p1,m1,p2,m2, ...
where
p[i] * m[i] = m[i-1]
m0 = n */
static
void kf_factor(int n,int * facbuf)
{
int p=4;
double floor_sqrt;
floor_sqrt = floor( sqrt((double)n) );
/*factor out powers of 4, powers of 2, then any remaining primes */
do {
while (n % p) {
switch (p) {
case 4: p = 2; break;
case 2: p = 3; break;
default: p += 2; break;
}
if (p > floor_sqrt)
p = n; /* no more factors, skip to end */
}
n /= p;
*facbuf++ = p;
*facbuf++ = n;
} while (n > 1);
}
/*
*
* User-callable function to allocate all necessary storage space for the fft.
*
* The return value is a contiguous block of memory, allocated with malloc. As such,
* It can be freed with free(), rather than a kiss_fft-specific function.
* */
kiss_fft_cfg kiss_fft_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem )
{
kiss_fft_cfg st=NULL;
size_t memneeded = sizeof(struct kiss_fft_state)
+ sizeof(kiss_fft_cpx)*(nfft-1); /* twiddle factors*/
if ( lenmem==NULL ) {
st = ( kiss_fft_cfg)KISS_FFT_MALLOC( memneeded );
}else{
if (mem != NULL && *lenmem >= memneeded)
st = (kiss_fft_cfg)mem;
*lenmem = memneeded;
}
if (st) {
int i;
st->nfft=nfft;
st->inverse = inverse_fft;
for (i=0;i<nfft;++i) {
const double pi=3.141592653589793238462643383279502884197169399375105820974944;
double phase = -2*pi*i / nfft;
if (st->inverse)
phase *= -1;
kf_cexp(st->twiddles+i, phase );
}
kf_factor(nfft,st->factors);
}
return st;
}
void kiss_fft_stride(kiss_fft_cfg st,const kiss_fft_cpx *fin,kiss_fft_cpx *fout,int in_stride)
{
if (fin == fout) {
//NOTE: this is not really an in-place FFT algorithm.
//It just performs an out-of-place FFT into a temp buffer
kiss_fft_cpx * tmpbuf = (kiss_fft_cpx*)KISS_FFT_TMP_ALLOC( sizeof(kiss_fft_cpx)*st->nfft);
kf_work(tmpbuf,fin,1,in_stride, st->factors,st);
memcpy(fout,tmpbuf,sizeof(kiss_fft_cpx)*st->nfft);
KISS_FFT_TMP_FREE(tmpbuf);
}else{
kf_work( fout, fin, 1,in_stride, st->factors,st );
}
}
void kiss_fft(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout)
{
kiss_fft_stride(cfg,fin,fout,1);
}
void kiss_fft_cleanup(void)
{
// nothing needed any more
}
int kiss_fft_next_fast_size(int n)
{
while(1) {
int m=n;
while ( (m%2) == 0 ) m/=2;
while ( (m%3) == 0 ) m/=3;
while ( (m%5) == 0 ) m/=5;
if (m<=1)
break; /* n is completely factorable by twos, threes, and fives */
n++;
}
return n;
}

@@ -1,124 +0,0 @@
#ifndef KISS_FFT_H
#define KISS_FFT_H
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#ifdef __cplusplus
extern "C" {
#endif
/*
ATTENTION!
If you would like:
-- a utility that will handle the caching of fft objects
-- real-only (no imaginary time component ) FFT
-- a multi-dimensional FFT
-- a command-line utility to perform ffts
-- a command-line utility to perform fast-convolution filtering
Then see kfc.h kiss_fftr.h kiss_fftnd.h fftutil.c kiss_fastfir.c
in the tools/ directory.
*/
#ifdef USE_SIMD
# include <xmmintrin.h>
# define kiss_fft_scalar __m128
#define KISS_FFT_MALLOC(nbytes) _mm_malloc(nbytes,16)
#define KISS_FFT_FREE _mm_free
#else
#define KISS_FFT_MALLOC malloc
#define KISS_FFT_FREE free
#endif
#ifdef FIXED_POINT
#include <sys/types.h>
# if (FIXED_POINT == 32)
# define kiss_fft_scalar int32_t
# else
# define kiss_fft_scalar int16_t
# endif
#else
# ifndef kiss_fft_scalar
/* default is float */
# define kiss_fft_scalar float
# endif
#endif
typedef struct {
kiss_fft_scalar r;
kiss_fft_scalar i;
}kiss_fft_cpx;
typedef struct kiss_fft_state* kiss_fft_cfg;
/*
* kiss_fft_alloc
*
* Initialize a FFT (or IFFT) algorithm's cfg/state buffer.
*
* typical usage: kiss_fft_cfg mycfg=kiss_fft_alloc(1024,0,NULL,NULL);
*
* The return value from kiss_fft_alloc is a cfg buffer used internally
* by the fft routine, or NULL on failure.
*
* If lenmem is NULL, then kiss_fft_alloc will allocate a cfg buffer using malloc.
* The returned value should be free()d when done to avoid memory leaks.
*
* The state can be placed in a user supplied buffer 'mem':
* If lenmem is not NULL and mem is not NULL and *lenmem is large enough,
* then the function places the cfg in mem and the size used in *lenmem
* and returns mem.
*
* If lenmem is not NULL and ( mem is NULL or *lenmem is not large enough),
* then the function returns NULL and places the minimum cfg
* buffer size in *lenmem.
* */
kiss_fft_cfg kiss_fft_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem);
/*
* kiss_fft(cfg,fin,fout)
*
* Perform an FFT on a complex input buffer.
* for a forward FFT,
* fin should be f[0] , f[1] , ... ,f[nfft-1]
* fout will be F[0] , F[1] , ... ,F[nfft-1]
* Note that each element is complex and can be accessed like
f[k].r and f[k].i
* */
void kiss_fft(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout);
/*
A more generic version of the above function. It reads its input from every Nth sample.
* */
void kiss_fft_stride(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout,int fin_stride);
/* If kiss_fft_alloc allocated a buffer, it is one contiguous
buffer and can be simply free()d when no longer needed*/
#define kiss_fft_free free
/*
Cleans up some memory that gets managed internally. Not necessary to call, but it might clean up
your compiler output to call this before you exit.
*/
void kiss_fft_cleanup(void);
/*
* Returns the smallest integer k, such that k>=n and k has only "fast" factors (2,3,5)
*/
int kiss_fft_next_fast_size(int n);
/* for real ffts, we need an even size */
#define kiss_fftr_next_fast_size_real(n) \
(kiss_fft_next_fast_size( ((n)+1)>>1)<<1)
#ifdef __cplusplus
}
#endif
#endif
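The `mem`/`lenmem` contract documented for `kiss_fft_alloc` in this header is a query-then-place protocol. A minimal sketch of that protocol follows, assuming a hypothetical state type `sketch_state` and allocator `sketch_alloc` that stand in for the real (opaque) `kiss_fft_state`; the table contents are placeholders, not twiddle factors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state: a header followed by n floats, standing in for
 * kiss_fft_state and its trailing twiddle table. */
typedef struct sketch_state {
    int n;
    float table[]; /* n entries, stored directly after the header */
} sketch_state;

/* Follows the contract described for kiss_fft_alloc:
 *  - lenmem == NULL: allocate with malloc; the caller frees the result.
 *  - mem != NULL and *lenmem large enough: place the state in mem.
 *  - otherwise: return NULL.
 * Whenever lenmem != NULL, *lenmem receives the required size. */
sketch_state *sketch_alloc(int n, void *mem, size_t *lenmem)
{
    sketch_state *st = NULL;
    size_t needed;
    int i;

    assert(n >= 1);
    needed = sizeof(sketch_state) + sizeof(float) * (size_t)n;
    if (lenmem == NULL) {
        st = (sketch_state *)malloc(needed);
    } else {
        if (mem != NULL && *lenmem >= needed)
            st = (sketch_state *)mem;
        *lenmem = needed;
    }
    if (st == NULL)
        return NULL;
    st->n = n;
    for (i = 0; i < n; ++i)
        st->table[i] = (float)i; /* placeholder contents */
    return st;
}
```

A caller that wants to own the memory would first call `sketch_alloc(n, NULL, &len)` to learn the size, allocate `len` bytes, and call again with that buffer; passing `NULL` for both `mem` and `lenmem` leaves allocation to the function.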

View File

@ -1,159 +0,0 @@
/*
Copyright (c) 2003-2004, Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "kiss_fftr.h"
#include "_kiss_fft_guts.h"
struct kiss_fftr_state{
kiss_fft_cfg substate;
kiss_fft_cpx * tmpbuf;
kiss_fft_cpx * super_twiddles;
#ifdef USE_SIMD
void * pad;
#endif
};
kiss_fftr_cfg kiss_fftr_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem)
{
int i;
kiss_fftr_cfg st = NULL;
size_t subsize, memneeded;
if (nfft & 1) {
fprintf(stderr,"Real FFT optimization must be even.\n");
return NULL;
}
nfft >>= 1;
kiss_fft_alloc (nfft, inverse_fft, NULL, &subsize);
memneeded = sizeof(struct kiss_fftr_state) + subsize + sizeof(kiss_fft_cpx) * ( nfft * 3 / 2);
if (lenmem == NULL) {
st = (kiss_fftr_cfg) KISS_FFT_MALLOC (memneeded);
} else {
if (*lenmem >= memneeded)
st = (kiss_fftr_cfg) mem;
*lenmem = memneeded;
}
if (!st)
return NULL;
st->substate = (kiss_fft_cfg) (st + 1); /*just beyond kiss_fftr_state struct */
st->tmpbuf = (kiss_fft_cpx *) (((char *) st->substate) + subsize);
st->super_twiddles = st->tmpbuf + nfft;
kiss_fft_alloc(nfft, inverse_fft, st->substate, &subsize);
for (i = 0; i < nfft/2; ++i) {
double phase =
-3.14159265358979323846264338327 * ((double) (i+1) / nfft + .5);
if (inverse_fft)
phase *= -1;
kf_cexp (st->super_twiddles+i,phase);
}
return st;
}
void kiss_fftr(kiss_fftr_cfg st,const kiss_fft_scalar *timedata,kiss_fft_cpx *freqdata)
{
/* input buffer timedata is stored row-wise */
int k,ncfft;
kiss_fft_cpx fpnk,fpk,f1k,f2k,tw,tdc;
if ( st->substate->inverse) {
fprintf(stderr,"kiss fft usage error: improper alloc\n");
exit(1);
}
ncfft = st->substate->nfft;
/*perform the parallel fft of two real signals packed in real,imag*/
kiss_fft( st->substate , (const kiss_fft_cpx*)timedata, st->tmpbuf );
/* The real part of the DC element of the frequency spectrum in st->tmpbuf
* contains the sum of the even-numbered elements of the input time sequence
* The imag part is the sum of the odd-numbered elements
*
* The sum tdc.r + tdc.i is the sum of the input time sequence,
* yielding the DC bin of the input time sequence.
* The difference tdc.r - tdc.i is the dot product of the input with [1,-1,1,-1,...],
* yielding the Nyquist bin of the input time sequence.
*/
tdc.r = st->tmpbuf[0].r;
tdc.i = st->tmpbuf[0].i;
C_FIXDIV(tdc,2);
CHECK_OVERFLOW_OP(tdc.r ,+, tdc.i);
CHECK_OVERFLOW_OP(tdc.r ,-, tdc.i);
freqdata[0].r = tdc.r + tdc.i;
freqdata[ncfft].r = tdc.r - tdc.i;
#ifdef USE_SIMD
freqdata[ncfft].i = freqdata[0].i = _mm_set1_ps(0);
#else
freqdata[ncfft].i = freqdata[0].i = 0;
#endif
for ( k=1;k <= ncfft/2 ; ++k ) {
fpk = st->tmpbuf[k];
fpnk.r = st->tmpbuf[ncfft-k].r;
fpnk.i = - st->tmpbuf[ncfft-k].i;
C_FIXDIV(fpk,2);
C_FIXDIV(fpnk,2);
C_ADD( f1k, fpk , fpnk );
C_SUB( f2k, fpk , fpnk );
C_MUL( tw , f2k , st->super_twiddles[k-1]);
freqdata[k].r = HALF_OF(f1k.r + tw.r);
freqdata[k].i = HALF_OF(f1k.i + tw.i);
freqdata[ncfft-k].r = HALF_OF(f1k.r - tw.r);
freqdata[ncfft-k].i = HALF_OF(tw.i - f1k.i);
}
}
void kiss_fftri(kiss_fftr_cfg st,const kiss_fft_cpx *freqdata,kiss_fft_scalar *timedata)
{
/* output buffer timedata is stored row-wise */
int k, ncfft;
if (st->substate->inverse == 0) {
fprintf (stderr, "kiss fft usage error: improper alloc\n");
exit (1);
}
ncfft = st->substate->nfft;
st->tmpbuf[0].r = freqdata[0].r + freqdata[ncfft].r;
st->tmpbuf[0].i = freqdata[0].r - freqdata[ncfft].r;
C_FIXDIV(st->tmpbuf[0],2);
for (k = 1; k <= ncfft / 2; ++k) {
kiss_fft_cpx fk, fnkc, fek, fok, tmp;
fk = freqdata[k];
fnkc.r = freqdata[ncfft - k].r;
fnkc.i = -freqdata[ncfft - k].i;
C_FIXDIV( fk , 2 );
C_FIXDIV( fnkc , 2 );
C_ADD (fek, fk, fnkc);
C_SUB (tmp, fk, fnkc);
C_MUL (fok, tmp, st->super_twiddles[k-1]);
C_ADD (st->tmpbuf[k], fek, fok);
C_SUB (st->tmpbuf[ncfft - k], fek, fok);
#ifdef USE_SIMD
st->tmpbuf[ncfft - k].i *= _mm_set1_ps(-1.0);
#else
st->tmpbuf[ncfft - k].i *= -1;
#endif
}
kiss_fft (st->substate, st->tmpbuf, (kiss_fft_cpx *) timedata);
}
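The packing trick used by `kiss_fftr` (treat the even samples as real parts and the odd samples as imaginary parts, run one half-length complex transform, then recombine) can be checked against a direct DFT. The sketch below is an illustration under stated assumptions, not the vendored implementation: it fixes the length at 8, uses a naive O(N^2) DFT in place of `kiss_fft`, tabulates the twiddles instead of calling `kf_cexp`, and all names (`cpx_d`, `SK_W`, `real_dft_packed`, and so on) are hypothetical.

```c
#include <assert.h>

/* Hypothetical complex type with the same r/i layout as kiss_fft_cpx. */
typedef struct { double r, i; } cpx_d;

enum { SK_N = 8, SK_M = SK_N / 2 }; /* real length, packed complex length */

#define SK_H 0.70710678118654752440 /* sqrt(2)/2 */
/* SK_W[m] = exp(-2*pi*i*m/8), tabulated so no math library call is needed. */
static const cpx_d SK_W[SK_N] = {
    { 1.0,  0.0}, { SK_H, -SK_H}, { 0.0, -1.0}, {-SK_H, -SK_H},
    {-1.0,  0.0}, {-SK_H,  SK_H}, { 0.0,  1.0}, { SK_H,  SK_H}
};

static cpx_d cmul(cpx_d a, cpx_d b)
{
    cpx_d c;
    c.r = a.r * b.r - a.i * b.i;
    c.i = a.r * b.i + a.i * b.r;
    return c;
}

/* Reference: direct DFT of the real input, bins 0..N/2. */
void real_dft_direct(const double *x, cpx_d *out)
{
    int k, n;
    for (k = 0; k <= SK_M; ++k) {
        out[k].r = out[k].i = 0.0;
        for (n = 0; n < SK_N; ++n) {
            out[k].r += x[n] * SK_W[(k * n) % SK_N].r;
            out[k].i += x[n] * SK_W[(k * n) % SK_N].i;
        }
    }
}

/* Packed method: one length-N/2 complex DFT plus a recombination pass. */
void real_dft_packed(const double *x, cpx_d *out)
{
    cpx_d z[SK_M];
    int k, n;

    /* Complex DFT of z[n] = x[2n] + i*x[2n+1]; W4^(kn) equals W8^(2kn). */
    for (k = 0; k < SK_M; ++k) {
        z[k].r = z[k].i = 0.0;
        for (n = 0; n < SK_M; ++n) {
            cpx_d in, t;
            in.r = x[2 * n];
            in.i = x[2 * n + 1];
            t = cmul(in, SK_W[(2 * k * n) % SK_N]);
            z[k].r += t.r;
            z[k].i += t.i;
        }
    }
    for (k = 0; k <= SK_M; ++k) {
        cpx_d a = z[k % SK_M];          /* Z[k] */
        cpx_d b = z[(SK_M - k) % SK_M]; /* Z[M-k]; conjugated below */
        cpx_d even, d, odd, t;
        even.r = 0.5 * (a.r + b.r);     /* (Z[k] + conj(Z[M-k])) / 2 */
        even.i = 0.5 * (a.i - b.i);
        d.r = 0.5 * (a.r - b.r);        /* (Z[k] - conj(Z[M-k])) / 2 */
        d.i = 0.5 * (a.i + b.i);
        odd.r = d.i;                    /* odd = -i * d */
        odd.i = -d.r;
        t = cmul(SK_W[k], odd);         /* twiddle the odd-sample spectrum */
        out[k].r = even.r + t.r;
        out[k].i = even.i + t.i;
    }
}

/* Largest component-wise difference between the two methods. */
double max_abs_diff(const double *x)
{
    cpx_d a[SK_M + 1], b[SK_M + 1];
    double worst = 0.0;
    int k;

    real_dft_direct(x, a);
    real_dft_packed(x, b);
    for (k = 0; k <= SK_M; ++k) {
        double dr = a[k].r - b[k].r, di = a[k].i - b[k].i;
        if (dr < 0) dr = -dr;
        if (di < 0) di = -di;
        if (dr > worst) worst = dr;
        if (di > worst) worst = di;
    }
    return worst;
}
```

In this sketch the factor `-i * W8[k]` plays the role of `super_twiddles[k-1]`, whose phase `-pi * (k/ncfft + 0.5)` folds the `-i` into the twiddle. Bins 0 and N/2 come out as the sum and difference of the real and imaginary parts of `Z[0]`, matching the DC and Nyquist comment in `kiss_fftr`.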

View File

@ -1,46 +0,0 @@
#ifndef KISS_FTR_H
#define KISS_FTR_H
#include "kiss_fft.h"
#ifdef __cplusplus
extern "C" {
#endif
/*
Real optimized version can save about 45% cpu time vs. complex fft of a real seq.
*/
typedef struct kiss_fftr_state *kiss_fftr_cfg;
kiss_fftr_cfg kiss_fftr_alloc(int nfft,int inverse_fft,void * mem, size_t * lenmem);
/*
nfft must be even
If you don't care to allocate space, use mem = lenmem = NULL
*/
void kiss_fftr(kiss_fftr_cfg cfg,const kiss_fft_scalar *timedata,kiss_fft_cpx *freqdata);
/*
input timedata has nfft scalar points
output freqdata has nfft/2+1 complex points
*/
void kiss_fftri(kiss_fftr_cfg cfg,const kiss_fft_cpx *freqdata,kiss_fft_scalar *timedata);
/*
input freqdata has nfft/2+1 complex points
output timedata has nfft scalar points
*/
#define kiss_fftr_free free
#ifdef __cplusplus
}
#endif
#endif
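The buffer sizes stated in this header (nfft real samples in, nfft/2+1 complex bins out) rely on the conjugate symmetry of a real signal's spectrum. A minimal sketch of how a caller could rebuild all nfft bins from the half spectrum follows; `cpx_sketch`, `half_spectrum_bins` and `expand_half_spectrum` are hypothetical helpers, not part of kiss_fft.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type with the same r/i layout as kiss_fft_cpx. */
typedef struct { float r, i; } cpx_sketch;

/* Number of complex bins kiss_fftr is documented to write for an even nfft. */
size_t half_spectrum_bins(size_t nfft)
{
    assert(nfft >= 2 && nfft % 2 == 0);
    return nfft / 2 + 1;
}

/* Fill full[0..nfft-1] from half[0..nfft/2] using X[nfft-k] = conj(X[k]). */
void expand_half_spectrum(const cpx_sketch *half, cpx_sketch *full, size_t nfft)
{
    size_t k, nbins = half_spectrum_bins(nfft);

    for (k = 0; k < nbins; ++k)
        full[k] = half[k];
    for (k = nbins; k < nfft; ++k) {
        full[k].r = half[nfft - k].r;
        full[k].i = -half[nfft - k].i;
    }
}
```

For nfft = 8 the half spectrum has 5 bins, and bins 5, 6 and 7 of the full spectrum are the conjugates of bins 3, 2 and 1.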

View File

@ -1,20 +0,0 @@
# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*-
# vim: set filetype=python:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
with Files("**"):
BUG_COMPONENT = ("Core", "Web Audio")
EXPORTS.kiss_fft += [
'kiss_fft.h',
'kiss_fftr.h',
]
SOURCES += [
'kiss_fft.c',
'kiss_fftr.c',
]
FINAL_LIBRARY = 'xul'

View File

@ -1,49 +0,0 @@
schema: 1
bugzilla:
product: Core
component: "Web Audio"
origin:
name: kiss_fft
description: A mixed-radix Fast Fourier Transform
url: https://github.com/mborgerding/kissfft
release: 1c3d6f5aa9eb2bf2f18641f0a7e3e6f5e523a156 (2017-10-25T13:50:40Z).
revision: 1c3d6f5aa9eb2bf2f18641f0a7e3e6f5e523a156
license: BSD-3-Clause
license-file: COPYING
vendoring:
url: https://github.com/mborgerding/kissfft
source-hosting: github
tracking: commit
exclude:
- ".*"
- test
- tools/fftutil.c
- tools/psdpng.c
- "tools/kiss_fftnd*"
- tools/kiss_fastfir.c
- "tools/kfc.*"
- "tools/.*"
- TIPS
- kissfft.hh
- tools/Makefile
- Makefile
keep:
- COPYING
- _kiss_fft_guts.h
- kiss_fft.c
- kiss_fft.h
- tools/kiss_fftr.c
- tools/kiss_fftr.h
update-actions:
- action: move-dir
from: '{vendor_dir}/tools'
to: '{vendor_dir}'

View File

@ -1,39 +0,0 @@
Use of this source code is governed by a BSD-style license that can be
found in the LICENSE file in the root of the source tree. All
contributing project authors may be found in the AUTHORS file in the
root of the source tree.
The files were originally licensed by ARM Limited.
The following files:
* dl/api/omxtypes.h
* dl/sp/api/omxSP.h
are licensed by Khronos:
Copyright (c) 2005-2008,2015 The Khronos Group Inc.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and/or associated documentation files (the
"Materials"), to deal in the Materials without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Materials, and to
permit persons to whom the Materials are furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Materials.
MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
https://www.khronos.org/registry/
THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.

View File

@ -1,3 +0,0 @@
ajm@google.com
kma@google.com
rtoy@google.com

View File

@ -1,19 +0,0 @@
Name: OpenMAX DL
Short Name: OpenMax DL
URL: https://silver.arm.com/download/Software/Graphics/OX000-BU-00010-r1p0-00bet0/OX000-BU-00010-r1p0-00bet0.tgz
Version: 1.0.2
License: BSD
License File: LICENSE
Security Critical: yes
Description:
Implementation of OpenMAX DL spec from ARM. This is used to support
WebAudio for Chromium on Android.
Local Modifications:
Only the FFT routines from the OpenMAX DL package are included. The
code was modified to work with gcc and a new implementation for a
floating-point FFT was added.
The original ARM license is unclear, but Google has obtained
permission to relicense this code under a BSD license.

View File

@ -1,9 +0,0 @@
Bug 1158741 added an omxSP_FFTInv_CCSToR_F32_Sfs_unscaled function as an
optimization which performs the same operation as
omxSP_FFTInv_CCSToR_F32_Sfs except it doesn't scale the results by the
length of the FFT. For consistency with other FFT routines used, it does
multiply the results by two.
The affected files are:
media/openmax_dl/dl/sp/api/omxSP.h
media/openmax_dl/dl/sp/src/omxSP_FFTInv_CCSToR_F32_Sfs_unscaled_s.S

View File

@ -1,417 +0,0 @@
@// -*- Mode: asm; -*-
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This file was originally licensed as follows. It has been
@// relicensed with permission from the copyright holders.
@//
@//
@// File Name: armCOMM_s.h
@// OpenMAX DL: v1.0.2
@// Last Modified Revision: 13871
@// Last Modified Date: Fri, 09 May 2008
@//
@// (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
@//
@//
@//
@// ARM optimized OpenMAX common header file
@//
.set _SBytes, 0 @ Number of scratch bytes on stack
.set _Workspace, 0 @ Stack offset of scratch workspace
.set _RRegList, 0 @ R saved register list (last register number)
.set _DRegList, 0 @ D saved register list (last register number)
@// Work out a list of R saved registers, and how much stack space is needed.
@// gas doesn't support setting a variable to a string, so we set _RRegList to
@// the register number.
.macro _M_GETRREGLIST rreg
.ifeqs "\rreg", ""
@ Nothing needs to be saved
.exitm
.endif
@ If rreg is lr or r4, save lr and r4
.ifeqs "\rreg", "lr"
.set _RRegList, 4
.exitm
.endif
.ifeqs "\rreg", "r4"
.set _RRegList, 4
.exitm
.endif
@ If rreg = r5 or r6, save up to register r6
.ifeqs "\rreg", "r5"
.set _RRegList, 6
.exitm
.endif
.ifeqs "\rreg", "r6"
.set _RRegList, 6
.exitm
.endif
@ If rreg = r7 or r8, save up to register r8
.ifeqs "\rreg", "r7"
.set _RRegList, 8
.exitm
.endif
.ifeqs "\rreg", "r8"
.set _RRegList, 8
.exitm
.endif
@ If rreg = r9 or r10, save up to register r10
.ifeqs "\rreg", "r9"
.set _RRegList, 10
.exitm
.endif
.ifeqs "\rreg", "r10"
.set _RRegList, 10
.exitm
.endif
@ If rreg = r11 or r12, save up to register r12
.ifeqs "\rreg", "r11"
.set _RRegList, 12
.exitm
.endif
.ifeqs "\rreg", "r12"
.set _RRegList, 12
.exitm
.endif
.warning "Unrecognized saved r register limit: \rreg"
.endm
@ Work out list of D saved registers, like for R registers.
.macro _M_GETDREGLIST dreg
.ifeqs "\dreg", ""
.set _DRegList, 0
.exitm
.endif
.ifeqs "\dreg", "d8"
.set _DRegList, 8
.exitm
.endif
.ifeqs "\dreg", "d9"
.set _DRegList, 9
.exitm
.endif
.ifeqs "\dreg", "d10"
.set _DRegList, 10
.exitm
.endif
.ifeqs "\dreg", "d11"
.set _DRegList, 11
.exitm
.endif
.ifeqs "\dreg", "d12"
.set _DRegList, 12
.exitm
.endif
.ifeqs "\dreg", "d13"
.set _DRegList, 13
.exitm
.endif
.ifeqs "\dreg", "d14"
.set _DRegList, 14
.exitm
.endif
.ifeqs "\dreg", "d15"
.set _DRegList, 15
.exitm
.endif
.warning "Unrecognized saved d register limit: \dreg"
.endm
@//////////////////////////////////////////////////////////
@// Function header and footer macros
@//////////////////////////////////////////////////////////
@ Function Header Macro
@ Generates the function prologue
@ Note that functions should all be "stack-moves-once"
@ The FNSTART and FNEND macros should be the only places
@ where the stack moves.
@
@ name = function name
@ rreg = "" don't stack any registers
@ "lr" stack "lr" only
@ "rN" stack registers "r4-rN,lr"
@ dreg = "" don't stack any D registers
@ "dN" stack registers "d8-dN"
@
@ Note: The ARM Architecture Procedure Call Standard (AAPCS)
@ states that r4-r11, sp, d8-d15 must be preserved by
@ a compliant function.
.macro M_START name, rreg, dreg
.set _Workspace, 0
@ Define the function and make it external.
.global \name
#ifndef __clang__
.func \name
#endif
.section .text.\name,"ax",%progbits
.arch armv7-a
.fpu neon
.syntax unified
.object_arch armv4
.align 2
\name :
.fnstart
@ Save specified R registers
_M_GETRREGLIST \rreg
_M_PUSH_RREG
@ Save specified D registers
_M_GETDREGLIST \dreg
_M_PUSH_DREG
@ Ensure size claimed on stack is 8-byte aligned
.if (_SBytes & 7) != 0
.set _SBytes, _SBytes + (8 - (_SBytes & 7))
.endif
.if _SBytes != 0
sub sp, sp, #_SBytes
.endif
.endm
@ Function Footer Macro
@ Generates the function epilogue
.macro M_END
@ Restore the stack pointer to its original value on function entry
.if _SBytes != 0
add sp, sp, #_SBytes
.endif
@ Restore any saved R or D registers.
_M_RET
.fnend
#ifndef __clang__
.endfunc
#endif
@ Reset the global stack tracking variables back to their
@ initial values.
.set _SBytes, 0
.endm
@// Based on the value of _DRegList, push the specified set of registers
@// to the stack. Is there a better way?
.macro _M_PUSH_DREG
.if _DRegList == 8
vpush {d8}
.exitm
.endif
.if _DRegList == 9
vpush {d8-d9}
.exitm
.endif
.if _DRegList == 10
vpush {d8-d10}
.exitm
.endif
.if _DRegList == 11
vpush {d8-d11}
.exitm
.endif
.if _DRegList == 12
vpush {d8-d12}
.exitm
.endif
.if _DRegList == 13
vpush {d8-d13}
.exitm
.endif
.if _DRegList == 14
vpush {d8-d14}
.exitm
.endif
.if _DRegList == 15
vpush {d8-d15}
.exitm
.endif
.endm
@// Based on the value of _RRegList, push the specified set of registers
@// to the stack. Is there a better way?
.macro _M_PUSH_RREG
.if _RRegList == 4
stmfd sp!, {r4, lr}
.exitm
.endif
.if _RRegList == 6
stmfd sp!, {r4-r6, lr}
.exitm
.endif
.if _RRegList == 8
stmfd sp!, {r4-r8, lr}
.exitm
.endif
.if _RRegList == 10
stmfd sp!, {r4-r10, lr}
.exitm
.endif
.if _RRegList == 12
stmfd sp!, {r4-r12, lr}
.exitm
.endif
.endm
@// The opposite of _M_PUSH_DREG
.macro _M_POP_DREG
.if _DRegList == 8
vpop {d8}
.exitm
.endif
.if _DRegList == 9
vpop {d8-d9}
.exitm
.endif
.if _DRegList == 10
vpop {d8-d10}
.exitm
.endif
.if _DRegList == 11
vpop {d8-d11}
.exitm
.endif
.if _DRegList == 12
vpop {d8-d12}
.exitm
.endif
.if _DRegList == 13
vpop {d8-d13}
.exitm
.endif
.if _DRegList == 14
vpop {d8-d14}
.exitm
.endif
.if _DRegList == 15
vpop {d8-d15}
.exitm
.endif
.endm
@// The opposite of _M_PUSH_RREG
.macro _M_POP_RREG cc
.if _RRegList == 0
bx\cc lr
.exitm
.endif
.if _RRegList == 4
ldm\cc\()fd sp!, {r4, pc}
.exitm
.endif
.if _RRegList == 6
ldm\cc\()fd sp!, {r4-r6, pc}
.exitm
.endif
.if _RRegList == 8
ldm\cc\()fd sp!, {r4-r8, pc}
.exitm
.endif
.if _RRegList == 10
ldm\cc\()fd sp!, {r4-r10, pc}
.exitm
.endif
.if _RRegList == 12
ldm\cc\()fd sp!, {r4-r12, pc}
.exitm
.endif
.endm
@ Produce function return instructions
.macro _M_RET cc
_M_POP_DREG \cc
_M_POP_RREG \cc
.endm
@// Allocate 4-byte aligned area of name
@// |name| and size |size| bytes.
.macro M_ALLOC4 name, size
.if (_SBytes & 3) != 0
.set _SBytes, _SBytes + (4 - (_SBytes & 3))
.endif
.set \name\()_F, _SBytes
.set _SBytes, _SBytes + \size
.endm
@ Load word from stack
.macro M_LDR r, a0, a1, a2, a3
_M_DATA "ldr", 4, \r, \a0, \a1, \a2, \a3
.endm
@ Store word to stack
.macro M_STR r, a0, a1, a2, a3
_M_DATA "str", 4, \r, \a0, \a1, \a2, \a3
.endm
@ Macro to perform a data access operation
@ Such as LDR or STR
@ The addressing mode is modified such that
@ 1. If no address is given then the name is taken
@ as a stack offset
@ 2. If the addressing mode is not available for the
@ state being assembled for (eg Thumb) then a suitable
@ addressing mode is substituted.
@
@ On Entry:
@ $i = Instruction to perform (eg "LDRB")
@ $a = Required byte alignment
@ $r = Register(s) to transfer (eg "r1")
@ $a0,$a1,$a2. Addressing mode and condition. One of:
@ label {,cc}
@ [base] {,,,cc}
@ [base, offset]{!} {,,cc}
@ [base, offset, shift]{!} {,cc}
@ [base], offset {,,cc}
@ [base], offset, shift {,cc}
@
@ WARNING: Most of the above are not supported, except the first case.
.macro _M_DATA i, a, r, a0, a1, a2, a3
.set _Offset, _Workspace + \a0\()_F
\i\a1 \r, [sp, #_Offset]
.endm
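The stack bookkeeping that `M_ALLOC4` and `M_START` perform at assembly time is plain round-up arithmetic on the `_SBytes` counter. The C sketch below models it under stated assumptions: `frame_sketch`, `frame_alloc4` and `frame_size` are hypothetical names, and the model covers only the size and offset computation, not the register saves.

```c
#include <assert.h>

/* Round n up to the next multiple of a; a must be a power of two. */
unsigned align_up(unsigned n, unsigned a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + (a - 1)) & ~(a - 1);
}

/* Hypothetical model of the assembler's _SBytes counter. */
typedef struct { unsigned sbytes; } frame_sketch;

/* Like M_ALLOC4: pad the running size to 4 bytes, use that as the new
 * slot's offset (the value stored in name_F), then grow by size. */
unsigned frame_alloc4(frame_sketch *f, unsigned size)
{
    unsigned offset = align_up(f->sbytes, 4);
    f->sbytes = offset + size;
    return offset;
}

/* Like the check in M_START: the claimed stack size is padded to 8 bytes. */
unsigned frame_size(const frame_sketch *f)
{
    return align_up(f->sbytes, 8);
}
```

In this model, allocating a 6-byte slot and then a 4-byte slot yields offsets 0 and 8 and a running size of 12, which the final padding step raises to 16 bytes.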

View File

@ -1,289 +0,0 @@
/*
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*
* This file was originally licensed as follows. It has been
* relicensed with permission from the copyright holders.
*/
/*
*
* File Name: armOMX_ReleaseVersion.h
* OpenMAX DL: v1.0.2
* Last Modified Revision: 15322
* Last Modified Date: Wed, 15 Oct 2008
*
* (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
*
*
*
* This file allows a version of the OMX DL libraries to be built where some or
* all of the function names can be given a user specified suffix.
*
* You might want to use it where:
*
* - you want to rename a function "out of the way" so that you could replace
* a function with a different version (the original version would still be
* in the library just with a different name - so you could debug the new
* version by comparing it to the output of the old)
*
* - you want to rename all the functions to versions with a suffix so that
* you can include two versions of the library and choose between functions
* at runtime.
*
* e.g. omxIPBM_Copy_U8_C1R could be renamed omxIPBM_Copy_U8_C1R_CortexA8
*
*/
#ifndef _armOMX_H_
#define _armOMX_H_
#define ARMOMX_ENABLE_RENAMING 0
#if ARMOMX_ENABLE_RENAMING
/* We need to define these two macros in order to expand and concatenate the names */
#define OMXCAT2BAR(A, B) omx ## A ## B
#define OMXCATBAR(A, B) OMXCAT2BAR(A, B)
/* Define the suffix to add to all functions - the default is no suffix */
#define BARE_SUFFIX
/* Define what happens to the bare suffix-less functions, down to the sub-domain accuracy */
#define OMXACAAC_SUFFIX BARE_SUFFIX
#define OMXACMP3_SUFFIX BARE_SUFFIX
#define OMXICJP_SUFFIX BARE_SUFFIX
#define OMXIPBM_SUFFIX BARE_SUFFIX
#define OMXIPCS_SUFFIX BARE_SUFFIX
#define OMXIPPP_SUFFIX BARE_SUFFIX
#define OMXSP_SUFFIX BARE_SUFFIX
#define OMXVCCOMM_SUFFIX BARE_SUFFIX
#define OMXVCM4P10_SUFFIX BARE_SUFFIX
#define OMXVCM4P2_SUFFIX BARE_SUFFIX
/* Define what each bare, un-suffixed OpenMAX API function name is to be renamed to */
#define omxACAAC_DecodeChanPairElt OMXCATBAR(ACAAC_DecodeChanPairElt, OMXACAAC_SUFFIX)
#define omxACAAC_DecodeDatStrElt OMXCATBAR(ACAAC_DecodeDatStrElt, OMXACAAC_SUFFIX)
#define omxACAAC_DecodeFillElt OMXCATBAR(ACAAC_DecodeFillElt, OMXACAAC_SUFFIX)
#define omxACAAC_DecodeIsStereo_S32 OMXCATBAR(ACAAC_DecodeIsStereo_S32, OMXACAAC_SUFFIX)
#define omxACAAC_DecodeMsPNS_S32_I OMXCATBAR(ACAAC_DecodeMsPNS_S32_I, OMXACAAC_SUFFIX)
#define omxACAAC_DecodeMsStereo_S32_I OMXCATBAR(ACAAC_DecodeMsStereo_S32_I, OMXACAAC_SUFFIX)
#define omxACAAC_DecodePrgCfgElt OMXCATBAR(ACAAC_DecodePrgCfgElt, OMXACAAC_SUFFIX)
#define omxACAAC_DecodeTNS_S32_I OMXCATBAR(ACAAC_DecodeTNS_S32_I, OMXACAAC_SUFFIX)
#define omxACAAC_DeinterleaveSpectrum_S32 OMXCATBAR(ACAAC_DeinterleaveSpectrum_S32, OMXACAAC_SUFFIX)
#define omxACAAC_EncodeTNS_S32_I OMXCATBAR(ACAAC_EncodeTNS_S32_I, OMXACAAC_SUFFIX)
#define omxACAAC_LongTermPredict_S32 OMXCATBAR(ACAAC_LongTermPredict_S32, OMXACAAC_SUFFIX)
#define omxACAAC_LongTermReconstruct_S32_I OMXCATBAR(ACAAC_LongTermReconstruct_S32_I, OMXACAAC_SUFFIX)
#define omxACAAC_MDCTFwd_S32 OMXCATBAR(ACAAC_MDCTFwd_S32, OMXACAAC_SUFFIX)
#define omxACAAC_MDCTInv_S32_S16 OMXCATBAR(ACAAC_MDCTInv_S32_S16, OMXACAAC_SUFFIX)
#define omxACAAC_NoiselessDecode OMXCATBAR(ACAAC_NoiselessDecode, OMXACAAC_SUFFIX)
#define omxACAAC_QuantInv_S32_I OMXCATBAR(ACAAC_QuantInv_S32_I, OMXACAAC_SUFFIX)
#define omxACAAC_UnpackADIFHeader OMXCATBAR(ACAAC_UnpackADIFHeader, OMXACAAC_SUFFIX)
#define omxACAAC_UnpackADTSFrameHeader OMXCATBAR(ACAAC_UnpackADTSFrameHeader, OMXACAAC_SUFFIX)
#define omxACMP3_HuffmanDecode_S32 OMXCATBAR(ACMP3_HuffmanDecode_S32, OMXACMP3_SUFFIX)
#define omxACMP3_HuffmanDecodeSfb_S32 OMXCATBAR(ACMP3_HuffmanDecodeSfb_S32, OMXACMP3_SUFFIX)
#define omxACMP3_HuffmanDecodeSfbMbp_S32 OMXCATBAR(ACMP3_HuffmanDecodeSfbMbp_S32, OMXACMP3_SUFFIX)
#define omxACMP3_MDCTInv_S32 OMXCATBAR(ACMP3_MDCTInv_S32, OMXACMP3_SUFFIX)
#define omxACMP3_ReQuantize_S32_I OMXCATBAR(ACMP3_ReQuantize_S32_I, OMXACMP3_SUFFIX)
#define omxACMP3_ReQuantizeSfb_S32_I OMXCATBAR(ACMP3_ReQuantizeSfb_S32_I, OMXACMP3_SUFFIX)
#define omxACMP3_SynthPQMF_S32_S16 OMXCATBAR(ACMP3_SynthPQMF_S32_S16, OMXACMP3_SUFFIX)
#define omxACMP3_UnpackFrameHeader OMXCATBAR(ACMP3_UnpackFrameHeader, OMXACMP3_SUFFIX)
#define omxACMP3_UnpackScaleFactors_S8 OMXCATBAR(ACMP3_UnpackScaleFactors_S8, OMXACMP3_SUFFIX)
#define omxACMP3_UnpackSideInfo OMXCATBAR(ACMP3_UnpackSideInfo, OMXACMP3_SUFFIX)
#define omxICJP_CopyExpand_U8_C3 OMXCATBAR(ICJP_CopyExpand_U8_C3, OMXICJP_SUFFIX)
#define omxICJP_DCTFwd_S16 OMXCATBAR(ICJP_DCTFwd_S16, OMXICJP_SUFFIX)
#define omxICJP_DCTFwd_S16_I OMXCATBAR(ICJP_DCTFwd_S16_I, OMXICJP_SUFFIX)
#define omxICJP_DCTInv_S16 OMXCATBAR(ICJP_DCTInv_S16, OMXICJP_SUFFIX)
#define omxICJP_DCTInv_S16_I OMXCATBAR(ICJP_DCTInv_S16_I, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantFwd_Multiple_S16 OMXCATBAR(ICJP_DCTQuantFwd_Multiple_S16, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantFwd_S16 OMXCATBAR(ICJP_DCTQuantFwd_S16, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantFwd_S16_I OMXCATBAR(ICJP_DCTQuantFwd_S16_I, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantFwdTableInit OMXCATBAR(ICJP_DCTQuantFwdTableInit, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantInv_Multiple_S16 OMXCATBAR(ICJP_DCTQuantInv_Multiple_S16, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantInv_S16 OMXCATBAR(ICJP_DCTQuantInv_S16, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantInv_S16_I OMXCATBAR(ICJP_DCTQuantInv_S16_I, OMXICJP_SUFFIX)
#define omxICJP_DCTQuantInvTableInit OMXCATBAR(ICJP_DCTQuantInvTableInit, OMXICJP_SUFFIX)
#define omxICJP_DecodeHuffman8x8_Direct_S16_C1 OMXCATBAR(ICJP_DecodeHuffman8x8_Direct_S16_C1, OMXICJP_SUFFIX)
#define omxICJP_DecodeHuffmanSpecGetBufSize_U8 OMXCATBAR(ICJP_DecodeHuffmanSpecGetBufSize_U8, OMXICJP_SUFFIX)
#define omxICJP_DecodeHuffmanSpecInit_U8 OMXCATBAR(ICJP_DecodeHuffmanSpecInit_U8, OMXICJP_SUFFIX)
#define omxICJP_EncodeHuffman8x8_Direct_S16_U1_C1 OMXCATBAR(ICJP_EncodeHuffman8x8_Direct_S16_U1_C1, OMXICJP_SUFFIX)
#define omxICJP_EncodeHuffmanSpecGetBufSize_U8 OMXCATBAR(ICJP_EncodeHuffmanSpecGetBufSize_U8, OMXICJP_SUFFIX)
#define omxICJP_EncodeHuffmanSpecInit_U8 OMXCATBAR(ICJP_EncodeHuffmanSpecInit_U8, OMXICJP_SUFFIX)
#define omxIPBM_AddC_U8_C1R_Sfs OMXCATBAR(IPBM_AddC_U8_C1R_Sfs, OMXIPBM_SUFFIX)
#define omxIPBM_Copy_U8_C1R OMXCATBAR(IPBM_Copy_U8_C1R, OMXIPBM_SUFFIX)
#define omxIPBM_Copy_U8_C3R OMXCATBAR(IPBM_Copy_U8_C3R, OMXIPBM_SUFFIX)
#define omxIPBM_Mirror_U8_C1R OMXCATBAR(IPBM_Mirror_U8_C1R, OMXIPBM_SUFFIX)
#define omxIPBM_MulC_U8_C1R_Sfs OMXCATBAR(IPBM_MulC_U8_C1R_Sfs, OMXIPBM_SUFFIX)
#define omxIPCS_ColorTwistQ14_U8_C3R OMXCATBAR(IPCS_ColorTwistQ14_U8_C3R, OMXIPCS_SUFFIX)
#define omxIPCS_BGR565ToYCbCr420LS_MCU_U16_S16_C3P3R OMXCATBAR(IPCS_BGR565ToYCbCr420LS_MCU_U16_S16_C3P3R, OMXIPCS_SUFFIX)
#define omxIPCS_BGR565ToYCbCr422LS_MCU_U16_S16_C3P3R OMXCATBAR(IPCS_BGR565ToYCbCr422LS_MCU_U16_S16_C3P3R, OMXIPCS_SUFFIX)
#define omxIPCS_BGR565ToYCbCr444LS_MCU_U16_S16_C3P3R OMXCATBAR(IPCS_BGR565ToYCbCr444LS_MCU_U16_S16_C3P3R, OMXIPCS_SUFFIX)
#define omxIPCS_BGR888ToYCbCr420LS_MCU_U8_S16_C3P3R OMXCATBAR(IPCS_BGR888ToYCbCr420LS_MCU_U8_S16_C3P3R, OMXIPCS_SUFFIX)
#define omxIPCS_BGR888ToYCbCr422LS_MCU_U8_S16_C3P3R OMXCATBAR(IPCS_BGR888ToYCbCr422LS_MCU_U8_S16_C3P3R, OMXIPCS_SUFFIX)
#define omxIPCS_BGR888ToYCbCr444LS_MCU_U8_S16_C3P3R OMXCATBAR(IPCS_BGR888ToYCbCr444LS_MCU_U8_S16_C3P3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr420RszCscRotBGR_U8_P3C3R OMXCATBAR(IPCS_YCbCr420RszCscRotBGR_U8_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr420RszRot_U8_P3R OMXCATBAR(IPCS_YCbCr420RszRot_U8_P3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr420ToBGR565_U8_U16_P3C3R OMXCATBAR(IPCS_YCbCr420ToBGR565_U8_U16_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr420ToBGR565LS_MCU_S16_U16_P3C3R OMXCATBAR(IPCS_YCbCr420ToBGR565LS_MCU_S16_U16_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr420ToBGR888LS_MCU_S16_U8_P3C3R OMXCATBAR(IPCS_YCbCr420ToBGR888LS_MCU_S16_U8_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr422RszCscRotBGR_U8_P3C3R OMXCATBAR(IPCS_YCbCr422RszCscRotBGR_U8_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_CbYCrY422RszCscRotBGR_U8_U16_C2R OMXCATBAR(IPCS_CbYCrY422RszCscRotBGR_U8_U16_C2R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr422RszRot_U8_P3R OMXCATBAR(IPCS_YCbCr422RszRot_U8_P3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbYCr422ToBGR565_U8_U16_C2C3R OMXCATBAR(IPCS_YCbYCr422ToBGR565_U8_U16_C2C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr422ToBGR565LS_MCU_S16_U16_P3C3R OMXCATBAR(IPCS_YCbCr422ToBGR565LS_MCU_S16_U16_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbYCr422ToBGR888_U8_C2C3R OMXCATBAR(IPCS_YCbYCr422ToBGR888_U8_C2C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr422ToBGR888LS_MCU_S16_U8_P3C3R OMXCATBAR(IPCS_YCbCr422ToBGR888LS_MCU_S16_U8_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_CbYCrY422ToYCbCr420Rotate_U8_C2P3R OMXCATBAR(IPCS_CbYCrY422ToYCbCr420Rotate_U8_C2P3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr422ToYCbCr420Rotate_U8_P3R OMXCATBAR(IPCS_YCbCr422ToYCbCr420Rotate_U8_P3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr444ToBGR565_U8_U16_C3R OMXCATBAR(IPCS_YCbCr444ToBGR565_U8_U16_C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr444ToBGR565_U8_U16_P3C3R OMXCATBAR(IPCS_YCbCr444ToBGR565_U8_U16_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr444ToBGR565LS_MCU_S16_U16_P3C3R OMXCATBAR(IPCS_YCbCr444ToBGR565LS_MCU_S16_U16_P3C3R, OMXIPCS_SUFFIX)
#define omxIPCS_YCbCr444ToBGR888_U8_C3R OMXCATBAR(IPCS_YCbCr444ToBGR888_U8_C3R, OMXIPCS_SUFFIX)
#define omxIPPP_Deblock_HorEdge_U8_I OMXCATBAR(IPPP_Deblock_HorEdge_U8_I, OMXIPPP_SUFFIX)
#define omxIPPP_Deblock_VerEdge_U8_I OMXCATBAR(IPPP_Deblock_VerEdge_U8_I, OMXIPPP_SUFFIX)
#define omxIPPP_FilterFIR_U8_C1R OMXCATBAR(IPPP_FilterFIR_U8_C1R, OMXIPPP_SUFFIX)
#define omxIPPP_FilterMedian_U8_C1R OMXCATBAR(IPPP_FilterMedian_U8_C1R, OMXIPPP_SUFFIX)
#define omxIPPP_GetCentralMoment_S64 OMXCATBAR(IPPP_GetCentralMoment_S64, OMXIPPP_SUFFIX)
#define omxIPPP_GetSpatialMoment_S64 OMXCATBAR(IPPP_GetSpatialMoment_S64, OMXIPPP_SUFFIX)
#define omxIPPP_MomentGetStateSize OMXCATBAR(IPPP_MomentGetStateSize, OMXIPPP_SUFFIX)
#define omxIPPP_MomentInit OMXCATBAR(IPPP_MomentInit, OMXIPPP_SUFFIX)
#define omxIPPP_Moments_U8_C1R OMXCATBAR(IPPP_Moments_U8_C1R, OMXIPPP_SUFFIX)
#define omxIPPP_Moments_U8_C3R OMXCATBAR(IPPP_Moments_U8_C3R, OMXIPPP_SUFFIX)
#define omxSP_BlockExp_S16 OMXCATBAR(SP_BlockExp_S16, OMXSP_SUFFIX)
#define omxSP_BlockExp_S32 OMXCATBAR(SP_BlockExp_S32, OMXSP_SUFFIX)
#define omxSP_Copy_S16 OMXCATBAR(SP_Copy_S16, OMXSP_SUFFIX)
#define omxSP_DotProd_S16 OMXCATBAR(SP_DotProd_S16, OMXSP_SUFFIX)
#define omxSP_DotProd_S16_Sfs OMXCATBAR(SP_DotProd_S16_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTFwd_CToC_SC16_Sfs OMXCATBAR(SP_FFTFwd_CToC_SC16_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTFwd_CToC_SC32_Sfs OMXCATBAR(SP_FFTFwd_CToC_SC32_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTFwd_RToCCS_S16S32_Sfs OMXCATBAR(SP_FFTFwd_RToCCS_S16S32_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTFwd_RToCCS_S32_Sfs OMXCATBAR(SP_FFTFwd_RToCCS_S32_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTGetBufSize_C_SC16 OMXCATBAR(SP_FFTGetBufSize_C_SC16, OMXSP_SUFFIX)
#define omxSP_FFTGetBufSize_C_SC32 OMXCATBAR(SP_FFTGetBufSize_C_SC32, OMXSP_SUFFIX)
#define omxSP_FFTGetBufSize_R_S16S32 OMXCATBAR(SP_FFTGetBufSize_R_S16S32, OMXSP_SUFFIX)
#define omxSP_FFTGetBufSize_R_S32 OMXCATBAR(SP_FFTGetBufSize_R_S32, OMXSP_SUFFIX)
#define omxSP_FFTInit_C_SC16 OMXCATBAR(SP_FFTInit_C_SC16, OMXSP_SUFFIX)
#define omxSP_FFTInit_C_SC32 OMXCATBAR(SP_FFTInit_C_SC32, OMXSP_SUFFIX)
#define omxSP_FFTInit_R_S16S32 OMXCATBAR(SP_FFTInit_R_S16S32, OMXSP_SUFFIX)
#define omxSP_FFTInit_R_S32 OMXCATBAR(SP_FFTInit_R_S32, OMXSP_SUFFIX)
#define omxSP_FFTInv_CCSToR_S32_Sfs OMXCATBAR(SP_FFTInv_CCSToR_S32_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTInv_CCSToR_S32S16_Sfs OMXCATBAR(SP_FFTInv_CCSToR_S32S16_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTInv_CToC_SC16_Sfs OMXCATBAR(SP_FFTInv_CToC_SC16_Sfs, OMXSP_SUFFIX)
#define omxSP_FFTInv_CToC_SC32_Sfs OMXCATBAR(SP_FFTInv_CToC_SC32_Sfs, OMXSP_SUFFIX)
#define omxSP_FilterMedian_S32 OMXCATBAR(SP_FilterMedian_S32, OMXSP_SUFFIX)
#define omxSP_FilterMedian_S32_I OMXCATBAR(SP_FilterMedian_S32_I, OMXSP_SUFFIX)
#define omxSP_FIR_Direct_S16 OMXCATBAR(SP_FIR_Direct_S16, OMXSP_SUFFIX)
#define omxSP_FIR_Direct_S16_I OMXCATBAR(SP_FIR_Direct_S16_I, OMXSP_SUFFIX)
#define omxSP_FIR_Direct_S16_ISfs OMXCATBAR(SP_FIR_Direct_S16_ISfs, OMXSP_SUFFIX)
#define omxSP_FIR_Direct_S16_Sfs OMXCATBAR(SP_FIR_Direct_S16_Sfs, OMXSP_SUFFIX)
#define omxSP_FIROne_Direct_S16 OMXCATBAR(SP_FIROne_Direct_S16, OMXSP_SUFFIX)
#define omxSP_FIROne_Direct_S16_I OMXCATBAR(SP_FIROne_Direct_S16_I, OMXSP_SUFFIX)
#define omxSP_FIROne_Direct_S16_ISfs OMXCATBAR(SP_FIROne_Direct_S16_ISfs, OMXSP_SUFFIX)
#define omxSP_FIROne_Direct_S16_Sfs OMXCATBAR(SP_FIROne_Direct_S16_Sfs, OMXSP_SUFFIX)
#define omxSP_IIR_BiQuadDirect_S16 OMXCATBAR(SP_IIR_BiQuadDirect_S16, OMXSP_SUFFIX)
#define omxSP_IIR_BiQuadDirect_S16_I OMXCATBAR(SP_IIR_BiQuadDirect_S16_I, OMXSP_SUFFIX)
#define omxSP_IIR_Direct_S16 OMXCATBAR(SP_IIR_Direct_S16, OMXSP_SUFFIX)
#define omxSP_IIR_Direct_S16_I OMXCATBAR(SP_IIR_Direct_S16_I, OMXSP_SUFFIX)
#define omxSP_IIROne_BiQuadDirect_S16 OMXCATBAR(SP_IIROne_BiQuadDirect_S16, OMXSP_SUFFIX)
#define omxSP_IIROne_BiQuadDirect_S16_I OMXCATBAR(SP_IIROne_BiQuadDirect_S16_I, OMXSP_SUFFIX)
#define omxSP_IIROne_Direct_S16 OMXCATBAR(SP_IIROne_Direct_S16, OMXSP_SUFFIX)
#define omxSP_IIROne_Direct_S16_I OMXCATBAR(SP_IIROne_Direct_S16_I, OMXSP_SUFFIX)
#define omxVCCOMM_Average_16x OMXCATBAR(VCCOMM_Average_16x, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_Average_8x OMXCATBAR(VCCOMM_Average_8x, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_ComputeTextureErrorBlock OMXCATBAR(VCCOMM_ComputeTextureErrorBlock, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_ComputeTextureErrorBlock_SAD OMXCATBAR(VCCOMM_ComputeTextureErrorBlock_SAD, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_Copy16x16 OMXCATBAR(VCCOMM_Copy16x16, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_Copy8x8 OMXCATBAR(VCCOMM_Copy8x8, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_ExpandFrame_I OMXCATBAR(VCCOMM_ExpandFrame_I, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_LimitMVToRect OMXCATBAR(VCCOMM_LimitMVToRect, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_SAD_16x OMXCATBAR(VCCOMM_SAD_16x, OMXVCCOMM_SUFFIX)
#define omxVCCOMM_SAD_8x OMXCATBAR(VCCOMM_SAD_8x, OMXVCCOMM_SUFFIX)
#define omxVCM4P10_Average_4x OMXCATBAR(VCM4P10_Average_4x, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_BlockMatch_Half OMXCATBAR(VCM4P10_BlockMatch_Half, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_BlockMatch_Integer OMXCATBAR(VCM4P10_BlockMatch_Integer, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_BlockMatch_Quarter OMXCATBAR(VCM4P10_BlockMatch_Quarter, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_DeblockChroma_I OMXCATBAR(VCM4P10_DeblockChroma_I, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_DeblockLuma_I OMXCATBAR(VCM4P10_DeblockLuma_I, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_DecodeChromaDcCoeffsToPairCAVLC OMXCATBAR(VCM4P10_DecodeChromaDcCoeffsToPairCAVLC, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_DecodeCoeffsToPairCAVLC OMXCATBAR(VCM4P10_DecodeCoeffsToPairCAVLC, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_DequantTransformResidualFromPairAndAdd OMXCATBAR(VCM4P10_DequantTransformResidualFromPairAndAdd, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_FilterDeblockingChroma_HorEdge_I OMXCATBAR(VCM4P10_FilterDeblockingChroma_HorEdge_I, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_FilterDeblockingChroma_VerEdge_I OMXCATBAR(VCM4P10_FilterDeblockingChroma_VerEdge_I, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_FilterDeblockingLuma_HorEdge_I OMXCATBAR(VCM4P10_FilterDeblockingLuma_HorEdge_I, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_FilterDeblockingLuma_VerEdge_I OMXCATBAR(VCM4P10_FilterDeblockingLuma_VerEdge_I, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_GetVLCInfo OMXCATBAR(VCM4P10_GetVLCInfo, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InterpolateChroma OMXCATBAR(VCM4P10_InterpolateChroma, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InterpolateHalfHor_Luma OMXCATBAR(VCM4P10_InterpolateHalfHor_Luma, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InterpolateHalfVer_Luma OMXCATBAR(VCM4P10_InterpolateHalfVer_Luma, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InterpolateLuma OMXCATBAR(VCM4P10_InterpolateLuma, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InvTransformDequant_ChromaDC OMXCATBAR(VCM4P10_InvTransformDequant_ChromaDC, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InvTransformDequant_LumaDC OMXCATBAR(VCM4P10_InvTransformDequant_LumaDC, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_InvTransformResidualAndAdd OMXCATBAR(VCM4P10_InvTransformResidualAndAdd, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_MEGetBufSize OMXCATBAR(VCM4P10_MEGetBufSize, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_MEInit OMXCATBAR(VCM4P10_MEInit, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_MotionEstimationMB OMXCATBAR(VCM4P10_MotionEstimationMB, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_PredictIntra_16x16 OMXCATBAR(VCM4P10_PredictIntra_16x16, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_PredictIntra_4x4 OMXCATBAR(VCM4P10_PredictIntra_4x4, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_PredictIntraChroma_8x8 OMXCATBAR(VCM4P10_PredictIntraChroma_8x8, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_SAD_4x OMXCATBAR(VCM4P10_SAD_4x, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_SADQuar_16x OMXCATBAR(VCM4P10_SADQuar_16x, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_SADQuar_4x OMXCATBAR(VCM4P10_SADQuar_4x, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_SADQuar_8x OMXCATBAR(VCM4P10_SADQuar_8x, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_SATD_4x4 OMXCATBAR(VCM4P10_SATD_4x4, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_SubAndTransformQDQResidual OMXCATBAR(VCM4P10_SubAndTransformQDQResidual, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_TransformDequantChromaDCFromPair OMXCATBAR(VCM4P10_TransformDequantChromaDCFromPair, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_TransformDequantLumaDCFromPair OMXCATBAR(VCM4P10_TransformDequantLumaDCFromPair, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_TransformQuant_ChromaDC OMXCATBAR(VCM4P10_TransformQuant_ChromaDC, OMXVCM4P10_SUFFIX)
#define omxVCM4P10_TransformQuant_LumaDC OMXCATBAR(VCM4P10_TransformQuant_LumaDC, OMXVCM4P10_SUFFIX)
#define omxVCM4P2_BlockMatch_Half_16x16 OMXCATBAR(VCM4P2_BlockMatch_Half_16x16, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_BlockMatch_Half_8x8 OMXCATBAR(VCM4P2_BlockMatch_Half_8x8, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_BlockMatch_Integer_16x16 OMXCATBAR(VCM4P2_BlockMatch_Integer_16x16, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_BlockMatch_Integer_8x8 OMXCATBAR(VCM4P2_BlockMatch_Integer_8x8, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DCT8x8blk OMXCATBAR(VCM4P2_DCT8x8blk, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DecodeBlockCoef_Inter OMXCATBAR(VCM4P2_DecodeBlockCoef_Inter, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DecodeBlockCoef_Intra OMXCATBAR(VCM4P2_DecodeBlockCoef_Intra, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DecodePadMV_PVOP OMXCATBAR(VCM4P2_DecodePadMV_PVOP, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DecodeVLCZigzag_Inter OMXCATBAR(VCM4P2_DecodeVLCZigzag_Inter, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DecodeVLCZigzag_IntraACVLC OMXCATBAR(VCM4P2_DecodeVLCZigzag_IntraACVLC, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_DecodeVLCZigzag_IntraDCVLC OMXCATBAR(VCM4P2_DecodeVLCZigzag_IntraDCVLC, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_EncodeMV OMXCATBAR(VCM4P2_EncodeMV, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_EncodeVLCZigzag_Inter OMXCATBAR(VCM4P2_EncodeVLCZigzag_Inter, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_EncodeVLCZigzag_IntraACVLC OMXCATBAR(VCM4P2_EncodeVLCZigzag_IntraACVLC, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_EncodeVLCZigzag_IntraDCVLC OMXCATBAR(VCM4P2_EncodeVLCZigzag_IntraDCVLC, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_FindMVpred OMXCATBAR(VCM4P2_FindMVpred, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_IDCT8x8blk OMXCATBAR(VCM4P2_IDCT8x8blk, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_MCReconBlock OMXCATBAR(VCM4P2_MCReconBlock, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_MEGetBufSize OMXCATBAR(VCM4P2_MEGetBufSize, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_MEInit OMXCATBAR(VCM4P2_MEInit, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_MotionEstimationMB OMXCATBAR(VCM4P2_MotionEstimationMB, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_PredictReconCoefIntra OMXCATBAR(VCM4P2_PredictReconCoefIntra, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_QuantInter_I OMXCATBAR(VCM4P2_QuantInter_I, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_QuantIntra_I OMXCATBAR(VCM4P2_QuantIntra_I, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_QuantInvInter_I OMXCATBAR(VCM4P2_QuantInvInter_I, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_QuantInvIntra_I OMXCATBAR(VCM4P2_QuantInvIntra_I, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_TransRecBlockCoef_inter OMXCATBAR(VCM4P2_TransRecBlockCoef_inter, OMXVCM4P2_SUFFIX)
#define omxVCM4P2_TransRecBlockCoef_intra OMXCATBAR(VCM4P2_TransRecBlockCoef_intra, OMXVCM4P2_SUFFIX)
#endif /* endif ARMOMX_ENABLE_RENAMING */
#endif /* _armOMX_h_ */
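
The renaming table above routes every public `omx...` name through `OMXCATBAR` plus a per-domain suffix. The definition of `OMXCATBAR` is not part of this excerpt, so what follows is only a minimal sketch of how token-pasting renaming of this kind usually works. `DEMO_PASTE`, `DEMO_CATBAR`, `DEMO_SP_SUFFIX`, the `vendorX` suffix, and the function body are all hypothetical stand-ins, not the header's actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the real OMXCATBAR and OMXSP_SUFFIX definitions
 * are outside this excerpt. */
#define DEMO_PASTE(a, b) omx##a##_##b
/* Extra level of indirection so the suffix macro is expanded before pasting. */
#define DEMO_CATBAR(a, b) DEMO_PASTE(a, b)
#define DEMO_SP_SUFFIX vendorX

/* Same shape as the entries in the renaming table. */
#define omxSP_Copy_S16 DEMO_CATBAR(SP_Copy_S16, DEMO_SP_SUFFIX)

/* After preprocessing, this defines a function named omxSP_Copy_S16_vendorX. */
int omxSP_Copy_S16(const short* src, short* dst, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    dst[i] = src[i];
  }
  return 0; /* plays the role of OMX_Sts_NoErr */
}

/* Redeclaring the suffixed name shows that both spellings are one symbol. */
int omxSP_Copy_S16_vendorX(const short* src, short* dst, size_t len);
```

Callers keep writing `omxSP_Copy_S16(...)`; only the linker-visible symbol carries the suffix, which is presumably how two OpenMAX DL builds could coexist in one binary.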


@ -1,286 +0,0 @@
/**
* File: omxtypes.h
* Brief: Defines basic Data types used in OpenMAX v1.0.2 header files.
*
* Copyright (c) 2005-2008,2015 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
* "Materials"), to deal in the Materials without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Materials, and to
* permit persons to whom the Materials are furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
*
*/
#ifndef _OMXTYPES_H_
#define _OMXTYPES_H_
#include <limits.h>
#ifdef __cplusplus
extern "C" {
#endif
/*
* Maximum FFT order supported by the twiddle table. Only used by the
* float FFT routines. Must be consistent with the table in
* armSP_FFT_F32TwiddleTable.c.
*/
#ifdef BIG_FFT_TABLE
#define TWIDDLE_TABLE_ORDER 15
#else
#define TWIDDLE_TABLE_ORDER 12
#endif
#define OMX_IN
#define OMX_OUT
#define OMX_INOUT
typedef enum {
/* Mandatory return codes - use cases are explicitly described for each function */
OMX_Sts_NoErr = 0, /* No error, the function completed successfully */
OMX_Sts_Err = -2, /* Unknown/unspecified error */
OMX_Sts_InvalidBitstreamValErr = -182, /* Invalid value detected during bitstream processing */
OMX_Sts_MemAllocErr = -9, /* Not enough memory allocated for the operation */
OMX_StsACAAC_GainCtrErr = -159, /* AAC: Unsupported gain control data detected */
OMX_StsACAAC_PrgNumErr = -167, /* AAC: Invalid number of elements for one program */
OMX_StsACAAC_CoefValErr = -163, /* AAC: Invalid quantized coefficient value */
OMX_StsACAAC_MaxSfbErr = -162, /* AAC: Invalid maxSfb value in relation to numSwb */
OMX_StsACAAC_PlsDataErr = -160, /* AAC: pulse escape sequence data error */
/* Optional return codes - use cases are explicitly described for each function*/
OMX_Sts_BadArgErr = -5, /* Bad Arguments */
OMX_StsACAAC_TnsNumFiltErr = -157, /* AAC: Invalid number of TNS filters */
OMX_StsACAAC_TnsLenErr = -156, /* AAC: Invalid TNS region length */
OMX_StsACAAC_TnsOrderErr = -155, /* AAC: Invalid order of TNS filter */
OMX_StsACAAC_TnsCoefResErr = -154, /* AAC: Invalid bit-resolution for TNS filter coefficients */
OMX_StsACAAC_TnsCoefErr = -153, /* AAC: Invalid TNS filter coefficients */
OMX_StsACAAC_TnsDirectErr = -152, /* AAC: Invalid TNS filter direction */
OMX_StsICJP_JPEGMarkerErr = -183, /* JPEG marker encountered within an entropy-coded block; */
/* Huffman decoding operation terminated early. */
OMX_StsICJP_JPEGMarker = -181, /* JPEG marker encountered; Huffman decoding */
/* operation terminated early. */
OMX_StsIPPP_ContextMatchErr = -17, /* Context parameter doesn't match the operation */
OMX_StsSP_EvenMedianMaskSizeErr = -180, /* Even size of the Median Filter mask was replaced by the odd one */
OMX_Sts_MaximumEnumeration = INT_MAX /*Placeholder, forces enum of size OMX_INT*/
} OMXResult; /** Return value or error value returned from a function. Identical to OMX_INT */
/* OMX_U8 */
#if UCHAR_MAX == 0xff
typedef unsigned char OMX_U8;
#elif USHRT_MAX == 0xff
typedef unsigned short int OMX_U8;
#else
#error OMX_U8 undefined
#endif
/* OMX_S8 */
#if SCHAR_MAX == 0x7f
typedef signed char OMX_S8;
#elif SHRT_MAX == 0x7f
typedef signed short int OMX_S8;
#else
#error OMX_S8 undefined
#endif
/* OMX_U16 */
#if USHRT_MAX == 0xffff
typedef unsigned short int OMX_U16;
#elif UINT_MAX == 0xffff
typedef unsigned int OMX_U16;
#else
#error OMX_U16 undefined
#endif
/* OMX_S16 */
#if SHRT_MAX == 0x7fff
typedef signed short int OMX_S16;
#elif INT_MAX == 0x7fff
typedef signed int OMX_S16;
#else
#error OMX_S16 undefined
#endif
/* OMX_U32 */
#if UINT_MAX == 0xffffffff
typedef unsigned int OMX_U32;
#elif ULONG_MAX == 0xffffffff
typedef unsigned long int OMX_U32;
#else
#error OMX_U32 undefined
#endif
/* OMX_S32 */
#if INT_MAX == 0x7fffffff
typedef signed int OMX_S32;
#elif LONG_MAX == 0x7fffffff
typedef long signed int OMX_S32;
#else
#error OMX_S32 undefined
#endif
/* OMX_U64 & OMX_S64 */
#if defined( _WIN32 ) || defined ( _WIN64 )
typedef __int64 OMX_S64; /** Signed 64-bit integer */
typedef unsigned __int64 OMX_U64; /** Unsigned 64-bit integer */
#define OMX_MIN_S64 (0x8000000000000000i64)
#define OMX_MIN_U64 (0x0000000000000000i64)
#define OMX_MAX_S64 (0x7FFFFFFFFFFFFFFFi64)
#define OMX_MAX_U64 (0xFFFFFFFFFFFFFFFFi64)
#else
typedef long long OMX_S64; /** Signed 64-bit integer */
typedef unsigned long long OMX_U64; /** Unsigned 64-bit integer */
#define OMX_MIN_S64 (0x8000000000000000LL)
#define OMX_MIN_U64 (0x0000000000000000LL)
#define OMX_MAX_S64 (0x7FFFFFFFFFFFFFFFLL)
#define OMX_MAX_U64 (0xFFFFFFFFFFFFFFFFLL)
#endif
/* OMX_SC8 */
typedef struct
{
OMX_S8 Re; /** Real part */
OMX_S8 Im; /** Imaginary part */
} OMX_SC8; /** Signed 8-bit complex number */
/* OMX_SC16 */
typedef struct
{
OMX_S16 Re; /** Real part */
OMX_S16 Im; /** Imaginary part */
} OMX_SC16; /** Signed 16-bit complex number */
/* OMX_SC32 */
typedef struct
{
OMX_S32 Re; /** Real part */
OMX_S32 Im; /** Imaginary part */
} OMX_SC32; /** Signed 32-bit complex number */
/* OMX_SC64 */
typedef struct
{
OMX_S64 Re; /** Real part */
OMX_S64 Im; /** Imaginary part */
} OMX_SC64; /** Signed 64-bit complex number */
/* OMX_F32 */
typedef float OMX_F32; /** Single precision floating point, IEEE 754 */
/* OMX_F64 */
typedef double OMX_F64; /** Double precision floating point, IEEE 754 */
/* OMX_FC32 */
typedef struct
{
OMX_F32 Re; /** Real part */
OMX_F32 Im; /** Imaginary part */
} OMX_FC32; /** single precision floating point complex number */
/* OMX_FC64 */
typedef struct
{
OMX_F64 Re; /** Real part */
OMX_F64 Im; /** Imaginary part */
} OMX_FC64; /** double precision floating point complex number */
/* OMX_INT */
typedef int OMX_INT; /** signed integer corresponding to machine word length, has maximum signed value INT_MAX*/
#define OMX_MIN_S8 (-128)
#define OMX_MIN_U8 0
#define OMX_MIN_S16 (-32768)
#define OMX_MIN_U16 0
#define OMX_MIN_S32 (-2147483647-1)
#define OMX_MIN_U32 0
#define OMX_MAX_S8 (127)
#define OMX_MAX_U8 (255)
#define OMX_MAX_S16 (32767)
#define OMX_MAX_U16 (0xFFFF)
#define OMX_MAX_S32 (2147483647)
#define OMX_MAX_U32 (0xFFFFFFFF)
typedef void OMXVoid;
#ifndef NULL
#define NULL ((void*)0)
#endif
/** Defines the geometric position and size of a rectangle,
* where x,y defines the coordinates of the top left corner
* of the rectangle, with dimensions width in the x-direction
* and height in the y-direction */
typedef struct {
OMX_INT x; /** x-coordinate of top left corner of rectangle */
OMX_INT y; /** y-coordinate of top left corner of rectangle */
OMX_INT width; /** Width in the x-direction. */
OMX_INT height; /** Height in the y-direction. */
}OMXRect;
/** Defines the geometric position of a point */
typedef struct
{
OMX_INT x; /** x-coordinate */
OMX_INT y; /** y-coordinate */
} OMXPoint;
/** Defines the dimensions of a rectangle, or region of interest in an image */
typedef struct
{
OMX_INT width; /** Width of the rectangle, in the x-direction */
OMX_INT height; /** Height of the rectangle, in the y-direction */
} OMXSize;
#ifdef __cplusplus
}
#endif
#endif /* _OMXTYPES_H_ */
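
The header above picks each fixed-width type by comparing `<limits.h>` constants in the preprocessor. As a minimal sketch of that pattern (C11, with the hypothetical names `DEMO_S32`, `DEMO_MIN_S32`, and `DEMO_MAX_S32` instead of the real `OMX_` names), the selection can be cross-checked against `<stdint.h>`:

```c
#include <limits.h>
#include <stdint.h>

/* Same selection pattern as OMX_S32, under a hypothetical name. */
#if INT_MAX == 0x7fffffff
typedef signed int DEMO_S32;
#elif LONG_MAX == 0x7fffffff
typedef signed long DEMO_S32;
#else
#error DEMO_S32 undefined
#endif

/* Written as (-2147483647 - 1) because the literal 2147483648 does not fit
 * in a 32-bit int, so negating it directly would not give an int constant. */
#define DEMO_MIN_S32 (-2147483647 - 1)
#define DEMO_MAX_S32 (2147483647)

/* Compile-time check that the selected type really is 32 bits wide. */
_Static_assert(sizeof(DEMO_S32) * CHAR_BIT == 32, "DEMO_S32 must be 32 bits");

/* Returns 1 when the hand-written limits agree with <stdint.h>. */
int demo_s32_limits_match_stdint(void) {
  return DEMO_MIN_S32 == INT32_MIN && DEMO_MAX_S32 == INT32_MAX;
}
```

On a toolchain that provides `<stdint.h>`, the typedef ladder is largely redundant; it is kept here only to mirror the style of the removed header.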


@ -1,76 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This file was originally licensed as follows. It has been
@// relicensed with permission from the copyright holders.
@//
@//
@// File Name: omxtypes_s.h
@// OpenMAX DL: v1.0.2
@// Last Modified Revision: 9622
@// Last Modified Date: Wed, 06 Feb 2008
@//
@// (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
@//
@//
@// Mandatory return codes - use cases are explicitly described for each function
.equ OMX_Sts_NoErr, 0 @// No error, the function completed successfully
.equ OMX_Sts_Err, -2 @// Unknown/unspecified error
.equ OMX_Sts_InvalidBitstreamValErr, -182 @// Invalid value detected during bitstream processing
.equ OMX_Sts_MemAllocErr, -9 @// Not enough memory allocated for the operation
.equ OMX_StsACAAC_GainCtrErr, -159 @// AAC: Unsupported gain control data detected
.equ OMX_StsACAAC_PrgNumErr, -167 @// AAC: Invalid number of elements for one program
.equ OMX_StsACAAC_CoefValErr, -163 @// AAC: Invalid quantized coefficient value
.equ OMX_StsACAAC_MaxSfbErr, -162 @// AAC: Invalid maxSfb value in relation to numSwb
.equ OMX_StsACAAC_PlsDataErr, -160 @// AAC: pulse escape sequence data error
@// Optional return codes - use cases are explicitly described for each function
.equ OMX_Sts_BadArgErr, -5 @// Bad Arguments
.equ OMX_StsACAAC_TnsNumFiltErr, -157 @// AAC: Invalid number of TNS filters
.equ OMX_StsACAAC_TnsLenErr, -156 @// AAC: Invalid TNS region length
.equ OMX_StsACAAC_TnsOrderErr, -155 @// AAC: Invalid order of TNS filter
.equ OMX_StsACAAC_TnsCoefResErr, -154 @// AAC: Invalid bit-resolution for TNS filter coefficients
.equ OMX_StsACAAC_TnsCoefErr, -153 @// AAC: Invalid TNS filter coefficients
.equ OMX_StsACAAC_TnsDirectErr, -152 @// AAC: Invalid TNS filter direction
.equ OMX_StsICJP_JPEGMarkerErr, -183 @// JPEG marker encountered within an entropy-coded block;
@// Huffman decoding operation terminated early.
.equ OMX_StsICJP_JPEGMarker, -181 @// JPEG marker encountered; Huffman decoding
@// operation terminated early.
.equ OMX_StsIPPP_ContextMatchErr, -17 @// Context parameter doesn't match the operation
.equ OMX_StsSP_EvenMedianMaskSizeErr, -180 @// Even size of the Median Filter mask was replaced by the odd one
.equ OMX_Sts_MaximumEnumeration, 0x7FFFFFFF
.equ OMX_MIN_S8, (-128)
.equ OMX_MIN_U8, 0
.equ OMX_MIN_S16, (-32768)
.equ OMX_MIN_U16, 0
.equ OMX_MIN_S32, (-2147483647-1)
.equ OMX_MIN_U32, 0
.equ OMX_MAX_S8, (127)
.equ OMX_MAX_U8, (255)
.equ OMX_MAX_S16, (32767)
.equ OMX_MAX_U16, (0xFFFF)
.equ OMX_MAX_S32, (2147483647)
.equ OMX_MAX_U32, (0xFFFFFFFF)
.equ OMX_VC_UPPER, 0x1 @// Used by the PredictIntra functions
.equ OMX_VC_LEFT, 0x2 @// Used by the PredictIntra functions
.equ OMX_VC_UPPER_RIGHT, 0x40 @// Used by the PredictIntra functions
.equ NULL, 0
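
The `.equ` table above mirrors the `OMXResult` enum from `omxtypes.h` so that assembly and C agree on status values. A minimal C-side sketch of how a caller might interpret a few of those values follows; the `DEMO_` names and both helper functions are hypothetical, and only the numeric values are taken from the table.

```c
/* A few status values copied from the table; names are hypothetical. */
enum {
  DEMO_Sts_NoErr = 0,
  DEMO_Sts_Err = -2,
  DEMO_Sts_BadArgErr = -5,
  DEMO_Sts_MemAllocErr = -9
};

/* Maps a status value to the name used in the headers. */
const char* demo_result_name(int code) {
  switch (code) {
    case DEMO_Sts_NoErr:       return "OMX_Sts_NoErr";
    case DEMO_Sts_Err:         return "OMX_Sts_Err";
    case DEMO_Sts_BadArgErr:   return "OMX_Sts_BadArgErr";
    case DEMO_Sts_MemAllocErr: return "OMX_Sts_MemAllocErr";
    default:                   return "unknown";
  }
}

/* Only zero means success; every listed error code is negative. */
int demo_succeeded(int code) {
  return code == DEMO_Sts_NoErr;
}
```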


@ -1,49 +0,0 @@
# -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*-
# vim: set filetype=python:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
if CONFIG['TARGET_CPU'] == 'arm' and CONFIG['BUILD_ARM_NEON']:
Library('openmax_dl')
EXPORTS.dl.api += [
'api/armCOMM_s.h',
'api/armOMX.h',
'api/omxtypes.h',
'api/omxtypes_s.h',
]
EXPORTS.dl.sp.api += [
'sp/api/armSP.h',
'sp/api/omxSP.h',
]
SOURCES += [
'sp/src/armSP_FFT_F32TwiddleTable.c',
'sp/src/omxSP_FFTGetBufSize_R_F32.c',
'sp/src/omxSP_FFTGetBufSize_R_S32.c',
'sp/src/omxSP_FFTInit_R_F32.c',
]
SOURCES += [
'sp/src/armSP_FFT_CToC_FC32_Radix2_fs_unsafe_s.S',
'sp/src/armSP_FFT_CToC_FC32_Radix2_ls_unsafe_s.S',
'sp/src/armSP_FFT_CToC_FC32_Radix2_unsafe_s.S',
'sp/src/armSP_FFT_CToC_FC32_Radix4_fs_unsafe_s.S',
'sp/src/armSP_FFT_CToC_FC32_Radix4_ls_unsafe_s.S',
'sp/src/armSP_FFT_CToC_FC32_Radix4_unsafe_s.S',
'sp/src/armSP_FFT_CToC_FC32_Radix8_fs_unsafe_s.S',
'sp/src/armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe_s.S',
'sp/src/omxSP_FFTFwd_RToCCS_F32_Sfs_s.S',
'sp/src/omxSP_FFTInv_CCSToR_F32_Sfs_unscaled_s.S',
]
LOCAL_INCLUDES += [
'..',
'api'
]
DEFINES['BIG_FFT_TABLE'] = True
FINAL_LIBRARY = 'xul'


@ -1,92 +0,0 @@
/*
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*
* This file was originally licensed as follows. It has been
* relicensed with permission from the copyright holders.
*/
/**
*
* File Name: armSP.h
* OpenMAX DL: v1.0.2
* Last Modified Revision: 7014
* Last Modified Date: Wed, 01 Aug 2007
*
* (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
*
*
*
* File: armSP.h
* Brief: Declares APIs and basic data types used across the OpenMAX Signal Processing domain
*
*/
#ifndef _armSP_H_
#define _armSP_H_
#include "dl/api/omxtypes.h"
#ifdef __cplusplus
extern "C" {
#endif
/** FFT Specific declarations */
extern OMX_S32 armSP_FFT_S32TwiddleTable[1026];
extern OMX_F32 armSP_FFT_F32TwiddleTable[];
typedef struct ARMsFFTSpec_SC32_Tag
{
OMX_U32 N;
OMX_U16 *pBitRev;
OMX_SC32 *pTwiddle;
OMX_SC32 *pBuf;
}ARMsFFTSpec_SC32;
typedef struct ARMsFFTSpec_SC16_Tag
{
OMX_U32 N;
OMX_U16 *pBitRev;
OMX_SC16 *pTwiddle;
OMX_SC16 *pBuf;
}ARMsFFTSpec_SC16;
typedef struct ARMsFFTSpec_R_SC32_Tag
{
OMX_U32 N;
OMX_U16 *pBitRev;
OMX_SC32 *pTwiddle;
OMX_S32 *pBuf;
}ARMsFFTSpec_R_SC32;
typedef struct ARMsFFTSpec_R_FC32_Tag
{
OMX_U32 N;
OMX_U16* pBitRev;
OMX_FC32* pTwiddle;
OMX_F32* pBuf;
} ARMsFFTSpec_R_FC32;
typedef struct ARMsFFTSpec_FC32_Tag
{
OMX_U32 N;
OMX_U16* pBitRev;
OMX_FC32* pTwiddle;
OMX_FC32* pBuf;
} ARMsFFTSpec_FC32;
#ifdef __cplusplus
}
#endif
#endif
/*End of File*/
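
The NEON sources removed in this commit read the FFT spec fields at fixed byte offsets (`ARMsFFTSpec_N` at 0, `pBitRev` at 4, `pTwiddle` at 8, `pBuf` at 12), which only matches these structs when pointers are 4 bytes wide. The sketch below checks that layout with a hypothetical mirror struct, `DemoFFTSpec32`, in which `uint32_t` stands in for each 32-bit pointer; it is an illustration of the assumption, not code from the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit mirror of ARMsFFTSpec_R_FC32: each pointer field is
 * modeled as a uint32_t, as it would be laid out on a 32-bit ARM target. */
typedef struct {
  uint32_t N;
  uint32_t pBitRev;  /* stands in for OMX_U16*  */
  uint32_t pTwiddle; /* stands in for OMX_FC32* */
  uint32_t pBuf;     /* stands in for OMX_F32*  */
} DemoFFTSpec32;

/* The offsets hardcoded by the assembly, checked at compile time (C11). */
_Static_assert(offsetof(DemoFFTSpec32, N) == 0, "N at offset 0");
_Static_assert(offsetof(DemoFFTSpec32, pBitRev) == 4, "pBitRev at offset 4");
_Static_assert(offsetof(DemoFFTSpec32, pTwiddle) == 8, "pTwiddle at offset 8");
_Static_assert(offsetof(DemoFFTSpec32, pBuf) == 12, "pBuf at offset 12");

/* Runtime view of the same fact, convenient for a quick test. */
size_t demo_pbuf_offset(void) {
  return offsetof(DemoFFTSpec32, pBuf);
}
```

On a 64-bit target the real structs would place `pTwiddle` at offset 16, which fits with the build file enabling this code only when `TARGET_CPU` is `arm`.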

File diff suppressed because it is too large


@ -1,294 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of
@// armSP_FFTInv_CCSToR_S32_preTwiddleRadix2_unsafe_s.s to support float
@// instead of SC32.
@//
@//
@// Description:
@// Compute the "preTwiddleRadix2" stage prior to the call to the complexFFT
@// It does a Z(k) = Feven(k) + jW^(-k) FOdd(k); k=0,1,2,...N/2-1 computation
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r1
#define pFFTSpec r2
#define scale r3
@// Output registers
#define result r0
@//Local Scratch Registers
#define argTwiddle r1
#define argDst r2
#define argScale r4
#define tmpOrder r4
#define pTwiddle r4
#define pOut r5
#define subFFTSize r7
#define subFFTNum r6
#define N r6
#define order r14
#define diff r9
@// Total num of radix stages required to complete the FFT
#define count r8
#define x0r r4
#define x0i r5
#define diffMinusOne r2
#define round r3
#define pOut1 r2
#define size r7
#define step r8
#define step1 r9
#define twStep r10
#define pTwiddleTmp r11
#define argTwiddle1 r12
#define zero r14
@// Neon registers
#define dX0 D0
#define dShift D1
#define dX1 D1
#define dY0 D2
#define dY1 D3
#define dX0r D0
#define dX0i D1
#define dX1r D2
#define dX1i D3
#define dW0r D4
#define dW0i D5
#define dW1r D6
#define dW1i D7
#define dT0 D8
#define dT1 D9
#define dT2 D10
#define dT3 D11
#define qT0 D12
#define qT1 D14
#define qT2 D16
#define qT3 D18
#define dY0r D4
#define dY0i D5
#define dY1r D6
#define dY1i D7
#define dY2 D4
#define dY3 D5
#define dW0 D6
#define dW1 D7
#define dW0Tmp D10
#define dW1Neg D11
#define half D13
@ Structure offsets for the FFTSpec
.set ARMsFFTSpec_N, 0
.set ARMsFFTSpec_pBitRev, 4
.set ARMsFFTSpec_pTwiddle, 8
.set ARMsFFTSpec_pBuf, 12
.MACRO FFTSTAGE scaled, inverse, name
@// Read the size from structure and take log
LDR N, [pFFTSpec, #ARMsFFTSpec_N]
@// Read other structure parameters
LDR pTwiddle, [pFFTSpec, #ARMsFFTSpec_pTwiddle]
LDR pOut, [pFFTSpec, #ARMsFFTSpec_pBuf]
VMOV.F32 half, #0.5
MOV size,N,ASR #1 @// preserve the contents of N
MOV step,N,LSL #2 @// step = N/2 * 8 bytes
@// Z(k) = 1/2 {[F(k) + F'(N/2-k)] +j*W^(-k) [F(k) - F'(N/2-k)]}
@// Note: W^(k) is stored as negated value and also need to
@// conjugate the values from the table
@// Z(0) : no need of twiddle multiply
@// Z(0) = 1/2 { [F(0) + F'(N/2)] +j [F(0) - F'(N/2)] }
VLD1.F32 dX0,[pSrc],step
ADD pOut1,pOut,step @// pOut1 = pOut+ N/2*8 bytes
VLD1.F32 dX1,[pSrc]!
@// twStep = 3N/8 * 8 bytes pointing to W^1
SUB twStep,step,size,LSL #1
MOV step1,size,LSL #2 @// step1 = N/4 * 8 = N/2*4 bytes
SUB step1,step1,#8 @// (N/4-1)*8 bytes
VADD.F32 dY0,dX0,dX1 @// [b+d | a+c]
VSUB.F32 dY1,dX0,dX1 @// [b-d | a-c]
VMUL.F32 dY0, dY0, half[0]
VMUL.F32 dY1, dY1, half[0]
@// dY0= [a-c | a+c] ;dY1= [b-d | b+d]
VZIP.F32 dY0,dY1
VSUB.F32 dX0,dY0,dY1
SUBS size,size,#2
VADD.F32 dX1,dY0,dY1
SUB pSrc,pSrc,step
VST1.F32 dX0[0],[pOut1]!
ADD pTwiddleTmp,pTwiddle,#8 @// W^2
VST1.F32 dX1[1],[pOut1]!
ADD argTwiddle1,pTwiddle,twStep @// W^1
BLT decrementScale\name
BEQ lastElement\name
@// Z(k) = 1/2[F(k) + F'(N/2-k)] +j*W^(-k) [F(k) - F'(N/2-k)]
@// Note: W^k is stored as negative values in the table and also
@// need to conjugate the values from the table.
@//
@// Process 4 elements at a time. E.g: Z(1),Z(2) and Z(N/2-2),Z(N/2-1)
@// since both of them require F(1),F(2) and F(N/2-2),F(N/2-1)
SUB step,step,#24
evenOddButterflyLoop\name :
VLD1.F32 dW0r,[argTwiddle1],step1
VLD1.F32 dW1r,[argTwiddle1]!
VLD2.F32 {dX0r,dX0i},[pSrc],step
SUB argTwiddle1,argTwiddle1,step1
VLD2.F32 {dX1r,dX1i},[pSrc]!
SUB step1,step1,#8 @// (N/4-2)*8 bytes
VLD1.F32 dW0i,[pTwiddleTmp],step1
VLD1.F32 dW1i,[pTwiddleTmp]!
SUB pSrc,pSrc,step
SUB pTwiddleTmp,pTwiddleTmp,step1
VREV64.F32 dX1r,dX1r
VREV64.F32 dX1i,dX1i
SUBS size,size,#4
VSUB.F32 dT2,dX0r,dX1r @// a-c
VADD.F32 dT3,dX0i,dX1i @// b+d
VADD.F32 dT0,dX0r,dX1r @// a+c
VSUB.F32 dT1,dX0i,dX1i @// b-d
SUB step1,step1,#8
VMUL.F32 dT2, dT2, half[0]
VMUL.F32 dT3, dT3, half[0]
VMUL.F32 dT0, dT0, half[0]
VMUL.F32 dT1, dT1, half[0]
VZIP.F32 dW1r,dW1i
VZIP.F32 dW0r,dW0i
VMUL.F32 dX1r,dW1r,dT2
VMUL.F32 dX1i,dW1r,dT3
VMUL.F32 dX0r,dW0r,dT2
VMUL.F32 dX0i,dW0r,dT3
VMLS.F32 dX1r,dW1i,dT3
VMLA.F32 dX1i,dW1i,dT2
VMLA.F32 dX0r,dW0i,dT3
VMLS.F32 dX0i,dW0i,dT2
VADD.F32 dY1r,dT0,dX1i @// F(N/2 -1)
VSUB.F32 dY1i,dX1r,dT1
VREV64.F32 dY1r,dY1r
VREV64.F32 dY1i,dY1i
VADD.F32 dY0r,dT0,dX0i @// F(1)
VSUB.F32 dY0i,dT1,dX0r
VST2.F32 {dY0r,dY0i},[pOut1],step
VST2.F32 {dY1r,dY1i},[pOut1]!
SUB pOut1,pOut1,step
SUB step,step,#32 @// (N/2-4)*8 bytes
BGT evenOddButterflyLoop\name
@// set both the ptrs to the last element
SUB pSrc,pSrc,#8
SUB pOut1,pOut1,#8
@// Last element can be expanded as follows
@// 1/2[Z(k) + Z'(k)] - j w^-k [Z(k) - Z'(k)] (since W^k is stored as
@// -ve)
@// 1/2[(a+jb) + (a-jb)] - j w^-k [(a+jb) - (a-jb)]
@// 1/2[2a+j0] - j (c-jd) [0+j2b]
@// (a+bc, -bd)
@// Since (c,d) = (0,1) for the last element, result is just (a,-b)
lastElement\name :
VLD1.F32 dX0r,[pSrc]
VST1.F32 dX0r[0],[pOut1]!
VNEG.F32 dX0r,dX0r
VST1.F32 dX0r[1],[pOut1]
decrementScale\name :
.endm
M_START armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe,r4
FFTSTAGE "FALSE","TRUE",Inv
M_END
.end
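
The comments in the routine above give the pre-twiddle formula Z(k) = 1/2 {[F(k) + F'(N/2-k)] + j*W^(-k) [F(k) - F'(N/2-k)]}, where F' denotes the complex conjugate. A scalar C sketch of that formula may make the NEON code easier to follow. It is not a translation of the assembly: `demo_cpx` and `demo_pre_twiddle` are hypothetical names, and the twiddle table is assumed to hold W^(-k) directly (with W = e^(-2*pi*j/N)), whereas the assembly notes that its table stores negated values that must also be conjugated.

```c
#include <stddef.h>

typedef struct {
  float re;
  float im;
} demo_cpx; /* hypothetical stand-in for OMX_FC32 */

/*
 * F      : the n/2 + 1 CCS bins F(0)..F(n/2) of a length-n real signal.
 * w_neg  : W^(-k) for k = 0..n/2-1 (assumed layout, see lead-in).
 * Z      : receives n/2 values, the input to a complex inverse FFT of
 *          size n/2.
 */
void demo_pre_twiddle(const demo_cpx* F, const demo_cpx* w_neg, demo_cpx* Z,
                      size_t n) {
  size_t half = n / 2;
  for (size_t k = 0; k < half; ++k) {
    demo_cpx a = F[k];
    demo_cpx b = F[half - k]; /* conjugated in the sums below */

    /* s = F(k) + F'(n/2-k), d = F(k) - F'(n/2-k) */
    float s_re = a.re + b.re;
    float s_im = a.im - b.im;
    float d_re = a.re - b.re;
    float d_im = a.im + b.im;

    /* t = W^(-k) * d */
    float t_re = w_neg[k].re * d_re - w_neg[k].im * d_im;
    float t_im = w_neg[k].re * d_im + w_neg[k].im * d_re;

    /* Z(k) = (s + j*t) / 2, where j*t = (-t_im, t_re) */
    Z[k].re = 0.5f * (s_re - t_im);
    Z[k].im = 0.5f * (s_im + t_re);
  }
}
```

For the real signal x = {1, 2, 3, 4}, the CCS bins are F = {10, -2+2j, -2} and W^(-k) = {1, j}, which gives Z = {4+6j, -2-2j}. The 2-point inverse DFT of Z is {1+2j, 3+4j}, that is, x(2n) + j*x(2n+1), so the real output can be read off by interleaving real and imaginary parts.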


@ -1,134 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix2_fs_unsafe_s.S
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute the first stage of a Radix 2 DIT in-order out-of-place FFT
@// stage for a N point complex signal.
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define pPingPongBuf r5
#define subFFTNum r6
#define subFFTSize r7
@//Output Registers
@//Local Scratch Registers
#define pointStep r3
#define outPointStep r3
#define grpSize r4
#define setCount r4
#define step r8
#define dstStep r8
@// Neon Registers
#define dX0 D0
#define dX1 D1
#define dY0 D2
#define dY1 D3
.MACRO FFTSTAGE scaled, inverse, name
@// Define stack arguments
@// update subFFTNum (r6) and subFFTSize (r7) for the next stage
MOV subFFTSize,#2
LSR grpSize,subFFTNum,#1
MOV subFFTNum,grpSize
@// pT0+1 increments pT0 by 8 bytes
@// pT0+pointStep = increment of 8*pointStep bytes = 4*grpSize bytes
@// Note: outPointStep = pointStep for the first stage
@// Note: setCount = grpSize/2 (reuse the updated grpSize for setCount)
MOV pointStep,grpSize,LSL #3
RSB step,pointStep,#8
@// Loop on the sets for grp zero
grpZeroSetLoop\name :
VLD1.F32 dX0,[pSrc],pointStep
VLD1.F32 dX1,[pSrc],step @// step = -pointStep + 8
SUBS setCount,setCount,#1
VADD.F32 dY0,dX0,dX1
VSUB.F32 dY1,dX0,dX1
VST1.F32 dY0,[pDst],outPointStep
@// dstStep = step = -pointStep + 8
VST1.F32 dY1,[pDst],dstStep
BGT grpZeroSetLoop\name
@// reset pSrc to pDst for the next stage
SUB pSrc,pDst,pointStep @// pDst -= 2*grpSize
MOV pDst,pPingPongBuf
.endm
M_START armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","FALSE",fwd
M_END
M_START armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",inv
M_END
.end
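
The first-stage loop above loads one complex value from each half of the input, forms their sum and difference, and stores the results half a buffer apart. A scalar C sketch of that butterfly, read off the load and store strides in the macro, is given here for orientation. `demo_cpx` and `demo_radix2_first_stage` are hypothetical names, and the sketch leaves out the ping-pong buffer swap and the `subFFTNum`/`subFFTSize` bookkeeping that the macro performs for later stages.

```c
#include <stddef.h>

typedef struct {
  float re;
  float im;
} demo_cpx; /* hypothetical stand-in for OMX_FC32 */

/*
 * First radix-2 stage, out of place: no twiddle multiply is needed because
 * the only twiddle factor in this stage is 1.
 *   dst[i]       = src[i] + src[i + n/2]
 *   dst[i + n/2] = src[i] - src[i + n/2]
 */
void demo_radix2_first_stage(const demo_cpx* src, demo_cpx* dst, size_t n) {
  size_t half = n / 2;
  for (size_t i = 0; i < half; ++i) {
    demo_cpx a = src[i];
    demo_cpx b = src[i + half];
    dst[i].re = a.re + b.re;
    dst[i].im = a.im + b.im;
    dst[i + half].re = a.re - b.re;
    dst[i + half].im = a.im - b.im;
  }
}
```

The forward and inverse entry points can share this stage unchanged, which is consistent with the macro above ignoring its `inverse` argument.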


@ -1,153 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix2_ls_unsafe_s.S
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute the last stage of a Radix 2 DIT in-order out-of-place FFT
@// for an N point complex signal.
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define subFFTNum r6
#define subFFTSize r7
@//Output Registers
@//Local Scratch Registers
#define outPointStep r3
#define grpCount r4
#define dstStep r5
#define pTmp r4
@// Neon Registers
#define dWr d0
#define dWi d1
#define dXr0 d2
#define dXi0 d3
#define dXr1 d4
#define dXi1 d5
#define dYr0 d6
#define dYi0 d7
#define dYr1 d8
#define dYi1 d9
#define qT0 d10
#define qT1 d12
.MACRO FFTSTAGE scaled, inverse, name
MOV outPointStep,subFFTSize,LSL #3
@// Update grpCount and grpSize right away
MOV subFFTNum,#1 @//after the last stage
LSL grpCount,subFFTSize,#1
@// update subFFTSize for the next stage
MOV subFFTSize,grpCount
RSB dstStep,outPointStep,#16
@// Loop on 2 grps at a time for the last stage
radix2lsGrpLoop\name :
@ dWr = [pTwiddle[0].Re, pTwiddle[1].Re]
@ dWi = [pTwiddle[0].Im, pTwiddle[1].Im]
VLD2.F32 {dWr,dWi},[pTwiddle, :64]!
@ dXr0 = [pSrc[0].Re, pSrc[2].Re]
@ dXi0 = [pSrc[0].Im, pSrc[2].Im]
@ dXr1 = [pSrc[1].Re, pSrc[3].Re]
@ dXi1 = [pSrc[1].Im, pSrc[3].Im]
VLD4.F32 {dXr0,dXi0,dXr1,dXi1},[pSrc, :128]!
SUBS grpCount,grpCount,#4 @// grpCount is multiplied by 2
.ifeqs "\inverse", "TRUE"
VMUL.F32 qT0,dWr,dXr1
VMLA.F32 qT0,dWi,dXi1 @// real part
VMUL.F32 qT1,dWr,dXi1
VMLS.F32 qT1,dWi,dXr1 @// imag part
.else
VMUL.F32 qT0,dWr,dXr1
VMLS.F32 qT0,dWi,dXi1 @// real part
VMUL.F32 qT1,dWr,dXi1
VMLA.F32 qT1,dWi,dXr1 @// imag part
.endif
VSUB.F32 dYr0,dXr0,qT0
VSUB.F32 dYi0,dXi0,qT1
VADD.F32 dYr1,dXr0,qT0
VADD.F32 dYi1,dXi0,qT1
VST2.F32 {dYr0,dYi0},[pDst],outPointStep
VST2.F32 {dYr1,dYi1},[pDst],dstStep @// dstStep = step = -outPointStep + 16
BGT radix2lsGrpLoop\name
@// Reset and Swap pSrc and pDst for the next stage
MOV pTmp,pDst
SUB pDst,pSrc,outPointStep,LSL #1 @// pDst -= 4*size; pSrc -= 8*size bytes
SUB pSrc,pTmp,outPointStep
@// Reset pTwiddle for the next stage
SUB pTwiddle,pTwiddle,outPointStep @// pTwiddle -= 4*size bytes
.endm
M_START armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe,r4,""
FFTSTAGE "FALSE","FALSE",fwd
M_END
M_START armSP_FFTInv_CToC_FC32_Radix2_ls_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",inv
M_END
.end
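
The only difference between the forward and inverse variants of this macro is the sign pattern in the twiddle multiply (`VMLS`/`VMLA` swapped), which amounts to multiplying by the twiddle or by its conjugate. The C sketch below illustrates that under stated assumptions: `cplx`, `twiddle_mul` and `radix2_butterfly` are hypothetical names, and the subtract-then-add output order simply mirrors the `VSUB`/`VADD` order in the macro. That order presumably depends on how the twiddle table is laid out, which this excerpt does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an interleaved complex float (re, im). */
typedef struct { float re, im; } cplx;

/* Multiply x by twiddle w (forward) or by conj(w) (inverse). */
cplx twiddle_mul(cplx w, cplx x, bool inverse) {
    cplx t;
    if (inverse) {
        t.re = w.re * x.re + w.im * x.im;   /* VMUL then VMLA */
        t.im = w.re * x.im - w.im * x.re;   /* VMUL then VMLS */
    } else {
        t.re = w.re * x.re - w.im * x.im;   /* VMUL then VMLS */
        t.im = w.re * x.im + w.im * x.re;   /* VMUL then VMLA */
    }
    return t;
}

/* One butterfly; y0 = x0 - t and y1 = x0 + t, as in the macro. */
void radix2_butterfly(cplx w, cplx x0, cplx x1, bool inverse,
                      cplx *y0, cplx *y1) {
    assert(y0 && y1);
    cplx t = twiddle_mul(w, x1, inverse);
    y0->re = x0.re - t.re;
    y0->im = x0.im - t.im;
    y1->re = x0.re + t.re;
    y1->im = x0.im + t.im;
}
```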

@ -1,191 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix2_unsafe_s.s
@// to support float instead of SC32.
@//
@// Description:
@// Compute a Radix 2 DIT in-order out-of-place FFT stage for an N point
@// complex signal. This handles the general stage, not the first or last
@// stage.
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define subFFTNum r6
#define subFFTSize r7
@//Output Registers
@//Local Scratch Registers
#define outPointStep r3
#define pointStep r4
#define grpCount r5
#define setCount r8
@//const RN 9
#define step r10
#define dstStep r11
#define pTable r9
#define pTmp r9
@// Neon Registers
#define dW D0
#define dX0 D2
#define dX1 D3
#define dX2 D4
#define dX3 D5
#define dY0 D6
#define dY1 D7
#define dY2 D8
#define dY3 D9
#define qT0 D10
#define qT1 D11
.MACRO FFTSTAGE scaled, inverse, name
@// Define stack arguments
@// Update grpCount and grpSize right away in order to reuse pGrpCount
@// and pGrpSize regs
LSR subFFTNum,subFFTNum,#1 @//grpSize
LSL grpCount,subFFTSize,#1
@// pT0+1 increments pT0 by 8 bytes
@// pT0+pointStep = increment of 8*pointStep bytes = 4*grpSize bytes
MOV pointStep,subFFTNum,LSL #2
@// update subFFTSize for the next stage
MOV subFFTSize,grpCount
@// pOut0+1 increments pOut0 by 8 bytes
@// pOut0+outPointStep == increment of 8*outPointStep bytes =
@// 4*size bytes
SMULBB outPointStep,grpCount,pointStep
LSL pointStep,pointStep,#1
RSB step,pointStep,#16
RSB dstStep,outPointStep,#16
@// Loop on the groups
radix2GrpLoop\name :
MOV setCount,pointStep,LSR #3
VLD1.F32 dW,[pTwiddle],pointStep @//[wi | wr]
@// Loop on the sets
radix2SetLoop\name :
@// point0: dX0-real part dX1-imag part
VLD2.F32 {dX0,dX1},[pSrc],pointStep
@// point1: dX2-real part dX3-imag part
VLD2.F32 {dX2,dX3},[pSrc],step
SUBS setCount,setCount,#2
.ifeqs "\inverse", "TRUE"
VMUL.F32 qT0,dX2,dW[0]
VMLA.F32 qT0,dX3,dW[1] @// real part
VMUL.F32 qT1,dX3,dW[0]
VMLS.F32 qT1,dX2,dW[1] @// imag part
.else
VMUL.F32 qT0,dX2,dW[0]
VMLS.F32 qT0,dX3,dW[1] @// real part
VMUL.F32 qT1,dX3,dW[0]
VMLA.F32 qT1,dX2,dW[1] @// imag part
.endif
VSUB.F32 dY0,dX0,qT0
VSUB.F32 dY1,dX1,qT1
VADD.F32 dY2,dX0,qT0
VADD.F32 dY3,dX1,qT1
VST2.F32 {dY0,dY1},[pDst],outPointStep
@// dstStep = -outPointStep + 16
VST2.F32 {dY2,dY3},[pDst],dstStep
BGT radix2SetLoop\name
SUBS grpCount,grpCount,#2
ADD pSrc,pSrc,pointStep
BGT radix2GrpLoop\name
@// Reset and Swap pSrc and pDst for the next stage
MOV pTmp,pDst
@// pDst -= 4*size; pSrc -= 8*size bytes
SUB pDst,pSrc,outPointStep,LSL #1
SUB pSrc,pTmp,outPointStep
@// Reset pTwiddle for the next stage
@// pTwiddle -= 4*size bytes
SUB pTwiddle,pTwiddle,outPointStep
.endm
M_START armSP_FFTFwd_CToC_FC32_Radix2_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","FALSE",FWD
M_END
M_START armSP_FFTInv_CToC_FC32_Radix2_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",INV
M_END
.end

@ -1,251 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix4_fs_unsafe_s.s
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute the first stage of a Radix 4 FFT for an N point complex signal
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define pPingPongBuf r5
#define subFFTNum r6
#define subFFTSize r7
@//Output Registers
@//Local Scratch Registers
#define grpSize r3
@// Reuse grpSize as setCount
#define setCount r3
#define pointStep r4
#define outPointStep r4
#define setStep r8
#define step1 r9
#define step3 r10
@// Neon Registers
#define dXr0 D0
#define dXi0 D1
#define dXr1 D2
#define dXi1 D3
#define dXr2 D4
#define dXi2 D5
#define dXr3 D6
#define dXi3 D7
#define dYr0 D8
#define dYi0 D9
#define dYr1 D10
#define dYi1 D11
#define dYr2 D12
#define dYi2 D13
#define dYr3 D14
#define dYi3 D15
#define qX0 Q0
#define qX1 Q1
#define qX2 Q2
#define qX3 Q3
#define qY0 Q4
#define qY1 Q5
#define qY2 Q6
#define qY3 Q7
#define dZr0 D16
#define dZi0 D17
#define dZr1 D18
#define dZi1 D19
#define dZr2 D20
#define dZi2 D21
#define dZr3 D22
#define dZi3 D23
#define qZ0 Q8
#define qZ1 Q9
#define qZ2 Q10
#define qZ3 Q11
.MACRO FFTSTAGE scaled, inverse, name
@// Define stack arguments
@// pT0+1 increments pT0 by 8 bytes
@// pT0+pointStep = increment of 8*pointStep bytes = 2*grpSize bytes
@// Note: outPointStep = pointStep for the first stage
MOV pointStep,subFFTNum,LSL #1
@// Update pSubFFTSize and pSubFFTNum regs
VLD2.F32 {dXr0,dXi0},[pSrc, :128],pointStep @// data[0]
@// subFFTSize is 1 on entry to the first stage; it becomes 4 for the next stage
MOV subFFTSize,#4
@// Note: setCount = subFFTNum/4 (reuse the grpSize reg for setCount)
LSR grpSize,subFFTNum,#2
VLD2.F32 {dXr1,dXi1},[pSrc, :128],pointStep @// data[1]
MOV subFFTNum,grpSize
@// Calculate the step of input data for the next set
@//MOV setStep,pointStep,LSL #1
MOV setStep,grpSize,LSL #4
VLD2.F32 {dXr2,dXi2},[pSrc, :128],pointStep @// data[2]
@// setStep = 3*pointStep
ADD setStep,setStep,pointStep
@// setStep = - 3*pointStep+16
RSB setStep,setStep,#16
@// data[3] & update pSrc for the next set
VLD2.F32 {dXr3,dXi3},[pSrc, :128],setStep
@// step1 = 2*pointStep
MOV step1,pointStep,LSL #1
VADD.F32 qY0,qX0,qX2
@// step3 = -pointStep
RSB step3,pointStep,#0
@// grp = 0 a special case since all the twiddle factors are 1
@// Loop on the sets : 2 sets at a time
radix4fsGrpZeroSetLoop\name :
@// Decrement setCount
SUBS setCount,setCount,#2
@// finish first stage of 4 point FFT
VSUB.F32 qY2,qX0,qX2
VLD2.F32 {dXr0,dXi0},[pSrc, :128],step1 @// data[0]
VADD.F32 qY1,qX1,qX3
VLD2.F32 {dXr2,dXi2},[pSrc, :128],step3 @// data[2]
VSUB.F32 qY3,qX1,qX3
@// finish second stage of 4 point FFT
.ifeqs "\inverse", "TRUE"
VLD2.F32 {dXr1,dXi1},[pSrc, :128],step1 @// data[1]
VADD.F32 qZ0,qY0,qY1
@// data[3] & update pSrc for the next set, but not if it's the
@// last iteration so that we don't read past the end of the
@// input array.
BEQ radix4SkipLastUpdateInv\name
VLD2.F32 {dXr3,dXi3},[pSrc, :128],setStep
radix4SkipLastUpdateInv\name:
VSUB.F32 dZr3,dYr2,dYi3
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
VADD.F32 dZi3,dYi2,dYr3
VSUB.F32 qZ1,qY0,qY1
VST2.F32 {dZr3,dZi3},[pDst, :128],outPointStep
VADD.F32 dZr2,dYr2,dYi3
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
VSUB.F32 dZi2,dYi2,dYr3
VADD.F32 qY0,qX0,qX2 @// u0 for next iteration
VST2.F32 {dZr2,dZi2},[pDst, :128],setStep
.else
VLD2.F32 {dXr1,dXi1},[pSrc, :128],step1 @// data[1]
VADD.F32 qZ0,qY0,qY1
@// data[3] & update pSrc for the next set, but not if it's the
@// last iteration so that we don't read past the end of the
@// input array.
BEQ radix4SkipLastUpdateFwd\name
VLD2.F32 {dXr3,dXi3},[pSrc, :128],setStep
radix4SkipLastUpdateFwd\name:
VADD.F32 dZr2,dYr2,dYi3
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
VSUB.F32 dZi2,dYi2,dYr3
VSUB.F32 qZ1,qY0,qY1
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
VSUB.F32 dZr3,dYr2,dYi3
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
VADD.F32 dZi3,dYi2,dYr3
VADD.F32 qY0,qX0,qX2 @// u0 for next iteration
VST2.F32 {dZr3,dZi3},[pDst, :128],setStep
.endif
BGT radix4fsGrpZeroSetLoop\name
@// reset pSrc to pDst for the next stage
SUB pSrc,pDst,pointStep @// pDst -= 2*grpSize
MOV pDst,pPingPongBuf
.endm
M_START armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","FALSE",fwd
M_END
M_START armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",inv
M_END
.end
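
In the first radix-4 stage all twiddle factors are 1, so each set reduces to a plain 4-point DFT built from two layers of adds and subtracts plus a multiplication by ±j (a swap of real and imaginary parts with one sign change). The following C sketch shows that arithmetic under stated assumptions: the `cplx` type and all function names are hypothetical, and the sketch handles one 4-point set rather than the two sets per iteration that the NEON loop processes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an interleaved complex float (re, im). */
typedef struct { float re, im; } cplx;

static cplx c_add(cplx a, cplx b) { return (cplx){ a.re + b.re, a.im + b.im }; }
static cplx c_sub(cplx a, cplx b) { return (cplx){ a.re - b.re, a.im - b.im }; }
/* a + j*b: multiplying by j swaps re/im and negates the new real part. */
static cplx c_add_j(cplx a, cplx b) { return (cplx){ a.re - b.im, a.im + b.re }; }
/* a - j*b */
static cplx c_sub_j(cplx a, cplx b) { return (cplx){ a.re + b.im, a.im - b.re }; }

/* 4-point DFT with unit twiddles; x and y must not alias (out-of-place). */
void dft4(const cplx x[4], cplx y[4], bool inverse) {
    assert(x != y);
    cplx s02 = c_add(x[0], x[2]);   /* qY0 */
    cplx d02 = c_sub(x[0], x[2]);   /* qY2 */
    cplx s13 = c_add(x[1], x[3]);   /* qY1 */
    cplx d13 = c_sub(x[1], x[3]);   /* qY3 */
    y[0] = c_add(s02, s13);
    y[2] = c_sub(s02, s13);
    if (inverse) {
        y[1] = c_add_j(d02, d13);
        y[3] = c_sub_j(d02, d13);
    } else {
        y[1] = c_sub_j(d02, d13);
        y[3] = c_add_j(d02, d13);
    }
}
```

The forward and inverse branches differ only in which of the two ±j combinations is written to output 1 and which to output 3, which matches the swapped store order of the `dZr2`/`dZr3` registers in the macro.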

@ -1,339 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix4_ls_unsafe_s.s
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute the last stage of a Radix 4 FFT for an N point complex signal
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@// Import symbols required from other files
@// (For example tables)
@//IMPORT armAAC_constTable
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define subFFTNum r6
#define subFFTSize r7
@//Output Registers
@//Local Scratch Registers
#define outPointStep r3
#define grpCount r4
#define dstStep r5
#define grpTwStep r8
#define stepTwiddle r9
#define twStep r10
#define pTmp r4
#define step16 r11
#define step24 r12
@// Neon Registers
#define dButterfly1Real02 D0
#define dButterfly1Imag02 D1
#define dButterfly1Real13 D2
#define dButterfly1Imag13 D3
#define dButterfly2Real02 D4
#define dButterfly2Imag02 D5
#define dButterfly2Real13 D6
#define dButterfly2Imag13 D7
#define dXr0 D0
#define dXi0 D1
#define dXr1 D2
#define dXi1 D3
#define dXr2 D4
#define dXi2 D5
#define dXr3 D6
#define dXi3 D7
#define dYr0 D16
#define dYi0 D17
#define dYr1 D18
#define dYi1 D19
#define dYr2 D20
#define dYi2 D21
#define dYr3 D22
#define dYi3 D23
#define dW1r D8
#define dW1i D9
#define dW2r D10
#define dW2i D11
#define dW3r D12
#define dW3i D13
#define qT0 d14
#define qT1 d16
#define qT2 d18
#define qT3 d20
#define qT4 d22
#define qT5 d24
#define dZr0 D14
#define dZi0 D15
#define dZr1 D26
#define dZi1 D27
#define dZr2 D28
#define dZi2 D29
#define dZr3 D30
#define dZi3 D31
#define qX0 Q0
#define qY0 Q8
#define qY1 Q9
#define qY2 Q10
#define qY3 Q11
#define qZ0 Q7
#define qZ1 Q13
#define qZ2 Q14
#define qZ3 Q15
.MACRO FFTSTAGE scaled, inverse , name
@// Define stack arguments
@// pOut0+1 increments pOut0 by 8 bytes
@// pOut0+outPointStep == increment of 8*outPointStep bytes
MOV outPointStep,subFFTSize,LSL #3
@// Update grpCount and grpSize right away
VLD2.F32 {dW1r,dW1i},[pTwiddle, :128] @// [wi|wr]
MOV step16,#16
LSL grpCount,subFFTSize,#2
VLD1.F32 dW2r,[pTwiddle, :64] @// [wi|wr]
MOV subFFTNum,#1 @//after the last stage
VLD1.F32 dW3r,[pTwiddle, :64],step16 @// [wi|wr]
MOV stepTwiddle,#0
VLD1.F32 dW2i,[pTwiddle, :64]! @// [wi|wr]
SUB grpTwStep,stepTwiddle,#8 @// grpTwStep = -8 to start with
@// update subFFTSize for the next stage
MOV subFFTSize,grpCount
VLD1.F32 dW3i,[pTwiddle, :64],grpTwStep @// [wi|wr]
MOV dstStep,outPointStep,LSL #1
@// AC.r AC.i BD.r BD.i
VLD4.F32 {dButterfly1Real02,dButterfly1Imag02,dButterfly1Real13,dButterfly1Imag13},[pSrc, :256]!
ADD dstStep,dstStep,outPointStep @// dstStep = 3*outPointStep
RSB dstStep,dstStep,#16 @// dstStep = - 3*outPointStep+16
MOV step24,#24
@// AC.r AC.i BD.r BD.i
VLD4.F32 {dButterfly2Real02,dButterfly2Imag02,dButterfly2Real13,dButterfly2Imag13},[pSrc, :256]!
@// Process two groups at a time
radix4lsGrpLoop\name :
VZIP.F32 dW2r,dW2i
ADD stepTwiddle,stepTwiddle,#16
VZIP.F32 dW3r,dW3i
ADD grpTwStep,stepTwiddle,#4
VUZP.F32 dButterfly1Real13, dButterfly2Real13 @// B.r D.r
SUB twStep,stepTwiddle,#16 @// -16+stepTwiddle
VUZP.F32 dButterfly1Imag13, dButterfly2Imag13 @// B.i D.i
MOV grpTwStep,grpTwStep,LSL #1
VUZP.F32 dButterfly1Real02, dButterfly2Real02 @// A.r C.r
RSB grpTwStep,grpTwStep,#0 @// -8-2*stepTwiddle
VUZP.F32 dButterfly1Imag02, dButterfly2Imag02 @// A.i C.i
@// grpCount is multiplied by 4
SUBS grpCount,grpCount,#8
.ifeqs "\inverse", "TRUE"
VMUL.F32 dZr1,dW1r,dXr1
VMLA.F32 dZr1,dW1i,dXi1 @// real part
VMUL.F32 dZi1,dW1r,dXi1
VMLS.F32 dZi1,dW1i,dXr1 @// imag part
.else
VMUL.F32 dZr1,dW1r,dXr1
VMLS.F32 dZr1,dW1i,dXi1 @// real part
VMUL.F32 dZi1,dW1r,dXi1
VMLA.F32 dZi1,dW1i,dXr1 @// imag part
.endif
VLD2.F32 {dW1r,dW1i},[pTwiddle, :128],stepTwiddle @// [wi|wr]
.ifeqs "\inverse", "TRUE"
VMUL.F32 dZr2,dW2r,dXr2
VMLA.F32 dZr2,dW2i,dXi2 @// real part
VMUL.F32 dZi2,dW2r,dXi2
VLD1.F32 dW2r,[pTwiddle, :64],step16 @// [wi|wr]
VMLS.F32 dZi2,dW2i,dXr2 @// imag part
.else
VMUL.F32 dZr2,dW2r,dXr2
VMLS.F32 dZr2,dW2i,dXi2 @// real part
VMUL.F32 dZi2,dW2r,dXi2
VLD1.F32 dW2r,[pTwiddle, :64],step16 @// [wi|wr]
VMLA.F32 dZi2,dW2i,dXr2 @// imag part
.endif
VLD1.F32 dW2i,[pTwiddle, :64],twStep @// [wi|wr]
@// move qX0 so as to load for the next iteration
VMOV qZ0,qX0
.ifeqs "\inverse", "TRUE"
VMUL.F32 dZr3,dW3r,dXr3
VMLA.F32 dZr3,dW3i,dXi3 @// real part
VMUL.F32 dZi3,dW3r,dXi3
VLD1.F32 dW3r,[pTwiddle, :64],step24
VMLS.F32 dZi3,dW3i,dXr3 @// imag part
.else
VMUL.F32 dZr3,dW3r,dXr3
VMLS.F32 dZr3,dW3i,dXi3 @// real part
VMUL.F32 dZi3,dW3r,dXi3
VLD1.F32 dW3r,[pTwiddle, :64],step24
VMLA.F32 dZi3,dW3i,dXr3 @// imag part
.endif
VLD1.F32 dW3i,[pTwiddle, :64],grpTwStep @// [wi|wr]
@// Don't do the load on the last iteration so we don't read past the end
@// of pSrc.
addeq pSrc, pSrc, #64
beq radix4lsSkipRead\name
@// AC.r AC.i BD.r BD.i
VLD4.F32 {dButterfly1Real02,dButterfly1Imag02,dButterfly1Real13,dButterfly1Imag13},[pSrc, :256]!
@// AC.r AC.i BD.r BD.i
VLD4.F32 {dButterfly2Real02,dButterfly2Imag02,dButterfly2Real13,dButterfly2Imag13},[pSrc, :256]!
radix4lsSkipRead\name:
@// finish first stage of 4 point FFT
VADD.F32 qY0,qZ0,qZ2
VSUB.F32 qY2,qZ0,qZ2
VADD.F32 qY1,qZ1,qZ3
VSUB.F32 qY3,qZ1,qZ3
@// finish second stage of 4 point FFT
.ifeqs "\inverse", "TRUE"
VSUB.F32 qZ0,qY2,qY1
VADD.F32 dZr3,dYr0,dYi3
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
VSUB.F32 dZi3,dYi0,dYr3
VADD.F32 qZ2,qY2,qY1
VST2.F32 {dZr3,dZi3},[pDst, :128],outPointStep
VSUB.F32 dZr1,dYr0,dYi3
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
VADD.F32 dZi1,dYi0,dYr3
@// dstStep = -3*outPointStep + 16
VST2.F32 {dZr1,dZi1},[pDst, :128],dstStep
.else
VSUB.F32 qZ0,qY2,qY1
VSUB.F32 dZr1,dYr0,dYi3
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
VADD.F32 dZi1,dYi0,dYr3
VADD.F32 qZ2,qY2,qY1
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
VADD.F32 dZr3,dYr0,dYi3
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
VSUB.F32 dZi3,dYi0,dYr3
@// dstStep = -3*outPointStep + 16
VST2.F32 {dZr3,dZi3},[pDst, :128],dstStep
.endif
BGT radix4lsGrpLoop\name
@// Reset and Swap pSrc and pDst for the next stage
MOV pTmp,pDst
@// Extra increment done in final iteration of the loop
SUB pSrc,pSrc,#64
@// pDst -= 4*size; pSrc -= 8*size bytes
SUB pDst,pSrc,outPointStep,LSL #2
SUB pSrc,pTmp,outPointStep
SUB pTwiddle,pTwiddle,subFFTSize,LSL #1
@// Extra increment done in final iteration of the loop
SUB pTwiddle,pTwiddle,#16
.endm
M_START armSP_FFTFwd_CToC_FC32_Radix4_ls_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","FALSE",fwd
M_END
M_START armSP_FFTInv_CToC_FC32_Radix4_ls_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",inv
M_END
.end
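
Several of these macros load the inputs for the next iteration while finishing the current one, and skip that load on the final iteration so they never read past the end of `pSrc`. The pattern is independent of the FFT itself; the C sketch below shows it on a deliberately trivial computation (summing adjacent pairs). The function name `pairwise_sum_pipelined` is hypothetical and the code only illustrates the guarded preload, not the removed routines.

```c
#include <assert.h>
#include <stddef.h>

/* Sum adjacent pairs of src into dst, preloading the next pair early
 * and skipping the preload on the last iteration. */
void pairwise_sum_pipelined(const float *src, float *dst, size_t pairs) {
    assert(pairs > 0);
    float a = src[0];                    /* prologue: load the first pair */
    float b = src[1];
    for (size_t i = 0; i < pairs; i++) {
        float sum = a + b;               /* work on the current pair */
        if (i + 1 < pairs) {             /* guard: no read past the end */
            a = src[2 * (i + 1)];
            b = src[2 * (i + 1) + 1];
        }
        dst[i] = sum;
    }
}
```

In the assembly the guard is the `beq ...SkipRead` branch (or the `BEQ ...SkipLastUpdate` branch in the first-stage files), taken when the loop counter has just reached zero.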

@ -1,331 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix4_unsafe_s.s
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute a Radix 4 FFT stage for an N point complex signal
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define subFFTNum r6
#define subFFTSize r7
@//Output Registers
@//Local Scratch Registers
#define grpCount r3
#define pointStep r4
#define outPointStep r5
#define stepTwiddle r12
#define setCount r14
#define srcStep r8
#define setStep r9
#define dstStep r10
#define twStep r11
#define t1 r3
@// Neon Registers
#define dW1 D0
#define dW2 D1
#define dW3 D2
#define dXr0 D4
#define dXi0 D5
#define dXr1 D6
#define dXi1 D7
#define dXr2 D8
#define dXi2 D9
#define dXr3 D10
#define dXi3 D11
#define dYr0 D12
#define dYi0 D13
#define dYr1 D14
#define dYi1 D15
#define dYr2 D16
#define dYi2 D17
#define dYr3 D18
#define dYi3 D19
#define qT0 d16
#define qT1 d18
#define qT2 d12
#define qT3 d14
#define dZr0 D20
#define dZi0 D21
#define dZr1 D22
#define dZi1 D23
#define dZr2 D24
#define dZi2 D25
#define dZr3 D26
#define dZi3 D27
#define qY0 Q6
#define qY1 Q7
#define qY2 Q8
#define qY3 Q9
#define qX0 Q2
#define qZ0 Q10
#define qZ1 Q11
#define qZ2 Q12
#define qZ3 Q13
.MACRO FFTSTAGE scaled, inverse , name
@// Define stack arguments
@// Update grpCount and grpSize right away in order to reuse
@// pGrpCount and pGrpSize regs
LSL grpCount,subFFTSize,#2
LSR subFFTNum,subFFTNum,#2
MOV subFFTSize,grpCount
VLD1.F32 dW1,[pTwiddle] @//[wi | wr]
@// pT0+1 increments pT0 by 8 bytes
@// pT0+pointStep = increment of 8*pointStep bytes = 2*grpSize bytes
MOV pointStep,subFFTNum,LSL #1
@// pOut0+1 increments pOut0 by 8 bytes
@// pOut0+outPointStep == increment of 8*outPointStep bytes
@// = 2*size bytes
MOV stepTwiddle,#0
VLD1.F32 dW2,[pTwiddle] @//[wi | wr]
SMULBB outPointStep,grpCount,pointStep
LSL pointStep,pointStep,#2 @// 2*grpSize
VLD1.F32 dW3,[pTwiddle] @//[wi | wr]
MOV srcStep,pointStep,LSL #1 @// srcStep = 2*pointStep
ADD setStep,srcStep,pointStep @// setStep = 3*pointStep
RSB setStep,setStep,#0 @// setStep = - 3*pointStep
SUB srcStep,srcStep,#16 @// srcStep = 2*pointStep-16
MOV dstStep,outPointStep,LSL #1
ADD dstStep,dstStep,outPointStep @// dstStep = 3*outPointStep
@// dstStep = - 3*outPointStep+16
RSB dstStep,dstStep,#16
radix4GrpLoop\name :
VLD2.F32 {dXr0,dXi0},[pSrc],pointStep @// data[0]
ADD stepTwiddle,stepTwiddle,pointStep
VLD2.F32 {dXr1,dXi1},[pSrc],pointStep @// data[1]
@// set pTwiddle to the first point
ADD pTwiddle,pTwiddle,stepTwiddle
VLD2.F32 {dXr2,dXi2},[pSrc],pointStep @// data[2]
MOV twStep,stepTwiddle,LSL #2
@// data[3] & update pSrc for the next set
VLD2.F32 {dXr3,dXi3},[pSrc],setStep
SUB twStep,stepTwiddle,twStep @// twStep = -3*stepTwiddle
MOV setCount,pointStep,LSR #3
@// set pSrc to data[0] of the next set
ADD pSrc,pSrc,#16
@// increment to data[1] of the next set
ADD pSrc,pSrc,pointStep
@// Loop on the sets
radix4SetLoop\name :
.ifeqs "\inverse", "TRUE"
VMUL.F32 dZr1,dXr1,dW1[0]
VMUL.F32 dZi1,dXi1,dW1[0]
VMUL.F32 dZr2,dXr2,dW2[0]
VMUL.F32 dZi2,dXi2,dW2[0]
VMUL.F32 dZr3,dXr3,dW3[0]
VMUL.F32 dZi3,dXi3,dW3[0]
VMLA.F32 dZr1,dXi1,dW1[1] @// real part
VMLS.F32 dZi1,dXr1,dW1[1] @// imag part
@// data[1] for next iteration
VLD2.F32 {dXr1,dXi1},[pSrc],pointStep
VMLA.F32 dZr2,dXi2,dW2[1] @// real part
VMLS.F32 dZi2,dXr2,dW2[1] @// imag part
@// data[2] for next iteration
VLD2.F32 {dXr2,dXi2},[pSrc],pointStep
VMLA.F32 dZr3,dXi3,dW3[1] @// real part
VMLS.F32 dZi3,dXr3,dW3[1] @// imag part
.else
VMUL.F32 dZr1,dXr1,dW1[0]
VMUL.F32 dZi1,dXi1,dW1[0]
VMUL.F32 dZr2,dXr2,dW2[0]
VMUL.F32 dZi2,dXi2,dW2[0]
VMUL.F32 dZr3,dXr3,dW3[0]
VMUL.F32 dZi3,dXi3,dW3[0]
VMLS.F32 dZr1,dXi1,dW1[1] @// real part
VMLA.F32 dZi1,dXr1,dW1[1] @// imag part
@// data[1] for next iteration
VLD2.F32 {dXr1,dXi1},[pSrc],pointStep
VMLS.F32 dZr2,dXi2,dW2[1] @// real part
VMLA.F32 dZi2,dXr2,dW2[1] @// imag part
@// data[2] for next iteration
VLD2.F32 {dXr2,dXi2},[pSrc],pointStep
VMLS.F32 dZr3,dXi3,dW3[1] @// real part
VMLA.F32 dZi3,dXr3,dW3[1] @// imag part
.endif
@// data[3] & update pSrc to data[0]
@// But don't read on the very last iteration because that reads past
@// the end of pSrc. The last iteration is grpCount = 4, setCount = 2.
cmp grpCount, #4
cmpeq setCount, #2 @// Test setCount if grpCount = 4
@// These are executed only if both grpCount = 4 and setCount = 2
addeq pSrc, pSrc, setStep
beq radix4SkipRead\name
VLD2.F32 {dXr3,dXi3},[pSrc],setStep
radix4SkipRead\name:
SUBS setCount,setCount,#2
@// finish first stage of 4 point FFT
VADD.F32 qY0,qX0,qZ2
VSUB.F32 qY2,qX0,qZ2
@// data[0] for next iteration
VLD2.F32 {dXr0,dXi0},[pSrc, :128]!
VADD.F32 qY1,qZ1,qZ3
VSUB.F32 qY3,qZ1,qZ3
@// finish second stage of 4 point FFT
VSUB.F32 qZ0,qY2,qY1
.ifeqs "\inverse", "TRUE"
VADD.F32 dZr3,dYr0,dYi3
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
VSUB.F32 dZi3,dYi0,dYr3
VADD.F32 qZ2,qY2,qY1
VST2.F32 {dZr3,dZi3},[pDst, :128],outPointStep
VSUB.F32 dZr1,dYr0,dYi3
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
VADD.F32 dZi1,dYi0,dYr3
VST2.F32 {dZr1,dZi1},[pDst, :128],dstStep
.else
VSUB.F32 dZr1,dYr0,dYi3
VST2.F32 {dZr0,dZi0},[pDst, :128],outPointStep
VADD.F32 dZi1,dYi0,dYr3
VADD.F32 qZ2,qY2,qY1
VST2.F32 {dZr1,dZi1},[pDst, :128],outPointStep
VADD.F32 dZr3,dYr0,dYi3
VST2.F32 {dZr2,dZi2},[pDst, :128],outPointStep
VSUB.F32 dZi3,dYi0,dYr3
VST2.F32 {dZr3,dZi3},[pDst, :128],dstStep
.endif
@// increment to data[1] of the next set
ADD pSrc,pSrc,pointStep
BGT radix4SetLoop\name
VLD1.F32 dW1,[pTwiddle, :64],stepTwiddle @//[wi | wr]
@// subtract 4 since grpCount multiplied by 4
SUBS grpCount,grpCount,#4
VLD1.F32 dW2,[pTwiddle, :64],stepTwiddle @//[wi | wr]
@// increment pSrc for the next grp
ADD pSrc,pSrc,srcStep
VLD1.F32 dW3,[pTwiddle, :64],twStep @//[wi | wr]
BGT radix4GrpLoop\name
@// Reset and Swap pSrc and pDst for the next stage
MOV t1,pDst
@// pDst -= 2*size; pSrc -= 8*size bytes
SUB pDst,pSrc,outPointStep,LSL #2
SUB pSrc,t1,outPointStep
.endm
M_START armSP_FFTFwd_CToC_FC32_Radix4_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","FALSE",FWD
M_END
M_START armSP_FFTInv_CToC_FC32_Radix4_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",INV
M_END
.end

@ -1,422 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of armSP_FFT_CToC_SC32_Radix8_fs_unsafe_s.s
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute the first stage of a Radix 8 FFT for an N point complex signal
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@//Input Registers
#define pSrc r0
#define pDst r2
#define pTwiddle r1
#define subFFTNum r6
#define subFFTSize r7
@// dest buffer for the next stage (not pSrc for first stage)
#define pPingPongBuf r5
@//Output Registers
@//Local Scratch Registers
#define grpSize r3
@// Reuse grpSize as setCount
#define setCount r3
#define pointStep r4
#define outPointStep r4
#define setStep r8
#define step1 r9
#define step2 r10
#define t0 r11
@// Neon Registers
#define dXr0 D0
#define dXi0 D1
#define dXr1 D2
#define dXi1 D3
#define dXr2 D4
#define dXi2 D5
#define dXr3 D6
#define dXi3 D7
#define dXr4 D8
#define dXi4 D9
#define dXr5 D10
#define dXi5 D11
#define dXr6 D12
#define dXi6 D13
#define dXr7 D14
#define dXi7 D15
#define qX0 Q0
#define qX1 Q1
#define qX2 Q2
#define qX3 Q3
#define qX4 Q4
#define qX5 Q5
#define qX6 Q6
#define qX7 Q7
#define dUr0 D16
#define dUi0 D17
#define dUr2 D18
#define dUi2 D19
#define dUr4 D20
#define dUi4 D21
#define dUr6 D22
#define dUi6 D23
#define dUr1 D24
#define dUi1 D25
#define dUr3 D26
#define dUi3 D27
#define dUr5 D28
#define dUi5 D29
@// reuse dXr7 and dXi7
#define dUr7 D30
#define dUi7 D31
#define qU0 Q8
#define qU1 Q12
#define qU2 Q9
#define qU3 Q13
#define qU4 Q10
#define qU5 Q14
#define qU6 Q11
#define qU7 Q15
#define dVr0 D24
#define dVi0 D25
#define dVr2 D26
#define dVi2 D27
#define dVr4 D28
#define dVi4 D29
#define dVr6 D30
#define dVi6 D31
#define dVr1 D16
#define dVi1 D17
#define dVr3 D18
#define dVi3 D19
#define dVr5 D20
#define dVi5 D21
#define dVr7 D22
#define dVi7 D23
#define qV0 Q12
#define qV1 Q8
#define qV2 Q13
#define qV3 Q9
#define qV4 Q14
#define qV5 Q10
#define qV6 Q15
#define qV7 Q11
#define dYr0 D16
#define dYi0 D17
#define dYr2 D18
#define dYi2 D19
#define dYr4 D20
#define dYi4 D21
#define dYr6 D22
#define dYi6 D23
#define dYr1 D24
#define dYi1 D25
#define dYr3 D26
#define dYi3 D27
#define dYr5 D28
#define dYi5 D29
#define dYr7 D30
#define dYi7 D31
#define qY0 Q8
#define qY1 Q12
#define qY2 Q9
#define qY3 Q13
#define qY4 Q10
#define qY5 Q14
#define qY6 Q11
#define qY7 Q15
#define dT0 D14
#define dT1 D15
.MACRO FFTSTAGE scaled, inverse, name
@// Define stack arguments
@// Update pSubFFTSize and pSubFFTNum regs
@// subFFTSize is 1 on entry to the first stage; it becomes 8 for the next stage
MOV subFFTSize,#8
ADR t0,ONEBYSQRT2\name
@// Note: setCount = subFFTNum/8 (reuse the grpSize reg for setCount)
LSR grpSize,subFFTNum,#3
MOV subFFTNum,grpSize
@// pT0+1 increments pT0 by 8 bytes
@// pT0+pointStep = increment of 8*pointStep bytes = grpSize bytes
@// Note: outPointStep = pointStep for the first stage
MOV pointStep,grpSize,LSL #3
@// Calculate the step of input data for the next set
@//MOV step1,pointStep,LSL #1 @// step1 = 2*pointStep
VLD2.F32 {dXr0,dXi0},[pSrc, :128],pointStep @// data[0]
MOV step1,grpSize,LSL #4
MOV step2,pointStep,LSL #3
VLD2.F32 {dXr1,dXi1},[pSrc, :128],pointStep @// data[1]
SUB step2,step2,pointStep @// step2 = 7*pointStep
@// setStep = - 7*pointStep+16
RSB setStep,step2,#16
VLD2.F32 {dXr2,dXi2},[pSrc, :128],pointStep @// data[2]
VLD2.F32 {dXr3,dXi3},[pSrc, :128],pointStep @// data[3]
VLD2.F32 {dXr4,dXi4},[pSrc, :128],pointStep @// data[4]
VLD2.F32 {dXr5,dXi5},[pSrc, :128],pointStep @// data[5]
VLD2.F32 {dXr6,dXi6},[pSrc, :128],pointStep @// data[6]
@// data[7] & update pSrc for the next set
@// setStep = -7*pointStep + 16
VLD2.F32 {dXr7,dXi7},[pSrc, :128],setStep
@// grp = 0 a special case since all the twiddle factors are 1
@// Loop on the sets
radix8fsGrpZeroSetLoop\name :
@// Decrement setCount
SUBS setCount,setCount,#2
@// finish first stage of 8 point FFT
VADD.F32 qU0,qX0,qX4
VADD.F32 qU2,qX1,qX5
VADD.F32 qU4,qX2,qX6
VADD.F32 qU6,qX3,qX7
@// finish second stage of 8 point FFT
VADD.F32 qV0,qU0,qU4
VSUB.F32 qV2,qU0,qU4
VADD.F32 qV4,qU2,qU6
VSUB.F32 qV6,qU2,qU6
@// finish third stage of 8 point FFT
VADD.F32 qY0,qV0,qV4
VSUB.F32 qY4,qV0,qV4
VST2.F32 {dYr0,dYi0},[pDst, :128],step1 @// store y0
.ifeqs "\inverse", "TRUE"
VSUB.F32 dYr2,dVr2,dVi6
VADD.F32 dYi2,dVi2,dVr6
VADD.F32 dYr6,dVr2,dVi6
VST2.F32 {dYr2,dYi2},[pDst, :128],step1 @// store y2
VSUB.F32 dYi6,dVi2,dVr6
VSUB.F32 qU1,qX0,qX4
VST2.F32 {dYr4,dYi4},[pDst, :128],step1 @// store y4
VSUB.F32 qU3,qX1,qX5
VSUB.F32 qU5,qX2,qX6
VST2.F32 {dYr6,dYi6},[pDst, :128],step1 @// store y6
.ELSE
VADD.F32 dYr6,dVr2,dVi6
VSUB.F32 dYi6,dVi2,dVr6
VSUB.F32 dYr2,dVr2,dVi6
VST2.F32 {dYr6,dYi6},[pDst, :128],step1 @// store y2 (held in dY6 on the forward path)
VADD.F32 dYi2,dVi2,dVr6
VSUB.F32 qU1,qX0,qX4
VST2.F32 {dYr4,dYi4},[pDst, :128],step1 @// store y4
VSUB.F32 qU3,qX1,qX5
VSUB.F32 qU5,qX2,qX6
VST2.F32 {dYr2,dYi2},[pDst, :128],step1 @// store y6 (held in dY2 on the forward path)
.ENDIF
@// finish first stage of 8 point FFT
VSUB.F32 qU7,qX3,qX7
VLD1.F32 dT0[0], [t0]
@// finish second stage of 8 point FFT
VSUB.F32 dVr1,dUr1,dUi5
@// data[0] for next iteration
VLD2.F32 {dXr0,dXi0},[pSrc, :128],pointStep
VADD.F32 dVi1,dUi1,dUr5
VADD.F32 dVr3,dUr1,dUi5
VLD2.F32 {dXr1,dXi1},[pSrc, :128],pointStep @// data[1]
VSUB.F32 dVi3,dUi1,dUr5
VSUB.F32 dVr5,dUr3,dUi7
VLD2.F32 {dXr2,dXi2},[pSrc, :128],pointStep @// data[2]
VADD.F32 dVi5,dUi3,dUr7
VADD.F32 dVr7,dUr3,dUi7
VLD2.F32 {dXr3,dXi3},[pSrc, :128],pointStep @// data[3]
VSUB.F32 dVi7,dUi3,dUr7
@// finish third stage of 8 point FFT
.ifeqs "\inverse", "TRUE"
@// calculate a*v5
VMUL.F32 dT1,dVr5,dT0[0] @// use dVi0 for dT1
VLD2.F32 {dXr4,dXi4},[pSrc, :128],pointStep @// data[4]
VMUL.F32 dVi5,dVi5,dT0[0]
VLD2.F32 {dXr5,dXi5},[pSrc, :128],pointStep @// data[5]
VSUB.F32 dVr5,dT1,dVi5 @// a * V5
VADD.F32 dVi5,dT1,dVi5
VLD2.F32 {dXr6,dXi6},[pSrc, :128],pointStep @// data[6]
@// calculate b*v7
VMUL.F32 dT1,dVr7,dT0[0]
VMUL.F32 dVi7,dVi7,dT0[0]
VADD.F32 qY1,qV1,qV5
VSUB.F32 qY5,qV1,qV5
VADD.F32 dVr7,dT1,dVi7 @// b * V7
VSUB.F32 dVi7,dVi7,dT1
SUB pDst, pDst, step2 @// set pDst to y1
@// On the last iteration, this will read past the end of pSrc,
@// so skip this read.
BEQ radix8SkipLastUpdateInv\name
VLD2.F32 {dXr7,dXi7},[pSrc, :128],setStep @// data[7]
radix8SkipLastUpdateInv\name:
VSUB.F32 dYr3,dVr3,dVr7
VSUB.F32 dYi3,dVi3,dVi7
VST2.F32 {dYr1,dYi1},[pDst, :128],step1 @// store y1
VADD.F32 dYr7,dVr3,dVr7
VADD.F32 dYi7,dVi3,dVi7
VST2.F32 {dYr3,dYi3},[pDst, :128],step1 @// store y3
VST2.F32 {dYr5,dYi5},[pDst, :128],step1 @// store y5
VST2.F32 {dYr7,dYi7},[pDst, :128] @// store y7
ADD pDst, pDst, #16
.ELSE
@// calculate b*v7
VMUL.F32 dT1,dVr7,dT0[0]
VLD2.F32 {dXr4,dXi4},[pSrc, :128],pointStep @// data[4]
VMUL.F32 dVi7,dVi7,dT0[0]
VLD2.F32 {dXr5,dXi5},[pSrc, :128],pointStep @// data[5]
VADD.F32 dVr7,dT1,dVi7 @// b * V7
VSUB.F32 dVi7,dVi7,dT1
VLD2.F32 {dXr6,dXi6},[pSrc, :128],pointStep @// data[6]
@// calculate a*v5
VMUL.F32 dT1,dVr5,dT0[0] @// use dVi0 for dT1
VMUL.F32 dVi5,dVi5,dT0[0]
VADD.F32 dYr7,dVr3,dVr7
VADD.F32 dYi7,dVi3,dVi7
SUB pDst, pDst, step2 @// set pDst to y1
VSUB.F32 dVr5,dT1,dVi5 @// a * V5
VADD.F32 dVi5,dT1,dVi5
@// On the last iteration, this will read past the end of pSrc,
@// so skip this read.
BEQ radix8SkipLastUpdateFwd\name
VLD2.F32 {dXr7,dXi7},[pSrc, :128],setStep @// data[7]
radix8SkipLastUpdateFwd\name:
VSUB.F32 qY5,qV1,qV5
VSUB.F32 dYr3,dVr3,dVr7
VST2.F32 {dYr7,dYi7},[pDst, :128],step1 @// store y1
VSUB.F32 dYi3,dVi3,dVi7
VADD.F32 qY1,qV1,qV5
VST2.F32 {dYr5,dYi5},[pDst, :128],step1 @// store y3
VST2.F32 {dYr3,dYi3},[pDst, :128],step1 @// store y5
VST2.F32 {dYr1,dYi1},[pDst, :128]! @// store y7
.ENDIF
@// update pDst for the next set
SUB pDst, pDst, step2
BGT radix8fsGrpZeroSetLoop\name
@// reset pSrc to pDst for the next stage
SUB pSrc,pDst,pointStep @// pDst -= 2*grpSize
MOV pDst,pPingPongBuf
.endm
@// Allocate stack memory required by the function
M_START armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","FALSE",FWD
M_END
ONEBYSQRT2FWD: .float 0.7071067811865476e0
M_START armSP_FFTInv_CToC_FC32_Radix8_fs_OutOfPlace_unsafe,r4
FFTSTAGE "FALSE","TRUE",INV
M_END
ONEBYSQRT2INV: .float 0.7071067811865476e0
.end
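
The `a * V5` step in the inverse branch above multiplies a complex value by (1 + j)/sqrt(2) with two real multiplies followed by one subtract and one add, scaling by the 1/sqrt(2) constant held in `dT0[0]`. A minimal C sketch of that arithmetic is given here for orientation only; the `cpx` type and the `rot45` name are hypothetical and do not come from the removed library.

```c
/* Hypothetical stand-in for a complex float pair (not an OpenMAX DL type). */
typedef struct { float re, im; } cpx;

#define ONE_BY_SQRT2 0.70710678f

/* Multiply v by (1 + j)/sqrt(2), i.e. rotate it by +45 degrees. */
cpx rot45(cpx v) {
    float tr = v.re * ONE_BY_SQRT2;  /* mirrors VMUL dT1,dVr5,dT0[0]  */
    float ti = v.im * ONE_BY_SQRT2;  /* mirrors VMUL dVi5,dVi5,dT0[0] */
    cpx out = { tr - ti, tr + ti };  /* mirrors the VSUB / VADD pair  */
    return out;
}
```

Only two multiplies are needed because the real and imaginary parts of the twiddle factor have the same magnitude.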


@ -1,404 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of omxSP_FFTFwd_RToCCS_S32_Sfs_s.s
@// to support float instead of SC32.
@//
@//
@// Description:
@// Compute FFT for a real signal
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
.extern armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix4_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix2_OutOfPlace_unsafe
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@// Guarding implementation by the processor name
@// Import symbols required from other files
@// (For example tables)
.extern armSP_FFTFwd_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
.extern armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
@//Input Registers
#define pSrc r0
#define pDst r1
#define pFFTSpec r2
#define scale r3
@// Output registers
#define result r0
@//Local Scratch Registers
#define argTwiddle r1
#define argDst r2
#define argScale r4
#define tmpOrder r4
#define pTwiddle r4
#define pOut r5
#define subFFTSize r7
#define subFFTNum r6
#define N r6
#define order r14
#define diff r9
@// Total number of radix stages required to complete the FFT
#define count r8
#define x0r r4
#define x0i r5
#define diffMinusOne r2
#define subFFTSizeTmp r6
#define step r3
#define step1 r4
#define twStep r8
#define zero r9
#define pTwiddleTmp r5
#define t0 r10
@// Neon registers
#define dX0 d0
#define dzero d1
#define dZero d2
#define dShift d3
#define dX0r d2
#define dX0i d3
#define dX1r d4
#define dX1i d5
#define dT0 d6
#define dT1 d7
#define dT2 d8
#define dT3 d9
#define qT0 d10
#define qT1 d12
#define dW0r d14
#define dW0i d15
#define dW1r d16
#define dW1i d17
#define dY0r d14
#define dY0i d15
#define dY1r d16
#define dY1i d17
#define dY0rS64 d14.s64
#define dY0iS64 d15.s64
#define qT2 d18
#define qT3 d20
@// last three elements
#define dX1 d3
#define dW0 d4
#define dW1 d5
#define dY0 d10
#define dY1 d11
#define dY2 d12
#define dY3 d13
#define half d0
@// Allocate stack memory required by the function
@// Write function header
M_START omxSP_FFTFwd_RToCCS_F32_Sfs,r11,d15
@ Structure offsets for the FFTSpec
.set ARMsFFTSpec_N, 0
.set ARMsFFTSpec_pBitRev, 4
.set ARMsFFTSpec_pTwiddle, 8
.set ARMsFFTSpec_pBuf, 12
@// Define stack arguments
@// Read the size from structure and take log
LDR N, [pFFTSpec, #ARMsFFTSpec_N]
@// Read other structure parameters
LDR pTwiddle, [pFFTSpec, #ARMsFFTSpec_pTwiddle]
LDR pOut, [pFFTSpec, #ARMsFFTSpec_pBuf]
@// N=1: treat separately
CMP N,#1
BGT sizeGreaterThanOne
VLD1.F32 dX0[0],[pSrc]
MOV zero,#0
VMOV.F32 dzero[0],zero
VMOV.F32 dZero[0],zero
VST3.F32 {dX0[0],dzero[0],dZero[0]},[pDst]
B End
sizeGreaterThanOne:
@// Do a N/2 point complex FFT including the scaling
MOV N,N,ASR #1 @// N/2 point complex FFT
CLZ order,N @// N = 2^order
RSB order,order,#31
MOV subFFTSize,#1
@//MOV subFFTNum,N
CMP order,#3
BGT orderGreaterthan3 @// order > 3
CMP order,#1
BGE orderGreaterthan0 @// order > 0
VLD1.F32 dX0,[pSrc]
VST1.F32 dX0,[pOut]
MOV pSrc,pOut
MOV argDst,pDst
BLT FFTEnd
orderGreaterthan0:
@// set the buffers appropriately for various orders
CMP order,#2
MOVEQ argDst,pDst
MOVNE argDst,pOut
@// Pass the first stage destination in RN5
MOVNE pOut,pDst
MOV argTwiddle,pTwiddle
CMP order,#1
BGT orderGreaterthan1
@// order = 1
BL armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
B FFTEnd
orderGreaterthan1:
CMP order,#2
BGT orderGreaterthan2
@// order =2
BL armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
BL armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
B FFTEnd
orderGreaterthan2:@// order =3
BL armSP_FFTFwd_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
BL armSP_FFTFwd_CToC_FC32_Radix2_OutOfPlace_unsafe
BL armSP_FFTFwd_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
B FFTEnd
orderGreaterthan3:
specialScaleCase:
@// Set input args to fft stages
TST order, #2
MOVEQ argDst,pDst
MOVNE argDst,pOut
@// Pass the first stage destination in RN5
MOVNE pOut,pDst
MOV argTwiddle,pTwiddle
@//check for even or odd order
@// NOTE: The following combination of BL's would work fine even though
@// the first BL would corrupt the flags. This is because the end of
@// the "grpZeroSetLoop" loop inside
@// armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe sets the Z flag
@// to EQ
TST order,#0x00000001
BLEQ armSP_FFTFwd_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
BLNE armSP_FFTFwd_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
CMP subFFTNum,#4
BLT FFTEnd
unscaledRadix4Loop:
BEQ lastStageUnscaledRadix4
BL armSP_FFTFwd_CToC_FC32_Radix4_OutOfPlace_unsafe
CMP subFFTNum,#4
B unscaledRadix4Loop
lastStageUnscaledRadix4:
BL armSP_FFTFwd_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
B FFTEnd
FFTEnd:
finalComplexToRealFixup:
@// F(0) = 1/2[Z(0) + Z'(0)] - j [Z(0) - Z'(0)]
@// 1/2[(a+jb) + (a-jb)] - j [(a+jb) - (a-jb)]
@// 1/2[2a+j0] - j [0+j2b]
@// (a+b, 0)
@// F(N/2) = 1/2[Z(0) + Z'(0)] + j [Z(0) - Z'(0)]
@// 1/2[(a+jb) + (a-jb)] + j [(a+jb) - (a-jb)]
@// 1/2[2a+j0] + j [0+j2b]
@// (a-b, 0)
@// F(0) and F(N/2)
VLD2.F32 {dX0r[0],dX0i[0]},[pSrc]!
MOV zero,#0
VMOV.F32 dX0r[1],zero
MOV step,subFFTSize,LSL #3 @// step = N/2 * 8 bytes
VMOV.F32 dX0i[1],zero
@// twStep = 3N/8 * 8 bytes pointing to W^1
SUB twStep,step,subFFTSize,LSL #1
VADD.F32 dY0r,dX0r,dX0i @// F(0) = ((Z0.r+Z0.i) , 0)
MOV step1,subFFTSize,LSL #2 @// step1 = N/2 * 4 bytes
VSUB.F32 dY0i,dX0r,dX0i @// F(N/2) = ((Z0.r-Z0.i) , 0)
SUBS subFFTSize,subFFTSize,#2
VST1.F32 dY0r,[argDst],step
ADD pTwiddleTmp,argTwiddle,#8 @// W^2
VST1.F32 dY0i,[argDst]!
ADD argTwiddle,argTwiddle,twStep @// W^1
VDUP.F32 dzero,zero
SUB argDst,argDst,step
BLT End
BEQ lastElement
SUB step,step,#24
SUB step1,step1,#8 @// (N/4-1)*8 bytes
@// F(k) = 1/2[Z(k) + Z'(N/2-k)] -j*W^(k) [Z(k) - Z'(N/2-k)]
@// Note: W^k is stored as negative values in the table
@// Process 4 elements at a time. E.g: F(1),F(2) and F(N/2-2),F(N/2-1)
@// since both of them require Z(1),Z(2) and Z(N/2-2),Z(N/2-1)
ADR t0, HALF
VLD1.F32 half[0], [t0]
evenOddButterflyLoop:
VLD1.F32 dW0r,[argTwiddle],step1
VLD1.F32 dW1r,[argTwiddle]!
VLD2.F32 {dX0r,dX0i},[pSrc],step
SUB argTwiddle,argTwiddle,step1
VLD2.F32 {dX1r,dX1i},[pSrc]!
SUB step1,step1,#8 @// (N/4-2)*8 bytes
VLD1.F32 dW0i,[pTwiddleTmp],step1
VLD1.F32 dW1i,[pTwiddleTmp]!
SUB pSrc,pSrc,step
SUB pTwiddleTmp,pTwiddleTmp,step1
VREV64.F32 dX1r,dX1r
VREV64.F32 dX1i,dX1i
SUBS subFFTSize,subFFTSize,#4
VSUB.F32 dT2,dX0r,dX1r @// a-c
SUB step1,step1,#8
VADD.F32 dT0,dX0r,dX1r @// a+c
VSUB.F32 dT1,dX0i,dX1i @// b-d
VADD.F32 dT3,dX0i,dX1i @// b+d
VMUL.F32 dT0,dT0,half[0]
VMUL.F32 dT1,dT1,half[0]
VZIP.F32 dW1r,dW1i
VZIP.F32 dW0r,dW0i
VMUL.F32 qT0,dW1r,dT2
VMUL.F32 qT1,dW1r,dT3
VMUL.F32 qT2,dW0r,dT2
VMUL.F32 qT3,dW0r,dT3
VMLA.F32 qT0,dW1i,dT3
VMLS.F32 qT1,dW1i,dT2
VMLS.F32 qT2,dW0i,dT3
VMLA.F32 qT3,dW0i,dT2
VMUL.F32 dX1r,qT0,half[0]
VMUL.F32 dX1i,qT1,half[0]
VSUB.F32 dY1r,dT0,dX1i @// F(N/2 -1)
VADD.F32 dY1i,dT1,dX1r
VNEG.F32 dY1i,dY1i
VREV64.F32 dY1r,dY1r
VREV64.F32 dY1i,dY1i
VMUL.F32 dX0r,qT2,half[0]
VMUL.F32 dX0i,qT3,half[0]
VSUB.F32 dY0r,dT0,dX0i @// F(1)
VADD.F32 dY0i,dT1,dX0r
VST2.F32 {dY0r,dY0i},[argDst],step
VST2.F32 {dY1r,dY1i},[argDst]!
SUB argDst,argDst,step
SUB step,step,#32 @// (N/2-4)*8 bytes
BGT evenOddButterflyLoop
@// set both the ptrs to the last element
SUB pSrc,pSrc,#8
SUB argDst,argDst,#8
@// Last element can be expanded as follows
@// 1/2[Z(k) + Z'(k)] + j w^k [Z(k) - Z'(k)]
@// 1/2[(a+jb) + (a-jb)] + j w^k [(a+jb) - (a-jb)]
@// 1/2[2a+j0] + j (c+jd) [0+j2b]
@// (a-bc, -bd)
@// Since (c,d) = (0,1) for the last element, result is just (a,-b)
lastElement:
VLD1.F32 dX0r,[pSrc]
VST1.F32 dX0r[0],[argDst]!
VNEG.F32 dX0r,dX0r
VST1.F32 dX0r[1],[argDst]!
End:
@// Set return value
MOV result, #OMX_Sts_NoErr
@// Write function tail
M_END
HALF: .float 0.5
.end
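
The `finalComplexToRealFixup` comments above describe how an N-point real FFT is recovered from an N/2-point complex FFT of the packed input z[n] = x[2n] + j*x[2n+1]. A scalar C sketch of that recombination follows, under stated assumptions: it uses the textbook convention w[k] = exp(-j*2*pi*k/N), whereas the assembly notes that its table stores W^k negated, so the signs would need adapting to match it. The `cpx` type and `real_fft_fixup` name are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for OMX_FC32. */
typedef struct { float re, im; } cpx;

/*
 * z_fft: N/2-point complex FFT of z[n] = x[2n] + j*x[2n+1]
 * w:     caller-supplied twiddles, w[k] = exp(-j*2*pi*k/N), k = 0..N/2
 * out:   receives the N/2 + 1 bins F(0)..F(N/2) of the real FFT
 * n:     real FFT length N (even, at least 2)
 *
 * F(k) = 1/2 [Z(k) + Z'(N/2-k)] - (j/2) w[k] [Z(k) - Z'(N/2-k)]
 */
void real_fft_fixup(const cpx *z_fft, const cpx *w, cpx *out, size_t n) {
    size_t half = n / 2;
    for (size_t k = 0; k <= half; k++) {
        cpx a = z_fft[k % half];           /* Z(k), with Z(N/2) = Z(0) */
        cpx b = z_fft[(half - k) % half];  /* Z(N/2-k), conjugated below */
        /* even part: (a + conj(b)) / 2 */
        float er = 0.5f * (a.re + b.re);
        float ei = 0.5f * (a.im - b.im);
        /* odd part: -j * (a - conj(b)) / 2 */
        float odd_re = 0.5f * (a.im + b.im);
        float odd_im = -0.5f * (a.re - b.re);
        /* F(k) = even + w[k] * odd */
        out[k].re = er + (w[k].re * odd_re - w[k].im * odd_im);
        out[k].im = ei + (w[k].re * odd_im + w[k].im * odd_re);
    }
}
```

For x = {1, 2, 3, 4} the packed FFT is Z = {4+6j, -2-2j}, and the fixup yields the bins 10, -2+2j and -2, matching the a+b and a-b results stated in the comments for F(0) and F(N/2).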


@ -1,49 +0,0 @@
/*
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*
*/
#include "dl/api/armOMX.h"
#include "dl/api/omxtypes.h"
#include "dl/sp/api/armSP.h"
#include "dl/sp/api/omxSP.h"
/**
* Function: omxSP_FFTGetBufSize_R_F32
*
* Description:
* Computes the size of the specification structure required for the length
* 2^order real FFT and IFFT functions.
*
* Remarks:
* This function is used in conjunction with the 32-bit functions
* <FFTFwd_RToCCS_F32_Sfs> and <FFTInv_CCSToR_F32_Sfs>.
*
* Parameters:
* [in] order base-2 logarithm of the length; valid in the range
* [1,12]. ([1,15] if BIG_FFT_TABLE is defined.)
* [out] pSize pointer to the number of bytes required for the
* specification structure.
*
* Return Value:
* Standard omxError result. See enumeration for possible result codes.
*
*/
OMXResult omxSP_FFTGetBufSize_R_F32(OMX_INT order, OMX_INT *pSize) {
if (!pSize || (order < 1) || (order > TWIDDLE_TABLE_ORDER))
return OMX_Sts_BadArgErr;
/*
* The required size is the same as for R_S32, because the
* elements are the same size and because ARMsFFTSpec_R_SC32 is
* the same size as ARMsFFTSpec_R_FC32.
*/
return omxSP_FFTGetBufSize_R_S32(order, pSize);
}


@ -1,91 +0,0 @@
/*
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*
* This file was originally licensed as follows. It has been
* relicensed with permission from the copyright holders.
*/
/**
*
* File Name: omxSP_FFTGetBufSize_R_S32.c
* OpenMAX DL: v1.0.2
* Last Modified Revision: 7777
* Last Modified Date: Thu, 27 Sep 2007
*
* (c) Copyright 2007-2008 ARM Limited. All Rights Reserved.
*
*
* Description:
* Computes the size of the specification structure required.
*/
#include "dl/api/armOMX.h"
#include "dl/api/omxtypes.h"
#include "dl/sp/api/armSP.h"
#include "dl/sp/api/omxSP.h"
/**
* Function: omxSP_FFTGetBufSize_R_S32
*
* Description:
* Computes the size of the specification structure required for the length
* 2^order real FFT and IFFT functions.
*
* Remarks:
* This function is used in conjunction with the 32-bit functions
* <FFTFwd_RToCCS_S32_Sfs> and <FFTInv_CCSToR_S32_Sfs>.
*
* Parameters:
* [in] order base-2 logarithm of the length; valid in the range
* [0,12].
* [out] pSize pointer to the number of bytes required for the
* specification structure.
*
* Return Value:
* Standard omxError result. See enumeration for possible result codes.
*
*/
OMXResult omxSP_FFTGetBufSize_R_S32(
OMX_INT order,
OMX_INT *pSize
)
{
OMX_INT NBy2,N,twiddleSize;
/* Check for order zero */
if (order == 0)
{
*pSize = sizeof(ARMsFFTSpec_R_SC32)
+ sizeof(OMX_S32) * (2); /* Extra size 'N' is used in FFTInv_CCSToR_S32S16_Sfs as a temporary buf */
return OMX_Sts_NoErr;
}
NBy2 = 1 << (order - 1);
N = NBy2<<1;
twiddleSize = 5*N/8; /* 3/4(N/2) + N/4 */
/* 2 pointers to store bitreversed array and twiddle factor array */
*pSize = sizeof(ARMsFFTSpec_R_SC32)
/* Twiddle factors */
+ sizeof(OMX_SC32) * twiddleSize
/* Ping Pong buffer for doing the N/2 point complex FFT */
+ sizeof(OMX_S32) * (N<<1) /* Extra size 'N' is used in FFTInv_CCSToR_S32_Sfs as a temporary buf */
+ 62 ; /* Extra bytes to get 32 byte alignment of ptwiddle and pBuf */
return OMX_Sts_NoErr;
}
/*****************************************************************************
* END OF FILE
*****************************************************************************/
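
The size formula in `omxSP_FFTGetBufSize_R_S32` can be checked by hand with a small sketch. The byte sizes used here are assumptions rather than values read from the headers: a 16-byte spec struct (consistent with the four 4-byte field offsets 0, 4, 8 and 12 used by the assembly on 32-bit ARM), 8-byte complex elements, and 4-byte scalars. The function name is hypothetical.

```c
#include <stddef.h>

/* Assumed sizes on 32-bit ARM; not taken from the OpenMAX DL headers. */
enum { SPEC_BYTES = 16, COMPLEX_BYTES = 8, SCALAR_BYTES = 4 };

/* Bytes needed for the real-FFT spec of length 2^order. */
size_t fft_r_buf_size(unsigned order) {
    if (order == 0)
        return SPEC_BYTES + SCALAR_BYTES * 2;
    size_t n = (size_t)1 << order;
    size_t twiddles = 5 * n / 8;           /* 3/4(N/2) + N/4 */
    return SPEC_BYTES
         + COMPLEX_BYTES * twiddles        /* twiddle factors */
         + SCALAR_BYTES * (n << 1)         /* ping-pong buffer plus temp */
         + 62;                             /* slack for two 32-byte alignments */
}
```

Under these assumptions, order 3 (N = 8) needs 16 + 40 + 64 + 62 = 182 bytes.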


@ -1,210 +0,0 @@
/*
* Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*
* This is a modification of omxSP_FFTInit_R_S32.c to support float
* instead of S32.
*/
#include "dl/api/armOMX.h"
#include "dl/api/omxtypes.h"
#include "dl/sp/api/armSP.h"
#include "dl/sp/api/omxSP.h"
/**
* Function: omxSP_FFTInit_R_F32
*
* Description:
* Initialize the real forward-FFT specification information struct.
*
* Remarks:
* This function is used to initialize the specification structures
* for functions <ippsFFTFwd_RToCCS_F32_Sfs> and
* <ippsFFTInv_CCSToR_F32_Sfs>. Memory for *pFFTSpec must be
* allocated prior to calling this function. The number of bytes
* required for *pFFTSpec can be determined using
* <FFTGetBufSize_R_F32>.
*
* Parameters:
* [in] order base-2 logarithm of the desired block length;
* valid in the range [1,12]. ([1,15] if
* BIG_FFT_TABLE is defined.)
* [out] pFFTSpec pointer to the initialized specification structure.
*
* Return Value:
* Standard omxError result. See enumeration for possible result codes.
*
*/
OMXResult omxSP_FFTInit_R_F32(OMXFFTSpec_R_F32* pFFTSpec, OMX_INT order) {
OMX_INT i;
OMX_INT j;
OMX_FC32* pTwiddle;
OMX_FC32* pTwiddle1;
OMX_FC32* pTwiddle2;
OMX_FC32* pTwiddle3;
OMX_FC32* pTwiddle4;
OMX_F32* pBuf;
OMX_U16* pBitRev;
OMX_U32 pTmp;
OMX_INT Nby2;
OMX_INT N;
OMX_INT M;
OMX_INT diff;
OMX_INT step;
OMX_F32 x;
OMX_F32 y;
OMX_F32 xNeg;
ARMsFFTSpec_R_FC32* pFFTStruct = 0;
pFFTStruct = (ARMsFFTSpec_R_FC32 *) pFFTSpec;
/* Validate args */
if (!pFFTSpec || (order < 1) || (order > TWIDDLE_TABLE_ORDER))
return OMX_Sts_BadArgErr;
/* Do the initializations */
Nby2 = 1 << (order - 1);
N = Nby2 << 1;
/* optimized implementations don't use bitreversal */
pBitRev = NULL;
pTwiddle = (OMX_FC32 *) (sizeof(ARMsFFTSpec_R_SC32) + (OMX_S8*) pFFTSpec);
/* Align to 32 byte boundary */
pTmp = ((OMX_U32)pTwiddle) & 31;
if (pTmp)
pTwiddle = (OMX_FC32*) ((OMX_S8*)pTwiddle + (32 - pTmp));
pBuf = (OMX_F32*) (sizeof(OMX_FC32)*(5*N/8) + (OMX_S8*) pTwiddle);
/* Align to 32 byte boundary */
pTmp = ((OMX_U32)pBuf)&31; /* (OMX_U32)pBuf % 32 */
if (pTmp)
pBuf = (OMX_F32*) ((OMX_S8*)pBuf + (32 - pTmp));
/*
* Filling Twiddle factors :
*
* exp^(-j*2*PI*k/ (N/2) ) ; k=0,1,2,...,3/4(N/2)
*
* N/2 point complex FFT is used to compute N point real FFT. The
* original twiddle table "armSP_FFT_F32TwiddleTable" is of size
* (MaxSize/8 + 1). The rest of the values, i.e., up to MaxSize, are
* calculated using the symmetries of sin and cos. The max size of
* the twiddle table needed is 3/4(N/2) for a radix-4 stage.
*
* W = (-2 * PI) / N
* N = 1 << order
* W = -PI >> (order - 1)
*/
M = Nby2 >> 3;
diff = TWIDDLE_TABLE_ORDER - (order - 1);
/* step into the twiddle table for the current order */
step = 1 << diff;
x = armSP_FFT_F32TwiddleTable[0];
y = armSP_FFT_F32TwiddleTable[1];
xNeg = 1;
if ((order - 1) >= 3) {
/* i = 0 case */
pTwiddle[0].Re = x;
pTwiddle[0].Im = y;
pTwiddle[2*M].Re = -y;
pTwiddle[2*M].Im = xNeg;
pTwiddle[4*M].Re = xNeg;
pTwiddle[4*M].Im = y;
for (i = 1; i <= M; i++) {
j = i*step;
x = armSP_FFT_F32TwiddleTable[2*j];
y = armSP_FFT_F32TwiddleTable[2*j+1];
pTwiddle[i].Re = x;
pTwiddle[i].Im = y;
pTwiddle[2*M-i].Re = -y;
pTwiddle[2*M-i].Im = -x;
pTwiddle[2*M+i].Re = y;
pTwiddle[2*M+i].Im = -x;
pTwiddle[4*M-i].Re = -x;
pTwiddle[4*M-i].Im = y;
pTwiddle[4*M+i].Re = -x;
pTwiddle[4*M+i].Im = -y;
pTwiddle[6*M-i].Re = y;
pTwiddle[6*M-i].Im = x;
}
} else if ((order - 1) == 2) {
pTwiddle[0].Re = x;
pTwiddle[0].Im = y;
pTwiddle[1].Re = -y;
pTwiddle[1].Im = xNeg;
pTwiddle[2].Re = xNeg;
pTwiddle[2].Im = y;
} else if ((order-1) == 1) {
pTwiddle[0].Re = x;
pTwiddle[0].Im = y;
}
/*
* Now fill the last N/4 values : exp^(-j*2*PI*k/N) ;
* k=1,3,5,...,N/2-1. These are used for the final twiddle fix-up for
* converting complex to real FFT.
*/
M = N >> 3;
diff = TWIDDLE_TABLE_ORDER - order;
step = 1 << diff;
pTwiddle1 = pTwiddle + 3*N/8;
pTwiddle4 = pTwiddle1 + (N/4 - 1);
pTwiddle3 = pTwiddle1 + N/8;
pTwiddle2 = pTwiddle1 + (N/8 - 1);
x = armSP_FFT_F32TwiddleTable[0];
y = armSP_FFT_F32TwiddleTable[1];
xNeg = 1;
if (order >=3) {
for (i = 1; i <= M; i += 2) {
j = i*step;
x = armSP_FFT_F32TwiddleTable[2*j];
y = armSP_FFT_F32TwiddleTable[2*j+1];
pTwiddle1[0].Re = x;
pTwiddle1[0].Im = y;
pTwiddle1 += 1;
pTwiddle2[0].Re = -y;
pTwiddle2[0].Im = -x;
pTwiddle2 -= 1;
pTwiddle3[0].Re = y;
pTwiddle3[0].Im = -x;
pTwiddle3 += 1;
pTwiddle4[0].Re = -x;
pTwiddle4[0].Im = y;
pTwiddle4 -= 1;
}
} else {
if (order == 2) {
pTwiddle1[0].Re = -y;
pTwiddle1[0].Im = xNeg;
}
}
/* Update the structure */
pFFTStruct->N = N;
pFFTStruct->pTwiddle = pTwiddle;
pFFTStruct->pBitRev = pBitRev;
pFFTStruct->pBuf = pBuf;
return OMX_Sts_NoErr;
}
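
`omxSP_FFTInit_R_F32` rounds `pTwiddle` and `pBuf` up to 32-byte boundaries by casting the pointers to `OMX_U32`, which presumably only works where pointers are 32 bits wide. A sketch of the same rounding using `uintptr_t` follows; the helper name is hypothetical.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 32; aligned values are unchanged. */
uintptr_t align_up_32(uintptr_t addr) {
    uintptr_t rem = addr & 31;             /* addr % 32 */
    return rem ? addr + (32 - rem) : addr;
}
```

Each rounding can skip at most 31 bytes, which is consistent with the 62 spare bytes that `omxSP_FFTGetBufSize_R_S32` reserves for aligning the two pointers.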


@ -1,284 +0,0 @@
@//
@// Copyright (c) 2013 The WebRTC project authors. All Rights Reserved.
@//
@// Copyright 2016, Mozilla Foundation and contributors
@//
@// Use of this source code is governed by a BSD-style license
@// that can be found in the LICENSE file in the root of the source
@// tree. An additional intellectual property rights grant can be found
@// in the file PATENTS. All contributing project authors may
@// be found in the AUTHORS file in the root of the source tree.
@//
@// This is a modification of omxSP_FFTInv_CCSToR_S32_Sfs_s.s
@// to support float instead of SC32.
@//
@// It is further modified to produce an "unscaled" version, which
@// actually multiplies by two for consistency with the other FFT functions
@// in use.
@//
@//
@// Description:
@// Compute an inverse FFT of a complex-conjugate-symmetric (CCS)
@// spectrum, producing a real signal
@//
@//
@// Include standard headers
#include "dl/api/armCOMM_s.h"
#include "dl/api/omxtypes_s.h"
@// Import symbols required from other files
@// (For example tables)
.extern armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
.extern armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
.extern armSP_FFTInv_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
.extern armSP_FFTInv_CToC_FC32_Radix4_OutOfPlace_unsafe
.extern armSP_FFTInv_CToC_FC32_Radix2_OutOfPlace_unsafe
.extern armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe
@// Set debugging level
@//DEBUG_ON SETL {TRUE}
@// Guarding implementation by the processor name
@// Guarding implementation by the processor name
@// Import symbols required from other files
@// (For example tables)
.extern armSP_FFTInv_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
.extern armSP_FFTInv_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
@//Input Registers
#define pSrc r0
#define pDst r1
#define pFFTSpec r2
#define scale r3
@// Output registers
#define result r0
@//Local Scratch Registers
#define argTwiddle r1
#define argDst r2
#define argScale r4
#define tmpOrder r4
#define pTwiddle r4
#define pOut r5
#define subFFTSize r7
#define subFFTNum r6
#define N r6
#define order r14
#define diff r9
@// Total number of radix stages required to complete the FFT
#define count r8
#define x0r r4
#define x0i r5
#define diffMinusOne r2
#define round r3
#define pOut1 r2
#define size r7
#define step r8
#define step1 r9
#define twStep r10
#define pTwiddleTmp r11
#define argTwiddle1 r12
#define zero r14
@// Neon registers
#define dX0 D0
#define dShift D1
#define dX1 D1
#define dY0 D2
#define dY1 D3
#define dX0r D0
#define dX0i D1
#define dX1r D2
#define dX1i D3
#define dW0r D4
#define dW0i D5
#define dW1r D6
#define dW1i D7
#define dT0 D8
#define dT1 D9
#define dT2 D10
#define dT3 D11
#define qT0 d12
#define qT1 d14
#define qT2 d16
#define qT3 d18
#define dY0r D4
#define dY0i D5
#define dY1r D6
#define dY1i D7
#define dzero D20
#define dY2 D4
#define dY3 D5
#define dW0 D6
#define dW1 D7
#define dW0Tmp D10
#define dW1Neg D11
#define sN S0.S32
#define fN S1
@// two must be the same as dScale[0]!
#define dScale D2
#define two S4
@// Allocate stack memory required by the function
M_ALLOC4 complexFFTSize, 4
@// Write function header
M_START omxSP_FFTInv_CCSToR_F32_Sfs_unscaled,r11,d15
@ Structure offsets for the FFTSpec
.set ARMsFFTSpec_N, 0
.set ARMsFFTSpec_pBitRev, 4
.set ARMsFFTSpec_pTwiddle, 8
.set ARMsFFTSpec_pBuf, 12
@// Define stack arguments
@// Read the size from structure and take log
LDR N, [pFFTSpec, #ARMsFFTSpec_N]
@// Read other structure parameters
LDR pTwiddle, [pFFTSpec, #ARMsFFTSpec_pTwiddle]
LDR pOut, [pFFTSpec, #ARMsFFTSpec_pBuf]
@// N=1: treat separately
CMP N,#1
BGT sizeGreaterThanOne
VLD1.F32 dX0[0],[pSrc]
VST1.F32 dX0[0],[pDst]
B End
sizeGreaterThanOne:
@// Call the preTwiddle Radix2 stage before doing the complex IFFT
BL armSP_FFTInv_CCSToR_F32_preTwiddleRadix2_unsafe
complexIFFT:
ASR N,N,#1 @// N/2 point complex IFFT
M_STR N, complexFFTSize @ Save N for scaling later
ADD pSrc,pOut,N,LSL #3 @// set pSrc as pOut1
CLZ order,N @// N = 2^order
RSB order,order,#31
MOV subFFTSize,#1
@//MOV subFFTNum,N
CMP order,#3
BGT orderGreaterthan3 @// order > 3
CMP order,#1
BGE orderGreaterthan0 @// order > 0
VLD1.F32 dX0,[pSrc]
VST1.F32 dX0,[pDst]
MOV pSrc,pDst
BLT FFTEnd
orderGreaterthan0:
@// set the buffers appropriately for various orders
CMP order,#2
MOVNE argDst,pDst
MOVEQ argDst,pOut
@// Pass the first stage destination in RN5
MOVEQ pOut,pDst
MOV argTwiddle,pTwiddle
BGE orderGreaterthan1
BLLT armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe @// order = 1
B FFTEnd
orderGreaterthan1:
MOV tmpOrder,order @// tmpOrder = RN 4
BL armSP_FFTInv_CToC_FC32_Radix2_fs_OutOfPlace_unsafe
CMP tmpOrder,#2
BLGT armSP_FFTInv_CToC_FC32_Radix2_OutOfPlace_unsafe
BL armSP_FFTInv_CToC_FC32_Radix2_ls_OutOfPlace_unsafe
B FFTEnd
orderGreaterthan3:
specialScaleCase:
@// Set input args to fft stages
TST order, #2
MOVNE argDst,pDst
MOVEQ argDst,pOut
@// Pass the first stage destination in RN5
MOVEQ pOut,pDst
MOV argTwiddle,pTwiddle
@//check for even or odd order
@// NOTE: The following combination of BL's would work fine even though
@// the first BL would corrupt the flags. This is because the end of
@// the "grpZeroSetLoop" loop inside
@// armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe sets the Z flag
@// to EQ
TST order,#0x00000001
BLEQ armSP_FFTInv_CToC_FC32_Radix4_fs_OutOfPlace_unsafe
BLNE armSP_FFTInv_CToC_FC32_Radix8_fs_OutOfPlace_unsafe
CMP subFFTNum,#4
BLT FFTEnd
unscaledRadix4Loop:
BEQ lastStageUnscaledRadix4
BL armSP_FFTInv_CToC_FC32_Radix4_OutOfPlace_unsafe
CMP subFFTNum,#4
B unscaledRadix4Loop
lastStageUnscaledRadix4:
BL armSP_FFTInv_CToC_FC32_Radix4_ls_OutOfPlace_unsafe
B FFTEnd
FFTEnd: @// Does only the scaling
@ Scale inverse FFT result by 2 for consistency with other FFTs
VMOV.F32 two, #2.0 @ two = dScale[0]
@// N = subFFTSize ; dataptr = pDst
scaleFFTData:
VLD1.F32 {dX0},[pSrc] @// pSrc contains pDst pointer
SUBS subFFTSize,subFFTSize,#1
VMUL.F32 dX0, dX0, dScale[0]
VST1.F32 {dX0},[pSrc]!
BGT scaleFFTData
End:
@// Set return value
MOV result, #OMX_Sts_NoErr
@// Write function tail
M_END
.end
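
The branch structure in both real-FFT drivers picks the radix of each stage from `order`, the base-2 log of the complex FFT length. Orders up to 3 use only radix-2 stages. Larger orders start with radix-4 when the order is even, or radix-8 when it is odd, and continue with radix-4 stages. The sketch below reproduces that selection in C, assuming that each stage routine divides `subFFTNum` by its radix, which the code shown here does not confirm. The `plan_stages` name is hypothetical.

```c
#include <stddef.h>

/*
 * Fill radices[] with the radix of each stage of a 2^order-point complex
 * FFT and return the stage count. radices must hold at least 8 entries,
 * which is enough for any order up to 15.
 */
size_t plan_stages(unsigned order, unsigned radices[]) {
    size_t count = 0;
    if (order <= 3) {
        /* Small sizes: one radix-2 stage per bit of the length. */
        for (unsigned i = 0; i < order; i++)
            radices[count++] = 2;
        return count;
    }
    unsigned remaining = order;
    if (order & 1) {
        radices[count++] = 8;   /* odd order: radix-8 first stage */
        remaining -= 3;
    } else {
        radices[count++] = 4;   /* even order: radix-4 first stage */
        remaining -= 2;
    }
    while (remaining > 0) {     /* remaining stages are all radix-4 */
        radices[count++] = 4;
        remaining -= 2;
    }
    return count;
}
```

For example, order 5 (32 points) gives the stages 8 then 4, and order 6 (64 points) gives three radix-4 stages.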


@ -104,7 +104,6 @@
<li><a href="about:license#jquery">jQuery License</a></li>
<li><a href="about:license#k_exp">k_exp License</a></li>
<li><a href="about:license#khronos">Khronos group License</a></li>
<li><a href="about:license#kiss_fft">Kiss FFT License</a></li>
#ifdef MOZ_USE_LIBCXX
<li><a href="about:license#libc++">libc++ License</a></li>
#endif
@ -2041,7 +2040,6 @@ WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
<li><code>gfx/ots/</code></li>
<li><code>gfx/ycbcr/</code></li>
<li><code>ipc/chromium/</code></li>
<li><code>media/openmax_dl/</code></li>
<li><code>toolkit/components/reputationservice/</code></li>
<li><code>toolkit/components/url-classifier/chromium/</code></li>
<li><code>tools/profiler/</code></li>
@ -3116,80 +3114,6 @@ OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
</pre>
<hr>
<h1><a id="khronos"></a>Khronos group License</h1>
<p>This license applies to the following files:</p>
<ul>
<li><code>media/openmax_dl/dl/api/omxtypes.h</code></li>
<li><code>media/openmax_dl/dl/sp/api/omxSP.h</code></li>
</ul>
<pre>
Copyright 2005-2008 The Khronos Group Inc. All Rights Reserved.
These materials are protected by copyright laws and contain material
proprietary to the Khronos Group, Inc. You may use these materials
for implementing Khronos specifications, without altering or removing
any trademark, copyright or other notice from the specification.
Khronos Group makes no, and expressly disclaims any, representations
or warranties, express or implied, regarding these materials, including,
without limitation, any implied warranties of merchantability or fitness
for a particular purpose or non-infringement of any intellectual property.
Khronos Group makes no, and expressly disclaims any, warranties, express
or implied, regarding the correctness, accuracy, completeness, timeliness,
and reliability of these materials.
Under no circumstances will the Khronos Group, or any of its Promoters,
Contributors or Members or their respective partners, officers, directors,
employees, agents or representatives be liable for any damages, whether
direct, indirect, special or consequential damages for lost revenues,
lost profits, or otherwise, arising from or in connection with these
materials.
Khronos and OpenMAX are trademarks of the Khronos Group Inc.
</pre>
<hr>
<h1><a id="kiss_fft"></a>Kiss FFT License</h1>
<p>This license applies to files in the directory
<code>media/kiss_fft/</code>.</p>
<pre>
Copyright (c) 2003-2010 Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to
endorse or promote products derived from this software without specific
prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</pre>
<hr>
#ifdef MOZ_USE_LIBCXX