Commit Graph

320950 Commits

Author SHA1 Message Date
Kim Phillips
a5bbf6fa79 crypto: caam - increase TRNG clocks per sample
we need to configure the TRNG to use more clocks per sample
to handle the two back-to-back 64KiB random descriptor requests
on higher frequency P5040s.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:15 +08:00
Suresh Siddha
b6f3fefe1f crypto, tcrypt: remove local_bh_disable/enable() around local_irq_disable/enable()
Ran into this while looking at some new crypto code using FPU
hitting a WARN_ON_ONCE(!irq_fpu_usable()) in the kernel_fpu_begin()
on a x86 kernel that uses the new eagerfpu model. In short, current eagerfpu
changes return 0 for interrupted_kernel_fpu_idle() and the in_interrupt()
thinks it is in the interrupt context because of the local_bh_disable().
Thus resulting in the WARN_ON().

Remove the local_bh_disable/enable() calls around the existing
local_irq_disable/enable() calls. local_irq_disable/enable() already
disables the BH.

 [ If there are any other legitimate users calling kernel_fpu_begin() from
   the process context but with BH disabled, then we can look into fixing the
   irq_fpu_usable() in future. ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:15 +08:00
Peter Senna Tschudin
35c41db8f9 crypto: tegra-aes - fix error return code
Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:15 +08:00
Peter Senna Tschudin
b48ae1df54 crypto: crypto4xx - fix error return code
Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:15 +08:00
Peter Senna Tschudin
c2ff861d96 crypto: hifn_795x - fix error return code
Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:14 +08:00
Peter Senna Tschudin
79c09c122f crypto: ux500 - fix error return code
Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Reviewed-by: Arun Murthy <arunrmurthy83@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:14 +08:00
Horia Geanta
39ab735835 crypto: caam - fix error IDs for SEC v5.x RNG4
According to SEC v5.0-v5.3 reference manuals.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-27 13:32:14 +08:00
Fabio Estevam
821873abc2 hwrng: mxc-rnga - Access data via structure
In current driver, everytime we need to access the rng clock
,ie to enable or disable it, a call to clk_get is done.

This is not correct and the preferred way is to provide a rng data structure
that could be used for accessing rng resources.

Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:07 +08:00
Fabio Estevam
a9ccb7bd0a hwrng: mxc-rnga - Adapt clocks to new i.mx clock framework
Adapt clocks to the new i.mx clock framework and fix the following warning:

------------[ cut here ]------------
WARNING: at drivers/clk/clk.c:511 __clk_enable+0x9c/0xac()
Modules linked in:
Backtrace:
[<800124c8>] (dump_backtrace+0x0/0x10c) from [<804172dc>] (dump_stack+0x18/0x1c)
 r7:00000009 r6:000001ff r5:8032cb50 r4:00000000
[<804172c4>] (dump_stack+0x0/0x1c) from [<80021834>] (warn_slowpath_common+0x54)
[<800217e0>] (warn_slowpath_common+0x0/0x6c) from [<80021870>] (warn_slowpath_n)
 r9:80581cac r8:8700a9c0 r7:805ab070 r6:80000013 r5:806133d4
r4:8700a9c0
[<8002184c>] (warn_slowpath_null+0x0/0x2c) from [<8032cb50>] (__clk_enable+0x9c)
[<8032cab4>] (__clk_enable+0x0/0xac) from [<8032cb88>] (clk_enable+0x28/0x44)
 r5:806133d4 r4:8700a9c0
[<8032cb60>] (clk_enable+0x0/0x44) from [<80560f14>] (mxc_rnga_probe+0x68/0x164)
 r7:805ab070 r6:8706ec00 r5:80611314 r4:00000000
[<80560eac>] (mxc_rnga_probe+0x0/0x164) from [<8025914c>] (platform_drv_probe+0)
[<8025912c>] (platform_drv_probe+0x0/0x24) from [<80257c7c>] (driver_probe_devi)
[<80257bfc>] (driver_probe_device+0x0/0x204) from [<80257e94>] (__driver_attach)
 r9:80581cac r8:0000008e r7:00000000 r6:8706ec3c r5:805ab070
r4:8706ec08
[<80257e00>] (__driver_attach+0x0/0x98) from [<8025642c>] (bus_for_each_dev+0x6)
 r7:00000000 r6:80257e00 r5:87035e98 r4:805ab070
[<802563c4>] (bus_for_each_dev+0x0/0x94) from [<80257adc>] (driver_attach+0x20/)
 r7:00000000 r6:873f2380 r5:805ab338 r4:805ab070
[<80257abc>] (driver_attach+0x0/0x28) from [<80256d50>] (bus_add_driver+0x18c/0)
[<80256bc4>] (bus_add_driver+0x0/0x268) from [<802584c4>] (driver_register+0x80)
[<80258444>] (driver_register+0x0/0x134) from [<802594f4>] (platform_driver_reg)
 r7:00000000 r6:805c2e00 r5:00000007 r4:805ab05c
[<802594a8>] (platform_driver_register+0x0/0x60) from [<80259528>] (platform_dr)
[<80259508>] (platform_driver_probe+0x0/0xa4) from [<80560ea0>] (mod_init+0x18/)
 r7:00000000 r6:805c2e00 r5:00000007 r4:87034000
[<80560e88>] (mod_init+0x0/0x24) from [<800086b4>] (do_one_initcall+0x40/0x194)
[<80008674>] (do_one_initcall+0x0/0x194) from [<8053d3f4>] (kernel_init+0xfc/0x)
[<8053d2f8>] (kernel_init+0x0/0x1cc) from [<80027190>] (do_exit+0x0/0x7ec)
---[ end trace 4198eed02050f461 ]---

Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:07 +08:00
Horia Geanta
891104ed00 crypto: caam - add IPsec ESN support
Support for ESNs (extended sequence numbers).
Tested with strongswan by connecting back-to-back P1010RDB with P2020RDB.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:07 +08:00
Jussi Kivilinna
c2b3711d11 crypto: 842 - remove .cra_list initialization
.cra_list initialization is unneeded and have been removed from all other
crypto modules except 842.

Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:06 +08:00
Jussi Kivilinna
312639bb1b Revert "[CRYPTO] cast6: inline bloat--"
This reverts commit e6ccc727f3.

Above commit caused performance regression for CAST6. Reverting gives
following increase in tcrypt speed tests (revert-vs-old ratios).

AMD Phenom II X6 1055T, x86-64:

size    ecb             cbc             ctr             lrw             xts
        enc     dec     enc     dec     enc     dec     enc     dec     enc     dec
16b     1.15x   1.17x   1.16x   1.17x   1.16x   1.16x   1.14x   1.19x   1.05x   1.07x
64b     1.19x   1.23x   1.20x   1.22x   1.19x   1.19x   1.16x   1.24x   1.12x   1.12x
256b    1.21x   1.24x   1.22x   1.24x   1.20x   1.20x   1.17x   1.21x   1.16x   1.14x
1kb     1.21x   1.25x   1.22x   1.24x   1.21x   1.21x   1.18x   1.22x   1.17x   1.15x
8kb     1.21x   1.25x   1.22x   1.24x   1.21x   1.21x   1.18x   1.22x   1.18x   1.15x

Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:06 +08:00
Jussi Kivilinna
35434c5fb7 crypto: cast6 - fix sparse warnings (symbol was not declared, should be static?)
Fix "symbol 'x' was not declared. Should it be static?" sparse warnings.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:06 +08:00
Jussi Kivilinna
3cfad0d03c crypto: cast5 - fix sparse warnings (symbol was not declared, should be static?)
Fix "symbol 'x' was not declared. Should it be static?" sparse warnings.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:05 +08:00
Jussi Kivilinna
1ffb72a39a crypto: camellia-x86_64 - fix sparse warnings (constant is so big)
Fix "constant 0xXXXXXXXXXXXXXXXX is so big it's unsigned long" sparse warnings.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:05 +08:00
Jussi Kivilinna
66ce0b0f29 crypto: crypto_user - fix sparse warnings (symbol was not declared, should be static?)
Fix "symbol 'x' was not declared. Should it be static?" sparse warnings.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:05 +08:00
Jussi Kivilinna
c09220e1bc crypto: cast6-avx - tune assembler code for more performance
Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies, interleaves instructions better for out-of-order scheduling
and merges constant 16-bit rotation with round-key variable rotation.

tcrypt ECB results:

Intel Core i5-2450M:

size    old-vs-new      new-vs-generic  old-vs-generic
        enc     dec     enc     dec     enc     dec
256     1.13x   1.19x   2.05x   2.17x   1.82x   1.82x
1k      1.18x   1.21x   2.26x   2.33x   1.93x   1.93x
8k      1.19x   1.19x   2.32x   2.33x   1.95x   1.95x

[v2]
 - Do instruction interleaving another way to avoid adding new FPU<=>CPU
   register moves as these cause performance drop on Bulldozer.
 - Improvements to round-key variable rotation handling.
 - Further interleaving improvements for better out-of-order scheduling.

Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:05 +08:00
Jussi Kivilinna
ddaea7869d crypto: cast5-avx - tune assembler code for more performance
Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies, interleaves instructions better for out-of-order scheduling
and merges constant 16-bit rotation with round-key variable rotation.

tcrypt ECB results (128bit key):

Intel Core i5-2450M:

size    old-vs-new      new-vs-generic  old-vs-generic
        enc     dec     enc     dec     enc     dec
256     1.18x   1.18x   2.45x   2.47x   2.08x   2.10x
1k      1.20x   1.20x   2.73x   2.73x   2.28x   2.28x
8k      1.20x   1.19x   2.73x   2.73x   2.28x   2.29x

[v2]
 - Do instruction interleaving another way to avoid adding new FPU<=>CPU
   register moves as these cause performance drop on Bulldozer.
 - Improvements to round-key variable rotation handling.
 - Further interleaving improvements for better out-of-order scheduling.

Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:04 +08:00
Jussi Kivilinna
f94a73f8dd crypto: twofish-avx - tune assembler code for more performance
Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies and interleaves instructions better for out-of-order scheduling.

Tested on Intel Core i5-2450M and AMD FX-8100.

tcrypt ECB results:

Intel Core i5-2450M:

size    old-vs-new      new-vs-3way     old-vs-3way
        enc     dec     enc     dec     enc     dec
256     1.12x   1.13x   1.36x   1.37x   1.21x   1.22x
1k      1.14x   1.14x   1.48x   1.49x   1.29x   1.31x
8k      1.14x   1.14x   1.50x   1.52x   1.32x   1.33x

AMD FX-8100:

size    old-vs-new      new-vs-3way     old-vs-3way
        enc     dec     enc     dec     enc     dec
256     1.10x   1.11x   1.01x   1.01x   0.92x   0.91x
1k      1.11x   1.12x   1.08x   1.07x   0.97x   0.96x
8k      1.11x   1.13x   1.10x   1.08x   0.99x   0.97x

[v2]
 - Do instruction interleaving another way to avoid adding new FPU<=>CPU
   register moves as these cause performance drop on Bulldozer.
 - Further interleaving improvements for better out-of-order scheduling.

Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:04 +08:00
Sachin Kamat
49d30d3d5f crypto: geode-aes - Use module_pci_driver
module_pci_driver makes the code simpler by eliminating
module_init and module_exit calls.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:03 +08:00
Wei Yongjun
21a5b95f56 crypto: remove duplicated include
From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:03 +08:00
Kim Phillips
2af8f4a272 crypto: caam - coccicheck fixes
use true/false for bool, fix code alignment, and fix two allocs with
no test.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:03 +08:00
Devendra Naga
6bbb98ddfc crypto: ux500/hash - remove unneeded return at ux500_hash_mod_fini
Signed-off-by: Devendra Naga <develkernel412222@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:03 +08:00
David McCullough
f0be44f4fb arm/crypto: Add optimized AES and SHA1 routines
Add assembler versions of AES and SHA1 for ARM platforms.  This has provided
up to a 50% improvement in IPsec/TCP throughout for tunnels using AES128/SHA1.

Platform   CPU SPeed    Endian   Before (bps)   After (bps)   Improvement

IXP425      533 MHz      big     11217042        15566294        ~38%
KS8695      166 MHz     little    3828549         5795373        ~51%

Signed-off-by: David McCullough <ucdevel@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:02 +08:00
Kent Yoder
956c203c5e crypto: Add a MAINTAINERS entry for P7+ in-Nest crypto driver
Add a MAINTAINERS entry for the IBM Power in-Nest Crypto Acceleators
driver.

Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:01 +08:00
Horia Geanta
e763eb699b crypto: talitos - add IPsec ESN support
Support for ESNs (extended sequence numbers).
Tested with strongswan on a P2020RDB back-to-back setup.
Extracted from /etc/ipsec.conf:
esp=aes-sha1-esn-modp4096!

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:56:30 +08:00
Horia Geanta
79fd31d355 crypto: talitos - support for assoc data provided as scatterlist
Generate a link table in case assoc data is a scatterlist.
While at it, add support for handling non-contiguous assoc data and iv.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:56:27 +08:00
Horia Geanta
2a1cfe46b1 crypto: talitos - change type and name for [src|dst]_is_chained
It's more natural to think of these vars as bool rather than int.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:56:26 +08:00
Horia Geanta
602499a342 crypto: talitos - prune unneeded descriptor allocation param
talitos_edesc_alloc does not need hash_result param.
Checking whether dst scatterlist is NULL or not is all that is required.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:56:26 +08:00
Horia Geanta
60542505b0 crypto: talitos - fix icv management on outbound direction
For IPsec encryption, in the case when:
-the input buffer is fragmented (edesc->src_nents > 0)
-the output buffer is not fragmented (edesc->dst_nents = 0)
the ICV is not output in the link table, but after the encrypted payload.

Copying the ICV must be avoided in this case; consequently the condition
edesc->dma_len > 0 must be more specific, i.e. must depend on the type
of the output buffer - fragmented or not.

Testing was performed by modifying testmgr to support src != dst,
since currently native kernel IPsec does in-place encryption
(src == dst).

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:56:26 +08:00
Kim Phillips
b286e00304 crypto: talitos - consolidate common cra_* assignments
the entry points and geniv definitions for all aead,
ablkcipher, and hash algorithms are all common; move them to a
single assignment in talitos_alg_alloc().

This assumes it's ok to assign a setkey() on non-hmac algs.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:53:53 +08:00
Kim Phillips
d4cd3283f6 crypto: talitos - consolidate cra_type assignments
lighten driver_algs[] by moving them to talitos_alg_alloc().

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28 23:53:53 +08:00
Tushar Behera
22eed1ca19 crypto: atmel - Remove possible typo error
Commit bd3c7b5c2a ("crypto: atmel - add Atmel AES driver") possibly
has a typo error of adding an extra CONFIG_.

CC: Nicolas Royer <nicolas@eukrea.com>
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20 16:28:13 +08:00
Julia Lawall
4dbb845dde drivers/char/hw_random/octeon-rng.c: drop frees of devm allocated data
devm_kfree and devm_iounmap should not have to be explicitly used.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,d;
@@

x = devm_kzalloc(...)
...
?-devm_kfree(d,x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20 16:28:13 +08:00
Michael Ellerman
33b58b01ac crypto: nx - Remove virt_to_abs() usage in nx-842.c
virt_to_abs() is just a wrapper around __pa(), use __pa() directly.

We should be including <asm/page.h> to get __pa(). abs_addr.h will be
removed shortly so drop that.

We were getting of.h via abs_addr.h so we need to include that directly.

Having done all that, clean up the ordering of the includes.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20 16:28:12 +08:00
Jussi Kivilinna
023af60825 crypto: aesni_intel - improve lrw and xts performance by utilizing parallel AES-NI hardware pipelines
Use parallel LRW and XTS encryption facilities to better utilize AES-NI
hardware pipelines and gain extra performance.

Tcrypt benchmark results (async), old vs new ratios:

Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7)

aes:128bit
        lrw:256bit      xts:256bit
size    lrw-enc lrw-dec xts-dec xts-dec
16B     0.99x   1.00x   1.22x   1.19x
64B     1.38x   1.50x   1.58x   1.61x
256B    2.04x   2.02x   2.27x   2.29x
1024B   2.56x   2.54x   2.89x   2.92x
8192B   2.85x   2.99x   3.40x   3.23x

aes:192bit
        lrw:320bit      xts:384bit
size    lrw-enc lrw-dec xts-dec xts-dec
16B     1.08x   1.08x   1.16x   1.17x
64B     1.48x   1.54x   1.59x   1.65x
256B    2.18x   2.17x   2.29x   2.28x
1024B   2.67x   2.67x   2.87x   3.05x
8192B   2.93x   2.84x   3.28x   3.33x

aes:256bit
        lrw:348bit      xts:512bit
size    lrw-enc lrw-dec xts-dec xts-dec
16B     1.07x   1.07x   1.18x   1.19x
64B     1.56x   1.56x   1.70x   1.71x
256B    2.22x   2.24x   2.46x   2.46x
1024B   2.76x   2.77x   3.13x   3.05x
8192B   2.99x   3.05x   3.40x   3.30x

Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20 16:28:10 +08:00
Seth Jennings
35a1fc1873 powerpc/crypto: add 842 crypto driver
This patch add the 842 cryptographic API driver that
submits compression requests to the 842 hardware compression
accelerator driver (nx-compress).

If the hardware accelerator goes offline for any reason
(dynamic disable, migration, etc...), this driver will use LZO
as a software failover for all future compression requests.
For decompression requests, the 842 hardware driver contains
a software implementation of the 842 decompressor to support
the decompression of data that was compressed before the accelerator
went offline.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:56 +08:00
Seth Jennings
0e16aafb12 powerpc/crypto: add 842 hardware compression driver
This patch adds the driver for interacting with the 842
compression accelerator on IBM Power7+ systems.

The device is a child of the Platform Facilities Option (PFO)
and shows up as a child of the IBM VIO bus.

The compression/decompression API takes the same arguments
as existing compression methods like lzo and deflate.  The 842
hardware operates on 4K hardware pages and the driver breaks up
input on 4K boundaries to submit it to the hardware accelerator.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:56 +08:00
Seth Jennings
da29aa8f2a powerpc/crypto: add compression support to arch vec
This patch enables compression engine support in the
architecture vector.  This causes the Power hypervisor
to allow access to the nx comrpession accelerator.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:53 +08:00
Seth Jennings
322cacce0a powerpc/crypto: rework Kconfig
This patch creates a new submenu for the NX cryptographic
hardware accelerator and breaks the NX options into their own
Kconfig file under drivers/crypto/nx/Kconfig.

This will permit additional NX functionality to be easily
and more cleanly added in the future without touching
drivers/crypto/Makefile|Kconfig.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:52 +08:00
Kim Phillips
61bb86bba1 crypto: caam - set descriptor sharing type to SERIAL
SHARE_WAIT, whilst more optimal for association-less crypto,
has the ability to start thrashing the CCB descriptor/key
caches, given high levels of traffic across multiple security
associations (and thus keys).

Switch to using the SERIAL sharing type, which prefers
the last used CCB for the SA.  On a 2-DECO platform
such as the P3041, this can improve performance by
about 3.7%.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:31 +08:00
Shengzhou Liu
95bcaa3905 crypto: caam - add backward compatible string sec4.0
In some device trees of previous version, there were string "fsl,sec4.0".
To be backward compatible with device trees, we first check "fsl,sec-v4.0",
if it fails, then check for "fsl,sec4.0".

Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>

extended to include new hash and rng code, which was omitted from
the previous version of this patch during a rebase of the SDK
version.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:31 +08:00
Kim Phillips
4a90507713 crypto: caam - fix possible deadlock condition
commit "crypto: caam - use non-irq versions of spinlocks for job rings"
made two bad assumptions:

(a) The caam_jr_enqueue lock isn't used in softirq context.
Not true: jr_enqueue can be interrupted by an incoming net
interrupt and the received packet may be sent for encryption,
via caam_jr_enqueue in softirq context, thereby inducing a
deadlock.

This is evidenced when running netperf over an IPSec tunnel
between two P4080's, with spinlock debugging turned on:

[  892.092569] BUG: spinlock lockup on CPU#7, netperf/10634, e8bf5f70
[  892.098747] Call Trace:
[  892.101197] [eff9fc10] [c00084c0] show_stack+0x48/0x15c (unreliable)
[  892.107563] [eff9fc50] [c0239c2c] do_raw_spin_lock+0x16c/0x174
[  892.113399] [eff9fc80] [c0596494] _raw_spin_lock+0x3c/0x50
[  892.118889] [eff9fc90] [c0445e74] caam_jr_enqueue+0xf8/0x250
[  892.124550] [eff9fcd0] [c044a644] aead_decrypt+0x6c/0xc8
[  892.129625] BUG: spinlock lockup on CPU#5, swapper/5/0, e8bf5f70
[  892.129629] Call Trace:
[  892.129637] [effa7c10] [c00084c0] show_stack+0x48/0x15c (unreliable)
[  892.129645] [effa7c50] [c0239c2c] do_raw_spin_lock+0x16c/0x174
[  892.129652] [effa7c80] [c0596494] _raw_spin_lock+0x3c/0x50
[  892.129660] [effa7c90] [c0445e74] caam_jr_enqueue+0xf8/0x250
[  892.129666] [effa7cd0] [c044a644] aead_decrypt+0x6c/0xc8
[  892.129674] [effa7d00] [c0509724] esp_input+0x178/0x334
[  892.129681] [effa7d50] [c0519778] xfrm_input+0x77c/0x818
[  892.129688] [effa7da0] [c050e344] xfrm4_rcv_encap+0x20/0x30
[  892.129697] [effa7db0] [c04b90c8] ip_local_deliver+0x190/0x408
[  892.129703] [effa7de0] [c04b966c] ip_rcv+0x32c/0x898
[  892.129709] [effa7e10] [c048b998] __netif_receive_skb+0x27c/0x4e8
[  892.129715] [effa7e80] [c048d744] netif_receive_skb+0x4c/0x13c
[  892.129726] [effa7eb0] [c03c28ac] _dpa_rx+0x1a8/0x354
[  892.129732] [effa7ef0] [c03c2ac4] ingress_rx_default_dqrr+0x6c/0x108
[  892.129742] [effa7f10] [c0467ae0] qman_poll_dqrr+0x170/0x1d4
[  892.129748] [effa7f40] [c03c153c] dpaa_eth_poll+0x20/0x94
[  892.129754] [effa7f60] [c048dbd0] net_rx_action+0x13c/0x1f4
[  892.129763] [effa7fa0] [c003d1b8] __do_softirq+0x108/0x1b0
[  892.129769] [effa7ff0] [c000df58] call_do_softirq+0x14/0x24
[  892.129775] [ebacfe70] [c0004868] do_softirq+0xd8/0x104
[  892.129780] [ebacfe90] [c003d5a4] irq_exit+0xb8/0xd8
[  892.129786] [ebacfea0] [c0004498] do_IRQ+0xa4/0x1b0
[  892.129792] [ebacfed0] [c000fad8] ret_from_except+0x0/0x18
[  892.129798] [ebacff90] [c0009010] cpu_idle+0x94/0xf0
[  892.129804] [ebacffb0] [c059ff88] start_secondary+0x42c/0x430
[  892.129809] [ebacfff0] [c0001e28] __secondary_start+0x30/0x84
[  892.281474]
[  892.282959] [eff9fd00] [c0509724] esp_input+0x178/0x334
[  892.288186] [eff9fd50] [c0519778] xfrm_input+0x77c/0x818
[  892.293499] [eff9fda0] [c050e344] xfrm4_rcv_encap+0x20/0x30
[  892.299074] [eff9fdb0] [c04b90c8] ip_local_deliver+0x190/0x408
[  892.304907] [eff9fde0] [c04b966c] ip_rcv+0x32c/0x898
[  892.309872] [eff9fe10] [c048b998] __netif_receive_skb+0x27c/0x4e8
[  892.315966] [eff9fe80] [c048d744] netif_receive_skb+0x4c/0x13c
[  892.321803] [eff9feb0] [c03c28ac] _dpa_rx+0x1a8/0x354
[  892.326855] [eff9fef0] [c03c2ac4] ingress_rx_default_dqrr+0x6c/0x108
[  892.333212] [eff9ff10] [c0467ae0] qman_poll_dqrr+0x170/0x1d4
[  892.338872] [eff9ff40] [c03c153c] dpaa_eth_poll+0x20/0x94
[  892.344271] [eff9ff60] [c048dbd0] net_rx_action+0x13c/0x1f4
[  892.349846] [eff9ffa0] [c003d1b8] __do_softirq+0x108/0x1b0
[  892.355338] [eff9fff0] [c000df58] call_do_softirq+0x14/0x24
[  892.360910] [e7169950] [c0004868] do_softirq+0xd8/0x104
[  892.366135] [e7169970] [c003d5a4] irq_exit+0xb8/0xd8
[  892.371101] [e7169980] [c0004498] do_IRQ+0xa4/0x1b0
[  892.375979] [e71699b0] [c000fad8] ret_from_except+0x0/0x18
[  892.381466] [e7169a70] [c0445e74] caam_jr_enqueue+0xf8/0x250
[  892.387127] [e7169ab0] [c044ad4c] aead_givencrypt+0x6ac/0xa70
[  892.392873] [e7169b20] [c050a0b8] esp_output+0x2b4/0x570
[  892.398186] [e7169b80] [c0519b9c] xfrm_output_resume+0x248/0x7c0
[  892.404194] [e7169bb0] [c050e89c] xfrm4_output_finish+0x18/0x28
[  892.410113] [e7169bc0] [c050e8f4] xfrm4_output+0x48/0x98
[  892.415427] [e7169bd0] [c04beac0] ip_local_out+0x48/0x98
[  892.420740] [e7169be0] [c04bec7c] ip_queue_xmit+0x16c/0x490
[  892.426314] [e7169c10] [c04d6128] tcp_transmit_skb+0x35c/0x9a4
[  892.432147] [e7169c70] [c04d6f98] tcp_write_xmit+0x200/0xa04
[  892.437808] [e7169cc0] [c04c8ccc] tcp_sendmsg+0x994/0xcec
[  892.443213] [e7169d40] [c04eebfc] inet_sendmsg+0xd0/0x164
[  892.448617] [e7169d70] [c04792f8] sock_sendmsg+0x8c/0xbc
[  892.453931] [e7169e40] [c047aecc] sys_sendto+0xc0/0xfc
[  892.459069] [e7169f10] [c047b934] sys_socketcall+0x110/0x25c
[  892.464729] [e7169f40] [c000f480] ret_from_syscall+0x0/0x3c

(b) since the caam_jr_dequeue lock is only used in bh context,
then semantically it should use _bh spin_lock types.  spin_lock_bh
semantics are to disable back-halves, and used when a lock is shared
between softirq (bh) context and process and/or h/w IRQ context.
Since the lock is only used within softirq context, and this tasklet
is atomic, there is no need to do the additional work to disable
back halves.

This patch adds back-half disabling protection to caam_jr_enqueue
spin_locks to fix (a), and drops it from caam_jr_dequeue to fix (b).

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:31 +08:00
Johannes Goetzfried
4ea1277d30 crypto: cast6 - add x86_64/avx assembler implementation
This patch adds a x86_64/avx assembler implementation of the Cast6 block
cipher. The implementation processes eight blocks in parallel (two 4 block
chunk AVX operations). The table-lookups are done in general-purpose registers.
For small blocksizes the functions from the generic module are called. A good
performance increase is provided for blocksizes greater or equal to 128B.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

Intel Core i5-2500 CPU (fam:6, model:42, step:7)

cast6-avx-x86_64 vs. cast6-generic
128bit key:                                             (lrw:256bit)    (xts:256bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     0.97x   1.00x   1.01x   1.01x   0.99x   0.97x   0.98x   1.01x   0.96x   0.98x
64B     0.98x   0.99x   1.02x   1.01x   0.99x   1.00x   1.01x   0.99x   1.00x   0.99x
256B    1.77x   1.84x   0.99x   1.85x   1.77x   1.77x   1.70x   1.74x   1.69x   1.72x
1024B   1.93x   1.95x   0.99x   1.96x   1.93x   1.93x   1.84x   1.85x   1.89x   1.87x
8192B   1.91x   1.95x   0.99x   1.97x   1.95x   1.91x   1.86x   1.87x   1.93x   1.90x

256bit key:                                             (lrw:384bit)    (xts:512bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     0.97x   0.99x   1.02x   1.01x   0.98x   0.99x   1.00x   1.00x   0.98x   0.98x
64B     0.98x   0.99x   1.01x   1.00x   1.00x   1.00x   1.01x   1.01x   0.97x   1.00x
256B    1.77x   1.83x   1.00x   1.86x   1.79x   1.78x   1.70x   1.76x   1.71x   1.69x
1024B   1.92x   1.95x   0.99x   1.96x   1.93x   1.93x   1.83x   1.86x   1.89x   1.87x
8192B   1.94x   1.95x   0.99x   1.97x   1.95x   1.95x   1.87x   1.87x   1.93x   1.91x

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:30 +08:00
Johannes Goetzfried
9b8b04051d crypto: testmgr - add larger cast6 testvectors
New ECB, CBC, CTR, LRW and XTS testvectors for cast6. We need larger
testvectors to check parallel code paths in the optimized implementation. Tests
have also been added to the tcrypt module.

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:30 +08:00
Johannes Goetzfried
2b49b90672 crypto: cast6 - prepare generic module for optimized implementations
Rename cast6 module to cast6_generic to allow autoloading of optimized
implementations. Generic functions and s-boxes are exported to be able to use
them within optimized implementations.

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:30 +08:00
Johannes Goetzfried
4d6d6a2c85 crypto: cast5 - add x86_64/avx assembler implementation
This patch adds a x86_64/avx assembler implementation of the Cast5 block
cipher. The implementation processes sixteen blocks in parallel (four 4 block
chunk AVX operations). The table-lookups are done in general-purpose registers.
For small blocksizes the functions from the generic module are called. A good
performance increase is provided for blocksizes greater or equal to 128B.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

Intel Core i5-2500 CPU (fam:6, model:42, step:7)

cast5-avx-x86_64 vs. cast5-generic
64bit key:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B     0.99x   0.99x   1.00x   1.00x   1.02x   1.01x
64B     1.00x   1.00x   0.98x   1.00x   1.01x   1.02x
256B    2.03x   2.01x   0.95x   2.11x   2.12x   2.13x
1024B   2.30x   2.24x   0.95x   2.29x   2.35x   2.35x
8192B   2.31x   2.27x   0.95x   2.31x   2.39x   2.39x

128bit key:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B     0.99x   0.99x   1.00x   1.00x   1.01x   1.01x
64B     1.00x   1.00x   0.98x   1.01x   1.02x   1.01x
256B    2.17x   2.13x   0.96x   2.19x   2.19x   2.19x
1024B   2.29x   2.32x   0.95x   2.34x   2.37x   2.38x
8192B   2.35x   2.32x   0.95x   2.35x   2.39x   2.39x

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:30 +08:00
Johannes Goetzfried
a2c5826095 crypto: testmgr - add larger cast5 testvectors
New ECB, CBC and CTR testvectors for cast5. We need larger testvectors to check
parallel code paths in the optimized implementation. Tests have also been added
to the tcrypt module.

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:29 +08:00
Johannes Goetzfried
270b0c6b40 crypto: cast5 - prepare generic module for optimized implementations
Rename cast5 module to cast5_generic to allow autoloading of optimized
implementations. Generic functions and s-boxes are exported to be able to use
them within optimized implementations.

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:29 +08:00
Jussi Kivilinna
37743cc0d3 crypto: arch/s390 - cleanup - remove unneeded cra_list initialization
Initialization of cra_list is currently mixed, most ciphers initialize this
field and most shashes do not. Initialization however is not needed at all
since cra_list is initialized/overwritten in __crypto_register_alg() with
list_add(). Therefore perform cleanup to remove all unneeded initializations
of this field in 'arch/s390/crypto/'

Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01 17:47:29 +08:00