linux/arch/powerpc
Anton Blanchard 15c2d45d17 powerpc: Add 64bit optimised memcmp
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.
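
As a rough illustration of the second point, here is a minimal portable C sketch of the idea. It is not the assembly added by this patch: the name memcmp_sketch and its helpers are hypothetical, the loop is not modulo scheduled, and on a mismatch it simply falls back to a byte compare of the offending 32 byte block rather than extracting the result from the loaded words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte at a time compare; returns <0, 0 or >0 like memcmp. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			return (int)a[i] - (int)b[i];
	}
	return 0;
}

/* Load 8 bytes; memcpy sidesteps strict-aliasing problems. */
static uint64_t load64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Nonzero if two 8 byte aligned 32 byte blocks differ anywhere. */
static int block32_differs(const unsigned char *a, const unsigned char *b)
{
	/* Precondition of the wide path: both pointers 8 byte aligned. */
	assert((((uintptr_t)a | (uintptr_t)b) & 7) == 0);

	/* Four 8 byte loads per side, i.e. the loop body unrolled 4x. */
	return ((load64(a)      ^ load64(b)) |
		(load64(a + 8)  ^ load64(b + 8)) |
		(load64(a + 16) ^ load64(b + 16)) |
		(load64(a + 24) ^ load64(b + 24))) != 0;
}

/* Hypothetical stand-in for the optimised routine (illustration only). */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;

	/* Wide path: at least 32 bytes and both pointers 8 byte aligned. */
	if (n >= 32 && (((uintptr_t)a | (uintptr_t)b) & 7) == 0) {
		while (n >= 32) {
			if (block32_differs(a, b))
				return cmp_bytes(a, b, 32);
			a += 32;
			b += 32;
			n -= 32;
		}
	}

	/* Short, unaligned or leftover bytes. */
	return cmp_bytes(a, b, n);
}
```

Falling back to the byte loop on a mismatch keeps the sketch independent of byte order; a real implementation would normally derive the result directly from the mismatching words.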

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.
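
(29.93 / 1.70 is roughly 17.6.) The benchmark source is not part of this commit. A userspace timing loop along the following lines would measure the same kind of thing; the function name time_memcmp and the use of clock() are assumptions made for illustration, not the harness that produced the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: time `iters` compares of two identical
 * `size` byte buffers. Returns CPU seconds, or -1.0 if allocation fails.
 */
double time_memcmp(size_t size, unsigned long iters)
{
	/* Calling through a volatile pointer keeps the compiler from
	 * folding or hoisting the memcmp calls out of the loop. */
	int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
	volatile int sink = 0;
	unsigned char *a = malloc(size);
	unsigned char *b = malloc(size);
	clock_t start, end;

	if (a == NULL || b == NULL) {
		free(a);
		free(b);
		return -1.0;
	}

	/* Equal buffers force every compare to scan the full length. */
	memset(a, 0x5a, size);
	memset(b, 0x5a, size);

	start = clock();
	for (unsigned long i = 0; i < iters; i++)
		sink += cmp(a, b, size);
	end = clock();

	/* Sanity check: identical buffers must always compare equal. */
	assert(sink == 0);

	free(a);
	free(b);
	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_memcmp(8192, 10000000) mirrors the buffer size and iteration count quoted above. It exercises the C library's memcmp rather than the kernel routine, so absolute times are not comparable with the figures in this commit.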

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:55 +11:00
boot Merge remote-tracking branch 'scottwood/next' into next 2014-11-18 17:00:38 +11:00
configs powerpc/ps3: Enable CONFIG_PS3_REPOSITORY_WRITE in ps3_defconfig 2015-01-22 17:31:22 +11:00
crypto crypto: powerpc - replace memset by memzero_explicit 2014-12-02 22:55:50 +08:00
include powerpc/eeh: Allow to set maximal frozen times 2015-01-23 14:02:54 +11:00
kernel powerpc/eeh: Allow to set maximal frozen times 2015-01-23 14:02:54 +11:00
kvm powerpc/kvm: Create proper names for the kvm_host_state PMU fields 2014-12-29 15:45:55 +11:00
lib powerpc: Add 64bit optimised memcmp 2015-01-23 14:02:55 +11:00
math-emu powerpc: Correct emulated mtfsf instruction 2014-04-07 10:33:11 +10:00
mm mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
net PPC: bpf_jit_comp: Unify BPF_MOD | BPF_X and BPF_DIV | BPF_X 2014-11-18 13:20:09 -05:00
oprofile powerpc updates for 3.19 2014-12-11 17:48:14 -08:00
perf power/perf/hv-24x7: Use kmem_cache_free() instead of kfree 2014-12-12 16:06:13 +11:00
platforms powerpc/powernv: Remove pnv_pci_probe_mode() 2015-01-23 14:02:54 +11:00
sysdev powerpc: Replace cpumask_weight(cpu_possible_mask) with num_possible_cpus() 2015-01-23 14:02:51 +11:00
xmon powerpc/xmon: use isspace/isxdigit/isalnum from linux/ctype.h 2014-12-29 15:45:43 +11:00
Kconfig gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs 2014-12-13 12:42:51 -08:00
Kconfig.debug Patch queue for ppc - 2014-08-01 2014-08-05 09:58:11 +02:00
Makefile powerpc: Add POWER8 CPU selection 2014-09-25 23:14:49 +10:00
relocs_check.pl Fix warning typo "CONFIG_RELCOATABLE" 2013-05-29 15:11:30 +02:00