mirror of
https://github.com/RPCS3/llvm.git
synced 2025-01-13 07:50:50 +00:00
376642ed62
1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169791 91177308-0d34-0410-b5e6-96231b3b80d8
13 lines
364 B
LLVM
13 lines
364 B
LLVM
; RUN: llc -march=arm -mcpu=cortex-a8 < %s | FileCheck %s
|
|
|
|
; Trigger multiple NEON stores.
|
|
; CHECK: vst1.64
|
|
; CHECK-NEXT: vst1.64
|
|
define void @f_0_40(i8* nocapture %c) nounwind optsize {
|
|
entry:
|
|
call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 40, i32 16, i1 false)
|
|
ret void
|
|
}
|
|
|
|
declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind
|