273 Commits

Author SHA1 Message Date
Serguei Katkov
a01b42e49a [CGP] Fix the rematerialization of gc.relocates
If we want to substitute the relocation of derived pointer with gep of base then
we must ensure that relocation of base dominates the relocation of derived pointer.

Currently only check for basic block is present. However it is possible that both
relocation are in the same basic block but relocation of derived pointer is defined
earlier.

The patch moves the relocation of base pointer right before relocation of derived
pointer in this case.

Reviewers: sanjoy,artagnon,igor-laevsky,reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36462


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311067 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 05:48:30 +00:00
Sanjay Patel
5250bac12f [CGP] use narrower types in memcmp expansion when possible
This only affects very small memcmp on x86 for now, but it
will become more important if we allow vector-sized load and
compares.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309711 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01 17:24:54 +00:00
Sanjay Patel
8209d78723 [CGP] use subtract or subtract-of-cmps for result of memcmp expansion
As noted in the code comment, transforming this in the other direction might require 
a separate transform here in CGP given the block-at-a-time DAG constraint.

Besides that theoretical motivation, there are 2 practical motivations for the 
subtract-of-cmps form:

1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). 
   There is discussion about canonicalizing IR to the select form 
   ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), 
   so we probably need to add DAG transforms for those patterns anyway, but this improves the 
   memcmp output without waiting for that step.

2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
   that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
   if we choose to enable that.

Differential Revision: https://reviews.llvm.org/D34904



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309597 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-31 18:08:24 +00:00
Florian Hahn
8b712792d3 Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309553 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-31 10:07:49 +00:00
Benjamin Kramer
b69a2b5cec [CodeGenPrepare] Cut off FindAllMemoryUses if there are too many uses.
This avoids excessive compile time. The case I'm looking at is
Function.cpp from an old version of LLVM that still had the giant memcmp
string matcher in it. Before r308322 this compiled in about 2 minutes,
after it, clang takes infinite* time to compile it. With this patch
we're at 5 min, which is still bad but this is a pathological case.

The cut off at 20 uses was chosen by looking at other cut-offs in LLVM
for user scanning. It's probably too high, but does the job and is very
unlikely to regress anything.

Fixes PR33900.

* I'm impatient and aborted after 15 minutes, on the bug report it was
  killed after 2h.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308891 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-24 16:18:09 +00:00
Serguei Katkov
8d9168d095 [CGP] Allow cycles during Phi traversal in OptimizaMemoryInst
Allowing cycles in Phi traversal increases the scope of optimize memory instruction
in case we are in loop.

The added test shows an example of enabling optimization inside a loop.

Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35294


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308419 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-19 04:49:17 +00:00
Serguei Katkov
6733c4dfa1 [CGP] Cleanup - remove redundant code in OptimizeMemoryInst. NFC
optimizeMemoryInst contains a vector AddrModeInsts.
The only use of this vector is to check that all instructions are in the same
block as memory instruction. This check is guarded by PhiSeen flag,
so if we traversed through phi node then we do not need to keep information
in AddrModeInsts. AddModeInsts is set first time we found some addressing mode
and updated if we found new one later.
We can find next addressing mode only if we traverse phi node so all code
related to update of AddModeInsts can be safely removed.

Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35291


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308265 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-18 05:16:38 +00:00
Haicheng Wu
8c939cb97f [TTI] Refine the cost of EXT in getUserCost()
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.

Differential Revision: https://reviews.llvm.org/D34458

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308076 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-15 02:12:16 +00:00
Eli Friedman
dd70def46c [CodeGenPrepare] Don't create dead instructions in addrmode sinking
When we fail to sink an instruction, we must make sure not to modify
the function; otherwise, we end up in an infinite loop because
CodeGenPrepare iterates until it doesn't make any changes.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 .



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307866 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 23:30:02 +00:00
Serguei Katkov
0c2ce7e21d [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.

However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.

The added test case shows an example.

Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307628 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 06:24:44 +00:00
Keno Fischer
35d7a2e860 [CodeGenPrepare] Don't create inttoptr for ni ptrs
Summary:
Arguably non-integral pointers probably shouldn't show up here at all,
but since the backend doesn't complain and this takes valid (according
to the Verifier) IR and makes it invalid, make sure not to introduce
any inttoptr instructions if we're dealing with non-integral pointers.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D33110

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306737 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-29 20:28:59 +00:00
Sanjay Patel
dbbccbae97 [CGP] add specialization for memcmp expansion with only one basic block
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306485 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 23:15:01 +00:00
Sanjay Patel
ca9df19568 [CGP] eliminate a sub instruction in memcmp expansion
As noted in D34071, there are some IR optimization opportunities that could be 
handled by normal IR passes if this expansion wasn't happening so late in CGP.

Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, 
so I'm proposing this change:
  %s = sub i32 %x, %y
  %r = icmp ne %s, 0
    =>
  %r = icmp ne %x, %y

Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.

The fact that the PowerPC backend doesn't eliminate the 'subf.' might be 
something for PPC folks to investigate separately.

Differential Revision: https://reviews.llvm.org/D34416


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306471 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 21:46:34 +00:00
Sanjay Patel
6891a99c36 [CGP] simplify code to get bswap in memcmp expansion; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 19:31:35 +00:00
Sanjay Patel
cfc8374c45 [CGP] add an IR builder to memcmp expansion class instead of recreating it; NFCI
This was a clean-up suggestion from:
https://reviews.llvm.org/D34005


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306438 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 18:18:42 +00:00
Hiroshi Inoue
0df653a65e fix trivial typos, NFC
succesor -> successor



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306393 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 10:35:37 +00:00
Sanjay Patel
d8cbb8e87a [CGP, memcmp] replace CreateZextOrTrunc with CreateZext because it can never trunc
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305936 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 18:20:52 +00:00
Sanjay Patel
8c9101fe00 [CGP] fix variables to be unsigned in memcmp expansion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305935 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 18:06:13 +00:00
Sanjay Patel
7a0e66cc56 [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion
This is the last step needed to avoid regressions for x86 before we flip the switch to allow 
expansion of the smallest set of memcpy() via CGP. The DAG version checks for constant strings, 
so we need to do that here too.

FWIW, the 2 constant test is not handled by LibCallSimplifier::optimizeMemCmp() because that 
code is limited to 8-bit constant arrays. LibCallSimplifier will also fail to optimize some 1 
constant tests because its alignment requirements are too strict (shouldn't require alignment 
for a constant operand).

Differential Revision: https://reviews.llvm.org/D34071


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305734 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 19:48:35 +00:00
David Callahan
194057d888 Allow -profile-guided-section-prefix more than once
Summary:
At present, `-profile-guided-section-prefix` is a `cl::Optional` option, which means it demands to be passed exactly zero or one times.  Our build system makes it pretty tricky to guarantee this.  We often accidentally pass the flag more than once (but always with the same "false" value) which results in an error, after which compilation fails:

```
clang (LLVM option parsing): for the -profile-guided-section-prefix option: may only occur zero or one times!
```

While we work on improving our build system, it also seems reasonable just to allow `-profile-guided-section-prefix` to be passed more than once, by to `cl::ZeroOrMore`.  Quoting [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | the documentation ]]:

> The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
> ...
> If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.

Reviewers: danielcdh

Reviewed By: danielcdh

Subscribers: twoh, david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D34219

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305413 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-14 20:35:33 +00:00
Sanjay Patel
672c64a527 [CGP] add a reference to DataLayout in MemCmpExpansion; NFCI
We're currently passing endian-ness around as a param (and not uniformly),
so this eliminates the need for that. I'd like to add a constant fold
call too, and that requires a DL.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305129 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-09 23:01:05 +00:00
Sanjay Patel
464c05b269 fix formatting; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305008 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-08 20:00:09 +00:00
Sanjay Patel
4c04c2d072 [CGP] don't expand a memcmp with nobuiltin attribute
This matches the behavior used in the SDAG when expanding memcmp.

For reference, we're intentionally treating the earlier fortified call transforms differently after:
https://bugs.llvm.org/show_bug.cgi?id=23093
https://reviews.llvm.org/rL233776

One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers:
https://reviews.llvm.org/D19781
https://reviews.llvm.org/D19801

Differential Revision: https://reviews.llvm.org/D34043


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305007 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-08 19:47:25 +00:00
Sanjay Patel
94a001edde [CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion
The test diff for PowerPC shows we can better optimize if this case is one block.

For x86, there's would be a substantial difference if CGP expansion was enabled because branches are assumed 
cheap and SDAG can't optimize across blocks. 

Instead of this:

_cmp_eq8:
  movq  (%rdi), %rax
  cmpq  (%rsi), %rax
  je  LBB23_1
## BB#2:                                ## %res_block
  movl  $1, %ecx
  jmp LBB23_3
LBB23_1:
  xorl  %ecx, %ecx
LBB23_3:                                ## %endblock
  xorl  %eax, %eax
  testl %ecx, %ecx
  sete  %al
  retq

We get this:

cmp_eq8:   
  movq  (%rdi), %rcx
  xorl  %eax, %eax
  cmpq  (%rsi), %rcx
  sete  %al
  retq

And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). 
If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable 
CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we 
can lower the IR produced here optimally.

Differential Revision: https://reviews.llvm.org/D34005


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304987 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-08 16:53:18 +00:00
Sanjay Patel
57caaecac3 [CGP] avoid zext/trunc of a memcmp expansion compare
This could be viewed as another shortcoming of the DAGCombiner:
when both operands of a compare are zexted from the same source
type, we should be able to compare the original types.

The effect on PowerPC perf is likely unnoticeable, but there's a
visible regression for x86 if we feed the suboptimal IR for memcmp
expansion to the DAG:

_cmp_eq4_zexted_to_i64:
  movl  (%rdi), %ecx
  movl  (%rsi), %edx
  xorl  %eax, %eax
  cmpq  %rdx, %rcx
  sete  %al

_cmp_eq4_better:
  movl  (%rdi), %ecx
  xorl  %eax, %eax
  cmpl  (%rsi), %ecx
  sete  %al



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304923 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 16:16:45 +00:00
Sanjay Patel
f16db3ca21 [CGP] pass size as param in MemCmpExpansion; NFCI
Avoid extracting the constant int twice.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304920 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 15:05:13 +00:00
Sanjay Patel
5fbefa5732 [CGP] pass size as param in MemCmpExpansion; NFCI
Avoid extracting the constant int twice.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304917 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 14:45:49 +00:00
Sanjay Patel
7e504a2322 [CGP] getParent()->getParent() --> getFunction(); NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304916 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 14:29:52 +00:00
Sanjay Patel
2b75145d62 [CGP] add helper function for generating compare of load pairs; NFCI
In the special (but also the likely common) case, we can avoid
the multi-block complexity of the general algorithm, so moving
this part off on its own will make it re-usable.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304908 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 13:33:00 +00:00
Sanjay Patel
77c163d8d8 [CGP] fix formatting in MemCmpExpansion; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304903 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 12:44:36 +00:00
Sanjay Patel
756819c52a [CGP / PowerPC] use direct compares if there's only one load per block in memcmp() expansion
I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the 
special cases (memcmp(x,y,N) != 0 for N=1,2,4,8,16,32 bytes) that we already handle.

I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time 
limitation in the DAG. We might have to just avoid those special-cases here in CGP. But 
regardless of that, I think this is a win for the more general cases.

http://rise4fun.com/Alive/cbQ

Differential Revision: https://reviews.llvm.org/D33963


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304849 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 00:17:08 +00:00
Sanjay Patel
675f794383 [CGP] fix formatting/typos in MemCmpExpansion; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304830 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 20:30:47 +00:00
Chandler Carruth
e3e43d9d57 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304787 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 11:49:48 +00:00
Zaara Syeda
682f92f568 [PPC] Inline expansion of memcmp
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.

Differential Revision: https://reviews.llvm.org/D28637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304313 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-31 17:12:38 +00:00
Matthias Braun
94c4904dc5 CodeGen: Rename DEBUG_TYPE to match passnames
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303921 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-25 21:26:32 +00:00
Reid Kleckner
816047d44c [IR] De-virtualize ~Value to save a vptr
Summary:
Implements PR889

Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:

https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing

This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal.  However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue.  I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.

I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.

Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv

Reviewed By: chandlerc

Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31261

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303362 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-18 17:24:10 +00:00
Francis Visoiu Mistrih
ae1c853358 [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303360 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-18 17:21:13 +00:00
Ayman Musa
eadb58fda7 [X86] Relocate code of replacement of subtarget unsupported masked memory intrinsics to run also on -O0 option.
Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget support these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional).

CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).

Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation.

Differential Revision: https://reviews.llvm.org/D32487



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303050 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 11:30:54 +00:00
Teresa Johnson
bb89593724 Fix code section prefix for proper layout
Summary:
r284533 added hot and cold section prefixes based on profile
information, to enable grouping of hot/cold functions at link time.
However, it used "cold" as the prefix for cold sections, but gold only
recognizes "unlikely" (which is used by gcc for cold sections).
Therefore, cold sections were not properly being grouped. Switch to
using "unlikely"

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32983

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302502 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-09 01:43:24 +00:00
Sanjoy Das
399b4d037d Rename WeakVH to WeakTrackingVH; NFC
This relands r301424.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301812 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-01 17:07:49 +00:00
Daniel Berlin
e2c5126dbb Kill off the old SimplifyInstruction API by converting remaining users.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301673 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-28 19:55:38 +00:00
Sanjoy Das
263da12ab2 Reverts commit r301424, r301425 and r301426
Commits were:

"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"

The changes assumed pointers are 8 byte aligned on all architectures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301429 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-26 16:37:05 +00:00
Sanjoy Das
d0cf26e443 Rename WeakVH to WeakTrackingVH; NFC
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.

Reviewers: dblaikie, davide

Reviewed By: davide

Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D32266

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301424 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-26 16:20:52 +00:00
Craig Topper
df22034939 [APInt] Use lshrInPlace to replace lshr where possible
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.

This adds an lshrInPlace(const APInt &) version as well.

Differential Revision: https://reviews.llvm.org/D32155




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300566 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-18 17:14:21 +00:00
Brendon Cahoon
80269962da [CodeGenPrepare] Fix crash due to an invalid CFG
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.

This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.

Differential Revision: https://reviews.llvm.org/D32126


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300480 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-17 19:11:04 +00:00
Chandler Carruth
ddfada260a [IR] Redesign the case iterator in SwitchInst to actually be an iterator
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.

All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.

Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.

Differential Revision: https://reviews.llvm.org/D31548

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300032 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 07:27:28 +00:00
Eli Friedman
f25acacbe6 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299723 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 22:42:18 +00:00
Jun Bum Lim
78c95b332b [CodeGenPrep] move aarch64-type-promotion to CGP
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.

Reviewers: jmolloy, mcrosier, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: llvm-commits, aemerson, rengolin, mcrosier

Differential Revision: https://reviews.llvm.org/D28680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299379 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 19:20:07 +00:00
Craig Topper
68149f546e [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299362 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 16:34:59 +00:00
Dehao Chen
261eb1f850 Use isFunctionHotInCallGraph to set the function section prefix.
Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31225

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298656 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-23 23:14:11 +00:00