310689 Commits

Author SHA1 Message Date
Peter Smith
777e1cfdc3 [ELF][ARM] Accept and ignore -p and -no-pipleline-knowledge
The linux kernel uses an old flag -p/-no-pipeline-knowledge that is
accepted by bfd and gold but ignored by modern versions of them. The
original option is very old and is pre-ABI, it sometimes comes up in
code-bases that had support for pre ABI toolchains. The Linux kernel uses
it in 3 places in the ARM specific section.

Differential Revision: https://reviews.llvm.org/D58540

llvm-svn: 354769
2019-02-25 10:48:31 +00:00
Simon Tatham
b70fc0c5fd [ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

llvm-svn: 354768
2019-02-25 10:39:53 +00:00
Roman Lebedev
542e5d7bb5 [llvm-exegesis] Split Epsilon param into two (PR40787)
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)

By splitting it into two params it is now possible.

This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

llvm-svn: 354767
2019-02-25 09:36:12 +00:00
Pavel Labath
ad96b0e63e Finish revert of r354706
The revert in r354711 wasn't complete. Finish the job.

llvm-svn: 354766
2019-02-25 09:30:41 +00:00
Kadir Cetinkaya
f47177ddb4 [clangd] Add thread priority lowering for MacOS as well
Reviewers: ilya-biryukov

Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58492

llvm-svn: 354765
2019-02-25 09:19:26 +00:00
Roman Lebedev
49b6f81a74 [XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"
Summary:
This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert

Abstractions are great.
Readable code is great.
JSON support library is a *good* idea.

However unfortunately, there is an internal detail that one needs
to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`.
So for **every** `llvm::json::Object`, even if you only store a single `int`
entry there, you pay the whole price of `llvm::DenseMap`.

Unfortunately, it matters for `llvm-xray`.

I was trying to analyse the `llvm-exegesis` analysis mode performance,
and for that i wanted to view the LLVM X-Ray log visualization in Chrome
trace viewer. And the `llvm-xray convert` is sluggish, and sometimes
even ended up being killed by OOM.

`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with ` -fxray-instruction-threshold=128`)
analysis mode over `-benchmarks-file` with 10099 points (one full
latency measurement set), with normal runtime of 0.387s.

Timings:
Old: (copied from D58580)
```
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

 Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):

          21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
               314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
                 1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
           2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
       85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
       14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
       32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
       67896890228      instructions              #    0.79  insn per cycle
                                                  #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
       14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
         212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)

           21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )
```
New:
```
$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

 Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):

           7178.38 msec task-clock                #    1.000 CPUs utilized            ( +-  0.26% )
               182      context-switches          #   25.402 M/sec                    ( +- 28.84% )
                 0      cpu-migrations            #    0.046 M/sec                    ( +- 70.71% )
             33701      page-faults               # 4694.994 M/sec                    ( +-  0.88% )
       28761053971      cycles                    # 4006833.933 GHz                   ( +-  0.26% )  (83.32%)
        2028297997      stalled-cycles-frontend   #    7.05% frontend cycles idle     ( +-  1.61% )  (83.32%)
       10773154901      stalled-cycles-backend    #   37.46% backend cycles idle      ( +-  0.38% )  (33.36%)
       36199132874      instructions              #    1.26  insn per cycle
                                                  #    0.30  stalled cycles per insn  ( +-  0.03% )  (50.02%)
        6434504227      branches                  # 896420204.421 M/sec               ( +-  0.03% )  (66.68%)
          73355176      branch-misses             #    1.14% of all branches          ( +-  1.46% )  (83.33%)

            7.1807 +- 0.0190 seconds time elapsed  ( +-  0.26% )
```

So using `llvm::json` nearly triples run-time on that test case.
(+3x is times, not percent.)

Memory:
Old:
```
total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB
```
New:
```
total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions: 21382982 (1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB
```
Diff:
```
total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB
```
So using `llvm::json` increases *peak* memory consumption on *this* testcase ~+27x.
And total allocation count +15x. Both of these numbers are times, *not* percent.

And note that memory usage is clearly unbound with `llvm::json`, it directly depends
on the length of the log, so peak memory consumption is always increasing.
This isn't so with the dumb code, there is no accumulating memory consumption,
peak memory consumption is fixed. Naturally, that means it will handle *much*
larger logs without OOM'ing.

Readability is good, but the price is simply unacceptable here.
Too bad none of this analysis was done as part of the development/review D50129 itself.

Reviewers: dberris, kpw, sammccall

Reviewed By: dberris

Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58584

llvm-svn: 354764
2019-02-25 07:39:07 +00:00
Craig Topper
8c9724ea4f [SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a MoveChild and MoveParent pair.
OPC_CheckCondCode is always used as operand 2 of a setcc. And its always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.

This saves ~3000 bytes in the X86 table.

llvm-svn: 354763
2019-02-25 03:11:44 +00:00
Kang Zhang
4faa4090c9 [PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts
Summary:
Fast selection of llvm fptoi & fptrunc instructions is not handled well about
VSX instruction support.
We'd use VSX float convert integer instruction instead of non-vsx float convert
integer instruction if the operand register class is VSSRC or VSFRC because i32
and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is
openeded.
For float trunc instruction, we do this silimar work like float convert integer
instruction to try to use VSX instruction.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D58430

llvm-svn: 354762
2019-02-25 02:46:16 +00:00
Marc-Andre Laperle
25e690273a [clangd] Enhance macro hover to see full definition
Summary: Signed-off-by: Marc-Andre Laperle <malaperle@gmail.com>

Reviewers: simark, ilya-biryukov, sammccall, ioeric, hokein

Reviewed By: ilya-biryukov

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D55250

llvm-svn: 354761
2019-02-24 23:47:03 +00:00
Nikita Popov
b7918f3c14 [InstCombine] Add tests for PR40846; NFC
The icmps are the same as the overflow result of the intrinsic.

llvm-svn: 354760
2019-02-24 21:55:37 +00:00
Nikita Popov
bdefe47857 [InstCombine] Move with.overflow tests to separate file; NFC
And regenerate checks. I had to rename some variables, because
update_test_checks can't deal with the same variable names used
in lower and upper case. I've also dropped the result type aliases,
as just using the type directly gives a cleaner result.

llvm-svn: 354759
2019-02-24 21:55:31 +00:00
Simon Pilgrim
f43c48cb52 [X86] Add PR40483 test cases
Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops

llvm-svn: 354758
2019-02-24 21:13:29 +00:00
Simon Pilgrim
cfaf663a35 [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)
Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.

llvm-svn: 354757
2019-02-24 19:57:52 +00:00
Craig Topper
3fe4bd464c [X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering for tls variable. Below is the DAG for the code.
SelectionDAG has 11 nodes:

t0: ch = EntryToken
      t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
        t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
      t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
    t12: i64 = add t8, t11
  t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
And when mcmodel is large, below instruction can NOT be folded.

  t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"

When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.

The patch is to fold the load and X86ISD::WrapperRIP.

Fixes PR26906

Patch by LuoYuanke

Reviewers: craig.topper, rnk, annita.zhang, wxiao3

Reviewed By: rnk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58336

llvm-svn: 354756
2019-02-24 19:33:37 +00:00
Craig Topper
5532a98737 [X86][SSE] Use pblendw for v4i32/v2i64 during isel.
Summary:

Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58574

llvm-svn: 354755
2019-02-24 19:23:41 +00:00
Craig Topper
ce2bd19c49 [X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake.
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.

The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.

Reviewers: RKSimon, andreadb

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58581

llvm-svn: 354754
2019-02-24 19:23:39 +00:00
Craig Topper
be3348573e [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
Summary:
When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.

We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.

Reviewers: spatel, RKSimon, nikic

Reviewed By: nikic

Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58567

llvm-svn: 354753
2019-02-24 19:23:36 +00:00
Kristina Brooks
103799c060 Fix accidentally used hard tabs. NFC
Big sorry. This undoes the indentation mess I made
in r354751.

llvm-svn: 354752
2019-02-24 18:06:10 +00:00
Kristina Brooks
716cbfb464 Wrap code for builtin_assume_aligned at 80 col.NFC
Minor style fix to avoid going over 80 cols in handling
of case for Builtin::BI__builtin_assume_aligned. NFC.

llvm-svn: 354751
2019-02-24 17:57:33 +00:00
Sanjay Patel
26aa702463 [InstCombine] add test for icmp+add fold; NFC
llvm-svn: 354750
2019-02-24 17:31:15 +00:00
Simon Pilgrim
4f4f9abdfa [X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC.
Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.

llvm-svn: 354749
2019-02-24 17:30:06 +00:00
Sanjay Patel
9907d3c8b4 [InstCombine] canonicalize add/sub with bool
add A, sext(B) --> sub A, zext(B)

We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.

The backend should be prepared for this change after:
D57401
rL353433

This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.

The seeming regression in the fuzzer test was discussed in:
D58359

We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.

llvm-svn: 354748
2019-02-24 16:57:45 +00:00
Sanjay Patel
986a024c19 [InstCombine] regenerate checks; NFC
llvm-svn: 354747
2019-02-24 16:11:58 +00:00
Sanjay Patel
cb04ba032f [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00
Simon Pilgrim
9b49f36a03 Fix "enumeral and non-enumeral type in conditional expression" gcc7 warning. NFCI.
llvm-svn: 354745
2019-02-24 13:31:52 +00:00
Heejin Ahn
20cf0749cb [WebAssembly] Rename a variable in CFGStackify (NFC)
llvm-svn: 354744
2019-02-24 08:30:06 +00:00
Heejin Ahn
25d924b41f [WebAssembly] Merge two identical switch case routines into one (NFC)
llvm-svn: 354743
2019-02-24 08:19:55 +00:00
Michael Liao
7faef3d1a3 Typo: s/CHCCK/CHECK
llvm-svn: 354742
2019-02-24 03:10:14 +00:00
Michael Liao
8676f12ac6 [NFC] Minor coding style (indent) fix.
llvm-svn: 354741
2019-02-24 03:07:32 +00:00
Philip Reames
33d7e49bb7 [Hexagon, SystemZ] Be super conservative about atomics
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.  

llvm-svn: 354740
2019-02-24 00:45:09 +00:00
Duncan P. N. Exon Smith
e7b9464943 VFS: Avoid some unnecessary std::string copies
Thread Twine a little deeper through the VFS to avoid unnecessarily
constructing the same std::string twice in a parameter sequence:

    Twine -> std::string -> StringRef -> std::string

Changing a few parameters from StringRef to Twine avoids the early call
to `Twine::str()`.

llvm-svn: 354739
2019-02-23 23:48:47 +00:00
Craig Topper
dc185522fb [TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands.
The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.

X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.

Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.

llvm-svn: 354738
2019-02-23 21:41:44 +00:00
Craig Topper
be9eeb5526 Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
And its follow ups r354511, r354640.

A follow patch will fix the issue that caused it to be reverted.

llvm-svn: 354737
2019-02-23 21:41:42 +00:00
Richard Smith
10ab78e854 Enable coroutines under -std=c++2a.
llvm-svn: 354736
2019-02-23 21:06:26 +00:00
Richard Smith
456e7afbca [cxx_status] Update to match Kona motions.
llvm-svn: 354735
2019-02-23 21:06:25 +00:00
Craig Topper
ccc860cb81 Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."

These were reverted in r354713 as their context depended on other patches that were reverted for a bug.

llvm-svn: 354734
2019-02-23 19:51:32 +00:00
Nikita Popov
e661f946a7 [WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.

I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.

Differential Revision: https://reviews.llvm.org/D58575

llvm-svn: 354733
2019-02-23 18:59:01 +00:00
Simon Pilgrim
f383a47b7d [X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x)
D58053/rL354340 added this to EltsFromConsecutiveLoads directly

llvm-svn: 354732
2019-02-23 18:53:03 +00:00
Simon Pilgrim
398d0b9e96 Fix MSVC constant truncation warnings. NFCI.
llvm-svn: 354731
2019-02-23 18:49:02 +00:00
Simon Pilgrim
e08f177ea2 [X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x)
For AVX1, limit this to i32/f32/i64/f64 loading cases only.

llvm-svn: 354730
2019-02-23 18:34:05 +00:00
Simon Pilgrim
31793733a0 [X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD.

llvm-svn: 354729
2019-02-23 17:10:47 +00:00
Bruno Ricci
d9381ae4f4 [NFC] Fix Wdocumentation warning in OMPToClause
llvm-svn: 354728
2019-02-23 16:40:30 +00:00
Bruno Ricci
17e65b428f [Sema][NFC] SequenceChecker: More tests in preparation for D57660
llvm-svn: 354727
2019-02-23 16:25:00 +00:00
Simon Dardis
86a589e38d [MIPS] Fix a incorrect test. (NFC)
This test is incorrect as it should be using the microMIPSR6 instruction to
return, not the microMIPS version.

llvm-svn: 354726
2019-02-23 15:56:32 +00:00
Louis Dionne
73be0cb773 [libcxx] Make sure all experimental tests are disabled when enable_experimental=False
Summary:
Previously, we'd run some experimental tests even when enable_experimental=False
was used with lit.

Reviewers: EricWF

Subscribers: christof, jkorous, dexonsmith, libcxx-commits, mclow.lists

Differential Revision: https://reviews.llvm.org/D55834

llvm-svn: 354725
2019-02-23 11:24:03 +00:00
Craig Topper
75afc0105c [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
Brad Smith
6f1268a4cc Remove OpenBSD case for old system libstdc++ header path as OpenBSD
has switched to libc++.

llvm-svn: 354723
2019-02-23 07:21:19 +00:00
Michael Trent
7dcfac6171 objdump fails to parse Mach-O binaries with n_desc bearing stabs
Summary:
The objdump Mach-O parser uses MachOObjectFile::checkSymbolTable() to
verify the symbol table is in a legal state before dereferencing the
offsets in the table. This routine missed a test for N_STAB symbols
when validating the two-level name space library ordinal for undefined
symbols. If the binary in question contained a value in the n_desc high
byte that is larger than the list of loaded dylibs, checkSymbolTable()
will flag the library ordinal as being out of range. Most of the time
the n_desc field is set to 0 or to small values, but old final linked
binaries exist with N_STAB symbols bearing non-trivial n_desc fields. 

The change here is simply to verify a symbol is not an N_STAB symbol
before consulting the values of n_other or n_desc.

rdar://44977336

Reviewers: lhames, pete, ab

Reviewed By: pete

Subscribers: llvm-commits, rupprecht

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58568

llvm-svn: 354722
2019-02-23 06:19:56 +00:00
Brad Smith
6c664eda78 Remove sanitizer context workaround no longer necessary
The base linker is now lld.

llvm-svn: 354721
2019-02-23 06:19:28 +00:00
Richard Trieu
1bf3a09b3d Remove overly broad assert from r354717.
llvm-svn: 354720
2019-02-23 05:48:50 +00:00