Porting the example illustrating Polly from HTML to reStructuredText

http://polly.llvm.org/example_manual_matmul.html, which illustrates the
individual passes of Polly, has been ported to reStructuredText, and the
necessary changes have been made to the configuration files used by Sphinx to
include the new source as part of the documentation.

Contributed-by: Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com>

Differential Revision: https://reviews.llvm.org/D25163

llvm-svn: 294735
Tobias Grosser 2017-02-10 11:46:57 +00:00
parent 296fe2e2ad
commit 30a02088c0
20 changed files with 1093 additions and 611 deletions


@ -0,0 +1,475 @@
==================================================
How to manually use the individual pieces of Polly
==================================================
Execute the individual Polly passes manually
============================================
.. sectionauthor:: Singapuram Sanjay Srivallabh
This example presents the individual passes that are involved when optimizing
code with Polly. We show how to execute them individually and explain for
each which analysis is performed or what transformation is applied. In this
example the polyhedral transformation is user-provided to show how much
performance improvement can be expected by an optimal automatic optimizer.
1. **Create LLVM-IR from the C code**
-------------------------------------
Polly works on LLVM-IR, so the source files first have to be translated
into LLVM-IR. If more than one file should be optimized, the files can
be combined into a single file with llvm-link.
.. code-block:: console
$ clang -S -emit-llvm matmul.c -o matmul.s
2. **Prepare the LLVM-IR for Polly**
------------------------------------
Polly is only able to work with code that matches a canonical form.
To translate the LLVM-IR into this form we use a set of
canonicalization passes. They are scheduled by using
'-polly-canonicalize'.
.. code-block:: console
$ opt -S -polly-canonicalize matmul.s > matmul.preopt.ll
3. **Show the SCoPs detected by Polly (optional)**
--------------------------------------------------
To check whether Polly was able to detect SCoPs, we print the structure
of the detected SCoPs. In our example two SCoPs are detected: one in
'init_array', the other in 'main'.
.. code-block:: console
$ opt -polly-ast -analyze -q matmul.preopt.ll -polly-process-unprofitable
.. code-block:: guess
:: isl ast :: init_array :: %for.cond1.preheader---%for.end19
if (1)
for (int c0 = 0; c0 <= 1535; c0 += 1)
for (int c1 = 0; c1 <= 1535; c1 += 1)
Stmt_for_body3(c0, c1);
else
{ /* original code */ }
:: isl ast :: main :: %for.cond1.preheader---%for.end30
if (1)
for (int c0 = 0; c0 <= 1535; c0 += 1)
for (int c1 = 0; c1 <= 1535; c1 += 1) {
Stmt_for_body3(c0, c1);
for (int c2 = 0; c2 <= 1535; c2 += 1)
Stmt_for_body8(c0, c1, c2);
}
else
{ /* original code */ }
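For orientation, the two SCoPs above correspond to loop nests of the
following shape. This is a Python sketch, not the original matmul.c (which
is not shown here): the function names and the reduced size ``n`` are
assumptions for illustration, the initialization formula is read off the
LLVM-IR of ``init_array``, and the assumption that ``Stmt_for_body3`` in
``main`` stores zero is ours.

```python
def init_array(n):
    """Loop nest of the init_array SCoP: Stmt_for_body3 writes A and B."""
    a = [[0.0] * n for _ in range(n)]
    b = [[0.0] * n for _ in range(n)]
    for c0 in range(n):
        for c1 in range(n):
            # Stmt_for_body3(c0, c1)
            value = (1 + (c0 * c1) % 1024) / 2.0
            a[c0][c1] = value
            b[c0][c1] = value
    return a, b


def matmul_kernel(a, b):
    """Loop nest of the main SCoP: two statements, three loop levels."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for c0 in range(n):
        for c1 in range(n):
            # Stmt_for_body3(c0, c1): assumed to initialize C[c0][c1] to zero
            c[c0][c1] = 0.0
            for c2 in range(n):
                # Stmt_for_body8(c0, c1, c2)
                c[c0][c1] += a[c0][c2] * b[c2][c1]
    return c
```

In the real example the loop bounds are the constant 1536 rather than a
parameter, which is why the printed ASTs run from 0 to 1535.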
4. **Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)**
----------------------------------------------------------------------------------------
Polly can use graphviz to graphically show a CFG in which the detected
SCoPs are highlighted. It can also create '.dot' files that can be
translated by the 'dot' utility into various graphic formats.
.. code-block:: console
$ opt -view-scops -disable-output matmul.preopt.ll
$ opt -view-scops-only -disable-output matmul.preopt.ll
The output for the different functions:
- view-scops : main_, init_array_, print_array_
- view-scops-only : main-scopsonly_, init_array-scopsonly_, print_array-scopsonly_
.. _main: http://polly.llvm.org/experiments/matmul/scops.main.dot.png
.. _init_array: http://polly.llvm.org/experiments/matmul/scops.init_array.dot.png
.. _print_array: http://polly.llvm.org/experiments/matmul/scops.print_array.dot.png
.. _main-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.main.dot.png
.. _init_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.init_array.dot.png
.. _print_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.print_array.dot.png
5. **View the polyhedral representation of the SCoPs**
------------------------------------------------------
.. code-block:: console
$ opt -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable
.. code-block:: guess
[...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
Function: init_array
Region: %for.cond1.preheader---%for.end19
Max Loop Depth: 2
Invariant Accesses: {
}
Context:
{ : }
Assumed Context:
{ : }
Invalid Context:
{ : 1 = 0 }
Arrays {
float MemRef_A[*][1536]; // Element size 4
float MemRef_B[*][1536]; // Element size 4
}
Arrays (Bounds as pw_affs) {
float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
}
Alias Groups (0):
n/a
Statements {
Stmt_for_body3
Domain :=
{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
Schedule :=
{ Stmt_for_body3[i0, i1] -> [i0, i1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] };
}
[...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end30' in function 'main':
Function: main
Region: %for.cond1.preheader---%for.end30
Max Loop Depth: 3
Invariant Accesses: {
}
Context:
{ : }
Assumed Context:
{ : }
Invalid Context:
{ : 1 = 0 }
Arrays {
float MemRef_C[*][1536]; // Element size 4
float MemRef_A[*][1536]; // Element size 4
float MemRef_B[*][1536]; // Element size 4
}
Arrays (Bounds as pw_affs) {
float MemRef_C[*][ { [] -> [(1536)] } ]; // Element size 4
float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
}
Alias Groups (0):
n/a
Statements {
Stmt_for_body3
Domain :=
{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
Schedule :=
{ Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] };
Stmt_for_body8
Domain :=
{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 };
Schedule :=
{ Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] };
ReadAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
ReadAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] };
ReadAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
}
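One way to read the ``Schedule`` entries of the ``main`` SCoP is that
statement instances execute in lexicographic order of their schedule
tuples. The sketch below illustrates this for a tiny problem size; the
function name and the parameter ``n`` are hypothetical, and the instances
are deliberately collected in the "wrong" order so that only the sort by
schedule establishes the final order.

```python
def execution_order(n):
    """Return the statement instances of the main SCoP in schedule order."""
    instances = []
    # Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2]
    for i0 in range(n):
        for i1 in range(n):
            for i2 in range(n):
                instances.append(((i0, i1, 1, i2), ("Stmt_for_body8", i0, i1, i2)))
    # Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0]
    for i0 in range(n):
        for i1 in range(n):
            instances.append(((i0, i1, 0, 0), ("Stmt_for_body3", i0, i1)))
    # Python compares tuples lexicographically, like the schedule space.
    instances.sort(key=lambda pair: pair[0])
    return [statement for _, statement in instances]
```

The constant third dimension (0 for ``Stmt_for_body3``, 1 for
``Stmt_for_body8``) is what places the initialization of an element before
the innermost loop that accumulates into it.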
6. **Show the dependences for the SCoPs**
-----------------------------------------
.. code-block:: console
$ opt -polly-dependences -analyze matmul.preopt.ll -polly-process-unprofitable
.. code-block:: guess
[...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
RAW dependences:
{ }
WAR dependences:
{ }
WAW dependences:
{ }
Reduction dependences:
n/a
Transitive closure of reduction dependences:
{ }
[...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end30' in function 'main':
RAW dependences:
{ Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
WAR dependences:
{ }
WAW dependences:
{ Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
Reduction dependences:
n/a
Transitive closure of reduction dependences:
{ }
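The RAW relation printed for ``main`` can be cross-checked by brute force
on a small instance. The sketch below is not how Polly computes
dependences (Polly derives them symbolically); it simply replays the
accesses to ``MemRef_C`` in original program order and records, for every
read, which statement instance wrote the value. The function name and the
size parameter ``n`` are assumptions.

```python
def raw_dependences(n):
    """Brute-force read-after-write dependences on C for an n*n*n problem."""
    last_writer = {}  # element of C -> statement instance that last wrote it
    deps = set()
    for i0 in range(n):
        for i1 in range(n):
            # Stmt_for_body3 writes C[i0, i1]
            last_writer[(i0, i1)] = ("Stmt_for_body3", i0, i1)
            for i2 in range(n):
                reader = ("Stmt_for_body8", i0, i1, i2)
                # Stmt_for_body8 reads C[i0, i1] ...
                deps.add((last_writer[(i0, i1)], reader))
                # ... and then writes it again
                last_writer[(i0, i1)] = reader
    return deps
```

``MemRef_A`` and ``MemRef_B`` are only read inside this SCoP, so they
contribute no dependences; every pair found has one of the two shapes
listed in the output above.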
7. **Export jscop files**
-------------------------
.. code-block:: console
$ opt -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable
.. code-block:: guess
[...]Writing JScop '%for.cond1.preheader---%for.end19' in function 'init_array' to './init_array___%for.cond1.preheader---%for.end19.jscop'.
Writing JScop '%for.cond1.preheader---%for.end30' in function 'main' to './main___%for.cond1.preheader---%for.end30.jscop'.
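The exported jscop files are plain JSON, so they can be inspected with any
JSON library. A minimal sketch using Python's standard ``json`` module
follows; the embedded text is a trimmed copy of the ``init_array`` jscop
file added by this commit, and the helper name ``schedules`` is
hypothetical.

```python
import json

# Trimmed copy of the exported init_array jscop file (arrays and accesses omitted).
JSCOP_TEXT = """
{
   "context" : "{  :  }",
   "name" : "%for.cond1.preheader---%for.end19",
   "statements" : [
      {
         "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
         "name" : "Stmt_for_body3",
         "schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1] }"
      }
   ]
}
"""


def schedules(jscop_text):
    """Map each statement name to its schedule string."""
    scop = json.loads(jscop_text)
    return {stmt["name"]: stmt["schedule"] for stmt in scop["statements"]}


if __name__ == "__main__":
    for name, schedule in schedules(JSCOP_TEXT).items():
        print(name, schedule)
```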
8. **Import the changed jscop files and print the updated SCoP structure (optional)**
-------------------------------------------------------------------------------------
Polly can reimport jscop files in which the schedules of the statements
have been changed. These changed schedules are used to describe
transformations. It is possible to import different jscop files by
providing the postfix of the jscop file that is imported.
We apply three different transformations on the SCoP in the main
function. The jscop files describing these transformations are
handwritten (and available in docs/experiments/matmul).
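The jscop files used here were written by hand, but the same kind of edit
can be scripted. The sketch below shows one possible way to derive a
variant from a parsed jscop dictionary; the helper names are hypothetical,
and the file naming (original name, a dot, then the postfix) is inferred
from the "Reading JScop" messages that Polly prints during import.

```python
import copy


def with_schedules(scop, new_schedules):
    """Return a copy of a parsed jscop dict with some schedules replaced."""
    changed = copy.deepcopy(scop)
    for stmt in changed["statements"]:
        if stmt["name"] in new_schedules:
            stmt["schedule"] = new_schedules[stmt["name"]]
    return changed


def variant_path(jscop_path, postfix):
    """File name that an import with the given postfix would look for."""
    return f"{jscop_path}.{postfix}"
```

For example, replacing the schedule of ``Stmt_for_body8`` with
``{ Stmt_for_body8[i0, i1, i2] -> [1, i0, i2, i1] }`` and saving the result
under the ``interchanged`` postfix reproduces the idea behind the first
handwritten variant.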
**No Polly**
As a baseline we do not call any Polly code generation, but only apply the normal -O3 optimizations.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-ast -analyze -polly-process-unprofitable
.. code-block:: c
[...]
:: isl ast :: main :: %for.cond1.preheader---%for.end30
if (1)
for (int c0 = 0; c0 <= 1535; c0 += 1)
for (int c1 = 0; c1 <= 1535; c1 += 1) {
Stmt_for_body3(c0, c1);
for (int c3 = 0; c3 <= 1535; c3 += 1)
Stmt_for_body8(c0, c1, c3);
}
else
{ /* original code */ }
[...]
**Loop Interchange (and Fission to allow the interchange)**
We split the loops and can now apply an interchange of the loop dimensions that enumerate Stmt_for_body8.
.. Although I feel (and have created a .jscop) we can avoid splitting the loops.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-ast -analyze -polly-process-unprofitable
.. code-block:: c
[...]
:: isl ast :: main :: %for.cond1.preheader---%for.end30
if (1)
{
for (int c1 = 0; c1 <= 1535; c1 += 1)
for (int c2 = 0; c2 <= 1535; c2 += 1)
Stmt_for_body3(c1, c2);
for (int c1 = 0; c1 <= 1535; c1 += 1)
for (int c2 = 0; c2 <= 1535; c2 += 1)
for (int c3 = 0; c3 <= 1535; c3 += 1)
Stmt_for_body8(c1, c3, c2);
}
else
{ /* original code */ }
[...]
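To see why this transformation preserves the result, the original and the
fissioned, interchanged loop nests can be compared on a small input. The
sketch below is an illustration only: the function names are hypothetical,
and ``Stmt_for_body3`` is assumed to store zero into ``C``.

```python
def matmul_original(a, b):
    """Original order: initialization fused with the accumulation loop."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for c0 in range(n):
        for c1 in range(n):
            c[c0][c1] = 0.0                          # Stmt_for_body3(c0, c1)
            for c3 in range(n):
                c[c0][c1] += a[c0][c3] * b[c3][c1]   # Stmt_for_body8(c0, c1, c3)
    return c


def matmul_interchanged(a, b):
    """Fission first, then interchange the two inner loops of Stmt_for_body8."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for c1 in range(n):
        for c2 in range(n):
            c[c1][c2] = 0.0                          # Stmt_for_body3(c1, c2)
    for c1 in range(n):
        for c2 in range(n):
            for c3 in range(n):
                c[c1][c3] += a[c1][c2] * b[c2][c3]   # Stmt_for_body8(c1, c3, c2)
    return c
```

After the interchange the innermost loop walks along a row of ``B`` and
``C`` instead of down a column of ``B``, which is the usual explanation for
the better cache behavior of this variant.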
**Interchange + Tiling**
In addition to the interchange we now tile the second loop nest.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-ast -analyze -polly-process-unprofitable
.. code-block:: c
[...]
:: isl ast :: main :: %for.cond1.preheader---%for.end30
if (1)
{
for (int c1 = 0; c1 <= 1535; c1 += 1)
for (int c2 = 0; c2 <= 1535; c2 += 1)
Stmt_for_body3(c1, c2);
for (int c1 = 0; c1 <= 1535; c1 += 64)
for (int c2 = 0; c2 <= 1535; c2 += 64)
for (int c3 = 0; c3 <= 1535; c3 += 64)
for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
for (int c6 = c2; c6 <= c2 + 63; c6 += 1)
Stmt_for_body8(c4, c6, c5);
}
else
{ /* original code */ }
[...]
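The tiled loop nest visits exactly the same statement instances as before,
only in a different order. A small Python sketch of the iteration order
follows; the function name is hypothetical, the tile size is a parameter
instead of the fixed 64 used above, and the problem size is assumed to be a
multiple of the tile size.

```python
def tiled_instances(n, tile):
    """Yield Stmt_for_body8 instances in tiled order; assumes n % tile == 0."""
    for c1 in range(0, n, tile):          # tile loops
        for c2 in range(0, n, tile):
            for c3 in range(0, n, tile):
                for c4 in range(c1, c1 + tile):      # point loops
                    for c5 in range(c3, c3 + tile):
                        for c6 in range(c2, c2 + tile):
                            yield (c4, c6, c5)       # Stmt_for_body8(c4, c6, c5)
```

Collecting the generated tuples into a set for a small ``n`` is a quick way
to confirm that no instance is skipped or executed twice.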
**Interchange + Tiling + Strip-mining to prepare vectorization**
To later allow vectorization we create a so-called trivially
parallelizable loop. It is innermost, parallel, and has only four
iterations, so it can be replaced by 4-element SIMD instructions.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-ast -analyze -polly-process-unprofitable
.. code-block:: c
[...]
:: isl ast :: main :: %for.cond1.preheader---%for.end30
if (1)
{
for (int c1 = 0; c1 <= 1535; c1 += 1)
for (int c2 = 0; c2 <= 1535; c2 += 1)
Stmt_for_body3(c1, c2);
for (int c1 = 0; c1 <= 1535; c1 += 64)
for (int c2 = 0; c2 <= 1535; c2 += 64)
for (int c3 = 0; c3 <= 1535; c3 += 64)
for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
for (int c6 = c2; c6 <= c2 + 63; c6 += 4)
for (int c7 = c6; c7 <= c6 + 3; c7 += 1)
Stmt_for_body8(c4, c7, c5);
}
else
{ /* original code */ }
[...]
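Strip-mining splits the innermost point loop into chunks of four
iterations without changing the overall execution order. The sketch below
mirrors the loop structure of the AST above; the function name and the
parameters are hypothetical (the example uses a tile size of 64 and a
strip width of 4), and the sizes are assumed to divide evenly.

```python
def strip_mined_instances(n, tile=64, width=4):
    """Yield Stmt_for_body8 instances; assumes n % tile == 0 and tile % width == 0."""
    for c1 in range(0, n, tile):
        for c2 in range(0, n, tile):
            for c3 in range(0, n, tile):
                for c4 in range(c1, c1 + tile):
                    for c5 in range(c3, c3 + tile):
                        for c6 in range(c2, c2 + tile, width):
                            # Innermost strip: `width` consecutive columns of C
                            for c7 in range(c6, c6 + width):
                                yield (c4, c7, c5)   # Stmt_for_body8(c4, c7, c5)
```

Within one strip the instances differ only in their second index, so each
of them updates a different element of ``C``; this is what makes the strip
a candidate for a single SIMD operation.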
9. **Codegenerate the SCoPs**
-----------------------------
This generates new code for the SCoPs detected by Polly. If
-polly-import-jscop is present, transformations specified in the
imported jscop files will be applied.
.. code-block:: console
$ opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged.ll
.. code-block:: guess
Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged'.
File could not be read: No such file or directory
Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged'.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled.ll
.. code-block:: guess
Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled'.
File could not be read: No such file or directory
Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled'.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector.ll
.. code-block:: guess
Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.
.. code-block:: console
$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-parallel -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll
.. code-block:: guess
Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.
10. **Create the executables**
------------------------------
.. code-block:: console
$ llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s -o matmul.normalopt.exe
$ llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
$ llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
$ llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
$ llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && gcc -fopenmp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe
11. **Compare the runtime of the executables**
----------------------------------------------
Comparing the runtimes of the different executables shows that the
simple loop interchange gives the largest performance boost here, and
that tiling improves on it slightly. In this case, however, adding
vectorization and OpenMP degrades performance again relative to the
interchanged and tiled version.
.. code-block:: console
$ time ./matmul.normalopt.exe
real 0m11.295s
user 0m11.288s
sys 0m0.004s
$ time ./matmul.polly.interchanged.exe
real 0m0.988s
user 0m0.980s
sys 0m0.008s
$ time ./matmul.polly.interchanged+tiled.exe
real 0m0.830s
user 0m0.816s
sys 0m0.012s
$ time ./matmul.polly.interchanged+tiled+vector.exe
real 0m5.430s
user 0m5.424s
sys 0m0.004s
$ time ./matmul.polly.interchanged+tiled+vector+openmp.exe
real 0m3.184s
user 0m11.972s
sys 0m0.036s
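The relative speedups follow directly from the wall-clock ("real") times
listed above. A small worked calculation, with the timings copied from the
listing and the dictionary keys chosen here for illustration:

```python
# Wall-clock ("real") times in seconds, copied from the listing above.
REAL_SECONDS = {
    "normalopt": 11.295,
    "interchanged": 0.988,
    "interchanged+tiled": 0.830,
    "interchanged+tiled+vector": 5.430,
    "interchanged+tiled+vector+openmp": 3.184,
}


def speedups(times, baseline="normalopt"):
    """Speedup of every variant relative to the baseline (larger is faster)."""
    return {name: times[baseline] / seconds for name, seconds in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(REAL_SECONDS).items():
        print(f"{name}: {factor:.1f}x")
```

The interchanged and the interchanged and tiled variants come out at
roughly 11.4x and 13.6x over the -O3 baseline, while the vectorized
variants reach only about 2.1x and 3.5x. These figures are specific to the
machine on which the timings were taken.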


@ -0,0 +1,33 @@
{
"arrays" : [
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end19",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1] }"
}
]
}


@ -1,40 +1,57 @@
{ {
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }", "context" : "{ : }",
"name" : "for.cond => for.end30", "name" : "%for.cond1.preheader---%for.end30",
"statements" : [ "statements" : [
{ {
"accesses" : [ "accesses" : [
{ {
"kind" : "write", "kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[1536i0 + i1] }" "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
} }
], ],
"domain" : "{ Stmt_for_body3[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }", "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3", "name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }" "schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }"
}, },
{ {
"accesses" : [ "accesses" : [
{ {
"kind" : "read", "kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}, },
{ {
"kind" : "read", "kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[1536i0 + i2] }" "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
}, },
{ {
"kind" : "read", "kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }" "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
}, },
{ {
"kind" : "write", "kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[1536i0 + i1] }" "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
} }
], ],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 and i2 >= 0 and i2 <= 1535 }", "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8", "name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] }" "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }"
} }
] ]
} }


@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, i0, i2, i1] }"
}
]
}


@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0 ] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }"
}
]
}


@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0, 0 ] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, oo1, i1]: o0 <= i0 < o0 + 64 and o1 <= oo1 < o1 + 64 and o2 <= i2 < o2 + 64 and oo1 <= i1 < oo1 + 4 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and oo1 % 4 = 0 }"
}
]
}


@ -23,8 +23,8 @@ Using Polly
Architecture Architecture
UsingPollyWithClang UsingPollyWithClang
HowToManuallyUseTheIndividualPiecesOfPolly
* `How to manually use the individual pieces of Polly <http://polly.llvm.org/example_manual_matmul.html>`_
* `A list of Polly passes <http://polly.llvm.org/documentation/passes.html>`_ * `A list of Polly passes <http://polly.llvm.org/documentation/passes.html>`_
Indices and tables Indices and tables


@ -1,21 +0,0 @@
{
"context" : "{ : }",
"name" : "for.cond => for.end19",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[1536i0 + i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0] }"
}
]
}


@ -1,40 +0,0 @@
{
"context" : "{ [] }",
"name" : "%1 => %17",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }",
"name" : "Stmt_4",
"schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }",
"name" : "Stmt_6",
"schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, i0, 0, i2, 0, i1, 0] }"
}
]
}


@ -1,40 +0,0 @@
{
"context" : "{ [] }",
"name" : "%1 => %17",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }",
"name" : "Stmt_4",
"schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }",
"name" : "Stmt_6",
"schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }"
}
]
}


@ -1,40 +0,0 @@
{
"context" : "{ [] }",
"name" : "%1 => %17",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }",
"name" : "Stmt_4",
"schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0, 0] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
}
],
"domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }",
"name" : "Stmt_6",
"schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, o0, o1, o2, i0, i2, ii1, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and ii1 % 4 = 0 and ii1 <= i1 < ii1 + 4}"
}
]
}


@ -1,5 +1,6 @@
; ModuleID = 'matmul.s' ; ModuleID = 'matmul.s'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" source_filename = "matmul.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu" target triple = "x86_64-unknown-linux-gnu"
%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } %struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] }
@ -7,114 +8,100 @@ target triple = "x86_64-unknown-linux-gnu"
@A = common global [1536 x [1536 x float]] zeroinitializer, align 16 @A = common global [1536 x [1536 x float]] zeroinitializer, align 16
@B = common global [1536 x [1536 x float]] zeroinitializer, align 16 @B = common global [1536 x [1536 x float]] zeroinitializer, align 16
@stdout = external global %struct._IO_FILE* @stdout = external global %struct._IO_FILE*, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1 @.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1
@C = common global [1536 x [1536 x float]] zeroinitializer, align 16 @C = common global [1536 x [1536 x float]] zeroinitializer, align 16
@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1 @.str.1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
; Function Attrs: nounwind uwtable ; Function Attrs: nounwind uwtable
define void @init_array() #0 { define void @init_array() #0 {
entry: entry:
br label %for.cond br label %entry.split
for.cond: ; preds = %for.inc17, %entry entry.split: ; preds = %entry
%0 = phi i64 [ %indvar.next2, %for.inc17 ], [ 0, %entry ] br label %for.cond1.preheader
%exitcond3 = icmp ne i64 %0, 1536
br i1 %exitcond3, label %for.body, label %for.end19
for.body: ; preds = %for.cond for.cond1.preheader: ; preds = %entry.split, %for.inc17
br label %for.cond1 %indvars.iv5 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next6, %for.inc17 ]
br label %for.body3
for.cond1: ; preds = %for.inc, %for.body for.body3: ; preds = %for.cond1.preheader, %for.body3
%indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ] %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
%arrayidx6 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %0, i64 %indvar %0 = mul nuw nsw i64 %indvars.iv, %indvars.iv5
%arrayidx16 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %0, i64 %indvar %1 = trunc i64 %0 to i32
%1 = mul i64 %0, %indvar %rem = srem i32 %1, 1024
%mul = trunc i64 %1 to i32 %add = add nsw i32 %rem, 1
%exitcond = icmp ne i64 %indvar, 1536
br i1 %exitcond, label %for.body3, label %for.end
for.body3: ; preds = %for.cond1
%rem = srem i32 %mul, 1024
%add = add nsw i32 1, %rem
%conv = sitofp i32 %add to double %conv = sitofp i32 %add to double
%div = fdiv double %conv, 2.000000e+00 %div = fmul double %conv, 5.000000e-01
%conv4 = fptrunc double %div to float %conv4 = fptrunc double %div to float
%arrayidx6 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %indvars.iv5, i64 %indvars.iv
store float %conv4, float* %arrayidx6, align 4 store float %conv4, float* %arrayidx6, align 4
%rem8 = srem i32 %mul, 1024 %2 = mul nuw nsw i64 %indvars.iv, %indvars.iv5
%add9 = add nsw i32 1, %rem8 %3 = trunc i64 %2 to i32
%rem8 = srem i32 %3, 1024
%add9 = add nsw i32 %rem8, 1
%conv10 = sitofp i32 %add9 to double %conv10 = sitofp i32 %add9 to double
%div11 = fdiv double %conv10, 2.000000e+00 %div11 = fmul double %conv10, 5.000000e-01
%conv12 = fptrunc double %div11 to float %conv12 = fptrunc double %div11 to float
%arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %indvars.iv5, i64 %indvars.iv
store float %conv12, float* %arrayidx16, align 4 store float %conv12, float* %arrayidx16, align 4
br label %for.inc %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp ne i64 %indvars.iv.next, 1536
br i1 %exitcond, label %for.body3, label %for.inc17
for.inc: ; preds = %for.body3 for.inc17: ; preds = %for.body3
%indvar.next = add i64 %indvar, 1 %indvars.iv.next6 = add nuw nsw i64 %indvars.iv5, 1
br label %for.cond1 %exitcond7 = icmp ne i64 %indvars.iv.next6, 1536
br i1 %exitcond7, label %for.cond1.preheader, label %for.end19
for.end: ; preds = %for.cond1 for.end19: ; preds = %for.inc17
br label %for.inc17
for.inc17: ; preds = %for.end
%indvar.next2 = add i64 %0, 1
br label %for.cond
for.end19: ; preds = %for.cond
ret void ret void
} }
; Function Attrs: nounwind uwtable ; Function Attrs: nounwind uwtable
define void @print_array() #0 { define void @print_array() #0 {
entry: entry:
br label %for.cond br label %entry.split
for.cond: ; preds = %for.inc10, %entry entry.split: ; preds = %entry
%indvar1 = phi i64 [ %indvar.next2, %for.inc10 ], [ 0, %entry ] br label %for.cond1.preheader
%exitcond3 = icmp ne i64 %indvar1, 1536
br i1 %exitcond3, label %for.body, label %for.end12
for.body: ; preds = %for.cond for.cond1.preheader: ; preds = %entry.split, %for.end
br label %for.cond1 %indvars.iv6 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next7, %for.end ]
%0 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
br label %for.body3
for.cond1: ; preds = %for.inc, %for.body for.body3: ; preds = %for.cond1.preheader, %for.inc
%indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ] %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.inc ]
%arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar %1 = phi %struct._IO_FILE* [ %0, %for.cond1.preheader ], [ %5, %for.inc ]
%j.0 = trunc i64 %indvar to i32 %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv6, i64 %indvars.iv
%exitcond = icmp ne i64 %indvar, 1536 %2 = load float, float* %arrayidx5, align 4
br i1 %exitcond, label %for.body3, label %for.end %conv = fpext float %2 to double
%call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2
for.body3: ; preds = %for.cond1 %3 = trunc i64 %indvars.iv to i32
%0 = load %struct._IO_FILE** @stdout, align 8 %rem = srem i32 %3, 80
%1 = load float* %arrayidx5, align 4
%conv = fpext float %1 to double
%call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv)
%rem = srem i32 %j.0, 80
%cmp6 = icmp eq i32 %rem, 79 %cmp6 = icmp eq i32 %rem, 79
br i1 %cmp6, label %if.then, label %if.end br i1 %cmp6, label %if.then, label %for.inc
if.then: ; preds = %for.body3 if.then: ; preds = %for.body3
%2 = load %struct._IO_FILE** @stdout, align 8 %4 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
%call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) %fputc3 = tail call i32 @fputc(i32 10, %struct._IO_FILE* %4)
br label %if.end
if.end: ; preds = %if.then, %for.body3
br label %for.inc br label %for.inc
for.inc: ; preds = %if.end for.inc: ; preds = %for.body3, %if.then
%indvar.next = add i64 %indvar, 1 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
br label %for.cond1 %5 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
%exitcond = icmp ne i64 %indvars.iv.next, 1536
br i1 %exitcond, label %for.body3, label %for.end
for.end: ; preds = %for.cond1 for.end: ; preds = %for.inc
%3 = load %struct._IO_FILE** @stdout, align 8 %.lcssa = phi %struct._IO_FILE* [ %5, %for.inc ]
%call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %3, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) %fputc = tail call i32 @fputc(i32 10, %struct._IO_FILE* %.lcssa)
br label %for.inc10 %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1
%exitcond8 = icmp ne i64 %indvars.iv.next7, 1536
br i1 %exitcond8, label %for.cond1.preheader, label %for.end12
for.inc10: ; preds = %for.end for.end12: ; preds = %for.end
%indvar.next2 = add i64 %indvar1, 1
br label %for.cond
for.end12: ; preds = %for.cond
ret void ret void
} }
@ -123,64 +110,62 @@ declare i32 @fprintf(%struct._IO_FILE*, i8*, ...) #1
; Function Attrs: nounwind uwtable ; Function Attrs: nounwind uwtable
define i32 @main() #0 { define i32 @main() #0 {
entry: entry:
call void @init_array() br label %entry.split
br label %for.cond
for.cond: ; preds = %for.inc28, %entry entry.split: ; preds = %entry
%indvar3 = phi i64 [ %indvar.next4, %for.inc28 ], [ 0, %entry ] tail call void @init_array()
%exitcond6 = icmp ne i64 %indvar3, 1536 br label %for.cond1.preheader
br i1 %exitcond6, label %for.body, label %for.end30
for.body: ; preds = %for.cond for.cond1.preheader: ; preds = %entry.split, %for.inc28
br label %for.cond1 %indvars.iv7 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next8, %for.inc28 ]
br label %for.body3
for.cond1: ; preds = %for.inc25, %for.body for.body3: ; preds = %for.cond1.preheader, %for.inc25
%indvar1 = phi i64 [ %indvar.next2, %for.inc25 ], [ 0, %for.body ] %indvars.iv4 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next5, %for.inc25 ]
%arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1 %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4
%exitcond5 = icmp ne i64 %indvar1, 1536
br i1 %exitcond5, label %for.body3, label %for.end27
for.body3: ; preds = %for.cond1
store float 0.000000e+00, float* %arrayidx5, align 4 store float 0.000000e+00, float* %arrayidx5, align 4
br label %for.cond6 br label %for.body8
for.cond6: ; preds = %for.inc, %for.body3 for.body8: ; preds = %for.body3, %for.body8
%indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body3 ] %indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body8 ]
%arrayidx16 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar %arrayidx12 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4
%arrayidx20 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1 %0 = load float, float* %arrayidx12, align 4
%exitcond = icmp ne i64 %indvar, 1536 %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %indvars.iv7, i64 %indvars.iv
br i1 %exitcond, label %for.body8, label %for.end %1 = load float, float* %arrayidx16, align 4
%arrayidx20 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv4
for.body8: ; preds = %for.cond6 %2 = load float, float* %arrayidx20, align 4
%0 = load float* %arrayidx5, align 4
%1 = load float* %arrayidx16, align 4
%2 = load float* %arrayidx20, align 4
%mul = fmul float %1, %2 %mul = fmul float %1, %2
%add = fadd float %0, %mul %add = fadd float %0, %mul
store float %add, float* %arrayidx5, align 4 %arrayidx24 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4
br label %for.inc store float %add, float* %arrayidx24, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp ne i64 %indvars.iv.next, 1536
br i1 %exitcond, label %for.body8, label %for.inc25
for.inc: ; preds = %for.body8 for.inc25: ; preds = %for.body8
%indvar.next = add i64 %indvar, 1 %indvars.iv.next5 = add nuw nsw i64 %indvars.iv4, 1
br label %for.cond6 %exitcond6 = icmp ne i64 %indvars.iv.next5, 1536
br i1 %exitcond6, label %for.body3, label %for.inc28
for.end: ; preds = %for.cond6 for.inc28: ; preds = %for.inc25
br label %for.inc25 %indvars.iv.next8 = add nuw nsw i64 %indvars.iv7, 1
%exitcond9 = icmp ne i64 %indvars.iv.next8, 1536
br i1 %exitcond9, label %for.cond1.preheader, label %for.end30
for.inc25: ; preds = %for.end for.end30: ; preds = %for.inc28
%indvar.next2 = add i64 %indvar1, 1
br label %for.cond1
for.end27: ; preds = %for.cond1
br label %for.inc28
for.inc28: ; preds = %for.end27
%indvar.next4 = add i64 %indvar3, 1
br label %for.cond
for.end30: ; preds = %for.cond
ret i32 0 ret i32 0
} }
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } ; Function Attrs: nounwind
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } declare i64 @fwrite(i8* nocapture, i64, i64, %struct._IO_FILE* nocapture) #2
; Function Attrs: nounwind
declare i32 @fputc(i32, %struct._IO_FILE* nocapture) #2
attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
!llvm.ident = !{!0}
!0 = !{!"clang version 4.0.0 (http://llvm.org/git/clang.git 081569d9a29c7bc827b2d41f8e62891bbc895e2f) (http://llvm.org/git/llvm.git e117e506536626352e8e47f6c72cd6e2a276622c)"}

@ -1,5 +1,6 @@
; ModuleID = 'matmul.c' ; ModuleID = 'matmul.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" source_filename = "matmul.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu" target triple = "x86_64-unknown-linux-gnu"
%struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] } %struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] }
@ -7,10 +8,10 @@ target triple = "x86_64-unknown-linux-gnu"
@A = common global [1536 x [1536 x float]] zeroinitializer, align 16 @A = common global [1536 x [1536 x float]] zeroinitializer, align 16
@B = common global [1536 x [1536 x float]] zeroinitializer, align 16 @B = common global [1536 x [1536 x float]] zeroinitializer, align 16
@stdout = external global %struct._IO_FILE* @stdout = external global %struct._IO_FILE*, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1 @.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1
@C = common global [1536 x [1536 x float]] zeroinitializer, align 16 @C = common global [1536 x [1536 x float]] zeroinitializer, align 16
@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1 @.str.1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
; Function Attrs: nounwind uwtable ; Function Attrs: nounwind uwtable
define void @init_array() #0 { define void @init_array() #0 {
@ -21,7 +22,7 @@ entry:
br label %for.cond br label %for.cond
for.cond: ; preds = %for.inc17, %entry for.cond: ; preds = %for.inc17, %entry
%0 = load i32* %i, align 4 %0 = load i32, i32* %i, align 4
%cmp = icmp slt i32 %0, 1536 %cmp = icmp slt i32 %0, 1536
br i1 %cmp, label %for.body, label %for.end19 br i1 %cmp, label %for.body, label %for.end19
@ -30,45 +31,45 @@ for.body: ; preds = %for.cond
br label %for.cond1 br label %for.cond1
for.cond1: ; preds = %for.inc, %for.body for.cond1: ; preds = %for.inc, %for.body
%1 = load i32* %j, align 4 %1 = load i32, i32* %j, align 4
%cmp2 = icmp slt i32 %1, 1536 %cmp2 = icmp slt i32 %1, 1536
br i1 %cmp2, label %for.body3, label %for.end br i1 %cmp2, label %for.body3, label %for.end
for.body3: ; preds = %for.cond1 for.body3: ; preds = %for.cond1
%2 = load i32* %i, align 4 %2 = load i32, i32* %i, align 4
%3 = load i32* %j, align 4 %3 = load i32, i32* %j, align 4
%mul = mul nsw i32 %2, %3 %mul = mul nsw i32 %2, %3
%rem = srem i32 %mul, 1024 %rem = srem i32 %mul, 1024
%add = add nsw i32 1, %rem %add = add nsw i32 1, %rem
%conv = sitofp i32 %add to double %conv = sitofp i32 %add to double
%div = fdiv double %conv, 2.000000e+00 %div = fdiv double %conv, 2.000000e+00
%conv4 = fptrunc double %div to float %conv4 = fptrunc double %div to float
%4 = load i32* %j, align 4 %4 = load i32, i32* %j, align 4
%idxprom = sext i32 %4 to i64 %idxprom = sext i32 %4 to i64
%5 = load i32* %i, align 4 %5 = load i32, i32* %i, align 4
%idxprom5 = sext i32 %5 to i64 %idxprom5 = sext i32 %5 to i64
%arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %idxprom5 %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %idxprom5
%arrayidx6 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom %arrayidx6 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom
store float %conv4, float* %arrayidx6, align 4 store float %conv4, float* %arrayidx6, align 4
%6 = load i32* %i, align 4 %6 = load i32, i32* %i, align 4
%7 = load i32* %j, align 4 %7 = load i32, i32* %j, align 4
%mul7 = mul nsw i32 %6, %7 %mul7 = mul nsw i32 %6, %7
%rem8 = srem i32 %mul7, 1024 %rem8 = srem i32 %mul7, 1024
%add9 = add nsw i32 1, %rem8 %add9 = add nsw i32 1, %rem8
%conv10 = sitofp i32 %add9 to double %conv10 = sitofp i32 %add9 to double
%div11 = fdiv double %conv10, 2.000000e+00 %div11 = fdiv double %conv10, 2.000000e+00
%conv12 = fptrunc double %div11 to float %conv12 = fptrunc double %div11 to float
%8 = load i32* %j, align 4 %8 = load i32, i32* %j, align 4
%idxprom13 = sext i32 %8 to i64 %idxprom13 = sext i32 %8 to i64
%9 = load i32* %i, align 4 %9 = load i32, i32* %i, align 4
%idxprom14 = sext i32 %9 to i64 %idxprom14 = sext i32 %9 to i64
%arrayidx15 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %idxprom14 %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %idxprom14
%arrayidx16 = getelementptr inbounds [1536 x float]* %arrayidx15, i32 0, i64 %idxprom13 %arrayidx16 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx15, i64 0, i64 %idxprom13
store float %conv12, float* %arrayidx16, align 4 store float %conv12, float* %arrayidx16, align 4
br label %for.inc br label %for.inc
for.inc: ; preds = %for.body3 for.inc: ; preds = %for.body3
%10 = load i32* %j, align 4 %10 = load i32, i32* %j, align 4
%inc = add nsw i32 %10, 1 %inc = add nsw i32 %10, 1
store i32 %inc, i32* %j, align 4 store i32 %inc, i32* %j, align 4
br label %for.cond1 br label %for.cond1
@ -77,7 +78,7 @@ for.end: ; preds = %for.cond1
br label %for.inc17 br label %for.inc17
for.inc17: ; preds = %for.end for.inc17: ; preds = %for.end
%11 = load i32* %i, align 4 %11 = load i32, i32* %i, align 4
%inc18 = add nsw i32 %11, 1 %inc18 = add nsw i32 %11, 1
store i32 %inc18, i32* %i, align 4 store i32 %inc18, i32* %i, align 4
br label %for.cond br label %for.cond
@ -95,7 +96,7 @@ entry:
br label %for.cond br label %for.cond
for.cond: ; preds = %for.inc10, %entry for.cond: ; preds = %for.inc10, %entry
%0 = load i32* %i, align 4 %0 = load i32, i32* %i, align 4
%cmp = icmp slt i32 %0, 1536 %cmp = icmp slt i32 %0, 1536
br i1 %cmp, label %for.body, label %for.end12 br i1 %cmp, label %for.body, label %for.end12
@ -104,47 +105,47 @@ for.body: ; preds = %for.cond
br label %for.cond1 br label %for.cond1
for.cond1: ; preds = %for.inc, %for.body for.cond1: ; preds = %for.inc, %for.body
%1 = load i32* %j, align 4 %1 = load i32, i32* %j, align 4
%cmp2 = icmp slt i32 %1, 1536 %cmp2 = icmp slt i32 %1, 1536
br i1 %cmp2, label %for.body3, label %for.end br i1 %cmp2, label %for.body3, label %for.end
for.body3: ; preds = %for.cond1 for.body3: ; preds = %for.cond1
%2 = load %struct._IO_FILE** @stdout, align 8 %2 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
%3 = load i32* %j, align 4 %3 = load i32, i32* %j, align 4
%idxprom = sext i32 %3 to i64 %idxprom = sext i32 %3 to i64
%4 = load i32* %i, align 4 %4 = load i32, i32* %i, align 4
%idxprom4 = sext i32 %4 to i64 %idxprom4 = sext i32 %4 to i64
%arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom4 %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom4
%arrayidx5 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom %arrayidx5 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom
%5 = load float* %arrayidx5, align 4 %5 = load float, float* %arrayidx5, align 4
%conv = fpext float %5 to double %conv = fpext float %5 to double
%call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv) %call = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), double %conv)
%6 = load i32* %j, align 4 %6 = load i32, i32* %j, align 4
%rem = srem i32 %6, 80 %rem = srem i32 %6, 80
%cmp6 = icmp eq i32 %rem, 79 %cmp6 = icmp eq i32 %rem, 79
br i1 %cmp6, label %if.then, label %if.end br i1 %cmp6, label %if.then, label %if.end
if.then: ; preds = %for.body3 if.then: ; preds = %for.body3
%7 = load %struct._IO_FILE** @stdout, align 8 %7 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
%call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) %call8 = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0))
br label %if.end br label %if.end
if.end: ; preds = %if.then, %for.body3 if.end: ; preds = %if.then, %for.body3
br label %for.inc br label %for.inc
for.inc: ; preds = %if.end for.inc: ; preds = %if.end
%8 = load i32* %j, align 4 %8 = load i32, i32* %j, align 4
%inc = add nsw i32 %8, 1 %inc = add nsw i32 %8, 1
store i32 %inc, i32* %j, align 4 store i32 %inc, i32* %j, align 4
br label %for.cond1 br label %for.cond1
for.end: ; preds = %for.cond1 for.end: ; preds = %for.cond1
%9 = load %struct._IO_FILE** @stdout, align 8 %9 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
%call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0)) %call9 = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0))
br label %for.inc10 br label %for.inc10
for.inc10: ; preds = %for.end for.inc10: ; preds = %for.end
%10 = load i32* %i, align 4 %10 = load i32, i32* %i, align 4
%inc11 = add nsw i32 %10, 1 %inc11 = add nsw i32 %10, 1
store i32 %inc11, i32* %i, align 4 store i32 %inc11, i32* %i, align 4
br label %for.cond br label %for.cond
@ -164,13 +165,13 @@ entry:
%k = alloca i32, align 4 %k = alloca i32, align 4
%t_start = alloca double, align 8 %t_start = alloca double, align 8
%t_end = alloca double, align 8 %t_end = alloca double, align 8
store i32 0, i32* %retval store i32 0, i32* %retval, align 4
call void @init_array() call void @init_array()
store i32 0, i32* %i, align 4 store i32 0, i32* %i, align 4
br label %for.cond br label %for.cond
for.cond: ; preds = %for.inc28, %entry for.cond: ; preds = %for.inc28, %entry
%0 = load i32* %i, align 4 %0 = load i32, i32* %i, align 4
%cmp = icmp slt i32 %0, 1536 %cmp = icmp slt i32 %0, 1536
br i1 %cmp, label %for.body, label %for.end30 br i1 %cmp, label %for.body, label %for.end30
@ -179,61 +180,61 @@ for.body: ; preds = %for.cond
br label %for.cond1 br label %for.cond1
for.cond1: ; preds = %for.inc25, %for.body for.cond1: ; preds = %for.inc25, %for.body
%1 = load i32* %j, align 4 %1 = load i32, i32* %j, align 4
%cmp2 = icmp slt i32 %1, 1536 %cmp2 = icmp slt i32 %1, 1536
br i1 %cmp2, label %for.body3, label %for.end27 br i1 %cmp2, label %for.body3, label %for.end27
for.body3: ; preds = %for.cond1 for.body3: ; preds = %for.cond1
%2 = load i32* %j, align 4 %2 = load i32, i32* %j, align 4
%idxprom = sext i32 %2 to i64 %idxprom = sext i32 %2 to i64
%3 = load i32* %i, align 4 %3 = load i32, i32* %i, align 4
%idxprom4 = sext i32 %3 to i64 %idxprom4 = sext i32 %3 to i64
%arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom4 %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom4
%arrayidx5 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom %arrayidx5 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom
store float 0.000000e+00, float* %arrayidx5, align 4 store float 0.000000e+00, float* %arrayidx5, align 4
store i32 0, i32* %k, align 4 store i32 0, i32* %k, align 4
br label %for.cond6 br label %for.cond6
for.cond6: ; preds = %for.inc, %for.body3 for.cond6: ; preds = %for.inc, %for.body3
%4 = load i32* %k, align 4 %4 = load i32, i32* %k, align 4
%cmp7 = icmp slt i32 %4, 1536 %cmp7 = icmp slt i32 %4, 1536
br i1 %cmp7, label %for.body8, label %for.end br i1 %cmp7, label %for.body8, label %for.end
for.body8: ; preds = %for.cond6 for.body8: ; preds = %for.cond6
%5 = load i32* %j, align 4 %5 = load i32, i32* %j, align 4
%idxprom9 = sext i32 %5 to i64 %idxprom9 = sext i32 %5 to i64
%6 = load i32* %i, align 4 %6 = load i32, i32* %i, align 4
%idxprom10 = sext i32 %6 to i64 %idxprom10 = sext i32 %6 to i64
%arrayidx11 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom10 %arrayidx11 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom10
%arrayidx12 = getelementptr inbounds [1536 x float]* %arrayidx11, i32 0, i64 %idxprom9 %arrayidx12 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx11, i64 0, i64 %idxprom9
%7 = load float* %arrayidx12, align 4 %7 = load float, float* %arrayidx12, align 4
%8 = load i32* %k, align 4 %8 = load i32, i32* %k, align 4
%idxprom13 = sext i32 %8 to i64 %idxprom13 = sext i32 %8 to i64
%9 = load i32* %i, align 4 %9 = load i32, i32* %i, align 4
%idxprom14 = sext i32 %9 to i64 %idxprom14 = sext i32 %9 to i64
%arrayidx15 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %idxprom14 %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %idxprom14
%arrayidx16 = getelementptr inbounds [1536 x float]* %arrayidx15, i32 0, i64 %idxprom13 %arrayidx16 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx15, i64 0, i64 %idxprom13
%10 = load float* %arrayidx16, align 4 %10 = load float, float* %arrayidx16, align 4
%11 = load i32* %j, align 4 %11 = load i32, i32* %j, align 4
%idxprom17 = sext i32 %11 to i64 %idxprom17 = sext i32 %11 to i64
%12 = load i32* %k, align 4 %12 = load i32, i32* %k, align 4
%idxprom18 = sext i32 %12 to i64 %idxprom18 = sext i32 %12 to i64
%arrayidx19 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %idxprom18 %arrayidx19 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %idxprom18
%arrayidx20 = getelementptr inbounds [1536 x float]* %arrayidx19, i32 0, i64 %idxprom17 %arrayidx20 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx19, i64 0, i64 %idxprom17
%13 = load float* %arrayidx20, align 4 %13 = load float, float* %arrayidx20, align 4
%mul = fmul float %10, %13 %mul = fmul float %10, %13
%add = fadd float %7, %mul %add = fadd float %7, %mul
%14 = load i32* %j, align 4 %14 = load i32, i32* %j, align 4
%idxprom21 = sext i32 %14 to i64 %idxprom21 = sext i32 %14 to i64
%15 = load i32* %i, align 4 %15 = load i32, i32* %i, align 4
%idxprom22 = sext i32 %15 to i64 %idxprom22 = sext i32 %15 to i64
%arrayidx23 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom22 %arrayidx23 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom22
%arrayidx24 = getelementptr inbounds [1536 x float]* %arrayidx23, i32 0, i64 %idxprom21 %arrayidx24 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx23, i64 0, i64 %idxprom21
store float %add, float* %arrayidx24, align 4 store float %add, float* %arrayidx24, align 4
br label %for.inc br label %for.inc
for.inc: ; preds = %for.body8 for.inc: ; preds = %for.body8
%16 = load i32* %k, align 4 %16 = load i32, i32* %k, align 4
%inc = add nsw i32 %16, 1 %inc = add nsw i32 %16, 1
store i32 %inc, i32* %k, align 4 store i32 %inc, i32* %k, align 4
br label %for.cond6 br label %for.cond6
@ -242,7 +243,7 @@ for.end: ; preds = %for.cond6
br label %for.inc25 br label %for.inc25
for.inc25: ; preds = %for.end for.inc25: ; preds = %for.end
%17 = load i32* %j, align 4 %17 = load i32, i32* %j, align 4
%inc26 = add nsw i32 %17, 1 %inc26 = add nsw i32 %17, 1
store i32 %inc26, i32* %j, align 4 store i32 %inc26, i32* %j, align 4
br label %for.cond1 br label %for.cond1
@ -251,7 +252,7 @@ for.end27: ; preds = %for.cond1
br label %for.inc28 br label %for.inc28
for.inc28: ; preds = %for.end27 for.inc28: ; preds = %for.end27
%18 = load i32* %i, align 4 %18 = load i32, i32* %i, align 4
%inc29 = add nsw i32 %18, 1 %inc29 = add nsw i32 %18, 1
store i32 %inc29, i32* %i, align 4 store i32 %inc29, i32* %i, align 4
br label %for.cond br label %for.cond
@ -260,5 +261,9 @@ for.end30: ; preds = %for.cond
ret i32 0 ret i32 0
} }
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 4.0.0 (http://llvm.org/git/clang.git 081569d9a29c7bc827b2d41f8e62891bbc895e2f) (http://llvm.org/git/llvm.git e117e506536626352e8e47f6c72cd6e2a276622c)"}

@ -3,68 +3,69 @@
echo "--> 1. Create LLVM-IR from C" echo "--> 1. Create LLVM-IR from C"
clang -S -emit-llvm matmul.c -o matmul.s clang -S -emit-llvm matmul.c -o matmul.s
echo "--> 2. Load Polly automatically when calling the 'opt' tool" echo "--> 2. Prepare the LLVM-IR for Polly"
export PATH_TO_POLLY_LIB="~/polly/build/lib/"
alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"
echo "--> 3. Prepare the LLVM-IR for Polly"
opt -S -polly-canonicalize matmul.s > matmul.preopt.ll opt -S -polly-canonicalize matmul.s > matmul.preopt.ll
echo "--> 4. Show the SCoPs detected by Polly" echo "--> 3. Show the SCoPs detected by Polly"
opt -basicaa -polly-ast -analyze -q matmul.preopt.ll opt -basicaa -polly-ast -analyze -q matmul.preopt.ll \
-polly-process-unprofitable
echo "--> 5.1 Highlight the detected SCoPs in the CFGs of the program" echo "--> 4.1 Highlight the detected SCoPs in the CFGs of the program"
# We only create .dot files, as -view-scops directly calls graphviz # We only create .dot files, as -view-scops directly calls graphviz
# which would require user interaction to continue the script. # which would require user interaction to continue the script.
# opt -basicaa -view-scops -disable-output matmul.preopt.ll # opt -basicaa -view-scops -disable-output matmul.preopt.ll
opt -basicaa -dot-scops -disable-output matmul.preopt.ll opt -basicaa -dot-scops -disable-output matmul.preopt.ll
echo "--> 5.2 Highlight the detected SCoPs in the CFGs of the program (print \ echo "--> 4.2 Highlight the detected SCoPs in the CFGs of the program (print \
no instructions)" no instructions)"
# We only create .dot files, as -view-scops-only directly calls # We only create .dot files, as -view-scops-only directly calls
# graphviz which would require user interaction to continue the script. # graphviz which would require user interaction to continue the script.
# opt -basicaa -view-scops-only -disable-output matmul.preopt.ll # opt -basicaa -view-scops-only -disable-output matmul.preopt.ll
opt -basicaa -dot-scops-only -disable-output matmul.preopt.ll opt -basicaa -dot-scops-only -disable-output matmul.preopt.ll
echo "--> 5.3 Create .png files from the .dot files" echo "--> 4.3 Create .png files from the .dot files"
for i in `ls *.dot`; do dot -Tpng $i > $i.png; done for i in `ls *.dot`; do dot -Tpng $i > $i.png; done
echo "--> 6. View the polyhedral representation of the SCoPs" echo "--> 5. View the polyhedral representation of the SCoPs"
opt -basicaa -polly-scops -analyze matmul.preopt.ll opt -basicaa -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable
echo "--> 7. Show the dependences for the SCoPs" echo "--> 6. Show the dependences for the SCoPs"
opt -basicaa -polly-dependences -analyze matmul.preopt.ll opt -basicaa -polly-dependences -analyze matmul.preopt.ll \
-polly-process-unprofitable
echo "--> 8. Export jscop files" echo "--> 7. Export jscop files"
opt -basicaa -polly-export-jscop matmul.preopt.ll opt -basicaa -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable
echo "--> 9. Import the updated jscop files and print the new SCoPs. (optional)" echo "--> 8. Import the updated jscop files and print the new SCoPs. (optional)"
opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll
opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-polly-import-jscop-postfix=interchanged -polly-process-unprofitable
opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-polly-import-jscop-postfix=interchanged+tiled -polly-import-jscop-postfix=interchanged -polly-process-unprofitable
opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \ opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-polly-import-jscop-postfix=interchanged+tiled+vector -polly-import-jscop-postfix=interchanged+tiled -polly-process-unprofitable
opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-polly-import-jscop-postfix=interchanged+tiled+vector \
-polly-process-unprofitable
echo "--> 10. Codegenerate the SCoPs" echo "--> 9. Codegenerate the SCoPs"
opt -basicaa -polly-import-jscop -polly-import-jscop-postfix=interchanged \ opt -basicaa -polly-import-jscop -polly-import-jscop-postfix=interchanged \
-polly-codegen \ -polly-codegen -polly-process-unprofitable\
matmul.preopt.ll | opt -O3 > matmul.polly.interchanged.ll matmul.preopt.ll | opt -O3 > matmul.polly.interchanged.ll
opt -basicaa -polly-import-jscop \ opt -basicaa -polly-import-jscop \
-polly-import-jscop-postfix=interchanged+tiled -polly-codegen \ -polly-import-jscop-postfix=interchanged+tiled -polly-codegen \
matmul.preopt.ll | opt -O3 > matmul.polly.interchanged+tiled.ll matmul.preopt.ll -polly-process-unprofitable \
opt -basicaa -polly-import-jscop \ | opt -O3 > matmul.polly.interchanged+tiled.ll
opt -basicaa -polly-import-jscop -polly-process-unprofitable\
-polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \
matmul.preopt.ll -polly-vectorizer=polly\ matmul.preopt.ll -polly-vectorizer=polly\
| opt -O3 > matmul.polly.interchanged+tiled+vector.ll | opt -O3 > matmul.polly.interchanged+tiled+vector.ll
opt -basicaa -polly-import-jscop \ opt -basicaa -polly-import-jscop -polly-process-unprofitable\
-polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \ -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \
matmul.preopt.ll -polly-vectorizer=polly -polly-parallel\ matmul.preopt.ll -polly-vectorizer=polly -polly-parallel\
| opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll
opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll
echo "--> 11. Create the executables" echo "--> 10. Create the executables"
llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s \ llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s \
-o matmul.polly.interchanged.exe -o matmul.polly.interchanged.exe
llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s \ llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s \
@ -80,7 +81,7 @@ llc matmul.polly.interchanged+tiled+vector+openmp.ll \
llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s \ llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s \
-o matmul.normalopt.exe -o matmul.normalopt.exe
echo "--> 12. Compare the runtime of the executables" echo "--> 11. Compare the runtime of the executables"
echo "time ./matmul.normalopt.exe" echo "time ./matmul.normalopt.exe"
time -f "%E real, %U user, %S sys" ./matmul.normalopt.exe time -f "%E real, %U user, %S sys" ./matmul.normalopt.exe

@ -1,47 +1,39 @@
digraph "Scop Graph for 'init_array' function" { digraph "Scop Graph for 'init_array' function" {
label="Scop Graph for 'init_array' function"; label="Scop Graph for 'init_array' function";
Node0x17d4370 [shape=record,label="{entry:\l br label %for.cond\l}"]; Node0x5b5b5a0 [shape=record,label="{entry:\l br label %entry.split\l}"];
Node0x17d4370 -> Node0x17da5d0; Node0x5b5b5a0 -> Node0x5b5de30;
Node0x17da5d0 [shape=record,label="{for.cond: \l %0 = phi i64 [ %indvar.next2, %for.inc17 ], [ 0, %entry ]\l %exitcond3 = icmp ne i64 %0, 1536\l br i1 %exitcond3, label %for.body, label %for.end19\l}"]; Node0x5b5de30 [shape=record,label="{entry.split: \l br label %for.cond1.preheader\l}"];
Node0x17da5d0 -> Node0x17da5f0; Node0x5b5de30 -> Node0x5b5de50;
Node0x17da5d0 -> Node0x17da650; Node0x5b5de50 [shape=record,label="{for.cond1.preheader: \l %indvars.iv5 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next6, %for.inc17 ]\l br label %for.body3\l}"];
Node0x17da5f0 [shape=record,label="{for.body: \l br label %for.cond1\l}"]; Node0x5b5de50 -> Node0x5b5b570;
Node0x17da5f0 -> Node0x17da900; Node0x5b5b570 [shape=record,label="{for.body3: \l %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next,\l... %for.body3 ]\l %0 = mul nuw nsw i64 %indvars.iv, %indvars.iv5\l %1 = trunc i64 %0 to i32\l %rem = srem i32 %1, 1024\l %add = add nsw i32 %rem, 1\l %conv = sitofp i32 %add to double\l %div = fmul double %conv, 5.000000e-01\l %conv4 = fptrunc double %div to float\l %arrayidx6 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @A, i64 0, i64 %indvars.iv5, i64 %indvars.iv\l store float %conv4, float* %arrayidx6, align 4\l %2 = mul nuw nsw i64 %indvars.iv, %indvars.iv5\l %3 = trunc i64 %2 to i32\l %rem8 = srem i32 %3, 1024\l %add9 = add nsw i32 %rem8, 1\l %conv10 = sitofp i32 %add9 to double\l %div11 = fmul double %conv10, 5.000000e-01\l %conv12 = fptrunc double %div11 to float\l %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @B, i64 0, i64 %indvars.iv5, i64 %indvars.iv\l store float %conv12, float* %arrayidx16, align 4\l %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l %exitcond = icmp ne i64 %indvars.iv.next, 1536\l br i1 %exitcond, label %for.body3, label %for.inc17\l}"];
Node0x17da900 [shape=record,label="{for.cond1: \l %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]\l %arrayidx6 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %0, i64 %indvar\l %arrayidx16 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %0, i64 %indvar\l %1 = mul i64 %0, %indvar\l %mul = trunc i64 %1 to i32\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %for.body3, label %for.end\l}"]; Node0x5b5b570 -> Node0x5b5b570[constraint=false];
Node0x17da900 -> Node0x17da670; Node0x5b5b570 -> Node0x5b5df30;
Node0x17da900 -> Node0x17da9a0; Node0x5b5df30 [shape=record,label="{for.inc17: \l %indvars.iv.next6 = add nuw nsw i64 %indvars.iv5, 1\l %exitcond7 = icmp ne i64 %indvars.iv.next6, 1536\l br i1 %exitcond7, label %for.cond1.preheader, label %for.end19\l}"];
Node0x17da670 [shape=record,label="{for.body3: \l %rem = srem i32 %mul, 1024\l %add = add nsw i32 1, %rem\l %conv = sitofp i32 %add to double\l %div = fdiv double %conv, 2.000000e+00\l %conv4 = fptrunc double %div to float\l store float %conv4, float* %arrayidx6, align 4\l %rem8 = srem i32 %mul, 1024\l %add9 = add nsw i32 1, %rem8\l %conv10 = sitofp i32 %add9 to double\l %div11 = fdiv double %conv10, 2.000000e+00\l %conv12 = fptrunc double %div11 to float\l store float %conv12, float* %arrayidx16, align 4\l br label %for.inc\l}"]; Node0x5b5df30 -> Node0x5b5de50[constraint=false];
Node0x17da670 -> Node0x17da8e0; Node0x5b5df30 -> Node0x5b5df90;
Node0x17da8e0 [shape=record,label="{for.inc: \l %indvar.next = add i64 %indvar, 1\l br label %for.cond1\l}"]; Node0x5b5df90 [shape=record,label="{for.end19: \l ret void\l}"];
Node0x17da8e0 -> Node0x17da900[constraint=false];
Node0x17da9a0 [shape=record,label="{for.end: \l br label %for.inc17\l}"];
Node0x17da9a0 -> Node0x17d9e70;
Node0x17d9e70 [shape=record,label="{for.inc17: \l %indvar.next2 = add i64 %0, 1\l br label %for.cond\l}"];
Node0x17d9e70 -> Node0x17da5d0[constraint=false];
Node0x17da650 [shape=record,label="{for.end19: \l ret void\l}"];
colorscheme = "paired12" colorscheme = "paired12"
subgraph cluster_0x17d3a30 { subgraph cluster_0x5b4bdd0 {
label = ""; label = "";
style = solid; style = solid;
color = 1 color = 1
subgraph cluster_0x17d4ec0 { subgraph cluster_0x5b4bf50 {
label = ""; label = "Region can not profitably be optimized!";
style = filled; style = solid;
color = 3 subgraph cluster_0x17d4180 { color = 6
subgraph cluster_0x5b4c0d0 {
label = ""; label = "";
style = solid; style = solid;
color = 5 color = 5
Node0x17da900; Node0x5b5b570;
Node0x17da670;
Node0x17da8e0;
} }
Node0x17da5d0; Node0x5b5de50;
Node0x17da5f0; Node0x5b5df30;
Node0x17da9a0;
Node0x17d9e70;
} }
Node0x17d4370; Node0x5b5b5a0;
Node0x17da650; Node0x5b5de30;
Node0x5b5df90;
} }
} }

@ -1,65 +1,50 @@
digraph "Scop Graph for 'main' function" { digraph "Scop Graph for 'main' function" {
label="Scop Graph for 'main' function"; label="Scop Graph for 'main' function";
Node0x17d21a0 [shape=record,label="{entry:\l call void @init_array()\l br label %for.cond\l}"]; Node0x5b5c850 [shape=record,label="{entry:\l br label %entry.split\l}"];
Node0x17d21a0 -> Node0x17d2020; Node0x5b5c850 -> Node0x5b5a440;
Node0x17d2020 [shape=record,label="{for.cond: \l %indvar3 = phi i64 [ %indvar.next4, %for.inc28 ], [ 0, %entry ]\l %exitcond6 = icmp ne i64 %indvar3, 1536\l br i1 %exitcond6, label %for.body, label %for.end30\l}"]; Node0x5b5a440 [shape=record,label="{entry.split: \l tail call void @init_array()\l br label %for.cond1.preheader\l}"];
Node0x17d2020 -> Node0x17d3950; Node0x5b5a440 -> Node0x5b38cd0;
Node0x17d2020 -> Node0x17da500; Node0x5b38cd0 [shape=record,label="{for.cond1.preheader: \l %indvars.iv7 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next8, %for.inc28 ]\l br label %for.body3\l}"];
Node0x17d3950 [shape=record,label="{for.body: \l br label %for.cond1\l}"]; Node0x5b38cd0 -> Node0x5b4bd30;
Node0x17d3950 -> Node0x17da760; Node0x5b4bd30 [shape=record,label="{for.body3: \l %indvars.iv4 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next5,\l... %for.inc25 ]\l %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l store float 0.000000e+00, float* %arrayidx5, align 4\l br label %for.body8\l}"];
Node0x17da760 [shape=record,label="{for.cond1: \l %indvar1 = phi i64 [ %indvar.next2, %for.inc25 ], [ 0, %for.body ]\l %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1\l %exitcond5 = icmp ne i64 %indvar1, 1536\l br i1 %exitcond5, label %for.body3, label %for.end27\l}"]; Node0x5b4bd30 -> Node0x5b38c50;
Node0x17da760 -> Node0x17db1e0; Node0x5b38c50 [shape=record,label="{for.body8: \l %indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body8 ]\l %arrayidx12 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l %0 = load float, float* %arrayidx12, align 4\l %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @A, i64 0, i64 %indvars.iv7, i64 %indvars.iv\l %1 = load float, float* %arrayidx16, align 4\l %arrayidx20 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv4\l %2 = load float, float* %arrayidx20, align 4\l %mul = fmul float %1, %2\l %add = fadd float %0, %mul\l %arrayidx24 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l store float %add, float* %arrayidx24, align 4\l %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l %exitcond = icmp ne i64 %indvars.iv.next, 1536\l br i1 %exitcond, label %for.body8, label %for.inc25\l}"];
Node0x17da760 -> Node0x17db250; Node0x5b38c50 -> Node0x5b38c50[constraint=false];
Node0x17db1e0 [shape=record,label="{for.body3: \l store float 0.000000e+00, float* %arrayidx5, align 4\l br label %for.cond6\l}"]; Node0x5b38c50 -> Node0x5b5a290;
Node0x17db1e0 -> Node0x17da740; Node0x5b5a290 [shape=record,label="{for.inc25: \l %indvars.iv.next5 = add nuw nsw i64 %indvars.iv4, 1\l %exitcond6 = icmp ne i64 %indvars.iv.next5, 1536\l br i1 %exitcond6, label %for.body3, label %for.inc28\l}"];
Node0x17da740 [shape=record,label="{for.cond6: \l %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body3 ]\l %arrayidx16 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar\l %arrayidx20 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %for.body8, label %for.end\l}"]; Node0x5b5a290 -> Node0x5b4bd30[constraint=false];
Node0x17da740 -> Node0x17da5a0; Node0x5b5a290 -> Node0x5b5a340;
Node0x17da740 -> Node0x17da800; Node0x5b5a340 [shape=record,label="{for.inc28: \l %indvars.iv.next8 = add nuw nsw i64 %indvars.iv7, 1\l %exitcond9 = icmp ne i64 %indvars.iv.next8, 1536\l br i1 %exitcond9, label %for.cond1.preheader, label %for.end30\l}"];
Node0x17da5a0 [shape=record,label="{for.body8: \l %0 = load float* %arrayidx5, align 4\l %1 = load float* %arrayidx16, align 4\l %2 = load float* %arrayidx20, align 4\l %mul = fmul float %1, %2\l %add = fadd float %0, %mul\l store float %add, float* %arrayidx5, align 4\l br label %for.inc\l}"]; Node0x5b5a340 -> Node0x5b38cd0[constraint=false];
Node0x17da5a0 -> Node0x17da5c0; Node0x5b5a340 -> Node0x5b5a3a0;
Node0x17da5c0 [shape=record,label="{for.inc: \l %indvar.next = add i64 %indvar, 1\l br label %for.cond6\l}"]; Node0x5b5a3a0 [shape=record,label="{for.end30: \l ret i32 0\l}"];
Node0x17da5c0 -> Node0x17da740[constraint=false];
Node0x17da800 [shape=record,label="{for.end: \l br label %for.inc25\l}"];
Node0x17da800 -> Node0x17dae20;
Node0x17dae20 [shape=record,label="{for.inc25: \l %indvar.next2 = add i64 %indvar1, 1\l br label %for.cond1\l}"];
Node0x17dae20 -> Node0x17da760[constraint=false];
Node0x17db250 [shape=record,label="{for.end27: \l br label %for.inc28\l}"];
Node0x17db250 -> Node0x17dae80;
Node0x17dae80 [shape=record,label="{for.inc28: \l %indvar.next4 = add i64 %indvar3, 1\l br label %for.cond\l}"];
Node0x17dae80 -> Node0x17d2020[constraint=false];
Node0x17da500 [shape=record,label="{for.end30: \l ret i32 0\l}"];
colorscheme = "paired12" colorscheme = "paired12"
subgraph cluster_0x17d3f30 { subgraph cluster_0x5b5c970 {
label = ""; label = "";
style = solid; style = solid;
color = 1 color = 1
subgraph cluster_0x17d38d0 { subgraph cluster_0x5b5c5a0 {
label = ""; label = "";
style = filled; style = filled;
color = 3 subgraph cluster_0x17d3850 { color = 3 subgraph cluster_0x5b5c9f0 {
label = ""; label = "";
style = solid; style = solid;
color = 5 color = 5
subgraph cluster_0x17d37d0 { subgraph cluster_0x5b5c110 {
label = ""; label = "";
style = solid; style = solid;
color = 7 color = 7
Node0x17da740; Node0x5b38c50;
Node0x17da5a0;
Node0x17da5c0;
} }
Node0x17da760; Node0x5b4bd30;
Node0x17db1e0; Node0x5b5a290;
Node0x17da800;
Node0x17dae20;
} }
Node0x17d2020; Node0x5b38cd0;
Node0x17d3950; Node0x5b5a340;
Node0x17db250;
Node0x17dae80;
} }
Node0x17d21a0; Node0x5b5c850;
Node0x17da500; Node0x5b5a440;
Node0x5b5a3a0;
} }
} }

@ -1,60 +1,51 @@
digraph "Scop Graph for 'print_array' function" { digraph "Scop Graph for 'print_array' function" {
label="Scop Graph for 'print_array' function"; label="Scop Graph for 'print_array' function";
Node0x17d2200 [shape=record,label="{entry:\l br label %for.cond\l}"]; Node0x5b5ee00 [shape=record,label="{entry:\l br label %entry.split\l}"];
Node0x17d2200 -> Node0x17d4f20; Node0x5b5ee00 -> Node0x5b5ee50;
Node0x17d4f20 [shape=record,label="{for.cond: \l %indvar1 = phi i64 [ %indvar.next2, %for.inc10 ], [ 0, %entry ]\l %exitcond3 = icmp ne i64 %indvar1, 1536\l br i1 %exitcond3, label %for.body, label %for.end12\l}"]; Node0x5b5ee50 [shape=record,label="{entry.split: \l br label %for.cond1.preheader\l}"];
Node0x17d4f20 -> Node0x17d3680; Node0x5b5ee50 -> Node0x5b5ee70;
Node0x17d4f20 -> Node0x17d9fc0; Node0x5b5ee70 [shape=record,label="{for.cond1.preheader: \l %indvars.iv6 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next7, %for.end ]\l %0 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l br label %for.body3\l}"];
Node0x17d3680 [shape=record,label="{for.body: \l br label %for.cond1\l}"]; Node0x5b5ee70 -> Node0x5b5ee20;
Node0x17d3680 -> Node0x17da220; Node0x5b5ee20 [shape=record,label="{for.body3: \l %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next,\l... %for.inc ]\l %1 = phi %struct._IO_FILE* [ %0, %for.cond1.preheader ], [ %5, %for.inc ]\l %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @C, i64 0, i64 %indvars.iv6, i64 %indvars.iv\l %2 = load float, float* %arrayidx5, align 4\l %conv = fpext float %2 to double\l %call = tail call i32 (%struct._IO_FILE*, i8*, ...)\l... @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x\l... i8]* @.str, i64 0, i64 0), double %conv) #2\l %3 = trunc i64 %indvars.iv to i32\l %rem = srem i32 %3, 80\l %cmp6 = icmp eq i32 %rem, 79\l br i1 %cmp6, label %if.then, label %for.inc\l}"];
Node0x17da220 [shape=record,label="{for.cond1: \l %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]\l %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar\l %j.0 = trunc i64 %indvar to i32\l %exitcond = icmp ne i64 %indvar, 1536\l br i1 %exitcond, label %for.body3, label %for.end\l}"]; Node0x5b5ee20 -> Node0x5b60d10;
Node0x17da220 -> Node0x17d9ea0; Node0x5b5ee20 -> Node0x5b60d70;
Node0x17da220 -> Node0x17da0f0; Node0x5b60d10 [shape=record,label="{if.then: \l %4 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l %fputc3 = tail call i32 @fputc(i32 10, %struct._IO_FILE* %4)\l br label %for.inc\l}"];
Node0x17d9ea0 [shape=record,label="{for.body3: \l %0 = load %struct._IO_FILE** @stdout, align 8\l %1 = load float* %arrayidx5, align 4\l %conv = fpext float %1 to double\l %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv)\l %rem = srem i32 %j.0, 80\l %cmp6 = icmp eq i32 %rem, 79\l br i1 %cmp6, label %if.then, label %if.end\l}"]; Node0x5b60d10 -> Node0x5b60d70;
Node0x17d9ea0 -> Node0x17d9ec0; Node0x5b60d70 [shape=record,label="{for.inc: \l %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l %5 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l %exitcond = icmp ne i64 %indvars.iv.next, 1536\l br i1 %exitcond, label %for.body3, label %for.end\l}"];
Node0x17d9ea0 -> Node0x17da060; Node0x5b60d70 -> Node0x5b5ee20[constraint=false];
Node0x17d9ec0 [shape=record,label="{if.then: \l %2 = load %struct._IO_FILE** @stdout, align 8\l %call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l br label %if.end\l}"]; Node0x5b60d70 -> Node0x5b60e10;
Node0x17d9ec0 -> Node0x17da060; Node0x5b60e10 [shape=record,label="{for.end: \l %.lcssa = phi %struct._IO_FILE* [ %5, %for.inc ]\l %fputc = tail call i32 @fputc(i32 10, %struct._IO_FILE* %.lcssa)\l %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1\l %exitcond8 = icmp ne i64 %indvars.iv.next7, 1536\l br i1 %exitcond8, label %for.cond1.preheader, label %for.end12\l}"];
Node0x17da060 [shape=record,label="{if.end: \l br label %for.inc\l}"]; Node0x5b60e10 -> Node0x5b5ee70[constraint=false];
Node0x17da060 -> Node0x17da200; Node0x5b60e10 -> Node0x5b60e70;
Node0x17da200 [shape=record,label="{for.inc: \l %indvar.next = add i64 %indvar, 1\l br label %for.cond1\l}"]; Node0x5b60e70 [shape=record,label="{for.end12: \l ret void\l}"];
Node0x17da200 -> Node0x17da220[constraint=false];
Node0x17da0f0 [shape=record,label="{for.end: \l %3 = load %struct._IO_FILE** @stdout, align 8\l %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %3, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l br label %for.inc10\l}"];
Node0x17da0f0 -> Node0x17da080;
Node0x17da080 [shape=record,label="{for.inc10: \l %indvar.next2 = add i64 %indvar1, 1\l br label %for.cond\l}"];
Node0x17da080 -> Node0x17d4f20[constraint=false];
Node0x17d9fc0 [shape=record,label="{for.end12: \l ret void\l}"];
colorscheme = "paired12" colorscheme = "paired12"
subgraph cluster_0x17d38f0 { subgraph cluster_0x5b349a0 {
label = ""; label = "";
style = solid; style = solid;
color = 1 color = 1
subgraph cluster_0x17d4030 { subgraph cluster_0x5b5c2c0 {
label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
style = solid; style = solid;
color = 6 color = 6
subgraph cluster_0x17d3fb0 { subgraph cluster_0x5b5c240 {
label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
style = solid; style = solid;
color = 5 color = 5
subgraph cluster_0x17d3f30 { subgraph cluster_0x5b34a20 {
label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; label = "Region can not profitably be optimized!";
style = solid; style = solid;
color = 7 color = 7
Node0x17d9ea0; Node0x5b5ee20;
Node0x17d9ec0; Node0x5b60d10;
} }
Node0x17da220; Node0x5b60d70;
Node0x17da060;
Node0x17da200;
} }
Node0x17d4f20; Node0x5b5ee70;
Node0x17d3680; Node0x5b60e10;
Node0x17da0f0;
Node0x17da080;
} }
Node0x17d2200; Node0x5b5ee00;
Node0x17d9fc0; Node0x5b5ee50;
Node0x5b60e70;
} }
} }

@ -1,47 +1,39 @@
digraph "Scop Graph for 'init_array' function" { digraph "Scop Graph for 'init_array' function" {
label="Scop Graph for 'init_array' function"; label="Scop Graph for 'init_array' function";
Node0x17d4370 [shape=record,label="{entry}"]; Node0x5ae2570 [shape=record,label="{entry}"];
Node0x17d4370 -> Node0x17d9de0; Node0x5ae2570 -> Node0x5ae4e90;
Node0x17d9de0 [shape=record,label="{for.cond}"]; Node0x5ae4e90 [shape=record,label="{entry.split}"];
Node0x17d9de0 -> Node0x17d9e40; Node0x5ae4e90 -> Node0x5ae4f50;
Node0x17d9de0 -> Node0x17d9ea0; Node0x5ae4f50 [shape=record,label="{for.cond1.preheader}"];
Node0x17d9e40 [shape=record,label="{for.body}"]; Node0x5ae4f50 -> Node0x5ae50e0;
Node0x17d9e40 -> Node0x17d9f90; Node0x5ae50e0 [shape=record,label="{for.body3}"];
Node0x17d9f90 [shape=record,label="{for.cond1}"]; Node0x5ae50e0 -> Node0x5ae50e0[constraint=false];
Node0x17d9f90 -> Node0x17d9ff0; Node0x5ae50e0 -> Node0x5ae5100;
Node0x17d9f90 -> Node0x17da050; Node0x5ae5100 [shape=record,label="{for.inc17}"];
Node0x17d9ff0 [shape=record,label="{for.body3}"]; Node0x5ae5100 -> Node0x5ae4f50[constraint=false];
Node0x17d9ff0 -> Node0x17d9f00; Node0x5ae5100 -> Node0x5ae4ff0;
Node0x17d9f00 [shape=record,label="{for.inc}"]; Node0x5ae4ff0 [shape=record,label="{for.end19}"];
Node0x17d9f00 -> Node0x17d9f90[constraint=false];
Node0x17da050 [shape=record,label="{for.end}"];
Node0x17da050 -> Node0x17da200;
Node0x17da200 [shape=record,label="{for.inc17}"];
Node0x17da200 -> Node0x17d9de0[constraint=false];
Node0x17d9ea0 [shape=record,label="{for.end19}"];
colorscheme = "paired12" colorscheme = "paired12"
subgraph cluster_0x17d3a30 { subgraph cluster_0x5ad2dd0 {
label = ""; label = "";
style = solid; style = solid;
color = 1 color = 1
subgraph cluster_0x17d4ec0 { subgraph cluster_0x5ad2f50 {
label = ""; label = "Region can not profitably be optimized!";
style = filled; style = solid;
color = 3 subgraph cluster_0x17d4180 { color = 6
subgraph cluster_0x5ad30d0 {
label = ""; label = "";
style = solid; style = solid;
color = 5 color = 5
Node0x17d9f90; Node0x5ae50e0;
Node0x17d9ff0;
Node0x17d9f00;
} }
Node0x17d9de0; Node0x5ae4f50;
Node0x17d9e40; Node0x5ae5100;
Node0x17da050;
Node0x17da200;
} }
Node0x17d4370; Node0x5ae2570;
Node0x17d9ea0; Node0x5ae4e90;
Node0x5ae4ff0;
} }
} }

@ -1,65 +1,50 @@
digraph "Scop Graph for 'main' function" { digraph "Scop Graph for 'main' function" {
label="Scop Graph for 'main' function"; label="Scop Graph for 'main' function";
Node0x17d3950 [shape=record,label="{entry}"]; Node0x5abfcf0 [shape=record,label="{entry}"];
Node0x17d3950 -> Node0x17d21a0; Node0x5abfcf0 -> Node0x5ade060;
Node0x17d21a0 [shape=record,label="{for.cond}"]; Node0x5ade060 [shape=record,label="{entry.split}"];
Node0x17d21a0 -> Node0x17db9a0; Node0x5ade060 -> Node0x5ade0e0;
Node0x17d21a0 -> Node0x17da4f0; Node0x5ade0e0 [shape=record,label="{for.cond1.preheader}"];
Node0x17db9a0 [shape=record,label="{for.body}"]; Node0x5ade0e0 -> Node0x5ade100;
Node0x17db9a0 -> Node0x17da5e0; Node0x5ade100 [shape=record,label="{for.body3}"];
Node0x17da5e0 [shape=record,label="{for.cond1}"]; Node0x5ade100 -> Node0x5ae0020;
Node0x17da5e0 -> Node0x17da640; Node0x5ae0020 [shape=record,label="{for.body8}"];
Node0x17da5e0 -> Node0x17da6a0; Node0x5ae0020 -> Node0x5ae0020[constraint=false];
Node0x17da640 [shape=record,label="{for.body3}"]; Node0x5ae0020 -> Node0x5ae0080;
Node0x17da640 -> Node0x17da550; Node0x5ae0080 [shape=record,label="{for.inc25}"];
Node0x17da550 [shape=record,label="{for.cond6}"]; Node0x5ae0080 -> Node0x5ade100[constraint=false];
Node0x17da550 -> Node0x17da5b0; Node0x5ae0080 -> Node0x5adfef0;
Node0x17da550 -> Node0x17da850; Node0x5adfef0 [shape=record,label="{for.inc28}"];
Node0x17da5b0 [shape=record,label="{for.body8}"]; Node0x5adfef0 -> Node0x5ade0e0[constraint=false];
Node0x17da5b0 -> Node0x17da8b0; Node0x5adfef0 -> Node0x5adff50;
Node0x17da8b0 [shape=record,label="{for.inc}"]; Node0x5adff50 [shape=record,label="{for.end30}"];
Node0x17da8b0 -> Node0x17da550[constraint=false];
Node0x17da850 [shape=record,label="{for.end}"];
Node0x17da850 -> Node0x17db930;
Node0x17db930 [shape=record,label="{for.inc25}"];
Node0x17db930 -> Node0x17da5e0[constraint=false];
Node0x17da6a0 [shape=record,label="{for.end27}"];
Node0x17da6a0 -> Node0x17dada0;
Node0x17dada0 [shape=record,label="{for.inc28}"];
Node0x17dada0 -> Node0x17d21a0[constraint=false];
Node0x17da4f0 [shape=record,label="{for.end30}"];
colorscheme = "paired12" colorscheme = "paired12"
subgraph cluster_0x17d3f30 { subgraph cluster_0x5ad2c80 {
label = ""; label = "";
style = solid; style = solid;
color = 1 color = 1
subgraph cluster_0x17d38d0 { subgraph cluster_0x5ad2e50 {
label = ""; label = "";
style = filled; style = filled;
color = 3 subgraph cluster_0x17d3850 { color = 3 subgraph cluster_0x5ad2d00 {
label = ""; label = "";
style = solid; style = solid;
color = 5 color = 5
subgraph cluster_0x17d37d0 { subgraph cluster_0x5ad2dd0 {
label = ""; label = "";
style = solid; style = solid;
color = 7 color = 7
Node0x17da550; Node0x5ae0020;
Node0x17da5b0;
Node0x17da8b0;
} }
Node0x17da5e0; Node0x5ade100;
Node0x17da640; Node0x5ae0080;
Node0x17da850;
Node0x17db930;
} }
Node0x17d21a0; Node0x5ade0e0;
Node0x17db9a0; Node0x5adfef0;
Node0x17da6a0;
Node0x17dada0;
} }
Node0x17d3950; Node0x5abfcf0;
Node0x17da4f0; Node0x5ade060;
Node0x5adff50;
} }
} }

@ -1,60 +1,51 @@
digraph "Scop Graph for 'print_array' function" { digraph "Scop Graph for 'print_array' function" {
label="Scop Graph for 'print_array' function"; label="Scop Graph for 'print_array' function";
Node0x17d2200 [shape=record,label="{entry}"]; Node0x5ae5e30 [shape=record,label="{entry}"];
Node0x17d2200 -> Node0x17d4f20; Node0x5ae5e30 -> Node0x5ae5f50;
Node0x17d4f20 [shape=record,label="{for.cond}"]; Node0x5ae5f50 [shape=record,label="{entry.split}"];
Node0x17d4f20 -> Node0x17d9fd0; Node0x5ae5f50 -> Node0x5ae7d90;
Node0x17d4f20 -> Node0x17da030; Node0x5ae7d90 [shape=record,label="{for.cond1.preheader}"];
Node0x17d9fd0 [shape=record,label="{for.body}"]; Node0x5ae7d90 -> Node0x5ae7f20;
Node0x17d9fd0 -> Node0x17da120; Node0x5ae7f20 [shape=record,label="{for.body3}"];
Node0x17da120 [shape=record,label="{for.cond1}"]; Node0x5ae7f20 -> Node0x5ae7f40;
Node0x17da120 -> Node0x17da180; Node0x5ae7f20 -> Node0x5ae7f60;
Node0x17da120 -> Node0x17da1e0; Node0x5ae7f40 [shape=record,label="{if.then}"];
Node0x17da180 [shape=record,label="{for.body3}"]; Node0x5ae7f40 -> Node0x5ae7f60;
Node0x17da180 -> Node0x17da090; Node0x5ae7f60 [shape=record,label="{for.inc}"];
Node0x17da180 -> Node0x17da0f0; Node0x5ae7f60 -> Node0x5ae7f20[constraint=false];
Node0x17da090 [shape=record,label="{if.then}"]; Node0x5ae7f60 -> Node0x5ae7e30;
Node0x17da090 -> Node0x17da0f0; Node0x5ae7e30 [shape=record,label="{for.end}"];
Node0x17da0f0 [shape=record,label="{if.end}"]; Node0x5ae7e30 -> Node0x5ae7d90[constraint=false];
Node0x17da0f0 -> Node0x17da390; Node0x5ae7e30 -> Node0x5ae8110;
Node0x17da390 [shape=record,label="{for.inc}"]; Node0x5ae8110 [shape=record,label="{for.end12}"];
Node0x17da390 -> Node0x17da120[constraint=false];
Node0x17da1e0 [shape=record,label="{for.end}"];
Node0x17da1e0 -> Node0x17d9e40;
Node0x17d9e40 [shape=record,label="{for.inc10}"];
Node0x17d9e40 -> Node0x17d4f20[constraint=false];
Node0x17da030 [shape=record,label="{for.end12}"];
colorscheme = "paired12" colorscheme = "paired12"
subgraph cluster_0x17d38f0 { subgraph cluster_0x5abb9a0 {
label = ""; label = "";
style = solid; style = solid;
color = 1 color = 1
subgraph cluster_0x17d4030 { subgraph cluster_0x5ae32c0 {
label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
style = solid; style = solid;
color = 6 color = 6
subgraph cluster_0x17d3fb0 { subgraph cluster_0x5ae3240 {
label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; label = "Call instruction: %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
style = solid; style = solid;
color = 5 color = 5
subgraph cluster_0x17d3f30 { subgraph cluster_0x5abba20 {
label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79"; label = "Region can not profitably be optimized!";
style = solid; style = solid;
color = 7 color = 7
Node0x17da180; Node0x5ae7f20;
Node0x17da090; Node0x5ae7f40;
} }
Node0x17da120; Node0x5ae7f60;
Node0x17da0f0;
Node0x17da390;
} }
Node0x17d4f20; Node0x5ae7d90;
Node0x17d9fd0; Node0x5ae7e30;
Node0x17da1e0;
Node0x17d9e40;
} }
Node0x17d2200; Node0x5ae5e30;
Node0x17da030; Node0x5ae5f50;
Node0x5ae8110;
} }
} }