mirror of
https://github.com/capstone-engine/llvm-capstone.git
synced 2024-12-13 19:24:21 +00:00
95935c5de1
As the namings of the scops have changed, polly was not able to read in the user given .jscop files. By renaming the provided files, polly now finds them again and can use them to optimize the matmul function. We also update the generated files to reflect the very latest version of Polly. llvm-svn: 182265
455 lines
17 KiB
HTML
455 lines
17 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
|
"http://www.w3.org/TR/html4/strict.dtd">
|
|
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
|
|
<html>
|
|
<head>
|
|
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
|
<title>Polly - Examples</title>
|
|
<link type="text/css" rel="stylesheet" href="menu.css">
|
|
<link type="text/css" rel="stylesheet" href="content.css">
|
|
</head>
|
|
<body>
|
|
<!--#include virtual="menu.html.incl"-->
|
|
<div id="content">
|
|
<!--=====================================================================-->
|
|
<h1>Polly: Execute the individual Polly passes manually</h1>
|
|
<!--=====================================================================-->
|
|
|
|
<p>
|
|
This example presents the individual passes that are involved when optimizing
|
|
code with Polly. We show how to execute them individually and explain for each
|
|
which analysis is performed or what transformation is applied. In this example
|
|
the polyhedral transformation is user-provided to show how much performance
|
|
improvement can be expected by an optimal automatic optimizer.</p>
|
|
|
|
The files used and created in this example are available in the Polly checkout
|
|
in the folder <em>www/experiments/matmul</em>. They can be created automatically
|
|
by running the <em>www/experiments/matmul/runall.sh</em> script.
|
|
|
|
<ol>
|
|
<li><h4>Create LLVM-IR from the C code</h4>
|
|
|
|
Polly works on LLVM-IR. Hence it is necessary to translate the source files into
|
|
LLVM-IR. If more than on file should be optimized the files can be combined into
|
|
a single file with llvm-link.
|
|
|
|
<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre>
|
|
</li>
|
|
|
|
|
|
<li><h4>Load Polly automatically when calling the 'opt' tool</h4>
|
|
|
|
Polly is not built into opt or bugpoint, but it is a shared library that needs
|
|
to be loaded into these tools explicitally. The Polly library is called
|
|
LVMPolly.so. For a cmake build it is available in the build/lib/ directory,
|
|
autoconf creates the same file in
|
|
build/tools/polly/{Release+Asserts|Asserts|Debug}/lib. For convenience we create
|
|
an alias that automatically loads Polly if 'opt' is called.
|
|
<pre class="code">
|
|
export PATH_TO_POLLY_LIB="~/polly/build/lib/"
|
|
alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre>
|
|
</li>
|
|
|
|
<li><h4>Prepare the LLVM-IR for Polly</h4>
|
|
|
|
Polly is only able to work with code that matches a canonical form. To translate
|
|
the LLVM-IR into this form we use a set of canonicalication passes. For this
|
|
example only three passes are necessary. To get good coverage on more
|
|
complicated input files often more canonicalization passes are needed. pollycc
|
|
contains a list of passes that have shown to be beneficial.
|
|
<pre class="code">opt -S -mem2reg -loop-simplify -polly-indvars matmul.s > matmul.preopt.ll</pre></li>
|
|
|
|
<li><h4>Show the SCoPs detected by Polly (optional)</h4>
|
|
|
|
To understand if Polly was able to detect SCoPs, we print the
|
|
structure of the detected SCoPs. In our example two SCoPs were detected. One in
|
|
'init_array' the other in 'main'.
|
|
|
|
<pre class="code">opt -basicaa -polly-cloog -analyze -q matmul.preopt.ll</pre>
|
|
|
|
<pre>
|
|
init_array():
|
|
for (c2=0;c2<=1023;c2++) {
|
|
for (c4=0;c4<=1023;c4++) {
|
|
Stmt_5(c2,c4);
|
|
}
|
|
}
|
|
|
|
main():
|
|
for (c2=0;c2<=1023;c2++) {
|
|
for (c4=0;c4<=1023;c4++) {
|
|
Stmt_4(c2,c4);
|
|
for (c6=0;c6<=1023;c6++) {
|
|
Stmt_6(c2,c4,c6);
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</li>
|
|
<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4>
|
|
|
|
Polly can use graphviz to graphically show a CFG in which the detected SCoPs are
|
|
highlighted. It can also create '.dot' files that can be translated by
|
|
the 'dot' utility into various graphic formats.
|
|
|
|
<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll
|
|
opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre>
|
|
The output for the different functions<br />
|
|
view-scops:
|
|
<a href="experiments/matmul/scops.main.dot.png">main</a>,
|
|
<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>,
|
|
<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br />
|
|
view-scops-only:
|
|
<a href="experiments/matmul/scopsonly.main.dot.png">main</a>,
|
|
<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>,
|
|
<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a>
|
|
</li>
|
|
|
|
<li><h4>View the polyhedral representation of the SCoPs</h4>
|
|
<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre>
|
|
<pre>
|
|
[...]
|
|
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
|
|
'for.cond => for.end19' in function 'init_array':
|
|
Context:
|
|
{ [] }
|
|
Statements {
|
|
Stmt_5
|
|
Domain :=
|
|
{ Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 };
|
|
Scattering :=
|
|
{ Stmt_5[i0, i1] -> scattering[0, i0, 0, i1, 0] };
|
|
WriteAccess :=
|
|
{ Stmt_5[i0, i1] -> MemRef_A[1037i0 + i1] };
|
|
WriteAccess :=
|
|
{ Stmt_5[i0, i1] -> MemRef_B[1047i0 + i1] };
|
|
FinalRead
|
|
Domain :=
|
|
{ FinalRead[0] };
|
|
Scattering :=
|
|
{ FinalRead[i0] -> scattering[200000000, o1, o2, o3, o4] };
|
|
ReadAccess :=
|
|
{ FinalRead[i0] -> MemRef_A[o0] };
|
|
ReadAccess :=
|
|
{ FinalRead[i0] -> MemRef_B[o0] };
|
|
}
|
|
[...]
|
|
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
|
|
'for.cond => for.end30' in function 'main':
|
|
Context:
|
|
{ [] }
|
|
Statements {
|
|
Stmt_4
|
|
Domain :=
|
|
{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 };
|
|
Scattering :=
|
|
{ Stmt_4[i0, i1] -> scattering[0, i0, 0, i1, 0, 0, 0] };
|
|
WriteAccess :=
|
|
{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] };
|
|
Stmt_6
|
|
Domain :=
|
|
{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 };
|
|
Scattering :=
|
|
{ Stmt_6[i0, i1, i2] -> scattering[0, i0, 0, i1, 1, i2, 0] };
|
|
ReadAccess :=
|
|
{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
|
|
ReadAccess :=
|
|
{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] };
|
|
ReadAccess :=
|
|
{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] };
|
|
WriteAccess :=
|
|
{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
|
|
FinalRead
|
|
Domain :=
|
|
{ FinalRead[0] };
|
|
Scattering :=
|
|
{ FinalRead[i0] -> scattering[200000000, o1, o2, o3, o4, o5, o6] };
|
|
ReadAccess :=
|
|
{ FinalRead[i0] -> MemRef_C[o0] };
|
|
ReadAccess :=
|
|
{ FinalRead[i0] -> MemRef_A[o0] };
|
|
ReadAccess :=
|
|
{ FinalRead[i0] -> MemRef_B[o0] };
|
|
}
|
|
[...]
|
|
</pre>
|
|
</li>
|
|
|
|
<li><h4>Show the dependences for the SCoPs</h4>
|
|
<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre>
|
|
<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region:
|
|
'for.cond => for.end19' in function 'init_array':
|
|
Must dependences:
|
|
{ }
|
|
May dependences:
|
|
{ }
|
|
Must no source:
|
|
{ }
|
|
May no source:
|
|
{ }
|
|
Printing analysis 'Polly - Calculate dependences for SCoP' for region:
|
|
'for.cond => for.end30' in function 'main':
|
|
Must dependences:
|
|
{ Stmt_4[i0, i1] -> Stmt_6[i0, i1, 0] :
|
|
i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023;
|
|
Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] :
|
|
i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1022;
|
|
Stmt_6[i0, i1, 1023] -> FinalRead[0] :
|
|
i1 <= 1091540 - 1067i0 and i1 >= -1067i0 and i1 >= 0 and i1 <= 1023;
|
|
Stmt_6[1023, i1, 1023] -> FinalRead[0] :
|
|
i1 >= 0 and i1 <= 1023
|
|
}
|
|
May dependences:
|
|
{ }
|
|
Must no source:
|
|
{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] :
|
|
i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;
|
|
Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] :
|
|
i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;
|
|
FinalRead[0] -> MemRef_A[o0];
|
|
FinalRead[0] -> MemRef_B[o0]
|
|
FinalRead[0] -> MemRef_C[o0] :
|
|
o0 >= 1092565 or (exists (e0 = [(o0)/1067]: o0 <= 1091540 and o0 >= 0
|
|
and 1067e0 <= -1024 + o0 and 1067e0 >= -1066 + o0)) or o0 <= -1;
|
|
}
|
|
May no source:
|
|
{ }
|
|
</pre></li>
|
|
|
|
<li><h4>Export jscop files</h4>
|
|
|
|
Polly can export the polyhedral representation in so called jscop files. Jscop
|
|
files contain the polyhedral representation stored in a JSON file.
|
|
<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre>
|
|
<pre>Writing SCoP 'for.cond => for.end19' in function 'init_array' to './init_array___%for.cond---%for.end19.jscop'.
|
|
Writing SCoP 'for.cond => for.end30' in function 'main' to './main___%for.cond---%for.end30.jscop'.
|
|
</pre></li>
|
|
|
|
<li><h4>Import the changed jscop files and print the updated SCoP structure
|
|
(optional)</h4>
|
|
<p>Polly can reimport jscop files, in which the schedules of the statements are
|
|
changed. These changed schedules are used to descripe transformations.
|
|
It is possible to import different jscop files by providing the postfix
|
|
of the jscop file that is imported.</p>
|
|
<p> We apply three different transformations on the SCoP in the main function.
|
|
The jscop files describing these transformations are hand written (and available
|
|
in <em>www/experiments/matmul</em>).
|
|
|
|
<h5>No Polly</h5>
|
|
|
|
<p>As a baseline we do not call any Polly code generation, but only apply the
|
|
normal -O3 optimizations.</p>
|
|
|
|
<pre class="code">
|
|
opt matmul.preopt.ll -basicaa \
|
|
-polly-import-jscop \
|
|
-polly-cloog -analyze
|
|
</pre>
|
|
<pre>
|
|
[...]
|
|
main():
|
|
for (c2=0;c2<g;=1535;c2++) {
|
|
for (c4=0;c4<g;=1535;c4++) {
|
|
Stmt_4(c2,c4);
|
|
for (c6=0;c6<g;=1535;c6++) {
|
|
Stmt_6(c2,c4,c6);
|
|
}
|
|
}
|
|
}
|
|
[...]
|
|
</pre>
|
|
<h5>Interchange (and Fission to allow the interchange)</h5>
|
|
<p>We split the loops and can now apply an interchange of the loop dimensions that
|
|
enumerate Stmt_6.</p>
|
|
<pre class="code">
|
|
opt matmul.preopt.ll -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged \
|
|
-polly-cloog -analyze
|
|
</pre>
|
|
<pre>
|
|
[...]
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
|
|
[...]
|
|
main():
|
|
for (c2=0;c2<=1535;c2++) {
|
|
for (c4=0;c4<=1535;c4++) {
|
|
Stmt_4(c2,c4);
|
|
}
|
|
}
|
|
for (c2=0;c2<=1535;c2++) {
|
|
for (c4=0;c4<=1535;c4++) {
|
|
for (c6=0;c6<=1535;c6++) {
|
|
Stmt_6(c2,c6,c4);
|
|
}
|
|
}
|
|
}
|
|
[...]
|
|
</pre>
|
|
<h5>Interchange + Tiling</h5>
|
|
<p>In addition to the interchange we tile now the second loop nest.</p>
|
|
|
|
<pre class="code">
|
|
opt matmul.preopt.ll -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
|
|
-polly-cloog -analyze
|
|
</pre>
|
|
<pre>
|
|
[...]
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
|
|
[...]
|
|
main():
|
|
for (c2=0;c2<=1535;c2++) {
|
|
for (c4=0;c4<=1535;c4++) {
|
|
Stmt_4(c2,c4);
|
|
}
|
|
}
|
|
for (c2=0;c2<=1535;c2+=64) {
|
|
for (c3=0;c3<=1535;c3+=64) {
|
|
for (c4=0;c4<=1535;c4+=64) {
|
|
for (c5=c2;c5<=c2+63;c5++) {
|
|
for (c6=c4;c6<=c4+63;c6++) {
|
|
for (c7=c3;c7<=c3+63;c7++) {
|
|
Stmt_6(c5,c7,c6);
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
[...]
|
|
</pre>
|
|
<h5>Interchange + Tiling + Strip-mining to prepare vectorization</h5>
|
|
To later allow vectorization we create a so called trivially parallelizable
|
|
loop. It is innermost, parallel and has only four iterations. It can be
|
|
replaced by 4-element SIMD instructions.
|
|
<pre class="code">
|
|
opt matmul.preopt.ll -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
|
|
-polly-cloog -analyze </pre>
|
|
|
|
<pre>
|
|
[...]
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
|
|
[...]
|
|
main():
|
|
for (c2=0;c2<=1535;c2++) {
|
|
for (c4=0;c4<=1535;c4++) {
|
|
Stmt_4(c2,c4);
|
|
}
|
|
}
|
|
for (c2=0;c2<=1535;c2+=64) {
|
|
for (c3=0;c3<=1535;c3+=64) {
|
|
for (c4=0;c4<=1535;c4+=64) {
|
|
for (c5=c2;c5<=c2+63;c5++) {
|
|
for (c6=c4;c6<=c4+63;c6++) {
|
|
for (c7=c3;c7<=c3+63;c7+=4) {
|
|
for (c8=c7;c8<=c7+3;c8++) {
|
|
Stmt_6(c5,c8,c6);
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
[...]
|
|
</pre>
|
|
|
|
</li>
|
|
|
|
<li><h4>Codegenerate the SCoPs</h4>
|
|
<p>
|
|
This generates new code for the SCoPs detected by polly.
|
|
If -polly-import is present, transformations specified in the imported openscop
|
|
files will be applied.</p>
|
|
<pre class="code">opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll</pre>
|
|
<pre class="code">
|
|
opt -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged \
|
|
-polly-codegen matmul.preopt.ll \
|
|
| opt -O3 > matmul.polly.interchanged.ll</pre>
|
|
<pre>
|
|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
|
|
'./init_array___%for.cond---%for.end19.jscop.interchanged'.
|
|
File could not be read: No such file or directory
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from
|
|
'./main___%for.cond---%for.end30.jscop.interchanged'.
|
|
</pre>
|
|
<pre class="code">
|
|
opt -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
|
|
-polly-codegen matmul.preopt.ll \
|
|
| opt -O3 > matmul.polly.interchanged+tiled.ll</pre>
|
|
<pre>
|
|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
|
|
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled'.
|
|
File could not be read: No such file or directory
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from
|
|
'./main___%for.cond---%for.end30.jscop.interchanged+tiled'.
|
|
</pre>
|
|
<pre class="code">
|
|
opt -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
|
|
-polly-codegen -polly-vectorizer=polly matmul.preopt.ll \
|
|
| opt -O3 > matmul.polly.interchanged+tiled+vector.ll</pre>
|
|
<pre>
|
|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
|
|
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
|
|
File could not be read: No such file or directory
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from
|
|
'./main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
|
|
</pre>
|
|
<pre class="code">
|
|
opt -basicaa \
|
|
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
|
|
-polly-codegen -polly-vectorizer=polly -enable-polly-openmp matmul.preopt.ll \
|
|
| opt -O3 > matmul.polly.interchanged+tiled+openmp.ll</pre>
|
|
<pre>
|
|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
|
|
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
|
|
File could not be read: No such file or directory
|
|
Reading JScop 'for.cond => for.end30' in function 'main' from
|
|
'./main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
|
|
</pre>
|
|
|
|
<li><h4>Create the executables</h4>
|
|
|
|
Create one executable optimized with plain -O3 as well as a set of executables
|
|
optimized in different ways with Polly. One changes only the loop structure, the
|
|
other adds tiling, the next adds vectorization and finally we use OpenMP
|
|
parallelism.
|
|
<pre class="code">
|
|
llc matmul.normalopt.ll -o matmul.normalopt.s && \
|
|
gcc matmul.normalopt.s -o matmul.normalopt.exe
|
|
llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && \
|
|
gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
|
|
llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && \
|
|
gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
|
|
llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && \
|
|
gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
|
|
llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && \
|
|
gcc -lgomp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe </pre>
|
|
|
|
<li><h4>Compare the runtime of the executables</h4>
|
|
|
|
By comparing the runtimes of the different code snippets we see that a simple
|
|
loop interchange gives here the largest performance boost. However by adding
|
|
vectorization and by using OpenMP we can further improve the performance
|
|
significantly.
|
|
<pre class="code">time ./matmul.normalopt.exe</pre>
|
|
<pre>42.68 real, 42.55 user, 0.00 sys</pre>
|
|
<pre class="code">time ./matmul.polly.interchanged.exe</pre>
|
|
<pre>04.33 real, 4.30 user, 0.01 sys</pre>
|
|
<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre>
|
|
<pre>04.11 real, 4.10 user, 0.00 sys</pre>
|
|
<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre>
|
|
<pre>01.39 real, 1.36 user, 0.01 sys</pre>
|
|
<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre>
|
|
<pre>00.66 real, 2.58 user, 0.02 sys</pre>
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
</body>
|
|
</html>
|