mirror of
https://github.com/capstone-engine/llvm-capstone.git
synced 2025-01-24 01:58:21 +00:00
[www] Remove outdated documentation
Remove examples 'load_Polly_into_clang' and 'manual_matmul'. This information is now available in our SPHINX docs (*). (*) Thanks to Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com> who contributed the SPHINX docs update! llvm-svn: 305186
This commit is contained in:
parent
bccaea57c0
commit
2531a5d827
@ -1,143 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Polly - Load Polly into clang</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
</head>
<body>
<div id="box">
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--=====================================================================-->
<h1>Load Polly into clang and automatically run it at -O3</h1>
<!--=====================================================================-->

<p><b>Warning:</b> Even though this example makes it very easy to use Polly,
you should be aware that Polly is a young research project. It may crash,
produce invalid code, or hang in complex calculations, even for simple
examples. If you see such a problem, please check the <a
href="bugs.html">Bug database</a> and consider reporting a bug.</p>
<p>
<b>Warning II:</b> clang/LLVM/Polly need to be in sync. This means
you need to compile them yourself from a recent svn/git checkout.</p>
<h2>Load Polly into clang</h2>

By default Polly is configured as a shared library plugin that is loaded into
tools like clang, opt, and bugpoint when they start their execution.

Once Polly is loaded into clang (or opt), the Polly options become available
automatically. You can load Polly either by adding the relevant flags to
the CPPFLAGS or by creating an alias.

<pre class="code">
$ export CPPFLAGS="-Xclang -load -Xclang ${POLLY_BUILD_DIR}/lib/LLVMPolly.so"
</pre>

or
<pre class="code">
$ alias pollycc="clang -Xclang -load -Xclang ${POLLY_BUILD_DIR}/lib/LLVMPolly.so"
</pre>
To avoid having to load Polly into the tools, Polly can optionally be configured
with cmake to be statically linked into the tools:

<pre class="code">
$ cmake -D LINK_POLLY_INTO_TOOLS:Bool=ON
</pre>

<h2>Optimizing with Polly</h2>

Optimizing with Polly is as easy as adding <b>-O3 -mllvm -polly</b> to your
compiler flags (Polly is only available at -O3).

<pre class="code">pollycc -O3 -mllvm -polly file.c</pre>

<h2>Automatic OpenMP code generation</h2>

To automatically detect parallel loops and generate OpenMP code for them, you
also need to add <b>-mllvm -polly-parallel -lgomp</b> to your CFLAGS.

<pre class="code">pollycc -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c</pre>

<h2>Automatic Vector code generation</h2>

Automatic vector code generation can be enabled by adding <b>-mllvm
-polly-vectorizer=stripmine</b> to your CFLAGS.

<pre class="code">pollycc -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c</pre>
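<p>The flag combinations above can be assembled programmatically, for example
from a build script. The following is a minimal sketch, not part of Polly: the
function name <em>polly_clang_args</em> and the library path are hypothetical,
and only the flags quoted in this page are used.</p>

```python
def polly_clang_args(polly_lib, source, parallel=False, vectorize=False):
    """Build a clang argument list that loads Polly and runs it at -O3.

    polly_lib is the path to LLVMPolly.so (an assumption of this sketch;
    adjust it to your own build directory).
    """
    # Load the Polly plugin into clang.
    args = ["clang", "-Xclang", "-load", "-Xclang", polly_lib]
    # Polly is only enabled together with -O3.
    args += ["-O3", "-mllvm", "-polly"]
    if parallel:
        # OpenMP code generation also needs the GNU OpenMP runtime.
        args += ["-mllvm", "-polly-parallel", "-lgomp"]
    if vectorize:
        args += ["-mllvm", "-polly-vectorizer=stripmine"]
    args.append(source)
    return args


if __name__ == "__main__":
    # Hypothetical library location, for illustration only.
    print(" ".join(polly_clang_args("/path/to/lib/LLVMPolly.so", "file.c",
                                    parallel=True)))
```

<p>The returned list can be passed to <em>subprocess.run</em> as is; keeping
the arguments as a list avoids shell quoting problems.</p>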
<h2>Extract a preoptimized LLVM-IR file</h2>

It is often useful to derive from a C file the LLVM-IR code that is actually
optimized by Polly. Normally the LLVM-IR is generated automatically from
the C code by first lowering C to LLVM-IR (clang) and then applying a
set of preparing transformations to the LLVM-IR. To get the LLVM-IR after the
preparing transformations have been applied, run Polly with '-O0'.

<pre class="code">pollycc -O0 -mllvm -polly -S -emit-llvm file.c</pre>

<h2>Further options</h2>

Polly supports further options that are mainly useful for the development or
the analysis of Polly. The relevant options can be added to clang by appending
<b>-mllvm -option-name</b> to the CFLAGS or the clang
command line.

<h3>Limit Polly to a single function</h3>
To limit the execution of Polly to a single function, use the option
<b>-polly-only-func=functionname</b>.

<h3>Disable LLVM-IR generation</h3>
Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparing transformations and disable Polly code
generation, add the option <b>-polly-no-codegen</b>.

<h3>Graphical view of the SCoPs</h3>

Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are <b>-polly-show</b>, <b>-polly-show-only</b>, <b>-polly-dot</b> and
<b>-polly-dot-only</b>. The 'show' options automatically run dotty or another
graphviz viewer to show the SCoPs graphically. The 'dot' options store, for each
function, a dot file that highlights the detected SCoPs. If 'only' is appended to
the option, the basic blocks are shown without the statements they
contain.
<h3>Change/Disable the Optimizer</h3>
By default Polly uses the isl scheduling optimizer. The isl optimizer optimizes
for data locality and parallelism using the <a
href="http://pluto-compiler.sf.net">Pluto</a> algorithm. For research it is also
possible to run <a
href="http://www-rocq.inria.fr/~pouchet/software/pocc/">PoCC</a> as an external
optimizer. PoCC provides access to the original Pluto implementation. To use
PoCC add <b>-polly-optimizer=pocc</b> to the command line (only available if
Polly was compiled with scoplib support) [removed after <a href="http://llvm.org/releases/download.html#3.4.2">LLVM 3.4.2</a>].
To disable the optimizer entirely use the option <b>-polly-optimizer=none</b>.

<h3>Disable tiling in the optimizer</h3>
By default both optimizers perform tiling, if possible. If this is not
wanted, the option <b>-polly-tiling=false</b> can be used to disable it. (This
option disables tiling for both optimizers.)

<h3>Ignore possible aliasing</h3>
By default we only detect SCoPs if we can prove that the different array bases
cannot alias. This is the correct thing to do if we optimize automatically. However,
without special user annotations like 'restrict' we often cannot prove that
no aliasing is possible. If the user knows that no aliasing can happen in the
code, the option <b>-polly-ignore-aliasing</b> can be used to disable the check for
possible aliasing.

<h3>Import / Export</h3>
The flags <b>-polly-import</b> and <b>-polly-export</b> allow the export and
reimport of the polyhedral representation. By exporting, modifying and
reimporting the polyhedral representation, externally calculated transformations
can be applied. This enables external optimizers or the manual optimization of
specific SCoPs.
</div>
</div>
</body>
</html>
@ -1,452 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Polly - Examples</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
</head>
<body>
<div id="box">
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--=====================================================================-->
<h1>Execute the individual Polly passes manually</h1>
<!--=====================================================================-->

<p>
This example presents the individual passes that are involved when optimizing
code with Polly. We show how to execute them individually and explain for each
which analysis is performed or what transformation is applied. In this example
the polyhedral transformation is user-provided to show how much performance
improvement can be expected from an optimal automatic optimizer.</p>

The files used and created in this example are available in the Polly checkout
in the folder <em>www/experiments/matmul</em>. They can be created automatically
by running the <em>www/experiments/matmul/runall.sh</em> script.

<ol>
<li><h4>Create LLVM-IR from the C code</h4>

Polly works on LLVM-IR. Hence it is necessary to translate the source files into
LLVM-IR. If more than one file should be optimized, the files can be combined into
a single file with llvm-link.

<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre>
</li>


<li><h4>Load Polly automatically when calling the 'opt' tool</h4>

Polly is not built into opt or bugpoint; it is a shared library that needs
to be loaded into these tools explicitly. The Polly library is called
LLVMPolly.so. It is available in the build/lib/ directory. For convenience we create
an alias that automatically loads Polly whenever 'opt' is called.
<pre class="code">
export PATH_TO_POLLY_LIB="$HOME/polly/build/lib"
alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre>
</li>

<li><h4>Prepare the LLVM-IR for Polly</h4>

Polly is only able to work with code that matches a canonical form. To translate
the LLVM-IR into this form we use a set of canonicalization passes. They are
scheduled by using '-polly-canonicalize'.
<pre class="code">opt -S -polly-canonicalize matmul.s > matmul.preopt.ll</pre></li>
<li><h4>Show the SCoPs detected by Polly (optional)</h4>

To check whether Polly was able to detect SCoPs, we print the
structure of the detected SCoPs. In our example two SCoPs were detected: one in
'init_array', the other in 'main'.

<pre class="code">opt -basicaa -polly-ast -analyze -q matmul.preopt.ll</pre>

<pre>
init_array():
for (c2=0;c2<=1023;c2++) {
  for (c4=0;c4<=1023;c4++) {
    Stmt_5(c2,c4);
  }
}

main():
for (c2=0;c2<=1023;c2++) {
  for (c4=0;c4<=1023;c4++) {
    Stmt_4(c2,c4);
    for (c6=0;c6<=1023;c6++) {
      Stmt_6(c2,c4,c6);
    }
  }
}
</pre>
</li>
<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4>

Polly can use graphviz to graphically show a CFG in which the detected SCoPs are
highlighted. It can also create '.dot' files that can be translated by
the 'dot' utility into various graphic formats.

<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll
opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre>
The output for the different functions<br />
view-scops:
<a href="experiments/matmul/scops.main.dot.png">main</a>,
<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>,
<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br />
view-scops-only:
<a href="experiments/matmul/scopsonly.main.dot.png">main</a>,
<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>,
<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a>
</li>
<li><h4>View the polyhedral representation of the SCoPs</h4>
<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre>
<pre>
[...]
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
'for.cond => for.end19' in function 'init_array':
  Context:
  { [] }
  Statements {
    Stmt_5
      Domain :=
        { Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 };
      Schedule :=
        { Stmt_5[i0, i1] -> schedule[0, i0, 0, i1, 0] };
      WriteAccess :=
        { Stmt_5[i0, i1] -> MemRef_A[1037i0 + i1] };
      WriteAccess :=
        { Stmt_5[i0, i1] -> MemRef_B[1047i0 + i1] };
    FinalRead
      Domain :=
        { FinalRead[0] };
      Schedule :=
        { FinalRead[i0] -> schedule[200000000, o1, o2, o3, o4] };
      ReadAccess :=
        { FinalRead[i0] -> MemRef_A[o0] };
      ReadAccess :=
        { FinalRead[i0] -> MemRef_B[o0] };
  }
[...]
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
'for.cond => for.end30' in function 'main':
  Context:
  { [] }
  Statements {
    Stmt_4
      Domain :=
        { Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 };
      Schedule :=
        { Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] };
      WriteAccess :=
        { Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] };
    Stmt_6
      Domain :=
        { Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 };
      Schedule :=
        { Stmt_6[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] };
      ReadAccess :=
        { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
      ReadAccess :=
        { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] };
      ReadAccess :=
        { Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] };
      WriteAccess :=
        { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
    FinalRead
      Domain :=
        { FinalRead[0] };
      Schedule :=
        { FinalRead[i0] -> schedule[200000000, o1, o2, o3, o4, o5, o6] };
      ReadAccess :=
        { FinalRead[i0] -> MemRef_C[o0] };
      ReadAccess :=
        { FinalRead[i0] -> MemRef_A[o0] };
      ReadAccess :=
        { FinalRead[i0] -> MemRef_B[o0] };
  }
[...]
</pre>
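<p>One way to read the <em>Schedule</em> entries above is as multi-dimensional
time stamps: statement instances execute in the lexicographic order of their
schedule tuples. The sketch below illustrates this for the two statements in
'main'. It is an illustration only, not Polly code; the helper names are
hypothetical and the loop bound is shrunk to a tiny value so the result is easy
to inspect.</p>

```python
def schedule_stmt_4(i0, i1):
    # Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0]
    return (0, i0, 0, i1, 0, 0, 0)


def schedule_stmt_6(i0, i1, i2):
    # Stmt_6[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0]
    return (0, i0, 0, i1, 1, i2, 0)


def execution_order(n):
    """Return all statement instances for bound n, sorted by schedule."""
    instances = []
    # Collect every Stmt_4 instance first, then every Stmt_6 instance,
    # so that any interleaving in the result comes from the schedule alone.
    for i0 in range(n):
        for i1 in range(n):
            instances.append((schedule_stmt_4(i0, i1), ("Stmt_4", i0, i1)))
    for i0 in range(n):
        for i1 in range(n):
            for i2 in range(n):
                instances.append(
                    (schedule_stmt_6(i0, i1, i2), ("Stmt_6", i0, i1, i2)))
    # Python compares tuples lexicographically, like the schedule order.
    instances.sort(key=lambda pair: pair[0])
    return [name for _, name in instances]


if __name__ == "__main__":
    for instance in execution_order(2)[:4]:
        print(instance)
```

<p>The constant 0 or 1 in the fifth schedule dimension is what places
Stmt_4(i0, i1) before all Stmt_6(i0, i1, *) instances of the same outer
iteration, matching the loop nest printed by -polly-ast.</p>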
</li>

<li><h4>Show the dependences for the SCoPs</h4>
<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre>
<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region:
'for.cond => for.end19' in function 'init_array':
  Must dependences:
    { }
  May dependences:
    { }
  Must no source:
    { }
  May no source:
    { }
Printing analysis 'Polly - Calculate dependences for SCoP' for region:
'for.cond => for.end30' in function 'main':
  Must dependences:
    { Stmt_4[i0, i1] -> Stmt_6[i0, i1, 0] :
        i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023;
      Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] :
        i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1022;
      Stmt_6[i0, i1, 1023] -> FinalRead[0] :
        i1 <= 1091540 - 1067i0 and i1 >= -1067i0 and i1 >= 0 and i1 <= 1023;
      Stmt_6[1023, i1, 1023] -> FinalRead[0] :
        i1 >= 0 and i1 <= 1023
    }
  May dependences:
    { }
  Must no source:
    { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] :
        i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;
      Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] :
        i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;
      FinalRead[0] -> MemRef_A[o0];
      FinalRead[0] -> MemRef_B[o0]
      FinalRead[0] -> MemRef_C[o0] :
        o0 >= 1092565 or (exists (e0 = [(o0)/1067]: o0 <= 1091540 and o0 >= 0
        and 1067e0 <= -1024 + o0 and 1067e0 >= -1066 + o0)) or o0 <= -1;
    }
  May no source:
    { }
</pre></li>
<li><h4>Export jscop files</h4>

Polly can export the polyhedral representation in so-called jscop files. Jscop
files contain the polyhedral representation stored in a JSON file.
<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre>
<pre>Writing SCoP 'for.cond => for.end19' in function 'init_array' to './init_array___%for.cond---%for.end19.jscop'.
Writing SCoP 'for.cond => for.end30' in function 'main' to './main___%for.cond---%for.end30.jscop'.
</pre></li>

<li><h4>Import the changed jscop files and print the updated SCoP structure
(optional)</h4>
<p>Polly can reimport jscop files in which the schedules of the statements have
been changed. These changed schedules are used to describe transformations.
It is possible to import different jscop files by providing the postfix
of the jscop file that is imported.</p>
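<p>The file names printed by the export step suggest how a postfix selects the
file to import. The sketch below reproduces that naming pattern. It is inferred
only from the messages shown in this example, so treat the pattern and the
helper name <em>jscop_filename</em> as assumptions rather than a documented
Polly interface.</p>

```python
def jscop_filename(function, entry, exit_block, postfix=None):
    """Build the jscop file name for a SCoP region 'entry => exit_block'.

    The pattern './FUNCTION___%ENTRY---%EXIT.jscop[.POSTFIX]' is inferred
    from the messages printed in this example (an assumption of this sketch).
    """
    name = "./{}___%{}---%{}.jscop".format(function, entry, exit_block)
    if postfix:
        # -polly-import-jscop-postfix=POSTFIX appends '.POSTFIX'.
        name += "." + postfix
    return name


if __name__ == "__main__":
    print(jscop_filename("main", "for.cond", "for.end30"))
    print(jscop_filename("main", "for.cond", "for.end30", "interchanged"))
```

<p>A hand-written variant of an exported file therefore only needs to be saved
next to the original under the same name plus the chosen postfix.</p>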
<p>We apply three different transformations to the SCoP in the main function.
The jscop files describing these transformations are handwritten (and available
in <em>www/experiments/matmul</em>).</p>

<h5>No Polly</h5>

<p>As a baseline we do not call any Polly code generation, but only apply the
normal -O3 optimizations.</p>

<pre class="code">
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop \
    -polly-ast -analyze
</pre>
<pre>
[...]
main():
for (c2=0;c2<=1535;c2++) {
  for (c4=0;c4<=1535;c4++) {
    Stmt_4(c2,c4);
    for (c6=0;c6<=1535;c6++) {
      Stmt_6(c2,c4,c6);
    }
  }
}
[...]
</pre>
<h5>Interchange (and Fission to allow the interchange)</h5>
<p>We split the loops and can now interchange the loop dimensions that
enumerate Stmt_6.</p>
<pre class="code">
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged \
    -polly-ast -analyze
</pre>
<pre>
[...]
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged'.
[...]
main():
for (c2=0;c2<=1535;c2++) {
  for (c4=0;c4<=1535;c4++) {
    Stmt_4(c2,c4);
  }
}
for (c2=0;c2<=1535;c2++) {
  for (c4=0;c4<=1535;c4++) {
    for (c6=0;c6<=1535;c6++) {
      Stmt_6(c2,c6,c4);
    }
  }
}
[...]
</pre>
<h5>Interchange + Tiling</h5>
<p>In addition to the interchange we now tile the second loop nest.</p>

<pre class="code">
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
    -polly-ast -analyze
</pre>
<pre>
[...]
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
[...]
main():
for (c2=0;c2<=1535;c2++) {
  for (c4=0;c4<=1535;c4++) {
    Stmt_4(c2,c4);
  }
}
for (c2=0;c2<=1535;c2+=64) {
  for (c3=0;c3<=1535;c3+=64) {
    for (c4=0;c4<=1535;c4+=64) {
      for (c5=c2;c5<=c2+63;c5++) {
        for (c6=c4;c6<=c4+63;c6++) {
          for (c7=c3;c7<=c3+63;c7++) {
            Stmt_6(c5,c7,c6);
          }
        }
      }
    }
  }
}
[...]
</pre>
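<p>Tiling only reorders the statement instances; it must not add or drop any.
The sketch below checks this for the Stmt_6 loop nest by comparing the
interchanged loop nest with the interchanged and tiled one. It is a hand-written
illustration with shrunken sizes (8 iterations and a tile size of 4 instead of
1536 and 64), and it assumes that the trip count is a multiple of the tile size,
as it is in the example.</p>

```python
def interchanged(n):
    """Instances of Stmt_6 in the order of the interchanged loop nest."""
    out = []
    for c2 in range(n):
        for c4 in range(n):
            for c6 in range(n):
                out.append((c2, c6, c4))  # Stmt_6(c2,c6,c4)
    return out


def interchanged_tiled(n, tile):
    """Instances of Stmt_6 in the order of the tiled loop nest.

    Assumes n is a multiple of tile (1536 = 24 * 64 in the example).
    """
    out = []
    for c2 in range(0, n, tile):          # tile loops
        for c3 in range(0, n, tile):
            for c4 in range(0, n, tile):
                for c5 in range(c2, c2 + tile):      # point loops
                    for c6 in range(c4, c4 + tile):
                        for c7 in range(c3, c3 + tile):
                            out.append((c5, c7, c6))  # Stmt_6(c5,c7,c6)
    return out


if __name__ == "__main__":
    a = interchanged(8)
    b = interchanged_tiled(8, 4)
    print("same instances:", sorted(a) == sorted(b))
    print("same order:", a == b)
```

<p>Both loop nests visit every instance exactly once, but in a different order:
the tiled version finishes one 4x4x4 block before moving on, which is what
improves data locality.</p>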
<h5>Interchange + Tiling + Strip-mining to prepare vectorization</h5>
<p>To allow vectorization later, we create a so-called trivially parallelizable
loop. It is innermost, parallel, and has only four iterations. It can be
replaced by 4-element SIMD instructions.</p>
<pre class="code">
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-ast -analyze</pre>

<pre>
[...]
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
[...]
main():
for (c2=0;c2<=1535;c2++) {
  for (c4=0;c4<=1535;c4++) {
    Stmt_4(c2,c4);
  }
}
for (c2=0;c2<=1535;c2+=64) {
  for (c3=0;c3<=1535;c3+=64) {
    for (c4=0;c4<=1535;c4+=64) {
      for (c5=c2;c5<=c2+63;c5++) {
        for (c6=c4;c6<=c4+63;c6++) {
          for (c7=c3;c7<=c3+63;c7+=4) {
            for (c8=c7;c8<=c7+3;c8++) {
              Stmt_6(c5,c8,c6);
            }
          }
        }
      }
    }
  }
}
[...]
</pre>
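<p>Strip-mining splits one loop into an outer loop that steps by the vector
width and an inner loop over one vector's worth of iterations. The following
sketch shows the idea on the c7/c8 loop pair alone. It is an illustration with
hypothetical helper names, and it assumes that the trip count is a multiple of
the width, as in the example (64 iterations, width 4).</p>

```python
def strip_mine(lo, hi, width=4):
    """Split the inclusive range lo..hi into chunks of `width` iterations.

    Mirrors: for (c7=lo; c7<=hi; c7+=width) for (c8=c7; c8<=c7+width-1; c8++)
    Assumes (hi - lo + 1) is a multiple of width.
    """
    chunks = []
    for c7 in range(lo, hi + 1, width):
        # The inner loop has a fixed, small trip count: one SIMD vector.
        chunks.append([c8 for c8 in range(c7, c7 + width)])
    return chunks


def flatten(chunks):
    return [c for chunk in chunks for c in chunk]


if __name__ == "__main__":
    chunks = strip_mine(0, 63)
    print(len(chunks), "chunks, first:", chunks[0])
    print("order preserved:", flatten(chunks) == list(range(64)))
```

<p>Unlike tiling, strip-mining a single loop does not change the execution
order; it only exposes the fixed-length inner loop that a vectorizer can replace
with SIMD instructions.</p>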

</li>

<li><h4>Generate code for the SCoPs</h4>
<p>
This generates new code for the SCoPs detected by Polly.
If -polly-import-jscop is present, transformations specified in the imported
jscop files will be applied.</p>
<pre class="code">opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll</pre>
<pre class="code">
opt -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged \
    -polly-codegen matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged.ll</pre>
<pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
    './init_array___%for.cond---%for.end19.jscop.interchanged'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
    './main___%for.cond---%for.end30.jscop.interchanged'.
</pre>
<pre class="code">
opt -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
    -polly-codegen matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged+tiled.ll</pre>
<pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
    './init_array___%for.cond---%for.end19.jscop.interchanged+tiled'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
    './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
</pre>
<pre class="code">
opt -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-codegen -polly-vectorizer=polly matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged+tiled+vector.ll</pre>
<pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
    './init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
    './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
</pre>
<pre class="code">
opt -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-codegen -polly-vectorizer=polly -polly-parallel matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll</pre>
<pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
    './init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
    './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
</pre>

<li><h4>Create the executables</h4>

Create one executable optimized with plain -O3 as well as a set of executables
optimized in different ways with Polly. The first changes only the loop
structure, the second adds tiling, the third adds vectorization, and the last
also uses OpenMP parallelism.
<pre class="code">
llc matmul.normalopt.ll -o matmul.normalopt.s && \
    gcc matmul.normalopt.s -o matmul.normalopt.exe
llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && \
    gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && \
    gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && \
    gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && \
    gcc -lgomp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe</pre>

<li><h4>Compare the runtime of the executables</h4>

Comparing the runtimes of the different executables shows that a simple
loop interchange gives the largest performance boost here. Adding
vectorization and OpenMP improves the performance significantly further.
<pre class="code">time ./matmul.normalopt.exe</pre>
<pre>42.68 real, 42.55 user, 0.00 sys</pre>
<pre class="code">time ./matmul.polly.interchanged.exe</pre>
<pre>04.33 real, 4.30 user, 0.01 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre>
<pre>04.11 real, 4.10 user, 0.00 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre>
<pre>01.39 real, 1.36 user, 0.01 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre>
<pre>00.66 real, 2.58 user, 0.02 sys</pre>
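<p>To put the timings in relation, the wall-clock ("real") times can be turned
into speedups over the plain -O3 build. The sketch below does only that
arithmetic; the numbers are copied from the measurements above and the
dictionary keys are shorthand labels chosen for this illustration.</p>

```python
# Wall-clock ("real") times in seconds, copied from the measurements above.
REAL_SECONDS = {
    "normalopt": 42.68,
    "interchanged": 4.33,
    "interchanged+tiled": 4.11,
    "interchanged+tiled+vector": 1.39,
    "interchanged+tiled+vector+openmp": 0.66,
}


def speedups(times, baseline="normalopt"):
    """Return the speedup of every variant relative to the baseline."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(REAL_SECONDS).items():
        print("{:<36} {:6.1f}x".format(name, factor))
```

<p>On these numbers the interchange alone accounts for roughly a factor of 10,
vectorization brings the total to about 31, and OpenMP to about 65. The user
time of the OpenMP run (2.58 s) is higher than its real time because the work is
spread over several threads.</p>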
</li>
</ol>

</div>
</div>
</body>
</html>
@ -1,22 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Polly - Examples</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <meta http-equiv="REFRESH"
        content="0;url=documentation.html">
</head>
<body>
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--=====================================================================-->
<h1>Polly: Examples</h1>
<!--=====================================================================-->

</div>
</body>
</html>