[www] Remove outdated documentation

Remove examples 'load_Polly_into_clang' and 'manual_matmul'. This information is
now available in our SPHINX docs (*).

(*) Thanks to Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com> who
contributed the SPHINX docs update!

llvm-svn: 305186
Tobias Grosser 2017-06-12 12:21:47 +00:00
parent bccaea57c0
commit 2531a5d827
3 changed files with 0 additions and 617 deletions


@ -1,143 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Polly - Load Polly into clang</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
</head>
<body>
<div id="box">
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--=====================================================================-->
<h1>Load Polly into clang and automatically run it at -O3</h1>
<!--=====================================================================-->
<p><b>Warning:</b> Even though this example makes it very easy to use Polly,
you should be aware that Polly is a young research project. It is expected
to crash, produce invalid code or to hang in complex calculations even for
simple examples. In case you see such a problem, please check the <a
href="bugs.html">Bug database</a> and consider reporting a bug.
<p>
<b>Warning II:</b> clang/LLVM/Polly need to be in sync. This means
you need to compile them yourself from a recent svn/git checkout.
<h2>Load Polly into clang</h2>
By default Polly is configured as a shared library plugin that is loaded in
tools like clang, opt, and bugpoint when they start their execution.
By loading Polly into clang (or opt) the Polly options become automatically
available. You can load Polly either by adding the relevant commands to
the CPPFLAGS or by creating an alias.
<pre class="code">
$ export CPPFLAGS="-Xclang -load -Xclang ${POLLY_BUILD_DIR}/lib/LLVMPolly.so"
</pre>
or
<pre class="code">
$ alias pollycc="clang -Xclang -load -Xclang ${POLLY_BUILD_DIR}/lib/LLVMPolly.so"
</pre>
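Bash does not expand aliases in non-interactive scripts by default, so a shell function may be more convenient in build scripts. The following is a minimal sketch, not part of Polly: the function names pollycc_cmd and pollycc_run are hypothetical, and POLLY_BUILD_DIR is assumed to point at the Polly build directory as in the commands above.

```shell
# Hypothetical helpers; POLLY_BUILD_DIR is assumed to be set as in the text.

# Print the clang command line that loads Polly, followed by any extra arguments.
pollycc_cmd() {
    printf 'clang -Xclang -load -Xclang %s' "${POLLY_BUILD_DIR}/lib/LLVMPolly.so"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Run clang with Polly loaded (requires clang and a built LLVMPolly.so).
pollycc_run() {
    clang -Xclang -load -Xclang "${POLLY_BUILD_DIR}/lib/LLVMPolly.so" "$@"
}
```

For example, pollycc_cmd -O3 -mllvm -polly file.c prints the full command without running it, which can help when checking what a build script would execute.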
To avoid having to load Polly in the tools, Polly can optionally be configured
with cmake to be statically linked in the tools:
<pre class="code">
$ cmake -D LINK_POLLY_INTO_TOOLS:Bool=ON
</pre>
<h2>Optimizing with Polly</h2>
Optimizing with Polly is as easy as adding <b>-O3 -mllvm -polly</b> to your
compiler flags (Polly is only available at -O3).
<pre class="code">pollycc -O3 -mllvm -polly file.c</pre>
<h2>Automatic OpenMP code generation</h2>
To automatically detect parallel loops and generate OpenMP code for them you
also need to add <b>-mllvm -polly-parallel -lgomp</b> to your CFLAGS.
<pre class="code">pollycc -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c</pre>
<h2>Automatic Vector code generation</h2>
Automatic vector code generation can be enabled by adding <b>-mllvm
-polly-vectorizer=stripmine</b> to your CFLAGS.
<pre class="code">pollycc -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c</pre>
<h2>Extract a preoptimized LLVM-IR file</h2>
Often it is useful to derive from a C-file the LLVM-IR code that is actually
optimized by Polly. Normally the LLVM-IR is automatically generated from
the C code by first lowering C to LLVM-IR (clang) and by subsequently applying a
set of preparing transformations on the LLVM-IR. To get the LLVM-IR after the
preparing transformations have been applied run Polly with '-O0'.
<pre class="code">pollycc -O0 -mllvm -polly -S -emit-llvm file.c</pre>
<h2>Further options</h2>
Polly supports further options that are mainly useful for the development or
the analysis of Polly. The relevant options can be added to clang by appending
<b>-mllvm -option-name</b> to the CFLAGS or the clang command line.
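As a rough sketch of this convention, the hypothetical helper below prefixes each option with -mllvm so that the result can be added to CFLAGS. The function name mllvm_flags is an assumption made for illustration; the options in the usage line are the ones described in the sections that follow.

```shell
# Hypothetical helper: turn a list of LLVM options into "-mllvm <option>" pairs.
mllvm_flags() {
    out=""
    for opt in "$@"; do
        # Add a separating space only when something was already collected.
        out="${out:+$out }-mllvm $opt"
    done
    printf '%s\n' "$out"
}

# Prints: -mllvm -polly-only-func=main -mllvm -polly-tiling=false
mllvm_flags -polly-only-func=main -polly-tiling=false
```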
<h3>Limit Polly to a single function</h3>
To limit the execution of Polly to a single function, use the option
<b>-polly-only-func=functionname</b>.
<h3>Disable LLVM-IR generation</h3>
Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparing transformations while disabling Polly code
generation, add the option <b>-polly-no-codegen</b>.
<h3>Graphical view of the SCoPs</h3>
Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are <b>-polly-show</b>, <b>-polly-show-only</b>, <b>-polly-dot</b> and
<b>-polly-dot-only</b>. The 'show' options automatically run dotty or another
graphviz viewer to show the scops graphically. The 'dot' options store for each
function a dot file that highlights the detected SCoPs. If 'only' is appended at
the end of the option, the basic blocks are shown without the statements they
contain.
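The dot files written by the 'dot' options can be converted to images with the graphviz dot tool. A minimal sketch, assuming graphviz is installed and the dot files are located in the given directory; the function name render_dots is hypothetical.

```shell
# Hypothetical helper: convert every .dot file in a directory to PNG.
render_dots() {
    dir="${1:-.}"
    if ! command -v dot >/dev/null 2>&1; then
        echo "dot (graphviz) not found" >&2
        return 1
    fi
    for f in "$dir"/*.dot; do
        [ -e "$f" ] || continue      # no matches: the glob stays literal, skip it
        dot -Tpng "$f" -o "$f.png"   # foo.dot -> foo.dot.png
    done
}
```

Calling render_dots without an argument processes the current directory.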
<h3>Change/Disable the Optimizer</h3>
By default Polly uses the isl scheduling optimizer. The isl optimizer optimizes
for data-locality and parallelism using the <a
href="http://pluto-compiler.sf.net">Pluto</a> algorithm. For research it is also
possible to run <a
href="http://www-rocq.inria.fr/~pouchet/software/pocc/">PoCC</a> as external
optimizer. PoCC provides access to the original Pluto implementation. To use
PoCC add <b>-polly-optimizer=pocc</b> to the command line (only available if
Polly was compiled with scoplib support) [removed after <a href="http://llvm.org/releases/download.html#3.4.2">LLVM 3.4.2</a>].
To disable the optimizer entirely use the option <b>-polly-optimizer=none</b>.
<h3>Disable tiling in the optimizer</h3>
By default both optimizers perform tiling, if possible. In case this is not
wanted the option <b>-polly-tiling=false</b> can be used to disable it. (This
option disables tiling for both optimizers).
<h3>Ignore possible aliasing</h3>
By default we only detect SCoPs if we can prove that the different array bases
cannot alias. This is the correct thing to do if we optimize automatically.
However, without special user annotations like 'restrict' we often cannot prove
that no aliasing is possible. If the user knows that no aliasing can happen in
the code, the option <b>-polly-ignore-aliasing</b> can be used to disable the
check for possible aliasing.
<h3>Import / Export</h3>
The flags <b>-polly-import</b> and <b>-polly-export</b> allow the export and
reimport of the polyhedral representation. By exporting, modifying and
reimporting the polyhedral representation externally calculated transformations
can be applied. This enables external optimizers or the manual optimization of
specific SCoPs.
</div>
</div>
</body>
</html>


@ -1,452 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Polly - Examples</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
</head>
<body>
<div id="box">
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--=====================================================================-->
<h1>Execute the individual Polly passes manually</h1>
<!--=====================================================================-->
<p>
This example presents the individual passes that are involved when optimizing
code with Polly. We show how to execute them individually and explain for each
which analysis is performed or what transformation is applied. In this example
the polyhedral transformation is user-provided to show how much performance
improvement can be expected by an optimal automatic optimizer.</p>
The files used and created in this example are available in the Polly checkout
in the folder <em>www/experiments/matmul</em>. They can be created automatically
by running the <em>www/experiments/matmul/runall.sh</em> script.
<ol>
<li><h4>Create LLVM-IR from the C code</h4>
Polly works on LLVM-IR. Hence it is necessary to translate the source files into
LLVM-IR. If more than one file should be optimized the files can be combined into
a single file with llvm-link.
<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre>
</li>
<li><h4>Load Polly automatically when calling the 'opt' tool</h4>
Polly is not built into opt or bugpoint, but it is a shared library that needs
to be loaded into these tools explicitly. The Polly library is called
LLVMPolly.so. It is available in the build/lib/ directory. For convenience we create
an alias that automatically loads Polly if 'opt' is called.
<pre class="code">
export PATH_TO_POLLY_LIB="$HOME/polly/build/lib"
alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre>
</li>
<li><h4>Prepare the LLVM-IR for Polly</h4>
Polly is only able to work with code that matches a canonical form. To translate
the LLVM-IR into this form we use a set of canonicalization passes. They are
scheduled by using '-polly-canonicalize'.
<pre class="code">opt -S -polly-canonicalize matmul.s &gt; matmul.preopt.ll</pre></li>
<li><h4>Show the SCoPs detected by Polly (optional)</h4>
To check whether Polly was able to detect SCoPs, we print the
structure of the detected SCoPs. In our example two SCoPs are detected: one in
'init_array', the other in 'main'.
<pre class="code">opt -basicaa -polly-ast -analyze -q matmul.preopt.ll</pre>
<pre>
init_array():
for (c2=0;c2&lt;=1023;c2++) {
for (c4=0;c4&lt;=1023;c4++) {
Stmt_5(c2,c4);
}
}
main():
for (c2=0;c2&lt;=1023;c2++) {
for (c4=0;c4&lt;=1023;c4++) {
Stmt_4(c2,c4);
for (c6=0;c6&lt;=1023;c6++) {
Stmt_6(c2,c4,c6);
}
}
}
</pre>
</li>
<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4>
Polly can use graphviz to graphically show a CFG in which the detected SCoPs are
highlighted. It can also create '.dot' files that can be translated by
the 'dot' utility into various graphic formats.
<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll
opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre>
The output for the different functions<br />
view-scops:
<a href="experiments/matmul/scops.main.dot.png">main</a>,
<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>,
<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br />
view-scops-only:
<a href="experiments/matmul/scopsonly.main.dot.png">main</a>,
<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>,
<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a>
</li>
<li><h4>View the polyhedral representation of the SCoPs</h4>
<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre>
<pre>
[...]
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
'for.cond =&gt; for.end19' in function 'init_array':
Context:
{ [] }
Statements {
Stmt_5
Domain&nbsp;:=
{ Stmt_5[i0, i1]&nbsp;: i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023 };
Schedule&nbsp;:=
{ Stmt_5[i0, i1] -&gt; schedule[0, i0, 0, i1, 0] };
WriteAccess&nbsp;:=
{ Stmt_5[i0, i1] -&gt; MemRef_A[1037i0 + i1] };
WriteAccess&nbsp;:=
{ Stmt_5[i0, i1] -&gt; MemRef_B[1047i0 + i1] };
FinalRead
Domain&nbsp;:=
{ FinalRead[0] };
Schedule&nbsp;:=
{ FinalRead[i0] -&gt; schedule[200000000, o1, o2, o3, o4] };
ReadAccess&nbsp;:=
{ FinalRead[i0] -&gt; MemRef_A[o0] };
ReadAccess&nbsp;:=
{ FinalRead[i0] -&gt; MemRef_B[o0] };
}
[...]
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
'for.cond =&gt; for.end30' in function 'main':
Context:
{ [] }
Statements {
Stmt_4
Domain&nbsp;:=
{ Stmt_4[i0, i1]&nbsp;: i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023 };
Schedule&nbsp;:=
{ Stmt_4[i0, i1] -&gt; schedule[0, i0, 0, i1, 0, 0, 0] };
WriteAccess&nbsp;:=
{ Stmt_4[i0, i1] -&gt; MemRef_C[1067i0 + i1] };
Stmt_6
Domain&nbsp;:=
{ Stmt_6[i0, i1, i2]&nbsp;: i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023 and i2 &gt;= 0 and i2 &lt;= 1023 };
Schedule&nbsp;:=
{ Stmt_6[i0, i1, i2] -&gt; schedule[0, i0, 0, i1, 1, i2, 0] };
ReadAccess&nbsp;:=
{ Stmt_6[i0, i1, i2] -&gt; MemRef_C[1067i0 + i1] };
ReadAccess&nbsp;:=
{ Stmt_6[i0, i1, i2] -&gt; MemRef_A[1037i0 + i2] };
ReadAccess&nbsp;:=
{ Stmt_6[i0, i1, i2] -&gt; MemRef_B[i1 + 1047i2] };
WriteAccess&nbsp;:=
{ Stmt_6[i0, i1, i2] -&gt; MemRef_C[1067i0 + i1] };
FinalRead
Domain&nbsp;:=
{ FinalRead[0] };
Schedule&nbsp;:=
{ FinalRead[i0] -&gt; schedule[200000000, o1, o2, o3, o4, o5, o6] };
ReadAccess&nbsp;:=
{ FinalRead[i0] -&gt; MemRef_C[o0] };
ReadAccess&nbsp;:=
{ FinalRead[i0] -&gt; MemRef_A[o0] };
ReadAccess&nbsp;:=
{ FinalRead[i0] -&gt; MemRef_B[o0] };
}
[...]
</pre>
</li>
<li><h4>Show the dependences for the SCoPs</h4>
<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre>
<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region:
'for.cond =&gt; for.end19' in function 'init_array':
Must dependences:
{ }
May dependences:
{ }
Must no source:
{ }
May no source:
{ }
Printing analysis 'Polly - Calculate dependences for SCoP' for region:
'for.cond =&gt; for.end30' in function 'main':
Must dependences:
{ Stmt_4[i0, i1] -&gt; Stmt_6[i0, i1, 0]&nbsp;:
i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023;
Stmt_6[i0, i1, i2] -&gt; Stmt_6[i0, i1, 1 + i2]&nbsp;:
i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023 and i2 &gt;= 0 and i2 &lt;= 1022;
Stmt_6[i0, i1, 1023] -&gt; FinalRead[0]&nbsp;:
i1 &lt;= 1091540 - 1067i0 and i1 &gt;= -1067i0 and i1 &gt;= 0 and i1 &lt;= 1023;
Stmt_6[1023, i1, 1023] -&gt; FinalRead[0]&nbsp;:
i1 &gt;= 0 and i1 &lt;= 1023
}
May dependences:
{ }
Must no source:
{ Stmt_6[i0, i1, i2] -&gt; MemRef_A[1037i0 + i2]&nbsp;:
i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023 and i2 &gt;= 0 and i2 &lt;= 1023;
Stmt_6[i0, i1, i2] -&gt; MemRef_B[i1 + 1047i2]&nbsp;:
i0 &gt;= 0 and i0 &lt;= 1023 and i1 &gt;= 0 and i1 &lt;= 1023 and i2 &gt;= 0 and i2 &lt;= 1023;
FinalRead[0] -&gt; MemRef_A[o0];
FinalRead[0] -&gt; MemRef_B[o0];
FinalRead[0] -&gt; MemRef_C[o0]&nbsp;:
o0 &gt;= 1092565 or (exists (e0 = [(o0)/1067]: o0 &lt;= 1091540 and o0 &gt;= 0
and 1067e0 &lt;= -1024 + o0 and 1067e0 &gt;= -1066 + o0)) or o0 &lt;= -1;
}
May no source:
{ }
</pre></li>
<li><h4>Export jscop files</h4>
Polly can export the polyhedral representation in so called jscop files. Jscop
files contain the polyhedral representation stored in a JSON file.
<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre>
<pre>Writing SCoP 'for.cond =&gt; for.end19' in function 'init_array' to './init_array___%for.cond---%for.end19.jscop'.
Writing SCoP 'for.cond =&gt; for.end30' in function 'main' to './main___%for.cond---%for.end30.jscop'.
</pre></li>
<li><h4>Import the changed jscop files and print the updated SCoP structure
(optional)</h4>
<p>Polly can reimport jscop files, in which the schedules of the statements are
changed. These changed schedules are used to describe transformations.
It is possible to import different jscop files by providing the postfix
of the jscop file that is imported.</p>
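One way to prepare such a variant is to copy an exported jscop file to the same name plus a postfix and then edit the schedules in the copy. The helper below is only a sketch: its name is hypothetical, and the naming scheme (exported file name, a dot, then the postfix) is inferred from the output shown in this example.

```shell
# Hypothetical helper: copy an exported jscop file to "<file>.<postfix>"
# so that the copy can be edited and later selected with
# -polly-import-jscop-postfix=<postfix>.
jscop_variant() {
    src="$1"
    postfix="$2"
    if [ ! -f "$src" ]; then
        echo "no such jscop file: $src" >&2
        return 1
    fi
    cp -- "$src" "$src.$postfix"
    printf '%s\n' "$src.$postfix"   # report the name of the new file
}
```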
<p> We apply three different transformations on the SCoP in the main function.
The jscop files describing these transformations are hand written (and available
in <em>www/experiments/matmul</em>).
<h5>No Polly</h5>
<p>As a baseline we do not call any Polly code generation, but only apply the
normal -O3 optimizations.</p>
<pre class="code">
opt matmul.preopt.ll -basicaa \
-polly-import-jscop \
-polly-ast -analyze
</pre>
<pre>
[...]
main():
for (c2=0;c2&lt;=1535;c2++) {
for (c4=0;c4&lt;=1535;c4++) {
Stmt_4(c2,c4);
for (c6=0;c6&lt;=1535;c6++) {
Stmt_6(c2,c4,c6);
}
}
}
[...]
</pre>
<h5>Interchange (and Fission to allow the interchange)</h5>
<p>We split the loops and can now apply an interchange of the loop dimensions that
enumerate Stmt_6.</p>
<pre class="code">
opt matmul.preopt.ll -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged \
-polly-ast -analyze
</pre>
<pre>
[...]
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged'.
[...]
main():
for (c2=0;c2&lt;=1535;c2++) {
for (c4=0;c4&lt;=1535;c4++) {
Stmt_4(c2,c4);
}
}
for (c2=0;c2&lt;=1535;c2++) {
for (c4=0;c4&lt;=1535;c4++) {
for (c6=0;c6&lt;=1535;c6++) {
Stmt_6(c2,c6,c4);
}
}
}
[...]
</pre>
<h5>Interchange + Tiling</h5>
<p>In addition to the interchange we now tile the second loop nest.</p>
<pre class="code">
opt matmul.preopt.ll -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
-polly-ast -analyze
</pre>
<pre>
[...]
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
[...]
main():
for (c2=0;c2&lt;=1535;c2++) {
for (c4=0;c4&lt;=1535;c4++) {
Stmt_4(c2,c4);
}
}
for (c2=0;c2&lt;=1535;c2+=64) {
for (c3=0;c3&lt;=1535;c3+=64) {
for (c4=0;c4&lt;=1535;c4+=64) {
for (c5=c2;c5&lt;=c2+63;c5++) {
for (c6=c4;c6&lt;=c4+63;c6++) {
for (c7=c3;c7&lt;=c3+63;c7++) {
Stmt_6(c5,c7,c6);
}
}
}
}
}
}
[...]
</pre>
<h5>Interchange + Tiling + Strip-mining to prepare vectorization</h5>
To later allow vectorization we create a so called trivially parallelizable
loop. It is innermost, parallel and has only four iterations. It can be
replaced by 4-element SIMD instructions.
<pre class="code">
opt matmul.preopt.ll -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
-polly-ast -analyze </pre>
<pre>
[...]
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
[...]
main():
for (c2=0;c2&lt;=1535;c2++) {
for (c4=0;c4&lt;=1535;c4++) {
Stmt_4(c2,c4);
}
}
for (c2=0;c2&lt;=1535;c2+=64) {
for (c3=0;c3&lt;=1535;c3+=64) {
for (c4=0;c4&lt;=1535;c4+=64) {
for (c5=c2;c5&lt;=c2+63;c5++) {
for (c6=c4;c6&lt;=c4+63;c6++) {
for (c7=c3;c7&lt;=c3+63;c7+=4) {
for (c8=c7;c8&lt;=c7+3;c8++) {
Stmt_6(c5,c8,c6);
}
}
}
}
}
}
}
[...]
</pre>
</li>
<li><h4>Generate code for the SCoPs</h4>
<p>
This generates new code for the SCoPs detected by Polly.
If -polly-import-jscop is present, transformations specified in the imported
jscop files will be applied.</p>
<pre class="code">opt matmul.preopt.ll | opt -O3 &gt; matmul.normalopt.ll</pre>
<pre class="code">
opt -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged \
-polly-codegen matmul.preopt.ll \
| opt -O3 &gt; matmul.polly.interchanged.ll</pre>
<pre>
Reading JScop 'for.cond =&gt; for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged'.
File could not be read: No such file or directory
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged'.
</pre>
<pre class="code">
opt -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
-polly-codegen matmul.preopt.ll \
| opt -O3 &gt; matmul.polly.interchanged+tiled.ll</pre>
<pre>
Reading JScop 'for.cond =&gt; for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled'.
File could not be read: No such file or directory
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged+tiled'.
</pre>
<pre class="code">
opt -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
-polly-codegen -polly-vectorizer=polly matmul.preopt.ll \
| opt -O3 &gt; matmul.polly.interchanged+tiled+vector.ll</pre>
<pre>
Reading JScop 'for.cond =&gt; for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
</pre>
<pre class="code">
opt -basicaa \
-polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
-polly-codegen -polly-vectorizer=polly -polly-parallel matmul.preopt.ll \
| opt -O3 &gt; matmul.polly.interchanged+tiled+vector+openmp.ll</pre>
<pre>
Reading JScop 'for.cond =&gt; for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop 'for.cond =&gt; for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
</pre>
<li><h4>Create the executables</h4>
Create one executable optimized with plain -O3 as well as a set of executables
optimized in different ways with Polly. The first changes only the loop
structure, the second adds tiling, the third adds vectorization, and the last
additionally uses OpenMP parallelism.
<pre class="code">
llc matmul.normalopt.ll -o matmul.normalopt.s &amp;&amp; \
gcc matmul.normalopt.s -o matmul.normalopt.exe
llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s &amp;&amp; \
gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s &amp;&amp; \
gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s &amp;&amp; \
gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s &amp;&amp; \
gcc matmul.polly.interchanged+tiled+vector+openmp.s -lgomp -o matmul.polly.interchanged+tiled+vector+openmp.exe</pre>
<li><h4>Compare the runtime of the executables</h4>
Comparing the runtimes of the different executables shows that a simple
loop interchange gives the largest performance boost here. However, adding
vectorization and OpenMP parallelism improves the performance significantly
further.
<pre class="code">time ./matmul.normalopt.exe</pre>
<pre>42.68 real, 42.55 user, 0.00 sys</pre>
<pre class="code">time ./matmul.polly.interchanged.exe</pre>
<pre>04.33 real, 4.30 user, 0.01 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre>
<pre>04.11 real, 4.10 user, 0.00 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre>
<pre>01.39 real, 1.36 user, 0.01 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre>
<pre>00.66 real, 2.58 user, 0.02 sys</pre>
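To put these numbers in relation, the speedup over the plain -O3 build can be computed from the wall-clock ('real') times. A small sketch using awk; the helper name speedup is hypothetical.

```shell
# Hypothetical helper: speedup of a run relative to a baseline,
# both given as wall-clock seconds.
speedup() {
    LC_ALL=C awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1fx\n", base / t }'
}

speedup 42.68 4.33   # interchanged
speedup 42.68 4.11   # interchanged+tiled
speedup 42.68 1.39   # interchanged+tiled+vector
speedup 42.68 0.66   # interchanged+tiled+vector+openmp
```

With the timings listed above this prints 9.9x, 10.4x, 30.7x and 64.7x.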
</li>
</ol>
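The first steps above (creating LLVM-IR, canonicalizing it, and printing the detected SCoPs) can be sketched as one script. This is an illustrative sketch, not the runall.sh script from the checkout: the function names and the DRY_RUN switch are hypothetical, PATH_TO_POLLY_LIB is assumed to be set as in step 2, and opt's -o option is used in place of the shell redirection.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# opt with Polly loaded explicitly; stands in for the alias from step 2.
polly_opt() {
    run opt -load "${PATH_TO_POLLY_LIB}/LLVMPolly.so" "$@"
}

# Steps 1, 3 and 4 for a single C file, e.g. matmul.c.
prepare_and_show_scops() {
    base="${1%.c}"
    run clang -S -emit-llvm "$1" -o "$base.s" &&
    polly_opt -S -polly-canonicalize "$base.s" -o "$base.preopt.ll" &&
    polly_opt -basicaa -polly-ast -analyze -q "$base.preopt.ll"
}
```

Setting DRY_RUN=1 before calling prepare_and_show_scops matmul.c prints the three commands instead of executing them, which makes it easy to check the pass order without a Polly build.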
</div>
</div>
</body>
</html>


@ -1,22 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Polly - Examples</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
<meta http-equiv="REFRESH"
content="0;url=documentation.html">
</head>
<body>
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--=====================================================================-->
<h1>Polly: Examples</h1>
<!--=====================================================================-->
</div>
</body>
</html>