Porting the example illustrating Polly from HTML to reStructuredText

http://polly.llvm.org/example_manual_matmul.html, which illustrates the individual passes of Polly, has been ported to reStructuredText, and the configuration files used by Sphinx have been changed as necessary to include the new source as part of the documentation.

Contributed-by: Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com>

Differential Revision: https://reviews.llvm.org/D25163

llvm-svn: 294735
2025-02-03 07:38:57 +00:00 · 2017-02-10 11:46:57 +00:00 · 2017-02-10 11:46:57 +00:00 · 30a02088c0
commit 30a02088c0
parent 296fe2e2ad
20 changed files with 1093 additions and 611 deletions
--- a/polly/docs/HowToManuallyUseTheIndividualPiecesOfPolly.rst
+++ b/polly/docs/HowToManuallyUseTheIndividualPiecesOfPolly.rst
@ -0,0 +1,475 @@
 ==================================================
 How to manually use the individual pieces of Polly
 ==================================================
 Execute the individual Polly passes manually
 ============================================
 .. sectionauthor:: Singapuram Sanjay Srivallabh
 This example presents the individual passes that are involved when optimizing
 code with Polly. We show how to execute them individually and explain, for
 each pass, which analysis is performed or which transformation is applied. In
 this example the polyhedral transformation is provided by the user to show how
 much performance improvement can be expected from an optimal automatic optimizer.
 1. **Create LLVM-IR from the C code**
 -------------------------------------
        Polly works on LLVM-IR, so the source files must first be translated
        into LLVM-IR. If more than one file is to be optimized, the files can
        be combined into a single file with llvm-link.
        .. code-block:: console
                clang -S -emit-llvm matmul.c -o matmul.s
 2. **Prepare the LLVM-IR for Polly**
 ------------------------------------
        Polly is only able to work with code that matches a canonical form.
        To translate the LLVM-IR into this form we use a set of
        canonicalization passes. They are scheduled by using
        '-polly-canonicalize'.
        .. code-block:: console
                opt -S -polly-canonicalize matmul.s > matmul.preopt.ll
 3. **Show the SCoPs detected by Polly (optional)**
 --------------------------------------------------
        To check whether Polly was able to detect SCoPs, we print the structure
        of the detected SCoPs. In our example two SCoPs are detected: one in
        'init_array', the other in 'main'.
        .. code-block:: console
                $ opt -polly-ast -analyze -q matmul.preopt.ll -polly-process-unprofitable
        .. code-block:: guess
                :: isl ast :: init_array :: %for.cond1.preheader---%for.end19
                if (1)
                    for (int c0 = 0; c0 <= 1535; c0 += 1)
                      for (int c1 = 0; c1 <= 1535; c1 += 1)
                        Stmt_for_body3(c0, c1);
                else
                    {  /* original code */ }
                :: isl ast :: main :: %for.cond1.preheader---%for.end30
                if (1)
                    for (int c0 = 0; c0 <= 1535; c0 += 1)
                      for (int c1 = 0; c1 <= 1535; c1 += 1) {
                        Stmt_for_body3(c0, c1);
                        for (int c2 = 0; c2 <= 1535; c2 += 1)
                          Stmt_for_body8(c0, c1, c2);
                      }
                else
                    {  /* original code */ }
 4. **Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)**
 ----------------------------------------------------------------------------------------
        Polly can use graphviz to graphically show a CFG in which the detected
        SCoPs are highlighted. It can also create '.dot' files that can be
        translated by the 'dot' utility into various graphic formats.
        .. code-block:: console
                $ opt -view-scops -disable-output matmul.preopt.ll
                $ opt -view-scops-only -disable-output matmul.preopt.ll
        The output for the different functions:
        - view-scops : main_, init_array_, print_array_
        - view-scops-only : main-scopsonly_, init_array-scopsonly_, print_array-scopsonly_
 .. _main:  http://polly.llvm.org/experiments/matmul/scops.main.dot.png
 .. _init_array: http://polly.llvm.org/experiments/matmul/scops.init_array.dot.png
 .. _print_array: http://polly.llvm.org/experiments/matmul/scops.print_array.dot.png
 .. _main-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.main.dot.png
 .. _init_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.init_array.dot.png
 .. _print_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.print_array.dot.png
 5. **View the polyhedral representation of the SCoPs**
 ------------------------------------------------------
        .. code-block:: console
                $ opt -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable
        .. code-block:: guess
                [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
                    Function: init_array
                    Region: %for.cond1.preheader---%for.end19
                    Max Loop Depth:  2
                        Invariant Accesses: {
                        }
                        Context:
                        {  :  }
                        Assumed Context:
                        {  :  }
                        Invalid Context:
                        {  : 1 = 0 }
                        Arrays {
                            float MemRef_A[*][1536]; // Element size 4
                            float MemRef_B[*][1536]; // Element size 4
                        }
                        Arrays (Bounds as pw_affs) {
                            float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
                            float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
                        }
                        Alias Groups (0):
                            n/a
                        Statements {
    	                    Stmt_for_body3
                                Domain :=
                                    { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
                                Schedule :=
                                    { Stmt_for_body3[i0, i1] -> [i0, i1] };
                                MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                    { Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] };
                                MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                    { Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] };
                        }
                [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end30' in function 'main':
                    Function: main
                    Region: %for.cond1.preheader---%for.end30
                    Max Loop Depth:  3
                    Invariant Accesses: {
                    }
                    Context:
                    {  :  }
                    Assumed Context:
                    {  :  }
                    Invalid Context:
                    {  : 1 = 0 }
                    Arrays {
                        float MemRef_C[*][1536]; // Element size 4
                        float MemRef_A[*][1536]; // Element size 4
                        float MemRef_B[*][1536]; // Element size 4
                    }
                    Arrays (Bounds as pw_affs) {
                        float MemRef_C[*][ { [] -> [(1536)] } ]; // Element size 4
                        float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
                        float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
                    }
                    Alias Groups (0):
                        n/a
                    Statements {
                    	Stmt_for_body3
                            Domain :=
                                { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
                            Schedule :=
                                { Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] };
                            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                { Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] };
                    	Stmt_for_body8
                            Domain :=
                                { Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 };
                            Schedule :=
                                { Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] };
                            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
                            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                { Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] };
                            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                { Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] };
                            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
                                { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
                    }
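The schedule is what fixes the execution order: statement instances run in lexicographic order of their schedule vectors. The following Python sketch is an illustration only, not part of Polly. It shrinks the bound from 1536 to a small ``n``, and the function names are made up for this example. It enumerates the two domains of the ``main`` SCoP, sorts the instances by the schedules printed above, and compares the result with the loop nest shown in step 3.

```python
from itertools import product


def scheduled_order(n):
    """Sort all statement instances by their schedule vector."""
    timed = []
    for i0, i1 in product(range(n), repeat=2):
        # { Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }
        timed.append(((i0, i1, 0, 0), ("Stmt_for_body3", i0, i1)))
    for i0, i1, i2 in product(range(n), repeat=3):
        # { Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }
        timed.append(((i0, i1, 1, i2), ("Stmt_for_body8", i0, i1, i2)))
    timed.sort(key=lambda pair: pair[0])
    return [instance for _, instance in timed]


def loop_nest_order(n):
    """Walk the loop nest printed by -polly-ast for the main SCoP."""
    order = []
    for c0 in range(n):
        for c1 in range(n):
            order.append(("Stmt_for_body3", c0, c1))
            for c2 in range(n):
                order.append(("Stmt_for_body8", c0, c1, c2))
    return order


if __name__ == "__main__":
    for instance in scheduled_order(2)[:4]:
        print(instance)
```

The constant in the third schedule dimension (0 for Stmt_for_body3, 1 for Stmt_for_body8) is what places the initialization of a cell before the accumulation loop for the same ``(i0, i1)``.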
 6. **Show the dependences for the SCoPs**
 -----------------------------------------
        .. code-block:: console
 	        $ opt -polly-dependences -analyze matmul.preopt.ll -polly-process-unprofitable
        .. code-block:: guess
        	[...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
        		RAW dependences:
        			{  }
        		WAR dependences:
        			{  }
        		WAW dependences:
        			{  }
        		Reduction dependences:
        			n/a
        		Transitive closure of reduction dependences:
        			{  }
        	[...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end30' in function 'main':
        		RAW dependences:
        			{ Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
        		WAR dependences:
        			{  }
        		WAW dependences:
        			{ Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
        		Reduction dependences:
        			n/a
        		Transitive closure of reduction dependences:
        			{  }
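For a small problem size the RAW (read-after-write) dependences can be recomputed by brute force, which makes the relation printed above easier to read. The sketch below is an illustration in Python, not a description of how Polly computes dependences. It replays the ``main`` SCoP in its original order, remembers the last writer of each array cell, and records a dependence whenever a later instance reads that cell. The helper names and the reduced bound ``n`` are assumptions of this example.

```python
def raw_dependences(n):
    """Return the set of (writer, reader) pairs for the main SCoP."""
    last_writer = {}
    raw = set()
    for i0 in range(n):
        for i1 in range(n):
            s3 = ("Stmt_for_body3", i0, i1)
            last_writer[("C", i0, i1)] = s3          # write of C[i0, i1]
            for i2 in range(n):
                s8 = ("Stmt_for_body8", i0, i1, i2)
                reads = [("C", i0, i1), ("A", i0, i2), ("B", i2, i1)]
                for cell in reads:
                    if cell in last_writer:
                        raw.add((last_writer[cell], s8))
                last_writer[("C", i0, i1)] = s8      # write of C[i0, i1]
    return raw


def printed_relation(n):
    """The RAW relation from the opt output, instantiated for bound n."""
    deps = set()
    for i0 in range(n):
        for i1 in range(n):
            deps.add((("Stmt_for_body3", i0, i1),
                      ("Stmt_for_body8", i0, i1, 0)))
            for i2 in range(n - 1):
                deps.add((("Stmt_for_body8", i0, i1, i2),
                          ("Stmt_for_body8", i0, i1, i2 + 1)))
    return deps


if __name__ == "__main__":
    print(raw_dependences(3) == printed_relation(3))
```

Every dependence connects two instances with the same ``(i0, i1)``, so a transformation stays legal as long as, for each such pair, the initialization runs first and ``i2`` keeps increasing.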
 7. **Export jscop files**
 -------------------------
        .. code-block:: console
        	$ opt -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable
        .. code-block:: guess
 	        [...]Writing JScop '%for.cond1.preheader---%for.end19' in function 'init_array' to './init_array___%for.cond1.preheader---%for.end19.jscop'.
 	        Writing JScop '%for.cond1.preheader---%for.end30' in function 'main' to './main___%for.cond1.preheader---%for.end30.jscop'.
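The file names follow a visible pattern: function name, three underscores, region name, and the ``.jscop`` extension. A small Python helper can reproduce it, which may be handy when scripting the later steps. This is a sketch based only on the log lines above and the import messages in step 9. ``jscop_filename`` is a hypothetical helper, not a Polly API.

```python
def jscop_filename(function, region, postfix=""):
    """Build the jscop file name seen in the opt log output."""
    name = f"./{function}___{region}.jscop"
    # An import postfix such as "interchanged" is appended after a dot.
    return f"{name}.{postfix}" if postfix else name


if __name__ == "__main__":
    print(jscop_filename("main", "%for.cond1.preheader---%for.end30"))
    print(jscop_filename("main", "%for.cond1.preheader---%for.end30",
                         "interchanged"))
```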
 8. **Import the changed jscop files and print the updated SCoP structure (optional)**
 -------------------------------------------------------------------------------------
 	Polly can reimport jscop files in which the schedules of the statements
        have been changed. These changed schedules are used to describe
        transformations. Different jscop files can be imported by
        providing the postfix of the jscop file to import.
 	We apply three different transformations on the SCoP in the main
        function. The jscop files describing these transformations are
        hand written (and available in docs/experiments/matmul).
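Since a jscop file is plain JSON, a changed schedule can also be produced with a short script instead of editing by hand. The Python sketch below assumes a trimmed-down jscop document that keeps only the fields needed here. It swaps in the schedules found in ``main___%for.cond1.preheader---%for.end30.jscop.interchanged``. The helper ``with_schedules`` is hypothetical.

```python
import json

# Trimmed-down stand-in for the exported jscop file of the main SCoP.
JSCOP_TEXT = """
{
  "name": "%for.cond1.preheader---%for.end30",
  "statements": [
    {"name": "Stmt_for_body3",
     "schedule": "{ Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }"},
    {"name": "Stmt_for_body8",
     "schedule": "{ Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }"}
  ]
}
"""

# Schedules taken from the hand-written "interchanged" jscop file.
INTERCHANGED = {
    "Stmt_for_body3": "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0] }",
    "Stmt_for_body8": "{ Stmt_for_body8[i0, i1, i2] -> [1, i0, i2, i1] }",
}


def with_schedules(jscop, schedules):
    """Return a copy of the jscop document with replaced schedules."""
    updated = json.loads(json.dumps(jscop))  # cheap deep copy
    for stmt in updated["statements"]:
        stmt["schedule"] = schedules.get(stmt["name"], stmt["schedule"])
    return updated


if __name__ == "__main__":
    original = json.loads(JSCOP_TEXT)
    print(json.dumps(with_schedules(original, INTERCHANGED), indent=3))
```

Writing the result to a file whose name ends in the chosen postfix makes it available to ``-polly-import-jscop-postfix``.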
 	**No Polly**
 	As a baseline we do not call any Polly code generation, but only apply the normal -O3 optimizations.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-ast -analyze -polly-process-unprofitable
 	.. code-block:: c
 		[...]
 		:: isl ast :: main :: %for.cond1.preheader---%for.end30
 		if (1)
 		    for (int c0 = 0; c0 <= 1535; c0 += 1)
 		      for (int c1 = 0; c1 <= 1535; c1 += 1) {
 		        Stmt_for_body3(c0, c1);
 		        for (int c3 = 0; c3 <= 1535; c3 += 1)
 		          Stmt_for_body8(c0, c1, c3);
 		      }
 		else
 		    {  /* original code */ }
 		[...]
 	**Loop Interchange (and Fission to allow the interchange)**
 	We split the loops and can now apply an interchange of the loop dimensions that enumerate Stmt_for_body8.
 	.. Although I feel (and have created a .jscop) we can avoid splitting the loops.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-ast -analyze -polly-process-unprofitable
 	.. code-block:: c
 		[...]
 		:: isl ast :: main :: %for.cond1.preheader---%for.end30
 		if (1)
 		    {
 		      for (int c1 = 0; c1 <= 1535; c1 += 1)
 		        for (int c2 = 0; c2 <= 1535; c2 += 1)
 		          Stmt_for_body3(c1, c2);
 		      for (int c1 = 0; c1 <= 1535; c1 += 1)
 		        for (int c2 = 0; c2 <= 1535; c2 += 1)
 		          for (int c3 = 0; c3 <= 1535; c3 += 1)
 		            Stmt_for_body8(c1, c3, c2);
 		    }
 		else
 		    {  /* original code */ }
 		[...]
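A quick way to check that fission plus interchange preserves the result is to run both loop nests on a small input. The Python sketch below assumes the usual matrix-multiply statement bodies: ``C[i0][i1] = 0`` for Stmt_for_body3 and ``C[i0][i1] += A[i0][i2] * B[i2][i1]`` for Stmt_for_body8. These match the access relations in step 5 but are not spelled out in this document. Sizes and function names are illustrative.

```python
def matmul_original(A, B, n):
    """Loop nest of the unmodified schedule."""
    C = [[0] * n for _ in range(n)]
    for c0 in range(n):
        for c1 in range(n):
            C[c0][c1] = 0                           # Stmt_for_body3(c0, c1)
            for c3 in range(n):
                C[c0][c1] += A[c0][c3] * B[c3][c1]  # Stmt_for_body8(c0, c1, c3)
    return C


def matmul_interchanged(A, B, n):
    """Loop nest after fission and interchange of the two inner loops."""
    C = [[0] * n for _ in range(n)]
    for c1 in range(n):
        for c2 in range(n):
            C[c1][c2] = 0                           # Stmt_for_body3(c1, c2)
    for c1 in range(n):
        for c2 in range(n):
            for c3 in range(n):
                C[c1][c3] += A[c1][c2] * B[c2][c3]  # Stmt_for_body8(c1, c3, c2)
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_original(A, B, 2))
    print(matmul_interchanged(A, B, 2))
```

In the interchanged nest the innermost loop walks ``C[c1][c3]`` and ``B[c2][c3]`` along a row, which in C's row-major layout means consecutive memory accesses. This is a plausible reason for the speedup measured in step 11.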
 	**Interchange + Tiling**
 	In addition to the interchange we now tile the second loop nest.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-ast -analyze -polly-process-unprofitable
 	.. code-block:: c
 		[...]
 		:: isl ast :: main :: %for.cond1.preheader---%for.end30
 		if (1)
 		    {
 		      for (int c1 = 0; c1 <= 1535; c1 += 1)
 		        for (int c2 = 0; c2 <= 1535; c2 += 1)
 		          Stmt_for_body3(c1, c2);
 		      for (int c1 = 0; c1 <= 1535; c1 += 64)
 		        for (int c2 = 0; c2 <= 1535; c2 += 64)
 		          for (int c3 = 0; c3 <= 1535; c3 += 64)
 		            for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
 		              for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
 		                for (int c6 = c2; c6 <= c2 + 63; c6 += 1)
 		                  Stmt_for_body8(c4, c6, c5);
 		    }
 		else
 		    {  /* original code */ }
 		[...]
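Tiling only changes the order in which the instances of Stmt_for_body8 are visited; it neither drops nor repeats any of them. The Python sketch below mirrors the tiled loop nest above with a small bound and tile size (8 and 4 instead of 1536 and 64, both assumptions of this example) and checks two properties: every instance is visited once, and ``i2`` still increases for each fixed ``(i0, i1)``, as the dependences in step 6 require.

```python
from itertools import product


def tiled_instances(n, tile):
    """Visit Stmt_for_body8 instances in the tiled order; tile must divide n."""
    if n % tile != 0:
        raise ValueError("this sketch assumes the tile size divides n")
    visited = []
    for c1 in range(0, n, tile):
        for c2 in range(0, n, tile):
            for c3 in range(0, n, tile):
                for c4 in range(c1, c1 + tile):
                    for c5 in range(c3, c3 + tile):
                        for c6 in range(c2, c2 + tile):
                            # Stmt_for_body8(c4, c6, c5)
                            visited.append((c4, c6, c5))
    return visited


def keeps_reduction_order(visited):
    """Check that i2 only increases for each fixed (i0, i1)."""
    last_i2 = {}
    for i0, i1, i2 in visited:
        if last_i2.get((i0, i1), -1) >= i2:
            return False
        last_i2[(i0, i1)] = i2
    return True


if __name__ == "__main__":
    order = tiled_instances(8, 4)
    print(len(order), sorted(order) == sorted(product(range(8), repeat=3)))
    print(keeps_reduction_order(order))
```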
 	**Interchange + Tiling + Strip-mining to prepare vectorization**
 	To allow vectorization later, we create a so-called trivially
        parallelizable loop. It is innermost, parallel, and has only four
        iterations, so it can be replaced by 4-element SIMD instructions.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-ast -analyze -polly-process-unprofitable
 	.. code-block:: c
 		[...]
 		:: isl ast :: main :: %for.cond1.preheader---%for.end30
 		if (1)
 		    {
 		      for (int c1 = 0; c1 <= 1535; c1 += 1)
 		        for (int c2 = 0; c2 <= 1535; c2 += 1)
 		          Stmt_for_body3(c1, c2);
 		      for (int c1 = 0; c1 <= 1535; c1 += 64)
 		        for (int c2 = 0; c2 <= 1535; c2 += 64)
 		          for (int c3 = 0; c3 <= 1535; c3 += 64)
 		            for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
 		              for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
 		                for (int c6 = c2; c6 <= c2 + 63; c6 += 4)
 		                  for (int c7 = c6; c7 <= c6 + 3; c7 += 1)
 		                    Stmt_for_body8(c4, c7, c5);
 		    }
 		else
 		    {  /* original code */ }
 		[...]
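The strip-mined schedule can be read as a function from a statement instance to a schedule vector. The Python sketch below evaluates the relation from the ``interchanged+tiled+vector`` jscop file for instances of Stmt_for_body8. ``vector_schedule`` is an illustrative name, and the rounding ``i - i % 64`` is this example's reading of the ``% 64 = 0`` and ``% 4 = 0`` constraints for non-negative indices.

```python
def vector_schedule(i0, i1, i2, tile=64, width=4):
    """Schedule vector [1, o0, o1, o2, i0, i2, oo1, i1] for Stmt_for_body8."""
    o0 = i0 - i0 % tile      # o0 <= i0 < o0 + 64 and o0 % 64 = 0
    o2 = i2 - i2 % tile      # o2 <= i2 < o2 + 64 and o2 % 64 = 0
    oo1 = i1 - i1 % width    # oo1 <= i1 < oo1 + 4 and oo1 % 4 = 0
    o1 = oo1 - oo1 % tile    # o1 <= oo1 < o1 + 64 and o1 % 64 = 0
    return (1, o0, o1, o2, i0, i2, oo1, i1)


if __name__ == "__main__":
    print(vector_schedule(70, 130, 5))
    # Instances that differ only in the last component form one 4-wide strip.
    strips = {vector_schedule(0, i1, 0)[:7] for i1 in range(8)}
    print(len(strips))
```

Instances that share the first seven components differ only in ``i1`` and form one strip of four consecutive iterations, which is the innermost ``c7`` loop in the output above.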
 9. **Codegenerate the SCoPs**
 -----------------------------
 	This generates new code for the SCoPs detected by Polly. If
        -polly-import-jscop is present, the transformations specified in the
        imported jscop files are applied.
 	.. code-block:: console
 		$ opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged.ll
 	.. code-block:: guess
 		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged'.
 		File could not be read: No such file or directory
 		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged'.
 	The "File could not be read" message for 'init_array', here and in the
        outputs below, is expected: jscop files with these postfixes were
        written only for the SCoP in 'main'.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled.ll
 	.. code-block:: guess
 		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled'.
 		File could not be read: No such file or directory
 		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled'.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector.ll
 	.. code-block:: guess
 		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
 		File could not be read: No such file or directory
 		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.
 	.. code-block:: console
 		$ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-parallel -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll
 	.. code-block:: guess
 		Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
 		File could not be read: No such file or directory
 		Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.
 10. **Create the executables**
 ------------------------------
        .. code-block:: console
 	        $ llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s -o matmul.normalopt.exe
 	        $ llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
 	        $ llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
 	        $ llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
        	$ llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && gcc -fopenmp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe
 11. **Compare the runtime of the executables**
 ----------------------------------------------
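 	Comparing the runtimes of the different executables shows that a simple
        loop interchange gives the largest performance boost here, from 11.3 s
        to about 1.0 s, and that tiling improves on it slightly (0.83 s). In
        this case, however, adding vectorization and OpenMP degrades
        performance compared with the interchanged and tiled version.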
        .. code-block:: console
 	        $ time ./matmul.normalopt.exe
 	        real	0m11.295s
        	user	0m11.288s
        	sys	0m0.004s
        	$ time ./matmul.polly.interchanged.exe
        	real	0m0.988s
 	        user	0m0.980s
 	        sys	0m0.008s
 	        $ time ./matmul.polly.interchanged+tiled.exe
 	        real	0m0.830s
 	        user	0m0.816s
 	        sys	0m0.012s
 	        $ time ./matmul.polly.interchanged+tiled+vector.exe
        	real	0m5.430s
        	user	0m5.424s
        	sys	0m0.004s
        	$ time ./matmul.polly.interchanged+tiled+vector+openmp.exe
        	real	0m3.184s
        	user	0m11.972s
        	sys	0m0.036s
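The measurements are easier to compare as speedups over the ``-O3`` baseline. The Python sketch below uses the ``real`` times printed above. The dictionary keys and the ``speedups`` helper are names chosen for this example.

```python
# Wall-clock ("real") times in seconds, copied from the output above.
RUNTIMES = {
    "normalopt": 11.295,
    "interchanged": 0.988,
    "interchanged+tiled": 0.830,
    "interchanged+tiled+vector": 5.430,
    "interchanged+tiled+vector+openmp": 3.184,
}


def speedups(runtimes, baseline="normalopt"):
    """Return baseline time divided by each variant's time."""
    base = runtimes[baseline]
    return {name: base / seconds for name, seconds in runtimes.items()}


if __name__ == "__main__":
    for name, factor in speedups(RUNTIMES).items():
        print(f"{name:35s} {factor:5.1f}x")
```

For the OpenMP variant the ``user`` time (11.972 s) is far larger than the ``real`` time because it sums the CPU time of all threads.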
--- a/polly/docs/experiments/matmul/init_array___%for.cond1.preheader---%for.end19.jscop
+++ b/polly/docs/experiments/matmul/init_array___%for.cond1.preheader---%for.end19.jscop
@ -0,0 +1,33 @@
 {
   "arrays" : [
      {
         "name" : "MemRef_A",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_B",
         "sizes" : [ "1536" ],
         "type" : "float"
      }
   ],
   "context" : "{  :  }",
   "name" : "%for.cond1.preheader---%for.end19",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
         "name" : "Stmt_for_body3",
         "schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1] }"
      }
   ]
 }
--- a/polly/docs/experiments/matmul/main_%for.cond1.preheader---%for.end30.jscop
+++ b/polly/docs/experiments/matmul/main_%for.cond1.preheader---%for.end30.jscop
@ -1,40 +1,57 @@
 {
   "arrays" : [
      {
         "name" : "MemRef_C",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_A",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_B",
         "sizes" : [ "1536" ],
         "type" : "float"
      }
   ],
   "context" : "{  :  }",
-   "name" : "for.cond => for.end30",
+   "name" : "%for.cond1.preheader---%for.end30",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
-               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[1536i0 + i1] }"
+               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
            }
         ],
-         "domain" : "{ Stmt_for_body3[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }",
+         "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
         "name" : "Stmt_for_body3",
-         "schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }"
+         "schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
-               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
+               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            },
            {
               "kind" : "read",
-               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
+               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
            },
            {
               "kind" : "read",
-               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
+               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
            },
            {
               "kind" : "write",
-               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
+               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            }
         ],
-         "domain" : "{ Stmt_for_body8[i0, i1, i2] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 and i2 >= 0 and i2 <= 1535 }",
+         "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
         "name" : "Stmt_for_body8",
-         "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] }"
+         "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }"
      }
   ]
 }
--- a/polly/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged
+++ b/polly/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged
@ -0,0 +1,57 @@
 {
   "arrays" : [
      {
         "name" : "MemRef_C",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_A",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_B",
         "sizes" : [ "1536" ],
         "type" : "float"
      }
   ],
   "context" : "{  :  }",
   "name" : "%for.cond1.preheader---%for.end30",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
         "name" : "Stmt_for_body3",
         "schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
         "name" : "Stmt_for_body8",
         "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, i0, i2, i1] }"
      }
   ]
 }
--- a/polly/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled
+++ b/polly/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled
@ -0,0 +1,57 @@
 {
   "arrays" : [
      {
         "name" : "MemRef_C",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_A",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_B",
         "sizes" : [ "1536" ],
         "type" : "float"
      }
   ],
   "context" : "{  :  }",
   "name" : "%for.cond1.preheader---%for.end30",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
         "name" : "Stmt_for_body3",
         "schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0 ] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
         "name" : "Stmt_for_body8",
         "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }"
      }
   ]
 }
--- a/polly/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector
+++ b/polly/docs/experiments/matmul/main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector
@ -0,0 +1,57 @@
 {
   "arrays" : [
      {
         "name" : "MemRef_C",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_A",
         "sizes" : [ "1536" ],
         "type" : "float"
      },
      {
         "name" : "MemRef_B",
         "sizes" : [ "1536" ],
         "type" : "float"
      }
   ],
   "context" : "{  :  }",
   "name" : "%for.cond1.preheader---%for.end30",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
         "name" : "Stmt_for_body3",
         "schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0, 0 ] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
         "name" : "Stmt_for_body8",
         "schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, oo1, i1]: o0 <= i0 < o0 + 64 and o1 <= oo1 < o1 + 64 and o2 <= i2 < o2 + 64 and oo1 <= i1 < oo1 + 4 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and oo1 % 4 = 0 }"
      }
   ]
 }
--- a/polly/docs/index.rst
+++ b/polly/docs/index.rst
@ -23,8 +23,8 @@ Using Polly
   Architecture
   UsingPollyWithClang
   HowToManuallyUseTheIndividualPiecesOfPolly
 * `How to manually use the individual pieces of Polly <http://polly.llvm.org/example_manual_matmul.html>`_
 * `A list of Polly passes <http://polly.llvm.org/documentation/passes.html>`_
 Indices and tables
--- a/polly/www/experiments/matmul/init_array___%for.cond---%for.end19.jscop
+++ b/polly/www/experiments/matmul/init_array___%for.cond---%for.end19.jscop
@ -1,21 +0,0 @@
 {
   "context" : "{  :  }",
   "name" : "for.cond => for.end19",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[1536i0 + i1] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_for_body3[i0, i1] : i0 >= 0 and i0 <= 1535 and i1 >= 0 and i1 <= 1535 }",
         "name" : "Stmt_for_body3",
         "schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0] }"
      }
   ]
 }
--- a/polly/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged
+++ b/polly/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged
@ -1,40 +0,0 @@
 {
   "context" : "{ [] }",
   "name" : "%1 => %17",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }",
         "name" : "Stmt_4",
         "schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }",
         "name" : "Stmt_6",
         "schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, i0, 0, i2, 0, i1, 0] }"
      }
   ]
 }
--- a/polly/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled
+++ b/polly/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled
@@ -1,40 +0,0 @@
 {
   "context" : "{ [] }",
   "name" : "%1 => %17",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }",
         "name" : "Stmt_4",
         "schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }",
         "name" : "Stmt_6",
         "schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }"
      }
   ]
 }
--- a/polly/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled+vector
+++ b/polly/www/experiments/matmul/main___%for.cond---%for.end30.jscop.interchanged+tiled+vector
@@ -1,40 +0,0 @@
 {
   "context" : "{ [] }",
   "name" : "%1 => %17",
   "statements" : [
      {
         "accesses" : [
            {
               "kind" : "write",
               "relation" : "{ Stmt_4[i0, i1] -> MemRef_C[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 }",
         "name" : "Stmt_4",
         "schedule" : "{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0, 0] }"
      },
      {
         "accesses" : [
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_A[1536i0 + i2] }"
            },
            {
               "kind" : "read",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1536i2] }"
            },
            {
               "kind" : "write",
               "relation" : "{ Stmt_6[i0, i1, i2] -> MemRef_C[1536i0 + i1] }"
            }
         ],
         "domain" : "{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 }",
         "name" : "Stmt_6",
         "schedule" : "{ Stmt_6[i0, i1, i2] -> schedule[1, o0, o1, o2, i0, i2, ii1, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and ii1 % 4 = 0 and ii1 <= i1 < ii1 + 4}"
      }
   ]
 }
--- a/polly/www/experiments/matmul/matmul.preopt.ll
+++ b/polly/www/experiments/matmul/matmul.preopt.ll
@@ -1,5 +1,6 @@
 ; ModuleID = 'matmul.s'
-target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+source_filename = "matmul.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 %struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] }
@@ -7,114 +8,100 @@ target triple = "x86_64-unknown-linux-gnu"
@A = common global [1536 x [1536 x float]] zeroinitializer, align 16
@B = common global [1536 x [1536 x float]] zeroinitializer, align 16
-@stdout = external global %struct._IO_FILE*
+@stdout = external global %struct._IO_FILE*, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1
@C = common global [1536 x [1536 x float]] zeroinitializer, align 16
-@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
+@.str.1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
 ; Function Attrs: nounwind uwtable
 define void @init_array() #0 {
 entry:
-  br label %for.cond
+  br label %entry.split
-for.cond:                                         ; preds = %for.inc17, %entry
+entry.split:                                      ; preds = %entry
-  %0 = phi i64 [ %indvar.next2, %for.inc17 ], [ 0, %entry ]
+  br label %for.cond1.preheader
  %exitcond3 = icmp ne i64 %0, 1536
  br i1 %exitcond3, label %for.body, label %for.end19
-for.body:                                         ; preds = %for.cond
+for.cond1.preheader:                              ; preds = %entry.split, %for.inc17
-  br label %for.cond1
+  %indvars.iv5 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next6, %for.inc17 ]
  br label %for.body3
-for.cond1:                                        ; preds = %for.inc, %for.body
+for.body3:                                        ; preds = %for.cond1.preheader, %for.body3
-  %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]
+  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
-  %arrayidx6 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %0, i64 %indvar
+  %0 = mul nuw nsw i64 %indvars.iv, %indvars.iv5
-  %arrayidx16 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %0, i64 %indvar
+  %1 = trunc i64 %0 to i32
-  %1 = mul i64 %0, %indvar
+  %rem = srem i32 %1, 1024
-  %mul = trunc i64 %1 to i32
+  %add = add nsw i32 %rem, 1
  %exitcond = icmp ne i64 %indvar, 1536
  br i1 %exitcond, label %for.body3, label %for.end
 for.body3:                                        ; preds = %for.cond1
  %rem = srem i32 %mul, 1024
  %add = add nsw i32 1, %rem
  %conv = sitofp i32 %add to double
-  %div = fdiv double %conv, 2.000000e+00
+  %div = fmul double %conv, 5.000000e-01
  %conv4 = fptrunc double %div to float
  %arrayidx6 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %indvars.iv5, i64 %indvars.iv
  store float %conv4, float* %arrayidx6, align 4
-  %rem8 = srem i32 %mul, 1024
+  %2 = mul nuw nsw i64 %indvars.iv, %indvars.iv5
-  %add9 = add nsw i32 1, %rem8
+  %3 = trunc i64 %2 to i32
  %rem8 = srem i32 %3, 1024
  %add9 = add nsw i32 %rem8, 1
  %conv10 = sitofp i32 %add9 to double
-  %div11 = fdiv double %conv10, 2.000000e+00
+  %div11 = fmul double %conv10, 5.000000e-01
  %conv12 = fptrunc double %div11 to float
  %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %indvars.iv5, i64 %indvars.iv
  store float %conv12, float* %arrayidx16, align 4
-  br label %for.inc
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1536
  br i1 %exitcond, label %for.body3, label %for.inc17
-for.inc:                                          ; preds = %for.body3
+for.inc17:                                        ; preds = %for.body3
-  %indvar.next = add i64 %indvar, 1
+  %indvars.iv.next6 = add nuw nsw i64 %indvars.iv5, 1
-  br label %for.cond1
+  %exitcond7 = icmp ne i64 %indvars.iv.next6, 1536
  br i1 %exitcond7, label %for.cond1.preheader, label %for.end19
-for.end:                                          ; preds = %for.cond1
+for.end19:                                        ; preds = %for.inc17
  br label %for.inc17
 for.inc17:                                        ; preds = %for.end
  %indvar.next2 = add i64 %0, 1
  br label %for.cond
 for.end19:                                        ; preds = %for.cond
  ret void
 }
 ; Function Attrs: nounwind uwtable
 define void @print_array() #0 {
 entry:
-  br label %for.cond
+  br label %entry.split
-for.cond:                                         ; preds = %for.inc10, %entry
+entry.split:                                      ; preds = %entry
-  %indvar1 = phi i64 [ %indvar.next2, %for.inc10 ], [ 0, %entry ]
+  br label %for.cond1.preheader
  %exitcond3 = icmp ne i64 %indvar1, 1536
  br i1 %exitcond3, label %for.body, label %for.end12
-for.body:                                         ; preds = %for.cond
+for.cond1.preheader:                              ; preds = %entry.split, %for.end
-  br label %for.cond1
+  %indvars.iv6 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next7, %for.end ]
  %0 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
  br label %for.body3
-for.cond1:                                        ; preds = %for.inc, %for.body
+for.body3:                                        ; preds = %for.cond1.preheader, %for.inc
-  %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]
+  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.inc ]
-  %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar
+  %1 = phi %struct._IO_FILE* [ %0, %for.cond1.preheader ], [ %5, %for.inc ]
-  %j.0 = trunc i64 %indvar to i32
+  %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv6, i64 %indvars.iv
-  %exitcond = icmp ne i64 %indvar, 1536
+  %2 = load float, float* %arrayidx5, align 4
-  br i1 %exitcond, label %for.body3, label %for.end
+  %conv = fpext float %2 to double
-
+  %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2
-for.body3:                                        ; preds = %for.cond1
+  %3 = trunc i64 %indvars.iv to i32
-  %0 = load %struct._IO_FILE** @stdout, align 8
+  %rem = srem i32 %3, 80
  %1 = load float* %arrayidx5, align 4
  %conv = fpext float %1 to double
  %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv)
  %rem = srem i32 %j.0, 80
  %cmp6 = icmp eq i32 %rem, 79
-  br i1 %cmp6, label %if.then, label %if.end
+  br i1 %cmp6, label %if.then, label %for.inc
 if.then:                                          ; preds = %for.body3
-  %2 = load %struct._IO_FILE** @stdout, align 8
+  %4 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
-  %call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))
+  %fputc3 = tail call i32 @fputc(i32 10, %struct._IO_FILE* %4)
  br label %if.end
 if.end:                                           ; preds = %if.then, %for.body3
  br label %for.inc
-for.inc:                                          ; preds = %if.end
+for.inc:                                          ; preds = %for.body3, %if.then
-  %indvar.next = add i64 %indvar, 1
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
-  br label %for.cond1
+  %5 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
  %exitcond = icmp ne i64 %indvars.iv.next, 1536
  br i1 %exitcond, label %for.body3, label %for.end
-for.end:                                          ; preds = %for.cond1
+for.end:                                          ; preds = %for.inc
-  %3 = load %struct._IO_FILE** @stdout, align 8
+  %.lcssa = phi %struct._IO_FILE* [ %5, %for.inc ]
-  %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %3, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))
+  %fputc = tail call i32 @fputc(i32 10, %struct._IO_FILE* %.lcssa)
-  br label %for.inc10
+  %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1
  %exitcond8 = icmp ne i64 %indvars.iv.next7, 1536
  br i1 %exitcond8, label %for.cond1.preheader, label %for.end12
-for.inc10:                                        ; preds = %for.end
+for.end12:                                        ; preds = %for.end
  %indvar.next2 = add i64 %indvar1, 1
  br label %for.cond
 for.end12:                                        ; preds = %for.cond
  ret void
 }
@@ -123,64 +110,62 @@ declare i32 @fprintf(%struct._IO_FILE*, i8*, ...) #1
 ; Function Attrs: nounwind uwtable
 define i32 @main() #0 {
 entry:
-  call void @init_array()
+  br label %entry.split
  br label %for.cond
-for.cond:                                         ; preds = %for.inc28, %entry
+entry.split:                                      ; preds = %entry
-  %indvar3 = phi i64 [ %indvar.next4, %for.inc28 ], [ 0, %entry ]
+  tail call void @init_array()
-  %exitcond6 = icmp ne i64 %indvar3, 1536
+  br label %for.cond1.preheader
  br i1 %exitcond6, label %for.body, label %for.end30
-for.body:                                         ; preds = %for.cond
+for.cond1.preheader:                              ; preds = %entry.split, %for.inc28
-  br label %for.cond1
+  %indvars.iv7 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next8, %for.inc28 ]
  br label %for.body3
-for.cond1:                                        ; preds = %for.inc25, %for.body
+for.body3:                                        ; preds = %for.cond1.preheader, %for.inc25
-  %indvar1 = phi i64 [ %indvar.next2, %for.inc25 ], [ 0, %for.body ]
+  %indvars.iv4 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next5, %for.inc25 ]
-  %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1
+  %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4
  %exitcond5 = icmp ne i64 %indvar1, 1536
  br i1 %exitcond5, label %for.body3, label %for.end27
 for.body3:                                        ; preds = %for.cond1
  store float 0.000000e+00, float* %arrayidx5, align 4
-  br label %for.cond6
+  br label %for.body8
-for.cond6:                                        ; preds = %for.inc, %for.body3
+for.body8:                                        ; preds = %for.body3, %for.body8
-  %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body3 ]
+  %indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body8 ]
-  %arrayidx16 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar
+  %arrayidx12 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4
-  %arrayidx20 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1
+  %0 = load float, float* %arrayidx12, align 4
-  %exitcond = icmp ne i64 %indvar, 1536
+  %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %indvars.iv7, i64 %indvars.iv
-  br i1 %exitcond, label %for.body8, label %for.end
+  %1 = load float, float* %arrayidx16, align 4
-
+  %arrayidx20 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv4
-for.body8:                                        ; preds = %for.cond6
+  %2 = load float, float* %arrayidx20, align 4
  %0 = load float* %arrayidx5, align 4
  %1 = load float* %arrayidx16, align 4
  %2 = load float* %arrayidx20, align 4
  %mul = fmul float %1, %2
  %add = fadd float %0, %mul
-  store float %add, float* %arrayidx5, align 4
+  %arrayidx24 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4
-  br label %for.inc
+  store float %add, float* %arrayidx24, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1536
  br i1 %exitcond, label %for.body8, label %for.inc25
-for.inc:                                          ; preds = %for.body8
+for.inc25:                                        ; preds = %for.body8
-  %indvar.next = add i64 %indvar, 1
+  %indvars.iv.next5 = add nuw nsw i64 %indvars.iv4, 1
-  br label %for.cond6
+  %exitcond6 = icmp ne i64 %indvars.iv.next5, 1536
  br i1 %exitcond6, label %for.body3, label %for.inc28
-for.end:                                          ; preds = %for.cond6
+for.inc28:                                        ; preds = %for.inc25
-  br label %for.inc25
+  %indvars.iv.next8 = add nuw nsw i64 %indvars.iv7, 1
  %exitcond9 = icmp ne i64 %indvars.iv.next8, 1536
  br i1 %exitcond9, label %for.cond1.preheader, label %for.end30
-for.inc25:                                        ; preds = %for.end
+for.end30:                                        ; preds = %for.inc28
  %indvar.next2 = add i64 %indvar1, 1
  br label %for.cond1
 for.end27:                                        ; preds = %for.cond1
  br label %for.inc28
 for.inc28:                                        ; preds = %for.end27
  %indvar.next4 = add i64 %indvar3, 1
  br label %for.cond
 for.end30:                                        ; preds = %for.cond
  ret i32 0
 }
-attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
+; Function Attrs: nounwind
-attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
+declare i64 @fwrite(i8* nocapture, i64, i64, %struct._IO_FILE* nocapture) #2
 ; Function Attrs: nounwind
 declare i32 @fputc(i32, %struct._IO_FILE* nocapture) #2
 attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #2 = { nounwind }
 !llvm.ident = !{!0}
 !0 = !{!"clang version 4.0.0 (http://llvm.org/git/clang.git 081569d9a29c7bc827b2d41f8e62891bbc895e2f) (http://llvm.org/git/llvm.git e117e506536626352e8e47f6c72cd6e2a276622c)"}
--- a/polly/www/experiments/matmul/matmul.s
+++ b/polly/www/experiments/matmul/matmul.s
@@ -1,5 +1,6 @@
 ; ModuleID = 'matmul.c'
-target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+source_filename = "matmul.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 %struct._IO_FILE = type { i32, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, i8*, %struct._IO_marker*, %struct._IO_FILE*, i32, i32, i64, i16, i8, [1 x i8], i8*, i64, i8*, i8*, i8*, i8*, i64, i32, [20 x i8] }
@@ -7,10 +8,10 @@ target triple = "x86_64-unknown-linux-gnu"
@A = common global [1536 x [1536 x float]] zeroinitializer, align 16
@B = common global [1536 x [1536 x float]] zeroinitializer, align 16
-@stdout = external global %struct._IO_FILE*
+@stdout = external global %struct._IO_FILE*, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lf \00", align 1
@C = common global [1536 x [1536 x float]] zeroinitializer, align 16
-@.str1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
+@.str.1 = private unnamed_addr constant [2 x i8] c"\0A\00", align 1
 ; Function Attrs: nounwind uwtable
 define void @init_array() #0 {
@@ -21,7 +22,7 @@ entry:
  br label %for.cond
 for.cond:                                         ; preds = %for.inc17, %entry
-  %0 = load i32* %i, align 4
+  %0 = load i32, i32* %i, align 4
  %cmp = icmp slt i32 %0, 1536
  br i1 %cmp, label %for.body, label %for.end19
@@ -30,45 +31,45 @@ for.body:                                         ; preds = %for.cond
  br label %for.cond1
 for.cond1:                                        ; preds = %for.inc, %for.body
-  %1 = load i32* %j, align 4
+  %1 = load i32, i32* %j, align 4
  %cmp2 = icmp slt i32 %1, 1536
  br i1 %cmp2, label %for.body3, label %for.end
 for.body3:                                        ; preds = %for.cond1
-  %2 = load i32* %i, align 4
+  %2 = load i32, i32* %i, align 4
-  %3 = load i32* %j, align 4
+  %3 = load i32, i32* %j, align 4
  %mul = mul nsw i32 %2, %3
  %rem = srem i32 %mul, 1024
  %add = add nsw i32 1, %rem
  %conv = sitofp i32 %add to double
  %div = fdiv double %conv, 2.000000e+00
  %conv4 = fptrunc double %div to float
-  %4 = load i32* %j, align 4
+  %4 = load i32, i32* %j, align 4
  %idxprom = sext i32 %4 to i64
-  %5 = load i32* %i, align 4
+  %5 = load i32, i32* %i, align 4
  %idxprom5 = sext i32 %5 to i64
-  %arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %idxprom5
+  %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %idxprom5
-  %arrayidx6 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom
+  %arrayidx6 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom
  store float %conv4, float* %arrayidx6, align 4
-  %6 = load i32* %i, align 4
+  %6 = load i32, i32* %i, align 4
-  %7 = load i32* %j, align 4
+  %7 = load i32, i32* %j, align 4
  %mul7 = mul nsw i32 %6, %7
  %rem8 = srem i32 %mul7, 1024
  %add9 = add nsw i32 1, %rem8
  %conv10 = sitofp i32 %add9 to double
  %div11 = fdiv double %conv10, 2.000000e+00
  %conv12 = fptrunc double %div11 to float
-  %8 = load i32* %j, align 4
+  %8 = load i32, i32* %j, align 4
  %idxprom13 = sext i32 %8 to i64
-  %9 = load i32* %i, align 4
+  %9 = load i32, i32* %i, align 4
  %idxprom14 = sext i32 %9 to i64
-  %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %idxprom14
+  %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %idxprom14
-  %arrayidx16 = getelementptr inbounds [1536 x float]* %arrayidx15, i32 0, i64 %idxprom13
+  %arrayidx16 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx15, i64 0, i64 %idxprom13
  store float %conv12, float* %arrayidx16, align 4
  br label %for.inc
 for.inc:                                          ; preds = %for.body3
-  %10 = load i32* %j, align 4
+  %10 = load i32, i32* %j, align 4
  %inc = add nsw i32 %10, 1
  store i32 %inc, i32* %j, align 4
  br label %for.cond1
@@ -77,7 +78,7 @@ for.end:                                          ; preds = %for.cond1
  br label %for.inc17
 for.inc17:                                        ; preds = %for.end
-  %11 = load i32* %i, align 4
+  %11 = load i32, i32* %i, align 4
  %inc18 = add nsw i32 %11, 1
  store i32 %inc18, i32* %i, align 4
  br label %for.cond
@@ -95,7 +96,7 @@ entry:
  br label %for.cond
 for.cond:                                         ; preds = %for.inc10, %entry
-  %0 = load i32* %i, align 4
+  %0 = load i32, i32* %i, align 4
  %cmp = icmp slt i32 %0, 1536
  br i1 %cmp, label %for.body, label %for.end12
@@ -104,47 +105,47 @@ for.body:                                         ; preds = %for.cond
  br label %for.cond1
 for.cond1:                                        ; preds = %for.inc, %for.body
-  %1 = load i32* %j, align 4
+  %1 = load i32, i32* %j, align 4
  %cmp2 = icmp slt i32 %1, 1536
  br i1 %cmp2, label %for.body3, label %for.end
 for.body3:                                        ; preds = %for.cond1
-  %2 = load %struct._IO_FILE** @stdout, align 8
+  %2 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
-  %3 = load i32* %j, align 4
+  %3 = load i32, i32* %j, align 4
  %idxprom = sext i32 %3 to i64
-  %4 = load i32* %i, align 4
+  %4 = load i32, i32* %i, align 4
  %idxprom4 = sext i32 %4 to i64
-  %arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom4
+  %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom4
-  %arrayidx5 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom
+  %arrayidx5 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom
-  %5 = load float* %arrayidx5, align 4
+  %5 = load float, float* %arrayidx5, align 4
  %conv = fpext float %5 to double
-  %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv)
+  %call = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), double %conv)
-  %6 = load i32* %j, align 4
+  %6 = load i32, i32* %j, align 4
  %rem = srem i32 %6, 80
  %cmp6 = icmp eq i32 %rem, 79
  br i1 %cmp6, label %if.then, label %if.end
 if.then:                                          ; preds = %for.body3
-  %7 = load %struct._IO_FILE** @stdout, align 8
+  %7 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
-  %call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))
+  %call8 = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %7, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0))
  br label %if.end
 if.end:                                           ; preds = %if.then, %for.body3
  br label %for.inc
 for.inc:                                          ; preds = %if.end
-  %8 = load i32* %j, align 4
+  %8 = load i32, i32* %j, align 4
  %inc = add nsw i32 %8, 1
  store i32 %inc, i32* %j, align 4
  br label %for.cond1
 for.end:                                          ; preds = %for.cond1
-  %9 = load %struct._IO_FILE** @stdout, align 8
+  %9 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8
-  %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))
+  %call9 = call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %9, i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0))
  br label %for.inc10
 for.inc10:                                        ; preds = %for.end
-  %10 = load i32* %i, align 4
+  %10 = load i32, i32* %i, align 4
  %inc11 = add nsw i32 %10, 1
  store i32 %inc11, i32* %i, align 4
  br label %for.cond
@@ -164,13 +165,13 @@ entry:
  %k = alloca i32, align 4
  %t_start = alloca double, align 8
  %t_end = alloca double, align 8
-  store i32 0, i32* %retval
+  store i32 0, i32* %retval, align 4
  call void @init_array()
  store i32 0, i32* %i, align 4
  br label %for.cond
 for.cond:                                         ; preds = %for.inc28, %entry
-  %0 = load i32* %i, align 4
+  %0 = load i32, i32* %i, align 4
  %cmp = icmp slt i32 %0, 1536
  br i1 %cmp, label %for.body, label %for.end30
@@ -179,61 +180,61 @@ for.body:                                         ; preds = %for.cond
  br label %for.cond1
 for.cond1:                                        ; preds = %for.inc25, %for.body
-  %1 = load i32* %j, align 4
+  %1 = load i32, i32* %j, align 4
  %cmp2 = icmp slt i32 %1, 1536
  br i1 %cmp2, label %for.body3, label %for.end27
 for.body3:                                        ; preds = %for.cond1
-  %2 = load i32* %j, align 4
+  %2 = load i32, i32* %j, align 4
  %idxprom = sext i32 %2 to i64
-  %3 = load i32* %i, align 4
+  %3 = load i32, i32* %i, align 4
  %idxprom4 = sext i32 %3 to i64
-  %arrayidx = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom4
+  %arrayidx = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom4
-  %arrayidx5 = getelementptr inbounds [1536 x float]* %arrayidx, i32 0, i64 %idxprom
+  %arrayidx5 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx, i64 0, i64 %idxprom
  store float 0.000000e+00, float* %arrayidx5, align 4
  store i32 0, i32* %k, align 4
  br label %for.cond6
 for.cond6:                                        ; preds = %for.inc, %for.body3
-  %4 = load i32* %k, align 4
+  %4 = load i32, i32* %k, align 4
  %cmp7 = icmp slt i32 %4, 1536
  br i1 %cmp7, label %for.body8, label %for.end
 for.body8:                                        ; preds = %for.cond6
-  %5 = load i32* %j, align 4
+  %5 = load i32, i32* %j, align 4
  %idxprom9 = sext i32 %5 to i64
-  %6 = load i32* %i, align 4
+  %6 = load i32, i32* %i, align 4
  %idxprom10 = sext i32 %6 to i64
-  %arrayidx11 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom10
+  %arrayidx11 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom10
-  %arrayidx12 = getelementptr inbounds [1536 x float]* %arrayidx11, i32 0, i64 %idxprom9
+  %arrayidx12 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx11, i64 0, i64 %idxprom9
-  %7 = load float* %arrayidx12, align 4
+  %7 = load float, float* %arrayidx12, align 4
-  %8 = load i32* %k, align 4
+  %8 = load i32, i32* %k, align 4
  %idxprom13 = sext i32 %8 to i64
-  %9 = load i32* %i, align 4
+  %9 = load i32, i32* %i, align 4
  %idxprom14 = sext i32 %9 to i64
-  %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]]* @A, i32 0, i64 %idxprom14
+  %arrayidx15 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @A, i64 0, i64 %idxprom14
-  %arrayidx16 = getelementptr inbounds [1536 x float]* %arrayidx15, i32 0, i64 %idxprom13
+  %arrayidx16 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx15, i64 0, i64 %idxprom13
-  %10 = load float* %arrayidx16, align 4
+  %10 = load float, float* %arrayidx16, align 4
-  %11 = load i32* %j, align 4
+  %11 = load i32, i32* %j, align 4
  %idxprom17 = sext i32 %11 to i64
-  %12 = load i32* %k, align 4
+  %12 = load i32, i32* %k, align 4
  %idxprom18 = sext i32 %12 to i64
-  %arrayidx19 = getelementptr inbounds [1536 x [1536 x float]]* @B, i32 0, i64 %idxprom18
+  %arrayidx19 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @B, i64 0, i64 %idxprom18
-  %arrayidx20 = getelementptr inbounds [1536 x float]* %arrayidx19, i32 0, i64 %idxprom17
+  %arrayidx20 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx19, i64 0, i64 %idxprom17
-  %13 = load float* %arrayidx20, align 4
+  %13 = load float, float* %arrayidx20, align 4
  %mul = fmul float %10, %13
  %add = fadd float %7, %mul
-  %14 = load i32* %j, align 4
+  %14 = load i32, i32* %j, align 4
  %idxprom21 = sext i32 %14 to i64
-  %15 = load i32* %i, align 4
+  %15 = load i32, i32* %i, align 4
  %idxprom22 = sext i32 %15 to i64
-  %arrayidx23 = getelementptr inbounds [1536 x [1536 x float]]* @C, i32 0, i64 %idxprom22
+  %arrayidx23 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x float]]* @C, i64 0, i64 %idxprom22
-  %arrayidx24 = getelementptr inbounds [1536 x float]* %arrayidx23, i32 0, i64 %idxprom21
+  %arrayidx24 = getelementptr inbounds [1536 x float], [1536 x float]* %arrayidx23, i64 0, i64 %idxprom21
  store float %add, float* %arrayidx24, align 4
  br label %for.inc
 for.inc:                                          ; preds = %for.body8
-  %16 = load i32* %k, align 4
+  %16 = load i32, i32* %k, align 4
  %inc = add nsw i32 %16, 1
  store i32 %inc, i32* %k, align 4
  br label %for.cond6
@ -242,7 +243,7 @@ for.end:                                          ; preds = %for.cond6
  br label %for.inc25
 for.inc25:                                        ; preds = %for.end
-  %17 = load i32* %j, align 4
+  %17 = load i32, i32* %j, align 4
  %inc26 = add nsw i32 %17, 1
  store i32 %inc26, i32* %j, align 4
  br label %for.cond1
@ -251,7 +252,7 @@ for.end27:                                        ; preds = %for.cond1
  br label %for.inc28
 for.inc28:                                        ; preds = %for.end27
-  %18 = load i32* %i, align 4
+  %18 = load i32, i32* %i, align 4
  %inc29 = add nsw i32 %18, 1
  store i32 %inc29, i32* %i, align 4
  br label %for.cond
@ -260,5 +261,9 @@ for.end30:                                        ; preds = %for.cond
  ret i32 0
 }
-attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
-attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
 !llvm.ident = !{!0}
 !0 = !{!"clang version 4.0.0 (http://llvm.org/git/clang.git 081569d9a29c7bc827b2d41f8e62891bbc895e2f) (http://llvm.org/git/llvm.git e117e506536626352e8e47f6c72cd6e2a276622c)"}
--- a/polly/www/experiments/matmul/runall.sh
+++ b/polly/www/experiments/matmul/runall.sh
@ -3,68 +3,69 @@
 echo "--> 1. Create LLVM-IR from C"
 clang -S -emit-llvm matmul.c -o matmul.s
-echo "--> 2. Load Polly automatically when calling the 'opt' tool"
+echo "--> 2. Prepare the LLVM-IR for Polly"
 export PATH_TO_POLLY_LIB="~/polly/build/lib/"
 alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"
 echo "--> 3. Prepare the LLVM-IR for Polly"
 opt -S -polly-canonicalize matmul.s > matmul.preopt.ll
-echo "--> 4. Show the SCoPs detected by Polly"
+echo "--> 3. Show the SCoPs detected by Polly"
-opt -basicaa -polly-ast -analyze -q matmul.preopt.ll
+opt -basicaa -polly-ast -analyze -q matmul.preopt.ll \
    -polly-process-unprofitable
-echo "--> 5.1 Highlight the detected SCoPs in the CFGs of the program"
+echo "--> 4.1 Highlight the detected SCoPs in the CFGs of the program"
 # We only create .dot files, as directly -view-scops directly calls graphviz
 # which would require user interaction to continue the script.
 # opt -basicaa -view-scops -disable-output matmul.preopt.ll
 opt -basicaa -dot-scops -disable-output matmul.preopt.ll
-echo "--> 5.2 Highlight the detected SCoPs in the CFGs of the program (print \
+echo "--> 4.2 Highlight the detected SCoPs in the CFGs of the program (print \
 no instructions)"
 # We only create .dot files, as directly -view-scops-only directly calls
 # graphviz which would require user interaction to continue the script.
 # opt -basicaa -view-scops-only -disable-output matmul.preopt.ll
 opt -basicaa -dot-scops-only -disable-output matmul.preopt.ll
-echo "--> 5.3 Create .png files from the .dot files"
+echo "--> 4.3 Create .png files from the .dot files"
 for i in `ls *.dot`; do dot -Tpng $i > $i.png; done
-echo "--> 6. View the polyhedral representation of the SCoPs"
+echo "--> 5. View the polyhedral representation of the SCoPs"
-opt -basicaa -polly-scops -analyze matmul.preopt.ll
+opt -basicaa -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable
-echo "--> 7. Show the dependences for the SCoPs"
+echo "--> 6. Show the dependences for the SCoPs"
-opt -basicaa -polly-dependences -analyze matmul.preopt.ll
+opt -basicaa -polly-dependences -analyze matmul.preopt.ll \
    -polly-process-unprofitable
-echo "--> 8. Export jscop files"
+echo "--> 7. Export jscop files"
-opt -basicaa -polly-export-jscop matmul.preopt.ll
+opt -basicaa -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable
-echo "--> 9. Import the updated jscop files and print the new SCoPs. (optional)"
+echo "--> 8. Import the updated jscop files and print the new SCoPs. (optional)"
 opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll
 opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-    -polly-import-jscop-postfix=interchanged
+    -polly-process-unprofitable
 opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-    -polly-import-jscop-postfix=interchanged+tiled
+    -polly-import-jscop-postfix=interchanged -polly-process-unprofitable
 opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
-    -polly-import-jscop-postfix=interchanged+tiled+vector
+    -polly-import-jscop-postfix=interchanged+tiled -polly-process-unprofitable
 opt -basicaa -polly-import-jscop -polly-ast -analyze matmul.preopt.ll \
    -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-process-unprofitable
-echo "--> 10. Codegenerate the SCoPs"
+echo "--> 9. Codegenerate the SCoPs"
 opt -basicaa -polly-import-jscop -polly-import-jscop-postfix=interchanged \
-    -polly-codegen \
+    -polly-codegen -polly-process-unprofitable\
    matmul.preopt.ll | opt -O3 > matmul.polly.interchanged.ll
 opt -basicaa -polly-import-jscop \
    -polly-import-jscop-postfix=interchanged+tiled -polly-codegen \
-    matmul.preopt.ll | opt -O3 > matmul.polly.interchanged+tiled.ll
+    matmul.preopt.ll -polly-process-unprofitable \
-opt -basicaa -polly-import-jscop \
+    | opt -O3 > matmul.polly.interchanged+tiled.ll
 opt -basicaa -polly-import-jscop -polly-process-unprofitable\
    -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \
    matmul.preopt.ll -polly-vectorizer=polly\
    | opt -O3 > matmul.polly.interchanged+tiled+vector.ll
-opt -basicaa -polly-import-jscop \
+opt -basicaa -polly-import-jscop -polly-process-unprofitable\
    -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen \
    matmul.preopt.ll -polly-vectorizer=polly -polly-parallel\
    | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll
 opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll
-echo "--> 11. Create the executables"
+echo "--> 10. Create the executables"
 llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s \
    -o matmul.polly.interchanged.exe
 llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s \
@ -80,7 +81,7 @@ llc matmul.polly.interchanged+tiled+vector+openmp.ll \
 llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s \
    -o matmul.normalopt.exe
-echo "--> 12. Compare the runtime of the executables"
+echo "--> 11. Compare the runtime of the executables"
 echo "time ./matmul.normalopt.exe"
 time -f "%E real, %U user, %S sys" ./matmul.normalopt.exe
--- a/polly/www/experiments/matmul/scops.init_array.dot
+++ b/polly/www/experiments/matmul/scops.init_array.dot
@ -1,47 +1,39 @@
 digraph "Scop Graph for 'init_array' function" {
 	label="Scop Graph for 'init_array' function";
-	Node0x17d4370 [shape=record,label="{entry:\l  br label %for.cond\l}"];
+	Node0x5b5b5a0 [shape=record,label="{entry:\l  br label %entry.split\l}"];
-	Node0x17d4370 -> Node0x17da5d0;
+	Node0x5b5b5a0 -> Node0x5b5de30;
-	Node0x17da5d0 [shape=record,label="{for.cond:                                         \l  %0 = phi i64 [ %indvar.next2, %for.inc17 ], [ 0, %entry ]\l  %exitcond3 = icmp ne i64 %0, 1536\l  br i1 %exitcond3, label %for.body, label %for.end19\l}"];
+	Node0x5b5de30 [shape=record,label="{entry.split:                                      \l  br label %for.cond1.preheader\l}"];
-	Node0x17da5d0 -> Node0x17da5f0;
+	Node0x5b5de30 -> Node0x5b5de50;
-	Node0x17da5d0 -> Node0x17da650;
+	Node0x5b5de50 [shape=record,label="{for.cond1.preheader:                              \l  %indvars.iv5 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next6, %for.inc17 ]\l  br label %for.body3\l}"];
-	Node0x17da5f0 [shape=record,label="{for.body:                                         \l  br label %for.cond1\l}"];
+	Node0x5b5de50 -> Node0x5b5b570;
-	Node0x17da5f0 -> Node0x17da900;
+	Node0x5b5b570 [shape=record,label="{for.body3:                                        \l  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next,\l... %for.body3 ]\l  %0 = mul nuw nsw i64 %indvars.iv, %indvars.iv5\l  %1 = trunc i64 %0 to i32\l  %rem = srem i32 %1, 1024\l  %add = add nsw i32 %rem, 1\l  %conv = sitofp i32 %add to double\l  %div = fmul double %conv, 5.000000e-01\l  %conv4 = fptrunc double %div to float\l  %arrayidx6 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @A, i64 0, i64 %indvars.iv5, i64 %indvars.iv\l  store float %conv4, float* %arrayidx6, align 4\l  %2 = mul nuw nsw i64 %indvars.iv, %indvars.iv5\l  %3 = trunc i64 %2 to i32\l  %rem8 = srem i32 %3, 1024\l  %add9 = add nsw i32 %rem8, 1\l  %conv10 = sitofp i32 %add9 to double\l  %div11 = fmul double %conv10, 5.000000e-01\l  %conv12 = fptrunc double %div11 to float\l  %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @B, i64 0, i64 %indvars.iv5, i64 %indvars.iv\l  store float %conv12, float* %arrayidx16, align 4\l  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l  %exitcond = icmp ne i64 %indvars.iv.next, 1536\l  br i1 %exitcond, label %for.body3, label %for.inc17\l}"];
-	Node0x17da900 [shape=record,label="{for.cond1:                                        \l  %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]\l  %arrayidx6 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %0, i64 %indvar\l  %arrayidx16 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %0, i64 %indvar\l  %1 = mul i64 %0, %indvar\l  %mul = trunc i64 %1 to i32\l  %exitcond = icmp ne i64 %indvar, 1536\l  br i1 %exitcond, label %for.body3, label %for.end\l}"];
+	Node0x5b5b570 -> Node0x5b5b570[constraint=false];
-	Node0x17da900 -> Node0x17da670;
+	Node0x5b5b570 -> Node0x5b5df30;
-	Node0x17da900 -> Node0x17da9a0;
+	Node0x5b5df30 [shape=record,label="{for.inc17:                                        \l  %indvars.iv.next6 = add nuw nsw i64 %indvars.iv5, 1\l  %exitcond7 = icmp ne i64 %indvars.iv.next6, 1536\l  br i1 %exitcond7, label %for.cond1.preheader, label %for.end19\l}"];
-	Node0x17da670 [shape=record,label="{for.body3:                                        \l  %rem = srem i32 %mul, 1024\l  %add = add nsw i32 1, %rem\l  %conv = sitofp i32 %add to double\l  %div = fdiv double %conv, 2.000000e+00\l  %conv4 = fptrunc double %div to float\l  store float %conv4, float* %arrayidx6, align 4\l  %rem8 = srem i32 %mul, 1024\l  %add9 = add nsw i32 1, %rem8\l  %conv10 = sitofp i32 %add9 to double\l  %div11 = fdiv double %conv10, 2.000000e+00\l  %conv12 = fptrunc double %div11 to float\l  store float %conv12, float* %arrayidx16, align 4\l  br label %for.inc\l}"];
+	Node0x5b5df30 -> Node0x5b5de50[constraint=false];
-	Node0x17da670 -> Node0x17da8e0;
+	Node0x5b5df30 -> Node0x5b5df90;
-	Node0x17da8e0 [shape=record,label="{for.inc:                                          \l  %indvar.next = add i64 %indvar, 1\l  br label %for.cond1\l}"];
+	Node0x5b5df90 [shape=record,label="{for.end19:                                        \l  ret void\l}"];
 	Node0x17da8e0 -> Node0x17da900[constraint=false];
 	Node0x17da9a0 [shape=record,label="{for.end:                                          \l  br label %for.inc17\l}"];
 	Node0x17da9a0 -> Node0x17d9e70;
 	Node0x17d9e70 [shape=record,label="{for.inc17:                                        \l  %indvar.next2 = add i64 %0, 1\l  br label %for.cond\l}"];
 	Node0x17d9e70 -> Node0x17da5d0[constraint=false];
 	Node0x17da650 [shape=record,label="{for.end19:                                        \l  ret void\l}"];
 	colorscheme = "paired12"
-        subgraph cluster_0x17d3a30 {
+        subgraph cluster_0x5b4bdd0 {
          label = "";
          style = solid;
          color = 1
-          subgraph cluster_0x17d4ec0 {
+          subgraph cluster_0x5b4bf50 {
-            label = "";
+            label = "Region can not profitably be optimized!";
-            style = filled;
+            style = solid;
-            color = 3            subgraph cluster_0x17d4180 {
+            color = 6
            subgraph cluster_0x5b4c0d0 {
              label = "";
              style = solid;
              color = 5
-              Node0x17da900;
+              Node0x5b5b570;
              Node0x17da670;
              Node0x17da8e0;
            }
-            Node0x17da5d0;
+            Node0x5b5de50;
-            Node0x17da5f0;
+            Node0x5b5df30;
            Node0x17da9a0;
            Node0x17d9e70;
          }
-          Node0x17d4370;
+          Node0x5b5b5a0;
-          Node0x17da650;
+          Node0x5b5de30;
          Node0x5b5df90;
        }
 }
--- a/polly/www/experiments/matmul/scops.main.dot
+++ b/polly/www/experiments/matmul/scops.main.dot
@ -1,65 +1,50 @@
 digraph "Scop Graph for 'main' function" {
 	label="Scop Graph for 'main' function";
-	Node0x17d21a0 [shape=record,label="{entry:\l  call void @init_array()\l  br label %for.cond\l}"];
+	Node0x5b5c850 [shape=record,label="{entry:\l  br label %entry.split\l}"];
-	Node0x17d21a0 -> Node0x17d2020;
+	Node0x5b5c850 -> Node0x5b5a440;
-	Node0x17d2020 [shape=record,label="{for.cond:                                         \l  %indvar3 = phi i64 [ %indvar.next4, %for.inc28 ], [ 0, %entry ]\l  %exitcond6 = icmp ne i64 %indvar3, 1536\l  br i1 %exitcond6, label %for.body, label %for.end30\l}"];
+	Node0x5b5a440 [shape=record,label="{entry.split:                                      \l  tail call void @init_array()\l  br label %for.cond1.preheader\l}"];
-	Node0x17d2020 -> Node0x17d3950;
+	Node0x5b5a440 -> Node0x5b38cd0;
-	Node0x17d2020 -> Node0x17da500;
+	Node0x5b38cd0 [shape=record,label="{for.cond1.preheader:                              \l  %indvars.iv7 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next8, %for.inc28 ]\l  br label %for.body3\l}"];
-	Node0x17d3950 [shape=record,label="{for.body:                                         \l  br label %for.cond1\l}"];
+	Node0x5b38cd0 -> Node0x5b4bd30;
-	Node0x17d3950 -> Node0x17da760;
+	Node0x5b4bd30 [shape=record,label="{for.body3:                                        \l  %indvars.iv4 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next5,\l... %for.inc25 ]\l  %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l  store float 0.000000e+00, float* %arrayidx5, align 4\l  br label %for.body8\l}"];
-	Node0x17da760 [shape=record,label="{for.cond1:                                        \l  %indvar1 = phi i64 [ %indvar.next2, %for.inc25 ], [ 0, %for.body ]\l  %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar3, i64 %indvar1\l  %exitcond5 = icmp ne i64 %indvar1, 1536\l  br i1 %exitcond5, label %for.body3, label %for.end27\l}"];
+	Node0x5b4bd30 -> Node0x5b38c50;
-	Node0x17da760 -> Node0x17db1e0;
+	Node0x5b38c50 [shape=record,label="{for.body8:                                        \l  %indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body8 ]\l  %arrayidx12 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l  %0 = load float, float* %arrayidx12, align 4\l  %arrayidx16 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @A, i64 0, i64 %indvars.iv7, i64 %indvars.iv\l  %1 = load float, float* %arrayidx16, align 4\l  %arrayidx20 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv4\l  %2 = load float, float* %arrayidx20, align 4\l  %mul = fmul float %1, %2\l  %add = fadd float %0, %mul\l  %arrayidx24 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536\l... x float]]* @C, i64 0, i64 %indvars.iv7, i64 %indvars.iv4\l  store float %add, float* %arrayidx24, align 4\l  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l  %exitcond = icmp ne i64 %indvars.iv.next, 1536\l  br i1 %exitcond, label %for.body8, label %for.inc25\l}"];
-	Node0x17da760 -> Node0x17db250;
+	Node0x5b38c50 -> Node0x5b38c50[constraint=false];
-	Node0x17db1e0 [shape=record,label="{for.body3:                                        \l  store float 0.000000e+00, float* %arrayidx5, align 4\l  br label %for.cond6\l}"];
+	Node0x5b38c50 -> Node0x5b5a290;
-	Node0x17db1e0 -> Node0x17da740;
+	Node0x5b5a290 [shape=record,label="{for.inc25:                                        \l  %indvars.iv.next5 = add nuw nsw i64 %indvars.iv4, 1\l  %exitcond6 = icmp ne i64 %indvars.iv.next5, 1536\l  br i1 %exitcond6, label %for.body3, label %for.inc28\l}"];
-	Node0x17da740 [shape=record,label="{for.cond6:                                        \l  %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body3 ]\l  %arrayidx16 = getelementptr [1536 x [1536 x float]]* @A, i64 0, i64 %indvar3, i64 %indvar\l  %arrayidx20 = getelementptr [1536 x [1536 x float]]* @B, i64 0, i64 %indvar, i64 %indvar1\l  %exitcond = icmp ne i64 %indvar, 1536\l  br i1 %exitcond, label %for.body8, label %for.end\l}"];
+	Node0x5b5a290 -> Node0x5b4bd30[constraint=false];
-	Node0x17da740 -> Node0x17da5a0;
+	Node0x5b5a290 -> Node0x5b5a340;
-	Node0x17da740 -> Node0x17da800;
+	Node0x5b5a340 [shape=record,label="{for.inc28:                                        \l  %indvars.iv.next8 = add nuw nsw i64 %indvars.iv7, 1\l  %exitcond9 = icmp ne i64 %indvars.iv.next8, 1536\l  br i1 %exitcond9, label %for.cond1.preheader, label %for.end30\l}"];
-	Node0x17da5a0 [shape=record,label="{for.body8:                                        \l  %0 = load float* %arrayidx5, align 4\l  %1 = load float* %arrayidx16, align 4\l  %2 = load float* %arrayidx20, align 4\l  %mul = fmul float %1, %2\l  %add = fadd float %0, %mul\l  store float %add, float* %arrayidx5, align 4\l  br label %for.inc\l}"];
+	Node0x5b5a340 -> Node0x5b38cd0[constraint=false];
-	Node0x17da5a0 -> Node0x17da5c0;
+	Node0x5b5a340 -> Node0x5b5a3a0;
-	Node0x17da5c0 [shape=record,label="{for.inc:                                          \l  %indvar.next = add i64 %indvar, 1\l  br label %for.cond6\l}"];
+	Node0x5b5a3a0 [shape=record,label="{for.end30:                                        \l  ret i32 0\l}"];
 	Node0x17da5c0 -> Node0x17da740[constraint=false];
 	Node0x17da800 [shape=record,label="{for.end:                                          \l  br label %for.inc25\l}"];
 	Node0x17da800 -> Node0x17dae20;
 	Node0x17dae20 [shape=record,label="{for.inc25:                                        \l  %indvar.next2 = add i64 %indvar1, 1\l  br label %for.cond1\l}"];
 	Node0x17dae20 -> Node0x17da760[constraint=false];
 	Node0x17db250 [shape=record,label="{for.end27:                                        \l  br label %for.inc28\l}"];
 	Node0x17db250 -> Node0x17dae80;
 	Node0x17dae80 [shape=record,label="{for.inc28:                                        \l  %indvar.next4 = add i64 %indvar3, 1\l  br label %for.cond\l}"];
 	Node0x17dae80 -> Node0x17d2020[constraint=false];
 	Node0x17da500 [shape=record,label="{for.end30:                                        \l  ret i32 0\l}"];
 	colorscheme = "paired12"
-        subgraph cluster_0x17d3f30 {
+        subgraph cluster_0x5b5c970 {
          label = "";
          style = solid;
          color = 1
-          subgraph cluster_0x17d38d0 {
+          subgraph cluster_0x5b5c5a0 {
            label = "";
            style = filled;
-            color = 3            subgraph cluster_0x17d3850 {
+            color = 3            subgraph cluster_0x5b5c9f0 {
              label = "";
              style = solid;
              color = 5
-              subgraph cluster_0x17d37d0 {
+              subgraph cluster_0x5b5c110 {
                label = "";
                style = solid;
                color = 7
-                Node0x17da740;
+                Node0x5b38c50;
                Node0x17da5a0;
                Node0x17da5c0;
              }
-              Node0x17da760;
+              Node0x5b4bd30;
-              Node0x17db1e0;
+              Node0x5b5a290;
              Node0x17da800;
              Node0x17dae20;
            }
-            Node0x17d2020;
+            Node0x5b38cd0;
-            Node0x17d3950;
+            Node0x5b5a340;
            Node0x17db250;
            Node0x17dae80;
          }
-          Node0x17d21a0;
+          Node0x5b5c850;
-          Node0x17da500;
+          Node0x5b5a440;
          Node0x5b5a3a0;
        }
 }
--- a/polly/www/experiments/matmul/scops.print_array.dot
+++ b/polly/www/experiments/matmul/scops.print_array.dot
@ -1,60 +1,51 @@
 digraph "Scop Graph for 'print_array' function" {
 	label="Scop Graph for 'print_array' function";
-	Node0x17d2200 [shape=record,label="{entry:\l  br label %for.cond\l}"];
+	Node0x5b5ee00 [shape=record,label="{entry:\l  br label %entry.split\l}"];
-	Node0x17d2200 -> Node0x17d4f20;
+	Node0x5b5ee00 -> Node0x5b5ee50;
-	Node0x17d4f20 [shape=record,label="{for.cond:                                         \l  %indvar1 = phi i64 [ %indvar.next2, %for.inc10 ], [ 0, %entry ]\l  %exitcond3 = icmp ne i64 %indvar1, 1536\l  br i1 %exitcond3, label %for.body, label %for.end12\l}"];
+	Node0x5b5ee50 [shape=record,label="{entry.split:                                      \l  br label %for.cond1.preheader\l}"];
-	Node0x17d4f20 -> Node0x17d3680;
+	Node0x5b5ee50 -> Node0x5b5ee70;
-	Node0x17d4f20 -> Node0x17d9fc0;
+	Node0x5b5ee70 [shape=record,label="{for.cond1.preheader:                              \l  %indvars.iv6 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next7, %for.end ]\l  %0 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l  br label %for.body3\l}"];
-	Node0x17d3680 [shape=record,label="{for.body:                                         \l  br label %for.cond1\l}"];
+	Node0x5b5ee70 -> Node0x5b5ee20;
-	Node0x17d3680 -> Node0x17da220;
+	Node0x5b5ee20 [shape=record,label="{for.body3:                                        \l  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next,\l... %for.inc ]\l  %1 = phi %struct._IO_FILE* [ %0, %for.cond1.preheader ], [ %5, %for.inc ]\l  %arrayidx5 = getelementptr inbounds [1536 x [1536 x float]], [1536 x [1536 x\l... float]]* @C, i64 0, i64 %indvars.iv6, i64 %indvars.iv\l  %2 = load float, float* %arrayidx5, align 4\l  %conv = fpext float %2 to double\l  %call = tail call i32 (%struct._IO_FILE*, i8*, ...)\l... @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x\l... i8]* @.str, i64 0, i64 0), double %conv) #2\l  %3 = trunc i64 %indvars.iv to i32\l  %rem = srem i32 %3, 80\l  %cmp6 = icmp eq i32 %rem, 79\l  br i1 %cmp6, label %if.then, label %for.inc\l}"];
-	Node0x17da220 [shape=record,label="{for.cond1:                                        \l  %indvar = phi i64 [ %indvar.next, %for.inc ], [ 0, %for.body ]\l  %arrayidx5 = getelementptr [1536 x [1536 x float]]* @C, i64 0, i64 %indvar1, i64 %indvar\l  %j.0 = trunc i64 %indvar to i32\l  %exitcond = icmp ne i64 %indvar, 1536\l  br i1 %exitcond, label %for.body3, label %for.end\l}"];
+	Node0x5b5ee20 -> Node0x5b60d10;
-	Node0x17da220 -> Node0x17d9ea0;
+	Node0x5b5ee20 -> Node0x5b60d70;
-	Node0x17da220 -> Node0x17da0f0;
+	Node0x5b60d10 [shape=record,label="{if.then:                                          \l  %4 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l  %fputc3 = tail call i32 @fputc(i32 10, %struct._IO_FILE* %4)\l  br label %for.inc\l}"];
-	Node0x17d9ea0 [shape=record,label="{for.body3:                                        \l  %0 = load %struct._IO_FILE** @stdout, align 8\l  %1 = load float* %arrayidx5, align 4\l  %conv = fpext float %1 to double\l  %call = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %0, i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %conv)\l  %rem = srem i32 %j.0, 80\l  %cmp6 = icmp eq i32 %rem, 79\l  br i1 %cmp6, label %if.then, label %if.end\l}"];
+	Node0x5b60d10 -> Node0x5b60d70;
-	Node0x17d9ea0 -> Node0x17d9ec0;
+	Node0x5b60d70 [shape=record,label="{for.inc:                                          \l  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1\l  %5 = load %struct._IO_FILE*, %struct._IO_FILE** @stdout, align 8\l  %exitcond = icmp ne i64 %indvars.iv.next, 1536\l  br i1 %exitcond, label %for.body3, label %for.end\l}"];
-	Node0x17d9ea0 -> Node0x17da060;
+	Node0x5b60d70 -> Node0x5b5ee20[constraint=false];
-	Node0x17d9ec0 [shape=record,label="{if.then:                                          \l  %2 = load %struct._IO_FILE** @stdout, align 8\l  %call8 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %2, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l  br label %if.end\l}"];
+	Node0x5b60d70 -> Node0x5b60e10;
-	Node0x17d9ec0 -> Node0x17da060;
+	Node0x5b60e10 [shape=record,label="{for.end:                                          \l  %.lcssa = phi %struct._IO_FILE* [ %5, %for.inc ]\l  %fputc = tail call i32 @fputc(i32 10, %struct._IO_FILE* %.lcssa)\l  %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1\l  %exitcond8 = icmp ne i64 %indvars.iv.next7, 1536\l  br i1 %exitcond8, label %for.cond1.preheader, label %for.end12\l}"];
-	Node0x17da060 [shape=record,label="{if.end:                                           \l  br label %for.inc\l}"];
+	Node0x5b60e10 -> Node0x5b5ee70[constraint=false];
-	Node0x17da060 -> Node0x17da200;
+	Node0x5b60e10 -> Node0x5b60e70;
-	Node0x17da200 [shape=record,label="{for.inc:                                          \l  %indvar.next = add i64 %indvar, 1\l  br label %for.cond1\l}"];
+	Node0x5b60e70 [shape=record,label="{for.end12:                                        \l  ret void\l}"];
 	Node0x17da200 -> Node0x17da220[constraint=false];
 	Node0x17da0f0 [shape=record,label="{for.end:                                          \l  %3 = load %struct._IO_FILE** @stdout, align 8\l  %call9 = call i32 (%struct._IO_FILE*, i8*, ...)* @fprintf(%struct._IO_FILE* %3, i8* getelementptr inbounds ([2 x i8]* @.str1, i32 0, i32 0))\l  br label %for.inc10\l}"];
 	Node0x17da0f0 -> Node0x17da080;
 	Node0x17da080 [shape=record,label="{for.inc10:                                        \l  %indvar.next2 = add i64 %indvar1, 1\l  br label %for.cond\l}"];
 	Node0x17da080 -> Node0x17d4f20[constraint=false];
 	Node0x17d9fc0 [shape=record,label="{for.end12:                                        \l  ret void\l}"];
 	colorscheme = "paired12"
-        subgraph cluster_0x17d38f0 {
+        subgraph cluster_0x5b349a0 {
          label = "";
          style = solid;
          color = 1
-          subgraph cluster_0x17d4030 {
+          subgraph cluster_0x5b5c2c0 {
-            label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79";
+            label = "Call instruction:   %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
            style = solid;
            color = 6
-            subgraph cluster_0x17d3fb0 {
+            subgraph cluster_0x5b5c240 {
-              label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79";
+              label = "Call instruction:   %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
              style = solid;
              color = 5
-              subgraph cluster_0x17d3f30 {
+              subgraph cluster_0x5b34a20 {
-                label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79";
+                label = "Region can not profitably be optimized!";
                style = solid;
                color = 7
-                Node0x17d9ea0;
+                Node0x5b5ee20;
-                Node0x17d9ec0;
+                Node0x5b60d10;
              }
-              Node0x17da220;
+              Node0x5b60d70;
              Node0x17da060;
              Node0x17da200;
            }
-            Node0x17d4f20;
+            Node0x5b5ee70;
-            Node0x17d3680;
+            Node0x5b60e10;
            Node0x17da0f0;
            Node0x17da080;
          }
-          Node0x17d2200;
+          Node0x5b5ee00;
-          Node0x17d9fc0;
+          Node0x5b5ee50;
          Node0x5b60e70;
        }
 }
--- a/polly/www/experiments/matmul/scopsonly.init_array.dot
+++ b/polly/www/experiments/matmul/scopsonly.init_array.dot
@ -1,47 +1,39 @@
 digraph "Scop Graph for 'init_array' function" {
 	label="Scop Graph for 'init_array' function";
-	Node0x17d4370 [shape=record,label="{entry}"];
+	Node0x5ae2570 [shape=record,label="{entry}"];
-	Node0x17d4370 -> Node0x17d9de0;
+	Node0x5ae2570 -> Node0x5ae4e90;
-	Node0x17d9de0 [shape=record,label="{for.cond}"];
+	Node0x5ae4e90 [shape=record,label="{entry.split}"];
-	Node0x17d9de0 -> Node0x17d9e40;
+	Node0x5ae4e90 -> Node0x5ae4f50;
-	Node0x17d9de0 -> Node0x17d9ea0;
+	Node0x5ae4f50 [shape=record,label="{for.cond1.preheader}"];
-	Node0x17d9e40 [shape=record,label="{for.body}"];
+	Node0x5ae4f50 -> Node0x5ae50e0;
-	Node0x17d9e40 -> Node0x17d9f90;
+	Node0x5ae50e0 [shape=record,label="{for.body3}"];
-	Node0x17d9f90 [shape=record,label="{for.cond1}"];
+	Node0x5ae50e0 -> Node0x5ae50e0[constraint=false];
-	Node0x17d9f90 -> Node0x17d9ff0;
+	Node0x5ae50e0 -> Node0x5ae5100;
-	Node0x17d9f90 -> Node0x17da050;
+	Node0x5ae5100 [shape=record,label="{for.inc17}"];
-	Node0x17d9ff0 [shape=record,label="{for.body3}"];
+	Node0x5ae5100 -> Node0x5ae4f50[constraint=false];
-	Node0x17d9ff0 -> Node0x17d9f00;
+	Node0x5ae5100 -> Node0x5ae4ff0;
-	Node0x17d9f00 [shape=record,label="{for.inc}"];
+	Node0x5ae4ff0 [shape=record,label="{for.end19}"];
 	Node0x17d9f00 -> Node0x17d9f90[constraint=false];
 	Node0x17da050 [shape=record,label="{for.end}"];
 	Node0x17da050 -> Node0x17da200;
 	Node0x17da200 [shape=record,label="{for.inc17}"];
 	Node0x17da200 -> Node0x17d9de0[constraint=false];
 	Node0x17d9ea0 [shape=record,label="{for.end19}"];
 	colorscheme = "paired12"
-        subgraph cluster_0x17d3a30 {
+        subgraph cluster_0x5ad2dd0 {
          label = "";
          style = solid;
          color = 1
-          subgraph cluster_0x17d4ec0 {
+          subgraph cluster_0x5ad2f50 {
-            label = "";
+            label = "Region can not profitably be optimized!";
-            style = filled;
+            style = solid;
-            color = 3            subgraph cluster_0x17d4180 {
+            color = 6
            subgraph cluster_0x5ad30d0 {
              label = "";
              style = solid;
              color = 5
-              Node0x17d9f90;
+              Node0x5ae50e0;
              Node0x17d9ff0;
              Node0x17d9f00;
            }
-            Node0x17d9de0;
+            Node0x5ae4f50;
-            Node0x17d9e40;
+            Node0x5ae5100;
            Node0x17da050;
            Node0x17da200;
          }
-          Node0x17d4370;
+          Node0x5ae2570;
-          Node0x17d9ea0;
+          Node0x5ae4e90;
          Node0x5ae4ff0;
        }
 }
--- a/polly/www/experiments/matmul/scopsonly.main.dot
+++ b/polly/www/experiments/matmul/scopsonly.main.dot
@ -1,65 +1,50 @@
 digraph "Scop Graph for 'main' function" {
 	label="Scop Graph for 'main' function";
-	Node0x17d3950 [shape=record,label="{entry}"];
+	Node0x5abfcf0 [shape=record,label="{entry}"];
-	Node0x17d3950 -> Node0x17d21a0;
+	Node0x5abfcf0 -> Node0x5ade060;
-	Node0x17d21a0 [shape=record,label="{for.cond}"];
+	Node0x5ade060 [shape=record,label="{entry.split}"];
-	Node0x17d21a0 -> Node0x17db9a0;
+	Node0x5ade060 -> Node0x5ade0e0;
-	Node0x17d21a0 -> Node0x17da4f0;
+	Node0x5ade0e0 [shape=record,label="{for.cond1.preheader}"];
-	Node0x17db9a0 [shape=record,label="{for.body}"];
+	Node0x5ade0e0 -> Node0x5ade100;
-	Node0x17db9a0 -> Node0x17da5e0;
+	Node0x5ade100 [shape=record,label="{for.body3}"];
-	Node0x17da5e0 [shape=record,label="{for.cond1}"];
+	Node0x5ade100 -> Node0x5ae0020;
-	Node0x17da5e0 -> Node0x17da640;
+	Node0x5ae0020 [shape=record,label="{for.body8}"];
-	Node0x17da5e0 -> Node0x17da6a0;
+	Node0x5ae0020 -> Node0x5ae0020[constraint=false];
-	Node0x17da640 [shape=record,label="{for.body3}"];
+	Node0x5ae0020 -> Node0x5ae0080;
-	Node0x17da640 -> Node0x17da550;
+	Node0x5ae0080 [shape=record,label="{for.inc25}"];
-	Node0x17da550 [shape=record,label="{for.cond6}"];
+	Node0x5ae0080 -> Node0x5ade100[constraint=false];
-	Node0x17da550 -> Node0x17da5b0;
+	Node0x5ae0080 -> Node0x5adfef0;
-	Node0x17da550 -> Node0x17da850;
+	Node0x5adfef0 [shape=record,label="{for.inc28}"];
-	Node0x17da5b0 [shape=record,label="{for.body8}"];
+	Node0x5adfef0 -> Node0x5ade0e0[constraint=false];
-	Node0x17da5b0 -> Node0x17da8b0;
+	Node0x5adfef0 -> Node0x5adff50;
-	Node0x17da8b0 [shape=record,label="{for.inc}"];
+	Node0x5adff50 [shape=record,label="{for.end30}"];
 	Node0x17da8b0 -> Node0x17da550[constraint=false];
 	Node0x17da850 [shape=record,label="{for.end}"];
 	Node0x17da850 -> Node0x17db930;
 	Node0x17db930 [shape=record,label="{for.inc25}"];
 	Node0x17db930 -> Node0x17da5e0[constraint=false];
 	Node0x17da6a0 [shape=record,label="{for.end27}"];
 	Node0x17da6a0 -> Node0x17dada0;
 	Node0x17dada0 [shape=record,label="{for.inc28}"];
 	Node0x17dada0 -> Node0x17d21a0[constraint=false];
 	Node0x17da4f0 [shape=record,label="{for.end30}"];
 	colorscheme = "paired12"
-        subgraph cluster_0x17d3f30 {
+        subgraph cluster_0x5ad2c80 {
          label = "";
          style = solid;
          color = 1
-          subgraph cluster_0x17d38d0 {
+          subgraph cluster_0x5ad2e50 {
            label = "";
            style = filled;
-            color = 3            subgraph cluster_0x17d3850 {
+            color = 3            subgraph cluster_0x5ad2d00 {
              label = "";
              style = solid;
              color = 5
-              subgraph cluster_0x17d37d0 {
+              subgraph cluster_0x5ad2dd0 {
                label = "";
                style = solid;
                color = 7
-                Node0x17da550;
+                Node0x5ae0020;
                Node0x17da5b0;
                Node0x17da8b0;
              }
-              Node0x17da5e0;
+              Node0x5ade100;
-              Node0x17da640;
+              Node0x5ae0080;
              Node0x17da850;
              Node0x17db930;
            }
-            Node0x17d21a0;
+            Node0x5ade0e0;
-            Node0x17db9a0;
+            Node0x5adfef0;
            Node0x17da6a0;
            Node0x17dada0;
          }
-          Node0x17d3950;
+          Node0x5abfcf0;
-          Node0x17da4f0;
+          Node0x5ade060;
          Node0x5adff50;
        }
 }
--- a/polly/www/experiments/matmul/scopsonly.print_array.dot
+++ b/polly/www/experiments/matmul/scopsonly.print_array.dot
@ -1,60 +1,51 @@
 digraph "Scop Graph for 'print_array' function" {
 	label="Scop Graph for 'print_array' function";
-	Node0x17d2200 [shape=record,label="{entry}"];
+	Node0x5ae5e30 [shape=record,label="{entry}"];
-	Node0x17d2200 -> Node0x17d4f20;
+	Node0x5ae5e30 -> Node0x5ae5f50;
-	Node0x17d4f20 [shape=record,label="{for.cond}"];
+	Node0x5ae5f50 [shape=record,label="{entry.split}"];
-	Node0x17d4f20 -> Node0x17d9fd0;
+	Node0x5ae5f50 -> Node0x5ae7d90;
-	Node0x17d4f20 -> Node0x17da030;
+	Node0x5ae7d90 [shape=record,label="{for.cond1.preheader}"];
-	Node0x17d9fd0 [shape=record,label="{for.body}"];
+	Node0x5ae7d90 -> Node0x5ae7f20;
-	Node0x17d9fd0 -> Node0x17da120;
+	Node0x5ae7f20 [shape=record,label="{for.body3}"];
-	Node0x17da120 [shape=record,label="{for.cond1}"];
+	Node0x5ae7f20 -> Node0x5ae7f40;
-	Node0x17da120 -> Node0x17da180;
+	Node0x5ae7f20 -> Node0x5ae7f60;
-	Node0x17da120 -> Node0x17da1e0;
+	Node0x5ae7f40 [shape=record,label="{if.then}"];
-	Node0x17da180 [shape=record,label="{for.body3}"];
+	Node0x5ae7f40 -> Node0x5ae7f60;
-	Node0x17da180 -> Node0x17da090;
+	Node0x5ae7f60 [shape=record,label="{for.inc}"];
-	Node0x17da180 -> Node0x17da0f0;
+	Node0x5ae7f60 -> Node0x5ae7f20[constraint=false];
-	Node0x17da090 [shape=record,label="{if.then}"];
+	Node0x5ae7f60 -> Node0x5ae7e30;
-	Node0x17da090 -> Node0x17da0f0;
+	Node0x5ae7e30 [shape=record,label="{for.end}"];
-	Node0x17da0f0 [shape=record,label="{if.end}"];
+	Node0x5ae7e30 -> Node0x5ae7d90[constraint=false];
-	Node0x17da0f0 -> Node0x17da390;
+	Node0x5ae7e30 -> Node0x5ae8110;
-	Node0x17da390 [shape=record,label="{for.inc}"];
+	Node0x5ae8110 [shape=record,label="{for.end12}"];
 	Node0x17da390 -> Node0x17da120[constraint=false];
 	Node0x17da1e0 [shape=record,label="{for.end}"];
 	Node0x17da1e0 -> Node0x17d9e40;
 	Node0x17d9e40 [shape=record,label="{for.inc10}"];
 	Node0x17d9e40 -> Node0x17d4f20[constraint=false];
 	Node0x17da030 [shape=record,label="{for.end12}"];
 	colorscheme = "paired12"
-        subgraph cluster_0x17d38f0 {
+        subgraph cluster_0x5abb9a0 {
          label = "";
          style = solid;
          color = 1
-          subgraph cluster_0x17d4030 {
+          subgraph cluster_0x5ae32c0 {
-            label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79";
+            label = "Call instruction:   %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
            style = solid;
            color = 6
-            subgraph cluster_0x17d3fb0 {
+            subgraph cluster_0x5ae3240 {
-              label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79";
+              label = "Call instruction:   %call = tail call i32 (%struct._IO_FILE*, i8*, ...) @fprintf(%struct._IO_FILE* %1, i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), double %conv) #2";
              style = solid;
              color = 5
-              subgraph cluster_0x17d3f30 {
+              subgraph cluster_0x5abba20 {
-                label = "Non affine branch in BB 'for.body3' with LHS: %rem and RHS: 79";
+                label = "Region can not profitably be optimized!";
                style = solid;
                color = 7
-                Node0x17da180;
+                Node0x5ae7f20;
-                Node0x17da090;
+                Node0x5ae7f40;
              }
-              Node0x17da120;
+              Node0x5ae7f60;
              Node0x17da0f0;
              Node0x17da390;
            }
-            Node0x17d4f20;
+            Node0x5ae7d90;
-            Node0x17d9fd0;
+            Node0x5ae7e30;
            Node0x17da1e0;
            Node0x17d9e40;
          }
-          Node0x17d2200;
+          Node0x5ae5e30;
-          Node0x17da030;
+          Node0x5ae5f50;
          Node0x5ae8110;
        }
 }