Imported Upstream version 6.10.0.49

Former-commit-id: 1d6753294b2993e1fbf92de9366bb9544db4189b
Xamarin Public Jenkins (auto-signing)
2020-01-16 16:38:04 +00:00
parent d94e79959b
commit 468663ddbb
48518 changed files with 2789335 additions and 61176 deletions

@@ -0,0 +1,93 @@
================
The Architecture
================
Polly is a loop optimizer for LLVM. Starting from LLVM-IR it detects and
extracts interesting loop kernels. For each kernel a mathematical model is
derived which precisely describes the individual computations and memory
accesses in the kernels. Within Polly a variety of analysis and code
transformations are performed on this mathematical model. After all
optimizations have been derived and applied, optimized LLVM-IR is regenerated
and inserted into the LLVM-IR module.

.. image:: images/architecture.png
   :align: center

Polly in the LLVM pass pipeline
-------------------------------
The standard LLVM pass pipeline, as used in the -O1/-O2/-O3 modes of clang/opt,
consists of a sequence of passes that can be grouped into different conceptual
phases. The first phase, which we call **Canonicalization** here, is a scalar
canonicalization phase that contains passes like -mem2reg, -instcombine,
-simplifycfg, or early loop unrolling. Its goal is to remove and simplify as
much of the given IR as possible, focusing mostly on scalar optimizations.

The second phase consists of three conceptual groups that are executed in the
so-called **Inliner cycle**: a set of **Scalar Simplification** passes, a set of
**Simple Loop Optimizations**, and the **Inliner** itself. Even though these
passes make up the majority of the LLVM pass pipeline, their primary goal is
still canonicalization without losing semantic information whose loss would
complicate later analysis. As part of the inliner cycle, the LLVM inliner step
by step tries to inline functions, runs canonicalization passes to exploit newly
exposed simplification opportunities, and then tries to inline the further
simplified functions. Some simple loop optimizations are executed as part of the
inliner cycle. Even though they perform some optimizations, their primary goal
is still the simplification of the program code. Loop invariant code motion is
one such optimization: besides being beneficial for program performance, it
moves computation out of loops and, in the best case, enables us to eliminate
certain loops completely.

Only after the inliner cycle has finished is a last **Target Specialization**
phase run, in which IR complexity is deliberately increased to take advantage of
target-specific features that maximize execution performance on the targeted
device. One of the principal optimizations in this phase is vectorization;
others are target-specific loop unrolling and loop transformations (e.g.,
distribution) that expose more vectorization opportunities.
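
The following hand-written C illustration sketches what loop invariant code motion does. It shows the idea only, not LLVM's actual output, and the function names are hypothetical. In the first pair, the product ``n * scale`` does not change inside the loop, so it can be computed once in front of it. In the second pair, hoisting the invariant value leaves the loop body empty, so the whole loop can be removed.

.. code-block:: c

   /* Before LICM: n * scale is recomputed in every iteration, although
    * neither n nor scale changes inside the loop. */
   void scale_before(int len, float *out, const float *in, int n, float scale) {
     for (int i = 0; i < len; i++)
       out[i] = in[i] * (n * scale);
   }

   /* After LICM: the loop-invariant product is computed once, before the loop. */
   void scale_after(int len, float *out, const float *in, int n, float scale) {
     float factor = n * scale;
     for (int i = 0; i < len; i++)
       out[i] = in[i] * factor;
   }

   /* Before: the loop body only recomputes the same invariant value. */
   int last_before(int len, int n) {
     int last = 0;
     for (int i = 0; i < len; i++)
       last = n * 2;
     return last;
   }

   /* After hoisting n * 2, the loop body is empty and the loop can be deleted;
    * only the check whether the loop would have run at all remains. */
   int last_after(int len, int n) {
     return len > 0 ? n * 2 : 0;
   }
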

.. image:: images/LLVM-Passes-only.png
   :align: center

Polly can conceptually be run at three different positions in the pass pipeline:
as an early optimizer before the standard LLVM pass pipeline, as a later
optimizer as part of the target specialization sequence, and, theoretically,
together with the loop optimizations in the inliner cycle. We only discuss the
first two options, as running Polly in the inliner cycle is likely to disturb
the inliner and is consequently not a good idea.

.. image:: images/LLVM-Passes-all.png
   :align: center

Running Polly early, before the standard pass pipeline, has the benefit that the
LLVM-IR processed by Polly is still very close to the original input code.
Hence, it is less likely that transformations applied by LLVM change the IR in
ways not easily understandable for the programmer. As a result, user feedback is
likely to be better, and it is less likely that kernels that look like a perfect
fit for Polly in C have been transformed such that Polly cannot handle them any
more. On the other hand, code that requires inlining to be optimized won't
benefit if Polly is scheduled at this position. The additional set of
canonicalization passes that is required results in a small, but general,
compile-time increase and some random run-time performance changes due to
slightly different IR being passed through the optimizers. To force Polly to run
early in the pass pipeline, use the option *-polly-position=early* (the default
at the time of writing).

.. image:: images/LLVM-Passes-early.png
   :align: center

Running Polly right before the vectorizer has the benefit that the full inlining
cycle has been run and, as a result, even heavily templated C++ code could
theoretically benefit from Polly (more work is necessary to make Polly really
effective at this position). As the IR that is passed to Polly has already been
canonicalized, there is also no need to run additional canonicalization passes.
General compile time is almost unaffected by Polly, as detection of loop kernels
is generally very fast and the actual optimization and cleanup passes are only
run on functions that contain loop kernels worth optimizing. However, due to the
many optimizations that LLVM runs before Polly, the IR that reaches Polly often
has additional scalar dependences that make Polly a lot less efficient. To force
Polly to run before the vectorizer in the pass pipeline, use the option
*-polly-position=before-vectorizer*. This position is not yet the default for
Polly, but work is underway to make Polly effective even in the presence of
scalar dependences. After this work has been completed, Polly will likely use
this position by default.

.. image:: images/LLVM-Passes-late.png
   :align: center

@@ -0,0 +1,103 @@
if (DOXYGEN_FOUND)
  if (LLVM_ENABLE_DOXYGEN)
    set(abs_srcdir ${CMAKE_CURRENT_SOURCE_DIR})
    set(abs_builddir ${CMAKE_CURRENT_BINARY_DIR})

    if (HAVE_DOT)
      set(DOT ${LLVM_PATH_DOT})
    endif()

    if (LLVM_DOXYGEN_EXTERNAL_SEARCH)
      set(enable_searchengine "YES")
      set(searchengine_url "${LLVM_DOXYGEN_SEARCHENGINE_URL}")
      set(enable_server_based_search "YES")
      set(enable_external_search "YES")
      set(extra_search_mappings "${LLVM_DOXYGEN_SEARCH_MAPPINGS}")
    else()
      set(enable_searchengine "NO")
      set(searchengine_url "")
      set(enable_server_based_search "NO")
      set(enable_external_search "NO")
      set(extra_search_mappings "")
    endif()

    # If asked, configure doxygen for the creation of a Qt Compressed Help file.
    if (LLVM_ENABLE_DOXYGEN_QT_HELP)
      set(POLLY_DOXYGEN_QCH_FILENAME "org.llvm.polly.qch" CACHE STRING
        "Filename of the Qt Compressed help file")
      set(POLLY_DOXYGEN_QHP_NAMESPACE "org.llvm.polly" CACHE STRING
        "Namespace under which the intermediate Qt Help Project file lives")
      set(POLLY_DOXYGEN_QHP_CUST_FILTER_NAME "Clang ${POLLY_VERSION}" CACHE STRING
        "See http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters")
      set(POLLY_DOXYGEN_QHP_CUST_FILTER_ATTRS "Clang,${POLLY_VERSION}" CACHE STRING
        "See http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes")
      set(polly_doxygen_generate_qhp "YES")
      set(polly_doxygen_qch_filename "${POLLY_DOXYGEN_QCH_FILENAME}")
      set(polly_doxygen_qhp_namespace "${POLLY_DOXYGEN_QHP_NAMESPACE}")
      set(polly_doxygen_qhelpgenerator_path "${LLVM_DOXYGEN_QHELPGENERATOR_PATH}")
      set(polly_doxygen_qhp_cust_filter_name "${POLLY_DOXYGEN_QHP_CUST_FILTER_NAME}")
      set(polly_doxygen_qhp_cust_filter_attrs "${POLLY_DOXYGEN_QHP_CUST_FILTER_ATTRS}")
    else()
      set(polly_doxygen_generate_qhp "NO")
      set(polly_doxygen_qch_filename "")
      set(polly_doxygen_qhp_namespace "")
      set(polly_doxygen_qhelpgenerator_path "")
      set(polly_doxygen_qhp_cust_filter_name "")
      set(polly_doxygen_qhp_cust_filter_attrs "")
    endif()

    option(LLVM_DOXYGEN_SVG
      "Use svg instead of png files for doxygen graphs." OFF)
    if (LLVM_DOXYGEN_SVG)
      set(DOT_IMAGE_FORMAT "svg")
    else()
      set(DOT_IMAGE_FORMAT "png")
    endif()

    configure_file(${CMAKE_CURRENT_SOURCE_DIR}/doxygen.cfg.in
      ${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg @ONLY)
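
    # Unset the variables that were only needed by configure_file above
    # (these are the abs_srcdir/abs_builddir variables set at the top).
    set(abs_srcdir)
    set(abs_builddir)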
    set(DOT)
    set(enable_searchengine)
    set(searchengine_url)
    set(enable_server_based_search)
    set(enable_external_search)
    set(extra_search_mappings)
    set(polly_doxygen_generate_qhp)
    set(polly_doxygen_qch_filename)
    set(polly_doxygen_qhp_namespace)
    set(polly_doxygen_qhelpgenerator_path)
    set(polly_doxygen_qhp_cust_filter_name)
    set(polly_doxygen_qhp_cust_filter_attrs)
    set(DOT_IMAGE_FORMAT)

    add_custom_target(doxygen-polly
      COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg
      WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
      COMMENT "Generating polly doxygen documentation." VERBATIM)

    if (LLVM_BUILD_DOCS)
      add_dependencies(doxygen doxygen-polly)
    endif()

    if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY)
      install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/doxygen/html
        DESTINATION docs/html)
    endif()
  endif()
endif()

if (LLVM_ENABLE_SPHINX)
  include(AddSphinxTarget)
  if (SPHINX_FOUND)
    if (${SPHINX_OUTPUT_HTML})
      add_sphinx_target(html polly)
    endif()
    if (${SPHINX_OUTPUT_MAN})
      add_sphinx_target(man polly)
    endif()
  endif()
endif()

@@ -0,0 +1,475 @@
==================================================
How to manually use the Individual pieces of Polly
==================================================
Execute the individual Polly passes manually
============================================
.. sectionauthor:: Singapuram Sanjay Srivallabh

This example presents the individual passes that are involved when optimizing
code with Polly. We show how to execute them individually and explain for
each which analysis is performed or what transformation is applied. In this
example the polyhedral transformation is user-provided to show how much
performance improvement can be expected by an optimal automatic optimizer.
1. **Create LLVM-IR from the C code**
-------------------------------------
Polly works on LLVM-IR. Hence, it is necessary to translate the source
files into LLVM-IR. If more than one file should be optimized, the
files can be combined into a single file with llvm-link.

.. code-block:: console

   clang -S -emit-llvm matmul.c -o matmul.s

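
This document does not reproduce the source of matmul.c. To help orient the SCoP output in the following steps, here is a hypothetical kernel with the same statement structure. ``Stmt_for_body3`` corresponds to the initialization of ``C[i][j]``, and ``Stmt_for_body8`` corresponds to the multiply-accumulate that reads ``C[i][j]``, ``A[i][k]``, and ``B[k][j]``. In the real program the matrices have the constant size 1536 x 1536. Here ``n`` is a parameter (a C99 variable-length array size) only so that the sketch can be tried on small inputs.

.. code-block:: c

   /* Hypothetical kernel mirroring the SCoP that Polly detects in 'main'. */
   void matmul_kernel(int n, float A[n][n], float B[n][n], float C[n][n]) {
     for (int i = 0; i < n; i++)
       for (int j = 0; j < n; j++) {
         C[i][j] = 0.0f;                   /* Stmt_for_body3 */
         for (int k = 0; k < n; k++)
           C[i][j] += A[i][k] * B[k][j];   /* Stmt_for_body8 */
       }
   }
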
2. **Prepare the LLVM-IR for Polly**
------------------------------------
Polly is only able to work with code that matches a canonical form.
To translate the LLVM-IR into this form, we use a set of
canonicalization passes. They are scheduled by using
'-polly-canonicalize'.

.. code-block:: console

   opt -S -polly-canonicalize matmul.s > matmul.preopt.ll

3. **Show the SCoPs detected by Polly (optional)**
--------------------------------------------------
To understand whether Polly was able to detect SCoPs, we print the structure
of the detected SCoPs. In our example, two SCoPs are detected: one in
'init_array' and the other in 'main'.

.. code-block:: console

   $ opt -polly-ast -analyze -q matmul.preopt.ll -polly-process-unprofitable

.. code-block:: guess

   :: isl ast :: init_array :: %for.cond1.preheader---%for.end19
   if (1)
     for (int c0 = 0; c0 <= 1535; c0 += 1)
       for (int c1 = 0; c1 <= 1535; c1 += 1)
         Stmt_for_body3(c0, c1);
   else
     { /* original code */ }
   :: isl ast :: main :: %for.cond1.preheader---%for.end30
   if (1)
     for (int c0 = 0; c0 <= 1535; c0 += 1)
       for (int c1 = 0; c1 <= 1535; c1 += 1) {
         Stmt_for_body3(c0, c1);
         for (int c2 = 0; c2 <= 1535; c2 += 1)
           Stmt_for_body8(c0, c1, c2);
       }
   else
     { /* original code */ }

4. **Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)**
----------------------------------------------------------------------------------------
Polly can use graphviz to graphically show a CFG in which the detected
SCoPs are highlighted. It can also create '.dot' files that can be
translated by the 'dot' utility into various graphic formats.

.. code-block:: console

   $ opt -view-scops -disable-output matmul.preopt.ll
   $ opt -view-scops-only -disable-output matmul.preopt.ll

The output for the different functions:

- view-scops : main_, init_array_, print_array_
- view-scops-only : main-scopsonly_, init_array-scopsonly_, print_array-scopsonly_

.. _main: http://polly.llvm.org/experiments/matmul/scops.main.dot.png
.. _init_array: http://polly.llvm.org/experiments/matmul/scops.init_array.dot.png
.. _print_array: http://polly.llvm.org/experiments/matmul/scops.print_array.dot.png
.. _main-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.main.dot.png
.. _init_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.init_array.dot.png
.. _print_array-scopsonly: http://polly.llvm.org/experiments/matmul/scopsonly.print_array.dot.png
5. **View the polyhedral representation of the SCoPs**
------------------------------------------------------
.. code-block:: console

   $ opt -polly-scops -analyze matmul.preopt.ll -polly-process-unprofitable

.. code-block:: guess

   [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
       Function: init_array
       Region: %for.cond1.preheader---%for.end19
       Max Loop Depth: 2
       Invariant Accesses: {
       }
       Context:
       { : }
       Assumed Context:
       { : }
       Invalid Context:
       { : 1 = 0 }
       Arrays {
           float MemRef_A[*][1536]; // Element size 4
           float MemRef_B[*][1536]; // Element size 4
       }
       Arrays (Bounds as pw_affs) {
           float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
           float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
       }
       Alias Groups (0):
           n/a
       Statements {
           Stmt_for_body3
               Domain :=
                   { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
               Schedule :=
                   { Stmt_for_body3[i0, i1] -> [i0, i1] };
               MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] };
               MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] };
       }
   [...]Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.cond1.preheader => for.end30' in function 'main':
       Function: main
       Region: %for.cond1.preheader---%for.end30
       Max Loop Depth: 3
       Invariant Accesses: {
       }
       Context:
       { : }
       Assumed Context:
       { : }
       Invalid Context:
       { : 1 = 0 }
       Arrays {
           float MemRef_C[*][1536]; // Element size 4
           float MemRef_A[*][1536]; // Element size 4
           float MemRef_B[*][1536]; // Element size 4
       }
       Arrays (Bounds as pw_affs) {
           float MemRef_C[*][ { [] -> [(1536)] } ]; // Element size 4
           float MemRef_A[*][ { [] -> [(1536)] } ]; // Element size 4
           float MemRef_B[*][ { [] -> [(1536)] } ]; // Element size 4
       }
       Alias Groups (0):
           n/a
       Statements {
           Stmt_for_body3
               Domain :=
                   { Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 };
               Schedule :=
                   { Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] };
               MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] };
           Stmt_for_body8
               Domain :=
                   { Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 };
               Schedule :=
                   { Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] };
               ReadAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
               ReadAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] };
               ReadAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] };
               MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
                   { Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] };
       }

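
Statement instances are executed in the lexicographic order of their schedule vectors. The following sketch uses plain C rather than isl, and its helper names are hypothetical. It encodes the two schedules printed for 'main' and compares the resulting time stamps. The constant third dimension (0 for ``Stmt_for_body3``, 1 for ``Stmt_for_body8``) places the initialization of ``C[i0][i1]`` before all of its updates.

.. code-block:: c

   #include <stdbool.h>

   /* Lexicographic "strictly before" on two schedule vectors of equal length. */
   bool lex_before(const int *a, const int *b, int len) {
     for (int d = 0; d < len; d++)
       if (a[d] != b[d])
         return a[d] < b[d];
     return false; /* identical vectors: neither instance precedes the other */
   }

   /* { Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] } */
   void sched_body3(int i0, int i1, int t[4]) {
     t[0] = i0; t[1] = i1; t[2] = 0; t[3] = 0;
   }

   /* { Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] } */
   void sched_body8(int i0, int i1, int i2, int t[4]) {
     t[0] = i0; t[1] = i1; t[2] = 1; t[3] = i2;
   }

For example, ``Stmt_for_body8(2, 3, 1535)`` with time stamp [2, 3, 1, 1535] runs before ``Stmt_for_body3(2, 4)`` with time stamp [2, 4, 0, 0], because the comparison is decided by the second dimension.
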
6. **Show the dependences for the SCoPs**
-----------------------------------------
.. code-block:: console

   $ opt -polly-dependences -analyze matmul.preopt.ll -polly-process-unprofitable

.. code-block:: guess

   [...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end19' in function 'init_array':
       RAW dependences:
           { }
       WAR dependences:
           { }
       WAW dependences:
           { }
       Reduction dependences:
           n/a
       Transitive closure of reduction dependences:
           { }
   [...]Printing analysis 'Polly - Calculate dependences' for region: 'for.cond1.preheader => for.end30' in function 'main':
       RAW dependences:
           { Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
       WAR dependences:
           { }
       WAW dependences:
           { Stmt_for_body3[i0, i1] -> Stmt_for_body8[i0, i1, 0] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535; Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1534 }
       Reduction dependences:
           n/a
       Transitive closure of reduction dependences:
           { }

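
One way to read the dependence ``Stmt_for_body8[i0, i1, i2] -> Stmt_for_body8[i0, i1, 1 + i2]`` is through its distance vector (0, 0, 1): the outermost non-zero component identifies the loop that carries the dependence. The following sketch uses hypothetical helpers and is a simplification, since Polly itself reasons about the full dependence relations with isl rather than about single distance vectors. It concludes that the i0 and i1 loops carry no dependence, and are therefore parallel, while the i2 loop does carry one. The ``Stmt_for_body3 -> Stmt_for_body8`` dependences stay within one (i0, i1) iteration and are not carried by those loops either.

.. code-block:: c

   #include <stdbool.h>

   #define DEPTH 3 /* loops i0, i1, i2 of Stmt_for_body8 */

   /* Returns the outermost loop level with a non-zero distance, i.e. the loop
    * that carries the dependence, or -1 if no loop of the nest carries it.
    * Assumes a lexicographically non-negative distance vector. */
   int carrying_level(const int dist[DEPTH]) {
     for (int d = 0; d < DEPTH; d++)
       if (dist[d] != 0)
         return d;
     return -1;
   }

   /* A loop level is parallel if none of the dependences is carried by it. */
   bool level_is_parallel(const int dists[][DEPTH], int ndeps, int level) {
     for (int k = 0; k < ndeps; k++)
       if (carrying_level(dists[k]) == level)
         return false;
     return true;
   }
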
7. **Export jscop files**
-------------------------
.. code-block:: console

   $ opt -polly-export-jscop matmul.preopt.ll -polly-process-unprofitable

.. code-block:: guess

   [...]Writing JScop '%for.cond1.preheader---%for.end19' in function 'init_array' to './init_array___%for.cond1.preheader---%for.end19.jscop'.
   Writing JScop '%for.cond1.preheader---%for.end30' in function 'main' to './main___%for.cond1.preheader---%for.end30.jscop'.

8. **Import the changed jscop files and print the updated SCoP structure (optional)**
-------------------------------------------------------------------------------------
Polly can reimport jscop files in which the schedules of the statements
have been changed. These changed schedules are used to describe
transformations. It is possible to import different jscop files by
providing the postfix of the jscop file that is imported.

We apply three different transformations on the SCoP in the main
function. The jscop files describing these transformations are
handwritten (and available in docs/experiments/matmul).


**No Polly**

As a baseline we do not call any Polly code generation, but only apply the normal -O3 optimizations.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-ast -analyze -polly-process-unprofitable

.. code-block:: c

   [...]
   :: isl ast :: main :: %for.cond1.preheader---%for.end30
   if (1)
     for (int c0 = 0; c0 <= 1535; c0 += 1)
       for (int c1 = 0; c1 <= 1535; c1 += 1) {
         Stmt_for_body3(c0, c1);
         for (int c3 = 0; c3 <= 1535; c3 += 1)
           Stmt_for_body8(c0, c1, c3);
       }
   else
     { /* original code */ }
   [...]

**Loop Interchange (and Fission to allow the interchange)**

We split the loops and can now apply an interchange of the loop dimensions that enumerate Stmt_for_body8.

.. Although I feel (and have created a .jscop) we can avoid splitting the loops.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-ast -analyze -polly-process-unprofitable

.. code-block:: c

   [...]
   :: isl ast :: main :: %for.cond1.preheader---%for.end30
   if (1)
   {
     for (int c1 = 0; c1 <= 1535; c1 += 1)
       for (int c2 = 0; c2 <= 1535; c2 += 1)
         Stmt_for_body3(c1, c2);
     for (int c1 = 0; c1 <= 1535; c1 += 1)
       for (int c2 = 0; c2 <= 1535; c2 += 1)
         for (int c3 = 0; c3 <= 1535; c3 += 1)
           Stmt_for_body8(c1, c3, c2);
   }
   else
     { /* original code */ }
   [...]

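
Written back as C, the interchanged schedule corresponds roughly to the following sketch (the function name is hypothetical; it uses C99 VLA parameters). Reading ``Stmt_for_body8(c1, c3, c2)`` as ``(i, j, k)`` shows that the update nest now runs in i, k, j order. The innermost loop therefore walks ``C[i][*]`` and ``B[k][*]`` with stride 1 in row-major C instead of striding down a column of B. For every ``C[i][j]``, the products are still accumulated in increasing k order, so the result matches the original loop order exactly.

.. code-block:: c

   /* Loop fission, followed by interchange of the update nest to i, k, j. */
   void matmul_interchanged(int n, float A[n][n], float B[n][n], float C[n][n]) {
     /* Fissioned initialization nest (Stmt_for_body3). */
     for (int i = 0; i < n; i++)
       for (int j = 0; j < n; j++)
         C[i][j] = 0.0f;
     /* Update nest (Stmt_for_body8): c1 = i, c2 = k, c3 = j. */
     for (int i = 0; i < n; i++)
       for (int k = 0; k < n; k++)
         for (int j = 0; j < n; j++)
           C[i][j] += A[i][k] * B[k][j];
   }
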

**Interchange + Tiling**

In addition to the interchange, we now tile the second loop nest.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-ast -analyze -polly-process-unprofitable

.. code-block:: c

   [...]
   :: isl ast :: main :: %for.cond1.preheader---%for.end30
   if (1)
   {
     for (int c1 = 0; c1 <= 1535; c1 += 1)
       for (int c2 = 0; c2 <= 1535; c2 += 1)
         Stmt_for_body3(c1, c2);
     for (int c1 = 0; c1 <= 1535; c1 += 64)
       for (int c2 = 0; c2 <= 1535; c2 += 64)
         for (int c3 = 0; c3 <= 1535; c3 += 64)
           for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
             for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
               for (int c6 = c2; c6 <= c2 + 63; c6 += 1)
                 Stmt_for_body8(c4, c6, c5);
   }
   else
     { /* original code */ }
   [...]

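
Below is a C sketch of the interchanged and tiled update nest. The names are hypothetical, and the tile size ``T``, which is 64 in the jscop file, is a parameter here. Because 1536 is a multiple of 64, the generated AST needs no ``min`` bounds. The sketch adds them so that it also works when ``n`` is not a multiple of ``T``. The loop variables follow the AST: ``i = c4``, ``k = c5``, ``j = c6``.

.. code-block:: c

   static int min_int(int a, int b) { return a < b ? a : b; }

   /* Interchange + tiling of the update nest: the tile loops run over i, j, k
    * tiles (c1, c2, c3), the point loops over i, k, j (c4, c5, c6). */
   void matmul_tiled(int n, int T, float A[n][n], float B[n][n], float C[n][n]) {
     for (int i = 0; i < n; i++)
       for (int j = 0; j < n; j++)
         C[i][j] = 0.0f;
     for (int ii = 0; ii < n; ii += T)
       for (int jj = 0; jj < n; jj += T)
         for (int kk = 0; kk < n; kk += T)
           for (int i = ii; i < min_int(ii + T, n); i++)
             for (int k = kk; k < min_int(kk + T, n); k++)
               for (int j = jj; j < min_int(jj + T, n); j++)
                 C[i][j] += A[i][k] * B[k][j];
   }

The tiles keep a 64 x 64 block of B in the cache while it is reused for 64 rows of A and C. This is likely why tiling improves on the interchanged version in the measurements below.
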
**Interchange + Tiling + Strip-mining to prepare vectorization**

To later allow vectorization, we create a so-called trivially
parallelizable loop: it is innermost, parallel, and has only four
iterations, so it can be replaced by 4-element SIMD instructions.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-ast -analyze -polly-process-unprofitable

.. code-block:: c

   [...]
   :: isl ast :: main :: %for.cond1.preheader---%for.end30
   if (1)
   {
     for (int c1 = 0; c1 <= 1535; c1 += 1)
       for (int c2 = 0; c2 <= 1535; c2 += 1)
         Stmt_for_body3(c1, c2);
     for (int c1 = 0; c1 <= 1535; c1 += 64)
       for (int c2 = 0; c2 <= 1535; c2 += 64)
         for (int c3 = 0; c3 <= 1535; c3 += 64)
           for (int c4 = c1; c4 <= c1 + 63; c4 += 1)
             for (int c5 = c3; c5 <= c3 + 63; c5 += 1)
               for (int c6 = c2; c6 <= c2 + 63; c6 += 4)
                 for (int c7 = c6; c7 <= c6 + 3; c7 += 1)
                   Stmt_for_body8(c4, c7, c5);
   }
   else
     { /* original code */ }
   [...]

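
Strip-mining splits a loop into an outer loop with step 4 and an inner loop with exactly four iterations. The following C sketch (the function name is hypothetical) applies only this step to the innermost j loop of the interchanged nest and leaves out the tiling for brevity. It assumes that ``n`` is a multiple of 4, as 1536 is; otherwise a remainder loop would be needed.

.. code-block:: c

   /* Interchanged update nest with the innermost j loop strip-mined by 4.
    * The v loop has four independent iterations, which a vectorizer can map
    * to 4-element SIMD instructions. Assumes n % 4 == 0. */
   void matmul_stripmined(int n, float A[n][n], float B[n][n], float C[n][n]) {
     for (int i = 0; i < n; i++)
       for (int j = 0; j < n; j++)
         C[i][j] = 0.0f;
     for (int i = 0; i < n; i++)
       for (int k = 0; k < n; k++)
         for (int j = 0; j < n; j += 4)       /* strip loop (c6 in the AST) */
           for (int v = j; v < j + 4; v++)    /* 4-iteration loop (c7) */
             C[i][v] += A[i][k] * B[k][v];
   }
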
9. **Codegenerate the SCoPs**
-----------------------------
This generates new code for the SCoPs detected by Polly. If
-polly-import-jscop is present, the transformations specified in the
imported jscop files are applied. Transformed jscop files are only provided
for the SCoP in main, so the import for the SCoP in init_array reports that
its file could not be read.

.. code-block:: console

   $ opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged.ll

.. code-block:: guess

   Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged'.
   File could not be read: No such file or directory
   Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged'.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled -polly-codegen -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled.ll

.. code-block:: guess

   Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled'.
   File could not be read: No such file or directory
   Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled'.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector.ll

.. code-block:: guess

   Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
   File could not be read: No such file or directory
   Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.

.. code-block:: console

   $ opt matmul.preopt.ll -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector -polly-codegen -polly-vectorizer=polly -polly-parallel -polly-process-unprofitable | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll

.. code-block:: guess

   Reading JScop '%for.cond1.preheader---%for.end19' in function 'init_array' from './init_array___%for.cond1.preheader---%for.end19.jscop.interchanged+tiled+vector'.
   File could not be read: No such file or directory
   Reading JScop '%for.cond1.preheader---%for.end30' in function 'main' from './main___%for.cond1.preheader---%for.end30.jscop.interchanged+tiled+vector'.

10. **Create the executables**
------------------------------
.. code-block:: console

   $ llc matmul.normalopt.ll -o matmul.normalopt.s && gcc matmul.normalopt.s -o matmul.normalopt.exe
   $ llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
   $ llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
   $ llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
   $ llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && gcc -fopenmp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe

11. **Compare the runtime of the executables**
----------------------------------------------
Comparing the runtimes of the different executables shows that, in this
example, the simple loop interchange gives by far the largest performance boost
(from about 11.3 s to about 1.0 s), and tiling improves it slightly further.
However, in this case, adding vectorization and using OpenMP degrades the
performance compared with the interchanged and tiled version.

.. code-block:: console

   $ time ./matmul.normalopt.exe
   real 0m11.295s
   user 0m11.288s
   sys 0m0.004s
   $ time ./matmul.polly.interchanged.exe
   real 0m0.988s
   user 0m0.980s
   sys 0m0.008s
   $ time ./matmul.polly.interchanged+tiled.exe
   real 0m0.830s
   user 0m0.816s
   sys 0m0.012s
   $ time ./matmul.polly.interchanged+tiled+vector.exe
   real 0m5.430s
   user 0m5.424s
   sys 0m0.004s
   $ time ./matmul.polly.interchanged+tiled+vector+openmp.exe
   real 0m3.184s
   user 0m11.972s
   sys 0m0.036s
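
The relative numbers behind this comparison can be derived directly from the reported times. The small sketch below, which uses hypothetical helper names, computes the wall-clock speedup over matmul.normalopt.exe. For the OpenMP run it also computes the ratio of user time to real time, which indicates roughly how many cores were busy on average.

.. code-block:: c

   /* Wall-clock speedup of a variant relative to the baseline. */
   double speedup(double baseline_real_s, double variant_real_s) {
     return baseline_real_s / variant_real_s;
   }

   /* Average number of busy cores: CPU time spent in user mode divided by
    * the elapsed wall-clock time. */
   double busy_cores(double user_s, double real_s) {
     return user_s / real_s;
   }

With the numbers above, the interchanged version is about 11.4 times faster than the baseline and the interchanged and tiled version about 13.6 times faster. The OpenMP version keeps about 3.8 cores busy, yet it is still about 3.8 times slower than the single-threaded interchanged and tiled version.
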

@@ -0,0 +1,57 @@
.. include:: <isonum.txt>

==================================================
Performance
==================================================
High-Performance Generalized Matrix Multiplication
--------------------------------------------------
Polly automatically detects and optimizes generalized matrix multiplication,
the computation C |larr| α ⊗ C ⊕ β ⊗ A ⊗ B, where A, B, and C are three
appropriately sized matrices, ⊕ and ⊗ are the operations of the corresponding
matrix semiring, α and β are constants, and β is not equal to zero. This allows
Polly to obtain a highly optimized form, structured similarly to the expert
implementation of GEMM found in GotoBLAS and its successors. The performance
evaluation of GEMM is shown in the following figure.

.. image:: images/GEMM_double.png
   :align: center

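
To make the generalized formula concrete, the following C sketch parameterizes ⊕ and ⊗ by function pointers. The names are hypothetical, and the code is a naive reference loop nest, not the optimized code Polly generates. With ordinary addition and multiplication it computes ``C = alpha * C + beta * (A * B)``. With the min-plus semiring (⊕ = min, ⊗ = +), the same loop nest computes min-plus products of the kind used in shortest-path algorithms.

.. code-block:: c

   /* A semiring given by its two operations. */
   typedef struct {
     double (*add)(double, double); /* the semiring "plus" */
     double (*mul)(double, double); /* the semiring "times" */
   } semiring;

   double plus_d(double a, double b) { return a + b; }
   double times_d(double a, double b) { return a * b; }
   double min_d(double a, double b) { return a < b ? a : b; }

   /* C <- alpha (x) C (+) beta (x) (A (x) B) for n x n matrices, n >= 1.
    * The matrix product is reduced with "plus" starting from the k = 0 term,
    * so no neutral element of "plus" is needed. */
   void semiring_gemm(int n, semiring s, double alpha, double beta,
                      double A[n][n], double B[n][n], double C[n][n]) {
     for (int i = 0; i < n; i++)
       for (int j = 0; j < n; j++) {
         double acc = s.mul(A[i][0], B[0][j]);
         for (int k = 1; k < n; k++)
           acc = s.add(acc, s.mul(A[i][k], B[k][j]));
         C[i][j] = s.add(s.mul(alpha, C[i][j]), s.mul(beta, acc));
       }
   }
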
Compile Time Impact of Polly
----------------------------
Clang+LLVM+Polly are compiled using Clang on an Intel(R) Core(TM) i7-7700 based
system. The experiment is run in two configurations, with and without Polly
enabled, in order to measure its compile-time impact.

The following versions are used:

- Polly (git hash 0db98a4837b6f233063307bb9184374175401922)
- Clang (git hash 3e1d04a92b51ed36163995c96c31a0e4bbb1561d)
- LLVM (git hash 0265ec7ebad69a47f5c899d95295b5eb41aba68e)

`ninja <https://ninja-build.org/>`_ is used as the build system.

For both configurations, the whole compilation was performed five times. The
compile times in seconds are shown in the following table.

+----------------------------+
|   Compile Time (seconds)   |
+--------------+-------------+
|Polly Disabled|Polly Enabled|
+==============+=============+
|964           |977          |
+--------------+-------------+
|964           |980          |
+--------------+-------------+
|967           |981          |
+--------------+-------------+
|967           |981          |
+--------------+-------------+
|968           |982          |
+--------------+-------------+

The median compile time without Polly enabled is 967 seconds and with Polly enabled it is 981 seconds. The overhead is 1.4%.
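
The median and the overhead can be recomputed from the table with a few lines of C. This is a sketch with hypothetical helper names. The median of five sorted values is the third one, and the overhead is the difference of the medians relative to the build without Polly: (981 - 967) / 967 ≈ 0.0145, i.e. about 1.4%.

.. code-block:: c

   #include <stdlib.h>

   static int cmp_int(const void *a, const void *b) {
     int x = *(const int *)a, y = *(const int *)b;
     return (x > y) - (x < y);
   }

   /* Median of an odd number (n >= 1) of samples; sorts a copy, so v is
    * left unchanged. */
   int median_odd(const int *v, int n) {
     int tmp[n];
     for (int i = 0; i < n; i++)
       tmp[i] = v[i];
     qsort(tmp, n, sizeof tmp[0], cmp_int);
     return tmp[n / 2];
   }

   /* Relative compile-time overhead in percent. */
   double overhead_percent(int without_polly, int with_polly) {
     return 100.0 * (with_polly - without_polly) / without_polly;
   }
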

@@ -0,0 +1,3 @@
=================
Release Notes 6.0
=================

@@ -0,0 +1,56 @@
==================================================
Tips and Tricks on using and contributing to Polly
==================================================
Committing to Polly trunk
-------------------------

- `General reference to git-svn workflow <https://stackoverflow.com/questions/190431/is-git-svn-dcommit-after-merging-in-git-dangerous>`_

Using bugpoint to track down errors in large files
--------------------------------------------------
If you know the ``opt`` invocation and have a large ``.ll`` file that causes
an error, ``bugpoint`` allows one to reduce the size of the test case.

The general calling pattern is:

- ``$ bugpoint <file.ll> <pass that causes the crash> -opt-args <opt option flags>``

An example invocation is:

- ``$ bugpoint crash.ll -polly-codegen -opt-args -polly-canonicalize -polly-process-unprofitable``

For more documentation on bugpoint, `visit the LLVM manual <http://llvm.org/docs/Bugpoint.html>`_.

Understanding which pass makes a particular change
--------------------------------------------------
If you know that something like ``opt -O3 -polly`` makes a change, but you wish to
isolate which pass makes the change, the steps are as follows:

- ``$ bugpoint -O3 file.ll -opt-args -polly`` will allow bugpoint to track down the pass which causes the crash.

To do this manually:

- ``$ opt -O3 -polly -debug-pass=Arguments`` to get all passes that are run by default. ``-debug-pass=Arguments`` will list all passes that have run.
- Bisect down to the pass that makes the change.

Debugging regressions introduced at some unknown earlier point
--------------------------------------------------------------
In case of a regression in performance or correctness (e.g., an earlier version
of Polly behaved as expected and a later version does not), bisecting over the
version history is the standard approach to identify the commit that introduced
the regression.
LLVM has a single repository that contains all projects. It can be cloned at:
`<https://github.com/llvm-project/llvm-project-20170507>`_. How to bisect on a
git repository is explained here
`<https://www.metaltoad.com/blog/beginners-guide-git-bisect-process-elimination>`_.
The bisect process can also be automated as explained here:
`<https://www.metaltoad.com/blog/mechanizing-git-bisect-bug-hunting-lazy>`_.
An LLVM specific run script is available here:
`<https://gist.github.com/dcci/891cd98d80b1b95352a407d80914f7cf>`_.
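
The idea behind bisecting can be sketched in a few lines of C; ``git bisect`` does the equivalent over the commit graph, and the names here are hypothetical. Start from a version known to be good and a later one known to be bad, and assume that every version after the first bad one is also bad. Each test then halves the remaining range, so about log2(N) builds suffice for N candidate versions. The stand-in predicate ``bad_from`` exists only to make the sketch testable; in practice the predicate is a build-and-test script such as the one linked above.

.. code-block:: c

   #include <stdbool.h>

   typedef bool (*is_bad_fn)(int version, void *ctx);

   /* Returns the first bad version in (good, bad], where `good` is known to
    * be good and `bad` is known to be bad. *steps receives the number of
    * versions that were tested. */
   int first_bad(int good, int bad, is_bad_fn is_bad, void *ctx, int *steps) {
     *steps = 0;
     while (bad - good > 1) {
       int mid = good + (bad - good) / 2;
       ++*steps;
       if (is_bad(mid, ctx))
         bad = mid;   /* the regression was introduced at or before mid */
       else
         good = mid;  /* the regression was introduced after mid */
     }
     return bad;
   }

   /* Stand-in predicate: versions >= *(int *)ctx exhibit the regression. */
   bool bad_from(int version, void *ctx) {
     return version >= *(const int *)ctx;
   }
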

@@ -0,0 +1,132 @@
======================
Using Polly with Clang
======================
This documentation discusses how Polly can be used in Clang to automatically
optimize C/C++ code during compilation.
.. warning::

   Warning: clang/LLVM/Polly need to be in sync (compiled from the same SVN
   revision).

Make Polly available from Clang
===============================
Polly is available through clang, opt, and bugpoint, if Polly was checked out
into tools/polly before compilation. No further configuration is needed.
Optimizing with Polly
=====================
Optimizing with Polly is as easy as adding -O3 -mllvm -polly to your compiler
flags (Polly is only available at -O3).

.. code-block:: console

   clang -O3 -mllvm -polly file.c

Automatic OpenMP code generation
================================
To automatically detect parallel loops and generate OpenMP code for them, you
also need to add -mllvm -polly-parallel -lgomp to your CFLAGS.

.. code-block:: console

   clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c

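
Whether a loop can be parallelized depends on loop-carried dependences. The two hypothetical C functions below illustrate the difference. In the first, every iteration writes its own element and reads only the input, so the iterations could run on different threads. In the second, iteration i reads the value written by iteration i - 1, so the iterations must run in order. Whether Polly actually emits OpenMP code for a particular loop also depends on its SCoP detection and profitability heuristics. The ``restrict`` qualifiers here only state that the two arrays do not overlap.

.. code-block:: c

   /* No loop-carried dependence: the iterations are independent. */
   void scale_add(int n, float *restrict out, const float *restrict in, float a) {
     for (int i = 0; i < n; i++)
       out[i] = a * in[i] + 1.0f;
   }

   /* Loop-carried read-after-write dependence from iteration i - 1 to i. */
   void prefix_sum(int n, float *x) {
     for (int i = 1; i < n; i++)
       x[i] += x[i - 1];
   }
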
Automatic Vector code generation
================================
Automatic vector code generation can be enabled by adding -mllvm
-polly-vectorizer=stripmine to your CFLAGS.

.. code-block:: console

   clang -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c

Extract a preoptimized LLVM-IR file
===================================
Often it is useful to derive, from a C file, the LLVM-IR code that is actually
optimized by Polly. Normally, the LLVM-IR is automatically generated from the C
code by first lowering C to LLVM-IR (clang) and then applying a set of
preparing transformations on the LLVM-IR. To get the LLVM-IR after the
preparing transformations have been applied, run Polly with '-O0'.

.. code-block:: console

   clang -O0 -mllvm -polly -S -emit-llvm file.c

Further options
===============
Polly supports further options that are mainly useful for the development or the
analysis of Polly. The relevant options can be added to clang by appending
-mllvm -option-name to the CFLAGS or the clang command line.
Limit Polly to a single function
--------------------------------
To limit the execution of Polly to a single function, use the option
-polly-only-func=functionname.
Disable LLVM-IR generation
--------------------------
Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparing transformations, with Polly code generation
disabled, add the option -polly-no-codegen.
Graphical view of the SCoPs
---------------------------
Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are -polly-show, -polly-show-only, -polly-dot and -polly-dot-only. The
'show' options automatically run dotty or another graphviz viewer to show the
SCoPs graphically. The 'dot' options store, for each function, a dot file that
highlights the detected SCoPs. If 'only' is appended at the end of the option,
the basic blocks are shown without the statements they contain.
Change/Disable the Optimizer
----------------------------
By default, Polly uses the isl scheduling optimizer, which optimizes for data
locality and parallelism using the Pluto algorithm.
To disable the optimizer entirely, use the option -polly-optimizer=none.
Disable tiling in the optimizer
-------------------------------
By default, both optimizers perform tiling where possible. If this is not wanted,
the option -polly-tiling=false disables it (for both optimizers).
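
To illustrate what tiling does (a plain-Python sketch, not Polly's implementation),
the loops below visit exactly the same iteration points as an untiled two-deep
nest, but block by block: the outer loops step over tile origins ``o0``, ``o1``
that are multiples of the tile size, and the inner loops cover
``o <= i < o + tile``. The function names and the small sizes are arbitrary
choices for the example.

```python
def original_order(n, m):
    """Iteration points (i0, i1) of an untiled nest: for i0 < n: for i1 < m."""
    return [(i0, i1) for i0 in range(n) for i1 in range(m)]


def tiled_order(n, m, tile):
    """The same points, visited tile by tile.

    Outer loops walk over tile origins (o0 % tile == 0, o1 % tile == 0);
    inner loops cover o <= i < o + tile, clipped at the domain bounds.
    """
    points = []
    for o0 in range(0, n, tile):
        for o1 in range(0, m, tile):
            for i0 in range(o0, min(o0 + tile, n)):
                for i1 in range(o1, min(o1 + tile, m)):
                    points.append((i0, i1))
    return points


if __name__ == "__main__":
    untiled = original_order(6, 6)
    tiled = tiled_order(6, 6, 4)
    assert sorted(tiled) == untiled  # same work, different order
    print(tiled[:6])  # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)]
```

Only the execution order changes: each block touches a small working set, which
is why tiling tends to improve data locality. Disabling tiling keeps the
original order.
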
Import / Export
---------------
The flags -polly-import and -polly-export allow the polyhedral representation to
be exported and reimported. By exporting, modifying and reimporting the
polyhedral representation, externally calculated transformations can be
applied. This enables external optimizers or the manual optimization of
specific SCoPs.
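
The modification step can be done with any JSON tool. A minimal Python sketch,
assuming the exported file uses the jscop layout (a top-level ``"statements"``
list whose entries carry ``"name"`` and ``"schedule"`` strings); the helpers
``set_schedule`` and ``rewrite_jscop`` are hypothetical, and the sketch does not
check that the new schedule is valid isl syntax or that it preserves the
program's dependences.

```python
import json


def set_schedule(scop, stmt_name, new_schedule):
    """Replace the schedule of one statement in a parsed jscop dict (in place)."""
    for stmt in scop["statements"]:
        if stmt["name"] == stmt_name:
            stmt["schedule"] = new_schedule
            return scop
    raise KeyError(f"no statement named {stmt_name!r}")


def rewrite_jscop(path, stmt_name, new_schedule):
    """Load an exported jscop file, change one schedule, and write it back."""
    with open(path) as f:
        scop = json.load(f)
    set_schedule(scop, stmt_name, new_schedule)
    with open(path, "w") as f:
        json.dump(scop, f, indent=3)  # indentation is cosmetic
```

For example, ``rewrite_jscop(path, "Stmt_for_body3", new_schedule)`` edits the
file written by -polly-export in place before it is reimported; ``path`` and
``new_schedule`` are placeholders for your own values.
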
Viewing Polly Diagnostics with opt-viewer
-----------------------------------------
The flag -fsave-optimization-record will generate .opt.yaml files when compiling
your program. These yaml files contain information about each emitted remark.
Ensure that Python 2.7 with the PyYAML and Pygments packages is installed.
To run opt-viewer:

.. code-block:: console

   llvm/tools/opt-viewer/opt-viewer.py -source-dir /path/to/program/src/ \
      /path/to/program/src/foo.opt.yaml \
      /path/to/program/src/bar.opt.yaml \
      -o ./output

Include all yaml files (use \*.opt.yaml when specifying which yaml files to view)
to view all diagnostics from your program in opt-viewer. Compile with `PGO
<https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation>`_ to view
hotness information in opt-viewer. The resulting HTML files can be viewed in a web browser.
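
When a build produces many record files, listing them by hand is tedious. The
following Python 3 sketch (a hypothetical helper, not part of opt-viewer)
collects every ``*.opt.yaml`` file below a directory, so that the list can be
passed to opt-viewer.py as in the command above.

```python
import glob
import os
import tempfile


def find_opt_records(root):
    """Return every *.opt.yaml file below root, sorted for a stable command line."""
    pattern = os.path.join(root, "**", "*.opt.yaml")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Self-contained demo: fake a build tree with two optimization records.
    with tempfile.TemporaryDirectory() as root:
        for name in ("foo.opt.yaml", os.path.join("sub", "bar.opt.yaml"), "foo.c"):
            path = os.path.join(root, name)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        for record in find_opt_records(root):
            print(os.path.relpath(record, root))
```

The ``**`` pattern with ``recursive=True`` also matches files directly in
``root``, so records in the top-level directory are not missed.
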

external/llvm-project/polly/docs/conf.py

@@ -0,0 +1,240 @@
# -*- coding: utf-8 -*-
#
# Polly documentation build configuration file, created by
# sphinx-quickstart on Sun Dec 9 20:01:55 2012.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys, os
from datetime import date
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'Polly'
copyright = u'2010-%d, The Polly Team' % date.today().year
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '6.0-devel'
# The full version, including alpha/beta/rc tags.
release = '6.0-devel'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build', 'analyzer']
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'friendly'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
try:
    import sphinx_rtd_theme
    html_theme = "sphinx_rtd_theme"
    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
except ImportError:
    html_theme = 'haiku'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'Pollydoc'
# -- Options for LaTeX output --------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
('index', 'Polly.tex', u'Polly Documentation',
u'The Polly Team', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output ------------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'Polly', u'Polly Documentation',
u'The Polly Team', 'Polly', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'

File diff suppressed because it is too large.


@@ -0,0 +1,33 @@
{
"arrays" : [
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end19",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_A[i0, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_B[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1] }"
}
]
}


@@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [i0, i1, 0, 0] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [i0, i1, 1, i2] }"
}
]
}


@@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, i0, i2, i1] }"
}
]
}


@@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0 ] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, i1]: o0 <= i0 < o0 + 64 and o1 <= i1 < o1 + 64 and o2 <= i2 < o2 + 64 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 }"
}
]
}


@@ -0,0 +1,57 @@
{
"arrays" : [
{
"name" : "MemRef_C",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_A",
"sizes" : [ "1536" ],
"type" : "float"
},
{
"name" : "MemRef_B",
"sizes" : [ "1536" ],
"type" : "float"
}
],
"context" : "{ : }",
"name" : "%for.cond1.preheader---%for.end30",
"statements" : [
{
"accesses" : [
{
"kind" : "write",
"relation" : "{ Stmt_for_body3[i0, i1] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body3[i0, i1] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 }",
"name" : "Stmt_for_body3",
"schedule" : "{ Stmt_for_body3[i0, i1] -> [0, i0, i1, 0, 0, 0, 0, 0 ] }"
},
{
"accesses" : [
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_A[i0, i2] }"
},
{
"kind" : "read",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_B[i2, i1] }"
},
{
"kind" : "write",
"relation" : "{ Stmt_for_body8[i0, i1, i2] -> MemRef_C[i0, i1] }"
}
],
"domain" : "{ Stmt_for_body8[i0, i1, i2] : 0 <= i0 <= 1535 and 0 <= i1 <= 1535 and 0 <= i2 <= 1535 }",
"name" : "Stmt_for_body8",
"schedule" : "{ Stmt_for_body8[i0, i1, i2] -> [1, o0, o1, o2, i0, i2, oo1, i1]: o0 <= i0 < o0 + 64 and o1 <= oo1 < o1 + 64 and o2 <= i2 < o2 + 64 and oo1 <= i1 < oo1 + 4 and o0 % 64 = 0 and o1 % 64 = 0 and o2 % 64 = 0 and oo1 % 4 = 0 }"
}
]
}

Binary file not shown.


@@ -0,0 +1 @@
e2c6cf6684c7912c4409fb4557a0a7e508158ad2

Binary file not shown.


@@ -0,0 +1 @@
4d959a16f5a34b981fe162574cb6dcef676ad361

Binary file not shown.


@@ -0,0 +1 @@
f32665990ec0a853e545e817926da2815e04404e

Some files were not shown because too many files have changed in this diff.