Autotuner Feature Guide - Bisheng Compiler - HUAWEI TECHNOLOGIES CO., LTD - Issue Date
Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Issue 05 (2021-06-22) Copyright © Huawei Technologies Co., Ltd.
Contents
1 Overview
1.1 Concepts
1.2 Functions of the Bisheng Compiler
1.3 Functions of the Autotuner
1.4 Autotuner Tuning Process
2 Quick Start
2.1 Obtaining the Autotuner
2.2 Environment Requirements
2.3 Installing the Autotuner
2.4 Running the Autotuner
2.4.1 Running Modes
2.4.2 llvm-autotune (Recommended)
2.4.3 auto-tuner
2.5 Uninstalling the Autotuner
3 Preparations
4 Usage
4.1 llvm-autotune (Recommended)
4.1.1 Tool Introduction
4.1.2 Help Information
4.1.3 Compiler-related Options
4.2 auto-tuner
4.2.1 Tool Introduction
4.2.2 Help Information
4.2.3 Parse Instruction
4.2.3.1 Usage of the Parse Instruction
4.2.3.2 Filters
4.2.3.3 Search Configuration File
4.2.3.4 Parse Example
4.2.4 Run Instruction
4.2.4.1 Running the Tuner
4.2.4.2 Configuration File
4.2.4.3 Tuners
4.2.4.4 Search Space File
4.2.4.5 Algorithm
4.2.4.6 Run Example
4.2.5 Auto-run Instruction
4.2.5.1 Usage of the Auto-run Instruction
4.2.5.2 Auto-run Example
5 Appendix
5.1 Feedback
5.2 Change History
1 Overview
1.1 Concepts
Automatic Tuning
Automatic tuning is an automatic iterative process that optimizes a given program
by manipulating compilation options for optimal performance. This process is
completed by the collaboration of two components, the Bisheng compiler and the
Autotuner command line tool.
Bisheng Compiler
The Bisheng compiler is a compiler with the automatic tuning feature. It works
with the Autotuner to control optimization at a finer granularity.
Autotuner
The Autotuner is a command line tool that must be used together with the
Bisheng compiler. It generates search spaces, manipulates their parameters, and
drives the entire tuning process.
1.2 Functions of the Bisheng Compiler
As a feature of the Bisheng compiler, automatic tuning controls optimization at a
finer granularity. You do not need to add pragma directives to the source code.
Instead, you specify the optimization configuration in a simple YAML file. The
file contains the optimization information and the corresponding code region
information, including the name and line number. In addition, the compiler can
record optimization results, generate a tuning opportunity list, and export the
list in YAML format.
Purposes
● Make the compilation process more flexible and controllable.
● Provide more tuning opportunities through fine-grained compilation control.
Functions
● Read the compilation configuration corresponding to each code region.
● Output the tuning opportunities, that is, which structures in the target
program can be used for tuning.
1.3 Functions of the Autotuner
● Interact with the Bisheng compiler:
– Create a search space based on the tuning opportunities generated by
the compiler.
– Generate the compilation configuration and invoke the compiler to
compile the source code.
● Operate tuning parameters and apply the search algorithm.
– Built-in genetic algorithm.
● Obtain performance data.
1.4 Autotuner Tuning Process
As shown in Figure 1-1, tuning consists of two phases: the initial compilation
and the tuning process itself.
Figure 1-1 Autotuner tuning process
Initial Compilation
In the initial compilation phase before tuning, the Autotuner instructs the compiler
to compile the target program code. During the compilation, the Bisheng compiler
generates YAML files that contain all tuning opportunities, that is, the
structures in the target program that can be tuned, such as modules, functions,
and loops. For example, loop unrolling is one of the most common optimizations in
a compiler. By copying the loop body multiple times, loop unrolling gives the
instruction scheduler more room and reduces the overhead of loop branch
instructions. If tuning is based on the unroll factor, the compiler lists every
loop that can be unrolled in the YAML file as a tuning opportunity.
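As a rough illustration of what an unroll factor changes, the following Python sketch sums a list with a plain loop and with a manually unrolled loop (factor 4). The function names are hypothetical and exist only for this example. The compiler applies the equivalent transformation to generated code, not to source text.

```python
def sum_rolled(values):
    """Baseline loop: one addition and one loop branch per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same computation with the loop body copied four times per iteration."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled loop: one loop branch per four elements.
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
assert sum_rolled(data) == sum_unrolled_by_4(data)
```

Different unroll factors trade code size against branch overhead, which is why the factor is a natural tuning parameter.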
Tuning Process
After the tuning opportunities are generated, the tuning process starts.
1. The Autotuner reads the YAML files of the tuning opportunities to generate
the corresponding search spaces, that is, the parameters and ranges for each
tuning opportunity.
2. The Autotuner selects a group of parameters based on the specified search
algorithm and generates a compilation configuration file in YAML format. The
compiler uses this file to compile the target program code into a binary file.
3. The Autotuner runs the compiled file in a user-defined manner and obtains
the performance information as feedback.
4. After a certain number of iterations, the Autotuner takes the best
configuration found, generates the optimal compilation configuration file, and
stores the file in YAML format.
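The loop formed by steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the Autotuner's implementation: it uses plain random search instead of the built-in genetic algorithm, and measure() is a hypothetical stand-in that returns a synthetic cost instead of compiling and running a program. The parameter names UnrollCount and PeelCount are taken from the default search configuration file in 4.2.3.3.

```python
import random

# Search space: parameter name -> candidate values.
SEARCH_SPACE = {
    "UnrollCount": [0, 1, 2, 4, 8],
    "PeelCount": [0, 1],
}


def measure(config):
    """Stand-in for 'compile with this configuration, run, and time it'.

    Returns a synthetic cost so that the sketch runs without a compiler.
    """
    return abs(config["UnrollCount"] - 4) + config["PeelCount"]


def tune(iterations, seed=0):
    """Try configurations and keep the one with the lowest measured cost."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(iterations):
        # Step 2: pick one value per parameter (random search here).
        config = {name: rng.choice(values)
                  for name, values in SEARCH_SPACE.items()}
        # Step 3: obtain the performance feedback.
        cost = measure(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    # Step 4: the best configuration found is the tuning result.
    return best_config, best_cost
```

Calling tune(20) returns the best configuration seen in 20 trials together with its cost.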
2 Quick Start
2.1 Obtaining the Autotuner
The Autotuner is included in the release package of the Bisheng compiler.
You can find it in the bisheng-compiler-1.3.3-aarch64-linux/lib/autotuner
directory.
2.2 Environment Requirements
Mandatory:
● Operating systems: openEuler 21.03, openEuler 20.03 (LTS), CentOS 7.6,
Ubuntu 18.04, Ubuntu 20, Kylin V10, and UOS 20
● Architecture: AArch64
● Python 3.8.2
● SQLite 3.0
Optional:
● LibYAML (recommended, because it speeds up file parsing in the Autotuner)
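The requirements above can be checked with a short script before installation. The following is a minimal sketch using only the Python standard library. The function name check_environment is hypothetical, and the LibYAML check assumes that the PyYAML package is the YAML binding in use.

```python
import platform
import sys


def check_environment():
    """Return a dict of check name -> bool for the listed requirements."""
    results = {}
    results["architecture_aarch64"] = platform.machine() == "aarch64"
    results["python_3_8_2"] = tuple(sys.version_info[:3]) == (3, 8, 2)
    try:
        import sqlite3  # noqa: F401  (needs the _sqlite3 extension module)
        results["sqlite3"] = True
    except ImportError:
        results["sqlite3"] = False
    try:
        import yaml  # PyYAML, optional (assumption: this binding is used)
        results["libyaml"] = bool(getattr(yaml, "__with_libyaml__", False))
    except ImportError:
        results["libyaml"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or different'}")
```

Run the script with the same python3 interpreter that appears first in PATH, because that is the interpreter the version requirement applies to.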
2.3 Installing the Autotuner
The Autotuner has been included in the release package of the Bisheng compiler.
If you have installed the Bisheng compiler, you only need to configure the
environment variable of the Bisheng compiler. Otherwise, install the Bisheng
compiler first.
● Run the following command to configure the environment variable of the
Bisheng compiler:
export PATH=/opt/compiler/bisheng-compiler-1.3.3-aarch64-linux/bin:$PATH
NOTICE
/opt/compiler is used as an example. Replace it with your actual installation
directory.
● Verify the installation.
Run the following commands:
llvm-autotune -h
auto-tuner -h
If the help information is displayed, the installation is successful.
NOTICE
If an error occurs when you run these commands, ensure that your system meets
the requirements described in 2.2 Environment Requirements. For example:
● bad magic number in 'autotuner': b'U\r\r\n'
Ensure that your Python 3 version is 3.8.2 and that its installation path is in
PATH. Run the python3 -V command to check the Python 3 version.
● No module named '_sqlite3'
Ensure that SQLite 3.0 has been installed.
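If neither command is found at all, the PATH setting is the likely cause. The following sketch, which assumes nothing beyond the two command names used in this guide, reports where each tool resolves on PATH. The helper name find_tools is hypothetical.

```python
import shutil

# The two Autotuner command line tools described in this guide.
TOOLS = ("llvm-autotune", "auto-tuner")


def find_tools(path=None):
    """Map each Autotuner command to its resolved location, or None."""
    return {tool: shutil.which(tool, path=path) for tool in TOOLS}


if __name__ == "__main__":
    for tool, location in find_tools().items():
        print(f"{tool}: {location or 'not found on PATH'}")
```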
2.4 Running the Autotuner
2.4.1 Running Modes
Currently, the Autotuner can be used in two modes with two different command
line tools, llvm-autotune and auto-tuner.
● llvm-autotune lets users lead the tuning process and provides auxiliary
functions to work with the compiler. Compared with auto-tuner, llvm-autotune
greatly simplifies the configuration and tuning procedure. llvm-autotune is
recommended because it works out of the box.
● The auto-tuner is a traditional tuning tool that manages the entire tuning
process. You need to adapt the configuration file to set the details during the
tuning, including how to compile and run code, and how to obtain the
performance information and tunable parameters.
The following uses the coremark as an example to describe how to perform
automatic tuning. The release package of the Bisheng compiler does not contain
the coremark. Obtain the coremark from the community. For details, see 4
Usage.
2.4.2 llvm-autotune (Recommended)
You can write tuning scripts as required. As noted in 2.4.1 Running Modes,
coremark is not included in the release package of the Bisheng compiler and
must be obtained from the community. The following example script tunes
coremark for 20 iterations:
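# Directory where tuning-related data is stored.
export AUTOTUNE_DATADIR=/tmp/autotuner_data/
CompileCommand="clang -Ilinux64 -I. -g -DFLAGS_STR=\"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark"
# Initial compilation: generate the tuning opportunities.
$CompileCommand -fautotune-generate
# Initialize tuning; the target is to minimize the metric (run time).
llvm-autotune minimize
for i in $(seq 20)
do
    # Compile with the current configuration, then time one run.
    $CompileCommand -fautotune
    time=$(/usr/bin/time -p ./coremark 0x0 0x0 0x66 300000 2>&1 1>/dev/null | grep real | awk '{print $2}')
    echo "iteration: $i cost time: $time"
    # Report the measured time and obtain the next configuration.
    llvm-autotune feedback "$time"
done
# Stop tuning and save the optimal configuration.
llvm-autotune finalize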
The steps are as follows:
Step 1 Configuring environment variable
Use the environment variable AUTOTUNE_DATADIR to specify the storage
location of tuning-related data.
export AUTOTUNE_DATADIR=/tmp/autotuner_data/
Step 2 Initial compilation procedure
Add the -fautotune-generate option to the Bisheng compiler to generate tuning
opportunities.
cd examples/coremark/
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -fautotune-generate
NOTICE
Use this option only for the hotspot code files that require tuning. If the
application has too many code files (more than 500), a large number of tuning
opportunity files are generated. As a result, the initialization in Step 3 may
take several minutes. In addition, the huge search space degrades the tuning
effect and lengthens the convergence time.
Step 3 Initial tuning
Run the llvm-autotune command to initialize the tuning task and generate the
initial compilation configuration for the next compilation.
llvm-autotune minimize
minimize sets the tuning target to minimizing a metric such as program running
time. You can also use maximize to maximize a metric such as program
throughput.
Step 4 Tuning and compilation
Add the -fautotune option to the Bisheng compiler to read the current
AUTOTUNE_DATADIR configuration and compile.
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -fautotune
Step 5 Performance feedback
You can run the program and obtain performance data based on your
requirements. Run the llvm-autotune feedback command to feed back the
performance data. For example, if you want to perform the tuning based on the
coremark running speed, run the following commands:
time -p ./coremark 0x0 0x0 0x66 300000 2>&1 1>/dev/null
llvm-autotune feedback 31.09
NOTICE
Before running the llvm-autotune feedback command, check that the compilation
in Step 4 succeeded and that the compiled program runs properly. If the
compilation or the run fails, enter the worst value of the tuning target. For
example, if the tuning target is minimize, enter llvm-autotune feedback 9999.
If the tuning target is maximize, enter 0 or -9999.
Incorrect performance feedback may affect the final tuning result.
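The rule in the preceding notice can be captured in a small helper that a tuning script calls before llvm-autotune feedback. This is a sketch only: the function and dictionary names are hypothetical, and the worst values 9999 and 0 are the ones suggested in the notice.

```python
# Worst values suggested in the notice for each tuning target.
WORST_VALUE = {"minimize": 9999, "maximize": 0}


def feedback_value(target, succeeded, measurement):
    """Return the number to pass to 'llvm-autotune feedback'.

    target      -- "minimize" or "maximize", as used to initialize tuning
    succeeded   -- True if compilation and the test run both completed normally
    measurement -- the measured metric (ignored when succeeded is False)
    """
    if target not in WORST_VALUE:
        raise ValueError(f"unknown tuning target: {target!r}")
    return measurement if succeeded else WORST_VALUE[target]
```

For example, feedback_value("minimize", False, None) yields 9999, so a failed build is never mistaken for a fast run.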
Step 6 Tuning iteration
Repeat Step 4 and Step 5 for the specified number of iterations.
Step 7 Stopping tuning
After multiple iterations, you can stop the tuning and save the optimal
configuration file. The configuration file is saved in the directory specified by the
environment variable AUTOTUNE_DATADIR.
llvm-autotune finalize
Step 8 Final compilation
Use the optimal configuration file obtained in Step 7 to perform the final
compilation. If the environment variable is unchanged, you can use the
-fautotune option directly.
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -fautotune
Alternatively, you can use the -mllvm -auto-tuning-input= option to point
directly to the configuration file.
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -mllvm -auto-tuning-input=/tmp/autotuner_data/config.yaml
----End
2.4.3 auto-tuner
Use the auto-tuner tool to manage the tuning process as follows. The procedure
uses the configuration file for tuning coremark, which you can find in the
Bisheng software package directory /lib/autotuner/config/coremark_sample.ini.
Step 1 Generating a tuning opportunity list
Use the -mllvm -auto-tuning-opp= option of the Bisheng compiler to
generate a tuning opportunity list for the search space.
cd examples/coremark/
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -mllvm -auto-tuning-opp=opp
Step 2 Parsing
Parse the tuning opportunity list to generate the search space.
cd ../..
auto-tuner parse ./examples/coremark/opp/* -o loop_search.yaml --type-filter loop
If you want to perform tuning only at the loop level, you can use the --type-filter
loop option to specify that only the loop search space is generated.
Step 3 Running
Use the generated search space file to start automatic tuning.
auto-tuner run config/coremark_sample.ini --results-log module.log --stop-after 600 -ss loop_search.yaml --time-after-convergence 300
You can use --stop-after or --time-after-convergence to limit the tuning time.
In this example, the task stops 600 seconds after the tuning starts, or once 300
seconds have passed without a better configuration being found.
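How the two limits combine can be sketched as a single predicate. This is an illustration of the behavior described above, not the tool's code: the function name is hypothetical, and the exact boundary handling (greater than or equal) is an assumption.

```python
def should_stop(elapsed, since_last_improvement, stop_after=None,
                time_after_convergence=None):
    """Return True when either time limit (in seconds) has been reached.

    elapsed                -- seconds since tuning started
    since_last_improvement -- seconds since a better configuration was found
    """
    if stop_after is not None and elapsed >= stop_after:
        return True
    if (time_after_convergence is not None
            and since_last_improvement >= time_after_convergence):
        return True
    return False
```

With the values from the example command, should_stop(400, 300, 600, 300) is true because no improvement was seen for 300 seconds, even though the 600-second budget is not yet used up.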
NOTE
If the following error occurs:
/bin/sh: config/../../../bin/clang not found
BinPath in config/coremark_sample.ini is set incorrectly. Change the value to
the bin path of the Bisheng compiler.
----End
Alternatively, run the auto_run command, which completes the preceding three
steps automatically: it generates a tuning opportunity list, parses the list
into a search space, and then starts tuning. Command:
auto-tuner auto_run config/coremark_sample.ini --results-log coremark.log --stop-after 600
By default, auto_run tunes in three phases (module -> function -> loop). In each
phase, parameters are adjusted at one granularity level (module, function, loop,
or machine_basic_block).
NOTE
If you want to tune only at a specific fine-grained level, use the --stage-order option (for
example, --stage-order loop).
2.5 Uninstalling the Autotuner
Edit the PATH environment variable and delete the Bisheng compiler path that
you added, /opt/compiler/bisheng-compiler-1.3.3-aarch64-linux/bin.
3 Preparations
Step 1 Install the Autotuner. For more information, see 2 Quick Start.
Step 2 The Autotuner must be used with a compiler that supports tuning.
Before running the Autotuner, check whether the environment variable of the
compiler is correctly set. Alternatively, you can put the environment variable in the
configuration file. For details, see 4 Usage.
----End
4 Usage
4.1 llvm-autotune (Recommended)
4.1.1 Tool Introduction
Currently, the Autotuner can be used in two modes with two different command
line tools, llvm-autotune and auto-tuner.
llvm-autotune lets users lead the tuning process and provides auxiliary
functions to work with the compiler. Compared with auto-tuner, llvm-autotune
greatly simplifies the configuration and tuning procedure. llvm-autotune is
recommended because it works out of the box.
4.1.2 Help Information
Help command: llvm-autotune -h. The command format of llvm-autotune is as
follows:
llvm-autotune [-h] {minimize,maximize,feedback,dump,finalize}
Optional instructions:
● minimize: initializes tuning and generates an initial compiler configuration
file to minimize indicators (such as running time).
● maximize: initializes tuning and generates the initial compiler configuration
file to maximize indicators (such as throughput).
● feedback: feeds back the performance optimization result and generates new
compiler configuration.
● dump: generates the optimal configuration without stopping the tuning
(feedback can be continued).
● finalize: stops tuning and generates the optimal compiler configuration
(feedback cannot be executed afterward).
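The order in which these instructions may be issued can be pictured as a small state machine. The following toy model is an assumption-laden sketch, not part of llvm-autotune: the class name is hypothetical, and rejecting feedback, dump, or finalize before initialization is an assumption that the text above does not state explicitly.

```python
class TuningSession:
    """Toy model of which llvm-autotune instructions are valid at each point."""

    def __init__(self):
        self.state = "new"  # "new" -> "tuning" -> "finalized"

    def run(self, command):
        if command in ("minimize", "maximize"):
            self.state = "tuning"  # initialize tuning
        elif command in ("feedback", "dump"):
            # dump leaves the run active, so feedback can continue after it.
            if self.state != "tuning":
                raise RuntimeError(f"{command} requires an active tuning run")
        elif command == "finalize":
            if self.state != "tuning":
                raise RuntimeError("finalize requires an active tuning run")
            self.state = "finalized"  # feedback is no longer accepted
        else:
            raise ValueError(f"unknown command: {command}")
        return self.state
```

In this model the sequence minimize, feedback, dump, feedback, finalize is accepted, while a feedback issued after finalize raises an error.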
Help information:
● --help/-h
usage: llvm-autotune [-h] {minimize,maximize,feedback,dump,finalize} ...

positional arguments:
  {minimize,maximize,feedback,dump,finalize}
    minimize            Initialize tuning and generate the initial compiler
                        configuration file, aiming to minimize the metric
                        (e.g. run time)
    maximize            Initialize tuning and generate the initial compiler
                        configuration file, aiming to maximize the metric
                        (e.g. throughput)
    feedback            Feed back performance tuning result and generate a new
                        test configuration
    dump                Dump the current best configuration without
                        terminating the tuning run
    finalize            Finalize tuning and generate the optimal compiler
                        configuration

optional arguments:
  -h, --help            show this help message and exit
4.1.3 Compiler-related Options
llvm-autotune must be used with the -fautotune-generate and -fautotune
options of the Bisheng compiler.
● -fautotune-generate:
– The tuning opportunity list is generated in the autotune_datadir
directory. The default directory can be modified by the environment
variable AUTOTUNE_DATADIR.
– As the first step of tuning preparation, you need to use the option before
running the llvm-autotune minimize/maximize command.
– You can also assign a value to this option to change the tuning
granularity. The valid values are Other, Function, Loop, and
MachineBasicBlock. For example, -fautotune-generate=Function
enables the tuning opportunities of the function type, and each function
is assigned its own parameter values during tuning. Other indicates
global tuning: the generated tuning opportunities correspond to
compilation units (code files).
-fautotune-generate is equivalent to -fautotune-generate=Function,Loop
by default. The default value is recommended.
● -fautotune:
– Use the compiler configuration in the autotune_datadir directory for
tuning and compilation. (The default directory can be modified by the
environment variable AUTOTUNE_DATADIR.)
– This option is used after the llvm-autotune minimize/maximize/feedback
command is run during tuning iteration.
NOTE
For details, see 2.4.2 llvm-autotune (Recommended).
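When a build script needs to switch granularities, the option string can be assembled programmatically. The following helper is a hypothetical sketch: the four value names and the comma-separated form come from 4.1.3, and everything else (function name, validation behavior) is an assumption made for the example.

```python
# Granularity values listed for -fautotune-generate in 4.1.3.
GRANULARITIES = ("Other", "Function", "Loop", "MachineBasicBlock")


def autotune_generate_flag(granularities=None):
    """Build the -fautotune-generate option for the given granularities."""
    if not granularities:
        # No value: the compiler default (Function,Loop) applies.
        return "-fautotune-generate"
    unknown = [g for g in granularities if g not in GRANULARITIES]
    if unknown:
        raise ValueError(f"unsupported granularity: {unknown}")
    return "-fautotune-generate=" + ",".join(granularities)
```

For example, autotune_generate_flag(["Function", "Loop"]) produces the same string as the documented default expansion.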
4.2 auto-tuner
4.2.1 Tool Introduction
Currently, the Autotuner can be used in two modes with two different command
line tools, llvm-autotune and auto-tuner.
The auto-tuner is a traditional tuning tool that manages the entire tuning process.
You need to adapt the configuration file to set the details during the tuning,
including how to compile and run code, and how to obtain the performance
information and tunable parameters.
4.2.2 Help Information
Help command: auto-tuner -h. The command format of auto-tuner is as follows:
auto-tuner [-h] {run,merge,divide,parse,auto_run} ...
Optional instructions:
● run: runs the tuner.
● merge: merges multiple compilation configuration files.
● divide: divides a compilation configuration file into multiple files based on the
source code file name in the configuration file.
● parse: parses the tuning opportunity list to generate the search space.
● auto_run (recommended): automatically generates the search space and
performs the tuning by phase. The default phase sequence is module ->
function -> loop.
The three main instructions are parse, run, and auto_run.
Help information:
● --help/-h
usage: auto-tuner [-h] {run,merge,divide,parse,auto_run} ...

positional arguments:
  {run,merge,divide,parse,auto_run}
                        commands help
    run                 Run the tuner
    merge               Merge LLVM configuration input files
    divide              Divide LLVM configuration input file into multiple
                        files based on file_name
    parse               Parse the tuning opportunity files and generate search
                        space
    auto_run            (recommended) auto-generate the search space and run
                        the auto-phase-based tuning (the default order of
                        stages is module -> function -> loop)

optional arguments:
  -h, --help            show this help message and exit
4.2.3 Parse Instruction
4.2.3.1 Usage of the Parse Instruction
The parse instruction is used to parse the tuning opportunity list and generate the
search space. The format of the parse instruction is as follows:
auto-tuner parse ...
Mandatory parameter:
● opp_file: tuning opportunity file generated by the compiler
Common optional parameter:
● --output/-o: specifies the path of the output file.
Help information:
● --help/-h
positional arguments:
  opp_file              Opportunity files generated by LLVM

optional arguments:
  -h, --help            show this help message and exit
  --parse-format [{xml,yaml}]
                        choose the format of LLVM auto-tuning-
                        input/opp,(default: yaml)
  -nf Name [Name ...], --name-filter Name [Name ...]
                        to filter code regions by names when generating search
                        space
  --func-name-filter Name [Name ...]
                        to filter code regions by function names when
                        generating search space
  --file-name-filter Name [Name ...]
                        to filter code regions by file names when generating
                        search space
  -scf SEARCH_CONFIG_FILE, --search-config-file SEARCH_CONFIG_FILE
                        The Search space config file
  -o FILE, --output FILE
                        output file
  -tf {machine_basic_block,loop,function,module} [{machine_basic_block,loop,function,module} ...], --type-filter {machine_basic_block,loop,function,module} [{machine_basic_block,loop,function,module} ...]
                        to filter code regions by types when generating search
                        space
4.2.3.2 Filters
When the search space is generated, the code regions in the opp file can be
filtered by region name, function name, file name, and type. If no filter is
applied, the search space contains all code regions. The filter options take the
following form:
--name-filter Region name 1 Region name 2 Region name 3
--func-name-filter Function name 1 Function name 2 Function name 3
--file-name-filter File name 1 File name 2 File name 3
--type-filter Type name 1 Type name 2 Type name 3
NOTICE
These options filter the code regions by matching the text information in the opp
file.
For example, use file_name to filter the following code regions:
--- !AutoTuning
Pass: machine-scheduler
Name: '%bb.2:if.end'
DebugLoc: { File: core_list_join.c, Line: 287, Column: 7 }
Function: core_list_insert_new
CodeRegionType: machine_basic_block
...
Select the correct value for --file-name-filter from the following options:
● [×] ./core_list_join.c
● [×] /home/user/coremark/core_list_join.c
● [√] core_list_join.c
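The matching rule shown in this example can be sketched in Python. This is only an illustration, assuming the filter is an exact text comparison against the File field of DebugLoc, which is what the three options above suggest. The function name matches_file_filter and the dictionary layout are hypothetical and are not part of the Autotuner.

```python
# Hypothetical sketch: exact-text matching of --file-name-filter values
# against the File field recorded in the opp file (assumed behaviour).
def matches_file_filter(region, file_names):
    """Return True if the region's recorded file name equals one of the filters."""
    return region["DebugLoc"]["File"] in file_names


# The code region from the example, written as a plain dictionary.
region = {
    "Pass": "machine-scheduler",
    "Name": "%bb.2:if.end",
    "DebugLoc": {"File": "core_list_join.c", "Line": 287, "Column": 7},
    "Function": "core_list_insert_new",
    "CodeRegionType": "machine_basic_block",
}

for candidate in ("./core_list_join.c",
                  "/home/user/coremark/core_list_join.c",
                  "core_list_join.c"):
    print(candidate, matches_file_filter(region, [candidate]))
```

Only the last candidate matches, because the opp file stores the bare file name without any directory prefix.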
4.2.3.3 Search Configuration File
The search configuration file defines global parameter settings for each type of
code region. You can use --search-config-file to specify a custom search
configuration file. If --search-config-file is not specified, the Autotuner uses the
default search configuration file.
The content of the default search configuration file is as follows:
CodeRegion:
  CodeRegionType: loop
  Args:
    VectorizationInterleave:
      value: [1, 2, 4]
      type: enum
    UnrollCount:
      value: [0, 1, 2, 4, 8]
      type: enum
    PeelCount:
      value: [0, 1]
      type: enum
---
CodeRegion:
  CodeRegionType: machine_basic_block
  Args:
    MachineScheduling:
      value: ["TopDown", "BottomUp", "Bidirectional"]
      type: enum
---
CodeRegion:
  CodeRegionType: function
  Args:
    InlineThreshold:
      value: [175, 225, 275, 325, 375, 425, 500]
      type: enum
---
CodeRegion:
  CodeRegionType: other
  Args:
    OptPass:
      type: selection
      value: [ipsccp, globalopt, mem2reg, deadargelim, instcombine, simplifycfg, prune-eh, inline,
        functionattrs, argpromotion, sroa, jump-threading, simplifycfg, aggressive-instcombine,
        instcombine, tailcallelim, simplifycfg, reassociate, loop-simplify, lcssa, loop-rotate, licm,
        loop-unswitch, simplifycfg, instcombine, loop-simplify, lcssa, indvars, loop-deletion,
        loop-unroll, gvn, memcpyopt, sccp, instcombine, jump-threading, dse, loop-simplify, lcssa,
        licm, simplifycfg, instcombine, globalopt, globaldce, loop-simplify, lcssa, loop-rotate,
        loop-simplify, instcombine, simplifycfg, instcombine, loop-simplify, lcssa, loop-unroll,
        instcombine, loop-simplify, lcssa, licm, strip-dead-prototypes, globaldce, constmerge,
        loop-simplify, lcssa, simplifycfg]
When configuring the personalized search configuration file, refer to the preceding
default search configuration file.
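To get a feel for how large the default search space is, the following sketch enumerates every combination of the default loop parameters for a single loop code region. It only illustrates the arithmetic: the dictionary copies the values from the default file above, and the Autotuner itself may explore the space differently.

```python
from itertools import product

# Values copied from the default search configuration for CodeRegionType: loop.
loop_args = {
    "VectorizationInterleave": [1, 2, 4],
    "UnrollCount": [0, 1, 2, 4, 8],
    "PeelCount": [0, 1],
}

# Every combination of one value per parameter for a single loop region.
combinations = [dict(zip(loop_args, values))
                for values in product(*loop_args.values())]

print(len(combinations))  # 3 * 5 * 2 = 30
```

Because the count multiplies across code regions, narrowing the value lists in a custom search configuration file or applying filters reduces the search space quickly.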
Important Configuration Attributes
Key              Value
CodeRegionType   other, loop, function, machine_basic_block
type             bool, enum, range, permutation, selection
Variable Type
● bool: indicates a parameter of the Boolean type.
Args:
  ParamName:
    type: bool
● enum: indicates a parameter that takes its value from an unordered set. A value is
randomly selected from the specified set.
Args:
  ParamName:
    type: enum
    value: [0, 2, 4, 8]
● range: indicates a parameter whose value is an integer within the valid range
(from 0 to 255). The minimum value and maximum value must be specified.
Args:
  ParamName:
    type: range
    min: 1
    max: 6
● permutation: indicates a permutation parameter. The elements in value are
reordered to form a permutation.
Args:
  ParamName:
    type: permutation
    value: [option1, option2, option3, option4]
● selection: indicates a selection parameter. Any number of elements are selected
from value and arranged in any order.
Args:
  ParamName:
    type: selection
    value: [option1, option2, option4, option5]
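The five variable types can be illustrated with a small sampling routine. This is a hedged sketch, not the Autotuner's implementation: the function sample is hypothetical, it reads only the documented keys (type, value, min, max), and it assumes that a selection may contain anywhere from zero to all of the listed elements.

```python
import random

# Hypothetical sketch: draw one value for a parameter description that uses
# the documented keys (type, value, min, max). Not the Autotuner's own code.
def sample(param, rng):
    kind = param["type"]
    if kind == "bool":
        return rng.choice([True, False])
    if kind == "enum":
        return rng.choice(param["value"])
    if kind == "range":
        return rng.randint(param["min"], param["max"])  # both ends inclusive
    if kind == "permutation":
        # All elements, in a shuffled order.
        return rng.sample(param["value"], len(param["value"]))
    if kind == "selection":
        # Assumed: any subset size from 0 to all elements, in any order.
        count = rng.randint(0, len(param["value"]))
        return rng.sample(param["value"], count)
    raise ValueError("unknown parameter type: " + kind)


rng = random.Random(0)
print(sample({"type": "enum", "value": [0, 2, 4, 8]}, rng))
print(sample({"type": "range", "min": 1, "max": 6}, rng))
print(sample({"type": "permutation", "value": ["option1", "option2", "option3"]}, rng))
```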
4.2.3.4 Parse Example
Run the following command as an example:
auto-tuner parse opp1.yaml opp2.yaml opp3.yaml -o search_space.yaml --type-filter loop module
● opp1.yaml opp2.yaml opp3.yaml is a tuning opportunity list generated by
the compiler through -auto-tuning-opp.
● -o search_space.yaml is used to generate the search space file
search_space.yaml, which will be used as the input of the run instruction.
● --type-filter loop module restricts the search space to code regions of the
loop and module types.
4.2.4 Run Instruction
4.2.4.1 Running the Tuner
The format of the run instruction is as follows:
auto-tuner run config_file --search_space SEARCH_SPACE
Mandatory parameters:
● config_file: tuning configuration file, which is used to configure the
compilation and running methods and related paths.
● --search_space: search space file, which is generated by the parse
instruction.
Common optional parameters:
● --results-log: log file that records information each time a new optimal
configuration is found.
● --results-log-details: log file that records information about each
iteration.
● --test-limit: maximum number of iterations.
● --stop-after: stops the tuning after the specified time (in seconds).
● --time-after-convergence: stops the tuning if no better compilation
configuration is found within the specified time (in seconds).
Help information:
● --help/-h
positional arguments:
config_file The tuning config file.
optional arguments:
-h, --help show this help message and exit
--machine-class MACHINE_CLASS
name of the machine class being run on
--parallel-compile present if compiling can be done in parallel
--test-limit TEST_LIMIT
stop tuning after given tests count
--stop-after STOP_AFTER
stop tuning after given seconds
--parallelism PARALLELISM
how many tests to support at once
--pipelining PIPELINING
how long a delay (in generations) before results are
available
--bail-threshold BAIL_THRESHOLD
abort if no requests have been made in X generations
--no-dups don't print out warnings for duplicate requests
--seed-configuration FILENAME
Start search at a given configuration. Can be
specified multiple times. Configurations are loaded
with ConfigurationManipulator.load_from_file() and
file format is detected from extension.
--results-log RESULTS_LOG
file to store log of the best configuration times
--results-log-details RESULTS_LOG_DETAILS
file to store log of the non-best configuration times
--quiet print less information
--display-frequency DISPLAY_FREQUENCY
how often for DisplayPlugin to print
--technique TECHNIQUE, -t TECHNIQUE
which technique to use
--list-techniques, -lt
list techniques available and exit
--generate-bandit-technique, -gbt
randomly generate a bandit to use
--label LABEL name for the TuningRun
--print-search-space-size
Print out the estimated size of the search space and
exit
--database DATABASE database to store tuning results in, see: http://docs.
sqlalchemy.org/en/rel_0_8/core/engines.html#database-
urls
--print-params, -pp show parameters of the configuration being tuned
--time-after-convergence TIME, -tac TIME
stop tuning if no new best results after given
seconds
-o DIR, --output DIR write optimal yaml config into the given directory
--parse-format [{xml,yaml}]
choose the format of LLVM auto-tuning-
input/opp,(default: yaml)
--plugin-dir DIR specify the dir to load customized tuner scripts
-tr TUNER, --tuner TUNER
Select which tuner to use
-lr, --list-tuners List all available tuners
--add-llvm-inputs ADD_LLVM_INPUTS [ADD_LLVM_INPUTS ...]
add existing llvm configuration input files
as constants in addition to the llvm
configurations generated in each iteration of the
tuning run
-ss SEARCH_SPACE, --search_space SEARCH_SPACE
The search space file.
--enable-final-compile
perform final compilation with optimal config at the
end of tuning
4.2.4.2 Configuration File
You need to modify the configuration file, including the system environment
variable, compilation information, and running information. For details, see the
examples in the Bisheng software package directory /lib/autotuner/config.
The following is an example of the configuration file for coremark tuning:
# variables that can be shared in all the sections below
[DEFAULT] # optional
# Home = /path/to/your/home
# change your environment variables
[Environment Setting] # optional
# prepend a list of paths into the PATH in order.
# PATH = /path/to/bin
# you can also set other environment variables here too.
[Compiling Setting] # required
# NOTE: ConfigFilePath is set to the path to the current config file automatically by default.
CompileDir = %(ConfigFilePath)s/../examples/coremark/
# Specify where autotuner will generate the compilation config (LLVM input file).
# This will be passed to the compiler with -auto-tuning-input.
LLVMInputFile = %(CompileDir)s/input.yaml
BinPath = %(ConfigFilePath)s/../../../bin/
CompileCommand = %(BinPath)s/clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 -g core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark -mllvm -auto-tuning-input=%(LLVMInputFile)s
RunDir = %(CompileDir)s
RunCommand = ./coremark 0x0 0x0 0x66 300000 # run 300000 iterations for coremark
# OppDir and OppCompileCommand are optional; you do not have to specify them if you are not using the auto_run sub-command.
# Specify where autotuner will parse tuning opportunity files from.
# This should be set to where the compiler generates tuning opportunity files with -auto-tuning-opp.
OppDir = %(CompileDir)s/opp
# Both -auto-tuning-input and -mllvm -auto-tuning-opp=opp need to be used in the OppCompileCommand directly or indirectly.
# -auto-tuning-input is also needed here because auto_run can invoke multiple stages of tuning runs. The later stage needs to take the previous stage's best config to generate tuning opportunities.
OppCompileCommand = %(CompileCommand)s -mllvm -auto-tuning-opp=%(OppDir)s
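The %(Name)s references in this file follow the interpolation syntax of Python's configparser module. The sketch below shows how such references resolve, assuming the Autotuner reads the file with configparser-style interpolation and supplies ConfigFilePath as a default value. The function load_paths and the path /opt/bisheng/lib/autotuner/config are hypothetical and used only for illustration.

```python
import configparser

# A shortened copy of the [Compiling Setting] section from the example above.
SAMPLE = """
[Compiling Setting]
CompileDir = %(ConfigFilePath)s/../examples/coremark/
LLVMInputFile = %(CompileDir)s/input.yaml
OppDir = %(CompileDir)s/opp
"""


def load_paths(config_file_path):
    """Resolve the interpolated paths for a given (hypothetical) config location."""
    # ConfigFilePath is supplied as a default so that %(ConfigFilePath)s resolves.
    parser = configparser.ConfigParser(defaults={"ConfigFilePath": config_file_path})
    parser.read_string(SAMPLE)
    section = parser["Compiling Setting"]
    return {key: section[key] for key in ("CompileDir", "LLVMInputFile", "OppDir")}


paths = load_paths("/opt/bisheng/lib/autotuner/config")
for name, value in paths.items():
    print(name, "=", value)
```

Values are expanded recursively, so LLVMInputFile and OppDir both pick up the resolved CompileDir.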
4.2.4.3 Tuners
A tuner is an instance used to define specific tuning behavior, including
initialization, compilation, running, and testing. The behavior needs to be defined
in different ways depending on the objectives of the tuning task, so multiple
tuners exist for different objectives. You can find sample files for customized
tuners in the Bisheng software package directory /lib/autotuner/plugin/.
● Create a customized tuner.
You can write a Python file that inherits from the parent class CustomTunerBase and
overrides functions as required to create a customized tuner. To register
a customized tuner, name the Python file with the suffix _tuner.py (for example,
xxx_tuner.py) and place the file in the tuner plug-in directory.
● Use your own tuner plug-in.
If you need to use your own tuner plug-in when running the auto-tuner
instruction, use the following option to specify the plug-in directory where the
user-defined tuner is located:
--plugin-dir
● Select the tuner you want to use.
--tuner (or -tr)
If you do not specify the tuner to be used, SimpleTuner is used by default.
● List all tuners.
To list all available tuners, use the following option:
--list-tuners (or -lr)
The following is an example of a customized tuner for coremark tuning.
import os
from opentuner import Result
from opentuner.search.objective import MinimizeCycle
from autotuner.tuners.tunerbase import CustomTunerBase


class Tuner(CustomTunerBase):
    # The run method runs opentuner under the given configuration
    # and returns the calculated performance under this configuration
    def run(self, desired_result, input, limit):
        """
        Compile and run a given configuration then
        return performance
        """
        cycles = float('inf')
        # create a command for running an executable
        run_result = self.call_program(self.run_cmd, cwd=self.run_dir, limit=120)
        # check if the source program was compiled and run successfully
        if run_result['returncode'] == 0:
            std = run_result['stdout']
            if "Correct operation validated." in std:
                # the third line of the coremark output holds the tick count
                cycles_line = std.strip().splitlines()[2]
                cycles = int(cycles_line.replace('Total ticks :', ''))
            else:
                # keep the output of a failed validation for later inspection
                if not os.path.isdir('errors_log'):
                    os.mkdir('errors_log')
                with open("errors_log/errors_" + str(desired_result.configuration.id) + ".log", 'w') as file:
                    file.write(std)
                print('coremark errors detected')
        else:
            self._print_errors(self.run_cmd, run_result)
        return Result(cycle=cycles, time=run_result['time'])

    def objective(self):
        """
        Override the default objective MinimizeTime
        """
        return MinimizeCycle()
To automatically tune coremark, you need to run the executable file, parse the
stdout result, and use cycle as the metric. Therefore, run() and objective() need
to be overridden from the parent class. For more detailed examples, see the
scripts in the release package directory plugin/. Currently, the following metrics
are supported:
● time (required)
● cycle (optional)
● rate (optional)
The metrics must be passed in a Result object as the return value of the run()
function. For example:
return Result(rate=rate, time=run_result['time'])
The tuning objectives corresponding to the three metrics are as follows:
● MinimizeTime()
● MinimizeCycle()
● MaximizeRate()
For example, if MinimizeTime() is used as the tuning objective, the smaller the
Result.time value obtained after the run() function is executed in each iteration,
the better the compilation configuration used in this iteration.
If MaximizeRate() is used as the tuning objective, the greater the Result.rate
value obtained after the run() function is executed in each iteration, the better the
compilation configuration used in this iteration.
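The relationship between metrics and objectives can be sketched as follows. This is an illustration only: plain dictionaries stand in for Result objects, the helper best_iteration is hypothetical, and the measurements are made up. It does not call OpenTuner.

```python
# Hypothetical sketch: how the three objectives rank per-iteration results.
# Plain dictionaries stand in for Result objects; this is not OpenTuner code.
def best_iteration(results, objective):
    if objective == "MinimizeTime":
        return min(results, key=lambda r: r["time"])
    if objective == "MinimizeCycle":
        return min(results, key=lambda r: r["cycle"])
    if objective == "MaximizeRate":
        return max(results, key=lambda r: r["rate"])
    raise ValueError("unknown objective: " + objective)


# Made-up measurements for three iterations.
results = [
    {"config": "A", "time": 12.0, "cycle": 3600, "rate": 25.0},
    {"config": "B", "time": 10.5, "cycle": 3900, "rate": 28.6},
    {"config": "C", "time": 11.0, "cycle": 3300, "rate": 27.3},
]

for objective in ("MinimizeTime", "MinimizeCycle", "MaximizeRate"):
    print(objective, best_iteration(results, objective)["config"])
```

With these numbers, configuration B wins under MinimizeTime and MaximizeRate, while C wins under MinimizeCycle, which is why the objective must match the metric that run() reports.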
4.2.4.4 Search Space File
The search space file is a necessary parameter of the run instruction. It defines the
detailed search space (such as the code regions and parameters) for the tuning
task. The file can be generated from the tuning opportunity list generated by the
compiler using the parse instruction.
NOTE
To specify a search space, use --search_space or -ss.
Example: -ss SEARCH_SPACE_FILE
The following is an example of a search space file in YAML format:
code_region:
  code_region_type: loop
  debug_loc:
    column: 13
    file_name: core_list_join.c
    line: 453
  func_name: core_list_init
  name: while.cond7.i.outer
  pass_name: loop-unroll
params:
  PeelCount:
    type: enum
    value: [0, 1]
  UnrollCount:
    type: enum
    value: [0, 1, 2, 4, 8]
  VectorizationInterleave:
    type: enum
    value: [1, 2, 4]
tuning_id: 1
---
code_region:
  code_region_type: loop
  debug_loc:
    column: 13
    file_name: core_list_join.c
    line: 443
  func_name: core_list_init
  name: for.body.i
  pass_name: loop-vectorize
params:
  PeelCount:
    type: enum
    value: [0, 1]
  UnrollCount:
    type: enum
    value: [0, 1, 2, 4, 8]
  VectorizationInterleave:
    type: enum
    value: [1, 2, 4]
tuning_id: 2
It is very similar to the search configuration file, except that each specific code
region corresponds to a set of parameters.
4.2.4.5 Algorithm
You can specify a search algorithm to run automatic tuning.
For example, when you use automatic tuning for debugging, you can use
the SimpleTraverse algorithm, which traverses all parameter values and
changes only one parameter value at a time.
● List all algorithms.
--list-techniques
● Use a specific algorithm (for example, SimpleTraverse).
--technique SimpleTraverse
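A traversal that changes one parameter value at a time can be sketched as follows. This is one possible reading of the description above, not the SimpleTraverse source: the function one_at_a_time is hypothetical, and using the first listed value of every parameter as the baseline is an assumption made for this example.

```python
# Hypothetical sketch of a one-parameter-at-a-time traversal. The baseline
# (first listed value of every parameter) is an assumption for this example.
def one_at_a_time(params):
    baseline = {name: values[0] for name, values in params.items()}
    yield dict(baseline)
    for name, values in params.items():
        for value in values[1:]:
            config = dict(baseline)
            config[name] = value  # exactly one parameter differs from the baseline
            yield config


params = {
    "VectorizationInterleave": [1, 2, 4],
    "UnrollCount": [0, 1, 2, 4, 8],
    "PeelCount": [0, 1],
}
configs = list(one_at_a_time(params))
print(len(configs))  # 1 baseline + 2 + 4 + 1 variations = 8
```

Such a traversal grows with the sum of the value counts rather than their product, which keeps debugging runs short and makes it easy to attribute a change in behavior to a single parameter.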
4.2.4.6 Run Example
Run the following command as an example:
auto-tuner run config/coremark_sample.ini --plugin-dir ./plugin-dir -tr coremark_tuner --results-log coremark.log --results-log-details details.log --stop-after 3600 --time-after-convergence 600 -ss search_space.yaml
The parameters are described as follows:
● coremark_sample.ini: tuning configuration file
● --plugin-dir ./plugin-dir: defines the customized plug-in directory.
● coremark_tuner: specifies the customized tuner stored in the plug-in
directory ./plugin-dir.
● --results-log coremark.log: records the performance information of the
optimal configuration found in each iteration.
● --results-log-details details.log: records performance information about
each iteration.
● --stop-after 3600: The tuning stops after 3600 seconds.
● --time-after-convergence 600: The tuning stops if no better configuration is
found after 600 seconds.
● -ss search_space.yaml: uses search_space.yaml as the tuning space file.
After the tuning is complete, the optimal configuration is generated as
opt_config.yaml. You can use the -o option to customize the name of the optimal
configuration file.
To apply the optimal configuration, add the Bisheng compiler option
-mllvm -auto-tuning-input=opt_config.yaml to the compile command. The compiler
then uses this configuration file to generate the optimal binary file.
For example, to compile the coremark, run the following command:
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 -g core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark -mllvm -auto-tuning-input=opt_config.yaml
4.2.5 Auto-run Instruction
4.2.5.1 Usage of the Auto-run Instruction
The auto-run instruction is similar to the run instruction, but it automatically
generates a search space instead of receiving the search space file through the
command line.
NOTE
This function requires some additional settings in the configuration file, such as
config/coremark_sample.ini.
The format of the auto-run instruction is as follows:
auto-tuner auto_run config_file
Mandatory parameter:
● config_file: tuning configuration file, which is used to configure the
compilation and running methods and related paths.
Common optional parameter:
● --stage-order: specifies the sequence of tuning phases. The default
sequence is module -> function -> loop. For example, use --stage-order
function loop to perform function-level tuning and then finer-grained
loop-level tuning.
positional arguments:
config_file The tuning config file.
optional arguments:
-h, --help show this help message and exit
--machine-class MACHINE_CLASS
name of the machine class being run on
--parallel-compile present if compiling can be done in parallel
--test-limit TEST_LIMIT
stop tuning after given tests count
--stop-after STOP_AFTER
stop tuning after given seconds
--parallelism PARALLELISM
how many tests to support at once
--pipelining PIPELINING
how long a delay (in generations) before results are
available
--bail-threshold BAIL_THRESHOLD
abort if no requests have been made in X generations
--no-dups don't print out warnings for duplicate requests
--seed-configuration FILENAME
Start search at a given configuration. Can be
specified multiple times. Configurations are loaded
with ConfigurationManipulator.load_from_file() and
file format is detected from extension.
--results-log RESULTS_LOG
file to store log of the best configuration times
--results-log-details RESULTS_LOG_DETAILS
file to store log of the non-best configuration times
--quiet print less information
--display-frequency DISPLAY_FREQUENCY
how often for DisplayPlugin to print
--technique TECHNIQUE, -t TECHNIQUE
which technique to use
--list-techniques, -lt
list techniques available and exit
--generate-bandit-technique, -gbt
randomly generate a bandit to use
--label LABEL name for the TuningRun
--print-search-space-size
Print out the estimated size of the search space and
exit
--database DATABASE database to store tuning results in, see: http://docs.
sqlalchemy.org/en/rel_0_8/core/engines.html#database-
urls
--print-params, -pp show parameters of the configuration being tuned
--time-after-convergence TIME, -tac TIME
stop tuning if no new best results after given
seconds
-o DIR, --output DIR write optimal yaml config into the given directory
--parse-format [{xml,yaml}]
choose the format of LLVM auto-tuning-
input/opp,(default: yaml)
--stage-order stage [stage ...]
specify stage order of auto_run. each stage is a code
region type
-nf Name [Name ...], --name-filter Name [Name ...]
to filter code regions by names when generating search
space
--func-name-filter Name [Name ...]
to filter code regions by function names when
generating search space
--file-name-filter Name [Name ...]
to filter code regions by file names when generating
search space
-scf SEARCH_CONFIG_FILE, --search-config-file SEARCH_CONFIG_FILE
The Search space config file
--plugin-dir DIR specify the dir to load customized tuner scripts
-tr TUNER, --tuner TUNER
Select which tuner to use
-lr, --list-tuners List all available tuners
--add-llvm-inputs ADD_LLVM_INPUTS [ADD_LLVM_INPUTS ...]
add existing llvm configuration input files
as constants in addition to the llvm
configurations generated in each iteration of the
tuning run
The auto-run instruction also automatically performs code region tuning at
different granularities. That is, the auto-run instruction executes three tuning tasks
at different code region levels in sequence. The working mode is as follows:
In each phase, the optimal configuration found in the previous phase is used as
a constant configuration, and the tuning task is executed at a finer code region
level with the corresponding tuning parameters.
When each tuning phase is complete, the optimal configuration file corresponding
to each phase is generated for the compiler to use. Similar to the run instruction,
the optimal configuration file generated by this instruction can take effect by
adding the -mllvm -auto-tuning-input=< file path > option of the Bisheng
compiler.
NOTE
All command-line options of the run subcommand are applied to each of the three
tuning runs in auto_run in turn. For example, if you use the --stop-after 10
option to stop the tuning after 10 seconds, the auto-run instruction stops
after 30 seconds because there are three phases.
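The staged behavior and the time-limit arithmetic in the note can be sketched as follows. This is a simplified model, not the Autotuner's code: auto_run and the tune_stage callback are hypothetical names, and fake_tune_stage only pretends to tune.

```python
# Hypothetical sketch of staged tuning. tune_stage is a stand-in callback;
# it is not an Autotuner function.
def auto_run(stage_order, tune_stage, stop_after):
    constants = {}      # best settings carried over from earlier stages
    total_budget = 0
    for stage in stage_order:
        best = tune_stage(stage, constants, stop_after)
        constants = {**constants, **best}  # later stages keep earlier results fixed
        total_budget += stop_after         # each stage gets the full time limit
    return constants, total_budget


def fake_tune_stage(stage, constants, stop_after):
    # Pretend each stage finds one best setting for its own region type.
    return {stage: "best-" + stage}


config, seconds = auto_run(["module", "function", "loop"], fake_tune_stage, 10)
print(config)
print(seconds)  # 3 stages * 10 seconds = 30
```

The configuration returned after the last stage contains the results of all earlier stages, which mirrors why the configuration file of the final tuning phase is normally the one to compile with.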
4.2.5.2 Auto-run Example
Run the following command as an example:
auto-tuner auto_run config/coremark_sample.ini -tr coremark_tuner --results-log coremark.log --results-log-details details.log --time-after-convergence 600
The auto_run instruction is similar to the run instruction. The difference is that the
auto_run instruction does not require a search space file. As with the parse
instruction, you can use filters to restrict the generated search space.
In this example, the optimal configuration files module.yaml, function.yaml, and
loop.yaml corresponding to the three tuning phases are generated by default.
Select the optimal configuration file for compilation as required. Generally, you are
advised to use the last tuning phase configuration file, because it contains all the
configuration information of the previous tuning phases.
clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 -g core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark -mllvm -auto-tuning-input=loop.yaml
5 Appendix
5.1 Feedback
If you encounter any problem and need technical support, send the problem
information to the Kunpeng compiler forum.
5.2 Change History
Date         Change History
2021-06-22   This is the fifth official release. The update is as follows:
             Updated the description of using the Autotuner.
2020-12-12   This is the fourth official release. The update is as follows:
             Added the description of the llvm-autotune tool.
2020-11-26   This is the third official release. The update is as follows:
             Added the parameter description of the instructions in Chinese.
             Added the working mode diagram of the auto-run instruction.
2020-10-29   This is the second official release. The update is as follows:
             Updated the Autotuner tuning flowchart.
2020-09-28   This is the first official release.