Autotuner Feature Guide - Bisheng Compiler - HUAWEI TECHNOLOGIES CO., LTD
Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions
The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Issue 05 (2021-06-22) Copyright © Huawei Technologies Co., Ltd.
Bisheng Compiler Autotuner Feature Guide

Contents

1 Overview
  1.1 Concepts
  1.2 Functions of the Bisheng Compiler
  1.3 Functions of the Autotuner
  1.4 Autotuner Tuning Process
2 Quick Start
  2.1 Obtaining the Autotuner
  2.2 Environment Requirements
  2.3 Installing the Autotuner
  2.4 Running the Autotuner
    2.4.1 Running Modes
    2.4.2 llvm-autotune (Recommended)
    2.4.3 auto-tuner
  2.5 Uninstalling the Autotuner
3 Preparations
4 Usage
  4.1 llvm-autotune (Recommended)
    4.1.1 Tool Introduction
    4.1.2 Help Information
    4.1.3 Compiler-related Options
  4.2 auto-tuner
    4.2.1 Tool Introduction
    4.2.2 Help Information
    4.2.3 Parse Instruction
      4.2.3.1 Usage of the Parse Instruction
      4.2.3.2 Filters
      4.2.3.3 Search Configuration File
      4.2.3.4 Parse Example
    4.2.4 Run Instruction
      4.2.4.1 Running the Tuner
      4.2.4.2 Configuration File
      4.2.4.3 Tuners
      4.2.4.4 Search Space File
      4.2.4.5 Algorithm
      4.2.4.6 Run Example
    4.2.5 Auto-run Instruction
      4.2.5.1 Usage of the Auto-run Instruction
      4.2.5.2 Auto-run Example
5 Appendix
  5.1 Feedback
  5.2 Change History
1 Overview

1.1 Concepts

Automatic Tuning
Automatic tuning is an iterative process that optimizes a given program for performance by automatically adjusting its compilation options. It relies on two components working together: the Bisheng compiler and the Autotuner command line tool.

Bisheng Compiler
With the automatic tuning feature, the compiler works with the Autotuner to control optimization at a finer granularity.

Autotuner
The Autotuner is a command line tool used together with the Bisheng compiler. It generates the search spaces, manipulates their parameters, and drives the entire tuning process.

1.2 Functions of the Bisheng Compiler

Automatic tuning is one of the features of the Bisheng compiler and allows optimization to be controlled at a finer granularity. You do not need to add pragma directives to the source code. Instead, you specify the optimization configuration in a simple YAML file, which contains the optimization information and the corresponding code region information, such as the name and line number. In addition, the compiler can record optimization results, generate a tuning opportunity list, and export the list in YAML format.

Purposes
● Make the compilation process more flexible and controllable.
● Provide more tuning opportunities through fine-grained compilation control.

Functions
● Read the compilation configuration for each code region.
● Output the tuning opportunities, that is, the structures in the target program that can be tuned.

1.3 Functions of the Autotuner
● Interact with the Bisheng compiler:
– Create a search space based on the tuning opportunities generated by the compiler.
– Generate the compilation configuration and invoke the compiler to compile the source code.
● Operate on tuning parameters and apply the search algorithm:
– Built-in genetic algorithm.
● Obtain performance data.

1.4 Autotuner Tuning Process

As shown in Figure 1-1, the tuning process consists of two phases: initial compilation and tuning.

Figure 1-1 Autotuner tuning process

Initial Compilation
In the initial compilation phase, before tuning starts, the Autotuner instructs the compiler to compile the target program. During this compilation, the Bisheng compiler generates YAML files that list all tuning opportunities, that is, which structures in the target program (such as modules, functions, and loops) can be tuned.
For example, loop unrolling is one of the most common compiler optimizations. By replicating the loop body multiple times, loop unrolling creates more room for instruction scheduling and reduces the overhead of loop branch instructions. If tuning is based on the unroll factor, the compiler lists every loop that can be unrolled in the YAML file as a tuning opportunity.

Tuning Process
After the tuning opportunities are generated, the tuning process starts:
1. The Autotuner reads the YAML files of the tuning opportunities and generates the corresponding search spaces, that is, the parameters and value ranges for each tuning opportunity.
2. The Autotuner tries a group of parameter values chosen by the specified search algorithm and writes them to a compilation configuration file in YAML format. The compiler uses this file to compile the target program into a binary.
3. The Autotuner runs the compiled binary in a user-defined manner and collects the performance data as feedback.
4. After a certain number of iterations, the Autotuner identifies the optimal configuration and saves it as a compilation configuration file in YAML format.
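The bookkeeping behind step 4 can be pictured with a few lines of shell. This is a conceptual sketch only, not the Autotuner's implementation (its built-in search is a genetic algorithm, and it keeps its own data): it assumes a hypothetical record stream in which each line holds a configuration ID and the metric fed back for it, and prints the ID with the best value for the chosen goal.

```shell
#!/usr/bin/env bash
# Conceptual sketch only: not how the Autotuner stores or selects results.
# Input (stdin): hypothetical records, one per line: "<config-id> <metric>".
# $1: tuning goal, "minimize" (e.g. run time) or "maximize" (e.g. throughput).
# Output: the config-id whose metric is best for that goal.
best_config() {
  awk -v goal="$1" '
    NF >= 2 {
      m = $2 + 0                      # force numeric comparison
      if (!seen || (goal == "minimize" && m < best) ||
                   (goal == "maximize" && m > best)) {
        best = m; id = $1; seen = 1
      }
    }
    END { if (seen) print id }
  '
}

# Example with made-up numbers:
#   printf 'cfg1 31.09\ncfg2 29.50\ncfg3 30.10\n' | best_config minimize
```

The real tool also decides which configuration to try next; the sketch covers only the final selection.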
2 Quick Start

2.1 Obtaining the Autotuner

The Autotuner is included in the release package of the Bisheng compiler, in the directory bisheng-compiler-1.3.3-aarch64-linux/lib/autotuner.

2.2 Environment Requirements

Mandatory:
● Operating systems: openEuler 21.03, openEuler 20.03 (LTS), CentOS 7.6, Ubuntu 18.04, Ubuntu 20, Kylin V10, and UOS 20
● Architecture: AArch64
● Python 3.8.2
● SQLite 3.0

Optional:
● LibYAML (recommended; it speeds up file parsing in the Autotuner)

2.3 Installing the Autotuner

The Autotuner is included in the release package of the Bisheng compiler. If the Bisheng compiler is already installed, you only need to configure its environment variable. Otherwise, install the Bisheng compiler first.
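Before configuring the environment, you can check the Python and SQLite requirements of 2.2 with a small script. This is a sketch under stated assumptions: the function names are hypothetical, the version test simply compares the text printed by python3 -V with the version listed in 2.2, and the SQLite test only checks that Python's sqlite3 module can be imported, which is the import that fails with the "No module named '_sqlite3'" error described in the NOTICE of this section.

```shell
#!/usr/bin/env bash
# Sketch: check the Autotuner prerequisites from 2.2 (function names are hypothetical).

# Succeeds only if the text printed by "python3 -V" names the required version.
python_version_ok() {
  [ "$1" = "Python 3.8.2" ]
}

check_autotuner_env() {
  local ver
  # Older Pythons print -V output on stderr, so capture both streams.
  ver=$(python3 -V 2>&1) || { echo "python3 not found" >&2; return 1; }
  if ! python_version_ok "$ver"; then
    echo "Python 3.8.2 required, found: $ver" >&2
    return 1
  fi
  # Importing sqlite3 fails with "No module named '_sqlite3'" when SQLite support is missing.
  if ! python3 -c 'import sqlite3' 2>/dev/null; then
    echo "Python cannot load sqlite3; install SQLite 3.0" >&2
    return 1
  fi
  echo "Autotuner prerequisites look OK"
}

# Usage: check_autotuner_env
```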
● Run the following command to configure the environment variable of the Bisheng compiler:
    export PATH=/opt/compiler/bisheng-compiler-1.3.3-aarch64-linux/bin:$PATH
NOTICE
/opt/compiler is used as an example. Use your actual installation directory.
● Verify the installation by running the following commands:
    llvm-autotune -h
    auto-tuner -h
If the help information is displayed, the installation is successful.

NOTICE
If an error occurs when running the tools, ensure that your system meets the requirements described in 2.2 Environment Requirements. For example:
● bad magic number in 'autotuner': b'U\r\r\n'
Ensure that your Python 3 version is 3.8.2 and that its installation path is in PATH. Run the python3 -V command to check the Python 3 version.
● No module named '_sqlite3'
Ensure that SQLite 3.0 has been installed.

2.4 Running the Autotuner

2.4.1 Running Modes

Currently, the Autotuner can be used in two modes, each with its own command line tool: llvm-autotune and auto-tuner.
● llvm-autotune lets users lead the tuning process and provides auxiliary functions that work with the compiler. Compared with auto-tuner, it greatly simplifies configuration and tuning. llvm-autotune is recommended because it works out of the box.
● auto-tuner is a traditional tuning tool that manages the entire tuning process. You need to adapt its configuration file to define the details of tuning, including how to compile and run the code, how to obtain performance data, and which parameters to tune.

The following sections use coremark as an example to describe how to perform automatic tuning. The release package of the Bisheng compiler does not contain coremark; obtain it from the community. For details, see 4 Usage.
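In either mode, each tuning iteration ends by running the program and turning the run into a single number. The sketch below shows one way to do that in an llvm-autotune script, under assumptions: the function names are hypothetical, the metric is elapsed wall-clock seconds measured with the Bash time keyword (a minimize target), and a failed run is reported with the worst value recommended in the NOTICE of Step 5 in 2.4.2 (9999 for minimize, -9999 for maximize). A zero exit status is only a proxy for "the program runs properly"; checking the program's output is still up to you.

```shell
#!/usr/bin/env bash
# Sketch: produce the value for "llvm-autotune feedback" (function names are hypothetical).

# Worst value per tuning goal, following the NOTICE in Step 5 of 2.4.2.
worst_value() {
  case "$1" in
    minimize) printf '%s\n' 9999 ;;
    maximize) printf '%s\n' -9999 ;;
    *) echo "unknown goal: $1" >&2; return 1 ;;
  esac
}

# Run the given command and print its elapsed wall-clock time in seconds
# (a "minimize" metric). If the command exits non-zero, print the worst value.
run_time_or_worst() {
  local TIMEFORMAT=%R secs   # make the Bash time keyword print only real seconds
  if secs=$( { time "$@" >/dev/null 2>&1; } 2>&1 ); then
    printf '%s\n' "$secs"
  else
    worst_value minimize
  fi
}

# One iteration of the 2.4.2 loop, assuming $CompileCommand as defined there:
#   if $CompileCommand -fautotune; then
#     v=$(run_time_or_worst ./coremark 0x0 0x0 0x66 300000)
#   else
#     v=$(worst_value minimize)
#   fi
#   llvm-autotune feedback "$v"
```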
2.4.2 llvm-autotune (Recommended)

You can write tuning scripts as required. The following example script tunes coremark in 20 iterations:
    export AUTOTUNE_DATADIR=/tmp/autotuner_data/
    CompileCommand="clang -Ilinux64 -I. -g -DFLAGS_STR=\"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark"
    $CompileCommand -fautotune-generate;
    llvm-autotune minimize;
    for i in $(seq 20)
    do
        $CompileCommand -fautotune ;
        time=`/usr/bin/time -p ./coremark 0x0 0x0 0x66 300000 2>&1 1>/dev/null | grep real | awk '{print $2}'`;
        echo "iteration: " $i "cost time:" $time;
        llvm-autotune feedback $time;
    done
    llvm-autotune finalize;

The steps are as follows:

Step 1 Configure the environment variable.
Use the environment variable AUTOTUNE_DATADIR to specify where tuning-related data is stored.
    export AUTOTUNE_DATADIR=/tmp/autotuner_data/

Step 2 Perform the initial compilation.
Add the -fautotune-generate option to the Bisheng compiler command to generate tuning opportunities.
    cd examples/coremark/
    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -fautotune-generate
NOTICE
It is recommended that this option be used only for hotspot code files that require tuning. If the application has too many code files (more than 500), a large number of tuning opportunity files are generated. As a result, the initialization in Step 3 may take a long time (several minutes), and the huge search space leads to poor tuning results and slow convergence.
Step 3 Initialize tuning.
Run the llvm-autotune command to initialize the tuning task and generate the initial compilation configuration for the next compilation.
    llvm-autotune minimize
minimize sets the tuning target to minimizing a metric such as program running time. You can also use maximize to maximize a metric such as program throughput.
Step 4 Compile with the tuning configuration.
Add the -fautotune option to the Bisheng compiler command so that it reads the current configuration in AUTOTUNE_DATADIR during compilation.
    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -fautotune

Step 5 Feed back performance data.
Run the program and obtain performance data as required, then run the llvm-autotune feedback command to feed the data back. For example, to tune for coremark running time, run the following commands:
    time -p ./coremark 0x0 0x0 0x66 300000 2>&1 1>/dev/null
    llvm-autotune feedback 31.09
NOTICE
Before running llvm-autotune feedback, check that the compilation in Step 4 succeeded and that the compiled program runs properly. If compilation or running fails, feed back the worst value for the tuning target: for minimize, run llvm-autotune feedback 9999; for maximize, feed back 0 or -9999. Incorrect performance feedback may degrade the final tuning result.

Step 6 Iterate.
Repeat Step 4 and Step 5 for the specified number of iterations.

Step 7 Stop tuning.
After multiple iterations, stop tuning and save the optimal configuration file. The file is saved in the directory specified by the environment variable AUTOTUNE_DATADIR.
    llvm-autotune finalize

Step 8 Perform the final compilation.
Use the optimal configuration file obtained in Step 7 for the final compilation. If the environment variable is unchanged, you can simply use the -fautotune option.
    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -fautotune
Alternatively, use the -mllvm -auto-tuning-input= option to point directly to the configuration file.
    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -mllvm -auto-tuning-input=/tmp/autotuner_data/config.yaml
----End

2.4.3 auto-tuner

Use the auto-tuner tool to manage the tuning process as follows. The procedure uses the configuration file for tuning coremark, located in the Bisheng software package at /lib/autotuner/config/coremark_sample.ini.

Step 1 Generate a tuning opportunity list.
Use the -mllvm -auto-tuning-opp= option of the Bisheng compiler to generate a tuning opportunity list for the search space.
    cd examples/coremark/
    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -g -o coremark -mllvm -auto-tuning-opp=opp

Step 2 Parse the list.
Parse the tuning opportunity list to generate the search space.
    cd ../..
    auto-tuner parse ./examples/coremark/opp/* -o loop_search.yaml --type-filter loop
The --type-filter loop option restricts the generated search space to loops, so that tuning is performed only at the loop level.

Step 3 Run the tuner.
Use the generated search space file to start automatic tuning.
    auto-tuner run config/coremark_sample.ini --results-log module.log --stop-after 600 -ss loop_search.yaml --time-after-convergence 300
You can use --stop-after or --time-after-convergence to limit the tuning time. In this example, the task stops 600 seconds after tuning starts, or once 300 seconds have passed without a better configuration being found.
NOTE
If the following error occurs:
    /bin/sh: config/../../../bin/clang not found
BinPath in config/coremark_sample.ini is set incorrectly. Change its value to the bin path of the Bisheng compiler.
----End

Alternatively, run the auto_run command, which completes the three phases above automatically in one step: it generates the tuning opportunity list, parses the list into a search space, and starts automatic tuning.
Command:
    auto-tuner auto_run config/coremark_sample.ini --results-log coremark.log --stop-after 600
auto_run tunes in three stages (module -> function -> loop). In each stage, parameters are adjusted at a specific granularity (module, function, loop, or machine_basic_block).
NOTE
To tune only at a specific granularity, use the --stage-order option (for example, --stage-order loop).

2.5 Uninstalling the Autotuner

Edit the environment variable PATH and delete the Bisheng compiler path that was added during installation, /opt/compiler/bisheng-compiler-1.3.3-aarch64-linux/bin.
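The PATH edit in 2.5 can also be done from the shell. The sketch below (the function name is hypothetical) prints a PATH value with every occurrence of one directory removed. It changes only the current shell; if you added the export line to a startup file such as ~/.bashrc, remove it there as well.

```shell
#!/usr/bin/env bash
# Sketch: print a PATH-style string with every occurrence of one directory removed.
# $1: directory to drop; $2: PATH-style string (defaults to the current $PATH).
path_without() {
  local drop="$1" path="${2-$PATH}" out="" dir
  local -a parts
  IFS=: read -r -a parts <<< "$path"   # split on ":" only, no globbing
  for dir in "${parts[@]}"; do
    [ "$dir" = "$drop" ] && continue
    out="${out:+$out:}$dir"            # join kept entries with ":"
  done
  printf '%s\n' "$out"
}

# Uninstall step from 2.5 (adjust the prefix to your installation directory):
#   export PATH="$(path_without /opt/compiler/bisheng-compiler-1.3.3-aarch64-linux/bin)"
```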
3 Preparations

Step 1 Install the Autotuner. For details, see 2 Quick Start.
Step 2 The Autotuner must be used with a compiler that supports tuning. Before running the Autotuner, check that the environment variable of the compiler is set correctly. Alternatively, you can set the environment variable in the configuration file. For details, see 4 Usage.
----End
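Step 2 can be checked with a short script. A sketch, assuming you pass the Bisheng bin directory that was added to PATH in 2.3 (the function name is hypothetical): it confirms that clang resolves to that directory, so that another clang earlier in PATH is not picked up, and that llvm-autotune and auto-tuner can be found.

```shell
#!/usr/bin/env bash
# Sketch: verify the compiler environment before running the Autotuner.
# $1: the Bisheng bin directory you added to PATH.
check_bisheng_tools() {
  local bindir="${1%/}" tool found status=0
  found=$(command -v clang) || found="(not found)"
  if [ "$found" != "$bindir/clang" ]; then
    echo "clang resolves to $found, expected $bindir/clang" >&2
    status=1
  fi
  for tool in llvm-autotune auto-tuner; do
    if ! command -v "$tool" >/dev/null; then
      echo "$tool not found on PATH" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_bisheng_tools /opt/compiler/bisheng-compiler-1.3.3-aarch64-linux/bin
```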
4 Usage

4.1 llvm-autotune (Recommended)

4.1.1 Tool Introduction

Currently, the Autotuner can be used in two modes, each with its own command line tool: llvm-autotune and auto-tuner.
llvm-autotune lets users lead the tuning process and provides auxiliary functions that work with the compiler. Compared with auto-tuner, it greatly simplifies configuration and tuning. llvm-autotune is recommended because it works out of the box.

4.1.2 Help Information

Help command: llvm-autotune -h

The execution format of llvm-autotune is as follows:
    llvm-autotune [-h] {minimize,maximize,feedback,dump,finalize}

Instructions:
● minimize: initializes tuning and generates the initial compiler configuration file, aiming to minimize a metric (such as running time).
● maximize: initializes tuning and generates the initial compiler configuration file, aiming to maximize a metric (such as throughput).
● feedback: feeds back the performance result and generates a new compiler configuration.
● dump: generates the current optimal configuration without stopping tuning (feedback can continue).
● finalize: stops tuning and generates the optimal compiler configuration (feedback can no longer be executed).

Help information:
● --help/-h
    usage: llvm-autotune [-h] {minimize,maximize,feedback,dump,finalize} ...

    positional arguments:
      {minimize,maximize,feedback,dump,finalize}
        minimize            Initialize tuning and generate the initial compiler configuration file, aiming to minimize the metric (e.g. run time)
        maximize            Initialize tuning and generate the initial compiler configuration file, aiming to maximize the metric (e.g. throughput)
        feedback            Feed back performance tuning result and generate a new test configuration
        dump                Dump the current best configuration without terminating the tuning run
        finalize            Finalize tuning and generate the optimal compiler configuration

    optional arguments:
      -h, --help            show this help message and exit

4.1.3 Compiler-related Options

llvm-autotune must be used with the -fautotune-generate and -fautotune options of the Bisheng compiler.
● -fautotune-generate:
– Generates the tuning opportunity list in the autotune_datadir directory. The default directory can be changed with the environment variable AUTOTUNE_DATADIR.
– As the first step of tuning preparation, use this option before running llvm-autotune minimize or llvm-autotune maximize.
– You can also assign a value to this option to change the tuning granularity. The values are Other, Function, Loop, and MachineBasicBlock. For example, -fautotune-generate=Function enables tuning opportunities of the function type, so that each function is assigned its own parameter values during tuning. Other means global: the generated tuning opportunities correspond to compilation units (code files). By default, -fautotune-generate is equivalent to -fautotune-generate=Function,Loop. The default value is recommended.
● -fautotune:
– Uses the compiler configuration in the autotune_datadir directory for tuning and compilation. (The default directory can be changed with the environment variable AUTOTUNE_DATADIR.)
– Use this option after running llvm-autotune minimize, maximize, or feedback in each tuning iteration.
NOTE
For details, see 2.4.2 llvm-autotune (Recommended).

4.2 auto-tuner
4.2.1 Tool Introduction

Currently, the Autotuner can be used in two modes, each with its own command line tool: llvm-autotune and auto-tuner.
auto-tuner is a traditional tuning tool that manages the entire tuning process. You need to adapt its configuration file to define the details of tuning, including how to compile and run the code, how to obtain performance data, and which parameters to tune.

4.2.2 Help Information

Help command: auto-tuner -h

The execution format of auto-tuner is as follows:
    auto-tuner [-h] {run,merge,divide,parse,auto_run} ...

Instructions:
● run: runs the tuner.
● merge: merges multiple compilation configuration files.
● divide: divides a compilation configuration file into multiple files based on the source file names in the configuration file.
● parse: parses the tuning opportunity list to generate the search space.
● auto_run (recommended): automatically generates the search space and performs tuning by stage. The default stage order is module -> function -> loop.

The three main instructions are parse, run, and auto_run.

Help information:
● --help/-h
    usage: auto-tuner [-h] {run,merge,divide,parse,auto_run} ...

    positional arguments:
      {run,merge,divide,parse,auto_run}
                            commands help
        run                 Run the tuner
        merge               Merge LLVM configuration input files
        divide              Divide LLVM configuration input file into multiple files based on file_name
        parse               Parse the tuning opportunity files and generate search space
        auto_run            (recommended) auto-generate the search space and run the auto-phase-based tuning (the default order of stages is module -> function -> loop)

    optional arguments:
      -h, --help            show this help message and exit

4.2.3 Parse Instruction

4.2.3.1 Usage of the Parse Instruction

The parse instruction parses the tuning opportunity list and generates the search space. Its format is as follows:
    auto-tuner parse ...
Mandatory parameter:
● opp_file: tuning opportunity file generated by the compiler

Common optional parameter:
● --output/-o: specifies the path of the output file.

Help information:
● --help/-h
    positional arguments:
      opp_file              Opportunity files generated by LLVM

    optional arguments:
      -h, --help            show this help message and exit
      --parse-format [{xml,yaml}]
                            choose the format of LLVM auto-tuning-input/opp,(default: yaml)
      -nf Name [Name ...], --name-filter Name [Name ...]
                            to filter code regions by names when generating search space
      --func-name-filter Name [Name ...]
                            to filter code regions by function names when generating search space
      --file-name-filter Name [Name ...]
                            to filter code regions by file names when generating search space
      -scf SEARCH_CONFIG_FILE, --search-config-file SEARCH_CONFIG_FILE
                            The Search space config file
      -o FILE, --output FILE
                            output file
      -tf {machine_basic_block,loop,function,module} [{machine_basic_block,loop,function,module} ...], --type-filter {machine_basic_block,loop,function,module} [{machine_basic_block,loop,function,module} ...]
                            to filter code regions by types when generating search space

4.2.3.2 Filters

When the search space is generated, the code regions in the opp files can be filtered by region name, function name, file name, and type. If no filter is applied, the search space contains all code regions. The format is as follows:
    --name-filter <Region name 1> <Region name 2> <Region name 3>
    --func-name-filter <Function name 1> <Function name 2> <Function name 3>
    --file-name-filter <File name 1> <File name 2> <File name 3>
    --type-filter <Type name 1> <Type name 2> <Type name 3>

NOTICE
These options filter code regions by matching the text in the opp files.
For example, suppose the following code region is to be filtered by file name:
    --- !AutoTuning
    Pass: machine-scheduler
    Name: '%bb.2:if.end'
    DebugLoc: { File: core_list_join.c, Line: 287, Column: 7 }
    Function: core_list_insert_new
    CodeRegionType: machine_basic_block
    ...
Only one of the following values for --file-name-filter matches this code region:
● [×] ./core_list_join.c
● [×] /home/user/coremark/core_list_join.c
● [√] core_list_join.c

4.2.3.3 Search Configuration File
The search configuration file defines global parameter settings for each type of code region. You can use --search-config-file to specify a personalized search configuration file. If --search-config-file is not specified, the Autotuner uses the default search configuration file, whose content is as follows:

    CodeRegion:
      CodeRegionType: loop
      Args:
        VectorizationInterleave:
          value: [1, 2, 4]
          type: enum
        UnrollCount:
          value: [0, 1, 2, 4, 8]
          type: enum
        PeelCount:
          value: [0, 1]
          type: enum
    ---
    CodeRegion:
      CodeRegionType: machine_basic_block
      Args:
        MachineScheduling:
          value: ["TopDown", "BottomUp", "Bidirectional"]
          type: enum
    ---
    CodeRegion:
      CodeRegionType: function
      Args:
        InlineThreshold:
          value: [175, 225, 275, 325, 375, 425, 500]
          type: enum
    ---
    CodeRegion:
      CodeRegionType: other
      Args:
        OptPass:
          type: selection
          value: [ipsccp, globalopt, mem2reg, deadargelim, instcombine, simplifycfg,
                  prune-eh, inline, functionattrs, argpromotion, sroa, jump-threading,
                  simplifycfg, aggressive-instcombine, instcombine, tailcallelim,
                  simplifycfg, reassociate, loop-simplify, lcssa, loop-rotate, licm,
                  loop-unswitch, simplifycfg, instcombine, loop-simplify, lcssa, indvars,
                  loop-deletion, loop-unroll, gvn, memcpyopt, sccp, instcombine,
                  jump-threading, dse, loop-simplify, lcssa, licm, simplifycfg,
                  instcombine, globalopt, globaldce, loop-simplify, lcssa, loop-rotate,
                  loop-simplify, instcombine, simplifycfg, instcombine, loop-simplify,
                  lcssa, loop-unroll, instcombine, loop-simplify, lcssa, licm,
                  strip-dead-prototypes, globaldce, constmerge, loop-simplify, lcssa,
                  simplifycfg]

When writing a personalized search configuration file, use the default file above as a reference.
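Because every loop, machine_basic_block, and function parameter in the default file is an enum, the number of settings a single code region can take is the product of its value-list lengths, and regions tuned together multiply. The following Python sketch works this out for the default file. It assumes the parameters are independent and leaves out the selection parameter of the "other" region type, so it is only an estimate and not necessarily what --print-search-space-size reports.

```python
from math import prod
from typing import Iterable

# Enum value lists copied from the default search configuration file.
# The "other" region type uses a selection parameter and is left out here.
DEFAULT_ENUM_PARAMS = {
    "loop": {
        "VectorizationInterleave": [1, 2, 4],
        "UnrollCount": [0, 1, 2, 4, 8],
        "PeelCount": [0, 1],
    },
    "machine_basic_block": {
        "MachineScheduling": ["TopDown", "BottomUp", "Bidirectional"],
    },
    "function": {
        "InlineThreshold": [175, 225, 275, 325, 375, 425, 500],
    },
}


def settings_per_region(region_type: str) -> int:
    """Distinct parameter combinations for one code region of this type."""
    params = DEFAULT_ENUM_PARAMS[region_type]
    return prod(len(values) for values in params.values())


def search_space_size(region_types: Iterable[str]) -> int:
    """Assumes regions are tuned independently, so their counts multiply."""
    return prod(settings_per_region(t) for t in region_types)


if __name__ == "__main__":
    for region_type in DEFAULT_ENUM_PARAMS:
        print(region_type, settings_per_region(region_type))
    # Two loop regions and one function region: 30 * 30 * 7 = 6300
    print(search_space_size(["loop", "loop", "function"]))
```

Even a small program quickly reaches thousands of candidate configurations this way, which is why narrowing the space with filters or a personalized search configuration file can be worthwhile.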
Important Configuration Attributes

    Key              Value
    CodeRegionType   other, loop, function, machine_basic_block
    type             bool, enum, range, permutation, selection

Variable Type
● bool: a parameter of the Boolean type.
    Args:
      ParamName:
        type: bool
● enum: a parameter whose value comes from an unordered set. A value is randomly selected from the specified set.
    Args:
      ParamName:
        type: enum
        value: [0, 2, 4, 8]
● range: an integer parameter whose value lies between the specified minimum and maximum (the valid range is 0 to 255). Both min and max must be specified.
    Args:
      ParamName:
        type: range
        min: 1
        max: 6
● permutation: a permutation parameter. The elements in value are reordered to form a permutation.
    Args:
      ParamName:
        type: permutation
        value: [option1, option2, option3, option4]
● selection: a selection parameter. Any number of elements are selected from value and arranged in any order to form a permutation.
    Args:
      ParamName:
        type: selection
        value: [option1, option2, option4, option5]

4.2.3.4 Parse Example
Run the following command as an example:

    auto-tuner parse opp1.yaml opp2.yaml opp3.yaml -o search_space.yaml --type-filter loop module

● opp1.yaml opp2.yaml opp3.yaml: the list of tuning opportunity files generated by the compiler with -auto-tuning-opp.
● -o search_space.yaml: writes the search space file search_space.yaml, which is used as the input of the run instruction.
● --type-filter loop module: keeps only the loop and module code regions.

4.2.4 Run Instruction
4.2.4.1 Running the Tuner
The format of the run instruction is as follows:

    auto-tuner run config_file --search_space SEARCH_SPACE

Mandatory parameters:
● config_file: tuning configuration file, which configures the compilation and running methods and related paths.
● --search_space: search space file generated by the parse instruction.
Common optional parameters:
● --results-log: log file that records information each time a new best configuration is found.
● --results-log-details: log file that records information about every iteration.
● --test-limit: maximum number of iterations.
● --stop-after: stops tuning after the specified time, in seconds.
● --time-after-convergence: stops tuning if no better compilation configuration is found within the specified time, in seconds.
Help information:
● --help/-h

    positional arguments:
      config_file
          The tuning config file.

    optional arguments:
      -h, --help
          show this help message and exit
      --machine-class MACHINE_CLASS
          name of the machine class being run on
      --parallel-compile
          present if compiling can be done in parallel
      --test-limit TEST_LIMIT
          stop tuning after given tests count
      --stop-after STOP_AFTER
          stop tuning after given seconds
      --parallelism PARALLELISM
          how many tests to support at once
      --pipelining PIPELINING
          how long a delay (in generations) before results are available
      --bail-threshold BAIL_THRESHOLD
          abort if no requests have been made in X generations
      --no-dups
          don't print out warnings for duplicate requests
      --seed-configuration FILENAME
          Start search at a given configuration. Can be specified multiple times. Configurations are loaded with ConfigurationManipulator.load_from_file() and file format is detected from extension.
      --results-log RESULTS_LOG
          file to store log of the best configuration times
      --results-log-details RESULTS_LOG_DETAILS
          file to store log of the non-best configuration times
      --quiet
          print less information
      --display-frequency DISPLAY_FREQUENCY
          how often for DisplayPlugin to print
      --technique TECHNIQUE, -t TECHNIQUE
          which technique to use
      --list-techniques, -lt
          list techniques available and exit
      --generate-bandit-technique, -gbt
          randomly generate a bandit to use
      --label LABEL
          name for the TuningRun
      --print-search-space-size
          Print out the estimated size of the search space and exit
      --database DATABASE
          database to store tuning results in, see: http://docs.sqlalchemy.org/en/rel_0_8/core/engines.html#database-urls
      --print-params, -pp
          show parameters of the configuration being tuned
      --time-after-convergence TIME, -tac TIME
          stop tuning if no new best results after given seconds
      -o DIR, --output DIR
          write optimal yaml config into the given directory
      --parse-format [{xml,yaml}]
          choose the format of LLVM auto-tuning-input/opp,(default: yaml)
      --plugin-dir DIR
          specify the dir to load customized tuner scripts
      -tr TUNER, --tuner TUNER
          Select which tuner to use
      -lr, --list-tuners
          List all available tuners
      --add-llvm-inputs ADD_LLVM_INPUTS [ADD_LLVM_INPUTS ...]
          add existing llvm configuration input files as constants in addition to the llvm configurations generated in each iteration of the tuning run
      -ss SEARCH_SPACE, --search_space SEARCH_SPACE
          The search space file.
      --enable-final-compile
          perform final compilation with optimal config at the end of tuning

4.2.4.2 Configuration File
You need to edit the configuration file, which specifies the environment variables, compilation settings, and running settings. For details, see the examples in the Bisheng software package directory /lib/autotuner/config. The following is an example configuration file for coremark tuning:

    # variables that can be shared in all the sections below
    [DEFAULT]
    # optional
    # Home = /path/to/your/home

    # change your environment variables
    [Environment Setting]
    # optional
    # prepend a list of paths into the PATH in order.
    # PATH = /path/to/bin
    # you can also set other environment variables here too.

    [Compiling Setting]
    # required
    # NOTE: ConfigFilePath is set to the path to the current config file automatically by default.
    CompileDir = %(ConfigFilePath)s/../examples/coremark/
    # Specify where autotuner will generate the compilation config (LLVM input file).
    # This will be passed to the compiler with -auto-tuning-input.
    LLVMInputFile = %(CompileDir)s/input.yaml
    BinPath = %(ConfigFilePath)s/../../../bin/
    CompileCommand = %(BinPath)s/clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 -g core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark -mllvm -auto-tuning-input=%(LLVMInputFile)s
    RunDir = %(CompileDir)s
    # run 300000 iterations for coremark
    RunCommand = ./coremark 0x0 0x0 0x66 300000

    # OppDir and OppCompileCommand are optional; they need not be specified unless you use the auto_run sub-command.
    # Specify where autotuner will parse tuning opportunity files from.
    # This should be set to where the compiler generates tuning opportunity files with -auto-tuning-opp.
    OppDir = %(CompileDir)s/opp
    # both -auto-tuning-input and -mllvm -auto-tuning-opp=opp need to be used in the OppCompileCommand directly or indirectly.
    # -auto-tuning-input is also needed here because auto_run can invoke multiple stages of tuning runs. The later stage needs to take the previous stage's best config to generate tuning opportunities.
    OppCompileCommand = %(CompileCommand)s -mllvm -auto-tuning-opp=%(OppDir)s

4.2.4.3 Tuners
A tuner is an instance that defines specific tuning behavior, including initialization, compilation, running, and testing. This behavior must be defined differently depending on the tuning objective, so multiple tuners are provided for different objectives. Sample customized tuner files are in the Bisheng software package directory /lib/autotuner/plugin/.
● Create a customized tuner.
  Write a Python file that inherits from the parent class CustomTunerBase and overrides functions as required. To register the tuner, give the file the suffix _tuner.py (for example, xxx_tuner.py) and place it in the tuner plug-in directory.
● Use your own tuner plug-in.
  To use your own tuner plug-in when running the auto-tuner instruction, specify the plug-in directory that contains the user-defined tuner:
    --plugin-dir DIR
● Select the tuner you want to use.
    --tuner TUNER (or -tr TUNER)
  If no tuner is specified, SimpleTuner is used by default.
● Check all tuners.
  To list all available tuners, use:
    --list-tuners (or -lr)

The following is an example of a customized tuner for coremark tuning:

    import os

    from opentuner import Result
    from opentuner.search.objective import MinimizeCycle
    from autotuner.tuners.tunerbase import CustomTunerBase


    class Tuner(CustomTunerBase):
        # run() is called for each configuration: it runs the program built with
        # that configuration and returns the measured performance.
        def run(self, desired_result, input, limit):
            """
            Compile and run a given configuration then return performance
            """
            cycles = float('inf')
            # run the executable built for this configuration
            run_result = self.call_program(self.run_cmd, cwd=self.run_dir, limit=120)
            # check whether the program ran successfully
            if run_result['returncode'] == 0:
                std = run_result['stdout']
                if "Correct operation validated." in std:
                    # the third output line reads "Total ticks      : <ticks>"
                    cycles_line = std.strip().splitlines()[2]
                    cycles = int(cycles_line.split(':', 1)[1])
                else:
                    # keep the output of failed validations for later inspection
                    os.makedirs('errors_log', exist_ok=True)
                    log_name = "errors_log/errors_" + str(desired_result.configuration.id) + ".log"
                    with open(log_name, 'w') as file:
                        file.write(std)
                    print('coremark errors detected')
            else:
                self._print_errors(self.run_cmd, run_result)
            return Result(cycle=cycles, time=run_result['time'])

        def objective(self):
            """
            Override the default objective MinimizeTime
            """
            return MinimizeCycle()

To tune coremark automatically, the tuner must run the executable, parse its stdout, and use cycles as the metric. Therefore, run() and objective() are overridden from the parent class. For more detailed examples, see the scripts in the release package directory plugin/.
Currently, the following metrics are supported:
● time (required)
● cycle (optional)
● rate (optional)
The metrics are passed back in the Result object returned by the run() function. For example:

    return Result(rate=rate, time=run_result['time'])

The tuning objectives corresponding to the three metrics are as follows:
● MinimizeTime()
● MinimizeCycle()
● MaximizeRate()
For example, if MinimizeTime() is the tuning objective, the smaller the Result.time value returned by run() in an iteration, the better the compilation configuration used in that iteration. If MaximizeRate() is the tuning objective, the greater the Result.rate value returned by run() in an iteration, the better the compilation configuration used in that iteration.

4.2.4.4 Search Space File
The search space file is a mandatory parameter of the run instruction.
It defines the detailed search space (the code regions and their parameters) for the tuning task. The parse instruction generates this file from the tuning opportunity files produced by the compiler.
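As a rough mental model of the parse instruction (not its actual implementation), the Python sketch below keeps each opp code region that passes the filters and attaches a copy of the parameters that the search configuration defines for that region's type. The CodeRegion class, the function names, the simplified output layout, and the sequential tuning_id numbering are all assumptions made for illustration; the literal string comparison mirrors the --file-name-filter example in 4.2.3.2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CodeRegion:
    """Hypothetical model of the fields read from one !AutoTuning opp record."""
    name: str
    function: str
    file: str
    region_type: str
    pass_name: str


def passes_filters(region: CodeRegion,
                   file_names: Optional[List[str]] = None,
                   type_names: Optional[List[str]] = None) -> bool:
    # Literal text comparison: no path normalization is applied, so
    # "./core_list_join.c" does not match a region recorded as "core_list_join.c".
    if file_names is not None and region.file not in file_names:
        return False
    if type_names is not None and region.region_type not in type_names:
        return False
    return True


def build_search_space(regions: List[CodeRegion],
                       search_config: Dict[str, dict],
                       **filters) -> List[dict]:
    """Attach the per-type parameters from the search config to each kept region."""
    space = []
    for region in regions:
        params = search_config.get(region.region_type)
        if params is None or not passes_filters(region, **filters):
            continue
        space.append({
            "code_region": {
                "code_region_type": region.region_type,
                "debug_loc": {"file_name": region.file},
                "func_name": region.function,
                "name": region.name,
                "pass_name": region.pass_name,
            },
            "params": dict(params),       # each region gets its own copy
            "tuning_id": len(space) + 1,  # assumed: sequential numbering
        })
    return space


if __name__ == "__main__":
    regions = [
        CodeRegion("while.cond7.i.outer", "core_list_init",
                   "core_list_join.c", "loop", "loop-unroll"),
        CodeRegion("%bb.2:if.end", "core_list_insert_new",
                   "core_list_join.c", "machine_basic_block", "machine-scheduler"),
    ]
    config = {"loop": {"PeelCount": {"type": "enum", "value": [0, 1]}}}
    print(build_search_space(regions, config, type_names=["loop"]))
    print(build_search_space(regions, config, file_names=["./core_list_join.c"]))
```

The second call returns an empty list, which is the behavior the NOTICE in 4.2.3.2 warns about.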
NOTE
To specify a search space, use --search_space or -ss. Example: -ss SEARCH_SPACE_FILE

The following is an example of a search space file in YAML format:

    code_region:
      code_region_type: loop
      debug_loc:
        column: 13
        file_name: core_list_join.c
        line: 453
      func_name: core_list_init
      name: while.cond7.i.outer
      pass_name: loop-unroll
    params:
      PeelCount:
        type: enum
        value: [0,1]
      UnrollCount:
        type: enum
        value: [0,1,2,4,8]
      VectorizationInterleave:
        type: enum
        value: [1,2,4]
    tuning_id: 1
    ---
    code_region:
      code_region_type: loop
      debug_loc:
        column: 13
        file_name: core_list_join.c
        line: 443
      func_name: core_list_init
      name: for.body.i
      pass_name: loop-vectorize
    params:
      PeelCount:
        type: enum
        value: [0,1]
      UnrollCount:
        type: enum
        value: [0,1,2,4,8]
      VectorizationInterleave:
        type: enum
        value: [1,2,4]
    tuning_id: 2

Its structure is similar to that of the search configuration file, except that each parameter set belongs to a specific code region rather than to a code region type.

4.2.4.5 Algorithm
You can specify the search algorithm used for automatic tuning. For example, when debugging an automatic tuning setup, you can use the SimpleTraverse algorithm, which traverses all parameter values while changing only one parameter value at a time.
● List all algorithms.
    --list-techniques
● Use a specific algorithm (for example, SimpleTraverse).
    --technique SimpleTraverse

4.2.4.6 Run Example
Run the following command as an example:

    auto-tuner run config/coremark_sample.ini --plugin-dir ./plugin-dir -tr coremark_tuner --results-log coremark.log --results-log-details details.log --stop-after 3600 --time-after-convergence 600 -ss search_space.yaml

The parameters are described as follows:
● config/coremark_sample.ini: tuning configuration file.
● --plugin-dir ./plugin-dir: specifies the customized plug-in directory.
● -tr coremark_tuner: selects the customized tuner coremark_tuner stored in the plug-in directory ./plugin-dir.
● --results-log coremark.log: records performance information each time a new best configuration is found.
● --results-log-details details.log: records performance information about every iteration.
● --stop-after 3600: tuning stops after 3600 seconds.
● --time-after-convergence 600: tuning stops if no better configuration is found within 600 seconds.
● -ss search_space.yaml: uses search_space.yaml as the search space file.
After tuning is complete, the optimal configuration is written to opt_config.yaml. You can use the -o option to change the directory into which the optimal configuration file is written. To apply this configuration and generate the optimized binary, add the Bisheng compiler option -mllvm -auto-tuning-input=opt_config.yaml to the compile command. For example, to compile coremark, run the following command:

    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 -g core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark -mllvm -auto-tuning-input=opt_config.yaml

4.2.5 Auto-run Instruction
4.2.5.1 Usage of the Auto-run Instruction
The auto-run instruction is similar to the run instruction, but it generates the search space automatically instead of taking a search space file on the command line.
NOTE
This instruction requires additional settings (OppDir and OppCompileCommand) in the configuration file, as in config/coremark_sample.ini.
The format of the auto-run instruction is as follows:

    auto-tuner auto_run config_file

Mandatory parameter:
● config_file: tuning configuration file, which configures the compilation and running methods and related paths.
Common optional parameter:
● --stage-order: specifies the sequence of tuning phases. The default sequence is module -> function -> loop. For example, --stage-order function loop first tunes at the function level and then at the finer loop level.

    positional arguments:
      config_file
          The tuning config file.

    optional arguments:
      -h, --help
          show this help message and exit
      --machine-class MACHINE_CLASS
          name of the machine class being run on
      --parallel-compile
          present if compiling can be done in parallel
      --test-limit TEST_LIMIT
          stop tuning after given tests count
      --stop-after STOP_AFTER
          stop tuning after given seconds
      --parallelism PARALLELISM
          how many tests to support at once
      --pipelining PIPELINING
          how long a delay (in generations) before results are available
      --bail-threshold BAIL_THRESHOLD
          abort if no requests have been made in X generations
      --no-dups
          don't print out warnings for duplicate requests
      --seed-configuration FILENAME
          Start search at a given configuration. Can be specified multiple times. Configurations are loaded with ConfigurationManipulator.load_from_file() and file format is detected from extension.
      --results-log RESULTS_LOG
          file to store log of the best configuration times
      --results-log-details RESULTS_LOG_DETAILS
          file to store log of the non-best configuration times
      --quiet
          print less information
      --display-frequency DISPLAY_FREQUENCY
          how often for DisplayPlugin to print
      --technique TECHNIQUE, -t TECHNIQUE
          which technique to use
      --list-techniques, -lt
          list techniques available and exit
      --generate-bandit-technique, -gbt
          randomly generate a bandit to use
      --label LABEL
          name for the TuningRun
      --print-search-space-size
          Print out the estimated size of the search space and exit
      --database DATABASE
          database to store tuning results in, see: http://docs.sqlalchemy.org/en/rel_0_8/core/engines.html#database-urls
      --print-params, -pp
          show parameters of the configuration being tuned
      --time-after-convergence TIME, -tac TIME
          stop tuning if no new best results after given seconds
      -o DIR, --output DIR
          write optimal yaml config into the given directory
      --parse-format [{xml,yaml}]
          choose the format of LLVM auto-tuning-input/opp,(default: yaml)
      --stage-order stage [stage ...]
          specify stage order of auto_run. each stage is a code
          region type
      -nf Name [Name ...], --name-filter Name [Name ...]
          to filter code regions by names when generating search space
      --func-name-filter Name [Name ...]
          to filter code regions by function names when generating search space
      --file-name-filter Name [Name ...]
          to filter code regions by file names when generating search space
      -scf SEARCH_CONFIG_FILE, --search-config-file SEARCH_CONFIG_FILE
          The Search space config file
      --plugin-dir DIR
          specify the dir to load customized tuner scripts
      -tr TUNER, --tuner TUNER
          Select which tuner to use
      -lr, --list-tuners
          List all available tuners
      --add-llvm-inputs ADD_LLVM_INPUTS [ADD_LLVM_INPUTS ...]
          add existing llvm configuration input files as constants in addition to the llvm configurations generated in each iteration of the tuning run

The auto-run instruction also tunes code regions at different granularities automatically: with the default stage order, it executes three tuning tasks at different code region levels in sequence. It works as follows: in each phase, the optimal configuration found in the previous phase is used as a constant configuration, and the tuning task is executed at a finer code region level with the corresponding tuning parameters. When each phase completes, the optimal configuration file for that phase is generated for the compiler to use. As with the run instruction, a generated configuration file takes effect when you add the Bisheng compiler option -mllvm -auto-tuning-input=<file path>.

NOTE
The run options given to auto_run are applied to each of its tuning runs in turn (three runs with the default stage order). For example, with --stop-after 10, each phase stops after 10 seconds, so auto_run stops after 30 seconds in total.
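The phase chaining described above can be modeled in a few lines of Python. In this hypothetical sketch, tune_stage stands in for one complete tuning run over the code regions of a single type, and a configuration is simplified to a flat dictionary; neither is the Autotuner's real interface. Each stage's best settings are merged into the constants passed to the next stage, which is why the last phase's file contains everything, and the helper at the end restates the NOTE that per-phase time limits add up.

```python
from typing import Callable, Dict, List

# Hypothetical: a configuration is a flat mapping from "<region>.<param>" to a value.
Config = Dict[str, object]


def auto_run(stages: List[str],
             tune_stage: Callable[[str, Config], Config]) -> Dict[str, Config]:
    """Model of the auto_run phases: one tuning run per stage, best settings carried forward.

    tune_stage(stage, constants) stands in for a full tuning run over the code
    regions of type `stage` with `constants` held fixed; it returns only the
    best settings it found for that stage.
    """
    constants: Config = {}
    per_stage_files: Dict[str, Config] = {}
    for stage in stages:
        best = tune_stage(stage, dict(constants))
        constants.update(best)                    # fixed for all later stages
        per_stage_files[stage] = dict(constants)  # e.g. written as <stage>.yaml
    return per_stage_files


def total_time_limit(stages: List[str], stop_after: float) -> float:
    """Each phase honors --stop-after separately, so the limits add up."""
    return len(stages) * stop_after


if __name__ == "__main__":
    def fake_tune(stage: str, constants: Config) -> Config:
        # Pretend each stage tunes one parameter; record how many constants it saw.
        return {f"{stage}.param": len(constants)}

    files = auto_run(["module", "function", "loop"], fake_tune)
    print(files["loop"])
    print(total_time_limit(["module", "function", "loop"], 10))
```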
4.2.5.2 Auto-run Example
Run the following command as an example:

    auto-tuner auto_run config/coremark_sample.ini -tr coremark_tuner --results-log coremark.log --results-log-details details.log --time-after-convergence 600

The auto_run instruction is similar to the run instruction, except that it does not require a search space file. As with the parse instruction, you can use the filter options to restrict the search space it generates. In this example, the optimal configuration files module.yaml, function.yaml, and loop.yaml, corresponding to the three tuning phases, are generated by default.
Select the optimal configuration file for compilation as required. Generally, use the configuration file of the last tuning phase, because it contains the configuration information of all previous phases:

    clang -Ilinux64 -I. -DFLAGS_STR=\"" -lrt"\" -DITERATIONS=300000 -g core_list_join.c core_main.c core_matrix.c core_state.c core_util.c linux64/core_portme.c -O2 -o coremark -mllvm -auto-tuning-input=loop.yaml
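If you script the final build, the only change to the compile command is the trailing -mllvm -auto-tuning-input=<file> pair. The Python helpers below are a hypothetical sketch: they assume each phase's file is named after its stage (as module.yaml, function.yaml, and loop.yaml are in this example) and pick the file of the last stage in the stage order.

```python
import subprocess
from typing import List, Sequence

DEFAULT_STAGE_ORDER = ("module", "function", "loop")


def final_config_file(stage_order: Sequence[str] = DEFAULT_STAGE_ORDER) -> str:
    # Assumption: each phase writes "<stage>.yaml", as in the coremark example.
    # The last phase's file already includes the settings of earlier phases.
    return f"{stage_order[-1]}.yaml"


def with_tuning_input(compile_argv: Sequence[str], config_path: str) -> List[str]:
    # -mllvm forwards the following argument to LLVM, so the two stay adjacent.
    return [*compile_argv, "-mllvm", f"-auto-tuning-input={config_path}"]


def build_tuned(compile_argv: Sequence[str],
                stage_order: Sequence[str] = DEFAULT_STAGE_ORDER) -> None:
    """Run the compile command with the last phase's configuration applied."""
    argv = with_tuning_input(compile_argv, final_config_file(stage_order))
    subprocess.run(argv, check=True)
```

Passing the command as an argument list, rather than a shell string, avoids re-quoting options such as -DFLAGS_STR when the command is assembled in a script.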
5 Appendix

5.1 Feedback
If you encounter any problem and need technical support, send the problem information to the Kunpeng compiler forum.

5.2 Change History

    Date         Change History
    2021-06-22   This is the fifth official release. The update is as follows:
                 Updated the description of using the Autotuner.
    2020-12-12   This is the fourth official release. The update is as follows:
                 Added the description of the llvm-autotune tool.
    2020-11-26   This is the third official release. The update is as follows:
                 Added the parameter description of the instructions in Chinese.
                 Added the working mode diagram of the auto-run instruction.
    2020-10-29   This is the second official release. The update is as follows:
                 Updated the Autotuner tuning flowchart.
    2020-09-28   This is the first official release.