Optimisation, Profiling and Debugging

Profiling Tools


This tutorial shows how to run the Intel profiling tools on AMD processors. It also explores which kinds of code we can profile.


Project setup

First, we prepare an executable to run the tests. Any code will do for these examples; here we use the nqueens example provided with the Advisor installation. We copy it from the Advisor installation directory:

cp /dcsrsoft/spack/external/intel/2021.2/advisor/2021.2.0/samples/en/C++/nqueens_Advisor.tgz .

Then, extract the contents and compile the serial version:

make 1_nqueens_serial
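
The extract-and-build step can be sketched as a small shell function. This is a minimal sketch, not a definitive recipe: `build_nqueens` is a hypothetical helper name, and the directory name `nqueens_Advisor` is an assumption inferred from the working directory used later in the Slurm job.

```shell
# Hypothetical helper: unpack the Advisor sample archive and build the serial target.
# Assumption: the archive unpacks into ./nqueens_Advisor.
build_nqueens() {
    archive="${1:-nqueens_Advisor.tgz}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    (
        tar -xzf "$archive" &&
        cd nqueens_Advisor &&
        make 1_nqueens_serial
    )
}
```

After copying the archive, calling `build_nqueens nqueens_Advisor.tgz` from the same directory should leave the compiled serial version inside the extracted directory.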

Creating a project

We create a project using the Advisor GUI:


We set the path to our nqueens executable (or whichever executable you want to profile) and click OK.


Several analyses are proposed:


We start with Vectorization and Code Insights, which gives us information about the parallelization opportunities in the code. It identifies the loops that will benefit most from vector parallelism, discovers performance issues, and so on. The summary window gives us more details.


To use Advisor on the cluster, it is better to use the command line. The GUI can provide the commands we should run. Let's run the survey: to see the command to run, click on the following button:



This will show the exact command to use:


We can copy that line into our Slurm job script:

#!/bin/bash
#SBATCH --job-name test-prof
#SBATCH --error advisor-%j.error
#SBATCH --output advisor-%j.out
#SBATCH -N 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --time 1:00:00

/dcsrsoft/spack/external/intel/2021.2/advisor/2021.2.0/bin64/advisor -collect survey -project-dir /users/cruiz1/profilers/intel/advisor/nqueens_study --app-working-dir=/users/cruiz1/profilers/intel/advisor/nqueens_Advisor -- /use\

We launch the job:

sbatch slurm_advisor.sh

Then check the Slurm output files for errors.
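
Checking the output files can be sketched as follows. The file names follow the `--error advisor-%j.error` and `--output advisor-%j.out` patterns from the job script, where `%j` is the job ID printed by `sbatch`. The helper name `check_job_logs` and its return codes are hypothetical. Tools may also write informational messages to the error stream, so a non-empty error file is a prompt to read it rather than proof of failure.

```shell
# Hypothetical helper: report whether the Slurm error file for a job has content.
# File names follow the #SBATCH --error/--output patterns used in the job script.
check_job_logs() {
    jobid="$1"
    dir="${2:-.}"            # directory the job was submitted from
    err="$dir/advisor-$jobid.error"
    out="$dir/advisor-$jobid.out"
    if [ ! -e "$err" ] && [ ! -e "$out" ]; then
        echo "no log files found for job $jobid in $dir" >&2
        return 2
    fi
    if [ -s "$err" ]; then
        # The error file is non-empty: show it so it can be inspected.
        echo "messages in $err:"
        cat "$err"
        return 1
    fi
    echo "no messages in $err"
}
```

For example, if `sbatch` reported `Submitted batch job 123456`, running `check_job_logs 123456` in the submission directory prints the contents of `advisor-123456.error` when it is non-empty.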

Checking results

If we close and reopen the project, we see that we have some results:


We have recommendations for using other instruction sets because no vector instruction set was detected.



We see the most time-consuming loops:


Advisor correctly detects the AMD CPU:


In the survey window we can observe the time-consuming parts of the code. Each line of the table represents either a function call or a loop. Each line presents several useful pieces of information, such as the vector instructions used, the length of the vector instructions and the data type.

In the window above, we would expect to see recommendations about which vector instructions to use. They are missing, probably because we are running on an AMD processor. Compiling the code with the Intel compiler did not help.

The lower half of the screen shows the following tabs:

On the Top Down tab, we can see where each call takes place:


Below is a screenshot of the code analysis window:


Collecting trip counts

We choose the Characterization analysis. To improve the analysis we should select a loop, which can be done in the survey window:

We then launch the characterization and, again, ask for the command line:


The generated command will contain the additional options:

tripcounts -flop -stacks -mark-up-list-2
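
The relation between the survey command and the trip counts command can be sketched with a small dry-run helper that only prints the command line. This is a sketch under stated assumptions: `advisor_cmd` is a hypothetical helper, `PROJECT_DIR` and `APP` are placeholders the reader must set, and the `-mark-up-list` option is left out because its exact form is best copied from the command the GUI generates.

```shell
# Hypothetical helper: print (not run) an advisor command line for a given analysis.
# PROJECT_DIR and APP are placeholders that must be set by the reader.
advisor_cmd() {
    analysis="$1"; shift
    printf 'advisor -collect %s' "$analysis"
    # Append any extra options for this analysis, e.g. -flop -stacks.
    for opt in "$@"; do printf ' %s' "$opt"; done
    # Reuse the same project directory so results accumulate in one project.
    printf ' -project-dir %s -- %s\n' "$PROJECT_DIR" "$APP"
}
```

With `PROJECT_DIR` and `APP` set, `advisor_cmd survey` prints the survey command and `advisor_cmd tripcounts -flop -stacks` prints the trip counts variant. Both point at the same project directory, which is why the trip counts appear alongside the survey results when the project is reopened.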

We can see the different trip counts for each loop:


We can now repeat the process for memory access analysis. After running the analysis, we have new information:

If we compile the code with a more capable instruction set, this is detected in the summary window:


and the call stack window:
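
Before recompiling for a different instruction set, it can help to check which vector extensions the compute node's CPU advertises. The following is a minimal sketch, assuming a Linux node where `/proc/cpuinfo` contains a `flags` line. `simd_flags` is a hypothetical helper, and the list of extensions it checks is illustrative rather than exhaustive.

```shell
# Hypothetical helper: list which common SIMD extensions a CPU advertises.
# Reads the Linux "flags" line; the file path is a parameter so it can be tested.
simd_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    flags=$(grep -m1 '^flags' "$cpuinfo") || return 1
    for isa in sse2 sse4_2 avx avx2 avx512f; do
        # Pad with spaces so that "avx" does not match "avx2".
        case " $flags " in
            *" $isa "*) echo "$isa" ;;
        esac
    done
}
```

Running `simd_flags` inside a job on the target partition prints one supported extension per line, which indicates which instruction sets are worth trying when recompiling.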