Service: RWTH Compute Cluster Linux (HPC)

intelvtune

Summary

Intel VTune Amplifier is a powerful threading and performance optimization tool for Fortran, C/C++, Java, Assembly and more. It provides both GUI and CLI tools for data collection and analysis, and it offers a rich set of predefined analysis types. You can also define custom analysis types tailored to your specific task.


Details

To launch the VTune Amplifier GUI:

$ module load intelvtune
$ amplxe-gui
 

Many predefined analysis types are readily available:

  • Basic Hotspots
  • Advanced Hotspots
  • Concurrency
  • Locks and Waits

These analysis types do not require any special hardware or OS kernel-level support and are available on every node of the HPC cluster.

Analysis types based on hardware counters, such as

  • General Exploration
  • CPU Specific Analysis (Bandwidth, ...)

require additional support, which is only available on the front-end nodes cluster-linux-tuning, login-t and login18-t. Therefore, interactive hardware counter experiments can only be performed on these nodes. Any user can use VTune Amplifier with hardware counters; membership in a special group is no longer required.
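Before starting a hardware counter experiment, it can save time to check whether the VTune sampling kernel modules are actually loaded on the node you are on. The following is a minimal sketch, not an official tool: the function name `vtune_kmods_loaded` is hypothetical, and it simply reuses the name patterns (`sep`, `pax`, `vtsspp`) from the `lsmod` check in the MPI example script on this page, reading `/proc/modules` (the file `lsmod` itself reads).

```shell
# Hypothetical helper: check whether VTune's sampling kernel modules appear
# to be loaded on this node. It matches the same name patterns
# (sep, pax, vtsspp) as the lsmod check in the MPI example script.
# Optional argument: path to a modules list (default: /proc/modules).
vtune_kmods_loaded() {
    local modules_file="${1:-/proc/modules}"
    # Field 1 of each /proc/modules line is the module name.
    if awk '{ print $1 }' "$modules_file" 2>/dev/null \
        | grep -q -e sep -e pax -e vtsspp; then
        echo "VTune sampling modules found on this node"
        return 0
    fi
    echo "No VTune sampling modules found: hardware counter analyses will not work here" >&2
    return 1
}
```

In a batch script you could call `vtune_kmods_loaded || exit 1` right after `module load intelvtune`, so that a job without hardware counter support stops immediately instead of failing later inside `amplxe-cl`. Because the patterns are plain substrings, an unrelated module whose name happens to contain `sep` or `pax` would also match.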

For longer test runs, please use the batch system. You can run Intel VTune Amplifier in batch jobs in two ways:

  • start an interactive GUI session in a batch job and launch the GUI as described above;
  • perform data collection using the CLI tools in a batch job; after the job has finished, use the GUI to analyze the collected results.

The most convenient way to build the CLI command line is to first create the desired analysis project in the amplxe-gui GUI. Once you have chosen the analysis type, the binary to run, the options and so on, click the Command Line... button in the lower right corner of the window. A pop-up window then displays the full command line.

Note: with SLURM, to enable hardware counter support (needed for the hardware-counter-based analysis types), you have to add one of the following options to your batch job:

#SBATCH --hwctr=vtune
#SBATCH --hwctr=vtuneperf

Both options also put your job in exclusive mode, so mind the resource consumption!

For further details on how to use VTune Amplifier, please contact the HPC team or attend one of our regular workshops.

 

Example batch script for Intel VTune Amplifier with GUI and support for hardware counters

#!/usr/local_rwth/bin/zsh
 
### Job name
#SBATCH --job-name=VTuneGUI-HwC
 
### Request the time you need for execution in minutes
#SBATCH --time=120
 
### Request the amount of memory you need for your job in MB
### Please keep in mind that some analysis types may require a lot of memory
#SBATCH --mem-per-cpu=18500
 
### Request X11-Forwarding for this job
#TBD:
 
### Request Hardware Counters support for VTune Amplifier
### Note that this also makes your job exclusive
#SBATCH --hwctr=vtune
 
 
### Load the module and execute the GUI
module load intelvtune
amplxe-gui
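The GUI needs an X display, and the X11-forwarding line in the script above is still marked "#TBD:". One option is to fail early with a clear message when no display is available, instead of letting the GUI start fail inside the job. This is a minimal sketch; the wrapper name `start_vtune_gui` is hypothetical, and it assumes that working X11 forwarding makes `DISPLAY` visible inside the job.

```shell
# Hypothetical wrapper: start amplxe-gui only if an X display is available.
# Assumption: X11 forwarding for the job is configured (the "#TBD:" line in
# the GUI batch script), which makes DISPLAY visible inside the job.
start_vtune_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: no X11 forwarding, cannot start amplxe-gui" >&2
        return 1
    fi
    # Pass any arguments through unchanged.
    amplxe-gui "$@"
}
```

In the batch script, the final `amplxe-gui` line would then become `start_vtune_gui || exit 1`.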

Example batch script for CLI use of Intel VTune Amplifier with support for hardware counters; separate analysis after job completes

#!/usr/local_rwth/bin/zsh
 
### Job name
#SBATCH --job-name=VTuneCLI-HwC
 
### Request the time you need for execution in minutes
#SBATCH --time=120
 
### Request the amount of memory you need for your job in MB
### Take into account the memory overhead of the VTune collector
#SBATCH --mem-per-cpu=1850
 
### Request Hardware Counters support for VTune Amplifier
### Note that this also makes your job exclusive
#SBATCH --hwctr=vtune
 
 
### Load modules and execute
### CLI collection of General Exploration type (with hardware counters)
### binary file 'a.out' in $HOME/test_program for user ab123456
### parameters passed to 'a.out' - '200 1 1'
module load intelvtune
amplxe-cl -collect general-exploration -result-dir my_experiment -app-working-dir /home/ab123456/test_program \
    -- /home/ab123456/test_program/a.out 200 1 1

After the job has finished, results will be available in the my_experiment directory and can be loaded in the GUI for analysis.
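If you submit the same script several times, all runs use the same my_experiment directory. A simple way to keep the runs apart is to include the SLURM job ID in the result directory name. The sketch below assumes `SLURM_JOB_ID` (set by SLURM inside batch jobs); the function name `vtune_result_dir`, the variable `RESULT_DIR` and the fallback suffix `interactive` are hypothetical choices for illustration.

```shell
# Hypothetical helper: build a result directory name that is unique per job,
# so repeated submissions do not share one my_experiment directory.
# SLURM_JOB_ID is set by SLURM inside batch jobs; "interactive" is an
# arbitrary fallback chosen for this sketch.
vtune_result_dir() {
    local prefix="${1:-my_experiment}"
    printf '%s_%s\n' "$prefix" "${SLURM_JOB_ID:-interactive}"
}

# In the batch script (RESULT_DIR is a hypothetical variable name):
#   RESULT_DIR=$(vtune_result_dir)
#   amplxe-cl -collect general-exploration -result-dir "$RESULT_DIR" ...
# After the job, a text summary can be printed without the GUI:
#   amplxe-cl -report summary -result-dir "$RESULT_DIR"
```

For example, a job with ID 4711 would write its results to `my_experiment_4711`.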

 

Example batch script for CLI use of Intel VTune Amplifier with support for MPI and hardware counters; separate analysis after job completes

#!/usr/local_rwth/bin/zsh
  
### Job name
#SBATCH --job-name=MPIVTuneCLI-HwC
  
### Request the time you need for execution in minutes
#SBATCH --time=120
  
### Request the amount of memory per MPI rank in MB
### Take into account the memory overhead of the VTune collector
#SBATCH --mem-per-cpu=1850
 
### This is a parallel (MPI) batch job on a single node
### NOTE: multi-node jobs are currently not supported!
#SBATCH --nodes=1
#SBATCH --ntasks=8
  
### Request Hardware Counters support for VTune Amplifier
### Note that this also makes your job exclusive
#SBATCH --hwctr=vtune
 
  
### Load modules, check that all kernel modules are available, run CLI collection
module load intelvtune
lsmod | grep -e sep -e pax -e vtsspp
$MPIEXEC -l $FLAGS_MPI_BATCH amplxe-cl -trace-mpi -result-dir my_experiment -collect general-exploration -app-working-dir /home/ab123456/test_program \
    -- /home/ab123456/test_program/a.out 200 1 1

NOTE: MPI jobs utilizing multiple nodes are currently not supported.
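Since multi-node collection is not supported, a job that accidentally receives more than one node will not give usable results. The following minimal sketch adds a guard for this case. It relies on `SLURM_JOB_NUM_NODES`, which SLURM sets inside a job; the function name `require_single_node` and the default of 1 when the variable is missing are assumptions of this sketch.

```shell
# Hypothetical guard: stop before collecting if the job spans several nodes,
# because multi-node VTune MPI collection is not supported here.
# SLURM_JOB_NUM_NODES is set by SLURM inside a job; this sketch assumes 1
# when it is missing (e.g. when testing outside the batch system).
require_single_node() {
    local nodes="${SLURM_JOB_NUM_NODES:-1}"
    if [ "$nodes" -gt 1 ]; then
        echo "Job spans $nodes nodes; VTune MPI collection needs a single node" >&2
        return 1
    fi
    return 0
}

# In the MPI batch script, before the $MPIEXEC line:
#   require_single_node || exit 1
```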

 

Example batch script for CLI use of Intel VTune Amplifier for analysis of a single rank in unsupported MPI environments; separate analysis after job completes

#!/usr/local_rwth/bin/zsh
 
### Job name
#SBATCH --job-name=MPIVTuneCLI
 
### Request the time you need for execution in minutes
#SBATCH --time=120
 
### Request the amount of memory per MPI rank in MB
### Take into account the memory overhead of the VTune collector
#SBATCH --mem-per-cpu=1850
 
### This is a parallel (Open MPI) batch job
#SBATCH --ntasks=8
 
 
### Request Hardware Counters support for VTune Amplifier
### Note that this also makes your job exclusive
#SBATCH --hwctr=vtune
  
### Load modules, run CLI collection
module switch intelmpi openmpi
module load intelvtune
cd $HOME/test_program
### rank 0 runs under amplxe-cl, the other 7 ranks run a.out directly (1 + 7 = 8 tasks)
$MPIEXEC $FLAGS_MPI_BATCH -n 1 amplxe-cl -result-dir my_experiment -collect hotspots -- ./a.out 200 1 1 : \
    -n 7 ./a.out 200 1 1

This runs the collection on rank 0 only; all other ranks run the application without the collector.
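In this MPMD form, the rank counts on both sides of the ":" must add up to the number of tasks requested with --ntasks. One way to avoid hard-coding the second count is to derive it from `SLURM_NTASKS` (set by SLURM when --ntasks is given). This is a minimal sketch; the function name `single_rank_vtune_args` is hypothetical, and it assumes the application arguments contain no whitespace, because the printed line is meant to be word-split by the shell.

```shell
# Hypothetical helper: print the MPMD argument list that profiles rank 0 only.
# The rank counts on both sides of ":" must add up to the task count:
# 1 (under amplxe-cl) + (N - 1) (plain application) = N.
# Assumptions: N comes from SLURM_NTASKS, and the application arguments
# contain no whitespace, since the output is word-split by the shell.
single_rank_vtune_args() {
    local ntasks="${SLURM_NTASKS:-}"
    if [ -z "$ntasks" ] || [ "$ntasks" -lt 2 ]; then
        echo "single_rank_vtune_args: SLURM_NTASKS must be at least 2" >&2
        return 1
    fi
    printf '%s\n' "-n 1 amplxe-cl -result-dir my_experiment -collect hotspots -- $* : -n $((ntasks - 1)) $*"
}

# Usage in the Open MPI batch script:
#   $MPIEXEC $FLAGS_MPI_BATCH $(single_rank_vtune_args ./a.out 200 1 1)
```

With `--ntasks=8`, the function prints `-n 1 amplxe-cl ... -- ./a.out 200 1 1 : -n 7 ./a.out 200 1 1`, so the total stays at 8 ranks even if you later change --ntasks.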


Additional notes

Further reading: https://software.intel.com/en-us/blogs/2015/05/26/how-to-profile-mpi-processes-on-all-nodes

Last modified on 29.01.2021

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Germany License.