Intel Inspector 2022.2 on Apocrita
As HPC applications grow more complex, managing memory and threads correctly becomes increasingly important. Tools like Intel Inspector help to identify and resolve a wide range of memory errors and thread synchronisation issues.
Introduction
Intel Inspector is a dynamic analysis tool designed to detect and debug memory and threading errors in high-performance computing (HPC) applications. Commonly used with C, C++, and Fortran code, it helps developers identify and resolve issues that can impact performance and reliability. The tool offers both a command-line interface (CLI) for easy scripting and automation, and a graphical user interface (GUI) for a more interactive exploration and analysis of issues.
Accessing Intel Inspector on Apocrita
Intel Inspector is included in the Intel oneAPI suite alongside a variety of other tools. To access it on Apocrita, load the modules in order:
module load intel/2022.2 inspector
Optionally, you can install the Inspector GUI client on your local machine, either as a standalone version or within the HPC Toolkit version. Note that due to ssh tunnel limitations, the GUI will only access Apocrita's frontend for result visualisation. More details can be found in the Appendix section.
Setting up the code for Inspector
To obtain accurate results and minimise erroneous reporting, compile your code with specific flags for Intel Inspector. This means enabling debug information, disabling optimizations, and omitting runtime checks. For threading analysis, ensure your code creates and uses more than one thread.
Languages | Required Flags | Will Cause Errors |
---|---|---|
C/C++ | -g -O0 -backtrace -shared-intel | -fmudflap |
Fortran | -g -O0 -backtrace -shared-intel -check:none | -check all |
OpenMP | -qopenmp | |
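As a rough illustration, the Fortran and OpenMP flags from the table could be combined into a single compile command as sketched below. The file name program.f90 and the output name program are placeholders, and using ifort as the compiler driver is an assumption; the snippet prints the command and only runs it when the compiler and source file are actually present.

```shell
# Flags taken from the table above; program.f90 and program are placeholder names.
INSPECTOR_FLAGS="-g -O0 -backtrace -shared-intel"
FORTRAN_FLAGS="${INSPECTOR_FLAGS} -check:none -qopenmp"
COMPILE_CMD="ifort ${FORTRAN_FLAGS} -o program program.f90"

# Always show the command; run it only when the compiler and source exist.
echo "${COMPILE_CMD}"
if command -v ifort >/dev/null 2>&1 && [ -f program.f90 ]; then
    ${COMPILE_CMD}
fi
```

Keeping the flags in variables makes it easier to reuse the same set in a Makefile or build script.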
Running Intel Inspector with gcc
While this post focuses on using Intel Inspector with Intel compilers, it's possible to use it with gcc compilers. This process can be more complex and may require additional configuration. For assistance, contact the RSE team to help you use Inspector effectively with your code.
Choosing an Inspector analysis
Before running an analysis with Intel Inspector, it's important to know the types of analyses and their scope levels. Inspector offers two types of analysis: memory error and threading error. Each type has three scope levels: narrow, medium, and wide.
Narrow scope is fast and resource-efficient but may miss some issues. Wide scope is thorough but slower and more resource-intensive. Medium scope balances speed and thoroughness. The scopes are named as follows:
Memory error analysis | name | scope |
---|---|---|
Detect Leaks | mi1 | narrow |
Detect Memory Problems | mi2 | medium |
Locate Memory Problems | mi3 | wide |
Threading error analysis | ||
Detect Deadlocks | ti1 | narrow |
Detect Deadlocks and Data Races | ti2 | medium |
Locate Deadlocks and Data Races | ti3 | wide |
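The naming scheme in the table can be captured in a small lookup, which may help when scripting several analyses. This is only a sketch: analysis_name is a hypothetical helper written for this post, not an Inspector command.

```shell
# analysis_name is a hypothetical helper, not part of Inspector.
# It maps an analysis type and scope to the short name from the table.
analysis_name() {
    case "$1" in
        memory)    prefix=mi ;;
        threading) prefix=ti ;;
        *) echo "unknown type: $1" >&2; return 1 ;;
    esac
    case "$2" in
        narrow) level=1 ;;
        medium) level=2 ;;
        wide)   level=3 ;;
        *) echo "unknown scope: $2" >&2; return 1 ;;
    esac
    echo "${prefix}${level}"
}

analysis_name memory narrow     # prints mi1
analysis_name threading wide    # prints ti3
```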
Some tips for choosing the correct type:
- Use analysis types iteratively, starting with the narrow scope to verify a good setup of both the program and Inspector, while setting expectations for the analysis duration.
- Estimated collection time may be 2 to 320 times longer than the normal execution time.
- Data set size and workload have a direct impact on application execution time and analysis speed.
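To turn the 2 to 320 times estimate into a runtime request, a quick calculation along these lines may help. The helper name estimate_range is hypothetical, and the 60 second baseline is only an example value.

```shell
# estimate_range is a hypothetical helper.
# Given the normal runtime in seconds, print the lower and upper
# bounds of the collection time using the 2x and 320x factors above.
estimate_range() {
    normal_seconds=$1
    echo "$((normal_seconds * 2)) $((normal_seconds * 320))"
}

estimate_range 60    # prints "120 19200"
```

For a program that normally runs for one minute, the analysis could therefore take anywhere between two minutes and a little over five hours, which is worth keeping in mind when requesting resources.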
Comparing methods for running Inspector
There are three methods to run an analysis: via qlogin, a job script, or the Open OnDemand interface. Each has its advantages and limitations, depending on factors such as available resources, urgency, analysis duration, and result visualisation preference.
Method | Advantages | Limitations | Results |
---|---|---|---|
qlogin | Quick and easy to set up, ideal for simple analyses | Limited access to resources | Command line summary or through the local GUI client |
Job script | Queues normally and runs when resources are available, good for longer or more resource-intensive analyses | Script requires additions, has to go through queue | Command line summary or through the local GUI client |
Open OnDemand | Runs on a compute node with full GUI capabilities, ideal for complex analyses. Quick and easy to set up | Limited access to resources, has to go through queue | GUI client-based |
qlogin and Open OnDemand are quick and easy to set up but may have limited resources. Job scripts queue normally and run when resources are available. Result visualisation options vary based on the method.
Note that the local GUI client can only ssh to the frontend, limiting its functionality for analysing results. Running projects or analyses from the GUI on the frontend should be avoided. In contrast, Open OnDemand provides full GUI capabilities for running analyses and visualising results.
Setting up environment variables for an analysis
When running an Intel Inspector analysis, you need to set up the required environment variables. We consider two sets of variables: a general set, and a specific set for the qlogin and job script cases. To include these variables, execute the following commands on the command line or add them to your job script, adjusting the paths as needed.
General set
module load intel/2022.2 inspector # load the modules in order
ulimit -s unlimited # no stack size limit on the OS side
export OMP_NUM_THREADS=${NSLOTS} # number of OpenMP threads
export OMP_STACKSIZE=512M # stack size for OpenMP threads
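NSLOTS is only defined inside a scheduler session, so a script that may also be run elsewhere could guard against it being unset. The sketch below falls back to a single thread, which is an assumption made for illustration, and warns because a threading analysis needs more than one thread.

```shell
# NSLOTS is set by the scheduler inside a job or qlogin session.
# Falling back to 1 is an assumption for runs outside the scheduler.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "OpenMP threads: ${OMP_NUM_THREADS}"

# A threading analysis needs the program to use more than one thread.
if [ "${OMP_NUM_THREADS}" -lt 2 ]; then
    echo "warning: threading analysis needs more than one thread" >&2
fi
```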
Specific set
export SRCA=/data/home/user/path/to/source/dir/
export SRCB=/data/home/user/path/to/objects/dir/
export SRCC=/data/home/user/path/to/binary/dir/
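Since Inspector relies on these paths to resolve sources and symbols, it can be worth checking them before starting a long collection. The following is a minimal sketch; check_dirs is a hypothetical helper, not something provided by Inspector or Apocrita.

```shell
# check_dirs is a hypothetical helper: it reports every argument
# that is not an existing directory and returns non-zero if any is missing.
check_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "${dir}" ]; then
            echo "missing directory: ${dir}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Example: check_dirs "${SRCA}" "${SRCB}" "${SRCC}" || exit 1
```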
Program analysis and viewing results
We've covered choosing a method for running Inspector, selecting an analysis type, and setting up the program and environment. In this section, we will run an analysis and manage the generated results.
Running an analysis in an interactive session: qlogin
To run an Intel Inspector analysis using qlogin, follow these steps:
- Start an interactive session with sufficient resources.
- Load the required modules and set your environment.
- Compile your program with the appropriate flags.
- Invoke the Inspector and your program from the command line, ensuring you have enough time for the analysis.
Refer to the naming scheme table in the "Choosing an Inspector analysis" section to select the correct <analysis>. Then, run the command below for your program:
inspxe-cl -collect <analysis> \
-search-dir src:r=${SRCA} \
-search-dir sym:r=${SRCB} \
-search-dir bin:r=${SRCC} \
-- ${SRCC}/program
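Following the tip about starting with the narrow scope, the same command can be generated for each scope in turn. The sketch below only prints the commands (a dry run) so they can be reviewed first. build_collect_cmd and the results_<analysis> directory names are hypothetical, and the -r option is used here to name the result directory explicitly.

```shell
# Use the paths from the "Specific set" if defined, else the placeholders.
: "${SRCA:=/data/home/user/path/to/source/dir/}"
: "${SRCB:=/data/home/user/path/to/objects/dir/}"
: "${SRCC:=/data/home/user/path/to/binary/dir/}"

# build_collect_cmd is a hypothetical helper that prints one
# inspxe-cl collection command for the given analysis and result directory.
build_collect_cmd() {
    analysis=$1
    result_dir=$2
    printf '%s\n' "inspxe-cl -collect ${analysis} -r ${result_dir} -search-dir src:r=${SRCA} -search-dir sym:r=${SRCB} -search-dir bin:r=${SRCC} -- ${SRCC}/program"
}

# Dry run: print the memory analyses from narrow to wide scope.
for analysis in mi1 mi2 mi3; do
    build_collect_cmd "${analysis}" "results_${analysis}"
done
```

Once the narrow run completes in a reasonable time, the printed command for the next scope can be pasted into the session.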
Running an analysis in batch mode: job script
To run Inspector in batch mode:
- Create a copy of your job script.
- Pre-compile your program with the appropriate flags.
- Add the required modules, environment variables, and program invocation.
- Submit the job using the qsub command.
Ensure you request enough resources, as insufficient resources will cause the job to fail. A sample job script for a threading error analysis is available in the Appendix.
Viewing the results on the command line
To view results using the command line, you can run the following commands:
# -R stands for -report, -r for -result-dir - Replace "dir" with the results' directory
inspxe-cl -R status -r dir/ # brief statement of problems by state
inspxe-cl -R summary -r dir/ # brief statement of problems by type
inspxe-cl -R problems -r dir/ # detailed report of detected problems
inspxe-cl -R observations -r dir/ # detailed report of code locations in problem sets
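For longer reports it can be convenient to keep each report type in its own text file. The sketch below loops over the four report types listed above. save_reports and the INSPXE override variable are hypothetical names introduced here for illustration.

```shell
# save_reports is a hypothetical helper. It writes each report type
# for one result directory into <out_dir>/<report>.txt.
# INSPXE can be set to point at a different inspxe-cl binary.
save_reports() {
    result_dir=$1
    out_dir=${2:-reports}
    mkdir -p "${out_dir}"
    for report in status summary problems observations; do
        "${INSPXE:-inspxe-cl}" -R "${report}" -r "${result_dir}" \
            > "${out_dir}/${report}.txt"
    done
}

# Example: save_reports r002mi3 reports_r002mi3
```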
For more information, consult the documentation. You can use inspxe-cl -help to display all available options or run inspxe-cl -help <option> to see details for a specific option.
Real-world example
For a real-world example, we used the Multipoint Approximation Method for Aero-Structural optimisation (MAM4AS) Fortran code, developed by Yu Zhang, PhD, Elliot K. Bontoft, PhD, Prof. Vasili Toropov, et al. We ran Inspector using the wide mi3 scope to detect memory access issues. The summary and status commands produce the following brief statements:
$ inspxe-cl -R summary -r r002mi3
27 new problem(s) found
15 Invalid memory access problem(s) detected
1 Memory leak problem(s) detected
10 Uninitialized memory access problem(s) detected
1 Uninitialized partial memory access problem(s) detected
$ inspxe-cl -R status -r r002mi3
27 problem(s) found
1 Investigated
26 Not investigated
Breakdown by state:
1 Confirmed
26 New
Invoking the tool with problems instead provides a breakdown of the detected issues; below is a curated selection of four:
$ inspxe-cl -R problems -r r002mi3
P1: Error: Memory leak: New
P1.33: Error: Memory leak: 240 Bytes: New
/data/home/aax010/git/mam4as/linux/../src/MAM4AS_main.f90(81):
Error X52: Allocation site:
Function mam4as:
Module /data/home/aax010/git/mam4as/linux/bin/MAM4AS
P2: Error: Invalid memory access: New
P2.13: Error: Invalid memory access: New
/data/home/aax010/git/mam4as/linux/../src/doe/Check_for_existing_points.f90(52):
Error X16: Read:
Function mam2_check_for_existing_points:
Module /data/home/aax010/git/mam4as/linux/bin/MAM4AS
P4: Error: Invalid memory access: New
P4.6: Error: Invalid memory access: New
/data/home/aax010/git/mam4as/linux/../src/MAM4AS_main.f90(151):
Error X7: Read: Function mam4as:
Module /data/home/aax010/git/mam4as/linux/bin/MAM4AS
P12: Error: Invalid memory access: New
P12.16: Error: Invalid memory access: New
/data/home/aax010/git/mam4as/linux/../src/linearSolver/linear_solver.f90(121):
Error X21: Read: Function mam2_linear_solver:
Module /data/home/aax010/git/mam4as/linux/bin/MAM4AS
/data/home/aax010/git/mam4as/linux/../src/linearSolver/linear_solver.f90(56):
Error X22: Allocation site: Function mam2_linear_solver:
Module /data/home/aax010/git/mam4as/linux/bin/MAM4AS
The report displays the error type, detection history, and source location. Detailed explanations of error types, examples, and potential fixes can be found on Inspector's website and in the documentation.
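When the problems report is long, a quick tally of top-level problems by type can help decide where to start. The sketch below is an illustration only: count_problem_types is a hypothetical helper, and it assumes that top-level entries begin at the start of a line in the form "P<n>: Error: <type>: <state>", as in the excerpt above.

```shell
# count_problem_types is a hypothetical helper.
# It counts top-level problem entries (P1:, P2:, ...) by type and
# ignores the per-occurrence entries such as P1.33.
count_problem_types() {
    awk -F': ' '/^P[0-9]+: / { count[$3]++ }
                END { for (type in count) print count[type], type }' "$@" | sort
}

# Example: inspxe-cl -R problems -r r002mi3 | count_problem_types
```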
Running an analysis in Open OnDemand
Apocrita's OnDemand service is a web-based platform that provides easy access to computing resources, ideal for workloads requiring a graphical component. To perform an analysis using Intel Inspector:
- Launch a session using the "Desktop Environment (CPU)" option. Select appropriate resources.
- Open xterm to access the terminal on the compute node.
- Load the necessary modules and set up the required environment variables.
- Start Inspector by calling inspxe-gui in the terminal.
Create a new project to group the results for easier management. Provide details for the program, including the executable name and paths to source, objects, and the executable. Additionally, modify the environment variables as shown, if not already set in the terminal.
After the setup is completed, we can start a new analysis using the available graphical options. The different analysis types and scopes can be set here, as well as a multitude of other options. If needed, we can introduce custom types too. At the bottom right, there is an option to extract the equivalent command line, which is useful for replicating the analysis with the other two methods.
Once the analysis starts, data collection begins, including memory usage and the use of other threads and processes. Notice the output windows for the program on the left, and Inspector on the right.
When the collection finishes, it will automatically open the results screen. This screen will provide all information about the detected issues; the types of errors, the source files and lines, the module that generated them, and more. On the bottom, the relevant code snippet from the highlighted issue will appear, as well as the stack image at the time of the error.
Interactive Debugging with Intel Inspector
Setting Up the Debugger
Intel Inspector can pause execution and request a debugger when an issue is detected. To set up the debugger, choose between the standard gdb module or Intel's gdb-oneapi debugger module.
For gdb, run module load gdb before opening the GUI. For Intel's gdb-oneapi, force-load the GCC module and debugger:
module load debugger # findable once intel/2022.2 is loaded
module load -f gcc/12.1.0 # library dependencies for gdb-oneapi
In the Inspector GUI, select the preferred debugger at "File" -> "Options" -> "Debugger". Provide the Intel gdb-oneapi path, if needed.
/share/apps/centos7/intel/compiler/2022.2/debugger/2021.6.0/gdb/intel64/bin/gdb-oneapi
Update the path by running which gdb-oneapi when newer versions are released. Future releases of Inspector may automate this process.
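To look up the path to paste into the GUI, something along the following lines could be used once the modules are loaded. This is a small sketch, and find_debugger is a hypothetical helper that prints the location of the first debugger it finds.

```shell
# find_debugger is a hypothetical helper.
# It prints the path of the first command from its arguments that is
# available on PATH, and returns non-zero if none is found.
find_debugger() {
    for name in "$@"; do
        if path=$(command -v "${name}"); then
            printf '%s\n' "${path}"
            return 0
        fi
    done
    return 1
}

# Prefer gdb-oneapi, fall back to gdb.
find_debugger gdb-oneapi gdb || echo "no debugger found; load the gdb or debugger module first"
```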
Debugging Process
When configuring the analysis, there are two available options:
- Enable debugger when problem detected: Stops at every error found for consecutive issue investigation.
- Select analysis start location with debugger: Runs the program without analysis until enabled, allowing quick navigation to breakpoints.
Several in-depth walkthroughs and tutorials on using gdb can be found online. For a quick start to an interactive debugging session, you can use the following commands:
step # runs to the next source line, stepping into function calls
next # similar to step, but steps over function calls
continue # resumes normal execution
backtrace # displays the call stack
backtrace full # also shows local variables in the call stack
Consult the official documentation for more information on how to use interactive debugging in project development.
Using Inspector with MPI
Intel Inspector can be used to analyse the correctness of MPI applications at the intra-process level, while the Intel Trace Analyzer and Collector tool is used for inter-process level analysis. Intra-process parallelism analysis focuses on performance with Intel VTune Profiler and correctness with Intel Inspector within individual processes, which often use fork-join threading through OpenMP or Intel oneTBB.
The intra-process MPI analysis workflow consists of three main steps: data collection using the amplxe-cl and inspxe-cl command-line tools, post-processing (finalisation or symbol resolution) of the collected data, and analysing data through the GUI standalone viewer for each process. Note that there are certain limitations for MPI profiling support, such as no support for MPI dynamic processes, and hardware event-based sampling collector limitations.
To collect correctness data for an MPI application with Intel Inspector, use the following command:
mpirun -n <N> inspxe-cl -r my_result -collect <analysis type> my_app [my_app_options]
A result directory is created for each analysed process in the job, named with the MPI rank as a suffix, for example my_result.0 to my_result.3 for a four-process job. To collect data for a subset of MPI processes, use the per-host syntax of mpirun/mpiexec.
When using Inspector with MPI, consider passing the -quiet/-q option to inspxe-cl to prevent excessive diagnostic output from cluttering the console. Additionally, employ the -l option for mpiexec/mpirun to label "stdout" lines with their corresponding MPI rank.
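Because each rank gets its own result directory, the summary report has to be requested once per directory. The loop below is a minimal sketch of that step. summarise_ranks and the INSPXE override variable are hypothetical names, and my_result stands for whatever prefix was passed to -r in the mpirun command.

```shell
# summarise_ranks is a hypothetical helper.
# It prints the summary report for every <prefix>.<rank> result directory.
# INSPXE can be set to point at a different inspxe-cl binary.
summarise_ranks() {
    prefix=$1
    for dir in "${prefix}".*; do
        [ -d "${dir}" ] || continue
        echo "== ${dir} =="
        "${INSPXE:-inspxe-cl}" -R summary -r "${dir}"
    done
}

# Example: summarise_ranks my_result
```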
Conclusion
In this post, we have provided an overview of Intel Inspector and its role in detecting and diagnosing memory and threading errors in HPC applications. We've covered the integration process on Apocrita, outlining three distinct strategies. Additionally, we expanded on setting up the GUI, a real-world example of detecting and fixing issues, and interactive debugging. Moreover, we have briefly discussed how Intel Inspector can be used with MPI applications to analyse their correctness at the intra-process level.
While we touched on the key features and techniques, some topics like handling regressions and suppression were not covered in detail. These themes can be explored further in the official Intel Inspector documentation and other available resources.
As always, the Research Software Engineering (RSE) team is here to help you with any issues or questions related to the topics covered in this post. Feel free to reach out to us for assistance with Intel Inspector, performance optimization, or any other challenges you may encounter in your projects.
Appendix
Setting up and using the local Inspector client
Follow the installation instructions in the standalone version or HPC Toolkit version links. Connect to Apocrita using:
ssh -XY apocrita
or, alternatively, add ForwardX11 yes in the SSH config file.
Constant forwarding may impact your bandwidth to and from Apocrita.
After logging in, load the necessary modules and launch the Inspector GUI with inspxe-gui. Use the GUI to view analysis results from the directory where they are saved. However, avoid starting new analyses using the local Inspector client as you are connected directly to the frontend.
Sample job script
#!/bin/sh
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=240:0:0
#$ -l h_core=1G
#$ -l h_vmem=3G
#$ -N intelInspector
module load intel/2022.2 inspector # load the modules in order
ulimit -s unlimited # no stack size limit
export OMP_NUM_THREADS=${NSLOTS} # number of OpenMP threads
export OMP_STACKSIZE=512M # stack size for OpenMP threads
export SRCA=/data/home/user/path/to/source/dir/
export SRCB=/data/home/user/path/to/objects/dir/
export SRCC=/data/home/user/path/to/binary/dir/
# collect memory leaks: mi1, mi2, mi3
# collect data races, deadlocks: ti1, ti2, ti3
# search [r]ecursively, [p]riority
inspxe-cl -collect ti1 \
-search-dir src:r=${SRCA} \
-search-dir sym:r=${SRCB} \
-search-dir bin:r=${SRCC} \
-- ${SRCC}/program
Title image: Generated by Simon Butcher, using Stable Diffusion.