Trace generation: just some examples

Paraver specifies a trace format and a set of mechanisms that determine how
the records and their encoded values are processed in the visualization. Every
record specifies the object to which it refers (indicating application, task
and thread) and the absolute time at which it happens. For each type of record,
some additional fields can be encoded as desired by the user. These fields are:
- State records include an integer value that is usually referred to as the state.
- Event records include a user event type and a user event value.
- Relation/Communication records include a communication tag and a communication size.
These names reflect the common practice of encoding the tag and size of an MPI
communication, but Paraver does not rely on these semantics.
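As a rough illustration of the three record types, the following sketch models them as plain Python data classes. It is not the Paraver on-disk format: all class and field names are hypothetical, and the begin/end pair on the state record and the separate send/receive times on the communication record are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceObject:
    """Object a record refers to; field names are illustrative only."""
    application: int
    task: int
    thread: int


@dataclass(frozen=True)
class StateRecord:
    obj: TraceObject
    begin: int   # absolute time at which the state starts
    end: int     # absolute time at which it ends (assumed field)
    state: int   # user-defined integer value


@dataclass(frozen=True)
class EventRecord:
    obj: TraceObject
    time: int         # absolute time of the event
    event_type: int   # user event type
    event_value: int  # user event value


@dataclass(frozen=True)
class CommunicationRecord:
    sender: TraceObject
    receiver: TraceObject
    send_time: int
    recv_time: int
    tag: int    # free-form integer; often an MPI tag, but not required to be
    size: int   # free-form integer; often a message size in bytes


thread0 = TraceObject(application=1, task=1, thread=1)
thread1 = TraceObject(application=1, task=2, thread=1)

records = [
    StateRecord(thread0, begin=0, end=120, state=1),
    EventRecord(thread0, time=40, event_type=1000, event_value=3),
    CommunicationRecord(thread0, thread1, send_time=60, recv_time=95,
                        tag=7, size=4096),
]
```

Because the state, type, value, tag and size fields are plain integers, a tool is free to give them whatever meaning suits the analysis at hand.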
The flexibility of this approach makes it possible to use Paraver for many
types of analyses, and it is quite easy to implement instrumentation tools for
many systems and purposes. The main issue in such instrumentation is how to
encode the information in the fields available in the record formats. Special
emphasis should be put on a proper selection of what to encode as state and
what (and how) to encode as events. In our experience, a clean design of these
encodings later enables studies with Paraver that were not foreseen when the
analysis was planned.
In this section we briefly describe the encoding criteria of several instrumentation/trace generation
tools that we distribute (see the Software Distribution
section). For more detailed information about these tools refer to
the Tool Documentation section.
All the tools described on this page also generate a Paraver
configuration file with symbolic information (for instance, the
function names) that makes it easier to relate the trace information
to the application source code. Most of the tools allow the user to
include explicit instrumentation (selected points, variables, hardware
counters) and to stop/resume the tracing through calls to the tool
library.
The programs must be executed on dedicated resources to avoid the large perturbations that OS
scheduling may cause in the presence of multiple concurrent applications.
OpenMP
The OMPtrace tool instruments parallel codes that use the OpenMP programming model.
This tool generates a Paraver trace
file where the basic activity in an OpenMP program is recorded.
Paraver state and flag records are emitted to reflect the evolution of the application
behaviour.
With Paraver the user can visualize the execution at thread, task or application level. The major encoding
choices are:
Besides giving a qualitative graphical perception of program behavior,
this encoding makes it possible to visualize and measure the load balance,
the profile of parallelism achieved, the percentage of time spent inside mutual
exclusion, the conflicts in acquiring locks and the percentage of sequential parts, among others.
OMPtrace is currently available on SGI-IRIX and IBM SP machines.
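To make the kind of measurement mentioned above concrete, here is a minimal sketch of how per-thread time in a state and a simple load-balance figure could be derived from state records. The state values `RUNNING` and `IDLE`, the tuple layout and the load-balance definition (average over maximum) are assumptions chosen for illustration, not the encoding used by OMPtrace or the metric computed by Paraver.

```python
from collections import defaultdict

# Hypothetical state values, chosen only for this example.
IDLE = 0
RUNNING = 1


def time_in_state(records, state):
    """Sum, per thread, the time spent in `state`.

    `records` is an iterable of (thread, begin, end, state) tuples.
    Threads that never enter `state` are absent from the result.
    """
    totals = defaultdict(int)
    for thread, begin, end, value in records:
        if value == state:
            totals[thread] += end - begin
    return dict(totals)


def load_balance(per_thread_time):
    """Average over maximum per-thread time; 1.0 means perfectly balanced."""
    times = list(per_thread_time.values())
    if not times or max(times) == 0:
        return 0.0
    return sum(times) / (len(times) * max(times))


records = [
    (0, 0, 100, RUNNING),
    (1, 0, 50, RUNNING),
    (1, 50, 100, IDLE),
]

running = time_in_state(records, RUNNING)
balance = load_balance(running)
```

With the sample records, thread 0 runs for 100 time units and thread 1 for 50, so the balance figure is 0.75.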
MPI
The MPItrace tool instruments parallel codes that use the message passing (MPI)
programming model. This tool generates a Paraver trace file where the basic activity in an MPI program is recorded.
The MPItrace tool assumes each MPI process is single-threaded. A tracefile represents
a single MPI program run, so it includes only one application with several tasks
(as stated in the mpirun command) and one thread per task.
The major encoding
choices are:
- States: record whether the thread is Running, Waiting for Messages or doing I/O.
- Communication: the tag and size are set according to those in the calls.
Physical communication is assumed to be identical to logical communication, as
it is not possible through the MPI instrumentation to find out when
the actual data transfer takes place.
- Events: used to tag the beginning and end of MPI operations, such as Barriers, Broadcast, AlltoAll, and all kinds of Send/Receive calls.
This instrumentation
module provides the typical message passing visualization functionalities.
MPItrace is currently available on SGI-IRIX, IBM-SP and Linux platforms.
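The communication encoding described above can be illustrated with a small sketch that pairs instrumented send and receive calls into communication records, setting the physical times equal to the logical ones. This is not how MPItrace is implemented; the function name, tuple layouts and dictionary keys are hypothetical, and matching is done in FIFO order per (source, destination, tag), which is an assumption that ignores communicators and wildcard receives.

```python
from collections import defaultdict, deque


def match_communications(sends, recvs):
    """Pair send and receive calls into communication records.

    `sends` holds (time, src, dst, tag, size) tuples and `recvs` holds
    (time, src, dst, tag) tuples. Messages with the same (src, dst, tag)
    are matched in time order.
    """
    pending = defaultdict(deque)
    for time, src, dst, tag, size in sorted(sends):
        pending[(src, dst, tag)].append((time, size))

    comms = []
    for time, src, dst, tag in sorted(recvs):
        queue = pending[(src, dst, tag)]
        if not queue:
            continue  # receive without a recorded send; skipped here
        send_time, size = queue.popleft()
        comms.append({
            "sender": src,
            "receiver": dst,
            "tag": tag,
            "size": size,
            # Physical times are unknown to the instrumentation,
            # so they simply copy the logical ones.
            "logical_send": send_time,
            "physical_send": send_time,
            "logical_recv": time,
            "physical_recv": time,
        })
    return comms


sends = [(10, 0, 1, 7, 1024), (30, 0, 1, 7, 2048)]
recvs = [(25, 0, 1, 7), (50, 0, 1, 7)]
comms = match_communications(sends, recvs)
```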
OpenMP+MPI
The OMPItrace tool instruments parallel codes based on the OpenMP programming model and/or
applications using the message passing (MPI) programming model.
This tool generates a Paraver trace file where the basic activity of the program is recorded.
The major encoding
choices are:
- States: record whether the thread is Idle (waiting for work),
Running (application code), Scheduling
(generating work/notifying termination), Waiting for Messages or doing I/O.
- Communication: the tag and size
are set according to those in the calls. Physical communication is assumed
to be identical to logical communication, as it is not possible through the MPI instrumentation interface
to find out when the actual data transfer takes place.
- Events: used to tag the basic program activity. For example:
  - to mark the entry to a parallel region.
  - to mark the entry to a work sharing construct.
  - to read the value of the hardware counters.
  - to tag the beginning and end of MPI operations, such as Barriers, Broadcast, AlltoAll,
    and all kinds of Send/Receive calls.
OMPItrace is currently available on SGI-IRIX and IBM-SP platforms.
Java and Application Servers
The analysis and visualization of Java applications is based on two
specific tools: JIS (Java Instrumentation Suite) and JACIT (Java
Automatic Code Interposition Tool). They are complementary and can be
used to get very detailed traces of the execution of Java bytecodes
without recompilation. The whole environment is especially intended for
performance analysis of J2EE application servers, and has been
successfully tested on WebSphere 4.x and on JBoss 3.x.
JIS is available for Linux 2.4 and 2.5/2.6 platforms and JACIT is a cross-platform Java tool.
The Java Instrumentation Suite (JIS) gets detailed information from all the
levels involved in the execution of J2EE applications: System, JVM
process, Middleware (i.e. J2EE application server) and User Application. This
information is automatically generated as a Paraver tracefile. All the
levels are correlated to offer a global view of the system execution.
The information collected at each level of JIS is summarized below:
- System level: Thread scheduling information (extracted from inside the
kernel scheduler) and detailed information on the system calls performed
by the JVM process.
- JVM level: Information about the Java threads (such as their names),
related to the corresponding system threads. JVM monitors and raw
monitors are also instrumented at this level. All information is
extracted through the JVMPI (Java Virtual Machine Profiler Interface).
- Middleware level: Information on the status of the middleware
architecture components, shown in the generated tracefile as Paraver
events on the boundaries of software components.
- Application level: User-generated events can be produced from the Java
application bytecode and are later displayed as Paraver events. A native
C library is provided with JIS to allow Java applications to generate
user-level events in the Paraver trace produced by JIS, using the Java
Native Interface (JNI).
The Java Automatic Code Interposition Tool (JACIT)
is a cross-platform Java tool designed to ease the task of
inserting probes into Java code. Through a user-friendly graphical
interface, JACIT allows pieces of Java code (including JNI calls to C
or C++ libraries) to be inserted for execution before or after any of
the methods of an existing Java bytecode, without the need for
recompilation. As a possible use, the interposed code can consist of
calls to a native library interface to JIS.
Performance counters
The infoPerfex tool
relies on the SGI perfex tool and the hardware performance counters interface to
generate a trace containing the values of the performance counters sampled at
periodic intervals. infoPerfex can instrument running applications without having the source code.
The trace only contains
events for a single thread in a single application. Several types of events
may appear in a trace: system calls, context switches, bytes read, bytes written...
and the two selected performance counters (cache misses, floating point operations,
TLB misses...). For all of them the value field represents the actual count in
the previous sampling interval.
The profile of the
above types of events can be displayed with Paraver. This profile can provide
useful information about periodic patterns, phases in the program... This is
considerably more useful than having only the global totals.
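The per-interval encoding of the value field described above can be sketched as follows. The example assumes that the counters are read as cumulative totals at each sampling point, which is an assumption of this sketch and not a statement about how infoPerfex reads them; the function name and tuple layout are hypothetical.

```python
def interval_counts(samples):
    """Turn cumulative counter samples into per-interval event values.

    `samples` is a time-ordered list of (time, cumulative_count) pairs.
    Each returned (time, count) pair carries the count accumulated in the
    sampling interval that ends at `time`.
    """
    events = []
    for (_, previous), (time, current) in zip(samples, samples[1:]):
        events.append((time, current - previous))
    return events


# Hypothetical cache-miss totals sampled every 10 time units.
cache_miss_samples = [(0, 0), (10, 500), (20, 1200), (30, 1250)]
cache_miss_events = interval_counts(cache_miss_samples)
```

Plotting the resulting values over time shows phases (here a burst in the second interval) that a single global total of 1250 would hide.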

System activity
The SCPUs
tool instruments the operating system scheduling. It uses the /proc
interface to obtain information about the existing processes, and
generates a Paraver trace file where the execution and scheduling of the
processes is recorded.
SCPUs uses all the levels of the Paraver process model (thread, task and
application) and it also records information about the activity of the
different CPUs.
The trace contains
two types of records:
- States: encode the application. The CPU view shows the application that is
active on each processor. Parallel applications use the same state for
all their threads/processes, so the whole application can be painted
using the same color.
- Communication: represents the migration of one process between two processors.
The tag of the message encodes the (application, task) pair to which
the process belongs. The size field encodes the thread/process number within the application.
With this encoding
it is possible to measure the total number of process migrations, to visualize
the migrations suffered by one application, to compute the total system utilization
or to display the profile of processors allocated to one application.
SCPUs is currently available on SGI-IRIX machines.
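As an illustration of the first measurement listed above, the sketch below counts process migrations per application from communication records. How SCPUs packs the (application, task) pair into the integer tag is not described here, so the example takes already-decoded pairs; the tuple layout and function name are hypothetical.

```python
from collections import Counter


def migrations_per_application(comm_records):
    """Count migrations per application.

    `comm_records` is an iterable of
    (src_cpu, dst_cpu, (application, task), thread) tuples, where the
    (application, task) pair stands for the decoded tag and `thread`
    for the size field.
    """
    counts = Counter()
    for src_cpu, dst_cpu, (application, _task), _thread in comm_records:
        if src_cpu != dst_cpu:
            counts[application] += 1
    return counts


comm_records = [
    (0, 1, (1, 1), 1),
    (1, 3, (1, 1), 1),
    (2, 0, (2, 1), 1),
]
per_app = migrations_per_application(comm_records)
total_migrations = sum(per_app.values())
```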
NanosCompiler
The NanosCompiler
allows the instrumentation of parallel applications. The
instrumentation is based on the generation of calls to an
instrumentation library that gathers information from the hardware
counters of the machine, records the execution status of each thread
and inserts events related to the OpenMP directives.
The major encoding choices are:
- States: used to indicate the current status of each thread: idle (light blue), running (dark blue), blocked (red), creating work (yellow), or library (green).
- Events: used to signal events during the execution; they
have associated types and values related to the original program and its
OpenMP parallelisation, and are also used to display performance statistics
(cache misses, invalidations, ...) gathered from hardware counters.
Dimemas
The Dimemas simulator generates
message passing traces with an encoding similar to that of MPItrace. The major difference
is in the specification of the communication. Dimemas can differentiate between
startup and transfer in a communication, so the traces it generates have
an additional state that encodes the startup part of a communication.
These traces also genuinely differentiate between physical communication
(actual data transfer through the network) and logical communication (from the
send request until the return of the receive request).
The Dimemas
simulator reconstructs the time behavior of a parallel application on a machine modelled by
a set of performance parameters. Thus, performance experiments can be done easily. The
supported target architecture classes include networks of workstations, single and
clustered SMPs, distributed memory parallel computers, and even heterogeneous systems.
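The split between startup and transfer mentioned above can be pictured with a deliberately simple sketch. It uses a linear cost model (a fixed startup time followed by size divided by bandwidth), which is an assumption made for illustration and not a description of the Dimemas model; the function and parameter names are hypothetical.

```python
def communication_phases(send_time, size_bytes, startup, bandwidth):
    """Split one simulated communication into startup and transfer phases.

    `startup` is a fixed time cost and `bandwidth` is in bytes per time
    unit. Returns the (begin, end) interval of each phase.
    """
    startup_end = send_time + startup
    transfer_end = startup_end + size_bytes / bandwidth
    return {
        "startup": (send_time, startup_end),
        "transfer": (startup_end, transfer_end),
    }


phases = communication_phases(send_time=100.0, size_bytes=1000,
                              startup=5.0, bandwidth=100.0)
```

In a trace, the startup interval would map to the additional state, and the transfer interval would give the physical communication times, which now differ from the logical ones.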
UTE translator
ute2paraver is a filter that translates UTE traces to the Paraver format. UTE is a tracing
tool for IBM SP systems that obtains a fair amount of information about the activity of SP systems
running MPI applications (or MPI+OpenMP). In addition to process activity, UTE records scheduling
information.
The traces obtained with ute2paraver can be examined from both the Process model and the Resource
model perspectives. With the Process model, the activity of each thread, the number of active
threads in each MPI process or the instantaneous parallelism profile can be visualized.
With the Resource model,
the scheduling of threads to processors or the specific activity of each processor can be analyzed.
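The instantaneous parallelism profile mentioned above is, in essence, the number of threads that are active at each point in time. A minimal sketch of that computation follows; the interval layout and function name are hypothetical, and this is not the algorithm Paraver itself uses.

```python
def parallelism_profile(intervals):
    """Return (time, active_threads) at every point where the count changes.

    `intervals` is a list of (begin, end) pairs, one per period during
    which some thread is active.
    """
    changes = []
    for begin, end in intervals:
        changes.append((begin, 1))
        changes.append((end, -1))
    changes.sort()

    profile = []
    active = 0
    for time, delta in changes:
        active += delta
        if profile and profile[-1][0] == time:
            # Several changes at the same instant collapse into one entry.
            profile[-1] = (time, active)
        else:
            profile.append((time, active))
    return profile


intervals = [(0, 10), (5, 15), (10, 20)]
profile = parallelism_profile(intervals)
```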
AIX Trace translator
aix2prv is a filter that translates traces obtained with the IBM AIX trace facility
to the Paraver format. The AIX trace facility makes it possible to collect very low-level information on
process scheduling, system calls... for all the processes running on an SP node.
With this translator we can use all the flexibility and
potential of Paraver to analyze the low-level information
captured by the AIX trace facility.
With this new module we can study:
- the impact of the system processes on the computing applications
- the migrations between CPUs of all the processes and the distribution of resources
- some internals of the library implementations
- ...
MLP instrumentation
MLP is a programming model developed at NASA Ames where shared memory
regions are allocated to perform the communication between processes.
To support the instrumentation of MLP applications we created a special
version of OMPItrace that intercepts the fork call. Gabriele Jost from
NASA Ames developed the instrumented version of the library. This
functionality allows them to compare the efficiency of MLP with
that of other programming models such as MPI+OpenMP.
We have developed
our own version of the MLP library and modified OMPItrace to intercept
the MLP library calls, including information from the hardware counters
related to memory accesses. We are currently studying the kind of
information provided by these counters and how it can be used to
analyze the effect of memory placement on the performance of MLP programs.
Tracedrive preprocessing
Tracedrive
is a new module we have developed during the last year to help with the
instrumentation process. The first issue faced when tracing a large and
unknown code is to identify the structure of the
application. To avoid having to look at the source code of applications
with hundreds of files, we developed a module that dynamically
instruments an already running binary to collect information on the
dynamic call tree. This information can later be analyzed with a graphical
interface to select the set of significant user routines to
instrument with OMPItrace.
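One way to picture the selection step is a filter over the collected call tree that keeps routines accounting for at least a given share of the total time. The source describes this selection as done by the user through the graphical interface, so the automatic threshold below is an assumption of this sketch; the tree layout and function name are hypothetical.

```python
def significant_routines(call_tree, threshold=0.05):
    """Select routines whose inclusive time is a large enough share.

    `call_tree` is a nested dict with keys "name", "time" (inclusive
    time) and optional "children". A routine is kept when its time is
    at least `threshold` times the root time.
    """
    total = call_tree["time"]
    selected = set()
    stack = [call_tree]
    while stack:
        node = stack.pop()
        if node["time"] >= threshold * total:
            selected.add(node["name"])
            # Children cannot exceed their parent's inclusive time, so
            # subtrees below the threshold are not explored at all.
            stack.extend(node.get("children", []))
    return sorted(selected)


call_tree = {
    "name": "main", "time": 100, "children": [
        {"name": "solve", "time": 80, "children": [
            {"name": "kernel", "time": 60},
            {"name": "init_small", "time": 2},
        ]},
        {"name": "io", "time": 3},
    ],
}
routines = significant_routines(call_tree, threshold=0.05)
```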