Shell scripts for ProbeVue

This section describes shell scripts that help you run the probevue command.

Shell helper programs for ProbeVue

The following shell scripts are useful when running ProbeVue:

sprobevue

Shell script that wraps all arguments in double quotes:

#!/usr/bin/ksh
#
# sprobevue:
#
#	Simple helper function for probevue
#	Wraps arguments to probevue in double quotes
#
#	Usage: sprobevue <probevue flags> <script> <args>
#		Doesn't support the -c and -A flags of probevue
#

usage()
{
	echo "Usage: sprobevue <probevue flags> <script> <args>" >&2
	echo "	Doesn't support the -c and -A flags of probevue" >&2
	exit 1
}

CMD=probevue
# Generate command to execute

while getopts 'c:A:I:s:o:t:X:' zargs
do
	case $zargs in
		I|s|o|t|X) CMD="$CMD -$zargs $OPTARG"      	;;
		?) usage
	esac
done

shift $(($OPTIND -1))

if [ -n "$1" ]
then
	CMD="$CMD $1"
	shift
fi

for i
do
	CMD="$CMD \"$i\""
done

# Execute command 
$CMD
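
To see what sprobevue hands to the probevue command, it can help to build the command line without running it. The following is a minimal dry-run sketch, not part of the shipped helper: build_cmd is a hypothetical function that repeats the same quoting logic and prints the result, and count.e is a made-up script name.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper, not part of sprobevue itself):
# assemble the probevue command line the same way sprobevue does,
# but print it instead of executing it.
build_cmd()
{
	cmd=probevue
	OPTIND=1
	while getopts 'I:s:o:t:X:' opt
	do
		case $opt in
			I|s|o|t|X) cmd="$cmd -$opt $OPTARG" ;;
			*) return 1 ;;
		esac
	done
	shift $((OPTIND - 1))

	# The first remaining word is the Vue script name; it is not quoted.
	if [ -n "$1" ]
	then
		cmd="$cmd $1"
		shift
	fi

	# Every remaining argument is wrapped in literal double quotes.
	for i in "$@"
	do
		cmd="$cmd \"$i\""
	done

	printf '%s\n' "$cmd"
}

build_cmd -s 100 count.e 1234 hello
# prints: probevue -s 100 count.e "1234" "hello"
```

Because the assembled string is later split on white space when it is executed, an argument that itself contains spaces would be broken into several words; this sketch inherits that limitation.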

prgrep

Shell script that prints process ID given process name:

#!/usr/bin/ksh
#
# prgrep:
#
#	Simple helper function for probevue
#	Prints all process IDs with given process name
#
# 	Need options to print only one process
#	  to print process belong to a certain UID

#
#	Usage: prgrep <process_name>
#	       prgrep -p <processID>
#	


usage()
{
	echo "Usage: prgrep <process_name>" >&2
	echo "       prgrep -p <process_ID>" >&2
	exit 1
}

[ -z "$1" ] && usage

if [ "$1" = "-p" ]
then
	[ -z "$2" ] && usage
	pid=$2
	export pid
	ps -e | awk 'BEGIN {pid = ENVIRON["pid"]} {if ($1 == pid) print $4}'
else
	pname=$1
	export pname
	ps -e | awk 'BEGIN {pname = ENVIRON["pname"]} {if ($4 == pname) print $1}'
fi
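
The comments in prgrep mention that an option to print only one process is still needed. One possible way to get that behavior is sketched below. The first_pid function is a hypothetical helper, and the sample text only imitates the PID, TTY, TIME, CMD column layout that prgrep assumes for ps -e output; the process name myserver is invented.

```shell
#!/bin/sh
# first_pid: hypothetical helper. Reads "ps -e" style output on stdin and
# prints only the first process ID whose command name (field 4) equals $1.
first_pid()
{
	awk -v pname="$1" '$4 == pname { print $1; exit }'
}

# Canned sample in the column layout that prgrep assumes.
sample='  PID    TTY  TIME CMD
    1      -  0:02 init
 4242  pts/0  0:00 myserver
 5151  pts/1  0:00 myserver'

printf '%s\n' "$sample" | first_pid myserver    # prints 4242
```

On a live system the function would be fed from ps -e, for example: probevue simpleprobe.e "$(ps -e | first_pid myserver)".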

ProbeVue error messages

As described earlier, running the probevue command requires privilege. If an ordinary user tries running the probevue command, the RBAC framework detects this and fails the execution of the command immediately.

$ probevue kernel.e
ksh: probevue: 0403-006 Execute permission denied.

The Authorizations and privileges section of the Running ProbeVue topic describes how to give non-root users the authorizations and privileges needed to issue the probevue command.

The ProbeVue compiler, which is built into the probevue command, prints detailed error messages during the compilation phase when it detects syntax errors, semantic errors, or type incompatibility errors. Consider the following script:

/* Syntax error example:
 * syntaxbug.e
 */
@@BEGIN
{
        int i, j, k;

        i = 4;
        j = 22;

        k = i _ z;

        printf("k = %d\n", k);

        exit();
}

The preceding script has a syntax error in the assignment statement on line 11 at column 15: an underscore symbol (_) was typed by mistake instead of a minus symbol (-). When you run the script, the ProbeVue compiler catches this error and generates an error message:

# probevue syntaxbug.e    
syntaxbug.e: token between line 11: column 15 and line11: column 15: , expected 
instead of this token

The ProbeVue compiler also invokes internal system calls to check if the probe specifications in the Vue script are valid. A common error is to pass an invalid process ID or the process ID of an exited process in the probe point tuple. Another common error is to forget to pass a process ID as an argument on the command line when the script expects one. Consider the following script:

/* simpleprobe.e 
 */
@@syscall:$1:read:entry
{
        printf("In read system call: thread ID = %d\n", __tid);
        exit();
}

The preceding script requires a process ID as an argument to replace the '$1' variable in the probe point tuple at line 3. The kernel returns an error if you try to probe a process that has exited or does not exist. The probe also fails if the process ID indicates a kernel process or the init process. Further, you cannot probe a process that does not belong to you unless you have the required privileges to probe another user's processes. You can use the prgrep command with the -p flag to print the process name for a given process ID.

Note: The prgrep command produces empty output if the specified process ID does not exist.

# probevue simpleprobe.e 233
probevue: The process does not exist.
ERR-19: Line:3 Column:3 Invalid probe string
# prgrep -p 232
#
# probevue simpleprobe.e 1  
ERR-19: Line:3 Column:3 Invalid probe string
# prgrep -p 1  
init
# probevue simpleprobe.e  
ERR-19: Line:3 Column:3 Invalid probe string
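
Several of the failures shown above can be caught before the probevue command is invoked. The following is a rough pre-flight sketch under stated assumptions: check_pid is a hypothetical helper, it uses kill -0 only to test whether the process exists and can be signalled by the caller, and it cannot tell whether a process ID belongs to a kernel process.

```shell
#!/bin/sh
# check_pid: hypothetical pre-flight check for a process ID that is about
# to be passed to a Vue script. Returns 0 if the PID looks usable and a
# non-zero status, with a message on stderr, otherwise.
check_pid()
{
	case $1 in
		'')       echo "missing process ID argument" >&2; return 1 ;;
		*[!0-9]*) echo "not a process ID: $1" >&2; return 1 ;;
	esac

	# Process ID 1 is init, which cannot be probed.
	if [ "$1" -lt 2 ]
	then
		echo "process ID must be greater than 1" >&2
		return 1
	fi

	# kill -0 delivers no signal; it only reports whether the process
	# exists and whether the caller is allowed to signal it.
	if ! kill -0 "$1" 2>/dev/null
	then
		echo "process $1 does not exist or is not yours" >&2
		return 1
	fi
	return 0
}

# Example: the current shell is always a valid target for the check.
check_pid "$$" && echo "process ID $$ looks usable"
```

A wrapper could then run the probevue command only when the check succeeds, for example: check_pid "$pid" && probevue simpleprobe.e "$pid".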

The probevue command can also detect if an unprivileged user tries to access kernel variables. Consider the kernel.e script from the sample programs section. The following example session shows what happens if you try running this as an unprivileged user:

$ probevue kernel.e
ERR-56: Line:93 Column:39 No authority to access kernel variable
ERR-56: Line:99 Column:23 No authority to access kernel variable
ERR-56: Line:100 Column:24 No authority to access kernel variable
ERR-56: Line:101 Column:25 No authority to access kernel variable
ERR-56: Line:102 Column:24 No authority to access kernel variable
ERR-102: Line:140 Column:13 Operation not allowed
ERR-46: Line:140 Column:9 Invalid Assignment, Type mismatch

After the Vue script has been compiled successfully, the probevue command invokes a system call to start a new ProbeVue session, passing the intermediate code generated by the compiler. The system call fails if the ProbeVue framework cannot initialize a new ProbeVue session. There can be several reasons for this: starting the new session might push the user's memory resources past the administrator-specified limits; the session might need more memory than is allowed for a single session; the interval probe manager might use unauthorized functions; or one of the probed processes might have exited after the compilation phase checks were made. When the session cannot be started, the kernel fails the system call and returns a unique 64-bit error number.

The ProbeVue framework can abort a successfully started and active ProbeVue session if a severe or unrecoverable error is encountered while issuing the probe actions. Possible errors include exceeding session or user memory limits (memory requirements for thread-local variables and list variables can grow as the session progresses), exceeding temporary string or stack area limits, accessing out-of-array indexes, attempting to divide by zero, and so on. In all cases, the kernel will return a unique 64-bit error number while terminating the session.

When a session fails, whether at start or after it has been successfully started, the probevue command prints a generic error message that includes the unique 64-bit error number in hexadecimal format, and then exits. The following table provides the meaning of some common 64-bit errors that the kernel can return:

Kernel error        Meaning                                                                  Occurs at
0xEEEE00008C285034  Out of memory while allocating primary trace buffers.                    Session start
0xEEEE00008C285035  Out of memory while allocating secondary trace buffers.                  Session start
0xEEEE00008C52002B  Out of memory while allocating storage for probe specification strings.  Session start
0xEEEE000096284122  Out of memory while allocating storage for thread-local storage.         Session start
0xEEEE000081284049  Use of user-space access functions in the interval probe manager.        Session start
0xEEEE0000D3520022  Number of sessions limit for regular users.                              Session start
0xEEEE000096284131  Illegal address passed to the get_userstring function.                   Executing probe action
0xEEEE00008C520145  Maximum thread limit hit for thread local variables.                     Executing probe action
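
Because the probevue command reports only the hexadecimal number, a small lookup helper can save a trip to this table. The following sketch is an illustration only: pv_errmsg is a hypothetical function, and it knows nothing beyond the eight codes listed above.

```shell
#!/bin/sh
# pv_errmsg: hypothetical lookup helper for the 64-bit kernel error
# numbers listed in the table above. Prints "meaning (phase)" and
# returns 0, or returns 1 for a code that is not in the table.
pv_errmsg()
{
	# Fold to upper case so 0xeeee... and 0xEEEE... both match.
	code=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
	case $code in
		0XEEEE00008C285034) echo "Out of memory while allocating primary trace buffers (session start)" ;;
		0XEEEE00008C285035) echo "Out of memory while allocating secondary trace buffers (session start)" ;;
		0XEEEE00008C52002B) echo "Out of memory while allocating storage for probe specification strings (session start)" ;;
		0XEEEE000096284122) echo "Out of memory while allocating storage for thread-local storage (session start)" ;;
		0XEEEE000081284049) echo "Use of user-space access functions in the interval probe manager (session start)" ;;
		0XEEEE0000D3520022) echo "Number of sessions limit for regular users (session start)" ;;
		0XEEEE000096284131) echo "Illegal address passed to the get_userstring function (executing probe action)" ;;
		0XEEEE00008C520145) echo "Maximum thread limit hit for thread local variables (executing probe action)" ;;
		*) echo "unknown error number: $1" >&2; return 1 ;;
	esac
}

pv_errmsg 0xEEEE0000D3520022
# prints: Number of sessions limit for regular users (session start)
```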

RAS events functions

The "RAS events" functions are a privileged set of Vue functions provided for very specialized system or application debugging purposes. They are not intended for general use. They provide system tracing and dumping facilities. Many of these functions are "pass through" functions that allow a Vue script to directly invoke kernel services, and hence there are risks involved with using them. You need special privileges to successfully invoke these functions in your Vue script: you must either be root or have the aix.ras.probevue.rase authorization.

To guard against accidental use, these functions are available only when you pass the -K flag to the probevue command. Without that flag, they simply disappear from the Vue language completely.

Generating a trace record

The Vue functions for generating system trace (and LMT trace) records have similar syntax to the kernel interfaces that they invoke under the covers. Hence, writing trace records from a Vue script is no different from doing so elsewhere. There are some restrictions as follows:

  • If system trace is not started, or the hookid value is not being captured by system trace, these operations do not produce system trace records (LMT tracing to the common buffer for TRCHKLx traces will still be attempted, but LMT might also be disabled).
  • You cannot generate a trace record from within a @@systrace Vue clause. Calls to the tracing functions only generate the LMT common buffer trace records for TRCHKLx traces in this case, assuming that LMT is enabled.
  • You cannot probe these ProbeVue-generated trace events; only kernel and application generated tracing can be probed.
  • You must be privileged, either as root or with the aix.ras.probevue.rase authorization.

The following Vue functions exist for writing system trace records. All data words are of type long long:

TRCHKL0(hookID)
Trace with no data words.
TRCHKL1(hookID, D1)
Trace with 1 data word.
TRCHKL2(hookID, D1,D2)
Trace with 2 data words.
TRCHKL3(hookID, D1,D2,D3)
Trace with 3 data words.
TRCHKL4(hookID, D1,D2,D3,D4)
Trace with 4 data words.
TRCHKL5(hookID, D1,D2,D3,D4,D5)
Trace with 5 data words.
void trcgenk(int channel, int hook_ID, unsigned long long data_word, int length, untyped buffer)
Trace a buffer.

These trace functions always append a timestamp to the event data. The hookid parameter to these functions is of the form 0xhhhh0000. This does not mean that the hookid value must be a constant; it only indicates how a hookid value is formed.

Note: Obsolete 12-bit hookid values will use the leftmost three hex digits, and the 4th digit will be zero.
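
The hookid layout described above can be checked with a little shell arithmetic before a value is written into a Vue script. The following sketch assumes nothing beyond the 0xhhhh0000 form and the note about 12-bit values; hookid16 and hookid12 are hypothetical helper names.

```shell
#!/bin/sh
# hookid16: place a 16-bit hook number in the upper half of a 32-bit
# word, giving the 0xhhhh0000 form.
hookid16()
{
	printf '0x%08X\n' $(( ($1 & 0xFFFF) << 16 ))
}

# hookid12: place an obsolete 12-bit hook number in the leftmost three
# hex digits, leaving the fourth digit zero (0xhhh00000).
hookid12()
{
	printf '0x%08X\n' $(( ($1 & 0xFFF) << 20 ))
}

hookid16 0x1234    # prints 0x12340000
hookid12 0x123     # prints 0x12300000
```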

With the trcgenk kernel service, the buffer parameter is a pointer to length bytes of data to trace, at most 4096 bytes. The buffer parameter can be an external variable like a kernel or application pointer to pinned data, or a script variable like a Vue string or structure instance. The "untyped" specification is a shorthand for this.

Note: The trcgenk kernel service only traces to the system trace, not to the LMT trace buffers.

You can use a non-zero channel number, but you must ensure that the specified channel is enabled for tracing. The return value from the trace command that started the trace of interest can be passed to the Vue script for this purpose. Using a disabled channel will result in no tracing.

These tracing functions do not return a value.

For more information, see the trcgenk kernel service in Technical Reference: Kernel and Subsystems, Volume 1 and Macros for recording trace events in Performance management.

Stopping the trace

To freeze the system trace as soon as possible after a required event has occurred, you can use void trcoff() in a Vue script. This function disables channel zero tracing immediately. You must still stop the trace in the normal way, with the trcstop command external to ProbeVue, in order for trace processing to be completed normally.

You can immediately stop LMT and component traces so that ongoing tracing does not wrap data of interest. The corresponding resume functions are needed because there is no command line equivalent available to restart these traces. The following Vue functions are provided:

void mtrcsuspend()
void ctsuspend()
void mtrcresume()
void ctresume()

The ctsuspend routine stops all component tracing. You cannot use this routine for selective trace stop by component. It stops component trace only, not any other tracing that the CT_HOOKx macros might have requested, such as system and LMT trace recording.

You must use these trace control functions with caution, as there is no serialization of the kernel tracing code being affected. You must manually ensure that only one script or command will be affecting tracing at a time.

Stopping the system

You can terminate the system and take a full dump using the following routine:

void abend(long long code, long long data_word, ...)

This routine is similar to the abend kernel service, except that only up to 7 data parameters (which will be loaded into registers r3 through up to r10) are accepted here.

Untyped parameters

In the function prototypes that follow, some parameters of the equivalent kernel functions are typed ambiguously. The Vue compiler generally performs type checking on all parameters passed to a Vue function, but the parameters designated as having an "untyped" type are exempted from type checking. For example, an optional string might be passed as NULL when using these kernel services directly in the kernel, but if the Vue function were defined as taking a parameter of type String, NULL could not be accepted. To avoid the inconvenience of having to pass an empty string instead, and to let the Vue functions take the same parameters as the underlying kernel interface, these functions have been defined as taking untyped parameters. An untyped parameter gives you the liberty of passing NULL instead of a real Vue string, but be careful when specifying values for "untyped" parameters, because the compiler accepts any type for the parameter.

Note: There is really no "untyped" variable specification in the Vue language. It is just used as a shorthand notation.

Taking a live dump

ProbeVue services for most of the kernel live dump capability are provided, and strongly resemble the corresponding kernel services. For detailed information on kernel live dump services, see the livedump kernel service in Technical Reference: Kernel and Subsystems, Volume 1.

An exception to this general similarity is the ldmp_parms structure, which is not exposed at the script level. Instead, the ldmp_setupparms built-in function owns a private instance of this structure, which is allocated and returned to the caller indirectly as a 64-bit cookie that must be passed to subsequent live dump services in its place. Only one session can use the private structure at a time. The other live dump services resemble their kernel counterparts in syntax. Because of this hidden allocation (as well as hidden allocations made by the kernel live dump services themselves), you must call either the ldmp_freeparms kernel service or the livedump kernel service after the ldmp_setupparms kernel service has been called and has returned successfully. Otherwise, the current session continues to own the private structure, causing all future ldmp_setupparms calls to fail. After the private structure has been released, its former owner can no longer use it without another ldmp_setupparms call. Do not use the LDT_POST flag with the ldmp_setupparms kernel service, as that implies an unsupported future reference to the hidden structure.

The typical live dump application must hold the hidden structure for only a very brief interval, typically within a single probevue clause. The hidden structure is owned by the session, and can actually be used by any Vue clause in that session. The framework will release the private structure, and other kernel resources, with the ldmp_freeparms kernel service automatically when the ProbeVue session terminates.

As the ldmp_parms structure elements are not visible to ProbeVue, those that require or permit initialization by the caller are set using extra parameters passed to the ProbeVue version of the ldmp_setupparms kernel service instead.

long long ldmp_setupparms(
	String    symptom,   /* required symptom string */
	untyped   title,     /* dump title string or NULL */
	untyped   prefix,    /* dump file name prefix string or NULL */
	untyped   func,      /* failing function name string or NULL */
	long long errcode,   /* error code */
	int       flags,     /* dump characteristics */
	int       prio);     /* dump priority */

The preceding ldmp_setupparms Vue function is an interface to the kernel service of the same name, except that the ldmp_parms structure is not visible to the calling Vue script. The value returned must be passed to the other live dump services as a substitute for the pointer to an ldmp_parms structure, although it is typed as a 64-bit integer.

The symptom string is a required String operand, while the title, prefix, and func strings are optional. Pass either a String or NULL for these three parameters. All String values must be local to the Vue script. The flags and prio parameters can be zero, or values from the kernel header file sys/livedump.h. Because the kernel #define symbols are not part of the Vue language, you must pass the corresponding integer constants, although the section on using #define symbols for live dump flags describes an alternative.

The following values are useful values for the flags parameter:

	LDT_ONEPASS      0x02    limit dump to one pass
	LDT_NOADDCOMPS   0x08    components can’t be added by callbacks
	LDT_NOLOG        0x10    no error is to be logged
	LDT_FORCE        0x20    force this dump

Because the dump will be taken from ProbeVue’s disabled internal environment, it must be a serialized, synchronous, one pass dump.
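
Because the flags parameter is passed to the Vue function as a plain integer, several flags have to be combined by hand with a bitwise OR. The following sketch shows one way to compute the combined value in the shell before writing it into a Vue script. The numeric values are the ones listed above; ldt_flags is a hypothetical helper.

```shell
#!/bin/sh
# Flag values as listed in the table above.
LDT_ONEPASS=0x02
LDT_NOADDCOMPS=0x08
LDT_NOLOG=0x10
LDT_FORCE=0x20

# ldt_flags: hypothetical helper that ORs its arguments together and
# prints the result as a hexadecimal constant.
ldt_flags()
{
	acc=0
	for f in "$@"
	do
		acc=$(( $acc | $f ))
	done
	printf '0x%02x\n' "$acc"
}

ldt_flags "$LDT_ONEPASS" "$LDT_NOLOG"    # prints 0x12
```

The printed constant can then be used directly as the flags argument of ldmp_setupparms.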

The following values are acceptable values for the prio parameter:

	LDPP_INFO        1       informational dump
	LDPP_CRITICAL    7       critical dump (this is the default)

If zero is specified for the prio parameter, the ldmp_setupparms kernel service defaults to LDPP_CRITICAL. Only a non-zero value is stored in the hidden ldmp_parms structure to override this default.

The return value upon success will be a positive cookie representing the ownership of the hidden ldmp_parms structure.

On any failure, the return value will be negative as follows:

Value                Description
EINVAL_EVM_COOKIE    Indicates that the private ldmp_parms structure is not available.
EINVAL_EVM_STRING    Indicates that a String valued parameter is not valid.

All of the subsequently described Vue functions return failure indications in a similar fashion, with a negative kernel error number:

Value                Description
EINVAL_EVM_COOKIE    Indicates that the caller did not correctly specify a cookie showing ownership of the private ldmp_parms structure.
EINVAL_EVM_STRING    Indicates that a String valued parameter is not valid.
EINVAL_EVM_EXTID     Indicates that the extid parameter is not supported, and must be zero.

Other kernel error numbers can be passed back by the underlying kernel services. The live dump functions are described below:

long long ldmp_freeparms (long long cookie)
After the ldmp_setupparms kernel service has returned successfully, the internal ldmp_parms structure has been allocated to the running Vue script. You must free this resource, plus other kernel-internal resources allocated by the services that add components to your dump, by either taking the dump by calling the livedump kernel service, or by calling the ldmp_freeparms kernel service. This releases the internal ldmp_parms structure for future use.
long long livedump (long long cookie)
After the ldmp_setupparms kernel service and at least one of the various services that add components (and pseudo-components) to a dump have been called, the dump is requested by the livedump service. This service produces the actual live dump in the /var/adm/ras/livedump file according to the specifications provided through the ldmp_setupparms kernel service and the other live dump services invoked. The cookie parameter is the cookie returned by the initial call to the ldmp_setupparms kernel service. The return value will be zero if the dump was successfully taken, EINVAL_EVM_COOKIE if the cookie is not valid, and another kernel error number if an error occurs during kernel livedump processing.
long long dmp_compspec(
	long long flags,    /* DCF_xxx flags defined in sys/dump.h */
	untyped   comp,     /* component to be added (by ras_block_t, name, alias, and so on) */
	long long cookie,   /* cookie returned by ldmp_setupparms */
	long long extid,    /* not supported; must be zero */
	untyped   p1,       /* first possible component parameter */
	...);               /* additional component parameters */
You can add any component that supports live dump to the live dump by calling this service, which is identical in function to the kernel service with the same name except for the following differences:
  • The extid parameter, which allows a dmp_extid_t (long) to be returned in the kernel programming environment, is unsupported and must be zero. EINVAL_EVM_EXTID will be returned otherwise. There is no way to pass a pointer to ProbeVue memory to receive this value, which might then be used with the dmp_compext kernel service, which is therefore also not supported in ProbeVue. Instead, you can call the dmp_compspec service multiple times.
  • The kernel service allows any number of parameters p1, p2, and so on where an additional NULL parameter must follow the last actual one to terminate the parameter list. The Vue function only accepts at most four parameters of p1, p2, and so on. The last must still be zero to tell the kernel service how many of these parameters there are, so in effect you can specify only up to 3 interesting values. The interface will automatically force the parameter following the last one of up to 3 variable parameters to zero, to ensure that this rule is followed.
  • The comp parameter can be a long, a kernel ras_block_t address, or a String as appropriate. The type is not checked.
  • The kernel #define flag values are not part of ProbeVue.
long long ras_block_lookup(String path)
This function locates the ras_block_t corresponding to the component path name parameter. This can be useful for calling the dmp_ct kernel service, which requires such an address, if you cannot more easily find the address in a kernel variable.

The return value from this function is either the kernel address of the requested ras_block_t, or NULL if the ras_block_t cannot be found.

The following functions are all simple "pass through" functions that allow a Vue script to directly invoke the corresponding kernel services. Some of the parameter lists have unused members for compatibility with the kernel, so you can use the kernel documentation directly. The value 0 must be passed for parameters shown as unused. You can use these services in the same way as their kernel counterparts, except that the address of an ldmp_parms structure is replaced by the cookie returned from the ldmp_setupparms kernel service.

As always, a negative return value indicates an error. This can be a kernel error number from the underlying kernel service, or from the interface routines if the cookie or a string is incorrect. The following interfaces provide most of the flexibility available to kernel or kernel extension driven live dumps.

long long dmp_context(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	long long ctx_type,   /* DMP_CTX_xxx flags from dump.h */
	untyped   p2);        /* parameter dependent on ctx_type (NULL, mst addr, cpuid, tid) */

long long dmp_ct(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	untyped   rasb,       /* component’s ras_block_t pointer */
	long long size);      /* amount of CT buffer to dump or 0 for all */

long long dmp_eaddr(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	String    name,       /* cdt name */
	untyped   addr,       /* first address to dump */
	long long size);      /* number of bytes to dump */

long long dmp_errbuf(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	long long erridx,     /* 0 for global error log, or wpar id */
	long long p2);        /* unused */

long long dmp_mtrc(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	long long com_size,   /* amount of LMT common data to dump */
	long long rare_size); /* amount of LMT rare data to dump */

long long dmp_pid(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	long long pid,        /* id of process to dump */
	long long p2);        /* unused */

long long dmp_systrace(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	long long size,       /* amount to dump */
	long long p2);        /* unused */

long long dmp_tid(
	long long flags,      /* DCF_xxx flags from dump.h */
	long long cookie,     /* cookie returned by ldmp_setupparms */
	long long name,       /* unused by this function */
	long long tid,        /* id of thread to dump */
	long long p2);        /* unused */

Note: You must call the ldmp_freeparms kernel service after any failure in the preceding routines, assuming you then want to abandon the dump.

The following script is an example that takes a very small, simple live dump. The kernel symbol dc_data exports a structure from the kernel, whose actual shape and contents are of no importance to this example.

__kernel struct {int i1; int i2; int i3; int i4;} dc_data;

@@BEGIN
{
	long long ldmp_parms;
	long long rc;

	rc = ldmp_setupparms(	"dc_data dump",
				"My Sample Dump",		/* dump title */
				"pvdump",		/* dump path prefix */
				NULL,			/* no function name */
				0x1122334455667788LL,	/* error code */
				0x10,			/* LDT_NOLOG flag */
				0);			/* default dump prio */
	printf("ldmp_setupparms rc = %016llx\n", rc);
	if (rc < 0) {
		exit();
	}

	ldmp_parms = rc;	/* cookie for other livedump functions */

	/*
	 * Add 16 bytes of kernel data to sample dump.
	 * Note that "dc_data" passes the structure's address.
	 */
	rc = dmp_eaddr(0, ldmp_parms, "dc_data", dc_data, sizeof(dc_data));
	if (rc) {
		printf("dmp_eaddr failed: %llx\n", rc);
		ldmp_freeparms(ldmp_parms);
		exit();
	}

	/*
	 * Take the sample live dump.
	 */
	rc = livedump(ldmp_parms);
	if (rc) {
		printf("livedump failed: %llx\n", rc);
	}

	exit();
}

Using #define symbols for live dump flags

The following sample shell script, probe.dump, can be useful if you prefer to use the actual defined symbols for live dump flags rather than manually substituting from the header files. It captures the relevant definitions from the livedump.h and dump.h files, and uses the C preprocessor to substitute values for you before passing your script to ProbeVue. Your script must comply with the following rules:

  • Must not begin with a #!/usr/bin/probevue comment.
  • Must not use symbols beginning with LDPP_, LDT_, DCF_, or DMP_ in conflict with the definitions in the header files.

Do not create files named pvdump.*, as the following script will overwrite them.

#!/bin/ksh
#
# Helper script for Vue scripts that need to pick up
# the values of the various flags used by livedump.
#
# The Vue script $1 
# must not contain a "#!/usr/bin/probevue" comment because
# the C preprocessor doesn't like it.

sed -n						\
	-e '/(/d'				\
	-e '/^#define LDPP_/p'			\
	-e '/^#define LDT_/p'			\
	-e '/^#define DCF_/p'			\
	-e '/^#define DMP_CTX_/p'		\
      /usr/include/sys/dump.h			\
      /usr/include/sys/livedump.h		\
    > pvdump.h

echo "#include \"pvdump.h\"" > pvdump.c
cat "$1" >> pvdump.c
cc -P pvdump.c
/usr/bin/probevue -K pvdump.i
rm pvdump.[cih]
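
To see which definitions the sed filter in the preceding script keeps, you can run the same expressions against a small sample. The following sketch uses an invented header fragment: only the LDT_ and LDPP_ values come from the tables in this section, while LDT_EXAMPLE_MACRO, UNRELATED_SYMBOL, and some_function are made up for illustration. The filter_defs function is a hypothetical wrapper around the same sed expressions.

```shell
#!/bin/sh
# Invented header fragment used only to exercise the filter.
sample_header='#define LDT_ONEPASS 0x02
#define LDT_NOLOG 0x10
#define LDT_EXAMPLE_MACRO(x) ((x) | 0x10)
#define LDPP_CRITICAL 7
#define UNRELATED_SYMBOL 1
extern int some_function(void);'

# filter_defs: same sed expressions as the helper script, reading stdin.
# Any line containing "(" is dropped first, which removes function-like
# macros and prototypes; the remaining matching #define lines are printed.
filter_defs()
{
	sed -n \
		-e '/(/d' \
		-e '/^#define LDPP_/p' \
		-e '/^#define LDT_/p' \
		-e '/^#define DCF_/p' \
		-e '/^#define DMP_CTX_/p'
}

printf '%s\n' "$sample_header" | filter_defs
# prints:
#   #define LDT_ONEPASS 0x02
#   #define LDT_NOLOG 0x10
#   #define LDPP_CRITICAL 7
```

Note that the first expression also discards any simple definition whose line happens to contain a parenthesis, for example in a trailing comment, so such a symbol would have to be substituted by hand.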