IPRSE_parse: Parse a text string against a grammar

IPRSE_parse matches an input string against an input grammar and produces a structure that pairs elements of the grammar with the corresponding elements of the input string. It is intended primarily for parsing z/TPF commands in real-time segments written in the C language.

IPRSE_parse returns the parsed parameters and values through a pointer to a struct IPRSE_output, declared in the tpfparse.h header.

Format

#include <tpf/tpfparse.h>
int IPRSE_parse(char *string, const char *grammar,
               struct IPRSE_output *result, int options,
               const char *errheader);
string
The input string, which must be a standard C string terminated by a zero byte ('\0') or by an EOM character if the IPRSE_EOM option is specified. If the IPRSE_EOM option is specified, the maximum length of the string is 4095 characters.
grammar
The grammar describing acceptable input strings. The grammar must end in a zero byte ('\0').
result
The tokenized parameter list in the following form:
result->IPRSE_parameter
The parameter name as specified by the grammar.
result->IPRSE_value
The value of the parameter as specified by the input string.
Note: This value is translated to uppercase when the IPRSE_MIXED_CASE option is specified and the corresponding grammar parameter ends with a less-than sign (<).
result->IPRSE_next
The pointer to the next entry in the output parameter list.
options
The following options control sending error messages:
IPRSE_PRINT
Print all error messages.
IPRSE_NOPRINT
Suppress all error messages. This is the default if IPRSE_PRINT is not specified.
The following options control allocation of storage for the results:
IPRSE_ALLOC
Obtain storage dynamically for the output structure. Always code this option.
IPRSE_NOALLOC
This is the default if IPRSE_ALLOC is not specified. Do not use IPRSE_NOALLOC except to facilitate migration of old code that uses the IPRSE_bldprstr function to initialize preallocated storage. Using the IPRSE_ALLOC option is both more efficient and less likely to cause errors.
The following options are pertinent to input message blocks:
IPRSE_EOM
If the input string ends with an EOM character (+) instead of an EOS character ('\0'), the IPRSE_parse function replaces the EOM character with an EOS character. Because the first EOM character in the input string is the one that is replaced, the input string cannot contain any other '+' characters when the IPRSE_EOM option is specified. The EOM character must be in the first 4095 characters of the input string.
IPRSE_NOEOM
This is the default if IPRSE_EOM is not specified. The input string must end with an EOS character ('\0').
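The EOM handling described above can be pictured with a small stand-alone sketch. This is not the z/TPF implementation: demo_replace_eom, DEMO_EOM_CHAR, and DEMO_EOM_LIMIT are hypothetical names introduced only for illustration, and stopping early at an existing '\0' is an assumption made to keep the demonstration safe on ordinary C strings.

```c
#include <stddef.h>

#define DEMO_EOM_CHAR  '+'    /* EOM character named in this section        */
#define DEMO_EOM_LIMIT 4095   /* EOM must be in the first 4095 characters   */

/*
 * Hypothetical helper (not part of tpfparse.h): replace the first EOM
 * character found within the first DEMO_EOM_LIMIT characters with '\0'.
 * Returns 1 if an EOM character was replaced, 0 otherwise.
 * Assumption: the scan also stops at an existing '\0'.
 */
static int demo_replace_eom(char *msg)
{
    size_t i;

    for (i = 0; i < DEMO_EOM_LIMIT && msg[i] != '\0'; i++) {
        if (msg[i] == DEMO_EOM_CHAR) {
            msg[i] = '\0';    /* everything after this point is cut off */
            return 1;
        }
    }
    return 0;
}
```

Because only the first '+' is replaced, any text after it is no longer part of the string, which is why the input cannot carry a '+' as data when IPRSE_EOM is specified.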

The following options control how the IPRSE_parse function parses the input string. All of these options can also be specified at the beginning of the grammar (see Specifying parser options in the grammar). Options specified in the grammar parameter override options specified in the options parameter:

IPRSE_STRICT
Accept only spaces as token separators, and only dashes (-) as separators between keywords and values in the input string. Use this option when the input parameters can contain commas (,), slashes (/), or equal signs (=).
IPRSE_NOSTRICT
Accept spaces, commas (,) or slashes (/) as token separators, and dashes (-) or equal signs (=) as separators between keywords and values. This is the default if IPRSE_STRICT is not specified.
IPRSE_MIXED_CASE
Accept lowercase or uppercase letters in the input string.
IPRSE_NOMIXED_CASE
Accept only uppercase letters in the input string. This is the default if IPRSE_MIXED_CASE is not specified.
IPRSE_QUOTE
Use the dollar sign ($) or single quote (') characters to delimit quoted parameters. The returned value matches the contents of the quoted parameter without the delimiters. To include the delimiter character in the value, double it; for example, 2 single quotes ('') return 1 single quote (').
IPRSE_NOQUOTE
This is the default if IPRSE_QUOTE is not specified. The IPRSE_NOQUOTE option specifies that the dollar sign ($) and single quote (') characters have no special meaning.
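The quoting rule for IPRSE_QUOTE (delimiters removed, a doubled delimiter yields one delimiter character) can be sketched in portable C as follows. This is only an illustration of the rule as stated above, not the parser's own code; demo_unquote is a hypothetical helper, and treating the non-delimiting quote character as ordinary data is an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of tpfparse.h): copy the contents of a
 * quoted parameter into out, dropping the delimiters and collapsing each
 * doubled delimiter into a single character.
 * Returns the number of characters written (excluding '\0'), or -1 if
 * the input does not start with '$' or a single quote, has no closing
 * delimiter, or does not fit in out.
 */
static int demo_unquote(const char *in, char *out, size_t out_size)
{
    char   delim = in[0];
    size_t i = 1;             /* read position, just past the opener */
    size_t n = 0;             /* write position                      */

    if (out_size == 0 || (delim != '$' && delim != '\''))
        return -1;

    for (;;) {
        char c = in[i];

        if (c == '\0')
            return -1;        /* closing delimiter is missing        */
        if (c == delim) {
            if (in[i + 1] != delim)
                break;        /* single delimiter: end of parameter  */
            i++;              /* doubled delimiter: keep only one    */
        }
        if (n + 1 >= out_size)
            return -1;        /* no room for character plus '\0'     */
        out[n++] = in[i++];
    }
    out[n] = '\0';
    return (int)n;
}
```

In this sketch the input 'IT''S' produces IT'S, and a string delimited by dollar signs keeps any single quotes inside it unchanged.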

Multiple options can be ORed together; for example, IPRSE_ALLOC | IPRSE_PRINT.

errheader
A string that identifies the program calling the parser. This string is printed out as part of the error message text if the IPRSE_PRINT option is specified.

Normal return

IPRSE_parse returns the number of parameters that have been parsed and put in the result structure. For example, a return code of 3 would mean that 3 parameters were parsed from an input string that contained 3 parameters.

Note: The return code must be used to count the nodes when traversing the result.
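As a minimal sketch of count-driven traversal, the following stand-alone code walks a result list using only the return code as the bound, never the contents of the last IPRSE_next pointer. struct demo_output is a stand-in for struct IPRSE_output (the real declaration is in tpfparse.h); it models only the three members named in this section, and their exact types are assumptions. demo_find_value is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for struct IPRSE_output; member types are assumptions. */
struct demo_output {
    char               *IPRSE_parameter;  /* name from the grammar      */
    char               *IPRSE_value;      /* value from the input       */
    struct demo_output *IPRSE_next;       /* next entry in the list     */
};

/*
 * Hypothetical helper: look up the value parsed for a grammar parameter.
 * count is the value returned by the parser; at most count nodes are
 * visited, so a zero or negative return code visits nothing.
 * Returns the matching value, or NULL if the parameter was not parsed.
 */
static const char *demo_find_value(const struct demo_output *first,
                                   int count, const char *name)
{
    const struct demo_output *node = first;
    int i;

    for (i = 0; i < count; i++) {
        if (strcmp(node->IPRSE_parameter, name) == 0)
            return node->IPRSE_value;
        node = node->IPRSE_next;   /* followed only while i < count */
    }
    return NULL;
}
```

Bounding the loop by the return code means the sketch makes no assumption about what the final IPRSE_next pointer contains.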

Error return

IPRSE_parse detects errors in the input string and in the grammar.
  • Error in Input String
    • Return Codes
      -1
      The input string is a question mark (?) or HELP (represented by symbolic IPRSE_HELP).
      0
      The input string does not meet the requirements of the grammar (represented by symbolic IPRSE_BAD).
    • Error Messages
      The IPRSE_parse function issues the following messages. The cccc represents the errheader parameter that is printed after the message header; it shows which function or program was calling IPRSE_parse when the error occurred. All messages will be sent via the wtopc function without chaining.
      PRSE0001E
      cccc - TOO MANY PARAMETERS ENTERED
      PRSE0004E
      cccc - INVALID USE OF PERIOD
      PRSE0005E
      cccc - INVALID ALPHANUMERIC CHARACTER
      PRSE0006E
      cccc - INVALID DECIMAL CHARACTER
      PRSE0007E
      cccc - INVALID CHARACTER
      PRSE0008E
      cccc - INVALID HEXADECIMAL CHARACTER
      PRSE0009E
      cccc - MANDATORY PARAMETER NOT GIVEN

      If the system can determine the last parameter in error, the message also contains the text PARAMETER IN ERROR IS followed by the parameter value.

      PRSE0011E
      cccc - INVALID INPUT PARAMETER

      If the system can determine the last valid parameter, the message also contains the text LAST VALID PARAMETER IS followed by the parameter value. If a keyword is in error, the message also contains the text ERROR IN KEYWORD followed by the keyword.

      PRSE0014E
      cccc - TOO MANY CHARACTERS ENTERED
      PRSE0015E
      cccc - TOO FEW CHARACTERS ENTERED FOR PARAMETER
  • Error in Grammar
    • 00006F system error messages are displayed on the console or in a dump when the grammar syntax is in error.
  • 0007B system error messages occur when the parser is unable to obtain needed heap storage.

Programming considerations

  • Always code the IPRSE_ALLOC option. The IPRSE_NOALLOC option and the IPRSE_bldprstr function are supported only for code that was written before the IPRSE_ALLOC option was available.
  • For information on creating a grammar, see Defining a grammar.

Examples

A series of examples follows, the first of which is a complete program for creating and parsing with a grammar. All of the other examples show a grammar, its input string, and the IPRSE_output structure.

The number of parameters found is returned if the string complies with the grammar conventions. See Defining a grammar for additional information.

The examples represent the results in the IPRSE_output structure through the diagram shown in Figure 1.
Figure 1. IPRSE_output structure

Example 1: Coding example for grammar and parser

The following example shows a program that:
  • Parses input using a specific grammar (IPRSE_parse)
  • Uses the parsed output (process_parm, defined in the example).
/*====================================================================*/
/*  This example shows a segment that parses                          */
/*  a message in MI0MI format on data level D0.                       */
/*  This code example includes calls to:                              */
/*    - parse input using a specific grammar (IPRSE_parse)            */
/*    - use the parsed output (process_parm, defined in this segment) */
/*====================================================================*/

#include <tpf/tpfeq.h>
#include <tpf/tpfapi.h>
#include <string.h>
#include <stdlib.h>
#include <tpf/tpfparse.h>

/*--------------------------------------------------------------------*/
/*  Define the grammar for the command handled by this segment, where:*/
/*    Positional is a positional parameter.                           */
/*    d+++ is a positional parameter that represents 1 to 4 digits.   */
/*    a.a is a positional parameter that represents a regular list of */
/*      alphanumeric characters (character type a).                   */
/*    (xx)* is an optional positional parameter that represents a     */
/*      wildcard list of hexadecimal digits (character type x).       */
/*    (NO)SELFdef is a self-defining keyword that returns a Y (yes    */
/*      value) or N (no value).                                       */
/*    Key-w is an optional regular keyword parameter that can have    */
/*      an alphanumeric value (character type w).                     */
/*    List-cc.cc is an optional regular keyword parameter that can    */
/*      have a value that consists of a regular list of uppercase     */
/*      letters (character type c).                                   */
/*--------------------------------------------------------------------*/

#define XMP_GRAMMAR "{ Positional "                               \
                    "| d+++ a.a [(xx)*] "                         \
                    "| (NO)SELFdef [Key-w List-cc.cc] "           \
                    "}"

/*--------------------------------------------------------------------*/
/*  Declare an interface to functions that will process the parsed    */
/*  command parameters.                                               */
/*--------------------------------------------------------------------*/

enum   parm1_type { POSITIONAL_NOT_SPECIFIED, POSITIONAL_SPECIFIED };
enum   parm5_type { SELFDEF_NOT_SPECIFIED, SELFDEF_NO, SELFDEF_YES };

struct xmp_interface
{
    enum parm1_type   parm1_value;      /*  "Positional"              */
    int               parm2_value;      /*  "d+++"                    */
    char             *parm3_first;      /*  first "a"                 */
    char             *parm3_second;     /*  second "a"                */
    char             *parm4_string;     /*  "(xx)*"                   */
    enum parm5_type   parm5_value;      /*  "(NO)SELFdef"             */
    char              parm6_value;      /*  "Key-w"                   */
    char             *parm7_first;      /*  first "cc"                */
    char             *parm7_second;     /*  second "cc"               */
};

#define XMP_DEFAULTS { POSITIONAL_NOT_SPECIFIED, -1, NULL, NULL, NULL, \
                       SELFDEF_NOT_SPECIFIED, '\0', NULL, NULL }

/*--------------------------------------------------------------------*/
/*  Declare internal function called by this segment.                 */
/*--------------------------------------------------------------------*/

static void process_parm(struct xmp_interface *xi , char *p, char *v);


/**********************************************************************/
/*  Function ____ completes the parsing of the "Zxxxx" functional     */
/*  message contained in the core block on data level D0.             */
/**********************************************************************/

void ____(void)
{

/*--------------------------------------------------------------------*/
/*  Define variables for accessing the command text in the            */
/*  core block on D0.                                                 */
/*--------------------------------------------------------------------*/

    struct mi0mi         *block_ptr;  /*  pointer to core block       */
    char                 *input_ptr;  /*  pointer to message text     */
    char                 *eom_ptr;    /*  pointer to _EOM character   */
                                      /*  (to be replaced by '\0')    */

/*--------------------------------------------------------------------*/
/*  Define variables for the parser results.                          */
/*--------------------------------------------------------------------*/

    struct IPRSE_output   parse_results;
    int                   num_parms;  /* For saving the IPRSE_parse   */
                                      /* return code.                 */

/*--------------------------------------------------------------------*/
/*  Define a moving pointer for traversing the parse results, a wtopc */
/*  header for the help message, and an interface variable for the    */
/*  parsed parameter values.                                          */
/*--------------------------------------------------------------------*/

    struct IPRSE_output  *pr_ptr;
    struct wtopc_header   msg_header;
    struct xmp_interface  parm_values = XMP_DEFAULTS;


/*--------------------------------------------------------------------*/
/*  Access the command block on level D0, point to the                */
/*  beginning of the parameters by skipping over "Zxxxx", and replace */
/*  _EOM with '\0'.                                                   */
/*--------------------------------------------------------------------*/

    block_ptr = ecbptr()->ce1cr0;
    input_ptr = block_ptr->mi0acc + strlen("Zxxxx");
    eom_ptr = (char *)&block_ptr->mi0ln0 + block_ptr->mi0cct - 1;
    *eom_ptr = '\0';

/*--------------------------------------------------------------------*/
/*  Call the parser.                                                  */
/*--------------------------------------------------------------------*/

    num_parms = IPRSE_parse(input_ptr, XMP_GRAMMAR, &parse_results,
        IPRSE_ALLOC | IPRSE_PRINT, "cpp_tppc_test");

/*--------------------------------------------------------------------*/
/*  Check if the command meets the grammar's requirements.            */
/*--------------------------------------------------------------------*/

    if (num_parms > 0)  /*  The parse was successful; num_parms       */
                        /*  parameters from the command               */
                        /*  matched parameters specified in the       */
                        /*  grammar (XMP_GRAMMAR).                    */
    {
        pr_ptr = &parse_results;     /*  point to the first result    */
        do
        {
            process_parm(&parm_values, pr_ptr->IPRSE_parameter,
                pr_ptr->IPRSE_value);
            pr_ptr = pr_ptr->IPRSE_next;
        } while (--num_parms);

        /* call additional functions to further process the input */
    }
    else
        if (num_parms == IPRSE_HELP)
        {
            wtopc_insert_header(&msg_header, "cpp_tppc_test", 99, 'I',
                WTOPC_SYS_TIME);
            wtopc("EXAMPLE HELP MESSAGE", 0, WTOPC_NO_CHAIN,
                &msg_header);
        }
        else ;  /*  IPRSE_parse has already written an error message. */

    exit(0);    /*  Command processing is completed.       */
}


/**********************************************************************/
/*  Function process_parm sets the appropriate interface variable     */
/*  field to the value corresponding to the matched parameter.        */
/**********************************************************************/

static void process_parm(struct xmp_interface *xi , char *p, char *v)
{

/*--------------------------------------------------------------------*/
/*  Define the value that strcmp returns when the two strings passed  */
/*  to it are equal.                                                  */
/*--------------------------------------------------------------------*/

#define STRCMP_EQUAL 0

/*--------------------------------------------------------------------*/
/*  Define a local variable to point to the dot in a list.            */
/*--------------------------------------------------------------------*/

    char *dot_ptr;


/*--------------------------------------------------------------------*/
/*  Determine which parameter was matched and set up the appropriate  */
/*  interface field(s) with the matching values.                      */
/*--------------------------------------------------------------------*/

    if (strcmp(p, "Positional") == STRCMP_EQUAL)
    {
        xi->parm1_value = POSITIONAL_SPECIFIED;
    }

    else if (strcmp(p, "d+++") == STRCMP_EQUAL)
    {
        xi->parm2_value = atoi(v);
    }

    else if (strcmp(p, "a.a") == STRCMP_EQUAL)
    {
        dot_ptr = strchr(v, '.');   /*  Point to the dot separating   */
                                    /*  the two list sub-parameters.  */
        *dot_ptr = '\0';            /*  Divide the list parameter     */
                                    /*  into two sub-strings.         */
        xi->parm3_first = v;
        xi->parm3_second = dot_ptr + 1;
    }

    else if (strcmp(p, "xx") == STRCMP_EQUAL)
    {
        xi->parm4_string = v;
    }

    else if (strcmp(p, "(NO)SELFdef") == STRCMP_EQUAL)
    {
        xi->parm5_value =
            *v == 'Y' ? SELFDEF_YES : SELFDEF_NO;
    }

    else if (strcmp(p, "Key-w") == STRCMP_EQUAL)
    {
        xi->parm6_value = *v;
    }

    else if (strcmp(p, "List-cc.cc") == STRCMP_EQUAL)
    {
        dot_ptr = strchr(v, '.');
        *dot_ptr = '\0';
        xi->parm7_first = v;
        xi->parm7_second = dot_ptr + 1;
    }

    return;
}
If the grammar is:
"{ Positional | d+++ a.a [(xx)*]
| (NO)SELFdef [Key-w List-cc.cc]}"
and the input string is:
NOSELF K-A L-DD.AA
then the string meets the grammar requirements. IPRSE_parse returns a return code of 3 because it parsed 3 parameters from the input string. The results in the arrays are shown as follows (result 0 is the first parameter parsed, and so on):
Example 2
 char *grammar = "cccc PRoc-ccxx [IS-d ]";
 char *string  = "DDDD PRO-AB23";
The string meets the grammar requirements. IPRSE_parse returns a return code of 2 because it parsed 2 parameters from the input string. The results in the arrays are shown as follows (result 0 is the first parameter parsed, and so on):
Example 3
 char *grammar = "cccc PR {Deac | Reac}";
 char *string  = "DDDD PR DEA ";
The string meets the grammar requirements. IPRSE_parse returns a return code of 3 because it parsed 3 parameters from the input string. The results in the arrays are shown as follows (result 0 is the first parameter parsed, and so on):
Example 4
 char *grammar = "cccc (wwww)* ";
 char *string  = "DDDD AAAB.CC*  ";
The string meets the grammar requirements. IPRSE_parse returns a return code of 2 because it parsed 2 parameters from the input string. The results in the arrays are shown as follows (result 0 is the first parameter parsed, and so on):
Example 5
 char *grammar = "(NO)RETURNcode";
 char *string  = "NORETURN";
The string meets the grammar requirements. IPRSE_parse returns a return code of 1 because it parsed 1 parameter from the input string. The result in the array is shown as follows (result 0 is the only parameter parsed):
Example 6
char * grammar = IPRSE_QUOTE_GRAMMAR "u*";
char * string = "'MY QUOTED INPUT WITH SPACES'";
The string meets the grammar requirements. IPRSE_parse returns a return code of 1 because it parsed 1 parameter from the input string. The result in the array is shown as follows (result 0 is the only parameter parsed):
char * grammar = IPRSE_QUOTE_GRAMMAR "u*";
char * string = "$MY 'QUOTED' INPUT WITH SPACES$";