Custom Converters

To add a new custom converter to the conversion process for a given search collection, click Add a new converter, select Custom converter from the scrollable list in the dialog that displays, and click Add. A new converter template displays, in which you can set the following fields to configure your new custom converter:

  • Type-In - Identifies a specific content-type or wildcard pattern that the input data's content-type must match for this rule to be applied. If a Conditional is specified, that conditional must also match in order for the rule to be applied.

    An asterisk ('*') can be used to match all or any part of an input content-type, as in '*', 'application/*', '*/pdf', and '*/*word'. During crawling, any input data that does not have a specific content-type is given a content-type of unknown. (See Converter Process State for more detailed information about how the content-type is set during crawling.)

  • Type-Out - Identifies the content-type of the converted data if this converter is successful.
  • Fallback - Identifies the content-type of the output data if the converter fails or a sub-command generates an error code. In these cases, the original data is preserved unmodified and is passed on with the specified content-type for subsequent processing. If you do not specify a fallback content-type, Watson™ Explorer Engine applies a default content-type when the converter or a sub-command fails.

    You can also write rules that act upon the error that triggered your fallback content-type, or the default fallback content-type. See Examining and Acting on Converter Errors for information about how to examine and act upon any fallback errors that occur.

  • Output forking - determines whether the output of this rule is simply the processed input data stream with the specified type-out (which occurs when Output forking is unset), or whether this rule causes an additional data stream to be created whose conversion starts with the output of this rule. See Conversion Process Step for more detailed information about creating multiple data streams and naming their anchors.
  • Name - specifies the name that you want to use for this converter.
  • Conditional - this section enables you to specify a condition that must be matched (in addition to the Type-In value that you specified previously) in order for this converter to be applied to the input data. The attributes of this condition are the following:
    • Test - specifies whether the condition applies to the url or body of the input data. The default setting is url.
    • With - specifies the type of test: a wildcard set, regular expression, case-insensitive regular expression, program (where an exit status of 1 indicates success and 0 indicates failure), or an XPath expression.
    • Condition Text - specifies the wildcard set, regular expression, XPath expression, or program to use in testing the input data.
  • Advanced - enables you to set CPU, memory, and elapsed time limits for this converter. For more information, see Setting Converter Limits.
  • Action - this section enables you to identify the type of action that will be performed when this converter executes, and the command or parser associated with that action. The attributes of this action are the following:
    • Action - Identifies the type of action that should be performed using the information specified in the Action Text area. Valid values are the following:
      • command - enables you to specify a command (in the Action Text area) that should be executed to perform this conversion step. Watson Explorer Engine supports a number of variables that can be used as command-line arguments for such commands. See Variables for Converter Actions for a complete list of available variables.
        Note: When specifying commands on Windows platforms, each command that you want to execute must either be located in one of the default directories that the system searches for binaries, or be specified by its full pathname. When executing commands such as copy that are built into the Windows command processor (cmd.exe), you may need to use syntax like the following:
        C:\WINDOWS\system32\cmd.exe /c copy %source_file %target_dir

        This example is from a Windows XP installation. You may need to change the path to cmd.exe if the Windows command interpreter is located in another directory in your version of Windows.

      • type-out - enables you to specify a command (in the Action Text area) that analyzes the input data to attempt to determine its type. The type-out of the converter is set to the first line of output from that command.
      • xsl - assumes that the incoming data is already in XML format, passing it on without modification for processing by the XSL code provided in the action area. If the incoming XML is malformed, Watson Explorer Engine will attempt to fix it before processing it.
      • xsl-html - uses XSL to transform incoming HTML data into well-formed HTML that is then processed by the XSL code provided in the action area. This transformation includes cleanups such as trying to close unclosed tags, add missing tags (like html and body), escape entities, and similar normalization steps.
      • regex - uses regular expressions to switch from one state to another and to delimit contents. Outputs XML that conforms to the IBM schema.
      • regex-text - the same as the regex action, except that the output format is text.
      • case-insensitive-regex - the same as the regex action, except that the regular expressions are not case-sensitive.
    • Action Text - the command or parser that will be used to process incoming data. As mentioned previously, you can use a number of variables when defining different types of actions. See Variables for Converter Actions for detailed information about available variables.
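
The Type-In wildcard matching described above can be sketched in Python. This is only an illustration, not Watson Explorer Engine's implementation: the function name type_in_matches is hypothetical, and the sketch assumes that '*' is the only special character and that the comparison is case-sensitive.

```python
import re


def type_in_matches(pattern: str, content_type: str) -> bool:
    """Return True if content_type matches a Type-In style wildcard pattern.

    Assumption: '*' matches any run of characters (including none) and
    every other character must match literally.
    """
    # Escape the literal pieces and join them with '.*' in place of each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, content_type) is not None


# The example patterns from the Type-In description.
print(type_in_matches("*", "unknown"))                 # True
print(type_in_matches("application/*", "text/html"))   # False
print(type_in_matches("*/pdf", "application/pdf"))     # True
print(type_in_matches("*/*word", "application/msword"))  # True
```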

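As an illustration of the type-out action, the following Python sketch shows the kind of program that could be named in the Action Text area: it inspects the leading bytes of a file and prints a content-type as the first line of its output. The signature table, the function names, and the assumption that the input arrives as a file path argument are all hypothetical; see Variables for Converter Actions for the variables that actually supply input to commands.

```python
import sys

# Hypothetical signature table: leading bytes -> content-type to report.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"<?xml", "text/xml"),
]


def sniff_content_type(data: bytes) -> str:
    """Guess a content-type from the first bytes of the input data."""
    for magic, content_type in SIGNATURES:
        if data.startswith(magic):
            return content_type
    # No signature matched, so report the generic type.
    return "unknown"


def main(argv) -> int:
    """Print the guessed content-type of the file named in argv[1]."""
    if len(argv) != 2:
        print("usage: sniff_type.py INPUT_FILE", file=sys.stderr)
        return 2
    with open(argv[1], "rb") as handle:
        head = handle.read(16)
    # The first line of output is what a type-out action would consume.
    print(sniff_content_type(head))
    return 0
```

A wrapper script would typically end with sys.exit(main(sys.argv)) so that the program can be run from the command line.
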
After setting the appropriate fields and values for your custom converter, click OK to save your changes, or click Cancel to discard your modifications to the new converter.

You can reorder the converters by dragging and dropping the number in the leftmost column of the current converter list.
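
The Type-Out and Fallback behavior described earlier for command actions can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not the Engine's actual logic: the function name run_converter is hypothetical, and passing the data on standard input is an assumption made only to keep the example self-contained.

```python
import subprocess
import sys


def run_converter(command, data, type_out, fallback):
    """Run a conversion command and return (output_data, content_type).

    On success the converted data is returned with the Type-Out value.
    If the command cannot be started or exits with a nonzero status, the
    original data is returned unmodified with the Fallback value.
    """
    try:
        result = subprocess.run(
            command, input=data, capture_output=True, check=True
        )
    except (subprocess.CalledProcessError, OSError):
        return data, fallback
    return result.stdout, type_out


# Example: a sub-command that exits with status 3 triggers the fallback.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_converter(failing, b"raw bytes", "text/plain", "unknown"))
# prints (b'raw bytes', 'unknown')
```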