Implement a custom export plug-in for IBM Content Analytics to emit WebSphere Service Integration Bus events

This article focuses on implementation of a scenario where IBM® Content Analytics documents are consumed by an external application. A document is sent in a JMS message to the application, starting a business process, and passing unstructured and structured data from the document as an input to the process.

Vladislav Ponomarev, Software Engineer, IBM

Vladislav Ponomarev is a Web technology solutions team leader at IBM Russian Software and Technology Lab. His major areas of expertise include Java EE applications, virtualization, and cloud computing.

Carl Osipov, Senior IT Architect, IBM

Carl Osipov is an experienced solution architect with the Strategy and Technology organization in IBM Software Group. His skills are in the area of distributed computing, speech application development, and computational natural language understanding. He has published and presented on service-oriented architecture, conversational dialog management and social software to peers in the industry and academia. His current focus is on sustainable global economic development through improved effectiveness and intellectual property output of interdisciplinary, worldwide scientific research collaboration.

21 April 2011



IBM Content Analytics (ICA) can be used to collect (crawl) content from unstructured data sources such as web pages or text files and from structured relational database table records. Once gathered by ICA, the content is organized (indexed) into documents that can be searched and analyzed.

This article focuses on implementation of scenarios where ICA documents need to be processed in a way that is not directly supported by ICA, or where the documents will be input to an external system. A sample scenario is based on a scheduled crawler for a website with regularly updated announcement pages. Structured (e.g., date of the announcement) and unstructured (e.g., textual description of the announcement) content of the pages is gathered and indexed by ICA, one document per page. Once created, a document can be analyzed and used to start a WebSphere® Process Server (WPS) workflow to handle the incoming announcement according to a business process. We will describe how to implement this scenario using a custom ICA export plug-in, which is an event handler executed by ICA after a document is crawled and indexed. The article describes the initial configuration of ICA that has to be performed prior to using the plug-in.

We will integrate the ICA export plug-in with an external WebSphere Application Server/JEE application by sending ICA documents to the application via a Java™ Message Service (JMS) queue using Secure Sockets Layer (SSL) transport. We also provide a brief outline of how to set up a JMS queue in WebSphere Application Server and highlight the key points of configuring applications that send and receive JMS messages. Sample code is provided in the Downloads section.

In the first part of the article, we describe the export plug-in API specified by ICA. Next, we show how an ICA document, including structured data extracted via search facets, can be sent as a message over JMS, and we illustrate sample code of a WebSphere Application Server application for consuming the message and starting the business process.

The second part of the article is dedicated to middleware setup. The key export plug-in configuration steps describe how to deploy the plug-in to ICA. Also, this section covers how to create and configure a JMS queue in WebSphere Application Server.

Then the article explains how to run the export plug-in, how to observe the delivery of a document event to WebSphere Application Server, and how the event can trigger a follow-on action, such as starting a WPS workflow.

ICA export plug-in development

Background, requirements, and technologies

In order to use information from ICA in other applications, ICA provides capabilities to export documents from text analytics collections (described in "Smarter collaboration for the education industry using Lotus Connections, Part 4: Use IBM Content Analytics to crawl, analyze, and display unstructured data"). Export can be customized in a number of ways. The output destination can be specified: a document can be exported to a file system as an XML file, to a relational database as a collection of records, or to a fully configurable destination and format defined by a custom export plug-in.

Background from ICA InfoCenter

When documents are processed through the ICA document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. ICA contains a storage component known as an index to maintain a searchable repository for the extracted values. The annotations are added to the index as tokens or facets and are used as the source for content analysis. Some annotators support user-defined dictionaries, user-defined rules, and custom configurations.

An ICA facet provides a way to organize or classify content so users can refine and navigate document collections along multiple paths, where each path presents a different view or perspective of the content. Facets can be flat, in which all possible paths are at the same level, such as a facet named State that includes values for New York, California, Virginia, etc. Facets can also be hierarchical and allow you to explore nested levels of classification, such as a music category that lets you drill down by genre, artist name, song title, etc.

The following types of documents can be exported: crawled, analyzed, or searched. The first option means that a document is exported after ICA retrieves the document's contents from a data source and before the contents are parsed and analyzed. The second means that the document is exported after passing through the collection's processing pipeline, so the exported document is augmented with metadata based on the facet annotations configured in the collection. The last enables the export of documents that result from a search over a collection by particular criteria.

In the scenario described here, the external application that consumes exported documents (e.g., announcement page on a website) should satisfy the following requirements:

  1. All document data, including content, metadata and facets, must be available to the application.
  2. The document should be exported in a way that is independent of the consuming application as well as of ICA. This means that the export plug-in should be resilient to any changes to the document content in ICA or in the client. The application should not depend on ICA libraries.
  3. The export plug-in should send the document to the application using an asynchronous message (i.e., the plug-in thread should not wait on the application to process the document contents) so that ICA is decoupled from the application.
  4. After the message has been sent, it must be guaranteed that the message will eventually be delivered to the external application.
  5. The communication channel between the ICA plug-in and the application that consumes messages must be secured by the SSL protocol, which would prevent tampering and eavesdropping.

Requirements 3 and 4 imply that JMS can be used as a communication layer between the export plug-in and the application that consumes documents. Since the use of JMS is beyond what is supported by ICA, we will describe how to develop a custom export plug-in.

We will develop an external Java EE application running under WebSphere Application Server V7.0. WebSphere Application Server implements a set of JEE specifications, including Enterprise JavaBeans (EJB) V3.0 (JSR 220) and the JMS API (JSR 914). The application will contain a message-driven enterprise bean (MDB) connected to a JMS queue defined within the same application server. Inside the MDB, a handler method will process an incoming message, take the document information (including the data extracted by facets), and start the WPS workflow using an EJB proxy.

The WPS workflow is packaged as a Service Component Architecture (SCA) application and runs on WebSphere Process Server V7.0. Details of the workflow implementation, SCA application packaging, and deployment are beyond the scope of this article. Reference 5 contains an overview of Business Process Choreographer (BPC) API functions, covers how to write a client application that uses the BPC API, and provides details on how to create input data as well as access output data.

The JMS queue will be configured to prevent unauthorized connections and to enable the use of SSL to secure the communication channel, which addresses requirement 5.

Integration architecture overview

The architecture is illustrated in Figure 1. The top part of the image depicts the server running an instance of ICA. When a new document appears in the index, it's exported using a custom plug-in, which is a stand-alone Java application launched by ICA in a separate process. The document's content and the metadata derived from annotations of the document are available to the plug-in. The plug-in securely connects to the JMS queue on the WPS server using the javax.jms.Queue implementation and sends a message containing both types of the document information (more on this in the "Export plug-in implementation" section).

The WPS server hosts and runs a JEE application with an MDB connected to the queue. When a message is delivered to the queue in WPS, the MDB's callback method is invoked. In the callback, information is extracted from the message and used to start the WPS workflow, which is part of another SCA application running in the WPS server. The MDB communicates with the workflow through the EJB-based API of Business Process Choreographer. With remote EJB interfaces, the MDB application can run on a different physical server from the SCA application, which simplifies scaling. However, remote interfaces introduce overhead even when both applications share the same application server instance, in which case local interfaces could improve performance. This article assumes that the SCA and MDB applications reside within the same WebSphere Application Server instance.

Finally, in the workflow, a document is handled according to the workflow's business process.

Figure 1. Integration overview
Image shows integration architecture

The infrastructure that enables connection between the export plug-in and the MDB and handles the messaging is provided by the WebSphere Service Integration (SI) Bus. The bus is a logical concept and can be thought of as a collection of bus members (application servers or clusters) that share a common messaging infrastructure. Within each bus member, there are a number of messaging engines that manage runtime resources like queues and are capable of storing messages in a file, memory, or database. Messaging engines enable clients to connect to JMS queues and other SI Bus destinations. The MDB responsible for consuming the messages sent by the export plug-in uses an activation specification (a set of connectivity parameters) to connect to a JMS queue.

SI Bus components and their relationships with the MDB application are illustrated in Figure 2.

Figure 2. Service Integration Bus components
Image shows SI Bus components and their relationships with the MDB application

The WebSphere Application Server configuration enforces the use of a secured transport channel for communication between clients and the bus and ensures that access to queue destinations is restricted to authorized users only.

Export plug-in implementation

An ICA export plug-in must inherit the class and implement its abstract method, publish(), which takes a single parameter — an instance of the class representing a crawled and analyzed document.

To avoid unnecessary coupling of the external application to ICA, the document is converted into a generic, flat format before it is sent via a JMS message. All document fields (language, document type, source, etc.) and facet data are put into a map. Keys of the map are strings prefixed with "field." or "facet.", followed by the field name or the facet path. For hierarchical facets, which represent nested levels of classification, the path is a concatenation of path items separated by " > " ("word > num", for example). Values in the map are document field values or facet keywords. The resulting map is put into a message of type javax.jms.MapMessage and sent over JMS.

Although the Document class could have been serialized into a javax.jms.ObjectMessage and sent via JMS in a binary format, the use of a format-neutral, text-based message format ensures that the consuming application does not have dependencies on the Document implementation in the ICA libraries.

Altogether, this approach allows the implementation to address requirements 1 and 2: availability of all data from the document in a message, as well as an application-neutral message format for communication between ICA and the client.
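
The flat format described above can be illustrated with a minimal sketch. It uses plain Java collections in place of the ICA document class, so the class name DocumentFlattener, the flatten() signature, and the sample field and facet values are assumptions made for illustration only; the actual utility routine is in the sample code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of flattening a document into prefixed string keys (hypothetical names). */
class DocumentFlattener {

    static final String FIELD_PREFIX = "field.";
    static final String FACET_PREFIX = "facet.";
    static final String PATH_SEPARATOR = " > ";

    /**
     * @param fields document fields, for example language or source
     * @param facets facet path items mapped to the keyword found for that facet
     */
    static Map<String, String> flatten(Map<String, String> fields,
                                       Map<List<String>, String> facets) {
        Map<String, String> flat = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            flat.put(FIELD_PREFIX + field.getKey(), field.getValue());
        }
        for (Map.Entry<List<String>, String> facet : facets.entrySet()) {
            // A hierarchical facet path such as [word, num] becomes "word > num".
            String path = String.join(PATH_SEPARATOR, facet.getKey());
            flat.put(FACET_PREFIX + path, facet.getValue());
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("language", "en");
        Map<List<String>, String> facets = new LinkedHashMap<List<String>, String>();
        facets.put(Arrays.asList("word", "num"), "2011");
        System.out.println(flatten(fields, facets));
    }
}
```

Because both keys and values are plain strings, the resulting map can be copied entry by entry into a map message without any ICA types crossing the wire.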

Implementation of the publish() method of the export plug-in involves the following steps. First, there is a check for a document type: export plug-in's publish() method is called even if a document is deleted from the index, but in the case of the scenario described in this article, it makes no sense to handle this situation. Second, there is a call to a utility routine to get a "flattened" representation of the document. The flat message representation is then passed to sendMessage() method of instance initialized in the plug-in's constructor.

The class is responsible for handling JMS-related tasks. It establishes a connection to the message queue and sends messages. Its constructor takes two arguments: initialization properties and a pre-configured logger instance. Properties object contains the following:

  • JNDI provider URL in the form of corbaname:iiop:<hostname>:<port>.
  • Fully qualified name of a class implementing javax.naming.InitialContextFactory interface.
  • Queue JNDI name.
  • Connection factory JNDI name.
  • Name of the user allowed to perform the JNDI lookup. For simplicity, the queue is configured in a way that this user is the only one allowed to put messages onto it.
  • User password.
  • URL of the Java Authentication and Authorization Service (JAAS) login configuration file. For details, see Reference 8.
  • URL of the configuration file that contains Internet Inter-Orb Protocol (IIOP) authentication settings specific to thin clients of applications running under WebSphere Application Server.
  • URL of SSL configuration file.

The last three parameters are used to set system properties, as shown in Listing 1.

Listing 1. Setting system properties
System.setProperty("", jaasLoginConfig);
System.setProperty("", corbaConfigUrl);
System.setProperty("", sslConfigUrl);
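
The property names do not appear in the listing as reproduced here. As an assumption, the sketch below uses the standard JAAS login configuration property together with the two names commonly documented for WebSphere thin clients; verify them against the sample code before relying on them. The class name and the sample URLs are hypothetical.

```java
/** Sketch: publishing three configuration URLs as JVM system properties. */
class ClientSecurityProperties {

    // Assumed names: the standard JAAS key plus WebSphere thin-client keys.
    static final String JAAS_LOGIN_CONFIG = "java.security.auth.login.config";
    static final String CORBA_CONFIG_URL = "com.ibm.CORBA.ConfigURL";
    static final String SSL_CONFIG_URL = "com.ibm.SSL.ConfigURL";

    /** Must run before the first JNDI lookup or JAAS login in the process. */
    static void apply(String jaasLoginConfig, String corbaConfigUrl, String sslConfigUrl) {
        System.setProperty(JAAS_LOGIN_CONFIG, jaasLoginConfig);
        System.setProperty(CORBA_CONFIG_URL, corbaConfigUrl);
        System.setProperty(SSL_CONFIG_URL, sslConfigUrl);
    }

    public static void main(String[] args) {
        // Hypothetical locations; all three values are URLs, not plain paths.
        apply("file:///C:/plugin_config/wsjaas_client.conf",
              "file:///C:/plugin_config/sas.client.props",
              "file:///C:/plugin_config/ssl.client.props");
        System.out.println(System.getProperty(SSL_CONFIG_URL));
    }
}
```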

Configuration of the logger passed to the constructor is beyond the scope of this article. At this point, it's worth mentioning that the logger might be set to write output into a separate file to simplify debugging.

The connection to the queue is obtained in the method initMessagingFacility(), according to the following sequence of steps:

  1. The javax.naming.InitialContext instance is created using specific initial context factory.
  2. The instance is created and authentication is performed.
  3. The instance is created by login module after authentication. Subject containing CORBA credential is used by the authorization service to further restrict access to JNDI resources.
  4. The subject is used to execute a callback on behalf of the user specified in the MessageSender initialization properties. In the callback, javax.jms.ConnectionFactory is looked up from the JNDI context and used to obtain the javax.jms.Connection instance. The javax.jms.Queue object is also looked up, put into a simple data-transfer object (DTO) together with the connection object, and returned from the callback.

The process is illustrated in Listing 2.

Listing 2. Obtaining JMS connection and queue references
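// Imports required by the enclosing class.
import java.security.PrivilegedAction;
import java.util.Properties;
import java.util.logging.Level;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Queue;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.rmi.PortableRemoteObject;
import javax.security.auth.Subject;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;
import com.ibm.websphere.security.auth.WSSubject;
import com.ibm.websphere.security.auth.callback.WSCallbackHandlerImpl;

private void initMessagingFacility()
    throws NamingException, LoginException
{
    Properties env = new Properties();
    env.put(Context.INITIAL_CONTEXT_FACTORY, initialContextFactory);
    env.put(Context.PROVIDER_URL, providerUrl);
    final InitialContext initialContext = new InitialContext(env);
    /*
     * Dummy call to establish security realms from JAAS. It should be done
     * before JAAS login.
     */
    initialContext.lookup(""); // assumed: the call itself is missing from the listing
    CallbackHandler loginCallbackHandler = new WSCallbackHandlerImpl(
        jaasUser, jaasPassword
    );
    LoginContext lc = new LoginContext("WSLogin", loginCallbackHandler);
    lc.login(); // assumed: authenticate before the subject is read
    Subject subject = lc.getSubject();
    PrivilegedAction<MessagingWrapper> postMessage =
        new PrivilegedAction<MessagingWrapper>()
    {
        public MessagingWrapper run()
        {
            try
            {
                Object lookedUp = initialContext.lookup(jmsConnectionFactoryName);
                ConnectionFactory connectionFactory =
                    (ConnectionFactory) PortableRemoteObject
                        .narrow(lookedUp, ConnectionFactory.class);
                lookedUp = initialContext.lookup(jmsQueueName);
                Queue queue = (Queue) PortableRemoteObject.narrow(lookedUp, Queue.class);
                Connection connection =
                    connectionFactory.createConnection(jaasUser, jaasPassword);

                return new MessagingWrapper(connection, queue);
            }
            catch (NamingException e)
            {
                logger.log(Level.SEVERE, "Failed to lookup connection factory and queue", e);
                throw new MessagingException("Failed to lookup connection factory and queue", e);
            }
            catch (JMSException e)
            {
                logger.log(Level.SEVERE, "Failed to create JMS connection", e);
                throw new MessagingException("Failed to create JMS connection", e);
            }
        }
    };
    // Run the lookups on behalf of the authenticated user.
    MessagingWrapper wrapper = (MessagingWrapper) WSSubject.doAs(subject, postMessage);
    connection = wrapper.getConnection();
    queue = wrapper.getQueue();
    if (logger.isLoggable(Level.INFO))
    {
        logger.log(Level.INFO, "Connection: " + connection);
        logger.log(Level.INFO, "Queue: " + queue);
    }
}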

The method initMessagingFacility() is called lazily, only once, before the first JMS message is sent. Sending a message then proceeds as follows:

  1. The javax.jms.Session instance (session) is created using queue connection.
  2. The object of the javax.jms.MessageProducer class is obtained using the session. The message producer is used to send messages to the destination. Options are available to set the message delivery mode, priority, and time-to-live value. The delivery mode (non-persistent vs. persistent) specifies whether the JMS provider needs to take extra effort to ensure that a message is not lost in transit. The priority parameter is a hint to the JMS provider about message delivery order. Time-to-live specifies the message lifetime; messages that expire before they are delivered to the consumer are discarded.
  3. A map message instance is created using the session object. The map message is essentially a set of key-value pairs, which makes it suitable to carry a flattened document, as described in the previous section.
  4. The map message is populated with data and is sent via the message producer.
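
The lazy, one-time initialization mentioned before the steps above might look like the following sketch. It is not the MessageSender implementation: the Transport and TransportFactory interfaces are hypothetical stand-ins for the JMS objects, used so that the example runs without a JMS provider.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch of lazy, one-time connection setup (all names are hypothetical). */
class LazySender {

    /** Stand-in for the JMS session and producer used to send one message. */
    interface Transport {
        void send(Map<String, String> flatDocument);
    }

    /** Stand-in for the expensive setup done by initMessagingFacility(). */
    interface TransportFactory {
        Transport connect();
    }

    private final TransportFactory factory;
    private Transport transport; // stays null until the first message is sent

    LazySender(TransportFactory factory) {
        this.factory = factory;
    }

    /** Connects on the first call only, then reuses the same transport. */
    synchronized void sendMessage(Map<String, String> flatDocument) {
        if (transport == null) {
            transport = factory.connect();
        }
        transport.send(flatDocument);
    }

    public static void main(String[] args) {
        final List<Map<String, String>> delivered = new ArrayList<Map<String, String>>();
        LazySender sender = new LazySender(() -> {
            System.out.println("connecting once");
            return flatDocument -> delivered.add(flatDocument);
        });
        sender.sendMessage(Collections.singletonMap("field.language", "en"));
        sender.sendMessage(Collections.singletonMap("field.language", "ru"));
        System.out.println("delivered " + delivered.size() + " messages");
    }
}
```

Deferring the connection this way means that a plug-in that never exports a document never opens a connection, and the synchronized method keeps the setup from running twice if two threads send at the same time.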

The base class of the export plug-in also defines a term() callback for the end of an export session. In this method, our implementation calls the cleanup() method of MessageSender, which closes and releases all JMS resources it holds.

Client application implementation

Starting WPS workflow

Starting a workflow is a multi-step process. First, a Business Flow Manager should be obtained by looking up its local interface from the JNDI context. Notice that in order to do this, it might be necessary to define an EJB reference in the deployment descriptor of the MDB application, as described in Reference 5 under "Code Review." After that, the process template object is queried by the Business Flow Manager. The process template contains the specification of a business process, including definition of its input and output data, which is used to construct Service Data Objects (SDO). SDOs are used to pass data between workflow and client code.

The client application is an MDB packaged into an Enterprise Archive (EAR) and deployed to WebSphere Application Server. The only method implemented by the MDB is onMessage(), declared by the javax.jms.MessageListener interface.

In onMessage(), an incoming message is cast to a MapMessage class followed by an iteration through all of its properties to construct the hash map, which holds all the document data carried in the message. This map is then passed as an argument to start() method of the MDB implementation, which encapsulates the use of BPC API. For brevity, we omit the exact listing of the start() method; refer to the attached source code for details.

Since the goal of the article is to exemplify the consumption of a JMS message, we skip over the exact implementation of the workflow. Another example of message handling could be to construct JPA entities corresponding to the information from the exported document and store them in a database.

Listing 3 shows a skeleton implementation of the MDB.

Listing 3. Message handling
@MessageDriven(
    activationConfig = {
        @ActivationConfigProperty(
            propertyName = "destinationType", propertyValue = "javax.jms.Queue"
        )
    }
)
public class RecommendationMessageHandler implements MessageListener
{
    /** Logger instance. */
    private static final Logger LOGGER = Logger
        .getLogger(RecommendationMessageHandler.class.getName());
    @PersistenceContext(unitName = "tonkawa-core")
    private EntityManager manager;
    private MessageDrivenContext context;
    public void onMessage(Message message)
    {
        if (LOGGER.isLoggable(Level.FINE))
        {
            LOGGER.log(Level.FINE, "Message received: " + message);
        }
        try
        {
            MapMessage mapMessage = (MapMessage) message;
            Enumeration<String> names = mapMessage.getMapNames();
            Map<String, String> documentData = new HashMap<String, String>();
            while (names.hasMoreElements())
            {
                String key = names.nextElement();
                documentData.put(key, mapMessage.getString(key));
            }

            // Reconstructed from the text: start the workflow with the document data.
            SampleProcessWrapper wrapper = new SampleProcessWrapper();
            wrapper.start(documentData);
        }
        catch (Exception e)
        {
            LOGGER.log(Level.SEVERE, "Exception occurred", e);
        }
    }
}

Note how all of the exceptions are caught, so that none of them propagates up the call stack. This prevents the messaging engine from attempting to redeliver the message that caused the failure. In practice, such a message should be handled more robustly (e.g., it might make sense to put it into a "failed" queue and treat it specially in a separate MDB).
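
The "failed" queue idea can be sketched without a messaging engine. In the example below an in-memory queue stands in for a real JMS destination, and the class and interface names are hypothetical; a production MDB would send the failed message to a separate queue rather than keep it in memory.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Map;
import java.util.Queue;

/** Sketch: park messages that fail processing instead of rethrowing. */
class FailedMessageRouter {

    /** Stand-in for the business logic that may fail on a given document. */
    interface DocumentHandler {
        void handle(Map<String, String> documentData) throws Exception;
    }

    private final DocumentHandler handler;
    // In-memory stand-in for a dedicated "failed" JMS queue.
    private final Queue<Map<String, String>> failedQueue =
        new ArrayDeque<Map<String, String>>();

    FailedMessageRouter(DocumentHandler handler) {
        this.handler = handler;
    }

    /** Never throws, so the caller has no reason to redeliver the message. */
    void onMessage(Map<String, String> documentData) {
        try {
            handler.handle(documentData);
        } catch (Exception e) {
            failedQueue.add(documentData);
        }
    }

    Queue<Map<String, String>> failedQueue() {
        return failedQueue;
    }

    public static void main(String[] args) {
        FailedMessageRouter router = new FailedMessageRouter(data -> {
            if (data.isEmpty()) {
                throw new IllegalStateException("document has no data");
            }
        });
        router.onMessage(Collections.singletonMap("field.language", "en"));
        router.onMessage(Collections.<String, String>emptyMap());
        System.out.println("failed messages: " + router.failedQueue().size());
    }
}
```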

Also note how the WPS workflow is started in the last few lines of the try block. The wrapper class is a client that works with the workflow EJB API and encapsulates the JNDI lookup of the Business Flow Manager, the search for the process template, the marshalling of input data into an SDO, and the launch of the workflow by calling its start() method, as described earlier.
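
Before the document data is handed to the workflow, the consumer may want to separate fields from facets again. The sketch below relies only on the "field." and "facet." key prefixes and the " > " path separator described in the export plug-in section; the class and method names are hypothetical and are not part of the sample code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: reading the flat document map on the consuming side. */
class DocumentDataView {

    /** Returns the entries whose key starts with the prefix, with the prefix removed. */
    static Map<String, String> entriesWithPrefix(Map<String, String> documentData,
                                                 String prefix) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : documentData.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    /** Splits a facet path such as "word > num" back into its path items. */
    static List<String> facetPath(String facetKey) {
        return Arrays.asList(facetKey.split(" > "));
    }

    public static void main(String[] args) {
        Map<String, String> documentData = new LinkedHashMap<String, String>();
        documentData.put("field.language", "en");
        documentData.put("facet.word > num", "2011");
        System.out.println(entriesWithPrefix(documentData, "field."));
        for (String key : entriesWithPrefix(documentData, "facet.").keySet()) {
            System.out.println(facetPath(key));
        }
    }
}
```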

Middleware configuration

ICA export plug-in configuration

To get the export plug-in running, certain configuration steps must be performed.

The following environment variables are used:

ICA_ROOT— The directory where ICA is installed (e.g., C:\ICA22\es).

WAS_ROOT— The directory where WebSphere Application Server is installed (e.g., D:\Additional Programs\WID_WTE\runtimes\bi_v7).

In order to initialize MessageSender (see "Export plug-in implementation"), a number of properties should be provided. The implementation in this article (see Downloads) reads them from a file structured as shown in Listing 4.

Listing 4. Plug-in configuration properties file
# JNDI provider URL. It should be in form corbaname:iiop:<host>:<port>
# Make sure that host is reachable and add it to /etc/hosts if needed.
# InitialContextFactory class name. Don't modify this line.
# JNDI name of the queue to send message to
# JNDI name of the connection factory used to obtain connection to queue

# Name of the user allowed to perform JNDI lookup. Currently the assumption is that the same user is allowed to put messages onto the queue
///C:/Program Files/IBM/es/esadmin/plugin_config/signaller/wsjaas_client.conf\
///C:/Program Files/IBM/es/esadmin/plugin_config/signaller/sas.client.props\
///C:/Program Files/IBM/es/esadmin/plugin_config/signaller/ssl.client.props

The last three lines are explained in the next section.
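
Reading and validating such a file can be done with java.util.Properties. The sketch below is illustrative only: the property key names are hypothetical (the real ones are defined by the sample code), and the two sample values are the JNDI names that appear in the plug-in log later in this article.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Sketch: loading the plug-in configuration and reporting missing entries. */
class PluginConfig {

    // Hypothetical key names, one per parameter the sender needs.
    static final String[] REQUIRED_KEYS = {
        "provider.url", "initial.context.factory", "queue.name",
        "connection.factory.name", "jaas.user", "jaas.password",
        "jaas.login.config", "corba.config", "ssl.config"
    };

    static Properties load(Reader source) {
        Properties properties = new Properties();
        try {
            properties.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Returns the required keys that are absent or blank. */
    static List<String> missingKeys(Properties properties) {
        List<String> missing = new ArrayList<String>();
        for (String key : REQUIRED_KEYS) {
            String value = properties.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "queue.name=jms/TonkawaQueue\n"
            + "connection.factory.name=jms/TonkawaConnectionFactory\n";
        Properties properties = load(new StringReader(sample));
        System.out.println("Missing: " + missingKeys(properties));
    }
}
```

Failing fast on a missing entry at plug-in startup is easier to diagnose than a JNDI or SSL error raised later, when the first document is exported.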

To enable the export plug-in to use secure communication with JMS server, the following files must be provided to the plug-in:

  • wsjaas_client.conf
  • sas.client.props
  • ssl.client.props

These files specify the security configuration to use in RMI-IIOP communication with the server. They should be downloaded to the ICA server from the %WAS_ROOT%\profiles\<profile>\properties directory of the WebSphere Application Server or WPS server hosting the queue.

The files should be saved to a known location, and the plug-in configuration properties file should be modified to reference the file location (see the example in the previous section).

Important: Make sure paths are specified in URL format (i.e., prefixed with file:///).
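
One way to obtain a correctly prefixed value is to let the JDK convert a file system path. This is a small sketch with a hypothetical class name; note that the conversion percent-encodes spaces in the path (for example, in "Program Files"), so compare the result with what the plug-in accepts.

```java
import java.nio.file.Paths;

/** Sketch: converting a local path into a file:/// URL string. */
class FileUrls {

    static String toFileUrl(String path) {
        // toAbsolutePath() resolves relative paths against the working directory.
        return Paths.get(path).toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "ssl.client.props";
        System.out.println(toFileUrl(path));
    }
}
```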

After that, trust store and key store should be downloaded to the ICA server from the server where the JMS is running. The location of these files can be looked up in properties/ssl.client.props in the server profile directory, as shown in Listing 5.

Listing 5. Trust and key store location in ssl.client.props

After downloading the keys, the ssl.client.props file located in the ICA server should be modified to reflect the new location of the keys. Namely, the following properties must be adjusted:

  • user.root

Since the plug-in depends on JMS classes and performs JNDI lookups of objects from the WebSphere naming context, the class dependencies should be loaded by the JVM. Due to some peculiarities of class loading, the requirement is even stronger. It's necessary to have these classes in the classpath of the Java process, so a custom URLClassLoader implementation will not suffice. So in order to configure the plug-in process classpath, these actions need to be executed:
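
A quick way to diagnose a classpath problem is to check, from inside the plug-in process, whether a required class is visible to the system class loader. The helper below is a hypothetical diagnostic, not part of the sample code; the class names probed in main() are examples of JMS and JNDI types the plug-in uses.

```java
/** Sketch: checking whether a class is on the classpath of the current JVM. */
class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // Ask the system class loader only, without initializing the class.
            Class.forName(className, false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = { "javax.jms.Queue", "javax.naming.InitialContext" };
        for (String className : required) {
            System.out.println(className + " available: " + isOnClasspath(className));
        }
    }
}
```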

  1. Download the following files from the %WAS_ROOT%\runtimes directory of the server you want to communicate with:
  2. Put these files in the %ICA_ROOT%\lib directory.
  3. In the %ICA_ROOT%\configurations\interfaces\indexservice__interface.ini file, locate the line that starts with "classpath" and append the following text to it:

A final step is the export plug-in deployment. This can be achieved using ICA admin console Web UI as follows:

  1. In the System Administration web application, navigate to 'Collections' and click 'Edit' for the collection for which you need the export plug-in to run.
  2. When the page is loaded, navigate to the 'Export' tab and select 'Configure options to export crawled or analyzed documents'.
  3. In 'Options for exporting analyzed documents' section, select the 'Export documents by using a custom plug-in' radio button.
  4. Specify the JAR file with the export plug-in in the 'Export plug-in class path' field.
  5. Specify the plug-in class name in the Publisher's 'Class name' field (e.g.,
  6. Save your changes.
  7. In the 'System Administration' web application, navigate to 'Collections' and click 'Parse and Index' for the desired collection. On the 'Parse and Index' tab on the next page, launch the indexer process.

WebSphere Application Server configuration

The following is a description of how to create and configure the SI Bus in WebSphere Application Server for ICA:

  1. Create a JAAS alias that stores the user ID and password used to authenticate the creation of a new connection to the JMS provider. To simplify the configuration, we use wasadmin credentials for this alias.
  2. Create an SI Bus. For simplicity, the JAAS alias created in the previous step is used to authorize communication between messaging engines of the bus.
  3. Add the WebSphere Application Server server to the SI Bus members.
  4. Create an SI Bus queue destination. Optionally, specify a reliability setting for message delivery. In our case, the ASSURED_PERSISTENT value was chosen to guarantee that the messages are not discarded.
  5. Create a connection factory connected to the SI Bus.
  6. Create a queue.
  7. Create an activation specification for the MDB application. The destination JNDI name should be set to the JNDI name of the queue created at step 6, and the JAAS alias should be set to the value from the step 1.

While all these steps can be performed from the WebSphere Application Server Admin Console, the deployment can be streamlined with a sample Jython script for the wsadmin tool, which performs administrative operations from a scripting language and so helps automate WebSphere Application Server configuration. Besides steps 1-7, the script also includes commands to install and start the MDB application. The script is provided in the Downloads section below.

Before running the script, you need to edit the following lines in its header, as shown in Listing 6.

Listing 6. Environment-specific parameters in deployment script
#WebSphere administrator credentials
userName = "wasadmin"
password = "123456"

# This is a server to be added as a bus member.
server = "server1"
fqdn = ""

# EAR installation parameters.
appPath = "C:/sample.ear"
deployToServers = [ "webserver1", "server1" ]

To run the script, launch the wsadmin tool as described in the "Starting the wsadmin scripting client using wsadmin scripting" section of the WebSphere Application Server V7 InfoCenter.

After that, the wsadmin shell is launched, where Jython or Jacl commands can be executed. To run the script, type the execfile() command as shown below:

 wsadmin> execfile("<path>\")

As the script runs, it provides a comprehensive diagnostic output to track the configuration progress.

Running plug-in and observing results

At this point, it should be possible to launch all the application components and observe the results.

Assuming that SCA and MDB applications were already deployed and started, ensure that the ICA crawler and indexer for a particular document collection are running. This can be done with the System Administration web application of ICA by navigating to 'Collections' and choosing the proper one. Tabs 'Crawl' and 'Parse and Index' have controls to launch the respective service. See the top part of Figure 3 for an illustration.

The export plug-in is started simultaneously with the indexer, which can be observed in the plug-in log file if the plug-in logger was configured as described in the "Export plug-in implementation" section, as shown in Listing 7.

Listing 7. Export plug-in startup lines in a log
Initializing message sender...
Mar 3, 2011 5:56:24 AM assertProperties
INFO: Provider URL:
Mar 3, 2011 5:56:24 AM assertProperties
INFO: Initial context factory:
Mar 3, 2011 5:56:24 AM assertProperties
INFO: Queue name: jms/TonkawaQueue
Mar 3, 2011 5:56:24 AM assertProperties
INFO: Connection factory name: jms/TonkawaConnectionFactory
Mar 3, 2011 5:56:24 AM assertProperties
INFO: JAAS user: wasadmin
Mar 3, 2011 5:56:24 AM assertProperties
INFO: JAAS login config: file:///C:/CCA_plugin/signaller/wsjaas_client.conf
Mar 3, 2011 5:56:24 AM assertProperties
INFO: CORBA config: file:///C:/CCA_plugin/signaller/sas.client.props
Mar 3, 2011 5:56:24 AM assertProperties
INFO: SSL config: file:///C:/CCA_plugin/signaller/ssl.client.props
Mar 3, 2011 5:56:24 AM flush
Message sender initialized...

The next step is to make a sample announcement document available to the crawler. If using a web crawler, this could be done by publishing a web page on a local site running on IBM HTTP Server. Alternatively, a file system crawler can find the document if it is copied to a directory monitored by the crawler. Next, navigate to the crawler details page and click 'Start full recrawl', as shown at the bottom of Figure 3.

Figure 3. Starting full re-crawl

Once the crawling is complete and a new document appears in the index, the document is exported and sent via JMS to the consuming application. This can be confirmed in the plug-in log, as shown in Listing 8 (note that the flattened document content is omitted for brevity).

Listing 8. Notification about sent message in the log
Mar 3, 2011 8:20:21 AM sendMessage
FINE: Flattened doc: {<content skipped>}
Mar 3, 2011 8:20:23 AM sendMessage
INFO: Sending message...

It may take a few seconds until the message is delivered, processed and a new business process is started. The start event can be monitored with Business Process Choreographer Explorer Web application installed on WPS. Log in and navigate to the 'Started by me' branch under the 'Process Instances' node to view recently created business process instances.


In this article, we showed an example of how documents crawled and analyzed by ICA can be exported for further processing to external systems. We considered a scenario in which ICA monitors a website that is periodically updated with announcements. Once published, the announcements should be handled by a business process implemented in WPS. We developed a custom export plug-in for sending document data over JMS and a sample application to which the messages are delivered. The application then extracts the data carried in the message and creates a new WPS process instance that governs the announcement handling according to the business objectives of the organization.


Source code: dm-1104customexportplugincode.zip (34KB)


