IBM Content Analytics (ICA) can be used to collect (crawl) content from unstructured data sources such as web pages or text files and from structured relational database table records. Once gathered by ICA, the content is organized (indexed) into documents that can be searched and analyzed.
This article focuses on implementation of scenarios where ICA documents need to be processed in a way that is not directly supported by ICA, or where the documents will be input to an external system. A sample scenario is based on a scheduled crawler for a website with regularly updated announcement pages. Structured (e.g., date of the announcement) and unstructured (e.g., textual description of the announcement) content of the pages is gathered and indexed by ICA, one document per page. Once created, a document can be analyzed and used to start a WebSphere® Process Server (WPS) workflow to handle the incoming announcement according to a business process. We will describe how to implement this scenario using a custom ICA export plug-in, which is an event handler executed by ICA after a document is crawled and indexed. The article describes the initial configuration of ICA that has to be performed prior to using the plug-in.
We will integrate the ICA export plug-in with an external WebSphere Application Server/JEE application by sending ICA documents to the application via a Java™ Message Service (JMS) queue using Secure Sockets Layer (SSL) transport. We also provide a brief outline of how to set up a JMS queue in WebSphere Application Server and highlight the key points of configuring applications that send and receive JMS messages. Sample code is provided in the Downloads section.
In the first part of the article, we describe the export plug-in API specified by ICA. Next, we show how an ICA document, including structured data extracted via search facets, can be sent as a message over JMS, and we illustrate sample code of a WebSphere Application Server application for consuming the message and starting the business process.
The second part of the article is dedicated to middleware setup. The key export plug-in configuration steps describe how to deploy the plug-in to ICA. Also, this section covers how to create and configure a JMS queue in WebSphere Application Server.
Then the article explains how to run the export plug-in, how to observe the delivery of a document event to WebSphere Application Server, and how the event can trigger a follow-on action, such as starting a WPS workflow.
ICA export plug-in development
Background, requirements, and technologies
In order to use information from ICA in other applications, ICA provides capabilities to export documents from text analytics collections (described in "Smarter collaboration for the education industry using Lotus Connections, Part 4: Use IBM Content Analytics to crawl, analyze, and display unstructured data"). Export can be customized in a number of ways. A document can be exported to a file system as an XML file or to a relational database as a collection of records. It is also possible to use fully configurable destinations and formats, as defined by a custom export plug-in.
The following types of documents can be exported: crawled, analyzed, or searched. The first option means that a document is exported after ICA retrieves its contents from a data source and before the contents are analyzed and parsed. The second means that the document is exported after passing through the collection's processing pipeline, so the exported document is augmented with metadata based on facet annotations configured in the collection. The last enables the export of documents that result from a search by particular criteria over a collection.
In the scenario described here, the external application that consumes exported documents (e.g., announcement pages from a website) should satisfy the following requirements:
1. All document data, including content, metadata, and facets, must be available to the application.
2. The document should be exported in a way that is independent of the consuming application as well as of ICA. This means that the export plug-in should be resilient to any changes to the document content in ICA or in the client. The application should not depend on ICA libraries.
3. The export plug-in should send the document to the application using an asynchronous message (i.e., the plug-in thread should not wait for the application to process the document contents) so that ICA is decoupled from the application.
4. After the message has been sent, it must be guaranteed that the message will eventually be delivered to the external application.
5. The communication channel between the ICA plug-in and the application that consumes messages must be secured by the SSL protocol, which prevents tampering and eavesdropping.
Requirements 3 and 4 imply that JMS can be used as a communication layer between the export plug-in and the application that consumes documents. Since the use of JMS is beyond what is supported by ICA, we will describe how to develop a custom export plug-in.
We will develop an external Java EE application running under WebSphere Application Server V7.0. WebSphere Application Server implements a set of JEE specifications, including Enterprise JavaBeans (EJB) V3.0 (JSR 220) and the JMS API (JSR 914). The application will contain a message-driven enterprise bean (MDB) connected to a JMS queue defined within the same application server. Inside the MDB, a handler method will process an incoming message, take the document information (including the data extracted by facets), and start the WPS workflow using an EJB proxy.
The WPS workflow is packaged as a Service Component Architecture (SCA) application and runs on WebSphere Process Server V7.0. Details of the workflow implementation, SCA application packaging, and deployment are beyond the scope of this article. Reference 5 contains an overview of Business Process Choreographer (BPC) API functions, covers how to write a client application that uses the BPC API, and provides details on how to create input data as well as access output data.
The JMS queue will be configured to prevent unauthorized connections and to enable the use of SSL to secure the communication channel, which addresses requirement 5.
Integration architecture overview
The architecture is illustrated in Figure 1. The top part of the image depicts the server running an instance of ICA. When a new document appears in the index, it's exported using a custom plug-in, which is a stand-alone Java application launched by ICA in a separate process. The document's content and the metadata derived from annotations of the document are available to the plug-in. The plug-in securely connects to the JMS queue on the WPS server using the javax.jms.Queue implementation and sends a message containing both types of the document information (more on this in the "Export plug-in implementation" section).
The WPS server hosts and runs a JEE application with an MDB connected to the queue. When a message is delivered to the queue in WPS, the MDB's callback method is fired. In the callback, information is extracted from the message and used to start the WPS workflow, which is part of another SCA application running in the WPS server. The MDB communicates with the workflow through the EJB-based API of Business Process Choreographer. Because this API offers remote EJB interfaces, the SCA application and the MDB application can run on different physical servers, which simplifies scaling. However, remote interfaces introduce network overhead even when the applications share the same application server instance, in which case local interfaces could improve performance. This article assumes that the SCA and MDB applications reside within the same WebSphere Application Server instance.
Finally, in the workflow, a document is handled according to the workflow's business process.
Figure 1. Integration overview
The infrastructure that enables the connection between the export plug-in and the MDB and handles the messaging is provided by the WebSphere Service Integration (SI) Bus. The bus is a logical concept and can be thought of as a collection of bus members (application servers or clusters) that share a common messaging infrastructure. Within each bus member, there are a number of messaging engines that manage runtime resources such as queues and are capable of storing messages in a file, in memory, or in a database. Messaging engines enable clients to connect to JMS queues and other SI Bus destinations. The MDB responsible for consuming the messages sent by the export plug-in uses an activation specification (a set of connectivity parameters) to connect to a JMS queue.
SI Bus components and their relationships with the MDB application are illustrated in Figure 2.
Figure 2. Service Integration Bus components
The WebSphere Application Server configuration enforces the use of a secured transport channel for communication between clients and the bus and ensures that access to queue destinations is restricted to authorized users only.
Export plug-in implementation
An ICA export plug-in must inherit the com.ibm.es.oze.api.export.ExportDocumentPublisher class and implement its abstract method, publish(), which takes a single parameter: an instance of the com.ibm.es.oze.api.export.document.Document class representing a crawled and analyzed document.
To avoid unnecessary coupling of the external application to ICA, the document is converted into a generic, flat format before it is sent via a JMS message. All document fields (language, document type, source, etc.) and facet data are put into a map. Keys of the map are strings prefixed with "field." or "facet.", followed by the field name or the facet path. For hierarchical facets, which represent nested levels of classification, the facet path is a concatenation of path items separated by " > " ("word > num", for example). Values in the map are document field values or facet keywords. The resulting map is put into a message of type javax.jms.MapMessage and sent over JMS.
Although the Document class could have been serialized into a javax.jms.ObjectMessage and sent via JMS in a binary format, the use of a format-neutral message of string keys and values ensures that the consuming application does not have dependencies on the Document implementation in the ICA libraries.
Altogether, this approach allows the implementation to address requirements 1 and 2: availability of all data from the document in a message, as well as an application-neutral message format for communication between ICA and the client.
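As a rough illustration of the flat format, the following minimal sketch builds such a map from plain Java collections. It deliberately does not use the ICA Document class: the DocumentFlattener class, its method names, and the sample field and facet values are hypothetical, and it assumes a single keyword per facet path.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that mimics the flat key layout described above. */
final class DocumentFlattener {
    static final String FIELD_PREFIX = "field.";
    static final String FACET_PREFIX = "facet.";
    static final String PATH_SEPARATOR = " > ";

    /** Joins facet path items: ["word", "num"] becomes "facet.word > num". */
    static String facetKey(List<String> path) {
        return FACET_PREFIX + String.join(PATH_SEPARATOR, path);
    }

    /** Copies fields and facets into one map of prefixed string keys. */
    static Map<String, String> flatten(Map<String, String> fields,
                                       Map<List<String>, String> facets) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            flat.put(FIELD_PREFIX + e.getKey(), e.getValue());
        }
        for (Map.Entry<List<String>, String> e : facets.entrySet()) {
            flat.put(facetKey(e.getKey()), e.getValue());
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("language", "en");
        Map<List<String>, String> facets = new LinkedHashMap<>();
        facets.put(Arrays.asList("word", "num"), "42");
        // prints {field.language=en, facet.word > num=42}
        System.out.println(flatten(fields, facets));
    }
}
```

Because both keys and values are plain strings, each entry of the resulting map can be copied into a map message without any ICA type crossing the wire.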
Implementation of the publish() method of the export plug-in involves the following steps. First, there is a check for the document type: the export plug-in's publish() method is called even if a document is deleted from the index, but the scenario described in this article has no need to handle that situation. Second, there is a call to a utility routine to get a "flattened" representation of the document. Third, the flat representation is passed to the sendMessage() method of the com.ibm.tonkawa.cca.jms.MessageSender instance initialized in the plug-in's constructor.
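The control flow of these steps can be sketched without any ICA or JMS dependency. In the sketch below, ExportedDocument and Sender are hypothetical stand-ins for the ICA Document class and for MessageSender, and isDeleted() is an assumed way of detecting a deletion event; the real plug-in in the Downloads section may perform the check differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PublishFlowSketch {

    /** Hypothetical stand-in for the ICA Document class. */
    interface ExportedDocument {
        boolean isDeleted();            // assumed deletion check
        Map<String, String> flatten();  // stands in for the flattening utility
    }

    /** Hypothetical stand-in for MessageSender. */
    interface Sender {
        void sendMessage(Map<String, String> flatDocument);
    }

    private final Sender sender;

    PublishFlowSketch(Sender sender) {
        this.sender = sender;
    }

    /** Returns true if the document was handed to the sender. */
    boolean publish(ExportedDocument document) {
        // Step 1: deletions are not relevant to this scenario, so skip them.
        if (document.isDeleted()) {
            return false;
        }
        // Step 2: obtain the flat representation.
        Map<String, String> flat = document.flatten();
        // Step 3: pass it on for delivery.
        sender.sendMessage(flat);
        return true;
    }

    /** Small factory used only to keep the demo short. */
    static ExportedDocument document(final boolean deleted,
                                     final Map<String, String> data) {
        return new ExportedDocument() {
            public boolean isDeleted() { return deleted; }
            public Map<String, String> flatten() { return data; }
        };
    }

    public static void main(String[] args) {
        List<Map<String, String>> sent = new ArrayList<>();
        PublishFlowSketch plugin = new PublishFlowSketch(sent::add);
        Map<String, String> data = Collections.singletonMap("field.language", "en");
        plugin.publish(document(true, data));   // ignored
        plugin.publish(document(false, data));  // sent
        System.out.println("Messages sent: " + sent.size());
    }
}
```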
The class com.ibm.tonkawa.cca.jms.MessageSender is responsible for handling JMS-related tasks. It establishes a connection to the message queue and sends messages. Its constructor takes two arguments: initialization properties and a pre-configured logger instance. The properties object contains the following:
- JNDI provider URL in the form of corbaname:iiop:<hostname>:<port>.
- Fully qualified name of a class implementing the javax.naming.InitialContextFactory interface.
- Queue JNDI name.
- Connection factory JNDI name.
- Name of the user allowed to perform the JNDI lookup. For simplicity, the queue is configured in a way that this user is the only one allowed to put messages onto it.
- User password.
- URL of the Java Authentication and Authorization Service (JAAS) login configuration file. For details, see Reference 8.
- URL of the configuration file that contains Internet Inter-ORB Protocol (IIOP) authentication settings specific to thin clients of applications running under WebSphere Application Server.
- URL of SSL configuration file.
The last three parameters are used to set system properties, as shown in Listing 1.
Listing 1. Setting system properties
System.setProperty("java.security.auth.login.config", jaasLoginConfig);
System.setProperty("com.ibm.CORBA.ConfigURL", corbaConfigUrl);
System.setProperty("com.ibm.SSL.ConfigURL", sslConfigUrl);
Configuration of the logger passed to the constructor is beyond the scope of this article. At this point, it's worth mentioning that the logger might be set to write output into a separate file to simplify debugging.
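For orientation only, one possible way to direct a logger to its own file with the standard java.util.logging API is sketched below. The PluginLogging class, the logger name, and the log file location are assumptions made for this illustration; they are not taken from the sample code.

```java
import java.io.File;
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Hypothetical helper; the sample plug-in may configure its logger differently. */
final class PluginLogging {

    /** Returns a logger that appends plain-text records to the given file. */
    static Logger fileLogger(String loggerName, String logFilePath) throws IOException {
        Logger logger = Logger.getLogger(loggerName);
        FileHandler handler = new FileHandler(logFilePath, true); // true = append
        handler.setFormatter(new SimpleFormatter()); // text instead of the default XML
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        logger.setLevel(Level.FINE);
        // Keep plug-in records out of the parent (console) handlers.
        logger.setUseParentHandlers(false);
        return logger;
    }

    public static void main(String[] args) throws IOException {
        File logFile = File.createTempFile("export-plugin", ".log");
        Logger logger = fileLogger("sketch.export.plugin", logFile.getAbsolutePath());
        logger.info("Initializing message sender...");
        System.out.println("Log written to " + logFile);
    }
}
```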
The connection to the queue is obtained in the method initMessagingFacility(), according to the following sequence of steps:
- The javax.naming.InitialContext instance is created using the specified initial context factory.
- The javax.security.auth.login.LoginContext instance is created and authentication is performed.
- The javax.security.auth.Subject instance is created by the login module after authentication. The subject, which contains the CORBA credential, is used by the authorization service to further restrict access to JNDI resources.
- The subject is used to execute a callback on behalf of the user specified in the MessageSender initialization properties. In the callback, javax.jms.ConnectionFactory is looked up from the JNDI context and used to obtain the javax.jms.Connection instance. Also, the javax.jms.Queue object is looked up, put into a simple data-transfer object (DTO) together with the connection object, and returned from the callback.
The process is illustrated in Listing 2.
Listing 2. Obtaining JMS connection and queue references
private void initMessagingFacility()
    throws NamingException, LoginException
{
    Properties env = new Properties();
    env.put(Context.INITIAL_CONTEXT_FACTORY, initialContextFactory);
    env.put(Context.PROVIDER_URL, providerUrl);
    final InitialContext initialContext = new InitialContext(env);
    /*
     * Dummy call to establish security realms from JAAS. It should be done
     * before JAAS login.
     */
    initialContext.lookup(providerUrl);
    CallbackHandler loginCallbackHandler = new WSCallbackHandlerImpl(
        jaasUser, jaasPassword
    );
    LoginContext lc = new LoginContext("WSLogin", loginCallbackHandler);
    lc.login();
    Subject subject = lc.getSubject();
    // The lookups below run on behalf of the authenticated subject.
    PrivilegedAction<MessagingWrapper> postMessage =
        new PrivilegedAction<MessagingWrapper>()
    {
        public MessagingWrapper run()
        {
            try
            {
                Object lookedUp = initialContext.lookup(jmsConnectionFactoryName);
                ConnectionFactory connectionFactory =
                    (ConnectionFactory) PortableRemoteObject
                        .narrow(lookedUp, ConnectionFactory.class);
                lookedUp = initialContext.lookup(jmsQueueName);
                Queue queue = (Queue) PortableRemoteObject.narrow(lookedUp, Queue.class);
                Connection connection =
                    connectionFactory.createConnection(jaasUser, jaasPassword);
                return new MessagingWrapper(connection, queue);
            }
            catch (NamingException e)
            {
                logger.log(Level.SEVERE,
                    "Failed to lookup connection factory and queue", e);
                throw new MessagingException(
                    "Failed to lookup connection factory and queue", e);
            }
            catch (JMSException e)
            {
                logger.log(Level.SEVERE, "Failed to create JMS connection", e);
                throw new MessagingException("Failed to create JMS connection", e);
            }
        }
    };
    MessagingWrapper wrapper = (MessagingWrapper) WSSubject.doAs(subject, postMessage);
    connection = wrapper.getConnection();
    queue = wrapper.getQueue();
    if (logger.isLoggable(Level.INFO))
    {
        logger.log(Level.INFO, "Connection: " + connection);
        logger.log(Level.INFO, "Queue: " + queue);
    }
}
The method initMessagingFacility() is called lazily, only once, before the first JMS message is sent. A message is then sent according to the following process:
- The javax.jms.Session instance (session) is created using the queue connection.
- An object of the javax.jms.MessageProducer class is obtained using the session. The message producer is used to send messages to the destination. There are options to set the message delivery mode, priority, and time-to-live value. The delivery mode (non-persistent vs. persistent) specifies whether a JMS provider needs to take extra effort to ensure that a message is not lost in transit. The priority parameter is a hint to the JMS provider about message delivery order. Time-to-live specifies the message lifetime; messages that expire before they are delivered to the consumer are discarded.
- A map message instance is created using the session object. The map message is essentially a set of key-value pairs, which makes it suitable to carry a flattened document, as described in the previous section.
- The map message is populated with data and is sent via the message producer.
The base class of the export plug-in also defines a term() callback for the end of an export session. In this method, our implementation calls the cleanup() method of MessageSender, which closes and releases all JMS resources it holds.
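The lifecycle described above (initialize on first use, reuse afterward, release at the end of the export session) can be sketched independently of JMS. In this minimal sketch, Channel and ChannelFactory are hypothetical stand-ins for the JMS connection and for the work done in initMessagingFacility(); the actual MessageSender is available in the Downloads section.

```java
import java.util.Collections;
import java.util.Map;

final class LazySenderSketch {

    /** Hypothetical stand-in for the JMS connection, session, and producer. */
    interface Channel {
        void send(Map<String, String> message);
        void close();
    }

    /** Hypothetical stand-in for the JNDI lookup and JAAS login. */
    interface ChannelFactory {
        Channel open();
    }

    private final ChannelFactory factory;
    private Channel channel; // stays null until the first message is sent

    LazySenderSketch(ChannelFactory factory) {
        this.factory = factory;
    }

    /** Opens the channel on first use only, then sends the message. */
    synchronized void sendMessage(Map<String, String> flatDocument) {
        if (channel == null) {
            channel = factory.open();
        }
        channel.send(flatDocument);
    }

    /** Releases the channel; safe to call when nothing was ever opened. */
    synchronized void cleanup() {
        if (channel != null) {
            channel.close();
            channel = null;
        }
    }

    public static void main(String[] args) {
        LazySenderSketch sender = new LazySenderSketch(() -> {
            System.out.println("Opening channel");
            return new Channel() {
                public void send(Map<String, String> m) {
                    System.out.println("Sending " + m);
                }
                public void close() {
                    System.out.println("Closing channel");
                }
            };
        });
        sender.sendMessage(Collections.singletonMap("field.language", "en"));
        sender.sendMessage(Collections.singletonMap("field.language", "de"));
        sender.cleanup();
    }
}
```

Deferring initialization this way means that an export session in which no document is published never opens a connection.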
Client application implementation
The client application is an MDB packaged into an Enterprise Archive (EAR) and deployed to WebSphere Application Server. The single method implemented by the MDB is onMessage(), defined by the javax.jms.MessageListener interface.
In onMessage(), an incoming message is cast to MapMessage, and all of its entries are iterated to construct a hash map that holds all the document data carried in the message. This map is then passed as an argument to the start() method of a wrapper class, which encapsulates the use of the BPC API. For brevity, we omit the exact listing of the start() method; refer to the attached source code for details.
Since the goal of the article is to exemplify the consumption of a JMS message, we skip over the exact implementation of the workflow. Another example of message handling could be to construct JPA entities corresponding to the information from the exported document and store them in a database.
Listing 3 shows a skeleton implementation of the MDB.
Listing 3. Message handling
@MessageDriven(
    activationConfig = {
        @ActivationConfigProperty(
            propertyName = "destinationType", propertyValue = "javax.jms.Queue"
        )
    }
)
@RunAs(TonkawaRoles.TONKAWA)
public class RecommendationMessageHandler implements MessageListener
{
    /** Logger instance. */
    private static final Logger LOGGER = Logger
        .getLogger(RecommendationMessageHandler.class.getName());

    @PersistenceContext(unitName = "tonkawa-core")
    private EntityManager manager;

    @Resource
    private MessageDrivenContext context;

    public void onMessage(Message message)
    {
        if (LOGGER.isLoggable(Level.FINE))
        {
            LOGGER.log(Level.FINE, "Message received: " + message);
        }
        try
        {
            MapMessage mapMessage = (MapMessage) message;
            Enumeration<String> names = mapMessage.getMapNames();
            Map<String, String> documentData = new HashMap<String, String>();
            while (names.hasMoreElements())
            {
                String key = names.nextElement();
                documentData.put(key, mapMessage.getString(key));
            }
            SampleProcessWrapper wrapper = new SampleProcessWrapper();
            wrapper.start(documentData);
        }
        catch (Exception e)
        {
            context.setRollbackOnly();
            LOGGER.log(Level.SEVERE, "Exception occurred", e);
        }
    }
}
Note how all of the exceptions are caught, so that none of them propagates up the call stack. This prevents the messaging engine from attempting to redeliver the message that caused the failure. In practice, such a message is better handled in a more robust way (e.g., it might make sense to put it into a "failed" queue and give it special treatment in a separate MDB).
Also note how the WPS workflow is started in the last few lines of the try block. The wrapper class is a client that works with the workflow EJB API and encapsulates the tasks of looking up the Business Flow Manager in JNDI, searching for the process template, marshalling input data into SDO, and launching the workflow by calling its start() method, as described earlier.
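Once the document data is available as a map, a consumer can separate fields from facets by reversing the key convention used by the plug-in. The following is a minimal sketch of that step; the FlatDocumentReader class and its method names are hypothetical and are not part of the sample application.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper that reverses the "field." and "facet." key convention. */
final class FlatDocumentReader {
    private static final String FIELD_PREFIX = "field.";
    private static final String FACET_PREFIX = "facet.";
    private static final String PATH_SEPARATOR = " > ";

    /** Returns document fields keyed by field name, with the prefix removed. */
    static Map<String, String> fields(Map<String, String> documentData) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : documentData.entrySet()) {
            if (e.getKey().startsWith(FIELD_PREFIX)) {
                fields.put(e.getKey().substring(FIELD_PREFIX.length()), e.getValue());
            }
        }
        return fields;
    }

    /** Returns facet keywords keyed by facet path, e.g. [word, num]. */
    static Map<List<String>, String> facets(Map<String, String> documentData) {
        Map<List<String>, String> facets = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : documentData.entrySet()) {
            if (e.getKey().startsWith(FACET_PREFIX)) {
                String path = e.getKey().substring(FACET_PREFIX.length());
                String[] items = path.split(Pattern.quote(PATH_SEPARATOR));
                facets.put(Arrays.asList(items), e.getValue());
            }
        }
        return facets;
    }

    public static void main(String[] args) {
        Map<String, String> documentData = new LinkedHashMap<>();
        documentData.put("field.language", "en");
        documentData.put("facet.word > num", "42");
        System.out.println(fields(documentData)); // prints {language=en}
        System.out.println(facets(documentData)); // prints {[word, num]=42}
    }
}
```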
ICA export plug-in configuration
To get the export plug-in running, certain configuration steps must be performed.
The following environment variables are used:
- ICA_ROOT: The directory where ICA is installed (e.g., C:\ICA22\es).
- WAS_ROOT: The directory where WebSphere Application Server is installed (e.g., D:\Additional Programs\WID_WTE\runtimes\bi_v7).
In order to initialize MessageSender (see
"Export plug-in implementation"), a
number of properties should be provided. The implementation in this
article (see Downloads) reads them from a file structured as shown
in Listing 4.
Listing 4. Plug-in configuration properties file
# JNDI provider URL. It should be in form corbaname:iiop:<host>:<port>
# Make sure that host is reachable and add it to /etc/hosts if needed.
jms.jndi.provider.url=corbaname:iiop:bps:2810
# InitialContextFactory class name. Don't modify this line.
jms.jndi.initialContextFactory=com.ibm.websphere.naming.WsnInitialContextFactory
# JNDI name of the queue to send message to
jms.jndi.queueName=jms/TonkawaQueue
# JNDI name of the connection factory used to obtain connection to queue
jms.jndi.connectionFactoryName=jms/TonkawaConnectionFactory
# Name of the user allowed to perform JNDI lookup. Currently the
# assumption is that the same user is allowed to put messages onto the queue
jms.jaas.user=wasadmin
jms.jaas.password={xor}Lz4sLChvLTs=
java.security.auth.login.config=file:\
///C:/Program Files/IBM/es/esadmin/plugin_config/signaller/wsjaas_client.conf
com.ibm.CORBA.ConfigURL=file:\
///C:/Program Files/IBM/es/esadmin/plugin_config/signaller/sas.client.props
com.ibm.SSL.ConfigURL=file:\
///C:/Program Files/IBM/es/esadmin/plugin_config/signaller/ssl.client.props
The last three lines are explained in the next section.
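A file in this format can be read with the standard java.util.Properties class, which also joins values continued with a trailing backslash. The sketch below is one possible way to load the file and copy the last three entries into system properties; the PluginConfigLoader class is hypothetical, the sample paths are shortened placeholders, and the actual plug-in code may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical loader for a plug-in configuration file such as Listing 4. */
final class PluginConfigLoader {

    /** Keys whose values are handed over to the JVM as system properties. */
    static final String[] SYSTEM_KEYS = {
        "java.security.auth.login.config",
        "com.ibm.CORBA.ConfigURL",
        "com.ibm.SSL.ConfigURL"
    };

    static Properties load(Reader source) throws IOException {
        Properties config = new Properties();
        config.load(source); // handles '#' comments and '\' line continuations
        return config;
    }

    /** Copies the three configuration URLs into system properties. */
    static void exportSystemProperties(Properties config) {
        for (String key : SYSTEM_KEYS) {
            String value = config.getProperty(key);
            if (value == null) {
                throw new IllegalArgumentException("Missing property: " + key);
            }
            System.setProperty(key, value);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths; a real file would be opened with a FileReader.
        String text =
              "jms.jndi.queueName=jms/TonkawaQueue\n"
            + "java.security.auth.login.config=file:///C:/cfg/wsjaas_client.conf\n"
            + "com.ibm.CORBA.ConfigURL=file:///C:/cfg/sas.client.props\n"
            + "com.ibm.SSL.ConfigURL=file:\\\n"
            + "///C:/cfg/ssl.client.props\n";
        Properties config = load(new StringReader(text));
        exportSystemProperties(config);
        // prints file:///C:/cfg/ssl.client.props
        System.out.println(System.getProperty("com.ibm.SSL.ConfigURL"));
    }
}
```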
To enable the export plug-in to use secure communication with the JMS server, the following files must be provided to the plug-in:
- wsjaas_client.conf
- sas.client.props
- ssl.client.props
These files specify the security configuration to use in RMI-IIOP communication with the server. They should be downloaded to the ICA server from the %WAS_ROOT%\profiles\<profile>\properties directory of the WebSphere Application Server or WPS server hosting the queue.
The files should be saved to a known location, and the plug-in configuration properties file should be modified to reference the file location (see the example in the previous section).
Important: Make sure paths are specified in URL format
(i.e., prefixed with file:///).
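Since a plain Windows path in one of these properties is an easy mistake to make, a small check can catch it before the plug-in is started. The following sketch is purely illustrative: the ConfigUrlCheck class is hypothetical, the sample paths are placeholders, and the conversion assumes an absolute path that starts with a drive letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the URL-valued configuration properties. */
final class ConfigUrlCheck {
    private static final String FILE_URL_PREFIX = "file:///";

    /** Converts an absolute Windows path such as C:\cfg\x.props to URL format. */
    static String toFileUrl(String windowsPath) {
        return FILE_URL_PREFIX + windowsPath.replace('\\', '/');
    }

    /** Returns the keys whose values are missing or not in file:/// form. */
    static List<String> keysNotInUrlFormat(Properties config, String... keys) {
        List<String> invalid = new ArrayList<>();
        for (String key : keys) {
            String value = config.getProperty(key);
            if (value == null || !value.startsWith(FILE_URL_PREFIX)) {
                invalid.add(key);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("com.ibm.SSL.ConfigURL", "C:\\cfg\\ssl.client.props");
        config.setProperty("com.ibm.CORBA.ConfigURL",
            toFileUrl("C:\\cfg\\sas.client.props"));
        // prints [com.ibm.SSL.ConfigURL]
        System.out.println(keysNotInUrlFormat(config,
            "com.ibm.SSL.ConfigURL", "com.ibm.CORBA.ConfigURL"));
    }
}
```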
After that, the trust store and key store should be downloaded to the ICA server from the server where the JMS provider is running. The location of these files can be looked up in properties/ssl.client.props in the server profile directory, as shown in Listing 5.
Listing 5. Trust and key store location in ssl.client.props
...
user.root=C:/WDPE_WTE/runtimes/bi_v7/profiles/ProcSrv01
...
com.ibm.ssl.trustStore=${user.root}/etc/trust.p12
com.ibm.ssl.keyStore=${user.root}/etc/key.p12
After downloading the keys, the ssl.client.props file located on the ICA server should be modified to reflect the new location of the keys. Namely, the following properties must be adjusted:
- user.root
- com.ibm.ssl.trustStore
- com.ibm.ssl.keyStore
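Because the store locations in Listing 5 refer to ${user.root}, changing user.root alone moves both stores when they are copied with the same relative layout. The sketch below only previews the effective paths after such an edit by expanding ${...} references against the same properties object. The SslPropsPreview class and the sample directory are hypothetical, and the sketch does not reproduce how WebSphere itself reads the file.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that expands ${name} references for inspection only. */
final class SslPropsPreview {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Returns the value of key with ${name} replaced by the value of name. */
    static String resolve(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            return null;
        }
        Matcher matcher = VARIABLE.matcher(value);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            // Unknown variables are left untouched.
            String replacement = props.getProperty(matcher.group(1), matcher.group(0));
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user.root", "C:/plugin_config/signaller"); // placeholder
        props.setProperty("com.ibm.ssl.trustStore", "${user.root}/etc/trust.p12");
        props.setProperty("com.ibm.ssl.keyStore", "${user.root}/etc/key.p12");
        // prints C:/plugin_config/signaller/etc/trust.p12
        System.out.println(resolve(props, "com.ibm.ssl.trustStore"));
        // prints C:/plugin_config/signaller/etc/key.p12
        System.out.println(resolve(props, "com.ibm.ssl.keyStore"));
    }
}
```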
Since the plug-in depends on JMS classes and performs JNDI lookups of objects from the WebSphere naming context, the class dependencies must be loaded by the JVM. Due to some peculiarities of class loading, the requirement is even stronger: these classes must be in the classpath of the Java process, so a custom URLClassLoader implementation will not suffice. To configure the plug-in process classpath, perform the following actions:
- Download the following files from the %WAS_ROOT%\runtimes directory of the server you want to communicate with:
  - com.ibm.ws.ejb.thinclient_7.0.0.jar
  - com.ibm.ws.sib.client.thin.jms_7.0.0.jar
- Put these files in the %ICA_ROOT%\lib directory.
- In the %ICA_ROOT%\configurations\interfaces\indexservice__interface.ini file, locate the line that starts with "classpath" and append the following text to it:
,com.ibm.ws.ejb.thinclient_7.0.0.jar,com.ibm.ws.sib.client.thin.jms_7.0.0.jar
The final step is the export plug-in deployment. This can be achieved using the ICA admin console Web UI as follows:
- In the System Administration web application, navigate to 'Collections' and click 'Edit' for the collection for which you need the export plug-in running.
- When the page is loaded, navigate to the 'Export' tab and select 'Configure options to export crawled or analyzed documents'.
- In the 'Options for exporting analyzed documents' section, select the 'Export documents by using a custom plug-in' radio button.
- Specify the JAR file with the export plug-in in the 'Export plug-in class path' field.
- Specify the plug-in class name in the Publisher's 'Class name' field (e.g., com.ibm.tonkawa.cca.plugin.SignallerPlugin).
- Save your changes.
- In the 'System Administration' web application, navigate to 'Collections' and click 'Parse and Index' for the desired collection. On the 'Parse and Index' tab on the next page, launch the indexer process.
WebSphere Application Server configuration
The following is a description of how to create and configure the SI Bus in WebSphere Application Server for ICA:
1. Create a JAAS alias that stores the user ID and password used to authenticate the creation of a new connection to the JMS provider. To simplify the configuration, we use wasadmin credentials for this alias.
2. Create an SI Bus. For simplicity, the JAAS alias created in the previous step is used to authorize communication between messaging engines of the bus.
3. Add the WebSphere Application Server instance to the SI Bus members.
4. Create an SI Bus queue destination. Optionally, specify a reliability setting for message delivery. In our case, the ASSURED_PERSISTENT value was chosen to guarantee that the messages are not discarded.
5. Create a connection factory connected to the SI Bus.
6. Create a queue.
7. Create an activation specification for the MDB application. The destination JNDI name should be set to the JNDI name of the queue created in step 6, and the JAAS alias should be set to the value from step 1.
While it's possible to perform all these steps from the WebSphere Application Server Admin Console, you can streamline deployment with a sample Jython script for the wsadmin tool, which lets you perform administrative operations using a scripting language and so helps automate WebSphere Application Server configuration. Besides steps 1-7, the script also includes commands to install and start the MDB application. The script is provided in the Downloads section below.
Before running the script, you need to edit the following lines in its header, as shown in Listing 6.
Listing 6. Environment-specific parameters in deployment script
# WebSphere administrator credentials
userName = "wasadmin"
password = "123456"
# This is a server to be added as a bus member.
server = "server1"
fqdn = "myhost.com"
# EAR installation parameters.
appPath = "C:/sample.ear"
deployToServers = [ "webserver1", "server1" ]
To run the script, launch the wsadmin tool as described in the "Starting the wsadmin scripting client using wsadmin scripting" section of the WebSphere Application Server V7 InfoCenter.
After that, the wsadmin shell is launched, where Jython or Jacl commands can be
executed. To run the script, type the execfile() command as shown below:
wsadmin> execfile("<path>\install.py")
As the script runs, it provides a comprehensive diagnostic output to track the configuration progress.
Running plug-in and observing results
At this point, it should be possible to launch all the application components and observe the results.
Assuming that SCA and MDB applications were already deployed and started,
ensure that the ICA crawler and indexer for a particular document
collection are running. This can be done with the System Administration web
application of ICA by navigating to
'Collections' and choosing the proper one. Tabs
'Crawl' and
'Parse and Index' have controls to launch
the respective service. See the top part of Figure 3 for an illustration.
The export plug-in is started simultaneously with the indexer, which can be observed in the plug-in log file if the plug-in logger was configured as described in the "Export plug-in implementation" section, as shown in Listing 7.
Listing 7. Export plug-in startup lines in a log
Initializing message sender...
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: Provider URL: corbaname:iiop:bps.renovations.com:2810
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: Initial context factory: com.ibm.websphere.naming.WsnInitialContextFactory
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: Queue name: jms/TonkawaQueue
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: Connection factory name: jms/TonkawaConnectionFactory
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: JAAS user: wasadmin
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: JAAS login config: file:///C:/CCA_plugin/signaller/wsjaas_client.conf
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: CORBA config: file:///C:/CCA_plugin/signaller/sas.client.props
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.jms.MessageSender assertProperties
INFO: SSL config: file:///C:/CCA_plugin/signaller/ssl.client.props
Mar 3, 2011 5:56:24 AM com.ibm.tonkawa.cca.plugin.PluginLogger flush
INFO: Message sender initialized...
The next step is to make a sample announcement document available to the crawler. If using a web crawler, this could be done by publishing a web page on a local site running on IBM HTTP Server. Alternatively, a file system crawler can find the document if it is copied to a directory monitored by the crawler. Next, navigate to the crawler details page and click 'Start full recrawl', as shown at the bottom of Figure 3.
Figure 3. Starting full re-crawl
Once the crawling is complete and a new document appears in the index, the document is exported and sent via JMS to the consuming application. This can be confirmed in the plug-in log, as shown below (note that the flattened document content is skipped for brevity).
Listing 8. Notification about sent message in the log
Mar 3, 2011 8:20:21 AM com.ibm.tonkawa.cca.plugin.SignallerPlugin sendMessage
FINE: Flattened doc: {<content skipped>}
Mar 3, 2011 8:20:23 AM com.ibm.tonkawa.cca.jms.MessageSender sendMessage
INFO: Sending message...
It may take a few seconds until the message is delivered, processed and a new business process is started. The start event can be monitored with Business Process Choreographer Explorer Web application installed on WPS. Log in and navigate to the 'Started by me' branch under the 'Process Instances' node to view recently created business process instances.
In this article, we showed an example of how documents crawled and analyzed by ICA can be exported for further processing to external systems. We considered a scenario when ICA monitors a website that is periodically updated with announcements. Once published, the announcements should be handled by a business process implemented in WPS. We developed a custom export plug-in for sending document data over JMS and a sample application where messages are delivered. The application then extracts data carried in the message and creates a new WPS process instance that governs the announcement handling according to the business objectives of the organization.
| Description | Name | Size | Download method |
|---|---|---|---|
| Source code | dm-1104customexportplugincode.zip | 34KB | HTTP |
Learn
1. Check out IBM Content Analytics to learn the basic concepts of ICA, such as documents, collections, and processing pipeline.
2. Read "Smarter collaboration for the education industry using Lotus Connections, Part 4: Use IBM Content Analytics to crawl, analyze, and display unstructured data" to understand how to create and configure document collections and crawlers.
3. Visit the WebSphere Application Server Version 7.0 Information Center for extensive information on how to configure Service Integration Bus in WebSphere Application Server.
4. Peruse the WebSphere Process Server Information Center to understand how SCA applications are deployed.
5. Learn how to use WPS EJB API in "Business Process Management Samples: EJB API — Overview."
6. JSR 220: Enterprise JavaBeans 3.0 provides details about the message-driven beans component contract.
7. JSR 914: Java Message Service (JMS) API defines how messaging services can be used in Java EE applications.
8. The JAAS Login Configuration File reference describes various configuration options to set up custom login modules.
9. Understand the basics of Service Integration Bus security configuration by reading "Securing JMS connections to WebSphere Enterprise Service Bus V6.1 or V6.2."
10. Thorough instructions on how to configure different aspects of application security can be found in the IBM Redbooks® titled "IBM WebSphere Application Server V6.1 Security Handbook."
11. "Deploying message-driven beans and JMS applications into the Service Integration Bus" provides step-by-step guidance on deployment of MDB into Service Integration Bus running under WebSphere Application Server.

Vladislav Ponomarev is a Web technology solutions team leader at IBM Russian Software and Technology Lab. His major areas of expertise include Java EE applications, virtualization, and cloud computing.

Carl Osipov is an experienced solution architect with the Strategy and Technology organization in IBM Software Group. His skills are in the area of distributed computing, speech application development, and computational natural language understanding. He has published and presented on service-oriented architecture, conversational dialog management, and social software to peers in the industry and academia. His current focus is on sustainable global economic development through improved effectiveness and intellectual property output of interdisciplinary, worldwide scientific research collaboration.




