Build an RSS aggregator using IBM WebSphere DataPower SOA Appliances multistep services

Explore enhancements to multistep

developerWorks

Level: Advanced

Srinivasan Muralidharan (muralisr@us.ibm.com), Advisory Engineer, IBM
David Z. Maze (dmaze@us.ibm.com), Senior Engineer, IBM

08 May 2008

The IBM® WebSphere® DataPower® SOA Appliances multistep processing policy system is a key part of appliance configuration. Version 3.6.1 of the firmware includes a number of enhancements to multistep that provide functionality familiar to programmers, including loops of actions, conditional execution of actions, and the ability to execute actions in parallel. Explore how you can combine the new features in multistep 3 to build an RSS feed aggregator.

Overview of the RSS feed aggregator service (RFAS)

RFAS uses multistep 3 features to aggregate news feeds and, optionally, filter the output using given search criteria. The main features of the service are:

  • RSS sources that can be dynamically provided as input.
  • Search criteria that can be provided as input.
  • Feeds that are obtained in parallel.

This service illustrates the following features of the programming model introduced by multistep 3:

  • Conditional action
  • For-each action
  • Event-sink action
  • Parallel execution using asynchronous actions
  • Results action fanning out to multiple URLs asynchronously

The XML firewall service consists of four rules, as shown in Figure 1.


Figure 1. Rules in the RFAS

PipeURLsinRequestRule is the main rule. Input of the form shown in Listing 1 is sent to it using the HTTP POST method.


Listing 1. URLs sent to the service
                
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" 
               xmlns:tns= "http://developerworks.ibm.com/ /ms3demo">
    <soap:Body>
       <tns:keyword>some search text</tns:keyword>
       <tns:sheets>
             <tns:url>some URL</tns:url>
             <tns:url>some URL</tns:url>
       </tns:sheets>
    </soap:Body>
</soap:Envelope>
			

Both the keyword and url elements are optional. If url elements are provided, PipesRunDynamicURLs is executed. Otherwise, the service executes the PipeFixedFeeds rule to get RSS feeds from two fixed sources. If keyword is provided, the ProcessSearchParam rule is executed to filter items retrieved from the news feed. In both cases, the choice of which rule to execute is determined by conditional actions executing XPath expressions to look for url elements and keyword elements, respectively.
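To make the two routing decisions concrete, here is a minimal Python model (not DataPower configuration) of what the two conditional actions decide for a given request. It emulates the local-name() tests by stripping namespace prefixes, so the namespace and feed URL in the sample envelope are hypothetical placeholders rather than the ones from Listing 1. Only the rule and action names come from this article.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop ElementTree's '{namespace}' prefix, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def select(root, *path):
    """Evaluate /*[local-name()=path[0]]/*[local-name()=path[1]]/... ."""
    nodes = [root] if local_name(root.tag) == path[0] else []
    for name in path[1:]:
        nodes = [child for node in nodes for child in node
                 if local_name(child.tag) == name]
    return nodes


def route(envelope_xml):
    """Return, in order, what the two conditional actions would run."""
    root = ET.fromstring(envelope_xml)
    steps = []
    # PipeURLsinRequestRule_conditional_1: at least one sheets/url element?
    if select(root, "Envelope", "Body", "sheets", "url"):
        steps.append("PipesRunDynamicURLs")
    else:
        steps.append("PipeFixedFeeds")
    # PipeURLsinRequestRule_conditional_0: a keyword whose string value != ''?
    keywords = select(root, "Envelope", "Body", "keyword")
    if any("".join(k.itertext()) != "" for k in keywords):
        steps.append("ProcessSearchParam")
    else:
        # Catch-all /* branch: the transform that only adds a header.
        steps.append("PipeURLsinRequestRule_conditional_0_refaction_xform_0")
    return steps


SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:tns="urn:example:ms3demo">
  <soap:Body>
    <tns:keyword>DataPower</tns:keyword>
    <tns:sheets>
      <tns:url>http://feeds.example.com/a.xml</tns:url>
    </tns:sheets>
  </soap:Body>
</soap:Envelope>"""

print(route(SAMPLE))  # ['PipesRunDynamicURLs', 'ProcessSearchParam']
```

Matching on local names mirrors the article's XPath style, which keeps the conditions independent of the namespace prefixes a client happens to use.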

PipesRunDynamicURLs uses a for-each action to collect all the URLs into a context variable. Then a results action uses this variable to fetch the feeds from those URLs in parallel.

PipeFixedFeeds contains two results actions, each asynchronously fetching a single RSS feed. An event-sink action waits for the two results actions to complete.

The rest of this article details the implementation. In particular, you'll learn about some interesting aspects of multistep 3 that might not be obvious at a quick glance.





Executing the main PipeURLsinRequestRule rule

Conditional actions and call rules
A conditional action can execute any action, including another conditional action. Executing a Call-Rule from a conditional action is a powerful divide-and-conquer approach to programming a service. When a Call-Rule completes, execution proceeds to the next action in the calling rule. The execution contexts in the called rule are available in the main rule after the invocation.

We will follow the implementation in the most natural way – tracing the flow of input as it progresses through the service.

Look for the url element in input -- PipeURLsinRequestRule_conditional_1

This conditional action executes one of two actions based on the evaluation of the XPath expression /*[local-name()='Envelope']/*[local-name()='Body']/*[local-name()='sheets']/*[local-name()='url' and count(.) > 0]. If that XPath expression matches, then the SOAP body element contains at least one URL reference, and a call action invokes the rule PipesRunDynamicURLs. Otherwise, a call action invokes PipeFixedFeeds.

Aggregate the results from the feeds -- PipeURLsinRequestRule_xform_1

The outputs from all the RSS feeds (either from the rule PipesRunDynamicURLs or PipeFixedFeeds) are stored in contexts pipesout_1, pipesout_2, and so on. The context variable var://context/loop/count contains the total number of these contexts. The PipeURLsinRequestRule_xform_1 transform executes consol.xsl, which loops over the pipesout_x contexts and aggregates their contents into the context mergedout.
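The aggregation step can be sketched in Python as follows, assuming contexts are modeled as a dictionary of serialized XML strings and that each feed is standard RSS with <item> elements. The merged <rss><channel> wrapper is an assumption; the article does not show the exact output of consol.xsl.

```python
import xml.etree.ElementTree as ET


def consolidate(contexts):
    """Merge the <item> elements of pipesout_1 .. pipesout_N into 'mergedout'.

    N comes from 'var://context/loop/count'. The merged <rss><channel> shape
    is an assumption; consol.xsl's exact output format is not shown.
    """
    count = int(contexts["var://context/loop/count"])
    merged = ET.Element("rss")
    channel = ET.SubElement(merged, "channel")
    for i in range(1, count + 1):
        feed = ET.fromstring(contexts[f"pipesout_{i}"])
        for item in feed.iter("item"):
            channel.append(item)
    contexts["mergedout"] = ET.tostring(merged, encoding="unicode")
    return contexts["mergedout"]


ctx = {
    "var://context/loop/count": "2",
    "pipesout_1": "<rss><channel><item><title>A</title></item></channel></rss>",
    "pipesout_2": "<rss><channel><item><title>B</title></item></channel></rss>",
}
print(consolidate(ctx))
# <rss><channel><item><title>A</title></item><item><title>B</title></item></channel></rss>
```

Because the loop is driven only by the count and the numbered names, the same step works no matter which rule produced the contexts.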

Get the search condition -- PipeURLsinRequestRule_filter_1

If a keyword was provided in the input, this filter action sets up a variable to hold the XPath expression with which to search over the aggregated output.
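Because the keyword is user input, the filter action has to turn it into a valid XPath 1.0 string literal before splicing it into an expression. XPath 1.0 literals have no escape character, so a keyword containing both quote characters must be split with concat(). The sketch below assumes a hypothetical expression shape that selects items whose title contains the keyword (the article does not show the actual expression); in the service, the resulting string is what ends up in var://context/search/var.

```python
def xpath_literal(value):
    """Quote value as an XPath 1.0 string literal (XPath 1.0 has no escapes)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: rebuild the string around each apostrophe.
    pieces = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def build_search_xpath(keyword):
    """Hypothetical expression: items whose title contains the keyword."""
    return ("//*[local-name()='item']"
            f"[contains(*[local-name()='title'], {xpath_literal(keyword)})]")


search_vars = {"var://context/search/var": build_search_xpath("DataPower")}
print(search_vars["var://context/search/var"])
# //*[local-name()='item'][contains(*[local-name()='title'], 'DataPower')]
```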

Variables across rules
This is an example of where a variable can be set in one rule but used in another. The XPath expression is used in the ProcessSearchParam rule.

To search or not to search -- PipeURLsinRequestRule_conditional_0

This conditional action evaluates the XPath expression /*[local-name()='Envelope']/*[local-name()='Body']/*[local-name()='keyword' and . != ''] to determine if the aggregated results need to be further filtered using the keyword provided in the input. If a keyword was provided, the rule ProcessSearchParam is executed. Otherwise, the catch-all XPath expression /* evaluates to true and executes the transform PipeURLsinRequestRule_conditional_0_refaction_xform_0, which merely adds a header to the aggregated output.





Executing the PipesRunDynamicURLs rule to process RSS feed URLs in the input

Scope of for-each iterator variables
The for-each iterator variables are only valid in actions run from within a for-each action; hence, geturls.xsl stores the loop count in a global variable for later use. If a for-each action executed a call action, the iterator variables would be visible to all actions within the called rule. If loop actions are nested, the iterator variables return the state for the innermost loop.

This rule illustrates how multistep 3 allows dynamic asynchronous fetching of an arbitrary number of URLs. While this example only uses HTTP as a wire protocol, this functionality can be extended to any other protocol supported by the appliance firmware.

Collect URLs from input -- PipesRunDynamicURLs_for-each_0

This for-each action executes the transform PipesRunDynamicURLs_for-each_0_refaction_xform, which executes geturls.xsl to gather the url elements from input. The action loops over the node set selected by the XPath expression /*[local-name()='Envelope']/*[local-name()='Body']/*[local-name()='sheets']/*[local-name()='url']. Two loop variables are associated with each for-each iteration. The var://service/multistep/loop-iterator variable contains the current node from that node set; var://service/multistep/loop-count contains the current iteration count, starting at 1. Note the use of the two iterator variables in geturls.xsl.


Figure 2. For-each action

PipesRunDynamicURLs_for-each_0_refaction_xform uses the same context (tmpout) for input and output. When run in a loop by the for-each action, this provides an easy means to collect all data into a single context.
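The for-each mechanics (two iterator variables that exist only inside the loop, and one context used for both input and output) can be modeled in a few lines of Python. The geturls_step function is a hypothetical stand-in for geturls.xsl, and storing the count in var://context/loop/count is an assumption based on the aggregation step, which reads the total from that variable.

```python
LOOP_ITERATOR = "var://service/multistep/loop-iterator"
LOOP_COUNT = "var://service/multistep/loop-count"


def for_each(nodes, action, context, variables):
    """Run `action` once per node; the same context is both input and output."""
    for count, node in enumerate(nodes, start=1):
        variables[LOOP_ITERATOR] = node   # current node of the node set
        variables[LOOP_COUNT] = count     # 1-based iteration count
        context = action(context, variables)
    # The iterator variables are only valid inside the loop.
    variables.pop(LOOP_ITERATOR, None)
    variables.pop(LOOP_COUNT, None)
    return context


def geturls_step(tmpout, variables):
    """Hypothetical stand-in for geturls.xsl."""
    tmpout.append(variables[LOOP_ITERATOR])
    # Copy the count somewhere that outlives the loop (assumed variable name).
    variables["var://context/loop/count"] = variables[LOOP_COUNT]
    return tmpout


variables = {}
tmpout = for_each(["http://feed-a.example/rss", "http://feed-b.example/rss"],
                  geturls_step, [], variables)
print(tmpout)     # ['http://feed-a.example/rss', 'http://feed-b.example/rss']
print(variables)  # {'var://context/loop/count': 2}
```

Threading one context through every iteration is what lets a single output accumulate all URLs without any extra aggregation step.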

Prepare URLs for execution -- PipesRunDynamicURLs_xform_12

The PipesRunDynamicURLs_xform_12 transform prepares the URLs for execution in a results action. It wraps the url elements in a results element, resulting in the structure shown in Listing 2.


Listing 2. URLs for results action
                
<results>
<url>url_1</url>
<url>url_2</url>
….
…
</results>

PipesRunDynamicURLs_xform_12 completes processing by assigning the structure in Listing 2 to var://context/urlvar/urls.
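Building the Listing 2 structure with an XML library rather than by string concatenation also takes care of escaping: a query-string & in a feed URL must become &amp; inside <url>. A minimal Python sketch of that transform step, with the context variable modeled as a dictionary key and the feed URLs as hypothetical placeholders:

```python
import xml.etree.ElementTree as ET


def build_results(urls):
    """Wrap each URL in <url> inside a single <results> element (Listing 2)."""
    results = ET.Element("results")
    for url in urls:
        ET.SubElement(results, "url").text = url  # text is escaped on output
    return ET.tostring(results, encoding="unicode")


context_vars = {}
context_vars["var://context/urlvar/urls"] = build_results(
    ["http://feed-a.example/rss", "http://feed-b.example/rss?a=1&b=2"])
print(context_vars["var://context/urlvar/urls"])
# <results><url>http://feed-a.example/rss</url><url>http://feed-b.example/rss?a=1&amp;b=2</url></results>
```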

Listing 2 is an example of the general results form, which is defined by an XSD schema and illustrated in Listing 3. This general form can also be used to override attributes for individual URLs.


Listing 3. The general <results> form
                
<results mode="require-all" transactional="true" retry-interval="100"
                 asynchronous="true" multiple-outputs="true" >
        <url input="'var://context/someinputcontext'" retry-count="2"
                 asynchronous="true" >url_1</url>
           ….
</results>

The attributes input, retry-count, and asynchronous on a url element override the common value in the results action that applies to all the URLs. The attributes on the results element override corresponding attributes in the results action.
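The override rule reads as a three-level lookup: a url attribute wins, then a results attribute, then the results action's own configuration. A Python ChainMap models that precedence directly. The action-level settings below are hypothetical values, and the second, attribute-free url element is a hypothetical addition to Listing 3 that shows inheritance.

```python
import xml.etree.ElementTree as ET
from collections import ChainMap

# Listing 3, plus a hypothetical second <url> with no attributes of its own.
RESULTS_XML = """<results mode="require-all" transactional="true" retry-interval="100"
         asynchronous="true" multiple-outputs="true">
  <url input="'var://context/someinputcontext'" retry-count="2"
       asynchronous="true">url_1</url>
  <url>url_2</url>
</results>"""

# Hypothetical values standing in for the results action's own settings.
ACTION_SETTINGS = {"input": "INPUT", "retry-count": "0",
                   "retry-interval": "1000", "asynchronous": "false"}


def effective_settings(results_xml, action_settings):
    """Per-URL settings: <url> attributes, then <results>, then the action."""
    root = ET.fromstring(results_xml)
    return {url.text: dict(ChainMap(url.attrib, root.attrib, action_settings))
            for url in root.findall("url")}


settings = effective_settings(RESULTS_XML, ACTION_SETTINGS)
print(settings["url_1"]["retry-count"], settings["url_2"]["retry-count"])  # 2 0
```

ChainMap looks keys up in its maps from first to last, so the order of its arguments is exactly the precedence the article describes.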

Execute the URLs -- PipesRunDynamicURLs_results_4

Asynchronous results action: a mechanism for fan out over arbitrary URLs
The results action used this way is very efficient: data is posted to the URLs in parallel, maximizing the throughput of the batch execution. This approach of collecting URLs and then executing them all at once can serve as a pattern for implementing fan-outs of different kinds. Other options, such as the Multi-Way Results Mode, provide ways to implement such flexible patterns.

The results action has been revamped with many new features. This article covers two of them: the Multi-Way Results Mode and Use Multiple Outputs options. The results action has always supported passing a node set containing a list of URLs in a context variable, but prior to version 3.6.1 this could only send the same data to multiple destinations serially.

In version 3.6.1, an XML syntax adds options to send different data to each target, and the network transactions are executed in parallel. The Multi-Way Results Mode option controls how failures are handled. The default, Require All, sends data to all targets in parallel and fails if any one of those targets is unreachable. Use Multiple Outputs is a toggle switch that, when on, creates a separate output context for each target URL with the provided name suffixed with a distinct number, starting at 1 for the first URL.
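As a rough model of Multi-Way Results Mode with Use Multiple Outputs turned on, the sketch below fans out over a URL list with a thread pool, fails the whole step if any target fails (Require All), and names each output with the configured name plus a number starting at 1. The fetch function is a stub, the pipesout_N naming with an underscore is an assumption chosen to match the context names used in this article, and the lenient require_all=False mode is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_way_results(urls, fetch, output_name, require_all=True):
    """Fetch all URLs in parallel and name the outputs output_name_1, _2, ...

    require_all=True mirrors the Require All default: any failed target
    fails the whole step. require_all=False is a hypothetical lenient mode.
    """
    with ThreadPoolExecutor(max_workers=max(len(urls), 1)) as pool:
        futures = [pool.submit(fetch, url) for url in urls]
    # Leaving the with-block waits for every transfer to finish.
    outputs, failures = {}, []
    for i, (url, future) in enumerate(zip(urls, futures), start=1):
        try:
            outputs[f"{output_name}_{i}"] = future.result()
        except Exception as exc:
            failures.append((url, exc))
    if require_all and failures:
        raise RuntimeError(f"require-all: {len(failures)} target(s) failed")
    return outputs


FAKE_FEEDS = {
    "http://feed-a.example/rss": "<rss>a</rss>",
    "http://feed-b.example/rss": "<rss>b</rss>",
}


def fake_fetch(url):
    return FAKE_FEEDS[url]  # a KeyError stands in for an unreachable target


outputs = multi_way_results(list(FAKE_FEEDS), fake_fetch, "pipesout")
print(outputs)  # {'pipesout_1': '<rss>a</rss>', 'pipesout_2': '<rss>b</rss>'}
```

The number of entries in the returned dictionary is the value the aggregation step expects to find in var://context/loop/count.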

Use common output context names: a neat way to help with modular development
The action that aggregates results from RSS feeds (see Aggregate the results from the feeds -- PipeURLsinRequestRule_xform_1) does not know whether the feeds were obtained dynamically or from the fixed feeds. It expects the total number of feeds to be in var://context/loop/count and the results to be in the pipesout_x contexts. In the dynamic URL execution path, the results action, with Use Multiple Outputs enabled, creates the multiple pipesout_x contexts. To mimic that behavior, the two results actions in the fixed-feeds path have their outputs set to pipesout_1 and pipesout_2, respectively, and the first set variable action in that rule sets the loop count variable to 2. Also note that these variables, although set in the called rule, remain available in the main rule after the call returns.

Complex branching logic
An asynchronous action followed by an event-sink action is a frequently used pattern for implementing the complex processing logic found in process flow languages such as Business Process Execution Language (BPEL). From the Web GUI, users can't make an event-sink action wait for an action from a different rule. This limitation can be overcome by adding the appropriate configuration statements directly to the configuration (for example, using the command-line interface (CLI)) so that event-sink actions wait for asynchronous actions from other rules to complete. This allows the implementation of complex branching logic. The implementer needs to know the internal name of the action (readily viewable in the CLI) and must make sure that the processing orchestration can't lead to a situation where the appliance waits, up to the event-sink timeout, for an action that never starts.


Figure 3. Options in the results action available in multistep 3

Executing rule PipesFixedFeeds to process preconfigured RSS feeds

The PipesFixedFeeds rule illustrates parallel execution using asynchronous actions (that is, actions with the asynchronous flag on) and the event-sink action. Prior to version 3.6.1, there was only one asynchronous action, results-async. Multistep 3 allows any action to be marked as asynchronous, ensures that outstanding asynchronous actions don't block completion of the rule as a whole, and provides a new event-sink action, which waits for a list of named outstanding asynchronous actions.
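The asynchronous-action and event-sink pattern maps naturally onto futures. In this Python model, starting an asynchronous action submits work and returns immediately, and the event sink blocks on a named list of actions for at most a timeout. The action names results_1 and results_2, the feed URLs, and the fetch stub are hypothetical; the real internal action names are visible in the CLI.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Rule:
    """Toy model of a rule whose actions may be marked asynchronous."""

    def __init__(self, pool):
        self.pool = pool
        self.pending = {}  # action name -> Future

    def run_async(self, name, fn, *args):
        # An asynchronous action is started and the rule moves on at once.
        self.pending[name] = self.pool.submit(fn, *args)

    def event_sink(self, names, timeout):
        # Wait for every named action, but never longer than `timeout` seconds.
        _, not_done = wait([self.pending[n] for n in names], timeout=timeout)
        if not_done:
            raise TimeoutError(f"{len(not_done)} action(s) still outstanding")
        return {n: self.pending[n].result() for n in names}


def fake_fetch(url):
    """Hypothetical stand-in for fetching one RSS feed over HTTP."""
    return f"<rss><channel><title>{url}</title></channel></rss>"


with ThreadPoolExecutor() as pool:
    rule = Rule(pool)
    contexts = {"var://context/loop/count": 2}  # set first, as the fixed-feeds rule does
    rule.run_async("results_1", fake_fetch, "http://feed-a.example/rss")
    rule.run_async("results_2", fake_fetch, "http://feed-b.example/rss")
    done = rule.event_sink(["results_1", "results_2"], timeout=5.0)
    contexts["pipesout_1"] = done["results_1"]
    contexts["pipesout_2"] = done["results_2"]
```

The timeout branch is the situation the article warns about: if an awaited action never starts, the sink gives up only when its timeout expires.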


Figure 4. Event-sink on two asynchronous results actions

Each results action executes a single URL specified in the Destination field. Unlike the Multi-Way Results Mode action examined in the previous section, these results actions take in the URL directly instead of naming a context variable containing URLs.

Executing the ProcessSearchParam rule to filter aggregated feeds

Use dynamically constructed XPath expressions
Unlike the previous example of the for-each action, the XPath field points to the context variable var://context/search/var instead of a hard-coded expression. This is because the XPath expression has to be constructed dynamically using the input keyword element (see Get the search condition -- PipeURLsinRequestRule_filter_1). Dynamically constructing an XPath expression and executing complex actions under conditional or for-each actions, used together, provide a powerful development tool that minimizes XSLT programming.

This last rule is also executed conditionally from PipeURLsinRequestRule. It uses the keyword input to filter the aggregated news feeds.

Filter using supplied keyword -- ProcessSearchParam_for-each_0

This for-each action collects all the news items that contain the supplied keyword in the title element of the news feeds. As in the previous example of the for-each action (see Collect URLs from input -- PipesRunDynamicURLs_for-each_0), the same Input and Output contexts are used along with the loop variables var://service/multistep/loop-iterator and var://service/multistep/loop-count to collect matching news items into a single output context.
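The effect of the search for-each can be approximated in Python by walking the aggregated items and keeping those whose title contains the keyword. Like XPath contains(), Python's `in` operator is a case-sensitive substring test. The sample feed content is hypothetical, and the output wrapper assumes an <rss><channel> shape that the article does not specify.

```python
import xml.etree.ElementTree as ET


def filter_by_title(mergedout, keyword):
    """Collect <item> elements whose <title> contains keyword into one output."""
    root = ET.fromstring(mergedout)
    matches = ET.Element("rss")
    channel = ET.SubElement(matches, "channel")
    for item in root.iter("item"):
        title = item.findtext("title") or ""
        if keyword in title:      # case-sensitive, like XPath contains()
            channel.append(item)  # accumulate into the single output
    return ET.tostring(matches, encoding="unicode")


merged = ("<rss><channel>"
          "<item><title>DataPower firmware 3.6.1</title></item>"
          "<item><title>Unrelated news</title></item>"
          "</channel></rss>")
print(filter_by_title(merged, "DataPower"))
# <rss><channel><item><title>DataPower firmware 3.6.1</title></item></channel></rss>
```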

Conclusion

The enhanced multistep 3 functionality in the WebSphere DataPower SOA Appliances V3.6.1 firmware introduces new programming verbs (for-each, conditional, and event-sink) that bring it closer to process-oriented languages like BPEL. These make WebSphere DataPower SOA Appliances development more modular and less XSLT-intensive. The RSS aggregator described in this article uses simple stylesheets, each fewer than 30 lines, including standard declarations. In addition to these language enhancements, asynchronous actions make performance gains possible by running actions in parallel. Most importantly, you can combine all these features seamlessly as building blocks for complex applications.






Download

Description | Name | Size | Download method
DataPower config file (see Note 1) | ms3dpconfig.zip | 370KB | HTTP

Note

  1. The file ms3dpconfig.zip contains the WebSphere DataPower SOA Appliances configuration for the RSS feed aggregator sample. Import the file into a WebSphere DataPower SOA Appliances device to use the example.



About the authors


Srinivasan Muralidharan is a developer at IBM WebSphere Technology Institute. He is interested in all aspects of SOA, in particular middleware integration, ESB technologies, and performance.



David Maze is an engineer in the IBM WebSphere DataPower XML Technology group. His major WebSphere DataPower SOA Appliances projects have included rewriting the XML Schema validation engine, software integration for the XG4 XML accelerator, and work on the WebSphere DataPower SOA Appliances rule-execution engine.






IBM, the IBM logo, DataPower, developerWorks, Redbooks, and WebSphere are registered trademarks of IBM in the United States, other countries or both. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Other company, product, or service names may be trademarks or service marks of others.