Integrating information from diverse sources across organizational boundaries has long been a bane for businesses and a boon for high-priced consultants. The same Internet that allowed individuals and organizations to share information with others using Web sites and emails is now poised to revolutionize the way data is shared and exploited in a business.
IBM Mashup Center facilitates individuals to share structured and semi-structured information stored in their documents or local databases, with others in the organization. Data from multiple data sources can be merged, filtered, sorted, grouped, and transformed to create feed mashups. Data in these mashups can then be accessed by other applications using standard HTTP calls or can be presented in a Web browser using Lotus Mashups.
IBM Mashup Center is composed of two tools that are loosely coupled through a common catalog. The first tool is the InfoSphere MashupHub that is used to create feeds and feed mashups and the second tool is the Lotus Mashup that can be used to create Web pages to display the data feeds using pre-fabricated widgets.
This article is about InfoSphere MashupHub, which is composed of tools focused on two functionality areas:
- Tools to create data feeds from different data sources
- Mashup Editor to integrate multiple feeds into a single data feed
A common catalog is used to store and publish these feeds. Utilities are provided to search, rate, and categorize feeds in the catalog.
Figure 1. InfoSphere MashupHub architecture
InfoSphere MashupHub application server is built on top of WebSphere Application Server and its surrounding J2EE technologies. It exposes a REST API to its clients who can access its services over a secure (HTTPS) connection.
Although an embedded Derby database is provided to create the default repository for the mashup catalog, support for other relational databases is also available. WebSphere Application Server administrative services are used for authentication and role management.
The server supports a wide range of plug-ins for data sources as well as plug-ins for the feed mashup engine and the mashup catalog repository. All feed data from internal as well as external data sources is cached.
The server provides services to the mashup client as well as to third-party feed readers or even other application servers or clients. When a feed or feed mashup is being created, the MashupHub client communicates with the server to fetch data from data sources, preview partial mashups, and to save the feed or feed mashup. Any feed that is published to the catalog can be accessed and viewed directly from the MashupHub client. The feed URL can also be invoked from a Web browser, feed reader, or any other application using standard HTTP protocol.
Note: Feeds that are designated as private or custom can be accessed using HTTP basic authentication with SSL.
The InfoSphere MashupHub client is a browser-based Integrated Development Environment (IDE) that provides a Rapid Application Developer environment for building feeds and feed mashups. It has been implemented using Ajax (Dojo toolkit) and uses REST calls to communicate with the application server.
Figure 2. InfoSphere MashupHub client
Different components of the MashupHub client have been designed with the domain expertise of the expected user and potential usage scenarios in mind. For example, a developer who is going to create a feed from a relational database needs to have at least a rudimentary knowledge of database schemas and SQL. A sales executive on the other hand, might be comfortable with creating a feed from an Excel spreadsheet. While a business analyst might just be interested in combining existing feeds to create a more analytic feed mashup.
The Web 2.0 focus on community development has influenced the presentation layer and the user experience. Conforming to this paradigm, the InfoSphere MashupHub enables developers to search, rate, and share their opinion about existing feeds and mashups, in addition to providing them with the traditional launch points for creating and editing them.
Figure 3. MashupHub catalog
InfoSphere MashupHub client provides tools to create data feeds from structured and semi-structured data sources. These feeds can be published as RSS, ATOM, or plain XML data streams that are accessible over the Internet or Intranet using a HTTP URL.
Support is provided for a diverse set of data sources,like file based semi-structured documents, relational or hierarchical databases, Web and directory services, and application and transaction servers.
Figure 4. Supported data sources for creating feeds
For detailed instructions on how to create feeds for all the supported data sources click on the help link on the main page.
In order to understand the product as well as the process, look at a sample scenario in which two feeds are created. One from an Excel document and the second from a relational database. These feeds are then integrating using the mashup Editor to create a composite feed mashup.
A company is manufacturing silverware and shipping it to its vendors on demand. As the price of the raw materials has almost doubled, the management wants to control inventory costs.
Sales executives maintain sales leads for different products in personal spreadsheets until a purchase order is raised by a customer. The production manager would like to have access to the sales lead information to better forecast their inventory requirements.
Using InfoSphere MashupHub, the production manager plans to access information stored in these personal spreadsheets of the sales executives and combine it with the product information in the database to get a clearer picture of inventory requirements.
Note: This scenario has been kept barebones in order to focus on the power of the mashup model and the simplicity of the process.
In order to build the inventory projection mashup, the production manager first creates a feed from data stored in the product table. Next, sales executives create feeds from their spreadsheets containing sales lead information. They also have to update the feed whenever they make any significant changes to the spreadsheet. Once all the feeds are available, the production manager can amalgamate the data from the two feeds to project inventory requirements.
The product details are stored in a relational database that is part of the inventory management system.
To create a new feed from the product table, the production manager first clicks on the New Feed link on the main page and selects Relation Database Query (SQL) from the supported data sources list. Then he enters the database name or selects one from the drop-down, if it is a local database, and clicks Next.
Figure 5. Relational database connection information
All the in scope tables in the database are listed and any one of them can be selected by clicking the check box in front of it.
The product information is stored in the SQLPRODUCT table and when this table is selected, the list on the right side is populated with the columns in the table. A SQL statement is generated for querying the selected columns in the table.
Note: Any user who is proficient in SQL can type the query directly into the advanced views SQL query text box.
Figure 6. Wizard to create the SELECT query for the feed data
On the next page, enter the feed title and description along with other properties, like public or private access, any search keyword tags, or cache times.
Note: Feeds can also be kept private or access can even be further customized.
Figure 7. Wizard to define the feed title and properties for the catalog
The finished feed is finally saved into the InfoSphere MashupHub catalog and can now be accessed by anyone who has privileges to invoke it.
The second step is to get access to the sales lead data stored on the personal computers of the sales executives. In order to do this, the sales executives need to create feeds from their spreadsheets and give read access to the product manager of these feeds.
Figure 8. Sample sales lead information spreadsheet
In order to create the feed, the sales executives need to select the Excel Workbook from the data source list, and click Next.
In the sample spreadsheet shown in Figure 8, only the first four columns are of interest to the production manager, so the sales executive would select the spreadsheet cell range from C2 to F13 in order to cover the product ID, name, quantity, and tentative purchase date.
Figure 9. Wizard to define spreadsheet paramters to create a feed
The rest of the steps are similar to the process for creating a feed from a relational table.
Once all the sales feeds are created, the product manager can now combine data from all of them and then integrate it with the product feed to calculate the potential inventory requirements.
Note: For simplicity sake, only one sales data feed is considered for this example. Part 2 of this series will show you how to combine multiple sales feeds and aggregate the total units for each product ID using the combine, group, and transform operators.
In the first step, the product manager loads the data feeds by dragging and dropping two source operators from the operator palette to the canvas. The details window automatically opens when the first source operator is dropped.
Note: The details window can also be opened by clicking on the dropped operator or from the context menu of the operator.
When the first source operator is dropped on an empty canvas, a publish operator is automatically created. This operator servers as the endpoint of the mashup feed and can be used to set the feed type to RSS, ATOM, or simple XML (default).
Each source operator is given a relevant name that is also reflected in the caption of operator in the canvas.
Figure 10. Drag and drop source operators to the canvas
The catalog explorer dialog is launched by selecting the From Catalog radio button, and clicking Browse.
The sales leads and the product list feeds are selected for the two source operators respectively.
Figure 11. Select feed source from the catalog
Sample data from each feed is loaded into the source operators and can be seen on the Preview tab.
The feed data loaded in the previous query is unconditional. To calculate the inventory requirement for a selected item in the sales leads feed, only the product data for that item is needed.
Note: This is the equivalent to adding a where clause in a SQL SELECT statement.
The filter operator is dragged and dropped from the operator palette to the canvas and is renamed to searchPID. Since the product feed is the one that needs to be filtered, consequently the product source is connected as an input to the filter operator by dragging the plug of the source operator to the socket of the filter operator.
Note: The product source operator's context menu can also be used to connect two operators.
The filter information is set in the condition grid of the Properties tab. The first column (Input element) of the grid defines the path in the input feed that needs to be compared for filtering. Clicking on a row in this column drops down a tree representation of a sample entry from the input product feed. When any node in the tree is selected, a path to that node is generated and added to the left side of the filter condition.
Note: This is equivalent to selecting a database column in the where clause of a SQL statement.
Figure 12. Add a filter condition to a path in the input feed
Since the expected value at the selected path in the filter condition is only decided during execution, it needs to be defined as a variable.
Note: This step is similar to creating a parameterized query in SQL.
The variable association dialog is launched by clicking the Value column of the condition row, and selecting Use a variable to return the value in the drop-down box.
Figure 13. Enable paramter binding by creating a variable for the filter condition
The value to the variable can be passed from another operator (as is shown in the next step) or even from the mashup URL at run time.
Note: It is necessary to have a default value for the variable, in order to preview the results of the filter operation.
In order to calculate the inventory requirements for each entry in the sales leads feed, the production manager needs to create a master detail data structure. This can be achieved by using the For Each operator with the spreadsheet source as the master feed and the filtered product source as the detail feed.
For each entry in the "hardeeps leads" (master/outer) feed, get the details from the filtered product (detail/inner) feed. The product ID (PID) for each sales entry is passed to the PID variable that was created in the filter condition of the product feed.
Note: This is equivalent to creating a join condition in a SQL query using PID as the key.
Figure 14. Create a master detail by passing product ID values from the sales feed to select an entry in the product feed
The result of the ForEach (master detail) operation has two records (rows) per entry. One record contains the item information from the sales spreadsheet, and the other one contains product details for that item.
Some of the information in these rows is either duplicate or irrelevant for inventory projection. A Transform operator can be used to modulate the result of the ForEach operation.
A Transform operator is dragged from the operator's palette, and the output of the ForEach (master detail) operator is plugged into it. A sample entry from the master detail Input is displayed on the left-side tree, while the right-side tree defines the Output template of the transformation.
The output template can be modified by using the Output tree's context menu to add new nodes to it or by dragging and dropping existing ones from the Input tree.
A new element called "item" is added to the Output tree entry node. Relevant nodes from the Input tree are dragged and dropped as children of this element.
Figure 15. Transform the result by eliminating redundant data
The transformed mashup now contains all the information that is useful to the production manager. The output of the Transform operator is plugged into the Publish (endpoint) operator and the mashup is saved into the catalog.
It can now be associated with widgets in the Lotus Mashup Editor to create a Web page that displays the mashup information in a tabular or graphical fashion.
The URL can also be invoked from any other client tool (for example a Web browser) or application that can access a data source using an HTTP protocol.
Figure 16. Open the mashup from a Web browser
IBM Mashup Center leverages the mashup model of Web 2.0 to enable organizations to tap into semi-structured information hoarded by individual employees and integrate it with structured data from conventional sources. This shared information can now be massaged, morphed, and migrated over corporate boundaries as data feeds, effectively turning the corporate Intranet into a loosely-coupled database.
Part 2 in the series will go into details of the Mashup Editor and its architecture. The use case of this article will be extended to explain the usage and reason for each operator.
IBM Mashup Center Wiki -- Demos, tutorials, articles, FAQs: Learn about IBM Mashup Center, contribute to its knowledge base, and collaborate with others.
developerWorks Information Management zone: Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.
Stay current with developerWorks
technical events and webcasts.
Get products and technologies
IBM Mashup Center:
Find an easy to use business mashup solution, supporting line of business assembly of
dynamic situational applications — with the management, security, and governance capabilities IT requires.
Build your next development project with
trial software, available for download directly from developerWorks.