Integrate InfoSphere Streams with InfoSphere Data Explorer

Push data records from Streams to Data Explorer using BigIndex APIs

Learn how to integrate IBM InfoSphere® Streams with InfoSphere Data Explorer to enable Streams operators to connect to Data Explorer to insert and update records. The article focuses on InfoSphere Streams 3.0 or higher and InfoSphere Data Explorer 8.2.2 or 8.2.3.

Roopa Vedagiri (roopavedagiri@in.ibm.com), Senior Staff Software Engineer, IBM

Roopa Vedagiri has been a senior software engineer on the InfoSphere Streams team at ISL for two years. With nine years of IT experience in various domains, she attained technical mastery certification in 2013 for her work with InfoSphere Streams.



04 March 2014


Introduction

InfoSphere Streams is an advanced computing platform that enables user-developed applications to quickly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. With InfoSphere Streams, you can analyze data in motion and can extend the value of existing systems by integrating with different applications. InfoSphere Streams can also process structured and unstructured data. It includes a set of built-in toolkits that simplify application development. These toolkits are reusable components that include operators, types, and functions. The toolkits are broadly categorized as standard toolkits, which contain generic operators, and specialized toolkits, which contain operators to handle domain-specific functions.

InfoSphere Streams Quick Start Edition

InfoSphere Streams Quick Start Edition is a complimentary, downloadable, non-production version of InfoSphere Streams, a high-performance computing platform that enables user-developed applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. With no data or time limits, InfoSphere Streams Quick Start Edition enables you to experiment with stream computing in your own environment. Build a powerful analytics platform that can handle incredibly high data throughput, up to millions of events or messages per second.

InfoSphere Streams can connect to InfoSphere Data Explorer using the primitive operator DataExplorerPush. The operator belongs to the big data toolkit (com.ibm.streams.bigdata), one of the specialized toolkits installed with InfoSphere Streams. The DataExplorerPush operator enables InfoSphere Streams to push data in the form of records to InfoSphere Data Explorer using BigIndex APIs, as shown in Figure 1. The data pushed to InfoSphere Data Explorer is indexed and can then be used in the InfoSphere Data Explorer UIs.

Figure 1. Architecture diagram of InfoSphere Streams DataExplorerPush operator
Image shows that DataExplorerPush sends records to Data Explorer

What you need to know and software you need to install

This article assumes you have the following skills:

  • A basic knowledge and understanding of InfoSphere Streams infrastructure, Streams Processing Language (SPL), BigIndex APIs, Apache ZooKeeper and InfoSphere Data Explorer:
    • InfoSphere Streams DataExplorerPush operator is used to push data in the form of records to InfoSphere Data Explorer using BigIndex APIs.
    • BigIndex uses InfoSphere Data Explorer as its data store for all content ingested through its APIs.
    • ZooKeeper is used to configure a cluster environment. BigIndex uses ZooKeeper to maintain the InfoSphere Data Explorer configuration, including the connection details (endpoint, username, and password) for the cluster.

Ensure the following software is installed and available:

  • InfoSphere Streams (3.0 or higher): Installed on a single node or on a cluster. Set the STREAMS_INSTALL environment variable to the InfoSphere Streams installation directory.
  • InfoSphere Data Explorer 8.2.2 or 8.2.3: Installed and running. After installation, complete the additional configuration described in the ReleaseNotes.txt file, which is in the directory <DataExplorer_Install>/AppBuilder/bigindex/docs/bigindex.
  • Apache ZooKeeper: Installed and running. Each InfoSphere Data Explorer version requires a different ZooKeeper version. Refer to the InfoSphere Data Explorer installation guide for the detailed requirements.
  • BigIndex JAR files: The DataExplorerPush operator of InfoSphere Streams uses BigIndex APIs to insert records into InfoSphere Data Explorer V8.2.2 and V8.2.3, so the respective JAR files must be accessible from the InfoSphere Streams server. If the InfoSphere Data Explorer server is not on the same node as InfoSphere Streams, copy the install-dir/AppBuilder/bigindex.zip compressed folder from the node where InfoSphere Data Explorer is installed to a location that can be accessed from InfoSphere Streams, and extract its contents there.

Create or update the configuration of a ZooKeeper namespace

After you install InfoSphere Data Explorer and ZooKeeper, you need to create or update the ZooKeeper namespace for InfoSphere Data Explorer with the BigIndex configuration information and the InfoSphere Data Explorer installation information.

Note: The import or load operation you are about to perform completely replaces any existing configuration in the ZooKeeper namespace.

To import or load the BigIndex entity model file into the ZooKeeper namespace, run the BigIndex command-line utility: java -jar bigindex-X.Y.Z.jar --properties-file ZooKeeper.properties --import-file configuration.xml --export-to-screen

For detailed steps on how to edit the configuration file and load it to the ZooKeeper namespace, refer to Create or edit the BigIndex configuration and Load the entity model file into the ZooKeeper namespace.

Pass the ZooKeeper properties file and the BigIndex entity model file to the command line utility, as shown in Listing 1.

Listing 1. Sample properties file
servers=testDep.in.ibm.com:1212
namespace=zookeeper
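
Before passing a properties file like Listing 1 to the utility, it can help to check that it parses cleanly. The following is a minimal Python sketch, not part of BigIndex: the function names are hypothetical, and the comma-separated form for multiple servers is an assumption rather than something Listing 1 confirms.

```python
def parse_zookeeper_properties(text):
    """Parse 'key=value' lines, as in Listing 1, into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def split_endpoints(servers):
    """Split 'host:port[,host:port...]' into (host, port) tuples.

    Assumption: multiple servers, if any, are comma-separated.
    """
    endpoints = []
    for item in servers.split(","):
        host, sep, port = item.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


sample = "servers=testDep.in.ibm.com:1212\nnamespace=zookeeper\n"
props = parse_zookeeper_properties(sample)
print(props["namespace"], split_endpoints(props["servers"]))
```

The two values extracted here are the same namespace and host:port pair that the InfoSphere Streams operator needs later.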

You can also pass the ZooKeeper properties directly instead of using the properties file by issuing the command java -jar bigindex-X.Y.Z.jar -n zookeeper -s testDep.in.ibm.com:1212 -i config.xml --export-to-screen.

The example configuration file ZookeeperExampleModel.xml is in the directory <DataExplorer_Install>/bigindex/examples/bigindex for InfoSphere Data Explorer V8.2.2 or V8.2.3, as shown in Listing 2.

Note: In the following example, two entity types are associated with different collections on the same server. To accommodate multiple servers, define an additional data-explorer-engine-instance and move one of the <serves> nodes under that instead.

Listing 2. ZookeeperExampleModel of InfoSphere Data Explorer 8.2.3
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <data-stores>
    <cluster-collection-store collection-name="zookeeper-example-test-collection"
                              name="zookeeper-example-store"
                              total-shards="1"/>
    <cluster-collection-store collection-name="zookeeper-example-test-collection2"
                              name="zookeeper-example-store2"
                              total-shards="1"/>
  </data-stores>
  <entity-model>
    <!-- The store-name below should match the name you provide for the store above -->
    <!-- The name of the entity here should match the entity type in the example -->
    <entity-definition name="zk_entity_type"
                       store-name="zookeeper-example-store"/>
    <entity-definition name="zk_entity_type2"
                       store-name="zookeeper-example-store2"/>
  </entity-model>
  <data-sources>
    <data-explorer-engine-instance username="your-username"
                                   url="http://path-to-data-explorer-instance/velocity.exe?v.app=api-soap&amp;wsdl=1&amp;use-types=true&amp;"
                                   password="your-password">
      <!-- The name here should match the name of any stores you created above -->
      <serves store-name="zookeeper-example-store" shard-number="1" total-shards="1"/>
      <serves store-name="zookeeper-example-store2" shard-number="1" total-shards="1"/>
    </data-explorer-engine-instance>
  </data-sources>
</configuration>

Create or edit the BigIndex configuration

You can use the ZookeeperExampleModel.xml file as a template for a new entity model file. Modify the values according to your requirements and save the file as config.xml. The following list describes the required XML elements. For details, see the documentation in the <DataExplorer_Install>/AppBuilder/bigindex/docs/bigindex directory.

  • Edit the XML element data-stores. Assign a collection-name and a store name. You can choose any names, for example, zookeeper-collection1 and zookeeper-store1.
  • Edit the XML element entity-model. Assign an entity-definition name and a store-name. For example, the entity-definition name can be Tweet. Make sure the store-name matches the store-name provided under data-stores.
  • Edit the XML element data-sources. Provide values for the url, username, and password attributes of data-explorer-engine-instance, as in this example: url="http://testserver:9080/vivisimo/cgi-bin/velocity?v.app=api-soap&amp;wsdl=1&amp;use-types=true". Edit the store-name of each serves element and make sure it matches a store name provided under data-stores. Choose the shard-number and total-shards based on the requirements of the project and the infrastructure available.
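
The store-name cross-references described above are easy to get wrong by hand. The following is a minimal Python sketch that checks them before you load the file. It is not a BigIndex tool: the function name check_store_names is hypothetical, the embedded XML is a shortened config.xml built from the example names in the list, and the check covers only the name matching described here.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical config.xml using the example names from the list above.
SAMPLE_CONFIG = """
<configuration>
  <data-stores>
    <cluster-collection-store collection-name="zookeeper-collection1"
                              name="zookeeper-store1" total-shards="1"/>
  </data-stores>
  <entity-model>
    <entity-definition name="Tweet" store-name="zookeeper-store1"/>
  </entity-model>
  <data-sources>
    <data-explorer-engine-instance username="your-username" password="your-password"
        url="http://testserver:9080/vivisimo/cgi-bin/velocity?v.app=api-soap&amp;wsdl=1&amp;use-types=true">
      <serves store-name="zookeeper-store1" shard-number="1" total-shards="1"/>
    </data-explorer-engine-instance>
  </data-sources>
</configuration>
"""


def check_store_names(xml_text):
    """Return a list of problems; an empty list means all store names match."""
    root = ET.fromstring(xml_text)
    # Names declared under data-stores.
    stores = {s.get("name") for s in root.iter("cluster-collection-store")}
    problems = []
    # Every entity-definition and serves element must point at a declared store.
    for tag in ("entity-definition", "serves"):
        for element in root.iter(tag):
            store = element.get("store-name")
            if store not in stores:
                problems.append(f"{tag} refers to unknown store {store!r}")
    return problems


print(check_store_names(SAMPLE_CONFIG) or "store names are consistent")
```

Parsing the file this way also catches XML that is not well formed, such as an unescaped & in the url attribute.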

Load the entity model file into the ZooKeeper namespace

To load the entity model, follow these steps:

  • For InfoSphere Data Explorer 8.2.2 and later releases, use <DataExplorer_Install>/AppBuilder/bigindex/lib/bigindex-X.Y.Z.jar.
  • Next, you need to create or update the ZooKeeper namespace using the config.xml file. Pass the configuration file created from the previous section and provide the ZooKeeper port number (the default port is 2181) and server name. Issue the command java -jar bigindex-X.Y.Z.jar -n zookeeper -s testDep.in.ibm.com:1212 -i ~/config.xml
    • -n: Refers to the ZooKeeper namespace.
    • -s: Refers to the host and port of the ZooKeeper server.
    • -i: Refers to the configuration file.

    Note: The following parameters are required by the InfoSphere Streams operator:

    • The entity-definition name under the entity-model from the config.xml file needs to be passed to the InfoSphere Streams recordType parameter.
    • The zookeeperNamespace is passed to the Java™ command in the load model step. In this example, it is zookeeper. Make a note of ZookeeperEndpoints, which are the host and port of the ZooKeeper server. In this example, the host is testDep.in.ibm.com and the port is 1212.
      java -jar bigindex-X.Y.Z.jar -n zookeeper -s testDep.in.ibm.com:1212 -i ~/config.xml
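
If you load the model from a script, building the command from the same values you later give to the operator keeps the two in step. The following is a minimal Python sketch, assuming only the -n, -s, and -i options shown above. The function name build_load_command is hypothetical.

```python
import shlex


def build_load_command(jar, namespace, endpoint, config_file):
    """Build the argument list for the load command shown above."""
    return ["java", "-jar", jar,
            "-n", namespace,     # ZooKeeper namespace
            "-s", endpoint,      # ZooKeeper host:port
            "-i", config_file]   # BigIndex entity model file


cmd = build_load_command("bigindex-X.Y.Z.jar", "zookeeper",
                         "testDep.in.ibm.com:1212", "config.xml")
print(shlex.join(cmd))
```

To run the command rather than print it, pass the list to subprocess.run(cmd, check=True) from the directory that contains the JAR file.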

Run a sample InfoSphere Streams application using the DataExplorerPush operator

This sample application demonstrates how to use the InfoSphere Streams DataExplorerPush operator to push data to InfoSphere Data Explorer:

  • Set the STREAMS_INSTALL environment variable to the InfoSphere Streams installation directory.
  • Set the BIGSEARCH_JAR environment variable to the location of the BigIndex .jar file. For example, for InfoSphere Data Explorer 8.2.2 or 8.2.3, if the .zip file <install-dir>/AppBuilder/bigindex.zip is copied and extracted to /opt/DataExplorer/bigindex, the export command is export BIGSEARCH_JAR='/opt/DataExplorer/bigindex/lib/bigindex-x.jar'. Replace x with the right version of the BigIndex .jar file.
  • The DataExplorerPush sample application can be found in the $STREAMS_INSTALL/toolkits/com.ibm.streams.bigdata/samples/DataExplorerPush directory. The contents of the directory include the Streams Processing Language (SPL) source file, the makefile, info.xml, and the subdirectories data and etc. The etc directory contains the connections.txt document.
  • Create a new directory (sample1, for example) in your home directory and copy the DataExplorerPush sample to it:
    mkdir -p $HOME/sample1
    cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.bigdata/samples/DataExplorerPush $HOME/sample1/
  • Edit the etc/connections.txt file with information about your connection to InfoSphere Data Explorer. Provide the ZooKeeperNamespace and ZooKeeperEndpoints.
  • Edit the SPL source file. Update the recordType parameter. This is the same as the entity-definition name.
  • Build the sample by running make. By default, the sample application is compiled as a distributed application. To compile the application as a stand-alone application, run make standalone:
    • If you compiled the sample as stand-alone, use the following command to run the sample application: ./output/bin/standalone.
    • If you compiled the sample as distributed, use the following command to run the sample application: streamtool submitjob DataPushExplorerMain.adl -i instancename.
    • To create an InfoSphere Streams instance, use the streamtool mkinstance command. For detailed instructions for creating a Streams instance, run streamtool help.
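
A build can fail in confusing ways when one of the environment variables from the first two steps is missing. The following is a minimal Python sketch of a pre-build check. It is not part of the toolkit: the function name check_environment is hypothetical, and the file and directory tests are injectable only so the function can be exercised without a real installation.

```python
import os


def check_environment(env, file_exists=os.path.isfile, dir_exists=os.path.isdir):
    """Return a list of problems with the variables the sample relies on."""
    problems = []

    streams = env.get("STREAMS_INSTALL")
    if not streams:
        problems.append("STREAMS_INSTALL is not set")
    elif not dir_exists(streams):
        problems.append(f"STREAMS_INSTALL is not a directory: {streams}")

    jar = env.get("BIGSEARCH_JAR")
    if not jar:
        problems.append("BIGSEARCH_JAR is not set")
    elif not jar.endswith(".jar"):
        problems.append(f"BIGSEARCH_JAR does not point to a .jar file: {jar}")
    elif not file_exists(jar):
        problems.append(f"BIGSEARCH_JAR file not found: {jar}")

    return problems


# Report anything wrong with the current shell environment.
for problem in check_environment(os.environ):
    print("WARNING:", problem)
```

Run the script in the same shell where you plan to run make, so that it sees the same exported variables.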

View the results of the sample application

The Tweet.txt file in the data folder is read by the FileSource operator, and tuples are sent to the DataExplorerPush operator. One record is created from each tuple and pushed to InfoSphere Data Explorer. To view the indexed records in the InfoSphere Data Explorer console, follow the steps after this note:

Note: Confirm that you have completed the additional configuration steps described in the ReleaseNotes.txt file, which is in the <DataExplorer_Install>/AppBuilder/bigindex/docs/bigindex directory. If these extra configuration steps are not performed, you cannot use the admin tool to search InfoSphere Data Explorer instances that contain BigIndex data. BigIndex data is stored in Data Explorer arenas, which are not supported in the end-user display. However, you can search one arena at a time by using the method described in the InfoSphere Data Explorer ReleaseNotes.txt file.

  • After you apply the workaround that enables InfoSphere Data Explorer to search the data stored in arenas, log in to the InfoSphere Data Explorer console at http://DEP_HOST:PORT/vivisimo/cgi-bin/admin.
  • Select Search Collections under the administration tool on the left pane of the admin console.
Figure 2. Search collections admin tool
Image shows search collections admin tool
  • Click the collection name used in the InfoSphere Streams application.
Figure 3. Names for search engine collections
Image shows list of names for search engine collections
  • In the left pane of the InfoSphere Data Explorer admin console page, click query-meta.
Figure 4. Test project with query-meta link
Image shows test project with query-meta link
  • Modify the URL by appending &arena=<recordType>. In this example, because the recordType is Tweet, append &arena=Tweet to the URL. The resulting URL is similar to http://testserver:9080/vivisimo/cgi-bin/query-meta?v%3Asources=DEP3-r6_1_1&v%3Aproject=query-meta&arena=Tweet. The result page lists all the records for the selected arena.
Figure 5. Tweets in a results window
Image shows list of tweets in a results window
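
If you check several record types, you can generate the arena URL instead of editing it by hand. The following is a minimal Python sketch. The function name add_arena is hypothetical, and the base URL is the sample URL from the last step.

```python
from urllib.parse import quote


def add_arena(url, record_type):
    """Append the arena parameter, leaving the existing query string untouched."""
    separator = "&" if "?" in url else "?"
    # Percent-encode the record type in case it contains reserved characters.
    return f"{url}{separator}arena={quote(record_type, safe='')}"


base = ("http://testserver:9080/vivisimo/cgi-bin/query-meta"
        "?v%3Asources=DEP3-r6_1_1&v%3Aproject=query-meta")
print(add_arena(base, "Tweet"))
```

The function appends to the URL as plain text rather than parsing and re-encoding the query string, so the v%3A parameters that the console generated stay exactly as they were.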

Summary

InfoSphere Streams can integrate with InfoSphere Data Explorer using the InfoSphere Streams DataExplorerPush operator. This article describes the steps to enable the integration and it shows how to run InfoSphere Streams samples to push data and view the results in the InfoSphere Data Explorer web console.

Acknowledgements

Thanks go to Manasa K. Rao and Scott Linder for their review and help.
