Using InfoSphere Streams with Informix

How to set up and connect with Informix CLI and IBM Common Driver

This article shows you how to connect and use IBM® Informix® as a data source or data target with IBM InfoSphere Streams. It covers the use of both the Informix-specific protocol and the more general IBM Common Driver protocol used by several IBM database products. After reading this article, a reader will be able to use Informix in a Streams environment.

Jacques Roy (jacquesr@us.ibm.com), Technical Lead, IBM

Jacques Roy is a manager in the Informix development team covering application development and database extensibility features. He has more than 29 years of industry experience and has been with IBM since 2001. He is a well-known Informix author with multiple books, Redbooks, and developerWorks articles to his credit. He is also a frequent speaker at yearly conferences, such as the International Informix Users Group conference and the IBM Information on Demand conference.



28 February 2013

Also available in Chinese

Introduction

Today, it seems that everything is about big data. Since this subject covers a lot of ground, it also has different requirements based on the type of data we are talking about. We see needs for the analysis of massive amounts of unstructured data with InfoSphere BigInsights™, real-time processing with InfoSphere Streams, and processing of time-based data with Informix TimeSeries, for example. Even in the processing of time-based data, we can have multiple requirements — billing, customer inquiries, analytics, and real-time processing — to be able to respond to critical events quickly.

This article focuses on the basic setup needed to have InfoSphere Streams communicate with Informix as a source or a target. This is the first step that allows you to integrate these products.


Introducing Streams

InfoSphere Streams is a software platform that allows you to process data in real time. Speed is the essence of Streams. This is why it is designed to process data in memory.

The way to process streams of data is to string along operators that come with InfoSphere Streams or that were created by the users. This provides complete flexibility on how to process data.

Operators are available in what are called toolkits. There are about a dozen toolkits that come with InfoSphere Streams. One of them is called the database toolkit. It provides a set of operators to access databases including Informix and includes the following operators:

  • DB2SplitDB determines the partition to use when writing input tuples to a partitioned DB2® database. The operator submits the input tuple to an output stream that corresponds to the partition number.
  • DB2PartitionedAppend writes data to a table in the specified partition of a partitioned DB2 database. A row is appended to the table for each input tuple by using an INSERT statement.
  • NetezzaLoad performs high-speed loads into a Netezza database.
  • NetezzaPrepareLoad converts an incoming tuple to a delimited string, which can be written to Netezza with its external table interface.
  • ODBCAppend stores an input stream into a DBMS table using INSERT.
  • ODBCEnrich executes a SELECT for each input tuple.
  • ODBCRun executes a generic user-provided SQL statement.
  • ODBCSource generates a stream from the result of a SELECT statement.
  • SolidDBEnrich generates a stream from an input tuple and the result set of a solidDB® table query.

The operators we are interested in for our purposes are the ones starting with ODBC.

The database toolkit documentation says that Informix 11.50 is available using the Informix CLI (ODBC) driver. This may be the official statement, but my tests show that you can use 11.70 and higher. In addition, you can also use the IBM common driver, used by DB2, to access Informix. This is particularly interesting since it allows an application to access Informix and DB2 concurrently.


Introducing Informix

Informix is an extensible relational database product from IBM. It is used by many customers for mission-critical applications. Some of its key characteristics include its scalability, high-availability, and ease of administration. It is a "set it and forget it" type of database. This makes it a great choice for embedded systems of any size.

Informix also has additional features, such as support for spatial data and specialized storage and processing for time-based data. This last one makes it ideal as an operational data store for this type of data due to storage and access optimization built into the product. One such type of data is produced by smart meters being deployed by most utility companies. Many of these companies will go from collecting one reading per month to collecting it every 15 minutes. This means that they will now generate in three hours what they used to collect in one year — roughly 3,000 times more data than before. This is big data, and Informix is ready for this challenge.


Setting up for Informix CLI (ODBC)

Informix CLI is part of the Informix Client Software Development Kit (CSDK). The CSDK can also be downloaded separately from Informix database software.

The setup for Informix CLI requires the use of two files:

  • .odbcinst.ini: Define driver instances
  • .odbc.ini: Define data sources

These files are assumed to be in the home directory of the user that executes the program. Note that, in the case of InfoSphere Streams, it is the user that started the Streams instance. You can put these files in other locations with the following environment variables:

  • INFORMIXDIR — Location of the Informix CSDK installation
  • ODBCINSTINI — Location of the .odbcinst.ini file
  • ODBCINI — Location of the .odbc.ini file

Listing 1 shows an example of the .odbcinst.ini file. It assumes that the Informix CSDK was installed at /opt/IBM/informix.

Listing 1. Example of .odbcinst.ini content
[INFORMIX]
Driver=/opt/IBM/informix/lib/cli/iclit09b.so
Setup=/opt/IBM/informix/lib/cli/iclit09b.so
APILevel=1
ConnectFunctions=YYY
DriverODBCVer=03.51
FileUsage=0
SQLLevel=1
smProcessPerConnect=Y
[ODBC]
Trace=No
TraceFile=/tmp/odbc.out

The important part in this example is the Driver entry that tells the program where to find the needed shared library.

The .odbc.ini file provides information on the data sources available on that system.

Listing 2. Example of .odbc.ini content
[ODBC Data Sources]
DBCustomer=INFORMIX
 
[DBCustomer]
Driver=/opt/IBM/informix/lib/cli/iclit09b.so
Description=IBM INFORMIX ODBC DRIVER
Database=stores
LogonID=informix
pwd=in4mix
Servername=ol_ifmx1170

The first section defines the drivers used by the different data sources defined in this file. In our case, we only have one data source, and it uses the Informix driver defined in .odbcinst.ini.

Note that the Servername parameter refers to a name found in a file called by default $INFORMIXDIR/etc/sqlhosts that defined the servers available and how to communicate with them.

Additional setup for Streams with Informix

Once you have set up the Informix CLI environment, you need to set three additional environment variables for InfoSphere Streams. Once again, we assume that the CSDK was installed in /opt/IBM/informix:

  • STREAMS_ADAPTERS_ODBC_IDS=1 (indicates that we are using Informix)
  • STREAMS_ADAPTERS_ODBC_INCPATH=/opt/IBM/informix/incl/cli (where to find include files)
  • STREAMS_ADAPTERS_ODBC_LIBPATH=/opt/IBM/informix/lib (where to find the Informix libraries)

These are used in the compilation of the Streams Programming Language (SPL) programs into binaries executable.


Setting up for the IBM Common Driver

An application program can also use the IBM common driver for ODBC and CLI to connect to Informix. The Informix installation bundle includes this and other IBM Common Drivers. You can also find it as part of the Informix CSDK (Version 9.7 Fixpack 5 at the time of this writing).

The installation is quite simple: The CSDK, mentioned earlier, includes a directory called IBM_Data_Server_Driver_Package, where you can find a file like v9.7fp5_linuxx64_dsdriver.tar. You simply have to untar the file in the directory where you want the drivers to be installed, creating a directory called dsdriver. Then execute the installDSDriver script. The home directory of your driver is the dsdriver directory.

Setting up the environment for a connection is similar to what we saw earlier. There are multiple ways to set it up. Here, we will use the db2cli.ini file to do it. The file location is set in the DB2CLIINIPATH environment variable. Listing 3 shows an example of the content of this file.

Listing 3. Example of db2cli.ini content
[DBCustomer]
uid=informix
pwd=in4mix
autocommit=0
Database=stores
Protocol=tcpip
hostname=streamslab
Servicename=23509

In this example, we see that we are connected to the Informix database stores on streamlab using port 23509. We are using the username informix and password in4mix.

Additional setup for Streams with the common driver

As we have seen in the Informix CLI section, we need a few more environment variables for InfoSphere Streams:

  • STREAMS_ADAPTERS_ODBC_DB2=1 (indicates that we are using the IBM common driver)
  • STREAMS_ADAPTERS_ODBC_INCPATH=/opt/IBM/informix/dsdriver/incl/cli (where to find include files)
  • STREAMS_ADAPTERS_ODBC_LIBPATH=/opt/IBM/informix/dsdriver/lib (where to find the IBM common driver libraries)

Reading Informix from Streams

To read from a database, we use the database toolkit function ODBCSource. The program also needs a file that defines the connections available: connections.xml. Listing 4 shows the connection specification for our example on reading Informix.

Listing 4. Example of connections.xml content
<st:connections xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<connection_specifications>

<connection_specification name="DBCustomer" >
    <ODBC database="DBCustomer" user="informix" password="passw0rd" />
</connection_specification>

</connection_specifications>

<access_specifications>

<access_specification name="Customer">
    <query query="SELECT fname FROM customer" replays="1"
          isolation_level="READ_COMMITTED" />
    <parameters></parameters>
    <uses_connection connection="DBCustomer" />
    <native_schema>
        <column name="fname" type="CHAR" length="15" />
    </native_schema>
</access_specification>

</access_specifications>

</st:connections>

This example uses the Informix CLI setup with the data source DBCustomer. It defines an access specification named Customer that includes the SELECT statement to execute. For more information about the definition of this XML file, consult the Streams documentation (see Resources).

Here is a simple SPL program that reads from the database and writes the result to the standard output.

Listing 5. SPL program example (ReadFname.spl)
use com.ibm.streams.db::*; // make operators from the database toolkit available

composite ReadFname {
graph
  stream <rstring fname> Names = ODBCSource() { // return a stream called Names
    param
       connection : "DBCustomer"; // Identify the connection to use
       access : "Customer"; // identify the statement to execute
       connectionDocument : "connections.xml"; // identify the connections definition file
  }
() as Sink = Custom(Names) {
      logic onTuple Names : printStringLn(fname);
  }
}

If we wanted to use the IBM Common Driver, we simply have to change the environment variables and re-compile the program.


Writing to Informix from Streams

In this example, we use the ODBCAppend operator to INSERT data into a table. We use the output from the previous example to insert into a new table called target. The table has only one column called fname that matches the definition of the column from the input table.

We need to modify our connections.xml file. Here is the modified content with the new section highlighted.

Listing 6. New connections.xml version
<st:connections xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<connection_specifications>

<connection_specification name="DBCustomer" >
<ODBC database="DBCustomer" user="informix" password="passw0rd" />
</connection_specification>

</connection_specifications>

<access_specifications>

<access_specification name="Customer">
<query query="SELECT fname FROM customer"
          replays="1" isolation_level="READ_COMMITTED" />
  <parameters></parameters>
  <uses_connection connection="DBCustomer" />
  <native_schema>
    <column name="fname" type="CHAR" length="15" />
  </native_schema>
</access_specification>

<access_specification name="Target">
  <table tablename = "target" transaction_batchsize="100" rowset_size="10" />
  <uses_connection connection="DBCustomer" />
  <native_schema>
    <column name="fname" type="CHAR" length="15" />
  </native_schema>
</access_specification>


</access_specifications>

</st:connections>

Now we simply have to replace the Custom operator of our SPL program with an ODBCAppend operator.

Listing 7. Modified SPL program (RwFname.spl)
use com.ibm.streams.db::*; // make operators from the database toolkit available

composite RwFname {
graph
  stream <rstring fname>  Names = ODBCSource() { // return a stream called Names
    param
       connection : "DBCustomer"; // Identify the connection to use
       access : "Customer"; // identify the statement to execute
       connectionDocument : "connections.xml"; // identify the connections definition file
  }
  () as Sink = ODBCAppend(Names) {
      param
       connection : "DBCustomer"; 
       access : "Target"; 
       connectionDocument : "connections.xml"; 
  }
}

As mentioned, using one driver or the other is just a matter of changing the environment variables and re-compiling. That gives you the flexibility to easily move from one driver to another. It also gives you a choice if you are more comfortable with one driver over another.


Conclusion

InfoSphere Streams can complement the capabilities of Informix. Informix can also complement the real-time capabilities of Streams by providing an efficient permanent storage that can also be accessed to complement other stream processing. With the TimeSeries capability, Informix provides a unique solution to the need for time-based data processing. InfoSphere Streams and Informix can be the ideal solutions for many customers that have real-time data processing needs and operational data store needs on the same data.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Big data and analytics on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Big data and analytics, Information Management
ArticleID=858714
ArticleTitle=Using InfoSphere Streams with Informix
publish-date=02282013