Integrate data with SoftLayer Object Storage by using IBM InfoSphere DataStage

In this article, learn how the Hierarchical Data stage in IBM® InfoSphere® DataStage® interacts with SoftLayer® Object Storage by using the Representational State Transfer (REST) API. The REST step in the Hierarchical Data stage invokes REST web services with support for various authentication mechanisms and Secure Sockets Layer (SSL). This article provides step-by-step instructions for using the REST step to load files to, and extract files from, SoftLayer Object Storage by using the OpenStack REST API.

Jeff J. Li (liji@us.ibm.com), Software Architect, IBM

Jeff Li is a software architect in the Software Group in Boca Raton, FL. He has worked with enterprise applications and data integration for 15 years. Currently, he is leading the development of the complex data integration solution for IBM InfoSphere Information Server.



Suraj R Patel (surajpatel@in.ibm.com), Senior Software Engineer, IBM

Suraj R. Patel is a QA engineer for the Hierarchical Data stage in the Information Server arena. He has worked on the Hierarchical Data stage for more than four years.



10 July 2014

Introduction

IBM SoftLayer, one of the largest cloud infrastructure providers in the world, provides an integrated platform for public or private virtualized servers and bare metal servers. The SoftLayer services are delivered as a unified service that can be managed from a web portal and can also be accessed through robust, full-featured APIs. SoftLayer Object Storage is a cloud storage solution that is ideal for cost-effective, fault-tolerant, scale-out storage needs.

IBM InfoSphere DataStage is a comprehensive information integration platform. It profiles, cleanses, and transforms data from heterogeneous data sources to deliver consistent and accurate business data. It is an ideal solution to integrate and synchronize enterprise data with cloud data.


This article shows how to use IBM InfoSphere DataStage to integrate with SoftLayer Object Storage based on the OpenStack Object Storage API. Examples demonstrate several business integration scenarios. The examples can help our customers and consultants solve their integration issues.

You can download the sample jobs that are used in this article.


Product overview

SoftLayer Object Storage provides a robust and highly scalable cloud storage solution to archive, search, and retrieve data across the Internet. SoftLayer Object Storage is based on the OpenStack Swift project. It enhances OpenStack Swift with indexing and searching capabilities for quick access, worldwide data distribution and replication, and a powerful management toolkit.

IBM InfoSphere DataStage integrates data from various sources by using a high-performance parallel framework. It supports metadata management and enterprise connectivity. The Hierarchical Data stage, a component of InfoSphere Information Server DataStage Version 11.3, transforms hierarchical data and invokes REST web services. The Hierarchical Data stage includes the steps that are shown in Figure 1.

Figure 1. Steps available in Hierarchical Data stage

The transformation steps perform operations on hierarchical data; for example, the HJoin step joins two hierarchical structures based on selected key fields.

The REST step invokes REST web services. The sample jobs in this article use the REST step to invoke the SoftLayer Object Storage services.

The XML Parser and Composer steps parse or compose XML data based on an imported XML schema.

The JSON Parser and Composer steps parse or compose JSON data based on a schema generated from JSON sample data.


OpenStack and SoftLayer Object Storage API

The OpenStack API is implemented as a set of REST web services for creating, modifying, and getting objects and metadata. Accounts, containers, and objects are all represented as REST resources, each identified by an HTTP URI. To manipulate resources, client applications communicate over HTTP and exchange representations of these resources. HTTP methods are used to create, update, get, or delete resources. The Object Storage API supports responses in various formats, such as text/plain, JSON, and XML.
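As a minimal illustration of this resource model, the following Python sketch builds the URIs of the three resource levels under a storage URL and lists how the standard OpenStack Object Storage (Swift) HTTP methods map to operations on them. The function name resource_url and the example URLs are hypothetical; the real storage URL is returned by the authentication service at run time.

```python
from typing import Optional
from urllib.parse import quote


def resource_url(storage_url: str, container: Optional[str] = None,
                 obj: Optional[str] = None) -> str:
    """Return the URI of the account, a container, or an object.

    storage_url is the account URI (the X-Storage-Url value). Container and
    object names are percent-encoded; slashes are kept in object names,
    which Swift treats as pseudo-directories.
    """
    if obj is not None and container is None:
        raise ValueError("an object URI needs a container")
    url = storage_url.rstrip("/")
    if container is not None:
        url += "/" + quote(container, safe="")
    if obj is not None:
        url += "/" + quote(obj, safe="/")
    return url


# Operations on account, container, and object resources, by HTTP method
OPERATIONS = {
    "GET": "read: list containers (account), list objects (container), or download an object",
    "PUT": "create: a container, or upload/replace an object",
    "POST": "update: set account, container, or object metadata",
    "DELETE": "delete: an empty container or an object",
    "HEAD": "read metadata only",
}
```

For example, resource_url(storage_url, "docs", "q1 report.txt") yields the object URI that a PUT uploads to and a GET downloads from.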

The SoftLayer API design is based on the OpenStack Object Storage API. Major enhancements include:

  • Different authentication points for public Internet and private network
  • Search Service API

The scenarios in this article provide step-by-step instructions for designing DataStage jobs to load documents to SoftLayer Object Storage and extract the documents from the storage.


Scenario 1. Loading files to SoftLayer Object Storage

This scenario uses a simple job to show the required steps to authenticate to the SoftLayer Object Storage account and then load files to it. Figure 2 shows the sample job.

Figure 2. DataStage job for loading files to SoftLayer Object Storage

The Sequential File stage that is called InputData reads a file that contains the container name, the file path, and the file name, as follows:

  • containerName contains the name of the container in the SoftLayer Object Storage where new files will be loaded.
  • filePath contains the path, on the local machine, of the file that will be uploaded.
  • fileName holds the name of the file that will be created in the SoftLayer Object Storage.

Figure 3 shows the input data that is being sent to the Hierarchical Data stage.

Figure 3. Data Browser view of the InputData Sequential File stage

Hierarchical Data stage design

A Hierarchical Data stage named LoadFiles is configured to load files to the SoftLayer account. You design the Hierarchical Data stage in the Assembly Editor, which you open by clicking Edit Assembly in the Hierarchical Data stage editor. The palette in the Assembly Editor contains all the steps that are available for the Hierarchical Data stage, each performing a specific function. You drag the required steps from the palette onto the Assembly Outline to design the data flow, as in Figure 4. The output of each step is passed as input to the steps that follow it. The Input step reads data from the input links, and the Output step writes to the output links.

Figure 4. Assembly Outline showing steps in the design

The Assembly Outline contains an Input step, two REST steps, and an Output step that perform the following tasks:

  1. The Input step reads the input data from the link named fileInput (the data that the InputData Sequential File stage outputs). The output of the step is passed to the REST step named Authenticate.
  2. The Authenticate REST step authenticates to the SoftLayer Object Storage account by sending a GET request to the authentication endpoint with the user name and API key in HTTP headers named X-Auth-User and X-Auth-Key. The response returns the authentication token and storage URL in the response headers X-Auth-Token and X-Storage-Url. The authentication token validates the user for all further communication with the service. The storage URL contains the full path to the Object Storage account where containers are created and files can be uploaded.
  3. The REST step named loadData:
    • Sends the authentication token as an HTTP header to validate itself.
    • Creates the file in a container under the storage URL. Both the file name and the container name come from the input flat file.
    • Loads the file data into it.
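The three tasks above can be sketched outside DataStage as the following Python function, under stated assumptions: the name run_load_job and the injected send transport are hypothetical, and the transport stands in for whatever HTTP client is used (for example, urllib.request). The sketch shows the ordering that the assembly enforces: a single authentication request, then one PUT per input row that reuses the token and storage URL.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import quote

# Hypothetical transport: (method, url, headers, body) -> (status, response_headers).
# A real implementation would wrap urllib.request or http.client.
Transport = Callable[[str, str, Dict[str, str], bytes], Tuple[int, Dict[str, str]]]


def run_load_job(auth_url: str, user: str, api_key: str,
                 rows: Iterable[Tuple[str, str, bytes]],
                 send: Transport) -> List[int]:
    """Authenticate once, then upload one object per input row.

    Each row is (containerName, fileName, file contents).
    Returns the HTTP status code of each upload.
    """
    # Authenticate step: GET with the credentials in custom headers
    status, headers = send("GET", auth_url,
                           {"X-Auth-User": user, "X-Auth-Key": api_key}, b"")
    if not 200 <= status < 300:
        raise RuntimeError(f"authentication failed with HTTP {status}")
    token = headers["X-Auth-Token"]
    storage_url = headers["X-Storage-Url"].rstrip("/")

    # loadData step: one PUT per row, validated by the token
    statuses = []
    for container, file_name, data in rows:
        url = f"{storage_url}/{quote(container, safe='')}/{quote(file_name, safe='')}"
        code, _ = send("PUT", url, {"X-Auth-Token": token}, data)
        statuses.append(code)
    return statuses
```

Injecting the transport keeps the request sequence testable with a fake server and leaves TLS settings to the real client.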

Configuring the Authenticate step

When you click the Authenticate step, the General tab appears in the middle pane of the Assembly Editor, where the REST method and URL are specified, as in Figure 5. GET is selected as the HTTP method, and the public authentication endpoint URL is specified as the URL.

Figure 5. General tab of Authenticate step

On the Security tab, you can configure security-related options and authentication mechanisms such as BASIC, DIGEST, LTPA, OAUTH, and SSL. The SoftLayer authentication endpoint does not use any of these HTTP authentication mechanisms (the credentials are sent in custom headers instead), but it requires SSL, so SSL must be enabled on the Security tab. As shown in Figure 6, the Enable SSL check box is selected so that the REST step communicates with the service over SSL, and the Accept self-signed certificates check box is selected so that self-signed server certificates are accepted.
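For readers reproducing the connection outside DataStage, the following Python sketch is a rough standard-library analogue of the two check boxes. How closely it matches the REST step's behavior is an assumption: Python has no switch that accepts only self-signed certificates, so the sketch turns verification off entirely, which also accepts any other untrusted certificate. The function name make_ssl_context is hypothetical.

```python
import ssl


def make_ssl_context(accept_self_signed: bool = False) -> ssl.SSLContext:
    """Return a client TLS context roughly analogous to the REST step options.

    accept_self_signed=False: normal verification against the system CA store
    (the "Enable SSL" check box alone).
    accept_self_signed=True: certificate and host name verification are
    switched off, which accepts self-signed certificates but also any other
    untrusted certificate.
    """
    ctx = ssl.create_default_context()
    if accept_self_signed:
        ctx.check_hostname = False      # must be disabled before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

The context can be passed to urllib.request.urlopen(request, context=ctx). When the server's self-signed certificate is available as a file, ssl.create_default_context(cafile=path_to_cert) trusts that certificate specifically and is a safer alternative to disabling verification.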

Figure 6. Security tab of Authenticate step

To authenticate the client, the OpenStack Object Storage API requires the client to send the user name and API key in the X-Auth-User and X-Auth-Key HTTP headers. To send these values, two custom headers named X-Auth-User and X-Auth-Key are created on the Request tab, as in Figure 7. Because the header values are provided at run time by using job parameters, the Map check box is selected, which makes the headers available on the Mappings tab for mapping to job parameters.
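A minimal Python sketch of the same request, assuming the credentials are supplied at run time through environment variables in place of DataStage job parameters. The variable names SL_USER and SL_API_KEY and the function name build_auth_request are hypothetical; the authentication URL is the public endpoint configured on the General tab.

```python
import os
import urllib.request


def build_auth_request(auth_url: str) -> urllib.request.Request:
    """GET request carrying the credentials in X-Auth-User and X-Auth-Key.

    The values are read at run time from environment variables, which play
    the role of the job parameters (the variable names are hypothetical).
    """
    user = os.environ["SL_USER"]
    api_key = os.environ["SL_API_KEY"]
    return urllib.request.Request(
        auth_url,
        method="GET",
        headers={"X-Auth-User": user, "X-Auth-Key": api_key},
    )
```

urllib.request stores header names in capitalized form (for example, X-auth-user). HTTP header names are case-insensitive, so the service receives the same headers.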

Figure 7. Request tab of Authenticate step

In Figure 8, two custom headers named X-Auth-Token and X-Storage-Url are created. The REST service sends many headers in the response, but only the headers that are needed for further processing are specified in the Custom headers section of the Response tab. In this example, the X-Auth-Token header carries the token that the REST service uses to validate the client on subsequent requests, and the X-Storage-Url header contains the full path to the Object Storage account where containers are created and files can be uploaded.
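The selection of response headers can be sketched in Python as follows. The function name read_auth_headers is hypothetical; it assumes the headers arrive as an email.message.Message, which is the base class of the http.client.HTTPMessage object that urllib.request.urlopen exposes as resp.headers. Lookups on that object are case-insensitive, which matters because servers may spell the header X-Storage-Url or X-Storage-URL.

```python
from email.message import Message
from typing import Tuple


def read_auth_headers(headers: Message) -> Tuple[str, str]:
    """Return (token, storage_url) from an authentication response.

    Only the two headers needed by later steps are read; every other
    response header is ignored, as on the Response tab.
    """
    token = headers.get("X-Auth-Token")          # case-insensitive lookup
    storage_url = headers.get("X-Storage-Url")
    if token is None or storage_url is None:
        raise KeyError("response lacks X-Auth-Token or X-Storage-Url")
    return token, storage_url
```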

Figure 8. Response tab of Authenticate step

The Mappings tab lists all the URL parameters, headers, and cookies in the REST step configuration that need values mapped to them. As in Figure 9, the RESTRequests list is mapped to the top list, which is the default mapping. The REST step sends one request for each occurrence of the source list that RESTRequests is mapped to; because top occurs only once in the example, only one REST request is sent. The X-Auth-User and X-Auth-Key headers that were created on the Request tab are mapped to job parameters whose values are provided at run time.

Figure 9. Mappings tab of Authenticate step

Every step in the assembly has an input schema and an output schema, which are shown on the right side of the Assembly Editor. The Input tab shows the structure that flows into the step, and the Output tab shows that input structure together with the output structure that describes the processing result of the step. Figure 10 shows the GET response result of the Authenticate step. The headers that are defined in the Custom headers section of the Response tab appear under the headers group on the Output tab, where subsequent steps can use them.

Figure 10. Output schema tab of Authenticate step

Configuring the loadData step

The loadData step sends the X-Auth-Token that is received from the Authenticate step as an HTTP header in the REST request so that the service can validate the client. The URL in the loadData step is specified by using three local parameters: urlFromAuth, Container, and FileName. The urlFromAuth parameter takes the value from the storage URL received in the Authenticate step. The Container and FileName parameters take their values from the columns containerName and fileName of the input step. When the SoftLayer Object Storage service receives the REST request, it creates the file with the name mentioned in the FileName parameter in the container that is specified by the urlFromAuth and Container parameters and saves the uploaded data from the client to the file.

In Figure 11, the PUT method is selected in the HTTP method field, and a URL built from local parameters is specified in the URL field. Local parameters differ from job parameters: a local parameter is available only within the REST step where it is defined, whereas job parameters are available in all the stages of the job. Local parameters are enclosed in angle brackets and can be created either by typing the parameter name within angle brackets or by using the Insert Parameter window. The local parameters that are defined in the URL field are available for mapping to source elements on the Mappings tab under the urlParameters group. In this example, the storage URL value is generated at run time from the response header of the Authenticate step, and the values for the Container and FileName parameters are fetched from the flat file.
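The URL in Figure 11 is not reproduced in the text. Assuming it has the form <urlFromAuth>/<Container>/<FileName>, the following Python sketch shows how a template of angle-bracket local parameters expands into a request URL. The function expand_url, and its choice to percent-encode container and file names while inserting the storage URL unchanged, are assumptions for illustration, not a description of how the REST step encodes values.

```python
import re
from typing import AbstractSet, Mapping
from urllib.parse import quote

# A local parameter is a name enclosed in angle brackets, such as <Container>
_PARAM = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def expand_url(template: str, values: Mapping[str, str],
               raw: AbstractSet[str] = frozenset({"urlFromAuth"})) -> str:
    """Replace each <name> in template with values[name].

    Names listed in raw (here, the full storage URL) are inserted as-is;
    every other value is percent-encoded as a single path segment.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value mapped to local parameter <{name}>")
        value = values[name]
        return value if name in raw else quote(value, safe="")

    return _PARAM.sub(substitute, template)
```

An unmapped parameter raises an error instead of producing a malformed URL, which mirrors the requirement that every local parameter be mapped on the Mappings tab.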

On the Security tab, the Enable SSL and Accept self-signed certificates check boxes are selected.

Figure 11. General tab of loadData step

In Figure 12, the options Load the outgoing body from: and A file whose path is set in Mappings as bodyFilePath are both selected. Selecting A file whose path is set in Mappings as bodyFilePath creates a node named bodyFilePath on the Mappings tab; that node is mapped to the source column that contains the path of the file to send as the request body. Content type denotes the content type of the request body and is sent to the service as a request header. For Content type, text/ and plain are selected (that is, text/plain) because the file being loaded to the service is a text file, and US-ASCII is selected from the Encoding type list. In the Custom headers section, an X-Auth-Token header is created, which is mapped to the token received in the Authenticate step.
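A minimal Python sketch of an equivalent upload request, assuming the Content type and Encoding type selections are sent as the single header value text/plain; charset=US-ASCII (the exact header the REST step emits is not shown in the article). The function name build_upload_request is hypothetical; object_url is the fully expanded URL and token is the X-Auth-Token value from the authentication response.

```python
import urllib.request
from pathlib import Path


def build_upload_request(object_url: str, token: str,
                         body_file_path: str) -> urllib.request.Request:
    """PUT request whose body is read from a local file.

    Mirrors the Request tab: the body comes from the file at bodyFilePath,
    the content type is text/plain with US-ASCII encoding, and the token
    from the Authenticate step is sent in X-Auth-Token.
    """
    data = Path(body_file_path).read_bytes()
    return urllib.request.Request(
        object_url,
        data=data,
        method="PUT",
        headers={
            "Content-Type": "text/plain; charset=US-ASCII",
            "X-Auth-Token": token,
        },
    )
```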

Figure 12. Request tab of loadData step

No configuration is required on the Response tab because the response from the server is not required to be captured for this job.

In the Target column of Figure 13, urlFromAuth maps to X-Storage-Url and X-Auth-Token maps to X-Auth-Token, as received from the Authenticate step. Container maps to the containerName column, FileName maps to the fileName column, and bodyFilePath maps to the filePath column; these three columns come from the Sequential File stage.

Figure 13. Mappings tab of loadData step

Figure 14 shows the file that is uploaded to the SoftLayer Object Storage account after the job is executed.

Figure 14. Files that are loaded to SoftLayer Object Storage account

Scenario 2. Extracting data from SoftLayer Object Storage

This scenario uses an example job to illustrate the steps that are required to authenticate to the SoftLayer Object Storage account and extract data from the account. Figure 15 shows the sample job.

Figure 15. DataStage job to extract data from SoftLayer object storage

The Sequential File stage that is named InputData reads a file that contains the container name and the file name that will be extracted from the SoftLayer Object Storage account.

Figure 16 shows the input data that is being sent to the Hierarchical Data stage.

Figure 16. Data Browser View of InputData Sequential File stage

Hierarchical Data stage design

The Hierarchical Data stage is configured to extract files from the SoftLayer Object Storage account. Figure 17 shows the Assembly outline for the steps in our example design.

Figure 17. Assembly Outline steps for the design

The Assembly Outline contains an Input step, two REST steps, and an Output step that perform the following tasks:

  1. The Input step reads the container name and file name from the input link.
  2. The Authenticate step authenticates to the service by sending the user name and API key as request headers. In response, it receives the authentication token and the storage URL as HTTP headers.
  3. The getData step uses the authentication token received from the previous step to authenticate, and gets the files that are listed in the input flat file from the SoftLayer Object Storage account.

Scenario 1 explains the configuration of the Authenticate step. It sends the user name and API key to the service in the X-Auth-User and X-Auth-Key HTTP headers and receives the authentication token and storage URL in the X-Auth-Token and X-Storage-Url response headers.

Configuring the getData step

The getData step sends the X-Auth-Token received from the Authenticate step as a request header so that the service can validate the client. The getData step builds the request URL from the storage URL received from the Authenticate step and the local parameters Container and FileName, which take their values from the containerName and fileName columns of the input step, and then sends a GET request to that URL to get the data.
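A minimal Python sketch of the request that the getData step sends, assuming the URL is the storage URL followed by the container name and file name. The function name build_download_request is hypothetical. The request carries only the X-Auth-Token header and has no body.

```python
import urllib.request
from urllib.parse import quote


def build_download_request(storage_url: str, token: str,
                           container: str, file_name: str) -> urllib.request.Request:
    """GET request for one object: no body, only the X-Auth-Token header."""
    url = "/".join([storage_url.rstrip("/"),
                    quote(container, safe=""),
                    quote(file_name, safe="")])
    return urllib.request.Request(url, method="GET",
                                  headers={"X-Auth-Token": token})
```

Calling urllib.request.urlopen(req).read() on the result returns the object's contents as bytes.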

In Figure 18, the GET HTTP method is selected and the URL composed of local parameters is specified. This step sends a GET request with the exact URL of the resource to be fetched.

Figure 18. General tab of getData step

In Figure 19, Enable SSL and Accept self-signed certificates are selected.

Figure 19. Security tab of getData step

In Figure 20, the X-Auth-Token custom header is created; on the Mappings tab, it is mapped to the authentication token received from the Authenticate step. Because the request has no body, Load the outgoing body from is not selected.

Figure 20. Request tab of getData step

In Figure 21, Pass the received body to and A file whose path is set below are selected, and the Output directory, Filename prefix, and File extension are specified in the respective text boxes. The data that the getData step receives from the service is written directly to the specified output directory, in files named with the specified file name prefix and extension. For Content type, text/ and plain are selected because the data extracted from the SoftLayer Object Storage service is in text format.
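The file-writing behavior can be sketched in Python as follows. The article does not say how the REST step makes the generated file names unique, so the sequence-number scheme prefix + n + extension used here is a hypothetical stand-in, as is the function name save_bodies.

```python
from pathlib import Path
from typing import Iterable, List


def save_bodies(bodies: Iterable[bytes], output_dir: str,
                prefix: str, extension: str) -> List[Path]:
    """Write each response body to output_dir/<prefix><n>.<extension>.

    The numbering (n = 1, 2, ...) is a hypothetical naming scheme.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, body in enumerate(bodies, start=1):
        path = out / f"{prefix}{n}.{extension.lstrip('.')}"
        path.write_bytes(body)          # bodies are written unchanged
        paths.append(path)
    return paths
```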

Figure 21. Response tab of getData step

In the Target column of Figure 22, urlFromAuth maps to the X-Storage-Url and X-Auth-Token maps to the X-Auth-Token. X-Storage-Url and X-Auth-Token are received from the Authenticate step. Container and FileName are mapped to the containerName and fileName columns that come from the input Sequential File stage.

Figure 22. Mappings tab of getData step

The job fetches all the files that are mentioned in the input flat file.


Conclusion

This article explained in detail how to design DataStage jobs that load data to, and extract data from, a SoftLayer Object Storage account. You can now modify the sample jobs to perform similar data integration operations with SoftLayer Object Storage. The article also explored the main features of the new REST step in InfoSphere DataStage 11.3, including parameterized URLs, security configuration, and request and response configuration.


Acknowledgements

We would like to thank Deepa Ramamurthy Yarangatta and Poonam Sharma for their feedback and reviews of this article.


Downloads

Description                        Name                Size
The job explained in Scenario 1    HttpsLoadFiles.zip  12KB
The job explained in Scenario 2    HttpsGetFiles.zip   12KB

