Integrate data with SoftLayer Object Storage by using IBM InfoSphere DataStage
IBM SoftLayer, one of the largest cloud infrastructure providers in the world, provides an integrated platform for public or private virtualized servers and bare metal servers. The SoftLayer services are delivered as a unified service that can be managed from a web portal. The services can also be accessed via robust and full featured APIs. SoftLayer Object Storage is a cloud storage solution ideal for cost effective, fault tolerant, and scale-out storage needs.
IBM InfoSphere DataStage is a comprehensive information integration platform. It profiles, cleanses, and transforms data from heterogeneous data sources to deliver consistent and accurate business data. It is an ideal solution to integrate and synchronize enterprise data with cloud data.
This article shows how to use IBM InfoSphere DataStage to integrate with SoftLayer Object Storage based on the OpenStack Object Storage API. Examples demonstrate several business integration scenarios. The examples can help our customers and consultants solve their integration issues.
You can download the sample jobs that are used in this article.
SoftLayer Object Storage provides a robust and highly scalable cloud storage solution to archive, search, and retrieve data across the Internet. SoftLayer Object Storage is based on the OpenStack Swift project. SoftLayer Object Storage enhances the OpenStack Swift with indexing and searching capabilities for quick access, worldwide data distribution and replication, and a powerful management toolkit.
IBM InfoSphere DataStage integrates data from various sources by using a high performance parallel framework. It supports metadata management and enterprise connectivity. The Hierarchical Data stage, a component of Information Server DataStage Version 11.3, is responsible for transforming hierarchical data and invoking Representational State Transfer (REST) web services. The Hierarchical Data stage includes the steps that are shown in Figure 1.
Figure 1. Steps available in Hierarchical Data stage
The transformation steps perform transformation operations on hierarchical data. For example, the HJoin step joins two hierarchical structures based on the selected key fields.
The REST step invokes REST web services. The sample jobs in this article use the REST step to invoke the SoftLayer Object Storage services.
The XML Parser and Composer steps parse or compose XML data based on an imported XML schema.
The JSON Parser and Composer steps parse or compose JSON data based on a schema generated from JSON sample data.
OpenStack and SoftLayer Object Storage API
The OpenStack API is implemented as a set of REST web services for creating, modifying, and getting objects and metadata. Accounts, containers, and objects are all represented as REST resources. A REST resource is identified by an HTTP URI. To manipulate resources, client applications communicate over HTTP and exchange representations of these resources. HTTP methods are used to create, update, get, or delete resources. The Object Storage API supports responses in various formats such as text/plain, JSON, or XML.
The SoftLayer API design is based on the OpenStack Object Storage API. Major enhancements include:
- Different authentication points for public Internet and private network
- Search Service API
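The account, container, and object hierarchy described above can be sketched as a small URL builder. This is a minimal illustration, not part of the sample jobs: the storage URL and the helper name resource_url are hypothetical, and the method table lists only the common object operations of the OpenStack Object Storage API.

```python
from urllib.parse import quote


def resource_url(storage_url, container=None, obj=None):
    """Build the URL of an account, container, or object resource."""
    parts = [storage_url.rstrip("/")]
    if container is not None:
        parts.append(quote(container, safe=""))
        if obj is not None:
            # Object names may contain slashes, so those are left unescaped.
            parts.append(quote(obj, safe="/"))
    return "/".join(parts)


# HTTP methods and the operation each one performs on an object resource.
OBJECT_METHODS = {
    "PUT": "create or replace the object",
    "GET": "download the object",
    "HEAD": "read the object metadata",
    "DELETE": "delete the object",
}

# Hypothetical storage URL, used for illustration only.
STORAGE_URL = "https://storage.example.com/v1/AUTH_demo"

account_url = resource_url(STORAGE_URL)
container_url = resource_url(STORAGE_URL, "invoices")
object_url = resource_url(STORAGE_URL, "invoices", "2014/jan report.txt")
```

Each level of the hierarchy adds one path segment to the storage URL, which is why the jobs in this article only need the storage URL, a container name, and a file name to address an object.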
The scenarios in this article provide step-by-step instructions for designing DataStage jobs to load documents to SoftLayer Object Storage and extract the documents from the storage.
Scenario 1. Loading files to SoftLayer Object Storage
This scenario uses a simple job to show the required steps to authenticate to the SoftLayer Object Storage account and then load files to it. Figure 2 shows the sample job.
Figure 2. DataStage job for loading files to SoftLayer Object Storage
The Sequential File stage that is called InputData reads a file that contains the container name, the file path, and the file name, as follows:
- containerName contains the name of the container in the SoftLayer Object Storage where new files will be loaded.
- filePath contains the path, on the local machine, of the file that will be uploaded.
- fileName holds the name of the file that will be created in the SoftLayer Object Storage.
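To make the shape of the input concrete, the following sketch reads rows with these three columns. The comma delimiter, the header row, and the sample values are assumptions for illustration; the actual file layout is whatever the Sequential File stage in the sample job is configured to read.

```python
import csv
import io

# Hypothetical input file content: header row plus one row per file to upload.
SAMPLE_INPUT = """containerName,filePath,fileName
docs,/tmp/in/report.txt,report.txt
docs,/tmp/in/notes.txt,notes.txt
"""


def read_rows(text):
    """Parse the delimited text into one dictionary per input row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_INPUT)
for row in rows:
    print(row["containerName"], row["filePath"], row["fileName"])
```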
Figure 3 shows the input data that is being sent to the Hierarchical Data stage.
Figure 3. Data Browser view of the InputData Sequential File stage
Hierarchical Data stage design
A Hierarchical Data stage that is named LoadFiles is configured to load files to the SoftLayer account. The design of the Hierarchical Data stage is created in the Assembly Editor, which you can open by clicking Edit Assembly in the Hierarchical Data stage editor. The palette in the Assembly Editor contains all the steps available for the Hierarchical Data stage; each step performs a unique function. The required steps are dragged from the palette onto the Assembly Outline to design the data flow, as in Figure 4. The output of one step is passed to the following steps as input. The Input step reads data from the input links and the Output step can write to the output links.
Figure 4. Assembly Outline showing steps in the design
The Assembly Outline contains an Input step, two REST steps, and an Output step that perform the following tasks:
- The Input step reads the input data from the link named fileInput (the data that the InputData Sequential File stage outputs). The output of the step is passed to the REST step named Authenticate.
- The Authenticate REST step authenticates to the SoftLayer Object Storage account by sending a GET request to the authentication end point with the user name and API key as HTTP headers named X-Auth-User and X-Auth-Key. The response provides the authentication token and storage URL as the response headers named X-Auth-Token and X-Storage-Url. The authentication token is used to validate the user for all further communication with the service. The storage URL contains the full path to the Object Storage account where the containers are created and files can be uploaded.
- The REST step named loadData:
  - Sends the authentication token as an HTTP header to validate itself.
  - Creates the file in the container, under the path that is given by the storage URL. Both the file name and the container name come from the input flat file.
  - Loads the data to the file.
Configuring the Authenticate step
When you click the Authenticate step, the General tab appears in the middle pane of the Assembly Editor, where the REST method and URL are specified, as in Figure 5. The GET method is selected for the HTTP method and a public authentication end-point URL is specified for the URL.
Figure 5. General tab of Authenticate step
On the Security tab, you can configure various security-related options and authentication mechanisms such as BASIC, DIGEST, LTPA, OAUTH, and SSL. The SoftLayer service does not use any of these authentication mechanisms, but it is enabled with SSL, so SSL must be specified on the Security tab. As shown in Figure 6, the Enable SSL check box is selected so the REST step communicates with the service by using SSL. The Accept self-signed certificates check box is selected so that self-signed certificates can be accepted.
Figure 6. Security tab of Authenticate step
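For readers who want to try the same calls outside DataStage, the two check boxes can be approximated with the Python standard library. This is a rough analogue under stated assumptions, not a description of how the REST step implements the options: the function name build_ssl_context is hypothetical, and turning off certificate verification accepts any certificate, which is broader than accepting only self-signed ones.

```python
import ssl


def build_ssl_context(accept_self_signed):
    """Return an SSL context; optionally skip certificate verification."""
    context = ssl.create_default_context()
    if accept_self_signed:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = build_ssl_context(accept_self_signed=False)
lenient = build_ssl_context(accept_self_signed=True)
```

A context built this way can be passed to urllib.request.urlopen through its context argument.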
To authenticate the client, the OpenStack Object Storage API requires the client to send the user name and API key in the X-Auth-User and X-Auth-Key HTTP headers. To send these values, two custom headers that are named X-Auth-User and X-Auth-Key are created on the Request tab, as in Figure 7. The values for the headers are provided at run time by using job parameters, so the Map check box is selected. The headers are then available on the Mappings tab to be mapped to job parameters.
Figure 7. Request tab of Authenticate step
In Figure 8, two custom headers that are named X-Auth-Token and X-Storage-Url are created. The REST service sends many headers in the response, but only the headers that are needed for further processing are specified in the Custom headers section of the Response tab. In this example, the X-Auth-Token header is a token that the REST service uses to validate the client for subsequent requests. The X-Storage-Url header contains the full path to the Object Storage account where the containers are created and the files can be uploaded.
Figure 8. Response tab of Authenticate step
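The request and response headers of the Authenticate step can be sketched in plain Python. This is a minimal sketch, assuming a hypothetical end-point URL and hypothetical credentials; the helper names are not part of DataStage or of any SoftLayer library. The request is only built here, not sent.

```python
import urllib.request

# Hypothetical authentication end point, for illustration only.
AUTH_URL = "https://auth.example.com/auth/v1.0"


def build_auth_request(user, api_key, auth_url=AUTH_URL):
    """Build the GET request that carries the credentials as headers."""
    return urllib.request.Request(
        auth_url,
        method="GET",
        headers={"X-Auth-User": user, "X-Auth-Key": api_key},
    )


def extract_session(response_headers):
    """Keep only the two response headers that later steps need."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return lowered["x-auth-token"], lowered["x-storage-url"].rstrip("/")


auth_request = build_auth_request("demo-account:demo-user", "demo-api-key")

# A made-up response, standing in for what the service would return.
token, storage_url = extract_session({
    "X-Auth-Token": "AUTH_tk_demo",
    "X-Storage-Url": "https://storage.example.com/v1/AUTH_demo/",
    "Content-Length": "0",
})
```

To send the request, pass auth_request to urllib.request.urlopen and give the headers attribute of the response to extract_session.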
The Mappings tab lists all the URL parameters, headers, and cookies that are used in the configuration of the REST step and that need values mapped to them. As in Figure 9, the RESTRequests list is mapped to the top list, which is the default mapping. The number of REST requests that are sent to the service equals the number of occurrences of the source list to which RESTRequests is mapped. Because top occurs only once in the example, only one REST request is sent. The X-Auth-User and X-Auth-Key headers, which were created on the Request tab, are mapped to job parameters whose values are provided at run time.
Figure 9. Mappings tab of Authenticate step
Every step in the assembly design contains Input and Output schema that are shown on the right side of the Assembly Editor GUI. The Input tab shows the input structure that is coming to a step, and the Output tab shows both the input structure and output structure that describe the processing result of the step. Figure 10 shows the GET response result of the Authenticate step. The headers that are defined in the Custom headers section of the Response tab appear under the headers group in the Output tab that can be used in subsequent steps.
Figure 10. Output schema tab of Authenticate step
Configuring the loadData step
The loadData step sends the X-Auth-Token that is received from the Authenticate step as an HTTP header in the REST request so that the service can validate the client. The URL in the loadData step is specified by using three local parameters: urlFromAuth, Container, and FileName. The urlFromAuth parameter takes its value from the storage URL that is received in the Authenticate step. The Container and FileName parameters take their values from the containerName and fileName columns of the Input step. When the SoftLayer Object Storage service receives the REST request, it creates a file with the name given by the FileName parameter in the container that is specified by the Container parameter, and saves the data uploaded by the client to that file.
In Figure 11, the PUT method is selected in the HTTP method field. In the URL field, a URL consisting of local parameters is specified. Local parameters differ from job parameters: a local parameter is available only within the REST step where it is defined, whereas job parameters are available in all the stages of the job. Local parameters are written within angle brackets and can be created either by typing the parameter name within angle brackets or from the Insert Parameter window. The local parameters that are defined in the URL field are available for mapping to the source elements on the Mappings tab under the urlParameters group. In this example, the value for the storage URL is generated at run time from the response header of the Authenticate step, and the values for the Container and FileName parameters are fetched from the flat file.
On the Security tab, the Enable SSL and Accept self-signed certificates check boxes are selected.
Figure 11. General tab of loadData step
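The way angle-bracket local parameters are resolved into a concrete URL can be imitated with a short template expander. This is a sketch under assumptions: the template layout shown is inferred from the parameter names in this article, the values are made up, and expand_url is a hypothetical helper rather than anything the REST step exposes.

```python
import re

# Matches a parameter name written within angle brackets, such as <FileName>.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def expand_url(template, values):
    """Replace every <name> in the template with its mapped value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value mapped to local parameter <{name}>")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Assumed URL layout built from the three local parameters.
URL_TEMPLATE = "<urlFromAuth>/<Container>/<FileName>"

upload_url = expand_url(URL_TEMPLATE, {
    "urlFromAuth": "https://storage.example.com/v1/AUTH_demo",
    "Container": "docs",
    "FileName": "report.txt",
})
```

A parameter without a mapped value raises an error in this sketch, which mirrors the need to map every local parameter on the Mappings tab.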
In Figure 12, the options Load the outgoing body from: and A file whose path is set in Mappings as bodyFilePath are both selected. Selecting A file whose path is set in Mappings as bodyFilePath creates a node named bodyFilePath on the Mappings tab. That node is mapped to the source column that contains the path of the file to be sent as the request body. Content type denotes the content type of the request body; it is sent as a request header to the service. For Content type, text/ and plain are selected because the content that is being loaded to the service is a text file. US-ASCII is selected from the Encoding type list. In the Custom headers section, X-Auth-Token is created, which is mapped to the token received in the Authenticate step.
Figure 12. Request tab of loadData step
No configuration is required on the Response tab because the response from the server is not required to be captured for this job.
In the Target column of Figure 13, urlFromAuth maps to X-Storage-Url and X-Auth-Token maps to X-Auth-Token, as received from the Authenticate step. Container maps to the containerName column, FileName maps to the fileName column, and bodyFilePath maps to the filePath column, all of which come from the Sequential File stage.
Figure 13. Mappings tab of loadData step
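Taken together, the General, Request, and Mappings settings of the loadData step describe a PUT request like the one sketched below. The storage URL, token, and file content are made up, build_upload_request is a hypothetical helper, and the exact Content-Type header value that the REST step sends is an assumption. The request is built but not sent.

```python
import tempfile
import urllib.request
from pathlib import Path


def build_upload_request(storage_url, token, container, file_name, body_file_path):
    """Build the PUT request that uploads one local file as an object."""
    body = Path(body_file_path).read_bytes()
    url = f"{storage_url.rstrip('/')}/{container}/{file_name}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "X-Auth-Token": token,
            # Assumed header value for a US-ASCII text body.
            "Content-Type": "text/plain; charset=US-ASCII",
        },
    )


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the file named by the filePath column.
    sample = Path(tmp) / "report.txt"
    sample.write_text("hello object storage\n", encoding="ascii")
    upload_request = build_upload_request(
        "https://storage.example.com/v1/AUTH_demo",  # hypothetical storage URL
        "AUTH_tk_demo",                              # hypothetical token
        "docs",
        "report.txt",
        sample,
    )
```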
Figure 14 shows the file that is uploaded to the SoftLayer Object Storage account after the job is executed.
Figure 14. Files that are loaded to SoftLayer Object Storage account
Scenario 2. Extracting data from SoftLayer Object Storage
This scenario uses an example job to illustrate the steps that are required to authenticate to the SoftLayer Object Storage account and extract data from the account. Figure 15 shows the sample job.
Figure 15. DataStage job to extract data from SoftLayer object storage
The Sequential File stage that is named InputData reads a file that contains the container name and the file name that will be extracted from the SoftLayer Object Storage account.
Figure 16 shows the input data that is being sent to the Hierarchical Data stage.
Figure 16. Data Browser View of InputData Sequential File stage
Hierarchical Data stage design
The Hierarchical Data stage is configured to extract files from the SoftLayer Object Storage account. Figure 17 shows the Assembly outline for the steps in our example design.
Figure 17. Assembly Outline steps for the design
The Assembly Outline contains an Input step, two REST steps, and an Output step that perform the following tasks:
- The Input step reads the container name and file name from the input link.
- The Authenticate step authenticates to the service by sending the user name and API key as request headers. In response, it gets the authentication token as an HTTP header.
- The getData step uses the authentication token that is received from the previous step to authenticate, and gets the files that are listed in the input flat file from the SoftLayer Object Storage account.
Scenario 1 explains the configuration of the Authenticate step. The step sends the user name and API key to the service as the HTTP headers X-Auth-User and X-Auth-Key, and receives the authentication token and storage URL as the response headers X-Auth-Token and X-Storage-Url.
Configuring the getData step
The getData step sends the X-Auth-Token that is received from the Authenticate step as a request header so that the service can validate the client. The step builds its URL from the storage URL that is received from the Authenticate step and from the Container and FileName local parameters, which take their values from the containerName and fileName columns of the Input step. It then sends a GET request with that URL to get the data.
In Figure 18, the GET HTTP method is selected and the URL composed of local parameters is specified. This step sends a GET request with the exact URL of the resource to be fetched.
Figure 18. General tab of getData step
In Figure 19, Enable SSL and Accept self-signed certificates are selected.
Figure 19. Security tab of getData step
In Figure 20, the X-Auth-Token custom header is created; on the Mappings tab, it is mapped to the authentication token that is received from the Authenticate step. Because the request has no body, Load the outgoing body from is not selected.
Figure 20. Request tab of getData step
In Figure 21, Pass the received body to and A file whose path is set below are selected. The Output directory, Filename prefix, and File extension are specified in the respective text boxes. Whatever data the getData step receives from the service is directly written to the specified output directory, and the files are named with the specified file name prefix and extension. Here, text/ and plain are selected for Content type because the data that is getting extracted from the SoftLayer Object Storage service is in text format.
Figure 21. Response tab of getData step
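Writing each received body to a file in the output directory can be sketched as follows. The article does not state how the REST step makes file names unique, so the running counter between the prefix and the extension is purely an assumption, as are the helper name save_body and the sample bodies.

```python
import tempfile
from pathlib import Path


def save_body(body, output_dir, prefix, extension, index):
    """Write one response body to <output_dir>/<prefix><index><extension>."""
    target = Path(output_dir) / f"{prefix}{index}{extension}"
    target.write_bytes(body)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the bodies returned by two GET requests.
    bodies = [b"first document\n", b"second document\n"]
    saved = [
        save_body(body, tmp, "extract_", ".txt", index)
        for index, body in enumerate(bodies, start=1)
    ]
    saved_names = [path.name for path in saved]
    saved_text = [path.read_text(encoding="ascii") for path in saved]
```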
In the Target column of Figure 22, urlFromAuth maps to the X-Storage-Url and X-Auth-Token maps to the X-Auth-Token. X-Storage-Url and X-Auth-Token are received from the Authenticate step. Container and FileName are mapped to the containerName and fileName columns that come from the input Sequential File stage.
Figure 22. Mappings tab of getData step
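The mappings of the getData step amount to one GET request per input row, each carrying the same token. The sketch below builds such requests without sending them; the storage URL, token, and rows are made up, and build_download_requests is a hypothetical helper.

```python
import urllib.request


def build_download_requests(storage_url, token, rows):
    """Build one GET request per (containerName, fileName) input row."""
    base = storage_url.rstrip("/")
    return [
        urllib.request.Request(
            f"{base}/{row['containerName']}/{row['fileName']}",
            method="GET",
            headers={"X-Auth-Token": token},
        )
        for row in rows
    ]


download_requests = build_download_requests(
    "https://storage.example.com/v1/AUTH_demo",  # hypothetical storage URL
    "AUTH_tk_demo",                              # hypothetical token
    [
        {"containerName": "docs", "fileName": "report.txt"},
        {"containerName": "docs", "fileName": "notes.txt"},
    ],
)
```

Each request has no body, which corresponds to leaving Load the outgoing body from cleared on the Request tab.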
The job fetches all the files that are mentioned in the input flat file.
This article explained in detail how to design DataStage jobs to load data to a SoftLayer Object Storage account. You learned how to extract data from a SoftLayer Object Storage account by using DataStage. You can now modify the sample jobs to perform similar data integration operations with SoftLayer Object Storage. We also explored the main features of the new REST step in InfoSphere DataStage 11.3, including parameterized URLs, security configuration, and request and response configurations.
We would like to thank Deepa Ramamurthy Yarangatta and Poonam Sharma for their feedback and reviews of this article.
- Get more details on Hierarchical stage (XML Stage) of the IBM Information Server.
- Learn more about the REST Step that invokes REST web services from IBM InfoSphere DataStage.
- "An Introduction to Object Storage" (SoftLayer, 2014) discusses a robust and highly scalable cloud storage solution to archive, manage, and serve a large quantity of structured and unstructured data.
- The OpenStack Object Storage API V1 Reference discusses how OpenStack API is implemented as a set of Representational State Transfer (REST) web services for creating, modifying, and getting objects and metadata.