IBM Support

Understanding the use cases of z/TPF support for MongoDB

Technical Blog Post



Body

In November of 2015, IBM delivered z/TPF support for MongoDB.  This support allows standard, unmodified remote MongoDB clients to access and update z/TPF data by connecting directly to a z/TPF server for MongoDB.   Data that resides in z/TPF is binary data and customer-written z/TPF applications contain the knowledge of what the data is.   Using z/TPF support for MongoDB, the binary data in z/TPF is returned as a document in Binary JavaScript Object Notation (BSON) format which is easily consumed by remote systems.   No application changes are required on z/TPF.  This initial deliverable provides support for z/TPFDF databases.  

To represent the z/TPF binary data as a document, the binary data must be described by using Data Format Description Language (DFDL). The DFDL description is then loaded to the z/TPF system and is used by z/TPF support for MongoDB to transform the binary data into a document, and vice versa. You need to describe your data only once per z/TPFDF file; after the description is loaded to the z/TPF system, that z/TPFDF file is accessible by remote MongoDB clients. In addition to enabling access to the z/TPF data by MongoDB clients, describing your z/TPF data in DFDL provides the following benefits:

  • You can send z/TPF business events for the z/TPF data in Common Base Event (CBE), XML, or JSON format.
  • Remote platforms that load the same DFDL can consume z/TPF binary data and understand its contents.
  • Future IBM deliverables may use the DFDL descriptions of z/TPF data.

The MongoDB server that runs on z/TPF is not a traditional MongoDB server implementation that accesses documents stored in MongoDB format. Instead, it is code written by IBM that processes MongoDB client requests and converts between z/TPF binary data and a format that a remote MongoDB client can consume.
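
The conversion between binary data and a document can be pictured with a minimal Python sketch. It does not use DFDL or any z/TPF interface; the record layout (a 4-byte ID, a 10-byte name, a 2-byte tier), the field names, and the use of the cp037 EBCDIC code page are all assumptions made for illustration.

```python
import struct

# Hypothetical fixed layout, big-endian with no padding:
# 4-byte unsigned ID, 10-byte text field, 2-byte unsigned tier.
RECORD_FORMAT = ">I10sH"
NAME_LENGTH = 10


def record_to_document(raw: bytes) -> dict:
    """Turn one binary record into a document-like dict."""
    customer_id, name, tier = struct.unpack(RECORD_FORMAT, raw)
    return {
        "customerId": customer_id,
        # Assumption: the text field is EBCDIC (code page 037), space padded.
        "name": name.decode("cp037").rstrip(),
        "tier": tier,
    }


def document_to_record(doc: dict) -> bytes:
    """Turn the document back into the same binary layout."""
    name = doc["name"].ljust(NAME_LENGTH).encode("cp037")
    return struct.pack(RECORD_FORMAT, doc["customerId"], name, doc["tier"])


doc = {"customerId": 42, "name": "ALICE", "tier": 3}
raw = document_to_record(doc)
round_trip = record_to_document(raw)
```

In the real product the layout knowledge lives in the DFDL description rather than in code, which is why the z/TPF applications themselves do not change.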

Remote access to z/TPF data

Before z/TPF support for MongoDB, there were two primary mechanisms to access data on z/TPF.

Web services directly into z/TPF

A z/TPF web service is a means for creating custom access to the z/TPF system. A remote client can access z/TPF through a web service that is written and maintained by the z/TPF customer. A good use case for the z/TPF service-based model is when the access consists of complex operations that require significant I/O to perform a business function. For example, a reservation request can be done through a service. However, because of the lack of standard access to data residing on the z/TPF system, simple web services have also been created to access or update one specific record or an individual field within a record. When only simple access to the z/TPF system is required, developing and maintaining a custom service for each of these operations becomes too expensive.

Replicated z/TPF data on remote platforms

The z/TPF data can be replicated on a remote platform to provide standard access to the z/TPF data from remote clients.  The replicated z/TPF data needs to be transformed into another format to allow distributed applications to access the data in a standard way.

Replication of z/TPF data is generally accomplished through an event-based or time-initiated batch job infrastructure on z/TPF.  All of the updates made on the z/TPF system must be pushed out and transformed regardless of how much of the data was actually altered.  Access to replicated z/TPF data is by definition read only and can contain stale data because of delays between the time the primary z/TPF copy of the data is updated and the time the update is synchronized to the replicated copy on the remote system. 

z/TPF support for MongoDB provides another way to access data on z/TPF.  Determining when to use MongoDB instead of web services or data replication depends on a number of factors.   

Querying z/TPF Data: z/TPF Support for MongoDB vs Replicating Data to a Remote Platform

Replicating and transforming z/TPF data to a remote platform provides a way to access the z/TPF data by using standard access methods.  However, depending on the use cases and the frequency of updates to the z/TPF data being replicated, this approach might not be desirable.   Consider the following questions when choosing between access to z/TPF data through a replicated copy or directly from z/TPF using z/TPF support for MongoDB:

  1. Does the z/TPF data need to be the most current copy of the data?

    When data is updated on z/TPF, there is always a time delay when replicating the data to a remote platform.  This means the replicated data on the remote platform might not have the latest copy of the data.  If a use case requires that the data that is accessed needs to be the latest copy, then accessing the data from a replicated copy is not desirable.  In this situation, retrieving the latest copy of the data through the z/TPF support for MongoDB is more suitable. 
  2. Does the cost of replicating the data to another platform exceed the cost of accessing z/TPF directly through MongoDB? 

    The cost of replicating the z/TPF data on a remote platform may exceed the cost of accessing the z/TPF system directly by using the z/TPF support for MongoDB. 

    For example, let’s say you are replicating a database of one million customer records and the replication of the z/TPF data is done at the time the update occurs on a customer record.  Then let’s assume there are, on average, 1000 updates to this data per second on the z/TPF system.   The applications accessing the replicated copy of the z/TPF customer data are on average issuing 750 queries per second.  

    With these numbers, you might find that the cost of replicating the data exceeds the cost of accessing the z/TPF system directly by using z/TPF support for MongoDB.  

    In a replicated environment, there are also costs associated with supporting the remote replicated server; for example, hardware and software maintenance, floor space, and power. Even if the example were 1000 updates per second and 1500 queries per second, z/TPF support for MongoDB might still make more sense: the cost of a remote database that can keep up with those update rates, plus the cost of the queries against that database, might be higher than the cost of using z/TPF support for MongoDB.

    See the “z/TPF Support for MongoDB Performance” section later in this article for performance measurements of various MongoDB operations.
  3. What are the types of queries being issued against this data?

    Sometimes the type of query issued against the replicated data differs from how the data is organized in z/TPFDF; for example, the data is not indexed on z/TPF in the way you want to query it.  In this case, it might make more sense to replicate the data and run the ad-hoc queries against the replicated copy to minimize the impact on the z/TPF system.   These types of queries running on z/TPF may result in thousands of I/Os to find the data the query is asking for. 

    Likewise, if the consumers of the data are SQL clients rather than MongoDB clients, it might make more sense to use a replicated SQL server rather than incur the cost of changing all of the clients that access this data to use MongoDB calls directly into z/TPF. 
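
The rate comparison in question 2 can be sketched as simple arithmetic. The per-message cost parameters below are placeholders, not measured values, and the function name is hypothetical; the sketch only compares the work done on z/TPF and leaves out the remote server costs (hardware, software, floor space, power) that the article also mentions.

```python
def ztpf_side_load(updates_per_sec, queries_per_sec,
                   cost_per_replicated_update, cost_per_direct_query):
    """Compare z/TPF work for replication against direct MongoDB access.

    Costs are in any consistent unit (for example, CPU milliseconds per
    message). Both cost arguments are assumptions supplied by the caller.
    """
    replication = updates_per_sec * cost_per_replicated_update
    direct = queries_per_sec * cost_per_direct_query
    return {
        "replication": replication,
        "direct": direct,
        "cheaper": "direct" if direct < replication else "replication",
    }


# The article's example rates, with equal placeholder unit costs.
example = ztpf_side_load(1000, 750, 1.0, 1.0)
heavier_queries = ztpf_side_load(1000, 1500, 1.0, 1.0)
```

With equal unit costs, the first case favors direct access. The second case favors replication on the z/TPF side alone, which is exactly where the remote server costs left out of this sketch can change the outcome.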

Updating z/TPF Data: z/TPF Support for MongoDB vs Existing Web Services

Before z/TPF support for MongoDB, the only way to update data on the z/TPF system was to create custom application code (a web service) on z/TPF to perform the update. Creating and maintaining custom web services can be expensive. With web services, exposing a different record or a different field within a record to remote users requires writing a new or updated service description and new application code on the server. The z/TPF support for MongoDB provides a way to update data on z/TPF without updating or maintaining z/TPF application code.

With z/TPF support for MongoDB, you can update a part of a z/TPFDF subfile or an entire z/TPFDF subfile. For example, a good use case for MongoDB is when a logical record needs to be inserted into or removed from a z/TPFDF subfile, or when one or more individual fields within logical records need to be updated within a z/TPFDF subfile. In addition, z/TPF support for MongoDB allows for creating and deleting z/TPFDF subfiles, which includes the indexing and deindexing of the z/TPFDF subfile from the z/TPFDF index records. All of these updates can be made using z/TPF support for MongoDB with standard MongoDB commands issued from a remote client.
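
As a rough illustration of what such requests look like from the client side, the sketch below builds filter and update documents with the standard MongoDB update operators `$set`, `$push`, and `$pull`. The article does not list which operators z/TPF support for MongoDB accepts, so treat the operator choice, the use of `_id` as the subfile key, and every field name as assumptions.

```python
def set_field(subfile_key, field, value):
    """Filter and update documents that change one field in a document."""
    return {"_id": subfile_key}, {"$set": {field: value}}


def insert_logical_record(subfile_key, array_field, record):
    """Filter and update documents that append one logical record."""
    return {"_id": subfile_key}, {"$push": {array_field: record}}


def remove_logical_record(subfile_key, array_field, match):
    """Filter and update documents that remove matching logical records."""
    return {"_id": subfile_key}, {"$pull": {array_field: match}}


# Hypothetical customer subfile with a list of phone logical records.
flt, upd = insert_logical_record("CUST0001", "phones", {"type": "home", "number": "555-0100"})
flt2, upd2 = set_field("CUST0001", "tier", 2)
flt3, upd3 = remove_logical_record("CUST0001", "phones", {"type": "home"})
```

With a driver such as PyMongo, each pair would typically be passed to `collection.update_one(filter, update)`; the dictionaries are built separately here so the example runs without a server.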

However, depending on the type of update being performed, updating data by using the z/TPF support for MongoDB might not be desirable.  MongoDB is document based and each document correlates to a z/TPFDF subfile.  The updates for a given request can only be applied to a single document in z/TPF.  

Let’s say you have a business operation that needs to update the following subfiles in a consistent manner (either all or none are updated):

  1. A z/TPFDF subfile representing a Customer Record
  2. A z/TPFDF subfile representing an Inventory Record
  3. A z/TPFDF subfile representing a Reservation Record

If you were to perform this business function with z/TPF support for MongoDB, three different requests would need to flow into z/TPF, with each request performing an atomic update operation on one of the documents listed above (Customer, Inventory, and Reservation). The remote MongoDB client application would be responsible for ensuring the consistency of these updates: if any of the operations failed, the client would have to roll back the changes that it had already made.
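
The client-side rollback burden described above can be sketched as follows. An in-memory dictionary stands in for the three documents, and the function name and document keys are hypothetical; a real client would issue MongoDB requests and handle driver errors instead.

```python
def apply_all_or_nothing(store, updates):
    """Apply each (doc_id, new_fields) update; undo all of them on failure.

    `store` is a dict standing in for the remote documents. Returns True
    if every update was applied, False if the updates were rolled back.
    """
    undo_log = []
    try:
        for doc_id, new_fields in updates:
            previous = dict(store[doc_id])  # raises KeyError if missing
            store[doc_id] = {**previous, **new_fields}
            undo_log.append((doc_id, previous))
    except Exception:
        # Compensate in reverse order with the saved copies.
        for doc_id, previous in reversed(undo_log):
            store[doc_id] = previous
        return False
    return True


store = {"customer": {"tier": 1}, "inventory": {"seats": 5}}
updates = [
    ("customer", {"tier": 2}),
    ("inventory", {"seats": 4}),
    ("reservation", {"status": "booked"}),  # missing document: forces a failure
]
failed_result = apply_all_or_nothing(store, updates)
```

Even this simple version leaves gaps: other clients can observe the intermediate state between requests, and the compensating writes can themselves fail.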

Depending on the complexity of the updates that are needed, it might be better to use a web service, allowing for the use of z/TPF locking and commit scopes.   For cases where the request from the client requires multiple database records or databases to be updated as well as business logic and rules, it makes more sense to centralize that code in one place in z/TPF that is accessed via a web service call rather than maintaining that business logic in each MongoDB client application.

z/TPF Support for MongoDB Use Case Comparison

The following table summarizes the capabilities of the three z/TPF data access methods discussed in this article.

z/TPF data access                    | Read capability | Guaranteed most current copy on read | Update capability | Complex updates | No z/TPF application updates required | Queries for analytic processing
Replicated data by using data events | X               |                                      |                   |                 | X                                     | X
z/TPF support for MongoDB            | X               | X                                    | X                 |                 | X                                     |
Web services                         | X               | X                                    | X                 | X               |                                       |

 

z/TPF Support for MongoDB Performance

The z/TPF support for MongoDB was designed to handle high access rates and to reduce the CPU costs of remotely accessing data on z/TPF.  All of the processing, apart from z/TPFDF and TCP/IP communication, is z/TPF Transformation Engine (TE) eligible workload. 

The following tests were performed on an IBM EC12 (2827-750) processor with the z/TPF system running in one LPAR, servicing requests from a MongoDB client running on a Linux on z LPAR in the same processor.   The z/TPF and Linux on z LPARs share the same OSA-Express card to minimize network latency. 

The following diagram depicts the environment the test was run in:

[Figure: z/TPF support for MongoDB performance test environment]

The z/TPF and Linux on z LPARs each had one dedicated CPU defined. The z/TPF LPAR was running with all system traces turned off. The performance of z/TPF support for MongoDB will vary from customer to customer because the complexity of the binary data affects performance. For example, binary data that consists of 1000 1-byte fields will perform differently than binary data that consists of one 1000-byte field.

Performance Measurements for Querying Documents on z/TPF

The following measurements are for querying documents in z/TPF.   Using various size z/TPFDF subfiles, this shows the number of messages per second that was achieved with a single CPU to convert the binary z/TPFDF subfile into a document and send that back to the remote MongoDB client running on the Linux on z LPAR.

z/TPFDF subfile size (in bytes) | Overall Utilization (%) | General Purpose Utilization* (%) | Messages / Sec | Mils / Message
1000                            | 99.2                    | 40.7                             | 9,613          | 0.103
10,000                          | 96.3                    | 20.9                             | 3,413          | 0.282

*The general purpose utilization illustrates how much of the work is not z/TPF Transformation Engine (TE) eligible. For example, in the query of the 10,000-byte subfile, 78% of the work was TE eligible.
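
The derived figures in the table can be reproduced with a short sketch. It assumes that the TE-eligible share is the portion of overall utilization that is not general purpose utilization, and that "Mils / Message" is CPU milliseconds consumed per message on the single dedicated CPU; the published numbers are consistent with both readings, but the article does not state the formulas, and the function names are made up for this example.

```python
def te_eligible_share(overall_util_pct, gp_util_pct):
    """Percentage of the consumed CPU that was TE eligible."""
    return 100.0 * (1.0 - gp_util_pct / overall_util_pct)


def cpu_ms_per_message(overall_util_pct, messages_per_sec, cpus=1):
    """CPU milliseconds consumed per message."""
    busy_ms_per_sec = cpus * 1000.0 * overall_util_pct / 100.0
    return busy_ms_per_sec / messages_per_sec


# Query measurements for the 10,000-byte subfile.
share = te_eligible_share(96.3, 20.9)      # about 78 percent
ms_large = cpu_ms_per_message(96.3, 3413)  # about 0.282
# Query measurements for the 1000-byte subfile.
ms_small = cpu_ms_per_message(99.2, 9613)  # about 0.103
```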

Performance Measurements for Creating and Deleting Documents on z/TPF

The following measurements are for creating/deleting documents in z/TPF.   Using various size z/TPFDF subfiles, this shows the number of messages per second that was achieved with a single CPU to create and delete documents on z/TPF.   The result is an accumulation of creates and deletes, with 50% of the messages being create requests and 50% of the messages being delete requests.

z/TPFDF subfile size (in bytes) | Overall Utilization (%) | GP Utilization (%) | Messages / Sec | Mils / Message
1000                            | 99.7                    | 62.6               | 5,785          | 0.172
10,000                          | 99.8                    | 79.7               | 989            | 1.01

Performance Measurements for Updating Documents on z/TPF

The following measurements are for updating documents in z/TPF.   Using various size z/TPFDF subfiles, this shows the number of messages per second that was achieved with a single CPU to insert or remove logical records in a document.   The result is an accumulation of inserting a logical record into a document and removing a logical record from a document, with 50% of the messages being insert requests and 50% of the messages being remove requests.

z/TPFDF subfile size (in bytes) | Overall Utilization (%) | GP Utilization (%) | Messages / Sec | Mils / Message
1000                            | 99.8                    | 54.2               | 5,785          | 0.166
10,000                          | 99.8                    | 59.9               | 5,414          | 0.184

z/TPF Support for MongoDB Conclusion     

Having a standard way to remotely access data on z/TPF is extremely powerful. There are clear cases in which using z/TPF support for MongoDB is the best option. However, z/TPF support for MongoDB is not a replacement for existing z/TPF custom web services or for replicating data off of the z/TPF system. Understanding what z/TPF support for MongoDB can do and how it performs is the first step in determining the best way to remotely access data on the z/TPF system.

