ENS is a very mature enterprise z/OS product team, responsible for z/OS Communications Server, ISPF, and IBM Multi-site Workload Lifeline. We have been around for 40+ years and already have good processes in place for design, development, build, test, information development, and service. We made the transformation to agile several years ago, and our current processes allow us to deliver on 4-week sprints, improve the things that don't work well, and deliver with high quality.
When the team started our DevOps journey back in August of 2014, there was pushback in the form of two questions:
1) Why DevOps? We already use automation.
2) What problem are we trying to solve?
To answer these questions for a group of well-seasoned software engineers, we had to demonstrate WHY DevOps and show small wins that made their daily tasks easier.
How did we do this? We started by doing two things in parallel:
First, we created a core DevOps work group consisting of engineers from each discipline of our product development life cycle (design, development, build, FVT, SVT, performance, IDD, and service). Technical leaders who were open to change were selected for the core work group.
Second, we asked the entire organization two questions:
1) What are your top two pain points?
2) If you could change one thing in the organization what would it be and why?
While we were waiting on the answers to our two survey questions we did the following with the core work group:
1) We discussed overloaded terms and created definitions that were relevant to the team and specific to our organization and product. For example:
DevOps = maximize the predictability, efficiency, security and maintainability of operational processes - this objective is supported by automation
Continuous Integration = merging developer code into the product stream in regular, repeatable, short intervals and rapidly propagating that code to all test systems automatically and quickly
Pre-integration = work done before developer code is merged into the product stream, for example, unit and regression test and peer review
Continuous Deployment = after passing all the automated delivery tests, each code commit is deployed to end users as soon as it is available. Because changes are delivered quickly and without human intervention, continuous deployment can be seen as risky. It requires a high degree of confidence both in the existing application infrastructure and in the development team.
Continuous Test = continuous testing adds manual testing to the continuous delivery model. With continuous testing, the test group will constantly test the most up-to-date version of available code. Continuous testing generally adds manual exploratory tests and user acceptance testing. This approach to testing is different from traditional testing because the software under test is expected to change over time, independent of a defined test-release schedule.
Continuous Monitoring = monitoring the continuous testing and getting defects reported in real time
Continuous Delivery = a software development discipline where you build software in such a way that the software can be released to end users at any time
Production = today it means deploying to our SVT enterprise customer environment daily and in the future will mean deploying to an environment that can be accessed by our external customers to provide early feedback on pre-GA product code features
2) We defined the purpose and objective of the core work group: Our high level focus areas would be Culture, Process and Tools.
3) We appointed a manager as the owner, project manager, and technical lead for the work group.
4) We agreed to meet bi-weekly for one hour.
5) We created an online community to track and store our meeting agendas, actions and collateral.
6) We agreed to use value stream mapping to document our end to end pipeline and processes.
By our second core work group meeting we had the results to our survey questions from the organization. We were surprised by the feedback and how easy it was to identify one or two pain points that were pervasive across the organization. The key is to act fast to solve these first few pain points to demonstrate to the organization "Why DevOps" and get buy-in to the DevOps journey.
Answering the question "What problem are we trying to solve?" becomes clearer as the team defines its overloaded terms, states the purpose and objective of the work group, documents the first pipeline of the product development life cycle, and implements the first couple of small changes that prove the value of DevOps.
We will continue to share our strategies and experiences with this blog series and welcome your feedback!
You can also reach out to the author (Frank Varone) of this blog at email@example.com
Colocation, colocation, colocation! Does colocating your application workloads on the same z Systems physical machine (CPC) really matter? In some cases colocation really can make a big difference. When application workloads have network-intensive communication patterns, meaning they either communicate frequently (exchanging many messages to complete a single transaction, as in multi-tiered application workloads) or exchange large amounts of data (bulk, streaming, or other big data solutions such as analytics-related workloads), then the physical location or proximity of the applications can make a difference. The differences can impact your cost and your overall results.
The IBM System z13™ and z13s™ introduced new technology that offers an opportunity for clients to take a closer look at this aspect of colocation of IBM z/OS application workloads. IBM introduced z Systems technology called Internal Shared Memory (ISM). The ISM technology allows one z/OS instance to directly access (share) virtual memory within another z/OS instance (e.g. LPAR or guest virtual machine) within the same physical machine. The ISM architecture enables direct memory access (DMA) capability for software exploitation.
With ISM, IBM also announced Shared Memory Communications – Direct Memory Access (SMC-D). SMC-D exploits ISM which enables applications to directly and transparently communicate with other applications executing in other z/OS instances running in other Logical Partitions on the same physical z13 System. The direct communications is provided transparently for applications using TCP sockets.
Some history will help give perspective. Prior to ISM, z Systems provided a very efficient technology called HiperSockets. HiperSockets provides an internal logical LAN within z Systems, allowing the operating system to communicate using numerous protocols such as TCP/IP, UDP, and SNA. Communication over HiperSockets is accomplished by creating, exchanging, and processing standard IEEE 802.3 packets (frames) in software. HiperSockets provides a very efficient memory-to-memory transfer (of standard packets) without requiring physical networking hardware.
SMC-D with ISM goes beyond HiperSockets by eliminating all packets along with all of the TCP/IP protocol and packet related processing. SMC-D provides a direct socket to socket transfer of data. This model provides significant savings in host network processing which translates to significant savings in CPU, latency, and throughput.
In addition to HiperSockets, z/OS instances on the same CPC can communicate using other network technologies, such as Ethernet via the IBM OSA-Express family of adapters. While there are several options, HiperSockets is typically the optimal one. And while HiperSockets will continue to be an important technology due to its versatility, the benefits of SMC-D are compelling.
Shared Memory Communications architecture now has two variations:
- Shared Memory Communications – RDMA (SMC-R for cross platform using RoCE)
- Shared Memory Communications – DMA (SMC-D for same platform using ISM)
Both forms of SMC can be used concurrently. The protocol dynamically selects the appropriate variation based on the proximity of the peer hosts (i.e. same-CPC instances use SMC-D). A sample of enabling both variations follows.
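As a hedged illustration only (a sketch, not your exact configuration), enabling both variations on the GLOBALCONFIG statement in the TCP/IP profile might look like the lines below; the PFID and PORTNUM values are placeholders that must match your own RoCE Express definitions:
GLOBALCONFIG SMCR PFID 0018 PORTNUM 1  ; SMC-R over the RoCE Express feature (PFID is a placeholder)
GLOBALCONFIG SMCD                      ; SMC-D over ISM (z13/z13s and later)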
So what are the benefits of SMC-D? Benchmark results comparing the technologies have shown that SMC-D using ISM provides significant savings in CPU and latency along with significant throughput gains. Here is a quick performance summary for request/response (transactional) and streaming (bulk) workloads, comparing SMC-D to HiperSockets:
- Request/Response workloads with 1k/1k – 4k/4k payloads:
- Latency: up to 48% reduction
- Throughput: up to 91% increase
- CPU cost: up to 47% reduction in network-related CPU cost
- Request/Response workloads with 8k/8k – 32k/32k payloads:
- Latency: up to 82% reduction
- Throughput: up to 475% (~6x) increase
- CPU cost: up to 82% reduction in network-related CPU cost
- Streaming workloads:
- Latency: up to 89% reduction
- Throughput: up to 800% (~9x) increase
- CPU cost: up to 89% reduction in network-related CPU cost
As you can see, the benefits of SMC-D with ISM are compelling. If you currently exploit HiperSockets, the applicability of SMC-D is easy to evaluate. If you are not sure whether your environment has (or could have) z/OS network traffic patterns that would benefit, you can evaluate your workload's network patterns using the SMC Applicability Tool (SMC-AT).
With the potential for this type of savings it is easy to see how colocation of network intensive workloads on the IBM z13 or IBM z13s using SMC-D with ISM can make a difference.
Benchmark results shown here are from a controlled IBM internal lab using standard tools. Your actual results may vary. Performance information is provided "AS IS" and no warranties or guarantees are expressed or implied by IBM.
Although an unplanned failure of the production site is a highly unlikely occurrence, IT departments need to spend resources to ensure that their business-critical data and applications can be successfully recovered if one happens. Minimizing data loss is the highest priority, but it can come at a tradeoff with application availability. Typically, following an unplanned outage, the entire data center can be restarted at the disaster recovery (DR) site, but this can take several hours or longer before the applications are available.
What is sometimes overlooked is how to maintain application availability for a far more likely scenario, a planned outage for a maintenance activity. IT departments schedule maintenance windows in order to apply software fixes or perform application upgrades. Their goal is to minimize the number of maintenance windows required as well as ensure the duration of each window is as short as possible. Despite these efforts, these maintenance windows can still last for several hours and occur multiple times per year. Since recovering the data and applications on the DR site could take several hours, it makes little sense to attempt to utilize the DR site during planned maintenance activities, as the time it takes to switch to the DR site and back to the production site could be longer than the maintenance window itself. As a result, IT departments try to schedule these maintenance windows with the aim of minimizing the impact to their customers, usually on a weekend night.
What if there were a way to quickly switch access to business-critical applications and their data from one site to another in a few minutes, rather than a few hours? With application unavailability for maintenance windows shrinking to several minutes, these windows could be scheduled more frequently, ensuring systems and applications are always running with the most up-to-date fixes. So how can this be accomplished? By using a software data replication product to keep the data sources used by the applications in sync across two sites, and IBM Multi-site Workload Lifeline to distribute connections for these applications, such a reduction in site switch times can be achieved.
IBM Multi-site Workload Lifeline, or Lifeline for short, provides the ability to perform a graceful switch of applications and their data sources (called workloads by Lifeline) during planned outages. By using simple Lifeline commands, workload migration from one site to another can be easily performed, minimizing the downtime for planned events such as scheduled maintenance activities. So what makes Lifeline different from existing disaster recovery solutions? First, Lifeline is not an all-or-nothing solution. Rather than initially planning for, and providing system resources for, the recovery of all workloads in the production site, IT departments can focus on their most critical workload first and gradually roll out the solution for additional workloads as needed. A second differentiator is that Lifeline requires no changes to the applications or to the clients accessing the applications and data. Following a planned outage, no manual changes in the network topology are necessary before the workload can be accessed on the alternate site.
As mentioned earlier, a key component to ensure a quick switch of applications and data to the alternate site is software data replication. Depending on the data source being used by the application, a different software replication product would be used to keep the data source in sync across the sites. For example, for applications utilizing DB2, IBM InfoSphere Data Replication for DB2 would be used to keep DB2 data in sync. Lifeline ensures connections for a workload are distributed to only one site at a time, to make certain that updates to the data source are occurring on only one site at any point in time.
Lifeline enables the graceful switch of a workload from one site to the other by:
- First, preventing new connections for the workload from being distributed to either site, while giving existing connections to the production site a chance to complete their work,
- Next, resetting any connections on the production site that have not completed their work. This guarantees that no additional updates to the workload's data source can occur on this site.
- Finally, allowing new connections for the workload to be distributed to the alternate site. (A hedged sketch of what this command sequence might look like follows.)
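For illustration only, the operator sequence might look something like the following. This is a sketch under assumptions: the Lifeline Advisor procedure name (AQSADV), the workload name (PAYROLL), and the exact operand syntax are all hypothetical, so consult the Multi-site Workload Lifeline documentation for the real command formats on your release.
F AQSADV,QUIESCE,WORKLOAD=PAYROLL
(stops distributing new connections and lets in-flight work drain)
F AQSADV,DEACTIVATE,WORKLOAD=PAYROLL
(resets any remaining connections on the production site)
F AQSADV,ACTIVATE,WORKLOAD=PAYROLL
(allows new connections, now distributed to the alternate site)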
In subsequent blogs, I'll cover more topics, such as how Lifeline can also be used to quickly recover from unplanned outages, and the different types of workloads for which Lifeline can provide recovery during both planned and unplanned outages. In the meantime, you can learn more about Lifeline by going to the following link:
It's looking like I'm seeing some sort of storage creep, and it seems to be related to z/OS CommServer TCP/IP. Sometimes I'm not quite sure how to approach identifying whether the storage increase is related to TCP/IP, or even how to get to the root cause. Over time, I've learned it's helpful to use some simple commands to gain a general knowledge of my system's TCP/IP storage use. This has allowed me not only to protect my system, but also to quickly get to the bottom of any issues.
There are several commands I use to periodically monitor and collect information regarding TCP/IP's storage usage for current use, high-water mark, and limit (if I have it configured). Analyzing the data, I have learned over time what's typical for TCP/IP storage use during normal and acceptable peak workloads. My automation issues these commands every 15 minutes so they are recorded in the system log. This way there's historical information to pinpoint a problem area and time frame should TCP/IP's storage usage appear to be abnormal.
- D TCPIP,,STOR - issue this command for each of your TCP/IP stacks
- D NET,CSM - this command can be used to determine overall CSM ECSA and CSM Fixed storage utilization
- D NET,CSM,OWNERID=ALL - use this command to identify what application is using CSM storage and how much it is using
TCP/IP Storage -
TCPCS STORAGE CURRENT MAXIMUM LIMIT
ECSA 2858K 3313K NOLIMIT
PRIVATE 8631K 8634K NOLIMIT
ECSA MODULES 9671K 9671K NOLIMIT
HVCOMMON 1M 1M NOLIMIT
HVPRIVATE 1M 1M NOLIMIT
TRACE HVCOMMON 2579M 2579M 2579M
- Limits for ECSA and Private storage can optionally be configured in the TCP/IP Profile (GLOBALCONFIG statement, parameters ECSALIMIT and POOLLIMIT).
- There are no recommended limits that will work for every system because every system is different.
- This output does not include CSM storage.
ECSA storage - The use of common storage can be controlled with ECSALIMIT on the TCP/IP GLOBALCONFIG profile statement. An ECSALIMIT value can be set to keep TCP/IP from monopolizing common storage on the system, protecting other subsystems' access to common storage should TCP/IP hit a situation where it consumes too much ECSA. This parameter is intended to improve system reliability by limiting TCP/IP's common storage use.
Private storage - The amount of storage TCP/IP uses in its user region. There are several ways to limit TCP/IP's use of private storage (a sample GLOBALCONFIG follows this list):
- with the TCP/IP GLOBALCONFIG profile statement - POOLLIMIT
- via the REGION keyword in TCP/IP's startup JCL. This can also be overridden by installation exits such as IEFUSI. If you choose to limit the region size, you should also set a POOLLIMIT in the profile.
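As a hedged example only, the two limits might be coded together on the GLOBALCONFIG statement like this; the 200M and 300M values are placeholders, not recommendations, since appropriate limits depend on the peak usage you observe on your own system:
GLOBALCONFIG ECSALIMIT 200M POOLLIMIT 300M  ; placeholder values - size from your observed peaks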
ECSA modules - This is common storage used by TCP/IP load modules.
HVCommon - 64-bit common area storage used by TCP/IP
HVPrivate - 64-bit private area storage used by TCP/IP
Trace HVCommon - 64-bit common storage used for tracing
Here are some considerations to keep in mind when choosing to define ECSA or private (pool) storage limits.
- Accommodate temporary application "hang" conditions, where TCP/IP must buffer large amounts of inbound or outbound data. Add a reasonable fudge factor to the observed maximum usage values. It is not uncommon to set limits that are 50% over the peak usage.
- Care should be taken when coding the ECSALIMIT parameter. Setting it too low can cause TCP/IP to terminate prematurely.
- The benefit of specifying limits is that you will receive warning messages before storage obtain calls start failing when there is not enough storage available to satisfy the requests.
- ECSALIMIT does not include any of your CSM storage used by TCP/IP.
- When choosing to limit Private storage, make sure you don't use a value that is lower than or equal to what your installation exit (IEFUSI) enforces.
- Remember that the values set for ECSALIMIT and POOLLIMIT can be changed via OBEYFILE command processing (VARY TCPIP,,OBEYFILE), as sketched below.
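For instance, assuming the updated GLOBALCONFIG statement has been placed in member OBEY1 of data set USER1.TCPPARMS (both names hypothetical) and your stack procedure is TCPCS, the change could be applied with:
V TCPIP,TCPCS,OBEYFILE,DSN=USER1.TCPPARMS(OBEY1)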
CSM storage - The Communications Storage Manager (CSM) is a VTAM component that allows authorized host applications to share data with VTAM, TCP/IP, and other CSM users without the need to physically copy the data.
- Your CSM storage will be located in either ECSA or data space storage, and can be fixed or pageable.
- CSM storage definitions are controlled by SYS1.PARMLIB member IVTPRM00 which is read by VTAM during initialization.
- The limits you set can be dynamically changed with a MODIFY CSM command, allowing you to control the amount of CSM storage that can be used in ECSA or can be FIXED at any point in time. (A sketch of both follows.)
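As a hedged sketch only, the IVTPRM00 limits and a later dynamic change might look like the following; the 120M/140M values are placeholders, and the VTAM procedure name NET is an assumption for your installation:
IVTPRM00 member:
FIXED MAX(120M)
ECSA MAX(120M)
Dynamic change from the console:
F NET,CSM,ECSA=140M,FIXED=140M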
D NET,CSM - This command provides a quick overview of how much storage has been allocated by CSM, and how much of it is in-use or free for use by a CSM user. You'll find that CSM can be in either ECSA or data space storage. The command output also lets you know what you have defined as the maximum in your IVTPRM00 parmlib member.
SIZE SOURCE INUSE FREE TOTAL
4K ECSA 144K 112K 256K
16K ECSA 16K 240K 256K
32K ECSA 0M 512K 512K
60K ECSA 0M 0M 0M
180K ECSA 0M 360K 360K
TOTAL ECSA 160K 1224K 1384K
4K DATA SPACE 31 0M 256K 256K
16K DATA SPACE 31 0M 0M 0M
32K DATA SPACE 31 0M 0M 0M
60K DATA SPACE 31 0M 0M 0M
180K DATA SPACE 31 0M 0M 0M
TOTAL DATA SPACE 31 0M 256K 256K
4K DATA SPACE 64 4352K 128K 4480K
16K DATA SPACE 64 0M 256K 256K
32K DATA SPACE 64 96K 416K 512K
60K DATA SPACE 64 0M 0M 0M
180K DATA SPACE 64 0M 360K 360K
TOTAL DATA SPACE 64 4448K 1160K 5608K
TOTAL DATA SPACE 4448K 1416K 5864K
TOTAL ALL SOURCES 4608K 2640K 7248K
FIXED MAXIMUM = 120M FIXED CURRENT = 6877K
FIXED MAXIMUM USED = 6877K SINCE LAST DISPLAY CSM
FIXED MAXIMUM USED = 6877K SINCE IPL
ECSA MAXIMUM = 120M ECSA CURRENT = 1633K
ECSA MAXIMUM USED = 1633K SINCE LAST DISPLAY CSM
ECSA MAXIMUM USED = 1633K SINCE IPL
CSM DATA SPACE 1 NAME: CSM64001
CSM DATA SPACE 2 NAME: CSM31002
D NET,CSM,OWNERID=ALL - Use this command to see how much CSM storage each of the CSM 'users' is currently using. If you want to see only the CSM usage of TCP/IP, you can specify the TCP/IP address space by its ASID (OWNERID=asid). (An example of OWNERID command output is not shown.)
Considerations to make when choosing to set your CSM limits:
- ECSA CSM can't be larger than the system ECSA limit defined via the CSA parameter in parmlib member IEASYSnn.
- When setting FIXED CSM, ensure that you have enough real frames to back the FIXED allocation.
So, now that you've learned to monitor TCP/IP's ECSA, private, and CSM storage usage, you may be wondering what's next if you suspect a TCP/IP storage-related problem. If no SVC dumps are generated for the issue, then when you see that storage use is on the rise, take a console dump of the TCP/IP address space. If you think the problem is related to TCP/IP's CSM storage usage, include the CSM data spaces in the dump. Here's a sample console dump command you can use, with a sketch of the reply after it:
DUMP COMM=(tcpip storage growth)
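The DUMP command responds with a WTOR prompting for the dump options. As a hedged sketch (the jobnames TCPCS and NET are hypothetical stand-ins for your TCP/IP and VTAM procedure names, and the CSM data spaces are assumed to be owned by the VTAM address space), the reply might look like:
R xx,JOBNAME=(TCPCS,NET),SDATA=(RGN,CSA,SQA,TRT),DSPNAME=('NET'.CSM*),END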
If needed, the IBM Support Center can assist you with identifying the cause of your TCP/IP storage growth.
Ever wonder why z/OS Communications Server support asks for multiple traces for network issues? If so, here is the reason why.
The z/OS packet trace is collected from the perspective of the z/OS host that is sending or receiving data. The trace is collected before the data reaches the physical network, a.k.a. the OSA NIC (network interface card). So for outbound (sent) packets, trace records are collected before the data is processed by the VTAM DLC layer to be sent to the OSA NIC. Conversely, inbound (received) packets are traced after they arrive over the OSA NIC and have been processed by the DLC layer in VTAM on their way up to TCP/IP.
Note that for the majority of network throughput issues, the complete application data of each packet is not needed in the packet trace, so feel free to use the ABBREV=100 option when you collect it! A hedged sketch of collecting the trace follows.
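For illustration, a typical collection sequence might look like this: start the external trace writer, activate the SYSTCPDA trace component and connect it to the writer via the WTOR reply, then turn on packet tracing with abbreviated capture. The writer procedure name PKTWRT and stack name TCPCS are assumptions, and the reply number (xx) will vary:
TRACE CT,WTRSTART=PKTWRT
TRACE CT,ON,COMP=SYSTCPDA,SUB=(TCPCS)
R xx,WTR=PKTWRT,END
V TCPIP,TCPCS,PKTTRACE,ON,ABBREV=100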
The next possible choice for a z/OS "network" type trace is the OSAENTA trace. This trace captures packets from the OSA NIC perspective. This means that packets sent from TCP/IP are captured in the OSAENTA trace once they are sent over the OSA NIC. Conversely, packets arriving over the OSA NIC are collected in the OSAENTA trace before they reach VTAM and TCP/IP. Hopefully, the picture is now clearer!
Note that the OSAENTA trace does not collect the entire application data contents of each packet; the data is truncated to 200 bytes, so keep that in mind! As a hedged sketch, enabling the trace might look like the commands below.
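The PORTNAME value here is an assumption - it must match the port name of your OSA-Express interface - and the stack name TCPCS is likewise a placeholder; additional filtering options are available:
V TCPIP,TCPCS,OSAENTA,PORTNAME=GIG1,ON
(recreate the problem)
V TCPIP,TCPCS,OSAENTA,PORTNAME=GIG1,OFF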
When diagnosing a network performance issue, it is imperative to have the full network picture, or as close to one as possible. Peer hosts are often multiple router hops apart, which adds multiple points to examine. Delays captured with just a z/OS packet trace don't always tell the full story, so corresponding traces outside of z/OS are often requested. You may want to go ahead and collect an "external network" trace somewhere between the z/OS and remote host endpoints. These additional traces, when collected simultaneously with the z/OS packet trace, provide greater insight into where delays or packet loss may be occurring.
There are many workstation-based tools available for viewing network traces. The z/OS packet trace and OSAENTA traces are designed to be viewed with IPCS. Not everyone is comfortable or familiar with IPCS, though, so consider the nice formatting option called SNIFFER: it converts both of these traces into binary files that can be loaded into one of those other trace-viewing products, making your life simpler without having to use IPCS to analyze the traces. A hedged sketch follows.
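For illustration, the IPCS CTRACE subcommand to format a packet trace with the SNIFFER option might look like the line below; the stack name TCPCS is an assumption, and the SNIFFER suboptions (output format, record length) vary, so check the z/OS Communications Server IP Diagnosis Guide for the exact syntax on your release:
CTRACE COMP(SYSTCPDA) SUB((TCPCS)) FULL OPTIONS((SNIFFER))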
Lin Overby and I, from the z/OS Communications Server and ISPF design group, will be at the Edge2015 conference in Las Vegas from May 11th-15th. Lin will be presenting the following sessions:
z/OS Communications Server IPSec and IP Packet Filtering
Leveraging z/OS Communications Server Application Transparent TLS for a More Rapid TLS Deployment
z/OS Communications Server Intrusion Detection Services
Enabling Continuous Availability with IBM Multi-Site Workload Lifeline
and I will be presenting:
z/OS Communications Server Technical Update
z/OS Communications Server: Shared Memory Communications over RDMA (SMC-R)
ISPF Hidden Treasures and New Features, Parts 1 and 2
ISPF Editor - Beyond the Basics Hands-On Lab, Parts 1 and 2
I will also be at the IBM z Systems Technical University in Dublin from May 18-22. I will be presenting the following sessions:
z/OS Communications Server Technical Update
z/OS Communications Server: Shared Memory Communications over RDMA (SMC-R)
ISPF Hidden Treasures and New Features, Parts 1 and 2
If you will be at either conference, plan on attending our sessions to learn more about z/OS Communications Server and ISPF!
With z/OS V2R2, Communications Server brings enhancements in a number of areas, including scalability, simplification, and autonomics.
If there is a single, overarching theme in z/OS V2R2 CS, it is scalability. By enabling the TCP/IP stack and its strategic device drivers to utilize 64-bit (above the bar) storage, a substantial inhibitor to workload growth is relieved. Extremely large Enterprise Extender implementations benefit from internal optimizations that allow scaling to tens of thousands of connections per LPAR. And a substantially restructured IKE daemon provides a significant reduction in the time necessary to establish a large number of VPN connections.
Shared Memory Communications over RDMA (SMC-R), first introduced in V2R1 and on the zEC12/zBC12, is also enhanced in V2R2. Most significant is the ability to share the RoCE adapter, with a single adapter shareable by up to 31 virtual servers (LPARs or second-level guests under z/VM®). (This enhancement requires an IBM z13, and is also available via PTF on z/OS V2R1.) Are you unclear on how SMC-R will benefit your environment? A new SMC Applicability Tool can project what percentage of your current traffic could benefit from SMC-R enablement. (This tool is also available via PTF on V1R13 and V2R1.)
The IBM Configuration Assistant for z/OS Communications Server has long been a valuable tool for configuring policy-based networking functions such as AT-TLS, IPSec, and Intrusion Detection Services. In V2R2, the Configuration Assistant gains an entirely new discipline: the ability to configure a TCP/IP profile, allowing a graphical interface and wizard-driven approach to configuring a TCP/IP stack.
There are additional enhancements in many other areas, such as security, TCP/IP autonomic tuning, and support for CICS transaction tracking. For more information, consider downloading the “z/OS V2R2 Communications Server Technical Update” presentation from the Winter 2015 SHARE Conference.
Last year I wrote the first blog in this series about our approach to getting started with adopting DevOps for z/OS. Let's take a closer look at some of the DevOps challenges for large operating system products and how we are addressing them.
How does operating system level code get to a DevOps continuous delivery paradigm? Most successful continuous delivery implementation examples are SaaS-type products. Operating system level code has significant differences from SaaS products/services:
- Not easily decomposable into smaller deliverables
- Customers value stability over having the "latest and greatest feature set on a rapid delivery cycle"
- Multiple releases in support, each with its own service stream, complicates delivery
How to discourage yourself. Comparing "canonical" DevOps principles and examples to operating system level code is discouraging. We're so different from a SaaS product. This is z/OS, not Gmail! We can't decompose large, monolithic code into small enough customer deliverables. We have to maintain service streams on three releases in the field. We don't have any resources to create new automation. Our customers are very conservative about putting changes into production, and most of them won't accept continuous deliveries. There is so much control and process over what we can disclose to which customer. And so on...
The approach we used. Stop comparing ourselves to canonical SaaS continuous delivery products (like Gmail or Facebook) and finding ourselves wanting. Instead, work on improving internal processes across the organization and moving them toward continuous delivery through automation. Even if it's just to an internal test group, this would be a prerequisite to achieving continuous delivery anyway. Internal improvements are something that the development group controls and can show positive results early, even if small, to help get the right buy-in from the team.
How we moved forward. Thoroughly document existing delivery pipelines with an emphasis on identifying hand-offs and pain points. This information is often not fully documented in one place. This is across the organization, not just development. Formally documenting this has the following benefits:
- Requires the teams to think end to end about how they are delivering to customers today
- Provides opportunity to question “why are we doing it this way”
- Identifies pain-points and bottlenecks that can be addressed
- Identifies opportunities for automation and removing the waste out of the system
- Will help the team answer the question "What problem are we trying to solve?"
Make it graphical and clear. Pipeline documentation should be graphical and clear, which will make it easier to identify bottlenecks and pain points. The next two diagrams show parts of the z/OS Communications Server pipeline as an example. This is not the only way to do it, but it shows what’s meant by graphical and clear, and shows the value gained by doing it that way. We started with a very high level overview of the release cycle shown in diagram 1.
Then we zoomed in on specific phases for the more detailed view shown in diagram 2.
Look for bottlenecks and items to shift left. Graphical pipeline documentation should show hand-offs and bottlenecks. In the example in diagram 2, we saw that the build started at 4:00pm but the SMP/E apply was scheduled for 8:00pm. Most builds completed in less than four hours, so there was significant idle time. As a result, we moved the build start time to 5:00pm, which gave developers more opportunity to get in fixes that had "just missed" previous builds. Lots of small improvements like this can improve your development processes and efficiency.
Look for opportunities to automate. Pipeline documentation should also make clear what’s automated and what’s not. It should be easy to identify which processes are manual and should be candidates for automation. You need to assess the value of doing that automation so you can prioritize automation activities by the return on investment (ROI). Pick out the 1-2 highest value opportunities for automation and pursue those, and don’t try to do everything at once. It’s important to show positive results early to get buy-in from your development community.
Look for pain points to remove waste and inefficiencies. Pipeline documentation is a source for identifying pain points, but not the only one. While developing the pipeline documentation, the developers should ask and document "what are the pain points". But a general question to the development, test, build, ID, and support community about their pain points can also be very beneficial. We were surprised by the low-hanging fruit that was identified: process-related items that yielded better efficiencies for the development team. These small changes can go a long way toward getting buy-in from your development and test community. This is especially true for the ones that are rooted in "we've always done it this way".
What about continuous delivery to customers? This goal can seem far-fetched for enterprise operating system level code, but we looked at it differently. It's true that our customers can be very conservative about what they will put into production. But what if you were able to transform your current product delivery pipeline to continuously deliver code to SVT? Then you could think about other possibilities for giving code to customers earlier for early feedback. Our thinking needs to go beyond our traditional Beta and ESP programs. This plays into Design Thinking principles of sponsored users and early, regular feedback.
How can ENS afford to do DevOps with just our existing resources? It's important to acknowledge that DevOps is an investment and that creating pipeline workflows, automation and tooling work requires resources.
- We should collaborate with other z products where it is efficient and makes sense
- Treat DevOps as an investment that is weighted with other investments we make, e.g., new function development (including test)
- To make the necessary investments for DevOps in ENS, capacity for new function in releases under development is reduced. We go into this with our eyes open and make the right trade offs based on ROI
Conclusion. DevOps is a journey that we are on, and it involves the entire organization:
- People – You need the right people to drive the culture change
- Process – Stop doing things that have no value (challenge the process)
- Technology – Standardize your tools and automation on current technology and limit the number of tools the team has to learn and use
- You start by understanding what you’re doing now and look for improvements, not by comparing yourself to the ideal case and getting discouraged over how far away you are from your goals
- Lots of small victories and improvements are the path to DevOps in mainframe operating systems
- Pipelines and improvements need to be developed from the bottom up (by the people doing the work) instead of the top down
- Use existing resources to make the investment for the future
If you are interested in our other blogs on our DevOps journey we have one on changing the culture and our process focus. We will continue to share our strategies and experiences with this blog series and welcome your feedback! You can also reach out to the authors Frank Varone (firstname.lastname@example.org) and Mike Fox (email@example.com).
The Winter 2016 SHARE Conference is in San Antonio, Texas next week (February 29th - March 4th). As always, there will be a good selection of content focused on z/OS Communications Server, including the following sessions from six of our team here in Research Triangle Park, NC:
- z/OS V2R2 Communications Server Technical Update, Part 1 of 2 (Gus Kassimis and Sam Reynolds)
- z/OS V2R2 Communications Server Technical Update, Part 2 of 2 (Gus Kassimis and Sam Reynolds)
- Shared Memory Communications over RDMA (SMC-R) - Optimized TCP communications over Ethernet (Gus Kassimis)
- New Shared Memory Communications protocol - Direct Memory Access (SMC-D) - Going beyond HiperSockets (Gus Kassimis)
- z/OS Communications Server Performance: Updates and Recommendations (Dave Herr)
- Sysplex and Network Technologies and Considerations (Gus Kassimis)
- Understanding z/OS Communication Server Storage Usage (Mike Fitzpatrick)
- z/OS Communications Server Network Security Overview (Lin Overby)
- z/OS Communications Server Intrusion Detection Services (Lin Overby)
- Safe and Secure Transfers with z/OS FTP (Lin Overby and Sam Reynolds)
- Enterprise Extender on z/OS Communications Server: SNA Hints and Tips (Sam Reynolds)
- TCP/IP Stack Configuration with Configuration Assistant for z/OS V2R2 CS: Hands-on Lab Part 1 of 2 (Mike Fox)
- TCP/IP Stack Configuration with Configuration Assistant for z/OS V2R2 CS: Hands-on Lab Part 2 of 2 (Mike Fox)
- Enabling Continuous Availability and Reducing Downtime with IBM Multi-Site Workload Lifeline (Mike Fitzpatrick)
Also, there will be a panel session for open discussion of mainframe networking topics:
- z/OS Communications Server Free-for-All (Matthias Burkhard, Mike Fitzpatrick, Dave Herr, Gus Kassimis, Lin Overby, and Sam Reynolds)
Lastly, I will be presenting the following ISPF topics:
- ISPF Hidden Treasures and New z/OS 2.2 Features
- ISPF Editor - Beyond the Basics Hands-on Lab, Part 1 of 2 (with Tom Conley and Liam Doherty)
- ISPF Editor - Beyond the Basics Hands-on Lab, Part 2 of 2 (with Tom Conley and Liam Doherty)
We hope to see you there! For those that can’t join us, I’ll be tweeting (IBM_Commserver on Twitter) and posting updates to Facebook (Facebook.com/IBMCommserver) throughout the week.
The QDIO Accelerator function can boost performance of IPv4 traffic forwarded over OSA-Express QDIO and HiperSockets interfaces including sysplex distributor traffic which is routed to a target stack. The optimized packet forwarding provided by QDIO Accelerator improves latency and reduces CPU consumption.
The function applies to traffic which arrives inbound over an OSA-Express QDIO or HiperSockets interface and is forwarded outbound over OSA-Express QDIO or HiperSockets. With QDIO Accelerator, the first time such a packet is forwarded using a given route in the stack routing table, the z/OS stack creates a QDIO Accelerator route. Subsequent eligible packets which would normally be forwarded by the stack on this route instead get processed at the DLC layer without having to traverse the forwarding stack. This provides a much more efficient path for that traffic.
QDIO Accelerator can be especially valuable for sysplex distributor traffic being forwarded to a target stack when you do at least one of the following:
- use HiperSockets to provide dynamic XCF connectivity between stacks on the same CPC
- use VIPAROUTE to route packets to a target stack over an OSA-Express QDIO interface
To enable the QDIO Accelerator function, specify QDIOACCELERATOR on the IPCONFIG statement in the TCP/IP profile of any stack that will perform IP forwarding or that is serving as a sysplex distributor stack. You can enable VIPAROUTE to a specific target stack by using the VIPAROUTE statement in the VIPADYNAMIC block. With VIPAROUTE, the distributor stack forwards packets to a target stack using a route from the stack routing table rather than using dynamic XCF connectivity. A hedged profile sketch follows.
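For illustration only, the relevant profile statements might look like this; the two IP addresses are placeholders for a target stack's dynamic XCF address and the target IP address to route to over the OSA-Express QDIO interface:
IPCONFIG QDIOACCELERATOR
VIPADYNAMIC
 VIPAROUTE DEFINE 10.1.1.1 192.168.1.1  ; placeholder dynamic XCF address / target address
ENDVIPADYNAMIC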
To display the QDIO Accelerator routes, use the Netstat ROUTE/-r report option with the QDIOACCEL modifier. You can use the Netstat VCRT/-V report with the DETAIL modifier to see which sysplex distributor connections are eligible for acceleration. The console forms of these commands are sketched below.
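Assuming a stack procedure name of TCPCS (a placeholder), the console versions of those two reports would be:
D TCPIP,TCPCS,NETSTAT,ROUTE,QDIOACCEL
D TCPIP,TCPCS,NETSTAT,VCRT,DETAIL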
With VTAM tuning statistics, you can display information such as the number of packets and bytes accelerated for each interface. Because the accelerated packets do not traverse the forwarding stack, these packets are not included in a packet trace on that stack. However, these are included in an OSA-Express Network Traffic Analyzer (OSAENTA) trace.
Beginning with z/OS CS V2R1, QDIO Accelerator can coexist with IP security. IP-forwarded packets can be accelerated as long as all routed traffic is permitted by your IP filter policy and is not subject to logging. Sysplex distributor traffic is always eligible for acceleration with QDIO Accelerator, because these packets are subject to IP filtering at the target stack rather than the distributor stack.
In 2013 the IBM zEC12 and zBC12 introduced the 10GbE RoCE Express feature. RoCE Express is an RDMA-capable network adapter that provides access to RDMA over Converged Ethernet (RoCE). RoCE provides an optimized network interconnect for System z communications. Along with RoCE Express, z/OS provided a new RDMA-based solution called Shared Memory Communications over RDMA (SMC-R). SMC-R is a sockets-based solution providing transparent access to RoCE for TCP sockets applications over standard Ethernet.
The IBM System z13 introduces the capability to share (virtualize) the 10GbE RoCE Express feature among multiple (up to 31) LPARs (or z/VM guest virtual machines) using standardized PCIe virtualization (SR-IOV) technology.
When RoCE Express is exploited by z/OS with SMC-R, the combined solutions provide two key value points:
- Improved latency, which can provide improved transaction rates for latency-sensitive transactional workloads
- Lower CPU cost for workloads that transfer larger payloads (i.e. analytics, streaming, FTP, big data, data replication, web services, etc.)
SMC-R does this while preserving the critical qualities of service (load balancing, security, isolation, reuse of IP topology, etc.) required by System z clusters in enterprise data center networks, without requiring any application or middleware changes and with zero or minimal operational changes.
When customers enable SMC-R they should immediately experience the benefits, and longer term the benefits can be extended as they expand their exploitation of RDMA technology on System z.
With the IBM System z13 RoCE virtualization capability, users can now share RoCE Express features, which:
- Extends access to RoCE to additional workloads across multiple z/OS instances (LPARs), reducing the number of required physical RoCE features
- Expands (effectively doubles) the bandwidth of your RoCE Express features by enabling concurrent use of both 10GbE RoCE Express physical ports
Customers who have multiple CPCs in a single site (or an extended LAN among multiple sites) with z/OS centric workloads (e.g. SYSPLEX, DB2, WAS, CICS, MQ, IMS etc.) will be natural candidates for benefiting from RoCE Express and SMC-R.
So, if you understand the technology but you're not sure you have an environment that might benefit from it, we can offer you some help. To assist with assessing how your specific application workloads might be applicable to SMC-R, and the potential level of benefit you might anticipate for your environment, a new tool called the Shared Memory Communications Applicability Tool (SMCAT) has been created. SMCAT is now available via PTF for z/OS V1R13 (UI24872) and z/OS V2R1 (UI24762). SMCAT does not require SMC-R or any special hardware. Instead, SMCAT monitors your existing TCP/IP workloads and produces a summary report to help you understand how your workloads might be eligible for and benefit from SMC-R and RoCE Express. A hedged sketch of starting it follows.
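For illustration, SMCAT is started by pointing the stack at a data set of SMCAT configuration statements (such as the monitoring interval and the peer IP addresses of interest); the stack, data set, and member names here are hypothetical, so check the SMCAT documentation for the exact statement syntax:
V TCPIP,TCPCS,SMCAT,USER1.TCPPARMS(SMCATCFG)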
If you have additional questions about SMC-R, RoCE Express, RoCE virtualization or using SMCAT then your next step is exploring the reference materials provided at:
If you are just getting started then the FAQ document might be a good first step. You can also reach out to the author (Jerry Stevens) of this blog at this email: firstname.lastname@example.org
The Summer 2014 SHARE Conference was a great educational event and continued celebration of the 50th anniversary of the mainframe! Six speakers from the Enterprise Networking Solutions organization presented 12 sessions on z/OS Communications Server and 4 on ISPF, and also participated in 3 z/OSMF hands-on labs. As with Winter SHARE, there continued to be a focus on z/OS V2R1, including an overview of z/OS V2R1 Communications Server, a detailed review of the Shared Memory Communications for RDMA (SMC-R) capability, and a V2R1 performance update. Other topics discussed included Enterprise Extender, sysplex technologies, network security, z/OS mail strategy, and z/OS CS hints and tips. We also had our first zNextGen session with our "Introduction to z/OS Communications Server." Attendance at our sessions (and across the board at SHARE in Pittsburgh) was very good, and we would like to thank all of you who attended our sessions for the great feedback and dialogue.
SHARE loves user experience sessions, and we were fortunate to have a z/OS CS-centric user experience session at this conference: Jim Darby of Nordstrom and Tom Cosenza of IBM Lab Services presented their experience with implementing IPSec on z/OS. Thanks to both of these gentlemen for their presentation!
For those that couldn't be at the conference last week, I will remind you that you can download most of the charts for the topics we presented by going to the following link:
Please plan to join us for the Winter 2015 SHARE Conference in Seattle, Washington, March 1 - 6, 2015.
As Jerry Stevens wrote in his March 8 blog post, the IBM System z13™ and z13s™ introduced an exciting new technology called Internal Shared Memory (ISM) which allows one z/OS instance to directly access (share) virtual memory within another z/OS instance (e.g. LPAR or guest virtual machine) within the same physical machine via DMA. At the same time, z/OS Communications Server introduced Shared Memory Communications – Direct Memory Access (SMC-D), which uses ISM so that TCP sockets applications can directly and transparently communicate with applications executing in other z/OS instances running in other Logical Partitions on the same physical z13 System. Put simply, SMC-D provides SMC-R semantics within a CPC, essentially by substituting RDMA and RoCE with DMA using ISM. (Check out Jerry's blog post for a more complete description.)
Both of the SMC technologies can provide some serious CPU reduction and equally serious throughput boosts. But what about security? How do SMC-D and SMC-R fit in with existing security domains like VLANs? What sort of isolation is available across different RoCE and ISM interfaces? And what happens to all of those security features in Communications Server when the vast majority of the application data is being passed between the communication peers using some form of DMA? We're talking about features like:
- SAF-based access controls like PORTACCESS and NETACCESS
- IP packet filtering
- Cryptographic security protocols like TLS/SSL, SSH and IPsec
- Intrusion Detection Services
How do those security features play (or not play) together with SMC-R and SMC-D?
The short answer is generally "just fine." The main reason for this happy coexistence is that SMC technologies preserve the TCP semantics for managing the sessions -- again, SMC is completely transparent to the TCP applications programs. Since many of the Communications Server's security features operate on or within some aspect of the TCP session semantics, they can continue to operate as usual. Of course, the TCP/IP stack has a little more to keep track of, and has to ensure that when a significant change occurs in the state of the TCP session (for example, access controls change on the fly, preventing access to a port that was previously permitted), the stack needs to reflect that same change on the related SMC-R or SMC-D session. But again, all of this is transparent to the applications.
For the complete security story around Shared Memory Communications, check out this newly revised white paper. Originally published after V2R1 to explain the SMC-R security considerations, this paper is now expanded to cover SMC-D as well.
So remember, colocation, colocation, colocation, but also make sure you lock the doors and engage that security system!
If you had a chance to check out my previous blog on changing our culture then you may recall that we identified the three pieces of DevOps as Culture, Process and Tools. This blog will discuss how we addressed the second prong of the DevOps journey for ENS.
When the ENS journey began, we had to look at the process focus areas that made sense for our organization. We decided on Continuous Improvement, Continuous Integration, and Continuous Delivery. Once we identified the target areas, then we had to define what those mean for ENS and what initiatives we could manage with our current workload. Here are some examples of the initiatives we tackled as part of the process piece of our journey.
Continuous Improvement - The initiatives we decided on were based on pain points and areas of improvement that we could only learn were needed by going through the exercise of value stream mapping. This process is time-consuming and even a little bit tedious, but it really is crucial for kicking off any DevOps journey. Most of us are too ingrained in our own daily processes to see where improvement is needed, and it often takes someone only indirectly related (or not related at all) to the process to ask the necessary questions.
- Performance issue with the publications review tool: When going through the publication review process from the developers' standpoint, we learned there was a performance issue with the tool that developers use to review publications. This was considered part of the Information Development (ID) process, but the issue was on the developer/reviewer's side. Had we not looked at it from the perspective of all the parties involved, we might have overlooked this issue. We engaged the team that supports this tool to upgrade their database version, significantly improving end-user performance.
- Some items were obvious, such as upgrading tools and migrating to newer versions of our internal source control manager to allow better collaboration and increased functionality.
- After reviewing the daily build process in our core workgroup, we realized we were starting our builds over an hour earlier than was necessary, so we moved the build time out and this gave the development team more time every day to check in code.
Continuous Integration - The first step was to identify what CI meant to us, the ENS organization. Where are we integrating continuously? With whom? How will we do it? We know we are agile, and we know that Systems Verification Test (SVT) doesn't need to wait until we complete every line item to begin testing, so when we looked at our product and our customers' needs, we knew that a daily code delivery to SVT was our "with whom".
The biggest goal here is to prevent build failures, which slow the availability of code to test. Our organization has a build verification test (BVT), or smoke test, that every new build must pass before it's made available to test. We determined that we could reduce the number of BVT failures by making that BVT available to all developers, so they can run it on their patch before integrating it into the build. We are also working on beefing up our collection of automated test buckets and making them easily available for developers to run on their patches before integrating them into the build. This emphasizes a key DevOps principle, which is to fail early and cheaply.
Continuous Delivery - Again, we had to decide what this meant for ENS. (See a theme?) We know who we are and who we are NOT. We aren't writing applications for a mobile device or updating a search engine. We are a strictly on-premises product built on security and stability, and our customers do not want frequent new versions. But we need to get feedback earlier in the development cycle, and to do that we cannot wait for a beta program that starts near the end of a two-year release cycle. So we took the Configuration Assistant piece of our product and made pre-Beta development code available to select users. We provide drops and ask customers to try certain items out, and their feedback helps guide our work to architect a really great end-user experience. This is not a simple task; there was overhead in getting it set up and there is overhead in maintaining it, but what we get out of it is really beneficial to creating what the customer wants and needs.
As our journey continues we are always looking for new initiatives that fall into our process focus areas that will enable us to become more efficient in our day-to-day execution and eliminate wasted cycles. If you are interested in how we got started with DevOps, check out our first blog in the series, Enterprise Network Solutions approach to getting started with adopting DevOps for z/OS.
The Winter 2016 SHARE Conference was a great educational event, with a wealth of excellent customer interaction! Six speakers from the Enterprise Networking Solutions organization presented 13 sessions on z/OS Communications Server, 3 on ISPF, and one on IBM Multi-Site Workload Lifeline, including two-part hands-on labs for the Configuration Assistant for z/OSMF and the ISPF editor. We also participated in a panel session where attendees brought plenty of great questions for discussion.
Given the recent availability of z/OS V2R2, there was a focus on z/OS V2R2, with special attention given to the new Shared Memory Communications - Direct Memory Access (SMC-D) protocol. Other topics discussed included Enterprise Extender, sysplex technologies, network security, z/OS CS performance, FTP security, and z/OS CS storage usage. Attendance at our sessions (and across the board at SHARE in San Antonio) was very good, and we would like to thank all of you who attended our sessions for the great feedback and dialogue.
For those that couldn't be at the conference last week, I will remind you that you can download most of the charts for the topics we presented by going to the following link:
Please plan to join us for the Summer 2016 SHARE Conference in Atlanta, Georgia, July 31 - August 5, 2016.