This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
(Short URL for this blog: ibm.co/Pearson)
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems Technical University] events. With over 30 years at IBM, Tony is a frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles during his 19-plus years at IBM. Most recently, he has been leading efforts across the Communications/CSI market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years, Lloyd supported industry accounts as a Storage Solution Architect, and before that as a Storage Software Solutions specialist in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role on the Washington Systems Center team. His current focus is IBM Cloud Private; he will be delivering and supporting sessions at Think 2019 and Storage Technical University on the value of IBM storage in this solution, part of the IBM Cloud strategy. Lloyd maintains Subject Matter Expert status across the IBM Spectrum Storage software solutions. You can follow Lloyd on Twitter @ldean0558 and on LinkedIn as Lloyd Dean.
Tony Pearson's books are available on Lulu.com! Order your copies today!
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. This is my recap for Day 3 breakout sessions.
VersaStack for Containers: IBM Cloud Private and Spectrum Access
Chris Vollmar, IBM Canada, presented all day today. In this session, he explained how "Spectrum Access" was not a product, but rather a blueprint of best practices on how to install the IBM Cloud Private and Spectrum Connect software on the VersaStack solution.
Leveraging IBM Cloud Object Storage for z/OS
Louis Hanna, IBM Z Software, presented this session. I was expecting it to either cover the DS8000 Transparent Cloud Tiering, or direct access to IBM Cloud Object Storage, but it turns out neither!
Instead, Louis talked about [IBM Cloud Tape Connector for z/OS], which mimics tape drive interfaces that can be used to move disk and tape data to a public cloud, or to IBM Cloud Object Storage on premises.
Information Lifecycle Management: Why Archive is Different than Backup
Can you believe there are still companies out there keeping backup tapes for seven years and pretending that this meets their long-term retention requirements?
What happens when you try to recover those tapes, and you need the right server, the right operating system, and the right application software to make sense of it all?
Backups should not be used in this manner. Rather, backups are for recovering from recent hardware failure or data corruption only. If you are keeping backups longer than 90 days, you are probably doing something wrong.
Archiving, on the other hand, is an intelligent process for managing inactive or infrequently accessed data, that still has value, while providing the ability to preserve, search and retrieve the information during a specified retention period.
I have presented this topic before; however, some of the product names have changed, so I thought it would be good to give a fresh update on this topic for this conference.
Becoming the person you published on LinkedIn
Frank Degilio, IBM Distinguished Engineer, presented this "Career Development" session. IBM Technical University is not just for technical education, it also offers sessions of general interest to help round out personal skills.
Frank explained that to rise the corporate ranks, you need to learn to communicate, to collaborate, and to network with others. Technical workers should be "T-shaped", with the top part of the letter "T" representing broad, general skills, and the lower part representing deep technical skills in a specific area.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. This is my recap of afternoon breakout sessions on Day 2.
Spectrum NAS 101 and key use cases
Chris Maestas presented IBM's latest addition to the Spectrum Storage family of software-defined storage. Spectrum NAS was written from scratch in C/C++, instead of using open source code like Samba. It supports both NFS and SMB protocols.
Like IBM Cloud Object Storage, the Spectrum NAS software is shipped with the operating system, so you have a single ISO to run everything. You start with four nodes and can grow capacity and performance as needed by adding more nodes. All nodes have identical roles.
All of the storage is internal. Spectrum NAS uses DRAM memory, NVMe-based Solid State Drives (SSD), and spinning disk HDD. The NVMe drives must support at least five Drive Writes per Day (DWPD).
Each Spectrum NAS node can handle 2,000 connections, and up to 4,000 connections during fail-over processing. With 10GbE bandwidth, you can migrate 100 TB/day from other NAS devices to Spectrum NAS. If you want to try out Spectrum NAS yourself, there is a 60-day free trial offer now available. There are a collection of videos on the [Spectrum NAS YouTube channel] to walk you through the installation process.
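As a quick sanity check on that migration claim (my own back-of-the-envelope arithmetic, not a figure from the session), a sustained 10 GbE link works out to roughly 100 TB per day:

```python
# Back-of-the-envelope check: can a sustained 10 GbE link move ~100 TB/day?
link_gbits_per_sec = 10                         # 10 GbE line rate
bytes_per_sec = link_gbits_per_sec * 1e9 / 8    # = 1.25 GB/s
seconds_per_day = 24 * 60 * 60
tb_per_day = bytes_per_sec * seconds_per_day / 1e12
print(f"{tb_per_day:.0f} TB/day at line rate")  # ~108 TB/day
```

So the quoted 100 TB/day is close to line rate, before protocol overhead.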
Clients are Hyper for Hyperconverged
Marc Richardson and Bruce Jones, both from IBM Cognitive Systems, presented this client case study on successful deployment of IBM Hyperconverged Systems powered by Nutanix, often referred to as the "IBM CS" models of the POWER server line. They covered three use cases:
Modernize to Private Cloud
IBM CS models use the Nutanix Acropolis Hypervisor (AHV) to run Ubuntu and CentOS little-endian virtual machines on POWER. The speakers claimed that they can run 50 percent faster, and run 88 percent more workloads per core, than traditional x86 approaches. IBM has made a statement of direction that IBM CS models will support AIX 7.2 virtual machines later this year.
The IBM CS models can also run IBM Cloud Private, a collection of software that supports Docker and Kubernetes.
Simplify the Data Center
The client was not happy with the high prices of their external, high-end storage systems. When you add another IBM CS model to the cluster, you get more storage capacity and CPU capability at the same time, in lock step. What could be simpler?
Infrastructure for Modern Data Workloads
IBM CS models can run traditional Db2 and WebSphere applications. The client also reduced their costs by switching from expensive Oracle databases to open source databases like MongoDB and EnterpriseDB Postgres.
I was honored to be selected for this week's poster session. I was poster 16, explaining the What, Why and How of IBM Cloud Object Storage. Here I am posing with my colleague Heather Allen, IBM.
Kelly Groff, IBM FlashSystem, had poster 15 on how the embedded compression on the latest FlashSystem 900 models has almost no performance impact. Jeff Barnett, IBM, had poster 14 for IBM's pay-as-you-grow Storage Utility Pricing.
Barry Whyte drew large crowds with his poster 13 on NVMe. Andy Kutner, IBM, had poster 11 on IBM Cloud Object Storage.
Fahima Zamir, IBM, had poster 29 on the VersaStack solution, which combines best-of-breed x86 servers and switches from Cisco with IBM storage into a converged system. Sharie Mims is from VSS, an IBM Business Partner.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. Here is my recap of morning breakout sessions on Day 2.
A Survey of Deep Learning Techniques
Nin Lei, IBM Distinguished Engineer, presented a sample of Deep Learning techniques used today: CNN, RNN, and GAN.
Basic decision making involves gathering data, having it reviewed by a subject matter expert (SME), and reaching an outcome. This is done for a variety of situations: fraudulent vs. legitimate credit card transactions, approving or rejecting a loan application, determining whether a tumor is benign or malignant. Machine Learning effectively replaces the SME with a mathematical function.
Various tools are available for this: TensorFlow, SnapML, SAS, and SPSS are just a few.
Deep Learning is based on "Neural Networks", a subset of Machine Learning. There is an input layer, one or more hidden layers, and then an output layer. For example, for a photo, each pixel could be an input feature: a 200x200 pixel photo represents 40,000 input values. In the past, there weren't more than three hidden layers. Today, we can have 20 to 50 layers, because we now have more computational power, achieving 95-97 percent accuracy.
For each connection between the input, hidden, and output layers, you identify weights and biases. A research paper (Hornik, 1989) posits that any machine learning task can be performed by a sufficiently large neural network.
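To make the layers, weights, and biases concrete, here is a minimal sketch of my own (not code from the session) of a forward pass through one hidden layer; the example weights and inputs are made up for illustration:

```python
import math

def forward(x, hidden, output):
    """Forward pass through one hidden layer and one output layer.

    Each layer is a list of (weights, bias) pairs, one per neuron.
    """
    def layer(inputs, neurons, activation):
        return [activation(sum(w * v for w, v in zip(weights, inputs)) + bias)
                for weights, bias in neurons]
    h = layer(x, hidden, math.tanh)                            # hidden layer, tanh
    return layer(h, output, lambda z: 1 / (1 + math.exp(-z)))  # sigmoid output

# Two inputs -> two hidden neurons -> one output (e.g. approve/reject a loan)
hidden = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output = [([1.2, -0.7], 0.05)]
score = forward([0.9, 0.1], hidden, output)
print(score)  # a single value between 0 and 1
```

Training is then the process of adjusting those weights and biases until the output matches the SME's decisions.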
Convolution Neural Network (CNN) is often used for image recognition, for object classification or detection.
Some features are invariant. Location invariant means it doesn't matter where it is located within the photo. Color invariant means it does not matter what color it is, and can work with black-and-white or grayscale photos.
For example, for facial recognition, earlier layers are focused on identifying edges, and later layers identify facial features like eyes, nose and mouth.
Image recognition is used with self-driving cars, drones to determine power line maintenance or crop inspection, social media, video surveillance, medical image diagnosis, car racing, and ripeness of fruits and vegetables.
CNN is used for auto-encoding. This takes detailed photos, compresses them, and can later decode them back to something similar. It can take weeks to train a model with a million photo images.
Recurrent Neural Network (RNN) is focused on time sequence.
This is useful for predicting sequences of letters or words. However, since a long sequence of multiplications is involved, the result will tend toward either zero or infinity; this is known as the "vanishing gradient problem".
The solution is "Long Short Term Memory" (LSTM) cells. Basically, the model selectively remembers information from previous steps, which reduces the number of multiplications.
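The vanishing-gradient effect is easy to see with a toy calculation of my own (not from the session): the derivative of a sigmoid activation is at most 0.25, so a gradient propagated back through many layers shrinks geometrically with depth:

```python
# Toy illustration of the vanishing gradient: the sigmoid derivative is
# at most 0.25, so a gradient passed back through N layers is scaled by
# at most 0.25**N, collapsing toward zero as the network gets deeper.
max_sigmoid_grad = 0.25
for depth in (3, 10, 50):
    grad = max_sigmoid_grad ** depth
    print(f"depth {depth:2d}: gradient factor <= {grad:.2e}")
```

At 50 layers the factor is effectively zero, which is why LSTM cells avoid long multiplication chains.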
RNNs need to know related words. For example: men-women, king-queen, walking-walked, swimming-swam, Spain-Madrid. These are referred to as "embeddings", which are stored in the hidden layers for quick lookup.
Generative Adversarial Networks (GAN) are used to generate fake photos to train other models.
Sometimes, you do not have enough photos in each category for training, so you can generate fake images to help with the training system. Noise is fed into a "Generator" model, and then the results are evaluated by a "Discriminator" model, comparing the fake with real photos. Repetition allows each model to improve so that the fake photos become more realistic for training purposes.
The death of the one-size-fits-all cloud: The mainstreaming of multi-arch
Elise Spence and Drew Thorstensen, IBM Power Systems for Software Defined Cloud Infrastructure, presented this topic. The session was on IBM Cloud Private, and the multiple architectures supported by Docker and Kubernetes.
There are actually six different architectures supported for Docker containers:
While containers are "portable" between systems, the binaries are typically only written for a single architecture, typically Linux-x86 or Windows-x86, and won't run on POWER or IBM Z.
The solution is to create a multi-arch manifest file, and port all the binaries to all of these different architectures. This way, when the containerized application is run on POWER, the manifest will identify the POWER-based binaries.
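For illustration, a simplified manifest list pairs each image digest with its platform, so the Docker engine on a POWER node pulls the ppc64le binaries (the digests below are placeholders, not real values):

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    { "digest": "sha256:aaa...", "platform": { "architecture": "amd64",   "os": "linux" } },
    { "digest": "sha256:bbb...", "platform": { "architecture": "ppc64le", "os": "linux" } },
    { "digest": "sha256:ccc...", "platform": { "architecture": "s390x",   "os": "linux" } }
  ]
}
```

The same image tag then resolves to the right binaries on x86, POWER, or IBM Z.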
Introduction to IBM Cloud Object Storage (powered by Cleversafe)
Before 2015, IBM offered two "Object Storage" products: IBM Spectrum Scale and IBM Spectrum Archive, and I was constantly having to compare and contrast IBM products to Cleversafe.
Not any more! With the IBM acquisition of Cleversafe, IBM now offers all three!
This session explained all of the features and functions of IBM Cloud Object Storage System, available as software, as pre-built systems, including a VersaStack CVD, and as Storage-as-a-Service (STaaS) in the IBM Cloud.
(IBM renamed Cleversafe DSnet to "IBM Cloud Object Storage System". I joked that if IBM ever acquired Coca-Cola, they would probably rename their signature soft drink as the "Brown Carbonated Sugar Liquid", or BroCarb SugarLiq for short!)
I provided a general overview, as well as the latest features of Concentrated Dispersal Mode and Compliance Enabled Vaults.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
Last year, Hurricanes Harvey, Irma, Jose, and Maria ravaged various parts of North America and the Caribbean. My topic on Business Continuity and Disaster Recovery (BC/DR) was well attended. I have been working in BC/DR for most of my career, including the "High Availability Center of Competency", or HACOC for short.
However, natural disasters like hurricanes, tornadoes, forest fires and floods represent less than 20 percent of all disasters. The majority of disasters, nearly 75 percent, arise from electrical power outages, human error, system failure and ransomware.
The seven tiers of disaster recovery were developed by a group of IBM customers back in the 1980s, and have stood the test of time. I recently published an article in IBM Systems Magazine (January/February 2018) based on this presentation.
Cloud storage comes in four flavors: persistent, ephemeral, hosted, and reference. The first two I refer to as "Storage for the Computer Cloud" and the latter two I refer to as "Storage as the Storage Cloud".
I also explained the differences between block, file and object access, and why different Cloud storage types use different access methods.
Finally, I covered some Hybrid Cloud Storage configurations, showing how traditional IT, on-premises local private cloud, off-premises dedicated private cloud, and public cloud can be combined to provide added value.
Reporting and Monitoring: How to Verify your Storage is Being Used Efficiently
It is hard to believe that it was over 15 years ago that I was the chief architect for the software we now call IBM Spectrum Connect, Spectrum Control and Storage Insights. There are a variety of editions and bundles for this product, but my focus on this talk was on the advanced storage analytics found in IBM Virtual Storage Center and IBM Spectrum Control Advanced Edition.
I covered three use cases:
What storage tier to put your workload in, and how to move existing data into a faster or slower tier to meet business requirements and IT budgets.
For steady state environments, how to re-balance storage pools within a single tier to keep things even for optimal performance.
When it is time to decommission storage, how to transform volumes from one storage pool to another without downtime or outages.
Special thanks to Bryan Odom for his help in updating this presentation.
Spectrum Virtualization Data Reduction Pools 101
Barry Whyte, IBM Master Inventor and ATS for Storage Virtualization for the Asia Pacific region, presented on how Data Reduction Pools were implemented in version 8.1.2 of Spectrum Virtualize. This software runs in the latest IBM SAN Volume Controller (SVC), IBM Storwize products, and IBM FlashSystem V9000.
Basically, rather than say we "re-wrote" the code, we prefer softer euphemisms like the code was "re-imagined" or, my favorite lately, "re-factored". Legacy Storage Pools will continue to be supported, but IBM anticipates that people over time will transition to the new Data Reduction Pools (DR Pools).
Like Legacy Storage Pools, the new DR Pools also support a mix of fully-allocated, thin-provisioned, and compressed-thin volumes. IBM has made a statement of direction that it will offer a Data Deduplication feature in the future, but only on the new DR Pools.
While DR Pools are available today with version 8.1.2, there are a few restrictions. There is a limit of four DR Pools per cluster, and the total capacity of each pool depends on the extent size and number of I/O groups configured. Some of the migration methods developed for Legacy Storage Pools are not available, and in reality don't make sense in the new DR Pool scheme. Child Pools are not supported either.
One of the big improvements that DR Pools offer is in the area of compression. With Legacy Storage Pools, CPU cores were dedicated for compression, so they were either under-utilized or overwhelmed. With DR pools, all CPU cores can be used for either I/O or compression, which potentially can increase performance by up to 40 percent!
After the sessions, IBM had its "Solution Center Reception". This is a chance to relax and unwind after a long day, with food and drink, and various sponsors in booths to explain their latest offerings.
This is Katie Thacker from [FIT]. In March 2018, FIT was recognized as IBM’s Top Strategic Service Provider of the year!
These are Elizabeth Krivan and Kelly Bouchard, two recently-hired IBM storage sellers. They attended my sessions at the IBM Technical University in New Orleans last October, so it was good to see them again at my sessions here in Orlando.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. Here is my recap for the keynote sessions on Day 1.
Art Beller, IBM Vice President of WW Systems Technical Sales
Art Beller, my third-line manager, kicked off the event. He explained that with [Artificial Intelligence], or AI for short, we are entering the "age of the incumbent". All across industries, the companies that have established dominance over the decades have the most data to get value from.
Kathryn Guarini, IBM Vice President Research Strategy
Kathryn provided an overview of the latest news on AI. Over 700 students at MIT, and 1,000 students at Stanford University, have signed up for "Intro to AI" classes. There are over 30,000 AI-related jobs in IT today. The investment in AI is 10 times more than it was just four years ago.
Kathryn explained there are three levels of AI: Narrow, Broad, and General. Narrow AI finally works, such as face recognition or speech-to-text translation. Broad AI is still a ways out, and General AI is not expected until year 2050.
An area of research is to "Learn more with less". For example, if you train a photo image recognition to identify different species of dogs, can you extend some of this learning to recognize different cats? This is often referred to as "Transfer Learning".
Cyber-criminals are already using AI, and if they can infiltrate AI training models, can introduce some scary scenarios. The next cyber battle-field will be AI vs. AI.
AI results need to be "Explainable", both in the training and debugging phases, as well as the infer/deployment phases. We need to detect and eliminate human biases, and rank different models on their fairness.
Kathryn gave some real examples:
Medical Sieve: An MRI scan captures over 10,000 images. Through AI, the top 25 most important images can be identified, making a doctor's job easier in identifying tumors.
Cancer Research: There are over 800 billion DNA base pairs to evaluate for different cancers, along with 723 million published articles of relevant research. AI can help sort this out, matching the best research to the appropriate type of cancer.
Banking Regulations: There are over a million compliance documents, and some banks have more than 10,000 employees focused on enforcing compliance. About 10 percent of these compliance documents change every year, making this a moving target.
Fraud Detection: There are too many "false positives" in today's algorithms for suspicious spending behavior. AI can help identify this better.
Video Highlights: AI can be used to generate movie trailers or sports highlights by identifying the most relevant portions of a movie or sporting event.
Reduce Air Pollution: China is investigating the use of AI to reduce air pollution in its country. Large cities like Beijing are particularly over-polluted.
Hillery Hunter, IBM Fellow and Director of Accelerated Cognitive Infrastructure at IBM Research
AI takes terabytes of information, both structured and unstructured data, and distills it into a model that is very small, perhaps a few MB or GB.
The four steps are: identify your data sources, do some data preparation, train your model, and then infer using that model. Your data sources are stored in a Capacity Tier (often referred to as Data Lake). Inference must be done quickly, so a Performance tier is needed for that phase.
In some cases, data can't move, so for those situations, we need "Federated AI" where we can combine results from different systems.
IBM has added Distributed Deep Learning (DDL) to its PowerAI set of libraries. To estimate "Click-Thru Rate", a typical approach with 4.2 billion training examples took 70 minutes. With PowerAI DDL, this was reduced to 91 seconds. In another example, training that took nine days was reduced to four hours.
Lastly, Hillery mentioned "in-memory computing". Rather than reading data in from memory, and performing some computation on it, this new approach does part of the compute processing on the memory chip itself, eliminating a lot of data transfers.
Clod Barrera, IBM Distinguished Engineer and Chief Technical Strategist for storage
In previous years, IBM Technical University would offer brand-specific keynote sessions for IBM Z, IBM Power and IBM Storage. However, these were in the same time slot, so you could only see one of them. This year, IBM Storage was put into a different slot, so people could hear about their server of choice, and then also listen to the storage keynote.
Clod gave a state of the industry related to different storage media. For Flash, for example, he explained that Phase Change Memory is being developed, using the difference between amorphous and crystalline states to represent ones and zeros.
Tape is also seeing a resurgence. In 2005, Microsoft had declared tape was dead. Today, their Microsoft Azure is a big fan of tape to store data at reduced cost. Tape is 20 times less expensive than disk.
Clod summarized his talk by stating the key areas of storage development:
Optimizing for Artificial Intelligence
Automation for Security and Privacy
Data Governance and Management
You can follow along this week with Twitter hashtag #IBMTechU, or follow me at @az990tony.