This blog is for the open exchange of ideas relating to IBM Systems, storage and storage networking hardware, software and services.
Tony Pearson is a Master Inventor, Senior IT Architect and Event Content Manager for [IBM Systems Technical University] events. With over 30 years at IBM, Tony is a frequent traveler, speaking to clients at events throughout the world.
Lloyd Dean is an IBM Senior Certified Executive IT Architect in Infrastructure Architecture. Lloyd has held numerous senior technical roles during his 19-plus years at IBM. Most recently, he has been leading efforts across the Communication/CSI market as a senior Storage Solution Architect/CTS covering the Kansas City territory. In prior years, Lloyd supported industry accounts as a Storage Solution Architect, and before that as a Storage Software Solutions specialist in the ATS organization.
Lloyd currently supports North America storage sales teams in his Storage Software Solution Architecture SME role on the Washington Systems Center team. His current focus is IBM Cloud Private; he will be delivering and supporting sessions at Think 2019 and Storage Technical University on the value of IBM storage in this high-value solution, part of the IBM Cloud strategy. Lloyd maintains Subject Matter Expert status across the IBM Spectrum Storage software solutions. You can follow Lloyd on Twitter @ldean0558 and on LinkedIn as Lloyd Dean.
Safe Harbor Statement: The information on IBM products is intended to outline IBM's general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on IBM products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for IBM products remains at IBM's sole discretion.
Tony Pearson is an active participant in local, regional, and industry-specific interests, and does not receive any special payments to mention them on this blog.
Tony Pearson receives part of the revenue proceeds from sales of books he has authored listed in the side panel.
Tony Pearson is not a medical doctor, and this blog does not reference any IBM product or service that is intended for use in the diagnosis, treatment, cure, prevention or monitoring of a disease or medical condition, unless otherwise specified on individual posts.
This session had four parts. First, an overview of "Data Footprint Reduction" technologies, like compression, data deduplication, space-efficient snapshots and thin provisioning.
Second, a look at how these technologies can get storage administrators in trouble. Much like airlines selling more tickets than seats on the airplane, storage administrators may over-provision based on data reduction estimates, and then suddenly run out of storage capacity.
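The over-provisioning risk is easy to quantify. Here is a minimal Python sketch, using made-up numbers (the 100 TB array and the 4:1 estimate are hypothetical, not from any IBM sizing guide), of what happens when real workloads reduce worse than the estimate used for provisioning:

```python
# Illustrative only: hypothetical numbers, not from any IBM sizing guide.
def effective_capacity(physical_tb, reduction_ratio):
    """Usable capacity if every workload achieves the assumed reduction ratio."""
    return physical_tb * reduction_ratio

physical = 100            # TB of physical flash
assumed_ratio = 4.0       # vendor estimate: 4:1 data reduction
provisioned = effective_capacity(physical, assumed_ratio)   # 400 TB promised

actual_ratio = 2.0        # workloads actually reduce only 2:1
usable = effective_capacity(physical, actual_ratio)         # 200 TB real

shortfall = provisioned - usable
print(provisioned, usable, shortfall)  # 400.0 200.0 200.0
```

The 200 TB gap is exactly the "overbooked airline" scenario: capacity was promised that the physical storage cannot deliver.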
Third, an overview of IBM FlashSystem A9000 and A9000R products, often referred to as "A9000/R" to cover both as a family. These models offer data footprint reduction for all data.
Finally, I explained how the Hyper-Scale Manager GUI can help with reporting and analytics to avoid these risks. This GUI is available for the FlashSystem A9000/R, as well as XIV Gen3 and Spectrum Accelerate software clusters.
Special thanks to Rivka Matosevich for her help in preparing this presentation.
The Pendulum Swings: Understanding Converged and Hyperconverged Integrated Systems
With IBM's partnership with Cisco for VersaStack, and Nutanix for the IBM Power systems, this has become a particularly popular topic.
I started with an overview of the last 50 years of storage evolution, from internal and external storage to NAS and SAN storage networks. An estimated 96 percent of the storage in corporate data centers is connected via NAS or SAN networks.
More recently, people have been willing to give up all those gains for something simpler, less powerful, less reliable, less expensive. Enter Converged and Hyperconverged Systems. IBM PureApplication and VersaStack lead the pack for Converged Systems, along with IBM Spectrum Scale, Spectrum Accelerate and Nutanix on IBM Power Systems for Hyperconverged Integrated Systems.
We had 1,600 attendees, much higher than expected. This is a good sign, when you consider IBM just had its "Think 2018" conference last March, and Dell EMC had their big conference the same week in Las Vegas.
When people asked me the main difference between "Think 2018" and "IBM Technical University", I explained it as follows:
Think 2018 is a big conference focused on uni-directional communication. IBM executives present the corporate line repeatedly to large audiences. Its size and scale means they can have big name bands and celebrity speakers.
IBM Technical University is a smaller conference focused on bi-directional communication. Audiences are small and encouraged to ask questions. Demos, Labs and Meetups allow for conversations with IBM technical experts. There are no crowds in the hallways to hamper ad-hoc side conversations. The IBM speakers listen to clients' concerns and bring that feedback to development.
Confused yet? Fortunately, the speaker Mark Rader, IBM Z, focused entirely on the z/OS platform version of these tools.
Storage Meetup: Cloud and Object Storage
In past years, the conference would organize three huge rooms, one for IBM Z, one for IBM POWER, and the last for IBM Storage. These would be Q&A panels, with a dozen experts at the front of the room, and a large audience asking questions.
This year, these were all split up into smaller "Meetup" sessions. There were Meetups for Spectrum Protect, Spectrum Scale, Encryption, IBM i, z/OS, Power, FlashSystem, TS7700, DS8000, Blockchain, DevOps, Machine Learning, and Spectrum Virtualize.
Andy Kutner and I led the Storage Meetup for "Cloud and Object Storage". Andy and I had both done several presentations on these topics during the week, and we were able to handle the questions that came from the audience. Here is a sample:
Is IBM Spectrum Virtualize Transparent Cloud Tiering mature enough to use now?
It is unfortunate that IBM chose "Transparent Cloud Tiering" (TCT) to describe four implementations on four different product families. The one thing they have in common is that they send data to the Cloud or IBM Cloud Object Storage. The TCT feature of Spectrum Virtualize, which covers SAN Volume Controller (SVC), Storwize and FlashSystem V9000, was made generally available last year.
Why does our use of S3FS not perform well?
S3FS was developed to allow file system access through the FUSE driver on Linux and MacOS operating systems. It was not optimized for performance; it was meant as a convenience instead.
Does Spectrum Copy Data Management support the Cloud?
Yes, Spectrum CDM works with a variety of IBM and non-IBM storage to take volume snapshots that can be used for DevOps, dev/test, and other purposes. These snapshots can be moved to the cloud to make them more broadly accessible to developers.
How do we size backup configurations to the Cloud?
Several backup software products support IBM Cloud and IBM Cloud Object Storage, including Veritas NetBackup and Commvault Simpana. For IBM Spectrum Protect, IBM has published "blueprints" that now include Cloud configurations, with all the sizing work done for you.
When should we consider using a co-location facility instead of Public Cloud?
There are pros and cons for each, and there are also options in between. IBM Cloud offers "Dedicated" bare-metal servers, giving you complete control, but still publicly accessible. IBM Cloud also offers "Private" bare-metal behind your firewall.
Can IBM COS Vault Mirroring employ the Aspera file transfer protocol?
Andy and I were stumped on this one. Andy agreed to get back to the attendee on this.
Can IBM COS support Spark analytics workloads?
Yes, IBM COS supports [Stocator], a high performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics.
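For readers curious what using Stocator looks like, here is a sketch of the Hadoop configuration properties involved, expressed as a plain Python dict. The property names are my recollection of the Stocator README and should be verified against the current Stocator documentation; the bucket, service name and path are made up for illustration.

```python
# Hypothetical Stocator setup: property names paraphrased from the Stocator
# README (verify against current docs); bucket/service/path are made up.
stocator_conf = {
    "fs.stocator.scheme.list": "cos",
    "fs.cos.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
    "fs.stocator.cos.impl": "com.ibm.stocator.fs.cos.COSAPIClient",
    "fs.stocator.cos.scheme": "cos",
}

def spark_read_url(bucket, service, path):
    """Build the cos:// URL a Spark job would read from object storage."""
    return f"cos://{bucket}.{service}/{path}"

print(spark_read_url("mybucket", "myservice", "data/events.parquet"))
# cos://mybucket.myservice/data/events.parquet
```

In a real Spark job, these properties would be set on the Hadoop configuration, and the URL passed to a reader such as `spark.read.parquet()`.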
We are currently using IBM FileNet to Centera, can the data be moved to IBM COS?
Absolutely. Not only does FileNet offer utilities to make this possible, IBM has also partnered with service providers that offer fast data movement from Centera to IBM COS for a variety of specific applications, including FileNet.
What are the key selling points over Dell EMC Centera?
Basically, IBM COS is more scalable and performs faster, at a lower Total Cost of Ownership (TCO). Contact your local storage seller or IBM Business Partner for a full presentation!
Is there a database to quickly search through all of the valuable metadata stored in IBM COS and other repositories?
Great idea! I will pass this on to development.
How does IBM COS protect against disk failure or data corruption?
IBM COS slowly rolls through all the data checking the integrity. Checksums are validated, and any corrupted or missing slices are reconstructed using Erasure Coding.
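The scrub-and-rebuild idea can be illustrated with a toy example. This sketch uses simple XOR parity across three slices, so a single corrupted slice can be rebuilt from the others; the real product uses an Information Dispersal (erasure coding) scheme across many more slices, but the checksum-validate-then-reconstruct flow is the same idea:

```python
import hashlib

# Toy illustration: XOR parity stands in for the real erasure code.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_slices(data_slices):
    """Compute one parity slice and a checksum per data slice."""
    parity = data_slices[0]
    for s in data_slices[1:]:
        parity = xor_bytes(parity, s)
    checksums = [hashlib.sha256(s).hexdigest() for s in data_slices]
    return parity, checksums

def scrub_and_repair(data_slices, parity, checksums):
    """Validate each checksum; rebuild a corrupted slice from the rest."""
    for i, s in enumerate(data_slices):
        if hashlib.sha256(s).hexdigest() != checksums[i]:
            rebuilt = parity
            for j, other in enumerate(data_slices):
                if j != i:
                    rebuilt = xor_bytes(rebuilt, other)
            data_slices[i] = rebuilt
    return data_slices

slices = [b"AAAA", b"BBBB", b"CCCC"]
parity, sums = make_slices(slices)
slices[1] = b"XXXX"                      # simulate silent corruption
repaired = scrub_and_repair(slices, parity, sums)
print(repaired[1])  # b'BBBB'
```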
Who competes with IBM COS?
IBM ranks #1 in Object Storage. There are several competitors. Established storage companies like Dell EMC, NetApp, Hitachi Vantara, and Hewlett Packard Enterprise (HPE) Simplivity have object storage offerings. Other companies have attempted to build object stores from open source code, such as OpenStack Swift and Ceph.
How can IBM Business Partners demonstrate IBM COS to clients?
The IBM Systems Client Experience Portal [ISCEP] provides a list of demos available for all IBM storage products, including IBM COS.
Event Night: Universal Studios
IBM rented out a section of Universal Studios! We were welcomed by various characters to take pictures with! Here are Scooby Doo and Shaggy!
There were several rides to choose from. While waiting for the "Race through New York with Jimmy Fallon" ride, we were entertained by five a cappella singers performing popular rap songs. Laura poses for a picture with two Minions.
The other two rides were "Transformers" and "The Mummy". After the Transformers ride, we could take pictures with Bumblebee, one of the Autobots. I posed with Ahmanet, the mummy played by Sofia Boutella in the 2017 movie.
There was also plenty of food, representing Italian, Mexican, Greek, Chinese and American fare.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. This is my recap for Day 3 breakout sessions.
VersaStack for Containers: IBM Cloud Private and Spectrum Access
Chris Vollmar, IBM Canada, presented all day today. In this session, he explained how "Spectrum Access" was not a product, but rather a blueprint of best practices on how to install the IBM Cloud Private and Spectrum Connect software on the VersaStack solution.
Leveraging IBM Cloud Object Storage for z/OS
Louis Hanna, IBM Z Software, presented this session. I was expecting it to cover either the DS8000 Transparent Cloud Tiering or direct access to IBM Cloud Object Storage, but it turned out to be neither!
Instead, Louis talked about [IBM Cloud Tape Connector for z/OS], which mimics tape drive interfaces that can be used to move disk and tape data to a public cloud, or to IBM Cloud Object Storage on premises.
Information Lifecycle Management: Why Archive is Different than Backup
Can you believe there are still companies out there keeping backup tapes for seven years and pretending that this meets their long-term retention requirements?
What happens when you try to recover those tapes, and you need the right server, the right operating system, and the right application software to make sense of it all?
Backups should not be used in this manner. Rather, backups are meant to recover from recent hardware failure or data corruption only. If you are keeping backups longer than 90 days, you are probably doing something wrong.
Archiving, on the other hand, is an intelligent process for managing inactive or infrequently accessed data that still has value, while providing the ability to preserve, search and retrieve the information during a specified retention period.
However, some of the product names have changed, so I thought it would be good to do a fresh update on this topic for this conference.
Becoming the person you published on LinkedIn
Frank Degilio, IBM Distinguished Engineer, presented this "Career Development" session. IBM Technical University is not just for technical education, it also offers sessions of general interest to help round out personal skills.
Frank explained that to rise through the corporate ranks, you need to learn to communicate, to collaborate, and to network with others. Technical workers should be "T-shaped", with the top bar of the letter "T" representing broad, general skills, and the vertical stroke representing deep technical skills in a specific area.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. This is my recap of afternoon breakout sessions on Day 2.
Spectrum NAS 101 and key use cases
Chris Maestas presented IBM's latest addition to the Spectrum Storage family of Software-Defined Storage. Spectrum NAS was written from scratch in C/C++, instead of building on open source code like Samba. It supports both NFS and SMB protocols.
Like IBM Cloud Object Storage, the Spectrum NAS software is shipped with the operating system, so you have a single ISO to run everything. You start with four nodes and can grow capacity and performance as needed by adding more nodes. All nodes have identical roles.
All of the storage is internal. Spectrum NAS uses DRAM memory, NVMe-based Solid State Drives (SSD), and spinning disk HDD. The NVMe drives must support at least five Drive Writes per Day (DWPD).
Each Spectrum NAS node can handle 2,000 connections, and up to 4,000 connections during fail-over processing. With 10GbE bandwidth, you can migrate 100 TB/day from other NAS devices to Spectrum NAS. If you want to try out Spectrum NAS yourself, there is a 60-day free trial offer now available. There are a collection of videos on the [Spectrum NAS YouTube channel] to walk you through the installation process.
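The 100 TB/day migration figure is consistent with back-of-the-envelope math on a 10 GbE link:

```python
# Sanity check on the 100 TB/day migration figure over a 10 GbE link.
link_gbps = 10                       # gigabits per second
bytes_per_sec = link_gbps * 1e9 / 8  # 1.25 GB/s at line rate
tb_per_day = bytes_per_sec * 86400 / 1e12
print(round(tb_per_day))  # 108 -> roughly 100 TB/day after protocol overhead
```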
Clients are Hyper for Hyperconverged
Marc Richardson and Bruce Jones, both from IBM Cognitive Systems, presented this client case study on a successful deployment of IBM Hyperconverged Systems powered by Nutanix, often referred to as the "IBM CS" models of the POWER server line. They covered three use cases:
Modernize to Private Cloud
IBM CS models use the Nutanix Acropolis Hypervisor (AHV) to run Ubuntu and CentOS little-endian virtual machines on POWER. The speakers claimed that they can run 50 percent faster, and 88 percent more workloads per core, than traditional x86 methods. IBM has made a statement of direction that IBM CS models will support AIX 7.2 virtual machines later this year.
The IBM CS models can also run IBM Cloud Private, a collection of software that supports Docker and Kubernetes.
Simplify the Data Center
The client was not happy with the high prices of their external, high-end storage systems. When you add another IBM CS model to the cluster, you get more storage capacity and CPU capability at the same time, in lock step. What could be simpler?
Infrastructure for Modern Data Workloads
IBM CS models can run traditional Db2 and WebSphere applications. The client also reduced their costs by switching from expensive Oracle databases to open source databases like MongoDB and EnterpriseDB Postgres.
I was honored with being selected for this week's poster session. I was poster 16, explaining the What, Why and How of IBM Cloud Object Storage. Here I am posing with my colleague Heather Allen, IBM.
Kelly Groff, IBM FlashSystem, had poster 15 on how the embedded compression in the latest FlashSystem 900 models has almost no performance impact. Jeff Barnett, IBM, had poster 14 for IBM's Pay-as-you-grow Storage Utility Pricing.
Barry Whyte drew large crowds with his poster 13 on NVMe. Andy Kutner, IBM, had poster 11 on IBM Cloud Object Storage.
Fahima Zamir, IBM, had poster 29 on the VersaStack solution, which combines best-of-breed x86 servers and switches from Cisco with IBM storage into a converged system. Sharie Mims is from VSS, an IBM Business Partner.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. Here is my recap for morning break out sessions on Day 2.
A Survey of Deep Learning Techniques
Nin Lei, IBM Distinguished Engineer, presented a sample of Deep Learning techniques used today: CNN, RNN, and GAN.
Basic decision making works like this: gather data, have it reviewed by a subject matter expert (SME), and arrive at an outcome. This is done for a variety of situations: fraudulent vs. legitimate credit card transactions, approving or rejecting a loan application, determining whether a tumor is benign or malignant. Machine Learning effectively replaces the SME with a mathematical function.
Various tools are available for this: TensorFlow, SnapML, SAS, and SPSS, to name just a few.
Deep Learning is based on "Neural Networks", a subset of Machine Learning. There is an input layer, one or more hidden layers, and then an output layer. For example, for a photo, each pixel could be an input feature; a 200x200 pixel photo represents 40,000 input values. In the past, networks rarely had more than three hidden layers. Today, with more computational power, we can have 20 to 50 layers, achieving 95 to 97 percent accuracy.
For each connection between the input layer, hidden layers, and output layer, you identify weights and biases. A research paper (Hornik, 1989) posits that any machine learning task can be performed by a sufficiently large neural network.
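To make the layers/weights/biases vocabulary concrete, here is a minimal pure-Python forward pass through one hidden layer. The dimensions are tiny and made up (4 inputs rather than 40,000), purely for illustration:

```python
import math
import random

random.seed(0)

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then a sigmoid."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, w_row)) + b
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out

# Hypothetical shapes: 4 input features, one hidden layer of 3, one output.
n_in, n_hidden, n_out = 4, 3, 1
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

x = [0.2, 0.5, 0.1, 0.9]         # one tiny "photo" of 4 pixel values
hidden = dense(x, w1, b1)
output = dense(hidden, w2, b2)
print(len(hidden), len(output))  # 3 1
```

Training would then adjust `w1`, `b1`, `w2`, `b2` to push `output` toward the desired answer; that loop is omitted here.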
The Convolutional Neural Network (CNN) is often used for image recognition, for object classification or detection.
Some features are invariant. Location invariant means it doesn't matter where it is located within the photo. Color invariant means it does not matter what color it is, and can work with black-and-white or grayscale photos.
For example, for facial recognition, earlier layers are focused on identifying edges, and later layers identify facial features like eyes, nose and mouth.
Image recognition is used with self-driving cars, drones to determine power line maintenance or crop inspection, social media, video surveillance, medical image diagnosis, car racing, and ripeness of fruits and vegetables.
CNN is also used for auto-encoding. This takes detailed photos, compresses them, and can then decode them back to something similar. It can take weeks to train a model with a million photo images.
The Recurrent Neural Network (RNN) focuses on time sequences.
This is useful for predicting sequences of letters or words. However, because of the mathematics involved, a long chain of multiplications tends toward either zero or infinity; the former is known as the "vanishing gradient problem".
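You can see the vanishing (and its mirror image, exploding) gradient numerically: multiply 50 per-layer derivative factors together and the product either collapses toward zero or blows up. The factors 0.5 and 1.5 are arbitrary illustrative values:

```python
# A chain of 50 steps, each contributing a derivative factor of 0.5 (or 1.5):
depth = 50
vanishing = 0.5 ** depth
exploding = 1.5 ** depth
print(vanishing)  # ~8.9e-16: the gradient signal effectively disappears
print(exploding)  # ~6.4e8: the gradient blows up instead
```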
The solution is "Long Short Term Memory" (LSTM) cells. Basically, the model selectively remembers information from previous steps, which reduces the number of multiplications.
RNNs need to know related words, for example: men-women, king-queen, walking-walked, swimming-swam, Spain-Madrid. These relationships are referred to as "embeddings", which are stored in the hidden layers for quick lookup.
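A toy example of the analogy arithmetic that embeddings enable, using hand-picked 2-D vectors; real embeddings are learned from data and have hundreds of dimensions:

```python
# Hand-picked 2-D "embeddings", chosen so the analogy arithmetic works out.
emb = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def nearest(v, table):
    """Word whose embedding has the smallest squared distance to v."""
    return min(table, key=lambda w: sum((x - y) ** 2 for x, y in zip(v, table[w])))

# king - man + woman should land closest to queen
v = add(sub(emb["king"], emb["man"]), emb["woman"])
print(nearest(v, emb))  # queen
```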
Generative Adversarial Networks (GAN) are used to generate fake photos to train other models.
Sometimes, you do not have enough photos in each category for training, so you can generate fake images to help with the training system. Noise is fed into a "Generator" model, and then the results are evaluated by a "Discriminator" model, comparing the fake with real photos. Repetition allows each model to improve so that the fake photos become more realistic for training purposes.
The death of the one-size-fits-all cloud: The mainstreaming of multi-arch
Elise Spence and Drew Thorstensen, IBM Power Systems for Software Defined Cloud Infrastructure, presented this topic. The session was on IBM Cloud Private, and the multiple architectures supported by Docker and Kubernetes.
There are actually six different architectures supported for Docker containers:
While containers are "portable" between systems, the binaries are typically written for a single architecture, usually Linux-x86 or Windows-x86, and won't run on POWER or IBM Z.
The solution is to create a multi-arch manifest file, and port all the binaries to all of these different architectures. This way, when the containerized application is run on POWER, the manifest will identify the POWER-based binaries.
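Conceptually, the manifest list is just a platform-to-image lookup table. Here is a sketch in Python; the image name and digests are made up for illustration:

```python
# Hypothetical manifest list (image name and digests are made up),
# mapping (os, architecture) -> the image built for that platform.
manifest_list = {
    ("linux", "amd64"):   "myapp@sha256:aaaa1111",
    ("linux", "ppc64le"): "myapp@sha256:bbbb2222",
    ("linux", "s390x"):   "myapp@sha256:cccc3333",
}

def resolve(os_name, arch):
    """What a registry does with a multi-arch manifest: pick the image
    whose platform matches the requesting host."""
    try:
        return manifest_list[(os_name, arch)]
    except KeyError:
        raise RuntimeError(f"no image for {os_name}/{arch} in manifest list")

print(resolve("linux", "ppc64le"))  # myapp@sha256:bbbb2222
```

A POWER host pulling `myapp` would thus transparently receive the ppc64le binaries, which is exactly the behavior the session described.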
Introduction to IBM Cloud Object Storage (powered by Cleversafe)
Before 2015, IBM offered two "Object Storage" products: IBM Spectrum Scale and IBM Spectrum Archive, and I was constantly having to compare and contrast IBM products to Cleversafe.
Not any more! With the IBM acquisition of Cleversafe, IBM now offers all three!
This session explained all of the features and functions of IBM Cloud Object Storage System, available as software, as pre-built systems, including a VersaStack CVD, and as Storage-as-a-Service (STaaS) in the IBM Cloud.
(IBM renamed Cleversafe DSnet to "IBM Cloud Object Storage System". I joked that if IBM ever acquired Coca-Cola, they would probably rename their signature soft drink as the "Brown Carbonated Sugar Liquid", or BroCarb SugarLiq for short!)
I provided a general overview, as well as the latest features of Concentrated Dispersal Mode and Compliance Enabled Vaults.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
Last year, Hurricanes Harvey, Irma, Jose, and Maria ravaged various parts of North America and the Caribbean. My topic on Business Continuity and Disaster Recovery (BC/DR) was well attended. I have been working in BC/DR for most of my career, including the "High Availability Center of Competency", or HACOC for short.
However, natural disasters like hurricanes, tornadoes, forest fires and floods represent less than 20 percent of all disasters. The majority of disasters, nearly 75 percent, arise from electrical power outages, human error, system failure and ransomware.
The seven tiers were developed by a group of IBM customers back in the 1980s, and have stood the test of time. I recently published an article in IBM Systems Magazine (January/February 2018) based on this presentation.
Cloud storage comes in four flavors: persistent, ephemeral, hosted, and reference. The first two I refer to as "Storage for the Computer Cloud" and the latter two I refer to as "Storage as the Storage Cloud".
I also explained the differences between block, file and object access, and why different Cloud storage types use different access methods.
Finally, I covered some Hybrid Cloud Storage configurations, showing how traditional IT, on-premises local private cloud, off-premises dedicated private cloud, and public cloud can be combined to provide added value.
Reporting and Monitoring: How to Verify your Storage is Being Used Efficiently
It is hard to believe that it was over 15 years ago that I was the chief architect for the software we now call IBM Spectrum Connect, Spectrum Control and Storage Insights. There are a variety of editions and bundles for this product, but my focus in this talk was the advanced storage analytics found in IBM Virtual Storage Center and IBM Spectrum Control Advanced Edition.
I covered three use cases:
What storage tier to put your workload in, and how to move existing data into a faster or slower tier to meet business requirements and IT budgets.
For steady state environments, how to re-balance storage pools within a single tier to keep things even for optimal performance.
When it is time to decommission storage, how to transform volumes from one storage pool to another without downtime or outages.
Special thanks to Bryan Odom for his help in updating this presentation.
Spectrum Virtualization Data Reduction Pools 101
Barry Whyte, IBM Master Inventor and ATS for Storage Virtualization for the Asia Pacific region, presented on how Data Reduction Pools were implemented in version 8.1.2 of Spectrum Virtualize, the software in the latest IBM SAN Volume Controller (SVC), IBM Storwize products, and IBM FlashSystem V9000.
Basically, rather than say we "re-wrote" the code, we prefer softer euphemisms like the code was "re-imagined" or, my favorite lately, "re-factored". Legacy Storage Pools will continue to be supported, but IBM anticipates that people over time will transition to the new Data Reduction Pools (DR Pools).
Like Legacy Storage Pools, the new DR Pools support a mix of Fully-allocated, Thin-Provisioned, and Compressed-Thin volumes. IBM has made a statement of direction that it will offer a Data Deduplication feature in the future, but only on the new DR Pools.
While DR Pools are available today with version 8.1.2, there are a few restrictions. There is a limit of four DR Pools per cluster, and the amount of total capacity of each pool depends on the extent size and number of I/O groups configured. Some of the migration methods developed for Legacy Storage Pools are not available, and in reality don't make sense in the new DR pool scheme. Child Pools are not supported either.
One of the big improvements that DR Pools offer is in the area of compression. With Legacy Storage Pools, CPU cores were dedicated for compression, so they were either under-utilized or overwhelmed. With DR pools, all CPU cores can be used for either I/O or compression, which potentially can increase performance by up to 40 percent!
After the sessions, IBM had its "Solution Center Reception". This is a chance to relax and unwind after a long day, with food and drink, and various sponsors in booths to explain their latest offerings.
This is Katie Thacker from [FIT]. In March 2018, FIT was recognized as IBM’s Top Strategic Service Provider of the year!
These are Elizabeth Krivan and Kelly Bouchard, two recently-hired IBM storage sellers. They attended my sessions at the IBM Technical University in New Orleans last October, so it was good to see them again at my sessions here in Orlando.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
This week, I am in Orlando, Florida for the [IBM Technical University], with focus on IBM storage, IBM Z mainframes and IBM Power servers. Here is my recap for the keynote sessions on Day 1.
Art Beller, IBM Vice President of WW Systems Technical Sales
Art Beller, my third-line manager, kicked off the event. He explained that with [Artificial Intelligence], or AI for short, we are entering the "age of the incumbent". All across industries, the companies that have established dominance over the decades have the most data to get value from.
Kathryn Guarini, IBM Vice President Research Strategy
Kathryn provided an overview of the latest news on AI. Over 700 students at MIT, and 1,000 students at Stanford University, have signed up for "Intro to AI" classes. There are over 30,000 AI-related jobs in IT today. The investment in AI is 10 times more than it was just four years ago.
Kathryn explained there are three levels of AI: Narrow, Broad, and General. Narrow AI finally works, such as face recognition or speech-to-text translation. Broad AI is still a ways out, and General AI is not expected until year 2050.
An area of research is to "Learn more with less". For example, if you train a photo image recognition to identify different species of dogs, can you extend some of this learning to recognize different cats? This is often referred to as "Transfer Learning".
Cyber-criminals are already using AI, and if they can infiltrate AI training models, can introduce some scary scenarios. The next cyber battle-field will be AI vs. AI.
AI results need to be "Explainable", both in the training and debugging phases, as well as the infer/deployment phases. We need to detect and eliminate human biases, and rank different models on their fairness.
Kathryn gave some real examples:
Medical Sieve: An MRI scan captures over 10,000 images. Through AI, the top 25 most important images can be identified, making a doctor's job easier in identifying tumors.
Cancer Research: There are over 800 billion DNA base pairs to evaluate for different cancers, along with 723 million published articles of relevant research. AI can help sort this out, matching the best research to the appropriate type of cancer.
Banking Regulations: There are over a million compliance documents, and some banks have more than 10,000 employees focused on enforcing compliance. About 10 percent of these compliance documents change every year, making this a moving target.
Fraud Detection: There are too many "false positives" in today's algorithms for suspicious spending behavior. AI can help identify this better.
Video Highlights: AI can be used to generate movie trailers or sports highlights by identifying the most relevant portions of a movie or sporting event.
Reduce Air Pollution: China is investigating the use of AI to reduce air pollution in its country. Large cities like Beijing are particularly over-polluted.
Hillery Hunter, IBM Fellow and Director of Accelerated Cognitive Infrastructure at IBM Research
AI takes Terabytes of information, both structured and unstructured data, to develop a model that is very small, perhaps a few MB or GB.
The four steps are: identify your data sources, do some data preparation, train your model, and then infer using that model. Your data sources are stored in a Capacity Tier (often referred to as Data Lake). Inference must be done quickly, so a Performance tier is needed for that phase.
In some cases, data can't move, so for those situations, we need "Federated AI" where we can combine results from different systems.
IBM has added Distributed Deep Learning (DDL) to its PowerAI set of libraries. To estimate "Click-Thru Rate", a typical approach with 4.2 billion training examples took 70 minutes. With PowerAI DDL, this was reduced to 91 seconds. In another example, training that took nine days was reduced to four hours.
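For reference, the speedups those numbers imply:

```python
# Checking the speedups implied by the Click-Thru Rate example above.
baseline_s = 70 * 60          # 70 minutes, in seconds
ddl_s = 91                    # 91 seconds with PowerAI DDL
print(round(baseline_s / ddl_s))    # ~46x faster

baseline2_s = 9 * 24 * 3600   # nine days, in seconds
ddl2_s = 4 * 3600             # four hours, in seconds
print(round(baseline2_s / ddl2_s))  # 54x faster
```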
Lastly, Hillery mentioned "in-memory computing". Rather than reading data in from memory, and performing some computation on it, this new approach does part of the compute processing on the memory chip itself, eliminating a lot of data transfers.
Clod Barrera, IBM Distinguished Engineer and Chief Technical Strategist for storage
In previous years, IBM Technical University would offer brand-specific keynote sessions for IBM Z, IBM Power and IBM Storage. However, these were in the same time slot, so you could only see one of them. This year, IBM Storage was put into a different slot, so people could hear about their server of choice, and then also listen to the storage keynote.
Clod gave a state of the industry related to different storage media. For Flash, for example, he explained that Phase Change Memory is being developed, using the difference between amorphous and crystalline states to represent ones and zeros.
Tape is also seeing a resurgence. In 2005, Microsoft declared tape dead. Today, Microsoft Azure is a big fan of tape for storing data at reduced cost. Tape is 20 times less expensive than disk.
Clod summarized his talk by stating the key areas of storage development:
Optimizing for Artificial Intelligence
Automation for Security and Privacy
Data Governance and Management
You can follow along this week with Twitter hashtag #IBMTechU, or follow me at @az990tony.
The New Orleans event was a five-day event, but I had to leave Wednesday evening for other meetings, so I missed out on the last two days.
I do plan to be there all of next week in Orlando. Look for me at one of my sessions, during the breaks, the Solutions Reception on Monday evening, the Poster Session on Tuesday evening, or Universal Studios event on Thursday evening.
You can follow along with Twitter hashtag #IBMtechU, or follow me at @az990tony.
Well, it's Tuesday again, and you know what that means? IBM Announcements!
(FTC Disclosure: I work for IBM. This blog post can be considered a "paid celebrity endorsement" of the IBM Z and IBM storage products mentioned below.)
DS8880 R8.3.3 Enhancements
Back in 2015, IBM announced the [DS8880 models] of the DS8000 family. Sales increased drastically, in part because IBM re-designed the systems to fit a standard 19-inch wide rack, rather than the 33-inch wide custom sizes used before. Many cloud service providers (CSP) and managed service providers (MSP) require 19-inch standard rack configurations.
To meet client requirements, the newest IBM mainframes, including Z14 model ZR1 and LinuxONE Rockhopper II, are now following the same 19-inch rack size!
IBM DS8880 models now have enhanced support for zHyperlink connections. Clients with existing 6-core DS8884/F or 8-core DS8886/F models can upgrade to add more cores for zHyperlink connectivity.
(The maximum number of zHyperlink connections depends on the number of cores per CEC.)
The zHyperlink supports both 40-meter and 150-meter cables. This allows applications like DB2 to read data with substantially lower latency than traditional FICON attachment.
For IBM z/OS clients, the Transparent Cloud Tiering feature allows migration of data directly from DS8000 storage systems to the cloud. This eliminates the need to move data through the IBM Z host, consuming MIPS and FICON bandwidth, on its way out to a tape or virtual tape system. IBM now offers 10GbE cards for the DS8880, providing faster throughput than the 1GbE cards previously available.
IBM Spectrum Scale v5.0 for IBM Elastic Storage Server
IBM Spectrum Scale v5.0 was available as software last year, and now is available as a Software PID for Elastic Storage Server hardware.
The new version introduces per-drive editions for licensing: Data Access edition, and Data Management edition. Here are highlights of some of the features:
Enhancements to GUI usability, including managing file systems between ESS and non-ESS storage
Audit File Logging (Data Management Edition only) for Open, Close, Destroy (Delete), Rename, Unlink, Remove Directory, Extended Attribute change, and Access Control List (ACL) change events
Enhancements to Active File Management, providing WAN-caching for multi-site deployments
Independent KPMG certification will be done for Spectrum Scale v5.0 on ESS for the "Immutability" feature. Some people refer to this as WORM, Government Compliance, Tamperproof, or Non-Erasable, Non-Rewriteable (NENR) protection.
Enhancements to Transparent Cloud Tiering, providing archive of less-active data to IBM Cloud Object Storage, IBM Cloud, or Amazon S3.
Certification for analytics on both x86 and POWER platforms: Hortonworks Data Platform (HDP) v2.6, and Ambari v2.5
Improved I/O performance for many small and large block size workloads simultaneously, including a 4 MB default block size with variable sub-block size based on block size choice
Spectrum Scale 5.0 is incorporated into "Elastic Storage Server Solution Release 5.3". It is unfortunate that the numbering differs. Existing ESS clients can download the new ESS 5.3 code from IBM Fix Central today. Starting next week or so, new Elastic Storage Servers will ship with ESS solution release 5.3 pre-installed.
The TS4500 tape library supports both TS1100 and LTO tape drives.
This feature supports mixed media in a TS4500 tape library. If you are using Library-Managed Encryption (LME), then IBM Security Key Lifecycle Manager is required as the key manager with LTO drives and cartridges.
GDPR is the IT industry's next "Y2K crisis." Effective May 25, 2018, it gives any citizen of the European Union the right to review, rectify, and even erase personal data held in corporate datacenters. Companies that fail to respond to requests can be heavily fined. See Bob Yelland's quick 13-page guidebook on this, titled [GDPR - How it Works].
His team also developed the Non-Obvious Relationship Awareness (NORA) software for the casinos, combining the records of 15 million customers, 20,000 employees, and 18 different watch lists. If a casino did business with people on certain watch lists, they could be put out of business or heavily fined.
NORA alerts identified 24 active VIP players as known cheaters, 12 employees were active gamblers against company policy, 192 employees had possible relationships with casino vendors, and in seven cases the players were the vendor. One casino discovered they were paying to have one of these cheaters flown to Las Vegas to play at their tables!
(IBM acquired Jeff's company, Systems Research and Development (SRD), back in 2005. I had the pleasure of working with Jeff during his 11-year stint at IBM, and participated in his G2 project that was later spun off in 2016 to form his newest company, Senzing. See my 2011 blog post [Storage Innovation Executive Summit] for Jeff's thoughts back then.)
Jeff identifies four challenges in complying with GDPR regulation. Suppose an EU citizen comes to your company and asks just to review all information that you have on them. How would you do that?
So this is Challenge #1: There are lots of places to look. You have a customer database, loyalty club, marketing programs, vendor and supplier databases, and customer service. But wait, the person might also have been an employee! Does your employee database let you search for information on former employees?
Challenge #2 is that the data occurs in variations. Liz Reston could be stored as Elizabeth or Beth. Her last name might have changed from various marriages and divorces. Can you generate all of the variations to search on?
(I know this personally. I am not the only famous "Tony Pearson" out there. There is Tony Pearson, a cricket player in England. There is Tony Pearson, Chief of Staff in the Australian government. And finally, there is 61-year-old "Mr. Universe" Tony Pearson, the "Michael Jackson" of Bodybuilding. Needless to say, women who showed up at my house unannounced looking for him instead were sometimes disappointed!)
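The variation problem in Challenge #2 can be sketched in a few lines. This is a minimal illustration only, with a tiny hypothetical nickname table and a caller-supplied list of former surnames; real entity-resolution products maintain far richer variant data:

```python
# Minimal sketch of name-variation expansion for entity search.
# The nickname table is a tiny illustrative sample, not a real dataset.
NICKNAMES = {
    "elizabeth": {"liz", "beth", "eliza", "betty"},
    "anthony": {"tony"},
    "robert": {"bob", "rob", "bobby"},
}

def name_variants(first, last, former_surnames=()):
    """Return every first/last combination worth searching on."""
    first = first.lower()
    # Expand the given name in both directions:
    # formal name -> nicknames, and nickname -> formal name.
    firsts = {first}
    for formal, nicks in NICKNAMES.items():
        if first == formal:
            firsts |= nicks
        elif first in nicks:
            firsts |= {formal} | nicks
    # Include maiden/former surnames alongside the current one.
    lasts = {last.lower(), *(s.lower() for s in former_surnames)}
    return {(f, l) for f in firsts for l in lasts}

variants = name_variants("Liz", "Reston", former_surnames=["Smith"])
```

Searching for "Liz Reston" then fans out to "Elizabeth Smith", "Beth Reston", and so on, across every database you hold.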
Challenge #3 is that existing systems have search limitations. Imagine going to a library that has no card catalog or computerized index. Instead, you must go floor by floor, row by row, book by book, searching for the information you need.
Human Resources software might only offer search by name, date of birth, or employee serial number. Hotel systems might not let you search by billing or home address.
Small typos can result in incomplete search results. Home addresses, for example, are often written in different ways, suite or apartment numbers may be represented differently as well, and abbreviations may be used to represent fully-qualified names.
What are you going to do, ask the IT department to write custom SQL queries for you? One of the unexpected benefits of Jeff's NORA system was that it could match entities between databases by street address, a trick that normally isn't designed into most applications.
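To make the typo and abbreviation problem concrete, here is a rough sketch of typo-tolerant address matching using only Python's standard library. The abbreviation table and similarity threshold are illustrative assumptions, not how NORA actually works:

```python
import difflib
import re

# Common abbreviation expansions; illustrative, not exhaustive.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "apt": "apartment",
                 "ste": "suite", "rd": "road"}

def normalize(address):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def same_address(a, b, threshold=0.9):
    """Treat two addresses as a match when their normalized forms
    are nearly identical, tolerating small typos."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

# "742 Evergreen St., Apt 3" and "742 Evergreen Street Apartment 3"
# normalize to the same string, so they match despite the differences.
match = same_address("742 Evergreen St., Apt 3",
                     "742 Evergreen Street Apartment 3")
```

An exact-match SQL query would miss that pair entirely, which is why address matching is rarely designed into ordinary applications.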
Challenge #4 is that not all things that look alike are alike. For example, Liz Reston and her co-dependent husband Bob might [share the same email address].
Family members might share the same home address and phone number. Sons are often named after their fathers, but don't always write "Senior" or "Junior" or "III" at the end of their names.
In other cases, roommates in college, who are not related in any other way, might share the same home address. The same apartment number or home address could be used by different people as the house is sold or apartment is rented from one family to another.
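One way to handle Challenge #4 is to treat shared identifiers as evidence with different weights, so that a shared address alone never merges two people into one entity. The weights below are illustrative assumptions on my part, not Senzing's actual scoring:

```python
# Hedged sketch: shared identifiers carry different evidential weight.
# The weight values are illustrative assumptions, not a real product's scoring.
WEIGHTS = {"ssn": 0.9, "email": 0.5, "home_address": 0.3, "phone": 0.3}

def match_score(record_a, record_b):
    """Sum the weights of identifiers two records share.
    A shared home address scores low on its own, so roommates
    and family members are not collapsed into one person."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        if record_a.get(field) and record_a.get(field) == record_b.get(field):
            score += weight
    return score

liz = {"email": "restons@example.com", "home_address": "12 Oak Ln"}
bob = {"email": "restons@example.com", "home_address": "12 Oak Ln"}
score = match_score(liz, bob)  # about 0.8: same household, not the same person
```

With a merge threshold above 0.8, Liz and her co-dependent husband Bob stay separate entities even though they share an email address and a home.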
It took Jeff decades to appreciate the subtleties of these entity relationships, and then GDPR arrived in 2016. When a citizen asks to review their personal data, which they can do for free after May 25, a company must deliver it within 30 days. The person can then ask to rectify certain information, or have it erased altogether.
So what seems like a simple enough question, "What do we know about Liz Reston?", turns out to be challenging to answer for a variety of reasons. Jeff surveyed over 1,000 European companies; here are the results:
Most companies are not ready, and are concerned about their ability to comply with this GDPR regulation.
Companies expect an average of 246 requests per month.
The search will require accessing, on average, 43 different system databases.
Each database search will take seven minutes.
Companies will need to dedicate seven to eight full-time employees to handle these search requests.
Having access to powerful enterprise-wide "single subject search" discovery tools, however, can also lead to search abuse. For example, a famous celebrity is admitted to a hospital, and suddenly sensitive information is leaked to the tabloids or paparazzi. Someone asks their friend, a police officer, to search the license plate on someone's vehicle. A father searches his corporate database for information on his daughter's new boyfriend.
To address this privacy concern, Jeff suggests a tamper-proof audit log that shows who searched for whom. Where are we going to get technology to do this? We already have it: Blockchain! That's right, the technology that enables Bitcoin to operate without government controls already includes a tamper-proof audit log for transactions.
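The tamper-evident property Jeff describes does not require a full cryptocurrency; the core idea is a hash chain, where each log entry's hash covers the previous entry's hash. A minimal sketch, assuming a simple JSON record format (a real deployment would also need replication across parties and access controls):

```python
import hashlib
import json

def append_entry(log, searcher, subject):
    """Append a search-audit record whose hash covers the previous
    entry's hash, chaining the log together."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {"searcher": searcher, "subject": subject, "prev": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)

def verify(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for record in log:
        if record["prev"] != prev_hash:
            return False
        body = {k: v for k, v in record.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True

log = []
append_entry(log, "officer_jones", "plate ABC-123")
append_entry(log, "hr_admin", "Liz Reston")
```

If anyone quietly rewrites an old entry, every later hash in the chain stops matching, so abuse of "single subject search" leaves a visible trail.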
Jeff's plan for his new company Senzing is to deliver software for different use cases, with APIs for popular programming languages like Java and Python, and a workbench that runs on Windows. He is also considering a "Community Edition" affordable for even the smallest of businesses, and challenged the audience to contribute to it as an open source project.