Can Structured Query Language [SQL] be considered a storage protocol?
Several months ago, I was asked to review a book on SQL, titled appropriately enough "The Complete Idiot's Guide to SQL", by Steven Holzner, Ph.D. As a published author myself, I get a lot of these requests, and I agreed in this case, given that SQL was invented by IBM, and is a good fundamental skill to have for Business Analytics and Database Management.
(FTC Disclosure: I work for IBM but was not part of the SQL development team. I was provided a copy of this book for free to review it. I was not paid to mention this book, nor told what to write. I do not know the author personally nor anyone that works for his publicist. All of my opinions of the book in this blog post are my own.)
Despite an agreed-upon standard for SQL, each relational database management system (RDBMS) has decided to customize it for their own purposes. First, SQL can be quite wordy, so some RDBMS have made certain keywords optional. Second, RDBMS offer extra features by adding keywords or programming language extentions, options or parameters above and beyond what the SQL standard calls for. Third, the SQL standard has changed over the years, and some RDBMS have opted to keep some backward compatibility with their prior releases. Fourth, some RDBMS want to discourage people from easily porting code from one RDBMS to another, known in the industry as vendor lock-in.
Throughout my career, I have managed various databases, including Informix, DB2, MySQL, and Microsoft SQL Server, so I am quite familiar with the differences in SQL and the problems and implications that arise.
Most authors who want to write about SQL typically make a choice between (a) stick to the SQL standard, and expect the reader to customize the examples to their particular DBMS; or (b) stick to a single RDBMS implemenation, and offer examples that may not work on other RDBMS.
I found the book "The Complete Idiot's Guide to SQL" covered the basics quite well, but with an odd twist. The basics include creating databases and tables, defining columns, inserting and deleting rows, updating fields, and performing queries or joins. The odd twist is that Steven does not make the typical choice above, but rather shows how the various DBMS are different than standard SQL syntax, with actual working examples for different RDBMS.
You might be thinking to yourself that only an idiot would work in a place that had to require knowledge of multiple RDBMS. The sad truth is that most of the medium and large companies I speak to have two or more in production. This is either through acquisitions, or in some cases, individual business units or departments implementing their own via the [Shadow IT].
(For those who want to learn SQL and try out the examples in this book, IBM offers a free version of DB2 called [DB2-C Express] that runs on Windows, Linux, Mac OS, and Solaris.)
Last week, while I was in Russia for the [Edge Comes to You] event, I was interviewed by a journalist from [Storage News] on various topics. One question stuck me as strange. He asked why I did not mention IBM's acquisition of Netezza in my keynote session about storage. I had to explain that Netezza was not in the IBM System Storage product line, it is in a different group, under Business Analytics, where it belongs.
While it is true that Netezza can store data, because it has storage components inside, the same could also be said about nearly every other piece of IT equipment, from servers with internal disk, to digital cameras, smart phones and portable music players. They can all be considered storage devices, but doing so would undermine what differentiates them from one another.
Which brings me back to my original question: Should we consider SQL to be a storage protocol? For the longest time, IT folks only considered block-based interfaces as storage protocols, then we added file-based interfaces like CIFS and NFS, and we also have object-based interfaces, such as IBM's Object Access Method (OAM) and the System Storage Archive Manager (SSAM) API. Could SQL interfaces be the next storage protocol?
Let me know what you think on this. Leave a comment below.
technorati tags: IBM, SQL
This week, IBM made over a dozen announcements related to IBM storage products. Here is part 2 of my overview:
- IBM System Storage® DS8000 series microcode
One of the advantages of acquiring XIV as IBM's other high-end disk system, is that it allows the DS8000 team to focus on the IBM i and z/OS operating systems. As a result, IBM DS8000 has over half the mainframe-attach market share.
For both the DS8700 and DS8800 models, IBM Easy Tier now support sub-LUN automated tiering across three storage tiers: Solid-State Drives, high-performance spinning disk drives (15K and 10K RPM), and high-capacity disk drives (7200 RPM).
For System z customers, the latest DS8000 microcode has synergy with z/OS and GDPS, now supporting 4x larger EAV volumes, faster high-performance FICON (zHPF), and Workload Manager (WLM) integration with the I/O Priority Manager. IBM has a world record SAP performance of 59 million account postings per hour. DB2 v10 for z/OS queries were measured at 11x faster using the new zHPF feature.
- IBM System Storage® DS8800 systems
On the hardware side, the DS8800 now supports a fourth frame to hold a total over 1,500 disk drives. Yes, we have customers that three frames wasn't enough, and they wanted more.
IBM is now also offering new drive options. Small Form Factor (2.5 inch) drives now include 300GB 15K RPM drives, and a 900GB 10K RPM drives. But wait! There's more! The DS8800 is no longer a SFF-only box, it now allows for mixing in Large form factor (3.5 inch) drives, starting with the 3TB NL-SAS 7200 RPM drive.
- IBM XIV® Storage System Gen3
We announced the XIV Gen3 already, but we have two enhancements.
First, we now offer a model based entirely on 3TB NL-SAS drives. If you are thinking, what IBM is going to put 3TB drives into everything? Yup. Once we go through all the pain and suffering of qualifying a drive, we make sure we get our money's worth!
Secondly, we have now an iPad application to manage the XIV. This has nothing to do with Apple CEO Steve Jobs passing away last week, it was merely coincidence.
- IBM Real-time Compression Appliances™ STN6500 and STN6800 V3.8
The latest software for RtCA now supports Microsoft SMB v2, and enhanced reporting so that storage admins know exactly the benefits of the compression ratios of different file extensions.
- IBM System Storage EXP2500 Express®
The EXP2500 is for direct-attach situations, like the IBM BladeCenter. IBM adds LFF 3.5-inch 3TB NL-SAS drives, SFF 2.5-inch 300GB 15K RPM SAS drives, and 900GB 7200 RPM NL-SAS drives.
My colleague Curtis Neal refers to these as "B.F.D" announcements, which of course stands for Bigger, Faster, Denser!
technorati tags: IBM, DS8000, DS8700, DS8800, Easy Tier, FICON, zHPF, WLM, DB2, SAS, NL-SAS, , XIV, STN6500, STN6800, EXP2500, Curtis Neal
Well, it feels like Tuesday and you know what that means... "IBM Announcement Day!" Actually, today is Wednesday, but since Monday was Memorial Day holiday here in the USA, my week is day-shifted. Yesterday, IBM announced its latest IBM FlashCopy Manager v2.2 release. Fellow blogger, Del Hoobler (IBM) has also posted something on this out atthe [Tivoli Storage Blog].
IBM FlashCopy Manager replaces two previous products. One was called Tivoli Storage Manager for Copy Services, the other was called Tivoli Storage Manager for Advanced Copy Services. To say people were confused between these two was an understatement, the first was for Windows, and the second was for UNIX and Linux operating systems. The solution? A new product that replaces both of these former products to support Windows, UNIX and Linux! Thus, IBM FlashCopy Manager was born. I introduced this product back in 2009 in my post [New DS8700 and other announcements].
IBM Tivoli Storage FlashCopy Manager provides what most people with "N series SnapManager envy" are looking for: application-aware point-in-time copies. This product takes advantage of the underlying point-in-time interfaces available on various disk storage systems:
- FlashCopy on the DS8000 and SAN Volume Controller (SVC)
- Snapshot on the XIV storage system
- Volume Shadow Copy Services (VSS) interface on the DS3000, DS4000, DS5000 and non-IBM gear that supports this Microsoft Windows protocol
For Windows, IBM FlashCopy Manager can coordinate the backup of Microsoft Exchange and SQL Server. The new version 2.2 adds support for Exchange 2010 and SQL Server 2008 R2. This includes the ability to recover an individual mailbox or mail item from an Exchange backup. The data can be recovered directly to an Exchange server, or to a PST file.
For UNIX and Linux, IBM FlashCopy Manager can coordinate the backup of DB2, SAP and Oracle databases. Version 2.2 adds support specific Linux and Solaris operating systems, and provides a new capability for database cloning. Basically, database cloning restores a database under a new name with all the appropriate changes to allow its use for other purposes, like development, test or education training. A new "fcmcli" command line interface allows IBM FlashCopy Manager to be used for custom applications or file systems.
A common misperception is that IBM FlashCopy Manager requires IBM Tivoli Storage Manager backup software to function. That is not true. You have two options:
- Stand-alone Mode
In Stand-alone mode, it's just you, the application, IBM FlashCopy Manager and your disk system. IBM FlashCopy Manager coordinates the point-in-time copies, maintains the correct number of versions, and allows you to backup and restore directly disk-to-disk.
- Unified Recovery Management with Tivoli Storage Manager
Of course, the risk with relying only on point-in-time copies is that in most cases, they are on the same disk system as the original data. The exception being virtual disks from the SAN Volume Controller. IBM FlashCopy Manager can be combined with IBM Tivoli Storage Manager so that the point-in-time copies can be copied off to a local or remote TSM server, so that if the disk system that contains both the source and the point-in-time copies fails, you have a backup copy from TSM. In this approach, you can still restore from the point-in-time copies, but you can also restore from the TSM backups as well.
IBM FlashCopy Manager is an excellent platform to connect application-aware fucntionality with hardware-based copy services.
technorati tags: IBM, Announcements, FlashCopy, FlashCopy+Manager, Microsoft, Windows, VSS, UNIX, AIX, Solaris, Linux, TSM, Exchange, SQL+Server, SAP, DB2, Oracle
Well, I'm back safely from my tour of Asia. I am glad to report that Tokyo, Beijing and Kuala Lumpur are pretty much how I remember them from the last time I was there in each city. I have since been fighting jet lag by watching the last thirteen episodes of LOST season 6 and the series finale.
Recently, I have started seeing a lot of buzz on the term "Storage Federation". The concept is not new, but rather based on the work in database federation, first introduced in 1985 by [A federated architecture for information management] by Heimbigner and McLeod. For those not familiar with database federation, you can take several independent autonomous databases, and treat them as one big federated system. For example, this would allow you to issue a single query and get results across all the databases in the federated system. The advantage is that it is often easier to federate several disparate heterogeneous databases than to merge them into a single database. [IBM Infosphere Federation Server] is a market leader in this space, with the capability to federate DB2, Oracle and SQL Server databases.
Fellow blogger and BFF, Marc Farley (3PAR) has an excellent post [Zeroing in on a definition for federated storage]. Here's an excerpt:
- Storage expansion: You want to increase the storage capacity of an existing storage system that cannot accommodate the total amount of capacity desired. Storage Federation allows you to add additional storage capacity by adding a whole new system.
- Storage migration: You want to migrate from an aging storage system to a new one. Storage Federation allows the joining of the two systems and the evacuation from storage resources on the first onto the second and then the first system is removed.
- Safe system upgrades: System upgrades can be problematic for a number of reasons. Storage Federation allows a system to be removed from the federation and be re-inserted again after the successful completion of the upgrade.
- Load balancing: Similar to storage expansion, but on the performance axis, you might want to add additional storage systems to a Storage Federation in order to spread the workload across multiple systems.
- Storage tiering: In a similar light, storage systems in a Storage Federation could have different capacity/performance ratios that you could use for tiering data. This is similar to the idea of dynamically re-striping data across the disk drives within a single storage system, such as with 3PAR's Dynamic Optimization software, but extends the concept to cross storage system boundaries.
To some extent, IBM SAN Volume Controller (SVC), XIV, Scale-Out NAS (SONAS), and Information Archive (IA) offer most, if not all, of these capabilities. EMC claims its VPLEX will be able to offer storage federation, but only with other VPLEX clusters, which brings up a good question. What about heterogenous storage federation? Before anyone accuses me of throwing stones at glass houses, let's take a look at each IBM solution:
- IBM SAN Volume Controller
The IBM SAN Volume Controller has been doing storage federation since 2003. Not only can IBM SAN Volume Controller bring together storage from a variety of heterogenous storage, the SVC cluster itself can be a mix of different hardware models. You can have a 2145-8A4 node pair, 2145-8G4 node pair, and the new 2145-CF8 node pair, all combined together into a single SVC cluster. Upgrading SVC hardware nodes in an SVC cluster is always non-disruptive.
- IBM XIV storage system
The IBM XIV has two kinds of independent modules. Data modules have processor, cache and 12 disks. Interface modules are data modules with additional processor, FC and Ethernet (iSCSI) adapters. Because these two modules play different roles in an XIV "colony", that number of each type is predetermined. Entry-level six-module systems have 2 interface and 4 data modules. Full 15-module systems have 6 interface and 9 data modules. Individual modules can be added or removed non-disruptively in an XIV.
- IBM Scale-Out NAS
The SONAS is comprised of three kinds of nodes that work together in concert. A management node, one or more interface nodes, and two or more storage nodes. The storage nodes are paired to manage up to 240 nodes in a storage pod. Individual interface or data nodes can be added or removed non-disruptively in the SONAS. The underlying technology, the General Parallel File System, has been doing storage federation since 1996 for some of the largest top 500 supercomputers in the world.
- IBM Information Archive (IA)
For the IA, there are 1, 2 or 3 nodes, which manages a set of collections. A collection can either be file-based using industry-standard NAS protocols, or object-based using the popular System Storage™ Archive Manager (SSAM) interface. Normally, you have as many collections as you have nodes, but nodes are powerful enough to manage two collections to provide N-1 availability. This allows a node to be removed, and a new node added into the IA "colony", in a non-disruptive manner.
Even in an ant colony, there are only a few types of ants, with typically one queen, several males, and lots of workers. But all the ants are red. You don't see colonies that mix between different species of ants. For databases, federation was a way to avoid the much harder task of merging databases from different platforms. For storage, I am surprised people have latched on to the term "federation", given our mixed results in the other "federations" we have formed, which I have conveniently (IMHO) ranked from least effective to most effective:
- The Union of Soviet Socialist Republics (USSR)
My father used to say, "If the Soviet Union were in charge of the Sahara desert, they would run out of sand in 50 years." The [Soviet Union] actually lasted 68 years, from 1922 to 1991.
- The United Nations (UN)
After the previous League of Nations failed, the UN was formed in 1945 to facilitate cooperation in international law, international security, economic development, social progress, human rights, and the achieving of world peace by stopping wars between countries, and to provide a platform for dialogue.
- The European Union (EU)
With the collapse of the Greek economy, and the [rapid growth of debt] in the UK, Spain and France, there are concerns that the EU might not last past 2020.
- The United States of America (USA)
My own country is a federation of states, each with its own government. California's financial crisis was compared to the one in Greece. My own state of Arizona is under boycott from other states because of its recent [immigration law]. However, I think the US has managed better than the EU because it has evolved over the past 200 years.
- The Organization of the Petroleum Exporting Countries [OPEC]
Technically, OPEC is not a federation of cooperating countries, but rather a cartel of competing countries that have agreed on total industry output of oil to increase individual members' profits. Note that it was a non-OPEC company, BP, that could not "control their output" in what has now become the worst oil spill in US history. OPEC was formed in 1960, and is expected to collapse sometime around 2030 when the world's oil reserves run out. Matt Savinar has a nice article on [Life After the Oil Crash].
- United Federation of Planets
The [Federation] fictitiously described in the Star Trek series appears to work well, an optimistic view of what federations could become if you let them evolve long enough.
Given the mixed results with "federation", I think I will avoid using the term for storage, and stick to the original term "scale-out architecture".
technorati tags: , LOST, storage, federation, IBM, DB2, Oracle, SQL, 3PAR, Marc Farley, SVC, XIV, SONAS, IA, EMC, VPLEX, USSR, United Nations, OPEC, Star Trek
It's Tuesday, and that means more IBM announcements!
I haven't even finished blogging about all the other stuff that got announced last week, and here we are with more announcements. Since IBM's big [Pulse 2010 Conference] is next week, I thought I would cover this week's announcement on Tivoli Storage Manager (TSM) v6.2 release. Here are the highlights:
- Client-Side Data Deduplication
This is sometimes referred to as "source-side" deduplication, as storage admins can get confused on which servers are clients in a TSM client-server deployment. The idea is to identify duplicates at the TSM client node, before sending to the TSM server. This is done at the block level, so even files that are similar but not identical, such as slight variations from a master copy, can benefit. The dedupe process is based on a shared index across all clients, and the TSM server, so if you have a file that is similar to a file on a different node, the duplicate blocks that are identical in both would be deduplicated.
This feature is available for both backup and archive data, and can also be useful for archives using the IBM System Storage Archive Manager (SSAM) v6.2 interface.
- Simplified management of Server virtualization
TSM 6.2 improves its support of VMware guests by adding auto-discovery. Now, when you spontaneously create a new virtual machine OS guest image, you won't have to tell TSM, it will discover this automatically! TSM's legendary support of VMware Consolidated Backup (VCB) now eliminates the manual process of keeping track of guest images. TSM also added support of the Vstorage API for file level backup and recovery.
While IBM is the #1 reseller of VMware, we also support other forms of server virtualization. In this release, IBM adds support for Microsoft Hyper-V, including support using Microsoft's Volume Shadow Copy Services (VSS).
- Automated Client Deployment
Do you have clients at all different levels of TSM backup-archive client code deployed all over the place? TSM v6.2 can upgrade these clients up to the latest client level automatically, using push technology, from any client running v5.4 and above. This can be scheduled so that only certain clients are upgraded at a time.
- Simultaneous Background Tasks
The TSM server has many background administrative tasks:
- Migration of data from one storage pool to another, based on policies, such as moving backups and archives on a disk pool over to a tape pools to make room for new incoming data.
- Storage pool backup, typically data on a disk pool is copied to a tape pool to be kept off-site.
- Copy active data. In TSM terminology, if you have multiple backup versions, the most recent version is called the active version, and the older versions are called inactive. TSM can copy just the active versions to a separate, smaller disk pool.
In previous releases, these were done one at a time, so it could make for a long service window. With TSM v6.2, these three tasks are now run simultaneously, in parallel, so that they all get done in less time, greatly reducing the server maintenance window, and freeing up tape drives for incoming backup and archive data. Often, the same file on a disk pool is going to be processed by two or more of these scheduled tasks, so it makes sense to read it once and do all the copies and migrations at one time while the data is in buffer memory.
- Enhanced Security during Data Transmission
Previous releases of TSM offered secure in-flight transmission of data for Windows and AIX clients. This security uses Secure Socket Layer (SSL) with 256-bit AES encryption. With TSM v6.2, this feature is expanded to support Linux, HP-UX and Solaris.
- Improved support for Enterprise Resource Planning (ERP) applications
I remember back when we used to call these TDPs (Tivoli Data Protectors). TSM for ERP allows backup of ERP applications, seemlessly integrating with database-specific tools like IBM DB2, Oracle RMAN, and SAP BR*Tools. This allows one-to-many and many-to-one configurations between SAP servers and TSM servers. In other words, you can have one SAP server backup to several TSM servers, or several SAP servers backup to a single TSM server. This is done by splitting up data bases into "sub-database objects", and then process each object separately. This can be extremely helpful if you have databases over 1TB in size. In the event that backing up an object fails and has to be re-started, it does not impact the backup of the other objects.
technorati tags: , announcements, IBM, Pulse, conference, TSM, Tivoli, SSAM, backup, archive, VMware, VCB, Hyper-V, Microsoft, SSL, AES, encryption, in-flight, Linux, HP-UX, Solaris, ERP, DB2, Oracle, RMAN, SAP, BR*Tools, ibm-pulse, pulse2010