August 11, 2020 By James Cho 5 min read

In two previous blog posts, we analyzed the total value of ownership (TVO) for enterprise data warehouses (EDW) based on information gleaned from a recent Cabot Partners analyst report. Predictably, many highlights of the TVO assessment cover the most significant EDW challenges: exponentially greater data volume, the difficulty of maintaining data quality and acquiring cloud-savvy expertise, and the need to support an enterprise’s journey to AI.

The Cabot Partners analysis identified five essential layers of the information architecture in which EDW solutions must excel: modernization, infrastructure, enterprise-readiness, data management, and analytics. Cabot Partners also evaluated and scored several leading EDW solutions: IBM® Netezza® Performance Server, Snowflake, AWS Redshift, Teradata and Azure Synapse Analytics. The following comparative analysis details the features and capabilities on which those scores are based.

Modernizing the enterprise data warehouse

A modern enterprise data warehouse requires a common, collaborative, cloud-like environment. This helps it run all analytic processes, from data ingest to insights, with choice, flexibility, business continuity and high performance. A modern data warehouse must also be an efficient, agile and extensible platform that supports features such as data virtualization, containerization, multi-cloud management, self-service capabilities and faster, easier data migration. We’ll look at three of these considerations in greater depth: hybrid cloud support, containerization and ease of migration.

Hybrid cloud data platforms combine the benefits of public cloud with on-premises infrastructure to offer a better EDW solution. But many EDW solutions are only available on public cloud services – some specifically only on their own public cloud services. This promotes vendor and/or location lock-in, reducing solution flexibility, potentially increasing costs, reducing efficiency and introducing security and compliance concerns.

Containerization supports multi-cloud and hybrid cloud operations by providing a common method for deploying the same solution, which makes the solution more portable. Yet not all containers are created equal. Cloud-native microservice containers that can be deployed everywhere are the best option. Single-node containers that can’t scale to a distributed multi-node environment and aren’t organized or managed by a common orchestrator like Kubernetes or Red Hat OpenShift should be avoided.

Migrating multiple data warehouses poses complexity and risk that a common container design with full hybrid and multi-cloud support can avert. With Netezza Performance Server, the ability to upgrade with a single command or move from one deployment to another with a simple backup and restore process saves considerable time and effort. Solutions without these features face a complex migration simply because of the differences between on-premises and public cloud deployments, or because of different data formats and application support, potentially requiring expensive and risky rewrites and validation.

Optimizing infrastructure for higher performance

Optimized infrastructure has a direct impact on performance, which affects return on investment. The best EDW solutions provide a hyper-converged software and hardware appliance option alongside cloud solutions. But three of the five leading vendors in the Cabot Partners report are cloud-only. This limitation poses a significant obstacle for large enterprises with legacy data warehouses that require highly available and scalable on-premises EDW solutions. Network and storage performance is the Achilles’ heel of public clouds. By contrast, on-premises deployments offer dedicated networks and storage for consistent performance, eliminating the problem of “noisy neighbors” and the latency penalties that larger geographic distances introduce. On-premises solutions also provide denser compute and memory resources. Of the vendors featured in the Cabot Partners report that offer both on-premises and cloud deployments, Netezza Performance Server was the only one with recently refreshed hardware and architecture. It also offers FPGA-based hardware acceleration to optimize performance on-premises.

Building enterprise-readiness into the data warehouse

For a data warehouse solution to be enterprise-ready, it must be secure, properly governed and compliant, well-integrated, highly available, scalable and well supported and maintained. Most data warehouse solutions are roughly competitive in these areas, but subtle, significant differences can make a tremendous impact.

Cloud-only solutions, once again, are not enough. Enterprises must have a solution where cloud and on-premises flexibility is assured. Without an on-premises option, or with solutions that don’t work well across deployments, enterprises cannot store data where it makes the most sense. The resulting over-reliance on public cloud deployments may introduce compliance concerns, while an over-reliance on on-premises deployments may cause enterprises to purchase costly excess capacity. Neither is a viable option for enterprises that continually seek to drive down costs, operate as efficiently as possible, and adhere to the appropriate regulations.

In addition, the Cabot Partners report notes that several solutions are relatively new to the market and require additional validation in practical use cases to determine whether they meet the needs of customers. In contrast, Netezza sets itself apart with over 15 years of use in production. This means that the core Netezza functionality in Netezza Performance Server has already been validated and has driven success across a wide range of industries and use cases. Praise for Netezza’s strength can be found from analysts and customers alike.

Ensuring data integrity through better management and governance

Cabot’s analysis of data management prowess assessed whether data from multiple sources could be made available in a useful, modeled and consistent format through data movement. Other key capabilities in this layer include governance and support for big data, mixed workloads, and data quality. Most EDWs lack out-of-the-box tools or maturity in one or more of these areas.

When data grows rapidly and is sourced externally, sound governance is critical. In some cases, the tools that manage the usability, integrity and security of the data can be more important than the database that stores it. These tools are most effective when built into the EDW architecture natively, rather than added as a collection of third-party extensions that take more time and effort to integrate and that introduce support and compatibility issues.

Native data virtualization should be considered a must-have for data movement. It allows data to be brought together and governed at a single access point rather than across multiple silos. Moreover, it limits or eliminates extract, transform and load (ETL) processes, which waste valuable time and increase expenses through additional data movement and storage.
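To make the "single access point" idea concrete, here is a minimal sketch in Python. It is not how Netezza Performance Server or any vendor implements data virtualization. It uses SQLite's ATTACH DATABASE as a stand-in, and the silo names (crm, billing), tables and sample rows are hypothetical. The point is only that one connection can query two separately stored data sets in place, with no ETL copy step in between.

```python
import sqlite3


def build_virtual_access_point() -> sqlite3.Connection:
    """Return one connection that exposes two separate stores (illustrative only)."""
    # The main connection plays the role of the single access point.
    conn = sqlite3.connect(":memory:")

    # Each attached in-memory database stands in for an independent silo.
    # The names "crm" and "billing" are assumptions made for this sketch.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")

    # Hypothetical sample data, loaded separately into each silo.
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    conn.commit()
    return conn


def revenue_by_customer(conn: sqlite3.Connection) -> list:
    """Join across both silos in place, without copying rows into a third store."""
    query = """
        SELECT c.name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_virtual_access_point()
    for name, total in revenue_by_customer(connection):
        print(name, total)
```

In a real deployment the silos would be remote systems and access rules would be enforced at the virtualization layer. The sketch only shows the query pattern that replaces a copy-then-query ETL pipeline.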

Managing large data volumes for analytics

The ability to manage large volumes of data for discovery, self-service analytics, dashboards and visualization, machine learning and in-database analytics is critical. Data virtualization supports queries of any data source, anywhere, without the need to move data out of siloed sources. Data science tools like the ones available with Netezza Performance Server for IBM Cloud Pak® for Data help data scientists and analysts prepare data for building models at scale. In-database analytic functions and geospatial support compatible with industry standards complement these capabilities.

Conclusion

Cabot Partners concluded that across all critical EDW layers, IBM Netezza Performance Server for IBM Cloud Pak for Data outpaces leading competitors. For a deeper dive, read the full report or book a free one-on-one consultation with one of our experts.
