Finding the right enterprise data warehouse to meet the data and AI challenge
31 July 2020

Gone are the days when enterprises could function on reports delivered Monday from queries submitted Friday. Although modern enterprise data warehouses (EDWs) have moved far beyond hours- or days-long cycles for running analytics on vast collections of structured data, the fundamental parameters of data warehousing have not changed dramatically. High-performance, enterprise-ready data warehouse solutions continue to confront the following challenges:

  • Data volume – With terabyte- and petabyte-size EDWs now commonplace, high-performance operation requires fast data loading, efficient storage and database engines that meet the demand for hyper-efficiency.
  • Data quality and management – Because an EDW handles tremendous volumes of structured and unstructured data from many sources, good governance is imperative to ensure a single source of truth for all users.
  • Skills required in the Cloud Era – With enterprise data residing in disparate environments, whether by regulation or business need, today's EDW market assumes hybrid and multi-cloud deployments, with data flow, ingestion and analysis moving across different systems. These myriad handshakes between systems require advanced skill sets.
  • Support for the AI ladder – An AI-ready EDW must collect, organize and analyze data to infuse relevant knowledge and intelligence into the operation of AI-focused businesses.

Netezza® Performance Server (NPS) on the IBM® Cloud Pak® for Data (ICP4D) platform is an EDW that fully meets these challenges by adopting a containerized architecture running on a fully enabled data and AI platform. It provides a wide range of capabilities that are easy to implement in hybrid or multi-cloud environments, including identical deployments on premises or in public clouds that support containers. Hardware improvements also make it possible to achieve speeds 3x faster than previous Mako models.[1]

Cabot Partners recently evaluated several EDW solutions, including NPS, AWS Redshift, Snowflake and Microsoft Azure Synapse. This two-part blog series surveys the four areas of total value of ownership (TVO) that the Cabot analysis examined, namely total cost of ownership (TCO), improved productivity, revenue/profits and risk mitigation, to help explain why Cabot awarded NPS its highest rank.

Total Cost of Ownership

Of the EDW solutions Cabot compared, many are exclusively cloud-based. Some, like AWS Redshift or Microsoft Synapse, restrict customers to a single public cloud deployment. Others run on multiple clouds but differ subtly from one cloud to the next (Snowflake), or require you to set up a virtual machine (VM) and follow a "do-it-yourself" procedure (Teradata). These differences can demand additional, ongoing engineering and management resources, and may require separate on-premises or cross-cloud databases, plus the inevitable ETL, to function properly.

A hybrid solution provides a flexible way to keep predictable, stable and often regulated data close at hand while running variable, spiky workloads in the cloud. A multi-cloud approach prevents lock-in and lets you exploit the unique capabilities of different clouds without added hassle. These configurations pair nicely with data virtualization (DV), which provides a single view of data wherever it resides and enables analysis without expensive data migration or ETL. Native data virtualization removes the need to deploy a separate DV solution, along with its time-consuming, costly and painful maintenance. Netezza Performance Server offers a hybrid solution with native data virtualization.
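Data virtualization is easiest to picture as a query layer that presents several physical stores as one logical table. The Python sketch below is only a loose, single-process analogy built on the standard-library sqlite3 module; it is not how NPS or any DV product is implemented. Two in-memory databases stand in for an on-premises store and a cloud store, and a hypothetical view named all_orders combines them without copying rows into either one.

```python
import sqlite3


def build_virtual_view():
    """Return a connection whose temp view spans two separate databases."""
    # Hypothetical stand-ins: the main in-memory database plays the on-premises
    # store, and an attached in-memory database plays a cloud store.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS cloud")

    conn.execute("CREATE TABLE main.orders (id INTEGER, region TEXT, amount REAL)")
    conn.execute("CREATE TABLE cloud.orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?, ?)",
        [(1, "east", 100.0), (2, "west", 50.0)],
    )
    conn.executemany(
        "INSERT INTO cloud.orders VALUES (?, ?, ?)",
        [(3, "east", 25.0), (4, "north", 75.0)],
    )

    # The view stores no rows of its own; each query reads both sources in place.
    # A TEMP view is used here so it can reference tables in both databases.
    conn.execute(
        """
        CREATE TEMP VIEW all_orders AS
        SELECT id, region, amount FROM main.orders
        UNION ALL
        SELECT id, region, amount FROM cloud.orders
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_virtual_view()
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM all_orders GROUP BY region ORDER BY region"
    ).fetchall()
    print(rows)  # [('east', 125.0), ('north', 75.0), ('west', 50.0)]
```

A production data virtualization layer would reach genuinely remote systems over the network; the point here is only that the consumer queries one view while the rows stay in their original stores.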

Pause-and-resume features are often promoted as a cost-saving capability, but their value is overstated: a high-performance EDW production environment rarely stops its workload. Indeed, the feature seems to be touted most by vendors that meet high-concurrency needs with multiple virtual warehouses (VWs), which they must then pause to avert exorbitant costs.

Improved Productivity

For improved productivity, look to a containerized data warehouse built on Red Hat OpenShift, the industry's most secure and comprehensive enterprise-grade container platform. Developed on well-established and widely accepted industry standards, Docker and Kubernetes, OpenShift supports building, deploying and scaling applications on any infrastructure. OpenShift provides a faster path to production, reduces monitoring overhead and keeps the environment uniform across cloud platforms. Features such as high availability are native to OpenShift and require no setup. The ability to deploy on premises or on most public cloud platforms helps you avoid lock-in to a single cloud, along with the productivity disruptions and costly migrations that lock-in can bring.

Additional helpful features include:

  • Massively Parallel Processing (MPP) – Breaks each query into work that runs across multiple nodes in parallel, making full use of the available compute power.
  • NVMe flash drives – Reduce I/O overhead, improve performance across various logical interfaces, help support thousands of users and process data at high speed and low cost.
  • Hardware acceleration (FPGA) – Speeds reads from disk by evaluating the WHERE clause on the FPGA instead of the CPU, so less data is moved into memory.
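The MPP and FPGA bullets describe two related ideas: split a scan across nodes, and filter rows where they are stored so that only matching data moves onward. The Python sketch below is a toy model of that scatter-filter-gather pattern under stated assumptions, not NPS internals: hypothetical data slices stand in for nodes, threads stand in for parallel hardware, and a predicate function stands in for the WHERE clause.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data slices: each inner list stands in for the rows stored on one node.
SLICES = [
    [("east", 120), ("west", 40), ("east", 75)],
    [("west", 300), ("east", 10)],
    [("east", 55), ("west", 90), ("west", 5)],
]


def scan_slice(rows, predicate):
    """Apply the filter where the rows live and return a partial (sum, count).

    Only the two partial numbers leave the "node", not the raw rows; this mimics
    filtering early so that less data moves downstream.
    """
    kept = [amount for region, amount in rows if predicate(region, amount)]
    return sum(kept), len(kept)


def parallel_sum(slices, predicate):
    """Scatter the scan across slices in parallel, then gather and combine partials."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda rows: scan_slice(rows, predicate), slices))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count


if __name__ == "__main__":
    # Roughly: SELECT SUM(amount), COUNT(*) FROM sales WHERE region = 'east' AND amount > 50
    print(parallel_sum(SLICES, lambda region, amount: region == "east" and amount > 50))
    # prints (250, 3)
```

Because each slice returns only a small partial result, the coordinator combines three tuples instead of eight rows; in a real MPP system the same shape lets work grow with the number of nodes.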

NPS is the only EDW on the market that combines all three of these features on a Red Hat OpenShift base.

A robust web console enhances productivity, providing at-a-glance information about utilization, query throughput, query performance (prep time and duration), active queries, queue lengths, open sessions, resource allocation and currently running jobs. Consolidating this information in a single place gives administrators a quick, comprehensive picture of the system and the details they need to improve production performance. Again, in this area, NPS sets the standard.

Get the full story

As these components of total value of ownership demonstrate, Netezza Performance Server provides outstanding value. Read the full Cabot Partners report for a deeper dive into comparisons among the five competing EDW solutions.

For more information, read the second part of this series, which reviews the remaining two components of TVO: revenue/profits and risk mitigation.

Author
Felix Lee, Program Manager, Competitive Office, IBM Data and AI, IBM Cloud and Cognitive Software
Footnotes

[1] Based on internal and customer-reported tests