Use big data and fast data analytics to achieve analytics as a service (AaaS)

Key analytical platforms on IBM SoftLayer Cloud


In the recent past, two dominant trends have gripped the IT industry. One is accelerated IT infrastructure optimization, primarily supported by cloud technologies. The other is the necessity of dealing with large amounts of data, commonly known as big data.

Today, several potentially disruptive innovations are happening in parallel in the IT field. The device ecosystem is seeing an unprecedented growth toward billions of connected electronic devices as the number of implantables, wearables, portables, and cyber-physical systems (CPS) is increasing continually. Business-critical operational, transactional, and analytical systems are becoming pervasive, while social sites are embraced by more and more people across the world. Trillions of so-called "smart objects," including chairs, sofas, tables, beds, and so on, are being digitized and coming online, while powerful scientific and technical experimentation is being performed as never before.

Traditionally, analytics has addressed itself primarily to business data, in an effort to squeeze out business insights. Today, the data size is massive, while data scope, speed, and structure vary sharply. The value of data for any individual, innovator, and institution is dependent upon the data being crunched in an intelligent and insightful way.

Two grand disciplines, big data analytics and fast data analytics, are emerging. New technologies, platforms, and tools are being offered by worldwide product vendors to support these disciplines in a simplified and streamlined fashion.

In this article, we show how data analytics can be delivered as a service by using IBM SoftLayer Cloud for worldwide users in an affordable and accelerated fashion. We review how the IBM SoftLayer Cloud is using big data and fast data analytics to achieve the goal of analytics as a service (AaaS).

Big data analytics is moving beyond the realm of intellectual curiosity and is beginning to tangibly affect business operations, offerings, and outlooks. No longer merely hype or a buzzword, big data analytics will soon become a central tenet for every sort of business enterprise.

Meanwhile, real-time analytics has become a hot requirement. For example, factories require the use of real-time data such as sensor data to detect abnormalities in plant and machinery.

Analytics for big data and real-time data in public clouds

In the past, most traditional data warehousing and business intelligence (BI) projects involved collecting, cleansing, and analyzing data extracted from on-premises business-critical systems. While this age-old practice will soon change, it is unlikely that many organizations will be in a hurry to move their mission-critical systems or data (customer, confidential, and corporate) to public cloud environments for analysis. However, businesses are adopting the cloud model for business operational and transactional purposes.

Currently, the biggest potential for cloud computing is in the processing of data that already exists in cloud centers. Many functional websites, applications, and services are bound to be cloud-based sooner rather than later. Eventually, every kind of physical asset will be seamlessly integrated with cloud-based services. For example, ground-level sensors and actuators are increasingly tied up with cloud-based software. Such developments suggest that future data analytics will flourish in cloud environments.

Today, public clouds natively provide many kinds of big data analytics platforms and tools in order to speed up data analytics at an affordable cost. WAN optimization technologies are maturing quickly to substantially reduce network latency while transmitting large amounts of data from one system to another among geographically distributed clouds. Federated, open, connected, and interoperable cloud schemes are being devised, and we foresee the coming of the inter-cloud soon through open standards and deeper automations.

With the continued adoption and articulation of new capabilities and competencies, such as software-defined computing, storage, and networking, cloud-based data analytics should grow immensely.

Filtering and anonymizing data in hybrid and public clouds

In the coming years, the value of hybrid clouds will climb sharply, because a mixed and multi-site IT environment is appropriate for most emerging scenarios. For the analytics space, a viable hybrid cloud use case is to filter out sensitive information from data sets shortly after capturing that data and then to use the public cloud to perform any complex analytics on them. For example, if the goal is to analyze terabytes of medical data to identify overall healthcare patterns, the identity details of individual patients are not relevant. In that case, a filter can scrub names, addresses, Social Security numbers, and so on, before pushing the anonymized data to secure cloud data storage.
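The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production de-identification tool: the field names (name, address, ssn, notes), the salt value, and the helper names pseudonym and scrub_record are all assumptions made for the example. It drops direct identifiers, masks Social Security numbers that appear in free text, and replaces the patient identity with a keyed token so that records belonging to the same patient can still be grouped on the public side.

```python
import hashlib
import hmac
import re

# Hypothetical field names; a real data set defines its own schema.
SENSITIVE_FIELDS = {"name", "address", "ssn"}

# Matches US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pseudonym(value: str, salt: str) -> str:
    """Return a stable keyed token; the salt must stay on the private side."""
    mac = hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def scrub_record(record: dict, salt: str) -> dict:
    """Drop direct identifiers and mask SSNs found in free-text fields."""
    cleaned = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            continue  # this field never leaves the private environment
        if isinstance(value, str):
            value = SSN_PATTERN.sub("[REDACTED]", value)
        cleaned[key] = value
    # Keep a pseudonymous key so per-patient grouping still works.
    cleaned["patient_token"] = pseudonym(record["ssn"], salt)
    return cleaned


record = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "ssn": "123-45-6789",
    "diagnosis": "hypertension",
    "notes": "Patient 123-45-6789 reports headaches.",
}
print(scrub_record(record, salt="site-secret"))
```

Only the output of scrub_record would be pushed to cloud storage. A keyed token is pseudonymization rather than full anonymization, so whether it is sufficient depends on the regulatory requirements that apply to the data.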

Software systems are steadily being modernized and moved to cloud environments, especially public clouds subscribed to and used as a service over the web.

Another noteworthy development is that a variety of social sites that are used by people across the world are emerging and joining in mainstream computing. Facebook, for example, pours out at least 8 terabytes of data every day. Similarly, other social sites produce large amounts of personal, social, and professional data apart from mere comments, complaints, and advertisements. This poly-structured data plays an important role in shaping the data analytics domain.

Other notable trends include the movement of enterprise-class operational, transactional, commercial, and analytics systems to public clouds. For example, Salesforce is a pioneering public cloud provider of CRM as a service. As a result, a growing share of enterprise data originates in public clouds. With public clouds projected to grow quickly, cloud data provides another opportunity for cloud-based data analytics.

Contemporary analytics in hybrid clouds

Apart from traditional business analytics, the trends discussed above require newer kinds of analytics that can manage big data and real-time data. These analytics fall into both domain-specific and domain-agnostic analytics categories.

It is important to perform operational analytics on data from all kinds of IT infrastructure elements, such as appliances, electronics, and other devices, in order to maintain them predictively. In other words, predictive analytics on device data depends on operational analytics being done first.

Every industry vertical has its big data analytics. With different data velocities, real-time and streaming analytics are certain to be required. Here are a few parameters to consider when determining the appropriateness of cloud environments for data analytics:

  • Data volume and velocity
  • Impacts on computing, storage, and network resources
  • Sensitivity of data and regulatory and compliance requirements
  • Scope of analytics
  • Types of environments

Next-generation data analytics applications and platforms in cloud environments

Cloud-based data analytics has been growing rapidly in an effort to reap all the benefits of the cloud paradigm. Here is a list of potential key benefits of moving to the cloud:

  • Agility and affordability—No capital investment of a large-scale IT infrastructure is needed. Just use and pay.
  • Big and fast data platforms—Deploying and using any kind of big data platform (generic or specific, open or commercial-grade) for analytics is quick and easy.
  • End-to-end Hadoop platforms—Data virtualization, ingestion, processing, mining, analytics, and information visualization tasks are being performed by these platforms.
  • Data management systems—Parallel, clustered, distributed SQL databases, NoSQL, and NewSQL databases are made available in clouds.
  • Data warehouse systems—Recently, data warehouse as a service (DWaaS) capabilities are being realized.
  • Social sites, mobile application stores, and similar apps—Popular social media and network applications are being run on public clouds.
  • WAN optimization technologies—WAN optimization products and platforms for efficiently transmitting data over the Internet infrastructure have emerged.
  • Business applications in clouds—With enterprise information systems (EIS), business-critical packaged applications such as ERP, CMS, SCM, KM, and so on, are also being deployed in clouds.
  • Cloud integrators, brokers, and orchestrators—Products and platforms for seamless interoperability among different and distributed systems, services, and data are available.
  • Operational, transactional, and analytical systems are being modernized, migrated, and hosted in clouds.
  • Devices, sensors, and other machines are being integrated with cloud-native applications, as well as enabled applications, services, and data.

We have created a number of proofs of concept (PoCs) to understand cloud-based big and fast data analytics. (Note: These articles are currently under review and upon publication will be available through IBM developerWorks.)

The following sections describe the various platforms, databases, and tools that are made to run in IBM SoftLayer Cloud for simplifying and streamlining analytics as a service to worldwide clients and customers.

Big data analytics platforms in IBM SoftLayer Cloud

Increasingly, individuals, innovators, and institutions are taking advantage of the agility and cost efficiencies that cloud infrastructures provide. This "cloudification" of enterprise IT infrastructures also offers several other advantages.

Most developers agree that Hadoop is the most important method of confidently handling big data. The maturity and stability levels of Hadoop-compliant data analytics platforms are pushing companies toward big data analytics. Hadoop-based platforms are being steadily taken to cloud environments in order to deliver big data analytics with nimbleness and suppleness.

As noted earlier, the cloud infrastructure is being positioned as the most appropriate one for big data analytics. Several open source as well as commercial-grade implementations of Hadoop specifications are on the market, including Cloudera, Hortonworks, and MapR. IBM InfoSphere BigInsights, with Apache Hadoop as its base, is the most favored commercial implementation of Hadoop.

Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH (Cloudera's distribution including Apache Hadoop), the world's most popular open source Hadoop-based platform, as well as advanced system management and data management tools. Cloudera Enterprise includes Cloudera Manager to help you easily deploy, manage, monitor, and diagnose issues with your cluster. Cloudera Manager is critical for operating clusters at scale.

Cloud environments are becoming increasingly popular for critical Apache Hadoop workloads, given their flexibility and elasticity. With Cloudera Director, you can unlock the full potential of Hadoop in the cloud, without compromise.

SoftLayer Cloud not only provides potentially unlimited resources for your high-performance computing cluster; it also makes it easy to manage with Cloudera-managed Hadoop.

We have deployed Hortonworks and MapR Hadoop platforms in SoftLayer Cloud. A typical cloud-based solution comprises storage, processing, and management components deployed on SoftLayer Cloud, which provides an extensible, elegant, efficient, and elastic environment for processing your data. Other benefits include extreme flexibility, high performance, agility, and pay-per-usage, eliminating upfront costs.

IBM BigInsights, also available on SoftLayer Cloud, provides the following benefits:

  • Accelerates and simplifies cluster deployment—Take advantage of big data analytics without the need for an on-premise infrastructure.
  • Scales as your business demands—Keep infrastructure costs in line with the changing needs of the business.
  • Provides advanced tools to reduce time to value—Gain value from Big SQL, BigSheets, text analytics, and more.
  • Optimizes performance and enhances security—Experience speed and reliability with a dedicated bare-metal infrastructure.
  • Offers expertise and best practices—Benefit from a dedicated cloud operations team that deploys clusters based on best practices.

Real-time analytics platforms in IBM SoftLayer Cloud

Real-time analytics on fast and streaming data can also be done successfully in cloud environments. In this section, we explain how a couple of platforms were modernized and migrated to the IBM SoftLayer Cloud center in order to illustrate the concerns, challenges, and changes associated with cloud-based real-time analytics.

Now that data is being generated and captured in unprecedented amounts, the traditional data analytics platforms and infrastructures are bound to face a variety of constraints. We need robust and resilient algorithms and IT solutions for big and fast data. Several product vendors, having realized the growing challenges, are proactively bringing forth big data analytics systems that facilitate the smooth transition of captured and consolidated data to information and knowledge.

Data virtualization, databases, warehouses, data marts and cubes, business intelligence (BI), and visualization solutions are critical to successful knowledge extraction and engineering. VoltDB is a high-performance and scalable relational database management system (RDBMS) for big data, high-velocity OLTP and real-time analytics. VoltDB, which is a kind of NewSQL database, is a blazingly fast database that is designed to run on modern scale-out computing infrastructures. Unlike legacy RDBMS products and NoSQL data stores, VoltDB enables high-velocity applications without requiring complex and costly sharding layers or compromising transactional data integrity (ACID) to gain performance and scale.

VoltDB provides:

  • Database throughput reaching millions of operations per second
  • On-demand scaling
  • High availability, fault tolerance, and database durability
  • Real-time data analytics

VoltDB is deployed in SoftLayer Cloud in order to showcase its real-time and real-world capabilities of producing actionable insights.

In addition to data size and structure, data speed also matters greatly. Specific use cases across industry verticals are emerging that require fast data analytics. Data is being updated, encapsulated, and delivered as messages. Data and event messages are emerging as formalized building blocks to be received, opened, parsed, and used for a variety of deeper and more decisive analysis. Data streams (multimedia) and events from newer data sources such as sensors, machines, operational systems, platforms, and so on, need to be systematically captured and analyzed in real time. While clouds are being positioned as the core and optimized IT infrastructure, there are several open source as well as commercial-grade platforms for automating the process of real-time and streaming analytics.

Apache Storm is one such real-time analytics platform. A free and open source distributed real-time computation system, Apache Storm makes it easy to reliably process unbounded streams of data. It thus does for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. It has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.

Apache Storm is fast. One benchmark clocked Storm at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queuing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation as needed. We deployed an instance of Apache Storm in IBM SoftLayer Cloud and chose a small use case in order to show how cloud-based Storm functions and how it delivers its goals.
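To make the idea of a topology that repartitions streams between stages more concrete, here is a small simulation in plain Python. It does not use the Storm API. The terms spout (a stream source), bolt (a processing step), and fields grouping (routing by a field value) are Storm terminology that the article does not introduce, and the function names and the sample sentences are assumptions made for this sketch.

```python
import zlib
from collections import Counter


def sentence_spout():
    """Spout: the source of the stream (finite here, unbounded in a real topology)."""
    yield from ["the cow jumped", "the moon", "the cow"]


def split_bolt(sentence):
    """Bolt: emits one tuple per word in the incoming sentence."""
    yield from sentence.split()


def fields_grouping(word, num_tasks):
    """Repartition by field value so the same word always reaches the same task."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_topology(num_count_tasks=2):
    # One Counter per parallel task of the counting bolt.
    tasks = [Counter() for _ in range(num_count_tasks)]
    for sentence in sentence_spout():
        for word in split_bolt(sentence):
            tasks[fields_grouping(word, num_count_tasks)][word] += 1
    return tasks


tasks = run_topology()
totals = sum(tasks, Counter())
print(totals)
```

Because routing depends only on the word itself, each counting task holds the complete count for the words it owns, which is what allows the counting stage to run in parallel without coordination.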

High-performance big data analytics in SoftLayer Cloud

High performance is a key requirement for today's analytics workloads. Valid concerns have been expressed in different quarters that cloud environments do not guarantee high performance. For that reason, it is important to host high-performing platforms on clouds in order to ensure high performance of cloud-hosted services and workloads.

Big data analytics (BDA) is emerging as a data-intensive activity that requires high-end IT infrastructures and integrated platforms to simplify and streamline the tasks that are typically associated with any data analytics. Currently, there are several viable options to accomplish data analytics efficiently, ranging from mainframes, clusters, grids, and appliances to supercomputers. Hadoop platforms are the most sought after for enabling cost-effective analysis of multi-structured data mountains. High-performance computing (HPC) is the most appropriate computing model to adopt when approaching the infrastructural challenges posed by BDA.

One of our PoCs shows how the Netezza software solution can be systematically moved to IBM SoftLayer Cloud, configured there, and used for accomplishing next-generation real-time analytics with a low total cost of ownership (TCO) and a high return on investment (ROI). We provide all the relevant details of a sample application that accentuates the power of cloud-based Netezza in fulfilling the various requirements of high-performance data analytics.

Streaming analytics in IBM SoftLayer Cloud

Stream computing continuously integrates and analyzes data in motion to deliver real-time analytics. It enables organizations to detect risks and opportunities in high-velocity data that can be found and acted on only within a short window of time. High-velocity flows of data from real-time sources, such as market data, machines, smartphones, sensors and actuators, click streams, and even transactions, remain largely untapped.
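As a rough illustration of analyzing data in motion, the following Python sketch flags sensor readings that deviate sharply from a sliding window of recent values. It is a generic technique, not the method used by any platform named in this article. The function name detect_anomalies, the window size, the threshold, and the sample temperatures are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the mean of the recent window."""
    recent = deque(maxlen=window)  # holds only the last `window` readings
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag the reading when it lies more than `threshold` standard
            # deviations away from the recent average.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


temps = [20.0, 20.5, 19.8, 20.2, 20.1, 20.3, 35.0, 20.0, 19.9]
print(list(detect_anomalies(temps)))
```

The generator processes one reading at a time and keeps only a fixed-size window in memory, so it works the same way on an unbounded stream as on this short list.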

IBM Cloud Analytics Application Services delivers high-performance clusters for running enterprise-grade big data and analytics workloads on a dedicated bare metal infrastructure pre-installed with industry-leading big data software.

IBM InfoSphere Streams, the supported software for this type of cloud analytics, is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. This solution can handle very high data throughput rates, up to millions of events or messages per second.

Many organizations need to process a large amount of data in real-time for real-time analytics or ETL or to respond to events instantaneously. Analyzing big data streams in real time is emerging as a distinct need for many industry verticals.

We have deployed DataTorrent in IBM SoftLayer Cloud and verified how it delivers on its promises for big data streaming analytics. DataTorrent is an enterprise-grade software platform that enables businesses to perform any sort of data processing or transformations on structured or unstructured data, all in real time as the data is being streamed into a data center. Leveraging Hadoop 2.0, DataTorrent is a YARN-native application platform. It can be installed directly onto an existing Hadoop cluster, connect directly to all in-coming data sources live, and perform any type of processing or transformation of your data in-memory, as it comes streaming in. DataTorrent will handle all of the scaling and fault tolerance of the system, leaving enterprises to focus on their business logic.

DataTorrent supports today's most demanding, mission-critical, big data streaming applications. It enables you to quickly develop applications that ingest massive amounts of data from various sources in real time, and perform highly scalable computations in real time. With DataTorrent, you can leverage your existing Hadoop environment for real-time stream processing. We have employed a sample application in order to show readers how cloud-based real-time analytics applications can be implemented in a streamlined manner.

End-to-end big data analytics platform in IBM SoftLayer Cloud

In general, Hadoop platforms do pre-processing, processing, and analytics for knowledge discovery. But an end-to-end big data analytics platform involves data collection, virtualization, ingestion, analytics, and visualization modules. With just a single click, everything is accomplished quickly and securely.

Datameer is one such platform. Specifically built for Hadoop, Datameer enables the fastest time from raw data to new insights. Its mission is to eliminate the complexity of the tasks associated with big data analytics and empower everyone to make data-driven decisions in minutes, not months. You no longer need data scientists or multiple technical tools to model, integrate, cleanse, prepare, analyze, and visualize your data. Datameer is a one-stop shop for getting all your data into Hadoop, analyzing that data, discovering the knowledge, and visualizing the resulting insights in a preferred form and format. Datameer can handle all kinds of data from multiple sources as illustrated in Figure 2 below. It has been successfully installed in the IBM SoftLayer Cloud environment and tested with a sample application to demonstrate its unique capabilities.

Databases in IBM SoftLayer Cloud

Versatile in-memory computing, NoSQL and NewSQL databases, parallel file systems, and so on, are important IT solutions to be hosted and run in elastic clouds.

NoSQL databases

Let's look at these NoSQL databases:

  • HBase
  • Apache Cassandra
  • Aerospike

HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). A NoSQL database, HBase is well suited for sparse data sets. Unlike a relational database, it does not support a structured query language such as SQL. An HBase system comprises a set of tables, and each table must have an element defined as a primary key (the row key). All access to HBase tables must use this key. An HBase column represents an attribute of an object, and related columns are grouped together into what are known as column families. With HBase, you must predefine the table schema and specify the column families. However, HBase is very flexible in that new columns can be added to families at any time, making it possible for the schema to adapt to changing application requirements.
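The data model described above can be mimicked with nested dictionaries. The sketch below is an in-memory stand-in written in plain Python, not the HBase client API. The class name SketchTable, the column family names, and the sample rows are hypothetical.

```python
class SketchTable:
    """In-memory stand-in for an HBase-style table (not the HBase client API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.families = set(column_families)
        self.rows = {}  # row key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        # Columns inside a family need no prior declaration.
        row.setdefault(family, {})[column] = value

    def get(self, row_key):
        """All access goes through the row key."""
        return self.rows.get(row_key, {})


table = SketchTable(column_families=["info", "metrics"])
table.put("sensor-001", "info", "location", "plant-a")
table.put("sensor-001", "metrics", "temp", 20.5)
table.put("sensor-002", "metrics", "vibration", 0.03)  # a new column, added on the fly
print(table.get("sensor-001"))
```

Rows store only the columns that were actually written, which is why this layout suits sparse data sets: sensor-002 has no info family at all and takes no space for it.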

HBase is a part of every standard Hadoop distribution and has been installed in IBM SoftLayer Cloud. There are certain usage scenarios where big data analytics (BDA) is successfully performed with the help of a cloud-based HBase database.

There are several other competent and high-end NoSQL databases in the marketplace. Apache Cassandra (originally developed at Facebook) and Google Bigtable, for example, are popular database management systems moving into cloud environments.

The Apache Cassandra database is an excellent choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple data centers is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for de-normalization and materialized views, and powerful built-in caching. Cassandra is also deployed in IBM SoftLayer Cloud.

Basho Riak is another NoSQL database made available in SoftLayer Cloud. Other well-known databases such as MongoDB are also being taken to the cloud.

Aerospike is an open source distributed NoSQL database optimized for in-memory and SSD-based indexing and data storage. Aerospike is a modern database built from the ground up to push the limits of flash memory, processors, and networks. It was designed to operate with predictable low latency at high throughput with uncompromising reliability. It greatly simplifies developers' workloads by eliminating the need to incorporate the logic for sharding and for cluster changes. This game-changing database solution also eliminates the need to worry about data loss or downtime.

Aerospike is ideal for real-time big data or context-driven applications that must sense and respond immediately. It operates at in-memory speed and global scale with enterprise-grade reliability. Identical Aerospike servers scale out to form a shared-nothing cluster that transparently partitions data and makes processing across nodes parallel. Nodes in the cluster are identical: you can start with two and just add more hardware. The cluster scales linearly.
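A shared-nothing cluster that partitions data transparently can be approximated as follows. This Python sketch is a generic hash-partitioning scheme and is not a description of Aerospike's internal algorithm, which the article does not specify. The partition count, the SHA-1 hash, the modulo-based owner assignment, and the node names are all assumptions made for illustration.

```python
import hashlib

NUM_PARTITIONS = 4096  # assumed fixed partition count for this sketch


def partition_of(key: str) -> int:
    """Map a record key to a partition using a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def owner_of(partition: int, nodes: list) -> str:
    """Every node can compute the owner locally; no coordinator is required."""
    return nodes[partition % len(nodes)]


def distribution(keys, nodes):
    """Count how many keys each node would own."""
    counts = {node: 0 for node in nodes}
    for key in keys:
        counts[owner_of(partition_of(key), nodes)] += 1
    return counts


keys = [f"user:{i}" for i in range(10_000)]
print(distribution(keys, ["node-a", "node-b"]))
print(distribution(keys, ["node-a", "node-b", "node-c"]))
```

Adding a third node spreads the same keys over three owners with roughly equal shares, which is the behavior behind near-linear scaling. The simple modulo assignment used here would move many partitions when the cluster changes; real systems typically use an assignment that limits such movement.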

We have migrated an instance of Aerospike database to the IBM SoftLayer Cloud environment and configured it to deliver on its promises.

MySQL databases in IBM SoftLayer Cloud

ScaleBase brings in elasticity, scalability, and continuous high availability to MySQL databases and applications in public, private, and hybrid cloud environments. ScaleBase enables instant and transparent MySQL scale out, leveraging the power of smaller, less expensive servers working together. Policy-based data distribution (automated sharding), powered by the ScaleBase Analysis Genie, and intelligent load balancing with replication-aware read/write splitting enable growth of the operational load and throughput. They also enable better application performance and protect from varying usage peaks and load spikes.
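The combination of automated sharding and read/write splitting can be pictured with a small router. The Python sketch below is a simplified illustration and does not reflect ScaleBase's actual implementation or API. The class names Shard and Router, the server names, the CRC32-based shard choice, and the rule that treats any statement starting with SELECT as a read are assumptions.

```python
import zlib


class Shard:
    """One shard: a primary for writes plus read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next = 0

    def pick_replica(self):
        # Round-robin across read replicas; fall back to the primary if none exist.
        if not self.replicas:
            return self.primary
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica


class Router:
    """Choose a shard by key, then split reads from writes."""

    def __init__(self, shards):
        self.shards = shards

    def route(self, sql, shard_key):
        index = zlib.crc32(str(shard_key).encode("utf-8")) % len(self.shards)
        shard = self.shards[index]
        is_read = sql.lstrip().upper().startswith("SELECT")
        return shard.pick_replica() if is_read else shard.primary


router = Router([
    Shard("db0-primary", ["db0-replica1", "db0-replica2"]),
    Shard("db1-primary", ["db1-replica1"]),
])
print(router.route("SELECT * FROM orders WHERE customer_id = 42", shard_key=42))
print(router.route("UPDATE orders SET status = 'paid' WHERE customer_id = 42", shard_key=42))
```

Both statements for customer 42 land on the same shard, but the read goes to a replica and the write goes to the primary. A replication-aware router would additionally avoid replicas that lag too far behind the primary, which this sketch leaves out.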

ScaleBase automated failover and failback ensure business continuity and protection from both unexpected and expected outages. They also simplify different ongoing maintenance tasks, such as software and hardware upgrades, all without impacting the application or database availability. The ability to migrate an application from a hosted environment with a single growing database to a virtualized environment with smaller, more manageable data nodes gives companies agility, flexibility, and competitiveness.

ScaleBase was built especially for cloud deployment. It can be run on private clouds and is also available on public clouds. We have executed the initial formalities in order to prepare and migrate the ScaleBase solution to the IBM SoftLayer public cloud. We have made the necessary configuration changes and created a small sample application to run and check how ScaleBase functions in an online, off-premises, and on-demand cloud environment. These steps form a major part of our strategy of empowering public cloud offerings for data and process-intensive applications.

NewSQL databases in IBM SoftLayer Cloud

Essentially, NewSQL combines the best features from both worlds: it maintains the transactional integrity of traditional database systems while providing high-end scalable performance of NoSQL systems. This combination of performance and scale is crucial in transaction-intensive environments. NoSQL-based data systems are riding a seismic wave of success with the promise of scalability. NewSQL databases seek to overtake NoSQL with the added bonus of high-speed transactional integrity.

VoltDB (described earlier in this article) is a NewSQL database that has been successfully deployed in IBM SoftLayer Cloud and subjected to a variety of small-scale tests. Other popular NewSQL databases such as Clustrix and NuoDB are fast acquiring market share. They are conveniently hosted and delivered as a service via cloud environments.

Database as a service (DBaaS)

Today's applications are expected to manage a variety of structured and unstructured data, accessed by massive networks of users, devices, business locations, and even sensors, vehicles, and Internet-enabled goods. Companies of all sizes, from startups to mega-users like Samsung, Hothead Games, and Fidelity Investments, use Cloudant to manage data for large or fast-growing web and mobile applications in e-commerce, online education, gaming, financial services, and other industries.

IBM Cloudant is best suited for applications that need a database to handle a massively concurrent mix of low-latency reads and writes. Its data replication and synchronization technology also enables continuous data availability, as well as offline application usage for mobile or remote users. In a large organization, it can take several weeks for a DBMS instance to be provisioned for a new development project, which limits innovation and agility. DBaaS enables instant provisioning of your data layer, so that you can begin new development at any time.

Unlike do-it-yourself (DIY) databases, DBaaS solutions such as Cloudant provide—and guarantee—a specific level of data layer performance and uptime. This eliminates the risk of service delivery failure for you and your project. The Cloudant database as a service (DBaaS) is the first data management platform to leverage the availability, elasticity, and reach of the cloud to create a global data delivery network (DDN) that enables applications to scale larger and remain available to users wherever they are.

Data warehouse as a service (DWaaS) in IBM SoftLayer Cloud

IBM dashDB is a fully managed data warehousing service in the cloud that puts an analytics powerhouse at your fingertips. IBM dashDB allows you to break free from the bonds of infrastructure when your business demands it. IBM dashDB can help extend your existing infrastructure into the cloud, or help you start new data warehousing self-service capabilities. It is powered by high-performance in-memory and in-database technology that delivers answers as fast as you can think. IBM dashDB provides the simplicity of an appliance with the elasticity and agility of the cloud for any size organization. It is designed to meet your expectations of enterprise security, and with it you can gain instant access to critical business insights without a hefty upfront infrastructure investment. You can load, analyze, and visualize your data in minutes. With IBM dashDB, the day of providing data warehouse as a service has arrived.

IBM Watson Analytics in SoftLayer Cloud

IBM Watson Analytics is a cognitive service that provides natural language processing capabilities and instant access to predictive and visual analytic tools for businesses. It makes advanced and predictive analytics easy for anyone to acquire and use. Watson Analytics offers self-service analytics, including access to easy-to-use data refinement and data warehousing services. These make it easier for business users to acquire and prepare data, beyond simple spreadsheets for analysis and visualization.

IBM Watson Analytics automates steps such as data preparation, predictive analysis, and visual storytelling for business professionals across data-intensive disciplines like marketing, sales, operations, finance, and human resources. IBM SoftLayer is integrating the latest IBM Power Systems into its cloud infrastructure in order to fulfill the infrastructural needs for cost-effective high-performance computing. The IBM Watson system will run efficiently on IBM Power Systems. Soon, Watson Analytics will be available as a service in the IBM SoftLayer Cloud.

Containerized analytics as a service in IBM SoftLayer Cloud

The concept of containerization for bundling and deploying mission-critical applications is attracting the attention of developers and administrators alike. Bundling every kind of software module along with its binary files, libraries, configuration details, and other dependencies together into a single package is one way to ensure faster and more error-free deployment and delivery of software workloads. This pragmatic idea has spread, and today all kinds of mobile, cloud, social, embedded, middleware, database, enterprise, and IoT applications are methodically being containerized.

Sandboxing, a subtle and smart isolation technique, eliminates the restricting dependencies on underlying operating systems. Such comprehensive and compact sandboxed and contained applications are a worthy solution for achieving portability, extensibility, maneuverability, sustainability, and security needs.

As Docker technology has matured, a new paradigm of "containers as a service (CaaS)" has emerged. Containers are being readied, hosted, and delivered as a service over the public web. All the necessary procedures to deliver application-aware containers as a service are being configured on containers to make them ready for the forthcoming service era. That is to say, knowledge-filled, service-oriented, cloud-based, composable, and cognitive containers are being offered as the principal ingredients for the establishment and sustenance of the Smarter Planet vision. Applications are containerized and exposed as services to be discovered and used by a variety of consumers for a growing set of use cases. Big and fast data analytics via Hadoop and Apache Storm, Spark, and so on, are quickly maturing and stabilizing. Virtual machines (VMs) are widely being used to enable Hadoop as a service. In short, containers are destined for cloud environments.

The integration of Hadoop YARN with Docker will allow multiple clusters to use the same hardware resources. We have made YARN containers through the Dockerization steps and hosted the YARN containers in IBM SoftLayer Cloud. We have created an example to show how containerized big data workloads and analytical platforms ensure higher efficiency. The new offering of containerized analytics as a service with the IBM SoftLayer Cloud seems imminent.


Data has become a strategic asset for any organization, and it is important that every organization carefully plan ahead before proceeding with its data strategy. To enjoy continued success, data-driven enterprises will need to overcome all kinds of unexpected business challenges and changes.

To extract actionable insights, each enterprise must systematically subject all of the data that it has gleaned from different and distributed sources to a series of IT-enabled deep analytics processes with the help of end-to-end platforms.

In this article, we have explained how IBM SoftLayer can help you squeeze actionable insights out of your big and real-time data. By using analytics as a service in the cloud with the open, public, and cheap Internet infrastructure, you can create an optimized, organized, and very capable IT solution.
