Unshackle data and develop new business insights

As companies look to accelerate their digital transformation, they must analyze the vast amount of data that has become available and put it to work for effective decision making. By pairing cloud-based analytics with scalable, persistent cloud storage, companies can unshackle their data and develop new business insights.

Using IBM Cloud™ Object Storage, organizations can build a centralized data repository, leveraging cost-effective and scalable storage that makes it possible to collect and store nearly unlimited amounts of data of any type, from any source. Data remains in its native format and doesn’t need to be moved in and out of IBM Cloud Object Storage; rather, the IBM Cloud Object Storage-based data lake is the persistent data store for the analytics.

IBM Cloud Object Storage is integrated with IBM Analytics Engine, IBM Watson® Studio, IBM Cloud SQL Query and other IBM Cloud services to provide self-service data analytics and business intelligence solutions that go well beyond the scalability, security and cost efficiencies of traditional solutions.

Common use cases

Move data from HDFS clusters to IBM Cloud Object Storage

Free up space on expensive Hadoop clusters by efficiently migrating large amounts of data from Hadoop to IBM Cloud Object Storage.

Query data in place

Use IBM Cloud Object Storage as an active workspace for a range of big data analytics use cases, with query-in-place functionality that lets you run analytics directly on your data at rest.

Perform Apache Spark Analytics directly against data stored in object storage

Use IBM Cloud Object Storage as a low-cost, scalable persistent storage layer for analytics with optimized connectivity to Apache Spark.

Store data for AI training models

Accelerate machine and deep learning workflows required to infuse AI into your business. Build and train AI models, and prepare and analyze data, in a single, integrated environment.

Build and analyze IoT pipelines

Store massive amounts of IoT data at low cost and let analytics frameworks access the data directly. Data pipelines can be easily set up and managed to generate analytics-ready data.

Key capabilities

Easily move data from HDFS clusters to IBM Cloud Object Storage

Free up space on expensive Hadoop clusters by using IBM Big Replicate to efficiently move data from Hadoop clusters to IBM Cloud Object Storage, with continuous replication and guaranteed data consistency. You can also use IBM Cloud Object Storage Distributed Copy (DistCp), an open source tool for migrating large amounts of data from Hadoop to IBM Cloud Object Storage.
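
As a rough illustration of the DistCp route, the sketch below only assembles a command line; it does not run it. The bucket, service alias and paths are hypothetical, and the cos://<bucket>.<service>/<path> target assumes the Stocator connector is configured on the cluster (an s3a:// target is another common option).

```python
import shlex


def build_distcp_command(hdfs_source, bucket, service, target_path, max_maps=20):
    """Return the argument list for a Hadoop DistCp copy from HDFS to object storage."""
    # Assumed URI form: cos://<bucket>.<service>/<path> (Stocator connector).
    target = f"cos://{bucket}.{service}/{target_path.lstrip('/')}"
    # -update copies only files that are missing or changed at the target;
    # -m caps the number of parallel map tasks used for the copy.
    return ["hadoop", "distcp", "-update", "-m", str(max_maps), hdfs_source, target]


# Hypothetical source path, bucket and service alias, for illustration only.
command = build_distcp_command(
    "hdfs://namenode:8020/warehouse/events",
    bucket="analytics-lake",
    service="myservice",
    target_path="/raw/events",
)
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument intact if it is later handed to a process launcher.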

Query data in place

IBM Cloud SQL Query is a fully managed service that lets developers analyze and transform data stored across multiple files in multiple formats using ANSI SQL statements. The service can query across CSV, Parquet, JSON and ORC files stored in IBM Cloud Object Storage without the need to move or transform data beforehand. IBM Cloud SQL Query uses Apache Spark, an open source, fast, extensible, in-memory data processing engine optimized for low-latency, ad hoc analysis of data.
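
To make query-in-place more concrete, here is a minimal sketch that composes a statement of the kind the service accepts. The cos://<endpoint>/<bucket>/<object> locations, the bucket names and the column name are hypothetical, and the STORED AS and INTO clauses reflect the service's SQL dialect as best understood here; verify them against the current SQL reference before use.

```python
SUPPORTED_FORMATS = {"CSV", "PARQUET", "JSON", "ORC"}  # formats named in the text above


def build_query(select_clause, source_url, source_format, target_url,
                target_format="CSV", where=None):
    """Compose a query that reads one object storage location and writes to another."""
    for fmt in (source_format, target_format):
        if fmt.upper() not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
    statement = f"SELECT {select_clause} FROM {source_url} STORED AS {source_format.upper()}"
    if where:
        statement += f" WHERE {where}"
    # The result set is written back to object storage rather than returned inline.
    statement += f" INTO {target_url} STORED AS {target_format.upper()}"
    return statement


# Hypothetical locations and column name, for illustration only.
statement = build_query(
    "*",
    "cos://us-south/analytics-lake/readings.parquet",
    "parquet",
    "cos://us-south/analytics-results/hot/",
    where="temperature > 30",
)
print(statement)
```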

Perform Apache Spark analytics

IBM Cloud Object Storage offers optimized connectivity to Apache Spark services to store data from multiple sources and quickly gain insights from it. The use of IBM Cloud Object Storage with Spark analytics can completely decouple the compute and storage tiers, allowing users to store data in an object storage layer and spin up clusters of compute nodes as and when users need them. With this model, compute and storage can scale (and be purchased) independently, allowing compute costs to drop to zero when no jobs are running. The insights persist in IBM Cloud Object Storage, and the data can be re-ingested for future analysis.
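
One way to picture the decoupled model: any Spark cluster, however short-lived, only needs connection settings to reach the data. The sketch below builds those settings for the generic Hadoop S3A connector, which is a substitute chosen for illustration and not necessarily the optimized connector referred to above. It assumes HMAC credentials for the S3-compatible API; the endpoint and keys shown are placeholders.

```python
def cos_spark_conf(endpoint, access_key, secret_key):
    """Return Spark settings that point the Hadoop S3A connector at an S3-compatible endpoint."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style addressing puts the bucket in the URL path, not the hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


# Placeholder endpoint and credentials, for illustration only.
conf = cos_spark_conf(
    "s3.us-south.cloud-object-storage.appdomain.cloud",
    access_key="<hmac-access-key>",
    secret_key="<hmac-secret-key>",
)

# With PySpark installed, the settings could be applied like this:
#   builder = SparkSession.builder.appName("cos-analytics")
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   df = spark.read.parquet("s3a://analytics-lake/raw/events/")
for key in sorted(conf):
    print(key)
```

Because the settings live with the job rather than the cluster, a cluster can be created, pointed at the same buckets, and shut down again without moving any data.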

Store data for Watson machine learning and deep learning workflows

Watson Studio is a hybrid cloud platform, built on the best of open source and IBM tools, to analyze data and use it to build and deploy AI models. IBM Cloud Object Storage is integrated with Watson Studio on IBM Cloud. When a machine learning project is created in Watson Studio, an instance of IBM Cloud Object Storage is created automatically to accelerate the handling of the data required to train and deploy machine and deep learning models.

Perform intelligent data discovery and governance

Once your data is in IBM Cloud Object Storage, it can be governed with the Watson Knowledge Catalog, using data profilers that segment and protect data, allowing for better governance of data such as personally identifiable information or other private data. By implementing a metadata catalog, Watson Knowledge Catalog has a fundamental understanding of what the data is and which data policies may apply to it, and then implements those policies. Watson Knowledge Catalog includes intelligent data discovery and is integrated with Watson Studio to allow for a seamless transition from ‘finding’ to ‘using’ the information across your business.

Easily build and analyze IoT data pipelines

Object storage is well suited to storing massive amounts of IoT data at a low cost and allowing analytics frameworks to access the data directly. IBM Cloud provides services based on Apache Kafka and Apache Spark, including IBM Event Streams and Spark as a service, respectively. Data pipelines from IBM Event Streams to object storage can be easily set up and managed to generate analytics-ready data, which can be analyzed directly by Watson using Spark as a service. Moreover, the Watson IoT Platform can be used to capture IoT device data and send it to IBM Event Streams.
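
As one possible reading of "analytics-ready data", the sketch below groups incoming device events into newline-delimited JSON objects whose keys are partitioned by device type and date, a layout that query engines such as Spark can prune efficiently. The event fields, the key layout and the sample values are all assumptions made for illustration; the managed pipeline may organize data differently.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def batch_events(events):
    """Group events into newline-delimited JSON bodies keyed by device type and UTC date."""
    batches = defaultdict(list)
    for event in events:
        ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        # Hypothetical key layout: iot/device_type=<type>/dt=<YYYY-MM-DD>/events.json
        key = f"iot/device_type={event['device_type']}/dt={ts:%Y-%m-%d}/events.json"
        batches[key].append(json.dumps(event, sort_keys=True))
    # Each value is the body of one object, ready to upload under its key.
    return {key: "\n".join(lines) + "\n" for key, lines in batches.items()}


# Hypothetical sample events; timestamps are Unix seconds.
sample = [
    {"device_id": "t-1", "device_type": "thermostat", "timestamp": 1700000000, "temperature": 21.5},
    {"device_id": "t-2", "device_type": "thermostat", "timestamp": 1700000060, "temperature": 22.0},
    {"device_id": "t-1", "device_type": "thermostat", "timestamp": 1700010000, "temperature": 20.9},
]
objects = batch_events(sample)
for key in sorted(objects):
    print(key, objects[key].count("\n"), "events")
```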

Cost-effective, secure and always available data

Easy data collection and ingestion

IBM offers a variety of ways to get your data into IBM Cloud Object Storage, including natively integrated Aspera high-speed data transfer capabilities for quick data transfer over the network. In addition, services such as IBM Event Streams make it easy to ingest data in real time. IBM Big Replicate can efficiently move data from Hadoop clusters to IBM Cloud Object Storage with continuous replication, and IBM Cloud Object Storage Distributed Copy (DistCp), an open source tool, can be used to migrate large amounts of data from Hadoop to IBM Cloud Object Storage.

Cost-effective and flexible

Using IBM Cloud Object Storage, organizations can build a centralized data repository, leveraging cost-effective and scalable storage that makes it possible to collect and store virtually unlimited amounts of data of any type, from any source. Data is stored in its native format and does not require up-front transformations.

Always available

IBM Cloud Object Storage is built to help data scientists, business analysts and app developers across your organization easily access data with virtually unmatched availability. It is designed to deliver 99.999999999% (11 nines) of durability, and availability is supported by a patented technology in which data is encrypted and distributed across multiple devices in multiple IBM data center facilities.
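
To put 11 nines in perspective, the short calculation below converts the durability figure into an expected loss rate. It assumes the figure applies per object per year, which the text above does not state, and the object count is an arbitrary example. It illustrates a design target, not a guarantee.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
durability = Fraction(99_999_999_999, 100_000_000_000)
# Assumption: the figure is per object per year.
loss_probability = 1 - durability  # 1 in 10**11


def expected_annual_losses(object_count):
    """Expected number of objects lost per year at the design durability."""
    return object_count * loss_probability


def years_per_expected_loss(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


objects_stored = 10_000_000  # arbitrary example size
print(float(expected_annual_losses(objects_stored)))
print(int(years_per_expected_loss(objects_stored)))
```

Under these assumptions, a repository of ten million objects would expect to lose a single object about once every 10,000 years.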

Highly secure

IBM Cloud Object Storage secures data using automatic server-side encryption and offers encryption options with keys managed by IBM Key Protect (key management system) or encryption with keys that you manage. Integration with IBM Identity and Access Management ensures granular access controls down to the data bucket level and according to user role.