Unshackle data and develop new business insights
IBM Cloud® Object Storage lets organizations build a centralized data repository on cost-effective, scalable storage and collect and store nearly unlimited amounts of data of any type, from any source. Data remains in its native format and doesn’t need to be moved in and out of IBM Cloud Object Storage; instead, the IBM Cloud Object Storage-based data lake is the persistent data store for analytics.
IBM Cloud Object Storage is integrated with IBM Analytics Engine, IBM Watson® Studio, IBM Cloud SQL Query and other IBM Cloud services to provide self-service data analytics and business intelligence solutions that go well beyond the scalability, security and cost efficiencies of traditional solutions.
Common use cases
Move data from HDFS clusters to IBM Cloud Object Storage
Free up space on expensive Hadoop clusters by efficiently migrating large amounts of data from Hadoop to IBM Cloud Object Storage.
Query data in place
Use as an active workspace for a range of big data analytics use cases with query-in-place functionality that lets you run analytics directly on your data at rest.
Perform Apache Spark analytics directly against data stored in object storage
Use as a low-cost, scalable persistent storage layer for analytics with optimized connectivity to Apache Spark.
Store data for AI training models
Accelerate machine and deep learning workflows required to infuse AI into your business. Build and train AI models, and prepare and analyze data, in a single, integrated environment.
Build and analyze IoT pipelines
Store massive amounts of IoT data at low cost and allow analytics frameworks to access the data directly. Data pipelines can be easily set up and managed to generate analytics-ready data.
Customer success: Skåne University Hospital
For Skåne University Hospital, IBM Cloud Object Storage makes it possible to gather and retain as much surgical information as possible because it is reliable, cost-effective and globally available and, most importantly, offers nearly limitless capacity.
Easily move data from HDFS clusters to IBM Cloud Object Storage
Free up space on expensive Hadoop clusters by using IBM Big Replicate to move data from Hadoop clusters to IBM Cloud Object Storage with continuous replication and guaranteed data consistency. You can also use Distributed Copy (DistCp), an open-source Hadoop tool, to migrate large amounts of data from Hadoop to IBM Cloud Object Storage.
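As a rough sketch of the DistCp route, the Python snippet below assembles a `hadoop distcp` command line that copies an HDFS directory into a bucket. It assumes the cluster has an object storage connector configured that understands URIs of the form `cos://<bucket>.<service>/<path>`; the helper `build_distcp_command` and the bucket, service and path names are hypothetical. The command is only built and printed, not executed.

```python
import shlex


def build_distcp_command(hdfs_path, bucket, service, target_path, max_maps=None):
    """Return the argument list for a `hadoop distcp` copy from HDFS to object storage."""
    # Assumed URI layout: cos://<bucket>.<service>/<path>, where <service> is the
    # name under which the connector's credentials are configured on the cluster.
    target = f"cos://{bucket}.{service}/{target_path.lstrip('/')}"
    # -update copies only files that are missing or differ at the target.
    command = ["hadoop", "distcp", "-update"]
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks used for the copy.
        command += ["-m", str(max_maps)]
    command += [hdfs_path, target]
    return command


# Hypothetical names, for illustration only.
cmd = build_distcp_command(
    "hdfs://namenode:8020/data/logs", "my-data-lake", "myCos", "raw/logs", max_maps=20
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it could be passed to `subprocess.run` without shell quoting issues.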
Query data in place
IBM Cloud SQL Query is a fully managed service that lets developers analyze and transform data stored across multiple files in various formats using ANSI SQL statements. The service can query across CSV, Parquet, JSON and ORC files stored in IBM Cloud Object Storage without first moving or transforming the data. IBM Cloud SQL Query uses Apache Spark, an open-source, fast, extensible, in-memory data processing engine optimized for low-latency, ad hoc analysis of data.
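To make query-in-place more concrete, here is a minimal Python sketch that composes such a statement as a string. It assumes the service addresses objects by a `cos://<endpoint>/<bucket>/<path>` URI followed by `STORED AS <format>`, and that an optional `INTO` clause writes the result back to object storage; check the service's SQL reference before relying on this shape. The helper `build_query`, the bucket name and the column names are hypothetical.

```python
SUPPORTED_FORMATS = {"CSV", "PARQUET", "JSON", "ORC"}  # the formats named above


def build_query(columns, source_uri, source_format, where=None,
                target_uri=None, target_format=None):
    """Compose a SELECT that reads objects in place, addressed by URI."""
    for fmt in (source_format, target_format):
        if fmt is not None and fmt.upper() not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
    parts = [
        f"SELECT {', '.join(columns)}",
        f"FROM {source_uri} STORED AS {source_format.upper()}",
    ]
    if where:
        parts.append(f"WHERE {where}")
    if target_uri:
        # Optionally write the result set back to object storage.
        into = f"INTO {target_uri}"
        if target_format:
            into += f" STORED AS {target_format.upper()}"
        parts.append(into)
    return "\n".join(parts)


# Hypothetical bucket and columns, for illustration only.
sql = build_query(
    ["customer_id", "country"],
    "cos://us-south/my-data-lake/raw/customers.csv", "csv",
    where="country = 'Sweden'",
    target_uri="cos://us-south/my-data-lake/results/", target_format="parquet",
)
print(sql)
```

The source file is never copied or converted first: the statement names the object where it already lives, and only the result is written out, here as Parquet.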
Perform Apache Spark analytics
IBM Cloud Object Storage offers optimized connectivity to Apache Spark services to store data from multiple sources and quickly gain insights. Using IBM Cloud Object Storage with Spark analytics decouples the compute and storage tiers: users store data in an object storage layer and spin up clusters of compute nodes only when they need them. With this model, compute and storage scale and are purchased independently, so compute costs can drop to zero when no jobs are running. The insights remain in IBM Cloud Object Storage, and the data can be re-ingested for future analysis.
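One way a short-lived Spark cluster might be pointed at the storage layer is through Hadoop configuration properties passed at submit time. The sketch below assumes the open-source Stocator connector and its `fs.cos.<service>.*` property names; the service name `myCos`, the placeholder credentials and both helper functions are hypothetical, and the endpoint is only an example of a regional endpoint.

```python
def cos_connector_properties(service, endpoint, access_key, secret_key):
    """Return Hadoop properties for an object storage connector (Stocator assumed)."""
    prefix = f"fs.cos.{service}"
    return {
        # Register the cos:// scheme and the connector classes (Stocator names).
        "fs.stocator.scheme.list": "cos",
        "fs.cos.impl": "com.ibm.stocator.fs.ObjectStoreFileSystem",
        "fs.stocator.cos.impl": "com.ibm.stocator.fs.cos.COSAPIClient",
        "fs.stocator.cos.scheme": "cos",
        # Per-service endpoint and credentials.
        f"{prefix}.endpoint": endpoint,
        f"{prefix}.access.key": access_key,
        f"{prefix}.secret.key": secret_key,
    }


def spark_submit_flags(properties):
    """Turn Hadoop properties into `--conf spark.hadoop.<key>=<value>` flags."""
    flags = []
    for key, value in sorted(properties.items()):
        # Spark copies every spark.hadoop.* setting into the Hadoop configuration.
        flags += ["--conf", f"spark.hadoop.{key}={value}"]
    return flags


# Placeholder values, for illustration only; never hard-code real credentials.
props = cos_connector_properties(
    "myCos",
    "https://s3.us-south.cloud-object-storage.appdomain.cloud",
    "<access-key>",
    "<secret-key>",
)
flags = spark_submit_flags(props)
print(" ".join(flags))
```

Because the configuration travels with the job rather than living on the cluster, the cluster itself can be created for a run and deleted afterward while the data stays in the bucket.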
Store data for Watson machine learning and deep learning workflows
IBM Watson Studio is a hybrid cloud platform, built on the best of open source and IBM tools to analyze data and use it to build and deploy AI models. IBM Cloud Object Storage is integrated with IBM Watson Studio on IBM Cloud. When a machine learning project is created in IBM Watson Studio, an instance of IBM Cloud Object Storage is created automatically to accelerate the handling of the data required to train and deploy machine and deep learning models.
Perform intelligent data discovery and governance
Once your data is in IBM Cloud Object Storage, it can be governed with the IBM Watson Knowledge Catalog, using data profilers that segment and protect data to allow for better governance. Because it maintains a metadata catalog, IBM Watson Knowledge Catalog understands what the data is and which policies may apply to it, and then enforces those policies. IBM Watson Knowledge Catalog includes intelligent data discovery and is integrated with IBM Watson Studio to allow for a seamless transition from ‘finding’ to ‘using’ the information across your business.
Easily build and analyze IoT data pipelines
Object storage is designed for storing massive amounts of IoT data at a low cost and allowing analytics frameworks to access it directly. IBM Cloud provides services based on Apache Kafka and Apache Spark, including IBM Event Streams and Spark as a service, respectively. Data pipelines from IBM Event Streams to object storage can be easily set up and managed to generate analytics-ready data, which can be analyzed directly by IBM Watson using Spark as a service. Moreover, the IBM Watson IoT Platform can be used to capture IoT device data and send it to IBM Event Streams.
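What "analytics-ready" might look like in practice: the Python sketch below groups a stream of device events into hourly, time-partitioned objects of newline-delimited JSON, a layout that engines such as Spark can commonly read with partition pruning. The key layout, the field names and both functions are assumptions made for illustration and do not describe how the IBM services actually write data; nothing here talks to Event Streams or to object storage.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def object_key(topic, timestamp):
    """Build a time-partitioned object key from a Unix timestamp (UTC)."""
    ts = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{topic}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/events.jsonl"


def batch_events(topic, events):
    """Group events by hourly key and serialize each group as JSON lines."""
    batches = defaultdict(list)
    for event in events:
        # Each event is assumed to carry its Unix timestamp under "ts".
        batches[object_key(topic, event["ts"])].append(json.dumps(event, sort_keys=True))
    return {key: "\n".join(lines) + "\n" for key, lines in batches.items()}


# Hypothetical device readings, for illustration only.
events = [
    {"device": "a", "ts": 0, "temp": 21.5},
    {"device": "b", "ts": 10, "temp": 19.0},
    {"device": "a", "ts": 3600, "temp": 22.1},
]
objects = batch_events("sensors", events)
for key in sorted(objects):
    print(key, len(objects[key].splitlines()))
```

Each resulting key and body pair corresponds to one object that a pipeline stage would upload; partitioning by time lets a later query read only the hours it needs.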