Unshackle data and develop new business insights
As companies look to accelerate their digital transformation, they must analyze and use the vast amount of data now available to make effective decisions. By combining cloud-based analytics with scalable, persistent cloud storage, companies can unshackle their data and develop new business insights.
Using IBM Cloud™ Object Storage, organizations can build a centralized data repository, leveraging cost-effective and scalable storage that makes it possible to collect and store nearly unlimited amounts of data of any type, from any source. Data remains in its native format and doesn’t need to be moved in and out of IBM Cloud Object Storage; rather, the IBM Cloud Object Storage-based data lake is the persistent data store for the analytics.
IBM Cloud Object Storage is integrated with IBM Analytics Engine, IBM Watson® Studio, IBM Cloud SQL Query and other IBM Cloud services to provide self-service data analytics and business intelligence solutions that go well beyond the scalability, security and cost efficiencies of traditional solutions.
Common use cases
Move data from HDFS clusters to IBM Cloud Object Storage
Free up space on expensive Hadoop clusters by efficiently migrating large amounts of data from Hadoop to IBM Cloud Object Storage.
Query data in place
Use IBM Cloud Object Storage as an active workspace for a range of big data analytics use cases, with query-in-place functionality that lets you run analytics directly on your data at rest.
Perform Apache Spark Analytics directly against data stored in object storage
Use IBM Cloud Object Storage as a low-cost, scalable persistent storage layer for analytics with optimized connectivity to Apache Spark.
Store data for AI training models
Accelerate machine and deep learning workflows required to infuse AI into your business. Build and train AI models, and prepare and analyze data, in a single, integrated environment.
Build and analyze IoT pipelines
Store massive amounts of IoT data at low cost while letting analytics frameworks access the data directly. Data pipelines can be easily set up and managed to generate analytics-ready data.
Easily move data from HDFS clusters to IBM Cloud Object Storage
Free up space on expensive Hadoop clusters by using IBM Big Replicate to efficiently move data from Hadoop clusters to IBM Cloud Object Storage, with continuous replication and guaranteed data consistency. You can also use Hadoop Distributed Copy (DistCp), an open source tool for migrating large amounts of data from Hadoop to IBM Cloud Object Storage.
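As a rough sketch of what a DistCp-based migration might look like, the Python snippet below assembles a `hadoop distcp` command line. The NameNode address, bucket name and the `cos://<bucket>.<service>/` target URI are placeholder assumptions (the exact URI scheme depends on which object storage connector the cluster is configured with), and `build_distcp_command` is a hypothetical helper, not part of any IBM tool.

```python
import shlex


def build_distcp_command(source_uri, target_uri, max_maps=20, update=True):
    """Assemble a `hadoop distcp` invocation as an argument list (hypothetical helper)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target
        cmd.append("-update")
    # -m caps the number of simultaneous map tasks used for the copy
    cmd += ["-m", str(max_maps)]
    cmd += [source_uri, target_uri]
    return cmd


# Placeholder source and target; replace with real cluster and bucket names
command = build_distcp_command(
    "hdfs://namenode.example.com:8020/warehouse/events",
    "cos://my-bucket.myservice/warehouse/events",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues once the placeholders are replaced.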
Query data in place
IBM Cloud SQL Query is a fully managed service that lets developers analyze and transform data stored across multiple files in multiple formats using ANSI SQL statements. The service can query across CSV, Parquet, JSON and ORC files stored in IBM Cloud Object Storage without the need to move or transform data beforehand. IBM Cloud SQL Query uses Apache Spark, an open source, fast, extensible, in-memory data processing engine optimized for low-latency, ad hoc analysis of data.
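To make query-in-place more concrete, the sketch below composes a SQL statement that reads files directly from a bucket. It assumes the service accepts a `cos://` object URL in the FROM clause followed by `STORED AS <format>`; the endpoint, bucket, and column names are invented for illustration, and `build_query` is a hypothetical helper rather than part of any IBM SDK.

```python
# File formats the text says the service can query
SUPPORTED_FORMATS = {"CSV", "PARQUET", "JSON", "ORC"}


def build_query(columns, source_url, source_format, where=None):
    """Compose a SELECT that reads objects in place (assumed syntax, see lead-in)."""
    fmt = source_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {source_format}")
    statement = f"SELECT {', '.join(columns)} FROM {source_url} STORED AS {fmt}"
    if where:
        statement += f" WHERE {where}"
    return statement


# Placeholder bucket and columns
statement = build_query(
    ["device_id", "temperature"],
    "cos://us-geo/my-bucket/readings/",
    "parquet",
    where="temperature > 30",
)
print(statement)
```

The helper only builds a string from trusted values; it does not submit the query or sanitize untrusted input.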
Perform Apache Spark analytics
IBM Cloud Object Storage offers optimized connectivity to Apache Spark services to store data from multiple sources and quickly gain insights from it. The use of IBM Cloud Object Storage with Spark analytics can completely decouple the compute and storage tiers, allowing users to store data in an object storage layer and spin up clusters of compute nodes as and when users need them. With this model, compute and storage can scale (and be purchased) independently, allowing compute costs to drop to zero when no jobs are running. The insights persist in IBM Cloud Object Storage, and the data can be re-ingested for future analysis.
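The cost effect of decoupling can be illustrated with simple arithmetic. The sketch below is a toy model, and every rate and quantity in it is a made-up placeholder, not IBM pricing: storage is billed on the volume kept, compute only on the node-hours that jobs actually run.

```python
def monthly_cost(stored_tb, storage_rate_per_tb, job_hours, nodes, node_rate_per_hour):
    """Return (storage_cost, compute_cost) for one month under a decoupled model."""
    # Storage is billed for the data kept, whether or not any job runs
    storage_cost = stored_tb * storage_rate_per_tb
    # Compute is billed only for node-hours consumed while clusters are up
    compute_cost = job_hours * nodes * node_rate_per_hour
    return storage_cost, compute_cost


# Hypothetical figures: 100 TB stored at 20 per TB, 10 nodes at 2 per node-hour
busy_month = monthly_cost(100, 20, job_hours=40, nodes=10, node_rate_per_hour=2)
idle_month = monthly_cost(100, 20, job_hours=0, nodes=10, node_rate_per_hour=2)

print("busy month (storage, compute):", busy_month)
print("idle month (storage, compute):", idle_month)
```

In the idle month the compute term is zero while the stored data, and any insights written back to it, remain available for the next cluster that is spun up.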
Store data for Watson machine learning and deep learning workflows
Watson Studio is a hybrid cloud platform, built on the best of open source and IBM tools, to analyze data and use it to build and deploy AI models. IBM Cloud Object Storage is integrated with Watson Studio on IBM Cloud. When a machine learning project is created in Watson Studio, an instance of IBM Cloud Object Storage is created automatically to accelerate the handling of the data required to train and deploy machine and deep learning models.
Perform intelligent data discovery and governance
Once your data is in IBM Cloud Object Storage, it can be governed with the Watson Knowledge Catalog, using data profilers that segment and protect data, allowing for better governance of data such as personally identifiable information or other private data. By implementing a metadata catalog, Watson Knowledge Catalog understands what the data is and which data policies may apply to it, and then enforces those policies. Watson Knowledge Catalog includes intelligent data discovery and is integrated with Watson Studio to allow for a seamless transition from ‘finding’ to ‘using’ the information across your business.
Easily build and analyze IoT data pipelines
Object storage is well suited to storing massive amounts of IoT data at low cost while allowing analytics frameworks to access the data directly. IBM Cloud provides services based on Apache Kafka and Apache Spark: IBM Event Streams and Spark as a service, respectively. Data pipelines from IBM Event Streams to object storage can be easily set up and managed to generate analytics-ready data, which can be analyzed directly by Watson using Spark as a service. Moreover, the Watson IoT Platform can be used to capture IoT device data and send it to IBM Event Streams.
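One way to picture "analytics-ready" data is a pipeline stage that groups incoming device events into time-partitioned objects. The sketch below is a plain-Python illustration under stated assumptions: the event fields (`device`, `ts`, `temp`), the `dt=.../hour=...` key layout, and the helper names are all hypothetical, and this is not how IBM Event Streams itself writes to object storage.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def object_key(event, prefix="iot"):
    """Derive an hour-partitioned object key from the event's epoch timestamp."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"{prefix}/dt={ts:%Y-%m-%d}/hour={ts:%H}/events.jsonl"


def batch_events(events, prefix="iot"):
    """Group events by key and serialize each group as newline-delimited JSON."""
    batches = defaultdict(list)
    for event in events:
        batches[object_key(event, prefix)].append(json.dumps(event, sort_keys=True))
    return {key: "\n".join(lines) + "\n" for key, lines in batches.items()}


# Made-up sample events (ts is seconds since the Unix epoch, UTC)
events = [
    {"device": "sensor-1", "ts": 0, "temp": 21.5},
    {"device": "sensor-2", "ts": 60, "temp": 22.0},
    {"device": "sensor-1", "ts": 3600, "temp": 21.7},
]
objects = batch_events(events)
for key in sorted(objects):
    print(key)
```

Partitioning by date and hour lets a query engine read only the objects that cover the time range of interest instead of scanning the whole bucket.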