Object storage, often referred to as object-based storage, is a data storage architecture ideal for storing, archiving, backing up and managing high volumes of static unstructured data—reliably, efficiently and affordably.
Today’s internet communications data is largely unstructured, meaning that it does not conform to, or cannot be organized easily into, a traditional relational database with rows and columns. This includes email, videos, photos, web pages, audio files, sensor data and other types of media and web content (textual or non-textual). This content streams continuously from social media, search engines, mobile and “smart” devices.
The International Data Corporation (IDC) estimates that unstructured data is likely to represent as much as 80% of all data worldwide by the year 2025.
Enterprises find it challenging to store and manage this unprecedented volume of data efficiently and affordably. Object-based storage has emerged as a preferred method for data archiving and backup because it offers a level of scalability that traditional file- or block-based storage cannot match. With object-based storage, you can store and manage data volumes on the order of terabytes (TB), petabytes (PB) and beyond.
Objects are discrete units of data stored in a structurally flat data environment. There are no folders, directories or complex hierarchies as in a file-based system. Each object is a simple, self-contained repository that includes the data, metadata (descriptive information associated with the object) and a unique identifier (instead of a file name and file path).
This information enables an application to locate and access the object. You can aggregate object storage devices into larger storage pools and distribute these storage pools across locations. This allows for unlimited scale, as well as improved data resiliency and disaster recovery.
Object storage removes the complexity and scalability challenges of a hierarchical file system with folders and directories. Objects can be stored locally, but most often reside on cloud servers, with accessibility from anywhere in the world.
Objects (data) in an object storage system are accessed through application programming interfaces (APIs). The native API for object storage is an HTTP-based RESTful API (also known as a RESTful web service). These APIs query an object's metadata to locate the desired object (data) over the internet, from anywhere and on any device.
RESTful APIs use HTTP commands like “PUT” or “POST” to upload an object, “GET” to retrieve an object and “DELETE” to remove it. (HTTP stands for Hypertext Transfer Protocol and is the set of rules for transferring text, graphic images, sound, video and other multimedia files on the internet).
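To make the verb mapping concrete, here is a minimal Python sketch that only builds (and does not send) these requests with the standard library's `urllib.request.Request`. The endpoint `https://storage.example.com`, the bucket name `library` and the path-style URL layout (endpoint/bucket/key, as used by S3-compatible APIs) are assumptions for illustration. Real services also require authentication, such as signed request headers, which is omitted here.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint and bucket, for illustration only.
ENDPOINT = "https://storage.example.com"
BUCKET = "library"


def object_url(key: str) -> str:
    """Path-style object URL: <endpoint>/<bucket>/<key>."""
    return f"{ENDPOINT}/{BUCKET}/{quote(key)}"


def upload_request(key: str, body: bytes, content_type: str) -> Request:
    # PUT stores the request body as the object named in the URL.
    return Request(object_url(key), data=body, method="PUT",
                   headers={"Content-Type": content_type})


def download_request(key: str) -> Request:
    # GET returns the object's data in the response body.
    return Request(object_url(key), method="GET")


def delete_request(key: str) -> Request:
    # DELETE removes the object.
    return Request(object_url(key), method="DELETE")


if __name__ == "__main__":
    for req in (upload_request("moby-dick.txt", b"Call me Ishmael.", "text/plain"),
                download_request("moby-dick.txt"),
                delete_request("moby-dick.txt")):
        print(req.get_method(), req.full_url)
```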
You can store any number of static files in an object storage instance and retrieve them through the API. Newer RESTful API standards go beyond creating, retrieving, updating and deleting objects: they allow applications to manage the object storage itself, including its containers, accounts, multi-tenancy, security, billing and more.
For example, suppose you want to store all the books in a large library system on a single platform. You need to store the contents of the books (data), but also associated information such as the author, publication date, publisher, subject, copyrights and other details (metadata). Traditionally, you might store this data and metadata in a relational database or in a file system organized in folders under a hierarchy of directories and subdirectories.
But with millions of books, the search and retrieval process becomes cumbersome and time-consuming. Object storage works well here because the data is static: the contents of a book will not change.
The objects (data, metadata and ID) are stored as “packages” in a flat structure and easily located and retrieved with a single API call. Further, as the number of books continues to grow, you can aggregate storage devices into larger storage pools and distribute these storage pools for unlimited scale.
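The library example can be sketched as a toy in-memory store in Python. Each book becomes one package holding its data, its metadata and a generated ID (a UUID here, which is an assumption; real systems use their own identifier schemes), and all packages live in one flat namespace. The class and method names are hypothetical, not any product's API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One self-contained 'package': the data, its metadata and a unique ID."""
    data: bytes
    metadata: dict
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FlatObjectStore:
    """Toy in-memory store with a single flat namespace (no folders)."""

    def __init__(self):
        self._objects = {}  # object_id -> StoredObject

    def put(self, data: bytes, metadata: dict) -> str:
        obj = StoredObject(data=data, metadata=dict(metadata))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        # A single lookup by ID; there is no directory tree to walk.
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return the IDs of objects whose metadata matches every criterion."""
        return [oid for oid, obj in self._objects.items()
                if all(obj.metadata.get(k) == v for k, v in criteria.items())]


store = FlatObjectStore()
book_id = store.put(
    b"It was the best of times, it was the worst of times...",
    {"title": "A Tale of Two Cities", "author": "Charles Dickens", "published": 1859},
)
print(store.get(book_id).metadata["title"])
```

Note that `find()` scans every object; in a large system, a metadata index or object directory does that job instead of a linear scan.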
There are many reasons to consider an object-storage-based solution to store your data, particularly in this era of the internet and digital communications that is producing large volumes of web-based, multimedia data at an increasing rate.
Object storage is seeing wide adoption in cloud computing and in the management of unstructured data, which analysts estimate will soon make up most of the world's data.
The volume of web-generated content, including emails, videos, social media posts, documents and sensor data produced by Internet of Things (IoT) devices, is massive and growing. Unstructured data is typically static (unchanging) but may be needed at any time, from anywhere; examples include image and video files and archived data backups.
Cloud-based object storage is ideal for long-term data retention. Use object storage to replace traditional archives, such as Network Attached Storage (NAS), reducing your IT infrastructure. Easily archive and store mandated, regulatory data that must be retained for extended periods of time. Cost-effectively preserve large amounts of rich media content (images, videos and more) that is not frequently accessed.
Unlimited scale is perhaps the most significant advantage of object-based data storage. Objects, or discrete units of data (in any quantity), are stored in a structurally flat data environment, within a storage device such as a server. You can simply add more devices or servers in parallel to an object storage cluster for extra processing and to support the higher throughputs required by large files such as videos or images.
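The article does not say how a cluster decides which server holds which object once you add servers. One common technique is consistent hashing, sketched below in Python: object keys and servers are hashed onto the same ring, and each object belongs to the next server point clockwise. This is an illustrative choice, not a description of any specific product (Ceph, for instance, uses its own CRUSH algorithm). Its useful property for scaling out is that adding a server only moves the objects the new server takes over.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Stable 64-bit ring position (SHA-256 used only for spreading, not security)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring that maps object keys to storage nodes."""

    def __init__(self, nodes=(), points_per_node=100):
        self.points_per_node = points_per_node  # several points per node even out load
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        # First node point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"object-{n}" for n in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")  # scale out by one server
    moved = sum(before[k] != ring.node_for(k) for k in keys)
    print(f"{moved / len(keys):.0%} of objects moved")  # only keys node-d took over
```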
Object storage removes the complexity that comes with a hierarchical file system of folders and directories. Because there are no folders, directories or complex hierarchies to navigate, retrieving data involves fewer lookups and less delay. This improves performance, particularly when managing large quantities of data.
You can configure object storage systems to replicate content. If a disk within a cluster fails, a replica of its data on another disk remains available, so the system can keep running without interruption. Data can be replicated within nodes and clusters and among distributed data centers for extra off-site backup, even across geographical regions.
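A minimal Python sketch of replication, under simplifying assumptions: every object is written to three of four hypothetical nodes, and reads fall back to the next healthy replica when a node fails. The placement rule (derived from the key's bytes) is deliberately simple, and the sketch ignores failures during writes and does not re-create lost replicas, both of which real systems handle.

```python
class ReplicatedStore:
    """Toy store that keeps several copies (replicas) of every object."""

    def __init__(self, node_names, replicas=3):
        if replicas > len(node_names):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = {name: {} for name in node_names}  # node -> {key: data}
        self.names = sorted(self.nodes)
        self.replicas = replicas
        self.failed = set()

    def targets(self, key: str) -> list:
        # Simple deterministic placement (illustrative only): start at a node
        # chosen from the key's bytes, then use the next nodes in order.
        start = sum(key.encode()) % len(self.names)
        return [self.names[(start + i) % len(self.names)] for i in range(self.replicas)]

    def put(self, key: str, data: bytes) -> None:
        # Write every replica; this sketch does not handle write-time failures.
        for name in self.targets(key):
            self.nodes[name][key] = data

    def fail(self, name: str) -> None:
        self.failed.add(name)  # simulate a failed disk or node

    def get(self, key: str) -> bytes:
        # Read from the first healthy replica.
        for name in self.targets(key):
            if name not in self.failed and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(f"no healthy replica of {key!r}")


store = ReplicatedStore(["disk-1", "disk-2", "disk-3", "disk-4"], replicas=3)
store.put("backup.tar", b"payload")
store.fail(store.targets("backup.tar")[0])  # lose the first replica's disk
print(store.get("backup.tar"))
```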
Object storage is a more efficient alternative to tape backup, which requires tapes to be physically loaded into and removed from tape drives and moved off-site for geographic redundancy. You can use object storage to automatically back up on-premises databases to the cloud and to cost-effectively replicate data among distributed data centers, adding off-site and cross-region copies that support disaster recovery.
For a deeper dive on disaster recovery, check out "Backup and Disaster Recovery: A Complete Guide."
Each object is a self-contained repository that includes metadata, or descriptive information associated with it. Object storage systems use this metadata for important functions such as retention, deletion and routing policies, disaster recovery strategies (data protection) and validating content authenticity. You can also customize the metadata with extra context that can later be extracted and used for business insights and analytics, for example around customer service or market trends.
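Amazon S3 and S3-compatible APIs carry user-defined metadata as HTTP headers prefixed with `x-amz-meta-`. The Python sketch below encodes and decodes such headers and applies a retention check. The `retain-until` key and the rule built on it are hypothetical; real services provide their own retention and lifecycle features rather than relying on client-side checks like this one.

```python
from datetime import date

PREFIX = "x-amz-meta-"  # S3 convention for user-defined metadata headers


def to_metadata_headers(custom: dict) -> dict:
    """Encode custom metadata as x-amz-meta-* request headers."""
    return {PREFIX + name.lower(): str(value) for name, value in custom.items()}


def from_metadata_headers(headers: dict) -> dict:
    """Recover custom metadata from response headers (header names are case-insensitive)."""
    return {name[len(PREFIX):].lower(): value
            for name, value in headers.items()
            if name.lower().startswith(PREFIX)}


def past_retention(metadata: dict, today: date) -> bool:
    # Hypothetical policy: an object may be deleted once its retain-until date has passed.
    return date.fromisoformat(metadata["retain-until"]) < today


headers = to_metadata_headers({"Customer-Region": "EMEA", "Retain-Until": "2030-01-01"})
print(headers)
print(past_retention(from_metadata_headers(headers), date(2025, 6, 1)))  # False
```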
Object storage services use pay-as-you-go pricing with no upfront costs or capital investment. You typically pay monthly for the storage capacity you use, plus data retrieval, bandwidth usage and API transactions. Pricing is usually tiered or volume-based, which means that the price per unit of storage drops as your data volume grows.
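Graduated, volume-based pricing is easiest to see with numbers. The Python sketch below charges each gigabyte at the rate of the tier it falls into; the tier boundaries and prices are invented for illustration and do not reflect any provider's price list. Retrieval, bandwidth and API request charges are left out.

```python
from decimal import Decimal

# Invented tiers for illustration only: (upper bound in GB, price per GB-month in USD).
# None means "no upper bound".
TIERS = [
    (50_000, Decimal("0.020")),
    (500_000, Decimal("0.015")),
    (None, Decimal("0.010")),
]


def monthly_storage_cost(gb: int, tiers=TIERS) -> Decimal:
    """Graduated pricing: each GB is charged at the rate of the tier it falls into."""
    cost = Decimal(0)
    lower = 0
    for upper, rate in tiers:
        if gb <= lower:
            break  # no data left to bill in this or later tiers
        top = gb if upper is None else min(gb, upper)
        cost += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return cost


print(monthly_storage_cost(100_000))  # 1750.000
print(monthly_storage_cost(600_000))  # 8750.000
```

With these made-up rates, 100,000 GB costs $1,750 per month ($0.0175 per GB on average), while 600,000 GB costs $8,750 (about $0.0146 per GB), so the unit price falls as volume grows.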
Additional savings come from commodity server hardware: object storage solutions impose few hardware constraints and can be deployed on most properly configured commodity servers. This limits the need to purchase new hardware when deploying an object storage platform on-premises, and you can even mix hardware from multiple vendors.
Object storage goes hand in hand with cloud or hosted environments that deliver multi-tenant storage as a service. Many companies, or many departments within a company, can share the same storage repository, with each having access only to its own portion of the storage space. Sharing infrastructure this way spreads costs across tenants and makes it easier to scale.
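Conceptually, multi-tenancy means one shared repository in which every request is scoped to a tenant. The toy Python class below illustrates that idea by keying every object on a (tenant, key) pair, so two departments can both store a `report.pdf` without seeing each other's data, and usage can be totaled per tenant for billing. It is a hypothetical sketch; real services enforce isolation through accounts and access policies rather than a single in-process dictionary.

```python
class MultiTenantStore:
    """One shared repository; each tenant can only reach its own objects."""

    def __init__(self):
        self._shared = {}  # (tenant, key) -> data, for all tenants together

    def put(self, tenant: str, key: str, data: bytes) -> None:
        self._shared[(tenant, key)] = data

    def get(self, tenant: str, key: str) -> bytes:
        # Lookups are always scoped to the caller's tenant.
        return self._shared[(tenant, key)]

    def list_keys(self, tenant: str) -> list:
        return sorted(key for (owner, key) in self._shared if owner == tenant)

    def usage_bytes(self, tenant: str) -> int:
        # Per-tenant consumption, e.g. as input to billing or chargeback.
        return sum(len(data) for (owner, _), data in self._shared.items()
                   if owner == tenant)


store = MultiTenantStore()
store.put("finance", "report.pdf", b"Q1 numbers")
store.put("marketing", "report.pdf", b"Campaign plan")
print(store.list_keys("finance"), store.usage_bytes("marketing"))
```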
You reduce your organization’s onsite IT infrastructure by using low-cost cloud storage while keeping your data accessible when needed. Your enterprise, for example, can use a cloud-based object storage solution to collect and store large amounts of unstructured IoT and mobile data for your smart device applications.
Storage methods have evolved to meet the changing nature of data. Traditionally, much data was transactional, collected in smaller volumes and stored neatly in a database on a server's disk drive. File-based and block-based storage are well suited to this type of structured data and continue to work well in certain scenarios. But the internet has changed everything: organizations now struggle to manage mounting volumes of web-based digital content (unstructured data), a challenge that object-based storage can meet.
Your company likely has differing storage needs, depending on the speed and performance requirements of your IT operations. Look carefully at file-, block- and object-based storage methods, as each has its own advantages and disadvantages. You might find that a combination of these architectures will best fulfill your data storage needs.
File storage organizes and stores data as files inside folders. Files are named, tagged with metadata (typically the file name, file type, and creation and last-modified times) and organized in folders under a hierarchy of directories and subdirectories. You can think of file storage in the same way you store paper files in a filing cabinet: there are multiple drawers (directories), and inside each drawer are labeled file folders (subdirectories).
To locate a particular file folder in your file cabinet, you pull out the proper drawer and view the folder labels. In the same way, to access the data in a file storage system, your computer system only requires the path (directories and subdirectories) in which to find it. A hierarchical storage system like this works well with relatively small, easily organized amounts of data. However, as the number of files grows, the search and retrieval process can become cumbersome and time-consuming.
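The contrast between path-based and flat lookup can be sketched in a few lines of Python. A file system resolves a path one directory at a time, while an object store treats the full name as a single key; in Amazon S3, for example, slashes in a key are ordinary characters, and folder-like views are derived from key prefixes. The data and names below are hypothetical.

```python
def find_in_tree(tree: dict, path: str):
    """Resolve a path one directory level at a time, as a file system does.

    Returns the file contents and how many levels were traversed.
    """
    node, levels = tree, 0
    for part in path.strip("/").split("/"):
        node = node[part]  # descend one level; KeyError if a component is missing
        levels += 1
    return node, levels


# Hierarchical layout: drawers (directories) containing folders (subdirectories).
file_tree = {"finance": {"2024": {"q1-report.pdf": b"%PDF..."}}}

# Flat layout: the whole name is just a key; the slashes carry no structure.
flat_store = {"finance/2024/q1-report.pdf": b"%PDF..."}

data, levels = find_in_tree(file_tree, "/finance/2024/q1-report.pdf")
print(levels)  # 3
print(flat_store["finance/2024/q1-report.pdf"] == data)  # True
```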
"File Storage: A Complete Guide" provides a full overview of file storage.
Object-based storage has emerged as a preferred method for archiving and backing up today's digital communications: unstructured media, web content (email, videos, image files and web pages) and sensor data produced by IoT devices. Instead of breaking files into blocks stored on disks or organizing them in a file hierarchy, this storage system treats objects as discrete units of data stored in a structurally flat data environment.
Object storage does not use folders, directories or complex hierarchies. Instead, each object is a simple, self-contained repository that includes the data, metadata and a unique identifier that an application uses to locate and access it. The metadata is also more descriptive than in a file-based approach, and you can customize it with additional context that you can later extract and use for other purposes, such as data analytics.
Use object storage if you need cost-effective capacity for unstructured data that must scale far beyond the effective limits of block and file storage. Object storage is also ideal for archiving data that changes rarely or never (static files), such as transaction records or music, image and video files.
Block storage offers an alternative to file-based storage—one with improved efficiency and performance. Block storage breaks a file into equally sized chunks of data and stores these data blocks separately under a unique address. You don't need a file-folder structure. Instead, you can store the collection of blocks anywhere in the system for maximum efficiency.
To access a file, the server's operating system uses the unique addresses to pull the blocks back together and reassemble the file. You gain efficiency because the system does not need to navigate directories and file hierarchies to access the data blocks. Block storage works well for critical business applications, transactional databases and virtual machines that require low-latency (minimal delay), granular access to data and consistent performance.
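The split-and-reassemble cycle can be sketched in Python. A dictionary stands in for the storage device (address to block), and blocks are written to whatever free addresses are available, so a file's blocks need not be contiguous. The 4-byte block size is chosen only to keep the example readable; real block sizes are far larger, and a real device would pad the final, shorter block to full size, which this sketch does not do.

```python
BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example stays readable


def write_blocks(data: bytes, device: dict, free_addresses: list) -> list:
    """Split data into fixed-size blocks, store each at a free address,
    and return the list of addresses in file order."""
    addresses = []
    for offset in range(0, len(data), BLOCK_SIZE):
        address = free_addresses.pop()  # any free address will do (IndexError if none left)
        device[address] = data[offset:offset + BLOCK_SIZE]
        addresses.append(address)
    return addresses


def read_blocks(addresses: list, device: dict) -> bytes:
    """Reassemble the file by fetching its blocks in address-list order."""
    return b"".join(device[address] for address in addresses)


device = {}              # address -> block contents
free = [7, 2, 9, 4, 1]   # free addresses, deliberately out of order
addresses = write_blocks(b"hello, block!", device, free)
print(addresses)         # [1, 4, 9, 2]
print(read_blocks(addresses, device))
```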
"Block Storage: A Complete Guide" provides a full overview of block storage.
As mentioned, object-based storage is an ideal solution for storing, archiving, backing up and managing high volumes of static or unstructured data.
You can use simple API calls to upload and retrieve files in an object storage system, but an application also needs the object’s metadata to locate the proper object in storage. This is where an object storage database comes into play. This database provides a directory of sorts that uses the object’s metadata to locate the appropriate data files in a distributed storage system.
Each object storage group has an object storage database that contains two tables. One table is an object directory and the other table is for the object storage.
The object directory table contains descriptive information about each object (the metadata). This directory tracks all objects in the storage hierarchy by recording the collection name identifier, the object name and other pertinent information. For example, in IBM's object storage methodology, the object directory table includes three indexes.
The second table in the object storage database is the object storage table, which contains the data content or the file itself (the objects). The data (fixed digital content such as video and image files or large libraries of documents) sits in the object store, while the metadata (contextual information about the data, including the object's name and identifier) resides in a database or object directory table.
When an application “posts” a file, it creates the metadata and stores it in the object directory table within the object storage database, along with “putting” the file to the object storage table. To retrieve the file later, the application queries the object directory or database for the metadata and uses that descriptive, identifying information to locate or “get” the data.
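The post/get flow above can be approximated with Python's built-in `sqlite3` module: a directory table holds each object's metadata, a storage table holds its content, and a retrieval first looks up the ID in the directory and then fetches the data by that ID. The table and column names are hypothetical; the article does not list the actual schema or indexes of IBM's implementation, and production object stores usually keep content on storage devices rather than in a SQL BLOB column.

```python
import sqlite3
import uuid


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE object_directory (   -- metadata: one row per object
            object_id  TEXT PRIMARY KEY,
            collection TEXT NOT NULL,
            name       TEXT NOT NULL,
            UNIQUE (collection, name)
        );
        CREATE TABLE object_storage (     -- content: the object's bytes
            object_id TEXT PRIMARY KEY REFERENCES object_directory (object_id),
            data      BLOB NOT NULL
        );
    """)
    return conn


def post_object(conn, collection: str, name: str, data: bytes) -> str:
    """Record the metadata in the directory table and put the content in the storage table."""
    object_id = str(uuid.uuid4())
    with conn:  # one transaction: both rows are committed, or neither is
        conn.execute("INSERT INTO object_directory VALUES (?, ?, ?)",
                     (object_id, collection, name))
        conn.execute("INSERT INTO object_storage VALUES (?, ?)", (object_id, data))
    return object_id


def get_object(conn, collection: str, name: str) -> bytes:
    """Find the object's ID in the directory, then fetch its content by that ID."""
    row = conn.execute(
        "SELECT object_id FROM object_directory WHERE collection = ? AND name = ?",
        (collection, name)).fetchone()
    if row is None:
        raise KeyError(f"{collection}/{name}")
    (data,) = conn.execute(
        "SELECT data FROM object_storage WHERE object_id = ?", row).fetchone()
    return data


conn = open_store()
post_object(conn, "videos", "intro.mp4", b"\x00\x00\x00\x18ftyp")
print(len(get_object(conn, "videos", "intro.mp4")))  # 8
```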
Open source generally refers to a non-proprietary software development model. An open source developer environment encourages collaboration: the public has free access to all source code, documentation, software development kits (SDKs) and application programming interfaces (APIs) within the environment.
Developers and programmers can modify and improve upon source code, then share, distribute or publish these efforts within the developer community. Other developers can then download this code or further modify it.
Open source technologies give you maximum flexibility and control over your data management and storage options. With open source tools and open APIs, you can customize the code to suit your organization's specific requirements. You are not locked into proprietary technologies as you develop, and you are free to use existing hardware you might own (or a mix of vendors' hardware). You also benefit from other developers' efforts across the broader community.
Several open source object-based storage solutions are available, such as Ceph, MinIO, OpenIO and SwiftStack/OpenStack Swift. While these tout differing features, policy options and methodologies, each has the same goal: to enable large-scale storage of unstructured digital data.
The major open source solutions support the Amazon Simple Storage Service (Amazon S3) object storage protocol. First introduced in 2006, S3 has since become the de facto standard for cloud object storage, and each of these projects offers an open source object storage server compatible with the Amazon S3 RESTful API.
Many also offer their own open API as an alternative. OpenStack Swift, for example, supports the Amazon S3 API but also offers its own Swift API with some unique capabilities. Ceph Object Storage and OpenIO are S3-compatible but also support a large subset of the OpenStack Swift API.
As developers compete to deploy and scale applications faster, containerization has become an increasingly popular solution.
Containerization is an application packaging approach that is quickly maturing and delivering significant benefits to developers and to infrastructure and operations teams. "Containerization: A Complete Guide" gives a full overview of all things containerization.
Kubernetes, in turn, has become a leading container management solution. Kubernetes eases management tasks such as scaling containerized applications. It also helps you roll out new versions of applications, and provides monitoring, logging and debugging services, among other functions. Kubernetes is an open source platform and conforms to the Open Container Initiative (OCI) standards for container image formats and runtimes.
What does Kubernetes have to do with object storage? The key term here is scale.
Kubernetes enables the management of containers at scale. It is capable of orchestrating containers across multiple hosts and scaling containerized applications and their resources dynamically (auto-scaling is one of the key features of Kubernetes).
Object storage systems handle storage at scale. These systems can store massive volumes of unstructured data at petabyte-scale and even greater. These two scale-out approaches, used together, create an ideal environment for current and future growing data workloads.
Running an object storage system on top of Kubernetes is a natural fit. Use Kubernetes for provisioning and managing distributed containerized applications. Likewise, Kubernetes can be the unified management interface to handle the orchestration of distributed object storage pools, whether these are local or distributed across data centers or even across geographical regions.
To learn all about Kubernetes, see "Kubernetes: A Complete Guide."
To back up a bit and start from the core concepts, see our video "Container Orchestration Explained."