Cloud-based data storage for business data — particularly big data — is top of mind today, whether you are relying on it to conduct day-to-day business or to accomplish specific tasks.
Data drives many business functions, from creating targeted programs for customers and prospects, to optimizing manufacturing and operations processes, to developing, distributing and tracking virus tests and vaccines. Modern businesses rely on having the data they need available when they need it. However, finding the best option for your needs is not easy, and it may involve several different types of repositories for different categories of data.
Let’s start with the basics and delve into some examples of how one data repository or many types of data repositories may be necessary to serve the needs of your business.
Three distinct types of cloud storage repositories exist today, each serving a different purpose to address a specific need:
A data lake is a large repository of raw data, either unstructured or semi-structured. This data is aggregated from various sources and is simply stored. It is not altered to suit a specific purpose or fit into a particular format. Preparing this data for analysis involves time-consuming cleansing and reformatting for uniformity. Data lakes are great resources for municipalities or other organizations that store information related to outages, traffic, crime or demographics. The data could be used at a later date to update department of public works (DPW) or emergency services budgets and resources.
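The preparation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`type`, `reported`, `district`) and the two date formats are hypothetical and do not come from any particular data lake product.

```python
import json
from datetime import datetime

# Hypothetical raw records as they might sit in a data lake: inconsistent
# key casing, mixed date formats, stray whitespace and missing values.
RAW_RECORDS = [
    '{"type": "outage", "reported": "2021-03-04", "district": "North "}',
    '{"Type": "TRAFFIC", "reported": "03/05/2021", "district": "south"}',
    '{"type": "crime", "reported": null, "district": "East"}',
]

# Assumed date formats; a real lake would likely contain more variants.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Return an ISO date string, or None if the value cannot be parsed."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(raw_line):
    """Normalize one raw JSON record into a uniform dictionary."""
    record = {key.lower(): value for key, value in json.loads(raw_line).items()}
    return {
        "type": str(record.get("type", "")).strip().lower(),
        "reported": parse_date(record.get("reported")),
        "district": str(record.get("district", "")).strip().title(),
    }


def prepare(raw_lines):
    """Clean every record and drop those without a usable date."""
    cleaned = [clean(line) for line in raw_lines]
    return [row for row in cleaned if row["reported"] is not None]


PREPARED = prepare(RAW_RECORDS)
```

Even in this toy case, most of the code deals with inconsistencies rather than analysis, which is why preparation tends to dominate the effort of working with raw lake data.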
A data warehouse aggregates data from many sources into a single, centralized repository that standardizes data quality and format, so data scientists can use it for data mining, artificial intelligence (AI), machine learning and, ultimately, business analytics and business intelligence. A large city could use a data warehouse to aggregate electronic transactions from various departments, including speeding tickets, dog licenses, excise tax payments and other transactions. The city would analyze this structured data to issue follow-up invoices and to update census data and police logs. A developer could also use a data warehouse to aggregate terabytes of data generated by sensors on automobiles to aid decision-making for an autonomous driving solution.
A data mart is a subset of a data warehouse that benefits a specific set of users within the business or business unit. A data mart could be used by the marketing department of a manufacturing company to determine the ideal target demographic or persona to aid in the development of marketing plans. It could also be used by a manufacturing department to analyze performance and error rates to enable continuous improvement. Data sets within a data mart are often utilized in real time, for current analysis and actionable results.
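One way to picture a data mart as a subset of a warehouse is a filtered view over a warehouse table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `warehouse_transactions`, the view `marketing_mart` and the sample rows are hypothetical, and a production mart would usually be built with dedicated warehouse tooling rather than SQLite.

```python
import sqlite3

# In-memory database standing in for a (hypothetical) data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_transactions (
    id INTEGER PRIMARY KEY,
    department TEXT NOT NULL,
    category TEXT NOT NULL,
    amount REAL NOT NULL
);
INSERT INTO warehouse_transactions (department, category, amount) VALUES
    ('marketing', 'campaign', 1200.0),
    ('marketing', 'campaign', 800.0),
    ('manufacturing', 'defect_rework', 450.0),
    ('finance', 'invoice', 3000.0);

-- The "mart": only the marketing rows, exposed under their own name.
CREATE VIEW marketing_mart AS
    SELECT id, category, amount
    FROM warehouse_transactions
    WHERE department = 'marketing';
""")


def mart_summary(connection):
    """Return (row count, total amount) for the marketing mart."""
    return connection.execute(
        "SELECT COUNT(*), SUM(amount) FROM marketing_mart"
    ).fetchone()


MART_ROWS, MART_TOTAL = mart_summary(conn)
```

Marketing users query `marketing_mart` without seeing finance or manufacturing rows, which mirrors the idea of a mart serving one specific set of users.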
While all three types of cloud data repositories hold data, they differ in important ways. For instance, a data warehouse and a data lake are both large aggregations of data, but a data lake is typically more cost-effective to implement and maintain because it is largely unstructured.
Data lake architecture has evolved over the past few years to support larger volumes of data and cloud-based computing. Large amounts of data flow from a number of data sources into a central location.
A data warehouse could be structured in one of three ways:
Data within a data warehouse can be used for various purposes more easily than data within a data lake, because a data warehouse is structured and can be more readily mined or analyzed.
A data mart, on the other hand, contains less data than either a data lake or a data warehouse, and the data is categorized for a specific use or by a specific demographic or business unit. A data mart can exist in many different formats (star, snowflake or vault) defined by the logical structure of the data, with a vault structure being more agile, flexible and scalable than the other formats.
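To make the star format mentioned above more concrete: a star schema places one central fact table in the middle, with foreign keys pointing to surrounding dimension tables. The following is a minimal sketch using Python's built-in `sqlite3` module; the tables `fact_sales`, `dim_product` and `dim_region` and their contents are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per entity.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per sale, referencing each dimension by key.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id INTEGER REFERENCES dim_region(region_id),
    units INTEGER
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_region VALUES (1, 'EMEA'), (2, 'APAC');
INSERT INTO fact_sales (product_id, region_id, units) VALUES
    (1, 1, 10), (1, 2, 5), (2, 1, 7), (1, 1, 3);
""")

# A typical star-schema query: join the fact table to one dimension
# and aggregate a measure by a dimension attribute.
UNITS_BY_REGION = conn.execute("""
    SELECT r.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
```

A snowflake schema would further split the dimension tables into sub-dimensions; that variation is not shown here.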
There are three types of data marts:
The type of data repository you choose, and its structure, depend heavily on the needs and demands of your business. If it makes sense for your business, take advantage of hybrid cloud-based storage for flexibility, scalability and a broader, informed approach to problem-solving and decision-making.
A large multinational manufacturing company generates large volumes of data for various uses. Some of the data is important, while other data may or may not have a purpose in the future. The company uses a cloud-based data warehouse for storage of bulk data, which is less expensive than other data storage options. However, the company also has dependent data marts in place for specific areas of the business, providing value to business users in departments like finance, manufacturing and marketing. Each of these marts contains data earmarked for a specific use, formatted to make it easy to analyze. For example:
A large municipality needs an affordable solution that keeps its data in a reasonably usable form. The municipality uses a data lake in the cloud to maintain traffic data. It can’t afford to analyze and take action on that data at the moment but will be ready to when funding comes through. It also uses an on-premises software data warehouse to track tax bill status. In addition, the municipality uses a hybrid data mart to track the spread of a virus among residents, aggregating data from various hospitals and municipal health services into a single repository to be analyzed and used by the department of health.
There are many misconceptions regarding cloud-based data repositories. Some of the most common misconceptions include the following:
Your business is unique, with specific resources, goals, and challenges. Evaluate your options carefully to determine what solution will best serve your needs. Consider the following:
These considerations will help you determine what solution, or combination of solutions, will help you reach your goals.
IBM offers several solutions to assist with your cloud storage and data science needs.