Published: 30 August 2024
Contributors: Tim Mucci
Data sharing is the process of making an organization’s data resources available to multiple applications, users and other organizations. Effective data sharing involves a combination of technologies, practices, legal frameworks and organizational efforts to facilitate secure access for multiple entities without compromising data integrity.
Organizations that embrace big data analytics recognize data as a valuable strategic asset in their portfolio. This data comes from various sources, such as metrics derived from software applications, customer behavior data and Internet of Things (IoT) signals from appliances and sensors.
Think of data as books in a library. Data sharing is akin to having a library card that allows everyone in the organization to access and borrow these books when they need them. Without data sharing, each department would need to create and maintain its own library, leading to duplication, outdated information and limited resources.
Organizations that share data can collaborate more effectively with partners, establish new business opportunities, form new partnerships and generate revenue streams through data products and other forms of monetization. However, data sharing requires a commitment to maintaining the integrity and reliability of shared data throughout its lifecycle, ensuring that it remains trustworthy, coherent and useful for accurate analysis. Successful data sharing allows stakeholders to gain valuable perspectives, develop new services and technologies and prepare for upcoming trends by analyzing vast amounts of data from both inside and outside the organization.
Organizations were sharing data long before the invention of the internet, but advances in digital literacy, technology and cloud adoption have made real-time data sharing possible on a global scale. Data storage and transfer technologies are more available and affordable than ever. As a result, policies and regulations have evolved to reduce the risks associated with data sharing. Data sharing is more than just granting access for analysis and monetization; it also breaks down barriers between business units and external partners. Different teams can work independently or together, each drawing from the same up-to-date data source. The increased quantity and variety of available data allows diverse teams across the organization to contribute to broader organizational goals.
Combining information from various sources, such as research data, operational data or customer feedback, improves service performance and increases the value of those services. For example, business units with access to shared data can use data analysis to make decisions based on market trends and customer preferences and to develop successful marketing strategies.
Moreover, well-governed data sharing allows public authorities and organizations to exchange data securely and lawfully. An essential part of data-sharing hygiene is for data producers to carefully document and label datasets with accurate metadata to support reproducibility. Detailed descriptions with clear definitions help others find and understand the shared data.
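To make the metadata point concrete, the following minimal Python sketch shows one way a data producer might record dataset-level and column-level definitions and check for gaps before publishing. The `DatasetMetadata` and `ColumnDoc` classes and every field name are hypothetical illustrations, not a standard metadata schema or an IBM product API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnDoc:
    """Definition of one column in a shared dataset."""
    name: str
    dtype: str
    description: str
    unit: Optional[str] = None  # e.g. "USD"; None when the column has no unit


@dataclass
class DatasetMetadata:
    """A minimal, hypothetical metadata record published alongside a dataset."""
    name: str
    owner: str
    description: str
    version: str
    columns: List[ColumnDoc] = field(default_factory=list)

    def undocumented_columns(self) -> List[str]:
        """Return the names of columns whose description is empty or blank."""
        return [c.name for c in self.columns if not c.description.strip()]


orders = DatasetMetadata(
    name="orders_daily",
    owner="sales-analytics@example.com",
    description="One row per customer order, refreshed daily.",
    version="1.2.0",
    columns=[
        ColumnDoc("order_id", "string", "Unique order identifier."),
        ColumnDoc("order_total", "decimal", "Order value after discounts.", unit="USD"),
        ColumnDoc("region", "string", ""),  # missing definition
    ],
)

print(orders.undocumented_columns())  # ['region']
```

A check like `undocumented_columns()` could run before a dataset is published, so consumers never receive columns without definitions.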
The Future of Privacy Forum1 (FPF) analyzed data-sharing partnerships between companies and academic researchers and found that these partnerships can accelerate socially beneficial research, broaden access to valuable datasets and improve the reproducibility of research findings. As data sharing becomes more widespread, stakeholders are taking proactive steps to address risks such as data breaches by using data-sharing agreements (DSAs) and privacy-enhancing technologies (PETs).
IBM® provides a good example of employing rigorous privacy and security protocols in its data-sharing practices, including the use of PETs to anonymize data before sharing it with universities, nonprofits and research labs. IBM's approach supports scientific discovery while protecting sensitive data, fostering safer and more effective partnerships. For instance, IBM collaborated with Melbourne Water in Australia to analyze data aimed at reducing energy emissions. During the COVID-19 pandemic, IBM processed SARS-CoV-2 genomic sequences, contributing over 3 million sequences to a research repository.
Another compelling example of the value of data sharing comes from the US nonprofit Benefits Data Trust.2 Benefits Data Trust (BDT) promotes data sharing among states and organizations involved in US healthcare and education. Through data-sharing agreements, BDT boosts enrollment in critical public programs such as the Supplemental Nutrition Assistance Program (SNAP) and Medicaid.
South Carolina's Department of Social Services, working with BDT, compared monthly Medicaid and SNAP lists to identify eligible individuals who were not enrolled in SNAP. This initiative has led to more than 20,000 additional SNAP enrollments since 2015, improving access to nutrition assistance for vulnerable populations. Similar efforts in Pennsylvania have also succeeded, with data sharing helping to enroll approximately 240,000 people in various public assistance programs since 2005.
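The list comparison described above is essentially a set difference. The following Python sketch assumes, hypothetically, that both lists share a single clean person identifier; real programs typically need more careful record matching and eligibility rules, and this is not presented as the method BDT or South Carolina used. The output is a list of candidates for outreach, not a determination of eligibility.

```python
def find_unenrolled(medicaid_ids, snap_ids):
    """Return IDs on the Medicaid list that do not appear on the SNAP list.

    Assumes both lists use the same hypothetical person identifier.
    Order from the Medicaid list is preserved and duplicates are dropped.
    """
    snap = set(snap_ids)  # set lookup keeps the comparison fast for large lists
    seen = set()
    candidates = []
    for person_id in medicaid_ids:
        if person_id not in snap and person_id not in seen:
            seen.add(person_id)
            candidates.append(person_id)
    return candidates


# Hypothetical monthly extracts from two programs.
medicaid = ["P001", "P002", "P003", "P004", "P002"]
snap = ["P002", "P004", "P005"]
print(find_unenrolled(medicaid, snap))  # ['P001', 'P003']
```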
While data sharing offers businesses many benefits, it also introduces risks. When sensitive information is improperly distributed, it can expose an organization to regulatory, competitive, financial and security risks. Data consumers have limited control over the quality and availability of data. Low-quality data might also harbor hidden biases against genders, races, religions or ethnic groups.
Data governance processes establish the policies, standards and best practices to manage data securely, accurately and consistently across the organization. Effective governance limits access so that only authorized users have data use permissions. Governance also protects and classifies data and helps ensure that it is used in compliance with legal and regulatory requirements.
Every organization has legal and ethical obligations to safeguard the privacy of the customer data it manages. Technologies such as encryption and data redaction make it possible to share data safely while protecting privacy. However, poor communication between data producers and consumers can lead to misinterpretations, resulting in incorrect assumptions when developing reports or engaging in data-driven decision-making.
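The following Python sketch illustrates the idea of limiting data use permissions to authorized users with simple role-based access control. The roles, users and dataset names are hypothetical, and a production system would load them from an identity or governance service rather than hard-coding them.

```python
# Hypothetical role-to-permission mapping: each role grants (dataset, action) pairs.
ROLE_PERMISSIONS = {
    "analyst": {("sales_orders", "read")},
    "data_steward": {
        ("sales_orders", "read"),
        ("sales_orders", "write"),
        ("customer_pii", "read"),
    },
}

# Hypothetical assignment of users to roles.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"data_steward"},
}


def is_authorized(user: str, dataset: str, action: str) -> bool:
    """Return True if any of the user's roles grants `action` on `dataset`.

    Unknown users and unknown roles are denied by default.
    """
    for role in USER_ROLES.get(user, set()):
        if (dataset, action) in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


print(is_authorized("alice", "sales_orders", "read"))    # True
print(is_authorized("alice", "customer_pii", "read"))    # False
print(is_authorized("mallory", "sales_orders", "read"))  # False
```

Denying access by default, as `is_authorized` does for unknown users and roles, is a common design choice because a missing entry then fails closed rather than open.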
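As an illustration of redaction, the following Python sketch prepares a record for sharing by dropping some fields, masking others and replacing email addresses in free text. The field names and the drop-and-mask policy are hypothetical, and the simple regular expression will not catch every email format; it is a sketch of the technique, not a complete privacy control.

```python
import re

# Simple pattern for common email addresses; it does not cover every valid format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical sharing policy: fields removed entirely and fields partially masked.
DROP_FIELDS = {"ssn"}
MASK_FIELDS = {"phone"}


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact_record(record: dict) -> dict:
    """Return a copy of `record` that is safer to share.

    Drops fields in DROP_FIELDS, masks fields in MASK_FIELDS and replaces
    email addresses found in any other string value.
    """
    shared = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue
        if key in MASK_FIELDS and isinstance(value, str):
            value = mask_value(value)
        elif isinstance(value, str):
            value = EMAIL_RE.sub("[REDACTED EMAIL]", value)
        shared[key] = value
    return shared


ticket = {
    "ticket_id": 1042,
    "ssn": "123-45-6789",
    "phone": "5551234567",
    "notes": "Customer asked us to reply to jane.doe@example.com by Friday.",
}
print(redact_record(ticket))
```

Running the example drops the `ssn` field, masks the phone number to `******4567` and replaces the email address in `notes` with `[REDACTED EMAIL]`.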
For example, in 2012, Knight Capital Group3 suffered a trading glitch caused by a lack of communication and coordination between teams, which led the firm to lose USD 440 million in just 45 minutes. A software update inadvertently activated a dormant, untested and undocumented piece of embedded code. Because the developers didn’t effectively communicate the potential impact of the changes on traders’ systems, erroneous trades were executed at high speed, resulting in significant financial loss.
The costly movement of data, especially through resource-intensive extract, transform, load (ETL) processes, has traditionally hindered widespread data sharing. Maintaining data quality and following governance best practices can be a challenge, especially with massive volumes of data. Safely sharing large datasets over networks is time-consuming and highly technical, and it requires extensive investment in storage and infrastructure.
Data security requires rigorous protective measures and education to safeguard sensitive data. Information traveling across networks and platforms during data-sharing processes is vulnerable to threats, such as unauthorized access, data breaches and cyberattacks. Furthermore, organizations must navigate complex data privacy laws and regulations when sharing data with external partners, stakeholders or third-party vendors.
Implementing best practices in data sharing helps organizations maximize benefits while minimizing risk.
A data marketplace allows organizations to safely share and monetize their data and data products. There are a few different types of data marketplaces:
Public data marketplaces offer a secure environment for participants to buy and sell data and related services, which in turn helps ensure high quality and consistency from data providers. Companies can use a data marketplace to acquire third-party data to enrich their existing datasets or to offer and monetize new data products and services.
Each type of data-sharing technology fulfills a specific role in facilitating a secure exchange of information.
The most widely used types of data-sharing technology among enterprise organizations are the data warehouse and the data lakehouse. These modern data architectures provide central repositories for collecting, storing and sharing big data from multiple business units. They typically include tiers for front-end clients, analytics engines and database servers.
Application programming interfaces (APIs) allow software components to communicate by using a shared set of definitions and protocols. Data-sharing APIs support fine-grained access controls and permissions that specify what data consumers can and cannot request.
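The following Python sketch shows one way an API handler might enforce field-level permissions, returning only fields that a client's scope allows and rejecting requests for anything else. The client names, scopes and in-memory customer table are hypothetical; a real API would authenticate the client and query a governed data store.

```python
from typing import Dict, List

# Hypothetical scopes: the set of fields each API client is allowed to read.
CLIENT_SCOPES = {
    "marketing-dashboard": {"customer_id", "region", "lifetime_value"},
    "partner-logistics": {"customer_id", "region"},
}

# Hypothetical in-memory table standing in for a governed data store.
CUSTOMERS = [
    {"customer_id": "C1", "region": "EMEA", "lifetime_value": 1200.0, "email": "a@example.com"},
    {"customer_id": "C2", "region": "APAC", "lifetime_value": 860.0, "email": "b@example.com"},
]


class PermissionDenied(Exception):
    """Raised when a client requests fields outside its granted scope."""


def get_customers(client_id: str, fields: List[str]) -> List[Dict]:
    """Return only the requested fields, rejecting any field outside the client's scope."""
    allowed = CLIENT_SCOPES.get(client_id, set())  # unknown clients get no fields
    denied = set(fields) - allowed
    if denied:
        raise PermissionDenied(f"{client_id} may not read: {sorted(denied)}")
    return [{f: row[f] for f in fields} for row in CUSTOMERS]


print(get_customers("partner-logistics", ["customer_id", "region"]))
# [{'customer_id': 'C1', 'region': 'EMEA'}, {'customer_id': 'C2', 'region': 'APAC'}]

try:
    get_customers("partner-logistics", ["email"])
except PermissionDenied as err:
    print(err)  # partner-logistics may not read: ['email']
```

Rejecting the whole request, rather than silently dropping disallowed fields, makes permission problems visible to the data consumer.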
Federated learning, blockchain technology and data exchange platforms are other technologies that support data sharing. Federated learning allows AI systems to train on distributed datasets from diverse sources without having to move the data. Blockchain provides a transparent, immutable ledger for tracking transactions, including those on open data exchanges, adding a layer of integrity and security to data-sharing processes.
Legacy technologies such as SSH File Transfer Protocol (SFTP) and email allow vendor-agnostic, homegrown solutions but are increasingly difficult to secure and govern. They lack advanced security features such as encryption at rest, granular data access controls and automated auditing, which are more common in modern solutions.
Modern data solutions focus on secure data sharing. Cloud data storage offers scalability and reliability but can have limitations in accessibility and security. Vendor-specific data-sharing solutions offer built-in security and scalability, but they often come with vendor lock-in, which limits flexibility and can increase long-term costs.
Privacy-enhancing technologies, data clean rooms and automation are reshaping data operations. These trends highlight a shift toward privacy, decentralization and AI-driven approaches to handling and analyzing data.
Future trends in data sharing emphasize the increasing importance of privacy. Privacy-enhancing technologies such as secure multiparty computation and data masking are becoming crucial for balancing seamless data sharing with strong data protection. Adopting PETs can give companies a competitive edge as these tools become integral to operations.
Data clean rooms are secure, privacy-focused environments where multiple parties can collaborate on data without sharing raw data. They allow companies to perform analytics and gain insights while protecting sensitive data, helping them remain compliant with privacy regulations. Clean rooms help maintain trust between partners by preventing the exposure of personal information and sharing only aggregated, anonymized results.
A data mesh allows an organization to treat data as a product, making it discoverable and usable in a self-service format. This approach allows business units to create and manage their data products independently. It also provides a unified view of data across various platforms and technologies, improving connectivity and insights without the need for separate data platforms.
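To show how a model can learn from distributed data without moving it, the following Python sketch implements a tiny version of federated averaging (FedAvg), one widely used federated learning algorithm. Each hypothetical client fits the one-parameter model y = w * x on its own data, and only the resulting weights are averaged centrally. The data, learning rate and round counts are illustrative assumptions, and the sketch omits the secure communication and privacy protections a real deployment would add.

```python
# Hypothetical clients, each holding its own (x, y) pairs; the data never leaves a client.
CLIENT_DATA = [
    [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)],              # client A
    [(1.5, 3.0), (2.5, 5.1)],                          # client B
    [(0.5, 0.9), (4.0, 8.1), (3.5, 7.0), (2.0, 4.0)],  # client C
]


def local_update(w, data, lr=0.01, epochs=20):
    """Run gradient descent on one client's data for the model y = w * x.

    Only the updated weight is returned; the raw data stays local.
    """
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(rounds=10):
    """Average the clients' locally trained weights, weighted by dataset size."""
    w_global = 0.0
    total = sum(len(d) for d in CLIENT_DATA)
    for _ in range(rounds):
        local_weights = [local_update(w_global, d) for d in CLIENT_DATA]
        w_global = sum(w * len(d) for w, d in zip(local_weights, CLIENT_DATA)) / total
    return w_global


print(round(federated_averaging(), 2))
```

Because the clients' data roughly follows y = 2x, the averaged weight should come out close to 2.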
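Secure multiparty computation covers many protocols. As one simple illustration, the following Python sketch uses additive secret sharing to let several parties compute the sum of their private values without any party revealing its own value. It simulates all parties in one process and assumes they follow the protocol honestly; the prime modulus and the example counts are arbitrary choices for illustration.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # all share arithmetic is done modulo this prime


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties random shares that sum to value modulo PRIME.

    Any n_parties - 1 of the shares are uniformly random and reveal nothing about value.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate an additive secret-sharing protocol that reveals only the total."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    shares = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received; each one looks like random noise.
    partial_sums = [sum(shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the published partial sums reveals only the overall total.
    return sum(partial_sums) % PRIME


# Hypothetical example: three organizations each hold a private patient count.
print(secure_sum([120, 45, 310]))  # 475
```

Each party receives only random-looking shares from the others, yet combining the partial sums yields the true total.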
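The following Python sketch illustrates a core clean-room idea under simplifying assumptions: two parties' data is joined inside a neutral function, and only aggregated counts that meet a minimum group size are released. The party roles, identifiers and threshold of 3 are hypothetical. Real clean rooms run in a separately controlled environment and usually add further protections, so a single threshold should not be read as sufficient on its own.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical threshold: smaller groups are suppressed


def clean_room_overlap(retailer_rows, advertiser_ids, min_group=MIN_GROUP_SIZE):
    """Count matched customers per region, releasing only large-enough groups.

    retailer_rows: iterable of (customer_id, region) pairs held by party A.
    advertiser_ids: iterable of customer IDs that saw a campaign, held by party B.
    Neither party receives the other's row-level data, only the aggregate.
    """
    exposed = set(advertiser_ids)
    counts = Counter(region for cid, region in retailer_rows if cid in exposed)
    # Suppress small groups that could single out individuals.
    return {region: n for region, n in counts.items() if n >= min_group}


retailer = [("c1", "north"), ("c2", "north"), ("c3", "north"),
            ("c4", "south"), ("c5", "south"), ("c6", "east")]
advertiser = ["c1", "c2", "c3", "c4", "c6", "c9"]
print(clean_room_overlap(retailer, advertiser))  # {'north': 3}
```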
Large language models (LLMs) can streamline data engineering and operations by automating tasks such as data profiling, modeling and integration, leading to improved data quality. Deploying generative AI within existing data infrastructures allows organizations to handle routine tasks more efficiently, freeing up resources for more complex analyses and decision-making.
1 Data sharing for research, The Future of Privacy Forum, August 2022
2 How data sharing boosts public benefits enrollment, Benefits Data Trust, January 2023
3 Knight Capital Group stock trading disruption, Wikipedia, August 2012