Organizations today work with large volumes of data that arrive from many sources in many formats. Different users handle this data, and it ends up scattered across public and private clouds, on-premises storage systems and even employees’ personal endpoints.
It can be hard to centrally track and manage all of this data, which raises two problems.
First, an organization cannot use a dataset if it does not know that the dataset exists.
Second, this undiscovered and unmanaged “shadow data” poses security risks. According to IBM’s Cost of a Data Breach Report, one-third of data breaches involve shadow data. These breaches cost USD 5.27 million on average, 16% more than the overall average breach cost.
AI and ML can automate many aspects of data discovery, granting organizations more visibility into, and control over, all their data assets.
Examples of AI in data discovery include: