June 2, 2020 By Anup Gandhi 3 min read

The integrity and trustworthiness of data, or of any other master entity, are enforced through data quality rules. Customers no longer want to rely on hand-crafted rules that can number in the thousands and that need a lot of maintenance.

Riding the machine learning (ML) wave, customers can break free from their rule-based business logic and rely on data-driven decisions within product information management (PIM) systems. To keep manual effort and onboarding time to a bare minimum, ML is particularly handy in three areas:

  1. Auto-categorization of products from product description
  2. Enrichment of products from product description
  3. Data standardization of product descriptions or other long text attributes

These processes are necessary for decreasing effort and saving time and costs. The IBM InfoSphere Master Data Management (MDM) suite offers these ML capabilities in IBM MDM Product Master to help organize product and service information across the enterprise. As a PIM solution, IBM Product Master (formerly IBM InfoSphere Master Data Management Collaborative Edition) aggregates information from any upstream system, enforces business processes to ensure data accuracy and consistency, and synchronizes trusted information with downstream systems.

Auto-categorization of products

Using Product Master’s machine learning (ML) capabilities, PIM users can auto-categorize product names into the appropriate product categories, based on their product descriptions. Along with each categorization, the service provides a confidence score. Categorizations with a confidence score below a set threshold are routed to manual review. The manual corrections and assignments are collected as feedback to retrain the ML models at regular intervals.
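The threshold step described above can be sketched in a few lines of Python. This is a minimal illustration, not Product Master's API: the `Prediction` class, the `route_predictions` function, the sample products and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    product_name: str
    category: str
    confidence: float  # model score, assumed to lie in [0, 1]


def route_predictions(predictions, threshold=0.8):
    """Split predictions into auto-accepted and manual-review lists."""
    accepted, review = [], []
    for p in predictions:
        # Scores below the threshold go to a data steward for review.
        (accepted if p.confidence >= threshold else review).append(p)
    return accepted, review


# Made-up sample predictions for illustration only.
predictions = [
    Prediction("Cordless drill 18V", "Power Tools", 0.94),
    Prediction("Garden hose 25 m", "Outdoor", 0.85),
    Prediction("Multi-purpose adapter", "Electronics", 0.41),
]
accepted, review = route_predictions(predictions, threshold=0.8)
```

Corrections made during review could then be appended to the training file before the next retraining run.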

To train the ML service, an initial data file is needed that contains product names and expected product categories. The training data must have a representation of product names from every category. It must contain samples such that the model learns enough variations of possible product names in every category. Around 15-20 products per category is sufficient for a corpus of around 500 categories.
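Before training, it can help to check that every category is represented well enough. The sketch below assumes the training file has already been read into `(product_name, category)` pairs; the function name, the sample rows and the default minimum of 15 (the lower end of the range mentioned above) are illustrative choices.

```python
from collections import Counter


def underrepresented_categories(rows, all_categories, min_examples=15):
    """Return {category: count} for categories with too few training rows."""
    counts = Counter(category for _name, category in rows)
    return {
        category: counts[category]
        for category in all_categories
        if counts[category] < min_examples
    }


# Made-up training rows: 16 power tools, 3 outdoor products, no lighting.
rows = [(f"Drill model {i}", "Power Tools") for i in range(16)]
rows += [(f"Hose model {i}", "Outdoor") for i in range(3)]

gaps = underrepresented_categories(rows, ["Power Tools", "Outdoor", "Lighting"])
```

Categories reported in `gaps` would need more sample products before the model can be expected to learn them.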

With this feature, the next set of products imported into the system will not need their categories specified at import time.

Enrichment of products from product description

Normally in Product Master, either a data steward populates the different attributes of each product, or custom rules are written to populate the values based on conditions on the product description. With this feature, the product is enriched directly from the provided product description. Product Master uses probabilistic data structures to fill out attributes with values from the description so that as many of the product's attributes as possible are populated. Product Master reads an Excel file containing the data schema to understand the possible attributes and their values for every product. The feature needs only the description, without any explicitly set actions, and populates the attributes automatically.
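To make the idea concrete, here is a deliberately simple sketch of schema-driven enrichment. It swaps the probabilistic data structures mentioned above for plain whole-word matching, so it is a stand-in rather than Product Master's method. The `SCHEMA` dictionary is a hypothetical in-memory version of the attribute schema, and `enrich` is an illustrative name.

```python
import re

# Hypothetical schema: attribute name -> allowed values.
SCHEMA = {
    "color": ["red", "blue", "black"],
    "material": ["cotton", "leather", "stainless steel"],
    "size": ["small", "medium", "large"],
}


def enrich(description, schema):
    """Fill each attribute with the first allowed value found in the text."""
    text = description.lower()
    attributes = {}
    for attribute, values in schema.items():
        for value in values:
            # Match whole words only, so "red" does not match "bored".
            if re.search(r"\b" + re.escape(value) + r"\b", text):
                attributes[attribute] = value
                break
    return attributes


enriched = enrich("Large black leather travel bag", SCHEMA)
```

Attributes with no matching value are simply left out, which mirrors the goal of populating as many attributes as the description supports.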

Data standardization of product descriptions

Beyond auto-categorizing products and populating product attributes, Product Master’s ML capabilities can also standardize product descriptions to repair any evident problems. The ML service learns contextual representations of product descriptions, uses these representations to identify issues in the descriptions, and recommends better options to address them.

As part of the training, the ML service uses an Excel/CSV file containing all true product descriptions. The service also uses feedback to improve how it identifies and addresses issues.
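As a rough illustration of learning from known-good descriptions, the sketch below builds a vocabulary from reference descriptions and suggests replacements for unknown words. It uses string similarity from Python's standard `difflib` module as a stand-in for the contextual representations described above. The function names and sample descriptions are hypothetical.

```python
import difflib


def build_vocabulary(descriptions):
    """Collect the distinct lowercase words seen in trusted descriptions."""
    return sorted({word for d in descriptions for word in d.lower().split()})


def suggest_corrections(description, vocabulary, cutoff=0.8):
    """Map each unknown word to its closest known word, if one is close enough."""
    suggestions = {}
    for word in description.lower().split():
        if word in vocabulary:
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


# Made-up reference descriptions standing in for the training file.
reference = ["stainless steel water bottle", "cotton crew neck shirt"]
vocabulary = build_vocabulary(reference)
suggestions = suggest_corrections("stainles steel watter bottle", vocabulary)
```

Accepted or rejected suggestions could be logged as the feedback that later improves the service.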

Why are these features so necessary?

Two of the biggest product information management pain points organizations see are automated attribute matching and anomaly detection, both of which can be solved by ML.

When merging external datasets into existing data columns, challenges such as datasets using different names for the same columns, changes in column order, or missing headers need manual effort to resolve. With automated attribute matching, ML suggestions on column mapping between the datasets enable enterprises to quickly approve the suggested merge, matching and mapping different columns into one.
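A simple way to picture column-mapping suggestions is name similarity. The sketch below is an assumption-laden stand-in for the ML approach: it compares normalized column names with `difflib.SequenceMatcher` and proposes the best match above a hypothetical `min_score`. It does not cover missing headers, which would need matching on the values themselves.

```python
import difflib


def normalize(name):
    """Lowercase a column name and drop separators such as '_' or spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_column_mapping(incoming, existing, min_score=0.6):
    """Suggest, for each incoming column, the most similar existing column."""
    mapping = {}
    for column in incoming:
        scored = [
            (difflib.SequenceMatcher(None, normalize(column), normalize(c)).ratio(), c)
            for c in existing
        ]
        best_score, best = max(scored, default=(0.0, None))
        # Leave the column unmapped when nothing is similar enough.
        mapping[column] = best if best_score >= min_score else None
    return mapping


# Made-up column names for illustration only.
mapping = suggest_column_mapping(
    ["Prod_Name", "unit price", "Colour", "SKU"],
    ["product_name", "unit_price", "color"],
)
```

Columns mapped to `None` are the ones a person would still need to resolve by hand.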

Data values of a column can run into thousands of records. It becomes practically impossible to detect any anomalies in the data values manually. Using ML-based anomaly detection, the outliers in the data values could be identified and highlighted for manual supervision. This could further help in automatically generating business rules to validate the columns.
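One common statistical way to flag outliers in a numeric column is the modified z-score, which is based on the median and the median absolute deviation (MAD). The sketch below uses that technique as an example. The article does not say which method Product Master uses, and the 3.5 threshold is a conventional default rather than a product setting.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # At least half the values equal the median; this score cannot rank them.
        return []
    return [
        (i, v)
        for i, (v, d) in enumerate(zip(values, deviations))
        if 0.6745 * d / mad > threshold
    ]


# Made-up price column with one suspicious entry.
prices = [9.99, 10.49, 10.25, 9.75, 10.10, 999.0, 10.30]
outliers = find_outliers(prices)
```

The flagged rows can then be highlighted for a data steward, and the range of the remaining values could seed a validation rule for the column.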

IBM Product Master helps organizations achieve better operational efficiency, manage compliance, and drive data-driven digital transformation. Visit the webpage to learn more about how IBM can help you accelerate and transform your organization’s product information management system.

