IBM InfoSphere Big Match for Hadoop

Match and analyze disparate volumes of structured and unstructured customer data for deeper customer insights

Find and connect the customer data that matters most

IBM InfoSphere® Big Match for Hadoop helps you analyze massive volumes of structured and unstructured data to gain deeper customer insights. It can enable fast, efficient linking of data from multiple sources to provide more complete and accurate customer information — without the risks of moving data from source to source. The solution supports platforms running Apache Hadoop such as Cloudera.

Gain unique customer insights

Enables you to find information buried in your data, such as Twitter feeds, email and call center logs, and match it to entities, such as customers from other sources, to gain a more complete picture.

Connect and analyze masses of data

Uses probabilistic matching technology, combined with big data accelerators and text analytics, to extract relevant information and help you connect customer identities at the speed of business.

Use the processing power of Hadoop

Reduces bottlenecks in data parsing and integration by loading all data natively in the Hadoop environment for faster analysis and integration, with fewer constraints.

Features

Matching algorithms

Uses statistical learning algorithms and a probabilistic matching engine running natively within Hadoop for faster, more accurate customer data matching.
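
To give a feel for what probabilistic matching means in general, here is a minimal sketch of a Fellegi-Sunter style score, a textbook approach to record linkage. It is not the product's engine or API: the field names, the `m` and `u` probabilities and the `match_score` helper are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not product values):
#   m = P(field agrees | both records describe the same customer)
#   u = P(field agrees | records describe different customers)
FIELD_WEIGHTS = {
    "name":  {"m": 0.95, "u": 0.01},
    "email": {"m": 0.90, "u": 0.001},
    "city":  {"m": 0.80, "u": 0.10},
}


def match_score(a, b):
    """Sum the log2 likelihood ratios of agreement or disagreement per field."""
    score = 0.0
    for field, p in FIELD_WEIGHTS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        if va.strip().lower() == vb.strip().lower():
            score += math.log2(p["m"] / p["u"])              # agreement weight
        else:
            score += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return score


crm = {"name": "Ana Silva", "email": "ana@example.com", "city": "Lisbon"}
call_log = {"name": "ana silva", "email": "ANA@example.com", "city": "Porto"}
stranger = {"name": "Bob Lee", "email": "bob@example.com", "city": "Lisbon"}

print(match_score(crm, call_log))  # positive: evidence for the same customer
print(match_score(crm, stranger))  # negative: evidence against
```

A higher score means stronger evidence that two records refer to the same customer. In practice a threshold on the score separates links from non-links, and the probabilities are estimated from data rather than set by hand.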

Fast processing and deployment

Provides configurable prebuilt algorithms and templates to help you deploy in hours instead of spending weeks or months developing code. Uses distributed processing to accelerate matching of big data volumes.

API support

Provides support for Java and REST-based APIs, which can be used by third-party applications.

Searching and export capabilities

Provides search functions, as well as export (with entity ID) and extract capabilities, so that downstream systems can consume the data.
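
As a rough illustration of what an export with entity ID can look like, the sketch below groups linked records with a union-find structure and writes one row per record. The record IDs, the `assign_entity_ids` and `export_csv` helpers and the CSV layout are assumptions made for this example, not the product's export format.

```python
import csv
import io


def assign_entity_ids(record_ids, matched_pairs):
    """Group records connected by matches; label each group with its smallest record ID."""
    parent = {r: r for r in record_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving keeps lookups short
            r = parent[r]
        return r

    for a, b in matched_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller ID becomes the group root

    return {r: find(r) for r in record_ids}


def export_csv(entity_ids):
    """Render record-to-entity assignments as CSV text for a downstream system."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["record_id", "entity_id"])
    for record_id in sorted(entity_ids):
        writer.writerow([record_id, entity_ids[record_id]])
    return buf.getvalue()


# Hypothetical record IDs from three sources; the pairs stand in for match results.
records = ["crm-1", "mail-7", "call-3", "crm-2"]
pairs = [("crm-1", "mail-7"), ("mail-7", "call-3")]
entity_ids = assign_entity_ids(records, pairs)
print(export_csv(entity_ids))
```

Because matches are transitive here, `crm-1`, `mail-7` and `call-3` share one entity ID while `crm-2` keeps its own, so a downstream system can join on `entity_id` to see all records for one customer.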

Apache Spark support

Provides Spark-based utilities and visualization to further enable analysis of results. Spark’s advanced analytics and data science capabilities include near real-time streaming through micro batch processing and graph computation analysis.

Related products

IBM InfoSphere Global Name Management

Helps manage, search, analyze and compare multicultural name data sets.

IBM InfoSphere Identity Insight

Helps predict and preempt criminal activity and risk with entity resolution and analytics.

IBM InfoSphere Master Data Management

Manages enterprise data, presents it in a single trusted view, empowers business users and delivers analytic capabilities.

Master data management tools and solutions

Learn more about master data management

Legal information

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.