Impact analysis

Use Impact analysis on an exploration to determine how data fields affect one another and to identify and quantify relationships across the data. Generative AI interprets and summarizes the analysis, describing the direct and indirect impact of the data fields on each other, as well as any dependencies and relationships between them.

The initial release of Impact analysis is available only in Planning Analytics Workspace Cloud and only in English.
Important: The Impact analysis feature is part of the Planning Analytics AI assistant add-on, which requires a separate license purchase; it is not included in a standard Planning Analytics Workspace license. To purchase a license, contact your IBM® sales representative or go directly to your IBM account.

To start an Impact analysis, click an exploration in a book. Then, click AI operations in the toolbar and select Impact analysis. To run the analysis, the exploration must have a minimum of two rows and six columns. The feature considers only the first 120 columns in the exploration for the analysis.

The completed analysis results in a Bubble chart, which visually depicts the relationships between the data fields, along with an AI-generated summary of the analysis.

Figure: Impact analysis chart with summary

You can hover over a bubble to see the data field name and its direct and indirect impact values. You can also click the labels in the legend to show or hide the related bubbles in the chart.

When you click a bubble in the chart, the related section under Relative impact analysis expands to display the relative contribution of other data fields to the selected data. The Relative impact analysis quantifies the degree of association between two data fields so that you can see which fields drive or influence the selected data.

Techniques used in Impact analysis

Two techniques, Linear Principal Component Analysis (PCA) and Kernel PCA, are used to identify and extract key features from the data. The scores from both methods are compared to gain insight into the underlying structure of the data and the importance of individual features.

Principal Component Analysis (PCA)
PCA is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while retaining as much variance as possible (a minimal sketch follows this list). This is crucial in data analysis because it helps in:
  • Reducing complexity by simplifying datasets to make them easier to visualize and interpret.
  • Removing noise by filtering out less informative features, which can improve model performance.
  • Uncovering patterns by revealing relationships and patterns that might not be evident in high-dimensional data.
  • Enabling unsupervised exploration, as PCA and Kernel PCA are both unsupervised techniques that analyze the data without reference to target labels. This allows for a more objective exploration of the data's structure and the relationships among features.
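
To make this concrete, here is a minimal sketch of Linear PCA with scikit-learn. The dataset, field count, and component count are illustrative assumptions; they do not reflect what Impact analysis uses internally.

    # Minimal PCA sketch: reduce 10 illustrative data fields to 2
    # components while reporting how much variance is retained.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 10))    # 100 rows, 10 data fields

    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)  # shape (100, 2)

    # Fraction of the total variance captured by each component.
    print(pca.explained_variance_ratio_)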

Linear PCA operates under the assumption that the relationships in the data are linear, and it identifies the directions (principal components) in which the data varies the most. The scores from Linear PCA represent the projection of the original data onto these principal components. Each score indicates how much variance a feature contributes to the overall data structure. A higher score signifies that a feature is more influential in determining the position of the data point in the reduced space.

Kernel PCA extends the capabilities of Linear PCA by applying a kernel function to project data into a higher-dimensional space, capturing nonlinear relationships. This makes Kernel PCA useful for datasets where the relationships between features are not linear. Like Linear PCA scores, Kernel PCA scores reveal the importance of features, but they also account for complex interactions and nonlinearity in the data. A high Kernel PCA score suggests that a feature plays a significant role in the structure of the data in the transformed space.
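
The exact scoring formulas are not published, but the following sketch shows one plausible way to derive per-field scores from each method with scikit-learn: component loadings weighted by explained variance for Linear PCA, and, because Kernel PCA exposes no direct loadings, each field's strongest correlation with the leading kernel components as a proxy. All names and data here are illustrative.

    # One plausible way to score fields under Linear and Kernel PCA;
    # not the documented Impact analysis formula.
    import numpy as np
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))                         # 200 rows, 6 data fields
    X[:, 3] = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)   # one nonlinear dependency

    X_std = StandardScaler().fit_transform(X)

    # Linear PCA: score each field by its component loadings, weighted
    # by the variance that each component explains.
    pca = PCA(n_components=3).fit(X_std)
    linear_scores = np.abs(pca.components_.T) @ pca.explained_variance_ratio_

    # Kernel PCA (RBF kernel) has no direct loadings, so score each
    # field by its strongest correlation with the kernel components.
    kpca = KernelPCA(n_components=3, kernel="rbf")
    Z = kpca.fit_transform(X_std)
    corr = np.corrcoef(X_std.T, Z.T)[:6, 6:]   # field-vs-component correlations
    kernel_scores = np.abs(corr).max(axis=1)

    for i, (ls, ks) in enumerate(zip(linear_scores, kernel_scores)):
        print(f"field {i}: linear={ls:.3f}, kernel={ks:.3f}")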

Feature scores
Understanding the scores from both Linear and Kernel PCA is important for:
  • Feature selection: Identifying which features to keep or discard based on their importance.
  • Model optimization: Improving machine learning models by focusing on the most informative features.
  • Data interpretation: Gaining insights into the data structure and feature interactions, which can guide further analysis and decision-making.
Visualization of scores
To compare the results of Linear PCA and Kernel PCA, the scores from Linear PCA are plotted on the x-axis and the scores from Kernel PCA on the y-axis. This visual representation helps in assessing the relationship between the two methods.
Points in different areas of the Bubble chart can indicate how features perform under linear versus nonlinear transformations (a plotting sketch follows this list).
  • Upper right area: Features with high scores in both Linear and Kernel PCA indicate strong relevance and contribution to the data structure.
  • Bottom left area: Features with low scores in both methods might be less informative.
  • Cross-area scores: Features that score high in one method but low in another might warrant further investigation to understand their unique contributions.
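
Continuing the previous sketch, a minimal version of this comparison plot might look as follows. It assumes the linear_scores and kernel_scores arrays computed earlier, and matplotlib is used purely for illustration.

    # Plot Linear PCA scores (x-axis) against Kernel PCA scores
    # (y-axis), one bubble per data field.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(linear_scores, kernel_scores, s=300, alpha=0.5)
    for i, (x, y) in enumerate(zip(linear_scores, kernel_scores)):
        ax.annotate(f"field {i}", (x, y), ha="center")
    ax.set_xlabel("Linear PCA score")
    ax.set_ylabel("Kernel PCA score")
    plt.show()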

Relative impact analysis is achieved by using a Gradient Boosting Regressor. This technique identifies contributor features related to a specific target variable in a dataset. Gradient Boosting is an ensemble learning technique that constructs a model in a stage-wise fashion, optimizing for accuracy by combining the predictions of multiple weak learners. This approach not only enhances prediction capabilities but also provides essential insights into feature importance, helping to identify the features that significantly impact the target variable.
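
As a minimal sketch of this idea, scikit-learn's GradientBoostingRegressor exposes per-feature importances after fitting. The field names and data here are hypothetical.

    # Rank contributor fields by their relative impact on a target
    # field, using gradient boosting feature importances.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 4))                                 # contributor fields
    y = 3 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)   # target field

    model = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)

    # feature_importances_ sums to 1; larger values indicate a larger
    # relative impact on the target field.
    for name, importance in zip(["Revenue", "Units", "Price", "Discount"],
                                model.feature_importances_):
        print(f"{name}: {importance:.2f}")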

AI-generated summary

The Impact analysis feature uses granite-3-8b-instruct from the Granite family of IBM foundation models to generate the summary.

The Granite models are decoder-only models that can efficiently predict and generate language. These models were built with trusted data that has the following characteristics:
  • Sourced from quality data sets in domains such as finance (SEC Filings), law (Free Law), technology (Stack Exchange), science (arXiv, DeepMind Mathematics), literature (Project Gutenberg (PG-19)), and more.
  • Compliant with rigorous IBM data clearance and governance standards.
  • Scrubbed of hate, abuse, and profanity; duplicate data; and blocklisted URLs, among other things.
Note: IBM is committed to building AI that is open, trusted, targeted, and empowering. For more information about contractual protections that are related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
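
The product invokes the model internally, so no setup is needed on your part. Purely for illustration, a prompt could be sent to the same Granite model through the watsonx.ai Python SDK (ibm_watsonx_ai); the endpoint, API key, project ID, and prompt below are placeholders, and SDK details can vary by version.

    # Illustrative only: prompting granite-3-8b-instruct through the
    # watsonx.ai Python SDK. All credentials are placeholders.
    from ibm_watsonx_ai import Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference

    credentials = Credentials(
        url="https://us-south.ml.cloud.ibm.com",  # example region endpoint
        api_key="YOUR_IBM_CLOUD_API_KEY",
    )

    model = ModelInference(
        model_id="ibm/granite-3-8b-instruct",
        credentials=credentials,
        project_id="YOUR_PROJECT_ID",
    )

    summary = model.generate_text(
        prompt="Summarize the direct and indirect impacts in this analysis: ..."
    )
    print(summary)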

Tips on working with AI

While working with Impact analysis, here are a few things that you can do that might lead the AI model to generate a better analysis (a data-preparation sketch follows the list):

  • Ensure that the values in the exploration data are numeric.
  • Remove spacers from rows or columns.
  • Remove duplicate values.
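
As a minimal sketch of this kind of preparation outside of Planning Analytics Workspace, assuming the exploration is exported to a CSV file (the file name and data are hypothetical):

    # Clean an exported exploration with pandas before analysis.
    import pandas as pd

    df = pd.read_csv("exploration_export.csv")

    df = df.apply(pd.to_numeric, errors="coerce")  # keep only numeric values
    df = df.dropna(axis=0, how="all")              # drop spacer rows
    df = df.dropna(axis=1, how="all")              # drop spacer columns
    df = df.drop_duplicates()                      # remove duplicate rows
    print(df.head())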