Tree-shaped Bayesian network

Question & Answer

Question

What is a tree-shaped Bayesian network?

Answer

A Bayesian network (BN) is a compact way of representing a joint probability distribution over many variables. It is a graphical representation of the probabilistic relationships within a set of attributes.

Bayesian networks express the direct relationships between variables and their conditional independencies. BNs are predominantly used to identify primary (direct) and secondary (indirect) relations between variables, to identify variable independence or relative independence, and to derive the strength of secondary relationships from primary ones.

A BN can be provided directly by an expert, learned automatically from data, or acquired through a mixture of the two.

Automatic acquisition from data is algorithmically and computationally complex. Tree-shaped Bayesian networks, however, constitute a simplified subclass of Bayesian networks with restrictions imposed on the type of attribute relationships that can be discovered and represented. (For example, each node has at most one parent.) The restrictions permit simpler and more efficient algorithms as well as more straightforward interpretation. Tree-shaped Bayesian networks may not be sufficient for highly accurate prediction, but they provide an excellent qualitative description of the relationship structure observed in the data. They are used in many areas, including text analysis and medical diagnosis.
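
To make the idea of learning a tree from data concrete, here is a minimal sketch in the style of the classic Chow-Liu procedure: score every pair of attributes by empirical mutual information, keep a maximum-weight spanning tree, and orient it away from a chosen root. This answer does not say which algorithm the product uses, so the approach, the function names (mutual_information, learn_tree), and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    ci = Counter(r[i] for r in rows)
    cj = Counter(r[j] for r in rows)
    cij = Counter((r[i], r[j]) for r in rows)
    return sum((c / n) * log(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())

def learn_tree(rows, root=0):
    """Return {child: parent} for a maximum-weight spanning tree over columns."""
    k = len(rows[0])
    edges = sorted(((mutual_information(rows, i, j), i, j)
                    for i, j in combinations(range(k), 2)), reverse=True)
    comp = list(range(k))  # union-find forest for Kruskal's algorithm

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    adj = {v: [] for v in range(k)}
    for _, i, j in edges:  # strongest pairs first
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it does not close a cycle
            comp[ri] = rj
            adj[i].append(j)
            adj[j].append(i)

    parent = {root: None}  # orient edges away from the root
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    return parent

# Hypothetical toy data with columns A, B, C: B mostly follows A, C mostly follows B.
rows = ([(0, 0, 0)] * 3 + [(0, 1, 1)] + [(1, 1, 1)] * 3
        + [(1, 0, 0), (0, 0, 1), (1, 1, 0)])
parent = learn_tree(rows, root=0)
print(parent)
```

Because every node ends up with at most one parent, only pairwise statistics are needed, which is why tree discovery is so much cheaper than searching over general DAGs.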

Bayesian network tree discovery algorithms can be applied whenever a Bayesian network is needed for:

determining the major dependencies among attributes.
creating a simplistic model to derive correlations between attributes from a small set of pre-computed ones.
identifying subsets of attributes that do not appear to be related to the same topic, by splitting the tree by the weakest links.
identifying attributes of central importance, that is, those that are central in the network, with many links.
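
The last two uses in the list above can be sketched on a tree that has already been learned. The snippet below is a hypothetical illustration: the attribute names and link strengths are invented, and split_by_weakest_link and degree_ranking are helper names introduced here, not part of any product API. It removes the weakest link to obtain two groups of attributes and ranks attributes by their number of links.

```python
from collections import Counter, defaultdict

def split_by_weakest_link(edges):
    """edges: list of (u, v, strength). Drop the weakest edge and
    return (dropped_edge, list_of_connected_components)."""
    weakest_idx = min(range(len(edges)), key=lambda k: edges[k][2])
    nodes = set()
    adj = defaultdict(set)
    for k, (u, v, _) in enumerate(edges):
        nodes.update((u, v))
        if k != weakest_idx:
            adj[u].add(v)
            adj[v].add(u)
    components, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first walk over the remaining edges
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        components.append(comp)
    return edges[weakest_idx], components

def degree_ranking(edges):
    """Attributes ordered by number of links, most connected first."""
    deg = Counter()
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return deg.most_common()

# Hypothetical tree edges with made-up link strengths.
tree_edges = [
    ("age", "income", 0.30),
    ("income", "spend", 0.25),
    ("income", "region", 0.20),
    ("region", "weather", 0.02),
]
dropped, groups = split_by_weakest_link(tree_edges)
ranking = degree_ranking(tree_edges)
print(dropped, groups, ranking[0])
```

Repeating the split on the resulting components yields progressively finer groups of attributes; where to stop is a judgment call that depends on the data.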

Functionality

The Bayesian representation consists of a directed acyclic graph (DAG) structure, in which each node is annotated with:

1. a distinct (random) variable X_i
2. a table of conditional probabilities of the variable X_i given the set of variables pi(X_i) occurring as its parents in the DAG, denoted P(X_i | pi(X_i)).

The second item above refers specifically to BNs that consist of discrete (also known as nominal or non-continuous) variables. There also exist versions of Bayesian networks that allow for continuous variables, in which the role of the conditional probability tables is taken over by the correlations, variances, and means of the variables.
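
For the discrete case, the annotated DAG defines the joint distribution as the product of every node's conditional probability given its parents. The following sketch evaluates that product for a small tree-shaped network; the variables (rain, wet_grass, umbrella) and all probability values are made up for illustration, and joint_probability is a hypothetical helper name.

```python
from itertools import product

# Tree structure: each variable has at most one parent (None marks the root).
parent = {"rain": None, "wet_grass": "rain", "umbrella": "rain"}

# Conditional probability tables, keyed by the tuple of parent values.
cpt = {
    "rain": {(): {1: 0.2, 0: 0.8}},
    "wet_grass": {(1,): {1: 0.9, 0: 0.1}, (0,): {1: 0.1, 0: 0.9}},
    "umbrella": {(1,): {1: 0.8, 0: 0.2}, (0,): {1: 0.05, 0: 0.95}},
}

def joint_probability(assignment):
    """P(x) as the product of P(x_i | parent value) over all variables."""
    p = 1.0
    for var, par in parent.items():
        key = () if par is None else (assignment[par],)
        p *= cpt[var][key][assignment[var]]
    return p

p_all = joint_probability({"rain": 1, "wet_grass": 1, "umbrella": 1})

# Sanity check: the probabilities of all 2**3 assignments should sum to 1.
total = sum(joint_probability(dict(zip(parent, values)))
            for values in product((0, 1), repeat=len(parent)))
print(p_all, total)
```

The three small tables hold 5 free parameters, whereas a full joint table over three binary variables would need 7; this gap widens quickly as variables are added, which is the compactness referred to at the start of this answer.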

Historical Number

NZ558615

Document Information

More support for:
IBM PureData System

Software version:
1.0.0

Document number:
460727

Modified date:
17 October 2019

UID

swg21568260