IBM InfoSphere Information Server suite-wide glossary

This glossary includes terms and definitions for IBM® InfoSphere® Information Server.

The following cross-references are used in this glossary:
  • "See" refers you from a term to a preferred synonym, or from an acronym or abbreviation to the defined full form.
  • "See also" refers you to a related or contrasting term.

To view glossaries for other IBM products, go to www.ibm.com/software/globalization/terminology.

A

abbreviation
A shortened form of a word or phrase that represents the full form of the term in a glossary.
accepted term
A term in a glossary that has been accepted as a new, valid term for general use within an organization by the glossary administrator. See also candidate term.
action
The part of a standardization rule that specifies how the rule processes a record. See also condition, standardization rule.
aggregate
1. (n) In information analysis, a calculation that returns a single result value from several relational data rows or dimensional members. Typical examples of an aggregate are total and average.
2. (v) To collect related information for processing and analysis.
analysis database
A database that InfoSphere Information Analyzer uses when it runs analysis jobs and where it stores the extended analysis information. The analysis database does not contain the InfoSphere Information Analyzer projects, analysis results, and design-time information; all of this information is stored in the metadata repository.
asset collection
A set of assets that have been grouped together to work on as a set instead of individually.
assign
To create a link between two assets in the metadata repository.

B

base column
In a cross-domain or cross-table information analysis, the column of data that is the driver for the analysis.
baseline analysis
A type of data analysis that compares a saved view of the results of a column analysis to another view of the same results that are captured later.
benchmark
A quantitative quality standard that defines the minimum level of acceptability for the data or the tolerance for some level of exceptions in a data analysis.
binding
In information analysis, a direct relationship between a logical element in a data rule and an actual column in a table in a data source.
blocking
A process that partitions records into subsets that share common characteristics, with the goal of limiting the number of record pairs that are examined during matching. By limiting matching to record pairs within a subset, successful matching becomes computationally feasible for large data sets.
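As a rough illustration of the blocking entry above, the following Python sketch groups records by a blocking key and counts the candidate pairs that remain. The record layout, the sample data, and the choice of ZIP code as the blocking key are assumptions made for this example; they are not taken from any InfoSphere product.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, key):
    """Group records into blocks that share the same blocking-key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[key]].append(record)
    return dict(blocks)


def candidate_pairs(blocks):
    """Yield record pairs only from within each block."""
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "name": "Ann Lee", "zip": "10001"},
    {"id": 2, "name": "Anne Lee", "zip": "10001"},
    {"id": 3, "name": "Bob Ray", "zip": "94105"},
    {"id": 4, "name": "Rob Ray", "zip": "94105"},
    {"id": 5, "name": "Cy Poe", "zip": "60601"},
]

blocks = block_records(records, "zip")
pairs = list(candidate_pairs(blocks))
all_pairs = list(combinations(records, 2))
print(len(all_pairs), "pairs without blocking;", len(pairs), "pairs with blocking")
```

Without blocking, the number of pairs grows quadratically with the number of records; with blocking, only pairs inside the same subset are compared.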
blueprint
A collection of diagrams that include information technology elements, which represent the architecture of an information project, and method elements, which represent standard information technology practices.
bridge
A component that converts metadata from one format to another format by mapping the metadata elements to a standard model. This model translates the semantics of the source tool into the semantics of the target tool. For example, the source tool might be a business intelligence or data modeling tool, and the target tool might be the metadata repository. Or, the source tool might be the metadata repository, and the target tool might be a file that is used by a data modeling tool. See also connector, metadata repository.
business analyst
A specialist who analyzes business needs and problems. A business analyst consults with users and stakeholders to identify opportunities for improving business return through information technology and then transforms the requirements into a technical form.
business intelligence asset (BI asset)
An information asset that is used by business intelligence (BI) tools to organize reports and models that provide a business view of data. These assets include BI reports, BI models, BI collections, and cubes.
business lineage
The lifecycle of a unit of data, such as a table or a column, as it moves between information assets (data sources). Unlike data lineage, no information about data transformations is included in business lineage information. Business lineage information is like a summary of data lineage information. See also data lineage.
business metadata
Metadata that provides a business context and a business name for assets that are created and managed by other applications. Business metadata includes terms, information governance rules, labels, and stewards.

C

candidate column
A column that is used as a placeholder in a mapping.
candidate term
A term in a glossary that is being considered but that has not yet become standard or accepted. See also accepted term, standard term.
cardinality
In information analysis, a measure of the number of unique values in a column.
catalog
An authoritative dictionary of the assets and the metadata about assets that are used throughout the enterprise. A catalog is the collection of glossary assets, metadata about information assets, and metadata about external assets that is stored in the metadata repository.
category
A word or phrase that classifies and organizes terms in the glossary. A category can contain other categories, and it can also contain terms. In addition, a category can reference terms that it does not contain.
class
The syntactic category for a group of related values. A value can be assigned to different classes in different contexts or scenarios. See also value.
classification
1. The process of grouping values into specific classes. See also class.
2. The system that defines classes and the relationships among those classes. See also class.
clerical record
A record that the matching process cannot definitively categorize, either as a duplicate record versus a nonmatched record or as a matched record versus a nonmatched record. See also duplicate record, matched record, nonmatched record.
client tier
The client programs and consoles that are used for development, administration, and other tasks for the InfoSphere Information Server suite and product modules and the computers where they are installed.
column analysis
A data quality process that describes the condition of data at the field level.
common domain
In information analysis, the set of columns that share overlapping and potentially redundant values.
commonality
In information analysis, a measure of the number of matching values in a set of paired columns.
complex flat file
A file that has hierarchical structure, especially mainframe data structures and XML files.
compute node
A processing node in a parallel processing environment that handles elements of the job logic. Any processing node that is not a conductor node is a compute node. See also processing node.
condition
The part of a standardization rule that defines the requirements that the record must meet for the rule to apply to that record. A pattern is a type of condition. See also action, pattern, standardization rule.
conductor node
The processing node that initiates the job run. See also processing node.
connector
A component that provides data connectivity and metadata integration for external data sources, such as relational databases or messaging software. A connector typically includes a stage that is specific to the external data source. See also bridge, operator, plug-in.
constant
Data that has an unchanging, predefined value to be used in processing.
contained term
A term in a category in a glossary. A term must be contained by only one category.
context
The hierarchy of elements within which an element exists. For example, the context of a term in a glossary is the hierarchy of categories in which the term is contained.
cross-domain analysis
A type of data analysis that identifies the overlap of data values between two columns of data.
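The overlap that cross-domain analysis looks for can be sketched with set operations. The following Python example is only an approximation of the idea; the function name, the percentage measures, and the sample columns are assumptions for illustration and do not reflect how InfoSphere Information Analyzer computes its results.

```python
def column_overlap(base_column, paired_column):
    """Return the distinct values shared by two columns, plus the share of
    each column's distinct values that the shared values represent."""
    base_values = set(base_column)
    paired_values = set(paired_column)
    common = base_values & paired_values
    return {
        "common_values": common,
        "base_pct": 100.0 * len(common) / len(base_values) if base_values else 0.0,
        "paired_pct": 100.0 * len(common) / len(paired_values) if paired_values else 0.0,
    }


# Hypothetical columns, for illustration only.
customer_ids = ["C1", "C2", "C3", "C4"]
order_customer_ids = ["C2", "C2", "C4", "C9"]

result = column_overlap(customer_ids, order_customer_ids)
print(sorted(result["common_values"]), result["base_pct"])
```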
cross-table analysis
A type of data analysis that combines foreign key analysis and cross-domain analysis. Foreign keys reference primary keys that are already defined or identified during primary key analysis.
custom attribute
A user-defined property for an asset that further describes assets of that type. For example, a custom attribute for database tables might be "Expected maximum row count." The custom attribute would be available for every database table, and might contain different values for different database tables.
cutoff
A threshold that specifies how the scored record pairs are categorized as matched, nonmatched, or clerical records based on the weight generated by the matching process.
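One way to picture how cutoffs categorize scored pairs is the Python sketch below. It assumes two cutoffs (a hypothetical `match_cutoff` and `clerical_cutoff`) and inclusive comparisons; both choices are assumptions for illustration and are not stated in this glossary.

```python
def categorize_pair(weight, match_cutoff, clerical_cutoff):
    """Categorize a scored record pair by comparing its weight to two cutoffs.

    Assumption: weights at or above match_cutoff are matched, weights at or
    above clerical_cutoff (but below match_cutoff) are clerical, and all
    other weights are nonmatched.
    """
    if clerical_cutoff > match_cutoff:
        raise ValueError("clerical_cutoff must not exceed match_cutoff")
    if weight >= match_cutoff:
        return "matched"
    if weight >= clerical_cutoff:
        return "clerical"
    return "nonmatched"


for weight in (12.0, 7.0, 2.0):
    print(weight, categorize_pair(weight, match_cutoff=10.0, clerical_cutoff=5.0))
```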

D

data analyst
A specialist who consults with users, business analysts, and stakeholders and then creates and runs processes in order to review and analyze the content, structure, and quality of data.
database
A collection of interrelated or independent data items that are stored together to serve one or more applications.
database schema
A collection of database objects such as tables, views, indexes, or triggers that define a database. A schema provides a logical classification of database objects.
data class
In information analysis, a classification that designates the logical type of data in a data field. A data class categorizes a column according to how the data in the column is used. For example, the classification INDICATOR represents a binary value such as TRUE/FALSE or YES/NO.
data cleansing
The process of preparing, standardizing, deduplicating, and integrating data from one or more sources such that it conforms to organizational requirements.
data click activity
A simple set of steps that move and transform data.
data enrichment
The process of adding and correcting the values of records from records that have been identified as representing similar entities.
data field
A column or field that contains a specific set of data values that are common to all records in a file or table.
data file
1. A file that stores a collection of fields in a native system file instead of in a database table.
2. The information asset that represents a collection of fields that are stored in a single file, such as a flat file, a complex flat file, or a sequential file.
data file structure
A collection of fields.
data lineage
The lifecycle of a unit of data, such as a table or a column, that indicates where the data comes from or where it goes to and how the data changes as it moves between data stores of any type. Data lineage is often expressed as a graph of a detailed bi-directional data flow. See also business lineage.
data partitioning
The process of logically or physically partitioning data into segments that are more easily maintained or accessed.
data pipelining
The process of pulling records from the source system and moving them through a sequence of functions that are defined in the data flow.
data rule
An expression that is generated out of a data rule definition that evaluates and analyzes conditions found during data profiling and data quality assessment. Data rules define specific tests, validations, or constraints associated with the data.
data set
A set of parallel data files and the descriptor file that refers to them. Data sets optimize the writing of data to disk by preserving the degree of partitioning. See also data file.
data source
The source of data itself, such as a database or XML file, and the connection information necessary for accessing the data.
data store
A place (such as a database system, file, or directory) where data is stored.
deduplication
The process of creating representative records from a set of records that have been identified as representing the same entities. See also matching, survivorship, and data enrichment.
deprecated term
A term in a glossary that is no longer approved for use. Typically, deprecated terms are replaced with a new term or a synonym. See also replaced by term.
design metadata
Metadata about the data flow that is included within a job design.
development glossary
A glossary that contains only the categories and terms that are being created or revised as part of a configured workflow and that have not been published yet. See also published glossary.
diagram
A graphical representation of the logical data model or a subject area.
domain
In information analysis, the set of data in a column.
domain analysis
A type of data analysis where the values of columns are identified and marked as invalid values.
DS engine
1. See InfoSphere Information Server engine.
2. See server engine.
duplicate record
A record that matches a master record. The duplicate record is likely to represent the same unique entity as the master record. See also master record.

E

ELT (extract, load, and transform)
The process of extracting data from one or more sources, loading it directly into a relational database, and then running data transformations in the relational database.
engine
See InfoSphere Information Server engine.
engine tier
The logical group of engine components for the InfoSphere Information Server suite and product modules (the InfoSphere Information Server engine components, service agents, and so on) and the computer or computers where those components are installed.
ETL (extract, transform, and load)
The process of collecting data from one or more sources, cleansing and transforming it, and then loading it into a database.
event
An occurrence of significance to a task or system. Events can include completion or failure of an operation, a user action, or the change in state of a process.
exception
A condition or event that might require additional information or investigation.
exception set
A group of exception records that were generated by a particular event, together with the details about those exception records.
extended data source
A data structure that cannot be written to disk or that cannot be imported into the metadata repository.
external assets
Assets that exist outside of InfoSphere Information Server products, such as ETL tools, scripts, Java programs, web services, or physical data models.
extract, load, and transform (ELT)
See ELT.
extract, transform, and load (ETL)
See ETL.

F

flat file
A file that has no hierarchical structure.
format analysis
A type of data analysis that validates the pattern of characters that is used to store a data value in selective columns (for example, telephone numbers or Social Security numbers) that have a standard general format.
frequency distribution
In information analysis, the number of occurrences of each unique value in a column and the characteristics of that column. A frequency distribution is a foundation on which other analyses are run when profiling data.
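A frequency distribution of this kind can be approximated in a few lines of Python. The sketch below counts the occurrences of each unique value and derives two column characteristics from the counts (cardinality and percentage per value). The choice of characteristics and the sample column are assumptions for illustration.

```python
from collections import Counter


def frequency_distribution(values):
    """Count occurrences of each unique value and derive simple column statistics."""
    counts = Counter(values)
    total = len(values)
    return {
        "counts": dict(counts),
        "cardinality": len(counts),  # number of unique values in the column
        "total": total,
        "percentages": {v: 100.0 * c / total for v, c in counts.items()} if total else {},
    }


# Hypothetical column of state codes, for illustration only.
states = ["NY", "CA", "NY", "TX", "NY", "CA"]
dist = frequency_distribution(states)
print(dist["counts"], dist["cardinality"])
```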

G

general format
In information analysis, the use of a character symbol for each unique data value. For example, all alphabetic characters in a column are replaced with the letter A.
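The substitution described in the general format entry can be sketched as follows. Replacing alphabetic characters with the letter A follows the example in the definition; replacing digits with 9 and leaving all other characters unchanged are assumptions added for this illustration.

```python
def general_format(value):
    """Replace each character with a symbol that stands for its character type."""
    symbols = []
    for ch in value:
        if ch.isalpha():
            symbols.append("A")  # alphabetic characters, as in the glossary example
        elif ch.isdigit():
            symbols.append("9")  # assumption: digits are shown as 9
        else:
            symbols.append(ch)   # assumption: other characters are kept as-is
    return "".join(symbols)


print(general_format("555-1234"))
print(general_format("AB12 3cd"))
```

Values that share a general format, such as telephone numbers written the same way, then collapse to a single pattern that can be counted and validated.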
global logical variable
In information analysis, a value that you set to represent a specific piece of data. It is a shared construct that can be used in all data rule definitions. See also data rule.
glossary
The controlled vocabulary and associated information governance policies and rules that define business semantics. Business and IT professionals can use a glossary to manage enterprise-wide information according to defined regulatory requirements or operational needs of the business. See also category, term, information governance policy, information governance rule.
glossary assets
The following set of assets: categories, terms, information governance policies, and information governance rules.

I

impact analysis
The process of identifying where objects are used and what other objects they depend on.
implemented data resource
An information asset that represents a database and its contents (schemas, database tables, and stored procedures), a data file and its contents (data file structures and data file fields), or a data item definition.
inference
In information analysis, a statistical measure in which probabilities are interpreted as degrees of belief.
inferred data type
In information analysis, the optimum data type that is identified during data analysis that can be used for an individual data value.
information analysis
The data analysis processes that assess the quality of your data, profile the data for integration and migration, and verify any external data sources.
information asset
A piece of information that is of value to the organization and can have relationships, dependencies, or both, with other information assets. Information assets include those assets created or imported by InfoSphere Information Server products, such as business intelligence (BI) reports, jobs, or mapping specifications.
information governance
The procedures that an organization uses to maintain oversight and accountability of information assets.
information governance policy
A natural language description of an information governance subject area. An information governance policy is made up of information governance rules. See also information governance rule.
information governance rule
A natural language definition of a characteristic for making information assets compliant with corporate objectives.
InfoSphere Information Server engine
The software that runs tasks or jobs, such as discovery, analysis, cleansing, or transformation. The engine includes the server engine, the parallel engine, and the other components that make up the runtime environment for InfoSphere Information Server and its product modules.
input link
A link that connects a data source to a stage. See also link.
investigation
A process of profiling the data source to understand the source data in order to identify relevant values, structures, and patterns.

J

job
The design objects and compiled programmatic elements that can connect to data sources, extract and transform that data, and then load that data into a target system. Types of jobs include parallel jobs, sequence jobs, server jobs, and mainframe jobs. See also job design, job executable, job parameter.
job activity
In a sequence job, a type of stage that indicates the actions that occur when the sequence job runs.
job design
The metadata that defines the sources and targets that are used within a job and the logic that operates on the associated data. A job design is composed of stages and the links between those stages. The job design is stored in the metadata repository, separate from the job executable. See also job.
job executable
The set of binary objects, generated scripts, and associated files that are used when running a job. See also job and job design.
job parameter
A processing variable that can be used at various points in a job design and overridden when the job is executed in order to dynamically influence the processing of the job. Job parameters are most often used for paths, file names, database connection information, or job logic. See also job, job design, and parameter set.
job run
A specific run of a job. A job can run multiple times, producing multiple job runs.
job sequence
See sequence job.
job template
A job design that only uses job parameters to specify values during runtime. See also job design and job parameter.

K

key analysis
A type of data analysis that evaluates data tables to find primary, foreign, and natural key candidates.

L

label
A short descriptor or keyword that classifies or categorizes information assets in the metadata repository, including categories and terms in the glossary.
link
A representation of a data flow that joins the stages in a job. A link connects data sources to processing stages, connects processing stages to each other, and also connects those processing stages to target systems. The types of links are input link, output link, reference link, and reject link. See also input link, output link, reference link, reject link.
literal
A character string whose fixed value is defined by the characters themselves.
logical asset
A logical data model element.
logical data model
The data model that captures the business definition of information assets by using the entity-relationship modeling approach. The logical data model consists of a set of related entities and their business associations. The logical data model can be represented graphically in the logical data model diagram. The logical data model contains logical entities, logical relationships, entity generalization hierarchies, and logical domains.
long description
An extended description of a term in a glossary that fully defines the term. See also short description.
lookup table
1. A database table used to map one or more input values to one or more output values.
2. A data source that has a key value that jobs use to retrieve reference information.

M

mapping specification
A set of mappings that describe how data is extracted, transformed, or loaded from one data source to another.
master record
During one-source matching, the record that is considered to be the primary record of a set of related records. Each group of two or more matched records has one master record. See also one-source matching.
master schema definition
A physical model of the inferred properties that are generated out of the selected data. It reflects the inferences of the data instead of the original definitions of the metadata.
match comparison
An algorithm that analyzes the values in columns and then calculates a score that contributes to the composite weight, which is used to determine the strength of the match. See also score.
Match Designer database
A database that stores the results of match test passes that are generated by InfoSphere QualityStage.
matched record
A data record that is identified to be the same as a reference record by a two-source matching process. See also two-source matching.
matching
A probabilistic or deterministic record linkage process that automates either the identification of records that are likely to represent the same entity or the identification of a relationship among records.
metadata repository
A shared component that stores design-time, runtime, glossary, and other metadata for product modules in the InfoSphere Information Server suite.
metadata services
A shared set of components that provide common functions (such as import and export) to other product modules in the InfoSphere Information Server suite.
metric
1. A measure to assess performance in a key area of a business.
2. In information analysis, a mathematical calculation that is performed on statistical results from data rules, rule sets, and other metrics themselves. A metric consolidates measurements from various data analysis steps to reduce hundreds of detailed analytical results into a few meaningful measurements that effectively convey the overall quality of the data.

N

node
1. A logical processing unit that is defined in a configuration file by a virtual name and a set of associated details about the physical resources, such as the server, the disks, its pools, and so on.
2. Any computer system that has a parallel engine installed on it. See also parallel engine.
nonmatched record
A record that is not a matched record, clerical record, or duplicate record. See also matched record, clerical record, and duplicate record.

O

one-source matching
The process of matching records within one source. See also matching and deduplication.
operand
An entity on which an operation is performed.
operational metadata
Metadata that describes the events and processes that occur and the objects that are affected when a job is run. See also operations database.
operations database
A component of the metadata repository that stores both the operational metadata and the information about the system resources that were used when a job is run for the product modules in the InfoSphere Information Server suite. See also metadata repository, operational metadata.
operator
A runtime object library that is part of the parallel engine and that executes the logic as defined in its corresponding stage. See also connector, stage.
output link
A link that is connected to a stage and generally moves processed data from the stage. See also link, reject link.
override
An object that defines how to change the processing of data as specified in classifications or standardization rules.

P

pack
A collection of components that extends existing capabilities.
paired column
In a cross-domain analysis or cross-table analysis, the column of data that has been matched to the base column.
parallel engine
The component of the InfoSphere Information Server engine that runs parallel jobs.
parallel job
A job that is compiled and run on the parallel engine and that supports parallel processing system features, including data pipelining, partitioning, and distributed execution. See also job.
parameter set
A set of job parameters. See also job parameter and value set.
parsing
A process that analyzes a sentence or phrase by dividing the strings into tokens before trying to determine the meaning of the strings. See also token.
pattern
The sequence of class labels assigned to the values in a data record which can be used to identify a subset of records that might be standardized the same way. See also class and value.
pattern-action language
The language that defines standardization rules. See also standardization rule.
physical asset
A physical data model element or an implemented data resource.
physical data model
The data model that represents the design schema for the information assets by using the relational model approach. The physical data model is typically generated from the logical data model by using the same modeling tools, although it can be reverse engineered from an existing database. A physical data model can be implemented many times. The physical data model contains design tables, design stored procedures, and physical domains.
physical data resource (PDR)
See implemented data resource.
plug-in
A type of stage that is used to connect to data sources but that does not support parallel processing capabilities. See also connector.
policy
1. The set of characteristics that defines the behavior of a runtime artifact.
2. See information governance policy.
precision
In information analysis, a measurement of the ability to distinguish between nearly equal values.
process asset
A mapping component, a mapping specification, or potentially other similar assets, such as a job, rule, or parameter.
processing node
A logical node in the system where jobs are run. The configuration file can define one processing node for each physical node in the system or multiple processing nodes for each physical node. See also compute node.
project
A container that organizes and provides security for objects that are supplied, created, or maintained for data integration, data profiling, quality monitoring, and so on.
publish
To make analysis results, rules, and other entities visible to a broader audience outside the scope of a project.
published glossary
The glossary that includes the set of categories and terms that have been approved and published as part of a configured workflow.
PX engine
See parallel engine.

R

redundancy
In information analysis, a measure of the number of columns that have the same values or common domains.
referenced term
A term in a glossary that is referred to by a category instead of being contained in that category. A term can be referred to by multiple categories. A term cannot be contained by and referenced by the same category.
reference link
An input link on a Transformer or Lookup stage that defines where the lookup tables exist. See also link.
reference match
See two-source matching.
reference table
A data table that you use in comparisons during data analysis.
referential integrity
An analysis that is run after foreign key analysis to ensure that foreign key candidates match the values of an associated primary key.
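The check behind referential integrity can be sketched as a membership test. In the Python example below, the function name and sample keys are hypothetical, and skipping null (`None`) foreign key values is an assumption made for this illustration.

```python
def referential_integrity_violations(foreign_key_values, primary_key_values):
    """Return the distinct foreign key values that have no matching primary key."""
    primary_keys = set(primary_key_values)
    return sorted(
        {
            value
            for value in foreign_key_values
            if value is not None and value not in primary_keys  # assumption: nulls are skipped
        }
    )


# Hypothetical key columns, for illustration only.
violations = referential_integrity_violations([1, 2, 2, 7, None], [1, 2, 3])
print(violations)
```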
reject link
An output link that identifies errors when the stage is processing records and that routes those rejected records to a target stage. See also link, output link.
related term
A term in a glossary that is related to the term in question. This relationship can be used for "see also" relationships to terms that are similar but not identical. The relationship is symmetrical; that is, if you specify that term A has term B as a related term, then term B has term A as a related term. A term can have multiple related terms.
relationship
1. A defined connection between the rows of a table or the rows of two tables. A relationship is the internal representation of a referential constraint.
2. An association between glossary assets.
replaced by term
A term in a glossary that supersedes another term. Typically, deprecated terms specify replacement terms to identify which term replaces the deprecated term. See also deprecated term.
report
A set of data deliberately laid out to communicate business information.
repository
A persistent storage area for data and other application resources.
repository tier
The repository tier consists of the metadata repository and, if installed, other data stores to support other product modules. The metadata repository contains the shared metadata, data, and configuration information for InfoSphere Information Server product modules.
representative record
The record that is created during survivorship and populated with the best available data from a group of records. See also survivorship.
routine
A program or sequence of instructions called by a program. Typically, a routine has a general purpose and is frequently used.

S

score
In the matching process, the result of a match comparison. See also matching and match comparison.
separation character
A character that separates or delimits tokens. See also token.
separation list
The list of separation characters. See also separation character.
sequence job
A job whose job design is composed of job activities and the triggers between those job activities that are run in a specified order.
server engine
The component of the InfoSphere Information Server engine that runs server jobs and job sequences.
server job
A job that is compiled and run on the server engine.
services tier
The application server, common services, and product services for the InfoSphere Information Server suite and product modules and the computer or computers where those components are installed.
short description
A brief description that defines a term in a glossary. See also long description.
similarity threshold
In information analysis, a comparison threshold that defines the degree of variation that is allowed in the spelling or representation of another value.
source-to-target mapping
A row in a mapping specification that describes a transformation between one or more source columns and business terms to one or more target columns and business terms.
stage
The element of a job design that describes a data source, a data processing step, or a target system and that defines the processing logic that moves data from input links to output links. A stage is a configured instance of a stage type. See also job design, stage type.
stage type
An object that defines the capabilities of a stage, the parameters of the stage, and the libraries that the stage uses at run time. See also stage.
standard deviation
A measurement of how varied the values in a frequency distribution are from the average value of the distribution. A low standard deviation value means that the values are close to the average value, whereas a high standard deviation value means that the values are more widely dispersed over a large range of values.
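The contrast between a low and a high standard deviation can be seen in a small Python example. The sample values are invented, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, because the definition does not say which one is meant.

```python
import statistics

# Two hypothetical sets of values with the same average.
tight = [48, 50, 52, 50]
spread = [10, 50, 90, 50]

tight_mean = statistics.mean(tight)
spread_mean = statistics.mean(spread)

# Population standard deviation: square root of the mean squared deviation.
tight_sd = statistics.pstdev(tight)
spread_sd = statistics.pstdev(spread)

print(tight_mean, round(tight_sd, 4))
print(spread_mean, round(spread_sd, 4))
```

Both sets average 50, but the second set is dispersed over a much wider range, so its standard deviation is far larger.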
standardization
A process that separates records into parts, changes them to implement enterprise data quality standards, and potentially enriches the data for when it is used.
standardization rule
One or more conditions, such as a pattern, and the associated set of actions, which is used to standardize data. See also condition, pattern, action, and standardization.
standard term
A term in a glossary that has been thoroughly evaluated and approved by the team and that has been defined as definitively describing a characteristic of the enterprise or organization. See also candidate term.
standard value
The element of a classification definition that is a standardized spelling or representation of the value and that can be used to facilitate matching.
steward
The user or group of users that is responsible for the definition, purpose, and use of glossary assets or the information assets that are described in the metadata repository. The steward does not have to be a user of the glossary.
strip character
A character to be removed when parsing text into tokens. See also token.
strip list
The list of strip characters. See also strip character.
subscription
1. The set of mappings between source replication objects and target replication objects.
2. In the common event framework, a definition of how to process a certain type of event.
survivorship
The data cleansing process of evaluating a group of related records and creating one representative record. See also representative record.
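As a rough sketch of the idea, the following Python function builds one representative record from a group of related records. The survivorship rule that is used here (keep the most frequent non-empty value for each field) and the function name survive are assumptions for this example; real survivorship rules are configurable and can differ.

```python
from collections import Counter


def survive(records):
    """Build one representative record from a group of related records.

    Hypothetical rule: for each field, keep the most frequent non-empty value.
    """
    fields = sorted({key for record in records for key in record})
    representative = {}
    for field in fields:
        values = [record[field] for record in records if record.get(field)]
        representative[field] = (
            Counter(values).most_common(1)[0][0] if values else None
        )
    return representative


group = [
    {"name": "Jon Smith", "phone": ""},
    {"name": "John Smith", "phone": "555-0100"},
    {"name": "John Smith", "phone": "555-0100"},
]
print(survive(group))
```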
synonym
A term in a glossary that has the same meaning as another term. A term can have multiple synonyms. The relationship is symmetrical and transitive; that is, if term A is a synonym of term B, and term B is a synonym of term C, each term is a synonym of the others.
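Because the relationship is symmetrical and transitive, synonyms fall into groups in which every term is a synonym of every other term. The following sketch shows one way to derive those groups from individual synonym pairs; the function name synonym_groups and the sample terms are assumptions for this example.

```python
def synonym_groups(pairs):
    """Merge synonym pairs into groups where each term is a synonym of the others."""
    groups = []  # list of disjoint sets of terms
    for a, b in pairs:
        merged = {a, b}
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group          # shares a term, so absorb the group
            else:
                remaining.append(group)  # unrelated group stays as it is
        remaining.append(merged)
        groups = remaining
    return groups


pairs = [
    ("client", "customer"),
    ("customer", "account holder"),
    ("vendor", "supplier"),
]
for group in synonym_groups(pairs):
    print(sorted(group))
```

In this sketch, "client" and "account holder" end up in the same group even though they were never paired directly.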

T

table analysis
A data analysis process that consists of primary key analysis and the assessment of multicolumn primary keys and potential duplicate values.
technical metadata
Metadata that provides details about source and target systems, database table and field structures, and dependencies of information assets.
term
In a glossary, a word or phrase that describes a characteristic of the enterprise. By assigning assets to terms in the glossary, you can organize your information assets based on business meaning.
threshold
A customizable value for defining the acceptable tolerance limits (maximum, minimum, or reference limit) for an application resource or system resource. When the measured value of the resource is greater than the maximum value, less than the minimum value, or equal to the reference value, an exception or event is raised.
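The three tolerance limits in this definition can be sketched as a small Python class. The class name Threshold, the method name raises_event, and the sample limits are hypothetical and do not correspond to a product API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Tolerance limits for a measured resource value (all limits optional)."""
    maximum: Optional[float] = None
    minimum: Optional[float] = None
    reference: Optional[float] = None

    def raises_event(self, measured: float) -> bool:
        """Return True when the measured value breaches any configured limit."""
        if self.maximum is not None and measured > self.maximum:
            return True
        if self.minimum is not None and measured < self.minimum:
            return True
        if self.reference is not None and measured == self.reference:
            return True
        return False


# Hypothetical limits for a resource that is measured as a percentage.
cpu = Threshold(maximum=90.0, minimum=5.0)
print(cpu.raises_event(95.0), cpu.raises_event(50.0))
```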
tier
The logical group of components and the computers on which those components are installed.
token
A syntactic element, such as a phrase, a word, or a set of one or more characters, that is used for analyzing and processing text.
tokenization
The process that segments data into tokens. See also token and parsing.
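The following minimal sketch shows tokenization combined with a strip list: strip characters are removed first, and the remaining text is segmented on white space. The contents of STRIP_LIST and the function name tokenize are assumptions for this example, and splitting on white space is a simplification of how a product would separate tokens.

```python
# Hypothetical strip list: characters to remove before tokens are formed.
STRIP_LIST = frozenset(".,;:-")


def tokenize(text, strip_list=STRIP_LIST):
    """Remove strip characters, then segment the text into tokens on white space."""
    cleaned = "".join(ch for ch in text if ch not in strip_list)
    return cleaned.split()


print(tokenize("123 Main St., Apt. 4-B"))
```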
tolerance
See benchmark.
trigger
A representation of dependencies between workflow tasks that joins job activities in a sequence job. Job activities typically have one input trigger, but multiple output triggers.
two-source matching
The process of matching records between two sources. See also matching.

U

unduplicate
See deduplication.
unduplicate match
See one-source matching.
uniqueness
In information analysis, a measure of the values that occur exactly once in the table data.
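One possible reading of this measure is sketched below: count the distinct values that occur exactly once in a column and express that count as a share of all rows. The function name uniqueness, the sample column, and the choice of denominator are assumptions for this example; the product might calculate the measure differently.

```python
from collections import Counter


def uniqueness(values):
    """Return (count of values that occur exactly once, share of all rows)."""
    counts = Counter(values)
    unique = [value for value, n in counts.items() if n == 1]
    share = len(unique) / len(values) if values else 0.0
    return len(unique), share


column = ["A", "B", "B", "C", "D", "D", "D"]
unique_count, unique_share = uniqueness(column)
print(unique_count, round(unique_share, 3))
```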

V

validity
A data analysis process that evaluates columns for valid and invalid values.
value
1. When standardizing data, a phrase, a word, or a set of one or more characters that is used for analyzing and processing text. See also token.
2. The content of a variable, parameter, special register, or field.
value set
A named set of values that can be used to override the default values for the job parameters that are grouped in a parameter set. See also job parameter and parameter set.
view
A logical table that is based on data stored in an underlying set of tables. The data returned by a view is determined by a SELECT statement that is run on the underlying tables.
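To make the definition concrete, the following sketch creates a view in an in-memory SQLite database by using Python's sqlite3 module. The table name customer, the view name uk_customer, and the sample rows are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'UK'), (2, 'Linus', 'FI'), (3, 'Grace', 'US');
    -- The view stores no data; it is defined by a SELECT on the underlying table.
    CREATE VIEW uk_customer AS
        SELECT id, name FROM customer WHERE country = 'UK';
""")

rows_before = conn.execute("SELECT id, name FROM uk_customer ORDER BY id").fetchall()

# A change to the underlying table is visible through the view immediately.
conn.execute("INSERT INTO customer VALUES (4, 'Alan', 'UK')")
rows_after = conn.execute("SELECT id, name FROM uk_customer ORDER BY id").fetchall()

print(rows_before)
print(rows_after)
```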
virtual column
In information analysis, a single column or a concatenation of two or more columns that can be analyzed as if it were an existing physical data column.

W

weight
In the matching process, a factor that indicates the relative importance of part of a record. See also score.
workflow
The glossary development process that adds an approval step and a publishing step to the creation or revision of glossary content. New or revised glossary assets are added to the development glossary and sent for review and approval before being published to the published glossary.