IBM Streams 4.2

IBM Streams glossary

This glossary includes terms and definitions for IBM® Streams.

The following cross-references are used in this glossary:

See refers you from a term to a preferred synonym, or from an acronym or abbreviation to the defined full form.
See also refers you to a related or contrasting term.

To view glossaries for other IBM products, go to www.ibm.com/software/globalization/terminology.

@ A B C D E F G H I J L M N O P R S T V W Z

@

@autonomous: An SPL annotation that defines operators and streams as part of an autonomous region. The @autonomous annotation is associated with one or more operators that define the start of an autonomous region. By default, operators are implicitly defined in an autonomous region.
@catch: An SPL annotation that specifies that exceptions of a given type, thrown by an operator while it processes tuples, are caught.
@consistent: An SPL annotation that defines operators and streams as part of a consistent region. The @consistent annotation is associated with one or more operators that define the start of a consistent region. Consistent regions can merge if their subgraphs intersect.
@parallel: An SPL annotation that specifies that streaming data in specific regions of an IBM Streams application is processed in parallel.
@view: An SPL annotation that specifies that a view is created when the stream processing application runs.

A

accelerator: A set of software components that help developers implement big data solutions and help data scientists iterate on and refine data analyses to produce more meaningful results.
access control list (ACL): A list that IBM Streams uses to enforce security. An ACL consists of the type of domain or instance object to secure and the actions that a group, user, or role is authorized to perform against that object.
adapter: An intermediary software component that allows two other software components to communicate with one another.
analytics: The science of studying data in order to find meaningful patterns in the data and draw conclusions based on those patterns.
annotation: A modification to the invocation of an operator that alters the behavior of stream processing applications. Annotation names are prefixed with the at symbol (@).
annotation query language (AQL): A query language that is used for text-based information extraction.
application: One or more computer programs or software components that provide a function in direct support of a specific business process or processes. See also stream processing application.
application bundle file: A file that contains all toolkit artifacts that are needed to run a stream processing application. You submit the application bundle file (.sab) to the IBM Streams instance. See also stream processing application.
application description language file (ADL file): A configuration file that is created when a stream processing application is compiled. See also application bundle file and stream processing application.
application graph: See data flow graph.
application scope: An attribute of a stream processing application that limits which streams can connect to each other by using import and export operators. Two applications can connect only if they share an application scope.
AQL view: A collection of tuples that are produced from text extraction on a particular document.
attribute: A named data value in a tuple. Each attribute has a specific data type.
autonomous region: A region where operators are either implicitly or explicitly annotated with the @autonomous annotation. The SPL compiler calculates the extent of downstream operators that are contained in the region. During run time, operators in the region process tuples as they arrive. By default, operators are in an autonomous region.
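The access control list entry can be pictured as a small data structure: the type of object being secured plus, for each group, user, or role, the actions that it is authorized to perform. The following Python sketch is a hypothetical model for illustration only. It does not use any IBM Streams interface, and the object type, principal, and action names (instance, admins, alice, read, write) are assumptions rather than actual IBM Streams ACL values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    """Hypothetical ACL: the type of object being secured plus, for each
    principal (group, user, or role), the set of actions it may perform."""
    object_type: str
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, principal: str, *actions: str) -> None:
        # Add actions to the principal's existing set of permissions.
        self.grants.setdefault(principal, set()).update(actions)

    def is_authorized(self, principals: list[str], action: str) -> bool:
        # A request is allowed if any principal the caller acts as
        # (its user name, its groups, or its roles) holds the action.
        return any(action in self.grants.get(p, set()) for p in principals)


acl = AccessControlList(object_type="instance")
acl.grant("admins", "read", "write")
acl.grant("alice", "read")

print(acl.is_authorized(["alice"], "read"))             # True
print(acl.is_authorized(["alice"], "write"))            # False
print(acl.is_authorized(["alice", "admins"], "write"))  # True
```

Checking every principal that a caller acts as mirrors the entry's wording that permissions can be granted to a group, a user, or a role.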

B

basic domain: A domain that has a single resource and a single user. A basic domain uses Apache ZooKeeper to manage and store configuration information. See also domain and enterprise domain.
bootstrap properties: The bootstrap properties contain information necessary to start components of IBM Streams, such as the embedded ZooKeeper.
build configuration: The metadata that describes how an application is built, including compiler parameters and dependencies. See also launch configuration.

C

cluster: A logical grouping of resources in a network. See also resource.
code generation template: The mixed-mode source file that is used by a generic operator to generate specific customizations. See also generic operator.
collection: A kind of data type. The three kinds of collections are list, set, and map.
collection resource: A resource that provides access to information about a set of artifacts of the same type, such as jobs, operators, or processing elements.
composite operator: An operator that is implemented in the Streams Processing Language (SPL) and that encapsulates a subgraph of a data flow graph. A composite operator can be parameterized so that it can be reused in multiple stream processing applications. See also operator, data flow graph, subgraph, and main composite operator.
config: A directive that describes how the compiler builds an operator invocation or how the runtime system executes the operator. See also operator and operator invocation.
consistent region: A region that is defined by the @consistent SPL annotation on a starting operator. The SPL compiler calculates the extent of downstream operators that are contained in the region. During run time, operators in the region process all tuples at least once.
consistent state: A point in time where all tuples for all streams in a consistent region have been fully processed by the operators in the consistent region.
Custom operator: An operator that is written in the Streams Processing Language (SPL). Contrast with primitive operator. See also operator.

D

data flow: The transfer of data between constants, variables, and files by running statements, procedures, modules, or programs.
data flow graph: A representation of the set of operators and the streams that connect them within a stream processing application. See also operator and stream.
data mining: The process of collecting critical business information from a data warehouse, correlating the information and uncovering associations, patterns, and trends.
data parallelism: A situation in which parallel tasks perform the same computation on different sets of data.
data source: The source of data itself, such as a database or XML file, and the connection information necessary for accessing the data.
deploy: To place files into an operational environment. For example, when you submit an application bundle file, the IBM Streams instance automatically deploys the file to the resource where the job runs.
deserialize: To reconstruct an object from serialized data.
distributed execution environment: The IBM Streams instance runtime system that executes a stream processing application. See also stand-alone execution environment.
distributed file system: A file system that is composed of files or directories that physically exist on more than one computer in a communications network.
domain: An IBM Streams domain is a logical grouping of resources in a network for common management and administration. To use IBM Streams, you must create at least one domain. See also basic domain and enterprise domain.
domain cache: The domain cache contains a mapping between domains and their ZooKeeper connection information. The domain cache is used when you run streamtool commands. There is one domain cache per user, and it is stored in the .streams/var directory.
domain controller: A domain controller service runs on every resource in the domain and manages all of the other services on that resource.
domain host installation package: A subset of the IBM Streams installation package that you can use to install IBM Streams with a smaller footprint on hosts in a domain. At least one host must have the full IBM Streams product image installed. See also installation package.
Domain Manager: A user interface that you can use to perform postinstallation tasks such as creating, starting, stopping, and removing IBM Streams domains.

E

embedded ZooKeeper: An embedded version of Apache ZooKeeper that is installed with the product and managed by IBM Streams. You can use the embedded ZooKeeper to manage and store configuration information in basic domains.
engine: A program that provides essential functions for other programs.
enterprise domain: A domain that can have multiple resources and users. This type of domain is typically used for production environments. You can configure high availability so that IBM Streams continues to run even if resources fail or become unavailable. See also domain and basic domain.
exported stream: A sequence of tuples that is output from an operator and made available for other operators and stream processing applications to import. The stream can be imported only by applications that are running in the same IBM Streams instance. See also stream.
expression mode: In a composite operator parameter declaration, the type of an accepted parameter value. The expression mode can be an attribute, an expression, or a constant.
externally managed resource: A resource that is allocated to IBM Streams by an external resource manager, such as IBM Platform Symphony®, Apache Hadoop YARN, or a user-defined resource manager. See also resource.
external resource manager: A resource manager that runs separately from IBM Streams and allocates externally managed resources that can be used to run IBM Streams domain and instance services and stream processing applications. See also IBM Streams resource manager.
external ZooKeeper: An external Apache ZooKeeper server is required for an enterprise domain. The ZooKeeper server or collection of servers (also known as an ensemble) must be installed and configured before you create the enterprise domain. See also enterprise domain and embedded ZooKeeper.

F

failover: An automatic operation that switches to a standby service in the event of a software, hardware, or network interruption.
feed: A data format that contains periodically updated content that is available to multiple users, applications, or both.
fuse: To combine multiple operator invocations in a data flow graph into the same partition and thus into the same processing element.

G

generic operator: A primitive operator that contains mixed-mode code, both C++ code and Perl code. The Perl code generates C++ code that augments the other C++ code and provides specific customization for that operator invocation. See also non-generic operator and primitive operator.

H

high availability: The process of monitoring resources and applications for errors and failing over to standby services in order to maintain availability of those resources for consumers. This concept is sometimes also known as continuous operations.
host: A computer that IBM Streams uses as a resource for running domain services, instance services, and stream processing applications. See also IBM Streams resource.

I

IBM Streams Console: A user interface that you can use to monitor and manage the resources, instances, and applications in a domain.
IBM Streams instance: Each IBM Streams instance operates as an autonomous unit and can be shared by multiple users. You can create multiple instances within a domain.
IBM Streams resource: A resource that is allocated by the IBM Streams resource manager. IBM Streams resources are always hosts. See also resource.
IBM Streams resource manager: The default resource manager in IBM Streams, which allocates IBM Streams resources that can be used to run domain and instance services and stream processing applications. See also external resource manager.
imported stream: A sequence of tuples that is imported by an operator. Imported streams are matched to exported streams either by subscription (an expression that is evaluated against the properties of exported streams) or by stream name. The stream can be imported only from operators or applications that are running in the same IBM Streams instance. See also stream.
installation package: An installable unit of a software product. Software product packages are separately installable units that can operate independently from other packages of that software product. See also domain host installation package.
instance: A specific occurrence of an object that belongs to a class.
instance graph: A graphical view of the stream processing applications that are running on an IBM Streams instance. Color schemes communicate the health of the processing elements and streams as well as other metrics, such as data flow rates or the number of processed tuples.
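The matching rules in the exported stream, imported stream, and application scope entries can be summarized in a short Python sketch. It is a hypothetical model, not the IBM Streams implementation. Stream properties are modeled as a dict, a subscription is modeled as a Python predicate over those properties, and the instance and scope names (inst1, finance) are made up. In SPL, these settings are supplied to the import and export operators, which are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExportedStream:
    # Hypothetical model of an exported stream and where it runs.
    name: str
    properties: dict
    instance: str
    app_scope: str


@dataclass
class ImportRequest:
    # An import matches either by stream name or by a subscription,
    # modeled here as a predicate over the exported stream's properties.
    instance: str
    app_scope: str
    stream_name: Optional[str] = None
    subscription: Optional[Callable[[dict], bool]] = None


def matches(imp: ImportRequest, exp: ExportedStream) -> bool:
    # Streams connect only within the same instance and application scope.
    if imp.instance != exp.instance or imp.app_scope != exp.app_scope:
        return False
    if imp.stream_name is not None:
        return imp.stream_name == exp.name
    if imp.subscription is not None:
        return imp.subscription(exp.properties)
    return False


exported = [
    ExportedStream("Quotes", {"kind": "quote", "exchange": "NYSE"}, "inst1", "finance"),
    ExportedStream("Trades", {"kind": "trade"}, "inst1", "finance"),
    ExportedStream("Quotes", {"kind": "quote"}, "inst1", "other"),
]

by_sub = ImportRequest("inst1", "finance", subscription=lambda p: p.get("kind") == "quote")
print([e.name for e in exported if matches(by_sub, e)])  # ['Quotes']
```

Only the first exported stream matches. The third stream has matching properties, but it belongs to a different application scope.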

J

job: An instance of a running stream processing application as defined in the application bundle file. See also stream processing application.
job group: A group of jobs that have the same authority or permissions.

L

launch configuration: The metadata that describes how a stream processing application is launched, including a reference to the instance, the ADL file, and runtime parameters. See also build configuration.
logical application: A compiled stream processing application. See also physical application.

M

main composite operator: A composite operator that encapsulates the data flow graph, that is the root of that graph, that has no input or output ports, and that when compiled represents a stream processing application. See also composite operator and data flow graph.
mixed-mode application: An application that includes mixed-mode code, both Perl code and SPL code. The Perl code augments the existing SPL code. See also stream processing application.
mutability: The capability of modifying tuples on a port. Both input ports and output ports can be defined as mutating or non-mutating.

N

namespace: A logical container in which all the names are unique. The unique identifier for an artifact is composed of the namespace and the local name of the artifact.
native function: A function that is written in C++ or Java™ code and that can be invoked from SPL code.
node: A computer that is part of a clustered system. See also host.
non-generic operator: A primitive operator implemented entirely in C++ code. See also generic operator and primitive operator.

O

operator: A program that processes tuples in an incoming stream and produces an output stream as a result. An operator can have any number of input ports and any number of output ports. See also tuple, composite operator, and primitive operator.
operator instance: See operator invocation.
operator invocation: An instance of an operator that was defined for a specific context. See also operator.
operator model: An XML document that describes the basic syntactic and semantic properties of a primitive operator. See also primitive operator.

P

parallelism: The state of a computer program in which parts of the program can be concurrently executed.
parallel transformation: The application of transformation rules for creating a physical application from a logical application in order to enable data parallelism in a stream processing application. See also logical application and physical application.
partition: (1) A set of operator invocations that are fused together into a processing element. See also operator invocation, fuse, and processing element. (2) In a window, a logical set of tuples based on an expression. See also window.
PE: See processing element.
permission: The ability to perform an action against an IBM Streams object, such as a domain, instance, host, or job. Permissions are assigned by using an access control list.
physical application: An application that was submitted to the runtime system. See also logical application.
port: The point of connection of an operator to a stream. Input ports consume one or more streams, whereas output ports produce a stream.
primitive operator: An operator that is implemented in the C++ or Java language and that includes an operator model that describes the syntax and semantics of the operator.
process: (1) An instance of a program running on a system and the resources that it uses. (2) A sequence of instructions that a computer can interpret and run without a user's intervention.
processing element (PE): An operating system process that includes the operators and streams that are defined in a data flow graph or subgraph of a stream processing application.
program: A sequence of instructions that a computer can interpret and run without a user's intervention.
project: In Streams Studio, a logical container for the artifacts of toolkits and stream processing applications.
punctuation: A control signal within a stream that either creates boundaries within a stream of tuples (window punctuation marks) or identifies the end of a stream (final punctuation marks). See also window.
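The following Python sketch illustrates the two kinds of punctuation. It treats a stream as a sequence in which integers stand in for tuples and two hypothetical marker values stand in for window and final punctuation. A consumer closes a group of tuples at each window punctuation mark and stops reading at the final punctuation mark. This is a conceptual model only, not the IBM Streams runtime representation of punctuation.

```python
from enum import Enum


class Punct(Enum):
    # Hypothetical markers standing in for the two kinds of punctuation.
    WINDOW = "window"
    FINAL = "final"


def split_on_punctuation(stream):
    """Group tuples between window punctuation marks; stop at final punctuation."""
    batches, current = [], []
    for item in stream:
        if item is Punct.WINDOW:
            batches.append(current)  # a window mark closes the current group
            current = []
        elif item is Punct.FINAL:
            break  # end of stream: anything after it is ignored in this model
        else:
            current.append(item)
    if current:
        batches.append(current)  # keep tuples that arrived before the final mark
    return batches


stream = [1, 2, Punct.WINDOW, 3, Punct.WINDOW, 4, 5, Punct.FINAL, 6]
print(split_on_punctuation(stream))  # [[1, 2], [3], [4, 5]]
```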

R

region: A subgraph of an SPL application where the operators and streams are related. Typically, operators are related by an annotation.
resource: A physical or logical entity that IBM Streams uses to run services. See also externally managed resource and IBM Streams resource.
role: A collection of permissions that can be assigned to a user or group of users.

S

sink operator: An operator that sends information as a stream to an external system, such as a dashboard, web server, mail server, or a database.
source operator: An operator that fetches information from an external system, such as a sensor, messaging system, or a database, and presents that information as a stream.
splitter: A runtime component that exists on the output port of an operator and that sends tuples to different channels in the parallel region.
stand-alone execution environment: The platform that runs a stream processing application locally as an executable and does not require the IBM Streams runtime system. See also distributed execution environment.
standby: An idle service that is available to replace another service that is currently in use. See also high availability.
stateful: Of or pertaining to a system or process that tracks the state of interaction. Stateful means the computer or program tracks the state of interaction, usually by setting values in a storage field that is designated for that purpose.
stateless: Having no record of previous interactions. A stateless server processes requests based solely on information that is provided with the request itself, and not based on memory from earlier requests.
stream: A sequence of tuples. See also tuple.
stream processing application: An application that consists of a main composite operator with at least one primitive operator and possibly one or more composite or primitive operators, all of which process streams of data. See also main composite operator, composite operator, and primitive operator.
Streams Processing Language (SPL): A programming language that is used to create stream processing applications. See also stream processing application.
subgraph: A data flow graph for a composite operator that is reused by stream processing applications. See also data flow graph.
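A splitter decides which channel of a parallel region receives each tuple. The following Python sketch shows two common distribution strategies, round-robin and hashing on a key attribute, to illustrate that role. The function names and the use of CRC-32 as the hash are assumptions made for this sketch. It does not describe how the IBM Streams splitter is implemented or which strategy it uses for a given region.

```python
from itertools import cycle
from zlib import crc32


def round_robin_split(tuples, width):
    """Send tuples to channels 0..width-1 in turn."""
    channels = [[] for _ in range(width)]
    for channel, tup in zip(cycle(range(width)), tuples):
        channels[channel].append(tup)
    return channels


def hash_split(tuples, width, key):
    """Send tuples that share a value of the key attribute to the same channel."""
    channels = [[] for _ in range(width)]
    for tup in tuples:
        # crc32 gives a hash that is stable across runs, unlike Python's salted str hash.
        channel = crc32(str(tup[key]).encode()) % width
        channels[channel].append(tup)
    return channels


quotes = [{"symbol": s, "price": p}
          for s, p in [("IBM", 1.0), ("ACME", 2.0), ("IBM", 3.0), ("XYZ", 4.0)]]
print([[q["price"] for q in c] for c in round_robin_split(quotes, 2)])  # [[1.0, 3.0], [2.0, 4.0]]
```

Hashing on a key attribute keeps all tuples with the same key value in one channel, which matters when the parallel operators are stateful. Round-robin distribution suits stateless operators, where any channel can process any tuple.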

T

tag: An identifier that is associated with one or more resources and helps group resources that have different physical characteristics or logical uses. Resources can have any number of tags.
toolkit: A collection of artifacts that are organized into a package. A toolkit includes one or more namespaces, which contain the functions, operators, and types that are packaged as part of the toolkit, all of which can then be reused in other applications.
trigger: A mechanism that detects an occurrence and can cause processing in response.
tuple: An individual piece of data in a stream that is represented as a set of attributes and data values. Typically, the data values in a tuple represent a single observation of data, such as a stock ticker quote or a temperature reading from an individual sensor.

V

view: The metadata that describes how the runtime system samples the tuples in a stream for visualization. See also AQL view.

W

window: A logical container for a defined set of tuples that were received by an input port of an operator and that are typically maintained in memory.
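The following Python sketch models one common window policy: a count-based tumbling window. The window emits its contents and empties each time it holds a given number of tuples, and an optional partition attribute is supported, as in the second sense of partition. The function and attribute names are hypothetical. SPL windows are declared on an operator's input port and also support other policies, such as sliding windows, that this sketch does not model.

```python
from collections import defaultdict


def tumbling_count_windows(tuples, size, partition_by=None):
    """Yield (partition, batch) each time a partition's window holds `size` tuples."""
    buffers = defaultdict(list)  # one in-memory buffer per partition
    for tup in tuples:
        part = tup[partition_by] if partition_by is not None else None
        buffers[part].append(tup)
        if len(buffers[part]) == size:
            yield part, buffers[part]
            buffers[part] = []  # tumbling: the window empties after it fires


readings = [{"sensor": "a", "t": 20}, {"sensor": "b", "t": 30},
            {"sensor": "a", "t": 21}, {"sensor": "a", "t": 22},
            {"sensor": "b", "t": 31}]
for part, batch in tumbling_count_windows(readings, 2, partition_by="sensor"):
    print(part, [r["t"] for r in batch])
# a [20, 21]
# b [30, 31]
```

The third reading from sensor a stays buffered because its partition has not yet collected two tuples.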

Z

ZooKeeper connection string: One or more host and port pairs that identify the ZooKeeper servers for use with IBM Streams.
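A ZooKeeper connection string is conventionally written as comma-separated host:port pairs, for example zk1.example.com:2181,zk2.example.com:2181 (the host names are placeholders). The following Python sketch splits such a string into pairs. It is an illustration only and is not part of IBM Streams. It also does not handle IPv6 literals or the optional chroot path suffix that ZooKeeper clients accept.

```python
from __future__ import annotations


def parse_zk_connection_string(conn: str) -> list[tuple[str, int]]:
    """Split 'host1:port1,host2:port2' into (host, port) pairs."""
    pairs = []
    for entry in conn.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))  # int() raises ValueError for a bad port
    return pairs


print(parse_zk_connection_string("zk1.example.com:2181,zk2.example.com:2181"))
# [('zk1.example.com', 2181), ('zk2.example.com', 2181)]
```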