IBM InfoSphere Streams Version 4.1.1
InfoSphere Streams glossary
This glossary includes terms and definitions for InfoSphere® Streams.
The following cross-references are used in this glossary:
- See refers you from a term to a preferred synonym, or from an acronym or abbreviation to the defined full form.
- See also refers you to a related or contrasting term.
To view glossaries for other IBM® products, go to www.ibm.com/software/globalization/terminology.
@
- @autonomous
- An SPL annotation that defines operators and streams as part of an autonomous region. The @autonomous annotation is associated with one or more operators that define the start of an autonomous region. By default, operators are implicitly defined in an autonomous region.
- @consistent
- An SPL annotation that defines operators and streams as part of a consistent region. The @consistent annotation is associated with one or more operators that define the start of a consistent region. Consistent regions can merge if their subgraphs intersect.
- @view
- An SPL annotation that specifies that a view is created when the streams processing application runs.
A
- accelerator
- A set of software components that help developers implement big data solutions and that help data scientists iterate on and refine data analyses to produce more meaningful results.
- access control list (ACL)
- InfoSphere Streams uses access control lists to enforce security. An ACL is composed of the type of domain or instance object to secure and the actions that a group, user, or role is authorized to perform against the object.
- adapter
- An intermediary software component that allows two other software components to communicate with one another.
- analytics
- The science of studying data in order to find meaningful patterns in the data and draw conclusions based on those patterns.
- annotation
- A modification to the invocation of an operator that alters the behavior of streams processing applications. Annotation names are prefixed with the at symbol (@).
- annotation query language (AQL)
- A query language that is used for text-based information extraction.
- application
- One or more computer programs or software components that provide a function in direct support of a specific business process or processes. See also streams processing application.
- application bundle file
- A file that contains all toolkit artifacts that are needed to run a streams processing application. You submit the application bundle file (.sab) to the InfoSphere Streams instance. See also streams processing application.
- application description language file (ADL file)
- A configuration file that is created when a streams processing application is compiled. See also application bundle file and streams processing application.
- application graph
- See data flow graph.
- application scope
- An attribute of a streams processing application that limits which streams can connect to each other by using import and export operators. An application can only connect to another application if they share an application scope.
- AQL view
- A collection of tuples that are produced from text extraction on a particular document.
- attribute
- A named data value in a tuple. Each attribute has a specific data type.
- autonomous region
- A region where operators are either implicitly or explicitly annotated with the @autonomous annotation. The SPL compiler calculates the extent of downstream operators that are contained in the region. During run time, operators in the region process tuples as they arrive. By default, operators are in an autonomous region.
B
- basic domain
- A basic domain has a single resource and user. It uses Apache ZooKeeper for managing and storing configuration information. See also domain and enterprise domain.
- bootstrap properties
- The bootstrap properties contain information necessary to start components of InfoSphere Streams, such as the embedded ZooKeeper.
- build configuration
- The metadata that describes how an application is built, including compiler parameters and dependencies. See also launch configuration.
C
- cluster
- A logical grouping of resources in a network. See also resource.
- code generation template
- The mixed-mode source file that is used by a generic operator to generate specific customizations. See also generic operator.
- collection
- A kind of data type. The three kinds of collections are list, set, and map.
- collection resource
- A resource that provides access to information about a set of artifacts of the same type, such as jobs, operators, or processing elements.
- composite operator
- An operator that is implemented in the Streams Processing Language (SPL) and that encapsulates a subgraph of a data flow graph. A composite operator can be parameterized to make it reusable in multiple streams processing applications. See also operator, data flow graph, subgraph, and main composite operator.
- config
- A directive that describes how the compiler builds an operator invocation or how the runtime system executes the operator. See also operator and operator invocation.
- consistent region
- A region that is defined by the @consistent SPL annotation on a starting operator. The SPL compiler calculates the extent of downstream operators that are contained in the region. During run time, operators in the region process all tuples at least once.
- consistent state
- A point in time where all tuples for all streams in a consistent region have been fully processed by the operators in the consistent region.
- Custom operator
- An operator that is written in the Streams Processing Language (SPL). Contrast with primitive operator. See also operator.
D
- data flow
- The transfer of data between constants, variables, and files by running statements, procedures, modules, or programs.
- data flow graph
- A representation of the set of operators and the streams that connect them within a streams processing application. See also operator and stream.
- data mining
- The process of collecting critical business information from a data warehouse, correlating the information, and uncovering associations, patterns, and trends.
- data parallelism
- A situation in which parallel tasks perform the same computation on different sets of data.
- data source
- The source of data itself, such as a database or XML file, and the connection information necessary for accessing the data.
- deploy
- To place files into an operational environment. For example, when you submit an application bundle file, the InfoSphere Streams instance automatically deploys the file to the resource where the job runs.
- deserialize
- To reconstruct an object from serialized data.
- distributed application
- A streams processing application that runs on the runtime system. See also stand-alone application.
- distributed file system
- A file system that is composed of files or directories that physically exist on more than one computer in a communications network.
- domain
- An InfoSphere Streams domain is a logical grouping of resources in a network for common management and administration. To use InfoSphere Streams, you must create at least one domain. See also basic domain and enterprise domain.
- domain cache
- The domain cache contains a mapping between domains and their ZooKeeper connection information. The domain cache is used when you run streamtool commands. There is one domain cache per user, and it is stored in the .streams/var directory.
- domain controller
- A domain controller service runs on every resource in the domain and manages all of the other services on that resource.
- domain host installation package
- A subset of the InfoSphere Streams installation package, which you can use to install InfoSphere Streams with a smaller footprint on hosts in a domain. The larger InfoSphere Streams product image must be installed on at least one host. See also installation package.
- Domain Manager
- A user interface that you can use to perform postinstallation tasks such as creating, starting, stopping, and removing InfoSphere Streams domains.
E
- embedded ZooKeeper
- An embedded version of Apache ZooKeeper is installed with the product and managed by InfoSphere Streams. You can use this embedded ZooKeeper for managing and storing configuration information in basic domains.
- engine
- A program that provides essential functions for other programs.
- enterprise domain
- An enterprise domain can have multiple resources and users. This type of domain is typically used for production environments. You can configure high availability to ensure that InfoSphere Streams can continue to run even if resources fail or are not available. See also domain and basic domain.
- exported stream
- A sequence of tuples that is output from an operator, making it available for other operators and streams processing applications to import. The stream can only be imported by applications that are running in the same streaming middleware instance. See also stream.
- expression mode
- In a composite operator parameter declaration, the type of an accepted parameter value. The expression mode can be an attribute, an expression, or a constant.
- externally managed resource
- A resource that is allocated to InfoSphere Streams by an external resource manager, such as IBM Platform Symphony®, Apache Hadoop YARN, or a user-defined resource manager. See also resource.
- external resource manager
- A resource manager that runs separately from InfoSphere Streams and allocates externally managed resources that can be used to run InfoSphere Streams domain and instance services and streams processing applications. See also InfoSphere Streams resource manager.
- external ZooKeeper
- An external Apache ZooKeeper server is required for an enterprise domain. The ZooKeeper server or collection of servers (also known as an ensemble) must be installed and configured before you create the enterprise domain. See also enterprise domain and embedded ZooKeeper.
F
- failover
- An automatic operation that switches to a standby service in the event of a software, hardware, or network interruption.
- feed
- A data format that contains periodically updated content that is available to multiple users, applications, or both.
- fuse
- To combine multiple operator invocations in a data flow graph into the same partition and thus into the same processing element.
G
- generic operator
- A primitive operator that contains mixed-mode code, both C++ code and Perl code. The Perl code generates C++ code that augments the other C++ code and provides specific customization for that operator invocation. See also non-generic operator and primitive operator.
H
- high availability
- The process of monitoring resources and applications for errors and failing over to standby services in order to maintain availability of those resources for consumers. This concept is sometimes also known as continuous operations.
- host
- A computer that InfoSphere Streams uses as a resource for running domain services, instance services, and streams processing applications. See also InfoSphere Streams resource.
I
- IBM InfoSphere Streams Console
- A user interface that you can use to monitor and manage the resources, instances, and applications in a domain.
- imported stream
- A sequence of tuples that is imported by an operator. Imported streams are matched to exported streams. The match can be done by subscription (also known as properties) or by a stream name. The stream can only be imported from operators or applications that are running in the same streaming middleware instance. See also stream.
- InfoSphere Streams instance
- Each InfoSphere Streams instance operates as an autonomous unit and can be shared by multiple users. You can create multiple instances within a domain.
- InfoSphere Streams resource
- A resource that is allocated by the InfoSphere Streams resource manager. InfoSphere Streams resources are always hosts. See also resource.
- InfoSphere Streams resource manager
- The default resource manager in InfoSphere Streams, which allocates InfoSphere Streams resources that can be used to run domain and instance services and streams processing applications. See also external resource manager.
- installation package
- An installable unit of a software product. Software product packages are separately installable units that can operate independently from other packages of that software product. See also domain host installation package.
- instance
- A specific occurrence of an object that belongs to a class.
- instance graph
- A graphical view of the streams processing applications that are running on an InfoSphere Streams instance. Color schemes communicate the health of the processing elements and streams as well as other metrics, such as data flow rates or the number of processed tuples.
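The exported stream and imported stream entries describe matching by subscription (properties) or by stream name, within a single instance. As a rough illustration only, the following Python sketch models that matching. The class names, the dictionary-based properties, and the equality-only subscription are hypothetical simplifications and are not part of any InfoSphere Streams API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExportedStream:
    # Hypothetical model of an exported stream; not a Streams API type.
    instance: str                                   # instance where the exporting job runs
    name: Optional[str] = None                      # stream name, if exported by name
    properties: dict = field(default_factory=dict)  # exported properties


@dataclass
class ImportRequest:
    # Hypothetical model of what an importing operator asks for.
    instance: str
    name: Optional[str] = None           # match by stream name
    subscription: Optional[dict] = None  # match by properties (equality only)


def matches(imp: ImportRequest, exp: ExportedStream) -> bool:
    """Return True if the import request can connect to the exported stream."""
    # A stream can be imported only within the same instance.
    if imp.instance != exp.instance:
        return False
    if imp.name is not None:
        return imp.name == exp.name
    if imp.subscription is not None:
        # Every subscribed key must be exported with the same value.
        return all(exp.properties.get(k) == v for k, v in imp.subscription.items())
    return False


exported = ExportedStream(instance="inst1", name="quotes", properties={"kind": "stock"})
by_name = ImportRequest(instance="inst1", name="quotes")
by_props = ImportRequest(instance="inst1", subscription={"kind": "stock"})
other_instance = ImportRequest(instance="inst2", name="quotes")
```

In this sketch, both `by_name` and `by_props` match `exported`, whereas `other_instance` does not because it refers to a different instance.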
J
- job
- An instance of a running streams processing application as defined in the application bundle file. See also streams processing application.
- job group
- A group of jobs that have the same authority or permissions.
L
- launch configuration
- The metadata that describes how a streams processing application is launched, including a reference to the instance, the ADL file, and runtime parameters. See also build configuration.
- logical application
- A compiled streams processing application. See also physical application.
M
- main composite operator
- The composite operator that is the root of a data flow graph. A main composite operator encapsulates the graph, has no input or output ports, and, when compiled, represents a streams processing application. See also composite operator and data flow graph.
- mixed-mode application
- An application that includes mixed-mode code, both Perl code and SPL code. The Perl code augments the existing SPL code. See also streams processing application.
- mutability
- The capability of modifying tuples on a port. Both input ports and output ports can be defined as mutating or non-mutating.
N
- namespace
- A logical container in which all the names are unique. The unique identifier for an artifact is composed of the namespace and the local name of the artifact.
- native function
- A function that is written in C++ or Java™ code and that can be invoked from SPL code.
- node
- A computer that is part of a clustered system. See also host.
- non-generic operator
- A primitive operator implemented entirely in C++ code. See also generic operator and primitive operator.
O
- operator
- A program that processes tuples in an incoming stream and produces an output stream as a result. An operator can have any number of input ports and any number of output ports. See also tuple, composite operator, and primitive operator.
- operator instance
- See operator invocation.
- operator invocation
- An instance of an operator that was defined for a specific context. See also operator.
- operator model
- An XML document that describes the basic syntactic and semantic properties of a primitive operator. See also primitive operator.
P
- parallelism
- The state of a computer program in which parts of the program can be concurrently executed.
- parallel transformation
- The application of transformation rules for creating a physical application from a logical application in order to enable data parallelism in a streams processing application. See also logical application and physical application.
- partition
- (1) A set of operator invocations that are fused together into a processing element. See also operator invocation, fuse, and processing element.
- (2) In a window, a logical set of tuples based on an expression. See also window.
- PE
- See processing element.
- permission
- The ability to perform an action against an InfoSphere Streams object, such as a domain, instance, host, or job. Permissions are assigned by using an access control list.
- physical application
- An application that was submitted to the runtime system. See also logical application.
- port
- The point of connection of an operator to a stream. Input ports consume one or more streams, whereas output ports produce a stream.
- primitive operator
- An operator that is implemented in the C++ or Java language and that includes an operator model that describes the syntax and semantics of the operator.
- process
- (1) An instance of a program running on a system and the resources that it uses.
- (2) A sequence of instructions that a computer can interpret and run without a user's intervention.
- processing element (PE)
- An individual execution program that includes the operators and streams that are defined in a data flow graph or subgraph of a streams processing application.
- program
- A sequence of instructions that a computer can interpret and run without a user's intervention.
- project
- In Streams Studio, a logical container for the artifacts of toolkits and streams processing applications.
- punctuation
- A control signal within a stream that either creates boundaries within a stream of tuples (window punctuation marks) or identifies the end of a stream (final punctuation marks). See also window.
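The punctuation entry distinguishes window punctuation marks from final punctuation marks. A minimal Python sketch of that distinction follows. The `WINDOW` and `FINAL` marker objects and the `split_on_punctuation` function are hypothetical names chosen for illustration, and the sketch does not reflect how the runtime system represents punctuation.

```python
from typing import Any, Iterable, Iterator, List

# Hypothetical marker objects standing in for punctuation marks.
WINDOW = object()  # window punctuation: closes the current group of tuples
FINAL = object()   # final punctuation: no more tuples will arrive


def split_on_punctuation(items: Iterable[Any]) -> Iterator[List[Any]]:
    """Yield one list of tuples per window punctuation; stop at final punctuation."""
    batch: List[Any] = []
    for item in items:
        if item is WINDOW:
            yield batch
            batch = []
        elif item is FINAL:
            break
        else:
            batch.append(item)
    # In this sketch, tuples seen after the last window punctuation are still emitted.
    if batch:
        yield batch


stream = [1, 2, WINDOW, 3, WINDOW, 4, 5, FINAL, 6]
batches = list(split_on_punctuation(stream))
```

Here `batches` holds three groups, and the value that follows the final punctuation is never read.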
R
- region
- A subgraph of an SPL application where the operators and streams are related. Typically, operators are related by an annotation.
- resource
- A physical or logical entity that InfoSphere Streams uses to run services. See also externally managed resource and InfoSphere Streams resource.
- role
- A collection of permissions that can be assigned to a user or group of users.
S
- sink operator
- An operator that sends information as a stream to an external system, such as a dashboard, web server, mail server, or a database.
- source operator
- An operator that fetches information from an external system, such as a sensor, messaging system, or a database, and presents that information as a stream.
- splitter
- A runtime component that exists on the output port of an operator and that sends tuples to different channels in the parallel region.
- stand-alone application
- A streams processing application that runs locally as an executable and does not require the runtime system. See also distributed application.
- standby
- An idle service that is available to replace another service that is currently in use. See also high availability.
- stateful
- Of or pertaining to a system or process that tracks the state of interaction, usually by setting values in a storage field that is designated for that purpose.
- stateless
- Having no record of previous interactions. A stateless server processes requests based solely on information that is provided with the request itself, and not based on memory from earlier requests.
- stream
- A sequence of tuples. See also tuple.
- streams processing application
- An application that consists of a main composite operator that contains at least one primitive operator and possibly additional composite or primitive operators, all of which process streams of data. See also main composite operator, composite operator, and primitive operator.
- Streams Processing Language (SPL)
- A programming language that is used to create streams processing applications. See also streams processing application.
- subgraph
- A data flow graph for a composite operator that is reused by streams processing applications. See also data flow graph.
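The splitter entry describes a component that sends tuples to different channels in a parallel region. The following Python sketch shows two plausible distribution strategies, round-robin and key-based. Both strategies, the `split` function, and its parameters are assumptions made for illustration; the glossary does not state which strategies the runtime system uses.

```python
from typing import Any, Callable, Iterable, List, Optional


def split(tuples: Iterable[Any], channels: int,
          key: Optional[Callable[[Any], int]] = None) -> List[List[Any]]:
    """Distribute tuples over `channels` lists.

    Without a key, tuples are assigned round-robin. With a key, tuples that
    share a key value always go to the same channel.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    out: List[List[Any]] = [[] for _ in range(channels)]
    for i, t in enumerate(tuples):
        index = (key(t) if key is not None else i) % channels
        out[index].append(t)
    return out


round_robin = split(range(6), channels=3)
by_key = split([("a", 1), ("b", 2), ("a", 3)], channels=2,
               key=lambda t: ord(t[0]))
```

Key-based distribution keeps related tuples on one channel, which matters when the operators in a channel hold state for each key.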
T
- tag
- An identifier that is associated with one or more resources and helps group resources that have different physical characteristics or logical uses. Resources can have any number of tags.
- toolkit
- A collection of artifacts that are organized into a package. A toolkit includes one or more namespaces, which contain the functions, operators, and types that are packaged as part of the toolkit, all of which can then be reused in other applications.
- trigger
- A mechanism that detects an occurrence and can cause processing in response.
- tuple
- An individual piece of data in a stream that is represented as a set of attributes and data values. Typically, the data values in a tuple represent a single observation of data, such as a stock ticker quote or a temperature reading from an individual sensor.
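The tuple, attribute, stream, and operator entries fit together as follows: a tuple is a set of named, typed attributes, a stream is a sequence of tuples, and an operator consumes one stream and produces another. The Python sketch below is an analogy only. The `Reading` type and the `readings` and `warm` functions are hypothetical and are not SPL.

```python
from typing import Iterator, NamedTuple


class Reading(NamedTuple):
    # Hypothetical tuple type: each attribute has a name and a data type.
    sensor_id: str
    celsius: float


def readings() -> Iterator[Reading]:
    """A stream modeled as a sequence of tuples."""
    yield Reading("s1", 20.5)
    yield Reading("s2", 19.0)
    yield Reading("s1", 21.5)


def warm(stream: Iterator[Reading], threshold: float) -> Iterator[Reading]:
    """A filter-like operator: consumes one stream and produces another."""
    return (r for r in stream if r.celsius > threshold)


result = list(warm(readings(), 20.0))
```

Each `Reading` represents a single observation, in the same way that the tuple entry describes a temperature reading from an individual sensor.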
V
- view
- The metadata that describes how the runtime system samples the tuples in a stream for visualization. See also AQL view.