The AssemblyLine
An AssemblyLine (AL) is a set of components strung together to move and transform data; it describes the "route" along which the data will pass. The data being handled along that route is represented as an Entry object. The AssemblyLine works with a single Entry at a time, one per cycle. It is the unit of work in IBM® Security Verify Directory Integrator and typically represents a flow of information from one or more data sources to one or more targets.
Overview
Some of the components that comprise the AssemblyLine retrieve data from one or more connected systems; data obtained this way is said to "feed" the AL. Data to be processed is fed into the AL one Entry at a time. Each Entry corresponds to a directory entry, database row, email, Lotus® Notes® document, record or similar data object, and carries Attributes that hold the data values read from fields or columns in the source system. These Attributes are renamed, reformatted or computed as processing flows from one component to the next in the AL. New information can be "joined" from other sources and all or parts of the transformed data can be written to target stores or sent to target systems as required. This can be illustrated thus:
In this diagram, picture the collection of large jigsaw puzzle pieces as the AssemblyLine, the leftmost blue dots and squares in the grey stream entering from below as raw data from an input stream, and the purple bits on the top right as data output on an output stream. The darker orange element intersecting a jigsaw piece with the bucket in it denotes a Parser, turning raw data into structured data, which then can start travelling down the AssemblyLine (as lighter-colored elements in a bucket). The middle jigsaw piece pictures a Connector reading already-structured data from, for example, a database.
Data enters the AssemblyLine from connected systems using Connectors in some sort of input Mode, and is output later to one or more connected systems using Connectors in some output Mode.
Data can be read from record-oriented systems, like a database or a message queue. In this case the various columns in the input are readily mapped into Attributes in the resulting work Entry, which is depicted as a "bucket" in the puzzle piece on the left. Alternatively, data can be read from a data stream, like a text file in a filesystem or a network connection. In this case, a Parser can be prefixed to the Connector to make sense of the input stream and cut it up into pieces that can then be assigned to Attributes in the work Entry.
Once the first Connector has done its work, the bucket of information (the "work Entry", called, appropriately, work) is passed along the AssemblyLine to the next component, which in the illustration is another Connector. Since the data from the first Connector is available, it can now be used as key information to look up data in the second connected system. Once the relevant data is found, it can be merged into work, complementing the data that is still there from the first Connector.
Finally, the merged data is passed along the AssemblyLine to the third puzzle piece, a Connector that this time is in some output Mode and takes care of outputting the data to the connected system. If the connected system is record-oriented, the various Attributes in work are simply mapped to columns in the record; if the connected system is stream-oriented, a Parser can do the necessary formatting.
Other components, like Script Components and Functions, can be inserted at will in the AssemblyLine to perform operations on the data in work.
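The flow described above (an input Connector, a lookup Connector, a scripted step and an output Connector) can be pictured with a small model in plain JavaScript. This is a conceptual sketch only and does not use the Directory Integrator API: plain objects stand in for Entries, an array and a Map stand in for connected systems, and every name in it (runAssemblyLine, hrRows, phoneDirectory, uid, displayName) is hypothetical.

```javascript
// Conceptual model of AssemblyLine cycles; NOT the product API.
// Hypothetical data: an array plays the record-oriented input system,
// a Map plays the system queried by the second Connector.
const hrRows = [
  { uid: "jdoe", name: "John Doe" },
  { uid: "asmith", name: "Anna Smith" },
];
const phoneDirectory = new Map([
  ["jdoe", { phone: "555-0100" }],
]);

function runAssemblyLine(inputRows, lookupTable, target) {
  // One loop iteration models one AssemblyLine cycle: one Entry at a time.
  for (const row of inputRows) {
    // Input stage: map the source columns into a fresh work Entry.
    const work = { ...row };

    // Lookup stage: data already in work is the key into the second system.
    const match = lookupTable.get(work.uid);
    if (match !== undefined) {
      Object.assign(work, match); // merge the found Attributes into work
    }

    // Script stage: compute a new Attribute from existing ones.
    work.displayName = work.name.toUpperCase();

    // Output stage: write the merged Entry to the target system.
    target.push(work);
  }
  return target;
}

const output = runAssemblyLine(hrRows, phoneDirectory, []);
console.log(output);
```

In this model an Entry without a match in the lookup table simply passes through unmerged; how a real lookup Connector reacts to a missing match depends on its configuration.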
It is important to keep in mind that the AssemblyLine is designed and optimized for working with one item at a time. If you want to process more than a single item at a time (for example, multiple updates or multiple deletes), you must write AssemblyLine scripts to do this. If necessary, this kind of processing can be implemented using JavaScript, Java™ libraries and standard IBM® Security Verify Directory Integrator functionality, such as pooling the data to a sorted data store (for example, with the JDBC Connector) and then reading it back and processing it with a second AssemblyLine.
AssemblyLines are built, configured and tested using the IBM® Security Verify Directory Integrator Config Editor (CE); see The Configuration Editor for more information. The AssemblyLine has a Data Flow tab in the Config Editor. This is where the list of components that make up the AL is kept.
All components in an AL are automatically registered as script variables. So if you have a Connector called ReadHRdump then you can access it and its methods directly from script using the ReadHRdump variable. As a result, you will want to name your AL components as you would script variables: use alphanumeric characters only, do not start the name with a number, and do not use special national characters (for example, å, ä), separators (apart from underscore '_'), white space, and so forth.
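The two-pass approach can be pictured as follows: a first pass pools Entries into a store, and a second pass reads them back in sorted order and handles each run of related Entries as one batch. This is a minimal sketch in plain JavaScript, not the JDBC Connector or any other product API; an in-memory array stands in for the sorted data store, and poolEntries, processGroups and the dept attribute are hypothetical names.

```javascript
// First pass: pool one entry per cycle into a store
// (an array stands in for a database table here).
function poolEntries(entries) {
  const store = [];
  for (const entry of entries) {
    store.push(entry);
  }
  return store;
}

// Second pass: read the store back sorted by `key` and hand each run of
// entries that share the same key value to `handler` as a single batch.
function processGroups(store, key, handler) {
  const sorted = [...store].sort((a, b) =>
    String(a[key]).localeCompare(String(b[key]))
  );
  let batch = [];
  for (const entry of sorted) {
    if (batch.length > 0 && batch[0][key] !== entry[key]) {
      handler(batch[0][key], batch); // key changed: flush the finished batch
      batch = [];
    }
    batch.push(entry);
  }
  if (batch.length > 0) {
    handler(batch[0][key], batch); // flush the final batch
  }
}

const store = poolEntries([
  { uid: "a1", dept: "sales" },
  { uid: "b2", dept: "hr" },
  { uid: "c3", dept: "sales" },
]);
const batches = {};
processGroups(store, "dept", (dept, batch) => {
  batches[dept] = batch.map((e) => e.uid);
});
console.log(batches);
```

Sorting first means the second pass only needs to watch for a change in the key to know that a batch is complete, so it never has to hold more than one group in memory at a time.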
There is always an alternative method for accessing an AL component (for example, the task.getConnector() function) but a conscious naming convention is always advisable.
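The naming rules above can be checked mechanically. The helper below is a hypothetical sketch in plain JavaScript (isScriptSafeName is not part of the product); it assumes that a leading underscore is acceptable and it does not check for JavaScript reserved words.

```javascript
// Hypothetical helper: tests a component name against the naming rules
// (ASCII letters, digits and underscore only; must not start with a digit).
function isScriptSafeName(name) {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(name);
}

console.log(isScriptSafeName("ReadHRdump"));   // true
console.log(isScriptSafeName("Read_HR_dump")); // true
console.log(isScriptSafeName("2ndLookup"));    // false: starts with a number
console.log(isScriptSafeName("Read HR dump")); // false: contains white space
console.log(isScriptSafeName("LäsHRdump"));    // false: national character
```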
Starting an AssemblyLine in IBM® Security Verify Directory Integrator is a fairly costly operation, as it involves the creation of a new Java thread and usually sets up connections to one or more data sources. Consider carefully if your solution design could be made to work with fewer, rather than more, distinct AssemblyLines, where each AssemblyLine does more work; for example, by using Branches or Switches to define multiple operations handled by a single AL. Note that each operation can still be implemented as a separate AssemblyLine, but these can be embedded "hot-and-ready" into a single AL that dispatches work to them by using the AL Connector or AL Function. This also allows you to leverage features like Global Connector Pools to manage resource usage and boost performance and scalability.
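The single-AL dispatch design can be pictured as a switch on an attribute of the work Entry. The following is a conceptual sketch in plain JavaScript, not the AL Connector or AL Function API: the handlers object stands in for embedded AssemblyLines that are kept ready, and the operation attribute, the status attribute and the handler names are all hypothetical.

```javascript
// Hypothetical handlers, each standing in for an embedded AssemblyLine
// that is set up once and then reused for every dispatched Entry.
const handlers = {
  add: (work) => ({ ...work, status: "added" }),
  delete: (work) => ({ ...work, status: "deleted" }),
};

// The dispatching flow: a switch on the (assumed) "operation" attribute
// selects which handler receives the work Entry.
function dispatch(work) {
  const known = Object.prototype.hasOwnProperty.call(handlers, work.operation);
  if (!known) {
    return { ...work, status: "unsupported" }; // default branch
  }
  return handlers[work.operation](work);
}

console.log(dispatch({ operation: "add", uid: "jdoe" }));
console.log(dispatch({ operation: "rename", uid: "jdoe" }));
```

Because the handlers are created once and reused, the per-Entry cost in this model is a property lookup rather than the start-up of a new flow, which is the point of the design described above.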
Components
AssemblyLines can include the following components:
Additionally, Connectors can have Parsers configured; also, at System, Config, AssemblyLine, Attribute map and Attribute level there are options to configure Null Behavior.
Accessing AL components inside the AssemblyLine
Each AL component is available as a pre-registered script variable with the name you chose for the component.
Note that you can dynamically load components with scripted calls to functions like system.getConnector(), although this is not for inexperienced users.¹
AssemblyLine parameter passing
There are three ways for data to get into an AssemblyLine:
- Generating your own initial entry inside the AssemblyLine; for example, in a Prolog script.
- Feeding it from one or more Iterators².
- Starting the AssemblyLine with parameters from another AssemblyLine using the AL Connector or AL Function Component, or using an API call.
If you want to start an AssemblyLine with parameters from another AssemblyLine, then you have a couple of options:
- Use the Task Call Block (TCB), which is the preferred method. See Task Call Block (TCB) for more information. This section also discusses techniques for dynamically disabling and enabling AssemblyLine components.
- Provide an Initial Work Entry directly; refer to Providing an Initial Work Entry (IWE) for details. Note: These options are provided for compatibility with earlier versions.
- Connectors
  Connectors are used to access and update information sources. The job of a Connector is to level the playing field so that you do not have to deal with the technical details of working with various data stores, systems, services or transports. As such, each type of Connector is designed to use a specific protocol or API, handling the details of data source access so that you can concentrate on the data manipulations and relationships, as well as custom processing like filtering and consistency control.
- Functions
  A Function, often referred to as a Function Component (FC), is a component much like a Connector, except that it does not have a mode setting. Whereas Connectors provide standard access verbs for connected systems (Lookup, Delete, Update, and so forth), Functions only perform a single operation, like pushing data through a Parser, dispatching work to another AssemblyLine or making a web service call.
- Script Components
  The Script Component (SC) is a user-defined block of JavaScript code that you can drop any place in the AssemblyLine Data Flow list, alongside your Connectors and Function Components, causing the script code within to be executed for each cycle at this point in the AL workflow.
- AttributeMaps
  Attribute Maps are pathways for data to flow into or out of the AssemblyLine. Attribute Maps appear in Connectors and Functions as Input and Output Maps, and are also available as stand-alone components in the AssemblyLine.
- Branch Components
  Branch components affect the order in which the other components, such as Connectors, Scripts, Functions, AttributeMaps and other Branch components, are executed in the Flow of the AssemblyLine.
- Parsers
  Parsers are used in conjunction with a byte stream component, for example, a File System Connector, to interpret or generate the structure of content being read or written.
- Controlling the flow of an AssemblyLine
  Hooks are programmable waypoints in the built-in automated behavior of IBM® Security Verify Directory Integrator, where you can impose your own logic.
¹ The Connector object you get from this call is a Connector Interface object, and is the data source specific part of an AssemblyLine Connector. When you change the type of any Connector, you are actually swapping out its data source intelligence (the Connector Interface) which provides the functionality for accessing data on a specific system, service or data store. Most of the functionality of an AssemblyLine Connector, including the attribute maps, Link Criteria and Hooks, is provided by the kernel and is kept intact when you switch Connector types.
² "Iterator" is shorthand for a Connector in Iterator Mode.