A stream query is a continuous query over streaming data. ObjectGrid is a data caching product that stores data in maps. The streaming data is the in-flight data that is being stored, or has been stored, in ObjectGrid maps.
1. A motivating example
For example, suppose you are a system administrator and want to monitor the memory usage in a cluster of 50 servers. You would create an ObjectGrid map, currentServerMemory, to record the current memory usage of each server, and write a simple program that updates this map with the current memory usage every second. The key of the ObjectGrid map is the server name, and the value stored in the map is a ServerMemoryUsage object, which contains the server name, the used memory, and the available memory of the system.
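As a minimal sketch of the data model described above, the following Python uses a plain dictionary as a stand-in for the ObjectGrid map. It is not the ObjectGrid API. The field names of ServerMemoryUsage, the units, and the record_usage helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerMemoryUsage:
    """Value stored in the map; field names and units are assumed."""
    server_name: str
    used_memory_mb: int
    available_memory_mb: int


# A plain dict stands in for the ObjectGrid map currentServerMemory.
# Key: server name. Value: the latest ServerMemoryUsage for that server.
current_server_memory = {}


def record_usage(usage):
    """Insert or overwrite the entry for one server (hypothetical helper)."""
    current_server_memory[usage.server_name] = usage


# Two updates for the same server: only the latest value survives.
record_usage(ServerMemoryUsage("server01", 2048, 6144))
record_usage(ServerMemoryUsage("server01", 3072, 5120))
record_usage(ServerMemoryUsage("server02", 1024, 7168))
```

Because each write replaces the previous value for its key, the map on its own holds only the most recent reading per server.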
From this map, you can monitor the current memory usage for all 50 servers. However, you cannot get the historical memory usage for each server or do any statistical analysis of the memory usage. A stream query can be used to solve this problem.
A data stream in this case is the series of memory usage values written to the map over the desired period; for example, the past 5 minutes. Because the map is updated every second, this series consists of samples of the server memory usage taken at one-second intervals.
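The idea of a stream as the samples from a recent period can be sketched with a time-based sliding window. This is only an illustration of the concept in Python, not how the stream query engine stores data; the SlidingWindow class and the sample values are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Keeps the samples whose timestamps fall within the last span_seconds."""

    def __init__(self, span_seconds):
        self.span_seconds = span_seconds
        # (timestamp_seconds, used_memory_mb) pairs, oldest first.
        self.samples = deque()

    def add(self, timestamp, used_memory_mb):
        self.samples.append((timestamp, used_memory_mb))
        # Evict samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= timestamp - self.span_seconds:
            self.samples.popleft()


# One made-up sample per second for 10 minutes; the window keeps 5 minutes.
window = SlidingWindow(span_seconds=300)
for second in range(600):
    window.add(second, 2000 + second)
```

With one sample per second and a 300-second span, the window settles at 300 samples per server, which bounds the memory needed for the history.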
With the stream query feature, you can build queries to retrieve the statistical information you want. Here are a few examples:
- The average memory utilization over the past 5 minutes for all servers
- The five servers with the highest or lowest average memory utilization over the past minute
- The server with the highest or lowest average available memory over the past 30 seconds
- The server whose memory usage has increased the most compared to 30 seconds ago
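To make the first two example queries concrete, here is a small Python sketch that computes them over an in-memory history. In the product these would be expressed as stream queries; the function names, the tuple layout, and the readings below are all made up for illustration, and utilization is assumed to mean used / (used + available).

```python
def average_utilization(samples, now, span_seconds):
    """Mean of used / (used + available) over the window (now - span, now].

    samples is a list of (timestamp, used_mb, available_mb) tuples.
    Returns None when no sample falls inside the window.
    """
    ratios = [
        used / (used + available)
        for timestamp, used, available in samples
        if now - span_seconds < timestamp <= now
    ]
    return sum(ratios) / len(ratios) if ratios else None


def top_servers(history, now, span_seconds, count):
    """Names of the count servers with the highest average utilization."""
    averages = {
        name: average_utilization(samples, now, span_seconds)
        for name, samples in history.items()
    }
    ranked = sorted(
        (name for name, avg in averages.items() if avg is not None),
        key=lambda name: averages[name],
        reverse=True,
    )
    return ranked[:count]


# Made-up readings: (timestamp in seconds, used MB, available MB).
history = {
    "server01": [(58, 2000, 6000), (59, 4000, 4000), (60, 6000, 2000)],
    "server02": [(58, 1000, 7000), (59, 1000, 7000), (60, 1000, 7000)],
    "server03": [(58, 7000, 1000), (59, 7000, 1000), (60, 6000, 2000)],
}

busiest = top_servers(history, now=60, span_seconds=60, count=2)
```

This version recomputes every average on each call. A stream query instead keeps its results up to date as each new sample arrives.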
These queries can help you spot trouble, decide how to balance the load across the systems, and predict future behavior based on the latest trends.
2. The stream query runtime architecture
At the heart of the stream query is the Stream Processing Technology (SPT) engine. It is designed for real-time, high-performance, mission-critical operations. Fig. 1 shows how an SPT engine is used in an ObjectGrid environment.
Fig. 1 Stream query architecture
The SPT engine uses the concepts of streams and views. A stream represents a flow of raw incoming data. There can be any number of streams, and each stream can be updated at any frequency. Each stream is associated with an ObjectGrid stream map. Whenever an entry is inserted into or updated in the ObjectGrid stream map, a stream event is generated. Each stream has its own schema that defines its data fields and their types. The schema is defined using a special SPTSQL statement, which has a syntax similar to SQL. In the following text, the terms SQL and SPTSQL are used interchangeably.
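The relationship between a stream map and its stream events can be pictured as a map that notifies listeners on every write. The Python below is a conceptual sketch only: StreamMap, subscribe, and put are hypothetical names and do not correspond to ObjectGrid or SPT engine APIs.

```python
class StreamMap:
    """Dict-like stand-in for a stream map that notifies listeners on writes."""

    def __init__(self):
        self._entries = {}
        self._listeners = []

    def subscribe(self, listener):
        # The listener is called as listener(kind, key, value).
        self._listeners.append(listener)

    def put(self, key, value):
        # A write to a new key is an insert; a write to an existing key
        # is an update. Either one generates a stream event.
        kind = "update" if key in self._entries else "insert"
        self._entries[key] = value
        for listener in self._listeners:
            listener(kind, key, value)


events = []
stream_map = StreamMap()
stream_map.subscribe(lambda kind, key, value: events.append((kind, key, value)))

stream_map.put("server01", 2048)  # first write for this key: insert event
stream_map.put("server01", 3072)  # same key again: update event
```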
The outputs of the SQL statements are called views. These SQL statements define the processing rules, which can be very complex. For example, you can join multiple streams, apply aggregations and computations, apply window operations, and group results. See the Stream query engine language tutorial for the detailed syntax. The SQL statements are analyzed, and a graph is constructed as shown in Fig. 1. Each node in the graph has an input and an output. The inputs of the stream nodes are the ObjectGrid stream maps, and the outputs of the view nodes are the ObjectGrid view maps. The intermediate nodes connect with each other to apply the processing rules.
Each time a stream is updated, the SPT engine propagates the changes through the nodes and finally updates the views. The view changes (insertions, updates, and deletions) are then published to the ObjectGrid view maps.
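The propagation described above can be sketched as a tiny two-node graph: an intermediate node that applies a rule and a view node that turns the result into insertions, updates, and deletions on a view map. This is an illustration of the idea in Python, not the SPT engine's implementation; the class names, the use of None to signal a row leaving the view, and the 4000 MB threshold are all assumptions.

```python
class FilterNode:
    """Intermediate node: passes a row on only if it satisfies the predicate."""

    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def on_event(self, key, value):
        # Forward None for rows that do not qualify so the view can drop them.
        self.downstream.on_event(key, value if self.predicate(value) else None)


class ViewNode:
    """Output node: keeps the view map in sync and records each change."""

    def __init__(self):
        self.view_map = {}  # stand-in for an ObjectGrid view map
        self.changes = []   # (kind, key) pairs published to the view map

    def on_event(self, key, value):
        if value is None:
            if key in self.view_map:
                del self.view_map[key]
                self.changes.append(("delete", key))
        elif key in self.view_map:
            self.view_map[key] = value
            self.changes.append(("update", key))
        else:
            self.view_map[key] = value
            self.changes.append(("insert", key))


# View of the servers currently using more than 4000 MB.
view = ViewNode()
graph_entry = FilterNode(lambda used_mb: used_mb > 4000, view)

for key, used_mb in [("server01", 5000), ("server02", 1000),
                     ("server01", 6000), ("server01", 3000)]:
    graph_entry.on_event(key, used_mb)
```

In this run, server01 enters the view, is updated, and is then deleted when it falls below the threshold, while server02 never appears in the view at all.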
Stream queries are similar to database queries in how they analyze data. The difference is that stream queries are event driven: they operate continuously on data as it arrives and update the results in real time.
© Copyright IBM Corporation 2007,2009. All Rights Reserved.