The IBM PureData System for Analytics appliances are built to store and process large amounts of data efficiently in a massively parallel processing architecture. An appliance can store many TB of compressed data and distributes the work over many CPU cores. The main components of an appliance are the host, whose role is to coordinate and orchestrate the work performed, and the SPUs (Snippet Processing Units), where the actual work takes place. Clients interact with the appliance through the host, never directly with the SPUs.
The most efficient use of the appliance's capabilities is through queries that are processed on the appliance, leveraging both the available in-database analytics and the massively parallel architecture, and that return a limited result set to the client (typically fewer than 1 million rows). In this scenario, ODBC or JDBC connectivity for the client is more than adequate.
For some applications, processing on the appliance is not an option and a large result set must be returned for processing outside the appliance. For this case, the appliance's NPS software implements the external table mechanism, which can return these large result sets efficiently. Using this mechanism, an appliance can stream results at speeds of up to 5 TB/hr. Receiving the same data through a 'SELECT * FROM' query over ODBC is counterproductive: it essentially reduces the host to an intermediate storage device that holds large result sets until the client has completely processed them. The host was not designed for this role, and using the appliance in this manner significantly reduces its capability.
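To put the 5 TB/hr figure in perspective, the following minimal sketch compares transfer times for a large result set. The 2 TB result size and the 1 Gbit/s client link are hypothetical values chosen for illustration; they do not come from any measurement, and the link figure ignores protocol overhead, so real ODBC/JDBC throughput would be lower still.

```python
# Back-of-envelope comparison of transfer times (decimal units: 1 TB = 1e12 bytes).
# The result size and the client link speed are illustrative assumptions.

def gbit_per_s_to_tb_per_hr(gbit_per_s: float) -> float:
    """Convert a raw link speed in Gbit/s to TB/hr, ignoring protocol overhead."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return bytes_per_s * 3600 / 1e12


def transfer_hours(size_tb: float, rate_tb_per_hr: float) -> float:
    """Hours needed to move size_tb at a sustained rate of rate_tb_per_hr."""
    return size_tb / rate_tb_per_hr


if __name__ == "__main__":
    result_size_tb = 2.0          # hypothetical result set size
    external_table_rate = 5.0     # TB/hr, the upper bound quoted in the text
    client_link_rate = gbit_per_s_to_tb_per_hr(1.0)  # assumed 1 Gbit/s link

    print(f"external table: {transfer_hours(result_size_tb, external_table_rate):.2f} h")
    print(f"1 Gbit/s link:  {transfer_hours(result_size_tb, client_link_rate):.2f} h")
```

Under these assumptions the external table path finishes in well under an hour, while the single client link needs several hours, during which the host would have to hold the result set.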
To follow best practice for this type of requirement, consider setting up common storage that external tables can use to load data into, and unload data from, the appliance. ODBC and JDBC protocols, coupled with the limits of network bandwidth, cannot reach the speeds of loading from storage via external tables.
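As an illustration of the unload direction, here is a minimal sketch that builds a CREATE EXTERNAL TABLE ... AS SELECT statement. The helper name, the table name SALES, and the path /export/shared/sales.dat are hypothetical, and the statement shape and the DELIMITER option should be checked against the NPS documentation for the release in use. The path is assumed to be on common storage that the host can see.

```python
# Sketch: build an NPS unload statement that writes a query result to a file
# via an external table. All names and paths below are hypothetical.

def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_unload_sql(select_sql: str, target_path: str, delimiter: str = "|") -> str:
    """Return a CREATE EXTERNAL TABLE statement that unloads select_sql to target_path."""
    return (
        f"CREATE EXTERNAL TABLE {quote_literal(target_path)} "
        f"USING (DELIMITER {quote_literal(delimiter)}) "
        f"AS {select_sql}"
    )


if __name__ == "__main__":
    # Hypothetical table and shared-storage path, for illustration only.
    print(build_unload_sql("SELECT * FROM sales", "/export/shared/sales.dat"))
```

The generated statement can be submitted through any SQL client. Only the short statement travels over the client connection, while the bulk data goes to the file rather than back through ODBC or JDBC.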
At the architecture design stage, verify that there is no discrepancy between the requirement and the method chosen to satisfy it. This is not to say that the appliance is unable to process this type of workload, simply that there are much better ways to achieve the desired result.