Parallel processing of unstructured data, Part 1: With DB2 LUW and GPFS SNC


From the developerWorks archives

Steve Raspudic and Alexander Abrashkevich

Date archived: January 13, 2017 | First published: January 30, 2014

Learn how unstructured data can be processed in parallel, both within a machine and across a series of machines, by leveraging IBM DB2® for Linux®, UNIX®, and Windows® and the GPFS™ shared-nothing cluster (SNC) to provide efficient, highly scalable access to unstructured data, all through a standard SQL interface. This capability can be realized on clusters of commodity hardware, whether provisioned in the cloud or deployed directly on bare-metal machines. Scalability is achieved through the principle of computation locality: each computation is performed on the host that has direct access to the data, which minimizes or eliminates network bandwidth requirements and removes the need for any shared compute resource.
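The computation-locality principle can be illustrated, independent of DB2 or GPFS, with a small scatter-gather sketch. This is a minimal Python sketch under stated assumptions: the per-host storage map (`LOCAL_DISKS`), the block names, and the per-host task (counting the word "error") are all hypothetical, and no DB2 or GPFS API is used. Each host scans only the blocks on its own disk, and only one small partial result per host is combined at the end.

```python
from collections import defaultdict

# Hypothetical cluster state: each host's local disk, keyed by block name.
# (In a real GPFS SNC deployment, block placement is managed by the file system.)
LOCAL_DISKS = {
    "host1": {"blk0": "error disk full", "blk3": "ok ok"},
    "host2": {"blk1": "error timeout", "blk4": "ok"},
    "host3": {"blk2": "error error ok"},
}


def owner_map(local_disks):
    """Invert per-host storage into a block -> owning-host map."""
    return {blk: host for host, blocks in local_disks.items() for blk in blocks}


def plan(owners):
    """Assign every block to the host that stores it (computation locality)."""
    work = defaultdict(list)
    for blk, host in owners.items():
        work[host].append(blk)
    return dict(work)


def read_local(local_disks, host, blk):
    """A host may only read its own disk; a non-local read raises KeyError."""
    return local_disks[host][blk]


def count_errors(local_disks, host, blocks):
    """Per-host scan: count occurrences of the word 'error' in local blocks only."""
    return sum(read_local(local_disks, host, b).split().count("error") for b in blocks)


def run(local_disks):
    work = plan(owner_map(local_disks))
    # Each host returns a single integer; only these partials would cross the network.
    partials = {h: count_errors(local_disks, h, blks) for h, blks in work.items()}
    return sum(partials.values()), partials


total, partials = run(LOCAL_DISKS)
print(total, partials)  # prints: 4 {'host1': 1, 'host2': 1, 'host3': 2}
```

Because the plan never schedules a block on a host that does not store it, no raw data moves between hosts; adding hosts (with their own disks) adds scan capacity without adding a shared bottleneck.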

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some steps and illustrations may have changed.
