Technical library


1 - 3 of 3 results

Parallel processing of unstructured data, Part 3: Extend the sashyReader
This series explores how to process unstructured data in parallel, within a machine and across a series of machines, using IBM DB2 for Linux, UNIX, and Windows (LUW) and a GPFS shared-nothing cluster (SNC) to provide efficient, scalable access to unstructured data through a standard SQL interface. This article shows how the Java-based sashyReader framework leverages the architectural features of DB2 LUW. The sashyReader provides parallel, scalable processing of unstructured data stored locally or in the cloud via an SQL interface. This is useful for data ingest, data cleansing, data aggregation, and other tasks that require scanning, processing, and aggregating large unstructured data sets. You also learn how to extend the sashyReader framework to read arbitrary unstructured text data by using dynamically pluggable Python classes.
Also available in: Russian  
Articles 22 May 2014
Parallel processing of unstructured data, Part 2: Use AWS S3 as an unstructured data repository
See how unstructured data can be processed in parallel. Leverage IBM DB2 for Linux, UNIX, and Windows to provide efficient, highly scalable access to unstructured data stored in the cloud.
Also available in: Russian  
Articles 13 Mar 2014
Parallel processing of unstructured data, Part 1: With DB2 LUW and GPFS SNC
Learn how unstructured data can be processed in parallel -- within a machine and across a series of machines -- by leveraging DB2 for Linux, UNIX, and Windows and GPFS SNC to provide efficient, highly scalable access to unstructured data, all through a standard SQL interface. Realize this capability with clusters of commodity hardware, provisioned either in the cloud or directly on bare metal. The framework achieves scalability through the principle of computation locality: computation is performed on the host that has direct access to the data, which minimizes or eliminates network bandwidth requirements and removes the need for any shared compute resource.
Also available in: Russian  
Articles 30 Jan 2014