Using Cascading 3.0.2 on IOP platform

The following directions detail the manual installation of software into IBM Open Platform for Apache Hadoop. These directions, and any binaries that may be provided as part of this article (either hosted by IBM or otherwise), are provided for convenience and make no guarantees as to stability, performance, or functionality of the software being installed. Product support for this software will not be provided (including upgrade support for either IOP or the software described). Questions or issues encountered should be discussed on the BigInsights StackOverflow forum or the appropriate Apache Software Foundation mailing list for the component(s) covered by this article.

Overview

Cascading is a development platform for big data and a software abstraction layer for Apache Hadoop. Cascading offers rich APIs that enable users to abstract standard data processing operations away from underlying computation fabrics. This article introduces the benefits of Cascading and shows you how to use Cascading to create and run applications over Hadoop.

Cascading benefits

Cascading supports JVM-based languages (Java, JRuby, Clojure, and so on) and enables you to create unit tests easily.
It provides rich, unified APIs that are independent of the underlying computation fabric.
Cascading offers Hadoop development compatibility.
No installation is required; the Cascading libraries are pulled in as dependencies when your application is built.

Configuration

You can use your favorite IDE to set up the development environment. In this section, we show you how to build a Cascading project with Gradle or Maven.

To set up Cascading as a Gradle project, add the following settings to the build.gradle file.

    repositories {
        maven { url = 'http://conjars.org/repo/' }
    }
    ext.cascadingVersion = '3.0.2'
    dependencies {
        compile ( group: 'cascading', name: 'cascading-core', version: cascadingVersion )
        compile ( group: 'cascading', name: 'cascading-local', version: cascadingVersion )
        compile ( group: 'cascading', name: 'cascading-hadoop', version: cascadingVersion )
        compile ( group: 'cascading', name: 'cascading-xml', version: cascadingVersion )
        testCompile ( group: 'cascading', name: 'cascading-platform', version: cascadingVersion )
    }

To set up Cascading as a Maven project, add the following settings to the pom.xml file.

    <repositories>
      <repository>
        <id>conjars.org</id>
        <url>http://conjars.org/repo</url>
      </repository>
    </repositories>
    <properties>
      <cascading.version>3.0.2</cascading.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-core</artifactId>
        <version>${cascading.version}</version>
      </dependency>
      <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-local</artifactId>
        <version>${cascading.version}</version>
      </dependency>
      <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-hadoop</artifactId>
        <version>${cascading.version}</version>
      </dependency>
      <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-xml</artifactId>
        <version>${cascading.version}</version>
      </dependency>
      <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-platform</artifactId>
        <version>${cascading.version}</version>
        <scope>test</scope>
      </dependency>
    </dependencies>

WordCount in Cascading

In this example, the Cascading application is built with Gradle. The example WordCount application is part 2 of the Cascading Impatient Series.
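For orientation, a word count can be approximated locally with standard shell tools, which is handy for sanity-checking a small input before submitting a job. This is only a rough sketch and not the Cascading application: the function name `local_wordcount` is hypothetical, and it splits on every non-alphanumeric character, which may differ from the splitter used in the tutorial code.

```shell
# Count tokens read from standard input and print "token<TAB>count" lines.
# Tokens are case-sensitive; any run of non-alphanumeric characters is
# treated as a separator (an assumption, not the tutorial's exact rule).
local_wordcount() {
    tr -cs '[:alnum:]' '\n' |
        grep -v '^$' |
        sort |
        uniq -c |
        awk '{ printf "%s\t%d\n", $2, $1 }'
}
```

For example, `printf 'rain rain, go away.\n' | local_wordcount` lists `away`, `go`, and `rain` with their counts, one per line.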

Download the Impatient sample project to your development machine and open the build.gradle file for part 2:

    mkdir -p cascading-impatient
    cd cascading-impatient/
    wget https://github.com/Cascading/Impatient/archive/master.zip
    unzip master
    cd Impatient-master/part2
    vim build.gradle

Search for “hadoopVersion” and modify the version, as shown in the following example:

    ext.hadoopVersion = 'hadoop version' // Replace with the Hadoop version that ships with your IOP release.
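If you prefer not to edit the file by hand, the version can be read from the cluster and written into build.gradle with a small script. The following is a minimal sketch: the function names `parse_hadoop_version` and `set_hadoop_version` are hypothetical, and it assumes that the first line of the `hadoop version` output has the form `Hadoop <version>`. A vendor-specific version string may not exist in the public repositories, so you may still need to trim it to the base Apache version.

```shell
# Print the version from the first line of `hadoop version` output,
# assuming that line looks like "Hadoop 2.7.1".
parse_hadoop_version() {
    awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Rewrite the ext.hadoopVersion line of a Gradle build file in place.
# $1 = path to build.gradle, $2 = version string.
# A backup copy is kept next to the file with a .bak suffix.
set_hadoop_version() {
    sed -i.bak \
        "s/^\([[:space:]]*ext\.hadoopVersion[[:space:]]*=[[:space:]]*\).*/\1'$2'/" \
        "$1"
}
```

On a node where the `hadoop` command is available, this could be used as `set_hadoop_version build.gradle "$(hadoop version | parse_hadoop_version)"`.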

Build the project by running the following command:
```
gradle clean jar  
```
Optional: If Kerberos is enabled on the cluster, complete the following steps:
1. Create a Kerberos principal (for example, ‘cascading’) in the form user_name@realm_name:
```
    kadmin.local
    addprinc cascading@IBM.COM
    # enter a password when prompted
```
2. Create the user on all the nodes in your cluster:
```
    useradd cascading -g hadoop  
```
3. Log on as that user and obtain a Kerberos ticket:
```
    su - cascading
    kinit cascading
```

Upload the JAR file to one BigInsights client node, and run the following commands:

    cd /root/cascading-impatient/Impatient-master/part2
    hadoop fs -put -f data/rain.txt /user/cascading/
    yarn jar build/libs/impatient.jar /user/cascading/rain.txt /user/cascading/wordcount

View the results by running the following commands:

    hadoop fs -get /user/cascading/wordcount
    cat wordcount/*
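For larger outputs it can help to look only at the most frequent words. The sketch below assumes that each result line is a word and a count separated by a tab, and it skips any line whose second field is not a number (such as a header row, if one is present). The function name `top_words` is hypothetical.

```shell
# Read "word<TAB>count" lines from standard input and print the rows with
# the highest counts. $1 = number of rows to show (defaults to 10).
top_words() {
    awk -F '\t' '$2 ~ /^[0-9]+$/' |
        sort -t "$(printf '\t')" -k2,2nr |
        head -n "${1:-10}"
}
```

After fetching the results as shown above, this could be run as `cat wordcount/* | top_words 5`.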

More examples

After downloading the resource from GitHub, you can try the entire Impatient Series. The official Cascading web site provides tutorials and examples for further study.
