The following directions detail the manual installation of software into IBM Open Platform for Apache Hadoop. These directions, and any binaries that may be provided as part of this article (either hosted by IBM or otherwise), are provided for convenience and make no guarantees as to stability, performance, or functionality of the software being installed. Product support for this software will not be provided (including upgrade support for either IOP or the software described). Questions or issues encountered should be discussed on the BigInsights StackOverflow forum or the appropriate Apache Software Foundation mailing list for the component(s) covered by this article.
Overview
Cascading is a development platform for big data and a software abstraction layer for Apache Hadoop. Cascading offers rich APIs that enable users to abstract standard data processing operations away from underlying computation fabrics. This article introduces the benefits of Cascading and shows you how to use Cascading to create and run applications over Hadoop.
Cascading benefits
- Cascading supports JVM-based languages (Java, JRuby, Clojure, and so on) and enables you to create unit tests easily.
- It provides a rich, unified API that does not depend on a particular computation fabric.
- Cascading offers Hadoop development compatibility.
- There is no separate installation step; the Cascading libraries are bundled into your application at build time.
Configuration
You can use your favorite IDE to set up the development environment. In this section, we show you how to build a Cascading project with Gradle or Maven.
To set up Cascading as a Gradle project, add the following settings to the build.gradle file:
    repositories {
        maven { url = 'http://conjars.org/repo/' }
    }

    ext.cascadingVersion = '3.0.2'

    dependencies {
        compile( group: 'cascading', name: 'cascading-core', version: cascadingVersion )
        compile( group: 'cascading', name: 'cascading-local', version: cascadingVersion )
        compile( group: 'cascading', name: 'cascading-hadoop', version: cascadingVersion )
        compile( group: 'cascading', name: 'cascading-xml', version: cascadingVersion )
        testCompile( group: 'cascading', name: 'cascading-platform', version: cascadingVersion )
    }

To set up Cascading as a Maven project, add the following settings to the pom.xml file:
    <repository>
      <id>conjars.org</id>
      <url>http://conjars.org/repo</url>
    </repository>

    <properties>
      <cascading.version>3.0.2</cascading.version>
    </properties>

    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-core</artifactId>
      <version>${cascading.version}</version>
    </dependency>
    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-local</artifactId>
      <version>${cascading.version}</version>
    </dependency>
    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-hadoop</artifactId>
      <version>${cascading.version}</version>
    </dependency>
    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-xml</artifactId>
      <version>${cascading.version}</version>
    </dependency>
    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-platform</artifactId>
      <version>${cascading.version}</version>
      <scope>test</scope>
    </dependency>

WordCount in Cascading
In this example, the Cascading application is built with Gradle. The example WordCount application is part 2 of the Cascading Impatient Series.
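To make the goal of the example concrete, the following plain-Java sketch computes a word count in the same split, group, and count shape. It does not use the Cascading API. The class name WordCountSketch, the delimiter set, and the sample lines are assumptions made for illustration, and the real application may tokenize its input differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Assumed delimiter set: space, square brackets, parentheses, comma, period.
    private static final String DELIMITERS = "[ \\[\\]\\(\\),.]";

    /** Splits each line into tokens, groups equal tokens, and counts each group. */
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(DELIMITERS)))
                .filter(token -> !token.isEmpty()) // drop empty tokens between adjacent delimiters
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample input, not the contents of rain.txt.
        List<String> lines = Arrays.asList("rain falls, rain stops.", "the rain (again)");
        countWords(lines).forEach((token, count) -> System.out.println(token + "\t" + count));
    }
}
```

In a Cascading application the same three stages would be expressed as pipe operations and run on the cluster rather than in a single JVM.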
- Download the Cascading package to your development machine and open the build file for part 2:
    mkdir -p cascading-impatient
    cd cascading-impatient/
    wget https://github.com/Cascading/Impatient/archive/master.zip
    unzip master
    cd Impatient-master/part2
    vim build.gradle
- Search for “hadoopVersion” and modify the version, as shown in the following example:
    ext.hadoopVersion = 'hadoop version' // Set this to the Hadoop version that ships with your IOP release.
- Build the project by running the following command:
    gradle clean jar
- Optional: If Kerberos is installed, complete the following steps:
  - Create a user (for example, ‘cascading’), specifying user_name@realm_name:
      kadmin.local
      addprinc cascading@IBM.COM
      (enter a password when prompted)
  - Create the user on all the nodes in your cluster:
      useradd cascading -g hadoop
  - Log in as that user and obtain a Kerberos ticket:
      su - cascading
      kinit cascading
- Upload the JAR file to a BigInsights client node, and run the following commands:
    cd /root/cascading-impatient/Impatient-master/part2
    hadoop fs -put -f data/rain.txt /user/cascading/
    yarn jar build/libs/impatient.jar /user/cascading/rain.txt /user/cascading/wordcount
- View the results by running the following commands:
    hadoop fs -get /user/cascading/wordcount
    cat wordcount/*
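Once the results are on the local file system, a small helper can list the most frequent tokens. The sketch below is a hypothetical post-processing step, not part of the Impatient project. It assumes that each result line has the form token, tab, count, possibly preceded by a header line, and it skips any line whose second column is not a number. The class name TopTokens and the sample rows are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class TopTokens {
    /** Parses "token<TAB>count" lines and returns the n rows with the highest counts. */
    public static List<Map.Entry<String, Long>> top(List<String> lines, int n) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.split("\t", -1);
            if (cols.length != 2 || !cols[1].matches("\\d+")) {
                continue; // header line or malformed row
            }
            rows.add(new SimpleEntry<>(cols[0], Long.parseLong(cols[1])));
        }
        // Highest count first; rows with equal counts keep their input order.
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows.subList(0, Math.min(n, rows.size()));
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that result file; otherwise use a tiny made-up sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("token\tcount", "air\t1", "rain\t5", "shadow\t3");
        for (Map.Entry<String, Long> row : top(lines, 2)) {
            System.out.println(row.getKey() + "\t" + row.getValue());
        }
    }
}
```

To try it on real output, compile the class and pass the path of one result file from the wordcount directory as the first argument.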
More examples
After you download the sources from GitHub, you can work through the rest of the Impatient Series. The official Cascading web site provides tutorials and examples for further study.