Datameer

Datameer Analytics Solution (DAS) is a Hadoop-based solution for big data analytics that includes data source integration, storage, an analytics engine and visualization.

About this task

Follow these steps to use Datameer within the MapReduce framework in IBM® Spectrum Symphony:

Procedure

  1. Download Datameer. For the versions of Cloudera’s Distribution including Hadoop (CDH) that the MapReduce framework in IBM Spectrum Symphony supports, see Supported distributed file systems for MapReduce or YARN integration.

    For information on installing Datameer, refer to the Datameer documentation.

  2. Unzip the Datameer distribution package and install Datameer as described in the Datameer installation guide. Modify and run the following commands as they apply to your installation:

    export INSTALL_LOCATION=/opt/datameer

    cd $INSTALL_LOCATION

    unzip das-1.3.7-0.20.2.zip

    cd das-1.3.7-0.20.2

  3. Edit the etc/das-env.sh file to provide values for the following properties:
    • export DAS_PORT=8081
    • export DAS_DB_MODE=hsql-file
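
As a rough sketch, the two settings above could be applied with a small shell helper that can be rerun safely. The function name set_das_export is hypothetical (it is not part of Datameer), and the sketch assumes that etc/das-env.sh uses plain `export NAME=value` lines and that the values contain no `|` character.

```shell
# Hypothetical helper (not shipped with Datameer): set "export NAME=value"
# in an env file, replacing an existing export of NAME or appending a new one.
set_das_export() {
    local file="$1" name="$2" value="$3"
    if grep -q "^export ${name}=" "$file" 2>/dev/null; then
        # Rewrite the existing line in place; sed keeps the original as FILE.bak.
        sed -i.bak "s|^export ${name}=.*|export ${name}=${value}|" "$file"
    else
        # No export of NAME yet (or the file does not exist): append one.
        echo "export ${name}=${value}" >> "$file"
    fi
}

# Example usage, assuming the install path from step 2:
# set_das_export /opt/datameer/das-1.3.7-0.20.2/etc/das-env.sh DAS_PORT 8081
# set_das_export /opt/datameer/das-1.3.7-0.20.2/etc/das-env.sh DAS_DB_MODE hsql-file
```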
  4. Move the bundled Hadoop JAR files out of the library directory and create symbolic links to your distribution’s Hadoop and Jackson JAR files for HDFS integration:

    cd /opt/datameer/das-1.3.7-0.20.2/webapps/ROOT/WEB-INF/lib

    mv hadoop-0.20.2-core.jar hadoop-0.20.2-tools.jar /tmp

    ln -s /usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar .

    ln -s /usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar .

    ln -s /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar .

    ln -s /usr/lib/hadoop/hadoop-tools-0.20.2-cdh3u2.jar .
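
The linking commands above fail silently into dangling links if a JAR path does not match your distribution. A minimal sketch of a safer variant follows; link_jars is a hypothetical helper name, and the sketch assumes the JAR paths are absolute.

```shell
# Hypothetical helper: link each given JAR into a target lib directory,
# skipping (and reporting) any source file that does not exist so that
# no dangling symbolic link is created.
link_jars() {
    local lib_dir="$1" jar status=0
    shift
    for jar in "$@"; do
        if [ -f "$jar" ]; then
            # -f replaces a link of the same name left by an earlier run.
            ln -sf "$jar" "$lib_dir/"
        else
            echo "missing: $jar" >&2
            status=1
        fi
    done
    return $status
}

# Example usage with the paths from this step:
# link_jars /opt/datameer/das-1.3.7-0.20.2/webapps/ROOT/WEB-INF/lib \
#     /usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar \
#     /usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar \
#     /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar \
#     /usr/lib/hadoop/hadoop-tools-0.20.2-cdh3u2.jar
```

A nonzero return status means at least one JAR was not found, so the file names should be checked against your CDH installation before starting Datameer.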

  5. Edit the $INSTALL_LOCATION/das-1.3.7-0.20.2/bin/conductor.sh file:
    1. Add PMR_LD_PATH and include it before the existing $JAVA_LIBRARY_PATH. For example:
      @@ -171,6 +171,8 @@
         JAVA_PLATFORM=`CLASSPATH=${HADOOP_JAR} $JAVA -Xmx32m org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
      
         JAVA_LIBRARY_PATH=${homeDir}/lib/native/${JAVA_PLATFORM}
      + PMR_LD_PATH="/opt/pmr/soam/mapreduce/version/os_type/lib64:/opt/pmr/soam/mapreduce/version/os_type/lib:/opt/pmr/soam/version/os_type/lib64:/opt/pmr/soam/version/os_type/lib:/opt/pmr/perf/soam/version/linux-x86_64/lib:/opt/pmr/perf/ego/ego_version/linux-x86_64/lib:/opt/pmr/soam/version/os_type/lib:/opt/pmr/ego_version/os_type/lib"
      + JAVA_LIBRARY_PATH=$PMR_LD_PATH:$JAVA_LIBRARY_PATH
      fi
      
      echo "DeployMode: $DAS_DEPLOY_MODE"
    2. Replace the custom JAR location with the PMR library directories, change the link prefix for the custom JARs, and restrict find to the top level of each directory. For example:
      @@ -186,11 +188,11 @@
          
       # Link custom libraries
       if [[ $command == start || $command == restart ]]; then
      -    CUSTOM_JARS=${homeDir}/etc/custom-jars
      +    CUSTOM_JARS="/opt/pmr/soam/version/os_type/lib /opt/pmr/soam/mapreduce/version/os_type/lib /opt/pmr/soam/mapreduce/version/os_type/lib/cloudera-cdh3u2 /opt/pmr/soam/mapreduce/version/os_type/lib/hadoop-0.20.203"
      
                JAR_DIRS="${homeDir}/job-jar ${homeDir}/webapps/ROOT/WEB-INF/lib"
      -    LINK_PREFIX=das-custom-jar-
      +    LINK_PREFIX=a-
          
      -    customJars=`find $CUSTOM_JARS -name "*.jar"`
      +    customJars=`find $CUSTOM_JARS -maxdepth 1 -name "*.jar"`
                for dir in $JAR_DIRS; do
                             jars=`find $dir -name "$LINK_PREFIX*.jar"`
                             for jar in $jars; do
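
The PMR_LD_PATH value in the first diff contains the placeholders version, os_type, and ego_version, which must be replaced with the values for your installation. One way to avoid typing the long string by hand is sketched below; build_pmr_ld_path and its parameter names are hypothetical, and the sketch assumes the directory layout shown in the diff (the directory that the diff lists twice is emitted once).

```shell
# Hypothetical helper: print the colon-separated library path from the
# conductor.sh diff with the version, os_type, and ego_version placeholders
# filled in. $1 is the installation root (for example /opt/pmr).
build_pmr_ld_path() {
    local root="$1" version="$2" os_type="$3" ego_version="$4" path
    path="$root/soam/mapreduce/$version/$os_type/lib64"
    path="$path:$root/soam/mapreduce/$version/$os_type/lib"
    path="$path:$root/soam/$version/$os_type/lib64"
    path="$path:$root/soam/$version/$os_type/lib"
    path="$path:$root/perf/soam/$version/linux-x86_64/lib"
    path="$path:$root/perf/ego/$ego_version/linux-x86_64/lib"
    path="$path:$root/$ego_version/$os_type/lib"
    printf '%s\n' "$path"
}

# Example usage (substitute your own values for the three placeholders):
# PMR_LD_PATH=$(build_pmr_ld_path /opt/pmr "$version" "$os_type" "$ego_version")
```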
      
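  6. Add the following properties to the $INSTALL_LOCATION/das-1.3.7-0.20.2/conf/das-common.properties file, replacing application_name with the name of your MapReduce application and adjusting the login values if your cluster does not use these credentials: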

    mapreduce.application.name=application_name

    mapreduce.job.login.user=Admin

    mapreduce.job.login.password=Admin
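
Before starting Datameer, it can help to confirm that all three properties are present. The following is a minimal sketch of such a check; the function names has_prop and missing_mapreduce_props are hypothetical, and the sketch assumes each property sits on its own line with no leading whitespace.

```shell
# Hypothetical helper: succeed if FILE has a line that starts with "KEY=".
# Commented-out lines (for example "#KEY=...") do not count.
has_prop() {
    local file="$1" key="$2" line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$key="*) return 0 ;;
        esac
    done < "$file"
    return 1
}

# Hypothetical check: print each property from this step that is missing
# from FILE; return nonzero if any is missing.
missing_mapreduce_props() {
    local file="$1" key status=0
    for key in mapreduce.application.name \
               mapreduce.job.login.user \
               mapreduce.job.login.password; do
        if ! has_prop "$file" "$key"; then
            echo "$key"
            status=1
        fi
    done
    return $status
}

# Example usage, assuming the install path from step 2:
# missing_mapreduce_props /opt/datameer/das-1.3.7-0.20.2/conf/das-common.properties
```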

  7. Start Datameer as a user who has permission to submit jobs through the mrsh utility and to write to HDFS:

    cd /opt/datameer/das-1.3.7-0.20.2 && bin/conductor.sh start

  8. In DAS, go to Administrator > Hadoop Cluster > Edit and set the Hadoop cluster to use Distributed mode.
  9. Add the datastore and run the import job.