Data gathered for Hadoop on Linux

The following data is gathered when running gpfs.snap with the --hadoop core argument:

  1. The output of these commands:
    • ps -elf
    • netstat -nap
  2. The content of these files:
    • /var/log/hadoop
    • /var/log/flume
    • /var/log/hadoop-hdfs
    • /var/log/hadoop-httpfs
    • /var/log/hadoop-mapreduce
    • /var/log/hadoop-yarn
    • /var/log/hbase
    • /var/log/hive
    • /var/log/hive-hcatalog
    • /var/log/kafka
    • /var/log/knox
    • /var/log/oozie
    • /var/log/ranger
    • /var/log/solr
    • /var/log/spark
    • /var/log/sqoop
    • /var/log/zookeeper
    • /var/mmfs/hadoop/etc/hadoop
    • /var/log/hadoop/root

    You can customize hadoop.snap.py to include user-defined files and directories in the snap by listing them in the HADOOP_LOG_DIRS environment variable. This is useful when the Hadoop installation uses custom paths or when specific additional files must be collected.

    In this case, the contents of the HADOOP_LOG_DIRS environment variable use the following syntax:
    pathname1[;pathname2[;pathname3[...]]]
    where pathname1 through pathnameN are file path names (wildcards are allowed) or directory path names. For directory path names, all files in those directories are collected recursively (see the example below).
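
    For example, the following is a minimal sketch of setting HADOOP_LOG_DIRS before running gpfs.snap; both paths are hypothetical placeholders for your own installation:

      # Collect one custom log directory (recursively) and a wildcard
      # selection of files; both paths below are hypothetical examples.
      export HADOOP_LOG_DIRS="/opt/custom-hadoop/logs;/var/adm/hadoop-audit/*.log"
      gpfs.snap --hadoop core

    The variable must be visible in the environment of the shell that runs gpfs.snap; see the limitation below when the sudo wrapper is in use.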

Limitations of customizations when using the sudo wrapper

If the sudo wrapper is in use, persistent environment variables saved in $HOME/.bashrc, /root/.bashrc, $HOME/.kshrc, /root/.kshrc, and similar paths are not initialized when the non-root gpfsadmin user elevates privileges with the sudo command. As a result, gpfs.snap cannot detect any customization options for the Hadoop data collection. Starting with IBM Spectrum Scale version 5.0, if you want your customizations to apply to the Hadoop debugging data while the sudo wrapper is active, you must create symbolic links from the actual log files to the corresponding locations in the list of collected log files above.
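
For example, the following is a minimal sketch of this workaround; /data/hadoop/logs is a hypothetical path, so substitute the real log location of your installation:

    # Create a symbolic link at a default collection path that points to
    # the actual (custom) log directory. /data/hadoop/logs is hypothetical.
    ln -s /data/hadoop/logs /var/log/hadoop

With the link in place, the custom logs appear under a path that gpfs.snap already collects, so no environment variable is required.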