Data gathered for Hadoop on Linux

The following data is gathered when running gpfs.snap with the --hadoop core argument:

  1. The output of these commands:
    • ps -elf
    • netstat -nap
  2. The contents of these files and directories:
    • /var/log/hadoop
    • /var/log/flume
    • /var/log/hadoop-hdfs
    • /var/log/hadoop-httpfs
    • /var/log/hadoop-mapreduce
    • /var/log/hadoop-yarn
    • /var/log/hbase
    • /var/log/hive
    • /var/log/hive-hcatalog
    • /var/log/kafka
    • /var/log/knox
    • /var/log/oozie
    • /var/log/ranger
    • /var/log/solr
    • /var/log/spark
    • /var/log/sqoop
    • /var/log/zookeeper
    • /usr/lpp/mmfs/hadoop/etc/hadoop
    • /usr/lpp/mmfs/hadoop/logs

    Users can customize hadoop.snap.py to include additional files and directories in the snap by listing them in the HADOOP_LOG_DIRS environment variable. This is useful when the Hadoop installation uses custom paths or when specific extra files must be collected.

    In this case, the contents of the HADOOP_LOG_DIRS environment variable use the following syntax:
    pathname1[;pathname2[;pathname3[...]]]
    where pathname1 through pathnameN are file path names (wildcards are allowed) or directory path names. For directory path names, all files in those directories are collected recursively.
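    As a sketch of that syntax, the following shell fragment sets HADOOP_LOG_DIRS with semicolon-separated entries and shows how the variable splits into individual path names. The paths themselves are hypothetical examples, not product defaults:

    ```shell
    # Hypothetical custom paths; adjust to your Hadoop installation.
    export HADOOP_LOG_DIRS="/opt/hadoop/logs;/opt/hadoop/etc/hadoop/*.xml;/var/log/custom-hadoop"

    # gpfs.snap --hadoop would then also collect the paths listed above.

    # Show how the semicolon-separated list breaks into individual entries:
    IFS=';' read -ra paths <<< "$HADOOP_LOG_DIRS"
    for p in "${paths[@]}"; do
        echo "$p"
    done
    ```

    Directory entries (such as /opt/hadoop/logs here) are collected recursively, while a wildcard entry matches individual files.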