Data gathered for Hadoop on Linux®
The following data is gathered when running gpfs.snap with the --hadoop core argument:
- The output of these commands:
- ps -elf
- netstat -nap
- The content of these files:
- /var/log/hadoop
- /var/log/flume
- /var/log/hadoop-hdfs
- /var/log/hadoop-httpfs
- /var/log/hadoop-mapreduce
- /var/log/hadoop-yarn
- /var/log/hbase
- /var/log/hive
- /var/log/hive-hcatalog
- /var/log/kafka
- /var/log/knox
- /var/log/oozie
- /var/log/ranger
- /var/log/solr
- /var/log/spark
- /var/log/sqoop
- /var/log/zookeeper
- /var/mmfs/hadoop/etc/hadoop
- /var/log/hadoop/root
Note: Starting with IBM Storage Scale 5.0.5, gpfs.snap --hadoop can capture the HDFS Transparency logs from user-configured directories.
Limitations of customizations when using sudo wrapper
If the sudo wrapper is in use, persistent environment variables saved in $HOME/.bashrc, /root/.bashrc, $HOME/.kshrc, /root/.kshrc, and similar paths are not initialized when the non-root gpfsadmin user elevates privileges with the sudo command. As a result, gpfs.snap cannot detect any customization options for the Hadoop data collection. Starting with IBM Storage Scale 5.0, if you want your customizations to be applied to the Hadoop debugging data while the sudo wrapper is active, you must create symbolic links from the actual log files to the corresponding locations in the list of collected log files.
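The symlink workaround can be sketched as follows. This is an illustrative example only: the directory names are hypothetical stand-ins (a real deployment would link its actual HDFS Transparency log files into the /var/log/... locations listed above, which typically requires root privileges).

```shell
# Hedged sketch: both paths below are placeholders for demonstration.
CUSTOM_LOG_DIR=/tmp/custom-hadoop-logs   # stand-in for your user-configured log directory
COLLECT_DIR=/tmp/var-log-hadoop          # stand-in for /var/log/hadoop, which gpfs.snap collects

mkdir -p "$CUSTOM_LOG_DIR" "$COLLECT_DIR"
echo "demo log entry" > "$CUSTOM_LOG_DIR/namenode.log"   # placeholder log file

# Link each custom log file to the location that gpfs.snap scans,
# so the data is picked up even though the customization variables
# were never initialized under the sudo wrapper.
for f in "$CUSTOM_LOG_DIR"/*.log; do
    ln -sf "$f" "$COLLECT_DIR/$(basename "$f")"
done
```

After the links are in place, running gpfs.snap --hadoop collects the linked files along with the default locations.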