Data gathered for hadoop on Linux
The following data is gathered when running gpfs.snap with the --hadoop core argument:
- The output of these commands:
- ps -elf
- netstat -nap
- The content of these files:
- /var/log/hadoop
- /var/log/flume
- /var/log/hadoop-hdfs
- /var/log/hadoop-httpfs
- /var/log/hadoop-mapreduce
- /var/log/hadoop-yarn
- /var/log/hbase
- /var/log/hive
- /var/log/hive-hcatalog
- /var/log/kafka
- /var/log/knox
- /var/log/oozie
- /var/log/ranger
- /var/log/solr
- /var/log/spark
- /var/log/sqoop
- /var/log/zookeeper
- /usr/lpp/mmfs/hadoop/etc/hadoop
- /usr/lpp/mmfs/hadoop/logs
Users can customize hadoop.snap.py to include additional user-defined files and directories in the snap by listing them in the environment variable HADOOP_LOG_DIRS. This is useful when the Hadoop installation uses custom paths or when specific extra files need to be collected.
In this case, the contents of the environment variable HADOOP_LOG_DIRS must use the following syntax:
pathname1[;pathname2[;pathname3[...]]]
where pathname1..pathnameN are file path names (wildcards are allowed) or directory path names. For directory path names, all files in those directories are collected recursively.
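As a sketch of the syntax above, the following sets HADOOP_LOG_DIRS before running the snap. The paths shown are hypothetical placeholders, not defaults; substitute the log files and directories of your own installation:

```shell
# Hypothetical example paths: a directory (collected recursively),
# a single log file, and a wildcard file pattern, separated by ';'.
export HADOOP_LOG_DIRS="/opt/hadoop/logs;/opt/hive/logs/hiveserver2.log;/tmp/custom-hadoop-*.log"

# gpfs.snap reads HADOOP_LOG_DIRS when run with the --hadoop argument, e.g.:
#   /usr/lpp/mmfs/bin/gpfs.snap --hadoop
echo "$HADOOP_LOG_DIRS"
```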