- Ambari Metrics Collector logs: /var/log/ambari-metrics-collector/
- Verify available disk space with "df -h"
- Verify available memory with "free -m"
- Check running processes/CPU usage for the ams user with "top -u ams"
- Check whether the HBase RegionServer and HBase Master are still running with "ps -ef | grep ams"
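To capture all of these checks in one pass, a short script such as the following can help (this is a convenience sketch added here, and the exact collector log file name may differ on your system):
#Quick health snapshot of the Metrics Collector host
df -h
free -m
top -b -n 1 -u ams | head -n 20
ps -ef | grep ams
tail -n 50 /var/log/ambari-metrics-collector/ambari-metrics-collector.log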
| Issue | Possible Cause(s) | Resolution |
| --- | --- | --- |
| Ambari Metrics Collector process is using 100% of available CPUs. Any service (including the Ambari Web UI) running on the same host as the Ambari Metrics Collector becomes slow/unresponsive. | Ambari Metrics Collector is running on the same node as Ambari Server; Ambari Metrics is running in embedded mode | C, A |
| ams-hbase*.log shows multiple ZooKeeper timeouts | CPU contention on the Metrics Collector host when running in embedded mode | A, B |
| Metrics for CPU, network, and others "go missing" from the Ambari Web UI | CPU contention caused by a disk r/w bottleneck; ams-hbase master heapsize too low | A, D |
| Metrics Collector fails to start with "port in use" or "Binding to port -1" | Port 61181 is still held by a process that did not stop | E |
| After adding hosts to Ambari for a total of more than 100 hosts, the UI throws the error "Validation failed. Config validation failed" | stack-advisor fails to update one property for that range of hosts | F |
| GC options configured for the Ambari Metrics Collector are not applied to the collector process | AMBARI-14945 | G |
- In the Ambari Web UI, select the Ambari Metrics service and navigate to Configs. Update the following properties:
- Restart the Metrics Collector and the affected Metrics Monitors
- If your host has multiple disks, change the default values of hbase.rootdir and hbase.tmp.dir, preferably to point at a disk that is less utilized than the one the OS runs on
- Delete the contents of the ZooKeeper tmp snapshot dir. This will delete any unsaved metrics, effectively removing the backlog/bottleneck caused by disk contention:
rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper
- Increase the metrics aggregation interval. By default, minute-level aggregation runs every 2 minutes; raising this to 5 minutes or more significantly reduces the extended lag/CPU spikes, though you will still see CPU spikes at each new interval for a short period. Note that in Ambari 2.2 the default has been increased to 5 minutes.
In the Ambari Web UI, modify the following property in the configs for ams-site (or use the configs.sh sketch below):
timeline.metrics.host.aggregator.minute.interval : 300
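The same change can be scripted with the configs.sh helper that ships with Ambari Server. A minimal sketch, assuming default admin/admin credentials, an Ambari server at ambari.server.host, and a cluster named your_cluster_name:
#Set the minute aggregation interval to 300 seconds, then restart the Metrics Collector
cd /var/lib/ambari-server/resources/scripts
./configs.sh -u admin -p admin set ambari.server.host your_cluster_name ams-site "timeline.metrics.host.aggregator.minute.interval" "300"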
C. Move the Ambari Metrics Collector to a different host
- Stop the Ambari Metrics service
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop All Components"},"Body":{"ServiceComponentInfo":{"state":"INSTALLED"}}}' http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/services/AMBARI_METRICS/components/METRICS_COLLECTOR
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop All Components"},"Body":{"ServiceComponentInfo":{"state":"INSTALLED"}}}' http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/services/AMBARI_METRICS/components/METRICS_MONITOR
- Delete the Ambari Metrics Collector from the old host
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/hosts/old.metrics.collector.host/host_components/METRICS_COLLECTOR
- Add the Ambari Metrics Collector component to the new host
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/hosts/new.metrics.collector.host/host_components/METRICS_COLLECTOR
- Install the Ambari Metrics Collector component on the new host
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/hosts/new.metrics.collector.host/host_components/METRICS_COLLECTOR
- Update the Collector hostname used by the Metrics Monitors on all hosts in your Ambari cluster. The collector hostname is stored in the 'metrics_server' property in /etc/ambari-metrics-monitor/conf/metric_monitor.ini
#Run on every host in the cluster
sed -i 's/old.collector.hostname/new.collector.hostname/' /etc/ambari-metrics-monitor/conf/metric_monitor.ini
- Start the Ambari Metrics service, either via the UI or the curl call below
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Start All Components"},"Body":{"ServiceComponentInfo":{"state":"STARTED"}}}' http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/services/AMBARI_METRICS/components/METRICS_COLLECTOR
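To double-check the move, you can query the collector component through the same REST API (an optional verification step, not part of the original procedure):
#The response should show the new host in host_components and a STARTED state
curl -u admin:admin -H 'X-Requested-By: ambari' http://ambari.server.host:8080/api/v1/clusters/your_cluster_name/services/AMBARI_METRICS/components/METRICS_COLLECTOR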
D. Increase the Ambari Metrics heapsize settings
| Property | Recommended Value, 1-50 nodes (MB) |
| --- | --- |
| hbase_master_heapsize | 2048 |
| hbase_regionserver_heapsize | 2048 |
| metrics_collector_heapsize | 1024 |
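After updating these values and restarting, you can confirm the new heapsizes took effect. A quick check, assuming the AMS processes run as the ams user and carry an -Xmx flag on their command lines:
#Show the -Xmx setting of each running AMS process
ps -ef | grep ams | grep -o '\-Xmx[0-9]*[mg]'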
E. Free the port used by the embedded AMS ZooKeeper
Ensure the port used by the embedded AMS ZooKeeper is free on the collector host. The default value of hbase.zookeeper.property.clientPort is 61181:
netstat -nltp | grep 61181
Free up this port, or change the default clientPort to a free port, and restart the Ambari Metrics Collector (see the lsof sketch below for identifying the holding process).
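If the netstat output is inconclusive, lsof can identify the process holding the port (a suggested extra step beyond the original text):
#Show the PID and process name bound to the embedded ZooKeeper client port
lsof -i :61181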
F. stack_advisor.py patch for clusters with more than 100 hosts
Option 1: Edit stack_advisor.py directly:
#Open the BI 4.0 stack advisor on the Ambari Server node (4.1 inherits 4.0)
vim /var/lib/ambari-server/resources/stacks/BigInsights/4.0/services/stack_advisor.py
#At line 684: totalHostsCount = len(hosts["items"])
#Add the second line shown here directly after it:
totalHostsCount = len(hosts["items"])
putAmsHbaseEnvProperty("hbase_master_heapsize", "512m")
#Restart ambari-server
Option 2: Download a tar.gz with the patched stack_advisor.py
wget http://developer.ibm.com/hadoop/wp-content/uploads/sites/28/2016/02/stack_advisor_ams_patch.tar_.gz
tar -C /var/lib/ambari-server/resources/stacks/BigInsights/4.0/services/ -xzvf stack_advisor_ams_patch.tar_.gz
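Whichever option you choose, a quick grep confirms the patch is in place before restarting ambari-server (an optional sanity check):
#The patched line should now appear in the stack advisor
grep -n 'putAmsHbaseEnvProperty' /var/lib/ambari-server/resources/stacks/BigInsights/4.0/services/stack_advisor.py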
G. Ambari Metrics Collector start script patch for Java GC options
The Ambari Metrics Collector start script doesn't properly read the Java options used for the collector process, which causes all GC options to be skipped (AMBARI-14945).
#Update the /usr/sbin/ambari-metrics-collector script to remove the extra quotes around AMS_COLLECTOR_OPTS
sed -i 's/"${AMS_COLLECTOR_OPTS}"/${AMS_COLLECTOR_OPTS}/' /usr/sbin/ambari-metrics-collector
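Once the collector is restarted, you can verify that the options from AMS_COLLECTOR_OPTS reached the JVM. This check assumes you set -XX: style GC flags in AMS_COLLECTOR_OPTS:
#List the -XX: flags on the running collector process
ps -ef | grep ams | grep -o '\-XX:[^ ]*' | sort -u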