Linux-UNIX: Hadoop configuration

This topic introduces fundamental concepts and processes for monitoring Hadoop data with Guardium.

Capacity planning

The following sizing guidelines assume an average volume of audited traffic. Higher volumes of audited traffic may require additional resources.
  • 10 management or server nodes per collector
  • 20 or more data nodes per collector
  • Possibly additional nodes per collector if physical appliances are used
You can also size by the Processor Value Units (PVUs) of the nodes, although this can over-size the deployment when the volume of audited traffic is low. The sizing guideline is 4000 PVUs per collector.
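As a rough illustration of the arithmetic behind these guidelines, the following Python sketch estimates a collector count from node counts or from total PVUs. The function names are hypothetical and not part of Guardium. The sketch also assumes that management nodes and data nodes are sized separately and the results added, which the guidelines do not state. It treats the per-collector figures as fixed ratios for average audited traffic, so take the output as a starting point rather than a definitive sizing.

```python
import math

# Ratios taken from the sizing guidelines above (average audited traffic).
MGMT_NODES_PER_COLLECTOR = 10
DATA_NODES_PER_COLLECTOR = 20
PVU_PER_COLLECTOR = 4000


def collectors_by_node_count(mgmt_nodes: int, data_nodes: int) -> int:
    """Estimate collectors from node counts.

    Assumption (not stated in the guidelines): management nodes and data
    nodes are sized independently and the two results are summed.
    """
    mgmt = math.ceil(mgmt_nodes / MGMT_NODES_PER_COLLECTOR)
    data = math.ceil(data_nodes / DATA_NODES_PER_COLLECTOR)
    return mgmt + data


def collectors_by_pvu(total_pvu: int) -> int:
    """Estimate collectors from the total PVUs of the monitored nodes."""
    return math.ceil(total_pvu / PVU_PER_COLLECTOR)


if __name__ == "__main__":
    # 5 management nodes -> 1 collector, 60 data nodes -> 3 collectors
    print(collectors_by_node_count(5, 60))  # 4
    # 14000 PVUs at 4000 PVUs per collector, rounded up
    print(collectors_by_pvu(14000))  # 4
```

If the two methods give different results, the PVU-based figure is more likely to be the larger one when traffic is light, which matches the over-sizing caveat above.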

There are several options for integrating Guardium with Hadoop. The Ranger HDFS integration supports newer versions of Hortonworks and Cloudera (CDP 7.x); use this integration when possible. Otherwise, choose an integration according to your cluster version. All integrations support SSL.