Untangling YARN: Building Multitenant Hadoop Environments Part-I
Everyone in the Hadoop world is excited about YARN. For those who don’t follow such topics, YARN stands for “Yet Another Resource Negotiator”. It is an important development for organizations deploying Hadoop environments.
What YARN does is essentially de-couple Hadoop workload management from resource management. This means that multiple applications can share a common infrastructure pool. While this idea is not new to many of us, it is new to Hadoop. Earlier versions of Hadoop consolidated both workload and resource management functions into a single JobTracker. This approach resulted in limitations for customers hoping to run multiple applications on the same cluster infrastructure.
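To make the split concrete, here is a toy Python sketch of the idea. It is not YARN's API: the class names `ResourceManager` and `ApplicationManager`, the `request`/`release` methods, and the notion of "slots" are all assumptions made up for illustration. The only point is that the component that owns the shared pool knows nothing about the kind of work being run, so different applications can draw from the same infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy cluster-wide resource manager (hypothetical): it only hands out slots."""
    total_slots: int
    allocated: dict = field(default_factory=dict)  # app name -> slots held

    def free_slots(self) -> int:
        return self.total_slots - sum(self.allocated.values())

    def request(self, app: str, slots: int) -> int:
        """Grant up to `slots` slots to `app`; return the number granted."""
        granted = min(slots, self.free_slots())
        self.allocated[app] = self.allocated.get(app, 0) + granted
        return granted

    def release(self, app: str) -> None:
        """Return all slots held by `app` to the shared pool."""
        self.allocated.pop(app, None)


@dataclass
class ApplicationManager:
    """Toy per-application workload manager (hypothetical): it decides what runs."""
    name: str
    tasks: list

    def run(self, rm: ResourceManager) -> list:
        granted = rm.request(self.name, len(self.tasks))
        # Only as many tasks as slots granted run in this round.
        done = [f"{self.name}:{task}" for task in self.tasks[:granted]]
        rm.release(self.name)
        return done


# Two different kinds of application share one pool of four slots.
rm = ResourceManager(total_slots=4)
batch = ApplicationManager("mapreduce", ["map-0", "map-1", "reduce-0"])
other = ApplicationManager("interactive", ["query-0", "query-1"])
results = batch.run(rm) + other.run(rm)
print(results)
```

In the older design described above, both roles would live in one component, so the pool could not easily be shared with anything that was not a MapReduce job.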
The community delivered the generally available (GA) release of YARN in Hadoop 2.2.0 in October 2013, and later open source releases include it. Major providers of Hadoop, including IBM, are at various stages of incorporating YARN into their commercial Hadoop offerings.
Yet another resource negotiator
YARN is well named. While it is an important technology, the world is not suffering from a shortage of resource managers. Some Hadoop providers (including IBM) are supporting YARN while others are supporting Apache Mesos. In addition, there is a plethora of general-purpose batch workload managers supporting Hadoop as “yet another workload pattern” (YAWP – you heard it here first!) on their own scheduling and resource management products. This includes our own Platform LSF where a freely available Hado
To borrow from object-oriented programming terminology, multitenancy is an “over-loaded” term. It means different things to different people depending on their orientation and context. To say a solution is multitenant is not helpful unless we are specific about the meaning. Some interpretations of multitenancy in Big Data environments are:
Organizations that are sophisticated in their view of multitenancy will need all of these capabilities and more. YARN promises to address some of these requirements, but clearly not all.
Standards ‘R’ us – but capabilities matter too
IBM’s view is simple. If an open standard solves the business problem, this is likely the best solution.
In IBM Info
Watch for my next post, where I will talk about true multitenant infrastructure and how IBM’s multitenant distributed computing solutions contribute to our clients’ success. Until then, stay tuned, and feel free to write to me at email@example.com if you have any questions.
Gord J. Sissons
Product Marketing Manager - IBM Platform Symphony