Yet another resource negotiator
YARN is well named. While it is an important technology, the world is not suffering from a shortage of resource managers. Some Hadoop providers (including IBM) support YARN, while others support Apache Mesos. In addition, a plethora of general-purpose batch workload managers support Hadoop as “yet another workload pattern” (YAWP: you heard it here first!) on their own scheduling and resource management products. These include our own Platform LSF, where a freely available Hadoop Connector for LSF enables existing Platform Computing customers to run Hadoop MapReduce applications natively on existing HPC clusters. Also, many distributed applications embed their own proprietary workload management for clustered environments or support one or more commercial solutions. In short, workload and resource management for distributed applications is a big topic.
To borrow from object-oriented programming terminology, multitenancy is an “overloaded” term. It means different things to different people depending on their orientation and context. Saying a solution is multitenant is not helpful unless we are specific about the meaning. Some interpretations of multitenancy in Big Data environments are:
- Support for multiple concurrent Hadoop jobs
- Support for multiple lines of business on a shared infrastructure
- Support for multiple application workloads of different types (Hadoop and non-Hadoop)
- Provisions for security isolation between tenants
- Contract-oriented service level guarantees for tenants
- Support for multiple versions of applications and application frameworks concurrently
IBM’s view is simple. If an open standard solves the business problem, this is likely the best solution.
In IBM InfoSphere BigInsights, IBM offers a 100% standard Hadoop implementation as the default choice, but also provides additional capabilities for clients that need them. Good examples of innovations that can be optionally deployed are IBM GPFS for clients needing a POSIX-compliant alternative to HDFS, Adaptive MapReduce for clients needing agile high-performance scheduling, and IBM Big SQL for clients needing a 100% ANSI-compliant SQL interface to data residing in HBase, in other data formats on Hadoop, or in other data sources. Customers can choose to use YARN as the default resource manager, but for those with more challenging problems, IBM offers more capable solutions.
Watch this space for my next post, where I will talk about true multitenant infrastructure and how IBM’s multitenant distributed computing solutions contribute to our clients’ success. Until then, stay tuned, and feel free to write to me at firstname.lastname@example.org if you have any questions.