2 replies | Latest post 2013-01-18T15:26:47Z by SystemAdmin
SystemAdmin

Pinned topic Hadoop vs WXS

2013-01-16T17:40:22Z
When I read the high level description of what Hadoop does, it seems to overlap greatly with WXS. Can you help me understand the difference?
Updated on 2013-01-18T15:26:47Z by SystemAdmin
  • Mobiletechs

    Re: Hadoop vs WXS

    2013-01-17T20:49:37Z in response to SystemAdmin
    Drawing differences between Apache Hadoop and WebSphere eXtreme Scale (WXS)

    Author: Nitin Gaur

    Hadoop:
    Apache Hadoop is a software framework (platform) that enables distributed processing of vast amounts of data. Introduced in 2006, it is supported by Google, Yahoo!, and IBM, to name a few. At the heart of its design are the MapReduce implementation and HDFS (Hadoop Distributed File System), which were inspired by Google's MapReduce paper and the Google File System.
    MapReduce: MapReduce is a software framework introduced by Google that supports distributed computing on large data sets across clusters of computers (nodes). It combines two processes named Map and Reduce.
    Note: MapReduce applications must be divisible into "Map" and "Reduce" phases, meaning the task or job can be split into smaller pieces to be processed in parallel, and the result of each sub-task can then be reduced to form the answer for the original task. One example is website keyword searching: the searching and grabbing tasks can be divided and delegated to slave nodes, then each result is aggregated and the final outcome assembled on the master node.
    In the Map process, the master node takes the input, divides it into smaller sub-tasks, and distributes them to worker nodes. Each worker node processes its sub-task and passes the answer back to the master node.
    In the Reduce process, the master node then takes the answers of all the sub-tasks and combines them to get the output, which is the result of the original task.
    The advantage of MapReduce is that it allows for the distributed processing of the map and reduction operations. Because each mapping operation is independent, all maps can be performed in parallel, thus reducing the total computing time.
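    The model described above can be sketched in a few lines of plain Python. This is a toy illustration of the Map, shuffle, and Reduce phases (a word count), not the actual Hadoop API; in a real cluster each map task would run on a different node.

```python
from collections import defaultdict

def map_task(document):
    # Map: emit (word, 1) pairs for one chunk of the input.
    return [(word, 1) for word in document.split()]

def reduce_task(word, counts):
    # Reduce: combine all partial counts for one key.
    return word, sum(counts)

def map_reduce(documents):
    # The "shuffle" step: group intermediate pairs by key
    # before handing each group to a reduce task.
    grouped = defaultdict(list)
    for doc in documents:  # each map_task is independent, so all could run in parallel
        for word, count in map_task(doc):
            grouped[word].append(count)
    return dict(reduce_task(w, c) for w, c in grouped.items())

print(map_reduce(["hadoop stores data", "hadoop processes data"]))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

    Because each `map_task` call touches only its own document, the map phase parallelizes trivially, which is the source of the speedup described above.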

    HDFS
    From the perspective of an end user, HDFS appears as a traditional file system: you can perform CRUD actions on files within a given directory path. But because storage is distributed, the work is split between a "NameNode" and "DataNodes," each with its own responsibilities.
    The NameNode is the master of the DataNodes. It provides metadata services within HDFS; the metadata records which DataNodes hold each file's blocks. It also accepts operation commands and determines which DataNode should perform each action and replication.
    The DataNodes serve as the storage blocks for HDFS. They also respond to commands from the NameNode to create, delete, and replicate blocks.
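    The NameNode/DataNode split can be modeled with a small sketch. This is an illustration of the idea only, not the HDFS API: the NameNode keeps metadata (which DataNodes hold a file's data), while the bytes themselves live on the DataNodes. The file name and replication factor here are made up for the example.

```python
class DataNode:
    """Holds actual block contents."""
    def __init__(self):
        self.blocks = {}

    def store(self, filename, data):
        self.blocks[filename] = data

    def fetch(self, filename):
        return self.blocks[filename]

class NameNode:
    """Holds only metadata: which DataNodes store each file."""
    def __init__(self, datanodes, replication=2):
        self.datanodes = datanodes
        self.replication = replication
        self.block_map = {}  # filename -> list of DataNodes holding it

    def write(self, filename, data):
        # Choose DataNodes for the data and record the mapping.
        targets = self.datanodes[:self.replication]
        for dn in targets:
            dn.store(filename, data)  # the bytes go to the DataNodes
        self.block_map[filename] = targets

    def read(self, filename):
        # Metadata lookup first, then fetch from any replica.
        return self.block_map[filename][0].fetch(filename)

datanodes = [DataNode(), DataNode(), DataNode()]
nn = NameNode(datanodes)
nn.write("/logs/part-0", b"raw log bytes")
print(nn.read("/logs/part-0"))  # b'raw log bytes'
```

    A real NameNode tracks blocks rather than whole files and picks replica locations by rack awareness and load, but the division of labor is the same.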
    Use cases:
    1. MapReduce applications
    2. Querying the data stored on the Hadoop cluster
    3. Data integration and processing (grid batch-type (ETL) applications)
    WXS:

    WebSphere eXtreme Scale complements the database layer, providing a fault-tolerant, highly available, and scalable data layer that addresses growing data demands before they become a business problem.

    • Scalability is never an IT problem alone. It directly impacts the business applications and the business unit that owns them.
    • Scalability is treated as a competitive advantage.
    • Applications that are scalable can easily accommodate growth, aiding the business in analysis and business development.

    WebSphere eXtreme Scale provides a set of interconnected Java processes that hold the data in memory, acting as shock absorbers for the back-end databases. This not only enables faster data access, since the data is read from memory, but also reduces the stress on the database.
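    The "shock absorber" role is essentially the cache-aside pattern. The sketch below is a generic illustration of that pattern, not the WXS ObjectMap API: reads are served from memory when possible, and the database is only consulted on a miss. The lookup function and key names are invented for the example.

```python
class DataGridCache:
    def __init__(self, load_from_db):
        self.memory = {}              # stand-in for the in-memory grid
        self.load_from_db = load_from_db
        self.db_hits = 0              # counts round trips to the database

    def get(self, key):
        if key not in self.memory:    # cache miss: fall through to the database
            self.db_hits += 1
            self.memory[key] = self.load_from_db(key)
        return self.memory[key]       # cache hit: no database round trip

def slow_database_lookup(key):
    # Stand-in for a real (expensive) database query.
    return f"row-for-{key}"

grid = DataGridCache(slow_database_lookup)
grid.get("customer:42")   # first read goes to the database
grid.get("customer:42")   # repeat read is served from memory
print(grid.db_hits)       # 1
```

    Every repeated read absorbed by the grid is one less query hitting the database, which is where the reduced stress comes from.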

    WebSphere® eXtreme Scale is an elastic, scalable, in-memory data grid. It dynamically caches, partitions, replicates, and manages application data and business logic across multiple servers. WebSphere eXtreme Scale performs massive volumes of transaction processing with high efficiency and linear scalability, and provides qualities of service such as transactional integrity, high availability, and predictable response times.
    The elastic scalability is possible through the use of distributed object caching. Elastic means the grid monitors and manages itself, allows scale-out and scale-in, and is self-healing by automatically recovering from failures. Scale-out allows memory capacity to be added while the grid is running, without requiring a restart. Conversely, scale-in allows for immediate removal of memory capacity.
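    The partitioning behind that elasticity can be sketched as hash-based key placement. This is illustrative only; WXS's actual placement service also handles replicas, failover, and rebalancing when containers join or leave. The class and key names here are made up for the example.

```python
class PartitionedGrid:
    def __init__(self, num_partitions=3):
        # One dict per partition; in a real grid each partition
        # (plus its replicas) lives in a separate container JVM.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key):
        # Each key is deterministically owned by exactly one partition.
        return hash(key) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition(key)].get(key)

grid = PartitionedGrid()
for i in range(100):
    grid.put(f"order:{i}", i)
print([len(p) for p in grid.partitions])  # keys spread across the partitions
```

    Because ownership of a key is computed rather than stored centrally, any client can route a request straight to the right partition, and capacity grows by adding more containers for partitions to live in.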
    WebSphere eXtreme Scale can be used in different ways: as a very powerful cache, as an in-memory database processing space to manage application state, or as a platform for building powerful Extreme Transaction Processing (XTP) applications.
    Use cases:
    1. Extensible network-attached cache
    2. In-memory data grid
    3. Application cache (session and data)

    References:
    1. http://www.ibm.com/developerworks/aix/library/au-cloud_apache/
    2. http://wiki.apache.org/hadoop/
    3. http://publib.boulder.ibm.com/infocenter/wxsinfo/v7r0/index.jsp?topic=/com.ibm.websphere.extremescale.over.doc/cxsoverview.html
    4. http://hadoop.apache.org/hbase/
  • SystemAdmin

    Re: Hadoop vs WXS

    2013-01-18T15:26:47Z in response to SystemAdmin
    Thanks for the answer. This validates a discussion I was in the other day, where the architects were trying to exploit the complementary parts of the two tools in a single solution.

    The distinction I would like to be clear on is this: it seems that the memory in Hadoop is distributed and independent across its nodes, while the memory in WXS is shared across its nodes. Therefore, the types of problems one might solve with each tool will be slightly different. Does that sound correct?