• 3 replies
  • Latest Post - ‏2013-12-17T09:27:50Z by HajoEhlers
24 Posts

Pinned topic GPFS and Hadoop

‏2013-12-06T10:12:35Z | bigdata gpfs hadoop hdfs tsm

Hi Everyone,

I am just starting to look at this and am extremely new to Hadoop. I am at the stage of investigating and downloading code to install my first test clusters with Hadoop, based on Hortonworks, IBM BigInsights, and a clean Apache version (three different clusters).

My main goal with this test is to see how to back up and restore hundreds of PB of storage located in a Hadoop cluster based on HDFS and also on GPFS.

My question to you all: is there anyone in this forum who has some knowledge of how to set up Hadoop together with GPFS, and do you have any comments or tips and tricks for me before I start?

For everyone else who is interested in the results, or is just very geeky, please feel free to join me on Twitter (@IssenSvensson) or the DW forum.
I hope to start very small before New Year, grow the information during the first half of 2014, and maybe give a short summary at IBM Edge in June.

Thanks for your valuable information
Christian Svensson

  • DickM
    1 Post

    Re: GPFS and Hadoop


    Does this help:

  • Neeta Garimella
    1 Post

    Re: GPFS and Hadoop

    • DickM
    • ‏2013-12-12T19:37:30Z

    Sorry for the delayed response. I had some issues with my ID for this forum that prevented me from posting.


    I've written a white paper about using GPFS-FPO for big data environments, and it should be made available on the external website by EOY. IBMers can get a copy from my Cattail account.

    Given that your efforts are targeted at backup and restore of PBs of data, the approach depends on what type of backup/restore solution you are after. For backup to external/tape-based systems, you can use TSM (using mmbackup integration), following some best practices such as creating new data files (as opposed to appending to existing files forever) to ensure the backup data is manageable within the backup window. TSM backup is at the file level, so the entire file will be backed up even if only a few blocks are appended. Use of fileset-level snapshots is recommended for most restore needs.
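
    To illustrate the "create new data files instead of appending forever" practice, here is a minimal Python sketch. It is not taken from the white paper or from any GPFS/TSM tooling; the class name `RollingWriter`, the `part-NNNNNN.dat` naming, and the size cap are all assumptions made up for the example. The idea is only that once a segment is closed it never changes again, so a file-level incremental backup has to pick up the newest segments rather than one ever-growing file.

```python
import os
import tempfile


class RollingWriter:
    """Hypothetical helper: writes records into size-capped segment files
    instead of appending to a single file forever."""

    def __init__(self, directory, prefix="part", max_bytes=64 * 1024 * 1024):
        self.directory = directory
        self.prefix = prefix
        self.max_bytes = max_bytes   # assumed cap per segment, tune to the backup window
        self._next_index = 0         # index of the next segment to create
        self._written = 0            # bytes written to the current segment
        self._fh = None

    def _open_next(self):
        # Close the current segment (it is now immutable) and start a new one.
        if self._fh is not None:
            self._fh.close()
        name = f"{self.prefix}-{self._next_index:06d}.dat"
        self._fh = open(os.path.join(self.directory, name), "wb")
        self._next_index += 1
        self._written = 0

    def write(self, record: bytes):
        # Roll over before a record would push the segment past the cap.
        # A single oversized record still gets a segment of its own.
        full = self._written > 0 and self._written + len(record) > self.max_bytes
        if self._fh is None or full:
            self._open_next()
        self._fh.write(record)
        self._written += len(record)

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None


def demo_segments():
    """Write ten 100-byte records with a 250-byte cap; return {segment name: size}."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = RollingWriter(tmp, max_bytes=250)
        for _ in range(10):
            writer.write(b"x" * 100)
        writer.close()
        return {
            name: os.path.getsize(os.path.join(tmp, name))
            for name in sorted(os.listdir(tmp))
        }


if __name__ == "__main__":
    for name, size in demo_segments().items():
        print(name, size)
```

    With a cap like this, only the segments created or modified since the last run need to be sent again, whereas a single append-only file would be re-sent in full each time it grows. What cap makes sense depends on the ingest rate and the backup window, which this sketch does not try to model.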

    For DR purposes, AFM replication can be considered, though one must be very careful, as DR using AFM is not officially supported and needs to be simulated using AFM's independent-writer cache mode.

  • HajoEhlers
    254 Posts

    Re: GPFS and Hadoop


    Backup/Restore DR -  Take a look at

     * HPSS -

     * LTFS 

    QSTAR -

    IBM -