4 replies Latest Post - ‏2012-10-15T19:15:03Z by SystemAdmin
SystemAdmin

Pinned topic PIG and HIVE dataload question

‏2012-10-11T11:52:06Z |
Hello. I have been doing some reading, and this is what I understand so far.
We load data into HDFS first, and then we use the HBase, Pig, and Hive schemas at query time.
At query time, Hadoop loads the data from HDFS into the schema created by HBase, Pig, and Hive, which makes it available to BI reporting tools so they can report on the data stored in HDFS.

Did I understand this correctly?

Thanks a lot
Updated on 2012-10-15T19:15:03Z at 2012-10-15T19:15:03Z by SystemAdmin
  • SystemAdmin

    Re: PIG and HIVE dataload question

    ‏2012-10-12T23:51:22Z  in response to SystemAdmin
    Hi David

    Please note that Pig and Hive are programming languages (Pig Latin and HiveQL), while HBase is a data store.
    Otherwise, you got the concept right.

    Thanks,

    Zach
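The idea confirmed above, that the data sits in HDFS as plain files and a schema is applied only at query time, can be illustrated with a small sketch. This is plain Python rather than Pig or Hive, and the sample data, column names, and function names are all made up for illustration. It only assumes that the raw file is comma-delimited text.

```python
import csv
import io

# Raw file contents as they might sit in HDFS: delimited text, no schema attached.
# (Hypothetical sample data.)
RAW_FILE = "1,alice,34.50\n2,bob,12.00\n3,carol,99.95\n"

# Hypothetical schema, declared separately from the data, in the way a Hive
# table definition or a Pig LOAD ... AS clause names and types the fields.
SCHEMA = [("order_id", int), ("customer", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Parse raw delimited text into typed records only when a query runs."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def total_amount(raw_text, schema):
    """A toy query: sum one column after applying the schema on read."""
    return sum(row["amount"] for row in read_with_schema(raw_text, schema))
```

The stored text never changes. A different schema could be applied to the same file by another query, which is why the same HDFS data can be read by more than one tool.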
  • SystemAdmin

    Re: PIG and HIVE dataload question

    ‏2012-10-13T14:34:25Z  in response to SystemAdmin
    Thanks a lot. Given that HBase is storage, does that mean data can be loaded into either HBase or HDFS but not both, even though HBase can use HDFS as its data storage?
    Say we load the data only once, into HBase, and HBase then uses HDFS as its data file storage. Can we then use Pig and Hive to query the data stored in HBase?

    Regards
    • chenti@us.ibm.com

      Re: PIG and HIVE dataload question

      ‏2012-10-15T19:06:10Z  in response to SystemAdmin
      Hi David,

      Think of HBase as the Hadoop version of a database, and of HDFS as a file system.

      First, you can load the data into a directory in HDFS and then write a Java application or Jaql script to load the data into HBase. You would want to do this if you wanted to change the sequence of columns loaded into HBase, compress a column, or keep a backup of the file in HDFS.

      This means you place the files into HDFS as an intermediate staging location or as a backup. Once the data is loaded into HBase, you can either delete the file from HDFS or keep it there, depending on how much storage you have.

      The data stored in HBase is written to HDFS, typically under hdfs://<hostname>/hbase

      The data in HBase is stored as binary. To query data out of HBase, you have the options of the HBase shell, Java, Jaql, and HiveQL.

      • tina
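The staging flow described above (file in HDFS first, then a load step that reorders columns and writes them into HBase as bytes) can be sketched roughly as follows. This is a pure-Python stand-in, not the Java or Jaql loader mentioned in the post: the `table` dict only imitates the row key / column / byte-value layout, and the sample data, the `info` column family, and all function names are assumptions made for illustration.

```python
import csv
import io

# Staged file as it might sit in an HDFS directory (hypothetical sample data).
STAGED_FILE = "alice,1001,34.50\nbob,1002,12.00\n"

# Column order in the staged file, the field used as row key, and the
# order/mapping wanted in the table. The "family:qualifier" names are made up.
FILE_COLUMNS = ["customer", "order_id", "amount"]
ROW_KEY = "order_id"
TARGET_COLUMNS = ["info:amount", "info:customer"]


def to_puts(raw_text):
    """Turn each staged line into (row_key, {column: value}) with byte values."""
    puts = []
    for fields in csv.reader(io.StringIO(raw_text)):
        record = dict(zip(FILE_COLUMNS, fields))
        row_key = record[ROW_KEY].encode("utf-8")
        cells = {
            col.encode("utf-8"): record[col.split(":", 1)[1]].encode("utf-8")
            for col in TARGET_COLUMNS
        }
        puts.append((row_key, cells))
    return puts


# Stand-in for an HBase table: a dict keyed by row key. A real load would send
# each put to HBase through a client library instead of updating a dict.
table = {}
for row_key, cells in to_puts(STAGED_FILE):
    table.setdefault(row_key, {}).update(cells)
```

After the load step, the staged file is no longer needed for queries against the table, so keeping it is purely a backup decision, as noted above.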
  • SystemAdmin

    Re: PIG and HIVE dataload question

    ‏2012-10-15T19:15:03Z  in response to SystemAdmin
    Thanks so much. The problem is that the data ends up duplicated across two storage layers, HDFS and HBase.