Hello, OK I have been doing some reading, and this is what I understand.
We load data into HDFS first, then we use the HBase, Pig and Hive schema at query time.
At query time, Hadoop loads the data from HDFS into the schema created by HBase, Pig and Hive, and makes it available so that BI reporting tools can report on the data stored in HDFS.
Did I understand this correctly?
Thanks a lot
PIG and HIVE dataload question
Updated on 2012-10-15T19:15:03Z by SystemAdmin
Re: PIG and HIVE dataload question 2012-10-12T23:51:22Z
Hi David,
Please note that Pig and Hive are languages for processing and querying data, while HBase is a data store.
Otherwise, you got the concept right.
Re: PIG and HIVE dataload question 2012-10-13T14:34:25Z
Thanks a lot. Given that HBase is storage, does that mean data can be loaded into either HBase or HDFS but not both, even though HBase can use HDFS as its data storage?
Say we load the data only once, into HBase, and HBase then uses HDFS for its data files. Can we then use Pig and Hive to query the data stored in HBase?
Re: PIG and HIVE dataload question 2012-10-15T19:06:10Z
- SystemAdmin 110000D4XK
Think of HBase as the Hadoop version of a database, and of HDFS as a file system.
You can first load the data into a directory in HDFS and then write a Java application or Jaql script to load it into HBase. You would want to do this if you wanted to change the sequence of the columns loaded into HBase, if you wanted to compress a column, or if you wanted a backup of the file in HDFS.
In that case the files in HDFS serve as an intermediate staging location or as a backup. Once the data is loaded into HBase, you can either delete the file from HDFS or keep it there, depending on how much storage you have.
The data stored in HBase is written to HDFS, typically under hdfs://<hostname>/hbase
The data in HBase is stored as binary. To query data out of HBase you have the options of the HBase shell, Java, Jaql and HiveQL.
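As a rough illustration of what "stored as binary" means for the client, the sketch below uses plain Python to turn typed values into byte strings and back. The helper names are hypothetical and this is not HBase code. The 8-byte big-endian layout for integers is the one HBase's Java `Bytes.toBytes(long)` utility produces, and UTF-8 is what it uses for strings; whatever reads the data back has to know which encoding was used for each column.

```python
import struct


def encode_long(value):
    """Encode an integer as 8 big-endian bytes."""
    return struct.pack(">q", value)


def decode_long(raw):
    """Decode 8 big-endian bytes back into an integer."""
    return struct.unpack(">q", raw)[0]


def encode_text(value):
    """Encode a string as UTF-8 bytes."""
    return value.encode("utf-8")


stored_count = encode_long(42)       # what would sit in the cell
stored_name = encode_text("David")

print(stored_count, decode_long(stored_count))
print(stored_name, stored_name.decode("utf-8"))
```

This is also why a raw scan in the HBase shell can show numeric values as escaped bytes rather than readable numbers.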