Although Apache™ Pig™ can be quite a powerful and simple language to use, the downside is that it’s something new to learn and master. Apache Hive was developed as a runtime Apache Hadoop™ support structure that lets fluent SQL users (commonplace among relational database developers) leverage the Hadoop platform. Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements. HQL is limited in the commands it understands, but it is still useful.
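To give a feel for how close HQL is to standard SQL, here is a minimal sketch; the table, column names, and HDFS path are hypothetical, not taken from the article:

```sql
-- Hypothetical example: HQL reads much like standard SQL.
-- Define a table over delimited text files.
CREATE TABLE page_views (
  user_id   STRING,
  url       STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- Load data from an HDFS path (illustrative) into the table.
LOAD DATA INPATH '/data/page_views.tsv' INTO TABLE page_views;

-- A familiar SQL-style aggregation, executed as a Hadoop job.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Anyone who has written `GROUP BY` queries against a relational database will recognize the final statement immediately; the main difference is that Hive translates it into batch work on the cluster rather than executing it against indexed storage.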
As with any database management system (DBMS), you can run your Hive queries from a command-line interface (known as the Hive shell), or from a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) application using the Hive JDBC/ODBC drivers. You can also use the Hive Thrift Client within applications written in C++, Java, PHP, Python, or Ruby (much like you can use these client-side languages with embedded SQL to access a database such as Db2 or Informix).
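For the simplest of those options, the Hive shell, statements are typed interactively just as they would be in any SQL command-line client. A brief hypothetical session (the table name is illustrative) might look like:

```sql
-- Inside the Hive shell, HQL statements are entered directly
-- and terminated with a semicolon, as in most SQL CLIs.
SHOW TABLES;

-- Inspect the schema of an existing (hypothetical) table.
DESCRIBE page_views;

-- Run an ad hoc query; Hive compiles it into cluster jobs.
SELECT * FROM page_views LIMIT 5;
```

JDBC, ODBC, and Thrift clients submit the same HQL text programmatically, so the query language is identical regardless of which access path an application uses.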
Hive looks very much like traditional database code with SQL access. However, because Hive is based on Apache™ Hadoop™ and MapReduce operations, there are several key differences.
The first is that Hadoop is intended for long sequential scans, and because Hive is based on Hadoop, queries can have very high latency (many minutes). This means that Hive is not appropriate for applications that need very fast response times, as you would expect with a database such as Db2. Second, Hive is read-based and therefore not appropriate for transaction processing that typically involves a high percentage of write operations.
If you're interested in SQL on Hadoop, then in addition to Hive, IBM offers Db2 Big SQL, which makes accessing Hive datasets faster and more secure. Check out our videos, below, for a quick overview of Hive and Db2 Big SQL.