What is Hive?

Although Apache Pig can be quite a powerful and simple language to use, the downside is that it’s something new to learn and master. Some folks developed a runtime Hadoop® support structure that allows anyone who is already fluent with SQL (which is commonplace for relational database developers) to leverage the Hadoop platform right out of the gate.

Their creation, called Apache® Hive™, allows SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements. You should be aware that HQL is limited in the commands it understands, but it is still pretty useful. HQL statements are broken down by the Hive service into MapReduce jobs and executed across a Hadoop cluster.

If you have a SQL or relational database background, this section will look very familiar. As with any database management system (DBMS), you can run your Hive queries in many ways. You can run them from a command line interface (known as the Hive shell), from a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) application leveraging the Hive JDBC/ODBC drivers, or from what is called a Hive Thrift Client. The Hive Thrift Client is much like any database client that gets installed on a user’s client machine (or in a middle tier of a three-tier architecture): it communicates with the Hive services running on the server. You can use the Hive Thrift Client within applications written in C++, Java, PHP, Python, or Ruby (much like you can use these client-side languages with embedded SQL to access a database such as Db2 or Informix).
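To make the JDBC route concrete, here is a minimal sketch of a Java application submitting an HQL statement through the Hive JDBC driver. It assumes a HiveServer2-style deployment; the host, port, credentials, and the web_logs table are hypothetical placeholders, so substitute the values for your own cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (hive-jdbc and its dependencies
        // must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Connect to a hypothetical HiveServer2 instance; 10000 is the usual default port.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hiveuser", "");
             Statement stmt = con.createStatement()) {

            // An HQL statement that reads like standard SQL; Hive compiles it into
            // MapReduce work behind the scenes, so expect high latency on large data.
            ResultSet rs = stmt.executeQuery(
                "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page");

            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```

The same query string could be run unchanged from the Hive shell or through the ODBC driver; only the connection mechanics differ.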

Hive looks very much like traditional database code with SQL access. However, because Hive is based on Hadoop and MapReduce operations, there are several key differences. The first is that Hadoop is intended for long sequential scans, and because Hive is based on Hadoop, queries tend to have very high latency (many minutes). This means that Hive would not be appropriate for applications that need very fast response times, as you would expect with a database such as Db2. The second is that Hive is read-based and therefore not appropriate for transaction processing that typically involves a high percentage of write operations.

If you're interested in SQL on Hadoop, then in addition to Hive, IBM offers Big SQL, which makes accessing Hive datasets faster and more secure. Check out the videos below for a quick overview of Hive and Big SQL.

Access Apache Hive Data Faster and More Securely with Big SQL

Improve Security for Hive Data using Big SQL

Related products or solutions

IBM Big SQL

A hybrid SQL engine for Apache Hadoop that concurrently exploits Hive, HBase and Spark using a single database connection or a single query.

IBM Analytics for Apache Spark

IBM Analytics for Apache Spark™ gives you the power of Apache Spark with integrated Jupyter Notebooks, so you can iterate faster and get to answers sooner. The service is fully managed, giving you immediate access to hassle-free Apache Spark.

IBM InfoSphere Federation Server

Access and integrate diverse data and content sources as if they were a single resource, regardless of where the information resides.

Resources

 

The Data Warehouse Evolved: A Foundation for Analytical Excellence

Explore a best-in-class approach to data management and see how companies are prioritizing data technologies to drive growth and efficiency.

 

Understanding Big Data Beyond the Hype

Read this practical introduction to the next generation of data architectures. It introduces the role of the cloud and NoSQL technologies and discusses the practicalities of security, privacy, and governance.

 

IBM Big SQL data sheet

With Spark SQL, the fastest open source SQL engine available, amplify the power of Apache Hadoop on IBM BigInsights to create insight. Spark SQL is helping make big data environments faster than ever.

 

Accessing tables created in Hive and files added to HDFS from Big SQL

This blog gives an overview of the procedures to follow when immediate access to these tables is needed, explains why those procedures are required, and introduces some of the Big SQL features in this area.