Hive User-defined functions

The Db2 Big SQL environment in Hadoop includes the Hive user-defined functions package. This set of functions is an optional package that you can install to use some of the Hive open source user-defined functions in your Db2 Big SQL queries. You can also use the package as an example to implement and use other functions that might not be included in the package. The Hive user-defined functions are part of the HIVE schema.

Installing the Hive user-defined functions package

Thes Hive user-defined functions provide LANGUAGE SQL (inline) and LANGUAGE C implementations for some common Hive built-in functions.
Note: The functions and procedures are provided "AS-IS" without warranty of any kind.
Do the following steps to extract and install the package:
  1. In the Linux command line, log in as the Db2 Big SQL administrator, and provide the Db2 Big SQL administrator password:
    sudo -u bigsql bash
  2. In the Linux command line, change the directory to the location of the Hive UDFs:
    cd $BIGSQL_HOME/hive-compat
  3. Extract the $BIGSQL_HOME/hive-compat/bigsql-hiveudf-*.tar.gz file.
    tar -zxvf $BIGSQL_HOME/hive-compat/bigsql-hiveudf-*.tar.gz
    
  4. Change the directory to the extracted bigsql-hiveudf directory.
    cd bigsql-hiveudf
  5. Run the following script as user bigsql, or the Db2 Big SQL administrator, to install the Hive UDFs in your Db2 Big SQL environment:
    sh ./install-hiveudfs.sh
After the functions are installed, you can use them as part of your Db2 Big SQL queries.

List of available Hive user-defined functions

The following Hive built-in functions are supported in Db2 Big SQL:
Table 1. Hive user-defined functions
Function Name Arguments Return type
datediff

end_date varchar(50), start_date varchar(50)
int
date_add
start_date varchar(50), daysToAdd int
varchar(50)
date_sub
start_date varchar(50), daysToSubtract int
varchar(50)
decode
binToDecode varbinary(50), charSet varchar(50)
varchar(50)
e no arguments double
encode
sourceString varchar(50), charSet varchar(50)
varbinary(50)
format_number
numberToConvert double, decimalPlaces int
varchar(50)
from_unixtime
unixtime bigint
varchar(50)
from_utc_timestamp This function is available as a built-in function in the SYSIBM schema. For more information about this function, see FROM_UTC_TIMESTAMP scalar function. varchar(50)
get_json_object
JSON_TEXT varchar(32672), JSON_PATH varchar(4096)
varchar(32672)
log
base double,number double
double
log2
number double
double
parse_url
urlString varchar(32672), partToExtract varchar(4096) [, key varchar(4096)]
varchar(32672)
pi no arguments double
to_utc_timestamp This function is available as a built-in function in the SYSIBM schema. For more information about this function, see TO_UTC_TIMESTAMP scalar function. varchar(50)
unix_timestamp
[dateToConvert varchar(50) [, format varchar(50)]]
bigint

Examples

datediff
The result of the function is the number of days from start_date to end_date as an integer. In this example, the column introduction_date is a VARCHAR(15). The query returns those rows where the number of days is greater than 30.
SELECT * FROM PRODUCT WHERE HIVE.DATEDIFF('2014-06-1',introduction_date) > 30;
date_add
The result of the function adds a number of days to start_date. In this example, the column introduction_date is a VARCHAR(15). The function returns the adjusted date.
SELECT HIVE.DATE_ADD(introduction_date,1),introduction_date FROM PRODUCT;
date_sub
The result of the function subtracts a number of days from start_date. In this example, the column introduction_date is a VARCHAR(15). The function returns the adjusted date.
SELECT HIVE.DATE_SUB(introduction_date,1),introduction_date FROM PRODUCT;
decode
This function decodes the first argument into a STRING by using one of the following character sets:
  • 'US_ASCII'
  • 'ISO-8859-1'
  • 'UTF-8'
  • 'UTF-16BE'
  • 'UTF-16LE'
  • 'UTF-16'
If either argument is null, the result is also null. In the following example, the string 'abc' is displayed as a result of decoding:

SELECT hive.decode(bx'616263','UTF-8') FROM SYSIBM.SYSDUMMY1;
The output:
'abc'
SELECT hive.decode(bx'006100620063','UTF-16BE') FROM SYSIBM.SYSDUMMY1;
The output:

'abc'
e
This function returns the constant that is the base of the natural logorithm.
SELECT number, power(HIVE.e(), hive.log(HIVE.e(), number)) FROM numberTable;
encode
This function encodes the first argument into a BINARY by using one of the following character sets:
  • 'US_ASCII'
  • 'ISO-8859-1'
  • 'UTF-8'
  • 'UTF-16BE'
  • 'UTF-16LE'
  • 'UTF-16'
If either argument is null, the result will also be null. In the following example, the string 'Jürgen' is displayed in a variety of formats:
SELECT HIVE.ENCODE('Jürgen','UTF-8') FROM sysibm.sysdummy1;
The output:
'0x4AC3BC7267656E'
SELECT HIVE.ENCODE('Jürgen','UTF-16BE') FROM sysibm.sysdummy1;
The output:
'0x004A00FC007200670065006E'
format_number
This function formats the number in the first expression to a format like '#,###,###.##', rounded to the number of decimal places in the second expression. It returns the result as a STRING. If the second expression is 0, the result has no decimal point or fractional part.

SELECT hive.format_number(12332.123456, 4),
    hive.format_number(12332.1,4),
    hive.format_number(12332.2,0) FROM sysibm.sysdummy1; 
The output:
'12,332.1235'  |    '12,332.1000'   |   '12,332'
from_unixtime
This function converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the TIMESTAMP of that moment in the current system time zone in the format of "1970-01-01 00:00:00". The following example returns the current date including the time.

SELECT HIVE.FROM_UNIXTIME(HIVE.UNIX_TIMESTAMP()) FROM SYSIBM.SYSDUMMY1;
get_json_object
This function extracts the JSON object from a JSON string (the first expression) that is based on the JSON path specified in the second expression. It returns a JSON string of the extracted JSON object. The result is NULL if the input JSON string is invalid. Because of Hive column name restrictions, the only valid characters in the JSON path are the characters [0-9a-z_]. Also, the keys cannot start with numbers. In the following example, the function reads the json_table.json from the json_table and extracts the fields Foo, Bar, Quuz.QuuxId, and Quux.QuuxName from each JSON object.
SELECT HIVE.GET_JSON_OBJECT(json_table.json, '$.Foo') as foo,
   HIVE.GET_JSON_OBJECT(json_table.json, '$.Bar') as bar,
   HIVE.GET_JSON_OBJECT(json_table.json, '$.Quux.QuuxId') as qid,
   HIVE.GET_JSON_OBJECT(json_table.json, '$.Quux.QuuxName') as qname 
FROM json_table;
The output is:

foo    bar              qid     qname
ABC    20090101100000   1234    Sam
log
This function returns the base logarithm of the argument in the second expression.
SELECT HIVE.LOG(3, 66) FROM sysibm.sysdummy1 ;
The output looks like the following:
+3.81358809221559E+000
log2
This function returns the base-2 logarithm of the argument.
SELECT HIVE.LOG2(44) FROM sysibm.sysdummy1 ;
The output looks like the following:
+5.45943161863730E+000
pi
This function returns the constant pi that is used in calculating the area of circle.
SELECT (hive.pi()*radius*radius) from radiiTable;
parse_url
This function returns the specified part from the URL. The valid values for the second expression, partToExtract, include the following values:
  • HOST
  • PATH
  • QUERY
  • REF
  • PROTOCOL
  • AUTHORITY
  • FILE
  • USERINFO
The following example returns 'facebook.com'.
SELECT HIVE.PARSE_URL('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') FROM sysibm.sysdummy1;
The following example extracts the value of a particular key in the QUERY, which is anything after the question mark, and returns the value 'v1'.
SELECT HIVE.PARSE_URL('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') FROM sysibm.sysdummy1;
unix_timestamp
This function converts the date to the specified date format and returns the number of seconds between the specified date and Unix epoch. If it fails, then it returns 0. The following example returns the value 1237532400

SELECT hive.unix_timestamp('2009-03-20', 'yyyy-MM-dd') FROM SYSIBM.SYSDUMMY1;