Data types

To run R user-defined functions on the server requires a good understanding of how different data types in Netezza are represented and processed by the Netezza R libraries.

Analytic Executables that are written in R support a number of data types, all of which have a direct equivalent in the Netezza DBMS. Some data types can be represented precisely in R, whereas some data types must be cast to a similar data type, such as a 64-bit integer stored as numeric or double in R. There are also data types that cannot be easily supported in R. The following table identifies all available data types and shows how they are supported in the R Language Adapter.
Netezza Supported R Comments
FIXED Yes Character  
VARIABLE Yes Character  
NATIONAL_FIXED Yes Character UTF-8
NATIONAL_VARIABLE Yes Character UTF-8
BOOL Yes Logical  
INT8 Yes Integer  
INT16 Yes Integer  
INT32 Yes Integer  
INT64 Yes Double Loss of precision
NUMERIC32 No    
NUMERIC64 No    
NUMERIC128 No    
FLOAT Yes Double  
DOUBLE Yes Double  
DATE Yes Integer  
TIME Yes POSIXct  
TIMETZ Yes List Elements: time and zone
TIMESTAMP Yes Double Loss of precision
INTERVAL Yes List Elements: time and month

Character strings

N columns of types VARIABLE and FIXED are represented in R as character vectors. The Netezza NATIONAL_FIXED and NATIONAL_ VARIABLE data types result in UTF-8-encoded character strings, and are handled using Encoding(x) == "UTF-8".

Integers

Integers that are stored in one, two, and four bytes, that is, INT8, INT16 and INT32, are translated to integer vectors. When outputting data, the 4-byte-long integer value is cropped to an appropriate length.

Netezza INT64 columns are represented as numeric vectors in R. Thus, input values greater than 2 53−1 might not be represented correctly. When outputting INT64 values from an R AE, the output value is cast to numeric, and then to 8-byte-long integer.

NA values are not allowed when outputting data to the Netezza system when the output format is an integer type. R allows for NA values in the case of integers, but internally, these values are represented as an arbitrarily chosen sequence of bits. Currently the largest negative 32-bit integer, which cannot be translated to Netezza integers. If NULL should be output instead, the setOutputNull function can be used.

Boolean

The BOOL data type is represented as logical.

NA values are not allowed when outputting data to the Netezza system when the output format is Boolean. If NULL should be output instead, the setOutputNull function can be used.

Floating point

Netezza supports two floating-point numeric formats, FLOAT and DOUBLE, which are both standardized and described in “IEEE 754 Standard for Floating-Point Arithmetic.” In R, columns of these types are cast as double; similarly, outputting one of these data types means casting from double.

Numeric

Netezza numeric data types are NUMERIC32, NUMERIC64, and NUMERIC128. Currently, none of the NUMERIC types are supported in R. To avoid errors, convert the data types to REAL before using R functions on the data.

Date and time

Netezza defines a number of data and time formats: DATE, TIME, TIMETZ, TIMESTAMP, and INTERVAL.
Date
The DATE data type is stored as a 4-byte integer and represents the number of days before (-) or after (+) 1/1/2000 (January 1, 2000). In R, it is stored as an integer value.
minimal value -730,119 (1/1/0001)
maximal value 2,921,939 (12/31/9999)
Time
The TIME data type is stored as an 8-byte integer and represents the number of microseconds between midnight and one microsecond before midnight. In R, it is stored as a double value, but only the integer portion is taken into account, whereas the fractional portion is ignored.
minimal value 0 (00:00:00.000000)
maximal value 86,399,999,999
(23:59:59.999999)
Time with timezone
The TIMETZ data type consists of two fields: the standard TIME field and a timezone field that is a 4-byte integer representing the offset in seconds, sign reversed. For example, the offset of “+1 hour” is stored as -3600. The offset must be a whole number of minutes, that is, offset mod 60=0.
offset minimal value -46800 (+ 13:00:00)
offset maximal value 46740 (-12:59:00)
Timestamp
The TIMESTAMP data type is an 8 -byte integer representing the number of microseconds before (- ) or after (+) 00:00:00.0, 1/1/2000. In R, it is stored as data type double, which means that for some values greater than 2 53−1 , the rounding error might affect the value that is returned to Netezza.
minimal value -63,082,281,600,000,000 (00:00:00, 1/1/0001)
maximal value 252,455,615,999,999,999 (23:59:59.999999,
12/31/9999)
Interval
The INTERVAL data type consists of a 4-byte integer—the number of months, signed—and an 8- byte integer—the number of microseconds, signed. A configuration of both a positive (+) or negative (-) months value as well as a positive (+) or negative (-) microseconds values are possible and supported by Netezza. In R, these values are represented as an integer (months) and double (microseconds). The microsecond use means that, as with TIMESTAMP, rounding errors must be taken into account.
Note:
  • The microsecond value can be as large as the INT64 data type allows and overflows into negatives without error. In R, the microsecond value is stored as type numeric and cast to INT64 when it is sent to Netezza. Because the numeric type allows values that are larger than the allowed maximum INT64 value, it is important that values are not larger than the maximum INT64. This rule applies also to the minimal microsecond value.
  • A month is always considered to contain 30 days.
  • The months and microseconds values are stored separately; they do not exchange information.
    months minimal value 3,000,000 (-250,000 years)
    months maximal value 3,000,000 (250,000 years)
    microseconds minimal value none (minimal signed INT64)
    microseconds maximal value none (maximal signed INT64)