Data types
To run R user-defined functions on the server requires a good understanding of how different data types in Netezza are represented and processed by the Netezza R libraries.
| Netezza | Supported | R | Comments |
|---|---|---|---|
| FIXED | Yes | Character | |
| VARIABLE | Yes | Character | |
| NATIONAL_FIXED | Yes | Character | UTF-8 |
| NATIONAL_VARIABLE | Yes | Character | UTF-8 |
| BOOL | Yes | Logical | |
| INT8 | Yes | Integer | |
| INT16 | Yes | Integer | |
| INT32 | Yes | Integer | |
| INT64 | Yes | Double | Loss of precision |
| NUMERIC32 | No | ||
| NUMERIC64 | No | ||
| NUMERIC128 | No | ||
| FLOAT | Yes | Double | |
| DOUBLE | Yes | Double | |
| DATE | Yes | Integer | |
| TIME | Yes | POSIXct | |
| TIMETZ | Yes | List | Elements: time and zone |
| TIMESTAMP | Yes | Double | Loss of precision |
| INTERVAL | Yes | List | Elements: time and month |
Character strings
N columns of types VARIABLE and FIXED are represented in R as
character vectors. The Netezza NATIONAL_FIXED and NATIONAL_
VARIABLE data types result in UTF-8-encoded character strings, and are handled using
Encoding(x) == "UTF-8".
Integers
Integers that are stored in one, two, and four bytes, that is, INT8,
INT16 and INT32, are translated to integer vectors. When
outputting data, the 4-byte-long integer value is cropped to an appropriate length.
Netezza INT64 columns are represented as numeric vectors in R. Thus, input
values greater than 2 53−1 might not be represented correctly. When outputting
INT64 values from an R AE, the output value is cast to numeric, and then to
8-byte-long integer.
NA values are not allowed when outputting data to the Netezza system when the output format is an integer type. R allows for NA values in the case of integers, but internally, these values are represented as an arbitrarily chosen sequence of bits. Currently the largest negative 32-bit integer, which cannot be translated to Netezza integers. If NULL should be output instead, the setOutputNull function can be used.
Boolean
The BOOL data type is represented as logical.
NA values are not allowed when outputting data to the Netezza system when the output format is
Boolean. If NULL should be output instead, the setOutputNull
function can be used.
Floating point
Netezza supports two floating-point numeric formats, FLOAT and
DOUBLE, which are both standardized and described in “IEEE 754 Standard for
Floating-Point Arithmetic.” In R, columns of these types are cast as double; similarly, outputting
one of these data types means casting from double.
Numeric
Netezza numeric data types are NUMERIC32, NUMERIC64, and
NUMERIC128. Currently, none of the NUMERIC types are supported in
R. To avoid errors, convert the data types to REAL before using R functions on the
data.
Date and time
DATE, TIME,
TIMETZ, TIMESTAMP, and INTERVAL.- Date
- The
DATEdata type is stored as a 4-byte integer and represents the number of days before (-) or after (+) 1/1/2000 (January 1, 2000). In R, it is stored as an integer value.minimal value -730,119 (1/1/0001) maximal value 2,921,939 (12/31/9999) - Time
- The
TIMEdata type is stored as an 8-byte integer and represents the number of microseconds between midnight and one microsecond before midnight. In R, it is stored as a double value, but only the integer portion is taken into account, whereas the fractional portion is ignored.minimal value 0 (00:00:00.000000) maximal value 86,399,999,999 (23:59:59.999999) - Time with timezone
- The
TIMETZdata type consists of two fields: the standardTIMEfield and a timezone field that is a 4-byte integer representing the offset in seconds, sign reversed. For example, the offset of “+1 hour” is stored as -3600. The offset must be a whole number of minutes, that is, offset mod 60=0.offset minimal value -46800 (+ 13:00:00) offset maximal value 46740 (-12:59:00) - Timestamp
- The
TIMESTAMPdata type is an 8 -byte integer representing the number of microseconds before (- ) or after (+) 00:00:00.0, 1/1/2000. In R, it is stored as data type double, which means that for some values greater than 2 53−1 , the rounding error might affect the value that is returned to Netezza.minimal value -63,082,281,600,000,000 (00:00:00, 1/1/0001) maximal value 252,455,615,999,999,999 (23:59:59.999999, 12/31/9999) - Interval
- The
INTERVALdata type consists of a 4-byte integer—the number of months, signed—and an 8- byte integer—the number of microseconds, signed. A configuration of both a positive (+) or negative (-) months value as well as a positive (+) or negative (-) microseconds values are possible and supported by Netezza. In R, these values are represented as an integer (months) and double (microseconds). The microsecond use means that, as withTIMESTAMP, rounding errors must be taken into account.Note:- The microsecond value can be as large as the
INT64data type allows and overflows into negatives without error. In R, the microsecond value is stored as type numeric and cast toINT64when it is sent to Netezza. Because the numeric type allows values that are larger than the allowed maximumINT64value, it is important that values are not larger than the maximumINT64. This rule applies also to the minimal microsecond value. - A month is always considered to contain 30 days.
- The months and microseconds values are stored separately; they do not exchange
information.
months minimal value 3,000,000 (-250,000 years) months maximal value 3,000,000 (250,000 years) microseconds minimal value none (minimal signed INT64) microseconds maximal value none (maximal signed INT64)
- The microsecond value can be as large as the