Interpolation of Values in Netezza Time Series

Interpolation is the process of estimating and inserting missing values in time series data.

If the intervals of the time series are regular but some values are simply not present, the missing values can be estimated using linear interpolation. Consider the following series of monthly passenger arrivals at an airport terminal.

Table 1. Monthly arrivals at a passenger terminal
Month Passengers
3 3,500,000
4 3,900,000
5 -
6 3,400,000
7 4,500,000
8 3,900,000
9 5,800,000
10 6,000,000

In this case, linear interpolation would estimate the missing value for month 5 as 3,650,000, the mid-point between the values for months 4 and 6: (3,900,000 + 3,400,000) / 2.
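The calculation can be sketched in a few lines of Python. This is only an illustration of linear interpolation on a regularly spaced series, not the Netezza implementation; the function name fill_linear and the use of None to mark a missing value are assumptions made for this example.

```python
def fill_linear(values):
    """Fill None entries by linear interpolation between the nearest known
    neighbours. Assumes equally spaced observations; gaps at the start or
    end of the series are left as None (no extrapolation)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Data from Table 1; month 5 is missing.
months = [3, 4, 5, 6, 7, 8, 9, 10]
passengers = [3_500_000, 3_900_000, None, 3_400_000,
              4_500_000, 3_900_000, 5_800_000, 6_000_000]

filled = fill_linear(passengers)
print(dict(zip(months, filled))[5])  # 3650000.0
```

Because the series is regular, the weight depends only on the position of the missing value between its neighbours; for a single gap it is 0.5, which gives the mid-point.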

Irregular intervals are handled differently. Consider the following series of temperature readings.

Table 2. Temperature readings
Date Time Temperature
2011-07-24 7:00 57
2011-07-24 14:00 75
2011-07-24 21:00 72
2011-07-25 7:15 59
2011-07-25 14:00 77
2011-07-25 20:55 74
2011-07-27 7:00 60
2011-07-27 14:00 78
2011-07-27 22:00 74

Here, three readings were taken on each of three days, but at varying times, only some of which recur from day to day. In addition, only two of the days are consecutive: there are no readings for 2011-07-26.

This situation can be handled in one of two ways: calculating aggregates, or determining a step size.

For example, daily aggregates might be calculated using a formula based on semantic knowledge of the data. Doing so could result in the following data set, in which 2011-07-26 has no value because no readings were taken on that day.

Table 3. Temperature readings (aggregated)
Date Time Temperature
2011-07-24 24:00 69
2011-07-25 24:00 71
2011-07-26 24:00 null
2011-07-27 24:00 72
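As a rough sketch of the aggregation approach, the following Python groups the readings from Table 2 by day and emits one row per calendar day, including the day without readings. The plain arithmetic mean used here is an assumption chosen only for illustration; the formula behind Table 3 is not stated, so this sketch does not reproduce the values in that table.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Readings from Table 2 (the time of day is not needed for daily aggregation).
readings = [
    ("2011-07-24", 57), ("2011-07-24", 75), ("2011-07-24", 72),
    ("2011-07-25", 59), ("2011-07-25", 77), ("2011-07-25", 74),
    ("2011-07-27", 60), ("2011-07-27", 78), ("2011-07-27", 74),
]

by_day = defaultdict(list)
for day, temperature in readings:
    by_day[date.fromisoformat(day)].append(temperature)

# Walk every calendar day in the range so that days without readings
# appear explicitly with a null (None) aggregate.
daily = {}
day = min(by_day)
last = max(by_day)
while day <= last:
    temps = by_day.get(day)
    daily[day] = mean(temps) if temps else None  # assumed formula: plain mean
    day += timedelta(days=1)

for day, value in daily.items():
    print(day, value)
```

The result is a regular daily series with one null value, which can then be filled in the same way as the missing month in Table 1.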

Alternatively, the algorithm can treat the readings as a single series and determine a suitable step size for it. In this case, the step size determined by the algorithm might be 8 hours, resulting in the following regular grid, in which only the time points that coincide with an original reading have a value.

Table 4. Temperature readings with step size calculated
Date Time Temperature
2011-07-24 6:00 -
2011-07-24 14:00 75
2011-07-24 22:00 -
2011-07-25 6:00 -
2011-07-25 14:00 77
2011-07-25 22:00 -
2011-07-26 6:00 -
2011-07-26 14:00 -
2011-07-26 22:00 -
2011-07-27 6:00 -
2011-07-27 14:00 78
2011-07-27 22:00 74

Here, only four time points coincide with original measurements (14:00 on each of the three days, and 22:00 on 2011-07-27). The remaining values can again be calculated by interpolation, using the other known readings from the original series.
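One way to picture this step is time-weighted linear interpolation of the original readings onto the 8-hour grid. The Python below is a minimal sketch under that assumption; it is not the algorithm used by the product, the helper names parse and interpolate_at are hypothetical, and the 8-hour step and the 6:00 start are simply taken from Table 4 rather than derived.

```python
from datetime import datetime, timedelta

def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# Original irregular readings from Table 2.
points = [(parse(ts), temp) for ts, temp in [
    ("2011-07-24 07:00", 57), ("2011-07-24 14:00", 75), ("2011-07-24 21:00", 72),
    ("2011-07-25 07:15", 59), ("2011-07-25 14:00", 77), ("2011-07-25 20:55", 74),
    ("2011-07-27 07:00", 60), ("2011-07-27 14:00", 78), ("2011-07-27 22:00", 74),
]]

def interpolate_at(t, points):
    """Linearly interpolate between the two readings that bracket time t.
    Returns None when t lies outside the observed range (no extrapolation)."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            fraction = (t - t0) / (t1 - t0)  # ratio of two timedeltas is a float
            return v0 + fraction * (v1 - v0)
    return None

# Regular grid from Table 4: 12 points, 8 hours apart, starting at 6:00.
start = parse("2011-07-24 06:00")
step = timedelta(hours=8)
grid = [start + i * step for i in range(12)]

resampled = [(t, interpolate_at(t, points)) for t in grid]
for t, value in resampled:
    print(t.strftime("%Y-%m-%d %H:%M"), None if value is None else round(value, 1))
```

In this sketch the four grid points that coincide with original measurements keep their measured values, and the first grid point (6:00 on 2011-07-24) stays empty because it precedes the earliest reading. Whether such edge values are extrapolated or left missing is a design choice that the text above does not specify.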