Time Series Interpolation

Time series data that we want to process often contains missing values.
There are several ways to deal with them; for example, we can remove every record that contains a missing value. For time series data, however, deletion is a risky choice, because the characteristics of the series can be lost.
Instead, we interpolate the missing values from the existing values.

Import required libraries

In [3]:
import numpy as np
from renom.utility import interpolate

Prepare the data to interpolate

In [5]:
x = [[2.0, 3.0, 5.0], [3.0, np.nan, 8.0], [6.0, np.nan, 9.0], [7.0, 9.0, np.nan], [8.0, 10.0, 10.0], [10.0, 12.0, 14.0]]
x
Out[5]:
[[2.0, 3.0, 5.0],
 [3.0, nan, 8.0],
 [6.0, nan, 9.0],
 [7.0, 9.0, nan],
 [8.0, 10.0, 10.0],
 [10.0, 12.0, 14.0]]
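
Before choosing a mode, it can help to see where the gaps are and how much data row deletion would cost. The following is a minimal sketch using only NumPy; the names `data`, `missing_per_column` and `complete_rows` are illustrative and not part of renom.

```python
import numpy as np

# Same values as x above, held as a NumPy array for this sketch.
data = np.array([[2.0, 3.0, 5.0], [3.0, np.nan, 8.0], [6.0, np.nan, 9.0],
                 [7.0, 9.0, np.nan], [8.0, 10.0, 10.0], [10.0, 12.0, 14.0]])

missing = np.isnan(data)                     # True where a value is missing
missing_per_column = missing.sum(axis=0)     # number of gaps in each column
complete_rows = data[~missing.any(axis=1)]   # what row deletion would keep

print(missing_per_column)    # [0 2 1]
print(complete_rows.shape)   # (3, 3)
```

Deleting incomplete records would discard three of the six time steps here, which is why interpolation is preferred.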

Interpolation mode

The following interpolation modes are available for time series interpolation.

  • linear interpolation
  • spline interpolation
  • constant interpolation
  • nearest index interpolation
Each method has pros and cons. Linear interpolation is simple and useful, and it is usually sufficient unless the series changes sharply. Spline interpolation fills the missing values with a smooth curve, using conditions on the first and second derivatives, so when the known values around a gap differ only slightly, the result is similar to linear interpolation.
When the known values around a gap differ greatly, spline interpolation is the more useful choice.
Constant interpolation and nearest index interpolation are ad hoc methods; they are useful when a specific value should be filled in.
  • linear interpolation

Linear interpolation is the simplest method: each missing value is placed on the straight line between the nearest known values before and after it.

In [4]:
x_interp = interpolate(x, mode="linear")
x_interp
Out[4]:
array([[  2. ,   3. ,   5. ],
       [  3. ,   5. ,   8. ],
       [  6. ,   7. ,   9. ],
       [  7. ,   9. ,   9.5],
       [  8. ,  10. ,  10. ],
       [ 10. ,  12. ,  14. ]])
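
To make the mechanics concrete, the same result can be sketched with plain NumPy. This is not renom's implementation; `linear_fill` is a hypothetical helper that assumes each column is interpolated independently over the row index, which is consistent with the output above.

```python
import numpy as np

def linear_fill(data):
    """Fill NaNs column by column, interpolating linearly over the row index."""
    filled = np.array(data, dtype=float)      # copy, so the input is untouched
    idx = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        col = filled[:, j]                    # view into `filled`
        known = ~np.isnan(col)
        # np.interp draws straight lines between the known points
        col[~known] = np.interp(idx[~known], idx[known], col[known])
    return filled

data = [[2.0, 3.0, 5.0], [3.0, np.nan, 8.0], [6.0, np.nan, 9.0],
        [7.0, 9.0, np.nan], [8.0, 10.0, 10.0], [10.0, 12.0, 14.0]]
filled = linear_fill(data)
print(filled[:, 1])   # [ 3.  5.  7.  9. 10. 12.]
```

Note that `np.interp` repeats the first or last known value for gaps at the very start or end of a column; how renom treats such edge gaps is not shown in this example.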
  • spline interpolation

Spline interpolation is useful for non-linear data or when the missing range is wide. It uses the known values before and after the gap to construct the interpolating equation, so the result usually follows the data naturally.

In [8]:
x_interp = interpolate(x, mode="spline")
x_interp
Out[8]:
array([[  2.        ,   3.        ,   5.        ],
       [  3.        ,   6.4       ,   8.        ],
       [  6.        ,   8.1       ,   9.        ],
       [  7.        ,   9.        ,   9.03571429],
       [  8.        ,  10.        ,  10.        ],
       [ 10.        ,  12.        ,  14.        ]])
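
For comparison, here is a sketch of column-wise cubic spline filling with SciPy. It assumes SciPy is installed and uses `scipy.interpolate.CubicSpline` with its default "not-a-knot" boundary condition; whether renom uses the same spline internally is an assumption, and `spline_fill` is a hypothetical helper. For this particular input it gives the same values as the output above.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_fill(data):
    """Fill NaNs column by column with a cubic spline over the row index."""
    filled = np.array(data, dtype=float)      # copy, so the input is untouched
    idx = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        col = filled[:, j]                    # view into `filled`
        known = ~np.isnan(col)
        if known.all():
            continue                          # nothing to fill in this column
        # Default boundary condition is bc_type="not-a-knot"
        spline = CubicSpline(idx[known], col[known])
        col[~known] = spline(idx[~known])
    return filled

data = [[2.0, 3.0, 5.0], [3.0, np.nan, 8.0], [6.0, np.nan, 9.0],
        [7.0, 9.0, np.nan], [8.0, 10.0, 10.0], [10.0, 12.0, 14.0]]
filled = spline_fill(data)
print(filled[1:3, 1])   # approximately [6.4 8.1]
```

The second column keeps only four known points, so the not-a-knot spline reduces to the single cubic through them. That curve bends above the straight line used by linear interpolation, which is why the filled values are 6.4 and 8.1 rather than 5 and 7.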
  • constant interpolation

Constant interpolation replaces every missing value with a constant. The constant value must be specified.

In [10]:
x_interp = interpolate(x, mode="constant", constant=0.0)
x_interp
Out[10]:
array([[  2.,   3.,   5.],
       [  3.,   0.,   8.],
       [  6.,   0.,   9.],
       [  7.,   9.,   0.],
       [  8.,  10.,  10.],
       [ 10.,  12.,  14.]])
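
The equivalent operation in plain NumPy is a single masked replacement. This sketch is independent of renom; `fill_value` is an illustrative name.

```python
import numpy as np

data = np.array([[2.0, 3.0, 5.0], [3.0, np.nan, 8.0], [6.0, np.nan, 9.0],
                 [7.0, 9.0, np.nan], [8.0, 10.0, 10.0], [10.0, 12.0, 14.0]])

fill_value = 0.0
# Keep known values, substitute fill_value wherever the entry is NaN
filled = np.where(np.isnan(data), fill_value, data)
print(filled[:, 1])   # [ 3.  0.  0.  9. 10. 12.]
```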
  • nearest index interpolation

Nearest index interpolation fills each missing value with the known value at the nearest index in the same column.

In [11]:
x_interp = interpolate(x, mode="nearest_index")
x_interp
Out[11]:
array([[  2.,   3.,   5.],
       [  3.,   3.,   8.],
       [  6.,   9.,   9.],
       [  7.,   9.,   9.],
       [  8.,  10.,  10.],
       [ 10.,  12.,  14.]])
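
The rule can be sketched in NumPy as follows. `nearest_index_fill` is a hypothetical helper, not renom's code. It assumes that when two known rows are equally close, the earlier one wins; that tie-breaking choice is inferred from the output above, where the gap in the third column takes 9 from the row before it rather than 10 from the row after.

```python
import numpy as np

def nearest_index_fill(data):
    """Fill NaNs with the known value at the nearest row index, per column."""
    filled = np.array(data, dtype=float)      # copy, so the input is untouched
    idx = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        col = filled[:, j]                    # view into `filled`
        known_idx = idx[~np.isnan(col)]
        missing_idx = idx[np.isnan(col)]
        if missing_idx.size == 0:
            continue                          # nothing to fill in this column
        # Distance from every missing row to every known row
        dist = np.abs(missing_idx[:, None] - known_idx[None, :])
        # argmin returns the first minimum, so ties go to the earlier row
        nearest = known_idx[dist.argmin(axis=1)]
        col[missing_idx] = col[nearest]
    return filled

data = [[2.0, 3.0, 5.0], [3.0, np.nan, 8.0], [6.0, np.nan, 9.0],
        [7.0, 9.0, np.nan], [8.0, 10.0, 10.0], [10.0, 12.0, 14.0]]
filled = nearest_index_fill(data)
print(filled[:, 1])   # [ 3.  3.  9.  9. 10. 12.]
```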