r - Handle Continuous Missing values in time-series data


I have the time-series data shown below.

2015-04-26 23:00:00  5704.27388916015661380
2015-04-27 00:00:00  4470.30868326822928793
2015-04-27 01:00:00  4552.57241617838553793
2015-04-27 02:00:00  4570.22250032825650123
2015-04-27 03:00:00  NA
2015-04-27 04:00:00  NA
2015-04-27 05:00:00  NA
2015-04-27 06:00:00 12697.37724086216439900
2015-04-27 07:00:00  5538.71119009653739340
2015-04-27 08:00:00    81.95060647328695325
2015-04-27 09:00:00  8550.65816895300667966
2015-04-27 10:00:00  2925.76573206583680076

How should I handle continuous NA values? In cases where there is only 1 NA, I take the average of the values on either side of the NA entry. Are there any standard approaches to deal with continuous missing values?

The zoo package has several functions for dealing with NA values. One of the following functions might suit your needs:

  • na.locf: last observation carried forward. Using the parameter fromLast = TRUE corresponds to next observation carried backward (NOCB).
  • na.aggregate: replaces NAs with an aggregated value. The default aggregation function is the mean, but you can specify other functions as well. See ?na.aggregate for more info.
  • na.approx: NAs are replaced with linearly interpolated values.
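As a quick check of how na.approx relates to the averaging approach from the question, here is a minimal sketch on a made-up three-element vector (x, manual and interp are illustrative names, not part of the original data). On a plain vector the interpolation runs over the positions 1, 2, 3, so a single NA between two neighbours should come out as their mean:

```r
library(zoo)

# Toy vector (hypothetical values) with a single missing entry
x <- c(10, NA, 20)

# Manual approach from the question: mean of the two neighbouring values
manual <- mean(c(x[1], x[3]))

# Linear interpolation over the index positions
interp <- na.approx(x)

print(manual)
print(interp[2])
```

For a zoo object with an irregular time index, na.approx interpolates along the index by default, so the result can differ from a plain average of the neighbours.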

You can compare the outcomes to see what these functions do:

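library(zoo)

df$v.loc <- na.locf(df$v2)       # last observation carried forward
df$v.agg <- na.aggregate(df$v2)  # replace with the mean of the non-NA values
df$v.app <- na.approx(df$v2)     # linear interpolation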

This results in:

> df
                    v1          v2       v.loc       v.agg       v.app
1  2015-04-26 23:00:00  5704.27389  5704.27389  5704.27389  5704.27389
2  2015-04-27 00:00:00  4470.30868  4470.30868  4470.30868  4470.30868
3  2015-04-27 01:00:00  4552.57242  4552.57242  4552.57242  4552.57242
4  2015-04-27 02:00:00  4570.22250  4570.22250  4570.22250  4570.22250
5  2015-04-27 03:00:00          NA  4570.22250  5454.64894  6602.01119
6  2015-04-27 04:00:00          NA  4570.22250  5454.64894  8633.79987
7  2015-04-27 05:00:00          NA  4570.22250  5454.64894 10665.58856
8  2015-04-27 06:00:00 12697.37724 12697.37724 12697.37724 12697.37724
9  2015-04-27 07:00:00  5538.71119  5538.71119  5538.71119  5538.71119
10 2015-04-27 08:00:00    81.95061    81.95061    81.95061    81.95061
11 2015-04-27 09:00:00  8550.65817  8550.65817  8550.65817  8550.65817
12 2015-04-27 10:00:00  2925.76573  2925.76573  2925.76573  2925.76573
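If long runs of NA should not be filled at all, na.approx also accepts a maxgap argument that limits how many consecutive NAs get interpolated. The sketch below uses a made-up vector (x and the filled_* names are illustrative, not taken from the data above) and also shows na.locf with fromLast = TRUE for comparison:

```r
library(zoo)

# Toy vector (hypothetical values): one single NA and one run of three NAs
x <- c(1, NA, 3, NA, NA, NA, 7)

# Fill every interior gap by linear interpolation
filled_all <- na.approx(x, na.rm = FALSE)

# Fill only gaps of at most one consecutive NA; longer runs stay NA
filled_short <- na.approx(x, na.rm = FALSE, maxgap = 1)

# Next observation carried backward (NOCB)
filled_nocb <- na.locf(x, fromLast = TRUE)

print(filled_all)
print(filled_short)
print(filled_nocb)
```

Whether a run of three hourly values is short enough to interpolate depends on the data, so the maxgap value here is only a placeholder.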

Data used:

df <- structure(list(
  v1 = structure(c(1430082000, 1430085600, 1430089200, 1430092800,
                   1430096400, 1430100000, 1430103600, 1430107200,
                   1430110800, 1430114400, 1430118000, 1430121600),
                 class = c("POSIXct", "POSIXt"), tzone = ""),
  v2 = c(5704.27388916016, 4470.30868326823, 4552.57241617839,
         4570.22250032826, NA, NA, NA, 12697.3772408622,
         5538.71119009654, 81.950606473287, 8550.65816895301,
         2925.76573206584)),
  .names = c("v1", "v2"), row.names = c(NA, -12L), class = "data.frame")
