R - Handle Continuous Missing Values in time-series data
I have time-series data as shown below.
2015-04-26 23:00:00   5704.27388916015661380
2015-04-27 00:00:00   4470.30868326822928793
2015-04-27 01:00:00   4552.57241617838553793
2015-04-27 02:00:00   4570.22250032825650123
2015-04-27 03:00:00   NA
2015-04-27 04:00:00   NA
2015-04-27 05:00:00   NA
2015-04-27 06:00:00  12697.37724086216439900
2015-04-27 07:00:00   5538.71119009653739340
2015-04-27 08:00:00     81.95060647328695325
2015-04-27 09:00:00   8550.65816895300667966
2015-04-27 10:00:00   2925.76573206583680076
How should I handle continuous NA values? In cases where I have only one NA, I take the average of the values on either side of the NA entry. Are there any standard approaches to deal with continuous missing values?
The zoo package has several functions for dealing with NA values. One of the following functions might suit your needs:

- na.locf: last observation carried forward. Using the parameter fromLast = TRUE corresponds to next observation carried backward (NOCB).
- na.aggregate: replaces NAs with an aggregated value. The default aggregation function is mean, but you can specify other functions as well. See ?na.aggregate for more info.
- na.approx: NAs are replaced with linearly interpolated values.
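As a minimal sketch of the non-default arguments mentioned above, here is how they behave on a small made-up vector (x is a toy example with a run of three consecutive NAs, not the data from the question):

```r
library(zoo)

# Toy vector (an assumption for illustration, not the question's data)
x <- c(1, 2, NA, NA, NA, 10, 4)

locf <- na.locf(x)                      # last observation carried forward
nocb <- na.locf(x, fromLast = TRUE)     # next observation carried backward
med  <- na.aggregate(x, FUN = median)   # fill with the median of the observed values
lin  <- na.approx(x)                    # linear interpolation between 2 and 10

locf  # 1 2 2 2 2 10 4
nocb  # 1 2 10 10 10 10 4
med   # 1 2 3 3 3 10 4
lin   # 1 2 4 6 8 10 4
```

Which one is appropriate depends on the data: carrying values forward or backward keeps the series flat across the gap, while interpolation assumes a steady change between the two neighbouring observations.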
You can compare the outcomes to see what these functions do:
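library(zoo)

df$v.loc <- na.locf(df$v2)       # last observation carried forward
df$v.agg <- na.aggregate(df$v2)  # replace NAs with the mean of the observed values
df$v.app <- na.approx(df$v2)     # linear interpolation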
This results in:
> df
                    v1          v2       v.loc       v.agg       v.app
1  2015-04-26 23:00:00  5704.27389  5704.27389  5704.27389  5704.27389
2  2015-04-27 00:00:00  4470.30868  4470.30868  4470.30868  4470.30868
3  2015-04-27 01:00:00  4552.57242  4552.57242  4552.57242  4552.57242
4  2015-04-27 02:00:00  4570.22250  4570.22250  4570.22250  4570.22250
5  2015-04-27 03:00:00          NA  4570.22250  5454.64894  6602.01119
6  2015-04-27 04:00:00          NA  4570.22250  5454.64894  8633.79987
7  2015-04-27 05:00:00          NA  4570.22250  5454.64894 10665.58856
8  2015-04-27 06:00:00 12697.37724 12697.37724 12697.37724 12697.37724
9  2015-04-27 07:00:00  5538.71119  5538.71119  5538.71119  5538.71119
10 2015-04-27 08:00:00    81.95061    81.95061    81.95061    81.95061
11 2015-04-27 09:00:00  8550.65817  8550.65817  8550.65817  8550.65817
12 2015-04-27 10:00:00  2925.76573  2925.76573  2925.76573  2925.76573
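If long runs of NAs should not be filled at all, na.approx also accepts a maxgap argument that limits how many consecutive NAs get interpolated. The sketch below uses a made-up vector y (an assumption for illustration, not the question's data) with one single NA and one run of three:

```r
library(zoo)

# Toy vector: a single NA at position 2 and a run of three NAs at positions 4-6
y <- c(1, NA, 3, NA, NA, NA, 11, 12)

# Interpolate only gaps of at most 2 consecutive NAs; longer gaps stay NA.
# na.rm = FALSE keeps the result the same length as the input.
filled <- na.approx(y, maxgap = 2, na.rm = FALSE)

filled  # 1 2 3 NA NA NA 11 12
```

The remaining NAs can then be handled separately, for example with na.locf or by leaving them missing, depending on how much you trust an interpolation across a long gap.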
Used data:
df <- structure(list(
  v1 = structure(c(1430082000, 1430085600, 1430089200, 1430092800,
                   1430096400, 1430100000, 1430103600, 1430107200,
                   1430110800, 1430114400, 1430118000, 1430121600),
                 class = c("POSIXct", "POSIXt"), tzone = ""),
  v2 = c(5704.27388916016, 4470.30868326823, 4552.57241617839,
         4570.22250032826, NA, NA, NA, 12697.3772408622,
         5538.71119009654, 81.950606473287, 8550.65816895301,
         2925.76573206584)),
  .Names = c("v1", "v2"), row.names = c(NA, -12L), class = "data.frame")