r - Comparing two dataframes in ddply function
I have two data frames, data and quantiles. data has dimensions 23011 x 2 and consists of the columns "year" and "data", where year covers a sequence of days spanning 1951:2013. The quantiles df has dimensions 63 x 2 and consists of the columns "year" and "quantiles", with one row per year, i.e. 1951:2013.
I need to compare the quantile df against the data df and, for each year, compute the sum of the data values that exceed that year's quantile value. For that, I'm using ddply in this manner:
ddply(data, .(year), function(y) sum(y[which(y[,2] > quantile[,2]),2]) )
However, this code only compares against the first row of quantile instead of matching each year against the data df. I want to iterate over each year in the quantile df and calculate the sum of the data values exceeding the quantile for that year.
Any help would be appreciated.
An example of the problem: the quantile df is here, and the data is pasted here.
The quantile df is derived from data: it is the 90th percentile of the values in the data df exceeding a value of 1:
quantile = quantile(data[-c(which(prcp2[,2] < 1)),x],0.9)})
Why not do it in one go? Creating the quantiles data frame first and then referring to it makes things more complicated than they need to be. You can do this with ddply too:
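library(plyr)
set.seed(1)
# simulated stand-in for the real data: one row per day, labelled by year
data <- data.frame(
  year = sample(1951:2013, 23011, replace = TRUE),
  data = rnorm(23011)
)
# for each year, sum the values above that year's own 90th percentile
res <- ddply(data, .(year), function(x) {
  sum(x$data[x$data > quantile(x$data, .9)])
})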
And, as plyr seems to have been replaced by dplyr:
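library(dplyr)
# same calculation on the example data frame above; inside summarise(),
# `data` refers to the column named data
res2 <- data %>%
  group_by(year) %>%
  summarise(test = sum(data[data > quantile(data, .9)]))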
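If the precomputed quantiles data frame has to be kept, for instance because it is built only from values of at least 1 as described in the question, one option is to join it to the data by year before comparing. The comparison in the question sets each year's rows against the whole quantile column rather than against the single value for that year, and a join avoids that. The following is a minimal base R sketch, not the asker's actual data: the simulated values (`rexp`), the helper names `wet`, `merged`, `exceeds` and the output column `sum_exceeding` are assumptions made for illustration.

```r
set.seed(1)
# Hypothetical stand-in for the question's data frame (assumed positive values)
data <- data.frame(
  year = sample(1951:2013, 23011, replace = TRUE),
  data = rexp(23011, rate = 0.2)
)

# Per-year 90th percentile, using only the values that are at least 1
wet <- data[data$data >= 1, ]
q <- tapply(wet$data, wet$year, quantile, probs = 0.9)
quantiles <- data.frame(year = as.integer(names(q)), quantiles = as.vector(q))

# Attach each year's quantile to every row of that year
merged <- merge(data, quantiles, by = "year")

# Sum the values above the year's quantile; non-exceeding rows contribute 0
exceeds <- merged$data > merged$quantiles
sums <- tapply(merged$data * exceeds, merged$year, sum)
res <- data.frame(year = as.integer(names(sums)), sum_exceeding = as.vector(sums))
```

Here `res` has one row per year. Multiplying by the logical `exceeds` keeps years with no exceedances in the result with a sum of 0 instead of dropping them.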