r - Comparing two dataframes in ddply function
I have two data frames, data and quantiles. data has dimensions 23011 x 2 and consists of the columns "year" and "data", where year covers a sequence of days over 1951:2013. The quantiles df has dimensions 63 x 2 and consists of the columns "year" and "quantiles", with one row per year, i.e. 1951:2013.
I need to compare the quantile df against the data df and compute, for each year, the sum of the data values exceeding that year's quantile value. For that, I'm using ddply in this manner:
    ddply(data, .(year), function(y) sum(y[which(y[,2] > quantile[,2]),2]) )

However, this code compares only against the first row of quantile and does not iterate over each year against the data df. I want to iterate over each year in the quantile df and calculate the sum of the data exceeding the quantile in that year.
Any help would be appreciated.
For an example of the problem, the quantile df is here and the data is pasted here.
The quantile df is derived from the data: it is the 90th percentile of the values in the data df that exceed 1.
quantile = quantile(data[-c(which(prcp2[,2] < 1)),x],0.9)})
Why not do it in one go? Creating the quantiles data frame first and then referring to it makes things more complicated than they need to be. You can compute the quantile inside ddply too.
    library(plyr)

    set.seed(1)
    data <- data.frame(
      year = sample(1951:2013, 23011, replace = TRUE),
      data = rnorm(23011)
    )

    # For each year, sum the values above that year's 90th percentile
    res <- ddply(data, .(year), function(x) {
      sum(x$data[x$data > quantile(x$data, .9)])
    })

And, as plyr seems to have been replaced by dplyr:
    library(dplyr)

    # Same calculation with dplyr; inside summarise(), `data` is the column
    res2 <- data %>%
      group_by(year) %>%
      summarise(test = sum(data[data > quantile(data, .9)]))
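If the quantiles data frame has to stay separate (for example because, as in the question, the threshold is computed only from values that are at least 1), one option is to attach each year's threshold to the data rows before summing. In the question's ddply call, `quantile[,2]` is the whole 63-element column, so it is never matched to the year of the current group; merging on "year" makes that match explicit. The following is a minimal base-R sketch, not a tested solution for the original data: the simulated `data`, the `mean = 1` shift, and the names `wet`, `merged`, `exceed` and `res3` are assumptions made for illustration.

```r
# Hypothetical stand-in data; the question's real data was not included
set.seed(1)
data <- data.frame(
  year = sample(1951:2013, 23011, replace = TRUE),
  data = rnorm(23011, mean = 1)
)

# Per-year 90th percentile, computed only from values of at least 1
wet <- data[data$data >= 1, ]
quantiles <- aggregate(data ~ year, data = wet,
                       FUN = function(v) unname(quantile(v, 0.9)))
names(quantiles)[2] <- "quantiles"

# Attach each year's threshold to every row of that year
merged <- merge(data, quantiles, by = "year")

# Keep rows exceeding their own year's threshold, then sum per year
exceed <- merged[merged$data > merged$quantiles, ]
res3 <- aggregate(data ~ year, data = exceed, FUN = sum)
```

Note that a year with no value above its threshold would simply be missing from `res3` rather than appearing with a sum of 0.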