R: Comparing two data frames in a ddply function


I have two data frames, data and quantiles. data has dimensions 23011 x 2 and consists of the columns "year" and "data", with one row per day from 1951 to 2013. The quantiles data frame has dimensions 63 x 2 and consists of the columns "year" and "quantiles", with one row per year, i.e. 1951:2013.

I need to compare the quantiles data frame against the data data frame and, for each year, sum the data values that exceed that year's quantile value. For that, I'm using ddply like this:

    ddply(data, .(year), function(y) sum(y[which(y[,2] > quantile[,2]),2]))

However, this code only compares against the first row of quantile and does not iterate over the years. I want to iterate over each year in the quantile data frame and calculate the sum of the data values exceeding that year's quantile.
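As for why the original call misbehaves: inside the function, quantile[,2] is still the whole 63-row column, so y[,2] > quantile[,2] compares one year's daily values element-wise against all 63 quantiles, recycling the shorter vector (R warns when the lengths are not multiples of each other). If you want to keep the two data frames, one option is to attach each year's threshold to every daily row with merge() and then sum per year. A minimal base-R sketch, assuming the data frames are named data and quantiles with the columns described above; the toy values are made up for illustration:

```r
# Toy inputs (made-up values) shaped like the question's data frames:
# `data` has one row per day, `quantiles` has one row per year.
data <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  data = c(1.5, 4.0, 6.0, 2.0, 8.0, 9.0)
)
quantiles <- data.frame(
  year      = c(2000, 2001),
  quantiles = c(3.0, 8.5)
)

# Attach each year's threshold to every daily row of that year.
merged <- merge(data, quantiles, by = "year")

# Keep a value only if it exceeds its own year's threshold, else count 0,
# so years without any exceedance still appear (with a sum of 0).
merged$excess <- ifelse(merged$data > merged$quantiles, merged$data, 0)

# Sum per year: 2000 -> 4 + 6 = 10, 2001 -> 9
res <- aggregate(excess ~ year, data = merged, FUN = sum)
print(res)
```

With plyr loaded, the last step can stay in ddply instead: ddply(merged, .(year), function(y) sum(y$data[y$data > y$quantiles])).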

Any help would be appreciated.

Example data: the quantile data frame is here, and the data is pasted here.

The quantile data frame is derived from data: for each year it is the 90th percentile of the data values of at least 1 (values below 1 are dropped first):

    quantile = quantile(data[-c(which(prcp2[,2] < 1)),x],0.9)})
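The snippet above is a fragment of a larger call. For reference, here is one way to build the whole per-year quantiles data frame in base R, a sketch assuming the question's column names. The daily series is simulated (exponential values, purely hypothetical) rather than the asker's data:

```r
# Hypothetical daily series standing in for the question's data:
# one row per day, three years, non-negative simulated values.
set.seed(42)
data <- data.frame(
  year = rep(1951:1953, each = 365),
  data = rexp(3 * 365, rate = 0.3)
)

# Drop values below 1 first, as the question's snippet does ...
wet <- data[data$data >= 1, ]

# ... then take the 90th percentile of what remains, per year.
q <- tapply(wet$data, wet$year, quantile, probs = 0.9, names = FALSE)

# One row per year, with the column names used in the question.
quantiles <- data.frame(
  year      = as.integer(names(q)),
  quantiles = as.vector(q)
)
print(quantiles)
```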

Why not do it in one go? Creating the quantiles data frame first and then referring to it makes things more complicated than they need to be. You can do it all within ddply:

    library(plyr)

    set.seed(1)
    data <- data.frame(
      year = sample(1951:2013, 23011, replace = TRUE),
      data = rnorm(23011)
    )
    # per year: sum of the values above that year's 90th percentile
    res <- ddply(data, .(year), function(x) {
      return(sum(x$data[x$data > quantile(x$data, .9)]))
    })

And, since plyr seems to have been superseded by dplyr:

    library(dplyr)
    res2 <- data %>%
      group_by(year) %>%
      summarise(test = sum(data[data > quantile(data, .9)]))
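Note that both versions above take the 90th percentile of all of a year's values, whereas the question's quantiles were the 90th percentile of the values of at least 1. A base-R sketch of the one-go idea that follows the question's definition, reusing the simulated data from the answer; sum_above is a hypothetical helper name:

```r
set.seed(1)
data <- data.frame(
  year = sample(1951:2013, 23011, replace = TRUE),
  data = rnorm(23011)
)

# Hypothetical helper: threshold = 90th percentile of the values >= 1
# (the question's definition), then sum the values above that threshold.
sum_above <- function(v) {
  wet <- v[v >= 1]
  if (length(wet) == 0) return(0)  # no values >= 1 in this year
  thr <- quantile(wet, 0.9, names = FALSE)
  sum(v[v > thr])
}

# Apply it to each year's values and collect one row per year.
by_year <- sapply(split(data$data, data$year), sum_above)
res3 <- data.frame(year = as.integer(names(by_year)), total = unname(by_year))
head(res3)
```

In the ddply or dplyr versions, the same change amounts to subsetting before calling quantile, e.g. summarise(test = sum(data[data > quantile(data[data >= 1], .9)])).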
