Taking the sum of a TRUE/FALSE vector in R


I am working on analyzing SNP data from a fungus, and I am trying to impute missing data by changing the n's to the genotype of the more frequent allele (see below). newdata is a matrix of SNPs (rows) and fungal isolates (columns). The genotypes for each SNP are in 0, 1, and n format, which is why I am trying to impute the missing genotypes.

    newdata_imputed=newdata
    for (k in 1:nrow(newdata)){
      u=newdata[k,]
      x<-sum(u==0)
      y<-sum(u==1)
      all_freq=y/(x+y)
      if (all_freq<0.5){
        newdata_imputed[k,]=gsub("n",0,u)
      } else{newdata_imputed[k,]=gsub("n",1,u)}
      print(k)
    }

However, I keep getting this error:

    [1] 295
    [1] 296
    Error in if (all_freq < 0.5) { : missing value where TRUE/FALSE needed

It is obvious that the code runs but stops after encountering a problem. Please, can someone tell me what I am doing wrong? I am a newbie to R, and any advice is appreciated.

@akrun, the reason I used a loop is that this is nested in another loop. So, after using your code:

    newdata=as.data.frame(newdata)
    u=newdata
    all_freq <- rowSums(u==1)/rowSums((u==1)|(u==0))
    indx <- all_freq < 0.5
    indx1 <- indx & !is.na(indx)
    indx2 <- !indx & !is.na(indx)
    newdata[indx1,] <- lapply(newdata[indx1,], gsub, pattern='n', replacement=0)
    newdata[indx2,] <- lapply(newdata[indx2,], gsub, pattern='n', replacement=1)
    newdata[] <- lapply(newdata, as.numeric)

I got weird values:

    newdata[1:10,1:10]
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   3  3  3  3  3  3  3  3  3   3
    2   2  2  2  2  2  2  2  2  2   2
    3   3  3  3  3  3  3  3  3  3   3
    4   1  1  1  1  1  1  1  1  1   1

Please, where is the "3" coming from? I should only have 0 or 1.

We can do this using rowSums. As @bergant and @matthewlundberg mentioned in the comments, if there are rows with no 0 or 1 elements, we get NaN from the calculation. One way to handle this is to modify the logical condition by including !is.na, i.e. keep only the elements that are not NA in addition to the previous condition.

    # using `rowSums` to create the all_freq vector
    all_freq <- rowSums(newdata==1)/rowSums((newdata==1)|(newdata==0))
    # create a logical index based on the elements less than 0.5
    indx <- all_freq < 0.5
    # the NA elements can be changed to FALSE by adding another condition
    indx1 <- indx & !is.na(indx)
    # similarly for the elements that are not less than 0.5 (i.e. >= 0.5)
    indx2 <- !indx & !is.na(indx)
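To see why the `!is.na` guard matters, here is a minimal sketch on a single made-up row (the vector `u` and the names `low` and `high` are illustrative assumptions, not part of the original data). Summing a logical vector counts its TRUE values, so a row with no 0 or 1 entries gives 0/0, which R evaluates to NaN, and comparing NaN to 0.5 yields NA. That NA is what `if()` rejects in the question's loop.

```r
# Hypothetical row with no 0 or 1 genotypes at all
u <- c("n", "n", "2", "2")

x <- sum(u == 0)          # count of 0 genotypes; each TRUE counts as 1
y <- sum(u == 1)          # count of 1 genotypes
all_freq <- y / (x + y)   # 0/0, which R evaluates to NaN

is.nan(all_freq)          # TRUE
is.na(all_freq < 0.5)     # TRUE: the comparison itself is NA

# isTRUE() returns FALSE for NA, so a guarded test never hands if() a missing value
low  <- isTRUE(all_freq < 0.5)
high <- isTRUE(all_freq >= 0.5)
```

With both `low` and `high` FALSE, such a row is simply left alone, which matches what `indx1` and `indx2` do in the vectorized version.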

Now, we subset the rows of 'newdata' with 'indx1', loop through the columns (lapply), use gsub with the pattern and replacement arguments, and assign the output back to the same subset of 'newdata'.

    newdata[indx1,] <- lapply(newdata[indx1,], gsub, pattern='n', replacement=0)

Similarly, we can do the replacement for the rows where 'all_freq' is 0.5 or greater:

    newdata[indx2,] <- lapply(newdata[indx2,], gsub, pattern='n', replacement=1)

The columns returned by gsub are of character class; they can be converted to numeric (if needed).

    newdata[] <- lapply(newdata, as.numeric)
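One thing worth checking before this conversion, offered as a possibility rather than a diagnosis of the values reported in the question: if a column is a factor rather than character, `as.numeric` returns the internal level codes instead of the genotype values. The sketch below uses made-up vectors (`g_chr`, `g_fac`, `num_from_chr`, and `num_from_fac` are illustrative names only).

```r
g_chr <- c("0", "1", "n", "1")
g_fac <- factor(g_chr)            # levels are "0", "1", "n"

as.numeric(g_fac)                 # level codes 1 2 3 2, not the genotypes

# Converting character values (or a factor via as.character) keeps the
# genotypes; any remaining "n" becomes NA, with a coercion warning
num_from_chr <- suppressWarnings(as.numeric(g_chr))
num_from_fac <- suppressWarnings(as.numeric(as.character(g_fac)))
```

Running `str(newdata)` or `sapply(newdata, class)` before the conversion shows which case applies.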

Data

    set.seed(24)
    newdata <- as.data.frame(matrix(sample(c(0:1, "n"), 10*4, replace=TRUE),
             ncol=4), stringsAsFactors=FALSE)
    newdata[7,] <- 2
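Since the question mentions that the loop is nested inside another loop, a guarded loop may also be an option. This is only a sketch under stated assumptions: the toy matrix below is hand-made for illustration, and it swaps `gsub` for direct indexing of the `"n"` positions, which is a different technique from the one used above. Ties (frequency exactly 0.5) go to 1, as in the original code.

```r
# Hypothetical toy data: 4 SNPs (rows) x 5 isolates (columns)
newdata <- matrix(c("0", "0", "n", "1", "0",
                    "1", "1", "n", "0", "1",
                    "n", "n", "n", "n", "n",
                    "0", "1", "n", "1", "0"),
                  nrow = 4, byrow = TRUE)

newdata_imputed <- newdata
for (k in seq_len(nrow(newdata))) {
  u <- newdata[k, ]
  x <- sum(u == "0")             # number of 0 genotypes in this SNP
  y <- sum(u == "1")             # number of 1 genotypes in this SNP
  if (x + y == 0) next           # no called genotypes: leave the row unchanged
  major <- if (y / (x + y) < 0.5) "0" else "1"
  newdata_imputed[k, u == "n"] <- major   # fill only the missing positions
}
newdata_imputed
```

The `next` guard skips rows where the frequency would be NaN, so `if()` is only ever evaluated on a defined value.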
