Dimitri Liakhovitski
2011-May-21 13:12 UTC
[R] Looping through values in a data frame that are >zero
Hello!

I've tried for a while - but can't figure it out. I have data frame x:

y=c("a","b","c","d","e")
z=c("m","n","o","p","r")
a=c(0,0,1,0,0)
b=c(2,0,0,0,0)
c=c(0,0,0,4,0)
x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
str(x)

Some of the values in columns a, b, and c are >0.

I need to write a loop through all the cells in columns a, b, c that are >0 (only through them). For each of those cells, I need to know:
1. Name of the column it is in
2. The entry of column y that is in the same row
3. The entry of column z that is in the same row
It'd be good to save this info in a data frame somehow - so that I could loop through rows of this data frame.

To explain what I need it for eventually: I have a different data frame "large.df" that has the same columns (variables) - but with many more entries than "x". Something like:

large.df<-expand.grid(y,z,1:3)[,1:2]
names(large.df)<-c("y","z")
set.seed(123)
large.df$a<-sample(0:5,75,replace=T)
set.seed(234)
large.df$b<-sample(0:5,75,replace=T)
set.seed(345)
large.df$c<-sample(0:5,75,replace=T)
large.df$y<-as.character(large.df$y)
large.df$z<-as.character(large.df$z)
large.df<-large.df[order(large.df$y,large.df$z),]
row.names(large.df)<-1:nrow(large.df)
(large.df);str(large.df)

1. Find the first cell in x that is > 0 (in this case, it's x[3,"a"]).
2. Find all the corresponding cells in large.df - in this case, it's:
large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]
and those 3 values can be found in rows 37:39 of large.df, in column "a".
3. Take those 3 values and add to them the corresponding value in x (in this case = 1) divided by their length (in this case = 3).
4. Do the same for the other cells in x that are >0.

The final result will be (sorry for lengthy code):

large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]<-large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]+x[3,"a"]/3
large.df[large.df$y %in% "a" & large.df$z %in% "m","b"]<-large.df[large.df$y %in% "a" & large.df$z %in% "m","b"]+x[1,"b"]/3
large.df[large.df$y %in% "d" & large.df$z %in% "p","c"]<-large.df[large.df$y %in% "d" & large.df$z %in% "p","c"]+x[4,"c"]/3
(large.df)

(It just happens that at the end I divide by 3 - it could be anything that is length(large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]), etc.)

Thanks a lot for your suggestions!

--
Dimitri Liakhovitski
Ninah Consulting
ninah.com
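The four steps above could be sketched as a double loop: over the columns a, b, c, and within each column over the rows of x that are >0. This is only a sketch under stated assumptions: it rebuilds large.df with three rows per (y, z) pair (75 rows in total, which is what the "rows 37:39" remark implies), and the names num.cols, before, hit and copy are illustrative, not from the post.

```r
# Small data frame from the post
y <- c("a", "b", "c", "d", "e")
z <- c("m", "n", "o", "p", "r")
x <- data.frame(y, z,
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# Assumption: large.df holds 3 rows per (y, z) pair, 75 rows in total
large.df <- expand.grid(y = y, z = z, copy = 1:3,
                        stringsAsFactors = FALSE)[, c("y", "z")]
set.seed(123); large.df$a <- sample(0:5, 75, replace = TRUE)
set.seed(234); large.df$b <- sample(0:5, 75, replace = TRUE)
set.seed(345); large.df$c <- sample(0:5, 75, replace = TRUE)
large.df <- large.df[order(large.df$y, large.df$z), ]
row.names(large.df) <- NULL

before   <- large.df              # copy kept only to check the result
num.cols <- c("a", "b", "c")      # hypothetical helper name

for (col in num.cols) {
  for (i in which(x[[col]] > 0)) {          # only rows where this column is > 0
    # rows of large.df with the same y and z as row i of x
    hit <- large.df$y == x$y[i] & large.df$z == x$z[i]
    # add the value from x divided by the number of matching rows
    large.df[hit, col] <- large.df[hit, col] + x[i, col] / sum(hit)
  }
}
```

Dividing by sum(hit) rather than a hard-coded 3 covers the case where the number of matching rows differs between (y, z) pairs.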
David Winsemius
2011-May-21 14:01 UTC
On May 21, 2011, at 9:12 AM, Dimitri Liakhovitski wrote:

> I need to write a loop through all the cells in columns a,b,c that are
> >0 (only through them).
> For each of those cells, I need to know:
> 1. Name of the column it is in

apply(x[,3:5], 1, function(z) if(any(z >0) ){ names(x)[2+which(z >0)] } else { "none" })
[1] "b"    "none" "a"    "c"    "none"

> 2 The entry of column y that is in the same row

apply(x, 1, function(z) if(any(z[3:5] >0) ){ z[1] } else { "none" })
[1] "a"    "none" "c"    "d"    "none"

There might be pitfalls about which I am unaware, since z will be coerced to a character vector. Generally the character comparisons with ">" will be "as expected" when the values were originally numeric.

> ("-3" > 0)
[1] FALSE
> ("0.1" > 0)
[1] TRUE

> 3 The entry of column z that is in the same row

apply(x, 1, function(z) if(any(z[3:5] >0) ){ z[2] } else { "none" })
[1] "m"    "none" "o"    "p"    "none"

If you want to use NA instead of "none" I don't foresee any problems.

--
David

David Winsemius, MD
West Hartford, CT
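One way to sidestep the character coercion mentioned above is to run the comparison on the numeric columns only, so that ">" always compares numbers. A minimal sketch, assuming the data frame x from the original post; num_cols, pos, hit, col_name, y_entry and z_entry are illustrative names.

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# apply(x, 1, ...) first converts the whole data frame to a matrix;
# with character columns present, that matrix is character
coerced <- is.character(as.matrix(x))

num_cols <- c("a", "b", "c")
pos <- as.matrix(x[, num_cols]) > 0      # logical matrix built from numbers only
hit <- unname(rowSums(pos) > 0)          # TRUE where a row has any value > 0

# column name(s) of the positive entries, "none" for all-zero rows
col_name <- unname(apply(pos, 1, function(r)
  if (any(r)) paste(num_cols[r], collapse = ",") else "none"))

# matching entries of y and z, taken straight from the character columns
y_entry <- ifelse(hit, x$y, "none")
z_entry <- ifelse(hit, x$z, "none")
```

If a row had more than one positive value, col_name would list the column names separated by commas; that choice is an assumption of this sketch, not something the thread specifies.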
Bert Gunter
2011-May-21 14:40 UTC
Dimitri:

1. I did not read your whole missive. I prefer mystery novels. ;-)
2. I suggest you banish Excel language ("cells") from your vocabulary and think in R's terms of whole objects that one indexes into.
3. If I understand correctly, you can't combine results into a data frame, because they would in general be of different lengths (whole object thinking).
4. Again, if I understand correctly, this seems to be just a matter of indexing, for which:

lapply(x[,c("a","b","c")], function(zz)x[zz>0, c("y","z")])

should do it.

HTH

-- Bert

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
devo.gene.com/groups/devo/depts/ncb/home.shtml
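The lapply() call above returns a list with one small data frame per column. If a single table is still wanted, the pieces can be stacked in long form once each row is tagged with the column it came from. A minimal sketch, assuming the data frame x from the original post; res, stacked and the column name variable are illustrative.

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# One data frame of (y, z) pairs per numeric column, as in the reply
res <- lapply(x[, c("a", "b", "c")], function(zz) x[zz > 0, c("y", "z")])

# Tag each piece with its column name, then stack the pieces;
# a piece with zero rows contributes zero rows
stacked <- do.call(rbind, lapply(names(res), function(nm) {
  d <- res[[nm]]
  data.frame(variable = rep(nm, nrow(d)), y = d$y, z = d$z,
             stringsAsFactors = FALSE)
}))
```

The pieces may have different numbers of rows, which is why they are stacked row-wise rather than placed side by side.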
Berend Hasselman
2011-May-21 15:17 UTC
Dimitri Liakhovitski-2 wrote:

> It'd be good to save this info in a data frame somehow - so that I
> could loop through rows of this data frame.

This will give you a dataframe

x[-which(rowSums(x[,3:5]>0)==0),]

or this

x[-which(rowSums(x[,c("a","b","c")]>0)==0),]

Berend
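One caveat with the -which(...) idiom: when no row satisfies the condition, which() returns integer(0), and indexing with -integer(0) selects zero rows instead of all of them. A minimal sketch using a hypothetical data frame x2 (not from the thread) in which every row has a positive value, compared with plain logical indexing:

```r
# Hypothetical data: no row is all zero
x2 <- data.frame(y = c("a", "b"),
                 z = c("m", "n"),
                 a = c(1, 0),
                 b = c(0, 2),
                 stringsAsFactors = FALSE)

all_zero <- rowSums(x2[, c("a", "b")] > 0) == 0   # FALSE for both rows

dropped <- x2[-which(all_zero), ]   # which() is integer(0), so no rows are kept
kept    <- x2[!all_zero, ]          # logical indexing keeps both rows
```

On the data frame x from the original post both forms return the same three rows, because some rows there are all zero.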
Dennis Murphy
2011-May-21 21:28 UTC
Hi:

Does this work for the first problem?

library(reshape2)
subset(melt(x, id = c('y', 'z')), value > 0)
   y z variable value
3  c o        a     1
6  a m        b     2
14 d p        c     4

The second problem is so convoluted I don't even know where to start...

HTH,
Dennis
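For readers without reshape2, roughly the same long table can be built in base R with which(..., arr.ind = TRUE). This is a sketch rather than a drop-in replacement for melt(): the row names differ, and variable is a character column here rather than a factor. The names num_cols, m, idx and long are illustrative.

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

num_cols <- c("a", "b", "c")
m   <- as.matrix(x[, num_cols])          # numeric matrix of the value columns
idx <- which(m > 0, arr.ind = TRUE)      # one (row, col) pair per positive entry

# One row per positive entry: its y, z, column name and value
long <- data.frame(y        = x$y[idx[, "row"]],
                   z        = x$z[idx[, "row"]],
                   variable = num_cols[idx[, "col"]],
                   value    = m[idx],
                   row.names = NULL,
                   stringsAsFactors = FALSE)
```

The rows of long can then be visited one at a time, for example with for (k in seq_len(nrow(long))), which is the kind of loop the original question asks for.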