thr3ads.net - R help - [R] Looping through values in a data frame that are >zero [May 2011]

Dimitri Liakhovitski

2011-May-21 13:12 UTC

[R] Looping through values in a data frame that are >zero

Hello!

I've tried for a while - but can't figure it out. I have data frame x:

y=c("a","b","c","d","e")
z=c("m","n","o","p","r")
a=c(0,0,1,0,0)
b=c(2,0,0,0,0)
c=c(0,0,0,4,0)
x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
str(x)
Some of the values in columns a, b, and c are > 0.

I need to write a loop through all the cells in columns a, b, c that
are > 0 (and only through them). For each of those cells, I need to know:
1. The name of the column it is in
2. The entry of column y that is in the same row
3. The entry of column z that is in the same row
It'd be good to save this info in a data frame somehow - so that I
could loop through rows of this data frame.


To explain what I need it for eventually: I have a different data
frame "large.df" that has the same columns (variables) - but with many
more entries than "x". Something like:
large.df<-expand.grid(y,z,1:3)[,1:2]  # each (y,z) pair 3 times = 75 rows
names(large.df)<-c("y","z")
set.seed(123)
large.df$a<-sample(0:5,75,replace=T)
set.seed(234)
large.df$b<-sample(0:5,75,replace=T)
set.seed(345)
large.df$c<-sample(0:5,75,replace=T)
large.df$y<-as.character(large.df$y)
large.df$z<-as.character(large.df$z)
large.df<-large.df[order(large.df$y,large.df$z),]
row.names(large.df)<-1:nrow(large.df)
(large.df);str(large.df)

1. Find the first cell in x that is > 0 (in this case, it's x[3,"a"]).
2. Find all the corresponding cells in large.df - in this case, it's:
large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]
and those 3 values can be found in rows 37:39 of large.df, in column "a".
3. Take those 3 values and add to each of them the corresponding value
in x (in this case = 1) divided by their length (in this case = 3).
4. Do the same for the other cells in x that are > 0.

The final result will be (sorry for lengthy code):

large.df[large.df$y %in% "c" & large.df$z %in% "o","a"] <-
  large.df[large.df$y %in% "c" & large.df$z %in% "o","a"] + x[3,"a"]/3
large.df[large.df$y %in% "a" & large.df$z %in% "m","b"] <-
  large.df[large.df$y %in% "a" & large.df$z %in% "m","b"] + x[1,"b"]/3
large.df[large.df$y %in% "d" & large.df$z %in% "p","c"] <-
  large.df[large.df$y %in% "d" & large.df$z %in% "p","c"] + x[4,"c"]/3
(large.df)

(It just happens that at the end I divide by 3 - it could be anything
that is length(large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]),
etc.)


Thanks a lot for your suggestions!


-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

David Winsemius

2011-May-21 14:01 UTC

[R] Looping through values in a data frame that are >zero

On May 21, 2011, at 9:12 AM, Dimitri Liakhovitski wrote:
> Hello!
>
> I've tried for a while - but can't figure it out. I have data frame x:
>
> y=c("a","b","c","d","e")
> z=c("m","n","o","p","r")
> a=c(0,0,1,0,0)
> b=c(2,0,0,0,0)
> c=c(0,0,0,4,0)
> x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
> str(x)
> Some of the values in columns a,b, and c are >0:
>
> I need to write a loop through all the cells in columns a,b,c that
> are >0 (only through them).
> For each of those cells, I need to know:
> 1. Name of the column it is in
apply(x[,3:5], 1, function(z) if(any(z >0) ){
                                   names(x)[2+which(z >0)]
                               } else {
                                   "none" })
[1] "b"    "none" "a"    "c"    "none"
> 2 The entry of column y that is in the same row
  apply(x, 1, function(z) if(any(z[3:5] >0) ){ z[1]  } else { "none" })
[1] "a"    "none" "c"    "d"    "none"

There might be pitfalls of which I am unaware, since z will be
coerced to a character vector. Generally, character comparisons
with ">" will be "as expected" when the values were originally numeric.

 > ("-3" > 0)
[1] FALSE
 > ("0.1" > 0)
[1] TRUE
> 3 The entry of column z that is in the same row
  apply(x, 1, function(z) if(any(z[3:5] >0) ){ z[2]  } else { "none" })
[1] "m"    "none" "o"    "p"    "none"

If you want to use NA instead of "none" I don't foresee any problems.
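
On the coercion caveat above: a minimal sketch, using a small made-up data frame (not from the thread), of a case where the string comparison does not behave numerically. apply() on a data frame that has character columns goes through as.matrix(), so every value arrives as a string, and ">" then compares strings character by character. Restricting the test to the numeric columns is one way around it.

```r
# Made-up example data (an assumption for illustration only)
x <- data.frame(y = c("a", "b"), a = c(9, 10), stringsAsFactors = FALSE)

# apply(x, 1, ...) works on as.matrix(x), which is a character matrix here
m <- as.matrix(x)
is.character(m)

# Character comparison is lexicographic: "1" sorts before "9"
"10" > 9                 # FALSE
as.numeric("10") > 9     # TRUE

# Testing only the numeric column(s) avoids the coercion altogether
keep <- rowSums(x[, "a", drop = FALSE] > 9) > 0
x$y[keep]                # "b"
```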

-- 
David

David Winsemius, MD
West Hartford, CT

Bert Gunter

2011-May-21 14:40 UTC

[R] Looping through values in a data frame that are >zero

Dimitri:

1. I did not read your whole missive. I prefer mystery novels. ;-)

2. I suggest you banish Excel language ("cells") from your vocabulary
and think in R's terms of whole objects that one indexes into.

3. If I understand correctly, you can't combine results into a data
frame, because they would in general be of different lengths (whole
object thinking).

4. Again, if I understand correctly, this seems to be just a matter of
indexing for which:

lapply(x[,c("a","b","c")], function(zz) x[zz>0, c("y","z")])

should do it.
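
A sketch of what that call returns for the example x: a list with one small data frame per column. The pieces can differ in length, but they can still be stacked into one long data frame if the column name is carried along as an extra variable. The names hits, pieces and stacked are illustrative, not from the thread.

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

cols <- c("a", "b", "c")

# One element per column, holding the y/z entries of the rows that are > 0
hits <- lapply(x[, cols], function(zz) x[zz > 0, c("y", "z")])

# Tag each piece with the column it came from, then stack the pieces
pieces <- Map(function(d, nm) {
  data.frame(column = rep(nm, nrow(d)), d, stringsAsFactors = FALSE)
}, hits, names(hits))
stacked <- do.call(rbind, unname(pieces))
rownames(stacked) <- NULL
stacked
```

Each row of stacked describes one positive entry of x, so it can be looped over row by row.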

HTH

-- Bert



-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml

Berend Hasselman

2011-May-21 15:17 UTC

[R] Looping through values in a data frame that are >zero

Dimitri Liakhovitski-2 wrote:
> Hello!
>
> I've tried for a while - but can't figure it out. I have data frame x:
>
> y=c("a","b","c","d","e")
> z=c("m","n","o","p","r")
> a=c(0,0,1,0,0)
> b=c(2,0,0,0,0)
> c=c(0,0,0,4,0)
> x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
> str(x)
> Some of the values in columns a,b, and c are >0:
>
> I need to write a loop through all the cells in columns a,b,c that
> are >0 (only through them).
> For each of those cells, I need to know:
> 1. Name of the column it is in
> 2 The entry of column y that is in the same row
> 3 The entry of column z that is in the same row
> It'd be good to save this info in a data frame somehow - so that I
> could loop through rows of this data frame.
>
This will give you a data frame:

x[-which(rowSums(x[,3:5]>0)==0),]

or this

x[-which(rowSums(x[,c("a","b","c")]>0)==0),]
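
One caveat with the x[-which(...), ] idiom, sketched below with a made-up data frame in which no row is all zero: which() then returns integer(0), and indexing with -integer(0) selects no rows at all rather than all of them. Indexing with the logical vector directly gives the same result on the original x and does not have this edge case. The names all_zero, neg_idx and lgl_idx are illustrative.

```r
# Made-up data in which every row has at least one positive value
x <- data.frame(y = c("a", "b", "c"),
                z = c("m", "n", "o"),
                a = c(0, 1, 3),
                b = c(2, 0, 0),
                stringsAsFactors = FALSE)

all_zero <- rowSums(x[, c("a", "b")] > 0) == 0   # FALSE for every row here

neg_idx <- x[-which(all_zero), ]   # which() is integer(0), so no rows survive
lgl_idx <- x[!all_zero, ]          # logical indexing keeps all three rows

nrow(neg_idx)   # 0
nrow(lgl_idx)   # 3
```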

Berend




Dennis Murphy

2011-May-21 21:28 UTC

[R] Looping through values in a data frame that are >zero

Hi:

Does this work for the first problem?

library(reshape2)
subset(melt(x, id = c('y', 'z')), value > 0)
   y z variable value
3  c o        a     1
6  a m        b     2
14 d p        c     4

The second problem is so convoluted I don't even know where to start...
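
For what it is worth, the second part of the original question can be sketched as a short loop. This is only one reading of the request: for each positive entry of x, find the rows of large.df with the same y and z, and add the entry divided by the number of matching rows to the same-named column. It assumes large.df holds every (y, z) pair three times (75 rows), as the original post implies. The names hits, match_rows and before are illustrative.

```r
y <- c("a", "b", "c", "d", "e")
z <- c("m", "n", "o", "p", "r")
x <- data.frame(y, z, a = c(0, 0, 1, 0, 0), b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0), stringsAsFactors = FALSE)

# Assumption: every (y, z) pair occurs three times, giving 75 rows
large.df <- expand.grid(y = y, z = z, rep = 1:3,
                        stringsAsFactors = FALSE)[, c("y", "z")]
set.seed(123)
for (col in c("a", "b", "c")) large.df[[col]] <- sample(0:5, 75, replace = TRUE)
large.df <- large.df[order(large.df$y, large.df$z), ]
rownames(large.df) <- NULL
before <- large.df   # copy kept only to inspect the change afterwards

cols <- c("a", "b", "c")
# Row/column positions of the positive entries of x
hits <- which(as.matrix(x[, cols]) > 0, arr.ind = TRUE)

for (i in seq_len(nrow(hits))) {
  r   <- hits[i, "row"]
  col <- cols[hits[i, "col"]]
  match_rows <- large.df$y == x$y[r] & large.df$z == x$z[r]
  n <- sum(match_rows)
  if (n > 0) {
    # Spread the value of x evenly over the matching rows
    large.df[match_rows, col] <- large.df[match_rows, col] + x[r, col] / n
  }
}

# Change in column "a" for y == "c", z == "o" (rows 37:39): 1/3 each
large.df$a[37:39] - before$a[37:39]
```

Dividing by n instead of a hard-coded 3 covers the case where the number of matching rows differs between (y, z) pairs.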

HTH,
Dennis


