thr3ads.net - R help - [R] Comparing elements for equality [Jan 2009]


Doran, Harold

2009-Jan-13 19:17 UTC

[R] Comparing elements for equality

Suppose I have a dataframe as follows:

dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 =
c('foo', 'foo', 'foo', 'foobar', 'foo'))

Now, if I were to subset by id, such as:

> subset(dat, id==1)
  id var1 var2
1  1   10  foo
2  1   10  foo

I can see that the elements in var1 are exactly the same and the
elements in var2 are exactly the same. However,

> subset(dat, id==2)
  id var1   var2
3  2   20    foo
4  2   20 foobar
5  2   25    foo

Shows the elements are not the same for either variable in this
instance. So, what I am looking to create is a data frame that would be
like this

id  freq  var1   var2
1   2     TRUE   TRUE
2   3     FALSE  FALSE

Where freq is the number of times the ID is repeated in the dataframe. A
TRUE appears in the cell if all elements in the column are the same for
the ID and FALSE otherwise. For my problem it does not matter which
values differ.

The way I am thinking about tackling this is to loop through the ID
variable and compare the values in the various columns of the dataframe.
The problem I am encountering is that I don't think all.equal or
identical are the right functions in this case.

So, say I wanted to compare the elements of var1 for id == 1. I
would have

x <- c(10,10)

Of course, the following works

> all.equal(x[1], x[2])
[1] TRUE

As would a similar call to identical. However, what if I only have a
vector of values (or if the column consists of names) that I want to
assess for equality when I am trying to automate a process over
thousands of cases? As in the example above, the vector may contain only
two values or it may contain many more. The number of values in the
vector differs by id.

Any thoughts?

Harold

Carlos J. Gil Bellosta

2009-Jan-13 19:54 UTC

[R] Comparing elements for equality

Hello,

You could build your output dataframe along the following lines:

foo <- function(x) length( unique(x) ) == 1

results <- data.frame(
	freq = tapply( dat$id,   dat$id, length ),
	var1 = tapply( dat$var1, dat$id, foo ),
	var2 = tapply( dat$var2, dat$id, foo )
)

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
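
For readers who want to run the suggestion above as one script, here is a
minimal sketch. It assumes the var2 assignment in the original post was
meant to use `=`. The helper name all_same is chosen here for illustration
and does not appear in the thread, and the explicit id column is an
addition, since the posted version does not include one.

```r
# Example data from the original question (var2 assigned with '=').
dat <- data.frame(id   = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c("foo", "foo", "foo", "foobar", "foo"))

# Hypothetical helper name: TRUE when every element of x has the same value.
all_same <- function(x) length(unique(x)) == 1

# tapply() groups by the sorted unique ids, so sort(unique(dat$id))
# lines up with the grouped results.
results <- data.frame(
  id   = sort(unique(dat$id)),
  freq = as.vector(tapply(dat$id,   dat$id, length)),
  var1 = as.vector(tapply(dat$var1, dat$id, all_same)),
  var2 = as.vector(tapply(dat$var2, dat$id, all_same))
)
print(results)
```

The unique()-based test uses exact comparison, so it suits integers,
strings and factors better than computed floating-point values.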



markleeds at verizon.net

2009-Jan-13 19:57 UTC

[R] Comparing elements for equality

Hi Harold: The code below works on your data set, but check it carefully
because I am a little worried that I could have missed something.
Hopefully someone can send a clearer way.

dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 = 
c('foo', 'foo', 'foo', 'foobar', 'foo'))
print(dat)

temp <- lapply(split(dat, dat$id), function(.df) {
   data.frame(id = .df$id[1],
              freq = nrow(.df),
              var1 = all(.df$var1 %in% .df$var1[1]),
              var2 = all(.df$var2 %in% .df$var2[1]))
})

result <- do.call(rbind,temp)
print(result)
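
Since the question mentions automating this over many cases, the same
split-and-compare idea can be written without naming each column. The
following is only a sketch under that assumption: value_cols, pieces and
the inner variable names are illustrative and not from the thread, and
every column other than id is treated as a column to check.

```r
dat <- data.frame(id   = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c("foo", "foo", "foo", "foobar", "foo"))

# Assumption: every column except 'id' should be checked.
value_cols <- setdiff(names(dat), "id")

pieces <- lapply(split(dat, dat$id), function(g) {
  # One logical per column: do all values match the group's first value?
  same <- lapply(g[value_cols], function(col) all(col %in% col[1]))
  data.frame(id = g$id[1], freq = nrow(g), same)
})

result <- do.call(rbind, pieces)
print(result)
```

Using %in% rather than == means a missing value is compared by matching
instead of propagating NA, which may or may not be the behaviour wanted.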




Marc Schwartz

2009-Jan-13 20:09 UTC

[R] Comparing elements for equality

Harold,

If we are not talking about testing floats for equivalence:

> merge(table(id = dat$id),
        aggregate(dat[-1], list(id = dat$id),
                  function(x) length(unique(x)) == 1),
        by = "id")
  id Freq  var1  var2
1  1    2  TRUE  TRUE
2  2    3 FALSE FALSE


HTH,

Marc Schwartz
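
The caveat about floats matters because unique() compares doubles exactly.
If the numeric columns come from computation, one possible variation is to
compare the spread of each group against a tolerance. This is a sketch
only: the function name all_same_tol and the absolute tolerance of 1e-8
are assumptions made here, not something proposed in the thread, and
missing values are not handled.

```r
# Hypothetical helper: numeric vectors count as "all the same" when their
# range is within an absolute tolerance; other types fall back to unique().
all_same_tol <- function(x, tol = 1e-8) {
  if (is.numeric(x)) {
    diff(range(x)) <= tol
  } else {
    length(unique(x)) == 1
  }
}

# 0.1 + 0.2 is not exactly 0.3 in double precision.
x <- c(0.3, 0.1 + 0.2)
exact_check <- length(unique(x)) == 1
tol_check   <- all_same_tol(x)

dat <- data.frame(id   = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c("foo", "foo", "foo", "foobar", "foo"))

# Same aggregate() call as above, with the tolerant helper swapped in.
res <- aggregate(dat[-1], list(id = dat$id), all_same_tol)
print(res)
```

An absolute tolerance is the simplest choice; for values of widely varying
magnitude a relative tolerance, as used by all.equal, may be more
appropriate.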
