thr3ads.net - R help - [R] select rows with identical columns from a data frame [Jan 2013]

Sam Steingold

2013-Jan-18 20:53 UTC

[R] select rows with identical columns from a data frame

I have a data frame with several columns.
I want to select the rows with no NAs (as with complete.cases)
and all columns identical.
E.g., for

--8<---------------cut here---------------start------------->8---
> f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
> f
   a  b  c
1  1  1  1
2 NA NA NA
3 NA  3  5
4  4 40 40
--8<---------------cut here---------------end--------------->8---

I want the vector TRUE,FALSE,FALSE,FALSE selecting just the first
row because there all 3 columns are the same and none is NA.

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://mideasttruth.com
http://honestreporting.com http://pmw.org.il http://iris.org.il
All extremists should be taken out and shot.

Sam Steingold

2013-Jan-18 20:58 UTC

I can do
  Reduce("==",f[complete.cases(f),])
but that creates an intermediate data frame which I would love to avoid
(to save memory).
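
A note on the Reduce() idea above: Reduce("==", ...) chains the comparison as (a == b) == c, so the third column is compared with a logical result rather than with the values themselves. The following is only a minimal sketch of that pitfall; the one-row data frame `g` is a made-up example, not something from the thread.

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# On the thread's example the chained comparison happens to give the
# expected answer for the complete rows, because TRUE == 1 is TRUE in R.
r_f <- Reduce("==", f[complete.cases(f), ])

# 'g' is a hypothetical row in which every column equals 2.
g <- data.frame(a = 2, b = 2, c = 2)
r_g <- Reduce("==", g)   # evaluates (2 == 2) == 2, i.e. TRUE == 2

print(r_f)   # TRUE FALSE (one value per complete row, not per row of f)
print(r_g)   # FALSE, although all three columns are identical
```

Note also that the result has one element per complete row, so it cannot be used directly to index the original data frame.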
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://truepeace.org http://iris.org.il
http://www.PetitionOnline.com/tap12009/ http://ffii.org http://jihadwatch.org
War doesn't determine who's right, just who's left.

Rui Barradas

2013-Jan-18 21:02 UTC

Hello,

Try the following.

complete.cases(f) & apply(f, 1, function(x) all(x == x[1]))
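
To see why the rows containing NA come out as FALSE rather than NA, the one-liner can be split into its two parts. This is just an illustrative breakdown on the example data; the names `same` and `sel` are introduced here for the sketch.

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# Row-wise test: every value equals the first value of the row.
# Rows whose first value is NA yield NA here, not FALSE.
same <- apply(f, 1, function(x) all(x == x[1]))

# complete.cases() is FALSE for those rows, and FALSE & NA is FALSE,
# so the NA results are absorbed.
sel <- complete.cases(f) & same

print(unname(same))  # TRUE NA NA FALSE
print(unname(sel))   # TRUE FALSE FALSE FALSE
```

Since apply() converts the data frame to a matrix first, this approach does create an intermediate copy of the data.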


Hope this helps,

Rui Barradas

Em 18-01-2013 20:53, Sam Steingold escreveu:> I have a data frame with several columns.
> I want to select the rows with no NAs (as with complete.cases)
> and all columns identical.
> E.g., for
>
> --8<---------------cut here---------------start------------->8---
>> f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
>> f
>     a  b  c
> 1  1  1  1
> 2 NA NA NA
> 3 NA  3  5
> 4  4 40 40
> --8<---------------cut here---------------end--------------->8---
>
> I want the vector TRUE,FALSE,FALSE,FALSE selecting just the first
> row because there all 3 columns are the same and none is NA.
>
> thanks!
>

David Winsemius

2013-Jan-18 21:47 UTC

On Jan 18, 2013, at 1:02 PM, Rui Barradas wrote:
> Hello,
> 
> Try the following.
> 
> complete.cases(f) & apply(f, 1, function(x) all(x == x[1]))
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> On 18-01-2013 20:53, Sam Steingold wrote:
>> I have a data frame with several columns.
>> I want to select the rows with no NAs (as with complete.cases)
>> and all columns identical.
>> E.g., for
>> 
>> --8<---------------cut here---------------start------------->8---
>>> f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
>>> f
>>    a  b  c
>> 1  1  1  1
>> 2 NA NA NA
>> 3 NA  3  5
>> 4  4 40 40
>> --8<---------------cut here---------------end--------------->8---

> f[ which( rowSums(f==f[[1]]) == length(f) ), ]
  a b c
1 1 1 1

>> 
>> I want the vector TRUE,FALSE,FALSE,FALSE selecting just the first
>> row because there all 3 columns are the same and none is NA.
>> 
>> thanks!
>> 
David Winsemius
Alameda, CA, USA
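
The rowSums() expression above returns the matching rows themselves. If the logical vector from the original question is wanted instead, one possible variant (a sketch only; the names `eq`, `n_eq` and `sel` are introduced here) handles the NA counts explicitly instead of relying on which():

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# Compare every column with the first one; the result is a logical matrix.
eq <- f == f[[1]]

# Number of columns equal to column 'a' in each row; NA if the row has an NA.
n_eq <- rowSums(eq)

# A row qualifies when the count is known and equals the number of columns.
sel <- unname(!is.na(n_eq) & n_eq == length(f))

print(sel)  # TRUE FALSE FALSE FALSE
```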

William Dunlap

2013-Jan-18 21:48 UTC

Here are two related approaches to your problem.  The first uses
a logical vector, "keep", to say which rows to keep.  The second
uses an integer vector; it can be considerably faster when the columns
are not well correlated with one another (so the number of desired
rows is a small proportion of the input rows).

f1 <- function (x) 
{
    # sieve with logical 'keep' vector
    stopifnot(is.data.frame(x), ncol(x) > 1)
    keep <- x[[1]] == x[[2]]
    for (i in seq_len(ncol(x))[-(1:2)]) {
        keep <- keep & x[[i - 1]] == x[[i]]
    }
    !is.na(keep) & keep
}

f2 <- function (x) 
{
    # sieve with integer 'keep' vector
    stopifnot(is.data.frame(x), ncol(x) > 1)
    keep <- which(x[[1]] == x[[2]])
    for (i in seq_len(ncol(x))[-(1:2)]) {
        keep <- keep[which(x[[i - 1]][keep] == x[[i]][keep])]
    }
    seq_len(nrow(x)) %in% keep
}
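
As a rough illustration of how the two sieves treat NA, the loops can be unrolled by hand on the three-column example from the question. This is a sketch that mirrors the bodies of f1 and f2 rather than calling them; the variable names are introduced here.

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# Logical sieve (as in f1): a comparison involving NA gives NA, and
# NA & NA stays NA, so the final step maps the remaining NAs to FALSE.
keep_lgl <- f[[1]] == f[[2]]
keep_lgl <- keep_lgl & f[[2]] == f[[3]]
sel_lgl <- !is.na(keep_lgl) & keep_lgl

# Integer sieve (as in f2): which() drops FALSE and NA alike, so only
# the surviving row indices are compared in later steps.
keep_int <- which(f[[1]] == f[[2]])
keep_int <- keep_int[which(f[[2]][keep_int] == f[[3]][keep_int])]
sel_int <- seq_len(nrow(f)) %in% keep_int

print(sel_lgl)  # TRUE FALSE FALSE FALSE
print(sel_int)  # TRUE FALSE FALSE FALSE
```

The integer version touches fewer elements at each step once rows have been eliminated, which is consistent with the speed difference reported in the timings.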

E.g., for a 10 million by 10 data.frame I get:
> x <- data.frame(lapply(structure(1:10,names=letters[1:10]),
    function(i)sample(c(NA,1,1,1,2,2,2,3), replace=TRUE, size=1e7)))
> system.time(v1 <- f1(x))
   user  system elapsed 
   4.04    0.16    4.19 
> system.time(v2 <- f2(x))
   user  system elapsed 
   0.80    0.00    0.79 
> identical(v1, v2)
[1] TRUE
> head(x[v1,])
      a b c d e f g h i j
4811  2 2 2 2 2 2 2 2 2 2
41706 1 1 1 1 1 1 1 1 1 1
56633 1 1 1 1 1 1 1 1 1 1
70859 1 1 1 1 1 1 1 1 1 1
83848 1 1 1 1 1 1 1 1 1 1
84767 1 1 1 1 1 1 1 1 1 1


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

arun

2013-Jan-18 22:05 UTC

apply(f,1,function(x)
all(duplicated(x)|duplicated(x,fromLast=TRUE)&!is.na(x)))

#[1]  TRUE FALSE FALSE FALSE
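
One caveat with the duplicated() test: it checks that every value in a row occurs at least twice, which is the same as "all identical" for three columns but not necessarily for four or more. The sketch below illustrates this with a made-up four-column row `g`; the helper names `all_dup` and `all_same` are hypothetical and introduced only for this example.

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# The test from the message above, wrapped in a function.
all_dup <- function(x) {
  all(duplicated(x) | duplicated(x, fromLast = TRUE) & !is.na(x))
}

# A stricter alternative: no NA and exactly one distinct value.
all_same <- function(x) !any(is.na(x)) && length(unique(x)) == 1

# 'g' is a hypothetical row with two different values, each appearing twice.
g <- data.frame(a = 1, b = 1, c = 2, d = 2)

print(apply(f, 1, all_dup))   # TRUE FALSE FALSE FALSE
print(apply(g, 1, all_dup))   # TRUE, although the columns differ
print(apply(g, 1, all_same))  # FALSE
```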


A.K.



