thr3ads.net - R help - [R] Combining Overlapping Data [Nov 2011]


kickout

2011-Nov-11 21:07 UTC

[R] Combining Overlapping Data

I've scoured the archives but have found no concrete answer to my question.

Problem: Two data sets

1st data set(x) = 20,000 rows 
2nd data set(y) = 5,000 rows

Both have the same column names; the column of interest to me is a variable
called strain.

For example, a strain named "Chab1405" appears in x 150 times and in y 25
times. Strain "Chab1999" appears 200 times in x and not at all in y (so I
don't want that retained).

I want to create a new data frame that has all 175 measurements for
"Chab1405" and for any other strain that appears in both data sets, but not
strains that appear in only one data set. So I want the intersection of the
two data sets (maybe?).

I've tried x %in% y, but that only gives TRUE/FALSE


--
View this message in context:
http://r.789695.n4.nabble.com/Combining-Overlapping-Data-tp4032719p4032719.html
Sent from the R help mailing list archive at Nabble.com.

Sarah Goslee

2011-Nov-11 23:05 UTC


[R] Combining Overlapping Data

What about merge() with all=FALSE?

> x <- data.frame(a=letters[1:6], b=1:6)
> y <- data.frame(a=letters[4:9], b=11:16)
> x
  a b
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> y
  a  b
1 d 11
2 e 12
3 f 13
4 g 14
5 h 15
6 i 16
> merge(x, y, by="a", all=FALSE)
  a b.x b.y
1 d   4  11
2 e   5  12
3 f   6  13

If that doesn't work, some sample data would be useful.
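
One caveat worth checking before relying on merge() here: it joins the two data frames side by side, and when the key is repeated in both inputs it returns every pairing of matching rows rather than stacking them. A minimal sketch with made-up data (the strain names and the value column are invented for illustration):

```r
# Toy data: strain "A" occurs twice in x and three times in y.
x <- data.frame(strain = c("A", "A", "B"), value = c(1, 2, 3))
y <- data.frame(strain = c("A", "A", "A", "C"), value = c(10, 20, 30, 40))

m <- merge(x, y, by = "strain", all = FALSE)

# Only the shared strain survives, but as 2 * 3 = 6 paired rows
# with columns value.x and value.y, not as 2 + 3 = 5 stacked rows.
nrow(m)
names(m)
```

With 150 and 25 rows for "Chab1405", the same join would give 150 * 25 = 3750 rows for that strain instead of the 175 asked for.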

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

jim holtman

2011-Nov-11 23:27 UTC


[R] Combining Overlapping Data

Use 'intersect' to get the items common in both dataframes and then use
that to extract the data in common.
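
A minimal sketch of that idea, using made-up data in place of the real x and y (the value column and the strain counts are assumptions for illustration; only the strain column name comes from the question):

```r
# Toy stand-ins for the two data sets; both share the same columns.
x <- data.frame(strain = c("Chab1405", "Chab1405", "Chab1999"),
                value  = c(1.1, 1.2, 2.1),
                stringsAsFactors = FALSE)
y <- data.frame(strain = c("Chab1405", "Chab2000"),
                value  = c(3.1, 4.1),
                stringsAsFactors = FALSE)

# Strain names present in both data frames.
common <- intersect(x$strain, y$strain)

# Keep only rows whose strain is in the common set, then stack them.
combined <- rbind(x[x$strain %in% common, ],
                  y[y$strain %in% common, ])
combined
```

%in% returns a logical vector; used as a row index it selects the matching rows, which is the step missing from the x %in% y attempt.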

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Dennis Murphy

2011-Nov-12 01:29 UTC


[R] Combining Overlapping Data

Hi:

This doesn't sort the data by strain level, but I think it does what
you're after. It helps if strain is either a factor or character
vector in each data frame.

h <- function(x, y) {
    tbx <- table(x$strain)
    tby <- table(y$strain)
    # Select the strains that have at least one row
    # in each data frame
    mgrps <- intersect(names(tbx[tbx > 0]),
                       names(tby[tby > 0]))
    # Concatenate the rows belonging to the common strains
    rbind(subset(x, strain %in% mgrps),
          subset(y, strain %in% mgrps))
}

# Result:
dc <- h(x, y)
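
If the combined result should be ordered by strain, or if strain is a factor that still carries levels for strains that were filtered out, a follow-up step along these lines may help. This is a sketch on a hand-made stand-in for dc; the value column and the strain names are invented for illustration:

```r
# Stand-in for the combined data frame; "Chab1999" is a leftover
# factor level with no rows, as can happen after subsetting.
dc <- data.frame(
  strain = factor(c("Chab2001", "Chab1405", "Chab2001", "Chab1405"),
                  levels = c("Chab1405", "Chab1999", "Chab2001")),
  value  = c(4, 1, 5, 2)
)

# Sort rows by strain (converted to character for alphabetical order).
dc <- dc[order(as.character(dc$strain)), ]

# Drop factor levels that no longer occur in the data.
dc <- droplevels(dc)

levels(dc$strain)
table(dc$strain)
```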

HTH,
Dennis

