thr3ads.net - R help - [R] correlating rows of two differently-sized data frames in R [Aug 2012]

JenniferH

2012-Aug-09 14:54 UTC

[R] correlating rows of two differently-sized data frames in R

Hello everyone,

I have two sets of data, with the following structure:

DataSet1
Location   Part    Sample 1   Sample 2
A                     1           value         value
A                     2           value         value
A                     3           value         value
B                     1           value         value

DataSet2
Location   Sample 1    Sample 2
A                      value          value
B                      value          value
C                      value          value

I would like to look at the correlations between DataSet1 and DataSet2, such
that each row in Location A from DataSet1 is paired with the Location A row
from DataSet2, and so forth.  So far, my only idea involves
copy-pasting each row in DataSet2 as many times as its location occurs in
DataSet1 on a spreadsheet before loading the sets into R; however, as I have
nearly 8000 rows in DataSet2, this is clearly not a workable solution!

I'm sure there's a simple solution to this, so I'm sorry if this seems like
a really silly question.

Thanks for your help!

Jen



--
View this message in context:
http://r.789695.n4.nabble.com/correlating-rows-of-two-differently-sized-data-frames-in-R-tp4639774.html
Sent from the R help mailing list archive at Nabble.com.

R. Michael Weylandt

2012-Aug-09 16:28 UTC

[R] correlating rows of two differently-sized data frames in R

Perhaps load them both and ?merge can show you the way.

Michael


R. Michael Weylandt

2012-Aug-09 17:57 UTC

[R] correlating rows of two differently-sized data frames in R

Hi Jen,

It's generally best to keep cc'ing R-help so others can lend a hand
when I step away from my computer:

On Thu, Aug 9, 2012 at 11:49 AM, Jennifer Hobbs <jenachobbs at gmail.com> wrote:
> Hi Michael -
>
> thanks for the advice - I did find merge() just after posting but I'm having
> difficulty with using it.  I've loaded both datasets; then I tried
>
>> CombinedData<-merge(MethyData1,ExprData1)
>
> but when I looked at CombinedData, I found there was no actual data in it:
>
>> str(CombinedData)
> 'data.frame': 0 obs. of  20 variables

Take a look at

?merge.data.frame

in particular since there are many different forms of merges. Your
original post suggests you may want to set

all = TRUE
by = "Location"
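
A minimal sketch of what those arguments change, using small made-up data frames (the names ds1 and ds2, the column names, and all values are invented for illustration; they are not the poster's data). With no `by`, merge() joins on every column name the two frames share, so identically named sample columns would also have to match in value, which is one plausible reason for a zero-row result.

```r
# Toy stand-ins for the two data sets; all values are invented.
ds1 <- data.frame(Location = c("A", "A", "A", "B"),
                  Part     = c(1, 2, 3, 1),
                  Sample1  = c(0.1, 0.2, 0.3, 0.4),
                  Sample2  = c(1.1, 1.2, 1.3, 1.4))
ds2 <- data.frame(Location = c("A", "B", "C"),
                  Sample1  = c(5, 6, 7),
                  Sample2  = c(8, 9, 10))

# Default: by = intersect(names(ds1), names(ds2)), so Location, Sample1
# and Sample2 must all agree between the frames. No row does.
default_merge <- merge(ds1, ds2)
nrow(default_merge)

# Join on Location only; suffixes tell the clashing sample columns apart.
by_location <- merge(ds1, ds2, by = "Location",
                     suffixes = c(".ds1", ".ds2"))

# all = TRUE additionally keeps locations found in only one frame
# (here C), padding the missing side with NA.
by_location_all <- merge(ds1, ds2, by = "Location", all = TRUE,
                         suffixes = c(".ds1", ".ds2"))
```

Whether `all = TRUE` is wanted depends on whether locations without a partner in the other data set should appear in the result at all.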

Hope that helps,
Michael


>
> I thought this might be due to the fact that my column names, as well as the
> row names, in both data sets were the same, so I renamed the column names in
> ExprData1 and tried again:
>
>> colnames(ExprData1)<-NewExprNames
>> merge(ExprData1,MethyData1)
> Error: cannot allocate vector of size 4.2 Gb
> In addition: Warning messages:
> 1: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 2: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 3: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 4: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
>
> I was surprised about this, as I'm using a 64-bit computer and it's managed

You'll also need to be using a 64 bit build of R. Merging is pretty
memory expensive so if you're right on the edge of what R can handle
you might have to look into a more specialized solution (such as an
SQL backend)
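
A short sketch of two checks that may help here. The first reports whether the running R is a 64-bit build. The second illustrates, on tiny invented frames (a and b are hypothetical names), what the expand.grid() calls in the warnings hint at: when two data frames share no column names, merge() returns their Cartesian product, so the row count is the product of the two inputs.

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build.
is_64bit <- .Machine$sizeof.pointer == 8
arch <- R.version$arch

# Two frames with no column names in common (invented data).
a <- data.frame(u = 1:3)
b <- data.frame(v = 1:4)

# With nothing to join on, every row of a is paired with every row of b.
cross <- merge(a, b)
nrow(cross) == nrow(a) * nrow(b)
```

If that is what happened after the columns were renamed, passing an explicit key via `by`, or `by.x` and `by.y` when the key is named differently in each frame, should avoid the blow-up regardless of available memory.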

> to deal with much larger data sets before now (I know that's not the only
> criterion, but my understanding of computers isn't extensive).  I had
> previously run up against a memory problem because I hadn't transformed my
> data (I thought I was looking at columns, the computer was looking at rows)
> so I tried transforming both data sets and merging again, but I end up with
> another empty data frame:
>
>> tED1<-t(ExprData1)
>> tMD1<-t(MethyData1)
>> CombineData<-merge(tED1,tMD1)
>> str(CombineData)
> 'data.frame': 0 obs. of  152247 variables:
>
> This is where I'm stuck.  Any advice would be hugely appreciated!
>
> Jen
>
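
For later readers, a minimal sketch of the pairing described in the original post, on invented data. It swaps in match() for merge() as a lower-memory way to look up the matching DataSet2 row for each DataSet1 row; that substitution is an assumption, not something proposed in the thread. The names ds1, ds2, S1 to S3 and all values are hypothetical, and the correlation is taken across sample columns, one value per DataSet1 row.

```r
# Toy data; names and values are invented for illustration.
ds1 <- data.frame(Location = c("A", "A", "A", "B"),
                  Part = c(1, 2, 3, 1),
                  S1 = c(0.1, 0.5, 0.9, 0.4),
                  S2 = c(0.2, 0.4, 0.7, 0.6),
                  S3 = c(0.3, 0.6, 0.8, 0.5))
ds2 <- data.frame(Location = c("A", "B", "C"),
                  S1 = c(1, 3, 5),
                  S2 = c(2, 2, 6),
                  S3 = c(3, 1, 7))

sample_cols <- c("S1", "S2", "S3")

# For each row of ds1, the index of the ds2 row with the same Location
# (NA if that Location does not occur in ds2).
idx <- match(ds1$Location, ds2$Location)

# Numeric matrices with one row per ds1 row, aligned by Location.
x <- as.matrix(ds1[, sample_cols])
y <- as.matrix(ds2[idx, sample_cols])

# Pearson correlation across samples for each aligned pair of rows.
row_cor <- vapply(seq_len(nrow(x)),
                  function(i) cor(x[i, ], y[i, ]),
                  numeric(1))

result <- data.frame(ds1[, c("Location", "Part")], cor = row_cor)
```

This assumes the sample columns appear under the same names in both data sets; with only two samples per row every correlation would be exactly 1 or -1, so the toy data uses three.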
