JenniferH
2012-Aug-09 14:54 UTC
[R] correlating rows of two differently-sized data frames in R
Hello everyone,

I have two sets of data, with the following structure:

DataSet1
Location  Part  Sample 1  Sample 2
A         1     value     value
A         2     value     value
A         3     value     value
B         1     value     value

DataSet2
Location  Sample 1  Sample 2
A         value     value
B         value     value
C         value     value

I would like to look at the correlations between DataSet1 and DataSet2, such that each row in Location A from DataSet1 is paired with the Location A row from DataSet2, and so forth. So far, my only ideas involve trying to copy-paste each of the rows in DataSet2 the number of times each occurs in DataSet1 on a spreadsheet before loading the sets into R; however, as I have approaching 8000 rows in DataSet2, this is clearly not a workable solution!

I'm sure there's a simple solution to this, so I'm sorry if this seems like a really silly question.

Thanks for your help!

Jen

--
View this message in context: http://r.789695.n4.nabble.com/correlating-rows-of-two-differently-sized-data-frames-in-R-tp4639774.html
Sent from the R help mailing list archive at Nabble.com.
R. Michael Weylandt
2012-Aug-09 16:28 UTC
[R] correlating rows of two differently-sized data frames in R
Perhaps load them both and ?merge can show you the way.

Michael
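A minimal sketch of what merge() does with data shaped like the example in the question. The values, the column names Sample1 and Sample2, and the suffixes are made up for illustration; they are assumptions, not the poster's real data.

```r
# Toy stand-ins for the two data sets (all values are invented).
DataSet1 <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  Sample1  = c(0.10, 0.20, 0.30, 0.40),
  Sample2  = c(0.50, 0.60, 0.70, 0.80)
)
DataSet2 <- data.frame(
  Location = c("A", "B", "C"),
  Sample1  = c(1.0, 2.0, 3.0),
  Sample2  = c(4.0, 5.0, 6.0)
)

# Join on Location only. The Sample columns exist in both inputs, so
# merge() appends the suffixes to keep the two sources apart.
paired <- merge(DataSet1, DataSet2, by = "Location",
                suffixes = c(".ds1", ".ds2"))
print(paired)
```

Each DataSet1 row is repeated next to its matching DataSet2 row, which replaces the copy-paste step. With the default all = FALSE, Location C is dropped because it has no partner in DataSet1.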
R. Michael Weylandt
2012-Aug-09 17:57 UTC
[R] correlating rows of two differently-sized data frames in R
Hi Jen,

It's generally best to keep cc'ing R-help so others can lend a hand when I step away from my computer:

On Thu, Aug 9, 2012 at 11:49 AM, Jennifer Hobbs <jenachobbs at gmail.com> wrote:
> Hi Michael -
>
> thanks for the advice - I did find merge() just after posting but I'm having
> difficulty with using it. I've loaded both datasets; then I tried
>
> > CombinedData<-merge(MethyData1,ExprData1)
>
> but when I looked at CombinedData, I found there was no actual data in it:
>
> > str(CombinedData)
> 'data.frame': 0 obs. of 20 variables

Take a look at ?merge.data.frame in particular, since there are many different forms of merges. Your original post suggests you may want to set

    all = TRUE
    by = "Location"

Hope that helps,
Michael

> I thought this might be due to the fact that my column names, as well as the
> row names, in both data sets were the same, so I renamed the column names in
> ExprData1 and tried again:
>
> > colnames(ExprData1)<-NewExprNames
> > merge(ExprData1,MethyData1)
> Error: cannot allocate vector of size 4.2 Gb
> In addition: Warning messages:
> 1: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 2: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 3: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 4: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
>
> I was surprised about this, as I'm using a 64-bit computer and it's managed
> to deal with much larger data sets before now (I know that's not the only
> criterion, but my understanding of computers isn't extensive).

You'll also need to be using a 64-bit build of R. Merging is pretty memory expensive, so if you're right on the edge of what R can handle you might have to look into a more specialized solution (such as an SQL backend).

> I had previously run up against a memory problem because I hadn't transposed
> my data (I thought I was looking at columns, the computer was looking at rows)
> so I tried transposing both data sets and merging again, but I end up with
> another empty data frame:
>
> > tED1<-t(ExprData1)
> > tMD1<-t(MethyData1)
> > CombineData<-merge(tED1,tMD1)
> > str(CombineData)
> 'data.frame': 0 obs. of 152247 variables:
>
> This is where I'm stuck. Any advice would be hugely appreciated!
>
> Jen
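The two failures quoted above (an empty result, then a memory blow-up inside expand.grid) are consistent with how merge() chooses its key when by is not given: it uses every column name the two inputs share, and if they share none it returns the Cartesian product. The sketch below reproduces both on toy data. The frames x, y, y2 and their column names are hypothetical; whether this is exactly what happened with MethyData1 and ExprData1 is an assumption, since the real data are not shown.

```r
# Hypothetical toy frames; the names and values are invented.
x <- data.frame(Location = c("A", "A", "B"), S1 = c(0.1, 0.2, 0.3))
y <- data.frame(Location = c("A", "B"),      S1 = c(1.0, 2.0))

# Default key: by = intersect(names(x), names(y)), here Location AND S1.
# No pair of rows agrees on both columns, so nothing matches.
both_cols <- merge(x, y)
nrow(both_cols)   # 0

# After renaming every column of y there are no shared names, so merge()
# falls back to the Cartesian product: nrow(x) * nrow(y) rows.
y2 <- setNames(y, c("Loc2", "Expr1"))
cross <- merge(x, y2)
nrow(cross)       # 6

# Naming the key explicitly pairs rows by location only.
keyed <- merge(x, y2, by.x = "Location", by.y = "Loc2")
nrow(keyed)       # 3
```

On inputs with thousands of rows each, the Cartesian product grows to millions of rows, which would explain an allocation failure even on a 64-bit machine.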
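For the correlation step itself, one lower-memory alternative to merge() is to look up the matching DataSet2 row with match() and correlate row by row. This is a sketch of a swapped-in technique, not something proposed in the thread. It assumes the intended correlation is taken across the sample columns of each paired row, that both data sets list the same samples in the same order, and that the toy data and the names samples, idx and row_cor are purely illustrative.

```r
set.seed(1)  # toy data only; all values are random
samples <- paste0("Sample", 1:5)

DataSet1 <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  matrix(rnorm(20), nrow = 4, dimnames = list(NULL, samples))
)
DataSet2 <- data.frame(
  Location = c("A", "B", "C"),
  matrix(rnorm(15), nrow = 3, dimnames = list(NULL, samples))
)

# For every DataSet1 row, the index of the DataSet2 row with the same
# Location (NA if that location is absent from DataSet2).
idx <- match(DataSet1$Location, DataSet2$Location)

# Numeric matrices with one row per DataSet1 row, aligned by location.
m1 <- as.matrix(DataSet1[, samples])
m2 <- as.matrix(DataSet2[idx, samples])

# Pearson correlation across samples, one value per paired row.
row_cor <- vapply(seq_len(nrow(m1)),
                  function(i) cor(m1[i, ], m2[i, ]),
                  numeric(1))

result <- data.frame(DataSet1[, c("Location", "Part")], cor = row_cor)
print(result)
```

Because match() only builds an index vector, no combined table with duplicated columns is created, which may help when the data are close to the memory limit.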