joe1985
2009-Apr-22 07:30 UTC
[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's
Hello,

I have two data frames, SNP4 and SNP1:

> head(SNP4)
        Animal Marker        Y
3213 194073197  P1001 0.021088
1295 194073197  P1002 0.021088
915  194073197  P1004 0.021088
2833 194073197  P1005 0.021088
1487 194073197  P1006 0.021088
1885 194073197  P1007 0.021088

> head(SNP1)
        Animal Marker x
3213 194073197  P1001 2
1295 194073197  P1002 1
915  194073197  P1004 2
2833 194073197  P1005 0
1487 194073197  P1006 2
1885 194073197  P1007 0

I want these two data frames merged by 'Marker', but when I try this, an error occurs:

> SNP5 <- merge(SNP4, SNP1, by = 'Marker', all = TRUE)
Error: cannot allocate vector of size 2.4 Gb
In addition: Warning messages:
1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)

What I want is the column SNP1$x merged with SNP4 by Marker, so that some markers will have NAs in the 'x' column of SNP5.

I also tried this:

> SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
Error in fix.by(by.y, y) : 'by' must specify valid column(s)

It won't work either.

Does anyone have any idea how to solve this?

Regards,
Johannes

-- 
View this message in context: http://www.nabble.com/Merging-data-frames%2C-or-one-column-vector-with-a-data-frame-filling-out-empty-rows-with-NA%27s-tp23171110p23171110.html
Sent from the R help mailing list archive at Nabble.com.
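One possible reason for the 2.4 Gb allocation is an assumption here, because the full data are not shown. If each Marker appears once per Animal in both data frames, then merging on 'Marker' alone pairs every SNP4 row with every SNP1 row for that marker. The result therefore grows with the square of the number of animals. The toy sketch below uses invented IDs to show the effect and to compare it with a join on both Animal and Marker:

```r
# Hypothetical toy data: 3 animals, each genotyped at the same 2 markers
# (animal IDs and values are invented for illustration)
snp4 <- data.frame(Animal = rep(c("A1", "A2", "A3"), each = 2),
                   Marker = rep(c("P1001", "P1002"), times = 3),
                   Y = 0.021088)
snp1 <- data.frame(Animal = rep(c("A1", "A2", "A3"), each = 2),
                   Marker = rep(c("P1001", "P1002"), times = 3),
                   x = c(2, 1, 2, 0, 2, 0))

# Joining on Marker only: the 3 rows per marker in snp4 are paired with
# the 3 rows per marker in snp1, giving 3 * 3 * 2 = 18 rows, and Animal
# is duplicated as Animal.x / Animal.y
by_marker <- merge(snp4, snp1, by = "Marker", all = TRUE)
nrow(by_marker)

# Joining on both keys keeps one row per animal/marker pair: 6 rows
by_both <- merge(snp4, snp1, by = c("Animal", "Marker"), all = TRUE)
nrow(by_both)
```

With hundreds of animals per marker, the same multiplication could plausibly produce a result of several gigabytes.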
Johannes G. Madsen
2009-Apr-22 09:22 UTC
[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's
Sarah Goslee
2009-Apr-22 13:37 UTC
[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's
Hi,

How about this:

> SNP5 <- merge(SNP4, SNP1[,2:3], all.x=TRUE)
> SNP5
  Marker    Animal        Y x
1  P1001 194073197 0.021088 2
2  P1002 194073197 0.021088 1
3  P1004 194073197 0.021088 2
4  P1005 194073197 0.021088 0
5  P1006 194073197 0.021088 2
6  P1007 194073197 0.021088 0

This ignores Animal, which may or may not be what you want; it wasn't clear from your question.

Your error, however, is due to memory limitations. That could come from specifying the wrong merge, or from having files larger than your computer can handle. This is a good job for a proper database.

>> SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
> Error in fix.by(by.y, y) : 'by' must specify valid column(s)

If you include only SNP1$x, there is no Marker column to merge on. You need to include at least two columns: Marker and x.

-- 
Sarah Goslee
http://www.functionaldiversity.org
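The left join suggested above (all.x = TRUE) can be checked on a small invented example. In this sketch SNP1 has no row for one marker, so that row of the result gets NA in x. It is a variant that joins on both Animal and Marker and selects columns by name rather than by position. Whether Animal belongs in the key depends on the real data, which is an assumption here.

```r
# Hypothetical toy data: snp1 has no genotype for marker P1004
snp4 <- data.frame(Animal = "A1",
                   Marker = c("P1001", "P1002", "P1004"),
                   Y = 0.021088)
snp1 <- data.frame(Animal = "A1",
                   Marker = c("P1001", "P1002"),
                   x = c(2, 1))

# Left join: all.x = TRUE keeps every row of snp4, and rows with no
# partner in snp1 get NA in column x. Naming the columns (instead of
# snp1[, 2:3]) still works if the column order changes.
snp5 <- merge(snp4, snp1[, c("Animal", "Marker", "x")],
              by = c("Animal", "Marker"), all.x = TRUE)
snp5
```

Because merge() sorts by the key columns by default, P1004 comes last here, with x equal to NA.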
stephenb
2009-Apr-27 15:37 UTC
[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's
You are exceeding your maximum memory here, so R will not be able to do that. Dump both tables into a database such as MySQL and then run the join as a query, either from RMySQL or from mysql directly. Then export the result and import it back into R.

That will take care of the merge, but I am not sure what will happen when you actually try to run some statistics on the object. It is very likely that the operation will exceed memory again. In the end you may have to write your own code that does not attempt to load everything into memory, either in R or in a lower-level language.

If you have SAS, it will probably work, since SAS handles large data sets in long format well. Depending on what you do, R may be able to deal with it after a reshape() to wide format.
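The reshape() to wide format mentioned above can be sketched with base R on invented data. The long table with one row per animal/marker pair becomes one row per animal, with one genotype column per marker. Whether this saves enough memory on the real data set has not been tested.

```r
# Hypothetical long-format genotypes: one row per animal/marker pair
long <- data.frame(Animal = rep(c("A1", "A2"), each = 3),
                   Marker = rep(c("P1001", "P1002", "P1004"), times = 2),
                   x = c(2, 1, 2, 0, 2, 0))

# Wide format: idvar identifies the rows, timevar supplies the new
# column suffixes, and v.names is the value spread across them
wide <- reshape(long, idvar = "Animal", timevar = "Marker",
                v.names = "x", direction = "wide")
wide
```

The resulting columns are named Animal, x.P1001, x.P1002 and x.P1004.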