Matthew
2019-Mar-21 21:08 UTC
[R] creating a dataframe with full_join and looping over a list of lists.
My apologies, my first e-mail formatted very poorly when sent, so I am trying again with something I hope will be less confusing.

I have been trying to create a dataframe by looping through a list of lists, and using dplyr's full_join so as to keep common elements on the same row. But I have a couple of problems:

1) The lists have different numbers of elements.

2) In the final dataframe, I would like the column names to be the names of the lists.

Is it possible?

Code:

for(j in avector){
  mydf3 <- data.frame(myenter)
  atglsts <- as.data.frame(comatgs[j])
  mydf3 <- full_join(mydf3, atglsts)
}

Explanation:

# Start out by converting a list, myenter, to a dataframe. mydf3 now has 1 column.
# This first column will be the longest column in the final mydf3.
# Loop through a list of lists, comatgs; on each pass a particular list
# is made into a dataframe of one column, atglsts.
# The name of the column is the name of the list.
# Each atglsts dataframe has a different number of elements.
# What I want to do is to add the newly made dataframe, atglsts, as a
# new column of the dataframe mydf3, using full_join
# in order to keep common elements on the same row.
# I could rename the column to 'AGI' so that I can join by 'AGI',
# but then I would lose the name of the list.
# In the final dataframe, I want to know the name of the original list
# the column was made from.

Matthew
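One way to get both things asked for here, sketched under assumptions rather than taken from the thread: give each one-column dataframe a second copy of its IDs under a shared key name (AGI below), so the join key and the list name can coexist. The sketch uses base R's merge(all = TRUE), which is a full outer join, as a stand-in for dplyr's full_join so that it runs without extra packages; with dplyr loaded, full_join(mydf3, atglsts, by = "AGI") should fill the same role, although it does not sort the rows. The gene IDs and the list names listA and listB are hypothetical toy data.

```r
# Hypothetical toy data standing in for myenter and comatgs.
myenter <- c("AT1G01", "AT1G02", "AT1G03", "AT1G04")
comatgs <- list(listA = c("AT1G01", "AT1G03"),
                listB = c("AT1G02", "AT1G03", "AT1G04"))

# Start from the superset, stored under the shared key column "AGI".
# Note: this is created once, before the loop.
mydf3 <- data.frame(AGI = myenter, stringsAsFactors = FALSE)

for (nm in names(comatgs)) {
  ids <- comatgs[[nm]]
  # Two columns: the join key, plus a copy named after the list,
  # so the list name survives the join.
  atglsts <- data.frame(AGI = ids, stringsAsFactors = FALSE)
  atglsts[[nm]] <- ids
  # all = TRUE keeps rows from both sides (a full outer join);
  # sort = TRUE orders the result by AGI.
  mydf3 <- merge(mydf3, atglsts, by = "AGI", all = TRUE, sort = TRUE)
}

print(mydf3)
```

If only the per-list columns are wanted afterwards, the key can be dropped with mydf3$AGI <- NULL.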
Jim Lemon
2019-Mar-21 23:12 UTC
Hi Matthew,

First thing, don't put:

mydf3 <- data.frame(myenter)

inside your loop, otherwise you will reset the value of mydf3 each time and end up with only "myenter" and the final list. Without some idea of the contents of comatgs, it is difficult to suggest a way to get what you want.

Jim
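The effect of re-creating mydf3 inside the loop can be seen on a tiny example. The IDs and the list names A and B are hypothetical, and each column is filled with ifelse() and %in% purely for illustration; the only point is where the data.frame() call sits.

```r
# Hypothetical toy data; the real comatgs is much longer.
myenter <- c("g1", "g2", "g3")
comatgs <- list(A = c("g1", "g3"), B = "g2")

# Wrong: mydf3_bad is rebuilt from myenter on every pass, so the
# column added for A is thrown away and only B's column survives.
for (nm in names(comatgs)) {
  mydf3_bad <- data.frame(AGI = myenter, stringsAsFactors = FALSE)
  mydf3_bad[[nm]] <- ifelse(myenter %in% comatgs[[nm]], myenter, NA)
}

# Right: create mydf3 once, before the loop, and only add to it inside.
mydf3 <- data.frame(AGI = myenter, stringsAsFactors = FALSE)
for (nm in names(comatgs)) {
  mydf3[[nm]] <- ifelse(myenter %in% comatgs[[nm]], myenter, NA)
}

names(mydf3_bad)
names(mydf3)
```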
Jim Lemon
2019-Mar-22 03:01 UTC
Hi Matthew,

Remember, keep it on the list so that people know the status of the request.

I couldn't get this to work with the "_source_info_" variable. It is not a syntactically valid R name (names cannot begin with an underscore), so it cannot be used as a plain variable name. So, this _may_ be what you want. I don't know if it can be done with "merge" and I don't know the function "full_join".

WRKY8_colamp_a <- c(
  "AT1G02920","AT1G06135","AT1G07160","AT1G11925","AT1G14540","AT1G16150",
  "AT1G21120","AT1G26380","AT1G26410","AT1G35210","AT1G49000","AT1G51920",
  "AT1G56250","AT1G66090","AT1G72520","AT1G80840","AT2G02010","AT2G18690",
  "AT2G30750","AT2G39200","AT2G43620","AT3G01830","AT3G54150","AT3G55840",
  "AT4G03460","AT4G11470","AT4G11890","AT4G14370","AT4G15417","AT4G15975",
  "AT4G31940","AT4G35180","AT5G01540","AT5G05300","AT5G11140","AT5G24110",
  "AT5G25250","AT5G36925","AT5G46295","AT5G64750","AT5G64905","AT5G66020")

bHLH10_col_a <- c("AT1G72520","AT3G55840","AT5G20230","AT5G64750")

bHLH10_colamp_a <- c(
  "AT1G01560","AT1G02920","AT1G16420","AT1G17147","AT1G35210","AT1G51620",
  "AT1G57630","AT1G72520","AT2G18690","AT2G19190","AT2G40180","AT2G44370",
  "AT3G23250","AT3G55840","AT4G03460","AT4G04480","AT4G04540","AT4G08555",
  "AT4G11470","AT4G11890","AT4G16820","AT4G23280","AT4G35180","AT5G01540",
  "AT5G05300","AT5G20230","AT5G22530","AT5G24110","AT5G56960","AT5G57010",
  "AT5G57220","AT5G64750","AT5G66020")

# let myenter be the sorted superset
myenter <- sort(unique(c(WRKY8_colamp_a, bHLH10_col_a, bHLH10_colamp_a)))

# splice() walks the sorted superset x and the sorted subset y in step and
# returns a vector as long as x, holding y's value on each matching row and
# NA elsewhere (it assumes every element of y also occurs in x)
splice <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  newy <- rep(NA, nx)
  if (ny) {
    yi <- 1
    for (xi in seq_len(nx)) {
      if (x[xi] == y[yi]) {
        newy[xi] <- y[yi]
        yi <- yi + 1
      }
      if (yi > ny) break
    }
  }
  return(newy)
}

comatgs <- list(WRKY8_colamp_a = WRKY8_colamp_a,
                bHLH10_col_a = bHLH10_col_a,
                bHLH10_colamp_a = bHLH10_colamp_a)
mydf3 <- data.frame(myenter, stringsAsFactors = FALSE)
for (j in seq_along(comatgs)) {
  tmp <- data.frame(splice(myenter, sort(comatgs[[j]])), stringsAsFactors = FALSE)
  # name the new column after the list it came from
  names(tmp) <- names(comatgs)[j]
  mydf3 <- cbind(mydf3, tmp)
}

Jim

On Fri, Mar 22, 2019 at 10:29 AM Matthew <mccormack at molbio.mgh.harvard.edu> wrote:
> Hi Jim,
>
> Thanks for the reply. That was pretty dumb of me. I took that out of the loop.
>
> comatgs is longer than this, but here is a sample of 4 of its 569 elements:
>
> $WRKY8_colamp_a
>  [1] "AT1G02920" "AT1G06135" "AT1G07160" "AT1G11925" "AT1G14540" "AT1G16150" "AT1G21120"
>  [8] "AT1G26380" "AT1G26410" "AT1G35210" "AT1G49000" "AT1G51920" "AT1G56250" "AT1G66090"
> [15] "AT1G72520" "AT1G80840" "AT2G02010" "AT2G18690" "AT2G30750" "AT2G39200" "AT2G43620"
> [22] "AT3G01830" "AT3G54150" "AT3G55840" "AT4G03460" "AT4G11470" "AT4G11890" "AT4G14370"
> [29] "AT4G15417" "AT4G15975" "AT4G31940" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G11140"
> [36] "AT5G24110" "AT5G25250" "AT5G36925" "AT5G46295" "AT5G64750" "AT5G64905" "AT5G66020"
>
> $`_source_info_`
> character(0)
>
> $bHLH10_col_a
> [1] "AT1G72520" "AT3G55840" "AT5G20230" "AT5G64750"
>
> $bHLH10_colamp_a
>  [1] "AT1G01560" "AT1G02920" "AT1G16420" "AT1G17147" "AT1G35210" "AT1G51620" "AT1G57630"
>  [8] "AT1G72520" "AT2G18690" "AT2G19190" "AT2G40180" "AT2G44370" "AT3G23250" "AT3G55840"
> [15] "AT4G03460" "AT4G04480" "AT4G04540" "AT4G08555" "AT4G11470" "AT4G11890" "AT4G16820"
> [22] "AT4G23280" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G20230" "AT5G22530" "AT5G24110"
> [29] "AT5G56960" "AT5G57010" "AT5G57220" "AT5G64750" "AT5G66020"
>
> I have been thinking of something like this:
>
> lenmyen <- length(myenter)              # get length of longest list
> length(comatgs[[j]]) <- lenmyen         # make each list the length of myenter
> atglsts <- as.data.frame(comatgs[j])    # create dataframe
> colnames(atglsts) <- "AGI"              # rename column to 'AGI'
>
> mydf3 <- full_join(mydf3, atglsts, by = "AGI")   # full_join
>
> Matthew
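Jim's splice() relies on both vectors being sorted and walks them in step. A vectorised alternative, offered here as an assumption rather than anything proposed in the thread, is y[match(x, y)]: match() returns each element's position in y (or NA), so indexing y with it puts every shared ID on the row of its partner in x, without a loop and without sorting y first. The sketch checks that the two agree on a few IDs taken from the sample data; splice_match is a hypothetical name.

```r
# Jim's splice(), lightly tidied (break instead of break(), seq_len()).
splice <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  newy <- rep(NA, nx)
  if (ny) {
    yi <- 1
    for (xi in seq_len(nx)) {
      if (x[xi] == y[yi]) {
        newy[xi] <- y[yi]
        yi <- yi + 1
      }
      if (yi > ny) break
    }
  }
  newy
}

# Hypothetical vectorised alternative: match() gives, for each element
# of x, its index in y or NA; indexing y with that aligns y to x.
splice_match <- function(x, y) y[match(x, y)]

myenter <- c("AT1G01560", "AT1G02920", "AT1G72520", "AT3G55840")
hits <- c("AT3G55840", "AT1G72520")   # deliberately unsorted

a <- splice(myenter, sort(hits))   # splice() needs sorted input
b <- splice_match(myenter, hits)   # match() does not
print(a)
print(b)
```

Inside Jim's loop, tmp <- data.frame(splice_match(myenter, comatgs[[j]]), stringsAsFactors = FALSE) should then give the same column without the sort(). One difference: for an empty element such as `_source_info_`, splice_match() returns an all-NA character column, while splice() returns an all-NA logical one.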
Matthew
2019-Mar-25 23:48 UTC
This is fantastic! It was exactly what I was looking for. It is part of a larger Shiny app, so it is difficult to provide a working example as part of the post, and after figuring out how your code works (I am an R novice), I made a couple of small tweaks and it works great! Thank you very much, Jim, for the work you put into this.

Matthew