R-users,

I have the following piece of code, which I am trying to run on a data frame (aga2) with about half a million records. While the code works, it is extremely slow. I've read some of the help archives indicating that I should preallocate the p1 and ags vectors, which I have done, but this doesn't seem to improve speed much. Could anyone advise me on how to speed this up?

p1 <- character(dim(aga2)[1])
ags <- character(dim(aga2)[1])
for (i in 1:dim(aga2)[1])
{
  if (aga2$first.exon[i]==TRUE)
  {
    p1[i]<-as.character(aga2[i, "AP"])
    ags[i]<-as.character(aga2[i, "AS"])
  }
  else
  {
    p1[i]<-paste(p1[i-1], aga2[i, "AP"], sep=",")
    ags[i]<-paste(ags[i-1], aga2[i, "AS"], sep=",")
  }
}

Thanks.

--Mark Lamias
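One low-effort thing to try before restructuring the loop: indexing a data frame cell by cell (aga2[i, "AP"]) goes through the data frame subsetting method on every iteration, which is usually far more expensive than indexing a plain vector. The sketch below keeps the same loop logic but pulls the columns out once up front. The toy data and the helper names ap_chr, as_chr and first are assumptions for illustration; only AP, AS and first.exon come from the post, and no timings have been measured here.

```r
# Toy stand-in for aga2; column names follow the post, values are made up.
aga2 <- data.frame(
  AP = c("a", "b", "c", "d"),
  AS = c(1, 2, 3, 4),
  first.exon = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Extract the columns once so the loop only touches plain vectors.
ap_chr <- as.character(aga2$AP)
as_chr <- as.character(aga2$AS)
first  <- aga2$first.exon

n   <- nrow(aga2)
p1  <- character(n)
ags <- character(n)

for (i in seq_len(n)) {
  if (first[i]) {
    # Start of a new run: take the current value as is.
    p1[i]  <- ap_chr[i]
    ags[i] <- as_chr[i]
  } else {
    # Continue the run: append the current value to the previous result.
    p1[i]  <- paste(p1[i - 1], ap_chr[i], sep = ",")
    ags[i] <- paste(ags[i - 1], as_chr[i], sep = ",")
  }
}
```

This assumes, as the original loop does, that the first row has first.exon equal to TRUE.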
Instead of looping on each row, try the following:

p1 <- as.character(aga2$AP)
# skew by one on the paste
p1 <- ifelse(aga2$first.exon, p1, paste(c("", tail(p1, -1)), aga2$AP, sep=','))
ags <- as.character(aga2$AS)
ags <- ifelse(aga2$first.exon, ags, paste(c("", tail(ags, -1)), aga2$AS, sep=','))

On Tue, May 11, 2010 at 12:17 PM, Mark Lamias <mlamias@yahoo.com> wrote:
> R-users,
>
> I have the following piece of code which I am trying to run on a dataframe
> (aga2) with about a half million records. While the code works, it is
> extremely slow.
> [...]

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
It was supposed to be 'head(p1, -1)' instead of 'tail(p1, -1)'.

-- 
Jim Holtman
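To make the head/tail correction concrete, here is a small sketch on made-up data (the vectors x and first are hypothetical stand-ins for aga2$AP and aga2$first.exon). It shows that c("", head(x, -1)) is the "previous row" vector, while c("", tail(x, -1)) just reproduces the current row from position 2 onward.

```r
x     <- c("a", "b", "c", "d")
first <- c(TRUE, FALSE, FALSE, TRUE)

# head(x, -1) drops the LAST element, so prepending "" shifts x down by one:
lag_head <- c("", head(x, -1))   # "", "a", "b", "c"

# tail(x, -1) drops the FIRST element, so prepending "" leaves x[2:n] in place:
lag_tail <- c("", tail(x, -1))   # "", "b", "c", "d"

# Vectorised version of the suggestion, using the corrected shift.
res <- ifelse(first, x, paste(lag_head, x, sep = ","))
```

Note that the shifted vector holds the previous raw value, not the previous accumulated result, so as far as I can tell this matches the original loop only when a run of FALSE rows has length one. In the toy data the third element comes out as "b,c", whereas the loop would give "a,b,c".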
I will clarify my problem, as others have asked for more detail.

I have a data frame, aga2, that looks like this:

   Row.ID AgilentProbe GeneSymbol GeneID Exons AgilentStart first.geneid first.exon last.geneid last.exon
8    1348 A_23_P116898        A2M      2    34      9112685         TRUE       TRUE        TRUE      TRUE
62  19410  A_23_P95594       NAT1      9     4     18124656         TRUE       TRUE        TRUE      TRUE
39  10323  A_23_P31798       NAT2     10     2     18302422         TRUE       TRUE        TRUE      TRUE
21   5353 A_23_P162918   SERPINA3     12     5     94150936         TRUE       TRUE       FALSE     FALSE
22   9999 A_23_P162913   SERPINA3     12     5     94150800        FALSE      FALSE       FALSE     FALSE
98  29990 A_32_P151937   SERPINA3     12     5     94150720        FALSE      FALSE       FALSE      TRUE
33   9516   A_23_P2920   SERPINA3     12     7     94158435        FALSE       TRUE       FALSE      TRUE
96  29595 A_32_P124727   SERPINA3     12     8     94160018        FALSE       TRUE        TRUE      TRUE
57  18176  A_23_P80570      AADAC     13     5    153028473         TRUE       TRUE        TRUE      TRUE
46  16139  A_23_P56529       AAMP     14     9    218838396         TRUE       TRUE        TRUE      TRUE

For the above example, I would like to end up with a vector, probe1, like this, based upon the AgilentProbe values:

A_23_P116898
A_23_P95594
A_23_P31798
A_23_P162918
A_23_P162918,A_23_P162913
A_23_P162918,A_23_P162913,A_32_P151937
A_23_P2920
A_32_P124727
A_23_P80570
A_23_P56529

I build up each element of the vector based upon the value of first.exon. If first.exon is TRUE, the element is just the current value of AgilentProbe. If first.exon is FALSE, I take the previous element of the result, concatenate the current value of AgilentProbe onto it, and then move on to the next element.

As stated previously, this code works, but it is very slow with larger datasets:

probe1 <- character(dim(aga2)[1])
agstart <- character(dim(aga2)[1])
for (i in 1:dim(aga2)[1])
{
  if (aga2$first.exon[i]==TRUE)
  {
    probe1[i]<-as.character(aga2[i, "AgilentProbe"])
    agstart[i]<-as.character(aga2[i, "AgilentStart"])
  }
  else
  {
    probe1[i]<-paste(probe1[i-1], aga2[i, "AgilentProbe"], sep=",")
    agstart[i]<-paste(agstart[i-1], aga2[i, "AgilentStart"], sep=",")
  }
}

I tried a few of the previous suggestions (and tried modifying them), but they didn't quite do the trick. Any assistance would be greatly appreciated.

Thanks a million.
--Mark Lamias

----- Forwarded Message ----
From: jim holtman <jholtman@gmail.com>
To: Mark Lamias <mlamias@yahoo.com>
Cc: r-help@r-project.org
Sent: Tue, May 11, 2010 12:46:26 PM
Subject: Re: [R] Improving loop performance

It was supposed to be 'head(p1, -1)' instead of 'tail(p1, -1)'
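One possible way to get the cumulative result described above without row-by-row data frame indexing is to treat each TRUE in first.exon as the start of a new run, label the runs with cumsum(), and build the running concatenation within each run. The sketch below is not from the thread: cum_paste is a hypothetical helper name, it relies only on base R (ave, Reduce, cumsum), and it has not been timed on half a million rows.

```r
# Hypothetical helper: running paste of x, restarting wherever first is TRUE.
cum_paste <- function(x, first, sep = ",") {
  x   <- as.character(x)
  grp <- cumsum(first)   # run id: goes up by one at every TRUE
  # Within each run, accumulate paste() so element k holds x[1..k] joined by sep.
  ave(x, grp, FUN = function(v) {
    Reduce(function(a, b) paste(a, b, sep = sep), v, accumulate = TRUE)
  })
}

# The AgilentProbe and first.exon columns from the example in the post.
probe <- c("A_23_P116898", "A_23_P95594", "A_23_P31798", "A_23_P162918",
           "A_23_P162913", "A_32_P151937", "A_23_P2920", "A_32_P124727",
           "A_23_P80570", "A_23_P56529")
first <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

probe1 <- cum_paste(probe, first)
```

On the real data this would presumably be called as cum_paste(aga2$AgilentProbe, aga2$first.exon) and cum_paste(aga2$AgilentStart, aga2$first.exon). Very long runs still cost more, because each step copies the growing string, but runs of a few probes per exon should be cheap.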