thr3ads.net - R help - [R] Improving loop performance [May 2010]

Mark Lamias

2010-May-11 16:17 UTC

[R] Improving loop performance

R-users,

I have the following piece of code which I am trying to run on a dataframe
(aga2) with about a half million records.  While the code works, it is extremely
slow.  I've read some of the help archives indicating that I should allocate
space to the p1 and ags vectors, which I have done, but this doesn't seem
to improve speed much.  Would anyone be able to provide me with advice on how I
might be able to speed this up?


p1 <- character(dim(aga2)[1])
ags <- character(dim(aga2)[1])
for (i in 1:dim(aga2)[1])
{
 if (aga2$first.exon[i]==TRUE)
 {
  p1[i]<-as.character(aga2[i, "AP"])
  ags[i]<-as.character(aga2[i, "AS"])
  
 }
 else 
 {
  p1[i]<-paste(p1[i-1], aga2[i, "AP"], sep=",")
  ags[i]<-paste(ags[i-1], aga2[i, "AS"], sep=",")
 }
}

Thanks.

--Mark Lamias
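
One likely contributor to the slowness is that the loop indexes the data frame (aga2[i, "AP"]) on every iteration, which is much more expensive than indexing a plain vector. A minimal sketch of the same loop with the columns pulled out once beforehand follows. The toy aga2 and the helper names ap_vals, as_vals and first are made up for illustration; only the column names AP, AS and first.exon come from the post.

```r
# Toy stand-in for aga2 (values are invented; column names follow the post).
aga2 <- data.frame(
  AP = c("a", "b", "c", "d"),
  AS = c(1L, 2L, 3L, 4L),
  first.exon = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Extract each column once, as a plain vector, outside the loop.
ap_vals <- as.character(aga2$AP)
as_vals <- as.character(aga2$AS)
first   <- aga2$first.exon

n   <- nrow(aga2)
p1  <- character(n)
ags <- character(n)

# Same logic as the original loop, but touching only vectors.
for (i in seq_len(n)) {
  if (first[i]) {
    p1[i]  <- ap_vals[i]
    ags[i] <- as_vals[i]
  } else {
    p1[i]  <- paste(p1[i - 1], ap_vals[i], sep = ",")
    ags[i] <- paste(ags[i - 1], as_vals[i], sep = ",")
  }
}

print(p1)
print(ags)
```

The result is the same as with the original loop; only the per-iteration data frame access is gone. How much this helps on half a million rows would need to be timed on the real data.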


      

jim holtman

2010-May-11 16:33 UTC

[R] Improving loop performance

Instead of looping on each row, try the following

p1 <- as.character(aga2$AP)
# skew by one on the paste
p1 <- ifelse(aga2$first.exon, p1,
             paste(c("", tail(p1, -1)), aga2$AP, sep = ","))

ags <- as.character(aga2$AS)
ags <- ifelse(aga2$first.exon, ags,
              paste(c("", tail(ags, -1)), aga2$AS, sep = ","))


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


jim holtman

2010-May-11 16:46 UTC

[R] Improving loop performance

It was supposed to be 'head(p1, -1)' instead of 'tail(p1, -1)'.
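
To make the difference concrete, here is a small sketch on a made-up vector x (x, first, prev and shifted are illustrative names, not from the thread). head(x, -1) drops the last element, so putting "" in front of it gives a vector whose element i holds x[i - 1]; tail(x, -1) drops the first element instead, which shifts the wrong way.

```r
x     <- c("a", "b", "c", "d")
first <- c(TRUE, FALSE, FALSE, TRUE)   # stand-in for aga2$first.exon

head(x, -1)   # everything except the last element
tail(x, -1)   # everything except the first element

# Lag by one: prev[i] is x[i - 1], with "" in the first slot.
prev <- c("", head(x, -1))

# Keep x where first is TRUE, otherwise join with the previous value.
shifted <- ifelse(first, x, paste(prev, x, sep = ","))
print(shifted)
```

Note that this joins each value only with its immediate predecessor, so the third element comes out as "b,c", whereas the original loop would give "a,b,c". For runs longer than two rows the shifted paste is therefore not equivalent to the loop.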


Mark Lamias

2010-May-11 17:50 UTC

[R] Fw: Improving loop performance

I will clarify my problem, as others have asked for more detail:

I have a dataframe, aga2, that looks like this:

   Row.ID AgilentProbe GeneSymbol GeneID Exons AgilentStart first.geneid first.exon last.geneid last.exon
8    1348 A_23_P116898        A2M      2    34      9112685         TRUE       TRUE        TRUE      TRUE
62  19410  A_23_P95594       NAT1      9     4     18124656         TRUE       TRUE        TRUE      TRUE
39  10323  A_23_P31798       NAT2     10     2     18302422         TRUE       TRUE        TRUE      TRUE
21   5353 A_23_P162918   SERPINA3     12     5     94150936         TRUE       TRUE       FALSE     FALSE
22   9999 A_23_P162913   SERPINA3     12     5     94150800        FALSE      FALSE       FALSE     FALSE
98  29990 A_32_P151937   SERPINA3     12     5     94150720        FALSE      FALSE       FALSE      TRUE
33   9516   A_23_P2920   SERPINA3     12     7     94158435        FALSE       TRUE       FALSE      TRUE
96  29595 A_32_P124727   SERPINA3     12     8     94160018        FALSE       TRUE        TRUE      TRUE
57  18176  A_23_P80570      AADAC     13     5    153028473         TRUE       TRUE        TRUE      TRUE
46  16139  A_23_P56529       AAMP     14     9    218838396         TRUE       TRUE        TRUE      TRUE

For the above example, I would like to end up with a vector, probe1, like this,
based upon the AgilentProbe values:

A_23_P116898
A_23_P95594
A_23_P31798
A_23_P162918
A_23_P162918,A_23_P162913
A_23_P162918,A_23_P162913,A_32_P151937
A_23_P2920
A_32_P124727
A_23_P80570
A_23_P56529

I build up each element of the vector based upon the value of first.exon.  If
first.exon is TRUE, the element is just the current value of AgilentProbe.  If
first.exon is FALSE, I'd like to take the previous element of the vector and
append the current value of AgilentProbe to it, separated by a comma, and then
move on to the next element.

As stated previously, this code works, but it is very slow with larger datasets:

probe1 <- character(dim(aga2)[1])
agstart <- character(dim(aga2)[1])

for (i in 1:dim(aga2)[1])
{
 if (aga2$first.exon[i]==TRUE)
 {
  probe1[i]<-as.character(aga2[i, "AgilentProbe"])
  agstart[i]<-as.character(aga2[i, "AgilentStart"])
  
 }
 else 
 {
  probe1[i]<-paste(probe1[i-1], aga2[i, "AgilentProbe"], sep=",")
  agstart[i]<-paste(agstart[i-1], aga2[i, "AgilentStart"], sep=",")
 }
}


I tried a few of the previous suggestions (and tried modifying them), but they
didn't seem to quite do the trick.  Any assistance would be greatly
appreciated.

Thanks a million.

--Mark Lamias
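
One possible way to get the cumulative result without an explicit row loop is sketched below. It is not a method proposed in the thread; run_id and cum_paste are hypothetical names introduced here, and the data frame is rebuilt from only the three columns of the example that matter. The idea is that every TRUE in first.exon starts a new run, so cumsum(first.exon) labels the runs, and a cumulative paste can then be applied within each run using ave() and Reduce(..., accumulate = TRUE).

```r
# Example rows from the post (only the columns used here).
aga2 <- data.frame(
  AgilentProbe = c("A_23_P116898", "A_23_P95594", "A_23_P31798",
                   "A_23_P162918", "A_23_P162913", "A_32_P151937",
                   "A_23_P2920", "A_32_P124727", "A_23_P80570",
                   "A_23_P56529"),
  AgilentStart = c(9112685L, 18124656L, 18302422L, 94150936L, 94150800L,
                   94150720L, 94158435L, 94160018L, 153028473L, 218838396L),
  first.exon   = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Each TRUE starts a new run; cumsum() gives every row its run number.
run_id <- cumsum(aga2$first.exon)

# Cumulative comma-join of a character vector:
# c("a", "b", "c") becomes c("a", "a,b", "a,b,c").
cum_paste <- function(x) {
  Reduce(function(a, b) paste(a, b, sep = ","), x, accumulate = TRUE)
}

# Apply the cumulative join separately within each run.
probe1  <- ave(as.character(aga2$AgilentProbe), run_id, FUN = cum_paste)
agstart <- ave(as.character(aga2$AgilentStart), run_id, FUN = cum_paste)

print(probe1)
```

On the example rows this reproduces the vector listed above. It assumes the rows are already in the intended order and that the first row has first.exon equal to TRUE; whether it is fast enough on half a million rows would have to be checked on the real data.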






 


