Paul,
My interpretation is that you are trying to assign a new bin number to a
row every time the variable chrom changes and every time the variable
chromStart changes by 115341 or more. Is that right? If so, you don't
need a loop at all. Check out the code below. I made a couple of changes to
the all.tf7 example data frame so that it would have two changes in bin
number, one based on the chrom variable and one based on the chromStart
variable.
Jean
all.tf7 <- data.frame(
chrom = c("chr1", "chr1", "chr2",
"chr2", "chr2", "chr2"),
chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
name = c("ZBTB33", "TAF7_(SQ-8)",
"Pol2-4H8", "MafF_(M8194)",
"ZBTB33", "CTCF"),
cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
bin = rep(NA, 6)
)
# assign a new bin every time chrom changes and every time chromStart
# changes by 115341 or more
L <- nrow(all.tf7)
# as.character() keeps the comparison below valid if chrom is stored as a factor
prev.chrom <- c(NA, as.character(all.tf7$chrom)[-L])
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
  delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7
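
Since the stated aim is to split the data frame by bin, here is a minimal sketch of that last step. It assumes base R's split() is what is wanted, and it rebuilds a cut-down copy of the example data with the bin values the code above is meant to produce (1, 1, 2, 2, 3, 3). The name by.bin is just an illustrative choice.

```r
# Cut-down copy of the example data, with bin already filled in
# (assumed values: 1, 1, 2, 2, 3, 3)
all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  bin = c(1, 1, 2, 2, 3, 3)
)

# split() returns a list holding one data frame per distinct bin value
by.bin <- split(all.tf7, all.tf7$bin)

# number of rows that ended up in each bin
sapply(by.bin, nrow)
```

Each element of by.bin can then be processed on its own, for example with lapply().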
pguilha <paul.guilhamon@gmail.com> wrote on 07/02/2012 06:25:13 AM:
> Hello all,
>
> I have written a for loop to act on a dataframe with close to 3 million rows
> and 6 columns and I would like to pass it to apply() to speed the process up
> (I let the loop run for 2 days before stopping it and it had only gone
> through 200,000 rows) but I am really struggling to find a way to pass the
> arguments. Below are the loop and the head of the dataframe I am working on.
> Any hints would be much appreciated, thank you! (I have searched for this
> but could not find any other posts doing quite what I want)
> Paul
>
> x<-as.numeric(all.tf7[1,2])
> for (i in 2:nrow(all.tf7)) {
> if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
> all.tf7[i,6]<-all.tf7[i-1,6]
> else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
> all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> x<-as.numeric(all.tf7[i,2]) }
> else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
> all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> x<-as.numeric(all.tf7[i,2]) }
> }
>
> #the aim here is to attribute a bin number to each row so that I can then
> #split the dataframe according to those bins.
>
>
> chrom  chromStart  chromEnd  name          cumsum  bin
> chr1   10089       10309     ZBTB33        10089   1
> chr1   10132       10536     TAF7_(SQ-8)   20221   1
> chr1   10133       10362     Pol2-4H8      30354   1
> chr1   10148       10418     MafF_(M8194)  40502   1
> chr1   10382       10578     ZBTB33        50884   1
> chr1   16132       16352     CTCF          67016   1