thr3ads.net - R help - [R] Vector searching and counting speed optimization [May 2006]

If this information is useful, please help other people find it:
Share via:

Eric Archer

2006-May-03 18:28 UTC

[R] Vector searching and counting speed optimization

R-users,

I'm seeking any suggestions on optimizing some code for speed.  Here's 
the setup:  the code below is part of a larger chunk that is calculating 
Fst values across loci and alleles.  This chunk is designed to calculate 
the proportion ('p.a') of an allele ('a') at a locus in each
population
('p') and the proportion of individuals heterozygous for that allele 
('h.a').   I'm not concerned with being slick in terms of using the
most
convenient functions, but would rather have it do the calculations as 
fast as possible as this bit is getting run very frequently and seems to 
be taking the most compute time.  Profiling seems to indicate that 
functions like 'length(which(...))' and
'table(factor(...,level=a))' are
more expensive than the logical vector creation scheme below.  I just 
want to make sure I haven't overlooked any other viable options that 
might be available.  Any and all suggestions are gladly welcomed.  
Thanks in advance.

Cheers,
eric

---

Variables used :
'pop' - population i.d. , 'a1' & 'a2' - alleles 1
and 2 at locus : all
character vectors of equal length (no NAs)
nvec - vector of number of individuals in population 'p'
a - allele for which 'p.a' and 'p.het.a' are being calculated

Here's some example data and then the code snippet in question:

test <- structure(list(pop = c("1", "1", "1",
"1", "1", "1", "1", "1",
"2", "2", "2", "2", "2",
"2", "2", "2", "3", "3",
"3", "3", "3"
), loc4.a1 = c("3", "3", "4", "3",
"3", "4", "4", "4", "3",
"4",
"4", "3", "4", "2", "4",
"4", "4", "4", "2", "4",
"2"), loc4.a2 = c("3",
"3", "3", "3", "4", "3",
"3", "3", "2", "3", "3",
"3", "4", "2",
"3", "4", "3", "4", "1",
"3", "1")), .Names = c("pop", "loc4.a1",
"loc4.a2"), na.action = structure(36, .Names = "36", class =
"omit"),
row.names = c("1",
"2", "3", "4", "5", "6",
"7", "8", "9", "10", "11",
"12", "13",
"14", "15", "16", "17", "18",
"19", "20", "21"), class = "data.frame")
pop <- test$pop
a1 <- test$loc4.a1
a2 <- test$loc4.a2
nvec <- table(pop)
a <- "3"

    with.a <- a1 == a | a2 == a
    allele.stats <- sapply(names(nvec), function(p) {
      this.pop <- pop == p & with.a
      a.in.a1 <- a1[this.pop] == a
      a.in.a2 <- a2[this.pop] == a
      na1 <- length(a.in.a1[a.in.a1])
      na2 <- length(a.in.a2[a.in.a2])
      p.a <- (na1 + na2) / nvec[p] / 2
      is.het <- a1[this.pop] != a2[this.pop]
      p.het.a <- length(is.het[is.het]) / nvec[p]
      c(p.a, p.het.a)
    })

-- 

Eric Archer, Ph.D.
NOAA-SWFSC
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121,7003(FAX)
eric.archer at noaa.gov


"Lighthouses are more helpful than churches."
    - Benjamin Franklin

"Cogita tute" - Think for yourself

Richard M. Heiberger

2006-May-03 19:02 UTC

head link

[R] Vector searching and counting speed optimization

sum() looks faster to me

tmp <- as.logical(rbinom(100,1,.5))
> system.time(for (i in 1:10000) length(tmp[tmp]))
[1] 0.09 0.00 0.09   NA   NA> system.time(for (i in 1:10000) sum(tmp))[1] 0.03 0.00 0.03   NA   NA

I am running on Windows XP
> Sys.getenv()[20]                            PROCESSOR_IDENTIFIER 
"x86 Family 15 Model 2 Stepping 4, GenuineIntel"
>

Maybe Matching Threads

Search for more apparently analagous threads

R help - May 2006 - Vector searching and counting speed optimization

[R] Vector searching and counting speed optimization

[R] Vector searching and counting speed optimization

Maybe Matching Threads