toby909 at gmail.com
2007-Jul-21 01:26 UTC
[R] avoiding time-consuming for loop renaming identifiers
Hi All,

I was wondering if I can avoid a time-consuming for loop on my 600000-observation dataset.

    school_id     y
            8  9.87
            8  8.89
            8  7.89
            8  8.88
           20  6.78
           20  9.99
           20  8.79
           31 10.1
           31 11

There are, say, 143 different schools in this 600000-observation dataset. I need to have sequential identifiers, 1, 2, 3, 4, 5, ..., 143.

I was using an awkward for loop that took 30 minutes to run:

    sid = 1
    dta$sid[1] = 1
    for (i in 2:nrow(dta)) {
      if (dta$school_id[i] != dta$school_id[i-1]) sid = sid+1
      dta$sid[i] = sid
    }

Any hints appreciated.

Thanks,
Toby
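For anyone reproducing this, here is a minimal sketch that builds the nine sample rows from the post and wraps the loop in a function. The names `dta_sample` and `loop_ids` are hypothetical. Like the original, the loop assumes rows of the same school are contiguous. It fills a preallocated plain vector rather than assigning into `dta$sid[i]` on every pass. A likely cause of the 30-minute runtime is that each such assignment goes through the data-frame replacement method, which can copy data.

```r
# Hypothetical sample: the nine rows shown in the post
dta_sample <- data.frame(
  school_id = c(8, 8, 8, 8, 20, 20, 20, 31, 31),
  y = c(9.87, 8.89, 7.89, 8.88, 6.78, 9.99, 8.79, 10.1, 11)
)

# Hypothetical helper: start a new id whenever school_id changes
# from one row to the next (assumes each school's rows are contiguous).
loop_ids <- function(school_id) {
  n <- length(school_id)
  sid <- integer(n)            # preallocate a plain integer vector
  if (n == 0) return(sid)
  sid[1] <- 1L
  for (i in seq_len(n)[-1]) {  # empty loop when n == 1
    # adding a logical: TRUE counts as 1, FALSE as 0
    sid[i] <- sid[i - 1] + (school_id[i] != school_id[i - 1])
  }
  sid
}

dta_sample$sid <- loop_ids(dta_sample$school_id)
print(dta_sample$sid)
# [1] 1 1 1 1 2 2 2 3 3
```

The loop is still interpreted R, so the vectorized replies below will be faster. This version mainly isolates the logic so the answers can be checked against it.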
Benilton Carvalho
2007-Jul-21 01:55 UTC
    as.integer(factor(dta[["school_id"]]))

b
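Here is a short check of the `factor()` approach on the sample values (the vectors below are hypothetical sample data). `factor()` sorts its levels, so the ids follow ascending `school_id`, not order of first appearance. On data already sorted by `school_id`, as in the post, the two coincide. If order of appearance matters on unsorted data, base R's `match(x, unique(x))` is one alternative, swapped in here and not part of the reply above.

```r
school_id <- c(8, 8, 8, 8, 20, 20, 20, 31, 31)
ids_factor <- as.integer(factor(school_id))
print(ids_factor)
# [1] 1 1 1 1 2 2 2 3 3

# Unsorted input: factor() numbers by sorted value,
# match(x, unique(x)) numbers by first appearance.
unsorted <- c(31, 31, 8, 8, 20)
as.integer(factor(unsorted))       # 3 3 1 1 2
match(unsorted, unique(unsorted))  # 1 1 2 2 3
```

Neither call needs the rows grouped together: a school that reappears later keeps its earlier id.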
François Pinard
2007-Jul-21 02:03 UTC
Hello, Toby. Maybe:

    dta$id <- cumsum(c(1, diff(dta$school_id) != 0))

-- 
François Pinard   http://pinard.progiciels-bpi.ca
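Here is how the `cumsum(diff())` idiom works, step by step (the vectors are hypothetical sample data). `diff()` is non-zero exactly where `school_id` changes between consecutive rows. `!= 0` turns that into `TRUE`/`FALSE`. The leading `1` gives the first row id 1, and `cumsum()` counts the changes seen so far. Like the original loop, it assumes each school's rows are contiguous, and it needs a numeric `school_id` for `diff()`.

```r
school_id <- c(8, 8, 8, 8, 20, 20, 20, 31, 31)

changed <- diff(school_id) != 0  # TRUE where the school changes
ids <- cumsum(c(1, changed))     # c(1, logical) is numeric, so ids are doubles
print(ids)
# [1] 1 1 1 1 2 2 2 3 3

# Non-contiguous groups: school 8 reappears and receives a new id
regrouped <- c(8, 8, 20, 8)
cumsum(c(1, diff(regrouped) != 0))  # 1 1 2 3
```

Wrap the result in `as.integer()` if integer ids are preferred. Because of the contiguity assumption, sorting by `school_id` first (or using the `factor()` reply) is safer when the row order is not guaranteed.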