Dear Rhelpers,

Is there a faster way than below to set a vector based on values from another vector? I'd like to call a pre-existing function for this, but one which can also handle an arbitrarily large number of categories. Any ideas?

Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
C1=vector(length=length(Cat)) # New vector for numeric values

# Cycle through each column and set C1 to corresponding value of Cat.
for(i in 1:length(C1)){
  if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
}

C1
[1] -1 -1 -1  1  1  1 -1 -1  1
Cat
[1] "a" "a" "a" "b" "b" "b" "a" "a" "b"

Sincerely,
KeithC.
Psych Undergrad, CU Boulder (US)
RE McNair Scholar
Keith Alan Chamberlain <Keith.Chamberlain <at> Colorado.EDU> writes:

> Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
> C1=vector(length=length(Cat)) # New vector for numeric values
> for(i in 1:length(C1)){
>   if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> }
>
> C1
> [1] -1 -1 -1  1  1  1 -1 -1  1
> Cat
> [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"

ifelse(Cat == "a", -1, 1)
[1] -1 -1 -1  1  1  1 -1 -1  1

HTH
C1 <- rep(-1, length(Cat))
C1[Cat == "b"] <- 1

b

On Jul 4, 2007, at 9:44 AM, Keith Alan Chamberlain wrote:

> Dear Rhelpers,
>
> Is there a faster way than below to set a vector based on values from
> another vector? I'd like to call a pre-existing function for this,
> but one which can also handle an arbitrarily large number of categories.
> Any ideas?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
> C1=vector(length=length(Cat)) # New vector for numeric values
>
> # Cycle through each column and set C1 to corresponding value of Cat.
> for(i in 1:length(C1)){
>   if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> }
>
> C1
> [1] -1 -1 -1  1  1  1 -1 -1  1
> Cat
> [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"

how about:

Cat<-c('a','a','a','b','b','b','a','a','b')
c1<- -2*(Cat=="a")+1

-=-=-
... Time is an illusion, lunchtime doubly so. (Ford Prefect)
Cat <- c('a','a','a','b','b','b','a','a','b')
C1 <- ifelse(Cat == 'a', -1, 1)

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. ~M.J.Moroney

> -----Original message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On behalf of Keith Alan Chamberlain
> Sent: Wednesday 4 July 2007 15:45
> To: r-help at stat.math.ethz.ch
> Subject: [R] A More efficient method?
>
> Dear Rhelpers,
>
> Is there a faster way than below to set a vector based on
> values from another vector? I'd like to call a pre-existing
> function for this, but one which can also handle an
> arbitrarily large number of categories. Any ideas?
Here are two ways. The second way is more than 10x faster.

  > set.seed(1)
  > C <- sample(c("a", "b"), 100000, replace = TRUE)
  > system.time(s1 <- ifelse(C == "a", 1, -1))
     user  system elapsed
     0.37    0.01    0.38
  > system.time(s2 <- 2 * (C == "a") - 1)
     user  system elapsed
     0.02    0.00    0.02
  > identical(s1, s2)
  [1] TRUE

On 7/4/07, Keith Alan Chamberlain <Keith.Chamberlain at colorado.edu> wrote:
> Dear Rhelpers,
>
> Is there a faster way than below to set a vector based on values from
> another vector? I'd like to call a pre-existing function for this, but one
> which can also handle an arbitrarily large number of categories. Any ideas?
[Sorry, there were silly typos in the previous version. Corrected below]

On 04-Jul-07 13:44:44, Keith Alan Chamberlain wrote:
> Dear Rhelpers,
>
> Is there a faster way than below to set a vector based on values
> from another vector? I'd like to call a pre-existing function for
> this, but one which can also handle an arbitrarily large number
> of categories. Any ideas?
>
> Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
> C1=vector(length=length(Cat)) # New vector for numeric values
>
> # Cycle through each column and set C1 to corresponding value of Cat.
> for(i in 1:length(C1)){
>   if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> }
>
> C1
> [1] -1 -1 -1  1  1  1 -1 -1  1
> Cat
> [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"

  > Cat=c('a','a','a','b','b','b','a','a','b')
  > Cat=="b"
  [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
  > (Cat=="b") - 0.5
  [1] -0.5 -0.5 -0.5  0.5  0.5  0.5 -0.5 -0.5  0.5
  > 2*((Cat=="b") - 0.5)
  [1] -1 -1 -1  1  1  1 -1 -1  1

to give one example of a way to do it. But you don't say why you really want to do this. You may really want factors. And what do you want to see if there is "an arbitrarily large number of categories"?

For instance:

  > factor(Cat,labels=c(-1,1))
  [1] -1 -1 -1 1 1 1 -1 -1 1

but this is not a vector; it is a "factor" object. To get a vector of integer codes, convert the factor to integer:

  > as.integer(factor(Cat))
  [1] 1 1 1 2 2 2 1 1 2

where (unless you've specified otherwise in factor()) the values will correspond to the elements of Cat in "natural" order, in this case first "a" (-> 1), then "b" (-> 2). E.g.

  > Cat2<-c("a","a","c","b","a","b")
  > as.integer(factor(Cat2))
  [1] 1 1 3 2 1 2

so, with C2<-as.integer(factor(Cat2)), you get a vector of distinct integers (1,2,3) for the distinct levels ("a","b","c") of Cat2. If you want different integer values for these levels, you can write a function to change them.

Hoping this helps to break the ice!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <efh at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jul-07                                       Time: 16:44:20
------------------------------ XFMail ------------------------------
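Ted's closing suggestion (a function that changes the integer codes into other values) could be sketched roughly as follows. This is only an illustration: the helper name recode_levels and the values -1, 0, 1 are made up here, not taken from the thread, and it assumes one value is supplied per level, in sorted level order.

```r
# Hypothetical helper (not from the thread): map each level of a
# character vector to a caller-supplied numeric value.
recode_levels <- function(x, values) {
  f <- factor(x)                            # levels are sorted: "a", "b", "c"
  stopifnot(length(values) == nlevels(f))   # one value per level
  values[as.integer(f)]                     # integer codes index into `values`
}

Cat2 <- c("a", "a", "c", "b", "a", "b")
C2 <- recode_levels(Cat2, c(-1, 0, 1))      # assumed coding: a -> -1, b -> 0, c -> 1
C2
```

The work is done by indexing `values` with the factor's integer codes, so it stays vectorised however many levels there are.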
#Given
Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable

#and defining
coding<-array(c(-1,1), dimnames=list(unique(Cat) ))
#(i.e. an array of values corresponding to your character array levels, and with names set to those levels)

coding[Cat]
#does what you want.

>>> Keith Alan Chamberlain <Keith.Chamberlain at Colorado.EDU> 04/07/2007 14:44:44 >>>
Dear Rhelpers,

Is there a faster way than below to set a vector based on values from another vector? I'd like to call a pre-existing function for this, but one which can also handle an arbitrarily large number of categories. Any ideas?
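One caveat with the approach above, shown as a small sketch rather than a correction: unique(Cat) returns the categories in order of first appearance, so the pairing of values with names depends on the data. The vector below is hypothetical and deliberately starts with 'b'; writing the names out explicitly avoids the dependence.

```r
Cat <- c('b', 'a', 'a', 'b')        # hypothetical data that starts with 'b'

# Names taken from unique(Cat): here that is c('b', 'a'), so 'b' gets -1.
by_appearance <- array(c(-1, 1), dimnames = list(unique(Cat)))
r1 <- as.vector(by_appearance[Cat])

# Names written out explicitly: independent of the order of the data.
explicit <- c(a = -1, b = 1)
r2 <- as.vector(explicit[Cat])

r1   # 'b' mapped to -1
r2   # 'b' mapped to  1
```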
Gabor Grothendieck wrote:
>> set.seed(1)
>> C <- sample(c("a", "b"), 100000, replace = TRUE)
>> system.time(s1 <- ifelse(C == "a", 1, -1))
>    user  system elapsed
>    0.37    0.01    0.38
>> system.time(s2 <- 2 * (C == "a") - 1)
>    user  system elapsed
>    0.02    0.00    0.02

  > system.time(s1 <- ifelse(C == "a", 1, -1))
     user  system elapsed
     0.04    0.01    0.08
  > system.time(s2 <- 2 * (C == "a") - 1)
     user  system elapsed
        0       0       0

I am just wondering: how come the user and system times add up to 0.05 while elapsed states 0.08 on my system? (Vista+R2.5.1)

Stefan
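A likely explanation for the gap, offered as an aside since nobody in the thread answers it: "user" and "system" report CPU time charged to the R process, while "elapsed" is wall-clock time, which also includes any time the process spent waiting. The sketch below exaggerates the effect with Sys.sleep(), which waits without using the CPU.

```r
# Sys.sleep() waits without doing work, so nearly all of the time
# appears under "elapsed" and very little under "user" or "system".
tm <- system.time(Sys.sleep(0.5))
tm

cpu  <- tm[["user.self"]] + tm[["sys.self"]]   # CPU time used by this R process
wall <- tm[["elapsed"]]                        # wall-clock time
wall > cpu
```

On a busy machine the same thing happens on a smaller scale, and with timings this short the rounding to two decimals also matters.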
Dear Ted,

You are correct in that factors are probably what I had in mind since I would be using them as predictors in a regression. I didn't know the syntax to get R to do the arithmetic.

Many thanks to everyone who replied!

Sincerely,
KeithC.
Psych Undergrad, CU Boulder (US)
RE McNair Scholar
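For the regression use case, one possible route (a sketch, not something proposed in the thread) is to keep Cat as a factor and let R build the -1/1 column through sum-to-zero contrasts. Placing "b" first in the levels is an assumption made here so that the result matches the original coding (a -> -1, b -> 1).

```r
Cat <- c('a','a','a','b','b','b','a','a','b')

f <- factor(Cat, levels = c("b", "a"))   # "b" first, so it receives +1
contrasts(f) <- contr.sum(2)             # sum-to-zero coding: first level 1, last level -1

X <- model.matrix(~ f)                   # design matrix of the kind lm() builds
coded <- unname(X[, 2])                  # the contrast column, without row names
coded
```

A formula such as y ~ f in lm() would then use this coding directly, with no hand-made numeric vector needed.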
[Keith Alan Chamberlain]

>Is there a faster way than below to set a vector based on values
>from another vector? I'd like to call a pre-existing function for
>this, but one which can also handle an arbitrarily large number of
>categories. Any ideas?

>Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
>C1=vector(length=length(Cat)) # New vector for numeric values

># Cycle through each column and set C1 to corresponding value of Cat.
>for(i in 1:length(C1)){
>  if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
>}

>C1
>[1] -1 -1 -1  1  1  1 -1 -1  1
>Cat
>[1] "a" "a" "a" "b" "b" "b" "a" "a" "b"

For handling an arbitrarily large number of categories, one may go through a recoding vector, like this for the example above:

  > Cat <- c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b')
  > C1 <- c(a=-1, b=1)[Cat]
  > C1
   a  a  a  b  b  b  a  a  b
  -1 -1 -1  1  1  1 -1 -1  1

--
François Pinard   http://pinard.progiciels-bpi.ca
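The recoding-vector idea extends to more categories without change. The sketch below uses made-up data and codes (Cat3 and codes are hypothetical names) to show two details: unname() drops the names carried over from the lookup, and a category with no entry in the recoding vector comes back as NA rather than raising an error.

```r
Cat3  <- c("a", "c", "b", "a", "d")     # hypothetical data; "d" has no code
codes <- c(a = -1, b = 0, c = 1)        # hypothetical recoding vector

by_name  <- unname(codes[Cat3])                         # lookup by name
by_match <- unname(codes)[match(Cat3, names(codes))]    # same lookup via match()

by_name
identical(by_name, by_match)
anyNA(by_name)                          # TRUE flags categories that were not recoded
```

Checking anyNA() after the lookup is a cheap way to catch categories that were forgotten in the recoding vector.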