Dear Rhelpers,
Is there a faster way than below to set a vector based on values from
another vector? I'd like to call a pre-existing function for this, but one
which can also handle an arbitrarily large number of categories. Any ideas?
Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
C1=vector(length=length(Cat)) # New vector for numeric values
# Cycle through each column and set C1 to corresponding value of Cat.
for(i in 1:length(C1)){
  if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
}
C1
[1] -1 -1 -1  1  1  1 -1 -1  1
Cat
[1] "a" "a" "a" "b" "b" "b" "a" "a" "b"
Sincerely,
KeithC.
Psych Undergrad, CU Boulder (US)
RE McNair Scholar
Keith Alan Chamberlain <Keith.Chamberlain <at> Colorado.EDU> writes:
> for(i in 1:length(C1)){
>   if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> }

> ifelse(Cat == "a", -1, 1)
[1] -1 -1 -1  1  1  1 -1 -1  1

HTH
C1 <- rep(-1, length(Cat))
C1[Cat == "b"] <- 1

b

On Jul 4, 2007, at 9:44 AM, Keith Alan Chamberlain wrote:
> Is there a faster way than below to set a vector based on values from
> another vector? I'd like to call a pre-existing function for this,
> but one which can also handle an arbitrarily large number of categories.
> Any ideas?
> [...]
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
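The subset-assignment idea in the reply above extends to more than two categories by repeating the assignment once per category. A minimal sketch, where the third category 'c' and its value 0 are made up purely for illustration:

```r
# Hypothetical three-category input; 'c' and its code 0 are illustrative only.
Cat3 <- c('a', 'b', 'c', 'a', 'c')

# Start from NA so that any category left unassigned is easy to spot.
C3 <- rep(NA_real_, length(Cat3))
C3[Cat3 == 'a'] <- -1
C3[Cat3 == 'b'] <- 1
C3[Cat3 == 'c'] <- 0

C3
```

This stays vectorised, but needs one line per category, so it suits a handful of categories rather than an arbitrarily large number.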
> for(i in 1:length(C1)){
>   if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> }

how about:

Cat<-c('a','a','a','b','b','b','a','a','b')
c1<- -2*(Cat=="a")+1

-=-=-
... Time is an illusion, lunchtime doubly so. (Ford Prefect)
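The arithmetic suggestion above works because R coerces logical values to numbers in arithmetic (TRUE becomes 1, FALSE becomes 0). A small sketch that spells out the intermediate steps; the name is_a is introduced here only for illustration:

```r
Cat <- c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b')

is_a <- Cat == "a"   # logical vector, TRUE where the category is 'a'
as.numeric(is_a)     # the same vector as 1s and 0s

# For 'a':            -2 * 1 + 1 = -1
# For anything else:  -2 * 0 + 1 =  1
c1 <- -2 * is_a + 1
c1
```

Note that this trick only distinguishes 'a' from everything else, so it covers the two-category case only.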
Cat <- c('a','a','a','b','b','b','a','a','b')
C1 <- ifelse(Cat == 'a', -1, 1)
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
Do not put your faith in what statistics say until you have carefully
considered what they do not say. ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney
> -----Original message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On behalf of Keith Alan
> Chamberlain
> Sent: Wednesday 4 July 2007 15:45
> To: r-help at stat.math.ethz.ch
> Subject: [R] A More efficient method?
Here are two ways. The second way is more than 10x faster.

> set.seed(1)
> C <- sample(c("a", "b"), 100000, replace = TRUE)
> system.time(s1 <- ifelse(C == "a", 1, -1))
   user  system elapsed
   0.37    0.01    0.38
> system.time(s2 <- 2 * (C == "a") - 1)
   user  system elapsed
   0.02    0.00    0.02
> identical(s1, s2)
[1] TRUE

On 7/4/07, Keith Alan Chamberlain <Keith.Chamberlain at colorado.edu> wrote:
> Is there a faster way than below to set a vector based on values from
> another vector? I'd like to call a pre-existing function for this, but one
> which can also handle an arbitrarily large number of categories. Any ideas?
[Sorry, there were silly typos in the previous version. Corrected below]

On 04-Jul-07 13:44:44, Keith Alan Chamberlain wrote:
> Is there a faster way than below to set a vector based on values
> from another vector? I'd like to call a pre-existing function for
> this, but one which can also handle an arbitrarily large number
> of categories. Any ideas?
> [...]

> Cat=c('a','a','a','b','b','b','a','a','b')
> Cat=="b"
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> (Cat=="b") - 0.5
[1] -0.5 -0.5 -0.5  0.5  0.5  0.5 -0.5 -0.5  0.5
> 2*((Cat=="b") - 0.5)
[1] -1 -1 -1  1  1  1 -1 -1  1

to give one example of a way to do it. But you don't say why you
really want to do this. You may really want factors. And what do
you want to see if there is "an arbitrarily large number of categories"?

For instance:

> factor(Cat,labels=c(-1,1))
[1] -1 -1 -1 1 1 1 -1 -1 1

but this is not a vector, but a "factor" object. To get the vector,
you need to convert Cat to an integer:

> as.integer(factor(Cat))
[1] 1 1 1 2 2 2 1 1 2

where (unless you've specified otherwise in factor()) the values
will correspond to the elements of Cat in "natural" order, in this
case first "a" (-> 1), then "b" (-> 2). E.g.

> Cat2<-c("a","a","c","b","a","b")
> as.integer(factor(Cat2))
[1] 1 1 3 2 1 2

so, with C2<-as.integer(factor(Cat2)), you get a vector of distinct
integers (1,2,3) for the distinct levels ("a","b","c") of Cat2. If you
want different integer values for these levels, you can write a
function to change them.

Hoping this helps to break the ice!
Ted.
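Ted's closing remark about writing a function to change the integer codes could look something like the sketch below. The helper name recode_levels and the values -1, 0, 1 are assumptions made for this example, not anything given in the thread:

```r
Cat2 <- c("a", "a", "c", "b", "a", "b")

# Hypothetical helper: 'values' must hold one number per level,
# listed in the same order as levels(factor(x)).
recode_levels <- function(x, values) {
  f <- factor(x)
  stopifnot(length(values) == nlevels(f))
  # as.integer(f) gives the level index, which is used to pick from 'values'
  values[as.integer(f)]
}

C2 <- recode_levels(Cat2, c(-1, 0, 1))   # a -> -1, b -> 0, c -> 1
C2
```

Because the integer codes are used as positions into values, the same function handles any number of levels without further changes.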
--------------------------------------------------------------------
E-Mail: (Ted Harding) <efh at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jul-07                                       Time: 16:44:20
------------------------------ XFMail ------------------------------
#Given
Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
#and defining
coding<-array(c(-1,1), dimnames=list(unique(Cat) ))
#(ie an array of values corresponding to your character array levels,
# and with names set to those levels)
coding[Cat]
#does what you want.
Gabor Grothendieck wrote:
>> set.seed(1)
>> C <- sample(c("a", "b"), 100000, replace = TRUE)
>> system.time(s1 <- ifelse(C == "a", 1, -1))
>    user  system elapsed
>    0.37    0.01    0.38
>> system.time(s2 <- 2 * (C == "a") - 1)
>    user  system elapsed
>    0.02    0.00    0.02

> system.time(s1 <- ifelse(C == "a", 1, -1))
   user  system elapsed
   0.04    0.01    0.08
> system.time(s2 <- 2 * (C == "a") - 1)
   user  system elapsed
      0       0       0

I am just wondering: how come the user and system times add up to 0.05
while elapsed states 0.08 on my system? (Vista + R 2.5.1)

Stefan

-=-=-
... Time is an illusion, lunchtime doubly so. (Ford Prefect)
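On Stefan's question: user and system report CPU time charged to the R process, while elapsed is wall-clock time, so any time spent waiting (for the operating system, other processes, or a coarse timer) shows up only in elapsed. A rough sketch that makes the gap obvious by deliberately waiting; the 0.2 second pause is an arbitrary choice for the example:

```r
# Sys.sleep() waits without doing much computation, so the CPU times stay
# small while the wall-clock (elapsed) time covers the whole pause.
st <- system.time(Sys.sleep(0.2))

cpu  <- st[["user.self"]] + st[["sys.self"]]   # CPU time used by R itself
wall <- st[["elapsed"]]                        # wall-clock time

st
```

For very short timings such as the ones quoted, timer resolution alone can account for differences of a few hundredths of a second.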
Dear Ted,

You are correct in that factors are probably what I had in mind, since I
would be using them as predictors in a regression. I didn't know the
syntax to get R to do the arithmetic.

Many thanks to everyone who replied!

Sincerely,
KeithC.
Psych Undergrad, CU Boulder (US)
RE McNair Scholar
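Since the goal turned out to be a regression predictor, one possible route (not suggested in the thread itself, so treat it as an assumption) is to keep Cat as a factor and attach the -1/1 coding as a contrast, letting the modelling functions build the numeric column:

```r
Cat <- c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b')
f <- factor(Cat)

# One contrast column: level 'a' is coded -1 and level 'b' is coded 1.
contrasts(f) <- matrix(c(-1, 1), ncol = 1)

# model.matrix() shows the design matrix a regression on f would use:
# an intercept column plus the contrast-coded column.
mm <- model.matrix(~ f)
coded <- as.vector(mm[, 2])
coded
```

With this approach the coding travels with the factor, so lm() and similar functions pick it up without a separate numeric vector.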
[Keith Alan Chamberlain]

>Is there a faster way than below to set a vector based on values
>from another vector? I'd like to call a pre-existing function for
>this, but one which can also handle an arbitrarily large number of
>categories. Any ideas?

For handling an arbitrarily large number of categories, one may go
through a recoding vector, like this for the example above:

> Cat <- c('a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'b')
> C1 <- c(a=-1, b=1)[Cat]
> C1
 a  a  a  b  b  b  a  a  b
-1 -1 -1  1  1  1 -1 -1  1

--
François Pinard   http://pinard.progiciels-bpi.ca
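Two practical details of the recoding-vector approach above: the result carries the category names, and a category missing from the recoding vector comes back as NA. A small sketch, in which the stray category 'z' and the names codes and missing_cats are invented for the example:

```r
Cat <- c('a', 'b', 'z', 'a')    # 'z' is a hypothetical unmapped category
codes <- c(a = -1, b = 1)       # recoding vector: names are the categories

# Look up each category by name; unname() drops the names from the result.
C1 <- unname(codes[Cat])

# Categories with no entry in 'codes' show up as NA and can be listed.
missing_cats <- unique(Cat[is.na(C1)])

C1
missing_cats
```

Checking for NA after the lookup is a cheap way to catch categories that were forgotten when the recoding vector was written.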