thr3ads.net - R help - [R] Fwd: generate ordered categorical variable in R [Sep 2015]

Bert Gunter

2015-Sep-16 20:40 UTC

[R] generate ordered categorical variable in R

Nope. Take it back. I stand uncorrected.
> system.time(z <-sample(1:10,1e6, rep=TRUE))
   user  system elapsed
  0.045   0.001   0.047
> system.time(z <-sample.int(10,1e6,rep=TRUE))
   user  system elapsed
  0.012   0.000   0.013


sample() has to do subscripting in the general case; sample.int doesn't.

But I would agree that the difference is likely almost always unnoticeable.
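To make the subscripting point concrete, here is a small sketch. The vector space, the seed and the sample size are arbitrary choices for illustration. With the same seed, sampling from a vector gives exactly the same result as drawing indices with sample.int() and then subscripting, which is what sample() does internally when x is not a single number (see the sample() source quoted further down).

```r
# Sketch: for a non-scalar x, sample(x, ...) amounts to
# x[sample.int(length(x), ...)], i.e. the extra subscripting step.
space <- c(10, 20, 30, 40)   # illustrative sample space

set.seed(42)
via_sample <- sample(space, 20, replace = TRUE)

set.seed(42)
idx <- sample.int(length(space), 20, replace = TRUE)
via_index <- space[idx]      # the subscripting sample() does for you

identical(via_sample, via_index)   # TRUE
```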


-- Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Wed, Sep 16, 2015 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Yes. Thanks Marc. I stand corrected.
>
> -- Bert
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>    -- Clifford Stoll
>
>
> On Wed, Sep 16, 2015 at 1:28 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>>
>>> On Sep 16, 2015, at 1:06 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>
>>> Yikes! The uniform distribution is a **continuous** distribution over
>>> an interval. You seem to want to sample over a discrete distribution.
>>> See ?sample for that, as in:
>>>
>>> sample(1:4,100,rep=TRUE)
>>>
>>> ## or for this special case and faster
>>>
>>> sample.int(4,size=100,rep=TRUE)
>>
>>
>> Bert,
>>
>> I am not sure that it is really faster, since internally, sample() calls sample.int():
>>
>>> sample
>> function (x, size, replace = FALSE, prob = NULL)
>> {
>>     if (length(x) == 1L && is.numeric(x) && x >= 1) {
>>         if (missing(size))
>>             size <- x
>>         sample.int(x, size, replace, prob)
>>     }
>>     else {
>>         if (missing(size))
>>             size <- length(x)
>>         x[sample.int(length(x), size, replace, prob)]
>>     }
>> }
>>
>>
>> set.seed(1)
>>
>>> system.time(x1 <- sample(1e10, 1e8, replace = TRUE))
>>    user  system elapsed
>>   2.755   0.170   2.925
>>
>>
>> set.seed(1)
>>> system.time(x2 <- sample.int(1e10, 1e8, replace = TRUE))
>>    user  system elapsed
>>   2.767   0.183   2.951
>>
>>
>>> all(x1 == x2)
>> [1] TRUE
>>
>>
>> Regards,
>>
>> Marc
>>
>>
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>   -- Clifford Stoll
>>>
>>>
>>> On Wed, Sep 16, 2015 at 10:11 AM, thanoon younis
>>> <thanoon.younis80 at gmail.com> wrote:
>>>> Dear R users
>>>>
>>>> I want to generate an ordered categorical variable as a 200x1 vector
>>>> with categories 1 to 4, and I tried this code:
>>>>
>>>> Q1=runif(200,1,4)
>>>>
>>>> The results are not just 1, 2, 3, 4 but values with decimals, like
>>>> 1.244, 2.342 and so on. My question is how I can generate a vector, and
>>>> also a matrix, of ordered categorical variables without decimals, just
>>>> 1, 2, 3, 4, 1, 2, 3, 4, ...
>>>>
>>>> Many thanks in advance
>>
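Putting Bert's suggestion together for the original question, a minimal sketch might look like the following. The names q1, q1_ord and Q, the seed, and the choice of 5 matrix columns are illustrative assumptions, not part of the thread. The integer codes come from sample.int(); factor(..., ordered = TRUE) records the ordering 1 < 2 < 3 < 4. A matrix cannot keep the factor class, so the matrix holds the integer codes.

```r
set.seed(1)

# 200 integer codes drawn uniformly from 1..4 (runif() would give decimals)
q1 <- sample.int(4, size = 200, replace = TRUE)

# The same codes as an ordered factor, so 1 < 2 < 3 < 4 is recorded
q1_ord <- factor(q1, levels = 1:4, ordered = TRUE)

# A 200 x 5 matrix of codes 1..4; the number of columns is arbitrary
Q <- matrix(sample.int(4, size = 200 * 5, replace = TRUE), nrow = 200)

table(q1_ord)
dim(Q)   # 200 5
```

A tempting alternative, round(runif(200, 1, 4)), does give whole numbers, but 1 and 4 then come up only about half as often as 2 and 3, because each end category gets an interval of width 0.5 instead of 1.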

Marc Schwartz

2015-Sep-16 21:07 UTC

[R] generate ordered categorical variable in R

> On Sep 16, 2015, at 3:40 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
> Nope. Take it back. I stand uncorrected.
> 
>> system.time(z <-sample(1:10,1e6, rep=TRUE))
>   user  system elapsed
>  0.045   0.001   0.047
> 
>> system.time(z <-sample.int(10,1e6,rep=TRUE))
>   user  system elapsed
>  0.012   0.000   0.013
> 
> 
> sample() has to do subscripting in the general case; sample.int doesn't.
> 
> But I would agree that the difference is likely almost always unnoticeable.

Well, in your defense, Bert, given the nuance of the example you provided, it
actually gets worse the larger the initial sample space is, if it is defined as
a vector rather than a scalar.

On my MacBook Pro, with 16 GB of RAM and a 2.5 GHz i7, running R version 3.2.2
(2015-08-14):
> system.time(x1 <- sample(1:1e10, 1e8, replace = TRUE))
Killed: 9

That ran for a couple of minutes and eventually crashed R.
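Marc does not say why R was killed; one plausible reading, worked through below as back-of-the-envelope arithmetic, is memory. In R 3.2.2, 1:1e10 has to exist as a full vector before sample() can subscript it, and because 1e10 exceeds the largest R integer it is stored as doubles at 8 bytes each. That is roughly 80 GB, against the 16 GB in the machine described above. R 3.5.0 and later can represent such sequences compactly, so newer versions may behave differently.

```r
# Rough memory estimate for materializing 1:1e10 as a plain double
# vector (assumption: no compact representation, as in R 3.2.2).
n <- 1e10

# TRUE: n exceeds .Machine$integer.max, so 1:n would be a double vector
needs_double <- n > .Machine$integer.max
needs_double

# 8 bytes per double, expressed in gigabytes of 1e9 bytes
size_gb <- n * 8 / 1e9
size_gb   # 80
```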

However, as below:
> system.time(x1 <- sample(1e10, 1e8, replace = TRUE))
   user  system elapsed
  2.943   0.238   3.191
> system.time(x1 <- sample.int(1e10, 1e8, replace = TRUE))
   user  system elapsed
  3.135   0.198   3.336


Here is another example that works, showing a larger time difference with the
sample space as a vector:
> system.time(x1 <- sample(1:1e9, 1e8, replace = TRUE))
   user  system elapsed
  7.069   1.317   8.399
> system.time(x1 <- sample(1e9, 1e8, replace = TRUE))
   user  system elapsed
  1.324   0.111   1.438
> system.time(x1 <- sample.int(1e9, 1e8, replace = TRUE))
   user  system elapsed
  1.328   0.116   1.450


If one is running Monte Carlo simulations, repeating the above a very large
number of times, it can become a meaningful difference.

Thus, there is an incentive for one to specify the sample space as a scalar and
perhaps consider the resultant vector, if needed, as indices (1:x) into the
actual sample space desired.
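A minimal sketch of that idea, under illustrative assumptions (the range 1001:2000, the labels and the seed are made up for the example): when the sample space is a regular run of integers, an offset added to sample.int() reproduces sample() on the vector exactly, with no subscripting; for a non-numeric space, the draws serve as indices and are mapped to labels only when needed.

```r
n <- 1e5

set.seed(7)
a <- sample(1001:2000, n, replace = TRUE)          # vector sample space

set.seed(7)
b <- 1000L + sample.int(1000, n, replace = TRUE)   # scalar space + offset

identical(a, b)   # TRUE

# Non-numeric space: keep integer indices, map to labels when needed
labels <- c("low", "medium", "high", "very high")  # hypothetical labels
idx <- sample.int(length(labels), 10, replace = TRUE)
labels[idx]
```

The integer offset 1000L keeps b an integer vector, matching the type of 1001:2000.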

Interesting...

Regards,

Marc


thanoon younis

2015-Sep-17 13:40 UTC

[R] Fwd: generate ordered categorical variable in R

Dear all users

I want to create a vector with one column, 200 rows and only NA values.
When I write X=numeric(NA) it is not correct. How can I do this, please?


Regards

Boris Steipe

2015-Sep-17 14:07 UTC

[R] Fwd: generate ordered categorical variable in R

x <- rep(NA, 200)

For all cases I can think of, that is enough. If you MUST have a matrix with one
column and two hundred rows, set:

dim(x) <- c(200,1)
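For reference, numeric() takes the length of the vector as its argument, not a fill value, which is why numeric(NA) fails (numeric(200) gives 200 zeros). The sketch below follows Boris's approach and adds a typed variant: a bare NA is a logical NA, while NA_real_ is a numeric (double) NA, which may be closer to what numeric() was meant to produce. The name y is illustrative.

```r
# Boris's approach: a logical NA vector, given dimensions 200 x 1
x <- rep(NA, 200)
dim(x) <- c(200, 1)

# Same shape, but with numeric (double) NAs
y <- rep(NA_real_, 200)
dim(y) <- c(200, 1)

is.matrix(x)   # TRUE
typeof(x)      # "logical"
typeof(y)      # "double"
```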


B.

Jeff Newmiller

2015-Sep-17 14:19 UTC

[R] Fwd: generate ordered categorical variable in R

Vectors have no columns or rows.

rep(  NA, 200 )

If you need a matrix, you have to turn it into one:

matrix( rep(  NA, 200 ), ncol=1 )
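As a small variation, matrix() recycles its data argument, so a single NA already fills all 200 cells and the rep() call is optional. The sketch below checks that against the form above and shows a numeric version using NA_real_; the names m and m_num are illustrative.

```r
# matrix() recycles a length-one data argument to fill every cell
m <- matrix(NA, nrow = 200, ncol = 1)
identical(m, matrix(rep(NA, 200), ncol = 1))   # TRUE

# Numeric (double) NAs instead of logical ones
m_num <- matrix(NA_real_, nrow = 200, ncol = 1)
dim(m_num)     # 200 1
typeof(m_num)  # "double"
```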
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
