thr3ads.net - R help - [R] Selecting random subset by ID [Sep 2018]


David Joubert

2018-Sep-07 17:40 UTC

[R] Selecting random subset by ID

Hello R users,

I am working with a large dataset, including roughly 50 000 sequential
observations (variable "count") for 8000 individuals (variable
"id"). The dataset is very unbalanced, meaning that some individuals
have few observations and others have many. Because I plan on running
Generalized Linear Models for panel data using pglm and the package has file
size restrictions, I want to create 4 randomly selected subsets of 2500
individuals from the main dataset. What functions and code would I use to do
this?

Thanks in advance,

David Joubert



	[[alternative HTML version deleted]]

Bert Gunter

2018-Sep-07 19:00 UTC


[R] Selecting random subset by ID

?sample

Should get you started

We expect you to first make an effort to learn about and write your
own code, rather than asking us to write it for you.
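One way to follow the `?sample` pointer is sketched below with a small hypothetical data frame `toy`, which stands in for the real panel. The key step is to sample at the level of the individual (the unique ids) rather than the row, and then keep every row belonging to a sampled id:

```r
# Hypothetical toy panel: 10 individuals with 1 to 5 rows each
set.seed(1)
toy <- data.frame(id = rep(1:10, times = sample(1:5, 10, replace = TRUE)))
toy$count <- rpois(nrow(toy), 3)

# Sample individuals, not rows, so each panel stays complete
chosen <- sample(unique(toy$id), 4)

# Keep every observation of each chosen individual
sub <- toy[toy$id %in% chosen, ]
```

Sampling rows directly (e.g. `toy[sample(nrow(toy), 10), ]`) would break up individuals' sequences, which is usually not what a panel model needs.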

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Jeff Newmiller

2018-Sep-07 20:06 UTC


[R] Selecting random subset by ID

IMO it is worth pointing out that you don't have to write the code that solves
your problem (else why have this list?), but this whole communication thing works
best when you write code that creates a mock data set illustrating what you are
starting from, along with some mock output.

The mock input can sometimes be the output of the dput function on a subset of
your data, but in your case would probably be something more like

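set.seed(42)
# One row per individual: id, an individual-level effect a1, and n,
# the number of observations for that individual
ids <- data.frame( id=1:8000, a1=rnorm(8000,0,1),
                   n=sample(2:15,8000,replace=TRUE))
# Repeat each individual's row n times (indexing by id works because id == row number)
dta <- ids[rep(ids$id,ids$n),]
# Observation-level noise and the response
dta$a0 <- rnorm(nrow(dta),1,2)
dta$value <- with( dta, a0 + a1 )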

The way I construct the data here may not match how your data is actually
structured, but clearing up exactly that kind of misunderstanding is why you
should learn to build such an example when you ask your question.
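The `dput` route for sharing real data can be sketched as follows. A miniature version of the mock data (five individuals, an assumption made only to keep the printout short) stands in for the real data set. `dput` prints a `structure(...)` call that readers can paste straight into R to recreate the same data frame:

```r
# Miniature stand-in for the mock data (5 individuals instead of 8000)
set.seed(42)
ids <- data.frame(id = 1:5, a1 = rnorm(5), n = sample(2:4, 5, replace = TRUE))
dta <- ids[rep(ids$id, ids$n), ]

# Take all rows of the first two individuals and print them as R code
small <- dta[dta$id %in% 1:2, c("id", "a1")]
dput(small)
```

Pasting the printed text back into R rebuilds `small`, up to the rounding `dput` applies to doubles by default.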

You may find that reading the above helps you answer your own question, or you
can confirm that this data set is close enough and show what code you tried
starting with this data.
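Starting from this mock data, the original request can be sketched as below. This is one possible approach, not code from the thread. Note that 4 x 2500 = 10000 exceeds the 8000 individuals, so four subsets of that size cannot all be disjoint. The sketch therefore draws each subset independently: no id repeats within a subset, but an id may appear in several subsets.

```r
# Mock data as above, reproduced so this snippet runs on its own
set.seed(42)
ids <- data.frame(id = 1:8000, a1 = rnorm(8000, 0, 1),
                  n = sample(2:15, 8000, replace = TRUE))
dta <- ids[rep(ids$id, ids$n), ]
dta$a0 <- rnorm(nrow(dta), 1, 2)
dta$value <- with(dta, a0 + a1)

# Four independent draws of 2500 individuals each (without replacement
# within a draw; the same id may be picked again in another draw)
all_ids <- unique(dta$id)
id_sets <- replicate(4, sample(all_ids, 2500), simplify = FALSE)

# Keep every observation of each sampled individual
subsets <- lapply(id_sets, function(s) dta[dta$id %in% s, ])
sapply(subsets, function(d) length(unique(d$id)))
# 2500 2500 2500 2500
```

If the subsets must not overlap, 8000 individuals allow four groups of 2000, e.g. `split(sample(all_ids), rep(1:4, length.out = length(all_ids)))`.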

Oh, and by the way, sending your emails to this list formatted as HTML is a
good way to corrupt your code examples, because this list only forwards the plain
text part of your email. Switch your email program to plain text and avoid further
miscommunication.

More on reproducible examples [1][2][3].

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)

-- 
Sent from my phone. Please excuse my brevity.
