Hi Ravi,
My hunch would be "no", because shipping large datasets inside packages
seems awfully inefficient: packages are mirrored all over the world, and it
seems rather silly to be mirroring and updating large datasets everywhere.
The good news is that if you just want a 10,000 x 100,000 matrix of
0/1s, it is trivial to generate:
X <- matrix(sample(0L:1L, 10^9, TRUE), nrow = 10^4)  # 10^4 x 10^5 integer matrix of random 0/1s
Since R integers take 4 bytes each, this will be around 4 GB in memory.
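If you want to sanity-check the memory footprint before building the full
object, one option (just a sketch, using a 1,000 x 1,000 slice and scaling
up) is:
x_small <- matrix(sample(0L:1L, 10^6, TRUE), nrow = 10^3)  # 10^6 cells instead of 10^9
print(object.size(x_small), units = "MB")  # ~3.8 MB here, so roughly 3.8 GB at the full size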
If instead you want arbitrary continuous values to cut into 0/1 later
(doubles take 8 bytes each, so this version is roughly 8 GB):
X <- matrix(rnorm(10^9), nrow = 10^4)  # 10^4 x 10^5 matrix of standard normals
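Either way, cutting to 0/1 afterwards is just a comparison; for example,
with an arbitrary threshold of 0 (pick whatever cutoff you like):
X01 <- (X > 0) + 0L  # logical matrix coerced back to an integer 0/1 matrix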
Cheers,
Josh
On Sun, Jun 24, 2012 at 7:08 AM, vioravis <vioravis at gmail.com> wrote:
> I am looking for some large datasets (10,000 rows & 100,000 columns or
> vice versa) to create some test sets. I am not concerned about the
> individual elements since I will be converting them to binary (0/1) by
> using arbitrary thresholds.
>
> Does any R package provide such big datasets?
>
> Also, what is the biggest text document collection available in R? tm
> package seems to provide only 20 records from the Reuters dataset. Is there
> any package that has 10,000+ documents??
>
> Would appreciate any help on these.
>
> Thank you.
>
> Ravi
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/