thr3ads.net - R help - [R] Extracting random rows from a dataset [Jan 2009]


S.Putoto

2009-Jan-18 17:35 UTC

[R] Extracting random rows from a dataset

Hello dear R Users,

I am working on a dataset of 928 enterprises, for each of which 12
different characteristics are observed. I need to randomly sample, without
repetition, 70% of the enterprises to create a testing set, and let the
other 30% of the enterprises be a validating set (holdout validation, I
think that is). How do I do that? Of course, all the characteristics of
each row must remain together.
Also, I am not very familiar with base R (this is the first time I have
used it), so if you could also explain what each function and argument
means, it would be a great help for repeating the procedure later.

Thank you very much,

Sebastiano
-- 
View this message in context:
http://www.nabble.com/Extracting-random-rows-from-a-dataset-tp21530539p21530539.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2009-Jan-18 19:21 UTC


Here is one way to do it:
> x <- matrix(1:100,10)
> x
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1   11   21   31   41   51   61   71   81    91
 [2,]    2   12   22   32   42   52   62   72   82    92
 [3,]    3   13   23   33   43   53   63   73   83    93
 [4,]    4   14   24   34   44   54   64   74   84    94
 [5,]    5   15   25   35   45   55   65   75   85    95
 [6,]    6   16   26   36   46   56   66   76   86    96
 [7,]    7   17   27   37   47   57   67   77   87    97
 [8,]    8   18   28   38   48   58   68   78   88    98
 [9,]    9   19   29   39   49   59   69   79   89    99
[10,]   10   20   30   40   50   60   70   80   90   100
> select <- sample(nrow(x), nrow(x) * .7)
> x[select,]  # select
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3   13   23   33   43   53   63   73   83    93
[2,]    2   12   22   32   42   52   62   72   82    92
[3,]    5   15   25   35   45   55   65   75   85    95
[4,]    9   19   29   39   49   59   69   79   89    99
[5,]    7   17   27   37   47   57   67   77   87    97
[6,]   10   20   30   40   50   60   70   80   90   100
[7,]    8   18   28   38   48   58   68   78   88    98
> x[-select,]  # testing
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   11   21   31   41   51   61   71   81    91
[2,]    4   14   24   34   44   54   64   74   84    94
[3,]    6   16   26   36   46   56   66   76   86    96
>
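Since the original question asks what each function and argument does, here is a commented sketch of the same approach applied to a data frame shaped like the poster's (928 rows, 12 columns). The data frame `enterprises` is hypothetical and filled with simulated numbers, the seed value 123 is arbitrary, and the names `train` and `validation` are illustrative only; substitute your own data and names.

```r
# Hypothetical stand-in for the real data: 928 enterprises (rows),
# 12 observed characteristics (columns), filled with random numbers.
set.seed(123)  # fix the random number generator so the split can be repeated
enterprises <- as.data.frame(matrix(rnorm(928 * 12), nrow = 928, ncol = 12))

n <- nrow(enterprises)      # number of rows (enterprises)
n_train <- floor(0.7 * n)   # 70% of the rows, rounded down

# sample(n, size) draws 'size' distinct integers from 1..n;
# replace = FALSE (the default) means no row number is drawn twice.
train_idx <- sample(n, n_train, replace = FALSE)

# Row indexing: df[rows, ] keeps every column of the chosen rows,
# so each enterprise's characteristics stay together.
train <- enterprises[train_idx, ]        # the 70% sample
validation <- enterprises[-train_idx, ]  # negative indices drop those rows: the other 30%

c(train = nrow(train), validation = nrow(validation))
```

With 928 rows, floor(0.7 * 928) is 649, so the two parts have 649 and 279 rows. Using floor() makes the rounding explicit instead of passing a fractional size to sample(), and calling set.seed() first lets you reproduce exactly the same split later.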

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

David Winsemius

2009-Jan-18 19:37 UTC


> df <- read.table(textConnection(gsub("\\(|\\)", "", var)))  # 'var' is from a prior posting
> df
   V1 V2
1 p1 10
2 p1  3
3 p1  4
4 p2 20
5 p2 30
6 p2 40
7 p3  4
8 p3  1
9 p1  2

> ridxs <- sample(1:nrow(df), floor(0.7 * nrow(df)))  # the 70% sample row IDs

> df[ridxs,]
   V1 V2
5 p2 30
6 p2 40
2 p1  3
7 p3  4
4 p2 20
8 p3  1

> df[-ridxs,]
   V1 V2
1 p1 10
3 p1  4
9 p1  2
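Because `var` comes from an earlier posting that is not shown here, the transcript above cannot be rerun as-is. The following sketch rebuilds the same nine rows directly with data.frame() and repeats the split; the seed value is arbitrary, and the logical-vector variant at the end (with the illustrative names `in_sample`, `part70`, `part30`) is an alternative formulation added for comparison, not part of the original reply.

```r
# Rebuild the 9-row example directly instead of parsing 'var'.
df <- data.frame(
  V1 = c("p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3", "p1"),
  V2 = c(10, 3, 4, 20, 30, 40, 4, 1, 2)
)

set.seed(1)  # arbitrary seed, so the same rows are drawn on every run
ridxs <- sample(1:nrow(df), floor(0.7 * nrow(df)))  # 6 of the 9 row numbers

# Equivalent formulation with a logical vector: TRUE for sampled rows.
in_sample <- seq_len(nrow(df)) %in% ridxs
part70 <- df[in_sample, ]   # same rows as df[ridxs, ], but in original row order
part30 <- df[!in_sample, ]  # same rows as df[-ridxs, ]
```

The logical version keeps the rows in their original order, whereas df[ridxs, ] returns them in the random order in which they were drawn.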

The terms to pay particular attention to in the introductory material  
are row indexing, dataframe, and negative indexing of dataframes.
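Negative indexing has one edge case worth knowing, illustrated in this small sketch with a made-up data frame `toy`: if the index vector is empty, its negation is also empty, so the subset selects no rows at all rather than every row. A logical mask built with %in% does not have that problem.

```r
toy <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

idx <- c(2, 4)
toy[-idx, ]  # negative indexing: every row except 2 and 4

# Edge case: -integer(0) is still integer(0), so this selects NO rows.
empty_idx <- integer(0)
nrow(toy[-empty_idx, ])  # 0, not 5

# A logical mask keeps all rows when nothing is excluded.
keep <- !(seq_len(nrow(toy)) %in% empty_idx)
nrow(toy[keep, ])  # 5
```

For a 70/30 split of 928 rows the index vector is never empty, so df[-ridxs, ] is safe there; the mask form matters mainly when the number of excluded rows can be zero.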



On Jan 18, 2009, at 12:35 PM, S.Putoto wrote:
> Also, I am not very familiar with the R-Base language (it is the first
> time I use it) so if You could also explain to me what every function
> and argument means, it would be great help to then reiterate the
> procedure.
Really! Don't you think that is a bit much? There are many tutorials
available online. The terms to pay particular attention to in the
introductory material are indexing, dataframe, and negative indexing
of dataframes.

--
David Winsemius
