thr3ads.net - R help - [R] drawing samples based on a matching variable [Sep 2010]

L Brown

2010-Sep-28 21:46 UTC

[R] drawing samples based on a matching variable

Hi, everyone. I have what I hope will be a simple coding question. It seems
this is a common job, but so far I've had trouble finding the answer in
searches.

I have two matrices (x and y) with a different number of observations in
each. I need to draw a random sample without replacement of observations
from x, and then, using a matching variable, draw a sample of equal size
from y. It is the matching variable that is hanging me up.

For example:

> # Example matrices. Let's assume the seed is always 1 (let's also assume I
> # have assigned the column names A and B).
> set.seed(1)
> x<-cbind(1:10,sample(1:5,10,rep=T))
> x
      [A] [B]
 [1,]    1    2
 [2,]    2    2
 [3,]    3    3
 [4,]    4    5
 [5,]    5    2
 [6,]    6    5
 [7,]    7    5
 [8,]    8    4
 [9,]    9    4
[10,]   10    1
> y<-cbind(1:14,sample(1:5,14,rep=T))
> y
      [A] [B]
 [1,]    1    2
 [2,]    2    2
 [3,]    3    3
 [4,]    4    5
 [5,]    5    2
 [6,]    6    5
 [7,]    7    5
 [8,]    8    4
 [9,]    9    4
[10,]   10    1
[11,]   11    2
[12,]   12    1
[13,]   13    4
[14,]   14    2
> # Draw a random sample of n=4 without replacement from matrix x.
> x.samp<-x[sample(10,4,replace=F),]
> x.samp
     [A] [B]
[1,]    3    3
[2,]    4    5
[3,]    5    2
[4,]    7    5

Next, I would need to draw four observations from matrix y (without
replacement) so that the distribution of column B in that sample is
identical to the distribution of column B in x.samp.
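To make "identical distribution" concrete: what has to match is the number of rows with each B value. The sketch below retypes the printed matrices by hand (an assumption, since R 3.6.0 and later changed the default algorithm behind sample(), so set.seed(1) may not reproduce these exact draws), then tabulates the target counts and checks whether y has enough rows of each value.

```r
# Example data retyped from the printed matrices (columns A and B)
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- x[c(3, 4, 5, 7), ]   # the four rows drawn above

# Target: how many rows of each B value the sample from y must contain
target <- table(x.samp[, "B"])
print(target)

# Rows of y available for each of those B values
available <- table(factor(y[, "B"], levels = names(target)))
print(available)

# A matching sample without replacement exists only if no count falls short
feasible <- all(available >= target)
print(feasible)
```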

I'd appreciate any help, and sorry to post such a basic question!

LB

Huso, Manuela

2010-Sep-28 23:03 UTC

[R] fixing the dispersion parameter in glm

I would like to fit a glm with Poisson distribution and log link with a known
dispersion parameter.  I do not want to estimate the dispersion parameter.  I
know what it is, so I simply want to fix it at a constant for this and other
models to follow.  My simple, no covariate model is:

Tall.glm<-glm(Seedling~1, 
	family=poisson, 
	offset=log(area),   # pass by name; unnamed, it would be matched to 'weights'
	data=tallPSME.df)

I want to fix the dispersion parameter at 2.5.  How can I do this, please?

Thanks in advance,

Manuela
Manuela Huso
Consulting Statistician
201H Richardson Hall
Department of Forest Ecosystems and Society
Oregon State University
Corvallis, OR   97331
ph: 541-737-6232
fx: 541-737-1393
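A hedged sketch of one way to use a known dispersion: for family = poisson, glm() does not estimate a dispersion (it is taken as 1), and the coefficient estimates do not depend on it; only the standard errors and tests do. summary.glm() accepts a `dispersion` argument that replaces the default, and anova.glm() takes the same argument when comparing nested models. The data frame below is a simulated, hypothetical stand-in for tallPSME.df; only the column names Seedling and area come from the post.

```r
# Hypothetical stand-in for tallPSME.df: simulated seedling counts on plots
# of varying area (only the column names come from the post above)
set.seed(42)
tallPSME.df <- data.frame(area = runif(50, min = 1, max = 10))
tallPSME.df$Seedling <- rpois(50, lambda = 0.8 * tallPSME.df$area)

# Intercept-only Poisson model with log(area) supplied by name as the offset
Tall.glm <- glm(Seedling ~ 1, family = poisson,
                offset = log(area), data = tallPSME.df)

# Use a known dispersion of 2.5 instead of the default of 1 for inference
fixed   <- summary(Tall.glm, dispersion = 2.5)
default <- summary(Tall.glm)
print(fixed$coefficients)

# Standard errors are inflated by sqrt(2.5) relative to the default
se.ratio <- fixed$coefficients[, "Std. Error"] /
  default$coefficients[, "Std. Error"]
print(se.ratio)
```

The point estimate in fixed$coefficients is the same as in default$coefficients; the fixed dispersion only rescales the covariance matrix.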

Remko Duursma

2010-Sep-28 23:47 UTC

[R] fixing the dispersion parameter in glm

How about:

y[y[,2] %in% x.samp[,2],]

This gives you the subset of y whose values in the second column also
occur in your sample from x.

You can then sample from this matrix if you need to...
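A sketch of the two steps described above, using the example values retyped from the printed matrices (column names A and B as in the question). Note that %in% restricts which B values can appear but not how often each appears, so the B counts in y.samp are not guaranteed to match those in x.samp.

```r
# Example data retyped from the matrices printed in the question (columns A, B)
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- x[c(3, 4, 5, 7), ]   # the four rows drawn in the question

# Step 1: keep only the rows of y whose B value occurs somewhere in x.samp
y.pool <- y[y[, "B"] %in% x.samp[, "B"], ]

# Step 2: draw nrow(x.samp) rows from that pool without replacement
set.seed(1)
y.samp <- y.pool[sample.int(nrow(y.pool), nrow(x.samp)), ]

# Compare the B distributions; they can differ because step 1 ignores counts
print(table(x.samp[, "B"]))
print(table(y.samp[, "B"]))
```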


greetings,
Remko

Michael Bedward

2010-Sep-29 01:40 UTC

[R] drawing samples based on a matching variable

Hello LB,

It's one of those problems that's basic but tricky :)  I don't have an
elegant one-liner for it, but here's a function that would do it...

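function(xs, y) {
  # Sample matrix y such that col 2 of the sample matches
  # col 2 of matrix xs (rows of y are drawn without replacement)

  used <- logical(nrow(y))    # rows of y already drawn
  yi <- integer(nrow(xs))     # row indices of the sample from y

  k <- 1
  for (xsval in xs[,2]) {
    i <- which( !used & y[,2] == xsval )
    if (length(i) >= 1) {
      # index via sample.int(): sample(i, 1) with a single i draws from 1:i
      yi[k] <- i[ sample.int(length(i), 1) ]
      used[ yi[k] ] <- TRUE
      k <- k + 1
    } else {
      stop("bummer: not possible to get a matching sample")
    }
  }

  y[yi, ]
}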

Note, I've assumed here that in your real data the first col won't
always contain the row index as it does in your example.

Michael

Charles C. Berry

2010-Sep-29 17:54 UTC

[R] drawing samples based on a matching variable

On Tue, 28 Sep 2010, L Brown wrote:
> [...]
Looks like set.seed(1) was invoked here, too.
>> y<-cbind(1:14,sample(1:5,14,rep=T))
> [...]
> Next, I would need to draw four observations from matrix y (without
> replacement) so that the distribution of y$B is identical to x.samp$B.
>
> I'd appreciate any help, and sorry to post such a basic question!

Break it down like this:
  x.levels <- sort( unique(x[,2]) )
  x.samp.tab <- table( factor( x.samp[,2], x.levels ) )
  y.rows <- split( 1:nrow(y), factor( y[,2], x.levels ) )
  # index via sample.int(): sample(v, k) with a single number v draws from 1:v
  unlist( mapply( function(rows, size) rows[ sample.int( length(rows), size ) ],
                  y.rows, x.samp.tab, SIMPLIFY = FALSE ), use.names = FALSE )

The sampling step will fail for any level i where

 	length( y.rows[[i]] ) < x.samp.tab[ i ]

You can trace which element that was using 'traceback()', or write a
wrapper for sample() that checks that condition.
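A minimal sketch of such a wrapper, called safe.sample here (a hypothetical name). It stops with a message naming the offending B value, and it indexes with sample.int() because sample(v, k) treats a single positive number v as 1:v (see ?sample). In the example, B = 3 occurs only in row 3 of y, so a plain sample(3, 1) could return row 1 or 2. The data are retyped from the printed matrices.

```r
# Example data retyped from the printed matrices (columns A and B)
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- x[c(3, 4, 5, 7), ]

# Hypothetical wrapper: draw 'size' of 'rows' without replacement, or stop
# with a message naming the level that has too few rows in y
safe.sample <- function(rows, size, level) {
  if (length(rows) < size)
    stop(sprintf("level %s: need %d row(s) of y but only %d available",
                 level, size, length(rows)))
  rows[sample.int(length(rows), size)]   # safe when length(rows) == 1
}

x.levels   <- sort(unique(x[, "B"]))
x.samp.tab <- table(factor(x.samp[, "B"], x.levels))
y.rows     <- split(seq_len(nrow(y)), factor(y[, "B"], x.levels))

set.seed(1)
idx <- unlist(mapply(safe.sample, y.rows, x.samp.tab, names(y.rows),
                     SIMPLIFY = FALSE), use.names = FALSE)
y.samp <- y[idx, ]
print(y.samp)
```

Passing names(y.rows) as a third argument is just a convenient way to get the level into the error message.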

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
