Hi, everyone.

I have what I hope will be a simple coding question. It seems this is a common job, but so far I've had trouble finding the answer in searches.

I have two matrices (x and y) with a different number of observations in each. I need to draw a random sample without replacement of observations from x, and then, using a matching variable, draw a sample of equal size from y. It is the matching variable that is hanging me up.

For example--

> # example matrices. let's assume seed always equals 1. (let's also assume I
> # have assigned variable names A and B to my columns..)
> set.seed(1)
> x<-cbind(1:10,sample(1:5,10,rep=T))
> x
      [A] [B]
 [1,]   1   2
 [2,]   2   2
 [3,]   3   3
 [4,]   4   5
 [5,]   5   2
 [6,]   6   5
 [7,]   7   5
 [8,]   8   4
 [9,]   9   4
[10,]  10   1

> y<-cbind(1:14,sample(1:5,14,rep=T))
> y
      [A] [B]
 [1,]   1   2
 [2,]   2   2
 [3,]   3   3
 [4,]   4   5
 [5,]   5   2
 [6,]   6   5
 [7,]   7   5
 [8,]   8   4
 [9,]   9   4
[10,]  10   1
[11,]  11   2
[12,]  12   1
[13,]  13   4
[14,]  14   2

> #draw random sample of n=4 without replacement from matrix x.
> x.samp<-x[sample(10,4,replace=F),]
> x.samp
     [A] [B]
[1,]   3   3
[2,]   4   5
[3,]   5   2
[4,]   7   5

Next, I would need to draw four observations from matrix y (without replacement) so that the distribution of column B in the y sample is identical to that of column B in x.samp.

I'd appreciate any help, and sorry to post such a basic question!

LB
I would like to fit a glm with Poisson distribution and log link with a known dispersion parameter. I do not want to estimate the dispersion parameter. I know what it is, so I simply want to fix it at a constant for this and other models to follow.

My simple, no covariate model is:

Tall.glm<-glm(Seedling~1, family=poisson, offset(log(area)), data=tallPSME.df)

I want to fix the dispersion parameter at 2.5. How can I do this, please?

Thanks in advance,
Manuela

>::<>::<>::<>::<>::<>::<>::<>::<>::<
Manuela Huso
Consulting Statistician
201H Richardson Hall
Department of Forest Ecosystems and Society
Oregon State University
Corvallis, OR 97331
ph: 541-737-6232
fx: 541-737-1393

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of L Brown
Sent: Tuesday, September 28, 2010 2:47 PM
To: r-help at r-project.org
Subject: [R] drawing samples based on a matching variable

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
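No reply to the dispersion question appears in this thread, so the following is only a hedged sketch of one option that may fit. summary() for glm objects accepts a dispersion argument, which rescales the standard errors to a fixed value instead of using the family default; the point estimates do not depend on it. The data frame sim.df and all its values are invented stand-ins for tallPSME.df. The offset is passed by name here, on the assumption that an unnamed third argument to glm() would otherwise be matched positionally to weights.

```r
# Simulated stand-in data; the real tallPSME.df is not available here
set.seed(42)
sim.df <- data.frame(area = runif(30, 1, 5))
sim.df$Seedling <- rpois(30, lambda = 2 * sim.df$area)

# Intercept-only Poisson model with a log(area) offset, passed by name
fit <- glm(Seedling ~ 1, family = poisson, offset = log(area), data = sim.df)

# Default summary (dispersion taken to be 1 for the poisson family)
s1 <- summary(fit)

# Summary with the dispersion fixed at 2.5: same estimate, wider standard error
s25 <- summary(fit, dispersion = 2.5)
s25$coefficients
```

anova() and predict() for glm objects take the same dispersion argument, so the fixed value can be carried through to later comparisons and interval calculations.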
How about:

y[y[,2] %in% x.samp[,2],]

This gives you the subset of y where values in the second column are restricted to those in your sample from x. You can then sample from this matrix, if you need to...

greetings,
Remko
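A minimal sketch of this suggestion, using the values from the posted example. Note that the filter only restricts which B values can appear: a simple random draw from the filtered rows is not guaranteed to reproduce the counts in x.samp (one 3, two 5s, one 2), so a per-value draw is still needed if the distributions must match exactly. The names y.sub and y.draw are made up for this illustration.

```r
# Values copied from the posted example (column 2 is the matching variable B)
y <- cbind(1:14, c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- cbind(c(3, 4, 5, 7), c(3, 5, 2, 5))

# Remko's filter: keep rows of y whose B value occurs in the x sample
y.sub <- y[y[, 2] %in% x.samp[, 2], , drop = FALSE]

# A plain random draw of rows from the filtered matrix
y.draw <- y.sub[sample.int(nrow(y.sub), nrow(x.samp)), , drop = FALSE]

# Compare the B counts of the two samples: these need not agree
table(x.samp[, 2])
table(y.draw[, 2])
```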
Hello LB,

It's one of those problems that's basic but tricky :) I don't have an elegant one-liner for it but here's a function that would do it...

function(xs, y) {
  # sample matrix y such that col 2 of the sample matches
  # col 2 of matrix xs
  used <- logical(nrow(y))
  yi <- integer(nrow(xs))
  k <- 1
  for (xsval in xs[,2]) {
    i <- which( !used & y[,2] == xsval )
    if (length(i) >= 1) {
      # index into i so that a single candidate row is returned as is
      # (sample(i, 1) would draw from 1:i when length(i) == 1)
      yi[k] <- i[sample.int(length(i), 1)]
      used[ yi[k] ] <- TRUE
      k <- k + 1
    } else {
      stop("bummer: not possible to get a matching sample")
    }
  }
  y[yi, ]
}

Note, I've assumed here that in your real data the first col won't always contain the row index as it does in your example.

Michael

On 29 September 2010 07:46, L Brown <missmissliss at gmail.com> wrote:
> [...]
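One way to try the function above is to assign it to a name and call it on the posted data. The name match.sample is hypothetical, and the matrices are typed in from the example rather than regenerated from the seed. This copy indexes into the candidate vector instead of calling sample(i, 1), because sample() treats a single number n as 1:n, which matters when only one row of y carries the required value (as with B = 3 here).

```r
# Values copied from the posted example (column 2 is the matching variable B)
y <- cbind(1:14, c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- cbind(c(3, 4, 5, 7), c(3, 5, 2, 5))

# Hypothetical name for the function posted above
match.sample <- function(xs, y) {
  used <- logical(nrow(y))      # rows of y already drawn
  yi <- integer(nrow(xs))       # row indices of the matched sample
  k <- 1
  for (xsval in xs[, 2]) {
    i <- which(!used & y[, 2] == xsval)   # unused rows with the same B value
    if (length(i) < 1)
      stop("bummer: not possible to get a matching sample")
    yi[k] <- i[sample.int(length(i), 1)]  # pick one candidate row
    used[yi[k]] <- TRUE
    k <- k + 1
  }
  y[yi, , drop = FALSE]
}

y.samp <- match.sample(x.samp, y)
y.samp
```

Because rows are drawn in the order of xs, column 2 of the result lines up element by element with column 2 of x.samp.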
On Tue, 28 Sep 2010, L Brown wrote:

> [...]
>> set.seed(1)
>> x<-cbind(1:10,sample(1:5,10,rep=T))
> [...]

Looks like set.seed(1) was invoked here, too.

>> y<-cbind(1:14,sample(1:5,14,rep=T))
> [...]
> Next, I would need to draw four observations from matrix y (without
> replacement) so that the distribution of y$B is identical to x.samp$B.
>
> I'd appreciate any help, and sorry to post such a basic question!

Break it down like this:

  x.levels <- sort( unique(x[,2]) )
  x.samp.tab <- table( factor( x.samp[,2], x.levels ) )
  y.rows <- split(1:nrow(y),factor( y[,2], x.levels ) )
  # index into the rows rather than passing them straight to sample(),
  # which draws from 1:n when handed a single row number n
  unlist( mapply( function(rows, n) rows[sample.int(length(rows), n)],
                  y.rows, x.samp.tab ),use.names=FALSE )

The draw will fail for any level i where

  length( y.rows[[i]] ) < x.samp.tab[ i ]

You can trace which element that was using 'traceback()', or write a wrapper for the sampling step that checks that condition.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
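The wrapper mentioned above could look something like the sketch below. The name checked.sample and its error message are hypothetical, and the matrices are typed in from the posted example. The wrapper checks that enough rows of y exist for each level before drawing, and indexes into the row vector so that a level with a single row is handled correctly.

```r
# Values copied from the posted example (column 2 is the matching variable B)
x <- cbind(1:10, c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(1:14, c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- cbind(c(3, 4, 5, 7), c(3, 5, 2, 5))

# Hypothetical wrapper: draw n of the given row numbers, with a clear
# error when fewer than n rows are available for this level
checked.sample <- function(rows, n, level) {
  if (length(rows) < n)
    stop("level ", level, ": need ", n, " row(s) but y has only ", length(rows))
  rows[sample.int(length(rows), n)]
}

x.levels <- sort(unique(x[, 2]))
x.samp.tab <- table(factor(x.samp[, 2], x.levels))           # counts wanted per level
y.rows <- split(seq_len(nrow(y)), factor(y[, 2], x.levels))  # rows of y per level

yi <- unlist(mapply(checked.sample, y.rows, x.samp.tab, names(y.rows),
                    SIMPLIFY = FALSE), use.names = FALSE)
y.samp <- y[yi, , drop = FALSE]
y.samp
```

Unlike a row-by-row loop, this returns the sampled rows grouped by level, so compare sorted values (or tables) of column 2 rather than element-by-element order.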