Emmanuel Levy
2009-Aug-12 22:05 UTC
[R] Random sampling while keeping distribution of nearest neighbor distances constant.
Dear All,

I cannot find a solution to the following problem, although I imagine that it is a classic, hence my email.

I have a vector V of X values between 1 and N.

I would like to get random samples of X values, also between 1 and N, but the important point is:
* I would like to keep the same distribution of distances between the X values *

For example, let's say N=10 and I have V = c(3,4,5,6); then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6 or 4,5,6,7, etc., so that the distribution of distances (3 <-> 4, 3 <-> 5, 3 <-> 6, 4 <-> 5, 4 <-> 6, etc.) is kept constant.

I couldn't find a package that helps me with this, but it looks like it should be a classic problem, so there should be something!

Many thanks in advance for any help or hint you could provide,

All the best,

Emmanuel
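To make the requirement concrete, here is a small sketch for the N=10, V = c(3,4,5,6) example. It is an illustration only: it reads "keeping the distribution of distances" as "shifting V as a whole", which is an assumption, and the names pairwise, offsets and candidates are made up for this note.

```r
# Illustration only: enumerate the shifts of V that stay inside 1..N and
# confirm that each one has the same pairwise distances as V.
V <- c(3, 4, 5, 6)
N <- 10

# Sorted vector of all pairwise distances |x[i] - x[j]| for i < j
pairwise <- function(x) sort(as.vector(dist(x)))

# Offsets that keep every shifted value within 1..N
offsets <- (1 - min(V)):(N - max(V))

# One candidate per row: V shifted by each admissible offset
candidates <- t(sapply(offsets, function(k) V + k))

# TRUE for each candidate whose pairwise distances match those of V
same <- apply(candidates, 1,
              function(y) isTRUE(all.equal(pairwise(y), pairwise(V))))

print(candidates)
print(all(same))
```

Under this reading, the task reduces to picking one row of candidates at random.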
(Ted Harding)
2009-Aug-12 22:49 UTC
[R] Random sampling while keeping distribution of nearest ne
On 12-Aug-09 22:05:24, Emmanuel Levy wrote:
> Dear All,
> I cannot find a solution to the following problem although I imagine
> that it is a classic, hence my email.
>
> I have a vector V of X values comprised between 1 and N.
>
> I would like to get random samples of X values also comprised between
> 1 and N, but the important point is:
> * I would like to keep the same distribution of distances between the X
> values *
>
> For example let's say N=10 and I have V = c(3,4,5,6)
> then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6, or
> 4,5,6,7 etc..
> so that the distribution of distances (3 <-> 4, 3 <->5, 3 <-> 6, 4 <->
> 5, 4 <-> 6 etc ...) is kept constant.
>
> I couldn't find a package that help me with this, but it looks like it
> should be a classic problem so there should be something!
>
> Many thanks in advance for any help or hint you could provide,
> All the best,
> Emmanuel

If I've understood you right, you are basically putting a sequence with given spacings in a random position amongst the available positions. In your example, you would randomly choose between

  1,2,3,4 / 2,3,4,5 / 3,4,5,6 / 4,5,6,7 / 5,6,7,8 / 6,7,8,9 / 7,8,9,10

Hence a result Y could be:

  A <- min(V)
  L <- max(V) - A + 1
  M <- (0:(N-L))
  Y <- 1 + (V-A) + sample(M,1)

I think this does it!

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Aug-09 Time: 23:49:22
------------------------------ XFMail ------------------------------
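One way to check the four lines in the reply above is to repeat them many times and tabulate where the shifted sequence starts. The sketch below does that for the N=10 example; the wrapper name draw, the seed and the number of repetitions are arbitrary choices made for this note, not part of the thread.

```r
V <- c(3, 4, 5, 6)
N <- 10

# The four lines from the reply, wrapped in a function so they can be repeated
draw <- function() {
  A <- min(V)              # smallest value in V
  L <- max(V) - A + 1      # number of positions spanned by V
  M <- 0:(N - L)           # admissible shifts of the pattern
  1 + (V - A) + sample(M, 1)
}

set.seed(42)  # arbitrary seed, only so that the run is repeatable
starts <- replicate(7000, min(draw()))

# Each of the seven possible starting positions should appear about equally often
counts <- table(starts)
print(counts)
```

If the counts for the starting positions 1 to 7 are roughly equal, the shift is being drawn uniformly, which is what the reply describes.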
Emmanuel Levy
2009-Aug-12 22:53 UTC
[R] Random sampling while keeping distribution of nearest neighbor distances constant.
Dear All,

(My apologies if this gets posted twice; it seems the first message didn't get through.)

I cannot find a solution to the following problem, although I suppose this is a classic.

I have a vector V of X=length(V) values between 1 and N.

I would like to get random samples of X values, also between 1 and N, but the important point is:
* I would like to keep the same distribution of distances between the original X values *

For example, let's say N=10 and I have V = c(3,4,5,6); then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6 or 4,5,6,7, etc., so that the distribution of distances (3 <-> 4, 3 <-> 5, 3 <-> 6, 4 <-> 5, 4 <-> 6, etc.) is kept constant.

I couldn't find a package that helps me with this, but it looks like it should be a classic problem, so there should be something!

Many thanks in advance for any help or hint you could provide,

All the best,

Emmanuel
Nordlund, Dan (DSHS/RDA)
2009-Aug-12 23:00 UTC
[R] Random sampling while keeping distribution of nearest neighbor distances constant.
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Emmanuel Levy
> Sent: Wednesday, August 12, 2009 3:05 PM
> To: r-help at stat.math.ethz.ch
> Cc: dev djomson
> Subject: [R] Random sampling while keeping distribution of nearest neighbor
> distances constant.
>
> Dear All,
>
> I cannot find a solution to the following problem although I imagine
> that it is a classic, hence my email.
>
> I have a vector V of X values comprised between 1 and N.
>
> I would like to get random samples of X values also comprised between
> 1 and N, but the important point is:
> * I would like to keep the same distribution of distances between the X values *
>
> For example let's say N=10 and I have V = c(3,4,5,6)
> then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6, or 4,5,6,7 etc..
> so that the distribution of distances (3 <-> 4, 3 <->5, 3 <-> 6, 4 <->
> 5, 4 <-> 6 etc ...) is kept constant.
>
> I couldn't find a package that help me with this, but it looks like it
> should be a classic problem so there should be something!
>
> Many thanks in advance for any help or hint you could provide,
>
> All the best,
>
> Emmanuel
>

Emmanuel,

I don't know if this is a classic problem or not. But given your description, you could write your own function, something like this:

  sample.dist <- function(vec, Min=1, Max=10){
    diffs <- c(0,diff(vec))
    sum_d <- sum(diffs)
    sample(Min:(Max-sum_d),1)+cumsum(diffs)
  }

Here Min and Max are the minimum and maximum values that you are sampling from (Min=1 and Max=10 in your example), and vec is the vector whose distances you are sampling. This assumes that your vector is sorted smallest to largest, as in your example. The function could be changed to accommodate a vector that isn't sorted.

  > V <- sort(sample(1:100,4))
  > V
  #[1] 46 78 82 95
  > sample.dist(V, Min=1, Max=100)
  #[1] 36 68 72 85
  > sample.dist(V, Min=1, Max=100)
  #[1] 12 44 48 61
  >

This should get you started at least.
Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
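The reply above notes that the function could be changed to handle an unsorted vector. A minimal sketch of one such change follows. The name sample.dist2 and its internals are hypothetical, not from the thread. It also draws an index rather than a value, because sample(x, 1) with a single number x >= 1 samples from 1:x, which matters when only one starting position is possible and Min is greater than 1.

```r
# Hypothetical variant of sample.dist (illustrative, not the poster's code)
sample.dist2 <- function(vec, Min = 1, Max = 10) {
  offsets <- vec - min(vec)        # position of each value relative to the smallest
  span <- max(offsets)             # distance between smallest and largest value
  stopifnot(Max - Min >= span)     # the pattern must fit inside Min..Max
  starts <- Min:(Max - span)       # admissible positions for the smallest value
  # Draw an index into starts so that a single admissible start is returned as-is
  starts[sample.int(length(starts), 1)] + offsets
}

# Unsorted input: the order of the values is kept in the result
V <- c(82, 46, 95, 78)
Y <- sample.dist2(V, Min = 1, Max = 100)
print(Y)

# The successive differences of Y match those of V, and Y stays inside 1..100
stopifnot(all(diff(Y) == diff(V)), min(Y) >= 1, max(Y) <= 100)
```

Working from offsets relative to min(vec) instead of cumsum(diff(vec)) is what removes the sorting requirement; for sorted input the two give the same offsets.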