Dear all,

I'm having trouble with my dataframe, and I hope someone can help me out...

I have data from 40 subjects, displayed in a dataframe. I have randomly assigned subjects to group 1 or 0 (mar.y==0 or mar.y==1, with probabilities used).

In the end, I want 34 cases assigned to group 0, with the rest of the subjects assigned to group 1. However, if there are more than 34 cases assigned to group 0 due to the randomness, I would like to keep 34 cases in group 0 (this is already written in my script below), but with the rest of the cases assigned to group 1. (Vice versa, if there are fewer than 34 cases assigned to group 0, I would like to sample cases from group 1 and put them in group 0, while retaining the rest of group 1 in my dataframe.)

I can't figure out how to keep 34 cases in group 0, WHILE assigning the rest of the cases a value 1 (mar.y==1)...

    if (length(which(df$mar.y==0))>34) {
      df <- df[sample(which(df$mar.y==0),34), ]
    } else {
      df <- df[c(which(df$mar.y==0),
                 sample(which(df$mar.y==1),34-length(which(df$mar.y==0)))), ]
    }

(I'm aware that using this script is not the most elegant way to solve the problem, but because this script is part of a larger design, I have to stick to this example.)

Hope someone has an answer.
Wishing you a very happy 2011,
Sarah.
On Fri, Dec 31, 2010 at 01:51:18AM -0800, Sarah wrote:
> [...]
> I can't figure out how to keep 34 cases in group 0, WHILE assigning the rest
> of the cases a value 1 (mar.y==1)...
>
> if (length(which(df$mar.y==0))>34) {
>   df <- df[sample(which(df$mar.y==0),34), ]
> } else {
>   df <- df[c(which(df$mar.y==0),
>     sample(which(df$mar.y==1),34-length(which(df$mar.y==0)))), ]
> }

I am not sure what the question is. According to my tests, this code works if you want to replace df with a data frame of exactly 34 cases.

The command sample(which(...)) is slightly dangerous: if which() produces only one index, say i, then sample(which()) samples from 1:i. However, with the parameters 34 and 40, your code applies sample() only to vectors of length at least 35 or at least 40 - 34 = 6, so the problem does not arise here.

If you want to keep all cases and only reassign the groups, you can either modify df$mar.y (and not the whole df) or introduce a new column of df with the index of the new group.

Petr Savicky.
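Petr's two remarks (modify df$mar.y rather than subsetting df, and be careful with sample(which(...))) could be combined roughly as follows. This is only a sketch under assumptions: the example data, the 0.15 probability, the target n0 and the helper pick() are all made up for illustration and are not part of Sarah's script.

```r
# Hypothetical example data: 40 subjects with a random 0/1 assignment.
set.seed(1)
df <- data.frame(id = 1:40, mar.y = rbinom(40, 1, 0.15))

# Hypothetical helper: sample positions within idx via sample.int(), so a
# length-one index vector i is never silently expanded to 1:i.
pick <- function(idx, n) idx[sample.int(length(idx), n)]

n0    <- 34                      # desired size of group 0
zeros <- which(df$mar.y == 0)
ones  <- which(df$mar.y == 1)

if (length(zeros) > n0) {
  # Too many cases in group 0: move the surplus to group 1.
  df$mar.y[pick(zeros, length(zeros) - n0)] <- 1
} else if (length(zeros) < n0) {
  # Too few cases in group 0: move the shortfall from group 1 to group 0.
  df$mar.y[pick(ones, n0 - length(zeros))] <- 0
}

table(df$mar.y)
```

All 40 rows stay in df; only the group labels change, and cases that already sat in the right group keep their original random assignment.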
On 12/31/2010 08:51 PM, Sarah wrote:
> [...]
> In the end, I want 34 cases assigned to group 0, with the rest of the
> subjects assigned to group 1.
> [...]

Hi Sarah,

Why not just use sample to select 34 of your cases from the 40?

    df$mar.y <- 1
    df$mar.y[sample(1:40, 34)] <- 0

Jim
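For what it is worth, Jim's suggestion can be written without the hard-coded 40. The sketch below assumes a made-up data frame and a variable n0 for the group-0 size. Note that this approach overwrites any earlier assignment in mar.y, so the probabilities used for the first random assignment no longer play a role.

```r
# Hypothetical example data; only the number of rows matters here.
set.seed(2)
df <- data.frame(id = 1:40)

n0 <- 34                                   # desired size of group 0

# Start with everyone in group 1, then draw n0 distinct rows for group 0.
df$mar.y <- 1
df$mar.y[sample.int(nrow(df), n0)] <- 0

table(df$mar.y)
```

An equivalent one-liner, under the same assumptions, would be df$mar.y <- sample(rep(c(0, 1), c(n0, nrow(df) - n0))), which shuffles a vector holding exactly n0 zeros.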