Mansfield, Desmond
2013-Apr-06 23:07 UTC
[R] Multiple subsetting of a dataframe based on many conditions
Hello everybody,

I'm working with a data frame that has 18 columns. I would like to subset the data in one of these columns, "present", according to combinations of the values in five of the other columns, and then save each subset to a text file. The columns I would like to use to subset "present" are:

* answer (1:4) [answer takes the values 1 to 4]
* p.num (1:18)
* session (1:2)
* count (1:8)
* type (1:3)

So there are a total of 3456 possible subsetting combinations.

At present, I have been using the following, manually changing the values in each line and re-running the code:

    input.s2g <- subset(input, answer == 1)
    input.s2g <- subset(input.s2g, p.num == 1)
    input.s2g <- subset(input.s2g, session == "S2")
    input.s2g <- subset(input.s2g, count == 8)
    input.s2g <- subset(input.s2g, type == 1)

    write.table(input.s2g, file = "1_1_S2_8_1", sep = "\t", col.names = F, row.names = F)

But this takes me hours and is obviously prone to error. There must be an easier way?

Thanks for any help!
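As a quick check on the count quoted above, the full grid of combinations can be enumerated with expand.grid(). This is only a sketch: the name `grid` is hypothetical, and the ranges are the ones listed in the question.

```r
# Enumerate every combination of the five subsetting columns.
# Ranges are taken from the question; "grid" is an illustrative name.
grid <- expand.grid(answer  = 1:4,
                    p.num   = 1:18,
                    session = 1:2,
                    count   = 1:8,
                    type    = 1:3)

# One row per combination: 4 * 18 * 2 * 8 * 3
n_combos <- nrow(grid)
print(n_combos)
```

In practice, only the combinations that actually occur in the data need a file, which may be far fewer than the full grid.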
Adams, Jean
2013-Apr-08 14:45 UTC
[R] Multiple subsetting of a dataframe based on many conditions
# here's an example data frame
n <- 10
mydf <- data.frame(present=rnorm(n),
    answer=sample(1:4, n, replace=TRUE),
    p.num=sample(1:18, n, replace=TRUE),
    session=sample(1:2, n, replace=TRUE),
    count=sample(1:8, n, replace=TRUE),
    type=sample(1:3, n, replace=TRUE))

# define a new variable, combo5, that represents the combination
# of the five columns you specified
mydf$combo5 <- with(mydf, interaction(answer, p.num, session, count, type, drop=TRUE))

# split the data frame according to combo5
# this gives you a list of data frames
mydf.split <- split(mydf, mydf$combo5)

# use lapply() and write.table() to write each data frame in the list to a file
# (each file is named after its combo5 label)
lapply(mydf.split, function(x) write.table(x, file=as.character(x$combo5[1]),
    sep="\t", col.names=FALSE, row.names=FALSE))

Jean
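A possible variation on the approach above, offered as a sketch rather than as Jean's method: it writes only the "present" column and uses underscore-separated file names like the "1_1_S2_8_1" pattern in the question. It assumes the `sep` argument of interaction() for the label separator; the names `combo`, `pieces`, and `out_dir` are hypothetical, and files go to a temporary directory so that nothing is written to the working directory.

```r
# Sketch only: combo, pieces, and out_dir are illustrative names.
set.seed(1)
n <- 10
mydf <- data.frame(present = rnorm(n),
                   answer  = sample(1:4,  n, replace = TRUE),
                   p.num   = sample(1:18, n, replace = TRUE),
                   session = sample(1:2,  n, replace = TRUE),
                   count   = sample(1:8,  n, replace = TRUE),
                   type    = sample(1:3,  n, replace = TRUE))

# One label per row, joined with "_" instead of the default ".".
# drop = TRUE keeps only the combinations that occur in the data.
combo <- with(mydf, interaction(answer, p.num, session, count, type,
                                sep = "_", drop = TRUE))

# Split just the "present" column by that label.
pieces <- split(mydf$present, combo)

# Write each piece to its own file, named after its label.
out_dir <- tempdir()
for (nm in names(pieces)) {
  write.table(pieces[[nm]], file = file.path(out_dir, nm),
              sep = "\t", col.names = FALSE, row.names = FALSE)
}
```

Because the loop runs over names(pieces), every file name is derived from the data, so no values need to be edited by hand between runs.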