Dear r-help list members, I am quite new to R, and hope to seek advice from you about a problem I have been cracking my head over. Apologies if this seems like a simple problem. I have essentially two tables. The first (Table A) is a standard patient clinicopathological data table, where rows correspond to patient IDs and columns correspond to clinical features. Records in this table are stored as 1 or 0 (denoting presence). An example is provided below. The second (Table B) is a table that represents a 'key' to Table A. This Table B has a category field, as well as a feature field which links to the Table B. Unfortunately, this is a one-to-many relationship, and the numbers in the feature field represent the respective columns in Table A, delimited by semicolons. So in the example below, I need to collapse the data in Table B into a table with nrow equivalent to the number of categories and ncol number of patients. The collapsing of each categoriy, will be based on a Boolean OR, or the equivalent ANY in R (so long as 1 of the features is true, the resulting outcome will be true) data.table.a <- matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10))) data.table.b <- data.frame (ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8")) In the example tables above, we hope to collapse the features by category - so the final desired output will be a total of 10 patients as rows, and a total of 5 categories as columns. (after collapsing the features by a Boolean OR). (i.e. if any of the features in the category are present, it will be a TRUE). I apologize for the apparently awkward table, but this is what I had to start with. I tried expanding data.table.b$FEATURE using strsplit, which resulted in a list, and then I got stuck there for a long time. Thanks for any help. Greg [[alternative HTML version deleted]]
Hi, even after rereading, I have little of a clue what it is exactly that you are trying to do. It'd help if you provided a more concise, step-by-step description and/or the smallest unambiguous example of the two tables AND of what should come out at the end. Also, unless for relatively trivial problems, the list typically likes to see some own effort and where you are stuck, rather than to solve the whole problem. Best, Daniel -- View this message in context: http://r.789695.n4.nabble.com/Advice-needed-on-awkward-tables-tp2173289p2173341.html Sent from the R help mailing list archive at Nabble.com.
Greg Orm wrote:> Dear r-help list members, > > I am quite new to R, and hope to seek advice from you about a problem I have > been cracking my head over. Apologies if this seems like a simple problem. > >I would not call it exactly simple... but mostly because your data representation is so obscure.> I have essentially two tables. The first (Table A) is a standard patient > clinicopathological data table, where rows correspond to patient IDs and > columns correspond to clinical features. Records in this table are stored as > 1 or 0 (denoting presence). An example is provided below. > > The second (Table B) is a table that represents a 'key' to Table A. This > Table B has a category field, as well as a feature field which links to the > Table B. Unfortunately, this is a one-to-many relationship, and the numbers > in the feature field represent the respective columns in Table A, delimited > by semicolons. So in the example below, I need to collapse the data in Table > B into a table with nrow equivalent to the number of categories and ncol > number of patients. The collapsing of each categoriy, will be based on a > Boolean OR, or the equivalent ANY in R (so long as 1 of the features is > true, the resulting outcome will be true) > > data.table.a <- > matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10))) > data.table.b <- data.frame > (ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8")) > > In the example tables above, we hope to collapse the features by category - > so the final desired output will be a total of 10 patients as rows, and a > total of 5 categories as columns. (after collapsing the features by a > Boolean OR). (i.e. if any of the features in the category are present, it > will be a TRUE). > > I apologize for the apparently awkward table, but this is what I had to > start with. I tried expanding data.table.b$FEATURE using strsplit, which > resulted in a list, and then I got stuck there for a long time. > > Thanks for any help. > > Greg >Replacing your bizarre representation of features with something sane... library(reshape) data.table.b2 <- data.frame(ID=c(1,2,2,3,4,5,5,6,6,7),FEATURE=c(9,3,5,7,4,6,10,1,2,8)) data.table.a.melt <- melt( data.table.a ) names(data.table.a.melt) <- c("Patient", "FEATURE", "value" ) data.table.a.melt$FEATURE <- as.numeric(sub("([^0-9]*)(\\d+)$","\\2",data.table.a.melt$FEATURE,perl=TRUE)) data.table.merge1 <- merge( data.table.a.melt, data.table.b2 ) data.table.merge2 <- merge( data.table.merge1, data.table.b[c("ID","CATEGORY")] ) data.table.merge2$value <- 0!=data.table.merge2$value result <- cast(data.table.merge2, Patient ~ CATEGORY, any )
On 05/11/2010 02:05 PM, Greg Orm wrote:> Dear r-help list members, > > I am quite new to R, and hope to seek advice from you about a problem I have > been cracking my head over. Apologies if this seems like a simple problem. > > I have essentially two tables. The first (Table A) is a standard patient > clinicopathological data table, where rows correspond to patient IDs and > columns correspond to clinical features. Records in this table are stored as > 1 or 0 (denoting presence). An example is provided below. > > The second (Table B) is a table that represents a 'key' to Table A. This > Table B has a category field, as well as a feature field which links to the > Table B. Unfortunately, this is a one-to-many relationship, and the numbers > in the feature field represent the respective columns in Table A, delimited > by semicolons. So in the example below, I need to collapse the data in Table > B into a table with nrow equivalent to the number of categories and ncol > number of patients. The collapsing of each categoriy, will be based on a > Boolean OR, or the equivalent ANY in R (so long as 1 of the features is > true, the resulting outcome will be true) > > data.table.a<- > matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10))) > data.table.b<- data.frame > (ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8")) > > In the example tables above, we hope to collapse the features by category - > so the final desired output will be a total of 10 patients as rows, and a > total of 5 categories as columns. (after collapsing the features by a > Boolean OR). (i.e. if any of the features in the category are present, it > will be a TRUE). > > I apologize for the apparently awkward table, but this is what I had to > start with. I tried expanding data.table.b$FEATURE using strsplit, which > resulted in a list, and then I got stuck there for a long time.Hi Greg, Messy, but I think it works. feature2category<-function(dta,dtb) { categories<-unique(dtb$CATEGORY) category.table<-matrix(0,nrow=dim(dta)[1], ncol=length(categories)) colnames(category.table)<- paste("Category",1:length(categories)) for(patrow in 1:dim(dta)[1]) { for(catrow in 1:dim(dtb)[1]) { acols<-as.numeric(unlist(strsplit( as.character(dtb[catrow,"FEATURE"]),";"))) cat("patrow",patrow,"catrow",catrow,acols,"\n") category.table[patrow,dtb[catrow,"CATEGORY"]]<- as.logical(sum(dta[patrow,acols])) } } return(as.data.frame(category.table)) } Jim