thr3ads.net - R help - [R] Advice needed on awkward tables [May 2010]

If this information is useful, please help other people find it:
Share via:

Greg Orm

2010-May-11 04:05 UTC

[R] Advice needed on awkward tables

Dear r-help list members,

I am quite new to R, and hope to seek advice from you about a problem I have
been cracking my head over. Apologies if this seems like a simple problem.

I have essentially two tables. The first (Table A) is a standard patient
clinicopathological data table, where rows correspond to patient IDs and
columns correspond to clinical features. Records in this table are stored as
1 or 0 (denoting presence). An example is provided below.

The second (Table B) is a table that represents a 'key' to Table A. This
Table B has a category field, as well as a feature field which links to the
Table B. Unfortunately, this is a one-to-many relationship, and the numbers
in the feature field represent the respective columns in Table A, delimited
by semicolons. So in the example below, I need to collapse the data in Table
B into a table with nrow equivalent to the number of categories and ncol number
of patients. The collapsing of each categoriy, will be based on a
Boolean OR, or the equivalent ANY in R (so long as 1 of the features is
true, the resulting outcome will be true)

data.table.a <-
matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10)))
data.table.b <- data.frame
(ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8"))

In the example tables above, we hope to collapse the features by category -
so the final desired output will be a total of 10 patients as rows, and a
total of 5 categories as columns. (after collapsing the features by a
Boolean OR). (i.e. if any of the features in the category are present, it
will be a TRUE).

I apologize for the apparently awkward table, but this is what I had to
start with. I tried expanding data.table.b$FEATURE using strsplit, which
resulted in a list, and then I got stuck there for a long time.

Thanks for any help.

Greg

	[[alternative HTML version deleted]]

Daniel Malter

2010-May-11 05:34 UTC

head link

[R] Advice needed on awkward tables

Hi, even after rereading, I have little of a clue what it is exactly that you
are trying to do. It'd help if you provided a more concise, step-by-step
description and/or the smallest unambiguous example of the two tables AND of
what should come out at the end. Also, unless for relatively trivial
problems, the list typically likes to see some own effort and where you are
stuck, rather than to solve the whole problem.

Best,
Daniel

-- 
View this message in context:
r.789695.n4.nabble.com/Advice-needed-on-awkward-tables-tp2173289p2173341.html
Sent from the R help mailing list archive at Nabble.com.

Jeff Newmiller

2010-May-11 06:53 UTC

head link

[R] Advice needed on awkward tables

Greg Orm wrote:> Dear r-help list members,
>
> I am quite new to R, and hope to seek advice from you about a problem I
have
> been cracking my head over. Apologies if this seems like a simple problem.
>
>   I would not call it exactly simple... but mostly because your data
representation is so obscure.
> I have essentially two tables. The first (Table A) is a standard patient
> clinicopathological data table, where rows correspond to patient IDs and
> columns correspond to clinical features. Records in this table are stored
as
> 1 or 0 (denoting presence). An example is provided below.
>
> The second (Table B) is a table that represents a 'key' to Table A.
This
> Table B has a category field, as well as a feature field which links to the
> Table B. Unfortunately, this is a one-to-many relationship, and the numbers
> in the feature field represent the respective columns in Table A, delimited
> by semicolons. So in the example below, I need to collapse the data in
Table
> B into a table with nrow equivalent to the number of categories and ncol
> number of patients. The collapsing of each categoriy, will be based on a
> Boolean OR, or the equivalent ANY in R (so long as 1 of the features is
> true, the resulting outcome will be true)
>
> data.table.a <-
>
matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10)))
> data.table.b <- data.frame
>
(ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8"))
>
> In the example tables above, we hope to collapse the features by category -
> so the final desired output will be a total of 10 patients as rows, and a
> total of 5 categories as columns. (after collapsing the features by a
> Boolean OR). (i.e. if any of the features in the category are present, it
> will be a TRUE).
>
> I apologize for the apparently awkward table, but this is what I had to
> start with. I tried expanding data.table.b$FEATURE using strsplit, which
> resulted in a list, and then I got stuck there for a long time.
>
> Thanks for any help.
>
> Greg
>   Replacing your bizarre representation of features with something sane...

library(reshape)

data.table.b2 <- 
data.frame(ID=c(1,2,2,3,4,5,5,6,6,7),FEATURE=c(9,3,5,7,4,6,10,1,2,8))

data.table.a.melt <- melt( data.table.a )
names(data.table.a.melt) <- c("Patient", "FEATURE",
"value" )

data.table.a.melt$FEATURE <- 
as.numeric(sub("([^0-9]*)(\\d+)$","\\2",data.table.a.melt$FEATURE,perl=TRUE))

data.table.merge1 <- merge( data.table.a.melt, data.table.b2 )
data.table.merge2 <- merge( data.table.merge1, 
data.table.b[c("ID","CATEGORY")] )
data.table.merge2$value <- 0!=data.table.merge2$value
result <- cast(data.table.merge2, Patient ~ CATEGORY, any )

Jim Lemon

2010-May-11 10:39 UTC

head link

[R] Advice needed on awkward tables

On 05/11/2010 02:05 PM, Greg Orm wrote:> Dear r-help list members,
>
> I am quite new to R, and hope to seek advice from you about a problem I
have
> been cracking my head over. Apologies if this seems like a simple problem.
>
> I have essentially two tables. The first (Table A) is a standard patient
> clinicopathological data table, where rows correspond to patient IDs and
> columns correspond to clinical features. Records in this table are stored
as
> 1 or 0 (denoting presence). An example is provided below.
>
> The second (Table B) is a table that represents a 'key' to Table A.
This
> Table B has a category field, as well as a feature field which links to the
> Table B. Unfortunately, this is a one-to-many relationship, and the numbers
> in the feature field represent the respective columns in Table A, delimited
> by semicolons. So in the example below, I need to collapse the data in
Table
> B into a table with nrow equivalent to the number of categories and ncol
> number of patients. The collapsing of each categoriy, will be based on a
> Boolean OR, or the equivalent ANY in R (so long as 1 of the features is
> true, the resulting outcome will be true)
>
> data.table.a<-
>
matrix(data=round(runif(100)),nrow=10,ncol=10,dimnames=list(paste("Patient",1:10),paste("Feature",1:10)))
> data.table.b<- data.frame
>
(ID=c(1,2,3,4,5,6,7),CATEGORY=c(1,2,3,3,4,5,5),FEATURE=c("9","3;5","7","4","6;10","1;2","8"))
>
> In the example tables above, we hope to collapse the features by category -
> so the final desired output will be a total of 10 patients as rows, and a
> total of 5 categories as columns. (after collapsing the features by a
> Boolean OR). (i.e. if any of the features in the category are present, it
> will be a TRUE).
>
> I apologize for the apparently awkward table, but this is what I had to
> start with. I tried expanding data.table.b$FEATURE using strsplit, which
> resulted in a list, and then I got stuck there for a long time.
Hi Greg,
Messy, but I think it works.

feature2category<-function(dta,dtb) {
  categories<-unique(dtb$CATEGORY)
  category.table<-matrix(0,nrow=dim(dta)[1],
   ncol=length(categories))
  colnames(category.table)<-
   paste("Category",1:length(categories))
  for(patrow in 1:dim(dta)[1]) {
   for(catrow in 1:dim(dtb)[1]) {
    acols<-as.numeric(unlist(strsplit(
     as.character(dtb[catrow,"FEATURE"]),";")))
   
cat("patrow",patrow,"catrow",catrow,acols,"\n")
    category.table[patrow,dtb[catrow,"CATEGORY"]]<-
     as.logical(sum(dta[patrow,acols]))
   }
  }
  return(as.data.frame(category.table))
}

Jim

Possibly Parallel Threads

Search for more maybe matching threads

R help - May 2010 - Advice needed on awkward tables

[R] Advice needed on awkward tables

[R] Advice needed on awkward tables

[R] Advice needed on awkward tables

[R] Advice needed on awkward tables

Possibly Parallel Threads