Hi,

Thanks in advance for any advice you can give me. I am very stumped on this problem. I use R every day and consider myself a confident user, but this seems to be an elementary problem.

Outline of problem: I am analysing the results of a study on protein expression in cancer tissues. I have raw intensities from 2 different types of cancer and normal tissue, which can be taken from several different parts of the cell, as well as patient information. Part of the analysis calls for a fold-change calculation. In order to do this I am subsetting the dataset by cancer type, merging each cancer dataset with the data from the Normal tissue, then calculating fold change for matching individuals and cell section.

The problem is that I have been tracking one factor in particular ('branch', values 2 or 3) and once the final merge occurs, the second level of this factor seems to disappear in the last dataset, even though it was present before. See code & output below:

> dim(tma)
> names(tma)
[1] "Code" "marker" "cell" "tumourA" "tumourEXP" "int" "stain" "tumourPERC" "branch"
> levels(tma$tumourA)
[1] "DCIS" "LN Metastasis" "Normal" "Primary Invasive Carcinoma"

#split into cancer and normal tissue
> tma1<-subset(tma, tumourA=="Primary Invasive Carcinoma")
> tma2<-subset(tma, tumourA=="LN Metastasis")
> tmaN<-subset(tma, tumourA=="Normal")

#size of datasets
> dim(tma1)
[1] 587 9
> dim(tma2)
[1] 323 9
> dim(tmaN)
[1] 142 9

#merge back with normal type
> tma1.1<-merge(tmaN, tma1, by="Code")
> tma2.1<-merge(tmaN, tma2, by="Code")

#new dimensions (seem excessively large)
> dim(tma1.1)
[1] 2439 17
> dim(tma2.1)
[1] 625 17

#progression of "branch" factor in datasets. Note last one where it disappears...
> table(tma$branch)

  2   3
450 613
> table(tma1$branch)

  2   3
314 273
> table(tma2$branch)

  2   3
 39 284
> table(tmaN$branch)

  2   3
 91  51
> table(tma1.1$branch.x)

   2    3
1806  633
> table(tma2.1$branch.x)

  3
625

Please, can someone tell me what's going on?
Thank you very much,
Zoe van Havre
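One possible reading of the output above, offered only as a sketch: merge() keeps only the rows whose "Code" appears in both data frames, and when a patient has several rows on each side, merging on "Code" alone pairs every normal row with every tumour row. That would account for the large row counts, and a branch value can vanish if none of the matched patients carry it. The toy data below is hypothetical (the names normal and tumour and all values are invented for illustration, and "marker" is left out for brevity); whether "Code" plus "cell" is the right key for the real data is an assumption.

```r
# Toy data: hypothetical values, not the poster's dataset.
normal <- data.frame(
  Code   = c("p1", "p1", "p2", "p2", "p3"),
  cell   = c("nucleus", "cytoplasm", "nucleus", "cytoplasm", "nucleus"),
  int    = c(10, 12, 8, 9, 11),
  branch = c(2, 2, 3, 3, 2)
)
tumour <- data.frame(
  Code   = c("p2", "p2", "p4"),
  cell   = c("nucleus", "cytoplasm", "nucleus"),
  int    = c(20, 27, 15),
  branch = c(3, 3, 2)
)

# Merging on Code alone pairs every normal row with every tumour row
# of the same patient (2 x 2 = 4 rows for p2). Patients p1, p3 and p4
# have no partner and are dropped, taking branch 2 with them.
by_code <- merge(normal, tumour, by = "Code")
nrow(by_code)
table(by_code$branch.x)

# Merging on Code and cell keeps one row per patient and cell section,
# which is the shape a per-section fold change needs.
by_both <- merge(normal, tumour, by = c("Code", "cell"),
                 suffixes = c(".normal", ".tumour"))
by_both$fold <- by_both$int.tumour / by_both$int.normal
by_both
```

If unmatched patients should be kept rather than dropped, merge() also accepts all.x = TRUE or all = TRUE, which fills the missing side with NA.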
Patrick Connolly
2009-Dec-17 06:51 UTC
[R] Help with Merge - unexpected loss of factor level
On Thu, 17-Dec-2009 at 03:17PM +1000, Zoe van Havre wrote:

[...]

|> The problem is that I have been tracking one factor in particular
|> ('branch', values 2 or 3) and once the final merge occurs, the
|> second level of this factor seems to disappear in the last dataset,
|> even though it was present before. See code & output below:
|>
|> > dim(tma)

You didn't tell us that one. What size is it?

|> > names(tma)
|> [1] "Code" "marker" "cell" "tumourA" "tumourEXP" "int" "stain" "tumourPERC" "branch"
|> > levels(tma$tumourA)
|> [1] "DCIS" "LN Metastasis" "Normal" "Primary Invasive Carcinoma"
|> #split into cancer and normal tissue
|> > tma1<-subset(tma, tumourA=="Primary Invasive Carcinoma")
|> > tma2<-subset(tma, tumourA=="LN Metastasis")
|> > tmaN<-subset(tma, tumourA=="Normal")
|> [...]
|>   2   3
|>  91  51
|> > table(tma1.1$branch.x)
|>
|>    2    3
|> 1806  633
|> > table(tma2.1$branch.x)
|>
|>   3
|> 625
|>
|>
|> Please, can someone tell me what's going on?

I suspect you'd have a lot of NAs in there. Try this:

sapply(tma, function(x) sum(is.na(x)))

If that doesn't tell you something interesting, try with the subsets.

Or maybe when you use table(), try the exclude=NULL argument.

HTH

-- 
Patrick Connolly
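The two checks suggested above can be tried on a small example first. This is a minimal sketch on a hypothetical data frame (toy and its values are invented, not taken from the tma data); it only shows what the calls return when a column contains a missing value.

```r
# Toy data frame with one missing value; hypothetical, not the tma data.
toy <- data.frame(
  Code   = c("p1", "p2", "p3", "p4"),
  branch = c(2, 3, NA, 3)
)

# Count missing values in each column.
na_counts <- sapply(toy, function(x) sum(is.na(x)))
na_counts

# By default table() silently drops NA;
# exclude = NULL keeps NA as its own category.
tab_default <- table(toy$branch)
tab_with_na <- table(toy$branch, exclude = NULL)
tab_default
tab_with_na
```

On the real data the same calls would be run on tma and on each subset and merged data frame in turn, so that any step where NAs appear can be spotted.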