If you make the levels the same, does that give what you want?
levs <- c(LETTERS[1:6], "0")
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
merge(tmp2, tmp1, all = TRUE, sort = FALSE)
merge(tmp1, tmp2, all = TRUE, sort = FALSE)
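
If the goal is simply to get the rows back in the order of tmp1, one option that does not depend on how merge() orders its result is to carry an explicit row index through the merge and sort on it afterwards. This is only a minimal sketch; row_id is a hypothetical helper column name chosen for illustration, not anything merge() provides:

```r
# Same data as above, with shared factor levels.
levs <- c(LETTERS[1:6], "0")
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)

# row_id (hypothetical helper column) records the original row order of tmp1.
tmp1$row_id <- seq_len(nrow(tmp1))

# Left join on col1; the row order of the result is not relied upon.
res <- merge(tmp1, tmp2, by = "col1", all.x = TRUE, sort = FALSE)

# Restore the order of tmp1, then drop the helper column and reset row names.
res <- res[order(res$row_id), ]
res$row_id <- NULL
rownames(res) <- NULL
res
```

After the reordering, col1 runs A, A, C, C, 0, 0 as in tmp1, and col2 is filled in only for the two "C" rows.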
On 3/6/06, Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si> wrote:
> Hello!
>
> I am merging two datasets and I have encountered a problem with sort.
> Can someone please point me to my error? Here is the example.
>
> ## I have dataframes, first one with factor and second one with factor
> ## and integer
> > tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
> > tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
> > tmp1
> col1
> 1 A
> 2 A
> 3 C
> 4 C
> 5 0
> 6 0
> > tmp2
> col1 col2
> 1 C 1
> 2 D 2
> 3 E 3
> 4 F 4
>
> ## Now merge them
> > (tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
> all.x = TRUE, sort = FALSE))
> col1 col2
> 1 C 1
> 2 C 1
> 3 A NA
> 4 A NA
> 5 0 NA
> 6 0 NA
>
> ## As you can see, sort was applied, since row order is not the same as
> ## in tmp1. Reading help page for ?merge did not reveal much about
> ## sorting. However, I did try to see the result of "non-default" -
> ## help page says that order should be the same as in 'y'. So above
> ## makes sense
>
> ## Now merge - but change x and y
> > (tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
> all.y = TRUE, sort = FALSE))
> col1 col2
> 1 C 1
> 2 C 1
> 3 A NA
> 4 A NA
> 5 0 NA
> 6 0 NA
>
> ## The result is the same. I am stumped here. But looking a bit at these
> ## objects I found something peculiar
>
> > str(tmp1)
> `data.frame': 6 obs. of 1 variable:
> $ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
> > str(tmp2)
> `data.frame': 4 obs. of 2 variables:
> $ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
> $ col2: int 1 2 3 4
> > str(tmp12)
> `data.frame': 6 obs. of 2 variables:
> $ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
> $ col2: int 1 1 NA NA NA NA
> > str(tmp21)
> `data.frame': 6 obs. of 2 variables:
> $ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
> $ col2: int 1 1 NA NA NA NA
>
> ## Is it OK that the internal representation of factors varies between
> ## different merges? The levels are also different: in the first merge only
> ## the levels from the original data.frame are used, while in the second
> ## all levels are propagated.
>
> ## I have tried the same with characters
> > tmp1$col1 <- as.character(tmp1$col1)
> > tmp2$col1 <- as.character(tmp2$col1)
> > (tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
> all.x = TRUE, sort = FALSE))
> col1 col2
> 1 C 1
> 2 C 1
> 3 A NA
> 4 A NA
> 5 0 NA
> 6 0 NA
>
> > (tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
> all.y = TRUE, sort = FALSE))
> col1 col2
> 1 C 1
> 2 C 1
> 3 A NA
> 4 A NA
> 5 0 NA
> 6 0 NA
>
> ## The same with characters. Is this a bug? It definitely does not agree
> ## with the help page, since the order is not the same as in 'y'. Can someone
> ## please check on newer versions?
>
> ## Is there any other way to get the same order as in 'y' i.e. tmp1?
>
> > R.version
> _
> platform i486-pc-linux-gnu
> arch i486
> os linux-gnu
> system i486, linux-gnu
> status
> major 2
> minor 2.0
> year 2005
> month 10
> day 06
> svn rev 35749
> language R
>
> Thank you very much!
>
> --
> Lep pozdrav / With regards,
> Gregor Gorjanc
>
> ----------------------------------------------------------------------
> University of Ljubljana PhD student
> Biotechnical Faculty
> Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
> Groblje 3 mail: gregor.gorjanc <at> bfro.uni-lj.si
>
> SI-1230 Domzale tel: +386 (0)1 72 17 861
> Slovenia, Europe fax: +386 (0)1 72 17 888
>
> ----------------------------------------------------------------------
> "One must learn by doing the thing; for though you think you know it,
> you have no certainty until you try." Sophocles ~ 450 B.C.
>