Mohammad Tanvir Ahamed
2016-Mar-03 11:57 UTC
[R] Extract row from a dataframe by row names in another vector and factor. Need explanation
Dear Dennis,

Thank you very much for your detailed reply. It really helped me understand.

Tanvir Ahamed
Göteborg, Sweden | mashranga at yahoo.com

________________________________
From: Dennis Murphy <djmuser at gmail.com>
Sent: Thursday, 3 March 2016, 4:38
Subject: Re: [R] Extract row from a dataframe by row names in another vector and factor. Need explanation

Welcome to the wonderful world of factors.

In your second case, v2, the vector is character, so R matches the input character strings against the lookup table of row names. OTOH, v1 is a factor - it behaves differently when used for subsetting, and this example illustrates why you shouldn't use factors for this purpose. Let's look at it:

> v1
[1] f g h i j
Levels: f g h i j
> str(v1)
 Factor w/ 5 levels "f","g","h","i",..: 1 2 3 4 5
> levels(v1)
[1] "f" "g" "h" "i" "j"
> as.integer(v1)
[1] 1 2 3 4 5
> str(levels(v1))
 chr [1:5] "f" "g" "h" "i" "j"

When you use v1 to subset rows, R does not match the factor's labels against the row names: indexing by a factor is equivalent to indexing by its underlying integer codes. Here the codes are 1 to 5, which is why res1 selected the first five observations.

These alternatives do what you want:

dat[levels(v1), ]         # works here because each level occurs once, in order
dat[as.character(v1), ]   # behaves like v2 (a character vector)

# Another approach: define a factor whose levels are the row names of dat,
# so that its integer codes coincide with the row positions:
x <- as.character(dat1[, "BB"])
v3 <- factor(x, levels = rownames(dat))
dat[v3, ]

There are a couple of alternative avenues you could have chosen (e.g., match() or which()), but they are overkill for this simple case.

Your real problem was converting a character matrix to a data frame in the first place - this converted all of the columns to factors with different sets of levels:

str(dat1)

This illustrates one of the important differences between data frames and matrices. In a matrix, every element must be of the same class. Specifically, a matrix is an atomic vector with a 'dim' attribute.
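To make the point about matrices concrete, here is a minimal sketch. It is not code from the thread; the names x and m are made up for illustration. It shows that adding a 'dim' attribute is what turns a plain vector into a matrix, and that mixing types coerces every element to one common type.

```r
# A plain character vector of 20 letters, with no dimensions yet.
x <- letters[1:20]
is.matrix(x)          # FALSE

# Assigning a 'dim' attribute is all it takes to make it a matrix.
dim(x) <- c(5, 4)
is.matrix(x)          # TRUE
names(attributes(x))  # "dim"

# Every element shares one type: combining numbers and strings coerces
# everything to character instead of keeping a numeric column.
m <- cbind(1:3, c("a", "b", "c"))
typeof(m)             # "character"
```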
In contrast, each _column_ of a data frame must have elements of the same class, but the class does not have to be the same from one column to the next. One way to avoid the conversion to factor would have been to pass the argument stringsAsFactors = FALSE in the data.frame() call - by default, it is TRUE.

More importantly, the conversion to data frame for dat1 was unnecessary - observe:

> dat1 <- matrix(letters[1:20], ncol = 4)
> colnames(dat1) <- c("AA", "BB", "CC", "DD")
> dat[dat1[, "BB"], ]
  SA1 SA2 SA3 SA4 SA5
f   6  16  26  36  46
g   7  17  27  37  47
h   8  18  28  38  48
i   9  19  29  39  49
j  10  20  30  40  50

For the same reason, it was unnecessary to convert dat to a data frame. Let's look at a matrix version instead:

dat2 <- matrix(seq(50), nrow = 10)
rownames(dat2) <- letters[1:10]
colnames(dat2) <- paste0("SA", 1:5)
dat2[dat1[, "BB"], ]   # desired result

Hint: You might want to spend some time carefully learning the different major data types in R and the various modes of indexing. In general, it is not a good default practice to convert matrices to data frames.

Dennis

On Wed, Mar 2, 2016 at 6:05 PM, Mohammad Tanvir Ahamed via R-help <r-help at r-project.org> wrote:
> Hi,
> Here I have written an example to explain my problem.
>
> ## Data generation
> dat<-data.frame(matrix(1:50,ncol=5))
> rownames(dat)<-letters[1:10]
> colnames(dat)<- c("SA1","SA2","SA3","SA4","SA5")
>
> dat1<-data.frame(matrix(letters[1:20],ncol=4))
> colnames(dat1)<-c("AA","BB","CC","DD")
>
> ## Row names
> v1<-dat1[,"BB"] # Factor
> v2<-as.vector(dat1[,"BB"]) # Vector
>
> is(v1) # Factor
> is(v2) # Vector
>
> # Result
> res1<-dat[v1,]
> res2<-dat[v2,]
> ##########################################################
> I assumed res1 and res2 would be the same, but they are not. Can anybody please explain why?
>
> Tanvir Ahamed
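For anyone wanting to reproduce the behaviour discussed in this thread, the following is a minimal sketch based on the example from the original question. Two things are assumptions added here rather than taken from the thread: stringsAsFactors = TRUE is passed explicitly so the factor conversion also happens in R 4.0.0 and later (where the default became FALSE), and the names idx and res3 are introduced only to illustrate the match() route mentioned in the reply.

```r
# Numeric data frame with row names "a".."j", as in the original post.
dat <- data.frame(matrix(1:50, ncol = 5))
rownames(dat) <- letters[1:10]
colnames(dat) <- paste0("SA", 1:5)

# Assumption: force the factor conversion explicitly so the example
# behaves the same way regardless of the R version's default.
dat1 <- data.frame(matrix(letters[1:20], ncol = 4), stringsAsFactors = TRUE)
colnames(dat1) <- c("AA", "BB", "CC", "DD")

v1 <- dat1[, "BB"]        # factor: labels f..j, integer codes 1..5
v2 <- as.character(v1)    # character vector "f".."j"

res1 <- dat[v1, ]         # indexed by the integer codes 1..5
res2 <- dat[v2, ]         # matched against the row names

rownames(res1)            # "a" "b" "c" "d" "e"
rownames(res2)            # "f" "g" "h" "i" "j"

# Hypothetical alternative: look the labels up explicitly with match(),
# which yields the row positions 6..10.
idx <- match(v1, rownames(dat))
res3 <- dat[idx, ]
rownames(res3)            # "f" "g" "h" "i" "j"
```

The match() route makes the lookup explicit, which can be easier to read when the index vector may arrive as either a factor or a character vector; for this small case as.character(v1) is just as good.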