All,

First, apologies if this is a very basic question, but I tried to work it out myself and could not find a satisfactory answer.

I know that I can subset a data frame using dataframe[row, column]; dataframe[row, ] returns that specific row, and similarly dataframe[, column] returns the entire column.

What I don't understand is what is returned if I use dataframe[<conditional expression>] without the comma.

For example, I have the code below:

manager <- c(1, 2, 3, 4, 5)
date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
country <- c("US", "US", "UK", "UK", "UK")
gender <- c("M", "F", "F", "M", "F")
age <- c(32, 45, 25, 39, 99)
q1 <- c(5, 3, 3, 3, 2)
q2 <- c(4, 5, 5, 3, 2)
q3 <- c(5, 2, 5, 4, 1)
q4 <- c(5, 5, 5, NA, 2)
q5 <- c(5, 5, 2, NA, 1)
leadership <- data.frame(manager, date, country, gender, age, q1, q2, q3, q4, q5, stringsAsFactors=FALSE)

Now if I do

leadership[leadership$country == "US",]

two rows are returned:

  managerID JoinDate country gender age q1 q2 q3 q4 q5 agecat
1         1 10/24/08      US      M  32  5  4  5  5  5  Young
2         2 10/28/08      US      F  45  3  5  2  5  5  Young

But if I do

leadership[leadership$country == "US"]

to get the entire data frame where country is US, I get the following:

  managerID JoinDate q1 q2 agecat
1         1 10/24/08  5  4  Young
2         2 10/28/08  3  5  Young
3         3  10/1/08  3  5  Young
4         4 10/12/08  3  3  Young
5         5   5/1/09  2  2   <NA>

Please tell me what I am doing wrong.

Thanks
Try a simpler example:

> ick <- data.frame(x=1:5, a=letters[1:5], c=month.abb[1:5], y=11:15)
> ick
  x a   c  y
1 1 a Jan 11
2 2 b Feb 12
3 3 c Mar 13
4 4 d Apr 14
5 5 e May 15
> ick[2]
  a
1 a
2 b
3 c
4 d
5 e
> ick[3]
    c
1 Jan
2 Feb
3 Mar
4 Apr
5 May

If you use [] without a comma, it returns the specified columns.

ick[ c(FALSE,TRUE,TRUE,FALSE) ]

will return the second and third columns, those where the logical vector is TRUE.

This is because data frames are actually lists in disguise:

> is.list(ick)
[1] TRUE

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 2/27/14 7:00 AM, "Kapil Shukla" <shukla.kapil at gmail.com> wrote:
> [...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
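To connect the column-selection rule to the original question, here is a minimal sketch using the data frame exactly as posted (10 columns, without the agecat column that shows up in the printed output). The names `sel` and `picked` are made up for this illustration. The condition yields one logical value per row, but without the comma it is applied to columns and recycled along them.

```r
# Data frame as posted in the question (10 columns, 5 rows).
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2),
  q2 = c(4, 5, 5, 3, 2),
  q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2),
  q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

# The condition is a logical vector with one element per ROW.
sel <- leadership$country == "US"
print(sel)

# Without a comma the vector indexes COLUMNS. It is shorter than the
# number of columns, so it is recycled along them.
picked <- leadership[sel]
print(names(picked))

# Spelling the recycling out shows which column positions are kept.
print(which(rep(sel, length.out = ncol(leadership))))

# With the comma the same vector selects rows, which is what was intended.
print(nrow(leadership[sel, ]))
```

With an extra eleventh column, as in the printed output of the question, the same recycling pattern would also pick column 11, which appears to match the five columns shown there.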
Hi,

Thanks for the example!

I cannot really tell you why you get what you get when you type

leadership[leadership$country == "US"]

But what I know (or think I know) is that when you leave out the comma, R treats the index as a selection of columns. This means that leadership[1:2] is identical to leadership[,1:2]:

identical(leadership[1:2],leadership[,1:2])
[1] TRUE

If you want all rows where "country" is "US", then you did it correctly with

leadership[leadership$country == "US", ]

HTH,
Ivan

-- 
Ivan Calandra, ATER
Université de Franche-Comté
UFR STGI - UMR 6249 Chrono-Environnement
4 Place Tharradin - BP 71427
25211 Montbéliard Cedex, FRANCE
ivan.calandra at univ-fcomte.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 27/02/14 16:00, Kapil Shukla wrote:
> [...]
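One caveat to the identical() check, sketched here with the small ick data frame from earlier in the thread: the two forms agree when several columns are selected, but with a single column the comma form simplifies to a plain vector by default. The `drop = FALSE` argument is part of base R's data frame indexing; everything else is just illustration.

```r
ick <- data.frame(x = 1:5, a = letters[1:5], c = month.abb[1:5], y = 11:15)

# Several columns: compare the list-style and matrix-style forms.
print(identical(ick[1:2], ick[, 1:2]))

# One column: list-style indexing keeps a data frame ...
print(is.data.frame(ick[2]))
# ... while matrix-style indexing simplifies to a vector by default.
print(is.data.frame(ick[, 2]))

# drop = FALSE keeps the data frame in the matrix-style form.
print(is.data.frame(ick[, 2, drop = FALSE]))
```

So the comma-free form is a convenient way to be sure a data frame comes back, whatever the number of columns selected.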