Hello all,

I don't understand a strange behavior in data frame manipulation.

data_frame1 = data.frame(Site = c("S1", "S2", "S3", "S4", "L1", "L2", "L3", "L4"),
                         Number = c(1, 3, 5, 2, 1, 1, 2, 1))

data_frame2 = data_frame1[data_frame1$Site != "S1", ]

dput(data_frame2)

structure(list(Site = structure(c(6L, 7L, 8L, 1L, 2L, 3L, 4L), .Label = c("L1",
"L2", "L3", "L4", "S1", "S2", "S3", "S4"), class = "factor"),
    Number = structure(c(3L, 4L, 2L, 1L, 1L, 2L, 1L), .Label = c("1",
    "2", "3", "5"), class = "factor")), .Names = c("Site", "Number"
), row.names = 2:8, class = "data.frame")

Why did site "S1" not disappear from data_frame2's structure?

And what do I have to do to eliminate it definitively from my new data frame (data_frame2)?

Sorry for this basic question, but I really did not understand...

Thanks in advance,

Raoni

--
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.raoni at gmail.com
This has nothing to do with data frames and everything to do with how factors behave. The levels of a factor are not necessarily linked to the content of the factor. For example, a factor representing "Male" and "Female" has both of those levels even if all the data in a subset represents "Male".

If you want all traces of the eliminated values removed, consider using character data rather than factors. In particular, using the as.is=TRUE or the stringsAsFactors=FALSE argument to read.table and similar functions will prevent automatic generation of factors. You can then choose when to convert to factor, after you have manipulated your data.

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
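A minimal sketch of the two points in this reply, using base R only. The vector sex and the names df_chr and sub_chr are made up for illustration and do not appear in the thread. The first part shows that a subset of a factor keeps every level; the second part keeps Site as character, subsets, and only then converts to factor.

```r
# A factor remembers its full set of levels, even after subsetting.
sex <- factor(c("Male", "Female", "Male"))
males <- sex[sex == "Male"]
levels(males)   # both "Female" and "Male" are still listed
table(males)    # "Female" is reported with a count of 0

# Keeping Site as character avoids the leftover level entirely.
df_chr <- data.frame(
  Site = c("S1", "S2", "S3", "S4", "L1", "L2", "L3", "L4"),
  Number = c(1, 3, 5, 2, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)
sub_chr <- df_chr[df_chr$Site != "S1", ]

# Convert to factor only after the unwanted rows have been removed.
sub_chr$Site <- factor(sub_chr$Site)
levels(sub_chr$Site)   # "S1" is not among the levels
```

Because the factor is built from the values that remain, its levels can only contain sites that are actually present in the subset.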
Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Raoni Rodrigues
> Sent: Tuesday, September 25, 2012 7:22 AM
> To: r-help at r-project.org
> Subject: [R] Strange data frame behavior
>
> Why did site "S1" not disappear from data_frame2's structure?

Because Site is a factor, and its levels are preserved in subset operations. See ?"[", especially the part about factors and the drop parameter.

You can either get rid of the factor by changing it to character, or explicitly call factor on the Site variable to get rid of the empty levels:

factor(data_frame2$Site)

Regards
Petr

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
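The suggestions in this reply can be sketched as follows. Site is wrapped in factor() here so that the example behaves the same whatever the stringsAsFactors default of the R version in use. The call to droplevels() and the names data_frame3 and site_sub are additions for illustration and are not mentioned in the thread.

```r
# Recreate the situation from the question, with Site as a factor.
data_frame1 <- data.frame(
  Site = factor(c("S1", "S2", "S3", "S4", "L1", "L2", "L3", "L4")),
  Number = c(1, 3, 5, 2, 1, 1, 2, 1)
)
data_frame2 <- data_frame1[data_frame1$Site != "S1", ]
levels(data_frame2$Site)   # still includes "S1"

# Option 1 (from the reply): rebuild the factor from its current values.
data_frame2$Site <- factor(data_frame2$Site)
levels(data_frame2$Site)   # "S1" is gone

# Option 2 (an addition, not from the thread): droplevels() removes
# unused levels from every factor column of a data frame at once.
data_frame3 <- droplevels(data_frame1[data_frame1$Site != "S1", ])

# The drop argument documented in ?"[" applies when a factor itself
# is subset: drop = TRUE discards the levels that no longer occur.
site_sub <- data_frame1$Site[data_frame1$Site != "S1", drop = TRUE]
```

Option 1 touches a single column, so it suits the case where only Site matters. droplevels() is the shorter choice when a data frame has several factor columns that may all carry unused levels.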