Jeff Newmiller
2023-Feb-12 22:57 UTC
[R] Removing variables from data frame with a wile card
x["V2"]

is more efficient than using drop=FALSE, and perfectly normal syntax (data frames are lists of columns). I would ignore the naysayers, or put a comment in if you want to accelerate their uptake.

As I understand it, one of the main reasons tibbles exist is because of drop=TRUE. List-slice (single-dimension) indexing works equally well with both standard and tibble types of data frames.

On February 12, 2023 2:30:15 PM PST, Andrew Simmons <akwsimmo at gmail.com> wrote:
>drop = FALSE means that should the indexing select exactly one column, then
>return a data frame with one column, instead of the object in the column.
>It's usually not necessary, but I've messed up some data before by assuming
>the indexing always returns a data frame when it doesn't, so drop = FALSE
>lets me know that I will always get a data frame.
>
>```
>x <- data.frame(V1 = 1:5, V2 = letters[1:5])
>x[, "V2"]
>x[, "V2", drop = FALSE]
>```
>
>You'll notice that the first returns a character vector, a through e, where
>the second returns a data frame with one column where the object in the
>column is the same character vector.
>
>You could alternatively use
>
>x["V2"]
>
>which should be identical to x[, "V2", drop = FALSE], but some people don't
>like that because it doesn't look like matrix indexing anymore.
>
>
>On Sun, Feb 12, 2023, 17:18 Steven T. Yen <styen at ntu.edu.tw> wrote:
>
>> In the line suggested by Andrew Simmons,
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> what does drop=FALSE do? Thanks.
>>
>> On 1/14/2023 8:48 PM, Steven Yen wrote:
>>
>> Thanks to all. Very helpful.
>>
>> Steven from iPhone
>>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo at gmail.com> wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses extended
>> regular expressions to find matches, but you can also use perl regular
>> expressions and globbing (after converting to a regular expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns starting
>> with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen at ntu.edu.tw> wrote:
>>
>> >> I have a data frame containing variables "yr3",...,"yr28".
>> >>
>> >> How do I remove them with a wild card----something similar to "del yr*"
>> >> in Windows/DOS? Thank you.
>> >>
>> >> > colnames(mydata)
>> >>  [1] "year"     "weight"   "confeduc" "confothr" "college"
>> >>  [6] ...
>> >> [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>> >> [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>> >> [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>> >> [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>> >> [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>> >> [66] "yr28"...
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
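A minimal sketch of the indexing forms discussed in this message, using a made-up four-column data frame (the column names and values are assumptions for illustration, not the original poster's data). It checks that the regex and glob patterns select the same columns, and that list-style indexing gives the same result as matrix-style indexing with drop = FALSE.

```r
# Hypothetical toy data standing in for 'mydata'.
mydata <- data.frame(year = 2001:2003, weight = c(1.2, 0.8, 1.0),
                     yr3 = 0:2, yr4 = 1:3)

# Logical mask of columns to keep: names that do NOT start with "yr".
keep      <- !grepl("^yr", colnames(mydata))            # regular expression
keep_glob <- !grepl(glob2rx("yr*"), colnames(mydata))   # glob converted to a regex

trimmed  <- mydata[, keep, drop = FALSE]  # matrix-style indexing
trimmed2 <- mydata[keep]                  # list-style indexing, no drop needed

identical(keep, keep_glob)     # both patterns pick the same columns
identical(trimmed, trimmed2)   # both indexing forms give the same data frame
colnames(trimmed)              # "year" "weight"
```

Note that "year" is kept: the pattern "^yr" requires the name to begin with the two letters "yr", not just "y".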
Steven T. Yen
2023-Feb-12 23:17 UTC
[R] Removing variables from data frame with a wile card
Thanks Jeff and Andrew. My initial file, mydata, is a data frame with 92 columns (variables). After the operation (trimming), it remains a data frame with 72 variables. So yes indeed, I do not need the drop=FALSE.

> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 92
> mydata<-mydata[,!grepl("^yr",colnames(mydata)),drop=FALSE]
> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 72
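The check above comes out the same with or without drop = FALSE because 72 columns survive the trim. The argument only changes the result when exactly one column is left. A small sketch of that case, on a hypothetical three-column data frame (names and values are assumptions):

```r
# Hypothetical data frame where only one column survives the trim.
small <- data.frame(id = 1:3, yr3 = 4:6, yr4 = 7:9)
keep  <- !grepl("^yr", colnames(small))   # TRUE only for "id"

as_vector <- small[, keep]                # default drop: collapses to a plain vector
as_frame  <- small[, keep, drop = FALSE]  # stays a one-column data frame
as_slice  <- small[keep]                  # list-style indexing also stays a data frame

is.data.frame(as_vector)   # FALSE
is.data.frame(as_frame)    # TRUE
is.data.frame(as_slice)    # TRUE
```

So drop = FALSE (or the list-style form) is a safeguard for code that may run on data where the number of surviving columns is not known in advance.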
On Sun, 12 Feb 2023 14:57:36 -0800
Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> x["V2"]
>
> is more efficient than using drop=FALSE, and perfectly normal syntax
> (data frames are lists of columns).

<SNIP>

I never cease to be amazed by the sagacity and perspicacity of the designers of R. I would have worried that x["V2"] would turn out to be a *list* (of length 1), but no, it retains the data.frame class, which is clearly the Right Thing To Do.

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. phone: +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619
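A short sketch of the point about classes, reusing the example data frame from earlier in the thread. The comparison with unclass() is an addition for illustration: it shows what single-bracket indexing would return if the data frame were treated as a bare list.

```r
x <- data.frame(V1 = 1:5, V2 = letters[1:5])

slice <- x["V2"]             # single bracket: one-column data frame
inner <- x[["V2"]]           # double bracket: the column object itself
bare  <- unclass(x)["V2"]    # same slice taken from the underlying plain list

class(slice)     # "data.frame"
is.list(slice)   # TRUE: a data frame is still a list of columns underneath
class(bare)      # "list"
identical(inner, x$V2)
```

The data.frame method for `[` is what restores the class; on the bare list, the same index yields the length-1 list that the message above was worried about.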