Valentin Petzel
2023-Jan-14 18:21 UTC
[R] Removing variables from data frame with a wile card
Hello Avi,

while something like d$something <- ... may look as if you're directly modifying the data, it does not actually do so. Most R objects try to be immutable, that is, an object does not change after it has been created. This guarantees that if you have several bindings to the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work (E$a is 3), as the same environment is modified, while in the first one a modified copy of the list is made inside local(), so L$a outside local() is still NULL.

Under the hood there is a syntactic trick: if R sees something like f(a) <- value, it looks for a function `f<-` and calls a <- `f<-`(a, value = value) (this also happens, for example, when you do names(x) <- ...). So in our case this is equivalent to creating a copy with the columns removed and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C-based routines that allow changing data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data in place.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gross at gmail.com:
> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a column within it. You can do that in several ways, but the simplest is to set the column to NULL, as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do that for all the components of your grep with a loop or a functional programming method.
>
> R does have optimizations that make this less useful, as a partial copy of a data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that let you select columns that start with, end with, or contain some pattern, and I find that way easier.
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons <akwsimmo at gmail.com>
> Cc: R-help Mailing List <r-help at r-project.org>
> Subject: Re: [R] Removing variables from data frame with a wile card
>
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo at gmail.com> wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses
>> extended regular expressions to find matches, but you can also use
>> Perl-compatible regular expressions (perl = TRUE) and globbing (after
>> converting the glob to a regular expression). For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather use
>> globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns starting with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen at ntu.edu.tw> wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>>
>>> How do I remove them with a wild card, something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
>>>> colnames(mydata)
>>>  [1] "year"     "weight"   "confeduc" "confothr" "college"
>>>  [6] ...
>>> [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>>> [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>>> [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>>> [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>>> [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>>> [66] "yr28"...
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
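A minimal sketch of the two points above, assuming a small hypothetical data frame whose column names only imitate Steven's yr3 ... yr28: Avi's loop that sets each matching column to NULL, and Valentin's explanation that such a replacement call yields a modified copy, so a second binding (here the illustrative name `original`) still sees the old columns. The environment comparison repeats Valentin's own example.

```r
# Hypothetical toy data; the column names only imitate the thread's yr3 ... yr28.
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2, 2.5),
                     yr3 = 1:3, yr4 = 4:6, yr28 = 7:9)
original <- mydata  # a second binding to the same data frame

# Avi's approach: set every column matching the pattern to NULL in a loop.
for (nm in grep("^yr", names(mydata), value = TRUE)) {
  mydata[[nm]] <- NULL
}

# Valentin's point: each replacement call is evaluated roughly as
#   mydata <- `[[<-`(mydata, nm, value = NULL)
# so a modified copy is created and the name 'mydata' is rebound to it.
manual <- `[[<-`(original, "yr3", value = NULL)

print(names(mydata))    # "year" "weight"
print(names(original))  # still has all five columns
print(names(manual))    # "year" "weight" "yr4" "yr28"

# Environments, by contrast, are mutable and are modified in place.
L <- list()
local({ L$a <- 3 })
E <- new.env()
local({ E$a <- 3 })
print(L$a)  # NULL: local() assigned a modified copy of the list locally
print(E$a)  # 3: both bindings refer to the same environment
```

Note that `original` is untouched after the loop, which is exactly the "no sneaky changes" guarantee Valentin describes; only environments (and packages such as data.table, via C code) break it.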
Sorkin, John
2023-Jan-15 16:54 UTC
[R] Removing variables from data frame with a wile card
I am new to this thread. At the risk of presenting something that has been shown before, below I demonstrate how a column in a data frame can be dropped using a wild card, i.e. a column whose name starts with "th", using nothing more than base R functions and base R syntax. While additions to R such as the tidyverse can be very helpful, many things that they do can be accomplished simply using base R.

# Create data frame with three columns
one <- rep(1, 10)
two <- rep(2, 10)
three <- rep(3, 10)
mydata <- data.frame(one = one, two = two, three = three)
cat("Data frame with three columns\n")
mydata

# Find the location of the column whose name starts with "th", i.e. column
# three. The ^ anchors the pattern at the start of the name; without it,
# grep("th", ...) would also match names that merely contain "th".
ColumnToDelete <- grep("^th", colnames(mydata))
cat("The column to be dropped is the column called three, which is column",
    ColumnToDelete, "\n")

# Drop the column whose name starts with "th"; drop = FALSE keeps the
# result a data frame even if only one column were left
newdata2 <- mydata[, -ColumnToDelete, drop = FALSE]
cat("Data frame after dropping the column whose name is three\n")
newdata2

I hope this helps.
John

________________________________________
From: R-help <r-help-bounces at r-project.org> on behalf of Valentin Petzel <valentin at petzel.at>
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gross at gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card
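One caveat worth checking with John's negative-index approach: the sketch below, on a hypothetical three-column data frame like his, contrasts mydata[, -idx] with Andrew's logical grepl() index. The pattern "^zz" is an arbitrary assumption chosen only because it matches no column; the sketch also confirms that utils::glob2rx() turns the glob "th*" into the anchored regular expression "^th".

```r
# Hypothetical toy data in the spirit of John's three-column example.
mydata <- data.frame(one = rep(1, 3), two = rep(2, 3), three = rep(3, 3))

# Andrew's glob2rx() turns a shell-style glob into an anchored regex.
print(glob2rx("th*"))  # "^th"

# Negative integer indexing works when the pattern matches something...
idx <- grep("^th", colnames(mydata))
print(names(mydata[, -idx, drop = FALSE]))  # "one" "two"

# ...but when nothing matches, grep() returns integer(0), and -integer(0)
# is still integer(0), which selects no columns at all.
none <- grep("^zz", colnames(mydata))
print(ncol(mydata[, -none, drop = FALSE]))  # 0

# A logical index, as in Andrew's grepl() version, keeps every column instead.
keep <- !grepl("^zz", colnames(mydata))
print(ncol(mydata[, keep, drop = FALSE]))  # 3
```

So when a pattern might match nothing, the logical form (or a check such as length(idx) > 0 before using -idx) is the safer choice.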