thr3ads.net - R help - [R] Removing variables from data frame with a wile card [Feb 2023]

Jeff Newmiller

2023-Feb-12 22:57 UTC

[R] Removing variables from data frame with a wile card

x["V2"]

is more efficient than using drop=FALSE, and perfectly normal syntax (data
frames are lists of columns).  I would ignore the naysayers, or put a comment in
if you want to accelerate their uptake.

As I understand it, one of the main reasons tibbles exist is because of
drop=TRUE. List-slice (single-dimension) indexing works equally well with both
standard and tibble types of data frames.
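
A minimal sketch of the difference, reusing the toy data frame from Andrew's example; the variable names by_list, by_matrix and by_keep are only illustrative:

```r
# Toy data frame taken from Andrew's example in this thread.
x <- data.frame(V1 = 1:5, V2 = letters[1:5])

by_list   <- x["V2"]                  # list-style: result stays a data frame
by_matrix <- x[, "V2"]                # matrix-style: drop = TRUE applies here
by_keep   <- x[, "V2", drop = FALSE]  # matrix-style with dropping suppressed

class(by_list)                # class of the list-style result
class(by_matrix)              # class of the bare column
identical(by_list, by_keep)   # do the two data-frame results agree?
```

Only the middle form loses the data frame wrapper, which is the case drop = FALSE guards against.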

On February 12, 2023 2:30:15 PM PST, Andrew Simmons <akwsimmo at gmail.com> wrote:
>drop = FALSE means that should the indexing select exactly one column, then
>return a data frame with one column, instead of the object in the column.
>It's usually not necessary, but I've messed up some data before by assuming
>the indexing always returns a data frame when it doesn't, so drop = FALSE
>lets me know that I will always get a data frame.
>
>```
>x <- data.frame(V1 = 1:5, V2 = letters[1:5])
>x[, "V2"]
>x[, "V2", drop = FALSE]
>```
>
>You'll notice that the first returns a character vector, a through e, where
>the second returns a data frame with one column where the object in the
>column is the same character vector.
>
>You could alternatively use
>
>x["V2"]
>
>which should be identical to x[, "V2", drop = FALSE], but some people don't
>like that because it doesn't look like matrix indexing anymore.
>
>
>On Sun, Feb 12, 2023, 17:18 Steven T. Yen <styen at ntu.edu.tw> wrote:
>
>> In the line suggested by Andrew Simmons,
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> what does drop=FALSE do? Thanks.
>>
>> On 1/14/2023 8:48 PM, Steven Yen wrote:
>>
>> Thanks to all. Very helpful.
>>
>> Steven from iPhone
>>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo at gmail.com> wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses extended
>> regular expressions to find matches, but you can also use perl regular
>> expressions and globbing (after converting to a regular expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns starting
>> with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen at ntu.edu.tw> wrote:
>>
>>
>> I have a data frame containing variables "yr3",...,"yr28".
>>
>>
>> How do I remove them with a wild card, something similar to "del yr*"
>> in Windows/DOS? Thank you.
>>
>>
>> colnames(mydata)
>>
>>   [1] "year"       "weight"     "confeduc"   "confothr"   "college"
>>   [6] ...
>>  [41] "yr3"        "yr4"        "yr5"        "yr6"        "yr7"
>>  [46] "yr8"        "yr9"        "yr10"       "yr11"       "yr12"
>>  [51] "yr13"       "yr14"       "yr15"       "yr16"       "yr17"
>>  [56] "yr18"       "yr19"       "yr20"       "yr21"       "yr22"
>>  [61] "yr23"       "yr24"       "yr25"       "yr26"       "yr27"
>>  [66] "yr28"...
>>
>>
>> ______________________________________________
>>
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>
>> https://stat.ethz.ch/mailman/listinfo/r-help
>>
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
-- 
Sent from my phone. Please excuse my brevity.

Steven T. Yen

2023-Feb-12 23:17 UTC

[R] Removing variables from data frame with a wile card

Thanks Jeff and Andrew. My initial file, mydata, is a data frame with 92 
columns (variables). After the operation (trimming), it remains a data 
frame with 72 variables. So yes indeed, I do not need the drop=FALSE.
> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 92
> mydata<-mydata[,!grepl("^yr",colnames(mydata)),drop=FALSE]
> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 72
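
A small sketch of when drop = FALSE does and does not matter, assuming a hypothetical stand-in for mydata (the data frames toy and narrow below are invented for illustration, not the real 92-column file):

```r
# Hypothetical stand-in for mydata: two ordinary columns plus yr3..yr5.
toy <- data.frame(year = 1:3, weight = c(1.5, 2.0, 2.5),
                  yr3 = 0, yr4 = 0, yr5 = 0)

# "^yr" matches names that start with "yr"; "year" does not match.
is_yr      <- grepl("^yr", colnames(toy))
is_yr_glob <- grepl(glob2rx("yr*"), colnames(toy))  # same test written as a glob

# Two columns survive, so the result is a data frame even without drop = FALSE.
trimmed <- toy[, !is_yr]
colnames(trimmed)

# If only one column survives, the default drop = TRUE returns a bare vector.
narrow <- toy[, c("year", "yr3", "yr4", "yr5")]
keep   <- !grepl("^yr", colnames(narrow))
is.data.frame(narrow[, keep])                # matrix-style, default drop
is.data.frame(narrow[, keep, drop = FALSE])  # matrix-style, drop suppressed
is.data.frame(narrow[keep])                  # list-style indexing
```

So drop = FALSE is redundant whenever two or more columns remain, as in the 72-column result above, and only changes the outcome when exactly one column is left.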


Rolf Turner

2023-Feb-13 01:33 UTC

[R] Removing variables from data frame with a wile card

On Sun, 12 Feb 2023 14:57:36 -0800
Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> x["V2"]
> 
> is more efficient than using drop=FALSE, and perfectly normal syntax
> (data frames are lists of columns).
<SNIP>

I never cease to be amazed by the sagacity and perspicacity of the
designers of R.  I would have worried that x["V2"] would turn out to be
a *list* (of length 1), but no, it retains the data.frame class, which
is clearly the Right Thing To Do.
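
A minimal sketch of that behaviour, again on the toy data frame from earlier in the thread (one_col and column are illustrative names only):

```r
x <- data.frame(V1 = 1:5, V2 = letters[1:5])

one_col <- x["V2"]     # single bracket: a subset that keeps the class
column  <- x[["V2"]]   # double bracket: extracts the column object itself

class(one_col)             # class of the single-bracket result
is.list(one_col)           # a data frame is still a list underneath
class(unclass(x)["V2"])    # the same index on the unclassed list
is.data.frame(column)      # the extracted column is not a data frame
```

The worry about getting a plain list applies only once the data.frame class has been stripped, as in the unclass() line.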

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. phone: +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619
