thr3ads.net - R help - [R] "Copy-pastable" output of 1000 plus variables [Apr 2017]

Bruce Ratner PhD

2017-Apr-23 19:07 UTC

[R] "Copy-pastable" output of 1000 plus variables

R-helpers:
I'm reading "Advanced R" (Wickham), which provides his way, quoted
below, of keeping variables. This cherry-picking approach clearly is not
practical with a large dataset.

"If you know the columns you don't want, use set operations to work out
which columns to keep: df[setdiff(names(df), "z")]"

I'm looking for a way of producing an output of 1000 plus variables, such
that I can get a clean listing of variables, not like from st(), that are easily
copy-pastable for selecting the variables I want to keep.

Any suggestion is appreciated.
Thanks. 
Bruce

David Winsemius

2017-Apr-23 19:57 UTC

It would be best if you could demonstrate _with_ _code_ the sort of operation
you propose.

David

Sent from my iPhone
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jeff Newmiller

2017-Apr-23 20:38 UTC

Coming from an Excel background, copying and pasting seems attractive, but it
does not create a reproducible record of what you did so it becomes quite tiring
and frustrating after some time has passed and you return to your analysis.

Nitpick: you put the setdiff function in the row selection position, an error I
am sure Hadley did not recommend.

Since R is programmable, there are far more ways to select columns than just
setdiff. Since your description of desired features is vague, you are unlikely
to get the answer you would really like from your email. Some possibilities to
think about:

a) use regular expressions and grep or grepl to select by similar character
patterns. E.g. all columns including the substring "value" or
"key": grep( "key|value", names( dta ) ). It is possible to specify
very complex selection patterns, but there are whole books on regular
expressions, so you can't expect to learn all about them on this R-specific
mailing list.

b) use a separate csv file with a column listing each column name, and then one
column for each subset you want to define, using TRUE/FALSE values to include or
not include the column name identified. E.g.

# typically easier to manage in an external data file, inline for example only
colsets <- read.csv( text = "Colname,set1,set2
key,TRUE,TRUE
value1,TRUE,FALSE
value2,TRUE,FALSE
factor1,FALSE,TRUE
", header = TRUE, as.is = TRUE )
# select by name so the result does not depend on the column order in dta
dta[ , colsets$Colname[ colsets$set1 ] ]
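
Option (a) can be sketched as follows. This is only an illustration: the toy data frame dta and its column names are assumptions borrowed from the csv example, not anything from the original question.

```r
# Toy data frame; the column names are illustrative assumptions.
dta <- data.frame(key = 1:3,
                  value1 = c(2.5, 3.1, 4.8),
                  value2 = c(10, 20, 30),
                  factor1 = c("a", "b", "a"))

# grep() returns the positions of the matching names ...
idx <- grep("key|value", names(dta))
# ... while grepl() returns a logical vector as long as names(dta).
keep <- grepl("key|value", names(dta))

# Either result can be used in the column position of "[".
sub_by_position <- dta[, idx, drop = FALSE]
sub_by_logical  <- dta[, keep, drop = FALSE]

# Negating the logical vector drops the matching columns instead.
sub_dropped <- dta[, !keep, drop = FALSE]
```

The drop = FALSE argument keeps the result a data frame even when only one column matches.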

Also your criteria of "clean listing" and "copy-pasteable"
are likely mutually exclusive, depending how you interpret them. You might be
able to use dput to export a set of column names that can be re-imported
accurately, but you might not regard it as "clean" if you are thinking
"readable".
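
A minimal sketch of the dput idea, assuming a small stand-in data frame (the names ID, X1, X2, X3 are made up for illustration):

```r
# Stand-in data frame; the real one would have 1000+ columns.
dta <- data.frame(ID = 1:2, X1 = c(0.1, 0.2), X2 = c(5, 6), X3 = c("a", "b"))

# dput() prints an R expression that recreates the object, here the
# character vector of column names, so it can be pasted into a script.
dput(names(dta))
# prints: c("ID", "X1", "X2", "X3")

# Paste the printed vector into the script and delete the unwanted names.
keep <- c("ID", "X1", "X3")
subset_dta <- dta[keep]
```

Because the edited vector stays in the script, the selection remains part of the reproducible record.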
-- 
Sent from my phone. Please excuse my brevity.


BR_email

2017-Apr-23 20:43 UTC

David:
I cannot demonstrate _with_ _code_; otherwise I would not have a
problem. However, I can illustrate:
In SAS, I can run Proc SQL for a dump, VARLIST_IS_HERE, showing on the
computer screen the variables, e.g., ID, X1, X2, X3, ..., X1000,
that I can copy and paste into the editor window (e.g., the R Source
window) to easily select which variables, among the big data of today,
I want to keep.
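
One way to get a comparable listing in R might look like the sketch below. It is not an equivalent of the Proc SQL dump, just an approximation; the data frame and the temporary file location are hypothetical.

```r
# Stand-in data frame; the real one would have 1000+ columns.
dta <- data.frame(ID = 1:2, X1 = c(0.1, 0.2), X2 = c(5, 6), X3 = c("a", "b"))

# Comma-separated listing on screen, wrapped to the console width.
cat(names(dta), sep = ", ", fill = TRUE)

# Alternatively, write one name per line to a text file for hand editing.
varlist_file <- tempfile(fileext = ".txt")  # hypothetical file location
writeLines(names(dta), varlist_file)

# After deleting the unwanted lines in an editor, read the names back.
keep <- readLines(varlist_file)
keep <- intersect(keep, names(dta))  # ignore lines that are not column names
subset_dta <- dta[keep]
```

Keeping the edited file next to the script records which variables were chosen, which plain copy and paste does not.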


  


BR_email

2017-Apr-23 20:46 UTC

Jeff:
Thanks. Please see my reply to David.
Bruce

Bruce Ratner, Ph.D.
The Significant Statistician
(516) 791-3544
Statistical Predictive Analytics -- www.DMSTAT1.com
Machine-Learning Data Mining and Modeling -- www.GenIQ.net
  


David Winsemius

2017-Apr-24 03:39 UTC

In context.

Sent from my iPhone
> On Apr 23, 2017, at 2:38 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
> Nitpick: you put the setdiff function in the row selection position, an
> error I am sure Hadley did not recommend.

That was not how my wetware interpreter read that code. I saw it as a single
argument to "[".

Best;
David
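
The single-argument reading can be checked with a small sketch; the data frame df and its columns x, y, z are assumptions chosen to match the quoted idiom.

```r
# Toy data frame matching the quoted idiom.
df <- data.frame(x = 1:3, y = 3:1, z = c("a", "b", "c"))

# With a single index, "[" treats the data frame as a list of columns,
# so this keeps every column except "z".
one_arg <- df[setdiff(names(df), "z")]

# The two-index form with an empty row position selects the same columns.
two_arg <- df[, setdiff(names(df), "z")]

# Both forms keep the same column names.
identical(names(one_arg), names(two_arg))
```

The single-index form always returns a data frame, whereas the two-index form simplifies to a vector when only one column remains unless drop = FALSE is given.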
