I am very agnostic about tidyverse/base R. However, the complexity of
setting up NSE functions is often simply not needed, and I encounter so
many people who simply disregard base R as being too outdated so that they
never learn how simple solutions in R can be. The contrast between your
solution and Bert's was... perhaps informative, but a nuclear bomb where
an axe was sufficient.
On Fri, 2 Jul 2021, Avi Gross via R-help wrote:
> I know what you mean, Jeff. Yes, I am very familiar with base R techniques.
> What I had hoped for was to do two things that some of the other methods
> mentioned do that ended up bringing two data.frames together as part of the
> solution.
>
> Much of what I used is now standard R. I was looking at the accessory
> functions now commonly used in dplyr that let you dynamically select which
> columns to work with, like starts_with(). Sadly, they seem to work at the
> top level but not easily within a call to something like paste(...), where
> they are not evaluated in the way I want.
>
> But the odd method I tried can also be used in standard R with a bit of
> work. You can create a function without using dplyr that takes your df and
> uses it to concatenate and end with something like:
>
> df$new_col <- do_something(df, selected_cols)
>
> That too adds a column without the need to merge larger structures
> explicitly.
>
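A minimal base R sketch of such a function might look like the following. The name do_something() comes from the message above, but the body and the sep argument are assumptions made for illustration, not code anyone in the thread posted.

```r
# Hypothetical helper: only the name do_something() appears in the thread;
# this body is an illustrative guess.
do_something <- function(df, selected_cols, sep = "") {
  # df[selected_cols] keeps the requested columns in the order given;
  # unname() stops a column called "sep" or "collapse" from being
  # mistaken for an argument of paste()
  cols <- unname(as.list(df[selected_cols]))
  do.call(paste, c(cols, sep = sep))
}

# Toy data, made up for this example
df <- data.frame(A = c("a", "b"), B = c("x", "y"), D = c("p", "q"))
selected_cols <- c("D", "B")

df$new_col <- do_something(df, selected_cols, sep = "_")
print(df)
```

Because paste() is vectorised over whole columns, no row-by-row loop is needed.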
> But your other point is a tad religious in a sense. I happen to prefer
> learning a core language first, then looking at enhancement opportunities.
> But at some point, if teaching someone new who wants to focus on getting a
> job done simply but not necessarily repeatedly or in some ideal way, it is
> best to do things in a way that their mind flows better.
>
> Many things in the tidyverse are redundant with base R or just "fix"
> inconsistencies like making sure the first argument is always the same. But
> many add substantially to doing things in a more step-by-step manner.
>
> I do not worship the base language as it first came out or even as it has
> evolved. I do like to know what choices I have and pick and choose among
> them as needed. Of course a forum like this is more about base R than
> otherwise and I acknowledge that. Still, the ":=" operator is now recognized
> by the base R parser, even though its meaning is supplied by packages such
> as rlang and data.table. There is a new pipeline operator "|>" in base R.
> Some ideas, good or otherwise, do get in eventually.
>
> I started doing graphs using base R as in the plot() command. It was
> adequate but I wanted better. So I learned about lattice and various
> packages and eventually ggplot. I can now do things I barely imagined before
> and am still learning that there is much more I can do with packages
> underneath much of the magic and also additional packages layered above it,
> in some sense. So I do not approach that with an either-or mentality either.
>
> Note I am not really talking about just R. I have similar issues with other
> languages I program in such as Python. None of them were created
> fully-formed and many had to add huge amounts to adapt to additional wants
> and needs. Base R for me is often inadequate. But so what?
>
> The task being asked for in this thread, in isolation, indeed may not be
> done any better using packages. However, if it is part of a larger set of
> tasks that can be pipelined, it may well be, and I personally was wondering
> if there was a way in dplyr. There probably is a much better way than I
> assembled if I only knew about it, and if not, they may add this kind of
> indirection in a future release if deemed worthy of doing. I have gone back
> to programs I did years ago with humongous amounts of code using what I knew
> then and reduced them drastically now that I can tell a function to select,
> say, all my column names that end in .orig and apply a set of functions to
> them with output going to the base name followed by .mean and .sd and so on.
> All that can often be done in one or two lines of code where previously I
> had to do 18 near repetitions of each part and then another and another.
> That used a limited form of dynamism.
>
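The ".orig" pattern described above is presumably done with something like dplyr's across(). To show only the idea without extra packages, here is a rough base R sketch; the data frame, its column names and the choice of mean() and sd() are all invented for illustration.

```r
# Toy data; the column names are made up for this example.
scores <- data.frame(
  id          = 1:4,
  height.orig = c(150, 160, 170, 180),
  weight.orig = c(50, 60, 70, 80)
)

# Select every column whose name ends in ".orig" and strip the suffix
orig_cols  <- grep("\\.orig$", names(scores), value = TRUE)
base_names <- sub("\\.orig$", "", orig_cols)

# Apply each summary function to each selected column and name the
# results <base name>.<function name>
funs <- list(mean = mean, sd = sd)
stats <- unlist(lapply(names(funs), function(fn) {
  vals <- vapply(scores[orig_cols], funs[[fn]], numeric(1))
  setNames(vals, paste0(base_names, ".", fn))
}))

summary_row <- as.data.frame(as.list(stats))
print(summary_row)
```

Adding another statistic means adding one entry to funs, which is the kind of reduction in near-repeated code the message describes.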
> Be that as it may I think the requester has enough info and we can move on.
>
> -----Original Message-----
> From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> Sent: Friday, July 2, 2021 1:03 AM
> To: Avi Gross <avigross at verizon.net>; Avi Gross via R-help <r-help at r-project.org>; R-help at r-project.org
> Subject: Re: [R] concatenating columns in data.frame
>
> I use parts of the tidyverse frequently, but this post is the best argument
> I can imagine for learning base R techniques.
>
> On July 1, 2021 8:41:06 PM PDT, Avi Gross via R-help <r-help at r-project.org> wrote:
>> Micha,
>>
>> Others have provided ways in standard R so I will contribute a somewhat
>> odd solution using the dplyr and related packages in the tidyverse
>> including a sample data.frame/tibble I made. It requires newer versions
>> of R and other packages as it uses some fairly esoteric features
>> including "the big bang" and the new ":=" operator and more.
>>
>> You can use your own data with whatever columns you need, of course.
>>
>> The goal is to take data with umpteen columns and add an additional
>> column to the existing tibble that is the result of concatenating the
>> rowwise contents of a dynamically supplied vector of column names in
>> quotes. First we need something to work with so here is a sample:
>>
>> #--start
>> # load required packages, or a bunch at once!
>> library(tidyverse)
>>
>> # Pick how many rows you want. For a demo, 3 is plenty
>> N <- 3
>>
>> # Make a sample tibble with N rows and the following 4 columns
>> mydf <-
>>   tibble(alpha = 1:N,
>>          beta = letters[1:N],
>>          gamma = N:1,
>>          delta = month.abb[1:N])
>>
>> # show the original tibble
>> print(mydf)
>> #--end
>>
>> In flat text mode, here is the output:
>>
>>> print(mydf)
>> # A tibble: 3 x 4
>>   alpha beta  gamma delta
>>   <int> <chr> <int> <chr>
>> 1     1 a         3 Jan
>> 2     2 b         2 Feb
>> 3     3 c         1 Mar
>>
>> Now I want to make a function that is used instead of the mutate verb.
>> I made a weird one-liner that is a tad hard to explain so first let me
>> mention the requirements.
>>
>> It will take a first argument that is a tibble and in a pipeline this
>> would be passed invisibly.
>> The second required argument is a vector or list containing the names
>> of the columns as strings. A column can be re-used multiple times.
>> The third optional argument is what to name the new column with a
>> default if omitted.
>> The fourth optional argument allows you to choose a different separator
>> than "" if you wish.
>>
>> The function should be usable in a pipeline on both sides so it should
>> also return the input tibble with an extra column to the output.
>>
>> Here is the function:
>>
>> my_mutate <- function(df, columns, colnew="concatenated", sep=""){
>>   df %>%
>>     mutate( "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
>> }
>>
>> Yes, the above can be done inline as a long one-liner:
>>
>> my_mutate <- function(df, columns, colnew="concatenated", sep="")
>>   mutate(df, "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
>>
>> Here are examples of it running:
>>
>>
>>> choices <- c("beta", "delta", "alpha", "delta")
>>> mydf %>% my_mutate(choices, "me2")
>> # A tibble: 3 x 5
>>   alpha beta  gamma delta me2
>>   <int> <chr> <int> <chr> <chr>
>> 1     1 a         3 Jan   aJan1Jan
>> 2     2 b         2 Feb   bFeb2Feb
>> 3     3 c         1 Mar   cMar3Mar
>>> mydf %>% my_mutate(choices, "me2",":")
>> # A tibble: 3 x 5
>>   alpha beta  gamma delta me2
>>   <int> <chr> <int> <chr> <chr>
>> 1     1 a         3 Jan   a:Jan:1:Jan
>> 2     2 b         2 Feb   b:Feb:2:Feb
>> 3     3 c         1 Mar   c:Mar:3:Mar
>>> mydf %>% my_mutate(c("beta", "beta", "gamma", "gamma", "delta", "alpha"))
>> # A tibble: 3 x 5
>>   alpha beta  gamma delta concatenated
>>   <int> <chr> <int> <chr> <chr>
>> 1     1 a         3 Jan   aa33Jan1
>> 2     2 b         2 Feb   bb22Feb2
>> 3     3 c         1 Mar   cc11Mar3
>>> mydf %>% my_mutate(list("beta", "beta", "gamma", "gamma", "delta", "alpha"))
>> # A tibble: 3 x 5
>>   alpha beta  gamma delta concatenated
>>   <int> <chr> <int> <chr> <chr>
>> 1     1 a         3 Jan   aa33Jan1
>> 2     2 b         2 Feb   bb22Feb2
>> 3     3 c         1 Mar   cc11Mar3
>>> mydf %>% my_mutate(columns=list("alpha", "beta", "gamma", "delta",
>> +                                  "gamma", "beta", "alpha"),
>> +                    sep="/*/",
>> +                    colnew="NewRandomNAME"
>> +                    )
>> # A tibble: 3 x 5
>>   alpha beta  gamma delta NewRandomNAME
>>   <int> <chr> <int> <chr> <chr>
>> 1     1 a         3 Jan   1/*/a/*/3/*/Jan/*/3/*/a/*/1
>> 2     2 b         2 Feb   2/*/b/*/2/*/Feb/*/2/*/b/*/2
>> 3     3 c         1 Mar   3/*/c/*/1/*/Mar/*/1/*/c/*/3
>>
>> Does this meet your normal need? Just to show it works in a pipeline,
>> here is a variant:
>>
>> mydf %>%
>>   tail(2) %>%
>>   my_mutate(c("beta", "beta"), "betabeta") %>%
>>   print() %>%
>>   my_mutate(list("alpha", "betabeta", "gamma"),
>>             "buildson",
>>             "&")
>>
>> The above only keeps the last two rows of the tibble, makes a double
>> copy of "beta" under a new name, prints the intermediate result,
>> continues to make another concatenation using the variable created
>> earlier, then prints the result.
>>
>> Here is the run:
>>
>>> mydf %>%
>> +   tail(2) %>%
>> +   my_mutate(c("beta", "beta"), "betabeta") %>%
>> +   print() %>%
>> +   my_mutate(list("alpha", "betabeta", "gamma"),
>> +             "buildson",
>> +             "&")
>> # A tibble: 2 x 5
>>   alpha beta  gamma delta betabeta
>>   <int> <chr> <int> <chr> <chr>
>> 1     2 b         2 Feb   bb
>> 2     3 c         1 Mar   cc
>> # A tibble: 2 x 6
>>   alpha beta  gamma delta betabeta buildson
>>   <int> <chr> <int> <chr> <chr>    <chr>
>> 1     2 b         2 Feb   bb       2&bb&2
>> 2     3 c         1 Mar   cc       3&cc&1
>>
>> As to how the darn function works, that was a learning experience for
>> me to build using features I have not had occasion to use. If anyone
>> remains interested, read on.
>>
>> The following needs newish features:
>>
>> "{colnew}" := SOMETHING
>>
>> The colon-equals operator in newer dplyr/rlang can be used in an odd
>> way that allows the name of the variable to be in quotes and in curly
>> braces, akin to the way glue() does it. The variable colnew is
>> evaluated and substituted so the name used for the column is now
>> dynamic.
>>
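For comparison, base R reaches the same dynamic naming through `[[<-`, which evaluates its index. The sketch below is only an illustration with a made-up data frame and column name; it is not part of the function above.

```r
# Toy data frame; the names are illustrative only.
df <- data.frame(alpha = 1:3, beta = letters[1:3])

colnew <- "me2"  # the new column's name is held in a variable

# [[<- evaluates colnew, so the column is called "me2", not "colnew"
df[[colnew]] <- paste(df$beta, df$alpha, sep = "")
print(df)
```

Writing `df$colnew <- ...` instead would create a column literally named "colnew", which is the problem the "{colnew}" := syntax solves inside mutate().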
>> The function does a paste using this:
>>
>> !!!rlang::syms(columns)
>>
>> The problem is paste() wants multiple arguments and we have a single
>> argument that is either a vector or another kind of vector called a
>> list. The trick is to convert the vector into symbols then use "!!!" to
>> convert something like 'c("alpha", "beta", "gamma")' into something
>> more like 'alpha, beta, gamma' so that paste sees the columns as
>> multiple arguments to concatenate in vector fashion.
>>
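The splicing step can be inspected outside mutate(). The sketch below assumes the rlang package is installed; it builds the call with expr() so the spliced arguments are visible, then evaluates it against a small made-up data frame.

```r
library(rlang)  # assumes the rlang package is installed

columns <- c("alpha", "beta", "gamma")

# syms() turns each string into a symbol;
# !!! splices the symbols in as separate arguments
spliced <- expr(paste(!!!syms(columns), sep = ""))
print(spliced)

# Evaluating the call with a data frame as the environment
# gives the row-wise concatenation
demo <- data.frame(alpha = 1:3, beta = letters[1:3], gamma = 3:1)
result <- eval(spliced, demo)
print(result)
```

The printed call contains bare column names rather than quoted strings, which is why paste() ends up receiving the column vectors themselves.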
>> And, the function is not polished but I am sure you can all see some of
>> what is needed like checking the arguments for validity, including not
>> having a name for the new column that clashes with existing column
>> names, doing something sane if no columns to concatenate are offered
>> and so on.
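The checks listed above could be sketched roughly as follows. The helper name check_concat_args() and its error messages are hypothetical, chosen only for illustration, and stopping with an error is just one possible notion of "something sane".

```r
# Hypothetical validator; the name and messages are illustrative only.
check_concat_args <- function(df, columns, colnew) {
  columns <- unlist(columns)  # accept a list or a character vector
  if (length(columns) == 0)
    stop("no columns to concatenate were supplied")
  if (!is.character(columns))
    stop("column names must be character strings")
  missing_cols <- setdiff(columns, names(df))
  if (length(missing_cols) > 0)
    stop("unknown column(s): ", paste(missing_cols, collapse = ", "))
  if (colnew %in% names(df))
    stop("column '", colnew, "' already exists")
  invisible(columns)  # return the cleaned-up vector of names
}

# Toy data, made up for this example
d <- data.frame(alpha = 1:2, beta = c("a", "b"))
ok <- check_concat_args(d, list("alpha", "beta"), "combined")
print(ok)
```

A function like my_mutate() could call such a helper first and pass the returned vector on to rlang::syms().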
>>
>> Just showing a different approach. The base R methods are fine.
>>
>> - Avi
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces at r-project.org> On Behalf Of Micha Silver
>> Sent: Thursday, July 1, 2021 10:36 AM
>> To: R-help at r-project.org
>> Subject: [R] concatenating columns in data.frame
>>
>> I need to create a new data.frame column as a concatenation of existing
>> character columns. But the number and name of the columns to
>> concatenate needs to be passed in dynamically. The code below does what
>> I want, but seems very clumsy. Any suggestions how to improve?
>>
>>
>> df = data.frame("A"=sample(letters, 10), "B"=sample(letters, 10),
>>                 "C"=sample(letters,10), "D"=sample(letters, 10))
>>
>> # Which columns to concat:
>>
>> use_columns = c("D", "B")
>>
>>
>> UpdateCombo = function(df, use_columns) {
>>   use_df = df[, use_columns]
>>   combo_list = lapply(1:nrow(use_df), function(r) {
>>     r_combo = paste(use_df[r,], collapse="_")
>>     return(data.frame("Combo" = r_combo))
>>   })
>>   combo = do.call(rbind, combo_list)
>>
>>   names(combo) = "Combo"
>>
>>   return(combo)
>>
>> }
>>
>>
>> combo_col = UpdateCombo(df, use_columns)
>>
>> df_combo = do.call(cbind, list(df, combo_col))
>>
>>
>> Thanks
>>
>>
>> --
>> Micha Silver
>> Ben Gurion Univ.
>> Sde Boker, Remote Sensing Lab
>> cell: +972-523-665918
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Sent from my phone. Please excuse my brevity.
>
>
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>       Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k