I am trying to understand the reason for existence of the pipe operator, %>%, and when one should use it. It is my understanding that the operator sends the file to the left of the operator to the function immediately to the right of the operator:

    c(1:10) %>% mean

results in a value of 5.5 which is exactly the same as the result one obtains using the mean function directly, viz. mean(c(1:10)). What is the reason for having two syntactically different but semantically identical ways to call a function? Is one more efficient than the other? Does one use less memory than the other?

P.S. Please forgive what might seem to be a question with an obvious answer. I am a programmer dinosaur. I have been programming for more than 50 years. When I started programming in the 1960s the only pipe one spoke about was a bong.

John
I think there are probably a number of purposes for (advantages to?) the pipe operator. One is that it can avoid nested operations:

    plot(mean(sqrt(c(1:10))))  ## this is my silly example code

which can get difficult to read. This is arguably easier to read and understand:

    c(1:10) %>% sqrt() %>% mean() %>% plot()

As the chain of operations becomes longer, and as each "link" in the chain becomes more complex, the value of the pipe approach, compared to deep nesting in parentheses, increases, in my view.

--Chris Ryan

On Tue, Jan 3, 2023 at 11:48 AM Sorkin, John <jsorkin at som.umaryland.edu> wrote:
> I am trying to understand the reason for existence of the pipe operator, %>%, and when one should use it. [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
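The nested-versus-piped comparison in Chris's reply can be checked directly. The following is only a minimal sketch: it swaps in the base R pipe |> (assuming R 4.1.0 or later) so that no package has to be loaded, and it leaves out the plot() step so that the two results can be compared. The names x, nested and piped are made up for illustration.

```r
# Assumption: R >= 4.1.0, so the native pipe |> is available.
x <- c(1:10)

# Nested form: read from the inside out.
nested <- mean(sqrt(x))

# Piped form: read left to right, one step per link.
piped <- x |> sqrt() |> mean()

# Both forms evaluate the same calls, so the results match exactly.
identical(nested, piped)
```

With magrittr loaded, replacing |> by %>% in the piped line should give the same value.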
Dear John,

Some more experienced users might give you a different and more helpful answer, but I was not really convinced by the pipe operator until I tried it out, for the same reasons as you.

In my opinion, the pipe operator is there only to improve the readability of your code. Think about, e.g., format()ing or round()ing the example you gave: you start having a lot of nested functions and the code becomes difficult to read (because of lots of brackets, commas and so on, and it gets worse when adding arguments). The pipe operator makes it clearer.

An alternative to the pipe operator with good readability is creating intermediate objects, but then you create a lot of otherwise useless objects. Depending on the size of the objects, that could become problematic.

Somehow, I just ended up paraphrasing Wickham & Grolemund (https://r4ds.had.co.nz/pipes.html); they explain the advantages much better than I can.

In any case, once I started using it, I realized that all the pros for the pipe operator are real and now I like using it!

Best,
Ivan

Dr. Ivan CALANDRA
Imaging Lab, LEIBNIZ-ZENTRUM FÜR ARCHÄOLOGIE (LEIZA)
MONREPOS Archaeological Research Centre, Schloss Monrepos
56567 Neuwied, Germany
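Ivan's point about format()ing and round()ing the result can be sketched in three styles. This is an illustration under assumptions, not code from the thread: it uses the base R pipe |> (R 4.1.0 or later), and the variable names and the choice of two decimal places are made up.

```r
# Assumption: R >= 4.1.0 for the native pipe |>.
x <- c(1:10)

# 1. Nested calls: the extra arguments end up far from their functions.
nested <- format(round(mean(sqrt(x)), digits = 2), nsmall = 2)

# 2. Intermediate objects: readable, but m and r linger in the workspace.
m <- mean(sqrt(x))
r <- round(m, digits = 2)
stepwise <- format(r, nsmall = 2)

# 3. Pipe: each step keeps its own arguments and nothing extra is stored.
piped <- x |>
  sqrt() |>
  mean() |>
  round(digits = 2) |>
  format(nsmall = 2)

# All three styles produce the same formatted string.
c(nested = nested, stepwise = stepwise, piped = piped)
```

The intermediate-object version is the only one that leaves extra objects behind, which is the cost Ivan mentions for large data.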
The pipe shortens code and results in fewer variables because you do not have to save intermediate steps. Once you get used to the idea it is useful. Note that there is also the |> pipe that is part of base R. As far as I know it does the same thing as %>%, or at least at my level of programming I have not encountered a difference.

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Sorkin, John
Sent: Tuesday, January 3, 2023 11:49 AM
To: 'R-help Mailing List' <r-help at r-project.org>
Subject: [R] Pipe operator

[...]
To expand a little on Christopher's answer:

The short answer is that having the different syntaxes can lead to more readable code (when used properly).

Note that there are now 2 different (but somewhat similar) pipes available in R (there could be more in some package(s) that I don't know about, but I will just talk about the main 2).

The %>% pipe comes from the magrittr package, and many other packages now import that package. But you need to load the magrittr package, either directly or indirectly, before you can use that pipe. The magrittr pipe is a function call, so there is a small increase in time and memory for using it, but it is a small fraction of a second and a few bytes of memory, so you probably will not notice the increased usage.

The core R language now has a built-in pipe, |>, which is handled by the parser, so there are no extra function calls and you do not need to load any extra packages (though you need a somewhat recent version of R, 4.1.0 or later).

The built-in |> pipe is a little pickier: you need to include the parentheses in a function call, e.g. 1:10 |> mean(), whereas the magrittr pipe can work with that call or with the function without parentheses, e.g. 1:10 %>% mean or 1:10 %>% mean(). This makes %>% a little easier to work with anonymous functions. If the previous return value needs to be passed to an argument other than the first, then %>% uses "." and |> uses "_".

The magrittr package has additional versions of the pipe and some functions that wrap around common operators to make it easier to use them with pipes, so there are still advantages to loading that package if any of those are helpful.

For a simple case like your example, the pipe probably does not help with readability much, but it helps more as we string more function calls together. For example, here are 3 ways to compute the geometric mean of the data in a vector "x":

    exp(mean(log(x)))

    logx <- log(x)
    mlx <- mean(logx)
    exp(mlx)

    x |>
      log() |>
      mean() |>
      exp()

These all do the same thing, but the first option is read from the middle outward (which can be tricky) and is even more complicated if you use additional arguments to any of the functions. The second option reads top down, but requires creating intermediate variables. The last reads similarly to the second, but without the extra variables.

Spreading the series of function calls across multiple rows makes it easier to read and easily lets you insert a line like `print() |>` for debugging or checking intermediate results, and single lines can easily be commented out to skip that step. I have found myself using code like the following to compute a table, print it, and compute the proportions all in one step:

    table(f, g) |>
      print() |>
      prop.table()

The pipes also work very well with the tidyverse, or even with the tidy data ideas without those packages, where we use a single function for each change, e.g. start with a data frame, select a subset of the columns, filter to a subset of the rows, mutate a column, join to another data frame, then pass the final result to a modeling function like `lm` (and then pass that result to a summary function). This is nicely readable when each step is its own line.

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
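Greg's remark about passing the piped value to an argument other than the first can be sketched with the base pipe's "_" placeholder. This sketch makes some assumptions that are not in his message: it needs R 4.2.0 or later, the placeholder has to be supplied as a named argument, and the built-in mtcars data set and the model mpg ~ wt are chosen purely for illustration.

```r
# Assumption: R >= 4.2.0, where the native pipe accepts the `_` placeholder.
# The placeholder must be passed by name (here: data = _).

# Nested form, for comparison.
fit_nested <- lm(mpg ~ wt, data = subset(mtcars, cyl == 4))

# Piped form: the subset goes to lm()'s `data` argument, not its first one.
fit_piped <- mtcars |>
  subset(cyl == 4) |>
  lm(mpg ~ wt, data = _)

# Same data and same formula, so the coefficients agree.
all.equal(coef(fit_nested), coef(fit_piped))
```

With magrittr the last step would be written lm(mpg ~ wt, data = .) after %>% instead.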
John,

The topic has indeed been discussed here endlessly but new people still stumble upon it.

Until recently, the formal R language did not have built-in pipe functionality. Pipes were widely used through an assortment of packages, and there are quite a few variations on the theme, including different implementations.

Most existing code does use the operator %>% but there is now a built-in |> operator that is generally faster but is not as easy to use in a few cases.

Please forget the use of the word FILE here. Pipes are a form of syntactic sugar that generally is about the FIRST argument to a function. They are NOT meant to be used just for the trivial case you mention, where indeed there is an easy way to do things. Yes, they work in such situations. But consider a deeply nested expression like this:

    Result <- round(max(cos(x), 3.14159/4), 3)

There are MANY deeper nested expressions like this commonly used. The above can be written linearly as in

    Temp1 <- cos(x)
    Temp2 <- max(Temp1, 3.14159/4)
    Result <- round(Temp2, 3)

Translation: for some variable x, calculate the cosine, take the maximum value of it as compared to pi/4, and round the result to three decimal places. Not an uncommon kind of thing to do, and sometimes you can nest such things many layers deep and get hopelessly confused if it is not done somewhat linearly.

What pipes allow is to write this closer to the second way while not seeing or keeping any temporary variables around. The goal is to replace the FIRST argument to a function with whatever resulted as the value of the previous expression. That is often a vector or data.frame or list or any kind of object, but it can also be fairly complex, as in a list of lists of matrices.

So you can still start with cos(x) OR you can write this where the x is removed from within and leaves cos() empty:

    x %>% cos

or

    x |> cos()

With the older %>% pipe the parentheses after cos are optional if there are no additional arguments, but the new |> pipe requires them.

So continuing the above, using multiple lines, the pipe looks like:

    Result <-
      x %>%
      cos() %>%
      max(3.14159/4) %>%
      round(3)

This gives the same result but is arguably easier for some to read and follow. Nobody forces you to use it, and for simple cases most people don't.

There is a grouping of packages called the tidyverse that makes heavy use of pipes routine, as most or all of its functions are designed so that the first argument is the one normally piped to. It can be very handy to write code that says: read your data into a variable (often a data.frame or tibble) and PIPE IT to a function that renames some columns, and PIPE the resulting modified object to a function that retains only selected rows, and pipe that to a function that drops some of the columns, and pipe that to a function that groups the items or sorts them, and pipe that to a function that does a join with another object or generates a report or so many other things.

So the real answer is that piping is another WAY of doing things from a programmer's perspective. Underneath it all, it is mostly syntactic sugar, and the interpreter rearranges your code and performs the steps in what seems like a different order at times. Generally, you do not need to care.
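The "syntactic sugar" description can be made concrete for the base pipe by looking at what the parser produces. This is a small sketch that assumes R 4.1.0 or later and uses the base functions quote(), identical() and eval(); it applies to |> only, since %>% is an ordinary function call. The sample vector x is made up.

```r
# Assumption: R >= 4.1.0. The parser rewrites a |> chain into nested calls
# before anything is evaluated.
piped_expr  <- quote(x |> cos() |> max(3.14159/4) |> round(3))
nested_expr <- quote(round(max(cos(x), 3.14159/4), 3))

# Printing the piped expression shows the nested form.
print(piped_expr)

# The two unevaluated expressions are the same language object.
identical(piped_expr, nested_expr)

# Evaluating either one with some sample data gives the same value.
x <- c(0, 1, 2)
identical(eval(piped_expr), eval(nested_expr))
```

Because the rewrite happens at parse time, the piped and nested spellings of the base pipe run the same code.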
R is a functional language, hence the pipe operator is not needed. Also, it makes the code unreadable, as it is less obvious what the call stack looks like and what the arguments to the function calls are. It is relevant in a shell, for piping text streams.

If people cannot live without the pipe operator (and I wonder why you would want to add a level of complexity, as it obscures what the actual function calls are), please use R's internal one, as it is known to the parser and hence debugging etc. is better integrated.

Best,
Uwe Ligges
The simplest and best answer is "fashion".

In FSharp,

    > (|>);;
    val it: ('a -> ('a -> 'b) -> 'b)

The ability to turn f x y into y |> f x makes perfect sense in a programming language where Currying (representing a function of n arguments as a function of 1 argument that returns a function of n-1 arguments, similarly represented) is a way of life. It can result in code that is more readable. And it is pretty much unavoidable:

    let (|>) x f = f x

is definable in the language.

In programming languages like Erlang and R, where Currying is *not* a way of life, the matter is otherwise.

Really, it's all about whether you talk like Luke or like Yoda talk, it's not about what you say or efficiency or anything but perceived readability.
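The observation that a pipe is definable in the language also holds for R, which allows user-defined infix operators of the form %name%. The sketch below is illustrative only: the operator name %then% is hypothetical, it is not how magrittr implements %>%, and because R functions are not curried, extra arguments have to be wrapped in an anonymous function.

```r
# Hypothetical operator name, chosen for illustration: apply f to x.
`%then%` <- function(x, f) f(x)

# %...% operators are left-associative, so this is mean(sqrt(c(1:10))).
piped <- c(1:10) %then% sqrt %then% mean

# Without currying, an extra argument needs an explicit wrapper function.
rounded <- piped %then% (function(v) round(v, digits = 2))

identical(piped, mean(sqrt(c(1:10))))
```

The need for the wrapper in the last step is one way to see the difference from a curried language, where y |> f x works without any extra syntax.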
With 50 years of programming experience, just think about how useful the pipe operator is in shell scripting. The output of the previous call becomes the input of the next call... A genius idea from our beloved Unix convention...