Antonin Klima
2017-May-05 17:00 UTC
[Rd] A few suggestions and perspectives from a PhD student
Dear Sir or Madam,

I am in the second year of my PhD in bioinformatics, after taking my Master's in computer science, and have been using R heavily during my PhD. I have put together a list of features that, in my opinion, would be beneficial to add to R or could be improved. The first two are already implemented in packages, but because they are implemented as user-defined operators, their usefulness is greatly restricted. I hope you will find my suggestions interesting. If you find the time, I would welcome any feedback on whether you find the suggestions useful, or on why you think they should not be implemented. I would also welcome pointers to any existing features I might be unaware of that could solve the issues described below.

1) piping
Currently available in package magrittr, piping makes code more readable: the line starts at its natural starting point and continues with the functions that are applied, in order. The readability of several nested calls, each with a number of parameters, is almost zero; it is almost as if the reader had to work out the solution from scratch. A pipeline, in comparison, is very straightforward, especially together with point (2).

The package works rather well nonetheless; the shortcomings of piping not being native are not as severe as in point (2). Still, an intuitive symbol such as | would be helpful, and it sometimes bothers me that I have to parenthesize an anonymous function, which would probably not be required with a native pipe operator, much like it is not required in, e.g., lapply. That is,

1:5 %>% function(x) x+2

should be totally fine.

2) currying
Currently available in package Curry. The idea is that, given a function such as foo = function(x, y) x+y, one would like to write, for example, lapply(1:5, foo(3)) and have the interpreter figure out that foo(3) does not produce a value, but it can still produce a function: a function of y.
This would be most useful for the various apply functions, instead of writing function(x) foo(3,x). I suggest that currying would make code easier to write and more readable, especially when using apply functions. Such a feature might cause some confusion, especially among people unfamiliar with functional programming, but R already treats functions as first-class objects that can be passed as arguments, so it could be just fine. The confusion could also be addressed with special syntax, such as $foo(3) (or $foo(x=3)) for partial application. The current currying package has very limited usefulness because, being confined to the user-defined operator framework, it only rarely leads to less code or more readability. Compare:

$foo(x=3)    vs    foo %<% 3
goo = function(a,b,c)
$goo(b=3)    vs    goo %><% list(b=3)

Moreover, one would often like currying to have the highest priority. For example, when piping, data %>% foo %>% foo1 %<% 3 is grouped as ((data %>% foo) %>% foo1) %<% 3, because all %op% operators share the same precedence and group from left to right, whereas what one wants is data %>% foo %>% $foo1(x=3).

3) Code executable only when running the script itself
Whereas the first two suggestions borrow from Haskell and the like, this one would borrow from Python. I am building quite a complicated pipeline using S4 classes. After defining a class and its methods, I also define how to build an instance to my liking from my input data, using the methods just defined. So I end up with a list of command-line arguments to process and the code that creates the class instance from them. If I put this code in the class file, however, it runs whenever the file is sourced by the next step in the pipeline, which needs the previous class definitions.

A feature such as Python's if __name__ == "__main__": would thus be useful. As it is, I had to create run scripts as separate files. This is actually not so terrible, given that a class and its methods often span a few hundred lines, but still.
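For point 3, a commonly used base-R idiom (a convention, not a dedicated language feature) is to test sys.nframe(), which is 0 for code run at the top level of a script and greater than 0 for code evaluated from inside source(). A minimal sketch, assuming a hypothetical class file in which build_label() stands in for the class definitions and main() for the command-line handling:

```r
## Hypothetical class file, e.g. datasetSingle.R.
## Definitions that later pipeline steps may source() without side effects:
build_label <- function(name) paste0("dataset: ", name)

## Script-only work, such as processing command-line arguments:
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  name <- if (length(args) > 0) args[[1]] else "default"
  cat(build_label(name), "\n", sep = "")
}

## Rough analogue of Python's `if __name__ == "__main__":`.
## With `Rscript datasetSingle.R`, top-level code runs with no active
## function frames, so sys.nframe() is 0. When the file is read via
## source(), the code is evaluated from inside source(), so it is > 0.
if (sys.nframe() == 0L) {
  main()
}
```

Other pipeline steps can source() such a file to get the definitions without triggering main(), whereas running it with Rscript executes main().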
4) non-exported global variables
I also find it lacking that I seem to be unable to create constants that do not get passed to files that source the class definition. That is, if class1 defines a global constant CONSTANT=3 and class2 sources class1, class2 also gets the constant. This 1) clutters the namespace when running code interactively, and 2) potentially overwrites constants in case of a name clash. Some kind of export/non-export syntax for variables, symbolic imports, or namespaces would be useful. I know that if I converted the code to a package I would get at least something like a namespace, but still.

I understand that, in general, the variable cannot simply be left out, as the functions will generally rely on it (otherwise it would not have to be there). But one could consider hiding it in an implicit per-file namespace, for example.

5) S4 methods with the same name for different classes
Say I have an S4 class called datasetSingle, and another S4 class called datasetMulti, which gathers up a number of datasetSingle objects and adds some extra functionality on top. The datasetSingle class may have a method replicates that returns a named vector assigning a replicate number to each experiment name of the dataset. I would also like a function with the same name for the datasetMulti class that returns a data frame or list covering the replicate numbers of all included datasets.

But then I need to call setGeneric for the method. If I call setGeneric before both implementations, the second call resets the generic, losing the definition of 'replicates' for datasetSingle. Skipping it in the code for datasetMulti means that 1) I have to remember that I defined the function for datasetSingle, and 2) if I remove the function or change its name in datasetSingle, I also have to change the datasetMulti class file.
Moreover, if I want a different generic for the datasetMulti version, I have to change it not in the datasetMulti class file but in the datasetSingle file, where it might not make much sense. In this case, I wanted an additional argument 'datasets' that would return replicates only for the specified datasets rather than for all of them.

I made a wrapper that circumvents the first issue, but the second issue is not easy to circumvent.

6) Many parameters freeze S4 method calls
If I specify more than about 6 parameters for an S4 method, I often get a "freeze" on the method call. The process eats up a lot of memory before entering the call, after which it executes the call as normal (if it did not run out of memory, or I did not run out of patience). Subsequent calls of the method do not incur this overhead. The memory taken can be in the gigabytes, and the time in minutes. I suspect this might be due to generating an entry in the call table for each accepted signature. It can be circumvented, but it is surely not the behaviour one would expect.

7) Default values for S4 methods
It seems that it is not possible to set up default parameters for an S4 method in the usual way, definition = function(x, y=5). I resorted to making class unions with "missing" for the signatures, with the method body starting with if(missing(param)) param=DEFAULT_VALUE, but this certainly does not improve readability or ease of coding.

Thank you for your time if you have read this far. :) I look forward to any answer.

Yours sincerely,
Antonin Klima
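For points 4, 5 and 7, a minimal sketch of possible base-R workarounds, assuming class files that may be sourced in any order. A constant can be kept private by creating it inside local() together with the function that uses it. The generic is created only if isGeneric() reports that it does not exist yet, and it takes a ... argument so that the datasetMulti method can add its own datasets argument. When a method's formal arguments differ from the generic's, setMethod() wraps the method in a local function, so an ordinary default such as datasets = NULL applies. The slot layouts, scale_by_constant() and the sample objects are illustrative assumptions, not part of the original code.

```r
library(methods)

## Point 4: a file-private constant. CONSTANT lives only in the environment
## created by local(), which the returned function keeps as its enclosure,
## so sourcing this file does not put CONSTANT in the caller's workspace.
## (scale_by_constant is a hypothetical name.)
scale_by_constant <- local({
  CONSTANT <- 3
  function(x) x * CONSTANT
})

## Point 5: create the generic only if it does not exist yet, so that every
## class file can carry the same guard. The `...` lets individual methods
## accept extra arguments; dispatch is restricted to `object`.
if (!isGeneric("replicates")) {
  setGeneric("replicates",
             function(object, ...) standardGeneric("replicates"),
             signature = "object")
}

## Illustrative classes; the slot layouts are assumptions.
setClass("datasetSingle", slots = c(reps = "integer"))
setClass("datasetMulti", slots = c(sets = "list"))

setMethod("replicates", "datasetSingle", function(object, ...) object@reps)

## Point 7: an extra argument with an ordinary default. Since this method's
## formals differ from the generic's, setMethod() wraps it in a local
## function, so `datasets = NULL` is used whenever the caller omits it.
setMethod("replicates", "datasetMulti",
          function(object, datasets = NULL, ...) {
            sets <- object@sets
            if (!is.null(datasets)) sets <- sets[datasets]
            lapply(sets, replicates)
          })

## Usage
s1 <- new("datasetSingle", reps = c(exp1 = 1L, exp2 = 2L))
s2 <- new("datasetSingle", reps = c(exp3 = 1L))
m  <- new("datasetMulti", sets = list(a = s1, b = s2))
replicates(m)                  # list with elements a and b
replicates(m, datasets = "b")  # list with element b only
```

With this layout each class file owns its own method, and the extra argument lives only in the datasetMulti method rather than in the shared generic.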
Ista Zahn
2017-May-05 17:55 UTC
[Rd] A few suggestions and perspectives from a PhD student
On Fri, May 5, 2017 at 1:00 PM, Antonin Klima <antonink at idi.ntnu.no> wrote:
> The first two are already implemented in packages, but given that it is implemented as user-defined operators, it greatly restricts its usefulness.

Why do you think being implemented in a contributed package restricts the usefulness of a feature?

> 1) piping
> Currently available in package magrittr, piping makes the code better readable by having the line start at its natural starting point, and following with functions that are applied - in order. The readability of several nested calls with a number of parameters each is almost zero, it's almost as if one would need to come up with the solution himself. Pipeline in comparison is very straightforward, especially together with the point (2).

You may be surprised to learn that not everyone thinks pipes are a good idea. Personally I see some advantages, but there is also a big downside, which is that they mess up the call stack and make tracking down errors via traceback() more difficult.

There is a simple alternative to pipes already built into R that gives you some of the advantages of %>% without messing up the call stack.
Using Hadley's famous "little bunny foo foo" example:

foo_foo <- little_bunny()

## nesting (it is rough)
bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mouse
  ),
  on = head
)

## magrittr
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mouse) %>%
  bop(on = head)

## regular R assignment
foo_foo -> .
hop(., through = forest) -> .
scoop(., up = field_mouse) -> .
bop(., on = head)

This is more limited than magrittr's %>%, but it gives you a lot of the advantages without the disadvantages.

> The package here works rather good nevertheless, the shortcomings of piping not being native are not quite as severe as in point (2). Nevertheless, an intuitive symbol such as | would be helpful, and it sometimes bothers me that I have to parenthesize anonymous function, which would probably not be required in a native pipe-operator, much like it is not required in f.ex. lapply. That is,
> 1:5 %>% function(x) x+2
> should be totally fine

That seems pretty small-potatoes to me.

> 2) currying
> Currently available in package Curry. The idea is that, having a function such as foo = function(x, y) x+y, one would like to write for example lapply(foo(3), 1:5), and have the interpreter figure out ok, foo(3) does not make a value result, but it can still give a function result - a function of y. This would be indeed most useful for various apply functions, rather than writing function(x) foo(3,x).

You can already do

lapply(1:5, foo, y = 3)

(assuming that the first argument to foo is named "y").

I'm stopping here since I don't have anything useful to say about your subsequent points.

Best,
Ista
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
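The base-R suggestions in the reply above can be tried without any package. In this runnable sketch, little_bunny(), hop(), scoop() and bop() are hypothetical stand-ins that only record each step (string arguments replace the undefined forest, field_mouse and head), so the nested form and the -> chain can be compared. foo is the two-argument function from the original message, and partial_apply() is a hypothetical helper that roughly approximates the proposed $foo(x=3) syntax.

```r
## Hypothetical stand-ins for the "little bunny" functions.
little_bunny <- function() character(0)
hop   <- function(bunny, through) c(bunny, paste("hop through", through))
scoop <- function(bunny, up)      c(bunny, paste("scoop up", up))
bop   <- function(bunny, on)      c(bunny, paste("bop on", on))

foo_foo <- little_bunny()

## Nesting: read inside out.
nested <- bop(scoop(hop(foo_foo, through = "forest"), up = "field mice"),
              on = "head")

## Right assignment to `.`: read top to bottom.
foo_foo -> .
hop(., through = "forest") -> .
scoop(., up = "field mice") -> .
bop(., on = "head") -> chained

## Named arguments after FUN are passed on by lapply(), so this calls
## foo(1L, x = 3), foo(2L, x = 3), ..., fixing x and varying y.
foo <- function(x, y) x + y
res <- lapply(1:5, foo, x = 3)
unlist(res)

## Hand-rolled partial application: store the fixed arguments and
## combine them with the remaining ones at call time.
partial_apply <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}
add3 <- partial_apply(foo, x = 3)
sapply(1:5, add3)
```

Both unlist(res) and sapply(1:5, add3) give 4 5 6 7 8. Because the fixed argument is matched by name, x = 3 mirrors foo(3, x) from the original message; for this symmetric foo, fixing y = 3 as in the reply gives the same numbers.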
Gabor Grothendieck
2017-May-05 20:33 UTC
[Rd] A few suggestions and perspectives from a PhD student
Regarding the anonymous-function-in-a-pipeline point, one can already do this, which does use brackets, but even so it involves fewer characters than the example shown. Here { . * 2 } is basically a lambda whose argument is dot. Would this be sufficient?

library(magrittr)

1.5 %>% { . * 2 }
## [1] 3

Regarding currying, note that with magrittr Ista's code could be written as:

1:5 %>% lapply(foo, y = 3)

or, at the expense of slightly more verbosity:

1:5 %>% Map(f = . %>% foo(y = 3))
-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Antonin Klima
2017-May-08 12:08 UTC
[Rd] A few suggestions and perspectives from a PhD student
Thanks for the answers. I'm aware of the '.' option; I just wanted to give a very simple example. But the use of the lapply '...' parameter had eluded me, so thanks for enlightening me.

What do you mean by messing up the call stack? As far as I understand it, piping should translate into the same code as deep nesting, so I only see a tiny downside for debugging here, with no loss of time or space efficiency. Your example, on the other hand, carries a chance of inadvertent errors, since a variable is being reused and no one checks for me whether it is correctly passed between the lines, and I have to specify the variable every single time. For me, that solution is clearly inferior.

Too bad you didn't find my other comments interesting, though.

> Why do you think being implemented in a contributed package restricts
> the usefulness of a feature?

I guess it depends on your philosophy. It may not restrict it per se, although it would make a lot of sense to me to reuse the bash-style '|' and have a shorter, more readable version. One also takes on an extra dependency on a package for an item that fits the language so well that it should be part of it. It is without doubt my most used operator, at least: going through some of my folders, I found 101 uses in 750 lines, and 132 uses in 3303 lines. I would compare it to a computer game that is really good with a fan-created mod, but lacking otherwise. :)

So to me it makes sense that if there is no doubt that a feature improves the language, and especially if people already use it extensively through a package, it should be part of the "standard". The question is whether it is indeed very popular, and whether you share my view. But that is now up to you; I just wanted to point it out, I guess.
Best regards,
Antonin