Titus von der Malsburg
2009-May-12 10:18 UTC
[R] What's the best way to tell a function about relevant fields in data frames
Hi list,

I have a function that detects saccadic eye movements in a time series
of eye positions sampled at a rate of 250 Hz. This function needs
three vectors: x-coordinate, y-coordinate, trial-id. This information
is usually contained in a data frame that also has some other fields.
The names of the fields are not standardized.

> head(eyemovements)
        time     x      y trial
51 880446504 53.18 375.73     1
52 880450686 53.20 375.79     1
53 880454885 53.35 376.14     1
54 880459060 53.92 376.39     1
55 880463239 54.14 376.52     1
56 880467426 54.46 376.74     1

There are now several possibilities for the signature of the function:

1. Passing the columns separately:

    detect(eyemovements$x, eyemovements$y, eyemovements$trial)

  or:

    with(eyemovements,
         detect(x, y, trial))

2. Passing the data frame plus the names of the fields:

    detect(eyemovements, "x", "y", "trial")

3. Passing the data frame plus a formula specifying the relevant
   fields:

    detect(eyemovements, ~x+y|trial)

4. Passing a formula and getting the data from the environment:

    with(eyemovements,
         detect(~x+y|trial))

I saw instances of all those variants (and others) in the wild.

Is there a canonical way to tell a function which fields in a data
frame are relevant? What other alternatives are possible? What are
the pros and cons of the alternatives?

Thanks, Titus
Zeljko Vrba
2009-May-12 10:26 UTC
[R] What's the best way to tell a function about relevant fields in data frames
On Tue, May 12, 2009 at 12:18:59PM +0200, Titus von der Malsburg wrote:
> Is there a canonical way to tell a function which fields in a data
> frame are relevant? What other alternatives are possible? What are
> the pros and cons of the alternatives?

Why not simply rearrange your data frames to have standardized column
names (see the names() function), and write functions that operate on
the standardized format? The change need not be destructive; you can
first make a copy of the data.

If all data frames have the same sequence of variables (time, x, y),
you can just use indices to refer to the columns, e.g. 1 corresponds to
the time variable.
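The copy-and-rename idea above could be sketched roughly as follows. The helper name standardize(), its arguments, and the non-standard column names in the toy data are all made up for illustration; they do not come from the thread.

```r
# Hypothetical helper: return a copy of `data` that holds only the relevant
# columns, renamed to the standard names x, y and trial.
# The original data frame is not modified.
standardize <- function(data, x, y, trial) {
  out <- data[, c(x, y, trial)]
  names(out) <- c("x", "y", "trial")
  out
}

# Toy data with non-standard column names (invented for this example).
raw <- data.frame(time     = c(880446504, 880450686, 880454885),
                  xpos     = c(53.18, 53.20, 53.35),
                  ypos     = c(375.73, 375.79, 376.14),
                  trial_id = c(1, 1, 1))

std <- standardize(raw, x = "xpos", y = "ypos", trial = "trial_id")
names(std)
```

A function written against the standardized format can then refer to std$x, std$y and std$trial, while raw keeps its original names.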
Gabor Grothendieck
2009-May-12 10:33 UTC
[R] What's the best way to tell a function about relevant fields in data frames
You could define a generic detect(obj, ...) that dispatches (using S3) to:

    detect.formula(fo, data)
    detect.data.frame(data)
    detect.default(x, y, trial)

where the first two call the third, thereby modeling it on lm, a common
approach, and giving the user a choice of interface.
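A minimal sketch of that S3 layout might look like the following. Only the method signatures come from the message above; the bodies are assumptions. In particular, detect.default() is a placeholder that just bundles its inputs, the data frame method assumes columns named x, y and trial, and the formula method assumes a formula that names exactly three variables in the order x, y, trial.

```r
# Generic: dispatches on the class of the first argument.
detect <- function(obj, ...) UseMethod("detect")

# Workhorse taking the three vectors directly. Placeholder body: it only
# checks the lengths and bundles the inputs; no saccade detection is done.
detect.default <- function(x, y, trial, ...) {
  stopifnot(length(x) == length(y), length(y) == length(trial))
  list(x = x, y = y, trial = trial)
}

# Data frame method: assumes standard column names x, y and trial.
detect.data.frame <- function(data, ...) {
  detect.default(data$x, data$y, data$trial, ...)
}

# Formula method: assumes the formula names exactly three variables,
# in the order x, y, trial (e.g. ~x+y|trial), and looks them up in `data`.
detect.formula <- function(fo, data, ...) {
  vars <- all.vars(fo)
  stopifnot(length(vars) == 3)
  detect.default(data[[vars[1]]], data[[vars[2]]], data[[vars[3]]], ...)
}

# Toy data using the first rows shown in the original post.
eyemovements <- data.frame(x     = c(53.18, 53.20, 53.35),
                           y     = c(375.73, 375.79, 376.14),
                           trial = c(1, 1, 1))

r1 <- detect(eyemovements$x, eyemovements$y, eyemovements$trial)
r2 <- detect(eyemovements)
r3 <- detect(~x+y|trial, data = eyemovements)
```

All three calls end up in detect.default(), so the detection logic lives in one place and the other methods only translate their input.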
Duncan Murdoch
2009-May-12 10:55 UTC
[R] What's the best way to tell a function about relevant fields in data frames
On 12/05/2009 6:18 AM, Titus von der Malsburg wrote:
> There are now several possibilities for the signature of the function:
>
> 1. Passing the columns separately:
>
>     detect(eyemovements$x, eyemovements$y, eyemovements$trial)
>
>   or:
>
>     with(eyemovements,
>          detect(x, y, trial))

I'd choose this one, with one modification described below.

> 2. Passing the data frame plus the names of the fields:
>
>     detect(eyemovements, "x", "y", "trial")

I think this is too inflexible. What if you want to temporarily change
one variable? You don't want to have to create a whole new data frame;
it's better to just substitute in another variable.

> 3. Passing the data frame plus a formula specifying the relevant
>    fields:
>
>     detect(eyemovements, ~x+y|trial)
>
> 4. Passing a formula and getting the data from the environment:
>
>     with(eyemovements,
>          detect(~x+y|trial))

Rather than 3 or 4, I would use the more common idiom

    detect(~x+y|trial, data=eyemovements)

(and the formula might be x+y~trial). But I think the formula interface
is too general for your needs. What would ~x+y+z|trial mean?

I'd suggest something like 1 but using the convention plot.default()
uses, where you have x and y arguments, but y can be skipped if x is a
matrix/dataframe/formula/list. It uses the xy.coords() function to do
the extraction.

Duncan Murdoch
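The plot.default() convention described above could be sketched like this. The signature of detect() and its placeholder body are assumptions made for illustration; only the use of xy.coords() for the extraction follows the suggestion.

```r
# Hypothetical signature: x and y as in plot.default(), plus the trial id.
# y may be omitted when x already carries both coordinates, for example a
# two-column data frame or matrix. grDevices::xy.coords() does the extraction.
detect <- function(x, y = NULL, trial) {
  xy <- grDevices::xy.coords(x, y)
  stopifnot(length(trial) == length(xy$x))
  # Placeholder body: returns the extracted inputs instead of saccades.
  list(x = xy$x, y = xy$y, trial = trial)
}

# Toy data using the first rows shown in the original post.
eyemovements <- data.frame(x     = c(53.18, 53.20, 53.35),
                           y     = c(375.73, 375.79, 376.14),
                           trial = c(1, 1, 1))

# Separate vectors.
a <- detect(eyemovements$x, eyemovements$y, trial = eyemovements$trial)

# Two-column data frame: the first column is taken as x, the second as y.
b <- detect(eyemovements[, c("x", "y")], trial = eyemovements$trial)

# Temporarily substituting one variable without building a new data frame.
shifted <- detect(eyemovements$x + 1, eyemovements$y,
                  trial = eyemovements$trial)
```

The last call shows the flexibility argued for in the reply to option 2: any vector of the right length can stand in for a column.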