E Hofstadler
2011-Apr-01 11:08 UTC
[R] programming: telling a function where to look for the entered variables
Hi there,

Could someone help me with the following programming problem..?

I have written a function that works for my intended purpose, but it is quite closely tied to a particular dataframe and the names of the variables in this dataframe. However, I'd like to use the same function for different dataframes and variables. My problem is that I'm not quite sure how to tell my function in which dataframe the entered variables are located.

Here's some reproducible data and the function:

# create reproducible data
set.seed(124)
xvar <- sample(0:3, 1000, replace = T)
yvar <- sample(0:1, 1000, replace=T)
zvar <- rnorm(100)
lvar <- sample(0:1, 1000, replace=T)
Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
Fulldf$lvar <- factor(lvar, labels=c("yes","no"))

and here's the function in the form that it currently works: from a subset of the dataframe Fulldf, a contingency table is created (in my actual data, several other operations are then performed on that contingency table, but these are not relevant for the problem in question, therefore I've deleted it).

# function as it currently works: tailored to a particular dataframe (Fulldf)

myfunct <- function(subgroup){
  # enter a particular subgroup for which the contingency table should be
  # calculated (i.e. a particular value of the factor lvar)
  Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
  # restrict dataframe to given subgroup and two columns of the original dataframe
  Data.tmp <- na.omit(Data.tmp) # exclude missing values
  indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
  return(indextable)
}

Since I need to use the function with different dataframes and variable names, I'd like to be able to tell my function the name of the dataframe and variables it should use for calculating the index. This is how I tried to modify the first part of the function, but it didn't work:

# function as I would like it to work: independent of any particular
# dataframe or variable names (doesn't work)

myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
  # enter the subgroup, the variable names to be used and the dataframe
  # in which they are found
  Data.tmp <- subset(dataframe, lvarname==subgroup, select=c("xvar", deparse(substitute(yvarname))))
  # trying to subset the given dataframe for the given subgroup of the given
  # variable. The variable "xvar" happens to have the same name in all
  # dataframes, but the variable yvarname has different names in the
  # different dataframes
  Data.tmp <- na.omit(Data.tmp)
  indextable <- table(Data.tmp$xvar, Data.tmp$yvarname) # create the contingency table on the basis of the entered variables
  return(indextable)
}

calling

myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)

results in the following error:

Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected

My feeling is that R doesn't know where to look for the entered variables (lvar, yvar), but I'm not sure how to solve this problem. I tried using with() and even attach() within the function, but that didn't work.

Any help is greatly appreciated.

Best,
Esther

P.S.:
Are there books that elaborate programming in R for beginners -- and I mean things like how to best use vectorization instead of loops and general "best practice" tips for programming. Most of the books I've been looking at focus on applying R for particular statistical analyses, and only comparably briefly deal with more general programming aspects. I was wondering if there's any books or tutorials out there that cover the latter aspects in a more elaborate and systematic way...?
Nick Sabbe
2011-Apr-01 11:34 UTC
[R] programming: telling a function where to look for the entered variables
See the warning in ?subset.
Passing the column name of lvar is not the same as passing the 'contextual column' (as I coin it in these circumstances).
You can solve it by indeed using [] instead.

For my own comfort, here is the relevant line from your original function:

Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))

Which should become something like (untested but should be close):

Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]

This should be a lot easier to translate based on column names, as the column names are now used as such.

HTH,

Nick Sabbe
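To make the bracket suggestion concrete, here is a minimal sketch, assuming the Fulldf layout from the original post (rebuilt here with data.frame() and explicit factor levels so the block runs on its own; that rebuild, and the variables lcol and ycol, are assumptions for illustration and not part of the thread). It compares subset() with bracket indexing, then shows that the bracket form still works when the column names are held in character variables.

```r
# Rebuild a data frame shaped like the one in the original post (assumed layout).
set.seed(124)
n <- 1000
Fulldf <- data.frame(
  xvar = factor(sample(0:3, n, replace = TRUE), levels = 0:3,
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, n, replace = TRUE), levels = 0:1,
                labels = c("area1", "area2")),
  zvar = rnorm(n),
  lvar = factor(sample(0:1, n, replace = TRUE), levels = 0:1,
                labels = c("yes", "no"))
)

# subset() looks up lvar inside the data frame (non-standard evaluation).
by_subset <- subset(Fulldf, lvar == "yes", select = c("xvar", "yvar"))

# Bracket indexing uses the column names as ordinary strings.
by_bracket <- Fulldf[Fulldf[, "lvar"] == "yes", c("xvar", "yvar")]

# Both approaches should select the same rows and columns.
print(all.equal(by_subset, by_bracket))

# Because the names are plain strings, they can live in variables
# (lcol and ycol are hypothetical names used only for this sketch).
lcol <- "lvar"
ycol <- "yvar"
by_name <- Fulldf[Fulldf[[lcol]] == "yes", c("xvar", ycol)]
print(all.equal(by_bracket, by_name))
```

Once the column names are ordinary character values, turning them into function arguments is mostly a matter of replacing the literals with parameters.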
E Hofstadler
2011-Apr-01 12:28 UTC
[R] programming: telling a function where to look for the entered variables
Thanks Nick and Juan for your replies.

Nick, thanks for pointing out the warning in subset(). I'm not sure, though, that I understand the example you provided -- because despite using subset() rather than bracket notation, the original function (myfunct) does what is expected of it. The problem I have is with the second function (myfunct.better), where variable names + dataframe are not fixed within the function but passed to the function when calling it -- and even with bracket notation I don't quite manage to tell R where to look for the columns that relate to the entered column names. (But then perhaps I misunderstood you.)

This is what I tried (using bracket notation):

myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
  Data.tmp <- dataframe[dataframe[,deparse(substitute(lvarname))]==subgroup, c("xvar",deparse(substitute(yvarname)))]
}

but this creates an empty contingency table only -- perhaps because my use of deparse() is flawed (I think what is converted into a string is "lvarname" and "yvarname", rather than the column names that these two function-variables represent in the dataframe)?
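A small experiment may help check what deparse(substitute()) actually captures. This is only a sketch: show_name and outer_fn are hypothetical helper names introduced here, not functions from the thread. It prints the string produced when an argument is passed unquoted, when it is passed as a quoted string, and when it is forwarded through a second function.

```r
# Hypothetical helper: return the text of whatever expression the caller typed.
show_name <- function(arg) deparse(substitute(arg))

# Unquoted symbol: the argument is never evaluated, so lvar need not exist.
unquoted <- show_name(lvar)

# Quoted string: deparse() keeps the quote characters in the result.
quoted <- show_name("lvar")

# Forwarding through another function: substitute() only sees the
# expression written at the immediate call site, here the symbol colname.
outer_fn <- function(colname) show_name(colname)
forwarded <- outer_fn(lvar)

print(unquoted)
print(quoted)
print(forwarded)
print(nchar(c(unquoted, quoted, forwarded)))
```

If the results match what this sketch suggests, the string depends on how the function is called, which makes deparse(substitute()) fragile for selecting columns. Passing the column names as plain character strings and indexing with them directly avoids the question altogether.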
E Hofstadler
2011-Apr-01 12:54 UTC
[R] programming: telling a function where to look for the entered variables
2011/4/1 Nick Sabbe <nick.sabbe at ugent.be>:
> This should be a version that does what you want.

Indeed it does, thank you very much!

> Because you named the variable lvarname, I assumed you were already passing
> "lvar" instead of trying to pass lvar (without the quotes), which is in no
> way a 'name'.

Sorry about that, I can see how my variable names were somewhat confusing.

Many thanks once again!
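The version Nick refers to is not reproduced in this thread, so the following is not his code. It is a minimal sketch of one way a data-frame-independent function could look, assuming column names are passed as quoted strings as he describes. The function name make_table, the xvarname parameter, the second data frame Otherdf, and the data.frame() rebuild of Fulldf are all assumptions added for illustration.

```r
# Rebuild a data frame shaped like the one in the original post (assumed layout).
set.seed(124)
n <- 1000
Fulldf <- data.frame(
  xvar = factor(sample(0:3, n, replace = TRUE), levels = 0:3,
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, n, replace = TRUE), levels = 0:1,
                labels = c("area1", "area2")),
  zvar = rnorm(n),
  lvar = factor(sample(0:1, n, replace = TRUE), levels = 0:1,
                labels = c("yes", "no"))
)

# Hypothetical generalized function: all column names arrive as strings.
make_table <- function(dataframe, subgroup, lvarname, yvarname,
                       xvarname = "xvar") {
  # Fail early if the data frame or any requested column is missing.
  stopifnot(is.data.frame(dataframe),
            all(c(lvarname, yvarname, xvarname) %in% names(dataframe)))
  # which() drops rows where the comparison is NA.
  keep <- which(dataframe[[lvarname]] == subgroup)
  # Restrict to the subgroup and the two columns, then exclude missing values.
  Data.tmp <- na.omit(dataframe[keep, c(xvarname, yvarname)])
  # [[ ]] accepts a string; $ would look for a column literally named "yvarname".
  table(Data.tmp[[xvarname]], Data.tmp[[yvarname]])
}

tab <- make_table(Fulldf, "yes", lvarname = "lvar", yvarname = "yvar")
print(tab)

# The same function applied to a data frame with different column names.
Otherdf <- data.frame(xvar = Fulldf$xvar, region = Fulldf$yvar,
                      flag = Fulldf$lvar)
tab2 <- make_table(Otherdf, "no", lvarname = "flag", yvarname = "region")
print(tab2)
```

Using `[[` with a string in the table() call matters here: `Data.tmp$yvarname` in the original attempt asks for a column literally called yvarname, whereas `Data.tmp[[yvarname]]` uses the value stored in the argument.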