Dear R experts,

I am sorry for sending this email again. I imagine yesterday and maybe today have been very busy days with the release of R v 2.7.0. I join all the R users who are very grateful for your constant work and efforts, especially knowing that you are doing this for the sake of science, without getting any compensation for it.

Having written that, I decided to send the email below again, in case it was forgotten; or maybe I am missing something very basic?

I am using version 2.7.0 on Windows XP.

Start of yesterday's email:

I am trying to optimize my script, because right now it requires a lot of memory. The goal is to generate four plots on one page. Every plot corresponds to the means and SEMs calculated for a given variable at different days. To obtain the means and SEMs I apply the 'by' function. The way I have done it so far is like this:

Read the data.
Generate a summary of the mean and SEM of a variable at every day.
Plot the mean and SEM of that variable.

Repeat the same process for the other 3 variables.

I tried to optimize the code by using a for loop; the code is below.

#Reading the data
dato<-read.csv('mydata.csv')
names(dato)<-c("id","day","tx","var1","var2","var3","var4")
dato<-dato[,1:7]

#Specify variables to be plotted
variable<-c('var1','var2','var3','var4')

#Define parameters of the window and panel: margins, number of plots in the panel
windows(height=9, width=9, rescale='fixed')
par(mfrow=c(2,2),xpd=T, bty='l', omi=c(0.8,0.25,1.2,0.15), mai=c(1.1,0.8,0.3,0.3))

for (k in variable) {

  dat<-dato[!is.na(k),]

  summ<-by(dat,dat[,c("tx","day")], function(x) {
    mn<-mean(x$k)
    std<-sd(x$k)
    n<-length(x$k)
    se<-std/sqrt(n)
    lowb<-mn-se
    upb<-mn+se
    data.frame(tx=x$tx[1],day=x$day[1],mn=mn,std=std,lowb=lowb,upb=upb,se=se)
  })
  summ<-do.call("rbind",summ)

  #Defining x and y axis ranges
  xmax<-unique(max(summ$day,na.rm=TRUE))
  xmin<-unique(min(summ$day,na.rm=TRUE))

  yaxmin<-unique(min(summ$lowb))
  yaxmax<-unique(max(summ$upb))

  plot(1,1,type='n',xlab='Day',xlim=c(xmin,xmax),ylim=c(yaxmin,yaxmax),
       ylab=k, las=1,cex.lab=1,xaxp=c(xmin,xmax,diff(range(c(xmin,xmax)))))
  points(summ$day,summ$mn)

}

Here 'variable' is a vector that specifies all the variables I want to plot.

But I am getting the following error:

"Error in var(as.vector(x), na.rm = na.rm) : 'x' is empty
In addition: Warning message:
In mean.default(x$k) : argument is not numeric or logical: returning NA"

Could someone please show me how to structure my code to achieve my final goal, which is to simplify it?

I am attaching a csv file in case you want to run my code.

Thank you very much in advance for your time and help,

Judith
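
A note on the error message quoted above: one likely cause, independent of the data, is that inside the loop k holds a column name as a string, and `$` does not evaluate its right-hand side. The small sketch below uses a made-up one-column data frame (not the attached csv) to show the difference between x$k and x[[k]], and why is.na(k) does not filter any rows.

```r
# Hypothetical one-column data frame standing in for a subset of 'dato'
x <- data.frame(var1 = c(1, 2, 3))
k <- "var1"

by_dollar  <- x$k       # NULL: `$` looks for a column literally named "k"
by_bracket <- x[[k]]    # the var1 column, because [[ ]] evaluates k
na_of_name <- is.na(k)  # a single FALSE about the string "var1", not the column

print(by_dollar)
print(by_bracket)
print(na_of_name)
```

Passing NULL to mean() and sd() is consistent with the warning and the error reported in the message, and a single logical value used as a row index is recycled, so dato[!is.na(k),] keeps every row.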

Hi Judith,

Could you provide a copy of your data as well? (Either as a csv file, or by copying and pasting the output of dput(my.data.frame), or by generating a data.frame of random numbers with the same structure as your data.) That will help people to see what your code does and suggest improvements.

Hadley

On Tue, Apr 22, 2008 at 12:30 PM, Judith Flores <juryef at yahoo.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
http://had.co.nz/
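
For readers who want to follow the request above, here is a minimal sketch of the third option: a data frame of random numbers with the same column names as in the original post. The number of animals, the days, and the treatment labels "A" and "B" are assumptions made for illustration; they do not come from the attached file.

```r
# Hypothetical stand-in for 'mydata.csv': same column names as in the post,
# random values, and two missing measurements.
set.seed(1)
dato <- expand.grid(id = 1:3, day = c(1, 3, 7), tx = c("A", "B"))
dato$var1 <- rnorm(nrow(dato))
dato$var2 <- rnorm(nrow(dato))
dato$var3 <- rnorm(nrow(dato))
dato$var4 <- rnorm(nrow(dato))
dato$var2[c(2, 9)] <- NA

str(dato)    # structure at a glance
dput(dato)   # plain-text representation that can be pasted into an email
```

The output of dput() can be assigned back to a name on the receiving end, which is what makes it convenient for a mailing list that strips attachments.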

One of the things that is probably happening is that 'by' is producing all possible combinations, and in some cases 'x' has size zero. Put a check in the function within the 'by' to test for this condition and just return NULL.

Another approach that I use is "split(seq(nrow(df)), condition, drop=TRUE)", which will only give the indices for groupings that are non-empty. You can then use the indices to index into the data frame for the variables you want:

x.index <- split(seq(nrow(dat)), dat[,c("tx","day")])
results <- lapply(x.index, function(.indx){
    # k holds a column name as a string, so index with [[k]] rather than $k
    mn <- mean(dat[[k]][.indx])
    ......
    data.frame(....)
})

This will provide a 'list' that you can process through.

On Tue, Apr 22, 2008 at 1:30 PM, Judith Flores <juryef at yahoo.com> wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
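
The first suggestion above (a check inside the function passed to 'by') can be sketched as follows. This is only an illustration under stated assumptions: the toy data, the helper name summarise_var, and the choice to index the column with [[k]] are mine, not part of the original posts. The guard is kept as suggested; it is harmless if 'by' never hands the function an empty subset.

```r
# Toy data (hypothetical); the tx "B" / day 4 combination is missing entirely
set.seed(42)
dato <- data.frame(
  id   = 1:20,
  day  = rep(c(1, 2, 3, 4), times = 5),
  tx   = rep(c("A", "B"), each = 10),
  var1 = rnorm(20),
  var2 = rnorm(20)
)
dato <- dato[!(dato$tx == "B" & dato$day == 4), ]
dato$var1[3] <- NA

# summarise_var is a hypothetical helper: mean and SEM of column k per tx/day
summarise_var <- function(dato, k) {
  dat <- dato[!is.na(dato[[k]]), ]        # test the column, not the string k
  summ <- by(dat, dat[, c("tx", "day")], function(x) {
    if (nrow(x) == 0) return(NULL)        # guard for empty groups
    v  <- x[[k]]                          # x$k would look for a column named "k"
    mn <- mean(v)
    se <- sd(v) / sqrt(length(v))         # NA when a group has one observation
    data.frame(tx = x$tx[1], day = x$day[1], mn = mn, se = se,
               lowb = mn - se, upb = mn + se)
  })
  do.call("rbind", summ)                  # NULL entries are skipped by rbind
}

summ <- summarise_var(dato, "var1")
print(summ)
```

Because a group with a single observation gives an NA standard error, the axis limits in the plotting step would need min(summ$lowb, na.rm=TRUE) and max(summ$upb, na.rm=TRUE).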

After talking about it, I forgot to put the drop=TRUE in the 'split' call:

x.index <- split(seq(nrow(dat)), dat[,c("tx","day")], drop=TRUE)
results <- lapply(x.index, function(.indx){
    # k holds a column name as a string, so index with [[k]] rather than $k
    mn <- mean(dat[[k]][.indx])
    ......
    data.frame(....)
})

On Tue, Apr 22, 2008 at 1:30 PM, Judith Flores <juryef at yahoo.com> wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
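
A filled-in version of the split/lapply outline above might look like the sketch below. The toy data frame and the columns returned by data.frame() are assumptions chosen to mirror the original post; the elided parts of the reply could well have been intended differently.

```r
# Toy data (hypothetical): treatment "B" has no observations on day 2
set.seed(7)
dat <- data.frame(
  tx   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  day  = c(1, 1, 2, 2, 1, 1, 1, 1),
  var1 = rnorm(8)
)
k <- "var1"

# Row indices per tx/day group; drop=TRUE leaves out the empty B/day-2 group
x.index <- split(seq(nrow(dat)), dat[, c("tx", "day")], drop = TRUE)

results <- lapply(x.index, function(.indx) {
  v  <- dat[[k]][.indx]
  mn <- mean(v)
  se <- sd(v) / sqrt(length(v))
  data.frame(tx = dat$tx[.indx[1]], day = dat$day[.indx[1]],
             mn = mn, se = se, lowb = mn - se, upb = mn + se)
})

summ <- do.call("rbind", results)   # one row per non-empty group
print(summ)
```

Without drop=TRUE the same call returns a fourth, empty group, and mean() and sd() on an empty vector give NaN and NA, which would then propagate into the axis limits.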