Dear R experts,
I am sorry for sending this email again. I imagine
that yesterday, and maybe today, have been very busy
days with the release of R v 2.7.0. I join all the R
users who are very grateful for your constant work and
efforts, especially knowing that you are doing this for
the sake of science, without getting any compensation
for it.
Having written that, I decided to send the email
below again, in case it was forgotten; or maybe I am
missing something very basic?
I am using version 2.7.0, on Windows XP.
Start of yesterday's email:
I am trying to optimize my script, because right
now it requires a lot of memory. The goal is to
generate four plots in one page. Every plot
corresponds to the means and sem's calculated for a
given variable at different days. In order to obtain
the means and sem's I apply the 'by' function. The way
I have done it so far is like this:
Read the data
Generate a summary of the mean and sem of a variable
at every Day.
Plot the mean and sem of that variable.
Repeat the same process for the other 3 variables.
I tried to optimize the code by using a for loop;
the code is below.
#Reading the data
dato<-read.csv('mydata.csv')
names(dato)<-c("id","day","tx","var1","var2","var3","var4")
dato<-dato[,1:7]

#Specify variables to be plotted
variable<-c('var1','var2','var3','var4')

#Define parameters of the plot window: margins, number of plots in the panel
windows(height=9, width=9, rescale='fixed')
par(mfrow=c(2,2),xpd=T, bty='l',
    omi=c(0.8,0.25,1.2,0.15), mai=c(1.1,0.8,0.3,0.3))

for (k in variable) {
  dat<-dato[!is.na(k),]
  summ<-by(dat,dat[,c("tx","day")], function(x) {
    mn<-mean(x$k)
    std<-sd(x$k)
    n<-length(x$k)
    se<-std/sqrt(n)
    lowb<-mn-se
    upb<-mn+se
    data.frame(tx=x$tx[1],day=x$day[1],mn=mn,std=std,lowb=lowb,upb=upb,se=se)
  })
  summ<-do.call("rbind",summ)

  #Defining x and y axis ranges
  xmax<-unique(max(summ$day,na.rm=TRUE))
  xmin<-unique(min(summ$day,na.rm=TRUE))
  yaxmin<-unique(min(summ$lowb))
  yaxmax<-unique(max(summ$upb))

  plot(1,1,type='n',xlab='Day',xlim=c(xmin,xmax),ylim=c(yaxmin,yaxmax),
       ylab=k,
       las=1,cex.lab=1,xaxp=c(xmin,xmax,diff(range(c(xmin,xmax)))))
  points(summ$day,summ$mn)
}
Where variable is a vector that specifies all the
variables I want to plot.
But I am getting the following error:
"Error in var(as.vector(x), na.rm = na.rm) : 'x' is empty
In addition: Warning message:
In mean.default(x$k) : argument is not numeric or logical: returning NA"
Could someone please show me how to structure my
code to achieve my final goal, which is to simplify
it?
I am attaching a csv file in case you want to run my
code.
Thank you very much in advance for your time and help,
Judith
Hi Judith,
Could you provide a copy of your data as well? (Either as a csv file, or by copying and pasting the output of dput(my.data.frame), or by generating a data.frame of random numbers with the same structure as your data.) That will help people to see what your code does and suggest improvements.
Hadley
On Tue, Apr 22, 2008 at 12:30 PM, Judith Flores <juryef at yahoo.com> wrote:
> I am attaching a csv file in case you want to run my
> code.
--
http://had.co.nz/
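One way to act on this suggestion, as a rough sketch only: the data frame below is a made-up stand-in for mydata.csv. The column names come from the original post; the number of subjects, the days, the treatment labels "A" and "B", and the positions of the missing values are all assumptions, not taken from the real file.

```r
# Hypothetical stand-in for mydata.csv: same column names as in the post,
# random values, and two NAs to mimic missing measurements.
set.seed(1)
dato <- expand.grid(id = 1:5, day = c(1, 3, 7), tx = c("A", "B"),
                    stringsAsFactors = FALSE)
for (v in c("var1", "var2", "var3", "var4")) {
  dato[[v]] <- rnorm(nrow(dato), mean = 10, sd = 2)
}
dato$var2[c(3, 17)] <- NA

str(dato)          # check the structure before posting
dput(head(dato))   # text that can be pasted straight into an email
```

Pasting the dput() output lets readers rebuild the object exactly, without relying on an attachment.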
One of the things that is probably happening is that 'by' is
producing all possible combinations of the grouping variables, and
in some cases 'x' has zero rows. Put a check in the function within
the 'by' for this condition and just return NULL.
Another approach that I use is "split(seq(nrow(df)), condition,
drop=TRUE)", which gives the indices only for groupings that are
non-empty. You can then use the indices to index into the data frame
for the variables you want:
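A minimal sketch of such a guard, on made-up data in which treatment "B" has no rows on day 2. The values and the group layout are assumptions for illustration. As far as I can tell, by() in current versions of R already leaves empty cells as NULL without calling the function, so the check is defensive rather than required. The sketch also reads the column with x[[k]], because x$k looks for a column literally named "k" and returns NULL, which would explain both the warning from mean() and the error from sd() in the original post.

```r
# Toy data (hypothetical): no rows for tx "B" on day 2.
dat <- data.frame(tx   = c("A", "A", "A", "A", "B", "B"),
                  day  = c(1, 1, 2, 2, 1, 1),
                  var1 = c(1, 3, 2, 4, 5, 7))
k <- "var1"

summ <- by(dat, dat[, c("tx", "day")], function(x) {
  if (nrow(x) == 0) return(NULL)   # guard: nothing to summarise
  v <- x[[k]]                      # column whose name is stored in k
  data.frame(tx = x$tx[1], day = x$day[1],
             mn = mean(v), se = sd(v) / sqrt(length(v)))
})

# rbind() on data frames ignores the NULL entry for the empty cell.
summ <- do.call("rbind", summ)
```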
x.index <- split(seq(nrow(dat)), dat[,c("tx","day")])
results <- lapply(x.index, function(.indx){
    # k holds a column name, so index with [[k]] rather than $k
    mn <- mean(dat[[k]][.indx])
    ......
    data.frame(....)
})
This will provide a 'list' that you can process through.
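As a sketch of how that list might be processed, the example below runs the split/lapply idea end to end on made-up data and stacks the per-group results into one data frame. The toy values and the choice of summary columns (mn, se) are assumptions for illustration; the column is read with dat[[k]] because k holds the column name as a string.

```r
# Toy data (hypothetical values); column names follow the original post.
dat <- data.frame(tx   = c("A", "A", "A", "A", "B", "B"),
                  day  = c(1, 1, 2, 2, 1, 1),
                  var1 = c(1, 3, 2, 4, 5, 7))
k <- "var1"

# Row indices for each tx/day group; drop = TRUE omits empty combinations.
x.index <- split(seq(nrow(dat)), dat[, c("tx", "day")], drop = TRUE)

# One single-row data frame per group.
results <- lapply(x.index, function(.indx) {
  v <- dat[[k]][.indx]
  data.frame(tx = dat$tx[.indx[1]], day = dat$day[.indx[1]],
             mn = mean(v), se = sd(v) / sqrt(length(v)))
})

# Stack the list into a single summary data frame.
summ <- do.call(rbind, results)
```

Because split() names each element after its group (for example "A.1"), the row names of summ identify the tx/day combination.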
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
After talking about it, I forgot to put the drop=TRUE in the 'split'
call:
x.index <- split(seq(nrow(dat)), dat[,c("tx","day")],
                 drop=TRUE)
results <- lapply(x.index, function(.indx){
    # k holds a column name, so index with [[k]] rather than $k
    mn <- mean(dat[[k]][.indx])
    ......
    data.frame(....)
})
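Putting the pieces together, here is one possible restructuring of the original loop. It is a sketch under stated assumptions, not a tested replacement for the real script: the data are randomly generated in place of mydata.csv, summarise_var is a hypothetical helper name, and the plot is simplified (points plus SEM bars, no windows() call). The key change is reading the column with d[[k]]: in the original loop, x$k returns NULL because `$` does not evaluate k, and !is.na(k) tests the string "var1" itself, so it never removes any rows.

```r
# Hypothetical stand-in for mydata.csv (random values, post's column names).
set.seed(42)
dato <- expand.grid(id = 1:4, day = c(1, 3, 7), tx = c("A", "B"),
                    stringsAsFactors = FALSE)
variable <- c("var1", "var2", "var3", "var4")
for (v in variable) dato[[v]] <- rnorm(nrow(dato), mean = 10)
dato$var1[2] <- NA

k <- "var1"
is.null(dato$k)         # `$` looks for a column literally named "k"
is.numeric(dato[[k]])   # `[[` uses the name stored in k

# Mean and SEM of column k for every non-empty tx/day group.
summarise_var <- function(d, k) {
  d <- d[!is.na(d[[k]]), ]                 # drop rows where this variable is NA
  idx <- split(seq_len(nrow(d)), d[, c("tx", "day")], drop = TRUE)
  do.call(rbind, lapply(idx, function(i) {
    v <- d[[k]][i]
    mn <- mean(v)
    se <- sd(v) / sqrt(length(v))
    data.frame(tx = d$tx[i[1]], day = d$day[i[1]],
               mn = mn, se = se, lowb = mn - se, upb = mn + se)
  }))
}

summaries <- lapply(variable, function(k) summarise_var(dato, k))
names(summaries) <- variable

# Four panels on one page, one per variable.
par(mfrow = c(2, 2), bty = "l")
for (k in variable) {
  s <- summaries[[k]]
  plot(s$day, s$mn, xlab = "Day", ylab = k, las = 1,
       ylim = range(s$lowb, s$upb))
  segments(s$day, s$lowb, s$day, s$upb)    # mean +/- one SEM
}
```

Keeping the summary step in a function separates the computation from the plotting, so each variable's summary can be inspected before anything is drawn.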
--
Jim Holtman