thr3ads.net - R help - [R] Loess over split data [Apr 2009]

If this information is useful, please help other people find it:
Share via:

Luc Villandre

2009-Apr-23 19:56 UTC

[R] Loess over split data

Dear R users,

I am having trouble devising an efficient way to run a loess() function 
on all columns of a data.frame (with the x factor remaining the same for 
all columns) and I was hoping that someone here could help me fix my 
code so that I won't have to resort to using a for loop.

(You'll find the upcoming lines of code in a single block near the end 
of the message.)

Here's a small sample of my data :
   
my.data=data.frame(birth.vec=c(274, 259, 288, 284, 276, 274, 284, 288, 
273, 278), final.weight= c(2609.328, 2955.701, 4159.837, 4591.258, 
5144.559, 2877.159, 4384.498, 3348.454, 3323.391, 2918.055)) ;

"birth.vec" is in days and "final.weight" is in grams. Now,
what I'm
trying to do is to output smoothed curves for the 10th, 50th and 90th 
percentiles over completed weeks. In other words, I transform 
"birth.vec" into a factor vector of completed weeks with

completed.weeks = as.factor(floor(my.data$birth.vec/7)) ;

I use these factors to get weight quantiles by completed weeks with

quan.list = tapply(my.data$final.weight,INDEX = 
completed.weeks,FUN=quantile,probs=c(0.1,0.5,0.9)) ;

I then collapse the list into a data.frame with

quan.frame = data.frame(do.call(rbind,quan.list)) ;

Now comes the time to apply the loess() function to each percentile 
curve (i.e. to each column in quan.frame). Note that the x values for 
all percentile columns (i.e. X10., X50., X90.)

x.values = as.numeric(rownames(quan.frame)) ;

Once again, using tapply() (and transposing beforehand):

t.quan.frame = t(quan.frame) ;

## The following command doesn't work

smooth.result = 
tapply(t.quan.frame,INDEX=as.factor(rep(1:nrow(t.quan.frame),each=ncol(t.quan.frame))),FUN=loess)
;

The "formula" argument of loess() is of course missing, since I have
no
idea how I could specify it in a way that the function could understand it.

Now, for your convenience, here's the code patched together:

########
my.data=data.frame(birth.vec=c(274, 259, 288, 284, 276, 274, 284, 288, 
273, 278), final.weight= c(2609.328, 2955.701, 4159.837, 4591.258, 
5144.559, 2877.159, 4384.498, 3348.454, 3323.391, 2918.055)) ;
completed.weeks = as.factor(floor(my.data$birth.vec/7)) ;
quan.list = tapply(my.data$final.weight,INDEX = 
completed.weeks,FUN=quantile,probs=c(0.1,0.5,0.9)) ;
quan.frame = data.frame(do.call(rbind,quan.list)) ;
x.values = as.numeric(rownames(quan.frame)) ;
t.quan.frame = t(quan.frame) ;

## The following command doesn't work

smooth.result = 
tapply(t.quan.frame,INDEX=as.factor(rep(1:nrow(t.quan.frame),each=ncol(t.quan.frame))),FUN=loess)
;       
#######

So, is there a way to modify this code so that I'll get smoothed 
percentile curves?  Any suggestion as to how it could be improved is 
also welcome.

Although the forum archive is fairly comprehensive, phrasing search 
queries in a way that will yield the desired result is sometimes hard, 
hence this request for help that to some of you will most likely seem 
redundant.

I thank you sincerely for your time,

Luc

Emmanuel Charpentier

2009-Apr-25 10:38 UTC

head link

[R] Loess over split data

Dear Luc,

I stumbled on your unanswered question only this morning. Sorry for this
lateness...

Le jeudi 23 avril 2009 ? 15:56 -0400, Luc Villandre a ?crit
:> Dear R users,
> 
> I am having trouble devising an efficient way to run a loess() function 
> on all columns of a data.frame (with the x factor remaining the same for 
> all columns) and I was hoping that someone here could help me fix my 
> code so that I won't have to resort to using a for loop.
> 
> (You'll find the upcoming lines of code in a single block near the end 
> of the message.)
> 
> Here's a small sample of my data :
>    
> my.data=data.frame(birth.vec=c(274, 259, 288, 284, 276, 274, 284, 288, 
> 273, 278), final.weight= c(2609.328, 2955.701, 4159.837, 4591.258, 
> 5144.559, 2877.159, 4384.498, 3348.454, 3323.391, 2918.055)) ;
> 
> "birth.vec" is in days and "final.weight" is in grams.
Now, what I'm
> trying to do is to output smoothed curves for the 10th, 50th and 90th 
> percentiles over completed weeks. In other words, I transform 
> "birth.vec" into a factor vector of completed weeks with
> 
> completed.weeks = as.factor(floor(my.data$birth.vec/7)) ;
> 
> I use these factors to get weight quantiles by completed weeks with
> 
> quan.list = tapply(my.data$final.weight,INDEX = 
> completed.weeks,FUN=quantile,probs=c(0.1,0.5,0.9)) ;
> 
> I then collapse the list into a data.frame with
> 
> quan.frame = data.frame(do.call(rbind,quan.list)) ;
> 
> Now comes the time to apply the loess() function to each percentile 
> curve (i.e. to each column in quan.frame). Note that the x values for 
> all percentile columns (i.e. X10., X50., X90.)
> 
> x.values = as.numeric(rownames(quan.frame)) ;
> 
> Once again, using tapply() (and transposing beforehand):
> 
> t.quan.frame = t(quan.frame) ;
> 
> ## The following command doesn't work
> 
> smooth.result = 
>
tapply(t.quan.frame,INDEX=as.factor(rep(1:nrow(t.quan.frame),each=ncol(t.quan.frame))),FUN=loess)
> ;
> 
> The "formula" argument of loess() is of course missing, since I
have no
> idea how I could specify it in a way that the function could understand it.
Why not tapply(yadda, yadda, function(x)loess(whatever you mean, x is
data...)) ?

But the whole idea seems ... bizarre :

1) converting time to a *class* variable destroys any ordinal relation
between different times ; a "curve" has no meaning. At the minimum,
keep
this variable numeric.

2) You lose any notion of individual data (e. g. "weight" of each
week).

3) Do you have reason to expect empirical quantiles very much different
of asymptotic quantiles (modulo a possible variable transformation) ?
That forces you to "coarsen" your data, which is rarely a good idea...
If no, the asymptotic IC80 curves can be obtained much cheaper. Modulo
some possible typos :

my.loess<-loess(final.weight~birth.vec,data=my.data)
my.pred<-predict(my.loess,
                 new.data=data.frame(birth.vec=my.pred.times<-with(my.data,
                                                                  
seq(min(birth.weight),
                                                                      
max(birth.weight)))),
                 se=TRUE)
my.curve=with(my.pred,
              data.frame(time=my.pred.times,
                         center=fit,
                         lb=fit+qnorm(0.1)*se.fit,
                         ub=fit+qnorm(0.9*se.fit))

# Graphics *will* help :

with(my.curve,{
     plot(x=range(time),
          y=range(center,lb,ub),
          type="n",
          xlab="Achieved time (days)",
          ylab="Final weight (g)",
          main="A suspicious loess() fit");
     lines(final.weight~birth.vec, type="p", data=my.data) ;
     lines(center~time, lty=1) ;
     lines(lb~time, lty=3) ;
     lines(ub~time, lty=3)
   })

Adding your empirical centiles to this graph might help...

NB : on your example data set, there is a big gap between the earliest
point and the rest, thus suspicious predictions in the gap (lower bound
is *negative* ! ) ; maybe a variable change (log ?) might be in order ?
This can be assessed only on the full data.

HTH,

					Emmanuel Charpentier

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Apr 2009 - Loess over split data

[R] Loess over split data

[R] Loess over split data

Possibly Parallel Threads