thr3ads.net - R help - R: [R] slope estimations of teeth like data [Jun 2004]

If this information is useful, please help other people find it:
Share via:

Petr Pikal

2004-Jun-15 13:21 UTC

R: [R] slope estimations of teeth like data

On 15 Jun 2004 at 13:52, Vito Muggeo wrote: 
> Dear Petr, 
> Probably I don't understand exactly what you are looking for. 
>  
> However your "plot(x,c(y,z))" suggests a broken-line model for
the
> response "c(y,x)" versus the variables x. Therefore you could
estimate
> a segmented model to obtain (different) slope (and breakpoint) 
> estimates. See the package segmented. 
Thank you Vito, but it is not what I want. plot(x,c(y,z)) shows only one
"spike"
and I have many such spikes in actual data. 

My actual data look like those 

set.seed(1) 
y <- 0.03*x[1:100]+rnorm(100, mean=.001, sd=.03) 
z <- 3-rep(seq(1,100,10),each=10)*.03+rnorm(100,mean=.001, sd=.03) 
yy <- NULL 
for( i in 1:10) yy <- c(yy,c(y,z)[1:floor(runif(1)*200)]) 
y.l <- length(yy) 
plot(1:y.l, yy) 

x axis is actually a time and y is a weight of gradually filled conteiner, which
is
irregularly emptied. I want to do an hourly and/or daily averages of increases
in
weight (it can by done by aggregate)  

myfac <- gl(y.l/12,12,length=1271) #hopefully length is ok 

y.agg <- aggregate(diff(yy), list(myfac), mean) 
## there will be list(hod=cut(time.axis,"hour")) construction actually

0.03 can be expected average result and some aggregated values ar OK but some  
are wrong as they include values from emptying time. 

*** This*** is probably what I need, I need to set some logical vector which
will
be TRUE when there was a filling time and FALSE during other times. And I  
need to specify it according a data I have available. 

Best what I was able to do was to consider filling time as a time when let say 

diff(yy) >= 0 

was between prespecified limits, but you know how it is with real life and  
prespecified limits.  

Or I can plot my data against time, manually find out regions which are correct
and make a aggregation only with correct data. But there are 24*60*3 values  
each day so I prefer not to do it manually.  

Or finally I can throw away any hourly average which is not in set limits, but I
prefer to throw away as little data as possible. 

I hope I was able to clarify the issue a bit. 

Thank you 
Best regards 
Petr 

>  
> best, 
> vito 
>  
>  
>  
> ----- Original Message ----- 
> From: Petr Pikal <petr.pikal at precheza.cz> 
> To: <r-help at stat.math.ethz.ch> 
> Sent: Tuesday, June 15, 2004 1:11 PM 
> Subject: [R] slope estimations of teeth like data 
>  
>  
> > Dear all 
> > 
> > Suppose I have teeth like data similar like 
> > 
> > x <- 1:200 
> > y <- 0.03*x[1:100]+rnorm(100, mean=.001, sd=.03) 
> > z <- 3-rep(seq(1,100,10),each=10)*.03+rnorm(100,mean=.001, sd=.03) 
> > plot(x,c(y,z)) 
> > 
> > and I want to have a gradient estimations for some values from 
> > increasing 
> part of 
> > data 
> > 
> > like 
> > 
> > y.agg <- aggregate(diff(c(y,z)), 
> > list(rep(seq(1,200,10),each=10)[1:199]), 
> mean) 
> > 
> > y.agg[1:10,]  ##is OK, I want that 
> > y.agg[11:20,] ##is not OK, I do not want that 
> > 
> > actual data are similar but more irregular and have subsequent 
> > gradual 
> increases 
> > and decreases, more like 
> > 
> > set.seed(1) 
> > yy<-NULL 
> > for( i in 1:10) yy <- c(yy,c(y,z)[1:floor(runif(1)*200)]) 
> > length(yy) 
> > [1] 1098 
> > 
> > plot(1:1098,yy) 
> > 
> > Is there anybody who has some experience with such data, mainly how 
> > to 
> extract 
> > only increasing portions or to filter values of "yy" such as
only
> aggregated slopes 
> > from increasing parts are computed and other parts are set to NA. 
> Sometimes 
> > actual data have so long parts of steady or even slightly increasing 
> values at 
> > decreasing part that aggregated values are slightly positive 
> > although they 
> are 
> > actually from decreasing portion of data. 
> > 
> > Thank you 
> > Petr Pikal 
> > petr.pikal at precheza.cz 
> > 
> > ______________________________________________ 
> > R-help at stat.math.ethz.ch mailing list 
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help 
> > PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html 
Petr Pikal
petr.pikal at precheza.cz

Gabor Grothendieck

2004-Jun-15 21:33 UTC

head link

R: [R] slope estimations of teeth like data

I am not entirely sure what is essential and what is inessential in 
this problem.  In the yy data shown, the peaks occur in one step and
all the peaks are equal.  Just plotting

plot(diff(yy))

shows that there is a huge region of potential cutoffs for separating
out the large single step increases from the other differences.

If it continues to be true that the increases are large relative 
to everything else but the cutoff is not quite so obvious as 
here, we can calculate a cutoff by just sorting the unique
differences and partitioning them in every possible
way into small and large differences and then choosing the 
partition whose residual variance is least.   

Its actually pretty simple to do that from first pinciples, as
shown here, but there might be a function in R that does it
even more directly.

diffyy <- diff(yy)
yy. <- unique(sort(diffyy)) # unique diffs sorted
yy. <- yy.[-1] - diff(yy.)/2  # midpoints

# For each 2 class partition into small & large, get residual var
res <- sapply(yy., function(x) 
                     var(resid(lm(diffyy ~ factor(diffyy < x)))) )

cutoff <- yy.[which.min(res)]

# indices of points involved in rise
which(diffyy > cutoff) + 1

---

Petr Pikal <petr.pikal at precheza.cz>

On 15 Jun 2004 at 13:52, Vito Muggeo wrote: 
> Dear Petr, 
> Probably I don't understand exactly what you are looking for. 
> 
> However your "plot(x,c(y,z))" suggests a broken-line model for
the
> response "c(y,x)" versus the variables x. Therefore you could
estimate
> a segmented model to obtain (different) slope (and breakpoint) 
> estimates. See the package segmented. 
Thank you Vito, but it is not what I want. plot(x,c(y,z)) shows only one
"spike"
and I have many such spikes in actual data. 

My actual data look like those 

set.seed(1) 
y <- 0.03*x[1:100]+rnorm(100, mean=.001, sd=.03) 
z <- 3-rep(seq(1,100,10),each=10)*.03+rnorm(100,mean=.001, sd=.03) 
yy <- NULL 
for( i in 1:10) yy <- c(yy,c(y,z)[1:floor(runif(1)*200)]) 
y.l <- length(yy) 
plot(1:y.l, yy) 

x axis is actually a time and y is a weight of gradually filled conteiner, which
is
irregularly emptied. I want to do an hourly and/or daily averages of increases
in
weight (it can by done by aggregate) 

myfac <- gl(y.l/12,12,length=1271) #hopefully length is ok 

y.agg <- aggregate(diff(yy), list(myfac), mean) 
## there will be list(hod=cut(time.axis,"hour")) construction actually

0.03 can be expected average result and some aggregated values ar OK but some 
are wrong as they include values from emptying time. 

*** This*** is probably what I need, I need to set some logical vector which
will
be TRUE when there was a filling time and FALSE during other times. And I 
need to specify it according a data I have available. 

Best what I was able to do was to consider filling time as a time when let say 

diff(yy) >= 0 

was between prespecified limits, but you know how it is with real life and 
prespecified limits. 

Or I can plot my data against time, manually find out regions which are correct 
and make a aggregation only with correct data. But there are 24*60*3 values 
each day so I prefer not to do it manually. 

Or finally I can throw away any hourly average which is not in set limits, but I
prefer to throw away as little data as possible. 

I hope I was able to clarify the issue a bit. 

Thank you 
Best regards 
Petr 

> 
> best, 
> vito 
> 
> 
> 
> ----- Original Message ----- 
> From: Petr Pikal <petr.pikal at precheza.cz> 
> To: <r-help at stat.math.ethz.ch> 
> Sent: Tuesday, June 15, 2004 1:11 PM 
> Subject: [R] slope estimations of teeth like data 
> 
> 
> > Dear all 
> > 
> > Suppose I have teeth like data similar like 
> > 
> > x <- 1:200 
> > y <- 0.03*x[1:100]+rnorm(100, mean=.001, sd=.03) 
> > z <- 3-rep(seq(1,100,10),each=10)*.03+rnorm(100,mean=.001, sd=.03) 
> > plot(x,c(y,z)) 
> > 
> > and I want to have a gradient estimations for some values from 
> > increasing 
> part of 
> > data 
> > 
> > like 
> > 
> > y.agg <- aggregate(diff(c(y,z)), 
> > list(rep(seq(1,200,10),each=10)[1:199]), 
> mean) 
> > 
> > y.agg[1:10,] ##is OK, I want that 
> > y.agg[11:20,] ##is not OK, I do not want that 
> > 
> > actual data are similar but more irregular and have subsequent 
> > gradual 
> increases 
> > and decreases, more like 
> > 
> > set.seed(1) 
> > yy<-NULL 
> > for( i in 1:10) yy <- c(yy,c(y,z)[1:floor(runif(1)*200)]) 
> > length(yy) 
> > [1] 1098 
> > 
> > plot(1:1098,yy) 
> > 
> > Is there anybody who has some experience with such data, mainly how 
> > to 
> extract 
> > only increasing portions or to filter values of "yy" such as
only
> aggregated slopes 
> > from increasing parts are computed and other parts are set to NA. 
> Sometimes 
> > actual data have so long parts of steady or even slightly increasing 
> values at 
> > decreasing part that aggregated values are slightly positive 
> > although they 
> are 
> > actually from decreasing portion of data. 
> > 
> > Thank you 
> > Petr Pikal 
> > petr.pikal at precheza.cz 
> >

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Jun 2004 - R: slope estimations of teeth like data

R: [R] slope estimations of teeth like data

R: [R] slope estimations of teeth like data

Possibly Parallel Threads