thr3ads.net - R help - [R] Line similarity [Apr 2013]

Satsangi, Vivek (GE Capital)

2013-Apr-30 19:57 UTC

[R] Line similarity

Folks,

This is probably a "help me google this properly, please"-type of
question.

In TIBCO Spotfire, there is a procedure called "line similarity". I use
this to determine which observations show a growing, stable or declining
pattern... sort of like a mini-regression on the time-line for each
observation.

So if the input is something like this:

Name Year_1_value Year_2_value Year_3_value
A 1 2 3
B 2 7 19
C 3 4 2
D 10 7 6
E 4 4 5
F NA 3 6

Then the desired output is as follows:
A Growing
B Growing
C Stable
D Declining
E Stable
F Growing (or NA is also fine)

The data can also be unstacked, i.e. the three years could be separate
rows if necessary.

Is there a package for R that implements something like the above? I can
obviously try to do a set of simple regressions to classify the rows, but
I want to gain from the thoughts and learnings of others who may have
taken the time to implement a package.

I tried searching with the words "line similarity" or its variants, to no
avail.

Thanks in advance for your pointers!

Vivek Satsangi
GE Capital
Americas



William Dunlap

2013-Apr-30 20:47 UTC


Here is one way to, for each row in the data.frame v, regress the numbers in
columns 2 through 4 on the numbers 1 through 3, storing only the slopes, and
then creating a column saying if the slope is greater than zero or not.
> v[,"Beta"] <- vapply(seq_len(nrow(v)),
      FUN=function(i) coef(lm(value~year,
          data=data.frame(value=as.numeric(v[i,2:4]), year=seq_len(3))))[2],
      FUN.VALUE=0)
> v[,"Growing"] <- v[,"Beta"] > 0
> v
  Name Year_1_value Year_2_value Year_3_value Beta Growing
1    A            1            2            3  1.0    TRUE
2    B            2            7           19  8.5    TRUE
3    C            3            4            2 -0.5   FALSE
4    D           10            7            6 -2.0   FALSE
5    E            4            4            5  0.5    TRUE
6    F           NA            3            6  3.0    TRUE

Since you are doing least-squares regression in which the predictors are the
same for all regressions (except the one with the NA in it) you can also do
> coef(lm(value ~ year, list(value=t(as.matrix(v[1:5,2:4])),
      year=seq_len(3))))[2,]
   1    2    3    4    5 
 1.0  8.5 -0.5 -2.0  0.5
but you have to then make a special case for each pattern of missing values.
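
One way to avoid a special case per missing-value pattern is to compute each
row's slope from the closed form sum((x - mean(x)) * (y - mean(y))) /
sum((x - mean(x))^2), using only the cells that are present in that row. What
follows is a minimal sketch, not a packaged routine: the name row_slopes is
made up for illustration, and a row with fewer than two observed values comes
out as NaN.

```r
# Example data from the original post.
v <- data.frame(Name = c("A", "B", "C", "D", "E", "F"),
                Year_1_value = c(1, 2, 3, 10, 4, NA),
                Year_2_value = c(2, 7, 4, 7, 4, 3),
                Year_3_value = c(3, 19, 2, 6, 5, 6))

# row_slopes() is a hypothetical helper: least-squares slope of each row of
# the numeric matrix m against x, skipping missing cells row by row.
row_slopes <- function(m, x = seq_len(ncol(m))) {
  X <- matrix(x, nrow(m), ncol(m), byrow = TRUE)
  X[is.na(m)] <- NA                     # drop x wherever y is missing
  xc <- X - rowMeans(X, na.rm = TRUE)   # centre x within each row
  yc <- m - rowMeans(m, na.rm = TRUE)   # centre y within each row
  rowSums(xc * yc, na.rm = TRUE) / rowSums(xc^2, na.rm = TRUE)
}

slopes <- row_slopes(as.matrix(v[, 2:4]))
```

On the example data this gives the same slopes as the Beta column above,
including 3.0 for row F, without fitting a separate model per row.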

If you always use a 3-consecutive-year period you can use
   Growing <- v[,"Year_1_value"] < v[, "Year_3_value"]
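
The shortcut works because, with three equally spaced years, the middle value
gets zero weight in the least-squares slope, which reduces to
(Year_3_value - Year_1_value) / 2. A quick check of that, as a sketch with
made-up variable names, using row D of the example:

```r
y <- c(10, 7, 6)                            # row D from the example
year <- seq_along(y)
slope_lm    <- unname(coef(lm(y ~ year))[2]) # slope from the regression
slope_short <- (y[3] - y[1]) / 2             # slope from the end points only
same <- isTRUE(all.equal(slope_lm, slope_short))
```

Note that the comparison returns NA for any row where the first or last year
is missing, such as row F.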

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


Bert Gunter

2013-Apr-30 20:50 UTC


1. Read "an Introduction to R" or other R tutorial to learn how R
works.

2. You apparently wish to apply a function, f, to each row of a data
frame or matrix to classify it as growing, declining, etc. Only you
know what that function should look like. Write it.

3. Apply it using ?apply or perhaps the functionality of the plyr
package. There could be other ways to do it depending on what data
structure you use for your data. That is why you need to do some self
study.
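
As a minimal sketch of steps 2 and 3: the function below is one possible f,
not a standard one. The name classify_trend and the tolerance tol = 0.75 are
assumptions, chosen only so that the six example rows come out as in the
original question; a sensible cutoff depends on the scale of the data.

```r
# Example data from the original post.
v <- data.frame(Name = c("A", "B", "C", "D", "E", "F"),
                Year_1_value = c(1, 2, 3, 10, 4, NA),
                Year_2_value = c(2, 7, 4, 7, 4, 3),
                Year_3_value = c(3, 19, 2, 6, 5, 6))

# classify_trend() is a hypothetical f: fit a line through the non-missing
# values of one row and label the row by the size and sign of the slope.
classify_trend <- function(y, tol = 0.75) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(NA_character_)   # not enough points for a slope
  year <- seq_along(y)[ok]
  slope <- coef(lm(y[ok] ~ year))[[2]]
  if (slope > tol) "Growing" else if (slope < -tol) "Declining" else "Stable"
}

# Step 3: apply f to each row of the value columns.
v$Trend <- apply(as.matrix(v[, 2:4]), 1, classify_trend)
```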

-- Bert







-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

arun

2013-Apr-30 21:40 UTC


Hi,
You could also do:
v<- read.table(text="
Name Year_1_value Year_2_value Year_3_value
A 1 2 3
B 2 7 19
C 3 4 2
D 10 7 6
E 4 4 5
F NA 3 6
",sep="",header=TRUE,stringsAsFactors=FALSE)
names(v)[-1]<-gsub("(.*\\d+)_.*$","\\1",names(v)[-1])
v2<- v


v1<-reshape(v,direction="long",varying=2:4,sep="_")
v$Beta<-sapply(split(v1,v1$Name),function(x) coef(lm(Year~time,data=x))[2])

v$Growing<- v$Beta>0
v
#  Name Year_1 Year_2 Year_3 Beta Growing
#1    A      1      2      3  1.0    TRUE
#2    B      2      7     19  8.5    TRUE
#3    C      3      4      2 -0.5   FALSE
#4    D     10      7      6 -2.0   FALSE
#5    E      4      4      5  0.5    TRUE
#6    F     NA      3      6  3.0    TRUE

#or
library(plyr)
v2$Beta<- ldply(dlply(v1,.(Name),lm, formula=Year~time),coef)[,3]
v2$Growing<- v2$Beta>0
identical(v,v2)
#[1] TRUE


A.K.



