Folks,

This is probably a "help me google this properly, please"-type of question.

In TIBCO Spotfire, there is a procedure called "line similarity". I use this to determine which observations show a growing, stable or declining pattern... sort of like a mini-regression on the time-line for each observation.

So if the input is something like this:

Name Year_1_value Year_2_value Year_3_value
A     1  2  3
B     2  7 19
C     3  4  2
D    10  7  6
E     4  4  5
F    NA  3  6

Then the desired output is as follows:

A Growing
B Growing
C Stable
D Declining
E Stable
F Growing (or NA is also fine)

The data can also be unstacked, i.e. the three years could be separate rows if necessary.

Is there a package for R that implements something like the above? I can obviously try to do a set of simple regressions to classify the rows, but I want to gain from the thoughts and learnings of others who may have taken the time to implement a package.

I tried searching with the words "line similarity" or its variants, to no avail.

Thanks in advance for your pointers!

Vivek Satsangi
GE Capital
Americas
Here is one way to do it: for each row in the data.frame v, regress the numbers in columns 2 through 4 on the numbers 1 through 3, store only the slopes, and then create a column saying whether the slope is greater than zero.

> v[,"Beta"] <- vapply(seq_len(nrow(v)),
+     FUN=function(i) coef(lm(value~year,
+         data=data.frame(value=as.numeric(v[i,2:4]), year=seq_len(3))))[2],
+     FUN.VALUE=0)
> v[,"Growing"] <- v[,"Beta"] > 0
> v
  Name Year_1_value Year_2_value Year_3_value Beta Growing
1    A            1            2            3  1.0    TRUE
2    B            2            7           19  8.5    TRUE
3    C            3            4            2 -0.5   FALSE
4    D           10            7            6 -2.0   FALSE
5    E            4            4            5  0.5    TRUE
6    F           NA            3            6  3.0    TRUE

Since you are doing least-squares regression in which the predictors are the same for all regressions (except the one with the NA in it), you can also do

> coef(lm(value ~ year, list(value=t(as.matrix(v[1:5,2:4])), year=seq_len(3))))[2,]
   1    2    3    4    5
 1.0  8.5 -0.5 -2.0  0.5

but you then have to make a special case for each pattern of missing values.

If you always use a 3-consecutive-year period you can use

Growing <- v[,"Year_1_value"] < v[, "Year_3_value"]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Satsangi, Vivek (GE Capital)
> Sent: Tuesday, April 30, 2013 12:57 PM
> To: r-help at r-project.org
> Subject: [R] Line similarity
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
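
A rough sketch of how the missing-value case might be handled without a separate lm() call per NA pattern: the least-squares slope can be computed directly from the non-missing cells of each row. For a complete three-year row the formula reduces to (Year_3 - Year_1)/2, which is why comparing the first and last year gives the same sign as the regression slope. The helper names (y, X, ok, slope) are illustrative only and do not come from the thread.

```r
# Example data from the original question.
v <- data.frame(
  Name = c("A", "B", "C", "D", "E", "F"),
  Year_1_value = c(1, 2, 3, 10, 4, NA),
  Year_2_value = c(2, 7, 4, 7, 4, 3),
  Year_3_value = c(3, 19, 2, 6, 5, 6)
)

y <- as.matrix(v[, 2:4])                  # one row of values per observation
year <- seq_len(ncol(y))                  # predictor: 1, 2, 3

ok <- !is.na(y)                           # which cells were observed
n <- rowSums(ok)                          # observed points per row

# Predictor matrix with the same NA pattern as y, so that row means and
# sums only use the years that were actually observed in that row.
X <- matrix(year, nrow(y), ncol(y), byrow = TRUE)
X[!ok] <- NA

xbar <- rowMeans(X, na.rm = TRUE)
ybar <- rowMeans(y, na.rm = TRUE)
sxy <- rowSums((X - xbar) * (y - ybar), na.rm = TRUE)
sxx <- rowSums((X - xbar)^2, na.rm = TRUE)

# A slope needs at least two observed points; otherwise report NA.
slope <- ifelse(n >= 2, sxy / sxx, NA_real_)
print(round(unname(slope), 3))
```

Rows with fewer than two observed years get NA here; whether that is the right convention depends on the application.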
1. Read "An Introduction to R" or another R tutorial to learn how R works.

2. You apparently wish to apply a function, f, to each row of a data frame or matrix to classify it as growing, declining, etc. Only you know what that function should look like. Write it.

3. Apply it using ?apply or perhaps the functionality of the plyr package.

There could be other ways to do it, depending on what data structure you use for your data. That is why you need to do some self-study.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
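
Steps 2 and 3 above could be sketched roughly as follows. The function name classify_trend and the cutoff tol = 0.75 are hypothetical: the cutoff is chosen only because it happens to reproduce the labels requested in the original question, and nothing in the thread says how Spotfire decides what counts as "Stable".

```r
# Example data from the original question.
v <- data.frame(
  Name = c("A", "B", "C", "D", "E", "F"),
  Year_1_value = c(1, 2, 3, 10, 4, NA),
  Year_2_value = c(2, 7, 4, 7, 4, 3),
  Year_3_value = c(3, 19, 2, 6, 5, 6)
)

# Hypothetical classifier: label a row by the slope of value ~ year.
# 'tol' is an assumed cutoff; slopes within [-tol, tol] count as "Stable".
classify_trend <- function(values, tol = 0.75) {
  year <- seq_along(values)
  if (sum(!is.na(values)) < 2) return(NA_character_)  # too few points to fit
  # lm() drops rows with NA by default (na.action = na.omit).
  slope <- unname(coef(lm(values ~ year))[2])
  if (slope > tol) {
    "Growing"
  } else if (slope < -tol) {
    "Declining"
  } else {
    "Stable"
  }
}

# Apply the classifier to each row of the three value columns.
v$Trend <- apply(v[, 2:4], 1, classify_trend)
print(v[, c("Name", "Trend")])
```

A fixed absolute cutoff is scale-dependent, so for real data a relative measure (for example, slope divided by the row mean) may be a more sensible choice.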
Hi,

You could also do:

v<- read.table(text="
Name Year_1_value Year_2_value Year_3_value
A 1 2 3
B 2 7 19
C 3 4 2
D 10 7 6
E 4 4 5
F NA 3 6
",sep="",header=TRUE,stringsAsFactors=FALSE)

# shorten the column names to Year_1, Year_2, Year_3 so reshape() can split them on "_"
names(v)[-1]<-gsub("(.*\\d+)_.*$","\\1",names(v)[-1])
v2<- v
# long format: one row per Name and time, with the value in column "Year"
v1<-reshape(v,direction="long",varying=2:4,sep="_")
# slope of Year ~ time for each Name
v$Beta<-sapply(split(v1,v1$Name),function(x) coef(lm(Year~time,data=x))[2])
v$Growing<- v$Beta>0
 v
#  Name Year_1 Year_2 Year_3 Beta Growing
#1    A      1      2      3  1.0    TRUE
#2    B      2      7     19  8.5    TRUE
#3    C      3      4      2 -0.5   FALSE
#4    D     10      7      6 -2.0   FALSE
#5    E      4      4      5  0.5    TRUE
#6    F     NA      3      6  3.0    TRUE

#or
library(plyr)
v2$Beta<- ldply(dlply(v1,.(Name),lm, formula=Year~time),coef)[,3]
v2$Growing<- v2$Beta>0
 identical(v,v2)
#[1] TRUE

A.K.