This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.)

For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x.
What is an equivalent if x is a data frame?

The result does not have to be fast: the data set will have < 100 elements. Since this is inside the survival package, and that package is on the 'recommended' list, I can't depend on any package outside the recommended list.

Terry T.
Hi Terry,

I take your question to mean how to label distinct rows of a data frame. If that is not your question, please clarify.

I found the row.match() function in the package prodlim that can be used to solve this. However, since your request requires no additional dependencies, I borrowed the relevant code from the row.match function.

Here is some obfuscated code to provide your answer in one line, per your request (less obfuscated code is just below that).

Assuming your data frame is called 'df':

    df[, ncol(df) + 1] <- match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")), do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))

The last column of df now contains the 'label', i.e. the index within unique(df) of the row that is the same as the given row.

Somewhat less obfuscated:

    getLabels <- function(df) {
      match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
            do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))
    }

    myDataFrame$label <- getLabels(myDataFrame)

HTH,
Eric

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
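A small worked sketch of the paste-and-match approach may help show what the labels look like. The data frame `df` and its columns `id` and `val` are made up for illustration, and `getLabels` is a slightly simplified version of the function in the message above, not the prodlim code itself.

```r
# Hypothetical example data: rows 1, 3 and 5 are identical, as are rows 2 and 6.
df <- data.frame(id  = c("a", "b", "a", "c", "a", "b"),
                 val = c(1, 2, 1, 1, 1, 2),
                 stringsAsFactors = FALSE)

# Paste each row into a single key, then match it against the keys
# of the unique rows. Assumes no value contains the separator.
getLabels <- function(df) {
  key  <- do.call("paste", c(df, sep = "\\r"))
  ukey <- do.call("paste", c(unique(df), sep = "\\r"))
  match(key, ukey)
}

labels <- getLabels(df)
print(labels)
# [1] 1 2 1 3 1 2
```

Note that the label 3 refers to the third row of unique(df), which is row 4 of df.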
"Label" is not a clear term for data frames, but most data frames have rownames. If dta is a data frame, not a tibble, rownames( dta )[ !duplicated( dta ) ] Or could use row indexes directly which( !duplicated( dta ) ) -- Sent from my phone. Please excuse my brevity. On September 18, 2017 6:54:29 AM PDT, Eric Berger <ericjberger at gmail.com> wrote:>Hi Terry, >I take your question to mean how to label distinct rows of a data >frame. If >that is not your question please clarify. >I found the row.match() function in the package prodlim that can be >used to >solve this. >However since your request requires no additional dependencies I >borrowed >the relevant code from the row.match function. >Here is some obfuscated code to provide your answer in one line, per >your >request. (less obfuscated code just below that. > >Assuming your data frame is called 'df': > >df[,ncol(df)+1] <- match( do.call("paste", c(df[, , drop = FALSE], sep >>"\\r")), do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")) >) > >The last column of df now contains the 'label' i.e. the row number of >the >first row in df that is the same as the given row. > >Somewhat less obfuscated > >getLabels <- function(df) { > match( do.call("paste", c(df[, , drop = FALSE], >sep = "\\r")), > do.call("paste", c(unique(df)[, , drop >= FALSE], sep = "\\r")) ) > } > >myDataFrame$label <- getLabels(myDataFrame) > > >HTH, > >Eric > > >On Mon, Sep 18, 2017 at 3:13 PM, Therneau, Terry M., Ph.D. < >therneau at mayo.edu> wrote: > >> This question likely has a 1 line answer, I'm just not seeing it. >(2, 3, >> or 10 lines is fine too.) >> >> For a vector I can do group <- match(x, unqiue(x)) to get a vector >that >> labels each element of x. >> What is an equivalent if x is a data frame? >> >> The result does not have to be fast: the data set will have < 100 >> elements. 
Since this is inside the survival package, and that >package is >> on the 'recommended' list, I can't depend on any package outside the >> recommended list. >> >> Terry T. >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.
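The two expressions above return the first occurrence of each distinct row, not one label per row. If a per-row label is what is wanted, one possible sketch combines duplicated() with match(). The data frame `dta` and the helper names `first`, `key` and `first_row` are hypothetical, and the pasted key assumes that no value contains the "\r" separator.

```r
# Hypothetical example data: rows 1, 3 and 5 are identical, as are rows 2 and 6.
dta <- data.frame(id  = c("a", "b", "a", "c", "a", "b"),
                  val = c(1, 2, 1, 1, 1, 2),
                  stringsAsFactors = FALSE)

# Row indexes of the first occurrence of each distinct row.
first <- which(!duplicated(dta))

# One string key per row.
key <- do.call(paste, c(dta, sep = "\r"))

# For every row, the row index in dta of its first occurrence.
first_row <- first[match(key, key[first])]

print(first)
# [1] 1 2 4
print(first_row)
# [1] 1 2 1 4 1 2
```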
Hi!

2017-09-18 07:13 -0500, Therneau, Terry M., Ph.D. wrote:
> This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.)
>
> For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x.

Actually, you get a vector of indices matching 'unique(x)', not a labelled vector.

    > x<-c("A","B","C","A","C","D")
    > group<-match(x, unique(x))
    > group
    [1] 1 2 3 1 3 4

> What is an equivalent if x is a data frame?

So you will generate an index where duplicated rows have the row index of the first occurrence, right? This could work:

    > x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
    > group<-rownames(x)
    > for (i in 1:(nrow(x)-1)) {
    +   for (j in (i+1):nrow(x)) {
    +     if (sum(as.numeric(x[i,]==x[j,]))==ncol(x)) {
    +       group[j]<-group[i]
    +     }
    +   }
    + }
    > group
    [1] "1" "2" "3" "3" "5" "1"

HTH,
Kimmo
You could use merge() with an ID column pasted onto the table of names, as in

    > tbl <- data.frame(FirstName=c("Abe","Abe","Bob","Chuck","Chuck"),
    +                   Surname=c("Xavier","Yates","Yates","Yates","Zapf"),
    +                   Id=paste0("P",101:105))
    > tbl
      FirstName Surname   Id
    1       Abe  Xavier P101
    2       Abe   Yates P102
    3       Bob   Yates P103
    4     Chuck   Yates P104
    5     Chuck    Zapf P105
    > merge(data.frame(FirstName=c("Abe","Chuck","Dave"),Surname=rep("Yates",3)), tbl, all.x=TRUE)
      FirstName Surname   Id
    1       Abe   Yates P102
    2     Chuck   Yates P104
    3      Dave   Yates <NA>

Bill Dunlap
TIBCO Software
wdunlap tibco.com
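One way the merge() idea might be applied to the original question is to build a lookup table of the distinct rows with a group number and merge it back onto the data. This is only a sketch: the data frame `x` and the column names `group` and `row_id` are hypothetical. Because merge() sorts its result by the key columns, the original row order is carried along and restored at the end.

```r
# Hypothetical example data.
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1),
                stringsAsFactors = FALSE)

# Lookup table: one row per distinct combination, with a group number.
ux <- unique(x)
ux$group <- seq_len(nrow(ux))

# Remember the original row order before merging.
x$row_id <- seq_len(nrow(x))

# Merge on the shared columns, then put the groups back in the original order.
m <- merge(x, ux, by = c("X0", "X1"))
group <- m$group[order(m$row_id)]

print(group)
# [1] 1 2 3 3 4 1
```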
> On Sep 18, 2017, at 5:13 AM, Therneau, Terry M., Ph.D. <therneau at mayo.edu> wrote:
>
> This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.)
>
> For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x.
> What is an equivalent if x is a data frame?

In the past I've used apply with paste to generate "group" identifiers:

    x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
    apply(x, 1, paste, collapse=".")
    [1] "A.1" "B.2" "C.1" "C.1" "D.3" "A.1"

> The result does not have to be fast: the data set will have < 100 elements. Since this is inside the survival package, and that package is on the 'recommended' list, I can't depend on any package outside the recommended list.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
Yes. My understanding is that you want the identifier to have the same number of rows as the data frame. A slight variant of David's solution would then be:

    do.call(paste0, x)

-- Bert
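One caveat when pasting without a separator is that two different rows can produce the same string. The sketch below uses a made-up data frame `x` with columns `a` and `b` to show the collision, and how a separator that does not occur in the data avoids it.

```r
# Hypothetical data: two different rows whose values concatenate to "abc".
x <- data.frame(a = c("ab", "a"), b = c("c", "bc"), stringsAsFactors = FALSE)

# Without a separator, both rows collapse to the same key.
no_sep <- do.call(paste0, x)

# With a separator assumed not to occur in the data, the keys stay distinct.
with_sep <- do.call(paste, c(x, sep = "\r"))

print(match(no_sep, unique(no_sep)))
# [1] 1 1
print(match(with_sep, unique(with_sep)))
# [1] 1 2
```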