Dear members,

I want to create a sequence of numbers for the multiple records of
individual animal in my dataset. The SAS code below will do the trick, but
I want to learn to do it in R. Can anyone help ?

  data ht&ssn;
  set ht&ssn;
  by anml_key;
  if first.anml_key then do;
  seq_ht_rslt=0;
  end;
  seq_ht_rslt+1;

Thanks in advance.

Stella

___________________________________________________________________________
This message, including attachments, is confidential. If you are not the intended recipient, please contact us as soon as possible and then destroy the message. Do not copy, disclose or use the contents in any way. The recipient should check this email and any attachments for viruses and other defects. Livestock Improvement Corporation Limited and any of its subsidiaries and associates are not responsible for the consequences of any virus, data corruption, interception or unauthorised amendments to this email. Because of the many uncertainties of email transmission we cannot guarantee that a reply to this email will be received even if correctly sent. Unless specifically stated to the contrary, this email does not designate an information system for the purposes of section 11(a) of the New Zealand Electronic Transactions Act 2002.
ssim at lic.co.nz writes:

> Dear members,
>
> I want to create a sequence of numbers for the multiple records of
> individual animal in my dataset. The SAS code below will do the trick, but
> I want to learn to do it in R. Can anyone help ?
>
> data ht&ssn;
> set ht&ssn;
> by anml_key;
> if first.anml_key then do;
> seq_ht_rslt=0;
> end;
> seq_ht_rslt+1;
>
> Thanks in advance.

Whoa. Who just said that SAS data step code was clearer than R? Quite
a bit of implicit knowledge in that one.

Here's one way (someone please think up a better name for ave()...):

> x <- numeric(nrow(airquality))
> ave(x, airquality$Month, FUN=function(z)seq(along=z))
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
 [37]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 [55] 24 25 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10 11
 [73] 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
 [91] 30 31  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
[109] 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3
[127]  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
[145] 22 23 24 25 26 27 28 29 30

or, same basic idea but a little less cryptic:

> tb <- table(airquality$Month)
> l <- lapply(tb, function(x)seq(length=x))
> unsplit(l, airquality$Month)
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
(etc.)

or, brute force and ignorance:

> x <- numeric(nrow(airquality))
> for (i in unique(airquality$Month)) {
+   ix <- airquality$Month == i
+   x[ix] <- seq(along=x[ix])
+ }
> x
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
....

or, going to the opposite extreme (Gabor et al. are going to try and
beat me on this...):

> seq.factor <- function(f) ave(rep(1,length(f)),f,FUN=cumsum)
> seq(as.factor(airquality$Month))
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
....

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
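
For readers who want to see the ave() idea applied to a data frame shaped like the poster's, here is a minimal sketch. The data frame `ht`, the column `ht_rslt`, and all the values are invented for illustration; only `anml_key` and `seq_ht_rslt` come from the original question. It also uses seq_along(), which is available in current versions of R, in place of seq(along=...).

```r
# Hypothetical stand-in for the poster's data: several records per animal.
# Only the names anml_key and seq_ht_rslt come from the thread; the data
# frame name, the ht_rslt column, and every value are made up.
ht <- data.frame(
  anml_key = c(101, 101, 102, 101, 103, 102),
  ht_rslt  = c(1.2, 1.3, 0.9, 1.4, 1.1, 1.0)
)

# ave() splits the first argument by the grouping variable, applies FUN to
# each piece, and puts the results back in the original row positions.
# Here each piece is replaced by 1, 2, ..., n for that animal.
ht$seq_ht_rslt <- ave(seq_along(ht$anml_key), ht$anml_key, FUN = seq_along)

print(ht)
```

As far as I can tell, the rows do not need to be sorted by `anml_key` first, whereas a SAS BY statement normally expects sorted or indexed input. The counter follows whatever order the rows are currently in, so sort beforehand if the numbering should follow a date or similar.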
I think this might do it.

> x.1 <- data.frame(x=sample(1:3,20,T), y=sample(10:12,20,T))  # create test data
> x.1  # print it out
   x  y
1  2 11
2  3 11
3  2 10
4  1 12
5  3 11
6  1 10
7  3 10
8  1 11
9  1 12
10 1 11
11 1 12
12 1 12
13 2 11
14 3 11
15 3 10
16 3 10
17 2 12
18 2 10
19 3 11
20 2 11

# split the data by the numbers in 'x' (would be your 'anml_key')
# and add a column containing the sequence number
> x.s <- by(x.1, x.1$x, function(x){x$seq <- seq(along=x$x); x})

# the result in 'x.s' is a list and the rows have to be recombined (rbind)
# to form the result
> x.s  # print out the data
x.1$x: 1
   x  y seq
4  1 12   1
6  1 10   2
8  1 11   3
9  1 12   4
10 1 11   5
11 1 12   6
12 1 12   7
------------------------------------------------------------
x.1$x: 2
   x  y seq
1  2 11   1
3  2 10   2
13 2 11   3
17 2 12   4
18 2 10   5
20 2 11   6
------------------------------------------------------------
x.1$x: 3
   x  y seq
2  3 11   1
5  3 11   2
7  3 10   3
14 3 11   4
15 3 10   5
16 3 10   6
19 3 11   7

> do.call('rbind', x.s)  # bind the rows and print out the result
     x  y seq
1.4  1 12   1
1.6  1 10   2
1.8  1 11   3
1.9  1 12   4
1.10 1 11   5
1.11 1 12   6
1.12 1 12   7
2.1  2 11   1
2.3  2 10   2
2.13 2 11   3
2.17 2 12   4
2.18 2 10   5
2.20 2 11   6
3.2  3 11   1
3.5  3 11   2
3.7  3 10   3
3.14 3 11   4
3.15 3 10   5
3.16 3 10   6
3.19 3 11   7
>

__________________________________________________________
James Holtman        "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
james.holtman at convergys.com
+1 (513) 723-2929

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
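
One side effect of the by()/rbind route is that the rows come back grouped by key, with row names such as 1.4. If the original row order matters, a variation on the same split-and-number idea is to use split() and unsplit() on the data frame. This is only a sketch: the small fixed data frame below is invented so that the result is repeatable, and `pieces` and `x.2` are names chosen here, not taken from the post above.

```r
# Small fixed test data (invented), playing the role of 'x.1' above.
x.1 <- data.frame(x = c(2, 3, 2, 1, 3, 1),
                  y = c(11, 11, 10, 12, 11, 10))

# Split into one data frame per key value.
pieces <- split(x.1, x.1$x)

# Add a sequence column 1..n to each piece.
pieces <- lapply(pieces, function(d) {
  d$seq <- seq_len(nrow(d))
  d
})

# unsplit() reverses split(): rows go back to their original positions.
x.2 <- unsplit(pieces, x.1$x)

print(x.2)
```

The design choice is order preservation: the numbering is the same as in the by() version, but `x.2` has its rows in the same order as `x.1`, so it can replace the original data frame directly.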
ssim at lic.co.nz (Stella) asked

    I want to create a sequence of numbers for the multiple records of
    individual animal in my dataset. The SAS code below will do the
    trick, but I want to learn to do it in R. Can anyone help ?

    data ht&ssn;
    set ht&ssn;
    by anml_key;
    if first.anml_key then do;
    seq_ht_rslt=0;
    end;
    seq_ht_rslt+1;

Someone was saying how readable SAS data steps were. I must say that as
someone who has written code in more than 160 programming languages I
find this _completely_ unreadable. (Is the initial value for seq_ht_rslt
0 or 1?) So I'm going to have to guess what was intended.

Suppose you have a data.frame ht_ssn and want to add a sequence number
column for it. That's easy:

    ht_ssn$seqno <- seq(length = nrow(ht_ssn))

Now suppose that there is an ht_ssn$anml_key column which says which
individual animal each row corresponds to, and many rows may correspond
to the same animal.

    data_sequence_number <- function (data, column = "anml_key") {
        # Extract the key column.
        # If it is not already a factor, make it one.
        # From this factor, extract the level numbers.
        as.numeric(as.factor(data[[column]]))
    }

    ht_ssn$seq_ht_rslt <- data_sequence_number(ht_ssn)

Probably I have completely misunderstood the question.

One thing which will be different is the actual numeric values. If I've
understood the SAS version, it will assign numbers to keys in the order
in which the keys are encountered, while the R code above will assign
numbers to keys in increasing order of key. So if the input contains
just "Sammy" then "Jumbo" the SAS version might assign numbers 1, 2
while the R version would assign 2, 1. If this really matters, use

    x <- data[[column]]
    as.numeric(factor(x, levels = unique(x)))

(Note that it has to be factor() here, not as.factor(): as.factor()
does not take a levels argument.)
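
To make the ordering point concrete, here is a small sketch using the "Sammy" and "Jumbo" keys mentioned above, extended to five invented records. It contrasts numbering keys in sorted order with numbering them in order of first appearance. The last line, offered only as a guess at what the other replies in the thread took the question to mean, counts records within each key instead of numbering the keys. The variable names are made up for this example.

```r
# Invented key column: "Sammy" appears before "Jumbo".
x <- c("Sammy", "Jumbo", "Sammy", "Jumbo", "Sammy")

# Levels sorted alphabetically: Jumbo = 1, Sammy = 2.
by_sorted <- as.numeric(as.factor(x))

# Levels in order of first appearance: Sammy = 1, Jumbo = 2.
by_first <- as.numeric(factor(x, levels = unique(x)))

# A different reading of the question: a running count of the records
# within each key, restarting at 1 for every animal.
within_key <- ave(seq_along(x), x, FUN = seq_along)

print(rbind(by_sorted, by_first, within_key))
```

The first two results label each row with a number for its animal; the third numbers the repeated records of each animal, which appears to be what the SAS step computes by resetting the counter at first.anml_key.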