thr3ads.net - R help - [R] RE : Create sequence for dataset [Nov 2004]

ssim@lic.co.nz

2004-Nov-21 21:28 UTC

[R] RE : Create sequence for dataset

Dear members,

I want to create a sequence of numbers for the multiple records of
individual animal in my dataset. The SAS code below will do the trick, but
I want to learn to do it in R. Can anyone help ?

data ht&ssn;
set ht&ssn;
by anml_key;
if first.anml_key then do;
seq_ht_rslt=0;
end;
seq_ht_rslt+1;

Thanks in advance.

Stella
___________________________________________________________________________
This message, including attachments, is confidential. If you are not the
intended recipient, please contact us as soon as possible and then destroy
the message. Do not copy, disclose or use the contents in any way.

The recipient should check this email and any attachments for viruses and
other defects. Livestock Improvement Corporation Limited and any of its
subsidiaries and associates are not responsible for the consequences of any
virus, data corruption, interception or unauthorised amendments to this
email.

Because of the many uncertainties of email transmission we cannot guarantee
that a reply to this email will be received even if correctly sent. Unless
specifically stated to the contrary, this email does not designate an
information system for the purposes of section 11(a) of the New Zealand
Electronic Transactions Act 2002.

Peter Dalgaard

2004-Nov-21 22:57 UTC

ssim at lic.co.nz writes:
> Dear members,
> 
> I want to create a sequence of numbers for the multiple records of
> individual animal in my dataset. The SAS code below will do the trick, but
> I want to learn to do it in R. Can anyone help ?
> 
> data ht&ssn;
> set ht&ssn;
> by anml_key;
> if first.anml_key then do;
> seq_ht_rslt=0;
> end;
> seq_ht_rslt+1;
> 
> Thanks in advance.

Whoa. Who just said that SAS data step code was clearer than R? Quite
a bit of implicit knowledge in that one.

Here's one way (someone please think up a better name for ave()...):
> x <- numeric(nrow(airquality))
> ave(x, airquality$Month, FUN=function(z)seq(along=z))
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
 [37]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 [55] 24 25 26 27 28 29 30  1  2  3  4  5  6  7  8  9 10 11
 [73] 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
 [91] 30 31  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
[109] 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3
[127]  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
[145] 22 23 24 25 26 27 28 29 30
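
Applied to the original question, the same ave() idea can be sketched on a small made-up data frame. The data below are hypothetical; only the column names anml_key and seq_ht_rslt come from the SAS code, and seq_along() is used in place of seq(along=).

```r
# Hypothetical data: 'anml_key' stands in for the poster's animal identifier.
ht <- data.frame(anml_key = c("A", "A", "B", "A", "C", "B"),
                 result   = c(10, 12, 7, 11, 9, 8))

# Running count of records within each animal, taken in row order.
# ave() splits the first argument by the key, applies FUN to each piece,
# and puts the results back in the original positions.
ht$seq_ht_rslt <- ave(seq_along(ht$anml_key), ht$anml_key, FUN = seq_along)

ht$seq_ht_rslt
# 1 2 1 3 1 2
```

The rows do not need to be sorted by key for this to work, since ave() writes each group's result back in place.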

or, same basic idea but a little less cryptic:
> tb <- table(airquality$Month) 
> l <- lapply(tb, function(x)seq(length=x))
> unsplit(l, airquality$Month)
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
(etc.)

or, brute force and ignorance:
> x <- numeric(nrow(airquality))
> for (i in unique(airquality$Month)) {
+   ix <- airquality$Month == i
+   x[ix] <- seq(along=x[ix])
+ }
> x
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
....

or, going to the opposite extreme (Gabor et al. are going to try and
beat me on this...):
> seq.factor <- function(f) ave(rep(1,length(f)),f,FUN=cumsum)
> seq(as.factor(airquality$Month))
  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31  1  2  3  4  5
....

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

james.holtman@convergys.com

2004-Nov-21 23:18 UTC

I think this might do it.
> x.1 <- data.frame(x=sample(1:3,20,T), y=sample(10:12,20,T))  # create test data
> x.1  # print it out
   x  y
1  2 11
2  3 11
3  2 10
4  1 12
5  3 11
6  1 10
7  3 10
8  1 11
9  1 12
10 1 11
11 1 12
12 1 12
13 2 11
14 3 11
15 3 10
16 3 10
17 2 12
18 2 10
19 3 11
20 2 11
# split the data by the numbers in 'x' (would be your 'anml_key')
# and add a column containing the sequence number
> x.s <- by(x.1, x.1$x, function(x){x$seq <- seq(along=x$x); x})
# the result in 'x.s' is a list and the rows have to be recombined (rbind)
# to form the result
> x.s  # print out the data
x.1$x: 1
   x  y seq
4  1 12   1
6  1 10   2
8  1 11   3
9  1 12   4
10 1 11   5
11 1 12   6
12 1 12   7
------------------------------------------------------------
x.1$x: 2
   x  y seq
1  2 11   1
3  2 10   2
13 2 11   3
17 2 12   4
18 2 10   5
20 2 11   6
------------------------------------------------------------
x.1$x: 3
   x  y seq
2  3 11   1
5  3 11   2
7  3 10   3
14 3 11   4
15 3 10   5
16 3 10   6
19 3 11   7
> do.call('rbind', x.s)  # bind the rows and print out the result
     x  y seq
1.4  1 12   1
1.6  1 10   2
1.8  1 11   3
1.9  1 12   4
1.10 1 11   5
1.11 1 12   6
1.12 1 12   7
2.1  2 11   1
2.3  2 10   2
2.13 2 11   3
2.17 2 12   4
2.18 2 10   5
2.20 2 11   6
3.2  3 11   1
3.5  3 11   2
3.7  3 10   3
3.14 3 11   4
3.15 3 10   5
3.16 3 10   6
3.19 3 11   7

__________________________________________________________
James Holtman        "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
james.holtman at convergys.com
+1 (513) 723-2929
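
Note that the by()/rbind result above comes back grouped by key, with compound row names, rather than in the original row order. If the original order matters, one possible variation (a sketch, not part of the posted answer) is to split the data frame and put it back with unsplit(). The seed and the name x.2 are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable
x.1 <- data.frame(x = sample(1:3, 20, TRUE), y = sample(10:12, 20, TRUE))

# Split by key, add a within-group counter to each piece,
# then reassemble. unsplit() reverses split() for the same grouping
# vector, so every row returns to its original position.
pieces <- split(x.1, x.1$x)
pieces <- lapply(pieces, function(d) { d$seq <- seq_len(nrow(d)); d })
x.2 <- unsplit(pieces, x.1$x)

head(x.2)
```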




Richard A. O'Keefe

2004-Nov-22 01:38 UTC

ssim at lic.co.nz (Stella) asked
	I want to create a sequence of numbers for the multiple records of
	individual animal in my dataset. The SAS code below will do the trick, but
	I want to learn to do it in R. Can anyone help ?
	
	data ht&ssn;
	set ht&ssn;
	by anml_key;
	if first.anml_key then do;
	seq_ht_rslt=0;
	end;
	seq_ht_rslt+1;
	
Someone was saying how readable SAS data steps were.
I must say that as someone who has written code in more than 160
programming languages I find this _completely_ unreadable.
(Is the initial value for seq_ht_rslt 0 or 1?)
So I'm going to have to guess what was intended.

Suppose you have a data.frame ht_ssn and want to add a sequence number
column for it.  That's easy:

	ht_ssn$seqno <- seq(length = nrow(ht_ssn))

Now suppose that there is an ht_ssn$anml_key column which says which
individual animal each row corresponds to, and many rows may correspond
to the same animal.

	data_sequence_number <- function (data, column = "anml_key") {
	    # Extract the key column.
	    # If it is not already a factor, make it one.
	    # From this factor, extract the level numbers.
	    as.numeric(as.factor(data[[column]]))
	}

	ht_ssn$seq_ht_rslt <- data_sequence_number(ht_ssn)

Probably I have completely misunderstood the question.

One thing which will be different is the actual numeric values.
If I've understood the SAS version, it will assign numbers to keys
in the order in which the keys are encountered, while the R code
above will assign numbers to keys in increasing order of key.  So
if the input contains just "Sammy" then "Jumbo" the SAS version
might assign numbers 1, 2 while the R version would assign 2, 1.

If this really matters, use
	x <- data[[column]]
	as.numeric(factor(x, levels = unique(x)))
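
Since the question can be read two ways, here is a small sketch that puts both readings side by side. The key values are made up. As far as I can tell from the SAS step, the counter is reset at the first record of each key and then incremented on every record, so the first record of each animal gets 1; that corresponds to the second reading below.

```r
# Hypothetical keys, deliberately not in sorted order.
key <- c("Sammy", "Jumbo", "Sammy", "Sammy", "Jumbo")

# Reading 1: one number per animal, in order of first appearance.
animal_no <- as.integer(factor(key, levels = unique(key)))
animal_no
# 1 2 1 1 2

# Reading 2: running count of records within each animal.
record_no <- ave(seq_along(key), key, FUN = seq_along)
record_no
# 1 1 2 3 2
```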
