thr3ads.net - R help - [R] beginner programming question [Dec 2003]

Adrian Dusa

2003-Dec-17 19:28 UTC

[R] beginner programming question

Hi all,

 

The recent e-mails about beginners gave me the courage to post a question;
from a beginner's perspective, there are a lot of questions that I'm
tempted to ask. But I try to find the answers first in the
documentation, in the 15 or so free books I have, or in the
help archives (I have often found similar questions posted in the past).

I am still an SPSS user, but I'd like to be able to do
everything in R. I've learned that the best way of doing that is to
struggle and find a solution no matter what, refraining from doing it
with SPSS. I've become more and more aware of the almost unlimited
possibilities that R offers and I'd like to switch to R completely
once I think I'm ready.

 

I have a (rather theoretical) programming problem for which I have found
a solution, but I feel it is a rather poor one. I wonder if there's some
other (more clever) solution, using (maybe?) vectorization or
subscripting.

 

A toy example would be:

 

rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
   1    3   NA   25   23    2   NA    1    2    1   NA
   4    1    3   35   67   34   10    2    2    1    2
   1    4    4   39   40   59   60    1    2    2    1
   4   NA   NA   45   70   NA   NA    2    2   NA   NA

 

where rel1...rel3 state the kinship of each person with the respondent (person 0):
code 1 means husband/wife, code 3 child and code 4 parent.

 

I would like to get the age for husbands (code 1) in a first column and
wife's age in the second:

 

ageh agew
  25   23
  34   35
  39   40

 

My solution uses *for* loops and *if*s, checking for code 1 in each
element of the first 3 columns, then checking in the last three columns
for the husband's code, then copying the corresponding age into a new matrix.
I've learned that *for* loops are very slow (and indeed, with my dataset
of some 2000 rows and 13 kinship columns, it takes quite a long time).

I found the "Looping" chapter in "S poetry" very useful (it
did save me from *for* loops a couple of times, thanks!).

 

Any hints would be appreciated,

Adrian

 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Adrian Dusa (adi@roda.ro)
Romanian Social Data Archive (www.roda.ro <http://www.roda.ro/> )
1, Schitu Magureanu Bd.
76625 Bucharest sector 5
Romania


Tel./Fax:

+40 (21) 312.66.18\

+40 (21) 312.02.10/ int.101

 



Gabor Grothendieck

2003-Dec-17 20:02 UTC

Define function f to take a vector as input representing
a single input row.   f should (1) transform this to a vector 
representing the required row of output or else (2) produce 
NULL if no row is to be output for that input row.

Then use this code where z is your input matrix:

t( matrix( unlist( apply( z, 1, f ) ), 2) )
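
A minimal sketch of what such an f might look like for the toy data in the question (not necessarily what the poster had in mind). It assumes that sex code 1 means male, which is only inferred from the expected output in the question, and that a row holds at most one spouse. The data frame z simply retypes the toy example.

```r
# Toy data retyped from the question
z <- data.frame(
  rel1 = c(1, 4, 1, 4),     rel2 = c(3, 1, 4, NA),   rel3 = c(NA, 3, 4, NA),
  age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
  age2 = c(2, 34, 59, NA),  age3 = c(NA, 10, 60, NA),
  sex0 = c(1, 2, 1, 2),     sex1 = c(2, 2, 2, 2),
  sex2 = c(1, 1, 2, NA),    sex3 = c(NA, 2, 1, NA)
)

# f takes one row (a named numeric vector) and returns c(ageh, agew),
# or NULL when the respondent has no spouse in the household.
f <- function(row) {
  k <- which(row[c("rel1", "rel2", "rel3")] == 1)  # which() ignores NA
  if (length(k) == 0) return(NULL)
  k <- k[1]                                # assumption: at most one spouse
  own    <- row[["age0"]]
  spouse <- row[[paste0("age", k)]]
  # assumption: sex code 1 = male, so a male respondent is the husband
  if (row[["sex0"]] == 1) c(own, spouse) else c(spouse, own)
}

out <- t(matrix(unlist(apply(z, 1, f)), 2))
colnames(out) <- c("ageh", "agew")
out
```

Because f returns NULL for the fourth row, apply() hands back a list, and unlist() silently drops the empty entry before the values are folded into a two-column matrix.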




Ray Brownrigg

2003-Dec-17 21:04 UTC

> From: "Gabor Grothendieck" <ggrothendieck at myway.com>
> Date: Wed, 17 Dec 2003 15:02:49 -0500 (EST)
>
> Define function f to take a vector as input representing
> a single input row.   f should (1) transform this to a vector
> representing the required row of output or else (2) produce
> NULL if no row is to be output for that input row.
>
> Then use this code where z is your input matrix:
>
> t( matrix( unlist( apply( z, 1, f ) ), 2) )

But as has been pointed out recently, apply really is still just a for
loop.

> > From: Adrian Dusa <adi at roda.ro>
> > Date: Wed, 17 Dec 2003 21:28:05 +0200
> >
> > I have a (rather theoretical) programming problem for which I have found
> > a solution, but I feel it is a rather poor one. I wonder if there's some
> > other (more clever) solution, using (maybe?) vectorization or
> > subscripting.

Here is a subscripting solution, where (for consistency with above) z is
your data [from read.table(filename, header=T)]:

> z
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> res <- matrix(NA, nrow=length(z[, 1]), ncol=2,
+        dimnames=list(rownames=rownames(z), colnames=c("ageh", "agew")))
> w <- w0 <- w1 <- w2 <- which(z[, c("rel1", "rel2", "rel3")] == 1, T)
                                        # find spouse entries
> w0[, 2] <- z[, "sex0"][w[, 1]]        # indices for respondent's age
> w1[, 2] <- 3 - w0[, 2]                # indices for spouse's age
> w2[, 2] <- 4 + w[, 2]                 # indices of spouse's age
> res[w0] <- z[, "age0"][w[, 1]]        # set respondent's age
> res[w1] <- z[w2]                      # set spouse's age
> res
        colnames
rownames ageh agew
       1   25   23
       2   34   35
       3   39   40
       4   NA   NA
>

Ray Brownrigg
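
The solution above relies on indexing a matrix with a two-column matrix of (row, column) pairs, which is also what which(..., arr.ind = TRUE) returns. A minimal sketch of that mechanism on made-up data (the names m, idx, w and res here are only illustrative and unrelated to the thread's variables):

```r
m <- matrix(1:12, nrow = 3)            # 3 x 4 matrix, filled column by column

# Each row of idx is one (row, column) pair
idx <- cbind(c(1, 3), c(2, 4))
picked <- m[idx]                       # the elements m[1, 2] and m[3, 4]

# which() with arr.ind = TRUE reports matches as (row, column) pairs
w <- which(m > 10, arr.ind = TRUE)

# The same kind of index matrix can be used on the left-hand side
res <- matrix(NA, nrow = 3, ncol = 2)
res[cbind(c(1, 3), c(2, 1))] <- c(10, 20)   # sets res[1, 2] and res[3, 1]
```

Changing the second column of such an index matrix, as w0, w1 and w2 do above, redirects each pair to a different column while keeping the row.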

Gabor Grothendieck

2003-Dec-17 22:03 UTC

This is just a response to the part where you refer to an apply
loop really being a for loop.  In a sense this is true, but
it should nevertheless be recognized that the apply solution
has a number of advantages over for:

- it nicely separates the problem into a single line that is 
independent of the details of the problem and localizes them 
in f

- the rows are pasted together automatically avoiding messy
appending or creation and filling in of a structure

- it avoids the use of indices

Of course, some apply loops come pretty close to for loops.  For
example, consider this variation:

  t( matrix( unlist (sapply( 1:nrow(z), function(i) f(z[i,]) ) ), 2 ))

and compare it to the for loop:

 out <- NULL
 for ( i in 1:nrow(z) ) {
   v <- f( z[i,] )
   if ( ! is.null(v) ) out <- rbind( out, v )
 }

but even this apply, which is clearly inferior to the one in my
original posting, retains the first two advantages listed.
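
To make the comparison concrete, here is a small sketch that runs the apply form and the for loop on the same input and checks that they agree. The matrix z and the function f below are simplified stand-ins invented for illustration, not the data or function from the thread. The last variant, do.call(rbind, lapply(...)), is a further alternative that is not mentioned in the thread.

```r
# Simplified stand-in data: one kinship code and two ages per row
z <- matrix(c(1, 25, 23,
              4, 45, 70,
              1, 39, 40),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("rel", "age0", "age1")))

# Hypothetical f: keep the two ages when rel is 1, otherwise no output row
f <- function(row) if (row[["rel"]] == 1) row[c("age0", "age1")] else NULL

# apply form from the original reply
via_apply <- t(matrix(unlist(apply(z, 1, f)), 2))

# explicit for loop, growing the result row by row
out <- NULL
for (i in 1:nrow(z)) {
  v <- f(z[i, ])
  if (!is.null(v)) out <- rbind(out, v)
}

# alternative not in the thread: collect rows in a list, bind once at the end
via_rbind <- do.call(rbind, lapply(seq_len(nrow(z)), function(i) f(z[i, ])))
```

All three produce the same two rows; they differ only in where the bookkeeping lives.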


Tony Plate

2003-Dec-17 22:40 UTC

Another way to approach this is to first massage the data into a more 
regular format.  This may or may not be simpler or faster than other 
solutions suggested.

 > x <- read.table("clipboard", header=T)
 > x
   rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
 > nn <- c("rel","age0","age","sex0","sex")
 > xx <- rbind("colnames<-"(x[,c("rel1","age0","age1","sex0","sex1")], nn),
 +             "colnames<-"(x[,c("rel2","age0","age2","sex0","sex2")], nn),
 +             "colnames<-"(x[,c("rel3","age0","age3","sex0","sex3")], nn))
 > xx
    rel age0 age sex0 sex
1    1   25  23    1   2
2    4   35  67    2   2
3    1   39  40    1   2
4    4   45  70    2   2
11   3   25   2    1   1
21   1   35  34    2   1
31   4   39  59    1   2
41  NA   45  NA    2  NA
12  NA   25  NA    1  NA
22   3   35  10    2   2
32   4   39  60    1   1
42  NA   45  NA    2  NA
 >
 > rbind(subset(xx, xx$rel==1 & (xx$sex0==1 | xx$sex0==xx$sex))[,c("age0","age")],
 +       subset(xx, xx$rel==1 & xx$sex==1 & xx$sex0!=xx$sex)[,c("age","age0")])
    age0 age
1    25  23
3    39  40
21   35  34
 >

hope this helps,

Tony Plate

PS.  To advanced R users: Is the above usage of the "colnames<-" function
within an expression regarded as acceptable or as undesirable programming
style? -- I've rarely seen it used, but it can be quite useful.
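
For readers unfamiliar with the idiom in the PS: a replacement function such as "colnames<-" can be called like any other function, in which case it returns a renamed copy and leaves its argument untouched. A minimal sketch on a cut-down, made-up data frame follows. setNames() from the stats package is shown as one alternative some may find easier to read; it is not something proposed in the thread.

```r
# Cut-down stand-in for the thread's data frame
x <- data.frame(rel1 = c(1, 4), age0 = c(25, 35), age1 = c(23, 67))
nn <- c("rel", "age0", "age")

# Functional call of the replacement function: returns a renamed copy
a <- "colnames<-"(x[, c("rel1", "age0", "age1")], nn)

# setNames() does the same job for a data frame's column names
b <- setNames(x[, c("rel1", "age0", "age1")], nn)

names(a)   # the new names
names(x)   # the original names; x itself is not modified
```

Either form can be used inline as an argument to rbind(), which is what makes the one-expression reshaping above possible.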


Adrian Dusa

2003-Dec-22 22:45 UTC

Thank you all! I did it, and it worked just fine. Over the last week I've been
torturing the syntax in various ways, until finally it was all clear. The
subscripting solution opened new doors for me.
The reshape command in particular gave me about three days of headache. I
read the help page about 20 times, trying to figure out how to do it; the trouble
is that it gives no example of reshaping multiple sets of varying variables,
and does not say that the names of the new variables in the long format
should be given as a vector in the v.names argument.

Anyway, the syntax is:
> x <- read.table("clipboard", header=T)
> x
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> xx <- reshape(x, varying=list(names(x)[1:3], names(x)[5:7],
+               names(x)[9:11]), v.names=c("rel", "age", "sex"),
+               direction="long")
> xx
    age0 sex0 time rel age sex id
1.1   25    1    1   1  23   2  1
2.1   35    2    1   4  67   2  2
3.1   39    1    1   1  40   2  3
4.1   45    2    1   4  70   2  4
1.2   25    1    2   3   2   1  1
2.2   35    2    2   1  34   1  2
3.2   39    1    2   4  59   2  3
4.2   45    2    2  NA  NA  NA  4
1.3   25    1    3  NA  NA  NA  1
2.3   35    2    3   3  10   2  2
3.3   39    1    3   4  60   1  3
4.3   45    2    3  NA  NA  NA  4
> xx <- subset(xx, xx$rel==1)
> rbind(subset(xx, xx$sex0==1)[,c("age0","age")],
+       subset(xx, xx$sex==1)[,c("age","age0")])
    age0 age
1.1   25  23
3.1   39  40
2.2   35  34

I wish you a Merry Xmas, you are a truly great community.
Adrian
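
One detail worth noting about the rbind() outputs shown in this thread: the row for the second household reads 35 34, that is, the wife's age comes first, whereas the question asked for 34 35. rbind() on data frames matches columns by name rather than by position, so reordering the columns of the second subset does not swap the values. A minimal sketch of one way to put the husband's age consistently in the first column follows. It assumes that sex code 1 means male (inferred from the expected output in the question) and that spouses are of opposite sex; the names sp and res are only illustrative.

```r
# Toy data retyped from the question
x <- data.frame(
  rel1 = c(1, 4, 1, 4),     rel2 = c(3, 1, 4, NA),   rel3 = c(NA, 3, 4, NA),
  age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
  age2 = c(2, 34, 59, NA),  age3 = c(NA, 10, 60, NA),
  sex0 = c(1, 2, 1, 2),     sex1 = c(2, 2, 2, 2),
  sex2 = c(1, 1, 2, NA),    sex3 = c(NA, 2, 1, NA)
)

# Same reshape() call as in the message above: one long row per household member
xx <- reshape(x, varying = list(names(x)[1:3], names(x)[5:7], names(x)[9:11]),
              v.names = c("rel", "age", "sex"), direction = "long")

# Keep spouse rows only; subset() treats NA in the condition as FALSE
sp <- subset(xx, rel == 1)

# Assumption: sex code 1 = male. A male respondent is the husband,
# otherwise the spouse is (assuming opposite-sex couples).
res <- data.frame(ageh = ifelse(sp$sex0 == 1, sp$age0, sp$age),
                  agew = ifelse(sp$sex0 == 1, sp$age,  sp$age0))
res
```

The rows come out in the order of the long data (households 1, 3, 2); sorting by sp$id would restore the original household order if that matters.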

-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu] 
Sent: Thursday, December 18, 2003 5:53 PM
To: Tony Plate
Cc: adi at roda.ro; r-help at stat.math.ethz.ch
Subject: Re: [R] beginner programming question

On Wed, 17 Dec 2003, Tony Plate wrote:
> Another way to approach this is to first massage the data into a more
> regular format.  This may or may not be simpler or faster than other
> solutions suggested.
You could also use the reshape() command to do the massaging

	-thomas