Hi all,

The last e-mails about beginners gave me the courage to post a question; from a beginner's perspective, there are a lot of questions that I'm tempted to ask. But I try to find the answers first in the documentation, in the 15 or so free books I have, or in the help archives (I have often found similar questions posted in the past).

Being (still) an SPSS user, I'd like to be able to do everything in R. I've learned that the best way of doing it is to struggle and find a solution no matter what, refraining from doing it with SPSS. I've become more and more aware of the almost unlimited possibilities that R offers, and I'd like to switch to R completely as soon as I think I'm ready.

I have a (rather theoretical) programming problem for which I have found a solution, but I feel it is a rather poor one. I wonder if there's some other (more clever) solution, using (maybe?) vectorization or subscripting.

A toy example would be:

rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
   1    3   NA   25   23    2   NA    1    2    1   NA
   4    1    3   35   67   34   10    2    2    1    2
   1    4    4   39   40   59   60    1    2    2    1
   4   NA   NA   45   70   NA   NA    2    2   NA   NA

where rel1...3 state the kinship with the respondent (person 0): code 1 means husband/wife, code 4 means parent and code 3 means child.

I would like to get the age for husbands (code 1) in a first column and the wife's age in the second:

ageh agew
  25   23
  34   35
  39   40

My solution uses *for* loops and *if*s: it checks for code 1 in each element of the first 3 columns, then checks the last three columns for the husband's code, then copies the corresponding age into a new matrix. I've learned that *for* loops are very slow (and indeed with my dataset of some 2000 rows and 13 columns for kinship it takes quite a long time).

I found the "Looping" chapter in "S poetry" very useful (it saved me from *for* loops a couple of times, thanks!).

Any hints would be appreciated,
Adrian

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Adrian Dusa (adi@roda.ro)
Romanian Social Data Archive (www.roda.ro <http://www.roda.ro/>)
1, Schitu Magureanu Bd.
76625 Bucharest sector 5
Romania
Tel./Fax:
+40 (21) 312.66.18\
+40 (21) 312.02.10/ int.101

Define a function f that takes a vector representing a single input row. f should either (1) transform this into a vector representing the required row of output, or (2) produce NULL if no row is to be output for that input row.

Then use this code, where z is your input matrix:

  t( matrix( unlist( apply( z, 1, f ) ), 2) )

---
Date: Wed, 17 Dec 2003 21:28:05 +0200
From: Adrian Dusa <adi at roda.ro>
To: <r-help at stat.math.ethz.ch>
Subject: [R] beginner programming question

[...]
I have a (rather theoretical) programming problem for which I have found a solution, but I feel it is a rather poor one. I wonder if there's some other (more clever) solution, using (maybe?) vectorization or subscripting.
[...]

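The recipe above leaves f unspecified. As a minimal sketch, assuming that code 1 in the sex columns means male (as the toy example suggests) and that only the first spouse entry in a row matters, f might look like the following. The body of f and the inline copy of the toy data are illustrative assumptions, not part of the original reply.

```r
# Toy data from the original post, read inline so the sketch is self-contained.
z <- read.table(header = TRUE, text = "
rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1 3 NA 25 23 2 NA 1 2 1 NA
4 1 3 35 67 34 10 2 2 1 2
1 4 4 39 40 59 60 1 2 2 1
4 NA NA 45 70 NA NA 2 2 NA NA
")

# Hypothetical f: receives one row as a named vector and returns
# c(husband's age, wife's age), or NULL when the row lists no spouse.
f <- function(row) {
  # position (1, 2 or 3) of the first relative with kinship code 1; NA if none
  k <- which(row[c("rel1", "rel2", "rel3")] == 1)[1]
  if (is.na(k)) return(NULL)
  ages <- c(row[["age0"]], row[[paste0("age", k)]])  # respondent, spouse
  # Assumption: sex0 == 1 marks a male respondent, so he goes first.
  if (row[["sex0"]] == 1) ages else rev(ages)
}

# The one-liner from the reply; apply() converts the data frame to a matrix.
res <- t(matrix(unlist(apply(z, 1, f)), 2))
colnames(res) <- c("ageh", "agew")
res
```

Rows for which f returns NULL simply vanish in unlist(), which is why the result has three rows for four respondents.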
> From: "Gabor Grothendieck" <ggrothendieck at myway.com>
> Date: Wed, 17 Dec 2003 15:02:49 -0500 (EST)
>
> Define function f to take a vector as input representing
> a single input row. f should (1) transform this to a vector
> representing the required row of output or else (2) produce
> NULL if no row is to be output for that input row.
>
> Then use this code where z is your input matrix:
>
> t( matrix( unlist( apply( z, 1, f ) ), 2) )
>
But as has been pointed out recently, apply really is still just a for loop.

> > From: Adrian Dusa <adi at roda.ro>
> > Date: Wed, 17 Dec 2003 21:28:05 +0200
> >
> > I have a (rather theoretical) programming problem for which I have found
> > a solution, but I feel it is a rather poor one. I wonder if there's some
> > other (more clever) solution, using (maybe?) vectorization or
> > subscripting.

Here is a subscripting solution, where (for consistency with above) z is your data [from read.table(filename, header=T)]:

> z
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> res <- matrix(NA, nrow=length(z[, 1]), ncol=2,
+   dimnames=list(rownames=rownames(z), colnames=c("ageh", "agew")))
> w <- w0 <- w1 <- w2 <- which(z[, c("rel1", "rel2", "rel3")] == 1, T) # find spouse entries
> w0[, 2] <- z[, "sex0"][w[, 1]]  # output column for respondent's age (sex0: 1 = ageh, 2 = agew)
> w1[, 2] <- 3 - w0[, 2]          # output column for spouse's age (the other one)
> w2[, 2] <- 4 + w[, 2]           # input column of spouse's age (age1..age3 are columns 5..7)
> res[w0] <- z[, "age0"][w[, 1]]  # set respondent's age
> res[w1] <- z[w2]                # set spouse's age
> res
        colnames
rownames ageh agew
       1   25   23
       2   34   35
       3   39   40
       4   NA   NA
>

Ray Brownrigg

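The solution above rests on two base R features: which(..., arr.ind = TRUE) returns one (row, column) pair per matching cell, and a two-column matrix of such pairs can index a matrix for both reading and assignment. A small sketch on a made-up matrix may make the mechanism easier to follow (m, idx, shifted and target are hypothetical names, not from the post):

```r
# A made-up 2 x 3 matrix, not the data from the thread.
m <- matrix(c(10, 20, 30,
              40, 50, 60), nrow = 2, byrow = TRUE)

# One (row, col) pair per cell that satisfies the condition,
# listed in column-major order.
idx <- which(m > 25 & m < 55, arr.ind = TRUE)

# Reading with a two-column index matrix picks exactly those cells.
picked <- m[idx]

# Editing the column part of the index redirects where values are written:
# here each value lands in the mirrored column of an empty target matrix.
shifted <- idx
shifted[, 2] <- 4 - idx[, 2]
target <- matrix(NA, nrow = 2, ncol = 3)
target[shifted] <- picked
target
```

This is the same move as w0, w1 and w2 in the reply: keep the row part of the index, and recompute only the column part.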
This is just a response to the part where you refer to an apply loop really being a for loop. In a sense this is true, but it should nevertheless be recognized that the apply solution has a number of advantages over for:

- it nicely separates the problem into a single line that is independent of the details of the problem, and localizes those details in f

- the rows are pasted together automatically, avoiding messy appending or the creation and filling in of a structure

- it avoids the use of indices

Of course, some apply loops come pretty close to for loops. For example, consider this variation:

  t( matrix( unlist( sapply( 1:nrow(z), function(i) f(z[i,]) ) ), 2 ) )

and compare it to the for loop:

  out <- NULL
  for ( i in 1:nrow(z) ) {
     v <- f( z[i,] )
     if ( ! is.null(v) ) out <- rbind( out, v )
  }

but even this apply, which is clearly inferior to the one in my original posting, retains the first two advantages listed.

---
Date: Thu, 18 Dec 2003 10:04:52 +1300 (NZDT)
From: Ray Brownrigg <ray at mcs.vuw.ac.nz>
To: <adi at roda.ro>, <ggrothendieck at myway.com>, <r-help at stat.math.ethz.ch>
Subject: RE: [R] beginner programming question

[...]
> Then use this code where z is your input matrix:
>
> t( matrix( unlist( apply( z, 1, f ) ), 2) )
>
But as has been pointed out recently, apply really is still just a for loop.
[...]

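To make the comparison above concrete, here is a small sketch that runs the apply form and the for loop side by side on made-up data and checks that they agree. The matrix z and the function f below are hypothetical stand-ins (f returns NULL for rows containing NA), and the do.call(rbind, lapply(...)) line is an additional idiom that was not proposed in the thread.

```r
# Made-up input: three rows, the second of which should be dropped.
z <- rbind(c(1, 5), c(NA, 2), c(7, 3))

# Hypothetical f: NULL for rows with missing values, else the sorted row.
f <- function(row) if (anyNA(row)) NULL else sort(row)

# apply form from the original reply
a <- t(matrix(unlist(apply(z, 1, f)), 2))

# explicit for loop; out is regrown by rbind on every kept row
out <- NULL
for (i in 1:nrow(z)) {
  v <- f(z[i, ])
  if (!is.null(v)) out <- rbind(out, v)
}

# lapply plus a single rbind; NULL entries are ignored by rbind
b <- do.call(rbind, lapply(seq_len(nrow(z)), function(i) f(z[i, ])))

all(a == out) && all(a == b)
```

All three produce the same two rows; they differ in how much bookkeeping (indices, NULL checks, growing a result) is left visible.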
Another way to approach this is to first massage the data into a more regular format. This may or may not be simpler or faster than other solutions suggested.

> x <- read.table("clipboard", header=T)
> x
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> nn <- c("rel","age0","age","sex0","sex")
> xx <- rbind("colnames<-"(x[,c("rel1","age0","age1","sex0","sex1")], nn),
+             "colnames<-"(x[,c("rel2","age0","age2","sex0","sex2")], nn),
+             "colnames<-"(x[,c("rel3","age0","age3","sex0","sex3")], nn))
> xx
   rel age0 age sex0 sex
1    1   25  23    1   2
2    4   35  67    2   2
3    1   39  40    1   2
4    4   45  70    2   2
11   3   25   2    1   1
21   1   35  34    2   1
31   4   39  59    1   2
41  NA   45  NA    2  NA
12  NA   25  NA    1  NA
22   3   35  10    2   2
32   4   39  60    1   1
42  NA   45  NA    2  NA
>
> rbind(subset(xx, xx$rel==1 & (xx$sex0==1 | xx$sex0==xx$sex))[,c("age0","age")],
+       subset(xx, xx$rel==1 & xx$sex==1 & xx$sex0!=xx$sex)[,c("age","age0")])
   age0 age
1    25  23
3    39  40
21   35  34
>

hope this helps,

Tony Plate

PS. To advanced R users: Is the above usage of the "colnames<-" function within an expression regarded as acceptable or as undesirable programming style? -- I've rarely seen it used, but it can be quite useful.

At Wednesday 09:28 PM 12/17/2003 +0200, Adrian Dusa wrote:
>[...]
>I have a (rather theoretical) programming problem for which I have found
>a solution, but I feel it is a rather poor one. I wonder if there's some
>other (more clever) solution, using (maybe?) vectorization or
>subscripting.
>[...]
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help

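One detail of the final rbind() step above is worth checking: rbind on data frames matches columns by name rather than by position, so selecting c("age","age0") in the second piece does not by itself swap the two ages. That would explain why row 21 prints as 35 34, although the husband-first order requested in the original question is 34 35. A minimal sketch of the effect and of one possible workaround, using made-up one-row data frames (a, b and res are hypothetical names):

```r
# Two one-row pieces; the second has its columns deliberately swapped.
a <- data.frame(age0 = 25, age = 23)
b <- data.frame(age = 34, age0 = 35)

# Columns are matched by name, so the swap is undone:
# the second row ends up as age0 = 35, age = 34.
by_name <- rbind(a, b)

# One workaround: give both pieces the same final names, by position,
# before binding them.
res <- rbind(setNames(a, c("ageh", "agew")),
             setNames(b, c("ageh", "agew")))
res
```

With the renaming in place, the second row reads ageh = 34, agew = 35, which is the layout asked for in the original post.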
Thank you all!

I did it, and it worked just fine. In the last week I've been torturing the syntax in various ways, until finally it was all clear. The subscripting solution opened new doors for me.

The reshape command, in particular, gave me about three days of headache. I read the help about 20 times, trying to figure out how to do it; the trouble with the help was that it doesn't present examples of reshaping for multiple sets of varying variables, nor does it say that the new variables' names in the long format should be given as a vector in the v.names argument.

Anyway, the syntax is:

> x <- read.table("clipboard", header=T)
> x
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> xx <- reshape(x, varying=list(names(x)[1:3], names(x)[5:7],
+     names(x)[9:11]), v.names=c("rel", "age", "sex"), direction="long")
> xx
    age0 sex0 time rel age sex id
1.1   25    1    1   1  23   2  1
2.1   35    2    1   4  67   2  2
3.1   39    1    1   1  40   2  3
4.1   45    2    1   4  70   2  4
1.2   25    1    2   3   2   1  1
2.2   35    2    2   1  34   1  2
3.2   39    1    2   4  59   2  3
4.2   45    2    2  NA  NA  NA  4
1.3   25    1    3  NA  NA  NA  1
2.3   35    2    3   3  10   2  2
3.3   39    1    3   4  60   1  3
4.3   45    2    3  NA  NA  NA  4
> xx <- subset(xx, xx$rel==1)
> rbind(subset(xx, xx$sex0==1)[,c("age0","age")],
+       subset(xx, xx$sex==1)[,c("age","age0")])
    age0 age
1.1   25  23
3.1   39  40
2.2   35  34

I wish you a Merry Xmas, you are a truly great community.

Adrian

-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu]
Sent: Thursday, December 18, 2003 5:53 PM
To: Tony Plate
Cc: adi at roda.ro; r-help at stat.math.ethz.ch
Subject: Re: [R] beginner programming question

On Wed, 17 Dec 2003, Tony Plate wrote:

> Another way to approach this is to first massage the data into a more
> regular format. This may or may not be simpler or faster than other
> solutions suggested.

You could also use the reshape() command to do the massaging

	-thomas

> [...]
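For readers puzzling over the same point in reshape(): with several sets of varying columns, varying takes a list with one vector of column names per set, and v.names supplies one long-format name per set, in the same order. A minimal sketch on made-up data (wide, long and all column names here are hypothetical, chosen only to keep the example small):

```r
# Made-up wide data: two measured variables (a, b), two occasions each.
wide <- data.frame(id = 1:2,
                   a1 = c(1, 2), a2 = c(3, 4),
                   b1 = c(5, 6), b2 = c(7, 8))

# One list element per set of varying columns; v.names names the
# resulting long-format columns in the same order as the list.
long <- reshape(wide,
                varying   = list(c("a1", "a2"), c("b1", "b2")),
                v.names   = c("a", "b"),
                idvar     = "id",
                direction = "long")
long
```

Each id now appears once per occasion, with a time column recording which member of each set the row came from.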