Tom Cohen
2008-Jan-15 23:42 UTC
[R] help with reshaping data into long format (correct question)
Dear list,

I have the following data set:

    id       1  2  3  4  5  6  7  8  9  10
    disease  a  b  c  d  e  f  g  h  i  j
    age      23 40 32 34 25 32 22 35 29 21
    city     NY LD NY SG NY LD VG SA LD SG
    sex      1  1  2  2  2  2  1  1  1  2
    treat_a           y  y  y        y
    treat_b  n  n  n           n  n     n
    ques1_1  2  4  5  6  8  3  1  2  4  5
    ques1_2  6  4  5  12 10 9  8  4  5  7
    ques1_3  17 23 32 25 14 24 23 22 32 29
    ques2_1  4  7  9  10 6  8  5  7  8  9
    ques2_2  8  9  10 12 17 19 14 21 22 19
    ques2_3  23 18 19 20 23 24 26 28 29 22
    ques3_1  5  7  9  1  4  7  9  8  10 5
    ques3_2  34 35 32 23 31 29 27 25 32 33
    ques3_3  29 33 27 25 27 23 24 29 27 24

The first row is the header row of the data frame. First I want to merge the two variables treat_a and treat_b into a new variable called "treat", which takes the value n where treat_a is blank and y where treat_b is blank. The new data set will look like this:

    id       1  2  3  4  5  6  7  8  9  10
    disease  a  b  c  d  e  f  g  h  i  j
    age      23 40 32 34 25 32 22 35 29 21
    city     NY LD NY SG NY LD VG SA LD SG
    sex      1  1  2  2  2  2  1  1  1  2
    treat    n  n  n  y  y  y  n  n  y  n
    ques1_1  2  4  5  6  8  3  1  2  4  5
    ques1_2  6  4  5  12 10 9  8  4  5  7
    ques1_3  17 23 32 25 14 24 23 22 32 29
    ques2_1  4  7  9  10 6  8  5  7  8  9
    ques2_2  8  9  10 12 17 19 14 21 22 19
    ques2_3  23 18 19 20 23 24 26 28 29 22
    ques3_1  5  7  9  1  4  7  9  8  10 5
    ques3_2  34 35 32 23 31 29 27 25 32 33
    ques3_3  29 33 27 25 27 23 24 29 27 24

Now I want to reshape the data into long format, with this target output:

    id  disease  age  city  sex  treat  ques  ques_value
    1   a        23   NY    1    n      1_1   2
    1   a        23   NY    1    n      1_2   6
    1   a        23   NY    1    n      1_3   17
    1   a        23   NY    1    n      2_1   4
    1   a        23   NY    1    n      2_2   8
    1   a        23   NY    1    n      2_3   23
    1   a        23   NY    1    n      3_1   5
    1   a        23   NY    1    n      3_2   34
    1   a        23   NY    1    n      3_3   29
    2   b        40   LD    1    n      1_1   4
    2   b        40   LD    1    n      1_2   4
    2   b        40   LD    1    n      1_3   23
    2   b        40   LD    1    n      2_1   7
    2   b        40   LD    1    n      2_2   9
    2   b        40   LD    1    n      2_3   18
    2   b        40   LD    1    n      3_1   7
    2   b        40   LD    1    n      3_2   35
    2   b        40   LD    1    n      3_3   33
    ..
    ..
    ..
    10  j        21   SG    2    n      3_3   24

How can I do this in R?
Thanks a lot for any help,
Tom
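A minimal sketch of the first step (merging treat_a and treat_b), assuming the data have already been transposed to one row per subject and that the blank cells were read in as NA. The data frame `d` is a hypothetical reconstruction of the posted values, not the poster's actual object.

```r
# Hypothetical reconstruction of the posted data, one row per subject
# (the transpose of the layout in the post); blank cells are stored as NA.
d <- data.frame(
  id      = 1:10,
  disease = letters[1:10],
  age     = c(23, 40, 32, 34, 25, 32, 22, 35, 29, 21),
  city    = c("NY", "LD", "NY", "SG", "NY", "LD", "VG", "SA", "LD", "SG"),
  sex     = c(1, 1, 2, 2, 2, 2, 1, 1, 1, 2),
  treat_a = c(NA, NA, NA, "y", "y", "y", NA, NA, "y", NA),
  treat_b = c("n", "n", "n", NA, NA, NA, "n", "n", NA, "n"),
  stringsAsFactors = FALSE
)

# Take treat_a where it is present, otherwise fall back to treat_b:
# a blank treat_a gives "n" (from treat_b), a blank treat_b gives "y".
d$treat <- ifelse(is.na(d$treat_a), d$treat_b, d$treat_a)

# Drop the two original columns now that they are merged
d <- d[, !(names(d) %in% c("treat_a", "treat_b"))]
d
```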
Henrique Dallazuanna
2008-Jan-16 12:44 UTC
Try this:

# Assumes x is the data frame as posted: column "id" holds the variable
# names and each remaining column holds one subject.
x$id <- as.character(x$id)          # plain strings instead of factor levels
x[6, which(x[5, ] == "y")] <- "y"   # copy the "y" entries of treat_a into treat_b
x <- x[-5, ]                        # drop the treat_a row
x[5, "id"] <- "treat"               # the merged treat_b row becomes "treat"
x$id <- gsub("^ques", "", x$id)     # "ques1_1" -> "1_1"
x3 <- as.data.frame(t(x[, -1]), stringsAsFactors = FALSE)  # one row per subject
names(x3) <- x$id
foo <- function(x, ...) {
  qcols <- grep("_", names(x), value = TRUE)   # the question columns
  tmp <- as.numeric(as.character(unlist(x[, qcols])))
  # repeat the subject's covariates once per question
  y <- x[, c("disease", "age", "city", "sex", "treat")][rep(1, length(tmp)), ]
  newdf <- data.frame(y, ques = qcols, ques_value = tmp)
  return(newdf)
}
# x4 is not defined in the original post; assumed here to be x3 split
# into a list of one-row data frames, one per subject
x4 <- split(x3, seq_len(nrow(x3)))
do.call(rbind, lapply(x4, foo))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
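As an alternative to splitting by subject, base R's `reshape()` can do the wide-to-long step in one call. This is a sketch assuming a hypothetical data frame `d` with one row per subject and treat already merged; the column names follow the post, and `reshape()` is a swapped-in technique, not the method used in the reply above.

```r
# Hypothetical example data: one row per subject, treat already merged,
# question scores in columns ques1_1 ... ques3_3 (values from the post).
d <- data.frame(
  id = 1:10, disease = letters[1:10],
  age   = c(23, 40, 32, 34, 25, 32, 22, 35, 29, 21),
  city  = c("NY", "LD", "NY", "SG", "NY", "LD", "VG", "SA", "LD", "SG"),
  sex   = c(1, 1, 2, 2, 2, 2, 1, 1, 1, 2),
  treat = c("n", "n", "n", "y", "y", "y", "n", "n", "y", "n"),
  ques1_1 = c(2, 4, 5, 6, 8, 3, 1, 2, 4, 5),
  ques1_2 = c(6, 4, 5, 12, 10, 9, 8, 4, 5, 7),
  ques1_3 = c(17, 23, 32, 25, 14, 24, 23, 22, 32, 29),
  ques2_1 = c(4, 7, 9, 10, 6, 8, 5, 7, 8, 9),
  ques2_2 = c(8, 9, 10, 12, 17, 19, 14, 21, 22, 19),
  ques2_3 = c(23, 18, 19, 20, 23, 24, 26, 28, 29, 22),
  ques3_1 = c(5, 7, 9, 1, 4, 7, 9, 8, 10, 5),
  ques3_2 = c(34, 35, 32, 23, 31, 29, 27, 25, 32, 33),
  ques3_3 = c(29, 33, 27, 25, 27, 23, 24, 29, 27, 24),
  stringsAsFactors = FALSE
)

qcols <- grep("^ques", names(d), value = TRUE)

long <- reshape(d,
  direction = "long",
  varying   = qcols,                    # the nine wide question columns
  v.names   = "ques_value",             # name of the stacked value column
  timevar   = "ques",                   # which question each value came from
  times     = sub("^ques", "", qcols),  # "ques1_1" -> "1_1"
  idvar     = "id"
)

# reshape() stacks question by question; sort by subject to match the target
long <- long[order(long$id), ]
rownames(long) <- NULL
head(long, 9)
```

Passing the stripped names as `times` puts labels such as 1_1 directly into the ques column; without it, `reshape()` would number the questions 1 to 9.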