Tom Cohen
2008-Jan-15 23:42 UTC
[R] help with reshaping data into long format (correct question)
Dear list,
I have the following data set
id       1  2  3  4  5  6  7  8  9  10
disease  a  b  c  d  e  f  g  h  i  j
age      23 40 32 34 25 32 22 35 29 21
city     NY LD NY SG NY LD VG SA LD SG
sex      1  1  2  2  2  2  1  1  1  2
treat_a           y  y  y        y
treat_b  n  n  n           n  n     n
ques1_1  2  4  5  6  8  3  1  2  4  5
ques1_2  6  4  5  12 10 9  8  4  5  7
ques1_3  17 23 32 25 14 24 23 22 32 29
ques2_1  4  7  9  10 6  8  5  7  8  9
ques2_2  8  9  10 12 17 19 14 21 22 19
ques2_3  23 18 19 20 23 24 26 28 29 22
ques3_1  5  7  9  1  4  7  9  8  10 5
ques3_2  34 35 32 23 31 29 27 25 32 33
ques3_3  29 33 27 25 27 23 24 29 27 24
where the first row (id) is the header row of the data frame, so each variable is a row and each subject is a column. First, I want to merge the two variables treat_a and treat_b into a new variable called "treat", which takes the value n where treat_a is left blank and y where treat_b is left blank. The new data set will look like this:
id 1 2 3 4 5 6 7 8 9 10
disease a b c d e f g h i j
age 23 40 32 34 25 32 22 35 29 21
city NY LD NY SG NY LD VG SA LD SG
sex 1 1 2 2 2 2 1 1 1 2
treat n n n y y y n n y n
ques1_1 2 4 5 6 8 3 1 2 4 5
ques1_2 6 4 5 12 10 9 8 4 5 7
ques1_3 17 23 32 25 14 24 23 22 32 29
ques2_1 4 7 9 10 6 8 5 7 8 9
ques2_2 8 9 10 12 17 19 14 21 22 19
ques2_3 23 18 19 20 23 24 26 28 29 22
ques3_1 5 7 9 1 4 7 9 8 10 5
ques3_2 34 35 32 23 31 29 27 25 32 33
ques3_3 29 33 27 25 27 23 24 29 27 24
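A minimal sketch of the merge step in base R, assuming the table has been transposed so that each subject is a row and the blank cells are NA. Only the first six rows of the posted table are typed in, for brevity; the names raw and dat are hypothetical. If the table comes from a tab-delimited file, reading it with read.table(..., sep = "\t", na.strings = "") would be one way to get NA for the blanks, but that is an assumption about a file format the post does not show.

```r
# Hypothetical names: raw (the posted table) and dat (one row per subject).
# Only the first six rows of the table are typed in; NA marks a blank cell.
raw <- rbind(
  disease = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  age     = c(23, 40, 32, 34, 25, 32, 22, 35, 29, 21),
  city    = c("NY", "LD", "NY", "SG", "NY", "LD", "VG", "SA", "LD", "SG"),
  sex     = c(1, 1, 2, 2, 2, 2, 1, 1, 1, 2),
  treat_a = c(NA, NA, NA, "y", "y", "y", NA, NA, "y", NA),
  treat_b = c("n", "n", "n", NA, NA, NA, "n", "n", NA, "n")
)

# Transpose so that each subject is a row; rbind() made every cell character
dat <- data.frame(id = 1:10, t(raw), stringsAsFactors = FALSE)
dat$age <- as.numeric(dat$age)
dat$sex <- as.numeric(dat$sex)

# The stated rule: "n" where treat_a is blank, "y" where treat_b is blank
dat$treat <- ifelse(is.na(dat$treat_a), "n", "y")
dat$treat_a <- NULL
dat$treat_b <- NULL
dat$treat
# [1] "n" "n" "n" "y" "y" "y" "n" "n" "y" "n"
```

With the data in this one-row-per-subject form, the remaining columns can be converted and reshaped with ordinary data-frame tools.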
Now I want to reshape the data into long format, with this target output:
id disease age city sex treat ques ques_value
1 a 23 NY 1 n 1_1 2
1 a 23 NY 1 n 1_2 6
1 a 23 NY 1 n 1_3 17
1 a 23 NY 1 n 2_1 4
1 a 23 NY 1 n 2_2 8
1 a 23 NY 1 n 2_3 23
1 a 23 NY 1 n 3_1 5
1 a 23 NY 1 n 3_2 34
1 a 23 NY 1 n 3_3 29
2 b 40 LD 1 n 1_1 4
2 b 40 LD 1 n 1_2 4
2 b 40 LD 1 n 1_3 23
2 b 40 LD 1 n 2_1 7
2 b 40 LD 1 n 2_2 9
2 b 40 LD 1 n 2_3 18
2 b 40 LD 1 n 3_1 7
2 b 40 LD 1 n 3_2 35
2 b 40 LD 1 n 3_3 33
..
..
..
10 j 21 SG 2 n 3_3 24
How can I do this in R?
Thanks a lot for any help,
Tom
Henrique Dallazuanna
2008-Jan-16 12:44 UTC
[R] help with reshaping data into long format (correct question)
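Try this (x is your data frame as read in, with the variable names in the column "id"):

# keep id as character so that "treat" can be assigned to it below
x$id <- as.character(x$id)
# copy "y" from treat_a (row 5) into the blank cells of treat_b (row 6)
x[6, which(x[5, ] == "y")] <- "y"
x <- x[-5, ]                     # drop the treat_a row
x[5, "id"] <- "treat"            # the old treat_b row becomes treat
x$id <- sub("^ques", "", x$id)   # ques1_1 -> 1_1
# transpose: one row per subject, one column per variable
x3 <- as.data.frame(t(x[, -1]), stringsAsFactors = FALSE)
names(x3) <- x$id
x3$id <- seq_len(nrow(x3))
foo <- function(x, ...)
{
  # x is a one-row data frame (one subject)
  qcols <- grep("_", names(x), value = TRUE)
  tmp <- as.numeric(as.character(unlist(x[, qcols])))
  y <- x[, c("id", "disease", "age", "city", "sex", "treat")][rep(1, length(tmp)), ]
  newdf <- data.frame(y, ques = qcols, ques_value = tmp)
  return(newdf)
}
# split into one data frame per subject, apply foo, and stack the results
x4 <- split(x3, seq_len(nrow(x3)))
do.call(rbind, lapply(x4, foo))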
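For comparison, a sketch of the reshaping step using base R's stats::reshape() instead of a per-subject helper function. It assumes a data frame (here called wide, a hypothetical name) with one row per subject and treat already merged; only subjects 1 and 2 of the posted data are typed in. The ques labels are derived from the column names with sub().

```r
# Hypothetical names: wide (one row per subject) and long (the result).
# Only subjects 1 and 2 of the posted data are typed in; treat is already merged.
wide <- data.frame(
  id = 1:2, disease = c("a", "b"), age = c(23, 40), city = c("NY", "LD"),
  sex = c(1, 1), treat = c("n", "n"),
  ques1_1 = c(2, 4),   ques1_2 = c(6, 4),   ques1_3 = c(17, 23),
  ques2_1 = c(4, 7),   ques2_2 = c(8, 9),   ques2_3 = c(23, 18),
  ques3_1 = c(5, 7),   ques3_2 = c(34, 35), ques3_3 = c(29, 33),
  stringsAsFactors = FALSE
)

# The question columns are stacked into one ques_value column; their names,
# minus the "ques" prefix, go into the ques column
qcols <- grep("^ques", names(wide), value = TRUE)
long <- reshape(wide, direction = "long",
                varying = qcols, v.names = "ques_value",
                timevar = "ques", times = sub("^ques", "", qcols),
                idvar = "id")

# reshape() stacks the rows question by question; sort them by subject instead.
# order() leaves ties in their original order, so 1_1 ... 3_3 stay in sequence.
long <- long[order(long$id), ]
rownames(long) <- NULL
```

Compared with a helper function plus do.call(rbind, ...), reshape() does the stacking in a single call; the trade-off is its somewhat opaque set of arguments (varying, v.names, timevar, times, idvar).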
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O