Jon Minton
2007-Sep-22 20:07 UTC
[R] reshape() to wide with varying number of responses to fields
Hello,

I have a dataframe containing the following variables:

PID, Field, Value

where PID refers to a unique individual, Field to a particular question, and Value to a particular response to a question.

I'd like this in wide format, with a different row for each PID. However, there are differing numbers of Values associated with each Field, for each PID. For example, a Field question may be "Which of the following are important for you:"; some people tick one box in response to it, others 3 or 4. Thus PID A might generate 3 Values for Field X, whereas PID B would only generate 1 Value:

    PID  Field  Value
    A    X      a
    A    X      b
    A    X      c
    B    X      b
    Etc.

What I'd like is a wide format, with one PID per row, and enough columns for each of the possible values the PID could provide for a particular field, filled in with the values if they exist, or NAs if they don't:

    PID  X.1  X.2  X.3
    A    a    b    c
    B    b    NA   NA
    Etc.

I already know the maximum number of Values any PID has provided for each Field (i.e. the maximum number of columns to allow for each Field response), but don't know how to create this dataframe using reshape (or if reshape is the right function to use).

Any help much appreciated. I've not seen a "how-to" regarding this in the archive.

Thanks,
Jon
hadley wickham
2007-Sep-22 22:24 UTC
[R] reshape() to wide with varying number of responses to fields
Hi Jon,

You can do this with the reshape package. Your data are already in "molten" form, so you can just do:

    cast(rename(df, c("Value" = "value")), PID ~ Field)

You can find out more about reshape at http://had.co.nz/reshape

Hadley
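For comparison, here is a minimal sketch of the same reshaping using only base R's reshape(), which is the function the question asks about. It is not part of the reply above. Because a PID can have several Values for the same Field, the sketch first numbers the responses within each PID/Field pair and then widens on that combined label. The helper columns idx and col are names introduced here purely for illustration, and the data frame is the four-row example from the question.

```r
# Example data from the question (column names as in the original post).
df <- data.frame(
  PID   = c("A", "A", "A", "B"),
  Field = c("X", "X", "X", "X"),
  Value = c("a", "b", "c", "b"),
  stringsAsFactors = FALSE
)

# Number the responses within each PID/Field combination (1, 2, 3, ...).
# `idx` is a hypothetical helper column, not something from the thread.
df$idx <- ave(seq_along(df$Value), df$PID, df$Field, FUN = seq_along)

# Combine Field and index into the target column label, e.g. "X.1".
# `col` is likewise a hypothetical helper column.
df$col <- paste(df$Field, df$idx, sep = ".")

# One row per PID, one column per Field/index label.
# Combinations that a PID did not provide are filled with NA.
wide <- reshape(
  df[, c("PID", "col", "Value")],
  idvar     = "PID",
  timevar   = "col",
  v.names   = "Value",
  direction = "wide"
)

# reshape() prefixes the new columns with "Value."; strip that prefix
# and reset the row names left over from the long data.
names(wide) <- sub("^Value\\.", "", names(wide))
rownames(wide) <- NULL

wide
```

On the example data this should give one row per PID with columns X.1, X.2 and X.3, where PID B has NA in X.2 and X.3. With several Fields, the columns appear in order of first occurrence in the long data, so sorting df by Field beforehand may give a tidier layout.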