Hello:

I am running a simulation study and am stuck with a subsetting problem.

Here is the basic issue: I generated data and am running a simulation that uses multiple imputation. Each generated dataset was multiply imputed. The resulting dataset is in wide form, where each imputation is recorded as a separate set of columns (though the different simulations are stacked). Here is an example of what it looks like:

sim  X1  X2  X3  sim.1  X1.1  X2.1  X3.1
1    #   #   #   #      #     #     #
1    #   #   #   #      #     #     #
1    #   #   #   #      #     #     #
2    #   #   #   #      #     #     #
2    #   #   #   #      #     #     #
2    #   #   #   #      #     #     #

sim refers to the simulated/generated dataset. X1-X3 are the values for the first imputed dataset, X1.1-X3.1 are the values for the second imputed dataset.

The problem is that I want the data to be in long format, like this:

sim  m  X1  X2  X3
1    1  #   #   #
1    2  #   #   #
2    1  #   #   #
2    2  #   #   #

where m is the imputation number. This will allow me to do cleaner calculations (e.g. X3-X1).

I know I can subset the data manually (e.g. [,1:10]), save each piece to a separate dataset, and then rbind them; however, I'm looking for a more flexible approach. The manual approach becomes quite tedious as the number of imputations (and therefore the number of columns) increases (with only 10 imputations, there are roughly 810 columns). Also, I would like to avoid having to recode each time I change the number of imputations.

The same is true for the reshape function, which would require naming a huge number of columns and editing the call each time 'm' changes.

Is there a flexible way to approach this? I'm inclined to use a for loop, but 1) I know this is generally inefficient and 2) I am having trouble with the coding regardless.

Any suggestions are appreciated.

Thanks,
Andrea

--
Andrea Lamont, MA
Clinical-Community Psychology
University of South Carolina
Barnwell College
Columbia, SC 29208
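The manual split-and-rbind idea described in the question can be written so that it does not depend on the number of imputations. The following is only a minimal sketch: it assumes that every imputation occupies the same number of adjacent columns, in the same order, and that this number (called k here, a name introduced for illustration) is known. The toy values are made up.

```r
# Toy data with the layout from the question (values are made up).
wide <- data.frame(
  sim   = c(1, 1, 2), X1   = c(5, 4, 6), X2   = c(4, 3, 4), X3   = c(5, 2, 8),
  sim.1 = c(1, 1, 2), X1.1 = c(4, 7, 3), X2.1 = c(3, 4, 9), X3.1 = c(7, 1, 5)
)

k <- 4                     # columns per imputation (assumed known)
n_imp <- ncol(wide) / k    # number of imputations, inferred from the width
stopifnot(n_imp == round(n_imp))

# One data frame per imputation, all sharing the first block's column names.
blocks <- lapply(seq_len(n_imp), function(i) {
  cols <- ((i - 1) * k + 1):(i * k)          # column positions of block i
  block <- wide[, cols]
  names(block) <- names(wide)[seq_len(k)]    # e.g. X1.1 becomes X1
  cbind(m = i, block)                        # record the imputation number
})

# Stack the blocks and put sim and m first.
long <- do.call(rbind, blocks)
long <- long[, c("sim", "m", setdiff(names(long), c("sim", "m")))]

# Calculations across variables are now one line.
long$diff <- long$X3 - long$X1
```

Changing the number of imputations only changes ncol(wide); nothing in the code has to be edited. lapply() is used here in place of an explicit for loop, but a for loop that fills a pre-allocated list would do the same job.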
Hi,

It is better to provide a reproducible example using ?dput()

df1 <- read.table(text="
sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
1 5 4 5 1 4 3 7
1 4 3 2 1 7 4 1
1 3 9 4 1 5 8 4
2 6 4 8 2 3 9 5
2 7 8 4 2 5 4 8
2 9 6 7 2 9 5 6
",sep="",header=TRUE)

res <- reshape(df1,sep=".",
               varying=list(c("sim","sim.1"),c("X1","X1.1"),c("X2","X2.1"),c("X3","X3.1")),
               direction="long",timevar="m")
# keep the data columns in the requested order; this also drops the
# 'id' column that reshape() adds
res <- res[,c("sim","m","X1","X2","X3")]
res
    sim m X1 X2 X3
1.1   1 1  5  4  5
2.1   1 1  4  3  2
3.1   1 1  3  9  4
4.1   2 1  6  4  8
5.1   2 1  7  8  4
6.1   2 1  9  6  7
1.2   1 2  4  3  7
2.2   1 2  7  4  1
3.2   1 2  5  8  4
4.2   2 2  3  9  5
5.2   2 2  5  4  8
6.2   2 2  9  5  6

A.K.

----- Original Message -----
From: Andrea Lamont <alamont082 at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, July 23, 2013 10:35 AM
Subject: [R] flexible approach to subsetting data
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
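The reshape() call above lists every group of columns by hand, which is what the original question wanted to avoid. One way around that is to build the 'varying' list from the column names. This is only a sketch under a stated assumption: columns of later imputations are named like the first imputation's columns plus a ".<number>" suffix (X1, X1.1, X1.2, ...), and no base variable name itself ends in ".<number>". The helper names base and varying are introduced here for illustration.

```r
df1 <- read.table(text = "
sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
1 5 4 5 1 4 3 7
1 4 3 2 1 7 4 1
1 3 9 4 1 5 8 4
2 6 4 8 2 3 9 5
2 7 8 4 2 5 4 8
2 9 6 7 2 9 5 6
", header = TRUE)

# Strip a trailing ".<digits>" suffix to recover the base variable name.
base <- sub("\\.[0-9]+$", "", names(df1))
vars <- unique(base)

# One character vector of column names per variable, in column order:
# c("sim", "sim.1"), c("X1", "X1.1"), ...
varying <- unname(split(names(df1), factor(base, levels = vars)))

long <- reshape(df1, direction = "long", varying = varying,
                v.names = vars, timevar = "m")

# Put sim and m first and drop the 'id' column that reshape() adds.
long <- long[, c("sim", "m", setdiff(vars, "sim"))]
rownames(long) <- NULL
```

Because 'varying' is derived from names(df1), the same code should apply unchanged when the number of imputations, and therefore the number of columns, changes.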
Check out the reshape() function. It is part of the stats package that ships with R, so no extra package needs to be loaded. Here's one of the examples from ?reshape.

wide <- reshape(Indometh, v.names="conc", idvar="Subject",
                timevar="time", direction="wide")
long <- reshape(wide, direction="long")
wide
long

Jean
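A note on the ?reshape example: the round trip back to long format works without naming any columns because reshape() attaches a "reshapeWide" attribute to the wide data frame it creates and reads it back later. A wide data frame that was built some other way, such as the imputation output in the question, carries no such attribute, so 'varying' has to be supplied. The sketch below only illustrates that difference; plain, has_info, attempt and failed are names introduced for this illustration.

```r
wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")

# reshape() stored the information needed to undo the operation.
has_info <- !is.null(attr(wide, "reshapeWide"))

# Drop that attribute to mimic a wide data frame that came from elsewhere.
plain <- wide
attr(plain, "reshapeWide") <- NULL

# Without the attribute and without 'varying', reshape() stops with an error.
attempt <- try(reshape(plain, direction = "long"), silent = TRUE)
failed <- inherits(attempt, "try-error")
```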