thr3ads.net - R help - [R] flexible approach to subsetting data [Jul 2013]

Andrea Lamont

2013-Jul-23 14:35 UTC

[R] flexible approach to subsetting data

Hello:

I am running a simulation study and am stuck with a subsetting problem.

Here is the basic issue:
I generated data and am running a simulation that uses multiple imputation:
each generated dataset is imputed several times.  The resulting dataset is
in wide form, where each imputation is recorded as a separate set of
columns (though the different simulations are stacked).  Here is an example
of what it looks like:

sim   X1   X2   X3   sim.1   X1.1    X2.1    X3.1
1         #    #     #        #           #          #         #
1         #    #     #        #           #          #         #
1         #    #     #        #           #          #         #
2         #    #     #        #           #          #         #
2         #    #     #        #           #          #         #
2         #    #     #        #           #          #         #

sim refers to the simulated/generated dataset. X1-X3 are the values for the
first imputed dataset, X1.1-X3.1 are the values for the second imputed
dataset.

The problem is that I want the data to be in long format, like this:

sim  m  X1  X2  X3
1    1  #   #   #
1    2  #   #   #
2    1  #   #   #
2    2  #   #   #

where m is the imputation number.
This will allow me to do cleaner calculations (e.g. X3-X1).

I know I can subset the data manually - e.g. [,1:10] - save each subset as a
separate dataset, and then rbind them; however, I'm looking for a more
flexible approach.  The manual approach would become quite tedious as the
number of imputations (and therefore the number of columns) increases (with
only 10 imputations, there are roughly 810 columns). Also, I would like to
avoid having to recode each time I change the number of imputations.

The same is true for the reshape function, which would require naming
a huge number of columns and editing the call each time 'm' changes.


Is there a flexible way to approach this? I'm inclined to use a for loop,
but 1) I know this is generally inefficient and 2) I am having trouble with
the coding regardless.

Any suggestions are appreciated.

Thanks,
Andrea


-- 
Andrea Lamont, MA
Clinical-Community Psychology
University of South Carolina
Barnwell College
Columbia, SC 29208


arun

2013-Jul-23 16:00 UTC

[R] flexible approach to subsetting data

Hi,
It is better to provide a reproducible example using dput() (see ?dput).
df1 <- read.table(text="
sim   X1   X2   X3   sim.1   X1.1   X2.1   X3.1
1     5    4    5    1       4      3      7
1     4    3    2    1       7      4      1
1     3    9    4    1       5      8      4
2     6    4    8    2       3      9      5
2     7    8    4    2       5      4      8
2     9    6    7    2       9      5      6
",sep="",header=TRUE)
# All columns vary, so the long result has columns m, sim, X1, X2, X3, id;
# [,-6] drops the automatically generated id column and keeps X3.
res <- reshape(df1,sep=".",varying=list(c("sim","sim.1"),c("X1","X1.1"),c("X2","X2.1"),c("X3","X3.1")),direction="long",timevar="m")[,-6]
res
    m sim X1 X2 X3
1.1 1   1  5  4  5
2.1 1   1  4  3  2
3.1 1   1  3  9  4
4.1 1   2  6  4  8
5.1 1   2  7  8  4
6.1 1   2  9  6  7
1.2 2   1  4  3  7
2.2 2   1  7  4  1
3.2 2   1  5  8  4
4.2 2   2  3  9  5
5.2 2   2  5  4  8
6.2 2   2  9  5  6
A.K.
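
The question also asks how to avoid editing the call every time the number of imputations changes. One possible way, sketched here and not part of the original reply, is to build the `varying` list from the column names instead of typing it out. This assumes that the imputed copies of a variable differ only by a trailing ".<number>" suffix (as read.table produces for duplicated names) and that the columns for each variable appear in imputation order. The names `base`, `vars` and `groups` are made up for the illustration.

```r
# Same toy data as in the example above.
df1 <- read.table(text = "
sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
1 5 4 5 1 4 3 7
1 4 3 2 1 7 4 1
1 3 9 4 1 5 8 4
2 6 4 8 2 3 9 5
2 7 8 4 2 5 4 8
2 9 6 7 2 9 5 6
", header = TRUE)

# Assumption: a trailing ".<digits>" marks the imputation, so stripping it
# recovers the base variable name ("X1.1" -> "X1", "sim.1" -> "sim").
base <- sub("\\.[0-9]+$", "", names(df1))
vars <- unique(base)

# Group the column names by base name, preserving the original column order:
# c("sim", "sim.1"), c("X1", "X1.1"), ...
groups <- split(names(df1), factor(base, levels = vars))

# Every variable must have the same number of imputed copies.
stopifnot(length(unique(lengths(groups))) == 1)

res <- reshape(df1, direction = "long",
               varying = unname(groups), v.names = vars,
               timevar = "m")

# Keep sim, the imputation number m, and the X variables; drop the id column.
res <- res[, c("sim", "m", setdiff(vars, "sim"))]
rownames(res) <- NULL

# The calculation mentioned in the question.
res$diff <- res$X3 - res$X1
```

Because the grouping is derived from names(df1), the same code should run unchanged with 10 imputations or 100, as long as the naming assumption holds. A variable whose real name ends in ".<number>" would be grouped wrongly and would need a different pattern.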




______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Adams, Jean

2013-Jul-23 17:01 UTC

[R] flexible approach to subsetting data

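Check out the reshape() function.  It is part of the stats package that
ships with base R, so no additional package needs to be loaded.  Here's one
of the examples from ?reshape.

Jean


# reshape() is in the stats package, which is attached by default
wide <- reshape(Indometh, v.names="conc",
idvar="Subject", timevar="time",
direction="wide")
# going back to long works without further arguments because reshape()
# stored the reshaping information as an attribute of 'wide'
long <- reshape(wide, direction="long")
wide
long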
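
The original question mentions subsetting blocks of columns by hand and then calling rbind. That idea can be automated without a hand-written for loop. The following is only a sketch under a stated assumption: the columns come in contiguous blocks of equal width, one block per imputation, with the variables in the same order inside every block. The names `p`, `n_imp` and `blocks` are hypothetical, and the toy data is the 2-imputation example used earlier in the thread.

```r
# Toy data in the wide layout described in the question.
df1 <- read.table(text = "
sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
1 5 4 5 1 4 3 7
1 4 3 2 1 7 4 1
1 3 9 4 1 5 8 4
2 6 4 8 2 3 9 5
2 7 8 4 2 5 4 8
2 9 6 7 2 9 5 6
", header = TRUE)

p <- 4                    # assumed number of columns per imputation (sim, X1, X2, X3)
n_imp <- ncol(df1) / p    # number of imputations implied by the column count
stopifnot(n_imp == round(n_imp))

# Cut out one block of p columns per imputation, give every block the
# names of the first block, and tag it with its imputation number.
blocks <- lapply(seq_len(n_imp), function(i) {
  cols <- ((i - 1) * p + 1):(i * p)
  block <- df1[, cols]
  names(block) <- names(df1)[seq_len(p)]
  cbind(m = i, block)
})

# Stack the blocks: one row per original row and imputation.
long <- do.call(rbind, blocks)
long <- long[, c("sim", "m", names(df1)[2:p])]
```

Only `p` has to be known in advance; the number of imputations is derived from ncol(df1), so changing it should not require recoding.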


