thr3ads.net - R help - [R] reshaping data [May 2010]

Mia Bengtsson

2010-May-20 17:13 UTC

[R] reshaping data

Hello,

I am a relatively new R-user who has a lot to learn. I have a large dataset that
is in the following dataframe format:

red		A	B	C
green	D
blue	E	F

Where red, green and blue are "species" names and A, B and C are
observations (corresponding to DNA sequences). Each observation can only belong
to one species. I would like to list the observations in one column, with the
species they belong to in the next. Like this:

A	red
B	red
C	red
D	green
E	blue
F	blue

I have tried using reshape() and stack() but I cannot get my head around it. Any
help is highly appreciated!

Thanks in advance,
__________________________________

Mia Bengtsson, PhD-student
Department of Biology
University of Bergen
+47 55584715
+47 97413634
mia.bengtsson at bio.uib.no

Henrique Dallazuanna

2010-May-20 20:53 UTC

[R] reshaping data

Try this:

x_long <- reshape(x, direction = 'long', varying = 2:4, sep = '',
                  idvar = 'V1', timevar = 'V')
subset(x_long[order(x_long$V1), ], V != "")

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Mia Bengtsson

2010-May-21 10:39 UTC

[R] reshaping data

Thank you Dennis and Henrique for your help!

Both solutions work! I just need to find a way of removing the empty
"cells" from the final "long" dataframe since they are not
NAs.
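
A minimal sketch of dropping the blank cells, assuming the long data frame stores the observations as character strings in a column called value (the data frame and column names here are illustrative, not taken from the thread):

```r
# Hypothetical long data frame padded with empty strings instead of NA
long_df <- data.frame(
  species = c("red", "red", "red", "green", "green", "blue", "blue"),
  value   = c("A", "B", "C", "D", "", "E", ""),
  stringsAsFactors = FALSE
)

# Option 1: keep only the rows whose value is a non-empty string
clean1 <- long_df[nzchar(long_df$value), ]

# Option 2: recode the blanks as NA first, then drop the NA rows
long_df$value[long_df$value == ""] <- NA
clean2 <- long_df[!is.na(long_df$value), ]
```

Both options should leave the same five rows. If the column was read in as a factor, it would need as.character() first.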

Maybe there is an easier way of doing this if the data is not treated as a
data frame? The original data file, which is derived from another program
(mothur), is a text file with the following format:

red \t A,B,C
green \t D
blue \t E,F

The first column, "species", is separated from the "sequences" (A, B, C...)
with a tab, and the "sequences" are separated from each other with commas.

I imported into R as what I thought was a dataframe using:

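test1 <- readLines("path/test")
# turn the tab after the species name into a comma so every field is comma-separated
test2 <- gsub(pattern = "\t", replacement = ",", x = test1)
test3 <- textConnection(test2)
test.df <- read.csv(test3, header = FALSE)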

Should I rather have imported it as something else if I want to reshape it into
a list as described previously?
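
One possible way to skip the wide data frame entirely is to split each line on the tab and then on the commas. This is only a sketch: raw_lines stands in for the result of readLines() on the mothur file, and the names long_df, observation and species are made up for illustration.

```r
# Stand-in for readLines("path/test"): species, a tab, comma-separated sequences
raw_lines <- c("red\tA,B,C", "green\tD", "blue\tE,F")

# Split every line into its two tab-separated fields
fields <- strsplit(raw_lines, "\t", fixed = TRUE)

# First field is the species name; trimws() guards against stray spaces
species <- trimws(vapply(fields, function(f) f[1], character(1)))

# Second field is the comma-separated list of sequences for that species
seqs <- strsplit(trimws(vapply(fields, function(f) f[2], character(1))),
                 ",", fixed = TRUE)

# One row per sequence, repeating each species name once per sequence
long_df <- data.frame(
  observation = unlist(seqs),
  species     = rep(species, times = lengths(seqs)),
  stringsAsFactors = FALSE
)
```

Because nothing is padded to a common width, no empty cells are created in the first place. The sketch assumes every line contains exactly one tab and at least one sequence.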

Thanks a million!

/ Mia Bengtsson


On May 21, 2010, at 2:15 AM, Dennis Murphy wrote:
> Hi:
>
> On Thu, May 20, 2010 at 10:13 AM, Mia Bengtsson <mia.bengtsson@bio.uib.no> wrote:
> > Hello,
> >
> > I am a relatively new R-user who has a lot to learn. I have a large dataset
> > that is in the following dataframe format:
> >
> > red             A       B       C
> > green   D
> > blue    E       F
>
> This isn't a data frame in R - if it were, it would have NA (or at least
> "" / " ") padding at the end of each row. Data frames are not ragged
> arrays. To have this type of structure in R, the data would have to be in
> a list.
>
> This matters because Henrique's solution with reshape() assumes a data
> frame as input. A similar solution would be to use melt() in the reshape
> package, something like
>
> library(reshape)
> longdf <- melt(yourdf, id.vars = 'species')
> longdf
>
> If you have NA padding, the way to get rid of them in the reshaped data
> frame is (with the above approach)
>
> longdf[!is.na(longdf$value), names(longdf) != 'variable']
>
> If the padding is with blanks, then Henrique's solution works here, too.
>
> HTH,
> Dennis
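
To illustrate the point about ragged data belonging in a list, here is a small sketch using stack() from base R on a named list. The list by_species is typed in by hand for illustration, and the column names given at the end are arbitrary.

```r
# Ragged data held as a named list: one character vector per species
by_species <- list(
  red   = c("A", "B", "C"),
  green = "D",
  blue  = c("E", "F")
)

# stack() returns a data frame with columns "values" and "ind";
# "ind" is a factor built from the list names
long_df <- stack(by_species)

# Rename the columns to match the layout asked for in the question
names(long_df) <- c("observation", "species")
```

Each list element can have a different length, so no NA or blank padding is needed.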
