Thank you, Dennis and Henrique, for your help!
Both solutions work! I just need a way to remove the empty "cells" from the
final "long" data frame, since they are blanks rather than NAs.
Maybe there is an easier way of doing this if the data is not treated as a
data frame? The original data file, which is produced by another program
(mothur), is a text file with the following format:
red \t A,B,C
green \t D
blue \t E,F
The first column ("species") is separated from the "sequences" (A, B, C, ...)
by a tab, and the sequences are separated from each other by commas.
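If the goal is the two-column layout, a file in this format can be parsed without ever building a padded data frame, so there are no empty cells to remove. A minimal base-R sketch, assuming every line holds exactly one tab followed by a comma-separated list; the three sample lines stand in for readLines("path/test"), and the names `fields`, `seqs` and `long` are illustrative:

```r
# Stand-in for readLines("path/test"); each line is "species<TAB>seq1,seq2,..."
lines <- c("red\tA,B,C", "green\tD", "blue\tE,F")

# Split each line on the tab: field 1 = species, field 2 = sequence list
fields  <- strsplit(lines, "\t", fixed = TRUE)
species <- vapply(fields, `[`, character(1), 1)
seqs    <- strsplit(vapply(fields, `[`, character(1), 2), ",", fixed = TRUE)

# One row per sequence; rep() repeats each species once per sequence it owns
long <- data.frame(sequence = unlist(seqs),
                   species  = rep(species, lengths(seqs)),
                   stringsAsFactors = FALSE)
long
```

Because the ragged rows are kept as a list until the very end, no padding (NA or blank) is ever created.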
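I imported it into R as what I thought was a data frame using:
test1 <- readLines("path/test")             # one string per line of the file
test2 <- gsub("\t", ",", test1)             # replace the tab with a comma
test3 <- textConnection(test2)
test.df <- read.csv(test3, header = FALSE)  # short rows are padded with blank fields
close(test3)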
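If the data-frame route is kept, read.csv() can be told to treat the blank fields it uses for padding as missing, so that an NA-based clean-up applies afterwards. A sketch using the na.strings argument, with the same three-line sample standing in for the real file. One caveat: read.table() (and therefore read.csv()) decides the number of columns from the first five lines of input, so if a longer row appears later, supplying col.names for the widest row avoids wrapped rows.

```r
# Stand-in for readLines("path/test")
test1 <- c("red\tA,B,C", "green\tD", "blue\tE,F")
test2 <- gsub("\t", ",", test1)

con <- textConnection(test2)
# na.strings = "" turns the blank fields used to pad short rows into NA
test.df <- read.csv(con, header = FALSE, na.strings = "",
                    stringsAsFactors = FALSE)
close(con)
test.df
```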
Should I have imported it as something else instead, if I want to reshape it
into the two-column list described previously?
Thanks a million!
/ Mia Bengtsson
On May 21, 2010, at 2:15 AM, Dennis Murphy wrote:
> Hi:
>
>
> On Thu, May 20, 2010 at 10:13 AM, Mia Bengtsson <mia.bengtsson@bio.uib.no> wrote:
> Hello,
>
> I am a relatively new R user who has a lot to learn. I have a large dataset
> that is in the following data frame format:
>
> red A B C
> green D
> blue E F
>
> This isn't a data frame in R; if it were, it would have NA (or at least
> "" / " ") padding at the end of each row. Data frames are not ragged arrays.
> To have this type of structure in R, the data would have to be in a list.
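Held as a list, the ragged data maps directly onto stack(), which accepts a named list of vectors and returns a data frame with a 'values' column and an 'ind' column naming the source element. A sketch with the example typed in by hand (the name `obs` is illustrative):

```r
# The ragged structure from the question as a named list:
# one character vector of sequences per species
obs <- list(red = c("A", "B", "C"), green = "D", blue = c("E", "F"))

# stack() puts every element in 'values' and its list name in 'ind'
long <- stack(obs)
names(long) <- c("sequence", "species")
long
```

Rows come out in list order, so this gives exactly the observation/species layout asked for, with no padding to remove.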
>
> This matters because Henrique's solution with reshape() assumes a data frame
> as input. A similar solution would be to use melt() in the reshape package,
> something like
>
> library(reshape)
> longdf <- melt(yourdf, id.var = 'species')
> longdf
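For readers without the reshape package, base R's stats::reshape() can produce the same long layout. The sketch below is not necessarily Henrique's code; it assumes a wide data frame with a species column and NA-padded columns named V2 to V4 (all hypothetical names), then drops the padding and restores the original species order.

```r
# Hypothetical wide data frame with NA padding
wide <- data.frame(species = c("red", "green", "blue"),
                   V2 = c("A", "D", "E"),
                   V3 = c("B", NA, "F"),
                   V4 = c("C", NA, NA),
                   stringsAsFactors = FALSE)

# Stack V2..V4 into a single 'value' column; 'position' records the source column
long <- reshape(wide, direction = "long",
                varying = c("V2", "V3", "V4"), v.names = "value",
                idvar = "species", timevar = "position")

# Drop the padding, then sort by species (original order) and position
long <- long[!is.na(long$value), ]
long <- long[order(match(long$species, wide$species), long$position),
             c("value", "species")]
long
```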
>
> If you have NA padding, the way to get rid of the NAs in the reshaped data
> frame (with the above approach) is
>
> longdf[!is.na(longdf$value), c("value", "species")]
>
> If the padding is with blanks, then Henrique's solution works here, too.
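When the padding is blank rather than NA, the same row filter works with a test for empty strings. A sketch on a hand-built long data frame shaped like melt() output (columns species, variable, value; the data are hypothetical), where trimws() also catches padding that is a single space:

```r
# Hypothetical melt()-style long data frame padded with "" and " "
longdf <- data.frame(species  = rep(c("red", "green", "blue"), times = 3),
                     variable = rep(c("V2", "V3", "V4"), each = 3),
                     value    = c("A", "D", "E", "B", "", "F", "C", "", " "),
                     stringsAsFactors = FALSE)

# Keep rows whose value is neither NA nor blank after trimming spaces
keep   <- !is.na(longdf$value) & nzchar(trimws(longdf$value))
result <- longdf[keep, c("value", "species")]
result
```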
>
> HTH,
> Dennis
>
>
> Here red, green and blue are "species" names and A, B and C are observations
> (corresponding to DNA sequences). Each observation can only belong to one
> species. I would like to list the observations in one column, with the
> species they belong to in the next, like this:
>
> A red
> B red
> C red
> D green
> E blue
> F blue
>
> I have tried using reshape() and stack(), but I cannot get my head around
> it. Any help is highly appreciated!
>
> Thanks in advance,
> __________________________________
>
> Mia Bengtsson, PhD-student
> Department of Biology
> University of Bergen
> +47 55584715
> +47 97413634
> mia.bengtsson@bio.uib.no
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>