thr3ads.net - R help - [R] How to load load multiple text files and order by id [Mar 2011]

Richard Green

2011-Mar-06 02:39 UTC

[R] How to load load multiple text files and order by id

Hello R users,
I am fairly new to R and was hoping you could point me in the right
direction. I have a set of 36 text files.
Each file has only two columns (ID and count). I am trying to figure out a
way to load all the files together and
then have them ordered by ID in a single matrix or data frame. For example:

If each txt file has:
ID        count
id_00002  20
id_00003  3

A Merged File:
ID        count_file1  count_file2  count_file3  count_file4
id_00002  20           8            12           5            19   26
id_00003  3            0            2            0            0    0
id_00004  75           84           241          149          271  257

Is there a relatively simple way to do that in R? I tried reading each file
with read.table() and then combining the results with cbind(), but that does
not appear to be working. Any suggestions folks have are appreciated.
Thanks
-Rich


Kingsley G. Morse Jr.

2011-Mar-06 05:40 UTC

Hi Richard,

If you haven't tried it already, maybe you could
read the files into separate data frames with
read.table(), and then combine them with merge().

Type

    ?merge

to learn more.
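
As a rough sketch of that read.table() plus merge() idea: the file names, the
temporary directory and the count_fileN column names below are made up for
illustration, and treating an ID that is missing from a file as a zero count
is an assumption about what the merged table should contain.

```r
# Hypothetical setup: write three small two-column files to a temp directory
# so the sketch is self-contained; real data would already be on disk.
tmp <- tempfile("counts_")
dir.create(tmp)
write.table(data.frame(ID = c("id_00002", "id_00003"), count = c(20, 3)),
            file.path(tmp, "file1.txt"), row.names = FALSE, quote = FALSE)
write.table(data.frame(ID = c("id_00002", "id_00004"), count = c(8, 84)),
            file.path(tmp, "file2.txt"), row.names = FALSE, quote = FALSE)
write.table(data.frame(ID = c("id_00003", "id_00004"), count = c(2, 241)),
            file.path(tmp, "file3.txt"), row.names = FALSE, quote = FALSE)

files <- sort(list.files(tmp, pattern = "\\.txt$", full.names = TRUE))

# Read each file and rename its count column after the file it came from
frames <- lapply(seq_along(files), function(i) {
  d <- read.table(files[i], header = TRUE, stringsAsFactors = FALSE)
  names(d)[2] <- paste0("count_file", i)
  d
})

# Merge pairwise on ID; all = TRUE keeps IDs that are missing from some files
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), frames)

# IDs absent from a file come through as NA; treat them as zero counts
merged[is.na(merged)] <- 0
merged <- merged[order(merged$ID), ]
print(merged)
```

Reduce() simply applies merge() to the first two data frames, then merges the
result with the third, and so on, so the same lines should work for 36 files.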

Good luck,
Kingsley


Dennis Murphy

2011-Mar-06 07:36 UTC

Hi:

This is basically Scott's idea with a few added details.

Let's assume your files have similar names - e.g., they differ only by number.
The example below creates ten files of similar structure to yours. There are
then two paths one can follow: (1) put all the files into a specific
directory, or (2) keep them where they are.

This is my current working directory (Win 7):
> getwd()
[1] "C:/Users/Dennis/Documents"

# Create ten files, each with 20 IDs and a random count. The files are then
# written as csv files to the current working directory. This is simply a way
# for me to generate data that in some sense mimics the data you already have.
# You don't need to reproduce this since you already have the file list.
for (i in 1:10) {
    df <- data.frame(id = sprintf('%02d', 1:20),
                     count = rpois(20, 50))
    write.csv(df, file = paste('file_', sprintf('%02d', i), '.csv', sep = ''),
              row.names = FALSE)
}

# Option 1: Move all the files to a separate subdirectory of the current
# directory. I'll call it 'myfiles', because I'm highly imaginative. [If your
# files have different names that are difficult to isolate with a certain
# string pattern, this is probably the best option.]
# Once the files are moved, I can change the working directory to myfiles:
setwd('myfiles')

# > getwd()
# [1] "C:/Users/Dennis/Documents/myfiles"

# Now, read all the csv files from this directory into a list object. In your
# case, it may be simpler to define a vector of names with list.files() first
# and check that it's right before using lapply, something like
# filelist <- list.files(pattern = '\\.csv$', all.files = FALSE)
# readlist <- lapply(filelist, read.csv, header = TRUE)
# The line below combines the two. (pattern is a regular expression, so
# '\\.csv$' matches only names that end in .csv.)
readlist <- lapply(list.files(pattern = '\\.csv$', all.files = FALSE),
                     read.csv, header = TRUE)

# Assign names count_01 to count_10 to the list components (rationale: these
# are the column names I'll want to use in the final data frame)
names(readlist) <- paste('count', sprintf('%02d', seq_along(readlist)),
                         sep = '_')
# As Scott intimated (but never used :), fire up the plyr and reshape packages:
library(plyr)
library(reshape)
# The first command is equivalent to do.call(rbind, readlist), but the
# advantage of ldply is that it copies over the list component names in a
# variable named .id as well, which as we'll see is very useful...
dtf <- ldply(readlist, rbind)
head(dtf)      # to see the first few lines

# The cast() function in the reshape package takes our 'long' data in dtf and
# reshapes it to 'wide' form according to the formula - in this case, the rows
# will be the id numbers and the columns will be count_01 - count_10.
# Fortunately, the count is taken as the 'value' variable. (This is made more
# explicit in the reshape2 package, where the corresponding function is dcast()
# and count would be (in quotes) the argument of value.var =, spelled
# value_var = in early releases)...but this works:
cast(dtf, id ~ .id)
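
For anyone without plyr and reshape installed, the same long-to-wide step can
probably be sketched in base R alone. This is an alternative technique, not
the method used above: the two small data frames stand in for the list read
from disk, and the column name "source" is a made-up substitute for .id.

```r
# Stand-in for the list of data frames built by lapply(..., read.csv) above
readlist <- list(
  count_01 = data.frame(id = c("01", "02", "03"), count = c(52, 47, 50)),
  count_02 = data.frame(id = c("01", "02", "03"), count = c(49, 55, 61))
)

# Stack the data frames, recording each row's origin in a column named "source"
# (this plays the role of the .id column that ldply() adds)
long <- do.call(rbind, Map(function(d, nm) cbind(source = nm, d),
                           readlist, names(readlist)))

# Long to wide: one row per id, one column per source file
wide <- reshape(long, idvar = "id", timevar = "source", direction = "wide")
rownames(wide) <- NULL

# reshape() names the new columns count.count_01, ...; drop the "count." prefix
names(wide) <- sub("^count\\.", "", names(wide))
print(wide)
```

reshape() is more verbose than cast(), but it ships with R, which may matter
on a machine where installing packages is awkward.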

# Option 2: The files happen to be in the same directory as getwd(), but may
# be mixed in with a bunch of other files. This is the case in my Documents
# directory.
setwd('..')
getwd()
# [1] "C:/Users/Dennis/Documents"

# I may have other .csv files in this directory, so I'm probably better off
# trying to match 'file_' instead of '.csv'. Otherwise, it's pretty much the
# same story as above:
list2 <- lapply(list.files(pattern = 'file_', all.files = FALSE),
                     read.csv, header = TRUE)
names(list2) <- paste('count', sprintf('%02d', seq_along(list2)), sep = '_')
dtg <-  ldply(list2, rbind)
cast(dtg, id ~ .id)
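
Both options can also be done without changing the working directory, by
passing a directory to list.files() and asking for full paths. The sketch
below is only an illustration: the temporary directory, the file names, the
extra other.csv and the name list3 are invented, and naming the list elements
after the files (rather than count_01, count_02, ...) is a design choice, not
something from the code above.

```r
# Hypothetical setup: a temp directory holding the data files plus an unrelated csv
datadir <- tempfile("myfiles_")
dir.create(datadir)
for (i in 1:3) {
  write.csv(data.frame(id = sprintf("%02d", 1:4), count = i * (1:4)),
            file.path(datadir, sprintf("file_%02d.csv", i)), row.names = FALSE)
}
write.csv(data.frame(x = 1), file.path(datadir, "other.csv"), row.names = FALSE)

# Anchored regular expression: matches file_01.csv but not other.csv
paths <- list.files(datadir, pattern = "^file_[0-9]+\\.csv$", full.names = TRUE)

# colClasses keeps ids such as "01" as text instead of converting them to 1
list3 <- lapply(paths, read.csv, colClasses = c("character", "numeric"))

# Name each element after its file, e.g. "file_01"
names(list3) <- sub("\\.csv$", "", basename(paths))
print(names(list3))
```

Taking the names from the files themselves keeps each column tied to its
source even if a file is later added or removed.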

A third option is to create a separate subdirectory for the data files, copy
an R shortcut into that directory (at least under Windows, anyway), go to
Properties and change the 'Start in' directory to that subdirectory. Then
follow Option 1.

HTH,
Dennis
