Ista Zahn
2011-Mar-11 16:52 UTC
[R] Any existing functions for reading and extracting data from path names?
Hi helpeRs,
I have inherited a set of data files that use the file system as a
sort of poor man's database, i.e., the data files are nested in
directories that indicate which city they come from. For example:
dir.create("deleteme")
for(i in paste("deleteme", c("New York", "Los Angeles"), sep="/")) {
    dir.create(i)
    for(j in paste("data", 1:2, ".csv", sep="")) {
        write.csv(data.frame(x=1:10), file=paste(i, j, sep="/"))
    }
}
list.files("deleteme", recursive=TRUE)
What I want to end up with is
  x        city wave
  1    New York    1
  1 Los Angeles    1
  1    New York    2
  1 Los Angeles    2
I've started writing a simple function to do this, but it seems like
a common situation and I'm wondering if there are any packages or
functions that might make this easier.
Thanks!
Ista
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
Henrik Bengtsson
2011-Mar-11 18:02 UTC
[R] Any existing functions for reading and extracting data from path names?
Hi,
the R.filesets package was designed for this. It is heavily used by
the aroma framework (http://www.aroma-project.org/), so it has seen a fair
bit of mileage by now (in a good way). Here is how you could set up
your data set and work with the data.
# - - - - - - - - - - - -
# Setup file data set
# - - - - - - - - - - - -
library("R.filesets");
paths <- list.files(path="deleteme", full.names=TRUE);
dsList <- lapply(paths, FUN=function(path) TabularTextFileSet$byPath(path));
ds <- Reduce(append, dsList);
# Fullname translator: Los Angeles/data1.csv => Los Angeles,data1.csv
setFullNamesTranslator(ds, function(name, file, ...) {
path <- getPath(file);
paste(c(basename(path), name), collapse=",");
});
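
For readers without R.filesets at hand, the mapping the translator performs on a pathname can be sketched in base R. This is only an illustration of the naming scheme, not the package's own code; path_to_fullname is a hypothetical helper name.

```r
# Hypothetical helper (not part of R.filesets): build "<directory>,<file stem>"
# from a pathname using base R only.
path_to_fullname <- function(pathname) {
  # Parent directory of the file, e.g. "Los Angeles"
  city <- basename(dirname(pathname))
  # File name with its extension stripped, e.g. "data1"
  stem <- sub("\\.[^.]*$", "", basename(pathname))
  paste(city, stem, sep = ",")
}

path_to_fullname("deleteme/Los Angeles/data1.csv")
# [1] "Los Angeles,data1"
```

The function is vectorized, so it can be applied directly to the result of list.files(..., full.names=TRUE, recursive=TRUE).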
# - - - - - - - - - - - -
# Examples
# - - - - - - - - - - - -
# Get the full names (a fullname consists of
# a name and comma-separated tags)
> getFullNames(ds)
[1] "Los Angeles,data1" "Los Angeles,data2"
[3] "New York,data1"    "New York,data2"
# Get the names
> getNames(ds)
[1] "Los Angeles" "Los Angeles"
[3] "New York"    "New York"
> ds
TabularTextFileSet:
Name: Los Angeles
Tags:
Full name: Los Angeles
Number of files: 4
Names: Los Angeles, Los Angeles, New York, New York [4]
Path (to the first file): deleteme/Los Angeles
Total file size: 0.00 MB
RAM: 0.01MB
# Get 2nd file
> df <- getFile(ds, 2)
> df
TabularTextFile:
Name: Los Angeles
Tags: data2
Full name: Los Angeles,data2
Pathname: deleteme/Los Angeles/data2.csv
File size: 80 bytes
RAM: 0.01 MB
Number of data rows: 10
Columns [2]: '', 'x'
Number of text lines: 11
# Read one data file
> data <- readDataFrame(df)
> data
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
# Read all data files
> dataList <- lapply(ds, readDataFrame)
> dataList
$`Los Angeles,data1`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
$`Los Angeles,data2`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
$`New York,data1`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
$`New York,data2`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
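
To get from a named list like this to the single data frame with city and wave columns asked for in the original post, one possible base R sketch follows. It assumes the list names keep the "city,dataN" pattern shown above; the small dataList built here is only a stand-in for the real one, and stack_with_tags is a hypothetical helper name.

```r
# Stand-in for dataList above: a named list of data frames whose names
# follow the "city,dataN" pattern (an assumption taken from the output shown).
dataList <- list(
  "Los Angeles,data1" = data.frame(x = 1:10),
  "New York,data2"    = data.frame(x = 1:10)
)

# Hypothetical helper: add city and wave columns parsed from each list name,
# then stack everything into one data frame.
stack_with_tags <- function(lst) {
  pieces <- lapply(names(lst), function(nm) {
    parts <- strsplit(nm, ",", fixed = TRUE)[[1]]
    d <- lst[[nm]]
    d$city <- parts[1]                                # text before the comma
    d$wave <- as.integer(sub("^data", "", parts[2]))  # digits after "data"
    d
  })
  do.call(rbind, pieces)
}

combined <- stack_with_tags(dataList)
head(combined, 3)
```

Splitting on the comma only works as long as no city name contains a comma itself; a different separator would be needed otherwise.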
Most methods in R.filesets are currently poorly documented (no
time/resources/...), but there is more in there than documented so
feel free to ask if you have any questions.
Hope this helps
/Henrik
Ista Zahn
2011-Mar-11 18:15 UTC
[R] Any existing functions for reading and extracting data from path names?
Thanks Henrik, that is exactly what I was hoping for!
Best,
Ista
Mikhail Titov
2011-Mar-12 00:39 UTC
[R] Any existing functions for reading and extracting data from path names?
I'm not sure what you are trying to achieve, but I think this can be a good
starting point:
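files <- list.files("deleteme", full.names=TRUE, recursive=TRUE)
# City names: the 2nd path component (deleteme/<city>/<file>); shown for reference
names <- sapply(strsplit(files, "/", TRUE), "[", 2)
# Read each file and tag its rows with the city it came from
x <- lapply(files, function(f) {
    out <- read.csv(f)
    out$city <- strsplit(f, "/", TRUE)[[1]][2]
    out
})
# Stack the per-file data frames into one
y <- do.call("rbind", x)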
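
The same idea can be extended to recover the wave number from the file name as well. The following is a self-contained sketch under a few assumptions: the files are recreated in a temporary directory rather than in "deleteme", they are written with row.names=FALSE so that only the x column is stored (the original example kept the row names), and the wave is taken to be the digits in the file name.

```r
# Recreate the example layout in a temporary directory (assumed location)
root <- file.path(tempdir(), "deleteme")
for (city in c("New York", "Los Angeles")) {
  dir.create(file.path(root, city), recursive = TRUE, showWarnings = FALSE)
  for (wave in 1:2) {
    write.csv(data.frame(x = 1:10),
              file = file.path(root, city, paste0("data", wave, ".csv")),
              row.names = FALSE)
  }
}

files <- list.files(root, full.names = TRUE, recursive = TRUE)

# Read each file and attach the values encoded in its path
pieces <- lapply(files, function(f) {
  out <- read.csv(f)
  out$city <- basename(dirname(f))                          # parent directory
  out$wave <- as.integer(gsub("[^0-9]", "", basename(f)))   # digits in file name
  out
})
y <- do.call(rbind, pieces)
str(y)
```

Using basename(dirname(f)) instead of a fixed position in the split path keeps the city lookup independent of how deep the root directory sits.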
Mikhail
Ista Zahn
2011-Mar-12 02:50 UTC
[R] Any existing functions for reading and extracting data from path names?
Thanks Mikhail. I've been doing something very similar to your example; I was just wondering if anyone had packaged functions for this task.
Thanks again,
Ista