Ista Zahn
2011-Mar-11 16:52 UTC
[R] Any existing functions for reading and extracting data from path names?
Hi helpeRs,

I have inherited a set of data files that use the file system as a sort of poor man's database, i.e., the data files are nested in directories that indicate which city they come from. For example:

dir.create("deleteme")
for(i in paste("deleteme", c("New York", "Los Angeles"), sep="/")) {
    dir.create(i)
    for(j in paste("data", 1:2, ".csv", sep="")) {
        write.csv(data.frame(x=1:10), file=paste(i, j, sep="/"))
    }
}

list.files("deleteme", recursive=TRUE)

What I want to end up with is

 x        city wave
 1    New York    1
 1 Los Angeles    1
 1    New York    2
 1 Los Angeles    2

I've started writing a simple function to do this, but it seems like a common situation, and I'm wondering whether there are any packages or functions that might make this easier.

Thanks!
Ista
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
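Because every piece of metadata in this layout lives in the path itself, it can be pulled out before any file is read. The following is a minimal base-R sketch, assuming each file sits directly inside a city directory and is named `data<wave>.csv`; the helper name `path_metadata` is hypothetical and not part of any package.

```r
# Hypothetical helper: derive city and wave from paths such as
# "deleteme/New York/data1.csv". Assumes the city is the name of the
# file's parent directory and the wave is the number in "data<N>.csv".
path_metadata <- function(paths) {
  data.frame(
    path = paths,
    city = basename(dirname(paths)),
    wave = as.integer(sub("^data([0-9]+)\\.csv$", "\\1", basename(paths))),
    stringsAsFactors = FALSE
  )
}

paths <- c("deleteme/New York/data1.csv",
           "deleteme/Los Angeles/data2.csv")
meta <- path_metadata(paths)
meta
```

With real files, `paths` could come from `list.files("deleteme", pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)`. A file whose name does not match the assumed pattern gets `wave = NA` (with a coercion warning).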
Henrik Bengtsson
2011-Mar-11 18:02 UTC
Hi,

the R.filesets package was designed for this. It is heavily used by the aroma framework (http://www.aroma-project.org/), so it has gotten a fair bit of mileage by now (in a good way). Here is how you could set up your data set and work with the data.

# - - - - - - - - - - - -
# Setup file data set
# - - - - - - - - - - - -
library("R.filesets");
paths <- list.files(path="deleteme", full.names=TRUE);
dsList <- lapply(paths, FUN=function(path) TabularTextFileSet$byPath(path));
ds <- Reduce(append, dsList);

# Fullname translator: Los Angeles/data1.csv => Los Angeles,data1.csv
setFullNamesTranslator(ds, function(name, file, ...) {
  path <- getPath(file);
  paste(c(basename(path), name), collapse=",");
});

# - - - - - - - - - - - -
# Examples
# - - - - - - - - - - - -
# Get the full names (a fullname consists of
# a name and comma-separated tags)
> getFullNames(ds)
[1] "Los Angeles,data1" "Los Angeles,data2"
[3] "New York,data1"    "New York,data2"

# Get the names
> getNames(ds)
[1] "Los Angeles" "Los Angeles"
[3] "New York"    "New York"

> ds
TabularTextFileSet:
Name: Los Angeles
Tags:
Full name: Los Angeles
Number of files: 4
Names: Los Angeles, Los Angeles, New York, New York [4]
Path (to the first file): deleteme/Los Angeles
Total file size: 0.00 MB
RAM: 0.01MB

# Get 2nd file
> df <- getFile(ds, 2)
> df
TabularTextFile:
Name: Los Angeles
Tags: data2
Full name: Los Angeles,data2
Pathname: deleteme/Los Angeles/data2.csv
File size: 80 bytes
RAM: 0.01 MB
Number of data rows: 10
Columns [2]: '', 'x'
Number of text lines: 11

# Read one data file
> data <- readDataFrame(df)
> data
      x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

# Read all data files
> dataList <- lapply(ds, readDataFrame)
> dataList
$`Los Angeles,data1`
      x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

$`Los Angeles,data2`
      x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

$`New York,data1`
      x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

$`New York,data2`
      x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

Most methods in R.filesets are currently poorly documented (no time/resources/...), but there is more in there than is documented, so feel free to ask if you have any questions.

Hope this helps

/Henrik
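With the full-names translator above, each element of `dataList` is named `<city>,data<wave>`, so the long table Ista asked for can be assembled from those names alone. A minimal base-R sketch follows; `stack_by_fullname` is a hypothetical helper, and the stand-in `dataList` only mimics the shape of the output above (files read this way would also carry the unnamed row-name column) so that the example runs without R.filesets.

```r
# Hypothetical helper: combine a named list of data frames into one data
# frame, splitting each name "<city>,data<wave>" into city and wave columns.
stack_by_fullname <- function(dataList) {
  parts <- strsplit(names(dataList), ",", fixed = TRUE)
  pieces <- Map(function(df, p) {
    df$city <- p[1]                               # first tag: city
    df$wave <- as.integer(sub("^data", "", p[2])) # "data2" -> 2L
    df
  }, dataList, parts)
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Stand-in for lapply(ds, readDataFrame): four small data frames.
dataList <- list(
  "Los Angeles,data1" = data.frame(x = 1:3),
  "Los Angeles,data2" = data.frame(x = 1:3),
  "New York,data1"    = data.frame(x = 1:3),
  "New York,data2"    = data.frame(x = 1:3)
)
combined <- stack_by_fullname(dataList)
```

Splitting on the comma works here because the translator joins the directory name and the file name with a comma; a city name that itself contained a comma would break this assumption.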
Ista Zahn
2011-Mar-11 18:15 UTC
Thanks Henrik, that is exactly what I was hoping for!

Best,
Ista
Mikhail Titov
2011-Mar-12 00:39 UTC
I'm not sure what you are trying to achieve, but I think this can be a good starting point:

files <- list.files("deleteme", full.names=TRUE, recursive=TRUE)
# second "/"-separated field of each path, e.g. "New York"
cities <- sapply(strsplit(files, "/", fixed=TRUE), "[", 2)
x <- lapply(files, function(f) {
    out <- read.csv(f)
    # tag every row with the city taken from the file's directory
    out$city <- strsplit(f, "/", fixed=TRUE)[[1]][2]
    out
    })
# stack all per-file data frames into one
y <- do.call("rbind", x)

Mikhail
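The loop above can be extended to fill in the `wave` column as well. A minimal sketch, assuming the `data<wave>.csv` naming from the original example: it takes the city from `basename(dirname(f))` instead of the second `/`-separated field, so the root directory may itself contain slashes, and it passes `row.names = 1` to `read.csv()` to drop the unnamed row-name column that `write.csv()` adds. The test fixture is written under `tempdir()` rather than the working directory.

```r
# Recreate the example layout under a temporary directory.
root <- file.path(tempdir(), "deleteme")
for (city in c("New York", "Los Angeles")) {
  dir.create(file.path(root, city), recursive = TRUE, showWarnings = FALSE)
  for (wave in 1:2) {
    write.csv(data.frame(x = 1:10),
              file = file.path(root, city, paste0("data", wave, ".csv")))
  }
}

files <- list.files(root, pattern = "\\.csv$",
                    full.names = TRUE, recursive = TRUE)
pieces <- lapply(files, function(f) {
  out <- read.csv(f, row.names = 1)   # first column holds write.csv row names
  out$city <- basename(dirname(f))    # parent directory = city
  out$wave <- as.integer(sub("^data([0-9]+)\\.csv$", "\\1", basename(f)))
  out
})
y <- do.call(rbind, pieces)
rownames(y) <- NULL
```

Each of the four files contributes 10 rows, so `y` ends up with columns `x`, `city`, and `wave`; `y[order(y$wave), ]` groups the rows by wave as in the table in the original question.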
Ista Zahn
2011-Mar-12 02:50 UTC
Thanks Mikhail. I've been doing something very similar to your example; I was just wondering whether anyone had packaged functions for this task.

Thanks again,
Ista