thr3ads.net - R help - [R] Looping Through List of .csv Files to Work with Subsets of the Data [Jun 2015]

Chad Danyluck

2015-Jun-08 21:48 UTC

[R] Looping Through List of .csv Files to Work with Subsets of the Data

Hello,

I want to subset specific rows of data from 80 .csv files and write those
subsets into new .csv files. The data I want to subset starts on a
different row for each original .csv file. I've created variables that
identify which row the subset should start and end on, but I want to loop
through this process and I am not sure what to do. I've attempted to write
the loop below, albeit, much of it is pseudo code. If anyone can provide me
with some tips I'd appreciate it.

#### This data file is used to create the variables where the subsetting starts and ends for each participant ####
mig.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/mig.data.csv")

# These are the variable names for the start and end of each subset of relevant data (baseline, audio, and free)
participant.ids <- mig.processed.data$participant.id
participant.baseline.start <- mig.processed.data$baseline.row.start
participant.baseline.end <- mig.processed.data$baseline.row.end
participant.audio.start <- mig.processed.data$audio.meditation.row.start
participant.audio.end <- mig.processed.data$audio.meditation.row.end
participant.free.start <- mig.processed.data$free.meditation.row.start
participant.free.end <- mig.processed.data$free.meditation.row.end

# read into a list the individual files from which to subset the data
participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/MIG_RAW DATA & TXT Files/Plain Text Files")

# loop through each participant
for (i in 1:length(participant.files)) {

    # get baseline rows
    results.baseline <- participant.files[participant.baseline.start[i]:participant.baseline.end[i],]

    # get audio rows
    results.audio <- participant.files[participant.audio.start[i]:participant.audio.end[i],]

    # get free rows
    results.free <- participant.files[participant.free.start[i]:participant.free.end[i],]

    # write out participant relevant data
    write.csv(results.baseline, file="baseline[i].csv")
    write.csv(results.audio, file = "audio[i].csv")
    write.csv(results.free, file = "free[i].csv")

}

-- 
Chad M. Danyluck, MA
PhD Candidate, Psychology
University of Toronto



"There is nothing either good or bad but thinking makes it so." - William Shakespeare


MacQueen, Don

2015-Jun-09 00:07 UTC


So you have 80 files, one for each participant?

It appears that from each of the 80 files you want to extract three
subsets of rows,
  one set for baseline
  one set for audio
  one set for "free"

What I think I would do, if the above is correct, is create one "master"
file. This file will have eight columns:
(I'll show an example column name, followed by a description)
  id  participant id
  fn   file name for that participant
  srb  start row for baseline
  erb  end row for baseline
  sra  start row for audio
  era  end row for audio
  srf  start row for free
  erf  end row for free

This may be fairly close to what you already have, but I'm not sure.
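
As a rough sketch of what such a master table could look like, the
following builds one in memory and saves it. Only the eight column names
come from the description above; the ids, file names, and row numbers are
invented for illustration.

```r
# Hypothetical master table: one row per participant, eight columns.
# All values below are made up for illustration.
mstf <- data.frame(
  id  = c(101, 102),
  fn  = c("101.csv", "102.csv"),      # assumed file naming scheme
  srb = c(5, 12),  erb = c(20, 30),   # baseline start/end rows
  sra = c(21, 31), era = c(60, 75),   # audio start/end rows
  srf = c(61, 76), erf = c(90, 110),  # free start/end rows
  stringsAsFactors = FALSE            # keep fn as character, not factor
)

# Sanity check: no start row may come after its end row
stopifnot(mstf$srb <= mstf$erb, mstf$sra <= mstf$era, mstf$srf <= mstf$erf)

# Save it so it can be read back later with read.csv()
master.path <- file.path(tempdir(), "master.csv")
write.csv(mstf, master.path, row.names = FALSE)
```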

I would then load the master file into R
  mstf <- read.csv( {the master file} )

Then loop through its rows, and since each row has all the information
necessary to read the participant's individual file and identify which
rows to subset, a loop like this should work.

for (irow in seq_len(nrow(mstf))) {

  id <- mstf$id[irow]
  ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
  ## to ensure that the files sort properly when viewed by the operating system
  idc <- formatC(id, width=2, flag='0')

  ## as.character() guards against fn having been read in as a factor
  crnt.file <- read.csv( as.character(mstf$fn[irow]) )

  ## base
  tmp.base <- crnt.file[ mstf$srb[irow]:mstf$erb[irow] , ]
  write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))

  ## audio
  tmp.audio <- crnt.file[ mstf$sra[irow]:mstf$era[irow] , ]
  write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))

  ## free
  tmp.free <- crnt.file[ mstf$srf[irow]:mstf$erf[irow] , ]
  write.csv(tmp.free, file=paste0('free',idc,'.csv'))

}


Obviously, I can't test this. And there may be (likely are!) some typos in
it.

Note that it's not necessary to create variables that identify which row
the subset should start and end on; these are just looked up from the
master file when needed. Similarly, the three respective subsets are
stored in temporary data frames, because they are not (I presume) needed
when the whole thing is done. (if they were needed, then a different
strategy would be more appropriate)

There are different ways to index the loop. I just picked one.
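
To illustrate the point about indexing, here is a small sketch of two
common ways to drive such a loop. The toy data frame and its ids are
hypothetical. Both seq_len() and seq_along() produce an empty sequence
when there is nothing to loop over, whereas 1:length(x) would give
c(1, 0) for an empty x.

```r
# Toy stand-in for the master table (hypothetical ids)
mstf <- data.frame(id = c(3, 7, 12))

# Option 1: index by row number of the data frame
ids.by.row <- character(0)
for (irow in seq_len(nrow(mstf))) {
  ids.by.row <- c(ids.by.row, formatC(mstf$id[irow], width = 2, flag = "0"))
}

# Option 2: index along one column
ids.by.col <- character(0)
for (i in seq_along(mstf$id)) {
  ids.by.col <- c(ids.by.col, formatC(mstf$id[i], width = 2, flag = "0"))
}

# Both loops visit the same rows and build the same zero-padded ids
print(ids.by.row)
print(ids.by.col)
```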

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062






Chad Danyluck

2015-Jun-09 02:15 UTC


Thank you Don.

I've incorporated your suggestions which have helped me to understand how
loops work better than previously. However, the loop gets stuck trying to
read the current file:

mig.processed.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/mig.log.data.addition.csv")

## ASSUMPTION: Starting with augmented processedbook and correct free.meditation.end
#### Read in all data files and Loop through to create new data files segmented by the rows identified before ####

# get required data
participant.ids <- mig.processed.data$participant.id
participant.baseline.start <- mig.processed.data$baseline.row.start
participant.baseline.end <- mig.processed.data$baseline.row.end
participant.audio.start <- mig.processed.data$audio.meditation.row.start
participant.audio.end <- mig.processed.data$audio.meditation.row.end
participant.free.start <- mig.processed.data$free.meditation.row.start
participant.free.end <- mig.processed.data$free.meditation.row.end

participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/MIG_RAW DATA & TXT Files/Plain Text Files")

for (i in 1:length(participant.files)) {

  id <- participant.files[i]

  ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
  ## to ensure that the files sort properly when viewed by the operating system
  idc <- formatC(id, width=3, flag='0')

  ## current file
  crnt.file[i] <- read.csv( participant.files[i] )

  ## base
  tmp.base <- crnt.file[participant.baseline.start:participant.baseline.end, ]
  write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))


  ## audio
  tmp.audio <- crnt.file[participant.audio.start:participant.audio.end, ]
  write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))



  ## free
  tmp.free <- crnt.file[participant.free.start:participant.free.end, ]
  write.csv(tmp.free, file=paste0('free',idc,'.csv'))

}

The error message reads:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '103.csv': No such file or directory

So it seems to be calling the first file in the list but getting stuck. Any
suggestions?

Best,

Chad
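
A likely cause of the error above: by default list.files() returns bare
file names such as '103.csv' without the folder, so read.csv() looks for
them in the current working directory. The sketch below shows one
possible fix using full.names = TRUE. It is a self-contained demo in a
temporary folder, not the poster's actual setup; the folder names, the
toy data, and the start.row/end.row values are all assumptions. It also
assigns the file to crnt.file without [i] and indexes the start and end
vectors with [i], which the loop above does not do.

```r
# Temporary folders standing in for the real input and output folders
data.dir <- file.path(tempdir(), "plain_text_files")
out.dir  <- file.path(tempdir(), "segments")
dir.create(data.dir, showWarnings = FALSE)
dir.create(out.dir, showWarnings = FALSE)

# One toy participant file with 10 rows (hypothetical content)
write.csv(data.frame(x = 1:10, y = letters[1:10]),
          file.path(data.dir, "103.csv"), row.names = FALSE)

# Default: bare names, readable only if data.dir is the working directory
bare.names <- list.files(data.dir, pattern = "\\.csv$")

# full.names = TRUE prepends the folder, so read.csv() can find each file
full.paths <- list.files(data.dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical start/end rows, one entry per file
start.row <- c(2)
end.row   <- c(5)

for (i in seq_along(full.paths)) {
  crnt.file <- read.csv(full.paths[i])               # plain assignment, no [i]
  tmp.base  <- crnt.file[start.row[i]:end.row[i], ]  # one start/end per file
  id <- sub("\\.csv$", "", basename(full.paths[i]))  # file name without ".csv"
  write.csv(tmp.base, file.path(out.dir, paste0("baseline", id, ".csv")),
            row.names = FALSE)
}
```

An alternative under the same assumptions is to keep the bare names and
build each path with file.path(data.dir, participant.files[i]).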


