Hello R folks,
I have recently discovered the power of working with multiple data frames in
lists. However, I am having trouble understanding how to perform operations
on individual columns of data frames in the list. For example, I have a
water quality data set (sample data included below) that consists of roughly
a dozen data frames. Some of the data frames have a chr column called
'Month' that I need to convert to a date with the proper format. I would
like to iterate through all of the data frames in the list and format all of
those that have the 'Month' column. I can accomplish this with a for-loop
(e.g., below), but I cannot figure out how to do this with the plyr or apply
families. This is just one example of the formatting that I have to perform,
so I would really like to avoid loops, and I would love to learn how to
better work with lists as well.
I would greatly appreciate any guidance.
Thank you and regards,
Stevan
A for-loop like this works, but is not an ideal solution:

for (i in seq_along(data)) {
  if ("Month" %in% names(data[[i]])) {
    data[[i]]$Month <- as.POSIXct(data[[i]]$Month, format = "%Y/%m/%d")
  }
}
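Since the goal is a date rather than a date-time, base R's as.Date() may be a closer fit than as.POSIXct(): it returns an object of class "Date" with no time-of-day or time-zone component. A minimal comparison, using 'Month' strings in the same "YYYY/MM/DD" form as the sample data (the variable names are just for illustration):

```r
# 'Month' values in the same character form as the sample data
month_chr <- c("2001/10/01", "1999/08/01")

# as.POSIXct() gives a date-time (midnight in the session's time zone)
month_ct <- as.POSIXct(month_chr, format = "%Y/%m/%d")

# as.Date() gives a plain calendar date of class "Date"
month_d <- as.Date(month_chr, format = "%Y/%m/%d")

class(month_ct)  # "POSIXct" "POSIXt"
class(month_d)   # "Date"

# A Date can be printed in whatever layout is needed
format(month_d, "%Y-%m")  # "2001-10" "1999-08"
```

Either class sorts and compares correctly; as.Date() simply avoids carrying a time zone around when only the month matters.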
Sample data (the heads of two of the data frames in the list):
structure(list(`3D_Fluorescence.csv` = structure(list(ID = 1:6,
    Site_Number = c("R5", "R6a", "R8", "R9a", "R14", "R15"),
    Month = c("2001/10/01", "2001/10/01", "2001/10/01", "2001/10/01",
              "2001/10/01", "2001/10/01"),
    Exc_A = c(215L, 215L, NA, NA, 215L, 215L),
    Em_A = c(422.5, 410.5, NA, NA, 408.5, 408),
    Fl_A = c(303, 296.86, NA, NA, 297.62, 174.75),
    Exc_B = c(325L, 325L, NA, NA, 325L, 325L),
    Em_B = c(416, 413, NA, NA, 418.5, 417.5),
    Fl_B = c(137.32, 116.1, NA, NA, 132.48, 77.44)),
    .Names = c("ID", "Site_Number", "Month", "Exc_A", "Em_A", "Fl_A",
               "Exc_B", "Em_B", "Fl_B"),
    row.names = c(NA, 6L), class = "data.frame"),
  algae.csv = structure(list(ID = 1:6,
    SiteNumber = c("R1", "R2A", "R2B", "R3", "R4", "R5"),
    SiteLocation = c("CAP canal above Waddell Canal",
                     "Lake Pleasant integrated sample",
                     "Lake Pleasant integrated sample",
                     "Waddell Canal", "Cap Canal at 7th St.",
                     "Verde River btwn Horseshoe and Bartlett"),
    ClusterName = c("cap", "cap", "cap", "cap", "cap", "verde"),
    SiteAcronym = c("cap-siphon", "pleasant-epi", "pleasant-hypo",
                    "waddell canal", "cap @ 7th st", "verde abv bartlett"),
    Date = c("1999/08/18", "1999/08/18", "1999/08/18", "1999/08/18",
             "1999/08/18", "1999/08/16"),
    Month = c("1999/08/01", "1999/08/01", "1999/08/01", "1999/08/01",
              "1999/08/01", "1999/08/01"),
    SampleType = c("", "", "", "", "", ""),
    Conductance = c(800, 890, 850, 870, 830, 500),
    ChlA = c(0.3, 0.3, 0.6, 0.8, 1.1, 7.6),
    Phaeophytin = c(0, 0, 0, 0, 0.7, 4.7),
    PhaeophytinChlA = c(0.7, 0.7, 1.3, 5.3, 0.7, 4.7),
    Chlorophyta = c(0L, 0L, 18L, 0L, 0L, 21L),
    Cyanophyta = c(8L, 0L, 0L, 0L, 7L, 79L),
    Bacillariophyta = c(135L, 76L, 0L, 18L, 54L, 195L),
    Total = c(147L, 76L, 18L, 18L, 61L, 302L),
    AlgaeComments = c("", "", "", "", "", "")),
    .Names = c("ID", "SiteNumber", "SiteLocation", "ClusterName",
               "SiteAcronym", "Date", "Month", "SampleType", "Conductance",
               "ChlA", "Phaeophytin", "PhaeophytinChlA", "Chlorophyta",
               "Cyanophyta", "Bacillariophyta", "Total", "AlgaeComments"),
    row.names = c(NA, 6L), class = "data.frame")),
  .Names = c("3D_Fluorescence.csv", "algae.csv"))
--
View this message in context:
http://r.789695.n4.nabble.com/operations-on-columns-when-data-frames-are-in-a-list-tp4705757.html
Sent from the R help mailing list archive at Nabble.com.
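Working with the list as a whole often starts with asking a question of every element at once. A logical vector with one entry per data frame (built here with vapply()) shows which frames have a 'Month' column, and the same vector can index the list so that only those frames are replaced. This is a sketch on a hypothetical, cut-down stand-in for the sample data: the list name mydat and the extra no_month.csv frame are invented for illustration, and as.Date() is used on the assumption that a plain date is wanted.

```r
# Hypothetical, cut-down stand-in for the list of data frames
mydat <- list(
  `3D_Fluorescence.csv` = data.frame(
    Site_Number = c("R5", "R6a"),
    Month = c("2001/10/01", "2001/10/01"),
    stringsAsFactors = FALSE
  ),
  algae.csv = data.frame(
    SiteNumber = c("R1", "R2A"),
    Month = c("1999/08/01", "1999/08/01"),
    stringsAsFactors = FALSE
  ),
  # invented frame without a 'Month' column
  no_month.csv = data.frame(Site = c("X1", "X2"), stringsAsFactors = FALSE)
)

# One named TRUE/FALSE per data frame: does it have a 'Month' column?
has_month <- vapply(mydat, function(df) "Month" %in% names(df), logical(1))
print(has_month)

# Replace only the frames that have 'Month'; the others are left untouched
mydat[has_month] <- lapply(mydat[has_month], function(df) {
  df$Month <- as.Date(df$Month, format = "%Y/%m/%d")
  df
})
```

Indexing with has_month keeps the list's names and order, so the result can be used exactly like the original list.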
Adams, Jean
2015-Apr-13 12:51 UTC
[R] operations on columns when data frames are in a list
If you write a function that takes a data frame as an argument and returns
a data frame, you can use lapply to carry out the tasks that you want. For
example, if your list of data frames is called mydat ...
mon2date <- function(df) {
  if ("Month" %in% names(df)) {
    df$Month <- as.POSIXct(df$Month, format = "%Y/%m/%d")
  }
  return(df)
}

mydat2 <- lapply(mydat, mon2date)
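Because lapply() passes any extra named arguments on to the function, the same pattern extends to the other conversions mentioned in the question (the algae data, for instance, also has a chr 'Date' column). The helper convert_date_col() below is a hypothetical generalization of mon2date(), not code from this thread; it takes the column name and format as arguments and uses as.Date() on the assumption that a calendar date is wanted. The two-frame list is likewise made-up stand-in data.

```r
# Hypothetical helper: convert column `col` of `df` to Date, if present
convert_date_col <- function(df, col, fmt = "%Y/%m/%d") {
  if (col %in% names(df)) {
    df[[col]] <- as.Date(df[[col]], format = fmt)
  }
  df
}

# Made-up stand-in list in the same shape as the sample data
mydat <- list(
  fluor = data.frame(Month = "2001/10/01", stringsAsFactors = FALSE),
  algae = data.frame(Date = "1999/08/18", Month = "1999/08/01",
                     stringsAsFactors = FALSE)
)

# Arguments after the function (col = ...) are forwarded by lapply()
mydat2 <- lapply(mydat, convert_date_col, col = "Month")
mydat2 <- lapply(mydat2, convert_date_col, col = "Date")

str(mydat2)
```

Frames that lack the requested column are returned unchanged, so each call can safely be run over the whole list.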
Jean
On Sun, Apr 12, 2015 at 5:30 PM, Steve E. <searl at vt.edu> wrote: