Hello R folks,

I have recently discovered the power of working with multiple data frames in lists. However, I am having trouble understanding how to perform operations on individual columns of data frames in the list. For example, I have a water quality data set (sample data included below) that consists of roughly a dozen data frames. Some of the data frames have a chr column called 'Month' that I need to convert to a date with the proper format. I would like to iterate through all of the data frames in the list and format all of those that have the 'Month' column. I can accomplish this with a for-loop (e.g., below), but I cannot figure out how to do this with the plyr or apply families. This is just one example of the formatting that I have to perform, so I would really like to avoid loops, and I would love to learn how to work with lists better as well.

I would greatly appreciate any guidance.

Thank you and regards,
Stevan

A for-loop like this works, but is not an ideal solution:

for (i in seq_along(data)) {
  if ("Month" %in% names(data[[i]])) {
    data[[i]]$Month <- as.POSIXct(data[[i]]$Month, format = "%Y/%m/%d")
  }
}

Sample data (head of two data frames from the list of all data frames):

structure(list(`3D_Fluorescence.csv` = structure(list(ID = 1:6,
    Site_Number = c("R5", "R6a", "R8", "R9a", "R14", "R15"),
    Month = c("2001/10/01", "2001/10/01", "2001/10/01", "2001/10/01",
    "2001/10/01", "2001/10/01"), Exc_A = c(215L, 215L, NA, NA,
    215L, 215L), Em_A = c(422.5, 410.5, NA, NA, 408.5, 408),
    Fl_A = c(303, 296.86, NA, NA, 297.62, 174.75), Exc_B = c(325L,
    325L, NA, NA, 325L, 325L), Em_B = c(416, 413, NA, NA, 418.5,
    417.5), Fl_B = c(137.32, 116.1, NA, NA, 132.48, 77.44)), .Names = c("ID",
"Site_Number", "Month", "Exc_A", "Em_A", "Fl_A", "Exc_B", "Em_B",
"Fl_B"), row.names = c(NA, 6L), class = "data.frame"), algae.csv = structure(list(
    ID = 1:6, SiteNumber = c("R1", "R2A", "R2B", "R3", "R4",
    "R5"), SiteLocation = c("CAP canal above Waddell Canal",
    "Lake Pleasant integrated sample", "Lake Pleasant integrated sample",
    "Waddell Canal", "Cap Canal at 7th St.", "Verde River btwn Horseshoe and Bartlett"
    ), ClusterName = c("cap", "cap", "cap", "cap", "cap", "verde"
    ), SiteAcronym = c("cap-siphon", "pleasant-epi", "pleasant-hypo",
    "waddell canal", "cap @ 7th st", "verde abv bartlett"), Date = c("1999/08/18",
    "1999/08/18", "1999/08/18", "1999/08/18", "1999/08/18", "1999/08/16"
    ), Month = c("1999/08/01", "1999/08/01", "1999/08/01", "1999/08/01",
    "1999/08/01", "1999/08/01"), SampleType = c("", "", "", "",
    "", ""), Conductance = c(800, 890, 850, 870, 830, 500), ChlA = c(0.3,
    0.3, 0.6, 0.8, 1.1, 7.6), Phaeophytin = c(0, 0, 0, 0, 0.7,
    4.7), PhaeophytinChlA = c(0.7, 0.7, 1.3, 5.3, 0.7, 4.7),
    Chlorophyta = c(0L, 0L, 18L, 0L, 0L, 21L), Cyanophyta = c(8L,
    0L, 0L, 0L, 7L, 79L), Bacillariophyta = c(135L, 76L, 0L,
    18L, 54L, 195L), Total = c(147L, 76L, 18L, 18L, 61L, 302L
    ), AlgaeComments = c("", "", "", "", "", "")), .Names = c("ID",
"SiteNumber", "SiteLocation", "ClusterName", "SiteAcronym", "Date",
"Month", "SampleType", "Conductance", "ChlA", "Phaeophytin",
"PhaeophytinChlA", "Chlorophyta", "Cyanophyta", "Bacillariophyta",
"Total", "AlgaeComments"), row.names = c(NA, 6L), class = "data.frame")),
.Names = c("3D_Fluorescence.csv", "algae.csv"))

--
View this message in context: http://r.789695.n4.nabble.com/operations-on-columns-when-data-frames-are-in-a-list-tp4705757.html
Sent from the R help mailing list archive at Nabble.com.
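The Month strings in the sample data carry no time of day, so base R's as.Date() may be a simpler target than as.POSIXct(), which stores a full date-time in the session's local time zone. The quick comparison below uses two values taken from the sample data and assumes that only the calendar date matters; both calls accept the same "%Y/%m/%d" format string.

```r
# Two Month strings copied from the sample data
month_chr <- c("2001/10/01", "1999/08/01")

# as.POSIXct(), as in the loop: a date-time in the local time zone
as_ct <- as.POSIXct(month_chr, format = "%Y/%m/%d")

# as.Date(): a plain calendar date with no time-of-day or time zone
as_d <- as.Date(month_chr, format = "%Y/%m/%d")

class(as_ct)           # "POSIXct" "POSIXt"
class(as_d)            # "Date"
format(as_d, "%Y-%m")  # "2001-10" "1999-08"
```

Either class sorts and compares chronologically. Date is usually the simpler choice when the values are first-of-month markers like these.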
Adams, Jean
2015-Apr-13 12:51 UTC
[R] operations on columns when data frames are in a list
If you write a function that takes a data frame as an argument and returns a data frame, you can use lapply to carry out the tasks that you want. For example, if your list of data frames is called mydat ...

mon2date <- function(df) {
  if ("Month" %in% names(df)) {
    df$Month <- as.POSIXct(df$Month, format = "%Y/%m/%d")
  }
  return(df)
}

mydat2 <- lapply(mydat, mon2date)

Jean

On Sun, Apr 12, 2015 at 5:30 PM, Steve E. <searl at vt.edu> wrote:
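Here is a self-contained run of Jean's approach on a trimmed copy of the sample data (three rows and a few columns from each data frame). The third element, sites.csv, is a hypothetical addition with no Month column. It shows that lapply() hands each data frame to mon2date() in turn, keeps the list names, and returns data frames without a Month column unchanged.

```r
# Trimmed copy of the posted sample list; "sites.csv" is hypothetical
mydat <- list(
  `3D_Fluorescence.csv` = data.frame(
    ID = 1:3,
    Site_Number = c("R5", "R6a", "R8"),
    Month = c("2001/10/01", "2001/10/01", "2001/10/01"),
    stringsAsFactors = FALSE
  ),
  algae.csv = data.frame(
    ID = 1:3,
    SiteNumber = c("R1", "R2A", "R2B"),
    Month = c("1999/08/01", "1999/08/01", "1999/08/01"),
    stringsAsFactors = FALSE
  ),
  sites.csv = data.frame(SiteNumber = c("R1", "R5"), stringsAsFactors = FALSE)
)

# Jean's function: convert Month if present, otherwise return df unchanged
mon2date <- function(df) {
  if ("Month" %in% names(df)) {
    df$Month <- as.POSIXct(df$Month, format = "%Y/%m/%d")
  }
  df
}

# lapply() calls mon2date() on each element and returns a list with the same names
mydat2 <- lapply(mydat, mon2date)

# Class of Month in each data frame: "POSIXct" for the first two, NA for sites.csv
sapply(mydat2, function(df) {
  if ("Month" %in% names(df)) class(df$Month)[1] else NA_character_
})
```

The function is the only part that changes between tasks. plyr::llply(mydat, mon2date) should give the same list, because llply() also takes a list and a function and returns a list.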
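Stevan notes that Month is only one of several formatting tasks. The same lapply() pattern can therefore be wrapped in a small helper that takes the column name and the conversion function as arguments. In the sketch below, convert_column() is a hypothetical name and the two-element list is made-up example data. The Date column mirrors the one in algae.csv. The sketch also uses as.Date() rather than as.POSIXct(), on the assumption that a calendar date is enough.

```r
# Hypothetical helper: apply `fun` to column `col` in every data frame that has it.
# Extra arguments (e.g. format =) are forwarded to `fun` through lapply()'s `...`.
convert_column <- function(lst, col, fun, ...) {
  lapply(lst, function(df, ...) {
    if (col %in% names(df)) {
      df[[col]] <- fun(df[[col]], ...)
    }
    df
  }, ...)
}

# Made-up example list: one data frame with Month and Date, one without either
wq <- list(
  a = data.frame(Month = c("2001/10/01", "2001/11/01"),
                 Date  = c("2001/10/05", "2001/11/07"),
                 stringsAsFactors = FALSE),
  b = data.frame(Site = c("R1", "R2"), stringsAsFactors = FALSE)
)

# One call per formatting task; data frames lacking the column pass through untouched
wq2 <- convert_column(wq,  "Month", as.Date, format = "%Y/%m/%d")
wq2 <- convert_column(wq2, "Date",  as.Date, format = "%Y/%m/%d")

str(wq2$a)
```

Because each task is a single function call that returns a new list, the calls can be chained or collected in a script without any explicit loop indices.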