Philip Monk
2021-Oct-25 14:09 UTC
[R] Dates as headers causing confusion but needed to convert to Julian days for ANOVA
Hello, First post - apologies if I get anything wrong - either in describing the question (I've only been using R for a week) or etiquette. I have CSV files of Land Surface Temperature (LST) data derived from Landsat 8 data and exported from Google Earth Engine. I am investigating whether the construction of utility-scale solar power plants affects the local climate. I need to tidy the CSV files so that I can use Two-way ANOVA w/repeated measures but am having problems due to column headers (necessarily, I think) being dates. Each CSV currently has the following columns: Buffer Values 100-2000 in 100 increments. Buffers are 100m wide and extend outwards from each site boundary. 24 columns of monthly data. Column headers are in date format (currently dd/mm/yyyy in Excel) and relate to the date on which the original Landsat 8 image from which the LST data are derived was captured. I need these dates to calculate the 'Julian day' (1-365.25) for each month, and also to extract the Year. Time Currently 1 = pre-construction and 2 = post-construction. The data frame created when importing one of these CSV's into R looks like this: 'data.frame': 20 obs. of 14 variables: $ Buffer : int 100 200 300 400 500 600 700 800 900 1000 ... $ X15.01.2010: num 6.09 5.27 4.45 3.39 2.9 ... $ X16.02.2010: num 6.41 5.99 5.61 4.78 4.31 ... $ X20.03.2010: num 8.93 7.38 6.12 5.61 5.61 ... $ X24.04.2011: num 6.28 5.81 5.15 4.54 4.32 ... $ X07.05.2010: num 6.13 5.54 5.35 4.82 4.52 ... $ X08.06.2010: num 7.71 7.4 6.82 6.14 5.82 ... $ X13.07.2011: num 4.07 2.93 2.69 2.47 2.53 ... $ X11.08.2010: num 5.96 5.68 5.38 4.96 4.57 ... $ X12.09.2010: num 5.76 5.15 4.54 3.87 3.46 ... $ X17.10.2011: num 3.16 2.51 2.51 2.06 2.01 ... $ X15.11.2010: num 4.72 3.77 3.24 2.74 2.49 ... $ X01.12.2010: num 4.26 3.516 2.154 1.056 0.315 ... $ Time : int 1 1 1 1 1 1 1 1 1 1 ... Importing a CSV into R that has a date as a column header (in whatever format) causes problems! R adds the 'X', and converts the separator. I was using 'gather' and 'pivot_longer' (see below) but the date issue has wrecked that approach. I've tried reformating the date, trying to remove the X, and going away to learn more about data frames, dplyr, and readr. I'm not making any progress, though, and I'm just getting more confused. Helped requested ~~~~~~~~~~~~~~ How should I proceed to tidy the data such that I can: *) extract the year and Julian day for each date, then convert the date to the name of the month? *) create a tidy table with columns for Buffer, Month, Year, Julian day, LST (the values), and Time (1 = pre-construction, 2 = post-construction of a solar farm). Prior to deciding I needed to calculate the Julian day for use in ANOVA I was doing this (with month names rather than dates - please remember I'm a newbie!): data <- read.csv(... attach(data) # data_long <- data %>% pivot_longer(!Buffer, names_to = "month", values_to = "LST") # data_long <- data %>% pivot_longer(!Buffer, names_to = c("month", "Time"), names_sep = 13, values_to = "LST") data_long <- gather(data, Month, LST, January:December, factor_key=TRUE) data_long$Time <- as.factor(data$Time) str(data_long) 'pivot_longer' didn't work, but 'gather' did to create the long data needed for ANOVA. For example: 'data.frame': 480 obs. of 4 variables: $ Buffer: int 100 200 300 400 500 600 700 800 900 1000 ... $ Time : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ... $ Month : Factor w/ 12 levels "January","February",..: 1 1 1 1 1 1 1 1 1 1 ... $ LST : num NA 0.803 0.803 1.044 0.475 ... Suggestions/hints/solutions would be most welcome. :) Thanks for your time, Philip Part-time PhD Student (Environmental Science) Lancaster University, UK. [[alternative HTML version deleted]]
Bert Gunter
2021-Oct-25 21:02 UTC
[R] Dates as headers causing confusion but needed to convert to Julian days for ANOVA
Well, both newbies and oldies need to read and follow the Help files carefully. In this case, note the "check.names" argument of ?read.csv. You need to set it to FALSE in your (omitted) read.csv call, because your strings are not syntactically valid names (follow the "make.names" link to learn what are valid names). Here is a little example:> z1 <- read.csv('mydat') > z2 <- read.csv('mydat', check.names = FALSE) > z1X01.12.2019 X02.15.2020 1 1 5 2 2 6 3 3 7> z201/12/2019 02/15/2020 1 1 5 2 2 6 3 3 7 However, because your column names are *not* syntactically valid, you'll have to change them anyway to avoid further infelicities in accessing and manipulating the data(e.g. see ?names). How you choose to do this is up to you. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Oct 25, 2021 at 1:26 PM Philip Monk <prmonk at gmail.com> wrote:> Hello, > > First post - apologies if I get anything wrong - either in describing the > question (I've only been using R for a week) or etiquette. > > I have CSV files of Land Surface Temperature (LST) data derived from > Landsat 8 data and exported from Google Earth Engine. I am investigating > whether the construction of utility-scale solar power plants affects the > local climate. > > I need to tidy the CSV files so that I can use Two-way ANOVA w/repeated > measures but am having problems due to column headers (necessarily, I > think) being dates. > > Each CSV currently has the following columns: > > Buffer > Values 100-2000 in 100 increments. Buffers are 100m wide and extend > outwards from each site boundary. > > 24 columns of monthly data. > Column headers are in date format (currently dd/mm/yyyy in Excel) and > relate to the date on which the original Landsat 8 image from which the LST > data are derived was captured. > I need these dates to calculate the 'Julian day' (1-365.25) for each month, > and also to extract the Year. > > Time > Currently 1 = pre-construction and 2 = post-construction. > > The data frame created when importing one of these CSV's into R looks like > this: > > 'data.frame': 20 obs. of 14 variables: > $ Buffer : int 100 200 300 400 500 600 700 800 900 1000 ... > $ X15.01.2010: num 6.09 5.27 4.45 3.39 2.9 ... > $ X16.02.2010: num 6.41 5.99 5.61 4.78 4.31 ... > $ X20.03.2010: num 8.93 7.38 6.12 5.61 5.61 ... > $ X24.04.2011: num 6.28 5.81 5.15 4.54 4.32 ... > $ X07.05.2010: num 6.13 5.54 5.35 4.82 4.52 ... > $ X08.06.2010: num 7.71 7.4 6.82 6.14 5.82 ... > $ X13.07.2011: num 4.07 2.93 2.69 2.47 2.53 ... > $ X11.08.2010: num 5.96 5.68 5.38 4.96 4.57 ... > $ X12.09.2010: num 5.76 5.15 4.54 3.87 3.46 ... > $ X17.10.2011: num 3.16 2.51 2.51 2.06 2.01 ... > $ X15.11.2010: num 4.72 3.77 3.24 2.74 2.49 ... > $ X01.12.2010: num 4.26 3.516 2.154 1.056 0.315 ... > $ Time : int 1 1 1 1 1 1 1 1 1 1 ... > > > Importing a CSV into R that has a date as a column header (in whatever > format) causes problems! R adds the 'X', and converts the separator. > > I was using 'gather' and 'pivot_longer' (see below) but the date issue has > wrecked that approach. I've tried reformating the date, trying to remove > the X, and going away to learn more about data frames, dplyr, and readr. > I'm not making any progress, though, and I'm just getting more confused. > > Helped requested > ~~~~~~~~~~~~~~ > > How should I proceed to tidy the data such that I can: > > *) extract the year and Julian day for each date, then convert the date to > the name of the month? > *) create a tidy table with columns for Buffer, Month, Year, Julian day, > LST (the values), and Time (1 = pre-construction, 2 = post-construction of > a solar farm). > > Prior to deciding I needed to calculate the Julian day for use in ANOVA I > was doing this (with month names rather than dates - please remember I'm a > newbie!): > > data <- read.csv(... > attach(data) > # data_long <- data %>% pivot_longer(!Buffer, names_to = "month", values_to > = "LST") > # data_long <- data %>% pivot_longer(!Buffer, names_to = c("month", > "Time"), names_sep = 13, values_to = "LST") > data_long <- gather(data, Month, LST, January:December, factor_key=TRUE) > data_long$Time <- as.factor(data$Time) > str(data_long) > > 'pivot_longer' didn't work, but 'gather' did to create the long data needed > for ANOVA. > > For example: > > 'data.frame': 480 obs. of 4 variables: > $ Buffer: int 100 200 300 400 500 600 700 800 900 1000 ... > $ Time : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ... > $ Month : Factor w/ 12 levels "January","February",..: 1 1 1 1 1 1 1 1 1 1 > ... > $ LST : num NA 0.803 0.803 1.044 0.475 ... > > Suggestions/hints/solutions would be most welcome. :) > > Thanks for your time, > > Philip > > Part-time PhD Student (Environmental Science) > Lancaster University, UK. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Jeff Newmiller
2021-Oct-25 21:22 UTC
[R] Dates as headers causing confusion but needed to convert to Julian days for ANOVA
You did not say which function you used to import the csv file, but it looks like you probably used read.csv without setting the check.names argument to FALSE. Whether you change that out not, once you have reshaped the data, you can use a format specifier with as.Date to extract a date. (See ?strptime for format string specifiers.) On October 25, 2021 7:09:24 AM PDT, Philip Monk <prmonk at gmail.com> wrote:>Hello, > >First post - apologies if I get anything wrong - either in describing the >question (I've only been using R for a week) or etiquette. > >I have CSV files of Land Surface Temperature (LST) data derived from >Landsat 8 data and exported from Google Earth Engine. I am investigating >whether the construction of utility-scale solar power plants affects the >local climate. > >I need to tidy the CSV files so that I can use Two-way ANOVA w/repeated >measures but am having problems due to column headers (necessarily, I >think) being dates. > >Each CSV currently has the following columns: > >Buffer >Values 100-2000 in 100 increments. Buffers are 100m wide and extend >outwards from each site boundary. > >24 columns of monthly data. >Column headers are in date format (currently dd/mm/yyyy in Excel) and >relate to the date on which the original Landsat 8 image from which the LST >data are derived was captured. >I need these dates to calculate the 'Julian day' (1-365.25) for each month, >and also to extract the Year. > >Time >Currently 1 = pre-construction and 2 = post-construction. > >The data frame created when importing one of these CSV's into R looks like >this: > >'data.frame': 20 obs. of 14 variables: > $ Buffer : int 100 200 300 400 500 600 700 800 900 1000 ... > $ X15.01.2010: num 6.09 5.27 4.45 3.39 2.9 ... > $ X16.02.2010: num 6.41 5.99 5.61 4.78 4.31 ... > $ X20.03.2010: num 8.93 7.38 6.12 5.61 5.61 ... > $ X24.04.2011: num 6.28 5.81 5.15 4.54 4.32 ... > $ X07.05.2010: num 6.13 5.54 5.35 4.82 4.52 ... > $ X08.06.2010: num 7.71 7.4 6.82 6.14 5.82 ... > $ X13.07.2011: num 4.07 2.93 2.69 2.47 2.53 ... > $ X11.08.2010: num 5.96 5.68 5.38 4.96 4.57 ... > $ X12.09.2010: num 5.76 5.15 4.54 3.87 3.46 ... > $ X17.10.2011: num 3.16 2.51 2.51 2.06 2.01 ... > $ X15.11.2010: num 4.72 3.77 3.24 2.74 2.49 ... > $ X01.12.2010: num 4.26 3.516 2.154 1.056 0.315 ... > $ Time : int 1 1 1 1 1 1 1 1 1 1 ... > > >Importing a CSV into R that has a date as a column header (in whatever >format) causes problems! R adds the 'X', and converts the separator. > >I was using 'gather' and 'pivot_longer' (see below) but the date issue has >wrecked that approach. I've tried reformating the date, trying to remove >the X, and going away to learn more about data frames, dplyr, and readr. >I'm not making any progress, though, and I'm just getting more confused. > >Helped requested >~~~~~~~~~~~~~~ > >How should I proceed to tidy the data such that I can: > >*) extract the year and Julian day for each date, then convert the date to >the name of the month? >*) create a tidy table with columns for Buffer, Month, Year, Julian day, >LST (the values), and Time (1 = pre-construction, 2 = post-construction of >a solar farm). > >Prior to deciding I needed to calculate the Julian day for use in ANOVA I >was doing this (with month names rather than dates - please remember I'm a >newbie!): > >data <- read.csv(... >attach(data) ># data_long <- data %>% pivot_longer(!Buffer, names_to = "month", values_to >= "LST") ># data_long <- data %>% pivot_longer(!Buffer, names_to = c("month", >"Time"), names_sep = 13, values_to = "LST") >data_long <- gather(data, Month, LST, January:December, factor_key=TRUE) >data_long$Time <- as.factor(data$Time) >str(data_long) > >'pivot_longer' didn't work, but 'gather' did to create the long data needed >for ANOVA. > >For example: > >'data.frame': 480 obs. of 4 variables: > $ Buffer: int 100 200 300 400 500 600 700 800 900 1000 ... > $ Time : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ... > $ Month : Factor w/ 12 levels "January","February",..: 1 1 1 1 1 1 1 1 1 1 >... > $ LST : num NA 0.803 0.803 1.044 0.475 ... > >Suggestions/hints/solutions would be most welcome. :) > >Thanks for your time, > >Philip > >Part-time PhD Student (Environmental Science) >Lancaster University, UK. > > [[alternative HTML version deleted]] > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.-- Sent from my phone. Please excuse my brevity.
Daniel Nordlund
2021-Oct-25 22:15 UTC
[R] Dates as headers causing confusion but needed to convert to Julian days for ANOVA
On 10/25/2021 7:09 AM, Philip Monk wrote:> Hello, > > First post - apologies if I get anything wrong - either in describing the > question (I've only been using R for a week) or etiquette. > > I have CSV files of Land Surface Temperature (LST) data derived from > Landsat 8 data and exported from Google Earth Engine. I am investigating > whether the construction of utility-scale solar power plants affects the > local climate. > > I need to tidy the CSV files so that I can use Two-way ANOVA w/repeated > measures but am having problems due to column headers (necessarily, I > think) being dates. > > Each CSV currently has the following columns: > > Buffer > Values 100-2000 in 100 increments. Buffers are 100m wide and extend > outwards from each site boundary. > > 24 columns of monthly data. > Column headers are in date format (currently dd/mm/yyyy in Excel) and > relate to the date on which the original Landsat 8 image from which the LST > data are derived was captured. > I need these dates to calculate the 'Julian day' (1-365.25) for each month, > and also to extract the Year. > > Time > Currently 1 = pre-construction and 2 = post-construction. > > The data frame created when importing one of these CSV's into R looks like > this: > > 'data.frame': 20 obs. of 14 variables: > $ Buffer : int 100 200 300 400 500 600 700 800 900 1000 ... > $ X15.01.2010: num 6.09 5.27 4.45 3.39 2.9 ... > $ X16.02.2010: num 6.41 5.99 5.61 4.78 4.31 ... > $ X20.03.2010: num 8.93 7.38 6.12 5.61 5.61 ... > $ X24.04.2011: num 6.28 5.81 5.15 4.54 4.32 ... > $ X07.05.2010: num 6.13 5.54 5.35 4.82 4.52 ... > $ X08.06.2010: num 7.71 7.4 6.82 6.14 5.82 ... > $ X13.07.2011: num 4.07 2.93 2.69 2.47 2.53 ... > $ X11.08.2010: num 5.96 5.68 5.38 4.96 4.57 ... > $ X12.09.2010: num 5.76 5.15 4.54 3.87 3.46 ... > $ X17.10.2011: num 3.16 2.51 2.51 2.06 2.01 ... > $ X15.11.2010: num 4.72 3.77 3.24 2.74 2.49 ... > $ X01.12.2010: num 4.26 3.516 2.154 1.056 0.315 ... > $ Time : int 1 1 1 1 1 1 1 1 1 1 ... > > > Importing a CSV into R that has a date as a column header (in whatever > format) causes problems! R adds the 'X', and converts the separator. > > I was using 'gather' and 'pivot_longer' (see below) but the date issue has > wrecked that approach. I've tried reformating the date, trying to remove > the X, and going away to learn more about data frames, dplyr, and readr. > I'm not making any progress, though, and I'm just getting more confused. > > Helped requested > ~~~~~~~~~~~~~~ > > How should I proceed to tidy the data such that I can: > > *) extract the year and Julian day for each date, then convert the date to > the name of the month? > *) create a tidy table with columns for Buffer, Month, Year, Julian day, > LST (the values), and Time (1 = pre-construction, 2 = post-construction of > a solar farm). > > Prior to deciding I needed to calculate the Julian day for use in ANOVA I > was doing this (with month names rather than dates - please remember I'm a > newbie!): > > data <- read.csv(... > attach(data) > # data_long <- data %>% pivot_longer(!Buffer, names_to = "month", values_to > = "LST") > # data_long <- data %>% pivot_longer(!Buffer, names_to = c("month", > "Time"), names_sep = 13, values_to = "LST") > data_long <- gather(data, Month, LST, January:December, factor_key=TRUE) > data_long$Time <- as.factor(data$Time) > str(data_long) > > 'pivot_longer' didn't work, but 'gather' did to create the long data needed > for ANOVA. > > For example: > > 'data.frame': 480 obs. of 4 variables: > $ Buffer: int 100 200 300 400 500 600 700 800 900 1000 ... > $ Time : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ... > $ Month : Factor w/ 12 levels "January","February",..: 1 1 1 1 1 1 1 1 1 1 > ... > $ LST : num NA 0.803 0.803 1.044 0.475 ... > > Suggestions/hints/solutions would be most welcome. :) > > Thanks for your time, > > Philip > > Part-time PhD Student (Environmental Science) > Lancaster University, UK. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.Using the data that you read in, you can use "pivot_longer", and various "date" functions to get what you want. For example # pivot your data data_long <- data %>% pivot_longer(starts_with("X"), names_to = "chr_date", values_to = "LST") # now you can use various data functions to get your month, day, and year # for example data_long$month <- month(as.Date(data_long$chr_date,"X%d.%m.%Y")) You may want to read up on the various date functions built in to R such as as.POSIXct, as.POSIXlt, as.Date, and maybe look at the contributed package, lubridate. Hope this is helpful, Dan -- Daniel Nordlund Port Townsend, WA USA -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus