R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

Hi.

I have a data set with day, month, and year stored as integers. I am creating a date column from them using lubridate.

A hundred or so rows failed to parse.

The problem is that some April and September rows have day = 31.

paste(df1$year, df1$month, df1$day, sep = "-")

ymd(paste(df1$year, df1$month, df1$day, sep = "-")) #Warning message: 129 failed to parse. As expected in tutorial

#The resulting Date vector can be added to df1 as a new column called date:
df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-")) #Same warning

head(df1)
class(df1$date) #"Date"
summary(df1$date)
#         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.         NA's
# "1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31"        "129"

is_missing_date <- is.na(df1$date)
View(is_missing_date)

date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date, date_columns]

head(missing_dates)
#      year month day
# 3144 2000     9  31
# 3817 2000     4  31
# 3818 2000     4  31
# 3819 2000     4  31
# 3820 2000     4  31
# 3856 2000     9  31

I am trying to replace those day values with 30.

I am all over the map in Google looking for a fix, but haven't found one. I am sure I have overcomplicated my attempts with ideas (below) from these and other sites.

https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument

The following are screwy attempts at this simple repair:

??mutate_if

??replace

#So the rows with day 31 in 30-day months need to become day 30
View(missing_dates)
# ..those were the values you're going to replace

install.packages("dplyr")
library(dplyr)

I thought this function from Stack Overflow would work, but I get an error when I try to add a filter:

#https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
  .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, .search_Columns]
  return(.data_Frame)
}

df.Rep(missing_dates, 3, 31, 30)

#--So I should be able to apply this to the complete df1 data somehow?
head(df1)
df.Rep(df1, filter(month == c(4,9)), 31, 30)
#Error in month == c(4, 9) : comparison (1) is possible only for atomic and list types

Other screwy attempts:

select(df1, month, day, year)
str(df1)
#'data.frame': 34786 obs. of 14 variables:

#To choose rows, use filter():

#mutate_if(df1, month =4,9), day = 30)

filter(df1, month == c(4,9), day == 31)

df1 %>%
  group_by(month == c(4,9), day == 31) %>%
  tally()
# 1 FALSE FALSE 31161
# 2 FALSE TRUE    576
# 3 TRUE  FALSE  2981
# 4 TRUE  TRUE     68

df1 %>%
  mutate(day=replace(day, month == c(4,9), 30)) %>%
  as.data.frame()

View(as.list(df1, month == 4))
View(df1, month == c(4,9), day == 31)
View(df1, month == c(4,9))

# df1 %>%
#   group_by(month == c(4,9), day == 30) %>%

I know there is a simple solution, and even allowing for being new to R, it is driving me mad that it eludes me.

Thank you for any advice.

WHP

Confidentiality Notice This message is sent from Zelis. ...{{dropped:15}}
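One likely reason the filter() and group_by() attempts above gave puzzling counts is that `month == c(4, 9)` does not test membership. R recycles c(4, 9) along the column, so odd positions are compared with 4 and even positions with 9. A small sketch on a made-up month vector (not the poster's data) shows how this differs from `%in%`:

```r
# Toy vector, invented for illustration; not taken from df1.
month <- c(4, 9, 9, 4, 7, 4)

# `==` recycles c(4, 9): positions 1, 3, 5 are compared with 4,
# positions 2, 4, 6 are compared with 9.
eq_result <- month == c(4, 9)

# `%in%` tests every element for membership in the set {4, 9}.
in_result <- month %in% c(4, 9)

print(eq_result)
print(in_result)
```

With `==`, only the rows that happen to line up with the recycled pattern match, which would be consistent with the tally above counting 68 rows where 129 failed to parse.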
> On Jul 12, 2018, at 8:17 AM, Bill Poling <Bill.Poling at zelis.com> wrote:
>
> head(missing_dates)
> #      year month day
> # 3144 2000     9  31
> # 3817 2000     4  31
> # 3818 2000     4  31
> # 3819 2000     4  31
> # 3820 2000     4  31
> # 3856 2000     9  31
>
> I am trying to replace those with 30.

Seems like a fairly straightforward application of "[<-" with a conditional argument. (No need for tidyverse.)

missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30

> missing_dates
     year month day
3144 2000     9  30
3817 2000     4  30
3818 2000     4  30
3819 2000     4  30
3820 2000     4  30
3856 2000     9  30

Best;
David.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
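The same indexed assignment can presumably be applied to df1 itself rather than to the missing_dates subset, after which the date column is rebuilt. A minimal sketch, assuming a made-up five-row df1 with the same year, month, and day columns; base as.Date() stands in for lubridate's ymd() only so that the sketch runs without extra packages:

```r
# Hypothetical stand-in for df1; the real data has 34786 rows and 14 columns.
df1 <- data.frame(
  year  = c(2000, 2000, 2000, 1999, 2001),
  month = c(9, 4, 5, 12, 4),
  day   = c(31, 31, 31, 31, 15)
)

# Rows with day 31 in April or September (the two months named in the post).
bad <- df1$day == 31 & df1$month %in% c(4, 9)

# Logical-index assignment, as in the reply, applied to the full data frame.
df1$day[bad] <- 30

# Rebuild the date column; an impossible date would come back as NA.
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

print(df1)
print(sum(is.na(df1$date)))
```

June and November also have 30 days; c(4, 9) simply follows the post, which mentions only April and September.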
Yes, that's got it! (20 years from now I'll have it all figured out, ugh!) lol

Thank you, David.

        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"1977-07-16" "1984-03-13" "1990-08-16" "1990-12-28" "1997-07-29" "2002-12-31"

WHP

From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, July 12, 2018 11:29 AM
To: Bill Poling <Bill.Poling at zelis.com>
Cc: r-help (r-help at r-project.org) <r-help at r-project.org>
Subject: Re: [R] Help with replace()

> Seems like a fairly straightforward application of "[<-" with a conditional argument. (No need for tidyverse.)
>
> missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30
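Since the thread subject mentions replace(), the same fix could also be written with base replace(), which returns a modified copy rather than assigning in place. A sketch on invented values, not the poster's data:

```r
# Invented values for illustration only.
month <- c(9, 4, 5, 4)
day   <- c(31, 31, 31, 15)

# replace(x, list, values): `list` may be a logical index into x.
# Only day-31 entries in April or September are changed.
fixed_day <- replace(day, day == 31 & month %in% c(4, 9), 30)

print(fixed_day)
```

Inside a dplyr pipeline the same expression should work as mutate(day = replace(day, day == 31 & month %in% c(4, 9), 30)), which is close to the earlier mutate() attempt but with `%in%` and the day condition added.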