Here is a sample of my data frame, obtained with read_csv2 from the readr package.

myDf <- structure(list(X15 = c("30.09.2015", "05.10.2015", "30.09.2015",
"29.09.2015", "10.10.2015"), X16 = c("02.10.2015", "06.10.2015",
"01.10.2015", "01.10.2015", "13.10.2015"), X17 = c("Grains",
"Grains", "Grains", "Grains", "Grains"), X18 = c("Soyabeans",
"Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"), X19 = c("20,000",
"20,000", "20,000", "29,930", "26,000")), .Names = c("X15", "X16",
"X17", "X18", "X19"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-5L))

gabx at hortensia [R] str(myDf)
Classes ‘tbl_df’ and 'data.frame': 5 obs. of 5 variables:
 $ X15: chr "30.09.2015" "05.10.2015" "30.09.2015" "29.09.2015" ...
 $ X16: chr "02.10.2015" "06.10.2015" "01.10.2015" "01.10.2015" ...
 $ X17: chr "Grains" "Grains" "Grains" "Grains" ...
 $ X18: chr "Soyabeans" "Soyabeans" "Soyabeans" "Soyabeans" ...
 $ X19: chr "20,000" "20,000" "20,000" "29,930" ...

I want to convert the date columns to Date class and the numbers (X19) to numeric, while keeping the class of my object.

This code works:

myDf$X19 <- as.numeric(gsub(",", "", myDf$X19))
myDf$X15 <- as.Date(myDf$X15, format = "%d.%m.%Y")
myDf$X16 <- as.Date(myDf$X16, format = "%d.%m.%Y")

Now, as I have more than 5 columns, this is tedious and may slow the code (?), even if I group the columns by type. The columns are only of three types (char, num and Date), so grouping by type could be OK.

I tried lapply for the Date columns. It works, BUT it puts NA in any column holding numbers stored as characters. The result for X19 is: num NA NA NA NA NA NA NA NA NA NA ..

How can I achieve this with something other than lapply, or without writing a line for each type?

Thanks for any hints.

-- 
google.com/+arnaudgabourygabx
On 10/12/2015 6:12 AM, arnaud gaboury wrote:
> How can I target my goal with something else than lapply or writing a
> line for each type ?

I don't see how a function could reliably detect the types, but it might be good enough to use a regular expression, possibly just on the first line of the result. Once you've identified columns, e.g.
numcols <- 19
datecols <- 15:16

etc., you can use lapply:

myDf[, numcols] <- lapply(myDf[, numcols, drop = FALSE],
                          function(x) as.numeric(gsub(",", "", x)))

You can simplify myDf[, numcols] to myDf[numcols] if you want, but I think it makes it less clear.

Duncan Murdoch
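A minimal sketch of the regular-expression idea, under stated assumptions: it rebuilds the five-column sample from the original post, and the helper `all_match` and the two patterns (dd.mm.yyyy dates, and integers with comma thousands separators plus an optional decimal part) are hypothetical choices fitted to that sample only. It checks every non-missing value rather than just the first row, which is a little slower but harder to fool.

```r
# Sample data from the original post (5 of the 25 columns).
myDf <- structure(list(
  X15 = c("30.09.2015", "05.10.2015", "30.09.2015", "29.09.2015", "10.10.2015"),
  X16 = c("02.10.2015", "06.10.2015", "01.10.2015", "01.10.2015", "13.10.2015"),
  X17 = c("Grains", "Grains", "Grains", "Grains", "Grains"),
  X18 = c("Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"),
  X19 = c("20,000", "20,000", "20,000", "29,930", "26,000")),
  class = c("tbl_df", "data.frame"), row.names = c(NA, -5L))

# Hypothetical helper: TRUE if every non-missing value matches the pattern.
all_match <- function(x, pattern) {
  x <- x[!is.na(x)]
  length(x) > 0 && all(grepl(pattern, x))
}

# Assumed patterns, based only on the sample values.
date_re <- "^[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}$"          # e.g. "30.09.2015"
num_re  <- "^-?[0-9]{1,3}(,[0-9]{3})*(\\.[0-9]+)?$"    # e.g. "29,930"

# Only character columns are candidates for conversion.
is_chr   <- vapply(myDf, is.character, logical(1))
datecols <- names(myDf)[is_chr & vapply(myDf, all_match, logical(1), pattern = date_re)]
numcols  <- names(myDf)[is_chr & vapply(myDf, all_match, logical(1), pattern = num_re)]

# Same lapply conversions as above, applied to the detected columns.
myDf[numcols]  <- lapply(myDf[numcols], function(x) as.numeric(gsub(",", "", x)))
myDf[datecols] <- lapply(myDf[datecols], as.Date, format = "%d.%m.%Y")

str(myDf)
```

Columns that match neither pattern, such as X17 and X18, stay character.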
On Thu, Dec 10, 2015 at 12:54 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> I don't see how a function could reliably detect the types,

In fact, I only have 25 columns, so it is not difficult to sort them into the 3 types: char, num and Date. So there is no need for a function.

> but it might be good enough to use a regular expression, possibly just on
> the first line of the result. Once you've identified columns, [...]
> you can use lapply

Thank you.
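Since the columns can be listed by hand, one way to avoid a separate assignment per column is to pair each hand-written list with its converter and loop over the pairs. This is a sketch on the sample data only; the `converters` list is a hypothetical structure, and the column names stand in for the real 25-column lists.

```r
# Sample data from the original post (5 of the 25 columns).
myDf <- structure(list(
  X15 = c("30.09.2015", "05.10.2015", "30.09.2015", "29.09.2015", "10.10.2015"),
  X16 = c("02.10.2015", "06.10.2015", "01.10.2015", "01.10.2015", "13.10.2015"),
  X17 = c("Grains", "Grains", "Grains", "Grains", "Grains"),
  X18 = c("Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"),
  X19 = c("20,000", "20,000", "20,000", "29,930", "26,000")),
  class = c("tbl_df", "data.frame"), row.names = c(NA, -5L))

# Hypothetical mapping: hand-listed column names paired with a converter.
# For the full file, extend each `cols` vector to cover all relevant columns.
converters <- list(
  num  = list(cols = "X19",
              fun  = function(x) as.numeric(gsub(",", "", x))),
  date = list(cols = c("X15", "X16"),
              fun  = function(x) as.Date(x, format = "%d.%m.%Y"))
)

# One lapply per type; columns not listed (the char ones) are left untouched.
for (conv in converters) {
  myDf[conv$cols] <- lapply(myDf[conv$cols], conv$fun)
}

str(myDf)
```

Assigning through `myDf[cols] <-` replaces only the listed columns, so the object keeps its class.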