thr3ads.net - R help - [R] change col types of a df/tbl

arnaud gaboury

2015-Dec-10 11:12 UTC

[R] change col types of a df/tbl_df

Here is a sample of my data frame, obtained with read_csv2 from the readr package.

myDf <- structure(list(
    X15 = c("30.09.2015", "05.10.2015", "30.09.2015", "29.09.2015", "10.10.2015"),
    X16 = c("02.10.2015", "06.10.2015", "01.10.2015", "01.10.2015", "13.10.2015"),
    X17 = c("Grains", "Grains", "Grains", "Grains", "Grains"),
    X18 = c("Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"),
    X19 = c("20,000", "20,000", "20,000", "29,930", "26,000")),
  .Names = c("X15", "X16", "X17", "X18", "X19"),
  class = c("tbl_df", "data.frame"), row.names = c(NA, -5L))

gabx at hortensia [R] str(myDf)
Classes ‘tbl_df’ and 'data.frame': 5 obs. of  5 variables:
 $ X15: chr  "30.09.2015" "05.10.2015" "30.09.2015" "29.09.2015" ...
 $ X16: chr  "02.10.2015" "06.10.2015" "01.10.2015" "01.10.2015" ...
 $ X17: chr  "Grains" "Grains" "Grains" "Grains" ...
 $ X18: chr  "Soyabeans" "Soyabeans" "Soyabeans" "Soyabeans" ...
 $ X19: chr  "20,000" "20,000" "20,000" "29,930" ...

I want to convert the date columns (X15, X16) to class Date and the
numbers (X19) to numeric, while keeping the class of my object.

This code works:

myDf$X19 <- as.numeric(gsub(",", "", myDf$X19))
myDf$X15 <- as.Date(myDf$X15, format = "%d.%m.%Y")
myDf$X16 <- as.Date(myDf$X16, format = "%d.%m.%Y")

Since my real data has more than 5 columns, this becomes tedious and may
slow the code down (?), even if I group the columns by type. The columns
are only of three types, char, num and Date, so it could be OK.

I tried lapply for the Date columns. It works, BUT it puts NA in every
column that holds numbers stored as characters.
The result for X19 is:  num NA NA NA NA NA NA NA NA NA NA ...
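A likely explanation for those NA values (an assumption, since the exact lapply() call is not shown in the thread): a converter meant for one kind of column also reached columns it cannot parse. The comma in "20,000" is not valid in an R number, and a string that does not match the date format parses to NA. A small base-R check:

```r
# Character input shaped like column X19 of the sample
x <- c("20,000", "29,930")

# Direct conversion fails on the comma, so every element becomes NA
# (as.numeric() also emits an "NAs introduced by coercion" warning, silenced here)
naive <- suppressWarnings(as.numeric(x))
print(naive)      # NA NA

# Removing the thousands separator first, as in the per-column code above
cleaned <- as.numeric(gsub(",", "", x))
print(cleaned)    # 20000 29930

# A date converter applied to the same column cannot match "%d.%m.%Y" either
as_dates <- as.Date(x, format = "%d.%m.%Y")
print(as_dates)   # NA NA
```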

How can I reach my goal with something other than lapply, or without
writing a line for each type?

Thanks for any hints.


-- 

google.com/+arnaudgabourygabx

Duncan Murdoch

2015-Dec-10 11:54 UTC

[R] change col types of a df/tbl_df

On 10/12/2015 6:12 AM, arnaud gaboury wrote:
> How can I target my goal with something else than lapply or writing a
> line for each type ?
I don't see how a function could reliably detect the types, but it might
be good enough to use a regular expression, possibly applied just to the
first row of the data.  Once you've identified the columns, e.g.

  numcols <- "X19"
  datecols <- c("X15", "X16")

etc., you can use lapply:

myDf[, numcols] <- lapply(myDf[, numcols, drop = FALSE], function(x)
  as.numeric(gsub(",", "", x)))
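The "etc." can be filled in for the date columns in the same way. Below is a minimal sketch on the posted five-column sample, using base R only and selecting columns by name, since the sample has only five columns. The two group vectors are listed by hand, as in the reply. Extra arguments given to lapply(), here `format`, are passed on to as.Date().

```r
# Sample data as posted (base R is enough; no tibble/readr needed here)
myDf <- structure(list(
  X15 = c("30.09.2015", "05.10.2015", "30.09.2015", "29.09.2015", "10.10.2015"),
  X16 = c("02.10.2015", "06.10.2015", "01.10.2015", "01.10.2015", "13.10.2015"),
  X17 = c("Grains", "Grains", "Grains", "Grains", "Grains"),
  X18 = c("Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"),
  X19 = c("20,000", "20,000", "20,000", "29,930", "26,000")),
  class = c("tbl_df", "data.frame"), row.names = c(NA, -5L))

# Column groups, listed by hand
numcols  <- "X19"
datecols <- c("X15", "X16")

# Numbers: drop the thousands separator, then convert
myDf[, numcols] <- lapply(myDf[, numcols, drop = FALSE],
                          function(x) as.numeric(gsub(",", "", x)))

# Dates: the extra 'format' argument is forwarded by lapply() to as.Date()
myDf[, datecols] <- lapply(myDf[, datecols, drop = FALSE],
                           as.Date, format = "%d.%m.%Y")

# Column replacement via [<- keeps the object's class attribute
str(myDf)
```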

You can simplify myDf[,numcols] to myDf[numcols] if you want, but I 
think it makes it less clear.
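The first suggestion, detecting the types with a regular expression on the first row, could be sketched as follows. `guess_type` is a hypothetical helper, not something from the thread. Its two patterns assume dd.mm.yyyy dates and commas as thousands separators, as in the sample. As noted above, a first-row guess is not reliable in general (an NA or an atypical first value fools it), so the detected groups are worth checking by eye.

```r
# Sample data as posted
myDf <- structure(list(
  X15 = c("30.09.2015", "05.10.2015", "30.09.2015", "29.09.2015", "10.10.2015"),
  X16 = c("02.10.2015", "06.10.2015", "01.10.2015", "01.10.2015", "13.10.2015"),
  X17 = c("Grains", "Grains", "Grains", "Grains", "Grains"),
  X18 = c("Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"),
  X19 = c("20,000", "20,000", "20,000", "29,930", "26,000")),
  class = c("tbl_df", "data.frame"), row.names = c(NA, -5L))

# Hypothetical helper: guess a column's type from its first value only.
# Anything that matches neither pattern (including NA) stays "char".
guess_type <- function(x) {
  first <- x[1]
  if (grepl("^[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}$", first)) {
    "Date"
  } else if (grepl("^[0-9]{1,3}(,[0-9]{3})*(\\.[0-9]+)?$", first)) {
    "num"
  } else {
    "char"
  }
}

# One guess per column; vapply() keeps the column names
types    <- vapply(myDf, guess_type, character(1))
datecols <- names(types)[types == "Date"]
numcols  <- names(types)[types == "num"]

# Convert each detected group with the same lapply() pattern
myDf[numcols]  <- lapply(myDf[numcols], function(x) as.numeric(gsub(",", "", x)))
myDf[datecols] <- lapply(myDf[datecols], as.Date, format = "%d.%m.%Y")
```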

Duncan Murdoch

arnaud gaboury

2015-Dec-10 12:10 UTC

[R] change col types of a df/tbl_df

On Thu, Dec 10, 2015 at 12:54 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> I don't see how a function could reliably detect the types,
In fact, I only have 25 columns, so it is not difficult to sort them into
the 3 types: char, num and Date. So there is no need for a function.
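With the columns sorted by hand into the three types, one way to keep the conversion down to a single loop is a small lookup table from type to converter. This is a sketch, not something proposed in the thread. `converters` and `coltypes` are hypothetical names, and only the five sample columns are listed; the real 25 would go into `coltypes` the same way.

```r
# Sample data as posted
myDf <- structure(list(
  X15 = c("30.09.2015", "05.10.2015", "30.09.2015", "29.09.2015", "10.10.2015"),
  X16 = c("02.10.2015", "06.10.2015", "01.10.2015", "01.10.2015", "13.10.2015"),
  X17 = c("Grains", "Grains", "Grains", "Grains", "Grains"),
  X18 = c("Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans", "Soyabeans"),
  X19 = c("20,000", "20,000", "20,000", "29,930", "26,000")),
  class = c("tbl_df", "data.frame"), row.names = c(NA, -5L))

# Hypothetical lookup table: one converter per non-character type
converters <- list(
  num  = function(x) as.numeric(gsub(",", "", x)),
  Date = function(x) as.Date(x, format = "%d.%m.%Y")
)

# Columns listed by hand under each type; character columns
# (X17, X18 here) need no conversion, so they are not listed
coltypes <- list(num = "X19", Date = c("X15", "X16"))

# Apply each type's converter to its whole group of columns
for (type in names(coltypes)) {
  cols <- coltypes[[type]]
  myDf[cols] <- lapply(myDf[cols], converters[[type]])
}
```

Adding a fourth type later would only mean one more entry in each list, while the loop stays unchanged.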


Thank you.
