thr3ads.net - R help - [R] Help with read.csv.sql() [Jul 2020]

If this information is useful, please help other people find it:
Share via:

2020-Jul-20 00:07 UTC

[R] Help with read.csv.sql()

On 07/18/2020 01:38 PM, William Michels wrote:> Do either of the postings/threads below help?
>
>
https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>
> Otherwise you can try reading through the FAQ on Github:
>
> https://github.com/ggrothendieck/sqldf
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
>
>
> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com> wrote:
>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>> Hello,
>>>
>>> I don't believe that what you are asking for is possible but
like Bert suggested, you can do it after reading in the data.
>>> You could write a convenience function to read the data, then
change what you need to change.
>>> Then the function would return this final object.
>>>
>>> Rui Barradas
>>>
>>> ?s 16:43 de 18/07/2020, H escreveu:
>>>
>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>> Is there some reason that you can't make the changes to
the data frame (column names, as.date(), ...) *after* you have read all your
data in?
>>>>>
>>>>> Do all your csv files use the same names and date formats?
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people
keep coming along and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>
>>>>>
>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at
meddatainc.com <mailto:agents at meddatainc.com>> wrote:
>>>>>
>>>>>      I have created a dataframe with columns that are
characters, integers and numeric and with column names assigned by me. I am
using read.csv.sql() to read portions of a number of large csv files into this
dataframe, each csv file having a header row with columb names.
>>>>>
>>>>>      The problem I am having is that the csv files have
header rows with column names that are slightly different from the column names
I have assigned in the dataframe and it seems that when I read the csv data into
the dataframe, the column names from the csv file replace the column names I
chose when creating the dataframe.
>>>>>
>>>>>      I have been unable to figure out if it is possible to
assign column names of my choosing in the read.csv.sql() function? I have tried
various variations but none seem to work. I tried colClasses = c(....) but that
did not work, I tried field.types = c(...) but could not get that to work
either.
>>>>>
>>>>>      It seems that the above should be feasible but I am
missing something? Does anyone know?
>>>>>
>>>>>      A secondary issue is that the csv files have a column
with a date in mm/dd/yyyy format that I would like to make into a Date type
column in my dataframe. Again, I have been unable to find a way - if at all
possible - to force a conversion into a Date format when importing into the
dataframe. The best I have so far is to import is a character column and then
use as.Date() to later force the conversion of the dataframe column.
>>>>>
>>>>>      Is it possible to do this when importing using
read.csv.sql()?
>>>>>
>>>>>      ______________________________________________
>>>>>      R-help at r-project.org <mailto:R-help at
r-project.org> mailing list -- To UNSUBSCRIBE and more, see
>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>      PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>>      and provide commented, minimal, self-contained,
reproducible code.
>>>>>
>>>> Yes, the files use the same column names and date format (at
least as far as I know now.) I agree I could do it as you suggest above but from
a purist perspective I would rather do it when importing the data using
read.csv.sql(), particularly if column names and/or date format might change, or
be different between different files. I am indeed selecting rows from a large
number of csv files so this is entirely plausible.
>>>>
>>>> Has anyone been able to name columns in the read.csv.sql() call
and/or force date format conversion in the call itself? The first refers to
naming columns differently from what a header in the csv file may have.
>>>>
>>>>
>>>>     [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>> The documentation for read.csv.sql() suggests that colClasses() and/or
field.types() should work but I may well have misunderstood the documentation,
hence my question in this group.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
I had read the sqldf() documentation but was left with the impression that what
I want to do is not easily doable.

Ulrik Stervbo

2020-Jul-29 15:11 UTC

head link

[R] Help with read.csv.sql()

You might achieve this using readr:

```
library(readr)

lines <- "Id, Date, Time, Quality, Lat, Long
     STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
     STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
     STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
     STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
     STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
     STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
     STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
     STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"

read_csv(lines)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
   col_types = cols(
     myDate = col_date(format = ""),
     myTime = col_time(format = ""),
     myLat = col_number(),
     myLong = col_number(),
     .default = col_character()
   )
   )

read_csv(
   lines,
   col_types = cols_only(
     Id = col_character(),
     Date = col_date(format = ""),
     Time = col_time(format = "")
   )
)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
   col_types = cols_only(
     myId = col_character(),
     myDate = col_date(format = ""),
     myTime = col_time(format = "")
   )
)
```

HTH
Ulrik

On 2020-07-20 02:07, H wrote:> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>> 
>>
https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>>
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>> 
>> Otherwise you can try reading through the FAQ on Github:
>> 
>> https://github.com/ggrothendieck/sqldf
>> 
>> HTH, Bill.
>> 
>> W. Michels, Ph.D.
>> 
>> 
>> 
>> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com>
wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>> 
>>>> I don't believe that what you are asking for is possible
but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then
change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>> 
>>>> Rui Barradas
>>>> 
>>>> ?s 16:43 de 18/07/2020, H escreveu:
>>>> 
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the
changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have
read all
>>>>>> your data in?
>>>>>> 
>>>>>> Do all your csv files use the same names and date
formats?
>>>>>> 
>>>>>> 
>>>>>> Bert Gunter
>>>>>> 
>>>>>> "The trouble with having an open mind is that
people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at
meddatainc.com
>>>>>> <mailto:agents at meddatainc.com>> wrote:
>>>>>> 
>>>>>>      I have created a dataframe with columns that are
characters,
>>>>>> integers and numeric and with column names assigned by
me. I am
>>>>>> using read.csv.sql() to read portions of a number of
large csv
>>>>>> files into this dataframe, each csv file having a
header row with
>>>>>> columb names.
>>>>>> 
>>>>>>      The problem I am having is that the csv files have
header
>>>>>> rows with column names that are slightly different from
the column
>>>>>> names I have assigned in the dataframe and it seems
that when I
>>>>>> read the csv data into the dataframe, the column names
from the
>>>>>> csv file replace the column names I chose when creating
the
>>>>>> dataframe.
>>>>>> 
>>>>>>      I have been unable to figure out if it is possible
to assign
>>>>>> column names of my choosing in the read.csv.sql()
function? I have
>>>>>> tried various variations but none seem to work. I tried
colClasses
>>>>>> = c(....) but that did not work, I tried field.types =
c(...) but
>>>>>> could not get that to work either.
>>>>>> 
>>>>>>      It seems that the above should be feasible but I
am missing
>>>>>> something? Does anyone know?
>>>>>> 
>>>>>>      A secondary issue is that the csv files have a
column with a
>>>>>> date in mm/dd/yyyy format that I would like to make
into a Date
>>>>>> type column in my dataframe. Again, I have been unable
to find a
>>>>>> way - if at all possible - to force a conversion into a
Date
>>>>>> format when importing into the dataframe. The best I
have so far
>>>>>> is to import is a character column and then use
as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>> 
>>>>>>      Is it possible to do this when importing using 
>>>>>> read.csv.sql()?
>>>>>> 
>>>>>>      ______________________________________________
>>>>>>      R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>      PLEASE do read the posting guide 
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>      and provide commented, minimal, self-contained,
reproducible
>>>>>> code.
>>>>>> 
>>>>> Yes, the files use the same column names and date format
(at least
>>>>> as far as I know now.) I agree I could do it as you suggest
above
>>>>> but from a purist perspective I would rather do it when
importing
>>>>> the data using read.csv.sql(), particularly if column names
and/or
>>>>> date format might change, or be different between different
files.
>>>>> I am indeed selecting rows from a large number of csv files
so this
>>>>> is entirely plausible.
>>>>> 
>>>>> Has anyone been able to name columns in the read.csv.sql()
call
>>>>> and/or force date format conversion in the call itself? The
first
>>>>> refers to naming columns differently from what a header in
the csv
>>>>> file may have.
>>>>> 
>>>>> 
>>>>>     [[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses() 
>>> and/or field.types() should work but I may well have misunderstood 
>>> the documentation, hence my question in this group.
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ulrik Stervbo

2020-Jul-29 15:14 UTC

head link

[R] Help with read.csv.sql()

You might achieve this using readr:

```
library(readr)

lines <- "Id, Date, Time, Quality, Lat, Long
     STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
     STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
     STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
     STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
     STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
     STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
     STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
     STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"

read_csv(lines)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
   col_types = cols(
     myDate = col_date(format = ""),
     myTime = col_time(format = ""),
     myLat = col_number(),
     myLong = col_number(),
     .default = col_character()
   )
   )

read_csv(
   lines,
   col_types = cols_only(
     Id = col_character(),
     Date = col_date(format = ""),
     Time = col_time(format = "")
   )
)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
   col_types = cols_only(
     myId = col_character(),
     myDate = col_date(format = ""),
     myTime = col_time(format = "")
   )
)
```

HTH
Ulrik

On 2020-07-20 02:07, H wrote:> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>> 
>>
https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>>
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>> 
>> Otherwise you can try reading through the FAQ on Github:
>> 
>> https://github.com/ggrothendieck/sqldf
>> 
>> HTH, Bill.
>> 
>> W. Michels, Ph.D.
>> 
>> 
>> 
>> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com>
wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>> 
>>>> I don't believe that what you are asking for is possible
but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then
change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>> 
>>>> Rui Barradas
>>>> 
>>>> ?s 16:43 de 18/07/2020, H escreveu:
>>>> 
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the
changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have
read all
>>>>>> your data in?
>>>>>> 
>>>>>> Do all your csv files use the same names and date
formats?
>>>>>> 
>>>>>> 
>>>>>> Bert Gunter
>>>>>> 
>>>>>> "The trouble with having an open mind is that
people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at
meddatainc.com
>>>>>> <mailto:agents at meddatainc.com>> wrote:
>>>>>> 
>>>>>>      I have created a dataframe with columns that are
characters,
>>>>>> integers and numeric and with column names assigned by
me. I am
>>>>>> using read.csv.sql() to read portions of a number of
large csv
>>>>>> files into this dataframe, each csv file having a
header row with
>>>>>> columb names.
>>>>>> 
>>>>>>      The problem I am having is that the csv files have
header
>>>>>> rows with column names that are slightly different from
the column
>>>>>> names I have assigned in the dataframe and it seems
that when I
>>>>>> read the csv data into the dataframe, the column names
from the
>>>>>> csv file replace the column names I chose when creating
the
>>>>>> dataframe.
>>>>>> 
>>>>>>      I have been unable to figure out if it is possible
to assign
>>>>>> column names of my choosing in the read.csv.sql()
function? I have
>>>>>> tried various variations but none seem to work. I tried
colClasses
>>>>>> = c(....) but that did not work, I tried field.types =
c(...) but
>>>>>> could not get that to work either.
>>>>>> 
>>>>>>      It seems that the above should be feasible but I
am missing
>>>>>> something? Does anyone know?
>>>>>> 
>>>>>>      A secondary issue is that the csv files have a
column with a
>>>>>> date in mm/dd/yyyy format that I would like to make
into a Date
>>>>>> type column in my dataframe. Again, I have been unable
to find a
>>>>>> way - if at all possible - to force a conversion into a
Date
>>>>>> format when importing into the dataframe. The best I
have so far
>>>>>> is to import is a character column and then use
as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>> 
>>>>>>      Is it possible to do this when importing using 
>>>>>> read.csv.sql()?
>>>>>> 
>>>>>>      ______________________________________________
>>>>>>      R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>      PLEASE do read the posting guide 
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>      and provide commented, minimal, self-contained,
reproducible
>>>>>> code.
>>>>>> 
>>>>> Yes, the files use the same column names and date format
(at least
>>>>> as far as I know now.) I agree I could do it as you suggest
above
>>>>> but from a purist perspective I would rather do it when
importing
>>>>> the data using read.csv.sql(), particularly if column names
and/or
>>>>> date format might change, or be different between different
files.
>>>>> I am indeed selecting rows from a large number of csv files
so this
>>>>> is entirely plausible.
>>>>> 
>>>>> Has anyone been able to name columns in the read.csv.sql()
call
>>>>> and/or force date format conversion in the call itself? The
first
>>>>> refers to naming columns differently from what a header in
the csv
>>>>> file may have.
>>>>> 
>>>>> 
>>>>>     [[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses() 
>>> and/or field.types() should work but I may well have misunderstood 
>>> the documentation, hence my question in this group.
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Rasmus Liland

2020-Jul-29 15:40 UTC

head link

[R] Help with read.csv.sql()

Dear Ulrik,

On 2020-07-29 17:14 +0200, Ulrik Stervbo via R-help
wrote:> library(readr)
> read_csv(
This thread was about 
sqldf::read.csv.sql ... 

What is the purpose of bringing up 
readr::read_csv?  I am unfamilliar with 
it, so it might be a good one.

Best,
Rasmus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20200729/84c29673/attachment.sig>

R help - Jul 2020 - Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()