thr3ads.net - R help - [R] Help with read.csv.sql() [Jul 2020]

If this information is useful, please help other people find it:
Share via:

2020-Jul-18 16:59 UTC

[R] Help with read.csv.sql()

On 07/18/2020 11:54 AM, Rui Barradas wrote:> Hello,
>
> I don't believe that what you are asking for is possible but like Bert
suggested, you can do it after reading in the data.
> You could write a convenience function to read the data, then change what
you need to change.
> Then the function would return this final object.
>
> Rui Barradas
>
> ?s 16:43 de 18/07/2020, H escreveu:
>
>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>> Is there some reason that you can't make the changes to the
data frame (column names, as.date(), ...) *after* you have read all your data
in?
>>>
>>> Do all your csv files use the same names and date formats?
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep
coming along and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>>>
>>>
>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com
<mailto:agents at meddatainc.com>> wrote:
>>>
>>> ???? I have created a dataframe with columns that are characters,
integers and numeric and with column names assigned by me. I am using
read.csv.sql() to read portions of a number of large csv files into this
dataframe, each csv file having a header row with columb names.
>>>
>>> ???? The problem I am having is that the csv files have header rows
with column names that are slightly different from the column names I have
assigned in the dataframe and it seems that when I read the csv data into the
dataframe, the column names from the csv file replace the column names I chose
when creating the dataframe.
>>>
>>> ???? I have been unable to figure out if it is possible to assign
column names of my choosing in the read.csv.sql() function? I have tried various
variations but none seem to work. I tried colClasses = c(....) but that did not
work, I tried field.types = c(...) but could not get that to work either.
>>>
>>> ???? It seems that the above should be feasible but I am missing
something? Does anyone know?
>>>
>>> ???? A secondary issue is that the csv files have a column with a
date in mm/dd/yyyy format that I would like to make into a Date type column in
my dataframe. Again, I have been unable to find a way - if at all possible - to
force a conversion into a Date format when importing into the dataframe. The
best I have so far is to import is a character column and then use as.Date() to
later force the conversion of the dataframe column.
>>>
>>> ???? Is it possible to do this when importing using read.csv.sql()?
>>>
>>> ???? ______________________________________________
>>> ???? R-help at r-project.org <mailto:R-help at r-project.org>
mailing list -- To UNSUBSCRIBE and more, see
>>> ???? https://stat.ethz.ch/mailman/listinfo/r-help
>>> ???? PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> ???? and provide commented, minimal, self-contained, reproducible
code.
>>>
>> Yes, the files use the same column names and date format (at least as
far as I know now.) I agree I could do it as you suggest above but from a purist
perspective I would rather do it when importing the data using read.csv.sql(),
particularly if column names and/or date format might change, or be different
between different files. I am indeed selecting rows from a large number of csv
files so this is entirely plausible.
>>
>> Has anyone been able to name columns in the read.csv.sql() call and/or
force date format conversion in the call itself? The first refers to naming
columns differently from what a header in the csv file may have.
>>
>>
>> ????[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>The documentation for read.csv.sql() suggests that colClasses() and/or
field.types() should work but I may well have misunderstood the documentation,
hence my question in this group.

Rui Barradas

2020-Jul-18 17:09 UTC

head link

[R] Help with read.csv.sql()

Hello,

The documentation says the following.

field.types
A list whose names are the column names and whose contents are the 
SQLite types (not the R class names) of the columns.


So argument field.types is a named list.

 ?- The list members names are the column names of the table to be read.
 ?- The list members values are SQLite types, like "CHAR",
"VARCHAR",
"INT", etc.


As for colClasses, those are R class names.

Rui Barradas




?s 17:59 de 18/07/2020, H escreveu:> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>> Hello,
>>
>> I don't believe that what you are asking for is possible but like
Bert suggested, you can do it after reading in the data.
>> You could write a convenience function to read the data, then change
what you need to change.
>> Then the function would return this final object.
>>
>> Rui Barradas
>>
>> ?s 16:43 de 18/07/2020, H escreveu:
>>
>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>> Is there some reason that you can't make the changes to the
data frame (column names, as.date(), ...) *after* you have read all your data
in?
>>>>
>>>> Do all your csv files use the same names and date formats?
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep
coming along and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>>>>
>>>>
>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com
<mailto:agents at meddatainc.com>> wrote:
>>>>
>>>>  ???? I have created a dataframe with columns that are
characters, integers and numeric and with column names assigned by me. I am
using read.csv.sql() to read portions of a number of large csv files into this
dataframe, each csv file having a header row with columb names.
>>>>
>>>>  ???? The problem I am having is that the csv files have header
rows with column names that are slightly different from the column names I have
assigned in the dataframe and it seems that when I read the csv data into the
dataframe, the column names from the csv file replace the column names I chose
when creating the dataframe.
>>>>
>>>>  ???? I have been unable to figure out if it is possible to
assign column names of my choosing in the read.csv.sql() function? I have tried
various variations but none seem to work. I tried colClasses = c(....) but that
did not work, I tried field.types = c(...) but could not get that to work
either.
>>>>
>>>>  ???? It seems that the above should be feasible but I am
missing something? Does anyone know?
>>>>
>>>>  ???? A secondary issue is that the csv files have a column
with a date in mm/dd/yyyy format that I would like to make into a Date type
column in my dataframe. Again, I have been unable to find a way - if at all
possible - to force a conversion into a Date format when importing into the
dataframe. The best I have so far is to import is a character column and then
use as.Date() to later force the conversion of the dataframe column.
>>>>
>>>>  ???? Is it possible to do this when importing using
read.csv.sql()?
>>>>
>>>>  ???? ______________________________________________
>>>>  ???? R-help at r-project.org <mailto:R-help at
r-project.org> mailing list -- To UNSUBSCRIBE and more, see
>>>>  ???? https://stat.ethz.ch/mailman/listinfo/r-help
>>>>  ???? PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>  ???? and provide commented, minimal, self-contained,
reproducible code.
>>>>
>>> Yes, the files use the same column names and date format (at least
as far as I know now.) I agree I could do it as you suggest above but from a
purist perspective I would rather do it when importing the data using
read.csv.sql(), particularly if column names and/or date format might change, or
be different between different files. I am indeed selecting rows from a large
number of csv files so this is entirely plausible.
>>>
>>> Has anyone been able to name columns in the read.csv.sql() call
and/or force date format conversion in the call itself? The first refers to
naming columns differently from what a header in the csv file may have.
>>>
>>>
>>>  ????[[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> The documentation for read.csv.sql() suggests that colClasses() and/or
field.types() should work but I may well have misunderstood the documentation,
hence my question in this group.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Este e-mail foi verificado em termos de v?rus pelo software antiv?rus Avast.
https://www.avast.com/antivirus

William Michels

2020-Jul-18 17:38 UTC

head link

[R] Help with read.csv.sql()

Do either of the postings/threads below help?

https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html

Otherwise you can try reading through the FAQ on Github:

https://github.com/ggrothendieck/sqldf

HTH, Bill.

W. Michels, Ph.D.



On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com>
wrote:>
> On 07/18/2020 11:54 AM, Rui Barradas wrote:
> > Hello,
> >
> > I don't believe that what you are asking for is possible but like
Bert suggested, you can do it after reading in the data.
> > You could write a convenience function to read the data, then change
what you need to change.
> > Then the function would return this final object.
> >
> > Rui Barradas
> >
> > ?s 16:43 de 18/07/2020, H escreveu:
> >
> >> On 07/17/2020 09:49 PM, Bert Gunter wrote:
> >>> Is there some reason that you can't make the changes to
the data frame (column names, as.date(), ...) *after* you have read all your
data in?
> >>>
> >>> Do all your csv files use the same names and date formats?
> >>>
> >>>
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep
coming along and sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
> >>>
> >>>
> >>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com
<mailto:agents at meddatainc.com>> wrote:
> >>>
> >>>      I have created a dataframe with columns that are
characters, integers and numeric and with column names assigned by me. I am
using read.csv.sql() to read portions of a number of large csv files into this
dataframe, each csv file having a header row with columb names.
> >>>
> >>>      The problem I am having is that the csv files have header
rows with column names that are slightly different from the column names I have
assigned in the dataframe and it seems that when I read the csv data into the
dataframe, the column names from the csv file replace the column names I chose
when creating the dataframe.
> >>>
> >>>      I have been unable to figure out if it is possible to
assign column names of my choosing in the read.csv.sql() function? I have tried
various variations but none seem to work. I tried colClasses = c(....) but that
did not work, I tried field.types = c(...) but could not get that to work
either.
> >>>
> >>>      It seems that the above should be feasible but I am
missing something? Does anyone know?
> >>>
> >>>      A secondary issue is that the csv files have a column
with a date in mm/dd/yyyy format that I would like to make into a Date type
column in my dataframe. Again, I have been unable to find a way - if at all
possible - to force a conversion into a Date format when importing into the
dataframe. The best I have so far is to import is a character column and then
use as.Date() to later force the conversion of the dataframe column.
> >>>
> >>>      Is it possible to do this when importing using
read.csv.sql()?
> >>>
> >>>      ______________________________________________
> >>>      R-help at r-project.org <mailto:R-help at
r-project.org> mailing list -- To UNSUBSCRIBE and more, see
> >>>      https://stat.ethz.ch/mailman/listinfo/r-help
> >>>      PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> >>>      and provide commented, minimal, self-contained,
reproducible code.
> >>>
> >> Yes, the files use the same column names and date format (at least
as far as I know now.) I agree I could do it as you suggest above but from a
purist perspective I would rather do it when importing the data using
read.csv.sql(), particularly if column names and/or date format might change, or
be different between different files. I am indeed selecting rows from a large
number of csv files so this is entirely plausible.
> >>
> >> Has anyone been able to name columns in the read.csv.sql() call
and/or force date format conversion in the call itself? The first refers to
naming columns differently from what a header in the csv file may have.
> >>
> >>
> >>     [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> The documentation for read.csv.sql() suggests that colClasses() and/or
field.types() should work but I may well have misunderstood the documentation,
hence my question in this group.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Rasmus Liland

2020-Jul-19 03:42 UTC

head link

[R] Help with read.csv.sql()

On 2020-07-18 18:09 +0100, Rui Barradas wrote:
| ?s 17:59 de 18/07/2020, H escreveu:
| | On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com> wrote:
| | | 
| | | The problem I am having is that 
| | | the csv files have header rows 
| | | with column names that are 
| | | slightly different from the column 
| | | names I have assigned in the 
| | | dataframe and it seems that when I 
| | | read the csv data into the 
| | | dataframe, the column names from 
| | | the csv file replace the column 
| | | names I chose when creating the 
| | | dataframe.
| | | 
| | | A secondary issue is that the csv 
| | | files have a column with a date in 
| | | mm/dd/yyyy format that I would 
| | | like to make into a Date type 
| | | column in my dataframe. Again, I 
| | | have been unable to find a way - 
| | | if at all possible - to force a 
| | | conversion into a Date format when 
| | | importing into the dataframe. The 
| | | best I have so far is to import is 
| | | a character column and then use 
| | | as.Date() to later force the 
| | | conversion of the dataframe 
| | | column.
| | 
| | The documentation for read.csv.sql() 
| | suggests that colClasses() and/or 
| | field.types() should work but I may 
| | well have misunderstood the 
| | documentation, hence my question in 
| | this group.
| 
| As for colClasses, those are R class 
| names.

Ok Mister H, I might have hit the nail 
on the head this time with this badass 
example for your usecase:

	# Make a csv with %d/%m/%Y dates in it ...
	Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158
	STM05-1 2005/02/28 19:44 Good -35.487 177.129
	STM05-1 2005/02/28 23:01 Unknown -35.399 177.064
	STM05-1 2005/03/01 07:28 Unknown -34.978 177.268
	STM05-1 2005/03/01 18:06 Poor -34.799 177.027
	STM05-1 2005/03/01 18:47 Poor -34.85 177.059
	STM05-2 2005/02/28 12:49 Good -35.928 177.328
	STM05-2 2005/02/28 21:23 Poor -35.926 177.314
	"
	DF <- read.table(textConnection(Lines), as.is = TRUE,
	  col.names = c("Id", "Date", "Time",
"Quality", "Lat", "Long"))
	DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"),
"%d/%m/%Y")
	write.csv(DF, file="df.csv", row.names=FALSE)
	
	colClasses <-
	  c("character",
	    "Date",
	    "character",
	    "character",
	    "numeric",
	    "numeric")
	sql <- paste0(
	  "select ",
	    "date(",  # [2]
	      "substr(Date, 8, 4) || '-' || ",  # [1]
	      "substr(Date, 5, 2) || '-' || ",
	      "substr(Date, 2, 2)), Long, Lat, Quality ",
	  "from ff where Quality like '%oo%' and Long>177.129")
	ff <- file(description="df.csv", open="r")
	dat <- sqldf::read.csv.sql(
	  sql=sql, colClasses=colClasses)
	close(ff)
	
	str(dat)
	
	as.Date(dat[,1])
	dat[,3]

Both sqlite and Postgres has a function 
substr you can call on strings like 
this.[5]  I have a hunch this has always 
been possible in sql from way back ... 

The warning from sqldf about unused 
connections, might suggest file 
descriptor handling to be a bit crusty 
... [3] 

The thing is, defining the second column 
as of type Date in colClasses happens to 
work, but it's still character when you 
check with str(dat) ... perhaps it has 
something to do with this info from [4]: 

	as_tibble_row() converts a vector to 
	a tibble with one row. The input 
	must be a bare vector, e.g. vectors 
	of dates are not supported yet. If 
	the input is a list, all elements 
	must have length one.

[1]
https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite
[2] https://www.sqlite.org/lang_datefunc.html
[3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8
[4] https://tibble.tidyverse.org/reference/as_tibble.html
[5] https://www.sqlite.org/lang_corefunc.html#substr, 
https://www.postgresql.org/docs/9.1/functions-string.html,
http://www.h2database.com/html/functions.html#substring 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20200719/f6c63baa/attachment.sig>

2020-Jul-20 00:07 UTC

head link

[R] Help with read.csv.sql()

On 07/18/2020 01:38 PM, William Michels wrote:> Do either of the postings/threads below help?
>
>
https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>
> Otherwise you can try reading through the FAQ on Github:
>
> https://github.com/ggrothendieck/sqldf
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
>
>
> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com> wrote:
>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>> Hello,
>>>
>>> I don't believe that what you are asking for is possible but
like Bert suggested, you can do it after reading in the data.
>>> You could write a convenience function to read the data, then
change what you need to change.
>>> Then the function would return this final object.
>>>
>>> Rui Barradas
>>>
>>> ?s 16:43 de 18/07/2020, H escreveu:
>>>
>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>> Is there some reason that you can't make the changes to
the data frame (column names, as.date(), ...) *after* you have read all your
data in?
>>>>>
>>>>> Do all your csv files use the same names and date formats?
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people
keep coming along and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>
>>>>>
>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at
meddatainc.com <mailto:agents at meddatainc.com>> wrote:
>>>>>
>>>>>      I have created a dataframe with columns that are
characters, integers and numeric and with column names assigned by me. I am
using read.csv.sql() to read portions of a number of large csv files into this
dataframe, each csv file having a header row with columb names.
>>>>>
>>>>>      The problem I am having is that the csv files have
header rows with column names that are slightly different from the column names
I have assigned in the dataframe and it seems that when I read the csv data into
the dataframe, the column names from the csv file replace the column names I
chose when creating the dataframe.
>>>>>
>>>>>      I have been unable to figure out if it is possible to
assign column names of my choosing in the read.csv.sql() function? I have tried
various variations but none seem to work. I tried colClasses = c(....) but that
did not work, I tried field.types = c(...) but could not get that to work
either.
>>>>>
>>>>>      It seems that the above should be feasible but I am
missing something? Does anyone know?
>>>>>
>>>>>      A secondary issue is that the csv files have a column
with a date in mm/dd/yyyy format that I would like to make into a Date type
column in my dataframe. Again, I have been unable to find a way - if at all
possible - to force a conversion into a Date format when importing into the
dataframe. The best I have so far is to import is a character column and then
use as.Date() to later force the conversion of the dataframe column.
>>>>>
>>>>>      Is it possible to do this when importing using
read.csv.sql()?
>>>>>
>>>>>      ______________________________________________
>>>>>      R-help at r-project.org <mailto:R-help at
r-project.org> mailing list -- To UNSUBSCRIBE and more, see
>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>      PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>>      and provide commented, minimal, self-contained,
reproducible code.
>>>>>
>>>> Yes, the files use the same column names and date format (at
least as far as I know now.) I agree I could do it as you suggest above but from
a purist perspective I would rather do it when importing the data using
read.csv.sql(), particularly if column names and/or date format might change, or
be different between different files. I am indeed selecting rows from a large
number of csv files so this is entirely plausible.
>>>>
>>>> Has anyone been able to name columns in the read.csv.sql() call
and/or force date format conversion in the call itself? The first refers to
naming columns differently from what a header in the csv file may have.
>>>>
>>>>
>>>>     [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>> The documentation for read.csv.sql() suggests that colClasses() and/or
field.types() should work but I may well have misunderstood the documentation,
hence my question in this group.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
I had read the sqldf() documentation but was left with the impression that what
I want to do is not easily doable.

R help - Jul 2020 - Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()

[R] Help with read.csv.sql()